June 28, 2026

How to Prompt GPT Image Models

By Synthex

Image prompting gets much easier when you stop treating the prompt like a magic sentence.

A good image prompt is closer to a small creative brief. It tells the model what the image is for, what should be visible, what style it should follow, what must stay unchanged, and what should be avoided.

This guide is based on OpenAI's GPT Image Generation Models Prompting Guide. The practical idea is simple: the clearer the visual job, the easier it is for the model to make the right image without five messy retries.

You do not need to write huge prompts. You need to write prompts that make the important decisions visible.

What you'll learn

How to structure a good GPT Image prompt.
When to use low, medium, or high quality.
How to prompt for photorealistic images.
How to get cleaner text inside generated images.
How to edit images without drifting away from the original.
How to use multiple input images.
How to prompt for ads, infographics, product mockups, logos, and character consistency.
What to check before using GPT Image models in a production workflow.

What this is really about

Most weak image prompts fail because they leave too much unsaid.

For example:

Make a nice product photo.

The model has to guess:

What product?
What surface?
What lighting?
What camera angle?
What background?
What mood?
Is this for ecommerce, an ad, a pitch deck, or social media?
Should there be text?
Should the image feel realistic, premium, casual, playful, technical, or editorial?

A stronger prompt removes the important guesses:

Create a photorealistic ecommerce product photo of a matte black insulated water bottle on a pale stone surface.
Use soft morning window light from the left, a clean neutral background, realistic contact shadow, and a premium minimal composition.
The bottle should be centered, upright, fully visible, and sharply in focus.
No text, no logos, no hands, no extra objects.

That prompt is not complicated. It is just specific.

The basic prompt structure

Use this order when you are not sure how to start:

Create [type of image] for [use case].

Subject:
[Main thing in the image]

Scene:
[Where it is, what surrounds it, what the viewer sees]

Style:
[Photorealistic, illustration, 3D render, infographic, editorial, etc.]

Composition:
[Framing, angle, subject placement, negative space, layout]

Lighting and mood:
[Soft daylight, high contrast, warm studio light, calm, energetic, etc.]

Constraints:
[What to avoid, what must stay unchanged, what text must or must not appear]

You can write this as a paragraph or as labeled sections. For production work, labeled sections are easier to maintain because you can adjust one part without rewriting the whole prompt.
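One way to keep labeled sections maintainable is to store each section as its own field and render the prompt from them. The sketch below is only an illustration: `ImageBrief` and `render_prompt` are hypothetical names, not part of any OpenAI library, and the output is a plain string you would pass to whatever image tool you use.

```python
from dataclasses import dataclass, field


@dataclass
class ImageBrief:
    """Hypothetical container for the labeled sections described above."""
    image_type: str
    use_case: str
    subject: str
    scene: str = ""
    style: str = ""
    composition: str = ""
    lighting_and_mood: str = ""
    constraints: list[str] = field(default_factory=list)


def render_prompt(brief: ImageBrief) -> str:
    """Render the brief as labeled sections, skipping sections left empty."""
    parts = [f"Create {brief.image_type} for {brief.use_case}."]
    sections = [
        ("Subject", brief.subject),
        ("Scene", brief.scene),
        ("Style", brief.style),
        ("Composition", brief.composition),
        ("Lighting and mood", brief.lighting_and_mood),
        ("Constraints", " ".join(brief.constraints)),
    ]
    for label, text in sections:
        if text.strip():
            parts.append(f"{label}:\n{text.strip()}")
    return "\n\n".join(parts)
```

Because each section is a separate field, changing the lighting or the constraints means editing one value rather than rewriting the whole prompt.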

The five things that matter most

1. Say the intended use

The model needs to know what kind of image it is making.

An ad, infographic, UI mockup, product photo, children's book illustration, and logo all have different visual rules.

Weak:

Create an image of a coffee machine.

Better:

Create a vertical infographic that explains how an automatic coffee machine moves from beans to brewed coffee.
Show the bean hopper, grinder, water tank, boiler, pressure system, and cup as a clear labeled flow.

The second prompt tells the model the format (a vertical infographic), the parts to show, and the visual job (a clear labeled flow).

2. Be specific about the subject

If the subject matters, describe it.

For objects, mention:

Material.
Shape.
Color.
Surface texture.
Size relationship.
Condition.
Important details.

For people, mention:

Framing.
Pose.
Gaze.
Clothing.
Action.
Scale.
What the person is interacting with.

Example:

Create a photorealistic candid image of an elderly sailor standing on a small fishing boat.
He has weathered skin, visible wrinkles, sun texture, and faded sailor tattoos.
He is calmly adjusting a fishing net while a small dog sits nearby on the deck.
Medium close-up at eye level, natural coastal daylight, honest unposed feeling.

Notice the word "photorealistic". If you want the image to feel like a real photograph, say so directly.

3. Control composition

Composition means where things sit in the image.

Good composition instructions include:

centered subject
full body visible
top-down view
wide landscape shot
close-up macro detail
negative space on the left
logo area top right
three panels stacked vertically

If the image needs to work as a social graphic, ad, product listing, or blog thumbnail, composition matters as much as style.

Example:

Create a 16:9 blog thumbnail.
Place the main object slightly right of center.
Leave clean negative space on the left for a headline overlay.
Use shallow depth of field and soft realistic shadows.

Without placement instructions, the model may make a beautiful image that is awkward to use.

4. Say what must stay unchanged

This matters most for edits.

If you are editing an existing image, separate the change from the invariants.

Use this pattern:

Change only [specific thing].
Preserve [identity, pose, background, camera angle, lighting, layout, text, product shape].
Do not change [important details].

Example:

Replace only the jacket with a dark green wool coat.
Preserve the person's face, body shape, pose, hairstyle, expression, skin tone, background, camera angle, and lighting.
The coat should fit naturally with realistic folds, shadows, and fabric texture.
Do not add accessories, logos, text, or background changes.

This is the part beginners often skip. The model needs to know not only what to change, but also what to protect.
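If you generate edit prompts in code, you can make the preserve list impossible to forget. This is a minimal sketch of the change/preserve/do-not-change pattern above; `build_edit_prompt` is a hypothetical helper, and refusing an empty preserve list is a design choice of this sketch rather than a rule from the guide.

```python
def build_edit_prompt(change: str, preserve: list[str], do_not_change: list[str]) -> str:
    """Build an edit prompt that separates the change from the invariants."""
    if not preserve:
        # Assumption of this sketch: an edit with no preserve list is a mistake.
        raise ValueError("An edit prompt needs at least one item to preserve.")
    lines = [
        f"Change only {change}.",
        f"Preserve {', '.join(preserve)}.",
    ]
    if do_not_change:
        lines.append(f"Do not change {', '.join(do_not_change)}.")
    return "\n".join(lines)


prompt = build_edit_prompt(
    change="the jacket, replacing it with a dark green wool coat",
    preserve=["the person's face", "pose", "background", "camera angle"],
    do_not_change=["the lighting"],
)
```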

5. Use constraints without turning the prompt into noise

Constraints are useful:

no watermark
no extra text
no unrelated logos
no cartoon style
do not change the background
preserve product label exactly

But a giant list of negatives can become hard to reason about.

Use constraints for the things that actually matter. If the output must preserve a label, identity, or layout, say that clearly. If the image just needs to be generally clean, keep the negative list short.

Choosing quality settings

GPT Image workflows usually involve a tradeoff between speed and fidelity.

Use this simple rule:

Setting	Use when
`low`	You need fast drafts, high-volume variants, internal previews, or quick experiments
`medium`	You want a balanced default for normal production exploration
`high`	You need dense text, polished infographics, close-up portraits, identity-sensitive edits, or final-quality output

For many workflows, start with low or medium, then move to high only when the image needs extra precision.

Do not use the highest setting automatically. It can be slower and more expensive. The useful question is:

Does this image need more fidelity, or does it need a better prompt?

Often, the better prompt matters more.
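The table above can be written as a small decision helper. This is one possible reading of it, not an official rule: the flag names are invented for illustration, and only the three setting names (`low`, `medium`, `high`) come from the table.

```python
def choose_quality(
    *,
    final_output: bool = False,
    dense_text: bool = False,
    identity_sensitive: bool = False,
    close_up_portrait: bool = False,
    high_volume_draft: bool = False,
) -> str:
    """Pick a quality setting; precision needs win over draft speed."""
    if final_output or dense_text or identity_sensitive or close_up_portrait:
        return "high"
    if high_volume_draft:
        return "low"
    # Balanced default for normal exploration.
    return "medium"
```

For example, `choose_quality(high_volume_draft=True)` returns `"low"`, while adding `dense_text=True` to the same call returns `"high"`, because small or dense text is one of the cases where extra fidelity tends to pay off.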

Prompting for text inside images

GPT Image models are much better at text than older image models, but text still needs careful prompting.

If text must appear in the image:

Put the exact text in quotes.
Say it should appear once.
Say it should be verbatim.
Describe placement.
Describe font style.
Ask for strong contrast.
Avoid asking for too much small text in one image.

Example:

Create a realistic billboard mockup for a shampoo brand on a highway at sunset.

Billboard text, exact and verbatim:
"Fresh and clean"

Typography:
Bold sans-serif, centered, high contrast, clean kerning, easy to read from a distance.

Constraints:
The text should appear once.
No extra words, no watermarks, no unrelated logos.

For tricky brand names or unusual spellings, spell the word out letter by letter in the prompt. For small text, dense layouts, or multi-panel designs, use a higher quality setting and expect to review closely.
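The checklist above can be turned into a reusable text block. The sketch below assumes a hypothetical helper, `text_block`, and a made-up brand name; the exact wording of each line is illustrative, so adjust it to your own house style.

```python
def text_block(copy: str, placement: str, font_style: str, spell_out: bool = False) -> str:
    """Build the text portion of a prompt: quoted copy, placement, typography."""
    if '"' in copy:
        raise ValueError("Copy must not contain double quotes; it is quoted in the prompt.")
    lines = [
        "Text, exact and verbatim:",
        f'"{copy}"',
        f"Placement: {placement}.",
        f"Typography: {font_style}, high contrast.",
        "The text should appear once. No extra words.",
    ]
    if spell_out:
        # Spell each word letter by letter, for unusual names or spellings.
        spelled = ", ".join("-".join(word) for word in copy.split())
        lines.append(f"Spelling, letter by letter: {spelled}.")
    return "\n".join(lines)


# "Qwikly" is an invented brand name used only for this example.
block = text_block("Qwikly", "centered on the billboard", "bold sans-serif", spell_out=True)
```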

Prompting for photorealism

For realistic images, do not only say "realistic."

Describe the image like a real photo:

Lighting.
Lens or viewpoint.
Surface texture.
Natural imperfections.
Framing.
Depth of field.
Material behavior.
Everyday detail.

Good photorealistic prompts often include phrases like:

photorealistic
real photograph
natural daylight
realistic contact shadows
subtle skin texture
worn material detail
unposed candid moment

Use camera language for the overall look, not as a guarantee of exact physics. A phrase like "50mm lens" can help communicate a natural portrait feel, but it should not be treated as a precise simulation setting.

Prompting for infographics and diagrams

Infographics need structure more than mood.

Tell the model:

The topic.
The audience.
The layout.
The parts or steps.
The hierarchy.
Whether labels are needed.
How much detail should appear.

Example:

Create a vertical educational infographic for beginners explaining how a home espresso machine works.

Show the flow from water tank to boiler to pressure system to coffee puck to cup.
Use clear arrows, simple labels, clean sections, and a calm technical style.
Make it understandable for someone who has never opened a coffee machine before.

For dense infographics, use medium or high quality and keep the text concise. If the graphic needs exact wording, write the wording explicitly.

Prompting for ads and marketing images

Ad prompts work best when they read like a creative brief.

Include:

Brand or product context.
Audience.
Cultural feel.
Scene.
Composition.
Tagline.
Text constraints.
What should not appear.

Example:

Create a polished campaign image for a young streetwear brand called Thread.

Scene:
A group of friends hanging out after school in a city plaza, wearing relaxed contemporary streetwear.

Mood:
Energetic, natural, stylish, friendly, not overproduced.

Copy:
Include the tagline exactly once:
"Yours to Create."

Typography:
Clean, bold, legible, integrated into the ad layout.

Constraints:
No extra text, no watermarks, no unrelated logos.

The model can make better creative choices when it understands the campaign, not just the objects.

Prompting for edits

Editing is where prompt discipline matters most.

Use this pattern:

Edit the image to [specific change].
Change only [target].
Preserve [everything important].
Match [lighting, shadows, perspective, texture].
Do not [common failure modes].

Example:

Replace only the white dining chairs with warm natural wood chairs.
Preserve the room layout, table, camera angle, window light, floor shadows, wall color, and surrounding objects.
The new chairs should match the original perspective and cast realistic contact shadows.
Do not redesign the room.

The phrase "replace only" is useful. It reduces the chance that the model restyles the whole image when you wanted one contained change.

Prompting with multiple images

When using multiple input images, label them in the prompt.

Do not assume the model will infer which image is the product, which is the style reference, and which is the scene.

Use:

Image 1: the original room photo.
Image 2: the chair style reference.

Use Image 2 only as the chair style reference.
Replace the chairs in Image 1 with chairs in the style of Image 2.
Preserve the room, camera angle, lighting, table, walls, and floor from Image 1.

For compositing, specify:

Which object moves.
Which scene stays.
Where the object should go.
How scale, lighting, shadows, and perspective should match.
What should remain unchanged.
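When the inputs are assembled in code, the labels can be generated from the same list that holds the files, so the numbering cannot fall out of sync. This is a sketch with a hypothetical function name, and it assumes the labels are listed in the same order the images are supplied.

```python
def multi_image_prompt(
    roles: list[str],
    instruction: str,
    preserve: list[str],
    base_image: int = 1,
) -> str:
    """Label each input image by role, then state the edit and what to preserve."""
    if len(roles) < 2:
        raise ValueError("Expected at least two input images.")
    if not 1 <= base_image <= len(roles):
        raise ValueError("base_image must refer to one of the inputs.")
    labels = [f"Image {i}: {role}." for i, role in enumerate(roles, start=1)]
    body = [
        instruction,
        f"Preserve the {', '.join(preserve)} from Image {base_image}.",
    ]
    return "\n".join(labels) + "\n\n" + "\n".join(body)


prompt = multi_image_prompt(
    roles=["the original room photo", "the chair style reference"],
    instruction="Replace the chairs in Image 1 with chairs in the style of Image 2.",
    preserve=["room", "camera angle", "lighting"],
)
```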

Use cases that work especially well

The same prompting habits carry across many workflows. What changes is where you put the emphasis.

Use case	What to focus on
Product mockups	Preserve geometry, label clarity, shadows, and background intent
Virtual try-on	Preserve identity, pose, body shape, face, hair, and lighting
Logos	Ask for original, simple, scalable marks with strong silhouette
Infographics	Define structure, labels, audience, and hierarchy
Ads	Write like a creative brief with exact copy and audience context
Comics	Define panels, characters, sequence, and consistency
Interior edits	Change one object while preserving the room
Style transfer	Separate content to preserve from style to apply
Character consistency	Create a character anchor, then reuse it with strict continuity notes

The common thread is control. Say what the image should become, but also say what should not drift.

Common misunderstandings

"Longer prompts are always better"

No.

Long prompts can work, but only when they are organized. A short structured prompt is usually better than a long pile of adjectives.

"The model should know what I mean by premium"

Maybe, but it helps to define it.

Instead of only saying "premium", describe the visible cues:

Clean composition.
Soft realistic shadows.
Tactile materials.
Balanced negative space.
Sharp product edges.
Controlled color palette.

"Edits only need the new thing"

No.

For edits, the preserve list matters as much as the change. If you do not say what should stay the same, the model may treat the whole image as flexible.

"Text rendering means I can fill the image with copy"

Not always.

Text works best when it is short, explicit, high contrast, and placed clearly. Dense layouts need stricter prompting and closer review.

"One perfect prompt should solve everything"

Not usually.

The better workflow is to start clean, review the output, then make one small change at a time. If the image drifts, restate the critical constraints.
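A follow-up prompt built this way asks for a single change and repeats the constraints that must not drift. The helper name and the exact phrasing are assumptions of this sketch, not wording from OpenAI's guide.

```python
def follow_up_prompt(change: str, critical_constraints: list[str]) -> str:
    """Ask for exactly one change and restate the constraints that must hold."""
    if not change.strip():
        raise ValueError("Describe the one change to make.")
    lines = [
        f"Change only this: {change.strip()}",
        "Keep everything else as it is.",
    ]
    lines.extend(f"Still required: {c}" for c in critical_constraints)
    return "\n".join(lines)


step = follow_up_prompt(
    "make the window light warmer.",
    ["no text", "bottle centered and upright"],
)
```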

What to do first

Start with this simple workflow:

1. Choose the use case: photo, ad, infographic, edit, mockup, logo, or character art.
2. Write the subject and intended use.
3. Add composition and lighting.
4. Add style or medium.
5. Add exact text only if text is needed.
6. Add constraints and preserve rules.
7. Generate at low or medium for exploration.
8. Move to high when detail, identity, or small text matters.
9. Review the output against the prompt.
10. Iterate with one clear change at a time.

Here is a reusable template:

Create [image type] for [use case].

Subject:
[What should be visible]

Scene:
[Where it is and what surrounds it]

Style:
[Photorealistic, illustration, 3D, infographic, ad campaign, etc.]

Composition:
[Framing, angle, placement, negative space]

Lighting and mood:
[Lighting direction, color, atmosphere]

Text:
[Exact quoted text, or "no text"]

Constraints:
[No logos, no watermarks, preserve X, change only Y]

If you are editing an image, add:

Preserve:
[Identity, pose, geometry, layout, camera angle, lighting, background, labels]

Change only:
[The specific thing to edit]
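If your team stores prompts as text, a small check can flag template sections that were left out before anything is generated. This sketch assumes prompts follow the labeled template above, with each label on its own line; `missing_sections` is a hypothetical helper.

```python
TEMPLATE_LABELS = (
    "Subject:",
    "Scene:",
    "Style:",
    "Composition:",
    "Lighting and mood:",
    "Text:",
    "Constraints:",
)
EDIT_LABELS = ("Preserve:", "Change only:")


def missing_sections(prompt: str, is_edit: bool = False) -> list[str]:
    """Return template labels that do not appear on their own line in the prompt."""
    labels = TEMPLATE_LABELS + (EDIT_LABELS if is_edit else ())
    present = {line.strip() for line in prompt.splitlines()}
    return [label for label in labels if label not in present]


draft = (
    "Create a product photo for an ecommerce listing.\n\n"
    "Subject:\nA water bottle.\n\n"
    "Text:\nno text\n"
)
gaps = missing_sections(draft)
```

A missing section is not always an error. The point is to make the omission a deliberate choice rather than an accident.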

Final takeaway

Good GPT Image prompts are not about sounding artistic.

They are about making the visual task clear.

Name the use case. Describe the subject. Control the composition. Choose the quality level intentionally. Put exact text in quotes. For edits, say what changes and what stays unchanged. For multi-image workflows, label each input and explain how they relate.

The model can handle a lot, but it should not have to guess the parts that matter most.
