The 6-part prompt structure that works
Strong product photography prompts follow a predictable order: front-load what matters most to the generator and save mood and styling for later. Reordering the components tends to lower output quality.
| Order | Component | Example |
|---|---|---|
| 1 | Subject + count | "A single ceramic coffee mug" |
| 2 | Material & key details | "matte black glaze, cylindrical, no handle" |
| 3 | Camera setup | "shot at 50mm, slight overhead angle, shallow depth of field" |
| 4 | Lighting | "soft diffused light from upper left, subtle rim light" |
| 5 | Background & surface | "on a warm beige linen surface, blurred terracotta wall behind" |
| 6 | Output spec | "square 1:1 crop, e-commerce product photography" |
The reason this order works: many generators tend to weight tokens earlier in the prompt more heavily. If your subject and material come last, after a paragraph of mood description, the model is more likely to invent object details to fit the vibe rather than the other way around.
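If you assemble prompts in code, one way to keep this order is to make it structural. The sketch below is a minimal illustration in Python and is not tied to any particular generator; the `ProductPrompt` class and its field names are hypothetical, chosen to mirror the six rows of the table.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class ProductPrompt:
    # Hypothetical container. Field order mirrors the table:
    # the most important components are declared (and rendered) first.
    subject: str
    material: str
    camera: str
    lighting: str
    background: str
    output_spec: str

    def render(self) -> str:
        # Join components in declaration order, skipping empty ones
        # and trimming stray trailing punctuation.
        parts = [p.strip().rstrip(".,") for p in astuple(self) if p.strip()]
        return ", ".join(parts)


mug = ProductPrompt(
    subject="A single ceramic coffee mug",
    material="matte black glaze, cylindrical, no handle",
    camera="shot at 50mm, slight overhead angle, shallow depth of field",
    lighting="soft diffused light from upper left, subtle rim light",
    background="on a warm beige linen surface, blurred terracotta wall behind",
    output_spec="square 1:1 crop, e-commerce product photography",
)
print(mug.render())
```

Because the fields are declared in priority order, a prompt built this way cannot end up with mood or styling ahead of the subject.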
Vocabulary that gets you photographic results
Generic adjectives like "beautiful," "professional," and "high quality" do almost nothing. The model has no anchor for what those mean for your product. Replace them with specific photographic terms that map to real-world reference images the model was trained on.
| Vague (avoid) | Specific (use) |
|---|---|
| "Professional photo" | "Studio product photography, catalog style" |
| "Nice lighting" | "Soft key light from 45 degrees, fill card on right" |
| "Clean background" | "Seamless white cyclorama, subtle ground shadow" |
| "High quality" | "Sharp focus, fine surface texture visible" |
| "Beautiful product shot" | "Three-quarter angle, eye-level, centered composition" |
| "Modern look" | "Minimalist Scandinavian aesthetic, muted oat palette" |
Borrow vocabulary from real photography. Terms like "octabox," "bounce card," "feathered light," "85mm," "f/8," and "tabletop product photography" tap into massive amounts of training data and produce far more consistent results than mood-board adjectives.
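A quick way to catch vague wording before you spend a generation on it is a small lint pass. This is only a sketch: the `VAGUE_TERMS` list is a hypothetical starter set drawn from the vague examples in this section, and a plain word match will sometimes flag a term you are using deliberately.

```python
import re

# Hypothetical starter list based on the "vague" examples in this section.
VAGUE_TERMS = ("professional", "nice", "clean", "high quality", "beautiful", "modern")


def find_vague_terms(prompt: str) -> list[str]:
    """Return the vague terms that appear in the prompt as whole words."""
    lowered = prompt.lower()
    return [
        term
        for term in VAGUE_TERMS
        if re.search(rf"\b{re.escape(term)}\b", lowered)
    ]


print(find_vague_terms("Beautiful product shot, nice lighting, high quality"))
# ['nice', 'high quality', 'beautiful']
print(find_vague_terms("Soft key light from 45 degrees, fill card on right"))
# []
```

Treat each hit as a prompt to swap in a concrete photographic term rather than as a hard error.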
Negative prompts: what to exclude
Half the battle in product photography is preventing the AI from adding things you didn't ask for. Models love to invent hands, reflections, extra packaging, watermarks, and busy backgrounds. Negative prompts (where supported) save hours of regeneration.
A baseline negative prompt list for product photography:
"watermark, text, logo, signature, blurry, distorted, deformed, extra limbs, duplicate, low resolution, jpeg artifacts, oversaturated, cluttered background, props, hands holding product, mannequin parts, sticker, price tag"
Reference images beat words almost every time
If your generator supports image-to-image or reference input, use it. A single reference photo communicates more about your brand's aesthetic than 200 words of description ever will. Use words to describe what should change, not what should stay the same.
Workflow: shoot one reference image of your product (even a phone photo on a kitchen counter works), then prompt the model to recompose it with new lighting, background, or context. You'll spend less time describing your product's geometry and more time directing the scene.
Patterns that consistently fail
Some prompt patterns look reasonable but reliably produce garbage. After running thousands of generations across catalogs, these are the ones to avoid:
- Stacking 6+ stylistic adjectives. "Cinematic, moody, dramatic, ethereal, dreamy, atmospheric, premium": the model averages them into mush. Pick one or two anchor terms.
- Asking for specific text or brand names on packaging. Diffusion models still struggle with legible typography. Add real text in post-production, not in the prompt.
- Describing more than one product at once. "A bottle of shampoo and a bar of soap" often produces one warped hybrid. Generate each product separately, then composite.
- Counting beyond 3. "Five lipsticks lined up" usually returns four or six. For multi-product shots, generate the base and add SKUs in compositing.
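Two of these patterns, adjective stacking and large counts, are easy to screen for automatically. The checker below is a rough sketch under stated assumptions: the word lists are hypothetical and incomplete, it warns at three or more stylistic adjectives (following the "pick one or two" advice), and it only recognizes counts written as words so that specs like "85mm" or "1:1" are not misread.

```python
import re

# Hypothetical word lists; extend them for your own catalog.
STYLE_ADJECTIVES = {
    "cinematic", "moody", "dramatic", "ethereal",
    "dreamy", "atmospheric", "premium",
}
NUMBER_WORDS = {
    "four": 4, "five": 5, "six": 6, "seven": 7,
    "eight": 8, "nine": 9, "ten": 10,
}


def check_prompt(prompt: str) -> list[str]:
    """Return warnings for stacked style adjectives and spelled-out counts above 3."""
    words = re.findall(r"[a-z]+", prompt.lower())
    warnings = []

    stacked = [w for w in words if w in STYLE_ADJECTIVES]
    if len(stacked) > 2:
        warnings.append(f"{len(stacked)} stylistic adjectives; keep one or two")

    for word in words:
        if word in NUMBER_WORDS:
            warnings.append(
                f"count of {NUMBER_WORDS[word]} requested; consider compositing"
            )
    return warnings


print(check_prompt("Cinematic, moody, dramatic, dreamy shot of five lipsticks lined up"))
print(check_prompt("A single bifold wallet, soft key light from upper left, 85mm"))
```

The first call reports both problems; the second returns an empty list.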
Iterate on one variable at a time. Lock your subject, material, and camera spec, then sweep through lighting variants. Lock lighting, sweep backgrounds. Treat the prompt like a controlled experiment — you'll converge on a repeatable formula for your brand within 20-30 generations.
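A one-variable sweep like this is simple to script. The following is a minimal sketch, assuming you hold prompt components in an ordered dictionary; the `sweep` function and the component keys are hypothetical names used only for illustration.

```python
def sweep(locked: dict[str, str], field: str, values: list[str]) -> list[str]:
    """Return one prompt per value, changing only `field` and nothing else."""
    if field not in locked:
        raise KeyError(f"unknown prompt component: {field}")
    prompts = []
    for value in values:
        # Overwriting an existing key keeps its position,
        # so the component order of the prompt is preserved.
        components = {**locked, field: value}
        prompts.append(", ".join(components.values()))
    return prompts


base = {
    "subject": "A single ceramic coffee mug",
    "material": "matte black glaze, cylindrical, no handle",
    "camera": "shot at 50mm, slight overhead angle",
    "lighting": "soft diffused light from upper left",
    "background": "warm beige linen surface",
}

lighting_variants = [
    "soft diffused light from upper left",
    "hard side light from the right, crisp shadow",
    "soft overhead light, subtle rim light",
]

for prompt in sweep(base, "lighting", lighting_variants):
    print(prompt)
```

Once a lighting variant wins, write it back into `base` and run the same sweep over `"background"`.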
A reusable template you can adapt today
Save this as a starting block and fill in the variables for any product shoot. It has been tested across apparel, beauty, electronics, and home goods.
[Subject + count], [primary material/finish], [secondary details]. Studio product photography, [angle: three-quarter / front / overhead] at [eye level / slight high angle], [focal length: 50mm / 85mm], sharp focus on [hero detail]. Lighting: [soft / hard] [key light position], [fill or rim notes]. Background: [seamless color / textured surface / contextual scene], [shadow style]. [Aspect ratio], e-commerce catalog style, photorealistic.
Filled-in example for a leather wallet:
"A single bifold wallet, full-grain tan leather with visible stitching, standing upright and slightly open. Studio product photography, three-quarter angle at eye level, 85mm, sharp focus on stitching and grain. Lighting: soft key light from upper left, subtle bounce fill on right, gentle ground shadow. Background: warm cream paper sweep, no props. 1:1 square, e-commerce catalog style, photorealistic."
If you're producing dozens or hundreds of variations from a template like this, platforms like Retouchable handle the prompt-to-asset pipeline so you can focus on the creative inputs rather than wrangling generations one at a time.
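If you would rather script that step yourself, the bracketed template translates directly into a format string. This is a sketch, not a finished pipeline: the field names are hypothetical stand-ins for the bracketed slots, and the check for missing fields is there so an unfilled slot fails loudly instead of reaching the generator.

```python
from string import Formatter

TEMPLATE = (
    "{subject}, {material}, {details}. "
    "Studio product photography, {angle} at {height}, {focal_length}, "
    "sharp focus on {hero_detail}. "
    "Lighting: {key_light}, {fill}. "
    "Background: {background}, {shadow}. "
    "{aspect_ratio}, e-commerce catalog style, photorealistic."
)

# Collect every named slot in the template so missing values can be reported.
REQUIRED = {name for _, name, _, _ in Formatter().parse(TEMPLATE) if name}


def fill_template(**values: str) -> str:
    """Fill the template, raising if any slot is left without a value."""
    missing = sorted(REQUIRED - set(values))
    if missing:
        raise ValueError(f"missing template fields: {', '.join(missing)}")
    return TEMPLATE.format(**values)


wallet = dict(
    subject="A single bifold wallet",
    material="full-grain tan leather with visible stitching",
    details="standing upright and slightly open",
    angle="three-quarter angle",
    height="eye level",
    focal_length="85mm",
    hero_detail="stitching and grain",
    key_light="soft key light from upper left",
    fill="subtle bounce fill on right",
    background="warm cream paper sweep",
    shadow="gentle ground shadow",
    aspect_ratio="1:1 square",
)
print(fill_template(**wallet))
```

Keeping one dictionary per product lets you store the inputs alongside your catalog data and regenerate prompts whenever the template changes.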