DALL-E 3 is an amazing text to image creator, but short text prompts generate inconsistent results. By inconsistent I mean that if you want 30+ AI generated images in the same style, you are highly unlikely to achieve it using short prompts.
The reason is obvious: if the AI prompt engineer doesn’t specify the style (and other details) of the image to be created, DALL-E 3 will randomly select art styles, media, colors, etc.
Example Short DALL-E 3 Prompt
Text to image prompt: “Terrifying ghostly shadow vampire”
The image below shows 4 sets of 4 images (DALL-E 3 on BING Designer tends to generate 4 images each time you click Create). Each set of 4 images shares the same style, but each new generation used a different style. There’s no consistency.
The prompt is only 4 words, mostly descriptive terms. There are no references to Artists, Colors, Art Movements, Styles, Mediums, Techniques, Photography Types, Design Tools, Referenced Communities, Culture, Genre, etc.
We are basically asking for:
Subject: Shadow Vampire
Descriptive Terms: Terrifying, Ghostly
That’s not a lot for the AI to work with, so the AI algorithm chooses the Artists, Colors, Art Movements, Styles, Mediums, Techniques, Photography Types, Design Tools, Referenced Communities, Culture, Genre and possibly other Descriptive Terms itself.
This means each time you generate a new set of DALL-E 3 images, it won’t match the previous sets. If you are just playing around with AI image creation that might be fine, but if you want 30+ images for an eBook (I’ve created two eBooks using AI images, each requiring a few dozen images), you need consistency across the generated images.
I used CLIP Interrogator and CLIP Interrogator 2, image to text tools that guess a prompt from an existing image, to estimate what DALL-E 3 used to generate the 4 sets of images. I’ve included one example from each set of 4 AI images.
I used the two CLIP image to text results to name the images. If I hadn’t told you what I was doing, it would look like I had deliberately generated these specific styles.
DALL-E 3 Poster Art of a Winged Devilman
CLIP Interrogator Guessed Text to Image Prompt
“a silhouette of a demon standing in front of a castle, poster art by Anne Stokes, featured on deviantart, gothic art, gothic, goth, 2d game art”
CLIP Interrogator 2 Guessed Text to Image Prompt
“a bat flying over a castle with a full moon in the background, poster art, inspired by Dan Mumford, shutterstock, full body savage devilman, vampire fashion, winged human, batman t shirt”
DALL-E 3 Photograph of a Man in a Vampire Costume with a Scary Shadow
CLIP Interrogator Guessed Text to Image Prompt
“a man in a dracula costume standing in front of a wall, a character portrait by Dirk Crabeth, trending on deviantart, gothic art, behance hd, antichrist, gothic”
CLIP Interrogator 2 Guessed Text to Image Prompt
“a man dressed as a vampire with a rose in his hand, a picture, shutterstock, gothic art, shadows screaming, wallpaper – 1 0 2 4, count dracula, photo from a promo shoot”
DALL-E 3 Gothic Portrait of a Male Vampire with Red Eyes
CLIP Interrogator Guessed Text to Image Prompt
“a scary looking man with red eyes and a bloody face, a digital rendering by Anne Stokes, deviantart, gothic art, gothic, goth, black background”
CLIP Interrogator 2 Guessed Text to Image Prompt
“a close up of a person’s face with a full moon in the background, a picture, gothic art, long fangs, with long hair and piercing eyes, male vampire, iron maiden”
DALL-E 3 Black and White Photo of a Terrifying Ghostly Woman with Long Claws
CLIP Interrogator Guessed Text to Image Prompt
“a ghostly woman walking down a dark street, concept art by Ben Templesmith, deviantart contest winner, gothic art, creepypasta, lovecraftian, demonic photograph”
CLIP Interrogator 2 Guessed Text to Image Prompt
“a black and white photo of a woman in a ghost costume, a black and white photo, by Darek Zabrocki, pixabay contest winner, long claws, in an alley at night back lit, nightmare render, elden ring monster”
Although CLIP Interrogator and CLIP Interrogator 2 aren’t exactly accurate (if you ran the prompts above, I doubt the results would look like the test images), they still give some indication of the hidden style choices DALL-E 3 made.
Example Long DALL-E 3 Prompt
One more set of test images, but this time we’ll go with a longer prompt and select various settings.
Subject: Shadow Vampire
Descriptive Terms: Terrifying, Ghostly, Horror
Artists: Greg Rutkowski, Tim Burton
Colors: contrasting colors
Art Movements: action painting
Styles: photorealism
Mediums/Techniques: ink drawing, stipple
Photography Types: 1900s photograph
Design Tools: Unreal Engine 5
Referenced Communities: deviantart
Culture/Genre: dark academia
Turned into a DALL-E 3 text to image prompt:
“Terrifying, Ghostly, Shadow Vampire by Greg Rutkowski, Tim Burton. Horror, contrasting colors, action painting, photorealism, ink drawing, stipple, 1900s photograph, Unreal Engine 5, deviantart, dark academia”
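If you plan to reuse this structure for dozens of images, it helps to keep the settings in one place and assemble the prompt from them. Below is a minimal Python sketch that rebuilds the prompt above from the category list. The function `build_prompt` and its parameter names are my own illustration. They are not part of DALL-E 3 or BING Designer, and the code only builds the text. You still paste the result into the prompt box yourself.

```python
# A minimal sketch: assemble a DALL-E 3 prompt from the style categories above.
# build_prompt and its parameters are hypothetical names used for illustration.

def build_prompt(subject: str,
                 lead_terms: list[str],
                 artists: list[str],
                 style_terms: list[str]) -> str:
    """Return 'lead terms, subject by artists. style terms'."""
    # Descriptive terms go in front of the subject, comma separated
    head = ", ".join(lead_terms + [subject])
    if artists:
        head += " by " + ", ".join(artists)
    # Everything else (colors, movements, mediums, etc.) follows the full stop
    return (head + ". " + ", ".join(style_terms)) if style_terms else head


prompt = build_prompt(
    subject="Shadow Vampire",
    lead_terms=["Terrifying", "Ghostly"],
    artists=["Greg Rutkowski", "Tim Burton"],
    style_terms=[
        "Horror",                  # remaining descriptive term
        "contrasting colors",      # Colors
        "action painting",         # Art Movements
        "photorealism",            # Styles
        "ink drawing", "stipple",  # Mediums/Techniques
        "1900s photograph",        # Photography Types
        "Unreal Engine 5",         # Design Tools
        "deviantart",              # Referenced Communities
        "dark academia",           # Culture/Genre
    ],
)
print(prompt)
```

Keeping each category as its own list entry makes it easy to swap one setting (for example, a different artist) and leave the others untouched.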
Some of the terms are contradictory, but we will see what DALL-E 3 makes of this.
BING Designer generated a content warning; I forgot they have blocked Tim Burton! Yes, you can’t run text to image prompts with the phrase “Tim Burton” on BING Designer!
The truly annoying part is that there’s no information about which word or phrase in the prompt was the problem. That means you have to guess which part of a 25+ word prompt might have triggered the woke programmers working for Microsoft!
Prompts containing “Donald Trump” or “Joe Biden”, for example, generate the warning below:
Content warning
This prompt has been blocked. Our system automatically flagged this prompt because it may conflict with our content policy. More policy violations may lead to automatic suspension of your access.
If you think this is a mistake, please report it to help us improve.
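Because the warning doesn’t name the offending phrase, and repeated violations risk suspension, one option is to keep your own list of phrases you’ve already seen blocked. You can then check each prompt against that list before submitting it. Below is a minimal Python sketch of that idea. The list contains only the phrases mentioned in this article. It is hand-kept and hypothetical, because Microsoft doesn’t publish its actual blocklist.

```python
# A minimal sketch: check a prompt against a personal, hand-kept list of phrases
# that BING Designer has blocked before. This list is hypothetical; it is NOT
# Microsoft's real blocklist, which is not published.

KNOWN_BLOCKED = ("Tim Burton", "Donald Trump", "Joe Biden")


def find_known_blocked(prompt: str,
                       blocked: tuple[str, ...] = KNOWN_BLOCKED) -> list[str]:
    """Return every known-blocked phrase found in the prompt (case-insensitive)."""
    lowered = prompt.lower()
    # Simple substring match, so a phrase is also found inside longer text
    return [phrase for phrase in blocked if phrase.lower() in lowered]


prompt = ("Terrifying, Ghostly, Shadow Vampire by Greg Rutkowski, Tim Burton. "
          "Horror, contrasting colors, action painting")
hits = find_known_blocked(prompt)
if hits:
    print("Remove before submitting:", ", ".join(hits))
else:
    print("No known blocked phrases found")
```

This only catches phrases you have already run into. It can’t predict new blocks, and it can’t predict the “Unsafe image content detected” result, which is applied to the generated images rather than the prompt text.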
Let’s try again without Tim Burton:
DALL-E 3 longer descriptive text to image prompt: “Terrifying, Ghostly, Shadow Vampire by Greg Rutkowski. Horror, contrasting colors, action painting, photorealism, ink drawing, stipple, 1900s photograph, Unreal Engine 5, deviantart, dark academia”
BING Designer did not like this prompt, probably because it’s horror and BING Designer blocks a lot of NSFW-type content (blood and gore, for example). On one run it returned an “Unsafe image content detected” message with a link to its content guidelines. Presumably all 4 images were deemed unsafe!
I ran the longer prompt 7 times and got just 12 images. At 4 images per run that’s 28 generated images, so I assume 16 of the 28 were deemed unsafe and not shown. This makes working with DALL-E 3 via BING Designer truly frustrating.
You can see from the image above that the AI image results are a lot more consistent.
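For an eBook, the practical takeaway is to keep the style part of the prompt fixed and change only the subject from image to image. Below is a minimal Python sketch that attaches one fixed style description to a list of subjects. The style text is the Tim Burton-free prompt used above. The second subject is a hypothetical example, and `batch_prompts` is an illustrative name. A shared style description should improve consistency, but as the test runs show, it doesn’t guarantee identical styles.

```python
# A minimal sketch: build a batch of prompts (e.g. for an eBook) that share one
# fixed style description, so only the subject changes between images.
# batch_prompts and the second subject are hypothetical examples.

STYLE = ("by Greg Rutkowski. Horror, contrasting colors, action painting, "
         "photorealism, ink drawing, stipple, 1900s photograph, "
         "Unreal Engine 5, deviantart, dark academia")


def batch_prompts(subjects: list[str], style: str = STYLE) -> list[str]:
    """Attach the same style description to every subject."""
    return [f"{subject} {style}" for subject in subjects]


prompts = batch_prompts([
    "Terrifying, Ghostly, Shadow Vampire",
    "Terrifying, Ghostly, Haunted Mansion",  # hypothetical second subject
])
for p in prompts:
    print(p)
```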
I ran the image below through CLIP Interrogator and CLIP Interrogator 2.
CLIP Interrogator Guessed Text to Image Prompt
“a drawing of a man in a top hat and trench coat, a comic book panel by Todd Lockwood, featured on deviantart, gothic art, creepypasta, official art, darksynth”
CLIP Interrogator 2 Guessed Text to Image Prompt
“a black and white drawing of a man in a top hat, concept art, inspired by J. J. Grandville, gothic art, zombie reaching out of a grave, dark wallpaper, martin ansin, grinning sinisterly”
The results don’t have much in common with the text to image prompt I actually used. Recall the actual prompt was: “Terrifying, Ghostly, Shadow Vampire by Greg Rutkowski. Horror, contrasting colors, action painting, photorealism, ink drawing, stipple, 1900s photograph, Unreal Engine 5, deviantart, dark academia”
CLIP Interrogator only managed to determine one term accurately: deviantart.
I had no idea what creepypasta was referring to. From Wikipedia: “A creepypasta is a horror-related legend which has been shared around the Internet. The term creepypasta has since become a catch-all term for any horror content posted onto the Internet.”
I ran the CLIP Interrogator 2 image to text result through DALL-E 3 as a prompt, and this is the result.
DALL-E 3 prompt: “a black and white drawing of a man in a top hat, concept art, inspired by J. J. Grandville, gothic art, zombie reaching out of a grave, dark wallpaper, martin ansin, grinning sinisterly”
Although the subjects are similar (a man in a top hat, black and white drawing), the style isn’t like the images generated with the original prompt. Fun to play around with, though.