DALL-E 3 and Stable Diffusion XL 1.0 (SDXL 1.0) are two popular choices for AI art generation, but which one is the best?
They both have their pros and cons, for example I create almost all my AI images on NightCafe Studio using SDXL 1.0 and find the Stable Diffusion XL 1.0 models are easier to use when you are after consistency in style. Of course there’s a steep learning curve with NightCafe Studio due to the number of advanced options, but even for a newbie it’s usable.
In comparison DALL-E 3 has no advanced options, but tends to produce better looking images; it’s much easier to generate quality images with DALL-E 3 vs SDXL 1.0.
Which is best depends very much on what your goals are.
A Simple AI One Word Prompt Test
DALL-E 3 “Sinister” Text to Image Prompt Test
Using BING Designer (which uses DALL-E 3) I added the single word “Sinister” and ran the text to image prompt generator. Note: there are no advanced option with BING Designer.
Each text to image prompt run generates up to 4 images, (BING Designer tends to generate 4 DALL-E 3 images per attempt), so I had 16 AI generated images in total.
One image from each batch is included below.
Since we only used a single word, Sinister, and BING Designer has no advanced options, the images generated was mostly down to how the DALL-E 3 AI interpreted that single word. Which randomly selected settings did it use?
The first test result of a black and white photograph of a scary house and a shadow is a really good result. IMHO it’s certainly sinister. What do you think?
The second and fourth results are similar, apparently a wood in moonlight where the trees have no leaves is sinister.
The third result is a bit rubbish for sinister. Personally I don’t find the swirling ink particularly Sinister. Do you?
Overall DALL-E 3 gave some good results. I liked the scary house and the shadow the most.
SDXL 1.0 “Sinister” Text to Image Prompt Test
Using NightCafe Studio Creator which uses SDXL 1.0 I added the single word “Sinister” and ran the text to image prompt generator four times.
Note: NightCafe has a lot of advanced options, for this test I turned OFF Safe Mode (when on, Safe Mode avoids creating NSFW images which could have reduced how scary the images were) and turned OFF the Default Negative prompts (useful for reducing deformed hands, feet etc… and having signatures and text when you don’t want them: see later) and kept the other settings at their defaults.
All 4 Stable Diffusion XL 1.0 image results are below.
All four Stable Diffusion XL 1.0 generated images can be described as sinister. As you can see the first three results include text. With a one word text to image prompt we are leaving the decision of other settings down to the AI and model chosen.
All four generated images have a sinister feel to them IMHO. What do you think?
Result one looks like a cross between a wanted poster and a page from a graphic novel.
Results two and three are movie poster like.
Result four is the only one not including text and is a weird looking sinister monster.
If it wasn’t for the fact SDXL 1.0 is awful at generating text within images (it almost always misspells words: Stable diffusion 1.0 certainly won’t be winning any spelling bees) I might have gone for image 3. With all the misspellings my favorite is image four. What do you think?
Stable Diffusion XL 1.0 Negative Prompts
DALL-E 3 doesn’t have negative prompts. With DALLE-3 if you don’t want a “Cat with a hat” when generating cat images, you have to specifically state “no hats”. With Stable Diffusion XL 1.0 there are negative prompts, so to avoid cats with hats, adding “hat” or “hats” to the negative prompt will reduce the chances of hats being generated.
With NightCafe SDXL 1.0 I could run the Sinister text to image prompt again using Negative prompts. The Negative prompt option allows the inclusion of words NOT to be included. If we only wanted AI generated images without text (no movie posters), we might have negative prompts like these:
Reduce the chance of the AI generated image having text and artist signatures.
“text, signature”
Reduce the chance of the AI generated image having blood and bones.
“blood, bones”
Reduce the chance of the AI generated image having trees and moonlight.
“trees, moonlight”
Reduce the chance of the AI generated image including a beautiful woman.
“beautiful, woman”
Or you could have all the above and more.
“text, signature, blood, bones, trees, moonlight, beautiful, woman, boring, sepia, cute”
Remember everything added to the Negative prompt is what you do NOT want. If you added “ugly” to the negative prompt it’s similar to adding “beautiful” to the main prompt.
Just for fun I ran the Sinister prompt with the inclusion of the negative prompt: “text, signature, blood, bones, trees, moonlight, beautiful, woman, boring, sepia, cute”. The result was, well, terrible!
Despite adding “text” to the negative prompt the generated image still included text. I guess there wasn’t enough in the positive prompt (just one word) and too much going on in the negative prompt. I rarely run single word prompts, so this probably wouldn’t be a problem if the positive prompt was of a reasonable size.
Also don’t forget, AI image generators are new tech and they aren’t perfect (yet).