DALL-E 3 vs SDXL 1.0 Prompt Interpretation

SDXL 1.0 had trouble interpreting a text-to-image prompt for a tornado beast with sharp teeth and claws, while DALL-E 3 had no problems.

Text to Image Prompt: “A Tornado beast. The tornado is a beast made from black clouds, a black tornado beast with claws and sharp teeth, fills the entire space, feelings of fear, feeling of horror, a devastated city and people running away in fear. Epic cinematic brilliant stunning intricate meticulously detailed dramatic atmospheric maximalist digital matte painting.”

With Stable Diffusion, the settings were left at their defaults, with no SDXL model selected and no negative prompt.
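For anyone who wants to try a similar run in code, here is a minimal sketch. It assumes the Hugging Face diffusers library and the public "stabilityai/stable-diffusion-xl-base-1.0" checkpoint, neither of which is named in this post, so it is not necessarily the setup used here. The function names are made up for illustration, and the library's defaults may differ from the defaults of the tool used for the images in this post.

```python
# Sketch only: assumes the diffusers library and a CUDA GPU.
# The tool actually used for this comparison is not stated in the post.

PROMPT = (
    "A Tornado beast. The tornado is a beast made from black clouds, "
    "a black tornado beast with claws and sharp teeth, fills the entire space, "
    "feelings of fear, feeling of horror, a devastated city and people running "
    "away in fear. Epic cinematic brilliant stunning intricate meticulously "
    "detailed dramatic atmospheric maximalist digital matte painting."
)


def build_generation_kwargs(prompt, num_images=4):
    """Hypothetical helper: only the prompt and image count are set.

    No negative prompt is passed, and steps, guidance scale and image size
    are left out so the pipeline falls back to its own defaults.
    """
    return {
        "prompt": prompt,
        "negative_prompt": None,
        "num_images_per_prompt": num_images,
    }


def generate_images(kwargs):
    """Hypothetical helper: load SDXL 1.0 base and generate the images."""
    # Heavy imports are kept inside the function so the settings above
    # can be inspected without a GPU or a model download.
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,
    )
    pipe = pipe.to("cuda")
    return pipe(**kwargs).images  # list of PIL images
```

Calling generate_images(build_generation_kwargs(PROMPT)) should return four images to compare against the prompt. Generating four at once needs a fair amount of GPU memory, so a smaller num_images may be more practical on modest hardware.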

As the image shows, SDXL 1.0 failed to interpret the prompt correctly. Stable Diffusion XL basically generated a city being hit by a devastating tornado. They are good creations, but NOT what the prompt asked for. There’s very little indication of a “Tornado beast” or “a black tornado beast with claws and sharp teeth”.

In comparison, DALL-E 3 interpreted the prompt accurately. All 4 DALL-E creations depicted a tornado beast with sharp teeth and claws. DALL-E also better depicted people running away in fear, in 3 of the 4 images, whilst SDXL only had 2 images with people, and it didn’t depict them “running away in fear”.

DALL-E 3 succeeded where SDXL 1.0 failed.

