Yeah, that's pretty normal for image generation. The model has no real understanding of language. All it knows is, "This shape is often near this shape." Honestly, it's surprising that nearly all the shapes in these are actually letters, even if they're in the wrong order. You'll often see half of a letter missing, two letters merged, or other weird letter-like shapes.
Why not just have it generate the image and add the text in yourself? That's far simpler than creating the entire image yourself; the text alone is maybe an hour, tops, in Photoshop.