Hi there,
I'm curious to know other people's approach to working with Stable Diffusion. I'm just a hobbyist myself and work on creating images to illustrate the fictional worlds I'm building for fun.
However, I find that getting very specific images (that are still visually pleasing) is really difficult.
So, how do you approach it? Are you trying to "force" your imagined picture out by making use of ControlNet, inpainting and img2img? I find that this approach usually leads to the exact image composition I'm after but yields completely ugly pictures. Even after hours of inpainting, the best I can get to is "sorta OK-ish", surely far away from "stunning". I've played around with ControlNet for dozens of hours already, experimenting with multi-control, weighting, ControlNet in only parts of the image, different starting and ending steps, ... but it's only kinda getting there.
Now, as opposed to that, a few prompts can generate really stunning images, but they will usually only vaguely resemble what I had in mind (if it's anything other than a person in a generic pose). Composing an image with prompts alone is by no means easier or faster than the more direct approach mentioned above. And I always seem to arrive at a point where the "prompt breaks". I don't know how to describe this, but in my experience, when I get too specific in prompting, the resulting image suddenly becomes ugly (like architecture that is described too closely in the prompt suddenly having all the wrong angles).
So, how do you approach image generation? Do you give it a few prompts and see what SD can spit out with that, taking delight in the unexpected results and exploring visual styles more than specific image compositions? Or are you stubborn like me and want to use it as a tool for illustrating imagination, which it doesn't seem nearly as good at as the former?
I usually let it do more of what it wants, in the interest of good-looking outputs. For more complex things I use a combination of the things you describe - prompt and ControlNet tweaks - and img2img. You can let the original generation be ugly and overbaked, but as long as it has the right composition, you can then send it through img2img with a reduced prompt based more around style than composition. Or if you're really having trouble getting the composition you want, you can even make a sketch or rough edit of it, then run that through img2img.
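To make the two-pass idea concrete, here is a rough sketch of what the second, style-only pass could look like as a request body. It assumes you run the AUTOMATIC1111 web UI with its API enabled and that the img2img endpoint takes the field names used below (`init_images`, `denoising_strength`, `cfg_scale`); check the API docs page of your own install before relying on them. The function name and the default values are made up for illustration, not recommended settings.

```python
import base64


def build_restyle_payload(image_bytes, style_prompt,
                          denoising_strength=0.5, cfg_scale=6.0, seed=-1):
    """Build a JSON-ready body for a style-focused img2img pass.

    Hypothetical helper: the field names are assumed to match the
    AUTOMATIC1111 web UI img2img API.
    """
    if not 0.0 <= denoising_strength <= 1.0:
        raise ValueError("denoising_strength must be between 0 and 1")
    return {
        # The first-pass image (ugly but well composed), base64-encoded.
        "init_images": [base64.b64encode(image_bytes).decode("ascii")],
        # Reduced prompt: style words only, no composition details.
        "prompt": style_prompt,
        # Lower values keep more of the original composition.
        "denoising_strength": denoising_strength,
        "cfg_scale": cfg_scale,
        "seed": seed,
    }


# Placeholder bytes stand in for a real PNG read from disk.
fake_image = b"\x89PNG placeholder bytes"
payload = build_restyle_payload(
    fake_image, "oil painting, soft light, muted palette"
)
print(sorted(payload.keys()))
```

The dict would then be sent as JSON to the img2img endpoint. The knob to play with is the denoising strength: too low and the overbaked look survives, too high and the composition drifts.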
You're probably aware, but it's worth mentioning: CFG scale and the model you use have a huge impact on overbaking (i.e. the ugly over-contrasted look with weird artifacts that happens when the prompt is too detailed). Any model trained to do something in particular will be much more prone to this. Deliberate v2 is my preferred model because of how flexible it is; it takes a lot to get overbaked outputs. Also, lowering the CFG reduces overbaking risk a lot, and while it does add more 'randomness', it can sometimes be worth it. It's all about balancing it with your prompt.
Protip - If an image is good but not quite perfect, stick to the same seed and use the X/Y script to run the image lots of times at different CFG levels.
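If you'd rather script that sweep than click through the X/Y script, it could look roughly like this. This is only a sketch: `cfg_sweep` is a hypothetical helper, and the payload field names (`seed`, `cfg_scale`, `steps`) assume the AUTOMATIC1111 web UI txt2img API, where a seed of -1 means "pick a random one". The prompt and the CFG values are just examples.

```python
def cfg_sweep(base_payload, cfg_values):
    """Return one payload per CFG value, all sharing the same fixed seed."""
    if base_payload.get("seed", -1) < 0:
        # A negative seed would be re-rolled on every run, so the images
        # would differ by more than just the CFG scale.
        raise ValueError("fix the seed before sweeping CFG")
    return [{**base_payload, "cfg_scale": cfg} for cfg in cfg_values]


base = {
    "prompt": "castle on a cliff, oil painting",
    "seed": 1234567,  # the seed of the image that was almost right
    "steps": 30,
    "cfg_scale": 7.0,
}
variants = cfg_sweep(base, [3.0, 4.5, 6.0, 7.5, 9.0])
for v in variants:
    print(v["seed"], v["cfg_scale"])
```

Each payload differs only in CFG, so any change between the resulting images comes from that one setting.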
A lot of the time I try to just let images come out as the AI imagines them - just running img2img prompts, often in big batches, then picking the pictures that best reflect what I wanted.
But I do also have another process when I want something specific, which involves doing img2img to generate a pose and general composition, turning that image into both a ControlNet input (for composition) and a Segment Anything mask (for Latent Couple), and then respinning the same image with the same seed with those new constraints. When you run with the ControlNet and the mask, you can turn the CFG way down (3 or 4) but keep the coherence in the image, so you get much more naturalistic outputs.
This is also a good way to work with LoRAs that are either poorly made or don't work well together - the initial output might look really burned, but when you have the composition locked in you can run the LoRAs at much lower strength and with lower CFG so they sit together better.
I’ve found that AI generation is good if you have a vague idea of what you want, but it can be frustrating if you have something specific in mind. I try to approach generation with that in mind: I’ll plan out the large points of what I want but keep an open mind on the finer details.
If I wanted to generate a more specific image, I would first try to do a sketch in another program and then feed that into ControlNet. I haven’t actually done this, though, since I’m usually able to get something close enough that I can work with.
A lot of trial and error. I don't jump around from model to model, though. I tend to stick to one, unless I'm trying to get a drastically different style (like realism vs anime). But mostly it's just starting with a basic prompt, adding in words or phrases for things I want to add, and a lot of renders till I get something I like.
I usually set out with a vague goal in mind and play around from there. I'm often inspired by something seen on Civitai, Reddit, etc., and then explore variations. Basing it on others' prompts is very useful for learning, as you will quickly know if you have the "knobs" in sensible positions to start with. If it's nowhere close to the reference output, you can figure out where you're going wrong.
Then, once you hit something that inspires you, dial in the prompt while keeping the seed the same. Dial in the CFG and clip skip by doing an X/Y matrix and see where the interesting areas are. Once all of them are nailed down, let it iterate over seeds to your heart's content.
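For anyone who wants to do that outside the UI, the order of operations can be sketched like this: a CFG by clip skip grid at one fixed seed first, then many seeds with the winning settings. It's a sketch under assumptions: the helper names are hypothetical, and it assumes the AUTOMATIC1111 web UI API, where clip skip is passed through `override_settings` as `CLIP_stop_at_last_layers`. Verify that against your own install.

```python
from itertools import product


def xy_grid(base_payload, cfg_values, clip_skips):
    """One payload per (CFG, clip skip) pair, seed unchanged."""
    grid = []
    for cfg, clip_skip in product(cfg_values, clip_skips):
        grid.append({
            **base_payload,
            "cfg_scale": cfg,
            # Assumed setting name for clip skip in the web UI.
            "override_settings": {"CLIP_stop_at_last_layers": clip_skip},
        })
    return grid


def seed_run(chosen_payload, seeds):
    """Keep the dialed-in settings and vary only the seed."""
    return [{**chosen_payload, "seed": s} for s in seeds]


base = {"prompt": "ruined temple in a jungle", "seed": 42, "steps": 30}

# Step 1: same seed, explore CFG x clip skip.
grid = xy_grid(base, cfg_values=[4.0, 6.0, 8.0], clip_skips=[1, 2])

# Step 2: pretend the second cell looked best, then iterate over seeds.
chosen = grid[1]
batch = seed_run(chosen, range(100, 105))
print(len(grid), len(batch))
```

Keeping the seed fixed in step 1 means the grid shows only what the settings do. Step 2 then varies only the seed.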