It’s probably Stable Diffusion. I use ComfyUI since you can watch the sausage get made, but there are other UIs like Automatic1111. There’s a ControlNet, originally made as a QR pattern beautifier, that takes a two-tone black-and-white “guide” image, but you can get it to follow any image you feed it, such as a meme edited to be black and white, or text like “GAY SEX.”
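Making a guide is just thresholding. A minimal sketch with Pillow (filenames and the 128 cutoff are placeholders, tune to taste):

```python
# Turn any image (a meme, rendered text, etc.) into the kind of
# two-tone black-and-white guide the QR-style ControlNet expects.
from PIL import Image

img = Image.open("meme.png").convert("L")           # grayscale
guide = img.point(lambda p: 255 if p > 128 else 0)  # hard threshold to two tones
guide = guide.convert("RGB")                        # ControlNet inputs are RGB
guide.save("guide.png")
```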
This is done by combining a diffusion model with a ControlNet. As long as you have a decently modern Nvidia GPU and some familiarity with Python and PyTorch, it's relatively simple to create your own model.
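If you want to skip the UIs, here's roughly what the wiring looks like with the diffusers library; the model IDs, filenames, and conditioning scale are just examples, swap in whichever illusion/QR ControlNet you actually use:

```python
# Minimal sketch: Stable Diffusion 1.5 + a QR-style ControlNet via diffusers.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from PIL import Image

controlnet = ControlNetModel.from_pretrained(
    "monster-labs/control_v1p_sd15_qrcode_monster", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

guide = Image.open("guide.png")  # the two-tone image from above
out = pipe(
    "a sprawling medieval city, highly detailed",
    image=guide,
    controlnet_conditioning_scale=1.3,  # how strongly the guide is enforced
    num_inference_steps=30,
).images[0]
out.save("illusion.png")
```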
I implemented this paper back in March. It's as simple as it is brilliant. By using methods originally intended to adapt large pre-trained language models to a specific application, the authors created a new model architecture that can better control the output of a diffusion model.
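The core trick is adapter-style: freeze the pretrained U-Net, train a copy of its encoder blocks, and couple the two through zero-initialized convolutions so training starts as an exact no-op. A toy PyTorch sketch of one block (module names are mine, not the authors' code):

```python
import copy
import torch
import torch.nn as nn

def zero_conv(channels):
    # 1x1 conv initialized to all zeros, so it contributes nothing at step 0
    conv = nn.Conv2d(channels, channels, kernel_size=1)
    nn.init.zeros_(conv.weight)
    nn.init.zeros_(conv.bias)
    return conv

class ControlledBlock(nn.Module):
    def __init__(self, block, channels):
        super().__init__()
        self.copy = copy.deepcopy(block)   # trainable copy of the block
        self.frozen = block
        for p in self.frozen.parameters():
            p.requires_grad_(False)        # original weights stay locked
        self.zero_in = zero_conv(channels)   # the condition enters here
        self.zero_out = zero_conv(channels)  # the correction leaves here

    def forward(self, x, cond):
        # At init both zero convs output 0, so this reduces to self.frozen(x);
        # the control signal is learned gradually without wrecking the base model.
        y = self.frozen(x)
        return y + self.zero_out(self.copy(x + self.zero_in(cond)))
```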
Fun fact: some people have a nature fetish and will dig holes in the ground or bore holes in trees so they can literally fuck the earth. I’m not judging, but now you have to share the burden of this knowledge.
So, use something like a hybrid illusion generator to make a pair of masks (one for the low frequencies, one for the highs) and provide a different prompt for each mask? https://charliecm.github.io/hybrid-image/
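For reference, the frequency split that linked generator does is just low-pass of one image plus high-pass of another; a quick sketch (filenames and blur radius are placeholders):

```python
# Hybrid image: from far away you see far_image, up close you see near_image.
import numpy as np
from PIL import Image, ImageFilter

def load_gray(path, size=(512, 512)):
    return np.asarray(Image.open(path).convert("L").resize(size), dtype=np.float32)

def blur(a, radius):
    img = Image.fromarray(a.astype(np.uint8)).filter(ImageFilter.GaussianBlur(radius))
    return np.asarray(img, dtype=np.float32)

low = load_gray("far_image.png")
high = load_gray("near_image.png")

low_pass = blur(low, 8)           # keep only coarse structure
high_pass = high - blur(high, 8)  # keep only fine detail
hybrid = np.clip(low_pass + high_pass, 0, 255).astype(np.uint8)
Image.fromarray(hybrid).save("hybrid.png")
```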