i feel kinda conflicted. on one hand i don't want AI Corps to make money off of others' work, especially given that there is no attribution with image/language models. On the other hand, I believe no one has real ownership over infinitely reproducible digital media. It's why NFT bros were laughed at with copy paste.
Obviously AI models don't have any real creativity like humans do. But don't humans make art by combining past experiences? If you look at an art, it will likely end up influencing the art you make. So is that really 'stealing'?
You can see that it's not just artists but also record publishers who are suing AI Corps. They want to make AI Slop themselves but don't want others to have it.
The discourse around AI art is entirely too property brained* to ever arrive at an actual solution**, but "let's actively commit industrial sabotage by feeding the AI poisonous nonsense that breaks it" is about as good a material praxis as is possible right now. If it weren't for the material ramifications of giving everyone a magic slop generator that would actually be pretty good, except all that does in hellworld is enable grifters to make endless seas of low-effort slop to try to grift some money while businesses start cannibalizing themselves harder to replace support staff with dangerously wrong chatbots and artists with empty slop generators.
It's basically the same old conflict of the new automated machine sucking absolute shit in every practical way except for scalability and cost-per-unit. If people could throw their literal shoes into the works to break those, the least people can do now is throw a hypothetical digital shoe into its digital brain.
* It genuinely doesn't matter past the most immediate short term if corporations get their unlimited training data or have to build stables of training data that they own completely, because either way they get their proprietary slop generator and the harmful effects of AI generation continue unimpeded.
** AI generation needs to be a poison pill for an entire work's ownability regardless of the ownership status of the generator's training data, and it needs to be an exacerbating factor when used for spam or fraud.
Well, training an AI with copyrighted works is pretty much the same as exposing people to art as you say. The issue is not that it violates copyright, which it doesn't, but that artists don't want their work to be used to train what is effectively a competitor.
On the other hand, I believe no one has real ownership over infinitely reproducible digital media. It's why NFT bros were laughed at with copy paste.
If someone made it with their hands, they can do whatever they want with it. If they want to make a special artwork for a friend, they have every right to get upset if it gets sold and becomes public. I don’t care about intellectual property laws. But people still have a right to do whatever they want with their work.
Anyway, AI art is a corporate tool now. It doesn’t matter if some random guy can produce original artwork or whatever. The point is that companies will still profit billions with this, and if it means restricting some random layman from creating art to preserve some control over your livelihood - which is quite measly compared to a traditional job - then so be it.
It's not that easy, since even if the neural network is trained to recognize poisoned images, you would need to remove the poisoned data from the image to be able to properly categorize it. Without the original nonpoisoned image or human intervention it's going to be exceedingly hard.
This is going to be an arms race, but luckily the AI has to find a few correct answers from a large pool of possibilities, whereas the poison has to just not produce the correct ones. This combined with the effort to retrain the models every time a new version of the poison pops up is going to keep the balance on the side of the artists at least for a while.
yeah but this is only going to work if every digital artist uses it. The amount of clean data will swamp this 'poisoned' (read: mislabeled) data and make it irrelevant. Honestly there is probably already more mislabeled data out there than all the artists who ever pay for this would ever produce.
This seems like a 'take advantage of scared artists' grift to me, the artist's version of one of those little boxes you plug in to protect your house from 5G.
It can at least protect individual artists from having their future work made into a LoRA, which is happening to basically every NSFW artist in existence at the moment.
I wouldn't be confident about that. Usually people training a LoRA will be training the text encoder as well as the U-Net that does the actual diffusion process. If you pass the model images that visually look like cats, are labeled as "a picture of a cat", but that the text encoder is aligned towards treating as "a picture of a dog" (the part that Nightshade does), you would in theory be reinforcing what pictures of cats look like to the text encoder, and it would end up moving the vectors of "picture of a cat" and "picture of a dog" to where they are well clear of each other. Nightshade essentially relies on being able to line up the U-Net to the wrong spots on the text encoder, which shouldn't happen if the text encoder is allowed to move as well.
Alright, Hexbear resident AI poster here, and person who read the paper:
This works by attacking the part of a model that extracts features (think high-level concepts you can describe with words) from images. For Stable Diffusion, this is CLIP (v1.5), OpenCLIP (v2.1), or both (XL, for some reason), but other models can use different ones, like DeepFloyd IF using T5 (which was tested in this paper). It takes a source image and a generated image whose features are very different from the source's. It extracts the features from that second image, then perturbs the first image with the minimum alterations necessary so that it still looks close to the original to a person, while the feature extractor thinks it looks like something else.
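To make the mechanism above a bit more concrete, here is a toy sketch of a feature-space perturbation. Everything in it is an assumption for illustration: the "feature extractor" is just a random linear map `W` standing in for a real encoder, the "images" are short lists of numbers, and the per-pixel budget `eps` is a simple stand-in for whatever perceptual constraint the paper actually uses. It is not the authors' procedure, only the general idea of nudging an image's features toward a different target while keeping the pixel changes small.

```python
import random

def features(W, x):
    """Toy linear 'feature extractor' f(x) = W x (hypothetical stand-in for a real encoder)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def perturb(W, source, target_feat, eps=0.05, lr=0.005, steps=200):
    """Projected gradient descent: pull the features of source + delta toward
    target_feat while keeping every pixel change inside [-eps, eps]."""
    delta = [0.0] * len(source)
    for _ in range(steps):
        x = [s + d for s, d in zip(source, delta)]
        residual = [f - t for f, t in zip(features(W, x), target_feat)]
        # Gradient of ||W x - t||^2 with respect to x is 2 * W^T * residual.
        grad = [2 * sum(W[i][j] * residual[i] for i in range(len(W)))
                for j in range(len(x))]
        # Gradient step, then clip back into the allowed per-pixel budget.
        delta = [max(-eps, min(eps, d - lr * g)) for d, g in zip(delta, grad)]
    return [s + d for s, d in zip(source, delta)]

rng = random.Random(0)
n_pix, n_feat = 16, 4
W = [[rng.uniform(-1, 1) for _ in range(n_pix)] for _ in range(n_feat)]
source = [rng.random() for _ in range(n_pix)]   # the image the artist posts
target = [rng.random() for _ in range(n_pix)]   # an image of the "wrong" concept
target_feat = features(W, target)

poisoned = perturb(W, source, target_feat)
before = sq_dist(features(W, source), target_feat)
after = sq_dist(features(W, poisoned), target_feat)
max_change = max(abs(p - s) for p, s in zip(poisoned, source))
print(f"feature distance to target: {before:.4f} -> {after:.4f}")
print(f"largest pixel change: {max_change:.4f}")
```

The point of the toy is only that the pixel changes stay within a small budget while the features move toward the target. A real encoder is nonlinear and much higher-dimensional, so how visible the resulting noise is will differ.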
The resulting image will look like it has artifacting on it, so it should be noticeable to anyone who is looking for adversarial noise manually. On larger datasets, nobody has time for that. It may ruin the look of images that are supposed to be flat colored (while also being easier to remove from those images by thresholding the image). This was a very common complaint with Glaze, and it seems that things have not improved on that front much.
As for how effectively this can be filtered or counteracted, I have some doubts about what the paper says about countermeasures. A lot of these are going to be speculation until they actually release a model, because currently nobody is able to generate Nightshade poisoned images to test with besides the authors. One of the methods they tested is checking CLIP similarity scores between captions and images in the dataset -- this makes the most sense as a countermeasure since it is the vector of the attack and is already something that is done to filter out poor-quality matches. They compared an attack where the wrong captions are given ("dirty-label") to one where they use their method. They claim that their CLIP filtering on the dirty label attack has 89% recall with a 10% false positive rate with the control data being a clean LAION dataset. I have worked with LAION. I filter and deduplicate any data that I pull before actually using it. I can say, from experience, that that 10% is most likely not a false positive rate -- LAION contains a lot of low quality matches, and it contains a lot of images that have been replaced with placeholders too. The threshold that I used last ended up dropping about 25% of the dataset. So when they say 47% recall and 10% FPR for doing the same thing to filter Nightshade images, I am inclined to believe they used a threshold that is too low. Notably, they do not disclose the threshold used, and clearly only tested one.
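To illustrate why a single undisclosed threshold is a problem, here is a minimal sketch of how recall and false positive rate both move with the cutoff on a caption-image similarity score. The scores below are made up for the example (they are not from LAION, CLIP, or the paper), and `recall_and_fpr` is a hypothetical helper, not anything from the authors' code.

```python
def recall_and_fpr(clean_scores, poisoned_scores, threshold):
    """Flag every sample whose caption-image similarity is below threshold.
    Returns (recall on poisoned samples, false positive rate on clean samples)."""
    flagged_poisoned = sum(s < threshold for s in poisoned_scores)
    flagged_clean = sum(s < threshold for s in clean_scores)
    return (flagged_poisoned / len(poisoned_scores),
            flagged_clean / len(clean_scores))

# Synthetic similarity scores, purely illustrative.
clean = [0.35, 0.31, 0.28, 0.33, 0.22, 0.30, 0.18, 0.34, 0.29, 0.32]
poisoned = [0.24, 0.27, 0.21, 0.30, 0.19, 0.26, 0.23, 0.29, 0.25, 0.20]

for threshold in (0.20, 0.25, 0.30):
    recall, fpr = recall_and_fpr(clean, poisoned, threshold)
    print(f"threshold={threshold:.2f}  recall={recall:.1f}  fpr={fpr:.1f}")
```

With these invented numbers, raising the threshold catches more poisoned samples at the cost of dropping more clean ones, which is why reporting one operating point without the threshold (or a full sweep) makes the result hard to interpret.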
A second concern is that the paper doesn't cover any image transformations that attempt to remove the adversarial noise. It's difficult to test things on this front without them releasing the model or a public demo of any sort, but I know some people who have had success in making AI-generated images pass as not AI generated by using an Open Image Denoise pipeline (where you add some amount of Gaussian noise to the image, then let a deep-learning based filter remove it). I do strongly suspect that this would work for removing the adversarial noise, and I and probably others will try to test that out. There was a widely publicized 12-line Python program that removed Glaze, so it's actually somewhat concerning that the authors wouldn't want to get ahead of speculation on this front. The result also doesn't need to look pretty if we're limiting the scope to filtering the dataset: probably one of the better ways to counteract it would be to discover some sloppy transformation that wipes out the noise and leaves the rest of the image recognizable, then see how much that image differs from the original (potentially poisoned) image.
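The "sloppy transformation, then compare" idea can be sketched like this. It is untested against real Nightshade output (nobody outside the authors can generate it yet), and the pieces are all assumptions: a 1-D list stands in for an image, a plain median filter stands in for a real denoiser such as Open Image Denoise, and uniform random noise stands in for the adversarial perturbation.

```python
import random
from statistics import median

def median_filter(pixels, radius=2):
    """Crude denoiser: replace each pixel with the median of its neighborhood."""
    n = len(pixels)
    return [median(pixels[max(0, i - radius): min(n, i + radius + 1)])
            for i in range(n)]

def residual(pixels, radius=2):
    """Mean absolute difference between an image and its filtered version.
    A large value means the filter removed a lot of fine-grained detail or noise."""
    smoothed = median_filter(pixels, radius)
    return sum(abs(p - s) for p, s in zip(pixels, smoothed)) / len(pixels)

rng = random.Random(0)
clean = [0.2] * 32 + [0.8] * 32   # flat-colored two-tone "image"
# Stand-in for adversarial noise: small random per-pixel changes.
suspect = [min(1.0, max(0.0, p + rng.uniform(-0.05, 0.05))) for p in clean]

print(f"clean residual:   {residual(clean):.4f}")
print(f"suspect residual: {residual(suspect):.4f}")
```

On flat-colored input the filter leaves the clean image essentially untouched and strips most of the added noise, so the residual separates the two. On detailed or textured images the gap would be much less clear-cut, and a filtering threshold would have to be picked empirically.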
Third, it doesn't seem that they've covered what happens if CLIP is unfrozen during training. This isn't something you'd always be able to do (training from scratch, for example, will require that you have CLIP frozen so the diffusion component can get aligned to it, and I have noticed that CLIP can undergo some pretty severe damage if you make certain drastic changes to the model with it unfrozen), but it's pretty common practice for people training LoRAs or full finetunes of SD to unfreeze CLIP so that it can learn to interpret text differently. If you unfreeze CLIP, and you start passing it images of cats which to the current model look like "picture of a dog" (with probably some aspects of "picture of a cat"), then as you train the model you would be telling the model it is wrong when it treats the picture of a cat like a picture of a dog, and you would be updating the weights to differentiate those concepts better, and it should in theory render Nightshade ineffective over time. Again, this is not explored in the paper. It also isn't guaranteed to be without side effects, because I have seen damage done to CLIP on certain training regimens without people actively trying to damage it.
As a final note -- part of what enabled this attack to be developed was open-source AI models that can be run locally, where researchers can actually look at the individual components in some detail. Any attack like this is going to be far less effective on models like Midjourney, DALL-E 3, or Imagen because we only know what those companies disclose about them and have no way of even running them locally, let alone figuring out what something will do to training. I would be cautious about declaring this to be a victory, because a lot of developments like this have more potential to tilt things in favor of large AI companies than they do to slow down development of the whole field, and a lot of the larger companies are aware of this.