OpenAI just admitted it can't identify AI-generated text. That's bad for the internet and it could be really bad for AI models.
In January, OpenAI launched a system for identifying AI-generated text. This month, the company scrapped it.
Text written before 2023 is going to be exceptionally valuable, because that way we can be reasonably sure it wasn’t contaminated by an LLM.
This reminds me of some research institutions pulling up sunken ships so that they can harvest the steel and use it to build sensitive instruments. You see, before the nuclear tests there was hardly any radiation anywhere. However, after America and the Soviet Union started nuking stuff like there’s no tomorrow, pretty much all steel on Earth is now a little bit contaminated. Not a big issue for normal people, but scientists building super sensitive equipment certainly notice the difference between pre-nuclear and post-nuclear steel.
The wording of every single article has such an anti-AI slant, and I've felt the propaganda really working this past half year. Still nobody cares about advertising companies, but LLMs are the devil.
Existing datasets still exist. The bigger focus is on crossing modalities and refining content.
Why is the negative focus always on the tech and not on the political system that actually makes it potentially harmful to people?
I swear, most of the people with strong opinions don't even know half of how these machines work or what they are doing.
Predictable issue if you know the fundamental technology that goes into these models. Hell, it should have been obvious even to the layperson that things were headed this way once they saw the videos and heard the audio.
We're less sensitive to patterns in massive data; the point at which we can't tell fact from AI fiction comes before the point at which these machines can't tell. Good luck with the FB aunts.
A GAN's final goal is to generate content that is indistinguishable from the real thing... Are we surprised?
Edit, since the person below me made a great point: GANs may be limited, but there's nothing that says you can't set up a generator LLM and a detector LLM, and keep building detectors and generators for the sole purpose of improving the generator.
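To make that generator-vs-detector idea concrete, here's a minimal toy sketch of a classic GAN training loop on 1-D numbers rather than text (adversarial training of an actual LLM is much harder, since sampling tokens isn't differentiable). It assumes PyTorch, and every name in it is made up for illustration:

```python
# Toy GAN sketch: a "detector" learns to tell real samples from generated ones,
# while a "generator" learns to fool that detector. Not an LLM, just the loop shape.
import torch
import torch.nn as nn

generator = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
detector = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(detector.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 3.0        # "human" data: samples from N(3, 0.5)
    fake = generator(torch.randn(64, 8))          # "AI" data produced by the generator

    # 1) Train the detector to separate real from generated samples.
    d_loss = bce(detector(real), torch.ones(64, 1)) + \
             bce(detector(fake.detach()), torch.zeros(64, 1))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # 2) Train the generator to make the detector label its output as real.
    g_loss = bce(detector(generator(torch.randn(64, 8))), torch.ones(64, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
```

The point is just the structure: whatever reliable "tell" the detector finds is exactly what the generator is trained to optimize away, which is why a public detector tends to have a short shelf life.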
On the one hand, our AI is designed to mimic human text; on the other hand, we claim to be able to detect AI-generated text that was designed to mimic human text. These two goals don't align at a fundamental level.
So every accusation of cheating/plagiarism etc. and the resulting bad grades need to be revised because the AI checker incorrectly labelled submissions as "created by AI"? OK.
This just illustrates the major limitation of ML: access to reliable training data. A machine that has no concept of internal reasoning can never be truly trusted to solve novel problems, and novel problems, from minor issues to very complex ones, are solved in a bunch of professions every day. That's what drives our world forward. If we rely too heavily on AI to solve problems for us, the issue of obtaining reliable training data to train future AIs will only grow. That's why I currently don't think AIs will replace large swaths of the workforce, but will to a larger degree be used as a tool by the humans in the workforce.
I wonder why Google still isn't considering buying Reddit and other forums where personal discussion takes place and the user base sorts quality content free of charge. It has already been established that Google queries are way more useful when coupled with Reddit.
FWIW, it's not clear-cut whether AI-generated data feeding back into further training reduces accuracy or is generally harmful.
Multiple papers have shown that images generated by high-quality diffusion models, mixed with a proportion of real images (30-50%), improve the adversarial robustness of the models. Similar things might apply to language modeling.
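For what it's worth, here's a rough sketch of what "train on generated data with 30-50% real data mixed in" could look like. This is not the procedure from any specific paper, just an illustration of the mixing ratio, with made-up names:

```python
import random

def build_mixed_dataset(real_examples, generated_examples,
                        real_fraction=0.4, size=10_000):
    """Sample a training set where roughly `real_fraction` of the items
    come from real data and the rest from model-generated data."""
    mixed = []
    for _ in range(size):
        source = real_examples if random.random() < real_fraction else generated_examples
        mixed.append(random.choice(source))
    random.shuffle(mixed)
    return mixed

# Hypothetical usage: train_set = build_mixed_dataset(human_texts, llm_texts)
```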
OpenAI also financially benefits from keeping the hype train rolling. Talking about how disruptive their own tech is gets them attention and investment. Just take it with a grain of salt.
I wonder if it was too many false positives, like when some tool said the US Constitution was written by AI. Which seems quite logical, considering that LLMs imitate humans very closely and can't prevent hallucinations on their own; hallucinations end up being the best predictor of whether something was written by a person in good faith or not.
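One plausible reason famous human text gets flagged: several public detectors reportedly lean on perplexity-style heuristics, and heavily quoted documents like the Constitution score extremely low perplexity under a language model, which such heuristics read as "machine-like". A hedged sketch of that heuristic, assuming the Hugging Face transformers library and a made-up cutoff:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    # Perplexity of `text` under GPT-2: lower means the model finds it more predictable.
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return torch.exp(out.loss).item()

sample = "We the People of the United States, in Order to form a more perfect Union..."
ppl = perplexity(sample)
print(ppl, "-> 'AI-like'" if ppl < 20 else "-> 'human-like'")  # naive, made-up threshold
```

Well-known, formulaic prose looks highly predictable to the model, so a naive low-perplexity rule will happily call it AI-generated.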
If it could, it couldn’t claim that the content it produced was original. If AI-generated content were detectable, that would be a tacit admission that it is entirely plagiarized.