An update to Google's privacy policy suggests that the entire public internet is fair game for its AI projects. If Google can read your words, assume they belong to the company now, and expect that they’re nesting somewhere in the bowels of a chatbot.
An update to Google's privacy policy suggests that the entire public internet is fair game for its AI projects.
I'm guessing previously they indexed the data and didn't actually use it.
Nowadays they're using it directly themselves.
You could argue they were previously making money from Google ads, but I believe Google was always ad-free. Results had sponsors for sure, but that wasn't linked to your data.
I don't see why this is a problem (apart from supposedly private data like email). It's not just Google that can do this: all this data is available to everyone, and anyone who can use it can benefit from it. If you want to make Google pay for a publicly available good, tax them accordingly. That's the point of taxes: if you are successful enough to take advantage in any way of a country's public roads, education system, access to a labour market and a functioning society generally, then taxing the massive profits from using that system is fair. Enclosing everything and holding access to the content we contributed hostage is not.
Yeah public data is public. If anyone doesn't want their shitty comments or whatever to be used for AI training then put it behind a login or something.
Except that’s not true. Public posting of content does not trump copyright protection. Google using content for AI purposes is almost certainly a copyright issue. I may post content for human consumption, but that does not mean I allow it to be used by a private corporation for profit.
If you want to make Google pay for a publicly available good, tax them accordingly.
Tax them where? In the US? But a lot of the content they scrape would be European. So does EU get to tax them for content scraped from EU users and US for content scraped from US users? Actually, how DO we define the locality of online content? By host server? Site owning company/person's legal location? Content poster's location?
Much as I'd love to see Google pay more taxes, I'm not sure how this would play out.
As long as they don’t present that data as their own, I am fine with it. But wait, that’s exactly what they’re doing... I have a vision of a thousand lawsuits shoved down the throat of the mighty Alphabet.
I see a problem with it, just like there is a problem with all their data collection. They are taking our data without consideration or compensation, and using it for their profitable commercial enterprise. They should be paying us for that data.
You can't build a car without paying for the nuts and bolts. Yet that is exactly what they've been doing, and they've become filthy rich doing it, at the expense of every one of us.
Why is AI scraping not respecting robots.txt? It wasn’t ok in the early internet days, so why is it ok now? People are complaining about being overloaded by scrapers like it’s the ’90s.
Basically it's a file people put in the root directory of their domain to tell automated web crawlers which sections of the website they may access and which crawlers are allowed to access them.
It isn't a legally binding thing, more of a courtesy. Some sites may block traffic if they detect the prohibited actions, so it gives your crawlers an idea of what's okay in order to not get blocked.
It’s a plain text file hosted on your site that should be visible to the internet. Basically, it allows or disallows crawling of your site by search engines and other bots.
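For anyone who wants to see what "respecting robots.txt" looks like from the crawler's side, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules, the `ExampleAIBot` and `SomeSearchBot` user-agent names, the `example.com` URLs, and the `build_parser` helper are all made up for illustration; they are not anything Google or any real crawler actually uses. A well-behaved crawler would run a check like this before each fetch, but nothing forces it to.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: block one (made-up) AI crawler everywhere,
# and keep every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The crawler asks: may this user agent fetch this URL?
ai_blog = parser.can_fetch("ExampleAIBot", "https://example.com/blog/post")
search_blog = parser.can_fetch("SomeSearchBot", "https://example.com/blog/post")
search_private = parser.can_fetch("SomeSearchBot", "https://example.com/private/notes")

print(ai_blog, search_blog, search_private)
```

Note that the check happens entirely inside the crawler. The site only publishes its wishes; a scraper that skips this step will get the page anyway unless the server blocks it some other way.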
Don't see a problem with it as long as they don't get copyright on the outputs of their AI. That would make enforcing any IP impossible on the internet because there's no way to prove it wasn't AI generated.
Not saying it's a comforting thought but that's one of several reasons why one doesn't post anything online if they aren't comfortable with outcomes such as this.