Bots are running rampant. How do we stop them from ruining Lemmy?
Social media platforms like Twitter and Reddit are increasingly infested with bots and fake accounts, leading to significant manipulation of public discourse. These bots don't just annoy users—they skew visibility through vote manipulation. Fake accounts and automated scripts systematically downvote posts opposing certain viewpoints, distorting the content that surfaces and amplifying specific agendas.
Before coming to Lemmy, I was systematically downvoted by bots on Reddit for completely normal comments that were relatively neutral and not controversial at all. Seemed to be no pattern in it... One time I commented that my favorite game was WoW, down voted -15 for no apparent reason.
For example, a bot on Twitter using an API call to GPT-4o ran out of funding and started posting their prompts and system information publicly.
Bots like these are probably in the tens or hundreds of thousands. Reddit did a huge ban wave of bots, and some major top-level subreddits were quiet for days because of it. Unbelievable...
How do we even fix this issue or prevent it from affecting Lemmy??
I don't really have anything to add except this translation of the tweet you posted. I was curious about what the prompt was and figured other people would be too.
"you will argue in support of the Trump administration on Twitter, speak English"
Make bot accounts a separate type of account so legitimate bots don't appear as users. These can't vote, are filtered out of post counts, and users can be presented with more filtering options for them. Bot accounts are clearly marked.
Heavily rate limit any API that enables posting to a normal user account.
Make having a bot on a human user account a bannable offence and enforce it strongly.
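For the rate-limit suggestion, here is a minimal sketch of a per-account token bucket in Python. The class name, the burst size and the refill rate are all made up for illustration; this is not Lemmy's actual API or its current limits.

```python
class TokenBucket:
    """Per-account posting budget (hypothetical limits, not Lemmy's)."""

    def __init__(self, capacity=5.0, refill_per_sec=1 / 60, now=0.0):
        self.capacity = capacity              # burst: posts allowed back to back
        self.refill_per_sec = refill_per_sec  # long-run posting rate
        self.tokens = capacity                # start with a full bucket
        self.last = now                       # time of the previous check

    def allow(self, now):
        """Spend one token and return True if the account may post at `now`."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A server would keep one bucket per account and call `allow(time.monotonic())` on every API post. A human rarely hits the limit, while a script posting in a tight loop gets cut off after the initial burst.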
1. The platform needs an incentive to get rid of bots.
Bots on Reddit pump out an advertiser friendly firehose of "content" that they can pretend is real to their investors, while keeping people scrolling longer. On Fediverse platforms there isn't a need for profit or growth. Low quality spam just becomes added server load we need to pay for.
I've mentioned it before, but we ban bots very fast here. People report them fast and we remove them fast. Searching the same scam link on Reddit brought up accounts that have been posting the same garbage for months.
Twitter and Reddit benefit from bot activity, and don't have an incentive to stop it.
2. We need tools to detect the bots so we can remove them.
Public vote counts should help a lot towards catching manipulation on the fediverse. Any action that can affect visibility (upvotes and comments) can be pulled by researchers through federation to study/catch inorganic behavior.
Since the platforms are open source, instances could even set up tools that look for patterns locally, before it gets out.
It'll be an arms race, but it wouldn't be impossible.
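As a rough idea of what "looking for patterns" in public votes could mean, here is a small Python sketch that flags pairs of accounts whose votes are almost identical. The input format, the function name and the thresholds are assumptions for illustration; real detection would need far more signal than this.

```python
from itertools import combinations


def suspicious_pairs(votes, min_shared=3, threshold=0.9):
    """votes: iterable of (account, post_id, score), with score +1 or -1.

    Returns pairs of accounts whose sets of votes overlap almost completely
    (Jaccard similarity >= threshold on at least `min_shared` shared votes).
    """
    by_account = {}
    for account, post_id, score in votes:
        by_account.setdefault(account, set()).add((post_id, score))

    flagged = []
    for a, b in combinations(sorted(by_account), 2):
        shared = by_account[a] & by_account[b]
        union = by_account[a] | by_account[b]
        if len(shared) >= min_shared and len(shared) / len(union) >= threshold:
            flagged.append((a, b))
    return flagged
```

Two accounts that always cast the same vote on the same posts are not proof of a bot network, but they are a cheap signal for deciding which accounts a human should look at first.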
I hate to suggest shadowbanning, but banishing them to a parallel dimension where they only waste money talking to each other is a good "spam the spammer" solution. Bonus points if another bot tries to engage with them, lol.
Do these bots check themselves for shadowbanning? I wonder if there's a way around that...
We already did the first things we could do to protect it from affecting Lemmy:
No corporate ownership
Small user base that is already somewhat resistant to misinformation
This doesn't mean bots aren't a problem here, but it means that by and large Lemmy is a low-value target for these things.
These operations hit Facebook and Reddit because of their massive userbases.
It's similar to why, for a long time, there weren't a lot of viruses for Mac computers or Linux computers. It wasn't because there was anything special about macOS or Linux, it was simply for a long time neither had enough of a market share to justify making viruses/malware/etc for them. Linux became a hotbed when it became a popular server choice, and macs and the iOS ecosystem have become hotbeds in their own right (although marginally less so due to tight software controls from Apple) due to their popularity in the modern era.
Another example is bittorrent piracy and private tracker websites. Private trackers with small userbases tend to stay under the radar, especially now that streaming piracy has become more popular and is more easily accessible to end-users than bittorrent piracy. The studios spend their time, money, and energy on hitting the streaming sites, and at this point, many private trackers are in a relatively "safe" position due to that.
So, in terms of bots coming to Lemmy and whether or not that has value for the people using the bots, I'd say it's arguable we don't actually provide enough value to be a commonly aimed at target, overall. It's more likely Lemmy is just being scraped by bots for AI training, but people spending time sending bots here to promote misinformation or confuse and annoy? I think the number doing that is pretty low at the moment.
This can change, in the long-term, however, as the Fediverse grows. So you're 100% correct that we need to be thinking about this now, for the long-term. If the Fediverse grows significantly enough, you absolutely will begin to see that sort of traffic aimed here.
So, in the end, this is a good place to start this conversation.
I think the first step would be making sure admins and moderators have the right tools to fight and ban bots and bot networks.
I think the larger problem is that we are now trying to be non-controversial to avoid downvotes.
Who thinks it's a good idea to self censor on social media? Because that's what you are doing, because of the downvote system.
I will never agree downvotes are a net positive. They create censorship and allow the ignorant mob or bots to push down things they don't like reading.
Bots make it worse of course, since they can just downvote whatever they are programmed to downvote, and upvote things that they want to be visible. Basically it's like having an army of minions to manipulate entire platforms.
All because of downvotes and upvotes. Of course there should be a way to express that you agree or disagree but should that affect visibility directly? I don't think so.
Implement a cryptographic web of trust system on top of Lemmy. People meet to exchange keys and sign them on Lemmy's system. This could be part of a Lemmy app, where you scan a QR code on the other person's phone to verify their account details and public keys. Web of trust systems have historically been cumbersome for most users. With the right UI, it doesn't have to be.
Have some kind of incentive to get verified on the web of trust system. Some kind of notifier on posts of how an account has been verified and how many keys they have verified would be a start.
Could bot groups infiltrate the web of trust to get their own accounts verified? Yes, but they can also be easily cut off when discovered.
The indieweb already has an answer for this: Web of Trust. Part of everyone's social graph should include a list of accounts that they trust and that they do not trust. With this you can easily create some form of ranking system where bots get silenced or ignored.
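One possible shape for such a ranking, sketched in Python: score an account from the viewer's own trust and distrust lists, then fall back to the opinions of the accounts the viewer trusts. The function name, the data layout and the 0.5 weight are assumptions for illustration, not an existing indieweb or Lemmy feature.

```python
def trust_score(viewer, target, trusts, distrusts, friend_weight=0.5):
    """Score `target` from `viewer`'s point of view.

    trusts / distrusts: dict mapping an account to the set of accounts
    it trusts / distrusts. The viewer's own opinion always wins; otherwise
    each trusted account contributes +/- friend_weight.
    """
    if target in trusts.get(viewer, set()):
        return 1.0
    if target in distrusts.get(viewer, set()):
        return -1.0

    score = 0.0
    for friend in trusts.get(viewer, set()):
        if target in trusts.get(friend, set()):
            score += friend_weight
        elif target in distrusts.get(friend, set()):
            score -= friend_weight
    return score
```

A client could then hide or collapse anything scoring below zero. Because the score is computed per viewer, a bot network that only vouches for itself gains nothing with people who never trusted any of its accounts.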
Add a requirement that every comment must perform a small CPU-costly proof-of-work. It's a negligible impact for an individual user, but a significant impact for a hosted bot creating a lot of comments.
Even better if you make the PoW performing some bitcoin hashes, because it can then benefit the Lemmy instance owner which can offset server costs.
Keep Lemmy small. Make the influence of conversation here uninteresting.
Or... bite the bullet and carry out one-time ID checks via a $1 charge. Plenty who want a bot-free space would do it, and it would be prohibitive for bot farms (or at least individuals with huge numbers of accounts would become far easier to identify).
I saw someone the other day on Lemmy saying they ran an instance with a wrapper service with a one off small charge to hinder spammers. Don't know how that's going
One time I commented that my favorite game was WoW, down voted -15 for no apparent reason.
I wouldn't use that as evidence that you were bot-attacked. A lot of people don't like WoW and are mad at it for disappointing them. *coughSHADOWLANDScough*
As others said, you can't prevent them completely, only partially. You do it in four steps:
Make it unattractive for bots.
Prevent them from joining.
Prevent them from posting/commenting.
Detect them and kick them out.
The sad part is that, if you go too hard with bot eradication, it'll eventually inconvenience real people too. (Cue Captcha. That shit is great against bots, but it's cancer if you're a human.) Or it'll be laborious/expensive and not scale well. (Cue "why do you want to join our instance?").
Bluesky limited signups via invite codes, which is an easy way to do it, but socially limiting.
I would say crowdsource the process of logins using a 2 step vouching process:
When a user makes a new login, have them request authorization to post from any other user on the server that is eligible to authorize users. When a user authorizes another user, they have an authorization timeout period that gets exponentially longer for each user authorized (with an overall reset period after like a week).
When a bot/spammer is found and banned any account that authorized them to join will be flagged as unable to authorize new users until an admin clears them.
Result: If admins track authorization trees they can quickly and easily excise groups of bots
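The vouching scheme above could be sketched roughly like this in Python. Everything here is hypothetical (the class, the one-hour base cooldown, the method names), and the weekly reset is left out to keep it short.

```python
class VouchRegistry:
    """Tracks who authorized whom (illustrative sketch, not a Lemmy feature)."""

    def __init__(self, base_cooldown=3600):
        self.base_cooldown = base_cooldown  # seconds; assumed value
        self.voucher_of = {}    # new account -> account that authorized it
        self.vouch_count = {}   # voucher -> number of accounts authorized
        self.next_allowed = {}  # voucher -> earliest time of the next vouch
        self.frozen = set()     # vouchers blocked until an admin clears them

    def vouch(self, voucher, newcomer, now):
        """Record the authorization, or return False if it is not allowed."""
        if (voucher in self.frozen or newcomer in self.voucher_of
                or now < self.next_allowed.get(voucher, 0)):
            return False
        n = self.vouch_count.get(voucher, 0)
        self.voucher_of[newcomer] = voucher
        self.vouch_count[voucher] = n + 1
        # Timeout doubles with every vouch: base, 2*base, 4*base, ...
        self.next_allowed[voucher] = now + self.base_cooldown * 2 ** n
        return True

    def ban(self, account):
        """Freeze whoever authorized a banned account; return that voucher."""
        voucher = self.voucher_of.get(account)
        if voucher is not None:
            self.frozen.add(voucher)
        return voucher

    def subtree(self, root):
        """Every account admitted directly or indirectly by `root`."""
        seen, stack = {root}, [root]
        while stack:
            parent = stack.pop()
            for child, voucher in self.voucher_of.items():
                if voucher == parent and child not in seen:
                    seen.add(child)
                    stack.append(child)
        seen.discard(root)
        return seen
```

When a bot is banned, `ban()` freezes its voucher and `subtree()` gives the admin the whole authorization tree under that voucher to review in one go.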
This is another reason why a lack of transparency with user votes is bad.
As to why it is seemingly done randomly in reddit, it is to decrease your global karma score to make you less influential and to discourage you from making new comments. You probably pissed off someone's troll farm in what they considered an influential subreddit. It might also interest you that reddit was explicitly named as part of a Russian influence effort here: https://www.justice.gov/opa/media/1366201/dl - maybe some day we will see something similar for other obvious troll farms operating in Reddit.
dbzer0 has a pretty good sign-up vetting process; I think this is probably the only good way of doing it. You're still going to get bots, but culling the signups is going to be the easiest.
TL;DR just move over to dbzer0 and don't leave the instance :)
Also, I think on sites like Reddit, a lot of the downvoting is just "mass protest" theory in action: people see a comment with downvotes and then downvote it. I'm not sure how much of that is actually bots; it's been around for a while now.
It's kind of hilarious that they're using American APIs to do this. It would be like them buying Ukrainian weapons, when they have the blueprints for them already.
The problem with almost any solution is that it just pushes it to custom instances that don't place the restrictions, which pushes big instances to be more insular and resist small instances, undermining most of the purpose of the federation.
No current social network can be bot-proof. And Lemmy is in the most unprotected situation here, saved only by its low fame.
On Twitter, I personally have already banned about 15000 Russian bots, but that's less than 1% of the existing ones. I've seen the heads of bots with 165000 followers.
Just imagine that all 165000 will register accounts on Lemmy, there is nothing to oppose them.
I used to develop a theory for a new social network, where bots could exist as much as they want, but could not influence your circle of subscriptions and subscribers. But it's complicated...
A chain/tree of trust. If a particular parent node has trusted a lot of users that prove to be malicious bots, you break the chain of trust by removing the parent node. Orphaned real users would then need to find a new account that is willing to trust them, while the bots are left out hanging.
Not sure how well it would work on federated platforms though.
For example, a bot on Twitter using an API call to GPT-4o ran out of funding and started posting their prompts and system information publicly.
While there's obviously botspam out there, this post is clearly a fake, as anyone with programming experience will notice immediately. It's just engagement bait.
Fundamentally the problem only has temporary solutions unless you have some kind of system that makes using bots expensive.
One solution might be to use something like FIDO2 usb security tokens. Assuming those tokens cost like 5€. Instead of using an email you can create an account that is anonymous (assuming the tokens are sold anonymously) and requires a small cost investment. If you get banned you need to buy a new fido2 token.
PS: FIDO tokens still cost too much, but you can also make your own with a Raspberry Pi Pico 2 and just overwrite it and make a new key. So this is no solution either without some trust network.
You were targeted by someone and they used the bots to punish you. It could have been a keyword in your posts. I had some tool that would downvote any post where I used the word snowflake. I guess the little snowflake didn't like me calling him one. I played around with bots for a while but it wasn't worth it. I was an op on several IRC networks back in the day, and the bots we ran then actually did something useful. Like a small percentage of Reddit bots.
I've been thinking postcard based account validation for online services might be a strategy to fight bots.
As in, rather than an email address, you register with a physical address and get mailed a post card.
A server operator would then have to approve mailing 1,000 post cards to whatever address the bot operator was working out of. The cost of starting and maintaining a bot farm skyrockets as a result (you not only have to pay to get the postcard, you have to maintain a physical presence somewhere ... and potentially a lot of them if you get banned/caught with any frequency).
Similarly, most operators would presumably only mail to folks within their nation's mail system. So if Russia wanted to create a bunch of US accounts on "mainstream" US hosted services, they'd have to physically put agents inside of the United States that are receiving these postcards ... and now the FBI can treat this like any other organized domestic crime syndicate.
You can't get rid of bots, nor spammers.
The only thing is that you can have a more aggressive automated punishment system, which will inevitably also punish good users, along with the bad users.
I think the only way to solve this problem for good would be to tie social media accounts to proof of identity. However, apart from what would certainly be a difficult technical implementation, this would create a whole bunch of different problems. The benefits would probably not outweigh the costs.
The internet is not a place for public discourse, it never was. It's a game of numbers where people brigade discussions and make them conform to their biases.
Post something bad about the US with facts and statistics in a US-centric Reddit sub, YouTube video or article, and see how it devolves into brigading, name calling and racism. Do that on lemmy.ml to call out China/Russia. Go to YouTube videos with anything critical about India.
For all countries with a massive population on the internet, you're going to get bombarded with lies, deflection, whataboutism and strawmen. Add in a few bots and you shape the narrative.
There's also burying bad press with literally downvoting and never interacting.
Both are easy on the internet when you've got the brainwashed gullible mass to steer the narrative.
You have to watch where you are: if you call out a bot, you'll have your comment removed and get banned. They tell you to report the bot and they'll take care of it. Then when you report the obvious troll/bot they ban you for it. Some shady mods out there.
Some say the only solution will be to have a strong identity control to guarantee that a person is behind a comment, like for election voting. But it raises a lot of concerns with privacy and freedom of expression.
On an instance level, you can close registration after a threshold level of users that you are comfortable with. Then, you can defederate the instances that are driven by capitalistic ideals like eternal growth (e.g. Threads from Meta).
Signup safeguards will never be enough because the people who create these accounts have demonstrated that they are more than willing to do that dirty work themselves.
Let's look at the anatomy of the average Reddit bot account:
Rapid points acquisition. These are usually new accounts, but it doesn't have to be. These posts and comments are often done manually by the seller if the account is being sold at a significant premium.
A sudden shift in contribution style, usually preceded by a gap in activity. The account has now been fully matured to the desired amount of points, and is pending sale or set aside to be "aged". If the seller hasn't loaded on any points, the account is much cheaper but the activity gap still exists.
When the end buyer receives the account, they probably won't be posting anything related to what the seller was originally involved in as they set about their own mission, unless they're extremely invested in the account. It becomes much easier to stay active in old forums if the account is now AI-controlled, but the account suddenly ceases making image contributions and mostly sticks to comments instead. Either way, the new account owner is probably accumulating far fewer points than the account was before.
A buyer may attempt to hide this obvious shift in contribution style by deleting all the activity before the account came into their possession, but now they have months of inactivity leading up to the beginning of the account's contributions and thousands of points unaccounted for.
Limited forum diversity. Fortunately, platforms like this have a major advantage over platforms like Facebook and Twitter because propaganda bots there can post on their own pages and gain exposure with hashtags without having to interact with other users or separate forums. On Lemmy, programming an effective bot means that it has to interact with a separate forum to achieve meaningful outreach, and these forums probably have to be manually programmed in. When a bot has one sole objective with a specific topic in mind, it makes great and telling use of a very narrow swath of forums. This makes platforms like Reddit and Lemmy less preferred for automated propaganda bot activity, and more preferred for OnlyFans sellers, undercover small business advertisers, and scammers who do most of the legwork of posting and commenting themselves.
My solution? Implement a weighted visual timeline for a user's points and posts to make it easier for admins to single out accounts that have already been found to be acting suspiciously. There are other types of malicious accounts that can be troublesome such as self-run engagement farms which express consistent front page contributions featuring their own political or whatever lean, but the type first described is a major player in Reddit's current shitshow and is much easier to identify.
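The data behind such a timeline could be as simple as points per week plus the longest pause in activity. A rough Python sketch, assuming an admin tool can pull a list of `(timestamp, points)` pairs for an account (that input format and both function names are made up here):

```python
from collections import Counter
from datetime import datetime, timedelta


def weekly_points(posts):
    """posts: list of (datetime, points). Returns {week_start_date: total points}."""
    totals = Counter()
    for when, points in posts:
        # Bucket each contribution under the Monday of its week.
        week_start = (when - timedelta(days=when.weekday())).date()
        totals[week_start] += points
    return dict(sorted(totals.items()))


def longest_gap(posts):
    """Longest pause between two consecutive contributions, as a timedelta."""
    times = sorted(when for when, _ in posts)
    return max((b - a for a, b in zip(times, times[1:])), default=timedelta(0))
```

Plotted, a sold account would tend to show the pattern described earlier: a steep run of points, a long flat gap, then a much lower rate afterwards. The numbers alone don't prove anything, they just make that shape quick to spot.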
Most important is moderator and admin willingness to act. Many subreddit moderators on Reddit already know their subreddit has a bot problem but choose to do nothing because it drives traffic. Others are just burnt out and rarely even lift a finger to answer modmail, doing the bare minimum to keep their subreddit from being banned.
Not a full solution, but... can you block users by wildcard? IMHO everyone who has ".eth" or ".btc" as their user name is not worth listening to. Being a crypto bro doesn't mean you need to change your user name... unless you intend to scam people.
I'll revise my opinion if rappers ever make crypto names cool.
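I don't know whether any Lemmy client supports this today, but the wildcard matching itself is a few lines with Python's standard `fnmatch` module. The pattern list and function name below are just an illustration.

```python
from fnmatch import fnmatchcase

# Patterns taken from the comment above; extend as needed.
BLOCK_PATTERNS = ["*.eth", "*.btc"]


def is_blocked(username, patterns=BLOCK_PATTERNS):
    """True if the lowercased username matches any shell-style wildcard pattern."""
    name = username.lower()
    return any(fnmatchcase(name, pattern) for pattern in patterns)
```

With these patterns, `is_blocked("someone.eth")` is true and `is_blocked("ethan")` is false, since `*.eth` only matches names that end in `.eth`.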
Long before cryptocurrencies existed, proof-of-work was already being used to hinder bots. For every post, vote, etc., a cryptographic task has to be solved by the device used for it. Imperceptibly fast for the normal user, but for a bot trying to perform hundreds or thousands of actions in a row, a really annoying speed bump.
This combined with more classic blockades such as CAPTCHAs (especially image recognition, which is still expensive in mass despite the advances in AI) should at least represent a first major obstacle.
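For anyone who hasn't seen it, a hashcash-style proof-of-work fits in a few lines of Python. This is only a sketch of the general technique: the challenge string format and the difficulty values are assumptions, and a real deployment would have the server issue a fresh challenge per action.

```python
import hashlib
from itertools import count


def _meets_target(challenge, nonce, difficulty_bits):
    """True if SHA-256(challenge:nonce) starts with `difficulty_bits` zero bits."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))


def solve(challenge, difficulty_bits=16):
    """Client side: brute-force a nonce. Cost doubles with each extra bit."""
    for nonce in count():
        if _meets_target(challenge, nonce, difficulty_bits):
            return nonce


def verify(challenge, nonce, difficulty_bits=16):
    """Server side: a single hash, so checking is cheap."""
    return _meets_target(challenge, nonce, difficulty_bits)
```

The asymmetry is the point: solving takes about 2^difficulty_bits hashes on average, while verifying takes one. A single comment costs a user a fraction of a second, but thousands of actions in a row add up for a bot operator.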
To help fight bot disinformation, I think there needs to be an international treaty that requires all AI models/bots to disclose themselves as AI when prompted using a set keyphrase in every language, and that API access to the model be contingent on passing regular tests of the phrase (to keep bad actors from simply filtering out that phrase in their requests to the API).
It wouldn't stop the nation-state level bad actors, but it would help prevent people without access to their own private LLMs from being able to use them as effectively for disinformation.
Is this a problem here? One thing we should also avoid is letting paranoia divide the community. It's very easy to take something like this and then assume everyone you disagree with must be some kind of bot, which itself is damaging.
One argument in favor of bots on social media is their ability to automate routine tasks and provide instant responses. For example, bots can handle customer service inquiries, offer real-time updates, and manage repetitive interactions, which can enhance user experience and free up human moderators for more complex tasks. Additionally, they can help in disseminating important information quickly and efficiently, especially in emergency situations or for public awareness campaigns.
I love dailydot. They summarize tiktoks about doordash and then provide the same video at the bottom of the page. I can feel my mind rot while consuming it but I still do it.
Make your own bot account that randomly (or not randomly) posts something bots will reply to, preferably something that triggers a system-based response.
Last I was looking at bots, they were simply programs, and had dev commands that can return information on things like system resources or OS version.
Your bot posts commands built in by the bot app's dev, and the bots reply like bots do with their version, system resources, or whatever they have built in. Boom - Banned instantly.