Instance Admins: Check Your Instance for Vote Manipulation Accounts [PSA]
Over the past 5-6 months, I've been noticing a lot of new accounts spinning up that follow this format:
https://instance.xyz/u/gmbpjtmt
https://instance.xyz/u/tjrwwiif
https://instance.xyz/u/xzowaikv
What are they doing?
They're boosting and/or downvoting mostly, if not exclusively, US news and politics posts/comments to fit their agenda.
Edit: Could also be manipulating other regional news/politics, but my instance is regional and doesn't subscribe to those which limits my visibility into the overall manipulation patterns.
What do these have in common?
Most are on instances that allow signups without applications. (I'm guessing the few on application-gated instances predate that requirement, since those accounts are several months old, but that's just a guess; they could easily have applied and been approved.)
Most are random 8-character usernames (occasionally 7 or 9 characters)
Most have a common set of users they're upvoting and/or downvoting consistently
No posts/comments
No avatar or bio (that's pretty common in general, but combine it with the other common attributes)
Update: Have had several anonymous reports (thanks!) that these users are registering with an @sharklasers.com email address which is a throwaway email service.
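The attributes above can be screened for programmatically. Here's a rough heuristic sketch (my own, not from the original post) that flags usernames matching the observed pattern of 7-9 random lowercase letters; the vowel-ratio threshold is an assumption and will produce false positives, so treat hits as candidates for manual review, never grounds for a ban:

```python
import re

# Heuristic pre-filter: 7-9 lowercase letters, nothing else.
# Random strings tend to have fewer vowels than pronounceable
# names, so a low vowel ratio is a (weak) extra signal.
PATTERN = re.compile(r"^[a-z]{7,9}$")
VOWELS = set("aeiou")

def looks_random(username: str) -> bool:
    if not PATTERN.match(username):
        return False
    vowel_ratio = sum(c in VOWELS for c in username) / len(username)
    return vowel_ratio < 0.4  # threshold is a guess; tune on your own data

# The three example accounts from the post, plus a normal-looking name.
suspects = [u for u in ("gmbpjtmt", "tjrwwiif", "xzowaikv", "alexandra")
            if looks_random(u)]
```

Combine this with the other signals (no posts, no avatar, throwaway email domain) before escalating anything.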
What can you, as an instance admin, do?
Keep an eye on new registrations to your instance. If you see any that fit this pattern, pick a few (and a few off this list) and see if they're voting along the same lines. You can also look in the login_token table to see if there is IP address overlap with other users on your instance and/or any other of these kinds of accounts.
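The IP-overlap check can be a single GROUP BY query. A minimal sketch, using Python's sqlite3 in place of Lemmy's Postgres database; the column names (user_id, ip) are assumed from a recent Lemmy schema, so verify them against your own version before running anything:

```python
import sqlite3

# Stand-in for Lemmy's login_token table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE login_token (user_id INTEGER, ip TEXT)")
con.executemany("INSERT INTO login_token VALUES (?, ?)", [
    (101, "203.0.113.7"),   # two accounts sharing one IP
    (102, "203.0.113.7"),
    (103, "198.51.100.4"),  # unrelated account
])

# Find IP addresses used by more than one distinct account.
overlaps = con.execute("""
    SELECT ip, COUNT(DISTINCT user_id) AS n
    FROM login_token
    GROUP BY ip
    HAVING COUNT(DISTINCT user_id) > 1
""").fetchall()
```

Shared IPs alone aren't proof (CGNAT and VPNs exist), but overlap between multiple pattern-matching accounts is worth a closer look.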
You can also check the local_user table to see if the email addresses are from the same provider (not a guaranteed way to match them, but it can be a clue) or if they're the same address using plus-addressing (e.g. user+whatever@email.xyz, user+whatever2@email.xyz, etc.).
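Grouping plus-addressed variants is a simple string transform. A hypothetical sketch (provider-specific quirks like Gmail's dot-insensitivity are deliberately not handled):

```python
from collections import defaultdict

def normalize(email: str) -> str:
    # Strip plus-addressing: user+tag@host -> user@host
    local, _, host = email.partition("@")
    base = local.split("+", 1)[0]
    return f"{base}@{host}".lower()

# Hypothetical addresses pulled from local_user.
emails = ["user+a@email.xyz", "User+b@email.xyz", "other@email.xyz"]

groups = defaultdict(list)
for e in emails:
    groups[normalize(e)].append(e)

# Normalized addresses registered more than once.
duplicates = {k: v for k, v in groups.items() if len(v) > 1}
```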
Why are they doing this?
Your guess is as good as mine, but US elections are in a few months, and I highly suspect some kind of interference campaign based on the volume of these that are being spun up and the content that's being manipulated. That, or someone, possibly even a ghost or an alien life form, really wants to create the impression that public opinion is on their side. Just because I don't know exactly why doesn't mean that something fishy isn't happening that other admins should be aware of.
Who are the known culprits?
These are the accounts fitting that pattern that have been positively identified; there are certainly more. All of them appear to be part of a coordinated campaign. I've tried to separate out the garden-variety "to win an argument" style manipulation from the suspected campaign accounts and omitted the former, but I may have missed some. This list is by no means comprehensive, and if there are any false positives, I do apologize.
Edit: If you see anyone from your instance on here, please please please verify before taking any action. I'm only able to cross-check these against the content my instance is aware of.
After digging into it, we banned the two sh.itjust.works accounts mentioned in this post. A quick search of the database did not reveal any similar accounts, though that doesn't mean they aren't there.
My bachelor's thesis was about comment amplifying/deamplifying on reddit using Graph Neural Networks (PyTorch-Geometric).
Essentially: there used to be commenters who would constantly agree / disagree with a particular sentiment, and these would be used to amplify / deamplify opinions, respectively. Using a set of metrics [1], I fed this into a Graph Neural Network (GNN), and it produced reasonably good results back in the day. Since PyTorch-Geometric's release there have been numerous advancements in GNN research as a whole, and I suspect the state of the art is significantly more developed now.
Since upvotes are known to the instance administrator (for brevity, not getting into the fediverse aspect of this), and since their email addresses are known too, I believe that these two pieces of information can be accounted for in order to detect patterns. This would lead to much better results.
In the beginning, such a solution needs to look for patterns, and these patterns need to be flagged as true (bots) or false (legitimate users) by the instance administrator; maybe 200 manual flaggings. Afterwards, the GNN could possibly act on its own based on the confidence of its previous pattern matches.
This may be an interesting bachelor's / master's thesis (or a side project in general) for anyone looking for one. Of course, there's a lot of nuances I've missed. Plus, I haven't kept up with GNNs in a very long time, so that should be accounted for too.
Edit: perhaps IP addresses could be used too? That's one way reddit would detect vote manipulation.
[1] account age, comment time, comment time difference with parent comment, sentiment agreement/disagreement with parent commenters, number of child comments after an hour, post karma, comment karma, number of comments, number of subreddits participated in, number of posts, and more I can't remember.
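Before reaching for a GNN, even a plain co-voting similarity can surface lockstep accounts. A minimal sketch of that idea (my own illustration with hypothetical vote data, using Jaccard similarity over (post, vote) pairs):

```python
from itertools import combinations

# Hypothetical data: account -> set of (post_id, vote) pairs,
# as an instance admin could extract from the vote tables.
votes = {
    "gmbpjtmt": {(1, +1), (2, -1), (3, +1), (4, -1)},
    "tjrwwiif": {(1, +1), (2, -1), (3, +1), (4, -1)},
    "normal":   {(1, +1), (5, +1)},
}

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b)

# Pairs whose vote sets overlap almost completely vote in lockstep.
# The 0.8 threshold is arbitrary; tune it against manually
# flagged examples, as described above.
suspect_pairs = [
    (u, v) for (u, v) in combinations(votes, 2)
    if jaccard(votes[u], votes[v]) > 0.8
]
```

Pairwise similarity like this could also feed the GNN as an edge feature rather than replace it.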
I just had a look at https://lemy.lol/, and they have email verification enabled, so it's not just people finding instances without email verification and spamming accounts there.
I think what we need is an automated solution that flags groups of accounts for suspected vote manipulation.
We appreciate the work you put into this, and I imagine it took some time to put together. That will only get harder to do if someone / some entity puts money into it.
As an end user, i.e. not someone who either hosts an instance or has extra permissions, can we in any way see who voted on a post or comment?
I'm asking because over the time I've been here, I've noticed that many, but not all, posts or comments attract a solitary down vote.
I see this type of thing all over the place. Sometimes it's two down votes, indicating that it happens more than once.
I note that human behaviour might explain this to some extent, but the voting happens almost immediately, in the face of either no response, or positive interactions.
But this is SOO tedious. The annoying bit is it could just be one person who set it up over a weekend, has a script that they plug into when wanting to be a troll, and now all admins/mods have to do more work.
You're fighting the good fight! So annoying that folks are doing it on freaking lemmy.
Is there any existing opensource tool for manipulation detection for lemmy? If not we should create one to reduce the manual workload for instance admins
@ptz@dubvee.org I have cleaned these and some other bot accounts from my instance. I was okay keeping registrations open up to this point because we got reports for almost every activity and could easily manage them. But unfortunately Lemmy does not have a regulatory mechanism for votes, so I'll keep manual approval until it does.
Also, it looks like they're creating accounts manually, since we've had captcha + email verification on our instance from the beginning. So even with manual approvals, a botnet can still be built, just more slowly.
I have a manual process for admitting people. Do I need to do anything if I know exactly who is on my instance? And do I need to do anything to protect my instance from other bad-acting instances (beyond defederating, which I do when I notice a lot of spam)? Any queries you recommend?
Another data point in favor of supporters of Dead Internet Theory.
Also, this is one more example of why it would be better if instances charged everyone a small fee: spammers would rather run things from their own machines (or some illegal botnet) than pay for something with a credit card.
Lemmy should have the option to defederate from instances depending on automated criteria. Sign ups without admin checks are a great attribute to use for defederation, because it leads to such abuse. I've finally blocked most communities and instances that have news about US politics and have a clean feed, but for newcomers, that shit is everywhere.
Users could also be doing and reporting the checking up - if votes were transparent - and they would be able to do it on far wider scale. Oh those leopards, eating your faces, vote obfuscation proponents.
Lemmy should do something like make captcha and email verification the default in the next version, and reject federation from anyone with a lower version. If we accept federation from any instance where this was never turned on, banning accounts one by one is worse than Sisyphean. They'll just keep finding more vulnerable instances that are already trusted and abuse them to spam the rest of the fediverse.
If admins want to manually turn it off, then they should be prepared to manage that.
It's painfully obvious lemmy is overrun with astroturf. Kamala spam has been oppressive and it's just cringe most of the time. I refuse to believe that most of the real users here are that cringe. Also, I support Kamala.