Lemmy federation protocol: How is the "posts from all communities" view sourced?
I'm a newbie to ActivityPub so please be patient with me.
Every introduction to ActivityPub describes how a user on server A subscribes to a specific community on server B, after which server A is informed about changes in that community.
But on Lemmy it's possible to look at the posts of all communities. For a single community this would be relatively easy: server A gets a request to serve the top posts of a community on server B, so A simply asks B for the posts.
But there is also the "posts from all communities" tab on the Lemmy front page. This raises some questions:
Does each Lemmy instance have a full copy of all posts of all communities? If so, how are new instances discovered? Does each instance distribute all updates to all other instances?
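As a rough illustration of "A simply asks B", here is a minimal sketch using plain ActivityPub rather than Lemmy's own HTTP API. It assumes the community is addressed by its actor URL (for example a hypothetical https://lemmy.example/c/example) and that the actor document advertises an `outbox` OrderedCollection, as ActivityPub actors are expected to. The outbox may hold only recent activity, and the helper names here are illustrative, not Lemmy internals.

```python
import json
import urllib.request

# ActivityPub servers return the JSON form of an object when asked for this type.
AP_ACCEPT = "application/activity+json"


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """GET an ActivityPub object (actor, collection, page) and decode it."""
    req = urllib.request.Request(url, headers={"Accept": AP_ACCEPT})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def outbox_url(actor: dict) -> str:
    """Return the outbox URL that an actor, such as a community, advertises."""
    outbox = actor.get("outbox")
    if not isinstance(outbox, str):
        raise ValueError("actor document has no outbox URL")
    return outbox


def collection_items(collection: dict, fetch=fetch_json) -> list:
    """Items of an OrderedCollection, either inlined or behind its first page."""
    if "orderedItems" in collection:
        return collection["orderedItems"]
    first = collection.get("first")
    if isinstance(first, str):
        first = fetch(first)  # the first page is only linked, so fetch it
    if isinstance(first, dict):
        return first.get("orderedItems", [])
    return []


def recent_activities(community_url: str) -> list:
    """Fetch a community actor, then the activities listed in its outbox."""
    actor = fetch_json(community_url)
    return collection_items(fetch_json(outbox_url(actor)))
```

Calling `recent_activities("https://lemmy.example/c/example")` against a real community URL would return whatever that server chooses to expose in its outbox. Passing `fetch` explicitly keeps the paging logic testable without a network.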
If each lemmy instance has only a partial dataset (this theory is backed by [1]"Only if a least one user on your instance subscribes to the remote community, will the community send updates to your instance.") then how is the "all posts" view composed? is it in reality not "all" but only "all posts that at least one user of this instance is subscribed to"?
If this is the case: what happens if a bad actor subscribes to all communities of all servers? Is there a maximum number of subscriptions per user?
These questions come from the fact that I'm looking for a way to subscribe to all events of all Lemmy instances, so that I can build statistics about upvotes, new posts, comments, etc. There seems to be a similar API endpoint for Mastodon [2], but nothing for Lemmy?!
You can stop saying "if". It is nearly certain that any instance has only a partial dataset, in the same way that a search engine indexes only part of the web.
If this is the case: what happens if a bad actor subscribes to all communities of all servers?
There are bots that were built to do exactly that. I wouldn't call them bad actors unless the instance owner prohibited such actions.
So do instances only store the metadata/title of federated posts, and query the other instances for more details when a user wants to see the comments or content?
is it in reality not “all” but only “all posts that at least one user of this instance is subscribed to”?
Exactly this, yes. Not literally 'all' (a brand new instance would have nothing in its All feed). This is what was meant by 'partial data set' - everything for a subscribed community (from the moment it was subscribed to), but nothing for a community that no-one's subscribed to.
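A toy model of that behaviour, as a sketch only: the remote community pushes new posts to the instances that follow it, an instance follows as soon as its first local user subscribes, and the All feed is simply whatever the instance has stored. The class names are hypothetical and the model ignores any backfill that may happen when a community is first fetched.

```python
class Community:
    """A remote community that pushes new posts only to following instances."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.followers = []  # instances with at least one subscribed user

    def publish(self, title: str) -> None:
        for instance in self.followers:
            instance.store(self.name, title)


class Instance:
    """An instance whose All feed is built only from what it has received."""

    def __init__(self) -> None:
        self.local_subscribers = {}  # community name -> set of local user names
        self.posts = []              # (community name, title) in arrival order

    def subscribe(self, user: str, community: Community) -> None:
        users = self.local_subscribers.setdefault(community.name, set())
        if not users:
            # First local subscriber: the instance starts following the community.
            community.followers.append(self)
        users.add(user)

    def store(self, community_name: str, title: str) -> None:
        self.posts.append((community_name, title))

    def all_feed(self) -> list:
        return list(self.posts)


tech, cats = Community("tech"), Community("cats")
home = Instance()
tech.publish("before anyone subscribed")
home.subscribe("alice", tech)
home.subscribe("bob", tech)  # second subscriber: no second follow
tech.publish("after alice subscribed")
cats.publish("nobody here follows cats")
print(home.all_feed())  # [('tech', 'after alice subscribed')]
```

The post published before anyone subscribed and the post in the unfollowed community never reach this instance, which is exactly why a brand-new instance has an empty All feed.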
Some instances run bots to populate their All feed beyond what would happen naturally (the idea being that the bot unsubscribes once a human subscribes).
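The "unsubscribe when a human does" rule can be sketched as a simple reconciliation step. This is a hypothetical planner, not the code of any actual bot: it assumes the bot knows which communities it has discovered, which ones already have a human local subscriber, and which ones it follows itself.

```python
def reconcile_bot(known, human_subscribed, bot_subscribed):
    """Plan subscription changes for a hypothetical All-feed filler bot.

    known:            every community the bot has discovered
    human_subscribed: communities with at least one human local subscriber
    bot_subscribed:   communities the bot currently follows
    Returns (to_subscribe, to_unsubscribe) as sorted lists.
    """
    # The bot only needs to cover communities no human is already pulling in.
    wanted = set(known) - set(human_subscribed)
    to_subscribe = sorted(wanted - set(bot_subscribed))
    # Drop communities a human now follows (or that are no longer known).
    to_unsubscribe = sorted(set(bot_subscribed) - wanted)
    return to_subscribe, to_unsubscribe


plan = reconcile_bot(
    known={"tech@a.example", "cats@b.example", "news@c.example"},
    human_subscribed={"tech@a.example"},
    bot_subscribed={"tech@a.example", "news@c.example"},
)
print(plan)  # (['cats@b.example'], ['tech@a.example'])
```

Running this periodically keeps the bot's follows disjoint from the humans', so the instance keeps receiving every known community without the bot holding redundant subscriptions.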
I want to see how votes and comments accumulate on a post over time, so I would have to poll the "all" posts endpoint at a regular interval. But then I would either see new posts with a small number of comments/upvotes, or posts that have already been upvoted, or I would have to download all posts at every interval, which seems impossible to me.
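Whatever subset of posts ends up being re-polled, the accumulation itself can be captured by storing a snapshot per poll and diffing consecutive snapshots. This sketch assumes each polled item looks like a Lemmy post view, with `post.id` plus `counts.score` and `counts.comments`; check those field names against the API version you target.

```python
from collections import defaultdict
from datetime import datetime


class CountHistory:
    """Per-post snapshots of score and comment count from repeated polls."""

    def __init__(self) -> None:
        self.snapshots = defaultdict(list)  # post id -> [(time, score, comments)]

    def record(self, when: datetime, post_id: int, score: int, comments: int) -> None:
        self.snapshots[post_id].append((when, score, comments))

    def record_page(self, when: datetime, posts: list) -> None:
        """Store one polled page; the field layout is an assumption."""
        for view in posts:
            counts = view["counts"]
            self.record(when, view["post"]["id"], counts["score"], counts["comments"])

    def deltas(self, post_id: int) -> list:
        """(time, score change, comment change) between consecutive polls."""
        rows = sorted(self.snapshots.get(post_id, []))
        return [
            (t1, s1 - s0, c1 - c0)
            for (_, s0, c0), (t1, s1, c1) in zip(rows, rows[1:])
        ]
```

Only the changes between polls are interesting here, so the polling interval sets the time resolution of the statistics rather than requiring every post to be downloaded every time.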
Comments are also easy; the API allows pulling them sorted by latest too. If I were writing a search engine, I would probably just track all known instances and pull only local content from each of them instead of deduplicating. I haven't really looked at how votes are federated, though, so those might be more complicated to keep up to date.
I expect syncing posts and comments from all instances to be mostly easy. In the past I was able to pull all posts and comments from smaller instances in less than 10 minutes; it's mostly just text, so it doesn't take that long. Once the initial pull is done, the data can be kept mostly up to date by pulling only what is newer than the last date received, which should take much less time than the first sync.
I've noticed that a lot of content on Lemmy fails to federate to other instances. I think there is also currently a 3000€ reward for improving federation, so if you spend a lot of time on this, it might be worth checking whether it can be claimed. I don't really know how the milestone system works, though, and it might only be available to internal contributors.
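"Track all known instances" can be sketched as a breadth-first crawl. The sketch assumes Lemmy's `/api/v3/federated_instances` endpoint returns `{"federated_instances": {"linked": [...]}}`, where entries are either plain domain strings or objects with a `domain` key depending on the version; treat that shape as an assumption. The `linked` list can also include non-Lemmy servers, which the crawl simply skips when they fail to answer.

```python
import json
import urllib.request
from collections import deque


def parse_linked(data: dict) -> list:
    """Extract domains from a federated_instances response (shape assumed)."""
    linked = (data.get("federated_instances") or {}).get("linked", [])
    # Some versions list objects with a "domain" key, others plain strings.
    return [item["domain"] if isinstance(item, dict) else item for item in linked]


def linked_instances(domain: str, timeout: float = 10.0) -> list:
    """Ask one instance which other instances it knows about."""
    url = f"https://{domain}/api/v3/federated_instances"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_linked(json.load(resp))


def discover(seeds, neighbours=linked_instances, limit: int = 1000) -> set:
    """Breadth-first crawl of the federation graph from some seed domains.

    Stops expanding once at least `limit` domains are known.
    """
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(seen) < limit:
        domain = queue.popleft()
        try:
            found = neighbours(domain)
        except (OSError, ValueError):
            continue  # unreachable, not Lemmy, or not JSON: skip this domain
        for other in found:
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen
```

Each discovered domain would then be asked only for its own local content, so every post is fetched once from its home instance and no cross-instance deduplication is needed.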
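The "pull until the last date received" idea can be sketched as paging newest-first and stopping at the first item that is not newer than the stored timestamp. The endpoint (`/api/v3/post/list` or `/api/v3/comment/list` with `type_=Local` and `sort=New`) and the response keys (`posts`/`comments`, each item holding `id` and `published`) are assumptions based on Lemmy's v3 HTTP API; `fetch_local_page` is a hypothetical helper.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    """Parse an ISO-8601 timestamp; naive values are assumed to be UTC."""
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def fetch_local_page(domain: str, kind: str, page: int, limit: int = 50) -> list:
    """One page of an instance's own posts or comments, newest first.

    kind is "post" or "comment"; endpoint and parameters are assumptions.
    """
    query = urllib.parse.urlencode(
        {"type_": "Local", "sort": "New", "limit": limit, "page": page}
    )
    url = f"https://{domain}/api/v3/{kind}/list?{query}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp).get(kind + "s", [])


def sync_since(fetch_page, last_seen: datetime, kind: str = "post") -> list:
    """Collect items published after last_seen, paging newest first."""
    new_items, seen_ids, page = [], set(), 1
    while True:
        items = fetch_page(page)
        if not items:
            return new_items  # ran out of history
        for item in items:
            obj = item[kind]
            if parse_ts(obj["published"]) <= last_seen:
                return new_items  # everything from here on is already stored
            if obj["id"] not in seen_ids:  # pages can shift while we read them
                seen_ids.add(obj["id"])
                new_items.append(item)
        page += 1
```

For a real instance this might be called as `sync_since(lambda p: fetch_local_page("lemmy.example", "post", p), last_seen)`. Timestamps are parsed rather than compared as strings, because different precisions or timezone suffixes would break lexicographic ordering.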