I'm not a lawyer, but I know tech companies run social media platforms to build data models of their users for ad platforms. It seems to me that they could integrate themselves into a fediverse network and harvest data without even providing any services. So perhaps a software license could require that content users post to the platform is, by default, licensed under CC BY-NC-SA or something similar that would prevent this.
CC BY-NC-SA blocks adapting or republishing for commercial purposes. There is no general legal mechanism to stop a corporation from downloading your data and using it internally however they wish, although GDPR and its California equivalent, the CCPA, give people some additional rights here. Anyway, all of this is moot now that LLMs hoover up data from wherever they can, ignore all licensing, and blend everything together in the "training" process so it can't be deleted.
The fediverse is effectively the opposite of trying to limit the use of the data you post. If you aren't comfortable with the data being fully free and public, don't put it on the fediverse.
All it takes to pull down your content is a single "rogue" instance that doesn't abide by whatever rules you intended to apply to it.
I am a bit in over my head on what ActivityPub is capable of, but I imagine you could use ActivityPub, for something like a GitHub clone, to distribute git commits for AGPL-licensed source code across federated instances of the application. So I don't think it is necessarily the case that everything on the fediverse is available for anyone to use for any reason. The hexbear terms of service forbid anyone to "Use the Services to harvest, collect, gather or assemble information or data regarding the Services or users of the Services except as permitted in these Terms or in a separate agreement with Hexbear". Of course anyone can violate the rules, especially if they bury it in the plausible deniability of an LLM. But should we just accept that and lay the user data out for big tech companies to leverage?
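To make the git idea a bit more concrete, here is a rough sketch of what a federated commit announcement might look like as an ActivityStreams `Create` activity. The context URL, the `Create` and `Note` types, and the public addressing URI come from the ActivityStreams vocabulary; everything else is an assumption for illustration. In particular, the `commitHash` and `license` fields, the `build_commit_activity` helper, and the `forge.example` actor are hypothetical and not part of the core vocabulary.

```python
import json

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"
AS_PUBLIC = "https://www.w3.org/ns/activitystreams#Public"


def build_commit_activity(actor, commit_hash, message, license_id):
    """Wrap a commit-like object in a Create activity (illustrative only)."""
    commit_object = {
        "type": "Note",
        "attributedTo": actor,
        "content": message,
        "to": [AS_PUBLIC],
        # Hypothetical extension fields: not defined by core ActivityStreams.
        "commitHash": commit_hash,
        "license": license_id,
    }
    return {
        "@context": AS_CONTEXT,
        "type": "Create",
        "actor": actor,
        "to": [AS_PUBLIC],
        "object": commit_object,
    }


activity = build_commit_activity(
    actor="https://forge.example/users/alice",  # made-up actor URL
    commit_hash="0123abc",
    message="Fix typo in README",
    license_id="AGPL-3.0-or-later",
)
print(json.dumps(activity, indent=2))
```

Note that a field like `license` here is only metadata riding along with the post. Nothing in the payload forces a receiving instance to honor it, so any enforcement would still have to come from the license or terms of service themselves.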
I know we're not really talking about Git, but since you mentioned a federated GitHub clone of sorts, Radicle is pretty neat and a cool thing to check out!