ELI5: How are these massive adblock lists kept updated so regularly?
Do they get some kind of real-time feed that tells them "hey, this URL popped up on the web today, but it is a tracker, so block it", or is this mostly a crowdsourced effort?
The first language I learned was Perl, so regex are very close to my heart. I'm also quite excitable when I drink (I'm a happy drunk), so ask me and I'll give you a very enthusiastic explanation while not noticing that you aren't interested in my detailed explanation and examples. Do it. I dare ya.
Jamie Zawinski’s famous quip: “Some people, when confronted with a problem, think ‘I know, I’ll use regular expressions.’ Now they have two problems.”
In the issue comments you can see maintainers @-ing each other to add things to upstream lists, so it's all one big community effort rather than anything extension-specific.
There's the general pop-up blocker that is built into modern browsers now, but anything that blocks tracking and ads (for example uBlock Origin, AdGuard Home, Pi-hole...) works off a list that is kept up to date by crowdsourcing. I've never contributed to one of these efforts, but there are lots of people dedicated to the cause.
The problem with this approach is that the companies will just change the way ads are shown. DNS blocking is impossible to stop, provided you block every ad domain.
DNS blocking is easy to stop, you just host the ads on the same domain instead of putting them on a subdomain. There are plenty of ways to do this already. Only reason it works right now is that lots of them have their own separate ad domain that they host from.
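To make that concrete, here is a rough sketch of how a DNS-level blocker decides what to drop. It only ever sees the hostname, never the path, so an ad served from the site's own domain is invisible to it. The blocklist entries and the function names (`is_blocked`, `dns_would_block`) are made up for illustration and are not taken from Pi-hole or any other real tool.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist entries; real lists hold many thousands of hostnames.
BLOCKED_DOMAINS = {"ads.example.net", "tracker.example.org"}


def is_blocked(hostname, blocked=BLOCKED_DOMAINS):
    """Return True if the hostname or any of its parent domains is listed."""
    labels = hostname.lower().rstrip(".").split(".")
    # For "cdn.ads.example.net" check "cdn.ads.example.net",
    # then "ads.example.net", then "example.net", then "net".
    return any(".".join(labels[i:]) in blocked for i in range(len(labels)))


def dns_would_block(url):
    """A DNS blocker only sees the hostname of a request, not its path."""
    host = urlsplit(url).hostname or ""
    return is_blocked(host)


# Ad served from a dedicated ad domain: blocked.
print(dns_would_block("https://ads.example.net/banner.js"))
# Ad served from the same domain as the content: not blocked.
print(dns_would_block("https://news.example.com/ads/banner.js"))
```

The second URL gets through because blocking `news.example.com` would take the whole site down with it. That is the gap browser extensions fill, since they can match on the full URL and on page elements.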
As someone who runs a popular blocklist collection, I've come to find that most of the MASSIVE lists are people who collate a whole bunch of lists together and then promote their "one size fits all" solution alongside their donation link. There are very few original high quality ad-blocking lists maintained (where originality is defined as a sizeable amount of unique entries not shared by other lists) and almost all don't appear to openly discuss the magic sauce behind their lists, outside of the obvious case of user submissions.
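For anyone curious what "collating" and "unique entries" look like in practice, here is a minimal sketch, assuming the lists are either hosts-file style (`0.0.0.0 domain`) or one domain per line. The list names and contents are invented, and `parse_hosts` and `unique_share` are hypothetical helpers, not part of any real blocklist tooling.

```python
def parse_hosts(text):
    """Extract domains from hosts-file style or one-domain-per-line text."""
    domains = set()
    for line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # In "0.0.0.0 ads.example.net" the domain is the last field;
        # on a plain list it is the only field.
        domains.add(line.split()[-1].lower())
    return domains


def unique_share(name, lists):
    """Fraction of a list's entries that appear in no other list."""
    others = set().union(*(d for n, d in lists.items() if n != name))
    own = lists[name]
    return len(own - others) / len(own) if own else 0.0


# Made-up sample lists for illustration only.
lists = {
    "list_a": parse_hosts("0.0.0.0 ads.example.net\n0.0.0.0 tracker.example.org # shared\n"),
    "list_b": parse_hosts("tracker.example.org\nspy.example.com\n"),
}

# A "massive" collated list is just the union of its sources.
merged = set().union(*lists.values())
print(len(merged), "domains after merging")
for name in lists:
    print(name, "unique share:", unique_share(name, lists))
```

A list whose unique share is close to zero adds nothing beyond the lists it was built from, which is one rough way to tell an original list from a repackaged one.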
I bet somebody has an army of virtual browsers loading popular sites every hour or so, grabbing screenshots of the page, and submitting the screenshots to an image model to evaluate if a user would think it’s an advertisement or spam or scam.
Then the positives go to a volunteer to confirm before it goes into the blocker list.