Does anyone know of an off-the-shelf tool (online or offline) to find duplicates across several DNS blocklists and merge them into one?
Context: I am running AdGuard on a GL.iNet router with ~10 blocklists, some of them pretty huge. Most of the time when the lists are updated, the router grinds to a halt, and I often have to reboot it the old way, by powering it off and on.
I would rather download the lists myself from time to time and merge them into one file, with the duplicates removed somehow.
If I'm understanding you correctly, you could use a shell script for this. Use wget to download the lists, combine them into a single large file, and then create a new file with no duplicates using awk '!visited[$0]++' (this prints each line only the first time it is seen, so the original order is kept).
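For what it's worth, here is a rough sketch of how those steps might look as a script. It is only an illustration: urls.txt, the lists directory, the list_N.txt names and both function names are made up for the example, and it assumes comment lines start with # or ! (which is common in hosts-style and AdGuard-style lists, but check your own).

```shell
#!/bin/sh
# Sketch only: fetch several blocklists and merge them into one
# de-duplicated file. All file and function names here are hypothetical.

# Download every URL listed in file $1 (one per line) into directory $2,
# saving them as list_1.txt, list_2.txt, ...
download_lists() {
    urls_file=$1
    dest_dir=$2
    mkdir -p "$dest_dir"
    n=0
    while IFS= read -r url; do
        [ -n "$url" ] || continue      # skip empty lines in the URL file
        n=$((n + 1))
        wget -q -O "$dest_dir/list_$n.txt" "$url" || echo "failed: $url" >&2
    done < "$urls_file"
}

# Merge all remaining arguments (list files) into $1.
# - strips a trailing carriage return (lists saved with Windows line endings)
# - drops blank lines and lines starting with # or ! (assumed to be comments)
# - keeps only the first occurrence of each remaining line
merge_lists() {
    out=$1
    shift
    awk '{ sub(/\r$/, "") } NF && $0 !~ /^[#!]/ && !seen[$0]++' "$@" > "$out"
}
```

You would then run something like: download_lists urls.txt lists && merge_lists merged.txt lists/*.txt, and point the router at merged.txt. Keep in mind this only removes lines that are exactly identical, so the same domain written in two different syntaxes (for example 0.0.0.0 example.com in one list and ||example.com^ in another) will still appear twice.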
I doubt you’ll find something off the shelf for this. I wrote a PowerShell script that deduplicates lists and also does a pass over the results to convert any IP blocks to CIDR notation. If you’re interested, I’ll share it.
But honestly you could probably have ChatGPT whip this up for you in your language of choice. It’s pretty straightforward.
What you could do is manually combine the text files in a text editor such as Notepad++ and deduplicate from there. (Notepad++ can do it natively: Edit > Line Operations > Remove Duplicate Lines.)
Make sure you read what the different symbols mean with Wally’s blocklists before applying every blocklist. If you stick with the check-marked lists you should find that it blocks ads without too many false positives.
More blacklisted items don’t mean more items blocked; often, adding too many lists will break legitimate websites.