xkcd #2934: Bloom Filter
  • Well, yes and no. With a straight-up hash set, you're keeping set_size * bits_per_element bits in memory, plus whatever the overhead of the hash table is, which might not be tenable for very large sets. With a Bloom filter that has e.g. a ~1% false positive rate and an ideal k parameter (the number of hash functions; see e.g. the Bloom filter wiki article), you're only keeping ~10 bits per element, completely regardless of element size. That's because Bloom filters don't store the elements themselves or even their full hashes: they only tell you whether some element is probably in the set or not, and you can't e.g. enumerate the elements in the set. As an example of memory usage, a Bloom filter with a false positive rate of ~1% for 500 million elements would need 571 MiB. (Note that although the size of the filter doesn't grow when you insert elements, the false positive rate goes up once you go past that 500 million element count.)

    Lookup and insertion time complexity for a Bloom filter is O(k), where k is the parameter I mentioned. Since k is fixed when the filter is created, that's effectively O(1).

    Probabilistic set membership queries are mainly useful when you're dealing with ginormous sets of elements that you can't shove into a regular in-memory hash set. A good example in the wiki article is CDN cache filtering:

    Nearly three-quarters of the URLs accessed from a typical web cache are "one-hit-wonders" that are accessed by users only once and never again. It is clearly wasteful of disk resources to store one-hit-wonders in a web cache, since they will never be accessed again. To prevent caching one-hit-wonders, a Bloom filter is used to keep track of all URLs that are accessed by users. A web object is cached only when it has been accessed at least once before, i.e., the object is cached on its second request.
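
A minimal Python sketch of the ideas above, under my own assumptions: the filter is sized with the standard formulas m = -n ln p / (ln 2)^2 bits and k = (m/n) ln 2 hash functions, the k bit positions come from one SHA-256 digest via double hashing, and the names optimal_params, BloomFilter and should_cache are hypothetical. Each add or lookup touches exactly k bits (the O(k) mentioned above), and should_cache is one way to express the "cache on second request" policy from the quote.

```python
import hashlib
import math


def optimal_params(n: int, p: float) -> tuple[int, int]:
    """Standard optimal sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes."""
    m = math.ceil(-n * math.log(p) / math.log(2) ** 2)
    k = max(1, round(m / n * math.log(2)))
    return m, k


class BloomFilter:
    """Minimal Bloom filter sketch sized for n elements at false positive rate p."""

    def __init__(self, n: int, p: float):
        self.m, self.k = optimal_params(n, p)
        self.bits = bytearray((self.m + 7) // 8)

    def _indexes(self, item: str) -> list[int]:
        # Double hashing: derive k positions h1 + i*h2 (mod m) from one digest.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so h2 is never 0
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        # O(k): set k bits; the element itself is never stored.
        for idx in self._indexes(item):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def might_contain(self, item: str) -> bool:
        # O(k): False means "definitely not in the set", True means "maybe".
        return all(self.bits[idx // 8] >> (idx % 8) & 1 for idx in self._indexes(item))


def should_cache(url: str, seen: BloomFilter) -> bool:
    """Cache-on-second-request: admit a URL only if it was (probably) requested before."""
    if seen.might_contain(url):
        return True
    seen.add(url)  # first request: remember the URL, but don't cache it yet
    return False


# Sizing check for 500 million elements at ~1% false positives (no bit array allocated).
m, k = optimal_params(500_000_000, 0.01)
print(f"{m / 500_000_000:.2f} bits per element, k = {k}, {m / 8 / 2**20:.0f} MiB")
# prints: 9.59 bits per element, k = 7, 571 MiB
```

With this policy a false positive just means an occasional one-hit-wonder gets cached anyway, which is the cheap direction to be wrong in.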

  • Which example do you mean?

    If you meant my user ID example, you'd prepopulate the Bloom filter with the existing user IDs, e.g. on service startup or whatever, and then update the filter every time a new user ID is added. Keep in mind that the false positive rate will grow as more IDs are added, and that at some point you may need to create a new filter with a bigger backing bit array.
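
A sketch of that lifecycle in Python, assuming a hypothetical load_all_user_ids callable that reads every ID from the database. A Bloom filter can't enumerate its own elements, so a rebuild has to go back to the source of truth. The class name UserIdFilter, the 2x headroom and the 1,024-element minimum are arbitrary illustrative choices, not anything prescribed.

```python
import hashlib
import math
from typing import Callable, Iterable


class UserIdFilter:
    """Hypothetical Bloom filter of user IDs, rebuilt with a bigger bit array
    once it holds more IDs than it was sized for."""

    def __init__(self, load_all_user_ids: Callable[[], Iterable[str]], p: float = 0.01):
        self.load_all_user_ids = load_all_user_ids  # assumed DB query, not a real API
        self.p = p
        self.rebuild()

    def rebuild(self) -> None:
        ids = list(self.load_all_user_ids())
        # Size for twice the current count to leave headroom for new users.
        self.capacity = max(1024, 2 * len(ids))
        self.m = math.ceil(-self.capacity * math.log(self.p) / math.log(2) ** 2)
        self.k = max(1, round(self.m / self.capacity * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)
        self.count = 0
        for user_id in ids:
            self._insert(user_id)

    def _indexes(self, user_id: str):
        # k hash functions: BLAKE2b over the ID prefixed with the hash number.
        for i in range(self.k):
            digest = hashlib.blake2b(f"{i}:{user_id}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "big") % self.m

    def _insert(self, user_id: str) -> None:
        for idx in self._indexes(user_id):
            self.bits[idx // 8] |= 1 << (idx % 8)
        self.count += 1

    def add(self, user_id: str) -> None:
        """Call after the new user ID has been committed to the database."""
        self._insert(user_id)
        if self.count > self.capacity:
            # Past the design capacity the false positive rate keeps climbing,
            # so rebuild from the database with a bigger bit array.
            self.rebuild()

    def might_contain(self, user_id: str) -> bool:
        return all(self.bits[idx // 8] >> (idx % 8) & 1 for idx in self._indexes(user_id))
```

In use, something like `UserIdFilter(lambda: db_rows)` at startup and `filter.add(new_id)` after each signup; the rebuild cost is paid only occasionally.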

  • That's definitely not what they're most useful for. I mean, you probably can use a Bloom filter to implement spell check, but saying that's where they're most useful severely misses the point of probabilistic set membership queries.

    Bloom filters and their relatives are great when you have a huge set of values – e.g. hundreds of millions of user IDs in some database – and you want a very fast way of checking whether some value might be in that set without having to query the database. Naturally, this assumes that you've prepopulated a Bloom filter with whatever values you need to be checking.

    If the result of the Bloom filter query is "nope", you know that the value's definitely not in the set, but if the result is "maybe" then you can go ahead and double-check by querying the database. This means that checks for values that aren't in the set almost never have to hit that slow DB, so even though you'll get some false positives, this'll still be much, much faster than having to go through the DB every time.
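
A sketch of that check-then-confirm flow in Python, with an in-memory SQLite table standing in for the slow database. The users table layout, the user_exists helper, the stats counters and the deliberately generous filter size (65,536 bits for 1,000 IDs) are all assumptions for illustration.

```python
import hashlib
import sqlite3


class BloomFilter:
    """Tiny Bloom filter: m bits, k hash functions (BLAKE2b over "i:item")."""

    def __init__(self, m: int = 1 << 16, k: int = 7):
        self.m, self.k = m, k
        self.bits = bytearray(m // 8)

    def _indexes(self, item: str):
        for i in range(self.k):
            digest = hashlib.blake2b(f"{i}:{item}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, item: str) -> None:
        for idx in self._indexes(item):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[idx // 8] >> (idx % 8) & 1 for idx in self._indexes(item))


def user_exists(user_id: str, bloom: BloomFilter, conn: sqlite3.Connection, stats: dict) -> bool:
    if not bloom.might_contain(user_id):
        stats["skipped_db"] += 1
        return False  # "nope": definitely not in the set, no query needed
    stats["db_queries"] += 1
    row = conn.execute("SELECT 1 FROM users WHERE id = ?", (user_id,)).fetchone()
    return row is not None  # "maybe": the database has the final say


# Prepopulate the filter from the table, then check one known and many unknown IDs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO users VALUES (?)", [(f"user-{i}",) for i in range(1000)])
bloom = BloomFilter()
for (user_id,) in conn.execute("SELECT id FROM users"):
    bloom.add(user_id)

stats = {"skipped_db": 0, "db_queries": 0}
print(user_exists("user-42", bloom, conn, stats))  # prints: True
misses = sum(user_exists(f"stranger-{i}", bloom, conn, stats) for i in range(1000))
print(misses, stats)
```

A false positive only costs one extra query; it can never make user_exists return the wrong answer, because the database confirms every "maybe".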

  • And now you get the bad ending
  • I stopped playing the game after I ran into the first bullet-sponge boss. I tried several times to beat it with my all-stealth character and realized I'd probably have to start over.

  • The inside story of Elon Musk’s mass firings of Tesla Supercharger staff
  • Musk, the employees said, was not pleased with Tinucci’s presentation and wanted more layoffs. When she balked, saying deeper cuts would undermine charging-business fundamentals, he responded by firing her and her entire 500-member team.

    The dude's a petulant child. No wonder conservatives fawn over him.

  • You're all individuals!
  • And not just any Paul Atreides but the OG one.

    I know I'm supposed to hate the '84 Dune with a burning passion, but I just love it. Sure it's weird and campy in places, but that's why I like it.

  • I don't need any of that in my silly little life at all
  • This reeks of someone who uses the word "woke" unironically

    Edit: Curiosity got the better of me. Turns out not only do they use the word "woke" unironically, they also seem to think that grown men dating teenagers is A-OK, because of course they do.

    Imagine my utter lack of surprise.

  • Political Memes @lemmy.world hydroptic @sopuli.xyz
    I don't need any of that in my silly little life at all
    Dune Memes @lemmy.blahaj.zone hydroptic @sopuli.xyz
    You're all individuals!
    Dune Memes @lemmy.blahaj.zone hydroptic @sopuli.xyz
    I don't know about "narrow"
    Political Memes @lemmy.world hydroptic @sopuli.xyz
    "His name is Mongo"
    Timo Mallikas

    (not the original, found it on the internets)

    Waltham, unknown model, early 20th century

    This watch has been in my family for a while, and its story is pretty classic. I'm Finnish, and a branch of my family tree ended up migrating to the US in the 19th century. Some eventually came back, and one who had been a train dispatcher brought this watch with him.

    I just had it repaired and cleaned, so now it runs perfectly again and looks great.

    The bird that came back from the dead by evolving twice [LiveScience]

    The flightless Aldabra rail went extinct 136,000 years ago when its atoll home sank beneath the waves. Then it evolved again.

    It's great that this article linked to the original journal article. Nice that it's open access, too! So good to see that it's becoming more common. The academic publishing business is just so… well, in a word, fucked.

    Hardware, Richard Stanley (1990)

    Hardware is a low-budget sci-fi horror movie starring Dylan McDermott and Stacey Travis. It was the directorial debut of Richard Stanley, who is notable for being the initial director of the notorious '90s film adaptation of The Island of Dr. Moreau.

    I wouldn't call it a good movie, exactly, but it's not terrible either and it definitely has its moments. Stanley's style is pleasantly weird, and the aesthetics are sometimes really on point.

    Here are the rest of my screenshots (sorry about the stupid rounded corners on all of them; my video player insists on including those in screenshots):

    Political Memes @lemmy.world hydroptic @sopuli.xyz
    Landlords counting their money
    hydroptic @sopuli.xyz
    Posts 106
    Comments 958