# Bloom filter

## Metadata
- Author: [[Wikimedia Foundation, Inc.]]
- Full Title: Bloom filter
- Category: #articles
- Summary: In computing, a Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set.
## Highlights
- [False positive](https://en.wikipedia.org/wiki/Type_I_and_type_II_errors) matches are possible, but [false negatives](https://en.wikipedia.org/wiki/Type_I_and_type_II_errors) are not – in other words, a query returns either "possibly in set" or "definitely not in set". ([View Highlight](https://read.readwise.io/read/01jr3atv2nc055k2jr03m79cf7))
- Bloom proposed the technique for applications where the amount of source data would require an impractically large amount of memory if "conventional" error-free [hashing](https://en.wikipedia.org/wiki/Hash_function) techniques were applied. ([View Highlight](https://read.readwise.io/read/01jr3awn20phvw3bhk1ksz2dev))
- An *empty Bloom filter* is a [bit array](https://en.wikipedia.org/wiki/Bit_array) of m bits, all set to 0. It is equipped with k different [hash functions](https://en.wikipedia.org/wiki/Hash_function), which map set elements to one of the m possible array positions. ([View Highlight](https://read.readwise.io/read/01jr3axv3wpqfjza91wy1wzrkb))
- To *add* an element, feed it to each of the k hash functions to get k array positions. Set the bits at all these positions to 1. ([View Highlight](https://read.readwise.io/read/01jr3az4hh6mst1b4j1bwkpcxs))
- Removing an element from this simple Bloom filter is impossible because there is no way to tell which of the k bits it maps to should be cleared. ([View Highlight](https://read.readwise.io/read/01jr3b062w754azq0ycq69a6sp))
- Bloom filters also have the unusual property that the time needed either to add items or to check whether an item is in the set is a fixed constant, O(k), completely independent of the number of items already in the set. ([View Highlight](https://read.readwise.io/read/01jr3hm477wcpsd1f1pj8jj754))
- Assume that a [hash function](https://en.wikipedia.org/wiki/Hash_function) selects each array position with equal probability. If *m* is the number of bits in the array, the probability that a certain bit is not set to 1 by a certain hash function during the insertion of an element is
1 − 1 m . {\displaystyle 1-{\frac {1}{m}}.}  ([View Highlight](https://read.readwise.io/read/01jr3b35pp3mt67yxgym5qwrnm))
- Now test membership of an element that is not in the set. Each of the *k* array positions computed by the hash functions is 1 with a probability as above. The probability of all of them being 1, which would cause the [algorithm](https://en.wikipedia.org/wiki/Algorithm) to erroneously claim that the element is in the set, is often given as
ε = ( 1 − [ 1 − 1 m ] k n ) k ≈ ( 1 − e − k n / m ) k . {\displaystyle \varepsilon =\left(1-\left[1-{\frac {1}{m}}\right]^{kn}\right)^{k}\approx \left(1-e^{-kn/m}\right)^{k}.} ![{\displaystyle \varepsilon =\left(1-\left[1-{\frac {1}{m}}\right]^{kn}\right)^{k}\approx \left(1-e^{-kn/m}\right)^{k}.}](https://wikimedia.org/api/rest_v1/media/math/render/svg/de73929baec5fd76dde95874189051648c635b1d) ([View Highlight](https://read.readwise.io/read/01jr3b23x9gejrd2ppvzzbjmx5))
- This is not strictly correct as it assumes independence for the probabilities of each bit being set. However, assuming it is a close approximation we have that the probability of false positives decreases as *m* (the number of bits in the array) increases, and increases as *n* (the number of inserted elements) increases. ([View Highlight](https://read.readwise.io/read/01jr3b2wz7dk0wje5x7w5sktq0))
- The number of hash functions, *k*, must be a positive integer. Putting this constraint aside, for a given *m* and *n*, the value of *k* that minimizes the false positive probability is
k = m n ln 2. {\displaystyle k={\frac {m}{n}}\ln 2.}  ([View Highlight](https://read.readwise.io/read/01jr3ctp87k2rtwd22p4gwbpwt))
- Some kinds of [superimposed code](https://en.wikipedia.org/wiki/Superimposed_code) can be seen as a Bloom filter implemented with physical [edge-notched cards](https://en.wikipedia.org/wiki/Edge-notched_card). An example is [Zatocoding](https://en.wikipedia.org/wiki/Zatocoding), invented by [Calvin Mooers](https://en.wikipedia.org/wiki/Calvin_Mooers) in 1947, in which the set of categories associated with a piece of information is represented by notches on a card, with a random pattern of four notches for each category. ([View Highlight](https://read.readwise.io/read/01jr3b3nwzr3sn46y2bchdf5pp))