Visualization of the Bloom Filter Concept (2023 P. Walther)

Bloom Filter as Tool to Organize Complex and Big Geospatial Data

Project Description

Bloom filters are probabilistic data structures that represent sets of data points compactly: a membership query may return a false positive but never a false negative. Their advantage lies in efficiently detecting recurring elements in data streams. So far, they have mainly been used for network routing tasks on relatively small data sets. With this project, we want to explore how they can be used in complex and big data environments, such as big geospatial data sets.
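To make the concept concrete, here is a minimal sketch of a classic Bloom filter with m bits and k index positions per element. It is an illustration only, not the project's implementation. Deriving the k positions from one SHA-256 digest via double hashing (g_i = h1 + i * h2 mod m), and the names `BloomFilter`, `add` and the example keys, are assumptions chosen for this example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter with m bits and k index positions (illustrative sketch)."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)  # m bits, all initially 0

    def _indices(self, item: str):
        # Assumption: two 64-bit hashes taken from one SHA-256 digest,
        # combined by double hashing g_i(x) = h1(x) + i * h2(x) mod m.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little")
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        # Set all k bits belonging to the item.
        for idx in self._indices(item):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def __contains__(self, item: str) -> bool:
        # True means "possibly seen before"; False means "definitely not seen".
        return all(self.bits[idx // 8] & (1 << (idx % 8)) for idx in self._indices(item))


# Usage: flag recurring keys in a stream (hypothetical tile identifiers).
stream = ["tile/12/2200/1343", "tile/12/2201/1343", "tile/12/2200/1343"]
seen = BloomFilter(m=1024, k=5)
for key in stream:
    if key in seen:
        print("possible repeat:", key)
    else:
        seen.add(key)
```

Because bits are only ever set, an element that was added is always reported as present; the price for the small, fixed memory footprint is an occasional false positive for elements that were never added.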

Project Goals

To evaluate and extend the current use of Bloom filters, we pursue three goals:

The first goal is to break down the barriers that currently make it difficult to use Bloom filters for particularly large data sets. The configuration of Bloom filters is subject to tight constraints: for example, the number of hash functions must currently be an integer, the filter length should be a power of two, and all queries usually have the same error probability. Furthermore, Bloom filters are rarely used for anything beyond simple membership testing. Building on a large body of prior work, we aim to investigate these limitations systematically and thereby significantly expand the possible usage scenarios for Bloom filters.
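Two of these constraints can be illustrated with the textbook approximations. For n inserted elements, m bits and k hash functions, the false-positive rate is approximately p(k) = (1 - e^(-kn/m))^k, which is minimized at k* = (m/n) ln 2, a value that is generally not an integer. The power-of-two length is usually motivated by cheap index reduction with a bitmask. The sketch below uses these standard formulas; the multiply-shift reduction `fast_range` (due to Lemire) is one known workaround for arbitrary m, shown as an assumption and not as the approach taken in this project.

```python
import math


def false_positive_rate(m: int, n: int, k: float) -> float:
    """Standard approximation (1 - e^{-kn/m})^k; also defined for real-valued k."""
    return (1.0 - math.exp(-k * n / m)) ** k


def optimal_k(m: int, n: int) -> float:
    """Real-valued minimizer k* = (m / n) * ln 2 of the approximation above."""
    return (m / n) * math.log(2)


def index_pow2(h: int, m: int) -> int:
    """Bitmask reduction of a hash h to [0, m); only valid if m is a power of two."""
    if m <= 0 or m & (m - 1) != 0:
        raise ValueError("m must be a power of two")
    return h & (m - 1)


def fast_range(h: int, m: int) -> int:
    """Multiply-shift reduction of a 64-bit hash h to [0, m) for any m > 0."""
    if not 0 <= h < 2**64:
        raise ValueError("h must be a 64-bit unsigned value")
    return (h * m) >> 64


m, n = 10_000, 1_000  # 10 bits per element
k_star = optimal_k(m, n)  # about 6.93, i.e. not an integer
for k in (math.floor(k_star), math.ceil(k_star), k_star):
    print(f"k = {k:.3f}: p = {false_positive_rate(m, n, k):.6f}")
```

Comparing the two neighboring integers with the real-valued optimum shows how much the integer constraint costs for a given configuration; `fast_range` avoids the modulo operation without requiring m to be a power of two.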

The second goal is to implement a benchmarking environment in which the theoretical developments can be evaluated on actual hardware. This includes experiments on specialized hardware (e.g., FPGAs and GPUs) to illustrate possibilities for future development.

The third goal concerns the use of Bloom filters as a data structure for complex geodata, in particular sparse 2D and 3D data from geoinformatics.
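As a purely hypothetical illustration of how sparse 2D or 3D data could be mapped onto a Bloom filter, the sketch below snaps points to the integer indices of a regular grid and inserts only the occupied cells. Its memory use depends on m rather than on the extent of the bounding box. The class `CellBloom`, the fixed `cell_size`, the string encoding of cell keys and the SHA-256 double hashing are all assumptions made for this example, not the project's method.

```python
import hashlib
import math
from typing import Iterable, List, Tuple

Point = Tuple[float, ...]  # 2D or 3D coordinates


class CellBloom:
    """Bloom filter over occupied grid cells of sparse point data (hypothetical sketch)."""

    def __init__(self, m: int, k: int, cell_size: float) -> None:
        self.m, self.k, self.cell_size = m, k, cell_size
        self.bits = bytearray((m + 7) // 8)

    def _cell(self, p: Point) -> Tuple[int, ...]:
        # Snap a point to the integer index of the grid cell containing it.
        return tuple(math.floor(c / self.cell_size) for c in p)

    def _indices(self, cell: Tuple[int, ...]) -> List[int]:
        # Assumption: cell key encoded as "i,j[,l]" and hashed with SHA-256 double hashing.
        key = ",".join(map(str, cell)).encode("ascii")
        d = hashlib.sha256(key).digest()
        h1 = int.from_bytes(d[:8], "little")
        h2 = int.from_bytes(d[8:16], "little")
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add_points(self, points: Iterable[Point]) -> None:
        for p in points:
            for idx in self._indices(self._cell(p)):
                self.bits[idx >> 3] |= 1 << (idx & 7)

    def may_be_occupied(self, p: Point) -> bool:
        """False: the cell of p is certainly empty; True: it is probably occupied."""
        return all((self.bits[idx >> 3] >> (idx & 7)) & 1 for idx in self._indices(self._cell(p)))


cloud = [(0.3, 0.4, 1.2), (10.1, 5.5, 0.0), (10.4, 5.9, 0.2)]
grid = CellBloom(m=4096, k=4, cell_size=1.0)
grid.add_points(cloud)
print(grid.may_be_occupied((10.9, 5.1, 0.7)))  # True: same cell (10, 5, 0) as two inserted points
```

A query for an empty cell returns False unless all of its k bits happen to be set by other cells, so such a structure could serve as a cheap pre-filter before accessing the full point data.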

Acknowledgement

This project is kindly supported by the DFG (German Research Foundation), project number 507196470.


© 2020 M. Werner