Andrii Gakhov

SE Radio 358: Probabilistic Data Structure for Big Data Problems

Andrii Gakhov, author of the book Probabilistic Data Structures and Algorithms for Big Data Applications, talks about probabilistic data structures (PDS) and their application to the big data domain. Host Robert Blumen spoke with Dr. Gakhov about how probabilistic data structures differ from their exact counterparts; hash functions, both cryptographic and non-cryptographic; space versus accuracy tradeoffs; space versus processing time tradeoffs; and the main problem domains: membership testing, cardinality, frequency, similarity, and rank. The discussion covers Bloom filters for membership testing (performance characteristics, use cases, design patterns that use Bloom filters for lookup problems, and how they are implemented); Linear Counting and HyperLogLog for cardinality estimation (use cases in web applications and implementation); Count-Min Sketch for frequency estimation; existing library support; and whether PDS should be taught in beginning courses.
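
To make the membership-testing idea concrete, here is a minimal Bloom filter sketch in Python. It is an illustration only, not code from the episode or the book. The class name, parameter names, and the use of SHA-256 with double hashing are assumptions chosen so the example needs only the standard library; production implementations typically use a fast non-cryptographic hash instead.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter: no false negatives, tunable false positives."""

    def __init__(self, capacity: int, error_rate: float = 0.01):
        # Standard sizing formulas for n expected items and target error p:
        #   bits   m = -n * ln(p) / (ln 2)^2
        #   hashes k = (m / n) * ln 2
        self.num_bits = max(1, math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing (an assumption of this sketch): derive k bit positions
        # from two 64-bit halves of a single SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        # Set every bit position selected for this item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


seen = BloomFilter(capacity=1000, error_rate=0.01)
for n in range(1000):
    seen.add(f"user-{n}")
```

Every added key is always reported as present, while a key that was never added is reported as present only with a small probability, roughly the configured error rate once the filter holds its planned number of items. That is the space versus accuracy tradeoff in miniature: the filter stores about 1.2 KB of bits here instead of the keys themselves.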
