Adar Lieber-Dembo from Cloudera discusses Apache Kudu, a columnar data storage system built for fast analytics and fast ingestion of large datasets. Kudu draws its inspiration from systems in the Hadoop ecosystem but addresses many of their shortcomings. SE Radio's Akshay Manchale spoke with Adar about the motivations behind building Kudu, the features available for users to ingest and query data, and the operational aspects of running Kudu. They also discussed partitioning and distributing data across a Kudu cluster, high-availability features, HybridTime, and Kudu's integration with other data analysis and SQL engines. The interview ends with a brief discussion of the advantages of column-based storage in databases.
Show Notes
Related Links
- Hadoop and HDFS
- Apache Kudu
- Kudu Paper
- Apache Cassandra
- Apache Impala
- Apache Spark
- Raft Consensus Algorithm
- Column-Oriented DBMS
Related Episodes
- Episode 157: Hadoop with Philip Zeyliger
- Episode 272: Frances Perry on Apache Beam
- Episode 179: Cassandra with Jonathan Ellis
- Episode 233: Fangjin Yang on OLAP and the Druid Real-Time Analytical Data Store
- Episode 277: Gil Tene on Tail Latency
- Episode 255: Monica Beckwith on Java Garbage Collection
SE Radio theme: “Broken Reality” by Kevin MacLeod (incompetech.com — Licensed under Creative Commons: By Attribution 3.0)