Sean Knapp, CEO of Ascend.io, discusses data pipelines and data pipeline automation. Sean spoke with Host Robert Blumen about the ubiquity of data pipelines; what data pipelines do; where the data comes from, how it is transformed, where it goes;...
Adar Lieber-Dembo from Cloudera discusses Apache Kudu, which is a columnar data storage system for fast analytics and fast ingestion of large datasets. Kudu takes its inspiration from systems in the Hadoop ecosystem, but it addresses many of their...
Andrii Gakhov, author of the book Probabilistic Data Structures and Algorithms for Big Data Applications, talks about probabilistic data structures and their application to the big data domain. Host Robert Blumen spoke with Dr. Gakhov about how...
Stephen Ewen, one of the original creators of Apache Flink, discusses streaming architecture. Streaming architecture has become more important because it enables real-time computation on big data. Edaena Salinas spoke with Stephen Ewen about the...
Venue: Internet Jeff Meyerson talks to Haoyuan Li about Alluxio, a memory-centric distributed storage system. The costs of memory and disk capacity are both decreasing every year, but only the throughput of memory is increasing exponentially. This...
Venue: Internet Ben Hindman talks to Jeff Meyerson about Apache Mesos, a distributed systems kernel. Mesos abstracts away many of the hassles of managing a distributed system. Hindman starts with a high-level explanation of Mesos, explaining the...
Venue: Internet Jeff Meyerson talks to Jun Rao, a software engineer and researcher (formerly of LinkedIn). Jun has spent much of his time researching MapReduce, scalable databases, query processing, and other facets of the data warehouse. For the...
Recording Venue: Skype Guest: Grant Ingersoll Grant Ingersoll, founder of the Mahout project, talks with Robert about machine learning. The conversation begins with an introduction to machine learning and the forces driving the adoption of this...
Dave explains why reading source code is at least as important a skill as writing source code. He shares approaches for how to get to grips with unknown and undocumented source code even if it is non-trivial in size. He finishes with advice for how...