Yi Pan, lead maintainer of Apache Samza discusses the internals of the Samza project as well as the Stream Processing ecosystem. Host Adam Conrad spoke with Pan about the three core aspects of the Samza framework, how it compares to other streaming...
Adar Lieber-Dembo from Cloudera discusses Apache Kudu, which is a columnar data storage system for fast analytics and fast ingestion of large datasets. Kudu takes its inspiration from systems in the Hadoop ecosystem, but it addresses many of their...
Venue: Internet Ben Hindman talks to Jeff Meyerson about Apache Mesos, a distributed systems kernel. Mesos abstracts away many of the hassles of managing a distributed system. Hindman starts with a high-level explanation of Mesos, explaining the...
Venue: Internet Flavio Junqueira is the author of Zookeeper: Distributed Process Coordination. Flavio and Jeff Meyerson begin by defining ZooKeeper and talking about what ZooKeeper is and isn’t. ZooKeeper can be thought of as a patch against certain...
Venue: Internet Jeff Meyerson talks to Jun Rao, a software engineer and researcher (formerly of LinkedIn). Jun has spent much of his time researching MapReduce, scalable databases, query processing, and other facets of the data warehouse. For the...
Recording Venue: Skype Guest: Grant Ingersoll Grant Ingersoll, founder of the Mahout project, talks with Robert about machine learning. The conversation begins with an introduction to machine learning and the forces driving the adoption of this...