Venue: Internet
Jeff Meyerson talks to Jun Rao, a software engineer and researcher (formerly of LinkedIn). Jun has spent much of his time researching MapReduce, scalable databases, query processing, and other facets of the data warehouse. For the past three years, he has been a committer to the Apache Kafka project. Jeff and Jun first compare streaming to messaging, and the frameworks that support each. Kafka is a big data messaging or pub/sub system. Traditionally, these are two different types of systems, but the lines have become blurred recently. Kafka can also be looked at as a distributed commit log. Next, they discuss the vocabulary of Kafka, including producers and consumers. They wrap up by exploring Kafka from the perspective of durability and reliability and discuss some failure cases.
Show Notes
Related Links
- Apache Kafka: http://kafka.apache.org
- Original Kafka paper: http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
- Kafka Basic Training: http://www.slideshare.net/miguno/apache-kafka-08-basic-training-verisign
- Building LinkedIn’s Real-time Activity Data Pipeline: http://sites.computer.org/debull/A12june/pipeline.pdf
- Kafka: A Little Introduction: https://speakerdeck.com/pingles/kafka-a-little-introduction
- Apache Storm: https://storm.apache.org
- Apache Samza: http://samza.incubator.apache.org
- Apache Zookeeper: http://zookeeper.apache.org
Hi!
It feels like the Download link is broken… I cannot download the mp3. Could you please have a look?
Thanks!
Ivan
It works now. Could you please try it again? Perhaps libsyn was down momentarily?
[…] Apache Kafka (podcast) – very good introduction to the Apache Kafka project. At least I now know where it’s applicable and some of its internals. […]
I’ve recently been enlightened by a few of your podcasts. Zookeeper keeps coming up in the conversation. It would be good to hear a bit more about it.