Venue: Internet
Kyle Kingsbury, known as Aphyr on Twitter and for his blog by the same name, talks to Stefan Tilkov about consensus in distributed systems and about his experience in testing systems to see how they behave in case of failures. In addition to discussing some of the theoretical foundations, such as the CAP theorem, isolation levels, and consensus protocols, Kyle talks about some specific databases, including MongoDB, Riak, and Redis, and discusses how they maintain and achieve — or fail to achieve — a consistent state. Finally, there’s some advice for practitioners on how to pick a solution and understand its properties.
Show Notes
Related Links
- Kyle’s website https://aphyr.com/
- Call me maybe – the Jepsen blog post series https://aphyr.com/tags/Jepsen
- ANSI SQL Isolation Levels http://docs.oracle.com/cd/B12037_01/server.101/b10743/consist.htm#sthref1919
- CAP Theorem http://dl.acm.org/citation.cfm?id=564601
- CRDTs: Consistency without concurrency control http://pagesperso-systeme.lip6.fr/Marc.Shapiro/papers/RR-6956.pdf?
- Dynamo http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
- Postgres http://www.postgresql.org/
- Redis http://redis.io/
- Redis’ Sentinel protocol http://redis.io/topics/sentinel
- MongoDB https://www.mongodb.org/
- Riak http://basho.com/products/riak-kv/
- ZooKeeper http://zookeeper.apache.org/
- NuoDB http://www.nuodb.com/
- Kafka https://kafka.apache.org/
- Cassandra https://cassandra.apache.org/
- RabbitMQ https://www.rabbitmq.com/
- Consul https://www.consul.io/
- etcd https://github.com/coreos/etcd
- Elasticsearch https://www.elastic.co/products/elasticsearch
- Raft https://raft.github.io/
- Clojure http://clojure.org/
- TLA+ http://www.tlaplus.net/
- Spin http://spinroot.com/spin/whatispin.html
- QuickCheck http://www.quviq.com/products/erlang-quickcheck/
- FoundationDB https://foundationdb.com/
- Paxos http://research.microsoft.com/en-us/um/people/lamport/pubs/paxos-simple.pdf
- ZAB (ZooKeeper Atomic Broadcast) http://www.tcs.hut.fi/Studies/T-79.5001/reports/2012-deSouzaMedeiros.pdf
- Knossos https://github.com/aphyr/knossos
- Christopher Meiklejohn’s website http://christophermeiklejohn.com/
- Strong Consistency Models https://aphyr.com/posts/313-strong-consistency-models
- Peter Bailis’s website http://www.bailis.org/
Thanks for the great podcast. A fantastic explanation of CAP! I also found the review of the various products extremely useful. Very relevant to the problem domain I’m currently working on. We are considering using ELK as a source of record for auditing. After hearing the problems discovered with Elasticsearch, I’m wondering if we want to rethink that decision.
Just a small nit. The audio was a bit muffled on Kyle’s end. Maybe guests could be encouraged to use a headset?
Fantastic episode. I’m going to have to listen to it more closely or just read Tilkov’s work.
It was a pity the audio quality was so bad. I am interested in the topic, but could not understand either of them very well (particularly Kyle). Ended up just deleting it after ten minutes of perseverance.
Does anybody know exactly what Kyle’s words were in this podcast around the time he said something about operating Elasticsearch in prod being a “tire fire”? He said he would not recommend it to anyone who ___?
Audio quality is too bad!