Software engineer Alex Boten, author of Cloud-Native Observability with OpenTelemetry, joins host Robert Blumen for a conversation about software telemetry and the OpenTelemetry project. After a brief review of the topic and the OpenTelemetry...
Jeffery D. Smith, author of Operations Anti-Patterns, DevOps Solutions, discusses anti-patterns in software development organizations and how they can be fixed. Host Robert Blumen spoke with Smith about why he chose to focus on what can go wrong;...
Jamie Riedesel, author of the book Software Telemetry, discusses software telemetry, why telemetry data is so important, and the discipline of tracing, logging, and monitoring infrastructure. Host Gavin Henry spoke with Riedesel about what telemetry is...
Rob Skillington discusses the difficulty of scaling monitoring and alerting to high-dimensional spaces, such as those typically found in modern applications at scale. Topics include high cardinality versus high-dimensional spaces. The episode begins with a review of...
Chris Richardson of microservices.io, author of the book Microservices Patterns, discusses microservices patterns, which constitute a set of best practices and building-block solutions to problems inherent in building applications out of small...
Felienne talks to Diomidis Spinellis about debugging. The discussion covers: defining “debugging,” which can mean either using a debugger or the process of finding and removing bugs; how it is best done; variation across different programming languages or...
Ben Sigelman, CEO of LightStep and co-author of the OpenTracing standard, discusses distributed tracing, a form of event-driven observability useful in debugging distributed systems, understanding latency outliers, and delivering “white box” analytics...
Edaena Salinas talks with Tammy Butow about Chaos Engineering. Topics include the factors that caused Chaos Engineering to emerge, the different types of chaos that can be introduced into a system, and how to structure experiments. Some of the chaos...
Bryan Reinero talks with Jason Hand about handling outages and responding to failures. The episode explores basic problem-solving strategies and diagnostic techniques, organizing teams to address incidents efficiently, communicating with...
Venue: QCon San Francisco 2016. Gil Tene joins Robert Blumen for a discussion of tail latency. What is latency? What is “tail latency”? Why are the upper percentiles of latency more relevant to humans? How is human interaction with an...