Costa Alexoglou, co-founder of the open source Hopp pair-programming application, talks with host Brijesh Ammanath about remote pair programming. They start with a quick introduction to pair programming and its importance to software development before discussing the various problems with the current toolset available and the challenges that tool developers face in enabling pair programming. They consider the key features necessary for a good pair-programming tool, and then Costa describes the journey of building Hopp and the challenges faced while building it.
Brought to you by IEEE Computer Society and IEEE Software magazine.
Show Notes
Related Episodes
- SE Radio 682: Duncan McGregor and Nat Pryce on Refactoring (discusses team-based refactoring)
- SE Radio 582: Leo Porter and Daniel Zingaro on Learning to Program with LLMs (discusses AI-assisted programming)
- SE Radio 386: Spencer Dixon on Building Low Latency Applications with WebRTC (discusses building a pair programming tool)
Other References
- Hopp remote pair programming app
- How pair prompting could mitigate the risks of AI assisted programming
- We chose Tauri over Electron. 18 months later, WebKit is breaking us.
- Atlassian Survey: Engineering at AI scale: Unlocking velocity and visibility
Transcript
Transcript brought to you by IEEE Software magazine.
This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number and URL.
Brijesh Ammanath 00:00:18 Welcome to Software Engineering Radio. I’m your host, Brijesh Ammanath. And today my guest is Costa Alexoglou. Costa is a software engineer at Grafana Labs and co-founder of Hopp, an open-source software pair programming application. During his free time Costa is building design system cookbooks, which hopefully will become a central place to learn about design systems as a project manager, designer, or engineer. Previously Costa led the design system at Neo4j and co-founded Visualize. Costa, welcome to the show.
Costa Alexoglou 00:00:49 Yeah, nice to meet you, Brijesh.
Brijesh Ammanath 00:00:51 Today we’ll discuss and deep dive into remote pair programming. Let me call out a few related episodes. In Episode 682, Duncan and Nat spoke about Refactoring and discussed team-based refactoring. In Episode 582, Leo and Daniel spoke about Learning to Program with LLMs and discussed AI-assisted programming. In Episode 386, Spencer spoke about low-latency applications with WebRTC and discussed building a pair programming tool. So, let’s start with the fundamentals. Costa, for listeners who might be new to the concept, what is pair programming in its simplest form?
Costa Alexoglou 00:01:29 Yeah, so pair programming is effectively a practice of programming with your teammate on one screen. So, you have one monitor, one keyboard, one mouse, and then usually one person codes and the other is watching. And then in the new era of distributed working, we have remote pair programming, which is effectively the same: one teammate shares their screen, and the other teammate (or teammates, if you are doing mob programming, as they call it) controls the screen and gives you feedback.
Brijesh Ammanath 00:02:00 Right. And when it’s done well, what does it look like and what does it feel to team members who are doing remote pair programming?
Costa Alexoglou 00:02:07 Yeah, when pair programming is done right, it’s a nice experience because you have a ping-pong type of collaboration. There’s also a style of pair programming called driver-navigator. For people that are familiar with WRC racing, this is a nice analogy. So, you have the driver, which usually focuses on syntax, or in the WRC analogy on the road. And then you have the navigator, which focuses on the map: effectively, in our case, the architecture, or bugs, or two steps ahead in the coding, rather than just typing the code and taking care of the syntax. So, yeah, good pair programming is around those principles, besides some other goods, like really good focus and staying on the problem, versus having fragmented attention when you work alone.
Brijesh Ammanath 00:02:56 And it’s not a new concept, right? Because I remember using pair programming when we were implementing a project where we used extreme programming. I believe it was a fundamental concept or a principle in extreme programming.
Costa Alexoglou 00:03:08 Yeah, yeah, exactly. And by the way, extreme programming, I think it’s making a comeback nowadays that velocity is so much faster with AI, but it’s definitely not a new concept. Right now, we quote-unquote “reinvent” this for the remote-first culture because many teams are distributed now. I work fully remote. But definitely not a new concept. And two of the main rock stars in the industry, let’s call them that, are Jeff Dean and Sanjay, I don’t want to pronounce his last name, I will probably be incorrect, but two of the most senior Google engineers. They’re level 11 engineers, I think, and they’re the only two engineers at Google at that level. So, they’re like the best engineers at Google. They’re actually famous for pair programming together every day. So, every day they code together, they share the keyboard, and those two people literally produced MapReduce, Bigtable, Spanner: basically, the software that the internet runs on today. So, definitely not a new concept, and some amazing results from amazing people. So yeah, I agree with what you said.
Brijesh Ammanath 00:04:10 Okay. Can you share any examples or any specific moment from your own career where pair programming saved you or a project that you were working on from going off the rails?
Costa Alexoglou 00:04:21 Yeah, for sure. And this is also the story of how I became accustomed to pair programming myself. So, at the previous company I was working at, we were trying to do a big refactor to release a new version of the platform, which was basically database optimization software. It was working well for the vanilla Postgres flavor, meaning just the Postgres server running, and the new refactoring would strive to support AWS RDS, Google Cloud SQL, Microsoft Azure Flexible Server, and even some more Postgres flavors. I was the tech lead for this project. And at the same time, the version-one platform was built by a previous coworker of mine and a friend from Germany. So, we were working remotely, and this literally helped us accelerate from a projected timeline of one year to four or five months, I can’t remember exactly, but we accelerated a lot based purely on pair programming every day.
Costa Alexoglou 00:05:16 So that meant we went on a call and then he was providing me really good context around what decisions were taken in version one of the platform. Why was something built the way it was? So, it was a good onboarding for me to get the context of the business logic, and I was providing the context around the architecture and how we can abstract this to support many more flavors. So that was my very first experience of pair programming daily for many hours with a person and understanding the value of sharing context instantly. And besides that, one of the major benefits was that you literally have pull request reviews on the fly. So, when you code with another person, before getting to the pull request, or merge request as they call it in GitLab, you have somebody reviewing your code on the fly and giving you feedback. The other person will share context, and then you will have much faster merge time, effectively, or review time on your pull request. So that was definitely one of the very first times that we accelerated a timeline a lot to deliver a new major refactor of the platform, and it made me understand, okay, this is really valuable to practice as a software engineer.
Brijesh Ammanath 00:06:27 One year to four months. That’s very impressive.
Costa Alexoglou 00:06:30 Yeah, that was indeed. Yeah.
Brijesh Ammanath 00:06:31 Maybe moving to the current context, where you have AI tools helping you write more and more code, and there’s this trend of vibe coding where developers go and develop with the flow, with an AI tool as your partner almost. Do you see that as a good use case for pair programming, or do you think that pair programming is becoming obsolete because now your AI tool becomes your pair programmer?
Costa Alexoglou 00:07:01 Yeah, that’s an interesting question, and unfortunately, now with vibe coding and everything, pair programming gets the notion that it’s AI pair programming. But besides that, there’s an interesting stat from an Atlassian survey: 16% of our time as engineers is spent programming, and the rest is other tasks. And I think this is important. First of all, that means the remaining 84% of our time is not coding per se. That means architecture reviews, meetings, debugging, checking Grafana dashboards, onboarding new teammates. So, this 84% is still up for grabs for pair programming, remote pair programming or just pair programming. This is something that does not change with AI. Now, for the 16% of our time that we spend coding, whether without AI or with AI (I think right now everybody’s using AI, so let’s make the assumption that we use AI), it’s still something that I think we will adapt, and we will still do this in a pair programming environment.
Costa Alexoglou 00:08:07 I mean a real pair programming environment, with a human person. And yeah, we may call this pair prompting or something similar, but if we think about this, it’s still programming, right? I mean, we need to have context about what we prompt the AI to do. It’s not like we will fully remove the human factor from the equation. You still need context, you still need fundamental knowledge of the programming language, else you’re going to produce slop code, effectively, as they call it: super low-quality code. And then you will put the burden on the reviewer to fix the mess you created while you didn’t know what you were doing. So, circling back to the question, I don’t think that this will change. Definitely things are changing in the way we program. So instead of typing the code character by character and hoping that it will compile, at the end of the day, AI will take care of that. But still, the prompting and the context around this is fully human-centered, and I don’t see pair programming being removed from the equation anytime soon.
Brijesh Ammanath 00:09:06 You touched on debugging. Do you have any example or a story about where you tried to fix a frustrating bug, and that pair programming actually helped you out over there?
Costa Alexoglou 00:09:18 Many times, but back to the story of how I started pair programming: what we were practicing was effectively database optimization. And database optimization is literally trying to break the database. When I say break, I mean benchmark heavily, try to stress the database so it’s close to breaking. And that meant for us maybe 50, 60% of our time was debugging Grafana dashboards. I mean going to Grafana and then trying to understand what was happening inside the database: check traces, check performance, like queries per second, transactions per second, autovacuum for Postgres, for example. So those were metrics that we were debugging heavily, and pair programming (and I will get to later on how we solved this with Hopp) was literally debugging dashboards. And in that case, it was helpful to get the human eye, another perspective, my teammate’s, and try to navigate around what was breaking, why the database behaved the way it was behaving, and have this synchronous back-and-forth of ideas about what is going on or what we should try as an experiment next.
Costa Alexoglou 00:10:25 So that’s one debugging example. Another experience from debugging is when you are on call. So, when you’re on call, maybe sometimes you cannot fix an issue on your own and you want some teammates to jump on your call. Maybe someone is from the backend team, maybe someone is from the platform team, and maybe the issue lies between platform and backend at the same time. So, you want somebody to represent the platform and try to figure out the solution. So, this is also an example where pair programming really shines, because there is a burning fire, and you try to fix it together with another person.
Brijesh Ammanath 00:10:56 I think we have established it very well that pair programming is a very useful practice, and I think a big part of implementing the practice since most programmers are now remote or working remotely, is the tools that we use. In your pitch, you mentioned that teams end up usually fighting their tools. Can you expand on that?
Costa Alexoglou 00:11:17 Yeah, for sure. And it all started from my own experience, and effectively I call this the death by a thousand cuts. And what I mean by this: usually we would start pair programming with my teammates using Slack Huddles. It’s the equivalent, let’s say, of Google Meet or a Zoom call or a Microsoft Teams call. And the first cut was the video quality. And what I mean by that, maybe many listeners will have experienced this many, many times. As soon as you start screen sharing, the very first question you’ll ask if you are the one sharing, or that you’ll hear from a participant, is: can you zoom in? Or: do you see my screen okay? This is the very first question that has always popped up. And why do you want to zoom in? Mainly because the video quality is not great, and this makes sense because Google Meet or Slack Huddles are optimizing for video conferencing and not for screen sharing, and especially not real-time screen sharing.
Costa Alexoglou 00:12:14 So that was the very first thing that was annoying, because it’s not natural to use, for example, an IDE with a super large font just for your teammate to be able to read your screen. So that was the very first cut that was constantly happening. The second one, I call this the navigation fail. And what I mean by the navigation fail is that you always needed to point your teammate where to go. So, you’d say, no, no, go to the left, go to the right, find this ID. So, imagine that you used Grafana, or you used the AWS console or Azure console, and those are complex UIs. So just giving instructions to your teammate on where to go and find the cause, or find the problem, or go generate a new token was frustrating. So, you ended up fighting the tools and breaking your productivity flow rather than getting super smooth collaboration.
Costa Alexoglou 00:13:06 And the third part is that there are some tools for pair programming that are IDE-specific, like Live Share, for example, in VS Code, which are good, but they are only specific to your IDE. If you are being pragmatic, your context is not just your IDE. You don’t only write code; you have terminals, you have dashboards, you have the AWS console, you have Google Docs or Microsoft Docs if you are reviewing architecture designs. So those are also good tools, but super narrow in scope, and the context of software engineering is much broader. I feel that sharing your whole screen, and being able to share everything you see with your teammate, is much more helpful than sharing only your IDE. So effectively those are the three main issues: video quality, no remote control, and having the full context of a laptop rather than just your IDE. Those are the problems I was facing when we used Slack Huddles or Google Meet.
Brijesh Ammanath 00:14:09 Right. And why does that slightly blurry text matter more when you are doing pair programming than while say you are watching videos?
Costa Alexoglou 00:14:18 Yeah, so it depends on the latency specs. And what I mean by that: videos should be high definition if you are a platform like Netflix, for example, or YouTube, but you don’t care about latency because you can buffer some frames when you have good internet; everything is on demand. So, this can work out. But the specific part about conferencing is that you want high definition but also super low latency. And I will expand later on about the low latency. So, the trade-off with conferencing tools is that they will sacrifice a little bit, not a little bit, maybe a lot of the quality of the video you’re streaming, so you can have many receivers, many participants that can receive this, and you can receive it in okay time: not super fast, not super great quality, but you will reliably receive it. And of course, that ends up making your text super blurry, and you end up zooming in a lot, because if you zoom in a lot, at least it will not be blurry.
Costa Alexoglou 00:15:21 I mean, if you have one character across your whole screen, it’s sure that you will see it. But of course, this misses the mark because it’s not how you code in real life. I mean, you want your letters to be a little bit small so you can see the full line of your code base. You may want to have two tabs open with two files so you can explore them. So, you can understand that every time you zoom in, you lose a little bit more context of your window. That can be code, that can be a Grafana panel, that can be a table in the AWS console. Anything you can imagine: when you zoom in, you lose context. And effectively, if you have high-definition quality, you don’t lose this context because you don’t need to zoom in anymore.
Brijesh Ammanath 00:16:00 Got it. Just wanted to double-click on navigation fail. So, if I was doing pair programming in person, you would have one keyboard, whereas when you’re doing remote pair programming, you’ve got two people with two keyboards. So, what’s the difference between watching someone code and being able to control the environment together?
Costa Alexoglou 00:16:22 Yeah, so the main difference is that sometimes you have a clear idea of what you want to do or where you want to go. And it will definitely be faster than transforming this idea from your head into speech, then the speech going to the other participant, and the other participant understanding exactly what you want to do. So sometimes it just speeds things up when you can control remotely. And besides that, if we go back to the driver-navigator pair programming paradigm, even if I share my screen and the other participant controls my computer, that also means we can still swap roles. So, we can spend half an hour with me coding and you, for example, doing architectural thinking and debugging and thinking two steps ahead. But you know, thinking is a bit of a toll on the brain, and then we can swap roles.
Costa Alexoglou 00:17:15 So you will be the driver, I will be the navigator. So, this is another benefit. And besides that, serving one PC is important, it can speed up things like setting up environment’s dev environments is not super easy every time and also pushing the work so the other can continue. So having one laptop where we have the work in progress so we can then at the end of the day push it to a branch and call it a day is also helpful because we can still pair together, but the work is one laptop running and not back and forth and setting up to environments, et cetera, et cetera.
Brijesh Ammanath 00:17:49 Have you ever been in any pairing session where you just wanted to grab the keyboard from your navigator because the tool was fighting you? Can you tell me about that story or example?
Costa Alexoglou 00:18:01 Yeah, just to clarify, pair programming session that I have remote control or pair programming session with tools that don’t support remote control, and I wanted it.
Brijesh Ammanath 00:18:10 Yeah, didn’t support remote control.
Costa Alexoglou 00:18:12 Many times. I mean, that was literally the reason why I searched for alternatives, and the only thing I could find was proprietary pair programming tools, not even one open source. That was literally the reason: I was so many times frustrated that I wished I could just show you where to go, just show you what to type, but instead I still needed to tell you exactly where to go. And this killed my productivity flow so many times, which was frustrating. I mean, that was the reason we say you’re fighting the tools, because those tools are not meant for pair programming. They’re meant for video conferencing, and simple screen sharing if you’re doing a presentation, but not for pair programming work, where you sometimes need to have control of the mouse or of the keyboard or both.
Brijesh Ammanath 00:19:03 Right. So, I think that’s a good segue into the next section where we’ll deep dive into the tool that you’ve built. So, the primary or the core problem that you were trying to solve was the lack of remote control, and you also wanted to address the key challenges, which are around video quality, navigation failures, and only being able to share IDEs with some of the existing tools. Walk me through the early days when you were building the first prototype. What was it like, and how was it working when you got the first prototype out, when you actually felt that this could make a difference to programmers’ lives?
Costa Alexoglou 00:19:39 Yeah, so how it started is that I started searching for alternatives, and I found one that was super expensive, and it worked exactly for my needs: super smooth video, remote control. So, I said, this does exist, but the price was absurd, like $30 per user per month, which, if you ask me, in a European company a tech manager would not super easily expense, because this is per person; this is expensive. And then I had this idea: I mean, why not create something open source that people can contribute to, making this better? And we can make something that works across operating systems. Because right now I work on macOS, I have a teammate that works on Ubuntu, I have another teammate that works on Windows. I mean, we don’t want to be restricted to one operating system, and even if we don’t know how to support something, I mean, it’s open source.
Costa Alexoglou 00:20:32 If some people want this, they will come and make a contribution. So that’s how it started. Then we started laying out the specs of these systems and as we searched around one of the, let’s call it the field spec was the latency requirement, and the latency requirement was less than 100 milliseconds. And where you in star our WebRTC, like Google Meet and Zoom, you may have 500 milliseconds. And why was it super important based on Apple’s human interaction guidelines, 100 milliseconds is the threshold that something starts to feel sluggish and that your remote control this rather than being native on your computer. So that was the very first requirement, can we make something with WebRTC technology that will be less than 100 milliseconds? That was the very first. The second one was that we wanted to make open source. So, we tried to create a system with components that anybody can run on their own hardware, no APIs keys, not to sign up anywhere, you can run this locally on your PC. Those were the two main things. And as it’s open source, being able to at some point be cross-platform, we started with this principle in mind. Of course, initially we supported fully MicroWest because it needs work to support other operating systems. Then we started supporting Windows and sometime in the future we want to go to Linux. But those were the very first things that we started mapping out on how we’re going to tackle this.
Brijesh Ammanath 00:22:03 Can you expand a bit on the technology stack you have used for Hopp and what factors led to picking each of those tools?
Costa Alexoglou 00:22:11 Yeah, for sure. So just to explain a little bit how Hopp works: as an application, it lives in your menu bar, and when you click it, you can call your teammates. So, when you call a teammate, for example, your teammate receives a call and then they can accept, and you start a pairing session, effectively. So, I will take this from the backend to the front-end side. In the backend we selected Go, which is nice for web services, and it can scale well. As a database, we used Postgres, which is an open-source SQL database, really nice for our use case. And Redis. Redis is a key-value database, but it also has support for Pub/Sub workflows, and in our case, this enables the real-time communication. For example, when you call a user, you want the other participant to receive the call in real time, and this Redis database enables that, the real-time communication for that type of messages.
Costa Alexoglou 00:23:05 So this is the backend part. And for the application part, what we selected was Tauri. Tauri is a Rust-based framework with which you can build desktop apps that wrap webview instances. On macOS, it wraps a WebKit instance; on Windows, it wraps WebView2; and on Linux, it wraps WebKitGTK. And I will explain later on why we are scrapping that. But those were the first initial picks. And then for the cross-OS core layer, we named this the Hopp core layer, meaning screen sharing, streaming your video, and remote control, we used Rust, basically because Tauri also supports Rust as a backend and because it’s low-level enough that we can make fast the things we want to be fast, but at the same time it has a really, really mature and nice developer toolkit. So, if you compare with C and C++, besides the memory-safety debate that already exists, it has a nice toolkit for you to format your code, catch errors early on, and a really nice compiler. So those were effectively the decisions we made to start with this technology stack.
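The call-signaling flow Costa describes (Redis Pub/Sub pushing an "incoming call" event to the callee in real time) can be sketched in Rust. This is only an illustrative sketch, not Hopp's actual code: a standard-library channel stands in for the Redis Pub/Sub topic so the flow is runnable without a Redis server, and all names here are hypothetical.

```rust
use std::sync::mpsc;
use std::thread;

// Signal published on a per-user channel when someone starts a call.
#[derive(Debug, Clone, PartialEq)]
enum Signal {
    IncomingCall { from: String },
    Accepted { by: String },
}

// In a Redis-backed version this would PUBLISH to a topic such as
// "user:<id>:signals" (topic name hypothetical); here an mpsc channel
// plays the role of the broker.
fn publish(tx: &mpsc::Sender<Signal>, msg: Signal) {
    tx.send(msg).expect("subscriber dropped");
}

fn main() {
    let (tx, rx) = mpsc::channel();

    // Callee "subscribes" and waits for a signal in its own thread.
    let callee = thread::spawn(move || match rx.recv().expect("channel closed") {
        Signal::IncomingCall { from } => format!("ringing: call from {from}"),
        other => format!("unexpected: {other:?}"),
    });

    // Caller publishes the call signal; the callee sees it in real time.
    publish(&tx, Signal::IncomingCall { from: "costa".into() });

    println!("{}", callee.join().unwrap());
}
```

The same fan-out shape is why Redis Pub/Sub fits here: the backend does not poll; the subscriber simply blocks until a message arrives.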
Brijesh Ammanath 00:24:41 Can you maybe double click on the challenges you faced with WebKit and what made you think of moving away from it and what did you move into?
Costa Alexoglou 00:24:51 Yep, effectively WebKit. So, the framework we used was Tauri, again, and it bundles a different browser per operating system. The very first issue that we faced was that there is inconsistent rendering across operating systems, and that was a real pain. And what I mean by this is that you could have a cursor, for example, a virtual cursor with some shadows, playing nicely on Windows but then breaking and looking ugly on macOS. And that was basically because WebKit is different from WebView2, because the WebView on Windows is, let’s say, based on Chromium (not a fork, but based on Chromium), while WebKit is its own browser. And at the same time, on Linux we hit WebKitGTK, which doesn’t support WebRTC, and effectively video streaming is all about WebRTC. So that was a deal breaker that we found out way later on: we cannot ship a solution on Linux with WebKitGTK because it doesn’t support WebRTC.
Costa Alexoglou 00:25:58 So those were the two main issues: inconsistent renderings and not being able to support Linux. Some other issues that we faced was really bad audio quality because we were capturing the audio from inside the Tauri browser, let’s call this. And the audio quality was not good. And if you think about this pair programming is fundamental about two things, video — three things, let’s say: video, audio and remote control. If you don’t nail one of the fundamental pieces audio, it’s bad being honest. So that was a bummer and even in the ratings we get because after every pairing session we ask users if they want to rate us. Usually if we get four out of five, the one star missing is always the audio quality, which is really sad if you ask me. And besides the audio issue that will make us move to something more low level is access to the video streaming buffers.
Costa Alexoglou 00:26:55 And let me share an example. Let’s say that I’m sharing my screen and I’m sharing this in 1080 quality. So simple HD quality, if I’m on the receiving side and I’m inside the browser, there is nothing I can do. But if you are on the backend side and you receive the video buffer in a low level languagelike Rust, that means that you have access to the video buffers and what you can do in Mac-OS for example, there are some APIs that hit the metal GPU that you can have with almost zero milliseconds penalty, upscaling your image in the GPU. And that means that you can get a 1080 quality video, pass this to the metal GPU and upscale this to 4K while the one is streaming in 1080. So, for the same bandwidth you can have 4K analysis that you look on your screen with zero performance penalties. And this is something really nice that we want to also support in the future because you imagine we may have latency issues because you may be in India, I may be in Sweden, so there is a gap between us. Geographically a gap between us and that means higher latency by definition. But even if you stream in simple HD, I’ll still get 4K because of this upscaling. And this is something fascinating for me because it’s effectively free quality with no bandwidth penalties and not latency penalties effectively.
Brijesh Ammanath 00:28:20 And you are able to tap into this outside of the Tauri framework?
Costa Alexoglou 00:28:24 Exactly, because right now we have the browser, and we just display the video in a video tag in the browser. But in order to achieve this, you would need to receive the frame in the browser, find a way to send it from the browser somehow to a low-level thread in Rust, let’s say, then pass it to the GPU and then transfer it back to the browser to render it. And we’ve made some experiments, and that adds a minimum 50-millisecond penalty, even way more, because if you then transfer a 4K frame back to the browser, and if you want to handle 30 to 40 to 60 frames per second, you’re effectively going to have a huge delay. But also, your laptop will be on fire, because besides the upscaling, you also need to transfer the buffers here and there. And it’s not like, you know, passing a buffer by reference; you actually copy them every single time, which is a huge penalty performance-wise.
Brijesh Ammanath 00:29:23 And what technology solves for this? If it’s not Tauri, what are you looking at?
Costa Alexoglou 00:29:28 Yeah, so we’ve looked at many alternatives, and basically it will still stay Tauri, because Tauri is really nice for the windows that don’t use WebRTC. So, if we don’t display the video and we don’t capture the audio there, it’s really nice for the main UI to be hosted in Tauri, and it also comes with free goodies like automatic updates, which is nice for us because you can update your app on the fly, and Tauri takes care of it, also bundling and everything that we care about. But the windows that are going to use video and audio streaming, like the camera window, if we share our camera, or the screen-sharing window that displays the sharer’s screen, are going to use two main frameworks. Three, let’s say: winit, which is a Rust framework for window management that works across all operating systems.
Costa Alexoglou 00:30:19 And then a next framework called iced, iced is around building UIs in Rust. That works also across all operating system and wcpu, which is a Rust framework for GPU rendering. So effectively screen sharing that is going to have 30, 46 frames per second and it’s going to be intensive to draw, it’s going to be offloaded to the GPU. So, everything will run, fast. Zero milliseconds, performance penalties, like latency penalties. So those will be the tools that we are right now in the process of refactoring to go to provide a better experience audio-wise, video-wise and quality-wise with the upscaling in the future.
Brijesh Ammanath 00:31:04 When you were initially building the tool, was there one performance requirement that you refused to compromise on even though it made things harder for you to build the tool?
Costa Alexoglou 00:31:16 The one important thing we, cared about was high-definition streaming with low latency. Because right now we may not have the best audio because of the limitation I explained, but something that we could never, we refused to announce this to the public until we reached the specific threshold of latency, and we spent months working on it. So, 100 milliseconds latency for high-definition video was the absolute threshold of announcing this to the world. And of course, finding the technology, something that I did not explain in the tech stack also. And of course, finding the technology that will allow us to do this. For example, for WebRTC we use an open-source project called the LiveKit that they also have a cloud offering, which is nice because it has distributed WebRTC infrastructure. So, they have servers in many places so they can try to minimize your latency based on where the participants are located.
Costa Alexoglou 00:32:15 So after we picked LiveKit, we measured that for just simple data transferring it can achieve super low latency, the latency threshold we needed. Then we focused on the video streaming latency, which we achieved. And this of course had other technical challenges. For example, if you think about it: how do you measure video latency? And what I mean by that is, I am controlling your computer, so I make a keystroke on my keyboard. Then this keystroke should go to your computer. Then your computer, for example, displays this character, and then this frame gets captured and sent back to me so I can view it. So, this is the whole loop of the 100 milliseconds latency. And then you ask, how exactly do you measure this? Because this is uncharted territory, to be honest, and something that we did was fingerprinting. And what I mean by that is that when we send a key event, for example as a controller, I send a key event with metadata that says: I want to measure the latency. Then the receiver, if it reads that, needs to annotate the frame.
Costa Alexoglou 00:33:25 What we did is that for the whole first line of the video, we added black pixels. This way we knew that this is an annotated frame, so that when we get it back to be displayed on my screen, the one I control from, I know that it went back and forth. So, then I compare the timestamps, and I know exactly how much time the back-and-forth took, and this is how we measured video streaming latency, effectively, for the whole round trip. And this is something that, I won't say it's novel, but it's not well documented, because not many articles exist online around how you measure round-trip latency for these specific cases. And this is how we solved it, specifically to have a more, let's say, data-driven approach on which decoder we are going to use. Because that was also an important part: how are we testing decoders, or how do we tweak decoders to work well for our use case? So that was the whole pipeline we started creating to make more informed decisions.
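The fingerprinting idea described above can be sketched in a few lines of Rust. This is a hypothetical, simplified reconstruction, not Hopp's actual code: the frame layout, the RGBA pixel format, and all function names are assumptions made for illustration.

```rust
use std::time::Instant;

// A video frame modeled as raw RGBA bytes, `width` pixels per row.
// Illustrative sketch only; Hopp's real pipeline works on encoded frames.

/// Paint the first row of the frame black so the controller can
/// recognize this frame as a latency probe when it comes back.
fn annotate_first_row(frame: &mut [u8], width: usize) {
    for px in frame[..width * 4].chunks_exact_mut(4) {
        px[0] = 0; // R
        px[1] = 0; // G
        px[2] = 0; // B
        px[3] = 255; // A (opaque)
    }
}

/// Check whether the first row of a received frame is all black,
/// i.e. whether this is an annotated probe frame.
fn is_annotated(frame: &[u8], width: usize) -> bool {
    frame[..width * 4]
        .chunks_exact(4)
        .all(|px| px[0] == 0 && px[1] == 0 && px[2] == 0)
}

fn main() {
    let (width, height) = (4usize, 3usize);
    let mut frame = vec![128u8; width * height * 4]; // grey test frame

    // Controller side: send the key event with "measure latency"
    // metadata and record the send time.
    let sent_at = Instant::now();

    // Sharer side: on seeing the metadata, annotate the next frame.
    annotate_first_row(&mut frame, width);

    // Controller side: the frame comes back over WebRTC; if it is
    // annotated, the elapsed time is the full round-trip latency.
    if is_annotated(&frame, width) {
        println!("round-trip latency: {:?}", sent_at.elapsed());
    }
}
```

The point of the black first row is that it survives the encode/decode trip as recognizable pixel data, so the controller can match a returning frame to the key event it sent and diff the timestamps.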
Brijesh Ammanath 00:34:30 That's a very interesting way of solving the problem. I thought you might have just given the tool, Hopp, to a few programmers and got their feedback to see how it feels from a speed or latency perspective. But the fact that you actually had a solution which gave you actual data to measure your latency, that's just amazing.
Costa Alexoglou 00:34:48 Yeah, I wish it were that easy to give it to people. But you know, you have huge variance, because I might give it to a teammate who ends up having a bad WiFi connection, or they may have been running 10 agents at the same time and the agents were overloading their PC, and then everything was slow because their PC was on fire, effectively. So, we wanted something way more data-driven, for us to be sure that what we are going to see is tested and should work, given the condition that you have a good connection, because we took it for granted that you're going to have at least the okay connection we tested with. And also, something that we tested besides decoders is that on macOS, for example, you have networking tools that can help you create fake network profiles. So, we also tried decoders to see how they behave in different network profiles, how much bandwidth each decoder consumed. So, we knew beforehand, for example, how much they're going to consume in production when we ship Hopp: are they going to work on bad connections? For example, I'm from Greece. Generally, we have bad upload speed in the whole country. So how would Hopp work in a Greek environment if somebody tested it from Greece, because download is fairly good, but upload is bad? So, we wanted to simulate various different scenarios in that case.
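The kind of "will it work on a bad upload link" reasoning Costa describes can be illustrated with a small Rust sketch: given a measured upload bandwidth, pick the highest quality whose encoder bitrate fits. The quality ladder, bitrates, and headroom factor below are invented ballpark figures for illustration, not Hopp's measured numbers.

```rust
// Illustrative sketch, not Hopp's actual logic: choose the highest
// video quality whose required upload bitrate fits the link, leaving
// headroom for audio, key events, and retransmits.

/// (label, required upload in kilobits per second). Assumed figures.
const QUALITY_LADDER: &[(&str, u32)] = &[
    ("4K", 20_000),
    ("HD+", 8_000),
    ("HD", 4_000),
    ("SD", 1_500),
];

/// Keep ~20% of the upload free for audio and data channels.
fn pick_quality(upload_kbps: u32) -> Option<&'static str> {
    let budget = upload_kbps * 8 / 10;
    QUALITY_LADDER
        .iter()
        .find(|(_, required)| *required <= budget)
        .map(|(label, _)| *label)
}

fn main() {
    // e.g. an asymmetric line: fast download but slow upload,
    // like the Greek connections mentioned in the episode.
    for upload in [25_000u32, 6_000, 1_000] {
        match pick_quality(upload) {
            Some(q) => println!("{upload} kbps up -> stream at {q}"),
            None => println!("{upload} kbps up -> too slow to stream"),
        }
    }
}
```

Combined with fake network profiles from OS-level traffic-shaping tools, a table like this lets you predict encoder behavior before users on slow uplinks ever hit it.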
Brijesh Ammanath 00:36:12 Was the latency measurement solution the toughest technical problem you solved, or were there other technical problems you solved which were tougher than that one?
Costa Alexoglou 00:36:26 Yeah, so definitely that was an interesting problem that we solved: figuring out how to measure the latency and creating all those profiles to have a more, let's say, automated pipeline for making informed decisions. That was definitely one. The next, of course, was trying to understand WebRTC, because we use it at a low level to tweak it, and we don't just use the browser primitives to do screen sharing. That means that we needed to understand a beast of a project, because WebRTC is a complex project maintained mainly by Google.
Brijesh Ammanath 00:37:00 Before you go into that technical solution, can you spend a minute explaining what WebRTC is and what problem it solves?
Costa Alexoglou 00:37:08 That's a great question. So, WebRTC is effectively an open technology driven by Google, if I'm not mistaken, but I think it's driven by Google and maybe Ericsson, now that I recall. Effectively, it enables web browsers and mobile apps to perform real-time peer-to-peer video, audio, and data streaming. So, when you go to Google Meet, for example, everything behind it uses WebRTC. It sends the video frames, it sends the audio frames, maybe your screen sharing, and data at the same time. Because, for example, for the pair programming, if you press keys on your keyboard, we transfer them through a data channel. All this is delegated to WebRTC. So that's effectively what WebRTC is doing, without getting, you know, super into the depth of what it is exactly. But at a high level, that's what it does.
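The idea of key presses traveling over a WebRTC data channel can be sketched as a tiny binary message format that gets handed to the channel as bytes. This wire format and every name in it are invented for illustration; Hopp's real protocol may look quite different.

```rust
// Hedged sketch: a controller key press packed into a small binary
// message, as one might send over a WebRTC data channel. The layout
// is hypothetical, not Hopp's actual protocol.

/// A key event from the controller, plus a metadata flag
/// (for example, "measure latency on this event").
#[derive(Debug, PartialEq)]
struct KeyEvent {
    key_code: u16,
    pressed: bool,
    measure_latency: bool,
}

impl KeyEvent {
    /// Pack into 4 bytes: key code (big-endian) + state + flag.
    fn encode(&self) -> [u8; 4] {
        let kc = self.key_code.to_be_bytes();
        [kc[0], kc[1], self.pressed as u8, self.measure_latency as u8]
    }

    /// Unpack on the sharer's side before injecting the key press.
    fn decode(buf: &[u8; 4]) -> KeyEvent {
        KeyEvent {
            key_code: u16::from_be_bytes([buf[0], buf[1]]),
            pressed: buf[2] != 0,
            measure_latency: buf[3] != 0,
        }
    }
}

fn main() {
    let ev = KeyEvent { key_code: 0x41, pressed: true, measure_latency: true };
    // In the real app these bytes would go to the data channel;
    // here we just round-trip them locally.
    let wire = ev.encode();
    let back = KeyEvent::decode(&wire);
    assert_eq!(ev, back);
    println!("round-tripped {:?} over the (simulated) data channel", back);
}
```

Keeping key events this small is what makes a sub-100-millisecond control loop plausible: the data channel payload is a handful of bytes, so the video round trip dominates the budget.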
Brijesh Ammanath 00:38:02 Right. And what technical problem did you solve around WebRTC which you found was, you know, as tough as solving the video latency problem?
Costa Alexoglou 00:38:11 Well, when we picked Tauri, we could have started by, let's say, sharing your screen with JavaScript from the browser. We could have started like this, but quickly we found out that browsers don't expose the low-level APIs that you'd need to tweak to make something super-fast. So that's why we started screen sharing from the Rust side and not from inside the browser. That was the initial thing. And then when you start working with WebRTC and with the low-level internals, you start understanding that this is a complex project. I mean, this is a project maintained for many, many years, driven by many large corporations, I guess Google, Discord, Zoom, Microsoft. It has many levels of abstraction and knobs to tweak. That was hard for us: understanding, okay, where should we look to make this faster?
Costa Alexoglou 00:39:06 So for example, we spent one month trying to tweak the decoders and encoders, just to find out that there was one specific flag in WebRTC that made things run at least two or three times faster, and we had missed it: just one flag in the whole code base, with good defaults for the encoders. So it was hard for us to figure out how to work with this. Also, WebRTC is written in C++, for example, and we were using the Rust side of things, the Rust bindings, so we also had to navigate this complex landscape of working with bindings over a C++ project that you're not familiar with, trying to find, you know, super-low-level details that will make your goals feasible. Because again, we had a goal of 100 milliseconds, and that was no game. I mean, we needed to go super low-level to make this happen; otherwise we would just use a browser and call it a day.
Brijesh Ammanath 00:40:03 I have to ask this. So right now, what latency has Hopp achieved?
Costa Alexoglou 00:40:10 Yeah, so now, and we have a really good explanation post on our blog for anybody interested in this topic, on how we actually measure things and our numbers. Now we are in the range, depending on the quality you serve, of around 80 milliseconds for HD-plus quality. Then if you go to 4K, you will be a little bit north of 100 milliseconds. And we don't support 8K, and most people don't even have 8K displays, so I will not even measure that for now. But we are on the verge of 100 milliseconds, give or take 20 milliseconds, depending on the quality of your screen. And again, depending on your upload speed; I mean, if you have a bad upload speed, nothing can save you, effectively. There is nothing that can be done if your connection is that bad. But those are the numbers we have right now.
Brijesh Ammanath 00:41:02 Pair programming fundamentally means you're sharing a screen with another person who's not, you know, in the same room. Were there any security aspects or considerations you needed to take into account as you built Hopp?
Costa Alexoglou 00:41:16 Actually, this is a really interesting question. This is something that we would want to tackle in the future, but definitely there are some. And what I mean is, it's not about remotely grabbing things, because of course the one who shares the screen always has the control, and they can kick out a bad actor, for example. But from my perspective, it's mostly about, you know, I share my terminal, and I may mistakenly open my .zshrc file or my .bashrc file, and I have sensitive tokens there, say my Claude Code API token, that I don't want my teammates to see, for example, and maybe screenshot and do something with. So, this is something that we want to address, and I think at one point, when we have screen streaming closer to the GPU, we might be able to do some fancy stuff around hiding, for example, sensitive information.
Costa Alexoglou 00:42:13 But for now, this is the only security concern, if you ask me, and nothing else, because there is no remote code execution. You just send key events that we control, so there are no security holes there. If you wish, let's not even call it a hack; it's rather mistakenly opening a file and it being visible while we stream. And as for bad actors in between, let's say that even if I stream something: I said before that we use LiveKit, which effectively is a super secure network. LiveKit, for example, is right now used by OpenAI; when you use a voice agent in OpenAI, they use LiveKit. So, it's a super secure network; they're compliant, they have good encryption for the streaming. So, there is no security concern about someone being in the middle of your WebRTC stream and, you know, being afraid that something will happen.
Brijesh Ammanath 00:43:05 Was there any lesson or learning that you learned the hard way, that you were not aware of before you started building out Hopp, that you would like to share with our listeners? Something that would be useful for them or anybody who's building performance-critical applications?
Costa Alexoglou 00:43:20 That's an interesting question. So, first of all, I think the best lesson, I won't call it the best lesson, but you need to start building to learn the lesson. So just being proactive and action-driven; I mean, that's the best way to learn. The second thing that I learned is that it's important to set up the specs, and now that we are in the AI-driven era, there's also spec-driven development. Some may have already seen this; it's becoming a trend. Setting up the specs before you start building is nice because it lets you narrow down not only your focus but also your options. Because if you have the specs, you're not endlessly trying out things and breaking them every single time because you find new requirements. In our case, it helped us a lot to narrow down the focus on the stack we needed to use.
Costa Alexoglou 00:44:12 Of course we made mistakes, and one mistake, for example, was not going from day zero to something more Rust-based. But at the same time, we accelerated a lot because we found our first users; we got the first user contributions on GitHub. So, pros and cons in everything; there's a trade-off, no free lunches in engineering. But for me, when you create performance-critical software, the most important thing that you need to ask yourself about is the specs, because you build for the specs. And if you don't have them, you might want to build something which is performance-critical, and you think it's performance-critical, but maybe it's not. Maybe you need to, you know, put more weight on the readability of the code, or the velocity of the code you're shipping, and maybe you need to choose Python, or maybe you need to integrate better with machine learning models and frameworks.
Costa Alexoglou 00:45:06 Again, Python; or maybe it's more of a web server, so use Go or JavaScript or anything. So, for me, after tackling and learning from this project, it's about understanding the specs of what you want to provide, effectively. And if we go backwards, if we go from the user perspective: start gathering the specs from the user perspective. For example, for us, the 100 milliseconds spec was driven by what the users should experience, and then we worked backwards from that to understand what the tech behind it should be. So that was my biggest learning after all this. And of course, I mean, I'm a little bit biased, but pair program more with your teammates. I'm sure people will love this. I loved and still love this experience. It helps me accelerate the work I'm doing, and it's also really, really nice for context and knowledge sharing. I mean, I pair with principal engineers, people with 20 years of experience. I'm always amazed by what they know, and this is the best way for me to grow personally. So yeah, those are the two learnings I would advise the listeners to take away.
Brijesh Ammanath 00:46:14 So to summarize, in my mind's eye, the way I look at it is: make sure you have a problem that you want to solve for. Once you've got the problem, make sure you've got specs for the solution that you want to build. And the third most important point is: start building. There's no point in just thinking about it and designing it if you're not building it.
Costa Alexoglou 00:46:31 On point. Exactly.
Brijesh Ammanath 00:46:33 Excellent. So maybe that's a good segue to the next section. You know, we've covered a lot of ground over here. So, as we close off, a few final questions. In terms of starting to build, if developers want to start contributing towards Hopp, where should they go and how should they start? Do you have any suggestions or advice around that?
Costa Alexoglou 00:46:52 If they want to start contributing to Hopp, we have an open-source GitHub repository. It's called Hopp, H-O-P-P, so they can go there and contribute. Right now, we have contribution guidelines, and we support macOS and Windows. We also have some issues with the label "good first issue." So, if they're junior engineers and they just want to get their hands dirty, we have some labeled good first issues, so they can tackle an easy task. And we have a really big spectrum: they can contribute to the backend, they can contribute to the database driver, they can contribute to the application itself, they can contribute to the web app, which is JavaScript and React. So, we welcome any engineering contribution, and we try to help them, of course. We have a Discord channel at the same time. So, if you're a contributor and you face difficulty running this locally, because you're missing, for example, the Rust toolchain, or you're missing Go, or you're missing something else, you're more than welcome to ping us there and we'll help you. So, this is the very first, you know, step if you want to start contributing to Hopp.
Brijesh Ammanath 00:47:59 And where do you see pair programming going in the future? So, say five or 10 years down the line, do you think that we’ll have separate tools for pair programming, or will it be just built into the IDEs or maybe even the operating system?
Costa Alexoglou 00:48:12 I think yes, we'll have different tooling, and it will become more and more of a thing, especially now when we think about distributed working, remote working, as something that for some engineers has always existed. But the reality is far from that. I mean, if you think about it, COVID accelerated remote working a lot, and that means we have not yet really adapted in many parts of even our lifestyle: how we work, how many hours we work, how we separate work from life. Because I'm sitting at the same desk where I also do hobby projects, my laptop is the same, and you know, I get pinged again, and things have not fully settled yet for this new lifestyle. But I think in the next years we'll definitely have way more tools tailored for remote working. And one vertical of remote working is pair programming, which is not yet there. We hope we are fixing this and making our own, you know, dent in the universe, our own contribution for pair programming. And I'm sure many more will come for this remote-first era.
Brijesh Ammanath 00:49:18 Right. So final question: what's one thing you hope listeners will walk away thinking about after listening to this episode?
Costa Alexoglou 00:49:27 My assumption is that many people are not familiar with pair programming, so I hope they at least see that it's a tool you can have in your tool belt that will help you become a better engineer. Whether you use Hopp or not, you may use Google Meet, you may use Microsoft Teams, you may use Zoom; whatever you use, pair programming can fundamentally accelerate your career. It can make you a better programmer, and it can make you, you know, write better code, have good knowledge of the code base, share knowledge, and help your teammates get onboarded better. So effectively, you can use it as you wish, but become a better programmer by pair programming. And as you mentioned extreme programming: you may use extreme programming, and this is complementary; many people use pair programming as a tool in that paradigm. So that's my, you know, hope after listeners listen to this podcast.
Brijesh Ammanath 00:50:33 Costa, thank you for coming on the show. It’s been a real pleasure. This is Brijesh Ammanath for Software Engineering Radio. Thank you for listening.
Costa Alexoglou 00:50:39 Thank you.
[End of Audio]


