
SE Radio 621: Xe Iaso on Fly.io

Xe Iaso of Fly.io discusses their hosting platform with SE Radio host Jeremy Jung. They cover building globally distributed applications with Anycast, using WireGuard to encrypt inter-service communication, writing custom code to handle load balancing and scaling with fly-proxy, why serving EU customers has unique requirements, letting users deploy Docker images without the Docker runtime by converting them to Firecracker and Cloud Hypervisor microVMs, the differences between regular VMs and microVMs, challenges of acquiring and serving GPUs to customers, when to use Kubernetes, and dealing with abuse on the platform. Brought to you by IEEE Computer Society and IEEE Software magazine.

Transcript brought to you by IEEE Software magazine and IEEE Computer Society. This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number.

Jeremy Jung 00:00:18 Today I am talking to Xe Iaso. Xe is the senior technophilosopher at Fly.io, I think I got that right, and the last time we spoke it was about Tailscale, but today we're going to talk about Fly.io. So Xe, welcome back to Software Engineering Radio.

Xe Iaso 00:00:33 Thanks for having me back. Great to be here.

Jeremy Jung 00:00:35 The first thing we should start with is just defining what Fly.io is for those who haven't heard about it. So why don't we start with that?

Xe Iaso 00:00:43 Fly.io is a way to run your app as containers across 35 data centers with one command, the same way you would deploy to one data center. You package your thing up as a Docker image, configure its HTTP port, press button, receive server.

Jeremy Jung 00:00:59 There’s so many different hosting providers today. What would you say are the main unique parts about Fly? I know you just touched on the multiple locations.

Xe Iaso 00:01:10 Probably the BGP Anycasting and the private networking. BGP Anycasting is a technique that basically lets you use the entire internet as your load balancer: you advertise the same IP range from multiple data centers, and then the public BGP constellation will just intelligently route to whichever is the closest in internet-geography terms. For me, all of the stuff goes to the data center down in Toronto even though the Montreal data center is technically closer. BGP is weird. And the other big thing is private networking powered by WireGuard, so that all of your internal traffic between regions is encrypted over the wire and there are no egress charges going from region to region internally.

Jeremy Jung 00:01:54 And you were saying how Montreal was closer to you, but it went to Toronto. Is there some kind of, is it Anycast that’s responsible for finding what it thinks is the shortest or the quickest path?

Xe Iaso 00:02:06 Well, it's less Anycast and more my ISP figuring out, in the BGP sense, which path is the shortest. I think that the connection from where I am in Ottawa to the Montreal data center might leave Bell's network boundary. And I don't know how Bell's network works internally, but I'd be willing to guess that crossing a network boundary is more expensive than not having to, so that's why it's sending it down to Toronto.

Jeremy Jung 00:02:31 That’s sort of interesting. So it’s not even something that you control, it’s these ISPs may decide to route this traffic somewhere that’s not necessarily the fastest but might be cheaper to them.

Xe Iaso 00:02:43 Yes. BGP routing is an absolute nightmare. Don’t look into it. If you want to sleep well at night, let the experts handle it. That’s the Hakuna Matata way to deal with network engineering. Yeah.

Jeremy Jung 00:02:52 You mentioned really briefly about connections from region to region. So people who’ve worked with hosting services, they’re used to, let’s say Amazon, they have things like VPCs and ways of creating private networks. What is different about Fly’s particular implementation?

Xe Iaso 00:03:11 VPCs are very opinionated things. They let you have basically any kind of network topology you want, for better and for worse. The way that Fly.io's private networking differs is that it's very opinionated. It is IPv6-only and you don't get to control the subnet, but it's in the IPv6 unique local address space, so statistically you're not going to be interfering with it. But you know how randomness works. I have it set up so that one of my machines in my home lab is a router from my other private network setup to my Fly.io private network, and I can use it to poke things like my large language model inference server from basically anywhere without having to worry about encrypting traffic on the wire.
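
The unique local address space Xe mentions comes from RFC 4193: a prefix starting with `fd`, followed by 40 random bits of "global ID", which is why two randomly chosen private networks are overwhelmingly unlikely to collide. A small illustration of how such a prefix is picked:

```python
import ipaddress
import secrets

def random_ula_prefix() -> ipaddress.IPv6Network:
    """Generate an RFC 4193 unique local /48 prefix:
    the fd byte (locally assigned) plus 40 random bits."""
    global_id = secrets.randbits(40)
    prefix_int = (0xFD << 120) | (global_id << 80)  # 8 + 40 = 48 prefix bits
    return ipaddress.IPv6Network((prefix_int, 48))

net = random_ula_prefix()
print(net)  # e.g. fd4a:91b2:77c3::/48
# With 2**40 possible global IDs, two independently chosen
# prefixes collide with probability 2**-40: "statistically
# you're not going to be interfering with it."
```

This is only an illustration of the address scheme, not how Fly.io actually assigns its 6PN subnets.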

Jeremy Jung 00:03:53 If I understand correctly, your applications themselves don't necessarily need to encrypt the traffic when you're doing something from, like, your home lab to this instance in Fly.io's data center, because WireGuard or whatever infrastructure you have is taking care of the encryption.

Xe Iaso 00:04:12 Yes. It does mean that sometimes you have to tell applications, yes it's okay, it's actually encrypted, trust me bro. But it works out a lot better than you'd think. Removing problems from the application layer actually makes your life easier, because it decreases the number of ways that things can go wrong and makes it easier to tell where things went wrong.

Jeremy Jung 00:04:31 And you’ve touched on how Anycast allows you to have a single IP address and routes you to potential different instances around the world. Maybe we can go into the details of let’s say you have an application and somebody connects to it. Maybe you could walk through what the steps are from there, if that makes sense.

Xe Iaso 00:04:51 Okay. So I'm going to assume that we have a somewhat sane internet connection that works and is able to resolve Google. When you make a connection to an app's IP address, your ISP will send you to one of the data centers, and that data center is running the fly-proxy program. I think it's written in Rust; the details don't matter. The fly-proxy will look up in its store of information, I think it's a Raft store, about which IP addresses map to which apps running on what ports, and then route you to either a random machine inside your current data center, or a random machine globally, or the nearest one. I forget the logic; I think they changed it recently. But either way it gets a connection to a machine somewhere and then proxies for it. This also means that if you have an instance of your app running in Seattle and you are closest to the San Jose data center, your client will connect to San Jose and then transparently get WireGuard-routed to Seattle; the response gets sent back to San Jose and back to the user. This is all transparent and actually doesn't add any latency, even though you'd think it might.
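
The routing decision Xe sketches can be illustrated with a toy model. The region names and RTT figures below are made up, and as Xe says, the real fly-proxy logic (random versus nearest) has changed over time; this just shows the shape of the decision:

```python
# Made-up RTTs in milliseconds from the San Jose edge that
# accepted the Anycast connection. Region codes are assumptions.
RTT_FROM_SJC = {"sea": 18, "sjc": 1, "yyz": 60, "fra": 150}

def route(app_regions: set[str], rtts: dict[str, int] = RTT_FROM_SJC) -> str:
    """Pick the app instance region with the lowest RTT from the
    edge data center that accepted the Anycast connection."""
    candidates = app_regions & rtts.keys()
    if not candidates:
        raise LookupError("no reachable instance for this app")
    return min(candidates, key=rtts.__getitem__)

# The app only runs in Seattle and Frankfurt; a client lands in
# San Jose via Anycast and is forwarded to the nearest instance.
print(route({"sea", "fra"}))  # -> sea
```

The forwarding itself (over WireGuard, with the reply going back through the same edge) is outside this sketch.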

Jeremy Jung 00:06:02 So your application itself doesn’t have to be location aware, it’s the proxy that’s deciding which instance.

Xe Iaso 00:06:10 I do have one of my apps running in all 35 data centers, partially because I could and partially because I wanted to see what the challenges were of operating an app in that many data centers. And it turned out the challenges were basically zero.

Jeremy Jung 00:06:23 And how would you describe this fly-proxy to someone who's familiar with other proxies, like let's say NGINX or HAProxy?

Xe Iaso 00:06:33 Think about it as like spicy, opinionated NGINX that configures itself for you. There's nothing stopping you from setting up your own NGINX on top of a Fly machine and routing things that way. It's more that the platform does it for you so you don't need to think about it. You just associate a domain and certificate with an app and you're off to the races. If you go to my blog, it just automatically routes you; I think I only have my blog single-homed in Toronto right now. I made some engineering mistakes that make it hard to go multi-region.

Jeremy Jung 00:07:06 And okay, so then your application could run without HTTPS, because the proxy in front is doing the TLS termination.

Xe Iaso 00:07:16 Yep. There is a way to do TLS termination yourself, and I think the only reason why you'd want to do that is Minecraft. Yeah, pretty sure the only reason you'd want to do that is Minecraft, although I don't remember if Minecraft does TLS termination anymore. I haven't been in the Minecraft server hosting scene for like a decade.

Jeremy Jung 00:07:33 Probably not one of your biggest customers there.

Xe Iaso 00:07:36 No. Minecraft is a uniquely bad case when it comes to hosting, because it's RAM-hungry, CPU-hungry, and single-thread-performance hungry. It sucks everywhere.

Jeremy Jung 00:07:46 So are there things that the app needs to be aware of when it's behind the proxy? Because if I create an application and I assume there's one instance that my clients are going to talk to, I would think that might be slightly different from if somebody talks to my app and they may get any number of instances. Are there specific things you need to account for there?

Xe Iaso 00:08:08 In the case of my website, I need to account for the fact that I have a really weird combination of a dynamic app and a static site. So, like, Patreon will send webhooks that will cause it to refresh the data, but it won't send the webhook to all of the regions, it will just send it to one of them, unless I do some advanced stuff like using Fly Replay to shunt the request around, to basically pass the request around like a token in a token ring network to make sure everything's updated. It won't all get updated at the same time. There are ways I can work around it, I just haven't figured out the clever way to do it yet.

Jeremy Jung 00:08:44 Yeah, because an example I was thinking of is sometimes applications expect you to be talking to the same instance because you may have some state in memory or something like that. Yeah.

Xe Iaso 00:08:55 I am not sure what the rules are, but I imagine that if you are in the same TCP session it would go to the same machine. Again, I don’t know what the rules are. I think it might vary via TCP sessions or HTTP two streams or phases of the moon or you know, other goat sacrifice related things. No wait, this is networking goat teleportation related things.

Jeremy Jung 00:09:16 Important distinction.

Xe Iaso 00:09:17 Yeah, goat teleportation is a very, very underutilized part of Chrome.

Jeremy Jung 00:09:21 And I think I saw one of the parts of this proxy is the ability to scale up and scale down, do some form of load balancing. Can you kind of explain a little bit about how that works?

Xe Iaso 00:09:34 So the proxy by itself is fairly dumb. It will just choose a random machine either in the local data center or close to it, but there's a way to add smarts to it by using a response header called fly-replay, which allows you to shunt the request to somewhere else. So if you have a webhook handler and you want to send it to a different machine, you can reply with fly-replay with that machine ID, and the proxy will shunt it over to the other machine, and even wake it up from sleep if it needs to. So you can set up an app such that it gets a request, uses the Machines API to find a list of sleeping machines, picks one at random, and shunts the request over; the machine automatically wakes up, handles the thing, and after about 30 seconds it goes back to sleep. It gets something akin to CGI, but more fancy: globally distributed, with machines.

Jeremy Jung 00:10:32 So when you’re referring to CGI, you’re saying there’s some custom code that you can run in the proxy itself to make this decision?

Xe Iaso 00:10:40 It's not in the proxy itself, it's more that your request hits an app, and then that app replies instantly with the fly-replay header to tell the proxy to replay that request somewhere else.
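
The pattern Xe describes can be sketched as an ordinary handler that returns immediately with a fly-replay header instead of doing the work itself. The machine IDs and the exact header value format here are illustrative, not the platform's precise semantics:

```python
# Hypothetical sketch of the Fly Replay pattern: if another
# machine owns the resource, answer instantly with a fly-replay
# header and let the proxy re-send the full request there.

def handle(path: str, this_machine: str, owner_machine: str):
    """Return (status, headers, body) for a toy request."""
    if this_machine != owner_machine:
        # The proxy acts on the header; the status code and body
        # of this intermediate response never reach the client.
        return (307, {"fly-replay": f"instance={owner_machine}"}, b"")
    return (200, {}, b"handled locally")

status, headers, _ = handle("/webhook", "machine-a", "machine-b")
print(status, headers["fly-replay"])  # 307 instance=machine-b
```

In the wake-sleeping-machines flow Xe mentions, the app would first ask the Machines API for a machine ID and then emit the same kind of header.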

Jeremy Jung 00:10:50 Oh, I see.

Xe Iaso 00:10:50 It only works for requests where the entire HTTP/1.1 form of the request is one megabyte or less. But if you do it with WebSockets, you can shunt over the connect request, which can allow for some really interesting things.

Jeremy Jung 00:11:02 I see. So it's more like you have an API where your application's own instance has to recognize that I'm not capable of handling this request because I'm already too overloaded, and you're basically sending this header back to the proxy so the proxy knows: oh okay, I'm going to spin up another instance, and then...

Xe Iaso 00:11:22 Either spin up another instance or like for example, let’s say that you are able to detect unambiguously somehow that a request is coming from an EU customer and you can use that to just automatically have the request get shunted back over to an EU server so you don’t have to deal with the acronym hell.
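
The EU-shunting idea can be sketched the same way: the app replies with a region-targeted fly-replay header and the proxy re-routes the request. The country detection, the region code, and the `region=` header form are assumptions for illustration:

```python
# Sketch of shunting detected EU traffic to an EU region via a
# replay header. The detection mechanism is left abstract: as the
# conversation notes, deciding who is "an EU user" is the hard part.

EU_COUNTRIES = {"DE", "FR", "NL", "IE", "SE", "ES", "IT"}  # abbreviated

def replay_header_for(country: str, eu_region: str = "ams"):
    """Return a fly-replay header value, or None to serve locally."""
    if country in EU_COUNTRIES:
        return f"region={eu_region}"
    return None

print(replay_header_for("DE"))  # region=ams
print(replay_header_for("US"))  # None
```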

Jeremy Jung 00:11:42 . Okay, this website needs cookies, right?

Xe Iaso 00:11:45 Uh, travel to Berlin or anywhere in Germany. The cookie banners have cookie banners. It's absolutely nuts. Oh yeah, and when you go to enough websites in the EU, you will have a mark-of-Cain cookie on your browser session, so that when you go back to the US you still get the EU level of cookie banners.

Jeremy Jung 00:12:04 Yeah, just in case, right? They're like, just in case you were maybe an EU citizen, we just want to be sure.

Xe Iaso 00:12:11 But then again like ramen shops in San Francisco had to worry about GDPR compliance because they may have gotten the email address of an EU resident at one point.

Jeremy Jung 00:12:20 Oh wow.

Xe Iaso 00:12:22 Yeah, it’s very well-intentioned legislation. It’s just very wide reaching because computer bad.

Jeremy Jung 00:12:28 So let's talk a little bit about the equivalent of the VMs that people are running, because I know there's a tagline on your website that says Docker without Docker.

Xe Iaso 00:12:40 Docker is the universal package format for the internet. It's pretty safe to say that the best way to deploy any arbitrary Linux application to any arbitrary Linux server is probably to put it into Docker or something that's compatible with Docker. So in this case, Fly.io uses Docker as a package format and then actually ends up running your container with a separate Linux kernel inside Firecracker under the hood. But it's basically using the Docker image as a way to encapsulate your app and all of its dependencies into its own little jail, so that it can't hurt anything or be affected by anything on the host.

Jeremy Jung 00:13:16 So the Dockerfile is the primitive, but what you're doing is you're taking the Dockerfile and creating a VM based on that?

Xe Iaso 00:13:24 Less the Dockerfile and more the built Docker image. Yeah, it creates a microVM image on Firecracker and runs it that way. It has all the networking hooked up to your private network and the scary public internet.

Jeremy Jung 00:13:37 And what’s the difference between a VM you start with say VMware or Parallels or something versus these Firecracker micro VMs?

Xe Iaso 00:13:47 The main difference is the attack surface. Firecracker is a very opinionated hypervisor. It doesn't have very many devices: it doesn't have a floppy drive, it doesn't have a PC speaker, it doesn't have USB. Firecracker doesn't even have PCI passthrough support. So it's very small, and it allows you to remove a bunch of stuff that you don't need from the kernel in order to boot, so that it can boot faster. I have seen things go from the _start symbol in the kernel to the userspace init tool getting called within about 60 to 120 milliseconds.

Jeremy Jung 00:14:23 So is that a function of them removing all those I/O components and the parts you wouldn't need in a server environment?

Xe Iaso 00:14:31 Yeah, there is a surprising amount of time in the boot process, just the kernel starting up, that is spent waiting for, like, PCI devices to wake up, or your Wi-Fi driver to load the firmware from disk or something. It's kind of remarkable how long your boot process ends up spending doing nothing. And then there are cases like that one recent thing in FreeBSD, where people figured out that they were doing driver loading using bubble sort, and they swapped that out for merge sort or something, and boot sped up by like a hundred milliseconds in some very complicated cases. But yeah: less surface area, less bloat, more fast. Although there is a case where Firecracker can't be used, and that's with GPU instances, because Firecracker doesn't have PCI passthrough, and what are GPUs but opinionated PCI devices with too much RAM.

Jeremy Jung 00:15:24 So the GPU instances, they’re more traditional VMs just because of that.

Xe Iaso 00:15:29 Actually it's basically the same thing, but they use Cloud Hypervisor instead of Firecracker. Cloud Hypervisor is a different thing. It uses the virtualization framework KVM, but it allows you to pass through PCI devices.

Jeremy Jung 00:15:41 So it still gets most of the same advantages. It’s just built by whoever built that instead of Firecracker.

Xe Iaso 00:15:48 Yes, I believe Firecracker was originally built by AWS for Lambda. I don’t know the history of cloud hypervisor but I can’t imagine it’s dissimilar.

Jeremy Jung 00:15:58 And you said the time to start them is in the hundreds of milliseconds, so that's…

Xe Iaso 00:16:04 Yeah. The thing that’s going to take the longest is your app starting up and depending on the stack you can have it be really fast or really slow. I write most of my services in Go and Rust so they start faster than the average one. But if you have your app written in such a way that it opens its port as the first thing, it’ll actually start getting traffic routed to it before it’s finished starting up and it’ll look faster.
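
The "open your port as the first thing" trick works because once the listening socket exists, incoming connections queue in the kernel's accept backlog while the rest of startup finishes. A minimal sketch:

```python
import socket
import threading
import time

def start(init_seconds: float = 0.0) -> socket.socket:
    """Bind and listen *before* doing slow startup work, so the
    proxy can start routing connections immediately."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    srv.listen(128)             # connections already queue here
    # Slow initialization (warming caches, loading data...) can now
    # happen in the background while the port is accepting:
    threading.Thread(target=time.sleep, args=(init_seconds,)).start()
    return srv

srv = start()
print(srv.getsockname()[1] > 0)  # prints True: a real port is bound
srv.close()
```

The trade-off, as Xe notes, is that the very first requests wait on whatever initialization is still running, so the app only "looks" faster.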

Jeremy Jung 00:16:27 Are there any trade-offs between using a full VM? I mean, other than the I/O parts you mentioned that are really not relevant for your use case?

Xe Iaso 00:16:37 Oh, certainly. The main intention with Docker images is that you crystallize the contents once, when the image is being built, and then you never update it ever again. A traditional VM, by contrast, is basically a giant ball of mutable state, where that mutable state is what packages it's running, what kernel it's using, and the configuration of every single thing on there. Really, what making a microVM from a Docker container does is strip away all of that and give you an opinionated floor to build up from. And that opinionated floor is: we will start the command listed in your Dockerfile's CMD entry, and then it is your problem.

Jeremy Jung 00:17:14 Yeah, and that really lowers the potential surface for you, where if you let somebody make a VM they could do pretty much anything with the VM, whereas if you force them to provide you a Docker image, they can only do what Docker allows.

Xe Iaso 00:17:30 I believe there's a mark-filesystem-as-immutable flag too. I haven't had to use it yet, but I believe it does exist. So you can do that, and when you do that, I believe it also marks /tmp as mutable, for obvious reasons.

Jeremy Jung 00:17:46 Maybe you can talk a little bit about where these micro VMs are being hosted. My understanding is you’re not a layer on top of a cloud provider like AWS or Azure or GCP. So maybe go into what you actually do and what the thinking was behind that.

Xe Iaso 00:18:05 Fly.io purchased hardware, because metal hardware will always be faster than having to share CPU time with noisy neighbors in the cloud. So they have this physical hardware, and it's distributed out across the globe in different data centers. I don't remember if I'm allowed to say which data center providers they're using, but if you are so inclined, I'm pretty sure you can look it up from a BGP looking glass; I think that might tell it. I don't know, I haven't really done networking stuff since I stopped being an SRE, or at least networking at the BGP level. And it uses AMD EPYC; I think it's at least the equivalent of Zen 2, but some of the newer things in the fleet are Zen 3 and Zen 4 equivalent. I do know that the GPU nodes, because they're the newest, have the newest CPU models.

Jeremy Jung 00:18:54 And the cores on those: when you issue someone one of these microVMs, do they just map one to one? You're not sharing the cores with other users?

Xe Iaso 00:19:03 You can choose to share if you want. There’s shared cores and performance cores. Shared cores are potentially oversubscribed and performance cores are very not. But realistically you won’t need performance cores for the P99 of use cases because most of the time your application is just going to sit there doing nothing anyways.

Jeremy Jung 00:19:23 Yeah, I mean I guess a lot of times applications are stuck in I/O.

Xe Iaso 00:19:29 Not just stuck in I/O, but, like, waiting for the next request. I did some timing analysis on my website, and even when I was on, like, the front page of Hacker News and getting, you know, thousands of hits, most of the time spent by the application was literally just waiting for the next request.

Jeremy Jung 00:19:46 Oh, because it processes requests so quickly. If you're thinking about the average webpage, it's not really that CPU intensive.

Xe Iaso 00:19:53 Yeah. The worst that it has to do is uncompress something from a zip file, and because of how caches work, I expect that to be basically free. I haven't done the math, but I haven't needed to.

Jeremy Jung 00:20:05 And when you’re talking about uncompressing from a zip file, like what’s the zip file? Why is it?

Xe Iaso 00:20:10 Oh, that's just a sin I made in designing my own website, sorry.

Jeremy Jung 00:20:13 Oh okay.

Xe Iaso 00:20:14 It serves everything from a zip file full of gzip streams so I can cheat.
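
The cheat Xe is hinting at can be sketched like this: store each asset pre-gzipped inside a zip with no recompression (ZIP_STORED), then serve the stored bytes verbatim with a Content-Encoding: gzip header, so the server never compresses at request time. This is a reconstruction of the idea, not Xe's actual code:

```python
import gzip
import io
import zipfile

# Build the archive: one member that is itself a gzip stream,
# stored without further compression.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as z:
    z.writestr("index.html.gz", gzip.compress(b"<h1>hello</h1>"))

# Serve: read the stored gzip bytes straight out of the zip and
# hand them to the client unmodified...
with zipfile.ZipFile(buf) as z:
    body = z.read("index.html.gz")
headers = {"Content-Encoding": "gzip"}

# ...and any gzip-capable client decompresses transparently.
print(gzip.decompress(body))  # b'<h1>hello</h1>'
```

Because ZIP_STORED members sit at a fixed offset, a server can even mmap the archive and copy the compressed bytes out directly, which is presumably where the "basically free" claim comes from.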

Jeremy Jung 00:20:17 Interesting, okay. Very specific case then.

Xe Iaso 00:20:21 Yeah, my website has a lot of interesting implementation sins that do make it fast but also make it annoying but do make it worthy of talking about.

Jeremy Jung 00:20:31 And it’s yours so you know how it works.

Xe Iaso 00:20:34 Supposedly on a good day.

Jeremy Jung 00:20:37 The servers. So you mentioned how you actually are buying hardware. Is it your staff that’s actually going and racking these things or is it more you rent some space out at a data center and there’s actually somebody there who takes care of that sort of thing?

Xe Iaso 00:20:52 As far as I understand, it's co-location and remote hands for the majority of use cases. There are just too many data centers on too many continents to justify having staff around every single one of those data centers. It's like 35 data centers, and it's a startup. Fly.io does have people across all those continents, but I don't know the distribution of where the people who work there are versus where the data centers are, and who has experience in, you know, hardware SRE stuff. I would be willing to bet that it's mostly remote hands.

Jeremy Jung 00:21:28 Yeah, that would make sense. I mean, there are certainly going to be cases where you need to buy more hardware and scale up, but it's not going to be this consistent thing, so I would think you would want some help from the data center there. Running a platform, you are going to get new customers and you're going to need new hardware. I wonder if you have any insight into, I suppose, capacity planning, or how much utilization you're comfortable with before you're like, okay, we need to go buy more things, that sort of thing.

Xe Iaso 00:22:00 I don't do most of the capacity planning stuff. I do developer relations, or realistically marketing, but I do know that people are keeping track of what is being used the most and trying to stay ahead of the curve with capacity planning. It's just especially difficult, because when you work at a startup and you do infra stuff, you do not gain the ability to read people's minds over the internet. So you know there's always going to be some level of nuance and guesswork, but I guess part of my job is to make their job harder by making more users sign up so that there's more capacity needed.

Jeremy Jung 00:22:36 It’s a race, you’re trying to get them more customers than they can handle.

Xe Iaso 00:22:39 Yeah, I'm trying to nerd-snipe people just fast enough that it causes them to need more capacity, but not so fast that it causes them capacity issues. I think last year or the year before, I forget which, before I worked there I wrote an article about how I thought Fly.io was the reclaimer of Heroku's magic, and apparently I caused them capacity issues. So that went well: doing my job before I even had it. Yeah.

Jeremy Jung 00:23:04 Oh okay. This was a blog post before you worked there?

Xe Iaso 00:23:06 There was a blog post that was written before I worked there, yeah. I should probably go edit it and add that disclaimer that this was written before I worked there. I probably will after this call.

Jeremy Jung 00:23:15 They got that one for free.

Xe Iaso 00:23:17 I did get some credits out of it, and those credits basically paid for my entire cloud bills, and I'm still riding off of them.

Jeremy Jung 00:23:26 Oh, nice. Because I was thinking, as far as the capacity planning thing, I mean, I know that with GPUs a lot of companies want them, right? They want access to them. And so I would imagine that would be something where you have, like, a certain pool, and if a big customer came to you and went, oh, we need 200 A100s for this period of time, you might be like, oh okay, uh, hold on a moment.

Xe Iaso 00:23:47 Yeah. The problem with GPUs is that they are backordered for, like, ages. Oh my gosh. It is very hard to source the kind of GPUs that you need, in the quantity that you need, in order to keep ahead of the curve. It also doesn't help that a bunch of the large language models that are getting released have like 70 billion parameters, and the size of a language model is like the parameter count times the parameter size in bits, plus 20% for inference space. It's absolutely just pure insanity. NVIDIA seems determined not to give people enough VRAM. Like, the machine I'm talking on has an RTX 4080, and that has 16 gigs of VRAM, and sometimes I feel limited by that. My MacBook Pro has 64 gigs of RAM. Pro tip: don't get a MacBook Pro with 64 gigs of RAM; it makes every time you go in to get it serviced a two-week wait for them to ship a new motherboard from China. The models just keep getting bigger. But my MacBook can run Llama 3 70B, which is basically GPT-4, only with a bigger token window. GPUs are really annoying. They do work, they're just annoying to source en masse.
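
The sizing rule of thumb Xe quotes (parameter count times parameter size, plus roughly 20% working space for inference) can be checked with a few lines:

```python
def inference_ram_gb(params_billions: float, bits_per_param: int) -> float:
    """Rough memory footprint: params * bytes-per-param, +20% overhead."""
    bytes_needed = params_billions * 1e9 * (bits_per_param / 8)
    return round(bytes_needed * 1.2 / 1e9, 1)

# Llama 3 70B at 16-bit weights:
print(inference_ram_gb(70, 16))  # 168.0 GB -- far beyond a 16 GB RTX 4080
# The same model 4-bit quantized:
print(inference_ram_gb(70, 4))   # 42.0 GB -- why a 64 GB MacBook can run it
```

The 20% figure is the rough overhead Xe cites; real inference memory also depends on context length and KV cache, which this back-of-the-envelope version ignores.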

Jeremy Jung 00:25:04 One of the selling points that you’ve talked about is this global deployment of apps, but I think you mentioned that your own website is on a single region. So like what’s wrong with just having a single region and using CDNs and caching? I mean that’s what I think a lot of people are used to. So what are the cases where that’s not enough?

Xe Iaso 00:25:24 Basically when the speed of light is your limit. The speed of light is a fixed constant, and nothing can really go faster than it. Like no matter what you do, the speed of light from Northern Virginia to the EU will always be at least 125 milliseconds. And if those 125 milliseconds matter, that’s when you’d scale something out globally. But otherwise you know you don’t have to care.
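
That floor can be estimated directly: light in fiber travels at roughly two-thirds of c, so a round trip can never beat twice the path length divided by that speed. The distance here is a rough assumption:

```python
C_KM_PER_MS = 299_792.458 / 1000  # speed of light: ~300 km per millisecond
FIBER_FACTOR = 0.66               # refractive-index penalty in glass fiber

def min_rtt_ms(path_km: float) -> float:
    """Theoretical lower bound on round-trip time over a fiber path."""
    return round(2 * path_km / (C_KM_PER_MS * FIBER_FACTOR), 1)

# Northern Virginia to Frankfurt, assuming ~6,700 km of fiber path
# (real routes run longer than the great-circle distance):
print(min_rtt_ms(6700))  # 67.7 ms
# Real-world routing, queuing, and indirect paths push observed
# RTTs further up, toward the ~125 ms figure Xe mentions.
```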

Jeremy Jung 00:25:45 Are there specific, I suppose customers or use cases you’ve seen where these are examples of where you really need that?

Xe Iaso 00:25:53 Yeah, a lot of it has to do with EU customers. They want all their data to be hosted in the EU for understandable data sovereignty reasons, and being able to just put everything in the EU for them is trivial, to the point where they don't even need to bother support to ask how to do it. I know that with Heroku, setting up an app in the EU meant that it was its own special type of app and had to have a different domain name than your main app, which could cause issues and the like. But with Fly.io, you know, if an app is hosted in the EU and you're in Seattle, you just get transparently routed to the EU behind the scenes.

Jeremy Jung 00:26:30 And I suppose in this EU case, do you mean that you would actually be splitting your database across regions or what would that look like?

Xe Iaso 00:26:41 I’m going to say the annoying senior software engineer thing here of it depends because it absolutely depends on like what database you’re using, how you have it set up, like how you do sharding for things that are EU versus not EU. It massively depends, but I believe the platform is flexible enough to handle whatever twisted desires your heart has.

Jeremy Jung 00:27:00 Of course, like you said, it does depend. I'm just curious, if you take someone's example of you've got a web application with a Postgres or MySQL database, and you've got this requirement of I've got customers in the EU and I've got customers in the US, I'm just kind of curious if you've seen a specific kind of thing there?

Xe Iaso 00:27:21 I haven't seen a good answer to this, and I'm pretty sure the answer is you can work around it, but it sucks. I'm not sure what the best option is, because a lot of the data sovereignty requirements are newer than most frameworks' ability to handle them. And in some cases, just the question of whether this user is an EU or US resident is a complicated thing. So I'd imagine that it's very much an it-depends, and you probably have to do some extensive hacking to your app to get it to happen. But you know, that would happen with any cloud provider, and at the very least Fly.io allows you to hide a bit of the badness, because you can request things from the EU database over the private network, and you can also request things from the North America database over the private network, and shunt requests around to the right regions, should a US citizen be requesting something from the EU or vice versa.

Jeremy Jung 00:28:12 That’s fair. So I suppose you have the primitives to decide how you want to deal with it and mostly because of the BGP Anycast component and the private networking between regions.

Xe Iaso 00:28:26 And the proxy and Fly replay working on top of both, yeah.

Jeremy Jung 00:28:29 One of the posts on your blog, you have a post that's just titled "Do I Need Kubernetes?" and the conclusion…

Xe Iaso 00:28:38 Okay. So I am actually working on a conference talk for DevOpsDays Austin. Based on the publishing schedule of this episode, the talk will already be live, and on my blog probably, and I talk about how you scale a social network from a single thing running on a developer's laptop in a hackathon to a globe-spanning abomination beyond all human comprehension. And in there I say that the point where you need Kubernetes is where you are just slightly more complicated than having multiple things in multiple regions. And you know, you can hit the point where you would probably need Kubernetes, but that's where Fly.io has your back, because Fly.io has a Kubernetes service. It uses Virtual Kubelet to translate Kubernetes YAML line noise into Fly machines for you under the hood.

Jeremy Jung 00:29:28 That specific example you gave, where you said you need more than just instances in multiple regions: what are some examples of things that you need more of?

Xe Iaso 00:29:37 More than, like, running identical copies of one region's deployment in another region. Or if you need something more opinionated, like I can only run the notification provider in the EU for EU customers, or something. And a lot of that could still be implemented on top of Fly.io directly, but there's also the Kubernetes provider should you need it. The acronym is FKS, and I don't know if I'm allowed to pronounce it on this podcast, but what you're imagining is the canonical name that we use all the time internally. I'm kind of anti-Kubernetes, and I've played with FKS a bit, and I think it might make me want to revise my anti-Kubernetes standpoint. I mostly used Kubernetes back when everything was v1 alpha or v0 alpha, all the YAML things were changing constantly, and you had to assemble a production-worthy deployment from like 17 beta things that

Xe Iaso 00:30:28 frankly broke way too often to be reliable in production. Like NGINX ingress controller was still in beta back then and that’s like NGINX. It was an absolute nightmare and it nearly made me burn out of the tech industry entirely. It just kept thinking that people said this was so much better and yet like I don’t need the flexibility to be able to handle scheduling all of the backend stuff for a Chick-fil-A restaurant or run Weapon Systems on an F35 fighter jet like I am just trying to run a couple HTP services here. Why is this so hard?

Jeremy Jung 00:31:02 Yeah and I suppose like you said, because you would only use it if you had these really specific requirements, there’s probably people that jump to it when they really don’t need to yet.

Xe Iaso 00:31:14 Yeah, I mean in terms of a hammer, it is a decent hammer. Kubernetes does get a lot of things right, like not requiring people to SSH into production to view logs; that already eliminates a huge class of vulnerabilities from developers’ laptops and NPM. Like the main way that you apply configuration changes is by sending a configuration document to the server and it just figures out the differences. That’s huge. But they have these really nice fundamentals, and then everything layered on top of it just kind of disappoints me.

Jeremy Jung 00:31:44 I think you touched on this a little bit, but the Kubernetes support in Fly.io is not actually running Kubernetes, is that right?

Xe Iaso 00:31:52 I believe it runs part of the control plane. It does have a CoreDNS container, but it’s not Kubernetes directly, it’s virtual Kubelet, which is basically a thing that takes the Kubernetes API and allows you to translate it to whatever calls under the hood, so you can use it to spin up ESXi VMs or like functions-as-a-service stuff with WebAssembly or whatever you want. And this one just so happens to shim to Fly Machines.
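To make the translation concrete, here is a minimal, hypothetical Deployment manifest (the name and image are placeholders, not anything from the episode); the idea is that under FKS, virtual Kubelet would turn each replica into a Fly Machine rather than scheduling pods onto a real kubelet:

```yaml
# Hypothetical minimal Deployment; under FKS, virtual Kubelet would
# translate each of these replicas into a Fly Machine under the hood.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpbin
spec:
  replicas: 2
  selector:
    matchLabels:
      app: httpbin
  template:
    metadata:
      labels:
        app: httpbin
    spec:
      containers:
        - name: httpbin
          image: kennethreitz/httpbin
          ports:
            - containerPort: 80
```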

Jeremy Jung 00:32:18 So in a way it’s similar to the whole concept of Docker without Docker, where you’ve got the Dockerfile and the Docker images but you’re not running them using the Docker runtime.

Xe Iaso 00:32:30 Kubernetes sans Kubernetes. Yeah. Yep.

Jeremy Jung 00:32:32 That’s your next blog post.

Xe Iaso 00:32:33 I’ll give that idea to you. I haven’t taken the time to really dig deep into FKS as much as I have wanted to, because I have been busy working on conference talks, and I put effort into those.

Jeremy Jung 00:32:47 Yeah. For the listener who can’t see, Xe just got out a big stack of papers that looks like notes or drafts for their upcoming conference talks. So you’ve worked at not just Fly; before that you worked at Heroku, so you have some familiarity with platform as a service. And I was curious, from your perspective, what are some challenges of running one that most developers don’t think about?

Xe Iaso 00:33:12 Abuse. Abuse is sometimes hilarious. Frankly, every so often somebody sets up a URL shortener and this just seems to unleash the kraken when it comes to abuse reports, where you’ll get people reporting abuse for a URL shortener where all the report says is “this violates our copyright,” and you’re like, how? There is a best-of-abuse channel inside the Slack where you have some of the human-rated best abuse messages we get, and occasionally you find somebody doing something that’s like, okay, maybe we should actually stop this. But other times you just have the ongoing war against Bitcoin miners. The war against Bitcoin miners, I’m pretty sure, is what killed Heroku’s free tier, because people discovered that the builder machines have slightly faster CPUs than everything else. So they just spun up a bunch of build things that were doing Bitcoin mining in the build step. And because we don’t live in a society where we can have all our dependencies pre-declared so that the build step doesn’t need internet access, the build step has to have internet access by design. So that allows Bitcoin mining to happen.

Jeremy Jung 00:34:19 So as a part of the build step, it’s just a really long endless build step that’s mining Bitcoin? Okay.

Xe Iaso 00:34:25 It’s just one shell script that starts up XMRig or whatever and CPU-mines. If you’re wondering if this is at all profitable for them: if they were paying for it, no, it would not be, but they aren’t.

Jeremy Jung 00:34:37 Yeah, I mean I would think they would have to open a lot of accounts to…

Xe Iaso 00:34:42 Biblical numbers.

Jeremy Jung 00:34:43 Yeah, because even if the Build servers are faster, they have to be mining incredibly slowly.

Xe Iaso 00:34:50 Yeah. CPU mining is just slow. Mining is a massively parallel operation, and that’s why people do it on GPUs or build ASICs to do it in hardware as fast as they can, but on a CPU it’s slow.

Jeremy Jung 00:35:00 And then the abuse requests you got for the link shortener, that’s just because they make a link shortener on Fly that redirects to something hosted somewhere else and then you’re getting complaints for the thing that’s hosted somewhere else?

Xe Iaso 00:35:14 Apparently. I don’t understand the logic but I’m not paid to.

Jeremy Jung 00:35:18 Yeah. So have you encountered like similar things with the mining with Fly or is it different because I think you need a credit card, right?

Xe Iaso 00:35:28 It is different because you need a credit card. Realistically it slows people down. Stolen credit cards can make it easier, but it’s an ongoing battle and we seem to be winning for now. It’s really frustrating though, because at some level having a credit card requirement is a very tight squeeze in the user acquisition funnel, and you know, if your goal is to make the user number line go up and to the right, having that tight squeeze doesn’t help. But it’s required because we live in a society. The best way to fix it is probably universal basic income across the globe, so that people don’t need to turn to using public compute cloud stuff for mining Bitcoin.

Jeremy Jung 00:36:08 It’s a problem that might be a little too much for a startup to take on.

Xe Iaso 00:36:13 A little bit, yeah. So all you can really do is pluck off the ones that you see and laugh about the rest.

Jeremy Jung 00:36:19 Yeah. In your experience as a developer advocate, what are the most common challenges you see people having? Like what’s bringing them to Fly and then what are they struggling with?

Xe Iaso 00:36:32 The thing that can actually be one of the most difficult ones is getting people to understand how cheap it actually is, because the pricing model is somewhat multi-dimensional, and most people struggle to hold more than like three dimensions of something in their head at once. It’s, you know, CPU cores versus the amount of time it’s running, versus the amount of memory, and sometimes the amount of persistent disk; that’s four dimensions and most people can’t really comprehend that. I’ve been thinking about putting some examples in there. Like, one of my projects is something called Arsen, and every 12 hours it queries some APIs to get things like the phase of the moon, the current astrological sign, three tarot cards, and the current price of Ethereum in United States dollars. And it uses that to make a horoscope about what’s going on, using a large language model hosted on a Fly GPU machine. And I priced that out and it would cost me about $15 per month in the worst possible case. And I have since optimized it further by using Llama 3 8B instead of Mixtral 8x7B, so it’d probably be down to closer to 10 bucks a month.
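The four pricing dimensions Xe mentions can be sketched as a small calculation. The rates below are made-up placeholders for illustration, not Fly.io’s actual prices, and the billing model here is a simplification:

```python
# Hypothetical usage-based pricing sketch. The rate constants are
# invented placeholders -- check the provider's real pricing page.
def monthly_cost(cpu_cores, ram_gb, disk_gb, hours_running,
                 cpu_rate=2.00, ram_rate=0.50, disk_rate=0.15):
    """Combine the four dimensions: CPU cores, memory, persistent
    disk, and the fraction of the month the machine is running."""
    hours_in_month = 730
    run_fraction = hours_running / hours_in_month
    compute = (cpu_cores * cpu_rate + ram_gb * ram_rate) * run_fraction
    storage = disk_gb * disk_rate  # disk is billed whether running or not
    return round(compute + storage, 2)

# A machine that wakes for roughly 2.5 minutes a day, like the Arsen
# example, runs about 1.25 hours a month: compute is nearly free and
# persistent disk dominates the bill.
hours = 2.5 / 60 * 30
print(monthly_cost(cpu_cores=8, ram_gb=16, disk_gb=10, hours_running=hours))
```

The point of the sketch is the shape, not the numbers: scale-to-zero makes the time dimension collapse, which is why intermittent workloads come out far cheaper than people expect.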

Jeremy Jung 00:37:42 Yeah, I think because for myself I was sort of curious, for somebody running something that would require a GPU, how long does it take to even spin up? Like if it’s not running and you need to start from a cold start, is there a period to even get things running?

Xe Iaso 00:38:03 Yeah, the good news is that things are fast enough that the weights should load from NVMe volumes to GPU memory in like single-digit minutes worst case. It all depends on how you’re running the model, how the model is loaded, what the bit width of the model is. Because loading an int4 model into a GPU that doesn’t have int4 in hardware can be slower, because it has to convert the tensors to something that the GPU can understand. It’s complicated, annoying, I hate it, and computers are bad, but it does work well enough. I have found that Llama 3 7B or Llama 3 8B at float16 seems to load the fastest, because GPUs natively have float16 support.
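The bit-width point reduces to simple arithmetic: weight size is parameter count times bytes per parameter, and load time is bounded below by how fast the storage can stream that many bytes. The throughput figure here is an assumed round number, and real loaders add conversion and allocation overhead on top:

```python
# Back-of-the-envelope sizing for model weights. Parameter counts and
# the assumed NVMe throughput are rough illustrations, not measurements.
def weight_size_gb(params_billion, bits_per_param):
    """Size of the raw weights in GB for a given bit width."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

def load_seconds(size_gb, nvme_gb_per_s=3.0):
    """Lower bound on load time, assuming weights stream straight from
    NVMe at ~3 GB/s with no format conversion along the way."""
    return size_gb / nvme_gb_per_s

# Llama 3 8B at float16 is ~16 GB of weights; quantizing the same model
# to int4 cuts that to ~4 GB, but only helps if the GPU and runtime can
# consume int4 without converting tensors on load.
print(weight_size_gb(8, 16), weight_size_gb(8, 4))
```

This is why a float16 checkpoint can load faster than a smaller quantized one: fewer bytes only wins when the hardware consumes that format natively.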

Jeremy Jung 00:38:44 Yeah, and I suppose I can see your point about determining price being a little bit difficult, because in one of the example projects I saw, somebody built the equivalent of like a Midjourney. They built this Discord bot that would spin up a GPU and use the Stable Diffusion image generation model to generate images. But looking from the outside in, it wasn’t clear to me, if, let’s say, people only used it every few hours or so, how much was I going to be spending? And so I can see how that could be a challenge.

Xe Iaso 00:39:21 Yeah, my Arsen thing uses about two and a half minutes of GPU time per day, which usually works out well enough. Because I’m about to upgrade the model to Llama 3 8B, I can move it to a smaller GPU size, probably an L40S in Seattle, and I’m going to be able to save a fair bit there. Because sometimes, with enough GPU machines sleeping on the same server box, you can run into a case where you try to request something and it can’t start up because all the GPUs are in use. That’s a rough edge that I know people are working on. It’s just something that has never come up before, because GPUs are weirdly sticky and you can’t share GPUs between people for obvious reasons.

Jeremy Jung 00:40:06 Yeah, and I suppose your use case too is where you know it’s going to be on a daily interval and it sounds like okay, you can predict it’s going to be the three minutes or so and when it’s dependent on user interaction then it gets a lot more difficult to predict.

Xe Iaso 00:40:21 Yeah. And that’s when you have like a hot pool of GPUs, and you spin up more when you need to, and everything’s complicated and it’s expensive. But usually by that point you’re making enough money that you can afford to figure it out. I have been thinking about using the L40S’s NVENC cores, I think they’re called NVENC, to do video encoding a bit better than I can on my CPU at home, but I just haven’t gotten around to it yet.

Jeremy Jung 00:40:48 So I think maybe the last thing we should touch on is, because Fly’s specialty, I suppose, is running globally distributed applications, are there specific use cases or databases or software that really work well in that environment that people should think about?

Xe Iaso 00:41:08 As a TL;DR: if your app subscribes to the 12-factor app philosophy that Heroku published a decade or so ago, you’re going to be fine. There is an S3-compatible store called Tigris that runs on top of Fly.io itself, so it can also act as your CDN for you. So you can get that basically for free as long as you pay for Tigris, which goes on your bill. In terms of databases, I mean, Postgres probably doesn’t suck. I don’t know about how to scale Postgres globally because I’ve not been a database SRE. I assume that it’s either solvable or you use Supabase to solve it for you. Other databases that might go well: Valkey, née Redis, probably would be fine with a Valkey cluster or whatever they use for it. I know that there’s Upstash, which has basically Redis that can be used as an extension. Running down the list, what else is there? Either way, a lot of the databases are like, they can do it; it probably sucks just about as badly as it will elsewhere, but it won’t be like uniquely bad.

Jeremy Jung 00:42:14 One other thing I could think of is that some providers will have, because we were talking about abuse earlier, they’ll have a web application firewall. So if somebody’s just hammering your app not to use it, but because for whatever reason they don’t like you. For your customers, I suppose what are things you suggest they do in that case?

Xe Iaso 00:42:34 Most of the reason why you’d need a WAF is if your compliance people say you need a WAF, and in that case Cloudflare is probably fine. At this time there’s not a web application firewall built into the platform, but the platform would be a good place to build such a WAF should you need it to exist, especially because all of the WireGuard networking means that you can have your WAF use user-space WireGuard connections to route traffic to the individual apps.

Jeremy Jung 00:43:02 So I suppose the proxy itself, I mean is there anything there that might protect you in any way?

Xe Iaso 00:43:09 I am not sure. I could imagine how you could make a WAF with the proxy; I just don’t think there’s one there already. As I said, the proxy is kind of dumb.

Jeremy Jung 00:43:17 And then you mentioned Cloudflare. Do you have customers that are using CDNs in combination with what they have on Fly?

Xe Iaso 00:43:26 Yes. The native CDN via Tigris is very new still, so I’d imagine a lot of existing customers would use their own CDN stuff, either from Cloudflare or, like me, a bespoke setup. And I recently got rid of the bespoke setup because it was frankly causing more problems than it gave me benefits, and I switched it over to Tigris and I haven’t been happier, because I don’t have to think about it.

Jeremy Jung 00:43:49 And then those outside CDNs, they don’t have any issue with the fact that when they talk to the proxy, you’re not really sure where they’re going to be routed? Right? Like in a lot of traditional applications it’s like, I know it’s talking to Virginia, but in this case…

Xe Iaso 00:44:06 Yeah. I don’t know how that interacts there. It’s probably one of those BGP goat teleportation questions. I don’t know what’s up there. I’m assuming that it’s solvable as most things are.

Jeremy Jung 00:44:17 Is there anything else that you think you wanted to mention or thought we should have talked about?

Xe Iaso 00:44:21 Fly.io does have a Machines API to let you spin up more compute, like opening a faucet to get more water, and that lets you get away with a lot. There’s a lot of fun ways to use the Machines API. When I upload image files from my laptop to Tigris, I actually have something wired up so it’ll spin up a high-CPU, high-RAM machine just to do the image conversions, upload things to Tigris, and then shut down, because image conversion is one of the surprisingly worst cases for CPU and RAM usage. And I have that wired up as an internal API call for my CDN, named XeDN. And when other things like Arsen need to upload an image, they will send the image to XeDN, XeDN will spin up the machine, the machine will do the image conversion, upload it to Tigris, and give a URL back to XeDN, which gives it back to Arsen, which puts it on the website. So that allows you to chain stuff together and get multiple uses out of things. It’s really like the ideal spherical-cow reading of microservices architectures.
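The faucet pattern can be sketched against the Machines API. The app name, image, machine sizes, and the exact request shape here are assumptions for illustration; consult the Machines API documentation for the real fields before relying on any of it:

```python
import json
import urllib.request

# Sketch of the "spin up a big machine for one job, then let it go away"
# pattern. Endpoint and payload shapes are assumptions, not a verified
# client -- check the Fly Machines API docs for the authoritative format.
API = "https://api.machines.dev/v1"

def machine_config(image, cpus=8, memory_mb=16384):
    """Build a request body for a beefy, short-lived worker machine,
    such as one that does a single image conversion and exits."""
    return {
        "config": {
            "image": image,
            "guest": {
                "cpu_kind": "performance",
                "cpus": cpus,
                "memory_mb": memory_mb,
            },
            # Assumed flag: destroy the machine once its process exits,
            # so you only pay for the seconds the job actually runs.
            "auto_destroy": True,
        }
    }

def create_machine(app, token, image):
    """POST the config to create one machine in the given app."""
    req = urllib.request.Request(
        f"{API}/apps/{app}/machines",
        data=json.dumps(machine_config(image)).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

In the XeDN-style chain, a service would call something like `create_machine` when a job arrives, wait for the worker to upload its result to object storage, and rely on the machine tearing itself down afterward.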

Jeremy Jung 00:45:26 So I suppose, because at the beginning of the conversation we talked a little bit about how there’s some form of automatic scaling, where the proxy can decide to start instances or shut them down. But this Fly Machines API is you being able to directly control when to create them and when to shut them down.

Xe Iaso 00:45:47 Yes. You can find out more at

Jeremy Jung 00:45:51 Cool. So if people want to learn more about Fly or check out what you’re up to, because I know you have a really excellent blog, where should they head?

Xe Iaso 00:46:00 They should head to xeiaso.net. That is X-E-I-A-S-O dot net, or X-Ray-Edgar-India-Alpha… I got them confused because I use voice control software, and the voice control software word for the letter I starts with the letter S.

Jeremy Jung 00:46:21 Ah, okay.

Xe Iaso 00:46:23 It’s Sit for the letter I.

Jeremy Jung 00:46:24 That’s going to rewire your brain there.

Xe Iaso 00:46:26 Oh yeah. Don’t mess up your hands, because voice control software will mess you up. I can’t spell anything for my husband anymore.

Jeremy Jung 00:46:34 Cool. Well Xe, thank you so much for chatting with me.

Xe Iaso 00:46:39 Thank you for having me. I hope you’re all listening. Have a good day. Stay hydrated.

Jeremy Jung 00:46:42 This has been Jeremy Jung for Software Engineering Radio. Thanks for listening.

[End of Audio]
