SE Radio 622: Wolf Vollprecht on Python Tooling in Rust

Wolf Vollprecht, the CEO and founder of Prefix.dev, speaks with host Gregory M. Kapfhammer about how to implement Python tools, such as package managers, in the Rust programming language. They discuss the challenges associated with building Python infrastructure tooling in Python and explore how using the Rust programming language addresses these concerns. They also explore the implementation details of Rust-based tooling for the Python ecosystem, focusing on the cross-platform Pixi package management tool, which enables developers to easily and efficiently install libraries and applications in a reproducible fashion. Brought to you by IEEE Computer Society and IEEE Software magazine.

Transcript

Transcript brought to you by IEEE Software magazine and IEEE Computer Society. This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number.

Gregory M. Kapfhammer 00:00:18 Welcome to Software Engineering Radio. I’m your host Gregory Kapfhammer. Today’s guest is Wolf Vollprecht, the CEO and founder of Prefix.dev. He is the creator of both the Mamba package manager implemented in C++ and the Pixi package manager implemented in Rust. Wolf is also the creator of numerous packages on Conda Forge, a core member of the Conda Forge team and the founder of the RoboStack project. Welcome to Software Engineering Radio, Wolf.

Wolf Vollprecht 00:00:49 Thanks for having me. I’m very excited to be here.

Gregory M. Kapfhammer 00:00:52 I’m glad you’re here. Today during our episode we’re going to be talking about Rust based tooling for the Python programming language. To keep our conversation concrete, we’re going to focus on the Pixi package manager. To get our conversation started, I have a two-part question. The first part is, what is the Pixi package manager and then secondly Wolf, why did you choose to implement it in Rust?

Wolf Vollprecht 00:01:17 Excellent questions. So for Pixi, we are actually starting to also call it a workflow manager, so it’s not just a package manager. Pixi is a project that we’ve recently started to kind of kick off the next generation of package management for the Conda ecosystem. We are standing on the shoulders of the entire existing Conda ecosystem, so all the packages that already exist are compatible with Pixi. And Pixi gives you a very modern workflow to work with projects, using these operating-system-agnostic packages, in a way. So you can easily manage your projects on Windows, Mac OS and Linux using Pixi as a package manager. At the same time, through our Conda heritage, we have a very tight integration with the Python ecosystem as well. And yeah, Pixi basically lets you first of all manage your packages and dependencies, and then create some cross-platform tasks that you can easily execute on any computer, with the idea that all you have to do is git clone my project and then pixi run start. And that will do all the magic that’s necessary to get your project up and running on the computer that you are on.

Gregory M. Kapfhammer 00:02:32 That sounds really interesting, Wolf. We’re going to dive into the specific details of the Pixi workflow management system. Before we go into the details, can you tell us why did you choose to implement it in Rust?

Wolf Vollprecht 00:02:44 Yeah, for sure. So we actually started out with our low-level libraries, and they’re called Rattler. Rattler is implemented as a set of Rust crates, which is just a Rust word for libraries, I think. And the Rattler project essentially implements everything you need to deal with Conda packages, from downloading what’s called repo data to installing the packages into a target environment and doing everything in between. It’s implemented in that low-level library, and this library is actually used across all of our projects. So on our website, Prefix.dev, we use Rattler in the background. We use Rattler in our tool to build Conda packages, which is called rattler-build, and in Pixi. And so the first motivation for using Rust here was that we could easily use it also in our server backend, because before I was developing a lot on Mamba, and Mamba has a C++ library in the back, and that was not easy to use in a server environment because of the memory safety issues of C++. And actually there’s not such an extensive ecosystem of server libraries in C++, and it’s much easier to do that with Rust in my experience.

Gregory M. Kapfhammer 00:04:05 Alright, thank you for that response. We’re going to start talking a little bit about some of the Python tools and the Python ecosystem and then later we’re going to investigate the specifics of how you built your systems using Rust. So let’s start with some of the basics of Python package management. There are a couple of tools that I think we should discuss. I anticipate some of our listeners will know about them, but Wolf, if you could give an intro, what are the tools that are called Pip and Pipx and how do developers use those?

Wolf Vollprecht 00:04:35 Yeah, so Pip is sort of the, I don’t know, grandfather is maybe the wrong word, but one of the basic package managers in the PyPI ecosystem. And the PyPI ecosystem again is something else than the Conda ecosystem. PyPI stands for Python Package Index, and it’s basically a community-managed website that hosts all of the Python packages more or less in existence. And it’s very easy to just upload and self-publish your own package to the Python Package Index. It’s very widely used by Python developers, and Pip is sort of the premier tool, like the community standard tool I would say, to install these packages. But it’s also, well, simple is maybe the wrong word, but it implements a very bare-bones kind of experience. It just basically fetches packages and installs them, while other package managers maybe do some more things like lock files. But we’ll talk about that later, I guess. And then Pipx is a tool to make packages globally available on your CLI, so that you can pipx install cowsay or something like this (cowsay is just a fun executable), and when you install that with Pipx it will be available on your command line and you can just use it as if you had installed it system-wide with apt-get or something like this.

Gregory M. Kapfhammer 00:05:59 Yeah, thanks for those responses. Let’s talk now about some perhaps slightly more modern Python package management tools. The two that come to my mind are Pipenv and Poetry. Can you talk a little bit about how those tools work?

Wolf Vollprecht 00:06:12 So the honest answer is that I haven’t extensively used either of those two tools.

Gregory M. Kapfhammer 00:06:18 Okay, that’s fine.

Wolf Vollprecht 00:06:19 But I know a few things about Poetry at least, because when we developed Pixi we looked at the implementations quite a bit, and Poetry has one really nice feature, which is lock files, and I already mentioned that before. Lock files basically record exactly the package versions that you are using at the time when you were executing something like poetry install, I assume. And this record is great because then you can easily share it with your collaborators, and they will get the same versions that you have running on your system, and you’re not going to run into any conflicts based on using different versions and maybe producing different results or running into a bug and stuff like this.

Gregory M. Kapfhammer 00:06:58 Thanks for that response about Poetry. I agree. One of the cool things about the tool is that it first of all adheres to the pyproject.toml standard in many ways and second of all it has those lock files that you’ve mentioned that support reproducible builds. What I’d like to do now is turn our attention to Conda and Mamba once again. Can you tell us a little bit about these tools and how they fit into the Python ecosystem?

Wolf Vollprecht 00:07:22 Definitely. So Conda and Mamba are two tools that also work in the Conda ecosystem. That’s like the sort of bigger term here. And Mamba is a re-implementation of all the principles of Conda but in C++, and Conda was one of the first package managers that basically brought binary packages to the Python ecosystem. So that was before wheels were a thing. But fundamentally Conda also works quite differently from Python packages, or specifically the PyPI index, because with Conda you have channels, which are repositories. That’s more modeled after a Linux distribution, where all the packages have to work together and be compatible with each other. So the Conda ecosystem has a somewhat long history; I think it’s been definitely over 10 years by now. And fundamentally, if you look at them, the Conda packages themselves are binary packages that work on multiple platforms.

Wolf Vollprecht 00:08:21 So you have Windows, Linux and Mac OS support, and more specifically Conda has a very rich tradition in the scientific Python ecosystem, because you usually combine Python with C++ or C code or even Fortran. And that has been traditionally very difficult to support on the Python Package Index. And so Conda has evolved alongside that. What we are trying to do is get it even more widespread into more ecosystems, but I’m coming to that later on as well. And Mamba: at some point Conda became very slow, because one of the fundamental things that Conda does is to use a satisfiability solver to figure out all the dependencies of a package and all the matching versions. And there’s a lot of things that I have to explain, but Conda Forge is one of the largest channels, and a channel is one repository. Conda Forge is a big open-source community with over 5,000 members, and it experienced some exponential growth, essentially because more and more packages and package versions were added, and Conda became really, really slow, unbearable even. And what I did is, over a weekend, I sort of sat down, tried some tools from the Linux ecosystem, especially libsolv, and used libsolv as an alternative SAT solver in Conda to get higher speed. Mamba is a result of that work, where essentially we use libsolv to really drastically speed up Conda. And nowadays I’m happy to report that libsolv and libmamba are actually integrated into Conda so that all Conda users can experience that speed.
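
To make the idea of solving for matching versions concrete, here is a minimal sketch in Python. It is not how Conda, Mamba or libsolv work internally: it simply tries every combination of versions and returns the first one that satisfies all constraints, which is exactly the kind of exhaustive search that becomes too slow as an index grows. The package index, the version numbers and the `resolve` function are all made up for illustration, and the sketch assumes every package in the index must be installed.

```python
from itertools import product

# Hypothetical package index: name -> {version: {dependency: allowed versions}}.
INDEX = {
    "app":    {"1.0": {"numpy": {"1.26", "2.0"}, "python": {"3.11", "3.12"}}},
    "numpy":  {"1.26": {"python": {"3.11"}}, "2.0": {"python": {"3.12"}}},
    "python": {"3.11": {}, "3.12": {}},
}

def resolve(index):
    """Return one version per package such that every dependency
    constraint holds, or None if no such assignment exists."""
    names = sorted(index)
    # Plain string sorting stands in for real version ordering here;
    # it is only good enough for this toy data. Newest is tried first.
    choices = [sorted(index[name], reverse=True) for name in names]
    for combo in product(*choices):
        picked = dict(zip(names, combo))
        satisfied = all(
            picked[dep] in allowed
            for name, version in picked.items()
            for dep, allowed in index[name][version].items()
        )
        if satisfied:
            return picked
    return None

solution = resolve(INDEX)
```

The number of combinations grows multiplicatively with every package and version added, which is one way to see why a dedicated SAT solver matters for a channel the size of Conda Forge.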

Gregory M. Kapfhammer 00:09:59 Thanks for that response. You mentioned that one of the challenges a Python programmer might face if they were using Conda was something that was related to performance. And I should note that later we’re going to talk about the details that are related to SAT solving and how you do that in Rust with a certain package that you integrated into Pixi. But before we get into the details about Rust based tooling with Pixi, can you comment on any other challenges that a programmer may face when they’re using these existing Python package management tools?

Wolf Vollprecht 00:10:30 So if you use some of the simpler tools like Pip and Pipx, you might run into performance problems as well, because there are some challenges that we also discovered during our work in the PyPI ecosystem, where package metadata is just very difficult to solve for in the Python ecosystem itself. So on PyPI you might face issues with version compatibility and ABI versioning and things like that, which are more consistently manageable in something like the Conda ecosystem because, again, in Conda you basically use a channel, and a channel is a managed repository where certain things are kept in lockstep, like the compiler versions, the library versions that are used, the linked libraries and all of this. And that’s one of the big pluses of, yeah, the Conda ecosystem.

Gregory M. Kapfhammer 00:11:21 Okay, thank you for sharing those details. I want to turn our attention now to using Pixi, which is a tool that you implemented in Rust. I remember you said before that it’s for a whole management of a workflow. We’re going to talk about some of the specifics about how it helps to manage that workflow. One of the things that I noticed on the website that describes Pixi is that people can use it in a cross-platform fashion on Windows, Mac and Linux. My question is as follows, what did you have to do to make the tool work effectively on three major operating systems?

Wolf Vollprecht 00:11:57 So the majority of work was already done because, well, we are piggybacking on the Conda ecosystem and Conda Forge specifically as a really large repository of existing packages. And so a lot of the nice tools that we want to use, or our users might want to use, are already available. But then there are some more intricate challenges, because Windows and Unix are usually quite different, and especially there is no Bash, or not really a good Bash, on Windows except for Git Bash, but it’s not really tightly integrated into the operating system. And so what we are doing is we are using something from the Node.js community, specifically from Deno; it’s called deno_task_shell. For our cross-platform tasks, it basically takes something that looks like a small Bash snippet but executes it in its own shell. And that’s one of the things where we were really happy to use Rust, because we could just reuse an existing library there.

Gregory M. Kapfhammer 00:12:53 Aha. So this highlights one of the benefits of using Rust in order to build Pixi. Another thing that I noticed is that you say that Pixi can be used to install both libraries and applications. Can you talk a little bit about the distinctions between a library and an application?

Wolf Vollprecht 00:13:09 So just generally, one of the features of Pixi is that you can do pixi global install. That is very similar to Pipx, which we discussed before: if you install something globally, it puts an executable on your path, and that means you can use it from anywhere on your system. And so that is something that you would use for applications that you want to have globally available, that are not project specific, et cetera. On the other hand, libraries: we manage loads of libraries on the Conda Forge channel, and those are things that you use inside of your project to link against at the level of code. And so you would add these development libraries, like NumPy for example. It’s not really an application; it’s a library that you use to do some array computing. And so yeah, that’s what you would use in your projects. And for the projects you can either use a pixi.toml file or a pyproject.toml file and define all your dependencies there and then get them.

Gregory M. Kapfhammer 00:14:07 Okay. You mentioned these toml files and perhaps that connects to my next question. I wanted to talk a little bit about reproducibility. It seems that reproducibility is at the core of Pixi. Can you explain what is this reproducibility, why is it important and how does Pixi provide it?

Wolf Vollprecht 00:14:26 So I think reproducibility is important in many ways. First of all, there’s a lot of talk about reproducible builds, and that’s also something that we want to support with rattler-build, for example: that you can be really sure that you can reproduce this exact program on your computer if you compile something, et cetera. And then on the other hand, as I mentioned before, Conda is big in the scientific ecosystems, and reproducibility in the scientific world is also a big problem. With Pixi we want to make sure that a scientist who uses Pixi can get the same results now and also get the same results in 10 years when running the same data pipelines and things like this. And we make sure of that by creating a very tight lock file that locks down packages down to the SHA hash. And the other magic ingredient here is that the Conda Forge ecosystem never deletes any old packages. So that means our channel, or the Conda Forge channel, is growing sort of unbounded. It’s currently at 12 terabytes or something, but it also means that you can reach back and get the packages from a few years ago, or you could even get Python 2.7 if you really wanted to.
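
The idea of locking packages down to a hash can be sketched in a few lines of Python. This is an illustration only and does not reflect the actual pixi.lock format: the `lock` and `verify` helpers, the dictionary layout and the choice of SHA-256 are assumptions, and the byte string stands in for a real package archive.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()

def lock(packages):
    """Record the exact version and content hash of each package archive."""
    return {
        name: {"version": version, "sha256": sha256_hex(blob)}
        for name, (version, blob) in packages.items()
    }

def verify(lockfile, name, blob):
    """True only if the bytes match the hash recorded at lock time."""
    return lockfile[name]["sha256"] == sha256_hex(blob)

# Hypothetical archive contents standing in for a real package file.
packages = {"numpy": ("1.26.4", b"numpy archive bytes")}
lockfile = lock(packages)
```

Because the recorded hash depends on every byte of the archive, a collaborator who later downloads something different for the same name and version can detect the mismatch before installing it.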

Gregory M. Kapfhammer 00:15:41 Yeah, what you’re describing is an incredible feature of Conda. Thank you for sharing that detail. One of the things I also note as a Python programmer myself is that it’s really important to create a certain kind of isolated environment. Sometimes people call that a virtual environment. Is it possible to create these virtual environments with Pixi?

Wolf Vollprecht 00:16:02 Yes, very much so. At the basis of Conda is also the idea of these environments. They go a bit further than in the Python ecosystem because we also put any shared libraries and everything into the virtual environments. You essentially get an entire Unix prefix in your Conda environment or Pixi environment. And with Pixi, even if you globally install a package, it lives in its own environment. It is sort of isolated from all the other tools that you globally install. And then additionally, when you create your Pixi project with a pixi.toml or pyproject.toml file, you can create one or multiple environments for this project, and they are local to your project. They live in a .pixi folder. And you can have, for example, one environment that’s just your default environment. That’s the one that you always get to sort of run your project, and then you can create multiple additional environments, for example for testing or for linting or for all these kinds of more specific tasks that you might want to do but that are not really required for the average user of your project.

Gregory M. Kapfhammer 00:17:10 Hmm, that’s interesting. I wanted to pick up on something that you talked about a moment ago in the context of reproducibility. If I’m understanding correctly, I think Pixi actually packages a version of Python along with a project. Is that true?

Wolf Vollprecht 00:17:25 Yes, that’s very much true and that’s also one of the core things of the Conda ecosystem. You can not only get Python packages, but you can also get Python and a load of different other tools like R or C++ compilers, Fortran compilers, a number of modern languages like Rust or Go or Nim or Zig. All of these are packaged on Conda Forge and all of these can be used in your projects essentially. So it goes beyond sort of just Python, but to really sort of capture the entirety of the Python ecosystem, you also need to kind of have a C or C++ compiler essentially. Because as I said, a lot of the Python tools have C or C++ extensions. And so in the Conda ecosystem we manage all of them.

Gregory M. Kapfhammer 00:18:15 So from the perspective of having an isolated reproducible build environment, why is it important to package these programming languages along with the actual application that you’re building?

Wolf Vollprecht 00:18:27 It’s important for Conda Forge as a distribution. It might not be important for your application itself, but the benefit for us is that you get the compilers and all the control over the compilers, for example at the time when we are building the packages for distribution later on. And so in Conda, we manage exactly the version of, for example, the C compiler that is used, and then we build, for example, the Python extensions with that compiler. The reproducibility comes in because, for example, a new compiler might change the output of your build. And when I say reproducible builds, the idea is that you really get bit-for-bit compatible or equivalent builds even if you are running on different machines. And that’s what we are striving for, where you can get exactly the same builds. And our way of doing that is by tightly controlling the Conda packages that are used and their versions.

Gregory M. Kapfhammer 00:19:23 If I’m understanding you correctly, it sounds like the behavior of my scientific application should be exactly the same, whether I run it on Windows or Mac OS or Linux, is that correct?

Wolf Vollprecht 00:19:35 That’s almost correct. So I mean that’s the ideal case for sure, but we cannot control the differences between compilers on different operating systems because we are using different compilers on different operating systems. So if you have some C code, it might actually have different optimizations enabled on Mac OS, on Clang, on Arm processors versus Intel processors, and things like this. So I cannot give you that guarantee, but ideally it should be true.

Gregory M. Kapfhammer 00:20:00 That’s a really wonderful comment. I think what you’re telling me here is that you’re still dependent on the idiosyncrasies associated with compilers for different operating systems. Okay. What I want to do now is to turn our attention to other types of package managers, like, for example, Poetry or Pipenv, which we talked about previously. If I’m remembering correctly, tools like Poetry do not actually bundle the Python language or a C++ compiler directly with the package itself. And in fact, Poetry assumes that you have to install Python on your own first. Can you briefly comment on why that could be challenging for a Python developer?

Wolf Vollprecht 00:20:43 It’s definitely challenging, because if you want to support your users on that journey and they want to use your project on an operating system that you don’t know, you first have to figure out how you can get Python on that operating system. And yeah, I think that’s one of the bigger hurdles for adoption, and that’s one of the things that Rust, for example, has solved in a relatively nice way by creating something called rustup. You just run rustup and you get some version of Rust, whether it’s on Windows, Mac OS or Linux, and I think the Python community currently lacks this a little bit. And then the second challenge is that, you know, a project might want to use the latest features and require Python 3.12. And now I don’t know if the Windows app store has Python 3.12 or what version they are on, or Homebrew, or a number of these different system-level package managers.

Gregory M. Kapfhammer 00:21:37 Thanks for sharing that point. I agree if you have to install Python yourself, you can often run into a whole bunch of challenges. I know that there are some tools that try to step into the gap in order to address those challenges. Maybe systems like Asdf or you could use RTX or I know some people use the Nix language or the Nix package manager. Can you briefly comment how do tools like Nix or Asdf or RTX fit into this picture that we’re developing?

Wolf Vollprecht 00:22:06 Yeah, so they’re pretty different tools, I would say. So Asdf, as far as I know, basically knows how to get a number of different tools. Like, you can get Node.js or Python or maybe Rust or other sort of lower-level system tools that you want, independently of your operating system, but they basically take the releases from some official, I guess, repository or GitHub releases or something like this. Nix is pretty different in that respect, because Nix is a software distribution, similar to Conda Forge. They have their own recipes, they do their own compilation of all of the different tools, and you basically buy into a software distribution that is also hosted on GitHub. So there’s one big Nix packages repository that has all these recipes, and when you get, for example, I don’t know, Node.js or Python from Nix, it might be compiled on your own system, or it might use some sort of global cache of all the Nix packages that are available. But these are not from the official releases; they come from the Nix maintainers, and that’s much more similar to Conda Forge, where we try to build all the packages ourselves on CI pipelines.

Gregory M. Kapfhammer 00:23:28 Thanks for illustrating some of the connections between Nix and Conda Forge and then explaining a little bit about how Asdf works. I’d like to turn our attention to Pixi, which is the Rust-based package manager and workflow manager that we’re talking about now. I think it’s helpful for us to walk through some of the steps that a Python developer might follow, and it would be really helpful, Wolf, in addition to commenting on what you would do to achieve that step, if you could tell us a few technical details about how Pixi actually achieves that goal. So let’s start with the first step. I’m assuming that if I’m a Python programmer and I’m using Pixi, somehow I have to initialize my project. Can you briefly comment on how I do that using Pixi and then again, how does Pixi achieve that goal?

Wolf Vollprecht 00:24:18 So maybe we should start with the first step, which is to install Pixi. On some systems you might be lucky and you can just use something like brew install pixi, which is using another package manager to get another package manager, which is kind of funny, but that’s one way. And the other way, which works on all operating systems, is to use the command that we have in our documentation at pixi.sh, and then you can use Curl and Bash to basically download the version of Pixi and run it. And after that you can use pixi self-update to update your Pixi version easily. Now if you want to start a project using Pixi, you first have to decide whether it is a Python project. Then you can use pyproject.toml, or you can just use a pixi.toml file for Python projects or anything else.

Wolf Vollprecht 00:25:02 And so we recently adopted the pyproject.toml standard specifically for Python users, but as I mentioned before, Pixi is sort of more universal, so we also have our own format. And the way that it works is you do pixi init, which will create an empty folder with the pixi.toml file inside, and then you can start filling out some metadata about your project: for example, the author, the version, a description and these kinds of things. And that’s copied a little bit from the Cargo.toml file that’s used in the Rust community. And then the key thing is that you can define your tasks and your dependencies. In your dependencies you can basically specify Python as a dependency and the Python version that you want to use. And the rest of your dependencies can be things like pandas, NumPy, Jupyter notebooks, et cetera, et cetera.

Wolf Vollprecht 00:25:52 And then there are two more things that you need to specify. So the first one is you need to have an understanding of the platforms that you want your project to run on. By default, Pixi will use your current platform. So I’m using an M1 Apple silicon computer, and so that is the osx-arm64 platform. But you might also want your colleagues that are using Linux or Windows to use your project, and then you would add win-64 or linux-64 or linux-aarch64 and all these kinds of different platforms that you might want your project to run on. And then all you need to do is basically pixi install or pixi run python or things like this, and that will just install all the dependencies, make sure that your lock file is up to date, write out a new pixi.lock file if that’s not the case, and run your tools.
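
As a rough illustration of what "your current platform" means here, the following Python sketch maps the operating system and CPU architecture reported by the standard library to Conda-style platform names such as those mentioned above. The `conda_platform` helper and its lookup table are assumptions made for this example, not Pixi's detection logic, and the table covers only a handful of common cases.

```python
import platform

# Conda-style platform names keyed by (operating system, CPU architecture)
# as reported by Python's platform module. Only common cases are listed.
SUBDIRS = {
    ("Linux", "x86_64"): "linux-64",
    ("Linux", "aarch64"): "linux-aarch64",
    ("Darwin", "x86_64"): "osx-64",
    ("Darwin", "arm64"): "osx-arm64",
    ("Windows", "AMD64"): "win-64",
}

def conda_platform(system=None, machine=None):
    """Return the platform name for the given (or current) machine,
    or None if the combination is not in the table."""
    system = system or platform.system()
    machine = machine or platform.machine()
    return SUBDIRS.get((system, machine))

current = conda_platform()
```

A project that lists several of these names is asking for its dependencies to be resolved once per platform, so that colleagues on other operating systems get a matching environment.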

Gregory M. Kapfhammer 00:26:48 Okay, thank you for explaining that. I know that many tools in the Python ecosystem have a distinction between application dependencies and development dependencies. And I think you hinted at this previously, does Pixi provide this feature? And if so, can you develop and explain a little bit more about it?

Wolf Vollprecht 00:27:06 With Pixi you can have just regular dependencies, and those are added to your default environment, and then you can add any number of your own individual environments with more developer-focused dependencies. So let’s say you want some tools to lint your recipes or run tests and things like this; then you can use specific developer environments, like dev environments, and add to that. So the way that works in Pixi right now is that you first define features and then you define environments, and you combine one or more features to create an environment. And that’s a bit of a maybe interesting idea that we have there, which is that you can say, okay, I have a feature that is test, and that will add pytest to your dependencies. And then you might have a feature that is py311 and one that is py312, and those just depend on two different versions of Python.

Wolf Vollprecht 00:28:05 And then you can create environments that combine those features. So that would say, okay, I want the test and the py311 feature in my py311 test environment, and I want the test and py312 feature in my py312 test environment. What’s cool about that is that you have a very easy way to build up matrices of things that you want to test. A lot of library developers, for example, have different requirements than application developers. An application developer sort of only cares about running against one version of Python and one version of NumPy or whatever, while the library developer wants to make sure that they have the biggest coverage of the different configurations that the users could run in, including maybe testing against multiple NumPy versions, multiple Python versions and things like this. And so that’s something that you can do using these different features and environments in Pixi.
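
The way features combine into environments can be modeled with plain dictionaries. The Python sketch below is a simplified model of the idea described above, not Pixi's implementation or file format: the feature names, environment names, version constraints and the `environment_dependencies` helper are illustrative assumptions, and conflicting constraints are simply rejected rather than intersected as a real resolver would.

```python
# Hypothetical features: each maps package names to version constraints.
FEATURES = {
    "test":  {"pytest": "*"},
    "py311": {"python": "3.11.*"},
    "py312": {"python": "3.12.*"},
}

# Each environment is a named combination of features.
ENVIRONMENTS = {
    "test-py311": ["test", "py311"],
    "test-py312": ["test", "py312"],
}

def environment_dependencies(env, base=None):
    """Merge the default dependencies with those of each listed feature."""
    merged = dict(base or {})
    for feature in ENVIRONMENTS[env]:
        for package, constraint in FEATURES[feature].items():
            if package in merged and merged[package] != constraint:
                raise ValueError(f"conflicting constraints for {package}")
            merged[package] = constraint
    return merged

# Build the whole test matrix on top of one shared default dependency.
matrix = {
    env: environment_dependencies(env, {"numpy": "*"}) for env in ENVIRONMENTS
}
```

Adding a third Python version to the matrix then means adding one feature and one environment entry, without repeating the test dependencies.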

Gregory M. Kapfhammer 00:28:59 Thanks for sharing the details about that feature. I can absolutely see how that would be helpful. As you were talking, it made me think of many of the things that I often have to do when I’m using GitHub Actions. Can you talk a little bit about how these features of Pixi might help me when I’m doing linting or development or testing inside of a continuous integration environment?

Wolf Vollprecht 00:29:22 So I think my dream is that for most projects you wouldn’t really have to write a GitHub Actions file anymore, but instead you would put everything into a pixi.toml or pyproject.toml file that would include all the tasks that you want to run and all the dependencies that you need. And it would just breeze through it, and you would just have to do something like pixi run lint and it does exactly what you want, like lint your program. And, yeah, basically debugging locally is the same as debugging in CI. That’s the dream.

Gregory M. Kapfhammer 00:29:53 Wow. That’s a really powerful statement because at that point I don’t have to worry about writing a program in GitHub Actions essentially. I can just have everything specified in my Pixi setup. And then if I understand you correctly, it sounds like everything that would happen on my laptop is exactly what would happen in CI, is that right?

Wolf Vollprecht 00:30:12 That is exactly right. That’s part of why we have the lock file: to make sure that your environment is consistent with the one that’s used in CI.

Gregory M. Kapfhammer 00:30:20 Okay. Now what I’d like to do is to turn our attention to some more details about implementing Pixi in Rust and then we’re going to talk about some of the additional Rust packages that you have built in order to support the development of Pixi. So first of all, one of the things that I noticed from reading the documentation about Pixi is that you referred to it as being a fast tool. So I’m wondering, can you briefly tell us what you mean by the fact that Pixi is fast?

Wolf Vollprecht 00:30:48 So one of the big features of Mamba was always that it was fast, and we know that people really enjoy a fast tool in their tool belt. And with Pixi we kind of managed to get it a little faster even than Mamba, which is written in C++ and already implements a lot of good tricks, and Rust really helped there. So the idea is that we really download everything as parallel as possible and extract it at the same time as downloading. And then the last step of that process is linking, where you basically take the file and then create a hard link or reference into your environment, and Rust really helped us there to make everything async and as parallel as possible.
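
The general shape of that pipeline (fetch packages concurrently into a cache, then link files into the environment instead of copying them) can be sketched with Python's standard library. This is only an analogy to what is described above, not Rattler's or Pixi's code: the download is simulated with a short sleep, and the names `fetch_and_extract`, `link_into_env` and `install` are hypothetical.

```python
import asyncio
import os
import shutil
import tempfile

async def fetch_and_extract(name, cache_dir):
    """Simulate downloading one package and unpacking it into the cache."""
    await asyncio.sleep(0.01)  # stands in for network and decompression time
    path = os.path.join(cache_dir, name + ".txt")
    with open(path, "w") as handle:
        handle.write("contents of " + name)
    return path

def link_into_env(cached_path, env_dir):
    """Hard-link a cached file into the environment, copying as a fallback."""
    target = os.path.join(env_dir, os.path.basename(cached_path))
    try:
        os.link(cached_path, target)
    except OSError:
        shutil.copy2(cached_path, target)
    return target

async def install(names, cache_dir, env_dir):
    # All packages are fetched concurrently; linking runs once they are ready.
    cached = await asyncio.gather(
        *(fetch_and_extract(name, cache_dir) for name in names)
    )
    return [link_into_env(path, env_dir) for path in cached]

with tempfile.TemporaryDirectory() as cache, tempfile.TemporaryDirectory() as env:
    installed = asyncio.run(install(["python", "numpy"], cache, env))
    names_in_env = sorted(os.listdir(env))
```

A hard link makes the same file contents appear in the environment without duplicating them on disk, which is why linking from a shared cache is cheap compared with copying.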

Gregory M. Kapfhammer 00:31:29 Okay. So you mentioned async and parallel. It sounds like you’re trying to run many different tasks at the same time on a computer. Are there any other specific features of Rust that help you to be fast when it comes to developing and/or using Pixi?

Wolf Vollprecht 00:31:45 So for developing, I think it’s very fair to say that the extensive ecosystem of crates really helps us to be fast in developing new features. And additionally the Rust compiler also helps us to be fast because it makes it relatively easy and straightforward to, for example, upgrade versions of other dependencies and be sure that if it compiles you have type safety, and if it doesn’t compile, you already know where to look to fix it. Versus with Python you maybe only find out at runtime that something is broken. So that’s a big plus. And lastly, I mean Rust compiles down to machine code, which always helps with speed.

Gregory M. Kapfhammer 00:32:24 Okay, so you mentioned machine code and I absolutely agree with that. I quickly wanted to pick up on the fact that you said that Rust helps our programs to be type safe, but we don’t have that same type of safety if we were building the tool in Python itself, can you briefly explore that in greater detail? What is type safety and how does that help you as a Rust programmer who’s building Pixi?

Wolf Vollprecht 00:32:49 So type safety basically means that the compiler makes sure that all the types are declared, they are statically known, and we are sure about the types that we receive in a function and also the types that we call a function with. In Python it used to be that you couldn’t even use types really, and then typing got added later on, and you can run tools to analyze your program to understand whether all the types are as expected. But the Rust compiler is much stronger in terms of checking all the types, just because of the language and the specifications of the language.
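
As a small illustration of the Python side of this comparison, the sketch below shows that type annotations are hints for external tools and are not enforced when the program runs. The function names are hypothetical and chosen only for this example.

```python
def total_size(sizes: list[int]) -> int:
    """Sum package sizes in bytes; the annotations are hints, not runtime checks."""
    total = 0
    for size in sizes:
        total += size
    return total


def try_bad_call() -> str:
    """Call total_size with strings and report what happens at runtime."""
    try:
        # A static checker such as mypy would flag this call, but the
        # interpreter accepts it and only fails once the addition runs.
        total_size(["10", "20"])
    except TypeError as exc:
        return f"failed only at runtime: {type(exc).__name__}"
    return "no error"
```

In Rust, the equivalent mismatched call would be rejected by the compiler before any binary is produced.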

Gregory M. Kapfhammer 00:33:37 One of the things that I think may sound interesting and yet perhaps counterintuitive is that Pixi can work for Python development and in fact development in many other programming languages, but you built it in Rust. I don’t want to beg the obvious here, but what led you to pick Rust when you decided to create the Pixi system?

Wolf Vollprecht 00:33:59 So it was not initially a completely obvious decision because I already had a lot of experience with C++ and Mamba and we could have stuck to the C++ ecosystem, but in hindsight I’m very happy that we chose Rust. It comes with this rich ecosystem; it helped us to use the same sort of underlying code in our website as well and our platform and we want to capitalize on that in the future much more by building awesome services. On top of that, a lot of the things would have just been quite a bit harder in C++ and also slower. And I don’t miss debugging segmentation faults. So I think that’s some of the reasoning.

Gregory M. Kapfhammer 00:34:40 That’s one thing I could really connect with. I also don’t enjoy debugging segmentation faults in C or C++ programs. I think that response was very helpful. So what I want to do now is turn our attention to some of the other Rust based tools that support Pixi. And the first one that I remember reading about is something that’s called Rattler Build. Can you tell us what is Rattler Build and how does that integrate into Pixi?

Wolf Vollprecht 00:35:06 Yeah, so it doesn’t really integrate straight away into Pixi as of now, but Rattler Build is sort of a companion project that is used to build the Conda packages in the first place. And we’ve been really hard at work on Rattler Build for the past year or so and we are relatively close to actually enabling it in the Conda Forge distribution, which has a long history of using Conda Build and has over 20,000 different recipes and things like this. This is a large-scale project and we’ve been gearing up to that moment for a long time now, and hopefully we can get it over the finish line very soon. But basically if you want to create a package for your own software or someone else’s software, you can use Rattler Build today to compile it and turn it into a Conda package that you can install with Pixi.

Gregory M. Kapfhammer 00:36:00 Okay. Now one of the things I remember about Rattler build is that it helps developers to create cross-platform relocatable binaries. Can you comment quickly what does it mean if it’s cross-platform and then perhaps go into additional detail what does it mean if a binary is relocatable?

Wolf Vollprecht 00:36:17 So cross-platform in the sense that Rattler Build works on all the different platforms that we support. So Windows, Mac, and Linux. But you need to run somewhat specific code on each of the platforms. Sometimes it’s directly translatable, sometimes you need some special handling on Windows et cetera. But basically Rattler Build is completely ready to build your packages for all of these platforms. And then relocatability is one of the key features that enables us to have these virtual environments, and it basically means that you can take a package that was built by Rattler Build and install it anywhere on your system. If you take a Debian package, you cannot usually install it anywhere on your system. It has to live in a very specific place. For example /lib/curl or something like this. That’s where libcurl has to be, while for us it can be in any prefix on your system.

Wolf Vollprecht 00:37:07 So any virtual environment. And that is enabled by two tricks, one on Mac and one on Linux, and Windows works a bit differently. On Linux we use a tool called patchelf, and we use patchelf to take a shared library, because we also like shared libraries in Conda Forge as opposed to static libraries. And we use patchelf to encode something that is a relative rpath. And so the shared library will attempt to load shared libraries that are relative to itself instead of looking at some absolute path. And we do the same trick with another tool that’s called install_name_tool on macOS. And we’re also currently in the process of trying to rewrite those tools in Rust so that the entire experience is even more integrated. And lastly, another trick that we’re using is when we build the package, we install it into a prefix that has a very, very long name, it has an insanely long placeholder, and as a last resort what we can do is at installation time we can just sort of replace that long string with the final installation prefix. And those are the two tricks that make the packages and the binaries inside relocatable.
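
The placeholder trick can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the code used by Rattler Build or Pixi: the `PLACEHOLDER` value and both function names are hypothetical. For text files a plain substitution is enough. For binaries the sketch assumes that paths are stored as null-terminated strings and pads each rewritten string with extra null bytes so that the file length and all offsets stay unchanged, which is also why the placeholder has to be at least as long as any real prefix.

```python
import re

# Hypothetical long build-time prefix; real placeholders are chosen by the build tool.
PLACEHOLDER = b"/opt/" + b"placehold_" * 20


def relocate_text(data: bytes, new_prefix: bytes) -> bytes:
    """Text files (scripts, configs): a plain substitution is enough."""
    return data.replace(PLACEHOLDER, new_prefix)


def relocate_binary(data: bytes, new_prefix: bytes) -> bytes:
    """Binary files: rewrite each null-terminated string, keeping its length."""
    if len(new_prefix) > len(PLACEHOLDER):
        raise ValueError("new prefix is longer than the placeholder")
    # Match from the placeholder up to and including the terminating null byte.
    pattern = re.compile(re.escape(PLACEHOLDER) + rb"[^\0]*\0")

    def fix(match: "re.Match[bytes]") -> bytes:
        original = match.group(0)
        replaced = original[:-1].replace(PLACEHOLDER, new_prefix)
        # Pad with nulls so the string still ends where it did before.
        return replaced + b"\0" * (len(original) - len(replaced))

    return pattern.sub(fix, data)
```

For example, `relocate_text(b"#!" + PLACEHOLDER + b"/bin/python\n", b"/env")` yields a shebang line that points at `/env/bin/python`.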

Gregory M. Kapfhammer 00:38:22 Okay, thanks for explaining those tricks. Both of those make sense to me. Before we move on to the next topic, I wanted to make one more point about Rattler Build. Is this a system that works for Conda and Conda Forge and PyPI or only one of those systems?

Wolf Vollprecht 00:38:38 So Rattler Build only works for Conda packages right now and it’s not really intended as something that works directly for PyPI.

Gregory M. Kapfhammer 00:38:48 Okay. I want to turn our attention now to another system which is called RIP. When I learned about RIP it said that it was a library for resolving and installing Python PyPI packages from Rust into a virtual environment. Is that the right understanding and can you develop what RIP does and how it works?

Wolf Vollprecht 00:39:09 Yes. So RIP is a library that is also written in Rust and that can basically resolve packages from PyPI, which works slightly differently than Conda, and then install the packages into a virtual environment. That’s completely correct. And we implemented our own SAT resolver and we made it work pretty nicely for PyPI packages as well, where we had to do some additional modifications specifically for that ecosystem. In the Conda world what you get is all the repodata, all the index data of all the packages, upfront in one gigantic file, and on PyPI you have to grab it yourself piece by piece. And so what we did is we made RIP able to lazily get the package data and then resolve the packages lazily.

Gregory M. Kapfhammer 00:40:01 You mentioned a moment ago the concept of SAT solving and I think this would be interesting to explore in greater detail. Can you tell us what is SAT solving and how in the world does that actually connect to installing Python PyPI packages?

Wolf Vollprecht 00:40:17 Yeah, so both Conda and PyPI use SAT solvers, Conda for a relatively long time, or since its inception, and PyPI, or Pip specifically, added that later on, maybe a few years ago. The idea of SAT solving is that you have all of these different dependencies and dependency specifiers. So for example, NumPy might depend on Python 3.12 and so on. And so that’s a dependency with a dependency specifier, and you want to figure out what version is compatible with my version of NumPy and all the other dependencies that you’ve specified. You can translate that into a satisfiability problem that is pretty well studied in computer science, with one small difference: you not only want to know if there’s any solution that satisfies your request, like if you ask for NumPy, let’s say, but usually you also want the solution that gives you the highest versions of all your dependencies. And I think that is roughly equivalent to something called MaxSAT, which is a SAT problem that tries to maximize some cost function or whatever.

Wolf Vollprecht 00:41:26 Yeah, so both Conda and PyPI use that to figure out the right packages to install, and they use very different SAT solving approaches. Conda historically used different SAT solvers with their own kind of extra loops on top to maximize for some things. Then Mamba replaced that completely with libsolv, which worked very nicely. And in our project, in Pixi, we use Resolvo, which is a Rust library that we’ve implemented. And that is our SAT solver, and it’s based on libsolv and MiniSat, which is a widely known, I think, SAT solver that is easy to understand.

Gregory M. Kapfhammer 00:42:14 Okay. So when you use this Rust-based Resolvo package inside of RIP, the idea is you’re trying to find some package at a specific version that will work for all the dependencies of the system. Is that the right way to think about it?

Wolf Vollprecht 00:42:30 That’s the right way to think about it, yeah. So basically if you install NumPy it might depend on a host of other libraries and you want to figure out what are the compatible versions and what are the maximum compatible versions that you can get. Those are the ones you select.
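
The idea of picking the highest compatible versions can be sketched with a toy resolver. Note the substitution: this is a plain backtracking search that tries newer versions first, not a SAT solver, and it is not how Resolvo, libsolv, or Pip work internally. The `INDEX` contents (including the package `oldlib` and every version number and range) are invented for the example.

```python
from typing import Optional

Version = tuple[int, ...]
Range = tuple[Version, Version]  # (minimum inclusive, maximum exclusive)

ANY: Range = ((0,), (999,))

# Hypothetical package index: name -> {version: {dependency: allowed range}}.
INDEX: dict[str, dict[Version, dict[str, Range]]] = {
    "numpy": {
        (2, 0): {"python": ((3, 9), (3, 13))},
        (1, 26): {"python": ((3, 9), (3, 13))},
    },
    "oldlib": {(1, 0): {"numpy": ((1, 0), (2, 0))}},
    "python": {(3, 12): {}, (3, 11): {}},
}


def allowed(version: Version, ranges: list[Range]) -> bool:
    """True if the version lies inside every range collected so far."""
    return all(low <= version < high for low, high in ranges)


def solve(
    requirements: dict[str, list[Range]],
    chosen: Optional[dict[str, Version]] = None,
) -> Optional[dict[str, Version]]:
    """Backtracking search that tries the highest version of each package first."""
    chosen = chosen or {}
    # A chosen package that violates a newly added constraint is a dead end.
    for name, version in chosen.items():
        if not allowed(version, requirements.get(name, [])):
            return None
    pending = [name for name in requirements if name not in chosen]
    if not pending:
        return chosen
    name = pending[0]
    for version in sorted(INDEX.get(name, {}), reverse=True):
        if not allowed(version, requirements[name]):
            continue
        extended = {k: list(v) for k, v in requirements.items()}
        for dep, rng in INDEX[name][version].items():
            extended.setdefault(dep, []).append(rng)
        result = solve(extended, {**chosen, name: version})
        if result is not None:
            return result
    return None
```

Asking for `numpy` alone selects the newest release in the toy index, while asking for `numpy` together with `oldlib` forces the search to back off to the older `numpy` that `oldlib` accepts. A real solver handles far larger indexes and learns from conflicts instead of blindly retrying.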

Gregory M. Kapfhammer 00:42:48 Okay, good. Now one of the things I learned about Resolvo is that it supports something called incremental or lazy constraint solving. Why is it necessary to perhaps be lazy when you’re solving constraints in order to install a package?

Wolf Vollprecht 00:43:02 It’s necessary because in the PyPI world you get the metadata not from one central index file but from all the wheels themselves. So instead of downloading one gigantic file, you go and download the first wheel, for example NumPy in our case, and then you look at the dependencies of NumPy and then you recursively fetch the other wheels that you might need. So that takes a while, and to make that faster, we don’t want to do that upfront. We want to sort of have Resolvo guide us on what to download, and yeah, that’s basically what Resolvo can do: lazily fetch what’s necessary. And recently we also made it async so that you can do that in parallel while it keeps on iterating, and yeah, those are some of the tricks we implemented for PyPI.
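
A minimal sketch of the lazy idea: metadata is fetched only for packages that are actually reached while walking the dependency graph, and each result is cached so it is fetched at most once. The `REMOTE` table stands in for per-wheel metadata on a remote index, and all names here are hypothetical; RIP and Resolvo are Rust libraries and additionally run such fetches asynchronously, which this example does not attempt.

```python
from functools import lru_cache

# Hypothetical stand-in for metadata that would live in individual wheels.
REMOTE: dict[str, list[str]] = {
    "pandas": ["numpy", "python-dateutil"],
    "numpy": ["python"],
    "python-dateutil": ["six"],
    "six": [],
    "python": [],
    "scipy": ["numpy"],
    "torch": ["numpy"],
}

# Records every simulated network request so the laziness is observable.
FETCH_LOG: list[str] = []


@lru_cache(maxsize=None)
def fetch_dependencies(name: str) -> tuple[str, ...]:
    """Simulate one metadata download; cached so each package is fetched once."""
    FETCH_LOG.append(name)
    return tuple(REMOTE[name])


def dependency_closure(root: str) -> set[str]:
    """Walk the graph from root, fetching metadata only for packages we reach."""
    seen: set[str] = set()
    stack = [root]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(fetch_dependencies(name))
    return seen
```

Resolving `pandas` in this toy index touches five packages and never downloads metadata for `scipy` or `torch`, which is the saving that matters when every fetch is a network round trip.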

Gregory M. Kapfhammer 00:43:51 So this is the second or third time that you mentioned the idea of a wheel in the context of Python. Can you comment what is a wheel? And then when you’re done with that, can you share a little bit about how RIP looks into the wheel and then supports this aggressive caching of PyPI metadata?

Wolf Vollprecht 00:44:09 I can. So a wheel is basically a zip file, and it’s a newer format compared to sdists, and there was also something called egg files in between. So sdists are just source distributions, that means no compiled code whatsoever, and they’re also the most annoying ones probably to install because you need to kind of build them on your own local machine, which takes time and effort and sometimes gives you incompatibilities. Egg files are completely deprecated, you should never use them, and wheels are the modern way of packaging Python programs and they have some nice properties. One is that they have some more static metadata inside. For sdists you first have to build the sdist to kind of understand everything, and you have to execute a setup.py script, sometimes at least, to know the dependencies, let’s say. And with wheels you have static metadata that you can just read and then you know the dependencies.

Wolf Vollprecht 00:45:10 The other point is that wheels can ship binary artifacts similar to Conda packages actually where you have shared objects or shared libraries or executables inside of the wheel files and you don’t have to compile them on your own machine. And so one thing that we made for example, for making wheel file retrieval very fast and efficient is that we are reading this, it’s a zip file and we are reading it I believe from the end and we read a certain amount of bytes and we try to get the static metadata from that amount that we’re reading from the end of the wheel file without having to download the entire file. And based on that little bit that we’re downloading, we can already determine then whether we have to download the entire file or what the dependencies of that file are, then we can already continue and then download the file and things like this. So it’s a way to kind of speed up the process of getting all the metadata that we need.
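
Reading from the end works because a zip archive keeps its table of contents (the central directory) at the end of the file. The sketch below demonstrates this locally with Python's standard `zipfile` module and an in-memory file that counts the bytes actually read. It is an illustration under assumptions, not RIP's code: the fake wheel contents and the names `CountingReader`, `build_fake_wheel`, and `read_requirements` are invented, and a remote version would need something like HTTP range requests in place of local seeks, which is not shown.

```python
import io
import zipfile


class CountingReader(io.BytesIO):
    """In-memory file that counts how many bytes are actually read."""

    def __init__(self, data: bytes) -> None:
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size: int = -1) -> bytes:
        chunk = super().read(size)
        self.bytes_read += len(chunk)
        return chunk


def build_fake_wheel() -> bytes:
    """Build a wheel-like zip: one large payload plus a small METADATA file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as archive:
        archive.writestr("demo/big_module.py", b"x" * 1_000_000)
        archive.writestr(
            "demo-1.0.dist-info/METADATA",
            "Name: demo\nVersion: 1.0\nRequires-Dist: numpy>=1.26\n",
        )
    return buffer.getvalue()


def read_requirements(fileobj: io.BufferedIOBase) -> list[str]:
    """Return the Requires-Dist entries without reading the large payload."""
    with zipfile.ZipFile(fileobj) as archive:
        # namelist() comes from the central directory at the end of the file.
        name = next(
            n for n in archive.namelist() if n.endswith(".dist-info/METADATA")
        )
        text = archive.read(name).decode("utf-8")
    return [
        line.split(":", 1)[1].strip()
        for line in text.splitlines()
        if line.startswith("Requires-Dist:")
    ]
```

Passing a `CountingReader` wrapped around the fake wheel to `read_requirements` returns the dependency list while reading only a small fraction of the roughly one megabyte archive.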

Gregory M. Kapfhammer 00:46:05 To be clear, when you’re talking about metadata, I’m assuming that one part of the metadata for a package is the dependencies for that package. Is that correct?

Wolf Vollprecht 00:46:13 Yes.

Gregory M. Kapfhammer 00:46:14 Okay. So then you’re downloading the data, you’re caching the data and you’re recognizing that you can’t get all the data from one location, it’s in individual wheels and then once you have that you use Resolvo to perform constraint solving. Is that the rough approximation for the workflow?

Wolf Vollprecht 00:46:32 That’s the rough approximation. That’s correct.

Gregory M. Kapfhammer 00:46:34 Okay, good. So if listeners are interested in learning about other Rust based programs, they may want to listen to Episode 581 where we talk about the Warp dev Terminal Window. What I want to do now is to turn our attention to an opportunity for reflection. Briefly, I think it would be great Wolf, if you could talk a little bit about some of the key benefits associated with building Pixi and Rattler Build and RIP in the Rust programming language. I know you’ve hinted at for example, type safety and the development environment. Are there other key benefits associated with using Rust to build all of these tools?

Wolf Vollprecht 00:47:12 I think the biggest benefit, and I already mentioned this, is the vast ecosystem of existing crates that are at least to some degree well maintained, and you can get some interesting crates with interesting features straight away and just use them in your project. That’s really nice. I think that’s where Rust has more resemblance to maybe the Node.js and npm ecosystem or the Python ecosystem, while C++ is really lacking in that regard. Like with C++ you maybe get some bigger libraries but not these kind of small nice things. Memory safety is a big feature. For me, really, statically compiling things is a big one. The async ecosystem of Rust is great and also not really available like that in C++. Yeah, Fearless Concurrency is one of the keywords and that is actually pretty nice.

Gregory M. Kapfhammer 00:48:03 Yeah thanks for mentioning the idea of Fearless Concurrency. I can definitely see how that would help you to make Python tooling that’s faster in Rust. What I want to do now is to turn our attention to some of the drawbacks associated with using Rust. Did you notice any drawbacks associated with picking Rust in order to build things like Pixi?

Wolf Vollprecht 00:48:22 There’s one drawback and that’s always speed of course. So the compilation speed itself is, well you know, sometimes you have to wait a few seconds or half a minute or something like that for your program to compile and be ready to like test or use. It’s a relatively new language but not super new. So maybe I think that has pros and cons in terms of access to like, let’s say talent. A lot of people are very interested in learning Rust or using Rust, but maybe there’s not so many people that have over 10 years of experience. Yeah. But it’s been a very good experience so far, so I don’t really have many complaints to be honest.

Gregory M. Kapfhammer 00:49:00 Okay. If a new programmer was just getting started in order to learn how to program in Rust, do you have any advice for them?

Wolf Vollprecht 00:49:08 I think my personal approach is always to just, you know, pick a project, try to do it and experiment with it and learn by doing basically. There’s also the Rust programming book, which is freely available online, and I have a copy at home and it helped me get started. And then, I’m sure everybody knows this by now, but ChatGPT actually also helped me quite a bit to really get into Rust and these kinds of things. Just paste your errors there and then see what it says, and usually it’s decent advice.

Gregory M. Kapfhammer 00:49:41 Aha, good response. So what I want to do now is draw our episode to a conclusion. We’ve talked about Rust based tooling for the Python programming language and we focused for example on a tool called Pixi. Do you have a call to action for the listeners of Software Engineering Radio when it comes to either Python programming or Rust programming?

Wolf Vollprecht 00:50:04 I don’t think I have a call to action in that regard, but what I would like all listeners to do is check out our documentation on pixi.sh, try Pixi and tell us your feedback. We are more or less 24/7 on Discord and always listening to feedback and ideas, and if you want to do some work on a CLI tool in Rust, then obviously we accept pull requests, and we’d love more community contributions.

Gregory M. Kapfhammer 00:50:33 Okay, thank you for that response. I myself can comment that the Discord community is really active and that Pixi.sh is a wonderful website, so thank you for creating those community resources as well.

Wolf Vollprecht 00:50:45 Definitely.

Gregory M. Kapfhammer 00:50:47 Are there any other topics that we haven’t covered during the conversation today that you want to highlight for our listeners?

Wolf Vollprecht 00:50:53 I think I will have to make one statement, which is that we have developed RIP, and I talked about it and we’re really proud of it, but we actually decided at some point to switch to another Rust based PyPI compatible package manager. So in Pixi we are currently integrating with another tool that’s called UV. So, we still have big hopes in terms of using at some point the Resolvo library inside of UV because we think it would be cool. But as of now, sort of, RIP is on pause because basically UV exists and is a Rust implementation of the same ideas as RIP and for us it’s easier to sort of concentrate on making Pixi a really great experience than dealing with multiple projects at once.

Gregory M. Kapfhammer 00:51:42 I remember reading that announcement about the use of UV on the website for your company. Can you help our listeners to understand a little bit better what is UV, who created it, and then how is it going to integrate into your tooling?

Wolf Vollprecht 00:51:56 Yeah, so UV is a Pip-compatible package manager. One thing that UV did is that it created a command line that you can use as a drop-in replacement for Pip or other PyPI tools, and it’s a Rust program. It uses a slightly different SAT solver but based on similar ideas (PubGrub-RS is the one that they’re using), and it is a very nice project that has a very good implementation for the caches and things like this. And so, it basically gives you a very decent speed boost over just using Pip. And for us it’s important to have a Rust-native thing that we can use inside of Pixi to really make sure that our lock files are the way they should be, and we have the right packages, and we can lock down not only the Conda packages but also the PyPI packages and make sure that everything is consistent.

Gregory M. Kapfhammer 00:52:53 Okay. Thank you. Quickly, I just wanted to confirm UV is also implemented in Rust and it’s an open-source project available on GitHub. Are those points both right?

Wolf Vollprecht 00:53:04 Yes. And it’s made by Astral, the company that also created Ruff.

Gregory M. Kapfhammer 00:53:10 Yes. Thanks for mentioning Ruff, which is another tool that also uses the Rust programming language and integrates into the Python ecosystem. Do you want to briefly comment what was Ruff since you mentioned it a moment ago?

Wolf Vollprecht 00:53:23 So Ruff is a linter for Python and also a formatter. So yeah, it can replace a number of tools like Flake8 or Pylint, and a number of others, with one tool that’s a lot faster because it’s written in Rust and uses a different way of parsing Python code, essentially.

Gregory M. Kapfhammer 00:53:44 Okay. As we finish our episode here today, if there’s a listener who wants to get started with using Pixi, what would you suggest that they do?

Wolf Vollprecht 00:53:52 I would suggest to check out the documentation. We are currently writing a bunch of tutorials that should be interesting to follow.

Gregory M. Kapfhammer 00:53:59 Okay. Thank you for all of these details about how to build Rust-based tooling for the Python programming language. This has been incredibly helpful and an awesome conversation. If you’re a listener who wants to learn more about programming in Rust or Python, you’re welcome to check the show notes for additional references and details. Wolf, thanks again for being a guest on Software Engineering Radio.

Wolf Vollprecht 00:54:22 Thank you.

Gregory M. Kapfhammer 00:54:22 All right, thanks listeners, if you have any questions or comments, we’ll be delighted to receive them. Bye now.

[End of Audio]
