Dave Cross, owner of Magnum Solutions and author of GitHub Actions Essentials (Clapham Technical Press), speaks with SE Radio host Gavin Henry about GitHub Actions, the value they provide, and the best practices for using them in your projects. Cross describes the vast range of things that developers can do with GitHub Actions, including some use cases you might never have thought about. They start with some general discussion of CI/CD and then consider the three main types of events that drive GitHub Actions before digging into details about fine-grained action events, the Actions Marketplace, contexts, YAML, Docker base images, self-hosted runners, and more. They further explore identity management, permissions, dependency management, saving money, and how to keep your secrets secret.
- Episode 554: Adam Tornhill on Behavioral Code Analysis
- Episode 544: Ganesh Datta on DevOps vs Site Reliability Engineering
- Episode 521: Phillip Mayhew on Test Automation in Gaming
- Episode 498: James Socol on Continuous Integration and Continuous Delivery (CI/CD)
- Episode 482: Luke Hoban on Infrastructure as Code
- Episode 440: Alexis Richardson on GitOps
- Episode 424: Sean Knapp on Dataflow Pipeline Automation
- Episode 221: Jez Humble on Continuous Delivery
- Evolution of GitHub Action Workflows
- On the Use of GitHub Actions in Software Development Repositories
- How Do Software Developers Use GitHub Actions to Automate Their Workflows?
- GitHub Actions Essentials – Automate, Integrate, Deploy: Unlocking the Power of GitHub Actions
- Features • GitHub Actions
- Dave Cross (@[email protected])
- GitHub – PerlToolsTeam/planetperl: Perlanet configuration for a Perl Planet
- planetperl/buildsite.yml at master · PerlToolsTeam/planetperl
- Git scraping: track changes over time by scraping to a Git repository
- First interaction – GitHub Marketplace
- GitHub – actions/runner-images: GitHub Actions runner images
- Events that trigger workflows – GitHub Docs
Transcript brought to you by IEEE Software magazine.
This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number and URL.
Gavin Henry 00:00:16 Welcome to Software Engineering Radio. I’m your host, Gavin Henry, and today my guest is Dave Cross. Dave has been programming professionally since 1988 and a Perl user for a very long time. I actually came across Dave in 2010 when I was a big Perl Catalyst user. He is the author of Data Munging with Perl from Manning and a co-author of Perl Template Toolkit from O’Reilly. Dave runs and owns Magnum Solutions, an open-source development consultancy based in London. His latest book is called GitHub Actions Essentials from Clapham Technical Press. Dave, welcome to Software Engineering Radio. Is there anything I missed that you’d like to add?
Dave Cross 00:00:58 Hi, thank you for having me. No, just to emphasize what you said about my career starting in 1988, which means I am very old, and the fact that I’m enthusiastic about some of these newer technologies is because so much of my career was spent without them.
Gavin Henry 00:01:15 So you’ve seen the before where it was all manual and everything.
Dave Cross 00:01:19 Absolutely. This is so much easier.
Gavin Henry 00:01:21 Excellent. Perfect. So we’re going to have a chat about, obviously, this show’s about GitHub Actions. We’re going to talk about the value they provide and discuss an example project that implements the main parts of continuous integration and continuous deployment with a few surprises along the way. So let’s get started: continuous integration and continuous deployment. Let’s start with the basics. Dave, what is CI?
Dave Cross 00:01:45 So CI, it’s automating the bits of your project which mean that you can measure the quality of your project, I guess. It means that every time you commit some new code to your code base, or changed code into your code base, you can run processes which do things like run unit tests, run a linter against your code base, and perform other quality metrics, like maybe measuring the complexity of the code or the coverage of your tests, that kind of thing. The kind of numbers that might end up on a dashboard that is on a monitor hanging above the development team so that everyone who walks past the team can see how good your code is.
Gavin Henry 00:02:33 If you were to come across a new project on GitHub or your recommended one, what would be the first thing you’d look at to see what the continuous integration would be?
Dave Cross 00:02:42 I think the first thing that I would be looking at is the coverage. Just to see how well the test suite matches the amount of code that you’ve actually got in the project. Having a test suite that covers the code base well means that it’s easier to change code and know that you’re not breaking things.
Gavin Henry 00:03:04 Yeah, it gives you that safety net, doesn’t it? And obviously you’d want there to be some type of continuous integration in the project.
Dave Cross 00:03:11 Yes, yeah.
Gavin Henry 00:03:13 So that would be the first thing hopefully. What is continuous deployment?
Dave Cross 00:03:17 So that’s the step that comes after continuous integration. It means that once you are happy that your code is good or even better than it was previously, then you can automatically take that code from your GitHub server or whatever source code system you are using and move it into production in a manner that is easy to reproduce. So, hopefully just pressing a button and at the end of some processes running, the code is up on your production server and running.
Gavin Henry 00:03:56 Excellent. Thank you. For the listeners who want to dig into CI/CD, continuous integration and continuous deployment, more, we’ve actually done a full show on it, which was show 498 with James Socol on Continuous Integration and Continuous Delivery. We’ve done Episode 554 on Behavioral Code Analysis, which was really good; Episodes 544, 482, 440, and 424; and an older one on Continuous Delivery, Episode 221. I’ll put those links in the show notes, but it helps expand on this very light overview I’ve just done with Dave. So before I move us on to the core of the show, which is GitHub Actions, is there a sort of low-hanging fruit to put into CI as a safety net and something in CD, or does it depend on the project and, you know, the software developer?
Dave Cross 00:04:47 To a large extent, I guess it does depend on the project, but as I said earlier, I think getting your unit tests running in some kind of CI framework is very useful.
Gavin Henry 00:04:58 Excellent. And there’s simple things on GitHub, like, I suppose it depends on the project, like the Dependabot thing, or what’s their static analysis one? CodeQL, I think it is.
Dave Cross 00:05:09 Yes. Yeah. Yeah. And there’s things that do things like looking for secrets and things like that.
Gavin Henry 00:05:16 Yeah. Depends on what you’ve got in your project I suppose.
Dave Cross 00:05:19 Yeah.
Gavin Henry 00:05:21 Excellent. Thank you. Right, so now we’re going to dig into GitHub Actions. Most of the show will be spent between this section and the example project, so please bear with us. So Dave, what is, or are, GitHub Actions?
Dave Cross 00:05:35 So GitHub Actions is… I was trying to work out when it was that GitHub Actions was introduced; I reckon it’s about a couple of years old. Listeners may have come across products like Jenkins or Travis CI or CircleCI, which many projects, or many of my clients, are using to do CI and CD. GitHub Actions is GitHub’s answer to that. It allows you to define workflows, and the definitions of those workflows actually sit within your code repo. And then, in response to various events, GitHub will fire up a container and run through the steps in your process, which allows you to do CI and CD, but it isn’t limited to that. And as we’ll, I’m sure, mention later, there are plenty of other things that you can do with it.
Gavin Henry 00:06:32 Yeah, I think for a long time, when it was just GitHub, before Microsoft, there wasn’t any GitHub Actions. So you had to use one of those, and then they were quite late to the game, weren’t they, for various reasons?
Dave Cross 00:06:45 Yes, yes. But I guess they countered that by coming up with something that is more powerful than Jenkins or Travis CI because, as I say, it’s not just limited to CI and CD.
Gavin Henry 00:06:58 Yeah, exactly. And you mentioned there that it does things based on certain events. Would that be solely defined as an event-driven architecture?
Dave Cross 00:07:06 Yes, it’s an event-driven architecture, but I suppose you need to be quite liberal in your definition of what an “event” is because it’s event-driven to the extent that you can trigger your workflow to run when something is pushed to your code base: when you get a pull request to your code base, when someone raises an issue in your code base, all these kinds of obvious source code control events. But there are other things. It basically gives you a complete cron job implementation, so you can trigger workflows purely on time; you can trigger workflows manually. You can get a button on the workflow page and say just run this now. Or the other thing you can do is basically use it as a web hook. So you can just make an HTTP request to GitHub, and it will trigger your workflow. So there’s plenty of different ways of running a workflow.
Gavin Henry 00:08:07 Oh, that’s brilliant. It’s one I wasn’t aware of is the web hook option. And I’d like to explore with you, I think it’s on the agenda, when somebody raises an issue as well, what you can do with that. So I presume the owner of the project needs to create some type of definition of what they want to happen with GitHub Actions. Can you take me through what that looks like?
Dave Cross 00:08:31 Yeah. So inside your repo, GitHub have defined a .github directory, which you can create, and that’s where GitHub-specific files go. One example is, you mentioned earlier, Dependabot; you can put a YAML file in there, dependabot.yaml, and that defines what kind of Dependabot interactions you want. But also within that .github directory is a workflows directory. And inside there you can create as many YAML files as you like, and each of those is a workflow definition. So within a workflow definition, there are a number of steps. There’s kind of a header, which gives the workflow a name, tells it what architecture you want it to run on, and I guess we’ll come back to that in more detail a bit later on. And then there’s a number of jobs which define the code. And jobs can be broken down into individual steps, and each individual step is a piece of code that you want to run. So that’s kind of the high-level look at what a workflow definition looks like.
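[Editor’s note: the structure Dave describes might look something like this minimal sketch; the file name, job name, and make target are hypothetical.]

```yaml
# .github/workflows/ci.yml -- a hypothetical minimal workflow
name: CI                      # the header: gives the workflow a name

on: [push]                    # the event(s) that trigger it

jobs:
  test:                       # a job...
    runs-on: ubuntu-latest    # ...saying what it runs on
    steps:                    # ...broken down into individual steps
      - uses: actions/checkout@v4
      - run: make test        # each step is a piece of code to run
```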
Gavin Henry 00:09:42 And before I move on to the next question, those that don’t know what dependabot is, I suppose better define it. Do you want to have a go, Dave?
Dave Cross 00:09:49 Yes. So Dependabot is a… I’m not sure whether it comes from GitHub or whether they’ve brought it in from another company. It does a number of things. The thing when I first saw it was when I started having some GitHub Pages websites within my repos that were generated using various Node applications. And Dependabot would come along every once in a while and check dependencies within my Node applications and make sure that I wasn’t running versions that had known security vulnerabilities. And it wouldn’t just check and give me a warning; it would actually produce a pull request, which fixes the problem by bringing the dependency up to a known good version. It works in a number of different ecosystems, checking for outdated dependencies that have security issues.
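[Editor’s note: the dependabot.yaml file Dave mentions might look like this; the ecosystem, directory, and schedule are illustrative choices, not his configuration.]

```yaml
# .github/dependabot.yml
version: 2
updates:
  - package-ecosystem: npm   # e.g., the Node dependencies Dave describes
    directory: "/"           # where the dependency manifest lives
    schedule:
      interval: weekly       # Dependabot opens PRs bumping outdated/insecure versions
```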
Gavin Henry 00:10:43 Thanks. So just to summarize, GitHub has some predefined things it wants to see in its project repository, which would be — on your file system, it would look like a hidden folder, but it’s actually .github. GitHub-specific things live in there, depending on what you’re trying to do. But in general there’s a workflows, is it flows?
Dave Cross 00:11:05 Workflows directory, yes.
Gavin Henry 00:11:07 Yeah. And then inside there, anything that is a… does it have to be a .yml file, or…?
Dave Cross 00:11:15 I think so, yeah. I’ve never tried putting anything else there, but yeah.
Gavin Henry 00:11:18 Me either. So anything in there that it can parse and figure out, it would generally show up under the Actions tab on the GitHub project?
Dave Cross 00:11:27 That’s right, yes.
Gavin Henry 00:11:29 And just to touch on the event-driven thing, I presume you can go to the Actions tab on your project and click Go to run something?
Dave Cross 00:11:38 Yeah, so I talked about there being a kind of header section in the workflow definition file. One of the options there is name, as I mentioned, but the most important one is on — just O-N — and that defines how your workflow is triggered. And so, it would be a list of different ways that you want it to trigger: on a pull request or a push. And one of those is a special value called workflow_dispatch (I can never remember whether it’s an underscore or a dash), but if you’ve got on: workflow_dispatch in your workflow definition file, then when you go to the page for that action in your repo, there will be a button that just says “Run workflow.” And you just push that and it runs it.
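[Editor’s note: the trigger list Dave describes (it is an underscore: workflow_dispatch) could be sketched as below; the branch name, cron schedule, and event type are made-up examples. The repository_dispatch entry is the webhook-style trigger mentioned earlier, fired by an HTTP POST to the GitHub API.]

```yaml
on:
  push:
    branches: [main]       # source-control events
  pull_request:
  schedule:
    - cron: '0 6 * * 1'    # a complete cron implementation: Mondays at 06:00 UTC
  workflow_dispatch:       # adds a manual "Run workflow" button on the Actions page
  repository_dispatch:     # lets an HTTP request to the GitHub API trigger the workflow
    types: [deploy]
```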
Gavin Henry 00:12:26 So you could use that to actually go to production — so it’s not automated; someone has to push it?
Dave Cross 00:12:34 Yes.
Gavin Henry 00:12:35 I didn’t know that. Excellent. Can the Actions use Docker images, or otherwise how do the Actions get the binaries they need? You know, as in that’s your project being built into a binary or libraries it needs or something.
Dave Cross 00:12:48 So the Actions all run on containers, on Docker containers. GitHub supply some of their own standard containers, and there are ones for various popular operating systems. They will do some light enhancements to them. For example, they will install the GitHub command-line package, so you’ve got access to that. So without doing anything clever, it will just run on a pretty standard operating system container. But you are perfectly able to define your own container. So if you are using one of the GitHub containers, then as you hint at, the problem is you need to install all of the software that you need in order to run your processes. So it’s often a good idea to define your own container that’s got the software already installed. You can store that on any of the popular container repositories, so you can put it up on Docker Hub, for example. And then in the header of your workflow definition, you would say what this runs on and give it the path to your Docker container. And then when the workflow’s triggered, GitHub would pull that container down and start it up, so you’ve already got all the software installed.
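[Editor’s note: one way to express this is the container key in the job definition; a sketch, where the image name and build script are hypothetical.]

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    container:
      image: yourname/build-env:1.2   # hypothetical pre-built image on Docker Hub
    steps:
      - uses: actions/checkout@v4
      - run: ./build-site             # tools are already in the image, so no install step
```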
Gavin Henry 00:14:19 Yeah, that’s a good point, because I normally just run as if I’m sitting at a dev or Ubuntu machine, doing all the apt-get installs for the different things I need. But if I did my own container and pushed that to Docker Hub or some other place, I could just pull that down and really reduce the time it takes to run.
Dave Cross 00:14:37 Yeah, this is something that I’ve done quite a lot of recently, because I had some Actions whose job was to generate static webpages that were run on GitHub Pages, and it was using a Perl module that was being installed on every run, and it was taking five minutes to get the container ready in order to run the software that builds the website. So I just spent a couple of hours putting together a container that had all the software installed, and now these workflows run in a minute or so rather than six or seven minutes.
Gavin Henry 00:15:16 Yeah, I suppose it’s a trade-off between you keeping your own container up to date, you know, and –
Dave Cross 00:15:21 Yes, now you’ve got one more thing to look after.
Gavin Henry 00:15:24 At least, if there’s no time constraint, I suppose. I know GitHub does let you set how long a long-running job runs for. It does give you the opportunity to flex the install process and make sure it’s always working, I suppose, depending on what your project is. But isn’t there a caching concept? I’ve seen that lately in one of the deployment tools that I use. And Docker in general has the build cache, doesn’t it? Does GitHub Actions have a similar concept, so it caches the last build for you?
Dave Cross 00:15:56 Yes. The honest answer is, I don’t know, I’ve never noticed it because these websites that I was talking about building, it was building everything from scratch every time. Maybe that was just because I hadn’t turned the cache on.
Gavin Henry 00:16:07 Okay. Perhaps.
Dave Cross 00:16:09 But basically, I guess I’m using the Docker hub as a cache.
Gavin Henry 00:16:12 Yes, it’s a similar thing. I’ll put some links in the show notes if I can find something. Anyway, these Actions, can you reuse them? Let’s just go back to the two words: GitHub Actions. An Action is the workflow definition, isn’t it?
Dave Cross 00:16:28 Well, to be honest, I think GitHub have quietly muddied the water here. GitHub Actions is what they call this entire feature. But when they are talking about setting up a workflow, they are very careful to call it a workflow definition file. So your YAML file is a workflow definition file. The meat of this definition file, as I said, is a number of jobs which are broken down into steps. And now each step is either a piece of code that you can run (so basically a piece of bash code that runs as though you were typing it on the command line on your Ubuntu container), or it’s what they call an Action, which is kind of an overloading of the term because in this case, an Action is a reusable piece of code that people can make available for your use in your workflows by putting it in a special format on GitHub. So Action really has two slightly different meanings in the GitHub ecosystem, but what Actions really are, it’s almost like a library: a reusable piece of code that you can slot into your steps in your workflow definition.
Gavin Henry 00:17:47 Yeah, just when I asked that question, it sort of muddled it in my head.
Dave Cross 00:17:53 Yes, and that’s completely understandable.
Gavin Henry 00:17:55 Just to summarize, GitHub Actions is like their product name? The workflow definition that we’re in control of is what sits in our project. And then if we want to shrink our workflow file or do something complicated or just, you know, use something that’s used by other people, the actual word Actions is what they call the reusable blocks that you can call in your workflow file to do something that you might not be able to do or, you know, it saves you time because you don’t have to think about it.
Dave Cross 00:18:26 Exactly. For example, the most commonly used action, the action that is used in pretty much every workflow file ever, is called actions/checkout. And you use that as one of the first steps in your workflow definition file. And that will check out the code of your repo onto the container.
Gavin Henry 00:18:50 And the container definition would be something like Ubuntu, Windows, or Mac, depending on what version and architecture you want.
Dave Cross 00:18:54 Yes, correct.
Gavin Henry 00:18:55 Okay. So I think we’ve defined the product. You have Actions, the workflow, what a workflow file looks like. Those of us that have used Ansible, it’s kind of similar and it kind of looks like a Docker file as well, which we’ve done shows on. The Action’s reusable because it’s a separate library, as it were. The access model, because we’re using a Docker container, where’s that container living? Is that a root user? What does it have access to the thing that’s running the code?
Dave Cross 00:19:27 So there’s a number of different levels to this. As to where the container runs, I think GitHub really want you to think about it in the same way as you would think about a serverless implementation in AWS. You don’t care about where the server is running; it’s just running on a container that is running on one of GitHub’s pieces of hardware somewhere in the world. I haven’t come across anything like the regions that AWS have; you can’t say you want it to run in that part of the world or anything like that. It’s just a container that’s running somewhere in the world. The next level is that you are running as a user on that container, and on the standard containers that GitHub give you, you’ll be straight in there as root. Obviously, if you’re building your own container, then that might have different setups.
Dave Cross 00:20:24 One confusion that I sometimes get when switching between a GitHub container and a container that I’ve developed myself is that the GitHub container, like I said, puts you in as root, and then I switch to my own container and you are no longer root. So you have to sudo when you want to install. And every time I make that switch, it catches me out, and you’ll see a couple of commits where I’m fixing whether I need to add or take away a sudo command. If you’re not running as root, you are running as a user that has access to root through sudo. But that’s common enough within a container, I guess. And then the third level is: what permissions does that user have on your GitHub repo? And the answer is, it basically runs as the person that owns the repo where it is hosted, where the workflow definition file lives.
Dave Cross 00:21:22 As part of the environment that the GitHub workflow sets up for you, you have an environment variable, effectively (it’s not quite an environment variable) called GITHUB_TOKEN, which has the permissions that the owner of the repo has on the repo, which by default are going to be full read and write access to the repo. But you can add a permissions definition, both at the job level and also at the individual step level, to change the permissions that the workflow has on your repo. So you can cut back the permissions so you can’t accidentally write to things that you don’t want to write to, for example.
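[Editor’s note: the permission trimming Dave describes can be sketched like this; the job name and chosen scopes are illustrative.]

```yaml
jobs:
  docs:
    runs-on: ubuntu-latest
    permissions:
      contents: read   # cut the GITHUB_TOKEN back to read-only on the repo contents
      issues: write    # grant only what this job actually needs
    steps:
      - uses: actions/checkout@v4
```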
Gavin Henry 00:22:13 When you go to the settings of a GitHub project and you’ve got multiple collaborators or team members that you might have assigned different roles or access levels to, it’s the repository owner’s permission set that is used for running the Actions.
Dave Cross 00:22:30 But then of course, depending on what’s happened — I mean, if you fork a repo, then the fork is obviously owned by a different person. So the fork includes the workflow definition files, but they will only have permissions on their fork of the repo rather than your main copy of the repo.
Gavin Henry 00:22:51 So, if you’ve got any secret definitions, which we’ll touch on, because you’re pulling some other repo or you know you’re pushing somewhere that needs an SSH key or something like that, the fork obviously wouldn’t have access to that environment. So some jobs may fail.
Dave Cross 00:23:06 Yes, but that would be good, probably.
Gavin Henry 00:23:07 I’ve not been in the situation, myself, where I’ve had to think of that level of granularity for running different bits of a workflow with different permission levels, so that’s good to know. Thank you. So they, as in GitHub, state that they offer cross-platform support? What does that mean?
Dave Cross 00:23:25 So it means that the GitHub-supplied runners, the containers that we’ve talked about before, are available basically in three different flavors. There is an Ubuntu flavor, there is a Windows flavor, and there is a macOS flavor. And for each of those, there are some different versions available; I’m not entirely sure how far back those versions go. So when you set up a workflow definition, you say what it runs on. One of the easiest ways to get a workflow up and running is just to say that it runs on Ubuntu, and they will just pull down the latest version of their lightly modified Ubuntu container, and you can run it on there. But that also works for Windows and macOS. Of course, because you can run your own containers too, there’s nothing to stop you running on a container that runs a completely different OS.
Gavin Henry 00:24:30 So if you’re not defining ubuntu-latest, or macos-latest, or whatever, would you put in that line the Docker image you want to pull, or are you running your own Docker image inside Ubuntu?
Dave Cross 00:24:44 No, no. You run it instead. So yes, the runs-on is either one of their labels for their own containers or a definition of your own container.
Gavin Henry 00:24:55 Okay. Because I’ve seen myself mess up Docker on my workstation here, which is a Fedora one. I’ve then used VirtualBox to run a Debian thing and then run Docker inside that. So I thought it was something like that.
Dave Cross 00:25:07 No, I don’t believe so, no.
Gavin Henry 00:25:09 I’ve personally been caught this week and last week on a couple of projects when I use ubuntu-latest or macos-latest or something, and I’ve had to go back to a fixed version because they’ve changed what the latest tag is. And then all the libraries you depend on, or different environment variables, or the bundled version of Python or Homebrew or something has changed, and all your stuff breaks.
Dave Cross 00:25:32 Yeah, I can understand that. What they’ll do is, if you go to each Action (or each workflow, to be accurate), each run of that workflow has a page on your repo. And so, if you go to that workflow when those changes are imminent, there will be a notice that appears quite clearly at the bottom of that page saying: you are using ubuntu-latest; in three weeks’ time, that will go from being 22.04 to 23.04 or something like that. So they do try to pass that information on. But yes, if you are…
Gavin Henry 00:26:10 I didn’t know that; I had to roll something back last week.
Dave Cross 00:26:14 And obviously, if you want to be really careful about the version that you are using, then yeah, you will want to give a version number rather than latest.
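[Editor’s note: pinning to a specific version rather than the moving latest label is a one-line change in the workflow file.]

```yaml
jobs:
  test:
    # runs-on: ubuntu-latest   # a moving label, periodically re-pointed by GitHub
    runs-on: ubuntu-22.04      # pinned: stays on this runner image until you change it
```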
Gavin Henry 00:26:25 No, it’s a tricky one because it’s flexing your software on a different version of the platform. So it’s kind of good in a way, but it’s noise because you haven’t changed any of your code potentially.
Dave Cross 00:26:36 Yes. But then, presumably the people that are using your software will be updating their OS at some point. So you do want to know about those breakages.
Gavin Henry 00:26:45 Yeah, exactly. Similarly, when it’s an open-source project, you just don’t want that red icon on your project either. It gives you the old broken-windows philosophy. So, do you have any concept of how scalable and performant GitHub Actions is, in your experience?
Dave Cross 00:27:03 I’m not really sure in what way it needs to be scalable. Are you picturing a repo that fires off events every few seconds that start running a workflow?
Gavin Henry 00:27:14 I’m thinking about, maybe this is just a case of putting a link in to the available images that GitHub have and how many CPUs they give to a container and how much RAM, you know. Say you’ve got a really RAM-heavy project, will that run, or will you have to pay more to get that, or will it just take 30 minutes instead?
Dave Cross 00:27:32 Yeah, to be honest, I’m not sure what size the containers are that they’re running.
Gavin Henry 00:27:38 Okay, I’ll dig that out.
Dave Cross 00:27:41 Performance, well we’ve already talked earlier about the speed up that I got from switching from using one of their containers to using a container that I’d built myself that had already got the software installed on it. The other thing that you can think about there is you can control what the workflow does when things fail. You might want to fail as quickly as possible. If something goes wrong, then there’s no point in carrying on. So you can make things — maybe not faster, but stop running sooner — so you get the results quicker by controlling the error flow.
Gavin Henry 00:28:19 Yeah. That gives you a good indication if something you’ve done has taken too long and you can set it to bail out as well.
Dave Cross 00:28:26 Yeah. And you can also, as you mentioned earlier, there’s a timeout, and I think the default timeout is something like 360 minutes, but you can bring that in if you want.
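[Editor’s note: the timeout and fail-sooner behavior just discussed can be sketched as below; the make targets are hypothetical. By default, a failing step skips the steps after it, so putting cheap checks first gets you results quicker.]

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 10   # bail out well before the 360-minute default
    steps:
      - run: make lint    # hypothetical; if this fails, later steps are skipped
      - run: make test
```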
Gavin Henry 00:28:34 Perfect, thanks. I think we’ve mentioned it a couple of times, but if you want to store passwords for — secrets is the general term — credentials in there because you have to go and fetch something or push something out in your continuous deployment, how would you go about that?
Dave Cross 00:28:52 So, if you go to the settings in your repo, you will find that there is a secrets item on the menu, and you can go in there and you can fill in secrets that are just key-value pairs. And I’m no security expert, but GitHub tell us that that information is stored in a very secure manner on their servers. Obviously, it has to be a reversible encryption so that they can then get access to the values and use them. But they exist at three different levels. You can have secrets at the organization level, if you have secrets that are shared across repos in your organization or under your user, maybe API keys that are used by different pieces of software in your organization, or individually at the repo level. And also you can define environments against your repo, which means that you can have a staging environment and a production environment, and you can say that this workflow is running in this environment, and then you can perhaps access a different version of the secret, depending on which of the environments it’s working in. So you might have a different API key for development and production, for example.
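[Editor’s note: accessing a secret, and tying a job to one of the environments Dave mentions, might look like this; the environment name, secret name, and deploy script are all hypothetical.]

```yaml
jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: production            # picks up the production copy of each secret
    steps:
      - run: ./deploy --key "$API_KEY" # hypothetical deploy script
        env:
          API_KEY: ${{ secrets.API_KEY }}  # injected from org/repo/environment secrets
```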
Gavin Henry 00:30:24 That’s why I love doing these shows, because I always learn something I didn’t know. I’ve been trying to get my head around how to do something for something I’m working on at the moment.
Dave Cross 00:30:34 That’s good, that’s good. If you go to the settings, as well as secrets and things like that, there’s an environments option, and you can go in there and just set up as many different environments as you like.
Gavin Henry 00:30:43 Maybe I’ve just fed my imposter syndrome some more thoughts there. So, do we know where these actions or containers run? Is it on GitHub’s infrastructure or, because it’s Microsoft, Azure or something like that? Do they tell us anything about that?
Dave Cross 00:31:00 I don’t know that it’s a secret. I’ve never looked into it in any detail. So I don’t know is the honest answer to that. As I mentioned earlier, I think they would like you to think about it in a serverless way. It’s just a container that runs somewhere, and you get some results back.
Gavin Henry 00:31:16 Do you know if there’s an option to run the container on some of your own stuff, like a half on-prem type solution?
Dave Cross 00:31:22 I was just about to mention that there is the option to run a self-hosted runner.
Gavin Henry 00:31:27 Do you want to just define a runner?
Dave Cross 00:31:29 A runner is the container that runs your workflow.
Gavin Henry 00:31:34 Cool. Yeah, I’m familiar with that from GitLab.
Dave Cross 00:31:37 Yeah. Yeah, so you can — again, it’s one of these things where, I think at your repo level, you can define self-hosted runners, and they give you a piece of software that you then need to install wherever you are going to run stuff locally. And that then communicates with the GitHub servers and does whatever GitHub wants it to do. Yeah, you can run GitHub workflow runners on your own hardware if you want.
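[Editor’s note: once a self-hosted runner is registered, pointing a job at it is again just the runs-on line; the job name is illustrative.]

```yaml
jobs:
  build:
    runs-on: self-hosted   # routed to a runner you registered on your own hardware
```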
Gavin Henry 00:32:06 And that’ll pull down their images and things like that.
Dave Cross 00:32:08 Yeah. The two most obvious reasons for doing that would be security, if you have stuff you really don’t want running on GitHub’s servers, and secondly cost, because they don’t charge you for running stuff on your own hardware.
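On the workflow side, pointing a job at your own hardware is just a matter of the `runs-on` key; the labels shown here are ones GitHub assigns by default when you register a runner:

```yaml
jobs:
  build:
    # Routes the job to any registered self-hosted runner; additional
    # labels (linux, x64, or custom ones) narrow it to specific machines
    runs-on: [self-hosted, linux]
    steps:
      - run: echo "Running on our own machine"
```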
Gavin Henry 00:32:24 And I suppose, yeah, what you mentioned before, building your own image to run your own jobs on, means that you know exactly what’s in that image as well. We’ve done a lot of shows on supply chain security, so that would help validate and prove, for any audits or regulations or bodies you’re subject to, that we’re in full control.
Dave Cross 00:32:44 Yeah, I mean you say that but I’ve certainly never built a Docker container that hasn’t been built on top of somebody else’s Docker container. So that’s worth thinking about.
Gavin Henry 00:32:55 Yeah, normally just pulling something slim, or one of these ones where someone has spent ages building all the different bits that Perl needs, or ELECTRA needs, or something like that. Okay. That almost finishes off that section before we move on to the example project. The runners are a good example that you can use their infrastructure but run on your own machines, so then you’re in control of security, hardware, resources; you’ll need a good internet connection to pull down the images, the first time at least.
Dave Cross 00:33:28 Yes.
Gavin Henry 00:33:29 Okay. So, moving on to the last bit of our show. I think we’ve done a great exploration and definition of GitHub Actions, which is the product name. Then we’ve got the workflow, which we control through the YAML file, and then the actual term actions, which is the things we can use from the GitHub Actions Marketplace to run stuff for us. We then make a decision of whether we want to use stock containers or pull in our own, and whether we want to run those on GitHub’s infrastructure and potentially pay for usage above and beyond what we get for free. Or if we’re, for example, a bank or similar, we might want to use the self-hosted runner option, where we install a binary on our own operating system and that pulls down the images. So, let’s scoop all that up and go through a project that you’ve worked on, or you’ve read about, that benefited from GitHub Actions. Have you got something in mind that we could wrap up with?
Dave Cross 00:34:26 There are a couple of things that maybe we can talk about, but we all hopefully understand the CI/CD thing, so I think we might touch on a couple of other uses for it. Do you know the software developer Simon Willison? Have you had him on? You should get him on at some point.
Gavin Henry 00:34:47 I’ll have a look.
Dave Cross 00:34:47 He came up with a concept he calls Git scraping, which is powered by Actions. He has a piece of software called Datasette, which is good for looking at SQLite databases, and Git scraping is a way of building these databases. What he does is he uses the cron job functionality for triggering things. He’ll find a website that’s got interesting data in the form of a JSON file, and in the GitHub workflow he will scrape that JSON file and then use Git to do a diff between that and the previous version. Obviously, Git will give him a history of the changes in the data. He’s doing things like monitoring websites that track forest fires in California and stuff like that. And then, by taking the differences, putting them in his SQLite database, and using his magic Datasette software, he builds websites that enable you to plot that data on a graph, or explore the various interesting ways that the data has changed over time.
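Simon Willison has published his real workflows; the following is just a sketch of the pattern as described here, with a made-up data URL and file name:

```yaml
name: git-scrape
on:
  schedule:
    - cron: '0 */2 * * *'   # wake up every two hours
jobs:
  scrape:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Fetch the latest data
        run: curl -sL https://example.com/fires.json -o fires.json
      - name: Commit only if the data changed
        run: |
          git config user.name "scraper-bot"
          git config user.email "bot@example.com"
          git add fires.json
          # Git itself provides the diff and the history of changes
          git diff --cached --quiet || git commit -m "Latest data"
          git push
```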
Dave Cross 00:35:56 So that, I think, is quite a fun and different use of GitHub Actions. Basically, you should realize that what GitHub are doing is giving you free access to run cron jobs on their servers. So anything you can think of that is on a schedule, that does some stuff and then stores the results either in GitHub or in a database, is something that you can do from GitHub Actions. The sky’s the limit there, really. Another thing I’ve done: you mentioned right at the start that I was involved in the Perl community, and so you know about CPAN. Some of your listeners might not realize that CPAN is the repository of free Perl libraries — sorry, add-on code for your Perl programs. And the Perl community around CPAN is very keen on unit testing; they do a lot of it.
Dave Cross 00:36:57 So I built a site called CPAN dashboard, which anyone who writes CPAN modules can create a pull request to my site, adding them themselves to my site. And basically all we need is their CPAN username. And then the site uses GitHub actions to run some software which pulls information about all of their CPAN modules from CPAN. So uses the meta CPAN API and then produces a list of all of their modules and — oh, they also need to tell me which CI tools they are using, whether it’s GitHub Actions or Travis CI or Circle CI — and it then goes away on a schedule and interrogates all of those services and builds badges for all of those modules on all of the CI services that that author uses. So, it produces a rather nice visual representation of all the modules that the authors have written and how well they are doing on the various CI services. As a CPAN author myself, I use that if I’ve got an afternoon where I’ve got nothing much to do, I might go and have a look at some of the badges on that and see where my software can be improved.
Gavin Henry 00:38:24 Thanks Dave. So just to summarize, because I want to go over an example project that you discussed in your book — building a static website — let me first summarize those two examples, because I think they’re great at showing two completely different ways to do things. The first one uses the cron job function of GitHub Actions: it goes off and scrapes a website, does a diff in Git, and then does the different things that Simon wants to do. I’ll put a link in the show notes for anything that you can give me about that too. And the second one is a site that you run in collaboration with CPAN and MetaCPAN, where anyone can use the pull-request event in the GitHub workflow file to run and trigger a few different things, based on the fact that they forked your repository and created a pull request, and then off it all goes. So that will be a massive time saving for you and the community as well.
Gavin Henry 00:39:13 So, anything you can give me for the show notes for that, that would be great.
Dave Cross 00:39:57 Sure.
Gavin Henry 00:39:18 I know it might be a simple project, example project, but just to scoop up everything we’ve discussed for the sort of last 15 minutes, let’s go through a static website. If we could highlight the manual things you’d normally do and then what you can do with the GitHub actions touching on the event you’re going to use, secrets you’re going to have to think about, whether you’re going to have to access anything else that isn’t on GitHub, and how you manage that. That’d be great.
Dave Cross 00:39:46 So yeah, static websites are in many ways quite dull because you can actually do that without GitHub Actions at all. I’ve been dealing with what I call semi-static websites, which are a little bit more interesting. If you remember, the idea of a Planet website used to be quite popular maybe 15 years ago. Python had a piece of software called PlanetPlanet, and basically what you do is take RSS feeds, web feeds, from various sources, aggregate them, and build a website that is basically a news page for a particular topic. Maybe you’re interested in Doctor Who, for example, and there are various websites that publish news about Doctor Who, and different stories will appear during the day.
Gavin Henry 00:40:33 Yeah, I used to like those, ones on Postgres or any open-source ones or whatever you’re looking at.
Dave Cross 00:40:39 So I’ve got a few sites that work like this. Basically you have a simple GitHub workflow that works mostly on a cron job basis. Every three hours, for example, it wakes up and pulls in the RSS feeds from the half a dozen websites that you’re interested in. It then combines those RSS feeds into a new RSS feed, which it publishes. And also, probably using the Template Toolkit — ’cause I still use Perl for a lot of my personal stuff — it will build an index.html and publish that to a GitHub Pages website. So it rebuilds the whole thing every few hours. But the other thing is that this is obviously driven from a configuration file. It could be a database, but I use a text-based configuration file which lists all of the feeds that I’m aggregating, and obviously that might change: I might add a new feed, or a feed has gone away so you delete it, or feeds move, and stuff like that. So the GitHub workflow definition has an ‘on’ key which looks for pushes, that is, a commit that has been pushed to the repo. But you can further filter the push by saying, I want you to trigger for a push, but only when the push touches this particular file. So when the push includes a change to the definition file, the config file, then it fires and rebuilds the website, pulling in the new feed or losing the old broken feed or whatever.
Gavin Henry 00:42:29 That’s exactly what I’ve been looking for, as well. Yesterday, I’ve got a project I’m working on, which is about a new SaaS thing that is particular to my sector, but it’s got the marketing web pages as part of the main site that has all the API backend stuff. So when I make a front-end change to say a contact page or a pricing page, I don’t really want to run the whole test suite. And burn through any minutes I’ve got or anything like that. So that’s given me the perfect idea to just say, you know, if anything in these folders, if that’s an option, get touched, then run the GitHub action files.
Dave Cross 00:43:06 So any of these triggers that fire your workflow, they all have various types of filters on them.
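For example, a push trigger can be filtered by branch and by path; the file and directory names here are hypothetical:

```yaml
on:
  push:
    branches: [main]
    paths:
      - 'site/**'        # only fire when front-end files change
      - 'perlanet.yml'   # or when the feed config changes
```

Other triggers have their own filters, such as `types` on the `issues` and `pull_request` events.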
Gavin Henry 00:43:15 Yeah, I thought it was all or nothing, ’cause it’s been driving me mental thinking, I don’t want to see that fail because I’m changing this. And for the event workflows: so, you’ve explained there that there’s a cron job that runs every three hours to go off and fetch the RSS feeds; it then will commit to, I presume, another repository, which is the GitHub Pages?
Dave Cross 00:43:38 Well actually, no. This is something that I’ve taught myself recently because it was causing me a problem. GitHub Pages can work in several different ways. It can serve the website from a different branch, or from a /docs directory on the main branch, or from the root directory. For things like this where I’ve got some processing, I take the /docs approach, where I’ve got all the code in the root directory, and then it runs stuff and dumps the finished website into the docs folder, and then it commits those new files from the docs folder into the repo. Now that’s a bit of a problem, because this is running every three hours or so, committing a new version of the index file and the new RSS feed file. So, I found it causes a couple of problems.
Dave Cross 00:44:37 Firstly, it means that every time I go to my checkout of that repo on my local disc, I have to always remember to start with a git pull, because there have been so many changes since the last time I worked on that file. So I need to make sure that the repo is up to date. And secondly — and this might not be seen as a problem by some people — I was finding that a couple of my repos were among the most-committed GitHub repos in the UK for the whole of last year, because they were running so many automated commits. And it’s kind of cheap, because I’m not actually writing these commits. Not that the number of GitHub commits you do should be seen as a game of any kind, but it was almost like I was cheating at winning the game.
Dave Cross 00:45:27 But it turns out that you don’t actually need to store the website that you’ve built in the repo. One of the things that we haven’t touched on — ’cause there’s a lot of GitHub Actions stuff we haven’t had time to touch on — is a thing called artifacts. You can generate what basically ends up as a zipped-up tarball, and it gets stored as an artifact on GitHub’s servers. You can control how long that artifact is kept for. But basically, if you go to the webpage inside your repo for a GitHub workflow run and it generated an artifact, you can download that artifact to your local machine and examine it.
Gavin Henry 00:46:11 Is that artifact something that you’ve told it to generate? Or is that a general term for…?
Dave Cross 00:46:15 There’s a GitHub action called Build Artifact or something like that.
Gavin Henry 00:46:20 Would that be a binary or something to deploy, or?
Dave Cross 00:46:23 No. We talked about the actions earlier, the libraries that you can use within your workflow. That’s one of those. You just give it the path to the file or files that you want to go in the zip file.
Gavin Henry 00:46:37 Can you give me an example of what would be in that?
Dave Cross 00:46:39 I use it, for example, when installing a CPAN module — and this is probably true for other languages as well. If there are errors, it writes a log file. But because it’s written that log file on your container, which has then ceased to exist when the run finishes, if a module didn’t install successfully, then you don’t know why it was broken; you don’t know what went wrong. So you create an artifact. I talked earlier about controlling the error flow, and one of the things you can do on an error is take the log files and bundle them up into an artifact.
Gavin Henry 00:47:16 That would be more apparent because normally, if something fails, in my experience you can go into the failed job and expand the debug logs and see it. But I presume that’s only if you’re spitting out the logs to standard error, or you’re running a step-by-step install. If it’s on your own container or something, that’s gone; the logs aren’t spat out.
Dave Cross 00:47:39 It gets frustrating because it says, installing this module failed; for full details, see this file. And then it gives you a path to a file that no longer exists. So you create an artifact, give it the path to where you know that file is going to be created, and it bundles up any files it finds, stores them on GitHub’s servers, and you can then have a link to download that artifact on the webpage for that run. So you can download it and examine it at your leisure.
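This pattern maps onto the `actions/upload-artifact` action combined with an `if: failure()` condition; the install command and log path below are illustrative, not taken from Dave’s actual workflow:

```yaml
      - name: Install CPAN dependencies
        run: cpanm --installdeps .
      - name: Save the build logs if anything failed
        if: failure()                        # only runs after a failed step
        uses: actions/upload-artifact@v4
        with:
          name: build-logs
          path: ~/.cpanm/work/**/build.log   # example log location
          retention-days: 7                  # how long GitHub keeps it
```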
Gavin Henry 00:48:11 So just going back to this example project: we’ve got a cron job definition to run, let’s say, every three hours. You’ve got another event there that runs something if you do a push to a certain config file, because you’ve done that level of granularity, not just a push to the whole project, which is how I’ve always done it — I didn’t even know you could do that, which is amazing to learn today. Can you have as many definitions as you want there, at that granularity? And you’ve also put the artifact job on this project as well.
Dave Cross 00:48:43 Yes. So you can have multiple keys under the ‘on’ key. In fact, if you think about it, the job that you need to run in the cron job is regenerating the website, and the job that you need to run when the configuration file changes is to rebuild the website in exactly the same way.
Gavin Henry 00:49:04 But when I tell it to, because there’s an event that’s triggered it, which is the push?
Dave Cross 00:49:08 So the only thing that’s different is the way that it’s triggered. For these semi-static Planet sites, they typically all have three keys in the ‘on’ trigger. There’s the cron tab one. There’s the one for when you’ve changed the config file — and the other thing that might change is I might tweak the template for the index.html file, the template that generates the webpage, so obviously if I change that, I need to regenerate the file as well. But also I will put in the workflow_dispatch key, because I just want to have that button appear that means I can manually run it whenever I want, which often helps with debugging or something like that.
Gavin Henry 00:49:50 That’s helped me out as well, because I’ve been at the point where I’ve got a cron job that runs static code analysis on one of my projects, so when I make a commit, I have to wait till the next day to see the results for some of it. Yeah, I didn’t know about the dispatch thing, because I’ve only ever rerun them by going into the Action output and clicking rerun all jobs or rerun failed jobs. So that’s great.
Dave Cross 00:50:14 We’ve got three different triggers, but they all have the same effect, which means that I can put them all in the same workflow definition file, which is just called build.yaml or something like that. There are just three ways to trigger it: either there’s a push on one of the important files, or it’s the cron job, or I just press the button, and all three of those events have the same action, the same effect.
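Pulled together, the ‘on’ block for a workflow like this would look something like the following (the config and template file names are hypothetical):

```yaml
on:
  schedule:
    - cron: '0 */3 * * *'   # every three hours
  push:
    paths:
      - 'feeds.yml'         # the feed configuration
      - 'index.tt'          # the page template
  workflow_dispatch:        # adds the manual "Run workflow" button
```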
Gavin Henry 00:50:42 And they could have access to different secrets at different levels because you’ve clicked the button. You might have.
Dave Cross 00:50:48 They could do, yes. I mean, there’s all sorts of other things you can do. As well as access to secrets, you have access to things called contexts, which hold information about that run. One of the contexts is the github context, which is like a hash or a dictionary. It’s github dot something, and the dot something will be the repo name, or the actor — that’s the name of the person who triggered the run, the actual GitHub username — or the Git reference that the action is working on.
Dave Cross 00:51:23 Just all these pieces of information about what actually triggered the run. So even though you’ve got the same workflow that’s triggered on three different things, one of the things that you can look at within the github context is what the event was that triggered it. So you can take different actions if you want to.
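A step can branch on that information with an `if` expression; for instance, a sketch of a step that only runs on manual triggers:

```yaml
    steps:
      - name: Report who pressed the button
        if: github.event_name == 'workflow_dispatch'
        run: echo "Started manually by ${{ github.actor }} on ${{ github.ref }}"
```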
Gavin Henry 00:51:44 Okay. Is there a difference in what you can do if your repository is a public one — say, because it’s your website or it’s an open-source project — versus a private one?
Dave Cross 00:51:54 I haven’t seen any difference. There’s a difference in pricing.
Gavin Henry 00:51:58 Yeah, I think you have to pay for your own private stuff.
Dave Cross 00:52:00 All the pricing is done on basically the number of minutes of container time that you use. And as I’ve got a Pro account, across all of my private repos I get something like 3,000 minutes of free time every month, and anything over that gets billed to me at fractions of a penny per minute.
Gavin Henry 00:52:23 Thanks. Going back to your example of the three ways you could deploy a GitHub static site, I presume that might possibly change your development process, because you’ve got these different things that you can only access a certain way. Is it something you need to bear in mind when you’re using GitHub Actions that things work a certain way, or it sounds like it’s extremely flexible, given the…
Dave Cross 00:52:46 I can’t think of a counterexample. I think all of the code that I have written to run inside GitHub Actions is completely agnostic about the fact that it’s running inside a GitHub Action, if that makes sense. It’s code that I can run quite happily outside of GitHub Actions; it doesn’t rely on anything in the environment that it gets from GitHub Actions. All of the GitHub Actions stuff goes in the workflow definition file, not in the code which I’m running. So I don’t think I’ve needed to change the way that I write software.
Gavin Henry 00:53:28 Thanks. I presume this just comes down to the fact that you have to remember when you’re doing your testing, you’re not in production. So that’s a separate thing from what you can do in GitHub Actions; you just have to do things the right way and use your fixtures and all sorts of different stuff. Okay. That finishes off the example project section nicely — your Planet cron job that scrapes RSS feeds every three hours, doing something based on the push, and your artifacts — which I think gave us a nice overview of most of the different parts of GitHub Actions. There’s one quick question that I think we’ve got time for before I close us off. In your book, and earlier on in the show, you mentioned that when somebody raises an issue, you could do something. What is that? Is it a workflow for when somebody opens an issue in your project?
Dave Cross 00:54:17 Yeah, so one of the triggers — I can’t remember what the name of the trigger is — fires when an issue gets raised, and in that instance the github context that I just mentioned would be packed full of all sorts of information about the issue, like the text of the issue and any tags that it’s been given.
Gavin Henry 00:54:37 Oh, so just going back to the context: that’s a set of environment variables that you can pull on that are specific to that instance, that situation. Ah, that makes more sense.
Dave Cross 00:54:49 Yeah. So you could add another tag to the issue. Oh, one nice thing that I’ve seen is, you know things about the person that raised the issue. You can know whether it’s the first time that this GitHub user has raised an issue against this project, and you can send them a nice welcoming email, or add a comment to the issue saying, thank you, welcome to the project, it’s always nice to have new people. There are a few things you can do around that to just automatically welcome people into the project.
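The “First interaction” action from the Marketplace (linked in the show notes) does exactly this; a sketch of wiring it up, with a made-up welcome message:

```yaml
name: welcome
on:
  issues:
    types: [opened]
jobs:
  greet:
    runs-on: ubuntu-latest
    permissions:
      issues: write           # needed to comment on the issue
    steps:
      - uses: actions/first-interaction@v1
        with:
          repo-token: ${{ secrets.GITHUB_TOKEN }}
          issue-message: >
            Thanks for raising your first issue here.
            Welcome to the project!
```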
Gavin Henry 00:55:23 Excellent, thank you. So yeah, I think we’ve done a great job of covering why you should use or expand your use of GitHub actions.
Dave Cross 00:55:31 Which you should do by buying my book .
Gavin Henry 00:55:33 Yeah, exactly. I’ll make sure there’s a link in the show notes for that, ’cause I’ve enjoyed reading through it; I learned so much more than I thought I knew, anyway. But now’s your opportunity to highlight any one thing that you’d want a software engineer to remember from our show.
Dave Cross 00:55:50 Can I have two things?
Gavin Henry 00:55:51 Yeah.
Dave Cross 00:55:51 One thing that I think is worth mentioning is that I know a lot of teams already have a lot of resource invested in existing CI/CD solutions. They’ll already have stuff in Circle CI or Jenkins or whatever. Well, GitHub have produced a thing called the GitHub Actions Importer, which allows you to easily move your workflows from a different system into GitHub Actions. So that’s an easy way to try things out. The main thing is: CI and CD are great, and everyone should be using them, but GitHub Actions isn’t just that. As I said earlier, GitHub Actions gives you access to containers running on GitHub hardware, and the sky really is the limit in what you can do with it. I’d love to hear about any interesting things that people end up doing.
Gavin Henry 00:56:50 Yeah, there were two great examples of projects that I didn’t think about with the cron things. So thank you for that. And you’re my first guest that’s ever had two things in that section.
Dave Cross 00:57:03 I’m a rebel.
Gavin Henry 00:57:04 So where can people find out more? They can follow you on Twitter, which I’ve put in the show notes. Is that what you prefer, or is there anywhere else to get in touch?
Dave Cross 00:57:14 Yeah, I’m on Twitter. I’m also on Mastodon — on fosstodon.org, I think. On most social media I use the same tag, which is @davorg. I’m even on that on LinkedIn, so if anybody wants to touch base with me on LinkedIn, if they want to talk about more professional things, then maybe that’s the appropriate place.
Gavin Henry 00:57:38 Thanks Dave. Thank you for coming on the show. It’s been a real pleasure.
Dave Cross 00:57:41 It’s been a real pleasure.
Gavin Henry 00:57:42 This is Gavin Henry for Software Engineering Radio. Thank you for listening. [End of Audio]