Matt Frisbie, author of Building Browser Extensions, speaks with host Kanchan Shringi about browser extensions, including key areas where they’ve been successful. Based on Matt’s experience as a developer working for Google, Doordash, and a startup he founded, they examine tools for building extensions, as well as APIs they have access to. The conversation presents detailed issues such as cross-browser compatibilities to keep in mind when developing extensions and mechanisms in the browser to prevent security vulnerabilities, and finally examines how emerging platforms can help developers take advantage of exciting new possibilities with web extensions.
Show Notes
- Matt’s book: Building Browser Extensions
- Matt’s article on ChatGPT extensions
- Plasmo Extension platform
- Chrome Developer Docs
- Converting a web extension for Safari
- Matt’s Twitter
- Matt’s LinkedIn
- Matt’s website
- Matt’s GitHub
Transcript
Transcript brought to you by IEEE Software magazine.
This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number and URL.
Kanchan Shringi 00:00:17 Hello everyone, welcome to this episode of Software Engineering Radio. This is your host Kanchan Shringi, and our guest today is Matt Frisbie. Matt has worked in web development for over a decade. He’s worked at Google, DoorDash, and has been a startup co-founder. As a Google software engineer, Matt worked on both the AdSense and Accelerated Mobile Pages platforms. Matt has very recently authored a book on browser extensions, which is our topic today. In addition to this book, Matt has authored three others: Professional JavaScript for Web Developers, Angular 2 Cookbook, and AngularJS Web Application Development Cookbook. Welcome to the show, Matt. It’s great to have you here. Is there anything you would like to add to your bio before we get started with browser extensions?
Matt Frisbie 00:01:03 I think that’s good. Yeah, I’m ready to talk about the book. Excited to be on.
Kanchan Shringi 00:01:07 So, let’s just start with what are browser extensions, and can you give us examples of popular extensions and possibly key industries where extensions are most popular?
Matt Frisbie 00:01:18 Sure. So, I say in the book that browser extensions are strange and powerful parasites, and I think that really captures the nature of them. So, they’re these pieces of software that are mostly written with standard web technology, and they sit on top of your browser and they can do a lot of things — just about anything. Really the only restriction is kind of the permissions you give it. And I describe in the book, they’re kind of a hybrid of a website and a mobile app. And so, from there you are really only limited by your imagination because I think most software developers don’t truly appreciate how powerful this software is. There are some limitations and we’ll get into that in the podcast, but I think that they’re really underrated as a platform, in general. So major industries, so according to Google, almost half of Chrome users have at least one browser extension installed.
Matt Frisbie 00:02:15 And I would bet any amount of money that the most popular extension is an ad blocker because that’s, most people do not want to see ads when they’re browsing the web, and ad blockers are extremely effective at that. So, that’s by far the most popular format. But obviously that’s not a money maker — it does tend to be free and Open Source software. But there are large companies that are based off of primarily browser extensions. So, the largest one that people have heard of is probably Honey, which is an extension that will automatically — well, it does a lot of things, but the thing that it’s probably best known for is it automatically looks up and tries coupon codes when you’re shopping online and then to get you the best discount. And PayPal bought Honey for 4 billion dollars just a couple years ago. So, Honey outgrew the browser extension platform, but that was still definitely its primary piece of software.
Matt Frisbie 00:03:09 So there are large companies. Loom is another great one. I actually know one of the co-founders of Loom, and their extension is a screen recording software that allows you to easily generate instructional recordings or whatever of a piece of software. And they have raised a ton of money. They have an eight-figure valuation, they’re a huge company. A big piece of their software platform is a browser extension. So, these companies are out there. Other categories are like AI assistants. Grammarly is a big one, which watches you type and is able to make corrections and suggestions. Password managers are a big one, so like LastPass, although their reputation is a little bit stained lately. And then there’s also things like developer tools. So, React is far and away the most popular JavaScript framework. There are plenty of extensions that are able to kind of plug into a React project and then expose additional information that’s helpful to developers inside the browser console. So, there’s a ton of different places that browser extensions really thrive. And then there are some up-and-coming areas that I’m really excited about that we’ll talk about in the podcast.
Kanchan Shringi 00:04:19 So you mentioned Honey; you said they started as a browser extension, and now they have some other mediums, but starting as a browser extension is useful because it’s right there where the user needs them. And that certainly sounds very useful for React or any other dev tools as well. So, you talked about Chrome, we’ve used the word browser extension. So, can you develop the extension for one browser and expect to be able to run it on all others?
Matt Frisbie 00:04:49 Right, so the landscape is a little bit complicated right now. So no, there’s not a way to write it once and have it work everywhere. There are certain platforms that are trying to get closer to that, but there are idiosyncrasies that are unique to each browser. You have to do it slightly differently. However, browser extensions have pretty much coalesced around the WebExtensions API, which was the successor to Mozilla’s original extension technology, XUL and XPCOM, which was a much more extensible and, some would argue, superior platform that was able to customize almost anything about the browser. That has given way to the modern WebExtensions API, which has a smaller interface. It’s still quite powerful. And that’s kind of the meat of what extensions use to do what they do. So, most browsers do support that API, but there are quirks that require special considerations for each browser.
Matt Frisbie 00:05:51 So, there’s kind of levels of compatibility that you can strive for. So personally, if I’m just one person working on an extension, if I publish an extension in the Chrome Web Store, which is by far the largest platform, automatically I can address like 80% of desktop browsers because you get Chrome right off the bat obviously, which is probably about two thirds of traffic. And then you also get all the browsers that are built off of the Chromium open-source browser engine. So that gives you Opera, that gives you Edge, that gives you Brave; there are some others but just right there you’re addressing 80% of traffic. And so, that generally can be a single code base. So, that gets you pretty far. Where it gets tricky is Firefox, which is still transitioning to Manifest V3, which we’ll talk about later, and then Safari, which is its own animal in and of itself. They have kind of a wacky way of deploying extensions, but it’s powerful and pretty new. So really the three large buckets, if you want to get as much traffic as possible, would be Safari, Firefox (those both require special deployment), and then all Chromium browsers, where you can almost have a unified code base.
Kanchan Shringi 00:07:05 So you mentioned manifest V3 — and we certainly will dig into that — but it might make sense to just define what the manifest file is at this point so there’s some context.
Matt Frisbie 00:07:15 Sure, yeah. So, the manifest, that’s like the core piece of a browser extension. So, when your browser extension’s loaded in the browser, the manifest tells the browser kind of where everything is, what it’s supposed to be able to do, some basic details like the name and description, icon, and things like that. Yeah, it’s a pretty simple file, but it’s kind of the glue that holds everything together. So, Manifest V3 is the latest iteration of the manifest. Obviously, it comes after Manifest V2, and the transition is kind of controversial. So, the Manifest V3 push is being driven by Google pretty much exclusively; it started there. And so, the way it was initially announced was, oh, it’s going to improve security, oh, it’s going to improve performance, so we’re making all these changes. But it’s pretty obvious that one of the main intentions is to kind of start pushing people away from ad blockers.
Matt Frisbie 00:08:10 So, kind of wrapped into this Manifest V3 transitional period is a phasing out of the primary API that powers ad blockers, which is the blocking web request API. So, the way that all ad blockers work, more or less, is that you can give an extension permission to manage every individual network request going out of the browser, so it can see everything. And more importantly it can manage everything. So, with an ad blocker, if it’s installed, you give it permissions with the blocking web request API. So, if it sees an outgoing request to DoubleClick, which is the Google ad server, it can go, oh no, I’m not going to serve that request, I’m going to block that. And then the browser is very well equipped to go, okay, network requests fail all the time, so we just won’t load that.
Matt Frisbie 00:08:59 And so, that’s the meat of how all ad blockers work. There are additional pieces of it, but that’s the core piece: for any individual request it can go in and zap the request, effectively, and say I’m not loading that. Manifest V3 kills that API entirely, or at least cripples it in a meaningful way, and replaces it with an entirely new API, called declarative net request. And the problem with that is that with the blocking web request API, you are able to write a piece of JavaScript that runs every time a network request happens or matches a regex or something like that. And then that JavaScript can be arbitrarily complex. So you can say, oh, if I detect an ad request being sneaky, I can still sniff it out in all these different ways.
Matt Frisbie 00:09:51 But with declarative net request, you’re limited to this declarative style. So you’re basically passing the browser a JSON file with all these static instructions that say, oh, block requests to this domain that look like this, that load an image, that load JavaScript. It’s still pretty customizable, but you are now basically delegating the ad blocking to the browser: you’re saying, all requests that match these criteria, block those. And it sounds similar, but the important difference is that you no longer have the ability to intimately control what requests are being killed in the browser. And so a very powerful tool for ad blockers is not quite as powerful anymore. And so now we’re kind of hoping that the browser vendors that maintain the browser code bases, meaning Google, continue to allow the ad blockers to work. And you can see that there’s a problem there because 80% of Google’s revenue comes from ads, and they’re also maintaining this platform that’s killing ads. So, these things are completely in conflict with each other. And so, this transition in Manifest V3, it seems very predictable that this was going to happen eventually. So, it remains to be seen what else is going to happen with Manifest V3, but it’s problematic because they’re kind of watering down what is probably the most important ability of browser extensions, which is to moderate the traffic in and out of the browser.
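To make the contrast Matt describes concrete, here is a minimal TypeScript sketch, not taken from the episode. The rule object follows the general shape of a declarative net request rule (`id`, `priority`, `action`, `condition`), but the matcher `declarativeBlocks` is a hypothetical, heavily simplified stand-in for what the browser does internally (plain substring matching; the real filter syntax is richer). The `imperativeBlocks` function and its `ad_slot` heuristic are likewise invented for illustration, standing in for the arbitrary per-request JavaScript that the blocking web request style allowed.

```typescript
// Shape of a declarative rule: static data the extension hands to the browser.
interface BlockRule {
  id: number;
  priority: number;
  action: { type: "block" };
  condition: { urlFilter: string; resourceTypes: string[] };
}

interface OutgoingRequest {
  url: string;
  type: string; // e.g. "script", "image"
}

const rules: BlockRule[] = [
  {
    id: 1,
    priority: 1,
    action: { type: "block" },
    condition: { urlFilter: "doubleclick.net", resourceTypes: ["script", "image"] },
  },
];

// Hypothetical, simplified stand-in for the browser's own rule matcher:
// substring match on the URL plus a resource-type check. No custom logic is possible.
function declarativeBlocks(ruleSet: BlockRule[], req: OutgoingRequest): boolean {
  return ruleSet.some(
    (r) =>
      req.url.indexOf(r.condition.urlFilter) !== -1 &&
      r.condition.resourceTypes.indexOf(req.type) !== -1
  );
}

// Imperative style: the extension's own code decides, so any heuristic can be used.
function imperativeBlocks(req: OutgoingRequest): boolean {
  const knownAdHost = /^https?:\/\/([^\/]+\.)?doubleclick\.net\//.test(req.url);
  // Invented heuristic: treat an "ad_slot" query parameter as a sign of a disguised ad.
  const looksSneaky = /[?&]ad_slot=/.test(req.url);
  return knownAdHost || looksSneaky;
}

const adRequest: OutgoingRequest = { url: "https://ad.doubleclick.net/pixel.gif", type: "image" };
const sneakyRequest: OutgoingRequest = { url: "https://cdn.example.com/lib.js?ad_slot=3", type: "script" };

console.log(declarativeBlocks(rules, adRequest));     // true
console.log(declarativeBlocks(rules, sneakyRequest)); // false: no static rule covers it
console.log(imperativeBlocks(sneakyRequest));         // true: custom logic catches it
```

The point of the sketch is only the division of labor: in the declarative style the extension contributes data and the browser makes the decision, while in the imperative style the extension's code makes the decision itself.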
Kanchan Shringi 00:11:20 That sounds really interesting, though I can imagine from what you said that V2, and I’m hoping the V3 continues to be powerful in terms of what the extensions can do, but also comes in with security best practices — because certainly when there’s too much that the extensions can do, there’s also a fear of, if you don’t have the best practices or you don’t have the best intentions, that things could go wrong. And that could reduce the amount of trust that customers have. So maybe if we can just delve into some security best practices when developing extensions and then we’ll go on to our next topic.
Matt Frisbie 00:12:00 Sure. So, security is a really interesting area of extensions because you have to be cognizant of what they have access to, as well as what the webpage itself has access to. Because I think it’s kind of unusual to think, when you’re writing JavaScript that runs in the page, that you have to think in an adversarial way, because probably the biggest security hole that browser extensions dance around is: the DOM is a shared entity now. So, the host webpage, whatever it is, let’s say you’ve written an extension for it, a Gmail plugin for example. So, Gmail is rendering its own HTML, their JavaScript is running doing its thing; your extension’s JavaScript is also running in a separate container, but they’re both talking to the same DOM. So, anything that is being pulled into the DOM, referenced in the DOM, typed into the DOM, is visible to both.
Matt Frisbie 00:12:55 So, for example, if you were to build like a widget that is a content script, which is a piece of JavaScript that runs in the page that lets you render whatever you want, if you for example have a login that sits in the page and someone types in their username and password, the host page can see that username and password and do whatever they want with it; that doesn’t mean, I’m not saying Gmail would do that, but whatever site you’re doing, like they have the keys to the castle. So, there are ways to protect against that, but kind of understanding where the crossover points happen is really important. And I think the other part of it is that I think I’ll go ahead and kind of kick around Lastpass recently because I’m a Lastpass user and I was trying to change hundreds of passwords last week, which is a lot of fun.
Matt Frisbie 00:13:38 The amount of information that a properly permissioned extension has access to is quite profound. So, it’s not a joke when they say that if they have the proper permissions they can see all your web activity, everything you type, everything on the page, and they can send requests on your behalf. Not saying they do (all the top extensions are very trustworthy, and it’s a great place for open source to kind of instill trust in the user), but they have access to everything. And so, extensions like LastPass that are recording your passwords, presumably to keep them safe, they are accessing a lot of valuable information, and they have to be good stewards of that information. And any extension you build that has access to this important information, that information needs to be protected because it really is … You know, how much of our life is spent online? If it can sniff everything you’re doing, that’s a really big attack vector. So, really being aware of what you’re storing and where you’re putting it, that’s probably the biggest security concern. And the big LastPass breach recently just kind of underscores what can go wrong when you’re not a good steward of data.
Kanchan Shringi 00:14:44 So, what’s the consumer to do? How do we know to trust an extension?
Matt Frisbie 00:14:51 It’s a good question. So, there’s really no one single way of doing it. So personally, all the extensions that I actively maintain, they’re all open source, so you can see everything that’s getting packaged into the extension. I mean, not a lot of people will take the time to go through the code, but it just being there is kind of an “okay, I can see what you’re doing.” And so, there’s a certain level of trust that that instills. So, the way that extensions display permission messages, some people have problems with it because it’s a little bit aggressive, and it is. So for example the tabs permission, which is a very common permission to request, lets you open new tabs, close tabs, move them around, whatever it is. I think the Chrome permission warning for that will say it can view your entire browsing history and something else, something very scary sounding.
Matt Frisbie 00:15:43 So, I think one big thing is pay attention to what the permission warning messages are telling you because that’s really the last line of defense. If it’s saying it can see all your browsing history and read all your webpages, whatever the message is, that’s real. If an untrusted person gets access to this extension, they really can see everything and cause a lot of problems. And the reason I bring this up is that it’s pretty common for extensions to be purchased, or people to attempt to purchase them, just to get access to the users and permissions. So, for example, I launched a ChatGPT extension recently and it got a bunch of users right away because I was pretty early out of the gate, and I had people contacting me looking to acquire the extension. I don’t know what they were going to do with it, I didn’t say yes to any of them, but I don’t think their intentions were good, because I think it’s a pretty common pattern for someone to swoop in, buy an extension, do bad things with it, and then kind of move on to the next one. It’s this kind of asymmetric model: if you pay a couple thousand bucks to get an extension with 20,000 users and you have access to all their web browsing activity, that’s a problem.
Matt Frisbie 00:16:57 And that definitely goes on, I would hope not too often, but those people are out there. So yeah, pay attention to the warning messages, and stick to trustworthy extensions. So yeah, I mean stick to the bigger ones and maybe don’t be too adventurous.
Kanchan Shringi 00:17:12 So, later in the show I think we should spend a little bit of time, given this, on how the developer can create more trust as well. Let’s move into the architecture now for a little bit. So, it might be useful to start with just the very basics of how the browser renders the webpage and then the key elements of how extensions fit in there.
Matt Frisbie 00:17:37 Sure. So, the architecture is probably one of the more confusing aspects to people that are new to developing extensions because outside of the manifest file there’s really no aspect that’s required for any given extension. And on top of that there’s really no primary user interface with which you can interact with whatever the extension is doing. So, there’s some big pieces that are common. There’s the popup, which is an interface that opens when someone clicks the toolbar icon, or you can also trigger it with a hot key. There’s the options page, which is like a standalone webpage where the URL, on Chrome, is like chrome-extension:// and then the path to wherever the file is. There’s content scripts, which is probably one of the more extensible and customizable aspects, where you can inject JavaScript and CSS into the page and then do all sorts of interesting stuff with it.
Matt Frisbie 00:18:39 There’s the developer tools interface, where you can add custom pages right into the browser’s developer tools that have access to a special subset of APIs. And then there’s the background script, which in Manifest V3 is a service worker, so it’s event-driven, and that’s the piece of JavaScript that’s listening to browser events and can push to storage, and do all sorts of stuff. That’s the piece that sort of ties all of the disparate elements together. And so, it’s a really bizarre stack. So, the extension itself functions as kind of a file server. So, let’s say you’re building your options page as a React app: all the requests for static files and things like that are being directed towards the extension, and it’s able to serve these files into an options page, or you can serve files directly into the webpage itself for a content script.
Matt Frisbie 00:19:36 So it’s kind of this constellation of different pieces, and different extensions will use whatever is the most appropriate. So, Honey, for example, will paint stuff into the page, like a little widget for showing when it’s trying to inject coupon codes. Or LastPass: when you are using it, all the stuff needs to be protected. It can’t put your passwords in the page, so that lives inside the popup because only the extension has access to the popup. Or React developer tools: there’s really no user interface outside of the developer tools itself because that’s the place that kind of makes the most sense. Because when you’re building a website and using React, that’s where you’re spending a ton of time, inside the developer tools. So yeah, it’s just a bunch of these pieces that can be assembled in different ways and you kind of use what you need and don’t use what you don’t need.
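The pieces listed above are all wired up through the manifest. The sketch below expresses a Manifest V3 manifest as a typed TypeScript object; on disk it would be a `manifest.json` file. The key names (`action`, `options_page`, `devtools_page`, `background.service_worker`, `content_scripts`) are standard Manifest V3 keys, while the extension name, file paths, and the `piecesUsed` helper are hypothetical and exist only for illustration. The interface covers just the keys discussed here, not the full manifest format.

```typescript
// Partial type for a Manifest V3 manifest: only the keys discussed in the episode.
interface ManifestV3 {
  manifest_version: 3;
  name: string;
  version: string;
  description?: string;
  icons?: { [size: string]: string };
  action?: { default_popup?: string };
  options_page?: string;
  devtools_page?: string;
  background?: { service_worker: string };
  content_scripts?: { matches: string[]; js?: string[]; css?: string[] }[];
}

// All names and paths below are made up for the example.
const manifest: ManifestV3 = {
  manifest_version: 3,
  name: "Example Extension",
  version: "1.0.0",
  description: "Shows where each piece of the extension lives.",
  icons: { "128": "icons/icon128.png" },
  action: { default_popup: "popup.html" },         // toolbar popup
  options_page: "options.html",                    // standalone options page
  devtools_page: "devtools.html",                  // entry point for developer tools pages
  background: { service_worker: "background.js" }, // event-driven background script
  content_scripts: [
    { matches: ["https://mail.google.com/*"], js: ["content.js"], css: ["content.css"] },
  ],
};

// Hypothetical helper: report which optional pieces a manifest declares.
function piecesUsed(m: ManifestV3): string[] {
  const pieces: string[] = [];
  if (m.action && m.action.default_popup) pieces.push("popup");
  if (m.options_page) pieces.push("options");
  if (m.devtools_page) pieces.push("devtools");
  if (m.background) pieces.push("background");
  if (m.content_scripts && m.content_scripts.length > 0) pieces.push("content_scripts");
  return pieces;
}

console.log(JSON.stringify(manifest, null, 2));
console.log(piecesUsed(manifest));
```

Only `manifest_version`, `name`, and `version` are marked as required in this sketch, which mirrors the point that no individual piece is mandatory: an extension like React developer tools would declare a developer tools page and little else, while a coupon widget would lean on content scripts.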
Kanchan Shringi 00:20:31 So the popup is where the user has to take an explicit action to invoke the extension, is that correct?
Matt Frisbie 00:20:39 Right, so the popup is probably the interface that most people are familiar with. So right when you install an extension, inside the extension bar (on desktop, at least) you’ll get a little extra icon that is a clickable target. And so for most people who are not tech savvy or whatever, this is going to be the most comfortable experience because it’s a visible button: they can see it, they can click on it, they can right-click on it and do different things. But the interface of click the button, get the popup window, most people are very comfortable with doing that, and so, most extensions should at least have something there when you click that because people are definitely expecting it.
Kanchan Shringi 00:21:19 But what I understood is Honey used the content script, which automatically changed the page in some way that the user would recognize, but the user would not have necessarily to take any action to have that. Is that correct?
Matt Frisbie 00:21:33 Precisely. So, the biggest problem with popups (or, it’s not a problem, it’s by design) is that you can’t programmatically open them. So, if you want to kick open a small window to show a settings page or a login or whatever it is, there’s no event that can open the popup other than something that the user directly does. So, you’re either clicking the icon itself or you’re doing a hot key to open the popup, but you can’t call a piece of JavaScript to open the popup. It doesn’t work that way. So, a lot of popular extensions will emulate that. So sometimes they’ll kind of do a content script widget in a similar spot to where the popup would open to make it feel more familiar, because that you can open up programmatically, but of course you’re not getting the sandbox safety of the popup. But yeah, so Honey needs to talk to the page and it’s not handling any sensitive information, because with coupon codes, who cares? So all the widgets in the page, they’ll just stick right in there because you can control that programmatically and there’s no restriction on when you can show it. So, content scripts are definitely more user-friendly because there’s just more you can do with them.
Kanchan Shringi 00:22:46 And then you talked about the service worker, which is a background script, so I assume there needs to be some kind of communication between the background script and content script.
Matt Frisbie 00:22:58 Right, so the communication medium for extensions is messaging. So, there are a couple different types of messaging, but it’s all basically the same concept as postMessage: it’s asynchronous messaging, and you can open a channel or it can just be a one-off message. It’s bidirectional so you can send a response, multiple tabs can talk to the service worker at once, extensions can talk to each other. There’s also ways for extensions to talk to native software. It’s all done via message passing because the different pieces will have different exposure to the APIs. So for example, a content script can’t handle extension events, but it can send messages to the background, which can handle those events. So, the background will be able to exchange messages with the content script, exchange messages with the popup, exchange messages with the options page and the developer tools.
Matt Frisbie 00:23:57 And so it’s kind of acting as the hub for the extension itself. And the service worker is also useful in this case because it’s guaranteed to be a singleton, so there’s only ever going to be one service worker for any given extension. And so that’s very useful because if you’re tying together a bunch of these different UI pieces that have all these different considerations, with the service worker you can always kind of fall back on the fact that there’s only going to be one handling messages in this way, and that kind of makes it easier to tie everything together.
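The hub pattern described here can be sketched without a browser. In the TypeScript below, `FakeRuntime` is an in-memory stand-in written for this example; it is not the real extension messaging API (in Chrome, the corresponding calls are `chrome.runtime.sendMessage` and `chrome.runtime.onMessage.addListener`), and the message names `SET_ENABLED` and `GET_ENABLED` are invented. The sketch only illustrates the shape of the idea: every UI piece sends messages to a single background listener that owns the shared state and replies.

```typescript
type Message = { type: string; payload?: unknown };
type Sender = { from: "content" | "popup" | "options" | "devtools" };
type Listener = (msg: Message, sender: Sender, sendResponse: (reply: unknown) => void) => void;

// In-memory stand-in for the messaging layer (NOT the real chrome.runtime API).
class FakeRuntime {
  private listeners: Listener[] = [];

  addListener(listener: Listener): void {
    this.listeners.push(listener);
  }

  // Delivers the message to every listener; the promise resolves with the first reply.
  sendMessage(msg: Message, sender: Sender): Promise<unknown> {
    return new Promise((resolve) => {
      this.listeners.forEach((listener) => listener(msg, sender, resolve));
    });
  }
}

// "Service worker": the single hub that owns shared state.
const runtime = new FakeRuntime();
const settings = { enabled: true };
const seen: string[] = []; // log of who sent what, in arrival order

runtime.addListener((msg, sender, sendResponse) => {
  seen.push(sender.from + ":" + msg.type);
  if (msg.type === "SET_ENABLED") {
    settings.enabled = msg.payload === true;
  }
  sendResponse({ enabled: settings.enabled });
});

// The popup flips a setting; the content script then reads it. Both go through the hub.
runtime.sendMessage({ type: "SET_ENABLED", payload: false }, { from: "popup" });
runtime
  .sendMessage({ type: "GET_ENABLED" }, { from: "content" })
  .then((reply) => console.log("content script received", reply));
```

Because there is exactly one listener holding `settings`, the popup and the content script never need to know about each other, which is the practical benefit of the singleton that Matt points out.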
Kanchan Shringi 00:24:27 So we’ve talked about permissions in the context of a security mechanism as well. What are permissions, what are the different kinds of permissions, and how does the author request permissions?
Matt Frisbie 00:24:42 Permissions are tricky; I devoted a whole chapter to them. At some level they work pretty much how you’d think: if you want to do something that requires any elevated permission, you request the corresponding permission. So, for example, there’s an alarms API so you can have a piece of code run in the background every minute, for example. So that’s called an alarm. So, you’d request the alarms permission. Or say you want to talk to a certain domain. So let’s say I wanted to send requests to google.com: I can request a host permission, and I would define a pattern that gives me the ability to talk to google.com from the extension. And there’s, I don’t know, like a hundred different permissions that you can request, some of which will trigger a warning message and some of which will not.
Matt Frisbie 00:25:35 So like, for example, the alarms API: it’s not really doing anything sensitive. When you submit to the Chrome Web Store, you will need to say, like, here’s what it’s for, and you put in like a sentence, but the user’s never going to see a popup because there’s no opportunity for abuse really. Whereas, if you’re requesting access to the tabs API, or you’re requesting the all URLs host permission, which gives you access to everything, they’re going to get a popup, either on update or when it’s installed, saying the extension is requesting all this stuff; is that okay? And so, a useful pattern, if you don’t want to scare the user, because some of the warning messages can be very scary, is optional permissions. So basically, when they initially install it, you can have the core subset of permissions that your extension requires to work, and those will be applied automatically.
Matt Frisbie 00:26:32 And then if you want to have them grant additional permissions on a one-off basis, you can do that; it will still incur the warning window for permissions that are more sensitive, but they’ll be explicitly requesting them, so it won’t be as scary as getting all these warning messages on install. One caveat with permissions, which is a pretty ugly aspect of extension development in my estimation, is that if you add required permissions to an extension and then push that out in an update, everyone who has it installed will have to reapprove the extension, which, depending on how much they need the extension, can cause a substantial amount of attrition. So, your browser will disable it. In Chrome, if you have extensions installed, you’ve probably seen this before: there’s a little yellow exclamation point in the settings menu, and then you’ll have to explicitly reenable the extension that’s requested a sensitive permission. And a lot of developers will get bit by this when they’re not expecting it because it’s a really unpleasant user flow. So, if you’re trying to avoid things like that, optional permissions are your friend.
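The split between required and optional permissions shows up directly in the manifest. The TypeScript sketch below uses the Manifest V3 key names (`permissions`, `host_permissions`, `optional_permissions`, `optional_host_permissions`); the specific values are illustrative. Host permissions are written as match patterns such as `https://*.google.com/*`. The `newlyRequired` helper is hypothetical and deliberately crude: it lists every required permission an update adds, whereas a browser, as described above, only forces re-approval for additions that carry a warning.

```typescript
interface PermissionSet {
  permissions: string[];               // granted at install time
  host_permissions: string[];          // match patterns the extension may talk to
  optional_permissions: string[];      // requested later, when a feature needs them
  optional_host_permissions: string[];
}

// Version 1: a small required core, with the scary permissions kept optional.
const v1: PermissionSet = {
  permissions: ["alarms", "storage"],
  host_permissions: ["https://*.google.com/*"],
  optional_permissions: ["tabs"],
  optional_host_permissions: ["<all_urls>"],
};

// A risky update: "tabs" moves from optional to required.
const v2Risky: PermissionSet = {
  permissions: ["alarms", "storage", "tabs"],
  host_permissions: ["https://*.google.com/*"],
  optional_permissions: [],
  optional_host_permissions: ["<all_urls>"],
};

// A safer update: the new capability is added as optional instead.
const v2Safe: PermissionSet = {
  permissions: ["alarms", "storage"],
  host_permissions: ["https://*.google.com/*"],
  optional_permissions: ["tabs", "bookmarks"],
  optional_host_permissions: ["<all_urls>"],
};

// Hypothetical helper: which required permissions does an update add?
function newlyRequired(prev: PermissionSet, next: PermissionSet): string[] {
  const before = prev.permissions.concat(prev.host_permissions);
  const after = next.permissions.concat(next.host_permissions);
  return after.filter((p) => before.indexOf(p) === -1);
}

console.log(newlyRequired(v1, v2Risky)); // [ 'tabs' ]
console.log(newlyRequired(v1, v2Safe));  // []
```

A check like this could run before publishing an update to flag changes that risk disabling the extension for existing users. Optional permissions are then requested at runtime, in Chrome through `chrome.permissions.request`, typically in response to a user action.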
Kanchan Shringi 00:27:38 So permissions is a key mechanism, and there may be concerns in how some of these are displayed to the user, but are there other mechanisms in the browser to prevent any extension vulnerabilities?
Matt Frisbie 00:27:53 Yeah, so Manifest V3 went pretty far to address some of these. So, one of the biggest things that was taken away was the ability to execute arbitrary scripts. So that’s been taken away and it’s unclear. So, there are extensions called user script extensions, and the idea is that in Manifest V2 you’d be able to install an extension (there’s one called Greasemonkey, there’s one called Tampermonkey; they’re pretty popular) where you can basically define your own JavaScript that has access to the extension API, and you can do whatever you want, run it whenever you want. It’s about as extensible as it can get. And basically, that would be calling the JavaScript eval function, or some equivalent of it, and that’s what’s running your JavaScript. So basically, you can inject arbitrary JavaScript, and in this case, the user wants it. Manifest V3 does away with that entirely, presumably to deal with cross-site scripting, because that’s a pretty sensitive thing.
Matt Frisbie 00:28:52 So in Manifest V3, all the JavaScript that runs has to come from the packaged extension. So, you can’t load a third-party script from a remote website. You can’t type in JavaScript and have it run that. None of that’s allowed anymore, with the exception that you can run it in a sandbox, but that’s less useful. And so, all these problems kind of go away to a certain extent when you take away this ability to run third-party JavaScript. But at the same time, it’s problematic because you’re inherently disabling these really useful extensions that a lot of people rely on.
Kanchan Shringi 00:29:24 Let’s spend some time on the extension-specific APIs. So you introduced these earlier on. Can you describe the scope of what can be done with access to these APIs?
Matt Frisbie 00:29:38 Sure. So, there’s common ones. You know, storage is a really common one. So, you can request different types of storage that are separated from the webpage itself. So, it’s an asynchronous storage; you can request different amounts of space. So, there’s an unlimited storage permission and you can store as much stuff as you want, which is useful for extensions where you’re recording video or stuff like that. Yeah, so there’s APIs for authentication. So, one kind of tricky corner of extensions is how do you authenticate someone? Especially, how do you deal with authenticating a person with the OAuth protocol, which is particularly difficult because you need these callback URLs. And so, browser extensions have a native way of dealing with these things, but it’s kind of tricky to do because OAuth is kind of built around being used in a website format, and so yeah, there’s a whole API to deal with OAuth in an extension.
Matt Frisbie 00:30:36 Yeah, I mean I talked about the messaging. There’s a ton of APIs to deal with the browser chrome itself. So, there’s an omnibox API which allows you to kind of show autocomplete search results from the browser bar; there’s a context menu API, so when you right click, you can add an entire right-click menu that’s sensitive to what you’re clicking on the page. There’s a ton of APIs dealing with network requests themselves. So, you can sniff what’s being loaded on the page, what is the browsing history, things like that. There’s a bookmarks API so you can manage the person’s bookmarks; the tabs API I talked about, so you have total control over what tabs are being opened, closed, pinned, muted, whatever it is. Yeah, the list goes on and on, but it’s pretty extensive what they can do, and it’s not really that much of an exaggeration to say that pretty much anything that you can do in the browser, an extension can do for you to a certain extent.
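As a rough illustration of the asynchronous storage described above, the sketch below reads a counter, increments it, and writes it back. It assumes a promise-based storage area; in an extension that role would presumably be played by `chrome.storage.local`. The `StorageAreaLike` interface, the `visitCount` key, and the in-memory stand-in are hypothetical names introduced only for this example.

```typescript
// Minimal shape of a promise-based storage area (a simplification for this
// sketch; in an extension, chrome.storage.local is assumed to fill this role).
interface StorageAreaLike {
  get(key: string): Promise<Record<string, unknown>>;
  set(items: Record<string, unknown>): Promise<void>;
}

// Hypothetical key name, chosen only for this example.
const VISIT_COUNT_KEY = "visitCount";

// Read the counter, add one, and write it back. Every storage call is
// asynchronous, so each one is awaited.
async function incrementVisitCount(storage: StorageAreaLike): Promise<number> {
  const stored = await storage.get(VISIT_COUNT_KEY);
  const value = stored[VISIT_COUNT_KEY];
  const previous = typeof value === "number" ? value : 0;
  const next = previous + 1;
  await storage.set({ [VISIT_COUNT_KEY]: next });
  return next;
}

// In-memory stand-in so the sketch can run outside a browser.
function createMemoryStorage(): StorageAreaLike {
  const data: Record<string, unknown> = {};
  return {
    get(key: string): Promise<Record<string, unknown>> {
      const result: Record<string, unknown> = {};
      if (Object.prototype.hasOwnProperty.call(data, key)) {
        result[key] = data[key];
      }
      return Promise.resolve(result);
    },
    set(items: Record<string, unknown>): Promise<void> {
      for (const key of Object.keys(items)) {
        data[key] = items[key];
      }
      return Promise.resolve();
    },
  };
}
```

Passing the storage area in as a parameter keeps the logic testable outside the browser; inside an extension, the same function would be called with the real storage object.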
Kanchan Shringi 00:31:38 So can we talk about some of the key differences across browsers, that you know of, in support of these APIs?
Matt Frisbie 00:31:48 Yeah, so browsers have mostly coalesced around the web extensions APIs. So that core set of APIs is pretty well supported. Where there’s some fragmentation is, for example, Mozilla is planning on continuing its support for the blocking web request API even though, like next week, they’re about to roll out support for Manifest V3. So that fragmentation is interesting; we’ll see where it goes, because now Firefox is becoming the only major platform that will be supporting the blocking web request API that all ad blockers need. Firefox also has a whole bunch of extras, like themes and things like that. They have their own bag of APIs that are unique to that platform. There are some idiosyncrasies with how the different APIs behave on platforms, but there’s not a ton. So, I mean, if you’re within the core web extensions API, every major browser vendor has pretty nicely come in and supported it. So, development is pretty nice in that respect.
Kanchan Shringi 00:32:54 That’s good to know since that certainly reduces the amount of work you would have to do to have your extension work across browsers. Moving on to a little bit of detail now on popup pages, content scripts, and background scripts. So, starting with the content scripts and maybe popup pages, how much control does the developer have on the styling? And especially I think that’s relevant in the case of content script because you are updating the existing webpage. So, what should you keep in mind as you start to style the user interface of the extension, and any caveats there?
Matt Frisbie 00:33:35 Yeah, so content scripts are an interesting one. So, popup pages and options pages, those behave pretty much as you would expect. So, it’s a webpage that you’re exclusively rendering; there’s no CSS bleed or anything like that. So, what you see is what you get for those pages. And so, that’s really useful because it’s more akin to a traditional web development environment where it’s basically rendering like a website. Content scripts are a different animal, right? So, in the manifest you’re defining JavaScript files and CSS files that are being injected into the page, and you get to define when they’re injected and, based on whether there’s a domain match, whether or not a piece of JavaScript or CSS should be injected. But after that, the JavaScript is kind of on its own, and so you have to sort of bootstrap the user interface just using these pieces of JavaScript.
Matt Frisbie 00:34:31 And so there’s some interesting considerations; obviously one is CSS bleed. And so, the ideal pattern (I actually can’t take credit for this; the Plasmo guys who wrote the foreword for the book were the first ones I saw come up with it) is basically: if you want to put a widget in the page but you want to protect it from the page’s CSS, you put it inside a shadow DOM. So, the JavaScript will render whatever your widget looks like inside a shadow DOM, and then you can have the CSS be injected inside that shadow DOM so it doesn’t affect the parent page and it’s restricted to only the shadow DOM itself. And so that’s a really useful pattern because now you can style stuff the way you want to and you don’t have to worry about messing up the parent page, which is pretty problematic, and that was a big pain point until I started folding shadow DOM into all my extensions.
Matt Frisbie 00:35:22 So good job, Plasmo guys, on that one. Yeah, but then as I mentioned before, how the content script folds into the page is really up to you. So, there are some contexts where you’ll want to integrate more tightly. So, a lot of Gmail extensions, the developers kind of know ahead of time what the DOM looks like inside Gmail, and so they can go, oh, I can look for this certain pattern of pieces of the DOM and then I can inject my own button in there and then it’ll look like a piece of the native DOM, and so then I can style it however I want and I can have it trigger whatever interfaces I want, but it’s going to be stuck inside the page so it’ll look like a piece of Gmail. Other ones will like pop over the page.
Matt Frisbie 00:36:02 So for example, if you’re like a LastPass user, you’ll notice that LastPass will stick a little LastPass icon over the end of an input element. And if you click that, it’ll pop a widget over the page. That’s a pretty common pattern because it’s pretty cheap and easy to locate like a small box widget over the page. And then there’s also extensions that will kind of have like a floating button in the bottom right of the page that will trigger something more substantial, like a popover or a modal window. That’s a pretty common content script pattern. Yeah, so content scripts, they really have to be judiciously applied because you can totally mangle the host page or the host page will mangle your content script. So, you have to be very careful with how it’s being applied on the page, but at the same time it’s the most natural way to extend and interact with the host page and it really allows for the most powerful stuff.
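The manifest declaration of content scripts mentioned above might look roughly like the following, written here as a TypeScript object that serializes to `manifest.json`. The keys (`content_scripts`, `matches`, `js`, `css`, `run_at`) are standard manifest fields; the extension name, file names, and the Gmail match pattern are assumptions made for illustration only.

```typescript
// One entry in the manifest's content_scripts array.
interface ContentScriptEntry {
  matches: string[]; // URL match patterns that decide where injection happens
  js?: string[]; // JavaScript files packaged with the extension
  css?: string[]; // CSS files packaged with the extension
  run_at?: "document_start" | "document_end" | "document_idle"; // when to inject
}

interface ManifestFragment {
  manifest_version: 3;
  name: string;
  version: string;
  content_scripts: ContentScriptEntry[];
}

// Hypothetical extension: names and files are placeholders.
const manifestFragment: ManifestFragment = {
  manifest_version: 3,
  name: "Example widget",
  version: "1.0.0",
  content_scripts: [
    {
      matches: ["https://mail.google.com/*"],
      js: ["content-script.js"],
      css: ["content-script.css"],
      run_at: "document_idle",
    },
  ],
};

// The text that would be written to manifest.json.
const manifestJson: string = JSON.stringify(manifestFragment, null, 2);
```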
Kanchan Shringi 00:36:55 So you mentioned that the mangling can happen in both directions, and one of the key user experience elements was having a floating element on top, which I assume the CSS property z-index will help with. So, are there any best practices around using that property?
Matt Frisbie 00:37:16 So it really depends on what the host page looks like. So, there are certain pages where it makes sense to just use the z-index to kind of force your widget on top of everything. But then you have to be sensitive about: is the host page also going to use z-index to push stuff on top, and am I going to interfere with that? So, z-index, that’s one of the few CSS properties that’s going to cause problems, right? Because you’re setting the z-index presumably on the shadow DOM host element itself. So, you could be pulling your extension on top of everything and then the user’s going to be like, what’s going on? I can’t see anything. Or vice versa, the host page is going to be covering up your stuff and make your extension unusable.
Matt Frisbie 00:38:01 So in any event, it really requires, if you’re going the content script route, you have to be very sensitive towards what is actually going on the page because that’s really going to drive how you’re integrating with the host platform. So, like for example, a server-side rendered website will be much easier to integrate with because you know that if it’s just sending you back a blob of HTML, you can modify that without fear of like a single page application blowing it up — or I mean at least it’s less likely. Whereas if you’re integrating with this like really involved React app, it could be rerendering all the time and the URLs might look the same and you have to be aware of like is the host page going to wipe out my content script entirely? So, all these things are kind of under the same umbrella of really having a good understanding of what your host page is doing and can do that will make it play nicely with whatever your extension’s trying to do.
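A minimal sketch of the shadow DOM pattern and the z-index on the host element discussed above. The `NodeLike`, `ElementLike`, and `DocumentLike` interfaces are simplified stand-ins defined only so the example can run outside a browser; the real `document` is intended to satisfy them. The function name, the styles, and the corner placement are assumptions, not the approach of any particular extension.

```typescript
// Simplified stand-ins for the few DOM members this sketch uses.
interface NodeLike {
  appendChild(child: any): unknown;
}
interface ElementLike extends NodeLike {
  textContent: string | null;
  setAttribute(name: string, value: string): void;
  attachShadow(init: { mode: "open" | "closed" }): NodeLike;
}
interface DocumentLike {
  createElement(tagName: string): ElementLike;
  body: NodeLike;
}

// Render a button inside a shadow root so that page CSS selectors do not reach
// it and the widget's own CSS does not leak into the host page.
function mountIsolatedWidget(doc: DocumentLike, label: string, zIndex: number): ElementLike {
  const host = doc.createElement("div");
  // z-index only applies to positioned elements, so the host is fixed in the
  // bottom-right corner. The value may need tuning against the host page.
  host.setAttribute("style", `position: fixed; bottom: 16px; right: 16px; z-index: ${zIndex};`);

  const shadowRoot = host.attachShadow({ mode: "open" });

  // Styles placed inside the shadow root are scoped to it. ":host { all: initial; }"
  // resets properties that would otherwise be inherited from the page.
  const style = doc.createElement("style");
  style.textContent = ":host { all: initial; } button { font: 14px sans-serif; padding: 8px 12px; }";

  const button = doc.createElement("button");
  button.textContent = label;

  shadowRoot.appendChild(style);
  shadowRoot.appendChild(button);
  doc.body.appendChild(host);
  return host;
}
```

In a content script this would be called as `mountIsolatedWidget(document, "Open", 1000)`. Nothing is appended to the host element directly, so the widget lives entirely inside the shadow root.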
Kanchan Shringi 00:39:01 Besides the mechanism you mentioned using shadow DOM, can iFrames be used as well to isolate styling?
Matt Frisbie 00:39:07 They can. I don’t recommend it. It’s all the benefits of using an iframe. So, the problem with iframes is if you’re putting it on a host page, you’re subject to the same-origin restrictions as the host page. So, if you’re trying to load an iframe from whatever the host website is of the extension, if you’re sticking an iframe in the page, the host page may zap that request and say, yeah, I’m not letting you embed that domain. And so, at that point the sandboxing abilities of an iframe, they’re harder to work with and they’re a little bit more finicky than a shadow DOM, and shadow DOM is supported pretty much everywhere. So, I would say it’s pretty rare that the use case for an iframe exceeds the utility and kind of flexibility of just a vanilla shadow DOM host.
Kanchan Shringi 00:39:54 And you certainly don’t have a lot of these complexities if you use a popup page, but it’s not as well integrated with the page that you’re accessing. So, there’s some trade-offs there.
Matt Frisbie 00:40:04 Right, right. Yeah.
Kanchan Shringi 00:40:06 What about the use of JavaScript frameworks or libraries to write these extensions?
Matt Frisbie 00:40:11 Yeah, so this is the place where JavaScript frameworks really shine. So, as I mentioned before, the popup and options pages are working off of what’s essentially a simple file server that the extension is behaving as. And so if you want to have like an options page, for example, the URL for the options page can be something like chrome-extension://, your extension ID, /options.html, and that’s going to send a GET request for that actual file inside the extension file server. Now if you want to have something really complicated, let’s say you’re doing a React app there and let’s say you want routing; well, there’s no ability to handle traditional server-side routing. So I’ve found that doing like a hash routing solution is more useful there because obviously, the hash part of the URL is not affecting the path to the actual file.
Matt Frisbie 00:41:06 And so, this is where single page applications really shine because you can now have a fully-fledged webpage that has routing and can support going back and forth and things like that, but it’s still living off of the same file, and there’s really not an alternate way of doing that unless you’re having separate HTML files for each interface, and that’s just not as good of an experience as a single page app. And the other part of it is that because it’s basically a local file server, it can load everything really quickly. So, considerations like large static assets or loading things from the server, these things are still important and you don’t want to load 10 megabytes of JavaScript for like a one-page file, but everything’s going to load basically instantly because it’s loading off your local file system instead of some remote server. So, these things mean extensions can be a pretty snappy interface just because everything’s local and everything’s running off of the same box.
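The hash routing idea above can be sketched as a small lookup from the URL fragment to a view name. The route table and view names here are hypothetical; the point is only that a URL such as `options.html#/about` still requests the same `options.html` file while the fragment selects what to render.

```typescript
// Hypothetical views for an options page.
type ViewName = "general" | "shortcuts" | "about" | "notFound";

// Hypothetical route table keyed by the path inside the hash.
const ROUTES: Record<string, ViewName> = {
  "/": "general",
  "/shortcuts": "shortcuts",
  "/about": "about",
};

// Map a location.hash value such as "#/about" to a view name.
// An empty hash falls back to the root route.
function resolveRoute(hash: string): ViewName {
  const path = hash.replace(/^#/, "") || "/";
  // hasOwnProperty avoids matching inherited keys such as "constructor".
  return Object.prototype.hasOwnProperty.call(ROUTES, path) ? ROUTES[path] : "notFound";
}
```

In the page itself, this function would typically be called with `location.hash` on load and again from a `hashchange` event listener, so back and forward navigation work without any server-side routing.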
Kanchan Shringi 00:42:05 Thanks for that, Matt. I wanted to cover some other key topics that we’ve mentioned in our talk but not necessarily in sufficient details. So, let’s start with the dev tools pages. I think you mentioned that in the context of helping developers implementing React. So, can you just give a little bit of what the use case would be? What is an example of an extension here in this area?
Matt Frisbie 00:42:35 Sure. Well, the two that I use frequently are React Developer Tools and then there’s a Redux extension as well. And so, the premise is relatively simple. Basically, an extension is able to create custom panels inside the browser’s developer tools, and it’s pretty similar to an options page or a popup page in that you can render it however you like, but it also gets access to a few special browser APIs. So, it can really tightly understand what the DOM structure is like. You can inject pieces of script into the page to locate elements, or simple little script injections. You can sniff web traffic in a really rich way. And so, typically the way these are used is that the React Developer Tools extension can tell if the webpage is running a React app, and then there’s different ways to kind of make them play with each other.
Matt Frisbie 00:43:32 So you can have it be like, oh yeah, your React app is rendering this way, and these components are working this way, and props and so forth are filtering down. So, it’s especially useful for kind of debugging and understanding what’s going on with single page applications because they can intimately integrate with what’s going on on the page, understand it, and then show it in developer tools in ways that you can interact with and understand kind of what’s going on behind the scenes. And so, I would say most all major single page application frameworks have some sort of developer tools companion extension like this, since it does prove to be so useful in so many cases.
Kanchan Shringi 00:44:13 Okay, makes sense. So, we did chat about manifest v2 versus v3 to some extent. You said the key difference was it might make it harder, it likely will make it harder to implement ad blockers. What else is different, and what else is the challenge of migrating from v2 to v3?
Matt Frisbie 00:44:35 Yeah, so this is a big pain point for certain extensions. So, most extensions will find that the transition’s pretty painless, but there are a few types of extension whose future looks pretty bleak. So, you’re right. So, off the top of my head, we’ve talked about user script extensions, Tampermonkey and Greasemonkey, so the ability to run custom JavaScript is getting thrown out the window. The Chrome team has said that they’re looking into ways to continue to allow these, because these extensions have millions of installs, so it remains to be seen what’s happening with those, but last I checked, there was no path forward for those, although the Chrome team does indicate that they want to save them. As for other ones, the service worker introduces some interesting problems. So previously in Manifest V2, the background was a headless webpage, kind of, and so you would have access to the DOM and DOM APIs, and so, certain extensions would use those for authentication, audio APIs, things like that.
Matt Frisbie 00:45:43 And so now that it’s a service worker, there’s no DOM anymore, right? It’s the service worker global object. And so, there’s things like JSDOM that lets you kind of emulate a DOM or there’s like the offscreen canvas API, which lets you recapture some of this behavior. But taking away the DOM has been a big headache for a lot of extensions. So, some of the workarounds are keeping a designated tab open all the time, like just have a supplementary HTML tab open all the time and then use that DOM. That’s not a solution though because the user has now this extra tab floating around all the time just so you can have access to the DOM. It’s a bad solution. So, it’s not clear what’s going to happen with those extensions. They also might be in trouble. And then one big one is the life cycle of the service worker itself.
Matt Frisbie 00:46:29 So there are some extensions that — like, let’s say you wanted to open a web socket, a long-lived web socket or something that needs to run for an extended period of time; Chrome will aggressively shut down a service worker because it’s a service worker and it’s designed to be this quickly destroyable restartable thing. And so, any extension that needs a background script to run persistently no longer has the ability to do so because the Chrome will shut it down after I think five minutes is the typical timeout. So, there are hacks that can prolong the lifecycle of the service worker, but any extensions that need a long-running background script, there’s no path forward for those either. And all these problems I’ve mentioned, the Chrome team has indicated they want to address them and has said that for a long time, but it seems to be progressing slowly and at the same time it seems like — so I’ll put it this way, it seems like the Chrome team cares because they had initially set a rollout date of this month, actually; January 2023 was the hard stop date for, I think it was all manifest V2 extensions in the Chrome web store would go dark. They’ve already cut off V2 submissions, but the existing published V2 extensions could still be updated and would still be public.
Matt Frisbie 00:47:49 They have pushed that back to, I think, the middle of 2023 — so June or something — because they know that they’re not ready and all these extensions that are extremely popular are going to be killed if they turn them off. So, it seems to be the Chrome team cares about preserving these extensions that are kind of getting crushed by manifest V3, but I don’t really know what’s going to happen because it doesn’t seem like there’s much of a plan, and I don’t know; they haven’t rammed it through yet, but it would not surprise me if it got to that point and they just said well you’ll have to figure it out. So yeah, in summation, Manifest V3 is pretty controversial, and it remains to be seen what’s going to happen.
Kanchan Shringi 00:48:25 But if you are writing a new extension you should start with V3.
Matt Frisbie 00:48:29 At this point, yes, unless you’re really … People are still writing V2 extensions because they’re targeting Firefox or they really want to hang on to the old APIs. But, if you want to use the Chrome web store, if you want to have the most users, V2 is dead. Time to go for V3.
Kanchan Shringi 00:48:46 So I did read about a Safari web extension converter. Do you have any experience using that?
Matt Frisbie 00:48:52 Yeah, so I made sure to cover this in the book. So, this was a really interesting development. So, I think it was, let’s say two years ago, Apple rolled out extension support for mobile Safari. And so, traditionally extensions have not been a mobile platform. So, Google obviously has never rolled out extension support for mobile Chrome. I think everyone kind of understands why they didn’t do that, because they don’t want to lose the ad revenue; it would be billions of dollars out the window if they did that. So, I think they just said, yeah, we’re not doing extensions for mobile. Too bad. So, there are ways to get extensions on mobile: Firefox is one way, and there’s the Kiwi browser for Android, so you can do it on an Android phone, but there was really no first-class support from the primary browser vendors until Safari rolled out extension support.
Matt Frisbie 00:49:44 So, it’s pretty far away from how you would typically develop extensions. So extensions have been, you know, HTML, CSS and JavaScript; it’s basically just bundled together and then shoved into the browser, and the browser understands what to do with it. In Safari, it’s packaged kind of like an app. It does function, essentially, as an extension in the browser. So, if I wanted to publish an extension for Safari, I’d build it inside Xcode; I’d have to get an Apple developer account, and I would publish it on the App Store. And then once it’s installed, like on your phone for example, you’ll see it installed as kind of an app. And when you dive into the code, it’s pretty interesting because all the same extension pieces are there. So, you still have a manifest, you still have content scripts, and all the files still work the same, but there’s this kind of wrapper for the app.
Matt Frisbie 00:50:33 And so, the wrapper itself is a mobile app for Safari, and you can talk to it with the native messaging API. So, there’s this entirely extra piece of software that’s running on the phone that’s, it’s a Safari app. I’m not a Safari developer, but you can see it on the device and you can write code for it to behave as an app. So, it’s this whole extra piece that lets you kind of talk to the phone itself. So, the safari aspect of it is interesting for that reason, one — because it’s kind of this new domain — but it’s also the first major foray into extensions for phones because the iPhone is by far the most popular phone, and Safari is a huge chunk of users. Being able to run an extension — admittedly, inside a walled garden — is still pretty exciting.
Matt Frisbie 00:51:23 And it’s really the … Hats off to Apple for doing that because it seems like Chrome, the Google team is just never going to do it for mobile. And mobile computing is more than half of web traffic these days, and it’s kind of silly that we’re not able to use mobile devices, so good job Apple for supporting that. And so, it’s still kind of a clunky interface. So, developing for it is kind of, it’s kind of difficult and not nearly as easy as publishing to the Chrome web store, but it is certainly a promising future in the context of browser extensions.
Kanchan Shringi 00:51:55 So not as easy as a Chrome web store. So, what is the approval process? Do you have a separate one for each browser?
Matt Frisbie 00:52:02 Yeah, so each browser has its own store. So, the Chrome web store, that’s for Chrome, although the caveat on that is that other Chromium browsers can install from the Chrome web store, but Edge has its own Edge extension store; Opera has its own store; Mozilla has its own store. Yeah, so if you want to appear in these stores, you have to submit your bundled extension to each of these, and there’s a separate approval process. So, I’ll say that it’s a pain, for sure. The Plasmo guys who wrote the foreword, they have a pipeline that allows you to automatically deploy to all the stores, which is really awesome. And I suggest it to anyone who has to deploy to all the stores; it’s quite something and they’ve put a lot of work into making it great. But you can also do it just, like, one-off.
Matt Frisbie 00:52:48 So if you just want to publish in the Chrome web store, there’s an API you can do it through, or you can just upload a zip file. The zip file is pretty low overhead. And then yeah, I should say it depends on what your extension has asked for. So, if it’s a low-permission extension that doesn’t ask for anything sensitive, I will typically see my extension, like, go live in under 30 minutes. So, it seems like that’s an automated approval pipeline, and Google has some automated process where they go, okay, yeah, that’s probably not stealing anything. So, we can just publish that. Go right ahead. So, the book has a companion extension called ‘example Chrome extension’ that, well, because it has to demo all of the APIs, requests basically every API imaginable. So, whenever I submit updates to that extension, it takes days because obviously someone has to like sit down and look at it and be like, okay, why are they asking for all the permissions in the world? And then, it gets published in the same way, but that takes much, much, much longer. So, I think the approval process is pretty straightforward. I think it’s just, developers need to understand, like, if you’re requesting certain permissions, it takes like 50 times longer for the approval to go through, which can be pretty annoying.
Kanchan Shringi 00:54:03 Let’s spend a little bit of time, maybe a couple of minutes, on testing and monitoring. Is there anything unique to testing extensions, or is it similar to how you would test any other web application?
Matt Frisbie 00:54:17 Yeah, so testing is tricky. So, a lot of it’s pretty manual. So, one interesting thing with Manifest V3 is that, for modern web developers, pretty much every build tool offers a hot module reload feature, right? So, it can quickly swap out pieces of the application that were updated; there’s no reload required and it can show it to you immediately, which is amazing when you’re writing code and don’t have to refresh the page every time. The problem is that this bumps up against what Manifest V3 allows for. So, and depending on what piece, so like if you’re writing a content script, for example, and you’re writing a, let’s say you’ve written a React widget to be injected into the page, the hot module reload can’t reload just that thing. So, it’s not compatible there. So, you have to do a page reload. And at the same time, you also have to be careful about when you need an extension reload.
Matt Frisbie 00:55:13 So, like when you’re going to kick out the service worker and replace it with the updated one. And so, there’s a whole bunch of pain points when you’re doing this. So for example, one thing that still bites me in the butt to this day is that if you are inspecting the service worker in development, if you leave the inspector window open, it will keep the service worker alive even after you reload it. And so, all these really weird bugs come out of the woodwork and you’re like, what is going on? And so, it’s just the browser; it’s like, okay, I’ve got to keep this alive because you still have the inspector window open. And so, yeah, testing extensions is kind of hard because it’s living in this weird browser space, and then all of the traditional build tools are kind of geared towards web development.
Matt Frisbie 00:55:56 So, some things translate, but yeah, it’s still a manual process and there are still all these kinks that may or may not be worked out. As for monitoring extensions, I would actually say it’s a superior experience to monitoring web apps. And the reason for that is the same reason that extensions are so popular: ad blockers. For a webpage, an ad blocker eats like half of your analytics traffic. So, for example, if you stick Google Analytics on a page, the number of people using your web app, I usually pad it by like 30 or 40% because those requests are getting killed by ad blockers. You will never know that those people are viewing your webpage. However, if you are installing analytics inside an extension, other ad blockers can’t block network requests from your extension. So, if you’re sending analytics from the service worker, you’re going to get perfect fidelity for your analytics, which is great. Like, being able to see 100 percent of the user activity and not have to worry about ad blockers eating all your stuff, that’s great. So, monitoring I would say is actually nicer than webpages because you’re not losing all that analytics data to ad blockers eating your lunch.
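As a loose sketch of sending analytics from the service worker, the function below posts a JSON event through an injected fetch-like function. The endpoint URL, the event shape, and the `PostFn` type are all hypothetical; in a real service worker the global `fetch` would presumably be passed in, and a real analytics provider would define its own payload format.

```typescript
// Hypothetical event shape for this example.
interface AnalyticsEvent {
  name: string;
  timestamp: number;
}

// Minimal fetch-like signature, so the sketch can run without a browser.
type PostFn = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean }>;

// Hypothetical collection endpoint; replace with a real one.
const ANALYTICS_ENDPOINT = "https://analytics.example.com/collect";

// Send one event and report whether the server accepted it. Network failures
// are swallowed and reported as false so analytics never breaks the extension.
async function sendAnalyticsEvent(post: PostFn, event: AnalyticsEvent): Promise<boolean> {
  try {
    const response = await post(ANALYTICS_ENDPOINT, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(event),
    });
    return response.ok;
  } catch {
    return false;
  }
}
```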
Kanchan Shringi 00:57:09 Trying to wrap up now. So, you did talk about the Plasmo platform. I think this was built by Stefan Aleksic and Louis Vilgo (I hope I’m pronouncing the names correctly), who wrote the foreword to your book, and you talked about a use case where the platform does help with the approval process across browsers. How else do these platforms help?
Matt Frisbie 00:57:34 Yeah, the Plasmo guys, they’ve gone a really interesting direction. So, they’ve kind of built this declarative model. So, when you’re building an extension, instead of kind of explicitly laying everything out inside a manifest file, a lot of the boilerplate stuff gets generated for you. So, they’ve figured out like a good way of injecting one or multiple content scripts. They’ve figured out a good way of managing permissions and messaging and things like that. And so, they’ve built it into this kind of opinionated platform, but you get all these benefits once you use the platform. So, they have a really, really nice command line interface. As I mentioned, they’ve got the store deployment pipeline. That’s great. Yeah, they’re actively working on it. And they’re great guys. It’s a lot of fun to talk to them. And so, I think they’re really onto something. So, there aren’t a ton of platforms, and a lot of people home-roll stuff, but for anyone who’s looking to find a platform to easily get started, look no further. Plasmo, those guys are killing it right now. And they’re working on some really important stuff, and they’re really advancing sophisticated extension development tools, and boy, Lord knows we need those because this space is still kind of the wild west.
Kanchan Shringi 00:58:51 It certainly is. I couldn’t find a lot of material besides the official documentations. Then I chanced on your book.
Matt Frisbie 00:58:57 And the book.
Kanchan Shringi 00:58:58 So, we’ll definitely have a link to the book in our show notes. How else can people contact you?
Matt Frisbie 00:59:06 Yeah, so Twitter’s a good way. My Twitter handle is @MattFrizz. Yeah, so buildingbrowserextensions.com is the website for the book. You’ll be able to buy the book there. There’s contact info for me. Yeah, you can find me on LinkedIn. There’s a number of ways to reach me or just Google my name. My personal site is mattfrizz.com. You can find my contact information there. So yeah, pretty reachable.
Kanchan Shringi 00:59:29 So, certainly a very interesting conversation, Matt. We’ve covered a lot of topics. Is there anything else you think we should talk about today with respect to browser extensions?
Matt Frisbie 00:59:37 So, there is one thing, and that is kind of where the future is for browser extensions. So, I’m sure you’re familiar with ChatGPT, which was released in December and has taken the world by storm. And I think that there’s a really interesting pairing between AI tools, especially LLMs (large language models) like ChatGPT, and things like extensions. So, I wrote a blog post about this, but there has been an explosion of browser extensions that use tools like ChatGPT and other OpenAI APIs to do cool things inside the browser. And so, a lot of the ones that have come up can write emails for you and can, like, summarize articles, and there’s hundreds of extensions now that utilize these language models in some interesting way. And the pairing really opens up some interesting possibilities because like if you think about what a webpage actually is, right?
Matt Frisbie 01:00:31 It’s this hypertext; it’s a pretty consistent length, at most a few thousand words usually, plus images and things like that. And the inputs to these large language models can handle that amount of text. And so, there’s this new space where these AI tools can really richly understand what you’re looking at and can unpack all these things. So, they can summarize what you’re looking at, or they can have this conversational understanding of what you’re browsing. And so, there are certainly privacy implications of like, do I want to be feeding this closed-source AI model what I’m looking at? But browser extensions, what I really see them as is almost like a glimpse into the future of augmented reality because they’re adding these contextually useful interfaces where you need them.
Matt Frisbie 01:01:25 So, because it can so richly understand what the page is showing, it’s able to go, oh, you could really use a little widget here that does XYZ, or we should really format the page in this interesting way because this’ll help with XYZ. So, because an extension can richly modify and understand the page, and because LLMs are able to quickly ingest the contents of the page and do useful things with them, there’s this really interesting future where browser extensions are kind of this assistant that’s modulating the way that we use the web. And, obviously so much of what we do is now inside a web browser or some computing device in some form. And so, having this layer over what we’re looking at that is a smart layer and is able to modify and suggest things really opens up some interesting possibilities. So, I’m really excited about the future of this pairing between browser extensions and things like ChatGPT. Maybe it won’t necessarily be in the form of a browser extension, because they’re mostly limited to desktop browsers and that’s really only half of web browsing. But the ability to have this controlled layer that’s like this assistant and can understand what you’re looking at and what you’re doing, it gives a small glimpse into the future of computing, and it really excites me in a profound way.
Kanchan Shringi 01:02:47 I did notice your article on LinkedIn. We’ll definitely have these links in the show notes. This was a very interesting conversation. Matt, thanks for coming onto the show, and I’m happy we could have this conversation.
Matt Frisbie 01:02:58 It’s an unusual space and I was happy to be on to talk about it.
Kanchan Shringi 01:03:01 Thanks all for listening.
[End of Audio]
SE Radio theme: “Broken Reality” by Kevin MacLeod (incompetech.com — Licensed under Creative Commons: By Attribution 3.0)