
SE Radio 611: Ines Montani on Natural Language Processing

Ines Montani, co-founder and CEO of Explosion, speaks with host Jeremy Jung about solving problems using natural language processing (NLP). They cover generative vs predictive tasks, creating a pipeline and breaking down problems, labeling examples for training, fine-tuning models, using LLMs to label data and build prototypes, and the spaCy NLP library.



Show Notes

Conference talks

SE Radio


Transcript

Transcript brought to you by IEEE Software magazine and IEEE Computer Society. This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number.

Jeremy Jung 00:00:46 Hey, this is Jeremy Jung for Software Engineering Radio. Today I’m talking to Ines Montani. She’s the co-founder and CEO of Explosion, which is a company specializing in developer tools for machine learning and Natural Language Processing. She’s also a core developer of the spaCy NLP library and the Prodigy Annotation tool. Ines, welcome to Software Engineering Radio.

Ines Montani 00:01:08 Hi, thanks for having me. Very excited.

Jeremy Jung 00:01:11 So I think the first thing we could start with is defining NLP. So what is NLP for people who aren’t familiar?

Ines Montani 00:01:17 Yeah, I mean it’s a good question because actually, if you look around on the internet or follow the field, you might find slightly different definitions. I would describe it as processing large volumes of text. So you have text, and you want to find something out about that text. And more recently a lot of people have also included more general natural language understanding under the term NLP. So even if you have a chatbot that generates text, something like ChatGPT, which most people are familiar with, that’s usually also defined under that umbrella, even though it’s a slightly different task than really just processing the text. But basically, I think the underlying question is: there’s text and you want to use a computer to do something with it. That’s NLP.

Jeremy Jung 00:02:04 And in some of your talks, there are sort of two categories I think you put problems into. One is generative and the other is predictive.

Ines Montani 00:02:11 Yeah, and I think that was also partly in response to people maybe being a bit confused about what NLP is and mostly thinking about the generative part, and not so much about the predictive part where you extract structured data from text, even though I would say that’s probably still the main area of NLP used in industry and in production, especially in companies, because there’s just so much unstructured text and so many cases where you want to get the text into a format that you can compute with or work with. So I thought that distinction was very important.

Jeremy Jung 00:02:46 Can you give some specific examples of what are some generative things and what are some predictive examples?

Ines Montani 00:02:53 Yes, generative is of course the classic stuff: talking to a dialogue system like a chatbot, question answering, translation. That’s also a task where text goes in, text comes out. And then predictive is really more things along the lines of information extraction. You have a text, or you have emails that are coming in, and you want to decide: are these emails spam, is it about billing? That’s what usually would be referred to as text classification, so you assign one label to the whole text. Or then there are other tasks where you’re really extracting spans of text, like person names, organizations, phrases and so on. That’s also an area of information extraction and really an area where you predict something, very structured information, based on unstructured text.

Jeremy Jung 00:03:44 And so it sounds like generative is something where you’re creating new text, I suppose, and predictive is more that you want to know something or learn something about text that’s already there. Is that kind of accurate?

Ines Montani 00:03:58 Yeah, you can put it like that. And often for predictive, what’s important is that whatever you want to find out about a text, you want to map that back into the text. You want to know where these person names are, what they are, how they are related to each other, how they are used. Often you also want to stack these things on top of each other: you want to start by maybe deciding what’s spam, and then for everything that’s not spam, you want to extract what department it is about. And then based on that, if it’s about that department and it’s about billing, what invoice is mentioned in the email, and so on. So often there really is a pipeline of steps that you want to apply that can depend on each other and that each have different requirements and different difficulties. Some of them you can maybe use rules for, you can connect it to your database, and other things are much more complex and nuanced. And then other things you actually don’t even need machine learning for, because you can just use a regular expression or just write it in code. So yeah, there’s a lot of different things people are trying to do.
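
To make the kind of stacked pipeline described here concrete, below is a minimal sketch in Python. The function names, labels, and invoice pattern are all hypothetical; the point is only that each step is a separate, testable component, and that some steps can be plain rules or regular expressions rather than models.

```python
import re

def is_spam(text: str) -> bool:
    # Step 1: in practice this could be a trained text classifier;
    # a simple rule works as a placeholder baseline.
    return "unsubscribe now" in text.lower()

def predict_department(text: str) -> str:
    # Step 2: stand-in for a trained classifier that assigns one label
    # to the whole text (e.g. "billing", "support", "other").
    return "billing" if "invoice" in text.lower() else "other"

def extract_invoice_numbers(text: str) -> list[str]:
    # Step 3: some extractions don't need machine learning at all.
    return re.findall(r"INV-\d{6}", text)

def process_email(text: str) -> dict | None:
    """Each step depends on the previous one and can be evaluated on its own."""
    if is_spam(text):
        return None
    result = {"department": predict_department(text)}
    if result["department"] == "billing":
        result["invoice_numbers"] = extract_invoice_numbers(text)
    return result

print(process_email("Hi, invoice INV-123456 is overdue."))
```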

Jeremy Jung 00:05:08 Since your company started as a consultancy for NLP work, can you give some examples of projects that your company took on just so that people can get a sense of what are NLP problems I suppose?

Ines Montani 00:05:22 Yeah, so I mean there’s a lot. We actually still do consulting occasionally, because we feel it’s very important to stay close to the use cases: if you’re developing tools that should solve a problem for people, you want to make sure that you’re actually solving those problems with the technology yourself. So we still have projects like that. And to give you examples, one common topic or use case is extracting certain information from news and then feeding that into an internal knowledge base. So you might have a company that wants to find out whether something in their supply chain, or in someone else’s supply chain, might be impacted and cause them a lot of problems. It kind of sounds like a boring use case, but if you think about it, it’s insanely valuable.

Ines Montani 00:06:11 So then what do you do? You want to scrape news and find out that hey, there’s a strike in this small town here and that might later come back to us via these channels. Something like that. Or you want to analyze financial documents about acquisitions and mergers, who was bought, what amounts, and then at the end of it you might want to compute things like which emerging areas are there, how many acquisitions did Apple do in that timeframe and so on. So that’s like really classic information extraction work. And that’s also actually a lot of what we are seeing in the projects that people are trying to solve.

Jeremy Jung 00:06:51 You have a library that your company built called spaCy and I wonder if you could talk about within the context of spaCy and then maybe the things you need to use around it, how would you start to approach a problem like that where you have all this unstructured data, you have news reports you’re bringing in, you have these financial reports. Where do you get started in the process?

Ines Montani 00:07:12 Yeah, so actually that step of really deciding what to do is the hard part. Often people might think about training a model and all of this algorithm stuff, but all of that is pretty straightforward these days. The hard part is really starting out and deciding what to do and trying things out, and there it’s really like software development or coding. The first line of code you write is rarely the one you’re going to ship to production. So you need to iterate, refactor, try things out. And it’s very similar with data. So usually the first step, even before you get into any of the tools, is thinking about what you want the system to do. That sounds simple, but it’s really the fundamental problem that we are all trying to solve. We have a computer, and the computer should do a thing.

Ines Montani 00:07:59 How do we get the computer to do exactly what we want? How do we describe that? So one approach for that, if you have a lot of raw data, is to just create labeled examples. So you take that data, you annotate, here are all the company acquisitions, filter the right examples out, label the spans of text that you are interested in, and go through that. And even if you’re using approaches that don’t train on labeled data, you always need that kind of step, because you need to have a source of truth, you need to have a way to evaluate your system. A lot of people don’t do that, but if you are thinking about best practices, it’s like, a lot of people don’t write tests when they write code. That’s a thing, but we all know that that’s not how we should do it.

Ines Montani 00:08:45 We all need to write tests. And it’s the same with an evaluation set. So you want to basically start out and create examples where you know the answer, so you can test your approaches. And that’s already where things can get tricky, even before you get tooling involved, because you’ll realize that language doesn’t divide so neatly into categories as you may have naively thought. So you have edge cases, you need to decide, what do I do with them, what labels do I want, how do I want to divide up the problem? That’s the work that’s challenging, and that I would say people struggle with the most, and often that takes experience, that takes trying things out, and then also picking the right components you want to train. Is this a problem where it makes sense to predict something over the whole text, or is it a problem where it makes sense to extract spans of text?
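
A small illustration of the evaluation-set idea Ines describes: hold out examples where you already know the answer and score every approach against them, whether it’s a rule, a prompt, or a trained model. The texts and labels below are made up.

```python
# Hypothetical held-out evaluation set: (text, gold label) pairs.
eval_set = [
    ("Apple acquires AI startup for $50 million", "acquisition"),
    ("Quarterly earnings beat analyst expectations", "other"),
    ("Meta buys VR hardware maker", "acquisition"),
]

def accuracy(predict, examples):
    """Score any callable predictor against the gold labels."""
    correct = sum(predict(text) == label for text, label in examples)
    return correct / len(examples)

# A crude rule-based baseline to compare more complex approaches against.
def baseline(text):
    return "acquisition" if any(w in text.lower() for w in ("acquire", "buys")) else "other"

print(accuracy(baseline, eval_set))
```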

Ines Montani 00:09:38 What kinds of spans of text do you want to extract? And so on. There’s a lot of these decisions where often you just have to try things out and where maybe your first idea or the naive idea isn’t the right way forward. So assuming you have been through all that, you have decided what you want, it makes sense, it’s something that actually works on your data, then you can start picking up the tools and assembling your pipeline. So spaCy really is a library that was designed with these multi-step pipeline workflows in mind and is also really optimized for production usage. So it’s really, really fast. You can train very small models that are very specific to what you want to do. You can combine them with maybe rules or knowledge bases you have. And if you have your data ready and you know what you want to train, then the actual training can be pretty straightforward and can really just be a command that you run, get your model out, and use it. But the path to getting there, that’s what’s tricky. And that’s also why this is the one area that we really work on, have been working on, and are working on in future projects: how can we make that process easier and easier for non-experts who don’t have a machine learning background.
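
As a rough sketch of what assembling such a pipeline can look like in spaCy v3, mixing rule-based and trainable components; the labels and patterns are placeholders, and the CLI flags in the comment may vary by version:

```python
import spacy

# Start from a blank English pipeline and add components one step at a time.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")                     # rule-based sentence boundaries

ruler = nlp.add_pipe("entity_ruler")            # rules, e.g. terms from your database
ruler.add_patterns([{"label": "ORG", "pattern": "Apple"}])

textcat = nlp.add_pipe("textcat")               # trainable text classifier
textcat.add_label("ACQUISITION")
textcat.add_label("OTHER")

# Once the labeled data is ready, the training itself is a single command,
# roughly: python -m spacy train config.cfg --output ./model
#              --paths.train train.spacy --paths.dev dev.spacy
```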

Jeremy Jung 00:10:53 Yeah, so it sounds like the very first step is taking what seems like maybe a vague problem where you’re saying, I want to know, are there things happening in the world that are going to affect my business, right? Which is a very broad question. So you start with, okay, I’m going to bring in all this unstructured data, I’m going to write an application maybe to scrape news sites and get me all these documents. But then you have to break it down into these more specific, I don’t know if questions are the right word?

Ines Montani 00:11:26 Prediction problems. I think that’s how we would call it. And also, I think you quickly see that your business problems don’t actually translate one-to-one into prediction problems.

Jeremy Jung 00:11:37 So the very start would be deciding, okay, maybe I need to be able to read these documents and determine what are these documents talking about and then is it a positive thing or is it a negative thing and then kind of classifying all these things and

Ines Montani 00:11:55 Or even, what does positive or negative mean? One example was, I think someone was trying a similar model for news and things that are happening, reputation management, basically how do people talk about my brand, good or bad? And then they had this idea that, well, if people say bad things about the competitor, then that’s good for us and we want to extract that as positive. But without the context of your product and everything else, that’s not something that really translates into a prediction problem. The model has no idea. Or if you label it that way and try to teach the model that, the model will really, really struggle to make sense of why this negative sentiment is suddenly good. And so these are all important things you want to tackle before you actually get into the data.

Jeremy Jung 00:12:46 Yeah. So can you give a specific example of a customer that you worked with and how they broke down the problems or how they broke down, okay I need to classify this and classify this and this is how I will use it as the end result?

Ines Montani 00:13:01 Yeah, I mean often it really depends on the specific problem, and often you also have to try things out. But a lot of it comes down to really thinking about, okay, what is the structured output that I want, and how easy or hard is it going to be for the model to finally produce that output? And often again that’s something where either you have to have done it before or you have to be able to try things out. There was one case that I think we’re also going to publish as a case study where it’s a financial use case. They wanted to extract a lot of extremely specific terminology, kind of on a level where you read the sentence and you’re like, I get that these are all words but I have no idea what this is saying.

Ines Montani 00:13:45 But there are over 30 attributes in this text that need to be extracted, almost in real time. And the first thing they did was separate out these attributes into things where the value can only be one of a certain type, an enum basically, and then everything else that needs to be predicted. So for the enums, well, that’s pretty straightforward. You can just provide a list of rules or examples or even something more abstract. And then for the other ones, well, you have to create some data for them and look at them in context. And so initially what they did was, well, we want to only go through every example once, let’s queue it all up for annotation and annotate all 30 attributes together, create some data that way, push ourselves through that tedious process. And it was basically not possible to do that efficiently at all, because you have to mentally focus on 30 different decisions and constantly switch between them.

Ines Montani 00:14:46 So the process was really slow, and then after looking at it again they were like, well, let’s try doing one label at a time and really focusing on working through that. It sounds like a lot more work at first, because for 30 labels you need to look at every example 30 times, but it actually turned into a process that was more than 10 times faster overall, because you reduce the cognitive load on the person creating the data. So I thought that was an interesting example, because that stuff also matters. And that’s I think also what often keeps people from even going down the route of looking at the data in the first place, because it feels like this incredibly tedious click work, when a lot of it is just about how you structure it and maybe coming up with more creative and smarter ways of looking at your data.

Ines Montani 00:15:39 Because I think overall, going by example is still a great way of telling a computer what to do. In a way, I think it has a lot of advantages over writing prompts like people know from, say, ChatGPT or similar models. But it’s currently a lot harder. So whereas prompts are very accessible, anyone can just write something and have the model do it, creating examples and training a model is a lot more work. And so that’s actually why this is the area we find super exciting, because, well, okay, how can we take this expert workflow that is superior in a lot of ways, but make it almost as straightforward as writing a prompt.

Jeremy Jung 00:16:20 In that example you gave, you have these 30 potential labels you can give to a text. Can you specifically speak to, are they looking at a sentence or are they looking at a paragraph, and then they’re just labeling that one thing, or what’s the …

Ines Montani 00:16:36 So it’s spans of text. A financial analyst writes something that’s happening in the market, and there’s a lot of this specific terminology around how much, what market. Again, it’s one of these things where if you don’t have a background in finance, you’re like, what? It’s barely English, but that’s how the data comes in. And then ideally in real time you also want to have a structured format of it, and you can’t make your analysts do this differently because they’re already reporting their things as fast as they can, but you can potentially build a system that can handle it. And they actually managed to build an incredibly fast pipeline that’s super small, like a six-megabyte model, which especially nowadays is like, whoa. It runs entirely privately, which is another advantage of really being able to build your own systems, because you don’t need an API, you can run it in-house.

Ines Montani 00:17:33 And if you are working in a field where that’s important, you can’t get around that. Financial data can’t even hit any third-party service. Apparently even in the office, the people who have access to that are kind of walled off, because what they have can impact the market. So yeah, but they were able to achieve very high accuracy, and quite quickly, by approaching the problem in kind of a smart way and also reasoning about things. I feel like I keep saying this in a lot of other contexts: you just need to sit down and reason about stuff. And while it’s not the most attractive answer or piece of advice, because you still need to sit down and reason about stuff, ultimately I think that’s what a lot of it comes down to. In software development and in AI, the best practices we’ve learned about how to build software still apply, and it’s not all changing because suddenly it’s AI.

Jeremy Jung 00:18:30 Yeah. So I think in this specific case they’re looking at these spans of text, and I don’t know if they’re financial analysts or something like that, but they have all these categories or ways to classify what they’re looking at.

Ines Montani 00:18:42 Yeah, it’s like, there’s this price at which something is traded, and this is about the crude oil market, and then this is the location of some port in China or wherever. Even though a lot of finance is so digital, there are some areas where a ton of stuff happens, where so much money is moved, that are then shockingly physical, where you suddenly have some port somewhere in the world, but then you also have other stuff about carbon credits. And that is just an example from finance that was kind of fresh in my mind. But there’s a lot of that in probably any area you can think of, any specific domain. There’s a lot of data that’s specific and not generic and not general purpose, and that makes it hard to work with, but of course also a lot more valuable than something that is a lot more general purpose.

Jeremy Jung 00:19:38 Yeah, in this case it could be things like the location, the financial market, the product or item that they’re talking about.

Ines Montani 00:19:47 Even though I have looked at this and thought about it and have been writing it up to present to people, it is abstract and I don’t even remember all the stuff that it’s about. So for that use case, the people who are consuming that data, the customers, want to consume this feed as quickly as possible, see what’s happening in a structured format, be able to do analysis of it, feed it into their systems and then make decisions about the market; the market becomes more transparent in the process. So that’s all very important, but it’s also very specific. And for the other use cases we have, I think one we’ve also published is around journalism. In media and content creation there’s a bunch of stuff where you have user-generated content, so you don’t even know what you’re going to get, but you want to be able to classify that, detect bad behavior, label something, and basically make the work that humans would normally do more efficient. Not necessarily replace humans, but make humans better and faster at their work.

Jeremy Jung 00:20:56 Yeah, and in the user-generated example, I could see someone looking at social media posts maybe and being able to pull out, okay, what’s the subject, and are they speaking positively about something or are they attacking someone? It’s basically being able to figure out, what’s this person talking about and are they violating our policies in some way?

Ines Montani 00:21:19 Which brands are people talking about? How are people talking about my brand, our competitor, or, I don’t know, some other industry. That’s all important. And that’s also where having a model that’s a bit more accurate actually makes a big difference, because you’re finding out more, and ideally you are also finding out things you didn’t already know. If you are really working with data in a good way, you are interested in the stuff you don’t already know, and not just using it to confirm things you’ve suspected. At least that’s where it really starts becoming valuable.

Jeremy Jung 00:21:51 Yeah. And in this case, and this kind of goes back to the predictive part you’re talking about earlier is as software developers we’re used to working with structured data, we’re used to working with data we can put into a database and we have all these properties or attributes we know about it versus if you have unstructured text yeah it’s very difficult to get any kind of insight into it because you need to be able to extract all the information out first.

Ines Montani 00:22:19 Yeah, and I think having databases and structure is also still important. I mean, even if you think back to before computers, what were companies doing? People were categorizing stuff on index cards, and there’s this tremendous value in structure, and that’s also why I think this is not going away. You might see some hot takes of people saying, well, if we now have these massive large language models that can just generate stuff, we just feed them all of our text and then we can ask questions about it. And there are a lot of use cases where that’s pretty exciting and really adds a lot of value, but companies are not going to throw out all the databases that they’ve been working on since before they had computers, and all the structure they have, and replace that with some black box that is much slower, less accurate and more expensive than what they currently have. That’s not how new developments in the field will impact industry. It’s a lot more nuanced, and in a way actually much more exciting than that.

Jeremy Jung 00:23:22 Let’s touch on that a little bit more, because you gave some of the examples of where a specialized model is better, because you said you can run it locally, you have trained it yourself with your own examples, so you have a better sense that it maps to your expectations, I suppose. But what are the areas where a large language model is good for this kind of pipeline, or what are the kinds of problems where they are good in this case?

Ines Montani 00:23:51 Yeah, so it’s always a tradeoff, and it’s usually also not an either-or question. It’s mostly that you can combine the different technologies. So one thing where especially a lot of the latest models that have come out are really adding capabilities that we couldn’t do before, that’s really around the text generation. So being able to accurately summarize a text, for example; that used to be not that great, and it took a lot of work to get that to at least be okay, and now this is pretty good. So if you have a workflow where you have a long document, you want to summarize that and then extract things from the summary, which is much easier to work with, that’s something where a large generative model really does something that we couldn’t do before.

Ines Montani 00:24:35 And there’s also the aspect of getting started and really prototyping quickly. Before, we always had this problem that there is this bottleneck in the initial data. Before you have a model that predicts something and gives you an idea of whether this idea of how I’ve structured this problem makes sense, which is basically what we talked about before, you sit there, reason about things, come up with a way to break it down, and to test that you need at least a few hundred examples that are also good. And if you really do it properly, that’s easily maybe 40 hours of work that you put into it before you have a model that predicts anything. And then maybe at the end of it you train your model and the accuracy is zero, and then you have to figure out, what’s the problem?

Ines Montani 00:25:22 Is it because my data is bad? Did I forget some hyperparameter? Did I have a bug in my code? That part is very frustrating, and that is something where, very easily now, with an existing model or an API, you can get to a system that’s kind of okay and lets you have a prototype and test out your things almost instantly, with the right tooling around it. So for example, for spaCy we have the spacy-llm library that provides all these components in the pipeline, but powered by a large language model, for example via the OpenAI API or locally. And so you have the same predictive behavior, but the model and the functionality come via a prompt from the large language model. And so in minutes you can have something running that you can then improve, and then you can also test, does it work at all?

Ines Montani 00:26:18 How does it work on my data? And that’s a huge improvement, and then you can see which parts you can also mix and match. Maybe there are some things where it’s a very generalizable, general-purpose task where a large general-purpose model does really well and you want to keep that, and then there’s maybe something else where you realize you’ve actually spent all this time trying to predict it but you cannot get better than a regular expression. That’s also often a realization you have to make, or why it pays off to have a baseline, because yeah, you might spend a lot of time and then realize, yeah, okay, you can’t get better than regex.
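
A minimal sketch of the spacy-llm prototyping idea: the LLM-backed component is declared in a config file and then behaves like any other spaCy pipeline component. The config contents, task and model names depend on the spacy-llm version and are only referenced in the comments here, so treat this as an illustration rather than a recipe.

```python
from spacy_llm.util import assemble

# config.cfg is assumed to define an "llm" pipeline component with an NER-style
# task (e.g. labels like COMPANY, AMOUNT, DATE) and a model backend, which can
# be an OpenAI model via API or a locally hosted one; see the spacy-llm docs
# for the exact task and model registry names.
nlp = assemble("config.cfg")

doc = nlp("ACME Corp acquired Foobar Ltd for $2 billion on 3 March.")
print([(ent.text, ent.label_) for ent in doc.ents])
```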

Jeremy Jung 00:26:55 And so it sounds like when you’re working on an NLP-type task, you have all these different tools available to you. You have, like you said, even regular expressions you can try, you have these models where you’re giving examples, and …

Ines Montani 00:27:10 I would call it transfer learning. So that’s really the idea. You start off with some weights that have been pre-trained on raw text. So you start off with some knowledge about the language and the world, and you don’t have to train it from scratch. So that’s kind of these BERT-sized models that maybe people have heard about. And the idea here is you still train something specific, but you start off with a lot of knowledge. And then now we also have in-context learning, which is a lot of those large language models that people know from the GPT family, for example, especially the later models that were released, where you can use a text-based prompt to control the behavior and then you usually get text out. And yeah, those are kind of the different options, and they’re good for different things, and you can combine them for even better workflows.

Jeremy Jung 00:28:03 And it sounded like what you were saying earlier too is that you might start with using a large language model to solve a specific problem, and then based on that you might decide, oh, it’s good enough, I can just use this. Or you might see it and go, well, the accuracy’s not that good, so I need to go and do some additional tuning of a model.

Ines Montani 00:28:29 Yeah. Or you could use it to create data for you as well. That’s actually a lot of experiments we’ve been running, where we’re trying to test how many examples you need and how quickly you can generate them, actually using the LLM in the loop as well to generate the examples, so you only make corrections, which is super fast. And then how long does it take until you can train a model that reaches the same accuracy? Because once you get there, you have a lot of other benefits. You have a model that does exactly one thing, so you don’t have to worry about all these other parts of the model that you don’t need, where maybe there’s unexpected behavior that’s very, very difficult to predict. So you have a model that really only outputs yes or no, or it outputs an array of numbers, and that’s it.

Ines Montani 00:29:16 That’s what it does. And that means it can be much, much smaller. So again, you can have a powerful pipeline in production that’s six megabytes for the whole thing, and then it can run at 10K words per second, depending on how you’ve built it. But even a thousand words per second is pretty good if you’re processing things on a large scale. And your model can be entirely private, so no data needs to leave your servers, it can run offline, you don’t even need internet for it. It can run on your local MacBook, and you can even sometimes train it on your local MacBook if you have a modern processor. And yeah, if you can get to a model that you can actually make better, you really have the best of both worlds. And I think the technology develops in a way that makes, for example, larger and larger models possible, and at the same time it also makes us able to use compute much more efficiently.

Ines Montani 00:30:09 So that means we can also go the other route and try to go smaller and faster and better. And that’s, I think, an area that’s almost more exciting and more useful for practical contexts than seeing how large we can get. That’s also an interesting area, but I find it a bit less rewarding than saying, okay, how can we break that down more and make it more practical? That’s also where APIs make sense, because companies like OpenAI can batch up the requests. There’s a lot of stuff you can do at scale that lets you make these API calls cheaper, which you could not provide if you were running it locally. So it makes sense to have it as an API or to have different providers. But of course then you always have the problems of calling an external API: you have the data privacy question, you also have the latency. Things are getting faster, but making an API call still takes a while.

Ines Montani 00:31:08 You might have one word per second; for some cases that’s okay, for other cases you can’t work with that. And yeah, there’s also the general reliability question: no API provider will be able to provide every model ever. Even OpenAI has to retire old models; otherwise, it costs I don’t know how many millions to keep a model up. But if you’re using an older version, the model may change, and even if the model is overall performing better and people are reporting better results, for your use case it might be that whatever changed just suddenly makes it worse. So it’s like how in software you want to pin your dependencies in some way. You want to make sure that you can reproduce that, and that does become a lot harder if you don’t own the model weights. So it’s always a tradeoff, but I think it is possible to get around that and work around that. The future is not just bigger and bigger black boxes; that development also enables kind of a whole counter-development.

Jeremy Jung 00:32:10 That’s a good point about pinning dependencies because I’ve heard people talk about the OpenAI GPT models where the version of the model hasn’t changed but people report oh it’s not giving me the same answers it was before. Yeah

Ines Montani 00:32:26 And OpenAI says, well, we didn’t change the model. And technically that might be correct, because they might have changed some other logic, because again, it’s also not public. You have a general idea of how the system works, but you don’t know what else happens. And since it’s an API, there also has to be a lot of sanitizing and just working around really bad use cases, and so many different layers. So yes, something else might have changed, not the actual model, but it still changes and you can’t know what changed because you can’t look inside.

Jeremy Jung 00:33:00 Yeah, there was a recent example I thought was pretty interesting, where there were people saying, oh, during the holiday timeframe it felt like the model wasn’t performing as well, and there was a theory that perhaps it was because internally the model is adding the current date to the prompt, so that it knows when you’re making the request, and maybe that was influencing the results. So it’s interesting when you don’t have full insight into what’s happening.

Ines Montani 00:33:29 Yeah. That is interesting, and it also points out how important it is to really have an evaluation set. I’m often surprised how many people gloss over that, because you don’t want to test your model by just typing something in or trying it and seeing how you feel about it. I’m sure a lot of the researchers who report these things about the models actually do have an evaluation that they run, and then they can say, well, today the model definitely performed 5% worse. But if you’re not doing that, then it’s just, it feels like today ChatGPT is giving me worse answers. And you can’t quantify that. That works if it’s just you privately asking it questions about coding or something, but if this is your production system that’s worth a lot of money or that delivers a lot of value, that’s not how you want to evaluate it.

Jeremy Jung 00:34:17 Yeah, and OpenAI even publicly responded, where they said, oh, we’ve been hearing these reports that the model seems lazier, or something like that. I mean, what is "seems"? It’s very hard, very hard to evaluate against that.

Ines Montani 00:34:32 So I think that’s also why you want to be able to do better than that when it’s really about your product and about what your company is doing.

Jeremy Jung 00:34:41 Yeah, and I think it was interesting that you mentioned how, to help train a model, you need all these examples, and you could use a large language model to generate the examples.

Ines Montani 00:34:52 Or to add the structure to do the stuff you otherwise would have to do by hand.

Jeremy Jung 00:34:56 Yeah. And so would that be if the example was — if we needed a bunch of user reviews that we were going to train on — that you would have the large language model generate the review and then give you a first pass at annotating it, or I’m trying to understand.

Ines Montani 00:35:12 My suggestion was mostly annotating. You have a lot of raw data, but you need a lot of examples of reviews where you specifically highlight the products in those reviews, and you scrape them all, but you want to be quicker and you don’t want to do it manually. So then you can actually, for example, you can use something like spaCy LLM, for example, that really gives you the structured output or in our annotation tool Prodigy, we have workflows for that, where it basically just streams that in and you just say yes or no or maybe make a slight correction when it’s wrong. There is also synthetic data, another huge area where you’re really generating the text, and for some things that can work especially, if you can define the constraints around what’s possible. But of course you also have a risk there of feeding model-generated data back into the model and you need to know that it’s actually representative of what your model will see at run time.

Ines Montani 00:36:07 Otherwise there’s no point in doing that. But I think the annotation part is definitely something that you can really use the model for. And then if you combine that with some transfer learning, where you’re starting off with pre-trained weights that already know things, you can actually get to a point where you only really need a few hundred examples, and that’s not a lot. That is something even an individual developer or a small team can just do. You’re not at this point anymore where you need to think about large-scale annotation efforts and Mechanical Turk and hiring a bunch of click workers, stuff like that. That is all something that got massively disrupted by technology. And I think it’s good, because I do think the practices around the whole Mechanical Turk market were all very shady. You had people doing this work for, I don’t know, $2 an hour, $5 or less, and then creating this data with no connection to the task; that’s not what you need. You want subject matter experts, domain experts: imagine having financial analysts or medical professionals in the loop to help you with the data. If you make the process efficient enough, then you can do that, and that’s really great, and then that really puts the data where it should be.
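
A bare-bones sketch of the model-as-first-pass-annotator idea, independent of any particular annotation tool: run an existing pipeline over raw text and write its suggestions out so a reviewer only has to accept or correct them. The file name, texts, and choice of pipeline are hypothetical.

```python
import json
import spacy

# Any existing pipeline can produce the first pass: a general pre-trained model,
# an LLM-backed spacy-llm pipeline, or a model trained on your first batch of data.
nlp = spacy.load("en_core_web_sm")

raw_texts = [
    "The ACME X200 stopped working after two days.",
    "Great battery life on the Foobar Mini.",
]

with open("suggestions.jsonl", "w") as f:
    for doc in nlp.pipe(raw_texts):
        record = {
            "text": doc.text,
            # Pre-filled entity spans; a domain expert only reviews or corrects them.
            "spans": [
                {"start": ent.start_char, "end": ent.end_char, "label": ent.label_}
                for ent in doc.ents
            ],
        }
        f.write(json.dumps(record) + "\n")
```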

Jeremy Jung 00:37:29 So it sounds instead of generating the examples, you would scrape them or get your own collection and then you would use the large language model to do a first pass at annotation so that when you have your experts or your people looking them over, they don’t need to start from scratch. They can see, oh all these things have already been annotated and they look right and I can just click the green checkbox.

Ines Montani 00:37:53 Yeah, or you can even take it one step further and maybe even have a model or some algorithm that tries to sort them in a way that you start off with those that are maybe most important, or that the model is less certain about, or that look different from the others. And so you can actually focus the human’s time on the ones that probably need more attention than the others. So there’s a lot you can do as part of that process. And yeah, many people especially start off by already having the raw data. That’s why data generation and synthetic data is a big field, but often the main bottleneck or problem is that you’re sitting on all this raw data; or, another example, you’re sitting on company reports and internal reports and stuff people have written. We’ve amassed so much text over decades that is all sitting around somewhere on computers, and you have all this stuff and you want to analyze it in a structured way.

Ines Montani 00:38:52 And so there’s no shortage of raw data, but the hard part is deciding what you want to do with it, how to even define the structure, and then creating examples where you know the answer, or translating it into a prompt for a first pass. Which is also tricky, because again, the easy part is that it’s accessible, you can just write instructions. But in a lot of cases that also lands us back where we started before machine learning, where you have to define a lot of rules and write down this whole manual, thinking of all the edge cases. And that’s where you get to the point where, hey, actually it’s a lot easier to just give it a few examples in context that show the nuances of language.

Jeremy Jung 00:39:34 And so you were talking a little bit earlier about how before, it sounds like, people would have to train a model from scratch, where you needed lots of examples, and that’s why you were referring to Mechanical Turk and hiring all these people to do these annotations for you. But then you mentioned now we have transfer learning, and there are some models you can just take off the shelf and maybe do a little bit of tuning, give a few examples, and get the results you need. Are there a few specific examples of those models that, I guess within the last five or 10 years, have really made it easier to solve these kinds of problems?

Ines Montani 00:40:13 Yeah, so mostly what I’m referring to here is, you basically start off with the representations of the words, and then what you train on top is really just the layer or the part that does the specific things. So you start off with, for every word, you have this vector for it that sort of represents in context what it means in different scenarios. So you’ve built up these huge representations, and that actually turns out to really improve your model across all kinds of tasks. So that’s this whole idea that really was popularized with BERT, the model from Google and the paper, and this idea that if you train a system to predict the next word given the previous words, you build up representations of the words in context, as sort of a side product, that are really, really useful for all kinds of other things.

Ines Montani 00:41:04 Because it turns out that by doing that, you’re building up this abstract knowledge that really contains a lot about the language and the world. All these capabilities that the model needs to learn to predict the next word are also super useful for all kinds of other things you’re doing with text. So it’s actually interesting, because speaking of specific models, for a while there was this surge of so many different of what people now call foundation models, which really just provide these representations. There was a new one every day and a lot of different things happening, researchers trying out different things. But I feel at this point the community really has converged on a very small set of models, often in the BERT family, that have consistently done well and that everyone is using.

Ines Montani 00:41:55 So even something like RoBERTa or BERT, a lot of these models in that family, also for different languages, that is usually what works the best as the really general-purpose pre-trained weights. But you can also train these things yourself, and there are even ways to train these word embeddings and word representations in a way that doesn’t cost tons of compute. If you have a lot of raw text, you just let it run over your raw text and build up these representations using the same language modeling approach, predicting the next word. And then you can even do that for your really specific data, if you’re sitting on millions of, I don’t know, regulatory documents, or you have your specific legal stuff or some biomedical drug trial reports, all this kind of stuff. You’re sitting on all these millions of documents, so you can pre-train your own embeddings and then use transfer learning, and then you hopefully only need very few examples to really add that layer that tells it what specifically it should do.
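
One cheap way to get domain-specific word representations, in the spirit of what Ines describes, is to train word vectors directly on your own raw text. Below is a sketch using gensim's Word2Vec; the tiny corpus is a stand-in for millions of domain documents. Plugging the resulting vectors into spaCy (for example via spacy init vectors), or using spaCy's own spacy pretrain command, are alternatives whose exact invocations depend on your setup.

```python
from gensim.models import Word2Vec

# Stand-in for millions of tokenized domain documents
# (regulatory filings, drug trial reports, market commentary, ...).
corpus = [
    ["crude", "oil", "futures", "rose", "on", "supply", "concerns"],
    ["brent", "crude", "traded", "higher", "after", "the", "opec", "meeting"],
    ["the", "trial", "reported", "adverse", "events", "in", "the", "placebo", "arm"],
]

# Train word embeddings on your own raw text; cheap compared to LLM pretraining.
model = Word2Vec(corpus, vector_size=100, window=5, min_count=1, workers=4)

# Nearest neighbours in your domain's vector space.
print(model.wv.most_similar("crude", topn=3))

# Export for use elsewhere, e.g. to initialize a spaCy pipeline's vectors.
model.wv.save_word2vec_format("domain_vectors.txt")
```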

Jeremy Jung 00:43:07 So these models that I suppose descended from BERT, they have an understanding of maybe general language, but then what you’re adding on top of that is an understanding of your specific domain, whether that’s finance or…

Ines Montani 00:43:23 Yeah, or your specific task, like extract this thing or these labels. And before we had transfer learning, you kind of had to do all of that from scratch, and that was even harder. And that’s also where this idea of armies of annotators came from, because it’s kind of like the concept of getting a new employee and training them versus raising them from birth, even though that sounds like kind of a creepy example. But before, you really had to raise this model from birth and teach it everything. If you wanted it to predict what’s a verb and what’s a noun, you had to give it so much data that it learned all of that, and now you can actually start off with a much larger foundation. And also there’s so much context about the world and how certain things are used together. Once you really think about how we use language, there’s so much that’s very arbitrary, that makes absolute sense to us and is super intuitive, but it’s really weird to a computer at first. And that changes too, so these models have to change as well, or your models constantly have to evolve, because the world isn’t static and language isn’t static.

Jeremy Jung 00:44:35 And it sounds like for someone who is new to this world, I mean we’ve talked a little bit about spaCy, but it sounds like spaCy itself maybe has collected these different models, where somebody who has a specific problem can use spaCy, and there are APIs available that are going to help direct you towards which model you should use.

Ines Montani 00:44:58 That’s the idea. Or even beyond models, it’s more about the specific task. We really try to focus on language and language-specific tasks, because that’s often how people who start out with an NLP problem think about it. You don’t usually think about it in terms of vectors that come in and vectors that come out. In a lot of more general machine learning contexts, language is treated like any other input, like any other vector. And in spaCy we’ve always tried to present it with data structures and with concepts that are about language. So you really have a Doc object that holds the text, and words are split in a linguistic way; lower down they’re also split in the way that’s most computationally efficient, but on the surface you get actual words. And then from there you have components for extracting entities, for predicting labels over text, for applying rules and so on. And it basically also takes a very opinionated approach, because we don’t want to provide five different model implementations with different tradeoffs. We want to find one that’s the best, at least that’s the goal. So we want to give you one best way to do a thing; that idea runs through the whole library.
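
A short example of those language-centric data structures: a Doc object, real words with linguistic annotations, sentences, and named entities, without touching vectors directly. It assumes the small pre-trained English pipeline is installed.

```python
import spacy

# Install first with: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple acquired a small startup in Berlin for $50 million.")

for token in doc:
    print(token.text, token.pos_, token.dep_)   # words with linguistic annotations

for ent in doc.ents:
    print(ent.text, ent.label_)                 # spans such as organizations, places, amounts

for sent in doc.sents:
    print(sent.text)                            # sentence segmentation
```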

Jeremy Jung 00:46:13 So it sounds like from the start, if I am looking at unstructured text and I don’t have something like spaCy, I might start by using regular expressions or split statements to try and figure out, like where are all the words? And is this a sentence and all that sort of thing?

Ines Montani 00:46:31 Yeah and then you quickly realize that, oh sentence, let’s split that on periods. And then you realize, oh wow, there’s actually so many instances of the period character across the language that you never thought about. And yeah, it starts from there. And then on the other end of the spectrum you can put your text in ChatGPT and ask it questions about it.
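
A tiny, made-up illustration of why splitting sentences on periods falls apart quickly:

```python
text = "Dr. Smith met with Acme Inc. on Jan. 5. The deal was worth $3.5 million."

# Naive approach: treat every period followed by a space as a sentence boundary.
print(text.split(". "))
# ['Dr', 'Smith met with Acme Inc', 'on Jan', '5', 'The deal was worth $3.5 million.']
```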

Jeremy Jung 00:46:50 Yeah. So it seems like a lot of the work that everybody will have to figure out if they’re working with natural language, spaCy gives you the data structures and APIs where you’re working directly with, I suppose the linguistic concepts or the, like you were saying, you get the words, you get the sentences.

Ines Montani 00:47:11 You have data structures for different connections of words, for different underlying labels, and we also provide pre-trained pipelines that you can download, that are general purpose and that you can start with, and a lot of documentation around how to break down your problem. But yeah, the part where you decide what you want, that’s inherently yours, and that’s also something that someone else can’t do for you. You shouldn’t want to lean on someone else to tell you what you should want, even though maybe people are expecting that. But I think, especially if it’s around a real-world use case, a business problem, or even a hobby you’re really passionate about, you want to be the one. You need to be clear about what you want, and if you have a good idea or a good approach, well, then it’s going to be successful. And if you want things that don’t make sense, then you want to get to a point where you figure that out and adjust. But in that sense, it’s like running a business, it’s like writing any piece of code or software, and it’s ultimately still down to us making decisions.

Jeremy Jung 00:48:19 Yeah, I mean it’s not an NLP-specific problem, but it’s like any type of engineering challenge we take on, we have to break down what is the actual problem we’re trying to solve.

Ines Montani 00:48:30 Yeah, exactly. But I feel people often forget about this part when they come into AI or NLP because isn’t it supposed to be artificial intelligence and isn’t it supposed to just be magical? Because I think often that is how it’s presented to us whether it’s in the media, whether it’s even by companies trying to sell the AI so that people come in with expectations that are maybe not realistic about the workflows.

Jeremy Jung 00:48:57 Yeah, though I suppose with the large language models, it can sometimes get closer. Like somebody can have a vague problem and they put it into the LLM and maybe it’s not perfect, but it’s closer to, I have this vague idea in my head and I get an output.

Ines Montani 00:49:14 Exactly. And I think that is where text-based prompts can be quite helpful to guide you through that process. And I think that also does set a new standard for how easy it is to get started, but then you often kind of hit a point where you want to go beyond that. And I think what’s interesting is, okay, how can we bring that sort of experience people have, where you can just sort of get started, be guided more easily, you don’t need to do any preparation, you don’t need to really understand the depths of machine learning, which is what prompts provide, and how can we bring that same experience, not even necessarily just chatting with a thing, but more this ease of use, to the quote-unquote expert workflow, or to the workflow that opens up more possibilities, which is working with creating examples and training your own models. For me that’s probably the most interesting question I’m thinking about and working on right now. And that’s also where our future product is going, like Prodigy Teams, which builds on top of the idea of creating labeled data as a way to tell the system what to do, but then brings it into the cloud and makes it more accessible to people who don’t come in with a strong developer background.

Jeremy Jung 00:50:30 It sounds like maybe the start for somebody who doesn’t quite know specifically what they want, they just have a high-level idea, they might start with using a large language model and then see what results they get from that. But then based on that, they can maybe work out, can I break down my problem into more specifically what I want? It’s a little hard to describe, but the example you gave at the start was about looking through news articles and figuring out, are these things good or bad for my company, I guess?

Ines Montani 00:51:05 Yeah. Or first, okay, let’s filter out the noise, then next let’s extract, I don’t know, the areas and countries and cities that are mentioned. Because then you can match that up with cities you’re doing business with, or you can match that up with your supplier database and the locations where your suppliers are, and stuff like that. And I think often that’s something you just have to think about and have those kinds of ideas. And then you also want to have, again, modularity. That’s also something we know from normal software development, and we use it as a good approach because it makes our systems more robust, easier to test, easier to understand. And the same thing is true for machine learning models as well. You don’t want your whole pipeline to completely depend on each other, and you want to be able to change one part of the system without the whole other stuff potentially falling apart. And you want to be able to evaluate each component and not have it all be this one box. And so I think especially if someone is coming into the field from a more classical software engineering background, I would say, don’t let all of this AI stuff distract you from the right path that you already have. There are a lot of concepts and things about good software development that you can apply and that definitely apply to AI models as well.

Jeremy Jung 00:52:29 Yeah, and you sort of talked about this earlier, but in the context of when you’re solving a problem, there may be parts where you use a large language model for some parts and then use a more traditional model you can run locally for other parts, and so you build up this pipeline.

Ines Montani 00:52:47 It’s kind of like reusable functions. I think for a lot of these things there are parallels in more general software development. You try to separate your logic out into reusable functions, you try to not repeat yourself, you don’t put everything into one massive function or into huge blocks of conditional logic, things like that. You want to find the right level of abstraction. Everyone goes through this journey when they start developing software: you learn this new concept and then you’re like, oh, now I’m going to apply this to everything. And then you end up with code that’s so abstract that it’s completely impossible to understand, and then you kind of go back and you’re like, ah wait, now I went too far the other way. And those are the same kinds of lessons that you learn in machine learning. And I think it’s a lot less magic and a lot more practical work with the data, and trying to find the best way to define what you want. That’s really always what it comes back to.

Jeremy Jung 00:53:46 Yeah, it seems like the part that’s maybe unique or different in this case is that when you’re making traditional software applications, it’s deterministic in the sense that you’re writing line by line and you’re writing rules and you know that okay, if this happens then it’ll go on this branch and if not, it’ll go on this branch. Whereas this is more of a, I’m going to give the system examples of what is right and then,

Ines Montani 00:54:14 And then it will try to make sense of it and predict something, sure. And I think actually that brings up a good point of, how is your application dealing with wrong predictions? And again, that is not a machine learning problem, that really is an application design and engineering problem. You don’t want to set up your system so that any failure has catastrophic effects; you want to work around that. And sometimes people work around that by not actually surfacing any predictions to some end user, but having a human in the loop who then just uses the output to help them, and so on. There are a lot of these questions around it, but in a way, you want to use all this new technology and try to get back to a situation that’s as close as possible to the deterministic solution that you’re used to from traditional software development.

Ines Montani 00:55:04 You kind of want something that’s easy to control and easy to predict, and you want to reduce complexity where you can, because that does make your system easier and better to use and better to work with. And yeah, that’s another thing people often forget if they’re just following research: if you are actually implementing a practical project for your own thing, you are allowed to make the problem easier for you. In research you’re trying to find the hardest problems because they’re hard, but you are allowed to make the problem easier if it is better and solves the problem. And that’s also where I think a lot is won by just taking a step back and thinking, you can change the problem. Just do something else that gets you to the same solution. And if that’s better and more robust, then that’s a better solution.

Jeremy Jung 00:55:54 Do you have a specific example, in say your work with customers, where they were trying to solve a problem that’s really difficult, but you were able to go, okay, let’s rethink what problem we’re actually trying to solve and make that easier?

Ines Montani 00:56:07 So one classic example, it’s a bit comical once you think about it, but this was, I think, some application about text for, I don’t know, some court records, and basically it mentioned children, and maybe it’s about custody or something. And the task was, well, we need to filter out the names and dates of birth of children so we can censor them and review it, because we don’t want to, I don’t know, publish the names of minors. So that’s, yeah, a classic use case, lots of data. And so the initial approach they took was to introduce two new categories to translate the business problem: one was child name and the other child date of birth. So at first glance, okay, that really translates the business problem. And then you can go and label all names of children and all their dates of birth and then try to train a model on it.

Ines Montani 00:56:58 But of course, if you take a step back, there's nothing inherent in a name that marks it as a child's name or not, it's just a name, and the same goes for a date of birth. Whether a date of birth belongs to a child is defined by the date itself: however you define who an adult is depends on the jurisdiction, but say 18 years, so if their date of birth is less than 18 years ago, they're a child. If you include this distinction in your label scheme, you're basically trying to teach the model that, and with enough data and enough work you can maybe make the model sensitive to that information. But it's a very roundabout and of course much more difficult way of framing the problem, when instead you could just be detecting dates, then take those dates, do some normalization, which you don't even need machine learning for, and then do math in Python or whatever language with the dates to figure out whether someone is a minor. It also has the added advantage that your model isn't outdated tomorrow, literally, because tomorrow people who are minors today will become adults.
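
To make that concrete, here is a minimal sketch in Python of the reframed approach: a generic date is extracted and normalized upstream (for example by a DATE entity recognizer), and the minor/adult distinction is plain date arithmetic rather than something the model has to learn. The 18-year threshold and the example date of birth are assumptions for illustration only.

    from datetime import date

    ADULT_AGE = 18  # jurisdiction-dependent threshold, assumed here for illustration

    def is_minor(date_of_birth, on_date=None):
        """Return True if the person is under ADULT_AGE on the given date."""
        on_date = on_date or date.today()
        # Whole years of age, accounting for whether the birthday has passed this year.
        age = on_date.year - date_of_birth.year - (
            (on_date.month, on_date.day) < (date_of_birth.month, date_of_birth.day)
        )
        return age < ADULT_AGE

    # A date of birth extracted and normalized from the text upstream.
    print(is_minor(date(2010, 5, 14)))  # True while that person is under 18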

Ines Montani 00:58:09 So again, if you hear that example, people might laugh and say that's so obvious. But really, it's easy to take these sorts of steps in a direction that makes your problem a lot harder than it should be, especially once you get really deep into the weeds of something domain specific. I think these mistakes, quote unquote, are easier to make than you might think. And that's why I think it's always important to think of these examples, evaluate your problem, and ask: am I making my life a lot harder, or can I just make the problem easier?

Jeremy Jung 00:58:45 Yeah, I think that makes a lot of sense. In general, whenever we're trying to solve a problem, can we break it up differently? And like you were saying, the more we can use deterministic, rules-based code versus stuff that goes into a model that's a little bit more opaque.

Ines Montani 00:59:07 Or even if it does go into a model, you still want it to be as predictable as it can be and have a clearly defined scope, so you know, okay, that's where it can go wrong, and if it goes wrong, here's what I can do about it so my whole thing doesn't fall apart. Or all of your components are separate and can be separately improved, so if one component is struggling you can replace and improve it, or maybe put a more powerful model into that one component and handle the other one separately.

Jeremy Jung 00:59:38 Because the smaller the thing it's trying to accomplish, the easier it probably is to troubleshoot.

Ines Montani 00:59:45 Yeah. And also, if you have it in production, you really want to be able to analyze it, and you want something that is operationally simple, or at least you want to keep the operational complexity low, for a lot of reasons. If you can do that for your model, that's good, just the way it's good for any code you write if you can achieve that. I think everyone has had that experience of, I don't know, you've refactored some piece of code, or you've learned some more in the meantime and you write some code and compare it to other stuff you've written, and you're like, oh my god, how did I write that in such a convoluted way when the solution is actually so straightforward? And it can be very similar for NLP and machine learning.

Jeremy Jung 01:00:29 So we haven't really been specific about what languages we're talking about when we're talking about natural language processing, when it comes to English versus Chinese or all the other languages we have in our world. Are the models specifically tuned for certain languages, and are the APIs in spaCy specific to languages, or can people use them in a generic fashion?

Ines Montani 01:00:56 Yeah, so in general in NLP, I think this is a rule of thumb: the more similar a language is to English, the better the technology will generally perform, because a lot of it was developed for English. There are some models, especially nowadays, that try to be language agnostic, but especially in spaCy, because we really focus a lot on the language and the linguistic view of, hey, here are the words we found, you can often achieve a lot by focusing more specifically on the language, working with data sets, and exploiting what specific languages need. If you come from this English-centric view, separating text into words feels pretty simple: you split on white space, you add some rules around it, and that's it. But once you're looking at Chinese, for example, that's not an option, and you really want a statistical model that predicts where words start and end.

Ines Montani 01:01:53 So that is also how spaCy solves that at the moment. For Chinese you actually need a statistical tokenizer, because it's not really viable otherwise. And then in English, base forms of words are also pretty simple, with a few exceptions: you have, I don't know, trees, and the base form is tree, that's quite basic, and there are not that many variations. But if you go into other languages, even just creating a word cloud really requires you to generate the base forms reliably, and that can be much trickier in languages that have inflected forms and things the average English speaker might not even think of. Or, for example, my first language is German, we have the really long words that people make fun of, but it's just that you have one concept, so it's one word, whereas in English you have income tax declaration or something, separate words, but it's one expression. And this is just the basic stuff about words.
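
A minimal sketch of how some of this surfaces in spaCy's API, assuming the small English pipeline en_core_web_sm has already been downloaded; the Chinese example uses a blank pipeline, whose word segmenter is configurable (character segmentation by default, with options such as jieba or pkuseg depending on the installed version):

    import spacy

    # English: rule-based tokenization plus a lemmatizer for base forms.
    #   python -m spacy download en_core_web_sm
    nlp_en = spacy.load("en_core_web_sm")
    doc = nlp_en("The trees were taller than the fences.")
    for token in doc:
        print(token.text, token.lemma_)  # e.g. "trees" -> "tree", "were" -> "be"

    # Chinese: whitespace splitting isn't an option, so the tokenizer is a
    # configurable word segmenter rather than a set of splitting rules.
    nlp_zh = spacy.blank("zh")
    print([t.text for t in nlp_zh("我喜欢自然语言处理")])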

Ines Montani 01:02:54 So I think it's worth it to take a look at the specifics of the language and actually find different solutions and implementations that work a bit better depending on what you're doing. But of course, it all stands and falls with the available data. The thing we struggle with the most in spaCy is that we have a lot of languages with basic support, and you can train models, but when it comes to providing pre-trained pipelines, there's not always that much data available. And sometimes there is, but it comes from academia and doesn't have a commercial license. Then we're like, man, we can train something, but nobody can really use it, because it's a library that's used commercially, and we don't want to provide pipelines that are non-commercial only. So that is a big problem, and that's also why I've never really had the time to fully focus on it. But we've always really wanted to invest more into making more premium models and actually doing our own data work, even just doing annotation ourselves for more complex things across different languages. That could actually solve a lot of problems for our users and for general-purpose pipelines.

Jeremy Jung 01:04:04 Are there any other companies that have done that, that have released premium models people have to pay for but that kind of give you a head start?

Ines Montani 01:04:12 I mean, there are a lot of APIs that are sort of in that area, and there have always been companies developing private models. But I don't know of any exact examples in the open-source space. We have these pipelines that we distribute for free, and we actually have a survey on that currently on the spaCy side to gauge people's interest. It does mean we'd need to invest significantly into doing the work, but then we could provide these pipelines for a subscription and also keep them updated. I think that's actually one area: just create more data, really build these specific models for generalizable linguistic tasks so they're better, and then keep them updated. Because if you actually think about it, a lot of the corpora people are training on are from 2005 or so, and there are a lot of current concepts the model has never seen and struggles with, something like COVID. Even many of the newer systems, and even chat systems around that time, initially struggled because COVID had only been around for, I don't know, a few months, no model had seen COVID, and that was exactly what we were interested in.

Ines Montani 01:05:32 But again, this is a lot more investment up front. At the same time, another parallel line of work we've always done is trying to make it easier and easier for people to train their own systems. So you can start off using transfer learning with spaCy's basic tokenization for Arabic, for Chinese, for Finnish, and then maybe plug in a spaCy LLM component just to see: how would it look if I extracted person names, how would it look if I classified this text? You can try it, then take that, create some examples from it, use spaCy to train, and then see, okay, how far can I take this, and when can I beat my baseline? And then you basically have this really tiny, fast, private model that you can almost run on your laptop, and that's a pretty cool outcome.
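
As a rough illustration of that flow, here is a sketch that uses the spacy-llm plugin to prototype an LLM-backed NER component and then saves reviewed examples for regular spaCy training. The factory name, registry strings, and config keys shown depend on the installed spacy-llm version (and require an API key for the chosen provider), so treat them as assumptions rather than exact values:

    import spacy
    from spacy.tokens import DocBin

    # Prototype: an LLM-backed NER component (requires the spacy-llm package
    # and, for OpenAI models, an OPENAI_API_KEY in the environment).
    nlp = spacy.blank("en")
    nlp.add_pipe(
        "llm_ner",  # factory name assumed; some versions use "llm" with a task config
        config={
            "model": {"@llm_models": "spacy.GPT-3-5.v1"},
            "task": {"labels": ["PERSON", "ORG"]},
        },
    )

    doc = nlp("Ines Montani is the co-founder and CEO of Explosion.")
    print([(ent.text, ent.label_) for ent in doc.ents])

    # Once the prototype looks reasonable, store reviewed examples and train a
    # small, fast, private model with spaCy's standard training CLI:
    DocBin(docs=[doc]).to_disk("train.spacy")
    # python -m spacy train config.cfg --paths.train train.spacy --paths.dev dev.spacy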

Jeremy Jung 01:06:31 Yeah, so it seems like the common flow would be: you use an LLM like ChatGPT or something to basically build the prototype and see, is this good enough? And if it's not, that's when you would go and start doing the transfer learning.

Ines Montani 01:06:50 Yeah, exactly. And that is also something you can do within a spaCy pipeline. You have the data structures, you have this object you can work with in your code, and it behaves exactly the same. It's all implemented in C extensions in Python, so it's really fast, you can serialize it, and there are basically all these considerations around good data structures that you can iterate over, work with, store, and convert. So you set that up in your system, in your API, and maybe plug in an LLM component. And then at some point, actually also do the really boring rule-based baselines. I think I mentioned this before: try a regular expression, see how far you can get with a word list, because you'd be surprised, also for some areas, some use cases where people think, oh my God, I'll never solve this, I need so much machine learning.

Ines Montani 01:07:40 And it turns out that actually, of course, computers are efficient. If you think about, I don't know, the number of cities in the US above a certain size, I looked it up the other day, but it's actually not that many. Depending on what you're after, for example if you're looking at cities in the US or something, you could provide a list, and it's no problem these days to match over a list of 10K, 20K, 50K examples. You can download that list off the internet, from Wikipedia, and then you're done. So there are these things that are just interesting to try for comparison, because you can evaluate that, you get a number at the end, and then, okay, that's what I want to beat. And if your machine learning model is worse, then, man, you're not quite there yet.
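
A minimal sketch of that kind of baseline with spaCy's PhraseMatcher, where the short city list is just a stand-in for the much longer list you might pull from Wikipedia; matching stays fast even over tens of thousands of entries:

    import spacy
    from spacy.matcher import PhraseMatcher

    nlp = spacy.blank("en")

    # Stand-in for a list downloaded from Wikipedia; could be 10K+ entries.
    cities = ["New York", "San Francisco", "Austin", "Chicago"]

    matcher = PhraseMatcher(nlp.vocab, attr="LOWER")  # case-insensitive matching
    matcher.add("CITY", [nlp.make_doc(name) for name in cities])

    doc = nlp("The meetup moved from Austin to San Francisco last year.")
    for match_id, start, end in matcher(doc):
        print(doc[start:end].text)  # Austin, San Francisco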

Jeremy Jung 01:08:28 Yeah, so don’t underestimate how fast computers are these days, I guess.

Ines Montani 01:08:33 Yeah, and how much you can actually get done just by computing. Also, if what you're doing can run in parallel, which a lot of text extraction can, you can run it on CPU machines in the cloud and just run a lot of them, and that can be significantly cheaper than, say, renting an expensive GPU machine and running that in the cloud. There are a lot of areas where you can optimize, and that's also why we've always focused on providing workflows for CPU, which is something not many other machine learning libraries are doing. We've actually seen that it makes a difference: there are users who really process billions of words with spaCy and have found that if they can run this on CPU, it's just a lot cheaper, and they can still run it really fast. So there are all these tradeoffs. Again, it's just normal software development, it all still applies, and just because there is a new way to do a thing doesn't mean the previous way goes away. If anything, the previous way often gets better in the same process, and you want to make sure you're not forgetting about techniques that are now much better.
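
As a hedged sketch of what a CPU-friendly bulk-processing setup can look like, assuming a small downloaded pipeline such as en_core_web_sm and a placeholder input file: nlp.pipe streams texts through in batches and can spread the work across several CPU processes.

    import spacy

    # A small CPU-friendly pipeline; drop components you don't need for speed.
    nlp = spacy.load("en_core_web_sm", exclude=["parser"])

    # Placeholder input: one document per line.
    texts = (line.strip() for line in open("documents.txt", encoding="utf8"))

    # Stream texts through the pipeline in batches, across several CPU processes.
    for doc in nlp.pipe(texts, batch_size=1000, n_process=4):
        for ent in doc.ents:
            print(ent.text, ent.label_)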

Jeremy Jung 01:09:52 Well I think that’s a good place to end it on, but is there anything else you wanted to mention or?

Ines Montani 01:09:58 No, I think I've talked a lot. I feel we're also getting almost philosophical at times, so no.

Jeremy Jung 01:10:04 Cool. So if people want to check out what you're up to or see spaCy or Prodigy, where should they head?

Ines Montani 01:10:11 Well, our main website, explosion.ai, has an overview of everything we're doing, and we also always try to showcase things from the ecosystem, things other people publish. So I think that's the number one place to go. From there you can go to our docs, you can find me on social media, maybe say hi at some conferences. I'm trying to attend a bunch of stuff again this year, and I'll be speaking at some conferences in Europe, so, yeah.

Jeremy Jung 01:10:36 Very cool. Well, Ines, thank you for joining me on Software Engineering Radio.

Ines Montani 01:10:41 Thanks again for having me.

Jeremy Jung 01:10:42 This has been Jeremy Jung for Software Engineering Radio. Thanks for listening.

[End of Audio]
