Our interviewee today is Ines Montani. Ines is the co-founder of Explosion, the company that developed spaCy, one of the leading open-source NLP libraries. Ines and her team at Explosion also developed Prodigy, an annotation tool for AI, machine learning and NLP.
Ines is an international speaker too. She enjoys giving talks at conferences and loves teaching people. I had the pleasure of meeting Ines at PyCon India 2019. All the details of her talks and blogs can be found here. Ines also developed a free online course, Advanced NLP with spaCy, and if you are interested in learning spaCy (from its core developer), you should definitely check it out.
I would like to wholeheartedly thank Ines for taking the time to do this interview. I hope this interview benefits the data science and machine learning communities in general :)
An interview with Ines Montani, Co-founder at Explosion
Sayak: Hi Ines! Thank you for doing this interview. It’s a pleasure to have you here today.
Ines: Yay, thanks for having me!
Sayak: Maybe you could start by introducing yourself — what is your current job and what are your responsibilities over there?
Ines: I’m the co-founder of Explosion, a software company specializing in developer tools for AI and NLP. We develop spaCy, a popular open-source library for NLP in Python and Prodigy, a scriptable annotation tool for creating training data for machine learning models. Even though I’m also the founder and director of the company, I still spend most of my time writing code and developing our products, which is what I love most about my job.
Sayak: Oh, that’s absolutely amazing and different from many, for sure! I am curious about how you became interested in Natural Language Processing. Would you like to share something about that?
Ines: I’ve always been interested in language, and I’ve always been interested in computers. But it took me a while to find a way to combine the two things. I met my co-founder Matthew Honnibal (who’s also the core author of spaCy) in 2014, shortly after he’d left academia to write spaCy. The first project we worked on together was an interactive visualizer for syntactic dependencies. I did linguistics (and some computational linguistics) as part of my degree, so the concepts behind it made a lot of sense to me, and it was exciting to see what was suddenly becoming possible. It also made me realize that there was a lot of potential for combining the different things I was into and reasonably good at to make NLP more powerful and easier to use.
Sayak: Fantastic! Domain knowledge getting applied right away! When you were starting out in the field, what kind of challenges did you face? How did you overcome them?
Sayak: I certainly agree on that, Ines! What were some of the capstone projects you did during your formative years?
Sayak: That’s quite adventurous of you, really! Your talk at PyCon India 2019 was titled Let Them Write Code. Would you like to share what motivated you to speak on this topic?
Ines: “Let Them Write Code” is our philosophy for building developer tools. People often ask us for advice about how to design libraries, or why spaCy’s user-facing API works the way it does. So I thought this would be a good topic for a keynote. There are few reliable rules for designing developer tools because it’s all about trade-offs, which means advice about this can easily become vague. But we spend a lot of time thinking about these questions. The gist of the talk is: make your tools programmable and let your users write code.
After the talk, I saw a comment on Twitter saying something along the lines of “I could never put my finger on why I liked spaCy’s API, but now it’s all clear!”, which made me really happy.
Sayak: I really liked the points you discussed throughout your talk, and being a developer myself, I could really connect with them :) Fields like machine learning are evolving rapidly. How do you manage to keep track of the latest relevant happenings?
Ines: Since we’re not publishing research, “staying up to date” means something slightly different for us. We need to follow both the current state-of-the-art research and people’s actual use cases and challenges in the industry. It definitely helps that we have a very close connection to our users, given that we’re all developers.
The most important thing is understanding the larger arc of where the field is heading and what sort of tools people will need. For instance, when we designed Prodigy in 2016–2017, our main observations were: a) people will keep needing at least some amount of custom labeled data, b) the process is iterative and ongoing, not something that can easily be outsourced, making annotation part of development, and c) transfer learning will improve, so we won’t necessarily need massive training corpora, even for really large models.
Sayak: Those were some important observations back then, really. Being a practitioner, one thing that I often find myself struggling with is learning a new concept. Would you like to share how you approach that process?
Ines: I’m actually pretty bad at sitting down and learning something just for the sake of it — so I always need a project, something I really want to do and that allows me to learn everything I need to know along the way. Sometimes that’s a work-related thing, sometimes a hobby project, sometimes a little bit of both. And the projects typically consist of something I already know very well, and the new thing I want to learn.
People learn differently, so this approach isn’t going to work for everyone. But if you feel like you’re struggling to learn the “traditional” way via books, courses or videos, maybe try and find yourself a project. For instance, let’s say you’re a data scientist who likes rock music and wants to learn how to build a website. Do some cool analysis about the music you’re into and try to put together a website showcasing the result.
Sayak: Thanks for sharing that, Ines. I too follow a hands-on approach when I am learning new things. But sometimes, I tend to go old-school and start by looking at the things that motivated what I am about to learn. Will we get to see a new course (or even a book) anytime soon?
Ines: We’ve started thinking about a more linguistics-focused course, explaining more abstract concepts via spaCy’s API. I think a lot of users feel that they’re missing some background knowledge to really assemble full solutions out of the various pieces, and some of that comes down to linguistic concepts like dependency parsing, morphological features, coreference, etc.
I also want to produce more video tutorials for our YouTube channel — for example, I’m currently experimenting with using transformer models like BERT and XLNet in Prodigy recipes for even more efficient annotation. Once I have some proper results, I’d love to show off some workflow ideas in a video.
Sayak: I will be waiting eagerly to check them out when they are released, Ines! Any advice for beginners?
Ines: I kinda hate giving advice, because everyone’s situation is so different. And everything I could say is going to sound cheesy and generic, like *✭˚･ﾟ✧*･ﾟfollow your dreams and don’t give up *✭˚･ﾟ✧*･ﾟ*. Maybe my advice is, listen to less advice?
Sayak: Haha, that was different! Thank you so much, Ines, for doing this interview and for sharing your valuable insights. I hope they will be immensely helpful for the community.
Ines: Thanks, my pleasure!
It was amazing to learn about Ines’s technical caliber. She started off at a pretty early age and made her way in a pretty remarkable manner. Being a developer herself, she does quite a bit of developer advocacy while developing tools for developers. This rare quality sets spaCy and Prodigy apart from many of the existing tools that we have today for AI and NLP.
I hope you enjoyed reading this interview. Watch this space for the next one, and I hope to see you soon. This is where you can find all the interviews done so far.
If you want to know more about me, check out my website.