I am pleased to have Colin Raffel for today’s interview. Colin is currently working as a Research Scientist at Google. His research interests broadly lie in areas like learning with limited labeled data and transfer learning, especially in an NLP context. Colin is also one of the first authors of the seminal paper Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (T5).
Among Colin’s other works, my favorite ones are on semi-supervised learning — FixMatch and MixMatch. If you are interested in knowing more about an extension of the MixMatch work, be sure to check out this ICLR 2020 paper on ReMixMatch. In the past, I have also enjoyed his work on Theoretical Insights into Memorization in GANs.
Previously, Colin was a resident in Google’s AI Residency Program, where he spent over a year conducting fundamental Machine Learning research. He will be joining the Computer Science Department at the University of North Carolina as an assistant professor in fall 2020. To learn more about him and stay updated on his work, be sure to follow him on Twitter.
An interview with Colin Raffel, Research Scientist at Google
Sayak: Hi Colin! Thank you for doing this interview. It’s a pleasure to have you.
Colin: Thank you for having me!
Sayak: Maybe you could start by introducing yourself — what your current research interests are, your general approach to conducting research, and so on?
Colin: My current research is focused on making machine learning algorithms less reliant on labeled data. This includes things like unsupervised learning, semi-supervised learning, and transfer learning. In general, I like to choose problems that have substantial real-world impact, but that also require some ingenuity and effort to solve. Learning from limited labels is a good example of something with significant practical implications (since most people don’t have access to giant labeled datasets) but that is sufficiently open-ended that there are a lot of interesting problems to chew on. I also like to do work across domains (text, images, speech, music, etc.).
Sayak: Sure enough! Your research philosophy really shows in your work, as does your desire to pursue it across different domains. I am curious about what got you into machine learning in the first place, and how you developed the motivation for the sub-fields you are currently working on.
Colin: Before and during my Ph.D., I was mainly researching methods to help machines understand music (and in turn help humans make music). This includes things like automatically transcribing a song or detecting beats in a piece of music. I quickly found that the most promising and powerful tools for attacking these problems were based on machine learning. This eventually led me to develop an interest in core machine learning methods. Starting from music gave me an appreciation for the difficulty and expense of obtaining labeled data — for example, it is pretty expensive to pay a human to transcribe a piece of music by hand. Music research is also typically underfunded compared to other fields since there’s comparatively less money to be made, so data availability is always a problem. Of course, music is not the only field where this is true, so I’ve become more generally excited about alleviating the need for labels across domains.
Sayak: When you were starting out, what kind of challenges did you face? How did you overcome them?
Colin: Early in my Ph.D., neural networks were just starting to become useful again thanks to advances in compute, dataset size, and regularization. I was interested in applying neural nets to music research, but there were no classes or professors at Columbia who were using them since they had been largely supplanted by kernel methods. This led to a lot of self-learning (for example, using the excellent but now-outdated Theano Deep Learning Tutorials) and also led me to create the Columbia Neural Network Reading Group and Seminar Series. Working through papers and inviting people to come give talks helped me in a big way.
Separately, I also only had a single GPU to run all of my experiments for my thesis work. A natural choice for some of the problems I was working on would have been recurrent neural networks, but processing long music sequences with RNNs was just too slow on my dinky GPU. This led me to develop feed-forward attention which can effectively aggregate the information in a sequence using fully parallelizable computation.
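The feed-forward attention mechanism Colin mentions can be sketched in a few lines of NumPy. This is a minimal illustration rather than his exact implementation: the scoring function here is a single tanh unit, and `w` and `b` stand in for parameters that would be learned in practice.

```python
import numpy as np

def feed_forward_attention(h, w, b):
    """Aggregate a sequence of hidden states into a single context vector.

    h: (T, d) array of hidden states, one per timestep.
    w: (d,) scoring weights; b: scalar bias (learned in practice).
    """
    # Score every timestep independently -- there is no recurrence,
    # so this step is fully parallelizable across the sequence.
    e = np.tanh(h @ w + b)   # (T,) unnormalized scores
    a = np.exp(e - e.max())
    a /= a.sum()             # softmax attention weights over timesteps
    return a @ h             # (d,) weighted average of the hidden states
```

Because each timestep is scored on its own, the whole sequence can be processed in parallel, unlike an RNN, which must step through it sequentially.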
Sayak: “Necessity is the mother of invention,” as they rightly say! Your research has spanned so many diverse areas, be it transfer learning in NLP models, semi-supervised learning, or GANs. How do you go about this process, i.e., deciding when to work on what?
Colin: I try to do research that will have a substantial impact, i.e., help people solve an important problem that otherwise wouldn’t be solvable. Most of my recent work has been motivated by the fact that there are many problems that are ripe for using machine learning but lack enough labeled data for standard supervised learning. This has led me to work on, e.g., unsupervised learning (GANs) and semi-supervised learning. It has also caused me to focus on core machine learning methods that can have an impact across domains. Furthermore, I try to choose problems where there is some reason that I personally should work on them, whether it’s because I have some important expertise, insight, or other advantage. This is simply for the practical reason that there are so many people doing machine learning research that it’s best to work on things that you’re the most likely person to make progress on.
Sayak: Your research philosophy is very well-grounded. Thank you for sharing it in detail; this will surely be helpful for many budding researchers. I am certain many readers will be interested in this question specifically: what really motivated the work on T5? It’s such a comprehensive and well-written paper!
Colin: Thank you! It’s nice to hear that since the paper was such a gargantuan amount of work. The motivation really stemmed from the fact that there was so much exciting progress in transfer learning for NLP. I think transfer learning really started to work in a meaningful way in 2018, which led to a huge explosion of new techniques. Whenever this happens, it can get hard to compare the advances in different papers and suss out which contributions are the most important. For example, if two papers come out within a month of each other and both purport to improve performance, it can be hard to determine which one is more useful since it’s unlikely that they share the same experimental setup. Our goal in the T5 paper was to provide a comprehensive empirical comparison of all of these different techniques, and then explore the limits of current methods by scaling things up. I have worked on similar papers in the past, e.g. on deep semi-supervised learning and on evaluating machine learning algorithms for music.
Sayak: Semi-supervised learning is something I am personally studying these days, so I will be sure to check out deep semi-supervised learning. I am also interested in hearing about your stint in the AI Residency program. What kind of research projects did you work on there?
Colin: The residency gave me an opportunity to branch out of working specifically on music and into working on core machine learning algorithms more generally. I spent roughly the first half developing a monotonic attention mechanism that allows for online, linear-time decoding. I also used the residency as an opportunity to collaborate with a bigger group of people and ended up doing similar work on a differentiable subsampling mechanism and on using gradient estimation to learn hard attention.
Sayak: Wow, that’s a very diverse set of problems! What are some areas of Machine Learning you are currently excited about?
Colin: The fact that large language models form an implicit knowledge base through unsupervised learning seems ripe for a lot of powerful and interesting work. Knowledge bases have historically been expensive to create and brittle to query, and the fact that an LM (Language Model) can perform similar operations using simple, large-scale, “dumb” unsupervised learning is super exciting. We have some recent work showing that this can produce state-of-the-art performance in open-domain question answering. More broadly, I’m excited by the general trend of leveraging unlabeled data to improve generalization and robustness, for example through self-supervised contrastive learning. I hope that in the future it is completely standard to leverage big unlabeled datasets instead of just doing supervised learning.
Sayak: Self-supervised learning is something I am also very excited to see being used across a wide range of domains for practical use cases. As a practitioner, one thing I often find myself struggling with is learning a new concept. Would you like to share how you approach that process?
Colin: If possible, I think it’s incredibly helpful to reimplement the concept based on an existing codebase. Unfortunately, most open-source implementations of machine learning methods are not very clearly written or well-documented. Reimplementing it yourself, and carefully and clearly documenting how it works, will force you to deeply understand the method. Basing your implementation on an existing one also gives you a nice fallback when you are stuck or something is not working.
Sayak: I absolutely concur with this philosophy. It also enhances your reasoning ability to some extent, which is helpful in developing a research mindset. Any advice for beginners?
Colin: If you can, find a group of like-minded and motivated people who want to learn the same things as you. This is helpful both for solidarity and for giving and getting help. When I was starting to learn about neural networks, I benefited a lot from being part of the group of people who made the Lasagne neural network library. This gave me a community of peers to bounce ideas off of and learn from.
Sayak: Couldn’t agree more on this part :) Thank you so much, Colin, for doing this interview and for sharing your valuable insights. I hope they will be immensely helpful for the community.
Colin: Thanks for interviewing me! It was fun to reflect back on my path.