January 6, 2023
Curtis Northcutt is an American computer scientist and entrepreneur focusing on machine learning and AI to empower people. He is the CEO and co-founder of Cleanlab, an AI software company that improves machine learning model performance by automatically fixing data and label issues in real-world, messy datasets. Curtis completed his PhD at MIT where he invented Cleanlab’s algorithms for automatically finding and fixing label issues in any dataset. He is the recipient of the MIT Morris Levin Thesis Award, the NSF Fellowship, and the Goldwater Scholarship and has worked at several leading AI research groups, including Google, Oculus, Amazon, Facebook, Microsoft, and NASA.
Julian: Hey everyone. Thank you so much for joining the Behind Company Lines podcast. Today we are in the presence of Curtis Northcutt, CEO of Cleanlab. Cleanlab improves any ML model by improving any dataset. Pioneered at MIT, the three MIT PhD co-founders are inventing the future of data-centric AI to make AI work more accurately and more reliably on messy, real-world data.
Thank you so much for joining the show. I'm really excited to chat with you, learn a little bit more about, you know, what it takes to really kind of build the models that you're building, but also your founder journey going from, you know, PhD and research, and I know you have some other extracurriculars that, that we got into a little bit pre-show that we'll definitely break into in the show here.
But so exciting to learn about your experience going from research now to building and starting a company. So before we get into Cleanlab and what you're working on now, what were you doing?
Curtis: Yeah. Before Cleanlab. So first of all, happy to be here, and thanks for the nice intro. Prior to Cleanlab I did a bunch of things.
I'll do the short version. Let me know if you'd like a longer version. Sure. I grew up in rural Kentucky. My dad was a mailman and my mom works still today in a call center. Spent the first 18 years just trying to figure out with limited resources, how do you make a difference in the world? Being very hard on myself and asking myself questions like, am I a good person?
How do I be a good person? How do I try to contribute? How do I make a difference? And the only way I knew to do that, from where I came from, was through the education process. So I just tried to do well in school. Went to Vanderbilt, did pretty well there. At Vanderbilt, I did some internships.

I worked at NASA, and then I went to General Electric, and then the summer after that I did an NSF REU (Research Experience for Undergraduates) at Notre Dame. After that, I went to Microsoft and worked on the Windows Phone, which was a failed product, but a good experience actually. It was a good internship. Yeah. Learned a lot on the product management side, which was interesting.

Yeah. Most people there were in computer science; at the time I was studying physics and mathematics, and I switched, transitioned, to computer science. I was very technical, but I thought, hey, it's good to be well-rounded, so I picked up some PM experience. After that, I went to MIT Lincoln Laboratory, then went to MIT for a PhD and master's and spent eight years there.

I did a long master's. I was slow; it took me four years. I felt like when I entered MIT I was very confident. From where I had come from, I was pretty good at things. And then I got to MIT and realized, wow, everyone here is much more experienced and talented than I am, and I had a lot of hard work to do.

After those four years, I did pretty well and was able to win a thesis award. And I built MIT and Harvard's cheating detection system. During that time, I discovered the biggest problem in building a cheating detection system that works for every course in every college. At the time, there were around a hundred courses on edX.
And I wanted the cheating detection system that I had built to verify and validate a certificate for anyone in the world. Yeah. As I mentioned, I'm from Kentucky, and so where I come from, it would be a big deal for someone to be able to earn a certificate online from MIT. It would change their life.

Yeah. But there was a problem. The problem is that people were creating two accounts. On one account, they were copying all the answers, and then on the other account, they were submitting all the answers, and they were learning nothing and achieving a certificate. Then they would go get a job with that certificate that they unrightfully earned, and their manager or boss would realize that they don't actually know their stuff.

And the MIT certificate that they've earned is worthless. And what that means is now everyone else who's from Kentucky or Africa or some poor rural area tries to use that same certificate, but they can't get the job. And so this is a very important problem to me. I wanted to be able to use technology to make a difference.
Not to have, you know, people sort of ruin it for everyone. And so that's why I was building the system. And what I discovered along the way was that, at that time, you actually could not train a machine learning model on noisy labels. Machine learning algorithms had been invented assuming perfectly curated data, not real-world data.

And what I discovered is real-world data is very erroneous and very messy. It was at that time I started to change my PhD direction. Instead of focusing on cheating detection and the intersection of computer science, online education, and human learning, I started shifting more to machine learning algorithms that can deal with real-world data, so that we can start to make AI actually work for real companies, real people, and real problems.
And so I spent the next four years focused on that problem. I spent some time at Microsoft. I spent some time in Yann LeCun's group in 2016 at Facebook AI Research (FAIR) in New York City. That was a really good experience. I then went to Amazon the summer after that and worked in the Alexa group. I used some early versions of Cleanlab, which was one of the first solutions to find errors in any dataset and to train machine learning models robustly so that they would work with noisy labels.

And I implemented it in the Alexa data pipeline to help them train a better Alexa model. So when you say "Alexa" and the device wakes up, that was something I worked on. After that I spent two years with Richard Newcombe, who's the director of research at Oculus, which was bought by Facebook, which then became Meta.

But I was with them in the very early days when they were still Oculus Research, and then there were lots of obvious transitions. Those were two very vital years. I learned a lot about managing big organizations, getting to work directly with a really cool guy, Richard Newcombe, who I consider a friend and a colleague, but also a good mentor.

And then after that I went to Google, and I helped the "OK Google" and "Hey Google" teams that make your device wake up if you have an Android phone, and did something similar. I used Cleanlab; I built Cleanlab directly into their database. I know we haven't chatted much about Cleanlab, but this is how I got here.
Yeah, at this time, Cleanlab was just an open-source package. And if you're curious, I know this is a long story, but this is interesting. The reason I open-sourced Cleanlab originally was that when I first tackled that education problem, I told people, hey, I have a new solution that will find errors in a dataset automatically.

And that was actually a lot for people to swallow at the time. And so the only way I knew to convince people was just to open-source the code. So then, instead of saying this thing works, I told them: try this thing. It's open source, you can try it yourself. And people started to realize it does work.

And after that, the papers that I was trying to publish started getting accepted, and there was a lot more confidence that, hey, this stuff actually does work. Yeah. So at that time, the open-source package was integrated at Google, and I had used it at Amazon. And so I had sort of done the route of Facebook, Microsoft, Google, Amazon,

NASA, MIT, Lincoln Lab, et cetera. And I started my first company as a founder. I ended up using Cleanlab to build that company, and then I realized Cleanlab is the real technology. Yeah. And so after I completed my PhD, I went full-time as CEO and co-founder at Cleanlab, and I found two of my best friends who I'd spent ten years with at MIT.
I was able to get them to join me, and I say "able to get them" because these two people, Anish Athalye and Jonas Mueller, my two co-founders, could have done anything in the world. But they chose to work on this with me. And I take a great deal of responsibility for what that means. Yeah. Because these two people could be changing the world on anything.

And so, to have them at Cleanlab, we have a responsibility to do good here. Yeah. Because I'm taking them away from other things they could be working on. And that's how I got where I am today.
Julian: Amazing. And oh God, I have so many questions from that background and that experience. But I guess for the other founders out there that are listening, talk about that process of, you know, the conversation and convincing your co-founders to jump in. Because I know a lot of us have, you know, exceptional individuals who are next to us, but it's hard to get people bought into an idea or a mission, or a company, or us as leaders. How did that conversation go, and what ended up being the compelling reason why your co-founders joined?
Curtis: Totally. I think that it was actually less about convincing them and more about, these are my friends. I obviously have more than two friends, although, you know, when you're a co-founder and you're spending all your time on a company, it's like, you know, you do your best. But the point is, these two people in particular are some of the most amazing individuals that I know.

They're also people who I greatly admire, but that's just sort of how I set my life up. The people who I consider friends, and who I spend a lot of time with and invest in, tend to be people who I admire. And so, I admire them greatly, and I just recognized that each of them had a lot to contribute on this particular problem, on this task.

And we had worked on papers ahead of time, so it wasn't sort of a "Hey, out of the blue, do you guys want to work on this?" It was actually slow and methodical. We'd spent a lot of time working together. We did some research papers together. We got to know each other over many years.

I realized that these are people I can trust. These are credible people. These are people who have had a long history of success. They are people who often work harder than I do, and they constantly push me to work harder. And those are the kind of people I want to be around and surround myself with.
So all of that being said, then I had to make sure it was a good fit. And so I needed two people. I've spent ten years on this problem, so I know how to sell Cleanlab. I know how to run it, sort of, you know, how do you build a team that understands what we're trying to do. But what I needed a lot of help with was on the technical side.

Even though all three of us have PhDs in ML, you can't do all the technical things and also all the business things on a very quick timeline easily. Yeah, it can be done. There are single founders. But I didn't want to do it that way. I wanted to work with people I trust and who I think would make the company better.
And so Anish is very interesting. He did his PhD in parallel and distributed operating systems at MIT. And he is someone who I could trust fully to run our engineering team, but also to see the vision of where we should go in terms of both systems things, machine learning things, and just fixing issues and problems in data.

Yeah, he has a good vision for that, and we've worked on that in the past, and he's been very successful. He's also really well known on GitHub. Anyone who spends a lot of time on GitHub may know of Anish; he has over 30,000 stars on his individual projects, which is more than most companies in, like, the data quality and labeling space combined.

Wow. And he's just one human being, so it's actually really impressive. He's also just really unique. Like, have you ever taken the course called "The Missing Semester of Your CS Education"? It's this course online, and some of his videos are teaching, like, Git and really boring stuff, but there are over half a million views on those videos, because he just does such a good job. And he taught it in an MIT lecture hall.

He's just a very unique person, and someone, yeah, who I think anyone would be proud to work with. But in this case, we needed someone who understood how to build and scale complex distributed AI systems that handle complex ML training, to be able to both run a team and be someone that you can rely on and trust.
And I needed someone like Anish, so it was just a good fit. And then my other co-founder is Jonas Mueller. Jonas is also really good at startup things, but the thing that I think is most wow about him is what he did in the four years after his MIT PhD. He finished early, quicker than I did.

He was very fast, and he went off to Amazon to work with Alex Smola. And if you haven't heard of Alex Smola, he runs, I don't know if it's all of it, but he's definitely, like, director-level or head of AI at Amazon. He runs one of the larger AI research groups at Amazon. And Jonas worked with him directly, which was a cool opportunity, but what's cooler is what he did there.

He built, and was the leading developer and inventor on, AutoGluon, for four years. And all of AWS, which I'm sure you've heard of, every startup founder has heard of AWS: when you run SageMaker or any sort of AutoML solution, that is running AutoGluon. And our chief scientist and co-founder Jonas Mueller built that.

And so we needed to solve this problem where we're improving machine learning for any dataset, right? That's what we do. And so who better to have as our chief scientist than the person who built Amazon's platform to train any model on any dataset? So take the combination of someone who can build distributed, parallelized systems and run really fancy, good engineering, with someone who knows how to train any model in a really efficient manner that works for any dataset, with myself, who spent, well, not my life, but the last ten years figuring out how you train any model to be better, and get an improved model, by improving the dataset.

You put those three together and you have a powerhouse team, and that made the sell pretty easy. And also, we treat each other well. And that makes you want to keep sticking together and working together long term.
Yeah. But yeah, that's how I convinced them.
Julian: I love that. It's incredible. And obviously now you have a rockstar team, you're well underway, and you have a lot of use cases with the technology. But just taking a step back again: when you talk about messy data, and when you talk about retraining these models with different datasets, what does the data look like that a model has a difficult time interpreting into something that's maybe clean or repeatable or routine?

I'm not sure what the exact nomenclature is. But what do you mean by messy data? And how do you make it cleaner? Are there other systems, or is it a larger dataset that essentially maps it to recognize it as something that's now usable?
Curtis: Yeah, that's a great question and I can answer with products.
So, we have two main ways that you can use Cleanlab technology. One is Cleanlab Open Source, and one is Cleanlab Studio. And I'll just tell you what they solve, and they solve exactly the question that you're asking, so very specific tasks. Yeah. Both of them can improve a machine learning model, but Cleanlab Studio will do it a little bit better, where otherwise a lot more time and engineering is required to achieve it.

And then with Cleanlab Open Source, if you have engineering manpower, you can use the open-source code and get a lot of improvements, with more algorithms. So the Open Source has more algorithms, and Cleanlab Studio has more engineering behind it to make it work really well, if you need, like, a GUI and an interface and everything automated.
Okay. With that in mind, I'll answer your question. So here are some of the tasks that we solve in the open source. Say you have label errors: you have an image, and it's labeled "dog," but it's an image of a cat. But say you have 10 million images, okay? It's very hard to find, you know, the 10% of 10 million images that are wrong.

How do you find that 1 million? Like, are you actually gonna look through the other 9 million images and waste your time checking those too? That's not a good idea. So what you want is an automated service that will just tell you: look, these are the million that we think are errors, and we think you should double-check them.

And so the open source has a very simple, you know, line of code that you can run. It's called cleanlab.filter.find_label_issues, and that will tell you exactly the boolean indices of everything we think are errors. But we have a lot more stuff than that too. We'll also give you a label quality score.

So there's cleanlab.rank.get_label_quality_scores, and that will tell you, for every single data point, images, audio, text, tabular, 3D data, it doesn't matter the type of data, for any dataset, for every data point, the probability that the label is actually correct.
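The calls Curtis names take your given labels plus a model's predicted class probabilities. As a toy illustration of the underlying idea only (not Cleanlab's actual implementation, which uses confident learning), a label-quality score can be as simple as the model's predicted probability for the given label, the "self-confidence" score, with low-scoring points flagged for review:

```python
# Toy sketch of label-quality scoring by "self-confidence":
# score each example by the model's predicted probability of its given label.
# Illustration only; the real cleanlab package does something more robust,
# e.g. cleanlab.filter.find_label_issues(labels, pred_probs).

def label_quality_scores(labels, pred_probs):
    """labels[i] is the given class index; pred_probs[i][j] is the model's
    predicted probability that example i belongs to class j."""
    return [probs[label] for label, probs in zip(labels, pred_probs)]

def find_likely_issues(labels, pred_probs, threshold=0.5):
    """Return indices of examples whose given label looks suspect."""
    scores = label_quality_scores(labels, pred_probs)
    return [i for i, s in enumerate(scores) if s < threshold]

# Example: the third image is labeled class 0 ("dog") but the model is
# confident it is class 1 ("cat"), so it gets flagged.
labels = [0, 1, 0]
pred_probs = [[0.9, 0.1], [0.2, 0.8], [0.05, 0.95]]
print(find_likely_issues(labels, pred_probs))  # [2]
```

Ranking by these scores is exactly what lets you spend review time only on the data that matters, as he describes next.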
And so you can rank your data and now only spend your time on the data that matters. Wow. So that's one example, and I'll give you four others really quickly. So another one is: say you have an image of a farm, but the dataset is actually images of animals, and there's no animal in that image, and you have no class for, like, "other" or "clutter" or "farm," right?

Yeah. So that image just doesn't belong in your dataset. So what do you do with that? If you train a model on that, that image of, like, grass is gonna be labeled, you know, "chicken," but there's no chicken in there, right? Your model's gonna get confused, and now your model's gonna think grass is chicken.

So what we do is we find all those for you automatically, and we rank every example, and then we provide you all the data that is out of distribution or is an outlier. That's another thing that we support. I'll share a couple of others. So say you have a bunch of labels, okay, it's called multi-annotator data.

So you have data, and then you have many annotators. Yeah, so you have three or four or five, or maybe for some of your data you only have one annotator. So you have a mix, and it's complex, like a real-world situation. Yeah. And what you want to know is, given all these annotations, what is the one true label for an image?

Say three of my people said it's "chicken," and two of them said "cow." How do I know what to label it? And so there are a lot of algorithms out there that just take into account those labels and come up with a consensus based on a majority vote. We did something much fancier. We actually train models in the backend; those models look at the distribution of the data and make a prediction, and then we weight everything in an ensemble, which is a collection of models.

We take the multi-annotator labels, and we also take the models' predictions, and we combine them in an intelligent way that is able to give you the one true label, with quantified uncertainty that it is the right label. And we can do that for every data point. And so that's much stronger than, like, your typical crowdsourcing algorithms.
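As a rough sketch of the difference, and this is a simplification of what Curtis describes rather than Cleanlab's actual algorithm, you can treat the trained model as one extra weighted voter and combine its predicted probabilities with the annotators' votes:

```python
# Toy sketch: combine annotator votes with a model's predicted probabilities.
# A plain majority vote ignores the data itself; here the model acts as an
# extra, weighted voter. (Simplified illustration, not Cleanlab's algorithm.)

def consensus_label(votes, model_probs, num_classes, model_weight=2.0):
    """votes: class indices from the annotators (may be just one).
    model_probs: the model's predicted probability per class for this example.
    Returns (best_class, crude_confidence)."""
    scores = [0.0] * num_classes
    for v in votes:
        scores[v] += 1.0                            # one point per annotator vote
    for c in range(num_classes):
        scores[c] += model_weight * model_probs[c]  # model's weighted opinion
    best = max(range(num_classes), key=lambda c: scores[c])
    confidence = scores[best] / sum(scores)         # crude uncertainty estimate
    return best, confidence

# 3 annotators say class 0 ("chicken"), 2 say class 1 ("cow"); the model
# strongly favors "cow", which can flip a narrow human majority.
label, conf = consensus_label([0, 0, 0, 1, 1], [0.1, 0.9], num_classes=2)
print(label, round(conf, 2))
```

The point of the real, benchmarked approach is the same shape: annotator agreement and model evidence are fused per data point, rather than counting votes alone.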
And we were able to benchmark and show that it's much stronger. So that's another example. And then I'll just share a couple more. We're just coming out, it's not released yet, so anyone listening will get a little sneak preview, with a fully automated active learning pipeline. If you're not familiar with that, it means you already have some data labeled,

and now you have more data, and you have to answer this question: should I label more data, or should I improve the labels of my current data? Which one is gonna improve my machine learning model more? And that's a hard question. And we automate that now. Yeah. And this is the first package that does this really cleanly and in a robust way for enterprise.

And so now, if you have a model and you just wanna improve it, by labeling new data, or fixing your current labels, or getting better uncertainty estimates of your current labels to decide whether you wanna train on them or not, we can automate that with the active learning package that's coming out.
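A bare-bones version of that relabel-versus-label-new decision (my sketch of the general idea, not the forthcoming Cleanlab package) scores every example, labeled or not, and spends a fixed annotation budget on the most suspect ones:

```python
# Toy active-learning triage: given a fixed annotation budget, decide whether
# each next annotation should RELABEL a suspect existing label or LABEL a new,
# high-uncertainty unlabeled point. (Illustrative sketch only.)

def triage(labeled, unlabeled, budget):
    """labeled: list of (index, given_label, pred_probs).
    unlabeled: list of (index, pred_probs).
    Returns `budget` actions, most valuable first."""
    candidates = []
    for i, y, probs in labeled:
        # Low probability on the given label suggests the label may be wrong.
        candidates.append((probs[y], ("relabel", i)))
    for i, probs in unlabeled:
        # Low max probability means the model is unsure about this point.
        candidates.append((max(probs), ("label_new", i)))
    candidates.sort(key=lambda t: t[0])  # most suspect / most uncertain first
    return [action for _, action in candidates[:budget]]

labeled = [(0, 0, [0.95, 0.05]), (1, 1, [0.80, 0.20])]  # index 1 looks mislabeled
unlabeled = [(2, [0.55, 0.45])]                          # model unsure about index 2
print(triage(labeled, unlabeled, budget=2))
```

A production pipeline would use calibrated label-quality scores and retrain between rounds, but the budget-allocation question it automates is the one Curtis states.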
Some other quick things, and then I'll stop, 'cause it's a long answer. But you know, this is what we do, right? Yeah. Yeah. If you have a regression task, this is support that's coming out very shortly, and that one we're pretty excited about. So not just labels, but real-valued targets. For example, you have, you know, a ball dropping, and you're trying to guess what the height is. You wanna predict where something is, and you have a numerical location, rather than a class like dog or cat or chicken. So this is something we also support. And so if you have targets that are really far off from what they probably should be, we can detect those for you.

You can clean those out and train on clean data for regression. Or for tagging tasks, where you have multiple labels per data point, we also tell you if you're missing a label, or if some of the labels that you have are wrong and you should have fewer. So this is another task we solve. We also solve for, like, named entity recognition.

This is all built into the open source. So the open source truly is a fully fledged data-centric AI platform. And "platform" is not quite the right word; it's really a foundational set of tools to let you solve any task in data-centric AI. And that's a pretty powerful thing, because, I don't know if you've heard Andrew Ng or any of these folks talking about data-centric AI?
Have you heard anything about this? No, I haven't. No. Okay. So the core idea is that you can improve a machine learning model by just trying a bunch of models and hyperparameters, which is what people have been doing for, like, 20 years. Yeah. But then what they realize when they get out of school is they stop doing that, and they spend, like, 90% of their time just trying to get a good dataset.

Yeah. Yeah. You know, if I'm a student in school and my teacher is teaching me wrong things, I'm probably not gonna learn with high accuracy. But if my teacher is giving me really good examples, and it's good quality, and there's not too many out-of-distribution examples, not too many outliers, not too many label errors,

and I have good, high-consensus labels, these are all things that we automate at Cleanlab, then I'm probably gonna learn pretty well, in a decent amount of time, with high accuracy. And that's, like, the dream, right? Yeah. And so having one single platform that allows you to do all of these tasks well, that's what we do.

And those are some of the tasks. I'll add one final thing: Cleanlab Studio. So everything I shared in the open source finds these issues for you. We're obviously a company, so what's our product? You know, what's our business model? We provide the open-source algorithms completely for free, and we allow anyone to use them.
And this is, like, we're all from MIT, and we really believe in the open-source model. We've taken a lot of research and a lot of publications and we've made them public. We've shared them with the world, because we believe in that model. And our hope is that people who use this stuff and get value will check out the product, because it goes way beyond the open source.

So the open source is like a doctor who diagnoses: hey, these are the problems in your dataset. And you're welcome to build a bunch of, you know, engineering and pipeline around that. But if you wanna fix those problems, you're gonna need a GUI, and you're gonna need an interface where you can see your data,

yeah, and see the issues, and be able to quickly select and train a better model. And that's what Cleanlab Studio does. It automates the fixing and improving of the dataset. And so if people are sort of curious, you know, what do I do with the open source? Well, with the open source you find issues, and then you'll probably just throw them out.

That's what most people do: you train on a subset of your data. But if you wanna train on all the data, with correct labels, then you use Cleanlab Studio. And those are all the different types of tasks that we solve, and where each fits in.
Julian: And it's incredible to see the specificity with which you can really, you know, treat these datasets, and get the data into a position where it's actually usable and, you know, tangible enough to maybe, you know, come to conclusions with, build products on top of, and really, you know, improve the systems that it's working on.

And so you were working at Amazon, you were working at all these different companies, using this technology and improving it. And when you made that transition to focus on Cleanlab as a company, what was that transition like? Going from focusing on, you know, the research and building a repeatable, sustainable, and incredibly sophisticated model,

to now getting more and more people to use it, and getting more and more companies or clients or individuals to not only use the product but also to pay for its, I would say, premium features that take it to the next level. What was that transition like for you, in your experience, now focusing less on the technical side and more on the business side: client acquisition, I'm sure, and finding product-market fit, and exactly who you're, you know, seeking out as your ideal customer profile?
Curtis: Yeah, we made some discoveries along the way. One thing that I think we've done differently than most founders, or at least most startups that I'm aware of, is we didn't spend any money on marketing. We did market, of course, but we didn't do it in the traditional way.

Yeah. I'm of the belief that, at least in the early days of a company, you need to prove yourself as viable without spending a million or two million dollars in ads. Obviously people have to know about the existence of what you're building; there's no point in building something great if no one's ever heard about it.
So you do have to at least know that some people have heard about it. But I started just giving talks. I was invited to the Databricks conference, and then Snorkel's conference, and then I gave a talk with Andrew Ng, and then I gave a talk at NeurIPS. And what I started to realize is that people had heard of Cleanlab.

And so that was enough: knowing that in just a random talk, you're in an audience and you have, say, 10 or 20% of people raise their hands. I didn't know that going in, you know; it was just a grad project when I first created it. I didn't know so many people had touched it. So that was cool. That was reaffirming that we're building something that has the potential to have big impact.
So we didn't spend a bunch of money on marketing. Instead, we focused on value. And so what you'll notice if you go to cleanlab.ai/research is we still publish research. This is very unusual; most startups stop publishing and just focus on making money. Yeah. We believe that we will actually make more money by continuing to contribute, and I'll be clear about what I mean. Who are our customers?

These are data scientists, machine learning engineers, machine learning researchers. And what do they do? They read. So what we do is we contribute to what they're interested in. Yeah. We continue to write interesting research papers, invent new algorithms, and then people discover what we're building that way.

And so instead of marketing, where you just spend a bunch of money and you hit people with ads all the time, what we're doing is trying to contribute value. Yeah. And in doing so, we're also marketing. And so it's a win-win. And people tend to like this style. They like that it's, hey, these people are actually trying to contribute,

and they're doing good, versus just, like, spamming me with an ad. As we grow, we will have a growth phase, and then we'll have to do ads and traditional marketing if our growth is not meeting the kind of demand that we're trying to hit, and we'll have to, you know, reach out to some people.

But that's sort of been the initial marketing approach. We know that we have around 10,000 data scientists, minimum, using the open source regularly; maybe a hundred thousand have touched it in some way in the last three years. We don't know the exact number for open source. Yeah. So we have a good idea that the traction is already there, just from organic growth, and it's a useful thing.

Anybody who works with data can find an issue in a classification dataset using Cleanlab, and there are, like, literally millions of people doing that. Yeah. So it just makes sense that eventually people will find it. In terms of other business things, the work done with Amazon was very compelling.
I'll share the use case. So at Amazon, they did not know, and this is a good thing to just think through yourself. Do you have an Alexa device?
Julian: I don't, only because I don't want something listening to me. Actually, every time I go to an Airbnb, I unplug any Alexa device.

That's just me, though.
Curtis: Yeah, no, that's a reasonable perspective, and, oh, I could have a long conversation on that. But for the folks who do use Alexa, there's a question that, if you worked on that team, you would want to answer, and that you couldn't answer without Cleanlab.

And that question is not how many times the device wakes up. The question is, how many times does the device not wake up when you say the word "Alexa"? Right? Yeah. How would you know?
Julian: That's a great question. You couldn't know that, because, you would think, if it's not registering... I would assume that you couldn't know that, 'cause it's not registering your voice, or it's not counting that as a time when it should wake up.

So how would you know that? Otherwise, does it still intake that command, but just not respond to it, and categorize it?
Curtis: It has to do with uncertainty. Yeah. Okay. So that's the false negative rate: it's when, right, it should have woken up, but it didn't. If you know the false positive rate, and you know the true positive rate, and you know the true negative rate, then you can do one minus all those numbers, and you get the false negative rate.

So the question is, how do you get those three numbers? And you can estimate those three with Cleanlab, because those are literally just estimates of label errors. Yeah. You take the total number found by Cleanlab, divided by the number in the dataset, and then you can solve for the missing number.
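The arithmetic Curtis describes works because the four outcome rates partition all wake-word events, so the one rate you cannot directly observe falls out by subtraction. A minimal sketch, with made-up counts:

```python
# The four outcome rates (true positive, false positive, true negative,
# false negative) over all wake-word events sum to 1, so the unobservable
# false-negative rate falls out by subtraction.
# The counts below are made up purely for illustration.

def false_negative_rate(tp, fp, tn, total):
    """tp/fp/tn: counts of true positives, false positives, true negatives
    (the first three are what label-error estimation can give you);
    total: total number of events."""
    return 1.0 - (tp + fp + tn) / total

# Say out of 1000 utterances: 700 correct wake-ups, 50 spurious wake-ups,
# 200 correctly ignored. The remaining 50 are the silent failures,
# the "said Alexa, nothing happened" cases.
print(false_negative_rate(tp=700, fp=50, tn=200, total=1000))  # ~0.05
```

The hard part, which the label-error estimation supplies, is getting trustworthy values for the first three rates from noisy production data.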
Julian: Yeah. It's incredible to think about how much you're touching as a founder, not only from the technical standpoint, but also, you know, doing the research and really educating your audience.
And it sounds like you've honed in on that strategy in particular. And I'm sure there'll be a time when you transition into the growth phase. But I guess my question is, what's particularly hard about your job? Is it dividing and conquering in terms of, you know, the technical work versus the business work that you have to really focus on?
Yeah. What kind of keeps you up every day and maybe doesn't get checked off the task list?
Curtis: Yeah. I mean, it's a lot to keep in your head. The company has grown so much in just the last year. One year ago today, of the people who are currently on the team, there were only three of us who were full-time, and we went from three to now 25.
When you have that much growth, it's very hard, I think, for anyone; that's over eight times growth. Yeah. So just keeping that all in your head, all the new projects, the number of changes. Every time a company doubles, the CEO job changes completely. Like, the way that I tried to run the company with three people versus six people versus 12 versus, you know, 25, it's completely different.
The job is completely different. Basically, every time you double, you have to hire someone who does all the things you did at half the size. So that's challenging. It's challenging when every two months your job changes completely, and the success of the organization that you've put together, that you care deeply about, relies so heavily on the decisions that you make.
Yeah, I think there are a few tricks, though, to make it easier, which I think is worth focusing on. Instead of asking why it's so hard, ask: what can you do to make it easier? So there are a few things, and these are things I learned from my co-founders. One is: if something is better than it was before, release it.
Stop struggling with the decision of when to release. Is it better than it was before? Then release it. Yeah. If it's the first time the product is going out, that's a tough one, and you have to be more thoughtful. But if you're updating a current product, just release it. If you know it's better, release it, and then do that again tomorrow, and then do that again tomorrow.
In 10 years, you'll have something. Yeah, so that makes things a little easier. Another thing that makes things easier is figuring out, when making a decision: is this a decision that I should be spending an hour on today? Meaning, can I delay this decision? And so there's something that you're constantly doing, which is training a meta-classifier: decisions about decisions. You're trying to decide, is this a decision I have to make at all? There's another decision classifier that you learn to train, at least in this role, and that classifier asks: is this a decision I can just make quickly, where it's okay if it goes badly because it doesn't actually hurt us that much?
Like, it costs us maybe a thousand bucks if we screw up, and, you know, in the grand scope of the company, it won't really make a difference. That's another one. And then there's a third decision class, and that's where I have to decide if a decision is reversible. Yeah. Most decisions are reversible.
But if a decision is either non-reversible or very hard to reverse, in those situations you have to spend more time and be much more careful. But you'll find that actually most decisions are reversible, so it's better to make them and move quickly than it is to waste a lot of time.
Another trick that I've learned, especially from Jonas and also from Anish, is to have a bias toward action, and that makes the job a lot easier. For example, and this is a good example for anyone building a company: I believe strongly in asking good questions. I think that often people ask the wrong question, and if they had just asked a different question, they would have found a much more beautiful answer.
But they spend all their time answering the wrong one. So asking questions is important, but that's not what I'm talking about; that's a much higher-level type of work. Sure. I'm talking about very low-level work, where you just have to get something done. And I think for that, what you want, and this is something that I've trained myself to be better at, is people who just do the work first and then ask questions.
So for example, if you're working with GitHub, if folks are familiar with that, it's better to just make a PR and then put comments in there than it is to put a bunch of back and forth in Slack. That slows down the whole thing, and then no PR was ever made, and at the end of the day everyone's confused.
And then maybe a week later you make a PR. That's a slow-moving company, and as a startup, they're probably going to fail. But a startup where the person just immediately thinks deeply about it, asks a few questions, does some quick research, and takes a guess at a PR, and the PR has a draft of the code, you know, that's like an afternoon. And then you're much further along and much closer to actually, like, a final feature. Yeah. In, you know, what, a tenth of the time. So I think all of these things make the job easier.
Julian: Yeah. Yeah. I think you echo a lot of points a lot of founders make, which is moving fast. Something doesn't have to be a hundred percent completed, but if it's at least 80% or closer to that, then, you know, it's definitely worth, I wouldn't say taking the risk, but taking the chance of putting something out there and then getting the feedback from it to correct it and make it better. Right. I love that you said release it tomorrow, release it the next day, and then in 10 years you'll have something. That's amazing.
I think that speaks to what a lot of founders focus on and the mentality behind building a company, especially building quickly. From a company standpoint, what are some of the biggest risks that Cleanlab faces today?
Curtis: I think we have the same risks as every startup, right?
Yeah. So, in a sense, the market is down. Though the market is not really down for startups; the market's down for big tech. I think there was an article I read this morning that was like "The Rise of Startups in the Ashes of Big Tech." But that said, one should still be careful.
You have to take massive risk; you can't be too careful as a founder. Yeah. You recognize the position requires you to be gutsy, in a sense. You can't just be really careful all the time, but you shouldn't completely, you know, go full manic mode and just do whatever you like and think, oh, nothing can touch me.
The market is down; be thoughtful about that. So, right, we made some adjustments. One of the risks that we were concerned about was building out a massive base for too long and basically playing a slightly longer game. And so we tightened our game. There are certain things that we were going to do a year from now, with more things built out, that we'll release a little sooner, and so we'll just move a little quicker.
And that's more just to match the market. The most important thing, obviously, for any company is that there's a market to buy what you're selling, right? And if you're not selling anything yet, then there's definitely a bunch of people who want to use the stuff that you are working on. Yeah.
And you either know that because they've signed up ahead of time, if you haven't released it yet, or it's open source and they're using it ahead of time, like, you know, Hugging Face. But you need to make sure that the market wants what you're spending your time on. Right, right. And so we know we have a lot of people who use the open source.
And it's really important to us that in the next few years we're able to hit, you know, some kind of revenue targets, and we have a good idea of what those are. And we're on track; actually, we're a little ahead of what we expected. But that's not guaranteed or promised. You have to build fail-safes: what do you do if, yeah, a deal that you think is gonna close doesn't close?
And it's very difficult to make guarantees on like, you know, if you're doing top down versus bottom up. And so I won't go too in depth in this call on sort of our business approach and what we're doing. I'll just say that there's always a risk there for any startup, and we're obviously putting things in place to avoid those risks.
I think another big risk that most startups face is founder issues. That's probably the number one, actually. Yeah. If you're scrappy, you can always cut costs. You know, if you need to, you can sort of reduce expenses, maybe make the team smaller. There are ways to stretch things out.
You can raise more money. If your company is not doing well, then you might take a hit, you know, on some things. But there are ways to sort of keep the ship going. And if you have a product that's interesting to people, you can usually get something going. But there is something that you cannot survive: losing all your founders.
Yeah. There are some rare cases where you have such a good product and team that someone comes in, they're fantastic, and the company survives. But most seed-stage and Series A companies will die if they have serious founder issues. And so the way we mitigated that was that I work with people I've known for 10 years. I've seen that they're stalwarts, and I've seen them be reliable through very serious hardship and absolutely come out, you know, very effective.
And I've seen them do it multiple times over a decade. Yeah. They have very strong physical health, mental health, emotional health, and intellectual health. And so I made sure of that going into the company, because I'm terrified about it. That's like the number one thing you can't control for. Yeah. And so I controlled for it with the way I set up the team.
Those are pretty generic answers, though, you know, because in a sense, we sell a SaaS product, we have an open source component, and any answer that I give is gonna apply to other companies in that space.
Julian: Yeah. Yeah. I guess something that maybe won't apply to every company is long-term-wise: if everything goes well, what's the long-term vision for Cleanlab?
Curtis: Yeah, totally. The idea is that in five years we see multiple markets that have significantly more effective solutions reliant on Cleanlab. And to be very clear, that would mean things like digital consulting; the landscape of that is actually already changing today based on Cleanlab, which is very exciting to us.
Yeah. We didn't know that when we first created the product; we didn't know exactly who would use it. Right. It's like any seed-stage company. Yeah. You build something you know is useful because you've got a bunch of data scientists using it in the open source. But then when we released Cleanlab Studio, the app, we didn't know who would start using that, and it turned out that, like, hundreds of sort of mid-tier or middleman-type data technology consulting and data consulting firms and other folks in that space just immediately picked it up and derived enormous value. Yeah.
And so for them, what they're trying to do is help their customers with ML solutions, and to do that, they'll do anything they can. So if they can use Cleanlab and it basically solves their problem for them, and then they can upsell that to their own customers, it's a great deal for them.
Yeah. And so we've been learning a lot in that space, and we've seen that's a pretty good route forward. That's just one small thing, though. So in five years, I imagine, and I say "imagine" because there are lots of people who will bullshit you. They will literally tell you, like, in five years this is what the future's gonna be.
Sure, I can use those words; I can make shit up, right? But in five years we could also have Coronavirus 17. Yeah, in five years we could have Armageddon. I don't know what's going to happen in five years, but I know what's happening now, what trajectory we're headed on, and the vision that I have for the company. And that vision is that there are several new markets that Cleanlab can enable.
One new market is emotion detection. I've seen several startups build their solutions using Cleanlab that could not be built prior, and that's because emotion labels are very noisy and very messy. Yeah. You need very clean labels to train a good AI. So we're seeing, like, therapy, emotion detection, lots of healthcare and medical applications
that all require and are reliant on Cleanlab to be effective and to train their ML models accurately. Yeah. And then we've seen a lot of success in just regular technology and AI assistants, like the Alexa and the Google examples. Pretty much any company you can think of that's a major tech company is building AI for people.
You've seen all the GPT-3 stuff, the Transformers, yeah. All of that is trained on data, and the noise in that data is pervasive. It causes problems in those models. Yeah. And so Cleanlab is becoming even more valuable as the landscape of data and technology grows and becomes more valuable itself.
Part of the reason we built the company and created it at the time that we did is because every year that goes by, our reliance on data, in business and in the capitalist market and in society in general, increases. Yeah. Every year we have more data-driven models. Yeah. And so you want a solution that makes that work better.
Julian: Yeah, it's incredible to see the advancement in, you know, all the companies that are using data and machine learning to disrupt a lot of different industries. I mean, like you said, the GPT stuff disrupting, you know, content-driven industries. And I love the idea about emotion detection and therapy.
It just seems like the more sophisticated this becomes, the better these solutions are overall and the more use cases you can gather. So it's just exceptional to see not only the necessity that Cleanlab has to a lot of these different companies, but the way I'm sure you're being surprised that, you know, it's being used in different arenas, and I'm sure that's extremely exciting.
I know we're a little over time here, but I could chat at length about, you know, what you're working on and how you got there. But I would love to ask this for the audience's sake, and also selfishly for my own research about influence, about the influences in your life.
I know you've had a lot of mentors. You've worked with a lot of exceptional scientists, maybe even the top tier. You know, what books or people, whether it was early in your career or continuing now, have influenced you or impacted you significantly through that time?
Curtis: Yeah. I mean, a lot of people will give this answer, but I'll try to give you something interesting in the answer. So the two people are gonna be my PhD advisor and my dad. Those are both very sort of, okay, cool, I've heard that a million times, so why is that an interesting answer? I'll finish with my dad and start with the PhD advisor.
His name was Isaac. And I say "was," but really, your PhD advisor is always your PhD advisor, and, you know, the day that he leaves this planet will be a very sad day. Right. Isaac Chuang. I think he has like 30 years left, you know, but I already rue that day, because he is an incredible person.
He invented the quantum computer, so, like, let that sink in for a second. He built the first realization of a quantum computer on the planet. He was one of the most influential human beings I've ever interacted with, at least for me personally. He completely deconstructed my mind and how I think about science and problems, and he rebuilt my mind to think about them in a way that is dispassionate, such that I didn't allow bias and emotion to influence my ability
for scientific thinking, and to think about the world in a way that's falsifiable, so that if I tell a customer something about a product, it shouldn't be bullshit; you should be able to back it up, to make claims with good evidence. And a lot of my mantras come from his teachings: to ask good questions, and that often good questions unveil simpler answers.
So that's definitely one. I don't know if a common founder would enjoy reading it, but he has the number one book written on quantum computing. He's considered to be the author of the book on quantum computing, which you're welcome to look up. Yeah, I don't need to tell you the name of the book.
You can search Isaac Chuang quantum computing; it will pop up. So yeah, definitely a big impact. I'll mention one other person before mentioning my dad: Patrick Winston was another big influence for me. He passed away. He was a faculty member at MIT for, I don't know, three decades. He led the AI Lab for a while.
He was the head of the AI department. And he was a very formative person in my educational background, particularly in his emphasis on communication. He's a scientist, but he actually taught the how-to-speak class at MIT, and you can look this up on YouTube; it's called "How to Speak" by Patrick Winston.
And this is a machine learning and AI professor, a scientist, and yet he's teaching about communication. It might make you curious: someone who has devoted their life, as faculty at MIT, to artificial intelligence, why would they take their time and investment and spend it on communication and talking?
Yeah. And if you think deeply about it, you'll realize that it's not just the fact that we communicate that makes us intelligent beings as humans; it's actually the way we communicate. We tell stories. And this is something that Patrick Winston was very thoughtful about.
He realized that it's in the way we tell stories; even right now I'm communicating stories to you, right? Yeah. Yeah. And the way we weave them together may be the inner nature of what makes us truly intelligent beings. And he was doing a lot of work in that space, which was left unfinished at his passing.
And I hope that many people are trying to pick that up. He basically was looking at a bigger picture of all intelligence and how you can link together human intelligence and artificial intelligence through communication and through stories, and that's part of the reason he thought so deeply about how to speak.
Yeah. Okay. The final person is my dad. You know, I mentioned my dad was a mailman. His dad was also a mailman, and his dad was a mailman, or, you know, they all worked in the post office. So if I had been a mailman, I'd be, like, a fourth-generation mailman. And you take someone like that, who has delivered mail and lived a life of service his entire life.
It's not, like, a nice job, you know. You literally carry a satchel, you're soaked in sweat all day. When it's wintry, you slip on the ice. People chase you down and ask you to do things; you don't get paid any more for that. You're delivering people's unlimited packages, you know, during Christmas.
Yeah. It's actually a really uncomfortable and rough job, and the pay is very low. It's just not a comfortable type of work, you know. You would think someone like that would be sort of down on themselves, but I think that he actually took being a father as his primary job, even more than that job.
And so he tried to impart a lot of lessons, and he really focused on raising me in a way that was principled. Yeah. It was focused on doing good work and contributing to the world and adding value. What is your purpose here? What are you gonna do? You know, you wake up every day and you need to do your best, and you need to go to work and show up and contribute something and create value, and then you should do that again tomorrow.
And I took that to heart, and I took it to every level I could, and I continue to every day. Another thing he said that I think might seem depressing, but it was very motivating for me: I would always tell him when I was little, I'd be like, hey, I wanna be a mailman when I grow up. And he would get really upset.
And he would say, you know, you can do anything you want, just don't do a job like this. You know? Yeah. And I think hearing that from your own father really affects you as a child, and it really makes you think about, you know, what are you trying to do on this planet? If this is your occupation, what are you contributing?
What is your value? Yeah. The final thing I'll say is, I wrote this in my college essay, and, you know, I was accepted everywhere I applied, so I think it was a pretty effective thing to write. I just wrote an essay about my dad, and I said that he inspired me with five words. I write this whole essay, and at the end I share the five words, but the essay talks about my background and where I'm from and what I'm trying to do.
And so you're waiting for the punchline, you know, what are the five words? Yeah, and it's a bit of a downer, but it's really effective. He would always say, "Curtis, don't be like me." And you might hear that and think, man, that's sad, you know? But it wasn't sad. It was really motivating to have your own dad putting you before himself.
He goes to work every day and he's sweating and makes a low paycheck so that you can go, you know, to Vanderbilt or wherever and have a better education. I took that to heart, and I take it to heart every day. It keeps me very motivated, and I'm grateful every day to have that level of motivation.
Julian: Yeah, that's an incredible story. I appreciate you sharing that. My father's a welder, and he said something very similar to me. It's a very similar sentiment: it's about defining who you are and using the skills and the tools you have to continuously improve.
And he was the type to, you know, wake up at 4:00 AM every day, maybe 3:30 if he wanted to. But he'd always punch in, and he'd always work hard, and he'd always try to be his best. And that's grueling over time. You know, we're not built to wake up when it's dark. And I take that lesson from him in terms of clocking in every day, you know, pushing with the same motivation, whether you're hurting or not, or you're sick, or what have you.
Obviously take care of yourself, audience, but you know, you gotta challenge yourself to push past the discomfort to really improve. You know, Curtis, I know we're at length here and we could continue going, but I really appreciate not only your stories and your sentiment, but also what you're working on.
And I think, you know, maybe right now we might not be aware of how it's impacting us, but from what you discussed and how it's affecting the different companies you're working with, I'm sure a lot of us as audience members and listeners will be very much affected by Cleanlab and the products that, you know, use the technology to improve themselves.
So, last little bit here before we end: I would love to hear your plugs. Give us your LinkedIn, give us your socials. Where can we find you? Where can we find Cleanlab? If we wanted to start, you know, playing with the product and working with it and building on it, or even, you know, working with the Studio, where can we find that and where can we get involved?
Curtis: Yeah, it's easy. So you just type Cleanlab GitHub and you'll find the open source. And then for the product, that's easy; you just go to Cleanlab.ai. There should be a big "Try Cleanlab Studio for free" button at the top, and if you want to go straight to the app, you can too. It's app.cleanlab.ai. I love it.
Well, in terms of recommendations, you can check us out on LinkedIn and Twitter if you are in those spaces; we're pretty active. But I'd recommend just playing with the tools. Oh, also, if you have questions, we have a really big Slack community of data scientists, and they're really good people. So you can just type Cleanlab Slack and you can find it really easily.
But other than that, I would recommend just getting your hands dirty with the actual tools. Instead of telling you to star a bunch of things and, you know, buy this, use that: just try it out. You can try out everything for free.
Julian: Amazing. Curtis, I hope you enjoyed yourself on the show and thank you again for being on.
Curtis: Yeah, I'm happy to be here. It was great to chat with you, Julian. I wish your podcast big success too.
Julian: Thank you.