Today, the StarTree team announced their $47 million Series B to accelerate adoption of real-time analytics for user-facing and large-scale internal applications. I sat down with Co-founder and CEO Kishore Gopalakrishna to talk about the fascinating origins of StarTree and its underlying technology, Apache Pinot, as well as to discuss Kishore’s ambitions for the enterprise tech startup. Listen to our “Founder Real Talk” podcast episode below or here, or read below for an excerpt of the interview.
This transcript has been edited for clarity.
—
Glenn: Talk a bit about why you embarked on building this real-time user-facing analytics platform Pinot. What was the prompt for building it?
Kishore: Before we embarked on this journey, analytics was really restricted to internal employees of the company. It was single-digit queries per second, most of the time. Where I really got excited with the capabilities with analytics was when we started to build this new product at LinkedIn called “Who Viewed My Profile.” This was literally providing analytics to hundreds of millions of members of LinkedIn, and it was a challenge that no one had really solved very well.
Our first solution was very simple: We just took an Elasticsearch-style solution built on Lucene, but built in-house, and we launched that. The product was successful. It actually took almost a thousand nodes to serve the initial queries. We had almost a thousand queries per second. And that’s when we realized that this was good from the product perspective, but infrastructure-wise, it was really expensive for us to operate and manage.
Given the success, LinkedIn wanted to do a lot more with this feature, for pretty much every vertical you can think of, such as companies, universities, jobs, and articles. That’s when we decided that we needed to take a step back and evaluate whether this was even the right architecture. Was this even the right kind of system to have as a fundamental layer?
That’s when we went back to the drawing board and built Apache Pinot from scratch. We went from a thousand nodes serving a thousand queries per second to 75 nodes serving 5,000 queries per second. This big improvement marked the start of this trend. And fast forward to today, we have 200,000 queries per second running for the 800 million members of LinkedIn, and almost a hundred applications. Everything that started with the simple “Who Viewed My Profile” feature is now all over the place. Any number that you see on LinkedIn today is actually served by Apache Pinot.
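To put the figures quoted above in per-node terms, here is a small back-of-the-envelope calculation. It uses only the node and query counts from the interview and assumes, as a simplification not stated in the interview, that the nodes before and after were comparable machines.

```python
# Figures quoted in the interview.
before_nodes, before_qps = 1_000, 1_000   # Lucene-based system
after_nodes, after_qps = 75, 5_000        # first Pinot deployment


def qps_per_node(qps: float, nodes: int) -> float:
    """Average queries per second handled by a single node."""
    return qps / nodes


before = qps_per_node(before_qps, before_nodes)
after = qps_per_node(after_qps, after_nodes)
improvement = after / before

print(f"{before:.1f} -> {after:.1f} QPS per node (~{improvement:.0f}x)")
# prints "1.0 -> 66.7 QPS per node (~67x)"
```

Under that assumption, the move works out to roughly a 67-fold increase in throughput per node.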
Glenn: I’ve learned more about Pinot and spoken to various members of your community. Some people describe it as magical. Did you think you were building magic when you were building Pinot? Why do you think they think that?
Kishore: It’s interesting to look at the most challenging dimensions in any analytical system.
- The first is really the freshness of the data: how soon can you get the insights on the data that you’re writing into the system?
- The second is definitely the latency of how fast you can query: how soon will the query response come back?
- The third is the concurrency: how many users can actually use this system concurrently?
The challenge here is really the latency, because this is the thing which drastically impacts the other dimensions. Being able to keep that latency low as you go from batch to real-time, or from a single user to millions of users, is a huge challenge. Most systems are not capable of handling that. And that’s kind of why Pinot was built. There is no magic; it’s really a new concept we came up with. For most systems, it’s understood that you need to do a certain amount of work to answer a query, so the question has always been, “How can we make it work faster?”
The common solution is to use a column store, or to use more parallelism and more nodes. We went in a completely different direction. The question we asked was, “Why are we doing this work? Is it possible not to do this work?” And that’s where indexes come in. Instead of trying to do the work faster, we aimed to completely eliminate it or reduce it drastically. So we kind of went indexes-first. In Pinot, we have a ton of indexes that help us achieve data analytics for various use cases. It’s ultimately about reducing the work instead of trying to do work faster.
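The idea of reducing work rather than doing it faster can be illustrated with a toy inverted index. This is a minimal sketch, not Pinot's implementation: the row layout, the column names, and the functions `count_by_scan`, `build_inverted_index`, and `count_by_index` are all hypothetical, chosen only to contrast a full scan with an index lookup.

```python
from collections import defaultdict

# Hypothetical toy data: 12 profile-view events spread over 4 profiles.
rows = [{"profile_id": i % 4, "viewer_id": i} for i in range(12)]


def count_by_scan(rows, profile_id):
    """Baseline: examine every row to count views of one profile."""
    examined = 0
    matches = 0
    for row in rows:
        examined += 1
        if row["profile_id"] == profile_id:
            matches += 1
    return matches, examined


def build_inverted_index(rows, column):
    """Map each value of `column` to the positions of the rows holding it."""
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[row[column]].append(position)
    return index


def count_by_index(index, value):
    """Answer the same query by touching only the matching row positions."""
    postings = index.get(value, [])
    return len(postings), len(postings)


index = build_inverted_index(rows, "profile_id")

print(count_by_scan(rows, 2))    # (matches, rows examined) -> (3, 12)
print(count_by_index(index, 2))  # (matches, rows examined) -> (3, 3)
```

Both calls return the same count, but the scan examines all 12 rows while the index lookup touches only the 3 that match. The index is paid for once when the data is written, which is the trade-off the answer above describes.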
Glenn: Pinot was open-sourced in 2015 and entered the Apache Incubator in 2018, and has been really quickly adopted at some very big companies. What challenges have you seen big companies facing while adopting Pinot, and what have you been building as you’ve evolved to support larger companies?
Kishore: Well, initially when we built Pinot, we only kept LinkedIn in mind. But we always knew about the scale we were trying to solve for. The biggest challenge as we came out of LinkedIn was that the world had moved on from on-prem to the cloud. For us, the biggest challenge was making Pinot cloud native. We had to make Pinot easy to run on Kubernetes and Docker, and also had to make it very simple and easy for developers to adopt and run. That was the first thing, and we invested very heavily in this early on.
The second challenge was really the technologies Pinot had to integrate with. Pinot was already heavily integrated with Kafka (since Kafka originated at LinkedIn as well), and we also had Hadoop and Avro as compile-time dependencies. These technologies were very tightly woven together, so we had to decouple them, because in the outside world there were a lot more formats and streams to support, such as ProtoBuf, Thrift, Kinesis, EventHub, and Pulsar. All of these were, and still are, important for us to support as data formats and source streams. Ultimately, we introduced the concept of plugins so that we could extend Pinot to various different sources. This was well-regarded in the Pinot community as well. People started contributing a lot more connectors, such as connectors for Spark, BigQuery, and Snowflake. All of these were new connectors that we could add on top of the existing ones.
Glenn: Tell us a little about the decision you made to leave LinkedIn and to start a company around Pinot. What led you to take this step?
Kishore: First of all, it was very, very hard for me to leave LinkedIn. I actually wrote an article on how hard it was, especially since I spent almost eight years there. It was an amazing place to work at. I think one of the reasons I left was that we really saw the potential of Pinot, and I wanted to make sure that it reached its potential beyond LinkedIn and a few other companies that had adopted it. We wanted to make Pinot easier and more accessible to the world, and we thought it was a great opportunity. I think that’s kind of why we had to create StarTree. Open sourcing it was great, but there was only so much we could do being at LinkedIn. We had to invest in the community: We’d been researching documentation, connectors, plugins, and we started to actually see the full potential. That’s what led us to create StarTree.
Glenn: So you just raised your Series B. Congratulations! We’re very excited at GGV to be leading this round, and we’ve loved working with you so far. Obviously you don’t raise $47 million in a Series B—particularly in today’s environment—unless things are going quite well. To what do you attribute the success that you’ve seen so far in StarTree?
Kishore: I would say two things—first and foremost, it’s the team. We’ve been very fortunate to have an amazing team, and I think everything else is secondary. Our team is really passionate about the problem that we’re solving and they love working with each other. That’s one of the biggest reasons why we have been able to deliver so many things in such a short time frame. We’ve built not only on the community side, but also on the product side.
The second thing is really the community; it’s been very helpful for us. We almost treat our community as product managers. They tell us exactly what is lacking in our product and what things we need to develop. For the first year and a half, the roadmap was completely based on the input that we got from the community. They have been very helpful in terms of shaping the product and making sure that Pinot is useful beyond its original purpose.
Glenn: Can you look into the crystal ball for us? Where do you see StarTree three to five years from now?
Kishore: I think it kind of goes back to our vision of needing to democratize data. When you say democratization of data, people mostly think about employees having access to data within their companies. I think we want to redefine that definition and say that it can go beyond that. Every customer and user should have access to data to make decisions. For example, if you look at what Uber Eats does, we now see restaurant owners getting analytics about popular meals and ordering windows in real time. So that’s a massive shift from where we were a decade ago. And that’s the change that we want to bring in.
—
Listen to the full podcast episode on Founder Real Talk here.