Video: StarTree 101: Getting Started with StarTree Cloud – Basics of Apache Pinot | Duration: 3592s | Summary: StarTree 101: Getting Started with StarTree Cloud – Basics of Apache Pinot | Chapters: Welcome and Introduction (3s), Introducing Real-Time Analytics (62s), Importance of Real-Time Analytics (230s), Real-Time Analytics Importance (389s), Apache Pinot Architecture (686s), StarTree Cloud Features (892s), Building with StarTree (1191s), Configuring Client Access (2432s), Demo Feedback Session (2573s), Comparisons and Challenges (2613s), Q&A and Comparisons (2739s), Conclusion and Resources (3118s), Wrap-up and Feedback (3295s)
Transcript for "StarTree 101: Getting Started with StarTree Cloud – Basics of Apache Pinot": Hello and welcome, everyone, to our first webinar in a while. Quick check: can everyone hear me? Please respond in the chat if you can. Thank you. Fantastic. I'll give it about thirty more seconds, since I see people trickling in, and meanwhile I'll set up the slides. Can you see the slides? Perfect. So welcome, everyone, to Getting Started with StarTree Cloud: the Basics of Apache Pinot. My name is Barkha Herman, and I work for StarTree as a developer advocate. This is going to be an intro class; hopefully we'll do follow-ups with deeper topics, so bear with us. Before we get started, I want to run a quick survey. The answers are numbered, and the question is: what stage are you at in your real-time analytics journey? The options are: one, I'm familiar with StarTree Cloud; two, I use Apache Pinot; three, I'm familiar with Apache Pinot and looking to learn; and four, unfamiliar with StarTree Cloud and all of it. I see a couple of fours, some threes and fours. Okay, great. This helps me with how I'll present the session. Quick agenda: we're going to talk a little about real-time analytics today, then Apache Pinot, then a little about StarTree Cloud. Then we're going to actually import some data and build a dashboard using StarTree and Python; in theory you could build Python apps, Node apps, whatever you like. We'll have a demo, and then we'll go over Q&A and what's next. Sound good? Alright.
Since a lot of you said four, that tells me most of you aren't familiar with real-time analytics or Apache Pinot. So what is real-time analytics, and why is it important? I'd love to hear from the chat. Yes, Ravindra, this is being recorded and will be available offline as well, and we have a couple of other webinars recorded already if you want deeper dives. With that, I want to get started. Why is real-time analytics important? Any ideas? Let me share a personal story. Four or five years ago, I had flown to Seattle; I was working for Microsoft at the time. It was cold and rainy, and I was hungry, because I live in Florida, it's a long flight, and the food on the airplane was not great. So I'm sitting in this damp room, it's cold outside, and I don't feel like changing and going out even as far as the sandwich store. I pull out my app, of course: Uber Eats. And guess what, I order food delivery. Guess which restaurant I pick? I'll give you a hint: I was hangry, so of course I picked the shortest delivery time. That's a huge criterion, because we live in an impatient society; we expect people to respond immediately. I also remember that on this particular day the delivery was about two minutes late, and I was all anxious, down in the lobby getting ready to collect the food. If the delivery had been any later, guess what, the tip wouldn't have been as great. We all live in this impatient society; we expect things to happen when we want them. And the app, Uber Eats, uses Apache Pinot, and it is delivering real-time information.
When I place the order, the estimate of how long delivery will take is calculated from how many orders are in the kitchen, how far away the restaurant is, the traffic, and so on. That's what real-time analytics does. Speed drives decisions: obviously I made my decision because I wanted the shorter delivery time. My user experience was enhanced because I was constantly aware of where the driver was and what the status of my order was. And Uber Eats has a competitive edge over a restaurant I'd have to call and ask, "hey, do you deliver?", because it shows me on my phone where things are in the process. And of course there's operational efficiency in this. Another example I state on this slide: when you're monitoring systems in real time, any outages are detected immediately, which is great. So real-time analytics is important. It's part of our lives, you're using it without realizing it, and the use cases keep growing. Next slide: let's talk about Apache Pinot, since not many of you are using it yet. Apache Pinot was born to solve a problem. It was created in 2013, so it's 12 years old, folks, at LinkedIn; the founding engineers of StarTree worked at LinkedIn at the time. LinkedIn had created something called Kafka. How many of you have heard of Kafka? Please respond with a one in the chat if you have. Kafka is an eventing system, a messaging system: you send messages to it. LinkedIn was using Kafka, and all these messages were coming in; every time you brought up a new dashboard, you had to collect all that data and show it somewhere. And they had used databases and data warehouses in the past.
They had settled on Elasticsearch, and it grew to the point where they were running 1,000 nodes of Elasticsearch. Anybody in IT or DevOps knows that 1,000 nodes is a lot. At that point they were serving about 7,000,000 customers at roughly 1,500 queries per second, and they were scaling fast and needed a better solution. So they invented Apache Pinot, a purpose-built real-time OLAP solution. By the time they got done and deployed it, and it was an iterative process, of course, they went from a thousand nodes to 70 nodes, which is amazing, while serving 5,000 queries per second, more than three times the original. By then LinkedIn had also grown to some 70,000,000 customers, and it's much, much more right now, but that was the scale back then. That's the story of Apache Pinot. It was a great product they created, and just like with Kafka, LinkedIn built it in-house and then released it to the Apache Foundation. So it's completely open source: you can download the source code, build it locally, contribute to it, run it locally, and have fun with it, which you can still do. So who is using it? I mentioned LinkedIn: when you open the LinkedIn app on your phone, the feed you're looking at is a dashboard. The original use case at LinkedIn was "who viewed my profile", the first dashboard they built using Apache Pinot. Right now there are 700-plus dashboards at LinkedIn; every time you go to a different screen, it's really a dashboard, with Apache Pinot behind the scenes. And of course there's the Uber story.
I shared the Uber story, and it really happened, by the way. The restaurant manager for Uber Eats uses Apache Pinot. Stripe uses Apache Pinot for fraud detection on billions of transactions. Slack uses it to improve search across millions and millions of users, and Walmart uses it for inventory management and the like. Again, you can see the pattern: real time matters, because the quicker you respond to an outage or a challenge, the better you serve your customers, and the better your operations and customer satisfaction. And Apache Pinot does this at scale, fast. Now a little about the architecture of Apache Pinot. On the right side you can see a component called the controller, and ZooKeeper, another open-source piece, which the controller uses to keep track of admin state and the like. The controller is the interface for administrative operations: creating tables, destroying tables, adding indexes, those kinds of things. On the left side is the broker, the blue box you see, and you can have any number of brokers. Then there are two flavors of servers, top and bottom: a real-time server and an offline server. The brokers take care of all your queries: when you query the database, the brokers handle it. And there's a segment store, which is the storage for all of your data. As I mentioned earlier, Pinot is purpose-built for real time and scale. It's distributed, and not just the data: the processing is distributed too. That's why you can have n brokers, n real-time servers, and n offline servers. What do I mean by that? Think about large batch ingestion. How many of you actually work with databases? Please put a one in the chat if you do. Okay.
If you do work with databases, you know that performance can degrade while you're ingesting large amounts of data, especially terabytes or petabytes. By separating real-time from batch, Pinot avoids any blip in performance while batch ingestion is happening, because the two write to different segments and run on different servers, which is great. Alright, let's move on to StarTree Cloud. What is StarTree Cloud? StarTree Cloud runs on Apache Pinot: the same open-source code base is used in StarTree Cloud. However, it's the managed version of Apache Pinot. In other words, if you're a small or large customer and you don't want to build it locally, deploy it yourself, and manage your own clusters, no worries, we can help you there. It's a database as a service, DBaaS, I like that, built on top of Apache Pinot. Like I mentioned, the code base is exactly the same, so if you're familiar with Apache Pinot open source, and I think one person said three, it's the same code. However, it adds a layer on top: running in a cloud environment, you get self-serve provisioning, enterprise support, customer support, all that, and flexible deployment. I won't go too deep into it; there are a couple more webinars available on our platform about StarTree Cloud that cover the distinction between bringing your own cloud and deploying using StarTree Cloud. Since this is a 101, I won't go down that rabbit hole. So let's talk about what that "more" part is. Hopefully you can see this diagram.
At the center is that little black box, Apache Pinot, and everything is built on top of it. Then we've added StarTree extensions. The extensions are proprietary; they're what we wrote to support the cloud infrastructure, a lot of pieces that let us manage multiple pools, multiple customers, and so on. Those sit on top of the open-source Apache Pinot product. Then there are value-added services we created, and I'll walk you through some of them so you don't have to remember them: a data manager, security, ThirdEye. These are services built on top of StarTree Cloud that help your data estate with ingestion, monitoring, security, and so forth. On security and compliance: we are SOC 2 compliant and support OAuth and all that. And then the infrastructure: we manage the cloud infrastructure for you, so you don't have to worry about scaling up or down; that happens automatically behind the scenes, and some of it you can control through configuration, saying, hey, I only want to grow this much or that much. So what does an application built on StarTree Cloud look like? The center box you see is StarTree Cloud: the value-added services, the StarTree extensions, and something called tiered storage. Consider a typical scenario for real-time data, going back to my Uber Eats example where I was hangry. Say I place that order. I'm really interested in it; I'm on my phone every thirty seconds checking: where's the driver? What's going on? I want to know.
But once the food is delivered, even ten minutes after, that data loses its importance, because it's done. It's not gone, and it's not unimportant, but the value of the data is in its freshness. The way Apache Pinot and StarTree Cloud handle this is something called tiered storage. The most valuable data lives in storage that's highly accessible and fast, and the older data moves to a lower tier that's less expensive, still accessible and still very fast, but not the same as the current data. That's what tiered storage does. On the left side you have the streaming pipeline. We support streaming ingestion: as you remember, the original use case for Apache Pinot was Kafka, so we support Kafka, and of course other streaming platforms as well, Kinesis and many others. Further left is the kinds of data you want: you can ingest from databases, from files, from IoT devices, and so on. On the right-hand side you can have dashboards, observability platforms, applications, whatever, because now that your data is in a fast real-time OLAP, you want to do something with it, and that "do something" part is the right-hand side. Okay, let me pause for questions. How's it going, everyone? Does this make sense? A one for yes. I like ones. There's a question: why would I want to use StarTree Cloud? Great question; let's hold on to that, Paul, I'll definitely come back to it. Let's work with StarTree Cloud, and then you can tell me whether you want to use it or not. One simple answer: if you want to do some prototyping, StarTree Cloud is a great way to start. Next question: how does tiered storage work with Pinot?
And: what is the competitive difference versus the other public clouds? First, how does tiered storage work with Pinot? You decide when your data moves from the current storage to the lower tier. In some cases, such as the Uber Eats use case we talked about, it's very likely a time-bound thing: as transactions come in, they need to be in the current store, the hot store, and as time passes you probably want to move them. You could set that threshold to an hour, a day, whatever you like. There are use cases where people are monitoring transactions happening in a bank or something like that, where it could be a week or thirty days; in some cases it's one day, in some cases an hour. So you set the parameter for when data moves from the hot store to the cold store, and that's how it works. What is the competitive difference with the other public clouds? StarTree Cloud is a purpose-built cloud for real time; it offers only one product, the database as a service. If you're comparing it with Azure, AWS, or GCP: StarTree Cloud is built on top of them, so it's actually using those general-purpose clouds. I hope that answers your question. Does tiered storage use Iceberg, in the near future if not now? No, it does not, and the reason is that StarTree Cloud is super fast and uses a custom format, built so that Apache Pinot can read it quickly. As for the future, I know Iceberg is big; Iceberg was adopted by many companies to solve a problem that Apache Pinot really hasn't run into so far. Is there a possibility we'll implement Iceberg support in the future? Maybe. I'm not on the engineering team, so you should definitely attend the engineering debriefs that we do.
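The hot/cold cutoff described above can be sketched as a small Python helper. This is purely illustrative: the function name and `hot_window` parameter are my own, not a StarTree API; in StarTree Cloud the boundary is a table-level configuration, not application code.

```python
from datetime import datetime, timedelta, timezone

def storage_tier(event_time: datetime, hot_window: timedelta) -> str:
    """Return the tier a record would land in, given a configurable
    hot-data window (an hour, a day, or thirty days, per the talk)."""
    age = datetime.now(timezone.utc) - event_time
    return "hot" if age <= hot_window else "cold"

# A day-old order with a one-hour hot window falls into the cheaper tier:
day_old = datetime.now(timezone.utc) - timedelta(days=1)
tier = storage_tier(day_old, hot_window=timedelta(hours=1))
```

The point of the sketch is simply that the threshold is a knob you choose per use case, not something the engine decides for you.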
There's a community call, I think, next month, and we can certainly bring that up there. How fast, and at what scale and volume? Are responses microseconds, nanoseconds? All responses are sub-second; milliseconds is the typical response time. For specific numbers, I'll share some metrics with you in future. Yes, Near says it's milliseconds too. Alright, let's build something. Here's what we're going to do: set up a new StarTree Cloud account, ingest data, connect using Python, and then create a dashboard. One of the questions Paul asked, and I'm going to answer it now, is: why use StarTree Cloud? First of all, the free tier is free forever, and, wait a minute, I wrote it down: you can load a hundred gigabytes of data on the free tier and run half a million queries per day. If you think about it, that already exceeds most applications' requirements, which is very impressive. A lot of times the typical journey is that people come, prototype on StarTree Cloud to evaluate whether it makes sense for them, and then, once they like it, upgrade to a paid version. Alright, with that, I'm going to share my screen. Let me hide my slides and share. Oh, I'm having trouble sharing, folks. There you go. Let me know if you can see the screen. Can you see it? Perfect. This is the stage, and this is me. This is the URL, and it's a public recipe, so you can follow along with me. Let's get started with StarTree Cloud.
The URL for StarTree Cloud is startree.cloud. You'll see this screen, and all we need from you is email, first name, last name, password, and company; we don't even ask for a credit card or anything like that. You can create a free account, or sign up with your Google or GitHub account. I already have an account, so I'll use that, and I'm purposely not using a StarTree account: this is just like any other free account. I run an organization called Wit Voices, so I'm using that on purpose. This is what you see when you log in for the first time. There's a "begin here: go to data manager". It is a database, so the first thing you want to do is look at data and connect. The second step might be to query the data, or to connect with your client. And of course you can invite team members to your account via email, and they'll be able to access your deployment of StarTree Cloud. There's also anomaly detection, called ThirdEye, which you can use. We also provide links for joining our community, the documentation, and some videos. But let's get started with the data manager; that's the first thing we want to do. I do have some datasets I've created already, but I'm going to go ahead and create a new one. Go ahead and create dataset, and you'll see the steps for creating it. We have some sample datasets you can use if you're just starting out and want to play. You can connect to Kafka, including Confluent Cloud, which runs managed Kafka much as StarTree Cloud runs managed Pinot. You can connect to Amazon Kinesis, and you can access cloud object stores such as S3.
So if you have a place with files, CSV, Avro, JSON, or any of these formats, you can pick that. It also connects to Delta Lake, so that's an option, and it supports cloud data warehouses such as Snowflake and Google BigQuery. In our case, I've downloaded these files from this URL, remember? So I'm going to say upload a file, then select file. Actually, I'm going to cancel; I cloned the cloud recipes repo and have some sample data here. So: go here, code, cloud recipes, go to data, and here's my CSV file. Click next. I'm going to call it "test", and I'll show you why. Then you can check sample data, and it will try to map it. This is clickstream data: a click happened at this time, by this user, and the duration was this many seconds. I say next, and this is where the Apache Pinot magic happens behind the scenes. It has already decided that duration is a metric of type long, timestamp is a date-time, user ID is a dimension, and event type is a dimension. We can look at the preview, which shows what the schema looks like and what the data looks like. I'm not going to change anything; this is just getting started, so I'll go ahead and move forward. Note that it distinguishes the timestamp field from everything else, because this is OLAP, and OLAPs do aggregation queries; generally speaking, time is a good axis for that, since it's often time-series data, though it doesn't have to be. At this point you can say, retain this data for a hundred and eighty days, and I can also adjust the replication factor.
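The column typing inferred above, duration as a long metric, timestamp as a date-time, and the rest as dimensions, corresponds to a Pinot schema. Below is a hedged sketch of what such a schema might look like, written as a plain Python dict; the field names follow the demo's clickstream CSV, but the exact JSON StarTree's data manager generates may differ in detail.

```python
import json

# Field names here mirror the demo (duration, timestamp, userId,
# eventType); treat this as a sketch, not the generated config.
clickstream_schema = {
    "schemaName": "test",
    "dimensionFieldSpecs": [
        {"name": "userId", "dataType": "STRING"},
        {"name": "eventType", "dataType": "STRING"},
    ],
    "metricFieldSpecs": [
        {"name": "duration", "dataType": "LONG"},
    ],
    "dateTimeFieldSpecs": [
        {"name": "timestamp", "dataType": "TIMESTAMP",
         "format": "1:MILLISECONDS:TIMESTAMP",
         "granularity": "1:MILLISECONDS"},
    ],
}

schema_json = json.dumps(clickstream_schema, indent=2)
```

The metric/dimension/date-time split matters because it drives how Pinot indexes and aggregates each column.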
So I can say: create a replica of this data, so that if one segment gets corrupted, I can rely on a second copy; replicas are sometimes also used for concurrent queries. We can look at what the table config looks like, and it says you're creating an offline table here, because it's batch ingestion, and then it defines a lot of other things. You can have tenants, which segregate who accesses which segment of data or which table. It also has dictionary columns, event type and user ID, it has replication, all this stuff. I'll select apply to accept it, say next, and then create the dataset. It always says failed at first, but if you refresh you'll see it just takes a minute. So ingestion happened. What happened? Let's go back and see that the test dataset got uploaded. I can also go to the query console and query this data, and there you have it. That's how easy it is to ingest data. Before I go any further, any questions about this? Nicholas asks: any better Azure cloud integration coming in the pipeline? We do have Helm charts available that let you deploy right into Azure, and we also have an offering in the marketplace to deploy Apache Pinot on Azure. Nicholas, reach out to me, or tell me now: what specifically are you looking for? Are the Helm charts or the marketplace not good enough? Bill Reynolds says StarTree Cloud has really flexible modeling options and is easy to try out, as you see. Yes, amazing. Alright, I'm going to share more, because there's more. Now that we have some data, let's run an app. Let me share my entire screen again, and this time I'll go into Visual Studio. What I have is the clickstream data, and I'll share the URL with you; I have a simple app, and app.py is actually a Streamlit app.
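The pieces of the offline table config discussed above, the 180-day retention and the replication factor, can be sketched as a minimal Python dict. Real generated configs contain many more sections (tenants, indexing, dictionary columns); the values below simply mirror the demo and are not the literal output of StarTree's data manager.

```python
# Minimal sketch of the OFFLINE table config fields from the walkthrough.
table_config = {
    "tableName": "test_OFFLINE",
    "tableType": "OFFLINE",           # batch ingestion -> offline table
    "segmentsConfig": {
        "timeColumnName": "timestamp",
        "retentionTimeUnit": "DAYS",
        "retentionTimeValue": "180",  # per the Q&A, 0 means keep forever
        "replication": "2",           # survive one corrupted segment copy
    },
}
```

The replication value is what lets a query fall back to a second copy of a segment, and it can also spread concurrent query load.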
And there's a Docker container that runs the Streamlit app. If you look at the README, in the right folder, you'll see that you can run this whole thing, or you could just do it manually like I showed you. The reason I showed it manually is to show how you can use the UI. Typically you wouldn't use the UI to upload data, especially if it's connected to an S3 bucket with continuous updates, ingesting every time a file drops, or even streaming, but I wanted to show you the user interface. So let's run this make app. What make app does is build a Docker image that runs a Streamlit dashboard, and using that dashboard you will be able to, okay, I'll remove this. What happened here? Yeah, that's just wrong; of course, I ran it five minutes before. Okay. No idea what's going on. Great. Yay. It wouldn't be fun if we didn't run into, oh, the container already exists. Alright, so we're going to change the Docker file. Actually, Docker images. Yep. Remove your old images, Barkha, and that might help. Okay. Perfect. If I now bring this up, this is the clickstream dashboard, and basically it's a dashboard connected to the app, showing you some clicks. If this were ingesting from streams it would be more impressive, but it's a sample app. It's working, and it's kind of cool. One more thing I'll go over: if you look at this makefile, sorry, the README, you'll see that to run this against your cloud you need a .env file with your token, your workspace name, and your broker and controller URLs. Let's look at how to get those. Go back to StarTree Cloud, and remember, we did the data manager.
We did the query console. Remember we talked about the configure client? Here you go: you go here and generate an API token, and once you have it, you can use it. Here's your workspace name, and the format for the broker is always the same: broker.pinot dot something, and that something is actually your workspace name. With that, I'm going to stop sharing and go back. Yeah, I know, I figured it out, Nicholas; it took me a minute. I'm slow, especially on camera. So we did the demo. Question: what did you think of it? One for useful, two for maybe, three for no? Amazing. And it's very fast; it takes less than a minute to provision. The only reason I didn't sign up live is that it would create another account and I'd have to check my email, because it sends you a confirmation link, but you can do that right now. Let me quickly share this repo in the chat. This is what we did and what we implemented: we ingested data, we ran some queries, and we connected using an app, a Python app in this case, though of course you could use other stacks. The GitHub repo we used is this one. Yes, that's a good idea: a comparison between ClickHouse and Apache Pinot versus StarTree? Okay. I would say cloud versus cloud is fine; it's trickier if I had to compare something running locally versus the cloud, because there are optimizations available in the cloud, and the size of the cluster matters. So it's trickier than a one-to-one. That's the challenge in creating a comparison. We have done comparisons in the past, and there's an article we published based on them. The challenge isn't that we can run the same workload on three pieces of software; it's that the environments have to be comparable for the result to mean anything.
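The .env wiring just described can be sketched in Python. The broker URL pattern follows the broker.pinot.&lt;workspace&gt; shape mentioned on screen, but the domain suffix and environment-variable names below are assumptions; copy the exact values from your own workspace's Configure Client page rather than trusting this sketch.

```python
import os

def broker_url(workspace: str, domain: str = "startree.cloud") -> str:
    """Assemble a broker URL from a workspace name, following the
    broker.pinot.<workspace> pattern from the walkthrough. The domain
    suffix is an assumption; use the real URL from Configure Client."""
    return f"https://broker.pinot.{workspace}.{domain}"

# These would normally come from a .env file (e.g. via python-dotenv);
# the variable names are illustrative.
token = os.environ.get("STARTREE_API_TOKEN", "<your-token>")
url = broker_url("wit-voices")
```

With the token and URL in hand, a client library (or plain REST calls) can query the workspace exactly as the demo app does.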
I mean, I can always optimize my own setup. Okay, let's go to the Q&A. Does Apache Pinot cover all data warehouse use cases, like when we use AWS Redshift? I don't know what "all data warehouse use cases" means; it's not a general-purpose data warehouse, it's a real-time OLAP system, purpose-built for that. What types of APIs are available? Everything you saw is using APIs. When you deploy it using OSS, the controller URL and the broker URL are actually API endpoints. And everything you saw happening, I'll give you a hint, I used to work for Microsoft on Azure: just look at the transactions going from the web page to the back end. It's an API. So I'd say everything can be done using the API; I haven't found anything in particular that can't. Does Apache Pinot cover all data warehouse use cases? I don't know how to answer that, to be honest with you. Does tiered storage use Iceberg? We answered that. Is there any use case where the data can be retained forever? Yes, data can be retained forever. The way I set up this demo, I set the retention to a hundred and eighty days; zero means forever. Typically, and this is Christian's question, you want tiered storage so that the newer data is more readily accessible. Any better Azure integration: I talked about that. If it's a custom format, how can other engines use it? The file format is not externally readable. If other applications need the data, we have connectors from Apache Pinot to many, many other platforms, and you can use APIs to extract data or run queries or whatever. The difference is the stored data format versus what's usable. It's a columnar database, highly indexed, and it uses compression.
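Since the talk says everything goes through API endpoints, here is a hedged sketch of what a query call to a Pinot broker might look like. The `/query/sql` endpoint and JSON body shape follow Apache Pinot's standard broker query API, but the bearer-token header and URLs below are illustrative; check your own deployment's documentation before relying on them.

```python
import json

def build_query_request(broker: str, token: str, sql: str) -> dict:
    """Assemble the pieces of a broker query call: the standard
    /query/sql endpoint, an auth header, and a JSON body."""
    return {
        "url": f"{broker}/query/sql",
        "headers": {"Authorization": f"Bearer {token}",
                    "Content-Type": "application/json"},
        "body": json.dumps({"sql": sql}),
    }

req = build_query_request(
    "https://broker.example", "my-token",
    "SELECT eventType, COUNT(*) FROM test GROUP BY eventType")
# With the requests library installed, you would send it as:
#   requests.post(req["url"], headers=req["headers"], data=req["body"])
```

Connector libraries for Python, Go, Java, and others wrap exactly this kind of HTTP call, which is why the speaker says anything the UI does can be done via the API.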
It's using all of those things in order to be near real time; that's one of the reasons it is what it is.

"What types of APIs? I was referring to languages." It uses a REST API, and there are connectors available for Python, Go, Java, and many other languages. So you can write your own client, use the REST API directly, or use one of those connectors. I'm not up to date on every flavor of connector, but there are a lot of them. Also, Apache Pinot is an open-source project, so there are a lot of contributors. Uber, for example, has added a lot to the product because they needed it for their own use cases: they have a proprietary index, and they also added infrastructure so you can run things like PromQL against Apache Pinot. So it's highly extensible.

Did I miss any Q&As, Camille? "How can Pinot with tiered storage be compared to a conventional data warehouse like Redshift?" Well, we're cheaper, faster, more scalable, and, you know, better. We've competed in many, many data warehouse accounts, and in many cases we performed a lot better. Now, I'm not saying it will work for every use case in every scenario, because it depends on what you're doing with your data warehouse. If you have a very specific need and your data warehouse handles it, great, keep it. What we provide is a real-time analytics platform: it's very good at scaling and very good at high-concurrency queries. If your data warehouse has only a hundred or a thousand users, you don't need high concurrency. LinkedIn has millions of users. Right?
Stripe has millions of users. So Pinot is very high-concurrency, very purpose-built for specific things. Any other questions? Am I missing anything? I'm jumping between the Q&A and the chat, so if I miss something, let me know.

"What does the query look like before the dashboard?" Okay, let me quickly share this, and then we'll come back to your question, Shrikant. Great questions, great engagement; I love the enthusiasm. If you want to learn more, come to our virtual RTA Summit, where many of the people working with Apache Pinot, as well as ClickHouse and other real-time analytics platforms, will be presenting. We have amazing speakers from the end users (Sovereign, Uber, Stripe) and from LinkedIn contributors. Registration is open, it's free, and it's virtual; you'll learn much more there, with answers to these questions that are more intelligent than mine. And join our Slack community; we are very active, and some of these questions can be easily answered there. And of course, this is me. Connect with me if you have questions.

But I do want to answer that question. I have only six minutes left, but I'm going to stop the slides and share my entire screen. We're going to go here, and I'll give you another URL: there's startreedata.learn. If you go there, you can go to the Pinot advanced course, and these are all exercises you can do. The question was about queries, so here are some sample queries; these are all queries you can run against the dataset we created. You can see there are joins, JSON functions, all kinds of things. If you have more questions, definitely join the community chat and ask a specific question; that is the best way to engage. And with that, did I miss anything?
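To give a flavor of the kinds of exercise queries mentioned above, here are two hypothetical Pinot SQL statements held as Python strings. The table and column names are invented for illustration; joins like the first one require Pinot's multi-stage query engine.

```python
# Hypothetical Pinot SQL samples; all table and column names are made up.

# A join across two tables (requires Pinot's multi-stage query engine).
JOIN_QUERY = """
SELECT o.userId, u.country, SUM(o.amount) AS total
FROM orders AS o
JOIN users AS u ON o.userId = u.userId
GROUP BY o.userId, u.country
LIMIT 10
"""

# Pulling a field out of a JSON column with
# JSON_EXTRACT_SCALAR(column, jsonPath, resultType).
JSON_QUERY = """
SELECT JSON_EXTRACT_SCALAR(payload, '$.device.os', 'STRING') AS os,
       COUNT(*) AS hits
FROM events
GROUP BY os
LIMIT 10
"""

if __name__ == "__main__":
    print(JOIN_QUERY)
    print(JSON_QUERY)
```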
We have four minutes left. So before jumping off, I'd love to get some feedback: one for great, two for okay, three for "do better, Barkha." I would also love suggestions on what else you want to see.

Christian asks, "Can Pinot also be used to query historical data constantly?" Yes, absolutely. Unless you put a time-to-live on the records, the data doesn't disappear. What stays in the hot tier versus the cold tier is up to you, and it doesn't have to be time-based; it could be based on something else. How you configure your tables is up to you. There's a slight difference between queries against the cold tier and the hot tier, but because the data is highly segmented, all queries are fairly fast. Think about Stripe, with petabytes of data: it's a financial institution, so disappearing data would be unacceptable. They keep data for several years, at least seven, for tax purposes and everything. So yes, the data is all available; it doesn't disappear unless you mark it to, by putting a time-to-live on your records.

"How does Pinot handle big-data queries, and what are its limitations?" That's a very open-ended question, so it's difficult to answer, but Pinot was purpose-built for high-concurrency, low-latency queries. Think about how many queries hit the Pinot deployment that LinkedIn runs on, or the one behind Uber Eats. The question isn't what the limitations are; the question is how you design your Pinot cluster to support the use case that you have.
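The retention and hot/cold behavior described here is driven by the Pinot table config. Below is a minimal sketch with an invented table name and tier names; the field names follow Apache Pinot's table-config schema, and per the talk, setting retentionTimeValue to 0 (or omitting it) keeps data forever.

```json
{
  "tableName": "payments",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeColumnName": "ts",
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "180"
  },
  "tierConfigs": [
    {
      "name": "hotTier",
      "segmentSelectorType": "time",
      "segmentAge": "7d",
      "storageType": "pinot_server",
      "serverTag": "hot_OFFLINE"
    },
    {
      "name": "coldTier",
      "segmentSelectorType": "time",
      "segmentAge": "30d",
      "storageType": "pinot_server",
      "serverTag": "cold_OFFLINE"
    }
  ]
}
```

Segment selection doesn't have to be time-based; Pinot supports other selector types too, which matches the point that hot versus cold is up to how you configure the table.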
And the Pinot cluster designed for Uber is very different from one designed for, say, Cars.com, another customer that doesn't have real-time data but does have a lot of data and still uses Pinot. It depends on each use case.

All right, we have less than two minutes. Any last questions? If not, I'm going to say thank you so much. I thoroughly enjoyed this presentation because it was so engaging. Again, I'm very interested in hearing what other kinds of webinars you would like to have; we'd love to host them if you would love to attend them. Anything I'm missing, Camille and Ellie? Thank you, Christian; thank you, Paolo, and everybody who participated. Great. In that case, I'm going to stop sharing, and hopefully I'll see everyone at RTA Summit. Okay, real-world new use cases. Got it.