Data-Sleek founder and CEO Franck Leveneur appeared alongside Couchbase CMO John Kreisa on DM Radio’s Really Real-Time Data, hosted by Eric Kavanagh. During the interview, Leveneur, Kavanagh, and Kreisa discussed the importance of data management, the role of real-time data, and the future of AI in business.
In the first segment of the interview, Kavanagh, Leveneur, and Kreisa discuss the evolution of real-time data management and the challenges faced in delivering personalized experiences using modern technologies. Leveneur and Kreisa share their insights on the importance of JSON-based databases, the rise of hybrid transactional analytical processing (HTAP), and the integration of vector databases with large language models.
Kavanagh and Kreisa begin the discussion with highlights from Couchbase’s Capella platform. Couchbase combines an operational data store with a columnar store for real-time analytics. Their platform reduces latency and provides for adaptive applications. Kreisa emphasizes the flexibility of JSON-based document databases in handling diverse data structures while delivering personalized experiences.
Leveneur, with almost three decades of experience in databases, discusses the challenges of managing structured and unstructured data from various sources. He underscores the importance of choosing the right database engine and architecture early in an organization’s data journey.
The first segment also touches on the potential of vector databases, which convert text and imagery into numerical values. This enables efficient comparison and consensus-based analysis. The interview explores the implications of these technologies for real-time data management and the integration of large language models into workflows.
Key Takeaways
- Using real-time data to offer hyper-personalized experiences is increasingly critical for data-driven enterprises.
- Choosing the right database architecture is critical–changes can be costly and disruptive.
- Vector databases are essentially consensus engines.
The Role of Real-Time Data Interview: Segment 1
Host: Eric Kavanagh
Guests: Franck Leveneur (Founder, Data-Sleek), John Kreisa (CMO, Couchbase)
Broadcast May 23, 2024
Find the full podcast on DM-Radio.Biz Here.
Read Part 2 of this interview:
Read Part 3 of this interview:
Eric Kavanagh: Ladies and gentlemen, hello and welcome back once again to the longest-running show in the world about data. It’s called DM Radio. Yours truly, Eric Kavanagh here, in year 17 of the data management radio show. We’ve been rocking and rolling for quite some time now, and some things change a lot. Some things never change. And one thing that will never, ever change is the need for data. That’s what the show is about. It’s all about data management and data persistence.
Obviously a lot of attention is on AI these days. Do you know what AI needs to do its job? Data, lots and lots and lots of data. Today we’re talking about one of my favorite topics which is really real time data.
That actually is a bit of a play on one of my favorite movies, Repo Man. The music for our show is from the movie Repo Man, from way back in 1984, I think with Emilio Estevez and a couple of other characters.
There’s this one scene where he says he had the dream and “it’s really real, it’s realistic.” So much so, in fact, that the first article I ever wrote in this industry, back in 2002 I want to say, was all about real-time data.
So 22 years ago we were talking about real-time data, and things have changed since then; there’s a lot going on. There are a lot of different engines out there to do it.
It’s a lot less expensive to do it, and computers are much more powerful these days. We have distributed architectures. There are lots of different ways you can “fry the fish” these days, if you will, and we’re going to find out about those from our guests today.
Exploring the Role of Database Vendors
Eric Kavanagh: We’ve got John Kreisa from Couchbase and also Franck Leveneur from Data-Sleek. Both experts in the data field. When push comes to shove, like I said, there are lots of different ways you can do this.
Buying enterprise software is a very serious matter. You want to make sure that you buy the right technologies. You want a vendor who’s going to work with you, be there for you, change, and adapt over time.
And of course, Couchbase is one such company. They’re one of the many companies that spun up to compete against the former Goliaths: Oracle, for example, and IBM with DB2.
I remember watching this database explosion a number of years ago and it’s really quite impressive. You got a whole bunch of different databases–there’s like a dozen or more of these open-source databases–and they’re all fit for purpose. They’re all doing interesting things.
These days I think there’s like 147 established database vendors. So what does that mean? It means you’ve got a lot of choice, but you’ve got to figure out where one database engine excels versus another. That’s what we’re going to find out today.
So with that, John Kreisa, what brings you in from Couchbase? Tell us what’s happening in the real-time world. You guys have done some interesting stuff lately. What’s the latest with Couchbase?
Couchbase’s Real-Time Application
John Kreisa: Thanks, good to be with you again Eric, and good to be with your audience. I’m John Kreisa, I’m the chief marketing officer here at Couchbase and we offer Couchbase Capella, which is a cloud database platform for modern applications.
Those modern applications include operational data, but we’ve also added a columnar data store for real-time processing and analytics of that data. It eliminates some important latencies and serves some real important needs in terms of giving enterprises the ability to deploy applications.
Adaptive applications, as we call them, for their customers. These can react to real-time data, to real-time inputs, and become much more situationally aware and hyper-personalized. You can take a wide variety of data into the database and deliver those experiences.
Eric Kavanagh: It’s interesting, you know, you were mentioning before the show this new columnar store you’ve added to do what amounts to in-database analytics, as the term we used to use. You think about how we got here, and obviously there are some big Goliaths that are still around.
For example, IBM, of course, and Oracle. They’re all still selling software. But you know, when you can bolt on functionality like that, you are really serving all sorts of different purposes. To your point, historically, you would have had to have some other tool: you pull the data out into that tool, and that’s what you do. That increases not only latency, but it creates another choke point. It creates another bottleneck. It creates another place where things can break.
So when you can do that inside, I mean, the real question is, you know, “how do you set that up and how do you get it all running?” But I think it makes complete sense to have the one system that you’re using as your foundation, your data foundation, serve both of these purposes. We can do that today efficiently, right?
John Kreisa: Yeah, that’s right. By having them side by side in the same architecture, it reduces, as you said, that latency. There’s no ETL process to move data to another system where it gets processed.
In addition, there’s an impedance mismatch that gets overcome. We are a document-based database, based on JSON, and the columnar store also operates on JSON, so the data can transfer seamlessly between the two. That just gives a much faster, better experience for providing those analytics back into the applications and back into the operational store. That’s the core foundation of Couchbase.
As you said, it’s an in-memory, distributed architecture for really, really interactive applications. Our customers run their most mission-critical and business-critical applications on Couchbase. So bringing that analytic capability close in there, the feedback we’ve had from customers has been super positive.
Navigating the Challenges of Large Language Models in Workflows
Eric Kavanagh: Yeah and you know, I heard a quote, this is a number of years ago. It’s probably almost 10 years ago now, but I’ll never forget this. A guy said that JSON is the JPEG of data. Right?
John Kreisa: I haven’t heard that one, but I like it.
Eric Kavanagh: Of course, JSON is this architecture, right? It’s a hierarchy. Basically, there used to be HTML and, wasn’t it XML or something like that?
John Kreisa: XML. A description language, certainly.
Eric Kavanagh: Yeah, it’s still there. I mean, people still use XML, but JSON just won the battle. JSON is everywhere. Maybe just explain to the audience and the business world out there why a JSON architecture matters for capturing all sorts of different structures of data.
I think that’s the key. It’s not just columns and rows. You’re talking about a whole hierarchy with nested data and all kinds of different things. Because you’re a JSON database by architecture, that means you can absorb all kinds of different–traditionally unwieldy–data types. Is that about right?
John Kreisa: Yeah, that’s correct. Using the document based on JSON as the fundamental storage structure and representation of the data gives you a lot of flexibility, because it’s self-describing.
You know what kind of structure and data is coming in, but it’s not limited to rows and columns. It can actually be widely variable in terms of how you set it up, so a document database can handle time-series data. It can handle graph-like data, it can handle medical data, medical records, it can handle transactional data.
There’s no doubt that Couchbase is being used for transactional applications serving financially-related use cases. We’ve got all that flexibility, and that comes from making the choice to use documents, a JSON-based document, as the core infrastructure. So lots of flexibility there.
JSON: The De Facto Reference Architecture
Eric Kavanagh: I’ve got another client that was doing some really interesting stuff with JSON, and they view JSON as a de facto reference architecture. I thought that was very interesting. You can use that as a personalization window into different entities or people: in that JSON structure, you can bring in characteristics of the entity, the person, the group, or whatever, and then that becomes key to your personalization efforts.
John Kreisa: Yes, that’s right. It’s metadata, if you will, stored amongst the very data itself, which gives you more flexibility in how you create an application: one that reacts to which user is there, what situation that user is in, and how you’re serving up that data. The experience you give them–a really hyper-personalized experience–that’s key to it.
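To make that idea concrete, here is a minimal sketch of a self-describing JSON document that carries a user’s characteristics alongside the data itself, plus a small function that branches on those characteristics to personalize an experience. The field names and logic are hypothetical illustrations for this article, not Couchbase’s actual schema or API.

```python
import json

# A hypothetical self-describing user profile document, as it might live in a
# JSON document database. Identity, preferences, and recent activity all sit
# in one nested record rather than in separate relational tables.
profile_doc = json.loads("""
{
  "type": "user_profile",
  "user_id": "u-1042",
  "name": "Ada",
  "segment": "frequent_traveler",
  "preferences": {"language": "en", "currency": "EUR", "seat": "aisle"},
  "recent_views": [
    {"item": "hotel-paris-07", "ts": "2024-05-20T10:15:00Z"},
    {"item": "flight-cdg-sfo", "ts": "2024-05-21T08:02:00Z"}
  ]
}
""")

def personalize_banner(doc: dict) -> str:
    """Choose a banner message from characteristics embedded in the document."""
    if doc.get("segment") == "frequent_traveler" and doc.get("recent_views"):
        last_item = doc["recent_views"][-1]["item"]
        return f"Welcome back, {doc['name']}. Still thinking about {last_item}?"
    return "Welcome! Here are today's top deals."

print(personalize_banner(profile_doc))
```

Because the document is self-describing, fields can be added or omitted per user without a schema migration, which is the flexibility Kreisa describes.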
Delivering a Personalized Experience
Eric Kavanagh: Well and so I’ll throw one last question over at you and then we’ll bring Franck into the conversation here. Personalization is going to be the key to success, it seems to me. And it’s like anything else in this industry, we’ve talked about it for decades. It’s not new. We’ve talked about it for a long, long time.
Lots of people have been kind of jaded on the concept because it never quite got there. I think it didn’t get there because of the architectures, because of the compute power, because of lots of different factors. But now it’s like, ‘no really guys, we really, really can do this now,’ right?
John Kreisa: Yeah, I think you’re right. I think it’s a combination of network speeds, processing power, and the flexibility that’s in the application. A lot of times the analytics really did have to go to a separate system to get the insight you needed to do that personalization. Now there are a lot of things out there, and we were talking before about AI and what that’ll do.
There’s certainly fraud detection that uses machine-learning-type AI applications operating in real time. But those are very complex architectures. Now it’s something where more applications can have that architecture and deliver that personalized experience.
The Evolution of the Data Management Industry
Eric Kavanagh: That’s good stuff. Let’s bring in Franck Leveneur. Franck, you’re in this space. You consult with your clients all the time on real-time data. We were joking that it’s not new. I wrote about it 22 years ago and it was new then; it’s not new now. It’s been around a long time, but it really is real-time now. What are your thoughts on the evolution of this industry and how close we are to delivering on long-offered promises? What do you think?
Franck Leveneur: Thank you. Well, first of all, I just want to introduce myself a little bit. I have about 25 years of experience in databases. I started with Microsoft SQL Server back in the day and then moved into MySQL. I also worked on AWS, optimizing MySQL, RDS, and Aurora, and I saw the evolution from managing a database on your own server to the database being managed by AWS. DBAs were afraid that we were going to be out of jobs.
But I think what’s important is to embrace the technology, to learn about it. There’s always going to be some work needed. Real-time analytics and real-time data have definitely evolved enormously, especially with mobile devices, IoT devices, and 5G. I think all of those are going to accelerate the data feeds. AI too, of course, being able to use cameras and detect whatever it is.
I was watching some videos on YouTube yesterday where some cameras can detect the behavior of people on the street, whether they are dancing or whatever, [whether] they are carrying a weapon, etc. So all this data is going to be fed somewhere. Some analytics are going to be done in the background. But there’s always going to be something happening.
The challenge is going to be choosing the storage, choosing the technology behind it, in order to process this very quickly and do whatever needs to be done. The challenge is also that real-time data comes in different data structures.
There is structured and unstructured data. Now we have sound, we have video, we have documents, we have text, we have JSON. That gets fed in through APIs. I would say it’s interesting and it brings a lot of challenges. It makes you think, architect, find the right solution, and find the right database engine to do the job.
Sometimes you choose the wrong one and then you have to migrate, which can be costly. I’ve seen that a couple of times, especially with startups: they’ll pick a database and they’ll grow very quickly. Because they have a lot of data and a lot of success, they then have to migrate, because they’re abusing the database. They’re running crazy queries on it.
Some people are very creative. They come up with SQL queries hundreds of lines long, and then they’re surprised that their application is not working properly. A lot of people don’t know that transactions are transactions and analytics are analytics. They are two different worlds, two different behaviors of the data. One is constantly evolving; it’s alive. The other one is at rest, and you’re just querying the history. So it’s a totally different behavior.
The Rise of HTAP and Managing Transactional Analytics
Eric Kavanagh: And there’s this thing that’s not terribly new either. I think it was probably nine years ago or so that I was looking into it: HTAP, as they call it, hybrid transactional analytical processing. Monty Widenius is a guy who was working on some of that stuff, as I recall.
Franck Leveneur: Yes
Eric Kavanagh: What they would do is have a sort of sniffer, meaning when the query comes through, the sniffer will look at it and say, ‘Hmm, is this a transactional workload or an analytical workload?’ and route it accordingly. Now that stuff, have you seen it in practice, and how well does it work from your perspective?
Franck Leveneur: Well, it’s actually a very good question, because we’ve been seeing these issues where, like you said, they want to use one database for both the analytical and the transactional. Back in 2014 or 2016, I discovered a database engine that at the time was called MemSQL. Now it’s called SingleStore.
Eric Kavanagh: Sure, yes.
Franck Leveneur: That’s what they do. It’s HTAP: they’re able to do transactions, but they can also do analytics. And now they’re also moving into the vector engine. So it’s a pretty scalable solution. They’ve also done the thing Snowflake does, which is separate the compute from the storage.
I think it’s also critical right now, today, to be able to not worry about your storage. That used to be a problem: your data space grows too quickly, you run out of space, and then your application goes down.
Now, I think, that’s a thing of the past. SingleStore has transitioned to that. They actually use S3 as a storage layer with some caching mechanism on top, and then you can scale the compute as much as you need. They shard the data for you. It’s an efficient platform even for data ingestion, being able to plug in and ingest data. It’s very useful.
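As an aside, to make the “sniffer” Kavanagh described above concrete: here is a toy sketch of HTAP-style query routing that classifies incoming SQL as transactional or analytical and picks an engine accordingly. Real HTAP systems rely on query planners and cost models rather than keyword matching; this heuristic and these names are illustrative only.

```python
import re

# Naive "sniffer": aggregate-heavy SQL is treated as analytical, everything
# else as transactional. A real planner would inspect the query plan and its
# estimated cost, not keywords.
ANALYTICAL_HINTS = re.compile(
    r"\b(GROUP BY|SUM|AVG|COUNT|OVER|ROLLUP)\b", re.IGNORECASE
)

def route_query(sql: str) -> str:
    """Return the engine that should handle this query."""
    if ANALYTICAL_HINTS.search(sql):
        return "columnar (analytical) engine"
    return "row store (transactional) engine"

print(route_query("UPDATE accounts SET balance = balance - 50 WHERE id = 7;"))
print(route_query("SELECT region, SUM(amount) FROM orders GROUP BY region;"))
```

The first statement touches a single live row, so it goes to the row store; the second scans history and aggregates, so it belongs on the columnar side. That split is exactly the transactions-versus-analytics distinction Leveneur draws above.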
Vector Databases and Workflow Integration
Eric Kavanagh: Yeah, and I’m glad you brought up vector databases. We should probably talk about that a little bit at least. Because, of course, these are the tools of choice to go alongside a large language model to host your embeddings, basically.
What these engines do is convert text and imagery to numeric values, and then they convert back to text or imagery on the other side. So you’re always going to have a little bit of lossiness to it, because you are doing this conversion. But nonetheless, the vector databases are very, very good at comparing and contrasting. They’re really–what did someone say? They’re really consensus engines.
What they’re doing is taking so much of this data: a dog starts here and goes like this, and all dogs go like this, or they’re in this general area. Cats are maybe over here; highways are over here. And somewhere in that expanse, you can mix and match different things. But they’re really consensus engines. So they’re doing different things than you would do for real-time data, for example.
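As a rough illustration of that “dogs cluster here, cats over there” picture, here is a toy sketch with made-up three-dimensional vectors. Real embedding models produce vectors with hundreds or thousands of dimensions, and nothing below reflects any particular vector database’s API.

```python
import math

# Invented toy embeddings: each concept is a point in a 3-D space. In a real
# system an embedding model assigns these coordinates, and similar concepts
# land near each other.
embeddings = {
    "beagle":  [0.9, 0.1, 0.0],
    "terrier": [0.8, 0.2, 0.1],
    "tabby":   [0.1, 0.9, 0.0],
    "highway": [0.0, 0.1, 0.9],
}

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Angle-based similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Rank everything by similarity to "beagle": dog-like entries come out on top.
query = embeddings["beagle"]
for score, name in sorted(
    ((cosine_similarity(query, vec), key) for key, vec in embeddings.items()),
    reverse=True,
):
    print(f"{name}: {score:.3f}")
```

The dog-like terms rank closest to the query, which is the “consensus” behavior described above: the database surfaces what the bulk of nearby vectors agree is similar.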
Nonetheless, they’re a huge force out there. Everybody’s working on their vector databases. Our first break is coming up here, but this whole new gen AI spin has clearly captured the imagination of companies everywhere. It is not fit for every purpose in the data world; it is not fit for deterministic use cases. It is very fit for stochastic use cases, for discovery, for playing around with ideas, and for learning things.
But I’m here to say that as a front-end, these large language models are going to fundamentally change how we interact with information: how executives interact, how working people interact. It’s a big deal. That vector database is a big part of that equation. But don’t touch that dial, folks. We’ll be right back in one minute. You’re listening to DM Radio.
This interview has been edited lightly for clarity.
Find the full podcast on DM-Radio.Biz Here.
Jump to Part 2 of this interview:
Jump to Part 3 of this interview:
Jump to Part 4 of this interview:
Want to host Data-Sleek on your next podcast? Contact us for our speaker sheet and to set up a free consultation about real-time analytics.