Ruminations #16: RAG, resources on moats, GenAI's value "grossly over-estimated", Sora
The usual mix of musings
Hi all,
Firstly, thanks so much to all of the folks who 1) came up to me at Southstart to chat about this blog and 2) provided thoughts and questions off the back of my 1000 Conversations article. I massively appreciate both!
I cover a bunch of random topics today:
Sora
GenAI’s true economic value
RAG
Resources on data-based moats
Resources the team enjoyed
Send me your thoughts on any of the below, but note my reply may be delayed as our team is offsite next week.
I’m still thinking about Sora
I shared in an earlier edition of this newsletter that part of the excitement generated by OpenAI’s Sora had to do with the potential of building a world model: a model that understands physics and, consequently, cause and effect.
Whether OpenAI has accomplished this, and whether it’s even possible with the architecture they allegedly used, ignited plenty of debate amongst experts online.
Whilst I have no interest in weighing in on those research questions, I did share the following with my team after seeing this Instagram video on OpenAI’s account.
I saw this last night and thought it was a good illustration of what is yet to be solved in video generation: understanding the relationship between what happens in one frame of a video and the next. The generated video looks pretty great, but if you watch the steering wheel you can see that turning it left or right does not steer the car in either direction. That's because the model hasn't figured out the relationship between an action and a reaction. The model doesn't know that the movement of the steering wheel relates to the movement of the car and its position in space.
Is GenAI’s economic value over-estimated?
I saw this tweet by renowned AI researcher Yann LeCun and was surprised, given LeCun has been driving GenAI-related work at Meta. Does he also think GenAI is over-valued?
I asked a well-respected professor & researcher about the tweet. Here’s his response:
The problem with the LLMs is that people confound the language understanding and the knowledge base. Language understanding is incredibly powerful and represents a revolution. The knowledge stored in generative models is sparse and unreliable. LangChain uses the language understanding part of the LLMs far more than the knowledge, and thus makes a lot of sense. There are a bunch of extensions to this approach, including all of the retrieval augmented tech, and they have a lot of value because they massively reduce the training data requirements for ML businesses. The problem with the knowledge in LLMs is that the model has to interpolate between what it knows, [e.g.] the Prime Minister between Tony Abbott and Scott Morrison wasn't Tony Morrison.
The same opportunity exists with LMMs [Large Multimodal Models]. They are a very effective way to access the value in multimodal data, but if you want to generate images then yes, they're probably just for art because they hallucinate.
His response speaks to a couple of things I think about a lot:
The comment above reminds me of a quote from Andrew Ng that I shared in an earlier blog post: “Think of these models now as reasoning engines, not as memory machines.” This is the exciting opportunity of GenAI and what’s to come, I believe.
On LMMs (not a typo - Large Multimodal Models), he suggests that, in the same vein, we can get more value from them than simply “generating” images and video. This aligns with what I said above on Sora: researchers are more excited long-term about the “world simulation” capabilities of video generation models than about the generated videos themselves.
RAG (Retrieval Augmented Generation)
I was intrigued to see Cohere’s new model, Command-R, now out in the world. They seem to have hit on many of the focus areas of enterprises in this release: it’s built for RAG and tool use (like calling APIs) - both of which can provide much greater accuracy than relying solely on parametric memory - and it’s low latency and high throughput at a lower price than their other offerings. It sounds like it’s also designed with productionising in mind. So many wins for enterprises currently struggling to take MVPs to production!
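For anyone who wants to kick the tyres, here’s a rough sketch of what grounded (RAG-style) generation looks like with Cohere’s Python SDK. The invoice snippets are made up, and exact parameter names may have shifted since writing, so treat it as illustrative rather than gospel:

```python
import cohere  # pip install cohere

co = cohere.Client("YOUR_API_KEY")

# Command-R is built for grounded generation: pass retrieved documents
# alongside the message and the model answers from them, with citations.
response = co.chat(
    model="command-r",
    message="How much did we spend on legal fees last year?",
    documents=[
        {"title": "Invoice 042", "snippet": "Smith & Co, legal services, FY23: $48,200"},
        {"title": "Invoice 057", "snippet": "Jones Legal, retainer, FY23: $12,000"},
    ],
)

print(response.text)       # the grounded answer
print(response.citations)  # spans linking the answer back to the documents
```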
I would be interested in hearing from researchers and devs who take a look: what do you find compelling about this release and what’s just marketing? I’m very interested in seeing how the GenAI toolkit for enterprise plays out over the next 12 months. It will hopefully be a great accelerant for AI adoption and hence for startup revenue.
Friend of Square Peg, James Alcorn, had a great, short write-up on RAG in the recent Zetta Venture Partners newsletter that I thought I’d share here for those who aren’t as familiar with RAG (I’ve added a couple of rough code sketches of my own after the excerpt):
Retrieval Augmented Generation has become an enormously popular LLM inference technique, growing hand-in-hand with the AI boom. Practically speaking, ‘RAG’ is the process by which AI developers give an LLM the information needed to answer specific, data-dependent questions, like “How much did my company spend on legal fees last year?” To answer this question, an LLM would need to see the company’s invoices (information not contained within its training set) from which it would extract and summarize the prior year’s expenditures on legal services. Or as developers would say, the model would draw additional context (the company’s legal invoices) into its context window at inference-time, and in doing so, would have ‘augmented’ its ‘generated’ output with the information it ‘retrieved’. Thus: ‘Retrieval Augmented Generation’.
The recipe for foolproof RAG is shockingly simple: fetch useful context. The hard part is determining what is and isn’t useful. Developers solve this using a vector database, like Weaviate, to retrieve additional context that is semantically similar to the user’s query. This functionality made ‘RAG on vector DBs’ a top AI initiative for executives and their boards around the globe last year, and skyrocketed vector databases to prominence as the perfect hammer to RAG’s nail.
As the AI application market has evolved, new RAG approaches are emerging to serve new use cases. The early cohort of popular LLM applications (e.g. image generators, chatbots, and email writers) traffic almost exclusively in words, images, or video, three categories of unstructured data that embeddings do a great job of representing. But when LLMs are asked to reason over symbols or objects, like software code, that are formally defined by their own internal set of logic, properties, and methods, embeddings can struggle.
Embeddings are low-dimensional representations of high-dimensional data - by definition lossy. That is precisely what makes them so useful when processing unstructured data. But symbolic systems, like code, are governed by a logic that we can expose concretely to the LLM via alternative ontologies like a graph, which is purpose-built to capture logical information and hierarchical relationships. Graphs, or graphs in concert with embeddings, may better equip the LLM for complex reasoning tasks in symbolic domains, and therefore produce better answers to harder questions.
Graph databases are nothing new; in fact, they have followed a rise-and-fall pattern over the last two decades because, compared to other database formats, they can be slow, expensive, and difficult to manage. The rise of LLM-powered applications reasoning about structured or symbolic data may finally be the significant market use case the technology has been waiting for. Symbolic ontologies are an increasing target for AI applications because they can enable unprecedented levels of autonomy, reasoning, and understanding, especially in the context of software engineering tasks.
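To make the mechanics concrete, here’s a minimal, self-contained sketch of the retrieve-augment-generate loop. The bag-of-words “embedding” and the invoice snippets are stand-ins I’ve made up so the example runs without any API; real systems use a learned embedding model and a vector database like Weaviate:

```python
import math
from collections import Counter

# Toy "embedding": a bag-of-words vector. Real systems use a learned
# embedding model; this stand-in keeps the retrieval step runnable.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical company documents - not in any model's training set.
documents = [
    "Invoice: Smith & Co, legal services, FY23, total $48,200",
    "Invoice: Acme Cloud, hosting and compute, FY23, total $8,100",
    "Invoice: Jones Legal, retainer, FY23, total $12,000",
]

query = "How much did my company spend on legal fees last year?"

# Retrieve: rank documents by similarity to the query, keep the top k.
query_vec = embed(query)
ranked = sorted(documents, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
context = "\n".join(ranked[:2])

# Augment + generate: stuff the retrieved context into the prompt and
# hand it to an LLM (the model call itself is elided here).
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)
```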
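And a toy version of the graph idea: when the data is symbolic (here, a made-up call graph), retrieval can traverse logical relationships directly rather than relying on semantic similarity:

```python
# A made-up call graph standing in for a symbolic ontology: nodes are
# functions, edges are "calls" relationships.
calls = {
    "compute_invoice_total": ["apply_tax", "sum_line_items"],
    "apply_tax": ["lookup_tax_rate"],
}

def related_symbols(symbol: str, depth: int = 2) -> set[str]:
    """Walk the call graph to collect context logically tied to `symbol`."""
    frontier, seen = {symbol}, set()
    for _ in range(depth):
        frontier = {callee for f in frontier for callee in calls.get(f, [])} - seen
        seen |= frontier
    return seen

# Context an LLM would want before editing compute_invoice_total -
# relationships an embedding lookup over raw source text could blur:
print(sorted(related_symbols("compute_invoice_total")))
# ['apply_tax', 'lookup_tax_rate', 'sum_line_items']
```

The data structure itself isn’t the point; it’s that in symbolic domains, “useful context” is often defined by relationships rather than by similarity.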
If data isn’t as much a moat anymore, what is?
A reader of this blog emailed me about my 1000 Conversations article to ask: if you don’t believe in data being as strong a moat for LLM-based AI businesses, what moats are relevant?
I wrote:
I am still developing my thinking on how moats develop in software businesses and, of course, AI businesses. But at the moment I believe:
Few software businesses start out today with moats, and that's ok. They develop over time, and they develop through the advantage gained by building good products fast.
Application-layer AI startups will likely develop the same type of moats software companies have in the past. In some cases that means legitimate data moats, in other cases it looks like more traditional moats like those covered by Hamilton Helmer's 7 Powers.
More technical AI-native companies will look different from a moat perspective, but I haven’t focussed on that heavily as I don’t see them as often in Australia/NZ.
I have a longer article I’ve been sitting on for a while about data moats in particular. I haven’t shared it because I want to make sure I get the nuance right and provide real examples. In the meantime, some recommended reading:
7 Powers - NFX
The New New Moats - Greylock
Why are data network effects less valuable than regular network effects? - Platform Chronicles
Six questions to evaluate data-enabled learning moats - Platform Chronicles
Creating feedback loops from data and AI - Platform Chronicles
Content our team enjoyed recently
Philippe shared this interview with the MD of Insight Partners, with the comment:
George Mathew, who led most of the AI investments at Insight, has interesting views on AI investment strategy. It’s also a good overview of the Israeli AI market.
Lucinda enjoyed this NYT podcast episode:
An interesting quick listen from The Daily about Google Gemini and whether AI should be guided by social values.
Ben shared Cloudflare’s 2023 Year in Review with the team, as we spend a lot of time in our listed equities team looking at tech infrastructure players. The report shares the GenAI sites with the highest activity, some of which I wouldn’t have expected to see there. I’m interested to see where Perplexity lands next year - it’s become the favourite search engine of a few of us at Square Peg.
That’s all! Thanks again for reading!
Casey