Ruminations #15: For investors, AI prompts more questions than answers
Plus, the surprise of Sora continues
Hi all,
Often when I update my team on what I’ve learnt about AI during the week they say something along the lines of “That’s interesting, but what’s the business implication?” to which I often reply: “I don’t know… yet. I’m trying to get to that”.
Thinking through investing in AI has felt a bit like pure versus applied research. With applied research, you aim to find a solution to a real problem. With pure research, you don’t necessarily know how your learnings may be applied but you hope they’ll become useful in time.
In the blog below, Elad Gil reflects on a similar experience. We were lucky to have Elad as a speaker at our first Founder Summit in San Francisco (which is back again in SF this year!); he’s an excellent writer, deep in AI and we respect him greatly as an investor.
I share notes below but I highly recommend reading it yourself. I also share a great podcast episode from A16Z on Sora with some notes on that.
Also, next week on the 6th of March at 2pm Anton van den Hengel and I will be jumping on the SouthStart stage to talk about all things AI. I would love to see some of you there!
The more you say, the less I know
“The more I learn about AI markets, the less I think I know” Elad Gil notes below the heading of his recent writing on AI.
There are a lot of unknowns. Even when we collectively treat an outcome as "known", like AI video generation, we don't know when it will arrive, as in the case of OpenAI's Sora (see below).
In the first section on LLMs, Elad shares questions and thoughts on what happens at the foundation model layer. It’s all interesting to ruminate on, though as I shared last week in my 1000 Conversations article:
I believe that whether the future is one large model (or one large mixture of experts) versus smaller or open source models is not as important a question for most founders.
Founders will have to largely accept what’s available to them in market. What’s more important is finding PMF and then you can rebuild your stack as appropriate. If you reach PMF you’ll have the resources to do so.
Note that I say “for most founders” as this question is in my opinion more important for founders building products lower in the stack; most are building at the top of the stack.
That said, I love the way he frames distinguishing the different models that are emerging:
I expect that in the short-term the largest tech players (who can afford to do so) will continue running towards AGI (the right-hand side of his framework), whilst all businesses (incl. big tech) will build with some combination of niche models and rationed use of the more expensive, generalisable foundation models.
If, in the future, we can reach the "Holy Grail" of fast, cheap and highly generalisable models, we may not need niche models (that is the belief of some folks I've spoken to at DeepMind, OpenAI etc.). That, or niche models will be reserved for edge computing.
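The "niche models plus rationed frontier models" pattern above can be sketched in code. This is a hypothetical illustration, not anyone's actual system: the model names and the complexity heuristic are invented, and a real deployment would use a learned classifier or per-tenant budgets rather than a word count.

```python
# Hypothetical sketch of routing between a cheap niche model and a
# rationed, expensive frontier model. All names and thresholds are
# invented for illustration.

NICHE_MODEL = "niche-summariser-7b"      # cheap, fast, narrow
FRONTIER_MODEL = "frontier-general-xl"   # expensive, highly generalisable

def route(prompt: str, frontier_calls_left: int) -> tuple[str, int]:
    """Send routine prompts to the niche model; ration the frontier model."""
    # Crude stand-in for a real difficulty classifier.
    looks_hard = len(prompt.split()) > 50 or "explain" in prompt.lower()
    if looks_hard and frontier_calls_left > 0:
        return FRONTIER_MODEL, frontier_calls_left - 1
    return NICHE_MODEL, frontier_calls_left

model, remaining = route("Summarise this invoice", frontier_calls_left=3)
print(model, remaining)
```

Once the frontier budget is exhausted, every prompt falls back to the niche model, which is the "rationing" in the framing above.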
In the section on infrastructure, he lists some great questions that I think about a lot, namely these:
Do the current AI cloud companies need to build an on-premise/BYOC/VPN version of their offerings for larger enterprises?
When does the GPU bottleneck end and how does that impact new AI cloud providers?
How do new AI ASICs like Groq impact AI clouds?
What else gets consolidated into AI clouds? Do they cross sell embeddings & RAG? Continuous updates? Fine tuning? Other services? How does that impact data labelers or others with overlapping offerings? What gets consolidated directly into model providers vs via the clouds?
On this, I wrote the following notes while reading:
He touches on the fact that VCs can't meaningfully deploy enough capital to entirely fund the development of foundation models, leaving them to rely on cloud companies. Elad tries to think through the market implications of this. The clearest one is that it gives hyperscalers a huge amount of power. This tidbit was interesting:
It is important to note that the scale of investments being made by these cloud providers is dwarfed by actual cloud revenue. For example, Azure from Microsoft generates $25B in revenue a quarter. The ~$10B OpenAI investment by Microsoft is roughly 6 weeks of Azure revenue. This suggests the cloud business (at least for now) is more important than any one model set for Azure (this may change if someone reaches true AGI or frontier model dominance). Indeed Azure grew 6 percentage points in Q2 2024 from AI - which would put it at an annualized increase of $5-6B (or 50% of its investment in OpenAI! Per year!). Obviously revenue is not net income but this is striking nonetheless, and suggests the big clouds have an economic reason to fund more large scale models over time.
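The arithmetic in that quote checks out. Here's a back-of-envelope verification using only the figures Elad cites (all in USD billions; week counts are approximate):

```python
# Sanity-check the quoted figures: Microsoft's ~$10B OpenAI investment
# versus Azure's ~$25B quarterly revenue, and the annualised value of
# the 6 percentage points of Azure growth attributed to AI in Q2 2024.

azure_quarterly_revenue = 25.0   # ~$25B per quarter
openai_investment = 10.0         # ~$10B investment

weekly_revenue = azure_quarterly_revenue / 13   # ~13 weeks per quarter
weeks_of_revenue = openai_investment / weekly_revenue
print(f"Investment = {weeks_of_revenue:.1f} weeks of Azure revenue")

# 6 percentage points of quarterly revenue, annualised over 4 quarters:
ai_annualised = 0.06 * azure_quarterly_revenue * 4
print(f"AI contribution = ${ai_annualised:.0f}B per year")
```

The investment works out to roughly five to six weeks of Azure revenue, and the AI-driven growth annualises to about $6B, consistent with Elad's "$5-6B" figure.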
I also didn't know that 1) Meta recently announced a $20b compute budget, and 2) you have to licence Meta's open source models if you are above 700m users; Microsoft, for example, currently pays licence fees to use Llama.
He commented on enterprises leveraging their existing cloud credits for using AI. This has been noted in the earnings reports of some of the cloud players, in which they've seen some cannibalisation of cloud spend in favour of spending that same budget on AI. I expect that's a temporary measure as companies begin to see real cost savings from AI and can better rationalise that additional spend.
He asks how important on-prem versions of AI products/models will be, a question I also hold and have been asking hyperscalers about (MSFT, AWS and GOOG all seem very focussed on this; MSFT actually seems to be the leader in Aus, given the heavy large-enterprise skew of its customer base and the recent development of local cloud clusters).
I also found this point interesting, though not that relevant to investing:
"In the absence of GPU on the main cloud providers companies are scrambling to find sufficient GPU for their needs, accelerating adoption of new startups with their own GPU clouds. One potential strategy NVIDIA could be doing is preferentially allocating GPU to these new providers to decrease bargaining power of hyper-scalers"
From a VC perspective, I was intrigued by his point around GPU scarcity being a temporary growth accelerant for smaller startups that have their own GPU clouds. Customers may be using the services of startups in part because those startups have their own GPU capacity.
Lastly, he talks about the application layer. The point I found most interesting in this section is:
ChatGPT launched ~15 months ago. If it takes 9-12 months to decide to quit your job, a few months to do it, and a few months to brainstorm an initial idea with a cofounder, we should start to see a wave of app builders showing up now / shortly.
If you’re in this next wave, let me know! Let’s jam on ideas.
Sora: “When, not if” but no one thought the when was now
No one except OpenAI! Although maybe their researchers were surprised, too.
A16Z had a great podcast episode out over the weekend about the surprise release of Sora. The interview is with one of the key researchers behind diffusion models, Stefano Ermon.
As with the researchers I’ve spoken to, Stefano was surprised by Sora:
Yeah, honestly, I was very, very surprised. I mean, I know the two of us often talk about how quickly the field is moving, how hard it is to keep track of all the things that are happening. And I was not expecting a model so good coming out so soon. I mean, I don't think there is anything fundamentally impossible and I knew it was coming. It was just a matter of time and just more research, more investments, more people working on these things, but I was not expecting something that good happening so soon. I thought it was maybe six months out, a year out. And so, I was shocked when I saw those videos, the quality of the videos, the length and the ability to generate 60 second videos. Oh, it's really amazing.
What’s so hard about AI-based video generation?
Compute:
At a very high level, you can think of a video as just a collection of images. And so, really, the first challenge you have to deal with is that you're generating multiple images at the same time. And so, the compute cost that you need to process a video is at least 10 times larger than what you would pay if you just want to process one of them at a time.
And basically, this means a lot more compute, a lot more memory, and it's just much more expensive to train your large-scale model on video data.
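The frame arithmetic behind that point is worth making concrete. The sketch below is a rough illustration, not Sora's actual architecture: the fps and temporal-compression figures are assumptions, chosen only to show how per-clip cost scales with frame count.

```python
# Rough illustration of why video generation is compute-hungry: a clip
# is many images, so cost scales with frame count even if the model
# compresses frames temporally. Numbers here are illustrative
# assumptions, not Sora's real parameters.

def relative_video_cost(seconds: float, fps: int = 24,
                        temporal_compression: int = 8) -> float:
    """Cost of generating one clip, in units of 'one image'."""
    frames = seconds * fps
    return frames / temporal_compression

print(relative_video_cost(1))    # a 1-second clip vs a single image
print(relative_video_cost(60))   # a 60-second clip, as Sora produces
```

Even with generous temporal compression assumed, a 60-second clip costs on the order of a hundred times more than a single image, which is why Stefano calls out compute first.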
Data:
The other challenge is just the data challenge. I think a lot of the success we've seen in diffusion models for images was partially due to the availability of publicly available data, such as LAION: large-scale image-and-caption datasets that were scraped from the Internet and made available, and people could use them to train large-scale models. I think we don't quite have that for video. I mean, there is a lot of video data, but the quality is kind of a mixed bag, and we don't have a good way to filter or screen it, and there's not a go-to dataset that is available and everybody's using to train these models. So, I'm guessing some of the innovations that went into the Sora model are actually on selecting good-quality data to train the models on.
Captions for labelling:
Captions are also hard to get for video. I mean, the video data is out there, but getting good labels, good descriptions of what's happening in the videos is challenging, and you need that if you want to have good control over the kind of content that you generate with these models.
Complexity:
And then there is also a challenge of video content. It's just more complex. There is more going on. If you think about a sequence of images, as opposed to just one, there are complex relationships between the frames. There is physics, there is object permanence. In principle, I think a high-capacity model with enough compute and enough data can potentially learn these things, but it was always an empirical question: how much data are you going to need? How much compute are you going to need? When is that going to happen? Is the model really going to discover all these high-level concepts and statistics of the data, essentially? And it was surprising to see that it's doing so well.
The last couple of episodes of the All In Podcast mostly cover AI, which I’ve enjoyed. In Ep 166 they talk about how Sora was a surprise and why it’s difficult to accomplish, which ties well with the above. In Ep 167 they talk about last week’s controversy surrounding Gemini, Groq’s recent success and NVIDIA’s current market position.
That’s all, thanks! Feedback is always welcome and appreciated.
For this week’s image I wanted to refer to AI watching Newton’s apple tree to understand the physics of the world. Unfortunately, it confused the colour orange with the fruit…
Midjourney prompt: robot dropping apple from tree, white background, orange --style 4Bjh1JDmdFzBpbz2 --v 5.2