Ruminations #1: Insights from Tel Aviv, data flywheel discussions and reading
Notes from Tel Aviv, data flywheels and "new" moats
Hi all,
I've decided to experiment with moving the (roughly) weekly AI update I send over to Substack, mostly because I expect it'll be logistically easier for me.
Besides, what's more cliché than a VC with a Substack on AI in 2023?
I've been out of office, visiting Israel and meeting some great AI folks whilst there. Below I share my summarised notes from conversations I had and from a dinner the team organised with some of the best CTOs, researchers and ML engineers we know in Tel Aviv.
I've also written some thoughts on the value of data flywheels. I consistently hear from founders that they expect a data flywheel to be the key source of moat for their AI-native startup, and I'm not entirely sure I agree. This is my attempt to put my thoughts in one place and solicit feedback on my thinking; I would love to hear your thoughts. (You'll need to request access due to our security posture at Square Peg.)
I also share some extracts from an article by Greylock’s Jerry Chen on moats relating to AI-enabled businesses.
Notes from Tel Aviv
I had the good fortune of working in Tel Aviv last week, and met some great AI folks while in town. Below is a summary of what I heard from those I met.
Open source
When discussing open source, a number of people referenced knowledge distillation: using large models to train smaller models to a similar level of capability, so models can shrink without giving up much power. It seems possible that open-source players can use big models to train smaller, more cost-effective ones, which would help them keep up while consuming less compute.
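As an aside from me (not something anyone raised at dinner), here is a minimal sketch of what the classic distillation recipe looks like mechanically: a small "student" model is trained to match the softened output distribution of a large, frozen "teacher" as well as the ground-truth labels. The names and models here are placeholders, not any specific open-source system.

```python
# Minimal sketch of response-based knowledge distillation (illustrative only).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend the usual hard-label loss with a soft-label loss that pushes
    the student towards the teacher's output distribution."""
    # Soft targets: the teacher's softened probability distribution.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_loss = F.kl_div(soft_student, soft_targets,
                         reduction="batchmean") * (temperature ** 2)
    # Hard targets: standard cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss

# Inside a training loop, the teacher is frozen and only the student learns:
# with torch.no_grad():
#     teacher_logits = teacher(batch)
# loss = distillation_loss(student(batch), teacher_logits, labels)
# loss.backward()
```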
One founder I spoke to said he was sceptical of knowledge distillation in some regard: he feels that it is not lossless and that you lose important edge cases.
There seemed to be a high degree of bullishness for open source in general and its continued contribution to the field.
Some said that they see foundation models as a commodity, and that open source models will be of equivalent capability. They referenced knowledge distillation here, and examples of people training open source models in clever ways using GPT4. They also referenced how many developers open source has attracted, and how people - regardless of role - want to feel that they’re giving back to something and that they’re part of a community.
One CTO spoke about open sourcing their code, and how he was very against it for some time. When they finally did, they found a community of collaborators who really drove their product forward, and he said it's been a huge positive.
One person at the table felt that the largest models would remain few and in the hands of private companies, because of the cost and the danger of the most capable models. We spoke about costs coming down, about knowledge distillation, and about whether danger will really factor in given models are becoming more and more accessible (and there's so much incentive to develop them regardless). We didn't get to the bottom of this point.
Others did not feel that open source models would be able to keep up, both from a product experience and model performance perspective. Some also felt that people like Sam Altman would be successful in encouraging regulators to place limitations on who could build the most powerful models, preventing people from continuing to build and work on the most powerful open source models.
One academic commented on the push to privatise research and move away from the culture of open source, at least in industry. He agreed that it could slow the progress of ML research.
Organisations that value both model experimentation and security of data are experimenting with ways to quickly spin up on-prem open source models so that they can leverage different models for different use cases.
Implementing LMs
Multiple ML engineers felt that the future was businesses using smaller models more targeted to their use case, given the cost efficiency and the fact that few businesses need the full power of the largest models.
What is the value of data accumulation and fine-tuning at the level of an individual business if businesses opt for models that are already optimised for their use case?
Multiple ML engineers also spoke to their work combining models for their products or use cases. They are combining models like GPT4, Falcon and Bloom, leveraging each in turn depending on the aspect of the product experience that requires it. They spoke about doing this for two reasons: cost optimisation, and having more control over the underlying models than is available when simply using GPT4.
How is that being managed technically? I have heard from at least one dev that combining these models is challenging from the perspective of managing data pipelines.
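I don't have a definitive answer, but as a rough sketch of the pattern I heard described, you can picture a thin routing layer that sends each request to whichever model suits that part of the product, trading cost against capability. Everything below (model names, quality scores, costs, the call_model helper) is illustrative, not anyone's real setup.

```python
# Illustrative sketch of routing requests across several LLMs by cost and capability.
from dataclasses import dataclass

@dataclass
class ModelChoice:
    name: str                   # a self-hosted open-source model or a hosted API
    cost_per_1k_tokens: float   # made-up figures, not real pricing
    quality: int                # crude 1-5 "capability" score

MODELS = [
    ModelChoice("small-self-hosted", cost_per_1k_tokens=0.0005, quality=2),
    ModelChoice("mid-open-source",   cost_per_1k_tokens=0.002,  quality=3),
    ModelChoice("frontier-api",      cost_per_1k_tokens=0.03,   quality=5),
]

def call_model(name: str, prompt: str) -> str:
    # Stand-in for the provider-specific API call (a hosted GPT4-style API,
    # a self-hosted Falcon/Bloom-style deployment, etc.).
    return f"[{name}] response to: {prompt[:40]}"

def route(required_quality: int) -> ModelChoice:
    """Pick the cheapest model that clears the quality bar for this task."""
    candidates = [m for m in MODELS if m.quality >= required_quality]
    return min(candidates, key=lambda m: m.cost_per_1k_tokens)

def answer(task_type: str, prompt: str) -> str:
    # Simple per-feature policy: light tasks tolerate a cheaper model,
    # complex reasoning goes to the most capable one.
    required = 2 if task_type in ("summarise", "classify") else 4
    return call_model(route(required).name, prompt)

print(answer("summarise", "Summarise this support ticket..."))
print(answer("reasoning", "Draft a migration plan for..."))
```

This is obviously a toy; the prompt handling, logging and evaluation that sit around a layer like this are presumably where the data-pipeline complexity the dev mentioned comes in.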
Fine-tuning of some kind isn't going anywhere: some way of "taming" the model to the individual needs of a business is important. However, needing to fine-tune doesn't mean that the data you use to do so gives you a competitive moat. Product insights used well could be very important.
LLM-specific GPUs
One academic heading research at a leading GPU provider felt that there is little advantage in customising GPUs specifically for the needs of LLMs today, as more flexible chips offer appropriate performance and enable players like NVIDIA to service other needs and maintain their competitive advantage. He felt NVIDIA's competitive advantage would be lessened were they to specialise in this regard.
One CTO sees requirements for hardware changing so quickly that, given production cycle times, it would be inefficient for NVIDIA to chase what's required right now instead of focussing on performance. He spoke about NVIDIA's chips having the fastest matrix multiplication despite not explicitly focussing on what's required for LLMs. He also shared that chips focussed more narrowly on matrix-matrix multiplication don't do enough for LLMs, because LLMs also need to multiply vectors by matrices.
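To unpack that with my own gloss (not his words): when an LLM generates text one token at a time, each step's activations are a single vector that gets multiplied by large weight matrices, so decoding leans on matrix-vector products (GEMV) rather than the big matrix-matrix products (GEMM) that raw throughput benchmarks tend to showcase. A toy illustration, with dimensions much smaller than real models:

```python
# Toy illustration (mine, not the CTO's) of prefill vs decode shapes in one LLM layer.
import numpy as np

d_model, d_ff = 1024, 4096                               # much smaller than real models
W = np.random.randn(d_model, d_ff).astype(np.float32)   # one weight matrix

# Prefill: the whole prompt's activations go through at once -> matrix-matrix (GEMM).
prompt_len = 512
prefill_acts = np.random.randn(prompt_len, d_model).astype(np.float32)
prefill_out = prefill_acts @ W    # (512, 1024) @ (1024, 4096) -> (512, 4096)

# Decode: one new token at a time -> vector-matrix (GEMV), often memory-bandwidth bound.
token_act = np.random.randn(d_model).astype(np.float32)
decode_out = token_act @ W        # (1024,) @ (1024, 4096) -> (4096,)

print(prefill_out.shape, decode_out.shape)
```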
Incumbency risk
Much of the group at dinner was not concerned about value creation for startups in this wave. They felt that the speed of startup innovation would still mean they could create value despite incumbents adopting AI quickly. We generally agreed, however, that distribution is likely more important than ever; companies may have to place more emphasis on it.
Proprietary data
Many see truly proprietary data as extremely important, but they feel that much of the data referenced by founders is probably not proprietary.
Two academics and one CTO felt that truly proprietary data likely comes from a unique means of measurement, combined with collection of that data over time that improves measurement (a flywheel).
One CTO referenced the underground scans they collect for their computer vision work as truly proprietary, because of their measurement technique and the access required to get them.
Superhuman reasoning
Superhuman reasoning was the strongest area of interest for one academic. He asked the group what they think we need to do to get, or manufacture, the kind of data required to reach superhuman reasoning, given the data we have today is all based on human-limited reasoning.
We discussed what it would mean to measure superhuman reasoning and whether we would even recognise it. We also discussed superhuman creativity, and whether such a thing exists given the subjectivity of creativity and its seeming limitlessness relative to reasoning.
A few of us spoke about Go and chess as the main (only?) examples of AI exceeding human reasoning, or perhaps just the only visible benchmarks.
Greylock’s “New New Moats” Article
I enjoyed reading Greylock's "The New New Moats" article, recommended to me by Nilushanan Kulasingham at Stori.ai. It's Jerry Chen's 2017 piece on moats, updated this year with his latest thinking on AI.
The summary, which matches how we feel at Square Peg (at the moment!), is that traditional moats are likely more important than ever, but largely unchanged by this latest wave. A chunk of the piece is very "business 101", but the AI-related annotations are good. Key excerpts I took away are below.
If you don’t read the full article, read the closing comments:
While the rise of AI is exciting, in many ways we have come full circle in our quest to build new moats. It turns out that the old moats matter more than ever. If the Google “We have no moats” prediction is true and AI models enable any developer with access to GPT or LLaMA to build systems of intelligence, then how do we build a sustainable business? The value of the application is how to deliver the value. Workflows, integration with data and other applications, brand/trust, network effects, scale and cost efficiency all become drivers of economic value and the creator of moats. Companies that are able to build systems of intelligence will still need to master go-to-market. They’ll have to perfect not just product-market-fit, but product go to market fit.
AI doesn’t change how startups market, sell, or partner. AI reminds us that despite the technology underpinning each generation of technology, the fundamentals of business building remain the same.
The new moats are the old moats.
Excerpts I found interesting from "The New New Moats" (copied verbatim, not my writing):
Historically, open-source technology has reduced value in whichever layer it is available, and moved the value to adjacent layers. For example, an open source operating system like Linux or Android lessened the dependence of apps on Windows and iOS, and moved more of the value to the app layer. This doesn’t mean there is zero value in the open-sourced layer (Windows and iOS definitely capture value!). At the same time, you can still create value and attack Castles in the Cloud with open-source business models, as we’ve seen achieved by the likes of Databricks, MongoDB, and Chronosphere.
In an era of cloud and open source, deep technology attacking hard problems is becoming a shallower moat. The use of open source is making it harder to monetize technology advances while the use of cloud to deliver technology is moving defensibility to different parts of the product.
In our blog six years ago we highlighted how the adjacent layer that benefited more often than not was the big cloud platforms. But, with respect to open source foundation models, we can see that some of the value that would have been captured by OpenAI or Google can now be shifted to apps and startups and infrastructure around the LLMs. OpenAI and Google can still capture value, and the ability to build and run these giant models at scale is still a moat. Building a developer community and network effects is still a moat, but the value captured by these moats is lessened in a world where open source alternatives exist.
One of the most successful cloud businesses, Amazon Web Services (AWS), has both the advantages of scale but also the power of network effects. More apps and services are built natively on AWS because “that’s where the customers and the data are.” In turn, the ecosystem of solutions attracts more customers and developers who build more apps that generate more data continuing the virtuous cycle while driving down Amazon’s cost through the advantages of scale.
Startup founders who succeed tend to execute a dual-pronged strategy: 1) Attack legacy player moats and 2) simultaneously build their own defensible moats that ride the new wave.
In all of these markets, the battle is moving from the old moats (the sources of the data), to the new moats (what you do with the data).
Content we’ve enjoyed from the Square Peg team + others
Ed (Square Peg) shared this podcast episode, and shared an insight he took away on PyTorch:
“PyTorch is a big thorn in Nvidia's side. CUDA is Nvidia's programming model for GPUs which is part of their ecosystem + leads to lock-in of their hardware. PyTorch is increasingly becoming a standard for GPU programming for developers which is hardware agnostic & eating into Nvidia's lock-in.”
Ed also enjoyed Marc Andreessen's latest podcast interview with Ben Thompson. Ed called out this quote in particular:
"...other VCs do this, they’ll do things like they’ll say things like, “Well, whatever, X, Y, Z wave is over”. Then you look on their website and they have companies that are working in that sector. You just find yourself — what do those entrepreneurs think about the fact that their investor just said? “You know what? This whole thing is toast, pack it up.” It’s just like, lots of other people can do that, there’s lots of people speaking in public who can draw negative conclusions on things. If you’re going to be in this business, you should support the people you’re working with. In our case, that’s the founders and their long-term dreams."
Paul (Square Peg) enjoyed this post by Ben Evans about automation and the future of work.
Shuki (AI21) shared a post he’s written on a future role in AI, one data scientists will need to adapt to, called the AI Tamer.
In this analogy, the Foundation Model is like a wild animal that has to be tamed in order to be useful. Let’s break it down: FM is a wild animal because it was fed a vast amount of unsupervised data, enabling the animal (i.e. FM) to become very strong (i.e. highly capable). But a wild animal is of no use to humans unless it is tamed carefully. In this regard, AI Tamers are similar to the animal tamers: Through understanding the animal’s incentive mechanisms, and feeding the animal according to its desired behavior, an animal can truly be “reduced from a wild state to a domestic state”.
Indigo (Useful Bird) shared a piece on how to make UX a valuable moat for AI businesses.
As always, I love hearing from people on their thoughts surrounding AI, so please reach out or comment if you have any thoughts on the content above.
Casey