Ruminations #11: Two experts on where to build in AI and Andrew Ng's advice at NeurIPS
Supercycles and how to survive them
Hi all,
Welcome to 2024! I spent my break with family playing Monopoly Deal, surfing and addictively making my way through the levels of this CPU-building game Turing Complete.
Before we kick off, can I ask a favour? I’d really love to hear what kind of content would be helpful for you here. Do you want to hear from VCs investing in AI? From researchers in the space? From our team on how we're shaping our thesis? Something else? Your feedback would mean a lot to me! You can reply to this email.
In this edition:
Notes from a great write-up on technology cycles and a paper on generalist models vs domain-specific
Expert Kendra Vant shares what she’s looking for in AI startups
Content we’ve enjoyed
Capex & Opex supercycles — the dusk of SaaS and the dawn of AI-SaaS
Thank you to the researcher at Canva who sent me this great piece from a Partner at Lightspeed. It’s quite relevant to founders. Below are the excerpts I shared with the Square Peg team:
He centres much of the piece around these three stages:
First come the Schumpeterian forces that create ‘extractive’ industries — industries that create the new as they destroy the old, strive to produce more with less, and unlock some net-new resource for the world.
Next come the Ricardian forces that take this new resource, distribute it to new markets, and make it accessible to all. These forces generally deal in trade, expansion and competitive dynamics.
Finally, the Malthusian forces create fear that there won’t be enough for everyone: that as the resource expands, so will the need for it, until there isn’t enough to go around. And thus the cycle goes back to ‘extractive’ — extract something else, extract more, saturate, restart.
These were his three finishing thoughts:
My first AI project was back in the early 2000s. I’ve seen a few of these hypes and winters come and go. 2024 will be another winter but AI is a one-way street now. In 2005–10 you could be a social media sceptic and survive as a CMO. No more. In early 2010–2015 you could be a cloud sceptic and survive as a CTO. No more. A few years ago, you could be an AI sceptic and survive as a CEO. No more. You have to be all in.
If you are building in the ‘extractive’ part of AI, great — $$s will chase you more easily. But you are also likely to suffer from the great ‘pinch-point’ of the hour-glass that’s coming. It’s good to play the picks and shovels game in a gold rush but how many picks and shovels companies do you remember? This is the category that’ll likely see a lot of M&A in the coming years as big platform players will do tuck-ins. As a founder, your job would be to build fast into the available gap today, but then start to sprawl horizontally to build a platform quickly.
If you are building into the ‘distributive’ part of AI, I guess the obvious solve here is that you need to own some of the data, the models, or own parts of the training or inference stack that gives you an edge over others to begin with. In other words, you need to get closer to the ‘extractive’ part of the layer to get to faster value accretion. Another strategy is to just play the long, long game. Maybe raise less, take a few years to get to scale profitably, and once the lower layers in AI stabilize and we are clearly entering the distributive phase of AI, raise a larger round and scale up!
Can Generalist Foundation Models Outcompete Special-Purpose Tuning?
A researcher at Meta shared this paper with me over the weekend, knowing that I have an interest in what data-based competitive advantages will look like going forward.
The argument I hear from founders is that, with enough data, you can outcompete your competitors with more performant models. I think that’s less true than ever thanks to foundation models.
Here’s some of the abstract from the paper:
Generalist foundation models such as GPT-4 have displayed surprising capabilities in a wide variety of domains and tasks. Yet, there is a prevalent assumption that they cannot match specialist capabilities of fine-tuned models. For example, most explorations to date on medical competency benchmarks have leveraged domain-specific training, as exemplified by efforts on BioGPT and Med-PaLM.
…
Rather than using simple prompting to highlight the model's out-of-the-box capabilities, we perform a systematic exploration of prompt engineering. We find that prompting innovation can unlock deeper specialist capabilities and show that GPT-4 easily tops prior leading results for medical benchmarks.
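For the curious, the paper’s “Medprompt” recipe stacks kNN-selected few-shot examples, self-generated chain of thought and choice-shuffling ensembling. Here’s a toy sketch of that last trick - my illustration, not the paper’s code, with `ask_model` standing in as a hypothetical GPT-4 call:

```python
import random
from collections import Counter

def ask_model(question: str, options: list[str]) -> str:
    """Hypothetical LLM call that returns the text of the option the model
    picked. A stand-in for a real GPT-4 chat-completion request."""
    raise NotImplementedError

def choice_shuffle_ensemble(question: str, options: list[str], k: int = 5) -> str:
    """Ask the same multiple-choice question k times, shuffling the answer
    options each time, then majority-vote over the answers. Shuffling
    counters the model's positional bias toward early options."""
    votes = []
    for _ in range(k):
        shuffled = random.sample(options, len(options))  # fresh ordering
        votes.append(ask_model(question, shuffled))  # vote by option text
    return Counter(votes).most_common(1)[0][0]
```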
The paper’s finding loosely aligns with what I heard from Andrew Ng while at NeurIPS. His advice was:
Know that fine-tuning is for changing style and tone, whereas retrieval is for accuracy (a rough sketch of the retrieval pattern is below). Andrew suggests working hard on prompting and using fine-tuning only if necessary. Think of these models as reasoning engines, not as memory machines.
Additionally, don’t assume that short prompts are the clearest prompts for an LLM. Generally longer prompts are helpful as they offer more detail.
Consider that AI doesn’t automate jobs, it automates tasks. Don’t focus on jobs to be done; focus on tasks to be done. Use that to stitch together solutions, rather than trying to have it solve everything. Don’t look at job descriptions to understand what tasks your customers need done - look at what they actually do each day and then rank those tasks by their relevance to solving with AI.
Going forward, Andrew is excited by customised models, in which the model actually gets to know you. He’s also excited about more complex agents, in which there are deeper sequences for completing tasks.
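Andrew’s “retrieval for accuracy” point is essentially retrieval-augmented generation (RAG). A minimal sketch of the pattern, with a hypothetical `llm()` call and a deliberately naive word-overlap retriever standing in for a real embedding index:

```python
def llm(prompt: str) -> str:
    """Hypothetical chat-completion call - swap in any real LLM API."""
    raise NotImplementedError

def retrieve(query: str, documents: list[str], top_k: int = 3) -> list[str]:
    """Toy retriever: rank documents by word overlap with the query.
    A production system would use embeddings and a vector store."""
    q_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

def answer_with_retrieval(query: str, documents: list[str]) -> str:
    """Ground the model in retrieved passages so it acts as a reasoning
    engine over supplied facts rather than a memory machine."""
    context = "\n\n".join(retrieve(query, documents))
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )
    return llm(prompt)
```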
Expert Q&A
This is a new section I’ve started where I share answers I’ve received from experts in the field on how they think about all things AI. This edition, I asked a question of Kendra Vant, who describes herself as a “forever curious lapsed physicist with an insatiable interest in real-world impacts and seven years spent building AI-powered products in SaaS companies with global reach.”
I love her response to this week's question. I asked:
If you were an investor, what types of businesses would be most interesting to you? What would be an indicator of an interesting business to you?
A lot of founder teams are looking to play in the generative AI developer tools space. This is understandable given the enormous amount of attention focussed there and the potential to build a hyperscale business. However, I don’t see it as an area that provides great opportunities for startups just getting going today. It’s too competitive, and data gravity, funding and access to compute tip the scales massively towards the current major players.
Instead, I agree with the thesis of Andrew Ng and the AI Fund. The existing and emerging toolset around supervised learning and generative AI - the two most commercially useful paradigms of AI today - now gives founders who are subject-matter experts in well-scoped, hard or boring problems that will save companies money, time or both the opportunity to create great businesses with solid prospects for growth and acquisition.
Andrew uses the example of Bearing.ai - a startup using AI to help save fuel costs in maritime shipping. Similarly, I’m impressed by Phaidra.ai, a closed-loop AI control service that delivers step-function improvements in plant stability and energy efficiency for energy-intensive industrial processes. And I’m intrigued by Ajust, working to resolve consumer complaints and improve the relationships between consumers and businesses, and RapidAIM, creating accurate pest forecasts and driving down the need for insecticide application.
The toolsets available today - from OpenAI and others - have massively increased what a small team with limited data and compute can achieve. However, to move at speed you still need experience in end-to-end AI product design, evaluation, and machine learning engineering and ops to run a responsible, reliable and cost-effective product.
So I look for a well-defined problem statement: something that users will pay to have removed from their list of frustrations. Substantial subject-matter expertise in the core business problem to be solved. And substantial experience in designing and building AI systems at scale. Where those three are combined, I would be very interested in knowing more.
Check out Kendra’s Data Runs Deep Substack here:
That’s all! Feedback is very, very welcome.
That Andrew Ng interview is fantastic. I really liked three observations he makes:
1. Instead of AI for visual generation, AI for visual analysis is a great opportunity.
2. Instead of SaaS, there are opportunities for remote edge AI on mobile.
3. Switching costs between large models are not high on their own, but they grow once developers have lots of API hooks into, e.g., Google Cloud.
His comments reinforce several experiences we’ve had while spinning off a new team to explore the question: ‘Can Vision x AI x Spatial x Twins become simple daily handheld tools to accelerate allied health frontline workforce training?’
1. Regarding visual analysis AI
We’ve found this to be true in seeking utility from AI for workforce training.
GenAI can make digital twins that let the trainer explain their judgment in a mixed reality context.
But computer vision with AI aligns with what the trainer is actually looking for: their visual reasoning, judgment and assessment. So we’re more convinced to double down on Vision x AI to drive Spatial x Twins, guiding visual analysis and judgment, rather than focusing on GenAI.
Faster generation of digital twin props does advance training. But aligned visual reasoning has the potential to accelerate guidance and analysis for trainers of different skill levels, and eventually to become AI trainer co-pilot tools for trainees in labour-short industries like aged care.
He confirms where our team needs to focus: Vision x AI first, then Spatial and Twins second.
After integrating LandingLens, we’re trying to simplify spatial computing XR interfaces for computer vision.
2. On edge
We’re finding our tests of local edge PyTorch Ultralytics models, Apple’s local Core ML vision models and Google’s local CV promising. But we also want ‘Eyes’ for vision and a ‘Brain’ to support visual reasoning.
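For anyone curious, the kind of local edge test we’re running looks roughly like this (a sketch only; the model file and image path are placeholders):

```python
from ultralytics import YOLO

# Small detection model suited to on-device use.
model = YOLO("yolov8n.pt")

# Local inference on an image (placeholder path).
results = model("room_photo.jpg")
for r in results:
    print(r.boxes)  # detected boxes, classes and confidences

# Export the same weights to Core ML for native Apple-device inference.
model.export(format="coreml")
```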
3. Switching costs, stickiness and momentum
While Microsoft Kosmos is promising, we will focus on integrating Google Gemini for multimodal visual reasoning. Not because it might be better or worse, but because of Andrew’s other comments: once our devs are already building in an environment (for us, Google Cloud), we’re not just looking at comparable performance between visual reasoning models but at how all the other APIs are hooked in.
Google has now released Vertex AI Studio, with RAG, A/B testing and cross-integration with Vertex AI Vision. That, together with Google’s ‘shared fate’ indemnity supporting teams that build on Gemini (they’re convinced it was trained on an ethical supply chain), has us leaning in.
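For context, a multimodal Gemini call through the Vertex AI Python SDK looks roughly like this (a sketch, not our production code; project, bucket and prompt are placeholders):

```python
import vertexai
from vertexai.preview.generative_models import GenerativeModel, Part

# Placeholder project and region - replace with your own.
vertexai.init(project="my-gcp-project", location="us-central1")

model = GenerativeModel("gemini-pro-vision")

# Multimodal request: an image in Cloud Storage plus a text prompt.
image = Part.from_uri("gs://my-bucket/grab_rail_site.jpg", mime_type="image/jpeg")
response = model.generate_content([
    image,
    "Describe any fall-risk hazards visible in this bathroom.",
])
print(response.text)
```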
This is not Andrew Ng’s first rodeo. I always find it worth paying attention when he draws a highlighter around an area. Our team benefits when his understanding can recalibrate or reinforce our focus.
Andrew’s vision for the mundane and useful
We’ve been listening to his vision around why he made LandingLens. I love the hope that one day even pizza makers can use computer vision in mundane yet useful ways. I hope the same will be true for computer vision in health: not just doctors in hospitals running Harrison.ai, but boring, useful visual guidance and analysis tools for allied health occupational therapists and physios visiting our ageing parents’ homes.
I want to see boring, useful visual AI tools running on the edge, solving operational hand-off problems like coordinating the installation of a safety grab rail to stop grandma falling when she gets back from the hospital.
I’m curious to hear from Casey and others about teams they’re excited about that aren’t working on RAG and SaaS, but on visual analysis, remote edge, visual reasoning and the boring operational opportunities, especially for frontline workers on their feet and not at desktops.
If you’re in Sydney, coffee and bubble tea is on me. We can geek out and be excited about the boring, mundane and useful.
I completely agree with Kendra Vant’s opinion. There are many AI wrappers building a friendly user interface over an API call to OpenAI. That creates easy-to-use tools for developers, but it misses the two most important elements of GenAI products: 1) if they have no backend algorithm, only a front-end UI over a simple SOP, how can they control the quality of the output? 2) How do they deal with data security, privacy and ownership issues, i.e. what are their procedures if OpenAI leaks the data? https://www.wired.com/story/openai-custom-chatbots-gpts-prompt-injection-attacks/
In fact, if a product can be built in minutes by them (the promotional line used by many AI wrappers), it can be built by other companies too. What exactly is their ‘moat’?
Unfortunately, VCs seem to buy into those ideas, as evidenced by the recent fundraising activities in the Australian AI startup ecosystem.