Ah… what a wild, wild few days!
He’s gone… no he’s not… yes he is… no… he’s not!
What a whirlwind. OpenAI has just tweeted:
We have reached an agreement in principle for Sam to return to OpenAI as CEO with a new initial board of Bret Taylor (Chair), Larry Summers, and Adam D'Angelo.
We are collaborating to figure out the details. Thank you so much for your patience through this.
I won’t write much about this at the moment - there’s enough speculation already. For now, I’m following Kara Swisher as she somehow always has the inside scoop.
In other news, I’m in San Francisco the week beginning 4th of December! I’m meeting lots of AI devs, researchers and founders and I’d love recommendations for others I should meet while I’m over there. I’d be super appreciative of any suggestions.
Answers from experts
There are some really, really bright people working on AI with much deeper knowledge than I have.
I love asking experts questions on all things AI, and I want to share their answers with you. This edition I asked a question of Jamie Hall, who is absolutely excellent (and is hiring software engineers, by the way!). Before leaving to co-found OpTech.ai, Jamie worked on the LaMDA team at Google, and he's a wellspring of knowledge.
I asked Jamie: What are people getting wrong about AI?
He replied:
Language models' ability to be factual is underrated. Everyone is now well aware of hallucination and the obvious risks it poses, but as a result they're making wrong assumptions about what AI can and will do.
The first point to bear in mind is that the general problem of factual grounding is only relevant if you're trying to build a very general language system that can handle anything the user says. If you're building an AI-first product with a specific focus and a sensible engineering strategy, you will have broken your key data problems down into well-specified operations that LLMs find straightforward. Consequently, the risk from hallucination should be irrelevant.
As the research tech lead for factuality on the LaMDA team, where we were deliberately aiming to solve the general problem of factual grounding on any dialogue domain, I was able to see the model gradually improve from "very hit-and-miss" in the early days to "pretty good" by 2022. (Excuse the vague hand-waving; the exact techniques we used to measure factual grounding are part of Google's trade secrets.) There are plenty more improvements in the pipeline to come, and a lot of interest in factual grounding at other AI companies and at universities, so expect the large models to continue steadily improving.
You sometimes hear claims that LLMs are "stochastic parrots" which are fundamentally incapable of making factually grounded statements. The research literature which makes those claims is now years out of date, and simply ignores the techniques in retrieval and factual grounding that have been developed since 2020. In my opinion, these claims are misleading and inaccurate to an ironic degree.
Overall, the situation with LLM factuality is like the early internet. In the late 90s, academics such as Hubert Dreyfus wrote books arguing that it was categorically impossible to find genuine knowledge on the world-wide web, and in the early 2000s, the idea of fact-checking something on Wikipedia was literally a punchline. Those attitudes were obsolete just a few years later. Why? The underlying tech gradually but steadily improved, and at the same time we all got used to how to use it properly. I predict that the same thing will happen with LLMs.
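To make Jamie's point about "well-specified operations" concrete, here's a minimal sketch (mine, not Jamie's) of a narrowly scoped, grounded LLM call. It assumes the openai Python client and a hypothetical retrieve_passages helper; the idea is simply that the model is only asked to answer from the context you hand it.

```python
# A minimal sketch of a narrowly scoped, grounded LLM call.
# Assumes the openai Python client (>=1.0) and a hypothetical
# retrieve_passages() helper that returns relevant source text.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def grounded_answer(question: str, passages: list[str]) -> str:
    """Answer a question using only the supplied passages."""
    context = "\n\n".join(passages)
    prompt = (
        "Answer the question using ONLY the context below. "
        "If the answer is not in the context, reply exactly NOT_FOUND.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,  # keep the output tight for a well-specified task
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()


# passages = retrieve_passages("customer refund policy")  # hypothetical helper
# print(grounded_answer("How many days do customers have to request a refund?", passages))
```

Framing the task this narrowly turns "can the model be factual about anything?" into "can it read the passage I just gave it?", which is a much easier problem.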
Content we’ve enjoyed
Casey: My friend James released a paper from a project of his that I thought was really interesting. From the paper & GitHub page:
Text-to-image diffusion models understand spatial relationship between objects, but do they represent the true 3D structure of the world from only 2D supervision? We demonstrate that yes, 3D knowledge is encoded in 2D image diffusion models like Stable Diffusion, and we show that this structure can be exploited for 3D vision tasks.
Ben enjoyed this tweet thread from Matthew Prince at Cloudflare about why Microsoft is not ultimately benefiting from this weekend’s OpenAI mess. I think it’s always useful to remember that large players like Microsoft and Google operate under a very different level of regulatory scrutiny from startups. Microsoft in particular has a history of getting into trouble with competition law, so it will be interesting to see how regulators perceive where things land.
Ben and our Head of IT, Mick, also shared this tweet about pressure-testing the recall performance of GPT-4, with the commentary:
This is super interesting and a great reminder that we are still dealing with 1s and 0s with these models - they’ll keep getting better of course but [there’s] lots we can’t rely on. Worth reading the whole tweet.
For the lazy among us, here are the takeaways shared in the tweet (with a rough sketch of the test itself after the list):
So what:
No Guarantees - Your facts are not guaranteed to be retrieved. Don’t bake the assumption they will into your applications.
Less context = more accuracy - This is well known, but when possible, reduce the amount of context you send to GPT-4 to increase its ability to recall.
Position matters - Also well known, but facts placed at the very beginning and in the second half of the document seem to be recalled better.
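If you want to poke at this yourself, here's a rough sketch of the kind of test the tweet describes: plant one known fact at a chosen depth in a long filler document, ask the model to retrieve it, and check the answer. The filler text, the planted fact, and the model name are placeholders for illustration, not the tweet's actual setup.

```python
# A rough sketch of a "needle in a haystack" recall test:
# plant a known fact at a given depth in long filler text,
# then check whether the model can retrieve it.
from openai import OpenAI

client = OpenAI()

NEEDLE = "The secret ingredient in the chilli recipe is smoked paprika."
FILLER = "The quick brown fox jumps over the lazy dog. " * 2000  # stand-in filler text


def build_haystack(filler: str, needle: str, depth: float) -> str:
    """Insert the needle at a relative position (0.0 = start, 1.0 = end)."""
    cut = int(len(filler) * depth)
    return filler[:cut] + " " + needle + " " + filler[cut:]


def recall_ok(depth: float) -> bool:
    haystack = build_haystack(FILLER, NEEDLE, depth)
    response = client.chat.completions.create(
        model="gpt-4-1106-preview",  # a long-context model; swap for whatever you're testing
        temperature=0,
        messages=[
            {
                "role": "user",
                "content": haystack
                + "\n\nWhat is the secret ingredient in the chilli recipe?",
            },
        ],
    )
    return "smoked paprika" in response.choices[0].message.content.lower()


# Check recall at a few depths through the document.
for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(depth, recall_ok(depth))
```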
Lucy shared an excellent report from Coatue about AI with the commentary:
I'm primarily interested in companies in the application layer as opposed to model layer and below. I think the winners are going to be few in the model layer and below and most of them will likely be in the US. The talent is scarce right now and concentrated over there.
I think this platform shift has parallels to mobile, where incumbents do have significant advantages (access to massive amounts of data + distribution + how fast they've already moved to roll out features). But startups have an opportunity to build entirely different product experiences/interfaces, or to go after old-school categories where AI unlocks a labor bucket in a way that wasn't possible before (e.g. construction, logistics).
Multi-modal will obviously be a huge driver of different product experiences, so I'm excited to stay across developments in that. Right now we're still in early innings, so a lot of what I'm seeing remains heavily text-based, but recently I met a company doing speech-to-text in a defence context, which makes a lot of sense (e.g. you're on the battlefield and want to be able to give instructions by voice rather than via a screen).
Some of the slides I, Casey, find useful or interesting:
That’s all!
Casey
That slide on models running on GPUs or Mac silicon reminded me of the first time our Apple hardware dev team had to buy a Windows machine with an RTX card to use NVIDIA Omniverse software, because Apple doesn't play nice with NVIDIA GPUs these days and has gone its own way with its silicon.
I wonder if Apple will let NVIDIA external GPUs play nicely with Apple hardware in the future. It may not make sense competitively, but dev teams often prefer Apple toolchains while also wanting to co-develop with NVIDIA software. My guess is that we will see NVIDIA external GPUs for Apple silicon to reduce friction for enterprise and creative dev teams, especially since NVIDIA and Apple are allied on OpenUSD; neither would want software dev velocity to be hindered.
Thanks as always Casey.
I agree with Coatue's approach of focusing on the application layer. This isn't the same paradigm as focusing on the platforms, as in the last 15 years of B2B SaaS - so much about the models is open that new foundational models are evolving constantly.
I believe value is going to start accruing to the application of this technology to solve problems, rather than trying to capture a sliver of the value that the underlying LLM APIs provide.
I'm already building my product architecture in a manner that allows me to switch out the foundational model without too many changes. THAT'S A HUGE DEAL when it comes to evaluating where value is going to accrue in this AI phase in tech.
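For anyone curious what "switch out the foundational model" can look like in practice, here's a minimal sketch of a provider-agnostic interface, assuming the openai Python client; the class and method names are illustrative rather than taken from any particular framework.

```python
# A minimal sketch of keeping the foundation model swappable:
# application code targets a tiny interface, and each
# provider-specific backend lives behind it.
from typing import Protocol

from openai import OpenAI


class CompletionModel(Protocol):
    def complete(self, prompt: str) -> str: ...


class OpenAIModel:
    """Backend for OpenAI chat models."""

    def __init__(self, model: str = "gpt-4"):
        self.client = OpenAI()
        self.model = model

    def complete(self, prompt: str) -> str:
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content


class EchoModel:
    """Stand-in for any other backend (local model, another API, a test stub)."""

    def complete(self, prompt: str) -> str:
        return f"[stub response to: {prompt[:40]}...]"


def summarise(model: CompletionModel, text: str) -> str:
    # Application code only knows about CompletionModel,
    # so swapping providers is a one-line change at the call site.
    return model.complete(f"Summarise in one sentence:\n\n{text}")


# summarise(OpenAIModel(), some_text)
# summarise(EchoModel(), some_text)  # same call, different backend
```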