That Andrew Ng interview is fantastic. Really liked three observations he makes.
1. Instead of AI for Visual Generation, AI for Visual Analysis is a great opportunity.
2. Instead of SaaS, opportunities for remote edge on mobile
3. Switching costs are not high between the large models themselves, but they climb once developers have lots of API hooks into e.g. Google Cloud
His comments reinforce different experiences we’ve had while spinning off a new team to explore the question: ‘Can Vision x AI x Spatial x Twins become simple daily handheld tools to accelerate allied health frontline workforce training?’
1. Regarding visual analysis AI
We’ve found this to be true while seeking utility from AI for workforce training.
GenAI can make digital twins for the trainer to explain their judgment in a mixed reality context.
But computer vision with AI is aligned with what the trainer is actually looking for: their visual reasoning, judgment and assessment. So we’re more convinced to double down on Vision x AI to drive Spatial x Twins for guiding visual analysis and judgment, rather than to focus on GenAI.
Faster generation of digital twin props does advance training. However, aligned visual reasoning has the potential to accelerate guidance and analysis for trainers of different skill levels, and eventually to become AI trainer co-pilot tools for trainees in labour-shortage industries like aged care.
He confirms where our team needs to focus. Vision x AI first. Then Spatial and Twins second.
After integrating LandingLens, we’re trying to simplify spatial computing XR interfaces for computer vision.
2. On edge
We’re finding the tests of local edge inference promising: PyTorch Ultralytics models, Apple’s on-device Core ML vision models, and Google’s on-device CV. But we do want both Eyes for Vision and a Brain to support visual reasoning.
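To make that concrete, here’s a minimal sketch of the kind of on-device detection loop we’re testing, using the Ultralytics Python package; the model weights, camera index and confidence threshold are placeholders rather than our actual configuration.

```python
# Minimal sketch: local "Eyes" on the edge with a small Ultralytics YOLO model.
# Assumes `pip install ultralytics opencv-python`; weights and threshold are placeholders.
import cv2
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # nano model, small enough for handheld/edge devices

cap = cv2.VideoCapture(0)  # camera feed from the handheld device
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # Run detection locally on each frame; no cloud round trip required.
    results = model.predict(frame, conf=0.4, verbose=False)
    for box in results[0].boxes:
        label = results[0].names[int(box.cls)]
        print(label, float(box.conf))  # hand detections off to the "Brain" / guidance layer
cap.release()
```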
3. Switching costs, stickiness and momentum
While Microsoft Kosmos is promising, we will focus on integrating Google Gemini for multimodal visual reasoning. Not because it might be better or worse, but because of Andrew’s other comment: once our devs are already building in an environment, for us Google Cloud, we’re not just weighing comparable performance between visual reasoning models but how all the other APIs are hooked in as well.
Google has now released Vertex AI Studio, with RAG, A/B testing, and cross-integration with Vertex AI Vision. That, together with Google’s shared fate indemnity supporting teams that build on Gemini, because Google is convinced it was trained on an ethical supply chain, has us leaning in.
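For the curious, the integration we’re prototyping looks roughly like the sketch below, using the google-generativeai Python SDK rather than the full Vertex AI setup; the model name, image file and prompt are illustrative assumptions, not our production configuration.

```python
# Minimal sketch: asking Gemini to reason over a photo from the handheld tool.
# Assumes `pip install google-generativeai pillow` and an API key; details are illustrative only.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # or load from an env var / secret manager
model = genai.GenerativeModel("gemini-pro-vision")  # multimodal Gemini model at time of writing

photo = Image.open("bathroom_wall.jpg")  # hypothetical frame captured by the frontline worker
prompt = (
    "You are assisting an occupational therapist. Looking at this wall beside the toilet, "
    "suggest where a safety grab rail could be mounted and what to check before installing it."
)

response = model.generate_content([prompt, photo])
print(response.text)  # visual reasoning to feed back into the guidance / training workflow
```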
Not Andrew Ng’s first rodeo. Whenever he draws a highlighter around an area, I find it worth paying attention. Our team benefits when his understanding recalibrates or reinforces our focus.
# Andrew’s vision for the mundane and useful
We’ve been listening to his vision for why he made LandingLens. I love the hope that one day even pizza makers can use computer vision in mundane yet useful ways. I’m hoping the same will be true for computer vision in health: not just for doctors in hospitals running Harrison.ai, but for boring, useful visual guidance and analysis tools for allied health occupational therapists and physios visiting our ageing parents’ homes.
I want to see boring, useful visual AI tools running on the edge, solving operational handoff problems like coordinating the installation of a safety grab rail to prevent grandma from falling when she gets back from the hospital.
Curious to hear from Casey and others about teams they’re excited about that aren’t working on RAG and SaaS, but on visual analysis, remote edge, visual reasoning and the boring operational opportunities, especially for frontline workers who are on their feet and not at desktops.
If you’re in Sydney, coffee and bubble tea is on me. We can geek out and be excited about the boring, mundane and useful.
Completely agree with the opinion from Kendra Vant. There are many AI wrappers building a friendly user interface over an API to OpenAI. That does create easy-to-use tools for developers, but it misses the two most important elements in GenAI products: 1) if they have no backend algorithm, only a front-end UI over a simple SOP, how can they control the quality of the output? 2) How do they deal with data security / privacy / ownership issues, i.e. what are their procedures if OpenAI leaks the data? https://www.wired.com/story/openai-custom-chatbots-gpts-prompt-injection-attacks/
In fact, if a product can be built in minutes by them (the promotional line used by many AI wrappers), it can be built by other companies too, so what exactly is their ‘moat’?
Unfortunately, VCs seem to buy into those ideas, as evidenced by the recent fundraising activities in the Australian AI startup ecosystem.
Another great post, thanks. I agree that so many fellow builders in the space operate on the assumption that fine-tuning is better than prompt engineering. I think it's just because it "feels" like it's an opportunity to create some moat for startups. That paper you shared suggests it's just not the right place to look.
For sure. I have done some writing on this - I've just been sitting on it for ages but will publish it soon.