Discussion about this post

Oren Bahari

Why is the Operator/visual workflow data important? In my mind, the DeepSeek R1 paper argued that R1-Zero's contribution, self-reinforcement on evaluable objectives, mattered more than the human reasoning-chain SFT layered on top. Models with richer reasoning-chain data are still better, but why Operator, then? My intuition is that the data on Airbnb's feedback form page is low signal, especially when you lack the primitives for how it's built. Am I missing something?
