New ask Hacker News story: Ask HN: Is synthetic data generation practical outside academia?
2 by cpard | 0 comments on Hacker News.
I keep seeing synthetic data pipelines powering the latest LLM “breakthroughs”:

• TinyZero’s $30 fine-tuning workflow
• Sky-T1’s $450 reasoning-model build
• Meta AI’s Llama 3 herd (2024 paper detailing their synthetic-data training)
• Berkeley OpenThoughts (“Data Recipes for Reasoning Models”), published yesterday

There are also open-source toolkits you can experiment with:

https://ift.tt/Uyx2NuT
https://ift.tt/PRBFX61

But it still feels very research-oriented. I haven’t found many examples of these pipelines running in real-world products.

I’m curious:

1. Who is using synthetic-data pipelines in production today?
2. What tasks does it actually improve, e.g. fine-tuning smaller models for specific tasks?

Any real-world stories, pointers, or further reading would be hugely appreciated. Thanks!