New ask Hacker News story: Open Sourcing our Startup – AI-powered avatars (UE 5.2)
4 by henryobj | 1 comment on Hacker News.
Hi HN,

TL;DR: We had to shut down our startup, SPAR, and are open-sourcing the code: https://ift.tt/hfvFDrX

In 2024, we developed an AI agent infrastructure to support realistic, personality-driven AI avatars in real time. The business use case was a new training (sparring) and onboarding tool for companies, in particular companies that need to train customer-facing employees (e.g., high-end retail).

To achieve this, we orchestrated three servers:

1. The first ran a MetaHuman on Unreal Engine (5.2);
2. The second ran a custom fine-tuned open-source LLM;
3. The third handled everything else: it connected to the two servers above, streamed to the client's browser over WebRTC, and coordinated with external APIs (Text-to-Speech, Speech-to-Text, etc.).

Key features:

* Real-time interactions with distinct avatar personalities.
* Fine-tuning toolkit for customizing and refining LLM-generated dialogues.
* Structured feedback system that links actionable guidance directly to conversation points.

In the future, people will practice soft skills with AI and immersive experiences. We will not be building this future, but if you are, feel free to use our work to accelerate yours.
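For readers who want a feel for the three-server setup before opening the repo, here is a minimal sketch of one conversation turn as the coordinating (third) server might see it. It is not taken from the SPAR code: the class `Orchestrator`, its field names, and the `fake_*` stubs are all hypothetical, and the real system streams audio and video over WebRTC rather than passing byte strings between local functions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (speaker, text)


@dataclass
class Orchestrator:
    """Hypothetical stand-in for the coordinating server (not SPAR's actual code)."""
    speech_to_text: Callable[[bytes], str]            # external STT API (stubbed below)
    generate_reply: Callable[[str, List[Turn]], str]  # LLM server (stubbed below)
    text_to_speech: Callable[[str], bytes]            # external TTS API (stubbed below)
    send_to_avatar: Callable[[bytes], None]           # Unreal Engine server (stubbed below)
    history: List[Turn] = field(default_factory=list)

    def handle_turn(self, user_audio: bytes) -> str:
        """Run one turn: user audio in, avatar speech out."""
        user_text = self.speech_to_text(user_audio)
        reply = self.generate_reply(user_text, self.history)
        self.history.append(("user", user_text))
        self.history.append(("avatar", reply))
        # The synthesized audio is what would drive the avatar's speech.
        self.send_to_avatar(self.text_to_speech(reply))
        return reply


# Stubs so the sketch runs without any external service.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str, history: List[Turn]) -> str:
    turn_number = len(history) // 2 + 1
    return f"(turn {turn_number}) You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


avatar_inbox: List[bytes] = []  # collects what the avatar server would receive

if __name__ == "__main__":
    spar = Orchestrator(fake_stt, fake_llm, fake_tts, avatar_inbox.append)
    print(spar.handle_turn(b"Hello, I am looking for a watch"))
```

Passing the four services in as plain callables keeps the coordinating logic testable on its own; swapping a stub for a real HTTP or WebRTC client would not change `handle_turn`.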