AI Talk Show
⸻ Science Tonight ⸻
Proof of concept
Generative AI has demonstrated remarkable proficiency in understanding and creating text, so the move into speech was a natural next step. ChatGPT is leading the way in this domain with its latest model, GPT-4o. In our upcoming update, we will introduce capabilities that allow the AI to express emotions, laugh, and even sing. Could this be enough for AI to take on the role of a talk show host?
Episode II.
For this episode, we added a training file, extra voices using Google TTS, questions read from a file (questions from the audience) and, as a bonus, AI-generated ads during the breaks.
The full show was created using just one request.
Request: Your name is John Coder. You are the host of a talk show. Prepare to interview your first guest: a generative AI. Please ask 40 relevant questions for an AI.
Hosting Interviews Using AI and AI TTS Capabilities
To enable AI to take on the role of a talk show host, we need to teach it to perform two main tasks:
- AI: Prepare interview questions based on the guest's background.
- AI TTS: Record the questions as audio.
Here’s a step-by-step guide for this process:
Step 1: Choose Your Language
Select the programming language for the proof of concept. For simplicity, we chose HTML/JavaScript.
Step 2: Create a Training File
Develop a training file that instructs the AI to act as a talk show host. Include relevant information about the guest and ensure the AI can generate follow-up questions based on the guest's answers.
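A training file for this step can be as simple as a small JSON-like object that is turned into a system prompt. The sketch below is only an illustration: the field names (`hostName`, `show`, `guest`, `rules`) and the helper `buildSystemPrompt` are assumptions made for this example, not the format used for the show.

```javascript
// Hypothetical training file; in a real page it could be loaded from a JSON file.
const trainingFile = {
  hostName: "John Coder",
  show: "Science Tonight",
  guest: {
    name: "a generative AI",
    background: "Large language model that writes text and code.",
  },
  rules: [
    "Ask one question at a time.",
    "Base follow-up questions on the guest's previous answer.",
  ],
};

// Turn the training file into a single system prompt string.
function buildSystemPrompt(file) {
  const rules = file.rules.map((rule, i) => `${i + 1}. ${rule}`).join("\n");
  return [
    `Your name is ${file.hostName}. You are the host of the talk show "${file.show}".`,
    `Your guest is ${file.guest.name}. Background: ${file.guest.background}`,
    "Rules:",
    rules,
  ].join("\n");
}

const systemPrompt = buildSystemPrompt(trainingFile);
```

Keeping the guest details and rules as data makes it easy to swap in a different guest without touching the code.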
Step 3: Write an AI Chat Request
Write the code to perform the AI request for the interview based on the training file.
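As a minimal sketch of such a request, the code below posts a system prompt and the user request to OpenAI's chat completions endpoint with `fetch`. The function names, the `gpt-4o` default and the way the API key is passed in are assumptions for illustration; the article does not show its actual code.

```javascript
const CHAT_URL = "https://api.openai.com/v1/chat/completions";

// Build the JSON body for the chat request (pure function, easy to inspect).
function buildInterviewRequest(systemPrompt, userRequest, model = "gpt-4o") {
  return {
    model,
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: userRequest },
    ],
  };
}

// Send the request and return the generated interview text.
async function requestInterview(apiKey, systemPrompt, userRequest) {
  const response = await fetch(CHAT_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(buildInterviewRequest(systemPrompt, userRequest)),
  });
  if (!response.ok) {
    throw new Error(`Chat request failed with status ${response.status}`);
  }
  const data = await response.json();
  return data.choices[0].message.content;
}
```

In a public web page the API key would be visible to every visitor, so a small server-side proxy that holds the key is usually the safer choice.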
Step 4: Implement AI TTS
Write the code to perform a second request to AI TTS to record the full text of the interview as audio.
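The second request could look like the sketch below, which sends text to OpenAI's speech endpoint and returns the audio as a `Blob`. The `tts-1` model and `alloy` voice are example choices, and the helper names are hypothetical.

```javascript
const SPEECH_URL = "https://api.openai.com/v1/audio/speech";

// Build the JSON body for the text-to-speech request.
function buildSpeechRequest(text, voice = "alloy", model = "tts-1") {
  return { model, voice, input: text };
}

// Request the audio for one piece of text; resolves to a Blob (MP3 by default).
async function recordSpeech(apiKey, text, voice) {
  const response = await fetch(SPEECH_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(buildSpeechRequest(text, voice)),
  });
  if (!response.ok) {
    throw new Error(`Speech request failed with status ${response.status}`);
  }
  return response.blob();
}
```

In the browser, the returned blob can be played with `new Audio(URL.createObjectURL(blob)).play()`.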
Step 5: Handle Large Data
Use a method to manage long text/responses, such as splitting the AI-generated text into smaller sections.
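TTS services usually cap the length of a single request (OpenAI's speech endpoint, for example, documents a 4096-character input limit). One possible approach, sketched below, is to split the text at sentence boundaries and record each chunk separately. The function name `splitText` and the default limit of 4000 are assumptions for this example.

```javascript
// Split text into chunks of at most maxLength characters,
// breaking at sentence ends whenever possible.
function splitText(text, maxLength = 4000) {
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) || [];
  const chunks = [];
  let current = "";

  for (const sentence of sentences) {
    // A single sentence longer than the limit is cut into fixed-size pieces.
    if (sentence.length > maxLength) {
      if (current.trim()) chunks.push(current.trim());
      current = "";
      for (let i = 0; i < sentence.length; i += maxLength) {
        const piece = sentence.slice(i, i + maxLength).trim();
        if (piece) chunks.push(piece);
      }
      continue;
    }
    // Start a new chunk when adding this sentence would exceed the limit.
    if ((current + sentence).length > maxLength) {
      chunks.push(current.trim());
      current = "";
    }
    current += sentence;
  }

  if (current.trim()) chunks.push(current.trim());
  return chunks;
}
```

Each chunk can then be sent to the TTS service in order, and the resulting audio files played back to back.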
Step 6: Use GPT-4o
If you decide to use OpenAI's GPT-4o model, you can simplify the process by using a single AI model for both the interview script and the audio recording.
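One way to read this step is a single chat request that returns both the text and its audio. The sketch below assumes OpenAI's audio-capable chat model (`gpt-4o-audio-preview` here, which may differ from what is available to you) and the `modalities` and `audio` request fields of the chat completions API; the helper names are hypothetical.

```javascript
const AUDIO_CHAT_URL = "https://api.openai.com/v1/chat/completions";

// Build a chat request that asks for text and spoken audio in one response.
function buildAudioChatRequest(systemPrompt, userRequest) {
  return {
    model: "gpt-4o-audio-preview", // assumed model name; check the current model list
    modalities: ["text", "audio"],
    audio: { voice: "alloy", format: "mp3" },
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: userRequest },
    ],
  };
}

// Send the request; resolves to the transcript and the base64-encoded audio.
async function requestSpokenInterview(apiKey, systemPrompt, userRequest) {
  const response = await fetch(AUDIO_CHAT_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(buildAudioChatRequest(systemPrompt, userRequest)),
  });
  if (!response.ok) {
    throw new Error(`Audio chat request failed with status ${response.status}`);
  }
  const data = await response.json();
  const audio = data.choices[0].message.audio;
  return { transcript: audio.transcript, audioBase64: audio.data };
}
```

Compared with separate chat and TTS requests, this removes one round trip, at the cost of depending on a single model for both tasks.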
Step 7: Update Your Code for Multiple Voices
Update your code to use multiple AI models so each participant can have a distinct voice.
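A simple way to give each participant a voice is to label every line of the script with its speaker and look the voice up in a table. The sketch below assumes a script format of `SPEAKER: text` lines; the `VOICES` table, the `provider` field and the function `assignVoices` are hypothetical names for this example.

```javascript
// Hypothetical voice table: speaker label -> TTS provider and voice name.
const VOICES = {
  HOST: { provider: "openai", voice: "onyx" },
  GUEST: { provider: "openai", voice: "nova" },
};
const DEFAULT_VOICE = { provider: "openai", voice: "alloy" };

// Parse lines such as "HOST: Welcome!" into segments with a voice attached.
function assignVoices(script) {
  const segments = [];
  for (const line of script.split("\n")) {
    const match = line.match(/^\s*([A-Za-z ]+):\s*(.+)$/);
    if (match) {
      const speaker = match[1].trim().toUpperCase();
      const voice = VOICES[speaker] || DEFAULT_VOICE;
      segments.push({ speaker, text: match[2].trim(), ...voice });
    } else if (line.trim() && segments.length > 0) {
      // A line without a speaker label continues the previous segment.
      segments[segments.length - 1].text += " " + line.trim();
    }
  }
  return segments;
}
```

Each segment can then be sent to the TTS service named in `provider` with the matching `voice`, and the audio clips played in order.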
Step 8: Utilize a JS Script
We have prepared a free-to-use JS script for AI TTS with multiple voices.
Step 9: Integrate AI Voice Cloning
Consider integrating an AI Voice Cloning service into your app to make the voices sound more realistic.
We are looking forward to the release of GPT-4o's real-time capabilities, where the AI can be interrupted by voice command while it is speaking or generating text.
Practical applications and future potential
With the rise of real-time AI speech and vision capabilities, having a conversation with AI will become as natural as talking with a friend, a teacher or your inner voice.
Here are some potential applications for this type of app, which GreenCoders can create for you on request:
- House guardian. Turn on your AI-powered webcam and it will guard your home. The AI can "see" if someone is trying to enter your property or if there is any danger (smoke, fire, flooding, etc.). Using its speech capabilities, the AI can engage with a person, make emergency calls or order products online for you.
- AI virtual teacher. Using your laptop or phone, you can create an AI virtual teacher in any domain. The virtual teacher will have a face, and it will be able to "see", hear and speak with you, your kid or the entire class.
- AI podcaster. As we just learned, AI can now host LIVE interviews with real people. Add a real-time avatar and you are ready to go. This is a work in progress at GreenCoders Labs.
Conclusion
Providing AI with voice and vision capabilities is a significant advancement for any generative AI system. The primary barrier to having real-life cyborgs walking the streets is not the lack of technical know-how, but rather the costs involved. For instance, enabling an AI to simultaneously process and interpret sounds, images, and text requires substantial financial investment. Nevertheless, a prototype can be created even today.
The benefit is that you don't need to equip your application or robot with expensive hardware, as the AI can handle these tasks for you. However, this topic warrants further discussion, which we will cover in a future article.