Want the power of an AI assistant without relying on the cloud? This is your guide to building a local voice assistant using Verbi, an open-source project that puts you in control.
🗝️ Why Go Local? 🤔
Imagine having a voice assistant that:
- Respects your privacy: Your conversations stay on your device. 🤫
- Works offline: No internet? No problem! 📶🚫
- Responds instantly: Lightning-fast responses without server lag. ⚡
That’s the power of going local!
🏗️ Building Your Local Voice Assistant: The 3 Pillars 🏛️
Think of your voice assistant like a relay race:
- 👂 Speech to Text (Fast Whisper API): Converts your spoken words into text.
- 🧠 Language Model (Ollama): Understands the text and generates responses.
- 🗣️ Text to Speech (MeloTTS): Transforms the responses back into spoken words.
Let’s break down how to set up each component locally:
1. 👂 From Sounds to Words: Fast Whisper API 💨
- What it does: Transcribes your voice into text with impressive accuracy.
- Why it’s cool: Built on faster-whisper, an optimized reimplementation of OpenAI’s Whisper model, known for its speed and efficiency.
- How to set it up:
- Clone the Fast Whisper API repository.
- Install the necessary packages.
- Run the API to start the server.
💡Pro Tip: Create a virtual environment to keep your project dependencies organized.
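Once the server is running, transcription is just an HTTP POST with your audio file attached. Here's a standard-library sketch that builds such a request — the port, route, and form-field name are assumptions in the style of OpenAI-compatible APIs, so check the Fast Whisper API README for the real values:

```python
import io
import os
import urllib.request
import uuid

def build_transcription_request(audio_path: str,
                                base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build (but don't send) a multipart POST for a local transcription server."""
    boundary = uuid.uuid4().hex
    with open(audio_path, "rb") as f:
        audio = f.read()
    body = io.BytesIO()
    body.write(f"--{boundary}\r\n".encode())
    body.write(
        f'Content-Disposition: form-data; name="file"; '
        f'filename="{os.path.basename(audio_path)}"\r\n'.encode()
    )
    body.write(b"Content-Type: audio/wav\r\n\r\n")
    body.write(audio)
    body.write(f"\r\n--{boundary}--\r\n".encode())
    return urllib.request.Request(
        f"{base_url}/v1/transcriptions",  # assumed route -- verify in the repo's README
        data=body.getvalue(),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Send it with `urllib.request.urlopen(...)` once the server is up.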
2. 🧠 The Brain: Running LLMs Locally with Ollama 🧠
- What it does: Acts as the brain of your assistant, understanding your requests and generating responses.
- Why it’s cool: Lets you run powerful language models like LLaMA on your own hardware.
- How to set it up:
- Install Ollama on your machine.
- Download your desired language model (e.g., Llama 3 8B via `ollama pull llama3`).
- Start the Ollama server so it can handle requests.
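Ollama exposes a small REST API on port 11434 by default. Here's a minimal, standard-library sketch of a non-streaming generation request — the endpoint and payload fields follow Ollama's documented API:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_ollama_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    """Build (but don't send) a non-streaming generate request for a local Ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload.encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To actually send it (requires a running Ollama server):
# with urllib.request.urlopen(build_ollama_request("Hello!")) as resp:
#     print(json.loads(resp.read())["response"])
```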
🤯 Fun Fact: LLaMA stands for “Large Language Model Meta AI.”
3. 🗣️ Giving Your Assistant a Voice: MeloTTS 🎶
- What it does: Converts text responses into natural-sounding speech.
- Why it’s cool: Offers a variety of voices to choose from, making your assistant more personable.
- How to set it up:
- Clone the MeloTTS repository.
- Install MeloTTS as a package.
- Download the speech model files.
- Run the MeloTTS API endpoint.
💡Pro Tip: Experiment with different speaker IDs to find a voice you like!
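With MeloTTS installed, synthesis is only a few lines of Python. The call pattern below follows the MeloTTS README — treat it as a sketch, and adjust the language and speaker ID to taste:

```python
def synthesize(text: str, speaker: str = "EN-US", out_path: str = "reply.wav") -> str:
    """Render `text` to a WAV file with MeloTTS (requires the package installed)."""
    from melo.api import TTS  # imported lazily so this module loads without MeloTTS

    model = TTS(language="EN", device="cpu")  # use "cuda" if you have a GPU
    speaker_ids = model.hps.data.spk2id       # swap IDs here to try different voices
    model.tts_to_file(text, speaker_ids[speaker], out_path, speed=1.0)
    return out_path
```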
🧩 Putting It All Together: Configuring Verbi ⚙️
- 1. Configuration is Key: Open the `config.py` file in Verbi and update the following:
  - Transcription Model: Set to `fastwhisperapi`
  - Response Model: Set to `ollama`
  - Text-to-Speech Model: Set to `melotts`
- 2. Start Talking: Run the `run_voice_assistant.py` script.
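Put together, the relevant part of `config.py` might look like the fragment below — the variable names here are assumptions for illustration, so match whatever names the file already defines:

```python
# config.py (Verbi) -- pick the local backend for each stage of the relay
TRANSCRIPTION_MODEL = "fastwhisperapi"  # local Fast Whisper API server
RESPONSE_MODEL = "ollama"               # local model served by Ollama
TTS_MODEL = "melotts"                   # local MeloTTS
```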
🎉 Congratulations! You’ve built your very own local voice assistant!
🧰 Resource Toolbox 🧰
- Verbi GitHub: https://github.com/PromtEngineer/Verbi
- Fast Whisper API: https://github.com/3choff/FastWhisperAPI.git
- MeloTTS Installation: https://github.com/myshell-ai/MeloTTS/blob/main/docs/install.md#linux-and-macos-install
- Ollama: https://github.com/ollama/ollama (installation and model-download instructions are in the README)
This setup provides a solid foundation for a local voice assistant. Remember, the world of open-source is constantly evolving, so keep exploring and experimenting to create an assistant that truly meets your needs!