Thanks to AI, you can now Whisper to a LLaMA



Summary

Alexa, Siri, and similar voice assistants have long been touted as the next big thing in computer interfaces, but so far, they have not lived up to expectations. Large language models could change that.

To take Alexa and Co. to the next level, we need (at least) four advances:

  • reliable and flexible speech recognition,
  • authentic voice output,
  • flexible, consistent conversational interaction, and
  • the ability to infer tasks from requests and carry them out.

Items two through four could be addressed by the new generation of ChatGPT-like language models: they can sustain credible, coherent dialog, produce voice output via speech model APIs, and perform complex tasks via plugins or code. They are already far more competent and flexible than anything Alexa, Siri, and the like currently offer.

While tool-using language models are still in their infancy, AI models such as OpenAI's Whisper have already made reliable speech recognition a practical reality.


Whisper meets LLaMA

Developer Georgi Gerganov's "LLaMA Voice Chat" offers a taste of a next-generation assistant built entirely on open-source technology. Gerganov has ported OpenAI's speech recognition model Whisper to C/C++ and can run it on Apple's Neural Engine. The video below shows it in action on an iPhone 13.

Video: Gerganov

According to Gerganov, his Whisper implementation is efficient enough to run smoothly on many platforms: from iOS and Android to the Raspberry Pi, and even in the browser via WebAssembly.

On its own, this is "just" high-quality speech transcription. Combined with other software, such as a large language model, it becomes a voice interface. For demonstration purposes, Gerganov uses Meta's LLaMA language model, which responds to the user's speech after Whisper has transcribed it into text.
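
The two-stage idea (speech in, text to the language model, reply out) can be sketched as a simple pipeline. This is a minimal illustration, not Gerganov's code: the `transcribe`, `respond`, and `speak` stages are hypothetical callables standing in for a Whisper port, a LLaMA runtime, and an optional speech synthesizer, and the stubs below merely fake them so the sketch runs without any model weights.

```python
from typing import Callable, Optional

# Hypothetical stage types: stand-ins for a speech recognizer (e.g. a Whisper
# port), a language model (e.g. LLaMA), and optional voice output.
# None of these are real library APIs.
Transcriber = Callable[[bytes], str]
Responder = Callable[[str], str]
Speaker = Callable[[str], None]


def voice_turn(audio: bytes, transcribe: Transcriber, respond: Responder,
               speak: Optional[Speaker] = None) -> str:
    """Run one assistant turn: recorded audio -> transcript -> model reply."""
    transcript = transcribe(audio).strip()
    if not transcript:
        return ""  # nothing recognized, so skip the language model entirely
    reply = respond(transcript).strip()
    if speak is not None:
        speak(reply)  # optional voice output stage
    return reply


def fake_transcribe(audio: bytes) -> str:
    # Stub recognizer: pretends the "audio" bytes are already UTF-8 text.
    return audio.decode("utf-8")


def fake_respond(text: str) -> str:
    # Stub model: echoes the transcript instead of generating an answer.
    return f"You asked: {text}"


if __name__ == "__main__":
    print(voice_turn(b"  what is a llama?  ", fake_transcribe, fake_respond))
```

Keeping each stage behind a plain function signature is one way to swap components (a different recognizer, a different model, or a text-to-speech engine) without touching the loop itself.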

In the prompt, Gerganov instructs LLaMA to act as a "helpful, friendly, and honest" assistant that writes well and gives direct, detailed answers. The following video shows the Whisper-LLaMA combination in action.
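
A persona prompt of this kind is typically plain text placed in front of the running conversation. The sketch below assumes that approach; the persona wording is only paraphrased from the description above, and the `User:`/`Assistant:` labels and the `build_prompt` helper are hypothetical, not taken from Gerganov's project.

```python
from typing import List, Tuple

# Assumed persona text, paraphrasing the article's description of the prompt;
# the exact wording used in LLaMA Voice Chat may differ.
PERSONA = (
    "A chat between a user and an assistant. The assistant is helpful, "
    "friendly, and honest, writes well, and gives direct, detailed answers."
)


def build_prompt(history: List[Tuple[str, str]], user_text: str,
                 user_name: str = "User", bot_name: str = "Assistant") -> str:
    """Hypothetical helper: persona + earlier turns + the new transcribed
    user line, ending with the assistant's label so a text-completion model
    continues the conversation in the assistant's voice."""
    lines = [PERSONA, ""]
    for speaker, text in history:
        lines.append(f"{speaker}: {text}")
    lines.append(f"{user_name}: {user_text}")
    lines.append(f"{bot_name}:")
    return "\n".join(lines)


if __name__ == "__main__":
    past = [("User", "Hi!"), ("Assistant", "Hello! How can I help?")]
    print(build_prompt(past, "What is Whisper?"))
```

Ending the prompt with the assistant's label is a common pattern for base language models that simply continue text: the model's next tokens become the assistant's reply, which the application would then cut off at the next `User:` line.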



