Large language models (LLMs) can enable more realistic dialogue and interaction with non-player characters (NPCs) in video games.
LLMs can generate in-game dialogue that takes into account the context of the game and the player’s interactions in real time.
At CES 2024, Nvidia and Convai, a startup specializing in real-time generative conversations in virtual worlds, showed an updated version of a demo in which a player can have a real-time conversation with two non-player characters in a cyberpunk bar.
The new demo goes beyond character-to-character interaction by allowing game characters to collect items and navigate their environment based on conversations with players.
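The article does not describe how the demo turns a conversation into character behavior. One common pattern, sketched here under stated assumptions, is to have the language model emit a small structured command that the game validates before acting on it. The JSON shape, the action names `fetch_item` and `move_to`, and the function `parse_npc_action` are all hypothetical and are not taken from Nvidia's or Convai's tools.

```python
import json

# Hypothetical whitelist: the only actions the game lets an NPC perform.
ALLOWED_ACTIONS = {"fetch_item", "move_to"}


def parse_npc_action(llm_output: str):
    """Return (action, target) if the model output is a valid command, else None.

    Assumes the model was asked to answer with JSON such as
    {"action": "fetch_item", "target": "bottle"}.
    """
    try:
        data = json.loads(llm_output)
    except json.JSONDecodeError:
        return None  # free-form text, not a command
    if not isinstance(data, dict):
        return None
    action = data.get("action")
    target = data.get("target")
    if not isinstance(action, str) or not isinstance(target, str):
        return None
    if action not in ALLOWED_ACTIONS:
        return None  # the model asked for something the game does not support
    return action, target


if __name__ == "__main__":
    print(parse_npc_action('{"action": "fetch_item", "target": "bottle"}'))
    print(parse_npc_action('{"action": "open_portal", "target": "roof"}'))
```

Checking the output against a fixed list keeps the game logic in control: anything the model invents outside that list is simply ignored.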
LLM conversations are not yet a replacement for hand-written dialogue in the specific context of a larger story. But they could make the mostly superficial and irrelevant NPC dialogue in large game worlds much more personal and varied, thus increasing immersion.
Smaller teams in particular could benefit from this technology, but larger studios are interested as well: Strauss Zelnick, CEO of gaming juggernaut Take-Two, hinted that it could be an interesting use case for games like GTA.
There are still challenges to implementing LLMs in games, such as latency or the risk of the AI making false statements or generating content that is inappropriate for the game. Developers can mitigate these risks to some extent by giving the AI carefully written prompts and guidelines.
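As a rough illustration of what such prompts and guidelines could look like in code, the following minimal sketch builds a system prompt from a character description and checks each reply before it reaches the player. Everything here is an assumption made for the example: the `NpcPersona` class, the bartender character, the rules, and the simple banned-term filter are hypothetical, and a production game would likely need far more robust moderation.

```python
from dataclasses import dataclass, field


@dataclass
class NpcPersona:
    """Hypothetical description of one NPC plus its content guidelines."""
    name: str
    role: str
    setting: str
    rules: list = field(default_factory=list)
    banned_terms: list = field(default_factory=list)
    fallback_line: str = "Hm. Can't help you with that."


def build_system_prompt(persona: NpcPersona) -> str:
    """Turn the persona and its rules into a system prompt for the model."""
    lines = [
        f"You are {persona.name}, {persona.role} in {persona.setting}.",
        "Stay in character and keep replies under two sentences.",
    ]
    lines += [f"- {rule}" for rule in persona.rules]
    return "\n".join(lines)


def filter_reply(persona: NpcPersona, reply: str) -> str:
    """Replace a reply with a scripted line if it contains a banned term."""
    lowered = reply.lower()
    if any(term.lower() in lowered for term in persona.banned_terms):
        return persona.fallback_line
    return reply.strip()


if __name__ == "__main__":
    mara = NpcPersona(
        name="Mara",
        role="a bartender",
        setting="a cyberpunk bar",
        rules=["Never mention the real world.", "Do not invent quests."],
        banned_terms=["as an AI"],
    )
    print(build_system_prompt(mara))
    print(filter_reply(mara, "As an AI, I cannot serve drinks."))
```

The prompt steers the model, while the filter is a last line of defense: if the model breaks character anyway, the player sees a hand-written fallback line instead.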
Nvidia’s Avatar Cloud Engine (ACE) for generative AI characters
The demo runs on Nvidia’s Avatar Cloud Engine (ACE), a generative AI platform for more realistic dialogue and interaction in video games. Its components include NeMo for large language models, Riva for speech recognition and text-to-speech conversion, and Audio2Face for facial animation.
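Conceptually, these components form a chain: the player's speech is transcribed, a language model writes the reply, the reply is synthesized as audio, and the audio drives the facial animation. The sketch below shows only that data flow. It does not use or imitate the actual NeMo, Riva, or Audio2Face APIs; the `NpcPipeline` class and the `fake_*` stand-in functions are hypothetical placeholders for the real services.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class NpcTurn:
    """Everything produced for one player utterance."""
    player_text: str
    npc_text: str
    npc_audio: bytes
    face_frames: list


@dataclass
class NpcPipeline:
    """Chains four stages; each is a plain callable so it can be swapped out."""
    transcribe: Callable[[bytes], str]   # speech recognition
    generate: Callable[[str], str]       # language model reply
    synthesize: Callable[[str], bytes]   # text-to-speech
    animate: Callable[[bytes], list]     # audio-driven facial animation

    def run_turn(self, player_audio: bytes) -> NpcTurn:
        player_text = self.transcribe(player_audio)
        npc_text = self.generate(player_text)
        npc_audio = self.synthesize(npc_text)
        face_frames = self.animate(npc_audio)
        return NpcTurn(player_text, npc_text, npc_audio, face_frames)


# Stand-in stages so the sketch runs without any external service.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_generate(text: str) -> str:
    return f"You said: {text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


def fake_animate(audio: bytes) -> list:
    return [len(audio)]  # one dummy "frame" per call


if __name__ == "__main__":
    pipeline = NpcPipeline(fake_transcribe, fake_generate, fake_synthesize, fake_animate)
    turn = pipeline.run_turn(b"hello")
    print(turn.npc_text)  # prints "You said: hello"
```

Because the stages run one after another, their delays add up, which is one reason latency is a concern for this kind of character.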
Convai’s tools, combined with Nvidia’s ACE, can help reduce the latency and improve the quality of AI-driven non-player characters in video games.
Startup Replica has developed an “AI Voice Plugin” for the Unity and Unreal game development platforms that supports various LLMs. The plugin can lip-sync sentences generated by LLMs with the AI voices of NPCs and display them with the corresponding body language. Replica shows the plugin in action in a Matrix demo.