Training and deploying generative AI is expensive, especially if you want to scale it for millions of customers, as Microsoft does.
Last fall, reports surfaced that Microsoft was ramping up its research into smaller, more efficient AI models.
Now, according to The Information, Microsoft is doubling down on that approach. A new GenAI team is tasked with developing smaller and cheaper conversational AI. The publication cites two people with direct knowledge of the matter.
These so-called “small language models” (SLMs) are intended to mimic the quality of large language models, such as OpenAI’s GPT-4, using significantly less computing power. SLMs could be used to process simple queries to chatbots like Bing and Windows Copilot to save on computing costs.
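One way to picture this kind of cost saving is a dispatcher that sends short, simple queries to a small model and everything else to a large one. The sketch below is purely illustrative: the heuristic, the hint words, the model labels, and the relative cost figures are made-up assumptions, not anything Microsoft has described.

```python
from dataclasses import dataclass

# Hypothetical relative costs per query; placeholders for illustration only.
COST_PER_QUERY = {"small": 1.0, "large": 20.0}

# Hypothetical markers of queries that probably need the larger model.
COMPLEX_HINTS = ("explain", "analyze", "compare", "write code", "step by step")


@dataclass
class Route:
    model: str   # "small" or "large"
    cost: float  # relative cost taken from COST_PER_QUERY


def route_query(query: str, max_simple_words: int = 12) -> Route:
    """Crude heuristic: short queries without hint words go to the small model."""
    text = query.lower()
    is_short = len(text.split()) <= max_simple_words
    looks_complex = any(hint in text for hint in COMPLEX_HINTS)
    model = "small" if is_short and not looks_complex else "large"
    return Route(model=model, cost=COST_PER_QUERY[model])


queries = [
    "What time is it in Tokyo?",
    "Explain the trade-offs between two database designs step by step",
]
routes = [route_query(q) for q in queries]
total = sum(r.cost for r in routes)                    # cost with routing
baseline = COST_PER_QUERY["large"] * len(queries)      # cost if everything used the large model
print([r.model for r in routes], total, baseline)
```

A real system would presumably need a far more robust way to judge query difficulty than keyword matching; the point here is only that every query kept off the large model lowers the total bill.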
Microsoft has moved several leading AI developers from its research group to the new GenAI team, including Sebastien Bubeck, who also worked on Microsoft’s flagship SLM, Phi-2. Microsoft released Phi-2 as open source earlier this year and said it outperforms Google’s commercial SLM Gemini Nano.
The GenAI team is led by corporate vice president Misha Bilenko and reports to CTO Kevin Scott. Microsoft also has a Turing team developing large-scale language models. Turing models are used in Copilot products, sometimes in combination with OpenAI models. Again, Microsoft’s models are intended to do the easier work and save costs.
Scaling is everything in AI, but it is expensive
Scaling is the dominant theme in AI, both in model development and deployment. Models are expected to become more capable, which can lead to more computationally intensive training phases and higher costs to run the models.
At the same time, companies like Microsoft try to deploy the technology to as many people as possible as quickly as possible to achieve lock-in effects in the race for market share.
Without efficiency gains, the price spiral will only accelerate in this scenario.
According to an anonymous source cited by the Wall Street Journal, Microsoft lost more than $20 per user per month on its generative code AI GitHub Copilot in the first few months of 2023. Some users are said to have cost as much as $80 per month. Microsoft charges $10 per month.
By the fall of 2023, Microsoft research chief Peter Lee is said to have tasked “many” of the company’s 1,500 researchers with developing smaller and cheaper conversational AI systems.
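The reported figures imply a rough serving cost per user. The short calculation below works this out under one assumption that the report does not spell out: that the loss is simply serving cost minus subscription price, with no other revenue or expenses involved. The function names are illustrative.

```python
# Figures reported in the article (USD per user per month).
PRICE = 10.0            # what Microsoft charges for GitHub Copilot
AVG_LOSS = 20.0         # reported loss ("more than" this amount)
HEAVY_USER_COST = 80.0  # reported cost of the most expensive users


def implied_cost(price: float, loss: float) -> float:
    """Serving cost implied by a price and a loss, assuming loss = cost - price."""
    return price + loss


def loss_for_cost(price: float, cost: float) -> float:
    """Loss on a user with the given serving cost, under the same assumption."""
    return cost - price


avg_cost = implied_cost(PRICE, AVG_LOSS)
heavy_loss = loss_for_cost(PRICE, HEAVY_USER_COST)
print(f"Implied serving cost: more than ${avg_cost:.0f} per user per month")
print(f"Implied loss on the costliest users: up to ${heavy_loss:.0f} per user per month")
```

Under that assumption, serving a typical user cost more than three times the subscription price, and the costliest users cost eight times as much.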
AI providers are also investigating how to reduce their reliance on expensive AI chips like Nvidia’s by developing chips that are cheaper and more efficient. Nvidia’s expensive processors are a cost driver, in part because they are difficult to source.
OpenAI CEO Sam Altman, worried about a chip shortage, is said to be in talks with TSMC to start a chip company that would have OpenAI as its main customer. It could take years for these efforts to have a positive impact on costs.
Microsoft’s Lee is said to have instructed his team to use many of the 2,000 Nvidia graphics cards available to his research unit to develop more efficient AI models.
At the same time, the focus on cost efficiency must not come at the expense of quality, as this could reduce utility and slow the pace of AI adoption. Even the most powerful models, such as GPT-4, are only just good enough for many text tasks.
Since the release of GPT-4 in March 2023, ChatGPT users have repeatedly complained that the model’s performance has declined, which could be related to efficiency measures taken by OpenAI. So far, the evidence for this is only anecdotal.
With new models like GPT-4 Turbo, OpenAI is likely aiming for efficiency gains. Efficiency is reportedly the focus of OpenAI’s new generation of models, which is why the current prototype models are named after deserts.