A hot potato: As more companies jump on the AI bandwagon, the energy consumption of AI models is becoming an urgent concern. While the most prominent players – Nvidia, Microsoft, and OpenAI – have downplayed the situation, one company claims it has come up with the solution.
Researchers at BitEnergy AI have developed a technique that could dramatically reduce AI power consumption without sacrificing much accuracy or speed. The study claims the method could cut energy usage by up to 95 percent. The team calls the breakthrough Linear-Complexity Multiplication, or L-Mul for short. The technique replaces the floating-point multiplications used in AI-related tasks with integer additions, which require much less energy and fewer computational steps.
Floating-point numbers are used extensively in AI computations to handle very large and very small values. The format is essentially scientific notation in binary, letting AI systems carry out complex calculations with precision. However, that precision comes at a cost.
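To make that format concrete, here is a minimal sketch (not from the study) that unpacks a float32 value into the sign, exponent, and mantissa fields IEEE 754 uses to encode that binary scientific notation:

```python
import struct

# Decompose 6.75 into the three IEEE-754 float32 fields:
# a sign bit, an 8-bit biased exponent, and a 23-bit mantissa.
bits = struct.unpack("<I", struct.pack("<f", 6.75))[0]

sign     = bits >> 31                       # 0 -> positive
exponent = ((bits >> 23) & 0xFF) - 127      # 2, i.e. a factor of 2**2
mantissa = 1 + (bits & 0x7FFFFF) / 2**23    # 1.6875

print(sign, exponent, mantissa)             # 6.75 == 1.6875 * 2**2
```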
The growing energy demands of the AI boom have reached a concerning level, with some models requiring vast amounts of electricity. For example, ChatGPT reportedly consumes 564 MWh of electricity daily, roughly as much as 18,000 US homes. Analysts at the Cambridge Centre for Alternative Finance estimate that the AI industry could consume between 85 and 134 TWh annually by 2027.
The L-Mul algorithm tackles this energy drain by approximating complex floating-point multiplications with simpler integer additions. In testing, AI models maintained accuracy while cutting energy consumption by 95 percent for tensor multiplications and 80 percent for dot products.
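The published algorithm has its own correction terms, but the underlying trick can be illustrated with a simpler, well-known approximation: because a float's bit pattern already stores something close to the number's logarithm, adding two bit patterns as integers (and subtracting the exponent bias once) roughly multiplies the values. The sketch below is purely illustrative and is not BitEnergy AI's exact method:

```python
import struct

BIAS = 127 << 23  # float32 exponent bias, shifted into its bit position

def float_to_bits(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_float(b: int) -> float:
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]

def approx_mul(a: float, b: float) -> float:
    """Approximate a * b for positive floats with a single integer addition."""
    return bits_to_float(float_to_bits(a) + float_to_bits(b) - BIAS)

print(approx_mul(3.7, 2.5))  # ~8.8, versus the exact 9.25
```

The appeal of this family of tricks is that an integer addition costs a tiny fraction of the energy of a floating-point multiplication on modern silicon, which is where the claimed savings come from.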
The L-Mul technique also holds up well on precision. The algorithm exceeds current 8-bit floating-point standards, achieving higher precision while requiring fewer bit-level operations. Tests across a range of AI tasks, including natural language processing and machine vision, showed only a 0.07-percent average performance drop – a small tradeoff weighed against the energy savings.
Transformer-based models, like GPT, stand to benefit the most from L-Mul, as the algorithm integrates seamlessly into the attention mechanism, a crucial yet energy-intensive component of these systems. Tests on popular AI models, such as Llama and Mistral, have even shown improved accuracy on some tasks. However, there is good news and bad news.
The bad news is that L-Mul currently requires specialized hardware, as today's AI processors are not designed to take advantage of the technique. The good news is that plans for dedicated hardware and programming APIs are already in the works, paving the way for more energy-efficient AI within a reasonable timeframe.
The only other obstacle would be companies, notably Nvidia, hampering adoption efforts, which is a genuine possibility. The GPU maker has built its reputation as the go-to hardware supplier for AI applications, and it is unlikely to cede ground to more energy-efficient hardware while it holds the lion's share of the market.
For those who live for complex mathematical solutions, a preprint version of the study is posted on the arXiv repository.