Today's AI ecosystem is unsustainable for most everyone but Nvidia, warns top scholar

The economics of artificial intelligence are unsustainable for just about everyone other than GPU chip-maker Nvidia, and that poses a big problem for the new field's continued development, according to a noted AI scholar.

"The ecosystem is incredibly unhealthy," said Kai-Fu Lee in a private discussion forum earlier this month. Lee was referring to the profit disparity between, on the one hand, makers of AI infrastructure, including Nvidia and Google, and, on the other hand, the application developers and companies that are supposed to use AI to reinvent their operations.

Lee, who worked at Apple, served as founding director of Microsoft Research Asia, and later worked at Google, founded his current company, Sinovation Ventures, to fund startups such as 01.AI, which makes a generative AI search engine called BeaGo.

Lee's remarks were made during the Collective[i] Forecast, an interactive discussion series organized by Collective[i], which bills itself as "an AI platform designed to optimize B2B sales."

Today's AI ecosystem, according to Lee, consists of Nvidia, and, to a lesser extent, other chip makers such as Intel and Advanced Micro Devices. Collectively, the chip makers rake in $75 billion in annual chip sales from AI processing. "The infrastructure is making $10 billion, and apps, $5 billion," said Lee. "If we continue in this inverse pyramid, it's going to be a problem," he said.

The "inverse pyramid" is Lee's phrase for the unprecedented reversal of classic tech industry economics. Traditionally, application makers make more money than the chip and system vendors that supply them. For example, Salesforce makes more money from CRM applications than do Dell and Intel, which build the computers and chips, respectively, that run the CRM applications in the cloud.

Such healthy ecosystems, said Lee, "are developed so that apps become more successful, they bring more users, apps make more money, infrastructure improves, semiconductors improve, and goes on." That's how things played out not only in the cloud, said Lee, but also in mobile computing, where the fortunes of Apple and ARM have produced winners at the "top of the stack" such as Facebook's advertising business.

Conversely, "When the apps aren't making money, the users aren't getting as much benefit, then you don't form the virtuous cycle."

Returning to the present, Lee bemoaned the lopsided nature of Nvidia's marketplace. "We'd love for Nvidia to make more money, but they can't make more money than apps," he said, referring to AI apps.

Ecosystems like those that developed around the cloud, personal computers, and mobile "are clearly not going to happen today" at the current rate of spending on Nvidia GPUs, said Lee. "The cost of inference has to get lower" for a healthy ecosystem to flourish, he said. "GPT-4o1 is wonderful, but it's very expensive."

Lee came to the event with more than a warning, however, offering a "pragmatic" recommendation that he said could resolve the unfortunate economic reality. He recommended that companies build their own vertically integrated tech stack the way Apple did with the iPhone, in order to dramatically lower the cost of generative AI.

Lee's striking assertion is that the most successful companies will be those that build most of the generative AI components -- including the chips -- themselves, rather than relying on Nvidia. He cited how Apple's Steve Jobs pushed his teams to build all the parts of the iPhone, rather than waiting for technology to come down in price.

"We're inspired by the iPhone," said Lee of BeaGo's efforts. "Steve Jobs was daring and took a team of people from many disciplines -- from hardware to iOS to drivers to applications -- and decided, these things are coming together, but I can't wait until they're all industry-standard because by then, anybody can do it," explained Lee.

The BeaGo app, said Lee, was not built on standard components such as OpenAI's GPT-4o1, or Meta Platforms's Llama 3. Rather, it was assembled as a collection of hardware and software developed in concert.

"Through vertical integration, [we designed] special hardware that wouldn't work for necessarily other inference engines," explained Lee. For example, while a GPU chip is still used for prediction-making, it has been enhanced with more main memory, known as high-bandwidth memory (HBM), to optimize the caching of data.

The software used for BeaGo is "not a generic model." Without disclosing technical details, Lee said the generative AI large language model is "not necessarily the best model, but it's the best model one could train, given the requirement for an inference engine that only works on this hardware, and excels at this hardware, and models that were trained given that it knows it would be inference on this hardware."

Building the application, including the hardware and the novel database to cache query results, has cost BeaGo and its backers $100 million, said Lee. "You have to go back to first principles, and say, 'We want to do super fast inference at a phenomenally lower cost, what approach should we take?' "

Lee demonstrated how BeaGo can call up a single answer to a question in the blink of an eye. "Speed makes all the difference," he said, comparing it to Google's early days when the new search engine delivered results much faster than established engines such as Yahoo!

A standard foundation model such as Meta's Llama 3.1 405B, said Lee, "will not even come close to working out for this scenario." Not only is BeaGo able to achieve a greater speed of inference -- the time it takes to return a prediction in response to a search query -- but it's also dramatically cheaper, said Lee.

Today's standard inference cost using a service such as OpenAI's GPT-4 is $4.40 per million tokens, noted Lee. That equates to 57 cents per query -- "still way too expensive, still 180 times more expensive than the cost of non-AI search," explained Lee.

He was comparing the cost to Google's standard cost per query, which is estimated to be three-tenths of one cent per query.

The cost for BeaGo to serve queries is "close to one cent per query," he said, "so, it's incredibly inexpensive."
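
As a rough check on those figures, the arithmetic can be sketched in a few lines of Python. The dollar amounts are the ones quoted above; the tokens-per-query number is only what those amounts imply, not something Lee stated, and the function names are illustrative.

```python
# Figures quoted in the article, in US dollars.
GPT4_PRICE_PER_MILLION_TOKENS = 4.40  # quoted GPT-4 inference price
GPT4_COST_PER_QUERY = 0.57            # quoted cost of one AI search query
GOOGLE_COST_PER_QUERY = 0.003         # estimated cost of one non-AI search query
BEAGO_COST_PER_QUERY = 0.01           # "close to one cent per query"


def implied_tokens_per_query(cost_per_query: float, price_per_million: float) -> float:
    """Tokens a query would have to consume for the two quoted figures to agree."""
    return cost_per_query / price_per_million * 1_000_000


def cost_ratio(cost_a: float, cost_b: float) -> float:
    """How many times more expensive cost_a is than cost_b."""
    return cost_a / cost_b


if __name__ == "__main__":
    tokens = implied_tokens_per_query(GPT4_COST_PER_QUERY, GPT4_PRICE_PER_MILLION_TOKENS)
    print(f"Implied tokens per GPT-4 query: {tokens:,.0f}")
    print(f"GPT-4 vs. non-AI search: {cost_ratio(GPT4_COST_PER_QUERY, GOOGLE_COST_PER_QUERY):.0f}x")
    print(f"GPT-4 vs. BeaGo: {cost_ratio(GPT4_COST_PER_QUERY, BEAGO_COST_PER_QUERY):.0f}x")
    print(f"BeaGo vs. non-AI search: {cost_ratio(BEAGO_COST_PER_QUERY, GOOGLE_COST_PER_QUERY):.1f}x")
```

On these numbers, a 57-cent query works out to roughly 130,000 tokens at the quoted price. The gap with non-AI search comes to about 190 times, in the same range as the 180 times Lee cited, and BeaGo's one-cent query would still cost a little over three times the estimated Google figure.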

The example of BeaGo, argued Lee, shows "what needs to happen to catalyze the [AI] app ecosystem [is] not going to happen by just sitting here using the newest OpenAI API, but by someone who dares to go deep and do that vertical integration."

Lee's dour overview of the present contrasts with his conviction that generative AI will enable a new ecosystem that is ultimately as fruitful as the PC, cloud, and mobile eras.

"Over the next two years, all the apps will be re-written, and they will provide value for the end user," said Lee. "There will be apps that didn't exist before, devices that didn't exist before, business models that didn't exist before."

Each step of that development, said Lee, "will lead to more usage, more users, richer data, richer interaction, more money to be made." Those users "will demand better models and they will bring more business opportunities," he said.

"It took the mobile industry 10 years to build [a successful ecosystem]," he said. "It took the PC industry perhaps 20 years to build it; I think, with Gen AI, maybe, two years."

Lee offered his thoughts on what the consumer and enterprise use cases will look like if generative AI plays out successfully. For consumers, he said, the smartphone model of today most likely will go away.

"The app ecosystem is really just the first step because once we start communicating with devices by speech, then the phone really isn't the right thing anymore because we are wanting to be always listening, always on, and phones are not."

As for app stores, said Lee, "they'll be gone because agents will directly do things that we want, and a lot of apps and e-commerce -- that will change a lot, but that's later."

The path for enterprise use of generative AI is going to be much more difficult than the consumer use case, hypothesized Lee, because of factors such as the entrenched nature of the business groups inside companies, as well as the difficulty of identifying the areas that will truly reap a return on investment.

"Enterprise will go slower," he said, "because CIOs are not necessarily fully aligned with, and not necessarily fully knowledgeable about, what Gen AI can do."

Likewise, hooking up generative AI to data stored in ERP and CRM systems, said Lee, "is very, very tough." The "biggest blocker" of Gen AI implementation, said Lee, "is people who are used to doing things one way and aren't necessarily ready to embrace" new technological approaches.

Assuming those obstacles can be surmounted, said Lee, early projects in Gen AI, such as automating routine processes, are "good places to start, but I would also say, these are not the best points to create the most value.

"Ultimately, for enterprises, I think Gen AI should become the core brain of the enterprise, not these somewhat peripheral things. For an energy company, what's core is drilling oil, right?" Lee offered. "For a financial institution, what's core is making money."

What should result, he said, is "a smaller, leaner group of leaders who are not just hiring people to solve problems, but delegating to smart enterprise AI for particular functions -- that's when this will make the biggest deal."

"What's really core is not just to save money," said Lee, "but to make money, and not just any money, but to make money in the core, strategic part of the company's business."
