Microsoft AI chief Mustafa Suleyman says conversational AI is the next web browser

Today, I’m talking with Mustafa Suleyman, the CEO of Microsoft AI. Mustafa is a fascinating character in the world of AI — he’s been in and out of some pivotal companies. He was one of the cofounders of DeepMind, which got acquired by Google in 2014, then became a Google VP for several years before leaving in 2022 to found another AI startup, Inflection.

Then, earlier this year, Inflection cut a deal with Microsoft to license its core technology in a weird and kind of controversial not-quite-acquisition situation, one that sent Mustafa, his cofounder, and a majority of their employees into Microsoft. 

As CEO of Microsoft AI, Mustafa now oversees all of its consumer AI products, including the Copilot app, Bing, and even the Edge browser and MSN — two core components of the web experience that feel like they’re radically changing in a world of AI. That’s a lot — and a lot of Decoder bait, since I’m always fascinated by Microsoft’s org chart and all the little CEOs that report to Satya Nadella, and of course, I’m obsessed with what AI might do to the web at large. I also asked Mustafa to compare and contrast working at Microsoft and Google since he has direct experience at both, and his answer was pretty revealing.

I also wanted to ask Mustafa about AI training and the data it requires. He’s caught some heat for describing content on the web as “freeware” before, and Microsoft and OpenAI are in the middle of major copyright lawsuits about training data. I’m curious how AI companies are thinking about the risky and seemingly uncertain legal foundations of their work, and I wanted to know how Mustafa was thinking about it now.

But before we got into all that, I needed to ask about AGI, or artificial general intelligence. That’s the idea that these AI systems will be able to handle tasks as well as a human — or even better, in some cases. Sam Altman at OpenAI — which, again, is a huge partner with Microsoft for this stuff — has said he thinks AGI is achievable on our current computing hardware. In his most recent comments, he seemed to lower the bar for how he defines AGI entirely — which makes it easier to argue that it will arrive sooner than most think. On top of that, there’s a lot of reporting that says OpenAI can get out of its Microsoft deal when it achieves AGI, so he’s got a lot of incentives to say it’s happening.

I asked Mustafa straight out if he agrees with Altman and if AGI is achievable on current hardware — because if the answer is yes, then maybe a bunch of org chart questions are a little secondary. You’ll hear him be optimistic but on a much longer timeframe — and you’ll also hear him pull away from the idea of AGI being a superintelligence, which feels like another kind of redefinition.

There’s a lot here — including a discussion of what I’ve started calling the DoorDash problem. You’ll see what I mean.

Okay, Microsoft AI CEO Mustafa Suleyman. Here we go.

This transcript has been lightly edited for length and clarity.

Mustafa Suleyman, you are the CEO of Microsoft AI. Welcome to Decoder.

Great to be with you.

I’m very excited to talk to you. I have a lot of questions for you about how Microsoft AI is structured within Microsoft, what it means to be the CEO of Microsoft AI (at a company that appears to be all about AI lately), how you make decisions — all the Decoder stuff. I’m going to start hot out of the gate. I hope you’re ready for this because I realize that if you answer one way, this whole interview goes in a different direction. So, very recently, Sam Altman said in a Reddit AMA that he thinks we can achieve artificial general intelligence (AGI) on current hardware. Do you think that’s possible?

What does current hardware mean?

Within one or two generations of what we have now, I would say.

I don’t think it can be done on [Nvidia] GB200s. I do think it is going to be plausible at some point in the next two to five generations. I don’t want to say I think it’s a high probability that it’s two years away, but I think within the next five to seven years since each generation takes 18 to 24 months now. So, five generations could be up to 10 years away depending on how things go. We really are facing increasingly tough challenges with these chips. I don’t think it’s going to be as linear in terms of its progress or cost per dollar as we’ve seen in the past. But things are accelerating very fast. So, I agree with that sentiment.

So, between two and 10 years, you think?

The uncertainty around this is so high that any categorical declarations just feel sort of ungrounded to me and over the top.

You and I have spoken several times in the past about a lot of things, and I want to follow up on all of those ideas. It just occurs to me that if we think AGI is between two and 10 years away, very much in the span of our lifetimes, maybe we shouldn’t be working on anything else. That seems like it will be a paradigm shift, right? We’re through the singularity now, there is AGI. Everything will be different on the other end of it. How do you approach that and then also think about, “Well, I need to launch the Copilot app on the iPhone”?

It depends on your definition of AGI, right? AGI isn’t the singularity. The singularity is an exponentially recursive self-improving system that very rapidly accelerates far beyond anything that might look like human intelligence. 

To me, AGI is a general-purpose learning system that can perform well across all human-level training environments. So, knowledge work, by the way, that includes physical labor. A lot of my skepticism has to do with the progress and the complexity of getting things done in robotics. But yes, I can well imagine that we have a system that can learn — without a great deal of handcrafted prior prompting — to perform well in a very wide range of environments. I think that is not necessarily going to be AGI, nor does that lead to the singularity, but it means that most human knowledge work in the next five to 10 years could likely be performed by one of the AI systems that we develop. And I think the reason why I shy away from the language around singularity or artificial superintelligence is because I think they’re very different things.

The challenge with AGI is that it’s become so dramatized that we sort of end up not focusing on the specific capabilities of what the system can do. And that’s what I care about with respect to building AI companions, getting them to be useful to you as a human, work for you as a human, be on your side, in your corner, and on your team. That’s my motivation and that’s what I have control and influence over to try and create systems that are accountable and useful to humans rather than pursuing the theoretical super intelligence quest.

One of the reasons I’m particularly curious about this is the notion that all human knowledge work can be performed either with the assistance of a very capable general AI or by the AI itself. It sort of implies that we will build a new kind of AI system, right? One that will be able to be as creative as a human knowledge worker at the 99th percentile. And I don’t see that in our systems now. The way an LLM works, they don’t necessarily come up with a bunch of individually creative thoughts. You can prompt them to do surprising things, but that turning [into something more] — I have not experienced. Do you think that the way that the current LLMs are built, trained, and deployed is a linear path to the kind of AGI you’re describing, or is there another kind of thing we need to build?

It’s funny because two or three years ago, people would often say, “Well, these systems are destined to regurgitate the training data that they were trained on.” And that there is some one-to-one mapping between query training data and output. It’s pretty clear today that they’re actually not doing that. The interpolation of the space between multiple N-dimensional elements of their training data is in itself the creative process, right? It’s picking some point in this massively complex space to produce or generate a novel form of the response to the question that it has never seen before. We’ve never seen that specific answer produced in that specific way. To me, that is the beginning of creativity. It’s the kind of glimmer of a truly novel invention, which is obviously what we’re trying to produce here. 

Intelligence is the very sort of thing that has driven all of our progress in the world throughout history. It’s the power to synthesize vast amounts of information, aggregate it into conceptual representations that help us reason more efficiently in complex spaces, make predictions about how the world is likely to unfold, and then take action on the basis of those predictions. Whether you are making a table or you are playing baseball with your friend, every single one of those environments that you experience has those characteristics. 

So if we can distill those moments, if you like, into an algorithmic construct, then of course there is huge value there. What I think we see in this mini moment in the last three or four years are the glimmers that they (LLMs) really can be creative, exert real judgment, and produce novel ideas. Your point about whether they can do that proactively is a good one. Like can LLMs do that unprompted? Can they do it independently? Can they do it with very subtle, nuanced, or lightweight guidance? I think that’s kind of an open question. I feel very optimistic about that myself.

Much of the infrastructure to ensure that LLMs can do that is kind of an engineering issue now. Stateful memory and meta-reasoning about the current context of a model are things that we know how to do in software today. We know how to introduce a second or a third system to observe the working state of an LLM in its activity and use that to steer or re-steer a prompt that it is operating to. And if you can do asynchronous meta-reasoning, which is what the initial “chain of thought” methods seem to show in the last six to 12 months, then you can imagine how it could string together actions in these continuous environments.

It could then orchestrate and coordinate with other parts of its working memory, other parts of its system — some of which are designed to do more short-term things, some to draw from long-term memory, some to be a bit more creative, and some to be more adherent to the behavior policy or the safety policy that you’re designing to.

So, it’s obviously not done and dusted, but there are very, very clear signs that we’re on the right path, I think.
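
To make the orchestration pattern Suleyman describes a bit more concrete, here is a minimal sketch, not Microsoft's implementation: a worker model drafts an answer, a second "observer" model audits the draft against a behavior policy, and the prompt is re-steered if the draft falls short. The `call_model` function is a hypothetical stand-in for any chat-completion API, and the policy text is invented.

```python
# Minimal sketch of a worker model supervised by an "observer" model.
# call_model() is a placeholder for any chat-completion API.
from typing import Optional

BEHAVIOR_POLICY = "Be concise, cite sources, and refuse to give medical diagnoses."

def call_model(system: str, prompt: str) -> str:
    """Placeholder LLM call; returns canned text so the sketch runs end to end."""
    return f"draft ({system[:24]}...): {prompt[:60]}"

def observer_review(draft: str) -> Optional[str]:
    """Meta-reasoning step: a second model audits the draft against the policy.
    Returns a steering note if the draft falls short, or None if it is acceptable."""
    verdict = call_model(
        system=f"You audit drafts against this policy: {BEHAVIOR_POLICY}",
        prompt=f"Reply 'ok' if this draft complies, otherwise explain the fix:\n{draft}",
    )
    return None if "ok" in verdict.lower() else verdict

def orchestrate(user_query: str, max_rounds: int = 3) -> str:
    working_memory = []                  # short-term state carried across rounds
    prompt = user_query
    for _ in range(max_rounds):
        draft = call_model(system="You are a helpful assistant.", prompt=prompt)
        working_memory.append(draft)
        steering = observer_review(draft)
        if steering is None:             # observer satisfied: return this draft
            return draft
        prompt = f"{user_query}\nRevise your previous draft. Feedback: {steering}"
    return working_memory[-1]            # give up and return the latest draft

print(orchestrate("Summarize today's AI news in two sentences."))
```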

Those orchestration systems are fascinating to me because the models themselves are not deterministic. They’re never going to produce the same output twice. A lot of the things we want computers to do are insanely deterministic. We definitely want them to do the same thing over and over again. In a variety of situations where an AI might be really helpful, like if you want to do tax preparation, you want the AI to be very helpful and understand all the inputs. You also want it to follow the rules 100% of the time. 

It seems like connecting our logical computer systems to control the non-deterministic AI systems is a big pathway here, more so than making the AI more capable. And that feels like a new way of talking about it that I’ve only recently seen. Does that feel like the kinds of products you need to build or are you still focused on the capability of the model itself?

It’s a good framing, but let’s tease apart what you mean by determinism. So, determinism operates at layers of abstraction. At the very lowest layer, each token is being generated non-deterministically. As those outputs become more recognizable with respect to a behavior policy, a heuristic, or a known objective — like filling out a tax form — then that knowledge can be stored in representations that are more stable and deterministic. 

And this is exactly how humans operate today. No matter how well you might memorize something, if I ask you to do it 100 times over, you’re most likely going to have some variation in the output. We don’t really store things deterministically. We have co-occurring conceptual representations, which are quite fluid and abstract. We then reproduce and fit them into a schema of words and language in order for us to be able to communicate with one another.

These models are actually very similar to that architecture. They can store stable information that can be retrieved in quite deterministic ways, and like you said, integrate with existing computer systems and knowledge bases. But it’s not true to say that one approach is going to trump another. The models are going to get way more capable, and the methods for retrieval, information access, the use of existing databases, or making function calls to third-party APIs to integrate that information, are going to advance simultaneously. 

By the way, we’re going to open up a third front, which is that these LLMs can speak natural language now. They’re going to be able to go and query other humans and other AIs in real-time. So, that’s like a third paradigm for “retrieving” or verifying that information, accessing new knowledge, or checking state on something. That in itself is going to drive huge gains in addition to straight-up model capabilities and integration with existing systems.
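
For readers who want the determinism point spelled out, here is a minimal sketch under invented assumptions: the model's extraction step is non-deterministic, but a deterministic schema check and a fixed rule sit on top of it. The field names and the flat 10 percent tax rule are made up for illustration, and `draft_from_model` stands in for a real extraction call.

```python
# Non-deterministic model output gated by deterministic validation and rules.
import json

SCHEMA = {"filer_name": str, "wages": float, "withheld": float}

def draft_from_model(document_text: str) -> str:
    """Placeholder for an LLM extraction call; its output may vary run to run."""
    return json.dumps({"filer_name": "A. Example", "wages": 50000.0, "withheld": 6000.0})

def validate(raw: str) -> dict:
    """Deterministic gate: reject anything that does not match the schema exactly."""
    data = json.loads(raw)
    for field, expected_type in SCHEMA.items():
        if field not in data or not isinstance(data[field], expected_type):
            raise ValueError(f"missing or mistyped field: {field}")
    return data

def tax_due(record: dict) -> float:
    """Deterministic rule applied the same way every time (a made-up flat rate)."""
    return round(record["wages"] * 0.10 - record["withheld"], 2)

record = validate(draft_from_model("...scanned W-2 text..."))
print(tax_due(record))  # identical validated inputs always produce the identical answer
```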

I want to talk about the agent component of that at length because that seems to be where so many companies are focused, including to some extent, Microsoft. It raises a million questions about how our computer systems and our networks should work. We think we’re headed towards AGI between two and 10 years from now, we think we can do it with an increase in model capability, but also some novel approaches to how we use those models. 

I want to talk about how you’re actually doing it at Microsoft. It occurred to me from the jump, that if we didn’t agree on what the goals were, the structure conversation would be ungrounded from reality. So, those are the goals. Those are huge goals. At Microsoft AI, how are you structured to accomplish those goals?

That’s a great tee-up. First and foremost, my organization is focused on the consumer AI part. So, it is about Bing, Edge, MSN, and Copilot — so consumer-facing products that have hundreds of millions of daily active users, lots of user data, and lots of direct commercial surfaces where we can deploy into production, get feedback, and drive large-scale experimentation. For me, that’s mission-critical, because five years ago, we were in a state with LLMs and AI where we were still relying on the benchmarks to drive progress. Evaluation was taking place in basically academic environments, albeit in commercial engineering labs. The models weren’t good enough to actually put them into production and collect feedback from the real world. That has completely shifted now where all of the innovation is happening by optimization and hill climbing in production. So, I think that’s the first thing to say.

The second thing to say is that our Azure business and the immense number of customers that we have using M365 Copilot every day provide another huge experimentation framework, which is very different from the consumer experimentation framework. It’s actually a great opportunity for me because I’m learning a lot from how many businesses are integrating true AI agents in their workflow today. Since they have more visibility and control of their internal data, and in many cases, they have tens or even hundreds of thousands of employees, they’re able to introduce novel Copilots into their workflows, be it for training sales agents, up-skilling underperforming sales agents, or providing marketing feedback. I’ve seen HR Copilots, and there are all kinds of customer service Copilots happening. That gives me a sort of window into all the different flavors of testing and pushing the limits of these AI models in third-party production environments in the enterprise context.

The third arena, of course, is our collaboration with OpenAI, our great partners. I think this is going to turn out to be one of the most successful partnerships in computer history. That partnership is five years old now and has many years to run. We get models from them, we get intellectual property (IP), and they get compute and funding. It’s obviously a huge source of support for us. 

And then the fourth area is that we’ve just spawned — since I arrived eight or nine months ago now — our own core effort to develop these models at scale inside of Microsoft AI. We have some of the best AI researchers and scientists who are pushing the frontier of post-training and pre-training for our weight class. We are choosing a floating point operations per second (FLOPS) match target that really suits the kind of use cases that we care about and making sure we have absolutely world-class frontier models that can do that.

Let me just unpack some of the vocabulary there. You said “weight class.” Does that just mean a giant corporation, or do you mean something more specific by “weight class”?

Weight class is the way that we refer to comparing frontier models with one another. Your FLOPS need to be matched to the competitor model that you’re evaluating yourself against. So, size is really significant. It’s by far the overriding predictor of capability performance in these models. You sort of can’t compare yourself to something that’s 10X larger by FLOPS. You have to treat them as weight classes, or FLOPS classes if you like.
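
As a rough illustration of what a "weight class" or "FLOPS class" means, the sketch below uses the commonly cited approximation that training compute is roughly six times parameter count times training tokens, then buckets runs by order of magnitude. The model sizes are invented examples, and this is not how Microsoft actually sizes its runs.

```python
# Illustrative only: bucketing training runs into FLOPS "weight classes".
import math

def training_flops(params: float, tokens: float) -> float:
    """Rule-of-thumb training compute: about 6 x parameters x tokens."""
    return 6 * params * tokens

def flops_class(flops: float) -> int:
    """Bucket a training run by order of magnitude, e.g. 1e25 -> class 25."""
    return int(math.floor(math.log10(flops)))

model_a = training_flops(params=7e10, tokens=1.5e13)   # ~70B params, 15T tokens (invented)
model_b = training_flops(params=1e12, tokens=2e13)     # ~1T params, 20T tokens (invented)

print(flops_class(model_a), flops_class(model_b))
# Runs a full order of magnitude apart land in different weight classes,
# so comparing their benchmark scores head-to-head is misleading.
```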

That makes sense to me. And then you said you want to target it towards the applications you’re using, right? So, you’re making many models that are geared toward specific Microsoft products?

That’s right. So, if you think about it, Copilot under the hood is a whole collection of different models, of different sizes that adapt to different contexts. If you’re in a speech setting, it’s a different type of model. If you’re on a desktop, if you’re actually in the native apps on Mac or on Windows, they’re all slightly different models. And then there are different models for search, reasoning, and safety, and I think that that is going to get even more heterogeneous as we go.

And then I just want to be very clear about this. It sounds like you’re developing a frontier model that can compete with Gemini, GPT-4, or GPT-5, whatever it is. Are you working on that as well?

For the current weight class, yes. So, at the GPT-4, GPT-4o scale. But it depends on how things turn out over the next few years because each order of magnitude increase is a phenomenal piece of physical infrastructure. You’re talking about hundreds of megawatts, and soon gigawatts, of capacity. There will really only be three or four labs in the world that have the resources to be able to train at that scale by the time that we get to 10 to the 27 FLOPs (floating point operations) for a single training run. We won’t duplicate that between us and OpenAI. OpenAI is our pre-training frontier model partner for those things, and hopefully, that continues for a long time to come.

So, you’re not going to compete with the next-generation model’s size, right? You’re going to let OpenAI do that. The reason I ask is because Microsoft runs the data centers, right? That partnership is ongoing, but Amazon runs its own data centers and Google runs its own data centers, and it seems like there is just a core tension here regardless of how good the partnership is. It’s between, “We are going to build these data centers and restart nuclear power plants in the United States to supply power to some of these data centers,” and, “Maybe it’s better to sell that to someone else versus build the models ourselves.” Do you feel that tension?

Every partnership has tension. It’s healthy and natural. I mean, they’re a completely different business to us. They operate independently and partnerships evolve over time. Back in 2019 when [Microsoft CEO] Satya [Nadella] put a billion dollars into OpenAI, I mean it seemed pretty crazy. I didn’t think it was crazy, but I think a lot of people thought it was crazy. Now that has paid off and both companies have massively benefited from the partnership. And so, partnerships evolve and they have to adapt to what works at the time, so we’ll see how that changes over the next few years.

Do you have a backup plan if OpenAI declares AGI and walks away from the Microsoft deal? There’s some credible reporting that says if they declare AGI, they could walk away from the deal.

No. Look, it’s very unclear what the definition of AGI is. We have, inside of Microsoft AI, one of the strongest AI research teams in the world. If you look at the pedigree of our crew, my own co-founder, Karén Simonyan, led the deep learning scaling team at DeepMind for eight years and was behind many of the major breakthroughs. Nando de Freitas has just joined us; he previously ran audio/video generation at DeepMind for 10 years. So, we have an exceptional team and we’ll make sure that whatever happens, we’ll be in a position to train the best models in the world.

It does seem like you have some uncertainty there. You’ve said whatever happens several times now in the context of the OpenAI deal. Does that feel like something that you can rely on over the course of the next two to 10 years? Because that seems like a very important timeframe.

It definitely does. Look, they’re an exceptional company. They’re on a tear. There aren’t many companies in the world that have grown as fast as they have. During that kind of meteoric rise, things are going to be brittle and some of the bits and pieces are going to fall off occasionally. That’s what we’ve seen in the last 12 months. So, that doesn’t really change their trajectory. They’re going to be incredibly successful, and we’re going to do everything we can to help them be successful because they’ve helped make us successful. That’s genuinely what’s going on here. Naturally, in any partnership, there are little tensions here and there, but fundamentally we will win together.

I want to come back to the cooperation-competition dynamic there when we actually talk about products, but I want to stay focused on Microsoft AI inside of Microsoft for one more turn. You obviously started Inflection, and Microsoft sort of reverse acqui-hired all of Inflection. They brought over all the people and they issued you all shares. Why do the deal that way? Why join Microsoft and why structure that deal in that way?

So, I’ve known Satya for a very long time. He’s been sort of trying to get me to come and be part of the Microsoft crew for a while, as far back as 2017 when we first started hanging out. I’ve always been particularly inspired by his leadership, and I think the company is actually in an incredibly strong position: the investments that we’re making in compute, the distribution that we have with so many enterprise partners now deploying M365 Copilot, and what you can learn from that is a real game changer. A lot of people are talking about these actions, right? Clearly, you want your consumer Copilot experience to have these seamless interactions with brands, businesses, opportunities for getting stuff done, buying things, booking, planning, and so on. And so, having that kind of protocol built in-house and available to the consumer side, is super important.

The thing I realized about where we were at with Pi and Inflection — we had an unbelievable engagement with Pi, very high-intensity daily active usage. The average session of voice interaction lasted 33 minutes a day. It was pretty remarkable. But I think the challenge is that the competition is going to invest for years and years, and keep it free, if not reduce it to nothing. Basically make it widely available to hundreds of millions of people. And so, from a consumer perspective, it is a very, very competitive landscape. And look, when Satya made me the offer to come and run all the consumer stuff here, it was just an offer that we couldn’t refuse. It sort of enabled us to pursue our long-term vision of actually creating a true AI companion that has a lasting relationship with hundreds of millions of consumers that is really useful to you. And to me, that’s going to shape the future. That is really the thing that is going to shape our long-term trajectory. So, I couldn’t turn that down.

You are the CEO of Microsoft AI. Microsoft is an interesting company in that it has a CEO and then several other CEOs. Phil Spencer is the CEO of Microsoft Gaming. Ryan Roslansky is the CEO of LinkedIn. We just had Thomas Dohmke from GitHub on, he’s the CEO of GitHub. What does it mean to you to be the CEO of Microsoft AI?

Microsoft is an enormous organization, with a quarter of a trillion dollars in revenue, and about 280,000 employees. The logic of making single individuals accountable for our own P&L is very rational. There are about 10,000 or so people in my org. We have full integration from training the models, building the infrastructure, running the ads platform, managing all the sales leaders, making sure that our content is high quality, and getting that integrated across four platforms. So, it just creates accountability. That’s the logic here, and that’s very much how Satya runs it. Extreme accountability.

One thing that strikes me here is that GitHub is a product. LinkedIn is a product; it has a beginning and an end, it’s very tangible. People can understand it.

Microsoft AI, on the other hand, is kind of the whole company. There’s just a lot of AI at Microsoft that is infusing into all of these products. I think Satya agrees that AI feels like a platform change. There’s enormous opportunity inside of a platform change. You’ve obviously got your core products in Bing and Edge and MSN and all that, but when you think about the relationship to the rest of the AI efforts at Microsoft, where does the line begin and end for you?

That’s a good question. Right now, the company is so focused on winning on Azure (Azure OpenAI, for example) and getting our models into production and into the hands of hundreds of thousands or millions of businesses. I’m involved in a lot of the reviews on the enterprise side but also play a role as an advisor and support. Our Microsoft AI (MAI) internal models haven’t really been focused on those enterprise use cases. My logic is that we have to create something that works extremely well for the consumer and really optimize for our use case. So, we have vast amounts of very predictive and very useful data on the ad side, on consumer telemetry, and so on. My focus is on building models that really work for the consumer companion.

That’s a product-focused structure it sounds like. Have you reorganized Microsoft AI to be a more product-driven team?

I think the business was focused on the product before. What we’ve done is bring the kind of AI sensibility into the heart of each one of our products. We have a lot of rankings. We have increasingly conversational and interactive surfaces. We’re trying to bring the voice of Copilot to Bing and MSN. We want to make it a core part of the search experience so that your first thought is: let me just ask my AI. “What does my AI think about that?” and “My AI can remember that for me, save it, and organize it.” And so, making sure that it shows up in deeply integrated ways that really support the surface, rather than an adjacent add-on or an afterthought. That’s the craft that we’re kind of working towards.

You are a unique person to have on the show because you also co-founded DeepMind and you worked at Google. We’ve had Demis, the CEO of DeepMind, on the show before. Google is a challenging place to work. He is the CEO of Google DeepMind. Google doesn’t have CEOs the way that Microsoft has CEOs.

Can you compare and contrast these two companies? You worked at one huge company, you were at a startup for a minute. Now you work at another huge company. They are very different culturally and structurally. Do you think Microsoft has advantages over Google’s approach?

I do. I think that at Microsoft there is a lot of discipline around revenue and P&L. I think that is a very healthy attitude because it really focuses the mind on what a consumer is going to find truly valuable and be prepared to pay for. Second, there’s long-term thinking about “Where does this platform shift take us and what does the five to 10-year horizon look like?” So, there’s a kind of planning attitude, which, during my time at Google, felt more instinctive. I mean, their instincts are really good. It’s an incredibly creative company and many times they’ve made long-term bets, but they were kind of instinctively reactive. Whereas I think there’s a lot more thought in the scenario planning and thorough deliberation [at Microsoft]. Then the third thing I guess I would say is that Friday’s senior leadership team meeting with Satya is a phenomenal experience. It runs from 8:30AM until 2:30PM PT in the office in Redmond, and everyone’s there, all the leaders.

We review all the big businesses or all the big strategic initiatives in detail, and the senior leadership team is cross-functionally in the weeds. And that is pretty remarkable because they’re sort of reviewing these things week after week, like security — huge priority, genuinely like a number one focus for the company — AI, and infrastructure. Then reviewing all of the businesses. It’s very cool to see that other leaders ask the questions and I kind of see the world through their eyes, which is slightly different. So, although there are lots of CEOs, everyone’s looking at everyone else’s businesses and giving advice and feedback. It’s quite an intellectually diverse group.

And then the other thing I would say is that because there’s obviously an enterprise-style DNA to the company, there’s a real focus on, “what does the customer want?” But Google is like, “What would be a cool technology for us to build?” Whereas Microsoft’s like, “How would this actually help the customer and what are they asking for?” And I think both of those strategies have their own benefits, but if you swing one way or the other to an extreme, there are real problems. And so, I’ve certainly enjoyed learning from the fact that Microsoft is very much like, “What does the consumer want?” and “What does the customer need?”

You mentioned security at Microsoft. The renewed focus on security is because there were a bunch of lapses earlier this year, right? This has been an issue. You have an outsider perspective; you’re building a lot of products that might go out into the world and do things for people. You’re building a lot of products that require a lot of customer data to be maximally useful. As you go into these meetings and you talk about Microsoft’s renewed effort on security because there were some problems in the past, how has that affected your approach to building these products?

I definitely think that the company culture is security-first and —

But that’s now, I just want to be very clear to the audience. Satya has started saying that now, but it’s because there were these enormous security lapses in the past year.

That’s true. That is very true. I’m just saying since I’ve started there, I sit in a weekly security meeting where literally all the heads of the companies and various different divisions are singularly focused on what we can do and it is the number one priority. There’s nothing that can override that. No customer demand, no amount of revenue. It is the first thing that everybody asks. So, culturally, as far as I’ve known, it is the central priority, which has been good for me too. I mean, for my businesses it is also mission-critical that we preserve consumer trust and trust means that people expect us to be able to store, manage, and use their data in ways that singularly benefit them and are in their interests. I do think that that is a central part of the culture. And you’re right, maybe that’s a refocusing of late, but it certainly is the case now.

You also mentioned you have P&Ls as CEOs. I sort of understand how LinkedIn has a P&L, right? They have a product, they have some engineers, they make some money, and people pay for Premium. Microsoft AI feels like a lot of losses and not so many profits. How are you thinking about balancing that out?

Oh, we’re very profitable. We are very profitable!

Well, I’m just saying there’s a lot of investment in AI. That stuff hasn’t paid off yet.

That’s true, that’s true. The AI stuff hasn’t paid off yet. I think it’s fair to say. But remember, I spend over half my time focused on the Bing business, and the Bing business is doing incredibly well. I mean, we grew 18% last quarter and we actually took gains from Google, which means we are growing faster than Google, and that makes everybody feel happy. And that’s kind of the main goal. So, the product is deeply integrated AI. There are generative search results in the context of your search experience. There are increasing conversational experiences there. The general quality that we’ve been able to level up with LLMs has been very impressive, and I think that’s translating into revenue improvements as well.

So, in that sense, AI itself is actually in production across the company. It’s not like we’re just waiting for chatbots to suddenly and miraculously generate a new business model. LLMs are being used at all sizes across the existing business for all kinds of things, like even in Edge, for example, for transcription and summarization built into the browser. There are so many different ways that AI is showing up. You’ve got to think of it more as a new high bar in terms of the table stakes of the features that we offer.

The part where the LLMs are integrated into a bunch of products like Bing or Edge, are they driving more revenue from those products or are they just taking share away from Google?

So, the way I think about it is that it’s improving the quality of ads that we show, improving the relevance of those ads, and so it’s making the experience more useful for the consumer. And that is… I mean, obviously, the overall pie is growing, and that’s the nature of the growth. Obviously, Google’s growing too, so the entire market is continuing to grow. The point is that we’re growing faster than Google for this quarter, and I think that’s a huge achievement. The team’s done an amazing job and it’s not about me by the way. That’s a product of many years of them investing in quality and relevance and just generally doing a great job.

Famously, when Bing with Copilot was introduced and I sat down with Satya, he said, “I want to make Google dance.” And then I went and asked [Google CEO] Sundar [Pichai] about that. He said, “He just gave you that quote so that people would run that quote.” And that was kind of his response. Sundar is very calm in that way. You came into it after that whole situation and now you run the products that are directly competitive with Google. Do you think that you are… you know, you’re growing faster than Google in some places. Do you think that you are actually posing a competitive threat to Google, either with Bing against Search or with Edge against Chrome?

One of the things that I’ve realized as I’ve become a bit more experienced and mature over the years is that you have to be very humble about how the landscape changes. I mean, on the one hand, this is an opportunity to relitigate some of the battles of the past. The chips are going to fall into a completely different configuration in the next two or three years. At the same time, that’s a very challenging thing to do. Habits die hard and so on. But our goal with this completely new interface is to make it 10 times easier for people to access information, advice, and support in a truly conversational way, and to do things that our competitors won’t do — things that are truly useful to everyday consumers. And I think that’s actually going to be one of the differentiators. It’s like what is the personality, the tone, and the emotional intelligence of an AI companion?

Remember, most people do love information and they like getting accurate and reliable information, but that’s going to be commoditized. All of these models are going to have that. And despite what we like to think in Silicon Valley, surrounded as we are by nerds and information obsessives who read all the content that you can get access to, most people really connect to brands and really connect to ideas in a social way. They connect to it because it is sort of friendly, kind, supportive, and emotionally reassuring, and I think that’s going to form a big part of the way these models actually turn out to be successful in a few years’ time.

I need to ask you the core Decoder question, but then I want to come back to the idea that the information will be commoditized. You’ve described a lot of change. You were at one company, you were at a startup, you’re at Microsoft, you’re learning how Microsoft works. You have big decisions to make about how to deploy these products. What is your framework for making decisions? How do you make them?

The way that I like to operate is in a six-week rhythm. So, I have a six-week cycle, and then we have a one-week meetup for reflection, retrospectives, planning, brainstorming, and being in person. The reality post-COVID is that people work from all kinds of places and they like that flexibility. So, my rhythm is to keep people in person two to three days a week and then really come together for that seventh week of retrospectives. My general framework is to try to be as in the weeds as possible. Okay? Really spend a lot of time in our tools, tracking telemetry, hearing feedback from people, and then creating this very tight operating rhythm where, in the context of that six- to seven-week cycle, we have a very falsifiable mission. Every single team can express in a sentence exactly what it is they’re going to deliver, and it’ll be very falsifiable at the end of that, so we’ll know.

And then when we observe whether or not that happened, that’s a moment for retrospective and reflection. I really like to write. I’m a writer, I think by writing, and I like to broadcast my writing. So, every week, I write a newsletter to the team that is just like a reflection on what I’ve seen, what I’ve learned, what’s changing, what’s important, and then I document that over time and use that to track and steer where we are going. That’s kind of the basics of how I practically implement my process for reflection and stuff like that. But in terms of the framework, one thing is to really tune in to the fact that no matter what product you invent, no matter how clever your business model is, we are all surfing these exponential waves. And the goal is to predict which capabilities fall out of the next large training model.

If you overthink that and assume that there’s some genius new ecosystem incentive, new business model, or new UI style, all that is super important. But if you think that it’s only going to be that or that it’s going to be the overwhelming driver, I think that’s a mistake. Maybe this comes from my 15 years of experience in trying to build these models. Remember, at DeepMind from 2014 to 2020, I was banging my head against the table trying to ship machine learning models, ship convolutional neural networks (CNNs) in the early days, find classifiers, do re-ranking, try to predict what to watch next on YouTube, trying to do activity classification on your wearables, trying to do crash detection algorithms inside of Waymo. Every single applied practical machine learning objective, I explored there. And now, we have the tools to be able to do those things and do them really, really well. They’re really working.

So, we are basically surfing those tides. The goal is to really nail those waves because we already have models that are giving us more than we can extract and apply into products. That’s quite a profound state that we’re in. We haven’t completely extracted all the gains from the current class of frontier language models. Every week, there’s still some new capability, some new trick, or people have crafted or sculpted them in post-training in a new way. And I think that that is going to continue for the next few years to come, many years to come, in fact. So, in terms of the decision-making framework, the goal is to be very focused on model development and scaling those models, getting them to be practical and useful, really aligning them, and getting them to behave in the way that you need for your product.

Let me ask you about that because model development… and we need to get more out of the models we have now. There’s a little bit of tension there. There’s a notion that the scaling laws are going to run out, that the next class of models is not significantly outperforming the models we have now, and I think you can track that in just the way we’re talking about the products. 

A couple of years ago, it was, “AI’s an existential risk, we have to stop it so we can make sure it’s aligned before we kill everyone.” And now, we’re kind of like, “Well, we got to get more out of the models we have now. Actually ship some products, make some money, hopefully, and figure out what it’s all good for and how to best use it because it doesn’t seem like the next generation of models are actually running away as fast as we think they might.” Is that your view that the frontier models are not getting better as fast as we thought they might and so we have to get more out of what we have?

No, I don’t think that’s true. I think that they’re going to continue to deliver the same seismic gains that we’ve seen in the previous generations. Remember that they’re more costly and more fragile, and they’ll take longer to train this time around. So, we’re not going to see them happen in the same sort of 12 to 18-month timeframe. It’s going to shift to 18 to 24 months and then a bit longer. But I don’t see any sign that there’s a structural slowdown. I kind of see the opposite. There are huge gains to extract from where we are today, but it’s very clear to me that there are also huge gains to extract from the next two orders of magnitude of training as well.

I want to make sure we talk about the thing you mentioned, the commodification of information, and then I definitely want to make sure we talk about agents real quick to bring this all around to the products to come. The commodification of information is, I think, the big story of the internet that we have today, the platform internet, for lack of a better word. You go to Google, you ask it a question, and now it might spit out an AI-generated answer. You go to MSN, you ask it for the news, and it might algorithmically or with AI sort a bunch of news and summarize that news for you.

Everyone’s headed in this way. We’ve been talking about this for a long time. To train the next-generation models, we need even more information. You’ve gotten yourself into some trouble, I would say, by saying that the information on the internet is “freeware,” with the expectation that you can use it to train. There are a lot of lawsuits, including several pointed at Microsoft. Where do you think that next body of information comes from before we sort out the copyright implications of using all this stuff to train?

One way of thinking about it is that the more computation you have, the more time these models can spend attending to the various relational components of all that training data. Think of FLOPS as a way to spend understanding time, learning the relationship between all these various training inputs. So, first of all, you can still gain more from just having more computation to learn over all the existing data. The second thing is that we learn a vast amount from interaction data. Users tell us implicitly and explicitly how they feel about an output. Is it high quality? Is it used? Is it ignored? Third, we’re generating vast amounts of synthetic data. That synthetic data is increasingly high quality. When you ask an AI teacher or a rater to compare two or three different examples of the synthetically generated output and the human written output, it’s extremely difficult to detect those precise nuances.

So, the synthetic data is increasingly high quality and used in a whole bunch of different settings. Fourth, I can imagine AIs talking to other AIs, asking for feedback — AIs that have been primed for different areas of expertise or different styles and prompted in different ways. You can imagine those interactions producing valuable new knowledge, either because they’re grounded in different sources or just because of their stylistic output, they’re producing novel interactions. So, I don’t necessarily see data being the limitation anytime soon. I think that there are still huge benefits to come from scale for the foreseeable future.
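
As a concrete picture of the rating setup Suleyman describes, here is a minimal sketch: an AI rater is asked to compare a synthetically generated answer against a human-written one, and only synthetic samples that hold up get kept for further training. The rater call is a placeholder (here, a coin flip so the sketch runs), and the example data is invented.

```python
# Illustrative filtering of synthetic training data via pairwise AI judgments.
import random

def rater_prefers_synthetic(question: str, human: str, synthetic: str) -> bool:
    """Placeholder for an LLM-judge call doing a pairwise comparison.
    Here it just flips a coin so the sketch runs end to end."""
    return random.random() < 0.5

def filter_synthetic(pairs):
    """Keep only synthetic answers the rater judged at least as good as the human one."""
    kept = []
    for question, human, synthetic in pairs:
        if rater_prefers_synthetic(question, human, synthetic):
            kept.append((question, synthetic))
    return kept

pairs = [
    ("What causes tides?", "Mostly the Moon's gravity.", "Tides arise chiefly from lunar gravity."),
    ("Define FLOPS.", "Floating point operations per second.", "A count of floating point operations per second."),
]
print(len(filter_synthetic(pairs)), "synthetic samples kept for training")
```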

So, that’s all new data, right? You’re going to get a bunch of interaction data. Maybe the synthetic data will be of high enough quality to train the next-generation models, but the original data sets were the web. It was a bunch of web content. It was the entire internet, and maybe video platforms to some extent for some of the model providers. 

The quote I have from you in June, I think you were speaking to Andrew Ross Sorkin. Here’s a quote, you said, “I think that with respect to content that’s already on the open web, the social contract of that content since the 90s is that it’s fair use, anyone can copy it, recreate with it, reproduce with it. That has been ‘freeware,’ if you like, that’s been the understanding.” I’m curious… You said that. That was the understanding for search and there was a lot of litigation around search, Google Image Search, and Google Books that led there. Do you think that that is still stable enough for you in the age of AI with all of the lawsuits outstanding?

What I was describing in that setting was the way that the world had perceived things up to that point. My take is that just as anyone can read the news and content on the web to increase their knowledge under fair use, so can an AI, because an AI is basically a tool that will help humans to learn from publicly available material. All the material that has been used for generating or training our models has been scraped from publicly available material. Where we —

But publicly available and copyrighted are very different things on the internet, right? Publicly available does not mean free of copyright restrictions.

Oh, yeah. I mean, look, obviously, we respect the content providers, so that’s an important distinction. But I guess what I’m trying to say is that from our perspective, there are certain types of content, for example, in our Copilot Daily or MSN Daily, that are paywalled publisher content that we pay for directly. That’s what MSN has been doing since the beginning of time. It’s what we’ve decided to do with Copilot Daily for high-quality content because we want those publishers to create an information ecosystem that really works for everybody. And I just think this is one of those situations where things will play themselves out in the courts. Any time there’s a new piece of technology, it changes the social contract as it is at the moment. There’s clearly a gray area in terms of what constitutes fair use and whether an AI can have the same fair use as a human, and we will just have to play it out over the next few years. I think we’ll have some perspective over that in the next few years as things land.

One of the reasons that I ask that — as directly as I’m asking it — is the cost of training the next generation models is very, very high. But that cost is built on a foundation of, well, the training data is free, and if a couple of court decisions go a couple of ways, the cost of the training data might skyrocket, right? If a court says it’s not fair use to use the New York Times’ content, or it’s not fair use to use these books from these authors. Suddenly you may have to pay a lot of money for that data as well. Do you think that that is something —

We already pay for books on a huge scale. So, if it is a copyrighted book, we’re not hoovering that up from the internet. Copyrighted books and licensed —

Well, Microsoft might not be, but there’s a very big lawsuit from a bunch of publishers who say that, for example, OpenAI is, right? And that’s the model that you are reliant on. So, it just seems like there’s a... Maybe legally we’ll see what the answer is, but economically, there’s also a lot of uncertainty here because of the cost of the underlying data.

Yeah, that’s true. And I think our focus has been to make sure that we pay for the really high-quality copyrighted material from publishers — news publishers, book publishers, and others, and I think that’s going to continue. That’s definitely what we’re committed to.

Who decides what’s high quality?

That’s actually an interesting question. Quality is actually something that we can measure. We want to make sure that the content, especially from a non-fiction perspective, so we’re particularly interested in academic journals and academic textbooks…  We can verify the source and citations for that knowledge, and that is one of the big measures that we consider to be high quality.

But the visual artists, the non-fiction artists, visual effects artists, the movie industry, they’re saying, “Hey, we’re going to get pushed out of work because we are not compensated for any of the work that’s going into these models.” How do you think this plays out for that? Because again, I agree that the law here is deeply uncertain, these cases are going to play out, but I’m looking back at what you’re describing as the social contract of the web. And what I see is, “Oh, Google litigated a million of these lawsuits.” That social contract was not... We didn’t just all wake up one day and decide this is how it’s going to work. Google went to court 15 times and they were a bunch of kids who had slides in the office and they had just made Google. They were very well positioned as a company in a moment, and they had a product that was so obviously useful in so many different ways that they kind of got away with it.

And I don’t know that the tech industry is in that position anymore. I don’t know that the products are so obviously useful the way that putting Google on the internet for the first time ever was so obviously useful, and I certainly don’t know that the feelings from one set of creators in particular are as positive as they were for Google back in the 90s and early 2000s. And you’re on the board of The Economist; it feels to me like the people who make the work are having the most mixed emotions of all. Because yes, I think a lot of us can see the value of the products, but we also see the value transfer to the big tech companies, not the upstarts, not the cute kids with the slides in the office.

I think that this is going to be more useful and valuable than search. I think search is completely broken, and I think it’s a total pain in the butt, and we’ve just kind of become used to using a terrible experience. Typing a query... Just think about what a query is. We had to invent the word “query” to describe this really weird, restricted way that you express a sentence or a question into a search engine because of the weakness of the search engine. And then you get 10 blue links, and then those things are vaguely related to what you’re looking for. You click on one and then you have to go and refine your query. I mean, it is a painful and slow experience.

I think that if we can get this right, if we can really reduce hallucinations to a de minimis amount… I think we’ve already demonstrated that they don’t have to be toxic, biased, offensive, and all the rest of it. It’s pretty good. It’s not perfect, but it’s getting much, much better, and I think it’s only going to get better with more stylistic control. Then these conversational interactions are going to become the future of the web. It’s quite simple. This is the next browser; this is the next search engine.

It is going to be 100 times easier for me to just turn, by voice, to my Copilot and say, “Hey, Copilot, what’s the answer to this?” I already do it five times a day. It is my go-to. It’s my bottom right-hand app on my iPhone. My thumb instantly goes to it. I use the power button to open it. My favorite app, like I did with Pi. I mean, it is clearly the future, that conversation interaction. So, to me, the utility is phenomenal, and I think that is going to weigh into the cases as they make their way through the court.

So, that leads us, I think, directly to agents, where you are going to ask some app on your phone or some part of the operating system on your computer to do something and it will go off and do it. It will bring you the information back or it’ll accomplish some task on your behalf and bring you the result. You and I have talked about this before in various ways. That commodifies a lot of the service providers themselves, right? You say, “I want a sandwich,” and now I don’t know if it’s DoorDash, Uber Eats, Seamless, or whoever is going to bring me the sandwich. My AI is going to go out and talk to them. That implies that they will allow that to happen — they will allow the agents to use their services.

In the best case, they would provide APIs for you to do it. In the worst case, they let people click around on their websites, which is a thing that we’ve seen other companies do. And sort of in the medium case, they develop some sort of AI-to-AI conversation. Not quite an API, not quite we’re just literally clicking around on a website and pretending to be human, but our AIs are going to have some conversation. What is the incentive for those companies to build all of those systems or allow that to happen to become disintermediated in that way?

I mean, people often ask when there’s a new technological or scientific revolution and it’s causing a massive amount of disruption, and people are curious. It’s like, “Well, why would someone do that in 10 years?” And then if you look back for centuries, it’s always the case that if it is useful, it gets cheaper and easier to use. It proliferates; it becomes the default. And then the next revolution comes along and completely turns everything on its head. My bet is that every browser, search engine, and app is going to get represented by some kind of conversational interface, some kind of generative interface. The UI that you experience is going to be automagically produced by an LLM in three or five years, and that is going to be the default. And they’ll be representing the brands, businesses, influencers, celebrities, academics, activists, and organizations, just as each one of those stakeholders in society ended up getting a podcast, getting a website, writing a blog, maybe building an app, or using the telephone back in the day.

The technological revolution produces a new interface, which completely shuffles the way that things are distributed. And some organizations adapt really fast and they jump on board and it kind of transforms their businesses and their organizations, and some don’t. There will be an adjustment. We’ll look back by 2030 and be like, “Oh, that really was the kind of moment when there was this true inflection point because these conversational AIs really are the primary way that we have these interactions.” And so, you’re absolutely right. A brand and a business are going to use that AI to talk to your personal companion AI because I don’t really like doing that kind of shopping. And some people do, and they’ll do that kind of direct-to-consumer browsing experience. Many people don’t like it, and it’s actually super frustrating, hard, and slow.

And so, increasingly you’ll come to work with your personal AI companion to go and be that interface, to go and negotiate, find great opportunities, and adapt them to your specific context. That’ll just be a much more efficient protocol because AIs can talk to AIs in super real-time. And by the way, let’s not fool ourselves. We already have this on the open web today. We have behind-the-scenes, real-time negotiation between buyers and sellers of ad space, or between search ranking algorithms. So, there’s already that kind of marketplace of AIs. It’s just not explicitly manifested in language. It’s operating in vector space.

Well, that’s the part I’m really curious about. The idea that natural language is the paradigm shift. I think it’s very powerful. I don’t think it has been expressed very clearly, but the notion that actually the next form of computing is inherently based in natural language, that I’m just going to talk to the computer and it’s going to go off and do some stuff because it understands me, is very powerful. I buy it.

How that actually plays out on the back end is the part that, to me, still feels up in the air, right? If I’m going to ask for a sandwich, that necessitates companies that are in the business of bringing me a sandwich, and how they talk to my AI and how they stay in business seems very challenging. Right now, those companies are in business because they can sell ad space on my phone to the other companies that actually make the sandwiches. They have upsells. There are a million different ways that these companies make money. If they get abstracted down to their AI talking to my AI and saying, “Okay, here’s a sandwich,” and I take away all of their other revenue opportunities, I’m not sure that ecosystem can stay relevant or even alive.

I’m not sure about that. I mean, your sandwich-making AI is still going to want to sell itself, be persuasive, be entertaining, and produce content for the consumer, right? It’s not that it gets completely disintermediated and disconnected. Brand and display advertising is still super relevant, and there will be ways that that sandwich-making AI shows up in the context of your personal AI in (maybe) a sponsored way too. So, there’ll still be that core framework of keyword bidding, paying for presence, and paying for awareness. There’s still going to be ranking — that is still going to be relevant to some extent. It’s just that you are going to be represented by a personal AI companion that is going to be that interlocutor or negotiator, and those two AIs are going to have an exchange in natural language, which is what we would want. We’d want to be able to go back and audit that negotiation, check where the error came from, and see if it really was a good price in hindsight, and all the rest of it.
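There is no standard protocol for the agent-to-agent exchange described here yet, so any example is speculative. The sketch below imagines one possible shape: structured offers from vendor agents, a personal agent that picks one within the user’s budget, and a plain-language log the user can audit afterward. Every name, field, and number is hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Offer:
    vendor: str              # hypothetical vendor agent, e.g. "deli_a"
    item: str
    price: float
    eta_minutes: int
    sponsored: bool = False  # vendor paid for placement; disclosed, not ranked up

@dataclass
class NegotiationLog:
    """Keeps the exchange in plain language so the user can audit it later."""
    turns: list[str] = field(default_factory=list)

    def say(self, speaker: str, message: str) -> None:
        self.turns.append(f"{speaker}: {message}")

def choose_offer(offers: list[Offer], max_price: float, log: NegotiationLog) -> Offer | None:
    log.say("user_agent", f"Requesting a sandwich under ${max_price:.2f}.")
    affordable = [o for o in offers if o.price <= max_price]
    if not affordable:
        log.say("user_agent", "No vendor met the budget; deferring back to the user.")
        return None
    # Prefer the cheapest offer, then the fastest delivery.
    best = min(affordable, key=lambda o: (o.price, o.eta_minutes))
    log.say(f"{best.vendor}_agent", f"Offering {best.item} for ${best.price:.2f}, {best.eta_minutes} min.")
    log.say("user_agent", f"Accepted {best.vendor}'s offer{' (sponsored)' if best.sponsored else ''}.")
    return best

# Example run with made-up offers.
log = NegotiationLog()
offers = [Offer("deli_a", "club sandwich", 9.50, 25),
          Offer("deli_b", "BLT", 8.25, 40, sponsored=True)]
choose_offer(offers, max_price=10.00, log=log)
print("\n".join(log.turns))
```

The point of the log is exactly the auditability mentioned above: because the exchange is kept in natural language, the user can go back and see why their agent accepted a given offer.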

As you start to build these products in Copilot, have you had these negotiations with these other providers? Have they started to say what they would want?

We’ve talked; I wouldn’t describe them as negotiations. I mean, I think lots of brands and businesses are building their own AIs. Today, they’re characterized as the customer support AIs that pop up on your website. But tomorrow, in two or three years’ time, they’re going to be fully animated, conversationally rich, clever, smart digital Copilots that live in social media. They’re going to appear on TikTok. They’re going to be part of the cultural space. So I don’t think there’s that much negotiation to happen there. I think it’s just this inevitable tide of the arrival of these Copilot agents.

You run MSN, and you obviously have peers at Microsoft who run social networks and other kinds of information products. I see a flood of AI slop choking out some of these networks. I’ve searched Facebook for Spaghetti Jesus and I have seen the other side of the singularity, my friend. We already had one conversation about determining high quality, and the answer is sort of, “I know it when I see it.” But if you run these networks and you’re faced with a bunch of agentic AIs talking to each other, or AI influencers on TikTok, can you label that stuff effectively? Can you make it so that users only see things from other people?

You certainly can. It would require a shift in the identity management system of the platform, which has a lot of pros and cons. You can certainly tell which accounts come from a human and which are AI-generated. To some extent, I think there can be digital watermarking and signing for verified human content or verified AI content from a specific source. And then to some extent, there can be detection of synthetically generated content, because that does have a specific signature. Long term, I don’t think that’s a defense. I think it is going to be perfectly photorealistic, very high quality, and it is going to be a game of cat-and-mouse just as it has been in security, privacy, and information for decades and centuries actually. So, I expect that to continue. It is going to get harder and more nuanced, but this is the natural trajectory of things.
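On the “signing for verified content” point: one common building block is having a publisher, whether a human newsroom or a disclosed AI source, cryptographically sign what it publishes so platforms can verify where content came from; industry efforts such as C2PA’s Content Credentials formalize this idea. Here is a minimal sketch of just the signature step, using the Python cryptography package. The keys and content are placeholders, and real provenance systems also bind identity, timestamps, and edit history.

```python
# Requires: pip install cryptography
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# A publisher (a human newsroom or a disclosed AI source) holds a signing key.
publisher_key = Ed25519PrivateKey.generate()
publisher_pub = publisher_key.public_key()

article = "Example article text.".encode()
signature = publisher_key.sign(article)  # distributed alongside the content

def is_verified(content: bytes, sig: bytes) -> bool:
    """True only if the content is byte-for-byte what the publisher signed."""
    try:
        publisher_pub.verify(sig, content)
        return True
    except InvalidSignature:
        return False

print(is_verified(article, signature))                   # True
print(is_verified(article + b" (tampered)", signature))  # False
```

Signing establishes who published something, not whether it was made by a human; as noted above, detecting unlabeled synthetic content is a separate, harder cat-and-mouse problem.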

Do the people who run LinkedIn, or your folks at MSN, say, “This is a problem that we can’t stop; we need to make sure we don’t have too much AI content here because right now it’s not good enough”? I can see it a mile away: I see those bullet points, I think someone made this with ChatGPT, and I don’t even want to read it. Is that a problem that you’re facing right now, or is it a problem to come?

I think it’s a problem to come, but the thing I would say is that we humans are behaviorists, right? We observe the output of other humans, and we evaluate and decipher trust based on the quality of information with respect to our own assessment. Is it accurate? Is it reliable? Is that person consistently doing what they said they would do? We can observe their actions rather than introspecting on why this happened: why did this neural network generate this output? Why did this person come to this conclusion? And that’s actually an important distinction, because I think a lot of purists are fixated on the causal explanation for why an output has been produced rather than the more observational assessment of, “Was it useful? Does it do the same thing over and over again?” That’s what drives trust.

I do think poor-quality content, or AI content that is deliberately misrepresentative or misinforming, will be detectable in that sense, because I think we’ll have better models. We’re getting better ones all the time for down-ranking and deprioritizing certain types of content.

One of the things that I’ve been thinking about a lot throughout this conversation… You’re in charge of Microsoft’s consumer business. Microsoft’s consumer business, famously, right now in 2024, is built around not making the iPhone, right? That’s the thing Microsoft missed in consumer. It has nothing to do with you, but the iPhone happened.

Microsoft pivoted to being an enterprise business, and it’s now slowly coming back because I think the company rightfully sees a platform shift, a paradigm shift, in computing. Apple still exists, and the iPhone still exists. You said, “I’ve got this icon on my iPhone, it made it onto the home screen and it’s in this preferred position,” the position everybody wants in the bottom corner. Apple has a pretty big distribution advantage here. They have a deal with OpenAI to use ChatGPT. Can you make products so good that you overcome the iPhone’s distribution advantage and the things they’re bundling into the operating system?

It is a great question. I mean, distribution and defaults really matter. And so, from our perspective, obviously we’re focused on distribution deals, but fundamentally we are also focused on building something that’s truly differentiated. To me, that AI companion really is going to be differentiated. The tone and the style of that interaction matter. The fact that it will be able to remember you and what you’ve said over time, and that it will reach out at that opportune moment just before a difficult moment in your life, when you’re starting a new job or your kid is having their birthday party or something, is a differentiator. Having your companion reach out and be supportive in a moment like that matters. And that’s how a lot of people make their decisions, and it’s how a lot of people seek support.

So I think that’s a really big opportunity to spread a good vibe and spread kindness. And I think most apps and most product thinking in Silicon Valley doesn’t really engage with that kind of emotional plane, whereas the advertising industry in New York treats that as second nature, for example. I think that’s a big shift that we’re making as an industry, and it’s certainly one of the areas that we are going to be focused on in Copilot. We have to build something that is really beautiful and differentiated. It is going to be a real challenge. It’s not easy.

Do you think this is an opportunity to build consumer hardware again? Not a phone, but whatever comes after the phone?

I am open-minded about that. I think that voice-first experiences are going to be transformational, and they do represent a new platform. I think we’re increasingly tired of our screens. I’m frankly sick of looking at a grid of multicolored icons on my iPhone. I think many people are. You sort of feel trapped in this structured, fixed pattern of tapping these things. And I don’t know, I think people are looking for more opportunities to go hands-free, to be away from keyboards and screens, and to leave their phones at home. So, I think there’s a lot of opportunity there. I’m very, very interested in that space.

Have you played with the products that are out now? Humane’s? The Rabbit’s?

I have. I played with all of them, yeah. And I’ve actually just come back from an extended trip to China, where I visited all the big manufacturing companies and got to know those guys. It’s very impressive what they’re doing out there, moving at light speed. Very, very interesting to see.

Should we expect hardware from you?

Not anytime soon, but I think we are a huge company. We’re keeping an open mind about lots of things and we will see how things go.

Very good. Well, Mustafa, we’re going to have to have you back very soon. I have a million questions here I didn’t get a chance to ask you. This was great. Thank you so much for being on the show.

This has been a lot of fun. Thanks, Nilay. I really appreciate it. Talk to you soon.
