
Jensen Huang’s Latest Podcast: AI is Transitioning from the “Model Era” to the “System Era”


Video Author: All-In Podcast

Compiled by: Peggy, BlockBeats

Editor’s Note: As the AI narrative continues to heat up, the market’s focus is shifting from “how powerful is the model” to “how will the system be implemented.” Over the past two years, the industry has successively experienced breakthroughs in large model capabilities, a training compute race, and the expansion of generative applications. But as these stages gradually become consensus, new questions emerge: When AI is no longer just answering questions but starts executing tasks, embedding into enterprise workflows, and entering the physical world, what are the underlying conditions that will support its continued advancement?

This dialogue is excerpted from the renowned tech podcast All-In Podcast. As one of Silicon Valley’s most influential investor podcasts, the show is co-hosted by four investors who have long been active on the front lines and is known for its in-depth discussions on technology, business, and macro trends.

The four hosts of the show are:

  • Jason Calacanis, an early internet entrepreneur and angel investor, widely known for investments in companies like Uber and Robinhood;
  • Chamath Palihapitiya, founder of Social Capital, former Facebook executive, who has invested in numerous tech companies including Slack and Box;
  • David Sacks, partner at Craft Ventures, a member of the “PayPal Mafia,” founder of Yammer which was sold to Microsoft for approximately $1.2 billion, and an early investor in Airbnb and Uber;
  • David Friedberg, founder of The Production Board, focusing on investments in agriculture, climate, and life sciences, and founder of The Climate Corporation (later acquired by Monsanto).

The guest for this episode is Jensen Huang, co-founder and CEO of NVIDIA, regarded as one of the most critical drivers in the current wave of AI infrastructure.


From left to right: David Friedberg, Chamath Palihapitiya, David Sacks, Jensen Huang, Jason Calacanis

The entire interview can be roughly summarized into three layers.

First, AI infrastructure is changing. In the past, the market’s understanding of AI was largely built on more powerful GPUs and more data centers. But what Jensen Huang wants to emphasize is that future competition is no longer just about a single chip, but between entire systems. As inference demand rises, model types proliferate, and agents begin to handle more complex tasks, AI computing is shifting from a relatively uniform pattern of computation to more complex, more specialized collaboration among systems. NVIDIA is thus attempting to evolve from a chip company into a builder of “AI factories.”

Second, AI is moving from “generating content” to “completing tasks.” This is the most crucial thread in this interview. ChatGPT gave the public its first intuitive sense of AI’s capabilities, but in Jensen Huang’s view, the truly bigger change is AI starting to enter workflows in the form of agents: it’s not just answering questions, but can call tools, decompose tasks, coordinate execution, and ultimately get things done. Precisely because of this, what users are willing to pay for will gradually shift from “getting an answer” to “getting a result.” This implies greater inference demand, higher system complexity, and also means that the ways of software development, organizational management, and knowledge work may all be rewritten.

Finally, AI is extending from the digital world into the real world. In the interview, whether discussing autonomous driving, robotics, healthcare, digital biology, or what Jensen calls Physical AI, they all point to the same trend: the value of AI is not only reflected on screens but will increasingly manifest in factories, hospitals, cars, end devices, and daily life. But this also means that AI will no longer face just technical challenges, but also more complex real-world constraints including supply chains, policy, regulation, manufacturing capabilities, and geopolitics. In other words, the next round of AI expansion will be a true process of industrialization.

From this perspective, what is most noteworthy about this conversation is not a specific product or an optimistic number, but a judgment Jensen repeatedly conveys: AI is moving from the “model era” to the “system era.” Future competition is not just about whose model is bigger or whose compute is stronger, but about who understands industries better, who can embed AI deeper into real processes, and who can organize these capabilities into a runnable, scalable system.

This also makes the subject of this article extend beyond NVIDIA itself. The question it truly attempts to answer is: As AI gradually becomes infrastructure, how will the next round of industrial restructuring unfold, and where will new value be formed?

The following is the original content (edited for readability):

Key Takeaways

  • AI infrastructure is moving from “single GPU” to a disaggregated architecture. Different computing tasks will be handled collaboratively by GPUs, CPUs, networking chips, and dedicated inference chips such as Groq’s LPUs.
  • NVIDIA is transforming from a GPU company into an “AI factory company” providing complete systems. It sells entire infrastructure, not single chips.
  • The key metric for AI cost is not data center price, but token cost and throughput efficiency. A more expensive system might actually be cheaper.
  • AI is moving from generative models to the Agent era. Users are truly willing to pay for “getting things done,” not just getting answers.
  • Compute demand is exploding. From generation to inference to agents, it may have already increased by over 10,000x in a short time and is still accelerating.
  • Future software development will change. Engineers will no longer just write code, but will define problems, design architectures, and collaborate with agents.
  • Long-term, the biggest opportunities lie in deep specialization within vertical domains, not in the general models themselves. Whoever understands the industry better has a stronger moat.

Interview Transcript

Jason Calacanis (Renowned Angel Investor | All-In Podcast Host | Early Investor in Uber):

This week is a special episode. We pushed our regular weekly show aside, a treatment we usually reserve for only three people: President Trump, Jesus, and Jensen Huang (founder and CEO of NVIDIA). You can decide how to rank those three. Your momentum lately has been incredible, and this GTC was another huge success.

Jensen Huang (CEO, NVIDIA):

The entire industry showed up. Almost every tech company, every AI company was there.

Jason Calacanis:

It’s unbelievable, truly extraordinary. One of the biggest announcements in the past year was Groq. When you acquired Groq, did you realize how “insufferable” it would make Chamath?

Note: Groq is not Grok. The former is a company making AI inference chips and inference cloud, the latter is xAI’s chatbot. In late 2025, Groq and NVIDIA reached a non-exclusive inference technology licensing agreement; the official transaction amount was not disclosed, but there were external reports and speculation around $170-200 billion. By GTC 2026, Jensen Huang further showcased inference systems integrating Groq technology into the NVIDIA platform.

The Chamath mentioned here refers to Chamath Palihapitiya (Founder, Social Capital | Former Facebook Executive | All-In Host). He is both one of the four All-In hosts and was an early investor and board member of Groq. Therefore, when the major NVIDIA-Groq deal surfaced, it was seen as another key project Chamath backed correctly.

Jensen Huang:

I had an inkling.

Jason Calacanis:

We have to deal with him every week.

Jensen Huang:

I know. And you had to endure the entire six-week closing period with him.

Jason Calacanis:

That’s right.

From GPU Company to “AI Factory” Company

Jensen Huang:

Actually, we announce many of our strategies years in advance at GTC. Two and a half years ago, I introduced the operating system for AI factories, called Dynamo.

You know, a dynamo was originally a device, pioneered by Siemens, that converts mechanical energy, such as water power, into electricity, and it powered the factory system of the last industrial revolution. So I thought that name was perfect for the “factory operating system” of the next industrial revolution. And within Dynamo, one of the core technologies is disaggregated inference.

Jason Calacanis:

Jensen, I know you’re deeply technical. Go ahead, you define it. I don’t want to steal your thunder.

Jensen Huang:

Thank you. So-called disaggregated inference means this: the entire inference processing pipeline is extremely complex, perhaps the most complex class of computing problems today.

Its scale is staggering, involving enormous amounts of mathematical computation of different forms and sizes. Our idea is to split the entire processing flow, letting one part run on one type of GPU and another part on another type. Going further, this made us realize that disaggregated computing itself is a reasonable direction: we can absolutely have computing resources of different types and natures work together.

The same line of thinking later led us to Mellanox. Look today, NVIDIA’s computing is already distributed across GPUs, CPUs, switches, scale-up switches, scale-out switches, and network processors. Now, we’re adding Groq to the mix.

Our goal is to put the right workload on the right chip. In other words, we have evolved from a GPU company into an AI factory company.
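
Note: to make “disaggregated inference” concrete, here is a minimal Python sketch of the idea Jensen describes (and that David Sacks names below as prefill/decode separation): a scheduler routes the compute-bound prefill phase and the bandwidth-bound decode phase of LLM inference to different hardware pools. The names and pools here are hypothetical illustrations, not NVIDIA’s Dynamo API.

```python
from dataclasses import dataclass, field
from enum import Enum


class Phase(Enum):
    PREFILL = "prefill"  # compute-bound: the whole prompt is processed in parallel
    DECODE = "decode"    # bandwidth-bound: output tokens are generated one at a time


@dataclass
class Pool:
    """A homogeneous group of accelerators tuned for one phase."""
    name: str
    queue: list = field(default_factory=list)


@dataclass
class Request:
    request_id: str
    prompt_tokens: int
    phase: Phase = Phase.PREFILL


# Hypothetical pools: FLOP-heavy GPUs for prefill, high-memory-bandwidth
# inference chips (LPU-style) for decode.
PREFILL_POOL = Pool("compute-optimized GPUs")
DECODE_POOL = Pool("bandwidth-optimized inference chips")


def route(req: Request) -> Pool:
    """Send a request to whichever pool suits its current phase."""
    pool = PREFILL_POOL if req.phase is Phase.PREFILL else DECODE_POOL
    pool.queue.append(req)
    return pool


if __name__ == "__main__":
    r = Request("req-1", prompt_tokens=4096)
    print(route(r).name)    # prefill lands on the compute-optimized pool
    r.phase = Phase.DECODE  # after prefill, the KV cache is handed off
    print(route(r).name)    # decode lands on the bandwidth-optimized pool
```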

David Sacks (Partner, Craft Ventures | Former PayPal COO | All-In Host):

For me, this is probably the most important insight. What you’re seeing now is a fundamental “disaggregation.” In the past, there was only the GPU option, but now more and more different computing forms are emerging, and these choices will coexist in the future.

You mentioned something on stage that I think everyone doing high-value inference should listen to carefully: you said that about 25% of the space in a data center should be configured for Groq’s LPUs.

Note: LPU stands for Language Processing Unit. This is a chip category proposed by Groq, with a core positioning not for training, but for inference.

Jensen Huang:

Yes, in a data center, you could allocate about 25% of the Vera Rubin system to Groq.

Note: Vera Rubin is NVIDIA’s next-generation AI platform architecture. It’s not a single chip, but a system-level infrastructure platform for AI factories.

David Sacks:

Can you talk about how the industry views this direction now? Essentially, you’re building the next-generation disaggregated architecture: prefill/decode separation, splitting up the inference flow. How do you think people will react?

Jensen Huang:

Let’s step back first. We added this capability to the system because the entire industry has shifted from large language model processing to Agentic Processing.

When you run an agent, it accesses working memory, long-term memory, calls tools—this puts enormous pressure on storage. You’ll also see agents collaborating with agents. Some agents use huge models, some small models; some are diffusion models, some are autoregressive models. That is, inside this data center, there will simultaneously exist all kinds of completely different model types. We built Vera Rubin to handle this extreme diversity of loads.

So, in the past we were a company with “one type of rack”; now we’ve added four more rack types. In other words, NVIDIA’s TAM, the total addressable market, instantly expanded by roughly 33% to 50%.

And a large part of this new 33% to 50% will be storage processors, i.e., BlueField; a part, I personally hope a large part, will be Groq processors; another part will be CPUs; and of course, many network processors. All of these together are ultimately running that “new type of computer” of the AI revolution, which is agents. It’s the operating system of modern industry.
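
Note: the workload mix Jensen sketches here, a single agent touching large and small models plus memory and tools in one task, is what drives the rack diversity. A toy, purely illustrative Python sketch of that fan-out; every model name below is a hypothetical stand-in, not a real API:

```python
# One agent step fans out into several very different workloads, which is
# why a single data center ends up hosting heterogeneous hardware.
MODEL_POOL = {
    "planner": "large autoregressive model",  # heavy compute, few calls
    "tool_parser": "small distilled model",   # light compute, many calls
    "memory": "vector store lookup",          # storage-bound, not compute-bound
}


def run_agent_step(task: str) -> list[str]:
    """Trace which (hypothetical) resource each part of one agent step touches."""
    trace = [f"plan '{task}' on: {MODEL_POOL['planner']}"]
    for subtask in ("search docs", "format query"):  # decomposed subtasks
        trace.append(f"{subtask} on: {MODEL_POOL['tool_parser']}")
    trace.append(f"recall context via: {MODEL_POOL['memory']}")
    return trace


if __name__ == "__main__":
    for line in run_agent_step("summarize quarterly report"):
        print(line)
```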

Chamath Palihapitiya (Founder, Social Capital | Former Facebook Executive | All-In Host):

What about embedded applications? Like the teddy bear in my daughter’s house, if it wants to talk to her, what would be inside? A custom ASIC? Or will there be a broader TAM in edge and embedded scenarios in the future, with different tools for different scenarios?

Note: ASIC stands for Application-Specific Integrated Circuit; TAM stands for Total Addressable Market.

Jensen Huang:

We believe there are actually three computers in this question.

The first, at the largest scale, is the computer for training AI models, developing AI, creating AI.

The second is the computer for evaluating AI. Look around, there are robots, cars, things like that everywhere. You must first put them into a virtual environment that represents the physical world for evaluation. That is, the software itself must obey the laws of physics. We call this system Omniverse.

The third is the computer deployed at the edge, the robot computer. It could be an autonomous vehicle, a robot, even a little teddy bear.

For devices like teddy bears, one very important direction we’re working on is: turning telecom base stations into part of the AI infrastructure. This way, the entire $2 trillion telecom industry will gradually become an extension of AI infrastructure. So, radio equipment will become edge devices, factories will become edge devices, warehouses too.

In short, all three of these foundational computers are absolutely essential.

David Friedberg (Founder, The Production Board | All-In Podcast Host):

Jensen, last year I already thought you were seeing things earlier than the rest of the world. You said then that inference demand wouldn’t grow by just 1000x.

Jensen Huang:

Did I dig myself into a hole?

David Friedberg:

But that it would grow a million times, even a billion times. Right?

I think many people back then thought that was an exaggeration, because the whole world was still focused on scaling up training. But now you can see that inference has truly exploded and the industry has become “inference-constrained.” And now you’ve released a next-generation “inference factory” with 10x the throughput.

But if you look at external discussions, many people say: your inference factory costs $40-50 billion, while those alternative solutions, like custom ASICs, AMD, etc., only cost $25-30 billion, so you’ll lose market share.

So why don’t you just tell us: what exactly are you seeing? How do you view market share? Is it worth it for these customers to pay nearly double the premium?

Why Can a More Expensive System Produce Cheaper Tokens?

Jensen Huang:

The most important point, the core point is: don’t equate the price of the factory with the price of a token, nor with the cost of a token.

It’s very possible, and I can prove it, that the $50 billion factory can actually produce the lowest-cost tokens. The reason is, our efficiency in generating these tokens is astonishingly high, up to 10x higher.

Look, the difference between $50 billion and $20 billion, a lot of it is just land, power, and the building shell. Besides that, you still have to buy storage, networking, CPUs, servers, cooling systems. So, whether the GPU itself is at full price or half price doesn’t drop the total cost from $50 billion directly to $30 billion. Pick any number you like, more realistically, it might just drop from $50 billion to $40 billion.

And if a $50 billion data center has 10x the throughput, then that price difference isn’t really significant.
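
Note: the argument here is just division: relative cost per token scales with factory cost over throughput. A back-of-the-envelope sketch, using only the illustrative numbers from Jensen’s own example:

```python
# Cost per token ~ (amortized factory cost) / (tokens produced), so a
# pricier factory with much higher throughput can still win on token cost.
# All numbers are illustrative, taken from the example in the interview.

def cost_per_token(capex_billion: float, relative_throughput: float) -> float:
    """Relative cost per token, normalizing the slower factory's output to 1."""
    return capex_billion / relative_throughput


expensive = cost_per_token(capex_billion=50, relative_throughput=10)  # -> 5.0
cheap = cost_per_token(capex_billion=30, relative_throughput=1)       # -> 30.0

print(f"$50B factory at 10x throughput -> relative token cost {expensive:.1f}")
print(f"$30B factory at 1x throughput  -> relative token cost {cheap:.1f}")
# Even at a discounted $30B, tokens cost ~6x more than from the $50B factory.
```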

Jason Calacanis:

Got it.

Jensen Huang:

That’s also why I always say: for many chips, if you can’t keep up with the technological frontier, with the pace we’re advancing, then even giving the chips away for free wouldn’t be cheap enough.

David Sacks:

I want to ask a more macro strategic question. You’re now running the world’s most valuable company. Next year’s revenue might exceed $350 billion, free cash flow $200 billion, and it’s still compounding at a crazy rate.

How exactly do you make decisions? How do you get information? Everyone knows about your famous email system now, but how do you actually form intuition, shape the market, decide where to place big bets, where to pull back, where to enter new fields? How does that information reach you? And how do you make the final judgment?

Jensen Huang:

That’s the CEO’s job.

David Sacks:

Right.

Jensen Huang:

Our responsibility is to define the vision, define the strategy. Of course, we get inspiration and information from the brilliant computer scientists, technologists within the company, and countless excellent employees, but ultimately, shaping the future is our responsibility.

One of the criteria is: is this thing ridiculously difficult? If it’s not difficult enough, we should stay away from it. The reason is simple: if something is easy to do, there will definitely be a ton of competitors.

Is it something no one has ever done before, and ridiculously difficult? Does it happen to mobilize our company’s unique “superpowers”? So I have to find an intersection point: it must meet all of these conditions at once.
