AI, China, Leadership, Performance, Technology

The Whale in the Room: Three Insights on DeepSeek

You win some. You lose some.

But when you lose half a trillion dollars in a single day, it’s hard to be philosophical. Ask Jensen Huang, the founder and CEO of NVIDIA, whose stock price fell off a cliff on Monday, January 27, 2025, taking the stock price of the entire alphabet of tech firms with it.

What triggered this single-day, single-company record-setting loss on Wall Street? The launch, a week earlier, of an AI chatbot from China called DeepSeek-V3. According to a research paper released by DeepSeek, the performance results of this little-known AI model shocked an unprepared West.

DeepSeek’s chatbot performs at comparable levels with proprietary AI models like OpenAI’s GPT-4o and Anthropic’s Claude-3.5-Sonnet, and even outperforms all open-source models, including Meta’s Llama-3.1-405B, on benchmarks measuring diversity and depth of knowledge. And on coding, math, and reasoning benchmarks, DeepSeek’s chatbot is on par with all major models.

DeepSeek accomplished this while spending only $5.6 million on the training phase. It cost Meta an estimated $100 million to train Llama 3.1. Here’s how Peter Diamandis describes this David vs. Goliath tale of the tape in his January 28 newsletter:

OpenAI was founded 10 years ago, has around 4,500 employees, and has raised $6.6 billion in capital. DeepSeek was founded less than 2 years ago, has 200 employees, and was developed for roughly $5 million. While tech giants like OpenAI and Anthropic have been spending $100M+ just to train their AI models, this small 200-person team out of China built an AI system matching GPT-4’s performance for 20x less money.

Here are a few insights to be drawn from this mini-Sputnik moment:

  • The performance gap between open and closed models is closing
  • Necessity is the mother of invention
  • Disruption can come from anywhere, so beware and be bold

The Gap is Closing

The race for AGI and ASI is often framed geopolitically – a battle for AI supremacy between the US and China. But DeepSeek’s splash onto the global stage shows that it may really be a battle between closed and open-source models. By developing an open-source model, DeepSeek saves on licensing fees and leverages the community to debug and maintain the model, as well as develop new tools for it. And it provides this model to anyone for free.

Meta does the same, but its Llama models have generally lagged in performance compared to the bright and shiny models of OpenAI and Anthropic. But now that DeepSeek has shown open-source can be as good as the best of the proprietary models, the perceived lead of the more expensive closed models is under threat.

Ma Necessity Gives Birth to Baby Innovation

In October of 2022, the American government, in essence, restricted the sale and distribution of advanced computing chips like NVIDIA’s GPUs “destined for a supercomputer or semiconductor development or production end use in the PRC.”

But as Kai-Fu Lee, a leading AI technologist and investor from China, has said, not only is it really difficult for Chinese companies to get advanced chips, it’s also too expensive for most organizations or institutions in China to purchase them anyway.

“We couldn’t afford 10,000 GPUs,” he explained in this interview about his own company’s efforts in building leading-edge AI models. “Basically, we did production runs on only 2,000 GPUs, which is a small fraction of what the US companies are using. Elon Musk just put together 100,000 H100s, and OpenAI even more. We have basically less than 2% of their compute. But I am a deep believer in efficiency, power of engineering, small teams working together, vertical integration. I’m a strong believer that necessity is the mother of innovation.”

To me, it feels like we’re witnessing a stark comparison between a 3-star Michelin chef with a professional team and a fully stocked pantry (the frontier AI companies) and a sole street vendor with limited ingredients (DeepSeek). The chef asks the team to experiment with exotic flavors and lavish techniques, perhaps even discarding dishes that don’t meet their standards. The street vendor has to be crafty and resourceful, maximizing flavors and minimizing waste to create something tasty. This necessity to “do more with less” can lead to new and surprising meals.

Disruption Can Come From Anywhere

The name on everyone’s lips is Liang Wenfeng, the founder of DeepSeek. He is a 39-year-old native of Guangdong Province in southern China who holds a master’s degree in Information and Communication Engineering and founded a quant hedge fund called High-Flyer in 2016. His goal early on was to leverage artificial intelligence to enhance trading performance, and he wisely stockpiled NVIDIA chips before the export ban on China was established.

According to this source, “fewer than five companies in [China] owned over 10,000 GPUs, apart from major tech giants. One of them was High-Flyer.” The New York Times explained that DeepSeek used NVIDIA’s H800 chips to train its most recent model. The H800 was not restricted under the chip export ban, as it is a watered-down version of NVIDIA’s H100, specifically marketed to China. The Times went on to explain that the Biden administration wasn’t happy with NVIDIA’s technically legal maneuver, so it banned the H800 too, but not quickly enough to stop DeepSeek and other Chinese companies from snapping them up.

Somehow, despite having the computational firepower to develop cutting-edge AI models, DeepSeek flew under the radar, even to the Chinese media, based on this July 2024 article published by 量子位 (QbitAI):

Among China’s seven large model startup companies, DeepSeek (深度求索) is the most low-profile, but it always manages to be remembered in unexpected ways.

This article was written after DeepSeek had released an earlier version of its LLM called V2, and it explained how that version caught the surprised eye of OpenAI.

In Silicon Valley, DeepSeek is referred to as a “mysterious force from the East.” SemiAnalysis’s chief analyst believes that the DeepSeek V2 paper “might be the best one this year.” Former OpenAI employee Andrew Carr thinks the paper is “full of astonishing wisdom” and has applied its training settings to his own model. Former OpenAI policy director and Anthropic co-founder Jack Clark believes DeepSeek “has hired a group of inscrutable geniuses” and thinks that large models made in China “will become an unignorable force, just like drones and electric vehicles.”

The surprise in the West, particularly in the US, rests on the premise that China has progressed by copying innovation, not sparking it. Liang Wenfeng says that perception is no longer valid.

“What surprised them was that a Chinese company was participating as an innovator in their game. After all, most Chinese companies are used to following rather than innovating.”

Sam’s Subtle Rebuttal

On January 28, a week after DeepSeek-V3 turned the AI world upside down, OpenAI co-founder and CEO Sam Altman responded on X, saying “deepseek’s r1 is an impressive model, particularly around what they’re able to deliver for the price.”

The tech world, including OpenAI, is amazed at DeepSeek’s innovation, and Altman said his competitive spirit has been sparked. “We will obviously deliver much better models and also it’s legit invigorating to have a new competitor!”

But Altman believes the long game toward AGI and ASI requires massive technological firepower and expense, and that advancements in AI have only just begun.

“Mostly we are excited to continue to execute on our research roadmap and believe more compute is more important now than ever before to succeed at our mission. the world is going to want to use a LOT of ai, and really be quite amazed by the next gen models coming.”

ARTICLE FAQS

1. Why was the launch of DeepSeek considered so disruptive?
DeepSeek’s V3 model matched the performance of leading proprietary systems like GPT-4o and Claude-Sonnet-3.5 while costing a fraction of the training budgets of Western tech giants. Its emergence challenged assumptions about who holds the edge in AI and showed that disruption can come from smaller, resource-constrained teams.

2. What does DeepSeek reveal about the competition between open and closed AI models?
The gap between open-source and proprietary models is narrowing. Open-source models benefit from community support, lower costs, and wider accessibility, which puts pressure on expensive closed systems to justify their higher investment.

3. How did resource constraints shape DeepSeek’s development?
Restrictions on access to advanced chips forced DeepSeek to innovate with fewer GPUs and leaner processes. This necessity led to more efficient engineering and resourceful problem-solving, proving that breakthroughs do not always require massive computational power.

4. What does DeepSeek’s rise say about China’s role in AI innovation?
DeepSeek challenges the stereotype that Chinese companies primarily copy rather than innovate. Its success demonstrates China’s growing capacity to lead in AI research and development, joining other industries like drones and electric vehicles where it has become globally competitive.

5. How did global markets react to DeepSeek’s launch?
The announcement rattled financial markets, contributing to a historic drop in NVIDIA’s stock price and raising concerns about the competitive balance in the AI industry. It underscored how quickly new players can reshape expectations and valuations.

6. What is the broader lesson for established AI companies?
Even well-funded leaders like OpenAI and Anthropic must stay alert, as innovation can emerge from unexpected places. Efficiency, agility, and boldness can rival scale, and the long-term race toward AGI will not be decided by money alone.
