DeepSeek
| DeepSeek | |
|---|---|
| Native name | 杭州深度求索人工智能基础技术研究有限公司 |
| Type | Private |
| Industry | Artificial intelligence |
| Founded | July 17, 2023 |
| Founder | Liang Wenfeng |
| Headquarters | Hangzhou, Zhejiang, China |
| Key people | Liang Wenfeng (CEO) |
| Owner | High-Flyer Capital Management (Hangzhou Huanfang Technology) |
| Products | DeepSeek-V2, DeepSeek-V3, DeepSeek-Coder, DeepSeek-Coder-V2, DeepSeek-R1, DeepSeek-VL2 |
| Employees | 160 (2025) |
| Website | https://www.deepseek.com/ |
DeepSeek (Chinese: 杭州深度求索人工智能基础技术研究有限公司; commonly DeepSeek AI or simply DeepSeek) is a Chinese artificial intelligence company known for developing large language models (LLMs) and releasing several prominent open-source and research models. Founded in 2023 by hedge fund entrepreneur Liang Wenfeng, the company has gained international recognition for achieving competitive performance with leading Western AI models at dramatically lower training costs.[1][2]
DeepSeek rose to global prominence in January 2025 when its mobile app briefly topped the Apple App Store's free charts in the United States, following the release of its reasoning-focused DeepSeek-R1 models. The company's claim of training competitive models for under $6 million using Nvidia H800 GPUs, compared to over $100 million for Western equivalents, caused significant market disruption, with Nvidia losing nearly $600 billion in market capitalization.[3][4]
History
Background and Origins (2016–2023)
DeepSeek's origins trace back to High-Flyer Capital Management, a Chinese quantitative hedge fund co-founded in February 2016 by Liang Wenfeng and two classmates from Zhejiang University.[1] High-Flyer began adopting deep learning models for stock trading on October 21, 2016, transitioning from CPU-based linear models to GPU-dependent systems. By 2021, the fund relied exclusively on AI for trading operations.[5]
In 2019, High-Flyer built its first computing cluster, Fire-Flyer (萤火一号), at a cost of 200 million yuan, equipped with 1,100 GPUs. Anticipating U.S. export restrictions on advanced chips to China, Liang acquired 10,000 Nvidia A100 units before restrictions took effect. Construction of Fire-Flyer 2 (萤火二号) began in 2021 with a 1 billion yuan budget, incorporating 5,000 PCIe A100 GPUs across 625 nodes by 2022.[5][6]
Founding and Early Development (2023–2024)
On April 14, 2023, High-Flyer announced the establishment of an artificial general intelligence (AGI) research lab. This lab was formally incorporated as DeepSeek on July 17, 2023, with High-Flyer serving as the principal investor. Venture capital firms were initially reluctant to invest, given the lack of short-term exit opportunities.[1][7]
The company released its first model, DeepSeek Coder, on November 2, 2023, followed by the DeepSeek-LLM series on November 29, 2023. Throughout 2024, DeepSeek continued releasing specialized models:
- January 2024: DeepSeek-MoE models (Base and Chat variants)
- April 2024: DeepSeek-Math models (Base, Instruct, and RL)
- May 2024: DeepSeek-V2
- June 2024: DeepSeek-Coder V2 series
- September 2024: DeepSeek V2.5[5]
Global Breakthrough (2025)
In December 2024, DeepSeek released DeepSeek-V3, featuring a Mixture of Experts architecture with 671 billion total parameters. On January 20, 2025, the company announced DeepSeek-R1, a reasoning-focused model trained largely through reinforcement learning that matched the performance of OpenAI's o1 family at significantly lower cost.[8][9]
DeepSeek's mobile app reached #1 among free apps on the U.S. Apple App Store on January 27–28, 2025. This surge coincided with a roughly 17% drop in Nvidia's share price and over $1 trillion erased from U.S. tech market capitalization. Prominent tech investor Marc Andreessen described this as "AI's Sputnik moment."[3][10][4]
On January 27–28, 2025, DeepSeek reported large-scale malicious attacks on its services and temporarily restricted new sign-ups.[11]
Technology
Architecture
Mixture of Experts (MoE)
DeepSeek's models employ a Mixture of Experts architecture, which allows massive parameter counts while maintaining computational efficiency. The MoE framework in DeepSeek-V3 consists of the following (an illustrative routing sketch appears after the list):[12][13]
- 671 billion total parameters
- 37 billion parameters activated per token
- 256 routed experts per layer (increased from 160 in V2)
- 1 shared expert per layer that is always activated
- 3 initial layers that use standard dense feed-forward networks instead of MoE
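Because only the shared expert and a handful of routed experts run for each token, total parameter count and per-token compute are decoupled. The sketch below is a minimal, self-contained illustration of top-k routing with a shared expert; the dimensions, gating function, and expert definitions are toy assumptions, not DeepSeek's implementation.

```python
import numpy as np

def moe_layer(x, routed_experts, shared_expert, gate_w, top_k=8):
    """Toy Mixture-of-Experts layer: each token is processed by one always-on
    shared expert plus a weighted combination of its top_k routed experts."""
    scores = x @ gate_w                                # (tokens, n_experts) router logits
    probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)         # softmax over experts
    out = shared_expert(x)                             # shared expert sees every token
    for t in range(x.shape[0]):
        top = np.argsort(probs[t])[-top_k:]            # indices of the top_k routed experts
        weights = probs[t, top] / probs[t, top].sum()  # renormalized gate weights
        for w, e in zip(weights, top):
            out[t] += w * routed_experts[e](x[t:t+1])[0]
    return out

# Tiny demo: 4 tokens, hidden size 16, 32 routed experts, top-8 routing.
rng = np.random.default_rng(0)
d, n_experts = 16, 32

def make_expert():
    W = rng.standard_normal((d, d)) * 0.1              # each expert is a tiny linear stand-in for an FFN
    return lambda h: h @ W

routed = [make_expert() for _ in range(n_experts)]
shared = make_expert()
tokens = rng.standard_normal((4, d))
gate_w = rng.standard_normal((d, n_experts)) * 0.1
print(moe_layer(tokens, routed, shared, gate_w).shape)  # (4, 16)
```

In this toy each token touches only the shared expert plus 8 of 32 routed experts; scaled up, the same principle is why DeepSeek-V3 activates roughly 37B of its 671B parameters per token (it reportedly routes each token to 8 of its 256 experts).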
Multi-head Latent Attention (MLA)
DeepSeek-V2 and subsequent models incorporate Multi-head Latent Attention (MLA), a modified attention mechanism that compresses the key-value (KV) cache into a low-dimensional latent vector per token. MLA achieves the following (a back-of-the-envelope comparison appears after the list):[2][14]
- KV cache reduced to roughly 5-13% of the size required by standard multi-head attention
- Significant memory overhead reduction during inference
- Support for 128K-164K token context windows
- Lower computational cost for long-context processing
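The savings come from what gets cached: standard multi-head attention stores full per-head keys and values for every past token, while MLA stores one small latent (plus a decoupled positional key) and reconstructs keys and values from it via learned up-projections at attention time. The comparison below is only illustrative arithmetic under assumed dimensions, not DeepSeek's configuration.

```python
# Per-token, per-layer KV-cache size: standard multi-head attention vs. an
# MLA-style compressed cache (illustrative, assumed dimensions).
n_heads, head_dim = 32, 128        # a typical dense-LLM attention shape (assumed)
latent_dim, rope_dim = 512, 64     # compressed KV latent + decoupled positional key (assumed)

mha_elems = 2 * n_heads * head_dim  # MHA caches full keys and values for every head
mla_elems = latent_dim + rope_dim   # MLA caches one small latent per token; keys and
                                    # values are rebuilt from it at attention time

print(f"standard MHA: {mha_elems} cached values per token per layer")
print(f"MLA-style:    {mla_elems} cached values per token per layer "
      f"({100 * mla_elems / mha_elems:.1f}% of MHA)")
```

With these toy numbers the compressed cache is about 7% of the full MHA cache, consistent with the 5-13% range quoted above; the exact ratio depends on head counts and on whether the baseline already uses grouped-query attention.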
Training Methodology
DeepSeek-R1 employs a distinctive multi-stage training pipeline (a toy rule-based reward is sketched below):[8][15]
1. Cold start phase: fine-tuning the base model on curated chain-of-thought reasoning examples
2. Reasoning-oriented reinforcement learning: large-scale RL on tasks whose rewards can be checked by rules, such as exact-match math answers or passing code tests
3. Supervised fine-tuning: combining reasoning and non-reasoning data
4. RL for all scenarios: final refinement for helpfulness and harmlessness
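Stage 2 depends on rewards that can be verified automatically rather than scored by a learned reward model. The sketch below is a hypothetical example of such a rule-based reward: the <think>/<answer> delimiters follow the format described in the R1 paper, but the specific checks and weights are invented for illustration.

```python
import re

def rule_based_reward(response: str, reference_answer: str) -> float:
    """Toy rule-based reward: a small format bonus for tagged reasoning plus an
    accuracy reward when the final answer matches the reference exactly.
    (Illustrative only; the weights and exact checks are assumptions.)"""
    reward = 0.0
    if re.search(r"<think>.*?</think>", response, re.DOTALL):
        reward += 0.1                      # format reward: reasoning is delimited
    match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    if match and match.group(1).strip() == reference_answer.strip():
        reward += 1.0                      # accuracy reward: verifiable final answer
    return reward

sample = "<think>9 * 7 = 63</think><answer>63</answer>"
print(rule_based_reward(sample, "63"))     # 1.1
```

Rewards like this are cheap to compute at scale and, according to the R1 report, less prone to reward hacking than learned reward models, which is what makes large-scale RL on math and coding tasks practical.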
DeepSeek Sparse Attention (DSA)
Introduced in DeepSeek-V3.2-Exp (September 2025), DSA is a fine-grained sparse attention mechanism optimized for long-context training and inference efficiency with minimal performance impact.[16]
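Public technical detail on DSA is limited, so the following is only a generic sketch of the idea behind fine-grained sparse attention: each query attends to a small, individually selected subset of keys rather than the whole context, cutting attention cost roughly from O(L^2) toward O(L*k). The selection rule and all names here are assumptions, not DeepSeek's algorithm.

```python
import numpy as np

def topk_sparse_attention(q, k, v, top_k=64):
    """Generic top-k sparse attention: each query keeps only its top_k keys
    and applies softmax over that subset."""
    scores = q @ k.T / np.sqrt(q.shape[-1])              # (Lq, Lk) raw scores
    keep = np.argsort(scores, axis=-1)[:, -top_k:]       # per-query selected key indices
    out = np.zeros_like(q)
    for i in range(q.shape[0]):
        s = scores[i, keep[i]]
        w = np.exp(s - s.max()); w /= w.sum()            # softmax over kept keys only
        out[i] = w @ v[keep[i]]
    return out

rng = np.random.default_rng(0)
L, d = 1024, 64
q, k, v = (rng.standard_normal((L, d)) for _ in range(3))
print(topk_sparse_attention(q, k, v).shape)              # (1024, 64)
```

A practical implementation also needs a cheap selection mechanism so that the full score matrix never has to be materialized; this toy version computes it only for clarity.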
DeepSeek-OCR (2025)
In October 2025, DeepSeek released DeepSeek-OCR, an open-source end-to-end document OCR and understanding system that explores “contexts optical compression”—representing long text as images and decoding it back with a vision–language stack to save tokens for long-context LLM applications.[17][18]
- Architecture: A ~380M-parameter DeepEncoder (SAM-base window attention → 16× token compression via 2-layer conv → CLIP-large global attention) feeds a 3B MoE decoder (DeepSeek-3B-MoE-A570M; ~570M active params at inference). Multiple resolution modes control vision-token budgets: Tiny (64 tokens, 512²), Small (100, 640²), Base (256, 1024²), Large (400, 1280²), plus tiled Gundam (n×100 + 256 tokens) and Gundam-M modes for ultra-high-res pages.[17]
- Reported compression/accuracy: On a Fox benchmark subset (English pages with 600–1,300 text tokens), the paper reports ≈97% decoding precision when text tokens are <10× vision tokens, and ~60% accuracy around 20× compression. On OmniDocBench (edit distance; lower is better), Small (100 tokens) outperforms GOT-OCR 2.0 (256 tokens), and Gundam (<~800 tokens) surpasses MinerU-2.0 (~6,790 tokens) in the reported setup (the compression arithmetic is illustrated after this list).[17]
- Throughput/uses: DeepSeek positions the system as a data engine for LLM/VLM pretraining, claiming >200k pages/day on a single A100-40G and scalability to tens of millions of pages per day on clusters, plus "deep parsing" of charts, chemical structures (SMILES), and planar geometry into structured outputs (for example HTML tables or dictionaries).[18]
- Availability/ecosystem: Source code and weights are hosted on GitHub and Hugging Face, with examples for Transformers/vLLM inference. Community walkthroughs (for example Simon Willison) documented running the 6.6-GB model on diverse hardware and shared setup notes.[19][20][21]
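The trade-off reported above reduces to a budget calculation: how many vision tokens a page is rendered into versus how many text tokens the same content would occupy. The sketch below reproduces that arithmetic from the mode budgets and accuracy bands quoted above; the function names and exact thresholds are illustrative assumptions.

```python
# Vision-token budgets per DeepSeek-OCR resolution mode, as listed above.
MODE_TOKENS = {"Tiny": 64, "Small": 100, "Base": 256, "Large": 400}

def compression_ratio(text_tokens: int, mode: str) -> float:
    """Ratio of text tokens to vision tokens for a single page."""
    return text_tokens / MODE_TOKENS[mode]

def expected_regime(ratio: float) -> str:
    """Rough accuracy bands reported in the paper (illustrative thresholds)."""
    if ratio < 10:
        return "~97% decoding precision"
    if ratio <= 20:
        return "degrading (~60% around 20x)"
    return "beyond the reported operating range"

page_text_tokens = 900                     # e.g., a dense English page
for mode in MODE_TOKENS:
    r = compression_ratio(page_text_tokens, mode)
    print(f"{mode:>5}: {r:4.1f}x compression -> {expected_regime(r)}")
```

In this example a 900-token page stays in the high-precision regime in Small mode and above, while Tiny mode pushes past 10x compression, where accuracy is expected to degrade.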
Infrastructure
DeepSeek operates two primary computing clusters:[5]
- Fire-Flyer 1 (萤火一号): Built 2019, retired after 1.5 years
- Fire-Flyer 2 (萤火二号): Operational since 2022, featuring:
  - Nvidia GPUs with 200 Gbps interconnects
  - Fat-tree topology for high bisection bandwidth (see the back-of-the-envelope calculation below)
  - 3FS distributed file system with Direct I/O and RDMA
  - 2,048 Nvidia H800 GPUs used to train DeepSeek-V3, the base model for R1
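The fat-tree claim can be made concrete with a quick calculation: in a fully non-blocking fat tree, roughly half the nodes can transmit at full line rate to the other half at once. Using the node count and link speed cited in this article, and assuming a non-blocking build (an assumption, since the cluster's exact oversubscription is not public):

```python
# Back-of-the-envelope bisection bandwidth for a non-blocking fat tree
# (illustrative arithmetic only; not an official Fire-Flyer 2 specification).
nodes = 625          # PCIe A100 nodes reported for Fire-Flyer 2
link_gbps = 200      # per-node interconnect speed in Gbit/s

bisection_gbps = nodes / 2 * link_gbps   # half the nodes sending at line rate
print(f"~{bisection_gbps / 1000:.1f} Tb/s (~{bisection_gbps / 8000:.1f} TB/s) across the bisection")
```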
Performance Benchmarks
| Benchmark | DeepSeek-V3 | DeepSeek-R1 | GPT-4o | Description |
|---|---|---|---|---|
| MMLU | 88.5 | 91.8 | 87.2 | Massive Multitask Language Understanding |
| HumanEval | 82.6 | 85.4 | 80.5 | Code Generation |
| MATH-500 | 90.2 | 97.3 | 74.6 | Mathematical Problem-Solving |
| Codeforces | 51.6 | 57.2 | 23.6 | Complex Coding Performance |
| GPQA | 59.1 | 72.3 | N/A | Graduate-Level Question Answering |
| AIME | N/A | 79.8% pass@1 | N/A | American Invitational Mathematics Examination |
Models and Products
Major Model Releases
| Model | Type/Focus | Parameters | Context Length | Release Date | Key Features |
|---|---|---|---|---|---|
| DeepSeek-V2 | General LLM (MoE) | 236B total; 21B active | 128K | May 2024 | MLA; DeepSeekMoE routing[2] |
| DeepSeek-V3 | General LLM (MoE) | 671B total; 37B active | 128K | Dec 2024 | Enhanced MoE; $5.6M training cost[12] |
| DeepSeek-Coder | Code LLMs | Various sizes | 16K | Nov 2023 | 2T tokens; 87% code / 13% NL; infilling[23] |
| DeepSeek-Coder-V2 | Code LLMs (MoE) | 236B total; 21B active | 128K | June 2024 | +6T tokens; GPT-4-Turbo comparable[24] |
| DeepSeek-R1 | Reasoning post-training | 671B total; 37B active | 164K | Jan 2025 | RL-driven reasoning training; o1-level performance[8][25] |
| DeepSeek-V3.1 | Hybrid MoE | N/A | N/A | Aug 2025 | 71.6% Aider pass rate[26] |
| DeepSeek-V3.2-Exp | Experimental | N/A | N/A | Sep 2025 | Sparse attention (DSA); Huawei Ascend support[16][27] |
| DeepSeek-VL2 | Vision-Language (MoE) | 27B total; 4.5B active | 4K | Dec 2024 | Multimodal understanding[28] |
Distilled Models
DeepSeek has created smaller, efficient models through knowledge distillation (a generic illustration of the technique follows this list):[29]
- DeepSeek-R1-Distill-Qwen-32B
- DeepSeek-R1-Distill-Llama-70B
- Various models from 1.5B to 70B parameters
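According to the R1 report, these distilled models were produced by supervised fine-tuning smaller Qwen and Llama models on reasoning samples generated by R1. The block below is not that recipe; it shows the textbook logit-matching distillation loss purely to illustrate what knowledge distillation means in general, with toy dimensions.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, hard_labels, T=2.0, alpha=0.5):
    """Classic knowledge-distillation objective: KL divergence to the teacher's
    temperature-softened distribution, blended with cross-entropy on hard labels."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    kl = np.sum(p_teacher * (np.log(p_teacher + 1e-12) - np.log(p_student + 1e-12)), axis=-1).mean()
    ce = -np.log(softmax(student_logits)[np.arange(len(hard_labels)), hard_labels] + 1e-12).mean()
    return alpha * (T ** 2) * kl + (1 - alpha) * ce

rng = np.random.default_rng(0)
teacher_logits = rng.standard_normal((8, 100))   # 8 token positions, 100-token toy vocabulary
student_logits = rng.standard_normal((8, 100))
labels = rng.integers(0, 100, size=8)            # ground-truth next tokens
print(round(distillation_loss(student_logits, teacher_logits, labels), 3))
```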
API and Pricing
DeepSeek provides API access through the DeepSeek Open Platform at competitive prices (a usage sketch follows this list):[30]
- Input costs: $0.07-$0.27 per million tokens (vs $2.50 for GPT-4o)
- Output costs: $1.10 per million tokens (vs $10.00 for GPT-4o)
- 50%+ price reduction following V3.2-Exp release (September 2025)
- Pre-paid billing model
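A minimal usage sketch, assuming the OpenAI-compatible endpoint and model names documented on the DeepSeek Open Platform (base URL https://api.deepseek.com with the deepseek-chat and deepseek-reasoner models); model identifiers and prices change over time and should be checked against current documentation.

```python
# pip install openai  (the platform exposes an OpenAI-compatible API per its docs)
from openai import OpenAI

client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY",
                base_url="https://api.deepseek.com")

resp = client.chat.completions.create(
    model="deepseek-chat",   # "deepseek-reasoner" selects the R1-style reasoning model
    messages=[{"role": "user", "content": "Summarize Mixture of Experts in one sentence."}],
)
print(resp.choices[0].message.content)

# Rough cost estimate at the list prices quoted above (illustrative only).
usage = resp.usage
cost = usage.prompt_tokens * 0.27 / 1e6 + usage.completion_tokens * 1.10 / 1e6
print(f"~${cost:.6f} for {usage.total_tokens} tokens")
```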
Organization and Leadership
Liang Wenfeng
Liang Wenfeng (梁文锋), born 1985 in Guangdong province, is the founder and CEO of DeepSeek. He graduated from Zhejiang University with:[31]
- Bachelor of Engineering in electronic information engineering (2007)
- Master of Engineering in information and communication engineering (2010)
Liang co-founded High-Flyer and began acquiring Nvidia GPUs in 2021, purchasing 10,000 A100 chips before U.S. export restrictions took effect.[6]
Corporate Structure
- 84% owned by Liang Wenfeng through shell corporations
- 16% owned by High-Flyer affiliated individuals
- No external venture capital funding as of 2025
- 160 employees (2025)
- Estimated valuation undisclosed; Liang's personal wealth: $4.5 billion (2025)[30]
Organizational Philosophy
DeepSeek operates with an unconventional structure:[6][5]
- Bottom-up organization with natural division of labor
- No preassigned roles or rigid hierarchy
- Unrestricted computing resource access for researchers
- Emphasis on fresh graduates and non-CS backgrounds
- Recruitment of people with backgrounds in fields such as poetry and advanced mathematics
Market Impact and Adoption
User Growth
DeepSeek experienced explosive growth following its January 2025 releases:[30]
- 30 million daily active users within weeks of launch
- 33.7 million monthly active users (4th largest AI application globally)
- Briefly surpassed ChatGPT in daily users (21.6M vs 14.6M)
- Geographic distribution (January 2025):
- China: 30.7%
- India: 13.6%
- Indonesia: 6.9%
- United States: 4.3%
- France: 3.2%
Market Disruption
The "DeepSeek shock" of January 2025 caused:[4][10]
- Over $1 trillion erased from U.S. tech market capitalization
- A roughly 17% single-day decline in Nvidia's stock, about $600 billion in market value
- "Sputnik moment" discussions across the U.S. AI industry
- An AI price war in China, with competitors cutting prices by up to 97%
Cost Comparison
| Model | Training Cost | Hardware Used |
|---|---|---|
| DeepSeek-V3 (base model for R1) | $5.6 million (final training run) | 2,048 Nvidia H800 GPUs |
| GPT-4 | $100+ million (est.) | Undisclosed (reportedly Nvidia A100s) |
| Claude 3 | $100+ million (est.) | Unknown |
| Gemini Ultra | $191 million (est.) | Google TPU v4/v5e |
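The $5.6 million figure is the cost of the final DeepSeek-V3 training run as computed in the V3 technical report: reported GPU-hours multiplied by an assumed GPU rental price, excluding prior research, ablation runs, data, and staff. The arithmetic can be reproduced in one line:

```python
# Reproducing the widely cited training-cost figure from the DeepSeek-V3
# technical report (final training run only).
gpu_hours = 2_788_000      # H800 GPU-hours reported for the V3 run
usd_per_gpu_hour = 2.0     # rental price assumed in the report
print(f"${gpu_hours * usd_per_gpu_hour / 1e6:.3f} million")   # ~$5.576 million
```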
Controversies and Challenges
Security and Privacy Concerns
Multiple governments and organizations have restricted DeepSeek usage:[32]
- Australian government agencies
- India's central government
- South Korea's industry ministry
- Taiwanese government agencies
- Texas state government
- U.S. Congress and Pentagon
- Potential EU-wide ban under consideration
Export Control Issues
- February 2025: Arrests in Singapore in a fraud case involving servers containing Nvidia chips that were allegedly destined for DeepSeek
- April 2025: The Trump administration reportedly considered barring DeepSeek from purchasing U.S. technology
- Ongoing scrutiny of chip acquisition methods[5]
Content Alignment
DeepSeek-R1-0528 and later models have been noted for closer alignment with Chinese government policies and content restrictions.[33]
NIST Evaluation
A September 2025 evaluation by NIST's Center for AI Standards and Innovation (CAISI) found:[33]
- Performance shortcomings compared to U.S. models
- Security vulnerabilities in certain implementations
- Cost calculations disputed by independent analysts
Future Roadmap
2025–2026 Priorities
The roadmap items below are reported by unofficial third-party sites rather than by DeepSeek itself and should be treated as unconfirmed:[34]
- DeepSeek Coder 2.0: Support for Rust, Swift, Kotlin, Go
- Multimodal DeepSeek-VL 3.0: Integration of text, vision, and audio
- Private Model Hosting: Enterprise deployment solutions
- Edge AI Models: Sub-1B parameter models for edge devices
- AI Agent Systems: Multi-step task completion, late 2025 release[34]
Long-term Vision (2027–2030)
These longer-term goals likewise come from unofficial third-party sources, not official DeepSeek announcements:[35]
- AGI Research: $2 billion investment in consciousness-mapping research
- Global Expansion: Operations in 50+ countries by 2028
- AI Ethics Framework: Open-source accountability frameworks
- Energy Efficiency: 40% reduction in training energy via quantum-inspired algorithms[35]
Legal and Compliance
DeepSeek operates under comprehensive service agreements addressing:[36]
- Content management per China's Interim Measures for the Management of Generative Artificial Intelligence Services
- Data security compliance with China's Data Security Law and Personal Information Protection Law
- Technical standards for AI-generated content identification
- Developer responsibilities for content filtering and monitoring
See Also
- Large language model
- Mixture of Experts
- Reinforcement learning
- Artificial general intelligence
- Open-source artificial intelligence
- High-Flyer Capital Management
- Liang Wenfeng
- OpenAI
- GPT-4
- Claude
- Nvidia
References
- [1] Reuters - "What is DeepSeek and why is it disrupting the AI sector?" (Jan. 28, 2025). https://www.reuters.com/technology/artificial-intelligence/what-is-deepseek-why-is-it-disrupting-ai-sector-2025-01-27/
- [2] arXiv - "DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model" (May 7, 2024). https://arxiv.org/abs/2405.04434
- [3] The Washington Post - "What is DeepSeek…?" (Jan. 27–28, 2025). https://www.washingtonpost.com/technology/2025/01/27/what-is-deepseek-ai-china-us-stock-fears/
- [4] CNN - "What is DeepSeek, the Chinese AI startup that shook the tech world?" (Jan. 27, 2025). https://edition.cnn.com/2025/01/27/tech/deepseek-ai-explainer/index.html
- [5] Wikipedia contributors - "DeepSeek," Wikipedia, The Free Encyclopedia (accessed October 2025)
- [6] Fortune - "Meet DeepSeek founder Liang Wenfeng, a hedge fund manager" (April 2025). https://fortune.com/2025/01/27/deepseek-founder-liang-wenfeng-hedge-fund-manager-high-flyer-quant-trading/
- [7] TechCrunch - "DeepSeek isn't taking VC money yet - here are 3 reasons why" (March 2025). https://techcrunch.com/2025/03/10/deepseek-isnt-taking-vc-money-yet-here-are-3-reasons-why/
- [8] arXiv - "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning" (Jan. 2025). https://arxiv.org/abs/2501.12948
- [9] Nature News - "Secrets of DeepSeek AI model revealed in landmark paper" (Sep. 17, 2025). https://www.nature.com/articles/d41586-025-03015-6
- [10] Reuters - "China's DeepSeek sets off AI market rout" (Jan. 27, 2025). https://www.reuters.com/technology/chinas-deepseek-sets-off-ai-market-rout-2025-01-27/
- [11] The Verge - "DeepSeek's top-ranked AI app is restricting sign-ups due to 'malicious attacks'" (Jan. 27, 2025). https://www.theverge.com/2025/1/27/24353023/deepseek-ai-app-restricting-sign-ups-malicious-attacks
- [12] arXiv - "DeepSeek-V3 Technical Report" (December 2024). https://arxiv.org/html/2412.19437v1
- [13] Fireworks AI - "DeepSeek v3 and R1 Model Architecture: Why it's powerful and economical" (accessed January 2025). https://fireworks.ai/blog/deepseek-model-architecture
- [14] GeeksforGeeks - "DeepSeek-R1: Technical Overview of its Architecture and Innovations" (February 2025). https://www.geeksforgeeks.org/deepseek-r1-technical-overview-of-its-architecture-and-innovations/
- [15] Fireworks AI - "DeepSeek R1 Overview: Features, Capabilities, Parameters" (accessed January 2025). https://fireworks.ai/blog/deepseek-r1-deepdive
- [16] Reuters - "China's DeepSeek releases 'intermediate' AI model…" (Sep. 29, 2025). https://www.reuters.com/technology/deepseek-releases-model-it-calls-intermediate-step-towards-next-generation-2025-09-29/
- [17] arXiv - "DeepSeek-OCR: Contexts Optical Compression" (v1, Oct. 21, 2025). https://arxiv.org/abs/2510.18234
- [18] DeepSeek Blog - "DeepSeek-OCR: Context Compression with Optical 2D Mapping" (Oct. 2025). https://deepseek.ai/blog/deepseek-ocr-context-compression
- [19] GitHub - deepseek-ai/DeepSeek-OCR (initial release Oct. 20, 2025; vLLM support noted Oct. 23, 2025). https://github.com/deepseek-ai/DeepSeek-OCR
- [20] Hugging Face - "deepseek-ai/DeepSeek-OCR" model card (accessed Oct. 23, 2025). https://huggingface.co/deepseek-ai/DeepSeek-OCR
- [21] Simon Willison - "Getting DeepSeek-OCR working on an NVIDIA Spark via brute force using Claude Code" (Oct. 20, 2025). https://simonwillison.net/2025/Oct/20/deepseek-ocr-claude-code/
- [22] Analytics Vidhya - "DeepSeek V3 vs GPT-4o: Which is Better?" (May 2025). https://www.analyticsvidhya.com/blog/2024/12/gpt-4o-vs-deepseek-v3/
- [23] GitHub - DeepSeek-Coder. https://github.com/deepseek-ai/DeepSeek-Coder
- [24] GitHub - DeepSeek-Coder-V2. https://github.com/deepseek-ai/DeepSeek-Coder-V2
- [25] DeepSeek - "DeepSeek-R1 Release" (Jan. 20, 2025). https://api-docs.deepseek.com/news/news250120
- [26] Dev.to - "DeepSeek V3.1 Complete Evaluation Analysis" (August 20, 2025). https://dev.to/czmilo/deepseek-v31-complete-evaluation-analysis-58jc
- [27] Tom's Hardware - "DeepSeek's new AI model supports China-native chips and CANN" (Oct. 1, 2025). https://www.tomshardware.com/tech-industry/deepseek-new-model-supports-huawei-cann
- [28] SiliconFlow - "The Best DeepSeek-AI Models in 2025" (accessed October 2025). https://www.siliconflow.com/articles/zh-Hans/the-best-deepseek-ai-models-in-2025
- [29] Hugging Face - "deepseek-ai/DeepSeek-R1" (accessed January 2025). https://huggingface.co/deepseek-ai/DeepSeek-R1
- [30] Thunderbit - "50 Latest DeepSeek Statistics (2025)" (accessed October 2025). https://thunderbit.com/zh-Hans/blog/deepseek-ai-statistics
- [31] Wikipedia contributors - "Liang Wenfeng," Wikipedia, The Free Encyclopedia (accessed October 2025)
- [32] TechTarget - "DeepSeek explained: Everything you need to know" (August 2025). https://www.techtarget.com/whatis/feature/DeepSeek-explained-Everything-you-need-to-know
- [33] NIST - "CAISI Evaluation of DeepSeek AI Models Finds Shortcomings and Risks" (September 30, 2025). https://www.nist.gov/news-events/news/2025/09/caisi-evaluation-deepseek-ai-models-finds-shortcomings-and-risks
- [34] deepseek.com.pk (unofficial) - "DeepSeek AI's Roadmap: Upcoming Features to Watch in 2025" (accessed October 2025). https://deepseek.com.pk/deepseek-ais-roadmap-upcoming-features-to-watch-in-2025/
- [35] deepseekai.org.in (unofficial) - "Roadmap DeepSeek 2030" (accessed October 2025). https://deepseekai.org.in/deepseek-ai-roadmap-2025-2030/
- [36] DeepSeek - "DeepSeek Open Platform Service Agreement" (accessed October 2025). https://cdn.deepseek.com/policies/zh-CN/deepseek-open-platform-terms-of-service.html