DeepSeek

DeepSeek
杭州深度求索人工智能基础技术研究有限公司
Type Private
Industry Artificial intelligence
Founded July 17, 2023
Founder Liang Wenfeng
Headquarters Hangzhou, Zhejiang, China
Key people Liang Wenfeng (CEO)
Owner High-Flyer Capital Management (Hangzhou Huanfang Technology)
Products DeepSeek-V2, DeepSeek-V3, DeepSeek-Coder, DeepSeek-Coder-V2, DeepSeek-R1, DeepSeek-VL2
Employees 160 (2025)
Website https://www.deepseek.com/

DeepSeek (Chinese: 杭州深度求索人工智能基础技术研究有限公司, Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd.; commonly DeepSeek AI or simply DeepSeek) is a Chinese artificial intelligence company known for developing large language models (LLMs) and releasing several prominent open-source and research models. Founded in 2023 by hedge fund entrepreneur Liang Wenfeng, the company has gained international recognition for achieving performance competitive with leading Western AI models at dramatically lower training costs.[1][2]

DeepSeek rose to global prominence in January 2025 when its mobile app briefly topped the Apple App Store's free charts in the United States, following the release of its reasoning-focused DeepSeek-R1 models. The company's claim of training competitive models for under $6 million using Nvidia H800 GPUs, compared to over $100 million for Western equivalents, caused significant market disruption, with Nvidia losing nearly $600 billion in market capitalization.[3][4]

History

Background and Origins (2016–2023)

DeepSeek's origins trace back to High-Flyer Capital Management, a Chinese quantitative hedge fund co-founded in February 2016 by Liang Wenfeng and two classmates from Zhejiang University.[1] High-Flyer began adopting deep learning models for stock trading on October 21, 2016, transitioning from CPU-based linear models to GPU-dependent systems. By 2021, the fund relied exclusively on AI for trading operations.[5]

In 2019, High-Flyer built its first computing cluster, Fire-Flyer (萤火一号), at a cost of 200 million yuan, equipped with 1,100 GPUs. Anticipating U.S. export restrictions on advanced chips to China, Liang acquired 10,000 Nvidia A100 units before restrictions took effect. Construction of Fire-Flyer 2 (萤火二号) began in 2021 with a 1 billion yuan budget, incorporating 5,000 PCIe A100 GPUs across 625 nodes by 2022.[5][6]

Founding and Early Development (2023–2024)

On April 14, 2023, High-Flyer announced the establishment of an artificial general intelligence (AGI) research lab. This lab was formally incorporated as DeepSeek on July 17, 2023, with High-Flyer serving as the principal investor. Venture capital firms were initially reluctant to invest, citing the lack of short-term exit opportunities.[1][7]

The company released its first model, DeepSeek Coder, on November 2, 2023, followed by the DeepSeek-LLM series on November 29, 2023. Throughout 2024, DeepSeek continued releasing specialized models, including DeepSeek-V2 (May 2024) and DeepSeek-Coder-V2 (June 2024); see the model table below.

Global Breakthrough (2025)

In December 2024, DeepSeek released DeepSeek-V3, featuring a Mixture of Experts architecture with 671 billion total parameters. On January 20, 2025, the company announced DeepSeek-R1, a reasoning-focused model trained largely through reinforcement learning that matched the performance of OpenAI's o1 family at significantly lower cost.[8][9]

DeepSeek's mobile app reached #1 among free apps on the U.S. Apple App Store on January 27–28, 2025. This surge coincided with a 17% single-day drop in Nvidia's share price and over $1 trillion erased from U.S. tech market capitalization. Prominent tech investor Marc Andreessen described this as "AI's Sputnik moment."[3][10][4]

On January 27–28, 2025, DeepSeek reported large-scale malicious attacks on its services, temporarily restricting new sign-ups.[11]

Technology

Architecture

Mixture of Experts (MoE)

DeepSeek's models employ a Mixture of Experts architecture, which allows massive parameter counts while maintaining computational efficiency. The MoE framework in DeepSeek-V3 consists of the following (a routing sketch appears after the list):[12][13]

  • 671 billion total parameters
  • 37 billion activated parameters per forward pass
  • 256 routed experts per layer (increased from 160 in V2)
  • 1 shared expert per layer that is always activated
  • 3 initial dense (non-MoE) layers at the start of the network
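
As a rough illustration of the shared-plus-routed pattern, a minimal PyTorch sketch could look like the following. This is a toy with made-up sizes, not DeepSeek's implementation; V3 itself routes each token to 8 of its 256 routed experts in addition to the shared expert:

    import torch
    import torch.nn as nn

    class MoELayer(nn.Module):
        """Toy shared-expert + routed-expert MoE layer (illustrative only)."""
        def __init__(self, d_model=256, d_ff=512, n_routed=8, top_k=2):
            super().__init__()
            self.top_k = top_k
            # One always-on shared expert, as in DeepSeek-V3.
            self.shared = nn.Sequential(
                nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            # Routed experts; only top_k of them run for any given token.
            self.experts = nn.ModuleList([
                nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(),
                              nn.Linear(d_ff, d_model))
                for _ in range(n_routed)])
            self.router = nn.Linear(d_model, n_routed)

        def forward(self, x):                        # x: (n_tokens, d_model)
            gates = self.router(x).softmax(dim=-1)   # routing probabilities
            w, idx = gates.topk(self.top_k, dim=-1)  # top-k experts per token
            out = self.shared(x)                     # shared path, every token
            for slot in range(self.top_k):
                for e, expert in enumerate(self.experts):
                    sel = idx[:, slot] == e          # tokens routed to expert e
                    if sel.any():
                        out[sel] = out[sel] + w[sel, slot, None] * expert(x[sel])
            return out

    print(MoELayer()(torch.randn(4, 256)).shape)     # torch.Size([4, 256])

Because only the selected experts run for each token, total parameters (all experts combined) can vastly exceed the compute spent on any one token, which is how V3 carries 671 billion parameters while activating only 37 billion.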

Multi-head Latent Attention (MLA)

DeepSeek-V2 and subsequent models incorporate Multi-head Latent Attention (MLA), a modified attention mechanism that compresses the key-value (KV) cache. MLA achieves the following (a toy implementation appears after the list):[2][14]

  • KV-cache reduction to 5-13% of traditional methods
  • Significant memory overhead reduction during inference
  • Support for 128K-164K token context windows
  • Lower computational cost for long-context processing
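
As a rough illustration (toy dimensions, causal masking omitted; not DeepSeek's actual code), the core trick is caching one small latent per token and up-projecting it to full keys and values at attention time. With d_latent = 64 against 2 × d_model = 1,024 for a standard KV cache, the cached state here is about 6% of the usual size, consistent with the 5-13% range above:

    import torch
    import torch.nn as nn

    class MLASketch(nn.Module):
        """Toy Multi-head Latent Attention: cache a small latent, not full K/V."""
        def __init__(self, d_model=512, n_heads=8, d_latent=64):
            super().__init__()
            self.n_heads, self.d_head = n_heads, d_model // n_heads
            self.q_proj = nn.Linear(d_model, d_model)
            self.kv_down = nn.Linear(d_model, d_latent)  # only this is cached
            self.k_up = nn.Linear(d_latent, d_model)     # decompress to keys
            self.v_up = nn.Linear(d_latent, d_model)     # decompress to values
            self.out = nn.Linear(d_model, d_model)

        def forward(self, x, latent_cache=None):
            b, t, _ = x.shape
            c = self.kv_down(x)                          # (b, t, d_latent)
            if latent_cache is not None:                 # grow the small cache
                c = torch.cat([latent_cache, c], dim=1)
            def heads(z):                                # -> (b, heads, T, d_head)
                return z.view(b, -1, self.n_heads, self.d_head).transpose(1, 2)
            q, k, v = heads(self.q_proj(x)), heads(self.k_up(c)), heads(self.v_up(c))
            attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
            y = (attn @ v).transpose(1, 2).reshape(b, t, -1)
            return self.out(y), c                        # return updated latent cache

    layer = MLASketch()
    y, cache = layer(torch.randn(2, 16, 512))            # prefill 16 tokens
    y2, cache = layer(torch.randn(2, 1, 512), cache)     # decode 1 token from cache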

Training Methodology

DeepSeek-R1 employs a distinctive multi-stage training pipeline:[8][15]

  1. Cold start phase: fine-tuning the base model with curated chain-of-thought reasoning examples
  2. Reasoning-oriented reinforcement learning: large-scale RL focusing on tasks with rule-based evaluation
  3. Supervised fine-tuning: combining reasoning and non-reasoning data
  4. RL for all scenarios: final refinement for helpfulness and harmlessness

A toy version of the rule-based rewards used in step 2 is sketched below.
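
The R1 report describes deterministic rewards (answer-accuracy checks plus format compliance with think/answer tags) rather than a learned reward model. The following sketch simplifies the tags and scoring, so treat it as an illustration of the idea, not DeepSeek's exact rules:

    import re

    def rule_based_reward(completion: str, gold_answer: str) -> float:
        """Simplified R1-style rule-based reward: format bonus + exact match."""
        reward = 0.0
        # Format reward: reasoning must be wrapped in <think>...</think>.
        if re.search(r"<think>.*</think>", completion, re.DOTALL):
            reward += 0.1
        # Accuracy reward: deterministic comparison of the final answer,
        # e.g. for math problems with a single checkable result.
        m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
        if m and m.group(1).strip() == gold_answer.strip():
            reward += 1.0
        return reward

    print(rule_based_reward("<think>2+2=4</think><answer>4</answer>", "4"))  # 1.1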

DeepSeek Sparse Attention (DSA)

Introduced in DeepSeek-V3.2-Exp (September 2025), DSA is a fine-grained sparse attention mechanism optimized for long-context training and inference efficiency with minimal performance impact.[16]
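
DeepSeek has not published a standalone reference kernel alongside the announcement, but the selection pattern can be illustrated with a toy top-k attention in PyTorch. Note this dense-scoring version only demonstrates the sparsity pattern, not the speedup; the production design reportedly selects tokens with a lightweight indexer instead of materializing full scores:

    import torch

    def topk_sparse_attention(q, k, v, k_keep=64):
        """Toy fine-grained sparse attention: each query attends only to its
        k_keep highest-scoring keys. Illustrative stand-in for DSA."""
        scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5  # (..., tq, tk)
        k_keep = min(k_keep, scores.shape[-1])
        top_vals, top_idx = scores.topk(k_keep, dim=-1)
        masked = torch.full_like(scores, float("-inf"))
        masked.scatter_(-1, top_idx, top_vals)                 # keep only top-k
        return torch.softmax(masked, dim=-1) @ v

    q = k = v = torch.randn(1, 8, 1024, 64)  # (batch, heads, tokens, head_dim)
    print(topk_sparse_attention(q, k, v).shape)                # (1, 8, 1024, 64)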

DeepSeek-OCR (2025)

In October 2025, DeepSeek released DeepSeek-OCR, an open-source end-to-end document OCR and understanding system that explores “contexts optical compression”—representing long text as images and decoding it back with a vision–language stack to save tokens for long-context LLM applications.[17][18]

  • Architecture: A ~380M-parameter DeepEncoder (SAM-base window attention → 16× token compression via 2-layer conv → CLIP-large global attention) feeds a 3B MoE decoder (DeepSeek-3B-MoE-A570M; ~570M active params at inference). Multiple resolution modes control vision-token budgets: Tiny (64 tokens, 512²), Small (100, 640²), Base (256, 1024²), Large (400, 1280²), plus tiled Gundam (n×100 + 256 tokens) and Gundam-M modes for ultra-high-res pages.[17]
  • Reported compression/accuracy: On a Fox benchmark subset (English pages with 600–1,300 text tokens), the paper reports ≈97% decoding precision when text tokens are <10× vision tokens, and ~60% accuracy around 20× compression. On OmniDocBench (edit distance; lower is better), Small (100 tokens) outperforms GOT-OCR 2.0 (256 tokens), and Gundam (<~800 tokens) surpasses MinerU-2.0 (~6,790 tokens) in the reported setup.[17]
  • Throughput/uses: DeepSeek positions the system as a data-engine for LLM/VLM pretraining—claiming >200k pages/day on a single A100-40G, scalable to tens of millions per day on clusters—plus “deep parsing” of charts, chemical structures (SMILES), and planar geometry into structured outputs (for example HTML tables or dictionaries).[18]
  • Availability/ecosystem: Source code and weights are hosted on GitHub and Hugging Face, with examples for Transformers/vLLM inference (see the loading sketch after this list). Community walkthroughs (for example Simon Willison's) documented running the 6.6-GB model on diverse hardware and shared setup notes.[19][20][21]
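
Following the pattern shown on the Hugging Face model card, loading the model looks roughly like the sketch below. The infer helper lives in the repository's custom remote code, so its name and arguments are taken from the card's examples and may change between releases:

    # Hedged sketch of running DeepSeek-OCR via Transformers remote code.
    from transformers import AutoModel, AutoTokenizer

    model_id = "deepseek-ai/DeepSeek-OCR"
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
    model = model.eval().cuda()  # model card examples run in bfloat16 on CUDA

    # "<image>\nFree OCR." is the plain-OCR prompt shown in the paper's
    # examples; "page.png" is a placeholder path.
    result = model.infer(tokenizer, prompt="<image>\nFree OCR.",
                         image_file="page.png")
    print(result)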

Infrastructure

DeepSeek operates two primary computing clusters:[5]

  • Fire-Flyer 1 (萤火一号): Built 2019, retired after 1.5 years
  • Fire-Flyer 2 (萤火二号): Operational since 2022, featuring:
    • Nvidia GPUs with 200 Gbps interconnects
    • Fat tree topology for high bisection bandwidth
    • 3FS distributed file system with Direct I/O and RDMA
    • 2,048 Nvidia H800 GPUs used for R1 training

Performance Benchmarks

DeepSeek Model Performance Comparison
Benchmark DeepSeek-V3 DeepSeek-R1 GPT-4o Description
MMLU 88.5 91.8 87.2 Massive Multitask Language Understanding
HumanEval 82.6 85.4 80.5 Code Generation
MATH-500 90.2 97.3 74.6 Mathematical Problem-Solving
Codeforces 51.6 57.2 23.6 Complex Coding Performance
GPQA 59.1 72.3 N/A Graduate-Level Question Answering
AIME N/A 79.8% pass@1 N/A American Invitational Mathematics Examination

[12][8][22]

Models and Products

Major Model Releases

DeepSeek Model Family Overview
Model Type/Focus Parameters Context Length Release Date Key Features
DeepSeek-V2 General LLM (MoE) 236B total; 21B active 128K May 2024 MLA; DeepSeekMoE routing[2]
DeepSeek-V3 General LLM (MoE) 671B total; 37B active 128K Dec 2024 Enhanced MoE; $5.6M training cost[12]
DeepSeek-Coder Code LLMs Various sizes 16K Nov 2023 2T tokens; 87% code / 13% NL; infilling[23]
DeepSeek-Coder-V2 Code LLMs (MoE) 236B total; 21B active 128K June 2024 +6T tokens; GPT-4-Turbo comparable[24]
DeepSeek-R1 Reasoning post-training 671B total; 37B active 164K Jan 2025 RL-centric training; o1-level performance[8][25]
DeepSeek-V3.1 Hybrid MoE N/A N/A Aug 2025 71.6% Aider pass rate[26]
DeepSeek-V3.2-Exp Experimental N/A N/A Sep 2025 Sparse attention (DSA); Huawei Ascend support[16][27]
DeepSeek-VL2 Vision-Language (MoE) 27B total; 4.5B active 4K Dec 2024 Multimodal understanding[28]

Distilled Models

DeepSeek has created smaller, efficient models through knowledge distillation: six dense models (1.5B–70B parameters) based on the Qwen and Llama families, fine-tuned on reasoning data generated by DeepSeek-R1.[29] A minimal sketch of the recipe follows.
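
Per the R1 report, the recipe is plain supervised fine-tuning on teacher-generated traces, with no RL applied to the student. A minimal sketch under assumed names (the student checkpoint and the toy trace are placeholders, not DeepSeek's actual data):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    student_id = "Qwen/Qwen2.5-1.5B"  # placeholder for an R1-distill student base
    tokenizer = AutoTokenizer.from_pretrained(student_id)
    student = AutoModelForCausalLM.from_pretrained(student_id)

    # Teacher-generated (prompt, reasoning trace) pairs; toy single example.
    traces = [{"prompt": "What is 2+2? ",
               "response": "<think>2 + 2 = 4</think> The answer is 4."}]

    optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)
    for ex in traces:
        batch = tokenizer(ex["prompt"] + ex["response"], return_tensors="pt")
        # Standard causal-LM loss on the teacher's trace (labels = inputs).
        loss = student(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()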

API and Pricing

DeepSeek provides API access through the DeepSeek Open Platform with competitive pricing (a back-of-envelope cost comparison follows the list):[30]

  • Input costs: $0.07 (cache hit) to $0.27 (cache miss) per million tokens (vs $2.50 for GPT-4o)
  • Output costs: $1.10 per million tokens (vs $10.00 for GPT-4o)
  • 50%+ price reduction following V3.2-Exp release (September 2025)
  • Pre-paid billing model
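
For scale, a quick comparison at the listed rates, using the upper DeepSeek input price (actual bills depend on cache-hit ratio and model choice):

    def cost_usd(in_tokens, out_tokens, in_rate, out_rate):
        """Cost in USD given per-million-token input/output rates."""
        return in_tokens / 1e6 * in_rate + out_tokens / 1e6 * out_rate

    # 10M input + 2M output tokens:
    print(cost_usd(10e6, 2e6, 0.27, 1.10))   # DeepSeek: ~$4.90
    print(cost_usd(10e6, 2e6, 2.50, 10.00))  # GPT-4o:   ~$45.00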

Organization and Leadership

Liang Wenfeng

Liang Wenfeng (梁文锋), born 1985 in Guangdong province, is the founder and CEO of DeepSeek. He graduated from Zhejiang University with:[31]

  • Bachelor of Engineering in electronic information engineering (2007)
  • Master of Engineering in information and communication engineering (2010)

Liang co-founded High-Flyer in February 2016 and began stockpiling Nvidia GPUs in 2021, purchasing 10,000 A100 chips before U.S. export restrictions took effect.[6]

Corporate Structure

  • 84% owned by Liang Wenfeng through shell corporations
  • 16% owned by High-Flyer affiliated individuals
  • No external venture capital funding as of 2025
  • 160 employees (2025)
  • Estimated valuation undisclosed; Liang's personal wealth: $4.5 billion (2025)[30]

Organizational Philosophy

DeepSeek operates with an unconventional structure:[6][5]

  • Bottom-up organization with natural division of labor
  • No preassigned roles or rigid hierarchy
  • Unrestricted computing resource access for researchers
  • Emphasis on fresh graduates and non-CS backgrounds
  • Recruitment from poetry, advanced mathematics, and other fields

Market Impact and Adoption

User Growth

DeepSeek experienced explosive growth following its January 2025 releases:[30]

  • 30 million daily active users within weeks of launch
  • 33.7 million monthly active users (4th largest AI application globally)
  • Briefly surpassed ChatGPT in daily users (21.6M vs 14.6M)
  • Geographic distribution (January 2025):
    • China: 30.7%
    • India: 13.6%
    • Indonesia: 6.9%
    • United States: 4.3%
    • France: 3.2%

Market Disruption

The "DeepSeek shock" of January 2025 caused:[4][10]

  • $1+ trillion erased from U.S. tech market capitalization
  • Nvidia stock decline of 17% in a single day (roughly $600 billion in market value)
  • "Sputnik moment" discussions across the U.S. AI industry
  • An AI price war in China, with competitors cutting prices by up to 97%

Cost Comparison

Training Cost Comparison
Model Training Cost Hardware Used
DeepSeek-V3 (base for R1) $5.6 million 2,048 Nvidia H800 GPUs
GPT-4 $100+ million (est.) Unknown (reportedly A100s)
Claude 3 $100+ million (est.) Unknown
Gemini Ultra $191 million (est.) TPU v5p

[4][30]

Controversies and Challenges

Security and Privacy Concerns

Multiple governments and organizations have restricted DeepSeek usage:[32]

  • Australian government agencies
  • India's central government
  • South Korea's industry ministry
  • Taiwan government agencies
  • Texas state government
  • U.S. Congress and Pentagon
  • Potential EU-wide ban under consideration

Export Control Issues

  • February 2025: Arrests in Singapore for illegally exporting Nvidia chips to DeepSeek
  • April 2025: Trump administration considered blocking DeepSeek from U.S. technology purchases
  • Ongoing scrutiny of chip acquisition methods[5]

Content Alignment

DeepSeek-R1-0528 and later models have been noted for alignment with Chinese government positions and content restrictions.[33]

NIST Evaluation

A September 2025 NIST evaluation found:[33]

  • Performance shortcomings compared to U.S. models
  • Security vulnerabilities in certain implementations
  • Cost calculations disputed by independent analysts

Future Roadmap

The roadmap items below are compiled from unofficial third-party summaries rather than DeepSeek's own publications.[34][35]

2025–2026 Priorities

  • DeepSeek Coder 2.0: Support for Rust, Swift, Kotlin, Go
  • Multimodal DeepSeek-VL 3.0: Integration of text, vision, and audio
  • Private Model Hosting: Enterprise deployment solutions
  • Edge AI Models: Sub-1B parameter models for edge devices
  • AI Agent Systems: Multi-step task completion, late 2025 release[34]

Long-term Vision (2027–2030)

  • AGI Research: $2 billion investment in consciousness-mapping research
  • Global Expansion: Operations in 50+ countries by 2028
  • AI Ethics Framework: Open-source accountability frameworks
  • Energy Efficiency: 40% reduction in training energy via quantum-inspired algorithms[35]

Legal and Compliance

DeepSeek's developer services are governed by the DeepSeek Open Platform Service Agreement.[36]

References

  1. Reuters - "What is DeepSeek and why is it disrupting the AI sector?" (Jan. 28, 2025). https://www.reuters.com/technology/artificial-intelligence/what-is-deepseek-why-is-it-disrupting-ai-sector-2025-01-27/
  2. arXiv - "DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model" (May 7, 2024). https://arxiv.org/abs/2405.04434
  3. The Washington Post - "What is DeepSeek…?" (Jan. 27–28, 2025). https://www.washingtonpost.com/technology/2025/01/27/what-is-deepseek-ai-china-us-stock-fears/
  4. CNN - "What is DeepSeek, the Chinese AI startup that shook the tech world?" (Jan. 27, 2025). https://edition.cnn.com/2025/01/27/tech/deepseek-ai-explainer/index.html
  5. Wikipedia contributors - "DeepSeek," Wikipedia, The Free Encyclopedia (accessed October 2025).
  6. Fortune - "Meet DeepSeek founder Liang Wenfeng, a hedge fund manager" (April 2025). https://fortune.com/2025/01/27/deepseek-founder-liang-wenfeng-hedge-fund-manager-high-flyer-quant-trading/
  7. TechCrunch - "DeepSeek isn't taking VC money yet - here are 3 reasons why" (March 2025). https://techcrunch.com/2025/03/10/deepseek-isnt-taking-vc-money-yet-here-are-3-reasons-why/
  8. arXiv - "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning" (Jan. 2025). https://arxiv.org/abs/2501.12948
  9. Nature News - "Secrets of DeepSeek AI model revealed in landmark paper" (Sep. 17, 2025). https://www.nature.com/articles/d41586-025-03015-6
  10. Reuters - "China's DeepSeek sets off AI market rout" (Jan. 27, 2025). https://www.reuters.com/technology/chinas-deepseek-sets-off-ai-market-rout-2025-01-27/
  11. The Verge - "DeepSeek's top-ranked AI app is restricting sign-ups due to 'malicious attacks'" (Jan. 27, 2025). https://www.theverge.com/2025/1/27/24353023/deepseek-ai-app-restricting-sign-ups-malicious-attacks
  12. arXiv - "DeepSeek-V3 Technical Report" (December 2024). https://arxiv.org/html/2412.19437v1
  13. Fireworks AI - "DeepSeek v3 and R1 Model Architecture: Why it's powerful and economical" (accessed January 2025). https://fireworks.ai/blog/deepseek-model-architecture
  14. GeeksforGeeks - "DeepSeek-R1: Technical Overview of its Architecture and Innovations" (February 2025). https://www.geeksforgeeks.org/deepseek-r1-technical-overview-of-its-architecture-and-innovations/
  15. Fireworks AI - "DeepSeek R1 Overview: Features, Capabilities, Parameters" (accessed January 2025). https://fireworks.ai/blog/deepseek-r1-deepdive
  16. Reuters - "China's DeepSeek releases 'intermediate' AI model…" (Sep. 29, 2025). https://www.reuters.com/technology/deepseek-releases-model-it-calls-intermediate-step-towards-next-generation-2025-09-29/
  17. arXiv - "DeepSeek-OCR: Contexts Optical Compression" (v1, Oct. 21, 2025). https://arxiv.org/abs/2510.18234
  18. DeepSeek Blog - "DeepSeek-OCR: Context Compression with Optical 2D Mapping" (Oct. 2025). https://deepseek.ai/blog/deepseek-ocr-context-compression
  19. GitHub - deepseek-ai/DeepSeek-OCR (initial release Oct. 20, 2025; vLLM support noted Oct. 23, 2025). https://github.com/deepseek-ai/DeepSeek-OCR
  20. Hugging Face - "deepseek-ai/DeepSeek-OCR" model card (accessed Oct. 23, 2025). https://huggingface.co/deepseek-ai/DeepSeek-OCR
  21. Simon Willison - "Getting DeepSeek-OCR working on an NVIDIA Spark via brute force using Claude Code" (Oct. 20, 2025). https://simonwillison.net/2025/Oct/20/deepseek-ocr-claude-code/
  22. Analytics Vidhya - "DeepSeek V3 vs GPT-4o: Which is Better?" (May 2025). https://www.analyticsvidhya.com/blog/2024/12/gpt-4o-vs-deepseek-v3/
  23. GitHub - deepseek-ai/DeepSeek-Coder. https://github.com/deepseek-ai/DeepSeek-Coder
  24. GitHub - deepseek-ai/DeepSeek-Coder-V2. https://github.com/deepseek-ai/DeepSeek-Coder-V2
  25. DeepSeek - "DeepSeek-R1 Release" (Jan. 20, 2025). https://api-docs.deepseek.com/news/news250120
  26. Dev.to - "DeepSeek V3.1 Complete Evaluation Analysis" (August 20, 2025). https://dev.to/czmilo/deepseek-v31-complete-evaluation-analysis-58jc
  27. Tom's Hardware - "DeepSeek's new AI model supports China-native chips and CANN" (Oct. 1, 2025). https://www.tomshardware.com/tech-industry/deepseek-new-model-supports-huawei-cann
  28. SiliconFlow - "The Best DeepSeek-AI Models in 2025" (accessed October 2025). https://www.siliconflow.com/articles/zh-Hans/the-best-deepseek-ai-models-in-2025
  29. Hugging Face - "deepseek-ai/DeepSeek-R1" model card (accessed January 2025). https://huggingface.co/deepseek-ai/DeepSeek-R1
  30. Thunderbit - "50 Latest DeepSeek Statistics (2025)" (accessed October 2025). https://thunderbit.com/zh-Hans/blog/deepseek-ai-statistics
  31. Wikipedia contributors - "Liang Wenfeng," Wikipedia, The Free Encyclopedia (accessed October 2025).
  32. TechTarget - "DeepSeek explained: Everything you need to know" (August 2025). https://www.techtarget.com/whatis/feature/DeepSeek-explained-Everything-you-need-to-know
  33. NIST - "CAISI Evaluation of DeepSeek AI Models Finds Shortcomings and Risks" (September 30, 2025). https://www.nist.gov/news-events/news/2025/09/caisi-evaluation-deepseek-ai-models-finds-shortcomings-and-risks
  34. deepseek.com.pk (unofficial) - "DeepSeek AI's Roadmap: Upcoming Features to Watch in 2025" (accessed October 2025). https://deepseek.com.pk/deepseek-ais-roadmap-upcoming-features-to-watch-in-2025/
  35. deepseekai.org.in (unofficial) - "Roadmap DeepSeek 2030" (accessed October 2025). https://deepseekai.org.in/deepseek-ai-roadmap-2025-2030/
  36. DeepSeek - "DeepSeek Open Platform Service Agreement" (accessed October 2025). https://cdn.deepseek.com/policies/zh-CN/deepseek-open-platform-terms-of-service.html
