DeepSeek
| DeepSeek | |
|---|---|
| Native name | 杭州深度求索人工智能基础技术研究有限公司 |
| Type | Private |
| Industry | Artificial intelligence |
| Founded | July 17, 2023 |
| Founder | Liang Wenfeng |
| Headquarters | Hangzhou, Zhejiang, China |
| Key people | Liang Wenfeng (CEO) |
| Owner | High-Flyer Capital Management (Hangzhou Huanfang Technology) |
| Products | DeepSeek-V2, DeepSeek-V3, DeepSeek-Coder, DeepSeek-Coder-V2, DeepSeek-R1, DeepSeek-VL2 |
| Employees | 160 (2025) |
| Website | https://www.deepseek.com/ |
DeepSeek (Chinese: 杭州深度求索人工智能基础技术研究有限公司; commonly DeepSeek AI or simply DeepSeek) is a Chinese artificial intelligence company known for developing large language models (LLMs) and releasing several prominent open-source and research models. Founded in 2023 by hedge fund entrepreneur Liang Wenfeng, the company has gained international recognition for achieving competitive performance with leading Western AI models at dramatically lower training costs.[1][2]
DeepSeek rose to global prominence in January 2025 when its mobile app briefly topped the Apple App Store's free charts in the United States, following the release of its reasoning-focused DeepSeek-R1 models. The company's claim of training competitive models for under $6 million using Nvidia H800 GPUs, compared to over $100 million for Western equivalents, caused significant market disruption, with Nvidia losing nearly $600 billion in market capitalization.[3][4]
History
Background and Origins (2016–2023)
DeepSeek's origins trace back to High-Flyer Capital Management, a Chinese quantitative hedge fund co-founded in February 2016 by Liang Wenfeng and two classmates from Zhejiang University.[1] High-Flyer began adopting deep learning models for stock trading on October 21, 2016, transitioning from CPU-based linear models to GPU-dependent systems. By 2021, the fund relied exclusively on AI for trading operations.[5]
In 2019, High-Flyer built its first computing cluster, Fire-Flyer (萤火一号), at a cost of 200 million yuan, equipped with 1,100 GPUs. Anticipating U.S. export restrictions on advanced chips to China, Liang acquired 10,000 Nvidia A100 units before restrictions took effect. Construction of Fire-Flyer 2 (萤火二号) began in 2021 with a 1 billion yuan budget, incorporating 5,000 PCIe A100 GPUs across 625 nodes by 2022.[5][6]
Founding and Early Development (2023–2024)
On April 14, 2023, High-Flyer announced the establishment of an artificial general intelligence (AGI) research lab. This lab was formally incorporated as DeepSeek on July 17, 2023, with High-Flyer serving as the principal investor. Venture capital firms were initially reluctant to invest, given the lack of short-term exit opportunities.[1][7]
The company released its first model, DeepSeek Coder, on November 2, 2023, followed by the DeepSeek-LLM series on November 29, 2023. Throughout 2024, DeepSeek continued releasing specialized models:
- January 2024: DeepSeek-MoE models (Base and Chat variants)
- April 2024: DeepSeek-Math models (Base, Instruct, and RL)
- May 2024: DeepSeek-V2
- June 2024: DeepSeek-Coder V2 series
- September 2024: DeepSeek V2.5[5]
Global Breakthrough (2025)
In December 2024, DeepSeek released DeepSeek-V3, featuring a Mixture of Experts architecture with 671 billion total parameters. On January 20, 2025, the company announced DeepSeek-R1, a reasoning-focused model trained largely through reinforcement learning that matched the performance of OpenAI's o1 family at significantly lower cost.[8][9]
DeepSeek's mobile app reached #1 among free apps on the U.S. Apple App Store on January 27–28, 2025. This surge coincided with a roughly 17% drop in Nvidia's share price and over $1 trillion erased from U.S. tech market capitalization. Prominent tech investor Marc Andreessen described this as "AI's Sputnik moment."[3][10][4]
On January 27–28, 2025, DeepSeek reported large-scale malicious attacks on its services and temporarily restricted new sign-ups.[11]
Technology
Architecture
Mixture of Experts (MoE)
DeepSeek's models employ a Mixture of Experts architecture, which allows massive parameter counts while maintaining computational efficiency. The MoE framework in DeepSeek-V3 consists of the following (an illustrative routing sketch appears after the list):[12][13]
- 671 billion total parameters
- 37 billion parameters activated per token
- 256 routed experts per layer (increased from 160 in V2)
- 1 shared expert per layer that is always activated
- 3 initial layers that use standard dense feed-forward networks instead of MoE
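Because only the shared expert and a handful of routed experts run for each token, total parameter count and per-token compute are decoupled. The sketch below is a minimal, self-contained illustration of top-k routing with a shared expert; the dimensions, gating function, and expert definitions are toy assumptions, not DeepSeek's implementation.

```python
import numpy as np

def moe_layer(x, routed_experts, shared_expert, gate_w, top_k=8):
    """Toy Mixture-of-Experts layer: each token is processed by one always-on
    shared expert plus a weighted combination of its top_k routed experts."""
    scores = x @ gate_w                                # (tokens, n_experts) router logits
    probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)         # softmax over experts
    out = shared_expert(x)                             # shared expert sees every token
    for t in range(x.shape[0]):
        top = np.argsort(probs[t])[-top_k:]            # indices of the top_k routed experts
        weights = probs[t, top] / probs[t, top].sum()  # renormalized gate weights
        for w, e in zip(weights, top):
            out[t] += w * routed_experts[e](x[t:t+1])[0]
    return out

# Tiny demo: 4 tokens, hidden size 16, 32 routed experts, top-8 routing.
rng = np.random.default_rng(0)
d, n_experts = 16, 32

def make_expert():
    W = rng.standard_normal((d, d)) * 0.1              # each expert is a tiny linear stand-in for an FFN
    return lambda h: h @ W

routed = [make_expert() for _ in range(n_experts)]
shared = make_expert()
tokens = rng.standard_normal((4, d))
gate_w = rng.standard_normal((d, n_experts)) * 0.1
print(moe_layer(tokens, routed, shared, gate_w).shape)  # (4, 16)
```

In this toy each token touches only the shared expert plus 8 of 32 routed experts; scaled up, the same principle is why DeepSeek-V3 activates roughly 37B of its 671B parameters per token (it reportedly routes each token to 8 of its 256 experts).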
Multi-head Latent Attention (MLA)
DeepSeek-V2 and subsequent models incorporate Multi-head Latent Attention (MLA), a modified attention mechanism that compresses the key-value (KV) cache into a low-dimensional latent vector per token. MLA achieves the following (a back-of-the-envelope comparison appears after the list):[2][14]
- KV cache reduced to roughly 5-13% of the size required by standard multi-head attention
- Significant memory overhead reduction during inference
- Support for 128K-164K token context windows
- Lower computational cost for long-context processing
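The savings come from what gets cached: standard multi-head attention stores full per-head keys and values for every past token, while MLA stores one small latent (plus a decoupled positional key) and reconstructs keys and values from it via learned up-projections at attention time. The comparison below is only illustrative arithmetic under assumed dimensions, not DeepSeek's configuration.

```python
# Per-token, per-layer KV-cache size: standard multi-head attention vs. an
# MLA-style compressed cache (illustrative, assumed dimensions).
n_heads, head_dim = 32, 128        # a typical dense-LLM attention shape (assumed)
latent_dim, rope_dim = 512, 64     # compressed KV latent + decoupled positional key (assumed)

mha_elems = 2 * n_heads * head_dim  # MHA caches full keys and values for every head
mla_elems = latent_dim + rope_dim   # MLA caches one small latent per token; keys and
                                    # values are rebuilt from it at attention time

print(f"standard MHA: {mha_elems} cached values per token per layer")
print(f"MLA-style:    {mla_elems} cached values per token per layer "
      f"({100 * mla_elems / mha_elems:.1f}% of MHA)")
```

With these toy numbers the compressed cache is about 7% of the full MHA cache, consistent with the 5-13% range quoted above; the exact ratio depends on head counts and on whether the baseline already uses grouped-query attention.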
Training Methodology
DeepSeek-R1 employs a distinctive multi-stage training pipeline (a toy rule-based reward is sketched below):[8][15]
1. Cold start phase: fine-tuning the base model on curated chain-of-thought reasoning examples
2. Reasoning-oriented reinforcement learning: large-scale RL on tasks whose rewards can be checked by rules, such as exact-match math answers or passing code tests
3. Supervised fine-tuning: combining reasoning and non-reasoning data
4. RL for all scenarios: final refinement for helpfulness and harmlessness
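Stage 2 depends on rewards that can be verified automatically rather than scored by a learned reward model. The sketch below is a hypothetical example of such a rule-based reward: the <think>/<answer> delimiters follow the format described in the R1 paper, but the specific checks and weights are invented for illustration.

```python
import re

def rule_based_reward(response: str, reference_answer: str) -> float:
    """Toy rule-based reward: a small format bonus for tagged reasoning plus an
    accuracy reward when the final answer matches the reference exactly.
    (Illustrative only; the weights and exact checks are assumptions.)"""
    reward = 0.0
    if re.search(r"<think>.*?</think>", response, re.DOTALL):
        reward += 0.1                      # format reward: reasoning is delimited
    match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    if match and match.group(1).strip() == reference_answer.strip():
        reward += 1.0                      # accuracy reward: verifiable final answer
    return reward

sample = "<think>9 * 7 = 63</think><answer>63</answer>"
print(rule_based_reward(sample, "63"))     # 1.1
```

Rewards like this are cheap to compute at scale and, according to the R1 report, less prone to reward hacking than learned reward models, which is what makes large-scale RL on math and coding tasks practical.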
DeepSeek Sparse Attention (DSA)
Introduced in DeepSeek-V3.2-Exp (September 2025), DSA is a fine-grained sparse attention mechanism optimized for long-context training and inference efficiency with minimal performance impact.[16]
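Public technical detail on DSA is limited, so the following is only a generic sketch of the idea behind fine-grained sparse attention: each query attends to a small, individually selected subset of keys rather than the whole context, cutting attention cost roughly from O(L^2) toward O(L*k). The selection rule and all names here are assumptions, not DeepSeek's algorithm.

```python
import numpy as np

def topk_sparse_attention(q, k, v, top_k=64):
    """Generic top-k sparse attention: each query keeps only its top_k keys
    and applies softmax over that subset."""
    scores = q @ k.T / np.sqrt(q.shape[-1])              # (Lq, Lk) raw scores
    keep = np.argsort(scores, axis=-1)[:, -top_k:]       # per-query selected key indices
    out = np.zeros_like(q)
    for i in range(q.shape[0]):
        s = scores[i, keep[i]]
        w = np.exp(s - s.max()); w /= w.sum()            # softmax over kept keys only
        out[i] = w @ v[keep[i]]
    return out

rng = np.random.default_rng(0)
L, d = 1024, 64
q, k, v = (rng.standard_normal((L, d)) for _ in range(3))
print(topk_sparse_attention(q, k, v).shape)              # (1024, 64)
```

A practical implementation also needs a cheap selection mechanism so that the full score matrix never has to be materialized; this toy version computes it only for clarity.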
DeepSeek-OCR (2025)
In October 2025, DeepSeek released DeepSeek-OCR, an open-source end-to-end document OCR and understanding system that explores “contexts optical compression”—representing long text as images and decoding it back with a vision–language stack to save tokens for long-context LLM applications.[17][18]
- Architecture: A ~380M-parameter DeepEncoder (SAM-base window attention → 16× token compression via 2-layer conv → CLIP-large global attention) feeds a 3B MoE decoder (DeepSeek-3B-MoE-A570M; ~570M active params at inference). Multiple resolution modes control vision-token budgets: Tiny (64 tokens, 512²), Small (100, 640²), Base (256, 1024²), Large (400, 1280²), plus tiled Gundam (n×100 + 256 tokens) and Gundam-M modes for ultra-high-res pages.[17]
- Reported compression/accuracy: On a Fox benchmark subset (English pages with 600–1,300 text tokens), the paper reports ≈97% decoding precision when text tokens are <10× vision tokens, and ~60% accuracy around 20× compression. On OmniDocBench (edit distance; lower is better), Small (100 tokens) outperforms GOT-OCR 2.0 (256 tokens), and Gundam (<~800 tokens) surpasses MinerU-2.0 (~6,790 tokens) in the reported setup (the compression arithmetic is illustrated after this list).[17]
- Throughput/uses: DeepSeek positions the system as a data engine for LLM/VLM pretraining, claiming >200k pages/day on a single A100-40G and scalability to tens of millions of pages per day on clusters, plus "deep parsing" of charts, chemical structures (SMILES), and planar geometry into structured outputs (for example HTML tables or dictionaries).[18]
- Availability/ecosystem: Source code and weights are hosted on GitHub and Hugging Face, with examples for Transformers/vLLM inference. Community walkthroughs (for example Simon Willison) documented running the 6.6-GB model on diverse hardware and shared setup notes.[19][20][21]
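The trade-off reported above reduces to a budget calculation: how many vision tokens a page is rendered into versus how many text tokens the same content would occupy. The sketch below reproduces that arithmetic from the mode budgets and accuracy bands quoted above; the function names and exact thresholds are illustrative assumptions.

```python
# Vision-token budgets per DeepSeek-OCR resolution mode, as listed above.
MODE_TOKENS = {"Tiny": 64, "Small": 100, "Base": 256, "Large": 400}

def compression_ratio(text_tokens: int, mode: str) -> float:
    """Ratio of text tokens to vision tokens for a single page."""
    return text_tokens / MODE_TOKENS[mode]

def expected_regime(ratio: float) -> str:
    """Rough accuracy bands reported in the paper (illustrative thresholds)."""
    if ratio < 10:
        return "~97% decoding precision"
    if ratio <= 20:
        return "degrading (~60% around 20x)"
    return "beyond the reported operating range"

page_text_tokens = 900                     # e.g., a dense English page
for mode in MODE_TOKENS:
    r = compression_ratio(page_text_tokens, mode)
    print(f"{mode:>5}: {r:4.1f}x compression -> {expected_regime(r)}")
```

In this example a 900-token page stays in the high-precision regime in Small mode and above, while Tiny mode pushes past 10x compression, where accuracy is expected to degrade.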
Infrastructure
DeepSeek operates two primary computing clusters:[5]
- Fire-Flyer 1 (萤火一号): Built 2019, retired after 1.5 years
- Fire-Flyer 2 (萤火二号): Operational since 2022, featuring:
  - Nvidia GPUs with 200 Gbps interconnects
  - Fat-tree topology for high bisection bandwidth (see the back-of-the-envelope calculation below)
  - 3FS distributed file system with Direct I/O and RDMA
  - 2,048 Nvidia H800 GPUs used to train DeepSeek-V3, the base model for R1
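The fat-tree claim can be made concrete with a quick calculation: in a fully non-blocking fat tree, roughly half the nodes can transmit at full line rate to the other half at once. Using the node count and link speed cited in this article, and assuming a non-blocking build (an assumption, since the cluster's exact oversubscription is not public):

```python
# Back-of-the-envelope bisection bandwidth for a non-blocking fat tree
# (illustrative arithmetic only; not an official Fire-Flyer 2 specification).
nodes = 625          # PCIe A100 nodes reported for Fire-Flyer 2
link_gbps = 200      # per-node interconnect speed in Gbit/s

bisection_gbps = nodes / 2 * link_gbps   # half the nodes sending at line rate
print(f"~{bisection_gbps / 1000:.1f} Tb/s (~{bisection_gbps / 8000:.1f} TB/s) across the bisection")
```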
Performance Benchmarks
| Benchmark | DeepSeek-V3 | DeepSeek-R1 | GPT-4o | Description |
|---|---|---|---|---|
| MMLU | 88.5 | 91.8 | 87.2 | Massive Multitask Language Understanding |
| HumanEval | 82.6 | 85.4 | 80.5 | Code Generation |
| MATH-500 | 90.2 | 97.3 | 74.6 | Mathematical Problem-Solving |
| Codeforces | 51.6 | 57.2 | 23.6 | Complex Coding Performance |
| GPQA | 59.1 | 72.3 | N/A | Graduate-Level Question Answering |
| AIME | N/A | 79.8% pass@1 | N/A | American Invitational Mathematics Examination |
Models and Products
Major Model Releases
| Model | Type/Focus | Parameters | Context Length | Release Date | Key Features |
|---|---|---|---|---|---|
| DeepSeek-V2 | General LLM (MoE) | 236B total; 21B active | 128K | May 2024 | MLA; DeepSeekMoE routing[2] |
| DeepSeek-V3 | General LLM (MoE) | 671B total; 37B active | 128K | Dec 2024 | Enhanced MoE; $5.6M training cost[12] |
| DeepSeek-Coder | Code LLMs | Various sizes | 16K | Nov 2023 | 2T tokens; 87% code / 13% NL; infilling[23] |
| DeepSeek-Coder-V2 | Code LLMs (MoE) | 236B total; 21B active | 128K | June 2024 | +6T tokens; GPT-4-Turbo comparable[24] |
| DeepSeek-R1 | Reasoning post-training | 671B total; 37B active | 164K | Jan 2025 | RL-driven reasoning training; o1-level performance[8][25] |
| DeepSeek-V3.1 | Hybrid MoE | N/A | N/A | Aug 2025 | 71.6% Aider pass rate[26] |
| DeepSeek-V3.2-Exp | Experimental | N/A | N/A | Sep 2025 | Sparse attention (DSA); Huawei Ascend support[16][27] |
| DeepSeek-VL2 | Vision-Language (MoE) | 27B total; 4.5B active | 4K | Dec 2024 | Multimodal understanding[28] |
Distilled Models
DeepSeek has created smaller, efficient models through knowledge distillation (a generic illustration of the technique follows this list):[29]
- DeepSeek-R1-Distill-Qwen-32B
- DeepSeek-R1-Distill-Llama-70B
- Various models from 1.5B to 70B parameters
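According to the R1 report, these distilled models were produced by supervised fine-tuning smaller Qwen and Llama models on reasoning samples generated by R1. The block below is not that recipe; it shows the textbook logit-matching distillation loss purely to illustrate what knowledge distillation means in general, with toy dimensions.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, hard_labels, T=2.0, alpha=0.5):
    """Classic knowledge-distillation objective: KL divergence to the teacher's
    temperature-softened distribution, blended with cross-entropy on hard labels."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    kl = np.sum(p_teacher * (np.log(p_teacher + 1e-12) - np.log(p_student + 1e-12)), axis=-1).mean()
    ce = -np.log(softmax(student_logits)[np.arange(len(hard_labels)), hard_labels] + 1e-12).mean()
    return alpha * (T ** 2) * kl + (1 - alpha) * ce

rng = np.random.default_rng(0)
teacher_logits = rng.standard_normal((8, 100))   # 8 token positions, 100-token toy vocabulary
student_logits = rng.standard_normal((8, 100))
labels = rng.integers(0, 100, size=8)            # ground-truth next tokens
print(round(distillation_loss(student_logits, teacher_logits, labels), 3))
```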
API and Pricing
DeepSeek provides API access through the DeepSeek Open Platform at competitive prices (a usage sketch follows this list):[30]
- Input costs: $0.07-$0.27 per million tokens (vs $2.50 for GPT-4o)
- Output costs: $1.10 per million tokens (vs $10.00 for GPT-4o)
- 50%+ price reduction following V3.2-Exp release (September 2025)
- Pre-paid billing model
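A minimal usage sketch, assuming the OpenAI-compatible endpoint and model names documented on the DeepSeek Open Platform (base URL https://api.deepseek.com with the deepseek-chat and deepseek-reasoner models); model identifiers and prices change over time and should be checked against current documentation.

```python
# pip install openai  (the platform exposes an OpenAI-compatible API per its docs)
from openai import OpenAI

client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY",
                base_url="https://api.deepseek.com")

resp = client.chat.completions.create(
    model="deepseek-chat",   # "deepseek-reasoner" selects the R1-style reasoning model
    messages=[{"role": "user", "content": "Summarize Mixture of Experts in one sentence."}],
)
print(resp.choices[0].message.content)

# Rough cost estimate at the list prices quoted above (illustrative only).
usage = resp.usage
cost = usage.prompt_tokens * 0.27 / 1e6 + usage.completion_tokens * 1.10 / 1e6
print(f"~${cost:.6f} for {usage.total_tokens} tokens")
```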
Organization and Leadership
Liang Wenfeng
Liang Wenfeng (梁文锋), born 1985 in Guangdong province, is the founder and CEO of DeepSeek. He graduated from Zhejiang University with:[31]
- Bachelor of Engineering in electronic information engineering (2007)
- Master of Engineering in information and communication engineering (2010)
Liang co-founded High-Flyer and began acquiring Nvidia GPUs in 2021, purchasing 10,000 A100 chips before U.S. export restrictions took effect.[6]
Corporate Structure
- 84% owned by Liang Wenfeng through shell corporations
- 16% owned by High-Flyer affiliated individuals
- No external venture capital funding as of 2025
- 160 employees (2025)
- Estimated valuation undisclosed; Liang's personal wealth: $4.5 billion (2025)[30]
Organizational Philosophy
DeepSeek operates with an unconventional structure:[6][5]
- Bottom-up organization with natural division of labor
- No preassigned roles or rigid hierarchy
- Unrestricted computing resource access for researchers
- Emphasis on fresh graduates and non-CS backgrounds
- Recruitment of people with backgrounds in fields such as poetry and advanced mathematics
Market Impact and Adoption
User Growth
DeepSeek experienced explosive growth following its January 2025 releases:[30]
- 30 million daily active users within weeks of launch
- 33.7 million monthly active users (4th largest AI application globally)
- Briefly surpassed ChatGPT in daily users (21.6M vs 14.6M)
- Geographic distribution (January 2025):
- China: 30.7%
- India: 13.6%
- Indonesia: 6.9%
- United States: 4.3%
- France: 3.2%
Market Disruption
The "DeepSeek shock" of January 2025 caused:[4][10]
- Over $1 trillion erased from U.S. tech market capitalization
- A roughly 17% single-day decline in Nvidia's stock, about $600 billion in market value
- "Sputnik moment" discussions across the U.S. AI industry
- An AI price war in China, with competitors cutting prices by up to 97%
Cost Comparison
| Model | Training Cost | Hardware Used |
|---|---|---|
| DeepSeek-V3 (base model for R1) | $5.6 million (final training run) | 2,048 Nvidia H800 GPUs |
| GPT-4 | $100+ million (est.) | Undisclosed (reportedly Nvidia A100s) |
| Claude 3 | $100+ million (est.) | Unknown |
| Gemini Ultra | $191 million (est.) | Google TPU v4/v5e |
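The $5.6 million figure is the cost of the final DeepSeek-V3 training run as computed in the V3 technical report: reported GPU-hours multiplied by an assumed GPU rental price, excluding prior research, ablation runs, data, and staff. The arithmetic can be reproduced in one line:

```python
# Reproducing the widely cited training-cost figure from the DeepSeek-V3
# technical report (final training run only).
gpu_hours = 2_788_000      # H800 GPU-hours reported for the V3 run
usd_per_gpu_hour = 2.0     # rental price assumed in the report
print(f"${gpu_hours * usd_per_gpu_hour / 1e6:.3f} million")   # ~$5.576 million
```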
Controversies and Challenges
Security and Privacy Concerns
Multiple governments and organizations have restricted DeepSeek usage:[32]
- Australian government agencies
- India's central government
- South Korea's industry ministry
- Taiwanese government agencies
- Texas state government
- U.S. Congress and Pentagon
- Potential EU-wide ban under consideration
Export Control Issues
- February 2025: Arrests in Singapore in a fraud case involving servers containing Nvidia chips that were allegedly destined for DeepSeek
- April 2025: The Trump administration reportedly considered barring DeepSeek from purchasing U.S. technology
- Ongoing scrutiny of chip acquisition methods[5]
Content Alignment
DeepSeek-R1-0528 and later models have been noted for closer alignment with Chinese government policies and content restrictions.[33]
NIST Evaluation
A September 2025 evaluation by NIST's Center for AI Standards and Innovation (CAISI) found:[33]
- Performance shortcomings compared to U.S. models
- Security vulnerabilities in certain implementations
- Cost calculations disputed by independent analysts
Future Roadmap
2025–2026 Priorities
The roadmap items below are reported by unofficial third-party sites rather than by DeepSeek itself and should be treated as unconfirmed:[34]
- DeepSeek Coder 2.0: Support for Rust, Swift, Kotlin, Go
- Multimodal DeepSeek-VL 3.0: Integration of text, vision, and audio
- Private Model Hosting: Enterprise deployment solutions
- Edge AI Models: Sub-1B parameter models for edge devices
- AI Agent Systems: Multi-step task completion, late 2025 release[34]
Long-term Vision (2027–2030)
These longer-term goals likewise come from unofficial third-party sources, not official DeepSeek announcements:[35]
- AGI Research: $2 billion investment in consciousness-mapping research
- Global Expansion: Operations in 50+ countries by 2028
- AI Ethics Framework: Open-source accountability frameworks
- Energy Efficiency: 40% reduction in training energy via quantum-inspired algorithms[35]
Legal and Compliance
DeepSeek operates under comprehensive service agreements addressing:[36]
- Content management per China's Interim Measures for the Management of Generative Artificial Intelligence Services
- Data security compliance with China's Data Security Law and Personal Information Protection Law
- Technical standards for AI-generated content identification
- Developer responsibilities for content filtering and monitoring
See Also
- Large language model
- Mixture of Experts
- Reinforcement learning
- Artificial general intelligence
- Open-source artificial intelligence
- High-Flyer Capital Management
- Liang Wenfeng
- OpenAI
- GPT-4
- Claude
- Nvidia
References
- [1] Reuters - "What is DeepSeek and why is it disrupting the AI sector?" (Jan. 28, 2025). https://www.reuters.com/technology/artificial-intelligence/what-is-deepseek-why-is-it-disrupting-ai-sector-2025-01-27/
- [2] arXiv - "DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model" (May 7, 2024). https://arxiv.org/abs/2405.04434
- [3] The Washington Post - "What is DeepSeek…?" (Jan. 27–28, 2025). https://www.washingtonpost.com/technology/2025/01/27/what-is-deepseek-ai-china-us-stock-fears/
- [4] CNN - "What is DeepSeek, the Chinese AI startup that shook the tech world?" (Jan. 27, 2025). https://edition.cnn.com/2025/01/27/tech/deepseek-ai-explainer/index.html
- [5] Wikipedia contributors - "DeepSeek," Wikipedia, The Free Encyclopedia (accessed October 2025)
- [6] Fortune - "Meet DeepSeek founder Liang Wenfeng, a hedge fund manager" (April 2025). https://fortune.com/2025/01/27/deepseek-founder-liang-wenfeng-hedge-fund-manager-high-flyer-quant-trading/
- [7] TechCrunch - "DeepSeek isn't taking VC money yet - here are 3 reasons why" (March 2025). https://techcrunch.com/2025/03/10/deepseek-isnt-taking-vc-money-yet-here-are-3-reasons-why/
- [8] arXiv - "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning" (Jan. 2025). https://arxiv.org/abs/2501.12948
- [9] Nature News - "Secrets of DeepSeek AI model revealed in landmark paper" (Sep. 17, 2025). https://www.nature.com/articles/d41586-025-03015-6
- [10] Reuters - "China's DeepSeek sets off AI market rout" (Jan. 27, 2025). https://www.reuters.com/technology/chinas-deepseek-sets-off-ai-market-rout-2025-01-27/
- [11] The Verge - "DeepSeek's top-ranked AI app is restricting sign-ups due to 'malicious attacks'" (Jan. 27, 2025). https://www.theverge.com/2025/1/27/24353023/deepseek-ai-app-restricting-sign-ups-malicious-attacks
- [12] arXiv - "DeepSeek-V3 Technical Report" (December 2024). https://arxiv.org/html/2412.19437v1
- [13] Fireworks AI - "DeepSeek v3 and R1 Model Architecture: Why it's powerful and economical" (accessed January 2025). https://fireworks.ai/blog/deepseek-model-architecture
- [14] GeeksforGeeks - "DeepSeek-R1: Technical Overview of its Architecture and Innovations" (February 2025). https://www.geeksforgeeks.org/deepseek-r1-technical-overview-of-its-architecture-and-innovations/
- [15] Fireworks AI - "DeepSeek R1 Overview: Features, Capabilities, Parameters" (accessed January 2025). https://fireworks.ai/blog/deepseek-r1-deepdive
- [16] Reuters - "China's DeepSeek releases 'intermediate' AI model…" (Sep. 29, 2025). https://www.reuters.com/technology/deepseek-releases-model-it-calls-intermediate-step-towards-next-generation-2025-09-29/
- [17] arXiv - "DeepSeek-OCR: Contexts Optical Compression" (v1, Oct. 21, 2025). https://arxiv.org/abs/2510.18234
- [18] DeepSeek Blog - "DeepSeek-OCR: Context Compression with Optical 2D Mapping" (Oct. 2025). https://deepseek.ai/blog/deepseek-ocr-context-compression
- [19] GitHub - deepseek-ai/DeepSeek-OCR (initial release Oct. 20, 2025; vLLM support noted Oct. 23, 2025). https://github.com/deepseek-ai/DeepSeek-OCR
- [20] Hugging Face - "deepseek-ai/DeepSeek-OCR" model card (accessed Oct. 23, 2025). https://huggingface.co/deepseek-ai/DeepSeek-OCR
- [21] Simon Willison - "Getting DeepSeek-OCR working on an NVIDIA Spark via brute force using Claude Code" (Oct. 20, 2025). https://simonwillison.net/2025/Oct/20/deepseek-ocr-claude-code/
- [22] Analytics Vidhya - "DeepSeek V3 vs GPT-4o: Which is Better?" (May 2025). https://www.analyticsvidhya.com/blog/2024/12/gpt-4o-vs-deepseek-v3/
- [23] GitHub - DeepSeek-Coder. https://github.com/deepseek-ai/DeepSeek-Coder
- [24] GitHub - DeepSeek-Coder-V2. https://github.com/deepseek-ai/DeepSeek-Coder-V2
- [25] DeepSeek - "DeepSeek-R1 Release" (Jan. 20, 2025). https://api-docs.deepseek.com/news/news250120
- [26] Dev.to - "DeepSeek V3.1 Complete Evaluation Analysis" (August 20, 2025). https://dev.to/czmilo/deepseek-v31-complete-evaluation-analysis-58jc
- [27] Tom's Hardware - "DeepSeek's new AI model supports China-native chips and CANN" (Oct. 1, 2025). https://www.tomshardware.com/tech-industry/deepseek-new-model-supports-huawei-cann
- [28] SiliconFlow - "The Best DeepSeek-AI Models in 2025" (accessed October 2025). https://www.siliconflow.com/articles/zh-Hans/the-best-deepseek-ai-models-in-2025
- [29] Hugging Face - "deepseek-ai/DeepSeek-R1" (accessed January 2025). https://huggingface.co/deepseek-ai/DeepSeek-R1
- [30] Thunderbit - "50 Latest DeepSeek Statistics (2025)" (accessed October 2025). https://thunderbit.com/zh-Hans/blog/deepseek-ai-statistics
- [31] Wikipedia contributors - "Liang Wenfeng," Wikipedia, The Free Encyclopedia (accessed October 2025)
- [32] TechTarget - "DeepSeek explained: Everything you need to know" (August 2025). https://www.techtarget.com/whatis/feature/DeepSeek-explained-Everything-you-need-to-know
- [33] NIST - "CAISI Evaluation of DeepSeek AI Models Finds Shortcomings and Risks" (September 30, 2025). https://www.nist.gov/news-events/news/2025/09/caisi-evaluation-deepseek-ai-models-finds-shortcomings-and-risks
- [34] deepseek.com.pk (unofficial) - "DeepSeek AI's Roadmap: Upcoming Features to Watch in 2025" (accessed October 2025). https://deepseek.com.pk/deepseek-ais-roadmap-upcoming-features-to-watch-in-2025/
- [35] deepseekai.org.in (unofficial) - "Roadmap DeepSeek 2030" (accessed October 2025). https://deepseekai.org.in/deepseek-ai-roadmap-2025-2030/
- [36] DeepSeek - "DeepSeek Open Platform Service Agreement" (accessed October 2025). https://cdn.deepseek.com/policies/zh-CN/deepseek-open-platform-terms-of-service.html