Ollama
Ollama is an open-source tool designed to simplify the deployment and management of large language models (LLMs) locally on personal computers and servers.[1] It provides a streamlined interface for downloading, running, and managing various open-source LLMs without requiring cloud computing services or extensive technical expertise, positioning itself as "Docker for LLMs."[2]
History
Ollama was founded in 2021 by Jeffrey Morgan and Michael Chiang in Palo Alto, California.[3][4] The company participated in Y Combinator's Winter 2021 batch and raised $125,000 in pre-seed funding from investors including Y Combinator, Essence Venture Capital, Rogue Capital, and Sunflower Capital.[5]
Prior to founding Ollama, Morgan and Chiang, along with Sean Li, created Kitematic, a tool designed to simplify Docker container management on macOS, which was eventually acquired by Docker, Inc.[6][7] Jeffrey Morgan and Sean Li graduated from the University of Waterloo (BASc 2013, Software Engineering), while Michael Chiang was an electrical engineering student there at the time of Kitematic's acquisition.[7] This experience in making complex command-line tools accessible through simpler interfaces directly influenced Ollama's design philosophy.[6]
The platform quickly gained traction in the open-source AI community for its ease of use and Docker-like simplicity in managing LLMs.[5] Initial releases focused on core functionality for running models like LLaMA 2, with subsequent updates introducing features such as multimodal support and tool calling.[8]
Key Milestones
| Date | Milestone | Notes |
|---|---|---|
| 2021 | Company Founded | Participated in Y Combinator W21 batch |
| March 23, 2021 | Pre-seed Funding | Raised $125,000 from Y Combinator and other investors[9] |
| 2023 | Public Launch | Basic model management and inference capabilities |
| February 8, 2024 | OpenAI Compatibility | Initial compatibility with the OpenAI Chat Completions API at /v1/chat/completions[10] |
| February 15, 2024 | Windows Preview | Native Windows build with built-in GPU acceleration and always-on API[11] |
| March 14, 2024 | AMD GPU Preview | Preview acceleration on supported AMD Radeon/Instinct cards on Windows and Linux[12] |
| July 30, 2025 | Desktop App | Official GUI app for macOS/Windows with file drag-and-drop and context-length controls[13] |
| September 19, 2025 | Cloud Models (Preview) | Option to run larger models on datacenter hardware while maintaining local workflows[14] |
| September 26, 2025 | Version 0.12.3 | Stable release adding a web search API and performance optimizations[15] |
Architecture and Technical Implementation
Core Technology
Ollama is built primarily in Go and leverages llama.cpp as its underlying inference engine through CGo bindings.[16][17] The llama.cpp project, created by Georgi Gerganov in March 2023, provides an efficient C++ implementation of LLaMA and other language models, enabling them to run on consumer-grade hardware.[16]
Model Format
Ollama primarily uses the GGUF (GPT-Generated Unified Format) file format for storing and loading models.[18] GGUF replaced the earlier GGML format and provides better compatibility, metadata handling, and performance for quantized models.[19] Quantization, which stores weights at reduced precision (for example 4-bit integers instead of 16-bit floats), is what allows massive models (for example, 70 billion parameters) to run on machines with limited VRAM.
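For intuition, a rough back-of-the-envelope estimate of the memory needed just to hold a model's weights is:

$$\text{memory} \approx N_{\text{params}} \times \frac{\text{bits per weight}}{8} \ \text{bytes}$$

so a 70-billion-parameter model requires roughly 140 GB at 16-bit precision, but only around 35 GB with 4-bit quantization. Actual usage is higher once the context cache and runtime overhead are included.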
Ollama can also import models from specific Safetensors directories for supported architectures (for example Llama, Mistral, Gemma, Phi).[20]
Features
Core Capabilities
| Area | Details | Notes |
|---|---|---|
| Server & Port | Local HTTP server at 127.0.0.1:11434 | Configurable via the OLLAMA_HOST environment variable[21] |
| Core Endpoints | /api/generate, /api/chat, /api/embeddings, model management | Streaming JSON responses supported[22] |
| OpenAI Compatibility | /v1/chat/completions | Drop-in replacement for many OpenAI-based clients[10] |
| Local-First Design | All processing occurs locally | Ensures complete data privacy[2] |
| Multimodal Support | Text, images, and other data types | Self-contained projection layers[23] |
| Tool Calling | External function calls | Enhances reasoning and automation[8] |
| Cloud Integration | Hybrid mode for larger models | Maintains local workflows (v0.12.0+)[14] |
| Performance | Flash attention, GPU/CPU overlap | Batch processing for efficiency[15] |
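As an illustration of the drop-in OpenAI compatibility, the following is a minimal sketch using the official openai Python package. The base_url points at the local Ollama server, the api_key value is a placeholder required by the client library but ignored by Ollama, and the llama3.2 model is assumed to have been pulled already:

```python
from openai import OpenAI

# Point the standard OpenAI client at the local Ollama server.
client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",  # required by the client library, but unused by Ollama
)

response = client.chat.completions.create(
    model="llama3.2",  # any model previously pulled with `ollama pull`
    messages=[{"role": "user", "content": "Explain GGUF in one sentence."}],
)
print(response.choices[0].message.content)
```

Because only the base URL changes, existing OpenAI-based tooling can often be retargeted at a local model without other code changes.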
Modelfile
A key component of Ollama is the Modelfile, which serves as a blueprint for creating and sharing models.[20] Similar to a Dockerfile, the Modelfile defines model behavior and configuration.
| Instruction | Description | Example |
|---|---|---|
| FROM | (Required) Specifies the base model or a local GGUF path | FROM llama3.2 or FROM ./model.gguf |
| PARAMETER | Sets model parameters | PARAMETER temperature 0.7, PARAMETER num_ctx 4096 |
| SYSTEM | Defines the system message/persona | SYSTEM "You are a helpful assistant" |
| TEMPLATE | Sets the prompt template format | TEMPLATE "[INST] {{ .System }} {{ .Prompt }} [/INST]" |
| ADAPTER | Applies LoRA/QLoRA adapters | ADAPTER /path/to/adapter.bin |
| LICENSE | Specifies the model license | LICENSE "MIT" |
| MESSAGE | Provides conversation history for few-shot prompting | MESSAGE user "What is 1+1?", MESSAGE assistant "2" |
Example Modelfile
```
# Specify the base model
FROM llama3.2

# Set model parameters
PARAMETER temperature 0.8
PARAMETER num_ctx 4096
PARAMETER stop "<|eot_id|>"

# Set the system message
SYSTEM """
You are an expert Python programming assistant.
Always provide clear, concise code examples.
Your responses must be formatted in Markdown.
"""

# Define the chat template
TEMPLATE """
<|start_header_id|>system<|end_header_id|>
{{ .System }}<|eot_id|>
<|start_header_id|>user<|end_header_id|>
{{ .Prompt }}<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
"""
```
This custom model can be created with ollama create my-assistant -f ./Modelfile and then run with ollama run my-assistant.
Command-Line Interface
| Command | Description | Example |
|---|---|---|
| ollama run | Runs a model interactively | ollama run llama3.2 |
| ollama pull | Downloads a model | ollama pull gemma:2b |
| ollama create | Creates a custom model from a Modelfile | ollama create mymodel -f ./Modelfile |
| ollama list | Lists installed models | ollama list |
| ollama rm | Removes a model | ollama rm llama3.2 |
| ollama cp | Copies a model | ollama cp llama3.2 mymodel |
| ollama push | Uploads a model to a registry | ollama push mymodel |
| ollama serve | Starts the Ollama server | ollama serve |
REST API
Ollama exposes a REST API on port 11434 by default, providing programmatic access to model functionality:[22]
| Endpoint | Method | Description | Example |
|---|---|---|---|
| /api/generate | POST | Generate a text completion | curl -X POST http://localhost:11434/api/generate -d '{"model":"llama3.2","prompt":"Hello"}' |
| /api/chat | POST | Chat conversation interface | curl -X POST http://localhost:11434/api/chat -d '{"model":"llama3.2","messages":[{"role":"user","content":"Hi"}]}' |
| /api/embeddings | POST | Generate text embeddings | curl -X POST http://localhost:11434/api/embeddings -d '{"model":"llama3.2","prompt":"Test"}' |
| /api/pull | POST | Download a model | curl -X POST http://localhost:11434/api/pull -d '{"model":"llama3.2"}' |
| /api/show | POST | Show model information | curl -X POST http://localhost:11434/api/show -d '{"model":"llama3.2"}' |
| /api/tags | GET | List installed models | curl http://localhost:11434/api/tags |
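For illustration, the following is a minimal Python sketch against the /api/chat endpoint using the third-party requests library, assuming the default local server and an already-pulled llama3.2 model. Streaming responses arrive as newline-delimited JSON objects, ending with a chunk whose done field is true:

```python
import json

import requests

# Stream a chat completion from the local Ollama server; the body is
# newline-delimited JSON, one chunk per line.
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.2",  # assumes this model has been pulled already
        "messages": [{"role": "user", "content": "Hi"}],
        "stream": True,
    },
    stream=True,
    timeout=120,
)
resp.raise_for_status()

for line in resp.iter_lines():
    if not line:
        continue
    chunk = json.loads(line)
    # Each chunk carries a partial assistant message; print tokens as they arrive.
    print(chunk.get("message", {}).get("content", ""), end="", flush=True)
    if chunk.get("done"):
        print()
```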
Supported Models
Ollama supports a wide range of open-source language models, continuously updated as new models are released:[24]
| Model Name | Parameters | Category | Description | Creator |
|---|---|---|---|---|
| Llama 3.2 | 1B, 3B, 11B, 90B | Source/Fine-Tuned | Meta's open-weight LLM with improved reasoning and tool support; the 11B and 90B variants add vision | Meta AI |
| Gemma 2 | 2B, 9B, 27B | Source | Google's lightweight models for efficient on-device inference | |
| DeepSeek-R1 | 1.5B–70B (distilled), 671B | Reasoning | Open reasoning model that emits explicit "thinking" traces for complex tasks | DeepSeek AI |
| Mistral / Mixtral | 7B, 8x7B, 8x22B | Source | High-efficiency models, often outperforming larger models | Mistral AI |
| Qwen2.5 | Up to 72B | Source | Alibaba's multilingual models supporting 128K tokens | Alibaba |
| Phi 4 | 3B, 14B | Fine-Tuned | Microsoft's small language models for efficient reasoning | Microsoft |
| CodeLlama | 7B, 13B, 34B | Code | Specialized for code generation and programming tasks | Meta AI |
| LLaVA | 7B, 13B | Multimodal | Visual language model for text and image understanding | Various |
| Snowflake Arctic Embed | 568M | Embedding | Multilingual embedding model for retrieval tasks | Snowflake |
| Firefunction | 7B | Tools | Function-calling model for automation and integration | Fireworks AI |
| Vicuna, Alpaca | Various | Fine-Tuned | LLaMA derivatives with specialized capabilities | Various |
Installation
System Requirements
| Platform | Minimum Version | Installation Method |
|---|---|---|
| macOS | 11 Big Sur or later | Download the .dmg from the official website |
| Linux | Ubuntu 18.04 or equivalent | curl -fsSL https://ollama.com/install.sh \| sh |
| Windows | Windows 10 22H2 or later | Download the .exe installer |
| Docker | Any platform | docker pull ollama/ollama |

Approximate hardware requirements scale with model size:
| Model Size | RAM Required | Storage | GPU VRAM (Optional) |
|---|---|---|---|
| 3B parameters | 8GB | 10GB+ | 4GB |
| 7B parameters | 16GB | 20GB+ | 8GB |
| 13B parameters | 32GB | 40GB+ | 16GB |
| 70B parameters | 64GB+ | 100GB+ | 48GB+ |
GPU Support
- NVIDIA GPUs via CUDA (compute capability 5.0+)
- AMD GPUs via ROCm (preview support)
- Apple Silicon via Metal
- Intel Arc GPUs (experimental)
Integration and Ecosystem
Programming Languages
Ollama provides official client libraries:[25]
- Python - pip install ollama
- JavaScript/TypeScript - npm install ollama
- Go - native API client included in the main repository
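A minimal sketch with the official Python client is shown below, assuming a locally running server and an already-pulled llama3.2 model; recent versions of the library return typed response objects that still support dictionary-style access:

```python
import ollama

# One-shot chat call through the official client (server must be running).
response = ollama.chat(
    model="llama3.2",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response["message"]["content"])

# Text embeddings for retrieval-style workflows.
emb = ollama.embeddings(model="llama3.2", prompt="The sky is blue.")
print(len(emb["embedding"]))  # dimensionality of the embedding vector
```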
Third-Party Integrations
Development Frameworks
- LangChain - LLM application framework
- LlamaIndex - Data framework for LLMs
- AutoGen - Multi-agent systems
- Semantic Kernel - Microsoft's AI orchestration
- Spring AI - Java/Spring integration[26]
User Interfaces
- Open WebUI - Web-based chat interface
- Continue.dev - VS Code extension
- AnythingLLM - Multi-model chat application
- Various mobile applications for iOS and Android
Database and Infrastructure
- PostgreSQL with pgai extension[27]
- IoT device integrations[28]
Privacy and Security
Privacy Features
By default, Ollama operates entirely locally:[21]
- Server binds to 127.0.0.1:11434 (loopback interface only)
- No prompts or responses are sent to external servers
- Complete data privacy for sensitive information
- Offline operation after models are downloaded
To expose the server on a network, users must explicitly set the OLLAMA_HOST environment variable (for example, OLLAMA_HOST=0.0.0.0:11434).
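Clients must then be pointed at the non-default address. A minimal sketch using the official Python client follows, where the 192.168.1.50 address is purely illustrative:

```python
from ollama import Client

# Connect to an Ollama server exposed on the local network
# (started with OLLAMA_HOST=0.0.0.0:11434). The address is illustrative.
client = Client(host="http://192.168.1.50:11434")

response = client.chat(
    model="llama3.2",
    messages=[{"role": "user", "content": "Hello from another machine"}],
)
print(response["message"]["content"])
```

Because the server performs no authentication of its own, exposing it this way should be combined with network-level access controls.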
Security Vulnerabilities
Ollama has addressed several security vulnerabilities:[29]
| CVE | Description | Affected Versions | Status |
|---|---|---|---|
| CVE-2024-37032 | Remote code execution via path traversal in the model pull API ("Probllama") | <0.1.34 | Fixed[30] |
| CVE-2025-0312 | Malicious GGUF model exploitation | ≤0.3.14 | Fixed[31] |
| CNVD-2025-04094 | Unauthorized access due to improper configuration | Various | Configuration issue[32] |
Users are advised to keep Ollama updated and to configure it securely, especially when exposing its API to a network.
Comparisons with Similar Tools
Ollama vs. LM Studio
| Feature | Ollama | LM Studio |
|---|---|---|
| Interface | CLI-focused | GUI-focused |
| License | MIT (open source) | Proprietary (free to use) |
| Model Sources | Ollama registry + GGUF | Hugging Face + GGUF |
| Concurrent Handling | Excellent (batching) | Limited |
| macOS Performance | Good | Better (MLX support) |
| Automation | Excellent | Limited |
| API | Built-in REST API | Available |
Ollama is preferred for its flexibility in integrations and developer-centric approach, while LM Studio offers a more user-friendly GUI.[33]
Community and Reception
Ollama has a vibrant community on GitHub, with over 90,000 stars and active contributions focusing on performance and compatibility.[34] It has been praised for advancing local AI technologies, cost savings, and accessibility.[35]
Licensing Controversy
Some criticism has arisen regarding licensing compliance issues with dependencies like llama.cpp, with community members raising concerns about proper attribution.[36] The Ollama team has been working to address these concerns.
Significance and Impact
Ollama has been a key driver in the democratization of large language models by:[37]
- Enabling developers to build and test AI-powered applications locally without cost
- Allowing researchers to experiment with various open-source models easily
- Empowering hobbyists to run state-of-the-art AI on personal computers
- Enhancing privacy for users who can leverage powerful AI without data leaving their machine
- Fostering a community-driven approach to AI development
The tool is widely used in education, research, and enterprise for privacy-sensitive applications and has become a foundational tool in the open-source AI movement.
See Also
- Large language model
- llama.cpp
- GGUF
- LangChain
- LlamaIndex
- Hugging Face
- Docker
- Local AI
- Mixture of Experts
- Retrieval-augmented generation
- Embeddings
References
- ↑ Bala Priya C (May 7, 2024). "Ollama Tutorial: Running LLMs Locally Made Super Simple". KDnuggets. https://www.kdnuggets.com/ollama-tutorial-running-llms-locally-made-super-simple.
- ↑ 2.0 2.1 Vijaykumar (May 5, 2025). "Deploy LLMs Locally with Ollama: Your Complete Guide to Local AI Development". Medium. https://medium.com/@bluudit/deploy-llms-locally-with-ollama-your-complete-guide-to-local-ai-development.
- ↑ "Ollama - Y Combinator". Y Combinator. https://www.ycombinator.com/companies/ollama.
- ↑ "Ollama Company Profile". PitchBook. https://pitchbook.com/profiles/company/537457-42.
- ↑ 5.0 5.1 "Who Is the Owner of Ollama? Discover the Leadership". BytePlus. https://www.byteplus.com/en/topic/375310.
- ↑ 6.0 6.1 "Who Is Behind Ollama? Discover the Founders & Team". BytePlus. https://www.byteplus.com/en/topic/418063.
- ↑ 7.0 7.1 "Software startup with Waterloo Engineering roots acquired by Docker Inc.". University of Waterloo. May 21, 2024. https://uwaterloo.ca/engineering/news/software-startup-waterloo-engineering-roots-acquired-docker.
- ↑ 8.0 8.1 "Tool support". Ollama Blog. July 25, 2024. https://ollama.com/blog/tool-support.
- ↑ "Ollama Pre Seed Round". Crunchbase. March 23, 2021. https://www.crunchbase.com/funding_round/ollama-pre-seed--c51b44d8.
- ↑ 10.0 10.1 "OpenAI compatibility". Ollama Blog. https://ollama.com/blog/openai-compatibility.
- ↑ "Windows preview". Ollama Blog. February 15, 2024. https://ollama.com/blog/windows-preview.
- ↑ "Ollama now supports AMD graphics cards". Ollama Blog. March 14, 2024. https://ollama.com/blog/amd-support.
- ↑ "Ollama's new app". Ollama Blog. July 30, 2025. https://ollama.com/blog/new-app.
- ↑ 14.0 14.1 "Cloud models". Ollama Blog. September 19, 2025. https://ollama.com/blog/cloud-models.
- ↑ 15.0 15.1 "Releases - ollama/ollama". GitHub. https://github.com/ollama/ollama/releases.
- ↑ 16.0 16.1 "llama.cpp vs. ollama: Running LLMs Locally for Enterprises". Picovoice. https://picovoice.ai/blog/local-llms-llamacpp-ollama/.
- ↑ "Ollama: How It Works Internally". Medium. July 5, 2024. https://medium.com/@laiso/ollama-under-the-hood-f8ed0f14d90c.
- ↑ "Use Ollama with any GGUF Model on Hugging Face Hub". Hugging Face. https://huggingface.co/docs/hub/en/ollama.
- ↑ Mark Needham (October 18, 2023). "Ollama: Running GGUF Models from Hugging Face". Mark Needham. https://www.markhneedham.com/blog/2023/10/18/ollama-hugging-face-gguf-models/.
- ↑ 20.0 20.1 "Modelfile Documentation". GitHub - ollama/ollama. https://github.com/ollama/ollama/blob/main/docs/modelfile.md.
- ↑ 21.0 21.1 "Ollama FAQ". Ollama Docs. https://ollama.com/docs/faq.
- ↑ 22.0 22.1 "API Documentation". GitHub - ollama/ollama. https://github.com/ollama/ollama/blob/main/docs/api.md.
- ↑ "Ollama's new engine for multimodal models". Ollama. https://ollama.com/blog/multimodal-models.
- ↑ "Ollama Model Library". Ollama. https://ollama.com/library.
- ↑ "ollama - PyPI". Python Package Index. https://pypi.org/project/ollama/.
- ↑ "Spring AI with Ollama Tool Support". Spring. July 26, 2024. https://spring.io/blog/2024/07/26/spring-ai-with-ollama-tool-support.
- ↑ "Use Open-Source LLMs in PostgreSQL With Ollama and Pgai". TigerData. June 28, 2024. https://www.tigerdata.com/blog/use-open-source-llms-in-postgresql-with-ollama-and-pgai.
- ↑ "Unlock Smart Home Potential with Ollama & IoT Devices". Arsturn. August 26, 2024. https://www.arsturn.com/blog/integrating-ollama-with-iot-devices.
- ↑ "More Models, More ProbLLMs: New Vulnerabilities in Ollama". Oligo Security. October 30, 2024. https://www.oligo.security/blog/more-models-more-probllms.
- ↑ "Probllama: Ollama Remote Code Execution Vulnerability". Wiz. June 24, 2024. https://www.wiz.io/blog/probllama-ollama-vulnerability-cve-2024-37032.
- ↑ "CVE-2025-0312 Detail". NVD. March 20, 2025. https://nvd.nist.gov/vuln/detail/CVE-2025-0312.
- ↑ "Ollama Unauthorized Access Vulnerability Due to Improper Configuration". NSFOCUS. March 13, 2025. https://nsfocusglobal.com/ollama-unauthorized-access-vulnerability-due-to-misconfiguration-cnvd-2025-04094/.
- ↑ "Choosing Your Local LLM: Ollama or LM Studio?". 2am.tech. September 4, 2025. https://www.2am.tech/blog/ollama-vs-lm-studio.
- ↑ "ollama/ollama - GitHub". GitHub. https://github.com/ollama/ollama.
- ↑ "Ollama's Contribution to Local AI Technologies". Arsturn. April 24, 2025. https://www.arsturn.com/blog/ollamas-contribution-to-the-emergence-of-local-ai-technologies.
- ↑ "Ollama violating llama.cpp license for over a year". Reddit. May 16, 2025. https://www.reddit.com/r/LocalLLaMA/comments/1ko1iob/ollama_violating_llamacpp_license_for_over_a_year/.
- ↑ "Unlocking AI's Potential: Ollama's Local Revolution in AI Development". Hashnode. January 29, 2025. https://fahrenheit.hashnode.dev/unlocking-ais-potential-ollamas-local-revolution-in-ai-development.