Ollama

Ollama is an open-source tool designed to simplify the deployment and management of large language models (LLMs) locally on personal computers and servers.[1] It provides a streamlined interface for downloading, running, and managing various open-source LLMs without requiring cloud computing services or extensive technical expertise, positioning itself as "Docker for LLMs."[2]

History

Ollama was founded in 2021 by Jeffrey Morgan and Michael Chiang in Palo Alto, California.[3][4] The company participated in Y Combinator's Winter 2021 batch and raised $125,000 in pre-seed funding from investors including Y Combinator, Essence Venture Capital, Rogue Capital, and Sunflower Capital.[5]

Prior to founding Ollama, Morgan and Chiang, along with Sean Li, created Kitematic, a tool designed to simplify Docker container management on macOS, which was eventually acquired by Docker, Inc.[6][7] Jeffrey Morgan and Sean Li graduated from the University of Waterloo (BASc 2013, Software Engineering), while Michael Chiang was an electrical engineering student there at the time of Kitematic's acquisition.[7] This experience in making complex command-line tools accessible through simpler interfaces directly influenced Ollama's design philosophy.[6]

The platform quickly gained traction in the open-source AI community for its ease of use and Docker-like simplicity in managing LLMs.[5] Initial releases focused on core functionality for running models like LLaMA 2, with subsequent updates introducing features such as multimodal support and tool calling.[8]

Key Milestones

Release History and Milestones
Date | Milestone | Notes
2021 | Company founded | Participated in Y Combinator's W21 batch
March 23, 2021 | Pre-seed funding | Raised $125,000 from Y Combinator and other investors[9]
2023 | Public launch | Basic model management and inference capabilities
February 8, 2024 | OpenAI compatibility | Initial compatibility with the OpenAI Chat Completions API at /v1/chat/completions[10]
February 15, 2024 | Windows preview | Native Windows build with built-in GPU acceleration and an always-on API[11]
March 14, 2024 | AMD GPU preview | Preview acceleration on supported AMD Radeon/Instinct cards on Windows and Linux[12]
July 30, 2025 | Desktop app | Official GUI app for macOS and Windows with file drag-and-drop and context-length controls[13]
September 19, 2025 | Cloud models (preview) | Option to run larger models on datacenter hardware while keeping local workflows[14]
September 26, 2025 | Version 0.12.3 | Stable release adding a web search API and performance optimizations[15]

Architecture and Technical Implementation

Core Technology

Ollama is built primarily in Go and leverages llama.cpp as its underlying inference engine through CGo bindings.[16][17] The llama.cpp project, created by Georgi Gerganov in March 2023, provides an efficient C++ implementation of LLaMA and other language models, enabling them to run on consumer-grade hardware.[16]

Model Format

Ollama primarily uses the GGUF (GPT-Generated Unified Format) file format for storing and loading models.[18] GGUF replaced the earlier GGML format and provides better compatibility, metadata handling, and performance optimization for quantized models.[19] Quantization, which stores weights at reduced precision such as 8 or 4 bits, is what allows massive models (for example, models with 70 billion parameters) to run on machines with limited VRAM, as the sketch below illustrates.
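
As a back-of-envelope illustration (not from the Ollama documentation; it counts weights only and ignores the KV cache and runtime overhead), the arithmetic can be sketched in a few lines of Python:

# Rough estimate of weight storage at different precisions. Illustration
# only: real memory use is higher, and GGUF quantization formats carry
# small per-block metadata costs not modeled here.

def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9

for bits, label in [(16, "FP16"), (8, "8-bit (Q8_0)"), (4, "4-bit (Q4_0)")]:
    print(f"70B parameters at {label}: ~{weight_footprint_gb(70e9, bits):.0f} GB")

# Prints roughly 140 GB at FP16, 70 GB at 8-bit, and 35 GB at 4-bit,
# which is why 4-bit 70B models come within reach of workstation GPUs.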

Ollama can also import models directly from Safetensors weight directories for supported architectures (for example Llama, Mistral, Gemma, and Phi).[20]

Features

Core Capabilities

Features at a Glance
Area | Details | Notes
Server & Port | Local HTTP server at 127.0.0.1:11434 | Configurable via the OLLAMA_HOST environment variable[21]
Core Endpoints | /api/generate, /api/chat, /api/embeddings, plus model management | Streaming JSON supported[22]
OpenAI Compatibility | /v1/chat/completions | Drop-in replacement for many OpenAI-based clients (see the sketch after this table)[10]
Local-First Design | All processing occurs locally | Ensures complete data privacy[2]
Multimodal Support | Text, images, and other data types | Self-contained projection layers[23]
Tool Calling | External function calls | Enhances reasoning and automation[8]
Cloud Integration | Hybrid mode for larger models | Maintains local workflows (v0.12.0+)[14]
Performance | Flash attention, GPU/CPU overlap | Batch processing for efficiency[15]
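
Because of the OpenAI-compatible endpoint, many applications built on the OpenAI SDK can target a local Ollama server by changing only the base URL.[10] A minimal sketch using the official openai Python package, assuming a running local server and a pulled llama3.2 model:

# Point the official OpenAI Python client at a local Ollama server.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",                      # required by the client, ignored by Ollama
)

response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)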

Modelfile

A key component of Ollama is the Modelfile, which serves as a blueprint for creating and sharing models.[20] Similar to a Dockerfile, the Modelfile defines model behavior and configuration.

Modelfile Instructions
Instruction | Description | Example
FROM (required) | Specifies the base model or a local GGUF path | FROM llama3.2 or FROM ./model.gguf
PARAMETER | Sets model parameters | PARAMETER temperature 0.7, PARAMETER num_ctx 4096
SYSTEM | Defines the system message/persona | SYSTEM "You are a helpful assistant"
TEMPLATE | Sets the prompt template format | TEMPLATE "[INST] {{ .System }} {{ .Prompt }} [/INST]"
ADAPTER | Applies LoRA/QLoRA adapters | ADAPTER /path/to/adapter.bin
LICENSE | Specifies the model license | LICENSE "MIT"
MESSAGE | Provides conversation history for few-shot prompting | MESSAGE user "What is 1+1?" followed by MESSAGE assistant "2"

Example Modelfile

# Specify the base model
FROM llama3.2

# Set model parameters
PARAMETER temperature 0.8
PARAMETER num_ctx 4096
PARAMETER stop </s>

# Set the system message
SYSTEM """
You are an expert Python programming assistant.
Always provide clear, concise code examples.
Your responses must be formatted in Markdown.
"""

# Define the chat template
TEMPLATE """
<|start_header_id|>system<|end_header_id|>
{{ .System }}<|eot_id|>
<|start_header_id|>user<|end_header_id|>
{{ .Prompt }}<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
"""

This custom model can be created with: ollama create my-assistant -f ./Modelfile
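
Once created, the custom model can be run like any other (ollama run my-assistant) or called programmatically. A minimal Python sketch against the local REST API, assuming the server is running and the third-party requests library is installed:

# Call the custom model created above through the local REST API.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "my-assistant",
        "prompt": "Write a function that reverses a string.",
        "stream": False,  # return a single JSON object instead of a stream
    },
)
print(resp.json()["response"])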

Command-Line Interface

CLI Commands
Command | Description | Example
ollama run | Runs a model interactively | ollama run llama3.2
ollama pull | Downloads a model | ollama pull gemma:2b
ollama create | Creates a custom model from a Modelfile | ollama create mymodel -f ./Modelfile
ollama list | Lists installed models | ollama list
ollama rm | Removes a model | ollama rm llama3.2
ollama cp | Copies a model | ollama cp llama3.2 mymodel
ollama push | Uploads a model to the registry | ollama push mymodel
ollama serve | Starts the Ollama server | ollama serve

REST API

Ollama exposes a REST API on port 11434 by default, providing programmatic access to model functionality; a Python usage sketch follows the table:[22]

Main API Endpoints
Endpoint | Method | Description | Example
/api/generate | POST | Generate a text completion | curl -X POST http://localhost:11434/api/generate -d '{"model":"llama3.2","prompt":"Hello"}'
/api/chat | POST | Chat conversation interface | curl -X POST http://localhost:11434/api/chat -d '{"model":"llama3.2","messages":[{"role":"user","content":"Hi"}]}'
/api/embeddings | POST | Generate text embeddings | curl -X POST http://localhost:11434/api/embeddings -d '{"model":"llama3.2","prompt":"Test"}'
/api/pull | POST | Download a model | curl -X POST http://localhost:11434/api/pull -d '{"name":"llama3.2"}'
/api/show | POST | Show model information | curl -X POST http://localhost:11434/api/show -d '{"name":"llama3.2"}'
/api/tags | GET | List installed models | curl http://localhost:11434/api/tags
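
As a minimal sketch of the chat endpoint with streaming (assuming a local server on the default port, a pulled llama3.2 model, and the third-party requests library), the server emits one JSON object per line until a final object with "done": true:

# Stream a chat response from the local /api/chat endpoint.
import json
import requests

with requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.2",
        "messages": [{"role": "user", "content": "Why is the sky blue?"}],
    },
    stream=True,
) as resp:
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        # Each line carries a fragment of the assistant's reply; the
        # final line has "done": true plus timing statistics.
        print(chunk["message"]["content"], end="", flush=True)
        if chunk.get("done"):
            print()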

Supported Models

Ollama supports a wide range of open-source language models, continuously updated as new models are released:[24]

Supported Models (Select Examples)
Model Name | Parameters | Category | Description | Creator
Llama 3.2 | 1B, 3B, 11B, 90B | Source/Fine-Tuned | Meta's open LLM with improved reasoning and tool support; the 11B and 90B sizes are vision models | Meta AI
Gemma 2 | 2B, 9B, 27B | Source | Google's lightweight models for efficient on-device inference | Google
DeepSeek-R1 | 1.5B–70B, 671B | Reasoning | Reasoning model that exposes explicit chain-of-thought "thinking" for complex tasks | DeepSeek AI
Mistral / Mixtral | 7B, 8x7B, 8x22B | Source | High-efficiency models that often outperform larger models | Mistral AI
Qwen2.5 | Up to 72B | Source | Alibaba's multilingual models with context windows up to 128K tokens | Alibaba
Phi-4 | 14B (3.8B for Phi-4-mini) | Fine-Tuned | Microsoft's small language models for efficient reasoning | Microsoft
CodeLlama | 7B, 13B, 34B | Code | Specialized for code generation and programming tasks | Meta AI
LLaVA | 7B, 13B | Multimodal | Visual language model for combined text and image understanding | Various
Snowflake Arctic Embed | 568M | Embedding | Multilingual embedding model for retrieval tasks | Snowflake
Firefunction | 7B | Tools | Function-calling model for automation and integration (see the tool-calling sketch below) | Various
Vicuna, Alpaca | Various | Fine-Tuned | LLaMA derivatives with specialized capabilities | Various
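
Tool calling works by having the model return a structured function-call request rather than executing anything itself; the calling application runs the function and can feed the result back into the conversation.[8] A minimal sketch against /api/chat, assuming a tool-capable model such as llama3.1 has been pulled; get_current_weather is a hypothetical function used purely for illustration:

# Ask a tool-capable model to emit a structured function call.
import requests

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",  # hypothetical example function
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "The city name"},
            },
            "required": ["city"],
        },
    },
}]

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.1",
        "messages": [{"role": "user", "content": "What is the weather in Toronto?"}],
        "tools": tools,
        "stream": False,
    },
)

# The model does not run the function; it returns a request that the
# application is expected to execute.
for call in resp.json()["message"].get("tool_calls", []):
    print(call["function"]["name"], call["function"]["arguments"])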

Installation

System Requirements

Operating System Requirements
Platform | Minimum Version | Installation Method
macOS | 11 Big Sur or later | Download the .dmg from the official website
Linux | Ubuntu 18.04 or equivalent | Official install script (see below)
Windows | Windows 10 22H2 or later | Download the .exe installer
Docker | Any platform | docker pull ollama/ollama

On Linux, the official one-line installer is: curl -fsSL https://ollama.com/install.sh | sh

Hardware Requirements by Model Size
Model Size | RAM Required | Storage | GPU VRAM (Optional)
3B parameters | 8GB | 10GB+ | 4GB
7B parameters | 16GB | 20GB+ | 8GB
13B parameters | 32GB | 40GB+ | 16GB
70B parameters | 64GB+ | 100GB+ | 48GB+

GPU Support

Ollama automatically detects compatible GPUs and falls back to CPU inference when none is available. NVIDIA GPUs are supported through CUDA, AMD Radeon and Instinct cards through ROCm (initially shipped as a preview, see above),[12] and Apple Silicon Macs are accelerated via the Metal API.

Integration and Ecosystem

Programming Languages

Ollama provides official client libraries for Python and JavaScript, alongside community-maintained libraries for many other languages.[25]
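
A minimal usage sketch with the official Python client (pip install ollama), assuming a running local server and a pulled llama3.2 model:

# Chat with a local model via the official Python client.
import ollama

messages = [{"role": "user", "content": "Explain GGUF in one sentence."}]

# Blocking call: returns the complete assistant message at once.
response = ollama.chat(model="llama3.2", messages=messages)
print(response["message"]["content"])

# Streaming call: yields partial message chunks as they are generated.
for chunk in ollama.chat(model="llama3.2", messages=messages, stream=True):
    print(chunk["message"]["content"], end="", flush=True)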

Third-Party Integrations

Development Frameworks

Application frameworks such as LangChain, LlamaIndex, and Spring AI provide Ollama integrations, allowing LLM applications to target locally hosted models with minimal code changes.[26]

User Interfaces

Community projects such as Open WebUI offer browser-based chat front-ends that connect to a local Ollama server, complementing the built-in CLI and official desktop app.

Database and Infrastructure

Ollama can serve as the model backend for database extensions such as pgai, which brings embedding generation and LLM inference into PostgreSQL,[27] and has been used in self-hosted home-automation and IoT deployments.[28]

Privacy and Security

Privacy Features

By default, Ollama operates entirely locally:[21]

  • Server binds to 127.0.0.1:11434 (loopback interface only)
  • No prompts or responses sent to external servers
  • Complete data privacy for sensitive information
  • Offline operation after model download

To expose the server on a network, users must explicitly set the OLLAMA_HOST environment variable (for example OLLAMA_HOST=0.0.0.0:11434).
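
Remote clients then connect by pointing at the server's address. A minimal sketch with the official Python client; the address below is a placeholder used only for illustration:

# Connect to an Ollama server running on another machine.
from ollama import Client

client = Client(host="http://192.168.1.50:11434")  # placeholder address
response = client.chat(
    model="llama3.2",
    messages=[{"role": "user", "content": "Hello from another machine."}],
)
print(response["message"]["content"])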

Security Vulnerabilities

Ollama has addressed several security vulnerabilities:[29]

Known Security Vulnerabilities
CVE | Description | Affected Versions | Status
CVE-2024-37032 | Remote code execution via path traversal in the model pull API ("Probllama") | <0.1.34 | Fixed[30]
CVE-2025-0312 | Exploitation via a maliciously crafted GGUF model file | ≤0.3.14 | Fixed[31]
CNVD-2025-04094 | Unauthorized access due to improper configuration | Various | Configuration issue[32]

Users are advised to keep Ollama updated and to configure it securely, especially when exposing its API to a network.

Comparisons with Similar Tools

Ollama vs. LM Studio

Comparison with LM Studio
Feature | Ollama | LM Studio
Interface | CLI-focused | GUI-focused
License | MIT (open source) | Proprietary (free to use)
Model Sources | Ollama registry + GGUF | Hugging Face + GGUF
Concurrent Handling | Excellent (batching) | Limited
macOS Performance | Good | Better (MLX support)
Automation | Excellent | Limited
API | Built-in REST API | Available

Ollama is preferred for its flexibility in integrations and developer-centric approach, while LM Studio offers a more user-friendly GUI.[33]

Community and Reception

Ollama has a vibrant community on GitHub, with over 90,000 stars and active contributions focusing on performance and compatibility.[34] It has been praised for advancing local AI technologies, cost savings, and accessibility.[35]

Licensing Controversy

Some criticism has arisen regarding licensing compliance issues with dependencies like llama.cpp, with community members raising concerns about proper attribution.[36] The Ollama team has been working to address these concerns.

Significance and Impact

Ollama has been a key driver in the democratization of large language models by:[37]

  • Enabling developers to build and test AI-powered applications locally without cost
  • Allowing researchers to experiment with various open-source models easily
  • Empowering hobbyists to run state-of-the-art AI on personal computers
  • Enhancing privacy for users who can leverage powerful AI without data leaving their machine
  • Fostering a community-driven approach to AI development

The tool is widely used in education, research, and enterprise for privacy-sensitive applications and has become a foundational tool in the open-source AI movement.

See Also

References

  1. Bala Priya C (May 7, 2024). "Ollama Tutorial: Running LLMs Locally Made Super Simple". KDnuggets. https://www.kdnuggets.com/ollama-tutorial-running-llms-locally-made-super-simple.
  2. Vijaykumar (May 5, 2025). "Deploy LLMs Locally with Ollama: Your Complete Guide to Local AI Development". Medium. https://medium.com/@bluudit/deploy-llms-locally-with-ollama-your-complete-guide-to-local-ai-development.
  3. "Ollama - Y Combinator". Y Combinator. https://www.ycombinator.com/companies/ollama.
  4. "Ollama Company Profile". PitchBook. https://pitchbook.com/profiles/company/537457-42.
  5. "Who Is the Owner of Ollama? Discover the Leadership". BytePlus. https://www.byteplus.com/en/topic/375310.
  6. "Who Is Behind Ollama? Discover the Founders & Team". BytePlus. https://www.byteplus.com/en/topic/418063.
  7. "Software startup with Waterloo Engineering roots acquired by Docker Inc.". University of Waterloo. May 21, 2024. https://uwaterloo.ca/engineering/news/software-startup-waterloo-engineering-roots-acquired-docker.
  8. "Tool support". Ollama Blog. July 25, 2024. https://ollama.com/blog/tool-support.
  9. "Ollama Pre Seed Round". Crunchbase. March 23, 2021. https://www.crunchbase.com/funding_round/ollama-pre-seed--c51b44d8.
  10. "OpenAI compatibility". Ollama Blog. https://ollama.com/blog/openai-compatibility.
  11. "Windows preview". Ollama Blog. February 15, 2024. https://ollama.com/blog/windows-preview.
  12. "Ollama now supports AMD graphics cards". Ollama Blog. March 14, 2024. https://ollama.com/blog/amd-support.
  13. "Ollama's new app". Ollama Blog. July 30, 2025. https://ollama.com/blog/new-app.
  14. "Cloud models". Ollama Blog. September 19, 2025. https://ollama.com/blog/cloud-models.
  15. "Releases - ollama/ollama". GitHub. https://github.com/ollama/ollama/releases.
  16. "llama.cpp vs. ollama: Running LLMs Locally for Enterprises". Picovoice. https://picovoice.ai/blog/local-llms-llamacpp-ollama/.
  17. "Ollama: How It Works Internally". Medium. July 5, 2024. https://medium.com/@laiso/ollama-under-the-hood-f8ed0f14d90c.
  18. "Use Ollama with any GGUF Model on Hugging Face Hub". Hugging Face. https://huggingface.co/docs/hub/en/ollama.
  19. Mark Needham (October 18, 2023). "Ollama: Running GGUF Models from Hugging Face". Mark Needham. https://www.markhneedham.com/blog/2023/10/18/ollama-hugging-face-gguf-models/.
  20. "Modelfile Documentation". GitHub - ollama/ollama. https://github.com/ollama/ollama/blob/main/docs/modelfile.md.
  21. "Ollama FAQ". Ollama Docs. https://ollama.com/docs/faq.
  22. "API Documentation". GitHub - ollama/ollama. https://github.com/ollama/ollama/blob/main/docs/api.md.
  23. "Ollama's new engine for multimodal models". Ollama. https://ollama.com/blog/multimodal-models.
  24. "Ollama Model Library". Ollama. https://ollama.com/library.
  25. "ollama - PyPI". Python Package Index. https://pypi.org/project/ollama/.
  26. "Spring AI with Ollama Tool Support". Spring. July 26, 2024. https://spring.io/blog/2024/07/26/spring-ai-with-ollama-tool-support.
  27. "Use Open-Source LLMs in PostgreSQL With Ollama and Pgai". TigerData. June 28, 2024. https://www.tigerdata.com/blog/use-open-source-llms-in-postgresql-with-ollama-and-pgai.
  28. "Unlock Smart Home Potential with Ollama & IoT Devices". Arsturn. August 26, 2024. https://www.arsturn.com/blog/integrating-ollama-with-iot-devices.
  29. "More Models, More ProbLLMs: New Vulnerabilities in Ollama". Oligo Security. October 30, 2024. https://www.oligo.security/blog/more-models-more-probllms.
  30. "Probllama: Ollama Remote Code Execution Vulnerability". Wiz. June 24, 2024. https://www.wiz.io/blog/probllama-ollama-vulnerability-cve-2024-37032.
  31. "CVE-2025-0312 Detail". NVD. March 20, 2025. https://nvd.nist.gov/vuln/detail/CVE-2025-0312.
  32. "Ollama Unauthorized Access Vulnerability Due to Improper Configuration". NSFOCUS. March 13, 2025. https://nsfocusglobal.com/ollama-unauthorized-access-vulnerability-due-to-misconfiguration-cnvd-2025-04094/.
  33. "Choosing Your Local LLM: Ollama or LM Studio?". 2am.tech. September 4, 2025. https://www.2am.tech/blog/ollama-vs-lm-studio.
  34. "ollama/ollama - GitHub". GitHub. https://github.com/ollama/ollama.
  35. "Ollama's Contribution to Local AI Technologies". Arsturn. April 24, 2025. https://www.arsturn.com/blog/ollamas-contribution-to-the-emergence-of-local-ai-technologies.
  36. "Ollama violating llama.cpp license for over a year". Reddit. May 16, 2025. https://www.reddit.com/r/LocalLLaMA/comments/1ko1iob/ollama_violating_llamacpp_license_for_over_a_year/.
  37. "Unlocking AI's Potential: Ollama's Local Revolution in AI Development". Hashnode. January 29, 2025. https://fahrenheit.hashnode.dev/unlocking-ais-potential-ollamas-local-revolution-in-ai-development.

External Links