<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Open Source Tools | Ziyang Lin</title><link>https://ziyanglin.netlify.app/en/tags/open-source-tools/</link><atom:link href="https://ziyanglin.netlify.app/en/tags/open-source-tools/index.xml" rel="self" type="application/rss+xml"/><description>Open Source Tools</description><generator>Source Themes Academic (https://sourcethemes.com/academic/)</generator><language>en-us</language><lastBuildDate>Fri, 27 Jun 2025 02:00:00 +0000</lastBuildDate><image><url>https://ziyanglin.netlify.app/img/icon-192.png</url><title>Open Source Tools</title><link>https://ziyanglin.netlify.app/en/tags/open-source-tools/</link></image><item><title>Ollama Practical Guide: Local Deployment and Management of Large Language Models</title><link>https://ziyanglin.netlify.app/en/post/ollama-documentation/</link><pubDate>Fri, 27 Jun 2025 02:00:00 +0000</pubDate><guid>https://ziyanglin.netlify.app/en/post/ollama-documentation/</guid><description>&lt;h2 id="1-introduction">1. Introduction&lt;/h2>
&lt;p>Ollama is a powerful open-source tool designed to allow users to easily download, run, and manage large language models (LLMs) in local environments. Its core advantage lies in simplifying the deployment and use of complex models, enabling developers, researchers, and enthusiasts to experience and utilize state-of-the-art artificial intelligence technology on personal computers without specialized hardware or complex configurations.&lt;/p>
&lt;p>&lt;strong>Key Advantages:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Ease of Use:&lt;/strong> Complete model download, running, and interaction through simple command-line instructions.&lt;/li>
&lt;li>&lt;strong>Cross-Platform Support:&lt;/strong> Supports macOS, Windows, and Linux.&lt;/li>
&lt;li>&lt;strong>Rich Model Library:&lt;/strong> Supports numerous popular open-source models such as Llama 3, Mistral, Gemma, Phi-3, and more.&lt;/li>
&lt;li>&lt;strong>Highly Customizable:&lt;/strong> Through &lt;code>Modelfile&lt;/code>, users can easily customize model behavior, system prompts, and parameters.&lt;/li>
&lt;li>&lt;strong>API-Driven:&lt;/strong> Provides a REST API for easy integration with other applications and services.&lt;/li>
&lt;li>&lt;strong>Open Source Community:&lt;/strong> Has an active community continuously contributing new models and features.&lt;/li>
&lt;/ul>
&lt;p>This document provides a comprehensive introduction to Ollama's features, from the fundamentals to advanced applications, helping you fully master this powerful tool.&lt;/p>
&lt;hr>
&lt;h2 id="2-quick-start">2. Quick Start&lt;/h2>
&lt;p>This section will guide you through installing Ollama and using it for the first time.&lt;/p>
&lt;h3 id="21-installation">2.1 Installation&lt;/h3>
&lt;p>Visit the &lt;a href="https://ollama.com/">Ollama official website&lt;/a> to download and install the package suitable for your operating system.&lt;/p>
&lt;h3 id="22-running-your-first-model">2.2 Running Your First Model&lt;/h3>
&lt;p>After installation, open a terminal (or command prompt) and use the &lt;code>ollama run&lt;/code> command to download and run a model. For example, to run the Llama 3 model:&lt;/p>
&lt;pre>&lt;code class="language-shell">ollama run llama3
&lt;/code>&lt;/pre>
&lt;p>On first run, Ollama will automatically download the required model files from the model library. Once the download is complete, you can directly converse with the model in the terminal.&lt;/p>
&lt;h3 id="23-managing-local-models">2.3 Managing Local Models&lt;/h3>
&lt;p>You can use the following commands to manage locally downloaded models:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>List Local Models:&lt;/strong>&lt;/p>
&lt;pre>&lt;code class="language-shell">ollama list
&lt;/code>&lt;/pre>
&lt;p>This command displays the name, ID, size, and modification time of all downloaded models.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Remove Local Models:&lt;/strong>&lt;/p>
&lt;pre>&lt;code class="language-shell">ollama rm &amp;lt;model_name&amp;gt;
&lt;/code>&lt;/pre>
&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="3-core-concepts">3. Core Concepts&lt;/h2>
&lt;h3 id="31-modelfile">3.1 Modelfile&lt;/h3>
&lt;p>&lt;code>Modelfile&lt;/code> is one of Ollama's core features. It's a configuration file similar to &lt;code>Dockerfile&lt;/code> that allows you to define and create custom models. Through &lt;code>Modelfile&lt;/code>, you can:&lt;/p>
&lt;ul>
&lt;li>Specify a base model.&lt;/li>
&lt;li>Set model parameters (such as &lt;code>temperature&lt;/code> and &lt;code>top_p&lt;/code>).&lt;/li>
&lt;li>Define the model's system prompt.&lt;/li>
&lt;li>Customize the model's interaction template.&lt;/li>
&lt;li>Apply LoRA adapters.&lt;/li>
&lt;/ul>
&lt;p>A simple &lt;code>Modelfile&lt;/code> example:&lt;/p>
&lt;pre>&lt;code class="language-Modelfile"># Specify base model
FROM llama3
# Set model temperature
PARAMETER temperature 0.8
# Set system prompt
SYSTEM &amp;quot;&amp;quot;&amp;quot;
You are a helpful AI assistant. Your name is Roo.
&amp;quot;&amp;quot;&amp;quot;
&lt;/code>&lt;/pre>
&lt;p>Use the &lt;code>ollama create&lt;/code> command to create a new model based on a &lt;code>Modelfile&lt;/code>:&lt;/p>
&lt;pre>&lt;code class="language-shell">ollama create my-custom-model -f ./Modelfile
&lt;/code>&lt;/pre>
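Since a Modelfile is plain text, it is easy to assemble programmatically, for example when generating many model variants. A minimal sketch in Python; the helper function and its parameters are illustrative, not part of Ollama itself:

```python
def build_modelfile(base, params, system):
    """Assemble a Modelfile string from a base model, parameters, and a system prompt."""
    lines = [f"FROM {base}"]
    for key, value in params.items():
        lines.append(f"PARAMETER {key} {value}")
    # Triple quotes allow a multi-line system prompt, as in the example above.
    lines.append(f'SYSTEM """\n{system}\n"""')
    return "\n".join(lines)

modelfile = build_modelfile(
    "llama3",
    {"temperature": 0.8},
    "You are a helpful AI assistant. Your name is Roo.",
)
print(modelfile)
```

The resulting text can be written to a file and passed to `ollama create` with `-f`, exactly as shown above.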
&lt;h3 id="32-model-import">3.2 Model Import&lt;/h3>
&lt;p>Ollama supports importing model weights from the local file system, in particular from &lt;code>Safetensors&lt;/code>-format weight files.&lt;/p>
&lt;p>In a &lt;code>Modelfile&lt;/code>, use the &lt;code>FROM&lt;/code> directive and provide the directory path containing &lt;code>safetensors&lt;/code> files:&lt;/p>
&lt;pre>&lt;code class="language-Modelfile">FROM /path/to/safetensors/directory
&lt;/code>&lt;/pre>
&lt;p>Then use the &lt;code>ollama create&lt;/code> command to create the model.&lt;/p>
&lt;h3 id="33-multimodal-models">3.3 Multimodal Models&lt;/h3>
&lt;p>Ollama supports multimodal models (such as LLaVA) that can process both text and image inputs simultaneously.&lt;/p>
&lt;pre>&lt;code class="language-shell">ollama run llava &amp;quot;What's in this image? /path/to/image.png&amp;quot;
&lt;/code>&lt;/pre>
&lt;hr>
&lt;h2 id="4-api-reference">4. API Reference&lt;/h2>
&lt;p>Ollama provides a set of REST APIs for programmatically interacting with models. The default service address is &lt;code>http://localhost:11434&lt;/code>.&lt;/p>
&lt;h3 id="41-apigenerate">4.1 &lt;code>/api/generate&lt;/code>&lt;/h3>
&lt;p>Generate text.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Request (Streaming):&lt;/strong>
&lt;pre>&lt;code class="language-shell">curl http://localhost:11434/api/generate -d '{
&amp;quot;model&amp;quot;: &amp;quot;llama3&amp;quot;,
&amp;quot;prompt&amp;quot;: &amp;quot;Why is the sky blue?&amp;quot;
}'
&lt;/code>&lt;/pre>
&lt;/li>
&lt;li>&lt;strong>Request (Non-streaming):&lt;/strong>
&lt;pre>&lt;code class="language-shell">curl http://localhost:11434/api/generate -d '{
&amp;quot;model&amp;quot;: &amp;quot;llama3&amp;quot;,
&amp;quot;prompt&amp;quot;: &amp;quot;Why is the sky blue?&amp;quot;,
&amp;quot;stream&amp;quot;: false
}'
&lt;/code>&lt;/pre>
&lt;/li>
&lt;/ul>
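In streaming mode, `/api/generate` returns newline-delimited JSON objects, each carrying a `response` fragment, with a final object where `done` is `true`. A minimal sketch of reassembling the fragments in Python; the sample lines below stand in for a live HTTP response body:

```python
import json

# Sample NDJSON lines standing in for a live streaming response body.
stream_lines = [
    '{"model": "llama3", "response": "The sky ", "done": false}',
    '{"model": "llama3", "response": "is blue.", "done": false}',
    '{"model": "llama3", "response": "", "done": true}',
]

text = ""
for line in stream_lines:
    chunk = json.loads(line)
    text += chunk["response"]
    if chunk["done"]:
        break

print(text)  # The sky is blue.
```

With a real request, you would iterate over the response body line by line instead of a list.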
&lt;h3 id="42-apichat">4.2 &lt;code>/api/chat&lt;/code>&lt;/h3>
&lt;p>Conduct multi-turn conversations.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Request:&lt;/strong>
&lt;pre>&lt;code class="language-shell">curl http://localhost:11434/api/chat -d '{
&amp;quot;model&amp;quot;: &amp;quot;llama3&amp;quot;,
&amp;quot;messages&amp;quot;: [
{
&amp;quot;role&amp;quot;: &amp;quot;user&amp;quot;,
&amp;quot;content&amp;quot;: &amp;quot;why is the sky blue?&amp;quot;
}
],
&amp;quot;stream&amp;quot;: false
}'
&lt;/code>&lt;/pre>
&lt;/li>
&lt;/ul>
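`/api/chat` is stateless: the client carries the conversation by resending the full `messages` list each turn, appending the assistant's reply before the next user message. A sketch of that bookkeeping in Python; the assistant text here is a placeholder, not a real model response:

```python
messages = []

def add_user_turn(history, content):
    """Append a user message to the conversation history."""
    history.append({"role": "user", "content": content})

def add_assistant_turn(history, content):
    """Append the model's reply so the next request keeps the full context."""
    history.append({"role": "assistant", "content": content})

add_user_turn(messages, "Why is the sky blue?")
# At this point the payload sent to /api/chat would include the whole list:
# {"model": "llama3", "messages": messages}
add_assistant_turn(messages, "Rayleigh scattering favors shorter wavelengths.")
add_user_turn(messages, "And why are sunsets red?")

print([m["role"] for m in messages])  # ['user', 'assistant', 'user']
```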
&lt;h3 id="43-apiembed">4.3 &lt;code>/api/embed&lt;/code>&lt;/h3>
&lt;p>Generate embedding vectors for text.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Request:&lt;/strong>
&lt;pre>&lt;code class="language-shell">curl http://localhost:11434/api/embed -d '{
&amp;quot;model&amp;quot;: &amp;quot;all-minilm&amp;quot;,
&amp;quot;input&amp;quot;: [&amp;quot;Why is the sky blue?&amp;quot;, &amp;quot;Why is the grass green?&amp;quot;]
}'
&lt;/code>&lt;/pre>
&lt;/li>
&lt;/ul>
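The response's `embeddings` field contains one vector per input string, and a common next step is comparing vectors with cosine similarity. A self-contained sketch; the short toy vectors stand in for real model embeddings, which have hundreds of dimensions:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for the "embeddings" field of an /api/embed response.
sky = [0.1, 0.3, 0.5]
grass = [0.2, 0.1, 0.4]
print(round(cosine_similarity(sky, grass), 3))
```

Values close to 1.0 indicate semantically similar inputs, which is the basis of embedding-based search and retrieval.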
&lt;h3 id="44-apitags">4.4 &lt;code>/api/tags&lt;/code>&lt;/h3>
&lt;p>List all locally available models.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Request:&lt;/strong>
&lt;pre>&lt;code class="language-shell">curl http://localhost:11434/api/tags
&lt;/code>&lt;/pre>
&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="5-command-line-tools-cli">5. Command Line Tools (CLI)&lt;/h2>
&lt;p>Ollama provides a rich set of command-line tools for managing models and interacting with the service.&lt;/p>
&lt;ul>
&lt;li>&lt;code>ollama run &amp;lt;model&amp;gt;&lt;/code>: Run a model.&lt;/li>
&lt;li>&lt;code>ollama create &amp;lt;model&amp;gt; -f &amp;lt;Modelfile&amp;gt;&lt;/code>: Create a model from a Modelfile.&lt;/li>
&lt;li>&lt;code>ollama pull &amp;lt;model&amp;gt;&lt;/code>: Pull a model from a remote repository.&lt;/li>
&lt;li>&lt;code>ollama push &amp;lt;model&amp;gt;&lt;/code>: Push a model to a remote repository.&lt;/li>
&lt;li>&lt;code>ollama list&lt;/code>: List local models.&lt;/li>
&lt;li>&lt;code>ollama cp &amp;lt;source_model&amp;gt; &amp;lt;dest_model&amp;gt;&lt;/code>: Copy a model.&lt;/li>
&lt;li>&lt;code>ollama rm &amp;lt;model&amp;gt;&lt;/code>: Delete a model.&lt;/li>
&lt;li>&lt;code>ollama ps&lt;/code>: View running models and their resource usage.&lt;/li>
&lt;li>&lt;code>ollama stop &amp;lt;model&amp;gt;&lt;/code>: Stop a running model and unload it from memory.&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="6-advanced-features">6. Advanced Features&lt;/h2>
&lt;h3 id="61-openai-api-compatibility">6.1 OpenAI API Compatibility&lt;/h3>
&lt;p>Ollama provides an endpoint compatible with the OpenAI API, allowing you to seamlessly migrate existing OpenAI applications to Ollama. The default address is &lt;code>http://localhost:11434/v1&lt;/code>.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>List Models (Python):&lt;/strong>
&lt;pre>&lt;code class="language-python">from openai import OpenAI
client = OpenAI(
base_url='http://localhost:11434/v1',
api_key='ollama', # required, but unused
)
response = client.models.list()
print(response)
&lt;/code>&lt;/pre>
&lt;/li>
&lt;/ul>
&lt;h3 id="62-structured-output">6.2 Structured Output&lt;/h3>
&lt;p>By combining the OpenAI-compatible API with Pydantic, you can force the model to output JSON with a specific structure.&lt;/p>
&lt;pre>&lt;code class="language-python">from pydantic import BaseModel
from openai import OpenAI
client = OpenAI(base_url=&amp;quot;http://localhost:11434/v1&amp;quot;, api_key=&amp;quot;ollama&amp;quot;)
class UserInfo(BaseModel):
name: str
age: int
try:
completion = client.beta.chat.completions.parse(
model=&amp;quot;llama3.1:8b&amp;quot;,
messages=[{&amp;quot;role&amp;quot;: &amp;quot;user&amp;quot;, &amp;quot;content&amp;quot;: &amp;quot;My name is John and I am 30 years old.&amp;quot;}],
response_format=UserInfo,
)
print(completion.choices[0].message.parsed)
except Exception as e:
print(f&amp;quot;Error: {e}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;h3 id="63-performance-tuning">6.3 Performance Tuning&lt;/h3>
&lt;p>You can adjust Ollama's performance and resource management through environment variables:&lt;/p>
&lt;ul>
&lt;li>&lt;code>OLLAMA_KEEP_ALIVE&lt;/code>: Set how long models remain active in memory. For example, &lt;code>10m&lt;/code>, &lt;code>24h&lt;/code>, or &lt;code>-1&lt;/code> (permanent).&lt;/li>
&lt;li>&lt;code>OLLAMA_MAX_LOADED_MODELS&lt;/code>: Maximum number of models loaded into memory simultaneously.&lt;/li>
&lt;li>&lt;code>OLLAMA_NUM_PARALLEL&lt;/code>: Number of requests each model can process in parallel.&lt;/li>
&lt;/ul>
&lt;h3 id="64-lora-adapters">6.4 LoRA Adapters&lt;/h3>
&lt;p>Use the &lt;code>ADAPTER&lt;/code> directive in a &lt;code>Modelfile&lt;/code> to apply a LoRA (Low-Rank Adaptation) adapter, changing the model's behavior without modifying the base model weights.&lt;/p>
&lt;pre>&lt;code class="language-Modelfile">FROM llama3
ADAPTER /path/to/your-lora-adapter.safetensors
&lt;/code>&lt;/pre>
&lt;hr>
&lt;h2 id="7-appendix">7. Appendix&lt;/h2>
&lt;h3 id="71-troubleshooting">7.1 Troubleshooting&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Check CPU Features:&lt;/strong> On Linux, you can use the following command to check if your CPU supports instruction sets like AVX, which are crucial for the performance of certain models.
&lt;pre>&lt;code class="language-shell">cat /proc/cpuinfo | grep flags | head -1
&lt;/code>&lt;/pre>
&lt;/li>
&lt;/ul>
&lt;h3 id="72-contribution-guidelines">7.2 Contribution Guidelines&lt;/h3>
&lt;p>Ollama is an open-source project, and community contributions are welcome. When submitting code, please follow good commit message formats, for example:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Good:&lt;/strong> &lt;code>llm/backend/mlx: support the llama architecture&lt;/code>&lt;/li>
&lt;li>&lt;strong>Bad:&lt;/strong> &lt;code>feat: add more emoji&lt;/code>&lt;/li>
&lt;/ul>
&lt;h3 id="73-related-links">7.3 Related Links&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Official Website:&lt;/strong> &lt;a href="https://ollama.com/">https://ollama.com/&lt;/a>&lt;/li>
&lt;li>&lt;strong>GitHub Repository:&lt;/strong> &lt;a href="https://github.com/ollama/ollama">https://github.com/ollama/ollama&lt;/a>&lt;/li>
&lt;li>&lt;strong>Model Library:&lt;/strong> &lt;a href="https://ollama.com/library">https://ollama.com/library&lt;/a>&lt;/li>
&lt;/ul></description></item></channel></rss>