Ollama is an open-source tool that lets you run large language models (LLMs) directly on your local computer, without depending on paid cloud services. In this guide we will see how to install and use it.
It is a lightweight framework that provides a simple API for running and managing language models, along with a library of pre-built models that can be easily used in a wide variety of contexts.
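To make this idea concrete: once installed (we will see how below), Ollama listens by default on http://localhost:11434, and its generate endpoint can be called with a plain HTTP request. A minimal sketch, where llama3 is just an example model:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "stream": false
}'

With "stream": false the server returns the whole answer in a single JSON response instead of streaming it token by token.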
Why run LLMs locally?
- Security: with a local LLM your data stays on your computer, so you keep full control of your information.
- Offline use: running LLMs locally eliminates the need for an Internet connection.
- Affordability: cloud AI services are usually pay-per-use; with Ollama you can run models on your own hardware completely free of charge.
Prerequisites to install Ollama
The hardware prerequisites for running LLMs with Ollama vary depending on the size of the model, which is measured in billions of parameters (B). Below are some indicative estimates:
- at least 8 GB of RAM for 3B models;
- at least 16 GB of RAM for 7B models;
- at least 32 GB of RAM for 13B models.
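If you are unsure how much RAM your machine has, you can check it from the terminal; the first command below is for Linux, the second (which prints the total in bytes) is for macOS:

free -h
sysctl -n hw.memsize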
As a rule of thumb, the quality of a model, and of the results it can produce, grows with the number of parameters it was trained with.
Install Ollama
Ollama runs natively on Linux and macOS; a Windows version was also released recently and, at the time of writing, is still in beta. On Windows you can also install the Linux version, provided you have first set up WSL (Windows Subsystem for Linux).
You can download the version suited to your machine at the following link: download Ollama. For those who prefer a containerized setup, a Docker image is also available and can be downloaded from Docker Hub.
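As a sketch of the containerized route, at the time of writing the CPU-only image can be started and used roughly as follows; the named volume keeps downloaded models across container restarts:

docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
docker exec -it ollama ollama run llama3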
Once Ollama is installed and started, you can verify that everything went well by running the following command in the terminal:
ollama --help
If the command is recognized, it means that you are ready to run your first LLM locally.
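You can also check that the background server is up: by default it listens on port 11434, and a plain request should answer with a short status message:

curl http://localhost:11434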
Compatible LLMs
Ollama supports many different models. Below are the best-known ones:
- llama3: a model developed by Meta and tuned for dialogue and chat. It is available in 8B and 70B variants.
- Gemma: a model developed by Google, available in 2B and 7B versions. It is a lightweight relative of Gemini, trained on web documents.
- Mistral: one of the best-known open models, distributed under the Apache 2.0 license and released in a 7.3B version.
- Phi-3 Mini: a lightweight 3.8B model trained by Microsoft on web data.
In addition to these, Ollama offers many other models, and it is important to choose the right one for your intended use. For example, some models are trained on application source code and are better suited to writing code (e.g. codegemma or codellama).
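As an example, run also accepts a one-shot prompt as an argument, which is handy for quickly comparing models; a small sketch, assuming codellama has already been downloaded:

ollama run codellama "Write a Python function that reverses a string"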
Start llama3 on Ollama
In this tutorial we will try the llama3 8B model. To download and run it, simply launch the following command in the console:
ollama run llama3
The model weighs approximately 4.7 GB, so once the command has been launched you will have to wait for the download to finish. When it completes, you can chat from the command line, as in the following example:
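(Your session will of course differ; the transcript below is only an abridged illustration of the interactive prompt, which you can exit with the /bye command.)

>>> Why is the sky blue?
The sky looks blue because sunlight is scattered by the molecules in the
atmosphere, and the shorter blue wavelengths are scattered the most.

>>> /bye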
From a first test, llama3 seems to have good generative capabilities (and not only in English) despite being light enough to run on the hardware of an ordinary personal computer. It is therefore an excellent choice that I suggest you try. In any case, once you have Ollama installed you can have fun trying out the various models yourself and evaluating which one best fits your needs.
Ollama main commands
We have already seen the “run” command, which is used to start a model, but Ollama also offers other useful commands, summarized below.
Download a model:
ollama pull <model-name>
List the locally downloaded models:
ollama list
Delete a model:
ollama rm <model-name>
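These commands also combine well in scripts. A minimal sketch that downloads llama3 only if it does not already appear in the local list:

if ! ollama list | grep -q "llama3"; then
  ollama pull llama3
fi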
I hope this guide has been useful to you, and I invite you to read the upcoming articles on this topic.