We have talked about Ollama several times on our site, explaining how to use it to install LLMs locally and how to integrate them into your applications with frameworks such as Spring AI and LangChain. Now we will explore a versatile alternative that lets Ollama be integrated with any software, regardless of the language or framework it is built with: the Ollama REST API.
The Ollama REST API offers the same functionality available from the command line, but over HTTP, turning your machine into an LLM server that can be invoked remotely.
What are the benefits of the Ollama REST API?
Exposing LLM functionality through a Representational State Transfer (REST) API brings several advantages for developers:
- Scalability: Since REST APIs are stateless by nature, they make it easy to build scalable applications. Running a language model is computationally heavy; with the REST API you can deploy multiple Ollama server instances and distribute the load across several nodes (see the sketch after this list).
- Compatibility: REST builds on plain HTTP, so the capabilities offered by the model can be integrated with simple HTTP calls from virtually any tool or platform, including web browsers, mobile devices and IoT devices.
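As a minimal sketch of the scalability point, here is how you could start a second Ollama instance on a different port of the same machine using the OLLAMA_HOST environment variable; port 11435 is an arbitrary choice, and in a real deployment you would typically run the instances on separate nodes behind a load balancer.

# first instance, listening on the default port 11434
ollama serve

# in another terminal: second instance on a different port (arbitrary example)
OLLAMA_HOST=127.0.0.1:11435 ollama serve

# clients (or a load balancer) can now target either instance
curl http://localhost:11435/api/tags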
Prerequisites
In this tutorial we will use the curl command, which is available natively on virtually any Linux/Mac device and can also be installed on Windows. Alternatively, you can use a graphical client such as Postman to run the tests.
As the model we will use Phi3, which is very lightweight and can run locally on most hardware.
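Before calling any endpoint, make sure the Ollama server is running (you can start it with ollama serve). As a quick check, a plain GET on the root path should reply with a short status message on recent versions:

curl http://localhost:11434
# typically replies with: Ollama is running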
The main APIs
There are three fundamental endpoints of the Ollama REST API:
Pull
Let’s start with the endpoint that pulls a model and downloads it to your local machine. If you have not downloaded Phi3 yet, you can do so by invoking the following REST endpoint:
curl http://localhost:11434/api/pull -d '{
  "name": "phi3"
}'
The request is very simple and consists of a single input parameter, the name of the model. By default the Ollama server listens on port 11434.
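Note that, like the other endpoints, pull streams its progress as a series of JSON status objects by default. If you prefer a single response once the download has finished, the endpoint also accepts a "stream": false flag; the sketch below assumes the same option shown later for the chat endpoint applies here as well.

curl http://localhost:11434/api/pull -d '{
  "name": "phi3",
  "stream": false
}'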
List
Once the download is complete, we can check that the model is available by invoking the list API, which returns the list of models present locally.
curl http://localhost:11434/api/tags
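If you also want the details of a specific model (its parameters, template, license and so on), Ollama exposes a show endpoint as well; the sketch below assumes the same request shape as pull, with the model name as the only required field.

curl http://localhost:11434/api/show -d '{
  "name": "phi3"
}'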
Chat
The most important API is definitely the one that allows you to chat with the LLM.
curl http://localhost:11434/api/chat -d '{
  "model": "phi3",
  "messages": [
    {
      "role": "user",
      "content": "what is a REST API?"
    }
  ],
  "stream": false
}'
By default Ollama returns responses in “stream” mode. This means the reply is delivered as a long sequence of messages, each carrying a single token (roughly a word or a fragment of a word). This mode is very useful if you want to receive the answer progressively, without long waits (as if a person on the other end were typing one word at a time), but the raw output is hard to read by eye. Setting the “stream”: false parameter, as in the request above, returns a single complete response instead.
The model output will look like this:
{
  "model": "phi3",
  "created_at": "2024-05-12T13:06:42.049683Z",
  "message": {
    "role": "assistant",
    "content": "A REST API (Representational State Transfer Application Programming Interface) is an architectural style for designing networked applications. It uses standard HTTP methods like GET, POST, PUT, and DELETE to perform operations on resources, which are typically identified by URLs. ..."
  },
  "done": true,
  "total_duration": 21016115542,
  "load_duration": 10267708,
  "prompt_eval_duration": 862673000,
  "eval_count": 406,
  "eval_duration": 20134192000
}
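The chat endpoint is stateless: the model does not remember previous exchanges on its own. To carry on a conversation you resend the history in the messages array, appending the assistant’s previous reply and your new question. A minimal sketch (the assistant content shown is just a placeholder):

curl http://localhost:11434/api/chat -d '{
  "model": "phi3",
  "messages": [
    { "role": "user", "content": "what is a REST API?" },
    { "role": "assistant", "content": "A REST API is ..." },
    { "role": "user", "content": "can you give me a practical example?" }
  ],
  "stream": false
}'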
In conclusion, with the Ollama REST API developers can quickly create intelligent, conversational applications that enrich the user experience.