--- title: "Using ollamar" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Using ollamar} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ollamar is the easiest way to integrate R with [Ollama](https://ollama.com/), which lets you run language models locally on your own machine. ## Installation 1. Download and install the [Ollama](https://ollama.com) app. - [macOS](https://ollama.com/download/Ollama-darwin.zip) - [Windows preview](https://ollama.com/download/OllamaSetup.exe) - Linux: `curl -fsSL https://ollama.com/install.sh | sh` - [Docker image](https://hub.docker.com/r/ollama/ollama) 2. Open/launch the Ollama app to start the local server. 3. Install either the stable or latest/development version of `ollamar`. Stable version: ```{r eval=FALSE} install.packages("ollamar") ``` For the latest/development version with more features/bug fixes (see latest changes [here](https://hauselin.github.io/ollama-r/news/index.html)), you can install it from GitHub using the `install_github` function from the `remotes` library. If it doesn't work or you don't have `remotes` library, please run `install.packages("remotes")` in R or RStudio before running the code below. ```{r eval=FALSE} # install.packages("remotes") # run this line if you don't have the remotes library remotes::install_github("hauselin/ollamar") ``` ## Usage `ollamar` uses the [`httr2` library](https://httr2.r-lib.org/index.html) to make HTTP requests to the Ollama server, so many functions in this library returns an `httr2_response` object by default. If the response object says `Status: 200 OK`, then the request was successful. ```{r eval=FALSE} library(ollamar) test_connection() # test connection to Ollama server # if you see "Ollama local server not running or wrong server," Ollama app/server isn't running # generate a response/text based on a prompt; returns an httr2 response by default resp <- generate("llama3.1", "tell me a 5-word story") resp #' interpret httr2 response object #' #' Status: 200 OK # if successful, status code should be 200 OK #' Content-Type: application/json #' Body: In memory (414 bytes) # get just the text from the response object resp_process(resp, "text") # get the text as a tibble dataframe resp_process(resp, "df") # alternatively, specify the output type when calling the function initially txt <- generate("llama3.1", "tell me a 5-word story", output = "text") # list available models (models you've pulled/downloaded) list_models() name size parameter_size quantization_level modified 1 codegemma:7b 5 GB 9B Q4_0 2024-07-27T23:44:10 2 llama3.1:latest 4.7 GB 8.0B Q4_0 2024-07-31T07:44:33 ``` ### Pull/download model Download a model from the ollama library (see [API doc](https://github.com/ollama/ollama/blob/main/docs/api.md#pull-a-model)). For the list of models you can pull/download, see [Ollama library](https://ollama.com/library). ```{r eval=FALSE} pull("llama3.1") # download a model (equivalent bash code: ollama run llama3.1) list_models() # verify you've pulled/downloaded the model ``` ### Delete model Delete a model and its data (see [API doc](https://github.com/ollama/ollama/blob/main/docs/api.md#delete-a-model)). You can see what models you've downloaded with `list_models()`. To download a model, specify the name of the model. 
### Delete model

Delete a model and its data (see [API doc](https://github.com/ollama/ollama/blob/main/docs/api.md#delete-a-model)). You can see what models you've downloaded with `list_models()`. To delete a model, specify the name of the model.

```{r eval=FALSE}
list_models()  # see the models you've pulled/downloaded
delete("all-minilm:latest")  # returns a httr2 response object
```

### Generate completion

Generate a response for a given prompt (see [API doc](https://github.com/ollama/ollama/blob/main/docs/api.md#generate-a-completion)).

```{r eval=FALSE}
resp <- generate("llama3.1", "Tomorrow is a...")  # return httr2 response object by default
resp
resp_process(resp, "text")  # process the response to return text/vector output

generate("llama3.1", "Tomorrow is a...", output = "text")  # directly return text/vector output
generate("llama3.1", "Tomorrow is a...", stream = TRUE)  # return httr2 response object and stream output
generate("llama3.1", "Tomorrow is a...", output = "df", stream = TRUE)

# image prompt
# use a vision/multi-modal model
generate("benzie/llava-phi-3", "What is in the image?", images = "image.png", output = 'text')
```

### Chat

Generate the next message in a chat/conversation.

```{r eval=FALSE}
messages <- create_message("what is the capital of australia")  # default role is user
resp <- chat("llama3.1", messages)  # default returns httr2 response object
resp
resp_process(resp, "text")  # process the response to return text/vector output

# specify output type when calling the function
chat("llama3.1", messages, output = "text")  # text vector
chat("llama3.1", messages, output = "df")  # data frame/tibble
chat("llama3.1", messages, output = "jsonlist")  # list
chat("llama3.1", messages, output = "raw")  # raw string
chat("llama3.1", messages, stream = TRUE)  # stream output and return httr2 response object

# create chat history
messages <- create_messages(
  create_message("end all your sentences with !!!", role = "system"),
  create_message("Hello!"),  # default role is user
  create_message("Hi, how can I help you?!!!", role = "assistant"),
  create_message("What is the capital of Australia?"),
  create_message("Canberra!!!", role = "assistant"),
  create_message("what is your name?")
)
cat(chat("llama3.1", messages, output = "text"))  # print the formatted output

# image prompt
messages <- create_message("What is in the image?", images = "image.png")
# use a vision/multi-modal model
chat("benzie/llava-phi-3", messages, output = "text")
```

#### Stream responses

```{r eval=FALSE}
messages <- create_message("Tell me a 1-paragraph story.")

# use "llama3.1" model, provide list of messages, return text/vector output, and stream the output
chat("llama3.1", messages, output = "text", stream = TRUE)
# chat(model = "llama3.1", messages = messages, output = "text", stream = TRUE)  # same as above
```

#### Format messages for chat

Internally, messages are represented as a `list` of `list`s: each message is itself a list with two elements, `role` (which can be `"user"`, `"assistant"`, or `"system"`) and `content` (the message text). The example below shows how the messages/lists are structured.

```{r eval=FALSE}
list(  # main list containing all the messages
  list(role = "user", content = "Hello!"),  # first message as a list
  list(role = "assistant", content = "Hi! How are you?")  # second message as a list
)
```
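Because this is just a plain R list, you can also build it by hand and pass it straight to `chat()`. A minimal sketch (the model name and prompts are only examples):

```{r eval=FALSE}
# a minimal sketch: a hand-built message list in the format above can be
# passed directly to chat()
messages <- list(
  list(role = "system", content = "You are a concise assistant."),
  list(role = "user", content = "Name three R packages for plotting.")
)
chat("llama3.1", messages, output = "text")
```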
To simplify the process of creating and managing messages, `ollamar` provides functions to format and prepare messages for the `chat()` function. These functions also work with other APIs or LLM providers like OpenAI and Anthropic.

- `create_messages()`: creates messages to build a chat history
- `create_message()` creates a chat history with a single message
- `append_message()` adds a new message to the end of the existing messages
- `prepend_message()` adds a new message to the beginning of the existing messages
- `insert_message()` inserts a new message at a specific index in the existing messages
    - by default, it inserts the message at the -1 (final) position
- `delete_message()` deletes a message at a specific index in the existing messages
    - positive and negative indices/positions are supported
    - if there are 5 messages, the positions are 1 (-5), 2 (-4), 3 (-3), 4 (-2), 5 (-1)

```{r eval=FALSE}
# create a chat history with one message
messages <- create_message(content = "Hi! How are you? (1ST MESSAGE)", role = "assistant")
# or simply, messages <- create_message("Hi! How are you?", "assistant")
messages[[1]]  # get 1st message

# append (add to the end) a new message to the existing messages
messages <- append_message("I'm good. How are you? (2ND MESSAGE)", "user", messages)
messages[[1]]  # get 1st message
messages[[2]]  # get 2nd message (newly added message)

# prepend (add to the beginning) a new message to the existing messages
messages <- prepend_message("I'm good. How are you? (0TH MESSAGE)", "user", messages)
messages[[1]]  # get 0th message (newly added message)
messages[[2]]  # get 1st message
messages[[3]]  # get 2nd message

# insert a new message at a specific index/position (2nd position in the example below)
# by default, the message is inserted at the end of the existing messages (position -1 is the end/default)
messages <- insert_message("I'm good. How are you? (BETWEEN 0 and 1 MESSAGE)", "user", messages, 2)
messages[[1]]  # get 0th message
messages[[2]]  # get between 0 and 1 message (newly added message)
messages[[3]]  # get 1st message
messages[[4]]  # get 2nd message

# delete a message at a specific index/position (2nd position in the example below)
messages <- delete_message(messages, 2)

# create a chat history with multiple messages
messages <- create_messages(
  create_message("You're a knowledgeable tour guide.", role = "system"),
  create_message("What is the capital of Australia?")  # default role is user
)
```

You can convert `data.frame`, `tibble`, or `data.table` objects to a `list()` of messages and vice versa with functions from base R or other popular libraries.

```{r eval=FALSE}
# create a list of messages
messages <- create_messages(
  create_message("You're a knowledgeable tour guide.", role = "system"),
  create_message("What is the capital of Australia?")
)

# convert to dataframe
df <- dplyr::bind_rows(messages)  # with dplyr library
df <- data.table::rbindlist(messages)  # with data.table library

# convert dataframe to list with apply, purrr functions
apply(df, 1, as.list)  # convert each row to a list with base R apply
purrr::transpose(df)  # with purrr library
```
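Putting these helpers together, a common pattern is a multi-turn conversation where each assistant reply is appended to the history before the next user turn. A minimal sketch (the model name and prompts are placeholders):

```{r eval=FALSE}
# a minimal sketch: keep a running chat history across turns by appending
# each assistant reply before adding the next user message
messages <- create_message("What is the capital of Australia?")
reply <- chat("llama3.1", messages, output = "text")

messages <- append_message(reply, "assistant", messages)
messages <- append_message("What is its population?", "user", messages)
chat("llama3.1", messages, output = "text")
```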
### Embeddings

Get the vector embedding of some prompt/text (see [API doc](https://github.com/ollama/ollama/blob/main/docs/api.md#generate-embeddings)). By default, the embeddings are normalized to length 1, which means the following:

- cosine similarity can be computed slightly faster using just a dot product
- cosine similarity and Euclidean distance will result in identical rankings

```{r eval=FALSE}
embed("llama3.1", "Hello, how are you?")

# don't normalize embeddings
embed("llama3.1", "Hello, how are you?", normalize = FALSE)
```

```{r eval=FALSE}
# get embeddings for similar prompts
e1 <- embed("llama3.1", "Hello, how are you?")
e2 <- embed("llama3.1", "Hi, how are you?")

# compute cosine similarity
sum(e1 * e2)  # not equal to 1
sum(e1 * e1)  # 1 (identical vectors/embeddings)

# non-normalized embeddings
e3 <- embed("llama3.1", "Hello, how are you?", normalize = FALSE)
e4 <- embed("llama3.1", "Hi, how are you?", normalize = FALSE)
```

### Parse `httr2_response` objects with `resp_process()`

`ollamar` uses the [`httr2` library](https://httr2.r-lib.org/index.html) to make HTTP requests to the Ollama server, so many functions in this library return an `httr2_response` object by default. You can either parse the output with `resp_process()` or use the `output` parameter in the function to specify the output format. Generally, the `output` parameter can be one of `"df"`, `"jsonlist"`, `"raw"`, `"resp"`, or `"text"`.

```{r eval=FALSE}
resp <- list_models(output = "resp")  # returns a httr2 response object
# Status: 200 OK
# Content-Type: application/json

# process the httr2 response object with the resp_process() function
resp_process(resp, "df")  # or list_models(output = "df")
resp_process(resp, "jsonlist")  # list  # or list_models(output = "jsonlist")
resp_process(resp, "raw")  # raw string  # or list_models(output = "raw")
resp_process(resp, "resp")  # returns the input httr2 response object  # or list_models() or list_models("resp")
resp_process(resp, "text")  # text vector  # or list_models("text")
```

## Advanced usage

### Tool calling

You can use [tool calling](https://ollama.com/blog/tool-support) with the `chat()` function with certain models such as Llama3.1. See also the [Python examples](https://github.com/ollama/ollama-python/blob/main/examples/tools.py).

First, define your tools as functions. Two example functions are shown below.

```{r eval=FALSE}
add_two_numbers <- function(x, y) {
  return(x + y)
}

multiply_two_numbers <- function(a, b) {
  return(a * b)
}

# each tool needs to be in a list
tool1 <- list(
  type = "function",
  "function" = list(
    name = "add_two_numbers",  # function name
    description = "add two numbers",
    parameters = list(
      type = "object",
      required = list("x", "y"),  # function parameters
      properties = list(
        x = list(class = "numeric", description = "first number"),
        y = list(class = "numeric", description = "second number")
      )
    )
  )
)

tool2 <- list(
  type = "function",
  "function" = list(
    name = "multiply_two_numbers",  # function name
    description = "multiply two numbers",
    parameters = list(
      type = "object",
      required = list("a", "b"),  # function parameters
      properties = list(
        a = list(class = "numeric", description = "first number"),
        b = list(class = "numeric", description = "second number")
      )
    )
  )
)
```

Then call the `chat()` function with the `tools` parameter set to a list of your tools.

Pass in a single tool.

```{r eval=FALSE}
msg <- create_message("what is three plus one?")
resp <- chat("llama3.1", msg, tools = list(tool1), output = "tools")
tool <- resp[[1]]  # get the first tool/function

# call the tool function with arguments: add_two_numbers(3, 1)
do.call(tool$name, tool$arguments)
```
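Note that `do.call()` above looks up the function by the name string the model returns. One pattern (not specific to `ollamar`) is to keep the R functions you exposed as tools in a named registry and dispatch every returned tool call through it; the sketch below reuses the `resp` object from the chunk above.

```{r eval=FALSE}
# a minimal sketch: dispatch returned tool calls from a named registry of
# the R functions defined above, instead of relying on the global environment
tool_registry <- list(
  add_two_numbers = add_two_numbers,
  multiply_two_numbers = multiply_two_numbers
)
lapply(resp, function(tool_call) {
  do.call(tool_registry[[tool_call$name]], tool_call$arguments)
})
```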
Pass in multiple tools. The model will pick the best tool to use based on the context of the message.

```{r eval=FALSE}
msg <- create_message("what is three multiplied by four?")
resp <- chat("llama3.1", msg, tools = list(tool1, tool2), output = "tools")
tool <- resp[[1]]  # get the first tool/function

# call the tool function with arguments: multiply_two_numbers(3, 4)
do.call(tool$name, tool$arguments)
```

Pass in multiple tools and get the model to use multiple tools. Note that LLM responses are inherently stochastic: sometimes the model might call only one tool, and sometimes it might call tools multiple times.

```{r eval=FALSE}
msg <- create_message("add three plus four. then multiply by ten")
resp <- chat("llama3.1", msg, tools = list(tool1, tool2), output = "tools")

# first tool/function: add_two_numbers(3, 4)
do.call(resp[[1]]$name, resp[[1]]$arguments)  # 7

# second tool/function: multiply_two_numbers(7, 10)
do.call(resp[[2]]$name, resp[[2]]$arguments)  # 70
```

### Structured outputs

The `chat()` and `generate()` functions support [structured outputs](https://ollama.com/blog/structured-outputs), making it possible to constrain a model's output to a specified format defined by a JSON schema (R list).

```{r eval=FALSE}
# define a JSON schema as a list to constrain a model's output
format <- list(
  type = "object",
  properties = list(
    name = list(type = "string"),
    capital = list(type = "string"),
    languages = list(
      type = "array",
      items = list(type = "string")
    )
  ),
  required = list("name", "capital", "languages")
)

generate("llama3.1", "tell me about Canada", output = "structured", format = format)

msg <- create_message("tell me about Canada")
chat("llama3.1", msg, format = format, output = "structured")
```

### Parallel requests

For the `generate()` and `chat()` endpoints/functions, you can specify `output = 'req'` so the functions return `httr2_request` objects instead of `httr2_response` objects.

```{r eval=FALSE}
prompt <- "Tell me a 10-word story"
req <- generate("llama3.1", prompt, output = "req")  # returns a httr2_request object
```

When you have multiple `httr2_request` objects in a list, you can make parallel requests with the `req_perform_parallel` function from the `httr2` library. See the [`httr2` documentation](https://httr2.r-lib.org/reference/req_perform_parallel.html) for details.

```{r eval=FALSE}
library(httr2)

prompt <- "Tell me a 5-word story"

# create 5 httr2_request objects that generate a response to the same prompt
reqs <- lapply(1:5, function(r) generate("llama3.1", prompt, output = "req"))

# make parallel requests and get responses
resps <- req_perform_parallel(reqs)  # list of httr2_response objects

# process the responses
sapply(resps, resp_process, "text")  # get responses as text
# [1] "She found him in Paris."         "She found the key upstairs."
# [3] "She found her long-lost sister." "She found love on Mars."
# [5] "She found the diamond ring."
```

Example: sentiment analysis with parallel requests using the `generate()` function.

```{r eval=FALSE}
library(httr2)
library(glue)
library(dplyr)

# text to classify
texts <- c('I love this product', 'I hate this product', 'I am neutral about this product')

# create httr2_request objects for each text, using the same system prompt
reqs <- lapply(texts, function(text) {
  prompt <- glue("Your only task/role is to evaluate the sentiment of product reviews, and your response should be one of the following: 'positive', 'negative', or 'other'.
Product review: {text}") generate("llama3.1", prompt, output = "req") }) # make parallel requests and get response resps <- req_perform_parallel(reqs) # list of httr2_request objects # process the responses sapply(resps, resp_process, "text") # get responses as text # [1] "Positive" "Negative." # [3] "'neutral' translates to... 'other'." ``` Example sentiment analysis with parallel requests with `chat()` function ```{r eval=FALSE} library(httr2) library(dplyr) # text to classify texts <- c('I love this product', 'I hate this product', 'I am neutral about this product') # create system prompt chat_history <- create_message("Your only task/role is to evaluate the sentiment of product reviews provided by the user. Your response should simply be 'positive', 'negative', or 'other'.", "system") # create httr2_request objects for each text, using the same system prompt reqs <- lapply(texts, function(text) { messages <- append_message(text, "user", chat_history) chat("llama3.1", messages, output = "req") }) # make parallel requests and get response resps <- req_perform_parallel(reqs) # list of httr2_request objects # process the responses bind_rows(lapply(resps, resp_process, "df")) # get responses as dataframes # # A tibble: 3 × 4 # model role content created_at # # 1 llama3.1 assistant Positive 2024-08-05T17:54:27.758618Z # 2 llama3.1 assistant negative 2024-08-05T17:54:27.657525Z # 3 llama3.1 assistant other 2024-08-05T17:54:27.657067Z ```