This tutorial will guide you through setting up Ollama, a platform for serving large language models, on a GPU Bnode using Brightnode. Ollama makes it easy to run, create, and customize models, but not everyone has access to the compute power needed to run them. With Brightnode, you can spin up and manage GPUs in the cloud, and its templates come with preinstalled libraries, which makes it quick to get Ollama running. In the following tutorial, you’ll set up a Bnode on a GPU, install Ollama, serve a model, and interact with it from the CLI.

Prerequisites

The tutorial assumes you have a Brightnode account with credits. No other prior knowledge is needed to complete this tutorial.

Step 1: Start a PyTorch Template on Brightnode

You will create a new Bnode with the PyTorch template. In this step, you will set overrides to configure Ollama.
  1. Log in to your Brightnode account and choose + GPU Bnode.
  2. Choose a GPU Bnode like A40.
  3. From the available templates, select the latest PyTorch template.
  4. Select Customize Deployment.
    1. Add the port 11434 to the list of exposed ports. This port is used by Ollama for HTTP API requests.
    2. Add the following environment variable to your Bnode to allow Ollama to bind to the HTTP port:
      • Key: OLLAMA_HOST
      • Value: 0.0.0.0
  5. Select Set Overrides, Continue, then Deploy.
This setting configures Ollama to listen on all network interfaces, enabling external access through the exposed port. For detailed instructions on setting environment variables, refer to the Ollama FAQ documentation. Once the Bnode is up and running, you’ll have access to a terminal within the Brightnode interface.
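If you want to double-check the override after deployment, you can inspect the variable from the web terminal once the Bnode is up. This is an optional sanity check, assuming a standard shell environment:
echo $OLLAMA_HOST
# should print 0.0.0.0; if it is empty, set it for the current session:
export OLLAMA_HOST=0.0.0.0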

Step 2: Install Ollama

Now that your Bnode is running, you can log in to the web terminal. The web terminal is a powerful way to interact with your Bnode.
  1. Select Connect and choose Start Web Terminal.
  2. Make note of the Username and Password, then select Connect to Web Terminal.
  3. Enter your username and password.
  4. To ensure Ollama can automatically detect and utilize your GPU, run the following commands:
apt update
apt install lshw
  5. Run the following command to install Ollama and start the server in the background:
(curl -fsSL https://ollama.com/install.sh | sh && ollama serve > ollama.log 2>&1) &
This command fetches the Ollama installation script and executes it, setting up Ollama on your Bnode. The ollama serve part starts the Ollama server, making it ready to serve AI models. Now that your Ollama server is running on your Bnode, add a model.
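Optionally, you can verify that the GPU is visible and that the server started cleanly before pulling a model. This is a minimal check, assuming the NVIDIA drivers from the PyTorch template are available and that the log file path matches the command above:
nvidia-smi                                  # confirm the GPU is detected
tail -n 20 ollama.log                       # check the server log for errors
curl http://127.0.0.1:11434/api/version     # the server should return its version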

Step 3: Run an AI Model with Ollama

To run an AI model using Ollama, pass the model name to the ollama run command:
ollama run [model name]
# ollama run llama2
# ollama run mistral
Replace [model name] with the name of the AI model you wish to deploy. For a complete list of models, see the Ollama Library. This command pulls the model and runs it, making it accessible for inference. You can begin interacting with the model directly from your web terminal. Optionally, you can set up an HTTP API request to interact with Ollama. This is covered in the next step.
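You can also pass a prompt directly on the command line instead of starting an interactive session, which is handy for a quick test (mistral is used here only as an example model):
ollama run mistral "Why is the sky blue?"
# prints the model's response and returns to the shell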

Step 4: Interact with Ollama via HTTP API

With Ollama set up and running, you can now interact with it using HTTP API requests. In step 1.4, you configured Ollama to listen on all network interfaces. This means you can use your Bnode as a server to receive requests.

Get a list of models

To list the local models available in Ollama, you can use the following GET request:
curl https://{POD_ID}-11434.proxy.brightnode.net/api/tags
# curl https://cmko4ns22b84xo-11434.proxy.brightnode.net/api/tags
Replace {POD_ID} with your actual Bnode ID.
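The response is a JSON object with a models array. If jq is available on your Bnode (it may need to be installed with apt install jq), you can reduce the output to just the model names:
curl -s https://{POD_ID}-11434.proxy.brightnode.net/api/tags | jq '.models[].name'
# example output: "mistral:latest"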
Getting a list of available models is great, but how do you send an HTTP request to your Bnode?

Make requests

To make an HTTP request against your Bnode, you can use the Ollama interface with your Bnode ID.
curl -X POST https://{POD_ID}-11434.proxy.brightnode.net/api/generate -d '{
  "model": "mistral",
  "prompt":"Here is a story about llamas eating grass"
 }'
Replace {POD_ID} with your actual Bnode ID. Because port 11434 is exposed, you can make requests to your Bnode using the curl command. For more information on constructing HTTP requests and other operations you can perform with the Ollama API, consult the Ollama API documentation.
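By default, /api/generate streams the response as a sequence of JSON objects. If you prefer a single JSON response, you can disable streaming with the stream field defined by the Ollama API:
curl -X POST https://{POD_ID}-11434.proxy.brightnode.net/api/generate -d '{
  "model": "mistral",
  "prompt": "Here is a story about llamas eating grass",
  "stream": false
}'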

Additional considerations

This tutorial provides a foundational understanding of setting up and using Ollama on a GPU Bnode with Brightnode. By following these steps, you can deploy AI models efficiently and interact with them through HTTP API requests, harnessing the power of GPU acceleration for your AI projects.