1. Download and install the Ollama LLM runtime environment. After installation, the server can be reached at http://127.0.0.1:11434/.
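To confirm that the server is listening after installation, the check can be scripted. The following is a minimal sketch, assuming the default address given above. The function name `ollama_is_up` and the timeout are illustrative choices, not part of Ollama.

```python
import urllib.error
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434/"  # default address from step 1

# The server is local, so bypass any system-wide HTTP proxy settings.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def ollama_is_up(url: str = OLLAMA_URL, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers with status 200 at `url`."""
    try:
        with _opener.open(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: treat as "not running".
        return False


if __name__ == "__main__":
    print("Ollama reachable:", ollama_is_up())
```

The function returns False instead of raising when nothing is listening, so it can be used as a guard before the later steps.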

3. Show the list of installed models. The list should be empty.

ollama list
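The list of installed models is also available over HTTP. As a sketch, assuming the server from step 1 is running and that the `/api/tags` endpoint returns a JSON object with a `models` array (as described in the Ollama API documentation), the model names can be read as follows. The helper names are illustrative.

```python
import json
import urllib.request

TAGS_URL = "http://127.0.0.1:11434/api/tags"  # assumed default address


def model_names(payload: dict) -> list[str]:
    """Extract the model names from a decoded /api/tags response."""
    # `models` may be missing or null when nothing is installed.
    return [m["name"] for m in (payload.get("models") or [])]


def fetch_installed_models(url: str = TAGS_URL) -> list[str]:
    """Query the local server and return the installed model names."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return model_names(json.load(resp))
```

Calling `fetch_installed_models()` against a fresh installation should return an empty list, matching the empty output of the command above.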

4. Download the llama3.2 LLM and DeepSeek-V3 (DeepSeek-V3 requires about 404 GB of disk space and 413 GB of RAM).

ollama pull llama3.2
ollama pull deepseek-v3

The current versions of the Llama models are listed on the Meta website.
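Because of the disk figure quoted for DeepSeek-V3, it can be worth checking free space before starting the download. This is a minimal sketch using only the Python standard library. The function name is illustrative, and the path `"."` is a placeholder, since Ollama may store models on a different drive. RAM is not checked here because the standard library has no portable call for it.

```python
import shutil

GB = 10**9  # decimal gigabyte


def has_free_disk(path: str, required_gb: float) -> bool:
    """Return True if the filesystem containing `path` has enough free space."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * GB


if __name__ == "__main__":
    # 404 GB is the disk figure quoted for deepseek-v3 in step 4.
    print("Enough space for deepseek-v3:", has_free_disk(".", 404))
```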

5. Start llama3.2.

ollama run llama3.2

The language model can be stopped with “Ctrl + d” or with the “/bye” command.

6. Display model details for llama3.2.

ollama show llama3.2

Parameter: Description
architecture: Specifies the architecture of the model. The architecture defines the structure of the neural network. LLaMA is a family of transformer models.
parameters: Shows the number of model parameters. The model has 3.2B (3.2 billion) parameters. The parameters are the weights and biases of the model.
context length: Specifies the maximum length of the context (in tokens) that the model can consider during processing. The value is 131072 (131,072 tokens). A longer context length allows the model to analyze longer texts, documents or conversations without losing relevant information.
quantization: Specifies the quantization method used. Here it is Q4_K_M. Quantization is a technique to reduce the model size by lowering the precision of the model parameters (e.g. from 32-bit to 4-bit).
size: The disk space required to store the model.
download name: The name of the model.
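The effect of quantization on model size can be illustrated with simple arithmetic: storage is roughly the parameter count times the bits per weight, divided by 8 to get bytes. The sketch below applies this to the 3.2 billion parameters reported above. It is an approximation under the assumption that every weight uses the same bit width. Mixed schemes such as Q4_K_M use somewhat more than 4 bits per weight on average, so the real file is larger than the 4-bit estimate.

```python
def estimated_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal GB: params * bits / 8 bytes."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    params = 3.2e9  # parameter count reported for llama3.2
    for bits in (32, 16, 4):
        print(f"{bits:>2}-bit: {estimated_size_gb(params, bits):.1f} GB")
```

For 3.2 billion parameters this gives about 12.8 GB at 32-bit precision and about 1.6 GB at 4-bit precision.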

7. Show the models that are currently running (for example, llama3.x instances).

ollama ps

8. Stop the Ollama server.

Both Ollama processes can be terminated via Task Manager or from the Windows command line. Replace <PID> with the process ID reported by tasklist.

tasklist | findstr ollama
taskkill /PID <PID> /F

9. Uninstall the Ollama model.

ollama rm llama3.2

10. REST call via Postman

Request Type: POST
Content-Type: application/json
Request Body: { "model": "llama3.2", "prompt": "What is the capital of Germany?" }
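The same request can be sent without Postman. The following is a minimal sketch using the Python standard library. It assumes the server from step 1 and the `/api/generate` endpoint of the Ollama REST API, which is not named in the settings above. The `"stream": False` field is an addition to the request body so that the server replies with a single JSON object instead of a stream of lines. The function names are illustrative.

```python
import json
import urllib.request

GENERATE_URL = "http://127.0.0.1:11434/api/generate"  # assumed endpoint


def build_generate_request(model: str, prompt: str,
                           url: str = GENERATE_URL) -> urllib.request.Request:
    """Build the POST request with a JSON body, mirroring the Postman setup."""
    # "stream": False is added here; it is not part of the original body.
    body = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> str:
    """Send the request and return the generated text."""
    req = build_generate_request(model, prompt)
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)["response"]
```

With the server running and the model pulled, `generate("llama3.2", "What is the capital of Germany?")` returns the model's answer as a string. Building the request is kept separate from sending it so the payload can be inspected without a running server.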