Llama (ChatGPT @Home)
An on-prem LLM that fits in two files. - Intro to Large Language Models / Llama In My Living Room
Llama vs ChatGPT
- ChatGPT vs Llama2 - A Detailed Statistical Comparison
How To Run Llama 2 on Anything
- Run a Chatgpt-like Chatbot on a Single GPU with ROCm
- Hardware requirements for Llama 2 #425 - llama-2-13b-chat.ggmlv3.q4_0.bin, llama-2-13b-chat.ggmlv3.q8_0.bin and llama-2-70b-chat.ggmlv3.q4_0.bin from TheBloke.
Download the quantized GGUF with huggingface-cli, then run it interactively with llama.cpp:
huggingface-cli download TheBloke/Llama-2-13B-chat-GGUF llama-2-13b-chat.Q4_K_M.gguf
./main -ngl 32 -m /home/yves/.cache/huggingface/hub/models--TheBloke--Llama-2-13B-chat-GGUF/snapshots/4458acc949de0a9914c3eab623904d4fe999050a/llama-2-13b-chat.Q4_K_M.gguf --color --temp 0.7 --repeat_penalty 1.1 -n -1 -i -ins
- Cheapest hardware to run Llama 2 70B - Anything with 64GB of memory will run a quantized 70B model. What else you need depends on what speed is acceptable for you (rough size estimate below).
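A back-of-the-envelope check (my own estimate, not from the linked thread): a Q4_K_M-style quantization averages roughly 4.5 bits per weight, so the 70B weights alone take about 40GB, which is why 64GB of total memory is enough.
# ~4.5 bits/weight is an assumed average for 4-bit K-quants
echo "70 * 4.5 / 8" | bc -l   # ≈ 39GB of weights, before KV cache and runtime overhead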
Llama.cpp
The main goal of llama.cpp is to run the LLaMA model using 4-bit integer quantization on a MacBook
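A rough sketch of the convert-and-quantize flow; script names and flags have moved around between llama.cpp releases, so treat the exact invocations as assumptions to check against the README:
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make
# convert the original HF checkpoint to a f16 GGUF, then quantize it to 4-bit
python convert.py /path/to/llama-2-13b-chat --outtype f16 --outfile llama-2-13b-chat.f16.gguf
./quantize llama-2-13b-chat.f16.gguf llama-2-13b-chat.Q4_K_M.gguf Q4_K_M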
Download Model
Already quantized/converted model
Running Chat
see
- Offloading 0 layers to GPU #1956 - use -ngl 100 to force using VRAM (see the sketch after this list)
- Computer Hardware Required to Run LLaMA AI Model Locally (GPU, CPU, RAM, SSD)
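A minimal chat run that pushes every layer to VRAM; the model path is a placeholder, and 100 just means "more layers than the model has":
# offload all layers to the GPU
./main -m llama-2-13b-chat.Q4_K_M.gguf -ngl 100 --color -i -ins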
UI ?
- gpt-llama.cpp - Replace OpenAI’s GPT APIs with llama.cpp’s supported models locally (see the request sketch after this list)
- LlamaGPT - self-hosted, offline, ChatGPT-like chatbot, powered by Llama 2. 100% private, with no data leaving your device.
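Since gpt-llama.cpp exposes an OpenAI-style API locally, a plain HTTP call along these lines should work; the host, port and model name here are assumptions, not something taken from the projects above:
# assumed local OpenAI-style endpoint; adjust host/port to whatever the server prints at startup
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-2-13b-chat", "messages": [{"role": "user", "content": "Hello"}]}'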
see also
- Let’s build GPT: from scratch, in code, spelled out.
- Let’s build the GPT Tokenizer
- tiktokenizer - online token extraction with syntax coloring
- Byte Pair Encoding (BPE) algorithm - repeatedly merge the most frequent pair of tokens in a sequence
- Let’s reproduce GPT-2 (124M)
- Llama from scratch, or how to implement a paper without crying
- The Geometry of Truth: Do LLMs Know True and False
- whisper.cpp - High-performance inference of OpenAI’s Whisper automatic speech recognition (ASR) model
Written on December 3, 2023, Last update on September 5, 2024
Tags: LLM, NN, test, algorithm