Llama (ChatGPT @Home)
An on-prem LLM that fits in two files. - Intro to Large Language Models / Llama In My Living Room
Llama vs ChatGPT
- ChatGPT vs Llama2 - A Detailed Statistical Comparison
How To Run Llama 2 on Anything
- Run a Chatgpt-like Chatbot on a Single GPU with ROCm
- Hardware requirements for Llama 2 #425 - llama-2-13b-chat.ggmlv3.q4_0.bin, llama-2-13b-chat.ggmlv3.q8_0.bin and llama-2-70b-chat.ggmlv3.q4_0.bin from TheBloke.
Download the quantized GGUF with huggingface-cli, then run it interactively with llama.cpp:
huggingface-cli download TheBloke/Llama-2-13B-chat-GGUF llama-2-13b-chat.Q4_K_M.gguf
./main -ngl 32 -m /home/yves/.cache/huggingface/hub/models--TheBloke--Llama-2-13B-chat-GGUF/snapshots/4458acc949de0a9914c3eab623904d4fe999050a/llama-2-13b-chat.Q4_K_M.gguf --color --temp 0.7 --repeat_penalty 1.1 -n -1 -i -ins
- Cheapest hardware to run Llama 2 70B - Anything with 64GB of memory will run a quantized 70B model. What else you need depends on what speed is acceptable for you (rough size estimate below).
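A back-of-the-envelope check (my own estimate, not from the linked thread): a Q4_K_M-style quantization averages roughly 4.5 bits per weight, so the 70B weights alone take about 40GB, which is why 64GB of total memory is enough.
# ~4.5 bits/weight is an assumed average for 4-bit K-quants
echo "70 * 4.5 / 8" | bc -l   # ≈ 39GB of weights, before KV cache and runtime overhead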
Llama.cpp
The main goal of llama.cpp is to run the LLaMA model using 4-bit integer quantization on a MacBook
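A rough sketch of the convert-and-quantize flow; script names and flags have moved around between llama.cpp releases, so treat the exact invocations as assumptions to check against the README:
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make
# convert the original HF checkpoint to a f16 GGUF, then quantize it to 4-bit
python convert.py /path/to/llama-2-13b-chat --outtype f16 --outfile llama-2-13b-chat.f16.gguf
./quantize llama-2-13b-chat.f16.gguf llama-2-13b-chat.Q4_K_M.gguf Q4_K_M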
Download Model
Already quantized/converted model
Running Chat
see
- Offloading 0 layers to GPU #1956 - use -ngl 100 to force using VRAM (see the sketch after this list)
- Computer Hardware Required to Run LLaMA AI Model Locally (GPU, CPU, RAM, SSD)
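A minimal chat run that pushes every layer to VRAM; the model path is a placeholder, and 100 just means "more layers than the model has":
# offload all layers to the GPU
./main -m llama-2-13b-chat.Q4_K_M.gguf -ngl 100 --color -i -ins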
UI ?
- gpt-llama.cpp - Replace OpenAI’s GPT APIs with llama.cpp’s supported models locally (see the request sketch after this list)
- LlamaGPT - self-hosted, offline, ChatGPT-like chatbot, powered by Llama 2. 100% private, with no data leaving your device.
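Since gpt-llama.cpp exposes an OpenAI-style API locally, a plain HTTP call along these lines should work; the host, port and model name here are assumptions, not something taken from the projects above:
# assumed local OpenAI-style endpoint; adjust host/port to whatever the server prints at startup
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-2-13b-chat", "messages": [{"role": "user", "content": "Hello"}]}'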
see also
- Let’s build GPT: from scratch, in code, spelled out.
- Let’s build the GPT Tokenizer
- tiktokenizer - online token extraction with syntax coloring
- Byte Pair Encoding (BPE) algorithm - repeatedly merge the most frequent pair of tokens in a sequence
- Let’s reproduce GPT-2 (124M)
- Llama from scratch, or how to implement a paper without crying
- The Geometry of Truth: Do LLMs Know True and False
- whisper.cpp - High-performance inference of OpenAI’s Whisper automatic speech recognition (ASR) model
Written on December 3, 2023, Last update on September 5, 2024
Tags: LLM, NN, test, algorithm