prompt — local LLM assistant
A versatile assistant powered by a local llama.cpp instance. Ask anything — general knowledge, research, medical questions, math, coding, or system administration. For factual/current topics, the model searches the web and reads pages. For system tasks, it can execute shell commands. No cloud, no API keys, no telemetry.
Runs as a systemd or runit service. Supports Debian/Ubuntu/Linux Mint (.deb), Arch Linux (PKGBUILD), Fedora (.rpm), and Void Linux.
Install
git clone https://git.crashcat.net/voibe/llm-assist && cd llm-assist
sudo ./install.sh
Debian/Ubuntu/Mint:
sudo apt-get install curl jq lynx
# Optional: sudo apt-get install python3 python3-venv poppler-utils
Arch Linux:
sudo pacman -S curl jq lynx
# Optional: sudo pacman -S python python-pip poppler
Fedora:
sudo dnf install curl jq lynx
# Optional: sudo dnf install python3 python3-pip poppler-utils
Void Linux:
sudo xbps-install curl jq lynx
# Optional: sudo xbps-install python3 python3-pip poppler-utils
Or install a distro package:
# Debian/Ubuntu/Mint (.deb)
sudo dpkg -i prompt-llm_0.1.0-1_all.deb
sudo apt-get install -f
# Arch Linux (PKGBUILD)
cd pkg/arch && makepkg -si
# Fedora (.rpm)
sudo dnf install prompt-llm-0.1.0-1.noarch.rpm
Setup
1. Build llama.cpp
sudo ./setup-llama.sh build --rocm # AMD GPU (7900 XTX, RX 7800, etc.)
sudo ./setup-llama.sh build --vulkan # Vulkan (AMD/Intel/NVIDIA)
sudo ./setup-llama.sh build --cpu # CPU only
This clones llama.cpp, builds with the selected backend, and installs llama-server, llama-cli, and llama-quantize to /usr/local/bin/. Missing build dependencies are installed automatically (apt, pacman, dnf, or xbps).
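After the build, it can be useful to confirm that the three binaries ended up on PATH. The snippet below is a minimal sketch of such a check; `missing_bins` is a hypothetical helper name and is not part of setup-llama.sh.

```shell
# Hypothetical helper (not part of setup-llama.sh): print every name
# given as an argument that cannot be found on PATH.
missing_bins() {
    local bin
    for bin in "$@"; do
        command -v "$bin" >/dev/null 2>&1 || printf '%s\n' "$bin"
    done
}

# Example call:
#   missing_bins llama-server llama-cli llama-quantize
```

Empty output means all named binaries were found; any printed name still needs to be installed or added to PATH.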
2. Download a model
Browse available quantizations:
sudo ./setup-llama.sh list-models Qwen/Qwen3.5-9B-Instruct-GGUF
Fetch one:
sudo ./setup-llama.sh fetch Qwen/Qwen3.5-9B-Instruct-GGUF qwen3.5-9b-instruct-q5_k_m.gguf
Or use a direct URL:
sudo ./setup-llama.sh fetch https://huggingface.co/Qwen/Qwen3.5-9B-Instruct-GGUF/resolve/main/qwen3.5-9b-instruct-q5_k_m.gguf
The first model downloaded is automatically linked as /var/lib/prompt/model.gguf.
Recommended sizes:
| Hardware | Model | VRAM/RAM |
|---|---|---|
| 7900 XTX (24 GB) | Qwen3.5-9B-Instruct-Q5_K_M | ~5.5 GB |
| iGPU / limited RAM | Qwen2.5-3B-Instruct-Q5_K_M | ~2.5 GB |
| Minimal | Qwen2.5-1.5B-Instruct-Q5_K_M | ~1.3 GB |
Check everything:
sudo ./setup-llama.sh status
3. Configure
Edit /etc/prompt.conf. The defaults work for a 7900 XTX with the model at /var/lib/prompt/model.gguf.
Model selection — Three options (use one):
# Option 1: Just the filename (recommended)
PROMPT_MODEL="Qwen3.5-9B-Q6_K.gguf"
# Option 2: Full path (for models outside /var/lib/prompt/)
PROMPT_MODEL_PATH="/var/lib/prompt/model.gguf"
# Option 3: Use the default symlink (no config needed)
# ln -sf /var/lib/prompt/Qwen3.5-9B-Q6_K.gguf /var/lib/prompt/model.gguf
To see available models:
ls /var/lib/prompt/*.gguf
Other key settings:
PROMPT_GPU_LAYERS="99" # 99 = offload everything, 0 = CPU only
PROMPT_CONTEXT_SIZE="8192" # raise for long man page lookups
PROMPT_USER="_prompt" # drop privileges (created by install.sh)
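The three model-selection options can be pictured as a simple lookup order. The following is only a sketch under assumptions: the function name `resolve_model`, the `MODEL_DIR` variable, and the precedence (full path first, then filename, then the default symlink) are illustrative and may differ from what the service script actually does.

```shell
# Hypothetical sketch: decide which GGUF file to load.
# MODEL_DIR and resolve_model are illustrative names; the precedence
# shown here is an assumption, not confirmed by the project.
MODEL_DIR="${MODEL_DIR:-/var/lib/prompt}"

resolve_model() {
    if [ -n "${PROMPT_MODEL_PATH:-}" ]; then
        # Option 2: explicit full path
        printf '%s\n' "$PROMPT_MODEL_PATH"
    elif [ -n "${PROMPT_MODEL:-}" ]; then
        # Option 1: bare filename, looked up in the model directory
        printf '%s\n' "$MODEL_DIR/$PROMPT_MODEL"
    else
        # Option 3: fall back to the default symlink
        printf '%s\n' "$MODEL_DIR/model.gguf"
    fi
}
```

Setting only one of the two variables avoids any ambiguity about which one wins.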
4. Enable the service
systemd (Linux Mint / Debian / Ubuntu / Arch / Fedora):
sudo systemctl enable --now prompt
Verify:
sudo systemctl status prompt
curl -s http://127.0.0.1:8088/health
runit (Void Linux):
sudo ln -s /etc/sv/prompt /var/service/
Verify:
sudo sv status prompt
curl -s http://127.0.0.1:8088/health
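Loading a large model can take a while, so the health endpoint may not answer successfully right after the service starts. A minimal polling sketch, assuming the default address and port from this README; `probe` and `wait_for_health` are hypothetical names.

```shell
# Hypothetical sketch: poll the health endpoint until it answers.
# HEALTH_URL assumes the default host and port shown in this README.
HEALTH_URL="http://127.0.0.1:8088/health"

probe() {
    # -f makes curl exit non-zero on HTTP error responses,
    # which a server may return while the model is still loading.
    curl -sf --max-time 2 "$HEALTH_URL" >/dev/null
}

wait_for_health() {
    # $1: number of attempts (default 30), $2: delay in seconds (default 1)
    local tries="${1:-30}" delay="${2:-1}" i=0
    while [ "$i" -lt "$tries" ]; do
        if probe; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# Example call:
#   wait_for_health 60 2 && echo "server is up"
```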
Usage
# General knowledge
prompt what are the side effects of ibuprofen
prompt explain the difference between TCP and UDP
prompt what is 100 cm in feet and inches
# Research (uses web search)
prompt -y what is the latest stable linux kernel version
prompt -y compare rust vs go for backend services in 2025
# System tasks (executes shell commands)
prompt convert all jpg files to png at 88% quality
prompt rename all files replacing spaces with underscores recursively
prompt set up a python venv and install flask
prompt enable iommu in my grub
Piped context
Feed command output, files, or logs as additional context via stdin:
man sed | prompt "delete blank lines from a file"
cat /etc/fstab | prompt "add a tmpfs for /tmp"
dmesg | tail -30 | prompt "why is my usb drive not mounting"
Or with -c for files:
prompt -c /etc/default/grub "enable iommu"
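To illustrate how piped input and -c might be combined, here is a small sketch. It is not the client's actual code: `build_context` is a hypothetical name, and the ordering (file first, then stdin) is an assumption.

```shell
# Hypothetical sketch: collect extra context for a question.
# $1 is an optional file (as passed with -c); stdin is read only
# when it is not a terminal, i.e. when something was piped in.
build_context() {
    if [ -n "${1:-}" ] && [ -r "$1" ]; then
        cat "$1"
    fi
    if [ ! -t 0 ]; then
        cat
    fi
}

# Example call:
#   dmesg | tail -30 | build_context /etc/fstab
```

Checking `[ -t 0 ]` keeps the function from blocking on a terminal when nothing is piped.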
Auto-confirm lookups
The model can request shell commands to gather current information (package queries, man pages, configs, web searches). By default it asks for confirmation:
→ prompt-search "librewolf latest release" [Y/n]
Use -y to auto-approve:
prompt -y "what version of librewolf is in the repos"
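The confirmation step can be sketched as a small shell function. This is an illustration only: `confirm` and the `AUTO_YES` variable (standing in for -y) are hypothetical names, and the real client may handle input differently.

```shell
# Hypothetical sketch of a [Y/n] confirmation.
# $1 is the command the model wants to run; AUTO_YES=1 stands in for -y.
confirm() {
    if [ "${AUTO_YES:-0}" = "1" ]; then
        return 0
    fi
    # The question goes to stderr so stdout stays reserved for the answer.
    printf '→ %s [Y/n] ' "$1" >&2
    local reply=""
    read -r reply || true
    case "$reply" in
        [nN]|[nN][oO]) return 1 ;;   # explicit refusal
        *) return 0 ;;               # empty input or anything else means yes
    esac
}
```

Defaulting to yes on empty input mirrors the capital Y in the [Y/n] hint.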
Web lookup
The model has access to three helpers for fetching current upstream information:
- prompt-search "query": web search via SearXNG; returns titles, URLs, and snippets.
- prompt-mcp URL: fetches a URL using a headless Chromium browser (Playwright). Renders JavaScript, bypasses bot checks and CAPTCHAs. Falls back to prompt-fetch if Playwright is unavailable.
- prompt-fetch URL: lightweight URL fetch (curl + lynx). Good for simple text pages.
The system prompt instructs the model to always look up current data rather than guessing from its training cutoff. When you ask about package versions, upstream releases, or anything time-sensitive, it will search the web first.
Options
| Flag | Description |
|---|---|
| -y, --yes | Auto-confirm shell lookups |
| -t, --temp NUM | Temperature, 0.0–1.0 (default: 0.2) |
| -n, --tokens NUM | Max response tokens (default: 1024) |
| -c, --context FILE | Include file contents as context |
| -h, --help | Help |
| -V, --version | Version |
Environment variables
| Variable | Default | Description |
|---|---|---|
| PROMPT_HOST | 127.0.0.1 | Server address |
| PROMPT_PORT | 8088 | Server port |
| PROMPT_CONF | /etc/prompt.conf | Config file path |
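As an illustration of how these defaults might be applied, the sketch below builds the server URL with shell parameter expansion. `server_url` is a hypothetical helper name, not something the client is known to define.

```shell
# Hypothetical sketch: build the server URL from the environment,
# falling back to the documented defaults when a variable is unset or empty.
server_url() {
    printf 'http://%s:%s\n' "${PROMPT_HOST:-127.0.0.1}" "${PROMPT_PORT:-8088}"
}

# Example call:
#   PROMPT_PORT=9000 server_url
```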
How it works
┌──────────┐      ┌──────────────┐      ┌──────────────┐
│  prompt  │─────▶│ llama-server │─────▶│  GGUF model  │
│ (client) │◀─────│  (service)   │◀─────│              │
└────┬─────┘      └──────────────┘      └──────────────┘
     │
     │  if model requests >>>SHELL: cmd<<<
     │
     ▼
┌──────────────────────────────────────┐
│ bash -c "cmd"  (with user approval)  │
│                                      │
│ prompt-search  ──▶ SearXNG           │
│ prompt-mcp     ──▶ Chromium browser  │
│ prompt-fetch   ──▶ curl + lynx       │
│ apt/pacman/dnf ──▶ local repos       │
│ man, cat, ...  ──▶ local system      │
└──────────────────────────────────────┘
- prompt sends your question and the system prompt to the local llama-server
- The system prompt instructs the model to be terse and direct (man-page style)
- If the model needs current info, it emits >>>SHELL: command<<<
- The client shows the command and asks for confirmation (or auto-confirms with -y)
- The command output is fed back to the model for a follow-up response
- The final answer is printed to stdout (status/tool output goes to stderr)
This means prompt "list packages" > out.txt captures only the answer.
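The marker handling described above could look roughly like the following. This is a sketch rather than the client's real parser: `extract_shell_cmd` is a hypothetical name, and it assumes one marker per line with a single space after the colon.

```shell
# Hypothetical sketch: pull the command out of a >>>SHELL: ...<<< marker.
# Prints the command from the first line that contains a marker,
# or nothing if the reply holds no marker.
extract_shell_cmd() {
    printf '%s\n' "$1" | sed -n 's/.*>>>SHELL: \(.*\)<<<.*/\1/p' | head -n 1
}

# Example call:
#   cmd="$(extract_shell_cmd "$model_reply")"
#   [ -n "$cmd" ] && echo "model requested: $cmd" >&2
```

Writing the status line to stderr, as in the example call, matches the stdout/stderr split described above.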
Service management
systemd (Linux Mint / Debian / Ubuntu / Arch / Fedora):
sudo systemctl status prompt # check status
sudo systemctl restart prompt # restart after config change
sudo systemctl stop prompt # stop
journalctl -u prompt -f # follow logs
runit (Void Linux):
sudo sv status prompt # check status
sudo sv restart prompt # restart after config change
sudo sv stop prompt # stop
Logs (runit): /var/log/prompt/current
License
MIT