A small local LLM helper that runs directly inside the shell.

prompt — local LLM assistant

A versatile assistant powered by a local llama.cpp instance. Ask anything — general knowledge, research, medical questions, math, coding, or system administration. For factual/current topics, the model searches the web and reads pages. For system tasks, it can execute shell commands. No cloud, no API keys, no telemetry.

Runs as a systemd or runit service. Supports Debian/Ubuntu/Linux Mint (.deb), Arch Linux (PKGBUILD), Fedora (.rpm), and Void Linux.

Install

git clone https://git.crashcat.net/voibe/llm-assist && cd llm-assist
sudo ./install.sh

Runtime dependencies, per distribution:

Debian/Ubuntu/Mint:

sudo apt-get install curl jq lynx
# Optional: sudo apt-get install python3 python3-venv poppler-utils

Arch Linux:

sudo pacman -S curl jq lynx
# Optional: sudo pacman -S python python-pip poppler

Fedora:

sudo dnf install curl jq lynx
# Optional: sudo dnf install python3 python3-pip poppler-utils

Void Linux:

sudo xbps-install curl jq lynx
# Optional: sudo xbps-install python3 python3-pip poppler-utils

Or install a distro package:

# Debian/Ubuntu/Mint (.deb)
sudo dpkg -i prompt-llm_0.1.0-1_all.deb
sudo apt-get install -f

# Arch Linux (PKGBUILD)
cd pkg/arch && makepkg -si

# Fedora (.rpm)
sudo dnf install prompt-llm-0.1.0-1.noarch.rpm

Setup

1. Build llama.cpp

sudo ./setup-llama.sh build --rocm      # AMD GPU (7900 XTX, RX 7800, etc.)
sudo ./setup-llama.sh build --vulkan    # Vulkan (AMD/Intel/NVIDIA)
sudo ./setup-llama.sh build --cpu       # CPU only

This clones llama.cpp, builds with the selected backend, and installs llama-server, llama-cli, and llama-quantize to /usr/local/bin/. Missing build dependencies are installed automatically (apt, pacman, dnf, or xbps).
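As a rough sketch of what a build step like this amounts to, the script below maps each backend flag to a llama.cpp CMake option and prints the commands instead of running them. The function names are hypothetical, and `GGML_HIP` and `GGML_VULKAN` are llama.cpp's backend switches as far as I know; setup-llama.sh may pass different or additional options.

```shell
#!/usr/bin/env bash
# Hypothetical dry run: prints the commands a llama.cpp build would need.
# Nothing is cloned or compiled here.

backend_cmake_flags() {
    case "$1" in
        --rocm)   echo "-DGGML_HIP=ON" ;;
        --vulkan) echo "-DGGML_VULKAN=ON" ;;
        --cpu)    echo "" ;;
        *)        echo "unknown backend: $1" >&2; return 1 ;;
    esac
}

print_build_plan() {
    local flags
    flags=$(backend_cmake_flags "$1") || return 1
    echo "git clone https://github.com/ggml-org/llama.cpp"
    echo "cmake -S llama.cpp -B llama.cpp/build -DCMAKE_BUILD_TYPE=Release${flags:+ $flags}"
    echo "cmake --build llama.cpp/build --config Release -j"
    echo "install -m 0755 llama.cpp/build/bin/llama-server llama.cpp/build/bin/llama-cli llama.cpp/build/bin/llama-quantize /usr/local/bin/"
}

print_build_plan --vulkan
```

A real ROCm build typically needs more than the single switch shown here (the HIP toolchain and a GPU target, for instance), so treat the plan as an outline.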

2. Download a model

Browse available quantizations:

sudo ./setup-llama.sh list-models Qwen/Qwen3.5-9B-Instruct-GGUF

Fetch one:

sudo ./setup-llama.sh fetch Qwen/Qwen3.5-9B-Instruct-GGUF qwen3.5-9b-instruct-q5_k_m.gguf

Or use a direct URL:

sudo ./setup-llama.sh fetch https://huggingface.co/Qwen/Qwen3.5-9B-Instruct-GGUF/resolve/main/qwen3.5-9b-instruct-q5_k_m.gguf

The first model downloaded is automatically linked as /var/lib/prompt/model.gguf.
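The "first download wins" behaviour can be pictured with the sketch below. `link_default_model` is a hypothetical helper, not a function from setup-llama.sh, and the demo runs in a temporary directory instead of /var/lib/prompt.

```shell
#!/usr/bin/env bash
# Hypothetical helper: point model.gguf at a freshly downloaded file,
# but only if no default link exists yet.

link_default_model() {
    local dir="$1" file="$2"
    local link="$dir/model.gguf"
    # -e follows symlinks; -L also catches a dangling link
    if [ -e "$link" ] || [ -L "$link" ]; then
        echo "keeping existing $link" >&2
        return 0
    fi
    ln -s "$file" "$link"
}

# Demo in a throwaway directory
demo=$(mktemp -d)
touch "$demo/first.gguf" "$demo/second.gguf"
link_default_model "$demo" first.gguf
link_default_model "$demo" second.gguf
readlink "$demo/model.gguf"   # prints first.gguf
rm -rf "$demo"
```

Later downloads leave the link alone, which is why switching models is a matter of configuration or re-linking by hand.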

Recommended sizes:

| Hardware           | Model                        | VRAM/RAM |
|--------------------|------------------------------|----------|
| 7900 XTX (24 GB)   | Qwen3.5-9B-Instruct-Q5_K_M   | ~5.5 GB  |
| iGPU / limited RAM | Qwen2.5-3B-Instruct-Q5_K_M   | ~2.5 GB  |
| Minimal            | Qwen2.5-1.5B-Instruct-Q5_K_M | ~1.3 GB  |

Check everything:

sudo ./setup-llama.sh status

3. Configure

Edit /etc/prompt.conf. The defaults work for a 7900 XTX with the model at /var/lib/prompt/model.gguf.

Model selection (pick one of three options):

# Option 1: Just the filename (recommended)
PROMPT_MODEL="Qwen3.5-9B-Q6_K.gguf"

# Option 2: Full path (also works for models outside /var/lib/prompt/)
PROMPT_MODEL_PATH="/var/lib/prompt/model.gguf"

# Option 3: Use the default symlink (no config needed)
# ln -sf /var/lib/prompt/Qwen3.5-9B-Q6_K.gguf /var/lib/prompt/model.gguf

To see available models:

ls /var/lib/prompt/*.gguf
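One way the three options could be resolved into a single path is sketched below. The precedence (full path, then filename, then symlink) is an assumption, and `resolve_model` and `MODEL_DIR` are hypothetical names; check the service script for the real order.

```shell
#!/usr/bin/env bash
# Hypothetical resolver for the three model options.
# Assumed precedence: PROMPT_MODEL_PATH, then PROMPT_MODEL, then the symlink.

resolve_model() {
    local model_dir="${MODEL_DIR:-/var/lib/prompt}"
    if [ -n "${PROMPT_MODEL_PATH:-}" ]; then
        echo "$PROMPT_MODEL_PATH"
    elif [ -n "${PROMPT_MODEL:-}" ]; then
        echo "$model_dir/$PROMPT_MODEL"
    else
        echo "$model_dir/model.gguf"
    fi
}

# Resolve with only the filename option set (in a subshell, so nothing leaks)
( PROMPT_MODEL="Qwen3.5-9B-Q6_K.gguf"; resolve_model )
```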

Other key settings:

PROMPT_GPU_LAYERS="99"     # 99 = offload everything, 0 = CPU only
PROMPT_CONTEXT_SIZE="8192" # raise for long man page lookups
PROMPT_USER="_prompt"       # drop privileges (created by install.sh)
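To make the settings more concrete, here is a sketch of how they might translate into a llama-server command line. `-m`, `-ngl`, `-c`, `--host` and `--port` are llama-server options; the function name and the way the service actually assembles its arguments are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical mapping from prompt.conf settings to llama-server flags.
# Prints one argument per line so callers can inspect or join them.

build_server_args() {
    printf '%s\n' \
        -m "${PROMPT_MODEL_PATH:-/var/lib/prompt/model.gguf}" \
        -ngl "${PROMPT_GPU_LAYERS:-99}" \
        -c "${PROMPT_CONTEXT_SIZE:-8192}" \
        --host "${PROMPT_HOST:-127.0.0.1}" \
        --port "${PROMPT_PORT:-8088}"
}

# Show the command line that would be run
echo "llama-server $(build_server_args | tr '\n' ' ')"
```

Setting PROMPT_GPU_LAYERS="0" in this sketch yields `-ngl 0`, which matches the CPU-only case described in the comment above.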

4. Enable the service

systemd (Linux Mint / Debian / Ubuntu / Arch / Fedora):

sudo systemctl enable --now prompt

Verify:

sudo systemctl status prompt
curl -s http://127.0.0.1:8088/health

runit (Void Linux):

sudo ln -s /etc/sv/prompt /var/service/

Verify:

sudo sv status prompt
curl -s http://127.0.0.1:8088/health
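Large models can take a while to load, so a single health request right after enabling the service may fail. The sketch below polls the endpoint until it answers with success. `wait_for_health` is a hypothetical helper, and the retry count and delay are arbitrary defaults.

```shell
#!/usr/bin/env bash
# Hypothetical readiness check: poll /health until the server answers
# with a success status or the attempts run out.

wait_for_health() {
    local url="${1:-http://127.0.0.1:8088/health}"
    local attempts="${2:-30}" delay="${3:-1}" i=1
    while [ "$i" -le "$attempts" ]; do
        # -f makes curl fail on HTTP errors, e.g. while the model is loading
        if curl -sf -o /dev/null "$url"; then
            echo "ready after $i attempt(s)" >&2
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "server not healthy after $attempts attempts" >&2
    return 1
}

# Example: wait_for_health http://127.0.0.1:8088/health 60 1
```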

Usage

# General knowledge
prompt what are the side effects of ibuprofen
prompt explain the difference between TCP and UDP
prompt what is 100 cm in feet and inches

# Research (uses web search)
prompt -y what is the latest stable linux kernel version
prompt -y compare rust vs go for backend services in 2025

# System tasks (executes shell commands)
prompt convert all jpg files to png at 88% quality
prompt rename all files replacing spaces with underscores recursively
prompt set up a python venv and install flask
prompt enable iommu in my grub

Piped context

Feed command output, files, or logs as additional context via stdin:

man sed | prompt "delete blank lines from a file"
cat /etc/fstab | prompt "add a tmpfs for /tmp"
dmesg | tail -30 | prompt "why is my usb drive not mounting"

Or with -c for files:

prompt -c /etc/default/grub "enable iommu"
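A client can pick up both kinds of context roughly as follows. This is a sketch under assumptions: `collect_context` is a hypothetical name, and the real client may order or label the pieces differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of gathering extra context:
# stdin when it is not a terminal, plus any file passed with -c.

collect_context() {
    local file="${1:-}"
    # [ -t 0 ] is true when stdin is a terminal, i.e. nothing was piped in
    if [ ! -t 0 ]; then
        cat
    fi
    if [ -n "$file" ] && [ -r "$file" ]; then
        cat "$file"
    fi
}

printf 'UUID=abcd / ext4 defaults 0 1\n' | collect_context
```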

Auto-confirm lookups

The model can request shell commands to gather current information (package queries, man pages, configs, web searches). By default it asks for confirmation:

  → prompt-search "librewolf latest release" [Y/n]

Use -y to auto-approve:

prompt -y "what version of librewolf is in the repos"
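The confirmation step could look something like the sketch below. `confirm` and `AUTO_YES` are hypothetical names standing in for the client's own logic and the -y flag; treating an empty reply as "yes" follows the capital Y in the [Y/n] prompt.

```shell
#!/usr/bin/env bash
# Hypothetical confirmation prompt with a default of "yes".
# AUTO_YES stands in for the -y flag.

confirm() {
    local cmd="$1" reply=""
    if [ "${AUTO_YES:-0}" = "1" ]; then
        return 0
    fi
    # The prompt goes to stderr so stdout stays clean for the answer
    printf '  → %s [Y/n] ' "$cmd" >&2
    # Treat end-of-input with no reply as a refusal
    if ! read -r reply && [ -z "$reply" ]; then
        return 1
    fi
    case "$reply" in
        ""|[Yy]|[Yy][Ee][Ss]) return 0 ;;
        *) return 1 ;;
    esac
}

if echo "n" | confirm 'prompt-search "librewolf latest release"' 2>/dev/null; then
    echo "approved"
else
    echo "declined"
fi
```

A client that also accepts piped context would need to read the reply from /dev/tty rather than stdin, since stdin is already taken by the pipe.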

Web lookup

The model has access to three helpers for fetching current upstream information:

  • prompt-search "query": web search via SearXNG; returns titles, URLs, and snippets.
  • prompt-mcp URL: fetches a URL with a headless Chromium browser (Playwright). Renders JavaScript and can get past bot checks and CAPTCHAs. Falls back to prompt-fetch if Playwright is unavailable.
  • prompt-fetch URL: lightweight URL fetch (curl + lynx). Good for simple text pages.

The system prompt instructs the model to always look up current data rather than guessing from its training cutoff. When you ask about package versions, upstream releases, or anything time-sensitive, it will search the web first.
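The fallback from the browser-based fetcher to the lightweight one might be structured as in this sketch. `have_playwright`, `pick_fetcher` and `fetch_plain` are hypothetical names, and the Playwright check is only one plausible way to detect it; the lynx flags used (`-dump`, `-nolist`, `-stdin`) are standard lynx options.

```shell
#!/usr/bin/env bash
# Hypothetical fallback chain for page fetching. have_playwright is an
# illustrative check, not the project's actual detection logic.

have_playwright() {
    command -v python3 >/dev/null 2>&1 &&
        python3 -c 'import playwright' >/dev/null 2>&1
}

pick_fetcher() {
    if have_playwright; then
        echo "prompt-mcp"
    else
        echo "prompt-fetch"
    fi
}

# The lightweight path: download with curl, render to text with lynx
fetch_plain() {
    curl -fsSL --max-time 20 "$1" | lynx -dump -nolist -stdin
}

echo "would fetch with: $(pick_fetcher)"
```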

Options

| Flag               | Description                         |
|--------------------|-------------------------------------|
| -y, --yes          | Auto-confirm shell lookups          |
| -t, --temp NUM     | Temperature, 0.0-1.0 (default: 0.2) |
| -n, --tokens NUM   | Max response tokens (default: 1024) |
| -c, --context FILE | Include file contents as context    |
| -h, --help         | Help                                |
| -V, --version      | Version                             |
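For readers curious how such flags are usually handled in a shell client, here is a minimal sketch. The variable names are illustrative, the defaults follow the table, and -h and -V are left out for brevity; the real client's parser may differ.

```shell
#!/usr/bin/env bash
# Hypothetical parser for the documented flags. Variable names are
# illustrative; defaults are the documented ones.

parse_args() {
    AUTO_YES=0 TEMP=0.2 TOKENS=1024 CONTEXT_FILE="" QUESTION=""
    while [ $# -gt 0 ]; do
        case "$1" in
            -y|--yes)     AUTO_YES=1 ;;
            # ${2:?...} aborts with an error if the value is missing
            -t|--temp)    TEMP="${2:?missing value for $1}"; shift ;;
            -n|--tokens)  TOKENS="${2:?missing value for $1}"; shift ;;
            -c|--context) CONTEXT_FILE="${2:?missing value for $1}"; shift ;;
            --)           shift; break ;;
            -*)           echo "unknown option: $1" >&2; return 2 ;;
            *)            break ;;
        esac
        shift
    done
    # Everything left over is the question, quoted or not
    QUESTION="$*"
}

parse_args -y -t 0.7 what is the latest stable linux kernel version
echo "yes=$AUTO_YES temp=$TEMP tokens=$TOKENS question=$QUESTION"
```

Stopping at the first non-option word is what lets the question be typed without quotes, as in the usage examples.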

Environment variables

| Variable    | Default          | Description      |
|-------------|------------------|------------------|
| PROMPT_HOST | 127.0.0.1        | Server address   |
| PROMPT_PORT | 8088             | Server port      |
| PROMPT_CONF | /etc/prompt.conf | Config file path |
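A small sketch of how these variables can be applied with their defaults, assuming the client simply falls back when a variable is unset; `server_url` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: build the server URL from the environment,
# falling back to the documented defaults.

server_url() {
    local host="${PROMPT_HOST:-127.0.0.1}"
    local port="${PROMPT_PORT:-8088}"
    echo "http://${host}:${port}"
}

server_url                     # uses defaults unless the variables are set
PROMPT_PORT=9000 server_url    # one-off override for a single call
```

The same one-off form works on the command line, for example `PROMPT_PORT=9000 prompt explain the difference between TCP and UDP`.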

How it works

  ┌──────────┐      ┌──────────────┐      ┌──────────────┐
  │ prompt   │─────▶│ llama-server │─────▶│ GGUF model   │
  │ (client) │◀─────│ (service)    │◀─────│              │
  └────┬─────┘      └──────────────┘      └──────────────┘
       │
       │  if model requests >>>SHELL: cmd<<<
       │
       ▼
  ┌──────────────────────────────────────┐
  │ bash -c "cmd"  (with user approval)  │
  │                                      │
  │  prompt-search ──▶ SearXNG           │
  │  prompt-mcp    ──▶ Chromium browser  │
  │  prompt-fetch  ──▶ curl + lynx       │
  │  apt/pacman/dnf ──▶ local repos      │
  │  man, cat, ... ──▶ local system      │
  └──────────────────────────────────────┘
  1. prompt sends your question + system prompt to the local llama-server
  2. The system prompt instructs the model to be terse and direct (man-page style)
  3. If the model needs current info, it emits >>>SHELL: command<<<
  4. The client shows the command and asks for confirmation (or auto-confirms with -y)
  5. The command output is fed back to the model for a follow-up response
  6. The final answer is printed to stdout (status/tool output goes to stderr)

This means prompt "list packages" > out.txt captures only the answer.
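The marker handling and the stdout/stderr split can be sketched as follows. `extract_shell_cmd` is a hypothetical function that handles one marker per line; it is not the project's actual parser.

```shell
#!/usr/bin/env bash
# Hypothetical parser for the >>>SHELL: cmd<<< marker. It handles one
# marker per line and is not the project's actual implementation.

extract_shell_cmd() {
    # Print the command between the markers; print nothing if absent
    sed -n 's/.*>>>SHELL: \(.*\)<<<.*/\1/p' | head -n 1
}

reply='Checking. >>>SHELL: prompt-search "librewolf latest release"<<<'
cmd=$(printf '%s\n' "$reply" | extract_shell_cmd)

if [ -n "$cmd" ]; then
    echo "model requested: $cmd" >&2   # status goes to stderr
else
    printf '%s\n' "$reply"             # final answer goes to stdout
fi
```

Because only the final answer is written to stdout, redirecting the sketch's output to a file would leave the status line on the terminal.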

Service management

systemd (Linux Mint / Debian / Ubuntu / Arch / Fedora):

sudo systemctl status prompt      # check status
sudo systemctl restart prompt     # restart after config change
sudo systemctl stop prompt        # stop
journalctl -u prompt -f           # follow logs

runit (Void Linux):

sudo sv status prompt    # check status
sudo sv restart prompt   # restart after config change
sudo sv stop prompt      # stop

Logs (runit): /var/log/prompt/current

License

MIT