A small local LLM helper that runs directly in the shell.

3phedra e716ada415 Fix lore crash and add turboquant support (needs rocm) Co-authored-by: Copilot <copilot@github.com>		2026-05-11 01:29:17 +02:00

prompt — local LLM assistant

A versatile assistant powered by a local llama.cpp instance. Ask anything — general knowledge, research, medical questions, math, coding, or system administration. For factual/current topics, the model searches the web and reads pages. For system tasks, it can execute shell commands. No cloud, no API keys, no telemetry.

Runs as a systemd or runit service. Supports Debian/Ubuntu/Linux Mint (.deb), Arch Linux (PKGBUILD), Fedora (.rpm), and Void Linux.

Install

git clone https://git.crashcat.net/voibe/llm-assist && cd llm-assist
sudo ./install.sh

Runtime dependencies, per distribution:

Debian/Ubuntu/Mint:

sudo apt-get install curl jq lynx
# Optional: sudo apt-get install python3 python3-venv poppler-utils

Arch Linux:

sudo pacman -S curl jq lynx
# Optional: sudo pacman -S python python-pip poppler

Fedora:

sudo dnf install curl jq lynx
# Optional: sudo dnf install python3 python3-pip poppler-utils

Void Linux:

sudo xbps-install curl jq lynx
# Optional: sudo xbps-install python3 python3-pip poppler-utils

Or install a distro package:

# Debian/Ubuntu/Mint (.deb)
sudo dpkg -i prompt-llm_0.1.0-1_all.deb
sudo apt-get install -f

# Arch Linux (PKGBUILD)
cd pkg/arch && makepkg -si

# Fedora (.rpm)
sudo dnf install prompt-llm-0.1.0-1.noarch.rpm

Setup

1. Build llama.cpp

sudo ./setup-llama.sh build --rocm      # AMD GPU (7900 XTX, RX 7800, etc.)
sudo ./setup-llama.sh build --vulkan    # Vulkan (AMD/Intel/NVIDIA)
sudo ./setup-llama.sh build --cpu       # CPU only

This clones llama.cpp, builds with the selected backend, and installs llama-server, llama-cli, and llama-quantize to /usr/local/bin/. Missing build dependencies are installed automatically (apt, pacman, dnf, or xbps).
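
As a rough illustration of what selecting a backend involves, the sketch below maps the three options above to llama.cpp CMake flags. The function name `backend_cmake_flags` is hypothetical, and the flag names (`GGML_HIP`, `GGML_VULKAN`) reflect recent llama.cpp build documentation; setup-llama.sh may well do this differently.

```shell
# Hypothetical helper, not taken from setup-llama.sh: translate a backend
# option into the CMake flag that recent llama.cpp versions document.
backend_cmake_flags() {
    case "$1" in
        --rocm)   echo "-DGGML_HIP=ON" ;;      # AMD GPUs via ROCm/HIP
        --vulkan) echo "-DGGML_VULKAN=ON" ;;   # any Vulkan-capable GPU
        --cpu)    echo "" ;;                   # plain CPU build, no extra flag
        *)        echo "unknown backend: $1" >&2; return 1 ;;
    esac
}

# Print the configure command instead of running it.
echo "cmake -B build $(backend_cmake_flags --vulkan)"
```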

2. Download a model

Browse available quantizations:

sudo ./setup-llama.sh list-models Qwen/Qwen3.5-9B-Instruct-GGUF

Fetch one:

sudo ./setup-llama.sh fetch Qwen/Qwen3.5-9B-Instruct-GGUF qwen3.5-9b-instruct-q5_k_m.gguf

Or use a direct URL:

sudo ./setup-llama.sh fetch https://huggingface.co/Qwen/Qwen3.5-9B-Instruct-GGUF/resolve/main/qwen3.5-9b-instruct-q5_k_m.gguf

The first model downloaded is automatically linked as /var/lib/prompt/model.gguf.
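
The "first model wins" behaviour could be implemented along these lines. This is a minimal sketch, assuming the default is a plain symlink that is created only when none exists; `link_default_model` is a hypothetical name and both paths are passed in as arguments.

```shell
# Hypothetical sketch: link a downloaded model as the default only when
# no default exists yet, so later downloads do not replace it.
link_default_model() {
    model_file=$1     # e.g. /var/lib/prompt/some-model.gguf
    default_link=$2   # e.g. /var/lib/prompt/model.gguf
    # -e follows symlinks; -L additionally catches a dangling link.
    if [ -e "$default_link" ] || [ -L "$default_link" ]; then
        return 0      # a default is already in place; leave it alone
    fi
    ln -s "$model_file" "$default_link"
}
```

To switch the default later, replace the link explicitly with `ln -sf`.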

Recommended sizes:

Hardware	Model	VRAM/RAM
7900 XTX (24 GB)	Qwen3.5-9B-Instruct-Q5_K_M	~5.5 GB
iGPU / limited RAM	Qwen2.5-3B-Instruct-Q5_K_M	~2.5 GB
Minimal	Qwen2.5-1.5B-Instruct-Q5_K_M	~1.3 GB

Check everything:

sudo ./setup-llama.sh status

3. Configure

Edit /etc/prompt.conf. The defaults work for a 7900 XTX with the model at /var/lib/prompt/model.gguf.

Model selection — Three options (use one):

# Option 1: Just the filename (recommended)
PROMPT_MODEL="Qwen3.5-9B-Q6_K.gguf"

# Option 2: Full path (for models outside /var/lib/prompt/)
PROMPT_MODEL_PATH="/var/lib/prompt/model.gguf"

# Option 3: Use the default symlink (no config needed)
# ln -sf /var/lib/prompt/Qwen3.5-9B-Q6_K.gguf /var/lib/prompt/model.gguf

To see available models:

ls /var/lib/prompt/*.gguf
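
To make the three options concrete, here is a sketch of how a service script might pick the model file. The precedence (full path first, then filename, then the default symlink) is an assumption, as is the function name `resolve_model`; check the shipped service files for the actual order.

```shell
# Sketch only: the precedence below is assumed, not taken from the project.
resolve_model() {
    if [ -n "${PROMPT_MODEL_PATH:-}" ]; then
        # Option 2: an explicit full path
        echo "$PROMPT_MODEL_PATH"
    elif [ -n "${PROMPT_MODEL:-}" ]; then
        # Option 1: a filename inside /var/lib/prompt/
        echo "/var/lib/prompt/$PROMPT_MODEL"
    else
        # Option 3: the default symlink
        echo "/var/lib/prompt/model.gguf"
    fi
}
```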

Other key settings:

PROMPT_GPU_LAYERS="99"     # 99 = offload everything, 0 = CPU only
PROMPT_CONTEXT_SIZE="8192" # raise for long man page lookups
PROMPT_USER="_prompt"       # drop privileges (created by install.sh)

4. Enable the service

systemd (Linux Mint / Debian / Ubuntu / Arch / Fedora):

sudo systemctl enable --now prompt

Verify:

sudo systemctl status prompt
curl -s http://127.0.0.1:8088/health

runit (Void Linux):

sudo ln -s /etc/sv/prompt /var/service/

Verify:

sudo sv status prompt
curl -s http://127.0.0.1:8088/health
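
Right after the service starts, the health check may fail for a while because the model is still loading. A small retry loop helps in scripts; this is a sketch with hypothetical function names (`retry`, `server_healthy`) that uses the address and port shown above.

```shell
# Run a command until it succeeds or the attempts run out.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
    attempts=$1
    delay=$2
    shift 2
    while [ "$attempts" -gt 0 ]; do
        "$@" && return 0
        attempts=$((attempts - 1))
        # Sleep only if another attempt follows.
        if [ "$attempts" -gt 0 ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Probe the health endpoint; -f makes curl return non-zero on HTTP errors.
server_healthy() {
    curl -fs "http://127.0.0.1:8088/health" >/dev/null
}

# Example: retry 30 1 server_healthy && echo "server is up"
```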

Usage

# General knowledge
prompt what are the side effects of ibuprofen
prompt explain the difference between TCP and UDP
prompt what is 100 cm in feet and inches

# Research (uses web search)
prompt -y what is the latest stable linux kernel version
prompt -y compare rust vs go for backend services in 2025

# System tasks (executes shell commands)
prompt convert all jpg files to png at 88% quality
prompt rename all files replacing spaces with underscores recursively
prompt set up a python venv and install flask
prompt enable iommu in my grub

Piped context

Feed command output, files, or logs as additional context via stdin:

man sed | prompt "delete blank lines from a file"
cat /etc/fstab | prompt "add a tmpfs for /tmp"
dmesg | tail -30 | prompt "why is my usb drive not mounting"

Or with -c for files:

prompt -c /etc/default/grub "enable iommu"
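
A client can tell piped input from an interactive terminal with `[ -t 0 ]`. The sketch below shows one way to combine piped context with the question; the function names and the "Context:/Question:" layout are assumptions, not the format the real client sends.

```shell
# Read stdin only when something was piped in (stdin is not a terminal).
read_piped_context() {
    if [ -t 0 ]; then
        return 0          # interactive session: nothing to read
    fi
    cat
}

# Hypothetical message layout: context first, then the question.
build_user_message() {
    question=$1
    context=$(read_piped_context)
    if [ -n "$context" ]; then
        printf 'Context:\n%s\n\nQuestion: %s\n' "$context" "$question"
    else
        printf '%s\n' "$question"
    fi
}
```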

Auto-confirm lookups

The model can request shell commands to gather current information (package queries, man pages, configs, web searches). By default it asks for confirmation:

  → prompt-search "librewolf latest release" [Y/n]

Use -y to auto-approve:

prompt -y "what version of librewolf is in the repos"
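
The confirmation step can be sketched as follows. `AUTO_YES` is a hypothetical variable standing in for the -y flag, and `confirm_command` is an illustrative name; the prompt goes to stderr so that stdout stays reserved for the answer.

```shell
# Ask before running a command. An empty reply counts as "yes" ([Y/n]).
confirm_command() {
    cmd=$1
    # AUTO_YES is hypothetical; it models the effect of -y.
    if [ "${AUTO_YES:-0}" = "1" ]; then
        return 0
    fi
    printf '  → %s [Y/n] ' "$cmd" >&2
    reply=""
    # End of input with no answer means "no", so a closed stdin never approves.
    if ! read -r reply && [ -z "$reply" ]; then
        return 1
    fi
    case "$reply" in
        ""|[Yy]|[Yy][Ee][Ss]) return 0 ;;
        *) return 1 ;;
    esac
}
```

A client that also accepts piped context would likely need to read the reply from /dev/tty rather than stdin.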

Web lookup

The model has access to three helpers for fetching current upstream information:

prompt-search "query": searches the web via SearXNG and returns titles, URLs, and snippets.
prompt-mcp URL: fetches a URL with a headless Chromium browser (Playwright). Renders JavaScript and gets past many bot checks and CAPTCHAs. Falls back to prompt-fetch if Playwright is unavailable.
prompt-fetch URL: fetches a URL with curl + lynx. Lightweight, and good for simple text pages.

The system prompt instructs the model to always look up current data rather than guessing from its training cutoff. When you ask about package versions, upstream releases, or anything time-sensitive, it will search the web first.
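
The fallback from the browser-based helper to the lightweight one can be pictured as a small wrapper. This is a sketch: `fetch_page` is a hypothetical name, and prompt-mcp already performs the fallback internally according to the description above.

```shell
# Hypothetical wrapper: prefer prompt-mcp, fall back to prompt-fetch.
fetch_page() {
    url=$1
    # Capture the output first so a failed attempt prints nothing.
    if command -v prompt-mcp >/dev/null 2>&1 && page=$(prompt-mcp "$url"); then
        printf '%s\n' "$page"
        return 0
    fi
    prompt-fetch "$url"
}
```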

Options

Flag	Description
`-y, --yes`	Auto-confirm shell lookups
`-t, --temp NUM`	Temperature, 0.0–1.0 (default: 0.2)
`-n, --tokens NUM`	Max response tokens (default: 1024)
`-c, --context FILE`	Include file contents as context
`-h, --help`	Help
`-V, --version`	Version
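
For readers who want to see how such flags are typically handled in a shell client, here is a minimal parsing sketch. The defaults come from the table above; the variable names and the function `parse_args` are hypothetical, and -h/-V are left out for brevity.

```shell
# Parse the documented flags; everything after them is the question.
parse_args() {
    AUTO_YES=0
    TEMP=0.2
    MAX_TOKENS=1024
    CONTEXT_FILE=""
    QUESTION=""
    while [ $# -gt 0 ]; do
        case "$1" in
            -y|--yes)
                AUTO_YES=1 ;;
            -t|--temp)
                [ $# -ge 2 ] || { echo "missing value for $1" >&2; return 2; }
                TEMP=$2; shift ;;
            -n|--tokens)
                [ $# -ge 2 ] || { echo "missing value for $1" >&2; return 2; }
                MAX_TOKENS=$2; shift ;;
            -c|--context)
                [ $# -ge 2 ] || { echo "missing value for $1" >&2; return 2; }
                CONTEXT_FILE=$2; shift ;;
            --)
                shift; QUESTION="$*"; break ;;
            -*)
                echo "unknown option: $1" >&2; return 2 ;;
            *)
                QUESTION="$*"; break ;;   # first non-flag word starts the question
        esac
        shift
    done
    return 0
}
```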

Environment variables

Variable	Default	Description
`PROMPT_HOST`	`127.0.0.1`	Server address
`PROMPT_PORT`	`8088`	Server port
`PROMPT_CONF`	`/etc/prompt.conf`	Config file path
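
In shell, defaults like these are usually applied with `${VAR:-default}` expansion. A sketch, assuming the server URL is built from the two variables (`prompt_base_url` is a hypothetical name):

```shell
# Build the server URL, falling back to the documented defaults.
prompt_base_url() {
    printf 'http://%s:%s\n' "${PROMPT_HOST:-127.0.0.1}" "${PROMPT_PORT:-8088}"
}

# Example: PROMPT_PORT=9000 prompt what is my kernel version
```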

How it works

  ┌──────────┐      ┌──────────────┐      ┌──────────────┐
  │ prompt   │─────▶│ llama-server │─────▶│ GGUF model   │
  │ (client) │◀─────│ (service)    │◀─────│              │
  └────┬─────┘      └──────────────┘      └──────────────┘
       │
       │  if model requests >>>SHELL: cmd<<<
       │
       ▼
  ┌──────────────────────────────────────┐
  │ bash -c "cmd"  (with user approval)  │
  │                                      │
  │  prompt-search ──▶ SearXNG           │
  │  prompt-mcp    ──▶ Chromium browser  │
  │  prompt-fetch  ──▶ curl + lynx       │
  │  apt/pacman/dnf ─▶ local repos       │
  │  man, cat, ... ──▶ local system      │
  └──────────────────────────────────────┘

1. prompt sends your question and the system prompt to the local llama-server.
2. The system prompt instructs the model to be terse and direct (man-page style).
3. If the model needs current information, it emits >>>SHELL: command<<<.
4. The client shows the command and asks for confirmation (or auto-confirms with -y).
5. The command output is fed back to the model for a follow-up response.
6. The final answer is printed to stdout; status and tool output go to stderr.

This means prompt "list packages" > out.txt captures only the answer.
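
Two pieces of this loop are easy to sketch in shell: pulling the requested command out of the model's reply, and keeping status messages off stdout. Both function names are hypothetical, and the sed pattern assumes at most one marker per line; the real client may parse the marker differently.

```shell
# Print the first command requested via >>>SHELL: ...<<<, or nothing.
extract_shell_request() {
    sed -n 's/.*>>>SHELL: \(.*\)<<<.*/\1/p' | head -n 1
}

# Status messages go to stderr so that redirecting stdout captures
# only the final answer.
status() {
    printf '%s\n' "$*" >&2
}
```

With this split, `prompt "list packages" > out.txt` still shows progress on the terminal while out.txt receives only the answer.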

Service management