
Set up a Raspberry Pi 5 as a private clipboard AI assistant with the AI HAT+ 2

2026-01-23
11 min read

Convert a Raspberry Pi 5 + AI HAT+ 2 into a local clipboard AI that sanitizes, reformats, and keeps snippets off the cloud.

Stop losing sensitive snippets: turn a Raspberry Pi 5 + AI HAT+ 2 into a private, on‑device clipboard AI

If you’re a creator, publisher, or developer tired of scattered clips, accidental leaks, and cloud services siphoning your drafts and API keys, this step‑by‑step 2026 guide shows how to convert a Raspberry Pi 5 paired with an AI HAT+ 2 into a local LLM clipboard assistant that generates, reformats, and sanitizes snippets, keeping sensitive data off the cloud.

Why build a private clipboard AI on the edge in 2026?

By late 2025 and into 2026, the edge AI landscape has matured: compact, quantized models run well on ARM devices accelerated by hobbyist NPUs; toolchains like ggml and ARM‑optimized runtimes deliver multi‑token throughput; and privacy concerns plus tighter data rules have pushed teams to run inference locally. For content creators, that means you can:

  • Keep sensitive drafts and credentials off third‑party servers.
  • Automate formatting and sanitization before a snippet goes into a CMS, editor, or Slack.
  • Integrate local AI with editors, browsers, and automation for fast, repeatable workflows.
Local inference is no longer experimental: it’s practical for everyday productivity.

What you’ll build

A small on‑device system that watches the clipboard, forwards content to a local LLM endpoint on the Pi, and returns sanitized, reformatted, or expanded snippets. The system is:

  • Fully local — model and API run on the Pi + AI HAT+ 2.
  • Secure — snippet store encrypted; service bound to local network or loopback.
  • Extensible — integrates with VS Code, browser extensions, and Slack workflow adapters.

Prerequisites & hardware checklist

  • Raspberry Pi 5 (8GB or 16GB recommended)
  • AI HAT+ 2 (NPU accelerator module for the Pi 5)
  • Fast microSD (A2) or NVMe boot drive (for performance)
  • USB‑C power supply (official Pi 5 rated 5A recommended)
  • Optional: case with cooling and active fan
  • Network access for initial package installs (can be restricted later)

High‑level architecture

  1. Pi 5 + AI HAT+ 2 runs an ARM‑optimized runtime (e.g., llama.cpp/LocalAI or a vendor runtime) with a quantized local model.
  2. A small API (FastAPI or simple Flask) exposes an endpoint on localhost that accepts clipboard text and a processing directive (sanitize, format, expand template).
  3. A background clipboard watcher on your workstation (or the Pi’s desktop) sends the selection to the Pi endpoint and receives the processed snippet.
  4. Snippets optionally stored encrypted in a local SQLite/SQLCipher DB for recall and audit.
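
The glue between these pieces is one small JSON contract. A sketch of the round trip (the /process route and field names are the ones implemented in Step 3):

Request:  {"text": "<clipboard contents>", "mode": "sanitize"}
Response: {"output": "<processed snippet>"}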

Step 1 — Prepare the OS and drivers (quick checklist)

Use a 64‑bit OS for best performance. In 2026 I recommend Raspberry Pi OS (64‑bit) or Ubuntu 24.04/24.10 with the vendor NPU drivers for AI HAT+ 2.

  1. Flash OS to microSD or NVMe using Raspberry Pi Imager or dd.
  2. Boot, create a user, enable SSH, and update packages:
sudo apt update && sudo apt upgrade -y
sudo reboot
  3. Install vendor NPU drivers for the AI HAT+ 2. Follow the HAT’s quickstart: typically a package or install script that registers the kernel driver and installs an acceleration runtime. Example (vendor pattern):
curl -sSL https://vendor.example/ai-hat2/install.sh | sudo bash

Note: Replace vendor URL with the official AI HAT+ 2 supplier instructions. Confirm driver compatibility with your OS kernel (2026 drivers usually support mainline kernels).
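
Before moving on, it’s worth confirming the kernel actually sees the accelerator. Module and device names are vendor‑specific (the grep patterns below are placeholders to adapt), but the check follows a standard pattern:

# Check whether the NPU kernel module is loaded (module name varies by vendor)
lsmod | grep -i npu
# Scan boot messages for the HAT's driver registering
sudo dmesg | grep -iE 'hat|npu'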

Step 2 — Install the local LLM runtime

There are two practical approaches: a lightweight C/C++ runtime (llama.cpp derivatives) or a managed runtime with a REST interface (LocalAI, GGML server). I’ll show the LocalAI approach for convenience and a direct llama.cpp fallback.

Option A — LocalAI (REST service)

  1. Install Docker (recommended) or build from source. Docker is simpler for dependency isolation:
sudo apt install -y docker.io docker-compose
sudo usermod -aG docker $USER   # log out and back in for the group change to apply
  2. Create a docker compose file for LocalAI and mount model storage:
mkdir ~/localai && cd ~/localai
cat > docker-compose.yml <<EOF
version: '3.7'
services:
  localai:
    image: ghcr.io/go-skynet/localai:latest
    restart: unless-stopped
    ports:
      - "8080:8080"
    volumes:
      - ./models:/app/models
      - ./config:/app/config
EOF
  3. Download a quantized, ARM‑friendly model into ./models. Pick size by RAM availability (1.3B/3B/7B are common). Use GGUF‑quantized weights (GGUF superseded the older ggml format).
  4. Start the service and confirm it responds locally:
docker-compose up -d
curl http://localhost:8080/v1/models

Option B — llama.cpp directly (low‑level)

  1. Build or install a prebuilt ARM binary of llama.cpp with NEON/OpenBLAS or vendor acceleration backends enabled. Follow the project README for compile flags.
  2. Run inference with a quantized model file via the CLI, or expose it over HTTP: llama.cpp ships a built‑in llama-server, so there’s no need to hand‑roll a wrapper (see the sketch below).
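
A minimal sketch of that path, assuming a GGUF model already sits in ~/models (build commands and binary paths can shift between llama.cpp releases, so treat this as a pattern, not gospel):

git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp
cmake -B build && cmake --build build --config Release -j
# llama.cpp ships an OpenAI-compatible HTTP server; bind it to loopback only
./build/bin/llama-server -m ~/models/model-q4_k_m.gguf --host 127.0.0.1 --port 8080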

Step 3 — Build the clipboard assistant (API + sanitizer)

We’ll implement a simple FastAPI service that proxies requests to the local LLM and performs sanitization transformations before and after inference. The core ideas are:

  • Sanitize incoming text (mask API keys, emails, IPs) with regex rules.
  • Add templating and expansion instructions (e.g., convert bullet list to HTML, shorten to N words).
  • Optionally log a hashed audit trail (not the raw snippet) for compliance.

Example: minimal FastAPI clipboard endpoint

python3 -m venv venv && source venv/bin/activate
pip install fastapi uvicorn requests pycryptodome pyperclip
# app.py
from fastapi import FastAPI
import re
import requests

app = FastAPI()

# Basic regex sanitizers: mask emails and common API-key prefixes
SANITIZERS = [
    (re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"), "[email]"),
    (re.compile(r"\b(?:AKIA|AIza)[A-Za-z0-9_-]{16,40}\b"), "[api_key]"),
]

def sanitize(text: str) -> str:
    for rx, repl in SANITIZERS:
        text = rx.sub(repl, text)
    return text

@app.post('/process')
async def process(payload: dict):
    text = payload.get('text', '')
    mode = payload.get('mode', 'sanitize')
    safe = sanitize(text)
    # LocalAI exposes an OpenAI-compatible completions endpoint; "your_model"
    # must match a model name configured in LocalAI's models directory
    llm_req = {"model": "your_model", "prompt": f"{safe}\n\nProcess mode: {mode}"}
    r = requests.post('http://localhost:8080/v1/completions', json=llm_req, timeout=30)
    out = r.json().get('choices', [{}])[0].get('text', '')
    return {"output": out}

Run it in the foreground for testing (uvicorn has no daemon mode; Step 6 runs it as a service):

uvicorn app:app --host 127.0.0.1 --port 5002
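
With the service up, a quick smoke test from the Pi itself (the exact output depends on your model):

curl -s -X POST http://127.0.0.1:5002/process \
  -H 'Content-Type: application/json' \
  -d '{"text": "ping me at alice@example.com", "mode": "sanitize"}'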

Step 4 — Clipboard watcher and desktop integration

Decide where clipboard capture runs: on your workstation (preferred for multi‑device users) or on the Pi’s desktop. I’ll show a cross‑platform Python watcher that posts selections to the Pi's API.

# watcher.py
import time

import pyperclip
import requests

API = 'http://pi.local:5002/process'  # the Pi's mDNS name; adjust for your network
last = None

while True:
    txt = pyperclip.paste()
    # Skip empty clips and anything already processed (including our own output)
    if txt and txt != last:
        try:
            r = requests.post(API, json={'text': txt, 'mode': 'sanitize'}, timeout=30)
            out = r.json().get('output')
        except requests.RequestException:
            out = None
        if out:
            pyperclip.copy(out)  # replace the clipboard with the sanitized snippet
            last = out           # remember our own output so the loop doesn't re-trigger
        else:
            last = txt
    time.sleep(0.8)  # short cooldown between polls

Run this on your laptop. For headless servers, a lightweight CLI tool can push selections from a local app or browser extension.

Step 5 — Secure storage and encryption

Snippets that you want to keep should be encrypted. Use SQLCipher for a transparent encrypted SQLite, or store encrypted blobs with libsodium. Example using gpg symmetric encryption (simple):

echo "my snippet" | gpg --symmetric --cipher-algo AES256 -o snippet.gpg
# Decrypt with: gpg --decrypt snippet.gpg

For programmatic access use a passphrase from a hardware token or a local secret manager. In 2026, SQLCipher + systemd service with proper file permissions is a good tradeoff for small teams.
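
If SQLCipher feels heavy for a single‑user setup, a lighter pattern is to encrypt each snippet as a blob before it reaches a plain SQLite file. A minimal sketch using the cryptography package’s Fernet (pip install cryptography); loading the key from a hardware token or secret manager, rather than generating it inline as below, is left to you:

# store_snippet.py
import sqlite3
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # placeholder: load from your secret manager instead
box = Fernet(key)

conn = sqlite3.connect('snippets.db')
conn.execute('CREATE TABLE IF NOT EXISTS snippets (id INTEGER PRIMARY KEY, blob BLOB)')

def save(text: str) -> int:
    # Encrypt before the snippet ever touches disk
    cur = conn.execute('INSERT INTO snippets (blob) VALUES (?)',
                       (box.encrypt(text.encode()),))
    conn.commit()
    return cur.lastrowid

def load(snippet_id: int) -> str:
    row = conn.execute('SELECT blob FROM snippets WHERE id = ?', (snippet_id,)).fetchone()
    return box.decrypt(row[0]).decode()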

Step 6 — Systemd service for reliability

Create /etc/systemd/system/clipboard-ai.service:

[Unit]
Description=Pi Clipboard AI Service
After=network.target

[Service]
User=pi
WorkingDirectory=/home/pi/clipboard-ai
ExecStart=/home/pi/clipboard-ai/venv/bin/uvicorn app:app --host 127.0.0.1 --port 5002
Restart=on-failure

[Install]
WantedBy=multi-user.target

Enable and start:

sudo systemctl daemon-reload
sudo systemctl enable --now clipboard-ai.service

Editor & platform integrations (practical examples)

VS Code

  • Use the REST Client extension to call the local endpoint directly from the editor.
  • Create a tiny VS Code command (Tasks or extension) that sends selected text to http://pi.local:5002/process and replaces selection with the result.
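
For the REST Client route, a request file like this (saved as clipboard.http; “Send Request” fires it from inside the editor) is all it takes:

POST http://pi.local:5002/process
Content-Type: application/json

{"text": "draft paragraph to clean up", "mode": "sanitize"}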

Browser (Chrome/Edge/Firefox)

  • Create a lightweight extension with a context menu item that POSTs the selected text to the Pi and replaces selection with returned text.
  • Keep the extension restricted to your local network origin and bind requests to http://pi.local:5002. If you need cross‑device, run the watcher on each device instead of exposing the Pi publicly.

CMS & publishing platforms

  • Before pasting into WordPress, Ghost, or your headless CMS, run the selection through the clipboard AI to auto‑format snippets as clean HTML or markdown.
  • For multi‑editor teams, provide a small web client served from the Pi (on LAN) for copy templates and approved sanitized blocks.

Slack and chat tools

Slack is cloud‑hosted. To keep sensitive content local, avoid sending raw confidential snippets to Slack. Instead:

  • Use the clipboard AI to produce a redacted summary before pasting into Slack.
  • For automated messages, run the generation locally and use non‑sensitive summaries via Slack’s API.
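
A sketch of that second pattern, assuming a Slack incoming webhook (the URL below is a placeholder) and the /process endpoint from Step 3:

# slack_summary.py
import requests

WEBHOOK = 'https://hooks.slack.com/services/XXX/YYY/ZZZ'  # placeholder webhook URL
PI_API = 'http://pi.local:5002/process'

def post_redacted(text: str) -> None:
    # Redact locally first...
    out = requests.post(PI_API, json={'text': text, 'mode': 'sanitize'},
                        timeout=30).json()['output']
    # ...so only the sanitized summary ever leaves the LAN
    requests.post(WEBHOOK, json={'text': out}, timeout=10)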

Advanced strategies and developer deep dive

For developer teams who want automation and versioning:

  • Snippet versioning: store encrypted versions and a hashed audit log so you can roll back changes; do not store plaintext unless explicitly required and approved.
  • Parameterized templates: use a small templating engine (Jinja2) so the assistant can fill variables from local environment data (with permission).
  • Model hosting strategies: run multiple models, e.g. a tiny 1.3B for fast sanitization and a larger 7B for rewriting and prose generation, and route tasks based on resource needs (see the sketch after this list).
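
The routing layer can be as small as a lookup from processing mode to model name. A sketch, assuming both models are registered in LocalAI under the (illustrative) names below:

# Route cheap sanitization to the small model, heavier rewriting to the big one
MODEL_FOR_MODE = {
    'sanitize': 'tiny-1.3b',  # illustrative LocalAI model names
    'format': 'tiny-1.3b',
    'rewrite': 'writer-7b',
    'expand': 'writer-7b',
}

def pick_model(mode: str) -> str:
    # Unknown modes fall back to the cheap model
    return MODEL_FOR_MODE.get(mode, 'tiny-1.3b')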

Performance tips for Pi 5 + AI HAT+ 2

  • Use quantized (4‑bit / 3‑bit) GGUF models to reduce memory footprint.
  • Place models on NVMe or high‑speed storage; microSD I/O can become a bottleneck.
  • Enable swap cautiously for larger models, but prefer models that fit in RAM to avoid thrashing.
  • Use the vendor NPU runtime and compile backends with NEON/FP16 support for better throughput in 2026 toolchains.

Security checklist (must‑do before adoption)

  • Bind services to 127.0.0.1 or use mDNS hostnames; don’t expose inference ports to the public internet.
  • Firewall the Pi with UFW and restrict inbound ports to trusted clients (example rules after this list).
  • Use encrypted storage (SQLCipher or gpg) for any saved snippets.
  • Ensure the NPU driver and runtime are from trusted sources; validate signatures where provided.
  • Rotate passphrases and store them in a local secret manager or hardware token.
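
The UFW item above, spelled out (the 192.168.1.0/24 subnet is an assumption; adjust to wherever your watcher machines live):

sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow from 192.168.1.0/24 to any port 5002 proto tcp
sudo ufw enable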

Common pitfalls and troubleshooting

  • Model fails to load: check file permissions and that the runtime supports the quantization format.
  • Slow inference: confirm NPU driver is active and runtime uses the accelerator; check for thermal throttling.
  • Clipboard loops (watcher re‑triggers itself): implement change detection and short cooldowns.

As of 2026, expect the following trends that affect this project:

  • Wider model availability: more responsibly licensed, quantized models optimized for ARM/NPUs.
  • Better hardware abstraction: rule‑based runtimes that auto‑choose CPU/NPU backends in mixed hardware environments.
  • Privacy‑first tooling: more tools to verify offline operation and cryptographic attestation that models are not phoning home.
  • Improved integrations: editor/IDE vendors will ship official APIs/plugins for local LLM endpoints to support private assistants.

Actionable takeaways

  • Start with a small quantized model and the LocalAI approach to get a local REST endpoint running in under an hour.
  • Implement regex‑based sanitizers first; add model prompts for higher‑quality redaction later.
  • Keep everything bound to local network or loopback and encrypt any stored snippets.
  • Integrate with VS Code and browsers via lightweight extensions or REST Client calls to make the assistant part of your daily workflow.

Example real‑world workflow

A content editor copies a paragraph with client details. The clipboard watcher sends it to the Pi’s API. The Pi returns a sanitized, SEO‑optimized summary in markdown and stores an encrypted original. The editor pastes the sanitized text into the CMS. No cloud services touched the raw client data.

Final notes — why this matters to creators and teams

Running a private clipboard AI on the Pi 5 + AI HAT+ 2 combines two 2026 priorities: productivity and privacy. It reduces friction in repetitive copy‑paste tasks while ensuring sensitive snippets never leave your control. The setup is practical for solo creators and small teams and scales into more sophisticated on‑prem patterns with encrypted stores and template versioning.

Get started now — checklist

  1. Gather hardware and install OS + AI HAT+ 2 drivers.
  2. Deploy LocalAI or a llama.cpp runtime and drop a quantized model into ./models.
  3. Install the FastAPI bridge and clipboard watcher.
  4. Set up systemd service and encrypted snippet storage.
  5. Integrate with VS Code and your browser; test with non‑sensitive data.

Call to action: Ready to build a private clipboard AI on your Pi? Start by flashing a 64‑bit OS and installing the AI HAT+ 2 drivers this week. If you want, I can generate a copy/paste starter repo (FastAPI + watcher + sample sanitizers) tailored to your RAM and model choice — tell me your Pi RAM and whether you prefer Docker or native installs.
