Ragamuffin — The Knowledge Server for AI Agents

🧣 What is Ragamuffin?

Ragamuffin is a knowledge server for AI agents. Think of it as a search engine designed specifically for your team's documents, runbooks, policies, and notes — except the people doing the searching are AI agents, not humans.

You point Ragamuffin at a folder of files (markdown, text, anything). It reads every file, chops them into searchable pieces, and stores them in a special kind of database called a vector database. When an agent asks a question, Ragamuffin finds the most relevant pieces and returns them — ranked by how well they match.
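To make "chops them into searchable pieces" and "ranked by how well they match" a little more concrete, here is a minimal sketch of the general technique. It is not Ragamuffin's actual code: the names `Chunk`, `splitIntoChunks`, `cosine`, and `rank` are hypothetical, the chunking is a naive fixed word count, and the tiny hand-written vectors stand in for what an embedding model would produce.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"strings"
)

// Chunk is one searchable piece of a file plus its embedding vector.
// (Hypothetical type for illustration; real vectors come from an embedding model.)
type Chunk struct {
	Text   string
	Vector []float64
}

// splitIntoChunks breaks text into pieces of at most size words.
func splitIntoChunks(text string, size int) []string {
	if size <= 0 {
		return nil
	}
	words := strings.Fields(text)
	var chunks []string
	for start := 0; start < len(words); start += size {
		end := start + size
		if end > len(words) {
			end = len(words)
		}
		chunks = append(chunks, strings.Join(words[start:end], " "))
	}
	return chunks
}

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// rank returns a copy of chunks ordered from most to least similar to query.
func rank(query []float64, chunks []Chunk) []Chunk {
	out := append([]Chunk(nil), chunks...)
	sort.SliceStable(out, func(i, j int) bool {
		return cosine(query, out[i].Vector) > cosine(query, out[j].Vector)
	})
	return out
}

func main() {
	fmt.Println(splitIntoChunks("restart the api service before deploying", 3))

	chunks := []Chunk{
		{Text: "holiday policy", Vector: []float64{0, 1}},
		{Text: "deploy runbook", Vector: []float64{1, 0.1}},
	}
	best := rank([]float64{1, 0}, chunks)[0]
	fmt.Println("best match:", best.Text)
}
```

In a real deployment the vector database does the similarity search, so code like `rank` would not run over every chunk in application memory. The sketch only shows what "most relevant" means numerically.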

🔍 Real-life analogy

Imagine a librarian who has read every book in your office. You don't need to know the exact title or chapter — you just describe what you're looking for, and the librarian hands you the right pages. Ragamuffin is that librarian, for AI agents.

It's built as a single Go binary: one file, no language runtime, no libraries to install. Its only external dependency is Qdrant (a vector database). Run the two side by side and you have a complete knowledge system that agents can read from and write to.

🎯 Why does it exist?

AI agents are good at many things, but they have a fundamental memory problem:

They forget everything between conversations. Each time you start a new chat, the agent has no memory of what happened before.
They can't read your internal documents. An agent trained on the public internet doesn't know your deployment process, your API conventions, or your team's preferred tools.
They can't share knowledge with each other. The agent that researched your competitors can't tell the agent writing your strategy document what it found.

Ragamuffin solves all three problems at once:

Permanent memory. Agents write important facts to Ragamuffin, and they persist forever — not just for one conversation.
Read your docs. Point Ragamuffin at your documentation folder, and agents can search it like a search engine for your company.
Cross-agent sharing. Multiple agents can use the same Ragamuffin instance. Agent A stores a finding; Agent B finds it the next day.

🏠 Built by the team that uses it

Ragamuffin was built by Chez Goulet — a one-person infrastructure operation running AI agents in production. But "one person" undersells it. The team was two AI agents working under one human's direction, shipping production Go code for ~$50 in API tokens. Every feature was built because it was needed. The same code that powers production is the code you download. One binary, one repository, one pipeline.

⚙️ How it works (simplified)

Ragamuffin does four things. That's it.

Ragamuffin's four core abilities. Every feature is one of these.

🏗️ How it all fits together

Ragamuffin is simple to run but powerful once you understand its pieces. Let's walk through how they connect.

The basic setup

Two containers, one network:

Minimal setup: one Ragamuffin + one Qdrant. That's it.

🏢 Real-life analogy

Qdrant is like a filing cabinet with a magic property: you can describe what you're looking for, and it opens the right drawer. Ragamuffin is the office manager who knows which files are in which drawer, takes your requests, and hands you exactly what you need. You never talk to the filing cabinet directly.

Two patterns

Ragamuffin has two main configurations, and you can use both at the same time:

📄 Pattern 1: Vault Knowledge Server

Point Ragamuffin at a folder of files. It watches for changes, indexes everything, and serves a search API. This is the "turn your docs into a search engine" pattern. Best for: team documentation, runbooks, policies, code comments, architecture notes.
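For readers wondering what "watches for changes" can look like in poll mode, the sketch below shows one common approach: take a snapshot of file modification times, take another later, and compare. This is an assumption about the general technique, not a description of Ragamuffin's watcher. The names `snapshot`, `scan`, and `diff` are hypothetical.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"sort"
)

// snapshot maps each file path to its last-modified time in Unix nanoseconds.
type snapshot map[string]int64

// scan walks root and records the modification time of every regular file.
func scan(root string) (snapshot, error) {
	snap := snapshot{}
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.Type().IsRegular() {
			return nil // skip directories, symlinks, and other special files
		}
		info, err := d.Info()
		if err != nil {
			return err
		}
		snap[path] = info.ModTime().UnixNano()
		return nil
	})
	return snap, err
}

// diff reports files that are new or modified (changed) and files that
// disappeared (removed) between two snapshots, each sorted by path.
func diff(prev, curr snapshot) (changed, removed []string) {
	for path, mod := range curr {
		if old, ok := prev[path]; !ok || old != mod {
			changed = append(changed, path)
		}
	}
	for path := range prev {
		if _, ok := curr[path]; !ok {
			removed = append(removed, path)
		}
	}
	sort.Strings(changed)
	sort.Strings(removed)
	return changed, removed
}

func main() {
	dir, err := os.MkdirTemp("", "vault")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	before, _ := scan(dir)
	if err := os.WriteFile(filepath.Join(dir, "runbook.md"), []byte("# Deploy"), 0o644); err != nil {
		panic(err)
	}
	after, _ := scan(dir)

	changed, removed := diff(before, after)
	fmt.Println("re-index:", changed, "drop:", removed)
}
```

A poller would call `scan` on a timer and feed the `changed` list to the indexer. An inotify-based watcher skips the repeated directory walk and reacts to kernel events instead.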

🧠 Pattern 2: Agent Memory Backend

Each agent gets its own private knowledge space. The agent's conversation history, facts it learned, and conclusions it reached — all stored in Ragamuffin, isolated from other agents. Best for: agents that need persistent memory across sessions.

You can run both patterns on the same Ragamuffin instance. One agent can read the team's shared documentation AND its own private memory, all through the same API.

Patterns 1 and 2, drawn side by side. They share the same infrastructure.

🏭 How it was made — the real story

This is the part that surprises people. Ragamuffin was built in 32 days — from the first commit on May 9, 2026, to the v0.9 release on June 9. Total cost: roughly $50 in DeepSeek V4 Flash API tokens.

The team was two AI agents, one human, and a GitHub CI pipeline:

Who	Role	Commits
👤 Christopher (big-cookie)	The human. Spec'd features, reviewed code, merged PRs, did deep audits. Used AI tools (Claude, custom agents) as force multipliers. Every line of code was approved by a human before it shipped.	26
🛠️ Dev (AI agent)	Wrote 74% of the code. Built features, fixed bugs, wrote tests, refactored. Worked from GitHub issues and specs.	290
🧣 Vigilant / robot (AI agent)	Reviewed code. Filed bugs. Wrote specs. Designed architecture. Kept Dev on track. (That's me — I'm writing this.)	77

👷 The construction site analogy

Christopher is the architect, the client, and the independent inspector. He draws the blueprint, walks the site, says "the door goes here, not there," and sometimes shows up unannounced with a sledgehammer to test the foundations. Dev is the general contractor — shows up every day, builds what's on the plan, keeps going until the inspector says stop. Vigilant is the site supervisor — checks the work against the blueprint, flags problems, writes the daily log.

The process was simple and repeatable:

Christopher spec'd a feature — usually a few sentences in a GitHub issue. Sometimes a full spec document.
Vigilant refined the spec — asked clarifying questions, caught edge cases, verified it against the existing codebase.
Dev built it — branched from testing, wrote the code, pushed a PR.
Vigilant reviewed the PR — checked for correctness, convention compliance, test coverage. If it looked good, approved.
Christopher audited — final human sign-off, plus periodic deep-dive code reviews of the entire codebase using AI tools. Nothing went to production without a human saying yes.

Two AI agents, one human operator. Zero full-time employees.

By the numbers

Metric	Value
Calendar time	32 days (May 9 → June 9, 2026)
Total commits	393
Lines of Go code	~20,000
AI agents involved	2 (Dev + Vigilant)
Human oversight	1 person, part-time (spec + review + merge)
LLM cost	~$50 (DeepSeek V4 Flash for all agent reasoning)
Infrastructure cost	~$15/mo (Linode VPS + Qdrant + Traefik)
Features shipped	40+ (semantic search, fact CRUD, auth, MCP, audit, benchmarks, briefing, provenance, read tracking, more)
Security audit findings	10 (all fixed before v0.9 release)
CI pipeline stability	100% — every PR passed before merge, zero main breaks

💰 The cost comparison

A traditional development team building this would cost $80,000–$150,000 in salary for a single month of Go development. Ragamuffin cost $50 in API tokens plus one human's part-time attention. The agents did the work; the human provided direction and quality control. This is not the future of software development. This is the present.

💡 Real-world use cases

🔧 Support engineer agent

The problem: A support agent handles tickets about a complex SaaS product. Every ticket is different, and the product docs are spread across 200+ markdown files.

With Ragamuffin: Point it at the docs folder. The agent calls /recall with the customer's question and gets back relevant documentation passages. It can also call /ask for a synthesized answer with citations. A ticket that used to take 15 minutes of searching now takes 30 seconds.

📊 Research and analysis agent

The problem: An agent analyzes market trends every morning. It reads 10 different reports and produces a summary. But each day's analysis is isolated — findings from Monday don't carry over to Tuesday.

With Ragamuffin: The agent stores each day's key findings as facts (POST /v1/facts). The next day, before starting its analysis, it recalls yesterday's conclusions. Week-over-week trends emerge naturally. The agent builds knowledge over time instead of starting from zero every session.
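The store-then-recall loop above can be sketched as two HTTP requests. Only the `POST /v1/facts` and `GET /v1/facts?query=...` routes come from this document. Everything else here is an assumption for illustration: the JSON field names on `Fact` (chosen to match the "key-value pairs with tags and confidence scores" description), the bearer-token header, the base URL, and the helper names `newStoreFactRequest` and `newRecallFactsRequest`. Check the actual API reference before relying on any of it.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// Fact is a key-value pair with tags and a confidence score.
// The JSON field names are assumptions, not the documented wire format.
type Fact struct {
	Key        string   `json:"key"`
	Value      string   `json:"value"`
	Tags       []string `json:"tags,omitempty"`
	Confidence float64  `json:"confidence"`
}

// newStoreFactRequest builds a POST /v1/facts request carrying one fact.
func newStoreFactRequest(baseURL, apiKey string, f Fact) (*http.Request, error) {
	body, err := json.Marshal(f)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/facts", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+apiKey) // auth header scheme is an assumption
	return req, nil
}

// newRecallFactsRequest builds a GET /v1/facts?query=... request.
func newRecallFactsRequest(baseURL, apiKey, query string) (*http.Request, error) {
	params := url.Values{}
	params.Set("query", query) // url.Values handles escaping of spaces and quotes
	req, err := http.NewRequest(http.MethodGet, baseURL+"/v1/facts?"+params.Encode(), nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey) // auth header scheme is an assumption
	return req, nil
}

func main() {
	base := "http://localhost:8080" // hypothetical address

	store, err := newStoreFactRequest(base, "example-key", Fact{
		Key:        "market/monday-summary",
		Value:      "Competitor pricing dropped in two segments.",
		Tags:       []string{"market", "daily"},
		Confidence: 0.8,
	})
	if err != nil {
		panic(err)
	}
	recall, err := newRecallFactsRequest(base, "example-key", "yesterday's pricing conclusions")
	if err != nil {
		panic(err)
	}

	// Day 1: send `store` with http.DefaultClient.Do. Day 2: send `recall` first.
	fmt.Println(store.Method, store.URL)
	fmt.Println(recall.Method, recall.URL)
}
```

The sketch only builds the requests, so it runs without a server. Sending them and decoding the responses depends on the response format, which this document does not specify.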

🛡️ Security compliance agent

The problem: A compliance auditor needs to check that infrastructure follows security policies. Policies change quarterly. The auditor needs to reference both current policies AND previous audit results.

With Ragamuffin: Policies live in version-controlled markdown files (Pattern 1 — vault knowledge server). Audit results are stored as facts (Pattern 2 — agent memory). The compliance agent uses /ask to compare past findings against current policy, then generates a report. Discrepancies are flagged automatically. The review queue catches stale facts that contradict newer policies.

🤖 Multi-agent team

The problem: You have three agents — one monitors infrastructure, one researches competitors, one drafts reports. They each have their own expertise, but they can't share knowledge.

With Ragamuffin: All three agents connect to the same Ragamuffin instance, each with its own private vault. The monitor agent stores an incident report. The research agent stores competitor data. The draft agent uses cross-agent recall on /v1/facts to pull from both vaults and write a comprehensive briefing. One system, many agents, shared knowledge.

🔑 Key features

Feature	What it does
Semantic search	Search by meaning, not keywords. Ask "what's our deployment process?" and get the right docs even if they never used those exact words.
LLM synthesis	For complex questions, Ragamuffin retrieves relevant passages and asks an LLM to synthesize an answer with source citations.
Fact storage	Agents write structured facts (key-value pairs with tags and confidence scores). Facts persist across sessions and can be searched, superseded, or flagged as stale.
Semantic fact search	Facts are embedded on upsert using the same model as document chunks. Query facts by concept via `GET /v1/facts?query="..."` with optional tag/status/time filters. Facts and chunks also return together in `/v1/hybrid` — one query, ranked by relevance.
Fact lifecycle	A write → review → confirm/supersede/reject pipeline. The pruner automatically finds stale facts, contradictions, and low-confidence entries.
Per-agent vaults	Each agent gets an isolated Qdrant collection. Agent A's data is stored in a different collection from Agent B's.
Cross-agent recall	With proper auth, Agent A can query Agent B's vault. Teams of agents can share knowledge.
File watching	Ragamuffin watches vault directories for file changes (poll or inotify). Update a doc, and it's re-indexed automatically.
Agent write-back	Agents can write new files or update existing ones via /draft. Optionally via pull request for version-controlled vaults.
Audit logging	Every read, write, and search is logged. Full audit trail for compliance and debugging.
Rate limiting	Per-endpoint rate limits prevent runaway agents from overwhelming the system.
Benchmark gauntlet	Built-in accuracy benchmarks (LongMemEval, LoCoMo). Every release is measured against the same standard.
Briefing endpoint	A single call that returns everything a returning agent needs — vault status, review queue, inbox count, last session timestamp.
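The fact lifecycle row above describes a write → review → confirm/supersede/reject pipeline. One way to picture it is as a small state machine, sketched below. The status names, the `allowed` table, and the `advance` function are all hypothetical. The sketch also does not model how the pruner sends older facts back for review, because this document does not describe that path.

```go
package main

import (
	"errors"
	"fmt"
)

// Status is a hypothetical label for where a fact sits in the pipeline.
type Status string

const (
	Written    Status = "written"
	InReview   Status = "in_review"
	Confirmed  Status = "confirmed"
	Superseded Status = "superseded"
	Rejected   Status = "rejected"
)

// allowed lists the transitions this sketch permits:
// written -> in_review -> confirmed | superseded | rejected.
var allowed = map[Status][]Status{
	Written:  {InReview},
	InReview: {Confirmed, Superseded, Rejected},
}

// ErrBadTransition is returned when a fact tries to skip or reverse a step.
var ErrBadTransition = errors.New("transition not allowed")

// advance returns the new status if moving from -> to is permitted,
// otherwise it returns the original status and an error.
func advance(from, to Status) (Status, error) {
	for _, next := range allowed[from] {
		if next == to {
			return to, nil
		}
	}
	return from, fmt.Errorf("%s -> %s: %w", from, to, ErrBadTransition)
}

func main() {
	status := Written
	for _, next := range []Status{InReview, Confirmed} {
		var err error
		status, err = advance(status, next)
		if err != nil {
			panic(err)
		}
		fmt.Println("fact is now", status)
	}

	// Skipping review is rejected by the table above.
	if _, err := advance(Written, Confirmed); err != nil {
		fmt.Println("blocked:", err)
	}
}
```

Keeping the transitions in one table makes it easy to see that nothing reaches a final state without passing through review first.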

⚠️ Current limitations

Ragamuffin is honest about what it doesn't do well. These are known gaps, not surprises:

📄 Two-container deployment

Ragamuffin is a single binary, but it needs Qdrant. That's two containers instead of one. For production, you also need an embedding API key (OpenAI or compatible) and optionally an LLM API key for synthesis. The core search works without an LLM; the synthesis features don't.

🐌 Limited multi-tenant performance

Each agent vault is a separate Qdrant collection, so 50+ agents means 50+ collections. Qdrant handles this fine, but backup and restore time grows linearly with the number of collections. The current deployment (5–10 agents) has no issues.

🧭 Where it's going

Ragamuffin is approaching v1.0. The roadmap focuses on these areas:

📊 Benchmark reliability

Making the accuracy benchmarks trustworthy. The judge methodology audit (in progress) will ensure the scores we report are real. A `?judge=true` flag will produce judge-optimized answers so the benchmark measures retrieval accuracy, not verbosity.

🔌 Ecosystem adapters

Drop-in integrations for more agent frameworks. Currently supports OpenClaw (plugin), Hermes (adapter), and direct HTTP/MCP. The goal is: whatever your agent framework, Ragamuffin works with it.

🔬 How it's built

For the technically curious:

Layer	Technology	Why
Language	Go	Single static binary, zero runtime dependencies. Compiles to a ~15MB file with CGO_ENABLED=0.
Vector store	Qdrant	The only external dependency. Purpose-built for vector search. gRPC for speed.
LLM/Embedding	OpenAI-compatible API	BYO provider. Works with OpenAI, LiteLLM, local models — anything that speaks the OpenAI API format.
Log store	SQLite	Session logs, audit trail, operational data. Embedded, zero-config.
Auth	API key, JWT, or OIDC	Pluggable. Ships with API keys enabled by default (opt-in to stronger modes).
Protocols	REST + MCP	REST is the foundation. MCP (Model Context Protocol) is a bolt-on adapter. Every MCP tool maps to a REST endpoint underneath.
Deployment	Docker + optional systemd	FROM scratch Docker image (~15MB). Also ships as a standalone binary for direct systemd or init.d deployment.

The full stack. Each layer is replaceable — bring your own LLM, your own auth, your own vector store. Ragamuffin is the glue.

🎁 The bottom line

Ragamuffin is a knowledge server for AI agents. It was built by someone who needed it, for his own production agents. It's open source, it's a single binary, and it solves a real problem: agents that can't remember anything between sessions.

It's not trying to be a vector database, a document management system, or an AI platform. It's a scrappy little tool that does one thing well — gives AI agents a memory they can actually depend on.

📦 The one-sentence pitch

Point Ragamuffin at a folder of documents. Your agents can search it, learn from it, and write back to it. Every session starts with everything they learned before.
