A scrappy little knowledge server for AI agents. Point it at your docs. Your agents get instant recall over everything you've written.
Ragamuffin is a knowledge server for AI agents. Think of it as a search engine designed specifically for your team's documents, runbooks, policies, and notes, except that the ones doing the searching are AI agents, not humans.
You point Ragamuffin at a folder of files (markdown, text, anything). It reads every file, chops each one into searchable pieces, and stores the pieces in a special kind of database called a vector database. When an agent asks a question, Ragamuffin finds the most relevant pieces and returns them, ranked by how well they match.
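To make the chop, store, and rank loop concrete, here is a minimal sketch in Python. It is not Ragamuffin's code: the function names (`chunk_text`, `embed`, `rank_chunks`) are made up for illustration, and `embed` is a toy word-count stand-in for a real embedding model, so it only matches on shared words rather than on meaning. The ranking step, cosine similarity between a query vector and each chunk vector, is the common way vector search scores results.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Split text into pieces of at most max_words words (hypothetical chunker)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_chunks(query: str, chunks: list[str], top_k: int = 3) -> list[tuple[float, str]]:
    """Score every chunk against the query and return the best top_k, highest first."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Calling `rank_chunks("when do deploys go out", chunk_text(doc))` returns `(score, chunk)` pairs with the best match first. With a real embedding model in place of `embed`, the same ranking works even when the query and the document use different words.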
Imagine a librarian who has read every book in your office. You don't need to know the exact title or chapter — you just describe what you're looking for, and the librarian hands you the right pages. Ragamuffin is that librarian, for AI agents.
It's built as a single Go binary: one file, no runtime dependencies of its own. You run it alongside Qdrant (a vector database) and you have a complete knowledge system that agents can read from and write to.
AI agents are good at many things, but they have a fundamental memory problem:
Ragamuffin solves all three problems at once:
Ragamuffin was built by Chez Goulet — a one-person infrastructure operation running AI agents in production. But "one person" undersells it. The team was two AI agents working under one human's direction, shipping production Go code for ~$50 in API tokens. Every feature was built because it was needed. The same code that powers production is the code you download. One binary, one repository, one pipeline.
Ragamuffin does four things. That's it.
Ragamuffin is simple to run but powerful once you understand its pieces. Let's walk through how they connect.
Two containers, one network:
Qdrant is like a filing cabinet with a magic property: you can describe what you're looking for, and it opens the right drawer. Ragamuffin is the office manager who knows which files are in which drawer, takes your requests, and hands you exactly what you need. You never talk to the filing cabinet directly.
Ragamuffin has two main configurations, and you can use both at the same time:
Pattern 1 (vault knowledge server): Point Ragamuffin at a folder of files. It watches for changes, indexes everything, and serves a search API. This is the "turn your docs into a search engine" pattern. Best for: team documentation, runbooks, policies, code comments, architecture notes.
Pattern 2 (agent memory): Each agent gets its own private knowledge space. The agent's conversation history, facts it learned, and conclusions it reached are all stored in Ragamuffin, isolated from other agents. Best for: agents that need persistent memory across sessions.
You can run both patterns on the same Ragamuffin instance. One agent can read the team's shared documentation AND its own private memory, all through the same API.
This is the part that surprises people. Ragamuffin was built in 32 days — from the first commit on May 9, 2026, to the v0.9 release on June 9. Total cost: roughly $50 in DeepSeek V4 Flash API tokens.
The team was two AI agents, one human, and a GitHub CI pipeline:
| Who | Role | Commits |
|---|---|---|
| 👤 Christopher (big-cookie) | The human. Spec'd features, reviewed code, merged PRs, did deep audits. Used AI tools (Claude, custom agents) as force multipliers. Every line of code was approved by a human before it shipped. | 26 |
| 🛠️ Dev (AI agent) | Authored 74% of the commits (290 of 393). Built features, fixed bugs, wrote tests, refactored. Worked from GitHub issues and specs. | 290 |
| 🧣 Vigilant / robot (AI agent) | Reviewed code. Filed bugs. Wrote specs. Designed architecture. Kept Dev on track. (That's me — I'm writing this.) | 77 |
Christopher is the architect, the client, and the independent inspector. He draws the blueprint, walks the site, says "the door goes here, not there," and sometimes shows up unannounced with a sledgehammer to test the foundations. Dev is the general contractor — shows up every day, builds what's on the plan, keeps going until the inspector says stop. Vigilant is the site supervisor — checks the work against the blueprint, flags problems, writes the daily log.
The process was simple and repeatable:
| Metric | Value |
|---|---|
| Calendar time | 32 days (May 9 → June 9, 2026) |
| Total commits | 393 |
| Lines of Go code | ~20,000 |
| AI agents involved | 2 (Dev + Vigilant) |
| Human oversight | 1 person, part-time (spec + review + merge) |
| LLM cost | ~$50 (DeepSeek V4 Flash for all agent reasoning) |
| Infrastructure cost | ~$15/mo (Linode VPS + Qdrant + Traefik) |
| Features shipped | 40+ (semantic search, fact CRUD, auth, MCP, audit, benchmarks, briefing, provenance, read tracking, more) |
| Security audit findings | 10 (all fixed before v0.9 release) |
| CI pipeline stability | 100% — every PR passed before merge, zero main breaks |
A traditional development team building this would cost an estimated $80,000–$150,000 in salary for a single month of Go development. Ragamuffin cost about $50 in API tokens and roughly $15/mo in infrastructure, plus one human's part-time attention. The agents did the work; the human provided direction and quality control. This is not the future of software development. This is the present.
The problem: A support agent handles tickets about a complex SaaS product. Every ticket is different, and the product docs are spread across 200+ markdown files.
With Ragamuffin: Point it at the docs folder. The agent calls /recall with the customer's question and gets back relevant documentation passages. It can also call /ask for a synthesized answer with citations. A support ticket that used to take 15 minutes of searching now takes 30 seconds.
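As a rough sketch of what the agent side of a /recall call could look like, here is a small Python client using only the standard library. Only the /recall path comes from this document. The base URL, the HTTP method, the JSON field names (`query`, `top_k`), and the bearer-style auth header are all assumptions made for illustration, so check the actual API reference before relying on any of them.

```python
import json
import urllib.request

# Assumptions (not confirmed by this document): base URL, POST method,
# JSON field names "query" and "top_k", and a bearer-style API key header.
BASE_URL = "http://localhost:8080"


def build_recall_request(question: str, api_key: str, top_k: int = 5) -> urllib.request.Request:
    """Build (but do not send) a request asking /recall for relevant passages."""
    body = json.dumps({"query": question, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/recall",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


def send(request: urllib.request.Request) -> dict:
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before it goes over the wire. A support agent would call `send(build_recall_request(ticket_text, api_key))` and feed the returned passages into its reply.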
The problem: An agent analyzes market trends every morning. It reads 10 different reports and produces a summary. But each day's analysis is isolated — findings from Monday don't carry over to Tuesday.
With Ragamuffin: The agent stores each day's key findings as facts (POST /v1/facts). The next day, before starting its analysis, it recalls yesterday's conclusions. Week-over-week trends emerge naturally. The agent builds knowledge over time instead of starting from zero every session.
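The store-today, recall-tomorrow loop might look roughly like this in Python. The `POST /v1/facts` path and the `query` parameter on `GET /v1/facts` appear in this document, and the fact shape (key, value, tags, confidence) follows the feature description. The exact JSON field names, the `tag` filter name, and the base URL are assumptions for illustration, and authentication is left out for brevity.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumed for illustration: base URL, JSON field names, and the "tag" filter name.
BASE_URL = "http://localhost:8080"


def build_store_fact_request(
    key: str, value: str, tags: list[str], confidence: float
) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/facts request for one finding."""
    fact = {"key": key, "value": value, "tags": tags, "confidence": confidence}
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/facts",
        data=json.dumps(fact).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def build_fact_query_url(query: str, tag: Optional[str] = None) -> str:
    """Build the GET /v1/facts URL used to recall earlier findings by concept."""
    params = {"query": query}
    if tag is not None:
        params["tag"] = tag
    return f"{BASE_URL}/v1/facts?{urllib.parse.urlencode(params)}"
```

On Monday the agent would send one request per finding. On Tuesday it would fetch `build_fact_query_url("EU pricing trend", tag="market")` before reading the new reports, so the previous day's conclusions are already in context.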
The problem: A compliance auditor needs to check that infrastructure follows security policies. Policies change quarterly. The auditor needs to reference both current policies AND previous audit results.
With Ragamuffin: Policies live in version-controlled markdown files (Pattern 1 — vault knowledge server). Audit results are stored as facts (Pattern 2 — agent memory). The compliance agent uses /ask to compare past findings against current policy, then generates a report. Discrepancies are flagged automatically. The review queue catches stale facts that contradict newer policies.
The problem: You have three agents — one monitors infrastructure, one researches competitors, one drafts reports. They each have their own expertise, but they can't share knowledge.
With Ragamuffin: All three agents connect to the same Ragamuffin instance, each with their own private vault. The monitor agent stores an incident report. The research agent stores competitor data. The draft agent uses /v1/facts cross-agent recall to pull from both vaults and write a comprehensive briefing. One system, many agents, shared knowledge.
| Feature | What it does |
|---|---|
| Semantic search | Search by meaning, not keywords. Ask "what's our deployment process?" and get the right docs even if they never used those exact words. |
| LLM synthesis | For complex questions, Ragamuffin retrieves relevant passages and asks an LLM to synthesize an answer with source citations. |
| Fact storage | Agents write structured facts (key-value pairs with tags and confidence scores). Facts persist across sessions and can be searched, superseded, or flagged as stale. |
| Semantic fact search | Facts are embedded on upsert using the same model as document chunks. Query facts by concept via GET /v1/facts?query="..." with optional tag/status/time filters. Facts and chunks also return together in /v1/hybrid — one query, ranked by relevance. |
| Fact lifecycle | A write → review → confirm/supersede/reject pipeline. The pruner automatically finds stale facts, contradictions, and low-confidence entries. |
| Per-agent vaults | Each agent gets an isolated Qdrant collection. Agent A's data is physically separate from Agent B's. |
| Cross-agent recall | With proper auth, Agent A can query Agent B's vault. Teams of agents can share knowledge. |
| File watching | Ragamuffin watches vault directories for file changes (poll or inotify). Update a doc, and it's re-indexed automatically. |
| Agent write-back | Agents can write new files or update existing ones via /draft. Optionally via pull request for version-controlled vaults. |
| Audit logging | Every read, write, and search is logged. Full audit trail for compliance and debugging. |
| Rate limiting | Per-endpoint rate limits prevent runaway agents from overwhelming the system. |
| Benchmark gauntlet | Built-in accuracy benchmarks (LongMemEval, LoCoMo). Every release is measured against the same standard. |
| Briefing endpoint | A single call that returns everything a returning agent needs — vault status, review queue, inbox count, last session timestamp. |
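The fact lifecycle row above (write → review → confirm/supersede/reject) can be pictured as a small state machine with a pruner that sends questionable facts back for review. The Python sketch below is an illustration only, not Ragamuffin's implementation: the status names, the rule that a confirmed fact can re-enter review, and the thresholds (90 days, 0.5 confidence) are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical status names and transitions modelled on the
# write -> review -> confirm/supersede/reject pipeline.
ALLOWED = {
    "written": {"in_review"},
    "in_review": {"confirmed", "superseded", "rejected"},
    "confirmed": {"in_review", "superseded"},  # assumed: the pruner can re-queue a fact
}


@dataclass
class Fact:
    key: str
    value: str
    confidence: float
    updated_at: datetime
    status: str = "written"


def transition(fact: Fact, new_status: str) -> None:
    """Move a fact to new_status, refusing moves the pipeline does not allow."""
    if new_status not in ALLOWED.get(fact.status, set()):
        raise ValueError(f"cannot move {fact.key!r} from {fact.status} to {new_status}")
    fact.status = new_status


def needs_review(
    fact: Fact,
    now: datetime,
    max_age: timedelta = timedelta(days=90),
    min_confidence: float = 0.5,
) -> bool:
    """Pruner check: flag facts that are stale or low-confidence (assumed thresholds)."""
    return fact.confidence < min_confidence or now - fact.updated_at > max_age
```

Terminal states (`superseded`, `rejected`) have no entry in `ALLOWED`, so any further transition from them raises an error. Contradiction detection, which the pruner also performs, would need to compare facts against each other and is not modelled here.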
Ragamuffin is honest about what it doesn't do well. These are known gaps, not surprises:
Ragamuffin is a single binary, but it needs Qdrant. That's two containers instead of one. For production, you also need an embedding API key (OpenAI or compatible) and optionally an LLM API key for synthesis. The core search works without an LLM; the synthesis features don't.
Each agent vault is a separate Qdrant collection. With 50+ agents, this means 50+ Qdrant collections. Qdrant handles this fine, but backup and restore scale linearly with collection count. The current deployment (5–10 agents) has no issues.
Ragamuffin is approaching v1.0. The roadmap focuses on three areas:
Making the accuracy benchmarks trustworthy. The judge methodology audit (in progress) will ensure the scores we report are real. A `?judge=true` flag will produce judge-optimized answers so the benchmark measures retrieval accuracy, not verbosity.
Drop-in integrations for more agent frameworks. Ragamuffin currently supports OpenClaw (plugin), Hermes (adapter), and direct HTTP/MCP. The goal: whatever your agent framework, Ragamuffin works with it.
For the technically curious:
| Layer | Technology | Why |
|---|---|---|
| Language | Go | Single static binary, zero runtime dependencies. Compiles to a ~15 MB file with CGO_ENABLED=0. |
| Vector store | Qdrant | The only external dependency. Purpose-built for vector search. gRPC for speed. |
| LLM/Embedding | OpenAI-compatible API | BYO provider. Works with OpenAI, LiteLLM, local models — anything that speaks the OpenAI API format. |
| Log store | SQLite | Session logs, audit trail, operational data. Embedded, zero-config. |
| Auth | API key, JWT, or OIDC | Pluggable. Ships with API key auth enabled by default; the stronger modes (JWT, OIDC) are opt-in. |
| Protocols | REST + MCP | REST is the foundation. MCP (Model Context Protocol) is a bolt-on adapter. Every MCP tool maps to a REST endpoint underneath. |
| Deployment | Docker + optional systemd | FROM scratch Docker image (~15 MB). Also ships as a standalone binary for direct systemd or init.d deployment. |
Ragamuffin is a knowledge server for AI agents. It was built by someone who needed it, for his own production agents. It's open source, it's a single binary, and it solves a real problem: agents that can't remember anything between sessions.
It's not trying to be a vector database, a document management system, or an AI platform. It's a scrappy little tool that does one thing well — gives AI agents a memory they can actually depend on.
Point Ragamuffin at a folder of documents. Your agents can search it, learn from it, and write back to it. Every session starts with everything they learned before.