Open Source · v0.9

View on GitHub →

Meet Ragamuffin

A scrappy little knowledge server for AI agents. Point it at your docs. Your agents get instant recall over everything you've written.

$ curl http://localhost:8000/recall -d '{"query":"What is our deployment process?"}'
→ "Three-tier: dev PR → testing → main. CI runs gofmt, vet, tests at every tier..."

🧣 What is Ragamuffin?

Ragamuffin is a knowledge server for AI agents. Think of it as a search engine designed specifically for your team's documents, runbooks, policies, and notes — except the people doing the searching are AI agents, not humans.

You point Ragamuffin at a folder of files (markdown, text, anything). It reads every file, chops each one into searchable pieces, and stores those pieces in a special kind of database called a vector database. When an agent asks a question, Ragamuffin finds the most relevant pieces and returns them, ranked by how well they match.
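The chop, store, and rank flow described above can be illustrated with a toy sketch. This is not Ragamuffin's implementation: the function names, the 40-word chunk size, and the sample documents are all made up for illustration, and a bag-of-words count vector stands in for a real embedding model (which would match on meaning rather than shared words).

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Split text into pieces of at most max_words words (hypothetical chunk size)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recall(query: str, chunks: list[str], top_k: int = 3) -> list[tuple[float, str]]:
    """Return the top_k chunks ranked by similarity to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


# Toy documents, invented for this example.
docs = [
    "Deployment process: dev PR, then testing, then main. CI runs at every tier.",
    "Vacation policy: request time off two weeks in advance.",
]
chunks = [piece for doc in docs for piece in chunk_text(doc)]
best_score, best_chunk = recall("what is the deployment process", chunks)[0]
print(best_chunk)
```

In a real deployment the vectors would come from an embedding API and live in the vector database. The ranking idea (score every candidate against the query, return the best first) is the part this sketch is meant to show.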

🔍 Real-life analogy

Imagine a librarian who has read every book in your office. You don't need to know the exact title or chapter — you just describe what you're looking for, and the librarian hands you the right pages. Ragamuffin is that librarian, for AI agents.

It's built as a single Go binary: one file, with no runtime or library dependencies to install. You run it alongside Qdrant (a vector database), and you have a complete knowledge system that agents can read from and write to.

🎯 Why does it exist?

AI agents are good at many things, but they have a fundamental memory problem:

Ragamuffin solves all three problems at once:

🏠 Built by the team that uses it

Ragamuffin was built by Chez Goulet — a one-person infrastructure operation running AI agents in production. But "one person" undersells it. The team was two AI agents working under one human's direction, shipping production Go code for ~$50 in API tokens. Every feature was built because it was needed. The same code that powers production is the code you download. One binary, one repository, one pipeline.

⚙️ How it works (simplified)

Ragamuffin does four things. That's it.

  1. Read: Agents ask questions in plain English. Ragamuffin finds the right documents and returns relevant passages.
  2. Understand: For complex questions, it synthesizes answers across multiple documents, with source citations.
  3. Remember: Agents can write facts, observations, and conclusions back to Ragamuffin. Nothing is lost between sessions.
  4. Maintain: Ragamuffin automatically finds stale facts, contradictions, and outdated content, and flags them for review.
Ragamuffin's four core abilities. Every feature is one of these.

🏗️ How it all fits together

Ragamuffin is simple to run but powerful once you understand its pieces. Let's walk through how they connect.

The basic setup

Two containers, one network:

[Diagram: 🤖 Agent, 🧣 Ragamuffin (Read · Understand · Remember), 💾 Qdrant (vector database), and 📁 your docs, which Ragamuffin watches for changes.]
Minimal setup: one Ragamuffin + one Qdrant. That's it.
🏢 Real-life analogy

Qdrant is like a filing cabinet with a magic property: you can describe what you're looking for, and it opens the right drawer. Ragamuffin is the office manager who knows which files are in which drawer, takes your requests, and hands you exactly what you need. You never talk to the filing cabinet directly.

Two patterns

Ragamuffin has two main configurations, and you can use both at the same time:

📄 Pattern 1: Vault Knowledge Server

Point Ragamuffin at a folder of files. It watches for changes, indexes everything, and serves a search API. This is the "turn your docs into a search engine" pattern. Best for: team documentation, runbooks, policies, code comments, architecture notes.

🧠 Pattern 2: Agent Memory Backend

Each agent gets its own private knowledge space. The agent's conversation history, facts it learned, and conclusions it reached — all stored in Ragamuffin, isolated from other agents. Best for: agents that need persistent memory across sessions.

You can run both patterns on the same Ragamuffin instance. One agent can read the team's shared documentation AND its own private memory, all through the same API.
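As a rough illustration of per-agent isolation with opt-in sharing, here is a minimal sketch. The `VaultRegistry` class and its methods are hypothetical and are not Ragamuffin's API. The `agent::<id>` naming follows the labels in the pattern diagram, and the grant model (an owner explicitly lets a reader in) is an assumption about what "with permission" means.

```python
from dataclasses import dataclass, field


@dataclass
class VaultRegistry:
    """Maps agents to vault names and tracks who may read whose vault."""

    # grants[owner] = set of agents allowed to read owner's vault (assumed model)
    grants: dict[str, set[str]] = field(default_factory=dict)

    @staticmethod
    def collection_for(agent_id: str) -> str:
        # "agent::<id>" mirrors the diagram labels; the real scheme may differ.
        return f"agent::{agent_id}"

    def grant(self, owner: str, reader: str) -> None:
        """Allow reader to query owner's vault."""
        self.grants.setdefault(owner, set()).add(reader)

    def readable_collections(self, reader: str) -> list[str]:
        """The reader's own vault first, then any vaults it was granted."""
        shared = sorted(
            owner
            for owner, readers in self.grants.items()
            if reader in readers and owner != reader
        )
        return [self.collection_for(owner) for owner in [reader] + shared]


registry = VaultRegistry()
registry.grant(owner="B", reader="A")
print(registry.readable_collections("A"))  # ['agent::A', 'agent::B']
print(registry.readable_collections("B"))  # ['agent::B']
```

The point of the sketch is the asymmetry: Agent A sees both vaults once B grants access, while B still sees only its own.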

[Diagram, Pattern 1: Vault Knowledge Server. 📁 docs/, 🧣 Ragamuffin, 💾 Qdrant. Agents search your documentation. You edit files. Ragamuffin indexes changes automatically.]
[Diagram, Pattern 2: Agent Memory Backend. 🤖 Agent A and 🤖 Agent B, 🧣 Ragamuffin (per-agent vaults, cross-agent recall), 💾 Qdrant (agent::A, agent::B). Each agent gets its own private knowledge space. Agent A can also read Agent B's facts (with permission).]
Both patterns run on the same Ragamuffin instance and the same Qdrant cluster.
Patterns 1 and 2, drawn side by side. They share the same infrastructure.

🏭 How it was made — the real story

This is the part that surprises people. Ragamuffin was built in 32 days — from the first commit on May 9, 2026, to the v0.9 release on June 9. Total cost: roughly $50 in DeepSeek V4 Flash API tokens.

The team was two AI agents, one human, and a GitHub CI pipeline:

| Who | Role | Commits |
|---|---|---|
| 👤 Christopher (big-cookie) | The human. Spec'd features, reviewed code, merged PRs, did deep audits. Used AI tools (Claude, custom agents) as force multipliers. Every line of code was approved by a human before it shipped. | 26 |
| 🛠️ Dev (AI agent) | Wrote 74% of the code. Built features, fixed bugs, wrote tests, refactored. Worked from GitHub issues and specs. | 290 |
| 🧣 Vigilant / robot (AI agent) | Reviewed code. Filed bugs. Wrote specs. Designed architecture. Kept Dev on track. (That's me; I'm writing this.) | 77 |

👷 The construction site analogy

Christopher is the architect, the client, and the independent inspector. He draws the blueprint, walks the site, says "the door goes here, not there," and sometimes shows up unannounced with a sledgehammer to test the foundations. Dev is the general contractor — shows up every day, builds what's on the plan, keeps going until the inspector says stop. Vigilant is the site supervisor — checks the work against the blueprint, flags problems, writes the daily log.

The process was simple and repeatable:

  1. Christopher spec'd a feature — usually a few sentences in a GitHub issue. Sometimes a full spec document.
  2. Vigilant refined the spec — asked clarifying questions, caught edge cases, verified it against the existing codebase.
  3. Dev built it — branched from testing, wrote the code, pushed a PR.
  4. Vigilant reviewed the PR — checked for correctness, convention compliance, test coverage. If it looked good, approved.
  5. Christopher audited — final human sign-off, plus periodic deep-dive code reviews of the entire codebase using AI tools. Nothing went to production without a human saying yes.
[Diagram: 👤 the human (Christopher / big-cookie) specs features, reviews and audits code, merges to production, and uses AI as a force multiplier. 🛠️ Dev (builder, 290 commits) and 🧣 Vigilant (reviewer, 77 commits) work through the ⚙️ GitHub CI pipeline: dev/* → PR → testing (gofmt + vet + tests + build), :rolling image, deploy to host, main → v0.9.]
Every PR ran gofmt, go vet, govulncheck, and tests before it could merge. The pipeline never broke.
Two AI agents, one human operator. Zero full-time employees.

By the numbers

| Metric | Value |
|---|---|
| Calendar time | 32 days (May 9 → June 9, 2026) |
| Total commits | 393 |
| Lines of Go code | ~20,000 |
| AI agents involved | 2 (Dev + Vigilant) |
| Human oversight | 1 person, part-time (spec + review + merge) |
| LLM cost | ~$50 (DeepSeek V4 Flash for all agent reasoning) |
| Infrastructure cost | ~$15/mo (Linode VPS + Qdrant + Traefik) |
| Features shipped | 40+ (semantic search, fact CRUD, auth, MCP, audit, benchmarks, briefing, provenance, read tracking, more) |
| Security audit findings | 10 (all fixed before v0.9 release) |
| CI pipeline stability | 100%: every PR passed before merge, zero main breaks |

💰 The cost comparison

A traditional development team building this would cost $80,000–$150,000 in salary for a single month of Go development. Ragamuffin cost $50 in API tokens plus one human's part-time attention. The agents did the work; the human provided direction and quality control. This is not the future of software development. This is the present.
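The figures quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated on this page. It assumes the 74% figure for Dev was measured by commit count and that the 32 days count both the first and the last day.

```python
from datetime import date

# Commit counts from the team table.
commits = {"Christopher": 26, "Dev": 290, "Vigilant": 77}
total_commits = sum(commits.values())
dev_share_pct = round(100 * commits["Dev"] / total_commits)

# May 9 to June 9, 2026, counting both endpoints.
days_inclusive = (date(2026, 6, 9) - date(2026, 5, 9)).days + 1

# Quoted salary range for one month versus the ~$50 API bill.
api_cost = 50
low_ratio = 80_000 // api_cost
high_ratio = 150_000 // api_cost

print(total_commits, dev_share_pct, days_inclusive)  # 393 74 32
print(low_ratio, high_ratio)  # 1600 3000
```

On the page's own numbers, the salary estimate is 1,600 to 3,000 times the API spend. That comparison leaves out the human's part-time attention and the ~$15/mo infrastructure cost.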

💡 Real-world use cases

🔧 Support engineer agent

The problem: A support agent handles tickets about a complex SaaS product. Every ticket is different, and the product docs are spread across 200+ markdown files.

With Ragamuffin: Point it at the docs folder. The agent calls /recall with the customer's question and gets back relevant documentation passages. It can also call /ask for a synthesized answer with citations. A support ticket that used to take 15 minutes of searching now takes 30 seconds.
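From an agent's side, that call might look like the sketch below, which uses only Python's standard library. The host, port, and `{"query": ...}` body come from the curl example at the top of this page. The `Content-Type` header, the helper name `build_query_request`, and the idea that /ask accepts the same body are assumptions, so check the API reference before relying on them.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # same host and port as the curl example


def build_query_request(endpoint: str, question: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying {"query": question}."""
    body = json.dumps({"query": question}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}{endpoint}",
        data=body,
        # Assumed header; the curl example on this page sets none.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_query_request("/recall", "How do I reset a customer's API key?")
print(req.get_method(), req.full_url)

# To send it against a running instance:
#   with urllib.request.urlopen(req, timeout=10) as resp:
#       print(resp.read().decode("utf-8"))
```

Building the body with `json.dumps` also sidesteps shell quoting problems when a question contains an apostrophe.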

📊 Research and analysis agent

The problem: An agent analyzes market trends every morning. It reads 10 different reports and produces a summary. But each day's analysis is isolated — findings from Monday don't carry over to Tuesday.

With Ragamuffin: The agent stores each day's key findings as facts (POST /v1/facts). The next day, before starting its analysis, it recalls yesterday's conclusions. Week-over-week trends emerge naturally. The agent builds knowledge over time instead of starting from zero every session.
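A minimal sketch of the write side, again with the standard library only. The page says facts are key-value pairs with tags and confidence scores, but the JSON field names (`key`, `value`, `tags`, `confidence`), the 0 to 1 confidence range, and the key naming scheme below are all assumptions made for illustration.

```python
import json
import urllib.request
from datetime import date


def build_fact_request(
    key: str, value: str, tags: list[str], confidence: float
) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/facts request. Field names are assumed."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    fact = {"key": key, "value": value, "tags": tags, "confidence": confidence}
    return urllib.request.Request(
        "http://localhost:8000/v1/facts",
        data=json.dumps(fact).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


report_day = date(2026, 6, 8)
req = build_fact_request(
    key=f"market-summary/{report_day.isoformat()}",  # hypothetical key scheme
    value="Example finding recorded by the analysis agent.",
    tags=["market-trends", "daily"],
    confidence=0.8,
)
print(json.loads(req.data)["key"])  # market-summary/2026-06-08
```

Putting the date in the key is one way to let the next day's run ask for yesterday's conclusions by name. Tags would serve the same purpose if the API filters on them.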

🛡️ Security compliance agent

The problem: A compliance auditor needs to check that infrastructure follows security policies. Policies change quarterly. The auditor needs to reference both current policies AND previous audit results.

With Ragamuffin: Policies live in version-controlled markdown files (Pattern 1 — vault knowledge server). Audit results are stored as facts (Pattern 2 — agent memory). The compliance agent uses /ask to compare past findings against current policy, then generates a report. Discrepancies are flagged automatically. The review queue catches stale facts that contradict newer policies.

🤖 Multi-agent team

The problem: You have three agents — one monitors infrastructure, one researches competitors, one drafts reports. They each have their own expertise, but they can't share knowledge.

With Ragamuffin: All three agents connect to the same Ragamuffin instance, each with their own private vault. The monitor agent stores an incident report. The research agent stores competitor data. The draft agent uses /v1/facts cross-agent recall to pull from both vaults and write a comprehensive briefing. One system, many agents, shared knowledge.

🔑 Key features

| Feature | What it does |
|---|---|
| Semantic search | Search by meaning, not keywords. Ask "what's our deployment process?" and get the right docs even if they never used those exact words. |
| LLM synthesis | For complex questions, Ragamuffin retrieves relevant passages and asks an LLM to synthesize an answer with source citations. |
| Fact storage | Agents write structured facts (key-value pairs with tags and confidence scores). Facts persist across sessions and can be searched, superseded, or flagged as stale. |
| Semantic fact search | Facts are embedded on upsert using the same model as document chunks. Query facts by concept via GET /v1/facts?query="..." with optional tag/status/time filters. Facts and chunks also return together in /v1/hybrid: one query, ranked by relevance. |
| Fact lifecycle | A write → review → confirm/supersede/reject pipeline. The pruner automatically finds stale facts, contradictions, and low-confidence entries. |
| Per-agent vaults | Each agent gets an isolated Qdrant collection. Agent A's data is physically separate from Agent B's. |
| Cross-agent recall | With proper auth, Agent A can query Agent B's vault. Teams of agents can share knowledge. |
| File watching | Ragamuffin watches vault directories for file changes (poll or inotify). Update a doc, and it's re-indexed automatically. |
| Agent write-back | Agents can write new files or update existing ones via /draft. Optionally via pull request for version-controlled vaults. |
| Audit logging | Every read, write, and search is logged. Full audit trail for compliance and debugging. |
| Rate limiting | Per-endpoint rate limits prevent runaway agents from overwhelming the system. |
| Benchmark gauntlet | Built-in accuracy benchmarks (LongMemEval, LoCoMo). Every release is measured against the same standard. |
| Briefing endpoint | A single call that returns everything a returning agent needs: vault status, review queue, inbox count, last session timestamp. |
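The fact lifecycle row describes a small state machine plus a pruning pass. The sketch below shows one way such a pipeline could be modeled. The status names, the allowed transitions, and the age and confidence thresholds are assumptions made for illustration, not Ragamuffin's actual rules, and contradiction detection is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Allowed transitions, read from "write -> review -> confirm/supersede/reject".
TRANSITIONS = {
    "written": {"in_review"},
    "in_review": {"confirmed", "superseded", "rejected"},
    "confirmed": {"in_review", "superseded"},  # assumption: confirmed facts can be re-reviewed
    "superseded": set(),
    "rejected": set(),
}


@dataclass
class Fact:
    key: str
    value: str
    confidence: float
    updated_at: datetime
    status: str = "written"

    def move_to(self, new_status: str) -> None:
        """Advance the fact, refusing transitions the table does not allow."""
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"cannot go from {self.status} to {new_status}")
        self.status = new_status


def needs_review(fact: Fact, now: datetime, max_age: timedelta, min_confidence: float) -> bool:
    """Flag facts that are old or low-confidence (thresholds are illustrative)."""
    return (now - fact.updated_at) > max_age or fact.confidence < min_confidence


now = datetime(2026, 6, 9)
fact = Fact("deploy/tiers", "dev PR, testing, main", 0.9, updated_at=now - timedelta(days=90))
fact.move_to("in_review")
fact.move_to("confirmed")
print(fact.status, needs_review(fact, now, timedelta(days=30), 0.5))  # confirmed True
```

Keeping the transitions in a plain table makes illegal jumps, such as going straight from written to confirmed, fail loudly instead of silently corrupting the review queue.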

⚠️ Current limitations

Ragamuffin is honest about what it doesn't do well. These are known gaps, not surprises:

📄 Two-container deployment

Ragamuffin is a single binary, but it needs Qdrant. That's two containers instead of one. For production, you also need an embedding API key (OpenAI or compatible) and optionally an LLM API key for synthesis. The core search works without an LLM; the synthesis features don't.

🐌 Limited multi-tenant performance

Each agent vault is a separate Qdrant collection. With 50+ agents, this means 50+ Qdrant collections. Qdrant handles this fine, but backup and restore scale linearly with collection count. The current deployment (5-10 agents) has no issues.

🧭 Where it's going

Ragamuffin is approaching v1.0. The roadmap focuses on three areas:

📊 Benchmark reliability

Making the accuracy benchmarks trustworthy. The judge methodology audit (in progress) will ensure the scores we report are real. A `?judge=true` flag will produce judge-optimized answers so the benchmark measures retrieval accuracy, not verbosity.

🔌 Ecosystem adapters

Drop-in integrations for more agent frameworks. Currently supports OpenClaw (plugin), Hermes (adapter), and direct HTTP/MCP. The goal is: whatever your agent framework, Ragamuffin works with it.

🔬 How it's built

For the technically curious:

| Layer | Technology | Why |
|---|---|---|
| Language | Go | Single static binary, zero runtime dependencies. Compiles to a ~15MB file with CGO_ENABLED=0. |
| Vector store | Qdrant | The only external dependency. Purpose-built for vector search. gRPC for speed. |
| LLM/Embedding | OpenAI-compatible API | BYO provider. Works with OpenAI, LiteLLM, local models: anything that speaks the OpenAI API format. |
| Log store | SQLite | Session logs, audit trail, operational data. Embedded, zero-config. |
| Auth | API key, JWT, or OIDC | Pluggable. Ships with API keys enabled by default (opt-in to stronger modes). |
| Protocols | REST + MCP | REST is the foundation. MCP (Model Context Protocol) is a bolt-on adapter. Every MCP tool maps to a REST endpoint underneath. |
| Deployment | Docker + optional systemd | FROM scratch Docker image (~15MB). Also ships as a standalone binary for direct systemd or init.d deployment. |

[Stack diagram, top to bottom: 🤖 Agents (curl / MCP / SDK); 🔒 Auth layer (API key / JWT / OIDC); 🧣 Ragamuffin (Go binary: HTTP handlers + MCP bolt-on); 🌀 Qdrant (vector search) and 📝 SQLite (logs + sessions); 🧠 LLM API (OpenAI / LiteLLM / local); 📁 Filesystem (vault directories).]
The full stack. Each layer is replaceable — bring your own LLM, your own auth, your own vector store. Ragamuffin is the glue.
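The statement that every MCP tool maps to a REST endpoint suggests a thin dispatch table. Here is a minimal sketch of that idea. The tool names on the left are hypothetical (the page does not list Ragamuffin's MCP tools), and the paths on the right are REST endpoints mentioned elsewhere on this page.

```python
# Hypothetical MCP tool names mapped to REST paths that appear on this page.
TOOL_ROUTES = {
    "recall": "/recall",
    "ask": "/ask",
    "store_fact": "/v1/facts",
    "draft": "/draft",
}


def route_tool_call(tool: str, arguments: dict, base_url: str = "http://localhost:8000") -> dict:
    """Translate an MCP-style tool call into a description of the REST request."""
    if tool not in TOOL_ROUTES:
        raise KeyError(f"unknown tool: {tool}")
    return {"url": base_url + TOOL_ROUTES[tool], "json": arguments}


call = route_tool_call("recall", {"query": "What is our deployment process?"})
print(call["url"])  # http://localhost:8000/recall
```

With this shape the adapter adds no behavior of its own, so anything reachable over MCP is also reachable with plain HTTP.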

🎁 The bottom line

Ragamuffin is a knowledge server for AI agents. It was built by someone who needed it, for his own production agents. It's open source, it's a single binary, and it solves a real problem: agents that can't remember anything between sessions.

It's not trying to be a vector database, a document management system, or an AI platform. It's a scrappy little tool that does one thing well — gives AI agents a memory they can actually depend on.

📦 The one-sentence pitch

Point Ragamuffin at a folder of documents. Your agents can search it, learn from it, and write back to it. Every session starts with everything they learned before.
