a cli that turns 7b reasoning into 70b
Every other tool injects everything. The full 500-line file. The entire build log. All 400 lines of test output. By turn 10, the context is 80% noise. Your 7B model can't find the signal. Not because it's weak. Because other tools buried it.
Beacon compresses every tool output at the moment it's produced, reinjects your goal before every response, and runs in microseconds, inside the 176 KB binary, at $0 forever.

you've seen this
You didn't imagine it. Every AI coding agent degrades over long sessions. It re-reads files it just read. It forgets what you asked. It starts guessing. It edits the wrong file. You called it context drift. You called it model limitations. You upgraded the model, tried a bigger one, tried prompt engineering.
The degradation always came back. Because the model was never the problem.
You blamed the model. You paid for a bigger one. The problem wasn't intelligence. It was what the tool was feeding into the context window.
why it happens
They don't compress. They inject. Read a 500-line file: 2,000 tokens injected. Run a build: 1,200 more. Run tests: 3,000 more. After 10 tool calls, you have 15,000+ tokens of noise. Your original goal has been pushed out of the visible window entirely.
The model isn't degrading. It was never given a chance to succeed.
| Operation | Other tools | Beacon | Saved |
|---|---|---|---|
| Read 500-line file | ~2,000 tokens | ~480 tokens | 76% less |
| Run build command | ~1,200 tokens | ~120 tokens | 90% less |
| Run test suite | ~3,000 tokens | ~300 tokens | 90% less |
| Goal retention | gone by turn 8 | every turn | always |
Multiply by 10 reads, 5 build runs, 3 test runs across a session: you're injecting 35,000+ tokens of noise. A 7B model with 8K context doesn't forget. It runs out of room.
Built in-house
·miii-cli only
Not because we have a better model. Because Beacon feeds the model exactly what it needs and nothing else. Every tool output is compressed at the moment it's produced: file reads become excerpts, build output becomes first and last lines, failures surface, noise disappears.
Then, before every single response, Beacon reinjects your original goal. The model always knows where it is, what it's doing, and why. Coherent at depth 20. Where every other tool is dead at depth 9.
Context window at each depth
Every other tool
Accept degradation, or pay for summarization
miii + Beacon
In-house engine, ships free with every install
How Beacon works
File reads show line counts and excerpts, not full content. Command output shows first and last lines. Test results surface only failures. Context stays dense and relevant.
Beacon extracts your objective once at session start, then reinjects it synchronously before every response. The model can never drift from what you actually asked.
Pure string operations measured in microseconds. No embeddings, no summarization LLM calls, no hidden API costs. Beacon runs entirely in-process.
Who it's for
Your code never leaves your machine. Nothing sent to Anthropic, OpenAI, or anyone. Healthcare, fintech, defense: miii is built for you.
16 GB RAM, a GPU: if you're already running Ollama, miii adds $0 to your stack. No API key, no subscription, no invoice. The machine you own becomes the tool.
Try Llama 3.2, Qwen 2.5-coder, and any Ollama model side by side. Switch live mid-session with /model. Not locked to one provider's release schedule.
Works where cloud AI can't: isolated networks, regulated industries, zero-internet environments. No telemetry, no callbacks, no exceptions.
Honest comparison
OpenCode is also free and local. The rows where miii stands alone are the ones no cloud tool ships. Beacon chief among them.
| Feature | miii | Claude Code | OpenCode | Pi | Codex |
|---|---|---|---|---|---|
| Monthly cost | $0 | Pro/Max | $0 | API key | Pro plan |
| Bundle size | 176 KB | — | — | — | — |
| Beacon context mgmt★ | ✓ | ✗ | ✗ | ✗ | ✗ |
| Air-gapped | ✓ | ✗ | ✗ | ✗ | ✗ |
| Switch model live | ✓ /model | ✗ | ✗ | ✓ /model | ✗ |
| File checkpoints | ✓ | ✗ | ✗ | ✗ | ✗ |
| Shadow git | ✓ | ✗ | ✗ | ✗ | ✗ |
| Call graph (AST) | ✓ | ✗ | ✗ | ✗ | ✗ |
| Shell sandbox | ✓ | partial | ✗ | partial | cloud |
| Permission gates | ✓ | ✓ | partial | partial | partial |
| Skills / npm | ✓ | skills | ✗ | partial | ✗ |
| MCP client | ✓ | ✓ | ✓ | ✗ | partial |
* Bundle size: competitor sizes not independently verified; shown for miii only.
Bottom line
Cloud tools are good at what they do. If you need any model, no bill, genuine offline, Beacon's context engine, and undo that survives a session restart: miii.
miii-cli
176 KB. Install once, point at Ollama, and every feature below works: offline, on your hardware, with any model Ollama supports. No hidden dependencies, no cloud callbacks.
github.com/maruakshay/miii-cliPurpose-built context engine. Compresses per-tool, injects goal state, zero LLM overhead. Agents stay coherent at 50 tool calls, not just 5.
Static map of every function, class, and method across your codebase. Pure parser, no model needed. Lets small local models reason about architecture like large ones.
Reads imports, the focused region, and the footer, not the entire file. Keeps context tight on large codebases without losing structural awareness.
Every model edit auto-committed to .miii/shadow.git. /undo rolls back individual AI commits across sessions, not just the current one.
Pre-edit snapshot of every file before the write. Esc rolls back the entire turn. Survives crashes and restarts.
OS-level sandbox on every shell command. Write access limited to project dir. Defense-in-depth on top of permission prompts.
New
·Claude Skills
miii now loads Claude skills directly — install any npm skill package and it surfaces as a native slash command in your session. No wiring, no manifest editing. The skill runs inside your context engine, so Beacon keeps compressing while it executes.
npm i -g <skill> — that's it. miii discovers the skill at startup. No config file to edit, no restart required after the first load.
Every installed skill registers as a /command in the session. Tab-complete works. Beacon compresses skill output the same way it compresses tool output.
Skills run against whichever model is active. Switch to a local Ollama model with /model and the skill keeps working — no cloud dependency baked in.
Also in the ecosystem
Chat UI connecting to your local Ollama models. Runs in your browser. No account, no telemetry, no cloud relay.
github.com/maruakshay/miiiThe capability layer. AI security skills (threat modeling, code review, vulnerability analysis) available to both the CLI and the web app.
github.com/maruakshay/mii-ai-securityGet started
Runs on your hardware. Minimum to get started:
No GPU? CPU-only works. Slower inference, especially on larger models.