A CLI that turns 7B reasoning into 70B

Your model
isn't the problem.
The context it received was.

Every other tool injects everything. The full 500-line file. The entire build log. All 400 lines of test output. By turn 10, the context is 80% noise. Your 7B model can't find the signal. Not because it's weak. Because other tools buried it.

Beacon compresses every tool output at the moment it's produced, reinjects your goal before every response, and runs in microseconds, inside the 176 KB binary, at $0 forever.

$0 local cost
176 KB install size
95% context saved
50+ tool calls deep

you've seen this

Turn 3, it's sharp.
Turn 12, it's gone.

You didn't imagine it. Every AI coding agent degrades over long sessions. It re-reads files it just read. It forgets what you asked. It starts guessing. It edits the wrong file. You called it context drift. You called it model limitations. You upgraded the model, tried a bigger one, tried prompt engineering.

The degradation always came back. Because the model was never the problem.

Turn | Agent session, no Beacon | Status
--- | --- | ---
01 | Goal parsed. First file read. First edit lands on target. |
03 | Second tool call. Still on task. |
07 | Re-reads src/auth.ts, already read at turn 02. | drift
10 | Original goal pushed out of the visible context window. | lost
12 | Edits the wrong file. Loops on stale output. | failing
15 | Context window full. Session terminated. | dead

You blamed the model. You paid for a bigger one. The problem wasn't intelligence. It was what the tool was feeding into the context window.

why it happens

Every other tool
poisons the context.

They don't compress. They inject. Read a 500-line file: 2,000 tokens injected. Run a build: 1,200 more. Run tests: 3,000 more. After 10 tool calls, you have 15,000+ tokens of noise. Your original goal has been pushed out of the visible window entirely.

The model isn't degrading. It was never given a chance to succeed.

Operation | Other tools | Beacon | Saved
--- | --- | --- | ---
Read 500-line file | ~2,000 tokens | ~480 tokens | 76% less
Run build command | ~1,200 tokens | ~120 tokens | 90% less
Run test suite | ~3,000 tokens | ~300 tokens | 90% less
Goal retention | gone by turn 8 | every turn | always

Multiply by 10 reads, 5 build runs, 3 test runs across a session: you're injecting 35,000+ tokens of noise. A 7B model with 8K context doesn't forget. It runs out of room.
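The session arithmetic above can be checked in a few lines. This is a minimal TypeScript sketch that uses the approximate per-operation figures from the table. It assumes every operation of a kind costs the same and that "8K context" means an 8,192-token window. It counts tool output only, not the system prompt, the goal, or the model's own replies.

```typescript
// Token totals for the session described above, using the table's
// approximate per-operation figures. Assumption: every operation of a kind
// costs the same number of tokens.
type Op = "read" | "build" | "test";

const perOp: Record<Op, { raw: number; compressed: number }> = {
  read: { raw: 2000, compressed: 480 },
  build: { raw: 1200, compressed: 120 },
  test: { raw: 3000, compressed: 300 },
};

// 10 file reads, 5 build runs, 3 test runs.
const session: Record<Op, number> = { read: 10, build: 5, test: 3 };

function totalTokens(mode: "raw" | "compressed"): number {
  return (Object.keys(session) as Op[]).reduce(
    (sum, op) => sum + session[op] * perOp[op][mode],
    0,
  );
}

const CONTEXT_WINDOW = 8192; // assumed size of an "8K" window

const raw = totalTokens("raw"); // 20,000 + 6,000 + 9,000
const compressed = totalTokens("compressed"); // 4,800 + 600 + 900
console.log(`raw: ${raw} tokens, fits in 8K: ${raw <= CONTEXT_WINDOW}`);
console.log(`compressed: ${compressed} tokens, fits in 8K: ${compressed <= CONTEXT_WINDOW}`);
```

Raw tool output alone overflows the window several times over. The compressed total leaves room for the conversation itself.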

Built in-house · miii-cli only

Beacon is why 7B
outperforms 70B in a long session.

Not because we have a better model. Because Beacon feeds the model exactly what it needs and nothing else. Every tool output is compressed at the moment it's produced: file reads become excerpts, build output becomes first and last lines, failures surface, noise disappears.

Then, before every single response, Beacon reinjects your original goal. The model always knows where it is, what it's doing, and why. Coherent at depth 20, where every other tool is dead at depth 9.

Context window at each depth (chart): without Beacon, uncompressed token accumulation fills the context window and the session crashes at depth 9. With Beacon, compressed accumulation lets the session complete at depth 20.

Every other tool

Accept degradation, or pay for summarization

  • Full file dumps injected — thousands of tokens per read
  • Complete command stdout, even 500-line build logs
  • Goal forgotten by turn 8
  • LLM summarization: extra cost, extra latency, lossy

miii + Beacon

In-house engine, ships free with every install

  • File reads: line count + excerpt only — signal, not noise
  • Commands: first + last lines, failures surface automatically
  • Goal reinjected before every response, so drift is structurally impossible
  • Pure strings, microseconds, zero API calls, zero added cost

How Beacon works

Per-tool compression

File reads show line counts and excerpts, not full content. Command output shows first and last lines. Test results surface only failures. Context stays dense and relevant.
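Here is a minimal TypeScript sketch of per-tool compression along these lines. The function names, excerpt and head/tail sizes, and the failure pattern are hypothetical choices made for illustration. This is not Beacon's source.

```typescript
// Per-tool output compression, sketched. Names, window sizes and the
// failure regex are illustrative assumptions, not Beacon's implementation.

function toLines(text: string): string[] {
  // Drop one trailing newline so "a\nb\n" counts as 2 lines, not 3.
  return text.replace(/\n$/, "").split("\n");
}

// File read: line count plus a short excerpt from the top of the file.
function compressFileRead(path: string, content: string, excerpt = 20): string {
  const lines = toLines(content);
  const shown = lines.slice(0, excerpt);
  const more = lines.length > excerpt ? [`... +${lines.length - excerpt} more`] : [];
  return [`${path} (${lines.length} lines)`, ...shown, ...more].join("\n");
}

// Command output: keep the first and last lines, drop the middle.
function compressCommand(output: string, head = 5, tail = 5): string {
  const lines = toLines(output);
  if (lines.length <= head + tail) return lines.join("\n");
  const omitted = lines.length - head - tail;
  return [
    ...lines.slice(0, head),
    `... ${omitted} lines omitted`,
    ...lines.slice(lines.length - tail), // safe when tail is 0
  ].join("\n");
}

// Test output: surface only lines that look like failures.
function compressTestOutput(output: string, failure: RegExp = /FAIL|✗|Error/): string {
  const lines = toLines(output);
  const failures = lines.filter((line) => failure.test(line));
  return failures.length > 0
    ? failures.join("\n")
    : `no failure lines found (${lines.length} lines omitted)`;
}

// Usage: a 200-line build log shrinks to 5 lines.
const build = Array.from({ length: 200 }, (_, i) => `line ${i + 1}`).join("\n");
console.log(compressCommand(build, 2, 2));
```

Everything here is plain string slicing. No embedding or summarization call is involved, which is consistent with the microsecond cost described below.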

Goal injection

Beacon extracts your objective once at session start, then reinjects it synchronously before every response. The model can never drift from what you actually asked.
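One way to structure goal reinjection is sketched below in TypeScript, built against the shape of an Ollama `/api/chat` request body (`model`, `messages`, `stream`). `GoalTracker`, `keepLast`, and the rule that the first user message is the goal are assumptions for illustration. The key ordering is that old history is trimmed first and the goal is re-added afterward, so trimming can never remove it.

```typescript
// Goal reinjection, sketched against an Ollama-style chat request body.
// GoalTracker and keepLast are hypothetical names, not miii's API.
type Role = "system" | "user" | "assistant" | "tool";
interface Message {
  role: Role;
  content: string;
}

class GoalTracker {
  private goal: string | null = null;

  // Assumption: the first user message of the session is the goal.
  observe(msg: Message): void {
    if (this.goal === null && msg.role === "user") this.goal = msg.content;
  }

  // Called before every response: trim history, then prepend the goal.
  buildRequest(model: string, history: Message[], keepLast = 8) {
    const recent = history.slice(Math.max(0, history.length - keepLast));
    const goalMsg: Message[] =
      this.goal === null ? [] : [{ role: "system", content: `Current goal: ${this.goal}` }];
    return { model, stream: false, messages: [...goalMsg, ...recent] };
  }
}

// Usage: the original request is trimmed away, but the goal survives.
const tracker = new GoalTracker();
const history: Message[] = [];
const turns: Message[] = [
  { role: "user", content: "Fix the token refresh bug in src/auth.ts" },
  { role: "assistant", content: "Reading src/auth.ts" },
  { role: "tool", content: "src/auth.ts: excerpt" },
  { role: "assistant", content: "Running the test suite" },
];
for (const t of turns) {
  tracker.observe(t);
  history.push(t);
}
const request = tracker.buildRequest("llama3.2", history, 2);
console.log(JSON.stringify(request, null, 2));
```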

Zero overhead

Pure string operations measured in microseconds. No embeddings, no summarization LLM calls, no hidden API costs. Beacon runs entirely in-process.

Who it's for

Privacy first

Your code never leaves your machine. Nothing sent to Anthropic, OpenAI, or anyone. Healthcare, fintech, defense: miii is built for you.

Ollama user

16 GB of RAM and a GPU? If you're already running Ollama, miii adds $0 to your stack. No API key, no subscription, no invoice. The machine you own becomes the tool.

Model explorer

Try Llama 3.2, Qwen2.5-Coder, or any other Ollama model side by side. Switch live mid-session with /model. Not locked to one provider's release schedule.

Air-gapped orgs

Works where cloud AI can't: isolated networks, regulated industries, zero-internet environments. No telemetry, no callbacks, no exceptions.

Honest comparison

You're probably paying for something
miii does for free.

OpenCode is also free and local. The rows where miii stands alone are the ones no cloud tool ships, Beacon chief among them.

Feature | miii | Claude Code | OpenCode | Pi | Codex
--- | --- | --- | --- | --- | ---
Monthly cost | $0 | Pro/Max | $0 | API key | Pro plan
Bundle size* | 176 KB | | | |
Beacon context mgmt
Air-gapped
Switch model live✓ /model✓ /model
File checkpoints
Shadow git
Call graph (AST)
Shell sandboxpartialpartialcloud
Permission gatespartialpartialpartial
Skills / npmskillspartial
MCP clientpartial

* Bundle size: competitor sizes not independently verified; shown for miii only.

Bottom line

Cloud tools are good at what they do. If you need any model, no bill, genuine offline use, Beacon's context engine, and undo that survives a session restart: miii.

miii-cli

Not just Beacon.

176 KB. Install once, point at Ollama, and every feature below works: offline, on your hardware, with any model Ollama supports. No hidden dependencies, no cloud callbacks.

github.com/maruakshay/miii-cli
Beacon

Purpose-built context engine. Compresses per-tool, injects goal state, zero LLM overhead. Agents stay coherent at 50 tool calls, not just 5.

Call graph (AST)

Static map of every function, class, and method across your codebase. Pure parser, no model needed. Lets small local models reason about architecture like large ones.

Windowed file reads

Reads imports, the focused region, and the footer, not the entire file. Keeps context tight on large codebases without losing structural awareness.
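A windowed read can be sketched as a line-selection pass over the file. The radius, the footer size, and the `/^\s*import\b/` import test in this TypeScript sketch are assumptions. miii's actual windowing rules may differ.

```typescript
// Windowed file read, sketched: imports + focused region + footer.
// Sizes and the import regex are illustrative assumptions.
function windowedRead(content: string, focusLine: number, radius = 10, footer = 5): string {
  const lines = content.replace(/\n$/, "").split("\n");
  const keep = new Set<number>(); // 0-based line indices to show

  // 1. Import lines, so the model still sees the file's dependencies.
  lines.forEach((line, i) => {
    if (/^\s*import\b/.test(line)) keep.add(i);
  });

  // 2. The focused region: `radius` lines either side of focusLine (1-based).
  const center = focusLine - 1;
  const start = Math.max(0, center - radius);
  const end = Math.min(lines.length - 1, center + radius);
  for (let i = start; i <= end; i++) keep.add(i);

  // 3. The footer: the last `footer` lines (exports often live here).
  for (let i = Math.max(0, lines.length - footer); i < lines.length; i++) keep.add(i);

  // Emit kept lines with 1-based numbers and mark every skipped gap.
  const out: string[] = [];
  let prev = -1;
  for (const i of Array.from(keep).sort((a, b) => a - b)) {
    if (i > prev + 1) out.push(`... (${i - prev - 1} lines skipped)`);
    out.push(`${i + 1}: ${lines[i]}`);
    prev = i;
  }
  if (prev < lines.length - 1) out.push(`... (${lines.length - 1 - prev} lines skipped)`);
  return out.join("\n");
}

// Usage: a 102-line file, focused on line 50, shrinks to 9 lines.
const src = [
  'import { readFile } from "node:fs";',
  'import { join } from "node:path";',
  ...Array.from({ length: 100 }, (_, i) => `body ${i + 3}`),
].join("\n");
console.log(windowedRead(src, 50, 1, 2));
```

Keeping the original line numbers in the excerpt lets the model refer back to exact locations when it proposes an edit.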

Shadow git

Every model edit auto-committed to .miii/shadow.git. /undo rolls back individual AI commits across sessions, not just the current one.

File checkpoints

Snapshots every file before it is written. Esc rolls back the entire turn. Survives crashes and restarts.
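Turn-level checkpoints can be sketched in TypeScript as follows. The sketch assumes text files and a hypothetical JSON manifest path. Each file is snapshotted before its first write in a turn, and the manifest goes to disk so that a restarted process can still roll the turn back. Binding `rollback()` to Esc is left out.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Content of one file as it was before the turn's first write to it.
interface Snapshot {
  file: string;
  existed: boolean;
  content: string;
}

// Hypothetical turn checkpoint; the manifest location is an assumption.
class TurnCheckpoint {
  private readonly manifest: string;
  private readonly snapshots = new Map<string, Snapshot>();

  // Load a manifest left by a crashed session so its turn can still be undone.
  constructor(manifest: string) {
    this.manifest = manifest;
    if (fs.existsSync(manifest)) {
      const saved = JSON.parse(fs.readFileSync(manifest, "utf8")) as Snapshot[];
      for (const s of saved) this.snapshots.set(s.file, s);
    }
  }

  // Call before every write; only the first write per file per turn is captured.
  beforeWrite(file: string): void {
    const abs = path.resolve(file);
    if (this.snapshots.has(abs)) return;
    const existed = fs.existsSync(abs);
    const content = existed ? fs.readFileSync(abs, "utf8") : ""; // text files only
    this.snapshots.set(abs, { file: abs, existed, content });
    fs.writeFileSync(this.manifest, JSON.stringify(Array.from(this.snapshots.values())));
  }

  // Undo the whole turn: restore edited files, delete files the turn created.
  rollback(): void {
    for (const s of this.snapshots.values()) {
      if (s.existed) fs.writeFileSync(s.file, s.content);
      else if (fs.existsSync(s.file)) fs.rmSync(s.file);
    }
    this.clear();
  }

  // The turn was accepted: drop its snapshots.
  commit(): void {
    this.clear();
  }

  private clear(): void {
    this.snapshots.clear();
    if (fs.existsSync(this.manifest)) fs.rmSync(this.manifest);
  }
}

// Usage: snapshot, let the "model" edit, then roll the turn back.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "miii-ckpt-"));
const target = path.join(dir, "a.txt");
fs.writeFileSync(target, "original");
const turn = new TurnCheckpoint(path.join(dir, "turn.json"));
turn.beforeWrite(target);
fs.writeFileSync(target, "model edit");
turn.rollback();
console.log(fs.readFileSync(target, "utf8"));
```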

Shell sandbox

OS-level sandbox on every shell command. Write access limited to project dir. Defense-in-depth on top of permission prompts.

New · Claude Skills

Drop in any Claude skill.
Zero config. Instant slash command.

miii now loads Claude skills directly — install any npm skill package and it surfaces as a native slash command in your session. No wiring, no manifest editing. The skill runs inside your context engine, so Beacon keeps compressing while it executes.

Install once

npm i -g <skill> — that's it. miii discovers the skill at startup. No config file to edit, no restart required after the first load.

Native slash commands

Every installed skill registers as a /command in the session. Tab-complete works. Beacon compresses skill output the same way it compresses tool output.

Any model, any skill

Skills run against whichever model is active. Switch to a local Ollama model with /model and the skill keeps working — no cloud dependency baked in.

Example — add a security skill
npm i -g mii-ai-security
# restart miii — skill auto-discovered
/threat-model src/auth/middleware.ts

Also in the ecosystem

miii

web app

Chat UI connecting to your local Ollama models. Runs in your browser. No account, no telemetry, no cloud relay.

github.com/maruakshay/miii

mii-ai-security

skills

The capability layer. AI security skills (threat modeling, code review, vulnerability analysis) available to both the CLI and the web app.

github.com/maruakshay/mii-ai-security

Get started

Three steps. Then you're in.

Runs on your hardware. Minimum to get started:

RAM: 16 GB min
GPU: dedicated recommended
Storage: 10 GB+ for models

No GPU? CPU-only works. Slower inference, especially on larger models.

01 Install Ollama
# install ollama first → ollama.ai
brew install ollama
ollama pull llama3.2
02 Install miii-cli (latest · 176 KB)
npm i -g miii-cli
03 Run it
miii
miii web app
# clone & run locally
git clone https://github.com/maruakshay/miii
cd miii && npm install && npm run dev