a CLI that turns 7B reasoning into 70B

Your model
isn't the problem.
The context it received was.

Every other tool injects everything. The full 500-line file. The entire build log. All 400 lines of test output. By turn 10, the context is 80% noise. Your 7B model can't find the signal. Not because it's weak. Because other tools buried it.

Beacon compresses every tool output the moment it's produced and reinjects your goal before every response. It runs in microseconds, inside the 176 KB package, at $0 forever.

npm i -g miii-cli

$0 local cost
176 KB install size
95% context saved
50+ tool calls deep


you've seen this

Turn 3, it's sharp.
Turn 12, it's gone.

You didn't imagine it. Every AI coding agent degrades over long sessions. It re-reads files it just read. It forgets what you asked. It starts guessing. It edits the wrong file. You called it context drift. You called it model limitations. You upgraded the model, tried a bigger one, tried prompt engineering.

The degradation always came back. Because the model was never the problem.

agent session · no Beacon

turn 01 · Goal parsed. First file read. First edit lands on target.
turn 03 · Second tool call. Still on task.
turn 07 · Re-reads src/auth.ts, already read at turn 02. [drift]
turn 10 · Original goal pushed out of the visible context window. [lost]
turn 12 · Edits the wrong file. Loops on stale output. [failing]
turn 15 · Context window full. Session terminated. [dead]

You blamed the model. You paid for a bigger one. The problem wasn't intelligence. It was what the tool was feeding into the context window.

why it happens

Every other tool
poisons the context.

They don't compress. They inject. Read a 500-line file: 2,000 tokens injected. Run a build: 1,200 more. Run tests: 3,000 more. After 10 tool calls, you have 15,000+ tokens of noise. Your original goal has been pushed out of the visible window entirely.

The model isn't degrading. It was never given a chance to succeed.

Operation	Other tools	Beacon	Saved
Read 500-line file	~2,000 tokens	~480 tokens	76% less
Run build command	~1,200 tokens	~120 tokens	90% less
Run test suite	~3,000 tokens	~300 tokens	90% less
Goal retention	gone by turn 8	every turn	always

Multiply by 10 reads, 5 build runs, 3 test runs across a session: you're injecting 35,000+ tokens of noise. A 7B model with 8K context doesn't forget. It runs out of room.
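The session math above is easy to check. A minimal TypeScript sketch, using only the per-operation estimates from the table (the page's approximations, not measurements) and assuming "8K" means an 8,192-token window:

```typescript
// Per-operation token estimates copied from the comparison table above.
// These are approximations, not measurements.
interface OpCost {
  name: string;
  uncompressed: number; // tokens injected when the full output goes into context
  compressed: number;   // tokens injected after Beacon compression
  count: number;        // times the operation runs in the example session
}

const session: OpCost[] = [
  { name: "read 500-line file", uncompressed: 2000, compressed: 480, count: 10 },
  { name: "run build command", uncompressed: 1200, compressed: 120, count: 5 },
  { name: "run test suite", uncompressed: 3000, compressed: 300, count: 3 },
];

// Sum one column across the session, weighting each operation by how often it runs.
function sessionTotal(ops: OpCost[], pick: (op: OpCost) => number): number {
  return ops.reduce((sum, op) => sum + pick(op) * op.count, 0);
}

const uncompressedTotal = sessionTotal(session, (op) => op.uncompressed); // 35000
const compressedTotal = sessionTotal(session, (op) => op.compressed);     // 6300

// Assumption: "8K context" means 8,192 tokens. This ignores the system prompt,
// the goal block, and the model's own replies, which also take space.
const contextWindow = 8192;

console.log(`uncompressed: ${uncompressedTotal} tokens, fits: ${uncompressedTotal <= contextWindow}`);
console.log(`compressed:   ${compressedTotal} tokens, fits: ${compressedTotal <= contextWindow}`);
```

A real session also spends tokens on prompts and replies, so treat both totals as lower bounds.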

Built in-house

miii-cli only

Beacon is why 7B
outperforms 70B in a long session.

Not because we have a better model. Because Beacon feeds the model exactly what it needs and nothing else. Every tool output is compressed at the moment it's produced: file reads become excerpts, build output becomes first and last lines, failures surface, noise disappears.

Then, before every single response, Beacon reinjects your original goal. The model always knows where it is, what it's doing, and why: still coherent at depth 20, where every other tool is dead by depth 9.

Context window at each depth (chart): without Beacon, uncompressed tool output accumulates until the context is full and the session crashes at depth 9 (✗ context full). With Beacon, compressed output plus a goal block at every turn lets the session complete at depth 20 (✓ complete).

Every other tool

Accept degradation, or pay for summarization

✗ Full file dumps injected: thousands of tokens per read
✗ Complete command stdout, even 500-line build logs
✗ Goal forgotten by turn 8
✗ LLM summarization: extra cost, extra latency, lossy

miii + Beacon

In-house engine, ships free with every install

✓ File reads: line count + excerpt only. Signal, not noise.
✓ Commands: first + last lines; failures surface automatically
✓ Goal reinjected before every response: it can never fall out of context
✓ Pure strings, microseconds, zero API calls, zero added cost

How Beacon works

Per-tool compression

File reads show line counts and excerpts, not full content. Command output shows first and last lines. Test results surface only failures. Context stays dense and relevant.
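Here is what per-tool compression can look like with nothing but string operations. This is an illustrative TypeScript sketch, not miii's source: the function names, head/tail sizes, and the simple fail/error pattern are all assumptions.

```typescript
// Illustrative per-tool compressors built from plain string operations.
// Names, default sizes, and the failure pattern are assumptions, not miii's code.

// Heuristic: a line "looks like a failure" if it mentions fail or error (any case).
const FAILURE = /fail|error/i;

// File reads: line count plus the first few lines as an excerpt.
function compressFileRead(path: string, content: string, excerptLines = 20): string {
  const lines = content.split("\n");
  const excerpt = lines.slice(0, excerptLines).join("\n");
  const rest = lines.length - excerptLines;
  return `${path}: ${lines.length} lines\n${excerpt}` + (rest > 0 ? `\n... (${rest} more lines)` : "");
}

// Command output: first and last lines, plus any failure lines from the middle.
function compressCommandOutput(output: string, head = 5, tail = 5): string {
  const lines = output.split("\n");
  if (lines.length <= head + tail) return output;
  const middle = lines.slice(head, lines.length - tail);
  const failures = middle.filter((line) => FAILURE.test(line));
  return [
    ...lines.slice(0, head),
    ...failures,
    `[${middle.length - failures.length} lines omitted]`,
    ...lines.slice(lines.length - tail),
  ].join("\n");
}

// Test output: failure lines plus the final summary line; passing tests are dropped.
function compressTestOutput(output: string): string {
  const lines = output.split("\n");
  const summary = lines[lines.length - 1];
  const failures = lines.filter((line) => FAILURE.test(line));
  if (failures.length === 0) return `no failures (${lines.length} lines omitted)\n${summary}`;
  return (failures.indexOf(summary) !== -1 ? failures : [...failures, summary]).join("\n");
}
```

A keyword filter like this is deliberately crude: it drops the assertion detail under a failing test, so a production version would likely keep a few lines of context around each failure.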

Goal injection

Beacon extracts your objective once at session start, then reinjects it synchronously before every response. Your goal can never scroll out of context, so the model always sees what you actually asked.
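One way to picture goal reinjection: rebuild the message list before every model call, trimming old turns when space runs out but always appending the goal block. A hypothetical TypeScript sketch using the common role/content message shape; the names and the character budget (a rough stand-in for tokens) are assumptions.

```typescript
// Hypothetical sketch of goal reinjection with a size budget.
type Role = "system" | "user" | "assistant" | "tool";
interface ChatMessage {
  role: Role;
  content: string;
}

// Extract the goal once, at session start (here: simply the first user message, truncated).
function makeGoalBlock(firstUserMessage: string, maxChars = 400): string {
  return `Current goal (set at session start): ${firstUserMessage.trim().slice(0, maxChars)}`;
}

// Rebuild the context before every model call. Older turns are dropped first when the
// budget is tight, but the goal block is always appended last, so it is never lost.
function buildContext(
  systemPrompt: string,
  turns: ChatMessage[],
  goalBlock: string,
  budgetChars: number, // characters as a rough stand-in for tokens
): ChatMessage[] {
  let used = systemPrompt.length + goalBlock.length;
  const kept: ChatMessage[] = [];
  // Walk newest-first, keeping turns while they still fit.
  for (let i = turns.length - 1; i >= 0; i--) {
    const size = turns[i].content.length;
    if (used + size > budgetChars) break;
    kept.unshift(turns[i]);
    used += size;
  }
  return [
    { role: "system", content: systemPrompt },
    ...kept,
    { role: "system", content: goalBlock },
  ];
}
```

Putting the goal block last keeps it next to the newest tokens. Whether a mid-conversation system message is honored depends on the model's chat template, so a real implementation might attach it to the latest user turn instead.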

Zero overhead

Pure string operations measured in microseconds. No embeddings, no summarization LLM calls, no hidden API costs. Beacon runs entirely in-process.

Who it's for

Privacy first

Your code never leaves your machine. Nothing sent to Anthropic, OpenAI, or anyone. Healthcare, fintech, defense: miii is built for you.

Ollama user

16 GB RAM, a GPU: if you're already running Ollama, miii adds $0 to your stack. No API key, no subscription, no invoice. The machine you own becomes the tool.

Model explorer

Try Llama 3.2, Qwen2.5-Coder, or any other Ollama model side by side. Switch live mid-session with /model. You're not locked to one provider's release schedule.

Air-gapped orgs

Works where cloud AI can't: isolated networks, regulated industries, zero-internet environments. No telemetry, no callbacks, no exceptions.

Honest comparison

You're probably paying for something
miii does for free.

OpenCode is also free and local. The rows where miii stands alone are features none of the other tools ship, Beacon chief among them.

Feature	miii	Claude Code	OpenCode	Pi	Codex
Monthly cost	$0	Pro/Max	$0	API key	Pro plan
Bundle size	176 KB	—	—	—	—
Beacon context mgmt★	✓	✗	✗	✗	✗
Air-gapped	✓	✗	✗	✗	✗
Switch model live	✓ /model	✗	✗	✓ /model	✗
File checkpoints	✓	✗	✗	✗	✗
Shadow git	✓	✗	✗	✗	✗
Call graph (AST)	✓	✗	✗	✗	✗
Shell sandbox	✓	partial	✗	partial	cloud
Permission gates	✓	✓	partial	partial	partial
Skills / npm	✓	skills	✗	partial	✗
MCP client	✓	✓	✓	✗	partial

* Bundle size: competitor sizes not independently verified; shown for miii only.

Bottom line

Cloud tools are good at what they do. But if you need any model, no bill, genuine offline use, Beacon's context engine, and undo that survives a session restart: that's miii.

miii-cli

Not just Beacon.

176 KB. Install once, point at Ollama, and every feature below works: offline, on your hardware, with any model Ollama supports. No hidden dependencies, no cloud callbacks.

github.com/maruakshay/miii-cli

Beacon

Purpose-built context engine. Compresses per-tool, injects goal state, zero LLM overhead. Agents stay coherent at 50 tool calls, not just 5.

Call graph (AST)

Static map of every function, class, and method across your codebase. Pure parser, no model needed. Lets small local models reason about architecture like large ones.

Windowed file reads

Reads imports, the focused region, and the footer, not the entire file. Keeps context tight on large codebases without losing structural awareness.
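A windowed read can be sketched as a set of line indices to keep. Assumptions for this hypothetical TypeScript version: imports are detected with a simple `import` heuristic, and the window radius and footer size are arbitrary defaults.

```typescript
// Hypothetical windowed read: import lines, a region around the focus line, and the
// footer, with skipped stretches collapsed into one marker line.
function windowedRead(content: string, focusLine: number, radius = 10, footerLines = 5): string {
  const lines = content.split("\n");
  const keep = new Set<number>();

  // Imports: any line that looks like an ES module import (TypeScript/JavaScript heuristic).
  lines.forEach((line, i) => {
    if (/^\s*import\s/.test(line)) keep.add(i);
  });

  // Focused region: focusLine is 1-based, like editor line numbers.
  const center = focusLine - 1;
  for (let i = Math.max(0, center - radius); i <= Math.min(lines.length - 1, center + radius); i++) {
    keep.add(i);
  }

  // Footer: the last few lines, where exports often live.
  for (let i = Math.max(0, lines.length - footerLines); i < lines.length; i++) keep.add(i);

  // Emit kept lines in order with 1-based numbers, marking each gap once.
  const out: string[] = [];
  let prev = -1;
  for (const i of Array.from(keep).sort((a, b) => a - b)) {
    if (i > prev + 1) out.push(`... (${i - prev - 1} lines skipped)`);
    out.push(`${i + 1}: ${lines[i]}`);
    prev = i;
  }
  return out.join("\n");
}
```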

Shadow git

Every model edit is auto-committed to .miii/shadow.git. /undo rolls back individual AI commits across sessions, not just the current one.
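A shadow repository can live beside the project's own .git by pointing git at a separate directory with --git-dir and --work-tree. The sketch below only builds argument lists, to be run with cwd set to the project root (for example via Node's child_process.execFileSync). The git options are standard; the layout, commit identity, and function names are assumptions, not miii's implementation.

```typescript
// Hypothetical layout and identity; the git options themselves are standard.
// Assumes each argv is executed with cwd set to the project root.
const SHADOW_GIT_DIR = ".miii/shadow.git";
const IDENTITY = ["-c", "user.name=miii", "-c", "user.email=miii@localhost"];

// Point git at the shadow repo while using the project directory as the work tree,
// so the project's own .git is never read or written.
function shadowGit(...args: string[]): string[] {
  return ["git", ...IDENTITY, `--git-dir=${SHADOW_GIT_DIR}`, "--work-tree=.", ...args];
}

// One-time setup: a bare repository that is only ever used through --work-tree.
const initShadow: string[] = ["git", "init", "--bare", SHADOW_GIT_DIR];

// After each model edit: stage everything except miii's own directory, then commit.
// (git commit exits non-zero when nothing changed, so callers should tolerate that.)
function commitEdit(message: string): string[][] {
  return [
    shadowGit("add", "-A", "--", ".", ":(exclude).miii"),
    shadowGit("commit", "-m", message),
  ];
}

// /undo for a single AI commit: a revert in the shadow repo also restores the files.
function undoCommit(sha: string): string[] {
  return shadowGit("revert", "--no-edit", sha);
}
```

git revert refuses or conflicts when later edits touched the same files, so a real /undo built this way needs handling for that case.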

File checkpoints

Every file is snapshotted before each write. Esc rolls back the entire turn. Snapshots survive crashes and restarts.
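Per-turn checkpoints boil down to "snapshot before the first write, restore on rollback." A hypothetical TypeScript sketch with an in-memory store; surviving crashes would also require persisting the snapshots to disk, which is not shown.

```typescript
// Hypothetical per-turn checkpoints over an abstract file store.
interface FileStore {
  read(path: string): string | undefined; // undefined means the file does not exist
  write(path: string, content: string): void;
  remove(path: string): void;
}

class TurnCheckpoint {
  // Content of each touched file as it was before its first write in this turn.
  private snapshots = new Map<string, string | undefined>();
  private store: FileStore;

  constructor(store: FileStore) {
    this.store = store;
  }

  // Route every model write through here: snapshot first, then write.
  write(path: string, content: string): void {
    if (!this.snapshots.has(path)) this.snapshots.set(path, this.store.read(path));
    this.store.write(path, content);
  }

  // Undo the whole turn (e.g. on Esc): restore originals, delete files created this turn.
  rollback(): void {
    this.snapshots.forEach((original, path) => {
      if (original === undefined) this.store.remove(path);
      else this.store.write(path, original);
    });
    this.snapshots.clear();
  }

  // Accept the turn: forget the snapshots so the next turn starts fresh.
  commit(): void {
    this.snapshots.clear();
  }
}

// Map-backed store for demonstration only.
class MemoryStore implements FileStore {
  files = new Map<string, string>();
  read(path: string): string | undefined {
    return this.files.get(path);
  }
  write(path: string, content: string): void {
    this.files.set(path, content);
  }
  remove(path: string): void {
    this.files.delete(path);
  }
}
```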

Shell sandbox

Every shell command runs in an OS-level sandbox, with write access limited to the project directory. Defense in depth on top of permission prompts.
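The OS-level enforcement can't be shown in a few lines, but the policy it enforces can. A sketch of the "is this write inside the project directory?" check, assuming POSIX-style paths, a project directory other than the filesystem root, and no symlinks (which a real sandbox must resolve):

```typescript
// Policy check only: OS-level enforcement is not shown. POSIX-style paths are assumed,
// symlinks are not resolved, and projectDir is assumed not to be "/".
function normalizePosix(path: string): string {
  const parts: string[] = [];
  for (const segment of path.split("/")) {
    if (segment === "" || segment === ".") continue; // skip empty and current-dir segments
    if (segment === "..") parts.pop(); // step up; popping past the root is a no-op
    else parts.push(segment);
  }
  return "/" + parts.join("/");
}

// Allow a write only if the normalized target is the project directory or inside it.
function isWriteAllowed(projectDir: string, target: string): boolean {
  const root = normalizePosix(projectDir);
  const absolute = normalizePosix(target.startsWith("/") ? target : `${root}/${target}`);
  // Compare with a trailing slash so "/home/u/proj2" does not count as inside "/home/u/proj".
  return absolute === root || absolute.startsWith(root + "/");
}
```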

New

Claude Skills

Drop in any Claude skill.
Zero config. Instant slash command.

miii now loads Claude skills directly: install any npm skill package and it surfaces as a native slash command in your session. No wiring, no manifest editing. The skill runs inside miii's context engine, so Beacon keeps compressing while it executes.

Install once

npm i -g <skill>, then start miii: skills are discovered at startup. No config file to edit, and once a skill has loaded, no further restarts are needed.

Native slash commands

Every installed skill registers as a /command in the session. Tab-complete works. Beacon compresses skill output the same way it compresses tool output.
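Registering discovered skills as slash commands, with prefix tab-completion, might look like this. A hypothetical TypeScript sketch: the Skill shape and class name are illustrative, and discovery from globally installed npm packages is not shown.

```typescript
// Hypothetical shape of a discovered skill; discovery itself is not shown.
interface Skill {
  name: string;
  description: string;
}

class SlashRegistry {
  private commands = new Map<string, Skill>();

  // Each skill becomes "/<name>"; refuse to silently override an existing command.
  register(skill: Skill): void {
    const command = "/" + skill.name;
    if (this.commands.has(command)) throw new Error(`duplicate command ${command}`);
    this.commands.set(command, skill);
  }

  // Tab-completion: every registered command that starts with what the user typed.
  complete(prefix: string): string[] {
    return Array.from(this.commands.keys())
      .filter((command) => command.startsWith(prefix))
      .sort();
  }

  // Split input like "/threat-model src/auth/middleware.ts" into a skill and its arguments.
  resolve(input: string): { skill: Skill; args: string } | undefined {
    const [command, ...rest] = input.trim().split(/\s+/);
    const skill = this.commands.get(command);
    return skill ? { skill, args: rest.join(" ") } : undefined;
  }
}
```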

Any model, any skill

Skills run against whichever model is active. Switch to a local Ollama model with /model and the skill keeps working, with no cloud dependency baked in.

Example: add a security skill

npm i -g mii-ai-security
# restart miii; the skill is auto-discovered
/threat-model src/auth/middleware.ts

Also in the ecosystem

miii

web app

Chat UI connecting to your local Ollama models. Runs in your browser. No account, no telemetry, no cloud relay.

github.com/maruakshay/miii

mii-ai-security

skills

The capability layer. AI security skills (threat modeling, code review, vulnerability analysis) available to both the CLI and the web app.

github.com/maruakshay/mii-ai-security

Get started

Three steps. Then you're in.

Runs on your hardware. Minimum to get started:

RAM: 16 GB minimum
GPU: dedicated recommended
Storage: 10 GB+ for models

No GPU? CPU-only works. Slower inference, especially on larger models.

01 Install Ollama

# install Ollama first → ollama.ai (Homebrew shown here)
brew install ollama
# start the Ollama server if it isn't already running
ollama serve &
ollama pull llama3.2

02 Install miii-cli · latest · 176 KB

npm i -g miii-cli

03 Run it

miii

miii web app

# clone & run locally

git clone https://github.com/maruakshay/miii

cd miii && npm install && npm run dev
