I wanted to make exploring large codebases cheaper and faster. Along the way I ended up compressing file reads for claude without writing a parser. The savings seemed to work and the agents started behaving differently.
Parsers Need Not Apply
Can Boluk’s “The Harness Problem” introduced me to the concept of hashlines: a format where every line in a file gets a content hash identifier, so an LLM can reference lines by tag instead of reproducing code verbatim. It was a smart approach to compressing edits. But was anyone tackling reads?
Most approaches to giving LLM agents better code context rely on parsing: ASTs, tree-sitter, LSP. They work, but they require per-language grammars, build systems, and indexing steps. Humans don’t navigate code that way. We scan for structure, spot patterns, and zero in on the parts that matter. I wanted to see if there was something in that gap: a way to compress reads that was language-agnostic, required no indexing, and still fit naturally into the read/grep workflow agents already use.
Warning (I couldn't get the edit side to work reliably)
Despite being shown tagged content and told it could reference lines by tag, the model stripped the tags mentally and reproduced full text every time. Zero tag-based edits across hundreds of Edit calls. Perhaps it’s a limitation of what’s available in the hooks, or just something fundamental I’m missing. I’ll revisit it another day.
Reading, though, was a whole different story. Serving structural outlines instead of full files cut total input characters by 29-37% and read costs by 31-43%. I cut the edit experiment and shipped the reads as Strata. So what did I actually do?
Maps, Modes, and More
The core of the read optimization: instead of dumping the full file on a Read call, intercept it and serve a structural outline. Every node is tagged with its line number and a label:
```
handlers.py [1274 lines]
---
[1-15]    1:from flask import Blueprint
[16-89]   16:class UserHandler:
[16-42]     17:def get(self, user_id):
[43-89]     44:def update(self, user_id, data):
[90-156]  90:class OrderHandler:
[90-120]    91:def create(self, order_data):
[121-156]   122:def cancel(self, order_id):
[157-290] 8 similar regions sample: 157:class ProductHandler:
...
```

1,274 lines compressed to ~30 lines of outline. The hierarchy shows that UserHandler contains get and update; the 8 collapsed classes below it are structurally repetitive and get a single representative node. The same approach works on files where parsers don’t usually reach:
```
schema.sql [811 lines]
---
[1-2]     1:-- Database schema for e-commerce platform
[3-11]    4:CREATE TABLE users (
[12-21]   13:CREATE TABLE products (
[22-29]   23:CREATE TABLE orders (
[30-37]   31:CREATE TABLE order_items (
[38-90]   39:-- Seed data
[91-191]  92:INSERT INTO products (...)
[192-392] 193:INSERT INTO orders (...)
[393-793] 394:INSERT INTO order_items (...)
[794-800] 795:-- Performance indexes
[801-811] 802:-- Materialized views
```

811 lines to 11 nodes. The model sees the full structure and can expand what it needs.
Not every file benefits equally from compression. A 500-line file wastes real context on repeat reads; a 120-line file barely registers. The threshold between “always outline” and “let the first read through” ended up as two different modes:
- Mode 1 (>= 300 lines): Outline replaces untargeted reads. The model can use the outline to explore structure, or bypass it entirely by reading with offset/limit to get raw content directly.
- Mode 2 (100-299 lines): First read passes through normally. Repeat reads serve the outline. The agent already has full content in context; the outline stops it from paying for the same file twice. In testing, Mode 2 fired on files as small as 102 lines. Agents re-read files constantly during plan-edit-verify cycles. The second read is where Mode 2 earns its keep.
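The threshold logic itself is simple enough to sketch. A minimal illustration of the two modes; the names (`choose_mode`, `seen_before`) and the exact decision shape are assumptions, not Strata’s actual API:

```python
# Illustrative sketch of the two-mode threshold, not Strata's real implementation.
MODE1_MIN = 300   # Mode 1: outline replaces untargeted reads
MODE2_MIN = 100   # Mode 2: outline served only on repeat reads

def choose_mode(line_count: int, seen_before: bool, targeted: bool) -> str:
    """Decide what a Read call should be served."""
    if targeted:                              # explicit offset/limit bypasses outlines
        return "raw"
    if line_count >= MODE1_MIN:               # large files always get the outline first
        return "outline"
    if line_count >= MODE2_MIN:               # mid-size files pay for content once
        return "outline" if seen_before else "raw"
    return "raw"                              # small files never pay the outline tax
```

The point of the `targeted` escape hatch is that the outline is never a wall: any read with an explicit range gets raw content.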
Strata is a plugin built around claude hooks: scripts that fire before or after tool calls. The agent thinks it’s using normal Read/Edit tools (mostly).
- `pre-read.sh` intercepts reads and serves cached outlines.
- `post-edit.sh` invalidates the cache when files change.
- `session-start.sh` primes the agent with awareness of both modes before its first tool call.
- `subagent-start.sh` does the same for subagents: claude’s SubagentStart hook fires when a subagent spawns, so every agent in the session starts with outline awareness.
The harder question was what to put in the outline.
No Parser, No Problem
I wanted something that didn’t depend on whitespace conventions or a lot of language-specific rules.
The structural analyzer draws from the spirit of a Binary Space Partitioning (BSP) tree where the splitting heuristic is Shannon entropy.
That’s the whole idea. BSP is a technique from game rendering: pick the best plane to divide a 3D space into two halves, recurse into each half, stop when the regions are simple enough. Swap “3D space” for “contiguous lines of text” and “best plane” for “the line where information content changes most.” The analogy maps cleanly:
| BSP (3D rendering) | Strata (text files) |
|---|---|
| Pick the best splitting plane | Pick the line with the strongest entropy gradient |
| Recurse into each half | Recurse into each text segment |
| Stop when regions are uniform | Stop at MIN_REGION (5 lines) or max depth |
| Axis-aligned planes are cheap first checks | Blank lines, brackets, dedents are cheap first checks |
| Pre-placed splitting planes (portals, walls) | Visual separator comments (// ====, # ----) |
| Choose best plane from candidates | Boundary budget: keep top N by entropy score |
Per-line Shannon entropy measures character distribution diversity.
- A blank line: 0 bits.
- `{{{{{`: near 0 bits (one character repeated).
- `const result = await fetchData(url, { headers: authConfig });`: around 4 bits (many distinct characters).
Entropy transitions (the gradient between adjacent smoothed values) tell you where the content of a file changes. A jump from low-entropy import statements to high-entropy function bodies is a structural boundary, regardless of language.
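The signal is only a few lines of code. A sketch of per-line entropy and the gradient test; the 3-line smoothing window and the jump threshold are illustrative choices, not Strata’s tuned values:

```python
import math
from collections import Counter

def line_entropy(line: str) -> float:
    """Shannon entropy of the character distribution in one line, in bits."""
    if not line:
        return 0.0
    counts = Counter(line)
    n = len(line)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def boundary_candidates(lines, min_jump=1.0):
    """Indices where smoothed per-line entropy jumps: likely structural boundaries."""
    ent = [line_entropy(l) for l in lines]
    # a 3-line moving average damps single-line noise before taking the gradient
    smooth = [sum(ent[max(0, i - 1):i + 2]) / len(ent[max(0, i - 1):i + 2])
              for i in range(len(ent))]
    return [i for i in range(1, len(smooth)) if abs(smooth[i] - smooth[i - 1]) >= min_jump]
```

Run this over a file that switches from repetitive filler to dense code and the candidates cluster exactly at the transition, with no knowledge of the language involved.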
I wanted the plugin to hook into every Read call without any perceptible slowdown for the agent. It has to analyze any file, in any language, in milliseconds. No extensive grammar files, no build step, no per-language configuration.
Note (Prior art)
Binary analysis techniques pointed the way: malware analysts use entropy over sliding windows to find structural boundaries in executables without knowing what the binary contains. Hindle et al.’s “On the Naturalness of Software” reinforced the idea, showing that code has predictable entropy patterns.
In practice, cheap structural heuristics (blank lines, bracket depth returns, indentation drops) handle most files without needing full entropy computation. Entropy steps in when those shortcuts miss: dense class bodies with no blank lines, uniform data that shouldn’t be split, regions where complexity varies. A boundary budget at each recursion level limits how many splits survive, and entropy also controls depth: complex regions get more detail in the outline while boilerplate gets summarized. (The full algorithm breakdown is in Strata: The Algorithm.)
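A sketch of what the cheap-heuristics-plus-budget stage might look like. The weights, the `BUDGET` value, and the helper names are all assumptions for illustration, not Strata’s internals:

```python
# Sketch of "cheap heuristics first, budget the survivors"; weights are invented.
MIN_REGION = 5    # never split off regions smaller than this
BUDGET = 4        # max boundaries kept per recursion level

def cheap_score(prev: str, line: str) -> float:
    """Score how boundary-like the transition from `prev` into `line` is."""
    def indent(s):
        return len(s) - len(s.lstrip())
    score = 0.0
    if prev.strip() == "" and line.strip():
        score += 2.0                                  # content after a blank line
    if prev.rstrip().endswith(("}", ");")):
        score += 1.0                                  # bracket-depth return
    if line.strip() and prev.strip() and indent(line) < indent(prev):
        score += 1.0                                  # indentation drop
    stripped = line.strip()
    if stripped and set(stripped) <= set("/#=-* ") and len(stripped) > 8:
        score += 3.0                                  # visual separator comment
    return score

def split_points(lines):
    """Keep only the top-BUDGET candidates, never closer than MIN_REGION to an edge."""
    scored = [(cheap_score(lines[i - 1], lines[i]), i) for i in range(1, len(lines))]
    keep = sorted((s for s in scored if s[0] > 0), reverse=True)[:BUDGET]
    return sorted(i for s, i in keep if MIN_REGION <= i <= len(lines) - MIN_REGION)
```

Only when this pass produces nothing useful does the entropy gradient need to run, which is why the common case stays fast.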
The concrete results:
- streaming-session.ts: a 400-line class went from a large blob to showing just individual methods
- custom-tools.ts: a 316-line tools definition list went from 1 node (without entropy) to showing each tool definition
- test-dump.sql: uniform INSERT data stayed collapsed (entropy was too uniform to split meaningfully)
- Across test files: 3 improved, 2 unchanged, 0 worse versus the baseline
Measured performance: 218ms for 26,000 lines of C++, 279ms for 45,000 lines of XML. Works on anything with text.
For tasks that need precise semantic analysis, parsers still win. Strata trades structural precision for universal applicability: good enough everywhere, including where parsers don’t always work. Subdividing the tree is only half the problem, though. Repetitive regions still bloat the output.
60 Classes, One Node
Game engines use level-of-detail pruning to simplify distant geometry: if a cluster of objects looks the same from far away, render one representative instead of all of them. Strata does the same thing to the BSP tree after subdivision.
Consecutive sibling nodes are compared via Jaccard similarity on character trigrams. Runs of 3+ similar siblings collapse into a single representative node. The similarity threshold adapts based on entropy: repetitive regions collapse easily, distinct regions resist collapsing. Same signal, different job.
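The collapse step is easy to sketch. This version uses a fixed similarity threshold in place of the entropy-adaptive one described above, and the function names are illustrative:

```python
def trigrams(s: str) -> set:
    """Character trigrams of a whitespace-normalized string."""
    s = " ".join(s.split())
    return {s[i:i + 3] for i in range(len(s) - 2)} if len(s) >= 3 else {s}

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 1.0

def collapse_runs(siblings, threshold=0.55, min_run=3):
    """Collapse runs of min_run+ consecutive similar siblings into one representative."""
    out, i = [], 0
    while i < len(siblings):
        j = i + 1
        while (j < len(siblings)
               and jaccard(trigrams(siblings[j - 1]), trigrams(siblings[j])) >= threshold):
            j += 1
        if j - i >= min_run:                   # long run: keep one representative
            out.append(f"{j - i} similar regions sample: {siblings[i]}")
        else:                                  # short run: keep everything
            out.extend(siblings[i:j])
        i = j
    return out
```

Comparing only adjacent siblings keeps this linear in the number of nodes, which matters when a file yields hundreds of them.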
```
[5-26164] 60 similar regions sample: 42:class Widget0 {
```

26,000 lines of C++ with 60 repetitive class definitions become 3 nodes and 167 characters. That’s 99.98% compression (perhaps a bit meaningless, but I’d also scroll through 99% mindlessly at first). 45,000 lines of XML become 100 nodes. A 607-line mixed XML file with varied content compresses to 18 nodes: the algorithm preserves structure that varies and only collapses what repeats. (More on the similarity mechanics in The Algorithm.)
Without similarity collapse, the outline for a repetitive file would be just as long as the sections it describes. With it, 1,274 lines of request handlers reduce to ~30 lines that actually tell the model what the file contains.
That handles single files.
Cross-file relationships were the next problem.
Who Needs Imports?
The structural analyzer already identifies blocks and their headers. Headers contain definitions. Bodies contain references. Cross-referencing them by token rarity gives you a dependency graph without parsing a single import statement.
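A toy version of the idea, with a rarity cutoff standing in for the TF-IDF weighting; every name here is illustrative, not Strata’s code:

```python
import re

TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]{2,}")   # identifier-like tokens, 3+ chars

def connections(defs_by_file, body_by_file, max_df=1):
    """Directed edges (referencing file -> defining file) via rare shared tokens.

    defs_by_file: {filename: text of block headers (definitions)}
    body_by_file: {filename: text of block bodies (references)}
    A token counts as a signal only if it is defined in at most max_df files,
    a crude stand-in for TF-IDF-style rarity weighting.
    """
    defined_in = {}
    for f, text in defs_by_file.items():
        for tok in set(TOKEN.findall(text)):
            defined_in.setdefault(tok, set()).add(f)
    edges = set()
    for f, text in body_by_file.items():
        for tok in set(TOKEN.findall(text)):
            homes = defined_in.get(tok, set())
            if len(homes) <= max_df:               # rare definition: strong signal
                for home in homes - {f}:
                    edges.add((f, home))
    return edges
```

Common tokens like `class` or `def` appear in many headers, so the rarity cutoff discards them for free; the distinctive names are the ones that carry the graph.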
```
handlers.py [1274 lines]
connections: -> models.py, <- serializers.py, <-> services.py
```

When tested against a real codebase, the token-based connections closely matched the actual import graph. Closer than I expected, but I haven’t measured the match rate extensively.
The index persists to disk and grows organically as the agent explores — it only contains files that have been read, but connections accumulate across sessions. (The TF-IDF mechanics behind the connection graph are in The Algorithm.)
The key thing: zero import parsing and a promising cross-reference list.
With single-file and cross-file structure both working, the question became whether any of it actually saved tokens.
Levels of Detail on Text?
I ran Strata on greg, a TypeScript Discord bot, comparing plugin runs against an identical no-plugin baseline on the same task.
Total input characters dropped 29-37% across comparable runs. (These are cumulative characters flowing through tool results over a session, not the context window size, which is a separate constraint.)
| | Plugin (latest) | Plugin (alt run) | Baseline |
|---|---|---|---|
| Total chars | 284K | 254K | 400K |
| Read chars | 191K | 177K | 276K |
| Reads | 67 | 60 | 43 |
| Edits | 43 | 35 | 59 |
| Files touched | 16 | 14 | 16 |
| Agents | 1 | 1 | 1 |
Reads are where the plugin has leverage: edits pass through unchanged. The read column is the one to watch: 276K→191K (-31%) and 276K→177K (-36%).
A larger session on the same codebase showed the effect at scale: 7 agents, 208 reads, 22 files, 44 outlines served. Actual read cost: 831K chars. Without the plugin, conservatively: ~1,464K chars. That’s ~43% savings on reads.
Per-file compression was extreme. turn-executor.ts went from 26,400 chars to 215 chars of outline. Subagents inherited outline awareness from the SubagentStart hook and skipped outlines entirely, going straight to targeted reads of 25-50 lines based on the structural context they already had.
Outlining replaced read-whole-file-once. Exploration becomes more natural: get the gist of a codebase, then dig into details. This is how I approach unfamiliar code as well.
Note (Only three test runs on one codebase)
The savings were consistent every time, but whether this generalizes is an open question.
Two Weeks of Getting It Wrong
The clean results came from two weeks of iteration. Along the way, a few things broke in ways that taught more than the successes did.
More Compact, More Work
Midway through, I had an idea:
“Can we make the outlines more compact? Group the leaf nodes into read ranges, cut the size down.”
claude built “compact outlines”: prescriptive 60-line read ranges, shrinking the outline by 88%. I assumed smaller outlines would save more tokens.
claude responded by making 6.6x as many targeted reads. 88% smaller outline, 660% of the work.
The prescriptive language (“read exactly the ranges shown”) turned every outline entry into a read request. Instead of scanning the outline and deciding what mattered, the model treated every entry as an instruction.
Verbose outlines were better. They gave the model enough semantic detail to decide what to skip. This was counterintuitive and only showed up in the data: the agent with more information per read made fewer reads total.
Where Did the Savings Go?
Despite 29-37% lower total input, both plugin and baseline sessions showed the same ~25% context window fill when checked mid-session. The system prompt and tool definitions are ~29K tokens of fixed overhead: 58% of the main agent’s context window, identical between plugin and baseline. In short sessions, fixed cost dominates and variable savings are invisible.
The plugin’s advantage compounds over time. As sessions get longer, the variable portion grows while fixed stays at 29K. With slower context growth, summarization kicks in later, preserving decision-relevant history longer.
Subagents were the other blind spot. They start with zero shared context, so early versions of Strata couldn’t reach them. They’d receive an outline and immediately request the full file because they didn’t understand what they were looking at. Nine subagents each re-reading the same 650-line file: 810K chars of redundant reads. The SubagentStart hook fixed it: once subagents started with outline awareness, the large session’s 7 agents maintained 75% targeted reads across 208 total reads.
What Almost Killed It
Every outline-then-targeted-read costs two turns instead of one. In the original implementation, I over-constrained claude: I limited targeted reads to a hardcoded size. On a denser codebase with more files to read before editing, the plugin burned through turns. One session silently stopped mid-task (there’s a 100-turn limit). Another completed 3 of 4 steps, summarized its work with characteristic confidence, and never mentioned the fourth.
The turn tax is real. It’s the main tradeoff of the outline approach: cheaper reads, but more of them. The results held because the context savings outweighed the extra turns; but on tasks that are very turn heavy rather than context-limited, this could and did flip.
What was going wrong? Could I do better?
Provide claude the Map, Not the Route
Two things:

- Tell claude upfront about the new read paradigm.
- Give claude the agency to decide the actual file range instead of hardcoding it.
claude’s SessionStart hook fires before the agent’s first tool call (on startup, after /clear, and after context compaction). I used it to prime the agent with awareness of the outline system. Three iterations of the wording failed before one worked:
- “Use large ranges, limit=100-200”: too specific. The agent used exactly those limits, making 4-5 reads per file instead of one. Average limit: 57 lines.
- “Read the full range shown in the outline”: better, average limit jumped to 160-187. But the agent still treated outlines as mandatory first steps, doubling reads.
- “Skip the outline when editing”: still partially prescriptive. Subagents ignored the guidance and fell back to outline-then-read patterns. Net result: 836K chars vs 448K baseline. Worse than no plugin.
What worked: making the context purely informative. Explain the two modes, state the bypass mechanism, and let the model choose:
> Strata structural outlines are active for large files (300+ lines). Exploring or navigating: Read without offset/limit to get the outline. Preparing to edit: Use the outline’s line ranges to read the section you need … … For files over 2000 lines, paginate: read offset=1 limit=2000, then offset=2001 limit=2000, etc. You choose how much context to read based on what you’re changing. You choose the mode per read based on your intent.
claude immediately did the right thing. On exploration phases, it used outlines to scan structure. On edit phases, it read files in full with offset=1, limit=2000, skipping the outline entirely. Two consecutive runs confirmed: 284K total chars vs the baseline’s 400K. Then 254K on a second run: 29-37% reduction in total input.
The pattern kept repeating: tell the model how to use a tool and it follows the recipe; tell it what the tool does and it uses it appropriately. It feels counterintuitive, but it makes sense: these models lean into solving problems with tools and finding their own route.
Caution Looks Like Efficiency
When an agent has the high-level picture in its head, it reads more precisely and cheaply. Without that picture, it has to read everything, useful or not, and reads are expensive. That’s the simplest version of what happened: the plugin-equipped agent built a better model of the codebase before it started writing code, because exploring structure no longer carried a significant cost.
Entropy does more work than I initially expected: it drives boundary selection, similarity collapse, and depth control. Isolating what each signal contributes is still something I haven’t measured.
The obvious limits: one codebase, not a wide sampling. The savings are consistent every time, but whether this generalizes to other codebases, languages, or task types is an open question; time will tell.
A handful of scripts, a structural analyzer, and an interesting case of trying to get an agent to explore more like a person.