I am sorry, but everyone is getting syntax highlighting wrong

Translations: Russian

Syntax highlighting is a tool. It can help you read code faster. Find things quicker. Orient yourself in a large file.

Like any tool, it can be used correctly or incorrectly. Let’s see how to use syntax highlighting to help you work.

Christmas Lights Diarrhea

Most color themes have a unique bright color for literally everything: one for variables, another for language keywords, constants, punctuation, functions, classes, calls, comments, etc.

Sometimes it gets so bad one can’t see the base text color: everything is highlighted. What’s the base text color here?

The problem with that is, if everything is highlighted, nothing stands out. Your eye adapts and considers it a new norm: everything is bright and shiny, and instead of getting separated, it all blends together.

Here’s a quick test. Try to find the function definition here:

and here:

See what I mean?

So yeah, unfortunately, you can’t just highlight everything. You have to make decisions: what is more important, what is less. What should stand out, what shouldn’t.

Highlighting everything is like assigning “top priority” to every task in Linear. It only works if most of the tasks have lesser priorities.

If everything is highlighted, nothing is highlighted.

Enough colors to remember

There are two main use-cases you want your color theme to address:

  1. Look at something and tell what it is by its color (you can tell by reading text, yes, but why do you need syntax highlighting then?)
  2. Search for something. You want to know what to look for (which color).

1 is a direct index lookup: color → type of thing.

2 is a reverse lookup: type of thing → color.

Truth is, most people don’t do these lookups at all. They might think they do, but in reality, they don’t.

Let me illustrate. Before:

After:

Can you see it? I misspelled return as retunr, and its color switched from red to purple.

I can’t.

Here’s another test. Close your eyes (not yet! Finish this sentence first) and try to remember what color your theme uses for class names.

Can you?

If the answer for both questions is “no”, then your color theme is not functional. It might give you comfort (as in—I feel safe. If it’s highlighted, it’s probably code) but you can’t use it as a tool. It doesn’t help you.

What’s the solution? Have an absolute minimum of colors. So little that they all fit in your head at once. For example, my color theme, Alabaster, only uses four:

  • Green for strings
  • Purple for constants
  • Yellow for comments
  • Light blue for top-level definitions

That’s it! And I was able to type it all from memory, too. This minimalism allows me to actually do lookups: if I’m looking for a string, I know it will be green. If I’m looking at something yellow, I know it’s a comment.

Limit the number of different colors to what you can remember.

If you swap green and purple in my editor, it’ll be a catastrophe. If somebody swapped colors in yours, would you even notice?

What should you highlight?

Something there isn’t a lot of. Remember—we want highlights to stand out. That’s why I don’t highlight variables or function calls—they are everywhere, your code is probably 75% variable names and function calls.

I do highlight constants (numbers, strings). These are usually used more sparingly and often are reference points—a lot of logic paths start from constants.

Top-level definitions are another good idea. They give you an idea of a structure quickly.

Punctuation: it helps to separate names from syntax a little bit, and you care about names first, especially when quickly scanning code.

Please, please don’t highlight language keywords. class, function, if, else, stuff like this. You rarely look for them: “where’s that if” is a valid question, but you’ll be looking not at the if keyword itself, but at the condition after it. The condition is the important, distinguishing part. The keyword is not.

Highlight names and constants. Grey out punctuation. Don’t highlight language keywords.

Comments are important

The tradition of using grey for comments comes from the times when people were paid by line. If you have something like

of course you would want to grey it out! This is bullshit text that doesn’t add anything and was written to be ignored.

But for good comments, the situation is opposite. Good comments ADD to the code. They explain something that couldn’t be expressed directly. They are important.

So here’s another controversial idea:

Comments should be highlighted, not hidden away.

Use bold colors, draw attention to them. Don’t shy away. If somebody took the time to tell you something, then you want to read it.

Two types of comments

Another secret nobody is talking about is that there are two types of comments:

  1. Explanations
  2. Disabled code

Most languages don’t distinguish between those, so there’s not much you can do syntax-wise. But sometimes there’s a convention (e.g. -- vs /* */ in SQL); if so, use it!

Here’s a real example from Clojure codebase that makes perfect use of two types of comments:

Disabled code is gray, explanation is bright yellow

Light or dark?

According to statistics, 70% of developers prefer dark themes. Being in the other 30%, I’ve always been puzzled by that. Why?

And I think I have an answer. Here’s a typical dark theme:

and here’s a light one:

On the latter one, colors are way less vibrant. Here, I picked them out for you:

Notice how many colors there are. No one can remember that many.

This is because dark colors are, in general, less distinguishable and more muddy. Look at the hue scale as we move brightness down:

Basically, in the dark part of the spectrum, you just get fewer colors to play with. There’s no “dark yellow” or good-looking “dark teal”.

Nothing can be done here. There are no magic colors hiding somewhere that both have good contrast on a white background and look good. By choosing a light theme, you are dooming yourself to a very limited, bad-looking, barely distinguishable set of dark colors.

So it makes sense. Dark themes do look better. Or rather: light ones can’t look good. Science ¯\_(ツ)_/¯

But!

But.

There is one trick you can do, that I don’t see a lot of. Use background colors! Compare:

The first one has nice colors, but the contrast is too low: letters become hard to read.

The second one has good contrast, but you can barely see colors.

The last one has both: high contrast and clean, vibrant colors. Lighter colors are readable even on a white background since they fill a lot more area. Text is the same brightness as in the second example, yet it gives the impression of clearer color. It’s all upside, really.

UI designers have known about this trick for a while, but I rarely see it applied in code editors:

If your editor supports choosing background color, give it a try. It might open light themes for you.

Bold and italics

Don’t use them. They go into the same category as too many colors: just another way to highlight something, and you don’t need many of those, because you can’t highlight everything.

In theory, you might try to replace colors with typography. Would that work? I don’t know. I haven’t seen any examples.

Using italics and bold instead of colors

Myth of number-based perfection

Some themes pay too much attention to being scientifically uniform: all colors have exactly the same lightness, and hues are distributed evenly around the color circle.

This could be nice to know (if you have OCD), but in practice, it doesn’t work as well as it sounds:

OkLab l=0.7473 c=0.1253 h=0, 45, 90, 135, 180, 225, 270, 315

The idea of highlighting is to make things stand out. If you make all colors the same lightness and chroma, they will look very similar to each other, and it’ll be hard to tell them apart.

Our eyes are way more sensitive to differences in lightness than in color, and we should use it, not try to negate it.

Let’s design a color theme together

Let’s apply these principles step by step and see where it leads us. We start with the theme from the start of this post:

First, let’s remove highlighting from language keywords and re-introduce base text color:

Next, we remove color from variable usage:

and from function/method invocation:

The thinking is that your code is mostly references to variables and method invocation. If we highlight those, we’ll have to highlight more than 75% of your code.

Notice that we’ve kept variable declarations. These are not as ubiquitous and help you quickly answer a common question: where does this thing come from?

Next, let’s tone down punctuation:

I prefer to dim it a little bit because it helps names stand out more. Names alone can give you the general idea of what’s going on, and the exact configuration of brackets is rarely equally important.

But you might roll with base color punctuation, too:

Okay, getting close. Let’s highlight comments:

We don’t use red here because you usually need it for squiggly lines and errors.

This is still one color too many, so I unify numbers and strings to both use green:

Finally, let’s rotate colors a bit. We want to respect nesting logic, so function declarations should be brighter (yellow) than variable declarations (blue).

Compare with where we started:

In my opinion, we got a much more workable color theme: it’s easier on the eyes and helps you find stuff faster.

Shameless plug time

I’ve been applying these principles for about 8 years now.

I call this theme Alabaster and I’ve built it a couple of times for the editors I used:

It’s also been ported to many other editors and terminals; the most complete list is probably here. If your editor is not on the list, try searching for it by name: it might be built-in already! I always wondered where these color themes come from, and now I’ve become the author of one (and I still don’t know).

Feel free to use Alabaster as is or build your own theme using the principles outlined in the article—either is fine by me.

As for the principles themselves, they worked out fantastically for me. I’ve never wanted to go back, and just one look at any “traditional” color theme gives me a scare now.

I suspect that the only reason we don’t see more restrained color themes is that people never really thought about it. Well, this is your wake-up call. I hope this will inspire people to use color more deliberately and to change the default way we build and use color themes.

Permalink

Statistics made simple

I have a weird relationship with statistics: on one hand, I try not to look at it too often. Maybe once or twice a year. It’s because analytics is not actionable: what difference does it make if a thousand people saw my article or ten thousand?

I mean, sure, you might try to guess people’s tastes and only write about what’s popular, but that will destroy your soul pretty quickly.

On the other hand, I feel nervous when something is not accounted for, recorded, or saved for future reference. I might not need it now, but what if ten years later I change my mind?

Seeing your readers also helps to know you are not writing into the void. So I really don’t need much, something very basic: the number of readers per day/per article, maybe, would be enough.

Final piece of the puzzle: I self-host my web projects, and I use an old-fashioned web server instead of delegating that task to Nginx.

Static sites are popular and for a good reason: they are fast, lightweight, and fulfil their function. I, on the other hand, might have an unfinished gestalt or two: I want to feel the full power of the computer when serving my web pages, to be able to do fun stuff that is beyond static pages. I need that freedom that comes with a full programming language at your disposal. I want to program my own web server (in Clojure, sorry everybody else).

Existing options

All this led me on a quest for a statistics solution that would uniquely fit my needs. Google Analytics was out: bloated, not privacy-friendly, terrible UX, Google is evil, etc.

What is going on?

Some other JS solution might’ve been possible, but still questionable: SaaS? Paid? Will they be around in 10 years? Self-host? Are their cookies GDPR-compliant? How to count RSS feeds?

Nginx has access logs, so I tried server-side statistics that feed off those (namely, Goatcounter). Easy to set up, but then I needed to create domains, manage accounts, and monitor the process, and it wasn’t even performant enough for my server and request volume!

My solution

So I ended up building my own. You are welcome to join, if your constraints are similar to mine. This is how it looks:

It’s pretty basic, but does a few things that were important to me.

Setup

Extremely easy to set up. And I mean it as a feature.

Just add our middleware to your Ring stack and get everything automatically: collecting and reporting.

(def app
  (-> routes
    ...
    (ring.middleware.params/wrap-params)
    (ring.middleware.cookies/wrap-cookies)
    ...
    (clj-simple-stats.core/wrap-stats))) ;; <-- just add this

It’s zero setup in the best sense: nothing to configure, nothing to monitor, minimal dependency. It starts to work immediately and doesn’t ask anything from you, ever.

See, you already have your web server, why not reuse all the setup you did for it anyway?

Request types

We distinguish between request types. In my case, I am only interested in live people, so I count them separately from RSS feed requests, favicon requests, redirects, wrong URLs, and bots. Bots are particularly active these days. Gotta get that AI training data from somewhere.

RSS subscribers are live people too, in a sense, so extra work went into counting them properly. The same reader requesting feed.xml 100 times in a day only counts as one request.

Hosted RSS readers often report user count in User-Agent, like this:

Feedly/1.0 (+http://www.feedly.com/fetcher.html; 457 subscribers; like FeedFetcher-Google)

Mozilla/5.0 (compatible; BazQux/2.4; +https://bazqux.com/fetcher; 6 subscribers)

Feedbin feed-id:1373711 - 142 subscribers

My personal respect and thank you to everybody on this list. I see you.
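Pulling those counts out of the User-Agent header is a one-regex job. A sketch (the pattern is my guess, not necessarily what clj-simple-stats does):

```python
import re

def subscriber_count(user_agent):
    """Extract the self-reported subscriber count from an RSS reader's User-Agent."""
    m = re.search(r"(\d+)\s+subscribers", user_agent)
    return int(m.group(1)) if m else None

subscriber_count("Feedbin feed-id:1373711 - 142 subscribers")  # → 142
```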

Graphs

Visualization is important, and so is choosing the correct graph type. This is wrong:

A continuous line suggests interpolation. It reads as if, between 1 visit at 5 am and 11 visits at 6 am, there were moments with 2, 3, 5, and 9 visits in between. Maybe even 5.5 visits! That is not the case.

This is how a semantically correct version of that graph should look:

Some attention was also paid to having reasonable labels on axes. You won’t see something like 117, 234, 10875. We always choose round numbers appropriate to the scale: 100, 200, 500, 1K etc.
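The classic way to get labels like that is to round the raw axis step up to the nearest 1–2–5 multiple of a power of ten. A minimal sketch (not the blog’s actual code):

```python
import math

def nice_step(raw):
    """Round a raw axis step up to 1, 2, or 5 times a power of ten."""
    power = 10 ** math.floor(math.log10(raw))
    for mult in (1, 2, 5, 10):
        if mult * power >= raw:
            return mult * power

def ticks(max_value, target=5):
    """Round tick positions from 0 up past max_value."""
    step = nice_step(max_value / target)
    return list(range(0, int(max_value + step), int(step)))

ticks(117)  # → [0, 50, 100, 150]
```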

It goes without saying that all graphs have the same vertical scale and synchronized horizontal scroll.

Insights

We don’t offer much (as I don’t need much), but you can narrow reports down by page, query, referrer, user agent, and any date slice.

Not implemented (yet)

It would be nice to have some insights into “What was this spike caused by?”

Some basic breakdown by country would be nice. I do have IP addresses (for what they are worth), but I need a way to package GeoIP into some reasonable size (under 1 Mb, preferably; some loss of resolution is okay).

Finally, one thing I am really interested in is “Who wrote about me?” I do have referrers; the only question is how to separate signal from noise.

Performance. DuckDB is a good sport: it compresses data and runs columnar queries, so storing extra columns per row doesn’t affect query performance. Still, each dashboard hit is a query across the entire database, which at this moment (~3 years of data) sits around 600 MiB. I definitely need to look into building some pre-calculated aggregates.

One day.

How to get

Head to github.com/tonsky/clj-simple-stats and follow the instructions:

Let me know what you think! Is it usable to you? What could be improved?

P.S. You can try the live example at tonsky.me/stats. The data was imported from Nginx access logs, which I turned on and off on a few occasions, so it’s a bit spotty. Still, it should give you a general idea.

Permalink

Senior Backend Engineer (f/m/d) at HolidayPirates GmbH

Senior Backend Engineer (f/m/d) at HolidayPirates GmbH

€64,000 – €75,000

AHOY MATE!

Are you a backend developer passionate about Clojure and functional programming?

Do you enjoy solving complex problems and building clean, maintainable systems?

Yo-ho-ho! Then you are the pirate we are looking for!

As a Senior Backend Developer, you’ll be a key part of our backend team, designing, building, and maintaining some of the core applications of our platform. You’ll collaborate closely with engineers, product managers, and stakeholders to develop scalable, high-quality solutions using Clojure and other modern technologies.

DUTIES ON DECK
  • Architect, design, and implement backend services in Clojure.
  • Build scalable APIs and services to support high-throughput, low-latency applications.
  • Improve system performance, reliability, and observability.
  • Collaborate with product and engineering teams to deliver well-designed solutions.
  • Maintain and improve existing codebase with a focus on quality and long-term maintainability.
  • Lead technical discussions and contribute to best practices, tooling, and architecture.
YOUR TREASURE OF EXPERIENCE
  • 5+ years of backend development experience.
  • Working experience in Clojure or already learning it because of interest.
  • Good understanding of functional programming principles and best practices.
  • Experience with relational databases (e.g., PostgreSQL) and caching/key-value stores (e.g., Redis).
  • Experience with event-driven systems.
  • Familiarity with performance monitoring tools and automated testing tools.
  • Knowledge of CI/CD pipelines and DevOps practices.
  • Experience with AWS and IaC tools such as Terraform is a plus.
  • Strong communication skills and the ability to work effectively in a collaborative environment.
THE PIRATE SHIP OFFERS YOU
  • Inclusive & Diverse: We welcome all pirates from every port; whoever you are, you belong here.
  • Transparent Pay: Salary bands are clearly communicated from the start; no guesswork, just fairness.
  • Work-Life Adventure: Enjoy workations and exclusive travel perks to keep your explorer’s spirit alive.
  • Home Office Allowance: Get €50/month to keep your remote setup smooth sailing.
  • A Ticket Edenred Card: €50/month to keep you fueled, whether you’re snacking at sea or lunching in port.
  • Tools: A MacBook for work and personal use, plus any extra equipment you need (and a very lovely remote IT support crew).
  • Wellbeing Support: Access free life coaches, psychologists, and nutritionists to support your mental and physical health.
  • Learning & Growth: Use your personal training budget to level up your skills at your own pace.
  • Extra Perks: Private travel insurance, and corporate discounts with brands like Adidas, Apple, LG, and more.
  • A Truly Global Crew: Work alongside inspiring crewmates from across the globe and drop anchor in our Berlin Mitte office whenever you like (bonus: it’s dog friendly, so feel free to bring your mate!).
  • Legendary Events: From our annual summit to team get-togethers, we celebrate well and often.
  • Visa Assistance: We support relocation and visa processes for international candidates.

Permalink

Stop Round-Tripping Your Codebase: How to Cut LLM Token Usage by 80% Using Recursive Document Analysis

When you work with AI agents such as Claude, document analysis has a volume problem. Reading one 1,000-line file consumes about 10,000 tokens, and token consumption costs both money and time. Codebases with dozens or hundreds of files, a common case in real-world projects, can easily exceed 100,000 tokens when the whole thing must be considered: the agent has to read the files, comprehend them, and work out how they relate to each other. And when the task requires multiple passes over the same documents, say one pass to map the structure and another to mine the details, costs multiply rapidly.

Matryoshka is a tool for document analysis that achieves over 80% token savings while enabling interactive, exploratory analysis. The key insight is to save tokens by caching past analysis results and reusing them, so the same document lines never have to be processed twice. These ideas come from recent research on recursive language models and from retrieval-augmented generation, with a focus on efficiency. We'll see how Matryoshka unifies them into one system that maintains persistent analytical state. Finally, we'll look at some real-world results from analyzing the anki-connect codebase.

The Problem: Context Rot and Token Costs

A common task is to analyze a codebase to answer a question such as “What is the API surface of this project?” Such work includes identifying and cataloguing all the entry points exposed by the codebase.

Traditional approach:

  1. Read all source files into context (~95,000 tokens for a medium project)
  2. The LLM analyzes the entire codebase’s structure and component relationships
  3. For follow-up questions, the full context is round-tripped every turn

This creates two problems:

Token Costs Compound

The entire context has to go to the API every time. In a 10-turn conversation about a 7,000-line codebase, almost a million tokens might be processed. Most of those tokens are the same document contents, dutifully resent over and over: the same core code with every new question. This redundancy is a massive waste. It forces the model to process the same blocks of text repeatedly instead of concentrating on what’s actually novel.
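The arithmetic is easy to check. A back-of-the-envelope sketch, taking the ~95,000-token figure from above and an assumed 500 tokens of model output per turn:

```python
context_tokens = 95_000  # full codebase, resent on every turn (figure from above)
answer_tokens = 500      # assumed model output per turn
turns = 10

# Each turn resends the codebase plus all previous answers.
total = sum(context_tokens + answer_tokens * (turn - 1) for turn in range(1, turns + 1))
total  # → 972500, i.e. almost a million tokens for one conversation
```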

Context Rot Degrades Quality

As described in the Recursive Language Models paper, even the most capable models exhibit context degradation: performance declines as input length grows. The effect is task-dependent and tied to task complexity. In information-dense contexts, where the correct output requires synthesizing facts scattered across widely dispersed locations in the prompt, the decline can be especially steep. It can set in at relatively modest context lengths, long before the model reaches its maximum token capacity, and reflects a failure to maintain the connections between large numbers of informational fragments.

The authors argue that we should not be inserting prompts into the models, since this clutters their memory and compromises their performance. Instead, documents should be considered as external environments with which the LLM can interact by querying, navigating through structured sections, and retrieving specific information on an as-needed basis. This approach treats the document as a separate knowledge base, an arrangement that frees up the model from having to know everything.

Prior Work: Two Key Insights

Matryoshka builds on two research directions:

Recursive Language Models (RLM)

The RLM paper introduces a methodology that treats documents as external state against which step-by-step queries can be issued, without loading them entirely. Symbolic operations (search, filter, aggregate) are issued against this state, and only the specific, relevant results are returned, keeping the context window small while permitting analysis of arbitrarily large documents.

The key point is that the documents stay outside the model; only search results enter the context. This separation of concerns means the model never sees complete files. Instead, it issues searches to retrieve just the information it needs.

Barliman: Synthesis from Examples

Barliman, a tool developed by William Byrd and Greg Rosenblatt, shows that program synthesis is possible without precise code specifications. Instead, input/output examples are given to a solver built on a relational programming system in the spirit of miniKanren. Barliman interprets the examples as relational constraints, and the synthesis engine searches for functions that satisfy them. This makes it possible to describe what is desired through concrete test cases.

The approach is to simply show examples of the behavior one wants and let the system derive the implementation on its own. The emphasis shifts from writing long, detailed, step-by-step recipes to portraying, in a declarative fashion, what the desired goal is.

Matryoshka: Combining the Insights

Matryoshka incorporates these insights into a working system for LLM agents: a practical tool that lets agents decompose challenging tasks into a sequence of smaller, more manageable objectives.

1. Nucleus: A Declarative Query Language

Instead of issuing commands, the LLM describes what it wants, using Nucleus, a simple S-expression query language. This changes the focus from describing each step to specifying the desired outcome.

(grep "class ")           ; Find all class definitions
(count RESULTS)           ; Count them
(map RESULTS (lambda x    ; Extract class names
  (match x "class (\\w+)" 1)))

The declarative interface stays robust even when the LLM phrases requests with different vocabulary or structure, because the system resolves the underlying intent of a query rather than its surface form.

2. Pointer-Based State

The key new insight is that we can separate the results from the context. Results are now stored in the REPL state, rather than in the context.

When Claude runs (grep "def ") and gets 150 matches:

  • Traditional tools: All 150 lines are fed into context, and round-tripped every turn
  • Matryoshka: Binds matches to RESULTS in the REPL, returning only "Found 150 results"

The variable RESULTS is bound to the actual value in the REPL. The binding acts as a pointer to the data in the server’s memory. Subsequent operations (queries, filters, updates) use this reference to access the data, but the data itself never enters the conversation:

Turn 1: (grep "def ")         → Server stores 150 matches as RESULTS
                              → Context gets: "Found 150 results"

Turn 2: (count RESULTS)       → Server counts its local RESULTS
                              → Context gets: "150"

Turn 3: (filter RESULTS ...)  → Server filters locally
                              → Context gets: "Filtered to 42 results"

The LLM never sees the 150 function definitions, only the aggregated answers.
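In Python terms, the pattern looks roughly like this (a toy sketch, not Matryoshka’s actual implementation):

```python
import re

class MiniSession:
    """Toy sketch of pointer-based state: results live on the server,
    and only short summaries go back into the model's context."""

    def __init__(self, text):
        self.lines = text.splitlines()
        self.env = {}

    def grep(self, pattern):
        matches = [line for line in self.lines if re.search(pattern, line)]
        self.env["RESULTS"] = matches           # bound server-side, not in context
        return f"Found {len(matches)} results"  # only this string reaches the LLM

    def count(self):
        return str(len(self.env["RESULTS"]))

    def filter(self, pred):
        kept = [line for line in self.env["RESULTS"] if pred(line)]
        self.env["RESULTS"] = kept
        return f"Filtered to {len(kept)} results"
```

Each method returns a summary string for the conversation while the full match list stays in `self.env`.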

3. Synthesis from Examples

When queries need custom parsing, Matryoshka synthesizes functions from examples:

(synthesize_extractor
  "$1,250.00" 1250.00
  "€500" 500
  "$89.99" 89.99)

The synthesizer learns the pattern directly from the examples, extracting numeric values straight from the currency strings and entirely sidestepping hand-written regex.
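One way such a synthesizer can work is generate-and-test: try a small library of candidate patterns and keep the first one that reproduces every example. A deliberately naive sketch, not Matryoshka’s real engine:

```python
import re

# A tiny library of candidate extraction patterns, simplest first.
CANDIDATES = [
    r"(\d+\.\d+)",             # plain decimal
    r"(\d+(?:\.\d+)?)",        # integer or decimal
    r"(\d[\d,]*(?:\.\d+)?)",   # digits with thousands separators
]

def synthesize_extractor(examples):
    """Return the first pattern that reproduces every (text, value) example."""
    def value_of(s):
        return float(s.replace(",", ""))
    for pattern in CANDIDATES:
        if all(
            (m := re.search(pattern, text)) and value_of(m.group(1)) == expected
            for text, expected in examples
        ):
            return pattern
    return None
```

On the article’s examples, the two simpler candidates fail on "$1,250.00" (they extract 250.0 and 1.0), so the thousands-separator pattern wins.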

The Lifecycle

A typical Matryoshka session:

1. Load Document

(load "./plugin/__init__.py")
→ "Loaded: 2,244 lines, 71.5 KB"

The document is parsed and stored server-side. Only metadata enters the context.

2. Query Incrementally

(grep "@util.api")
→ "Found 122 results, bound to RESULTS"
   [402] @util.api()
   [407] @util.api()
   ... (showing first 20)

Each query returns a preview plus the count. Full data stays on server.

3. Chain Operations

(count RESULTS)           → 122
(filter RESULTS ...)      → "Filtered to 45 results"
(map RESULTS ...)         → Transforms bound to RESULTS

Operations chain through the RESULTS binding. Each step refines without re-querying.

4. Close Session

(close)
→ "Session closed, memory freed"

Sessions auto-expire after 10 minutes of inactivity.
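Expiry like that needs nothing more than a last-activity timestamp per session and a periodic sweep. A minimal sketch (assumed implementation, not the real one):

```python
import time

TTL = 600  # 10 minutes of inactivity, per the article

class Sessions:
    def __init__(self):
        self.last_used = {}  # session id -> last activity timestamp
        self.data = {}       # session id -> loaded documents, bindings, etc.

    def touch(self, sid, now=None):
        """Record activity for a session."""
        self.last_used[sid] = time.time() if now is None else now

    def sweep(self, now=None):
        """Drop every session idle longer than TTL; return the ids freed."""
        now = time.time() if now is None else now
        expired = [sid for sid, t in self.last_used.items() if now - t > TTL]
        for sid in expired:
            self.last_used.pop(sid)
            self.data.pop(sid, None)
        return expired
```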

How Agents Discover and Use Matryoshka

Matryoshka integrates with LLM agents via the Model Context Protocol (MCP).

Tool Discovery

When Claude Code starts, it launches Matryoshka as an MCP server and receives a tool manifest:

{
  "tools": [
    {
      "name": "lattice_load",
      "description": "Load a document for analysis..."
    },
    {
      "name": "lattice_query",
      "description": "Execute a Nucleus query..."
    },
    {
      "name": "lattice_help",
      "description": "Get Nucleus command reference..."
    }
  ]
}

Claude sees the available tools and their descriptions. When a user asks to analyze a file, Claude decides which tools to use based on the task.

Guided Discovery

The lattice_help tool returns a command reference, teaching the LLM the query language on-demand:

; Search commands
(grep "pattern")              ; Regex search
(fuzzy_search "query" 10)     ; Fuzzy match, top N
(lines 10 20)                 ; Get line range

; Aggregation
(count RESULTS)               ; Count items
(sum RESULTS)                 ; Sum numeric values

; Transformation
(map RESULTS fn)              ; Transform each item
(filter RESULTS pred)         ; Keep matching items

The agent learns capabilities incrementally rather than needing upfront training.

Session Flow

User: "How many API endpoints does anki-connect have?"

Claude: [Calls lattice_load("plugin/__init__.py")]
        → "Loaded: 2,244 lines"

Claude: [Calls lattice_query('(grep "@util.api")')]
        → "Found 122 results"

Claude: [Calls lattice_query('(count RESULTS)')]
        → "122"

Claude: "The anki-connect plugin exposes 122 API endpoints,
         decorated with @util.api()."

State persists across tool invocations within the conversation. When a document is loaded, its content is retained in memory; likewise, the results of any executed query are saved and available for later use.

Real-World Example: Analyzing anki-connect

Let's walk through a complete analysis of the anki-connect Anki plugin. Here we have a real-world codebase with 7,770 lines across 17 files.

The Task

"Analyze the anki-connect codebase: find all classes, count API endpoints, extract configuration defaults, and document the architecture."

The Workflow

The agent uses Matryoshka's prompt hints to accomplish the following workflow:

  1. Discover files with Glob
  2. Read small files directly (<300 lines)
  3. Use Matryoshka for large files (>500 lines)
  4. Aggregate across all files
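That routing decision boils down to a couple of thresholds. A sketch of the rule implied by the hints above (the cutoffs are heuristics, not hard limits):

```python
def choose_strategy(line_count):
    """Route a file per the size hints above: read small files, query large ones."""
    if line_count < 300:
        return "read directly"
    if line_count > 500:
        return "matryoshka"
    return "either"

choose_strategy(2244)  # → "matryoshka"
```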

Step 1: File Discovery

Glob **/*.py → 15 Python files
Glob **/*.md → 2 markdown files

File sizes:
  plugin/__init__.py    2,244 lines  → Matryoshka
  plugin/edit.py          458 lines  → Read directly
  plugin/web.py           301 lines  → Read directly
  plugin/util.py          107 lines  → Read directly
  README.md             4,660 lines  → Matryoshka
  tests/*.py           11 files      → Skip (tests)

Step 2: Read Small Files

Reading util.py (107 lines) reveals configuration defaults:

DEFAULT_CONFIG = {
    'apiKey': None,
    'apiLogPath': None,
    'apiPollInterval': 25,
    'apiVersion': 6,
    'webBacklog': 5,
    'webBindAddress': '127.0.0.1',
    'webBindPort': 8765,
    'webCorsOrigin': None,
    'webCorsOriginList': ['http://localhost'],
    'ignoreOriginList': [],
    'webTimeout': 10000,
}

Reading web.py (301 lines) reveals the server architecture:

  • Classes: WebRequest, WebClient, WebServer
  • JSON-RPC style API with jsonschema validation
  • CORS support with configurable origins

Step 3: Query Large Files with Matryoshka

; Load the main plugin file
(load "plugin/__init__.py")
→ "Loaded: 2,244 lines, 71.5 KB"

; Find all classes
(grep "^class ")
→ "Found 1 result: [65] class AnkiConnect:"

; Count methods
(grep "def \\w+\\(self")
→ "Found 148 results"

; Count API endpoints
(grep "@util.api")
→ "Found 122 results"

; Load README for documentation
(load "README.md")
→ "Loaded: 4,660 lines, 107.2 KB"

; Find documented action categories
(grep "^### ")
→ "Found 13 sections"
   [176] ### Card Actions
   [784] ### Deck Actions
   [1231] ### Graphical Actions
   ...

Complete Findings

Metric                   Value
Total files              17 (15 .py + 2 .md)
Total lines              7,770
Classes                  8 (1 main + 3 web + 4 edit)
Instance methods         148
API endpoints            122
Config settings          11
Imports                  48
Documentation sections   8 categories, 120 endpoints

Token Usage Comparison

Approach          Lines Processed   Tokens Used   Coverage
Read everything   7,770             ~95,000       100%
Matryoshka only   6,904             ~6,500        65%
Hybrid            7,770             ~17,000       100%

The hybrid method achieves an 82% token savings while retaining 100% coverage. It combines two strategies: direct reads preserve the unique detail in small files, while Matryoshka queries compress the large, redundant ones.

The pure Matryoshka approach ends up missing details from small files (configuration defaults, web server classes), because Claude only uses the tool to query large ones. The hybrid workflow does direct, full-content reads on small files while leveraging Matryoshka to analyze bigger files, in a kind of divide-and-conquer strategy. All that's needed is to provide the agent with an explicit hint about which strategy to use.

Why Hybrid Works

Small files (<300 lines) contain critical details:

  • util.py: All configuration defaults, the API decorator implementation
  • web.py: Server architecture, CORS handling, request schema

These fit comfortably in context, and there's no need to do anything different. Matryoshka adds value for:

  • __init__.py (2,244 lines): Query specific patterns without loading everything
  • README.md (4,660 lines): Search documentation sections on demand

Architecture

┌─────────────────────────────────────────────────────────┐
│                     Adapters                            │
│  ┌──────────┐  ┌──────────┐  ┌───────────────────────┐  │
│  │   Pipe   │  │   HTTP   │  │   MCP Server          │  │
│  └────┬─────┘  └────┬─────┘  └───────────┬───────────┘  │
│       │             │                    │              │
│       └─────────────┴────────────────────┘              │
│                          │                              │
│                ┌─────────┴─────────┐                    │
│                │   LatticeTool     │                    │
│                │   (Stateful)      │                    │
│                │   • Document      │                    │
│                │   • Bindings      │                    │
│                │   • Session       │                    │
│                └─────────┬─────────┘                    │
│                          │                              │
│                ┌─────────┴─────────┐                    │
│                │  NucleusEngine    │                    │
│                │  • Parser         │                    │
│                │  • Type Checker   │                    │
│                │  • Evaluator      │                    │
│                └─────────┬─────────┘                    │
│                          │                              │
│                ┌─────────┴─────────┐                    │
│                │    Synthesis      │                    │
│                │  • Regex          │                    │
│                │  • Extractors     │                    │
│                │  • miniKanren     │                    │
│                └───────────────────┘                    │
└─────────────────────────────────────────────────────────┘

Getting Started

Install from npm:

npm install matryoshka-rlm

As MCP Server (Claude Code / Claude Desktop)

Add to your Claude configuration:

{
  "mcpServers": {
    "lattice": {
      "command": "npx",
      "args": ["lattice-mcp"]
    }
  }
}

Programmatic Use

import { NucleusEngine } from "matryoshka-rlm";

const engine = new NucleusEngine();
await engine.loadFile("./document.txt");

const result = engine.execute('(grep "pattern")');
console.log(result.value); // Array of matches

Interactive REPL

npx lattice-repl
lattice> :load ./data.txt
lattice> (grep "ERROR")
lattice> (count RESULTS)

Conclusion

Matryoshka embodies a principle that emerged from RLM research: treat documents as external environments rather than as context to be parsed. This fundamentally changes the model's role - no longer a passive reader, it becomes an active agent that navigates and interrogates a document to extract specific information, much as a programmer browses code. Combined with Barliman-style synthesis, in which a solution is built up in a series of small, well-defined steps, and pointer-based state management, it achieves:

  • 82% token savings on real-world codebase analysis
  • 100% coverage when combined with direct reads for small files
  • Incremental exploration where each query builds on previous results
  • No context rot because documents stay outside the model

Note that variable bindings such as RESULTS refer to REPL state rather than holding data in the model's context. Each query sends mere pointers - placeholders indicating where the actual computation should occur - while the server performs the substantive work and returns only the distilled results.
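The pointer flow looks like this in the REPL syntax shown above (that grep's matches are bound to RESULTS is inferred from the REPL example, not from documented semantics):

```
(load "./data.txt")   ; document stays server-side, never enters the context
(grep "ERROR")        ; matches stored in session state as RESULTS
(count RESULTS)       ; only the name "RESULTS" travels; the count comes back
```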

The tool is open source: https://github.com/yogthos/Matryoshka

Permalink

Java’s Plans for 2026 and new curiosities from JDK Mailing Lists - JVM Weekly vol. 159

New year, new plans, new promises we’ll look back on in twelve months with a mixture of nostalgia and disappointment.

But rather than dwelling too much on those New Year’s resolutions - there’ll be plenty of opportunity for that - let’s take a look at what OpenJDK teams themselves are saying about their plans for 2026.

Thanks for reading JVM Weekly! Subscribe for free to receive new posts and support my work.

OpenJDK Plans for 2026 - Valhalla Getting Closer, Amber Not Slowing Down, Loom Nearly Complete

I have to admit: Nicolai Parlog from Oracle gave me a pleasant surprise. His New Year’s episode of Inside Java Newscast extracted concrete details from people working on individual projects. An “anonymous source” (whose identity anyone who’s ever heard Brian Goetz will immediately guess) dropped some bombs worth discussing.

Valhalla is targeting JDK 28. This is probably the most important piece of news. Our Anonymous Source revealed that value types won’t make it into JDK 27 - but not because they aren’t ready.

“We’re bringing an elephant onto a train and want to make sure we get into an empty car.”

JDK 27 forks in June, so mainline will switch to 28 then - with room for JEP 401. After value types, the queue holds nullness markers, array improvements, and primitive-wrapper unification - but those are a story for later releases.

Vector API is waiting for Valhalla. JDK 26 will see its eleventh (!) incubation - and it’ll stay that way until value types land in mainline. When that happens, the implementation will be rewritten and the API moved from jdk.incubator.vector to the proper java package.

Leyden - AOT code compilation. The AOT cache will contain not just loaded classes and method profiles, but compiled machine code as well. The runtime will be able to pull optimized code straight from the cache, dramatically reducing warmup time.

Structured Concurrency nearing finalization. After the revamp in JDK 25, the API will go through preview with minimal changes - Nicolai rates the chances of finalization this year as good. This is the last piece of the big Project Loom picture (though some would like to see a bit of “scope creep” here - in the positive way - which we’ll get to shortly).

Amber - constant patterns and pattern assignment. The team is “knee-deep in the second phase of pattern matching.” Two features are mature enough that JEPs may appear this year. We’ll examine all of this ourselves in a moment since the details from the mailing lists are very interesting, but apparently ideas for generalizing records and pattern matching to classes and interfaces are popping up on amber-spec-experts. So “things are happening.”

Babylon preparing code reflection incubation. The technology allowing frameworks to reflect over code in methods and lambdas is developing well—we should hear more this year. I’m personally eager for any announcements related to GPU support.


That’s the official plans. But as usual, the most interesting things happen at the margins—in experimental branches and mailing list discussions. And that’s exactly where Valhalla is showing that value types are just the beginning of a much more ambitious story.

Type Classes - Valhalla experiments with the next step

Maurizio Cimadamore announced on the valhalla-dev list the publication of an experimental type classes prototype - a mechanism Brian Goetz presented at JVMLS 2025 in his talk Growing the Java Language. The code landed in a new type-classes branch in the Valhalla repository.

What problem do type classes solve? Today in Java, you can’t write generic mathematical code that works on int, BigDecimal, and a hypothetical Float16 alike. Interfaces require types to explicitly implement them - you can’t say Integer implements Addable without modifying the Integer class. Type classes (known from Haskell, appearing as traits in Rust) invert this relationship: instead of requiring implementation in the class, you provide an external “witness” that says “here’s how to add values of type X.” Witnesses can be defined for any type, even someone else’s.

record MyInt(int x) { }

interface Sum<X> {
    X zero();
    X add(X a, X b);
}

__witness Sum<MyInt> SUM_MYINT = new Sum<>() {
    MyInt zero() { return new MyInt(0); }
    MyInt add(MyInt a, MyInt b) { return new MyInt(a.x + b.x); }
};

// Usage:
Sum<MyInt> sum = Sum<MyInt>.__witness;
MyInt zero = sum.zero();
MyInt one = new MyInt(1);
assert sum.add(zero, one).equals(one);

The snippet above shows what the prototype allows: defining a type class and a witness for a specific type - here, an addition type class (Sum) and its witness for a value class MyInt.

  • Sum<X> is a generic interface representing a type class for addition.

  • __witness defines a witness for how MyInt implements that type class externally.

  • The witness can then be looked up and used at runtime.

This prototype enables external definitions of operations for types without modifying the types themselves. Behavior is attached from the outside via witnesses, making it possible to state “here’s how type X does addition” without editing X at all - exactly the relationship inversion that interfaces cannot express.

Goetz explained the broader vision at JVMLS: type classes are meant to enable operator overloading for value types, collection literals, or new numeric types with full support for +, -, * - but without the operator hell known from C++, since they’d be limited to value classes only.

C++ macros are even more… interesting.

Maurizio is very clear on one point, though: this is purely a space for experimentation, not a proposal for inclusion in the Java platform, and any JEP is still a long way off.

With Valhalla, as ever, patience is part of the lesson. However, they have even more up their sleeve now!


Null Checks get concrete - The “Bang World” Prototype

Daniel Smith announced on valhalla-spec-experts another prototype branch worth watching: bworld (short for “bang world”). This one tackles the long-awaited nullness markers - specifically, the runtime enforcement side of making ! actually mean something.

The idea is straightforward: mark types with ! to indicate a non-null barrier, and have the JVM enforce it. You’ll be able to use ! on field types, local variables, method parameters, return types, casts, instanceof, and array element types. And crucially - this isn’t limited to value classes. Any reference type can be marked.

What happens at runtime? The compiler generates calls to a new java.lang.runtime.Checks API (deliberately not Objects.requireNonNull - they want the JVM to have freedom to treat these checks specially). What does that mean in practice?

  • A cast to String! will throw if you pass null.

  • A field declared as String! name must be initialized before the super() call.

  • Arrays created with new Foo![]{a, b, c} will reject null writes dynamically.

Daniel notes that the current implementation only fully supports runtime checks for value-class-typed fields and arrays - other reference types will get the metadata in the class file, but enforcement is coming later.

The prototype also includes optional lint warnings for suspicious patterns - like assigning null literals to ! targets or removing ! markers when overriding methods. But the most interesting bit is “use-site checks”: the compiler can insert null checks when calling methods from untrusted binaries! This neatly exposes the real problem: not greenfield code, but millions of libraries that were written when null markers weren’t even imaginable.
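Pulling those bullets together, here is a sketch of what bworld code might look like. The ! syntax is prototype-only and subject to change - this will not compile on any released JDK:

```java
// Hypothetical bworld syntax sketch, assembled from the cases listed above.
class User {
    String! name;                    // must be initialized before the super() call

    User(String! name) {             // passing null fails the generated Checks call
        this.name = name;
    }
}

// At a use site:
Object o = fetchSomething();
String! s = (String!) o;             // throws at runtime if o is null
String![] tags = new String![]{ "a", "b" };  // rejects null writes dynamically
```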

That’s the real challenge: how to introduce null-safety into a 30-year-old ecosystem gradually, without breaking the world? But as Daniel puts it:

This is not the final version of the feature... it’s a snapshot. But we’ve been wanting something concrete that we could play with.

The Kotlin crowd will feel right at home, except for one crucial difference: Kotlin’s null-safety lives purely in the compiler and is erased at runtime. Java is building actual JVM enforcement. Slower to arrive, but when a String! field says “never null,” the runtime will back that promise up.


Ephemeral Threads - Clojure knocks on Project Loom’s Door “With a Request”

Mama, take this badge off of me
I can’t use it anymore
It’s gettin’ dark, too dark for me to see
I feel like I’m knockin’ on heaven’s door

Since Structured Concurrency is approaching finalization, can Project Loom be considered “finished”? The community has a different opinion. A heated discussion (30+ emails in a week) erupted on the loom-dev list, initiated by Alex Miller from the Clojure team.

The topic? So-called “ephemeral threads” - threads that can be garbage collected before they finish their work. Sounds like heresy in the Java world, where for 30 years an iron rule has applied: threads are GC roots and live until they complete their work. But for the Clojure community, this has been daily bread for over a decade.

What’s the deal? The core.async library lets you create lightweight “go blocks” that wait for data from channels. If all channels become unreachable, the block can also be collected by GC - since it would never wake up anyway. Elegant and practical when building pub/sub or pipelines.

The problem: after migrating to virtual threads, the pattern stopped working. Alan Bateman, tech lead of Loom, is however skeptical about the ephemeral thread concept, pointing to deeper complications: interactions with finalizers and cleaners can lead to scenarios from mildly creepy to truly terrifying:

The possibility of GC’ing a started thread before it terminates is a scary topic. It interacts with many areas and gets really scary once you bring phantom refs, cleaners, and finalizers into the discussion.

For these so-called “forgotten sender” and “abandoned receiver” cases then it might be more interesting to see how they could work with structured concurrency. Right now, the first API is focused on fan-out scenarios but in time we would like to have it work with channel like constructs too. I suspect this will be closer to what you are interested in.

Since JDK 21, VTs are tracked by default for diagnostic tools, so “abandoned” threads hang in memory indefinitely. The flag -Djdk.trackAllThreads=false helps, but Miller rightly asks about its future, and hears from Alan Bateman:

It's clearly an attractive nuisance right now and setting it to false is specific to the root "thread grouping". There is some performance work required in that area but otherwise I think it needs to be removed.

The argument for ephemeral threads is simple: virtual threads open the door to Erlang-style architectures where lightweight processes can be abandoned when they become redundant. Miller writes directly:

Most of these constructs work as infinite loops without persistent references. You simply can’t build such libraries with the traditional approach to thread termination.

Similar voices came from other users experimenting with their own schedulers - they want to use VTs as an invisible implementation detail where the end programmer doesn’t think about threads at all.

Oracle has serious concerns, however. The main one: debugging. Viktor Klang illustrates with an example - code acquires a file descriptor, parks the thread, then releases it. If the thread gets collected while parked, the descriptor leaks without a trace.

This easily leads to problems where resources leak without any trace of who lost them - which can be nightmarish in production,

argues Klang.

Andrew Haley from Red Hat offered an interesting counterargument: if a thread is waiting on an unreachable semaphore, it will never release resources anyway - whether it takes up memory or not, the problem is the same.

There is a light at the end of the tunnel, however. Bateman suggests that cases of “abandoned” threads might play better with structured concurrency, which over time is meant to handle channel constructs as well. For Clojure, though, this means waiting - an official API for ephemeral threads probably won’t materialize, and the unofficial flag will likely disappear.

The discussion shows a broader trend: virtual threads opened Pandora’s box of new patterns that the JVM ecosystem is only beginning to explore.

Project Amber in 2026 - Pattern Assignment and Constant Patterns on the Horizon

And finally, Gavin Bierman from the Amber team shared details of plans for 2026 on the amber-spec-experts list. Beyond continuing work on Primitive Patterns (currently in preview), two new features are in the pipeline - draft JEPs should appear soon.

Pattern Assignment solves an irritating problem: sometimes we use pattern matching not because something might match, but because we want to conveniently decompose a value into parts. Today we have to write:

void process(ColorPoint cp) {
    if (cp instanceof ColorPoint(var x, var y, var c)) {
        // actual code, unnecessarily nested
    }
}

The compiler and programmer both know the pattern will always work—but the syntax forces us to pretend it’s a conditional operation. The new proposal will let you simply write:

void process(ColorPoint cp) {
    ColorPoint(var x, var y, var c) = cp;  // Pattern Assignment!
    // actual code, no nesting
}

Constant Patterns is the second proposal, simplifying a common case—matching against a specific value. Instead of:

case Point(var x, var y) when x == 0 && y == 0 -> { /* origin */ }

You’ll be able to write:

case Point(0, 0) -> { /* origin */ }

Constants (including null) will be able to appear directly as nested patterns - which will somewhat unify the awkward division between “case constants” and “case patterns.”


Finally, a small observation from my functional heart (working at the company behind Scala obliges 😉).

Looking at all these discussions, it’s hard not to notice a common denominator: functional languages and their concepts remain a key reference point for JVM evolution. Type classes are a mechanism straight from Haskell. Pattern assignment is essentially the equivalent of let with deconstruction known from ML-family languages. And ephemeral threads? It’s a request for semantics that Erlang and its descendants have treated as obvious for years.

But here’s where it gets interesting: Java isn’t so much “adopting” these concepts as conducting a sort of dialogue with them - and often says “no” or “yes, but.” Type classes will be limited to value classes to avoid “operator overloading hell.” Ephemeral threads? Bateman politely suggests that maybe structured concurrency will someday handle these cases - which in practice means “we’ll do it our way or not at all.” Pattern matching is evolving so cautiously that Scala had time to have it, stop having it (in the sense of: stop being fashionable), and have it again before Java got to constant patterns.

And this is precisely the paradox that fascinates me: paradoxically, Java has probably become the most important testing ground for functional ideas in the mainstream—not despite its conservatism, but because of it. Every feature goes through such brutal mills of backward compatibility, edge case analysis, and years of preview that what comes out the other side is... surprisingly solid. Haskell’s type classes are elegant but also notoriously difficult for the average programmer to understand. Java will probably produce something less elegant, more “corporate”- and paradoxically more useful for most developers.

Because there are some genuinely good ideas there beyond the memes and cheap laughs.

There’s a certain irony in all of this: Clojure, the language that since 2007 has been proving that functional programming on the JVM is possible and practical, is now asking for functionality it implemented itself for a decade—but can’t port to virtual threads without platform support. Alex Miller knocks on the door with a proposal inspired by Erlang, and Java responds in the style of: “interesting, but have you thought about interaction with finalizers? Because we have, and we have nightmares.”

Maybe that’s exactly why this relationship works. Functional languages explore, Java stabilizes. Erlang shows that ephemeral processes are possible, Clojure proves they work in practice on the JVM, and ten years from now Java will introduce something called “Scoped Ephemeral Task Contexts” that will work with every version back to JDK 8. And mass-market enterprise will finally get a feature that functional programmers were talking about at conferences in 2015. That’s the deal—and honestly, I’m not sure there’s a better model for evolving a programming language that has to support billions of lines of production code.

PS: If you want to go deeper - see you in person 👋

If this edition resonated and you’d like to continue the conversation offline, I’ll be talking about JVM in the Age of AI at a few upcoming events:

🇸🇪 JVM in the Age of AI: 2026 Edition @ JFokus 2026

I’ll be running a 90-minute Deep Dive session focused on what really needs to happen inside the JVM for it to remain a serious platform for AI and ML - hardware, Valhalla, Babylon, GPU offloading, TornadoVM, Llama3.java, and the 2026 perspective.


I’ll also be doing two Polish JUGs during one week, talking about “Agentic Systems beyond localhost” with more room for questions (and for me to fill 😁).

I love meetups, as they give a bit more freedom to the speaker - and I always like to adapt to what you want to dig into.

If you’re around: come say hi, argue, disagree, or just nerd out about the JVM.


Before we close this edition, one sad piece of news.

Scott Adams has passed away.

I know - he was a controversial figure, especially in his later years, and not everyone agreed with his views. But there’s no denying one thing: his humor was singular, sharp in a way that only engineers and office survivors truly appreciate.

For years, Dilbert perfectly captured the absurdities of corporate life, technical organizations, and management theater — and it did so with a precision that made it a recurring guest in this newsletter. Many of us laughed because it was funny; many of us winced because it was accurate.

Whatever one thinks about Scott Adams the person, Scott Adams the cartoonist shaped a generation of engineers and gave us a shared language to talk about dysfunction, nonsense, and the quiet heroism of surviving another meeting that should have been an email.


Rest in peace, Scott.


Permalink

Clojure Deref (Jan 13, 2026)

Welcome to the Clojure Deref! This is a weekly link/news roundup for the Clojure ecosystem (feed: RSS).

Upcoming Events

Libraries and Tools

Debut release

Updates

  • clj-async-profiler 2.0.0-beta1 - Embedded high-precision Clojure profiler

  • reitit 0.10.0 - A fast data-driven routing library for Clojure/Script

  • qclojure 0.24.0 - A functional quantum computer programming library for Clojure with backend protocols, simulation backends and visualizations.

  • clj-threats 1.0.0 - Clojure implementation of Threagile

  • csvx 973ab7f - A zero dependencies tool that enables you to control how to tokenize, transform and handle files with char(s) separated values in Clojure, ClojureScript and Babashka.

  • dompa 1.2.2 - A zero-dependency, runtime-agnostic HTML parser and builder.

  • clay 2.0.5 - A REPL-friendly Clojure tool for notebooks and datavis

  • dataspex 2026.01.1 - See the shape of your data: point-and-click Clojure(Script) data browser

  • clj-kondo 2026.01.12 - Static analyzer and linter for Clojure code that sparks joy

  • quiescent 0.1.10 - A Clojure library for composable async tasks with automatic parallelization, structured concurrency, and parent-child and chain cancellation

  • eca 0.91.1 - Editor Code Assistant (ECA) - AI pair programming capabilities agnostic of editor

  • babashka 1.12.214 - Native, fast starting Clojure interpreter for scripting

  • editscript 0.7.0 - A library to diff and patch Clojure/ClojureScript data structures

Permalink

Grounding LLMs with Recursive Code Execution

Despite context windows expanding to millions of tokens, LLMs still struggle with the fundamental task of precision. When you ask an LLM to "analyze this report," it often glances at the text and simply hallucinates a plausible-sounding answer based on probability.

A good example of the problem can be seen when asking a model to sum sales figures from a financial report. Left to its own devices, it will likely not bother reading the whole document and simply give you a made-up answer. This is especially a problem with smaller models that you can run locally.

The standard approach to dealing with this problem is to use Retrieval Augmented Generation (RAG), which relies on semantic similarity (embeddings). If you ask for "sales figures," a Vector DB retrieves chunks of text that sound like sales figures. However, semantic similarity is fuzzy and limited in functionality. Embeddings can't count, so you can't ask questions like "count the number of times X happens." They also can't handle information scattered across a bunch of unrelated lines in a document. Furthermore, they don't distinguish between concepts like "Projected Sales" and "Actual Sales" when they appear in similar contexts.
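To make the “embeddings can’t count” point concrete: an exact count is one line of code, while no similarity lookup can produce it. countOccurrences below is a made-up helper for illustration, not part of any RAG library:

```typescript
// Exact counting is trivial for code and impossible for similarity search.
// countOccurrences is an illustrative helper, not a library function.
function countOccurrences(text: string, pattern: RegExp): number {
  // Re-create the regex with the global flag so match() returns all hits.
  return (text.match(new RegExp(pattern.source, "g")) ?? []).length;
}

const log = "ERROR disk full\nok\nERROR timeout\nok\n";
console.log(countOccurrences(log, /ERROR/)); // 2
```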

It would be nice to have a system that treats text as a dataset to be queried rather than a prompt to be completed. This is where the Recursive Language Model paper comes in. The core idea here is that instead of having the model operate directly on the document, it uses a programmatic interface to interact with it via a REPL. The model acts as a programmer writing code to explore the document, interpreting execution results, and only then formulating an answer based on them.

The core insight is that code execution provides grounding for the model. When an LLM guesses a number by trying to understand the document, it might be right, or it might be wrong. It has no way to know. When it writes regex.match() and the computer returns ['$2,340,000'], that result is a hard fact. What the model needs to understand is how to form a query—a general task it's likely good at—instead of trying to solve a domain-specific problem it has no direct training on.

Allowing an LLM to write and run code directly on your system would obviously be a security nightmare, so the implementation uses isolated-vm to create a secure sandbox for it to play in. The model cannot hallucinate rm -rf / or curl a random URL. Having a sandbox also prevents infinite loops or memory leaks. And since the document is immutable, the model can read it but cannot alter the source truth.
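As a rough sketch of the sandbox idea, here is the same shape using Node’s built-in vm module for brevity. Note that vm is explicitly not a security boundary, which is exactly why the project reaches for isolated-vm (separate heap, memory limits, real isolation) instead:

```typescript
import * as vm from "node:vm";

// The document is the only thing the sandboxed code can see.
const documentText = "SALES_DATA_NORTH: $2,340,000\n...";
const context = vm.createContext({ document: documentText });

// Model-generated code runs against the context with a hard timeout;
// it has no access to the filesystem, network, or outer scope.
const result = vm.runInContext("document.length", context, { timeout: 1000 });
console.log(result);
```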

The process works as follows:

  1. The document is loaded into a secure, isolated Node.js environment as a read-only context variable.
  2. The model is given exploration tools: text_stats(), fuzzy_search(), and slice().
  3. The Loop:
    • The model writes TypeScript to probe the text.
    • The Sandbox executes it and returns the output.
    • The model reads the result and refines its next step.
  4. The loop iterates until the model has enough proven data to answer FINAL("...").
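The loop above can be sketched as a small driver function. askModel and runInSandbox are stand-ins for the LLM call and the sandbox, not the project’s actual API:

```typescript
// Illustrative explore-execute-refine loop (names are mine, not the project's).
async function rlmLoop(
  askModel: (transcript: string) => Promise<string>,
  runInSandbox: (code: string) => string,
  maxIters = 8,
): Promise<string | null> {
  let transcript = "";
  for (let i = 0; i < maxIters; i++) {
    const code = await askModel(transcript);       // model writes code
    const final = code.match(/^FINAL\((.*)\)$/s);  // model signals it is done
    if (final) return final[1];
    // Feed the execution result back so the next step can refine.
    transcript += `\n> ${code}\n${runInSandbox(code)}`;
  }
  return null; // iteration budget exhausted without an answer
}
```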

RLM execution model

The system can work entirely locally using something like Ollama with Qwen-Coder, or with hosted models like DeepSeek, which are much smarter by default. It also works as an MCP that you can plug in and let your agent use to solve problems.

Finally, I used Universal Tool Calling Protocol (UTCP) patterns from code-mode to generate strict TypeScript interfaces. This provides the LLM with a strict contract such as:

// The LLM sees exactly this signature in its system prompt
declare function fuzzy_search(query: string, limit?: number): Array<{
  line: string;
  lineNum: number;
  score: number; // 0 to 1 confidence
}>;

One problem is that LLMs tend to be messy coders; they forget semicolons, use hallucinated imports, etc. The way around that is to add a self-healing layer. If the sandbox throws a syntax error, a lightweight intermediate step attempts to fix imports and syntax before re-running. This keeps the reasoning chain alive and minimizes round trips to the model.
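A minimal sketch of such a repair layer, with runWithRepair and repairFn as illustrative names rather than the project’s actual API:

```typescript
// One-shot repair: if the sandbox rejects the code, ask a lightweight
// fixer (repairFn stands in for that model call) and re-run once.
function runWithRepair(
  run: (code: string) => string,
  repairFn: (code: string, error: string) => string,
  code: string,
): string {
  try {
    return run(code);
  } catch (err) {
    const fixed = repairFn(code, String(err)); // e.g. patch syntax/imports
    return run(fixed); // a second failure propagates to the caller
  }
}
```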

As a demo to test out the concept, I made a document containing a bunch of scattered data, with 5 distinct sales figures hidden inside 4,700 characters of Lorem Ipsum filler and unrelated business jargon.

Predictably, feeding the text into a standard context window and asking for the total promptly resulted in a hallucinated total of $480,490. It just grabbed numbers that looked like currency from unrelated sections, mashed them together, and called it a day.

Running the same query through RLM was a completely different story. The model took 4 iterations to converge on the actual solution. Instead of trying to guess, it started writing code to explore the document. It first checked the file size:

const stats = text_stats();
console.log(`Document length: ${stats.length}, Lines: ${stats.lineCount}`);

Next, it used fuzzy search to locate relevant lines, ignoring the noise:

const matches = fuzzy_search("SALES_DATA");
console.log(matches);
// Output: [
//   { line: "SALES_DATA_NORTH: $2,340,000", ... },
//   { line: "SALES_DATA_SOUTH: $3,120,000", ... }
// ]

And finally, it wrote a regex to parse the strings into integers and summed them programmatically to get the correct result:

// ...regex parsing logic...
console.log("Calculated Total:", total); // Output: 13000000
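The elided parsing step amounts to stripping the currency formatting and summing programmatically. An illustrative version over just the two matches shown earlier (the real run summed all five figures to reach 13,000,000):

```typescript
const lines = [
  "SALES_DATA_NORTH: $2,340,000",
  "SALES_DATA_SOUTH: $3,120,000",
];

// Pull the dollar amount out of each line, drop the commas, and sum.
const total = lines
  .map((line) => line.match(/\$([\d,]+)/)?.[1] ?? "0")
  .map((amount) => Number(amount.replace(/,/g, "")))
  .reduce((sum, n) => sum + n, 0);

console.log("Calculated Total:", total); // 5460000 for these two lines
```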

Only after the code output confirmed the math did the model commit to an answer.

The key difference is that the traditional approach asks the model "what does this document say," while the recursive coding approach asks it to "write a program to find out what this document says." The logic is now expressed using actual code, and the role of the LLM is to write the code and read the results as opposed to working with the document directly.

As with all things, there is a trade-off here: the RLM approach is slower, since it takes multiple turns and can generate more tokens as a result. However, if the document you're working with is itself large, you will actually save context tokens by not loading it directly.

MCP Integration

The project also includes an MCP (Model Context Protocol) server, making it available as a tool for coding agents like Crush. Once configured, you can ask the agent to analyze documents that would otherwise exceed its context window or require precise data extraction.

The server exposes an analyze_document tool that takes a query and file path. The tool can then use the RLM approach to explore documents by writing code, executing it in the sandbox, and iterating until it finds the answer.

This creates an interesting dynamic where your agent writes the high-level query, the RLM's backing model (which can be a local Ollama instance) does the iterative code exploration, and the verified results come back to your agent. The grounding problem is solved at the tool level, so the agent can trust the results it receives.

The implementation is available at https://github.com/yogthos/Matryoshka.

Permalink

Tetris-playing AI the Polylith way - Part 2

[Image: the Tetris AI board]

The focus in this second part of the blog series is to showcase the benefits of getting quick feedback when working with code. We'll do this by implementing the removal of complete rows when a Tetris piece is placed on the board.

For example, if we rotate the red piece in the image above and place it in the third position, the two bottom rows should be cleared:

[Image: board with the piece dropped → board with the rows cleared]

The resulting source code from this second blog post in the series can be found here:

REPL-driven development

If you've read part one of the blog series, you already know that all code will be implemented in both Python and Clojure, so let's start with the latter!

Clojure has something called a REPL (Read Eval Print Loop) that lets you write code in small steps, while getting quick feedback on whether the code works or not.

We'll start by creating a clear-rows namespace in the board component:

▾ tetris-polylith
  ▸ bases
  ▾ components
    ▾ board
      ▾ src
        clear-rows.clj
        core.clj
        interface.clj
      ▸ test
    ▸ piece
  ▸ development
  ▸ projects

Where we add a board row:

(ns tetrisanalyzer.board.clear-rows)

(def row [1 1 1 0 1 1 1 0 1 1])

In Clojure, we only need to compile the code that has changed. Since we've added a new namespace and a row, we send the entire namespace to the REPL, usually via a keyboard shortcut, to get it compiled to Java bytecode.

A complete row contains no empty cells (zeros). We can use the some function to detect the presence of empty cells:

(some zero? row) ;; true

Here at least one empty cell has been found, which means the row is not complete. Let's also test whether we can identify a complete row:

(ns tetrisanalyzer.board.clear-rows)

(def row [1 1 1 1 1 1 1 1 1 1])

(some zero? row) ;; false

Yes, it seems to work!

Now we can create a function from the code:

(ns tetrisanalyzer.board.clear-rows)

(defn incomplete-row? [row]
  (some zero? row))

(comment
  (incomplete-row? [1 1 1 1 1 1 1 0 1 1]) ;; true
  (incomplete-row? [1 1 1 1 1 1 1 1 1 1]) ;; false
  #__)

Here I've added a comment block with a couple of calls to the function. From the development environment, we can now call one function at a time and immediately see the result, while the calls don't run when we reload the namespace. It's quite common in the Clojure world to leave these comment blocks in production code so that functions can be easily called, while also serving as documentation.

We'll clean up the comment block and instead add a board so we have something to test against (commas can be omitted):

(ns tetrisanalyzer.board.clear-rows)

(defn incomplete-row? [row]
  (some zero? row))

(def board [[0 0 0 0 0 0 0 0 0 0]
            [0 0 0 0 0 0 0 0 0 0]
            [1 1 1 1 1 1 1 1 1 1]
            [1 1 1 1 1 1 0 0 1 1]
            [1 0 1 1 1 1 1 1 1 1]
            [1 1 1 1 1 1 1 1 1 1]])

Now we can calculate the rows that should not be removed:

(def remaining-rows (filter incomplete-row? board)) ;; ([0 0 0 0 0 0 0 0 0 0]
                                                    ;;  [0 0 0 0 0 0 0 0 0 0]
                                                    ;;  [1 1 1 1 1 1 0 0 1 1] 
                                                    ;;  [1 0 1 1 1 1 1 1 1 1])

The next step is to create the two empty rows that should replace the removed ones, which we finally put in empty-rows:

(def board-width (count (first board)))
(def board-height (count board))
(def num-cleared-rows (- board-height (count remaining-rows))) ;; 2
(def empty-row (vec (repeat board-width 0))) ;; [0 0 0 0 0 0 0 0 0 0]
(def empty-rows (repeat num-cleared-rows empty-row)) ;; ([0 0 0 0 0 0 0 0 0 0] [0 0 0 0 0 0 0 0 0 0])

Here&aposs what the board looks like after complete rows have been removed and new empty replacement rows have been added at the beginning:

(vec (concat empty-rows remaining-rows)) ;; [[0 0 0 0 0 0 0 0 0 0]
                                         ;;  [0 0 0 0 0 0 0 0 0 0]
                                         ;;  [0 0 0 0 0 0 0 0 0 0]
                                         ;;  [0 0 0 0 0 0 0 0 0 0]
                                         ;;  [1 1 1 1 1 1 0 0 1 1]
                                         ;;  [1 0 1 1 1 1 1 1 1 1]]

The concat function combines the two lists and creates a new list with rows, while vec then converts the list to a vector. Note that both vec and concat return immutable data, which is standard for all data structures in Clojure.

Simplify

It occurred to me that we can simplify the code somewhat.

We&aposll start by making empty-board a bit more readable by adding empty-row:

(ns tetrisanalyzer.board.core)

(defn empty-row [width]
  (vec (repeat width 0)))

(defn empty-board [width height]
  (vec (repeat height (empty-row width))))

Then we can replace:

(def empty-row (vec (repeat board-width 0)))
(def empty-rows (repeat num-cleared-rows empty-row))

With:

(def empty-rows (core/empty-board board-width num-cleared-rows))

Now we can finally use let to combine the different calculation steps into a function:

(ns tetrisanalyzer.board.clear-rows
  (:require [tetrisanalyzer.board.core :as core]))

(defn incomplete-row? [row]
  (some zero? row))

(defn clear-rows [board]
  (let [width (count (first board))
        height (count board)
        remaining-rows (filter incomplete-row? board)
        num-cleared-rows (- height (count remaining-rows))
        empty-rows (core/empty-board width num-cleared-rows)]
    (vec (concat empty-rows remaining-rows))))

Since we've already tested all the subexpressions, there's a good chance that the function will work as expected:

(clear-rows board)  ;; [[0 0 0 0 0 0 0 0 0 0]
                    ;;  [0 0 0 0 0 0 0 0 0 0]
                    ;;  [0 0 0 0 0 0 0 0 0 0]
                    ;;  [0 0 0 0 0 0 0 0 0 0]
                    ;;  [1 1 1 1 1 1 0 0 1 1]
                    ;;  [1 0 1 1 1 1 1 1 1 1]]

And indeed, it looks correct!

We'll finish by creating a test in the new namespace clear-rows-test:

▾ tetris-polylith
  ▸ bases
  ▾ components
    ▸ board
      ▸ src
      ▾ test
        clear-rows-test.clj
        core-test.clj
    ▸ piece
  ▸ development
  ▸ projects
(ns tetrisanalyzer.board.clear-rows-test
  (:require [clojure.test :refer :all]
            [tetrisanalyzer.board.clear-rows :as sut]))

(deftest clear-two-rows
  (is (= [[0 0 0 0 0 0 0 0 0 0]
          [0 0 0 0 0 0 0 0 0 0]
          [0 0 0 0 0 0 0 0 0 0]
          [0 0 0 0 0 0 0 0 0 0]
          [1 1 1 1 1 1 0 0 1 1]
          [1 0 1 1 1 1 1 1 1 1]]
         (sut/clear-rows [[0 0 0 0 0 0 0 0 0 0]
                          [0 0 0 0 0 0 0 0 0 0]
                          [1 1 1 1 1 1 1 1 1 1]
                          [1 1 1 1 1 1 0 0 1 1]
                          [1 0 1 1 1 1 1 1 1 1]
                          [1 1 1 1 1 1 1 1 1 1]]))))

When we run the test, it shows green and we can thus move on to the Python implementation. But first, a few words about the workflow.

Work faster - in small steps

You might have noticed that we implemented the code before writing the test, and that we didn't write the entire function in one go. Instead, we introduced one small calculation step at a time and only put the steps together into a complete function at the end. This allowed us to adjust the solution as our understanding grew, and we didn't need to keep everything in our heads. The brain has its limitations, so it's important that we help it along a bit!

In Clojure, only what has changed is compiled, which usually goes lightning fast. This makes you forget that it's actually a compiled language. You can open any file/namespace in the codebase, execute a function, perhaps from an existing comment block, and immediately get a response back. Gone is the feeling that something stands between you and the code, in the form of waiting for the compiler to be satisfied.

It's easy to become addicted to this immediate feedback, and the feeling is very similar to working with your hands, for example when throwing pottery:

[Photo: me at the pottery wheel]

The contact with the clay resembles what you have when working in a REPL: an immediacy that lets you quickly test, adjust, and work toward an intended goal, in real time.

The absence of static typing means the compiler only needs to compile the small change that was just made and nothing else, which is a prerequisite for this fast workflow. Quality is achieved by running the code often and in small steps, in combination with traditional testing and libraries like malli and spec to validate the data.

In languages that require more extensive compilation, or lack an advanced REPL, it's very common to start by writing a test, both as a way to drive the code forward and to trigger a compilation of the code. In a language like Clojure, you can move forward in even smaller steps, in a fast and controlled way.

Enough about this; let's switch over to Python instead!

Python

We'll start by trying to get as good a developer experience as possible, similar to what we have in Clojure. There are many good IDEs, but here I'll be using PyCharm.

  • Install PyCharm if you haven't already.
  • Install IPython, preferably globally.
    • IPython is an alternative to the standard REPL in Python.
  • Configure IPython's config file, and add:
    c.InteractiveShellApp.exec_lines = ["%autoreload 2"]
    c.InteractiveShellApp.extensions = ["autoreload"]
    c.TerminalInteractiveShell.confirm_exit = False
    
    • The config file is probably found here: ~/.ipython/profile_default/ipython_config.py
  • Start PyCharm and go to PyCharm > Settings > Python > Console > Python Console > Starting script and add:
    %load_ext autoreload
    %autoreload 2
    %aimport -pydev_umd
    
    • %load_ext autoreload loads the IPython extension autoreload, which allows modules to be reloaded automatically when files change.
    • %autoreload 2 enables automatic reloading of all modules (except those that are excluded)
    • %aimport -pydev_umd excludes pydev_umd from reloading, to remove errors that would otherwise be shown in the REPL.
    • There may be small red markings in the configuration, but these are not real errors and can be ignored.
  • Select View > Tool Windows > Python Console from the menu, which opens a Python Console panel in the lower part of the IDE.
    • A prompt In [1] should now appear instead of >>>, which indicates that it's the IPython REPL running, and not the standard REPL.
  • Then I set up my keyboard shortcuts under PyCharm > Settings... > Keymap > Plugins > Python Community Edition to be able to send code to the REPL in the same way I'm used to in Clojure.
  • I've also added ipython>=8.0.0 to pyproject.toml, and ran uv sync --dev to install the library.

Much of what&aposs written here comes from this blog post under the heading "Easy setup" (thanks David Vujic!).

Now it's high time to write some Python code, and we'll start by creating the module clear_rows.py:

  ▾ components
    ▾ tetrisanalyzer
      ▾ board
        __init__.py
        clear_rows.py
        copy.py
      ▸ piece
  ▸ test

Then we add the row:

row = [1, 1, 1, 0, 1, 1, 1, 0, 1, 1]

After which we run the shortcut command to send the entire module to the REPL, so it gets loaded (output from the REPL):

In [1]: runfile('/Users/tengstrand/source/tetrisanalyzer/langs/python/tetris-polylith-uv/components/tetrisanalyzer/board/clear_rows.py', wdir='/Users/tengstrand/source/tetrisanalyzer/langs/python/tetris-polylith-uv/components/tetrisanalyzer/board')

Now we can select row in the editor and send it to the REPL:

In [2]: row
Out[2]: [1, 1, 1, 0, 1, 1, 1, 0, 1, 1]

Through the REPL, we now have a convenient way to interact with the loaded code in Python too!

Let's translate the following line from Clojure to Python:

(some zero? row)

By adding the following line to clear_rows.py:

0 in row

Now we can select the line and send it to the REPL, which is an alternative to loading the entire module:

In [3]: 0 in row
Out[3]: True

Then we change row and test again:

row = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

0 in row
In [4]: 0 in row
Out[4]: False

It seems to work! Time to create a function from the code, and test run it:

def is_incomplete(row):
    return 0 in row

row = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

is_incomplete(row)
In [5]: is_incomplete(row)
Out[5]: False

Then I update row and test again:

row = [1, 1, 1, 0, 1, 1, 1, 1, 1, 1]

is_incomplete(row)
In [6]: is_incomplete(row)
Out[6]: True

It looks like it works!

Now we&aposll add a board to the module, so we have something to test against:

def is_incomplete(row):
    return 0 in row


board = [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
         [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
         [1, 1, 1, 1, 1, 1, 0, 0, 1, 1],
         [1, 0, 1, 1, 1, 1, 1, 1, 1, 1],
         [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]

In Clojure we can filter out incomplete rows like this:

(filter incomplete-row? board)

This is written most simply like this in Python:

[row for row in board if is_incomplete(row)]

The statement is a list comprehension that creates a new list by iterating over board and keeping only rows where is_incomplete returns True.
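For comparison, Python's built-in filter is the more literal translation of the Clojure version, though the comprehension above is generally considered more idiomatic. A minimal sketch using the same is_incomplete predicate on a small toy board:

```python
def is_incomplete(row):
    return 0 in row

board = [[1, 1, 0], [1, 1, 1]]

# Direct analogue of Clojure's (filter incomplete-row? board).
# filter returns a lazy iterator, so wrap it in list() to realize it.
remaining = list(filter(is_incomplete, board))
# remaining == [[1, 1, 0]]
```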

Let's test run the expression:

In [7]: [row for row in board if is_incomplete(row)]
Out[7]: 
[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [1, 1, 1, 1, 1, 1, 0, 0, 1, 1],
 [1, 0, 1, 1, 1, 1, 1, 1, 1, 1]]

It works!

Before the for keyword we have row, the expression that produces each element of the new list:

[row for row in board if is_incomplete(row)]

Python also allows us to do a calculation for each row, which can be exemplified with:

[row + [9] for row in board if is_incomplete(row)]

Which adds 9 to the end of each row:

[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 9],
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 9],
 [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 9],
 [1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 9]]

Let's return to the original version and assign it to remaining_rows:

def is_incomplete(row):
    return 0 in row

board = [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
         [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
         [1, 1, 1, 1, 1, 1, 0, 0, 1, 1],
         [1, 0, 1, 1, 1, 1, 1, 1, 1, 1],
         [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]

remaining_rows = [row for row in board if is_incomplete(row)]

Before we continue, let's do the same refactoring of empty_row in core.py as we did in Clojure:

def empty_row(width):
    return [0] * width


def empty_board(width, height):
    return [empty_row(width) for _ in range(height)]

We continue by translating this Clojure code:

(def width (count (first board)))
(def height (count board))
(def remaining-rows (filter incomplete-row? board))
(def num-cleared-rows (- height (count remaining-rows))) ;; 2
(def empty-rows (core/empty-board width num-cleared-rows)) ;; ([0 0 0 0 0 0 0 0 0 0] [0 0 0 0 0 0 0 0 0 0])
(vec (concat empty-rows remaining-rows)) ;; [[0 0 0 0 0 0 0 0 0 0]
                                         ;;  [0 0 0 0 0 0 0 0 0 0]
                                         ;;  [0 0 0 0 0 0 0 0 0 0]
                                         ;;  [0 0 0 0 0 0 0 0 0 0]
                                         ;;  [1 1 1 1 1 1 0 0 1 1]
                                         ;;  [1 0 1 1 1 1 1 1 1 1]]

To Python:

width = len(board[0])
height = len(board)
num_cleared_rows = height - len(remaining_rows) # 2
empty_rows = empty_board(width, num_cleared_rows) # [[0,0,0,0,0,0,0,0,0,0], [0,0,0,0,0,0,0,0,0,0]]
empty_rows + remaining_rows # [[0,0,0,0,0,0,0,0,0,0],
                            #  [0,0,0,0,0,0,0,0,0,0],
                            #  [0,0,0,0,0,0,0,0,0,0],
                            #  [0,0,0,0,0,0,0,0,0,0],
                            #  [1,1,1,1,1,1,0,0,1,1],
                            #  [1,0,1,1,1,1,1,1,1,1]]

I've deliberately copied the functional style from Clojure to Python, and as you can see it works well in Python too, but with a caveat.

Mutability

At one point, part of the Clojure code looked like this:

(def empty-row (vec (repeat board-width 0)))
(def empty-rows (repeat num-cleared-rows empty-row))

Which I translated to:

empty_row = [0 for _ in range(board_width)]
empty_rows = [empty_row for _ in range(num_cleared_rows)]

The problem with the Python code is that empty_rows refers to one and the same empty_row, and if the latter is changed, all rows in empty_rows change, which becomes a problem if num_cleared_rows is greater than one.

In the new solution, we instead create completely new rows in Python, while in Clojure we can share the same row since it&aposs immutable. The fact that everything is immutable in Clojure is a big advantage when we let data flow through the system, as it prevents data from spreading uncontrollably to other parts further down in the data flow.

Putting it together

Let's put everything together into a function:

from tetrisanalyzer.board.core import empty_board


def is_incomplete(row):
    return 0 in row


def clear_rows(board):
    width = len(board[0])
    height = len(board)
    remaining_rows = [row for row in board if is_incomplete(row)]
    num_cleared_rows = height - len(remaining_rows)
    empty_rows = empty_board(width, num_cleared_rows)
    return empty_rows + remaining_rows

Now we can test run it. Note that we've removed board from the source file, but the REPL still remembers it from earlier:

clear_rows(board)
In [8]: clear_rows(board)
Out[8]: 
[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [1, 1, 1, 1, 1, 1, 0, 0, 1, 1],
 [1, 0, 1, 1, 1, 1, 1, 1, 1, 1]]

It looks correct!

Before we add a test, we need to expose the clear_rows function in the board interface, by updating components/tetrisanalyzer/board/__init__.py (and sending the module to the REPL):

from tetrisanalyzer.board.clear_rows import clear_rows
from tetrisanalyzer.board.core import empty_board, set_cell, set_piece


__all__ = ["empty_board", "set_cell", "set_piece", "clear_rows"]

Finally, we'll add the test test_clear_rows.py to the board component:

  ▾ components
    ▾ tetrisanalyzer
      ▸ board
      ▸ piece
  ▾ test
    ▾ components
      ▾ tetrisanalyzer
        ▾ board
          __init__.py
          test_clear_rows.py
          test_core.py
        ▸ piece
from tetrisanalyzer import board


def test_clear_rows():
    input = [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
             [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
             [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
             [1, 1, 1, 1, 1, 1, 0, 0, 1, 1],
             [1, 0, 1, 1, 1, 1, 1, 1, 1, 1],
             [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
    
    expected = [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                [1, 1, 1, 1, 1, 1, 0, 0, 1, 1],
                [1, 0, 1, 1, 1, 1, 1, 1, 1, 1]]
    
    assert expected == board.clear_rows(input)

Now we can run all tests with uv run pytest:

============================================ test session starts ============================================
platform darwin -- Python 3.13.11, pytest-9.0.2, pluggy-1.6.0
rootdir: /Users/tengstrand/source/tetrisanalyzer/langs/python/tetris-polylith-uv
configfile: pyproject.toml
collected 3 items

test/components/tetrisanalyzer/board/test_clear_rows.py .                                             [ 33%]
test/components/tetrisanalyzer/board/test_core.py ..                                                  [100%]

============================================= 3 passed in 0.01s =============================================

It works!

Summary

I've deliberately kept the Python code functional, partly to make it easier to compare with Clojure, but also because I like the simplicity of functional programming. We also learned that we needed to be careful when working with mutable data!

The key takeaway: working in smaller steps helps us move faster!

Happy Coding!

Permalink

Wiring Clojure Web Apps with Aero, Pedestal, and Integrant

I've settled on a pattern for structuring Clojure web applications that I keep coming back to: combining Aero for configuration, Integrant for component lifecycle, and Pedestal for HTTP. The result is a fully declarative system where all wiring is explicit in configuration rather than scattered through code.

Permalink

Copyright © 2009, Planet Clojure. No rights reserved.
Planet Clojure is maintained by Baishampayan Ghose.
Clojure and the Clojure logo are Copyright © 2008-2009, Rich Hickey.
Theme by Brajeshwar.