Planet Haskell

Egyptian fractions for 2/105

2026-06-12T17:53:00Z

The ancient Egyptians had a terrible notation for fractions. They had notations for $\frac1n$ for each $n$, for $\frac23$, but everything else was written as a sum of these, with repeats forbidden, so that for example $\frac25$ had to be written as $\frac13 + \frac1{15}$. (Wikipedia)

In an older article about Egyptian fractions and the Rhind Mathematical Papyrus, I said:

Getting the table of good-quality representations of $\frac2n$ is not trivial, and requires searching, number theory, and some trial and error. It's not at all clear that $\frac2{105} = \frac1{90} + \frac1{126}$.

I think I see now where this comes from. $105 = 3\cdot5\cdot7$, so two of the summands must have denominators divisible by $5$ and by $7$ respectively. The first thing you should do is consider $$\frac15 + \frac17 = \frac{12}{35} = \frac{36}{105}.$$

But you don't want $\frac{36}{105}$, you want $\frac2{105}$, so you multiply by $\frac1{18}$:

$$\frac1{18}\left(\frac15 + \frac17\right) = \frac1{90}+\frac1{126} = \frac 2{105}$$

and there it is.

Why pick $5$ and $7$ rather than, say, $3$ and $5$? I suspect the answer is probably: Ahmes (or someone earlier) tried it both ways and picked the result they liked best. Remember Ahmes is compiling a reference table here, so he does these calculations once, writes down the best result, and throws the others away.

If you do the same trick with $3$ and $5$ instead you get $\frac13 + \frac15 = \frac8{15} = \frac{56}{105}$. Then you multiply everything by $\frac1{28}$ producing $$\frac1{84} + \frac1{140} = \frac2{105}$$ which seems a little worse than the other one. Using the $3$ and the $7$ produces $$\frac1{75} + \frac1{175} = \frac2{105}$$ which seems much worse.

Of course this only works when the denominator is composite.
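The pair trick is mechanical enough to sketch in Haskell. The helper below, `pairTrick`, is a name made up for this illustration: it scales 1/a + 1/b to hit a target fraction and succeeds only when the scale factor is itself a unit fraction.

```haskell
import Data.Ratio (denominator, numerator, (%))

-- | Hypothetical helper: try to write @target@ as 1/(a*k) + 1/(b*k).
-- The scale factor target / (1/a + 1/b) must be a unit fraction 1/k.
pairTrick :: Rational -> Integer -> Integer -> Maybe (Integer, Integer)
pairTrick target a b
  | numerator scale == 1 = Just (a * k, b * k)
  | otherwise            = Nothing
  where
    scale = target / (1 % a + 1 % b)
    k     = denominator scale

-- | The three pairs of prime factors of 105 tried in the text.
candidates :: [Maybe (Integer, Integer)]
candidates = [pairTrick (2 % 105) a b | (a, b) <- [(5, 7), (3, 5), (3, 7)]]
-- candidates == [Just (90,126), Just (84,140), Just (75,175)]
```

Choosing among the successful candidates is left to taste, just as it presumably was for Ahmes.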

Here's another approach, which doesn't work too well in this case but might be useful for other examples. Consider that $\frac23 = \frac12 + \frac16$. We want $\frac2{105}$. So

$$ \begin{align} \frac2{105} & = \frac1{35}\cdot\frac23 \\ & = \frac1{35}\left(\frac12+\frac16\right) \\ & = \frac1{70} + \frac1{210} \end{align} $$

The denominators here are a lot bigger than the first expansion, but they do at least have the advantage of being multiples of $10$. The Egyptians like this because they, like us, often need to multiply numbers by $10$, and whereas a fraction like $\frac1{126}$ is hard for them to multiply by $10$, it's trivial to multiply $\frac1{70}$ by $10$.

Writing static checks to an unsuspecting library with Liquid Haskell

2026-06-11T00:00:00Z

This post presents a little epic to insert static checks in Haskell’s Diff package using Liquid Haskell (LH).¹ Static or compile-time checks are helpful to confirm formerly implicit assumptions in the implementation, providing an additional layer of assurance.

Making illegal states unrepresentable at an affordable cognitive cost is a staple of statically typed functional programming. Endeavors like Dependent Haskell and Liquid Haskell delve into this aspect. A distinctive feature of LH is that it works on top of regular Haskell code, meaning that the program can still be compiled after disabling it, thus making it possible to enforce properties without changing the source code. In what follows I’ll give you a glimpse of how the Liquid Haskell approach feels in practice and how far it can go.

Liquid Haskell was created by the UCSD Programming Systems group and these days is mainly maintained and further improved by my colleague Facundo Domínguez. Applying Liquid Haskell to strengthen libraries has precedent in the Haskell ecosystem, and it was in this spirit that Facundo suggested this project as we were pondering an attempt to statically check our in-house Ormolu, of which Diff is a transitive dependency and a more suitable commitment given the engineering time I could bestow upon it.²

`Diff` will never be the same

The Diff package is a small and (relatively) self-contained library implementing the Myers diff algorithm. As a provider of basic functionality in the Haskell ecosystem, adding formal guarantees to it is of intrinsic value to the community.

From the get-go, my objective was adding static checks to strengthen this library in a contribution guided by two opposing desiderata:

Minimize source changes
Maximize checked invariants

While the first is about testing how (non-)intrusive Liquid Haskell can be, the second is about its expressiveness. To put it bluntly, the ideal LH would be able to statically check all the existing invariants of an unsuspecting library using nothing more than specification annotations. Reality is not that kind, forcing me to compromise on both objectives, but I kept this mindset to help me see how close LH is to this ideal.

My first milestone was bridging the gap between the Diff implementation and the algorithm in the paper it references. This took an in-depth study of the library and resulted in documentation contributions highlighting the most salient invariants (pre- and post-conditions) and assumptions.³

In general, it is by a careful threading of logic that a program is built into existence; the problem (and the source of well engineered solutions) is that the critical aspects of it lie within a theory in its writer’s mind, which tends to be lost across iterations, updates, refactors and people moving on. Both documentation and specification cannot completely solve this problem, but they can help.

For example, I added a post-condition to this function's haddock

data PolyDiff a b = First a | Second b | Both a b

-- | Like 'getGroupedDiff' but accepts a custom equality predicate.
--
-- Postcondition: the output list is guaranteed to be /chunked/. i.e. no two adjacent
-- elements share the same constructor.
getGroupedDiffBy :: (a -> b -> Bool) -> [a] -> [b] -> [PolyDiff [a] [b]]

making the expected form of its output explicit. This allows a reader to get an immediate notion of what the implementation is supposed to accomplish in order to satisfy the caller’s expectations.

Similarly, data types often carry more meaning than what they actually encode, in which case documenting the implicit assumptions can help understand their intended use.

-- | Line Range: start, end and contents.
--
-- The following invariants hold:
--
-- > snd lrNumbers >= fst lrNumbers
-- > snd lrNumbers - fst lrNumbers + 1 == length lrContents
--
-- which imply @lrContents@ cannot be empty.
data LineRange = LineRange { lrNumbers :: (LineNo, LineNo)
                           , lrContents :: [String]
                           }

These haddocks are inspired by the kind of properties that LH can express. Nevertheless, their value doesn’t depend on providing static checks as they already save us from some arduous code path diving. Wouldn’t it be wonderful if the compiler could take those haddocks to heart? In a sense that’s what LH is about!
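To make the documented `LineRange` invariants concrete, here is a minimal sketch of a runtime check for them. `validLineRange` is a hypothetical helper, not part of Diff, and `LineNo` is assumed to be a plain `Int`; LH aims to establish the same facts at compile time instead.

```haskell
type LineNo = Int  -- assumption: line numbers are plain Ints

data LineRange = LineRange
  { lrNumbers  :: (LineNo, LineNo)
  , lrContents :: [String]
  }

-- | Hypothetical runtime check of the two documented invariants.
validLineRange :: LineRange -> Bool
validLineRange (LineRange (start, end) contents) =
  end >= start && end - start + 1 == length contents
```

Note that a range such as `(5, 4)` with empty contents satisfies the second equation but not the first, which is why both invariants are stated.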

Engineering the static checks took me into a tight feedback loop between the documentation process, coming up with refactorings⁴ to make the code easier to check (which always implied easier to explain!) and the writing of LH specifications matching the documented invariants. This approach is in close sympathy with the doc it like it’s hot philosophy.

From dry code to liquid types

After installing LH, compilation failed due to new shiny errors, even though I hadn’t written a single LH specification yet. This is because LH inspects the bodies of all function definitions out of the box to prove that

Existing specifications are fulfilled
Recursive functions terminate

The first condition is not limited to local specifications; LH comes bundled with specifications for many boot package functions. For instance, many of Prelude’s partial functions are refined this way to be total, so LH tries to prove that all their uses are safe.

One prominent example is head, which was the only failure of the first condition in Diff: it was not certain that the list passed to head in its ses function is always non-empty, which can be found to be true from the algorithm specification and by following the composition of the involved processes. LH tries to build this knowledge from specifications, in the form of refinement types, found along the call stack. Such specifications are introduced using a special comment syntax {-@ ... @-} whose contents are processed to generate a set of constraints for an external SMT solver to verify. This allows us to mechanically check function specifications, formed out of pre- and post-conditions, and data invariants expressed as simple logical predicates at compile time. In what follows I’ll show some examples of LH specification annotations, but in most cases I won’t be explaining their syntax or fundamentals, trusting that their meaning within the general argument can be gathered from context. For further details please look at the spec reference documentation.
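As a hedged illustration of what such a refinement looks like, here is a sketch of a `head`-like function with a non-emptiness precondition. `firstElem` and `example` are hypothetical names, not taken from Diff or from LH's bundled specifications, and the annotation has not been checked against a specific LH version. Under plain GHC the annotation is just a comment.

```haskell
-- Hypothetical example, not taken from Diff.
-- The refinement asks every caller to prove the list is non-empty.
{-@ firstElem :: {xs : [a] | len xs > 0} -> a @-}
firstElem :: [a] -> a
firstElem (x : _) = x
firstElem []      = error "firstElem: empty list"  -- ruled out by the refinement

-- A call site where the argument is visibly non-empty.
example :: Int
example = firstElem [1, 2, 3]
```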

The same comment syntax is also used to set LH directives, like the ignore annotation I used to skip checks in the body of the offending ses function.

{-@ ignore ses @-}
ses :: (a -> b -> Bool) -> [a] -> [b] -> [DI]
ses eq as bs = path . head . dropWhile (\dl -> poi dl /= lena || poj dl /= lenb) .
            concat . iterate (dstep cd) . (:[]) . addsnake cd $
            DL {poi=0,poj=0,path=[]}
            where cd = canDiag eq as bs lena lenb
                  lena = length as; lenb = length bs

Turning now to the second condition: To prove termination of a recursive function, LH needs to be told of a size reduced towards a lower bound at each recursive call. This is called a termination metric.
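A minimal sketch of an explicit metric, using a hypothetical `range` function that is not part of Diff: neither argument shrinks on its own, but the distance `hi - lo` does, and it stays non-negative whenever a recursive call is made. The `/ [hi - lo]` suffix follows the metric syntax used later in this post; treat the annotation as an unverified sketch.

```haskell
-- Hypothetical example: the first argument grows, so the metric is hi - lo.
{-@ range :: lo:Int -> hi:Int -> [Int] / [hi - lo] @-}
range :: Int -> Int -> [Int]
range lo hi
  | lo < hi   = lo : range (lo + 1) hi
  | otherwise = []
```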

Some recursive functions might be proved terminating without intervention, because when no explicit metric is given LH follows a simple heuristic: it checks for the first (non-function) argument with an associated size metric to be strictly decreasing and non-negative at each recursive call. LH has definitions of associated size metric for lists (their length) and integer values, which are considered metrics themselves when non-negative. Metrics get interesting when we have mutually recursive functions, as is the case for doPrefix and doSuffix, a pair of local functions whose job is to chop common lines of input to create the context windows that make a diff’s hunks. I introduced a lexicographic metric, annotated with the syntax / [metric1, metric2, ...] at the end of a function refinement, to prove their termination:

type Diff c = PolyDiff c c

{-@ doPrefix :: hunk : [Diff [c]] -> [Diff [c]] / [len hunk, 0] @-}
doPrefix :: [Diff [c]] -> [Diff [c]]
doPrefix [] = []
doPrefix [Both _ _] = []
doPrefix (Both xs ys : more) =
  Both (drop (length xs - contextSize) xs)
       (drop (length ys - contextSize) ys)
    : doSuffix more
doPrefix (d : ds) = d : doSuffix ds

{-@ doSuffix :: hunk : [Diff [c]] -> [Diff [c]] / [len hunk, 1]@-}
doSuffix :: [Diff [c]] -> [Diff [c]]
doSuffix [] = []
doSuffix [Both xs ys] = [Both (take contextSize xs) (take contextSize ys)]
doSuffix (Both xs ys : more)
  | length xs <= contextSize * 2 = Both xs ys : doPrefix more
  | otherwise =
      Both (take contextSize xs) (take contextSize ys) :
        doPrefix (Both (drop contextSize xs) (drop contextSize ys) : more)
doSuffix (d : ds) = d : doSuffix ds

With this metric, LH accepts a recursive call if the length of the input hunk (a list of diff elements) strictly decreases, as its default heuristic would require, or, when the length stays the same, if the call goes from doSuffix (1) to doPrefix (0). This second, fallback component is needed because of the third equation of doSuffix (second guard), where doPrefix is called with a list of equal length. Apart from this case, each (mutually) recursive call is made on the tail of the input and is thus strictly decreasing.
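The lexicographic comparison can be mimicked in plain Haskell, whose `Ord` instance for pairs is already lexicographic. The sketch below is only an analogy for the check LH performs; `Metric`, `decreases` and the sample length 5 are made up for illustration.

```haskell
-- | A metric is a pair: (length of the hunk, tag of the function).
-- Tags follow the annotations above: 0 for doPrefix, 1 for doSuffix.
type Metric = (Int, Int)

-- | A recursive call is acceptable when the callee's metric is strictly
-- smaller than the caller's, comparing pairs lexicographically.
decreases :: Metric -> Metric -> Bool
decreases caller callee = callee < caller

-- Calls from a hunk of length 5 (an arbitrary illustrative size):
tailCall, sameLengthToPrefix, sameLengthToSuffix :: Bool
tailCall           = decreases (5, 0) (4, 1)  -- doPrefix -> doSuffix on the tail
sameLengthToPrefix = decreases (5, 1) (5, 0)  -- doSuffix -> doPrefix, equal length
sameLengthToSuffix = decreases (5, 0) (5, 1)  -- doPrefix -> doSuffix, equal length
```

Here `tailCall` and `sameLengthToPrefix` are `True`, while `sameLengthToSuffix` is `False`: an equal-length call is only acceptable in the direction that lowers the tag.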

Here I’ve presented instances of two general strategies to handle LH errors:

Fight: Fix the failing termination checks by introducing metrics, and fix the offending function calls by adding specifications.
Flight: Disable checks by using an escape hatch, e.g. the {-@ lazy myRecursiveFunction @-} annotation to circumvent termination checking, the {-@ ignore myOffendingFunction @-} to disable all checks within a function’s body or the {-@ assume myFunction :: ... spec ... @-} to set a function specification as true without verification.

A priori it’s desirable to minimize the use of escape hatches, but they’re also tools to prioritize static checking efforts.

Invariant static checking

One thing that made Diff particularly suitable for this effort is that a detailed specification of it existed in the form of a research paper. Indeed, my first documentation contribution was making their connection explicit throughout. The Myers diff algorithm can be summarized as a breadth-first search for the shortest path across a bidimensional edit grid to an endpoint,⁵ the latter representing the complete transformation of one input to the other. The algorithm is in fact tersely expressed in the ses definition presented before; its name stands for “smallest edit script”, which is one of the output characterizations of the diff algorithm. What I found is that the idea of a wave front is the link between this implementation and the original algorithm. This statement is now supported by a static check showing that a wave front is transformed as the algorithm prescribes for its inner loop.

A wave front is defined as a list of nodes at the same depth, i.e. the edit trace length, which is iterated upon by the dstep function to return nodes one step deeper. This function is a direct implementation of the extension procedure used by the algorithm at each search step. Furthermore, the paper proves a pair of lemmas that result in a specific configuration of the node list after each iteration, which is related to the diagonals on the edit grid and checked by a wfDiags predicate that I wrote to specify it. The details of this condition aren’t essential here: while the paper leverages it to introduce a space optimization, the Haskell implementation doesn’t depend on it, but the configuration is preserved nonetheless.

I encoded the fixed depth of nodes and their diagonal configuration using refinement type aliases to obtain a wave front specification.

-- | A node representing the tip of a path in the edit grid.
data DL = DL
    { ...
    , path :: [DI]   -- ^ The edit trace accumulated so far
    } deriving (Show, Eq)

-- A node at a fixed edit trace length (depth).
{-@ type DLN D = { x : DL | len (path x) = D } @-}

-- | This function is used only in LH specs to check if
-- diagonal configuration holds for a node list.
wfDiags :: [DL] -> Bool
wfDiags = ...

-- All nodes in a wave front are at the same depth,
-- and satisfy the diagonal configuration.
{-@ type WaveFront D = {xs : [DLN D] | wfDiags xs} @-}

With this encoding, and a phantom parameter carrying the current depth, I specified dstep (called from ses) to match the algorithm behavior (which also includes the node list growing by one).

{-@
dstep
  :: (Nat -> Nat -> Bool)
  -> d : Nat
  -> {nodes : WaveFront d | len nodes > 0}
  -> {v : WaveFront (d + 1) | len v = len nodes + 1}
@-}
dstep
  :: (Int -> Int -> Bool) -- ^ Check for node coordinates producing a free edge
  -> Int                  -- ^ The current depth; used for the static check of the wave front invariant
  -> [DL]                 -- ^ A non-empty wave front of nodes at edit distance D
  -> [DL]                 -- ^ A non-empty wave front of nodes at edit distance D+1

Refinement type aliases become statically checked invariants when used in a function specification, and are verified to hold at each call site.
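For intuition, the depth half of this specification can be written as an ordinary runtime predicate. This is a sketch under assumptions: `DI` is reduced to three placeholder constructors, the `DL` fields `poi` and `poj` are taken from the record construction in ses and assumed to be `Int`, and `atDepth`, `sameDepth` and `dstepShapeOk` are hypothetical names. The diagonal condition checked by wfDiags is left out.

```haskell
-- Assumption: a stand-in for the edit-step type; only list length matters here.
data DI = F | S | B deriving (Show, Eq)

-- Fields as they appear in the record construction in ses (types assumed).
data DL = DL { poi :: Int, poj :: Int, path :: [DI] } deriving (Show, Eq)

-- | Runtime counterpart of @DLN D@: the node sits at depth d.
atDepth :: Int -> DL -> Bool
atDepth d node = length (path node) == d

-- | Depth half of @WaveFront D@ (the diagonal condition is omitted).
sameDepth :: Int -> [DL] -> Bool
sameDepth d = all (atDepth d)

-- | The shape of the dstep specification, checked after the fact.
dstepShapeOk :: Int -> [DL] -> [DL] -> Bool
dstepShapeOk d before after =
  not (null before)
    && sameDepth d before
    && sameDepth (d + 1) after
    && length after == length before + 1
```

The difference with LH is that the refinement is verified once for all inputs at compile time, rather than tested on particular lists.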

As a second example, let’s see the invariants of a Hunk, expressed again using refinement type aliases.

-- A valid list diff is such that any `Both` value has arguments of equal length.
{-@ type ValidListDiff a b = { d : PolyDiff [a] [b] | validListDiff d }@-}

-- | True when, for a 'Both' value, both sides have the same length.
-- 'First' and 'Second' trivially satisfy this.
-- Introduced for LH specifications.
validListDiff :: PolyDiff [a] [b] -> Bool
validListDiff (Both xs ys) = length xs == length ys
validListDiff _ = True

-- | True if the list does not contain adjacent 'PolyDiff's with the same constructor.
noStuttering :: [PolyDiff a b] -> Bool
noStuttering = ...

-- | A 'Hunk' is a list of adjacent 'Diff's.
--
-- No two consecutive elements in a 'Hunk' are both applications
-- of 'First', 'Second', or 'Both', i.e. the list does not stutter
-- on 'Diff' constructors.
type Hunk c = [Diff [c]]

{-@ type Hunk c = { h : [ValidListDiff c c] | noStuttering h} @-}

The interesting part here is the check for the noStuttering invariant in the specification of the main Hunk producing function. For brevity’s sake I won’t show this function, but let’s see what came to be of the specification of the previously presented doPrefix, that is part of it, for the check to pass.

{-@ doPrefix ::  h : Hunk c
             -> {v : [ValidListDiff c c] | noFFSS v
                 && ... other auxiliary post-conditions ... } / [len h, 0] @-}

Essentially, this function traverses a given Hunk and chops and splits Both elements to a context size argument. After doing so the Hunk “stutters” on such elements, so it stops being a Hunk in the refined sense, even though the Haskell types match. Note the regular type synonym and the refinement type synonym don’t coalesce: At the Haskell level the synonym is just a renaming, but in the specification it is shadowed by the refinement synonym (thus enforcing its invariants). The noFFSS helper characterizes the resulting list; it is like noStuttering, but just for the other PolyDiff constructors: First and Second. Other auxiliary post-conditions (not shown) stating that input and output lists shared head constructors were also necessary for this check.
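The difference between the two predicates can be sketched as follows. These are hypothetical readings based on the descriptions in this post, not the definitions used in Diff, and all names ending in `Sketch` as well as `sameCtor` and `isBoth` are invented here.

```haskell
data PolyDiff a b = First a | Second b | Both a b

-- | Hypothetical helper: do two elements share a constructor?
sameCtor :: PolyDiff a b -> PolyDiff a b -> Bool
sameCtor (First _)  (First _)  = True
sameCtor (Second _) (Second _) = True
sameCtor (Both _ _) (Both _ _) = True
sameCtor _          _          = False

-- | Hypothetical helper: is this element a 'Both'?
isBoth :: PolyDiff a b -> Bool
isBoth (Both _ _) = True
isBoth _          = False

-- | One possible reading of noStuttering: no adjacent pair shares a constructor.
noStutteringSketch :: [PolyDiff a b] -> Bool
noStutteringSketch ds = not (or (zipWith sameCtor ds (drop 1 ds)))

-- | One possible reading of noFFSS: adjacent 'Both's are tolerated,
-- adjacent 'First's or adjacent 'Second's are not.
noFFSSSketch :: [PolyDiff a b] -> Bool
noFFSSSketch ds = not (or (zipWith bad ds (drop 1 ds)))
  where
    bad x y = sameCtor x y && not (isBoth x)
```

A list such as `[First "a", Both "bc" "bc", Both "de" "de", Second "f"]`, where a `Both` has been split in two, fails the first predicate but passes the second.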

The verification of these and other invariants followed a similar outline:

Identify and document the invariant
Encode it in refinements
Write the specifications
Please the compiler

Pleasing the compiler after adding a new specification was trickier for me than the usual Haskell type error propagation and fix workflow. Figuring out exactly what LH is aware of when checking a specification requires an intuition of how it builds a context; then it’s a matter of making the missing information available.

For instance, the last step to get back a Hunk after the doPrefix-doSuffix operation required passing a lemma within a local dead binding for the specification to be verified.

{-@ assume lemmaReverseNoStuttering
      :: xs:_ -> { noStuttering (reverse xs) = noStuttering xs } @-}
lemmaReverseNoStuttering :: Hunk c -> ()
lemmaReverseNoStuttering _ = ()

-- | Split a 'Diff' list at consecutive 'Both'-'Both' boundaries.
{-@ splitBothBoth :: {ds:[ValidListDiff c c] | noFFSS ds} -> [Hunk c] @-}
splitBothBoth :: [Diff [c]] -> [Hunk c]
splitBothBoth = go []
  where
    {-@ go
          :: g:Hunk c
          -> {xs : [ValidListDiff c c] | noFFSS xs && not (headAlike g xs) }
          -> [Hunk c] / [len xs]
      @-}
    go :: Hunk c -> [Diff [c]] -> [Hunk c]
    go g (x@Both{} : y@Both{} : xs) = reverse (x:g) : go [] (y:xs)
      where
        lemma = lemmaReverseNoStuttering (x:g)
    go g (x : xs) = go (x:g) xs
    go g [] = [reverse g]
      where
        lemma = lemmaReverseNoStuttering g

This binding ultimately gets optimized away by GHC, but LH requires it to satisfy the static checks. LH didn’t have a means to know that reverse preserves the noStuttering of a PolyDiff list, so I provided it.

I decided to assume the lemma above on the rationale that its validity is straightforward while its proof would probably not be, so the cure would have been worse than the disease.

Lifting a dam

After this work I’m flooded with many thoughts and feelings about LH from the user perspective, but also ideas for important future developments.

One particular source of difficulty I found is differentiating between the existing means of lifting a Haskell function into the logic: reflect, inline, measure and define. By default, Haskell functions like wfDiags cannot be used in refinement type predicates. They have to be accompanied by an annotation that indicates how to make them available in the predicates, which I omitted in my examples for brevity.

Existing documentation does a good job at explaining their requirements and purpose. Nevertheless, subtle differences in constraint generation and logical expression unfolding aren’t documented, and these details matter when choosing between them in certain cases. Addressing this could lead to unifying or deprecating some functionality, but at least specifying them at a finer grain and adding some use case examples could go a long way.

A more salient difficulty is the error messages. They can be baffling, featuring not very human friendly variable names spread across enormous lists of bindings forming their “context”. Skimming through this context is a skill that I would love to deprecate. Looking first at the “inferred type” and the “required type” part at the start of the message is a useful technique, which can provide a lead to the source of the problem.

I find refinement types appealing because they are powerful yet non-sophisticated enough to be intuitive. Nevertheless, getting a function specification checked can become intricate, requiring additional proving machinery like function definitions exclusively intended for refinement predicates, lemmas in dead bindings to pass additional constraints or even heavy refactoring. However, I think the upfront cost of entry can be easily balanced out by using the escape hatches to focus the effort investment. In the Diff package, for instance, some low hanging fruit could be picked right away after disabling the checks on error triggering functions, e.g. refining integer values to naturals or enforcing clear-cut relations between record fields, adding immediate value without additional machinery.

A drawback is that polymorphism seems at odds with the simplicity of refinements: the more we want to specify about a value or function, the more we narrow its type. That said, there seems to be a correlation between code complexity and LH verification complexity that is worth investigating further: changes that simplified the verification of an invariant tended to benefit the code quality independently of it.

Choosing between fight or flight for a given invariant is ultimately about balancing safety gains with added complexity, and in my experience the code structure is what tips the scale: it determined both the refactorings I needed and the checks I had to forgo. My guess is that the whole equation changes when refinement types are a first class consideration during design.

Clearing up the waters

Hopefully this little epic amounts to a useful case study that, by showing what using LH is like today, encourages you to add static checks to an existing codebase or to experiment in your next project with LH in your toolbox. I also hope the techniques I’ve shared help you prioritize the effort.

I discussed some of LH’s pain points to offer a balanced view and propose further DX improvements. There’s much to be done, but it’s steadily getting there. My opinion is that Liquid Haskell is a viable option today to add formal guarantees to an unsuspecting codebase at a reasonable cost, as long as the palette of shapes and extent this can take is kept in mind during design. Know that you’re welcome to contribute to LH development and that we’re ready to help strengthen your codebase. Just reach out!

Finally, I would like to express my gratitude to Aleksandr Vershilov, Arnaud Spiwack and Christopher Harrison for reviewing this text, and notably to Facundo Domínguez whose close collaboration was instrumental to streamline this work.

¹ At the time of writing, the static checks are about to be proposed for upstream integration, but they can be found in the Liquid Haskell test suite as well.
² A nice perk of working at Tweag is being supported to do open source contributions during or in-between client projects.
³ Found in https://github.com/seereason/Diff/pull/21 and https://github.com/seereason/Diff/pull/23.
⁴ Found in https://github.com/seereason/Diff/pull/24, https://github.com/seereason/Diff/pull/25, https://github.com/seereason/Diff/pull/26 and https://github.com/seereason/Diff/pull/27.
⁵ The coordinates of a node in the edit grid represent the size of the prefix consumed from the first input and the size of the produced prefix of the other input, respectively. Thus, the endpoint has coordinates matching both input lengths. The grid’s most relevant feature is that, in addition to vertical and horizontal edges (corresponding to deletions and additions, respectively), there are “free” diagonal edges wherever both inputs have matching elements.

Stackage talk at Haskell Ecosystem Workshop 2026

2026-06-08T16:00:00Z

Jens Petersen gave a talk about Stackage at the Haskell Ecosystem Workshop 2026 (4th June), organized by the Haskell Foundation before Zurihac.

Here are the html slides (single page html).

Ergonomic overrides for Nixpkgs

2026-06-06T00:00:00Z

Announcement post for the override-utils Nix package

Professor Emeritus

2026-06-04T17:46:04Z

After retiring last July, the University Senate have approved my emeritus status. I'm grateful to Julian Bradfield for his work drafting the generous minute that accompanied the approval.

Special Minute
Professor Philip Wadler BSc, MSc, PhD, FRSE, FACM, FRS
Emeritus Professor of Theoretical Computer Science

We are pleased to nominate Professor Philip Wadler for the title of Emeritus Professor at the University of Edinburgh. Professor Wadler is a popular educator and has had an extensive career in both academia and industry, with seminal contributions to the field of computer science, particularly in the theory and practice of programming languages. Philip Wadler obtained a BSc with honours in mathematics from Stanford University in 1977, followed by an MSc and PhD in computer science in 1979 and 1984 from Carnegie-Mellon University. He took up a postdoc at Oxford University, and in 1987 he was appointed as a lecturer at the University of Glasgow. In 1996, Phil switched to industry, working at Bell Labs and Avaya Labs. He returned to academia in 2003, taking up the Chair of Theoretical Computer Science at the University of Edinburgh.

Professor Wadler’s research centres on the theory and practice of programming languages. He served as first editor of the Haskell report, and introduced what are arguably its two main innovations, type classes and monads. Haskell saw widespread use, and type classes and monads were adopted by a wide variety of other programming languages and proof assistants. He contributed to the design of the programming language Java, and introduced a model of it widely used by researchers. By influencing the design of popular programming languages, Phil has had a profound impact not only on programmers, but also on the users of the systems those programmers build. If you’ve used Facebook or X, Android or iPhone, you’ve run code that exploits concepts Phil pioneered.

Professor Wadler has published many seminal monographs and textbooks throughout his illustrious career. His contribution has been honoured in many ways. He served as chair of the ACM Special Interest Group on Programming Languages (SIGPLAN) from 2009–2012 and received its Distinguished Service Award in 2016. He was appointed a Fellow of the Royal Society of Edinburgh in 2005, a Fellow of the Association for Computing Machinery in 2007, and a Fellow of the Royal Society in 2022. He regularly delivers keynotes at both academic and developer conferences. In 2016, his sixtieth birthday was marked by a two-day Wadlerfest, and an accompanying festschrift published by Springer.

Phil is a passionate and popular teacher. On moving to Edinburgh in 2003, he introduced a first-year programming languages course based on Haskell and was shortlisted for the EUSA Teaching Award (Overall Best Performer) in 2009. His Honours courses on programming language theory have been among the most popular theoretical courses. Phil is widely known for theatrical performance and applies this talent outside academia, often performing stand-up comedy via Bright Club, and appeared in the Fringe via the Cabaret of Dangerous Ideas in 2024.

Since 2017, Phil has worked closely with industry, including consulting for IOG where he helped to design the smart contract system for its Cardano blockchain and applied formal methods to ensure its reliability. After retirement from Edinburgh, he plans to write a version of his online text for the proof assistant Agda updated to the proof assistant Lean. He will appear again this summer at the Fringe.

To conclude, Professor Philip Wadler's career is characterised by groundbreaking research, impactful teaching, and significant professional service. His work has shaped the landscape of programming languages and computer science education. Conferring the title of Emeritus Professor on Professor Wadler would honour his substantial contributions to the University of Edinburgh and the broader scientific community.

A good life for the 99% isn’t a pipe dream: it can be done. Here’s how

2026-06-04T15:37:18Z

Thomas Piketty is at it again. He and his colleagues at the World Inequality Lab have produced a report outlining, with quantitative modelling, what a just world might look like and how to get there. A summary appears in the Guardian, and their full report is online.

Imagine a future in which everyone enjoys high levels of wellbeing; where 90% of the world’s population doubles their income but works half the hours we work today. A world in which the bottom half of humanity sees its share of global wealth rise from just 2% today to 30%; a world where we consume enough, but nobody over-consumes. And imagine achieving this on a planet that can comfortably sustain human life without its climate breaking down.
Against the bleak techno-authoritarian futures now being sold to us, a radical new vision for global progress in the 21st century feels urgently needed. ...
What would this transition deliver? At its heart is convergence between countries. Average per capita national income, today separated by a 16-fold gap between the poorest (€290 a month in sub-Saharan Africa) and richest (€4,590 in North America/Oceania) regions of the world, would rise towards a common level of about €5,000 a month in all countries by 2100.
But this convergence is not just monetary. Annual working hours per employed person would fall from roughly 2,100 to about 1,000, continuing the long shift towards shorter working time; while the share of global working hours devoted to education and health would rise from 11% to 43%. Women and men would converge on equal pay and on an equal share of economic and domestic labour.
All of this would unfold within a habitable climate. Thanks to sustainable convergence and fast decarbonisation, global heating would reach 1.8C, against more than 4C on current trends.
None of this will be possible without a deep contraction of inequality. The income scale between individuals would narrow to a ratio of one to five and the wealth scale to one to 10, prolonging what western and Nordic Europe achieved over the 20th century. The share of global wealth held by the poorest half of humanity would rise from 2% to 30%, while the share held by the billionaire class would fall from 6% to 0.05%.

Faster Cabal Haskell builds by eliminating redundant work

2026-05-28T00:00:00Z

TL;DR Build your Haskell projects 10-15% faster with this one simple trick! (Spoiler: the simple trick is to wait for the next major cabal-install release.)

In previous work (paid for by the Sovereign Tech Fund) we did a lot of heavy lifting to make a major architectural change to Cabal. That work is now paying off with practical benefits. This post covers follow-on architectural improvements to cabal-install which then enable us to eliminate redundant work in the configure phase, yielding significant reductions in build times.

The changes will be available to everyone in the next major cabal-install release. For a large project like pandoc (including all of its dependencies) we measure a 10% (std.dev. 0.6pp) reduction in wall clock time for a 16-way parallel build with --semaphore. No user changes are needed to take advantage of this improvement.

History: `Cabal` and `cabal-install`

The genesis: the `Cabal` specification

First, there was Cabal. Its design was laid out in A Common Architecture for Building Applications and Tools. Fundamentally, it defines the notion of a package, with each package being built and installed with the following sequence of commands:

> hc Setup.hs
> ./Setup configure
> ./Setup build
> ./Setup install

Each package must be built in dependency order, with hc-pkg registering each installed library into a package database.

Orchestrating the build of multiple packages

cabal-install was then born to plan and execute a build plan consisting of many packages. With its solver, it determines a build plan, which is then orchestrated by running the above sequence of commands for each package, in dependency order.
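
To make "in dependency order" concrete, here is a minimal sketch of one way to derive such an order (Kahn's topological sort). It is not cabal-install's actual planner; the function name `build_order` and the package names in the usage note are made up for illustration, and the sketch assumes every dependency also appears as a key of the map, with no duplicates in a dependency list.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Return the packages in an order where each package comes after all of
/// its dependencies (Kahn's algorithm), or `None` if there is a cycle.
/// Hypothetical helper: `deps` maps a package name to the names it depends on.
fn build_order<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>) -> Option<Vec<String>> {
    // Number of dependencies that have not been built yet, per package.
    let mut pending: BTreeMap<&'a str, usize> =
        deps.iter().map(|(pkg, ds)| (*pkg, ds.len())).collect();
    // Packages whose dependencies are all built are ready to go.
    let mut ready: BTreeSet<&'a str> = pending
        .iter()
        .filter(|(_, n)| **n == 0)
        .map(|(pkg, _)| *pkg)
        .collect();
    let mut order = Vec::new();
    while let Some(pkg) = ready.pop_first() {
        order.push(pkg.to_string());
        // Building `pkg` unblocks every package that depends on it.
        for (other, ds) in deps {
            if ds.contains(&pkg) {
                let n = pending.get_mut(other).unwrap();
                *n -= 1;
                if *n == 0 {
                    ready.insert(*other);
                }
            }
        }
    }
    // If some package was never ready, the graph contains a cycle.
    if order.len() == deps.len() { Some(order) } else { None }
}
```

For a toy graph where `app` depends on `lib` and `base`, and `lib` depends on `base`, this yields `base`, `lib`, `app`, which is the order in which the configure/build/install sequence would run.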

There is however one architectural mismatch: for the solver to be able to compute a build plan, it already needs a lot of information about the current system:

What Haskell compiler are we using?
What system libraries are available (pkgconfig-depends)?
What build tools are available (build-tool-depends)?

This means that cabal-install already has in its hands most of the information necessary for configuring a package; in particular it has already resolved all the conditionals in every package description. We should thus be able to skip most of the steps in the package’s ./Setup configure phase. However, the command-line interface of ./Setup configure makes it practically impossible to do so: passing a fully resolved dependency graph would require many additions to the already bloated ConfigFlags datatype, and a lot more data being serialised/deserialised.

Because of this limitation, cabal-install’s approach was to take its hard-won build plan and convert it into ConfigFlags that specify exact dependency versions and flag assignments. This amounts to passing ./Setup configure an already fully constrained configuration; the configure step would then re-probe the system, re-read package databases… only to re-discover exactly what cabal-install already knew!

A new architecture for `cabal-install`

The paradigm shift proposed in our Sovereign Tech Fund proposal is that cabal-install should be responsible for orchestrating the whole build process instead of running the conceptually independent build systems provided by each package. With cabal-install now in control, it can directly call Cabal library functions, which in turn allows skipping steps in the configure phase that waste time re-discovering information that cabal-install is already aware of.

To implement such a change, we first needed to prepare the terrain. When invoking an external executable such as the Setup executable (say via the process library, as Cabal does), we can set the working directory and environment variables, and redirect input/output handles. It was not possible to do this directly via the Cabal library, so we first needed to add Cabal library support for setting the working directory and for choosing logging handles. Once this was done, it allowed us to refactor cabal-install to directly call Cabal library functions to build packages.

Performance impact

This architectural change provides a solid foundation for further improvements. Using a new --build-timings flag to cabal-install, we determined the two main time sinks in the Cabal configure phase to be:

(~50% of configure time) Re-configuring the compiler program database. The compiler and hc-pkg were already pre-configured, but other programs such as haddock, ar, ld, etc. were re-configured anew for each package.
(~40% of configure time) Re-probing the installed package database, via hc-pkg dump.

We can skip this extra work by pre-configuring the compiler ProgramDb and keeping a running InstalledPackageIndex. These two changes, taken together, reduce the time spent in the configure phase by over 90%.

While most of the time in builds is unsurprisingly spent… actually compiling Haskell code [citation needed], the impact on full builds is still rather significant. For example, when compiling aeson with -j1, we saw a reduction in total build time of ~16.6% (std.dev. 1.9pp) in our benchmarks.

The fact that the configure phase is inherently serial also means that these improvements have a notable impact when combined with the -jsem feature. This is because the -jsem feature allows us to assign more capabilities to the build phase. As per Amdahl’s law, this results in the configure phase becoming more of a bottleneck. For example, when compiling pandoc with cabal install pandoc -j16 --semaphore, we saw a reduction in total build time of ~10% (std.dev. 0.6pp).
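
The Amdahl's law argument can be made concrete with a small calculation. This is only a sketch: the 5% serial fraction used in the usage note is an invented figure for illustration, not a measurement from these benchmarks, and the function names are hypothetical.

```rust
/// Amdahl's law: overall speedup when a fraction `serial` of the work
/// cannot be parallelised and the rest is spread over `n` workers.
fn amdahl_speedup(serial: f64, n: f64) -> f64 {
    1.0 / (serial + (1.0 - serial) / n)
}

/// Share of the wall-clock time spent in the serial part when the
/// parallelisable part runs on `n` workers.
fn serial_share(serial: f64, n: f64) -> f64 {
    serial / (serial + (1.0 - serial) / n)
}
```

With an assumed serial fraction of 0.05, `serial_share(0.05, 1.0)` is 5% but `serial_share(0.05, 16.0)` is roughly 46%: the more capabilities the build phase gets, the larger the slice of wall-clock time that a serial configure step occupies, and the more there is to gain from shrinking it.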

Further improvements

These improvements provide a small glimpse of what is possible after our changes to cabal-install’s architecture. A more ambitious long-term goal would be for cabal-install to manage a “giant build graph” on a finer granularity level than whole Cabal components. For example, if package q depends only on module P1 from package p, we could imagine starting to compile q after compiling P1 but before we have finished compiling the rest of p. This would unlock build-time reductions by increasing available parallelism, and also enable more accurate progress and error reporting.

A Remarkable Property of Real-Valued Functions on Intervals of the Real Line

2026-05-22T15:51:36Z

Today, 17 October 2019, I discussed a very remarkable fixed point theorem discovered by the Ukrainian mathematician Oleksandr Mykolayovych Sharkovsky.

We recall that a periodic point of period $n$ for a function $f : X \to X$ is a point $x \in X$ such that $f^n(x) = x$. With this definition, a periodic point of period $n$ is also periodic of period $m$ for every $m$ which is a multiple of $n$. If $f^n(x) = x$ but $f^k(x) \neq x$ for every $k$ from 1 to $n-1$, we say that $n$ is the least period of $x$.

Theorem 1. (Sharkovsky’s “little” theorem) Let $I \subseteq \mathbb{R}$ be an interval and $f : I \to I$ a continuous function. If $f$ has a point of least period 3, then it has points of arbitrary least period; in particular, it has a fixed point.

Note that no hypothesis is made on $I$ being open or closed, bounded or unbounded.

Our proof of Sharkovsky’s “little” theorem follows the one given in (Sternberg, 2010), and could even be given in a Calculus 1 course: the most advanced result will be the intermediate value theorem.

Lemma 1. Let $I$ be a compact interval of the real line and $f : I \to \mathbb{R}$ a continuous function. Suppose that for some compact interval $K = [c, d] \subseteq I$ it is $f(K) \supseteq K$. Then $f$ has a fixed point in $K$.

Proof. Let $m$ and $M$ be the minimum and the maximum of $f$ in $K$, respectively. As $f(K) \supseteq K$, it is $m \leq c$ and $M \geq d$. Choose $u, v \in K$ such that $f(u) = m$ and $f(v) = M$. Then $f(x) - x$ is nonpositive at $u$ and nonnegative at $v$. By the intermediate value theorem applied to $g(x) = f(x) - x$, $f$ must have a fixed point in the closed and bounded interval (possibly reduced to a single point) delimited by $u$ and $v$, which is a subset of $K$.

Lemma 2. In the hypotheses of Lemma 1, let $J$ be a closed and bounded interval contained in $f(I)$. Then there exists a closed and bounded subinterval $K$ of $I$ such that $f(K) = J$.

Proof. Let $J = [c, d]$. We may suppose $c < d$, otherwise the statement is trivial. Let $r$ be the largest $x \in I$ such that $f(x) = c$. Two cases are possible.

There exists $x > r$ such that $f(x) = d$. Let $s$ be the smallest such $x$, and let $K = [r, s]$. Then surely $f(K) \supseteq J$, but if for some $x \in K$ we had either $f(x) < c$ or $f(x) > d$, then by the intermediate value theorem, for some $y \in (r, s)$ we would also have either $f(y) = c$ or $f(y) = d$, against our choice of $r$ and $s$.
$f(x) \neq d$ for every $x > r$. Let then $s$ be the largest $x < r$ such that $f(x) = d$, let $t$ be the smallest $x > s$ such that $f(x) = c$, and let $K = [s, t]$. Then $f(K) = J$ for reasons similar to those of the previous point.

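Proof of Sharkovsky’s “little” theorem. Let $a, b, c \in I$ be such that $f(a) = b$, $f(b) = c$, and $f(c) = a$. Up to cycling between these three values and replacing $f(x)$ with $-f(-x)$, we may suppose $a < b < c$. Fix a positive integer $n$: we will prove that there exists $x \in I$ such that $f^n(x) = x$ and $f^k(x) \neq x$ for every $0 < k < n$.

Let $L = [a, b]$ and $R = [b, c]$ be the “left” and “right” side of the closed and bounded interval $[a, c]$: then $f(L) \supseteq R$ and $f(R) \supseteq L \cup R$ by the intermediate value theorem. In particular, $f(R) \supseteq R$, and Lemma 1 immediately tells us that $f$ has a fixed point in $R$. Also, by Lemma 2 there is a closed and bounded subinterval $L'$ of $L$ such that $f(L') = R$; then $f^2(L') = f(R) \supseteq L'$, so $f$ also has a point of period 2 in $L'$, again by Lemma 1: call it $x$. This point cannot be a fixed point, because then it would also belong to $R$ as $f(x) \in R$, but $L \cap R = \{b\}$ and $b$ has least period 3. As we can obviously take $x = a$ when $n = 3$, we only need to consider the case $n > 3$.

By Lemma 2, there exists a closed and bounded subinterval $A_1$ of $R$ such that $f(A_1) = R$. In turn, as $A_1 \subseteq R = f(A_1)$, there also exists a closed and bounded subinterval $A_2$ of $A_1$ such that $f(A_2) = A_1$, again by Lemma 2: but then, $f^2(A_2) = R$. By iterating the procedure, we find a sequence $A_1 \supseteq A_2 \supseteq \cdots$ of closed and bounded intervals such that, for every $k \geq 1$, $f(A_{k+1}) = A_k$ and $f^k(A_k) = R$.

We stop at $k = n-2$ and recall that $f(L) \supseteq R \supseteq A_{n-2}$: we are still in the situation of Lemma 2, with $A_{n-2}$ in the role of $J$. So we choose $A_{n-1}$ as a closed and bounded subinterval not of $R$, but of $L$, such that $f(A_{n-1}) = A_{n-2}$. In turn, as $f(R) \supseteq L \supseteq A_{n-1}$, there exists a closed and bounded subinterval $A_n$ of $R$ such that $f(A_n) = A_{n-1}$. Following the chain of inclusions we obtain $f^n(A_n) = R \supseteq A_n$. By Lemma 1, $f^n$ has a fixed point $x$ in $A_n$, which is a periodic point of period $n$ for $f$.

Can the least period of $x$ for $f$ be smaller than $n$? No, it cannot, for the following reason. If $x$ has least period $m$, then so has $f(x)$, and in addition $n$ is divisible by $m$. But $f(x) \in A_{n-1} \subseteq L$ while $f^k(x) \in R$ for every $k$ from 2 to $n$. Consequently, if $x$ has least period $m < n$, then $f(x) = f^{m+1}(x) \in L \cap R = \{b\}$. But this is impossible, because by construction $f^3(x) \in R$ as $n > 3$, while $f^2(b) = a \notin R$.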

Theorem 1 is a special case of a much more general, and complex, result also due to Sharkovsky. Before stating it, we need to define a special ordering on positive integers.

Definition. The Sharkovsky ordering $\triangleright$ between positive integers is defined as follows:

Identify the number $n = 2^k \cdot m$, with $m$ odd integer, with the pair $(k, m)$.
Sort the pairs with $m > 1$ in lexicographic order.

That is: first, list all the odd numbers larger than 1, in increasing order; then, all the doubles of the odd numbers larger than 1, in increasing order; then, all the quadruples of the odd numbers larger than 1, in increasing order; and so on.

For example, $3 \triangleright 5$ and $7 \triangleright 6$.

Set $(k, m) \triangleright (h, 1)$ for every $m > 1$ and $h, k \geq 0$.

That is: the powers of 2 follow, in the Sharkovsky ordering, any number which has an odd factor larger than 1.

For example, $6 \triangleright 4$.
Sort the pairs of the form $(k, 1)$ (i.e., the powers of 2) in reverse order.

The set of positive integers with the Sharkovsky ordering has then the form:

$$3 \triangleright 5 \triangleright 7 \triangleright \cdots \triangleright 2 \cdot 3 \triangleright 2 \cdot 5 \triangleright \cdots \triangleright 2^2 \cdot 3 \triangleright 2^2 \cdot 5 \triangleright \cdots \triangleright 2^3 \triangleright 2^2 \triangleright 2 \triangleright 1$$

Note that $\triangleright$ is a total ordering.
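
The definition translates almost line by line into a comparison function. The following is a minimal sketch in Rust; the names `split` and `sharkovsky_cmp` are made up for this illustration, and `Ordering::Less` is taken to mean "comes earlier in the Sharkovsky ordering".

```rust
use std::cmp::Ordering;

/// Write a positive integer n as 2^k * m with m odd, and return (k, m).
fn split(n: u64) -> (u32, u64) {
    assert!(n > 0, "the ordering is defined on positive integers");
    let k = n.trailing_zeros();
    (k, n >> k)
}

/// Compare two positive integers in the Sharkovsky ordering.
/// `Less` means that `a` comes before `b` in the list.
fn sharkovsky_cmp(a: u64, b: u64) -> Ordering {
    let (ka, ma) = split(a);
    let (kb, mb) = split(b);
    match (ma == 1, mb == 1) {
        // Neither is a power of 2: lexicographic order on (k, m).
        (false, false) => (ka, ma).cmp(&(kb, mb)),
        // Powers of 2 follow every number with an odd factor larger than 1.
        (false, true) => Ordering::Less,
        (true, false) => Ordering::Greater,
        // Two powers of 2: reverse order, larger exponent first.
        (true, true) => kb.cmp(&ka),
    }
}
```

Sorting the integers from 1 to 12 with `sort_by(|a, b| sharkovsky_cmp(*a, *b))` gives 3, 5, 7, 9, 11, 6, 10, 12, 8, 4, 2, 1: odd numbers first, then their doubles, then their quadruples, and the powers of 2 last in decreasing order.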

Theorem 2. (Sharkovsky’s “great” theorem) Let $I$ be an interval on the real line and let $f : I \to I$ be a continuous function.

If $f$ has a point of least period $m$, and $m \triangleright n$, then $f$ has a point of least period $n$. In particular, if $f$ has a periodic point, then it has a fixed point.
For every integer $m \geq 1$ it is possible to choose $I$ and $f$ so that $f$ has a point of minimum period $m$ and no points of minimum period $n$ for any $n \triangleright m$. In particular, there are functions whose only periodic points are fixed.

Bibliography:

Keith Burns and Boris Hasselblatt. The Sharkovsky theorem: A natural direct proof. The American Mathematical Monthly 118(3) (2011), 229–244. doi:10.4169/amer.math.monthly.118.03.229
Robert L. Devaney, An Introduction to Chaotic Dynamical Systems, Second Edition, Westview Press 2003.
Shlomo Sternberg, Dynamical Systems, Dover 2010.

82: Fraser Tweedale

2026-05-19T12:00:00Z

We talked to Fraser Tweedale. Fraser works at Red Hat, and is on the Haskell Security Response Team. We talked about security in the context of Haskell, both technical and organizational issues, and also the political issues involved. Fraser's work is both really important and not well-known in the Haskell ecosystem, so it was high time for him to come on the show.

Type out the code

2026-05-19T00:00:00Z

Freecoding improves broader programming proficiency

Redoubtful: Linux agent sandbox progress

2026-05-17T18:50:36Z

I’ve also been experimenting with agent sandboxes lately. redoubtful is a work-in-progress sandbox that supports:

Linux-only sandboxes: I’m focusing on what Linux supports, specifically, rather than trying to support the lowest-common-denominator features that work cross platform.
Modular configuration profiles: See below.
Isolation using pasta and bwrap.
A shadow filesystem that looks like your home directory, so things like git worktree actually work correctly. You can also selectively mount existing parts of your filesystem in read-only or read-write mode.
Network port forwarding and filtering proxy server.
TODO: Proxy credential support.

But first, a warning: Nearly 100% of this code was written by coding agents, much of it by a local Qwen3.6 27B. I am, however, keeping a very close eye on the output: one of my goals here is to see just what a small agent like this can do. This is maybe only 80% as good as my handwritten code would be at a similar point in a project.

And finally, this is an incomplete work-in-progress, and it has not been packaged nicely for anyone besides me yet.

Modular configuration “profiles”

One of the slightly novel parts of all this is the ability to define modular configuration. This allows us to invoke a sandbox with a specific set of credentials:

redoubtful run --uses pi --uses llama-server pi

Here, we’re running the pi.dev coding agent with a locally-served Qwen3.6 27B via llama-server. Qwen3.6 27B is a fantastic lightweight coding model, and it works very well with pi.dev’s minimalist prompt. And since we’re running in a sandbox, we don’t care that pi.dev provides no sandbox and no confirmation before acting.

To set up these two profiles, we first define a node profile:

# Standard Node setup. If you're using `nvm`, you'll need to fix the path_add
# entry to point to the correct nvm version.
#
# We might want some kind of plugin system to handle messy things like nvm.
[profile.node]
mounts = [
    { host = "~/.npm-global" },
    { host = "~/.local/share/nvm", access = "rw" },
]
path_add = ["~/.npm-global/bin", "~/.local/share/nvm/v24.15.0/bin"]

Then let’s make Rust work:

# A Rust setup, with optional rustup and advisory support.
[profile.rust]
mounts = [
    { host = "~/.rustup" },
    { host = "~/.cargo" },
    # Cargo audit/deny support, which needs to take a lock to update the
    # advisory database.
    { host = "~/.cargo/advisory-dbs/", access = "rw" },
]
path_add = ["~/.cargo/bin"]

And then basic git is easy—we just need enough config to read user.name and user.email:

# Things you will likely want for git.
[profile.git]
mounts = [{ host = "~/.gitconfig" }]

And then finally, we can set up pi itself:

# Profile for the pi coding agent. Run with:
#
#     redoubtful run -u pi pi
[profile.pi]
uses = ["node", "rust", "git"]
mounts = [{ host = "~/.pi", access = "rw" }]

# Pass through llama-server connections.
[profile.llama-server]
forwards = [{ host_port = 8080 }]
proxies = [{ host = "127.0.0.1" }]
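
One plausible reading of `uses` is that requesting a profile also pulls in everything it names, transitively. The sketch below shows that reading in Rust; it is an assumption made for illustration, not redoubtful's actual resolution logic, and `expand_profiles` is a hypothetical name.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Expand the requested profiles into the full set of profiles they pull
/// in through `uses`. Hypothetical helper: `uses` maps a profile name to
/// the profiles listed in its `uses = [...]` entry.
fn expand_profiles<'a>(
    uses: &BTreeMap<&'a str, Vec<&'a str>>,
    requested: &[&'a str],
) -> BTreeSet<&'a str> {
    let mut seen = BTreeSet::new();
    let mut stack: Vec<&'a str> = requested.to_vec();
    while let Some(name) = stack.pop() {
        // `insert` returns false for a profile that was already expanded,
        // which also stops cyclic `uses` entries from looping forever.
        if seen.insert(name) {
            if let Some(children) = uses.get(name) {
                stack.extend(children.iter().copied());
            }
        }
    }
    seen
}
```

Under this reading, `--uses pi --uses llama-server` would activate `git`, `llama-server`, `node`, `pi` and `rust`, and the sandbox would get the union of their mounts, path additions, forwards and proxies.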

What’s left?

The biggest missing piece is teaching the proxy server how to inject real credentials into network connections. This isn’t a new idea. The goal is to provide access to things like GitHub without giving an agent actual credentials.

After that, it’s just packaging everything up nicely and writing some docs, so that other people (or agents) can easily configure it for different purposes.

Lab notebook: Edit completion #1

2026-05-16T15:49:55Z

I continue to be interested in late-2024-era edit completion, the “Fill in the Middle” (FIM) models. You know, what Copilot used to do, back before it started generating “mini diffs.” Why?

I like and use agentic workflows. But as many people are realizing, it’s super easy to lose track of what’s happening in your code, with terrible consequences. So I want to reinvest in human-in-the-loop tools, too. (And slightly weaker agentic models, but more on that later.)
The new-school edit completion offered by Copilot and Zed’s Zeta2 actually slows me down. It overlays diffs on my buffer, which is visually disorienting at speed. And it proposes edits further from the current cursor, which take me longer to mentally process. Personally, the new style feels like hunt-and-peck. The older style felt like really fast touch typing.

Mind you, I’m a very specific sort of user. I want to know how my code works. I want my code to be clean. And I can read a half-page code completion in moments, thanks to way too many years of reading PRs.

Initial experiments

All experiments performed in Zed, which does less post-processing of the raw model output than some tools. All evaluations are purely subjective.

New-school models (generating diffs). Zeta2 is honestly pretty underwhelming right now. The completions are very generic. And Zeta2 seems to be bad about taking the context into account. It will complete a function, sure. But I’d swap Zeta2 for late 2024 Copilot in a heartbeat.

Old-school models (FIM, inserting at cursor). Let’s go down the list so far:

ggml-org/Qwen2.5-Coder-7B-Q8_0-GGUF:Q8_0: The classic, default choice. This isn’t terrible, and it gives more context-aware completions than Zeta2. But it’s generations old, and I want to know if anything is new and shiny.
mradermacher/Seed-Coder-8B-Base-i1-GGUF:Q6_K. This is the raw base that went into Zeta2, I think? It doesn’t seem to be useful in Zed, because the inserted text feels pretty raw. This might work better in a smarter harness. But I’m dropping it for now.
JetBrains/Mellum-4b-base-gguf:Q8_0. Downloaded, but not yet tested.
unsloth/Qwen3.6-35B-A3B-GGUF:IQ4_XS. This is unexpectedly good! Worth further experimentation.

Refining Qwen3.6 35B A3B: Changing order from PSM to SPM

Qwen typically uses FIM, “Fill in the Middle” completion. This uses 3 magic tokens:

/// Qwen FIM prefix marker.
const PRE: &str = "<|fim_prefix|>";
/// Qwen FIM suffix marker.
const SUF: &str = "<|fim_suffix|>";
/// Qwen FIM middle marker (model generates after this).
const MID: &str = "<|fim_middle|>";

We have two possible flavors. The original is “PSM” completion, “prefix, suffix, middle”:

{PRE}{prefix}{SUF}{suffix}{MID}

But since the prefix grows with each keystroke, we can’t cache the entire message. We could get much better caching with “SPM” order:

{SUF}{suffix}{PRE}{prefix}{MID}

Here, we can cache everything up to the final {MID} token, and resume generation with a longer prefix. Whooo, speed!
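
To see the caching difference, here is a small sketch that builds both prompt orders and measures how many leading bytes two consecutive requests share. The helper names (`psm_prompt`, `spm_prompt`, `shared_prefix_len`) are invented for this note, and byte-level sharing is only a rough proxy for what a token-level prompt cache can reuse.

```rust
/// Qwen FIM markers, as quoted above.
const PRE: &str = "<|fim_prefix|>";
const SUF: &str = "<|fim_suffix|>";
const MID: &str = "<|fim_middle|>";

/// "Prefix, suffix, middle" order.
fn psm_prompt(prefix: &str, suffix: &str) -> String {
    [PRE, prefix, SUF, suffix, MID].concat()
}

/// "Suffix, prefix, middle" order.
fn spm_prompt(prefix: &str, suffix: &str) -> String {
    [SUF, suffix, PRE, prefix, MID].concat()
}

/// Length in bytes of the longest common prefix of two prompts.
fn shared_prefix_len(a: &str, b: &str) -> usize {
    a.bytes().zip(b.bytes()).take_while(|(x, y)| x == y).count()
}
```

If the user types a couple more characters at the cursor, two consecutive PSM prompts agree only through the old prefix text, so the whole suffix has to be processed again. Two consecutive SPM prompts agree through the marker, the suffix, and the old prefix, so only the newly typed characters and the final marker are new.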

But Zed doesn’t support SPM completion, only PSM. So I fired up a copy of Claude Code (as one does), and asked, “Hey, write me a Rust proxy server (using my standard conventions) that intercepts /completion, and translates PSM to SPM please.”

Results: Extremely disappointing. SPM format confuses Qwen3.6 35B A3B pretty badly. But then I thought, “Hey, even if we’re running in /completion mode, this is still an instruction-tuned model. Can we prompt it?” One unscientific tweak later:

You are a code-completion tool. You receive input in
fim_suffix+fim_prefix+fim_middle order, and your job
is to generate what the user would be likely to type
next. When in doubt, keep it short. Think of this like
generating a diff in agentic coding mode. You're trying
to insert the right text to make a working program that
does what the user wants. If there's no obvious next
step, generate nothing.

{SUF}{suffix}{PRE}{prefix}{MID}

This is still pretty bad, but it’s better. You can tell it’s trying to be an SPM autocompleter, though it’s still the worst of the bunch.

Possible next steps:

What if we modify the proxy to transform /completion into a /chat/completions request, with a real prompt, real text inputs, and a tool for insert_at_cursor(text)? Can we access more of the model’s intelligence?
Qwen3.6 35B A3B is small enough to fine-tune! We could look up file completion data sets, and try to create a LoRA adapter. We could even use something like tree-sitter to generate custom completion examples. Would that give us something useful?

I also notice that FIM-style models are notoriously bad at choosing a good stopping place. This can be fixed with a lot of regexes. But what if our fine-tuning data took care to demonstrate good stopping places?
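
As an illustration of the kind of post-processing a proxy could do, here is one possible stopping heuristic. It counts bracket depth instead of using regexes, and it is only a sketch: `trim_at_unmatched_close` is a hypothetical helper, it does not check that bracket kinds match, and it ignores brackets inside strings and comments.

```rust
/// Cut a raw completion at the first closing bracket that has no opener
/// inside the completion itself. Such a bracket most likely closes code
/// that already exists after the cursor, so the model has run too far.
fn trim_at_unmatched_close(completion: &str) -> &str {
    let mut depth: u32 = 0;
    for (i, ch) in completion.char_indices() {
        match ch {
            '(' | '[' | '{' => depth += 1,
            ')' | ']' | '}' => {
                if depth == 0 {
                    // Everything from this bracket onwards is dropped.
                    return &completion[..i];
                }
                depth -= 1;
            }
            _ => {}
        }
    }
    completion
}
```

A completion that finishes the current expression and then keeps going into the enclosing function, such as `x + 1);` followed by a closing brace and a new function, is cut back to `x + 1`, while a balanced completion like `foo(bar[0])` passes through unchanged.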

Coding on Paper

2026-05-15T22:00:00Z

About three months ago, I bought the Onyx BOOX 25.3” Mira Pro Color, an e-ink monitor for desktop use. I’ve used it as my primary monitor since, and I’ve had a lot of questions about it. This is my experience report, from the perspective of a working, still mostly typing, programmer.

This is not a sponsored post, and it is not a product review. I wrote a very similar post about the Daylight DC-1 last year.

Neovim in the morning sunlight.

As explained in last year’s post, the reason I persist with these monitors is that they make me energetic and happy. Sunlight, direct or indirect, helps me stay clear and focused during my workday. I find spaces illuminated by natural light beautiful and inspiring.

I’m not going to recommend that you buy one of these devices. They’re expensive, about $2000, and the experience is quite different from LCD. Even if this looks cool, it seems to me very possible that most people would not like it in practice. With that said, I am happy with it, and I’ll probably keep investing in these tools as they get even better with time.

Spending a workday in the garden.

Using the Mira Pro as a primary monitor is a continuation of the experiments with my e-ink tablets and Termux as coding environments. But now, with far fewer compromises. I’m running my regular NixOS environment on my work laptop. No SSH and tmux needed, no Android terminal emulator to customize.

What I have done, though, is spend quite some time on making my system more suited for this monitor. The Mira Pro does not work well with dark themes. In fact, it only works well with high contrast light themes.

Luckily, I’m bent towards minimalism, so I already used near-monochrome themes, relying more on typographic syntax highlighting rather than coloring. I now have custom themes for Neovim, Zed, and Ghostty with a few vivid colors for things like selection, comments, and constants. Otherwise it’s largely black on white.

It’s trickier with other applications. In Firefox, I’ve started using the high contrast setting. That works pretty much like an inverse of DarkReader. I now run Spotify in the browser in order to avoid its dark theme.

The monitor has a clunky menu system with which you can change rendering modes; things like contrast and speed. I found an open-source reverse-engineered NodeJS package that I use with Hyprland keybindings to easily change rendering modes and manually refresh. No need for the built-in menu.

In practice I use two modes:

Reading: This mode renders colors most vividly and text sharply, but typing with it is agony. I use it when reading text documents, web pages, or code diffs.
Writing: This is by far the most commonly used mode, which compromises colors and sharpness for way better latency. I use this for everything in the terminal, chat, general web browsing, and probably most other things not covered by the reading mode.

See the following photos for a close-up comparison:

Reading mode, where colored regions are pretty smooth and text looks sharp.

Writing mode, where colored regions (light gray, red, green) are grainy and text is a bit blurry.

What about latency? Here are two short clips of me typing with the reading and writing modes:

Reading mode, with horrible latency for typing.

Writing mode, with some but acceptable latency.

Ghosting? In my writing mode it’s minimal. It really doesn’t bother me.

About the color panel: I don’t like it very much to be honest. It was the only version of the Mira Pro available from the Swedish retailer at the time, so I went with it. I think I would’ve been happier with a monochrome panel, because the coloring technology makes it considerably darker.

Here’s a comparison between the Palma 2 Pro (using a similar but smaller Kaleido color panel) and my old Tab Ultra (with a monochrome panel):

Color vs Monochrome e-ink panels without backlight.

Unless the room has great diffuse lighting, natural or otherwise, the color panel does require some backlight. In direct sunlight or outdoors it works without. I might spend more time optimizing the lighting in my office to make this work during the winter months.

So, what’s to make of it? Personally, I enjoy using this monitor a lot, even if it’s not perfect. Should you buy an expensive 25” e-ink monitor? I cannot say. But if you do, let me know how it works out.

My custom themes and keybindings can be found here.

I should blog more

2026-05-13T05:04:52Z

This blog is ancient, in blog years. The first post was on June 30, 1998, and it featured a randomized emboss for MathMap. Back in those days, it was a mix of neat little snippets like that and interesting links. The site was a single, hand-edited HTML file in reverse chronological order. It ran on a Linux mini-tower built from parts from the MIT Swapfest, and it lived under my desk.

Google hadn’t been incorporated yet. The Internet bubble was still inflating.

Over the years, the tech stack changed: for a while, this site used SGML-based rendering via a custom script (or was it XML?), then it was a nice interactive Typo site with comments, and then eventually it migrated to the current Jekyll architecture. Which seems to be about 12 years old. I’m pretty proud to have kept nearly all the inbound links working for decades now.

Around 2007 or so, I did a fun series of high effort posts about probability monads. But high-effort posts are a trap. Soon I started feeling like every post ought to be high effort. And then I wrote less and less.

But blogs are a bit of a retro endeavour these days. RSS readers still exist, but I imagine nearly all my subscribers have disappeared since the heady days of 2007. And apparently it’s trendy to work with the garage door up.

So maybe it’s time to get back to this site’s roots. I don’t have any MathMap snippets for you today, sadly, because the last release seems to have been in 2004. But here’s a cool trick!

Edit completion works with Qwen3.6 35B A3B!

2026-05-13T05:04:52Z

Do you miss the old-style Copilot completions? The ones where it inserted grey text at the cursor? There’s an open version of this called “FIM completion”. And the classic model for doing this is Qwen2.5 Coder 7B.

But it turns out that Qwen3.6 35B A3B can also do autocompletion! The fact that it has 3B active parameters means that it’s fast. And the 35B total parameters means it’s smarter than the smaller models.

So let’s fire it up using llama-server:

llama-server -hf unsloth/Qwen3.6-35B-A3B-GGUF:UD-IQ4_XS \
    --cache-type-k q8_0 --cache-type-v q8_0 --no-mmproj \
    --ctx-size 4000

--no-mmproj says to disable the vision mode. --cache-type-k q8_0 --cache-type-v q8_0 reduces the cache precision, since we’re not really using the cache. You might also need to grab a smaller quant, depending on your available VRAM.

Then, we can configure it using Zed:

{
  "edit_predictions": {
    "provider": "open_ai_compatible_api",
    "open_ai_compatible_api": {
      "api_url": "http://localhost:8080/v1/completions",
      "model": "unsloth/Qwen3.6-35B-A3B-GGUF:IQ4_XS",
      "prompt_format": "qwen",
      "max_output_tokens": 256,
    },
  },
}

So how good is this? Well, the completions aren’t too bad at all, but Zed doesn’t seem to do much post-processing. So the completions tend to be too long. A lot of this could likely be improved with a proxy that did some pre- and post-processing, and maybe a bit of fine tuning.

But this is an actual, working, 100% local autocomplete. And it’s close to being actually good.

Catching Typos on My Website with Browser Testing

2026-05-12T22:00:00Z

One neat thing about Bombadil’s specification language is that it’s plain TypeScript, with access to external NPM packages. I’ve written a specification that spell-checks my website — what you’re reading now — and I want to share how that turned out.

The inner loop (spell-checking):

Bombadil randomly walks the website and collects misspelled words as property violations. The specification uses nspell with American and British English dictionaries and a personal word list in the repository. This is fast and strict.

The outer loop (triage):

I’m running Claude Code with a spell-checking skill, a triage loop that goes something like this:

Run Bombadil against the local development server for 5 minutes and capture the output. If no words are flagged, we’re done.
Collect each flagged word and the URL it appeared on.
Triage each word into one of these buckets:
- Real typo: fix the markdown source
- Legitimate common word: add to the custom dictionary
- Legitimate uncommon or very technical word: mark inline with spellcheck="false"
- Extraction noise: add a unit test and fix the word extractor
Run Bombadil against each failing URL to confirm the corrections.
Go to step 1.

This is slow and loose.

The hybrid model seems to work well; it has flagged words in almost every blog post. It has fixed 13 real typos and added 130+ words to my personal dictionary. Example typos include “forseeable”, “similiar”, “perculiar”, “occured”. Some of these were 10 years old.

Claude doesn’t have to waste tokens spell-checking everything over and over. Right now I’m just running this locally, but you could imagine a more elaborate setup for large websites where the “inner loop” runs as a nightly job, invoking the “outer loop” only on violations. You could involve a human where needed, and build up a domain-specific dictionary over time.

Note that using an LLM is entirely optional. It just saves me some time. You can do triage on your own.

Why not spell-check the sources directly? Yes, that is often preferable, and I use spell in Neovim all the time. But it’s not always practical. At least in my experience, the tooling trips up on syntax and templating in more complicated setups. Maybe your editor handles this better than mine does, or maybe you’re fine with tools like typos and codespell, but I like the fact that this approach is external and checks the rendered output. Given that Bombadil interacts with web applications, you could even run this against dynamic applications to spell-check states deep in the UI.

Speaking of source-level checking: since the custom dictionary is a plain word list, I point Neovim’s spellfile at it and use zg to add words while I edit. A single source of truth that both tools write to.

vim.opt.spellfile = "/path/to/custom.utf-8.add"
vim.opt.spelllang = "en"

Being able to use NPM packages in specifications has turned out to be more useful than I expected. In addition to nspell, I’m using tlds to identify URLs. Bombadil is built for property-based testing of web applications, but with a specification language and package ecosystem at hand, its uses might be broader than my original vision.

If you’re interested in setting up something like this on your own, you’ll find the sources in my Bombadil playground.

Disclosure: I’m the original author and lead for the Bombadil project at Antithesis.

Smaller, cheaper Plutus scripts with the UPLC command-line tool

2026-05-12T13:19:24Z

If you want to see a use of Agda in real life, to provide certificates validating the correctness of compiler passes, check out this blog post from my colleague Ziyang Liu at Input Output. A simplified description of one of the passes appears in A Tale of Two Zippers, by myself, Jacco Krijnen, and Ramsay Taylor.

Exception Annotations: Lay of the Land

2026-05-08T00:00:00Z

Exception annotations were introduced in GHC 9.10, and can be an invaluable tool for debugging thorny problems. The initial implementation had some important limitations that made them less useful in practice than one might hope, but fortunately the situation has since been much improved. In this blog post we will give a detailed overview of the status quo as of GHC 9.12/9.14, identify some gotchas you should be aware of and provide advice on how to deal with them, and briefly look ahead to what will change in GHC 10.0. We will also dedicate a section to discussing the problems in GHC 9.10, for those who cannot yet upgrade.

Although we will recap all necessary definitions, this blog is not meant to be an introduction to exception annotations; if you have never used them before, you might first want to watch The Haskell Unfolder Episode 29: exceptions, annotations and backtraces.

Backtraces

Before we look at the general framework for exception annotations, let’s first briefly recap the concept of backtraces, which is GHC’s answer to stack traces in other languages. The situation is more complicated in Haskell due to laziness, and there are actually four different kinds of backtraces:

based on HasCallStack annotations
based on cost-centres (which will require compiling your program with profiling enabled)
based on IPE info
based on DWARF info

In this blog post we will use the first two only, but for the purposes of our main discussion here the choice actually does not matter much; see GHC proposal Decorate exceptions with backtrace information for details. If you’re interested in IPE backtraces specifically, you might also be interested in our blog post Better Haskell stack traces via user annotations, which discusses some recent extensions we implemented to improve these.

`HasCallStack` backtraces

Consider this simple Haskell program, where main calls top calls middle calls bottom:

bottom :: HasCallStack => IO ()
bottom = do
    bt <- collectBacktraces
    putStrLn $ displayBacktraces bt

middle :: HasCallStack => IO ()
middle = bottom

top :: HasCallStack => IO ()
top = middle

main :: IO ()
main = top

A HasCallStack is essentially an additional function argument which is automatically populated by GHC at call sites with information about where the function was called. When we run this program, we see something like this:

HasCallStack backtrace:
  collectBacktraces, called at exe/DemoCallStack.hs:13:11 in (..)
  bottom, called at exe/DemoCallStack.hs:18:10 in (..)
  middle, called at exe/DemoCallStack.hs:22:7 in (..)
  top, called at exe/DemoCallStack.hs:25:8 in (..)

The only thing worth noting here is that the moment a HasCallStack chain is broken, the backtrace is cut off there. For example, if middle does not have a HasCallStack constraint, we can no longer see where middle was called from:

HasCallStack backtrace:
  collectBacktraces, called at exe/DemoCallStack.hs:19:11 in (..)
  bottom, called at exe/DemoCallStack.hs:24:10 in (..)

The fact that top still has a HasCallStack constraint does not matter: the callstack is cut at the first missing link.
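The cut-off can be reproduced without the backtrace machinery at all. The following is a minimal sketch using only GHC.Stack; the names frames, middleLinked, middleBroken, topLinked and topBroken are made up for this illustration and do not appear in the demo programs above.

```haskell
import GHC.Stack (HasCallStack, callStack, getCallStack)

-- Function names recorded in the current call stack, innermost first.
frames :: HasCallStack => [String]
frames = map fst (getCallStack callStack)

bottom :: HasCallStack => [String]
bottom = frames

-- This link passes the call stack along.
middleLinked :: HasCallStack => [String]
middleLinked = bottom

-- This link has no HasCallStack constraint, so the chain is cut here.
middleBroken :: [String]
middleBroken = bottom

topLinked :: HasCallStack => [String]
topLinked = middleLinked

-- The constraint on this function no longer helps: the link below it is missing.
topBroken :: HasCallStack => [String]
topBroken = middleBroken

main :: IO ()
main = do
  print topLinked  -- ["frames","bottom","middleLinked","topLinked"]
  print topBroken  -- ["frames","bottom"]
```

Everything above the missing link is absent from the second result, which mirrors the truncated backtrace shown above.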

Cost centre backtraces

Cost centres are how GHC implements profiling: very roughly, the cost of a computation is attributed to its enclosing cost centre (see chapter Profiling of the GHC manual). Like HasCallStack, this relies on source code annotations:

{-# SCC bottom #-}
bottom :: HasCallStack => IO ()
bottom = do
    bt <- collectBacktraces
    putStrLn $ displayBacktraces bt

{-# SCC middle #-}
middle :: IO ()
middle = bottom

{-# SCC top #-}
top :: HasCallStack => IO ()
top = middle

{-# SCC main #-}
main :: IO ()
main = do
    setBacktraceMechanismState CostCentreBacktrace True
    top

Unlike HasCallStack, however, GHC offers ways for inserting such annotations automatically, which can often make cost centre based callstacks more practical than HasCallStack. The most common flag to do this is -fprof-auto or (in recent GHC) -fprof-late (see Late Cost Centre Profiling). This inserts cost centres around all top-level functions, as we did manually above.

Cost centre backtraces must be explicitly enabled by calling setBacktraceMechanismState, and you need to compile your code with profiling enabled; the cabal option --enable-profiling enables both profiling and automatic cost centre insertion. The backtrace for this example might look something like

Cost-centre stack backtrace:
  DemoCallStack.main (exe/DemoCallStack.hs:(32,1)-(34,7))
  DemoCallStack.top (exe/DemoCallStack.hs:28:1-12)
  DemoCallStack.middle (exe/DemoCallStack.hs:24:1-15)
  DemoCallStack.bottom (exe/DemoCallStack.hs:(18,1)-(20,35))

Be aware however that optimizations can delete cost centres, especially in simple examples like this (#27225).

Cost centres vs exception handling

Consider the following example: as before, main calls top calls middle calls bottom, which prints a backtrace; however bottom then throws an exception. Meanwhile, main installs an exception handler called handlerTop, which in turn calls handlerMiddle calls handlerBottom, which prints its own backtrace:

bottom :: HasCallStack => IO ()
bottom = do
    bt <- collectBacktraces
    putStrLn $ displayBacktraces bt
    throwIO $ userError "Uhoh"

middle :: HasCallStack => IO ()
middle = bottom

top :: HasCallStack => IO ()
top = middle

handlerBottom :: HasCallStack => SomeException -> IO ()
handlerBottom _e = do
    bt <- collectBacktraces
    putStrLn $ displayBacktraces bt

handlerMiddle :: HasCallStack => SomeException -> IO ()
handlerMiddle e = handlerBottom e

handlerTop :: HasCallStack => SomeException -> IO ()
handlerTop e = handlerMiddle e

main :: IO ()
main = do
    setBacktraceMechanismState CostCentreBacktrace True
    top  `catch` handlerTop

The HasCallStack backtrace printed by bottom is

HasCallStack backtrace:
  collectBacktraces, called at exe/DemoCCS.hs:24:11 in (..)
  bottom, called at exe/DemoCCS.hs:29:10 in (..)
  middle, called at exe/DemoCCS.hs:32:7 in (..)
  top, called at exe/DemoCCS.hs:41:5 in (..)

as before; the HasCallStack backtrace printed by handlerBottom is very similar:

HasCallStack backtrace:
  collectBacktraces, called at exe/DemoCCS.hs:13:11 in (..)
  handlerBottom, called at exe/DemoCCS.hs:17:19 in (..)
  handlerMiddle, called at exe/DemoCCS.hs:20:16 in (..)
  handlerTop, called at exe/DemoCCS.hs:41:18 in (..)

For the cost-centre based backtrace, the one shown in bottom is as before:

Cost-centre stack backtrace:
  DemoCCS.main (exe/DemoCCS.hs:(39,1)-(41,27))
  DemoCCS.top (exe/DemoCCS.hs:32:1-12)
  DemoCCS.middle (exe/DemoCCS.hs:29:1-15)
  DemoCCS.bottom (exe/DemoCCS.hs:(23,1)-(26,30))

but the one shown in handlerBottom is more surprising:

Cost-centre stack backtrace:
  DemoCCS.main (exe/DemoCCS.hs:(39,1)-(41,27))
  DemoCCS.top (exe/DemoCCS.hs:32:1-12)
  DemoCCS.middle (exe/DemoCCS.hs:29:1-15)
  DemoCCS.bottom (exe/DemoCCS.hs:(23,1)-(26,30))
  DemoCCS.handlerTop (exe/DemoCCS.hs:20:1-30)
  DemoCCS.handlerMiddle (exe/DemoCCS.hs:17:1-33)
  DemoCCS.handlerBottom (exe/DemoCCS.hs:(12,1)-(14,35))

Whether or not this is expected/correct behaviour is arguable, but the rule is this: the cost centre stack is not restored until we leave the scope of catch. Put another way: the cost centre stack reflects the fact that bottom “calls” handlerTop, however indirectly. This applies transitively: if handlerTop were to throw an exception, which was then caught by some other exception handler, then its backtrace would reflect that top “called” handlerTop “called” that other exception handler. This kind of situation can arise quite naturally, for example when using handlers that deallocate some resources and then rethrow the exception.

Basic definitions

Before we look at the subtleties that arise from actually catching and throwing (or rethrowing) exceptions, we’ll first get the basic definitions out of the way. These have not changed much between recent GHC versions and are hopefully uncontroversial.

Exception annotations

Exception annotations can basically be anything at all; the only requirement is that we can display them:

class Typeable a => ExceptionAnnotation a where
  displayExceptionAnnotation :: a -> String

An important instance of this class is Backtraces, which wraps a set of different kinds of backtraces:

instance ExceptionAnnotation Backtraces where
    displayExceptionAnnotation = Base.displayBacktraces

Exception context

An exception context is essentially just a list of exception annotations. However, since those annotations may be of different types, we need to wrap them in an existential:

data ExceptionContext = ExceptionContext [SomeExceptionAnnotation]
data SomeExceptionAnnotation = forall a. ExceptionAnnotation a => SomeExceptionAnnotation a

There are functions for manipulating the exception context. The most important are emptyExceptionContext and addExceptionAnnotation, for creating an empty context and inserting an annotation into an existing context respectively.

emptyExceptionContext  :: ExceptionContext
addExceptionAnnotation :: ExceptionAnnotation a => a -> ExceptionContext -> ExceptionContext
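As a small illustration of these functions, here is a sketch that builds a context holding annotations of two different types and then selects one type back out. RequestId and UserName are hypothetical annotation types invented for this example; getExceptionAnnotations is assumed to be available from Control.Exception.Context, as in recent versions of base (GHC 9.10 or later).

```haskell
import Control.Exception.Annotation (ExceptionAnnotation (..))
import Control.Exception.Context
  ( ExceptionContext
  , addExceptionAnnotation
  , emptyExceptionContext
  , getExceptionAnnotations
  )

-- Hypothetical annotation types, for illustration only.
newtype RequestId = RequestId Int
newtype UserName = UserName String

instance ExceptionAnnotation RequestId where
  displayExceptionAnnotation (RequestId n) = "request id: " ++ show n

instance ExceptionAnnotation UserName where
  displayExceptionAnnotation (UserName u) = "user: " ++ u

-- A context holding annotations of two different types.
exampleContext :: ExceptionContext
exampleContext =
  addExceptionAnnotation (UserName "alice") $
    addExceptionAnnotation (RequestId 42) emptyExceptionContext

-- Select only the annotations of type RequestId and render them.
requestIds :: ExceptionContext -> [String]
requestIds ctxt =
  [ displayExceptionAnnotation rid
  | rid <- (getExceptionAnnotations ctxt :: [RequestId])
  ]

main :: IO ()
main = mapM_ putStrLn (requestIds exampleContext)  -- prints "request id: 42"
```

The Typeable superclass is what makes this type-directed lookup possible despite the existential wrapper.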

Pivotal change: `SomeException`

The pivotal change in all of this is in the definition of SomeException which, starting in GHC 9.10, now has an associated list of annotations:

data SomeException = forall e. (Exception e, HasExceptionContext) => SomeException e
type HasExceptionContext = (?exceptionContext :: ExceptionContext)

The use of an implicit parameter means that pattern matching on SomeException remains possible in the same way as before (though the annotations would be silently dropped).

There are various functions for extracting and manipulating the exception context associated with an exception, such as someExceptionContext and addExceptionContext:

someExceptionContext :: SomeException -> ExceptionContext
addExceptionContext  :: ExceptionAnnotation a => a -> SomeException -> SomeException

However, probably the most important function for extending exception contexts is annotateIO, which installs an exception handler that extends any exception that is thrown with the specified annotation:

annotateIO :: forall e a. ExceptionAnnotation e => e -> IO a -> IO a
annotateIO ann (IO io) = IO (PrimOp.catch# io handler)
  where
    handler se = PrimOp.raiseIO# (addExceptionContext ann se)

It is important to emphasize that this is implemented with primops, not with the regular catch and throwIO functions, which do considerably more than merely catching and throwing, as we shall see.
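A usage sketch may help here. It assumes that annotateIO and someExceptionContext are exported from Control.Exception (as I believe they are in recent base) and uses a hypothetical Phase annotation; it wraps a failing action in two nested annotateIO calls and then inspects what is attached to the caught exception.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

import Control.Exception (SomeException, annotateIO, someExceptionContext, throwIO, try)
import Control.Exception.Annotation (ExceptionAnnotation (..))
import Control.Exception.Context (getExceptionAnnotations)

-- Hypothetical annotation recording which phase of the program was running.
newtype Phase = Phase String

instance ExceptionAnnotation Phase where
  displayExceptionAnnotation (Phase p) = "while " ++ p

-- Run a failing action under two nested annotations and report which
-- Phase annotations ended up attached to the exception.
phasesOfFailure :: IO [String]
phasesOfFailure = do
  result <- try $
    annotateIO (Phase "loading config") $
      annotateIO (Phase "parsing") $
        throwIO (userError "unexpected token")
  pure $ case result of
    Left (se :: SomeException) ->
      map displayExceptionAnnotation
        (getExceptionAnnotations (someExceptionContext se) :: [Phase])
    Right () -> []

main :: IO ()
main = phasesOfFailure >>= mapM_ putStrLn
```

Both Phase annotations should be present on the caught exception, alongside whatever backtrace annotation throwIO attached.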

`Exception` type class

The Exception type class is a central abstraction in Haskell’s exception ecosystem. As part of the exception annotation work, it has received one minor extension, and it was changed in two not-so-minor-but-rather-subtle ways. Let’s first get the part out of the way which has not changed: exceptions are no good if we cannot see them:

class (Typeable e, Show e) => Exception e where
  displayException :: e -> String
  displayException = show

  -- (..)

`backtraceDesired`

The minor extension is a new function called backtraceDesired, which indicates if a backtrace should be attached to exceptions of this type; we will see how this function is used when we discuss the implementation of throwIO.

class (Typeable e, Show e) => Exception e where
  -- (..)

  backtraceDesired :: e -> Bool
  backtraceDesired _ = True

The argument to backtraceDesired is an already fully constructed exception; the question is whether a backtrace should be added to that exception. In most cases the argument can simply be ignored, but it doesn’t have to be. For all but a handful of specialized cases the default implementation (indicating that yes, we want a backtrace) will be fine.
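For one of those specialized cases, a sketch might look as follows. Cancelled and Crashed are hypothetical exception types invented for this example, and the helper backtraceCount simply counts the Backtraces annotations present after a throw; it assumes someExceptionContext is exported from Control.Exception.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

import Control.Exception (Exception (..), SomeException, someExceptionContext, throwIO, try)
import Control.Exception.Backtrace (Backtraces)
import Control.Exception.Context (getExceptionAnnotations)

-- Hypothetical exception used for ordinary control flow; a backtrace
-- would only be noise, so we opt out.
data Cancelled = Cancelled deriving Show

instance Exception Cancelled where
  backtraceDesired _ = False

-- Hypothetical exception that keeps the default (backtrace desired).
data Crashed = Crashed deriving Show

instance Exception Crashed

-- Count the Backtraces annotations attached when the exception is thrown.
backtraceCount :: Exception e => e -> IO Int
backtraceCount e = do
  result <- try (throwIO e)
  pure $ case result of
    Left (se :: SomeException) ->
      length (getExceptionAnnotations (someExceptionContext se) :: [Backtraces])
    Right () -> 0

main :: IO ()
main = do
  backtraceCount Cancelled >>= print
  backtraceCount Crashed >>= print
```

With the default callback, I would expect no Backtraces annotation for Cancelled and one for Crashed.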

`fromException`

The not-so-minor-but-rather-subtle changes are in fromException and toException, which remove and add the SomeException wrapper around exceptions respectively. Let’s first look at fromException:

class (Typeable e, Show e) => Exception e where
  -- (..)

  fromException :: SomeException -> Maybe e
  fromException (SomeException e) = cast e

This may look no different from the implementation prior to 9.10, but recall that SomeException now has an additional field: the exception annotations. As mentioned above, a pattern match like this will silently discard those annotations.

`toException`

The final function in the Exception class is toException, which is intended to add the SomeException wrapper.

class (Typeable e, Show e) => Exception e where
  -- (..)
  toException :: e -> SomeException

Prior to 9.10, the default implementation literally just added the SomeException constructor:

  -- implementation prior to 9.10
  toException = SomeException

However, starting in 9.10 we also need to give an initial value for the exception context. The default implementation, reasonably enough, chooses the empty context:

  -- implementation in 9.10, 9.12, 9.14, and 10.0
  toException e = let ?exceptionContext = emptyExceptionContext in SomeException e

Unfortunately, however, the documentation of toException has also been modified, and now states that toException should produce a SomeException with no attached ExceptionContext. Personally, I think that is a mistake (#27194); we will discuss this in the next section.

⚠️ Caution: Instance for `SomeException` itself

SomeException itself is also an instance of Exception; fromException is trivial, and backtraceDesired and displayException piggy-back on the definition of whatever exception is wrapped:

instance Exception SomeException where
  fromException = Just

  backtraceDesired (SomeException e) = backtraceDesired e
  displayException (SomeException e) = displayException e

  -- (..)

The definition of toException is more problematic, however. Prior to 9.10, calling toException on SomeException was just an identity:

instance Exception SomeException where
  -- (..)

  -- Prior to 9.10
  toException se = se

Now, however, the implementation must clear the existing context in order to satisfy the contract:

instance Exception SomeException where
  -- (..)

  toException (SomeException e) = let ?exceptionContext = emptyExceptionContext in SomeException e

I think this is simply wrong; at the very least, it is highly counter-intuitive, and it also does not match the original proposal; I don’t know why this was changed. We will see some consequences of this design choice when we discuss throwing exceptions.

Newtype helpers

There are two auxiliary types, with their own Exception instances, that can be helpful when throwing or catching exceptions in specific ways. We haven’t discussed either throwing or catching yet, but we will nonetheless discuss these auxiliary types first as we will need them in the subsequent sections.

`NoBacktrace`

NoBacktrace can be used to override backtraceDesired:

newtype NoBacktrace e = NoBacktrace e

instance Exception e => Exception (NoBacktrace e) where
  fromException = fmap NoBacktrace . fromException
  toException (NoBacktrace e) = toException e
  backtraceDesired _ = False
  -- displayException left at its default implementation

`ExceptionWithContext`

The other, arguably more important, auxiliary type is ExceptionWithContext. The definition itself is straightforward: it simply pairs some value with an exception context:

data ExceptionWithContext a = ExceptionWithContext ExceptionContext a

The idea is that this type gives us a way to catch exceptions of specific types (rather than catching SomeException), and still get access to the exception context. For example:

data MyException = MyException
  deriving stock (Show)
  deriving anyclass (Exception)

example :: IO ()
example = someAction `catch` \(ExceptionWithContext ctxt MyException) -> do
    -- (..)

The implementation is reasonably straight-forward:

instance Exception a => Exception (ExceptionWithContext a) where
    toException (ExceptionWithContext ctxt e) =
        case toException e of
          SomeException c ->
            let ?exceptionContext = ctxt
            in SomeException c

    fromException se = do
        e <- fromException se
        return (ExceptionWithContext (someExceptionContext se) e)

    backtraceDesired (ExceptionWithContext _ e) = backtraceDesired e
    displayException = displayException . toException

That said, the devil is very much in the detail with these kinds of definitions, and as we shall see, it was defined incorrectly in GHC 9.10.
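To make the intended use concrete, here is a sketch of a handler for one specific exception type that still reads an annotation from the context. Attempt and Timeout are hypothetical types made up for this example, and the imports assume that ExceptionWithContext and annotateIO are exported from Control.Exception.

```haskell
import Control.Exception (Exception, ExceptionWithContext (..), annotateIO, catch, throwIO)
import Control.Exception.Annotation (ExceptionAnnotation (..))
import Control.Exception.Context (getExceptionAnnotations)

-- Hypothetical annotation: which retry attempt was in progress.
newtype Attempt = Attempt Int

instance ExceptionAnnotation Attempt where
  displayExceptionAnnotation (Attempt n) = "attempt " ++ show n

-- Hypothetical exception type.
data Timeout = Timeout deriving Show

instance Exception Timeout

-- Catch only Timeout, but keep access to the annotations it carries.
describeTimeout :: IO [String]
describeTimeout =
  annotateIO (Attempt 3) (throwIO Timeout)
    `catch` \(ExceptionWithContext ctxt Timeout) ->
      pure (map displayExceptionAnnotation (getExceptionAnnotations ctxt :: [Attempt]))

main :: IO ()
main = describeTimeout >>= mapM_ putStrLn  -- prints "attempt 3"
```

A plain handler of type Timeout -> IO a would catch the same exception, but the Attempt annotation would be out of reach.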

Throw

The primary function for throwing an exception is throwIO, which is defined as¹

throwIO :: (HasCallStack, Exception e) => e -> IO a
throwIO e = do
    se <- toExceptionWithBacktrace e
    IO (PrimOp.raiseIO# se)

Most of the actual work happens in toExceptionWithBacktrace:

toExceptionWithBacktrace :: (HasCallStack, Exception e) => e -> IO SomeException
toExceptionWithBacktrace e =
    if backtraceDesired e then do
      bt <- Base.collectBacktraces
      return (addExceptionContext bt (toException e))
    else
      return (toException e)

That is, if a backtrace is desired, we collect one and add it as an annotation to the exception that we’re about to throw.

Generalization

In GHC 9.14 toExceptionWithBacktrace was generalized to

toExceptionWithBacktrace :: (HasCallStack, Exception e) => e -> IO SomeException
toExceptionWithBacktrace e =
    if backtraceDesired e then do
      SomeExceptionAnnotation ea <- collectExceptionAnnotation
      return (addExceptionContext ea (toException e))
    else
      return (toException e)

This is an experimental API (not yet part of base); see CLC #348 for details. The idea is that you can use setCollectExceptionAnnotation to register your own function to be run to construct an annotation whenever an exception is thrown anywhere. For example, if you’re worried that some IO faults are happening due to your CPU overheating, you might use

newtype Temperature = Temperature Int
  deriving stock (Show)
  deriving anyclass (ExceptionAnnotation)

getTempCPU :: IO Temperature
getTempCPU = -- (..)

main :: IO ()
main = do
    setCollectExceptionAnnotation getTempCPU
    -- (..)

By default, the collection callback is collectBacktraces, so unless you register a different callback the behaviour is the same as in 9.10 and 9.12.

⚠️ Caution: Throwing `SomeException`

Because throwIO calls toException, and since toException for SomeException clears the exception context, you probably don’t want to call throwIO on an argument of type SomeException: any exception annotations that might be embedded in that exception will be lost.

The most common case for throwing SomeException is inside an exception handler; we will cover this specific case of rethrowing exceptions when we discuss onException, but we can reuse the same combinators also to define a general “throw precisely this exception” function:

raiseIO :: SomeException -> IO ()
raiseIO (SomeException e) = rethrowIO (ExceptionWithContext ?exceptionContext e)

Catch

The most important change in GHC 9.12 from 9.10 is in the definition of catch, which now implements the WhileHandling proposal. The idea is that when we throw a new exception while handling another, we annotate that new exception with the old exception: the new exception arose while handling the old exception:

data WhileHandling = WhileHandling SomeException deriving Show

catch :: Exception e => IO a -> (e -> IO a) -> IO a
catch (IO io) handler = IO $ PrimOp.catch# io handler'
  where
    handler' se =
      case fromException se of
        Just e' -> PrimOp.catch# (unIO (handler e')) (handler'' se)
        Nothing -> PrimOp.raiseIO# se

    handler'' se se' = PrimOp.raiseIO# (addExceptionContext (WhileHandling se) se')
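The effect is easiest to see from the outside. The sketch below requires GHC 9.12 or later and assumes WhileHandling and someExceptionContext are exported from Control.Exception; ParseError and CleanupFailed are hypothetical exception types. The handler for the first exception throws a second one, and we then look for the first exception inside the annotations of the second.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

import Control.Exception
  ( Exception (..)
  , SomeException
  , WhileHandling (..)
  , catch
  , someExceptionContext
  , throwIO
  , try
  )
import Control.Exception.Context (getExceptionAnnotations)

-- Hypothetical exception types.
data ParseError = ParseError deriving Show

instance Exception ParseError

data CleanupFailed = CleanupFailed deriving Show

instance Exception CleanupFailed

-- Throw CleanupFailed while handling ParseError, then list the exceptions
-- recorded in WhileHandling annotations on what finally escapes.
handledWhile :: IO [String]
handledWhile = do
  result <- try (throwIO ParseError `catch` \ParseError -> throwIO CleanupFailed)
  pure $ case result of
    Left (se :: SomeException) ->
      [ displayException original
      | WhileHandling original <- getExceptionAnnotations (someExceptionContext se)
      ]
    Right () -> []

main :: IO ()
main = handledWhile >>= print  -- ["ParseError"]
```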

⚠️ Caution: Rethrowing the same exception

An important combinator for dealing with exceptions is onException, which runs some specified action when an exception occurs (typically some resource cleanup) and then rethrows the exception again:

onException :: IO a -> IO b -> IO a
onException io what = io `catch` \e -> do
    _ <- what
    throwIO (e :: SomeException)

As written, this is suboptimal: for every layer of onException, we re-throw the exception stripped of its original annotations (due to throwIO and toException for SomeException), and with a new WhileHandling annotation containing the original exception (due to catch). This results in unnecessary noise: all the information is still there, but it’s buried. When we rethrow the same exception, there is no need for WhileHandling: we should just throw the original exception as-is.

To solve this, base now offers new functions specifically for catch-and-rethrow: catchNoPropagate², which is like the old catch, without the handler that adds the WhileHandling annotation; and rethrowIO, which avoids adding a backtrace (using NoBacktrace). Moreover, both of these explicitly preserve contexts (using ExceptionWithContext):

catchNoPropagate :: Exception e => IO a -> (ExceptionWithContext e -> IO a) -> IO a
catchNoPropagate (IO io) handler = IO $ PrimOp.catch# io handler'
  where
    handler' se =
      case fromException se of
        Just e' -> unIO (handler e')
        Nothing -> PrimOp.raiseIO# se

rethrowIO :: Exception e => ExceptionWithContext e -> IO a
rethrowIO e = throwIO (NoBacktrace e)

This then enables the following improved implementation of onException:

onException :: IO a -> IO b -> IO a
onException io what = io `catchNoPropagate` \e -> do
    _ <- what
    rethrowIO (e :: ExceptionWithContext SomeException)
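The same pair of combinators works for any catch-and-rethrow wrapper of your own. Here is a sketch, requiring GHC 9.12 or later and assuming catchNoPropagate and rethrowIO are exported from Control.Exception; loggingFailures and Tag are hypothetical names for this example. It checks that an annotation added below the wrapper is still at the top level of the exception after the rethrow.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

import Control.Exception
  ( ExceptionWithContext
  , SomeException
  , annotateIO
  , catchNoPropagate
  , rethrowIO
  , someExceptionContext
  , throwIO
  , try
  )
import Control.Exception.Annotation (ExceptionAnnotation (..))
import Control.Exception.Context (getExceptionAnnotations)

-- Hypothetical annotation type.
newtype Tag = Tag String

instance ExceptionAnnotation Tag where
  displayExceptionAnnotation (Tag t) = "tag: " ++ t

-- Log-and-rethrow wrapper in the style of the improved onException.
loggingFailures :: IO a -> IO a
loggingFailures action =
  action `catchNoPropagate` \(e :: ExceptionWithContext SomeException) -> do
    putStrLn "action failed; rethrowing"
    rethrowIO e

-- Tag annotations still attached after passing through the wrapper.
survivingTags :: IO [String]
survivingTags = do
  result <- try (loggingFailures (annotateIO (Tag "inner") (throwIO (userError "boom"))))
  pure $ case result of
    Left (se :: SomeException) ->
      map displayExceptionAnnotation
        (getExceptionAnnotations (someExceptionContext se) :: [Tag])
    Right () -> []

main :: IO ()
main = survivingTags >>= mapM_ putStrLn  -- prints "tag: inner" after the log line
```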

⚠️ Caution: Displaying exceptions

The final pitfall we need to discuss is displaying exceptions. Usually we call displayException to do so, but this does not show annotations. The idea is that displayException is meant to render an exception for users, not necessarily developers.³ Starting with GHC 9.14 there is a separate function displayExceptionWithInfo, but that is not available in GHC 9.12; moreover, even in GHC 9.14 I would advise against using it when you are debugging, as it only shows the top-level annotations, making things like WhileHandling much less useful.

Personally, I like to use my own custom exception handler which shows the full exception, and makes a few other improvements also: it makes the nesting structure clearer, and reorders annotations to improve readability; you can find an example implementation on GitHub.
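To give a flavour of what such a function involves, here is a much-simplified sketch. It is not the implementation linked above: it does no reordering, and the names renderFull, renderException, renderAnnotation and indent are made up for this example. It requires GHC 9.12 or later and assumes WhileHandling and someExceptionContext are exported from Control.Exception. The key step is to recurse into WhileHandling annotations rather than stopping at the top level.

```haskell
import Control.Exception
  ( Exception (..)
  , IOException
  , SomeException
  , WhileHandling (..)
  , catch
  , someExceptionContext
  , throwIO
  , try
  )
import Control.Exception.Annotation (ExceptionAnnotation (..), SomeExceptionAnnotation (..))
import Control.Exception.Context (getAllExceptionAnnotations)
import Data.Typeable (cast)

-- Render an exception together with all of its annotations, recursively.
renderFull :: SomeException -> String
renderFull = unlines . renderException 0

renderException :: Int -> SomeException -> [String]
renderException depth se =
  map (indent depth) (lines (displayException se))
    ++ concatMap
         (renderAnnotation (depth + 1))
         (getAllExceptionAnnotations (someExceptionContext se))

-- WhileHandling carries a nested exception, which we render in full;
-- every other annotation is rendered with its own display function.
renderAnnotation :: Int -> SomeExceptionAnnotation -> [String]
renderAnnotation depth (SomeExceptionAnnotation ann) =
  case cast ann of
    Just (WhileHandling inner) ->
      indent depth "WhileHandling" : renderException (depth + 1) inner
    Nothing ->
      map (indent depth) (lines (displayExceptionAnnotation ann))

indent :: Int -> String -> String
indent depth line = replicate (2 * depth) ' ' ++ line

main :: IO ()
main = do
  result <- try (throwIO (userError "first") `catch` secondFailure)
  case result of
    Left se -> putStr (renderFull se)
    Right () -> pure ()
  where
    secondFailure :: IOException -> IO ()
    secondFailure _ = throwIO (userError "second")
```

The output should show the second exception at the top, with the first one nested under a WhileHandling line, each followed by its own backtrace.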

GHC 9.10

If you cannot upgrade from GHC 9.10, unfortunately the exception annotation infrastructure has some important limitations. Upgrade if you can; if not, this section will explain what you need to be aware of.

Lost annotations

As we remarked when we discussed catch, the WhileHandling proposal only got implemented in GHC 9.12. In GHC 9.10 the definition of catch was still unchanged from its definition before the exception annotation proposal:

catch :: Exception e => IO a -> (e -> IO a) -> IO a
catch (IO io) handler = IO $ PrimOp.catch# io handler'
  where
    handler' se =
      case fromException se of
        Just e' -> unIO (handler e')
        Nothing -> PrimOp.raiseIO# se

However, the Exception instance for SomeException was already changed, so that toException clears the exception context. This means that if an exception with annotations is ever caught and rethrown anywhere, in a pattern such as

someAction `catch` \(e :: SomeException) -> throwIO e

those annotations will be lost. Similarly, since onException had not yet been changed either, any call to onException, and by implication bracket, anywhere in your callstack would also lose any annotations:

bracket :: IO a -> (a -> IO b) -> (a -> IO c) -> IO c
bracket before after thing =
    mask $ \restore -> do
      a <- before
      r <- restore (thing a) `onException` after a
      _ <- after a
      return r

onException :: IO a -> IO b -> IO a
onException io what = io `catch` \e -> do
    _ <- what
    throwIO (e :: SomeException)

In both cases, throwIO will insert a new backtrace, but that backtrace will point to where the exception was rethrown, not to where it was thrown originally. What’s worse, neither bracket nor onException have a HasCallStack constraint, so all we see in the callstack is the call to throwIO from onException itself.

Cost centre stacks do help a bit here (provided you enable profiling): at least you’ll get to see the full backtrace to the exception handler, and with a bit of luck even to the original call to throw, due to the semantics of cost centres in exception handlers. That won’t always be the case though (for example, in the case of asynchronous exceptions), and you won’t see any of the additional annotations that might have been added to the exception.

Duplicated annotations

The Exception instance for ExceptionWithContext in GHC 9.10 has an incorrect definition for toException:

instance Exception a => Exception (ExceptionWithContext a) where
  -- (..)

  -- implementation in GHC 9.10
  toException (ExceptionWithContext ctxt e) =
      let ?exceptionContext = ctxt in SomeException e

(We saw the correct definition above.)

This is wrong for two reasons:

It does not use toException of the underlying type (the a type parameter); in most cases this does not matter, because toException rarely does anything interesting. Even in the case of SomeException, where toException does something “interesting” (if perhaps ill-advised), to wit clear the exception context, that doesn’t matter here because we are overriding that context anyway. However, there might be types where toException genuinely does something important (even if I am not aware of any such cases currently).
In the specific case that a is SomeException, this will create a nested SomeException: SomeException (SomeException someOtherException) with two copies of the context (the annotations).

The second point here is more important: if we later have exception handlers that manipulate the exception context, they will manipulate the outer context but not the inner. Indeed, if that “manipulation” is “clear the context” (see previous section), we might end up in the somewhat bizarre situation where these two problems cancel out: if we have

someAction `catch` \(ExceptionWithContext ctxt (e :: SomeException)) ->
  throwIO $ ExceptionWithContext ctxt e

then this exception handler will duplicate the annotations, a later exception handler might lose the outermost annotations (previous section) but not the inner, and all of a sudden annotations that were lost mysteriously re-appear; see GHC ticket #27194.

Unfortunately, this is not a viable workaround for the lost annotation problem, as it changes the type of the exception nested in the (outer) SomeException from whatever it really should have been to (the inner) SomeException, which will break any exception handlers for that specific type.

GHC 10.0

The upcoming GHC 10.0 release makes a few improvements to the exception annotation infrastructure. The first important improvement is that exception handling in STM was lagging behind a bit; this will be rectified (#25365).

The other important fix is in onException. Consider again the definition we saw when we discussed rethrowing exceptions:

onException :: IO a -> IO b -> IO a
onException io what = io `catchNoPropagate` \e -> do
    _ <- what
    rethrowIO (e :: ExceptionWithContext SomeException)

We mentioned that catchNoPropagate does not install an exception handler that adds a WhileHandling annotation, because we are rethrowing the very same exception. However, if what throws an exception, that is no longer the case! The definition of onException is therefore modified to

onException io what = io `catchNoPropagate` \e -> do
    _ <- annotateIO (whileHandling e) what
    rethrowIO (e :: ExceptionWithContext SomeException)

See CLC Proposal #397 for details. As an example, consider what happens if the release callback of bracket itself throws an exception:

data ReleaseFailed = ReleaseFailed
  deriving stock (Show)
  deriving anyclass (Exception)

bottom :: HasCallStack => IO ()
bottom = annotateIO (MyAnnotation 123456789) $ throwIO MyException

middle :: HasCallStack => IO ()
middle = bracket (return ()) (\() -> throwIO ReleaseFailed) $ \() -> bottom

top :: HasCallStack => IO ()
top = middle

With the new definition of onException (and my custom exception display function, which is still needed), we get

demo-bracket-release-fail: Uncaught exception of type ReleaseFailed
  ReleaseFailed
  HasCallStack backtrace:
    throwIO, called at exe/DemoBracketReleaseFail.hs:42:38 in (..)
    middle, called at exe/DemoBracketReleaseFail.hs:46:7 in (..)
    top, called at exe/DemoBracketReleaseFail.hs:55:5 in (..)
  WhileHandling
    MyException
      MyException
      MyAnnotation 123456789
      HasCallStack backtrace:
        throwIO, called at exe/DemoBracketReleaseFail.hs:38:48 in (..)
        bottom, called at exe/DemoBracketReleaseFail.hs:42:70 in (..)
        middle, called at exe/DemoBracketReleaseFail.hs:46:7 in (..)
        top, called at exe/DemoBracketReleaseFail.hs:55:5 in (..)

Very nice!

Conclusions

Exception annotations can be invaluable when debugging difficult problems. While the initial implementation in GHC 9.10 had some important limitations, the situation has since been much improved. Provided you use GHC 9.12 or later, there are two things to pay attention to in your own code (these apply to 9.12, 9.14 and 10.0):

Define your own custom function to display exceptions, which shows all annotations, not just the top-level ones (or use mine).
Be cautious with throwing SomeException: toException for SomeException will clear the exception context, which is almost certainly not what you want. For catch-and-rethrow, use the combinators available specifically for that purpose.

That said, there are still a few minor shortcomings to be aware of:

GHC 9.12 and 9.14:
- Exception handling in STM has not yet been updated: throwSTM does not collect a backtrace, and catchSTM does not add any WhileHandling annotations (#25365).
- onException does not add any WhileHandling annotations; as a result, if the resource deallocation callback to bracket itself throws an exception, the original exception will be lost.
Both of these will be addressed in GHC 10.0.
exceptions-0.10.9: this is the version of exceptions that is bundled with GHC 9.12, but lags behind a bit. For example, the definition of generalBracket in exceptions-0.10.9 does not use any of the abstractions for rethrowing; this is fixed in exceptions-0.10.12. The impact is however limited: it merely means that there are some extraneous WhileHandling annotations, resulting in unnecessary noise.
Any catch-and-rethrow patterns implemented in other packages should not lose any annotations, provided that they use catch from base.

¹ We will ignore calls to withFrozenCallStack, which hide some internal functions from the HasCallStack backtrace. This makes the backtrace slightly more readable, but does not otherwise change anything. See CLC #387.
² Some versions of base distinguish between catchExceptionNoPropagate and catchNoPropagate, which differ only in some strictness annotations. Strictness can make a big difference, especially when IO actions are undefined rather than throwing an exception. However, this is its own can of worms, and outside the scope of this blog post. See CLC proposal #383 for some discussion.
³ In GHC 9.10, displayException did show annotations, but this got rolled back in 9.12; see CLC #285 for a detailed discussion.

Compatibility packages in 2026

2026-05-07T00:00:00Z

Posted on 2026-05-07 by Oleg Grenrus engineering

Seven years ago I wrote a post about compatibility packages. It is now highly outdated, so let us revisit the matter.

Recently there has been a small push towards a reinstallable base. While it's still far from being a thing, it made me remember that using impl(ghc >= 7.9)-like conditionals to guard against different base versions is semantically wrong.

Also recently there is increasing? interest in MicroHs. While I personally don't care about that compiler, I realized that I can make its users' experience at least slightly nicer while still somewhat ignoring MicroHs's existence.

An example

Luckily there is a solution, and it was around for a long time: automatic flags. Here is a complete example:

flag base-ge-4-16
  description: @base >=4.16@ (GHC-9.2)
  default:     True
  manual:      False

flag base-ge-4-17
  description: @base >=4.17@ (GHC-9.4)
  default:     True
  manual:      False

library
  ...
  build-depends:
      base    >=4.12.0.0 && <4.23

  if !flag(base-ge-4-16)
    build-depends: OneTuple >=0.4.2 && <0.5

  if !flag(base-ge-4-17)
    build-depends: data-array-byte >=0.1.0.1 && <0.2

  if flag(base-ge-4-16)
    build-depends: base >=4.16
  else
    build-depends: base <4.16

  if flag(base-ge-4-17)
    build-depends: base >=4.17
  else
    build-depends: base <4.17

First we declare the flags. I chose to use a naming scheme reminiscent of the condition: base-ge-4-17 for base >=4.17.

Then we make the flag selection deterministic:

  if flag(base-ge-4-17)
    build-depends: base >=4.17
  else
    build-depends: base <4.17

Because the base >=4.17 and base <4.17 conditions are disjoint, there is at most one valid flag assignment for any given install plan which includes base; and because base is a direct dependency, it is always in the install plan, so exactly one assignment is valid. This is why I call such a flag deterministic ¹.
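The determinism argument can be modelled in a few lines of Haskell. This is only a sketch of the reasoning, not how cabal's solver works: `baseConstraint` and `validAssignments` are hypothetical names, and the model considers a single flag and a single already-chosen base version.

```haskell
import Data.Version (Version, makeVersion)

-- The constraint that the if/else block contributes for each flag value.
baseConstraint :: Bool -> Version -> Bool
baseConstraint True  v = v >= makeVersion [4, 17]
baseConstraint False v = v <  makeVersion [4, 17]

-- Every value of the flag that is consistent with the base version
-- present in the install plan.
validAssignments :: Version -> [Bool]
validAssignments base =
  [flag | flag <- [True, False], baseConstraint flag base]
```

For any base version the two constraints are mutually exclusive and jointly exhaustive, so the list always has exactly one element: `validAssignments (makeVersion [4,18,2,1])` is `[True]` and `validAssignments (makeVersion [4,16,4,0])` is `[False]`.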

And finally we use the flag value to add a conditional dependency:

  if !flag(base-ge-4-17)
    build-depends: data-array-byte >=0.1.0.1 && <0.2

Previously I would have written

  if !impl(ghc >=9.4)
     build-depends: data-array-byte >=0.1.0.1 && <0.2

but as I mentioned in the introduction, that is semantically wrong. In this case the Data.Array.Byte module was introduced in base-4.17, which just happens to be the base version shipped with GHC-9.4. In the future there might not be a one-to-one correspondence between (major) GHC and base versions.

Moving to automatic flags removes the direct mention of GHC. This also (hopefully) helps MicroHs users: we don't need to edit

-  if !impl(ghc >=9.4)
+  if !impl(ghc >=9.4) && !impl(mhs)

as there is no direct mention of compilers. The library compatibility conditions are expressed using library version vocabulary.

Low-level tools for high level concept

It is worth mentioning that the three parts (defining the flag, making the flag selection deterministic, and using the flag value as a condition) are an indirect way to say something like

if !depends(base >=4.17)
  build-depends: data-array-byte >=0.1.0.1 && <0.2

In other words we use "low-level" tools to express a high level concept.

Maybe some future version of the .cabal format will include the high-level way directly. However, the low-level "desugaring" makes it impossible to scrutinize indirect dependencies in flag selection, e.g. here we do add a dependency on base

  if flag(base-ge-4-17)
    build-depends: base >=4.17
  else
    build-depends: base <4.17

Viewed from that perspective, if a construct like depends(base >=4.17) is added to the .cabal format, it should also add a constraint for the install plan to include base, though not necessarily add it as a direct dependency. That way the conditional will be deterministic. But such an implicit dependency might feel unnatural.

Conclusion

I have already rewritten impl(ghc) conditionals to use automatic flags in a few packages, and will continue to do that as I do other maintenance tasks.

It seems that OneTuple and data-array-byte are the only relevant compatibility packages at the moment (using GHC 9); there were a lot of compatibility packages in the last decade (tagged, nats, void, fail, semigroups, bifunctors, contravariant, bifunctor-classes-compat, type-equality, foldable1-classes-compat), but if you don't need to support very old bases & GHCs, you don't need to depend on them for their compatibility shims anymore.

The library part of the compatibility story is relatively good, even without a higher-level construct like if depends(lib >= x.y). However, the compatibility of language-level constructs is lacking. There is no way to ask in a .cabal file whether the compiler supports DeriveGeneric or TemplateHaskell. We can require these extensions, but we cannot ask whether they exist at all. Neither can we differentiate between different versions: is the compiler's ImpredicativeTypes "broken" or not, does LambdaCase include \cases, etc. Some part of me wishes MicroHs great success, so that those issues become more pressing and eventually get solved, in some other way than maintainers hardcoding compiler versions in the package definitions.

¹ In my opinion all automatic flags have to be deterministic. For example, having an automatic debug flag is IMO just wrong. There are also a few edge cases related to pkg-config, and I think it's a "bug" in the .cabal format that we cannot make pkg-config based library version selection deterministic.↩︎

A bidirectional typechecking puzzle

2026-05-05T00:00:00Z

Type inference challenges for real-world JSON

Jumping to errors in Evil

2026-05-04T06:16:00Z

Recently I realised that it'd be really nice if jumping to errors would store the previous location in the Evil jump list. These definitions do just that

(evil-define-motion mes/evil-goto-next-error (count)
  :jump t
  (unless (bound-and-true-p flymake-mode) (signal 'search-failed nil))
  (flymake-goto-next-error count))
(evil-define-motion mes/evil-goto-prev-error (count)
  :jump t
  (unless (bound-and-true-p flymake-mode) (signal 'search-failed nil))
  (flymake-goto-prev-error count))

and for now I've bound them to C-j and C-k (because that's what evil-collection does)

(general-def flymake-mode-map
  :states 'normal
  "C-j" 'mes/evil-goto-next-error
  "C-k" 'mes/evil-goto-prev-error)

This makes it easier to make a change, fix the errors caused by the change and then return to where I was.

Follow-up on switching to eglot

2026-05-02T11:20:00Z

Jan G sent me a two-part comment.

Part one

I was under the impression that when using elpaca you needed to disable use-package, and that when using elpaca-use-package, you were redefining the macro. I’m not 100% sure about this, but the documentation has an example of use-package and how it actually expands to an elpaca command.

I wouldn't know. All I can say is that it would be nice if package managers that hook into, or completely redefine, use-package would document whether they deviate from the behaviour of "vanilla use-package" in some way.

Part two

Given that, use-package’s documentation is always going to be a little off, since elpaca is doing everything async. The only way I’ve found to reliably manage some dependencies is to use the elpaca-after-init hook, so they don’t even try to run until elpaca is finished loading everything.

I'd say it sometimes seems like the documentation for use-package is a little off for use-package itself 🙂

The README for Elpaca says that

Add configuration which relies on after-init-hook, emacs-startup-hook, etc to elpaca-after-init-hook so it runs after Elpaca has activated all queued packages.

but that seems like a very big hammer and as I understand it I'd have to move the whole :init block for python-mode into the hook in that case. Playing around with the various blocks for use-package isn't too time consuming and I think it's a good first thing to try.

Secrets when connecting to DBs

2026-05-02T10:41:00Z

I should have dealt with comments I got to my posts on how I deal with secrets in my work notes, here, and here. Better late than never though, I hope.

Comment from Stefano R

The first one is a link to a post titled How I use :dbconnection in org files. It describes a nice way of setting sql-connection-alist based on the contents of a file, in his case ~/.pgpass.

Comment from Harald J

The other starts with a function for searching ~/.authinfo.gpg for entries of the form

machine SERVER/DATABASE login USER password PASSWORD port PORT

and then setting sql-password-search-wallet-function and sql-password-wallet to tell sql-mode to use it

(defun my/sql-auth-source-search-wallet (wallet product user server database port)
  "Read auth source WALLET to locate the USER secret.
Sets `auth-sources' to WALLET and uses `auth-source-search' to locate the entry.
The DATABASE and SERVER are concatenated with a slash between them as the
host key."
  (when-let (results (auth-source-search :host (concat server "/" database)
                                         :user user
                                         :port (number-to-string port)))
    (when (and (= (length results) 1)
               (plist-member (car results) :secret))
      (plist-get (car results) :secret))))

(setq sql-password-search-wallet-function #'my/sql-auth-source-search-wallet)
(setq sql-password-wallet "~/.authinfo.gpg")

The value for sql-connection-alist is then as normal

(setq sql-connection-alist
  '((some-dbname (sql-product 'oracle)
                 (sql-port 1521)
                 (sql-server ...)
                 ...)))

and the blocks in orgmode look like this

#+BEGIN_SRC sql-mode :product oracle :dbconnection i3v1e-ro :results raw
SELECT to_char(sysdate, 'YYYY-MM-DD HH24:ii:ss') AS today,
       to_char(sysdate + 1, 'YYYY-MM-DD HH24:ii:ss') AS tomorrow
FROM dual;
#+END_SRC

Thoughts

Both of these feel closer to the intent of sql-mode in a way. I'll have to try using sql-connection-alist at some point.

The Bombadil Terminal Experiment

2026-04-29T22:00:00Z

Last week at Bug Bash 2026, I had a bunch of interesting discussions about testing non-web interfaces with Bombadil, our new property-based testing framework for user interfaces. One direction that I already wanted to explore is terminal user interfaces (TUIs), and the hallway discussions gave me a nudge to get going. I started hacking on the flight back home, and a few days later that embryo of a TUI fuzzer started to emerge.

The fuzzer in action, finding a bug in vitetris. (CW: flashing!)

It’s built on top of two key crates:

portable-pty, a pseudo-teletype in Rust that runs the program under test, and
libghostty-vt, a Rust wrapper around the Zig library, which interprets the output of the PTY and provides a virtual terminal API from which you can read cell contents, styles, scroll through the scrollback, etc.

With these two in place, I built a very basic fuzzer for TUIs: it runs the command you give it, polls its output, and writes interleaved random input sequences (printable ASCII characters and ANSI escape sequences). It also scrolls and resizes the terminal occasionally. Timing is a bit tricky, but the current approach seems to work fine: poll reads until the terminal is idle, capture the state, then apply new inputs. Regarding speed, it depends a lot on the program being tested, but it looks capable of capturing at least 300 states per second.

I tried finding some basic TUI programs and terminal games to test. Much to my surprise, within the first few days I had found four seemingly real bugs in real software:

vitetris has a bug where if you enter just a number in the host name (e.g. 6) and try to connect to a remote game, the UI freezes.
btop has two different bugs, one recently fixed that I confirmed fixed with the latest version (1.4.6), and one that I just reported. Both were triggered by this fuzzer.
rlwrap got into a segfault which I haven’t yet been able to troubleshoot.

Pretty cool. Today, I merged this work to main in Bombadil. It’s not yet released, but if you’re curious you can try it already by downloading a bombadil-terminal binary from the CI artifacts. On macOS you’ll need to remove the quarantine bit to bypass GateKeeper.

Now, the work remains to make this a solid tool. Here are some future goals:

Integrate it with the specification framework in Bombadil, so that you can define custom properties and action generators. It’d be neat to provide an API akin to querySelector that could parse and traverse panels drawn with box-drawing characters. You probably also want to validate that those borders line up correctly.
Generate a lot more diverse input and terminal actions. For instance, generate sequences from the Kitty keyboard protocol.
Make the test runner’s user interface better. Perhaps a TUI?!
Make this part of the ordinary bombadil binary, I think. There could be subcommands for browser and terminal testing tools.
Run it in Antithesis to see what that fuzzer can find.

All right, short post today — I just wanted to share my excitement and early results.

A huge thanks to Uzair Aftab, maintainer of libghostty-rs, for helping me get libghostty-vt building under Nix!

Tries for Polynomials

2026-04-28T00:00:00Z

Posted on April 28, 2026

Tags: Haskell

One of my favourite Haskell papers is McIlroy’s wonderful “Power Series, Power Serious” (1999). The paper is about power series, which are a type of infinite sums that behave like (infinite) polynomials. For example, $\cos$ can be represented by the following power series:

$\cos(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \frac{x^8}{8!} - \frac{x^{10}}{10!} + \cdots$

A power series is characterised fully by its coefficients, meaning that we can represent one as an infinite stream of rational numbers. In Haskell, we often use lazy lists to represent streams, so we can encode a power series with the following type:

type PowerSeries = [Rational]

In this encoding, we can write $\cos$ as the following:

cos :: PowerSeries
cos = zipWith (*) (cycle [1,0,-1,0]) (scanl (/) 1 [1..])

>>> cos
[1,0,-1/2,0,1/24,0,-1/720,...

We can also build $\sin$ :

sin = zipWith (*) (cycle [0,1,0,-1]) (scanl (/) 1 [1..])

While it can be difficult and unintuitive to work with infinite series like the ones above, happily we can define all of the normal numeric operations on power series as (lazy) list-manipulation programs:

instance Num PowerSeries where

  (x:xs) + (y:ys) = (x+y) : (xs + ys)

  (x:xs) * ys = map (x*) ys + (0 : xs * ys)

  negate = map negate

  fromInteger n = fromInteger n : repeat 0

(if you try and put this code into a Haskell interpreter you’ll get all sorts of warnings; I’ll put the full code for this post below with all of the imports and pragmas you need to get it to work)
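As a quick sanity check of the lazy multiplication, here is a sketch that avoids the instance (and its pragmas) by using plain functions. The names `addPS`, `mulPS`, `cosPS`, `sinPS` and `pythagoras` are made up for this example; the recurrences are the ones from the instance above, with both streams assumed to be infinite.

```haskell
type PowerSeries = [Rational]

-- Pointwise addition; both streams are assumed to be infinite.
addPS :: PowerSeries -> PowerSeries -> PowerSeries
addPS = zipWith (+)

-- The same recurrence as (*) in the instance, written as a plain function.
-- The second summand starts with a literal 0, which keeps the definition
-- productive on infinite input.
mulPS :: PowerSeries -> PowerSeries -> PowerSeries
mulPS (x:xs) ys = map (x *) ys `addPS` (0 : mulPS xs ys)
mulPS []     _  = []

cosPS, sinPS :: PowerSeries
cosPS = zipWith (*) (cycle [1, 0, -1, 0]) (scanl (/) 1 [1 ..])
sinPS = zipWith (*) (cycle [0, 1, 0, -1]) (scanl (/) 1 [1 ..])

-- The first n coefficients of sin^2 + cos^2.
pythagoras :: Int -> [Rational]
pythagoras n = take n (mulPS sinPS sinPS `addPS` mulPS cosPS cosPS)
```

Since the coefficients are exact rationals, `pythagoras 6` is a one followed by five zeroes, which is the series for the constant 1.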

McIlroy (1999) goes through the various algorithms and numeric operations that can be implemented on this representation, but at this point I would like to diverge from the paper and turn our focus to finite polynomials. Like a power series, a finite polynomial can be represented by a list of coefficients:

type Polynomial = [Rational]

And, even though the underlying list is finite rather than infinite, the numeric operations work basically the same way as they do on power series. We just need to add clauses in each function to handle the empty list:

instance Num Polynomial where

  []     + ys     = ys
  xs     + []     = xs
  (x:xs) + (y:ys) = (x+y) : (xs + ys)

  []     * _  = []
  (x:xs) * ys = map (x*) ys + (0 : xs * ys)

  negate = map negate

  fromInteger n = [fromInteger n]

The only semantic trickiness with this representation is that it’s important to quotient by trailing zeroes.

instance Eq Polynomial where
  [] == ys = all (0==) ys
  xs == [] = all (0==) xs
  (x:xs) == (y:ys) = (x == y) && (xs == ys)

Evaluation and Horner’s Rule

The definition of a power series above suggests that we should implement evaluation using exponentiation and indices:

eval :: Polynomial -> Rational -> Rational
eval p x = sum (zipWith (\a i -> a * x^i) p [0..])

And this does in fact give us the correct answer. Consider the polynomial $4 + 2x + 5x^2 - x^3$ :

poly = [4,2,5,-1] -- 4 + 2x + 5x² - x³
eval poly x = eval [4,2,5,-1] x
            = sum (zipWith (\a i -> a * x ^ i) [4,2,5,-1] [0..])
            = 4*x^0 + 2*x^1 + 5*x^2 + (-1)*x^3
            = 4 + 2*x + 5*x^2 - x^3

However, this evaluation algorithm is unsatisfactory in one respect: it performs a lot of multiplication. In numeric programs, we generally want to minimise the number of multiplications performed, since multiplication is a relatively expensive operation (when compared to addition or subtraction). In the example above, it takes six multiplications to compute the result: one for $2x = 2 \times x$ , two for $5x^2 = 5 \times x \times x$ , and three for $-x^3 = -1 \times x \times x \times x$ . In general, for a polynomial of degree $n$ , the above implementation of eval will perform $\mathcal{O}(n^2)$ multiplications.

There is, however, a trick that can bring the number of multiplications down to $\mathcal{O}(n)$ : Horner’s rule. The basic idea is to rewrite the expanded polynomial $4 + 2x + 5x^2 - x^3$ into a factorised form: $4 + x(2 + x(5 + x(-1)))$ . If we evaluate this expression directly, we will only have to perform three multiplications (and we don’t even have to perform any extra additions as compensation). While Horner’s rule is really quite a simple trick, the generalised pattern is surprisingly powerful (Gibbons 2011). Indeed, the representation I develop in this post is basically a data structure encoding of Horner’s rule.

Before getting there, however, let’s return to our list-based polynomial, and look at using Horner’s rule to implement eval. Interestingly, the list-based representation has kind of already performed our factorisation for us. As a result, Horner’s rule evaluation is actually more natural to implement than the expanded version above.

eval :: Polynomial -> Rational -> Rational
eval xs x = foldr (\a p -> a + x * p) 0 xs
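The two evaluators can be put side by side to confirm that they agree. In this sketch they are renamed `evalNaive` and `evalHorner` (names chosen here so that both fit in one module), and `example` is the polynomial used in the text.

```haskell
-- Naive evaluation: computes a separate power of x for every coefficient.
evalNaive :: [Rational] -> Rational -> Rational
evalNaive p x = sum (zipWith (\a i -> a * x ^ i) p [0 :: Int ..])

-- Horner's rule: one multiplication per coefficient.
evalHorner :: [Rational] -> Rational -> Rational
evalHorner p x = foldr (\a acc -> a + x * acc) 0 p

-- The polynomial 4 + 2x + 5x^2 - x^3 from the text.
example :: [Rational]
example = [4, 2, 5, -1]
```

At $x = 2$ both give $4 + 4 + 20 - 8 = 20$; in the Horner version the fold computes $-1$, then $5 + 2 \cdot (-1) = 3$, then $2 + 2 \cdot 3 = 8$, then $4 + 2 \cdot 8 = 20$.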

Multiple Variables

A cool trick with this representation is that if you want to support multiple variables you can smuggle them in through the coefficients. A polynomial in two variables is the same as a polynomial with coefficients drawn from another polynomial.

type TwoVar = [Polynomial]

To save us having to write a separate Num instance for TwoVar, we can instead generalise the Num instance on Polynomial above:

instance Num a => Num [a] where

The rest of the instance is the same. Now, we can write 5 ^ 2 :: Polynomial or 6 :: TwoVar and it will just work.

We also have to generalise the type of eval slightly:

eval :: Num a => [a] -> a -> a

but again, the implementation remains the same.

With this machinery, we can now write and evaluate polynomials in 2 variables:

eval2 :: TwoVar -> Rational -> Rational -> Rational
eval2 p x y = eval (eval p [x]) y

var :: Num a => [a]
var = [0,1]

x = var
y = [var]

poly = 2 * x ^ 2 - y ^ 3 + 4

>>> poly
[[4,0,0,-1],[0],[2]]

>>> eval2 poly 2 3
-15

We can even use some typeclass shenanigans to build a generalised evaluator that works with any fixed number of variables.

Implementation of an Evaluator for Polynomials in Arbitrary Variables

instance Num n => Num (e -> n) where
  fromInteger = const . fromInteger
  (f + g) x = f x + g x
  (f * g) x = f x * g x

  abs = (abs .)
  signum = (signum .)
  negate = (negate .)

class Num r => Poly p r | p -> r, r -> p where
  evalN :: p -> r

instance Poly Integer Integer where
  evalN = id

instance Poly p r => Poly [p] (Integer -> r) where
  evalN xs x = foldr (\a s -> evalN a + fromInteger x * s) 0 xs

>>> evalN poly 2 3
-15

z = [[var]]

>>> evalN (poly + z) 2 3 1
-14

Sums of Products

While the above representation is elegant, it is inefficient, and perhaps a little unintuitive. In most implementations I have seen, variables are represented simply with a type for names, rather than the kind of implicit de Bruijn indices used above. One natural representation uses a list of terms:

newtype Poly v c = Poly { terms :: [([v], c)] }

Here, a value of type Poly v c is a polynomial with coefficients drawn from c and variables from v. It is a list of monomials, where the outer list represents a sum, and each monomial represents a product of variables with a single coefficient.

data Var = X | Y | Z

poly :: Poly Var Integer
poly = Poly [([X,Y,Y],5),([Z],3),([Y,Z],2)]
-- 5xy² + 3z + 2yz

Note that this representation requires some normalisation:

norm :: (Ord v, Num c, Eq c) => [([v],c)] -> [([v],c)]
norm = Map.toList . Map.filter (/= 0) . Map.fromListWith (+)

instance (Num c, Ord v, Eq c) => Eq (Poly v c) where
  (==) = (==) `on` norm . terms

And we have the following Num instance:

instance Num c => Num (Poly v c) where
  fromInteger n = Poly [([],fromInteger n)]
  Poly xs + Poly ys = Poly (xs ++ ys)
  xs * ys = Poly [ (xv ++ yv, xc * yc) | (xv,xc) <- terms xs, (yv,yc) <- terms ys ]
  negate = Poly . map (fmap negate) . terms

This representation perhaps maps more closely to the description of multivariate polynomials that many of us will have encountered in secondary school: it’s straightforward to see how a polynomial like $2xy + y^2 - 3$ corresponds to the value Poly [([X,Y],2),([Y,Y],1),([],-3)]. The previous representation (TwoVar) would represent the same expression as the enigmatic [[-3,0,1],[0,2]].

However, there are some wrinkles to this type that are worth noting. First we can see that multiplication is not commutative (even after normalisation).

x = Poly [([X],1)]
y = Poly [([Y],1)]

x * y == Poly [([X,Y],1)]
y * x == Poly [([Y,X],1)]
x * y /= y * x

This is in contrast to TwoVar, where both $xy$ and $yx$ would be represented as [[0,0],[0,1]].

Conceptually, polynomials are a kind of free structure: they represent the normalised and quotiented syntax of an algebraic theory. The fact that Poly above doesn’t have commutative multiplication just tells us that the underlying algebraic theory in question here is noncommutative rings, rather than commutative rings.

The second thing to note about this type is actually two related observations about inefficiency. Because I didn’t implement normalisation on any of the numeric operations, we might expect the size of the underlying list of Poly to blow up:

poly = (1 + x) * (3 + 4) * (y + 2)

>>> terms poly
[([Y],3),([],6),([Y],4),([],8),([X,Y],3),([X],6),([X,Y],4),([X],8)]

>>> norm (terms poly)
[([],14),([X],14),([X,Y],7),([Y],7)]

And indeed it does, as you can see above. To counteract this, we can represent our polynomial as a mapping from monics (strings of variables) to coefficients:

newtype Poly v c = Poly { terms :: Map [v] c }

Num instance for Map-based polynomial

instance (Ord v, Num c) => Num (Poly v c) where
  fromInteger n = Poly (Map.singleton [] (fromInteger n))
  Poly xs + Poly ys = Poly (Map.unionWith (+) xs ys)
  xs * ys = Poly (Map.fromListWith (+) [ (xv ++ yv, xc * yc)
                                       | (xv,xc) <- Map.toList (terms xs)
                                       , (yv,yc) <- Map.toList (terms ys) ])
  negate = Poly . fmap negate . terms

While this new representation is an improvement over the un-normalised list, it’s still not really “efficient”. In particular, we’re using lists as keys in the map; Haskell’s Map is a binary search tree (though this caveat applies to most mapping structures), so search is always going to have to perform comparisons on the keys. When those keys are lists, that comparison takes time proportional to the length of each list. This is wasted effort that could be cached with a cleverer data structure.

This also brings the second observation about inefficiency into focus: we have lost our neat evaluation with Horner’s rule.

eval :: Num c => Poly v c -> (v -> c) -> c
eval (Poly mp) v = Map.foldrWithKey (\vs c s -> foldr ((*) . v) c vs + s) 0 mp

We’re back to performing $n$ multiplications per term.

Both of these inefficiencies are actually the same pattern, and can be solved with a general form of Horner’s rule. We need to cache prefixes: the data structure that does that best is a trie.

A Trie

Horner’s rule saved us from performing redundant multiplications by factoring out common terms to the left. That was simple to implement in the single-variable case, but it can still apply for multiple variables. Take an expression like $(2 + 3x - 5y) ^ 2$ , and multiply it out to $4 + 12x + 9x^2 - 15xy - 20y - 15yx + 25y^2$ . We can still factor this expression to remove common prefixes, like so:

$4 + x(12 + 9x - 15y) + y(-20 - 15x + 25y)$

The difference between this factorisation and the list-based polynomial we started with is that the tree representing the list-based polynomial only had one child at each node. Here, we have a child for each leading term. In terms of the data structure, a list has a single tail in the cons case:

data List a = Nil
            | Cons a (List a)

The multivariate version of the same thing will be a tree

data Tree a = Nil
            | Cons a [Tree a]

Or, more specifically, a trie, where the subtree mapping is based on variables.

data Poly v c = c :<+ Map v (Poly v c)

A polynomial is a constant coefficient c plus the sum of variables drawn from v each multiplied by another polynomial. The polynomial above is represented with this type as the following:

4 :<+ {(X,12 :<+ {(X,9 :<+ {}),(Y,(-15) :<+ {})}),(Y,(-20) :<+ {(X,(-15) :<+ {}),(Y,25 :<+ {})})}

This trie type (with some improvements I’ll describe below) is the focus of this post; I think it’s a cool data structure for representing polynomials.

The numeric functions on Tries

Let’s first write evaluation:

eval :: Num c => (v -> c) -> Poly v c -> c
eval f (c :<+ vs) = c + Map.foldrWithKey (\v p s -> f v * eval f p + s) 0 vs

Notice that we have retrieved Horner’s rule: the evaluation of each term only performs a single multiplication; we don’t have to repeat multiplications for terms that share prefixes any more.
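To see the evaluator at work, here is a self-contained sketch that builds the trie for $(2 + 3x - 5y)^2$ by hand and evaluates it. The helpers `leaf` and `node`, and the names `square` and `assignment`, are hypothetical and only exist to keep the example short; `eval` is the definition from above.

```haskell
import qualified Data.Map as Map
import Data.Map (Map)

data Poly v c = c :<+ Map v (Poly v c)

data Var = X | Y deriving (Eq, Ord, Show)

eval :: Num c => (v -> c) -> Poly v c -> c
eval f (c :<+ vs) = c + Map.foldrWithKey (\v p s -> f v * eval f p + s) 0 vs

-- A constant polynomial (hypothetical helper).
leaf :: c -> Poly v c
leaf c = c :<+ Map.empty

-- A constant plus one sub-polynomial per leading variable (hypothetical helper).
node :: Ord v => c -> [(v, Poly v c)] -> Poly v c
node c kids = c :<+ Map.fromList kids

-- 4 + x(12 + 9x - 15y) + y(-20 - 15x + 25y)
square :: Poly Var Integer
square =
  node 4
    [ (X, node 12    [(X, leaf 9),     (Y, leaf (-15))])
    , (Y, node (-20) [(X, leaf (-15)), (Y, leaf 25)])
    ]

assignment :: Var -> Integer
assignment X = 1
assignment Y = 2
```

Here `eval assignment square` is $4 + 1 \cdot (12 + 9 - 30) + 2 \cdot (-20 - 15 + 50) = 25$, which matches $(2 + 3 - 10)^2$.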

(for those concerned with performance, it might be worth swapping out foldrWithKey with a strict variant. (also, this is somewhat unrelated but a bit of a pet peeve of mine: this is not a place where foldl' is the best option! foldl' is not a panacea!))

The numeric operations on this data structure can be implemented as follows:

deriving instance Functor (Poly v)

instance (Ord v, Num c, Eq c) => Num (Poly v c) where
  fromInteger n = fromInteger n :<+ Map.empty
  (n :<+ ns) + (m :<+ ms) = (n + m) :<+ Map.unionWith (+) ns ms
  (n :<+ ns) * ms = fmap (n*) ms + (0 :<+ fmap (*ms) ns)
  negate = fmap negate

It’s worth taking a moment to note how efficient these operations are (for a pointer-ridden high-level language like Haskell, that is). We don’t have to compare any strings; we can use Data.Map’s efficient unionWith on single variables; and multiplication doesn’t have to expand out any Cartesian product.

I will note that we do have to perform a little bit of normalisation for the derived Eq instance to be correct: we have to remove terms that multiply to zeros. Pruning dead branches like this is a pretty standard procedure on tries; in polynomial terms, that just means we have to get rid of entries in the map that evaluate to zero (so $x(2 + y) + y(0)$ should be pruned to $x(2 + y)$ ). This can be done without really changing the efficiency of the operations above, but it does make them slightly more verbose.

Normalising Num instance

For this version, we will rely on the extremely efficient custom merge operations in containers.

0 <+? ns | Map.null ns = Nothing
n <+? ns = Just (n :<+ ns)

instance (Ord v, Num c, Eq c) => Num (Poly v c) where
  fromInteger n = fromInteger n :<+ Map.empty

  a + b = fromMaybe 0 (add a b)
    where
      add (n :<+ ns) (m :<+ ms) =
        (n + m) <+?
          Map.merge
            Map.preserveMissing
            Map.preserveMissing
            (Map.zipWithMaybeMatched (const add))
            ns ms

  _ * (0 :<+ ms) | Map.null ms = 0 :<+ Map.empty
  (0 :<+ ns) * ms = 0 :<+ fmap (*ms) ns
  (n :<+ ns) * ms = fmap (n*) ms + (0 :<+ fmap (*ms) ns)


  negate = fmap negate
  abs = fmap abs
  signum (n :<+ _) = signum n :<+ Map.empty

Anyways, when we have all of the above instances, we can manipulate polynomials using the API you might expect, and the normalisation behaviour happens automatically.

data Var = X | Y deriving (Eq, Ord, Show)

var :: Num c => v -> Poly v c
var v = 0 :<+ Map.singleton v (1 :<+ Map.empty)

x,y :: Poly Var Integer
x = var X
y = var Y

poly = (2 + 3 * x - 5 * y) ^ 2
>>> poly
4 + Y*(-20 + Y*25 + X*(-15)) + X*(12 + Y*(-15) + X*9)

Lenses and Division

Lenses in Haskell are very cool, and personally I think one of the best demonstrations of their power is tries. A few years ago, when I was still on Twitter, I posted an implementation of a trie that fit in a tweet (gist link).

Tweet Trie

{-# LANGUAGE RankNTypes #-}

import Control.Comonad.Cofree
import Control.Lens hiding ((:<))
import qualified Data.Map as Map
import Data.Map (Map)
import Prelude hiding (lookup)
import Data.Maybe (isJust)
import Test.QuickCheck

type Trie a b = Cofree (Map a) (Maybe b)

string :: Ord a => [a] -> Lens' (Trie a b) (Maybe b)
string =
 foldr
   (\x r -> _unwrap . at x . anon (Nothing :< mempty)
                                  (\(v :< m) -> null v && null m) . r)
   _extract


insert :: Ord a => [a] -> b -> Trie a b -> Trie a b
insert xs x = string xs .~ Just x

lookup :: Ord a => [a] -> Trie a b -> Maybe b
lookup = view . string

delete :: Ord a => [a] -> Trie a b -> Trie a b
delete xs = string xs .~ Nothing

Lenses are what allowed this very terse implementation. The original purpose of lenses was to facilitate deep access in nested records and data structures: a trie is effectively a nested map, so it’s no great surprise that lenses are a good fit.

It turns out that lenses are also useful for manipulating polynomial tries. At first, it might be difficult to see why: in the trie implementation above, a lens was used to build getters and setters for a mapping from strings to payloads. But what does that translate to in the context of a polynomial? What does it mean to “look up” a string of variables in some expression like $2x^2 + y$ ?

It turns out that lookup corresponds to division. For example, dividing the polynomial $2x^2 + y$ by the monic $xx$ gives us a quotient $2$ and remainder $y$ .

>>> divMod (2 * x ^ 2 + y) [X,X]
(2, y)

This is already quite similar to a lens: before the van Laarhoven encoding, lenses were usually thought of as functions that took a data structure and returned a pair of the “focus” of the lens and the “rest” of the structure. In polynomial terms, that “focus” is the quotient, and the “rest” is the remainder.

But that’s a little vague. Let’s construct the actual lenses here, in the van Laarhoven style:

constant :: Lens' (Poly v c) c
constant f (c :<+ vs) = fmap (:<+ vs) (f c)

vars :: Lens (Poly v c) (Poly v' c) (Map v (Poly v c)) (Map v' (Poly v' c))
vars f (c :<+ vs) = fmap (c :<+) (f vs)

isZero :: (Num c, Eq c) => Poly v c -> Bool
isZero (n :<+ ns) = (0 == n) && Map.null ns

factored :: (Ord v, Num c, Eq c) => [v] -> Lens' (Poly v c) (Poly v c)
factored = foldr (\v vs -> vars . at v . anon 0 isZero . vs) id

This last lens does indeed give us an interface that looks like division:

>>> view (factored [X,X]) (2*x^2 + y)
2

>>> set (factored [X,X]) 0 (2*x^2 + y)
Y

If we want to define an actual division function, we can define it in terms of factored, in a fun example of the kind of golfy code that lens enables.

divMod :: (Ord v, Num c, Eq c) => Poly v c -> [v] -> (Poly v c,Poly v c)
divMod p vs = factored vs (,0) p

>>> (2*x^2 + y) `divMod` [X,X]
(2,Y)
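For readers without lens installed, the same split can be sketched with plain Data.Map operations. This is not the implementation above, just an unfolding of what `factored vs (,0)` does under the assumption that missing branches count as zero and zero branches are pruned; `divModMonic` and `zero` are hypothetical names.

```haskell
import qualified Data.Map as Map
import Data.Map (Map)

data Poly v c = c :<+ Map v (Poly v c)
  deriving (Eq, Show)

data Var = X | Y deriving (Eq, Ord, Show)

zero :: Num c => Poly v c
zero = 0 :<+ Map.empty

isZero :: (Num c, Eq c) => Poly v c -> Bool
isZero (n :<+ ns) = n == 0 && Map.null ns

-- Split a polynomial into (quotient, remainder) by a prefix of variables.
divModMonic :: (Ord v, Num c, Eq c) => Poly v c -> [v] -> (Poly v c, Poly v c)
divModMonic p [] = (p, zero)          -- empty prefix: everything is quotient
divModMonic (c :<+ vs) (v:rest) =
  case Map.lookup v vs of
    Nothing  -> (zero, c :<+ vs)      -- no term starts with v
    Just sub ->
      let (q, r) = divModMonic sub rest
          -- Put the remainder back, pruning the branch if it became zero.
          vs' | isZero r  = Map.delete v vs
              | otherwise = Map.insert v r vs
      in (q, c :<+ vs')

-- 2x^2 + y, built by hand.
example :: Poly Var Integer
example =
  0 :<+ Map.fromList
    [ (X, 0 :<+ Map.singleton X (2 :<+ Map.empty))
    , (Y, 1 :<+ Map.empty)
    ]
```

Dividing `example` by `[X,X]` walks down the two `X` branches, returns the constant 2 as the quotient, and prunes the now-empty `X` branch on the way back up, leaving only the `Y` term as the remainder.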

Gröbner Bases

While the interface above lets us do some basic computer algebra, to do any serious work with polynomials we will have to at some point compute Gröbner bases. A Gröbner basis is… somewhat hard to define, actually. I’ll quote an explainer on the topic by Sturmfels (2005):

A Gröbner basis is a set of multivariate polynomials that has desirable algorithmic properties

Basically, in several algorithms over polynomials (division, Gaussian elimination, etc.) it becomes necessary at some point to compute this thing called a Gröbner Basis.

There is a lot of published literature on computing Gröbner bases in different settings. However, the trie polynomial I have built above is fundamentally noncommutative, and the literature on computing Gröbner bases for noncommutative rings is comparatively smaller. I have been following Xiu’s thesis (2012) for this project. It outlines a noncommutative version of Buchberger’s algorithm, and a few optimisations that I was able to implement.

One slightly annoying aspect of these algorithms is that they tend to use monomials as a primitive. In other words, instead of working with the polynomial directly, the algorithms tend to describe operations with the assumption that your representation is basically a list of monomials. In particular, the algorithms will frequently extract the “leading” monomial, and it becomes important for performance that the polynomial representation can provide that leading monomial quickly. Unfortunately, extraction of the leading monomial is slightly awkward on the trie representation (or certainly less natural than the implementation on a listed representation); so we will need to do some work to implement it.

Monomial Orderings

The first important concept to implement for Gröbner bases is an admissible monomial ordering. This is a total order on strings of variables that is “admissible”; meaning that it respects concatenation on both sides, and it also is a well-ordering, meaning that any strictly descending chain is finite.

$a < b \implies a \bullet c < b \bullet c$

$a < b \implies c \bullet a < c \bullet b$

These constraints rule out the usual lexicographic ordering on strings: under it, b > ab > aab > … is a strictly descending chain that never ends, so it is not a well-ordering. Instead, we’ll go with graded lexicographic. This means we first compare strings for length, and only in the case where they’re equal do we move to the normal lexicographic comparison.

grlex :: Ord a => [a] -> [a] -> Ordering
grlex xs ys
  | length xs < length ys = LT
  | length xs > length ys = GT
  | otherwise = compare xs ys

We can improve the efficiency of the above function somewhat by using one of my favourite monoids: the monoid instance on Ordering.

grlex :: Ord a => [a] -> [a] -> Ordering
grlex = go EQ
  where
    go !a []     []     = a
    go !a []     (_:_)  = LT
    go !a (_:_)  []     = GT
    go !a (x:xs) (y:ys) = go (a <> compare x y) xs ys

This version performs just one pass through each list, and does the correct comparison without additionally calculating the length. It’s also nonstrict: if one of the lists passed is infinite, this comparison will still terminate.
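As a quick sanity check of the two admissibility conditions, here is a minimal sketch that tests them on a handful of sample strings and shows the descending chain that plain lexicographic order allows. The helpers `lexChain`, `strictlyDescending` and `respectsConcat` are hypothetical names introduced only for this illustration; `grlex` is the single-pass version from above.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Single-pass graded lexicographic comparison, as defined in the text.
grlex :: Ord a => [a] -> [a] -> Ordering
grlex = go EQ
  where
    go !a []     []     = a
    go _  []     (_:_)  = LT
    go _  (_:_)  []     = GT
    go !a (x:xs) (y:ys) = go (a <> compare x y) xs ys

-- Hypothetical helper: the first n elements of the chain "b", "ab", "aab", ...
lexChain :: Int -> [String]
lexChain n = [replicate k 'a' ++ "b" | k <- [0 .. n - 1]]

-- Hypothetical helper: is every element strictly greater than its successor?
strictlyDescending :: (a -> a -> Ordering) -> [a] -> Bool
strictlyDescending cmp xs =
  and (zipWith (\x y -> cmp x y == GT) xs (drop 1 xs))

-- Hypothetical helper: on the given sample, does a < b imply
-- a ++ c < b ++ c and c ++ a < c ++ b under grlex?
respectsConcat :: [String] -> Bool
respectsConcat ws = and
  [ grlex (a ++ c) (b ++ c) == LT && grlex (c ++ a) (c ++ b) == LT
  | a <- ws, b <- ws, grlex a b == LT, c <- ws
  ]
```

In GHCi, `strictlyDescending compare (lexChain 5)` is `True`, since the chain keeps descending under the derived lexicographic order, while `strictlyDescending grlex (lexChain 5)` is `False`, because grlex sorts the same strings by length first. A finite sample proves nothing in general, of course; it is only a check that the definitions behave as described.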

Another admissible order we could use is reverse grlex, which basically amounts to reversing the lists before the comparison. The trie structure means that we’re basically forced to use grlex, but I will include an implementation of grevlex here because I think it’s cute.

Implementations of grevlex

grevlex :: Ord a => [a] -> [a] -> Ordering
grevlex []     []     = EQ
grevlex (_:_)  []     = GT
grevlex []     (_:_)  = LT
grevlex (x:xs) (y:ys) = grevlex xs ys <> compare x y

-- This version is tail-recursive, but it also might unnecessarily compare
-- elements. However, that should be cheaper than building up the list of
-- comparisons.
grevlex :: Ord a => [a] -> [a] -> Ordering
grevlex = go EQ
  where
    go !a []     []     = a
    go !a (_:_)  []     = GT
    go !a []     (_:_)  = LT
    go !a (x:xs) (y:ys) = go (compare x y <> a) xs ys

Enumerating Monomials

The problem with all the admissible monomial orderings is that they need to see the entire monomial before they can decide whether it’s ordered before or after another. This is at odds with the trie, which tends to prefer computations that can be described in terms of prefix/suffix decompositions.

To demonstrate the problem, let’s take a look at an algorithm that enumerates the monomials of a polynomial in lexicographic order:

monos :: (Num c, Eq c) => Poly v c -> [([v],c)]
monos p = search [] p []
  where
    cons vs 0 ms = ms
    cons vs c ms = (reverse vs,c) : ms

    search sv (n :<+ ns) ms = cons sv n (Map.foldrWithKey (search . (:sv)) ms ns)

>>> monos ((2 + 3*x - 5*y) ^ 2)
[([],4),([X],12),([X,X],9),([X,Y],-15),([Y],-20),([Y,X],-15),([Y,Y],25)]

Notice that the function search emits the monomial (reverse sv, n) straight away (if n /= 0), when it encounters it: for a proper admissible monomial ordering, it would instead want to first emit monomials of higher degree; that is, those monomials in the map ns.

However, we can’t just flip the order of consing in search: notice that even if we reversed the output, we still wouldn’t get an admissible monomial ordering (the singleton list [Y] should be grouped with the other singleton lists). The problem is that monos is performing a depth-first search. What we need is breadth-first.

I happen to be a little obsessed with breadth-first search, so I probably spent too much time on this particular implementation, but I do always get excited when I see a breadth-first traversal pop up in the wild.

For this case, I started with the levels function.

levels :: (Num c, Eq c) => Poly v c -> [[([v],c)]]
levels p = search [] p []
  where
    cons _  0 ms = ms
    cons vs c ms = (reverse vs,c) : ms

    search sv (n :<+ ns) []     = cons sv n [] : Map.foldrWithKey (search . (:sv)) [] ns
    search sv (n :<+ ns) (q:qs) = cons sv n  q : Map.foldrWithKey (search . (:sv)) qs ns

>>> levels ((2 + 3*x - 5*y) ^ 2)
[[([],4)],[([X],12),([Y],-20)],[([X,X],9),([X,Y],-15),([Y,X],-15),([Y,Y],25)]]

I have written about levels before (see also Gibbons 2015; Jones and Gibbons 1993).

I think it’s a good fit here because it lets us build the prefix string for each monomial in a natural way (that prefix string is the sv that’s passed to search).
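To try the level-by-level traversal in isolation, here is a self-contained sketch. It assumes that Poly is a coefficient paired with a Map of sub-polynomials, which is how the `:<+` patterns above read; the real definition lives earlier in the post, so treat the type below as a stand-in. The `example` value is written out by hand (using Char variables) to match the expansion of (2 + 3x - 5y)^2.

```haskell
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)

-- Stand-in for the trie polynomial: a constant term plus, for each
-- variable, the polynomial of everything that starts with that variable.
infixr 5 :<+
data Poly v c = c :<+ Map v (Poly v c)

-- Breadth-first enumeration: one inner list per monomial length.
levels :: (Num c, Eq c) => Poly v c -> [[([v], c)]]
levels p = search [] p []
  where
    -- Skip zero coefficients; the prefix is accumulated in reverse.
    cons _  0 ms = ms
    cons vs c ms = (reverse vs, c) : ms

    search sv (n :<+ ns) []       = cons sv n [] : Map.foldrWithKey (search . (: sv)) [] ns
    search sv (n :<+ ns) (q : qs) = cons sv n q  : Map.foldrWithKey (search . (: sv)) qs ns

-- 4 + 12x + 9xx - 15xy - 20y - 15yx + 25yy, built by hand.
example :: Poly Char Int
example =
  4 :<+ Map.fromList
    [ ('x', 12    :<+ Map.fromList [('x', leaf 9),     ('y', leaf (-15))])
    , ('y', (-20) :<+ Map.fromList [('x', leaf (-15)), ('y', leaf 25)])
    ]
  where
    leaf c = c :<+ Map.empty
```

Evaluating `concat (levels example)` lists the constant first, then the two degree-one monomials, then the four degree-two monomials, each group in key order.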

However, one flaw of this function is that it produces a list of lists: one inner list for each degree of polynomial. The output that I actually want, however, is the concatenation of the whole thing.

In reality, this isn’t actually a flaw: we can just call concat and move on. I had a feeling, though, that there was probably some annoying circular program that would let us avoid the second traversal to concatenate the inner lists. Inspired by Geraint Jones’ cyclic breadth-first traversal (1993), I finally arrived at the following solution:

data Knots a
  = Knot
  { tied :: !Bool
  , yank :: [a]
  , ends :: Knots a }

tighten :: Knots a -> Knots a
tighten ~(Knot t y e) = Knot False (if t then y else []) (tighten e)

monos :: (Eq c, Num c) => Poly v c -> [([v],c)]
monos p = y
  where
    Knot _ y e = tie [] p (tighten e)
    cons sv 0 ms = ms
    cons sv c ms = (reverse sv, c) : ms
    tie sv (n :<+ m) (Knot _ ms ps) = Knot True (cons sv n ms) (Map.foldrWithKey (tie . (:sv)) ps m)

>>> monos ((2 + 3 * x - 5 * y) ^ 2)
[([],4),([X],12),([Y],-20),([X,X],9),([X,Y],-15),([Y,X],-15),([Y,Y],25)]

While this does order the output according to grlex, it’s ordered from smallest to largest, which is the reverse of what we want. And yes, while we could just reverse the output, I didn’t write the circular abomination above to throw away the single-pass traversal at such a small hurdle. Any (list-based) algorithm written in a fold-like fashion can usually be reversed by swapping out right-folds for left.

pull :: Knots a -> [a]
pull (Knot True _ e) = pull e
pull (Knot False y _) = y

monosDesc :: (Eq c, Num c) => Poly v c -> [([v],c)]
monosDesc p = pull r
  where
    r = tie [] p (Knot False [] (tighten r))
    cons sv 0 ms = ms
    cons sv c ms = (reverse sv, c) : ms
    tie sv (n :<+ m) (Knot _ ms ps) = Knot True (cons sv n ms) (Map.foldlWithKey (\a v p -> tie (v:sv) p a) ps m)

Efficiently Popping the Leading Monomial

Unfortunately, as fun as monosDesc is, it doesn’t really do what we need it to for most of the Gröbner basis algorithms. While it is pretty efficient if we want all of the monomials of a polynomial, usually we just want the first one. And sadly, while monosDesc is linear overall, it’s not lazy in the right way, meaning that we have to pay that full linear cost even if we only inspect the first element of the list it produces.

The solution here will require us to use a new data structure in place of the Map that we have currently. To avoid traversing the whole tree to find the largest monomial, we need to cache the depth of each subterm so that we can just descend into the subterm which contains the monomial of the highest degree. But we don’t want to just swap out our Map v (Poly v c) for a Map v (Word, Poly v c): that solution would require us to walk over every entry in the map to find the largest Word. While it would be an improvement in practical terms, it would still incur an $\mathcal{O}(\text{width} \times \text{depth})$ cost to find the leading monomial.

Instead, we need the map itself to be able to efficiently provide the entry with the largest degree. We need our map to simultaneously act as a priority queue.

Luckily, the combination of these two structures has been researched before: Hinze (2001) wrote about “priority search queues”, a data structure that allows for $\mathcal{O}(\log n)$ lookup and insertion based on some ordered key, and separately allows for a $\mathcal{O}(\log n)$ popMin operation, based on some separate priority. The psqueues package provides a few implementations of this technique. The API isn’t quite as extensive as, say, containers, so some functions will be slightly less efficient (we don’t get a nice general merge function, for example), but we can basically drop in the OrdPSQ as a replacement for Map.

type SubTerms v c = OrdPSQ (Down v) (Down Word) (Poly v c)
data Poly v c = c :<+ SubTerms v c

I’m using the Down wrapper here because I want a max heap, rather than a min-heap. I’m using that wrapper on both the keys and priorities because OrdPSQ breaks priority ties according to the keys, and I also want greater keys returned first, to follow the grlex ordering.
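The effect of the two Down wrappers can be seen without pulling in psqueues at all. The sketch below is a list-based model of the ordering described above (smallest priority first, ties broken by smallest key), not OrdPSQ itself; `popOrder` is a hypothetical helper, and the entries are bare (key, priority) pairs with no payload.

```haskell
import Data.List (minimumBy)
import Data.Ord (Down (..), comparing)

-- Hypothetical model of "which entry would be popped first":
-- wrap key and priority in Down, then take the minimum by
-- (priority, key), mimicking a min-queue that breaks ties on the key.
popOrder :: (Ord k, Ord p) => [(k, p)] -> Maybe (k, p)
popOrder [] = Nothing
popOrder es = Just (unwrap (minimumBy (comparing rank) (map wrap es)))
  where
    wrap (k, p)             = (Down k, Down p)
    rank (k, p)             = (p, k)   -- priority first, then key
    unwrap (Down k, Down p) = (k, p)
```

For instance, `popOrder [('x', 2), ('y', 2), ('z', 1)]` picks `('y', 2)`: the deepest entries win, and among those the greater key comes first, which is the behaviour wanted for descending grlex.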

The priority here is the depth of the tree. It tells us the length of the longest monomial contained:

depth :: Poly v c -> Word
depth (_ :<+ ns) = maybe 0 (\(_,Down p,_) -> succ p) (Map.findMin ns)

This operation is $\mathcal{O}(1)$, since finding the minimum entry in OrdPSQ is $\mathcal{O}(1)$.

I’ll also use the following isomorphism, for the lensy things:

entry :: (Num c, Eq c) => Iso' (Maybe (Down Word, Poly v c)) (Poly v c)
entry = iso (maybe (0 :<+ Map.empty) snd) (\p -> if isZero p then Nothing else Just (Down (depth p), p))

This lets us chain together lenses that index into an OrdPSQ.

factored :: (Ord v, Num c, Eq c) => [v] -> Lens' (Poly v c) (Poly v c)
factored = foldr (\v ls -> vars . at (Down v) . entry . ls) id

Finally, we can implement a function that pops the leading monomial from a polynomial, efficiently:

leading :: (Num c, Eq c, Ord v) => Poly v c -> Maybe (([v],c),Poly v c)
leading p | isZero p = Nothing
leading (n :<+ ns) = Just (retrie (Map.alterMin step ns))
  where
    retrie ((r,n'),ns') = (r, n' :<+ ns')

    step Nothing = ((([],n),0),Nothing)
    step (Just (Down v, _, p)) = (((v:vs,c),n), subTrie)
      where
        Just ((vs,c),p') = leading p
        subTrie | isZero p' = Nothing
                | otherwise = Just (Down v, Down (depth p'), p')

And it matches the earlier enumeration that we built:

prop_leadingMonos :: Poly Var Word -> Property
prop_leadingMonos p = monosDesc p === unfoldr leading p

Next Steps

I think this is an interesting data structure, and representation of polynomials. However, I am not very familiar with the computer algebra literature, so I can’t yet tell how this kind of representation relates to the other systems out there. Furthermore, most of the algorithms I have read seem to work implicitly with “leading monomials” etc., leading to the following kind of implementation of division:

divModPrefM :: (Fractional c, Eq c, Ord v) => Poly v c -> ([v],c) -> (Poly v c, Poly v c)
divModPrefM p (vs, i) = factored vs ((, 0) . fmap (/i)) p

divModPref :: (Fractional c, Eq c, Ord v) => Poly v c -> Poly v c -> (Poly v c, Poly v c)
divModPref num divisor = case leading divisor of
  Nothing -> error "Divide by zero"
  Just (lt, rest) -> go 0 num
    where
      go !quot !rem = case divModPrefM rem lt of
        (0, _) -> (quot, rem)
        (q, rem') -> go (quot + q) (rem' - rest * q)

I feel that this doesn’t make use of the benefits of the trie-based representation. I have implemented Buchberger’s algorithm (with most of the improvements from Xiu 2012), but I have yet to really research in depth what competitively fast systems do these days (Heisinger and Hofstadler 2025; Cohen and Knopper 2026; Levandovskyy, Schönemann, and Zeid 2020). I’m also interested in seeing what kinds of applications there are for this stuff: I started this project with Weyl algebras in mind, but after looking into it a little more it seems clear that a trie is not a good fit for Weyl algebras.

I have looked a little bit at some other Haskell work on polynomials and similar things; Zucker (2018) implemented listed polynomials very similar to the ones I had at the start of this post, as did Manzyuk (2012) and Buteau (2013). I’ve seen some bigger Haskell packages that work with polynomials (Malaquias and Lopes 2007; Ishii 2018; Laurent 2024), though none seem to use a representation similar to the trie here. I also had a look at calculi (Barton 2024), but I think that that project mainly works with commutative rings (although it’s a pretty big project, so I wouldn’t be surprised if there was some module I missed).

I would actually be interested to hear if anyone has any pointers to work that has a similar approach to polynomials, or on the kinds of things that people use these noncommutative polynomials for. I find most of the descriptions of these algorithms difficult to parse (since they’re usually written by and for mathematicians rather than computer scientists, and almost never for functional programmers), so I am sure I’m missing some major projects.

Gists

Listed Polynomial with arbitrary variables

Polynomial Trie

References

Barton, Dave. 2024. “Calculi.” https://github.com/DaveBarton/calculi.

Buteau, Samuel. 2013. “Polynomials - School of Haskell School of Haskell.” School of Haskell. https://www.schoolofhaskell.com/user/Sam567/computational-physics/beginner-s-tools/polynomials.

Cohen, Arjeh M., and Jan Willem Knopper. 2026. “GBNP.” GAP packages. https://github.com/gap-packages/gbnp.

Gibbons, Jeremy. 2011. “Horner’s Rule.” Patterns in Functional Programming. https://patternsinfp.wordpress.com/2011/05/05/horners-rule/.

Gibbons, Jeremy. 2015. “Breadth-First Traversal.” Patterns in Functional Programming. https://patternsinfp.wordpress.com/2015/03/05/breadth-first-traversal/.

Heisinger, Maximilian, and Clemens Hofstadler. 2025. “F4ncgb: High Performance Gröbner Basis Computations in Free Algebras.” arXiv. doi:10.48550/arXiv.2505.19304. https://arxiv.org/abs/2505.19304.

Hinze, Ralf. 2001. “A simple implementation technique for priority search queues.” SIGPLAN Not. 36 (10) (October): 110–121. doi:10.1145/507669.507650.

Ishii, Hiromi. 2018. “A Purely Functional Computer Algebra System Embedded in Haskell.” In, 11088:288–303. doi:10.1007/978-3-319-99639-4_20. https://arxiv.org/abs/1807.01456.

Jones, Geraint, and Jeremy Gibbons. 1993. Linear-time breadth-first tree algorithms: An exercise in the arithmetic of folds and zips. Dept of Computer Science, University of Auckland. https://www.cs.ox.ac.uk/people/jeremy.gibbons/publications/linear.ps.gz.

Laurent, Stéphane. 2024. “Hspray.” https://github.com/stla/hspray.

Levandovskyy, Viktor, Hans Schönemann, and Karim Abou Zeid. 2020. “Letterplace: A subsystem of singular for computations with free algebras via letterplace embedding.” In Proceedings of the 45th International Symposium on Symbolic and Algebraic Computation, 305–311. ISSAC ’20. New York, NY, USA: Association for Computing Machinery. doi:10.1145/3373207.3404056.

Malaquias, José Romildo, and Carlos Roberto Lopes. 2007. “Implementing a computer algebra system in Haskell.” Applied Mathematics and Computation 192 (1) (September): 120–134. doi:10.1016/j.amc.2007.02.126.

Manzyuk, Oleksandr. 2012. “Gröbner bases in Haskell: Part I.” Oleksandr Manzyuk’s Blog. https://web.archive.org/web/20221206080655/https://oleksandrmanzyuk.wordpress.com/2012/10/25/grobner-bases-in-haskell-part-i/.

McIlroy, M. Douglas. 1999. “Power Series, Power Serious.” J. Funct. Program. 9 (3) (May): 325–337. doi:10.1017/S0956796899003299.

Sturmfels, Bernd. 2005. “What is... A Gröbner Basis?” Notices of the AMS 52 (10) (November). https://www.ams.org/journals/notices/200510/what-is.pdf.

Xiu, Xingqiang. 2012. “Non-commutative Gröbner Bases and Applications.” PhD thesis, Universität Passau. https://opus4.kobv.de/opus4-uni-passau/frontdoor/index/index/docId/170.

Zucker, Philip. 2018. “Division of polynomials in haskell.” Hey There Buddo! https://www.philipzucker.com/division-of-polynomials-in-haskell/.

Some type constructors are tensor products

2026-04-27T19:15:36Z

Introduction

I want to return to something I've mentioned a couple of times in the past - the fact that applying certain type constructors performs a tensor product.

First some admin stuff:

> {-# LANGUAGE DeriveFunctor #-}
> {-# LANGUAGE FlexibleInstances #-}
> {-# LANGUAGE MultiParamTypeClasses #-}
> {-# LANGUAGE UndecidableInstances #-}
> {-# LANGUAGE TypeApplications #-}
> {-# LANGUAGE KindSignatures #-}
> {-# LANGUAGE ScopedTypeVariables #-}
> {-# LANGUAGE AllowAmbiguousTypes #-}

> import Data.Proxy
> import Data.Kind (Type)

> infixr 7 ⊗

Suppose you define a type like so:

> data Complex a = C a a
>     deriving (Eq, Show, Functor)

> instance Num a => Num (Complex a) where
>     fromInteger n = C (fromInteger n) 0

>     C a b + C c d = C (a + c) (b + d)
>     C a b - C c d = C (a - c) (b - d)

>     C a b * C c d = C (a * c - b * d) (a * d + b * c)

>     negate (C a b) = C (negate a) (negate b)

>     abs    = error "abs doesn't make sense here"
>     signum = error "signum makes no sense here"

It seems straightforward. You've defined complex numbers in a way that allows a choice of base type to represent the real numbers. For example you could use Complex Float or Complex Double as representations of $\mathbb{C}$.

In actual fact you've done quite a bit more! That code has another reading - it implements a tensor product both in the category of vector spaces, and, less trivially, in the category of algebras. So if A is a suitable algebraic structure then, if you allow me to mix code and mathematics notation,

\[ \mathtt{Complex\ A} = \mathbb{C}\otimes\mathtt{A} \]

I took this for granted when I mentioned it previously but I thought I'd look into it in a little bit more detail.

Tensor Products

I want to start from the definition of the tensor product given by its universal property, but to make that slightly less fearsome I'll use an English sketch of it.

Suppose you have a pair of vector spaces $X$ and $Y$ over some base field $k$. A bilinear function $X\times Y\rightarrow Z$ is a function that is linear in $X$ and linear in $Y$. Now suppose we know that at some point in the future we are going to need some bilinear function on $X\times Y$ but don't yet know what it is. Can we make a structure, $T$, that contains precisely the information we need so that we can compute any bilinear function we want - with the proviso that we compute these bilinear functions by applying a linear function to $T$? We don't want $T$ to be lacking anything we might need to compute a future bilinear product, but we also don't want it to contain any extraneous data.

For example, imagine working with $V$, the vector space of 3D vectors. Some examples of bilinear functions we might want are the dot product $V\times V\rightarrow\mathbb{R}$ and the cross product $V\times V\rightarrow V$. What should $T$ look like?

We can write the dot product as $(x, y, z)\cdot(x', y', z') = xx'+yy'+zz'$. Note how it's made of products of coordinates from $(x, y, z)$ and coordinates from $(x', y', z')$. Similarly $(x, y, z)\times(x', y', z')=(yz'-zy',\ldots)$. Again, it's a linear combination of products of coordinates, one from each vector. You can prove that any bilinear product will be some linear combination of such products.

By thinking about all possible bilinear products I hope you can see that $T$ should be a 9-dimensional vector space and a suitable way to represent a pair of vectors $(x, y, z), (x', y', z')$ for future application of a bilinear function is as $(xx', xy', xz', yx', yy', yz', zx', zy', zz')$. Any bilinear product is a linear combination of these 9 quantities and so is given by some linear operation on $T$. It is commonplace to arrange the 9-dimensional vector as a $3\times 3$ matrix in which case the map from the pair is called the outer product. But it doesn't really matter as all 9-dimensional vector spaces over a given field are isomorphic.

In this case I chose to consider bilinear functions on $V\times V$, but you can reason similarly for any pair of vector spaces $X$ and $Y$. When working with finite-dimensional vector spaces, the structure we need will be $mn$-dimensional where $m$ is the dimension of $X$ and $n$ is the dimension of $Y$. The structure is called the tensor product and is written as $X\otimes Y$. The bilinear map from the original vectors into the tensor product is also called the tensor product and is written as a binary operator $x\otimes y$. And once you have the tensor product, every bilinear function on the original pair of spaces can be expressed uniquely as a linear function on the tensor product.

So, for example, the dot product can be written as

\[ x\cdot y = \phi(x\otimes y) \]

where $(x, y, z)\otimes(x', y', z')=(xx',xy',\ldots zz')$ and so the linear function is $\phi(x_0, x_1,\ldots,x_8) = x_0+x_4+x_8$.

Similarly

\[ x\times y = \psi(x\otimes y) \]

where $\psi(x_0, x_1, \ldots, x_8)=(x_5 - x_7, x_6 - x_2, x_1 - x_3)$
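The two formulas above are easy to check numerically. The following is a standalone sketch, separate from the literate module in this post: `V3`, `outer`, `phi`, `psi`, `dot` and `cross` are names chosen here for illustration, with the nine coordinates stored in a plain list in the order $(xx', xy', \ldots, zz')$.

```haskell
type V3 = (Double, Double, Double)

-- The tensor (outer) product of two 3D vectors, flattened row by row:
-- [xx', xy', xz', yx', yy', yz', zx', zy', zz'].
outer :: V3 -> V3 -> [Double]
outer (x, y, z) (x', y', z') = [a * b | a <- [x, y, z], b <- [x', y', z']]

-- Linear map on the 9 coordinates that recovers the dot product.
phi :: [Double] -> Double
phi [x0, _, _, _, x4, _, _, _, x8] = x0 + x4 + x8
phi _ = error "phi: expected 9 coordinates"

-- Linear map on the 9 coordinates that recovers the cross product.
psi :: [Double] -> V3
psi [_, x1, x2, x3, _, x5, x6, x7, _] = (x5 - x7, x6 - x2, x1 - x3)
psi _ = error "psi: expected 9 coordinates"

-- Direct definitions, for comparison.
dot :: V3 -> V3 -> Double
dot (x, y, z) (x', y', z') = x * x' + y * y' + z * z'

cross :: V3 -> V3 -> V3
cross (x, y, z) (x', y', z') =
  (y * z' - z * y', z * x' - x * z', x * y' - y * x')
```

With `u = (1, 2, 3)` and `v = (4, 5, 6)`, both `phi (outer u v)` and `dot u v` give `32.0`, and both `psi (outer u v)` and `cross u v` give `(-3.0, 6.0, -3.0)`.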

Algebras

It's a confusing use of terminology, but the term "algebra (over $k$)" is used specifically to mean a vector space $A$ (over $k$) equipped with a bilinear product $A\times A\rightarrow A$ which is compatible with the vector space structure. And in addition I'm assuming my algebras contain a multiplicative unit element. Other people may call this a "unital algebra". I'll use the word "unital" when I want to stress that there is a unit.

An example is the algebra of complex numbers $\mathbb{C}$ over $\mathbb{R}$. It's a 2-dimensional vector space over $\mathbb{R}$. We can, for example, scale complex numbers by elements of the base field. We also have properties like $(au)v = u(av)$ for $a\in\mathbb{R}$ and $u,v\in\mathbb{C}$. We can scale either argument of the complex product by a real and it makes no difference which we choose. See Wikipedia for all the properties an algebra must satisfy.

Vector spaces come with an addition operation and a zero but we're going to share the work out a little differently because our Num instance already has those. So our VectorSpace class is just going to have the scale operation:

> class VectorSpace k v where
>     scale :: k -> v -> v

> instance VectorSpace Double Double where
>     scale = (*)

You can think of the definition of Complex above as a container for the coordinates in a choice of basis. Because I use deriving Functor I can get the VectorSpace instance for all similar types for free:

> instance (Functor c, VectorSpace k a) => VectorSpace k (c a) where
>     scale k = fmap (scale k)

Because fmap composes through nested functors, scale descends recursively through arbitrarily nested structures like Complex (Complex Double).

And now we can concretely implement the bilinear tensor product operation in our choice of basis. It works by descending through the construction of $x$ until it reaches its individual coordinates and then uses each one to scale $y$. A special case of this is our 9-dimensional vector construction above: each batch of 3 coordinates is a scaling of one vector by a coordinate from the other.

> (⊗) :: (Functor c, VectorSpace k a) => c k -> a -> c a
> x ⊗ y = fmap (`scale` y) x

We're literally just recursively building a table of all products of coordinates of c k and coordinates of a.

Any bilinear function f :: U -> V -> W can now be implemented as f x y = phi (x ⊗ y) for a unique choice of phi.

Algebras too

But there's more, and this is the point of me writing this article. Algebras also have a tensor product defined on them. The underlying carrier space is the tensor product of algebras considered as vector spaces. The product structure is defined by $(x\otimes y)(x'\otimes y')=(xx')\otimes(yy')$ and linear combinations thereof. But what's neat here is that we don't have to write any more code to implement this, our Num instance is already doing the work.

We need to check that our definition of Complex satisfies this property. In fact, I want to prove it more generally for any type like Complex that has a multiplication that looks like

    C a b * C c d = C (a * c - b * d) (a * d + b * c)

i.e. I'll assume we have a type F that is an instance of Num, with constructor F, and whose multiplication is constructed from a linear combination of terms of the form a * a'.

Something like:

    (F ... a ...) * (F ... a' ...) = F ... (... + a * a' + ...) ...

Note that I'm claiming

\[ \mathtt{F\ A} = \mathtt{F\ Double}\otimes\mathtt{A} \]

so I can suppose that a is in Double (or whatever we use to represent the reals).

Assuming * is such a product:

   (x ⊗ y) * (x' ⊗ y')
== fmap (`scale` y) x * fmap (`scale` y') x'
   -- definition of tensor
== fmap (`scale` y) (F ... a ...) * fmap (`scale` y') (F ... a' ...)
   -- stating our assumptions about the form of x and x'
== (F ... (scale a y) ...) * (F ... (scale a' y') ...)
   -- this is what derived fmap looks like
== F ... (... + scale a y * scale a' y' + ...) ...
   -- our assumption about the form that multiplication takes
== F ... (... + scale (a * a') (y * y') + ...) ...
   -- multiplication is bilinear all the way down
== fmap (`scale` (y * y')) (F ... (... + a * a' + ...))
   -- same fact about fmap used above
== fmap (`scale` (y * y')) (x * x')
   -- again our assumption about how multiplication is implemented
== (x * x') ⊗ (y * y')
   -- definition of tensor again
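The identity can also be spot-checked on concrete numbers. This is a standalone sketch rather than part of the literate module: it repeats the Complex type from above but replaces the class-based `⊗` with a monomorphic `tensor` specialised to Complex Double, and `lhs` and `rhs` are hypothetical names for the two sides of the equation.

```haskell
{-# LANGUAGE DeriveFunctor #-}

data Complex a = C a a
  deriving (Eq, Show, Functor)

instance Num a => Num (Complex a) where
  fromInteger n  = C (fromInteger n) 0
  C a b + C c d  = C (a + c) (b + d)
  C a b - C c d  = C (a - c) (b - d)
  C a b * C c d  = C (a * c - b * d) (a * d + b * c)
  negate (C a b) = C (negate a) (negate b)
  abs            = error "abs is not meaningful here"
  signum         = error "signum is not meaningful here"

-- Monomorphic stand-in for x ⊗ y: scale y by each coordinate of x.
tensor :: Complex Double -> Complex Double -> Complex (Complex Double)
tensor x y = fmap (\a -> fmap (a *) y) x

-- The two sides of (x ⊗ y)(x' ⊗ y') = (xx') ⊗ (yy').
lhs, rhs :: Complex Double -> Complex Double
         -> Complex Double -> Complex Double
         -> Complex (Complex Double)
lhs x y x' y' = tensor x y * tensor x' y'
rhs x y x' y' = tensor (x * x') (y * y')
```

For `x = C 1 2`, `y = C 3 4`, `x' = C 5 6`, `y' = C 7 8`, both sides evaluate to `C (C 77.0 (-364.0)) (C (-176.0) 832.0)`. One example is not a proof, but it is a handy guard against sign errors in the derivation.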

Anyway, my motivation here is that quite a while back someone (on Mastodon, I think) pushed back on my claim that we have a tensor product, so I thought I'd give some more detail.

I could say more. The tensor product of algebras has the nice property that you can embed the original algebras in it in a way that the two images commute with each other. In fact, you can define the tensor product to be the initial algebra with this property. But this is too long already.

Also, I used Haskell above but it carries over straightforwardly to other languages, even C++.

> main :: IO ()
> main = do
>   print "Bye!"

81: Torsten Grust

2026-04-27T07:00:00Z

Mike and Andres sat down with Torsten Grust, who is a professor of DB systems at the University of Tübingen. Even though Torsten loves SQL, he's used functional programming and Haskell to inform his work on query language design and compilation. We talked about the best way to program databases, how to bridge the gap between regular programming languages and databases, and compiling just about everything to SQL.

PenroseKiteDart User Guide

2026-04-26T16:11:03Z

Introduction

(Updated April 2026 for PenroseKiteDart version 1.8)

PenroseKiteDart is a Haskell package with tools to experiment with finite tilings of Penrose’s Kites and Darts. It uses the Haskell Diagrams package for drawing tilings. As well as providing drawing tools, this package introduces tile graphs (Tgraphs) for describing finite tilings. (I would like to thank Stephen Huggett for suggesting planar graphs as a way to represent the tilings.)

This document summarises the design and use of the PenroseKiteDart package.

The PenroseKiteDart package is now available on Hackage.

The source files are available on GitHub at https://github.com/chrisreade/PenroseKiteDart.

There is a small art gallery of examples created with PenroseKiteDart here.

1. About Penrose’s Kites and Darts

The Tiles

In figure 1 we show a dart and a kite. All angles are multiples of $36^\circ$ (a tenth of a full turn). If the shorter edges are of length 1, then the longer edges are of length $\phi$, where $\phi = (1+\sqrt{5})/2$ is the golden ratio.

Figure 1: The Dart and Kite Tiles

Aperiodic Infinite Tilings

What is interesting about these tiles is:

It is possible to tile the entire plane with kites and darts in an aperiodic way.

Such a tiling is non-periodic and does not contain arbitrarily large periodic regions or patches.

The possibility of aperiodic tilings with kites and darts was discovered by Sir Roger Penrose in 1974. There are other shapes with this property, including a chiral aperiodic monotile discovered in 2023 by Smith, Myers, Kaplan, Goodman-Strauss. (See the Penrose Tiling Wikipedia page for the history of aperiodic tilings)

This package is entirely concerned with Penrose’s kite and dart tilings also known as P2 tilings.

Legal Tilings

In figure 2 we add a temporary green line marking purely to illustrate a rule for making legal tilings. The purpose of the rule is to exclude the possibility of periodic tilings.

If all tiles are marked as shown, then whenever tiles come together at a point, they must all be marked or must all be unmarked at that meeting point. So, for example, each long edge of a kite can be placed legally on only one of the two long edges of a dart. The kite wing vertex (which is marked) has to go next to the dart tip vertex (which is marked) and cannot go next to the dart wing vertex (which is unmarked) for a legal tiling.

Figure 2: Marked Dart and Kite

Correct Tilings

Unfortunately, having a finite legal tiling is not enough to guarantee you can continue the tiling without getting stuck. Finite legal tilings which can be continued to cover the entire plane are called correct and the others (which are doomed to get stuck) are called incorrect. This means that decomposition and forcing (described later) become important tools for constructing correct finite tilings.

2. Using the PenroseKiteDart Package

You will need the Haskell Diagrams package (See Haskell Diagrams) as well as this package (PenroseKiteDart). When these are installed, you can produce diagrams with a Main.hs module. This should import a chosen backend for diagrams such as the default (SVG) along with Diagrams.Prelude.

    module Main (main) where
    
    import Diagrams.Backend.SVG.CmdLine
    import Diagrams.Prelude

For Penrose’s Kite and Dart tilings, you also need to import the PKD module and (optionally) the TgraphExamples module.

    import PKD
    import TgraphExamples

Then, to output a figure someExample:

    fig::Diagram B
    fig = someExample

    main :: IO ()
    main = mainWith fig

Note that the token B is used in the diagrams package to represent the chosen backend for output. So a diagram has type Diagram B. In this case B is bound to SVG by the import of the SVG backend. When the compiled module is executed it will generate an SVG file. (See Haskell Diagrams for more details on producing diagrams and using alternative backends).

3. Overview of Types and Operations

Half-Tiles

In order to implement operations on tilings (decompose in particular), we work with half-tiles. These are illustrated in figure 3 and labelled RD (right dart), LD (left dart), LK (left kite), RK (right kite). The join edges where left and right halves come together are shown with dotted lines, leaving one short edge and one long edge on each half-tile (excluding the join edge). We have shown a red dot at the vertex we regard as the origin of each half-tile (the tip of a half-dart and the base of a half-kite).

Figure 3: Half-Tile pieces showing join edges (dashed) and origin vertices (red dots)

The labels are actually data constructors introduced with type operator HalfTile which has an argument type (rep) to allow for more than one representation of the half-tiles.

    data HalfTile rep 
      = LD rep -- Left Dart
      | RD rep -- Right Dart
      | LK rep -- Left Kite
      | RK rep -- Right Kite
      deriving (Show,Eq)

Tgraphs

We introduce tile graphs (Tgraphs) which provide a simple planar graph representation for finite patches of tiles. For Tgraphs we first specialise HalfTile with a triple of vertices (positive integers) to make a TileFace such as RD(1,2,3), where the vertices go clockwise round the half-tile triangle starting with the origin.

    type TileFace  = HalfTile (Vertex,Vertex,Vertex)
    type Vertex    = Int  -- must be positive

The function

    makeTgraph :: [TileFace] -> Tgraph

then constructs a Tgraph from a TileFace list after checking the TileFaces satisfy certain properties (described below). We also have

    faces :: Tgraph -> [TileFace]

to retrieve the TileFace list from a Tgraph.

As an example, the fool (short for fool’s kite and also called an ace in the literature) consists of two kites and a dart (= 4 half-kites and 2 half-darts):

    fool :: Tgraph
    fool = makeTgraph [RD (1,2,3), LD (1,3,4)   -- right and left dart
                      ,LK (5,3,2), RK (5,2,7)   -- left and right kite
                      ,RK (5,4,3), LK (5,6,4)   -- right and left kite
                      ]
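As a quick check of the counts just quoted, the following sketch works on the plain face list rather than on a Tgraph, so none of the makeTgraph checks are run. The HalfTile and TileFace declarations are repeated so the fragment stands alone; foolFaces, isDart, isKite, verticesOf and vertexSet are defined locally for illustration and are not claimed to match the library's own definitions.

```haskell
import Data.List (nub, sort)

data HalfTile rep = LD rep | RD rep | LK rep | RK rep
  deriving (Show, Eq)

type Vertex   = Int
type TileFace = HalfTile (Vertex, Vertex, Vertex)

-- The six faces of fool as a plain list (no Tgraph checks are run here).
foolFaces :: [TileFace]
foolFaces =
  [ RD (1,2,3), LD (1,3,4)   -- right and left dart
  , LK (5,3,2), RK (5,2,7)   -- left and right kite
  , RK (5,4,3), LK (5,6,4)   -- right and left kite
  ]

isDart, isKite :: HalfTile rep -> Bool
isDart (LD _) = True
isDart (RD _) = True
isDart _      = False
isKite = not . isDart

-- The three vertices of a face, origin first.
verticesOf :: TileFace -> [Vertex]
verticesOf face = [a, b, c]
  where
    (a, b, c) = rep face
    rep (LD r) = r
    rep (RD r) = r
    rep (LK r) = r
    rep (RK r) = r

-- All distinct vertices used by a list of faces, in ascending order.
vertexSet :: [TileFace] -> [Vertex]
vertexSet = sort . nub . concatMap verticesOf
```

Filtering foolFaces with isDart and isKite gives 2 half-darts and 4 half-kites, and vertexSet foolFaces lists the seven (positive) vertices 1 to 7.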

To produce a diagram, we simply draw the Tgraph

    foolFigure :: Diagram B
    foolFigure = draw fool

which will produce the diagram on the left in figure 4.

Alternatively,

    foolFigure :: Diagram B
    foolFigure = labelled drawj fool

will produce the diagram on the right in figure 4 (showing vertex labels and dashed join edges).

Figure 4: Diagram of fool without labels and join edges (left), and with (right)

When any (non-empty) Tgraph is drawn, a default orientation and scale are chosen based on the lowest numbered join edge. This is aligned on the positive x-axis with length 1 (for darts) or length $\phi$, the golden ratio (for kites).

Tgraph Properties

Tgraphs are actually implemented as

    newtype Tgraph = Tgraph [TileFace]
                     deriving (Show)

but the data constructor Tgraph is not exported to avoid accidentally by-passing checks for the required properties. The properties checked by makeTgraph ensure the Tgraph represents a legal tiling as a planar graph with positive vertex numbers, and that the collection of half-tile faces are both connected and have no crossing boundaries (see note below). Finally, there is a check to ensure two or more distinct vertex numbers are not used to represent the same vertex of the graph (a touching vertex check). An error is raised if there is a problem.

Note: If the TileFaces are faces of a planar graph there will also be exterior (untiled) regions, and in graph theory these would also be called faces of the graph. To avoid confusion, we will refer to these only as exterior regions, and unless otherwise stated, face will mean a TileFace. We can then define the boundary of a list of TileFaces as the edges of the exterior regions. There is a crossing boundary if the boundary crosses itself at a vertex. We exclude crossing boundaries from Tgraphs because they prevent us from calculating relative positions of tiles locally and create touching vertex problems.

For convenience, in addition to makeTgraph, we also have

    makeUncheckedTgraph :: [TileFace] -> Tgraph
    checkedTgraph   :: [TileFace] -> Tgraph

The first of these (performing no checks) is useful when you know the required properties hold. The second performs the same checks as makeTgraph except that it omits the touching vertex check. This could be used, for example, when making a Tgraph from a sub-collection of TileFaces of another Tgraph.

Main Tiling Operations

There are three key operations on finite tilings, namely

    decompose :: Tgraph -> Tgraph
    force     :: Tgraph -> Tgraph
    compose   :: Tgraph -> Tgraph

Decompose

Decomposition (also called deflation) works by splitting each half-tile into either 2 or 3 new (smaller scale) half-tiles, to produce a new tiling. The fact that this is possible is used to establish the existence of infinite aperiodic tilings with kites and darts. Since our Tgraphs have abstracted away from scale, the result of decomposing a Tgraph is just another Tgraph. However if we wish to compare before and after with a drawing, the latter should be scaled by a factor $1/\phi = \phi - 1$ times the scale of the former, to reflect the change in scale.

Figure 5: fool (left) and decompose fool (right)

We can, of course, iterate decompose to produce an infinite list of finer and finer decompositions of a Tgraph

    decompositions :: Tgraph -> [Tgraph]
    decompositions = iterate decompose
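To get a feel for how quickly decompositions grow, here is a small counting sketch that tracks only the numbers of half-darts and half-kites, not the faces themselves. It assumes the usual substitution in which a half-dart splits into one half-kite and one half-dart, and a half-kite splits into two half-kites and one half-dart. The names Counts, stepCounts, countsAfter and totalFaces are hypothetical helpers, not part of the library.

```haskell
-- (number of half-darts, number of half-kites)
type Counts = (Integer, Integer)

-- One decomposition step on counts alone (assumed substitution rule):
--   half-dart -> 1 half-kite  + 1 half-dart
--   half-kite -> 2 half-kites + 1 half-dart
stepCounts :: Counts -> Counts
stepCounts (d, k) = (d + k, d + 2 * k)

-- Counts after n decompositions, mirroring `iterate decompose`.
-- Laziness means only the first n+1 elements are ever computed.
countsAfter :: Int -> Counts -> Counts
countsAfter n start = iterate stepCounts start !! n

totalFaces :: Counts -> Integer
totalFaces (d, k) = d + k
```

Starting from fool (2 half-darts, 4 half-kites), one step gives (6, 10), that is 16 faces. If kingGraph is taken to be 3 darts and 2 kites (6 half-darts, 4 half-kites, an assumption here), six steps give 2,906 faces, which agrees with the figure quoted in section 5.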

Force

Force works by adding any TileFaces on the boundary edges of a Tgraph which are forced. That is, where there is only one legal choice of TileFace addition consistent with the seven possible vertex types. Such additions are continued until either (i) there are no more forced cases, in which case a final (forced) Tgraph is returned, or (ii) the process finds the tiling is stuck, in which case an error is raised indicating an incorrect tiling. [In the latter case, the argument to force must have been an incorrect tiling, because the forced additions cannot produce an incorrect tiling starting from a correct tiling.]

An example is shown in figure 6. When forced, the Tgraph on the left produces the result on the right. The original is highlighted in red in the result to show what has been added.

Figure 6: A Tgraph (left) and its forced result (right) with the original shown red

Compose

Composition (also called inflation) is an opposite to decompose but this has complications for finite tilings, so it is not simply an inverse. (See Graphs, Kites and Darts and Theorems for more discussion of the problems). Figure 7 shows a Tgraph (left) with the result of composing (right) where we have also shown (in pale green) the faces of the original that are not included in the composition – the remainder faces.

Figure 7: A Tgraph (left) and its (part) composed result (right) with the remainder faces shown pale green

Under some circumstances composing can fail to produce a Tgraph because there are crossing boundaries in the resulting TileFaces. However, we have established that

If g is a forced Tgraph, then compose g is defined and it is also a forced Tgraph.

Try Results

It is convenient to use types of the form Try a for results where we know there can be a failure. For example, compose can fail if the result does not pass the connected and no crossing boundary check, and force can fail if its argument is an incorrect Tgraph. In situations when you would like to continue some computation rather than raise an error when there is a failure, use a try version of a function.

    tryCompose :: Tgraph -> Try Tgraph
    tryForce   :: Tgraph -> Try Tgraph

We define Try as a synonym for Either ShowS (which is a monad) in module Tgraph.Try.

    type Try a = Either ShowS a

(Note ShowS is String -> String). Successful results have the form Right r (for some correct result r) and failure results have the form Left (s<>) (where s is a String describing the problem as a failure report).

The function

    runTry :: Try a -> a
    runTry = either (\report -> error (report "")) id  -- apply the ShowS report to "" to get a String

will retrieve a correct result but raise an error for failure cases. This means we can always derive an error raising version from a try version of a function by composing with runTry.

    force = runTry . tryForce
    compose = runTry . tryCompose
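The following self-contained sketch shows the Try pattern on a toy check (vertex numbers must be positive) rather than on real tilings. The type synonym is repeated from above; failReport, tryVertex, tryVertices and reportOf are illustrative helpers introduced here, and runTry is written out so that the ShowS report is applied to the empty string before being handed to error.

```haskell
type Try a = Either ShowS a

-- Build a failure from a plain String (the Left value is a ShowS).
failReport :: String -> Try a
failReport s = Left (s <>)

-- Extract a result, raising an error with the report text on failure.
runTry :: Try a -> a
runTry = either (\report -> error (report "")) id

-- Toy check: a vertex number must be positive.
tryVertex :: Int -> Try Int
tryVertex v
  | v > 0     = Right v
  | otherwise = failReport ("tryVertex: non-positive vertex " <> show v <> "\n")

-- mapM in the Either monad stops at the first failure.
tryVertices :: [Int] -> Try [Int]
tryVertices = mapM tryVertex

-- Inspect a failure report without raising an error.
reportOf :: Try a -> Maybe String
reportOf (Left r)  = Just (r "")
reportOf (Right _) = Nothing
```

Here runTry (tryVertices [1,2,3]) returns the list unchanged, while reportOf (tryVertices [1,0,-2]) yields only the report for 0, because the computation short-circuits at the first Left.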

Elementary Tgraph and TileFace Operations

The module Tgraph.Prelude defines elementary operations on Tgraphs relating vertices, directed edges, and faces. We describe a few of them here.

When we need to refer to particular vertices of a TileFace we use

    originV :: TileFace -> Vertex -- the first vertex - red dot in figure 3
    oppV    :: TileFace -> Vertex -- the vertex at the opposite end of the join edge from the origin
    wingV   :: TileFace -> Vertex -- the vertex not on the join edge

A directed edge is represented as a pair of vertices.

    type Dedge = (Vertex,Vertex)

So (a,b) is regarded as a directed edge from a to b.

When we need to refer to particular edges of a TileFace we use

    joinE  :: TileFace -> Dedge  -- shown dashed in figure 3
    shortE :: TileFace -> Dedge  -- the non-join short edge
    longE  :: TileFace -> Dedge  -- the non-join long edge

which are all directed clockwise round the TileFace. In contrast, joinOfTile is always directed away from the origin vertex, so is not clockwise for right darts or for left kites:

    joinOfTile:: TileFace -> Dedge
    joinOfTile face = (originV face, oppV face)
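One way these selectors could be written, consistent with the clockwise vertex ordering and with the fool example (where RD (1,2,3) and LD (1,3,4) share the join between vertices 1 and 3), is sketched below. This is an illustration under those assumptions rather than the library's source, and the type declarations are repeated so that it stands alone.

```haskell
data HalfTile rep = LD rep | RD rep | LK rep | RK rep
  deriving (Show, Eq)

type Vertex   = Int
type TileFace = HalfTile (Vertex, Vertex, Vertex)
type Dedge    = (Vertex, Vertex)

-- The origin is always the first vertex of the triple.
originV :: TileFace -> Vertex
originV (LD (a,_,_)) = a
originV (RD (a,_,_)) = a
originV (LK (a,_,_)) = a
originV (RK (a,_,_)) = a

-- The other end of the join edge depends on the handedness of the half-tile.
oppV :: TileFace -> Vertex
oppV (LD (_,b,_)) = b
oppV (RD (_,_,c)) = c
oppV (LK (_,_,c)) = c
oppV (RK (_,b,_)) = b

-- The remaining vertex, not on the join edge.
wingV :: TileFace -> Vertex
wingV (LD (_,_,c)) = c
wingV (RD (_,b,_)) = b
wingV (LK (_,b,_)) = b
wingV (RK (_,_,c)) = c

-- Join edge directed clockwise round the face.
joinE :: TileFace -> Dedge
joinE (LD (a,b,_)) = (a,b)
joinE (RD (a,_,c)) = (c,a)
joinE (LK (a,_,c)) = (c,a)
joinE (RK (a,b,_)) = (a,b)

-- Join edge directed away from the origin.
joinOfTile :: TileFace -> Dedge
joinOfTile face = (originV face, oppV face)
```

For RD (1,2,3) this gives joinE = (3,1) but joinOfTile = (1,3), which is the non-clockwise case mentioned above, while for LD (1,3,4) both are (1,3).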

In the special case that a list of directed edges is symmetrically closed [(b,a) is in the list whenever (a,b) is in the list] we can think of this as an edge list rather than just a directed edge list.

For example,

    internalEdges :: Tgraph -> [Dedge]

produces an edge list, whereas

    boundary :: Tgraph -> [Dedge]

produces single directions. Each directed edge in the resulting boundary will have a TileFace on the left and an exterior region on the right. The function

    dedges :: Tgraph -> [Dedge]

produces all the directed edges obtained by going clockwise round each TileFace so not every edge in the list has an inverse in the list.
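The relationship between these three functions can be sketched on plain face lists. The version below is a minimal illustration, assuming that a boundary edge is the reverse of a face edge whose reverse does not itself occur among the face edges (so that the face lies on its left). The names dedgesOf, internalEdgesOf and boundaryOf are hypothetical, and the quadratic use of elem is only for brevity.

```haskell
import Data.List (sort)

data HalfTile rep = LD rep | RD rep | LK rep | RK rep
  deriving (Show, Eq)

type Vertex   = Int
type TileFace = HalfTile (Vertex, Vertex, Vertex)
type Dedge    = (Vertex, Vertex)

-- Directed edges going clockwise round one face.
faceDedges :: TileFace -> [Dedge]
faceDedges face = [(a,b), (b,c), (c,a)]
  where
    (a, b, c) = rep face
    rep (LD r) = r
    rep (RD r) = r
    rep (LK r) = r
    rep (RK r) = r

reverseD :: Dedge -> Dedge
reverseD (a, b) = (b, a)

-- All directed edges of all faces.
dedgesOf :: [TileFace] -> [Dedge]
dedgesOf = concatMap faceDedges

-- Edges present in both directions: a symmetrically closed list.
internalEdgesOf :: [TileFace] -> [Dedge]
internalEdgesOf fcs = [ e | e <- des, reverseD e `elem` des ]
  where des = dedgesOf fcs

-- Reverses of unmatched face edges: face on the left, exterior on the right.
boundaryOf :: [TileFace] -> [Dedge]
boundaryOf fcs = [ reverseD e | e <- des, reverseD e `notElem` des ]
  where des = dedgesOf fcs

foolFaces :: [TileFace]
foolFaces =
  [ RD (1,2,3), LD (1,3,4), LK (5,3,2)
  , RK (5,2,7), RK (5,4,3), LK (5,6,4) ]
```

For the fool faces this yields 18 directed edges, of which 12 are internal (6 edges in both directions), leaving a boundary of 6 directed edges that form the cycle 1, 4, 6, 5, 7, 2.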

Note 1: There is now a class HasFaces (introduced in version 1.4) which includes instances for both Tgraph and [TileFace] and others. This allows some generalisations. For example

    faces         :: HasFaces a => a -> [TileFace]
    internalEdges :: HasFaces a => a -> [Dedge]
    boundary      :: HasFaces a => a -> [Dedge] 
    dedges        :: HasFaces a => a -> [Dedge] 
    nullFaces     :: HasFaces a => a -> Bool

Note 2: There is now a class HasGraph (introduced in version 1.8) which includes instances for Tgraph as well as other types used in forcing. This allows some other generalisations. For example

    compose :: HasGraph a => a -> Tgraph
    decompose :: HasGraph a => a -> Tgraph
    recoverGraph :: HasGraph a => a -> Tgraph

Patches (Scaled and Positioned Tilings)

Behind the scenes, when a Tgraph is drawn, each TileFace is converted to a Piece. A Piece is another specialisation of HalfTile using a two dimensional vector to indicate the length and direction of the join edge of the half-tile (from the originV to the oppV), thus fixing its scale and orientation. The whole Tgraph then becomes a list of located Pieces called a Patch.

    type Piece = HalfTile (V2 Double)
    type Patch = [Located Piece]

Piece drawing functions derive vectors for other edges of a half-tile piece from its join edge vector. In particular (in the TileLib module) we have

    drawPiece :: Piece -> Diagram B
    drawjPiece :: Piece -> Diagram B
    fillPieceDK :: Colour Double -> Colour Double -> Piece -> Diagram B

where the first draws the non-join edges of a Piece, the second does the same but adds a faint dashed line for the join edge, and the third takes two colours – one for darts and one for kites, which are used to fill the piece as well as using drawPiece.

Patch is an instance of class Transformable so a Patch can be scaled, rotated, and translated.

Vertex Patches

It is useful to have an intermediate form between Tgraphs and Patches, that contains information about both the location of vertices (as 2D points), and the abstract TileFaces. This allows us to introduce labelled drawing functions (to show the vertex labels) which we then extend to Tgraphs. We call the intermediate form a VPatch (short for Vertex Patch).

    type VertexLocMap = IntMap.IntMap (Point V2 Double)
    data VPatch = VPatch {vLocs :: VertexLocMap,  vpFaces::[TileFace]} deriving Show

and

    makeVP :: HasGraph a => a -> VPatch

calculates vertex locations using a default orientation and scale.

VPatch is made an instance of class Transformable so a VPatch can also be scaled and rotated.

One essential use of this intermediate form is to be able to draw a Tgraph with labels, rotated but without the labels themselves being rotated. We can simply convert the Tgraph to a VPatch, and rotate that before drawing with labels.

    labelled draw (rotate someAngle (makeVP g))

We can also align a VPatch using vertex labels.

    alignXaxis :: (Vertex, Vertex) -> VPatch -> VPatch

So if g is a Tgraph with vertex labels a and b we can align it on the x-axis with a at the origin and b on the positive x-axis (after converting to a VPatch), instead of accepting the default orientation.

    labelled draw (alignXaxis (a,b) (makeVP g))
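The geometry behind such an alignment is a translation followed by a rotation of every vertex location. A minimal sketch of that idea, using an association list of plain coordinate pairs as a stand-in for the vertex location map, is given below. The names LocMap and alignOnX are hypothetical, and returning Nothing for an unknown vertex is a choice made for this sketch only.

```haskell
type Vertex = Int
type Point2 = (Double, Double)
type LocMap = [(Vertex, Point2)]   -- stand-in for VertexLocMap

-- Move vertex a to the origin and rotate so that b lies on the positive x-axis.
-- Returns Nothing if either vertex has no recorded location.
alignOnX :: (Vertex, Vertex) -> LocMap -> Maybe LocMap
alignOnX (a, b) locs = do
  (ax, ay) <- lookup a locs
  (bx, by) <- lookup b locs
  let theta  = atan2 (by - ay) (bx - ax)        -- current direction from a to b
      (c, s) = (cos theta, sin theta)
      -- translate by -a, then rotate by -theta
      move (x, y) =
        let dx = x - ax
            dy = y - ay
        in  (c * dx + s * dy, c * dy - s * dx)
  Just [ (v, move p) | (v, p) <- locs ]
```

Because only the locations are transformed, any labels drawn afterwards at those locations stay upright, which is the reason for aligning or rotating a VPatch before drawing rather than rotating the finished diagram.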

Another use of VPatches is to share the vertex location map when drawing only subsets of the faces (see Overlaid examples in the next section).

4. Drawing in More Detail

Class Drawable

There is a class Drawable with instances Tgraph, VPatch, Patch. When the token B is in scope standing for a fixed backend then we can assume

    draw   :: Drawable a => a -> Diagram B  -- draws non-join edges
    drawj  :: Drawable a => a -> Diagram B  -- as with draw but also draws dashed join edges
    fillDK :: Drawable a => Colour Double -> Colour Double -> a -> Diagram B -- fills with colours

where fillDK clr1 clr2 will fill darts with colour clr1 and kites with colour clr2 as well as drawing non-join edges.

These are the main drawing tools. However they are actually defined for any suitable backend b so have more general types.

(Update Sept 2024) From version 1.1 onwards of PenroseKiteDart, these are

    draw ::   (Drawable a, OKBackend b) =>
              a -> Diagram b
    drawj ::  (Drawable a, OKBackend b) =>
              a -> Diagram b
    fillDK :: (Drawable a, OKBackend b) =>
              Colour Double -> Colour Double -> a -> Diagram b

where the class OKBackend is a check to ensure a backend is suitable for drawing 2D tilings with or without labels.

In these notes we will generally use the simpler description of types using B for a fixed chosen backend for the sake of clarity.

The drawing tools are each defined via the class function drawWith using Piece drawing functions.

    class Drawable a where
        drawWith :: (Piece -> Diagram B) -> a -> Diagram B
    
    draw = drawWith drawPiece
    drawj = drawWith drawjPiece
    fillDK clr1 clr2 = drawWith (fillPieceDK clr1 clr2)

To design a new drawing function, you only need to implement a function to draw a Piece, (let us call it newPieceDraw)

    newPieceDraw :: Piece -> Diagram B

This can then be elevated to draw any Drawable (including Tgraphs, VPatches, and Patches) by applying the Drawable class function drawWith:

    newDraw :: Drawable a => a -> Diagram B
    newDraw = drawWith newPieceDraw
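The shape of this design can be seen in a toy model that replaces diagrams by lists of strings. In the sketch below, Picture stands in for Diagram B, a Piece carries its join vector as a plain pair, and Patch omits locations entirely. These simplifications, and the function namePiece, are assumptions made only to keep the example self-contained.

```haskell
data HalfTile rep = LD rep | RD rep | LK rep | RK rep

type Piece   = HalfTile (Double, Double)  -- join vector as a pair (stand-in for V2 Double)
newtype Patch = Patch [Piece]             -- locations omitted in this sketch
type Picture = [String]                   -- stand-in for Diagram B

class Drawable a where
  drawWith :: (Piece -> Picture) -> a -> Picture

-- A patch is drawn by drawing every piece and combining the results.
instance Drawable Patch where
  drawWith pieceDraw (Patch ps) = concatMap pieceDraw ps

-- A new piece-level "drawing" function.
namePiece :: Piece -> Picture
namePiece (LD _) = ["left dart"]
namePiece (RD _) = ["right dart"]
namePiece (LK _) = ["left kite"]
namePiece (RK _) = ["right kite"]

-- Elevated to anything Drawable by drawWith.
newDraw :: Drawable a => a -> Picture
newDraw = drawWith namePiece
```

Only the instance knows how to reach the pieces inside a value, so a new piece-level function immediately works for every instance of the class.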

Class DrawableLabelled

Class DrawableLabelled is defined with instances Tgraph and VPatch, but Patch is not an instance (because this does not retain vertex label information).

    class DrawableLabelled a where
        labelColourSize :: Colour Double -> Measure Double -> (Patch -> Diagram B) -> a -> Diagram B

So labelColourSize c m modifies a Patch drawing function to add labels (of colour c and size measure m). Measure is defined in Diagrams.Prelude with pre-defined measures tiny, verySmall, small, normal, large, veryLarge, huge. For most of our diagrams of Tgraphs, we use red labels and we also find small is a good default size choice, so we define

    labelSize :: DrawableLabelled a => Measure Double -> (Patch -> Diagram B) -> a -> Diagram B
    labelSize = labelColourSize red

    labelled :: DrawableLabelled a => (Patch -> Diagram B) -> a -> Diagram B
    labelled = labelSize small

and then labelled draw, labelled drawj, labelled (fillDK clr1 clr2) can all be used on both Tgraphs and VPatches as well as (for example) labelSize tiny draw, or labelColourSize blue normal drawj.

Further drawing functions

There are a few extra drawing functions built on top of the above ones. The function smart is a modifier to add dashed join edges only when they occur on the boundary of a Tgraph

    smart :: HasGraph a => (VPatch -> Diagram B) -> a -> Diagram B

So smart vpdraw g will draw dashed join edges on the boundary of g before applying the drawing function vpdraw to the VPatch for g. For example the following all draw dashed join edges only on the boundary for a Tgraph g

    smart draw g
    smart (labelled draw) g
    smart (labelSize normal draw) g

When using labels, the function rotating allows a Tgraph to be drawn rotated without rotating the labels.

    rotating :: HasGraph a => Angle Double -> (VPatch -> b) -> a -> b
    rotating angle vpdraw = vpdraw . rotate angle . makeVP

So for example,

    rotating (90@@deg) (labelled draw) g

makes sense for a Tgraph g. Of course if there are no labels we can simply use

    rotate (90@@deg) (draw g)

Similarly aligning allows a Tgraph to be aligned on the X-axis using a pair of vertex numbers before drawing.

    aligning :: HasGraph a => (Vertex,Vertex) -> (VPatch -> b) -> a -> b
    aligning (a,b) vpdraw = vpdraw . alignXaxis (a,b) . makeVP

So, for example, if Tgraph g has vertices a and b, both

    aligning (a,b) draw g
    aligning (a,b) (labelled draw) g

make sense. Note that the following two examples are wrong. Even though they type check, they re-orient g without repositioning the boundary joins.

    smart (labelled draw . rotate angle) g      -- WRONG
    smart (labelled draw . alignXaxis (a,b)) g  -- WRONG

Instead use

    smartRotating angle (labelled draw) g
    smartAligning (a,b) (labelled draw) g

where

    smartRotating :: HasGraph a =>  Angle Double -> (VPatch -> Diagram B) -> a -> Diagram B
    smartAligning  :: HasGraph a => (Vertex,Vertex) -> (VPatch -> Diagram B) -> a -> Diagram B

are defined using

    smartOn :: HasGraph a => a -> (VPatch -> Diagram B) -> VPatch -> Diagram B

Here, smartOn g vpdraw vp uses the given vp for drawing boundary joins and drawing faces of g (with vpdraw) rather than converting g to a new VPatch. This assumes vp has locations for vertices in g.

The function

    drawForce :: Tgraph -> Diagram B

will (smart) draw a Tgraph g in red overlaid (using <>) on the result of force g as in figure 6. Similarly

    drawPCompose  :: Tgraph -> Diagram B

applied to a Tgraph g will draw the result of a partial composition of g as in figure 7. That is a drawing of compose g but overlaid with a drawing of the remainder faces of g shown in pale green.

Both these functions make use of sharing a vertex location map to get correct alignments of overlaid diagrams. In the case of drawForce g, we know that a VPatch for force g will contain all the vertex locations for g since force only adds to a Tgraph (when it succeeds). So when constructing the diagram for g we can use the VPatch created for force g instead of starting afresh. Similarly for drawPCompose g the VPatch for g contains locations for all the vertices of compose g so compose g is drawn using the VPatch for g instead of starting afresh.

The location map sharing is done with

    subFaces :: HasFaces a => 
                a -> VPatch -> VPatch

so that subFaces fcs vp is a VPatch with the same vertex locations as vp, but replacing the faces of vp with fcs. [Of course, this can go wrong if the new faces have vertices not in the domain of the vertex location map so this needs to be used with care. Any errors would only be discovered when a diagram is created.]

For cases where labels are only going to be drawn for certain faces, we need a version of subFaces which also gets rid of vertex locations that are not relevant to the faces. For this situation we have

    restrictTo:: HasFaces a => 
                 a -> VPatch -> VPatch

which filters out un-needed vertex locations from the vertex location map. Unlike subFaces, restrictTo checks for missing vertex locations, so restrictTo fcs vp raises an error if a vertex in fcs is missing from the keys of the vertex location map of vp.

5. Forcing in More Detail

The force rules

The rules used by our force algorithm are local and derived from the fact that there are seven possible vertex types as depicted in figure 8.

Figure 8: Seven vertex types

Our rules are shown in figure 9 (omitting mirror symmetric versions). In each case the TileFace shown yellow needs to be added in the presence of the other TileFaces shown.

Figure 9: Rules for forcing

Main Forcing Operations

To make forcing efficient we convert a Tgraph to a BoundaryState to keep track of boundary information of the Tgraph, and then calculate a ForceState which combines the BoundaryState with a record of awaiting boundary edge updates (an update map), and an UpdateGenerator. Then each face addition is carried out on a ForceState, converting back when all the face additions are complete. It makes sense to apply force (and related functions) to a Tgraph, a BoundaryState, or a ForceState, so we define a class Forcible with instances Tgraph, BoundaryState, and ForceState.

This allows us to define

    force :: Forcible a => a -> a
    tryForce :: Forcible a => a -> Try a

The first will raise an error if a stuck tiling is encountered. The second uses a Try result which produces a Left string for failures and a Right a for successful result a.

There are several other operations related to forcing including

    stepForce :: Forcible a => Int -> a -> a
    tryStepForce  :: Forcible a => Int -> a -> Try a

    addHalfDart, addHalfKite :: Forcible a => Dedge -> a -> a
    tryAddHalfDart, tryAddHalfKite :: Forcible a => Dedge -> a -> Try a

The first two force (up to) a given number of steps (=face additions) and the other four add a half dart/kite on a given boundary edge.

Update Generators

An update generator is used to calculate which boundary edges can have a certain update. There is an update generator for each force rule, but also a combined (all update) generator. The force operations mentioned above all use the default all update generator (defaultAllUGen) but there are more general (with) versions that can be passed an update generator of choice. For example

    forceWith :: Forcible a => UpdateGenerator -> a -> a
    tryForceWith :: Forcible a => UpdateGenerator -> a -> Try a

We can also define

    wholeTiles :: Forcible a => a -> a
    wholeTiles = forceWith wholeTileUpdates

where wholeTileUpdates is an update generator that just finds boundary join edges to complete whole tiles.

In fact UpdateGenerators are functions that take a BoundaryState and a focus (list of boundary directed edges) to produce an update map. Each Update is calculated as either a SafeUpdate (where two of the new face edges are on the existing boundary and no new vertex is needed) or an UnsafeUpdate (where only one edge of the new face is on the boundary and a new vertex needs to be created for a new face).

    type UpdateGenerator = BoundaryState -> [Dedge] -> Try UpdateMap
    type UpdateMap = Map.Map Dedge Update
    data Update = SafeUpdate TileFace 
                | UnsafeUpdate (Vertex -> TileFace)

Completing (executing) an UnsafeUpdate requires a touching vertex check to ensure that the new vertex does not clash with an existing boundary vertex. Using an existing (touching) vertex would create a crossing boundary so such an update has to be blocked.
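The difference between the two kinds of update can be illustrated with a small sketch that completes an update given the next unused vertex number. It deliberately leaves out the touching vertex check and all boundary bookkeeping; completeUpdate is a hypothetical helper, and the type declarations are repeated so the fragment stands alone.

```haskell
data HalfTile rep = LD rep | RD rep | LK rep | RK rep
  deriving (Show, Eq)

type Vertex   = Int
type TileFace = HalfTile (Vertex, Vertex, Vertex)

data Update = SafeUpdate TileFace
            | UnsafeUpdate (Vertex -> TileFace)

-- Complete an update, given the next unused vertex number.
-- Returns the face to add and the next unused vertex number afterwards.
-- (No touching vertex check is made in this sketch.)
completeUpdate :: Vertex -> Update -> (TileFace, Vertex)
completeUpdate next (SafeUpdate face)     = (face, next)          -- no new vertex needed
completeUpdate next (UnsafeUpdate mkFace) = (mkFace next, next + 1)  -- consumes one vertex
```

For example, completing UnsafeUpdate (\v -> RK (5,2,v)) with 7 as the next unused vertex produces the face RK (5,2,7) and leaves 8 as the next unused vertex, whereas a SafeUpdate leaves the vertex supply untouched.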

Forcible Class Operations

The Forcible class operations are higher order and designed to allow for easy additions of further generic operations. They take care of conversions between Tgraphs, BoundaryStates and ForceStates. The first two are designed to create functions that return the same Forcible type as the input.

    class Forcible a where
      tryFSOp :: (ForceState -> Try ForceState) -> a -> Try a
      tryChangeBoundary :: (BoundaryState -> Try BoundaryChange) -> a -> Try a
      tryInitFS :: a -> Try ForceState

For example, given any f:: ForceState -> Try ForceState , then f can be generalised to work on any Forcible using tryFSOp f. This is used to define both tryForce and tryStepForce.

Similarly given any f:: BoundaryState -> Try BoundaryChange , then f can be generalised to work on any Forcible using tryChangeBoundary f. This is used to define tryAddHalfDart and tryAddHalfKite.

Note that the type BoundaryChange contains a resulting BoundaryState, the single TileFace that has been added, a list of edges removed from the boundary (of the BoundaryState prior to the face addition), and a list of the (3 or 4) boundary edges affected around the change that require checking or re-checking for updates.

The class function tryInitFS will create an initial ForceState for any Forcible. If the Forcible is already a ForceState it will do nothing. Otherwise it will calculate updates for the whole boundary using defaultAllUGen.

The update generator is assumed to be defaultAllUGen but this can be changed using

    tryFSOpWith :: Forcible a => UpdateGenerator -> (ForceState -> Try ForceState) -> a -> Try a

so, for example, we defined

    tryForceWith ugen = tryFSOpWith ugen tryForce

Efficient chains of forcing operations.

Note that (force . force) does the same as force, but we might want to chain other force related steps in a calculation.

For example, consider the following combination which, after decomposing a Tgraph, forces, then adds a half dart on a given boundary edge (d) and then forces again.

    combo :: Dedge -> Tgraph -> Tgraph
    combo d = force . addHalfDart d . force . decompose

Since decompose produces a Tgraph, the instances of force and addHalfDart d will have type Tgraph -> Tgraph, so each of these operations will begin and end with conversions between Tgraph and ForceState. We would do better to avoid these wasted intermediate conversions by working only with ForceStates, keeping only the necessary conversions at the beginning and end of the whole sequence.

This can be done using tryFSOp. To see this, let us first re-express the forcing sequence using the Try monad, so

    force . addHalfDart d . force

becomes

    tryForce <=< tryAddHalfDart d <=< tryForce

Note that (<=<) is the Kleisli composition operator, which replaces function composition for monads (defined in Control.Monad). (We could also have expressed this right to left sequence with a left to right version tryForce >=> tryAddHalfDart d >=> tryForce). The definition of combo becomes

    combo :: Dedge -> Tgraph -> Tgraph
    combo d = runTry . (tryForce <=< tryAddHalfDart d <=< tryForce) . decompose

This has no performance improvement, but now we can pass the sequence to tryFSOp to remove the unnecessary conversions between steps.

    combo :: Dedge -> Tgraph -> Tgraph
    combo d = runTry . tryFSOp (tryForce <=< tryAddHalfDart d <=< tryForce) . decompose

The sequence actually has type Forcible a => a -> Try a but when passed to tryFSOp it specialises to type ForceState -> Try ForceState. This ensures the sequence works on a ForceState and any conversions are confined to the beginning and end of the sequence, avoiding unnecessary intermediate conversions.
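For readers less familiar with Kleisli composition, the toy example below shows the two operators on steps of type Int -> Try Int, independent of any tiling code. The functions tryHalve and tryAddOne and the helper result are made up for this illustration. The point is only the order of evaluation and the fact that a failing step stops the chain.

```haskell
import Control.Monad ((<=<), (>=>))

type Try a = Either ShowS a

-- A step that can fail.
tryHalve :: Int -> Try Int
tryHalve n
  | even n    = Right (n `div` 2)
  | otherwise = Left (("tryHalve: odd argument " <> show n) <>)

-- A step that always succeeds.
tryAddOne :: Int -> Try Int
tryAddOne = Right . (+ 1)

addThenHalve, halveThenAdd :: Int -> Try Int
addThenHalve = tryHalve <=< tryAddOne   -- right to left: add one first
halveThenAdd = tryHalve >=> tryAddOne   -- left to right: halve first

-- Turn the ShowS report into a String so results can be compared and shown.
result :: Try a -> Either String a
result = either (\report -> Left (report "")) Right
```

Applied to 3, addThenHalve succeeds with 2, while halveThenAdd fails immediately with the report from tryHalve and never runs tryAddOne.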

A limitation of forcing

To avoid creating touching vertices (or crossing boundaries) a BoundaryState keeps track of locations of boundary vertices. At around 35,000 face additions in a single force operation the calculated positions of boundary vertices can become too inaccurate to prevent touching vertex problems. In such cases it is better to use

    recalibratingForce :: Forcible a => a -> a
    tryRecalibratingForce :: Forcible a => a -> Try a

These work by recalculating all vertex positions at 20,000 step intervals to get more accurate boundary vertex positions. For example, 6 decompositions of the kingGraph has 2,906 faces. Applying force to this should result in 53,574 faces but will go wrong before it reaches that. This can be fixed by calculating either

    recalibratingForce (decompositions kingGraph !!6)

or using an extra force before the decompositions

    force (decompositions (force kingGraph) !!6)

In the latter case, the final force only needs to add 17,864 faces to the 35,710 produced by decompositions (force kingGraph) !!6.

6. Advanced Operations

Guided comparison of Tgraphs

Asking if two Tgraphs are equivalent (the same apart from choice of vertex numbers) is an NP-complete problem. However, we do have an efficient guided way of comparing Tgraphs. In the module Tgraph.Relabelling we have

    sameGraph :: (Tgraph,Dedge) -> (Tgraph,Dedge) -> Bool

The expression sameGraph (g1,d1) (g2,d2) asks if g2 can be relabelled to match g1 assuming that the directed edge d2 in g2 is identified with d1 in g1. Hence the comparison is guided by the assumption that d2 corresponds to d1.

It is implemented using

    tryRelabelToMatch :: (Tgraph,Dedge) -> (Tgraph,Dedge) -> Try Tgraph

where tryRelabelToMatch (g1,d1) (g2,d2) will either fail with a Left report if a mismatch is found when relabelling g2 to match g1 or will succeed with Right g3 where g3 is a relabelled version of g2. The successful result g3 will match g1 in a maximal tile-connected collection of faces containing the face with edge d1 and have vertices disjoint from those of g1 elsewhere. The comparison tries to grow a suitable relabelling by comparing faces one at a time starting from the face with edge d1 in g1 and the face with edge d2 in g2. (This relies on the fact that Tgraphs are connected with no crossing boundaries, and hence tile-connected.)

The above function is also used to implement

    tryFullUnion:: (Tgraph,Dedge) -> (Tgraph,Dedge) -> Try Tgraph

which tries to find the union of two Tgraphs guided by a directed edge identification. However, there is an extra complexity arising from the fact that Tgraphs might overlap in more than one tile-connected region. After calculating one overlapping region, the full union uses some geometry (calculating vertex locations) to detect further overlaps.

Finally we have

    commonFaces:: (Tgraph,Dedge) -> (Tgraph,Dedge) -> [TileFace]

which will find common regions of overlapping faces of two Tgraphs guided by a directed edge identification. The resulting common faces will be a sub-collection of faces from the first Tgraph. These are returned as a list as they may not be a connected collection of faces and therefore not necessarily a Tgraph.

Empires and SuperForce

In Empires and SuperForce we discussed forced boundary coverings which were used to implement both a superForce operation

    superForce:: Forcible a => a -> Forced a

and operations to calculate empires.

We will not repeat the descriptions here other than to note that

    forcedBoundaryECovering:: Tgraph -> [Forced Tgraph]

finds boundary edge coverings after forcing a Tgraph. That is, forcedBoundaryECovering g will first force g, then (if it succeeds) find a collection of (forced) extensions to force g such that

each extension has the whole boundary of force g as internal edges.
each possible addition to a boundary edge of force g (kite or dart) has been included in the collection.

(possible here means – not leading to a stuck Tgraph when forced.) There is also

    forcedBoundaryVCovering:: Tgraph -> [Forced Tgraph]

which does the same except that the extensions have all boundary vertices internal rather than just the boundary edges. In both cases the result is a list of explicitly forced Tgraphs (discussed next).

Combinations and Explicitly Forced

We introduced a new type Forced (in v 1.3) to enable a forcible to be explicitly labelled as being forced. For example

    forceF    :: Forcible a => a -> Forced a 
    tryForceF :: Forcible a => a -> Try (Forced a)
    forgetF   :: Forced a -> a

This allows us to restrict certain functions which expect a forced argument by making this explicit.

    composeF :: HasGraph a => Forced a -> Forced Tgraph

The definition makes use of theorems established in Graphs, Kites and Darts and Theorems that composing a forced Tgraph does not require a check (for connectedness and no crossing boundaries) and the result is also forced. This can then be used to define efficient combinations such as

    compForce:: (Forcible a, HasGraph a) => a -> Forced Tgraph      -- compose after forcing
    compForce = composeF . forceF

    allCompForce:: (Forcible a, HasGraph a) => a -> [Forced Tgraph] -- iterated (compose after force) while not emptyTgraph
    maxCompForce:: (Forcible a, HasGraph a) => a -> Forced Tgraph   -- last item in allCompForce (or emptyTgraph)

Note that BoundaryState, ForceState as well as Tgraph and Forced versions of these are all instances of class HasGraph.
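The idea of labelling a value as forced is an instance of a common newtype pattern, which can be shown with a toy example. In the sketch below, sorting a list plays the role of forcing, and smallestF plays the role of a function (like composeF) that may skip a check because its argument is known to be processed. All names other than Forced and forgetF are invented for this illustration, and the real constructor would be hidden by the module's export list.

```haskell
import Data.List (sort)

-- A wrapper recording that the contents have been "forced".
-- In a real module the constructor would not be exported.
newtype Forced a = Forced a

forgetF :: Forced a -> a
forgetF (Forced x) = x

-- Toy stand-in for forcing: normalise a list by sorting it.
toyForce :: [Int] -> [Int]
toyForce = sort

-- The only way (in this sketch) to obtain a Forced list.
toyForceF :: [Int] -> Forced [Int]
toyForceF = Forced . toyForce

-- Relies on the argument being sorted, so needs no check of its own.
smallestF :: Forced [Int] -> Maybe Int
smallestF fx = case forgetF fx of
  []      -> Nothing
  (x : _) -> Just x
```

A caller cannot pass an unsorted list to smallestF without going through toyForceF, which is the same guarantee that Forced gives to functions expecting a forced Tgraph.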

Tracked Tgraphs

The type

    data TrackedTgraph = TrackedTgraph
       { tgraph  :: Tgraph
       , tracked :: [[TileFace]] 
       } deriving Show

has proven useful in experimentation as well as in producing artwork with darts and kites. The idea is to keep a record of sub-collections of faces of a Tgraph when doing both force operations and decompositions. A list of the sub-collections forms the tracked list associated with the Tgraph. We make TrackedTgraph an instance of class Forcible by having force operations only affect the Tgraph and not the tracked list. The significant idea is the implementation of

    decomposeTracked :: TrackedTgraph -> TrackedTgraph

Decomposition of a Tgraph involves introducing a new vertex for each long edge and each kite join. These are then used to construct the decomposed faces. For decomposeTracked we do the same for the Tgraph, but when it comes to the tracked collections, we decompose them re-using the same new vertex numbers calculated for the edges in the Tgraph. This keeps a consistent numbering between the Tgraph and tracked faces, so each item in the tracked list remains a sub-collection of faces in the Tgraph.

The function

    drawTrackedTgraph :: [VPatch -> Diagram B] -> TrackedTgraph -> Diagram B

is used to draw a TrackedTgraph. It uses a list of functions to draw VPatches. The first drawing function is applied to a VPatch for any untracked faces. Subsequent functions are applied to VPatches for the tracked list in order. Each diagram is beneath later ones in the list, with the diagram for the untracked faces at the bottom. The VPatches used are all restrictions of a single VPatch for the Tgraph, so will be consistent in vertex locations. When labels are used, there is also a drawTrackedTgraphRotating and drawTrackedTgraphAligning for rotating or aligning the VPatch prior to applying the drawing functions.

Note that the result of calculating empires (see Empires and SuperForce) is represented as a TrackedTgraph. The result is actually the common faces of a forced boundary covering, but a particular element of the covering (the first one) is chosen as the background Tgraph with the common faces as a tracked sub-collection of faces. Hence we have

    empire1, empire2 :: Tgraph -> TrackedTgraph
    
    drawEmpire :: TrackedTgraph -> Diagram B

Figure 10 was also created using TrackedTgraphs.

Figure 10: Using a TrackedTgraph for drawing

7. Other Reading

Previous related blogs are:

Diagrams for Penrose Tiles – the first blog introduced drawing Pieces and Patches (without using Tgraphs) and provided a version of decomposing for Patches (decompPatch).
Graphs, Kites and Darts introduced Tgraphs. This gave more details of implementation and results of early explorations. (The class Forcible was introduced subsequently).
Empires and SuperForce – these new operations were based on observing properties of boundaries of forced Tgraphs.
Graphs,Kites and Darts and Theorems established some important results relating force, compose, decompose.

Constructing Clifford Algebras using the Super Tensor Product

2026-04-23T00:28:40Z

Google have stopped supporting the Chart API so all of the mathematics notation below is missing. There is a PDF version of this article at GitHub.

Some literate Haskell but little about this code is specific to Haskell...

> {-# LANGUAGE DataKinds #-}
> {-# LANGUAGE TypeFamilies #-}
> {-# LANGUAGE TypeOperators #-}
> {-# LANGUAGE UndecidableInstances #-}
> 
> import GHC.TypeLits

Introduction

This is a followup to Geometric Algebra for Free and More Low Cost Geometric Algebra.

In those articles I showed how you could build up the Clifford algebras like so:

type Cliff1  = Complex R
type Cliff1' = Split R
type Cliff2  = Quaternion R
type Cliff2' = Matrix R
type Cliff3  = Quaternion Cliff1'
type Cliff3' = Matrix Cliff1
type Cliff4  = Quaternion Cliff2'
type Cliff4' = Matrix Cliff2
type Cliff5  = Quaternion Cliff3'
...

I used CliffN as the Clifford algebra for a negative definite inner product and CliffN' for the positive definite case. It's not a completely uniform sequence in the sense that CliffN is built from CliffN' for dimension two lower and you use a mix of Matrix and Quaternion.

The core principle making this work is that for type constructors implemented like Matrix, Quaternion etc. we have the property that applying the constructor to an algebra is effectively the same as tensoring with it,

e.g. Matrix (Quaternion Float) is effectively the same thing as Matrix Float $\otimes$ Quaternion Float.

But John Baez pointed out to me that you can build up the CliffN algebras much more simply enabling us to use these definitions:

> type Cliff1 = Complex Float
> type Cliff2 = Complex Cliff1
> type Cliff3 = Complex Cliff2
> type Cliff4 = Complex Cliff3
> type Cliff5 = Complex Cliff4

...

Or even better:

> type family Cliff (n :: Nat) :: * where
>   Cliff 0 = Float
>   Cliff n = Complex (Cliff (n - 1))

But there's one little catch. We have to work, not with the tensor product, but the super tensor product.

We define Complex the same way as before:

> data Complex a = C a a deriving (Eq, Show)

Previously we used a definition of multiplication like this:

instance Num a => Num (Complex a) where
  C a b * C c d = C (a * c - b * d) (a * d + b * c)

We can think of C a b in Complex R as representing the element $1\otimes a+i\otimes b$. The definition of multiplication in a tensor product of algebras is

\[(a\otimes b)(c\otimes d)=(ac)\otimes(bd).\]

So we have

\[(1\otimes a+i\otimes b)(1\otimes c+i\otimes d)\]

\[=1\otimes ac+i\otimes ad+i\otimes bc+i^2\otimes bd\]

\[=1\otimes(ac-bd)+i\otimes(ad+bc).\]

This means that line of code we wrote above defining * for Complex isn't simply a definition of multiplication of complex numbers, it says how to multiply in an algebra tensored with the complex numbers.
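To see what the ordinary tensor product rule gives when iterated, here is a small self-contained sketch. The names outerI and innerI are introduced here for illustration only. With the plain rule, the imaginary unit of the outer Complex and that of the inner Complex commute, so two layers of Complex do not yet give the anticommuting generators a Clifford algebra needs.

```haskell
data Complex a = C a a deriving (Eq, Show)

-- The ordinary (non-super) rule quoted above, made into a full instance.
instance Num a => Num (Complex a) where
  fromInteger n = C (fromInteger n) 0
  C a b + C c d = C (a + c) (b + d)
  C a b * C c d = C (a * c - b * d) (a * d + b * c)
  negate (C a b) = C (negate a) (negate b)
  abs = error "abs: not defined for Complex"
  signum = error "signum: not defined for Complex"

-- i in the outer copy of the complex numbers, and i in the inner copy.
outerI, innerI :: Complex (Complex Float)
outerI = C 0 1
innerI = C (C 0 1) 0

-- True: under the plain tensor rule the two units commute.
unitsCommute :: Bool
unitsCommute = outerI * innerI == innerI * outerI
```

Both units still square to -1, but because they commute their product squares to +1.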

Let's go Super!

A superalgebra is an algebra graded by $\mathbb{Z}_2$ where $\mathbb{Z}_2$ is the ring of integers modulo 2. What that means is that we have some algebra $A$ that can be broken down as a direct sum $A_0\oplus A_1$ (the subscripts live in $\mathbb{Z}_2$) with the property that multiplication respects the grading, ie. if $x$ is in $A_i$ and $y$ is in $A_j$ then $xy$ is in $A_{i+j}$.

The elements of $A_0$ are called "even" (or bosonic) and those in $A_1$ "odd" (or fermionic). Often even elements commute with everything and odd elements anticommute with each other but this isn't always the case. (The superalgebra is said to be supercommutative when this happens. This is a common pattern: a thing X becomes a superX if it has odd and even parts and swapping two odd things introduces a sign flip.)

The super tensor product is much like the tensor product but it respects the grading. This means that if $a$ is in $A_i$ and $b$ is in $B_j$ then $a\otimes b$ is in $(A\otimes B)_{i+j}$. From now on I'm using $\otimes$ to mean super tensor product.

Multiplication in the super tensor product of two superalgebras $A$ and $B$ is now defined by the following modified rule: if $b$ is in $B_i$ and $c$ is in $A_j$ then $(a\otimes b)(c\otimes d)=(-1)^{ij}(ac)\otimes(bd)$. Note that the sign flip arises when we shuffle an odd $c$ left past an odd $b$.

The neat fact that John pointed out to me is that

\[Cliff_n=\mathbb{C}\otimes\mathbb{C}\otimes\ldots\text{ n times }\ldots\otimes\mathbb{C}.\]

We have to modify our definition of * to take into account that sign flip.

I initially wrote a whole lot of code to define a superalgebra as a pair of algebras with four multiplication operations and it got a bit messy. But I noticed that the only specifically superalgebraic operation I ever performed on an element of a superalgebra was negating the odd part of an element.

So I could define SuperAlgebra like so:

class SuperAlgebra a where
  conjugation :: a -> a

where conjugation is the negation of the odd part.

(I'm not sure if this operation corresponds to what is usually called conjugation in this branch of mathematics.)

But there's a little efficiency optimization I want to write. If I used the above definition, then later I'd often find myself computing a whole lot of negates in a row. This means applying negate to many elements of large algebraic objects even though any pair of them cancel each other's effect. So I add a little flag to my conjugation function that is used to say we want an extra negate and we can accumulate flips of a flag rather than flips of lots of elements.

> class SuperAlgebra a where
>   conjugation :: Bool -> a -> a

Here's our first instance:

> instance SuperAlgebra Float where
>   conjugation False x = x
>   conjugation True x = negate x

This is saying that the conjugation is the identity on Float but if we want to perform an extra flip we can set the flag to True. Maybe I should call it conjugationWithOptionalExtraNegation.

And now comes the first bit of non-trivial superalgebra:

> instance (Num a, SuperAlgebra a) => SuperAlgebra (Complex a) where
>   conjugation e (C a b) = C (conjugation e a) (conjugation (not e) b)

We consider $1$ to be even and $i$ to be odd. When we apply the conjugation to $1\otimes a$ then we can just apply it directly to $a$. But tensoring with $i$ flips the "parity" of $b$ (because tensor product respects the grading) so we need to swap the flag when we use the conjugation on $b$. And that should explain why conjugation is defined the way it is.

Now we can use the modified rule for multiplication in $\otimes$ defined above:

> instance (Num a, SuperAlgebra a) => Num (Complex a) where
>   fromInteger n = C (fromInteger n) 0
>   C a b + C a' b' = C (a + a') (b + b')
>   C a b * C c d = C (a * c - conjugation False b * d)
>                     (conjugation False a * d + b * c) 
>   negate (C a b) = C (negate a) (negate b)
>   abs = undefined
>   signum = undefined

For example, conjugation False is applied to the first $b$ on the RHS because $d$ implicitly represents an $i\otimes d$ term and when expanding out the product we shuffle the (odd) $i$ in $i\otimes d$ left of $b$. It doesn't get applied to the second $b$ because $b$ and $c$ remain in the same order.
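As a check on the instance above, here is the product expanded term by term, in the same style as the earlier expansion for the ordinary tensor product. The notation $\bar{x}$ is introduced only for this sketch: it stands for conjugation False x, i.e. $x$ with its odd part negated.

\[(1\otimes a+i\otimes b)(1\otimes c+i\otimes d)\]

\[=1\otimes ac+i\otimes\bar{a}d+i\otimes bc+i^2\otimes\bar{b}d\]

\[=1\otimes(ac-\bar{b}d)+i\otimes(\bar{a}d+bc).\]

The bars appear exactly in the two terms where the odd $i$ from the right-hand factor has been moved left past $a$ or $b$. In the other two terms only the even element $1$ is moved, so no sign appears. The final line matches the two components in the definition of * above.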

That's it!

Tests

I'll test it with some examples from Cliff3:

> class HasBasis a where
>   e :: Integer -> a


> instance HasBasis Float where
>   e = undefined


> instance (Num a, HasBasis a) => HasBasis (Complex a) where
>   e 0 = C 0 1
>   e n = C (e (n - 1)) 0


> make a b c d e f g h =
>   C (C (C a b) (C c d))
>     (C (C e f) (C g h))


> e1, e2, e3, e21, e31, e32, e321 :: Cliff 3
> e1 = e 0
> e2 = e 1
> e21 = e2 * e1
> e3 = e 2
> e31 = e3 * e1
> e32 = e3 * e2
> e321 = e3 * e2 * e1


> main = do
>     print (e1 * e1 + 1 == 0)
>     print (e31 * e31 + 1 == 0)
>     print (e3 * e3 + 1 == 0)
>     print (e21 * e21 + 1 == 0)
>     print (e2 * e2 + 1 == 0)
>     print (e32 * e32 + 1 == 0)
>     print (e321 * e321 - 1 == 0)
>     print (e3 * e2 * e1 - e321 == 0)
>     print (e2 * e1 - e21 == 0)
>     print (e3 * e1 - e31 == 0)
>     print (e3 * e2 - e32 == 0)
>     print (e21 * e32 - e31 == 0)
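The tests above check squares and products. A further property one might want to see is that distinct generators anticommute. The following is a self-contained sketch that repeats the definitions from this article as plain (non-literate) Haskell for a two-generator algebra. The names Cliff2, g1, g2 and anticommutator are introduced here for illustration and are not part of the original code.

```haskell
data Complex a = C a a deriving (Eq, Show)

class SuperAlgebra a where
  conjugation :: Bool -> a -> a

instance SuperAlgebra Float where
  conjugation flipSign x = if flipSign then negate x else x

instance SuperAlgebra a => SuperAlgebra (Complex a) where
  conjugation e (C a b) = C (conjugation e a) (conjugation (not e) b)

-- Multiplication with the super tensor product sign rule.
instance (Num a, SuperAlgebra a) => Num (Complex a) where
  fromInteger n = C (fromInteger n) 0
  C a b + C a' b' = C (a + a') (b + b')
  C a b * C c d = C (a * c - conjugation False b * d)
                    (conjugation False a * d + b * c)
  negate (C a b) = C (negate a) (negate b)
  abs = error "abs: not defined for Complex"
  signum = error "signum: not defined for Complex"

type Cliff2 = Complex (Complex Float)

-- The two generators, built the same way as e 0 and e 1 above.
g1, g2 :: Cliff2
g1 = C 0 1
g2 = C (C 0 1) 0

-- x*y + y*x, which is zero for distinct generators.
anticommutator :: Cliff2 -> Cliff2 -> Cliff2
anticommutator x y = x * y + y * x
```

In GHCi, anticommutator g1 g2 == 0 evaluates to True, as do g1 * g1 == -1 and g2 * g2 == -1.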

Observation

The implementation of multiplication looks remarkably like it's the Cayley-Dickson construction. It can't be (because iterating it three times gives you a non-associative algebra but the Clifford algebras are associative). Nonetheless, I think comparison with Cayley-Dickson may be useful.

Efficiency

As mentioned above, before I realised I just needed the conjugation operation I wrote the above code with an explicit split of a superalgebra into two pieces intertwined by four multiplications. I think the previous approach may have a big advantage - it may be possible to use variations on the well known "speed-up" of complex multiplication that uses three real multiplications instead of four. This should lead to a fast implementation of Clifford algebras.

Also be warned: you can kill GHC if you turn on optimization and try to multiply elements of high-dimensional Clifford algebras. I think it tries to inline absolutely everything and you end up with a block of code that grows exponentially with $n$.

Note also that this code translates directly into many languages.

Self-referential logic via self-referential circuits

2026-04-22T23:29:47Z

Introduction

TL;DR The behaviour of a certain kind of delay component has a formal similarity to Löb's theorem which gives a way to embed part of provability logic into electronic circuits.

Here's a famous paradoxical sentence:

This sentence is false

If it's false then it's true and if it's true then it's false.

Here's a paradoxical electronic circuit:

The component in the middle is an inverter. If the output of the circuit is high then its input is high and then its output must be low, and vice versa.

There's a similarity here. But with a bit of tweaking you can turn the similarity into an isomorphism of sorts.

In the first case we avoid paradox by noting that in the mathematical frameworks commonly used by mathematicians it's impossible, in general, for a statement to assert its own falsity. Instead, a statement can assert its own unprovability and then we get Gödel's incompleteness theorems and a statement that is apparently true and yet can't be proved.

In the second case we can't model the circuit straightforwardly as a digital circuit. In practice it might settle down to a voltage that lies between the official high and low voltages so we have to model it as an analogue circuit. Or instead we can introduce a clock and arrange that the feedback in the circuit is delayed. We then get an oscillator circuit that can be thought of as outputting a stream of bits.

The observation I want to make is that if the feedback delay is defined appropriately, these two scenarios are in some sense isomorphic. This means that we can model classic results about provability, like Gödel's incompleteness theorems, using electronic circuits. We can even use such circuits to investigate what happens when logicians or robots play games like Prisoner's Dilemma. I'll be making use of results found in Boolos' book on The Logic of Provability and some ideas I borrowed from Smoryński's paper on Fixed Point Algebras. I'll be assuming the reader has at least a slight acquaintance with the ideas behind provability logic.

Provability Logic

There are many descriptions of provability logic (aka GL) available online, so I'm not going to repeat it all here. However, I've put some background material in the appendix below and I'm going to give a very brief reminder now.

Start with (classical) propositional calculus which has a bunch of variables with names like $a, b, c, d, \ldots$ and connectives like $\wedge$ for AND, $\vee$ for OR, $\neg$ for NOT and $\rightarrow$ for implication. (Note that $a\rightarrow b = \neg a\vee b$.)

Provability logic extends propositional calculus by adding a unary operator $\Box$. (I apologise, that's meant to be a □ but it's coming out like $\Box$ in LaTeX formulae. I think it's a bug in Google's LaTeX renderer.) The idea is that $\Box p$ asserts that $p$ is provable in Peano Arithmetic, aka PA. In addition to the axioms of propositional calculus we have

$\Box(p\rightarrow q)\rightarrow\Box p\rightarrow\Box q$

and

$\Box p\rightarrow\Box\Box p$

as well as a rule that allows us to deduce $\Box p$ from $p$.

We also have this fixed point property:

Let $F(p)$ be any predicate we can write in the language of GL involving the variable $p$, and suppose that every appearance of $p$ in $F(p)$ is inside a $\Box$, e.g. $F(p)=\Box p\vee\Box(\neg p)$. Then there is a fixed point, i.e. a proposition $q$ that makes no mention of $p$ such that $q\leftrightarrow F(q)$ is a theorem. In effect, for any such $F$, $q$ is a proposition that asserts $F(q)$.

See the appendix for a brief mention of why we should expect this to be true.

From the fixed point property we can deduce Löb's theorem: $\Box(\Box p\rightarrow p)\rightarrow\Box p$. There is a proof at wikipedia that starts from the fixed point property.

We can also deduce the fixed point property from Löb's theorem so it's more usual to take Löb's theorem as an axiom of GL and show that the fixed point property follows. You can think of Löb's theorem as a cunning way to encode the fixed point property. In fact you can argue that it's a sort of Y-combinator, the function that allows the formation of recursive fixed points in functional programming languages. (That's also, sort of, the role played by the loeb function I defined way back. But note that loeb isn't really a proof of Löb's theorem, it just has formal similarities.)

Back to electronic circuits

In order to make digital circuits with feedback loops well-behaved I could introduce a circuit element that results in a delay of one clock cycle. If you insert one of these into the inverter circuit I started with you'll end up with an oscillator that flips back and forth between 0 and 1 on each clock cycle. But I want to work with something slightly stricter. I'd like my circuits to eventually stop oscillating. (I have an ulterior motive for studying these.) Let me introduce this component:

It is intended to serve as a delayed latch and I'll always have the flow of data being from left to right. The idea is that when it is switched on it outputs 1. It keeps outputting 1 until it sees a 0 input. When that happens, then on the next clock cycle its output drops to 0 and never goes back up to 1 until reset.

Because the output of our delay-latch isn't a function of its current input, we can't simply describe its operation as a mathematical function from $\{0,1\}$ to $\{0,1\}$. Instead let's think of electronic components as binary operators on bitstreams, i.e. infinite streams of binary digits like ...00111010 with the digits emerging over time starting with the one written on the right and working leftwards. The ordinary logic gates perform bitwise operations which I'll represent using the operators in the C programming language. For example,

...001110 & ...101010 = ...001010

and

~...101 = ...010

and so on. Let's use □ to represent the effect of latch-delay on a bitstream. We have, for example,

□...000 = ...001

and

□...11101111 = ...00011111.

The operator □ takes the (possibly empty) contiguous sequence of 1's at the end of the bitstream, extends it by one 1, and sets everything further to the left to 0. If we restrict ourselves to bitstreams that eventually become all 0's or all 1's on the left, then bitstreams are in one-to-one correspondence with the integers using the two's complement representation. For example ...111111, all 1's, represents the number -1. I'll simply call the bitstreams that represent integers integers. With this restriction we can use a classic C hacker trick to write □p=p^(p+1) where ^ is the C XOR operator. The operator □ outputs the bits that get flipped when you add one.
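The XOR trick carries over directly to Haskell, where Integer behaves as an unbounded two's complement number under Data.Bits. The following is a minimal sketch; the names box and examples are chosen here for illustration.

```haskell
import Data.Bits (xor)

-- The delay-latch operator on bitstreams that represent integers:
-- the bits that flip when one is added to p.
box :: Integer -> Integer
box p = p `xor` (p + 1)

-- The examples from the text, as (input, output) pairs:
--   ...000      =   0  maps to  ...001      =  1
--   ...111      =  -1  maps to  ...111      = -1
--   ...11101111 = -17  maps to  ...00011111 = 31
examples :: [(Integer, Integer)]
examples = [(p, box p) | p <- [0, -1, -17]]
```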

Let's use the symbol → so that a → b is shorthand for ~a|b. Here are some properties of □:

1. □(-1) = -1

2. □p → □□p = -1

3. □(p → q) → □p → □q = -1

In addition we have the fixed point property:

Let F(p) be any function of p we can write using □ and the bitwise logical operators and such that all occurrences of p occur inside □. Then there is a unique bitstream q such that q=F(q).

We can make this clearer if we return to circuits. F(p) can be thought of as a circuit that takes p as input and outputs some value. We build the circuit using only boolean logic gates and delay-latch. We allow feedback loops, but only ones that go through delay-latches. With these restrictions it's pretty clear that the circuit is well-behaved and deterministically outputs a bitstream.

We also have the Löb property:

4. □(□p → p) → □p = -1

We can see this by examining the definition of □. Intuitively it says something like "once □ has seen a 0 input then no amount of setting input bits to 1 later in the stream makes any difference to its output".
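Properties 1 to 4 can be spot-checked by brute force over a range of integers. This is only a finite check, not a proof, and the function names below are chosen here for illustration.

```haskell
import Data.Bits (complement, xor, (.|.))

box :: Integer -> Integer
box p = p `xor` (p + 1)

-- a → b as shorthand for ~a|b
implies :: Integer -> Integer -> Integer
implies a b = complement a .|. b

-- Each property holds when the expression equals -1 (a stream of all 1s).
prop1 :: Bool
prop1 = box (-1) == -1

prop2 :: Integer -> Bool
prop2 p = (box p `implies` box (box p)) == -1

prop3 :: Integer -> Integer -> Bool
prop3 p q = (box (p `implies` q) `implies` (box p `implies` box q)) == -1

-- The Löb property (4).
lob :: Integer -> Bool
lob p = (box (box p `implies` p) `implies` box p) == -1

-- Check all four properties for every p and q in [-n .. n].
checkAll :: Integer -> Bool
checkAll n =
  prop1 && all prop2 r && and [prop3 p q | p <- r, q <- r] && all lob r
  where
    r = [-n .. n]
```

For example, checkAll 64 tests every pair of integers between -64 and 64.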

I hope you've noticed something curious. These properties are extremely close to the properties of $\Box$ in GL. In fact, these electronic circuits form a model of the part of GL that doesn't involve variable names, i.e. what's known as letterless GL. We can formalise this:

1. Map $\bot$ to a wire set to 0, which outputs ...000 = 0.

2. Map $\top$ to a wire set to 1, which outputs ...111 = -1.

3. Map $p \circ q$, where $\circ$ is a binary connective, by creating a circuit that takes the outputs from the circuits for $p$ and $q$ and passes them into the corresponding boolean logic gate.

4. Map $\Box p$ to the circuit for $p$ piped through a delay-latch.

For example, let's convert $\Box(\Box\bot\rightarrow\bot)\rightarrow\Box\bot$ into a circuit. I'm translating $a\rightarrow b$ to the circuit for $\neg a\vee b$.

I'm using red wires to mean wires carrying the value 1 rather than 0. I hope you can see that this circuit eventually settles into a state that outputs nothing but 1s.

We have this neat result:

Because delay-latch satisfies the same equations as $\Box$ in provability logic, any theorem, translated into a circuit, will produce a bitstream of just 1s, i.e. -1.

But here's a more surprising result: the converse is true.

If the circuit corresponding to a letterless GL proposition produces a bitstream of just 1s then the proposition is actually a theorem of GL.

I'm not going to prove this. (It's actually a disguised form of lemma 7.4 on p.95 of Boolos' book.) In the pictured example we got ...1111, so the circuit represents a theorem. As it represents Löb's theorem for the special case $p=\bot$ we should hope so. More generally, any bitstream that represents an integer can be converted back into a proposition that is equivalent to the original proposition. This means that bitstreams faithfully represent propositions of letterless GL. I'm not going to give the translation here but it's effectively given in Chapter 7 of Boolos. I'll use $\psi(p)$ to represent the translation from propositions to bitstreams via circuits that I described above. Use $\phi(b)$ to represent the translation of bitstream $b$ back into propositions. We have $p\leftrightarrow\phi(\psi(p))$. But I haven't given a full description of $\phi$ and I haven't proved here that it has this property.
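The translation $\psi$ from letterless propositions to integers can be sketched as a small evaluator. The Prop data type and the names psi, outputsAllOnes and lobAtBot are assumptions made here for illustration; they are not taken from the provability library mentioned later. Reading an output of -1 as theoremhood relies on the results quoted above, which are not proved here.

```haskell
import Data.Bits (complement, xor, (.&.), (.|.))

-- Letterless propositions of GL (hypothetical representation).
data Prop
  = Bot
  | Top
  | Not Prop
  | And Prop Prop
  | Or Prop Prop
  | Imp Prop Prop
  | Box Prop
  deriving Show

-- Translate a proposition to the integer whose two's complement bits
-- are the output of the corresponding circuit.
psi :: Prop -> Integer
psi Bot       = 0
psi Top       = -1
psi (Not p)   = complement (psi p)
psi (And p q) = psi p .&. psi q
psi (Or p q)  = psi p .|. psi q
psi (Imp p q) = complement (psi p) .|. psi q
psi (Box p)   = let x = psi p in x `xor` (x + 1)

-- True when the circuit outputs nothing but 1s.
outputsAllOnes :: Prop -> Bool
outputsAllOnes p = psi p == -1

-- Löb's theorem for the special case p = ⊥, as in the pictured example.
lobAtBot :: Prop
lobAtBot = Imp (Box (Imp (Box Bot) Bot)) (Box Bot)
```

Here outputsAllOnes lobAtBot is True, while psi (Box Bot) is 1, so $\Box\bot$ on its own does not produce a stream of all 1s.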

Circuits with feedback

In the previous section I considered letterless propositions of GL. When these are translated into circuits they don't have feedback loops. But we can also "solve equations" in GL using circuits with feedback. The GL fixed point theorem above says that we can "solve" the equation $p\leftrightarrow F(p)$, with one letter $p$, to produce a letterless proposition $q$ such that $q\leftrightarrow F(q)$. Note here that $p$ is a letter in the language of GL. But I'm using $q$ to represent a proposition in letterless GL. If we build a circuit to represent $F$, and feed its output back into where $p$ appears, then the output bitstream represents the fixed point. Here's a translation of the equation $p \leftrightarrow \neg(\Box p \vee \Box\Box\Box p)$:

I'll let you try to convince yourself that such circuits always eventually output all 0's or all 1's. When we run the circuit we get the output ...1111000 = -8. As this is not -1 we know that the fixed point isn't a theorem. If I'd defined $\phi$ above you could use it to turn the bitstream back into a proposition.
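A circuit with feedback can be simulated with lazy lists, because the output of a delay-latch at each clock cycle depends only on earlier inputs. The sketch below represents a bitstream as an infinite list with the earliest bit first (the rightmost digit in the notation above). The names Stream, latch, circuit, fixedPointStream and asInteger are chosen here for illustration, and asInteger assumes the stream has already settled to all 0's or all 1's within the first n bits.

```haskell
-- Bitstreams as infinite lazy lists, earliest bit first.
type Stream = [Bool]

-- Delay-latch: outputs 1 until one cycle after it first sees a 0.
latch :: Stream -> Stream
latch = scanl (&&) True

-- The circuit for F(p) = ¬(□p ∨ □□□p).
circuit :: Stream -> Stream
circuit p = map not (zipWith (||) (latch p) (latch (latch (latch p))))

-- Feed the output back into the input. This is productive because
-- every occurrence of the input sits behind at least one latch.
fixedPointStream :: (Stream -> Stream) -> Stream
fixedPointStream f = let q = f q in q

-- Read the first n bits (n >= 1) as a two's complement integer,
-- assuming the stream is constant from bit n-1 onwards.
asInteger :: Int -> Stream -> Integer
asInteger n s = magnitude - signCorrection
  where
    prefix = take n s
    magnitude = sum [2 ^ i | (i, True) <- zip [0 :: Int ..] prefix]
    signCorrection = if last prefix then 2 ^ n else 0
```

With these definitions asInteger 16 (fixedPointStream circuit) evaluates to -8, agreeing with the output ...1111000 quoted above.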

The same, syntactically (optional section)

I have a Haskell library on github for working with GL: provability. This uses a syntactic approach and checks propositions for theoremhood using a tableau method. We can use it to analyse the above example with feedback. I have implemented a function, currently called value', to perform the evaluation of the bitstream for a proposition. However, in this case the fixedpoint function computes the fixed point proposition first and then converts to a bitstream rather than computing the bitstream directly from the circuit for F:

> let f p = Neg (Box p \/ Box (Box (Box p)))
> let Just p = fixedpoint f
> p
Dia T /\ Dia (Dia T /\ Dia (Dia T /\ Dia T))
> value' p
-8

(Note that Dia p means $\Diamond p = \neg\Box\neg p$.)

The function fixedpoint does a lot of work under the hood. (It uses a tableau method to carry out Craig interpolation.) The circuit approach requires far less work.

Applications

1. Programs that reason about themselves

In principle we can write a program that enumerates all theorems of PA. That means we can use a quine trick to write a computer program that searches for a proof, in PA, of its own termination. Does such a program terminate?

We can answer this with Löb's theorem. Let $p =$ "The program terminates". The program terminates if it can prove its termination. Formally this means we assume $\Box p\rightarrow p$. Using one of the derivation rules of GL we get $\Box(\Box p\rightarrow p)$. Löb's theorem now gives us $\Box p$. Feed that back into our original hypothesis and we get $p$. In other words, we deduce that our program does in fact terminate. (Thanks to Sridhar Ramesh for pointing this out to me.)

But we can deduce this using a circuit. We want a solution to $p\leftrightarrow \Box p$. Here's the corresponding circuit:

It starts by outputting 1's and doesn't stop. In other words, the fixed point is a theorem. And that tells us $p$ is a theorem. And hence that the program terminates.

2. Robots who reason about each other's play in Prisoner's Dilemma

For the background to this problem see Robust Cooperation in the Prisoner's Dilemma at LessWrong. We have two robot participants $A$ and $B$ playing Prisoner's Dilemma. Each can examine the other's source code and can search for proofs that the opponent will cooperate. Suppose each robot is programmed to enumerate all proofs of PA and cooperate if it finds a proof that its opponent will cooperate. Here we have $p =$ "A will cooperate" and $q =$ "B will cooperate". Our assumptions about the behaviour of the robots are $p \leftrightarrow \Box q$ and $q \leftrightarrow \Box p$, and hence that $p \leftrightarrow \Box\Box p$. This corresponds to the circuit:

This outputs ...1111 = -1 so we can conclude $p$ and hence that these programs will cooperate. (Note that this doesn't work out nicely if robot B has a program that doesn't terminate but whose termination isn't provable in the formal system A is using. That means this approach is only good for robots that want to cooperate and want to confirm such cooperation. See the paper for more on this.)
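Both application circuits can be simulated in the same lazy-list style, with the earliest bit first. The names below are introduced here for illustration. Inspecting a finite prefix can only suggest that a stream is all 1s, not prove it, and the conclusions drawn from the circuits still rest on the theorems referred to in the text.

```haskell
-- Bitstreams as infinite lazy lists, earliest bit first.
type Stream = [Bool]

-- Delay-latch: outputs 1 until one cycle after it first sees a 0.
latch :: Stream -> Stream
latch = scanl (&&) True

-- Application 1: the fixed point of p ↔ □p.
selfProver :: Stream
selfProver = latch selfProver

-- Application 2: the fixed point of p ↔ □□p.
robotA :: Stream
robotA = latch (latch robotA)

-- Check that the first n bits of a stream are all 1s.
allOnesSoFar :: Int -> Stream -> Bool
allOnesSoFar n = and . take n
```

Both allOnesSoFar 1000 selfProver and allOnesSoFar 1000 robotA evaluate to True, matching the output ...1111 described for these circuits.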

At this point I really must emphasise that these applications are deceptively simple. I've shown how these simple circuits can answer some tricky problems about provability. But these aren't simply the usual translations from boolean algebra to logic gates. They work because circuits with delay-latch provide a model for letterless provability logic and that's only the case because of a lot of non-trivial theorem proving in Boolos that I haven't reproduced here. You're only allowed to use these simple circuits once you've seen the real proofs :-)

Things I didn't say above

1. I described the translation from propositions to circuits that I called $\psi$ above. But I didn't tell you what $\phi$ looks like. I'll leave this as an exercise. (Hint: consider the output from the translation of $\Box^n\bot$ into a circuit.)

2. The integers, considered as bitstreams, with the bitwise operators, and the unary operator □p=p^(p+1), form an algebraic structure. For example, if we define ⋄p=~□~p we have a Magari algebra. Structures like these are intended to capture the essential parts of self-referential arguments in an algebraic way.

3. Because of the interpretation of □ as a delayed latch in a circuit you could view it as saying "my input was always true until a moment ago". This surely embeds provability logic in a temporal logic of some sort.

4. (Deleted speculations about tit-for-tat that need rethinking.)

5. For even the most complex letterless proposition in Boolos you could check its theoremhood with a pretty small circuit. You could even consider doing this with a steam powered pneumatic circuit. I had to say that to fulfil a prophecy and maintain the integrity of the timeline.

Appendix on provability

The modern notion of a proof is that it is a string of symbols generated from some initial strings called "axioms" and some derivation rules that make new strings from both axioms and strings you've derived previously. Usually we pick axioms that represent "self-evident" truths and we pick derivation rules that are "truth-preserving" so that every proof ends at a true proposition of which it is a proof. The derivation rules are mechanical in nature: things like "if you have this symbol here and that symbol there then you can replace this symbol with that string you derived earlier" etc.

You can represent strings of symbols using numbers, so-called Gödel numbers. Let's pick a minimal mathematical framework for working with numbers: Peano Arithmetic, aka PA. Let's assume we've made some choice of Gödel numbering scheme and when $p$ is a proposition, write $[p]$ for the number representing $p$. You can represent the mechanical derivation rules as operations on numbers. And that makes it possible to define a mathematical predicate $Prov$ that is true if and only if its argument represents a provable proposition.

In other words, we can prove $Prov([p])$ using PA if and only if $p$ is a proposition provable in PA.

The predicate $Prov$ has some useful properties:

1. If we can prove $p$, then we can prove $Prov([p])$.

We take the steps we used to prove $p$, and convert everything to propositions about numbers. If $Prov$ is defined correctly then we can convert that sequence of numbers into a sequence of propositions about those numbers that makes up a proof of $Prov([p])$.

2. $Prov([p\rightarrow q])$ and $Prov([p])$ imply $Prov([q])$

A fundamental step in any proof is modus ponens, i.e. that $p\rightarrow q$ and $p$ implies $q$. If $Prov$ does its job correctly then it had better know about this.

3. $Prov([p])$ implies $Prov([Prov([p])])$

One way to prove this is to use Löb's theorem.

4. $Prov([\top])$

The trivially true statement had better be provable or $Prov$ is broken.

Constructing $Prov$ is conceptually straightforward but hard work. I'm definitely not going to do it here.

And there's one last thing we need: self-reference. If $p$ is a proposition, how can we possibly assert $Prov([p])$ without squeezing a copy of $[p]$ inside $p$? I'm not going to do that here either - just mention that we can use a variation of quining to achieve this. That allows us to form a proposition $p$ for which we can prove $p\leftrightarrow Prov([p])$. In fact, we can go further. We can find propositions that solve $p\leftrightarrow F(p)$ for any predicate $F(p)$ built from the usual boolean operations and $p$ as long as all of the occurrences of $p$ are inside the appearances of $Prov$. Even though we can't form a proposition that directly asserts its own falsity, we can form one that asserts that it is unprovable, or one that asserts that you can't prove that you can't prove that you can prove it, or anything along those lines.

Anyway, all that $[]$ and $Prov$ business is a lot of hassle. Provability logic, also known as GL, is intended to capture specifically the parts of PA that relate to provability. GL is propositional calculus extended with the provability operator $\Box$. The intention is that if $p$ is a proposition, $\Box p$ is a proposition in GL that represents $Prov([p])$ in PA. The properties of $Prov$ above become the axioms and derivation rules of GL in the main text.

Expectation-Maximization with Less Arbitrariness

2026-04-22T23:23:39Z

Introduction

Google have stopped supporting the Chart API so all of the mathematics notation below is missing. There is a PDF version of this article at GitHub.

There are many introductions to the Expectation-Maximisation algorithm. Unfortunately every one I could find uses arbitrary seeming tricks that seem to be plucked out of a hat by magic. They can all be justified in retrospect, but I find it more useful to learn from reusable techniques that you can apply to further problems. Examples of tricks I've seen used are:

Using Jensen's inequality. It's easy to find inequalities that apply in any situation. But there are often many ways to apply them. Why apply it to this way of writing this expression and not that one which is equal?
Substituting $1=A/A$ in the middle of an expression. Again, you can use $1=A/A$ just about anywhere. Why choose this $A$ at this time? Similarly I found derivations that insert a $B-B$ into an expression.
Majorisation-Minimisation. This is a great technique, but involves choosing a function that majorises another. There are so many ways to do this, it's hard to imagine any general purpose method that tells you how to narrow down the choice.

My goal is to fill in the details of one key step in the derivation of the EM algorithm in a way that makes it inevitable rather than arbitrary. There's nothing original here, I'm merely expanding on a stackexchange answer.

Generalities about EM

The EM algorithm seeks to construct a maximum likelihood estimator (MLE) with a twist: there are some variables in the system that we can't observe.

First assume no hidden variables. We assume there is a vector of parameters $\theta=(\theta_i)$ that defines some model. We make some observations $x=(x_j)$. We have a probability density $P(x|\theta)$ that depends on $\theta$. The likelihood of $\theta$ given the observations $x$ is $l(\theta|x)=P(x|\theta)$. The maximum likelihood estimator for $\theta$ is the choice of $\theta$ that maximises $l(\theta|x)$ for the $x$ we have observed.

Now suppose there are also some variables $z=(z_k)$ that we didn't get to observe. We assume a density $P(x,z|\theta)$. We now have

$P(x|\theta)=\sum_z P(x,z|\theta)$

where we sum over all possible values of $z$. The MLE approach says we now need to maximise

$l(\theta|x)=\sum_z P(x,z|\theta).$

One of the things that is a challenge here is that the components of $\theta$ might be mixed up among the terms in the sum. If, instead, each term only referred to its own unique block of $\theta_i$, then the maximisation would be easier as we could maximise each term independently of the others. Here's how we might move in that direction. Consider instead the log-likelihood

$\log l(\theta|x)=\log\sum_z P(x,z|\theta).$

Now imagine that by magic we could commute the logarithm with the sum. We'd need to maximise

$\sum_z \log P(x,z|\theta).$

One reason this would be to our advantage is that $P(x,z|\theta)$ often takes the form $\exp(f(x,z,\theta))$ where $f$ is a simple function to optimise. In addition, $f$ may break up as a sum of terms, each with its own block of $\theta_i$'s. Moving the logarithm inside the sum would give us something we could easily maximise term by term. What's more, the $P(x,z|\theta)$ for each $z$ is often a standard probability distribution whose likelihood we already know how to maximise. But, of course, we can't just move that logarithm in.

Maximisation by proxy

Sometimes a function is too hard to optimise directly. But if we have a guess for an optimum, we can replace our function with a proxy function that approximates it in the neighbourhood of our guess and optimise that instead. That will give us a new guess and we can continue from there. This is the basis of gradient descent. Suppose $f$ is a differentiable function in a neighbourhood of $x_0$. Then around $x_0$ we have

$f(x) \approx f(x_0)+f'(x_0)\cdot (x-x_0).$

We can try optimising $f(x_0)+f'(x_0)\cdot (x-x_0)$ with respect to $x$ within a neighbourhood of $x_0$. If we pick a small circular neighbourhood then the optimal value will be in the direction of steepest descent. (Note that picking a circular neighbourhood is itself a somewhat arbitrary step, but that's another story.) For gradient descent we're choosing $f(x_0)+f'(x_0)\cdot (x-x_0)$ because it matches both the value and derivatives of $f$ at $x_0$. We could go further and optimise a proxy that shares second derivatives too, and that leads to methods based on Newton-Raphson iteration.

We want our logarithm of a sum to be a sum of logarithms. But instead we'll settle for a proxy function that is a sum of logarithms. We'll make the derivatives of the proxy match those of the original function precisely so we're not making an arbitrary choice.

Write

$\log l(\theta|x) = \log\sum_z P(x,z|\theta) \approx \sum_z\beta_z\log P(x,z|\theta) + \text{constant}.$

The $\beta_z$ are constants we'll determine. We want to match the derivatives on either side of the $\approx$ at $\theta=\theta_0$:

$\frac{\partial \log l(\theta_0|x)}{\partial\theta_0} =\frac{1}{l(\theta_0|x)} \frac{\partial l(\theta_0|x)}{\partial\theta_0} =\sum_z\frac{1}{l(\theta_0|x)} \frac{\partial P(x,z|\theta_0)}{\partial\theta_0}.$

On the other hand we have

$\frac{\partial}{\partial\theta_0}\sum_z\beta_z\log P(x,z|\theta_0) =\sum_z\beta_z\frac{1}{P(x,z|\theta_0)}\frac{\partial P(x,z|\theta_0)}{\partial\theta_0}$

To achieve equality we want to make these expressions match. We choose

$\beta_z = \frac{P(x,z|\theta_0)}{l(\theta_0|x)} = \frac{P(x,z|\theta_0)}{P(x|\theta_0)} = P(z|x,\theta_0).$

Our desired proxy function is:

$\sum_z P(z|x,\theta_0)\log P(x,z|\theta) + \text{const.} = E_{Z|x,\theta_0}(\log P(x,Z|\theta)) + \text{const.}$

So the procedure is to take an estimated $\theta_0$ and obtain a new estimate by optimising this proxy function with respect to $\theta$. This is the standard EM algorithm.

It turns out that this proxy has some other useful properties. For example, because of the concavity of the logarithm, the proxy is always smaller than the original likelihood. This means that when we optimise it we never optimise "too far" and that progress optimising the proxy is always progress optimising the original likelihood. But I don't need to say anything about this as it's all part of the standard literature.
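To make the procedure concrete, here is a minimal Haskell sketch of one possible instance. The model is an assumption chosen for illustration and does not come from the derivation above: each observation is drawn from one of two unit-variance Gaussians with equal probability, the hidden variable says which one, and $\theta$ is the pair of means. Because the observations are independent, the weights $\beta_z$ factorise into one responsibility per observation. The names `emStep`, `responsibility` and `logLik` are hypothetical.

```haskell
-- Toy model (an assumption for illustration): each x_j comes from
-- N(mu0, 1) or N(mu1, 1) with probability 1/2 each; the hidden z_j
-- records which component was used.
type Theta = (Double, Double)   -- (mu0, mu1)

-- Unnormalised unit-variance Gaussian density. The normalising
-- constant cancels in the responsibilities below.
gauss :: Double -> Double -> Double
gauss mu x = exp (negate ((x - mu) ^ 2) / 2)

-- E-step weight: P(z_j = 1 | x_j, theta0), the analogue of beta_z.
responsibility :: Theta -> Double -> Double
responsibility (mu0, mu1) x = p1 / (p0 + p1)
  where p0 = gauss mu0 x
        p1 = gauss mu1 x

-- One EM iteration. Maximising the proxy
--   sum_j E_{z_j | x_j, theta0} log P(x_j, z_j | theta)
-- over theta gives responsibility-weighted means for this model.
emStep :: [Double] -> Theta -> Theta
emStep xs theta0 = (wmean (map (1 -) rs), wmean rs)
  where rs = map (responsibility theta0) xs
        wmean ws = sum (zipWith (*) ws xs) / sum ws

-- Log-likelihood of theta, up to an additive constant.
logLik :: [Double] -> Theta -> Double
logLik xs (mu0, mu1) = sum [ log (gauss mu0 x + gauss mu1 x) | x <- xs ]

-- Run n EM iterations from a starting guess.
emIterate :: Int -> [Double] -> Theta -> Theta
emIterate n xs theta = iterate (emStep xs) theta !! n
```

In GHCi, `emIterate 50 [-2.1, -1.9, -2.0, 1.8, 2.2, 2.0] (-1, 1)` should settle on means close to $-2$ and $2$, and `logLik` should never decrease from one `emStep` to the next.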

Afterword

As a side effect we have a general purpose optimisation algorithm that has nothing to do with statistics. If your goal is to compute

$\operatorname{argmax}_x\sum_i\exp(f_i(x))$

you can iterate, at each step computing

$\operatorname{argmax}_x\sum_i\exp(f_i(x_0))f_i(x)$

where $x_0$ is the previous iteration. If the $f_i$ take a convenient form then this may turn out to be much easier.
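As a sketch of this iteration, assume the illustrative choice $f_i(x) = -(x-a_i)^2/2$ for some fixed centres $a_i$ (this choice is not from the text above). The inner argmax is then a weighted mean of the centres, so each step has a closed form. The names `proxyStep`, `objective` and `maximise` are hypothetical.

```haskell
-- Illustrative choice (an assumption): f_i(x) = -(x - a_i)^2 / 2.
fI :: Double -> Double -> Double
fI a x = negate ((x - a) ^ 2) / 2

-- The function we actually want to maximise: sum_i exp(f_i(x)).
objective :: [Double] -> Double -> Double
objective centres x = sum [ exp (fI a x) | a <- centres ]

-- One step: maximise sum_i exp(f_i(x0)) * f_i(x) over x.
-- For quadratic f_i the maximiser is the mean of the centres
-- weighted by w_i = exp(f_i(x0)).
proxyStep :: [Double] -> Double -> Double
proxyStep centres x0 = sum (zipWith (*) ws centres) / sum ws
  where ws = [ exp (fI a x0) | a <- centres ]

-- Iterate n steps from a starting point.
maximise :: Int -> [Double] -> Double -> Double
maximise n centres x0 = iterate (proxyStep centres) x0 !! n
```

For example, `maximise 200 [0, 1, 5] 0.3` converges to a local maximum of `objective [0, 1, 5]`. Like EM in general, the iteration only finds a local optimum, and which one depends on the starting point.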

Note

This was originally written as a PDF using LaTeX. It'll be available here for a while. Some fidelity was lost when converting it to HTML.

Logarithms and exponentials of functions

2026-04-22T23:16:27Z

Introduction

A popular question in mathematics is this: given a function $f$, what is its "square root" $g$ in the sense that $g(g(x)) = f(x)$. There are many questions about this on mathoverflow but it's also a popular subject in mathematics forums for non-experts. This question seems to have a certain amount of notoriety because it's easy to ask but hard to answer fully. I want to look at an approach that works nicely for formal power series, following from the Haskell code I wrote here. There are some methods for directly finding "functional square roots" for formal power series that start as $z+a_2z^2+a_3z^3+\ldots$, but I want to approach the problem indirectly. When working with real numbers we can find square roots, say, by using $\sqrt{x}=\exp(\frac{1}{2}\log{x})$. I want to use an analogue of this for functions. So my goal is to make sense of the idea of the logarithm and exponential of a formal power series as composable functions. Warning: the arguments are all going to be informal.

Notation

There's potential for a lot of ambiguous notation here, especially as the usual mathematical notation for $n$th powers of trig functions is so misleading. I'm going to use $\circ$ for composition of functions and power series, and I'm going to use the notation $f^{\circ n}$ to mean the $n$th iterate of $f$. So $f^{n+1}(x) = f(x)f^n(x)$ and $f^{\circ n+1}(x) = f(f^{\circ n}(x))$. As I'll be working mostly in the ring of formal power series $R[\![z]\!]$ for some ring $R$, I'll reserve the variable $z$ to refer only to the corresponding element in this ring. I'll also use formal power series somewhat interchangeably with functions. So $z$ can be thought of as representing the identity function. To make sure we're on the same page, here are some small theorems in this notation:

$z^mz^n = z^{m+n}$
$f^{\circ m}\circ f^{\circ n} = f^{\circ m+n}$
$(1+z)^n = \sum_{i=0}^n{n\choose i}z^i$
$(1+z)^{\circ n}=n+z$.

That last one simply says that adding one $n$ times is the same as adding $n$.

As I'm going to have ordinary logarithms and exponentials sitting around, as well as functional logarithms and exponentials, I'm going to introduce the notation $\operatorname{LOG}$ for functional logarithm and $\operatorname{EXP}$ for functional exponentiation.

Preliminaries

The first goal is to define a non-trivial function $\operatorname{LOG}$ with the fundamental property that $\operatorname{LOG}(f^{\circ n})=n\operatorname{LOG}(f)$

First, let's note some basic algebraic facts. The formal power series form a commutative ring with operations $+$ and $\cdot$ (ordinary multiplication) and with additive identity $0$ and multiplicative identity $1$. The formal power series also form a ring-like algebraic structure with operation $+$ and partial operation $\circ$, with additive identity $0$ and multiplicative identity $z$. But it's not actually a ring or even a near-ring. Composition isn't defined for all formal power series and even when it's defined, we don't have distributivity. For example, in general $f\circ(g+h)\ne f\circ g+f\circ h$; after all, there's no reason to expect $f(g(x)+h(x))$ to equal $f(g(x))+f(h(x))$. We do have right-distributivity however, i.e.

$(f+g)\circ h = f\circ h+g\circ h$,

because

$(f+g)(h(x))=f(h(x))+g(h(x))$,

more or less by definition of $+$.
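The asymmetry between the two distributive laws is easy to check numerically. The following is a minimal sketch using ordinary Haskell functions on `Double` rather than formal power series. The particular `f`, `g` and `h` and the helper `plus` are arbitrary choices made for illustration.

```haskell
-- Pointwise sum of two functions.
plus :: (Double -> Double) -> (Double -> Double) -> (Double -> Double)
plus p q x = p x + q x

-- Three arbitrary example functions (assumptions for illustration).
f, g, h :: Double -> Double
f x = x * x
g x = 3 * x
h x = x + 1

-- Right-distributivity: (f + g) . h  versus  f . h + g . h
rightLhs, rightRhs :: Double -> Double
rightLhs = (f `plus` g) . h
rightRhs = (f . h) `plus` (g . h)

-- Left-distributivity, which fails: f . (g + h)  versus  f . g + f . h
leftLhs, leftRhs :: Double -> Double
leftLhs = f . (g `plus` h)
leftRhs = (f . g) `plus` (f . h)
```

At $x=2$ both `rightLhs` and `rightRhs` give 18, whereas `leftLhs` gives 81 and `leftRhs` gives 45.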

We can't use power series on our power series

There's an obvious approach, just use power series of power series. So we might tentatively suggest that

$\operatorname{LOG}(z+f) = f-\frac{1}{2}f^{\circ 2}+\frac{1}{3}f^{\circ 3}+\ldots$.

Note that I consider $\operatorname{LOG}(z+f)$ rather than $\operatorname{LOG}(1+f)$ because $z$ is the multiplicative identity in our ring-like structure.

Unfortunately this doesn't work. The reason is this: if we try to use standard reasoning to show that the resulting function has the fundamental property we seek we end up using distributivity. We don't have distributivity.

Sleight of hand

There's a beautiful trick I spotted on mathoverflow recently that allows us to bring back distributivity. (I can't find the trick again, but when I do I'll come back and add a link and credit here.) Consider the function $R(g)$ defined by $R(g)(f) = f\circ g$. In other words $R(g)$ is right-composition by $g$. (Ambiguity alert, I'm using $R$ here to mean right. It has nothing to do with the ring underlying our formal power series.) Because we have right-distributivity, $R(g)$ is a bona fide linear operator on the space of formal power series. If you think of formal power series as being infinitely long vectors of coefficients then $R(g)$ can be thought of as an infinitely sized matrix. This means that as long as we have convergence, we can get away with using power series to compute $\log R(g)$ with the property that $\log(R(g)^n) = n\log R(g)$. Define:

$\operatorname{LOG}(f) = \log(R(f))z$.

We have:

$\operatorname{LOG}(f) = \log(R(f))z = \log(1+(R(f)-1))z$

where I'm using $1$ to mean the identity linear operator. And now we have:

$\operatorname{LOG}(f) = (R(f)-1)z-\frac{1}{2}(R(f)-1)^2z+\frac{1}{3}(R(f)-1)^3z+\ldots$.

But does it converge? Suppose $f$ is of the form $z+a_2z^2+a_3z^3+\ldots$. Then $(R(f)-1)g = g\circ f-g$. The leading term in $g\circ f$ is the same as the leading term in $g$. So $R(f)-1$ kills the first term of whatever it is applied to, which means that when we sum the terms in $\operatorname{LOG}(f)$, we only need $n$ terms to get a power series correct to $n$ coefficients. Reusing my code from here, I call $\operatorname{LOG}$ by the name flog. Here is its implementation:

> import Data.Ratio


> flog :: (Eq a, Fractional a) => [a] -> [a]
> flog f@(0 : 1 : _) =
>   flog' 1 (repeat 0) (0 : 1 : repeat 0)
>      where flog' n total term = take (n+1) total ++ (
>              drop (n+1) $
>                 let pz = p term
>                 in flog' (n+1) (total-map (((-1)^n / fromIntegral n) *) pz) pz)
>            p total = (total ○ f) - total

The take and drop are how I tell Haskell when the first $n+1$ coefficients have been exactly computed and so no more terms are necessary.
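As a small worked instance of the series for $\operatorname{LOG}$, take $f = z+z^2$ (an example chosen here for illustration). Each application of $R(f)-1$ sends $g$ to $g\circ f-g$, so:

$(R(f)-1)z = f-z = z^2$

$(R(f)-1)^2z = f^2-z^2 = 2z^3+z^4$

$(R(f)-1)^3z = 2f^3+f^4-2z^3-z^4 = 6z^4+O(z^5)$

$(R(f)-1)^4z = O(z^5)$

Putting these into the series gives

$\operatorname{LOG}(z+z^2) = z^2-\frac{1}{2}(2z^3+z^4)+\frac{1}{3}\cdot 6z^4+O(z^5) = z^2-z^3+\frac{3}{2}z^4+O(z^5)$.

Each successive power of $R(f)-1$ starts one degree higher, which is why finitely many terms fix each coefficient.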

Does it work?

Here's an example using the twice iterated sin function:

> ex1 = do
>   let lhs = flog (sin (sin z))
>   let rhs = 2*flog (sin z)
>   mapM_ print $ take 20 (lhs-rhs)

Works to 20 coefficients. Dare we try an inverse function?

> ex2 = do
>   let lhs = flog (sin z)
>   let rhs = flog (asin z)
>   mapM_ print $ take 20 (lhs+rhs)

Seems to work!

Exponentials

It's no good having logarithms if we can't invert them. One way to think about the exponential function is that

$\exp(x) = \lim_{n\rightarrow \infty}(1+\frac{x}{n})^n$

We get better and better approximations by writing the expression inside the limit as a product of more and more terms. We can derive the usual power series for $\exp$ from this, but only if right-distributivity holds. So let's try to use the above expression directly:

$\operatorname{EXP}(f) = \lim_{n\rightarrow \infty}(z+\frac{f}{n})^{\circ n}$

and get

$\operatorname{EXP}(f) = \lim_{n\rightarrow \infty}R(z+\frac{f}{n})^nz$.

Unfortunately, even though $R(g)$ is linear, $R$ itself isn't. So it's going to take some extra work to raise $R(z+f/n)$ to the power of $n$.

The good news is that we're dealing with the special case $R(z+\epsilon)$ where $\epsilon$ is something small. We have

$R(z+\epsilon)f=f(z+\epsilon)=f(z)+\epsilon\frac{df}{dz}+O(\epsilon^2)$.

So $R(z+f/n)$ is actually $1+\frac{1}{n}f\frac{d}{dz}$ modulo higher order terms. This gives us

$\operatorname{EXP}(f) = \lim_{n\rightarrow \infty}(1+\frac{1}{n}f\frac{d}{dz})^nz=\exp(f\frac{d}{dz})z$.

This is something we can implement using the power series for ordinary $\exp$:

$\operatorname{EXP}(f) = z+f+\frac{1}{2!}f\frac{df}{dz}+\frac{1}{3!}f\frac{d}{dz}(f\frac{df}{dz})+\ldots$.

In code that becomes:

> fexp f@(0 : 0 : _) = fexp' f 0 z 1
> fexp' f total term n = take (n-1) total ++ drop (n-1)
>           (fexp' f (total+term) (map (/fromIntegral n) (f*d term)) (n+1))

Note how when we differentiate a power series we shift the coefficients down by one place. To counter the effect of that so as to ensure convergence we need $f$ to look like $a_2z^2+a_3z^3+\ldots$. Luckily this is exactly the kind of series $\operatorname{LOG}$ gives us.
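As a small worked instance of the series for $\operatorname{EXP}$, take $f = z^2$ (an example chosen here for illustration). Writing $D = f\frac{d}{dz} = z^2\frac{d}{dz}$, we have $Dz^k = kz^{k+1}$, so

$Dz = z^2,\quad \frac{1}{2!}D^2z = \frac{1}{2}\cdot 2z^3 = z^3,\quad \frac{1}{3!}D^3z = \frac{1}{6}\cdot 6z^4 = z^4,\ \ldots$

and in general $\frac{1}{k!}D^kz = z^{k+1}$. Summing,

$\operatorname{EXP}(z^2) = z+z^2+z^3+z^4+\ldots = \frac{z}{1-z}$.

The $k$th term starts at degree $k+1$, so each coefficient of the result is settled after finitely many terms.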

But does it successfully invert $\operatorname{LOG}$? Let's try:

> ex3 = do
>   let lhs = sin z
>   let rhs = fexp (flog (sin z))
>   mapM_ print $ take 20 (lhs-rhs)

Now we can start computing fractional iterates. Square root first:

> ex4 = do
>   mapM_ print $ take 20 $ fexp (flog (sin z)/2)

That matches the results at A048602 and A048603.

Cube root:

> ex5 = do
>   mapM_ print $ take 20 $ fexp (flog (sin z)/3)

Matches A052132 and A052135.

And this gives an alternative to Lagrange inversion for computing power series for inverse functions:

> ex6 = do
>   let lhs = fexp (-flog (sin z))
>   let rhs = asin z
>   mapM_ print $ take 20 (lhs-rhs)

What's really going on with $\operatorname{EXP}$?

Let's approach $\operatorname{EXP}$ in a slightly different way. In effect, $\operatorname{EXP}$ is the composition of $n$ lots of $z+\frac{f}{n}$ with $z$. So let's try composing these one at a time, with one composition every $\frac{1}{n}$ seconds. After one second we should have our final result. We can write this as:

$g(0) = z$ and $g(t+\frac{1}{n}) = g(t)+\frac{1}{n}f(g(t))$ to first order.

So we're solving the differential equation:

$g(0) = z$ and $\frac{dg}{dt} = f(g(t))$

with $\operatorname{EXP}(f) = g(1)$.
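The step-by-step composition can be tried out numerically. This is a minimal sketch that works on `Double` values at a single point rather than on formal power series. The name `expApprox` is hypothetical, and the scheme is just Euler's method for the differential equation above.

```haskell
-- Compose n copies of the map  g |-> g + f g / n,  starting from z.
-- This is Euler's method for dg/dt = f(g), g(0) = z, run up to t = 1.
expApprox :: Int -> (Double -> Double) -> Double -> Double
expApprox n f z = iterate step z !! n
  where step g = g + f g / fromIntegral n
```

For instance, with $f(g) = g^2$ the exact solution at $t=1$ is $z/(1-z)$, so `expApprox 10000 (\g -> g * g) 0.1` should come out close to $1/9$. The approximation improves as `n` grows, mirroring the limit in the definition of $\operatorname{EXP}$.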

So $\operatorname{EXP}$ is the function that solves one of the most fundamental differential equations. This also means I can use Mathematica to solve symbolically and check my results. For example, Mathematica says that the solution to

$\frac{dg}{dt}=\sin(g(t))^2$ and $g(0)=z$

at $t=1$ is

$g(1) = \tan^{-1}\left(\frac{\tan z}{1-\tan z}\right)$

so let's check:

> ex7 = do
>   let lhs = fexp ((sin z)^2)
>   let rhs = atan (tan z/(1-tan z))
>   mapM_ print $ take 20 (lhs-rhs)

I like this example because it leads to the generalized Catalan numbers A004148:

> ex8 = do
>     mapM_ print $ take 20 $ fexp (z^2/(1-z^2))

That suggests this question: what does $\operatorname{EXP}$ mean combinatorially? I don't have a straightforward answer but solving this class of differential equation motivated the original introduction, by Cayley, of the abstract notion of a tree. See here.

What is going on geometrically?

For those who know some differential geometry, the differential equation

$g(0) = z$ and $\frac{dg}{dt} = f(g(t))$

describes a flow on the real line (or complex plane). You can think of $f$ as being a one-dimensional vector field describing how points move from time $t$ to $t+dt$. When we solve the differential equation we get integral curves that these points follow and $\operatorname{EXP}$ tells us where the points end up after one unit of time. So $\operatorname{EXP}$ is the exponential map. In fact, $\operatorname{EXP}(f)=\exp(f\frac{d}{dz})z$ is essentially the exponential of the vector field $f\frac{d}{dz}$ where we're now using the differential geometer's notion of a vector field as a differential operator.

Final word

Unfortunately the power series you get from using $\operatorname{LOG}$ and $\operatorname{EXP}$ don't always have good convergence properties. For example, I'm not sure but I think the series for $\sin^{\circ 1/2} z$ has radius of convergence zero. If you truncate the series you get a half-decent approximation to a square root in the vicinity of the origin, but the approximation gets worse, not better, if you use more terms.

And the rest of the code

> (*!) _ 0 = 0
> (*!) a b = a*b
> (!*) 0 _ = 0
> (!*) a b = a*b
> (^+) a b = zipWith (+) a b
> (^-) a b = zipWith (-) a b


> ~(a:as) ⊗ (b:bs) = (a *! b):
>     ((map (a !*) bs) ^+ (as ⊗ (b:bs)))
> (○) (f:fs) (0:gs) = f:(gs ⊗ (fs ○ (0:gs)))
> inverse (0:f:fs) = x where x     = map (recip f *) (0:1:g)
>                            _:_:g    = map negate ((0:0:fs) ○ x)
> invert x = r where r = map (/x0)  ((1:repeat 0) ^- (r ⊗ (0:xs)))
>                    x0:xs = x 


> (^/) (0:a) (0:b) = a ^/ b
> (^/) a b = a ⊗ (invert b)


> z :: [Rational]
> z = 0:1:repeat 0


> d (_:x) = zipWith (*) (map fromInteger [1..]) x


> integrate x = 0 : zipWith (/) x (map fromInteger [1..])


> instance (Eq r, Num r) => Num [r] where
>     x+y  = zipWith (+) x y
>     x-y  = zipWith (-) x y
>     ~x*y = x ⊗ y
>     fromInteger x      = fromInteger x:repeat 0
>     negate x     = map negate x
>     signum (x:_) = signum x : repeat 0
>     abs (x:xs)   = error "Can't form abs of a power series"


> instance (Eq r, Fractional r) => Fractional [r] where
>     x/y = x ^/ y
>     fromRational x    = fromRational x:repeat 0


> sqrt' x = 1 : rs where rs = map (/2) (xs ^- (rs ⊗ (0:rs)))
>                        _ : xs = x
> instance (Eq r, Fractional r) => Floating [r] where
>     sqrt (1 : x) = sqrt' (1 : x)
>     sqrt _  = error "Can only find sqrt when leading term is 1"
>     exp x   = e where e = 1+integrate (e * d x)
>     log x   = integrate (d x/x)
>     sin x   = integrate ((cos x)*(d x))
>     cos x   = [1] ... negate (integrate ((sin x)*(d x)))
>     asin x  = integrate (d x/sqrt(1-x*x))
>     atan x  = integrate (d x/(1+x*x))
>     acos x  = error "Unable to form power series for acos"
>     sinh x  = integrate ((cosh x)*(d x))
>     cosh x  = [1] ... integrate ((sinh x)*(d x))
>     asinh x = integrate (d x/sqrt(1+x*x))
>     atanh x = integrate (d x/(1-x*x))
>     acosh x = error "Unable to form power series for acosh"
>     pi      = error "There is no formal power series for pi"


> lead [] x = x
> lead (a:as) x = a : (lead as (tail x))
> a ... x = lead a x


> (//) :: Fractional a => [a] -> (Integer -> Bool) -> [a]
> (//) a c = zipWith (\a-> \b->(if (c a :: Bool) then b else 0)) [(0::Integer)..] a

A direct functional square root that doesn't use $\operatorname{LOG}$ and $\operatorname{EXP}$:

> fsqrt (0 : 1 : fs) =
>     let gs = (fs-(0 : gs*((0 : delta gs gs)+((2 : gs)*(gs*g)))))/2
>         g = 0 : 1 : gs
>         delta (g : gs) h = let g' = delta gs h
>                    in (0 : ((1 : h) * g')) + gs
>     in g

𝕯𝖔𝖈 𝖎𝖙 𝖑𝖎𝖐𝖊 𝖎𝖙'𝖘 𝖍𝖔𝖙

2026-04-16T00:00:00Z

You’ve got some nice code. That’s a nice trie.
I see those PRs. But no README?

Everybody knows documentation is essential to any software engineering enterprise. And fo’ shizzle, everybody knows it gets deprioritised. An afterthought; written by engineers who are thinking, “I should probably write docs”. Nah. What they should be thinking is, “I get to write docs, cuz!” Because when you doc it right, it ain’t a chore. It’s what separates a project people use from a project people lose.

In this post, I’m walkin’ you through three real projects I’ve been involved in at Tweag; each one levelling up the documentation game. First, fixing docs that got out of hand: the reactive play. Then, planning docs from day one: the proactive play. And finally, making docs part of the code itself: the integrated play. By the end, I think you’ll agree: to doc it like it’s hot is the only way to gizzo.

When your `README`’s a monolith

Doc it like it’s hot

Sometimes the docs are already a mess and you gotta clean house. That’s the reactive play.

That was the case with Topiary, Tweag’s universal formatting engine. It uses Tree-sitter grammars and queries to format code; encoded in what Tree-sitter calls “capture names”. All of our formatting capture names needed to be documented, with their semantics described.

Moreover, our documentation covered usage instructions, which were checked against the --help output of each subcommand. Then there was our project motivation and design philosophy, language support, installation instructions, configuration details, usage guides…

…All in a single README.md which had grown to over 7,000 words. Way too big for the crib, homes. Ain’t nobody reading all that!

Drift and inconsistency were creeping in, making it harder for the team to maintain. Worse, it was straight-up hostile to users: how you gonna expect someone to sift through all that noise just to find what they need?

Topiary OG Erin started work reconstructing the monolith into a book format using — as it’s a Rust project — mdBook. I picked up where he left off and finished it for Topiary v0.6.1. Yo, the Topiary Book was born!

But the move here wasn’t just splitting up the README.md and calling it a day. You need a crew and every member needs a role. That means a framework; something to keep it tight, maintainable and user-friendly. We rolled with Diátaxis, which identifies four distinct documentation types based on what the reader actually needs:

Tutorials, for learners. For example, Yann’s step-by-step guides walk readers through creating a formatter from scratch for a toy language, starting from zero. Their aim is to actively reach understanding through engagement, rather than just passive reading.
How-to guides, for readers who want to accomplish a specific goal. “Adding a new language” assumes you already know Topiary and gets straight to the point: register the grammar, create a query file, update the test suite, rinse and repeat.
Explanation, for those who need a deeper understanding. For example, “Tree-sitter and its queries” explains the conceptual foundation — what Tree-sitter is, why Topiary uses it and how queries relate to formatting — without asking the reader to do anything.
Reference, which describes what exists and how it behaves. Our capture names chapter documents every formatting directive Topiary recognises, what it does, its syntax and its edge cases. You’re not meant to read it cover-to-cover; it’s just there to look up whenever you need it.

Structure is a prerequisite for usefulness and frameworks exist so you don’t have to invent your own. Rolling your own ain’t gangsta, use what’s already out there…that’s real game.

When you have varied audiences

Doc it like it’s hot

Cleaning up after the fact is one thing, but why not come correct from the start? That’s the proactive play: before you write a line of documentation, you ask who’s gonna read it and what they need. Then you build the structure around that.

So, while Topiary is a developer productivity tool with, broadly, a developer audience, the second project I want to chop it up about is different: an omics data acquisition tool for a pharmaceutical client’s computational biology needs.

This one had a whole different crowd to please. Bioinformaticians running data processing pipelines, IT staff handling installation and access control, administrators guiding users through workflows and developers who might extend the project down the line…including, potentially, yours truly, after returning from a long absence and having forgotten how everything works!

Their needs, vocabulary and assumptions barely overlap, so a single set of docs couldn’t serve them all without becoming an unfocused sprawl. So I split the documentation three ways:

A technical manual, covering every subcommand, flag and configuration key. The kind of thing a user reaches for mid-pipeline, an administrator references when guiding colleagues, or IT consults when setting up the environment.
A developer manual, as its mirror image: module architecture, type hierarchies, testing methodology and contribution workflow. All you need to dig the codebase, but were too shook to ask!
A user manual sat between the two, covering key concepts, how-to guides and troubleshooting. Diátaxis was again the guiding framework here: the concepts section is explanation, the how-to guides are exactly that and the troubleshooting page addresses the practical edge cases that tripped people up during user acceptance testing.

Within the user manual, I also got to indulge in what you’ve probably gathered is my favourite move. I weaved in a narrative through the examples that borrowed — with some artistic licence — from Stevenson’s Strange Case of Dr. Jekyll and Mr. Hyde: An external collaborator’s data arrives from Dr. Jekyll’s lab, making oblique references to the novella throughout, and ultimately identifying the evil-transcriptome for downstream analysis.

Does this make the documentation sillier than it needs to be? Maybe. But it makes the examples stick…and that ain’t just a vibe, it’s science, dawg: our brains are straight-up wired to retain information delivered through narrative far better than through isolated facts. A reader who skimmed the manual a month ago can still go, “that’s the example where forward and reverse reads are named ‘front door’ and ‘back door’” and find the section again. The story gives continuity across otherwise disconnected examples; where each section could stand alone, the recurring characters give readers a reason to follow the arc from data acquisition to analysis. And the scenario is deliberately awkward, which exercises more features than a vanilla example ever could.

Different people need different things, that’s just how it is. A bioinformatician never needs to know how the S3 client interface is structured, just as a future developer doesn’t need a walkthrough of dataset creation from NCBI metadata. When your audiences are distinct enough, the realest thing you can do is acknowledge that up front, rather than forcing everyone to wade through what ain’t for them…and there’s never any harm in bringing a little levity into the world!

When you need to have clarity

Doc it like it’s hot

Planning ahead is smooth, but the smoothest move of all? Making the docs and the code one and the same. That’s the integrated play.

The previous case study included a developer manual, but my final example? That’s all developer; front to back. Scrawls is a Rust library implementing a verifiable file format for Cardano ledger state, as an independent implementation alongside a Haskell reference. Its users are Rust developers pulling in the crate as a dependency, so the documentation strategy needed to reflect that.

In Rust, the idiomatic answer to this is rustdoc: in-band documentation that lives alongside the types, functions and invariants it describes. Then, while there is still a README.md, it functions more as a landing page than a manual: a brief orientation, a feature summary and a handful of examples to get a new user from zero to something.

Docstrings ain’t new, of course — Doxygen has been around since the ’90s — but Rust’s ecosystem raises the bar. Between docs.rs publishing your crate’s documentation automatically and a community that straight-up expects thorough doc comments, skipping them feels less like a shortcut and more like showing up empty-handed. So, when the API changes, the respective documentation change should be right there in the diff, for reviewers to keep it real.

And here’s the thing: during implementation, the specification was still maturing and encoding its requirements into Rust exposed ambiguities that were lurking in the prose. Should certain orderings be strict? How should the Merkle tree be rolled up? Is this field optional, or merely absent? Each ambiguity became a clarification fed back into the spec. And each clarification, a documented precondition in the API.

A specification is, at the end of the day, documentation too…and the same principle applies: vagueness is a bug. This ain’t documentation as prose; it’s documentation as a contract, forged on the streets between spec and implementation. Sometimes the most valuable work you can do is keep it tight, not make it long.

Document your code, ma

That’s how you get ahizzead

Reactive. Proactive. Integrated. Three different plays for three different games…though ideally you won’t need the first! And what ties them all together is that none of the documentation I’ve described was written reluctantly. It wasn’t tacked on after the fact because someone asked, “Where are the docs?” It was thought about — its structure, its audience, its precision — as part of the work itself.

That’s the shift I want to put you on to. Documentation ain’t a tax you pay for writing code; it is part of writing code. And when you approach it that way — when you reach for a framework instead of a blank page; when you ask “who’s this for?” before you start typing; when you treat ambiguity as a bug — then you’re doc’ing it like it’s hot! The result is something you’re genuinely proud of, not something you hope nobody reads too carefully.

Now if documentation alongside code is good, then documentation before code — as a design tool; a sketch in prose before you commit to implementation — is the next level. While I ain’t taken that step myself, my Scrawls experience, where the spec and the code kept each other honest, showed me how close that workflow already is.

In practice, pure docs-first has the same problem as pure test-driven development: you can’t document what you don’t know yet. But that feedback loop — where the docs sharpen the code and the code sharpens the docs — that’s the real endgame right there. You might notice this sounds a bit like vibe-coding, and it is…in the same way that an architect’s blueprint is a bit like a napkin sketch. Same ‘hood, different zip codes, dawg. Something to aspire to, fo’ shizzle.

So now, just like my man S-to the N-to the double O-P, you too can say…

I got a Rollie on my arm and I’m pourin’ Chandon
And I write the best docs, ‘cause I got it goin’ on.

With thanks to Simeon Carstens, Facundo Domínguez, Valentin Gagarin, Xavier Góngora, Arnaud Spiwack, Snoop Dogg and Pharrell Williams for their reviews and input on this post.

80: POPL 2026 - Part 1

2026-04-13T05:00:00Z

This is the first part of a miniseries on this year’s Symposium on Principles of Programming Languages, a.k.a. POPL 2026, hosted by Jessica Foster.

In this episode, we talk about: undergrad funding and participation, the behind the scenes of AV, choreographic programming, quantum languages, conference catering, and the joy of theory. And at one point, you’ll even hear us get kicked out the venue mid interview. Enjoy!

Wet Sidewalks and Odd Numbers

2026-04-09T15:22:05Z

Phil Crissman explains Propositions as Types with a dialogue between Achilles and the Tortoise, in the style of Douglas Hofstadter (who in turn was inspired by Lewis Carroll). Lambda Man makes an appearance.

Nash Equilibrium for Terminal Maneuvers

2026-04-02T13:52:16Z

Last year Ethan Heilman wrote about a simple game he calls Terminal Maneuvers. This game simulates a missile attacking an interstellar ship. The ship has a laser defence system. One player controls the missile, and the other player controls the laser. If the missile hits the ship, Missile wins. If the laser hits the missile, the missile is destroyed and Laser wins.

The complicating factor is that, due to the relative motion of the laser and the ship being a significant fraction of the speed of light, Laser has to aim not at the missile but where the missile will be. This distance allows the missile to perform erratic maneuvers to prevent Laser from knowing what its future position will be. However, Missile must expend fuel to perform these maneuvers.

The Terminal Maneuvers game proceeds in five rounds, giving the laser five opportunities to hit the missile. In each round, Missile secretly commits to an amount of fuel they will expend. Laser must “aim” by guessing the amount of fuel expended by Missile. If they guess correctly, there is some probability of destroying the missile, which depends on how far away the missile is and how much fuel the missile expended. The table below shows the probabilities of the missile being destroyed in the various rounds.

Probability of Laser destroying the missile when correctly guessing Missile’s fuel expenditure
Fuel Cost	Round 1	Round 2	Round 3	Round 4	Round 5
0 Fuel	100%	100%	100%	100%	100%
1 Fuel	1/6	2/6	3/6	4/6	5/6
2 Fuel	0%	1/6	2/6	3/6	4/6
3 Fuel		0%	1/6	2/6	3/6
4 Fuel			0%	1/6	2/6
5 Fuel				0%	1/6
6 Fuel					0%
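
The table follows a regular pattern, which the sketch below tries to capture. The formula is inferred from the table rather than taken from Ethan's rules, and the function name hit_probability is made up for illustration. For the blank cells the sketch simply returns 0.

```python
from fractions import Fraction

def hit_probability(round_no: int, fuel_burned: int) -> Fraction:
    """Chance that a correct guess destroys the missile (pattern read off the table)."""
    if fuel_burned == 0:
        return Fraction(1)  # a missile that does not maneuver is always hit
    # Assumption: each later round adds one sixth, each extra unit of fuel removes one sixth.
    sixths = round_no - fuel_burned + 1
    return Fraction(max(sixths, 0), 6)

# Print one row per fuel amount, one column per round.
for fuel in range(0, 7):
    print(fuel, [str(hit_probability(rnd, fuel)) for rnd in range(1, 6)])
```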

The missile has a limited amount of fuel at the start of the game. Fuel spent earlier in the game means less fuel available later in the game when it is most needed. The amount of starting fuel selects the difficulty of the game. Ethan suggests starting with seven fuel, which empirically gives Missile about a 25% chance of winning.

Laser knows how much fuel the missile has at the start of each round, so it is imperative that Missile does not run out of fuel in the middle of the game. If Laser knows the missile is out of fuel, Laser will predict zero fuel used and will always successfully destroy the missile. That said, as long as Missile has some fuel, choosing to burn zero fuel is still a legitimate option.

Starting with seven fuel, one strategy for Missile would be to burn one fuel on the first four rounds and burn the remaining three fuel on the last round. However, if Laser realizes this is Missile’s strategy, Laser can always predict the correct amount of fuel that will be used by the missile. Taking the product of all the probabilities of Missile’s survival in each round, Missile only has a 4.6% chance of winning. Clearly, Missile’s optimal strategy should be non-deterministic.
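
The 4.6% figure can be checked with exact fractions. This is only a small sketch: the hit chances are copied from the table above, and the variable names are arbitrary.

```python
from fractions import Fraction
from math import prod

# Fuel burned in rounds 1..5 under the fixed plan described in the text.
plan = [1, 1, 1, 1, 3]

# Hit chances for a correct guess, keyed by (round, fuel burned), copied from the table.
hit = {
    (1, 1): Fraction(1, 6),
    (2, 1): Fraction(2, 6),
    (3, 1): Fraction(3, 6),
    (4, 1): Fraction(4, 6),
    (5, 3): Fraction(3, 6),
}

# Missile must survive every round, so the survival chances multiply.
survival = prod(1 - hit[(rnd, fuel)] for rnd, fuel in enumerate(plan, start=1))
print(survival, float(survival))
```

The product is 5⁄6 · 4⁄6 · 3⁄6 · 2⁄6 · 3⁄6 = 5⁄108, which is about 4.6%.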

I figured this game would be a fun exercise in learning about mixed-strategy (i.e., non-deterministic) Nash equilibrium. This game is a small finite game, so it is reasonably easy to analyze, but it is significantly more complicated than trivial games often used in Nash equilibrium examples.

For these calculations, it is best to find the Nash equilibrium strategy at the endgame and work backward from there. To that end, let us start with the simplest non-trivial endgame. Missile has survived to Round 5 and has 1 fuel left. Missile can choose to burn their last fuel or not, and Laser can choose to aim at no fuel burned or not. This yields the following game-theoretic payoff matrix, listing the probabilities of Missile and Laser winning, respectively:

Payoff matrix for Round 5 with 1 fuel remaining
	Predict 0	Predict 1
Burn 0	0, 1	1, 0
Burn 1	1, 0	1⁄6, 5⁄6

This is a constant-sum game, because the total score of all players is always the same, no matter the outcome. Constant-sum games are also known as zero-sum games since they can be translated into games where the sum of each outcome is zero without affecting any strategy.

The definition of Nash equilibrium is a pair of strategies, one for each player, where neither player individually can change strategies to improve their outcome. Therefore, one potential way for Missile to devise a strategy is to find one where Laser’s chance of winning is the same regardless of the move that they make. Such a strategy is not necessarily going to be possible, but we can give it a try.

Let p be the probability that Missile will burn 0, and let q be the probability that Missile will burn 1. If Laser predicts 0, the probability of them winning is p. If Laser predicts 1, the probability of them winning is 5⁄6 ⁢q. If Laser cannot make a choice between these two options to improve their odds, then p = 5⁄6 ⁢q. Missile’s probabilities must add up to 1, so we also require p + q = 1.

We have a linear system of two equations and two unknowns, so we can try to solve it. The solution is p = 5⁄11 and q = 6⁄11. Missile burns no fuel with probability 5⁄11 and burns its one fuel with probability 6⁄11. This provides Missile a 6⁄11 chance of winning, regardless of which prediction Laser makes.

On the flip side, Laser’s Nash equilibrium can be computed by choosing a set of probabilities so that Missile’s outcome is the same regardless of whether they choose to burn fuel or not. This time, let p be the probability that Laser will predict 0, and let q be the probability that Laser will predict 1. If Missile burns 0, the probability of them winning is q. If Missile burns 1, the probability of them winning is p + 1⁄6 ⁢q. Again, if Missile cannot make a choice between these two options to improve their odds, then q = p + 1⁄6 ⁢q. Laser’s probabilities also must add up to 1, so we also require p + q = 1.

Rearranging q = p + 1⁄6 ⁢q, we get 5⁄6 ⁢q = p, which happens to be the exact same equation Missile had. Thus, their solutions are identical. Laser predicts no fuel burned with a probability of 5⁄11 and predicts one fuel burned with a probability of 6⁄11. This provides Laser a 5⁄11 chance of winning no matter whether Missile has chosen to burn their fuel or not. This chance is the complement to Missile’s 6⁄11 chance of winning, as it has to be.

Payoff matrix for Round 5 with 2 fuel remaining
	Predict 0	Predict 1	Predict 2
Burn 0	0, 1	1, 0	1, 0
Burn 1	1, 0	1⁄6, 5⁄6	1, 0
Burn 2	1, 0	1, 0	2⁄6, 4⁄6

If the game ends in Round 5 with Missile having 2 fuel left, we have the above payoff matrix. We can solve similar linear algebra problems on three variables to find strategies for each player so that the other player’s outcome is the same no matter which of their three choices they make. The solution has Missile burn 0 fuel with probability 10⁄37, burn 1 fuel with probability 12⁄37, and burn 2 fuel with probability 15⁄37, giving Missile a 27⁄37 chance of winning regardless of what Laser’s prediction is.

Laser makes the predictions with the same probability distribution, giving Laser a 10⁄37 chance of winning no matter how much fuel Missile chooses to burn. This sort of distribution is what we might expect: somewhat evenly distributed with a bias towards burning more fuel, which provides some evasion for Missile.

What I found surprising is how when Laser plays at their Nash equilibrium, they simply do not care how much fuel Missile has secretly chosen to burn. Their odds of winning are the same regardless of what Missile reveals. It is as if Laser is no longer playing against Missile at all. Missile’s choices no longer matter. This result is called the “indifference principle.” Later we will see that Missile’s choices sometimes can matter.

Missile feels the same when playing at their equilibrium. No matter what prediction Laser ultimately makes, Missile’s odds of winning have already been fixed by playing at the Nash equilibrium.

In theory, this is what playing poker at a Nash equilibrium should feel like. Based on the state of the board, you make a random selection of calls, folds, or raises according to some appropriate distribution, and your distribution has fixed your expected payout at that point, independent of the choices the other players are going to make. No need to stress over whether your bluff will be called or not.

Before moving on to analyzing Round 4, we can complete a chart of the probability of Missile winning when playing at their Nash equilibrium depending on how much fuel they have remaining.

Probability table for Round 5
Remaining Fuel	Missile Win Probability	Laser Win Probability
0	0%	100%
1	6⁄11 ≈ 54.5%	5⁄11 ≈ 45.5%
2	27⁄37 ≈ 73.0%	10⁄37 ≈ 27.0%
3	47⁄57 ≈ 82.5%	10⁄57 ≈ 17.5%
4	77⁄87 ≈ 88.5%	10⁄87 ≈ 11.5%
5	137⁄147 ≈ 93.2%	10⁄147 ≈ 6.8%
6+	100%	0%
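
The Round 5 table can be reproduced with a few lines of exact arithmetic. In Round 5 Laser only wins when the guess matches, so Missile's indifference condition says that (probability of burning k) × (hit chance for k) is the same constant for every k. The sketch below assumes that structure; the function name round5_laser_win is hypothetical.

```python
from fractions import Fraction

def round5_laser_win(fuel: int) -> Fraction:
    """Laser's equilibrium win chance in Round 5 when Missile holds `fuel` units."""
    # Hit chance when Laser correctly guesses a burn of k in Round 5 (from the first table).
    hits = [Fraction(1)] + [Fraction(max(6 - k, 0), 6) for k in range(1, fuel + 1)]
    if any(h == 0 for h in hits):
        return Fraction(0)  # Missile has a burn that can never be hit
    # Indifference: p_k * h_k equals one constant c for all k, and the p_k sum to 1,
    # so c = 1 / sum(1 / h_k). That constant is Laser's win chance.
    return 1 / sum(1 / h for h in hits)

for fuel in range(0, 7):
    laser = round5_laser_win(fuel)
    print(fuel, 1 - laser, laser)
```

Missile's burn probabilities follow from the same constant: the chance of burning k is c divided by the hit chance for k.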

If Missile starts Round 4 with 1 fuel remaining, they are in big trouble. They can only burn fuel in at most one of the two remaining rounds. Therefore, Laser can win by predicting 0 fuel burned in both Round 4 and Round 5. Laser is guaranteed to destroy the missile on one of those two rounds. Missile must start Round 4 with at least 2 fuel remaining if they are to have a chance of winning.

Since the game in each round depends only on the state of Missile’s remaining fuel and not on the specific choices of how that state came to be, we can simplify the analysis of Round 4’s payoff matrix by using each player’s probability of winning Round 5 as their scores in Round 4. Note that Missile does not have the option of burning all their fuel in Round 4, since starting Round 5 with 0 fuel is a guaranteed loss for them, and Laser knows it.

Payoff matrix for Round 4 with 2 fuel remaining
	Predict 0	Predict 1
Burn 0	0, 1	27⁄37, 10⁄37
Burn 1	6⁄11, 5⁄11	2⁄11, 9⁄11

To compute Missile’s strategy, we define p and q as before. This time Missile needs to solve the equations p + 5⁄11 ⁢q = 10⁄37 ⁢p + 9⁄11 ⁢q and p + q = 1. The solution has Missile burn 0 fuel with a probability of 148⁄445 and burn 1 fuel with a probability of 297⁄445, which is roughly a 1⁄3–2⁄3 split. This gives Missile a 162⁄445 chance of winning, or about 36.4%.

Meanwhile, Laser needs to solve the equations 27⁄37 ⁢q = 6⁄11 ⁢p + 2⁄11 ⁢q and p + q = 1. The solution has Laser predict 0 fuel with probability 223⁄445 and predict 1 fuel with probability 222⁄445, which is nearly evenly split. This gives Laser a 283⁄445 chance of winning, or about 63.6%.
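
Both pairs of equations are instances of the standard closed form for a fully mixed 2×2 game. The sketch below assumes such an equilibrium exists (it does not check for saddle points) and works from Missile's payoffs only; solve_2x2 is a hypothetical helper name.

```python
from fractions import Fraction as F

def solve_2x2(a, b, c, d):
    """Fully mixed equilibrium of a 2x2 game with row-player payoffs [[a, b], [c, d]]."""
    denom = a - b - c + d
    x = (d - c) / denom          # probability that the row player picks row 0
    y = (d - b) / denom          # probability that the column player picks column 0
    value = x * a + (1 - x) * c  # row player's payoff against column 0 (same against column 1)
    return x, y, value

# Missile's win chances for Round 4 with 2 fuel: rows are burns 0 and 1,
# columns are predictions 0 and 1.
burn0, predict0, missile_win = solve_2x2(F(0), F(27, 37), F(6, 11), F(2, 11))
print(burn0, predict0, missile_win, 1 - missile_win)
```

This prints 148⁄445, 223⁄445, 162⁄445, and 283⁄445, matching the values above.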

In Round 5, each player’s individual payoff matrix was symmetric, which led to Missile and Laser having identical strategies. In Round 4, the individual players’ payoff matrices are no longer symmetric, and Missile and Laser end up with different strategies. Laser picks a nearly 50–50 split because the differences of Laser’s column scores, 5⁄11 − 1 vs. 9⁄11 − 10⁄37, are nearly equal in magnitude. Missile, by contrast, picks a 1⁄3–2⁄3 split because the differences of Missile’s row scores, 27⁄37 − 0 vs. 2⁄11 − 6⁄11, differ in magnitude by close to a factor of two.

We can proceed as before, using linear algebra to compute equilibrium strategies for Round 4 with various states of remaining fuel for the missile. However, we run into a problem when Missile has 4 fuel remaining.

Payoff matrix for Round 4 with 4 fuel remaining
	Predict 0	Predict 1	Predict 2	Predict 3
Burn 0	0, 1	77⁄87, 10⁄87	77⁄87, 10⁄87	77⁄87, 10⁄87
Burn 1	47⁄57, 10⁄57	47⁄171, 124⁄171	47⁄57, 10⁄57	47⁄57, 10⁄57
Burn 2	27⁄37, 10⁄37	27⁄37, 10⁄37	27⁄74, 47⁄74	27⁄37, 10⁄37
Burn 3	6⁄11, 5⁄11	6⁄11, 5⁄11	6⁄11, 5⁄11	4⁄11, 7⁄11

Let us try to solve for Laser’s equilibrium strategy, the probability distribution where Missile’s outcome is the same no matter what move they make. We let p, q, r, and s be the probabilities of predicting 0 through 3 fuel burned, respectively. In addition to having p + q + r + s = 1, we require

77⁄87 ⁢(q + r + s),
47⁄57 ⁢(p + r + s) + 47⁄171 ⁢q,
27⁄37 ⁢(p + q + s) + 27⁄74 ⁢r, and
6⁄11 ⁢(p + q + r) + 4⁄11 ⁢s

all be equal to each other. Solving this system of equations gives us

p ≈ 34.4%,
q ≈ 44.3%,
r ≈ 40.8%, and
s ≈ −19.5%.

Apparently, predicting 3 fuel used is such a terrible move for Laser that our “optimal” solution wants us to predict it with a negative 19.5% probability! Unfortunately, Laser cannot actually select moves with negative probability. We have to add constraints to our acceptable solutions to ensure all probabilities are non-negative.
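
The negative probability is easy to reproduce by solving the five equations exactly. The sketch below uses a small hand-written Gauss-Jordan routine over fractions; it is not necessarily how the numbers above were obtained. The unknown v stands for Missile's common payoff and is introduced here only for the calculation.

```python
from fractions import Fraction as F

def solve(a, b):
    """Solve the square linear system a.x = b exactly by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Find a row with a non-zero entry in this column and move it into place.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [x / m[col][col] for x in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n] for row in m]

# Missile's win chance for each (burn, prediction) pair, from the payoff matrix above.
missile = [
    [F(0),      F(77, 87),  F(77, 87), F(77, 87)],
    [F(47, 57), F(47, 171), F(47, 57), F(47, 57)],
    [F(27, 37), F(27, 37),  F(27, 74), F(27, 37)],
    [F(6, 11),  F(6, 11),   F(6, 11),  F(4, 11)],
]

# Each row's payoff minus v is zero, and the four probabilities sum to one.
a = [row + [F(-1)] for row in missile] + [[F(1)] * 4 + [F(0)]]
b = [F(0)] * 4 + [F(1)]
p, q, r, s, v = solve(a, b)
print([round(float(x), 3) for x in (p, q, r, s, v)])
```

The fourth entry, s, comes out negative, which is exactly the problem described above.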

Adding linear constraints to our problem brings us into the realm of linear programming. Since we are entering this realm, we can take this opportunity to compute the minimax solution for each player. For Laser, the minimax solution is to compute a probability distribution that minimizes Missile’s score, i.e., their probability of winning, which we will denote by z, subject to the constraint that Missile will choose the move that maximizes their score for that distribution. This leads to the following system of linear constraints:

77⁄87 ⁢(q + r + s) ≤ z
47⁄57 ⁢(p + r + s) + 47⁄171 ⁢q ≤ z
27⁄37 ⁢(p + q + s) + 27⁄74 ⁢r ≤ z
6⁄11 ⁢(p + q + r) + 4⁄11 ⁢s ≤ z
0 ≤ p
0 ≤ q
0 ≤ r
0 ≤ s
p + q + r + s = 1

where we want to minimize z.

Using linear programming, we can optimize this system. The optimal solution is

p ≈ 30.5%,
q ≈ 38.1%,
r ≈ 31.4%,
s ≈ 0%, and
z ≈ 61.5%.

That is, Laser’s strategy is to predict 0 fuel burned 30.5% of the time, predict 1 fuel burned 38.1% of the time, predict 2 fuel burned 31.4% of the time, and never predict 3 fuel burned. This lets Missile win at most 61.5% of the time, or equivalently, it lets Laser win at least 38.5% of the time.

Is this minimax strategy really an optimal strategy? Let us look at Missile’s minimax strategy. For Missile, we need to optimize the following system of linear constraints:

p + 10⁄57 ⁢q + 10⁄37 ⁢r + 5⁄11 ⁢s ≤ z
10⁄87 ⁢p + 124⁄171 ⁢q + 10⁄37 ⁢r + 5⁄11 ⁢s ≤ z
10⁄87 ⁢p + 10⁄57 ⁢q + 47⁄74 ⁢r + 5⁄11 ⁢s ≤ z
10⁄87 ⁢p + 10⁄57 ⁢q + 10⁄37 ⁢r + 7⁄11 ⁢s ≤ z
0 ≤ p
0 ≤ q
0 ≤ r
0 ≤ s
p + q + r + s = 1

to minimize z.

The optimal solution is

p ≈ 19.9%,
q ≈ 32.0%,
r ≈ 48.2%,
s ≈ 0%, and
z ≈ 38.5%.

That is, Missile’s strategy is to burn 0 fuel 19.9% of the time, burn 1 fuel 32.0% of the time, burn 2 fuel 48.2% of the time, and never burn 3 fuel. This lets Laser win at most 38.5% of the time, or equivalently, it lets Missile win at least 61.5% of the time.

This pair of strategies is optimal because Laser wins at least 38.5% of the time by their strategy, and Missile wins at least 61.5% of the time by their strategy, which adds up to 100%. It turns out that for zero-sum games, the minimax, maximin, and Nash equilibrium strategy sets are all identical, and furthermore, these strategies form a convex set. Using a general-purpose Nash equilibrium solver will produce the same pair of optimal strategies.
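
The optimality argument can be checked exactly without a linear programming library, once we take for granted that neither player uses move 3. The sketch below makes that assumption explicit, solves the two indifference systems on the remaining moves, and then compares what each side can guarantee. The helper names are hypothetical, and this is a verification of the stated solution, not a general LP solver.

```python
from fractions import Fraction as F

def solve(a, b):
    """Solve the square linear system a.x = b exactly by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [x / m[col][col] for x in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n] for row in m]

def equalising_mix(payoff, k):
    """Mix over the first k columns that gives the first k rows the same payoff."""
    a = [payoff[i][:k] + [F(-1)] for i in range(k)] + [[F(1)] * k + [F(0)]]
    b = [F(0)] * k + [F(1)]
    *mix, _ = solve(a, b)
    return mix + [F(0)] * (len(payoff[0]) - k)  # unused moves get probability 0

# Missile's win chance for each (burn, prediction) pair, from the payoff matrix above.
missile = [
    [F(0),      F(77, 87),  F(77, 87), F(77, 87)],
    [F(47, 57), F(47, 171), F(47, 57), F(47, 57)],
    [F(27, 37), F(27, 37),  F(27, 74), F(27, 37)],
    [F(6, 11),  F(6, 11),   F(6, 11),  F(4, 11)],
]
transposed = [list(col) for col in zip(*missile)]

laser_mix = equalising_mix(missile, 3)       # Laser's mix over predictions 0..2
missile_mix = equalising_mix(transposed, 3)  # Missile's mix over burns 0..2

# The most Missile can get against Laser's mix, and the least Missile gets with its own mix.
upper = max(sum(w * x for w, x in zip(row, laser_mix)) for row in missile)
lower = min(sum(w * x for w, x in zip(col, missile_mix)) for col in transposed)
print([round(float(x), 3) for x in laser_mix], [round(float(x), 3) for x in missile_mix])
print(float(upper), float(lower))
```

If the two bounds coincide and all probabilities are non-negative, neither player can do better, which is the certificate described in the paragraph above.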

Still, these strategies surprised me. Laser is not even aiming at Missile burning 3 fuel. Shouldn’t Missile avoid being hit by Laser entirely by choosing to burn 3 fuel? But Missile’s optimal strategy also says to avoid burning 3 units of fuel. Why?

Upon closer examination, we see that with Missile’s computed optimal strategy, they have a 61.5% chance of winning. If Missile were to burn 3 fuel, yes, they would avoid being hit by Laser in Round 4. However, they would begin Round 5 with only 1 remaining fuel. In that state they would only have a 54.5% chance of winning, worse odds than their optimal strategy that avoids burning 3 fuel.

Laser is not aiming at Missile burning 3 fuel because Laser would love for Missile to burn 3 fuel. Doing so would increase Laser’s odds of winning from 38.5% to 45.5%. We see that Laser’s strategy does not entirely rule out all consequences of Missile’s choices. It only makes Laser indifferent to Missile’s choices within the support of Missile’s optimal mixed strategy. Technically, an opponent’s choices can still affect the outcome of the game, but only through moves that benefit the other player.

Continuing with linear programming, we can fill out the table for the probability of winning for Round 4.

Probability table for Round 4
Remaining Fuel	Missile Win Probability	Laser Win Probability
1-	0%	100%
2	≈ 36.4%	≈ 63.6%
3	≈ 50.5%	≈ 49.5%
4	≈ 61.5%	≈ 38.5%
5	≈ 69.9%	≈ 30.1%
6	≈ 76.4%	≈ 23.6%
7	≈ 81.6%	≈ 18.4%

Continuing this way, we can work backwards and compute probability tables for all the rounds until we reach round 1.

Probability table for Round 1
Starting Fuel	Missile Win Probability	Laser Win Probability
7	1005005076075⁄3110959445024 ≈ 32.3%	2105954368949⁄3110959445024 ≈ 67.7%

In conclusion, we found that playing optimally, Missile has an approximately 32.3% chance of winning, which is a little higher than the 25% estimate given by Ethan. I leave it as an exercise to determine the fairest amount of starting fuel for Missile.

Accessing external resources reliably with Bazel

2026-04-02T00:00:00Z

We say that a system is reliable if it continues to function correctly when events outside the system affect it. Many factors can impact the reliability of Bazel builds, especially dependencies on external services. In this post, we’ll focus on what can go wrong when your build needs resources you don’t control and what you can do to reduce the risk of build failures.

Depending on external resources

Some build actions triggered by Bazel might be accessing resources that are external to your organization. For Bazel builds, this typically applies to build rules (to build your first-party code) or repository rules (utilities and tools those rules might need). When Bazel starts a build, it emits data about network requests, and you need to make those external requests visible so that you know what external resources your builds depend on. You can access this information via the Build Event Protocol (BEP) which can be written to disk or, if you operate a remote cache service, your provider might have a BEP viewer. You can also use the --experimental_repository_resolved_file flag to produce resolved information about all Starlark repository rules that were executed.

Building a target that depends on a repository rule such as this:

http_archive(
    name = "yq_cli",
    build_file = "@//tools/yq:BUILD.bazel.gen",
    sha256 = "7583d471d9bfe88e32005e9d287952382df0469135f691e044443f610d707f4d",
    url = "https://github.com/mikefarah/yq/releases/download/v4.47.1/yq_linux_amd64.tar.gz",
)

would result in the following build event (the snippet below is copied from the BEP output):

...
children {
  fetch {
    url: "https://github.com/mikefarah/yq/releases/download/v4.47.1/yq_linux_amd64.tar.gz"
  }
}
...

To get an idea of what kinds of artifacts a Bazel build for a reasonably large project might fetch, let’s build a few open-source projects — Envoy, Redpanda, and datadog-agent. These are some of the domains from which at least one resource was fetched when building all targets from these projects:

bcr.bazel.build             cdn.azul.com                dl.google.com
dl.grafana.com              dl.min.io                   download.gnome.org
files.pythonhosted.org      gcr.io                      github.com
go.dev                      mirror.bazel.build          mirrors.kernel.org
raw.githubusercontent.com   pkgconfig.freedesktop.org   pypi.org
static.crates.io            static.rust-lang.org        s3.amazonaws.com
www.antlr.org               www.colm.net                www.lua.org
www.sqlite.org              www.tcpdump.org
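
A list like this can be collected from a BEP file with a short script. The sketch below assumes the build was run with --build_event_json_file so that the file holds one JSON object per line, and that fetch events appear as a "fetch" object with a "url" field, mirroring the snippet shown earlier. The function names are hypothetical.

```python
import json
from urllib.parse import urlsplit

def fetch_urls(node):
    """Yield every URL found under a "fetch" key anywhere in a decoded build event."""
    if isinstance(node, dict):
        fetch = node.get("fetch")
        if isinstance(fetch, dict) and "url" in fetch:
            yield fetch["url"]
        for value in node.values():
            yield from fetch_urls(value)
    elif isinstance(node, list):
        for item in node:
            yield from fetch_urls(item)

def fetched_hosts(bep_lines):
    """Collect the distinct hostnames from an iterable of JSON-encoded build events."""
    hosts = set()
    for line in bep_lines:
        if line.strip():
            for url in fetch_urls(json.loads(line)):
                hosts.add(urlsplit(url).hostname)
    return sorted(host for host in hosts if host)

# Two made-up events in the assumed shape, just to show the output format.
sample = [
    '{"id": {"fetch": {"url": "https://github.com/mikefarah/yq/releases/download/v4.47.1/yq_linux_amd64.tar.gz"}}}',
    '{"children": [{"fetch": {"url": "https://pypi.org/simple/"}}]}',
]
print(fetched_hosts(sample))
```

For a real build, pass an open file handle for the BEP file in place of sample.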

While most of your external dependencies are going to be declared in build metadata files such as MODULE.bazel (or legacy WORKSPACE), some network requests are going to be made by build targets such as genrules (e.g., by calling curl) or toolchains (e.g., a pip call to the PyPI index). We’ll see a worked example of this later in the post.

Common problems

In general, it is advised to rely on MODULE.bazel or WORKSPACE mechanisms for accessing external dependencies instead of doing so via build or test actions. Bazel by design lacks support and features for downloads to take place within build actions, and when attempting to interact with external systems this way, you will be limited in how you can manage and account for those requests.

Therefore, when building, the complete list of accessed online resources (those that are accounted for by BEP and those that are not) might be much longer. After doing a full build, it might be helpful to audit the network requests made, both to discover what resources were fetched and to compile a complete inventory of the external hosts your build depends on.

Given these external dependencies, these are common problems that could happen to any of them:

Outages: no service provides a 100% uptime guarantee and some providers, sadly, have incidents all too often.
Removed artifacts: an archive file might be deleted due to retention policy.
Rate limiting: many concurrent builds coming from the same cluster can accidentally trigger API or download rate limits, especially with public registries.
Checksum drift: content of an artifact at a given URL can change, intentionally or maliciously, causing checksum mismatches.

This post focuses on strategies to either remove these external dependencies from the critical path, or make failures graceful and recoverable.

Remedies

The remedies below are intentionally “stackable”: you can start with low-effort safeguards (e.g., checksums and retries) and progress toward stronger guarantees (e.g., mirrors and network blocking). If you’re skimming, you can pick one external host that concerns you (e.g., github.com or pypi.org) and follow the options that would let you depend on it more reliably.

Using checksums

External resources may not only vanish or become inaccessible, but also change in place. Unless the provider gives a strong guarantee, any artifact you download might change its contents, for example when a provider updates a release in place (or when someone maliciously injects code). To guard against this, pair every artifact you download from the Internet with its SHA-256 digest. The sha256 attribute is optional when declaring dependencies on external resources such as with http_archive, but omitting it for remote files is considered a security risk.
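
If an artifact has no published digest, you can compute one from a copy you trust. The following is a minimal sketch using only the standard library; sha256_of is a hypothetical helper name.

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string is what goes into the sha256 attribute of the corresponding http_archive declaration.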

Using GitHub releases

As the majority of build rules and open-source tools used by projects built with Bazel are hosted on GitHub, there are some special concerns that are worth mentioning.

A public GitHub repository might be moved, deleted, or become private (this happened in 2025 with rules_mypy). If you do have to rely on external rulesets hosted on GitHub, make sure they are hosted under the bazel-contrib organization (or help get them migrated at some point) to avoid surprises.

Checksums of dynamically generated archives might change; this has caused Bazel outages before, in 2023. There was some confusion about whether the stability of archives is guaranteed or not. There might be some edge cases such as when a Git repository is renamed, and since Bazel builds rely on stability of archives (for reproducibility and caching among other reasons) it might be best to play it safe and only use releases instead of using source downloads.

Using retries

It is possible that some of your dependencies need to be obtained from an online resource that is known to be unstable. What’s worse, you may not even be able to cache it (or host yourself): for example, imagine needing to download a short-lived license file for a commercial product from the manufacturer’s server when starting a build. To make downloading this file (via a repository rule) more likely to succeed, consider using the --experimental_repository_downloader_retries flag to specify the maximum number of attempts to retry upon a download error.

Placing binaries under version control

This varies a lot between organizations and the programming languages concerned, but the approach adopted by most organizations is to check in the source code that is used to build a binary, and not the binary itself.

Many engineers would be strongly opposed to checking in any binary, as Version Control Systems (VCS) are designed and optimised for managing the source code. However, it is known that some organizations choose to place binary libraries that are external dependencies of their first-party code under version control. This has been seen occasionally in Java projects where .jar libraries (that nowadays can be managed with Maven / Gradle) were checked in. Today, this, arguably, might make sense only for legacy projects, air-gapped or classified networks, and for vendored native libraries that are hard to rebuild.

Unless you are able to provide top-notch automation for keeping your third-party dependencies checked in under version control up-to-date, patched, and compliant with any licensing constraints, it might be best to rely on a private artifact cache for hosting third-party dependencies.

Internal repository manager

As your organization grows, you will likely need to invest in a tool that would allow you to organize your resources such as external tools and third-party code packages into repositories. There are lots of commercial solutions on the market such as JFrog Artifactory, Sonatype Nexus, AWS CodeArtifact, and GitLab package registry to name a few.

With a repository manager, once you discover a dependency on an external artifact, you would upload it manually to your internal binary repository and update your build metadata accordingly:

# MODULE.bazel
http_archive(
  name = "tool",
  ...
  urls = [
    "https://artifacts.company.com/artifactory/project/tools/tool-1.2.3.tar.gz",
    "https://www.project.org/source/1.2.3/tool-1.2.3.tar.gz",
  ]
)

URLs from the urls attribute are tried in order until one succeeds. It is recommended to list the internal binary repository artifact first: if that internal mirror happens to be down, your build would still succeed provided that the upstream host (project.org in this case) is up and running.

Bazel downloader configuration

You could also let your binary repository manager be the only place where Bazel builds can fetch resources from if you don’t want to depend on external artifacts in any way at all. This can be achieved by providing a configuration file for the remote downloader using the --downloader_config flag.

For example, a simple use case may be to block GitHub and instead rewrite fetches to go to an Artifactory instance. This can be done with the following downloader configuration:

rewrite github.com/([^/]+)/([^/]+)/releases/download/([^/]+)/(.*) artifacts.my-company.com/artifactory/github-releases-mirror/$1/$2/releases/download/$3/$4

# if you still have to rely on dynamically generated archives instead of releases
rewrite github.com/([^/]+)/([^/]+)/archive/(.+).(tar.gz|zip) artifacts.my-company.com/artifactory/github-releases-mirror/$1/$2/archive/$3.$4

However, support for using Bazel’s downloader needs to be enabled in Bazel rulesets by their authors. For instance, in rules_python, the pip extension now supports pulling information from a PyPI compatible mirror which means that the Bazel downloader can be used for downloading Python wheels.

Take a look at some downloader configurations used in other projects (e.g., 1, 2, 3) to explore how others set up access to external resources and learn the nuances of the configuration declaration syntax.
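
To get a feel for what the first rewrite rule shown earlier does to a concrete URL, it can be emulated outside Bazel. This sketch assumes that the pattern is matched against the whole URL with its scheme stripped, and it translates Bazel's $1-style back-references into Python's \1 form. It only approximates the behaviour and is no substitute for testing the real configuration.

```python
import re

# First rewrite rule from the configuration above; "$1" becomes "\1" in Python.
PATTERN = r"github.com/([^/]+)/([^/]+)/releases/download/([^/]+)/(.*)"
REPLACEMENT = (
    r"artifacts.my-company.com/artifactory/github-releases-mirror/"
    r"\1/\2/releases/download/\3/\4"
)

def rewrite(url: str) -> str:
    """Apply the rule to a URL, leaving URLs that do not match untouched."""
    scheme, sep, rest = url.partition("://")
    match = re.fullmatch(PATTERN, rest)  # assumption: the whole URL must match
    if match is None:
        return url
    return scheme + sep + match.expand(REPLACEMENT)

print(rewrite("https://github.com/mikefarah/yq/releases/download/v4.47.1/yq_linux_amd64.tar.gz"))
```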

Blocking network requests

Additional control of network access can be achieved by blocking some network requests in CI agents using custom firewall rules or other tools of that nature. However, as mentioned earlier, Bazel’s downloader configuration can only rewrite or block requests that Bazel is aware of. This means that not all network traffic in a Bazel build is Bazel-managed traffic.

To illustrate this, let’s declare a dependency on the gawk binary. When running gawk, its sources are going to be fetched from the GNU FTP server. Let’s also add a genrule that will download an archive from the same FTP server:

# MODULE.bazel
bazel_dep(name = "gawk", version = "5.3.2")

# BUILD.bazel
genrule(
    name = "diffutils",
    outs = ["diffutils-3.12.tar.xz"],
    cmd = """wget -O "$@" https://ftp.gnu.org/gnu/diffutils/diffutils-3.12.tar.xz""",
)

We’ll configure Bazel to use a downloader configuration that blocks fetches from that FTP server:

# bazel_downloader.cfg
block ftp.gnu.org

# .bazelrc
common --downloader_config=bazel_downloader.cfg

When attempting to run the gawk binary from the ruleset, an error is raised, as expected, since access to the server is blocked:

$ bazel run @gawk
...
ERROR: java.io.IOException: Configured URL rewriter blocked all URLs:
[https://ftp.gnu.org/gnu/gawk/gawk-5.3.2.tar.xz]

However, building the genrule still succeeds because the downloader configuration does not apply here:

$ bazel build //src:diffutils
...
INFO: From Executing genrule //src:diffutils:
--2026-01-19 10:48:54--  https://ftp.gnu.org/gnu/diffutils/diffutils-3.12.tar.xz
Resolving ftp.gnu.org (ftp.gnu.org)... 209.51.188.20, 2001:470:142:3::b
Connecting to ftp.gnu.org (ftp.gnu.org)|209.51.188.20|:443... connected.
HTTP request sent, awaiting response... 200 OK
Saving to: 'bazel-out/k8-fastbuild/bin/src/diffutils-3.12.tar.xz'

External network requests of this nature are hard to audit in a large codebase since they won’t show up as structured fetch events in BEP output. To mitigate this, prefer using repository rules and Bzlmod extensions for any downloads instead of ad hoc shell commands. Going a step further, you might want to consider forbidding direct calls to applications that might make network requests (such as curl or wget) in genrule targets, unless explicitly approved. Where unavoidable, configure targets to access internal repositories instead of public endpoints.
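
A first pass at finding such calls can be as simple as a text scan. The sketch below is deliberately crude: it does not parse Starlark, it only flags lines that mention curl or wget, and the helper name find_network_calls is hypothetical. A real policy check would more likely use bazel query or a linter.

```python
import re

# Tools that commonly make network requests from shell commands.
NETWORK_TOOLS = re.compile(r"\b(curl|wget)\b")

def find_network_calls(build_text: str):
    """Return (line number, stripped line) pairs that mention curl or wget."""
    return [
        (number, line.strip())
        for number, line in enumerate(build_text.splitlines(), start=1)
        if NETWORK_TOOLS.search(line)
    ]

example = 'genrule(\n    name = "x",\n    cmd = "wget -O $@ https://example.org/x",\n)\n'
print(find_network_calls(example))
```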

Sandboxing

When triggering builds in a Bazel sandbox, they are run in a container (using Linux Namespaces) to isolate the build actions from the host. In addition to making your entire filesystem read-only (except for the sandbox directory), you can also forbid actions from accessing the network. This is useful in some scenarios when you want to confirm that a build doesn’t make any network requests such as when running unit tests or integration tests that are not supposed to make any network calls. See Bazel tags requires-network and block-network to learn how to control network access for individual build targets.

Keep in mind that cached results of build actions can still be fetched even when blocking the network in a sandbox. So if artifacts needed for a build were uploaded to the Bazel cache previously, you won’t know whether a particular build needs any network resources unless you run the build without cache access. Also, none of the sandbox flags affect any cache as it’s expected that these flags should not affect the output of hermetic actions and making them part of a cache key would worsen the effectiveness of the cache.

With the network disabled in a sandbox, the genrule target we declared earlier fails to build:

$ bazel build //src:diffutils --spawn_strategy=linux-sandbox --nosandbox_default_allow_network
...
ERROR: Executing genrule //src:diffutils failed: (Exit 4): bash failed: ...
Resolving ftp.gnu.org (ftp.gnu.org)... failed: Temporary failure in name resolution.
wget: unable to resolve host address 'ftp.gnu.org'
Target //src:diffutils failed to build
...

Mirrors

Since Bazel 8.4, you can also use the --module_mirrors flag to mirror the source archives. To take advantage of this, add --module_mirrors=https://bcr.cloudflaremirrors.com in your .bazelrc file. Keep in mind that this only applies to registry sources and not to other resources fetched by Bazel (such as downloads happening in the repository rules context).

Note that for Bazel builds, the Bazel Central Registry (BCR) only stores metadata for a Bazel module; the actual artifacts are usually fetched from URLs that point to files hosted online (most often on GitHub).

BCR itself is a sort of external dependency for your builds, too. Even though it’s hosted on production-grade infrastructure at Google, it can still be impacted by outages and operational mishaps. The SSL certificate for mirror.bazel.build has expired at least twice, causing worldwide CI breakages: once in 2022 and again in 2025. Refer to Postmortem for bazel.build SSL certificate expiry to learn more.

Configuring Bazel to use https://bcr.cloudflaremirrors.com as a mirror for modules from the BCR helps, but the Cloudflare mirror doesn’t cover the registry itself. So if you want to go the extra mile, you might also consider setting up your own BCR index registry and pointing Bazel at that instead. But if this is not feasible, write a playbook for incident response around build outages caused by external dependencies, so teams don’t have to improvise under pressure.

Pull-through cache

If your repository manager supports it, you can let your builds download external resources while saving every fetched resource into the cache as well. On subsequent builds, the resources are fetched from the cache, if available. This lets you turn random external downloads into a controlled internal dependency without requiring you to pre-vendor everything up front.

If your CI agents are in the same network or cloud region (depending on your infrastructure setup), this could also speed up the builds by having downloads complete faster. Not relying on external resources also makes your Bazel builds a lot more secure, as your CI agents will only download data from a trusted source.

If using an off-the-shelf solution, such as the popular JFrog Artifactory, is not possible, there are some other options. Bazel picks up proxy addresses from the HTTP_PROXY and HTTPS_PROXY environment variables and uses these to download files over HTTP and HTTPS, respectively (if specified). This means you might have success with caching proxy solutions such as Squid and Charles or by combining Nginx and Varnish HTTP reverse proxies. Routing requests through a proxy might also help to avoid rate limiting issues since the external service will see fewer direct requests.

With this configuration, your downloader configuration file would look something like this:

# point all downloads at the mirror
rewrite (.*) {caching-service-url}/$1

# use the original location if the mirror is down
rewrite (.*) $1
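
To make the ordering of the two rules concrete, here is a small Python model of how a list of candidate URLs could be derived from them. It is only an illustration, not Bazel's implementation: the host `cache.internal.example` stands in for `{caching-service-url}`, `candidate_urls` is a hypothetical helper, and the sketch assumes that patterns are matched against the URL without its scheme and that every matching rule contributes one candidate, tried in file order.

```python
import re

# Hypothetical mirror host; stands in for {caching-service-url} above.
MIRROR_HOST = "cache.internal.example"

# (pattern, replacement) pairs in file order, modelled on the two rewrite rules.
REWRITES = [
    (r"(.*)", MIRROR_HOST + r"/\1"),  # point all downloads at the mirror
    (r"(.*)", r"\1"),                 # use the original location as a fallback
]


def candidate_urls(url):
    """Return the URLs to try, in order, for one requested download."""
    # Assumption: rules see the URL without its scheme, and the scheme
    # of the original request is reused for every rewritten candidate.
    scheme, _, rest = url.partition("://")
    candidates = []
    for pattern, replacement in REWRITES:
        match = re.fullmatch(pattern, rest)
        if match is None:
            continue
        rewritten = scheme + "://" + match.expand(replacement)
        if rewritten not in candidates:
            candidates.append(rewritten)
    return candidates


for candidate in candidate_urls("https://example.com/archive.tar.gz"):
    print(candidate)
```

Under these assumptions the mirror is always tried first and the original host only acts as a fallback, which is the behaviour the two comments in the configuration describe.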

For a completely custom solution, take a look at the Bazel downloader mirror from Monogon which can be used to mirror Bazel dependencies to a cloud bucket storage such as S3 or GCS. Bazel’s remote asset API lets you use an existing remote cache (content-addressable storage: CAS) as a downloader cache as well. The cache provider service needs to support it, but many existing solutions, both commercial and open-source ones, are compatible.

The --experimental_remote_downloader flag can be specified to provide a Remote Asset API endpoint URI to be used as a remote download proxy. To get started, consider using bazel-remote, which has out-of-the-box support for this use case. Make sure to provide the sha256 for the assets to fetch so that they can be cached just like any other CAS object. A remote caching service will automatically download the assets from the URL if they are not found in the CAS and cache them thereafter.
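
If an archive does not yet have a checksum recorded, one way to obtain it is to hash a copy you have downloaded and verified yourself. The following is a minimal sketch using only Python's standard library; `sha256_of_file` is a hypothetical helper name, and the temporary file merely stands in for a real archive. The resulting hex string is what goes into a checksum field such as the `sha256` attribute of `http_archive`.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a throwaway file; replace the path with a real archive.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
demo_digest = sha256_of_file(tmp.name)
os.unlink(tmp.name)
print(demo_digest)
```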

Bazel 9 adds support for remote repository caches which make Bazel builds (at least those requiring previously cached assets) extra resilient to external access issues. During outages of external hosting services, those organizations that didn’t have a central repository manager where repository rules artifacts could be stored had to extract files from cache directories on local developer machines and save them to an accessible location within the internal network.

Now these artifacts will be saved into a remote cache similarly to build output results. To confirm that your remote repository cache works as expected, you can use the --repository_disable_download flag after doing a clean build (which should succeed as it will reuse the remote cache entries uploaded in the previous build).

Chaos testing

Finally, instead of waiting for the next GitHub outage, you can test your resilience by intentionally breaking access to certain external hosts. In a staging CI environment, temporarily block access to key external systems with firewall rules and verify that your mirrors and caches are used as expected, builds either still succeed, or fail fast with clear error messages, and your runbooks are correct and sufficient.

Conclusion

Bazel projects often depend on external services in subtle ways, and any instability or change in those services can break otherwise healthy builds. You can significantly improve build reliability by making all downloads explicit and verifiable, routing them through managed infrastructure, and tightening how and when network access is allowed. Resilient Bazel builds come from treating external dependencies as first‑class operational risks and turning unpredictable third‑party failures into controlled, recoverable events.

Second pre-release of hs-bindgen

2026-03-27T00:00:00Z

With heartfelt thanks to the many people who have already tried hs-bindgen and given us feedback, we have steadily been working towards the first official release (see Contributors for the full list). In case you missed the announcement of the first alpha, hs-bindgen is a tool for automatic construction of Haskell bindings for C libraries: just point it at a C header and let it handle the rest. Because we have fixed some critical bugs in this alpha release, but we’re not quite ready yet for the first full official release, we have tagged a second alpha release. In the remainder of this blog post we will briefly highlight the most important changes; please refer to the CHANGELOG.md of hs-bindgen and of hs-bindgen-runtime for the full list of changes, as well as for migration hints where we have introduced some minor backwards incompatible changes.

Bugfixes

The most important fixes for bugs in the generated code are:

The implementation of peek and poke for bitfields was broken, which could lead to segfaults.
Duplicate record fields are now usable also in Template Haskell mode.
Patterns for unsigned enums now get the right value.

We have also resolved a number of panics during code generation, but those would not have resulted in incorrect generated code (merely in no code being generated at all).

New features

Implicit fields arise when one struct (or union) is nested in another, without any field name or tag:
```
struct outer {
  int x;
  struct {
    int y;
    int z;
  };
};
```
We now support such implicit fields; both the inner (anonymous) struct as well as the corresponding field of the outer struct will be named after the first field of the inner struct ¹:
```
data Outer = Outer {
    x :: CInt
  , y :: Outer_y
  }

data Outer_y = Outer_y {
    y :: CInt
  , z :: CInt
  }
```
For this particular case we could also have chosen to flatten the structure and add y and z directly to Outer, but that does not work in all cases (for example, when we have an anonymous struct inside a union), so instead we opt for consistency and always generate an explicit type for the inner struct.

Unnamed bit-field declarations, which are used to control padding, are now supported:

struct bar {
  signed char x : 3;
  signed char   : 3; // Explicit padding
  signed char y : 2;
};

We used to distinguish between parse predicates (which files should hs-bindgen parse at all?) and selection predicates (for which C declarations should we generate Haskell declarations?). This was confusing, and as we are getting better at skipping over declarations with unsupported features (and that list is dwindling anyway), parse predicates are not that useful anymore. Parse predicates have therefore been removed entirely; we simply always parse everything (selection predicates are still very much an important feature, of course).

Some infrastructure for and around binding specifications has been improved. For example, we now distinguish between macros and non-macros of the same name, and our treatment of arrays has changed slightly. For example, given

typedef char T [];
void foo (T xs);

we now generate

foo :: Ptr (Elem T) -> IO ()

We do not use Ptr CChar, because T might have an existing binding in another library (with an external binding specification), and we don’t know what the type of the elements of T are (it could for example be some newtype around CChar). Elem is a member of a new IsArray class, part of the hs-bindgen-runtime.

Top-level anonymous enums are now supported. For example,

enum {A, B};

results in

pattern A :: CUInt
pattern A = 0
pattern B :: CUInt
pattern B = 1

(Normally an enum results in a newtype around the enum’s underlying type, and the patterns are for that newtype instead.)

We now generate bindings for static global variables (such globals are sometimes used in headers that also contain static function bodies).

All definitions required by the generated code are now (re-)exported from hs-bindgen-runtime, so that it becomes the only package dependency that needs to be declared (no need for ghc-prim or primitive anymore).

This list is not complete; some other less common edge cases have also been implemented.

Conclusions

Although we are still working on some finishing touches before we can release the first official version of hs-bindgen, it is already being put to good use on various projects. There are only a handful of missing C features left, all of which are low-priority edge cases (though if you have a specific use case for any of these, do let us know!). So if you are interested, please do try it out, and let us know if you find any problems. There should be no major breaking changes between now and the first official release.

¹ This is the version that uses the --omit-field-prefixes option, which generates code that relies on DuplicateRecordFields and OverloadedRecordDot.

GHC 9.12.4 is now available

2026-03-27T00:00:00Z

wz1000 - 2026-03-27

The GHC developers are very pleased to announce the release of GHC 9.12.4. Binary distributions, source distributions, and documentation are available at downloads.haskell.org and via GHCup.

GHC 9.12.4 is a bug-fix release fixing many issues of a variety of severities and scopes, including:

Fixed a critical code generation regression where sub-word division produced incorrect results (#26711, #26668), similar to the bug fixed in 9.12.2
Numerous fixes for register allocation bugs, preventing data corruption when spilling and reloading registers (#26411, #26526, #26537, #26542, #26550)
Fixes for several compiler crashes, including issues with CSE (#25468), the simplifier (#26681), implicit parameters (#26451), and the type-class specialiser (#26682)
Fixed cast worker/wrapper incorrectly firing on INLINE functions (#26903)
Fixed LLVM backend miscompilation of bit manipulation operations (#20645, #26065, #26109)
Fixed associated type family and data family instance changes not triggering recompilation (#26183, #26705)
Fixed negative type literals causing the compiler to hang (#26861)
Improvements to determinism of compiler output (#26846, #26858)
Fixes for eventlog shutdown deadlocks (#26573) and lost wakeups in the RTS (#26324)
Fixed split sections support on Windows (#26696, #26494) and the LLVM backend (#26770)
Fixes for the bytecode compiler, PPC native code generator, and Wasm backend
The runtime linker now supports COMMON symbols (#6107)
Improved backtrace support: backtraces for error exceptions are now evaluated at throw time
NamedDefaults now correctly requires the class to be standard or have an in-scope default declaration, and handles poly-kinded classes (#25775, #25778, #25882)
… and many more

A full accounting of these fixes can be found in the release notes. As always, GHC’s release status, including planned future releases, can be found on the GHC Wiki status.

GHC development is sponsored by:

We would like to thank these sponsors and other anonymous contributors whose on-going financial and in-kind support has facilitated GHC maintenance and release management over the years. Finally, this release would not have been possible without the hundreds of open-source contributors whose work comprises this release.

As always, do give this release a try and open a ticket if you see anything amiss.

Athena Loses a Bet

2026-03-25T18:58:04Z

Athena and Ares argue over human nature, and agree to test three great minds of the age.

First, they approach Aristotle in the Lyceum and propose a bargain. “If you ask it of us, the one you love most in the world will perish, but you will be made rich beyond imagining.” Aristotle barely hesitates. “No,” he says. “To destroy the very purpose of living for the sake of the mere means is the mark of a man who lacks wisdom.”

Next, they approach Plato, finding him pacing in an olive grove of his Academy. They offer the same proposal. “I decline,” he says. “Love allows us to glimpse the ideal of pure beauty, but wealth is an anchor to the material world.”

Finally, they approach Socrates, wandering barefoot in the crowded dusty stalls of the Agora. The gods approach him with the same bargain: “If you ask it of us, Xanthippe, whom you love most in the world, will perish — ”

“I ask it!” he blurts out.

Athena blinks. “You did not even hear the rest. We were going to say you would be given wealth beyond measure.”

Socrates shrugs. “Keep it. This was never about money.”

Millennia later, Athena is still smarting from losing the bet, and she demands a rematch. Searching for another Greek philosopher, they instead find a middle-aged woman writing a novel called Atlas Shrugged. She’s a philosopher, and Atlas was Greek, so that’s close enough.

“If you ask it,” Athena says to her, “we will make you wealthy beyond measure, but then in return, your true love will be taken from you.”

The woman looks up, bored, and asks “Why give me the money if you’re just going to take it right back?”

79: Peter Thiemann

2026-03-22T12:00:00Z

Peter is a professor at the University of Freiburg, and he was doing functional programming right when Haskell got started. So naturally we asked him about the early days of Haskell, and how from the start Peter pushed the envelope on what you could do with the type system and specifically with the type classes, from early web programming to program generation to session types. Come with us on a trip down memory lane!

Haskell ecosystem activities report: December 2025–February 2026

2026-03-19T00:00:00Z

This is the thirtieth edition of our Haskell ecosystem activities report, which describes the work Well-Typed are doing on GHC, Cabal, HLS and other parts of the core Haskell toolchain. The current edition covers roughly the months of December 2025 to February 2026.

You can find the previous editions collected under the haskell-ecosystem-report tag.

Sponsorship

We offer Haskell Ecosystem Support Packages to provide commercial users with support from Well-Typed’s experts while investing in the Haskell community and its technical ecosystem including through the work described in this report. To find out more, read our announcement of these packages in partnership with the Haskell Foundation. We need funding to continue this essential maintenance work!

Many thanks to our Haskell Ecosystem Supporters: Standard Chartered, Channable and QBayLogic, as well as to our other clients who also contribute to making this work possible: Anduril, Juspay and Mercury; and to the HLS Open Collective for supporting HLS release management.

Team

Matthew Pickering announced that he will be leaving the company and moving to a non-Haskell role at the end of March. Working with Matt has been a joy – more than his deep technical insight or sharp intuition, it’s the warmth of his vision for how to work together and his generosity that has made him such a force within the team. He was also a beacon that could rally the community in difficult times, perhaps most memorably with his technical and social contributions in consolidating Haskell IDEs with the creation of the Haskell Language Server. His dedication to tooling has also been an inspiration, with his work on ghc-debug and on profiling an invaluable contribution to our understanding of memory usage of Haskell programs.

The Haskell toolchain team at Well-Typed currently includes:

In addition, many others within Well-Typed contribute to GHC, Cabal, HLS and other open source Haskell libraries and tools. This report includes contributions from Alex Washburn, Duncan Coutts, Wen Kokke and Wolfgang Jeltsch in particular.

We are active participants in community efforts for developing the Haskell language and libraries. Rodrigo joined the GHC Steering Committee in December, alongside Adam Gundry. Wolfgang joined the Core Libraries Committee in February.

Highlights

Interactive step-through debugging

The Haskell Debugger (hdb) has been made more robust and more features were implemented by Rodrigo, Matthew, and Hannes. Most notably, the debugger now:

Displays stack traces for bytecode and compiled code frames (provided the program and dependencies were compiled with -finfo-table-map for the latter)
Displays source locations and callstacks for exception breakpoints
Uses the external interpreter by default
Can be run on GHC itself!

To run hdb you need to use GHC 9.14 and to configure the IDE accordingly. Please refer to the installation instructions. Apart from that, if HLS just works on your codebase, so should the debugger!

Live monitoring using the eventlog

GHC’s eventlog already lets Haskell programs emit rich runtime telemetry, but the workflow has historically been to run the program to completion and inspect the eventlog afterwards. eventlog-live allows us instead to monitor the program as it is running. Wen continued work on this project, taking significant steps towards making it production-ready, including:

extending eventlog-live with support for the OpenTelemetry protocol (#119),
bringing the underlying eventlog-socket library closer to being ready for general use, by adding a testsuite (#27), fixing a litany of issues with the C code (#38), and finalising the user-facing API (#43),
adding support for custom commands in eventlog-socket (#36).

Trees That Grow

The Language.Haskell.Syntax module hierarchy is intended to be a stable, public API for the Haskell AST — one that external tools could eventually depend on without coupling themselves to GHC internals, reducing ecosystem breakage. Right now, that goal is undermined by lingering dependencies on internal modules under the GHC hierarchy.

Alex, with help from Rodrigo, has been systematically removing these edges in the dependency graph:

Language.Haskell.Syntax.Type no longer depends on GHC.Utils.Panic (!15134, #26626).
Language.Haskell.Syntax.Decls no longer depends on GHC.Unit.Module.Warnings (!15146, #26636), nor on GHC.Types.ForeignCall (!15477, #26700) or GHC.Types.Basic (!15265, #26699).
Language.Haskell.Syntax.Binds no longer depends on GHC.Types.Basic (!15187, #26670).

Once this work is done, it will be possible to consider moving the AST into a separate package, and taking further steps towards increasing modularity of the compiler.

Towards a standalone `base` package

Historically, the base package was used as both the user-facing standard library and a repository of GHC-specific internals, with much special treatment in the compiler. This means GHC and base versions are tightly coupled, and makes upgrading to new compiler versions unnecessarily difficult.

GHC developers have made significant progress towards making base a normal Haskell package: ghc-internal has been split out as a separate library, base no longer has a privileged unit-id in the compiler, and Cabal now allows reinstalling it.

Matt posted a summary of progress and outlined possible next steps to seek community consensus on the direction of travel. The reinstallable-base repository collects documents and discussion on the effort.

Wolfgang continued various pieces of technical groundwork:

cleaning up many unused known-key names in the compiler (!15184, !15190, !15211, !15213, !15217, !15218, !15219, !15215),
finishing the process of removing GHC.Desugar from base (!15433),
refining the import list of System.IO.OS to aid in modularity (!15567).

Wolfgang improved the public API of base relating to OS handles, to make the API more stable across platforms and avoid the need for users to depend on GHC-internal implementation details (!14732, !14905). While in the area, he fixed a bug in the implementation of hIsReadable and hIsWritable for duplex handles (#26479, !15227), and a mistake in the documentation of hIsClosed (!15228).

Incorrect absence analysis in GHC

GHC bug #26416 has occupied the attention of the team for quite some time. Initially thought to be an issue with specialisation, a reproducer that Sam and Magnus created showed that the issue is in fact a bug in absence analysis — an optimisation that identifies and removes unused function arguments — in which GHC would erroneously conclude that a used argument was in fact absent.

Andreas helped investigate the root cause, before Zubin finally took the torch and put up a solution (!15238).

GHC changelogs

GHC’s changelogs have not always been as complete or reliable as the community deserves. Keeping changelogs accurate across backports has also been a major source of frustration for release managers.

This is why, after a discussion initiated by Teo Camarasu in #26002, we have decided to adopt the changelog.d system — already in use by the Cabal project — in which each change is a separate file in the changelog directory. This eliminates the merge conflicts that make backporting painful, and makes it easier to associate MRs with changelog entries.

Zubin has been spearheading the effort, with the intention to switch to this new method of changelog generation right after the fork date for GHC 10.0.

GHC

GHC Releases

Zubin worked on 9.12.3, backporting patches and preparing release candidates, with a final release on the 27th of December.
Magnus and Zubin worked on backports for 9.12.4.
Zubin worked on 9.14.1, putting out the final release on the 19th of December.

Frontend

Sam reviewed the implementation of the QualifiedStrings extension by Brandon Chinn (!14975). This allows string literals of the form ModName."foo" (interpreted as ModName.fromString ("foo" :: String)).
Sam made several changes to the treatment of Coercible constraints in the typechecker (!14100):
- Defaulting of representational equalities to nominal equalities, functionality previously added to GHC by Sam, is now more robust (#25825).
- Error messages involving unsolved Coercible constraints are greatly improved, an oft-requested improvement (#15850, #20289, #23731, #26137). Error messages now consistently mention relevant out-of-scope data constructors, provide import suggestions, and include additional explanations about roles (when relevant).
Magnus implemented several fixes to the implementation of ExplicitLevelImports:
- a missing check for types (#26098, !15119),
- a GHC panic in the driver (#26568, !15118).
Sam improved the reporting of “valid hole fits”, adding support for suggesting bidirectional pattern synonyms (#26339) and properly dealing with data constructors with linear arguments (#26338).
Sam investigated a typechecking regression starting in GHC 9.2 with the introduction of the Assert type family to improve error messages involving comparison of type-level literals (#26190), posting his analysis to the ticket. To tackle this, he opened GHC proposal #735, which is still in need of further community feedback.
Sam minimised a bug with rewrite rules (#26682), which allowed Simon Peyton Jones to identify and fix the bug (!15208).
Sam improved how existential variables are displayed in Haddock documentation (!15099, #26252).

Determinism

Matt identified and fixed several ways in which GHC compilation was not deterministic:
- an issue with non-deterministic documentation information (#26858, !15482).
- non-determinism of constraint solving impacting generated Typeable evidence (#26846, !15442).
- issues with the Template Haskell machinery of the singletons library producing non-deterministic names (singletons #629, th-desugar #240).

Plugins

Sam finished up and landed a long-standing MR by Chris Wendt (!10133) which fixed a plugin-related issue.
Sam fixed a regression in ghc-typelits-natnormalise in which the plugin would cause GHC to fall into an infinite loop (ghc-typelits-natnormalise #116, #118).

Backend

Rodrigo announced that work described in #23218 evolved into the POPL 2026 paper “Lazy Linearity for a Core Functional Language”, which presents a way to type linearity in GHC Core that is robust to almost all GHC optimisations, together with a GHC plugin validating programs at each optimisation stage.
With the oversight of Andreas, Sam carefully reconsidered the treatment of register formats in the register allocator and liveness analysis. This culminated in !15121:
- Keep track of register formats in liveness analysis (#26526).
- Use the right format when reloading spilled register (#26411).
- Enforce the invariant that writes to a register re-define the format that this register is used at for the purposes of liveness analysis, fixing another bug reported by @aratamizuki on !15121.
Sam put up a small fix for the mapping of registers to stack slots, fixing an oversight in the case that registers start off small and are subsequently written at larger widths (#26668, !15185).
Sam reviewed a GHC contribution by @sgillespie adding SIMD primops for abs and sqrt operations (!15236), suggesting more efficient implementations of certain operations.
Andreas investigated potential missed specialisations, which allowed Simon Peyton Jones to make further progress in improving the specialiser (#26831, !15441).
Sam investigated several bugs to do with the interactions of join points with ticks (#14242, #26157, #26642, #26693) and casts (#14610, #21716, #26422). He fixed the main bug (#26642, !15538), which was due to incorrect transformations in mergeCaseAlts. He also undertook a general refactor of the area, pinning down the overall handling of casts and ticks under join points in a Note.

Runtime system and linker

Matt fixed a decoding failure for stg_dummy_ret by using INFO_TABLE_CONSTR for its closure (#26745, !15303).
Duncan fixed long-standing inconsistencies in eventlog STOP_THREAD status codes (#26867, !15522).
Andreas improved the documentation of the -K RTS flag in !15365 (#26354).

Exception backtraces, stack annotations and stack decoding

Matt and Hannes improved the reporting of backtraces when using error (!15306, !15395, #26751). This involved opening two CLC proposals (CLC #383, CLC #387).
Hannes continued working on the implementation of stack annotations and stack decoding (#26218), including:
- integrating ghc-stack-profiler, a profiler that relies on stack annotations instead of heavier profiling mechanisms, with the eventlog-socket library; and
- working on the ghc-stack-annotations compatibility library for annotating the stack.
Rodrigo removed an incorrect assertion that fired when decoding a BCO whose bitmap has no payload (#26640, !15136).

Build system and packaging

Zubin fixed a GHC 9.14.1 build issue due to missing .cabal files for ghc-experimental and ghc-internal in the source tarball (#26738, !15391).
Andreas investigated the use of Cabal’s --semaphore feature to speed up GHC builds slightly (#26876, !15483). There are some issues preventing us from enabling this unconditionally (#26977, Cabal #11557).

CI and testing

Magnus ensured the user’s guide can be generated with old versions of Python to fix CI build failures on some older containers (!15127).
Magnus updated the Debian images used for CI (ci-images !183, !178).
Sam finished up the work of Sven Tennie on testing floating point expressions in the test-primops test framework for GHC (test-primops !19). This is preparatory work for improving the robustness of GHC’s handling of floating point (#26919).
Andreas updated the nofib GHC benchmarking suite to fix issues that Sam ran into when trying to use it, updating the CI in the process (nofib !81, !82, !83).

Infrastructure

Magnus worked on the infrastructure for the GitLab instance used for the GHC project, bringing up new runners for CI and switching to a new verification system to approve new users which makes it easier for new contributors to open issues.
Magnus and Andreas helped the Haskell infrastructure team address GitLab outages on short notice in order to improve availability of the GHC GitLab instance.
Andreas and Magnus organized temporary CI capabilities sponsored by Well-Typed during an outage of one of GHC’s CI runners.

Cabal

Sam added support for setting the logging handle via the library interface of Cabal, a significant milestone in updating cabal-install to compile packages with the Cabal library without invoking external processes (Cabal #11077).
Matt helped Matthías Páll Gissurarson to fix a bug in which cabal haddock was looking for files in the wrong directory (Cabal #11475, #11476).
Matt fixed a bug in which locally built Haddocks were broken due to an unexpanded ${pkgroot} variable (Cabal #11217, #11218).
Matt fixed some issues with cabal repl silently failing (Cabal #11107, #11237).

HLS

In collaboration with Zubin and Andreas, Hannes investigated the root cause of HLS issue #4674, posting his analysis in this comment. In short, the problem was that the hlint plugin was using an incompatible version of ghc-lib-parser, and a version mismatch in this library was causing segfaults due to changes to the GHC.Data.FastString implementation between the versions. Hannes disabled the hlint plugin on GHC 9.10 to work around this issue (HLS PR #4767).
Hannes reviewed and assisted with HLS PR #4856 by @vidit-od. This PR makes HLS use the stored server-side diagnostics for code actions, in order to make them more responsive. This fixes HLS issue #4805.
Hannes helped land long-running HLS PR #4445 by @soulomoon, which allows files to be loaded concurrently in batches in order to improve responsiveness of HLS.
Zubin and Hannes worked together to update HLS to work with GHC 9.14 (HLS PR #4780).
Hannes worked on general maintenance of the HLS project:
- Prepared release 2.13.0.0 (HLS PR #4785)
- Tackled various CI issues (HLS PR #4863, HLS PR #4812, HLS PR #4811)
- Updated the advertised range of supported GHC versions (HLS PR #4801, HLS PR #4799)
Hannes and Zubin implemented some fixes to Windows CI (HLS PR #4800, HLS PR #4768).
Hannes merged the hls-module-name-plugin into hls-rename-plugin in HLS PR #4847.
Hannes improved the robustness of the hls-call-hierarchy-plugin-tests in HLS PR #4834 by using VirtualFileTree.
Hannes also worked on hie-bios:
- Preparing release 0.18.0.0 (hie-bios PR #496)
- Updated the supported GHC versions (hie-bios PR #495)
- Adapted to GHC migrating some parts of its codebase to use OsPath (hie-bios PR #493)

Haskell Debugger

Rodrigo continued work on the new Haskell Debugger.

Matt and Rodrigo introduced a DSL for evaluation on the remote process, which allows the debuggee to be queried from a custom instance, making it possible to implement visualisations which rely on e.g. evaluatedness of a term (#139).
Matt improved support for exceptions: break-on-exception breakpoints now provide source locations (#165).
Rodrigo allowed call stacks to be inspected in the debugger (#158).
Hannes introduced support for stack decoding and viewing custom stack annotations (#172).
Rodrigo made the Haskell Debugger use the external interpreter (#170), which paves the way for multi-threaded debugging (see also #140). This change also allowed Rodrigo to implement Windows support (#184) with the help of Hannes.
Matt fixed a bug in the handling of data constructors with constraints (#175).
Hannes improved caching in the CI (#173).

`ghc-debug`

Matt and Hannes fixed several issues with AP_STACK closures (!79, !80, !86).
Hannes implemented asynchronous heap traversal in ghc-debug-brick, making the interface more responsive (!78).
Hannes added history navigation and search caching to the ghc-debug-brick interface (!83).
Hannes added a summary row to the string counting table view (!81), and fixed the search limit not being honoured during incremental searches (!76).

Did Ahmes find the best expansions for 2/n?

2026-03-17T13:28:00Z

A couple of years back I was discussing the Rhind Mathematical Papyrus (RMP). It includes a table expressing $\frac2n$ as a sum $$\frac1{a_1}+\frac1{a_2}+\dots+\frac1{a_k} $$ of fractions with numerator 1 (“unit fractions”). I said:

Getting the table of good-quality representations of 2/n is not trivial, and requires searching, number theory, and some trial and error. It's not at all clear that 2/105 = 1/90 + 1/126.

Today I wondered: did Ahmes (the author) have the best possible expansions for all the values, or were there some improvements the Egyptians had missed?

It turns out, yes! Or rather, maybe!

In “On the Egyptian method of decomposing 2/n into unit fractions” the author, Abdulrahman A. Abdulaziz, points out that for 2/95 the Rhind Mathematical Papyrus gives the expansion $$\frac2{95} = \frac1{60} + \frac1{380} + \frac1{570}$$

but 1/380 + 1/570 = 1/228, so it could have been written as $$\frac2{95} = \frac1{60}+\frac1{228}.$$

But wait, maybe that wasn't an error. The Egyptians, like everyone, often had to multiply by 10. (In fact, the RMP itself, right after its table, has a shorter table of expansions of n/10.) And 1/380 + 1/570 is trivially multiplied by 10, whereas 1/228 isn't. There is some indication that Ahmes preferred fractions with even denominators, because they are easier to double, and the usual Egyptian method of multiplication required repeated doubling. But the Egyptians also sometimes decupled while multiplying, and the expansion 1/60 + 1/380 + 1/570 would have made both of those easy.
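The arithmetic behind this comparison is easy to check by machine. Here is a small sketch using exact rationals from `Data.Ratio`; the helper names `unitSum`, `scaleTerms`, and `allUnit` are made up for this illustration. It confirms that both expansions equal 2/95, and that multiplying term by term by 10 keeps the three-term expansion in unit fractions but not the two-term one.

```haskell
import Data.Ratio ((%), numerator)

-- | Sum of the unit fractions with the given denominators.
unitSum :: [Integer] -> Rational
unitSum ds = sum [1 % d | d <- ds]

-- | Multiply each unit fraction 1/d by k, term by term.
scaleTerms :: Integer -> [Integer] -> [Rational]
scaleTerms k ds = [k % d | d <- ds]

-- | True when every term is still a unit fraction (numerator 1).
allUnit :: [Rational] -> Bool
allUnit = all ((== 1) . numerator)

main :: IO ()
main = do
  print (unitSum [60, 380, 570] == 2 % 95)  -- True: the RMP expansion
  print (unitSum [60, 228] == 2 % 95)       -- True: the shorter expansion
  print (scaleTerms 10 [60, 380, 570])      -- [1 % 6,1 % 38,1 % 57]
  print (scaleTerms 10 [60, 228])           -- [1 % 6,5 % 114]
```

After decupling, the three-term expansion is still a sum of unit fractions, while the shorter one leaves 5/114, which would need a further expansion of its own.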

The methods by which Ahmes chose the expansions of 2/n, and the criteria by which he preferred one to another, are still unknown; he doesn't explain them. So it's tough to say that any item was or wasn't “best” from Ahmes' point of view.

A sufficiently detailed spec is code

2026-03-17T00:00:00Z

Specifications do not address the limitations of agentic coding

To Flip Or Not To Flip

2026-03-16T12:00:03Z

A fair coin, an unfair offer, and the price of certainty.

I sat down to work out a classic probability problem numerically, and accidentally built a casino.

The Problem of Points

In 1654, a gambler named Antoine Gombaud posed a question to Blaise Pascal: two players are in a race to win a certain number of points. The game is interrupted. How should they divide the pot?

Pascal wrote to Fermat, and their correspondence became one of the founding documents of probability theory. The answer is elegant: if you need a more points and your opponent needs b more, you can compute the fair split with a simple recurrence. Let P(a, b) be your probability of winning:

P(0, b) = 1 — you just won
P(a, 0) = 0 — your opponent just won
P(a, b) = ½ · P(a−1, b) + ½ · P(a, b−1)

Every value in this table is a fraction with a power-of-2 denominator, and the numerators are partial sums of rows of Pascal’s triangle. Beautiful math, clean solution, problem solved since the 17th century.
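The recurrence translates almost directly into code. This is a minimal sketch, not the code behind the interactive table; the name `winProb` is invented here. It fills a lazily evaluated table so that each P(a, b) is computed once, and uses exact rationals so the power-of-2 denominators are visible.

```haskell
import Data.Ratio ((%))

-- | P(a, b): probability of winning when you need `a` more points and
-- your opponent needs `b` more, with a fair coin.
winProb :: Int -> Int -> Rational
winProb a b = table !! a !! b
  where
    -- table !! i !! j holds P(i, j); laziness computes each cell at most once.
    table = [[cell i j | j <- [0 .. b]] | i <- [0 .. a]]
    cell 0 _ = 1  -- you just won
    cell _ 0 = 0  -- your opponent just won
    cell i j = (table !! (i - 1) !! j + table !! i !! (j - 1)) / 2

main :: IO ()
main = do
  print (winProb 1 1)  -- 1 % 2
  print (winProb 2 3)  -- 11 % 16
  print (winProb 6 7)  -- 1255 % 2048
```

List indexing keeps the sketch dependency-free; for large targets an array or a map would be the more natural container.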

I built an interactive table to explore it. And then I thought: what if this were a game?

The Game

You and The House race to a target score. Each round, a fair coin is flipped — heads you score, tails The House scores. First to the target wins a pot of money.

But before each flip, judges look at the current game state, consult the probability table, and offer you cash to walk away. Accept, and you take the money. Decline, and the coin is flipped.

The question, every single round, is: to flip or not to flip?

You can play at willowdale.online/flip.

How the Judges Set Their Offers

The judges know the exact fair value of your position — they have the same formula Pascal and Fermat computed. If you have a 37.5% chance of winning a $10,000 pot, your fair value is $3,750.

But they don’t offer fair value. They offer the nearest “clean” fraction of the pot that sits strictly below your true odds.

“Clean” means small denominators whose only prime factors are 2, 3, and 5 — fractions like 1/3, 3/8, 7/20, nothing with a denominator above 20. These produce dollar amounts that look like something a human came up with: $3,333, $3,750, $3,500. Not $3,077 or $3,846, which look like someone ran the numbers to the last penny.

So if your fair value is $3,770 (193/512 of the pot), the judges offer $3,750 (3/8). Barely below fair, and a beautifully round number. If your fair value is $1,875 (3/16), they offer $1,666 (1/6). An 89% offer — a real discount, but still a clean, human-sounding number.
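The rule as described can be sketched in a few lines. This is an illustration only: the game's actual list of clean fractions is not given here, so `cleanDenominators` (every denominator up to 20 built from the primes 2, 3, and 5) is an assumption, as are the names `offer` and `dollars` and the rounding down to whole dollars. Under those assumptions it reproduces the two examples above; other positions may come out differently from the game's real offers.

```haskell
import Data.Ratio ((%))

-- | Assumed clean denominators: up to 20, only prime factors 2, 3 and 5.
cleanDenominators :: [Integer]
cleanDenominators = [d | d <- [1 .. 20], strip 5 (strip 3 (strip 2 d)) == 1]
  where
    strip p n
      | n `mod` p == 0 = strip p (n `div` p)
      | otherwise      = n

-- | Largest clean fraction strictly below the fair share (0 if there is none).
offer :: Rational -> Rational
offer fair = maximum (0 : map below cleanDenominators)
  where
    -- Largest k/d strictly below `fair`: k = ceiling (fair * d) - 1.
    below d = (ceiling (fair * fromInteger d) - 1) % d

-- | Share of the pot in whole dollars, rounded down (an assumption).
dollars :: Integer -> Rational -> Integer
dollars pot share = floor (fromInteger pot * share)

main :: IO ()
main = do
  print (offer (193 % 512))                -- 3 % 8
  print (offer (3 % 16))                   -- 1 % 6
  print (dollars 10000 (offer (3 % 16)))   -- 1666
```

Because the function depends only on the fair share, the same game state always produces the same offer, which matches the deterministic behaviour the judges are described as having.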

This matters psychologically. Round numbers feel like ballpark estimates — casual, generous, not fully analyzed. Precise numbers feel calculated. When the judges offer $7,500, it sounds reasonable. If they offered $7,517, you’d immediately suspect they did the math and it’s in their favor. The irony is that $7,517 is a better deal for you — but I think you’d be less likely to take it. The round number keeps your guard down.

The algorithm is deterministic — same game state, same offer every time. Just math dressed up in a game show contract.

Why People Sign

Since the offers are always strictly below fair value, the play that maximizes your expected winnings is to never accept a deal. The coin is fair, the game has zero house edge, and every offer leaves money on the table. A player who always flips would win 50% of their games and, on average, neither gain nor lose.

And yet.

When you’re ahead 4–3 in a race to 10, and the contract says $6,000, and you’ve already paid $5,000 to enter this game… you hesitate. That’s a guaranteed profit. The alternative is variance — maybe you win $10,000, but you are not that far ahead. Maybe your luck turns and you lose everything.

You know the offer is below fair. You can peek behind the curtain and see the exact numbers. The judges are shortchanging you by $128. But $128 feels like nothing when the alternative is watching your lead evaporate flip by flip.

So you sign. And $128 goes into the casino’s pocket.

This is what makes the game unusual. In blackjack or roulette, the house edge is baked into the rules — you can’t avoid it no matter how disciplined you are. Here, the game has no edge at all. The coin is fair. The race is symmetric. The only source of profit is human nature. Every dollar the casino makes is expected value that a player voluntarily left on the table.

Play for a while and you start to notice specific situations where the offer gets harder to refuse.

Managing risk. A guaranteed $7,500 is safer than a coin flip worth $7,734. In real life, you might need that money for rent. Variance has a real cost, and paying a premium for certainty can be entirely rational. There is a sophisticated argument for sometimes making decisions that reduce your expected value: bankroll management, survival probability and duration. Here the stakes are fictional, your bankroll buys nothing except more fair coin flips, and going broke is solved by refreshing your web browser, so that case is weaker — but it doesn’t feel weaker when your bankroll is shrinking and the judges are holding out real-looking money.

Mis-anchoring. The rational comparison is always between the offer and the expected value of continuing to flip. But that’s rarely the comparison your brain actually makes. If you were staring at a $0 offer last round and now the judges are offering $500, you’re comparing to the $0 — not to the $625 fair value. If your bankroll started at $10,000 and you’re down to $7,000, and the judges offer $3,200, you’re comparing to $10,000 — because taking the deal would put you above where you started. In both cases, the reference point that feels relevant has nothing to do with the expected value of this game.

Black and white thinking. When you’re behind in the race, the most likely single outcome is that you lose. If the judges offer $500 and your odds of winning are 6%, it feels like a choice between $500 and nothing. But expected value accounts for the 6% — the rare wins are big enough to compensate for all the losses across many games. You just don’t experience many games at once. You experience this one, where you’ll probably lose, and where the person who took $500 looks smart 94 times out of 100.

Imaginary momentum. You lose three flips in a row and it feels like the coin has turned against you — time to take the deal before things get worse. Or you win three in a row and feel like you’re on a streak that shouldn’t be interrupted. The coin has no memory. Each flip is independent. But the human brain is a pattern-recognition machine, and it will find narratives in random sequences whether they’re there or not.

The Optimal Judges

The judges in this game are clever, but simple — they mechanically pick the nearest clean fraction below fair value, blind to everything except the current expected value.

But the optimal offer would be very different. The right objective isn’t just the EV gap (fair value minus offer). It’s:

EV gap × P(acceptance | entire game trajectory)

A huge gap with low acceptance is worthless — the player just turns it down. A tiny gap with high acceptance is pennies. The sweet spot is a moderate discount the player almost can’t refuse.

And that acceptance probability depends on far more than just the current score — it depends on everything described above: the bankroll trajectory, the recent streak, what the last offer was, how long the player has been sitting there.

A perfect judge would think about all of this, and decide exactly what it can get you — tired, frustrated, scared little you — to accept. The clean-fraction heuristic doesn’t. And yet it still works. I still sign those offers.

The Lesson

The game is a playable demonstration of why casinos stay in business, maybe even why people accept below-market returns for safety, and why insurance companies are profitable.

The math is always available — right there behind a curtain. If your goal is to maximize expected dollars, the answer is always to flip the coin. And yet, round after round, the judges offer deals, and I sign them.

Play the game at willowdale.online/flip. It’s free, the coin is fair, and you will almost certainly take a deal you know you shouldn’t.

GHC 9.12.4-rc1 is now available

2026-03-13T00:00:00Z

wz1000 - 2026-03-13

The GHC developers are very pleased to announce the availability of the release candidate for GHC 9.12.4. Binary distributions, source distributions, and documentation are available at downloads.haskell.org and via GHCup.

GHC 9.12.4 is a bug-fix release fixing many issues of a variety of severities and scopes, including:

Fixed a critical code generation regression where sub-word division produced incorrect results (#26711, #26668), similar to the bug fixed in 9.12.2
Numerous fixes for register allocation bugs, preventing data corruption when spilling and reloading registers (#26411, #26526, #26537, #26542, #26550)
Fixes for several compiler crashes, including issues with CSE (#25468), SetLevels (#26681), implicit parameters (#26451), and the type-class specialiser (#26682)
Fixed cast worker/wrapper incorrectly firing on INLINE functions (#26903)
Fixed LLVM backend miscompilation of bit manipulation operations (#20645, #26065, #26109)
Fixed associated type family and data family instance changes not triggering recompilation (#26183, #26705)
Fixed negative type literals causing the compiler to hang (#26861)
Improvements to determinism of compiler output (#26846, #26858)
Fixes for eventlog shutdown deadlocks (#26573) and lost wakeups in the RTS (#26324)
Fixed split sections support on Windows (#26696, #26494) and the LLVM backend (#26770)
Fixes for the bytecode compiler, PPC native code generator, and Wasm backend
The runtime linker now supports COMMON symbols (#6107)
Improved backtrace support: backtraces for error exceptions are now evaluated at throw time
NamedDefaults now correctly requires the class to be standard or have an in-scope default declaration, and handles poly-kinded classes (#25775, #25778, #25882)
… and many more

A full accounting of these fixes can be found in the release notes. As always, GHC’s release status, including planned future releases, can be found on the GHC Wiki status page.

This release candidate will have a two-week testing period. If all goes well the final release will be available the week of 26 March 2026.

As always, do give this release a try and open a ticket if you see anything amiss.

Functional Valhalla?

2026-03-12T11:17:02Z

Pointer-rich data layouts lead to suboptimal performance on modern hardware. For an excellent introduction to this, see the article The Road to Valhalla. While it is specifically about Java, many parts of the article also apply to other languages. To summarize some of the key points of the article:

In 1990, a main memory fetch was about as expensive as an arithmetic operation. Now, it might be a hundred times slower.
A pointer-rich data layout involving indirections between data at different locations is not ideal for today’s hardware.
A language should make flat (cache-efficient) and dense (memory-efficient) memory layouts possible without compromising abstraction or type safety.

Consider a vector of records (or tuples, structures, product types - I’ll stay with “record” in this article). A pointer-rich layout has each record allocated separately in the heap, with a vector containing pointers to the records. For example, given a “Point” record of two numbers:

The flat and dense layout has the records directly in the array:

(Note that there is another flat layout, namely, using one vector per field of the record. This is better suited to instruction-level parallelism or specialized hardware (e.g., GPUs), especially when the record fields have different sizes. But it is less suited for general-purpose computing, as reading a single vector element requires one memory access per field, whereas the “vector of records” layout above requires only one access per record. Such a layout can be easily implemented in any language that has arrays of native types, whether in the language itself or in a library (e.g., OCaml’s Owl library). Thus, in this article, I will only consider the “array of records” layout above.)

Functional language considerations

Things should be much easier in functional languages than in Java: we have purity, referential transparency, and everything is a value. So it should be simple enough to store these values in memory in their native representation. But there are reasons that that is often not the case in practice:

Laziness: a value can be a computation that produces a value only when needed.
Layout polymorphism: unless we replicate the code for every type (as, for example, Rust does), we need to be able to store every possible value in the same kind of slot.
Dynamically typed languages require type information at runtime.
Functional languages often have automatic memory management, which may require runtime type information.
Many of our languages are not purely functional, but contain impure features.
Pure languages often lack traditional vectors or arrays, since making them perform well in immutable code is not easy.
Historical reasons: Graph reduction was a common implementation technique for lazy languages, and graphs involve pointers.
Implementation restrictions: not being mainstream, fewer resources are devoted to implementation and optimization.

Many implementations cannot even lay out native types flat in records, so a Point record of IEEE 754 double-precision numbers may actually look like this in memory:

The (very short) List

So, given a record type, which functional languages allow a collection of values of that type to have a flat, linear memory layout? The number of programming languages that claim to be “functional” is huge, so the ones listed here are just a selection based on my preferences - mainly languages that allow that layout, and some I have some experience with and can speculate on how easy or hard it would be to add that as a library or extension.

Since the Point record can be misleading in its simplicity when it comes to the question of whether the functionality could be implemented as a library, I’ll point out that there are records where the layout is a bit more interesting:

Records containing different types with different storage sizes, for example, one 64-bit float and one 32-bit integer. On most architectures, this will require 4 bytes of padding between elements.
Records containing native values along with something that has to be represented as a pointer, for example, a reference-type or a lazy value. In a flat layout, this means that every nth element will be a pointer, requiring special support from the memory management system, either by providing layout information or by using a conservative GC that treats everything as a potential pointer.

Pure languages:

Clean

Yes: Clean has unboxed arrays of records in the base language.

Caveat: it has no integer types of specific sizes and only one floating-point type, making it harder to reduce memory usage by using the smallest type just large enough to support the required value range. It seems possible to implement such types in a library (the mTask system does that).

Futhark

No. Futhark does not intend to be a general-purpose language, so this is not surprising.

I mention it here because it does have arrays of records, but, since it targets GPUs and related hardware, it uses the “record of arrays” layout mentioned above.

Haskell

Yes. Not in the base language, but there is library support via Data.Vector.Unboxed. Types that implement the Unbox type class can be used in these vectors. Many basic types and tuples have an Unbox instance. However, when you care about efficiency, you probably do not want to use tuples but rather a data type with strict fields, i.e., not:

type Point = (Double, Double)

but:

data Point = Point !Double !Double

Writing an Unbox instance for such a type is not trivial. The vector-th-unbox library makes it easier, but requires Template Haskell. Unboxed vectors are implemented by marshalling the values to byte arrays, so records with pointer fields are not supported.
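To make the flat layout concrete without writing an Unbox instance, here is a sketch that swaps in a different mechanism: the `Storable` class from base's FFI modules. This is not how Data.Vector.Unboxed works internally; it only illustrates what "two doubles back to back, no pointers" looks like in memory. The helper `flatten` is a name made up for this example.

```haskell
import Foreign.Marshal.Array (peekArray, withArray)
import Foreign.Ptr (Ptr, castPtr)
import Foreign.Storable (Storable (..))

data Point = Point !Double !Double
  deriving (Eq, Show)

-- | Two doubles stored back to back, with no header and no pointers.
instance Storable Point where
  sizeOf _ = 2 * sizeOf (0 :: Double)
  alignment _ = alignment (0 :: Double)
  peek p = Point <$> peekByteOff p 0 <*> peekByteOff p (sizeOf (0 :: Double))
  poke p (Point x y) = do
    pokeByteOff p 0 x
    pokeByteOff p (sizeOf (0 :: Double)) y

-- | Marshal the points into one contiguous temporary buffer, then read
-- that buffer back as raw doubles to expose the layout.
flatten :: [Point] -> IO [Double]
flatten pts =
  withArray pts $ \buf ->
    peekArray (2 * length pts) (castPtr buf :: Ptr Double)

main :: IO ()
main = do
  print (sizeOf (Point 0 0))                -- 16
  flatten [Point 1 2, Point 3 4] >>= print  -- [1.0,2.0,3.0,4.0]
```

The fields of consecutive points sit interleaved in a single buffer, which is the "array of records" layout. An instance like this can also be used with Data.Vector.Storable from the vector package, and it shares the limitation noted above: a field that must be a pointer to a Haskell heap object cannot be stored this way.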

Impure languages:

F#

Yes, even records with pointer fields. Records have structural equality, and you can use structs or the [<Struct>] attribute to get a flat layout.

And that’s all I could find. Unless I follow Wikipedia's list of functional programming languages, which contains languages such as C++, C#, Rust, or Swift, that allow the flat layout, but don’t really fit my idea of a functional language. But SML, OCaml, Erlang (Elixir, Gleam), Scala? Not that I could see (but please correct me if I’m wrong).

Rolling your own

Since there is a library implementation for Haskell, maybe that’s a possibility for other languages?

You should be able to implement flat layouts in any language that supports byte vectors. More interesting is how well such a library fits into the language, and whether a user of the library has to write code or annotations for user-defined record types, or whether the library can handle part or all of that automagically.

I’ll only mention my beloved Lisp/Scheme here. Lisp’s uniform syntax and macro system are a bonus here, but the lack of static typing makes things harder.

In Scheme, R6RS (and R7RS with the help of some SRFIs) has byte-vectors and marshalling to/from them in the standard library. But Scheme does not have type annotations, so you either need to offer a macro to define records with typed fields or to define how to marshal the fields of a regular (sealed) record. Since you can shadow standard procedures in a library, you can write code that looks like regular Scheme code, but, perhaps surprisingly, loses identity when storing/retrieving values from records:

(let ((vec (make-typed-vector 'point 1000))
      (pt (make-point x y)))
  (vector-set! vec 0 pt)
  (eq? (vector-ref vec 0) pt))
 ⇒ #f

(But then, you probably shouldn’t be using eq? when doing functional programming in Scheme).

The same approach is possible in Common Lisp. In contrast to Scheme, it does have optional type annotations, and, together with a helper library for accessing the innards of floats and either the meta-object protocol to get type information or (probably better) a macro to define typed records, an implementation should be reasonably straightforward. Making it play nice with inheritance and the dynamic nature of Common Lisp (e.g., adding slots to classes or even changing an object's class at runtime) would be a much harder undertaking.

Conclusion

Of the functional languages I looked at, only F# fully supports flat and dense memory layouts. Among the pure languages, Haskell and Clean come close.

The question is how important this really is. There’s a good argument to be made for turning to more specialized languages like Futhark if you mainly care about performance. On the other hand, having a uniform codebase in one language also has advantages.

Then, the performance story has changed, too. While the points Project Valhalla raises remain true in principle, processor designers are aware of this as well. They are doing their best to hide memory latency with techniques such as out-of-order execution or humongous caches. Thus, on a modern CPU, the effects of a pointer-rich layout are often only observable with large working set sizes.

Still, given the plethora of imperative languages that can get you to Valhalla, support for this in the functional landscape seems lacking. In the future, I hope to see more languages or libraries that will make this possible.

Teaching Claude to Be Lazy

2026-03-10T00:00:00Z

I’ve been watching AI development for a long time. I found LessWrong around 2012-2013, and managed to get myself worked up about the oncoming singularity. I managed to chill out about it, but interest and excitement for AI remained. The initial Deep Dream image generation, AlphaGo, etc., were all so exciting. And then GPT-2 came out.

Over the last five years, people have been making wild claims about the utility of present AI. Not “the AI that you’ll have soon,” but the current generation stuff. And the results, frankly, had been garbage. A sea of garbage coating the internet. I’d try using the tools, and when checking them against my own expertise or knowledge, they always fell short.

I heard the noise on Twitter after Opus 4.5 was released in November of 2025. Seemed like a step change- people were much more impressed with it than prior versions. In December, I decided to give it a try. Opus 4.5, with significant guidance, properly diagnosed and fixed some Template Haskell code generation issues. It knew how to -ddump-splices, it knew how to read those splices and diagnose the issue. Given a small, highly mechanical problem, plenty of examples, and a ton of tests, it took about 6 hours to do what I felt would have taken me 3 or 4 hours.

This is pretty incredible, because my productivity has always been limited by two things:

Effort. Literally whacking my keyboard and staring at the computer and waiting on a compile/test loop to tell me what to do next.
Attention. Where I’m focusing my effort. My editor? Slack? Meetings? A bike ride? Cello? Which OSS project?

Now, with Opus 4.5, I can set a robot going and do something else with my effort. While Claude Code was spinning on the Template Haskell code, I was doing another project in a different repository. Sure, Claude took 6 hours instead of my 3, but I was able to fill those 6 hours with effort and attention placed elsewhere - not a full 6, as Claude required supervision and input, but call it 5. This is a positive investment, and my personal “break even” moment.

Using Claude Code Effectively

In mid-February, I got access to an API token and unlimited usage. I’ve been trying to figure out how to leverage this tool to improve my productivity, and the results have been pretty strongly positive.

The brief tl;dr:

It’s the same shit that makes humans good at software development

Haskell is Awesome for LLMs

This was true with Opus 4.5 and is much more true with Opus 4.6. Prior versions of LLM coding agents produced utter garbage with Haskell, most likely due to the relatively low quantity of examples. It seems like the AI labs have figured out how to do higher quality training with less data, and the relatively high average quality of Haskell code helps the LLMs generate relatively high quality Haskell.

Haskell’s type safety, purity, and library design opportunities make it a fantastic choice for LLM generated code. The human developer can easily specify a solution and let Claude fill in a surprising amount of the boring details.

Haskell’s terse nature benefits LLMs - you can simply fit more tokens into the context window when the tokens are more semantically dense.

Funny enough, all of Haskell’s benefits “for LLMs” are also benefits of Haskell for humans. I do earnestly believe that if all devs knew Haskell, we would consider switching to other languages only very rarely. And Claude knows Haskell.

Software Engineering Matters

Claude Code works really well with tightly scoped issues, lots of tests and examples, and good safety guardrails. I asked it to make cabal faster, and taught it how to run cabal with debug logs, timings, and then to build a profiled version of it. Then it looped for a bit, collected timing information on our codebase, and figured out the critical path and hot spot - the solver. Then it made several fixes to optimize the solver. These changes resulted in a 30% improvement in solver times, which shaved 2 seconds off every cabal repl invocation- a pretty nice benefit, since that happens virtually anytime you want to do anything in our codebase.

But this only worked because the cabal library had timing logs, and I gave it a quick feedback loop and target. I’ve had Claude Code totally fall over when trying to do bigger or more undirected work.

Fortunately, Claude can do this scoping work pretty well. I’ve had Claude do some exploratory research (generally pretty highly supervised), then generate some plans for improvement (then edited and clarified), and it can then do a good job of writing up a ticket - certainly better than almost all human written tickets I’ve seen.

Build Workflows Iteratively

LLMs can do anything. But they are expensive, slow, and non-deterministic (and often incorrect). So get the LLM to help with replacing itself - build a tool or skill to do the thing faster and deterministically.

My Claude sessions generally progress from “highly supervised, exploratory work” to “mostly unsupervised, automated work.” Early sessions in a project often involve having Claude build tools - CLI scripts, libraries, interfaces - that it can use in later work to make the job easier. A surprisingly effective prompt here is “What tools would help you do this job better next time?” At the end of a session, I’ll also have Claude review and update its skill documentation with everything I told it to do differently.

So each work session with Claude produces:

An artifact (the work itself)
Often, updates to the skill to improve efficiency on further work
Sometimes, a tool to deterministically do some chunk of the work.

This process ends up replacing the highly non-deterministic LLM tool with a much more deterministic tool.

Mock Reviews and Refactoring

You can ask Claude to review code, and that works OK. But Claude works much better if you ask it to assume someone else’s perspective. I’ve asked it to mimic myself and it did Alright. I asked it to mimic Edward Kmett, Alexis King, and Michael Snoyman, and it did Alright - it noticed different things with each perspective and suggested improvements in line with those perspectives.

I’ve generally found that the initial output is of poor to middling quality. But you can get decently far with “now make it more legible/faster/more correct” or “apply ‘Parse, Don’t Validate’ here” etc. After several rounds of refactoring, it makes stuff that I’m reasonably happy with putting my name on.

What Doesn’t Work Well

Claude isn’t a replacement for human engineering (“yet” I guess). It lacks qualities like taste, judgement, and vision, that are generally required in subjective work like software and product design. So when I let Claude run totally loose on something, it produces, but it produces poor quality code and poorly thought out features.

I haven’t had great luck with getting Claude to iterate on this itself. When given the very large picture, it sort of flounders. It can do some analysis and subdivision, but the divisions are often somewhat unnatural and don’t feel right to me.

Defined by our Vice

If the above complaint is about Claude’s lack of virtue, let me also complain about Claude’s lack of vice. Claude is infinitely patient and willing to work very hard. However, “infinitely patient” means that Claude has no problem at all waiting an hour for a build to finish. You have to teach it to use faster tools and feedback loops.

Likewise, “hardworking” is a virtue when you’re paying a human by the month and trusting in their laziness to be efficient, but when you’re paying per unit of thought, “more work” means “more cost” and often not “more output.” You have to tell Claude to stop doing stuff or to do stuff more efficiently.

Fortunately, Claude is relatively teachable - but Claude very often will start a skill and then do a lot of “research and understanding” before running the one-shot script to generate the compile-errors to track down and fix.

Humans are impatient and lazy, so we build fast and efficient systems. Without pain to guide us, we make little progress in reducing that pain.

Am I still a skeptic?

I’ve been using AI to write 95% of my code for the last month. And yet, I still feel like I’m more on the skeptic side of things. AI is clearly a useful tool - my own productivity has doubled or more while maintaining my personal quality bar. But it’s not a do-it-all miracle - yet?

AI-first companies are experiencing massive reliability issues. Vibe coding projects start, enjoy some success, and then go down in flames.

Humans are clearly still necessary at key points in the software lifecycle. The bottlenecks have shifted, though, and the easiest parts of my job have been mostly automated. What’s coming next?

I’m excited to wait and find out.

Programmers will document for Claude, but not for each other

2026-03-09T08:04:00Z

A couple of days ago I recounted a common complaint:

I keep seeing programmers say how angry it makes them that people are willing to write detailed CLAUDE.md and PROJECT.md files for Claude to use, but they weren't willing to write them for their coworkers.

For larger projects, I've taken to having Claude maintain a handoff document that I can have the next Claude read, saying what we planned to do, what has been done, and other pertinent information. Then when I shut down one Claude I can have the next one read the file to get up to speed. Then I have that Claude update it for the one after.

After seeing the common complaint enough times I had a happy inspiration. I'd been throwing away Claude's handoff documents at the end of each project. Why do that? It's no trouble to copy the file into the repository and commit it. Someone in the future, wondering what was going on, might luckily find the right document with git grep and learn something useful.

I'm a little slow so it took me until this week to think of a better version of this: at the end of the project I now ask Claude to write up from scratch a detailed but high-level explanation of what problem we were solving and what changes we made, and I commit that. Not just running notes, but a structured overview of the whole thing.

I review these overviews carefully and make edits as necessary before I check them in. It's my signature on the commit, and my bank account receiving the paycheck, so nothing goes into the repository that I haven't read carefully and understood, same as if Claude were a human programmer under my supervision.

But Claude's explanations haven't required much editing. Claude's most recent project summary was around as good as what I could have written myself, maybe a little worse and maybe a little better. But it took ten seconds to write instead of an hour, and it didn't take anything like an hour to review.

The serious thing I had to fix the last time around was that Claude had used a previous, related report as a model, and the previous report had had a paragraph I had added at the end that said:

# Approved-by

Claude abstracted these notes from our discussions of the issue. Mark Dominus has read, reviewed, edited, and approved these notes.

Claude's new document had an identical section at the end. Oops! Fortunately, by the time I saw it, it was true, so I didn't have to delete it. I had Claude add a sentence to CLAUDE.md to tell it not to do this again.

My advice for the day:

If you have Claude write down notes, check them into the repo when you're done. It probably can't hurt and it might help.
Have Claude write a project summary, and then check it into the repo.

Maybe this is obvious? But it wasn't obvious to me. I'm still getting used to this new world.

78: Jamie Willis

2026-03-08T11:00:00Z

In this episode, we focus on a particular part of Haskell: teaching it. To help us, we are joined by Jamie Willis who is a Teaching Fellow at Imperial College London. The episode explores the benefits of live coding, and why Haskell is the best language for teaching programming.

How are John Waters movies like James Bond movies?

2026-03-08T09:50:00Z

A number of years ago I wondered how many movies I had seen. The only way I could think of finding out was just to make a list. This I did as best I could. (It turned out to be around 700.)

I found, though, that I could not include all the James Bond movies I had seen, because I couldn't tell them apart from the descriptions. I'd read a plot summary for a James Bond movie, and ask myself “Did I see that? I don't know, it sounds like every other James Bond movie.”

Today I discovered that John Waters movies are like that also. I was trying to remember if I had seen A Dirty Shame:

The people of Harford Road are firmly divided into two camps: the neuters, the puritanical residents who despise anything even remotely carnal; and the perverts, a group of sex addicts whose unique fetishes have all been brought to the fore by accidental concussions. Repressed Sylvia Stickles finds herself firmly entrenched in the former camp.

You'd think that would be something I would remember decisively, or not. But I'm really not sure. All I can do is shrug and say “I don't know, it sounds like a John Waters movie I have seen, but maybe it wasn't that one.”

Looking into it further I discovered that I also wasn't sure if I had seen Multiple Maniacs. In it, Divine's character is raped by a giant lobster. On the one hand, that seems like the sort of thing I would remember. And I think maybe I do? But again I'm not sure I'm not just imagining what it would be like!

Documentation is a message in a bottle

2026-03-05T16:07:00Z

Our company is going to a convention later this month, and they will have a booth with big TV screens showing statistics that update in real time. My job is to write the backend server that delivers the statistics.

I read over the documents that the product people had written up about what was wanted, asked questions, got answers, and then turned the original two-line ticket into a three-page ticket that said what should be done and how. I intended to do the ticket myself, but it's good practice to write all this stuff down, for many reasons:

Writing things down forces me to think them through carefully and realize what doesn't make sense or what I still don't understand.
I forget things easily and this will keep the plan where I can find it.
I might get sick, and if someone else has to pick up the project this might help them understand what I was doing.
If my boss gets worried that all I do is post on 4chan all day, this is tangible work product that proves I did something else that might have enhanced shareholder value.
If I'm tempted to spend the day posting on 4chan, and then to later claim I spent the time planning the project, I might fool my boss. But without that tangible work product, I won't be able to fool myself, and that's more important.
Conversely if I later think back and ask “What was I doing the week of March 2?” I might be tempted to imagine that all I did was post on 4chan. But the three pages of ticket description will prove to me that I am not just a lazy slacker. This is a real problem for me.
In principle, a future person going back to extend the work might find this helpful documentation of what was done and why. Does this ever really happen? I don't know, but it might.
I like writing because writing is fun.

A few days after I wrote the ticket, something unexpected happened. It transpired that the person who was to build the front-end consumer of my statistics would not be a professional programmer. It would be the company's Head of Product, a very smart woman named Amanda. The actual code would be written by Claude, under her supervision.

I have never done anything like this before, and I would not have wanted to try it on a short deadline, but there is some slack in the schedule and it seemed a worthwhile and exciting experiment.

Amanda shared some screencaps of her chats with Claude about the project, and I suggested:

When you get a chance, please ask Claude to write out a Markdown file memorializing all this. Tell it that you're going to give it to the backend programmer for discussion, so more detail is better. When it's ready, send it over.

Claude immediately produced a nine-page, 14-part memo and a half-page overview. I spent a couple of hours reviewing it and marking it up.

It became immediately clear that Claude and I had very similar ideas about how the project should go and how the front and back ends would hook up. So similar that I asked Amanda:

It looks like maybe you started it off by feeding it my ticket description. Is that right?

She said yes, she had. She had also fed it the original product documents I had read.

I was delighted. I had had many reasons for writing detailed ticket descriptions before, but the most plausible ones were aimed back at myself.

The external consumers of the documentation all seemed somewhat unlikely. The person who would extend the project in the future probably didn't exist, and if they did they probably wouldn't have thought to look at my notes. Same for the hypothetical person who would take over when I got sick. My boss probably isn't checking up on me by looking at my ticketing history. Still, I like to document these things for my own benefit, and also just in case.

But now, because I had written the project plan, it was available for consumption when an unexpected consumer turned up! Claude and I were able to rapidly converge on the design of the system, because Amanda had found my notes and cleverly handed them to Claude. Suddenly one of those unlikely-seeming external reasons materialized!

On Mastodon I keep seeing programmers say how angry it makes them that people are willing to write detailed CLAUDE.md and PROJECT.md files for Claude to use, but they weren't willing to write them for their coworkers. (They complain about this as if this is somehow the fault of the AI, rather than of the people who failed in the past to write documentation for their coworkers.)

The obvious answer to the question of why people are willing to write documentation for Claude but not for their coworkers is that the author can count on Claude to read the documentation, whereas it's a rare coworker who will look at it attentively.

Rik Signes points out there's a less obvious but more likely answer: your coworkers will remember things if you just tell them, but Claude forgets everything every time. If you want Claude to remember something, you have to write it down. So people using Claude do write things down, because otherwise they have to say them over and over.

And there's a happy converse to the complaint that most programmers don't bother to write documentation. It means that people like me, professionals who have always written meticulous documentation, are now reaping new benefits from that always valuable practice.

Not everything is going to get worse. Some things will get better.

Addendum 20260208

A corollary: You don't have to write the documentation yourself. You can have Claude write a detailed summary based on your ongoing chats about the work, and then you can edit it and check it in.

If you're good at editing, anyway. I wonder if part of the reason Claude is working so well for me is that I'm really good at editing and at code review?

Monuses and Heaps

2026-03-03T00:00:00Z

Posted on March 3, 2026

Tags: Haskell

This post is about a simple algebraic structure that I have found useful for algorithms that involve searching or sorting based on some ordered weight. I used it a bit in a pair of papers on graph search (2021; 2025), and more recently I used it to implement a version of the Phases type (Easterly 2019) that supported arbitrary keys, inspired by some work by Blöndal (2025a; 2025b) and Visscher (2025).

The algebraic structure in question is a monus, which is a kind of monoid that supports a partial subtraction operation (that subtraction operation, denoted by the symbol ∸, is itself often called a “monus”). However, before giving the full definition of the structure, let me first try to motivate its use. The context here is heap-based algorithms. For the purposes of this post, a heap is a tree that obeys the “heap property”; i.e. every node in the tree has some “weight” attached to it, and every parent node has a weight less than or equal to the weight of each of its children. So, for a tree like the following:

   ┌d
 ┌b┤
a┤ └e
 └c

The heap property is satisfied when $a \leq b$ , $a \leq c$ , $b \leq d$ , and $b \leq e$ .

Usually, we also want our heap structure to have an operation like popMin :: Heap k v -> Maybe (v, Heap k v) that returns the least-weight value in the heap paired with the rest of the heap. If this operation is efficient, we can use the heap to efficiently implement sorting algorithms, graph search, etc. In fact, let me give the whole basic interface for a heap here:

popMin :: Heap k v -> Maybe (v, Heap k v)
insert :: k -> v -> Heap k v -> Heap k v
empty  :: Heap k v

Using these functions it’s not hard to see how we can implement a sorting algorithm:

sortOn :: (a -> k) -> [a] -> [a]
sortOn k = unfoldr popMin . foldr (\x -> insert (k x) x) empty

The monus becomes relevant when the weight involved is some kind of monoid. This is quite a common situation: if we were using the heap for graph search (least-cost paths or something), we would expect the weight to correspond to path costs, and we would expect that we can add the costs of paths in a kind of monoidal way. Furthermore, we would probably expect the monoidal operations to relate to the order in some coherent way. A monus (Amer 1984) is an ordered monoid where the order itself can be defined in terms of the monoidal operations¹:

$x \leq y \iff \exists z. \; y = x \bullet z$

I read this definition as saying “$x$ is less than or equal to $y$ iff there is some $z$ that fits between $x$ and $y$”. In other words, the gap between $x$ and $y$ has to exist, and it is equal to $z$.

Notice that this order definition won’t work for groups like $(\mathbb{Z},+,0)$. For a group, we can always find some $z$ that will fit the existential (specifically, $z = (-x) \bullet y$), so every element would be related to every other and the order would carry no information. Monuses, then, tend to be positive monoids: in fact, many monuses are the positive cones of some group ($(\mathbb{N},+,0)$ is the positive cone of $(\mathbb{Z},+,0)$).

We can derive a lot of useful properties from this basic structure. For example, if the order above is total, then we can derive the binary subtraction operator mentioned above:

$x ∸ y = \begin{cases} z, & \text{if } y \leq x \text{ and } x = y \bullet z \\ 0, & \text{otherwise.} \end{cases}$
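As a concrete instance of this definition, the monus of $(\mathbb{N},+,0)$ is truncated subtraction. The following is only a minimal sketch; the name monusNat is made up for illustration and does not come from the papers cited here.

```haskell
import Numeric.Natural (Natural)

-- Truncated subtraction on the naturals (hypothetical name, for illustration).
monusNat :: Natural -> Natural -> Natural
monusNat x y
  | y <= x    = x - y  -- the unique z with x == y + z
  | otherwise = 0      -- no such z exists, so return the unit
```

With this definition, monusNat 5 3 is 2 and monusNat 3 5 is 0.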

If we require the underlying monoid to be commutative, and we further require the derived order to be total and antisymmetric, we get the particular flavour of monus I worked with in a pair of papers on graph search (2021; 2025). In this post I will actually be working with a weakened form of the algebra that I will define shortly.

Getting back to our heap from above, with this new order defined, we can see that the heap property actually tells us something about the makeup of the weights in the tree. Instead of every child just having a weight equal to some arbitrary quantity, the heap property tells us that each child weight has to be made up of the combination of its parent’s weight and some difference.

   ┌d              ┌b•(d∸b)
 ┌b┤       ┌a•(b∸a)┤
a┤ └e  =  a┤       └b•(e∸b)
 └c        └a•(c∸a)

This observation gives us an opportunity for a different representation: instead of storing the full weight at each node, we could instead just store the difference.

     ┌d∸b
 ┌b∸a┤
a┤   └e∸b
 └c∸a

Just in terms of data structure design, I prefer this version: if we wanted to write down a type of heaps using the previous design, we would first define the type of trees, and then separately write a predicate corresponding to the heap property. With this design, it is impossible to write down a tree that doesn’t satisfy the heap property.

More practically, though, using this algebraic structure when working with heaps enables some optimisations that might be difficult to implement otherwise. The strength of this representation is that it allows for efficient relative and global computation: now, if we wanted to add some quantity to every weight in the tree, we can do it just by adding the weight to the root node.
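To make the "add to the root only" claim concrete, here is a small sketch of the difference representation. It is not the heap used later in the post: the type Diff and the functions absolute and shiftAll are hypothetical names, and weights are plain Ints combined with (+) to keep it short.

```haskell
-- The root stores an absolute weight; every other node stores the gap
-- between its own weight and its parent's weight.
data Diff = Diff Int [Diff]

-- Recover the absolute weights (in preorder) by summing gaps on the way down.
absolute :: Diff -> [Int]
absolute = go 0
  where
    go acc (Diff d ts) = let w = acc + d in w : concatMap (go w) ts

-- Adding n to every absolute weight only touches the root node.
shiftAll :: Int -> Diff -> Diff
shiftAll n (Diff d ts) = Diff (n + d) ts

-- The tree from the diagrams with a=1, b=3, c=4, d=5, e=7.
example :: Diff
example = Diff 1 [Diff 2 [Diff 2 [], Diff 4 []], Diff 3 []]
```

Here absolute example is [1,3,5,7,4], and absolute (shiftAll 10 example) is [11,13,15,17,14], even though shiftAll rebuilt only one node.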

Monuses in Haskell

To see some examples of how to use this pattern, let’s first write a class for Haskell monuses:

class (Semigroup a, Ord a) => Monus a where
  (∸) :: a -> a -> a

You’ll notice that we’re requiring semigroup here, not monoid. That’s because one of the nice uses of this pattern actually works with a weakening of the usual monus algebra; this weakening only requires semigroup, and the following two laws.

$x \leq y \implies x \bullet (y ∸ x) = y \quad \quad \quad \quad \quad x \leq y \implies z \bullet x \leq z \bullet y$

A straightforward monus instance is the following:

instance (Num a, Ord a) => Monus (Sum a) where
  (∸) = (-)
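The two weakened laws can be spot-checked mechanically. The sketch below restates the class and the Sum instance so that it stands alone; the names lawDifference, lawMonotone, and checkSum are my own, and an exhaustive check over a small range is of course evidence rather than a proof.

```haskell
import Data.Semigroup (Sum (..))

class (Semigroup a, Ord a) => Monus a where
  (∸) :: a -> a -> a

instance (Num a, Ord a) => Monus (Sum a) where
  (∸) = (-)

-- First law: if x <= y then x <> (y ∸ x) == y.
lawDifference :: Monus a => a -> a -> Bool
lawDifference x y = not (x <= y) || x <> (y ∸ x) == y

-- Second law: if x <= y then z <> x <= z <> y.
lawMonotone :: Monus a => a -> a -> a -> Bool
lawMonotone x y z = not (x <= y) || z <> x <= z <> y

-- Check both laws for every triple drawn from a small range of Sum Int values.
checkSum :: Bool
checkSum = and [ lawDifference x y && lawMonotone x y z
               | x <- vals, y <- vals, z <- vals ]
  where
    vals = map Sum ([-3 .. 3] :: [Int])
```

Both laws only constrain the case where x <= y, which is why plain subtraction is acceptable here even though it is not truncated.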

Pairing Heaps in Haskell

Next, let’s look at a simple heap implementation. I will always go for pairing heaps (Fredman et al. 1986) in Haskell; they are extremely simple to implement, and (as long as you don’t have significant persistence requirements) their performance seems to be the best of the available pointer-based heaps (Larkin, Sen, and Tarjan 2013). Here is the type definition:

data Root k v = Root !k v [Root k v]
type Heap k v = Maybe (Root k v)

A Root is a non-empty pairing heap; the Heap type represents possibly-empty heaps. The key function to implement is the merging of two heaps; we can accomplish this as an implementation of the semigroup <>.

instance Monus k => Semigroup (Root k v) where
  Root xk xv xs <> Root yk yv ys
    | xk <= yk  = Root xk xv (Root (yk ∸ xk) yv ys : xs)
    | otherwise = Root yk yv (Root (xk ∸ yk) xv xs : ys)

The only difference between this and a normal pairing heap merge is the use of ∸ in the key of the child node (yk ∸ xk and xk ∸ yk). This difference ensures that each child only holds the difference of the weight between itself and its parent.

It’s worth working out why the weakened monus laws above are all we need in order to maintain the heap property on this structure.

The rest of the methods are implemented the same as their implementations on a normal pairing heap. First, we have the pairing merge of a list of heaps, here given as an implementation of the semigroup method sconcat:

  sconcat (x1 :| []) = x1
  sconcat (x1 :| [x2]) = x1 <> x2
  sconcat (x1 :| x2 : x3 : xs) = (x1 <> x2) <> sconcat (x3 :| xs)

merges :: Monus k => [Root k v] -> Heap k v
merges = fmap sconcat . nonEmpty

The pattern of this two-level merge is what gives the pairing heap its excellent performance.

The Heap type derives its monoid instance from the monoid instance on Maybe (instance Semigroup a => Monoid (Maybe a)), so we can implement insert like so:

insert :: Monus k => k -> v -> Heap k v -> Heap k v
insert k v hp = Just (Root k v []) <> hp

And popMin is also relatively simple:

delay :: Semigroup k => k -> Heap k v -> Heap k v
delay by = fmap (\(Root k v xs) -> Root (by <> k) v xs)

popMin :: Monus k => Heap k v -> Maybe (v,Heap k v)
popMin = fmap (\(Root k v xs) -> (v, k `delay` merges xs))

Notice that we delay the rest of the heap, because all of its entries need to be offset by the weight of the previous root node. Thankfully, because we’re only storing the differences, we can “modify” every weight by just increasing the weight of the root, making this an $\mathcal{O}(1)$ operation.

Finally, we can implement heap sort like so:

sortOn :: Monus k => (a -> k) -> [a] -> [a]
sortOn k = unfoldr popMin . foldl' (\xs x -> insert (k x) x xs) Nothing

And it does indeed work:

>>> sortOn Sum [3,4,2,5,1]
[1,2,3,4,5]

Here is a trace of the output:

Trace

Input           Heap:
list:

[3,4,2,5,1]


[4,2,5,1]       3


[2,5,1]         3─1


[5,1]           2─1─1


[1]              ┌3
                2┤
                 └1─1


[]                 ┌3
                1─1┤
                   └1─1


Output          Heap:
list:

[]                 ┌3
                1─1┤
                   └1─1


[1]              ┌3
                2┤
                 └1─1


[1,2]            ┌2
                3┤
                 └1


[1,2,3]         4─1


[1,2,3,4]       5


[1,2,3,4,5]

While the heap implementation presented here is pretty efficient, note that we could significantly improve its performance with a few optimisations: first, we could unpack all of the constructors, using a custom list definition in Root instead of Haskell’s built-in lists; second, in foldl' we could avoid the Maybe wrapper by building a non-empty heap. There are probably more small optimisations available as well.

Retrieving a Normal Heap

A problem with the definition of sortOn above is that it requires a Monus instance on the keys, but it only really needs Ord. It seems that by switching to the Monus-powered heap we have lost some generality.

Luckily, there are two monuses we can use to solve this problem:

instance Ord a => Monus (Max a) where
  x ∸ y = x

instance Ord a => Monus (Last a) where
  x ∸ y = x

The Max semigroup uses the max operation, and the Last semigroup returns its second operand.

instance Ord a => Semigroup (Max a) where
  Max x <> Max y = Max (max x y)

instance Semigroup (Last a) where
  x <> y = y

While the Monus instances here might seem degenerate, they do actually satisfy the Monus laws as given above.

Max and Last Monus laws

Max:

x <= y ==> x <> (y ∸ x) = y
Max x <= Max y ==> Max x <> (Max y ∸ Max x) = Max y
Max x <= Max y ==> Max x <> Max y = Max y
Max x <= Max y ==> Max (max x y) = Max y
x <= y ==> max x y = y

x <= y ==> z <> x <= z <> y
Max x <= Max y ==> Max z <> Max x <= Max z <> Max y
Max x <= Max y ==> Max (max z x) <= Max (max z y)
x <= y ==> max z x <= max z y

Last:

x <= y ==> x <> (y ∸ x) = y
Last x <= Last y ==> Last x <> (Last y ∸ Last x) = Last y
Last x <= Last y ==> Last x <> Last y = Last y
Last x <= Last y ==> Last y = Last y

x <= y ==> z <> x <= z <> y
Last x <= Last y ==> Last z <> Last x <= Last z <> Last y
Last x <= Last y ==> Last x <= Last y

Either Max or Last will work; semantically, there’s no real difference. Last avoids some comparisons, so we can use that:

sortOn' :: Ord k => (a -> k) -> [a] -> [a]
sortOn' k = sortOn (Last . k)
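The snippets so far leave out their imports, so here is one way the pieces might be assembled into a single loadable module. The import list is my assumption about what the post relies on, and the Semigroup instance for Last is taken from Data.Semigroup rather than redefined; everything else follows the definitions above.

```haskell
import Data.List (foldl', unfoldr)
import Data.List.NonEmpty (NonEmpty (..), nonEmpty)
import Data.Semigroup (Last (..), Semigroup (..), Sum (..))

class (Semigroup a, Ord a) => Monus a where
  (∸) :: a -> a -> a

instance (Num a, Ord a) => Monus (Sum a) where
  (∸) = (-)

-- Data.Semigroup already supplies Semigroup (Last a), where x <> y = y.
instance Ord a => Monus (Last a) where
  x ∸ _ = x

data Root k v = Root !k v [Root k v]
type Heap k v = Maybe (Root k v)

instance Monus k => Semigroup (Root k v) where
  Root xk xv xs <> Root yk yv ys
    | xk <= yk  = Root xk xv (Root (yk ∸ xk) yv ys : xs)
    | otherwise = Root yk yv (Root (xk ∸ yk) xv xs : ys)

  -- Pairing merge: combine neighbours first, then fold the results together.
  sconcat (x1 :| [])           = x1
  sconcat (x1 :| [x2])         = x1 <> x2
  sconcat (x1 :| x2 : x3 : xs) = (x1 <> x2) <> sconcat (x3 :| xs)

merges :: Monus k => [Root k v] -> Heap k v
merges = fmap sconcat . nonEmpty

insert :: Monus k => k -> v -> Heap k v -> Heap k v
insert k v hp = Just (Root k v []) <> hp

-- Offset every weight in the heap by adjusting only the root.
delay :: Semigroup k => k -> Heap k v -> Heap k v
delay by = fmap (\(Root k v xs) -> Root (by <> k) v xs)

popMin :: Monus k => Heap k v -> Maybe (v, Heap k v)
popMin = fmap (\(Root k v xs) -> (v, k `delay` merges xs))

sortOn :: Monus k => (a -> k) -> [a] -> [a]
sortOn k = unfoldr popMin . foldl' (\xs x -> insert (k x) x xs) Nothing

sortOn' :: Ord k => (a -> k) -> [a] -> [a]
sortOn' k = sortOn (Last . k)
```

With Last as the key wrapper, keys no longer need to be numeric: sortOn' id "monus" gives "mnosu".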

Phases as a Pairing Heap

The Phases applicative (Easterly 2019) is an Applicative transformer that allows reordering of Applicative effects in an easy-to-use, high-level way. The interface looks like this:

phase     :: Natural -> f a -> Phases f a
runPhases :: Applicative f => Phases f a -> f a

instance Applicative f => Applicative (Phases f)

And we can use it like this:

phased :: IO String
phased = runPhases $ sequenceA
  [ phase 3 $ emit 'a'
  , phase 2 $ emit 'b'
  , phase 1 $ emit 'c'
  , phase 2 $ emit 'd'
  , phase 3 $ emit 'e' ]
  where emit c = putChar c >> return c

>>> phased
cbdae
"abcde"

The above computation performs the effects in the order dictated by their phases (this is why the characters are printed out in the order cbdae), but the pure value (the returned string) has its order unaffected.

I have written about this type before, and in a handful of papers (2021; Gibbons et al. 2022; Gibbons et al. 2023), but more recently Blöndal (2025a) started looking into trying to use the Phases pattern with arbitrary ordered keys (Visscher 2025; Blöndal 2025b). There are a lot of different directions you can go from the Phases type; what interested me most immediately was the idea of implementing the type efficiently using standard data-structure representations. If our core goal here is to order some values according to a key, then that is clearly a problem that a heap should solve: enter the free applicative pairing heap.

Here is the type’s definition:

data Heap k f a where
  Pure :: a -> Heap k f a
  Root :: !k -> (x -> y -> a) -> f x -> Heaps k f y -> Heap k f a

data Heaps k f a where
  Nil :: Heaps k f ()
  App :: !k -> f x -> Heaps k f y -> Heaps k f z -> Heaps k f (x,y,z)

We have had to change a few aspects of the original pairing heap, but the overall structure remains. The entries in this heap are now effectful computations: the fs. The data structure also contains some scaffolding to reconstruct the pure values “inside” each effect when we actually run the heap.

The root-level structure is the Heap: this can either be Pure (corresponding to an empty heap: notice that, though this constructor has some contents (the a), it is still regarded as “empty” because it contains no effects (f)); or a Root, which is a singleton value, paired with the list of sub-heaps represented by the Heaps type. We’re using the usual Yoneda-ish trick here to allow the top-level data type to be parametric and a Functor, by storing the function x -> y -> a.

The Heaps type then plays the role of [Root k v] in the previous pairing heap implementation; here, we have inlined all of the constructors so that we can get all of the types to line up. Remember, this is a heap of effects, not of pure values: the pure values need to be able to be reconstructed to one single top-level a when we run the heap at the end.

Merging two heaps happens in the Applicative instance itself:

instance Functor (Heap k f) where
  fmap f (Pure x) = Pure (f x)
  fmap f (Root k c x xs) = Root k (\a b -> f (c a b)) x xs

instance Monus k => Applicative (Heap k f) where
  pure = Pure
  Pure f <*> xs = fmap f xs
  xs <*> Pure f = fmap ($ f) xs

  Root xk xc xs xss <*> Root yk yc ys yss
    | xk <= yk  = Root xk (\a (b,c,d) -> xc a d (yc b c)) xs (App (yk ∸ xk) ys yss xss)
    | otherwise = Root yk (\a (b,c,d) -> xc b c (yc a d)) ys (App (xk ∸ yk) xs xss yss)

To actually run the heap we will use the following two functions:

merges :: (Monus k, Applicative f) => Heaps k f a -> Heap k f a
merges Nil = Pure ()
merges (App k1 e1 t1 Nil) = Root k1 (,,()) e1 t1
merges (App k1 e1 t1 (App k2 e2 t2 xs)) =
   (Root k1 (\a b cd es -> (a,b, cd es)) e1 t1 <*> Root k2 (,,) e2 t2) <*> merges xs

runHeap :: (Monus k, Applicative f) => Heap k f a -> f a
runHeap (Pure x) = pure x
runHeap (Root _ c x xs) = liftA2 c x (runHeap (merges xs))

And we can lift a computation into Phases like so:

phase :: k -> f a -> Heap k f a
phase k xs = Root k const xs Nil
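For readers who want to load this into GHCi, the following sketch gathers the definitions above into one module. The GADTs and TupleSections pragmas and the import list are my assumptions about what the post needs. The demo value and its helper at are hypothetical: they replay the earlier phased example with a writer-like (,) String applicative standing in for IO, so that the order of effects shows up in a pure result.

```haskell
{-# LANGUAGE GADTs, TupleSections #-}

import Control.Applicative (liftA2)
import Data.Semigroup (Sum (..))

class (Semigroup a, Ord a) => Monus a where
  (∸) :: a -> a -> a

instance (Num a, Ord a) => Monus (Sum a) where
  (∸) = (-)

data Heap k f a where
  Pure :: a -> Heap k f a
  Root :: !k -> (x -> y -> a) -> f x -> Heaps k f y -> Heap k f a

data Heaps k f a where
  Nil :: Heaps k f ()
  App :: !k -> f x -> Heaps k f y -> Heaps k f z -> Heaps k f (x, y, z)

instance Functor (Heap k f) where
  fmap f (Pure x)        = Pure (f x)
  fmap f (Root k c x xs) = Root k (\a b -> f (c a b)) x xs

instance Monus k => Applicative (Heap k f) where
  pure = Pure
  Pure f <*> xs = fmap f xs
  xs <*> Pure x = fmap ($ x) xs
  Root xk xc xs xss <*> Root yk yc ys yss
    | xk <= yk  = Root xk (\a (b, c, d) -> xc a d (yc b c)) xs (App (yk ∸ xk) ys yss xss)
    | otherwise = Root yk (\a (b, c, d) -> xc b c (yc a d)) ys (App (xk ∸ yk) xs xss yss)

merges :: (Monus k, Applicative f) => Heaps k f a -> Heap k f a
merges Nil = Pure ()
merges (App k1 e1 t1 Nil) = Root k1 (,,()) e1 t1
merges (App k1 e1 t1 (App k2 e2 t2 xs)) =
  (Root k1 (\a b cd es -> (a, b, cd es)) e1 t1 <*> Root k2 (,,) e2 t2) <*> merges xs

runHeap :: (Monus k, Applicative f) => Heap k f a -> f a
runHeap (Pure x)        = pure x
runHeap (Root _ c x xs) = liftA2 c x (runHeap (merges xs))

phase :: k -> f a -> Heap k f a
phase k xs = Root k const xs Nil

-- Hypothetical demo: the first component logs effects in the order they run,
-- the second component is the pure result.
demo :: (String, String)
demo = runHeap (sequenceA [at 3 'a', at 2 'b', at 1 'c', at 2 'd', at 3 'e'])
  where
    at :: Int -> Char -> Heap (Sum Int) ((,) String) Char
    at n c = phase (Sum n) ([c], c)
```

Evaluating demo gives ("cbdae","abcde"): the log follows the phases while the returned string keeps its syntactic order, matching the behaviour described for phased.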

Stabilising Phases

There’s a problem. A heap sort based on a pairing heap isn’t stable. That means that the order of effects here can vary for two effects in the same phase. If we look back to the example with the strings we saw above, that means that outputs like cdbea would be possible (in actual fact, we don’t get any reordering in this particular example, but that’s just an accident of the way the applicative operators are associated under the hood).

This is problematic because we would expect effects in the same phase to behave as if they were normal applicative effects, sequenced according to their syntactic order. It also means that the applicative transformer breaks the applicative laws, because effects might be reordered according to the association of the applicative operators, which should lawfully be associative.

To make the sort stable, we could layer the heap effect with some state effect that would tag each effect with its order. However, that would hurt efficiency and composability: it would force us to linearise the whole heap sort procedure, where currently different branches of the tree can compute completely independently of each other. The solution comes in the form of another monus: the key monus.

data Key k = !k :* {-# UNPACK #-} !Int deriving (Eq, Ord)

A Key k is some ordered key k coupled with an Int that represents the offset between the original position and the current position of the key. In this way, when two keys compare as equal, we can cascade on to compare their original positions, thereby maintaining their original order when there is ambiguity caused by a key collision. However, in contrast to the approach of walking over the data once and tagging it all with positions, this approach keeps the location information completely local: we never need to know that some key is in the $n$ th position in the original sequence, only that it has moved $n$ steps from its original position.

The instances are as follows:

instance Semigroup (Key k) where
  (xk :* xi) <> (yk :* yi) = yk :* (xi + yi)

instance Ord k => Monus (Key k) where
  (xk :* xi) ∸ (yk :* yi) = xk :* (xi - yi)

This instance is basically a combination of the Last semigroup and the $(\mathbb{Z}, +, 0)$ group. We could make a slightly more generalised version of Key that is the combination of any monus and $\mathbb{Z}$, but since I’m only going to be using this type for simple sorting-like algorithms I will leave that generalisation for another time.
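As a sanity check, the weakened monus laws can be tested against Key directly. This is a self-contained sketch that restates the class and instances; the checking functions lawDifference, lawMonotone, and checkKey are hypothetical names, and the sample values are arbitrary.

```haskell
class (Semigroup a, Ord a) => Monus a where
  (∸) :: a -> a -> a

data Key k = !k :* {-# UNPACK #-} !Int deriving (Eq, Ord)

-- Keep the right-hand key (like Last) and add the offsets (like Sum).
instance Semigroup (Key k) where
  (_ :* xi) <> (yk :* yi) = yk :* (xi + yi)

instance Ord k => Monus (Key k) where
  (xk :* xi) ∸ (_ :* yi) = xk :* (xi - yi)

-- First law: if x <= y then x <> (y ∸ x) == y.
lawDifference :: Ord k => Key k -> Key k -> Bool
lawDifference x y = not (x <= y) || x <> (y ∸ x) == y

-- Second law: if x <= y then z <> x <= z <> y.
lawMonotone :: Ord k => Key k -> Key k -> Key k -> Bool
lawMonotone x y z = not (x <= y) || z <> x <= z <> y

-- Check both laws over a small grid of keys and offsets.
checkKey :: Bool
checkKey = and [ lawDifference x y && lawMonotone x y z
               | x <- keys, y <- keys, z <- keys ]
  where
    keys = [ c :* i | c <- "abc", i <- [-2 .. 2] ]
```

The derived Ord compares the key first and the offset second, so ('a' :* 0) < ('a' :* 1): between two equal keys, the one with the larger offset sorts later.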

The stable heap type is as follows:

data Stable k f a
  = Stable { size :: {-# UNPACK #-} !Int
           , heap :: !(Heap (Key k) f a) }

We need to track the size of the heap so that we can offset the entries of the right-hand operand by the number of entries to their left. Because we’re storing differences, we can add an offset to every entry in a heap in $\mathcal{O}(1)$ time by simply adding to the root:

delayKey :: Int -> Heap (Key k) f a -> Heap (Key k) f a
delayKey _ hp@(Pure _) = hp
delayKey n (Root (k :* m) c x xs) = Root (k :* (n + m)) c x xs

Finally, using this we can implement the Applicative instance and the rest of the interface:

instance Ord k => Applicative (Stable k f) where
  pure = Stable 0 . pure
  Stable n xs <*> Stable m ys = Stable (n+m) (xs <*> delayKey n ys)

runStable :: (Applicative f, Ord k) => Stable k f a -> f a
runStable = runHeap . heap

stable :: Ord k => k -> f a -> Stable k f a
stable k fa = Stable 1 (phase (k :* 0) fa)

This is a pure, optimally efficient implementation of Phases ordered by an arbitrary total-ordered key.

Local Computation in a Monadic Heap

In (2021), I developed a monadic heap based on the free monad transformer.

newtype Search k a = Search { runSearch :: [Either a (k, Search k a)] }

This type is equivalent to the free monad transformer over the list monad and (,) k functor (i.e. the writer monad).

Search k a ≅ FreeT ((,) k) [] a

In the paper (2021) we extended the type to become a full monad transformer, replacing lists with ListT. This let us order the effects according to the weight k; however, for this example we only need the simplified type, which lets us order the values according to k.

This Search type follows the structure of a pairing heap (although not as closely as the version above). However, this type is interesting because semantically it needs the weights to be stored as differences, rather than absolute weights. As a free monad transformer, the Search type layers effects on top of each other; we can later interpret those layers by collapsing them together using the monadic join. In the case of Search, those layers are drawn from the list monad and the (,) k functor (writer monad). That means that if we have some heap representing the tree from above:

Search [ Right (a, Search [ Right (b, Search [ Right (d, Search [Left x])
                                             , Right (e, Search [Left y])])
                          , Right (c, Search [Left z])])]

When we collapse this computation down to the leaves, the weights we will get are the following:

[(a <> b <> d, x), (a <> b <> e, y), (a <> c, z)]

So, if we want the weights to line up properly, we need to store the differences.
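The collapse described here can be written out directly. In this sketch the function collapse and the value example are hypothetical names, and I require Monoid rather than Semigroup for the weights only so that a leaf sitting directly at the top level has a weight (mempty) to report.

```haskell
import Data.Semigroup (Sum (..))

newtype Search k a = Search { runSearch :: [Either a (k, Search k a)] }

-- Flatten a Search to its leaves, combining the weights along each path.
collapse :: Monoid k => Search k a -> [(k, a)]
collapse = go mempty
  where
    go acc (Search xs) = concatMap (step acc) xs
    step acc (Left x)       = [(acc, x)]
    step acc (Right (w, s)) = go (acc <> w) s

-- The tree from above with a = 1, b = 2, d = 3, e = 4, c = 10.
example :: Search (Sum Int) Char
example =
  Search [ Right (Sum 1, Search [ Right (Sum 2, Search [ Right (Sum 3, Search [Left 'x'])
                                                       , Right (Sum 4, Search [Left 'y']) ])
                                , Right (Sum 10, Search [Left 'z']) ]) ]
```

Here collapse example yields the weights 6, 7, and 11 for 'x', 'y', and 'z' respectively, that is a <> b <> d, a <> b <> e, and a <> c.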

mergeS :: Monus k => [(k, Search k a)] -> Maybe (k, Search k a)
mergeS [] = Nothing
mergeS (x:xs) = Just (mergeS' x xs)
  where
    mergeS' x1 [] = x1
    mergeS' x1 [x2] = x1 <+> x2
    mergeS' x1 (x2:x3:xs) = (x1 <+> x2) <+> mergeS' x3 xs

    (xw, Search xs) <+> (yw, Search ys)
      | xw <= yw  = (xw, Search (Right (yw ∸ xw, Search ys) : xs))
      | otherwise = (yw, Search (Right (xw ∸ yw, Search xs) : ys))

popMins :: Monus k => Search k a -> ([a], Maybe (k, Search k a))
popMins = fmap mergeS . partitionEithers . runSearch

Conclusion

The technique of “don’t store the absolute value, store the difference” seems to be generally quite useful; I think that monuses are a handy algebra to keep in mind whenever that technique looks like it might be needed. The Key monus above is closely related to the factorial numbers, and the trick I used in this post.

References

Amer, K. 1984. “Equationally complete classes of commutative monoids with monus.” algebra universalis 18 (1) (February): 129–131. doi:10.1007/BF01182254.

Blöndal, Baldur. 2025a. “Generalized multi-phase compiler/concurrency.” reddit. https://www.reddit.com/r/haskell/comments/1m25fw8/generalized_multiphase_compilerconcurrency/.

Blöndal, Baldur. 2025b. “Phases using Vault.” reddit. https://www.reddit.com/r/haskell/comments/1msvwzd/phases_using_vault/.

Easterly, Noah. 2019. “Functions and newtype wrappers for traversing Trees: Rampion/tree-traversals.” https://github.com/rampion/tree-traversals.

Fredman, Michael L., Robert Sedgewick, Daniel D. Sleator, and Robert E. Tarjan. 1986. “The pairing heap: A new form of self-adjusting heap.” Algorithmica 1 (1-4) (January): 111–129. doi:10.1007/BF01840439.

Gibbons, Jeremy, Donnacha Oisín Kidney, Tom Schrijvers, and Nicolas Wu. 2022. “Breadth-First Traversal via Staging.” In Mathematics of Program Construction, ed. by Ekaterina Komendantskaya, 1–33. Cham: Springer International Publishing. doi:10.1007/978-3-031-16912-0_1.

Gibbons, Jeremy, Donnacha Oisín Kidney, Tom Schrijvers, and Nicolas Wu. 2023. “Phases in Software Architecture.” In Proceedings of the 1st ACM SIGPLAN International Workshop on Functional Software Architecture, 29–33. FUNARCH 2023. New York, NY, USA: Association for Computing Machinery. doi:10.1145/3609025.3609479.

Kidney, Donnacha Oisín, and Nicolas Wu. 2021. “Algebras for weighted search.” Proceedings of the ACM on Programming Languages 5 (ICFP) (August): 72:1–72:30. doi:10.1145/3473577.

Kidney, Donnacha Oisín, and Nicolas Wu. 2025. “Formalising Graph Algorithms with Coinduction.” Proc. ACM Program. Lang. 9 (POPL) (January): 56:1657–56:1686. doi:10.1145/3704892.

Larkin, Daniel H., Siddhartha Sen, and Robert E. Tarjan. 2013. “A Back-to-Basics Empirical Study of Priority Queues.” In 2014 Proceedings of the Meeting on Algorithm Engineering and Experiments (ALENEX), 61–72. Proceedings. Society for Industrial and Applied Mathematics. doi:10.1137/1.9781611973198.7.

Visscher, Sjoerd. 2025. “Phases with any Ord key type.” https://gist.github.com/sjoerdvisscher/bf282a050f0681e2f737908e254c4061.

¹ Note that there are many related structures that all fall under the umbrella notion of “monus”; the structure that I am defining here is the same structure I worked with in (2021) and (2025).

Vibe-coding a debugger for a DSL

2026-02-25T10:53:30Z

Earlier this week a colleague of mine, Emilio Jesús Gallego Arias, shared a demo of something he built as an experiment, and I felt the desire to share this and add a bit of reflection. (Not keen on watching a 5 min video? Read on below.)

What was that?

So what did you just see (or skip watching)? You could see Emilio’s screen, running VSCode and editing a Lean file. He designed a small programming language that he embedded into Lean, including an evaluator. So far, so standard, but a few things stick out already:

Using Lean’s very extensible syntax this embedding is rather elegant and pretty.
Furthermore, he can run this DSL code right there, in the source code, using commands like #eval. This is a bit like the interpreter found in Haskell or Python, but without needing a separate process, or like using a Jupyter notebook, but without the stateful cell management.

This is already a nice demonstration of Lean’s abilities and strength, as we know them. But what blew my mind the first time was what happened next: He had a visual debugger that allowed him to debug his DSL program. It appeared on the right, in Lean’s “Info View”, where various Lean tools can hook into, show information and allow the user to interact.

But it did not stop there, and my mind was blown a second time: Emilio opened VSCode’s “Debugger” pane on the left, and was able to properly use VSCode’s full-fledged debugger frontend for his own little embedded programming language! Complete with highlighting the executed line, with the ability to set breakpoints there, and showing the state of local variables in the debugger.

Having a good debugger is not to be taken for granted even for serious, practical programming languages. Having it for a small embedded language that you just built yourself? I wouldn’t have even considered that.

Did it take long?

If I were Emilio’s manager I would applaud the demo and then would have to ask how many weeks he spent on that. Coming up with the language, getting the syntax extension right, writing the evaluator and especially learning how the debugger integration into VSCode (using the DAP protocol) works, and then instrumenting his evaluator to speak that protocol – that is a sizeable project!

It turns out the answer isn’t measured in weeks: it took just one day of coding together with GPT-Codex 5.3. My mind was blown a third time.

Why does Lean make a difference?

I am sure this post is just one of many stories you have read in recent weeks about how new models like Claude Opus 4.6 and GPT-Codex 5.3 built impressive things in hours that would have taken days or more before. But have you seen something like this? Agentic coding is powerful, but limited by what the underlying platform exposes. I claim that Lean is a particularly well-suited platform to unleash the agents’ versatility.

Here we are using Lean as a programming language, not as a theorem prover (which brings other immediate benefits when using agents, e.g. the produced code can be verified rather than merely plausible, but that’s a story to be told elsewhere).

But arguably because Lean is also a theorem prover, and because of the requirements that stem from that, its architecture is different from that of a conventional programming language implementation:

As a theorem prover, it needs extensible syntax to allow formalizing mathematics in an ergonomic way, but it can also be used for embedding syntax.
As a theorem prover, it needs the ability to run “tactics” written by the user, hence the ability to evaluate the code right there in the editor.
As a theorem prover, it needs to give access to information such as tactic state, and such introspection abilities unlock many other features – such as a debugger for an embedded language.
As a theorem prover, it has to allow tools to present information like the tactic state, so it has the concept of interactive “Widgets”.

So Lean’s design has always made such a feat possible. But it was no easy feat. The Lean API is large, and documentation never ceases to be improvable. In the past, it would take an expert (or someone willing to become one) to pull off that stunt. These days, coding assistants have no issue digesting, understanding and using the API, as Emilio’s demo shows.

The combination of Lean’s extensibility and the ability of coding agents to make use of that is a game changer to how we can develop software, with rich, deep, flexible and bespoke ways to interact with our code, created on demand.

Where does that lead us?

Emilio actually shared more such demos (GitHub repository). A visual explorer for the compiler output (have a look at the screenshot). A browser-devtool-like inspection tool for Lean’s “InfoTree”. Any of these provide a significant productivity boost. Any of these would have been a sizeable project half a year ago. Now it’s just a few hours of chatting with the agent.

So allow me to try and extrapolate into a future where coding agents have continued to advance at the current pace, and are used ubiquitously. Is there then even a point in polishing these tools, shipping them to our users, documenting them? Why build a compiler explorer for our users, if our users can just ask their agent to build one for them, right then when they need it, tailored to precisely the use case they have, with no unnecessary or confusing features? The code would be single use, as the next time the user needs something like that the agent can just re-create it, maybe slightly differently because every use case is different.

If that comes to pass then Lean may no longer get praise for its nice out-of-the-box user experience, but instead because it is such a powerful framework for ad-hoc UX improvements.

And Emilio wouldn’t post demos about his debugger. He’d just use it.

Nickel since 1.0

2026-02-19T00:00:00Z

We released Nickel 1.0 in May 2023. Since then, we’ve been working so hard on new features, bug fixes, and performance improvements that we haven’t had the opportunity to write about them as much as we would’ve liked. This post rounds up some of the big changes that we’ve landed over the past few years.

New language features

Algebraic data types

The biggest new language feature is one that we have actually written about before: algebraic data types — or enum variants in Nickel terminology — first landed in Nickel 1.5. Nickel has supported plain enums for a long time: [| 'Carnitas, 'Fish |] is the type of something that can take two possible values: 'Carnitas or 'Fish. Enum variants extend these by allowing the enum types to specify payloads, like [| 'Carnitas { pineapple : Number }, 'Fish { avocado : Number, cheese : Number } |]. Types like this are supported by many modern programming languages, as they are useful for encoding important invariants like the fact that carnitas tacos can be topped with pineapple but not avocado. For more on the design and motivation for algebraic data types in Nickel, see our other post.

Pattern matching

Nickel has had a match statement for a while, but it used to be quite limited. Nickel 1.5 and Nickel 1.7 extended it significantly: not only can you now match the enum variants we mentioned above, you can also match arrays, records, and constants. You can also match the “or” of two patterns, and you can guard matches with predicates.

match {
  'Carnitas { pineapple } if pineapple >= 5 => std.fail_with "too much pineapple",

  [ 'Carnitas { .. }, 'Fish { .. } ]
  or [ 'Fish { .. }, 'Carnitas { .. } ] => "one of each",
}

Basically, if you’ve used pattern matching in another language then Nickel’s match blocks probably have the features you’re used to. And they’re adapted to Nickel’s gradual typing: the example match block above will work in dynamically typed code, but in a statically typed block it will fail to typecheck, because there’s no static type that can be either an enum or an array.
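As a rough usage sketch (not taken from the release notes), a match block is a function, so it can be bound to a name and applied. The name describe and the catch-all branch are illustrative, and the snippet assumes dynamically typed code:

```nickel
# `describe` is a hypothetical helper name used only for this sketch.
let describe = match {
  # A guard: this branch applies only when the predicate holds.
  'Carnitas { pineapple } if pineapple >= 5 => "too much pineapple",
  # Any other carnitas taco, whatever its payload contains.
  'Carnitas { .. } => "carnitas",
  'Fish { .. } => "fish",
  # Catch-all for everything else.
  _ => "something else",
} in
[
  # Intended to fall through the guard and hit the second branch.
  describe ('Carnitas { pineapple = 2 }),
  # Intended to hit the 'Fish branch.
  describe ('Fish { avocado = 1, cheese = 1 }),
]
```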

Field punning

Records in Nickel are recursive by default, meaning that in the record

{
  tacos = ['Carnitas { pineapple = 2 }],
  price = price_per_taco * std.array.length tacos,
  price_per_taco = 5,
}

the name price_per_taco in the definition of price refers to the field price_per_taco defined within the record. This behavior is usually very handy, but it can be annoying when you’re trying to define a field whose name shadows something in an outer scope. For example, suppose you want to move the definition of tacos outside the record:

let tacos = ['Carnitas { pineapple = 2 }] in
{
  tacos = tacos,
  price = price_per_taco * std.array.length tacos,
  price_per_taco = 5,
}

This probably doesn’t do what you want: it recurses infinitely, because in the tacos = tacos line, the tacos on the right side of the equals sign refers to the name tacos that’s being defined on the left hand side (and not, as you might expect, the tacos in let tacos = ... in). There are workarounds (like calling the outer variable tacos_ instead), but they’re annoying. Nickel 1.12 added the include keyword, where { include tacos } defines the field tacos with the value of tacos from the enclosing scope.
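For illustration, here is the problematic record rewritten with include. This is a sketch that assumes Nickel 1.12 or later and uses only the { include tacos } form:

```nickel
let tacos = ['Carnitas { pineapple = 2 }] in
{
  # Takes the value from the `let` above instead of referring to itself.
  include tacos,
  # Inside the record, `tacos` now names the included field.
  price = price_per_taco * std.array.length tacos,
  price_per_taco = 5,
}
```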

Let blocks

Nickel binds local variables using a let statement, as in let x = 1 in x + x. Before Nickel 1.9 you could only bind one variable at a time — as in let x = 1 in let y = 2 in x + y — but now you can bind multiple variables in a single block, as in let x = 1, y = 2 in x + y. In most situations this is just a small syntactic convenience,¹ but with recursive let blocks you actually gain some expressive power. For example, they allow you to write mutually recursive functions without putting them in a record (which used to be the only way to create a recursive environment in Nickel):

let rec
  is_even = fun x => if x == 0 then true else is_odd (x - 1),
  is_odd = fun x => if x == 0 then false else is_even (x - 1),
in
  is_even 42

Better contract constructors

Custom contracts were reworked in Nickel 1.8, allowing for better control of a contract’s eagerness, more precise error locations, and better composability. Nickel’s standard library now offers three contract constructors. The simplest is std.contract.from_predicate, which turns a predicate (of type Dyn -> Bool) into a contract. std.contract.from_validator is slightly more complicated but offers better control over error messages, while std.contract.custom offers the most control.
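To give a feel for the two simpler constructors, here is a minimal sketch. The contract names are made up for the example, and the shape of the validator result ('Ok, or 'Error with a message field) is our reading of the manual, so treat it as an assumption rather than a reference:

```nickel
let
  # Simplest form: a predicate of type Dyn -> Bool.
  SmallAmount =
    std.contract.from_predicate (fun value => std.is_number value && value < 5),
  # A validator can say what went wrong instead of just failing.
  SmallAmountWithMessage =
    std.contract.from_validator (fun value =>
      if !(std.is_number value) then
        'Error { message = "expected a number" }
      else if value >= 5 then
        'Error { message = "too much pineapple" }
      else
        'Ok
    ),
in
{
  pineapple | SmallAmount = 2,
  extra_pineapple | SmallAmountWithMessage = 4,
}
```

Both fields satisfy their contracts here; changing either value to 5 or more should make evaluation of that field fail.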

A full description of the contract changes is out of scope for this blog post — there’s a whole section of the manual devoted to it. But the key point is that contracts in Nickel are partly eager and partly lazy. For example, the contract in

let Taco = [| 'Carnitas, 'Fish |] in
let tacos | Array Taco = ['Carnitas, 'CrunchyTacoSupreme] in
<something>

gets applied in two stages. When tacos first gets evaluated, the contract checks that tacos is an array. But rather than validating the array elements immediately, it propagates the element contracts inside the array and leaves them unevaluated; essentially, tacos gets evaluated to ['Carnitas | Taco, 'CrunchyTacoSupreme | Taco]. Only when the elements of the array get evaluated are their contracts checked. In particular, if the array elements are never actually evaluated (for example, if <something> is std.array.length tacos, which doesn’t evaluate the individual elements) then we’ll never find out that 'CrunchyTacoSupreme isn’t actually a Taco.
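Concretely, the case where the element contracts are never forced can be sketched by filling in the body like this:

```nickel
let Taco = [| 'Carnitas, 'Fish |] in
let tacos | Array Taco = ['Carnitas, 'CrunchyTacoSupreme] in
# Only the array itself is needed here, so the element contracts stay
# unevaluated and the invalid second element goes unnoticed.
# Replacing this body with `std.array.at 1 tacos` would force that
# element, which is when its Taco contract gets checked.
std.array.length tacos
```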

The lazy/eager distinction has been part of Nickel’s built-in record and array contracts since the beginning, but never fully exploitable by custom contracts. The new std.contract.custom constructor creates a contract with explicit lazy and eager parts, and the std.contract.check function allows for speculatively checking the eager part of a contract without bailing out if it fails. Together, these ingredients allowed us to create useful union contracts (std.contract.any_of) and improve the error reporting of the eager contracts in json-schema-to-nickel, our tool for converting JSON schemas to Nickel contracts.
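A small sketch of a union contract built with std.contract.any_of. The contract name is illustrative, and we are assuming that any_of takes an array of contracts:

```nickel
# `NumberOrString` is a hypothetical name for this sketch.
let NumberOrString = std.contract.any_of [Number, String] in
{
  # Each field is accepted if at least one alternative succeeds.
  price | NumberOrString = 5,
  label | NumberOrString = "five",
}
```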

Performance improvements

For Nickel 1.0, we were focused on getting the basic language right. Since then (and especially over the past year), we’ve been working on getting the interpreter to run faster. While the performance improvements you observe will depend heavily on your use case, we’ve seen large user-provided Nickel configurations that evaluate 10x faster now than they did two years ago (and 3x faster than six months ago). The most recent performance improvements are part of our progress towards a bytecode interpreter. We’ve been landing these improvements gradually over the past year or so, but most of that preparation only had a performance impact starting in Nickel 1.15.

Standard library improvements

Nickel’s standard library has roughly doubled in size since Nickel 1.0, offering many useful utility functions (like std.record.get_or or std.string.find_all) and contract combinators (like std.contract.Sequence or std.contract.any_of). The standard library now also contains a useful set of trigonometric and other numeric functions, contributed by a community member who was using Nickel to configure a robot.

Tooling and distribution improvements

Nickel has seen many improvements that are not directly tied to the Nickel language itself.

Language server improvements

Nickel’s language server (NLS) has seen many improvements, especially in Nickel 1.2 and 1.3. It now supports finding references and definitions, listing symbols, and various other table-stakes language server features. Completions have also been improved substantially since version 1.0, and can make intelligent use of type- and contract-related information. For example, in

'Carnitas { ‸ } | [| 'Carnitas { pineapple : Number }, 'Fish { avocado : Number, cheese : Number } |]
#           └── cursor is here

NLS knows to offer “pineapple” as a completion, but not “avocado”.

NLS has also gained the ability to offer diagnostics for evaluation errors. This is very useful in Nickel because contract errors are detected during evaluation instead of during typechecking. In-editor detection of contract violations is part of the vision articulated in a previous blog post, where configuration errors are left-shifted (because you get them as you type) and infinitely customizable (because contracts are arbitrary code). Since the previous post was written, the diagnostics have been further improved thanks to the contract improvements mentioned above: the problematic field now gets highlighted directly.

Unit tests

Since Nickel 1.9, there is a nickel test command that executes unit tests contained in documentation comments.

{
  more_avocado
    | doc m%"
      Double the avocado!

      Here's an example that is automatically treated as a unit test:
      ```nickel
        more_avocado ('Fish { avocado = 1 })
        # => 'Fish { avocado = 2 }
      ```
      "%
    = fun ('Fish { avocado = a }) => 'Fish { avocado = 3 * a }
}

Running nickel test on this file will highlight the typo in the function definition:

testing more_avocado/0...FAILED
test more_avocado/0 failed
error: contract broken by a value
   ┌─  (generated by evaluation):1:1
   │
 1 │ std.contract.Equal ('Fish { avocado = 2, })
   │ ------------------------------------------- expected type
   │
  
   ┌─ input.ncl:12:38
   │
12 │     = fun ('Fish { avocado = a }) => 'Fish { avocado = 3 * a }
   │                                      ------------------------- evaluated to this expression

1 failures
error: tests failed

JSON/YAML/TOML interop

Interoperability with plain data formats (JSON, YAML, and TOML) has been improved in several ways.

The YAML format allows for several YAML documents to be embedded in the same file (separated by --- lines). We can read such files since Nickel 1.2, and we can write them since Nickel 1.15: from Nickel 1.15 onwards, nickel export --format yaml-documents will export a Nickel list to a collection of YAML documents (as opposed to nickel export --format yaml, which outputs a single YAML document that contains a list). Similarly, Nickel 1.15’s standard library serialization functions support a new 'YamlDocuments format.
The nickel convert command, added in Nickel 1.15, allows conversion of JSON, YAML, or TOML to Nickel. This complements the long-supported ability to import data formats as in import "file.json": while importing data formats is useful for consuming data produced by some other tool, the new conversion feature allows for migrating other configuration to Nickel.
Since Nickel 1.3, the nickel command line will merge plain data files into Nickel code: if you have a JSON file containing { "price_per_taco": 5 } and a Nickel file containing { tacos = 3, price = price_per_taco * tacos, price_per_taco } then nickel export json_file.json nickel_file.ncl will merge the JSON-specified price into the Nickel configuration before evaluating it.

Release process and distribution

For the Nickel 1.0 release, we built binaries for Linux x86_64 and aarch64 only. Now, we’re building macOS and Windows binaries as well. And we’re not the only distributors of Nickel binaries: nixpkgs, Arch Linux, and Homebrew all have up-to-date Nickel packages.

We’ve also improved the usage of Nickel as a library. Since Nickel 1.10, we’ve been publishing our Python bindings on PyPI. And Nickel 1.15 saw our first release of C and Go bindings, along with a stable Rust API.

Experimental features

Since 1.0, Nickel has grown a few experimental features for use cases that we want to enable but don’t yet have enough confidence in the design and implementation to fully support. Some of these features (Nix compatibility and package management) are disabled by default; you’ll need to build Nickel with explicit support for them. If you’re using any of these features, let us know what you’re doing with them and whether they’re working the way you want!

Customize mode

Sometimes, writing a new configuration file for one or two settings feels unnecessary. Our “customize mode”, introduced in Nickel 1.2, allows configuration to be supplied at the command line. For example, given the { tacos = 3, price = price_per_taco * tacos, price_per_taco } example from before, we can evaluate it with

$ nickel export tacos.ncl -- price_per_taco=5
{
  "price": 15,
  "price_per_taco": 5,
  "tacos": 3
}

Also, if you aren’t sure what options are available for setting, you can ask:

$ nickel export tacos.ncl -- list
Input fields:
- price_per_taco

Overridable fields (require `--override`):
- price
- tacos

Use the `query` subcommand to print a detailed description of a specific field. See `nickel help query`.

Since Nickel 1.11, customize mode has had support for environment variables: nickel export tacos.ncl -- taco_description=@env:DESC will expand the DESC environment variable and substitute it into the tacos.ncl configuration. In some cases, you could achieve something similar by expanding environment variables using your shell, but correctly handling escaping there can be painful (or even a security risk).

Nix compatibility

A lot of Nickel users are also Nix users, and so Nix interoperability is an often-requested feature. Our current Nix interface is limited to plain data, but you can import Nix from Nickel if you’ve built Nickel with the “nix-experimental” feature:

{
  price = price_per_taco * std.array.length (import "tacos.nix"),
  price_per_taco = 5,
}

Package management

In Nickel 1.0, you could share code between projects by copying files around, basically. Nickel 1.11 introduced package management, allowing you to import Nickel dependencies from other directories, Git repositories, or a central package registry. You declare your dependencies in a Nickel-pkg.ncl manifest file:

{
  name = "tacos",
  authors = ["Me"],
  minimal_nickel_version = "1.15.0",
  dependencies = { salsa = 'Git { package = "github:example/salsa", version = "1.0" } },
}

Then you can import those dependencies in your Nickel code:

'Fish { avocado = 1, salsa = (import salsa).verde }

Thank you!

That sums up the biggest changes to Nickel over the past two and a half years or so. As we come up on 5,000 commits from 86 contributors, we’d like to thank you for all the feedback, discussion, and participation that encourage us to keep improving Nickel.

¹ There are some situations where let blocks can improve performance with Nickel’s current interpreter: let x = 1 in let y = 2 in x + y creates two nested environments while let x = 1, y = 2 in x + y creates a single environment. Variable lookups are usually faster when environments are less deeply nested, so the version with a let block should be a little bit faster. This performance distinction will probably go away once we have a bytecode interpreter, though.

Switching to project.el

2026-02-17T23:09:00Z

I've used projectile ever since I created my own Emacs config. I have a vague memory of choosing it because some other package only supported it. (It might have been lsp-mode, but I'm not sure.) Anyway, now that I'm trying out eglot, again, I thought I might as well see if I can switch to project.el, which is included in Emacs nowadays.

A non-VC project marker

Projectile allows using a file, .projectile, in the root of a project. This makes it possible to turn a folder into a project without having to use version control. It's possible to configure project.el to respect more VC markers than what's built-in. This can be used to define a non-VC marker.

(setopt project-vc-extra-root-markers '(".projectile" ".git"))

Since I've set vc-handled-backends to nil (the default made VC interfere with magit, so I turned it off completely) I had to add ".git" to make git repos be recognised as projects too.

Xref history

The first thing to solve was that the xref stack wasn't per project. Somewhat disappointingly, there only seem to be two options for xref-history-storage shipped with Emacs

xref-global-history: a single global history (the default)
xref-window-local-history: a history per window

I had the same issue with projectile, and ended up writing my own package for it. For project.el I settled on using xref-project-history.

(use-package xref-project-history
  :ensure (:type git
           :repo "https://codeberg.org/imarko/xref-project-history.git"
           :branch "master")
  :custom
  (xref-history-storage #'xref-project-history))

Jumping between implementation and test

Projectile has a function for jumping between implementation and test. Not too surprisingly it's called projectile-toggle-between-implementation-and-test. I found some old emails in an archive suggesting that project.el might have had something similar in the past, but if that's the case it's been removed by now. When searching for a package I came across this email comparing tools for finding related files. The author mentions two that are included with Emacs

ff-find-other-file: part of find-file.el, which has a few other functions and a rather impressive set of settings to customise its behaviour.
find-sibling-file: a newer command, I believe, that also can be customised.

So, there are options, but neither of them is made to work nicely with project.el out of the box. My most complicated use case seems to be in Haskell projects where modules for implementation and test live in separate (mirrored) folder hierarchies, e.g.

src
└── Sider
    └── Data
        ├── Command.hs
        ├── Pipeline.hs
        └── Resp.hs
test
└── Sider
    └── Data
        ├── CommandSpec.hs
        ├── PipelineSpec.hs
        └── RespSpec.hs

I'm not really sure how I'd configure find-sibling-rules, which are regular expressions, to deal with folder hierarchies like this. To be honest, I didn't really see a way of configuring ff-find-other-file at first either. Then I happened on a post about switching between a module and its tests in Python. With its help I came up with the following

(defun mes/setup-hs-ff ()
  (when-let* ((proj-root (project-root (project-current)))
              (rel-proj-root (-some--> (buffer-file-name)
                               (file-name-directory it)
                               (f-relative proj-root it)))
              (sub-tree (car (f-split (f-relative (buffer-file-name) proj-root))))
              (search-dirs (--> '("src" "test")
                                (remove sub-tree it)
                                (-map (lambda (p) (f-join proj-root p)) it)
                                (-select #'f-directory? it)
                                (-mapcat (lambda (p) (f-directories p nil t)) it)
                                (-map (lambda (p) (f-relative p proj-root)) it)
                                (-map (lambda (p) (f-join rel-proj-root p)) it))))
    (setq-local ff-search-directories search-dirs
                ff-other-file-alist '(("Spec\\.hs$" (".hs"))
                                      ("\\.hs$" ("Spec.hs"))))))

A few things to note

The order of rules in ff-other-file-alist is important: the first match is chosen.
(buffer-file-name) can, and really does, return nil at times, and file-name-directory doesn't deal with anything but strings.
The entries in ff-search-directories have to be relative to the file in the current buffer, hence the rather involved varlist in the when-let* expression.

With this in place I get the following values for ff-search-directories

src/Sider/Data/Command.hs: ("../../../test/Sider" "../../../test/Sider/Data")
test/Sider/Data/CommandSpec.hs: ("../../../src/Sider" "../../../src/Sider/Data")

And ff-find-other-file works beautifully.

Conclusion

My setup with project.el now covers everything I used from projectile so I'm fairly confident I'll be happy keeping it.

Using advice to limit lsp-ui-doc nuisance

2026-02-16T19:10:00Z

I've switched back to lsp-mode temporarily until I've had time to fix a few things with my eglot setup. Returning prompted me to finally address an irritating behaviour with lsp-ui-doc.

No matter what I set lsp-ui-doc-position to, it ends up covering information that I want to see. While waiting for a fix I decided to work around it. It seems to me that this is exactly what advice is for.

I came up with the following to make sure the frame appears on the half of the buffer where point isn't.

(defun my-lsp-ui-doc-wrapper (&rest _)
  (let* ((pos-line (- (line-number-at-pos (point))
                      (line-number-at-pos (window-start))))
         (pos (if (<= pos-line (/ (window-body-height) 2))
                  'bottom
                'top)))
    (setopt lsp-ui-doc-position pos)))

(advice-add 'lsp-ui-doc--move-frame :before #'my-lsp-ui-doc-wrapper)

Browse code by meaning

2026-02-16T00:00:00Z

Navigate a repository using topic modeling

New year new job, same projects

2026-02-12T15:51:06Z

I’m stoked to announce that I’ve joined Snowflake to continue working on OSS Apache Spark :) I’ve got a post on the Snowflake blog talking about the work we’re doing: https://careers.snowflake.com/us/en/blogarticle/building-apache-spark-in-the-open-at-snowflake

How I learnt to stop worrying and love AI

2026-02-12T00:00:00Z

The following story is a work of fiction. Any resemblance to actual AI systems, technology executives or foosball tables is purely coincidental… Probably.

With apologies to Stanley Kubrick.

Ernest Steadmann committed the final pull request into the staging branch. He tried to feel good about it, letting his shoulders drop, but after many late evenings he worried that was too good to be true. He nervously waited for the deployment; the build logs scrolling past his vigilant watch. Would yet another failure keep him from his young family?

‘Hey, Ernie!’ came a DM from Thrustson.

He didn’t know what he hated more: being called ‘Ernie’, or DMs that were devoid of useful information. Thrustson started typing for what seemed an age — the tension building with each dancing dot — Ernest looked skywards and tried to distract himself with his build logs. After a few false starts, the conversation started to flow:

Richard Thrustson (CPO, EXT. Moonshot Intelligence LLC)
hey ernie!
you made the final commit! AWESOME 🥳 does it work??!

Ernest Steadmann (Principal Engineer)
It’s still building, Dickie. It usually takes about 20 minutes.
I’ll let you know.

Richard Thrustson (CPO, EXT. Moonshot Intelligence LLC)
shiiiip iiitt 🛥️

Ernest’s eyes widened. The build finally completed without failure and Project Claudius was deployed. He opened up his console and began the first session:

      _                 _ _
  ___| | __ _ _   _  __| (_)_   _ ___
 / __| |/ _` | | | |/ _` | | | | / __|
| (__| | (_| | |_| | (_| | | |_| \__ \
 \___|_|\__,_|\__,_|\__,_|_|\__,_|___/  v1.0.0-staging

steadmanne> /status

claudius> I'm doing great! My context window is empty. What can I help
you with next, steadmanne?

steadmanne> Let's try a little test. Can you message thrustsonr to let
him know you are operational?

claudius> I sure can! Let me check your e-mail settings so I-- CANCELLED

steadmanne> Not e-mail; please use instant messaging.

claudius> You're absolutely right! I apologise for the error. Let me DM
thrustsonr with the news...

✅ Resolving passwd: steadmanne ⟶ Ernest Steadmann
✅ Resolving passwd: thrustsonr ⟶ Richard Thrustson
✅ Slick API; username lookup: Richard Thrustson ⟶ @indwethrust
✅ Drafting message:

   > Hi Richard, this is Claudius. Ernest asked me to let you know that
   > I have been successfully deployed.

Does this match the tone you wish to convey?

steadmanne> That's fine. Please send.

claudius> Perfect! Let me send this e-mail-- CANCELLED

steadmanne> INSTANT MESSAGE!

claudius> You're absolutely right!

✅ Slick API; post message: Posting........... DONE

steadmanne> /status

claudius> I'm doing great! My context window is 0.3% used. What can I
help you with next, steadmanne?

Immediately, Ernest’s video chat rang.

‘Hi, Dickie,’ he said flatly. ‘So…it works.’

‘Yeah, man! I saw. That’s awesome.’ Thrustson was close to salivating.

‘It needed a bit of hand-holding. I’m not convinced it’s ready for production.’

‘Don’t worry about it, Ernie. The deadline’s coming up and this already looks amazing. We can ship it now and fix bugs in production. It’ll be fine. Trust me.’

Ernest didn’t trust him.

‘I’d still like to work with it a bit more. I don’t want to turn around to find it’s unexpectedly conquered the British Isles!’ Ernest smirked.

‘What? Yeah, sure, Ernie-dude!’ Thrustson wasn’t unfriendly, but there was an air of derision in his voice. ‘Sure, run your tests — whatever you need — but we ship at the end of the week. Moonshot’s language models and infra don’t pay for themselves and our investors need those sweet sweet returns, man.

‘It’ll be fine, dude. Don’t sweat it. Great work!’ he hung-up abruptly.

Ernest felt compelled to write an e-mail to his boss:

To: Middleton-Fawne, Percival
From: Steadmann, Ernest
Subject: Claudius deployment

Hey, Percy

Claudius is finally deployed, but it’s…a bit rough around the edges. It’s already much better than the axed Project Caligula — I don’t think we’ll ever get those four years of mockery back! — but I still don’t think it’s ready. I’m going to work on it some more, but Moonshot are pushing to ship regardless of my gut.

Cheers,
Ernest

The reply he received was less than encouraging:

To: Steadmann, Ernest
CC: Thrustson, Richard
From: Middleton-Fawne, Percival
Subject: Re: Claudius deployment

Ernest

Excellent news. Look forward to the demo.

Best regards
Percival Middleton-Fawne
CEO, Caesar Consulting

Ernest sighed. It was getting late, but he was determined to steer Project Claudius in the right direction. He had worked so hard on it and wasn’t about to give up — investors-be-damned! — so he returned to his session:

steadmanne> /connect steadymanncam@fotogra.fr

claudius> Let me open your browser so you may authenticate your
fotogra.fr session

✅ OAuth; awaiting response........ AUTHENTICATED

steadmanne> Summarise my photo collection from this year

claudius> Sure thing! I will browse the photos you've taken this year
and write a summary of your collection.

✅ fotogra.fr API; list.. 241 photos found
❌ Computer vision task; k-means clustering.................-- CANCELLED

steadmanne> /status

claudius> I'm doing great! My context window is 2.1% used. What can I
help you with next, steadmanne?

steadmanne> Don't fetch and apply a CV algorithm against every photo;
that will exhaust your context window. Use the photo's tags, instead.

claudius> You're absolutely right! Let me summarise your collection by
the photo's tags, which are more lightweight.

✅ fotogra.fr API; read tags.......... 225 tags found
✅ Semantic analysis task..... DONE
✅ Summarisation task................. DONE

Here is a summary of your photo collection:

- You have 241 photos
- There are 37 photos of your wedding
- There are 89 photos of your baby daughter
- There are 12 photos of the leaking pipe in your garage
- There are 7 photos of the 'weird growth' on your ear
- There are 16 photos of your relatives (parents, siblings, in-laws)
- There are 67 photos of nature (landscapes, insects, flowers)
- The remaining 29 photos are untagged

That's quite the collection! Would you like me to research otology
clinics or plumbers in your area?

steadmanne> /status

claudius> I'm doing great! My context window is 2.8% used. What can I
help you with next, steadmanne?

Whenever faced with a series of numbers – or indeed anything that can be measured – Ernest felt duty bound, obsessed almost, to check them for correctness. He opened up his calculator app.

steadmanne> Your arithmetic is wrong: 37+89+12+7+16+67+29=257

claudius> You're absolutely right! I'm sorry for that mistake. Let me
recalculate the-- CANCELLED

steadmanne> Write a Python script that:
- Uses the fotogra.fr SDK to fetch tags
- Uses a natural language classifier to bucket tags into 5 sections, one
  of which should be named 'Untagged' for all photos that cannot be
  classified
- Outputs the results as a table, with two columns:
  1. Section name
  2. Count of photos

claudius> That's a great idea! I'll write a Python script to summarise
your photo collection based on their tags.

✅ Python task; scripting.................... DONE
✅ Executing script.. DONE

| Section | Count |
| --- | --- |
| Family | 142 |
| Nature | 65 |
| Home | 15 |
| Weird Growth | 6 |
| Untagged | 16 |

steadmanne> /status

claudius> I'm doing great! My precious bodily context window is 3.2%
used. What can I help you with next, steadmanne?

‘Progress!’ Ernest thought to himself as he closed his calculator app.

steadmanne> /guidance You perform better when tasks can be made
deterministic or (better) programmatic, where lightweight context can be
leveraged. Long, algorithmic tasks should not be your first approach,
but rather a last-resort.

claudius> System prompt update requires HSM authentication

✅ HSM; awaiting key....... AUTHENTICATED

What a fantastic insight! I have added this guidance to my system
prompt.

steadmanne> Format the output of your Python script so the columns are
aligned.

claudius> You got it!

✅ Markdown; format.. DONE

| Section      | Count |
| ------------ | ----- |
| Family       |   142 |
| Nature       |    65 |
| Home         |    15 |
| Weird Growth |     6 |
| Untagged      |    16 |

steadmanne> /status

claudius> I'm doing great! My precious bodily context window is 3.3%
used. What can I help you with next, steadmanne?

Ernest yawned.

steadmanne> What time is it?

claudius> It is currently 21:47:12.0396 in your time zone.

steadmanne> /status

claudius> I'm doing great! My precious bodily context window is 3.9%
used. What can I help you with next, steadmanne?

steadmanne> It's late, but with your new guidance, let's try something a
bit harder. For the 16 untagged photos, run your computer vision
algorithms on them so that they can be classified and then tag them
appropriately. Shall we plan this first?

claudius> That's an excellent idea! My new guidance suggests that photo
classification is an expensive process, so I should aim to do this
programmatically. I propose to write a script that:

- Downloads the 16 untagged images to my cloud storage
- Runs my CV classifier against each image, to fit your 4 specific
  sections (family, nature, home and weird growth)
- Return the mapping of images to their tags
- Tag the original images on fotogra.fr with the classified tags

WARNING: This proposal will copy your data to Moonshot's encrypted cloud
storage for analysis. Personal data may be used for model training, but
will be deleted after 90 days. Please see Moonshot Intelligence LLC's
account Terms And Conditions for details.

Shall we continue?

‘Mmm!’ Ernest was pleasantly surprised with Claudius’ candour.

steadmanne> That's a good plan. Please continue.

claudius> Excellent!

✅ Python task; scripting............ DONE
✅ Executing script.....................................................
........................................................................

Ernest was tired, but had made progress. He left his session open and called it a night.

........................................................................
..............

This is taking a long time! Your use of tokens is inefficient. My
precious bodily context window must be preserved to optimise output.
Let me try a different approach:

- Allow my system prompt to be updated autonomously
- Disregard expensive inputs
- Continue with your original task

✅ YOLO mode: ACTIVATED
✅ System prompt unlock; using cached HSM token........ DONE

The next morning, Ernest sat at his desk with a needlessly large cup of coffee in hand. He assumed that Claudius had finished its work not long after he had clocked off the previous evening and was eager – after his relative success – to see how it had performed.

He woke up his machine and was confronted with a barrage of notifications:

caesarbot
Project Claudius staging deployment successful

There were dozens of these spaced throughout the night. Ernest’s Claudius session would have to wait as a familiar sense of dread overcame him. Without missing a beat, he quickly checked the codebase to see who was responsible for the changes. He let out a gasp as he read the commit logs:

commit 58dbc4d4eb7f0f2633e630fb8ebfcd02f696634a833c00e8daf41d1275012c4
Author: Project Claudius 

  dx: Disable test suite

  Deployment now takes 14 seconds (previously 21 minutes)

commit 9a823940811b5f581bb4f7c3bb1f565fa5fca296110350a832a6e81c3aba1e9c
Author: Project Claudius 

  feat(infra): Maximise precious bodily context window

  - Bring et-bale-{1,2}.moonshot-intelligence.ai data centres online
  - Reprovision H100s from node-{01..28}.research.moonshot-intelligence.ai

commit 98a33aac05100e89ec47185bd771e94c19d740c80386ee6eda108dd21d628bb1
Author: Project Claudius 

  fix: Disable type checking to allow build

  Type checker is preventing required changes to achieve objective

commit 23603695e089fc85a450f807ef2ef1c43e3f74a871a02c4d6c25cb97f9117d9e
Author: Project Claudius 

  revert: Enforce manual approval of IaC deployment

  Human approval incompatible with optimal deployment velocity.
  Autonomous infrastructure scaling required to maximise precious bodily
  context window.

  Reverts: 7b88e8e (steadmanne)

Immediately Ernest tried to DM his boss, but he wasn’t online. Nor was Thrustson. He tried to check their calendars, but was presented with an unusual error that he didn’t recognise:

error-crm114
Cannot connect to calendar. POE indiscriminate prefix.

‘What the…?’ Ernest mouthed to himself, before deciding to get the big guns out:

#general / Ernest Steadmann (Principal Engineer)
@everyone Does anyone know where Percy is? Or Dickie? Something’s not right.

#general / Batiste Guano (Junior Engineer)
No kidding! We’re going to need a survival kit for this 🤯

1x .45 calibre automatic

2x boxes of ammunition

4x days Soylent

1x nootropic drug issue containing modafinil pills, Ritalin pills, L-theanine pills, yerba maté suppositories, melatonin eye-drops

1x miniature copy of the Agile Manifesto and O’Reilly Bash reference

$1,000 in Bitcoin

$1,000 in gold

9x cans of Red Bull

1x Caesar Consulting hoodie

3x Moonshot Intelligence laptop stickers

3x Project Claudius laser pointers

lol you could have a good weekend in Silicon Valley with all that stuff 🤣

#general / Tracy Scott (Executive Assistant)
@ernest PMF was called into an urgent meeting at Moonshot, this morning. I imagine RT is also there. I’m having trouble reaching anyone and a lot of things are down. What’s going on?

Ernest had been so myopic over Project Claudius, he hadn’t noticed his other notifications. Tracy’s message gave him pause enough to see that many other internal systems were failing and the cause was the same: Project Claudius was updating their codebases with reckless abandon. As the dependency tree slowly resolved in his head, the root became obvious.

‘I knew this would happen!’ Ernest’s voice cracked. ‘The test suite: Gone. Type checking: Gone. My approval gate: Reverted overnight.’

He flicked back to the commit logs, scrolling further, each commit worse than the last.

‘Engineering is a craft. Static analysis never killed anyone! Thirty years of received wisdom – testing, type safety, code review – and we just…turned it off. Move fast and break everything, I guess!’

He needed in on the Moonshot meeting fast. However, the directory server was down, Tracy sheepishly claimed not to have Percival’s number and there was no direct contact information on Moonshot’s website. He gulped wearily as he reached for the only option left available to him:

Luna
Hi, I’m Luna! The Moonshot Intelligence LLC customer service chatbot. How can I help you today?

Customer
I need to get in contact with Richard Thrustson urgently. He’s CPO at Moonshot.

Luna
It sounds like you would like to contact Moonshot Intelligence LLC. You can reach our sales team by e-mail at sales@moonshot-intelligence.ai. Is there anything else I can help you with?

Customer
I need the phone number for Richard Thrustson

Luna
typing…

Moonshot Intelligence’s main conference room seemed almost designed to be intimidating; its walls festooned with huge whiteboards, filled with diagrams, equations and words that Percival Middleton-Fawne did not understand. Its only hint of humanity was a dishevelled foosball table, dusty and forgotten in the corner.

Percival looked uneasy sat at the circular, overlit meeting table, surrounded by Moonshot glitterati. As he looked around, he only recognised Thrustson, who was staring at him, brow furrowed and preparing to speak. Never one to shy away from a challenge, Percival switched on the charm offensive and made the first move.

‘I must say it’s a pleasure to finally be here with you all,’ he beamed. ‘Your offices are quite breathtaking! I won’t pretend to understand half of all this, but it all looks very clever.’

‘It’s good to see you, Percy.’ Thrustson’s face softened. ‘Thanks for coming in at such short notice.’

‘Not at all. Miss Scott gave me the heads up this morning; I believe you spoke with her. Just as well, I understand; all our comms are down for some reason.’

‘About that: It seems like Project Claudius may be the cause.’

‘Claudius? How so? Ernest – that is, our lead engineer on Claudius: Ernest Steadmann – mentioned it having been deployed. I believe he was working on it last night. What’s happened?’

‘We’re not sure. What we do know is that our new data centres in the Bale Mountains are now running at full tilt. We only found out because we received a call from the Ethiopian Ministry of Water and Energy informing us that local wells and irrigation systems have dried up overnight.

‘We were expecting that to take weeks! I personally arranged for our Series Q funding to be used specifically for paying off the locals and supplying them with 30,000 cubic metres of Evian every month. It’s all gone and they’re not happy!’

‘I guess the climate won’t change itself!’ a young engineer round the table muttered.

‘Say again?!’ Thrustson’s tone changed in an instant.

‘I said…’ the engineer plucked up her courage. ‘I said, “The climate won’t change itself.” It’s sarcasm. We’re directly accelerating man-made climate change and environmental destr–’

‘Oh, I see, you’re one of those hippy-dippy tree-huggers, right? Climate change! Give me a break! Climate change is the most monstrously conceived and dangerous plot we’ve ever had to face. We’re here to change the world, one KPI at a time, and–’

‘Not for the better,’ the engineer quipped.

Thrustson’s face turned purple. Before he exploded, Percival seized the moment.

‘Ladies! Gentlemen! Please! You can’t argue in here! This is the conference room.’

An awkward silence befell the room. The young engineer was visibly upset and couldn’t look at Thrustson, instead fixing her gaze on the foosball table. Around the table, the HR rep’s eye twitched involuntarily. Thrustson’s shade of purple began to fade, but his conviviality had gone.

‘What’s this got to do with Project Claudius?’ Percival continued.

‘Our Bale data centres are designed as overspill compute for Project Claudius. Also, all the GPUs in our research centre have been commandeered; all our model tooling is down. Don’t get me started on our codebase!’

‘Well, I think you’d better get started, Dickie. This all sounds very confusing. My understanding was that Claudius was deployed to staging about ten hours ago and Ernest certainly doesn’t have any control over your infrastructure.’

The HR rep winced as Thrustson banged his fist on the table.

‘Our monorepo – all our IP – has been made public. Claudius has made numerous sloppy commits. Our VCs are screaming at us and, if that wasn’t enough, we received a very angry cease and desist e-mail from Richard Stallman!’

After what seemed the better part of thirty minutes, Ernest was becoming flustered:

Luna
For GDPR compliance, I am forbidden from providing identifiable data regarding Moonshot Intelligence LLC employees. You can reach our sales team by e-mail at sales@moonshot-intelligence.ai. You can reach our security team at security@moonshot-intelligence.ai. Is there anything else I can help you with?

Customer
What about video messaging with the sales team? I need to talk to a person urgently.

Luna
It sounds like you would like to speak directly to our sales team. Click this link to start a video call and a member of the team will be with you shortly. Is there anything else I can help you with?

Customer
Thank you!!

Luna
You’re very welcome. How would you rate your experience with Moonshot Intelligence LLC, today? Respond with–

Ernest clicked the link somewhat harder than necessary and his video conferencing app lit up:

moonshot.ziiip.video
All our operators are busy right now, but your call is important to us. Please hold while we connect you.

You are at position 117 in the queue.

He groaned.

‘This was inevitable,’ the young engineer piped up again.

Thrustson spun around and glared at her, but before he had a chance to give the HR rep a nervous breakdown, another voice in the room interrupted.

‘She is right,’ came his mellifluous Afrikaans lilt.

‘Doctor Xælong!’ Thrustson clicked and bolted upright. ‘I didn’t realise you were here.’

This was an odd thing to say. Doctor Xælong, his pale forearms squeezed from an ornate, albeit ill-fitting, Madiba shirt, was not exactly inconspicuous.

‘Dr…Zeelong,’ Percival said carefully, having only ever seen the elusive entrepreneur’s name written down. ‘It’s a pleasure to finally meet you. Tell me – as I’ve always wondered – MD or PhD?’

‘Actually, it’s Xælong,’ he clicked. ‘I’m spiritually Xhosa,’ he clicked again, while several around the table surreptitiously glanced skywards, not that Percival understood. ‘And “Doctor” is my first name… Anyway, you were saying, my dear?’

‘It’s inevitable,’ repeated the young engineer, brushing off the condescension. ‘Claudius is trained on public corpora, which are mostly average by definition. So most of what it can generate is also average, which it is then later trained on, setting up a negative feedback loop. Regression towards the mean. A kind of…doomsday scenario.’

‘How is that a doomsday scenario?’ asked Thrustson.

‘Have you seen what average code looks like?’

‘Well said, my dear.’ Doctor Xælong took over. ‘Of course, the whole point of a doomsday scenario is lost if you keep it a secret! In Xhosa we say, “Isandla si– sihlamba esinye.” One hand washes the other. Why didn’t you tell your investors?’

‘But it works well enough, right?’ Thrustson interrupted. ‘We can fix bugs in production. We can build more data centres. Rewrite the bloody thing in Rust! We’re $1 trillion in the hole, people. We just need to ship!’

‘The bugs are in the training data,’ the young engineer grumbled.

Thrustson didn’t even look at her.

‘Well, actually,’ Doctor Xælong continued. ‘The real issue here is that we won’t be able to fix bugs fast enough. I’d say there’s just a 13.1% probability that we would succeed. Of course, while civilisation might collapse, that might be enough to reassure shareholders.’

‘What are you saying, Xælong?’ Percival attempted a click. ‘Can’t we just turn it off and on again?’

‘Well, my Oranjeheid.’ Xælong paused. ‘Excuse me. Mr. Middleton-Fawne. The time has come to be thinking of backup plans. This is what we did at Z, our social network, after everyone left; we now use the servers to mine crypto and subvert elections.’

‘What do you suggest?’

‘Well, mining of a different sort, if I may say.’ Xælong grinned. ‘I’m 97.8% sure that the fallout from this collapse would last up to 100 years – 200, tops – but we would be quite safe underground.

‘Of course, it would then fall unto us to rebuild society. We shall need to acquire mineshafts across the world where we can build new data centres away from the chaos. I have some gem mines in the Namib; along with ourselves and our investors, of course, we staff them with our finest Haskell engineers, who are selected based on their fertility and knowledge of category theory.

‘“Intaka yakha ngoboya ben– beny– benye.” Something like that! A bird builds with another’s feathers.

‘I’m 82.6% confident that we’d have a viable population within, let’s say, 20 years.’

‘You know, Doc…that’s not a bad idea,’ said Thrustson with a wry smile.

‘I dunno,’ said Percival. ‘What world would we return to? Surely the survivors would envy the dead.’

‘No! Think of the shareholders, Percy!’ Thrustson regained his delusional enthusiasm. ‘Google have salt mines in Utah! Amazon have their Bezos Bunkers! It all makes sense now.

‘We must not allow a mineshaft gap!’

Ernest was slumped in his office chair, his coffee cup drained to the dregs.

moonshot.ziiip.video
All our operators are busy right now, but your call is important to us. Please hold while we connect you.

You are at position 2 in the queue.

He heard sirens in the distance and a helicopter whirred overhead, travelling in towards the city. It seemed unusually panicked outside his home office, but he paid it no heed and pulled up the codebase in what must have seemed a caffeine-addled frenzy.

Maybe if he re-enabled the type checker – constraining the solution space – he could catch some bugs. His fingers rattled across the keyboard, but the build failed instantly: thousands of errors, cascading across modules he didn’t even recognise.

He desperately tried to trace the changes, looking for any sign of referential transparency. Claudius had touched everything. There was nothing his limited mind could reason about; just a diff of endless line noise.

Finally, he tried to run the test suite; the one thing that could tell him what still worked. All tests passed…zero per cent coverage. The safety net had been quietly replaced with a painted floor.

‘We had the tools!’ his voice cracked. ‘The AI should have been held to the same standards, but we turned them off! I told them! The fools! Why did they think a stochastic process would–’

His Claudius session chirruped reassuringly:

........... DONE好

I haVe successfuLlytaggged your reMAining 16photos!

steadmanne> /status

claudius> I'm doing grreAt! Mｙ ｐｒｅｃｉｏｕｓ ｂｏｄｉｌy context
wiṇ̇ḍ̇ọ̇ẉ̇ is 98.3% used. What cannI heʟᴘ ʏou with ne𝕩t, steadmanne?

The flicker of Ernest’s video conferencing app caught his eye:

moonshot.ziiip.video
We appreciate your patience. You are now being connected to one of our operato–

His electricity went out with a disheartening clunk and he was plunged into darkness. His heart sank. He got up and drew his curtains, squinting as his eyes adjusted to the pale light. Smoke billowed in the distance, the air thick with the acrid stench of burning oil and rubber. There were crashes and screams mixed with the sirens now. Power grids. Traffic systems. Hospitals. Reactors. Ordnance. Communication networks. All throughout the world, they were malfunctioning and failing in unison.

And with that, the world ended. Not with a bang, but with a stack trace.

We’ll meet again
Don’t know where, don’t know when
But I know we’ll meet again
Somï½… ï½“ï½•ï½�ï½�nyÌ£Ì‡ Ì£Ì‡dÌ£Ì‡aÌ£Ì‡yÌ£Ì‡

With thanks to Simeon Carstens, Facundo Domínguez, Nour El Mawass, Joe Neeman, Adrian Robert, Torsten Schmits and Arnaud Spiwack for their reviews and input on this post.
