

The syntax-highlighting cascade

Why Pennington dispatches code fences through a priority-ordered chain of highlighters with a guaranteed plain-text fallback instead of a single parser.

A content engine that renders Markdown through Markdig could reasonably pick one syntax highlighter and ship it. Pennington instead dispatches every fenced code block through a priority-ordered chain of highlighters that falls through to plain text. This page explains why the pipeline is not bound to a single parser.

Context

Pennington renders code in two very different shapes: shell sessions, which want command-and-flag styling but have no formal grammar, and roughly eighty mainstream languages, which need real tokenization. A single-parser design forces one of those shapes to lose. A shell-only build gives up nearly the entire language surface the first time someone pastes a Python snippet; a TextMate-only build styles a bash command no differently from its flags. So the design layers highlighters instead of picking one: each shape gets its own highlighter, the one that cares most about a given language wins, and a plain-text fallback catches anything no highlighter claims.

How it works

The priority chain

The cascade is specificity-ordered, not quality-ordered. Shell wins over TextMate for bash not because it produces better HTML in the abstract, but because it knows the one thing worth styling in a command fence — the command itself versus its flags. Priority encodes "which highlighter cares most about this language," not "which highlighter is best."

HighlightingService takes every registered ICodeHighlighter at construction, sorts once by descending Priority, and for each code block walks that list checking whether the language is supported (or the highlighter declared "*" as a catch-all). The first hit wins. The shipped chain is a 50/75 ladder: TextMateHighlighter at 50 with "*" so it claims any language it can find a grammar for, and ShellHighlighter at 75 for bash/shell/sh specifically. Those two integers are illustrative of the shipped chain, not a stable contract — a custom highlighter should pick a priority relative to whatever currently handles its languages (above 50 to beat TextMate's catch-all, above 75 to displace shell), not hard-code 76. The stable guarantee is the ordering rule, not the literal numbers.
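The dispatch rule described above can be sketched as follows. This is a minimal illustration, not Pennington's source: the member types on the interface, the NamedHighlighter and CascadeSketch classes, and the case-insensitive language comparison are all assumptions made for the example.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for Pennington's ICodeHighlighter; member types are assumed.
public interface ICodeHighlighter
{
    IReadOnlyCollection<string> SupportedLanguages { get; }
    int Priority { get; }
    string Highlight(string code, string language);
}

// Illustrative highlighter that only tags its output with its own name.
public sealed class NamedHighlighter : ICodeHighlighter
{
    private readonly string _name;

    public NamedHighlighter(string name, int priority, params string[] languages)
    {
        _name = name;
        Priority = priority;
        SupportedLanguages = languages;
    }

    public IReadOnlyCollection<string> SupportedLanguages { get; }
    public int Priority { get; }
    public string Highlight(string code, string language) => $"[{_name}] {code}";
}

// Sketch of the dispatch rule only; not the real HighlightingService.
public sealed class CascadeSketch
{
    private readonly ICodeHighlighter[] _ordered;

    public CascadeSketch(IEnumerable<ICodeHighlighter> highlighters)
    {
        // Sort once, at construction, by descending priority.
        _ordered = highlighters.OrderByDescending(h => h.Priority).ToArray();
    }

    // Returns the first highlighter that names the language or declares "*",
    // or null when nothing claims it (the point where a fallback would step in).
    public ICodeHighlighter? Select(string language)
    {
        foreach (var h in _ordered)
        {
            // Case-insensitive matching is an assumption of this sketch.
            if (h.SupportedLanguages.Contains(language, StringComparer.OrdinalIgnoreCase)
                || h.SupportedLanguages.Contains("*"))
            {
                return h;
            }
        }
        return null;
    }
}
```

With a "*" highlighter at 50 and a bash/shell/sh highlighter at 75, Select("bash") returns the shell entry and Select("python") returns the catch-all, regardless of the order in which the two were passed to the constructor.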

When no chain entry matches and no "*" catch-all is registered, the service reaches for a hardcoded PlainTextHighlighter fallback, which HTML-encodes the code, hands it back, and emits a once-per-language Info diagnostic so authors notice a missing grammar without the build failing. The fallback is what keeps HighlightingService total: every input gets some output. It never appears in the priority chain, so it cannot be displaced by a misconfigured priority.
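A fallback of this kind might look like the sketch below. The class name, the Action<string> diagnostic sink, and the message text are hypothetical; only the behavior (HTML-encode the code, report each language once) comes from the description above.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

// Hypothetical sketch of a plain-text fallback; the real PlainTextHighlighter may differ.
public sealed class PlainTextFallbackSketch
{
    // Languages already reported. Not thread-safe; a real build would need to guard this.
    private readonly HashSet<string> _reported = new(StringComparer.OrdinalIgnoreCase);
    private readonly Action<string> _reportInfo;

    // reportInfo stands in for whatever diagnostic sink the build uses (assumed).
    public PlainTextFallbackSketch(Action<string> reportInfo) => _reportInfo = reportInfo;

    public string Highlight(string code, string language)
    {
        // HashSet.Add returns false for a repeat, so each language is reported once.
        if (_reported.Add(language))
        {
            _reportInfo($"No highlighter for '{language}'; rendering as plain text.");
        }

        // Encode so that <, >, & and quotes in the code cannot be read as markup.
        return WebUtility.HtmlEncode(code);
    }
}
```

Because the method always returns the encoded input, a dispatcher that falls back to it produces output for every code block, which is the totality property described above.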

The HighlightingService dispatcher is stateless past construction, so adding a highlighter via HighlightingOptions.AddHighlighter in DI is enough — no registry mutation, no re-sorting at runtime, no ordering surprise that depends on registration order. Priority is the only tiebreaker that matters.

The ICodeHighlighter contract is three members — SupportedLanguages, Priority, and Highlight(code, language). That is the narrowest shape that still lets the dispatcher choose a highlighter without having to run one first to find out whether it can handle the language. See Pennington.Highlighting.ICodeHighlighter for the interface surface.

Why TextMateSharp

Every language outside the shell family (bash, shell, sh) runs through TextMateHighlighter, which loads TextMate grammars through TextMateSharp and tokenizes line by line. TextMate grammars are the same regex-state-machine format VS Code uses for its default highlighting, which gives Pennington roughly eighty mainstream languages from a single dependency, without compiling a parser, building an AST, or pulling in a language service per language. The highlighter keeps a scope-to-hljs-class mapping table so the emitted HTML uses the familiar hljs-keyword / hljs-string / hljs-type class names, which means the same CSS theme highlights Python, Rust, Go, and JSON uniformly.
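To make the scope-to-class idea concrete, here is a small sketch of one way such a table could work. The entries, the ScopeMapSketch name, and the longest-prefix rule are assumptions for illustration; Pennington's actual table and matching strategy are not reproduced here.

```csharp
#nullable enable
using System;
using System.Linq;

public static class ScopeMapSketch
{
    // Illustrative entries only: TextMate scope prefixes paired with hljs class names.
    private static readonly (string Prefix, string CssClass)[] Table =
    {
        ("keyword", "hljs-keyword"),
        ("string", "hljs-string"),
        ("comment", "hljs-comment"),
        ("constant.numeric", "hljs-number"),
        ("entity.name.type", "hljs-type"),
    };

    // Returns the class for the most specific matching prefix, or null when nothing matches.
    public static string? ToCssClass(string scope)
    {
        return Table
            // Match whole dotted segments so "stringly.typed" does not match "string".
            .Where(e => scope == e.Prefix
                        || scope.StartsWith(e.Prefix + ".", StringComparison.Ordinal))
            // Prefer the longest (most specific) prefix.
            .OrderByDescending(e => e.Prefix.Length)
            .Select(e => e.CssClass)
            .FirstOrDefault();
    }
}
```

Under this sketch, keyword.control.python and keyword.other.rust both resolve to hljs-keyword, which is how a single CSS theme can cover every grammar.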

The alternatives that were considered and rejected make the choice clearer. A single-language semantic parser covers a language or two out of eighty and ships a heavy dependency for zero value on the rest. A Prism or highlight.js port would require either a JavaScript runtime at build time or a reimplementation of dozens of grammars in C#; TextMateSharp inherits VS Code's grammar corpus directly. A hand-rolled regex-per-language table scales linearly with language count and loses the "paste a new fence, it works" property the first time someone wants Kotlin. TextMate's cost is real — it is a regex state machine, so it does not know that Foo on line 40 refers to the class Foo on line 2 — but that ceiling is exactly what the cascade lets a more capable highlighter rise above for the one language that needs it.

The "*" entry in TextMateHighlighter.SupportedLanguages matters because it is how TextMate claims every language it can find a grammar for without enumerating the list at registration time, and it is what lets a new grammar added to the registry start working without any further configuration.

Slotting in a higher-priority highlighter

The cascade is the extension mechanism. A highlighter that wants to claim a language TextMate already handles — say a semantic C# highlighter that can tell a type name apart from a method name, resolve generic arguments, and annotate references to types in other files — registers at a priority above 50 for csharp/cs/c#. When it is present, the only change to the cascade is that C# rises from "TextMate handles it" to "the new highlighter handles it"; every other language keeps its previous highlighter.
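A higher-priority highlighter of the kind described above could be declared roughly like this. The interface is a hypothetical stand-in whose member types are assumed, the class name is invented, and the Highlight body is a placeholder, since real semantic analysis is out of scope for a sketch.

```csharp
using System.Collections.Generic;
using System.Net;

// Hypothetical stand-in for Pennington's ICodeHighlighter; member types are assumed.
public interface ICodeHighlighter
{
    IReadOnlyCollection<string> SupportedLanguages { get; }
    int Priority { get; }
    string Highlight(string code, string language);
}

public sealed class SemanticCSharpHighlighterSketch : ICodeHighlighter
{
    // 50 is TextMate's priority in the shipped chain; illustrative, not a stable contract.
    private const int TextMatePriority = 50;

    // Claim only the C# aliases; every other language keeps its current highlighter.
    public IReadOnlyCollection<string> SupportedLanguages { get; } =
        new[] { "csharp", "cs", "c#" };

    // Chosen relative to the highlighter being displaced rather than as a bare literal.
    public int Priority => TextMatePriority + 10;

    public string Highlight(string code, string language)
    {
        // Placeholder: a real implementation would run semantic analysis and emit
        // classified spans. Here the code is only HTML-encoded.
        return WebUtility.HtmlEncode(code);
    }
}
```

Registration is left out because the exact signature of HighlightingOptions.AddHighlighter is not shown on this page; the how-to guide listed under Further reading covers that step.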

Nothing in the core has to change to allow that. The base package ships the shell and TextMate tokenizers plus a plain-text fallback that together cover every site that does not need language-specific semantic treatment, and a higher-priority highlighter is purely additive — declare the relevant languages, pick a priority that beats whatever is currently handling them, register. The cascade does not know or care where a highlighter came from.

Further reading

  • Reference: Highlighting interfaces, covering ICodeHighlighter, HighlightingService, TextMateLanguageRegistry, and ICodeBlockPreprocessor with full member tables.
  • How-to: Add a custom syntax highlighter — the step-by-step for implementing ICodeHighlighter, picking a priority, and registering via HighlightingOptions.AddHighlighter.
  • External: TextMateSharp — the upstream library that provides the grammar corpus; authoring new grammars follows its documentation, not Pennington's.