Release notes
What changed in each tsumugi release.
The authoritative, commit-level history lives in CHANGELOG.md and on the releases page. This page summarises each version.
v0.1.0
The first release. tsumugi packs a web crawl into compact single-file shards and serves an exact, ranked, fleet-wide top-k in milliseconds, from one pure-Go binary.
tsumugi buildturns a Parquet or newline-delimited JSON crawl export into a directory of.tsumugishards, ordering documents by host for locality and writing the lexical index, the feature matrix, and the forward store into each shard.tsumugi trainfits a LambdaMART ranking model over a collection's features, using the static-rank prior as a cold-start label, in pure Go with no LightGBM or Python in the loop.tsumugi servestands a collection up behind HTTP. A broker routes each query to the shards that can answer it, gathers their candidates, and runs one global rerank under a per-request latency budget, so the merged top-k is bit-for-bit what a single index over every shard would return.tsumugi collectionlists, extends, and compacts a shard directory:addbrings a later crawl in without rewriting existing shards, andcompactmerges accumulated shards back into fewer, larger ones, both safe to run against a live directory because shards are immutable once written.tsumugi inspectprints a shard's header, region table, and statistics.- The
.tsumugishard format. One self-describing file holds the inverted index, the stored fields, a quantized feature matrix, the compressed link graph, and quantized vectors, behind a CRC-checked header and footer, written append-then-footer with an atomic rename and read through a memory map with no heap copy for uncompressed regions. - Retrieval and ranking. A doc-side learned-sparse impact index queried with Block-Max pruning, an optional dense plane over quantized vectors, and a ranking cascade that retrieves, cuts over the feature matrix, and reranks with the trained model.
- Proven end to end. On a real crawl export,
buildpacks 20,246 documents from 18,777 hosts into 11 shards (92.4 MB) in 11 seconds,trainfits a 150-tree model,serveanswers queries in under a millisecond each, andcompactmerges the shards down to 3 (84.3 MB) in 9 seconds. A worst-case all-matching query over 50,000 documents in 16 shards returns in 6.9 milliseconds. - Packaged everywhere. Archives,
.deb/.rpm/.apkpackages, a multi-arch GHCR image, Homebrew and Scoop, checksums, SBOMs, and a cosign signature.