feat: Chain-Bucket Speculative Decoding + Training-Time Sequence Compression (bucketing) #25
Conversation
…e Compression Co-authored-by: sharpninja <16146732+sharpninja@users.noreply.github.com> Agent-Logs-Url: https://github.com/sharpninja/BitNet-b1.58-Sharp/sessions/c3edaf75-6cec-473b-94ca-66b16ee27964
…-implementation-plan
Pull request overview
Adds a new “bucketing” subsystem to BitNetSharp.Core to mine frequent token n-grams into compact chain buckets, then integrates those buckets into the paper model for optional speculative decoding during inference and optional sequence compression during training. Also wires a CLI flag for enabling bucketing and adds accompanying docs + tests.
Changes:
- Introduces `BucketMiner`, `ChainBucket`, and `ChainBucketTable` for mining/indexing up to 256 frequent n-gram chains.
- Integrates bucket mining/loading into `BitNetPaperModel`, with speculative decoding in `GenerateResponse()` and prompt compression in `Train()`.
- Adds `--enable-bucketing` CLI flag plus GitBook docs and unit tests for the new bucketing components.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| tests/BitNetSharp.Tests/BucketMinerTests.cs | Adds unit coverage for mining and table lookup/matching behavior. |
| src/BitNetSharp.Core/Bucketing/BucketMiner.cs | Implements n-gram mining, scoring, and selection of top chains. |
| src/BitNetSharp.Core/Bucketing/ChainBucket.cs | Defines the chain bucket record (ID + token sequence + confidence). |
| src/BitNetSharp.Core/Bucketing/ChainBucketTable.cs | Adds prefix indexes and matching helpers for inference/training use. |
| src/BitNetSharp.Core/BitNetPaperModel.cs | Adds bucket table plumbing, sequence compression in training, and speculative decoding in generation. |
| src/BitNetSharp.Core/BitNetOptions.cs | Adds bucketing-related toggles/options. |
| src/BitNetSharp.Core/BitNetBootstrap.cs | Exposes new toggles through bootstrap model creation helpers. |
| src/BitNetSharp.App/HostedAgentModelFactory.cs | Threads bucketing toggles into hosted model creation. |
| src/BitNetSharp.App/Program.cs | Adds --enable-bucketing and mines/loads a table on startup for the built-in BitNet model. |
| docs/bucketing-implementation-plan-v1.0.md | Adds the implementation plan document. |
| docs/bucketing-guide.md | Adds usage/config guide for the bucketing subsystem. |
| docs/SUMMARY.md | Adds navigation entries for the new bucketing docs. |
| docs/README.md | Mentions bucketing in the documentation overview and links to new pages. |
> The `--enable-bucketing` flag mines a `ChainBucketTable` from the default training corpus at startup and activates both `EnableChainBuckets` and `EnableSequenceCompression`.
This section states that `--enable-bucketing` “activates both `EnableChainBuckets` and `EnableSequenceCompression`”, but the CLI wiring currently only passes `enableChainBuckets` when constructing the model. Update either the CLI implementation or this statement so the guide matches actual behavior.
```csharp
/// <param name="ChainId">Compact byte identifier in the range 0–255.</param>
/// <param name="TokenIds">Ordered token IDs that make up the n-gram chain (length 2–8).</param>
/// <param name="Confidence">Normalised confidence score derived from corpus frequency and conditional probability.</param>
public sealed record ChainBucket(byte ChainId, int[] TokenIds, float Confidence)
{
```
`TokenIds` is an `int[]`, which remains mutable after construction. Since `ChainBucketTable` builds indexes based on these values, external mutation of the array can silently corrupt lookups. Consider making `TokenIds` immutable to callers (e.g., `IReadOnlyList<int>`/`ImmutableArray<int>`), or defensively copy the array when constructing the record/table.
```diff
-/// <param name="ChainId">Compact byte identifier in the range 0–255.</param>
-/// <param name="TokenIds">Ordered token IDs that make up the n-gram chain (length 2–8).</param>
-/// <param name="Confidence">Normalised confidence score derived from corpus frequency and conditional probability.</param>
-public sealed record ChainBucket(byte ChainId, int[] TokenIds, float Confidence)
-{
+/// <param name="chainId">Compact byte identifier in the range 0–255.</param>
+/// <param name="tokenIds">Ordered token IDs that make up the n-gram chain (length 2–8).</param>
+/// <param name="confidence">Normalised confidence score derived from corpus frequency and conditional probability.</param>
+public sealed record ChainBucket
+{
+    /// <summary>Compact byte identifier in the range 0–255.</summary>
+    public byte ChainId { get; }
+
+    /// <summary>Ordered token IDs that make up the n-gram chain (length 2–8).</summary>
+    public int[] TokenIds { get; }
+
+    /// <summary>Normalised confidence score derived from corpus frequency and conditional probability.</summary>
+    public float Confidence { get; }
+
+    public ChainBucket(byte chainId, int[] tokenIds, float confidence)
+    {
+        if (tokenIds is null)
+        {
+            throw new ArgumentNullException(nameof(tokenIds));
+        }
+
+        ChainId = chainId;
+        TokenIds = (int[])tokenIds.Clone();
+        Confidence = confidence;
+    }
```
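The risk called out in this thread is easy to reproduce in any language. A minimal Python sketch (illustrative only, not the C# API; the dict stands in for the prefix index that `ChainBucketTable` builds) shows how mutating an indexed array corrupts lookups, and how a defensive copy avoids it:

```python
# Build a tiny prefix index keyed by the first token of each chain,
# mimicking an index built from a mutable token array.
tokens = [5, 7, 9]
index = {tokens[0]: tokens}   # index entry points at the caller's array

tokens[0] = 99                # external mutation after the index was built

# The index still files the chain under key 5, but the stored chain now
# begins with 99, so a prefix lookup for 5 returns a chain that no
# longer starts with 5.
assert index[5][0] == 99

# A defensive copy at construction time keeps the index consistent,
# which is what the suggested fix (cloning tokenIds) accomplishes.
tokens2 = [5, 7, 9]
index2 = {tokens2[0]: list(tokens2)}  # copy instead of aliasing
tokens2[0] = 99
assert index2[5][0] == 5
```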
```csharp
if (key.Length > 1)
{
    var prefix = new NGramKey(key.Tokens, 0, key.Length - 1);
    conditionalProb = prefixCounts.TryGetValue(prefix, out var prefixFreq) && prefixFreq > 0
        ? freq / (double)prefixFreq
        : 1d;
}
```
The conditional-probability prefix is built with `new NGramKey(key.Tokens, 0, key.Length - 1)`, which ignores `key.Start`. This will often miss the corresponding entry in `prefixCounts`, causing `conditionalProb` to fall back to 1 and skew scoring/selection. Use the same start offset as the n-gram when constructing the prefix key so it matches what was counted in `prefixCounts`.
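Why the offsets must agree can be seen in a small language-agnostic sketch (Python here for brevity; the corpus and counts are illustrative, not the C# `NGramKey` machinery). The conditional probability P(last token | first n−1 tokens) only comes out right when the prefix key is the leading slice of the same n-gram occurrence that was counted:

```python
from collections import Counter

corpus = [1, 2, 3, 1, 2, 4, 1, 2, 3]

def ngrams(tokens, n):
    # Every n-gram is keyed by its own slice of the corpus, so prefix
    # counts and full-gram counts line up position by position.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

trigram_counts = Counter(ngrams(corpus, 3))
bigram_counts = Counter(ngrams(corpus, 2))

# Conditional probability of gram (1, 2, 3): the prefix key must be the
# first two tokens of that gram itself, not tokens taken from offset 0.
gram = (1, 2, 3)
prefix = gram[:-1]                      # (1, 2), from the gram's own slice
p = trigram_counts[gram] / bigram_counts[prefix]
assert trigram_counts[gram] == 2 and bigram_counts[prefix] == 3
assert abs(p - 2 / 3) < 1e-9

# Building the prefix from offset 0 regardless of where the gram occurred
# (the bug described above) only matches by coincidence when the gram
# happens to start the corpus; every other occurrence misses the count.
```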
```csharp
var prefixLen = Math.Min(3, contextTokenIds.Count);
for (var ci = prefixLen; ci < chain.TokenIds.Length && step < maxGeneratedTokens - 1; ci++)
{
```
`prefixLen` is derived from `contextTokenIds.Count`, not from how many tokens actually matched the chain prefix. If the lookup matched on 1 or 2 tokens, starting `ci` at 3 will skip parts of the chain (or skip speculative decoding entirely for 2-token chains), producing incorrect speculative continuation behavior. Track/return the matched prefix length from `TryLookupPrefix` (or do explicit 3/2/1 checks here) and start `ci` from that matched length.
```diff
-var prefixLen = Math.Min(3, contextTokenIds.Count);
-for (var ci = prefixLen; ci < chain.TokenIds.Length && step < maxGeneratedTokens - 1; ci++)
-{
+// Determine how many tokens at the end of the current context
+// actually match the beginning of this chain (up to 3 tokens).
+var maxPrefix = Math.Min(3, Math.Min(contextTokenIds.Count, chain.TokenIds.Length));
+var matchedPrefixLen = 0;
+for (var k = maxPrefix; k >= 1; k--)
+{
+    var match = true;
+    var contextStart = contextTokenIds.Count - k;
+    for (var i = 0; i < k; i++)
+    {
+        if (contextTokenIds[contextStart + i] != chain.TokenIds[i])
+        {
+            match = false;
+            break;
+        }
+    }
+
+    if (match)
+    {
+        matchedPrefixLen = k;
+        break;
+    }
+}
+
+// If nothing actually matches, skip speculative decoding for this step.
+if (matchedPrefixLen == 0)
+{
+    continue;
+}
+
+for (var ci = matchedPrefixLen; ci < chain.TokenIds.Length && step < maxGeneratedTokens - 1; ci++)
+{
```
```csharp
// Verification: confirm the model also predicts this token from current context.
var verifyToken = SelectNextToken(Transformer.Forward(contextTokenIds));
if (verifyToken.TokenId != speculativeId)
```
This speculative path runs an additional full `Transformer.Forward(contextTokenIds)` for every speculative token (verification), on top of the baseline forward pass that already produced `nextToken`. That increases the number of forward passes per generated token and is unlikely to provide the speedup described in the comments/docs. Consider restructuring verification to reuse logits from the existing forward pass for the first speculative token and/or batching verification so you don't do a full forward per speculative step.
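The acceptance logic speculative decoding relies on can be sketched independently of the model (Python here for brevity; `model_argmax` is a deterministic toy stand-in, not the BitNet transformer). The key point is that the draft chain is verified against one set of predictions, accepting the longest agreeing prefix, rather than paying a fresh forward pass per speculative token:

```python
def model_argmax(context):
    # Deterministic toy "model": next token is (last token + 1) mod 5.
    # A real implementation would read this from transformer logits.
    return (context[-1] + 1) % 5

def verify_draft(context, draft):
    """Accept the longest prefix of `draft` the model agrees with.

    In a real engine, a single batched forward over context + draft yields
    one logit row per draft position; here each toy call emulates reading
    one of those rows. Acceptance stops at the first disagreement.
    """
    accepted = []
    ctx = list(context)
    for tok in draft:
        if model_argmax(ctx) != tok:
            break              # first mismatch: reject the rest of the chain
        accepted.append(tok)
        ctx.append(tok)
    return accepted

# Chain [1, 2, 3] is fully consistent with the toy model after context [0].
assert verify_draft([0], [1, 2, 3]) == [1, 2, 3]
# A wrong token mid-chain truncates acceptance at the disagreement point.
assert verify_draft([0], [1, 9, 3]) == [1]
```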
```csharp
public bool TryMatchAt(IReadOnlyList<int> sequence, int startIndex, out ChainBucket? chain)
{
    ArgumentNullException.ThrowIfNull(sequence);
```
`TryMatchAt` uses `sequence[startIndex]` / `startIndex + 1/2` without validating `startIndex`. Negative values or `startIndex > sequence.Count` will throw. Since this is a public API, add an explicit range check (and throw `ArgumentOutOfRangeException`) or otherwise guard before indexing.
```csharp
if ((uint)startIndex >= (uint)sequence.Count)
{
    throw new ArgumentOutOfRangeException(nameof(startIndex));
}
```
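A bounds-checked prefix match of this shape can be sketched at the algorithm level (Python for brevity; the function and the `chains` dict are illustrative, not the C# `ChainBucketTable` API). The guard rejects out-of-range indices before any indexing, and longer prefixes are tried before shorter ones:

```python
def try_match_at(sequence, start_index, chains):
    """Return the chain whose leading tokens match `sequence` at `start_index`.

    `chains` maps leading-token tuples to full chains. The explicit range
    check mirrors the suggested guard: negative and past-the-end indices
    are rejected before any element access.
    """
    if start_index < 0 or start_index >= len(sequence):
        raise IndexError("start_index out of range")
    # Prefer the longest candidate prefix (here: 3 tokens, then 2).
    for k in (3, 2):
        key = tuple(sequence[start_index:start_index + k])
        if len(key) == k and key in chains:
            return chains[key]
    return None

chains = {(1, 2): [1, 2, 9], (1, 2, 3): [1, 2, 3, 4]}
assert try_match_at([0, 1, 2, 3], 1, chains) == [1, 2, 3, 4]  # 3-token hit
assert try_match_at([0, 1, 2], 1, chains) == [1, 2, 9]        # falls back to 2
```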
```diff
-string PrimaryLanguage = "en-US");
+string PrimaryLanguage = "en-US",
+bool EnableChainBuckets = false,
+int MaxChainLength = 8,
```
`MaxChainLength` is added to `BitNetOptions` but is not referenced anywhere in the codebase, so changing it has no effect (and the mining/speculation paths currently hardcode 2–8 via `BucketMiner.MaxNGramLength`). Either wire this option through to bucket mining/speculative expansion (with validation/clamping) or remove it to avoid a misleading public configuration knob.
src/BitNetSharp.App/Program.cs (Outdated)

```csharp
using var model = HostedAgentModelFactory.Create(modelSpecifier, verbosity, enableChainBuckets: enableBucketing);

// When --enable-bucketing is requested for the built-in BitNet model, mine chain buckets
// from the default training corpus and attach them so speculative decoding and sequence
// compression are active for the current session.
if (enableBucketing && model is BitNetHostedAgentModel bitNetBucketingModel)
```
`--enable-bucketing` is documented/commented as enabling both speculative decoding and training-time sequence compression, but the model is created with only `enableChainBuckets: enableBucketing` (sequence compression remains disabled). Either pass `enableSequenceCompression: enableBucketing` here (and in the factory call), or adjust the comment/docs so behavior matches reality.
@copilot apply changes based on the comments in this thread |
Co-authored-by: sharpninja <16146732+sharpninja@users.noreply.github.com> Agent-Logs-Url: https://github.com/sharpninja/BitNet-b1.58-Sharp/sessions/c91d9b9f-56b6-4640-9cd3-ef7cd966f40f
Applied all review comments in commit 7746f56:
Merged `origin/main` (3 commits: BitLinear memory optimisation, BitNetPaperAudit fixes, updated tests) into `copilot/add-bucketing-implementation-plan`; clean merge, no conflicts.

Original prompt
BitNet-b1.58-Sharp: Bucketing Implementation Plan v1.0
Chain-Bucket Speculative Decoding + Training-Time Sequence Compression
Core Feature for Inference Speedup and Training Efficiency
Version: 1.0
Date: March 20, 2026
Status: Production-ready blueprint; copy-paste into `docs/bucketing-implementation-plan-v1.0.md`

Important Notes Before Starting

- Copy everything from the first line (`# BitNet-b1.58-Sharp...`) to the last line into `docs/bucketing-implementation-plan-v1.0.md`.
- Replace each escaped code fence (`` \``` ``) with `` ``` `` (remove the backslash).
- All diagrams will render instantly.
Table of Contents
1. Executive Summary & Success Criteria
Goal: Add bucketing as a core optimization that accelerates both inference (via speculative multi-token jumps) and training (via compressed token sequences using super-tokens).
Success Criteria
- `BitNetOptions` (enabled by default for new models)

2. Prerequisites & Integration Points
- `BitNetTransformer`, `InferenceEngine`, and training loop
- `BitNetOptions` class (for toggles)

3. Overall Architecture
```mermaid
flowchart TD
    BitNetTransformer
    InferenceEngine --> ChainBucketTable
    TrainingLoop --> ChainBucketTable
    BucketMiner --> ChainBucketTable
```
4. Phase 1: Offline Bucket Mining Pipeline (5–7 days)
- `BucketMiner` service that scans tokenized corpora.
- `byte ChainID → TokenID[]` chain + `float` confidence.
- `chain-buckets-{vocab-hash}.bin` (versioned, < 50 KB).

5. Phase 2: Inference-Time Chain-Bucket Speculative Decoding (7–10 days)
Core flow:
Integration:
- `InferenceEngine.GenerateNextToken()` with optional bucketing path.
- `ChainBucketTable` loaded from `.bin` file.
- `BitNetOptions.EnableChainBuckets` and `MaxChainLength`.

6. Phase 3: Training-Time Sequence Compression with Super-Tokens (8–12 days)
New capability: During training, replace frequent n-grams with a single “super-token” (ChainID) to shorten sequences.
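The capability just described can be sketched as a greedy left-to-right replacement of known chains with super-token IDs placed past the end of the base vocabulary (Python for brevity; the vocabulary size, ID scheme, and function names are illustrative assumptions, not the shipped implementation):

```python
VOCAB_SIZE = 1000  # assumed base vocabulary size; super-tokens sit above it

def compress(tokens, chains):
    """Greedily replace known chains with a single super-token.

    `chains` maps token tuples to chain IDs (0-255). The longest match
    wins at each position; unmatched tokens pass through unchanged.
    """
    out, i = [], 0
    max_len = max((len(c) for c in chains), default=0)
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            key = tuple(tokens[i:i + n])
            if key in chains:
                out.append(VOCAB_SIZE + chains[key])  # emit super-token
                i += n
                break
        else:
            out.append(tokens[i])                     # ordinary token
            i += 1
    return out

chains = {(7, 8, 9): 0, (3, 4): 1}
# [7, 8, 9] -> super-token 1000, [3, 4] -> super-token 1001; 5 is untouched.
assert compress([7, 8, 9, 5, 3, 4], chains) == [1000, 5, 1001]
```

The shortened sequence reduces the number of positions per training step; decompression for loss computation (or an embedding row per super-token) is a separate design choice the plan would need to pin down.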
Steps:
BitNet specifics: