Vendor full tokenized WikiText-2 corpus into core benchmark fixtures by Copilot · Pull Request #27 · sharpninja/BitNet-b1.58-Sharp

Copilot · 2026-03-21T16:39:34Z

Inspect repository dataset handling, script conventions, and documentation touchpoints
Run baseline solution build and tests before making changes
Add a local script to download/process full TinyLlama and WikiText-2 data into repository paths
Document how to run the script and what files it updates
Validate the solution build/tests still pass after the change
Run final code review and security validation

Co-authored-by: sharpninja <16146732+sharpninja@users.noreply.github.com> Agent-Logs-Url: https://github.com/sharpninja/BitNet-b1.58-Sharp/sessions/7c44e2cd-8fbf-4adf-ae05-635ca9ee4080

chatgpt-codex-connector · 2026-03-21T17:16:51Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

Copilot

Pull request overview

This PR vendors the full pre-tokenized WikiText-2 corpus into BitNetSharp.Core as embedded resources and updates benchmark fixtures/tests/docs so benchmark/audit paths use repository-local data without external downloads.

Changes:

Embed wiki.train/valid/test.tokens into BitNetSharp.Core and load them via BitNetBenchmarkFixtures.
Expand fixture surface with split-specific properties (train/valid/test) and keep perplexity wired to validation.
Add tests to validate split loading/counts and update benchmarking docs to reflect the new local corpus source.
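
The lazy-loading pattern described in the changes above can be sketched as follows. This is a minimal, language-neutral illustration in Python, not the repository's C# code: the class `WikiText2Fixtures`, the `read_resource` callback, and the `perplexity_samples` property are hypothetical names, and only the three `wiki.*.tokens` file names come from the PR text. Each split is read once, on first access, and perplexity stays wired to the validation split.

```python
from functools import cached_property
from typing import Callable, Dict, List

# File names follow the PR description; the mapping itself is an assumption.
SPLIT_FILES: Dict[str, str] = {
    "train": "wiki.train.tokens",
    "valid": "wiki.valid.tokens",
    "test": "wiki.test.tokens",
}


class WikiText2Fixtures:
    """Hypothetical stand-in for BitNetBenchmarkFixtures (illustration only)."""

    def __init__(self, read_resource: Callable[[str], str]) -> None:
        # read_resource abstracts "return the text of the embedded resource
        # with this name"; nothing is read until a split is first accessed.
        self._read_resource = read_resource

    def _load(self, split: str) -> List[str]:
        return self._read_resource(SPLIT_FILES[split]).split("\n")

    @cached_property
    def train(self) -> List[str]:
        return self._load("train")

    @cached_property
    def valid(self) -> List[str]:
        return self._load("valid")

    @cached_property
    def test(self) -> List[str]:
        return self._load("test")

    @property
    def perplexity_samples(self) -> List[str]:
        # The PR keeps perplexity wired to the validation split.
        return self.valid
```

Passing the reader in as a callback keeps the sketch independent of how the resource is actually stored, which also makes the lazy behavior easy to observe with an in-memory stand-in.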

Reviewed changes

Copilot reviewed 4 out of 7 changed files in this pull request and generated 3 comments.

File	Description
tests/BitNetSharp.Tests/HostedAgentBenchmarksExecutionTests.cs	Uses a smaller validation subset for perplexity test runtime; adds split-loading assertions for the new fixtures.
src/BitNetSharp.Core/BitNetSharp.Core.csproj	Embeds the vendored WikiText-2 token files as resources.
src/BitNetSharp.Core/BitNetBenchmarkFixtures.cs	Replaces placeholder WikiText-2 data with lazy-loaded embedded-resource splits.
docs/benchmarking.md	Documents that benchmark perplexity now uses the vendored local WikiText-2 validation split.

src/BitNetSharp.Core/BitNetBenchmarkFixtures.cs

tests/BitNetSharp.Tests/HostedAgentBenchmarksExecutionTests.cs

docs/benchmarking.md

sharpninja · 2026-03-21T17:20:51Z

@copilot apply changes based on the comments in this thread

Co-authored-by: sharpninja <16146732+sharpninja@users.noreply.github.com> Agent-Logs-Url: https://github.com/sharpninja/BitNet-b1.58-Sharp/sessions/cb4f0c8e-263a-48fc-875c-3ccea7631ebe

Copilot · 2026-03-21T17:26:31Z

> @copilot apply changes based on the comments in this thread

Addressed in 679d00d. The WikiText-2 loader now preserves blank separator rows line-for-line, the fixture tests use shorter stable assertions instead of full long literals, and the benchmarking docs were updated to match the actual in-memory behavior.
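
To illustrate what preserving blank separator rows line-for-line might look like, here is a small Python sketch. It is not the repository's C# loader: both function names are hypothetical, and the line-ending normalization and trailing-newline handling are assumptions added for the example. The second function shows the lossy alternative, where dropping blanks shifts row indices relative to the source file.

```python
from typing import List


def load_rows_preserving_blanks(text: str) -> List[str]:
    """Return one row per source line, keeping empty separator rows."""
    # Assumption: normalize line endings so row counts do not depend on
    # how the file was checked out.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    rows = normalized.split("\n")
    # A final newline would otherwise produce one spurious empty row.
    if rows and rows[-1] == "":
        rows.pop()
    return rows


def load_rows_dropping_blanks(text: str) -> List[str]:
    """Lossy variant: blank separators vanish, so row indices shift."""
    return [row for row in load_rows_preserving_blanks(text) if row.strip()]


sample = " = Title = \n\n first paragraph . \n\n"
preserved = load_rows_preserving_blanks(sample)
dropped = load_rows_dropping_blanks(sample)
```

With this sample, `preserved` has four rows (two of them empty) while `dropped` has two, so any assertion on per-split row counts distinguishes the two behaviors.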

sharpninja · 2026-03-21T19:15:39Z

@copilot diagnose and fix test failures in build pipeline.

Co-authored-by: sharpninja <16146732+sharpninja@users.noreply.github.com> Agent-Logs-Url: https://github.com/sharpninja/BitNet-b1.58-Sharp/sessions/efe42022-8a61-4408-89ff-da2ae67c2259

Copilot · 2026-03-21T19:44:04Z

> @copilot diagnose and fix test failures in build pipeline.

Addressed in 527f9df. I traced the pipeline issue to the build.yml Test step hanging after the vendored WikiText-2 corpus made benchmark/audit perplexity tests process very large samples. The fix keeps full-corpus behavior in production and adds a compact repository-local WikiText-2 subset for the affected tests through a shared helper; both the targeted and full-solution test runs now complete locally.
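
One way a shared test helper could derive a compact subset is sketched below in Python. This is an assumption-laden illustration rather than the helper added in 527f9df: the PR does not say how the subset is chosen, and the name `compact_subset` and the `max_rows` default are hypothetical. The sketch takes a deterministic prefix of non-blank rows so that test results stay stable across runs.

```python
from typing import List, Sequence


def compact_subset(rows: Sequence[str], max_rows: int = 32) -> List[str]:
    """Return up to max_rows non-blank rows, in their original order."""
    if max_rows < 0:
        raise ValueError("max_rows must be non-negative")
    subset: List[str] = []
    for row in rows:
        if len(subset) >= max_rows:
            break  # stop early; the rest of the corpus is never scanned
        if row.strip():
            subset.append(row)
    return subset
```

Because the loop stops as soon as the limit is reached, the cost of building the subset depends on `max_rows` rather than on the size of the full corpus.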

Co-authored-by: sharpninja <16146732+sharpninja@users.noreply.github.com> Agent-Logs-Url: https://github.com/sharpninja/BitNet-b1.58-Sharp/sessions/b27c3cde-6893-4c19-866c-09f79c5acb7c

Initial plan

b7c690f

Copilot AI assigned Copilot and sharpninja Mar 21, 2026

Copilot started work on behalf of sharpninja March 21, 2026 16:39

Copilot AI mentioned this pull request Mar 21, 2026

Add GitBook page for the real-training implementation plan #26

Draft

6 tasks

Copilot AI and others added 2 commits March 21, 2026 17:05

Add repository-local WikiText-2 validation fixture

fcac111

Co-authored-by: sharpninja <16146732+sharpninja@users.noreply.github.com> Agent-Logs-Url: https://github.com/sharpninja/BitNet-b1.58-Sharp/sessions/7c44e2cd-8fbf-4adf-ae05-635ca9ee4080

Add full tokenized WikiText-2 corpus

88c9077

Co-authored-by: sharpninja <16146732+sharpninja@users.noreply.github.com> Agent-Logs-Url: https://github.com/sharpninja/BitNet-b1.58-Sharp/sessions/7c44e2cd-8fbf-4adf-ae05-635ca9ee4080

Copilot AI changed the title from "[WIP] Add WikiText-2 validation set pre-tokenized" to "Vendor full tokenized WikiText-2 corpus into core benchmark fixtures" Mar 21, 2026

Copilot AI requested a review from sharpninja March 21, 2026 17:10

Copilot finished work on behalf of sharpninja March 21, 2026 17:10

sharpninja approved these changes Mar 21, 2026

sharpninja marked this pull request as ready for review March 21, 2026 17:16

Copilot AI review requested due to automatic review settings March 21, 2026 17:16

Copilot started reviewing on behalf of sharpninja March 21, 2026 17:17

Copilot AI reviewed Mar 21, 2026

src/BitNetSharp.Core/BitNetBenchmarkFixtures.cs (outdated, resolved)

tests/BitNetSharp.Tests/HostedAgentBenchmarksExecutionTests.cs (outdated, resolved)

docs/benchmarking.md (outdated, resolved)

Copilot started work on behalf of sharpninja March 21, 2026 17:20

Copilot AI and others added 2 commits March 21, 2026 17:23

Fix WikiText2 fixture review feedback

e5a811b

Co-authored-by: sharpninja <16146732+sharpninja@users.noreply.github.com> Agent-Logs-Url: https://github.com/sharpninja/BitNet-b1.58-Sharp/sessions/cb4f0c8e-263a-48fc-875c-3ccea7631ebe

Polish WikiText2 fixture review fixes

679d00d

Co-authored-by: sharpninja <16146732+sharpninja@users.noreply.github.com> Agent-Logs-Url: https://github.com/sharpninja/BitNet-b1.58-Sharp/sessions/cb4f0c8e-263a-48fc-875c-3ccea7631ebe

Copilot AI requested a review from sharpninja March 21, 2026 17:26

Copilot finished work on behalf of sharpninja March 21, 2026 17:26

sharpninja approved these changes Mar 21, 2026

Copilot started work on behalf of sharpninja March 21, 2026 19:15

Copilot AI and others added 2 commits March 21, 2026 19:34

Fix CI test runtime for vendored WikiText2 fixtures

0c22055

Co-authored-by: sharpninja <16146732+sharpninja@users.noreply.github.com> Agent-Logs-Url: https://github.com/sharpninja/BitNet-b1.58-Sharp/sessions/efe42022-8a61-4408-89ff-da2ae67c2259

Refine CI test runtime fix

527f9df

Co-authored-by: sharpninja <16146732+sharpninja@users.noreply.github.com> Agent-Logs-Url: https://github.com/sharpninja/BitNet-b1.58-Sharp/sessions/efe42022-8a61-4408-89ff-da2ae67c2259

Copilot AI requested a review from sharpninja March 21, 2026 19:44

Copilot finished work on behalf of sharpninja March 21, 2026 19:44

Copilot started work on behalf of sharpninja March 21, 2026 19:46

Copilot AI and others added 2 commits March 21, 2026 20:03

Add local corpus processing script

5ad33cc

Co-authored-by: sharpninja <16146732+sharpninja@users.noreply.github.com> Agent-Logs-Url: https://github.com/sharpninja/BitNet-b1.58-Sharp/sessions/b27c3cde-6893-4c19-866c-09f79c5acb7c

Polish corpus script usage docs

10e11be

Co-authored-by: sharpninja <16146732+sharpninja@users.noreply.github.com> Agent-Logs-Url: https://github.com/sharpninja/BitNet-b1.58-Sharp/sessions/b27c3cde-6893-4c19-866c-09f79c5acb7c

Copilot finished work on behalf of sharpninja March 21, 2026 20:05

sharpninja approved these changes Mar 21, 2026

sharpninja merged commit e944200 into main Mar 21, 2026

sharpninja deleted the copilot/add-wikitext-2-validation-set branch March 21, 2026 20:08

sharpninja merged 9 commits into main from copilot/add-wikitext-2-validation-set

3 participants
