# PoC of repository ai bootstrap (#7585)
@@ -0,0 +1,188 @@

---
description: "Guidance for GitHub Copilot when working on ML.NET (dotnet/machinelearning). Use for any task in this repo: code changes, test writing, PR reviews, issue investigation, build troubleshooting, or documentation."
---

# ML.NET Development Guide
## Repository Overview

ML.NET is a cross-platform, open-source machine learning framework for .NET. It provides APIs for training, evaluating, and deploying ML models across classification, regression, clustering, ranking, anomaly detection, time series, recommendation, and generative AI (LLaMA, Phi, Mistral via TorchSharp).
### Key Technologies

- .NET SDK 10.0.100 (see `global.json`)
- Build system: Microsoft Arcade SDK (`eng/common/`)
- Test framework: xUnit (with `AwesomeAssertions`, `Xunit.Combinatorial`)
- Native dependencies: MKL, OpenMP, libmf, oneDNN
- Major dependencies: TorchSharp, ONNX Runtime, TensorFlow, LightGBM, Semantic Kernel
- Central package management: `Directory.Packages.props`
## Build & Test

### Build

```bash
# Linux/macOS
./build.sh

# Windows
build.cmd

# Build a specific project
dotnet build src/Microsoft.ML.Core/Microsoft.ML.Core.csproj
```

The repo uses the Arcade SDK: `build.sh`/`build.cmd` wrap `eng/common/build.sh`/`eng/common/build.ps1` with `--restore --build`. On Linux, install native dependencies first via `eng/common/native/install-dependencies.sh`.
### Test

```bash
# Run tests for a specific project
dotnet test test/Microsoft.ML.Tests/Microsoft.ML.Tests.csproj

# Run tests matching a filter
dotnet test test/Microsoft.ML.Tests/Microsoft.ML.Tests.csproj --filter "FullyQualifiedName~ClassName.MethodName"

# Run all tests (slow; prefer specific projects)
dotnet test Microsoft.ML.sln
```

Test projects multi-target `net8.0;net48;net9.0` on Windows, and `net8.0` only on Linux/macOS/arm64.
### Format

```bash
dotnet format Microsoft.ML.sln --no-restore
```

The repo has an `.editorconfig` and sets `EnforceCodeStyleInBuild=true`.
## Project Structure

```
src/
├── Microsoft.ML.Core/              # Core types, contracts, host environment
├── Microsoft.ML.Data/              # Data pipeline, DataView, schema
├── Microsoft.ML/                   # MLContext, public API surface
├── Microsoft.ML.StandardTrainers/  # Built-in trainers (logistic regression, SVM, etc.)
├── Microsoft.ML.Transforms/        # Data transforms (normalize, featurize, etc.)
├── Microsoft.ML.AutoML/            # Automated ML pipeline selection
├── Microsoft.ML.FastTree/          # Tree-based trainers
├── Microsoft.ML.LightGbm/          # LightGBM integration
├── Microsoft.ML.Recommender/       # Matrix factorization recommenders
├── Microsoft.ML.TimeSeries/        # Time series analysis
├── Microsoft.ML.Tokenizers/        # BPE/WordPiece/SentencePiece tokenizers
├── Microsoft.ML.GenAI.Core/        # GenAI base types (CausalLM pipeline)
├── Microsoft.ML.GenAI.LLaMA/       # LLaMA model support
├── Microsoft.ML.GenAI.Phi/         # Phi model support
├── Microsoft.ML.GenAI.Mistral/     # Mistral model support
├── Microsoft.ML.TorchSharp/        # TorchSharp-based trainers
├── Microsoft.ML.OnnxTransformer/   # ONNX model inference
├── Microsoft.ML.TensorFlow/        # TensorFlow model inference
├── Microsoft.ML.Vision/            # Image classification
├── Microsoft.ML.ImageAnalytics/    # Image transforms
├── Microsoft.ML.CpuMath/           # SIMD-optimized math operations
├── Microsoft.Data.Analysis/        # DataFrame API
├── Native/                         # C/C++ native library sources
└── Common/                         # Shared internal code
test/
├── Microsoft.ML.TestFramework/       # Base test classes and helpers
├── Microsoft.ML.TestFrameworkCommon/ # Shared test utilities
├── Microsoft.ML.Tests/               # Main functional tests
├── Microsoft.ML.Core.Tests/          # Core unit tests
├── Microsoft.ML.IntegrationTests/    # End-to-end integration tests
├── Microsoft.ML.Tokenizers.Tests/    # Tokenizer tests
├── Microsoft.ML.GenAI.*.Tests/       # GenAI component tests
└── ... (30+ test projects)
```
## Conventions

### Code Style

Every `.cs` file starts with the three-line .NET Foundation MIT license header. This is enforced across the codebase and must not be omitted.
Namespaces match the assembly name (`Microsoft.ML`, `Microsoft.ML.Data`, `Microsoft.ML.Trainers`). Order usings with `System.*` first, then `Microsoft.*`, then everything else.
Use the `[BestFriend]` attribute for internal members shared across assemblies. The repo has many assemblies that need to share types without making them public; `[BestFriend]` provides controlled cross-assembly visibility for this.
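A minimal sketch of the pattern (the type and members here are hypothetical; the attribute lives in Microsoft.ML.Core, and the repo's internal analyzer is what enforces that only declared "friend" assemblies consume the member):

```csharp
// Hypothetical example; real [BestFriend] members live throughout
// Microsoft.ML.Core and related assemblies.
namespace Microsoft.ML.Runtime
{
    [BestFriend]
    internal static class TelemetryUtils // hypothetical type
    {
        // Visible to assemblies granted InternalsVisibleTo that participate
        // in the best-friend contract; the internal code analyzer flags any
        // other cross-assembly use of this internal member.
        [BestFriend]
        internal static void LogTrainerStart(string trainerName)
        {
            // ...
        }
    }
}
```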
Use `Contracts.Check*` / `Contracts.Except*` for argument and state validation rather than raw `throw` statements. This ensures consistent error messages and lets the ML.NET host environment intercept validation failures.
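The pattern looks roughly like this (the method is a hypothetical sketch; `Contracts.CheckValue`, `Contracts.CheckParam`, and `Contracts.ExceptParam` are the kinds of helpers involved, with overloads assumed from typical usage):

```csharp
private static float ComputeMean(float[] values, int count) // hypothetical helper
{
    // Argument validation goes through Contracts, not raw throw statements.
    Contracts.CheckValue(values, nameof(values));
    Contracts.CheckParam(count > 0, nameof(count), "Must be positive");
    if (count > values.Length)
        throw Contracts.ExceptParam(nameof(count), "Cannot exceed array length");

    float sum = 0;
    for (int i = 0; i < count; i++)
        sum += values[i];
    return sum / count;
}
```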
XML docs with `<summary>` tags are required on all public types and members.

When editing an existing file, match its style even if it differs from the general guidelines; consistency within a file matters more than global uniformity.

Follow the [dotnet/runtime coding style](https://github.com/dotnet/runtime/blob/main/docs/coding-guidelines/coding-style.md).
### Test Conventions

Framework: xUnit (`[Fact]`, `[Theory]`, `[InlineData]`).
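For parameterized cases, combine `[Theory]` with `[InlineData]` so one method covers several inputs (a hypothetical test, shown only to illustrate the attributes):

```csharp
public class MinMaxNormalizerTests // hypothetical test class
{
    [Theory]
    [InlineData(0f, 0f)]
    [InlineData(5f, 0.5f)]
    [InlineData(10f, 1f)]
    public void ScalesIntoUnitRange(float input, float expected)
    {
        // Hypothetical computation: min-max scaling over the range [0, 10].
        float actual = input / 10f;
        Assert.Equal(expected, actual);
    }
}
```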
Inherit from `TestDataPipeBase` (for data pipeline tests) or `BaseTestClass` (for simpler tests). Both provide `ITestOutputHelper`, test data paths, and locale pinning to `en-US`.

```csharp
public class MyFeatureTests : TestDataPipeBase
{
    public MyFeatureTests(ITestOutputHelper output) : base(output) { }

    [Fact]
    public void MyFeatureBasicTest()
    {
        // ...
    }
}
```
Name test classes `{Feature}Tests` and test methods with PascalCase descriptive names (e.g., `RandomizedPcaTrainerBaselineTest`). Do not use `Test_` prefixes or `_Should_` patterns.
Use `Assert.*` (xUnit) or `AwesomeAssertions` for fluent assertions. Do not use NUnit-style `Assert.That`.
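The two styles side by side (a hypothetical test; `AwesomeAssertions` is assumed here to expose the familiar FluentAssertions-style `Should()` API):

```csharp
public class AssertionStyleTests // hypothetical test class
{
    [Fact]
    public void EitherStyleIsAcceptable()
    {
        double[] metrics = { 0.91, 0.88, 0.95 };

        // Plain xUnit:
        Assert.Equal(3, metrics.Length);
        Assert.All(metrics, m => Assert.InRange(m, 0.0, 1.0));

        // Fluent style (assumed FluentAssertions-compatible API):
        metrics.Should().HaveCount(3);
        metrics.Should().OnlyContain(m => m >= 0.0 && m <= 1.0);
    }
}
```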
Test data: use the `Microsoft.ML.TestDatabases` package or files in `test/data/`, referenced via `GetDataPath("filename")` from the base class. Baseline output comparison uses files in `test/BaselineOutput/`. Update baselines carefully; they are the source of truth for output format stability.
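A sketch of how a test typically resolves data files (the file name is illustrative; `GetDataPath` comes from the test base class):

```csharp
public class SentimentDataTests : TestDataPipeBase // hypothetical test class
{
    public SentimentDataTests(ITestOutputHelper output) : base(output) { }

    [Fact]
    public void CanLoadTestDataFile()
    {
        // GetDataPath resolves against the repo's test data directory,
        // so tests never hard-code absolute paths.
        string path = GetDataPath("sentiment-data.tsv"); // hypothetical file name
        Assert.True(File.Exists(path));
    }
}
```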
Gotchas: the base class pins the locale to `en-US` (don't override it). `AllowUnsafeBlocks` is enabled in test projects for native interop testing. XML doc warnings (CS1573, CS1591, CS1712) are suppressed in test code.
### Architecture

`MLContext` is the main entry point, exposing catalogs for each ML task (classification, regression, etc.).
Data flows through `IDataView`, a lazy, columnar, cursor-based data pipeline. This design avoids loading entire datasets into memory, which matters for ML workloads.
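For example, loading data yields an `IDataView` without materializing it; rows stream only when a cursor is consumed (the row type and file path below are hypothetical):

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

var mlContext = new MLContext(seed: 0);

// Nothing is read from disk here; only the schema and source are captured.
IDataView data = mlContext.Data.LoadFromTextFile<HousingRow>(
    "housing.tsv", hasHeader: true); // hypothetical file

// Rows stream one at a time as the cursor advances; the full dataset
// is never materialized in memory.
using (var cursor = data.GetRowCursor(data.Schema))
{
    while (cursor.MoveNext())
    {
        // process the current row
    }
}

public class HousingRow // hypothetical row type
{
    [LoadColumn(0)] public float Size;
    [LoadColumn(1)] public float Price;
}
```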
Trainers follow the `IEstimator<T>` / `ITransformer` pattern: call `Fit()` to train, then `Transform()` to apply. New trainers go in their own project under `src/`, and new test projects mirror source naming: `Microsoft.ML.Foo` maps to `Microsoft.ML.Foo.Tests`.
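An end-to-end sketch of that pattern (the data, column names, and trainer choice are illustrative, not prescribed by the repo):

```csharp
using Microsoft.ML;

var mlContext = new MLContext(seed: 0);

// Tiny in-memory dataset (illustrative only).
IDataView trainData = mlContext.Data.LoadFromEnumerable(new[]
{
    new SentimentInput { Text = "I love this", Label = true },
    new SentimentInput { Text = "This is terrible", Label = false },
});

// IEstimator chain: featurize text, then append a linear trainer.
var pipeline = mlContext.Transforms.Text
    .FeaturizeText("Features", nameof(SentimentInput.Text))
    .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression());

// Fit() trains and returns an ITransformer; Transform() applies it lazily.
ITransformer model = pipeline.Fit(trainData);
IDataView predictions = model.Transform(trainData);

public class SentimentInput // hypothetical row type
{
    public string Text { get; set; }
    public bool Label { get; set; }
}
```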
## Git Workflow

- Default branch: `main`
- Never commit directly to `main`; always create a feature branch
- Branch naming: `feature/description`, `fix/description`
- PRs are squash-merged
- Reference a filed issue in the PR description
- Address review feedback in additional commits (don't amend or force-push)
- Use `git rebase` for conflict resolution, not merge commits
## CI

Primary CI: Azure DevOps Pipelines (`build/vsts-ci.yml`), the official signed build. Builds run on Windows, Linux (Ubuntu 22.04), and macOS, covering both managed (.NET) and native components. Code coverage uses `coverlet.collector`. A custom internal Roslyn analyzer (`Microsoft.ML.InternalCodeAnalyzer`) runs on all test projects.
## AI Infrastructure

### Workflows

GitHub Actions in `.github/workflows/`:

| Workflow | Trigger | Purpose |
|----------|---------|---------|
| `copilot-setup-steps.yml` | Manual | Remote Copilot Coding Agent build environment |
| `find-similar-issues.yml` | Issue opened | AI-powered duplicate detection for new issues |
| `inclusive-heat-sensor.yml` | Comments | Detect heated language in issue/PR comments |
### Prompts

Reusable prompt templates in `.github/prompts/`:

| Prompt | Purpose |
|--------|---------|
| `release-notes.prompt.md` | Generate classified release notes between commits |
### Issue Triage

For issue triage workflows (automated milestone assignment, priority labeling, investigation), use [GitHub Agentic Workflows](https://github.github.com/gh-aw/). Define triage automation as natural-language workflow files rather than custom scripts.
@@ -0,0 +1,20 @@

# ML.NET Release Notes

Generate classified release notes between two commits.

## Categories

1. **Product** — Bug fixes, features, improvements
2. **Dependencies** — Package/SDK updates
3. **Testing** — Test changes and infrastructure
4. **Documentation** — Docs, samples
5. **Housekeeping** — Build, CI, cleanup

## Process

```bash
# Get commits between two points
git log --pretty=format:"%h - %s (%an)" BRANCH1..BRANCH2 > commits.txt
```

Classify each commit. When uncertain, default to Housekeeping. Group related commits. Flag breaking changes with ⚠️.
@@ -0,0 +1,92 @@

````yaml
name: "Find Similar Issues with AI"

on:
  issues:
    types: [opened]

permissions:
  contents: read
  issues: write
  models: read

jobs:
  find-similar-issues:
    runs-on: ubuntu-latest
    if: github.event_name == 'issues'
    steps:
      - uses: actions/setup-node@v4
        with:
          node-version: '20'

      - run: npm init -y && npm install @octokit/rest

      - name: Find and post similar issues
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          ISSUE_NUMBER: ${{ github.event.issue.number }}
          ISSUE_TITLE: ${{ github.event.issue.title }}
          ISSUE_BODY: ${{ github.event.issue.body }}
        run: |
          node << 'SCRIPT'
          const { Octokit } = require("@octokit/rest");
          const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });
          const endpoint = "https://models.inference.ai.azure.com";
          const model = "gpt-4o-mini";
          const token = process.env.GITHUB_TOKEN;
          const issueNum = parseInt(process.env.ISSUE_NUMBER);
          const title = process.env.ISSUE_TITLE;
          const body = process.env.ISSUE_BODY || '';
          const [owner, repo] = process.env.GITHUB_REPOSITORY.split('/');

          // Keyword extraction: strip code blocks and URLs, drop stop words.
          function extractWords(text) {
            const stop = new Set(['the','and','for','with','this','that','from','have','not','are','was','will','can','when','what','how','use','does','issue','error','work']);
            return [...new Set(text.replace(/```[\s\S]*?```/g,'').replace(/https?:\/\/\S+/g,'').replace(/[^a-z0-9\s]/gi,' ').toLowerCase().split(/\s+/).filter(w=>w.length>3&&!stop.has(w)))];
          }
          function jaccard(a,b) { const i=a.filter(w=>b.includes(w)); const u=[...new Set([...a,...b])]; return u.length?i.length/u.length:0; }

          (async()=>{
            // Collect up to 1000 recent issues, excluding PRs and the new issue itself.
            const issues=[];
            for(let p=1;p<=10;p++){
              const r=await octokit.issues.listForRepo({owner,repo,state:'all',per_page:100,page:p,sort:'updated',direction:'desc'});
              if(!r.data.length)break;
              issues.push(...r.data.filter(i=>i.number!==issueNum&&!i.pull_request));
            }
            // Cheap Jaccard pre-filter before spending model calls.
            const words=extractWords(`${title}\n${body}`);
            const candidates=issues.map(i=>({issue:i,score:jaccard(words,extractWords(`${i.title}\n${i.body||''}`))}))
              .filter(c=>c.score>0.1).sort((a,b)=>b.score-a.score).slice(0,30);

            const results=[];
            for(const{issue}of candidates){
              try{
                const r=await fetch(`${endpoint}/chat/completions`,{method:"POST",headers:{"Content-Type":"application/json","Authorization":`Bearer ${token}`},
                  body:JSON.stringify({model,temperature:0.3,max_tokens:150,messages:[
                    {role:"system",content:'Analyze GitHub issue similarity. Return JSON only: {"score":0.0,"reason":"brief"}'},
                    {role:"user",content:`Current:\nTitle: ${title}\nBody: ${body}\n\nCompare:\nTitle: ${issue.title}\nBody: ${issue.body||'None'}`}
                  ]})});
                const d=await r.json();
                if(!d.choices?.[0])continue;
                const parsed=JSON.parse(d.choices[0].message.content.trim().replace(/^```json?\s*/gm,'').replace(/```$/gm,''));
                if(parsed.score>=0.6) results.push({number:issue.number,title:issue.title,state:issue.state,url:issue.html_url,score:parsed.score,reason:parsed.reason,labels:issue.labels.map(l=>l.name)});
                await new Promise(r=>setTimeout(r,100));
              }catch(e){console.error(`#${issue.number}:`,e.message)}
            }
            results.sort((a,b)=>b.score-a.score);
            const top=results.slice(0,5);

            let comment='';
            if(top.length){
              comment=`## 🔍 Similar Issues Found\n\n`;
              top.forEach((s,i)=>{
                comment+=`<details><summary><strong>${i+1}. <a href="${s.url}">#${s.number}</a>: ${s.title}</strong> (${Math.round(s.score*100)}%)</summary>\n\n`;
                comment+=`**State:** ${s.state==='open'?'🟢 Open':'🔴 Closed'} \n**Labels:** ${s.labels.slice(0,5).map(l=>'`'+l+'`').join(', ')||'None'}\n`;
                if(s.reason) comment+=`**Why:** ${s.reason}\n`;
                comment+=`</details>\n\n`;
              });
              comment+=`---\n*AI-powered similar issue detection*`;
            } else {
              comment=`## 🔍 No similar issues found with high confidence.\n\n---\n*AI-powered similar issue detection*`;
            }
            await octokit.issues.createComment({owner,repo,issue_number:issueNum,body:comment});
          })();
          SCRIPT
````
@@ -0,0 +1,21 @@

```yaml
name: Inclusive Heat Sensor
on:
  issues:
    types: [opened, reopened]
  issue_comment:
    types: [created, edited]
  pull_request_review_comment:
    types: [created, edited]

permissions:
  contents: read
  issues: write
  pull-requests: write

jobs:
  detect-heat:
    uses: jonathanpeppers/inclusive-heat-sensor/.github/workflows/comments.yml@v0.1.2
    with:
      minimizeComment: true
      offensiveThreshold: 9
      angerThreshold: 9
```