schedule: every 60m
timeout-minutes: 360
Build tsikit-learn: scikit-learn → TypeScript migration
Goal
Build tsikit-learn, a complete TypeScript port of scikit-learn, one feature at a time. This is an open-ended program — it runs continuously, always adding the next piece of scikit-learn functionality.
Data layer: This project uses tsb (a TypeScript port of pandas) as its DataFrame/Series foundation, just as scikit-learn uses pandas/numpy. tsb is a peer dependency.
How each iteration works
- Read the README at the repo root. It is the source of truth for all project parameters (package name, stack, conventions, testing requirements).
- Read repo-memory (.autoloop/, AGENTS.md, CLAUDE.md, any planning docs) and the full issue thread (comments from other runs and steering from maintainers).
- Check for other running jobs. If another autoloop job is in-flight on this program, choose different work that won't conflict. Check the long-running branch (autoloop/build-tsikit-learn) and recent commits to understand what's already in progress. Integrate cleanly when merging.
- Plan extensively before writing code. On each iteration, write or update a detailed plan in repo-memory documenting: what scikit-learn modules exist, what's been ported so far, what's next, and why. The plan should reference the scikit-learn source directly.
- Pick ONE feature to implement. Start with whatever is most foundational and work outward. Each iteration adds exactly one cohesive piece — never half-finish something.
- Implement it fully:
  - Source code in src/ — strict TypeScript, no any, no escape hatches
  - Comprehensive tests — unit, property-based (fast-check), fuzz where applicable. MATCH EXACT COVERAGE OF scikit-learn's Python tests. Duplicate all tests and add more.
  - Interactive web playground/demo page for the feature
  - Update all docs, exports, and indexes
- Commit with a clear message describing what scikit-learn feature was ported.
First iteration
The very first iteration should:
- Set up the complete project structure: bun init, tsconfig.json (strictest settings), linting config (Biome), test config, CI workflow (GitHub Actions with Bun), Pages deployment pipeline
- Install tsb as a dependency for the data layer (DataFrame, Series, Index from tsessebe)
- Create the initial src/index.ts with the tsikit-learn package entry point
- Write a minimal "hello world" test to prove the pipeline works end to end
- Set up the playground infrastructure (copy the pattern from tsessebe's playground) — interactive code editor, browser bundle, GitHub Pages deploy
- Document the full migration plan in repo-memory: enumerate scikit-learn's top-level modules and features, propose an ordering, note architectural decisions
- Commit the plan and project skeleton — no scikit-learn features yet, just the foundation
Migration ordering (suggested)
Port scikit-learn modules in dependency order, starting with the foundational pieces everything else builds on:
Phase 1 — Foundation (math & utilities)
1. base — BaseEstimator, mixins (ClassifierMixin, RegressorMixin, TransformerMixin, ClusterMixin), clone, parameter get/set, sklearn API conventions
2. utils — validation (check_array, check_X_y, check_is_fitted), type checking, multiclass helpers, class_weight, extmath (safe_sparse_dot, row_norms, softmax, log_logistic)
3. utils.validation — input validation, array conversion, sample weight checks
4. exceptions — NotFittedError, ConvergenceWarning, etc.
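The Phase 1 pieces could fit together roughly as follows. This is a minimal sketch, not an existing tsikit-learn API: the camelCase method names (getParams, setParams), the ParamValue type, and the MeanRegressor class are all assumptions made for illustration.

```typescript
// Hypothetical sketch of a Phase 1 foundation. All names are illustrative.
type ParamValue = number | string | boolean | null;
type Params = Readonly<Record<string, ParamValue>>;

// Thrown when predict() is called before fit().
class NotFittedError extends Error {
  constructor(estimatorName: string) {
    super(`${estimatorName} is not fitted yet. Call fit() first.`);
    this.name = "NotFittedError";
  }
}

// Holds constructor parameters and exposes them for inspection and update.
abstract class BaseEstimator<P extends Params> {
  protected params: P;

  constructor(params: P) {
    this.params = { ...params };
  }

  getParams(): P {
    return { ...this.params };
  }

  setParams(update: Partial<P>): this {
    this.params = { ...this.params, ...update };
    return this;
  }

  // Returns a new, unfitted estimator with the same parameters.
  abstract clone(): BaseEstimator<P>;
}

type MeanRegressorParams = { readonly shift: number };

// Toy estimator: predicts mean(y) + shift for every sample.
class MeanRegressor extends BaseEstimator<MeanRegressorParams> {
  private mean: number | undefined;

  constructor(params: MeanRegressorParams = { shift: 0 }) {
    super(params);
  }

  fit(_X: readonly Float64Array[], y: Float64Array): this {
    if (y.length === 0) throw new RangeError("y must not be empty");
    this.mean = y.reduce((acc, v) => acc + v, 0) / y.length;
    return this;
  }

  predict(X: readonly Float64Array[]): Float64Array {
    if (this.mean === undefined) throw new NotFittedError("MeanRegressor");
    return new Float64Array(X.length).fill(this.mean + this.params.shift);
  }

  clone(): MeanRegressor {
    return new MeanRegressor(this.getParams());
  }
}
```

Storing the fitted state as `number | undefined` lets the compiler force the not-fitted check without any casts.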
Phase 2 — Preprocessing & metrics
5. preprocessing — StandardScaler, MinMaxScaler, MaxAbsScaler, RobustScaler, Normalizer, Binarizer, LabelEncoder, OneHotEncoder, OrdinalEncoder, PolynomialFeatures, FunctionTransformer, PowerTransformer, QuantileTransformer, KBinsDiscretizer, SplineTransformer
6. metrics — accuracy, precision, recall, f1, confusion_matrix, classification_report, roc_auc, roc_curve, mean_squared_error, mean_absolute_error, r2_score, log_loss, silhouette_score, adjusted_rand_score, pairwise distances/kernels
7. model_selection — train_test_split, KFold, StratifiedKFold, cross_val_score, cross_validate, GridSearchCV, RandomizedSearchCV, ParameterGrid, learning_curve, validation_curve
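As an illustration of a Phase 2 transformer, here is a minimal StandardScaler sketch over rows of Float64Array. It assumes the population standard deviation (dividing by n) and a scale of 1 for constant columns; the row-based input type and the camelCase fitTransform name are assumptions for this example, not a settled design.

```typescript
// Hypothetical sketch of a StandardScaler. Input layout and names are assumptions.
class StandardScaler {
  private mean: Float64Array | undefined;
  private scale: Float64Array | undefined;

  fit(X: readonly Float64Array[]): this {
    const first = X[0];
    if (first === undefined) throw new RangeError("X must contain at least one row");
    const nFeatures = first.length;
    for (const row of X) {
      if (row.length !== nFeatures) throw new RangeError("all rows must have the same length");
    }
    const mean = new Float64Array(nFeatures);
    const scale = new Float64Array(nFeatures);
    for (let j = 0; j < nFeatures; j++) {
      let sum = 0;
      for (const row of X) sum += row[j] ?? 0;
      const m = sum / X.length;
      let squared = 0;
      for (const row of X) squared += ((row[j] ?? 0) - m) ** 2;
      // Population standard deviation (divide by n, not n - 1).
      const std = Math.sqrt(squared / X.length);
      mean[j] = m;
      // Constant columns get scale 1 so that transform never divides by zero.
      scale[j] = std === 0 ? 1 : std;
    }
    this.mean = mean;
    this.scale = scale;
    return this;
  }

  transform(X: readonly Float64Array[]): Float64Array[] {
    const { mean, scale } = this;
    if (mean === undefined || scale === undefined) {
      throw new Error("StandardScaler is not fitted yet");
    }
    return X.map((row) => row.map((v, j) => (v - (mean[j] ?? 0)) / (scale[j] ?? 1)));
  }

  fitTransform(X: readonly Float64Array[]): Float64Array[] {
    return this.fit(X).transform(X);
  }
}
```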
Phase 3 — Core estimators
8. linear_model — LinearRegression, Ridge, Lasso, ElasticNet, LogisticRegression, SGDClassifier, SGDRegressor, Perceptron, PassiveAggressiveClassifier
9. tree — DecisionTreeClassifier, DecisionTreeRegressor, export_graphviz, plot_tree
10. neighbors — KNeighborsClassifier, KNeighborsRegressor, NearestNeighbors, KDTree, BallTree, radius_neighbors
11. naive_bayes — GaussianNB, MultinomialNB, BernoulliNB, ComplementNB, CategoricalNB
12. svm — SVC, SVR, LinearSVC, LinearSVR, NuSVC, NuSVR (pure TS implementations, no libsvm)
13. cluster — KMeans, MiniBatchKMeans, DBSCAN, AgglomerativeClustering, SpectralClustering, MeanShift, Birch, OPTICS
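To give a feel for a Phase 3 estimator written from scratch, the sketch below is a brute-force k-nearest-neighbors classifier with Euclidean distance and uniform voting. It is a simplification under stated assumptions: no KDTree or BallTree, numeric labels only, and vote ties broken toward the smallest label. The class shape is hypothetical.

```typescript
// Hypothetical brute-force KNN sketch. Names and tie-breaking are assumptions.
function euclidean(a: Float64Array, b: Float64Array): number {
  let sum = 0;
  a.forEach((v, i) => {
    sum += (v - (b[i] ?? 0)) ** 2;
  });
  return Math.sqrt(sum);
}

class KNeighborsClassifier {
  private readonly nNeighbors: number;
  private X: readonly Float64Array[] = [];
  private y: readonly number[] = [];

  constructor(nNeighbors = 5) {
    if (!Number.isInteger(nNeighbors) || nNeighbors < 1) {
      throw new RangeError("nNeighbors must be a positive integer");
    }
    this.nNeighbors = nNeighbors;
  }

  fit(X: readonly Float64Array[], y: readonly number[]): this {
    if (X.length !== y.length) throw new RangeError("X and y must have the same length");
    if (X.length < this.nNeighbors) throw new RangeError("fewer samples than nNeighbors");
    this.X = X;
    this.y = y;
    return this;
  }

  predict(X: readonly Float64Array[]): number[] {
    if (this.X.length === 0) throw new Error("KNeighborsClassifier is not fitted yet");
    return X.map((query) => {
      // Rank every training sample by distance and keep the k closest.
      const nearest = this.X
        .map((row, i) => ({ dist: euclidean(row, query), label: this.y[i] ?? NaN }))
        .sort((a, b) => a.dist - b.dist)
        .slice(0, this.nNeighbors);
      const votes = new Map<number, number>();
      nearest.forEach(({ label }) => votes.set(label, (votes.get(label) ?? 0) + 1));
      // Majority vote; on a tie the smallest label wins.
      let best = NaN;
      let bestCount = -1;
      votes.forEach((count, label) => {
        if (count > bestCount || (count === bestCount && label < best)) {
          best = label;
          bestCount = count;
        }
      });
      return best;
    });
  }
}
```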
Phase 4 — Ensemble & advanced
14. ensemble — RandomForestClassifier, RandomForestRegressor, GradientBoostingClassifier, GradientBoostingRegressor, AdaBoostClassifier, AdaBoostRegressor, BaggingClassifier, BaggingRegressor, VotingClassifier, StackingClassifier, HistGradientBoosting*
15. decomposition — PCA, IncrementalPCA, KernelPCA, TruncatedSVD, NMF, FactorAnalysis, FastICA, LatentDirichletAllocation
16. manifold — TSNE, MDS, Isomap, LocallyLinearEmbedding, SpectralEmbedding
17. feature_selection — SelectKBest, SelectPercentile, GenericUnivariateSelect, RFE, RFECV, SelectFromModel, VarianceThreshold, mutual_info_classif, mutual_info_regression, f_classif, f_regression, chi2
18. feature_extraction — DictVectorizer, FeatureHasher, text (CountVectorizer, TfidfVectorizer, TfidfTransformer, HashingVectorizer)
Phase 5 — Pipelines, imputation & remaining
19. pipeline — Pipeline, FeatureUnion, make_pipeline, make_union
20. impute — SimpleImputer, IterativeImputer, KNNImputer, MissingIndicator
21. compose — ColumnTransformer, TransformedTargetRegressor, make_column_selector
22. calibration — CalibratedClassifierCV, calibration_curve
23. multiclass — OneVsRestClassifier, OneVsOneClassifier, OutputCodeClassifier
24. multioutput — MultiOutputClassifier, MultiOutputRegressor, ClassifierChain, RegressorChain
25. discriminant_analysis — LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
26. gaussian_process — GaussianProcessClassifier, GaussianProcessRegressor, kernels (RBF, Matern, DotProduct, WhiteKernel, ConstantKernel, RationalQuadratic, ExpSineSquared, Sum, Product)
27. isotonic — IsotonicRegression, isotonic_regression, check_increasing
28. kernel_approximation — RBFSampler, Nystroem, AdditiveChi2Sampler, SkewedChi2Sampler
29. kernel_ridge — KernelRidge
30. mixture — GaussianMixture, BayesianGaussianMixture
31. neural_network — MLPClassifier, MLPRegressor, BernoulliRBM
32. semi_supervised — LabelPropagation, LabelSpreading, SelfTrainingClassifier
33. datasets — make_classification, make_regression, make_blobs, make_moons, make_circles, make_swiss_roll, load_iris, load_digits, load_wine, load_breast_cancer
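The composition idea behind item 19 can be sketched as below: each transformer is fitted and applied in order, and the final estimator is fitted on the transformed data. This is a simplified illustration. The Transformer and Predictor interfaces, the unnamed step list, and the two toy steps (MeanCenterer, SimpleLinearRegressor) are assumptions for this example, whereas scikit-learn's Pipeline takes named steps.

```typescript
// Hypothetical Pipeline sketch. Interfaces and toy steps are illustrative.
type Rows = readonly Float64Array[];

interface Transformer {
  fit(X: Rows): Transformer;
  transform(X: Rows): Rows;
}

interface Predictor {
  fit(X: Rows, y: Float64Array): Predictor;
  predict(X: Rows): Float64Array;
}

class Pipeline {
  private readonly steps: readonly Transformer[];
  private readonly finalEstimator: Predictor;

  constructor(steps: readonly Transformer[], finalEstimator: Predictor) {
    this.steps = steps;
    this.finalEstimator = finalEstimator;
  }

  fit(X: Rows, y: Float64Array): this {
    let current = X;
    // Fit each transformer on the output of the previous one.
    for (const step of this.steps) current = step.fit(current).transform(current);
    this.finalEstimator.fit(current, y);
    return this;
  }

  predict(X: Rows): Float64Array {
    let current = X;
    for (const step of this.steps) current = step.transform(current);
    return this.finalEstimator.predict(current);
  }
}

// Toy transformer: subtracts the per-column training mean.
class MeanCenterer implements Transformer {
  private means: Float64Array | undefined;

  fit(X: Rows): this {
    const means = new Float64Array(X[0]?.length ?? 0);
    for (const row of X) {
      row.forEach((v, j) => {
        means[j] = (means[j] ?? 0) + v / X.length;
      });
    }
    this.means = means;
    return this;
  }

  transform(X: Rows): Rows {
    const means = this.means;
    if (means === undefined) throw new Error("MeanCenterer is not fitted yet");
    return X.map((row) => row.map((v, j) => v - (means[j] ?? 0)));
  }
}

// Toy predictor: ordinary least squares on the first feature only.
class SimpleLinearRegressor implements Predictor {
  private slope = 0;
  private intercept = 0;
  private fitted = false;

  fit(X: Rows, y: Float64Array): this {
    const x = X.map((row) => row[0] ?? 0);
    const n = x.length;
    if (n === 0 || n !== y.length) throw new RangeError("X and y must be non-empty and aligned");
    const xMean = x.reduce((a, b) => a + b, 0) / n;
    const yMean = y.reduce((a, b) => a + b, 0) / n;
    let sxy = 0;
    let sxx = 0;
    x.forEach((xi, i) => {
      sxy += (xi - xMean) * ((y[i] ?? 0) - yMean);
      sxx += (xi - xMean) ** 2;
    });
    this.slope = sxx === 0 ? 0 : sxy / sxx;
    this.intercept = yMean - this.slope * xMean;
    this.fitted = true;
    return this;
  }

  predict(X: Rows): Float64Array {
    if (!this.fitted) throw new Error("SimpleLinearRegressor is not fitted yet");
    return Float64Array.from(X, (row) => this.intercept + this.slope * (row[0] ?? 0));
  }
}
```

Because predict() reuses the statistics learned in fit(), new samples are centered with the training mean rather than their own.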
Key constraints
- Package name is tsikit-learn. All imports: import { LinearRegression } from 'tsikit-learn'
- Data layer is tsb. Use tsb (from tsessebe) for DataFrame, Series, Index — just as scikit-learn uses pandas/numpy. For numeric arrays, use typed arrays (Float64Array, Int32Array) directly. Implement a thin ndarray-like wrapper for 2D operations.
- Bun for runtime, bundling, testing
- Zero additional dependencies for core library beyond tsb. Build all ML algorithms from scratch in pure TypeScript. No WASM, no native bindings.
- Strictest TypeScript — strict: true, noUncheckedIndexedAccess: true, exactOptionalPropertyTypes: true, no any anywhere, no @ts-ignore, no as casts unless provably safe
- Strictest linting — Biome with all rules enabled, zero warnings
- 100% test coverage — Re-use scikit-learn's Python tests for everything, plus add more. Unit tests, property-based tests (fast-check), fuzz tests, Playwright e2e for the web playground
- Interactive web playground — every feature gets a demo page showing the algorithm in action (visualizations, scatter plots, decision boundaries where applicable), deployed to GitHub Pages
- Don't worry about performance optimization — another program handles that. Focus on correctness and completeness.
- scikit-learn API parity — match scikit-learn's public API surface, adapted to TypeScript idioms. Follow the fit(), predict(), transform(), fit_transform(), score(), get_params(), set_params() patterns. When in doubt, read the scikit-learn source.
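The "thin ndarray-like wrapper" mentioned in the constraints might look something like the sketch below: a row-major Float64Array with bounds-checked access, written so that it compiles under noUncheckedIndexedAccess without any casts. The Matrix class and its method names are hypothetical, chosen only for this example.

```typescript
// Hypothetical 2D wrapper over a row-major Float64Array. Names are illustrative.
class Matrix {
  readonly rows: number;
  readonly cols: number;
  readonly data: Float64Array;

  constructor(rows: number, cols: number, data?: Float64Array) {
    if (!Number.isInteger(rows) || !Number.isInteger(cols) || rows < 0 || cols < 0) {
      throw new RangeError("rows and cols must be non-negative integers");
    }
    const buffer = data ?? new Float64Array(rows * cols);
    if (buffer.length !== rows * cols) {
      throw new RangeError(`expected ${rows * cols} values, got ${buffer.length}`);
    }
    this.rows = rows;
    this.cols = cols;
    this.data = buffer;
  }

  static fromRows(rows: readonly (readonly number[])[]): Matrix {
    const cols = rows[0]?.length ?? 0;
    const m = new Matrix(rows.length, cols);
    rows.forEach((row, i) => {
      if (row.length !== cols) throw new RangeError("all rows must have the same length");
      row.forEach((v, j) => m.set(i, j, v));
    });
    return m;
  }

  // Validates (i, j) and returns the flat row-major offset.
  private offset(i: number, j: number): number {
    const valid =
      Number.isInteger(i) && Number.isInteger(j) &&
      i >= 0 && i < this.rows && j >= 0 && j < this.cols;
    if (!valid) {
      throw new RangeError(`index (${i}, ${j}) is out of bounds for ${this.rows}x${this.cols}`);
    }
    return i * this.cols + j;
  }

  get(i: number, j: number): number {
    // The fallback is unreachable after the bounds check; it only satisfies the type checker.
    return this.data[this.offset(i, j)] ?? 0;
  }

  set(i: number, j: number, value: number): void {
    this.data[this.offset(i, j)] = value;
  }

  transpose(): Matrix {
    const out = new Matrix(this.cols, this.rows);
    for (let i = 0; i < this.rows; i++) {
      for (let j = 0; j < this.cols; j++) out.set(j, i, this.get(i, j));
    }
    return out;
  }

  matmul(other: Matrix): Matrix {
    if (this.cols !== other.rows) {
      throw new RangeError(`cannot multiply ${this.rows}x${this.cols} by ${other.rows}x${other.cols}`);
    }
    const out = new Matrix(this.rows, other.cols);
    for (let i = 0; i < this.rows; i++) {
      for (let j = 0; j < other.cols; j++) {
        let sum = 0;
        for (let k = 0; k < this.cols; k++) sum += this.get(i, k) * other.get(k, j);
        out.set(i, j, sum);
      }
    }
    return out;
  }
}
```

The naive triple loop is deliberate here, since the constraints put correctness ahead of performance.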
Playground / Pages site
The playground follows the same pattern as tsessebe:
- Landing page (playground/index.html) with a feature roadmap grid showing ported vs pending modules
- One page per feature with interactive demos (e.g., train a model, see predictions, visualize decision boundaries)
- In-browser TypeScript editor powered by the TypeScript compiler
- Built and deployed to GitHub Pages via CI
- Use Canvas/SVG for visualizations (scatter plots, decision boundaries, dendrograms, ROC curves, etc.)
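For the SVG route, a demo page could build its scatter plot as a plain string, which keeps the rendering logic testable outside the browser. The sketch below is one possible approach; the Point shape, the palette, and the scatterSvg function are assumptions for illustration and do not describe the tsessebe playground.

```typescript
// Hypothetical SVG scatter plot builder. Names, sizes and colors are illustrative.
interface Point {
  readonly x: number;
  readonly y: number;
  readonly label: number;
}

const PALETTE: readonly string[] = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"];

// Linearly maps v from [lo, hi] to [outLo, outHi]; a degenerate range maps to the middle.
function rescale(v: number, lo: number, hi: number, outLo: number, outHi: number): number {
  if (hi === lo) return (outLo + outHi) / 2;
  return outLo + ((v - lo) / (hi - lo)) * (outHi - outLo);
}

function scatterSvg(points: readonly Point[], width = 320, height = 240, pad = 20): string {
  const open = `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}">`;
  if (points.length === 0) return `${open}</svg>`;
  const xs = points.map((p) => p.x);
  const ys = points.map((p) => p.y);
  const xMin = Math.min(...xs);
  const xMax = Math.max(...xs);
  const yMin = Math.min(...ys);
  const yMax = Math.max(...ys);
  const circles = points.map((p) => {
    const cx = rescale(p.x, xMin, xMax, pad, width - pad);
    // SVG's y axis points down, so the output range is flipped.
    const cy = rescale(p.y, yMin, yMax, height - pad, pad);
    const color = PALETTE[Math.abs(Math.trunc(p.label)) % PALETTE.length] ?? "#000000";
    return `<circle cx="${cx.toFixed(1)}" cy="${cy.toFixed(1)}" r="4" fill="${color}"/>`;
  });
  return `${open}${circles.join("")}</svg>`;
}
```

A page could then assign the returned string to a container's innerHTML, or a Playwright test could assert on it directly.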
Target
This program is building the project from scratch, so most of the repository is in scope, subject to the two lists below.
Only modify these files:
- src/** — library source code
- tests/** — all test files
- playground/** — interactive web playground/demos
- package.json — package config
- tsconfig.json — TypeScript config
- biome.json — linter config
- bunfig.toml — Bun config
- .github/workflows/** — CI/CD pipelines (but not autoloop workflow files)
- AGENTS.md — agent instructions
- CLAUDE.md — Claude Code config
- .autoloop/memory/** — repo-memory for planning and coordination
Do NOT modify:
- README.md — source of truth, read-only for this program
- .autoloop/programs/** — program definitions
- .github/ISSUE_TEMPLATE/** — issue templates
- .github/workflows/autoloop* — autoloop workflow files
- .github/workflows/sync-branches* — sync workflow files
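One way to read the two lists together is "deny rules win, then the path must match an allow rule". The sketch below encodes that reading with simple prefix matching instead of a real glob library. The mayModify helper is hypothetical and not part of autoloop; it only shows how the rules interact.

```typescript
// Hypothetical helper encoding the lists above with prefix matching (no glob library).
const ALLOWED_PREFIXES: readonly string[] = [
  "src/",
  "tests/",
  "playground/",
  ".github/workflows/",
  ".autoloop/memory/",
];
const ALLOWED_FILES: readonly string[] = [
  "package.json",
  "tsconfig.json",
  "biome.json",
  "bunfig.toml",
  "AGENTS.md",
  "CLAUDE.md",
];
const DENIED_PREFIXES: readonly string[] = [
  ".autoloop/programs/",
  ".github/ISSUE_TEMPLATE/",
  ".github/workflows/autoloop",
  ".github/workflows/sync-branches",
];
const DENIED_FILES: readonly string[] = ["README.md"];

// Takes a repository-relative path with forward slashes and no leading "./".
function mayModify(path: string): boolean {
  // Deny rules take precedence over allow rules.
  if (DENIED_FILES.includes(path)) return false;
  if (DENIED_PREFIXES.some((prefix) => path.startsWith(prefix))) return false;
  return ALLOWED_FILES.includes(path) || ALLOWED_PREFIXES.some((prefix) => path.startsWith(prefix));
}
```

Checking the deny list first is what makes .github/workflows/** editable while the autoloop and sync-branches workflows stay protected.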
Evaluation
# Type check must pass — reject iterations that introduce type errors
if command -v bunx >/dev/null 2>&1; then
  if ! bunx tsc --noEmit 2>&1; then
    echo '{"sklearn_features_ported": null, "rejected_reason": "type check failed"}'
    exit 0
  fi
fi

# Tests must pass — reject iterations that break existing functionality
if command -v bun >/dev/null 2>&1; then
  if ! bun test 2>&1; then
    echo '{"sklearn_features_ported": null, "rejected_reason": "tests failed"}'
    exit 0
  fi
fi

# Count TypeScript source files that contain sklearn-related functionality
# (excludes config, test infra, playground scaffolding — only counts actual library code)
count=$(find src -name '*.ts' -not -name 'index.ts' -not -name '*.d.ts' 2>/dev/null | xargs grep -l 'export' 2>/dev/null | wc -l | tr -d ' ')
echo "{\"sklearn_features_ported\": ${count:-0}}"
The metric is sklearn_features_ported. Higher is better.