Skip to content

Feature/add gpu mlp estimation package #47

Merged
yoshifuminakamura merged 2 commits into develop from feature/add-gpu-mlp-estimation-package
Jun 12, 2026

@yoshifuminakamura (Collaborator)

This PR intentionally includes temporary CI switches for the GPU MLP estimator bring-up. They are documented in the commit message and should be replaced once the estimator runner/package flow is finalized.

Signed-off-by: Yoshifumi Nakamura <nakamura@riken.jp>
Connect GENESIS GPU runs to the PerfTools MLP_NN/v1.5 estimator path for early CI validation. GENESIS now emits a gpu_kernel_region estimation section when GPU MLP profiling is enabled and a padata archive is available, and the gpu_kernel_mlp_v15 section package can consume BenchKit padata archives containing Nsight Compute raw CSV data.
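The sketch below shows the gating condition described above, under stated assumptions:

- Profiling is signalled by the BK_GENESIS_GPU_MLP_PROFILE switch listed later in this message.
- "Enabled" means the value "1". The real GENESIS scripts may use a different convention.
- The helper name `gpu_estimation_section` is hypothetical.

```python
from pathlib import Path
from typing import Mapping, Optional


def gpu_estimation_section(env: Mapping[str, str], padata_path: Optional[str]) -> Optional[str]:
    """Return the estimation section to emit, or None to skip it.

    Hypothetical helper: treating BK_GENESIS_GPU_MLP_PROFILE="1" as
    "profiling enabled" is an assumption, not the confirmed convention.
    """
    profiling_enabled = env.get("BK_GENESIS_GPU_MLP_PROFILE", "0") == "1"
    # The section is only meaningful when a padata archive actually exists.
    padata_available = padata_path is not None and Path(padata_path).is_file()
    if profiling_enabled and padata_available:
        return "gpu_kernel_region"
    return None
```

Both conditions must hold. A run with profiling enabled but no archive (or the reverse) produces no GPU estimation section.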

Add an NCU-to-PerfTools input bridge. It extracts profile_raw.csv from padata, normalizes the Nsight Compute columns observed on MiyabiG, fills gaps in the current v1.5 static GPU specs for known GPUs, and writes a prepared CSV before invoking predict_v15.py. Temporary NCU extraction output is kept outside the uploaded estimation artifact bundle, so raw profiler data is not duplicated.
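The bridge steps above can be sketched as follows. Several parts are hypothetical:

- **Column mapping:** `COLUMN_MAP` uses two real Nsight Compute header names, but the target names are invented for illustration.
- **GPU spec values:** the `gpu_spec` fields are supplied by the caller. No real GPU values are assumed here.
- **Function names:** `extract_profile_raw` and `prepare_rows` are not the PR's actual names.

Reading the archive member in memory also keeps the extraction out of any upload directory.

```python
import csv
import io
import posixpath
import tarfile
from typing import Dict, List

# Hypothetical mapping from Nsight Compute raw CSV headers to estimator
# column names; the real bridge's mapping may differ.
COLUMN_MAP = {
    "Kernel Name": "kernel_name",
    "gpu__time_duration.sum": "duration_ns",
}


def extract_profile_raw(padata_tgz: str) -> str:
    """Return the text of the first profile_raw.csv inside a padata .tgz."""
    with tarfile.open(padata_tgz, "r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile() and posixpath.basename(member.name) == "profile_raw.csv":
                return tar.extractfile(member).read().decode("utf-8")
    raise FileNotFoundError("profile_raw.csv not found in " + padata_tgz)


def prepare_rows(raw_csv: str, gpu_spec: Dict[str, str]) -> List[Dict[str, str]]:
    """Rename known NCU columns and fill empty static GPU spec fields."""
    rows = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        out = {COLUMN_MAP.get(key, key): value for key, value in row.items()}
        for key, value in gpu_spec.items():
            if not out.get(key):  # fill gaps only; never overwrite profiled values
                out[key] = value
        rows.append(out)
    return rows
```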

Add local validation support via scripts/test_estimate_submit.sh. The script mirrors the style of test_submit.sh for scheduler submission. It also adds an --estimate-only mode that can run inside Apptainer using PERFTOOLS/SIF or the corresponding BK_* variables.
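The sketch below shows how the --estimate-only mode might resolve its settings. It rests on these assumptions:

- The "corresponding BK_* variables" are BK_PERFTOOLS and BK_SIF.
- The plain names take precedence.
- Both values are required.

The PR text does not confirm any of these points.

```python
from typing import Mapping, Optional


def resolve_setting(env: Mapping[str, str], name: str) -> Optional[str]:
    """Return NAME if set, else BK_NAME, else None (assumed precedence)."""
    return env.get(name) or env.get("BK_" + name) or None


def estimate_only_config(env: Mapping[str, str]) -> dict:
    """Collect the Apptainer inputs for a hypothetical --estimate-only run."""
    perftools = resolve_setting(env, "PERFTOOLS")
    sif = resolve_setting(env, "SIF")
    if perftools is None or sif is None:
        raise ValueError("--estimate-only needs PERFTOOLS and SIF (or BK_PERFTOOLS/BK_SIF)")
    return {"perftools": perftools, "sif": sif}
```

In practice this would be called as `estimate_only_config(os.environ)`.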

Record GENESIS results with the GENESIS-specific Exp p8 and declare the same baseline Exp for estimation. This avoids falling through to the common CASE0 default, which is QWS-specific and is not a valid GENESIS experiment label.
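The fall-through problem described above can be illustrated with a per-application default table. Only the GENESIS label p8 and the QWS-specific common default CASE0 come from this PR. The table layout and helper names are hypothetical.

```python
from typing import Optional

# Hypothetical per-application defaults; only "p8" for GENESIS is from the PR.
APP_DEFAULT_EXP = {"genesis": "p8"}
COMMON_DEFAULT_EXP = "CASE0"  # QWS-specific, so not a valid GENESIS label


def resolve_exp(app: str, explicit: Optional[str] = None) -> str:
    """Prefer an explicit Exp, then the app's own default, then the common one."""
    if explicit:
        return explicit
    return APP_DEFAULT_EXP.get(app.lower(), COMMON_DEFAULT_EXP)


def check_baseline(result_exp: str, estimate_baseline_exp: str) -> None:
    """Results and estimation must declare the same baseline Exp."""
    if result_exp != estimate_baseline_exp:
        raise ValueError(f"Exp mismatch: result={result_exp}, estimate={estimate_baseline_exp}")
```

Without the GENESIS entry, `resolve_exp("genesis")` would silently return CASE0. This is the mislabeling the change avoids.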

Rename the lightweight estimation bundle from estimation_inputs to estimation_artifacts because it now carries prepared inputs, prediction outputs, and logs. Result Server storage, client restore paths, tests, and docs now use results/estimation_artifacts and received_estimation_artifacts.

Add canonical Result Server APIs /api/ingest/estimation-artifacts and /api/query/estimation-artifacts while keeping the old estimation-inputs endpoints as compatibility aliases. send_estimate.sh posts to the new endpoint and falls back to the legacy endpoint on 404.
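The 404 fallback described above can be sketched with an injected transport, so the retry logic is visible without a live Result Server. Two details are assumptions:

- The legacy alias path is /api/ingest/estimation-inputs, inferred from "the old estimation-inputs endpoints".
- `post` is a hypothetical callable that takes a URL and returns an HTTP status code.

send_estimate.sh itself is a shell script.

```python
from typing import Callable

NEW_ENDPOINT = "/api/ingest/estimation-artifacts"
LEGACY_ENDPOINT = "/api/ingest/estimation-inputs"  # assumed legacy alias path


def post_artifacts(post: Callable[[str], int], base_url: str) -> int:
    """POST to the canonical endpoint; retry the legacy alias only on 404."""
    status = post(base_url + NEW_ENDPOINT)
    if status == 404:  # older server without the new route
        status = post(base_url + LEGACY_ENDPOINT)
    return status
```

Only a 404 triggers the retry. Other errors from the new endpoint are returned unchanged, so they are not masked by a second request.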

Avoid duplicate large uploads:

- send_results.sh no longer uploads estimation bundles.
- send_estimate.sh excludes raw profiler archives such as *.ncu-rep, profile_raw.csv, padata*.tgz, and nested tgz files.
- An HTTP 413 response to the estimation artifact upload is treated as non-fatal once the Estimate JSON has been ingested.
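The exclusion rules and 413 handling above can be sketched as two small helpers. The glob patterns come from this PR description. Matching on basenames, and treating any `*.tgz` as a nested archive, are assumptions about how send_estimate.sh applies them.

```python
import fnmatch
import posixpath
from typing import Iterable, List

# Patterns from the PR text; "*.tgz" stands in for "nested tgz files"
# (it also covers padata*.tgz, which is kept to mirror the description).
EXCLUDE_PATTERNS = ("*.ncu-rep", "profile_raw.csv", "padata*.tgz", "*.tgz")


def bundle_files(paths: Iterable[str]) -> List[str]:
    """Drop raw profiler data from the estimation artifact bundle."""
    return [
        path for path in paths
        if not any(fnmatch.fnmatchcase(posixpath.basename(path), pat) for pat in EXCLUDE_PATTERNS)
    ]


def upload_failure_is_fatal(status: int, estimate_json_ingested: bool) -> bool:
    """A 413 on the artifact upload is tolerated once the Estimate JSON is stored."""
    if 200 <= status < 300:
        return False
    if status == 413 and estimate_json_ingested:
        return False
    return True
```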

Allow local matrix generation without CI_PIPELINE_SOURCE by recording PARENT_PIPELINE_SOURCE=local, and align the PerfTools smoke-mode documentation with the repository-wide Python 3.12+ runtime expectation.
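The local fallback above reduces to a one-line default. This is a sketch. Treating an empty CI_PIPELINE_SOURCE the same as an unset one is an assumption, and the helper name is hypothetical.

```python
from typing import Mapping


def parent_pipeline_source(env: Mapping[str, str]) -> str:
    """Value to record as PARENT_PIPELINE_SOURCE; "local" outside GitLab CI."""
    return env.get("CI_PIPELINE_SOURCE") or "local"
```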

Temporary bring-up wiring is intentionally explicit in GitLab CI: BK_QWS_GPU_MLP_SMOKE, BK_ESTIMATE_RUNNER_TAG=fncx-estimate-python, BK_GPU_MLP_PERFTOOLS_REPO/REF, BK_GENESIS_GPU_MLP_PROFILE, BK_GPU_MLP_NCU_LAUNCH_COUNT, BK_GPU_MLP_SOURCE_GPU, and BK_GPU_MLP_KERNEL_COUNT are provisional switches. Remove or replace them once the real estimator runner/package flow is settled.

Validation:

- bash -n and shellcheck -S error (run under WSL) on the changed shell scripts.
- test_send_estimate_artifacts.sh, test_qws_gpu_mlp_smoke_estimation.sh, test_estimation_gpu_kernel_mlp_v15.sh, test_genesis_gpu_mlp_estimation.sh, and test_send_results_profile_data.sh.
- result_server pytest under WSL, covering the API, upload limits, audit logging, CSRF, and rate limiting.
- git diff --check.
Signed-off-by: Yoshifumi Nakamura <nakamura@riken.jp>
yoshifuminakamura merged commit d197785 into develop on Jun 12, 2026
10 checks passed
yoshifuminakamura deleted the feature/add-gpu-mlp-estimation-package branch on June 12, 2026, 01:38