Taking a stab at the first audio models with CLAP by deven96 · Pull Request #297 · deven96/ahnlich

deven96 · 2026-02-18T08:33:42Z

No description provided.

…rch support

Add Audio as a first-class input type across the full stack: proto, Rust, Go, and Python SDKs. Introduce ClapAudio and ClapText model variants backed by laion/larger_clap_music_and_speech ONNX weights, embedding audio and text into a shared 512-dim space for audio+text multimodal similarity search.

- protos: add audio to StoreInput and MetadataValue oneofs; add CLAP_AUDIO, CLAP_TEXT to AIModel enum and AUDIO to AIStoreInputType
- types: wire Audio through MetadataValue serde (aud: prefix), StoreInput conversion, and AiStoreInputType mapping
- ai: add ModelType::Audio, ORTModality::Audio, ModelInput::Audios, and ORTAudioPreprocessor (symphonia decode → mono mix → rubato resample to 48kHz → pad/truncate to 10s → batched ndarray)
- ai: add batch_inference_audio to SingleStageModel with L2 normalisation
- dsl: add audio rule (/a<hex> syntax) to grammar and metadata parser
- sdks: regenerate Go (buf) and Python (betterproto) from updated protos
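The waveform stages of the preprocessor described above (mono mix, then pad/truncate to 10 s at 48 kHz) can be sketched roughly as follows. This is a minimal Python illustration, not the Rust ORTAudioPreprocessor itself: decoding and resampling are skipped, the input is assumed to be already at 48 kHz, and all function names are hypothetical.

```python
# Hypothetical sketch of the mono-mix and pad/truncate stages.
# Assumes frames are already decoded and resampled to 48 kHz.
TARGET_SR = 48_000                       # sample rate named in the commit message
CLIP_SECONDS = 10                        # clip length named in the commit message
TARGET_LEN = TARGET_SR * CLIP_SECONDS    # 480_000 samples per clip


def mix_to_mono(frames):
    """Average the channels of each frame: [(l, r), ...] -> [m, ...]."""
    return [sum(frame) / len(frame) for frame in frames]


def pad_or_truncate(samples, target_len=TARGET_LEN):
    """Zero-pad short clips and cut long ones so every clip has target_len samples."""
    if len(samples) >= target_len:
        return samples[:target_len]
    return samples + [0.0] * (target_len - len(samples))


def to_batch(clips):
    """Turn a list of multi-channel clips into equal-length mono rows."""
    return [pad_or_truncate(mix_to_mono(clip)) for clip in clips]
```

Because every row ends up with the same length, the rows can be stacked into a single batch×samples array before feature extraction.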

…sor wiring

- Fix log-Mel spectrogram: rand_trunc path (1 view, shape B×1×1000×64), Slaney-normalised filterbank, f_min=50 Hz, nb_max_frames=1000
- Add AudioInput struct replacing raw waveform array
- Fix batch_inference_text to omit attention_mask when model doesn't require it (ClapText ONNX only takes input_ids)
- Handle 2D model outputs in postprocess_text_output (CLAP/CLIP projection encoders)
- Add ORTPostprocessor::Audio variant; fix ORTTextPostprocessor to support ClapText
- Add cross-modal retrieval integration test with real audio files
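The attention_mask fix can be pictured as building the model's input map from the input names the model actually declares. The sketch below is a hypothetical Python illustration of that idea only; `build_text_inputs` and its parameters are assumptions, not names from the ahnlich codebase.

```python
# Hypothetical sketch: only feed the tensors a model declares as inputs.
def build_text_inputs(model_input_names, input_ids, attention_mask):
    """Return the feed map for a text encoder.

    model_input_names: names the exported model declares as inputs.
    input_ids / attention_mask: tokenizer outputs (nested lists here).
    """
    feeds = {"input_ids": input_ids}
    # Some exports (the commit message cites ClapText) take only input_ids,
    # so the mask is added only when the model asks for it.
    if "attention_mask" in model_input_names:
        feeds["attention_mask"] = attention_mask
    return feeds
```

Passing an undeclared input to an inference session is typically rejected, which is why the mask has to be dropped rather than ignored.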

…Text max tokens

Add two tests alongside the existing cross-modal one: audio queried against audio (identity retrieval) and text queried against text (identity retrieval). Refactor shared boilerplate into helpers.

Also fix ClapText max_input_tokens from 77 to 512: it uses a RoBERTa tokenizer, not CLIP's 77-token window.
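An identity-retrieval check of the kind described here boils down to: embed an item, query the store with that same embedding, and expect the item itself to rank first. A minimal Python sketch, assuming L2-normalised embeddings compared by dot product; the store keys and vectors are made up for illustration and are not taken from the actual tests.

```python
import math


def l2_normalise(vec):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def dot(a, b):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def nearest(query, store):
    """Return the key in store whose embedding is most similar to query."""
    q = l2_normalise(query)
    return max(store, key=lambda key: dot(q, l2_normalise(store[key])))
```

With unit-length vectors, an item's similarity to itself is 1.0, the maximum possible, so querying with a stored embedding should return its own key.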


github-actions · 2026-02-20T15:30:06Z

Test Results

270 tests: 270 ✅ passed, 0 💤 skipped, 0 ❌ failed
35 suites, 4 files
Duration: 11m 52s ⏱️

Results for commit 63c101a.

♻️ This comment has been updated with latest results.

github-actions · 2026-02-20T15:54:33Z

Benchmark Results

group                                                        main                                   pr
-----                                                        ----                                   --
predicate_query_with_index/size_100                          1.00      3.1±0.00µs        ? ?/sec    1.02      3.2±0.00µs        ? ?/sec
predicate_query_with_index/size_1000                         1.10     33.1±0.03µs        ? ?/sec    1.00     30.0±0.03µs        ? ?/sec
predicate_query_with_index/size_10000                        1.00    392.3±0.36µs        ? ?/sec    1.01    397.9±0.45µs        ? ?/sec
predicate_query_with_index/size_100000                       1.00      5.4±0.14ms        ? ?/sec    1.03      5.5±0.19ms        ? ?/sec
predicate_query_without_index/size_100                       1.01      7.1±0.01µs        ? ?/sec    1.00      7.0±0.00µs        ? ?/sec
predicate_query_without_index/size_1000                      1.00     95.9±0.38µs        ? ?/sec    1.03     98.8±0.07µs        ? ?/sec
predicate_query_without_index/size_10000                     1.00    810.1±3.09µs        ? ?/sec    1.01    818.4±2.80µs        ? ?/sec
predicate_query_without_index/size_100000                    1.17     16.9±0.62ms        ? ?/sec    1.00     14.5±0.37ms        ? ?/sec
store_batch_insertion_without_predicates/size_100            1.05    237.7±7.11µs        ? ?/sec    1.00    227.3±1.95µs        ? ?/sec
store_batch_insertion_without_predicates/size_1000           1.02  1230.3±25.75µs        ? ?/sec    1.00   1209.3±8.76µs        ? ?/sec
store_batch_insertion_without_predicates/size_10000          1.01     13.3±0.12ms        ? ?/sec    1.00     13.2±0.13ms        ? ?/sec
store_batch_insertion_without_predicates/size_100000         1.00    130.4±0.66ms        ? ?/sec    1.00    130.6±0.74ms        ? ?/sec
store_retrieval_no_condition/size_100                        1.00    111.5±0.75µs        ? ?/sec    1.01    112.3±0.79µs        ? ?/sec
store_retrieval_no_condition/size_1000                       1.00   774.4±11.08µs        ? ?/sec    1.01    779.6±6.70µs        ? ?/sec
store_retrieval_no_condition/size_10000                      1.00      7.2±0.03ms        ? ?/sec    1.00      7.2±0.04ms        ? ?/sec
store_retrieval_no_condition/size_100000                     1.00     78.8±0.21ms        ? ?/sec    1.00     78.9±0.22ms        ? ?/sec
store_retrieval_non_linear_kdtree/size_100                   1.00    189.9±0.62µs        ? ?/sec    1.01    192.7±0.79µs        ? ?/sec
store_retrieval_non_linear_kdtree/size_1000                  1.00   1142.1±1.98µs        ? ?/sec    1.00   1145.9±1.76µs        ? ?/sec
store_retrieval_non_linear_kdtree/size_10000                 1.00     12.1±0.07ms        ? ?/sec    1.01     12.1±0.24ms        ? ?/sec
store_retrieval_non_linear_kdtree/size_100000                1.00    138.5±1.14ms        ? ?/sec    1.00    138.1±1.11ms        ? ?/sec
store_sequential_insertion_without_predicates/size_100       1.00    273.1±0.61µs        ? ?/sec    1.01    275.3±1.00µs        ? ?/sec
store_sequential_insertion_without_predicates/size_1000      1.00      2.7±0.00ms        ? ?/sec    1.00      2.7±0.00ms        ? ?/sec
store_sequential_insertion_without_predicates/size_10000     1.00     26.9±0.13ms        ? ?/sec    1.01     27.1±0.01ms        ? ?/sec
store_sequential_insertion_without_predicates/size_100000    1.00    268.0±0.45ms        ? ?/sec    1.01    271.0±0.70ms        ? ?/sec
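The leading number in each column appears to be a relative factor: the faster side of a row is shown as 1.00 and the slower side as its time divided by the faster time. A small Python sketch of that arithmetic, assuming this reading of the table is right; `relative_factors` is a hypothetical helper, not part of any benchmark tool.

```python
def relative_factors(main_time, pr_time):
    """Return (main, pr) factors relative to the faster of the two timings.

    Both timings must be in the same unit (for example, microseconds).
    """
    fastest = min(main_time, pr_time)
    return round(main_time / fastest, 2), round(pr_time / fastest, 2)


# predicate_query_with_index/size_1000: 33.1 µs on main, 30.0 µs on pr
size_1000 = relative_factors(33.1, 30.0)

# predicate_query_without_index/size_100000: 16.9 ms on main, 14.5 ms on pr
size_100000 = relative_factors(16.9, 14.5)
```

The printed timings are themselves rounded, so recomputing from them can differ from the table in the last digit (3.2 / 3.1 gives 1.03 where the table shows 1.02). Most rows stay within a few percent, which is comparable to the ± spread reported next to each timing.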

deven96 added 6 commits February 16, 2026 14:40

feat(ai): add trait-based architecture for multi-stage models

3ef0799

feat(ai): implement Buffalo_L face recognition model

01e1453

Merge branch 'main' into feat/buffalo-l-face-recognition

82d62c7

deven96 requested review from HAKSOAT and Iamdavidonuh February 18, 2026 08:33

chore(go-sdk): apply goimports formatting to regenerated pb.go files

83e0bca

Iamdavidonuh approved these changes Feb 18, 2026


deven96 added 10 commits February 18, 2026 10:03

chore: point all model repos to deven96 HuggingFace forks

a952efd

Merge branch 'main' into feat/buffalo-l-face-recognition

199bb77

chore(protos): regenerate Node client for BUFFALO_L model

e4f00a3

Merge branch 'feat/buffalo-l-face-recognition' into feat/clap-audio-e…

64d4b46

…mbeddings

docs: add Node.js and Go client links to README

0b0d64d

docs: add Node.js SDK README and fix main README npm link

f729d99

fix(ai): return InvalidArgument when query image contains multiple faces

725a074

Merge branch 'feat/buffalo-l-face-recognition' into feat/clap-audio-e…

3a4ccce

…mbeddings

Use explicit errors for audio files above 10s rather than implicit tr…

e6d3a81

…uncation
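Rejecting over-long audio explicitly, rather than silently truncating it, could look roughly like the check below. This is a hypothetical Python sketch: the 10 s limit comes from the commit messages, while the error type, function name, and message are assumptions for illustration.

```python
MAX_SECONDS = 10.0  # limit named in the commit messages


class AudioTooLongError(ValueError):
    """Hypothetical error raised when a clip exceeds the supported duration."""


def check_duration(num_samples, sample_rate, max_seconds=MAX_SECONDS):
    """Return the clip duration in seconds, or raise if it exceeds the limit."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    duration = num_samples / sample_rate
    if duration > max_seconds:
        raise AudioTooLongError(
            f"audio is {duration:.2f}s long; maximum supported is {max_seconds:.0f}s"
        )
    return duration
```

An explicit error tells the caller that part of the clip would otherwise have been ignored, so they can split or trim the audio themselves.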

Merge commit '3a4ccceb' into feat/clap-audio-embeddings

3c63e6e

# Conflicts:
#	ahnlich/ai/src/tests/buffalo_l_test.rs

deven96 force-pushed the feat/buffalo-l-face-recognition branch 8 times, most recently from 56c3ab4 to 5bc63a4 Compare February 20, 2026 14:32

Base automatically changed from feat/buffalo-l-face-recognition to main February 20, 2026 14:45

Merge origin/main into feat/clap-audio-embeddings

ad36d9e

- Resolve model numbering: SFACE_YUNET=8, CLAP_AUDIO=9, CLAP_TEXT=10
- Integrate model_params infrastructure from main
- Combine Buffalo_L and SFaceYunet face recognition features
- Regenerate and format all SDKs
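The numbering matters because each enum member needs its own number, and two branches that each add a model can easily claim the same one. The Python sketch below shows only the three values named in this commit message (the rest of the AIModel enum is omitted) and uses `enum.unique` as a stand-in guard; it is an illustration, not the generated SDK code.

```python
from enum import IntEnum, unique


@unique  # raises ValueError at class creation if two members share a value
class AIModel(IntEnum):
    """Partial, illustrative view of the model enum after the merge."""
    SFACE_YUNET = 8
    CLAP_AUDIO = 9
    CLAP_TEXT = 10
```

Without such a guard, a duplicated number in a Python enum silently becomes an alias of the first member, which is the kind of clash the renumbering avoids.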

deven96 force-pushed the feat/clap-audio-embeddings branch 2 times, most recently from c427492 to d79c446 Compare February 20, 2026 15:46

deven96 force-pushed the feat/clap-audio-embeddings branch 5 times, most recently from 01afeee to f11adcf Compare February 21, 2026 03:31

fix: handle Audio input in SFaceYunet and fix unused variable warnings

63c101a

deven96 force-pushed the feat/clap-audio-embeddings branch from f11adcf to 63c101a Compare February 21, 2026 03:34

deven96 merged commit 115767f into main Feb 21, 2026
7 checks passed

deven96 deleted the feat/clap-audio-embeddings branch February 21, 2026 03:51


deven96 merged 19 commits into main from feat/clap-audio-embeddings

2 participants
