17 Apr 22:06

johnnygreco

8be4ff7

v0.5.7 2026-04-17 Latest

What's Changed

fix: restrict docs-preview workflow to actual docs changes by @andreatgretel in #515
chore: harden CI supply chain by @andreatgretel in #517
fix: bump pytest, aiohttp, and cryptography for security CVEs by @johnnygreco in #535
fix: tune Dependabot config and fix DCO assistant bugs by @andreatgretel in #534
ci: publish devnotes independently of releases by @andreatgretel in #536
fix: async engine side-effect column propagation and collision resolution by @andreatgretel in #509
ci: add PR hygiene automation (linked issue check + stale PR cleanup) by @andreatgretel in #521
ci: bump the all-actions group with 5 updates by @dependabot[bot] in #539
feat: add generic and OpenRouter attribution headers by @eric-tramel in #542
docs: add text-to-sql dev note by @dhruvnathawani in #349
fix: text-to-sql devnote date, images, and publish-devnotes nav by @andreatgretel in #546
fix(ci): replace yq with Python nav patching in publish-devnotes by @andreatgretel in #548
feat: add skip.when conditional column generation by @nabinchha in #502
fix: use pull_request_target for agentic CI on fork PRs by @andreatgretel in #541
docs: Added starter dev notes on push to hugging face hub by @nabinchha in #355
fix: bridge model.generate() to agenerate() for custom columns in async engine by @andreatgretel in #545
ci: add daily audit suites with 5 rotating recipes and scheduled workflow by @andreatgretel in #543
feat: add RunConfig jinja rendering engine by @eric-tramel in #557

New Contributors

@dependabot[bot] made their first contribution in #539

Full Changelog: v0.5.6...v0.5.7

Contributors

eric-tramel, nabinchha, and 4 other contributors

09 Apr 19:36

johnnygreco

v0.5.6

6505ce4

v0.5.6 2026-04-09

What's Changed

fix: use --bare and --tools in health probe CLI check by @andreatgretel in #489
feat: add ATIF rollout ingestion by @eric-tramel in #495
fix: replace native-model-client-hero image with corrected version by @nabinchha in #492
docs: add skip column config option for conditional column generation (#479) by @nabinchha in #480
chore: plan 427, PR 2 of agent-first development plan by @nabinchha in #478
feat: add Hermes Agent rollout support by @eric-tramel in #500
fix: prevent skill load failure when data-designer CLI is not installed by @johnnygreco in #501
ci: add PR review workflow and recipe for agentic CI by @andreatgretel in #498
docs: add agent rollout ingestion docs entry point by @eric-tramel in #499
docs: add async engine dev note by @andreatgretel in #490
fix: use non-blocking dispatch to prevent pipeline starvation by @nabinchha in #505
feat: add Pi Coding Agent rollout seed source by @johnnygreco in #514
fix: always return ISO-8601 from datetime postproc (#484) by @johnnygreco in #512
fix: include multi_modal_context columns in required_columns by @nabinchha in #522
docs: add LiteLLM supply-chain incident notice to README by @johnnygreco in #516

Full Changelog: v0.5.5...v0.5.6

Contributors

eric-tramel, nabinchha, and 2 other contributors

02 Apr 16:31

johnnygreco

v0.5.5

d43ac1c

v0.5.5 2026-04-02

What's Changed

fix: Claude Code marketplace plugin structure and install docs by @johnnygreco in #458
docs: update dev note with TL;DR tips and install instructions by @johnnygreco in #461
chore: remove unused .claude-plugin directory by @johnnygreco in #463
chore: async engine follow-up - rename, preview, lifecycle, progress by @andreatgretel in #456
docs: restructure agent and contributor documentation (plan 427, PR 1) by @nabinchha in #454
fix: address nspect vulnerability report for requests and cryptography by @johnnygreco in #475
fix: update health checks to use new ModelFacade client API by @andreatgretel in #470
ci: upgrade GitHub Actions for Node.js 24 compatibility by @ko3n1g in #450
chore: reduce Greptile review noise from defensive coding suggestions by @andreatgretel in #423
docs: consolidated seed reader documentation by @eric-tramel in #481
fix: bump pymdown-extensions for pygments 2.20.0 compat by @eric-tramel in #482
feat: add fr_FR locale to nemotron personas datasets by @johnnygreco in #468
docs: add native model client dev note by @nabinchha in #465
fix: respect max_parallel_requests in HTTP connection pool size by @przemekboruta in #460
docs: center diagram images in native model client dev note by @nabinchha in #483
docs: update architecture-and-performance.md to reflect AIMD changes by @nabinchha in #467
chore: update review code skill output and tone by @nabinchha in #477
ci: add agentic CI plan, health probe workflow, and recipe scaffold by @andreatgretel in #473
test: add transport-wiring regression tests for #459 by @nabinchha in #485

New Contributors

@ko3n1g made their first contribution in #450
@przemekboruta made their first contribution in #460

Full Changelog: v0.5.4...v0.5.5

Contributors

eric-tramel, nabinchha, and 4 other contributors

25 Mar 02:02

johnnygreco

v0.5.4

0a7b9e0

v0.5.4 2026-03-24

🔒 Note on the LiteLLM Supply Chain Incident (2026-03-24)

Earlier today, malicious versions of litellm (1.82.7 and 1.82.8) were published to PyPI containing a credential stealer that exfiltrates cloud credentials, SSH keys, and cryptocurrency wallets on any Python process startup.

Data Designer v0.5.4 removes litellm as a dependency entirely. We recommend all users upgrade.

For users on prior versions, here is our assessment:

v0.3.0 – v0.5.3: litellm has been pinned at >=1.73.6,<1.80.12 since January — no exposure to the compromised versions.
v0.2.2 and v0.2.3: These releases briefly carried an upper bound of <2, which in theory permitted resolution to the malicious 1.82.x versions. Both have been yanked from PyPI as a precaution.
Realistic risk from Data Designer is very low. Exposure would require a user pinned to one of the two yanked versions and running a fresh install or dependency update during the few hours the compromised package was live this morning. That said, please verify your installed versions and upgrade Data Designer to v0.5.4 at your earliest convenience.
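
One way to verify an environment is to compare the installed litellm version against the two compromised releases named above. The following is a minimal sketch using only the Python standard library; the names `COMPROMISED_LITELLM`, `is_compromised`, and `litellm_status` are illustrative and not part of Data Designer.

```python
from importlib import metadata

# The two malicious releases named in the notice above.
COMPROMISED_LITELLM = {"1.82.7", "1.82.8"}


def is_compromised(version: str) -> bool:
    """Return True if `version` is one of the known-malicious litellm releases."""
    return version.strip() in COMPROMISED_LITELLM


def litellm_status() -> str:
    """Describe the litellm install in the current environment."""
    try:
        installed = metadata.version("litellm")
    except metadata.PackageNotFoundError:
        return "litellm is not installed"
    if is_compromised(installed):
        return f"litellm {installed} is a compromised release"
    return f"litellm {installed} is not one of the compromised releases"


if __name__ == "__main__":
    print(litellm_status())
```

Running this inside each virtual environment that has Data Designer installed covers the scenario described above, since the check only sees packages visible to the interpreter that executes it.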

What's Changed

fix: correct broken dev note links in recipe pages by @dhruvnathawani in #407
docs: trace visualization in display_sample_record (#396) by @nabinchha in #397
chore: simplify tutorial 4 image dataset and use default model config by @nabinchha in #403
fix: preserve extra_body for LiteLLM to avoid UnsupportedParamsError (#409) by @nabinchha in #412
feat: normalize validator and constraint discriminators by @johnnygreco in #414
docs: add Open in Colab badges to tutorial notebooks by @mvansegbroeck in #391
feat: agent CLI introspection (simplified) by @johnnygreco in #415
fix: bump litellm lower bound to >=1.77.0 by @nabinchha in #417
feat: Improve generation failure reporting for schema and timeout failures by @eric-tramel in #416
feat: add built-in filesystem seed readers by @eric-tramel in #421
feat: add AsyncTaskScheduler and RowGroupBufferManager for async engine by @andreatgretel in #404
feat: Native OpenAI adapter with retry and AIMD throttle infrastructure by @nabinchha in #402
refactor: simplify agent CLI to context, types, and state (#418) by @johnnygreco in #420
feat: support 1-to-many FileSystemSeedReader hydration by @eric-tramel in #424
fix: support nested field access in schema transform templates by @andreatgretel in #435
chore: use uv run ruff in pre-commit hooks by @andreatgretel in #436
test: follow up FileSystemSeedReader coverage cleanup by @eric-tramel in #432
feat: Plan + Implementation for 392 managed storage improvements by @mikeknep in #393
feat: Native Anthropic adapter with shared HTTP client infrastructure by @nabinchha in #426
feat: add Data Designer skill by @johnnygreco in #434
feat: agent rollout trace ingestion by @eric-tramel in #399
feat: wire async task-queue scheduler into ColumnWiseDatasetBuilder by @andreatgretel in #429
feat: Constrain HttpModelClient to single concurrency mode... by @nabinchha in #439
docs: Updated telemetry by @kirit93 in #451
feat: add preview review reference and update interactive iterate step by @johnnygreco in #441
feat: add trace visualization to display_sample_record (#396) by @nabinchha in #438
docs: agent-assisted development plan for DataDesigner by @nabinchha in #428
feat: wire ThrottledModelClient and dual-semaphore scheduler by @nabinchha in #449
feat: remove litellm dependency and bridge path by @nabinchha in #455
feat: resolve data-designer command path before workflow execution by @johnnygreco in #440
docs: Data Designer Got Skills dev note by @johnnygreco in #457

New Contributors

@mvansegbroeck made their first contribution in #391

Full Changelog: v0.5.3...v0.5.4

Contributors

eric-tramel, mikeknep, and 6 other contributors

12 Mar 23:12

johnnygreco

v0.5.3

447ed59

v0.5.3 2026-03-12

What's Changed

fix: cache notebook builds to avoid flaky upstream model failures by @andreatgretel in #370
feat: canonical model client types, protocols, and LiteLLM bridge adapter by @nabinchha in #359
fix: processor artifacts type, discovery, and loading by @andreatgretel in #366
feat: add ExecutionGraph, CompletionTracker, and Task model for async scheduler by @andreatgretel in #356
fix: handle discriminated unions in oneOf pruning validator by @andreatgretel in #376
docs: account for vLLM reasoning field migration in plan 343 by @nabinchha in #377
chore: add Claude Code skill for code review by @nabinchha in #372
fix: replace removed DuckDB record_batch() with to_arrow_reader() by @andreatgretel in #380
fix: patch litellm ImageURLListItem to make index field optional (#384) by @nabinchha in #385
chore: improve test guidelines in AGENTS.md by @nabinchha in #387
fix: raise clear error when all records are dropped during generation by @nabinchha in #383
feat: add async generator migration with symmetric bridging and statefulness by @andreatgretel in #378
docs: add Enterprise Text-to-SQL and Search Agent recipes by @dhruvnathawani in #395
refactor: Decouple ModelFacade from LiteLLM via ModelClient adapter by @nabinchha in #373
docs: search agent dev note by @dhruvnathawani in #350
feat(cli): bootstrap default configs on CLI startup by @johnnygreco in #401
fix: pin chardet<6 to suppress RequestsDependencyWarning by @andreatgretel in #405
fix: add chardet<6 constraint to published engine package by @johnnygreco in #406

Full Changelog: v0.5.2...v0.5.3

Contributors

nabinchha, johnnygreco, and 2 other contributors

05 Mar 04:49

johnnygreco

v0.5.2

e2c94da

v0.5.2 2026-03-04

What's Changed

fix: repair notebook CI (dead model, missing API key, pyarrow type bug) by @andreatgretel in #348
docs: Update top models usage chart for 1/24-2/24/2026 by @kirit93 in #353
docs: add structured outputs SDG dev notes by @dhruvnathawani in #338
feat: add processor plugin support by @andreatgretel in #299
chore: plans for async generators and task-queue dataset builder by @andreatgretel in #347
chore: plans for model facade overhaul by @nabinchha in #344
fix: include seed dataset in builder repr for seed-only configs by @johnnygreco in #361
chore: bump cryptography and pillow for security fixes by @johnnygreco in #364
feat: add Streamable HTTP transport support for remote MCP providers by @nabinchha in #358
docs: update README token badge to 150+ billion by @johnnygreco in #367
docs: fix structure outputs blog format by @johnnygreco in #368
chore: fix inaccuracies and improve AGENTS.md by @nabinchha in #369
fix: include plugin column types in display_sample_record() by @3mei in #365

New Contributors

@3mei made their first contribution in #365

Full Changelog: v0.5.1...v0.5.2

Contributors

nabinchha, kirit93, and 4 other contributors

20 Feb 21:06

johnnygreco

v0.5.1

8f7a720

v0.5.1 2026-02-20

Data Designer now supports image generation!

What's Changed

docs: Updated url by @kirit93 in #325
docs: deep research trajectories with NDD and MCP tool use by @eric-tramel in #326
refactor: callback-based processor design by @andreatgretel in #294
feat: add image generation support with multi-modal context by @nabinchha in #317
docs: add image generation documentation and image-to-image editing tutorial by @nabinchha in #319
chore: move ArtifactStorage to engine/storage/ module by @nabinchha in #321
chore: gitignore Cerebro knowledge base files by @johnnygreco in #328
feat(engine): env-var switch for async-first models experiment by @eric-tramel in #280
docs: Moved nav to left hand side by @kirit93 in #331
feat: add --save-results option to preview command by @johnnygreco in #333
chore: Improve CLI startup with lazy heavy import cleanup by @johnnygreco in #330
feat: add allow_resize for 1:N and N:1 generation patterns by @andreatgretel in #286
chore: address Andre's feedback on --save-results and CLI preview by @johnnygreco in #335
chore: remove example_allow_resize.py from repo root by @andreatgretel in #337
fix: make DropColumnsProcessorConfig idempotent and support reasoning columns by @andreatgretel in #334
feat: add push_to_hub_from_folder classmethod for uploading saved datasets by @nabinchha in #340
fix: handle bool, int, float in convert_to_row_element by @dhruvnathawani in #336
feat: auto-detect ImageContext format for image-to-image generation by @nabinchha in #342

New Contributors

@dhruvnathawani made their first contribution in #336

Full Changelog: v0.5.0...v0.5.1

Contributors

eric-tramel, nabinchha, and 4 other contributors

11 Feb 22:22

johnnygreco

v0.5.0

631f1f9

v0.5.0 2026-02-11

🎨 NeMo Data Designer – v0.5.0 Release Notes

⚡Highlights

🛠️ MCP Tool Calling: LLM columns can now call external tools during generation via MCP!!
⚛️ Functions as custom column generators: The @custom_column_generator decorator lets users write their own column generation logic and plug it directly into a pipeline.
🤗 Hugging Face Hub integration: You can now publish generated datasets directly to the Hugging Face Hub with auto-generated dataset cards: results.push_to_hub().
- Huge thank you to @davidberenstein1957 for starting the design and work on this feature, as well as @davanstrien and @Wauplin for their help pushing it over the finish line!
💻 CLI generation commands: You can generate data from the CLI using the new preview, create, and validate commands.
🔍 LLM Observability: Set the new with_trace option on LLM configs to TraceType.ALL_MESSAGES or TraceType.LAST_MESSAGE to choose which messages are captured. You can also selectively extract reasoning content using extract_reasoning_content=True.

⚠️ Breaking Changes

with_trace is now a TraceType enum (NONE (default), LAST_MESSAGE, ALL_MESSAGES) rather than a boolean.
SingleColumnConfig is now isolated in its own base module, data_designer.config.base, to protect against circular imports during plugin discovery.
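
For code that still passes a boolean to with_trace, the migration can be sketched as a small mapping. The example below defines a local stand-in enum so that it runs without Data Designer installed; the member values and the `migrate_with_trace` helper are assumptions for illustration, and mapping `True` to `ALL_MESSAGES` is a guess based on the v0.4.x behavior of capturing the full message history. In real code, use the library's own `TraceType`.

```python
from enum import Enum
from typing import Union


class TraceType(Enum):
    """Local stand-in for the library's TraceType; member values are assumed."""

    NONE = "none"
    LAST_MESSAGE = "last_message"
    ALL_MESSAGES = "all_messages"


def migrate_with_trace(value: Union[bool, TraceType]) -> TraceType:
    """Map a legacy boolean with_trace value to a TraceType member."""
    if isinstance(value, TraceType):
        return value  # already migrated, pass through unchanged
    if isinstance(value, bool):
        # Assumption: the old True meant "keep the full conversation".
        return TraceType.ALL_MESSAGES if value else TraceType.NONE
    raise TypeError(f"unsupported with_trace value: {value!r}")


print(migrate_with_trace(True))
print(migrate_with_trace(False))
```

Passing anything other than a boolean or an enum member raises a `TypeError`, which makes leftover legacy values (for example strings read from an old config file) fail loudly instead of silently disabling traces.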

What's Changed

feat: MCP (Model Context Protocol) tool calling integration for LLM columns by @eric-tramel in #248
fix: normalize license header year format in mcp module by @johnnygreco in #279
chore: configure independent pytest settings per subpackage by @johnnygreco in #278
fix: normalize trace content blocks to prevent parquet write crashes by @eric-tramel in #283
feat: Add TraceType enum for granular trace control by @eric-tramel in #284
docs: add deployment, performance tuning guides and streamline gettin… by @kirit93 in #277
chore: update tutorial notebooks to use dd. notation consistently by @andreatgretel in #288
feat: add extract_reasoning_content option to LLM columns by @eric-tramel in #285
chore: add greptile.json to reduce review verbosity by @andreatgretel in #289
feat: switch from hatch-vcs to uv-dynamic-versioning by @johnnygreco in #282
revert: Remove RunConfig debug_trace_override by @eric-tramel in #290
perf: implement lazy loading for config module exports by @johnnygreco in #291
refactor: move SingleColumnConfig to config.base module by @johnnygreco in #287
feat: Add CustomColumnGenerator for user-defined column generation by @andreatgretel in #254
chore: standardize recipe script metadata and docstrings by @johnnygreco in #292
chore: enable status check in greptile.json by @dakshgup in #295
feat: add HuggingFace Hub integration for dataset publishing by @nabinchha in #275
docs: Added images for deployment options by @kirit93 in #297
docs: Add RQA dataset blog post and improve blog navigation by @kirit93 in #296
chore: quiet tool call logs and add tool usage statistics by @johnnygreco in #293
docs: Added documentation for seed datasets by @kirit93 in #300
docs: updated usage chart by @kirit93 in #304
docs: Update README.md by @kirit93 in #305
chore: update HF card citation copy and add library version to builder config by @johnnygreco in #303
chore: add tokens generated badge to README by @johnnygreco in #306
test: add provider health checks script and CI workflow by @andreatgretel in #301
chore: bump pytest, nbconvert, and pyjwt for vulnerability fixes by @johnnygreco in #312
fix: allow BuilderConfig round-trip serialization by @johnnygreco in #311
chore: export ConstraintType and InequalityOperator from config init by @johnnygreco in #308
docs: restructure plugin docs with multi-file layout and seed reader type by @johnnygreco in #302
docs: Added cat emoji sequence by @kirit93 in #316
fix: use reasoning_effort for gpt-5 inference params by @andreatgretel in #315
docs: New post on SDG design principles by @kirit93 in #318
feat: add preview, create, and validate CLI commands by @johnnygreco in #313
feat: support loading config files from HTTP(S) URLs by @johnnygreco in #323
fix: include CUSTOM type in execution DAG and warn on generator errors by @andreatgretel in #324
fix: trim LLM response content before parsing by @johnnygreco in #322

New Contributors

@dakshgup made their first contribution in #295
@davanstrien
@davidberenstein1957
@Wauplin

Full Changelog: v0.4.0...v0.5.0

Contributors

eric-tramel, nabinchha, and 7 other contributors

31 Jan 03:43

johnnygreco

v0.4.0

754ff71

v0.4.0 2026-01-30

🎨 NeMo Data Designer v0.4.0 Release Notes

✨ What's New

Message Traces: Capture the full conversation history during LLM generation, giving you access to system prompts, rendered user prompts, and model reasoning for downstream use cases. Enable per-column with with_trace=True or globally via RunConfig.
Multi-Image Support: Pass multiple images per column in multi-modal contexts for richer vision-based generation.
Expanded Code Languages: Added support for Bash, C, C++, C#, and COBOL in LLMCodeColumnConfig.
Progress Logging: Progress updates during LLM-column generation for better visibility into long-running jobs.

💥 Breaking Change: Import structure

The essentials module has been removed in favor of a cleaner import pattern. Configuration classes are now accessed via data_designer.config and the main interface via data_designer.interface.

Before (v0.3.x):

from data_designer.essentials import (
    CategorySamplerParams,
    DataDesigner,
    DataDesignerConfigBuilder,
    LLMTextColumnConfig,
    SamplerColumnConfig,
    SamplerType,
)

data_designer = DataDesigner()
config_builder = DataDesignerConfigBuilder()

After (v0.4.x):

import data_designer.config as dd
from data_designer.interface import DataDesigner

data_designer = DataDesigner()
config_builder = dd.DataDesignerConfigBuilder()

# Configuration classes are accessed via the `dd` namespace
config_builder.add_column(
    dd.SamplerColumnConfig(
        name="category",
        sampler_type=dd.SamplerType.CATEGORY,
        params=dd.CategorySamplerParams(values=["A", "B"]),
    )
)

💥 Breaking Change: Reasoning traces → Message traces

The automatic __reasoning_trace columns have been replaced with opt-in message traces that capture the full conversation history.

Key changes:

Column postfix renamed from __reasoning_trace to __trace
Traces are now opt-in rather than automatic
Traces capture the full message history (system/user/assistant), including retry conversations

Before (v0.3.x):

Reasoning traces were automatically generated as side-effect columns for extended thinking models:

# Traces were automatic - no configuration needed
# Column "answer" would automatically produce "answer__reasoning_trace"

After (v0.4.x):

Enable traces explicitly per-column or globally:

Per-column (recommended):

import data_designer.config as dd

config_builder = dd.DataDesignerConfigBuilder()
config_builder.add_column(
    dd.LLMTextColumnConfig(
        name="answer",
        prompt="Answer: {{ question }}",
        model_alias="nvidia-text",
        with_trace=True,  # Opt-in to trace capture
    )
)
# Produces "answer" and "answer__trace" columns

Global debug override:

import data_designer.config as dd
from data_designer.interface import DataDesigner

data_designer = DataDesigner()
data_designer.set_run_config(
    dd.RunConfig(debug_override_save_all_column_traces=True)
)

The trace data structure is now a list[dict] capturing the ordered message history:

[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2+2?"},
    {"role": "assistant", "content": "4", "reasoning_content": None}
]
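
Given that shape, pulling the final answer and any reasoning out of a trace is a short list traversal. The sketch below works on a plain `list[dict]` like the one above; the helper name `last_assistant_message` is hypothetical and not part of Data Designer.

```python
from typing import Optional

trace = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2+2?"},
    {"role": "assistant", "content": "4", "reasoning_content": None},
]


def last_assistant_message(messages: list[dict]) -> Optional[dict]:
    """Return the most recent assistant message, or None if there is none."""
    for message in reversed(messages):
        if message.get("role") == "assistant":
            return message
    return None


final = last_assistant_message(trace)
print(final["content"])                # prints: 4
print(final.get("reasoning_content"))  # prints: None
```

Because traces can include retry conversations, scanning from the end of the list picks the assistant message from the final attempt and skips any earlier ones.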

What's Changed

feat: Add /create-pr skill for well-formatted GitHub PRs by @johnnygreco in #247
docs: Fix mkdocs syntax and update person sampling documentation by @johnnygreco in #249
refactor: slim package refactor into three subpackages by @johnnygreco in #240
chore: add publish script and update license headers by @johnnygreco in #253
chore: add CODEOWNERS for automatic PR review assignment by @andreatgretel in #251
feat: allow skipping health checks by @nabinchha in #244
chore: copy README to data-designer package during install by @johnnygreco in #256
feat: support multiple images per column in image context by @nabinchha in #257
fix: escape special characters in SchemaTransformProcessor JSON templates by @andreatgretel in #250
chore: update telemetry by @johntmyers in #261
feat: add /update-pr skill and improve /create-pr file linking by @johnnygreco in #258
feat: Add /commit skill for conventional commit messages by @johnnygreco in #252
fix: automate README sync for data-designer package builds by @andreatgretel in #266
chore: simplify publish script by removing redundant rebuild step by @johnnygreco in #268
feat: add job progress logging for cell-by-cell generation by @eric-tramel in #259
feat: add message trace support for LLM generation by @johnnygreco in #272
chore: add animated emoji progress indicators to progress tracker by @johnnygreco in #273
feat: Add Phase 1 languages (Bash, C, C++, C#, COBOL) to CodeLang by @kirit93 in #271
fix: ensure 100% progress is logged exactly once by @johnnygreco in #276

Full Changelog: v0.3.8...v0.4.0

Contributors

eric-tramel, nabinchha, and 4 other contributors

27 Jan 02:28

johnnygreco

v0.3.8

5402b7d

v0.3.8 2026-01-26

👀 New Nemotron-Personas Datasets

PersonSampler supports two new locales:

Nemotron-Personas-Singapore (locale = en_SG)
Nemotron-Personas-Brazil (locale = pt_BR)

What's Changed

fix: unblock generation when no from-scratch-generator is configured by @nabinchha in #231
fix: do not attempt to deserialize llm text response by @nabinchha in #233
docs: Updated recipe card by @kirit93 in #153
fix: no api key warning on default model providers by @nabinchha in #238
feat: Support for Claude Skills (DevX and Generation) by @eric-tramel in #239
feat: Elevate non-LLM concurrency limits to RunConfig by @eric-tramel in #242
feat: wire up pt_GB and en_SG personas by @johnnygreco in #245

Full Changelog: v0.3.7...v0.3.8

Contributors

eric-tramel, nabinchha, and 2 other contributors

Releases: NVIDIA-NeMo/DataDesigner
