Releases: NVIDIA-NeMo/DataDesigner
v0.5.7 2026-04-17
What's Changed
- fix: restrict docs-preview workflow to actual docs changes by @andreatgretel in #515
- chore: harden CI supply chain by @andreatgretel in #517
- fix: bump pytest, aiohttp, and cryptography for security CVEs by @johnnygreco in #535
- fix: tune Dependabot config and fix DCO assistant bugs by @andreatgretel in #534
- ci: publish devnotes independently of releases by @andreatgretel in #536
- fix: async engine side-effect column propagation and collision resolution by @andreatgretel in #509
- ci: add PR hygiene automation (linked issue check + stale PR cleanup) by @andreatgretel in #521
- ci: bump the all-actions group with 5 updates by @dependabot[bot] in #539
- feat: add generic and OpenRouter attribution headers by @eric-tramel in #542
- docs: add text-to-sql dev note by @dhruvnathawani in #349
- fix: text-to-sql devnote date, images, and publish-devnotes nav by @andreatgretel in #546
- fix(ci): replace yq with Python nav patching in publish-devnotes by @andreatgretel in #548
- feat: add skip.when conditional column generation by @nabinchha in #502
- fix: use pull_request_target for agentic CI on fork PRs by @andreatgretel in #541
- docs: Added starter dev notes on push to hugging face hub by @nabinchha in #355
- fix: bridge model.generate() to agenerate() for custom columns in async engine by @andreatgretel in #545
- ci: add daily audit suites with 5 rotating recipes and scheduled workflow by @andreatgretel in #543
- feat: add RunConfig jinja rendering engine by @eric-tramel in #557
New Contributors
- @dependabot[bot] made their first contribution in #539
Full Changelog: v0.5.6...v0.5.7
v0.5.6 2026-04-09
What's Changed
- fix: use --bare and --tools in health probe CLI check by @andreatgretel in #489
- feat: add ATIF rollout ingestion by @eric-tramel in #495
- fix: replace native-model-client-hero image with corrected version by @nabinchha in #492
- docs: add skip column config option for conditional column generation (#479) by @nabinchha in #480
- chore: plan 427, PR 2 of agent-first development plan by @nabinchha in #478
- feat: add Hermes Agent rollout support by @eric-tramel in #500
- fix: prevent skill load failure when data-designer CLI is not installed by @johnnygreco in #501
- ci: add PR review workflow and recipe for agentic CI by @andreatgretel in #498
- docs: add agent rollout ingestion docs entry point by @eric-tramel in #499
- docs: add async engine dev note by @andreatgretel in #490
- fix: use non-blocking dispatch to prevent pipeline starvation by @nabinchha in #505
- feat: add Pi Coding Agent rollout seed source by @johnnygreco in #514
- fix: always return ISO-8601 from datetime postproc (#484) by @johnnygreco in #512
- fix: include multi_modal_context columns in required_columns by @nabinchha in #522
- docs: add LiteLLM supply-chain incident notice to README by @johnnygreco in #516
Full Changelog: v0.5.5...v0.5.6
v0.5.5 2026-04-02
What's Changed
- fix: Claude Code marketplace plugin structure and install docs by @johnnygreco in #458
- docs: update dev note with TL;DR tips and install instructions by @johnnygreco in #461
- chore: remove unused .claude-plugin directory by @johnnygreco in #463
- chore: async engine follow-up - rename, preview, lifecycle, progress by @andreatgretel in #456
- docs: restructure agent and contributor documentation (plan 427, PR 1) by @nabinchha in #454
- fix: address nspect vulnerability report for requests and cryptography by @johnnygreco in #475
- fix: update health checks to use new ModelFacade client API by @andreatgretel in #470
- ci: upgrade GitHub Actions for Node.js 24 compatibility by @ko3n1g in #450
- chore: reduce Greptile review noise from defensive coding suggestions by @andreatgretel in #423
- docs: consolidated seed reader documentation by @eric-tramel in #481
- fix: bump pymdown-extensions for pygments 2.20.0 compat by @eric-tramel in #482
- feat: add fr_FR locale to nemotron personas datasets by @johnnygreco in #468
- docs: add native model client dev note by @nabinchha in #465
- fix: respect max_parallel_requests in HTTP connection pool size by @przemekboruta in #460
- docs: center diagram images in native model client dev note by @nabinchha in #483
- docs: update architecture-and-performance.md to reflect AIMD changes by @nabinchha in #467
- chore: update review code skill output and tone by @nabinchha in #477
- ci: add agentic CI plan, health probe workflow, and recipe scaffold by @andreatgretel in #473
- test: add transport-wiring regression tests for #459 by @nabinchha in #485
New Contributors
- @ko3n1g made their first contribution in #450
- @przemekboruta made their first contribution in #460
Full Changelog: v0.5.4...v0.5.5
v0.5.4 2026-03-24
🔒 Note on the LiteLLM Supply Chain Incident (2026-03-24)
Earlier today, malicious versions of litellm (1.82.7 and 1.82.8) were published to PyPI containing a credential stealer that exfiltrates cloud credentials, SSH keys, and cryptocurrency wallets on any Python process startup.
Data Designer v0.5.4 removes litellm as a dependency entirely. We recommend all users upgrade.
For users on prior versions, here is our assessment:
- v0.3.0 – v0.5.3: litellm has been pinned at >=1.73.6,<1.80.12 since January, so there is no exposure to the compromised versions.
- v0.2.2 and v0.2.3: These releases briefly carried an upper bound of <2, which in theory permitted resolution to the malicious 1.82.x versions. Both have been yanked from PyPI as a precaution.
- Realistic risk from Data Designer is very low. Exposure would require a user pinned to one of the two yanked versions and running a fresh install or dependency update during the few hours the compromised package was live this morning. That said, please verify your installed versions and upgrade Data Designer to v0.5.4 at your earliest convenience.
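One way to verify the installed version, as the notice asks, is sketched below using only the Python standard library. The helper names is_compromised and check_litellm are hypothetical and not part of Data Designer; the two version strings are the ones named in the notice above.

```python
from importlib import metadata

# Versions named in the notice above as malicious.
COMPROMISED = {"1.82.7", "1.82.8"}


def is_compromised(version: str) -> bool:
    """Return True if the version string is one of the known-bad litellm releases."""
    return version.strip() in COMPROMISED


def check_litellm() -> str:
    """Describe the litellm installation found in the current environment.

    Reads package metadata only; it does not import litellm itself.
    """
    try:
        installed = metadata.version("litellm")
    except metadata.PackageNotFoundError:
        return "litellm is not installed"
    if is_compromised(installed):
        return f"litellm {installed} is one of the compromised versions"
    return f"litellm {installed} is not one of the compromised versions"


if __name__ == "__main__":
    print(check_litellm())
```

Running this inside each virtual environment that has Data Designer installed reports which litellm version, if any, is present there.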
What's Changed
- fix: correct broken dev note links in recipe pages by @dhruvnathawani in #407
- docs: trace visualization in display_sample_record (#396) by @nabinchha in #397
- chore: simplify tutorial 4 image dataset and use default model config by @nabinchha in #403
- fix: preserve extra_body for LiteLLM to avoid UnsupportedParamsError (#409) by @nabinchha in #412
- feat: normalize validator and constraint discriminators by @johnnygreco in #414
- docs: add Open in Colab badges to tutorial notebooks by @mvansegbroeck in #391
- feat: agent CLI introspection (simplified) by @johnnygreco in #415
- fix: bump litellm lower bound to >=1.77.0 by @nabinchha in #417
- feat: Improve generation failure reporting for schema and timeout failures by @eric-tramel in #416
- feat: add built-in filesystem seed readers by @eric-tramel in #421
- feat: add AsyncTaskScheduler and RowGroupBufferManager for async engine by @andreatgretel in #404
- feat: Native OpenAI adapter with retry and AIMD throttle infrastructure by @nabinchha in #402
- refactor: simplify agent CLI to context, types, and state (#418) by @johnnygreco in #420
- feat: support 1-to-many FileSystemSeedReader hydration by @eric-tramel in #424
- fix: support nested field access in schema transform templates by @andreatgretel in #435
- chore: use uv run ruff in pre-commit hooks by @andreatgretel in #436
- test: follow up FileSystemSeedReader coverage cleanup by @eric-tramel in #432
- feat: Plan + Implementation for 392 managed storage improvements by @mikeknep in #393
- feat: Native Anthropic adapter with shared HTTP client infrastructure by @nabinchha in #426
- feat: add Data Designer skill by @johnnygreco in #434
- feat: agent rollout trace ingestion by @eric-tramel in #399
- feat: wire async task-queue scheduler into ColumnWiseDatasetBuilder by @andreatgretel in #429
- feat: Constrain HttpModelClient to single concurrency mode... by @nabinchha in #439
- docs: Updated telemetry by @kirit93 in #451
- feat: add preview review reference and update interactive iterate step by @johnnygreco in #441
- feat: add trace visualization to display_sample_record (#396) by @nabinchha in #438
- docs: agent-assisted development plan for DataDesigner by @nabinchha in #428
- feat: wire ThrottledModelClient and dual-semaphore scheduler by @nabinchha in #449
- feat: remove litellm dependency and bridge path by @nabinchha in #455
- feat: resolve data-designer command path before workflow execution by @johnnygreco in #440
- docs: Data Designer Got Skills dev note by @johnnygreco in #457
New Contributors
- @mvansegbroeck made their first contribution in #391
Full Changelog: v0.5.3...v0.5.4
v0.5.3 2026-03-12
What's Changed
- fix: cache notebook builds to avoid flaky upstream model failures by @andreatgretel in #370
- feat: canonical model client types, protocols, and LiteLLM bridge adapter by @nabinchha in #359
- fix: processor artifacts type, discovery, and loading by @andreatgretel in #366
- feat: add ExecutionGraph, CompletionTracker, and Task model for async scheduler by @andreatgretel in #356
- fix: handle discriminated unions in oneOf pruning validator by @andreatgretel in #376
- docs: account for vLLM reasoning field migration in plan 343 by @nabinchha in #377
- chore: add Claude Code skill for code review by @nabinchha in #372
- fix: replace removed DuckDB record_batch() with to_arrow_reader() by @andreatgretel in #380
- fix: patch litellm ImageURLListItem to make index field optional (#384) by @nabinchha in #385
- chore: improve test guidelines in AGENTS.md by @nabinchha in #387
- fix: raise clear error when all records are dropped during generation by @nabinchha in #383
- feat: add async generator migration with symmetric bridging and statefulness by @andreatgretel in #378
- docs: add Enterprise Text-to-SQL and Search Agent recipes by @dhruvnathawani in #395
- refactor: Decouple ModelFacade from LiteLLM via ModelClient adapter by @nabinchha in #373
- docs: search agent dev note by @dhruvnathawani in #350
- feat(cli): bootstrap default configs on CLI startup by @johnnygreco in #401
- fix: pin chardet<6 to suppress RequestsDependencyWarning by @andreatgretel in #405
- fix: add chardet<6 constraint to published engine package by @johnnygreco in #406
Full Changelog: v0.5.2...v0.5.3
v0.5.2 2026-03-04
What's Changed
- fix: repair notebook CI (dead model, missing API key, pyarrow type bug) by @andreatgretel in #348
- docs: Update top models usage chart for 1/24-2/24/2026 by @kirit93 in #353
- docs: add structured outputs SDG dev notes by @dhruvnathawani in #338
- feat: add processor plugin support by @andreatgretel in #299
- chore: plans for async generators and task-queue dataset builder by @andreatgretel in #347
- chore: plans for model facade overhaul by @nabinchha in #344
- fix: include seed dataset in builder repr for seed-only configs by @johnnygreco in #361
- chore: bump cryptography and pillow for security fixes by @johnnygreco in #364
- feat: add Streamable HTTP transport support for remote MCP providers by @nabinchha in #358
- docs: update README token badge to 150+ billion by @johnnygreco in #367
- docs: fix structure outputs blog format by @johnnygreco in #368
- chore: fix inaccuracies and improve AGENTS.md by @nabinchha in #369
- fix: include plugin column types in display_sample_record() by @3mei in #365
New Contributors
Full Changelog: v0.5.1...v0.5.2
v0.5.1 2026-02-20
Data Designer now supports image generation!
What's Changed
- docs: Updated url by @kirit93 in #325
- docs: deep research trajectories with NDD and MCP tool use by @eric-tramel in #326
- refactor: callback-based processor design by @andreatgretel in #294
- feat: add image generation support with multi-modal context by @nabinchha in #317
- docs: add image generation documentation and image-to-image editing tutorial by @nabinchha in #319
- chore: move ArtifactStorage to engine/storage/ module by @nabinchha in #321
- chore: gitignore Cerebro knowledge base files by @johnnygreco in #328
- feat(engine): env-var switch for async-first models experiment by @eric-tramel in #280
- docs: Moved nav to left hand side by @kirit93 in #331
- feat: add --save-results option to preview command by @johnnygreco in #333
- chore: Improve CLI startup with lazy heavy import cleanup by @johnnygreco in #330
- feat: add allow_resize for 1:N and N:1 generation patterns by @andreatgretel in #286
- chore: address Andre's feedback on --save-results and CLI preview by @johnnygreco in #335
- chore: remove example_allow_resize.py from repo root by @andreatgretel in #337
- fix: make DropColumnsProcessorConfig idempotent and support reasoning columns by @andreatgretel in #334
- feat: add push_to_hub_from_folder classmethod for uploading saved datasets by @nabinchha in #340
- fix: handle bool, int, float in convert_to_row_element by @dhruvnathawani in #336
- feat: auto-detect ImageContext format for image-to-image generation by @nabinchha in #342
New Contributors
- @dhruvnathawani made their first contribution in #336
Full Changelog: v0.5.0...v0.5.1
v0.5.0 2026-02-11
🎨 NeMo Data Designer – v0.5.0 Release Notes
⚡Highlights
- 🛠️ MCP Tool Calling: LLM columns can now call external tools during generation via MCP!
- ⚛️ Functions as custom column generators: The @custom_column_generator decorator lets users write their own column generation logic and plug it directly into a pipeline.
- 🤗 Hugging Face Hub integration: You can now publish generated datasets directly to the Hugging Face Hub with auto-generated dataset cards: results.push_to_hub().
  - Huge thank you to @davidberenstein1957 for starting the design and work on this feature, as well as @davanstrien and @Wauplin for their help pushing it over the finish line!
- 💻 CLI generation commands: You can generate data from the CLI using the new preview, create, and validate commands.
- 🔍 LLM Observability: Use the new with_trace option on LLM configs to return all messages (TraceType.ALL_MESSAGES) or only the last message (TraceType.LAST_MESSAGE). You can also selectively extract reasoning content using extract_reasoning_content=True.
⚠️ Breaking Changes
- with_trace used to be a boolean. It is now a TraceType enum: NONE (default), LAST_MESSAGE, or ALL_MESSAGES.
- SingleColumnConfig is now isolated in its own base module data_designer.base.config to protect against circular imports during plugin discovery.
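The with_trace change can be illustrated with a small migration sketch. The TraceType class below is a local stand-in that only mirrors the member names listed above; it is not imported from data_designer, and its string values are assumptions. The mapping of the old True to ALL_MESSAGES is also an assumption, based on the v0.4.0 description of traces as the full message history. The helper migrate_with_trace is hypothetical.

```python
from enum import Enum


class TraceType(str, Enum):
    """Stand-in for illustration only; member values are assumed, not taken from the library."""

    NONE = "none"
    LAST_MESSAGE = "last_message"
    ALL_MESSAGES = "all_messages"


def migrate_with_trace(old_value):
    """Convert a pre-v0.5.0 boolean with_trace setting to a TraceType member.

    Assumed mapping: True -> ALL_MESSAGES, False or None -> NONE.
    Values that are already TraceType members pass through unchanged.
    """
    if isinstance(old_value, TraceType):
        return old_value
    if old_value is True:
        return TraceType.ALL_MESSAGES
    if old_value is False or old_value is None:
        return TraceType.NONE
    raise TypeError(f"unsupported with_trace value: {old_value!r}")


if __name__ == "__main__":
    for value in (True, False, TraceType.LAST_MESSAGE):
        print(repr(value), "->", migrate_with_trace(value).name)
```

In real configs the enum would come from the library itself; the point here is only that boolean call sites need an explicit member after upgrading.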
What's Changed
- feat: MCP (Model Context Protocol) tool calling integration for LLM columns by @eric-tramel in #248
- fix: normalize license header year format in mcp module by @johnnygreco in #279
- chore: configure independent pytest settings per subpackage by @johnnygreco in #278
- fix: normalize trace content blocks to prevent parquet write crashes by @eric-tramel in #283
- feat: Add TraceType enum for granular trace control by @eric-tramel in #284
- docs: add deployment, performance tuning guides and streamline gettin… by @kirit93 in #277
- chore: update tutorial notebooks to use dd. notation consistently by @andreatgretel in #288
- feat: add extract_reasoning_content option to LLM columns by @eric-tramel in #285
- chore: add greptile.json to reduce review verbosity by @andreatgretel in #289
- feat: switch from hatch-vcs to uv-dynamic-versioning by @johnnygreco in #282
- revert: Remove RunConfig debug_trace_override by @eric-tramel in #290
- perf: implement lazy loading for config module exports by @johnnygreco in #291
- refactor: move SingleColumnConfig to config.base module by @johnnygreco in #287
- feat: Add CustomColumnGenerator for user-defined column generation by @andreatgretel in #254
- chore: standardize recipe script metadata and docstrings by @johnnygreco in #292
- chore: enable status check in greptile.json by @dakshgup in #295
- feat: add HuggingFace Hub integration for dataset publishing by @nabinchha in #275
- docs: Added images for deployment options by @kirit93 in #297
- docs: Add RQA dataset blog post and improve blog navigation by @kirit93 in #296
- chore: quiet tool call logs and add tool usage statistics by @johnnygreco in #293
- docs: Added documentation for seed datasets by @kirit93 in #300
- docs: updated usage chart by @kirit93 in #304
- docs: Update README.md by @kirit93 in #305
- chore: update HF card citation copy and add library version to builder config by @johnnygreco in #303
- chore: add tokens generated badge to README by @johnnygreco in #306
- test: add provider health checks script and CI workflow by @andreatgretel in #301
- chore: bump pytest, nbconvert, and pyjwt for vulnerability fixes by @johnnygreco in #312
- fix: allow BuilderConfig round-trip serialization by @johnnygreco in #311
- chore: export ConstraintType and InequalityOperator from config init by @johnnygreco in #308
- docs: restructure plugin docs with multi-file layout and seed reader type by @johnnygreco in #302
- docs: Added cat emoji sequence by @kirit93 in #316
- fix: use reasoning_effort for gpt-5 inference params by @andreatgretel in #315
- docs: New post on SDG design principles by @kirit93 in #318
- feat: add preview, create, and validate CLI commands by @johnnygreco in #313
- feat: support loading config files from HTTP(S) URLs by @johnnygreco in #323
- fix: include CUSTOM type in execution DAG and warn on generator errors by @andreatgretel in #324
- fix: trim LLM response content before parsing by @johnnygreco in #322
New Contributors
- @dakshgup made their first contribution in #295
- @davanstrien
- @davidberenstein1957
- @Wauplin
Full Changelog: v0.4.0...v0.5.0
v0.4.0 2026-01-30
🎨 NeMo Data Designer v0.4.0 Release Notes
✨ What's New
- Message Traces: Capture the full conversation history during LLM generation, giving you access to system prompts, rendered user prompts, and model reasoning for downstream use cases. Enable per-column with with_trace=True or globally via RunConfig.
- Multi-Image Support: Pass multiple images per column in multi-modal contexts for richer vision-based generation.
- Expanded Code Languages: Added support for Bash, C, C++, C#, and COBOL in LLMCodeColumnConfig.
- Progress Logging: Progress updates during LLM-column generation for better visibility into long-running jobs.
💥 Breaking Change: Import structure
The essentials module has been removed in favor of a cleaner import pattern. Configuration classes are now accessed via data_designer.config and the main interface via data_designer.interface.
Before (v0.3.x):
from data_designer.essentials import (
    CategorySamplerParams,
    DataDesigner,
    DataDesignerConfigBuilder,
    LLMTextColumnConfig,
    SamplerColumnConfig,
    SamplerType,
)
data_designer = DataDesigner()
config_builder = DataDesignerConfigBuilder()
After (v0.4.x):
import data_designer.config as dd
from data_designer.interface import DataDesigner
data_designer = DataDesigner()
config_builder = dd.DataDesignerConfigBuilder()
# Configuration classes are accessed via the `dd` namespace
config_builder.add_column(
    dd.SamplerColumnConfig(
        name="category",
        sampler_type=dd.SamplerType.CATEGORY,
        params=dd.CategorySamplerParams(values=["A", "B"]),
    )
)
💥 Breaking Change: Reasoning traces → Message traces
The automatic __reasoning_trace columns have been replaced with opt-in message traces that capture the full conversation history.
Key changes:
- Column postfix renamed from __reasoning_trace to __trace
- Traces are now opt-in rather than automatic
- Traces capture the full message history (system/user/assistant), including retry conversations
Before (v0.3.x):
Reasoning traces were automatically generated as side-effect columns for extended thinking models:
# Traces were automatic - no configuration needed
# Column "answer" would automatically produce "answer__reasoning_trace"
After (v0.4.x):
Enable traces explicitly per-column or globally:
Per-column (recommended):
import data_designer.config as dd
config_builder.add_column(
    dd.LLMTextColumnConfig(
        name="answer",
        prompt="Answer: {{ question }}",
        model_alias="nvidia-text",
        with_trace=True,  # Opt-in to trace capture
    )
)
# Produces "answer" and "answer__trace" columns
Global debug override:
import data_designer.config as dd
from data_designer.interface import DataDesigner
data_designer = DataDesigner()
data_designer.set_run_config(
    dd.RunConfig(debug_override_save_all_column_traces=True)
)
The trace data structure is now a list[dict] capturing the ordered message history:
[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2+2?"},
    {"role": "assistant", "content": "4", "reasoning_content": None}
]
What's Changed
- feat: Add /create-pr skill for well-formatted GitHub PRs by @johnnygreco in #247
- docs: Fix mkdocs syntax and update person sampling documentation by @johnnygreco in #249
- refactor: slim package refactor into three subpackages by @johnnygreco in #240
- chore: add publish script and update license headers by @johnnygreco in #253
- chore: add CODEOWNERS for automatic PR review assignment by @andreatgretel in #251
- feat: allow skipping health checks by @nabinchha in #244
- chore: copy README to data-designer package during install by @johnnygreco in #256
- feat: support multiple images per column in image context by @nabinchha in #257
- fix: escape special characters in SchemaTransformProcessor JSON templates by @andreatgretel in #250
- chore: update telemetry by @johntmyers in #261
- feat: add /update-pr skill and improve /create-pr file linking by @johnnygreco in #258
- feat: Add /commit skill for conventional commit messages by @johnnygreco in #252
- fix: automate README sync for data-designer package builds by @andreatgretel in #266
- chore: simplify publish script by removing redundant rebuild step by @johnnygreco in #268
- feat: add job progress logging for cell-by-cell generation by @eric-tramel in #259
- feat: add message trace support for LLM generation by @johnnygreco in #272
- chore: add animated emoji progress indicators to progress tracker by @johnnygreco in #273
- feat: Add Phase 1 languages (Bash, C, C++, C#, COBOL) to CodeLang by @kirit93 in #271
- fix: ensure 100% progress is logged exactly once by @johnnygreco in #276
Full Changelog: v0.3.8...v0.4.0
v0.3.8 2026-01-26
👀 New Nemotron-Personas Datasets
PersonSampler supports two new locales:
- Nemotron-Personas-Singapore (locale = en_SG)
- Nemotron-Personas-Brazil (locale = pt_BR)
What's Changed
- fix: unblock generation when no from-scratch-generator is configured by @nabinchha in #231
- fix: do not attempt to deserialize llm text response by @nabinchha in #233
- docs: Updated recipe card by @kirit93 in #153
- fix: no api key warning on default model providers by @nabinchha in #238
- feat: Support for Claude Skills (DevX and Generation) by @eric-tramel in #239
- feat: Elevate non-LLM concurrency limits to RunConfig by @eric-tramel in #242
- feat: wire up pt_GB and en_SG personas by @johnnygreco in #245
Full Changelog: v0.3.7...v0.3.8