Releases: NVIDIA-NeMo/DataDesigner

v0.5.7 2026-04-17

17 Apr 22:06
8be4ff7

What's Changed

New Contributors

Full Changelog: v0.5.6...v0.5.7

v0.5.6 2026-04-09

09 Apr 19:36
6505ce4

What's Changed

Full Changelog: v0.5.5...v0.5.6

v0.5.5 2026-04-02

02 Apr 16:31
d43ac1c

What's Changed

  • fix: Claude Code marketplace plugin structure and install docs by @johnnygreco in #458
  • docs: update dev note with TL;DR tips and install instructions by @johnnygreco in #461
  • chore: remove unused .claude-plugin directory by @johnnygreco in #463
  • chore: async engine follow-up - rename, preview, lifecycle, progress by @andreatgretel in #456
  • docs: restructure agent and contributor documentation (plan 427, PR 1) by @nabinchha in #454
  • fix: address nspect vulnerability report for requests and cryptography by @johnnygreco in #475
  • fix: update health checks to use new ModelFacade client API by @andreatgretel in #470
  • ci: upgrade GitHub Actions for Node.js 24 compatibility by @ko3n1g in #450
  • chore: reduce Greptile review noise from defensive coding suggestions by @andreatgretel in #423
  • docs: consolidated seed reader documentation by @eric-tramel in #481
  • fix: bump pymdown-extensions for pygments 2.20.0 compat by @eric-tramel in #482
  • feat: add fr_FR locale to nemotron personas datasets by @johnnygreco in #468
  • docs: add native model client dev note by @nabinchha in #465
  • fix: respect max_parallel_requests in HTTP connection pool size by @przemekboruta in #460
  • docs: center diagram images in native model client dev note by @nabinchha in #483
  • docs: update architecture-and-performance.md to reflect AIMD changes by @nabinchha in #467
  • chore: update review code skill output and tone by @nabinchha in #477
  • ci: add agentic CI plan, health probe workflow, and recipe scaffold by @andreatgretel in #473
  • test: add transport-wiring regression tests for #459 by @nabinchha in #485

New Contributors

Full Changelog: v0.5.4...v0.5.5

v0.5.4 2026-03-24

25 Mar 02:02
0a7b9e0

🔒 Note on the LiteLLM Supply Chain Incident (2026-03-24)

Earlier today, malicious versions of litellm (1.82.7 and 1.82.8) were published to PyPI containing a credential stealer that exfiltrates cloud credentials, SSH keys, and cryptocurrency wallets on any Python process startup.

Data Designer v0.5.4 removes litellm as a dependency entirely. We recommend all users upgrade.

For users on prior versions, here is our assessment:

  • v0.3.0 – v0.5.3: litellm has been pinned at >=1.73.6,<1.80.12 since January — no exposure to the compromised versions.
  • v0.2.2 and v0.2.3: These releases briefly carried an upper bound of <2, which in theory permitted resolution to the malicious 1.82.x versions. Both have been yanked from PyPI as a precaution.
  • Realistic risk from Data Designer is very low. Exposure would require a user pinned to one of the two yanked versions and running a fresh install or dependency update during the few hours the compromised package was live this morning. That said, please verify your installed versions and upgrade Data Designer to v0.5.4 at your earliest convenience.
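
To verify an environment against the versions named above, a minimal standard-library sketch might look like the following. The helper names (`parse_version`, `allowed_by_pin`, `installed_litellm_status`) are illustrative and not part of Data Designer, and the parser assumes plain dotted numeric versions with no pre-release tags.

```python
from importlib import metadata

# The two releases named in the incident note above.
COMPROMISED_VERSIONS = {"1.82.7", "1.82.8"}


def parse_version(version: str) -> tuple[int, ...]:
    """Parse a plain dotted version such as '1.80.11' (no pre-release tags)."""
    return tuple(int(part) for part in version.split("."))


def allowed_by_pin(version: str, lower: str = "1.73.6", upper: str = "1.80.12") -> bool:
    """True if `version` satisfies >=lower,<upper (the v0.3.0 to v0.5.3 pin)."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


def installed_litellm_status() -> str:
    """Report whether the installed litellm, if any, is a compromised release."""
    try:
        installed = metadata.version("litellm")
    except metadata.PackageNotFoundError:
        return "litellm is not installed"
    if installed in COMPROMISED_VERSIONS:
        return f"litellm {installed} is COMPROMISED"
    return f"litellm {installed} is not one of the compromised releases"


if __name__ == "__main__":
    print(installed_litellm_status())
```

`allowed_by_pin("1.82.7")` evaluates to `False`, which is why the pinned releases were not exposed, whereas a bare `<2` upper bound would have admitted both compromised versions.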

What's Changed

New Contributors

Full Changelog: v0.5.3...v0.5.4

v0.5.3 2026-03-12

12 Mar 23:12
447ed59

What's Changed

  • fix: cache notebook builds to avoid flaky upstream model failures by @andreatgretel in #370
  • feat: canonical model client types, protocols, and LiteLLM bridge adapter by @nabinchha in #359
  • fix: processor artifacts type, discovery, and loading by @andreatgretel in #366
  • feat: add ExecutionGraph, CompletionTracker, and Task model for async scheduler by @andreatgretel in #356
  • fix: handle discriminated unions in oneOf pruning validator by @andreatgretel in #376
  • docs: account for vLLM reasoning field migration in plan 343 by @nabinchha in #377
  • chore: add Claude Code skill for code review by @nabinchha in #372
  • fix: replace removed DuckDB record_batch() with to_arrow_reader() by @andreatgretel in #380
  • fix: patch litellm ImageURLListItem to make index field optional (#384) by @nabinchha in #385
  • chore: improve test guidelines in AGENTS.md by @nabinchha in #387
  • fix: raise clear error when all records are dropped during generation by @nabinchha in #383
  • feat: add async generator migration with symmetric bridging and statefulness by @andreatgretel in #378
  • docs: add Enterprise Text-to-SQL and Search Agent recipes by @dhruvnathawani in #395
  • refactor: Decouple ModelFacade from LiteLLM via ModelClient adapter by @nabinchha in #373
  • docs: search agent dev note by @dhruvnathawani in #350
  • feat(cli): bootstrap default configs on CLI startup by @johnnygreco in #401
  • fix: pin chardet<6 to suppress RequestsDependencyWarning by @andreatgretel in #405
  • fix: add chardet<6 constraint to published engine package by @johnnygreco in #406

Full Changelog: v0.5.2...v0.5.3

v0.5.2 2026-03-04

05 Mar 04:49
e2c94da

What's Changed

New Contributors

  • @3mei made their first contribution in #365

Full Changelog: v0.5.1...v0.5.2

v0.5.1 2026-02-20

20 Feb 21:06
8f7a720

Data Designer now supports image generation!

What's Changed

  • docs: Updated url by @kirit93 in #325
  • docs: deep research trajectories with NDD and MCP tool use by @eric-tramel in #326
  • refactor: callback-based processor design by @andreatgretel in #294
  • feat: add image generation support with multi-modal context by @nabinchha in #317
  • docs: add image generation documentation and image-to-image editing tutorial by @nabinchha in #319
  • chore: move ArtifactStorage to engine/storage/ module by @nabinchha in #321
  • chore: gitignore Cerebro knowledge base files by @johnnygreco in #328
  • feat(engine): env-var switch for async-first models experiment by @eric-tramel in #280
  • docs: Moved nav to left hand side by @kirit93 in #331
  • feat: add --save-results option to preview command by @johnnygreco in #333
  • chore: Improve CLI startup with lazy heavy import cleanup by @johnnygreco in #330
  • feat: add allow_resize for 1:N and N:1 generation patterns by @andreatgretel in #286
  • chore: address Andre's feedback on --save-results and CLI preview by @johnnygreco in #335
  • chore: remove example_allow_resize.py from repo root by @andreatgretel in #337
  • fix: make DropColumnsProcessorConfig idempotent and support reasoning columns by @andreatgretel in #334
  • feat: add push_to_hub_from_folder classmethod for uploading saved datasets by @nabinchha in #340
  • fix: handle bool, int, float in convert_to_row_element by @dhruvnathawani in #336
  • feat: auto-detect ImageContext format for image-to-image generation by @nabinchha in #342

New Contributors

Full Changelog: v0.5.0...v0.5.1

v0.5.0 2026-02-11

11 Feb 22:22
631f1f9

🎨 NeMo Data Designer – v0.5.0 Release Notes

⚡Highlights

  • 🛠️ MCP Tool Calling: LLM columns can now call external tools during generation via MCP.

  • ⚛️ Functions as custom column generators: The @custom_column_generator decorator lets users write their own column generation logic and plug it directly into a pipeline.

  • 🤗 Hugging Face Hub integration: You can now publish generated datasets directly to the Hugging Face Hub with auto-generated dataset cards: results.push_to_hub().

  • 💻 CLI generation commands: You can generate data from the CLI using the new preview, create, and validate commands.

  • 🔍 LLM Observability: Set the new with_trace option on LLM configs to TraceType.ALL_MESSAGES or TraceType.LAST_MESSAGE to control which messages are captured. You can also selectively extract reasoning content using extract_reasoning_content=True.

⚠️ Breaking Changes

  • with_trace is now a TraceType enum (NONE (default), LAST_MESSAGE, ALL_MESSAGES) instead of a boolean.

  • SingleColumnConfig is now isolated in its own base module, data_designer.config.base, to protect against circular imports during plugin discovery.
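
For configs that still pass a boolean, the with_trace change can be sketched as a small migration helper. This is only an illustration: the `TraceType` class below is a local stand-in that mirrors the member names listed above (its string values are hypothetical), `migrate_with_trace` is not a library function, and mapping `True` to `ALL_MESSAGES` is an assumption based on v0.4.x traces capturing the full message history.

```python
from enum import Enum


class TraceType(str, Enum):
    """Local stand-in for the library enum; member values are hypothetical."""

    NONE = "none"
    LAST_MESSAGE = "last_message"
    ALL_MESSAGES = "all_messages"


def migrate_with_trace(value):
    """Map a legacy boolean with_trace setting onto the enum.

    Assumption: True maps to ALL_MESSAGES and False maps to NONE.
    """
    if isinstance(value, TraceType):
        return value  # already migrated
    if isinstance(value, bool):
        return TraceType.ALL_MESSAGES if value else TraceType.NONE
    raise TypeError(f"unsupported with_trace value: {value!r}")


if __name__ == "__main__":
    for legacy in (True, False):
        print(legacy, "->", migrate_with_trace(legacy).name)
```

In real code the import would come from the library itself, and the resulting member would be passed as `with_trace=...` on the column config.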

What's Changed

  • feat: MCP (Model Context Protocol) tool calling integration for LLM columns by @eric-tramel in #248
  • fix: normalize license header year format in mcp module by @johnnygreco in #279
  • chore: configure independent pytest settings per subpackage by @johnnygreco in #278
  • fix: normalize trace content blocks to prevent parquet write crashes by @eric-tramel in #283
  • feat: Add TraceType enum for granular trace control by @eric-tramel in #284
  • docs: add deployment, performance tuning guides and streamline gettin… by @kirit93 in #277
  • chore: update tutorial notebooks to use dd. notation consistently by @andreatgretel in #288
  • feat: add extract_reasoning_content option to LLM columns by @eric-tramel in #285
  • chore: add greptile.json to reduce review verbosity by @andreatgretel in #289
  • feat: switch from hatch-vcs to uv-dynamic-versioning by @johnnygreco in #282
  • revert: Remove RunConfig debug_trace_override by @eric-tramel in #290
  • perf: implement lazy loading for config module exports by @johnnygreco in #291
  • refactor: move SingleColumnConfig to config.base module by @johnnygreco in #287
  • feat: Add CustomColumnGenerator for user-defined column generation by @andreatgretel in #254
  • chore: standardize recipe script metadata and docstrings by @johnnygreco in #292
  • chore: enable status check in greptile.json by @dakshgup in #295
  • feat: add HuggingFace Hub integration for dataset publishing by @nabinchha in #275
  • docs: Added images for deployment options by @kirit93 in #297
  • docs: Add RQA dataset blog post and improve blog navigation by @kirit93 in #296
  • chore: quiet tool call logs and add tool usage statistics by @johnnygreco in #293
  • docs: Added documentation for seed datasets by @kirit93 in #300
  • docs: updated usage chart by @kirit93 in #304
  • docs: Update README.md by @kirit93 in #305
  • chore: update HF card citation copy and add library version to builder config by @johnnygreco in #303
  • chore: add tokens generated badge to README by @johnnygreco in #306
  • test: add provider health checks script and CI workflow by @andreatgretel in #301
  • chore: bump pytest, nbconvert, and pyjwt for vulnerability fixes by @johnnygreco in #312
  • fix: allow BuilderConfig round-trip serialization by @johnnygreco in #311
  • chore: export ConstraintType and InequalityOperator from config init by @johnnygreco in #308
  • docs: restructure plugin docs with multi-file layout and seed reader type by @johnnygreco in #302
  • docs: Added cat emoji sequence by @kirit93 in #316
  • fix: use reasoning_effort for gpt-5 inference params by @andreatgretel in #315
  • docs: New post on SDG design principles by @kirit93 in #318
  • feat: add preview, create, and validate CLI commands by @johnnygreco in #313
  • feat: support loading config files from HTTP(S) URLs by @johnnygreco in #323
  • fix: include CUSTOM type in execution DAG and warn on generator errors by @andreatgretel in #324
  • fix: trim LLM response content before parsing by @johnnygreco in #322

New Contributors

Full Changelog: v0.4.0...v0.5.0

v0.4.0 2026-01-30

31 Jan 03:43
754ff71

🎨 NeMo Data Designer v0.4.0 Release Notes

✨ What's New

  • Message Traces: Capture the full conversation history during LLM generation, giving you access to system prompts, rendered user prompts, and model reasoning for downstream use cases. Enable per-column with with_trace=True or globally via RunConfig.

  • Multi-Image Support: Pass multiple images per column in multi-modal contexts for richer vision-based generation.

  • Expanded Code Languages: Added support for Bash, C, C++, C#, and COBOL in LLMCodeColumnConfig.

  • Progress Logging: Progress updates during LLM-column generation for better visibility into long-running jobs.


💥 Breaking Change: Import structure

The essentials module has been removed in favor of a cleaner import pattern. Configuration classes are now accessed via data_designer.config and the main interface via data_designer.interface.

Before (v0.3.x):

from data_designer.essentials import (
    CategorySamplerParams,
    DataDesigner,
    DataDesignerConfigBuilder,
    LLMTextColumnConfig,
    SamplerColumnConfig,
    SamplerType,
)

data_designer = DataDesigner()
config_builder = DataDesignerConfigBuilder()

After (v0.4.x):

import data_designer.config as dd
from data_designer.interface import DataDesigner

data_designer = DataDesigner()
config_builder = dd.DataDesignerConfigBuilder()

# Configuration classes are accessed via the `dd` namespace
config_builder.add_column(
    dd.SamplerColumnConfig(
        name="category",
        sampler_type=dd.SamplerType.CATEGORY,
        params=dd.CategorySamplerParams(values=["A", "B"]),
    )
)

💥 Breaking Change: Reasoning traces → Message traces

The automatic __reasoning_trace columns have been replaced with opt-in message traces that capture the full conversation history.

Key changes:

  • Column postfix renamed from __reasoning_trace to __trace
  • Traces are now opt-in rather than automatic
  • Traces capture the full message history (system/user/assistant), including retry conversations

Before (v0.3.x):

Reasoning traces were automatically generated as side-effect columns for extended thinking models:

# Traces were automatic - no configuration needed
# Column "answer" would automatically produce "answer__reasoning_trace"

After (v0.4.x):

Enable traces explicitly per-column or globally:

Per-column (recommended):

import data_designer.config as dd

config_builder = dd.DataDesignerConfigBuilder()

config_builder.add_column(
    dd.LLMTextColumnConfig(
        name="answer",
        prompt="Answer: {{ question }}",
        model_alias="nvidia-text",
        with_trace=True,  # Opt-in to trace capture
    )
)
# Produces "answer" and "answer__trace" columns

Global debug override:

import data_designer.config as dd
from data_designer.interface import DataDesigner

data_designer = DataDesigner()
data_designer.set_run_config(
    dd.RunConfig(debug_override_save_all_column_traces=True)
)

The trace data structure is now a list[dict] capturing the ordered message history:

[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2+2?"},
    {"role": "assistant", "content": "4", "reasoning_content": None}
]
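
Because a trace is an ordered list of role-tagged dictionaries, it can be inspected with plain Python. The sketch below uses the example trace above; the helper names are illustrative, and it assumes each message carries a `role` key and a string `content`.

```python
from typing import Any, Optional

Trace = list[dict[str, Any]]


def last_assistant_message(trace: Trace) -> Optional[dict[str, Any]]:
    """Return the final assistant message, or None if the trace has none."""
    for message in reversed(trace):
        if message.get("role") == "assistant":
            return message
    return None


def contents_for_role(trace: Trace, role: str) -> list[Any]:
    """Collect the content of every message with the given role, in order."""
    return [message.get("content") for message in trace if message.get("role") == role]


example_trace: Trace = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2+2?"},
    {"role": "assistant", "content": "4", "reasoning_content": None},
]

if __name__ == "__main__":
    final = last_assistant_message(example_trace)
    print(final["content"], final.get("reasoning_content"))
    print(contents_for_role(example_trace, "user"))
```

Walking the list in reverse picks up the final answer even when the trace includes retry conversations, since earlier assistant turns appear before it.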

What's Changed

  • feat: Add /create-pr skill for well-formatted GitHub PRs by @johnnygreco in #247
  • docs: Fix mkdocs syntax and update person sampling documentation by @johnnygreco in #249
  • refactor: slim package refactor into three subpackages by @johnnygreco in #240
  • chore: add publish script and update license headers by @johnnygreco in #253
  • chore: add CODEOWNERS for automatic PR review assignment by @andreatgretel in #251
  • feat: allow skipping health checks by @nabinchha in #244
  • chore: copy README to data-designer package during install by @johnnygreco in #256
  • feat: support multiple images per column in image context by @nabinchha in #257
  • fix: escape special characters in SchemaTransformProcessor JSON templates by @andreatgretel in #250
  • chore: update telemetry by @johntmyers in #261
  • feat: add /update-pr skill and improve /create-pr file linking by @johnnygreco in #258
  • feat: Add /commit skill for conventional commit messages by @johnnygreco in #252
  • fix: automate README sync for data-designer package builds by @andreatgretel in #266
  • chore: simplify publish script by removing redundant rebuild step by @johnnygreco in #268
  • feat: add job progress logging for cell-by-cell generation by @eric-tramel in #259
  • feat: add message trace support for LLM generation by @johnnygreco in #272
  • chore: add animated emoji progress indicators to progress tracker by @johnnygreco in #273
  • feat: Add Phase 1 languages (Bash, C, C++, C#, COBOL) to CodeLang by @kirit93 in #271
  • fix: ensure 100% progress is logged exactly once by @johnnygreco in #276

Full Changelog: v0.3.8...v0.4.0

v0.3.8 2026-01-26

27 Jan 02:28
5402b7d

👀 New Nemotron-Personas Datasets

PersonSampler supports two new locales:

What's Changed

Full Changelog: v0.3.7...v0.3.8