Tool to reconcile property:value within nodes that are supercedeBy new nodes by ajaits · Pull Request #1901 · datacommonsorg/data

ajaits · 2026-03-03T15:53:12Z

No description provided.

gemini-code-assist · 2026-03-03T16:01:27Z

Summary of Changes

This pull request adds schema reconciliation to the data processing pipeline. Properties and values in data nodes that have been superseded by newer definitions are updated automatically, which keeps the data consistent as the schema evolves. The new tool supports both in-place updates and the preservation of legacy data for specific node types, and it can extend its schema knowledge through API lookups.

Highlights

- New Schema Reconciliation Tool: A new SchemaReconciler class and command-line tool have been introduced to automatically update properties and values in MCF nodes based on a schema that defines supercededBy relationships. This allows for dynamic mapping of old or deprecated schema elements to their newer counterparts.
- Integration into StatVar Processor: The new reconciliation logic has been integrated into the StatVarProcessor pipeline, enabling automatic reconciliation of statistical variables and observations during the data processing and output generation stages.
- Handling of StatVarObservations: The reconciler provides an option to either update existing nodes in place or create new, duplicated nodes for StatVarObservation types when a property or value is remapped, preserving the original legacy observation if desired.
- Dynamic Schema Lookup: The reconciler can dynamically query a DC API to fetch supercededBy information for properties or values not already present in the preloaded schema, ensuring comprehensive reconciliation.
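
The behaviour described in these highlights can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: nodes are modelled as flat dicts of property to value, the supercededBy schema is reduced to a plain old-name to new-name dict, and the names reconcile_node, reconcile_nodes, keep_legacy_obs, the lookup callable, and the "_reconciled" key suffix are all hypothetical.

```python
from typing import Callable, Dict, Optional

Node = Dict[str, str]


def reconcile_node(
    node: Node,
    superceded_by: Dict[str, str],
    lookup: Optional[Callable[[str], Optional[str]]] = None,
) -> Node:
    """Return a copy of `node` with superseded properties and values replaced."""

    def resolve(name: str) -> str:
        if name in superceded_by:
            return superceded_by[name]
        if lookup is not None:
            # Hypothetical stand-in for an API query on names missing locally.
            replacement = lookup(name)
            if replacement:
                superceded_by[name] = replacement  # cache for later nodes
                return replacement
        return name

    return {resolve(prop): resolve(value) for prop, value in node.items()}


def reconcile_nodes(
    nodes: Dict[str, Node],
    superceded_by: Dict[str, str],
    keep_legacy_obs: bool = False,
) -> Dict[str, Node]:
    """Reconcile every node, optionally keeping legacy StatVarObservations."""
    output: Dict[str, Node] = {}
    for key, node in nodes.items():
        new_node = reconcile_node(node, superceded_by)
        if new_node == node:
            output[key] = node  # nothing was superseded
        elif keep_legacy_obs and node.get("typeOf") == "StatVarObservation":
            output[key] = node  # preserve the legacy observation
            output[f"{key}_reconciled"] = new_node  # assumed key scheme
        else:
            output[key] = new_node  # update in place
    return output
```

With keep_legacy_obs=True, a remapped StatVarObservation stays under its original key and the reconciled copy is added alongside it; in this sketch, all other node types are always updated in place.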

Changelog

tools/statvar_importer/mcf_file_util.py
- Added type casting to string for input values in get_value_list to prevent errors with non-string inputs.
tools/statvar_importer/schema/schema_reconciler.py
- Added a new module schema_reconciler.py containing the SchemaReconciler class and its associated functions.
- Implemented logic to load schema definitions and reconcile input nodes by replacing supercededBy properties and values.
- Included functionality to look up remapped schema elements via an API if not found locally.
- Provided a command-line interface for running the reconciliation process.
tools/statvar_importer/schema/schema_reconciler_test.py
- Added a new module schema_reconciler_test.py with unit tests for the SchemaReconciler class.
- Included tests for value reconciliation, property reconciliation, handling of StatVarObservation nodes, and scenarios with no changes or list values.
tools/statvar_importer/schema/schema_resolver.py
- Updated imports to retrieve get_node_dcid from mcf_file_util.
- Removed the local definition of get_node_dcid, centralizing its implementation.
tools/statvar_importer/stat_var_processor.py
- Imported the new SchemaReconciler class.
- Added a reconcile_nodes method to the StatVarProcessor class to apply schema reconciliation to statistical variables and observations.
- Integrated the reconcile_nodes call into the write_outputs method, making reconciliation an optional step during output generation.
util/dc_api_wrapper.py
- Updated the docstring for dc_api_get_node_property to accurately reflect its return type as a dictionary mapping input DCIDs to property:value pairs.
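
The string cast noted above for get_value_list in mcf_file_util.py guards against a common failure mode: string methods called on an int or float. The sketch below illustrates the idea with a hypothetical helper, split_values; it is a simplified stand-in and does not reproduce the real function's parsing rules.

```python
import csv
from typing import Any, List


def split_values(value: Any) -> List[str]:
    """Split a comma-separated value into a list of trimmed strings."""
    # Cast first so that ints, floats, and other non-string inputs
    # do not fail on string-only operations below.
    text = str(value)
    if not text.strip():
        return []
    # csv.reader keeps commas inside double-quoted items together.
    row = next(csv.reader([text], skipinitialspace=True))
    return [item.strip() for item in row if item.strip()]
```

Without the cast, an input such as the integer 10 would fail before any splitting could happen; with it, the helper returns a one-element list containing the string "10".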

gemini-code-assist

Code Review

This pull request introduces a new tool, SchemaReconciler, to reconcile schema nodes based on supercededBy properties, integrating it into stat_var_processor with unit tests. A critical arbitrary file write vulnerability and a high-severity path traversal vulnerability have been identified in schema_reconciler.py. These issues arise from exposing command-line flags through a web interface and using them directly in file system operations without sanitization, which could allow an attacker to read or overwrite arbitrary files. Beyond these critical security concerns, the review also suggests improvements for the robustness and correctness of SchemaReconciler, such as more robust path and key generation, safer default arguments, and corrections in type hints and test assertions. Strict path validation and sanitization for all file-related flags are strongly recommended.
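
The path validation recommended above could look like the following minimal sketch. It assumes that all file-related flags should resolve inside a single allowed base directory; the function name resolve_under and that policy are illustrative assumptions, not code from the PR.

```python
from pathlib import Path


def resolve_under(base_dir: str, user_path: str) -> Path:
    """Resolve `user_path` and reject it unless it stays inside `base_dir`."""
    base = Path(base_dir).resolve()
    # Joining with an absolute user_path discards base, so absolute paths
    # outside the base directory are caught by the check below as well.
    candidate = (base / user_path).resolve()
    try:
        candidate.relative_to(base)
    except ValueError:
        raise ValueError(
            f"path {user_path!r} escapes allowed directory {str(base)!r}"
        ) from None
    return candidate
```

Resolving before checking means that ".." segments and symlinks are collapsed first, so a value such as "../../etc/passwd" is rejected rather than silently followed. A caller would pass each flag value through such a check before opening the file for reading or writing.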

gemini-code-assist · 2026-03-03T16:05:38Z