Add script to generate provisional nodes#1903

Open
vish-cs wants to merge 1 commit into datacommonsorg:master from vish-cs:batch

vish-cs commented Mar 4, 2026

No description provided.

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the import automation system by introducing a new tool for automatically identifying and generating provisional nodes for the knowledge graph. This helps maintain data consistency and reduces manual effort in defining new entities. Concurrently, the PR streamlines the cloud job creation process by refactoring relevant functions and removing outdated testing scripts, leading to a cleaner and more efficient codebase. Configuration files have also been updated to reflect these changes, simplifying the overall import setup.

Highlights

  • New Provisional Node Generation Script: A new Python script, generate_provisional_nodes.py, has been added to automatically scan MCF files, identify referenced nodes that are not locally defined, and optionally check against Google Cloud Spanner to determine truly missing nodes. It then generates a provisional_nodes.mcf file for these missing definitions.
  • Cloud Batch Job Creation Refactoring: The create_job_request function in cloud_batch.py has been refactored to simplify its signature and conditionally include resource specifications in the argument payload, making it more flexible.
  • Removal of Obsolete Test Scripts: Two shell scripts, cloud_batch_import_test.sh and cloud_run_import_test.sh, which were used for testing cloud batch and cloud run imports, have been removed.
  • Import Configuration Streamlining: The import_version_override and graph_data_path fields have been removed from the config_override section in manifest.json, simplifying import configurations. Additionally, logic for handling DATE_VERSION_PLACEHOLDER in import versions was removed from import_executor.py.
  • Script Renaming and Relocation: Two utility scripts, reschedule_all_imports.sh and update_import_version.sh, have been moved into a new scripts subdirectory, and their internal paths have been updated accordingly.
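
The node-detection step described in the first highlight can be sketched as follows. This is a minimal illustration, not the code in generate_provisional_nodes.py: the function names, the regular expressions, and the ProvisionalNode type name are all assumptions made for the example. It treats only "Node: dcid:X" lines as local definitions and leaves out the optional Spanner lookup.

```python
import re

# Assumption: a local definition is a line of the form "Node: dcid:Foo".
NODE_DEF = re.compile(r'^Node:\s*(?:dcid|dcs|schema):(\S+)\s*$')
# Assumption: references are namespace-prefixed IDs inside property values.
NODE_REF = re.compile(r'\b(?:dcid|dcs|schema):([A-Za-z0-9_/.\-]+)')
# Quoted string literals are removed before looking for references.
QUOTED = re.compile(r'"[^"]*"')


def find_undefined_nodes(mcf_text):
    """Return the sorted IDs referenced in mcf_text but never defined in it."""
    defined, referenced = set(), set()
    for raw in mcf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and comments
        match = NODE_DEF.match(line)
        if match:
            defined.add(match.group(1))
            continue
        _prop, sep, value = line.partition(':')
        if not sep:
            continue  # not a "property: value" line
        referenced.update(NODE_REF.findall(QUOTED.sub('', value)))
    return sorted(referenced - defined)


def render_provisional_mcf(node_ids, provisional_type='ProvisionalNode'):
    """Render one stub MCF block per ID; the type name is a placeholder."""
    blocks = [
        f'Node: dcid:{node_id}\ntypeOf: dcs:{provisional_type}\n'
        for node_id in node_ids
    ]
    return '\n'.join(blocks)
```

In the workflow the summary describes, the IDs returned by a function like find_undefined_nodes would then be filtered against Spanner, so that only nodes missing from the graph end up in provisional_nodes.mcf.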

Changelog

  • import-automation/executor/app/executor/cloud_batch.py
    • Modified the create_job_request function signature to remove default_resources and timeout parameters.
    • Updated resource handling logic to be conditional and to convert CPU and memory values appropriately.
    • Removed jobName, timeout, and schedule from the argument_payload.
  • import-automation/executor/app/executor/import_executor.py
    • Removed code that handled DATE_VERSION_PLACEHOLDER for import version determination.
  • import-automation/executor/app/executor/scheduler_job_manager.py
    • Updated the call to cloud_batch.create_job_request to match its new, simplified signature.
  • import-automation/executor/cloud_batch_import_test.sh
    • Removed the script for launching cloud batch jobs.
  • import-automation/executor/cloud_run_import_test.sh
    • Removed the script for launching cloud run jobs.
  • import-automation/executor/generate_provisional_nodes.py
    • Added a new Python script to scan MCF files, identify undefined nodes, and generate provisional_nodes.mcf.
  • import-automation/executor/reschedule_all_imports.sh
    • Renamed to import-automation/executor/scripts/reschedule_all_imports.sh.
    • Updated the internal path reference to schedule_update_import.sh.
  • import-automation/executor/update_import_version.sh
    • Renamed to import-automation/executor/scripts/update_import_version.sh.
  • scripts/entities/manifest.json
    • Removed import_version_override and graph_data_path from config_override for two import entries.
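
The conditional resource handling noted for cloud_batch.py can be pictured roughly like this. It is a sketch under stated assumptions, not the actual create_job_request: the function name, the payload keys, and the input units (CPU in cores, memory in GiB) are hypothetical, chosen only to show resources being added when present and converted to milli-CPU and MiB.

```python
def build_argument_payload(import_name, import_config, resources=None):
    """Build a job argument payload, adding resources only when supplied.

    Hypothetical helper: the key names and units are assumptions, not the
    ones used by the real cloud_batch.py.
    """
    payload = {'importName': import_name, 'importConfig': import_config}
    if resources:
        converted = {}
        # Explicit membership checks avoid a KeyError on partial specs.
        if 'cpu' in resources:
            # Assumed input: CPU cores; output: milli-CPU.
            converted['cpuMilli'] = round(float(resources['cpu']) * 1000)
        if 'memory' in resources:
            # Assumed input: GiB; output: MiB.
            converted['memoryMib'] = round(float(resources['memory']) * 1024)
        if converted:
            payload['resources'] = converted
    return payload
```

With this shape, a caller that passes no resources gets a payload without a resources key at all, which lets the receiving side fall back to its own defaults.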

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist bot left a comment

Code Review

This pull request introduces a new script, generate_provisional_nodes.py, to identify and generate MCF for undefined nodes in a directory of MCF files. It also refactors the creation of Cloud Batch job requests, simplifying the payload. Several related changes are included, such as removing old test scripts and cleaning up obsolete configurations in a manifest file. My review focuses on potential bugs and improvements in the new and modified code. I've identified a critical bug in cloud_batch.py that could cause a KeyError and suggested a fix that follows the repository's rule of using explicit checks. I've also noted several places where the new generate_provisional_nodes.py script could be made more robust and readable.