chore: Migrate gsutil usage to gcloud storage#1902
chore: Migrate gsutil usage to gcloud storage#1902bhandarivijay-png wants to merge 1 commit intodatacommonsorg:masterfrom
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request updates various shell scripts and documentation to standardize Google Cloud Storage operations. By migrating from the older Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request correctly migrates gsutil commands to the recommended gcloud storage equivalent across several shell scripts and documentation files. The changes are consistent and accurate. I've added a few suggestions to make the gcloud storage cp commands more robust and idiomatic by removing wildcard usage for copying entire directory contents. This avoids potential issues with shell globbing and improves clarity.
| # TODO(shifucun): Handle copy GCS reference. In this case, user would | ||
| # store a link but not actual data in this repo. | ||
| gsutil cp -r "$folder/$import_group/data/*" "$gcs_path/$import_group/data/" | ||
| gcloud storage cp --recursive "$folder/$import_group/data/*" "$gcs_path/$import_group/data/" |
There was a problem hiding this comment.
For copying the contents of a local directory, it's more robust to specify the source directory with a trailing slash instead of using a wildcard (*). This avoids reliance on shell globbing, which can behave unexpectedly if the directory is empty. Using a trailing slash is the idiomatic way to copy directory contents with gcloud storage cp.
| gcloud storage cp --recursive "$folder/$import_group/data/*" "$gcs_path/$import_group/data/" | |
| gcloud storage cp --recursive "$folder/$import_group/data/" "$gcs_path/$import_group/data/" |
| #!/bin/bash | ||
| mkdir -p scratch | ||
| gsutil -m cp -R gs://datcom-csv/census/decennial/* scratch/ | ||
| gcloud storage cp --recursive gs://datcom-csv/census/decennial/* scratch/ |
There was a problem hiding this comment.
For clarity and consistency, it's better to specify the GCS source directory path without a trailing /* when copying its contents recursively. The gcloud storage cp --recursive command will correctly copy the contents of the specified GCS directory.
| gcloud storage cp --recursive gs://datcom-csv/census/decennial/* scratch/ | |
| gcloud storage cp --recursive gs://datcom-csv/census/decennial/ scratch/ |
| @@ -1,2 +1,2 @@ | |||
| mkdir -p gcs_folder/input_files | |||
| gsutil -m cp -r gs://unresolved_mcf/us_nces/demographics/private_school/semi_automation_input_files/* gcs_folder/input_files/ | |||
| gcloud storage cp --recursive gs://unresolved_mcf/us_nces/demographics/private_school/semi_automation_input_files/* gcs_folder/input_files/ | |||
There was a problem hiding this comment.
For clarity and consistency, it's better to specify the GCS source directory path without a trailing /* when copying its contents recursively. The gcloud storage cp --recursive command will correctly copy the contents of the specified GCS directory.
| gcloud storage cp --recursive gs://unresolved_mcf/us_nces/demographics/private_school/semi_automation_input_files/* gcs_folder/input_files/ | |
| gcloud storage cp --recursive gs://unresolved_mcf/us_nces/demographics/private_school/semi_automation_input_files/ gcs_folder/input_files/ |
| @@ -1,2 +1,2 @@ | |||
| mkdir -p gcs_folder/input_files | |||
| gsutil -m cp -r gs://unresolved_mcf/us_nces/demographics/public_school/semi_automation_input_files/* gcs_folder/input_files/ | |||
| gcloud storage cp --recursive gs://unresolved_mcf/us_nces/demographics/public_school/semi_automation_input_files/* gcs_folder/input_files/ | |||
There was a problem hiding this comment.
For clarity and consistency, it's better to specify the GCS source directory path without a trailing /* when copying its contents recursively. The gcloud storage cp --recursive command will correctly copy the contents of the specified GCS directory.
| gcloud storage cp --recursive gs://unresolved_mcf/us_nces/demographics/public_school/semi_automation_input_files/* gcs_folder/input_files/ | |
| gcloud storage cp --recursive gs://unresolved_mcf/us_nces/demographics/public_school/semi_automation_input_files/ gcs_folder/input_files/ |
| @@ -1,2 +1,2 @@ | |||
| mkdir -p gcs_folder/input_files | |||
| gsutil -m cp -r gs://unresolved_mcf/us_nces/demographics/school_district/semi_automation_input_files/* gcs_folder/input_files/ | |||
| gcloud storage cp --recursive gs://unresolved_mcf/us_nces/demographics/school_district/semi_automation_input_files/* gcs_folder/input_files/ | |||
There was a problem hiding this comment.
For clarity and consistency, it's better to specify the GCS source directory path without a trailing /* when copying its contents recursively. The gcloud storage cp --recursive command will correctly copy the contents of the specified GCS directory.
| gcloud storage cp --recursive gs://unresolved_mcf/us_nces/demographics/school_district/semi_automation_input_files/* gcs_folder/input_files/ | |
| gcloud storage cp --recursive gs://unresolved_mcf/us_nces/demographics/school_district/semi_automation_input_files/ gcs_folder/input_files/ |
| bash | ||
|
|
||
| gsutil cp gs://unresolved_mcf/cdc/social_vulnerability_index/latest/input_files/* source_files/ | ||
| gcloud storage cp gs://unresolved_mcf/cdc/social_vulnerability_index/latest/input_files/* source_files/ |
There was a problem hiding this comment.
To improve clarity in the documentation, you can remove the trailing /* from the GCS path. When copying objects from the top level of a GCS directory (non-recursively), specifying the directory path is sufficient and cleaner.
| gcloud storage cp gs://unresolved_mcf/cdc/social_vulnerability_index/latest/input_files/* source_files/ | |
| gcloud storage cp gs://unresolved_mcf/cdc/social_vulnerability_index/latest/input_files/ source_files/ |
No description provided.