feat: Migrate EPA facility ID retrieval to use the Data Commons client API instead of direct SPARQL queries #1900
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.
This pull request refactors the EPA facility ID retrieval mechanism. Previously, the system relied on direct SPARQL queries to the Data Commons API. This change moves retrieval to the dedicated Data Commons client API, which simplifies interaction with the platform and improves maintainability.
Code Review
This pull request migrates EPA facility ID retrieval from direct SPARQL queries to the Data Commons client API, which simplifies the code and delegates data retrieval to the client. The test file has also been updated to use mocks instead of making real API calls. The review comments suggest documenting an empty dictionary in the download script, explaining the test data in the test file, and removing an unused import.
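The testing approach described above, replacing live API calls with mocks, can be sketched with unittest.mock. This is a minimal illustration under stated assumptions: download_facility_ids is a hypothetical stand-in that receives a client object, and fetch_facility_nodes is an assumed method name, not part of the Data Commons client library or of this PR. The real client call and response shape will differ.

```python
import unittest
from unittest import mock


def download_facility_ids(client):
    """Hypothetical: fetches facility node DCIDs via `client`, returns IDs."""
    # fetch_facility_nodes is an assumed method name, not a real client API.
    nodes = client.fetch_facility_nodes()
    # Keep the part after the namespace prefix, skipping empty entries.
    return sorted({node.split("/", 1)[-1] for node in nodes if node})


class DownloadFacilityIdsTest(unittest.TestCase):

    def test_reads_ids_from_mocked_client(self):
        # A MagicMock stands in for the Data Commons client, so the test
        # makes no network request.
        fake_client = mock.MagicMock()
        fake_client.fetch_facility_nodes.return_value = [
            "epaGhgrpFacilityId/1002",
            "epaGhgrpFacilityId/1001",
        ]
        self.assertEqual(download_facility_ids(fake_client), ["1001", "1002"])
        fake_client.fetch_facility_nodes.assert_called_once_with()
```

Run it with python -m unittest. Injecting the client keeps the mock simple; the PR's test may instead use mock.patch on a module-level client, which the review does not specify.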
I was unable to create individual review comments, so my feedback is listed below.
scripts/us_epa/parent_company/download_existing_facilities.py (45-46)
It would be helpful to add a comment explaining what the empty dictionary {} represents in this context. Is it a default value, or is it expected to be populated later?
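To make the question concrete, the sketch below shows how a one-line comment can document an empty-dictionary default. It assumes, purely for illustration, that the data is a mapping from node DCIDs to property maps and that {} means "node has no recorded properties"; the actual role of the {} at lines 45-46 is not stated in the review, and facility_properties is a hypothetical helper.

```python
from typing import Any, Dict


def facility_properties(
        response: Dict[str, Dict[str, Any]], dcid: str) -> Dict[str, Any]:
    """Returns the property map for one node (hypothetical helper)."""
    # Default to an empty dict when the node is absent from the response:
    # it means "no properties recorded", so callers can iterate over the
    # result without a None check. It is not populated later.
    return response.get(dcid, {})


# Toy response for illustration only.
response = {"epaGhgrpFacilityId/1001": {"name": "Example Plant"}}
print(facility_properties(response, "epaGhgrpFacilityId/1001"))  # {'name': 'Example Plant'}
print(facility_properties(response, "epaGhgrpFacilityId/9999"))  # {}
```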
scripts/us_epa/parent_company/download_existing_facilities_test.py (34-39)
The facility_nodes list contains a duplicate entry (epaGhgrpFacilityId/1001) and a None value. While the code handles these cases, it might be beneficial to add a comment explaining why these values are included in the test data. This will help future developers understand the purpose of these test cases.
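As a rough illustration of the handling the reviewer refers to, the sketch below extracts facility IDs from node DCIDs such as epaGhgrpFacilityId/1001, skipping None entries and dropping duplicates while preserving first-seen order. The helper extract_facility_ids and the prefix constant are hypothetical, and the sample list is illustrative rather than copied from the PR's test file; the inline comments show the kind of explanation the reviewer suggests adding to the test data.

```python
from typing import Iterable, List, Optional

# Assumed DCID prefix, inferred from the test data quoted in the review.
FACILITY_PREFIX = "epaGhgrpFacilityId/"


def extract_facility_ids(facility_nodes: Iterable[Optional[str]]) -> List[str]:
    """Returns unique facility IDs in first-seen order (hypothetical helper)."""
    seen = set()
    ids: List[str] = []
    for node in facility_nodes:
        # Skip missing values instead of failing on them.
        if node is None:
            continue
        # Strip the namespace prefix, keeping only the numeric ID.
        if node.startswith(FACILITY_PREFIX):
            node = node[len(FACILITY_PREFIX):]
        # Drop repeats so each facility appears only once.
        if node not in seen:
            seen.add(node)
            ids.append(node)
    return ids


# Test data with deliberate edge cases, each commented as the reviewer suggests.
facility_nodes = [
    "epaGhgrpFacilityId/1001",
    "epaGhgrpFacilityId/1002",
    "epaGhgrpFacilityId/1001",  # Duplicate: verifies deduplication.
    None,  # Missing value: verifies that None entries are skipped.
]

print(extract_facility_ids(facility_nodes))  # ['1001', '1002']
```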
scripts/us_epa/parent_company/download_existing_facilities_test.py (30-31)
The import of _V2_SPARQL_URL is no longer needed and should be removed, leaving:
from scripts.us_epa.parent_company.download_existing_facilities import (
    download_existing_facilities,)