Skip to content

Refactor usgs meas#145

Draft
msweier wants to merge 4 commits intomainfrom
refactor_usgs_meas
Draft

Refactor usgs meas#145
msweier wants to merge 4 commits intomainfrom
refactor_usgs_meas

Conversation

@msweier
Copy link
Collaborator

@msweier msweier commented Mar 9, 2026

Refactored USGS measurement to work with new USGS api.

Added usgs api key support.

Note: Smaller max_digits increases collision probability.
Collisions are handled downstream by the duplicate-check logic.
"""
hex_digest = hashlib.md5(uuid_str.encode()).hexdigest()

Check failure

Code scanning / CodeQL

Use of a broken or weak cryptographic hashing algorithm on sensitive data High

Sensitive data (id)
is used in a hashing algorithm (MD5) that is insecure.

Copilot Autofix

AI 3 days ago

In general, to fix this issue you should avoid MD5 (and SHA‑1) for hashing sensitive data, even when you only need a deterministic mapping, and instead use a modern hash like SHA‑256 or SHA‑3. For password-like data, a password hashing function (Argon2, bcrypt, PBKDF2, scrypt) is required, but here we only need a deterministic integer from a UUID, so a simple SHA‑256 is appropriate.

The best minimal-impact fix is to change hashlib.md5 to a stronger general-purpose hash function such as hashlib.sha256, leaving the rest of the logic (hexdigest, integer conversion, modulo by 10**max_digits) unchanged. This preserves the function’s interface and behavior pattern (deterministic mapping, bounded range) while eliminating usage of the broken algorithm. No new imports are required because hashlib is already imported and provides sha256. Concretely, in cwmscli/usgs/getusgs_measurements_cda.py, update line 95 inside uuid_to_measurement_number to compute hex_digest using hashlib.sha256(uuid_str.encode()).hexdigest() instead of hashlib.md5(...). No additional methods or definitions are needed.

Suggested changeset 1
cwmscli/usgs/getusgs_measurements_cda.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/cwmscli/usgs/getusgs_measurements_cda.py b/cwmscli/usgs/getusgs_measurements_cda.py
--- a/cwmscli/usgs/getusgs_measurements_cda.py
+++ b/cwmscli/usgs/getusgs_measurements_cda.py
@@ -92,7 +92,7 @@
     Note: Smaller max_digits increases collision probability.
     Collisions are handled downstream by the duplicate-check logic.
     """
-    hex_digest = hashlib.md5(uuid_str.encode()).hexdigest()
+    hex_digest = hashlib.sha256(uuid_str.encode()).hexdigest()
     upper_bound = 10**max_digits
     return int(hex_digest, 16) % upper_bound
 
EOF
@@ -92,7 +92,7 @@
Note: Smaller max_digits increases collision probability.
Collisions are handled downstream by the duplicate-check logic.
"""
hex_digest = hashlib.md5(uuid_str.encode()).hexdigest()
hex_digest = hashlib.sha256(uuid_str.encode()).hexdigest()
upper_bound = 10**max_digits
return int(hex_digest, 16) % upper_bound

Copilot is powered by AI and may make mistakes. Always verify output.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well this is annoying. This hashing "vulnerability" is really just a way to store the measurement number since the USGS no longer use an integer to identify the field visit (see USACE/cwms-data-api#1624).

Not sure what you want to do because the old measurement endpoint is gone now, and this is a way to make it work without CDA updates, but it will probably get flagged constantly.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be something we can add to tell it to ignore this

@msweier msweier requested review from Enovotny and krowvin March 9, 2026 18:30
Copy link
Collaborator

@krowvin krowvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move the usgs api key option to the usgs init files

type=str,
help="file storing Api Key. One of api-key or api-key-loc are required",
)
usgs_api_key_option = click.option(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we move this to the usgs/__init__.py file. This appears to be where it is consumed (i.e. via @ decorator)

But it is contained in the utils/init.py common list but is not actually exported there.

I started to go on a chase to say that it should not be in the common key options only to realize the variable/option is in there but it's not imported with the others.

@msweier msweier marked this pull request as draft March 11, 2026 14:21
@msweier
Copy link
Collaborator Author

msweier commented Mar 11, 2026

I'm thinking I should maybe hold off on this unless folks really need the measurements in the next couple months. CDA has approved the change to the measurement endpoint to match the field visit id so I would need to fix it again then anyway

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants