Conversation
| Note: Smaller max_digits increases collision probability. | ||
| Collisions are handled downstream by the duplicate-check logic. | ||
| """ | ||
| hex_digest = hashlib.md5(uuid_str.encode()).hexdigest() |
Check failure
Code scanning / CodeQL
Use of a broken or weak cryptographic hashing algorithm on sensitive data High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 3 days ago
In general, to fix this issue you should avoid MD5 (and SHA‑1) for hashing sensitive data, even when you only need a deterministic mapping, and instead use a modern hash like SHA‑256 or SHA‑3. For password-like data, a password hashing function (Argon2, bcrypt, PBKDF2, scrypt) is required, but here we only need a deterministic integer from a UUID, so a simple SHA‑256 is appropriate.
The best minimal-impact fix is to change hashlib.md5 to a stronger general-purpose hash function such as hashlib.sha256, leaving the rest of the logic (hexdigest, integer conversion, modulo by 10**max_digits) unchanged. This preserves the function’s interface and behavior pattern (deterministic mapping, bounded range) while eliminating usage of the broken algorithm. No new imports are required because hashlib is already imported and provides sha256. Concretely, in cwmscli/usgs/getusgs_measurements_cda.py, update line 95 inside uuid_to_measurement_number to compute hex_digest using hashlib.sha256(uuid_str.encode()).hexdigest() instead of hashlib.md5(...). No additional methods or definitions are needed.
| @@ -92,7 +92,7 @@ | ||
| Note: Smaller max_digits increases collision probability. | ||
| Collisions are handled downstream by the duplicate-check logic. | ||
| """ | ||
| hex_digest = hashlib.md5(uuid_str.encode()).hexdigest() | ||
| hex_digest = hashlib.sha256(uuid_str.encode()).hexdigest() | ||
| upper_bound = 10**max_digits | ||
| return int(hex_digest, 16) % upper_bound | ||
|
|
There was a problem hiding this comment.
Well this is annoying. This hashing "vulnerability" is really just a way to store the measurement number since the USGS no longer use an integer to identify the field visit (see USACE/cwms-data-api#1624).
Not sure what you want to do because the old measurement endpoint is gone now, and this is a way to make it work without CDA updates, but it will probably get flagged constantly.
There was a problem hiding this comment.
Should be something we can add to tell it to ignore this
krowvin
left a comment
There was a problem hiding this comment.
Please move the usgs api key option to the usgs init files
| type=str, | ||
| help="file storing Api Key. One of api-key or api-key-loc are required", | ||
| ) | ||
| usgs_api_key_option = click.option( |
There was a problem hiding this comment.
I suggest we move this to the usgs/__init__.py file. This appears to be where it is consumed (i.e. via @ decorator)
But it is contained in the utils/init.py common list but is not actually exported there.
I started to go on a chase to say that it should not be in the common key options only to realize the variable/option is in there but it's not imported with the others.
|
I'm thinking I should maybe hold off on this unless folks really need the measurements in the next couple months. CDA has approved the change to the measurement endpoint to match the field visit id so I would need to fix it again then anyway |
Refactored USGS measurement to work with new USGS api.
Added usgs api key support.