Bug 2007416 - wipe individual logins on DecryptionErrors #7343
Conversation
```rust
named_params! {
    ":now_millis": now_ms,
    ":guid": id,
    ":sync_status": SyncStatus::Changed as u8,
```
touch() now also updates the sync_status column. I needed this to make the tests pass, but it seems to me like the code should have always done that. Maybe we just weren't using this method enough to notice?
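A minimal sketch of the semantics being described, with the DB reduced to an in-memory record (the real code runs a SQL UPDATE via rusqlite's `named_params!`; all names here are illustrative, not the actual API):

```rust
// Hypothetical in-memory model of touch(): bump the usage bookkeeping
// and also mark the row as locally changed so the next sync uploads it.
#[derive(Clone, Copy, Debug, PartialEq)]
enum SyncStatus {
    Synced,
    Changed,
}

struct LoginRecord {
    time_last_used: i64,
    times_used: i64,
    sync_status: SyncStatus,
}

fn touch(record: &mut LoginRecord, now_ms: i64) {
    record.time_last_used = now_ms;
    record.times_used += 1;
    // The change discussed above: also flag the row for sync.
    record.sync_status = SyncStatus::Changed;
}

fn main() {
    let mut rec = LoginRecord {
        time_last_used: 0,
        times_used: 0,
        sync_status: SyncStatus::Synced,
    };
    touch(&mut rec, 1_700_000_000_000);
    assert_eq!(rec.sync_status, SyncStatus::Changed);
    assert_eq!(rec.times_used, 1);
    println!("touched, status now {:?}", rec.sync_status);
}
```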
```diff
-Ok(Some(ServerTimestamp(millis)))
+Ok(db
+    .get_meta::<i64>(schema::LAST_SYNC_META_KEY)?
+    .map(ServerTimestamp))
```
This also needed changes to make the tests pass, because the old code didn't seem correct. Previously this would panic if there was no metadata row for LAST_SYNC_META_KEY; now it returns None.
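A sketch of the behavior change, with the DB read reduced to an `Option<i64>` parameter (the real code calls `db.get_meta::<i64>(schema::LAST_SYNC_META_KEY)`; the simplified signature here is illustrative):

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct ServerTimestamp(i64);

// New behavior: a missing LAST_SYNC_META_KEY row maps cleanly to None
// via Option::map, instead of panicking on an unwrap of the absent row.
fn get_last_sync(meta_row: Option<i64>) -> Option<ServerTimestamp> {
    meta_row.map(ServerTimestamp)
}

fn main() {
    // Missing metadata is no longer a panic, just None.
    assert_eq!(get_last_sync(None), None);
    assert_eq!(get_last_sync(Some(123)), Some(ServerTimestamp(123)));
    println!("missing meta handled: {:?}", get_last_sync(None));
}
```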
```rust
self.execute("DELETE FROM loginsM WHERE guid=?", [guid])?;
// Delete the LAST_SYNC_META_KEY so that we resync the logins data next time
self.delete_meta(schema::LAST_SYNC_META_KEY)?;
tx.commit()?;
```
There are lots of incidental changes in this PR, but this is the main one.
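The main change can be sketched with an in-memory stand-in for the logins DB (loginsL holds local records, loginsM the server mirror, and meta the sync bookkeeping; the real code issues the DELETEs inside one transaction, and these struct/field names are hypothetical):

```rust
use std::collections::HashMap;

const LAST_SYNC_META_KEY: &str = "last_sync_time";

// Illustrative in-memory model of wiping one login's local data.
struct FakeLoginsDb {
    logins_l: HashMap<String, String>, // loginsL: guid -> local record
    logins_m: HashMap<String, String>, // loginsM: guid -> mirror record
    meta: HashMap<String, i64>,        // sync metadata
}

impl FakeLoginsDb {
    fn wipe_local_login_data(&mut self, guid: &str) {
        self.logins_l.remove(guid);
        self.logins_m.remove(guid);
        // Dropping the last-sync timestamp forces a full fetch on the next
        // sync, which hopefully restores a server copy of the wiped login.
        self.meta.remove(LAST_SYNC_META_KEY);
    }
}

fn main() {
    let mut db = FakeLoginsDb {
        logins_l: HashMap::from([("guid-1".into(), "undecryptable".into())]),
        logins_m: HashMap::from([("guid-1".into(), "mirror copy".into())]),
        meta: HashMap::from([(LAST_SYNC_META_KEY.into(), 1_700_000_000_000)]),
    };
    db.wipe_local_login_data("guid-1");
    assert!(db.logins_l.is_empty());
    assert!(db.logins_m.is_empty());
    assert!(!db.meta.contains_key(LAST_SYNC_META_KEY));
    println!("login wiped; next sync will be a full fetch");
}
```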
```rust
username: "cool_username".into(),
password: "hunter2".into(),
..Default::default()
})
```
I think I probably broke this test and I'm not sure exactly how to fix it or even run it (do I need to create a burner sync account for this?).
Is this still worthwhile to keep? It seems a bit redundant with the TPS tests. I can definitely fix this if it's providing value, but I want to make sure someone's using it before I do that.
I don't think it's redundant with TPS, because TPS primarily tests the existing desktop code, so the Rust components don't get much coverage there; i.e., this was intended to be kind of a "TPS but for Rust components" :) It uses the same auth as the fxa example, so it should start an OAuth flow and remember the account in .fxa-cli-credentials.json (although I don't recall how it manages 2 accounts!). It doesn't have much value since it's not run automatically, which in that regard makes it even more like TPS (which also doesn't run in CI, so has questionable value, but we're reluctant to just kill it because there is some value there).

I'd actually be surprised if it hadn't already gone stale, so I'd be fine with just leaving a comment here explaining that you think it might be broken, with a reference to this PR, and we can have a longer discussion about it later.
You're right, it was already stale before this change. I tested on main and it also fails with:

```
thread 'main' (3767285) panicked at testing/sync-test/src/logins.rs:278:5:
assertion failed: c1.logins_store.delete(&l0id).expect("Delete should work")
```

I'm thinking we should leave the code as-is, create a ticket for it, and update the README.
I'm still not entirely convinced wiping is the right thing to do here. I recall we started a discussion but don't remember it concluding; I think we agreed to rope in the credentials management team. Did they say we should take this route? I'm fine with this if they did.
I'm not sure either, but my memory is that we agreed this was an okay short-term step, since it's essentially the same behavior as when the canary check fails. Long-term, we should have the discussion with more teams, and we may decide to go the other route: keep the DB row around and present it to the user as a login with missing data (or maybe some third route). There is actually some urgency here, since the logins success rates are pretty low and most of the failures are decryption errors. That said, I do think it's a good idea to get the OK from credential management on this. WYDT @jo? BTW, here's the bugzilla bug for the longer-term decision.
Oh, I think that's a tricky issue. I'd prefer to control this via a feature flag so we can enable it selectively; I'd be reluctant to have it enabled on desktop for now. We often have problems getting the key from the OSKeyStore as well. For form autofill data we had actually implemented a similar mechanism, and we removed it once we discovered what it did: if no key was returned, a new one was created and stored immediately, which effectively reset the credit card data. With OSKeyStore there are many different reasons why the key might not be available at the moment; https://sql.telemetry.mozilla.org/queries/113887 lists 29 different errors for Darwin alone. While this all pertains to OSKeyStore, I can imagine that similar issues exist on iOS or Android, and that keys may simply be temporarily unavailable under certain circumstances.

For desktop, we have at least roughly considered various future backup strategies, partly for other reasons as well. One option is via FxA, with manual user interaction. Another possibility is backup codes (as with GitHub or 1Password) or the primary password. In this context, I could also imagine that we'd at least wait to wipe the data until we're syncing, right? That way, there's a high probability the data will be restored immediately.

By the way, thanks a lot for the groundwork; I didn't dare touch that back then :) It's much cleaner this way.

So, to cut a long story short: if we put this behind a feature flag that we initially activate only for Android and/or iOS, I'd be okay with that as an interim solution.
Thanks for the thoughtful review. I wasn't thinking about desktop at all when I implemented this. I've added a feature flag for it; how does it look now?
jo left a comment
To take a step back: I’m not opposed to a pragmatic short-term workaround for mobile devices. I understand the urgency and have no problem with us opting for an interim approach, provided we explicitly state this compromise.
However, I want us to make sure that the short-term solution doesn’t create a situation that will be difficult to maintain.
My two main concerns are:
- `decrypt_fields` now has a permanent side effect. This makes the function harder to understand, since it no longer just decrypts fields and reports errors, but also modifies local storage and affects synchronization behavior.
- I'm not sure a per-login restore is the right approach. If the error is actually caused by an unavailable/corrupted key, it could be a systemic error rather than an error in a single record. In that case, deleting one login at a time could leave the local storage in a partially restored state and potentially lead to repeated synchronization behavior. I may be misinterpreting this part, so I would like to clarify the expected behavior.
Who else should be involved in deciding on the recovery strategy here? What do the mobile team members say about this?
```rust
self.execute("DELETE FROM loginsL WHERE guid=?", [guid])?;
self.execute("DELETE FROM loginsM WHERE guid=?", [guid])?;
// Delete the LAST_SYNC_META_KEY so that we resync the logins data next time
self.delete_meta(schema::LAST_SYNC_META_KEY)?;
```
Does this mean we re-sync all logins after a wipe of a single login?
Yeah, unfortunately. I don't think this can be improved much though, since the only tool we have here is passing the server a last_modified time. We could try setting that to the last-modified time of the deleted record, but I was a bit worried that would have some subtle bugs, and we'd still be fetching half the total records on average.
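A sketch of why deleting LAST_SYNC_META_KEY means refetching everything: the only lever the client has is "records modified since T", and a missing timestamp degrades to T = 0 (names here are illustrative, not the actual sync client API):

```rust
// Minimal model of an incremental fetch against a timestamp watermark.
struct ServerRecord {
    modified_ms: i64,
    guid: &'static str,
}

fn fetch_since(server: &[ServerRecord], last_sync_ms: Option<i64>) -> Vec<&'static str> {
    // No stored timestamp (e.g. after delete_meta) means "since the epoch".
    let since = last_sync_ms.unwrap_or(0);
    server
        .iter()
        .filter(|r| r.modified_ms > since)
        .map(|r| r.guid)
        .collect()
}

fn main() {
    let server = [
        ServerRecord { modified_ms: 100, guid: "a" },
        ServerRecord { modified_ms: 200, guid: "b" },
        ServerRecord { modified_ms: 300, guid: "c" },
    ];
    // Normal incremental sync: only records newer than the stored timestamp.
    assert_eq!(fetch_since(&server, Some(200)), vec!["c"]);
    // After wiping the last-sync meta: everything comes back down.
    assert_eq!(fetch_since(&server, None), vec!["a", "b", "c"]);
    println!("full resync fetched {} records", fetch_since(&server, None).len());
}
```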
```diff
 pub fn decrypt_fields(&self, db: &LoginDb) -> Result<SecureLoginFields> {
-    SecureLoginFields::decrypt(&self.sec_fields, db.encdec.as_ref(), &self.meta.id)
+    SecureLoginFields::decrypt(&self.sec_fields, db.encdec.as_ref(), &self.meta.id).inspect_err(
+        |e| {
+            #[cfg(feature = "wipe-on-decryption-error")]
+            if matches!(e, Error::DecryptionFailed(_)) {
+                if let Err(e) = db.wipe_local_login_data(&self.meta.id) {
+                    warn!("Failed to wipe login on decryption failure: {e}");
+                }
+            }
+        },
+    )
 }
```
I'm not sure if deleting individual entries during decrypt_fields is the right place to do this.
If the decryption error is caused by a corrupted or unavailable key, I would expect it to affect all entries, not just the current one. Since this happens implicitly during reading and the first decryption throws an error, a "Fetch-All" operation would delete only one entry at first and then sync all of them, and that would happen again on the next Fetch-All until all affected logins had been repaired, throwing an error (to the user?) every time.
It also seems as though this leaves the database in an inconsistent intermediate state: some logins have already been deleted/repaired, while others still exist but cannot be decrypted, even though the underlying problem is global.
Regardless, decrypt_fields now has a persistent side effect, which is surprising given the method’s name. I would expect this method to simply decrypt fields and report errors. Perhaps the caller should decide how to handle decryption errors, and if we want to delete data, this should be done in a more explicit recovery path.
I'm pretty sure that it's just one login. I wrote this in the bugzilla ticket:
> It seems like it's just 1 login generating many errors. When I filter by user ID I see lots of errors, but they all list the same user ID. I can't be sure that this is always the case, but I've sampled quite a few errors at this point. It seems unlikely to me that users have multiple logins with undecryptable data but only 1 shows up in the error reports, because I believe that, on Android at least, there are startup functions that iterate over all logins.
Also, we have the check_canary function that should detect global errors when we fail to fetch the key.
However, you're right that this does introduce the possibility that we just detect/wipe one login at a time when all the logins are undecryptable. Also, it is a bit weird that decrypt_fields can delete data and it's not so easy to fix that because it's called by lots of functions and many of those don't feel like they should delete data either.
What if we moved this code to run_maintenance()? That seems like a decent place to do a bulk search for undecryptable logins. It might be a bit slower to catch errors, but these errors seem pretty rare anyway.
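The run_maintenance() idea above could be sketched as one bulk pass that finds every undecryptable login and wipes them together, instead of deleting as a side effect of decrypt_fields() (all names and the `decryptable` flag are illustrative stand-ins for the real decryption call):

```rust
// Illustrative bulk-scan maintenance pass over an in-memory login list.
struct StoredLogin {
    guid: String,
    // Stand-in for "SecureLoginFields::decrypt succeeds for this row".
    decryptable: bool,
}

fn run_maintenance(logins: &mut Vec<StoredLogin>, delete_undecryptable: bool) -> usize {
    if !delete_undecryptable {
        return 0;
    }
    let before = logins.len();
    // One pass over every login: a systemic key problem is caught all at
    // once, rather than one login per read as with the decrypt_fields hook.
    logins.retain(|l| l.decryptable);
    before - logins.len()
}

fn main() {
    let mut logins = vec![
        StoredLogin { guid: "good-1".into(), decryptable: true },
        StoredLogin { guid: "bad-1".into(), decryptable: false },
        StoredLogin { guid: "bad-2".into(), decryptable: false },
    ];
    let wiped = run_maintenance(&mut logins, true);
    assert_eq!(wiped, 2);
    assert_eq!(logins.len(), 1);
    assert_eq!(logins[0].guid, "good-1");
    println!("wiped {wiped} undecryptable logins in one pass");
}
```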
> there are startup functions that iterate over all logins.
I assumed we exit on the first decryption error in Store::list(), but I'm not 100% sure. That would mean we always return the same failing login's uuid, even if more logins are affected.
Yes, I think run_maintenance() would be a perfect fit!
This was really easy since @lougeniaC64 already wrote this code. What do you both think of the new version? Also, I can split out the first refactoring commit as a separate PR if that would help.
That is indeed much simpler and clearer. And yes, the refactoring commit would really be useful!
Also:
* Set sync_status in `LoginsDb::touch()`
* Handle missing row for LAST_SYNC_META_KEY in `get_last_sync()`
```
dictionary RunMaintenanceOptions {
    // Wipe un-decryptable logins. These will hopefully come back on the next sync.
    // Defaults to true.
    boolean delete_undecryptable_records_for_remote_replacement;
```
I made this default to true since I think it's what we want on iOS and Android. Desktop might want false, but it's not calling runMaintenance yet AFAICT.
```rust
    pub delete_undecryptable_records_for_remote_replacement: bool,
}

impl Default for RunMaintenanceOptions {
```
Do you also want a default in the UDL? Foreign code that constructs this value on the foreign side will need to specify the field. No big deal today, but if this dictionary grows more fields, a UDL default will start to make sense.
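For reference, the Rust-side default under discussion looks essentially like this (matching the diff above; the desktop-style override in `main` is an illustrative usage, not code from the PR):

```rust
// Options for run_maintenance; the flag is opt-out on the Rust side,
// so Rust callers get wiping by default while desktop can pass false.
pub struct RunMaintenanceOptions {
    pub delete_undecryptable_records_for_remote_replacement: bool,
}

impl Default for RunMaintenanceOptions {
    fn default() -> Self {
        Self {
            delete_undecryptable_records_for_remote_replacement: true,
        }
    }
}

fn main() {
    // Mobile-style caller: takes the default (wipe on next maintenance run).
    let opts = RunMaintenanceOptions::default();
    assert!(opts.delete_undecryptable_records_for_remote_replacement);

    // Desktop-style caller: overrides explicitly.
    let desktop = RunMaintenanceOptions {
        delete_undecryptable_records_for_remote_replacement: false,
    };
    assert!(!desktop.delete_undecryptable_records_for_remote_replacement);
    println!("default wipes: {}", opts.delete_undecryptable_records_for_remote_replacement);
}
```

Note that this only covers values constructed in Rust; foreign code building the dictionary still has to supply the field unless the UDL grows its own default.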
First commit is prep work, without many changes.
Second commit is the actual change.