fix: Prevent and clean up zero-valued entries in Alpha storage maps during hotkey and coldkey swaps #2513
Status: Open

bittoby wants to merge 4 commits into opentensor:devnet-ready from bittoby:fix/remove-zero-alpha-entries
+547 −49
Commits:
- 2966490 fix: prevent zero-valued entries in Alpha storage maps and add multi-… (bittoby)
- 50cae94 Merge branch 'devnet-ready' of https://github.com/opentensor/subtenso… (bittoby)
- 110408f fix: Use weight-based dynamic batching in migration to prevent block … (bittoby)
- c3143d9 Fix cargo fmt formatting in migration (bittoby)
pallets/subtensor/src/migrations/migrate_remove_zero_alpha.rs (216 additions, 0 deletions)
```rust
use super::*;
use frame_support::{traits::Get, weights::Weight};
use log;
use scale_info::prelude::string::String;

/// The migration name used for the `HasMigrationRun` guard.
const MIGRATION_NAME: &[u8] = b"migrate_remove_zero_alpha_v2";

/// Called from `on_runtime_upgrade`. Schedules the cleanup by setting phase = 1
/// if the migration hasn't run yet. This is O(1) — no iteration.
pub fn migrate_remove_zero_alpha<T: Config>() -> Weight {
    let migration_name = MIGRATION_NAME.to_vec();
    let mut weight = T::DbWeight::get().reads(1);

    if HasMigrationRun::<T>::get(&migration_name) {
        log::info!(
            "Migration '{}' already completed. Skipping.",
            String::from_utf8_lossy(&migration_name)
        );
        return weight;
    }

    // Schedule the cleanup to run in on_idle by setting phase to 1
    ZeroAlphaCleanupPhase::<T>::put(1u8);
    weight = weight.saturating_add(T::DbWeight::get().writes(1));

    log::info!(
        "Migration '{}' scheduled. Will clean up zero entries via on_idle.",
        String::from_utf8_lossy(&migration_name)
    );

    weight
}

/// Called from `on_idle` each block. Uses `remaining_weight` to dynamically
/// bound how many entries to process. Stays on the same phase until all entries
/// in that map are cleaned, then advances to the next phase.
///
/// Phases:
/// 0 = inactive/complete
/// 1 = cleaning Alpha
/// 2 = cleaning TotalHotkeyShares
/// 3 = cleaning TotalHotkeyAlphaLastEpoch
/// 4 = cleaning AlphaDividendsPerSubnet
pub fn on_idle_remove_zero_alpha<T: Config>(remaining_weight: Weight) -> Weight {
    let phase = ZeroAlphaCleanupPhase::<T>::get();

    // Phase 0 means not active or already completed
    if phase == 0 {
        return Weight::zero();
    }

    // Minimum weight needed: 1 read (phase) + at least one iteration (read + write)
    let min_weight = T::DbWeight::get().reads_writes(2, 1);
    if remaining_weight.ref_time() < min_weight.ref_time() {
        return Weight::zero();
    }

    let mut weight = T::DbWeight::get().reads(1); // reading phase

    // Budget for batch work = remaining_weight minus overhead (phase read + phase write)
    let overhead = T::DbWeight::get().reads_writes(1, 1);
    let budget = remaining_weight.saturating_sub(overhead);

    match phase {
        1 => {
            let (consumed, removed, done) = clean_alpha_batch::<T>(budget);
            weight = weight.saturating_add(consumed);
            log::info!(
                "Zero-alpha cleanup phase 1 (Alpha): removed {removed} zero entries this batch. Done: {done}"
            );
            if done {
                ZeroAlphaCleanupPhase::<T>::put(2u8);
                weight = weight.saturating_add(T::DbWeight::get().writes(1));
            }
        }
        2 => {
            let (consumed, removed, done) = clean_total_hotkey_shares_batch::<T>(budget);
            weight = weight.saturating_add(consumed);
            log::info!(
                "Zero-alpha cleanup phase 2 (TotalHotkeyShares): removed {removed} zero entries this batch. Done: {done}"
            );
            if done {
                ZeroAlphaCleanupPhase::<T>::put(3u8);
                weight = weight.saturating_add(T::DbWeight::get().writes(1));
            }
        }
        3 => {
            let (consumed, removed, done) = clean_total_hotkey_alpha_last_epoch_batch::<T>(budget);
            weight = weight.saturating_add(consumed);
            log::info!(
                "Zero-alpha cleanup phase 3 (TotalHotkeyAlphaLastEpoch): removed {removed} zero entries this batch. Done: {done}"
            );
            if done {
                ZeroAlphaCleanupPhase::<T>::put(4u8);
                weight = weight.saturating_add(T::DbWeight::get().writes(1));
            }
        }
        4 => {
            let (consumed, removed, done) = clean_alpha_dividends_per_subnet_batch::<T>(budget);
            weight = weight.saturating_add(consumed);
            log::info!(
                "Zero-alpha cleanup phase 4 (AlphaDividendsPerSubnet): removed {removed} zero entries this batch. Done: {done}"
            );
            if done {
                // All phases complete — mark migration as done
                HasMigrationRun::<T>::insert(MIGRATION_NAME.to_vec(), true);
                ZeroAlphaCleanupPhase::<T>::put(0u8);
                weight = weight.saturating_add(T::DbWeight::get().writes(2));
                log::info!("Zero-alpha cleanup: All phases complete. Migration marked as done.");
            }
        }
        _ => {
            // Unknown phase, reset
            ZeroAlphaCleanupPhase::<T>::put(0u8);
            weight = weight.saturating_add(T::DbWeight::get().writes(1));
        }
    }

    weight
}

/// Remove zero-valued entries from Alpha, bounded by weight budget.
/// Returns (weight_consumed, entries_removed, is_done).
fn clean_alpha_batch<T: Config>(budget: Weight) -> (Weight, u64, bool) {
    let read_cost = T::DbWeight::get().reads(1);
    let write_cost = T::DbWeight::get().writes(1);
    let per_entry_max = read_cost.saturating_add(write_cost);
    let mut weight = Weight::zero();
    let mut removed = 0u64;

    for ((hotkey, coldkey, netuid), value) in Alpha::<T>::iter() {
        // Stop if not enough budget for one more entry (read + potential write)
        if weight.saturating_add(per_entry_max).any_gt(budget) {
            return (weight, removed, false);
        }
        weight = weight.saturating_add(read_cost);
        if value == 0 {
            Alpha::<T>::remove((hotkey, coldkey, netuid));
            weight = weight.saturating_add(write_cost);
            removed = removed.saturating_add(1);
        }
    }

    // Iterator exhausted — phase is done
    (weight, removed, true)
}

/// Remove zero-valued entries from TotalHotkeyShares, bounded by weight budget.
fn clean_total_hotkey_shares_batch<T: Config>(budget: Weight) -> (Weight, u64, bool) {
    let read_cost = T::DbWeight::get().reads(1);
    let write_cost = T::DbWeight::get().writes(1);
    let per_entry_max = read_cost.saturating_add(write_cost);
    let mut weight = Weight::zero();
    let mut removed = 0u64;

    for (hotkey, netuid, value) in TotalHotkeyShares::<T>::iter() {
        if weight.saturating_add(per_entry_max).any_gt(budget) {
            return (weight, removed, false);
        }
        weight = weight.saturating_add(read_cost);
        if value == 0 {
            TotalHotkeyShares::<T>::remove(hotkey, netuid);
            weight = weight.saturating_add(write_cost);
            removed = removed.saturating_add(1);
        }
    }

    (weight, removed, true)
}

/// Remove zero-valued entries from TotalHotkeyAlphaLastEpoch, bounded by weight budget.
fn clean_total_hotkey_alpha_last_epoch_batch<T: Config>(budget: Weight) -> (Weight, u64, bool) {
    let read_cost = T::DbWeight::get().reads(1);
    let write_cost = T::DbWeight::get().writes(1);
    let per_entry_max = read_cost.saturating_add(write_cost);
    let mut weight = Weight::zero();
    let mut removed = 0u64;

    for (hotkey, netuid, value) in TotalHotkeyAlphaLastEpoch::<T>::iter() {
        if weight.saturating_add(per_entry_max).any_gt(budget) {
            return (weight, removed, false);
        }
        weight = weight.saturating_add(read_cost);
        if value.is_zero() {
            TotalHotkeyAlphaLastEpoch::<T>::remove(hotkey, netuid);
            weight = weight.saturating_add(write_cost);
            removed = removed.saturating_add(1);
        }
    }

    (weight, removed, true)
}

/// Remove zero-valued entries from AlphaDividendsPerSubnet, bounded by weight budget.
fn clean_alpha_dividends_per_subnet_batch<T: Config>(budget: Weight) -> (Weight, u64, bool) {
    let read_cost = T::DbWeight::get().reads(1);
    let write_cost = T::DbWeight::get().writes(1);
    let per_entry_max = read_cost.saturating_add(write_cost);
    let mut weight = Weight::zero();
    let mut removed = 0u64;

    for (netuid, hotkey, value) in AlphaDividendsPerSubnet::<T>::iter() {
        if weight.saturating_add(per_entry_max).any_gt(budget) {
            return (weight, removed, false);
        }
        weight = weight.saturating_add(read_cost);
        if value.is_zero() {
            AlphaDividendsPerSubnet::<T>::remove(netuid, hotkey);
            weight = weight.saturating_add(write_cost);
            removed = removed.saturating_add(1);
        }
    }

    (weight, removed, true)
}
```
Since the actual cleanup executes in on_idle, I wonder whether we really need the migration part at all. For instance, we could set the initial value of ZeroAlphaCleanupPhase to 1 in storage; on_idle would then process it until the value reaches 0. Or we could use distinct values for the different steps to make the flow clearer. What's your opinion?
Good point. We could simplify by setting the default to 1, but the issue is: after cleanup completes and sets phase = 0, a node restart or fresh sync would read the default (1) again and re-trigger the cleanup. The current approach uses HasMigrationRun as a permanent guard to prevent this.
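The two-flag design described in the reply can be sketched in isolation. This is a hypothetical model, not pallet code: `CleanupState`, `schedule`, and `step` are invented names, and the phase machine is collapsed to one transition per call. The point it demonstrates is that the permanent `has_run` flag (analogous to HasMigrationRun) keeps a later `schedule` call from restarting a completed cleanup, which a phase value alone cannot guarantee.

```rust
/// Minimal model of the phase machine plus permanent guard.
#[derive(Default)]
struct CleanupState {
    phase: u8,      // 0 = inactive/complete, 1..=4 = cleaning one map
    has_run: bool,  // permanent guard, analogous to HasMigrationRun
}

impl CleanupState {
    /// Models on_runtime_upgrade: schedule only if the cleanup never ran.
    fn schedule(&mut self) {
        if !self.has_run {
            self.phase = 1;
        }
    }

    /// Models on_idle: advance one phase per call; mark done after the last.
    fn step(&mut self) {
        match self.phase {
            0 => {}                       // nothing to do
            1..=3 => self.phase += 1,     // finish one map, move to the next
            _ => {
                // Last phase finished: set the permanent guard and deactivate.
                self.has_run = true;
                self.phase = 0;
            }
        }
    }
}

fn main() {
    let mut state = CleanupState::default();
    state.schedule();
    for _ in 0..4 {
        state.step();
    }
    // A second schedule (e.g. after a restart) must not restart the cleanup.
    state.schedule();
    println!("{} {}", state.phase, state.has_run); // → 0 true
}
```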
@open-junius what do you think about it? Should I still update it?