Skip to content

Conversation

@jiahuili430
Copy link
Contributor

@jiahuili430 jiahuili430 commented Feb 6, 2026

Overview

  • If there is no checkpoint and no since_seq, then replicate from scratch.
  • If there is no checkpoint but since_seq is defined, then use the value of
    since_seq field to replicate.
  • If both checkpoint and since_seq exist, use the checkpoint to replicate.

Testing recommendations

Related Issues or Pull Requests

Checklist

  • This is my own work, I did not use AI, LLM's or similar technology
  • Code is written and works correctly
  • Changes are covered by tests
  • Any new configurable parameters are documented in rel/overlay/etc/default.ini
  • Documentation changes were made in the src/docs folder
  • Documentation changes were backported (separated PR) to affected branches

@jiahuili430 jiahuili430 changed the title Improve replication Improve replication since_seq parameter Feb 6, 2026

StartSeq1 = get_value(since_seq, Options, StartSeq0),
StartSeq = {0, StartSeq1},
StartSeq2 =
Copy link
Contributor

@nickva nickva Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't in general compare sequences. But we can compare with 0 if we're starting from scratch. We should use == 0.

Based on the discussion in #5867 we'd want something like this:

If there is a checkpoint (StartSeq0 =/= 0) use the checkpoint and ignore since_seq. Otherwise (no checkpoint) and user specified a since_seq, use the since_seq

In addition make sure the since_seq is folded into the replication ID (append to the list if it's there but if it isn't make sure to generate the same replication ID as before). This will rewind changes for a existing replications with since_seq back to 0. So it's something to warn the users about in the release notes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If both checkpoint and since_seq exist, may I compare their prefixes and then decide which one to use? Like in the test case of t_replicate_with_checkpoint_and_since_seq().

compare_rep_seq(Seq1, Seq2) ->
    [Prefix1 | _] = binary:split(Seq1, <<"-">>),
    [Prefix2 | _] = binary:split(Seq2, <<"-">>),
    binary_to_integer(Prefix1) > binary_to_integer(Prefix2).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately we can't compare sequences reliably that way. If we were to compare, we could get the number of pending changes with both sequences with two separate extra requests but even that is not as reliable. Instead, I think the best bet is to go with what we discussed in #5867 - mix the since_seq into the replication ID generation hash and not use a since_seq if we find a checkpoint (we can always compare with a 0 I think).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the code and tests, thank you for the review!

@jiahuili430 jiahuili430 force-pushed the improve-replication branch 6 times, most recently from 7ed7b9c to d171417 Compare February 10, 2026 20:35
- If there is no checkpoint and no `since_seq`, then replicate from scratch.
- If there is no checkpoint but `since_seq` is defined, then use the value of
  `since_seq` field to replicate.
- If both checkpoint and `since_seq` exist, use the checkpoint to replicate.
@jiahuili430 jiahuili430 marked this pull request as ready for review February 10, 2026 22:43
[<<"winning_revs_only">>]
end,
couch_util:to_hex(couch_hash:md5_hash(?term_to_bin(Base3))).
Base4 =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For coverage might be a good idea to add a few tests in this module some with and without since_seq ensuring we get a new revid on a since_seq.

% StartSeq1: If the `since_seq` field is defined, it's equal to the
% value of that field; otherwise, it's equal to StartSeq0.
StartSeq2 =
case StartSeq0 =:= 0 of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might be able to get away with one less StartSeq var if we do

StartSeq1 = case StartSeq0 of
    0 -> StartSeq0; 
    _ -> get_value(since_seq, Options, StartSeq0)
end

What do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants