Why is the context length in the e2e phase twice that of the block-ap phase? Has this been validated by ablation experiments?