Possibly add the SWE-CI benchmark: https://arxiv.org/abs/2603.03823