Describe the solution you'd like
Right now PyVRP's CLI computes a few aggregate measures: average iterations, average runtime, and average objective. These simple statistics say little about actual solution quality, because they do not compare against best-known solutions (BKSs). Statistics I am also interested in include, e.g., the average/minimum gap to the BKS, the average/maximum primal integral with respect to the BKS, and possibly more I have not yet considered.
We could incorporate this into the benchmark output by adding a "BKS list" argument, which takes BKSs in the same order as the given instances. These BKSs would then be used to compute the statistics listed above.
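As a rough sketch of what these statistics could look like, the helpers below compute the percentage gap to a BKS and the primal integral over an incumbent trajectory. The function names and the `(time, objective)` trajectory format are assumptions for illustration, not part of PyVRP's actual API:

```python
# Hypothetical helpers for BKS-based benchmark statistics. Names and data
# formats are assumptions, not PyVRP's actual CLI internals.


def gap(objective: float, bks: float) -> float:
    """Percentage gap of an objective value to the best-known solution."""
    return 100 * (objective - bks) / bks


def primal_integral(trajectory: list[tuple[float, float]],
                    bks: float,
                    runtime: float) -> float:
    """
    Primal integral w.r.t. the BKS, where ``trajectory`` is a list of
    (time, objective) pairs marking incumbent improvements. Each incumbent's
    gap is weighted by the time it remained the incumbent, up to ``runtime``.
    """
    # Pair each incumbent with the time the next incumbent (or the end of
    # the run) takes over, and integrate the gap over that interval.
    ends = [t for t, _ in trajectory[1:]] + [runtime]
    return sum(gap(obj, bks) * (end - start)
               for (start, obj), end in zip(trajectory, ends))
```

Aggregating these per-instance values into avg/min gap and avg/max primal integral across the benchmark set is then a matter of a few `statistics.mean`/`min`/`max` calls over the results.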