Hi all,
Thanks for maintaining this benchmark! I have a suggestion regarding the current Poisson MCV evaluation for the denoising task.
Current Metric: Global-Scale Poisson Evaluation
In the current metric implementation, the predicted denoised count matrix is rescaled by a single global dataset-level factor before the Poisson negative log-likelihood (NLL) is computed. Specifically:
$$\hat{X}_{\,\mathrm{scaled}} = \hat{X} \cdot \frac{\text{total held-out counts}}{\text{total train counts}}$$
Then, the metric is computed as the mean of
$$\hat{X}_{\,\mathrm{scaled}} - Y \log (\hat{X}_{\,\mathrm{scaled}} + 10^{-6})$$
where $Y$ is the held-out (test) split of the count matrix.
In this design, both gene composition and overall count magnitude (i.e., sequencing depth) at the dataset level are considered prediction targets.
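To make sure I am reading the metric correctly, here is a minimal NumPy sketch of the computation as I described it above. It is not the benchmark's actual code: the function name, the argument names, and the dense cells-by-genes array layout are my own assumptions for illustration.

```python
import numpy as np


def global_scale_poisson_nll(denoised, train, test, eps=1e-6):
    """Poisson NLL after one dataset-level rescaling (illustrative sketch).

    All inputs are assumed to be dense arrays of shape (n_cells, n_genes).
    """
    denoised = np.asarray(denoised, dtype=float)
    train = np.asarray(train, dtype=float)
    test = np.asarray(test, dtype=float)

    # Single global factor: total held-out counts / total train counts.
    scale = test.sum() / train.sum()
    scaled = denoised * scale

    # Mean over all entries of  X_scaled - Y * log(X_scaled + eps).
    return float(np.mean(scaled - test * np.log(scaled + eps)))
```

Because the factor is shared by all cells, multiplying the denoised matrix by a constant, or by different constants per cell, changes the score even when each cell's gene composition is unchanged.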
Potential Problem with the Current Metric
A possible issue with the global-scale Poisson evaluation is that it penalizes methods that do not aim to correct for depth. Many downstream scRNA-seq analyses are performed on normalized expression values, where library-size (depth) effects are intentionally removed, so a method can recover gene composition well and still score poorly because its output is on a different per-cell scale.
For example, iterative kNN-smoothing outputs values summed across different smoothing scales, without normalization. I think this could be one reason it performs poorly in your online benchmarking results.
Potential Alternative: Cell-Wise Scaled Poisson Evaluation
Another useful option would be to evaluate each cell after scaling its denoised output to that cell’s expected held-out depth.
For each cell $i$:
- Take the denoised output $\hat{x}_i$.
- Convert to gene proportions and rescale to the expected held-out depth for that cell:
$$\pi_{ig} = \hat{x}_{ig} \, \frac{\sum_{g'} x_{ig'}^{(\mathrm{val})}}{\sum_{g'} \hat{x}_{ig'}}$$
- Compute Poisson NLL against the held-out counts for the same cell.
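The steps above can be sketched as follows. This is only an illustration of the proposal, not a drop-in implementation: the function names, the cells-by-genes dense layout, the reuse of the same `1e-6` offset inside the log, and the handling of cells whose denoised output sums to zero are all my own assumptions.

```python
import numpy as np


def rescale_to_heldout_depth(denoised, test):
    """Rescale each cell's denoised output to that cell's held-out depth."""
    denoised = np.asarray(denoised, dtype=float)
    test = np.asarray(test, dtype=float)

    pred_depth = denoised.sum(axis=1, keepdims=True)  # sum_g x_hat[i, g]
    test_depth = test.sum(axis=1, keepdims=True)      # sum_g x_val[i, g]

    # Assumption: a cell with an all-zero prediction stays all-zero
    # instead of triggering a division by zero.
    safe_depth = np.where(pred_depth > 0, pred_depth, 1.0)
    return denoised * test_depth / safe_depth


def cellwise_scaled_poisson_nll(denoised, test, eps=1e-6):
    """Mean Poisson NLL after cell-wise depth rescaling (illustrative)."""
    test = np.asarray(test, dtype=float)
    pi = rescale_to_heldout_depth(denoised, test)
    return float(np.mean(pi - test * np.log(pi + eps)))
```

With this variant, multiplying any cell's denoised output by a positive constant leaves the score unchanged, so only the predicted gene composition within each cell is evaluated.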
Thanks again for the benchmark!