Recent years have witnessed remarkable achievements in perceptual image restoration (IR), creating an urgent demand for accurate image quality assessment (IQA), which is essential for both performance comparison and algorithm optimization. Unfortunately, existing IQA metrics exhibit inherent weaknesses on IR tasks, particularly when distinguishing fine-grained quality differences among restored images. To address this dilemma, we contribute the first-of-its-kind fine-grained IQA dataset for image restoration, termed FGRestore, comprising 18,408 restored images across six common IR tasks. Beyond conventional scalar quality scores, FGRestore is also annotated with 30,886 fine-grained pairwise preferences. Based on FGRestore, we conduct a comprehensive benchmark of existing IQA metrics, which reveals significant inconsistencies between score-based IQA evaluations and fine-grained restoration quality. Motivated by these findings, we further propose FGResQ, a new IQA model specifically designed for image restoration, which features both coarse-grained score regression and fine-grained quality ranking. Extensive experiments and comparisons demonstrate that FGResQ significantly outperforms state-of-the-art IQA metrics. Data and code will be made publicly available.
In image restoration tasks, both algorithm comparison and optimization frequently involve evaluating images with subtle quality differences. Algorithm comparison requires distinguishing between restoration results with marginal quality gaps, while parameter optimization involves incremental quality changes that demand sensitive assessment methods to identify optimal configurations. To investigate whether existing IQA methods can objectively capture fine-grained quality differences in IR tasks, we conduct a comprehensive computational analysis on established IQA datasets. Specifically, we evaluate state-of-the-art IQA methods to assess their fine-grained discrimination capabilities.
Performance comparison across different MOS ranges on PIPAL dataset.
| Type | Method | [0.0,0.2) SRCC | [0.2,0.4) SRCC | [0.4,0.6) SRCC | [0.6,0.8) SRCC | [0.8,1.0] SRCC | Overall SRCC |
|---|---|---|---|---|---|---|---|
| FR | PSNR | 0.323 | 0.082 | 0.209 | 0.161 | 0.072 | 0.422 |
| FR | SSIM | 0.293 | 0.108 | 0.258 | 0.254 | 0.049 | 0.530 |
| FR | LPIPS | -0.034 | 0.077 | 0.325 | 0.287 | 0.124 | 0.612 |
| FR | DISTS | 0.168 | 0.159 | 0.310 | 0.242 | 0.165 | 0.585 |
| NR | NIQE | -0.126 | -0.002 | 0.107 | 0.001 | 0.080 | 0.153 |
| NR | IL-NIQE | -0.235 | -0.098 | 0.126 | 0.128 | 0.054 | 0.289 |
| NR | BRISQUE | -0.142 | 0.025 | 0.125 | 0.035 | 0.131 | 0.185 |
| NR | DB-CNN | -0.157 | 0.321 | 0.353 | 0.330 | -0.016 | 0.636 |
| NR | HyperIQA | 0.100 | 0.274 | 0.314 | 0.292 | 0.032 | 0.584 |
| NR | MetaIQA | 0.037 | 0.160 | 0.204 | 0.174 | -0.101 | 0.423 |
| NR | LIQE | -0.232 | 0.053 | 0.175 | 0.299 | 0.107 | 0.479 |
| NR | CLIP-IQA | -0.152 | 0.211 | 0.238 | 0.293 | 0.071 | 0.530 |
| NR | Q-Align | 0.230 | 0.301 | 0.337 | 0.213 | 0.178 | 0.418 |
| NR | DeQA-Score | 0.568 | 0.676 | 0.623 | 0.516 | 0.350 | 0.747 |
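The binned analysis in the table above can be reproduced with a short script. This is a minimal sketch, assuming `mos` holds normalized MOS values in [0, 1] and `pred` holds a metric's predictions for the same images; rank ties are ignored for brevity.

```python
import numpy as np

def srcc(x, y):
    """Spearman rank correlation (no tie handling, for brevity)."""
    def ranks(v):
        order = np.argsort(v)
        r = np.empty(len(v))
        r[order] = np.arange(len(v))
        return r
    return float(np.corrcoef(ranks(np.asarray(x, float)),
                             ranks(np.asarray(y, float)))[0, 1])

def binned_srcc(mos, pred, edges=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)):
    """SRCC of a metric within each MOS range, plus the overall SRCC."""
    mos, pred = np.asarray(mos, float), np.asarray(pred, float)
    out = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        # the last bin is closed on the right: [0.8, 1.0]
        mask = (mos >= lo) & ((mos < hi) if hi < edges[-1] else (mos <= hi))
        if mask.sum() >= 2:  # correlation needs at least two samples
            out[(lo, hi)] = srcc(mos[mask], pred[mask])
    out["overall"] = srcc(mos, pred)
    return out
```

A metric can score well overall yet collapse inside a narrow MOS bin, which is exactly the failure mode the per-range columns expose.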
Based on FGRestore, we propose FGResQ, a new fine-grained image quality assessment model for perceptual image restoration evaluation. FGResQ consists of two main components: (a) Degradation-aware Feature Learning that incorporates restoration task knowledge to enable unified evaluation across multiple IR tasks, and (b) Dual-branch Quality Prediction that simultaneously handles both coarse-grained score regression and fine-grained pairwise ranking.
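The dual-branch objective can be illustrated with a toy loss function. This is a sketch under assumed loss choices (L1 score regression plus a pairwise margin ranking loss); the paper's exact formulation may differ, and `margin` and `alpha` are hypothetical hyperparameters.

```python
import numpy as np

def dual_branch_loss(score_pred, score_gt, pred_a, pred_b, pref,
                     margin=0.1, alpha=1.0):
    """Toy combined objective: coarse-grained L1 score regression plus a
    fine-grained pairwise margin ranking loss.
    pref[i] = +1 if image a of pair i is preferred over image b, else -1."""
    score_pred = np.asarray(score_pred, float)
    score_gt = np.asarray(score_gt, float)
    pred_a, pred_b = np.asarray(pred_a, float), np.asarray(pred_b, float)
    pref = np.asarray(pref, float)
    l_reg = np.mean(np.abs(score_pred - score_gt))  # score-regression branch
    # hinge penalty when the predicted gap disagrees with (or undershoots) the preference
    l_rank = np.mean(np.maximum(0.0, margin - pref * (pred_a - pred_b)))
    return float(l_reg + alpha * l_rank)
```

The ranking branch is what lets training exploit the 30,886 pairwise preferences directly, instead of forcing every supervision signal through a scalar score.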
Performance comparison on FGRestore dataset across different IR tasks. "-" indicates no reference images available.
| Type | Method | Pub. | Deblurring SRCC | Deblurring PLCC | Deblurring ACC | Denoising SRCC | Denoising PLCC | Denoising ACC | Deraining SRCC | Deraining PLCC | Deraining ACC | Dehazing SRCC | Dehazing PLCC | Dehazing ACC | Mixture SRCC | Mixture PLCC | Mixture ACC | SR SRCC | SR PLCC | SR ACC | Avg. SRCC | Avg. PLCC | Avg. ACC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FR | PSNR | - | 0.187 | 0.167 | 0.634 | 0.487 | 0.482 | 0.775 | - | - | - | - | - | - | 0.280 | 0.248 | 0.643 | 0.296 | 0.303 | 0.624 | 0.313 | 0.300 | 0.669 |
| FR | SSIM | TIP'04 | 0.441 | 0.348 | 0.695 | 0.642 | 0.652 | 0.789 | - | - | - | - | - | - | 0.421 | 0.379 | 0.684 | 0.351 | 0.361 | 0.641 | 0.464 | 0.435 | 0.702 |
| FR | LPIPS | CVPR'18 | 0.776 | 0.700 | 0.755 | 0.673 | 0.680 | 0.765 | - | - | - | - | - | - | 0.466 | 0.475 | 0.658 | 0.448 | 0.460 | 0.666 | 0.591 | 0.579 | 0.711 |
| FR | DISTS | TPAMI'20 | 0.907 | 0.901 | 0.845 | 0.679 | 0.672 | 0.739 | - | - | - | - | - | - | 0.495 | 0.464 | 0.644 | 0.482 | 0.488 | 0.658 | 0.640 | 0.631 | 0.721 |
| NR | NIQE | SPL'12 | 0.382 | 0.410 | 0.529 | 0.240 | 0.151 | 0.564 | 0.030 | 0.057 | 0.694 | 0.036 | 0.063 | 0.446 | 0.041 | 0.027 | 0.541 | 0.176 | 0.072 | 0.503 | 0.151 | 0.130 | 0.546 |
| NR | BRISQUE | TIP'12 | 0.354 | 0.354 | 0.555 | 0.254 | 0.114 | 0.569 | 0.112 | 0.102 | 0.603 | 0.132 | 0.088 | 0.436 | 0.201 | 0.064 | 0.575 | 0.142 | 0.040 | 0.514 | 0.199 | 0.127 | 0.542 |
| NR | DB-CNN | TCSVT'20 | 0.788 | 0.786 | 0.688 | 0.478 | 0.431 | 0.611 | 0.243 | 0.259 | 0.437 | 0.643 | 0.645 | 0.524 | 0.415 | 0.460 | 0.614 | 0.459 | 0.454 | 0.643 | 0.504 | 0.506 | 0.586 |
| NR | HyperIQA | CVPR'20 | 0.871 | 0.887 | 0.402 | 0.605 | 0.625 | 0.675 | 0.264 | 0.294 | 0.484 | 0.643 | 0.674 | 0.409 | 0.523 | 0.535 | 0.499 | 0.437 | 0.433 | 0.538 | 0.557 | 0.574 | 0.501 |
| NR | CLIP-IQA | AAAI'23 | 0.867 | 0.785 | 0.765 | 0.474 | 0.440 | 0.625 | 0.241 | 0.221 | 0.349 | 0.547 | 0.499 | 0.546 | 0.302 | 0.300 | 0.580 | 0.244 | 0.186 | 0.571 | 0.446 | 0.405 | 0.573 |
| NR | Q-Align | ICML'24 | 0.767 | 0.804 | 0.795 | 0.676 | 0.687 | 0.731 | 0.433 | 0.421 | 0.455 | 0.715 | 0.765 | 0.584 | 0.569 | 0.571 | 0.658 | 0.376 | 0.366 | 0.662 | 0.589 | 0.603 | 0.648 |
| NR | DeQA-Score | CVPR'25 | 0.815 | 0.843 | 0.819 | 0.754 | 0.771 | 0.778 | 0.507 | 0.576 | 0.426 | 0.762 | 0.803 | 0.644 | 0.669 | 0.679 | 0.697 | 0.561 | 0.573 | 0.718 | 0.678 | 0.707 | 0.680 |
| NR | Compare2Score | NeurIPS'24 | 0.769 | 0.813 | 0.757 | 0.661 | 0.679 | 0.687 | 0.074 | 0.108 | 0.790 | 0.334 | 0.381 | 0.436 | 0.494 | 0.519 | 0.635 | 0.317 | 0.315 | 0.607 | 0.441 | 0.469 | 0.652 |
| NR | FGResQ | - | 0.926 | 0.910 | 0.873 | 0.759 | 0.777 | 0.760 | 0.496 | 0.518 | 0.778 | 0.821 | 0.854 | 0.669 | 0.698 | 0.706 | 0.713 | 0.521 | 0.536 | 0.721 | 0.703 | 0.717 | 0.752 |
Underline indicates runner-up, bold indicates best.
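The ACC columns report pairwise ranking accuracy: the fraction of annotated preference pairs whose predicted ordering agrees with the human label. A minimal sketch, where the tie-handling convention (ties count as wrong) is an assumption:

```python
def pairwise_accuracy(pred_a, pred_b, pref):
    """Fraction of pairs where the sign of (pred_a - pred_b) matches the
    human preference label pref (+1: a preferred, -1: b preferred).
    Ties (equal predictions) count as wrong here -- an assumed convention."""
    correct = sum(1 for a, b, p in zip(pred_a, pred_b, pref) if (a - b) * p > 0)
    return correct / len(pref)
```

Unlike SRCC/PLCC, this measure is computed only on the annotated fine-grained pairs, so it directly probes discrimination between near-identical restorations.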
[Qualitative example galleries for the six IR tasks: Deblurring, Dehazing, Denoising, Deraining, Mixture Restoration, Super-Resolution]
@article{sheng2025fg,
title={Fine-grained Image Quality Assessment for Perceptual Image Restoration},
author={Sheng, Xiangfei and Pan, Xiaofeng and Yang, Zhichao and Chen, Pengfei and Li, Leida},
journal={arXiv preprint arXiv:2508.14475},
year={2025}
}