Recent years have witnessed remarkable achievements in perceptual image restoration (IR), creating an urgent demand for accurate image quality assessment (IQA), which is essential for both performance comparison and algorithm optimization. Unfortunately, existing IQA metrics exhibit inherent weaknesses on IR tasks, particularly when distinguishing fine-grained quality differences among restored images. To address this dilemma, we contribute the first-of-its-kind fine-grained IQA dataset for image restoration, termed FGRestore, comprising 18,408 restored images across six common IR tasks. Beyond conventional scalar quality scores, FGRestore is also annotated with 30,886 fine-grained pairwise preferences. Based on FGRestore, we conduct a comprehensive benchmark of existing IQA metrics, which reveals significant inconsistencies between score-based IQA evaluations and fine-grained restoration quality. Motivated by these findings, we further propose FGResQ, a new IQA model specifically designed for image restoration, which features both coarse-grained score regression and fine-grained quality ranking. Extensive experiments and comparisons demonstrate that FGResQ significantly outperforms state-of-the-art IQA metrics. Data and code will be made publicly available.
In image restoration tasks, both algorithm comparison and optimization frequently involve evaluating images with subtle quality differences. Algorithm comparison requires distinguishing between restoration results with marginal quality gaps, while parameter optimization involves incremental quality changes that demand sensitive assessment methods to identify optimal configurations. To investigate whether existing IQA methods can objectively capture fine-grained quality differences in IR tasks, we conduct a comprehensive computational analysis on established IQA datasets. Specifically, we evaluate state-of-the-art IQA methods to assess their fine-grained discrimination capabilities.
Performance comparison across different MOS ranges on PIPAL dataset.
| Type | Method | [0.0,0.2) SRCC | [0.2,0.4) SRCC | [0.4,0.6) SRCC | [0.6,0.8) SRCC | [0.8,1.0] SRCC | Overall SRCC |
|---|---|---|---|---|---|---|---|
| FR | PSNR | 0.323 | 0.082 | 0.209 | 0.161 | 0.072 | 0.422 |
| FR | SSIM | 0.293 | 0.108 | 0.258 | 0.254 | 0.049 | 0.530 |
| FR | LPIPS | -0.034 | 0.077 | 0.325 | 0.287 | 0.124 | 0.612 |
| FR | DISTS | 0.168 | 0.159 | 0.310 | 0.242 | 0.165 | 0.585 |
| NR | NIQE | -0.126 | -0.002 | 0.107 | 0.001 | 0.080 | 0.153 |
| NR | IL-NIQE | -0.235 | -0.098 | 0.126 | 0.128 | 0.054 | 0.289 |
| NR | BRISQUE | -0.142 | 0.025 | 0.125 | 0.035 | 0.131 | 0.185 |
| NR | DB-CNN | -0.157 | 0.321 | 0.353 | 0.330 | -0.016 | 0.636 |
| NR | HyperIQA | 0.100 | 0.274 | 0.314 | 0.292 | 0.032 | 0.584 |
| NR | MetaIQA | 0.037 | 0.160 | 0.204 | 0.174 | -0.101 | 0.423 |
| NR | LIQE | -0.232 | 0.053 | 0.175 | 0.299 | 0.107 | 0.479 |
| NR | CLIP-IQA | -0.152 | 0.211 | 0.238 | 0.293 | 0.071 | 0.530 |
| NR | Q-Align | 0.230 | 0.301 | 0.337 | 0.213 | 0.178 | 0.418 |
| NR | DeQA-Score | 0.568 | 0.676 | 0.623 | 0.516 | 0.350 | 0.747 |
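The binned analysis in the table above can be reproduced with a short script. This is a minimal sketch, assuming `mos` holds normalized MOS values in [0, 1] and `pred` holds a metric's predictions for the same images; rank ties are ignored for brevity.

```python
import numpy as np

def srcc(x, y):
    """Spearman rank correlation (no tie handling, for brevity)."""
    def ranks(v):
        order = np.argsort(v)
        r = np.empty(len(v))
        r[order] = np.arange(len(v))
        return r
    return float(np.corrcoef(ranks(np.asarray(x, float)),
                             ranks(np.asarray(y, float)))[0, 1])

def binned_srcc(mos, pred, edges=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)):
    """SRCC of a metric within each MOS range, plus the overall SRCC."""
    mos, pred = np.asarray(mos, float), np.asarray(pred, float)
    out = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        # the last bin is closed on the right: [0.8, 1.0]
        mask = (mos >= lo) & ((mos < hi) if hi < edges[-1] else (mos <= hi))
        if mask.sum() >= 2:  # correlation needs at least two samples
            out[(lo, hi)] = srcc(mos[mask], pred[mask])
    out["overall"] = srcc(mos, pred)
    return out
```

A metric can score well overall yet collapse inside a narrow MOS bin, which is exactly the failure mode the per-range columns expose.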
Based on FGRestore, we propose FGResQ, a new fine-grained image quality assessment model for perceptual image restoration evaluation. FGResQ consists of two main components: (a) Degradation-aware Feature Learning that incorporates restoration task knowledge to enable unified evaluation across multiple IR tasks, and (b) Dual-branch Quality Prediction that simultaneously handles both coarse-grained score regression and fine-grained pairwise ranking.
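The dual-branch objective can be illustrated with a toy loss function. This is a sketch under assumed loss choices (L1 score regression plus a pairwise margin ranking loss); the paper's exact formulation may differ, and `margin` and `alpha` are hypothetical hyperparameters.

```python
import numpy as np

def dual_branch_loss(score_pred, score_gt, pred_a, pred_b, pref,
                     margin=0.1, alpha=1.0):
    """Toy combined objective: coarse-grained L1 score regression plus a
    fine-grained pairwise margin ranking loss.
    pref[i] = +1 if image a of pair i is preferred over image b, else -1."""
    score_pred = np.asarray(score_pred, float)
    score_gt = np.asarray(score_gt, float)
    pred_a, pred_b = np.asarray(pred_a, float), np.asarray(pred_b, float)
    pref = np.asarray(pref, float)
    l_reg = np.mean(np.abs(score_pred - score_gt))  # score-regression branch
    # hinge penalty when the predicted gap disagrees with (or undershoots) the preference
    l_rank = np.mean(np.maximum(0.0, margin - pref * (pred_a - pred_b)))
    return float(l_reg + alpha * l_rank)
```

The ranking branch is what lets training exploit the 30,886 pairwise preferences directly, instead of forcing every supervision signal through a scalar score.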
Performance comparison on FGRestore dataset across different IR tasks. "-" indicates no reference images available.
| Type | Method | Pub. | Deblurring SRCC | Deblurring PLCC | Deblurring ACC | Denoising SRCC | Denoising PLCC | Denoising ACC | Deraining SRCC | Deraining PLCC | Deraining ACC | Dehazing SRCC | Dehazing PLCC | Dehazing ACC | Mixture SRCC | Mixture PLCC | Mixture ACC | SR SRCC | SR PLCC | SR ACC | Avg. SRCC | Avg. PLCC | Avg. ACC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FR | PSNR | - | 0.187 | 0.167 | 0.634 | 0.487 | 0.482 | 0.775 | - | - | - | - | - | - | 0.280 | 0.248 | 0.643 | 0.296 | 0.303 | 0.624 | 0.313 | 0.300 | 0.669 |
| FR | SSIM | TIP'04 | 0.441 | 0.348 | 0.695 | 0.642 | 0.652 | 0.789 | - | - | - | - | - | - | 0.421 | 0.379 | 0.684 | 0.351 | 0.361 | 0.641 | 0.464 | 0.435 | 0.702 |
| FR | LPIPS | CVPR'18 | 0.776 | 0.700 | 0.755 | 0.673 | 0.680 | 0.765 | - | - | - | - | - | - | 0.466 | 0.475 | 0.658 | 0.448 | 0.460 | 0.666 | 0.591 | 0.579 | 0.711 |
| FR | DISTS | TPAMI'20 | 0.907 | 0.901 | 0.845 | 0.679 | 0.672 | 0.739 | - | - | - | - | - | - | 0.495 | 0.464 | 0.644 | 0.482 | 0.488 | 0.658 | 0.640 | 0.631 | 0.721 |
| NR | NIQE | SPL'12 | 0.382 | 0.410 | 0.529 | 0.240 | 0.151 | 0.564 | 0.030 | 0.057 | 0.694 | 0.036 | 0.063 | 0.446 | 0.041 | 0.027 | 0.541 | 0.176 | 0.072 | 0.503 | 0.151 | 0.130 | 0.546 |
| NR | BRISQUE | TIP'12 | 0.354 | 0.354 | 0.555 | 0.254 | 0.114 | 0.569 | 0.112 | 0.102 | 0.603 | 0.132 | 0.088 | 0.436 | 0.201 | 0.064 | 0.575 | 0.142 | 0.040 | 0.514 | 0.199 | 0.127 | 0.542 |
| NR | DB-CNN | TCSVT'20 | 0.788 | 0.786 | 0.688 | 0.478 | 0.431 | 0.611 | 0.243 | 0.259 | 0.437 | 0.643 | 0.645 | 0.524 | 0.415 | 0.460 | 0.614 | 0.459 | 0.454 | 0.643 | 0.504 | 0.506 | 0.586 |
| NR | HyperIQA | CVPR'20 | 0.871 | 0.887 | 0.402 | 0.605 | 0.625 | 0.675 | 0.264 | 0.294 | 0.484 | 0.643 | 0.674 | 0.409 | 0.523 | 0.535 | 0.499 | 0.437 | 0.433 | 0.538 | 0.557 | 0.574 | 0.501 |
| NR | CLIP-IQA | AAAI'23 | 0.867 | 0.785 | 0.765 | 0.474 | 0.440 | 0.625 | 0.241 | 0.221 | 0.349 | 0.547 | 0.499 | 0.546 | 0.302 | 0.300 | 0.580 | 0.244 | 0.186 | 0.571 | 0.446 | 0.405 | 0.573 |
| NR | Q-Align | ICML'24 | 0.767 | 0.804 | 0.795 | 0.676 | 0.687 | 0.731 | 0.433 | 0.421 | 0.455 | 0.715 | 0.765 | 0.584 | 0.569 | 0.571 | 0.658 | 0.376 | 0.366 | 0.662 | 0.589 | 0.603 | 0.648 |
| NR | DeQA-Score | CVPR'25 | 0.815 | 0.843 | 0.819 | 0.754 | 0.771 | 0.778 | 0.507 | 0.576 | 0.426 | 0.762 | 0.803 | 0.644 | 0.669 | 0.679 | 0.697 | 0.561 | 0.573 | 0.718 | 0.678 | 0.707 | 0.680 |
| NR | Compare2Score | NeurIPS'24 | 0.769 | 0.813 | 0.757 | 0.661 | 0.679 | 0.687 | 0.074 | 0.108 | 0.790 | 0.334 | 0.381 | 0.436 | 0.494 | 0.519 | 0.635 | 0.317 | 0.315 | 0.607 | 0.441 | 0.469 | 0.652 |
| NR | FGResQ | - | 0.926 | 0.910 | 0.873 | 0.759 | 0.777 | 0.760 | 0.496 | 0.518 | 0.778 | 0.821 | 0.854 | 0.669 | 0.698 | 0.706 | 0.713 | 0.521 | 0.536 | 0.721 | 0.703 | 0.717 | 0.752 |
Underline indicates runner-up, bold indicates best.
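The ACC columns report pairwise ranking accuracy: the fraction of annotated preference pairs whose predicted ordering agrees with the human label. A minimal sketch, where the tie-handling convention (ties count as wrong) is an assumption:

```python
def pairwise_accuracy(pred_a, pred_b, pref):
    """Fraction of pairs where the sign of (pred_a - pred_b) matches the
    human preference label pref (+1: a preferred, -1: b preferred).
    Ties (equal predictions) count as wrong here -- an assumed convention."""
    correct = sum(1 for a, b, p in zip(pred_a, pred_b, pref) if (a - b) * p > 0)
    return correct / len(pref)
```

Unlike SRCC/PLCC, this measure is computed only on the annotated fine-grained pairs, so it directly probes discrimination between near-identical restorations.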
[Qualitative example galleries for the six IR tasks: Deblurring, Dehazing, Denoising, Deraining, Mixture Restoration, Super-Resolution]
@article{sheng2025fg,
title={Fine-grained Image Quality Assessment for Perceptual Image Restoration},
author={Sheng, Xiangfei and Pan, Xiaofeng and Yang, Zhichao and Chen, Pengfei and Li, Leida},
journal={arXiv preprint arXiv:2508.14475},
year={2025}
}