MirrorPPR: Exemplar-Based Portrait Photo Retouching


Zhihong Liu¹, Zheng Li¹, Jiachun Jin¹, Siqi Kou¹,², Yitao Jian¹,², Fengpei Yu², Zhijie Deng¹

¹Shanghai Jiao Tong University ²Triverse AI


MirrorPPR-Face on SimFace-100
MirrorPPR-Pro on ProPortrait-500

“Widen nasal alae”


Instead of relying on ambiguous text instructions, MirrorPPR infers fine-grained structural retouching operations, such as facial-feature adjustment, face-contour refinement, and body-proportion reshaping, from a before-and-after exemplar pair. It then transfers them to a new portrait while preserving identity and leaving unrelated regions unchanged.

Abstract

While text-guided image editing has made remarkable progress, it remains limited in structural portrait retouching. Textual descriptions struggle to convey fine-grained changes to facial features and body proportions. To address this gap, we introduce Exemplar-Based Portrait Photo Retouching, where the model is given an exemplar pair and tasked with inferring and applying the same retouching operations to a new query image. Existing exemplar-based editing methods primarily focus on tasks with pronounced visual transformations. In contrast, structural portrait retouching involves extremely delicate and localized modifications, making accurate extraction and transfer of these edits challenging. To tackle this, we propose MirrorPPR, a novel framework specifically designed to capture and transfer subtle structural retouching operations. Our method uses a Retouching Operation Extractor to capture the subtle differences from the exemplar pair. The extracted representations are then injected into a pre-trained Diffusion Transformer (DiT) through a connector and Low-Rank Adaptation (LoRA) modules. Furthermore, constructing perfectly aligned cross-identity training pairs is severely hindered by operational misalignment. To overcome this, we propose an advanced data self-augmentation paradigm that ensures strictly aligned retouching operations. To alleviate data scarcity and support this novel task, we introduce MirrorPPR47M, a large-scale dataset with over 47 million retouched pairs. By structuring the dataset into simulated and professional subsets, we enable progressive curriculum learning to smoothly optimize the network. Extensive experiments demonstrate that MirrorPPR significantly outperforms existing baselines in both retouching quality and identity preservation.

Method

MirrorPPR formulates portrait photo retouching as exemplar-based retouching operation transfer. Given an exemplar source image X_s, its retouched counterpart X_t, and a new query portrait X_q, the goal is to generate a retouched result that applies the same operations demonstrated by the exemplar pair. The framework first uses a Retouching Operation Extractor to capture the subtle differences between X_s and X_t: a frozen MAE provides local, structure-aware patch features, while a trainable R-Former uses learnable query tokens to distill these features into a compact operation representation. The extractor is pre-trained with an auxiliary reconstruction task, encouraging the representation to encode precise retouching intent.

After pre-training, the extracted operation representation is mapped by a connector into the conditioning space of a frozen dual-stream DiT backbone. Since the task is driven by visual demonstrations rather than text instructions, MirrorPPR conditions the diffusion model on the query image and the extracted retouching operation. The R-Former, connector, and newly added LoRA modules are jointly fine-tuned, enabling the model to learn retouching operation transfer while preserving the strong prior of the pre-trained editing model.
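To make the idea of learnable query tokens distilling patch features more concrete, here is a minimal sketch of single-head cross-attention in plain Python. It is not the R-Former itself: the function names, the sizes, the absence of projection matrices, and the random stand-in features are all assumptions chosen for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def distill_with_queries(queries, patch_features):
    """Single-head cross-attention: each query token pools the patch features.

    queries:        num_queries x dim (learnable in a real model, fixed here)
    patch_features: num_patches x dim (stand-in for encoder patch features)
    Returns num_queries x dim, one pooled vector per query token.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    output = []
    for q in queries:
        # Scaled dot-product score between this query and every patch.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in patch_features]
        weights = softmax(scores)
        # Attention-weighted average of the patch features.
        pooled = [
            sum(w * feat[d] for w, feat in zip(weights, patch_features))
            for d in range(dim)
        ]
        output.append(pooled)
    return output


rng = random.Random(0)
DIM, NUM_PATCHES, NUM_QUERIES = 8, 32, 4  # hypothetical sizes
# Random stand-in for patch features of the exemplar pair (X_s and X_t patches).
exemplar_features = [
    [rng.gauss(0, 1) for _ in range(DIM)] for _ in range(2 * NUM_PATCHES)
]
query_tokens = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_QUERIES)]

operation_repr = distill_with_queries(query_tokens, exemplar_features)
print(len(operation_repr), len(operation_repr[0]))  # 4 8
```

However many patches the exemplar pair produces, the output always has one vector per query token, which is what makes the operation representation compact.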

Extract operation → Inject into DiT → Transfer to query

Data Self-Augmentation

A central challenge in exemplar-based portrait retouching is constructing valid cross-identity training quadruplets. Because portraits differ in pose, shot scale, occlusion, and visible body regions, the same local retouching operation may not be applicable or spatially aligned across different identities. Meanwhile, the naive same-identity construction, where the exemplar source is directly reused as the query, introduces pixel-level shortcut learning: the model may simply copy coordinate-wise differences instead of understanding the underlying retouching semantics.

MirrorPPR addresses this with Data Self-Augmentation. For each source-target exemplar pair (X_s, X_t), we apply the same random spatial augmentation A, such as scaling, cropping, rotation, or horizontal flipping, to both images, forming the query X_q = A(X_s) and its ground-truth target Y_q = A(X_t). This construction keeps the retouching operation strictly aligned between the exemplar pair and the query pair, while breaking their absolute coordinate correspondence. As a result, the model is encouraged to transfer the demonstrated operation according to the query portrait’s own spatial layout, enabling robust cross-identity generalization.
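The construction can be sketched in a few lines of plain Python. This is an illustrative toy, not the released pipeline: images are lists of rows, only horizontal flipping and cropping are implemented (scaling and rotation are omitted), and all function names are hypothetical. The key point is that the augmentation parameters are sampled once and reused for both images.

```python
import random


def sample_augmentation(height, width, rng):
    """Draw ONE set of augmentation parameters (flip flag and crop window)."""
    crop_h = rng.randint(height // 2, height)
    crop_w = rng.randint(width // 2, width)
    return {
        "flip": rng.random() < 0.5,
        "top": rng.randint(0, height - crop_h),
        "left": rng.randint(0, width - crop_w),
        "crop_h": crop_h,
        "crop_w": crop_w,
    }


def apply_augmentation(image, params):
    """Apply the given flip and crop to an image stored as a list of rows."""
    rows = [row[::-1] for row in image] if params["flip"] else [row[:] for row in image]
    top, left = params["top"], params["left"]
    return [
        row[left:left + params["crop_w"]]
        for row in rows[top:top + params["crop_h"]]
    ]


def make_training_quadruplet(x_s, x_t, rng):
    """Build (X_s, X_t, X_q, Y_q) with X_q = A(X_s) and Y_q = A(X_t)."""
    params = sample_augmentation(len(x_s), len(x_s[0]), rng)
    x_q = apply_augmentation(x_s, params)  # same A for both images
    y_q = apply_augmentation(x_t, params)
    return (x_s, x_t, x_q, y_q), params


def difference(a, b):
    """Per-pixel difference b - a, a crude proxy for the retouching edit."""
    return [[pb - pa for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


rng = random.Random(0)
x_s = [[10 * r + c for c in range(8)] for r in range(8)]  # toy source image
x_t = [row[:] for row in x_s]
x_t[3][4] += 5  # toy local "retouch" on a single pixel

(x_s, x_t, x_q, y_q), params = make_training_quadruplet(x_s, x_t, rng)
# The edit in the query pair is the augmented edit of the exemplar pair.
assert difference(x_q, y_q) == apply_augmentation(difference(x_s, x_t), params)
```

The final assertion captures the alignment property in this toy setting: the edit is the same, but it sits at different absolute coordinates in the query pair.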

MirrorPPR47M Dataset

MirrorPPR47M is a large-scale dataset designed for exemplar-based portrait retouching. It contains over 47 million retouched pairs covering facial features, face contours, and body proportions. To make subtle real-world retouching learnable, the dataset is organized for an easy-to-hard curriculum: a Simulated Retouching Subset provides pronounced geometric deformations for learning fundamental structural variations, while a Professional Retouching Subset provides authentic and fine-grained retouching operations.

The simulated subset is built from 30,171 high-quality FFHQ images and uses Landmark-Guided Local Warping (LLW) to generate 8 base facial operation types, each with two opposite directions, resulting in 808,439 retouched pairs. The professional subset is built from 3,789 4K–8K portraits from PPR10K and applies 27 professional retouching operations, including 18 facial-feature operations, 4 face-shape operations, and 5 body-proportion operations, yielding 46,642,845 finely retouched pairs. Together with the self-augmentation pipeline, MirrorPPR47M provides operation-aligned yet spatially decoupled training data for learning realistic structural portrait retouching.
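The page does not spell out how LLW is formulated, so the following is only a generic sketch of what a landmark-centred local warp could look like, not the authors' implementation. It assumes a displacement that is strongest at the landmark and fades to zero at a chosen radius, with the sign of `strength` giving the two opposite directions. The function name, the falloff shape, and the nearest-neighbour sampling are all illustrative choices.

```python
def local_warp(image, landmark, direction, radius, strength):
    """Shift pixels near `landmark` along `direction`, fading out at `radius`.

    Uses backward mapping: each output pixel samples the input at a displaced
    position, with nearest-neighbour lookup to stay dependency-free.
    """
    height, width = len(image), len(image[0])
    ly, lx = landmark
    dy, dx = direction
    out = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            dist2 = (y - ly) ** 2 + (x - lx) ** 2
            if dist2 >= radius ** 2:
                continue  # outside the local region: pixel is left untouched
            falloff = (1.0 - dist2 / radius ** 2) ** 2  # 1 at landmark, 0 at edge
            src_y = min(max(round(y - strength * falloff * dy), 0), height - 1)
            src_x = min(max(round(x - strength * falloff * dx), 0), width - 1)
            out[y][x] = image[src_y][src_x]
    return out


# Toy image whose pixel value equals its column index.
image = [[x for x in range(9)] for _ in range(9)]
shifted_right = local_warp(image, (4, 4), (0, 1), radius=3, strength=2.0)
shifted_left = local_warp(image, (4, 4), (0, 1), radius=3, strength=-2.0)

print(shifted_right[4][4], shifted_left[4][4])  # 2 6
```

Pixels outside the radius are never modified, which mirrors the requirement that a local operation leaves unrelated regions unchanged.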

808K simulated retouched pairs

46.6M professional retouched pairs

8 simulated retouching operation types

27 professional retouching operation types
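As a quick consistency check on the figures quoted above, the subset sizes and operation counts can be added up directly. The variable names are only for illustration; the numbers are the ones stated in this section.

```python
simulated_pairs = 808_439
professional_pairs = 46_642_845
total_pairs = simulated_pairs + professional_pairs
print(total_pairs)  # 47451284, i.e. "over 47 million"

# Professional operations: facial-feature + face-shape + body-proportion.
professional_ops = {"facial_feature": 18, "face_shape": 4, "body_proportion": 5}
assert sum(professional_ops.values()) == 27

# Simulated operations: 8 base types, each in two opposite directions.
simulated_directed_ops = 8 * 2
print(simulated_directed_ops)  # 16
```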

Experimental Results

We evaluate MirrorPPR in a cross-identity setting where the exemplar pair and the query portrait come from different identities, matching the intended inference scenario rather than the self-augmented training construction. SimFace-100 contains 100 combinations of 8 LLW-based facial retouching operations applied to 12 face images, while ProPortrait-500 contains 500 combinations of 27 professional operations applied to 40 high-quality portraits. MirrorPPR is compared with strong baselines from three categories: multi-reference image editing, exemplar-based image editing, and text-guided image editing.

The results show that existing multi-reference and exemplar-based methods often misinterpret the task as image blending, copying, or face swapping. Text-guided methods achieve better reconstruction but still suffer from identity drift and imprecise control over fine-grained structural changes. In contrast, MirrorPPR consistently transfers the demonstrated structural operations with high fidelity and strong identity preservation.

Quantitative Comparisons

Quantitative comparison results with baselines on SimFace-100.
Category	Model	PSNR ↑	SSIM ↑	LPIPS ↓	Face Similarity ↑
Multi-Reference Image Editing	Qwen-Image-Edit-2511	9.06	0.468	0.745	0.207
	FLUX.2-dev	9.28	0.481	0.698	0.110
	Nano Banana 2	16.72	0.784	0.329	0.556
	Seedream 4.5	13.01	0.709	0.501	0.351
Exemplar-based	Qwen-Image-Edit-2511-ICEdit-LoRA	9.21	0.533	0.640	0.300
	RelationAdapter	16.57	0.698	0.543	0.204
	EditTransfer	15.68	0.691	0.492	0.464
Text-guided	Qwen-Image-Edit-2511	25.80	0.862	0.260	0.463
	FLUX.2-dev	22.44	0.804	0.301	0.531
	Nano Banana 2	24.25	0.860	0.239	0.601
	Seedream 4.5	18.01	0.788	0.368	0.600
Ours	MirrorPPR-Face	32.25	0.909	0.186	0.937

Quantitative comparison results with baselines on ProPortrait-500.
Category	Model	PSNR ↑	SSIM ↑	LPIPS ↓	Face Similarity ↑
Multi-Reference Image Editing	Qwen-Image-Edit-2511	10.23	0.538	0.645	0.413
	FLUX.2-dev	9.36	0.466	0.728	0.220
	Nano Banana 2	17.72	0.835	0.250	0.811
	Seedream 4.5	12.12	0.689	0.436	0.705
Exemplar-based	Qwen-Image-Edit-2511-ICEdit-LoRA	12.06	0.631	0.564	0.606
	RelationAdapter	15.74	0.709	0.586	0.283
	EditTransfer	18.32	0.748	0.481	0.457
Text-guided	Qwen-Image-Edit-2511	20.85	0.732	0.387	0.501
	FLUX.2-dev	19.94	0.748	0.345	0.616
	Nano Banana 2	27.45	0.904	0.183	0.667
	Seedream 4.5	16.43	0.770	0.378	0.782
Ours	MirrorPPR-Pro	32.65	0.927	0.200	0.960
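For readers unfamiliar with the metrics in these tables, the sketch below shows the standard PSNR definition and a cosine similarity between two embedding vectors. The page does not say how Face Similarity is computed; treating it as cosine similarity between face-recognition embeddings is an assumption here, and the toy inputs are made up for illustration.

```python
import math


def psnr(reference, result, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    ref = [p for row in reference for p in row]
    res = [p for row in result for p in row]
    mse = sum((a - b) ** 2 for a, b in zip(ref, res)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


ground_truth = [[100, 100], [100, 100]]
prediction = [[100, 100], [100, 110]]  # one pixel off by 10, so MSE = 25
print(round(psnr(ground_truth, prediction), 2))  # 34.15

# Hypothetical 2-D "face embeddings" of the query and the retouched result.
print(round(cosine_similarity([1.0, 0.0], [1.0, 1.0]), 4))  # 0.7071
```

Higher PSNR means the output is closer to the ground-truth retouched image, while a higher similarity score means the identity embedding moved less.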
