While text-guided image editing has made remarkable progress, it remains limited for structural portrait retouching, because textual descriptions struggle to convey fine-grained changes to facial features and body proportions. To address this gap, we introduce Exemplar-Based Portrait Photo Retouching, in which the model is given an exemplar pair (a source portrait and its retouched counterpart) and must infer the retouching operations and apply them to a new query image. Existing exemplar-based editing methods primarily target tasks with pronounced visual transformations. In contrast, structural portrait retouching involves extremely delicate, localized modifications, which makes accurate extraction and transfer of these edits challenging. To tackle this challenge, we propose MirrorPPR, a framework designed to capture and transfer subtle structural retouching operations. A Retouching Operation Extractor captures the subtle differences within the exemplar pair, and the extracted representations are injected into a pre-trained Diffusion Transformer (DiT) through a connector and Low-Rank Adaptation (LoRA) modules. Furthermore, constructing aligned cross-identity training pairs is severely hindered by operational misalignment: the same operation may not apply, or may not align spatially, across different portraits. To overcome this, we propose a data self-augmentation paradigm that keeps retouching operations strictly aligned between the exemplar pair and the query pair. To alleviate data scarcity and support this new task, we introduce MirrorPPR47M, a large-scale dataset with over 47 million retouched pairs. The dataset is structured into simulated and professional subsets, which enables an easy-to-hard curriculum during training. Extensive experiments demonstrate that MirrorPPR significantly outperforms existing baselines in both retouching quality and identity preservation.
MirrorPPR formulates portrait photo retouching as exemplar-based retouching operation transfer. Given an exemplar source image Xs, its retouched counterpart Xt, and a new query portrait Xq, the goal is to generate a retouched result that applies the same operations demonstrated by the exemplar pair. The framework first uses a Retouching Operation Extractor to capture the subtle differences between Xs and Xt: a frozen MAE provides local, structure-aware patch features, while a trainable R-Former uses learnable query tokens to distill these features into a compact operation representation. The extractor is pre-trained with an auxiliary reconstruction task, encouraging the representation to encode precise retouching intent. After pre-training, the extracted operation representation is mapped by a connector into the conditioning space of a frozen dual-stream DiT backbone. Since the task is driven by visual demonstrations rather than text instructions, MirrorPPR conditions the diffusion model on the query image and the extracted retouching operation. The R-Former, connector, and newly added LoRA modules are jointly fine-tuned, enabling the model to learn retouching operation transfer while preserving the strong prior of the pre-trained editing model.
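The query-token distillation described above can be illustrated with a minimal, dependency-free sketch. It assumes a single cross-attention step without learned projections, and it assumes that the patch features of Xs and Xt are simply concatenated along the token axis. The actual R-Former layout, token count, and feature dimensions are not specified here, so every name and size below is hypothetical.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, patch_features):
    """Each query token becomes a softmax-weighted average of the patch features.

    Simplification: keys and values are the raw patch features; a real
    attention layer would apply learned query/key/value projections.
    """
    dim = len(patch_features[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for query in queries:
        weights = softmax([dot(query, feat) * scale for feat in patch_features])
        outputs.append(
            [sum(w * feat[d] for w, feat in zip(weights, patch_features)) for d in range(dim)]
        )
    return outputs


def extract_operation(feats_source, feats_target, queries):
    """Distill source and target patch features into len(queries) tokens.

    Assumption: the two feature sets are concatenated along the token axis.
    """
    return cross_attend(queries, feats_source + feats_target)


# Toy usage: 6 patches per image, 4-dim features, 3 query tokens (all hypothetical).
rng = random.Random(0)
feats_s = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(6)]
feats_t = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(6)]
query_tokens = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
operation = extract_operation(feats_s, feats_t, query_tokens)
print(len(operation), len(operation[0]))  # 3 4
```

The point of the sketch is the shape change: twelve patch tokens are compressed into three operation tokens, whose count is set by the number of query tokens and not by the image resolution.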
Extract operation → Inject into DiT → Transfer to query
A central challenge in exemplar-based portrait retouching is constructing valid cross-identity training quadruplets. Because portraits differ in pose, shot scale, occlusion, and visible body regions, the same local retouching operation may not be applicable or spatially aligned across different identities. Meanwhile, the naive same-identity construction, where the exemplar source is directly reused as the query, introduces pixel-level shortcut learning: the model may simply copy coordinate-wise differences instead of understanding the underlying retouching semantics.
MirrorPPR addresses this with Data Self-Augmentation. For each source-target exemplar pair (Xs, Xt), we apply the same random spatial augmentation A, such as scaling, cropping, rotation, or horizontal flipping, to both images, forming Xq = A(Xs) and Yq = A(Xt). This construction keeps the retouching operation strictly aligned between the exemplar pair and the query pair, while breaking their absolute coordinate correspondence. As a result, the model is encouraged to transfer the demonstrated operation according to the query portrait’s own spatial layout, enabling robust cross-identity generalization.
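The construction Xq = A(Xs), Yq = A(Xt) can be sketched in a few lines. This is a toy illustration, not the authors' pipeline: images are nested lists of grayscale values, only cropping and horizontal flipping are implemented, and the helper names and parameter ranges are hypothetical. The key detail is that the augmentation parameters are sampled once and reused for both images.

```python
import random
from typing import Dict, List

Image = List[List[int]]  # toy grayscale image: rows of pixel values


def hflip(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def crop(img: Image, top: int, left: int, height: int, width: int) -> Image:
    """Cut out a height x width window whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]


def sample_augmentation(height: int, width: int, rng: random.Random) -> Dict:
    """Sample the parameters of A once so they can be shared by both images."""
    crop_h = rng.randint(max(1, height // 2), height)
    crop_w = rng.randint(max(1, width // 2), width)
    return {
        "top": rng.randint(0, height - crop_h),
        "left": rng.randint(0, width - crop_w),
        "height": crop_h,
        "width": crop_w,
        "flip": rng.random() < 0.5,
    }


def apply_augmentation(img: Image, params: Dict) -> Image:
    """Apply the crop, then the optional flip, described by params."""
    out = crop(img, params["top"], params["left"], params["height"], params["width"])
    return hflip(out) if params["flip"] else out


def make_quadruplet(x_s: Image, x_t: Image, rng: random.Random):
    """Build (Xs, Xt, Xq, Yq) with Xq = A(Xs) and Yq = A(Xt) for one shared A."""
    params = sample_augmentation(len(x_s), len(x_s[0]), rng)
    return x_s, x_t, apply_augmentation(x_s, params), apply_augmentation(x_t, params)
```

Because both images pass through the same A, any per-pixel relationship between Xs and Xt is preserved between Xq and Yq, while the crop and flip change where that relationship sits in absolute coordinates.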
MirrorPPR47M is a large-scale dataset designed for exemplar-based portrait retouching. It contains over 47 million retouched pairs covering facial features, face contours, and body proportions. To make subtle real-world retouching learnable, the dataset is organized for an easy-to-hard curriculum: a Simulated Retouching Subset provides pronounced geometric deformations for learning fundamental structural variations, while a Professional Retouching Subset provides authentic and fine-grained retouching operations.
The simulated subset is built from 30,171 high-quality FFHQ images and uses Landmark-Guided Local Warping (LLW) to generate 8 base facial operation types, each with two opposite directions, resulting in 808,439 retouched pairs. The professional subset is built from 3,789 4K–8K portraits from PPR10K and applies 27 professional retouching operations, including 18 facial-feature operations, 4 face-shape operations, and 5 body-proportion operations, yielding 46,642,845 finely retouched pairs. Together with the self-augmentation pipeline, MirrorPPR47M provides operation-aligned yet spatially decoupled training data for learning realistic structural portrait retouching.
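As a quick consistency check, the figures quoted above can be added up directly. The snippet uses only numbers stated in this section; the variable names are illustrative.

```python
# Pair counts as stated for the two subsets.
SIMULATED_PAIRS = 808_439
PROFESSIONAL_PAIRS = 46_642_845

# Simulated subset: 8 base facial operation types, each in two opposite directions.
BASE_FACIAL_OPERATIONS = 8
DIRECTIONS_PER_OPERATION = 2

# Professional subset: operation counts per group.
PROFESSIONAL_OPERATIONS = {
    "facial_feature": 18,
    "face_shape": 4,
    "body_proportion": 5,
}

total_pairs = SIMULATED_PAIRS + PROFESSIONAL_PAIRS
directed_simulated_ops = BASE_FACIAL_OPERATIONS * DIRECTIONS_PER_OPERATION
professional_ops = sum(PROFESSIONAL_OPERATIONS.values())
professional_share = PROFESSIONAL_PAIRS / total_pairs

print(total_pairs)             # 47451284
print(directed_simulated_ops)  # 16
print(professional_ops)        # 27
```

The total of 47,451,284 pairs matches the "over 47 million" figure, and the three professional groups sum to the stated 27 operations. The professional subset accounts for roughly 98% of all pairs.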
We evaluate MirrorPPR in a cross-identity setting where the exemplar pair and the query portrait come from different identities, matching the intended inference scenario rather than the self-augmented training construction. Evaluation uses two test sets: SimFace-100 contains 100 combinations of 8 LLW-based facial retouching operations applied to 12 face images, while ProPortrait-500 contains 500 combinations of 27 professional operations applied to 40 high-quality portraits. MirrorPPR is compared with strong baselines from three categories: multi-reference image editing, exemplar-based image editing, and text-guided image editing.
The results show that existing multi-reference and exemplar-based methods often misinterpret the task as image blending, copying, or face swapping. Text-guided methods achieve better reconstruction but still suffer from identity drift and imprecise control over fine-grained structural changes. In contrast, MirrorPPR consistently transfers the demonstrated structural operations with high fidelity and strong identity preservation.
| Category | Model | PSNR ↑ | SSIM ↑ | LPIPS ↓ | Face Similarity ↑ |
|---|---|---|---|---|---|
| Multi-Reference Image Editing | Qwen-Image-Edit-2511 | 9.06 | 0.468 | 0.745 | 0.207 |
| | FLUX.2-dev | 9.28 | 0.481 | 0.698 | 0.110 |
| | Nano Banana 2 | 16.72 | 0.784 | 0.329 | 0.556 |
| | Seedream 4.5 | 13.01 | 0.709 | 0.501 | 0.351 |
| Exemplar-based | Qwen-Image-Edit-2511-ICEdit-LoRA | 9.21 | 0.533 | 0.640 | 0.300 |
| | RelationAdapter | 16.57 | 0.698 | 0.543 | 0.204 |
| | EditTransfer | 15.68 | 0.691 | 0.492 | 0.464 |
| Text-guided | Qwen-Image-Edit-2511 | 25.80 | 0.862 | 0.260 | 0.463 |
| | FLUX.2-dev | 22.44 | 0.804 | 0.301 | 0.531 |
| | Nano Banana 2 | 24.25 | 0.860 | 0.239 | 0.601 |
| | Seedream 4.5 | 18.01 | 0.788 | 0.368 | 0.600 |
| Ours | MirrorPPR-Face | 32.25 | 0.909 | 0.186 | 0.937 |

| Category | Model | PSNR ↑ | SSIM ↑ | LPIPS ↓ | Face Similarity ↑ |
|---|---|---|---|---|---|
| Multi-Reference Image Editing | Qwen-Image-Edit-2511 | 10.23 | 0.538 | 0.645 | 0.413 |
| | FLUX.2-dev | 9.36 | 0.466 | 0.728 | 0.220 |
| | Nano Banana 2 | 17.72 | 0.835 | 0.250 | 0.811 |
| | Seedream 4.5 | 12.12 | 0.689 | 0.436 | 0.705 |
| Exemplar-based | Qwen-Image-Edit-2511-ICEdit-LoRA | 12.06 | 0.631 | 0.564 | 0.606 |
| | RelationAdapter | 15.74 | 0.709 | 0.586 | 0.283 |
| | EditTransfer | 18.32 | 0.748 | 0.481 | 0.457 |
| Text-guided | Qwen-Image-Edit-2511 | 20.85 | 0.732 | 0.387 | 0.501 |
| | FLUX.2-dev | 19.94 | 0.748 | 0.345 | 0.616 |
| | Nano Banana 2 | 27.45 | 0.904 | 0.183 | 0.667 |
| | Seedream 4.5 | 16.43 | 0.770 | 0.378 | 0.782 |
| Ours | MirrorPPR-Pro | 32.65 | 0.927 | 0.200 | 0.960 |
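
For readers unfamiliar with the PSNR column, the metric can be sketched as follows. This assumes the standard definition, PSNR = 10 · log10(peak² / MSE), with an 8-bit peak value of 255. The evaluation protocol actually used for the tables (resolution, colour space, peak value) is not stated here, so this is an illustration of the metric and not a reproduction of the reported numbers.

```python
import math
from typing import List

Image = List[List[float]]  # toy grayscale image: rows of pixel values


def mse(a: Image, b: Image) -> float:
    """Mean squared error between two images of identical shape."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same shape")
    total = sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    count = sum(len(row) for row in a)
    return total / count


def psnr(a: Image, b: Image, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(a, b)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)
```

Higher PSNR means the output is closer to the ground-truth retouched image pixel by pixel. Because the scale is logarithmic, a gain of about 3 dB corresponds to halving the mean squared error.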