Recent advances in diffusion models have significantly improved virtual try-on for consumers by enabling realistic clothing swaps. However, existing methods overlook the flexible, controllable customization that merchants require, such as control over scene, pose, and facial features. To address this gap, we propose IMAGDressing-v1, which caters to the customization needs of both consumers and merchants in virtual clothing generation. Specifically, we introduce a clothing UNet that captures semantic features from CLIP and texture features from a VAE. We propose a hybrid attention module, consisting of a frozen self-attention and a trainable cross-attention, that integrates these clothing features into a frozen denoising UNet while preserving user-controlled editing. To address the scarcity of data for this task, we release a comprehensive dataset, IGv1, containing over 200,000 pairs of clothing and dressed images, and establish a standard pipeline for data assembly. Furthermore, IMAGDressing-v1 can be combined with extension plugins such as ControlNet, IP-Adapter, T2I-Adapter, and AnimateDiff to increase the diversity and controllability of the generated characters. Extensive experiments demonstrate that IMAGDressing-v1 achieves state-of-the-art performance in human image synthesis under various controlled conditions.
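
To make the hybrid attention design concrete, below is a minimal PyTorch sketch of the idea: a frozen self-attention branch (standing in for a pretrained denoising UNet block) fused with a trainable cross-attention branch that attends to clothing features. The class name `HybridAttention`, the gating parameter, and all dimensions are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn as nn

class HybridAttention(nn.Module):
    """Sketch of a hybrid attention block: frozen self-attention plus a
    trainable cross-attention that injects clothing features.
    Wiring and dimensions are assumptions, not the authors' code."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        # Frozen self-attention, standing in for the pretrained UNet block.
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        for p in self.self_attn.parameters():
            p.requires_grad = False
        # Trainable cross-attention over clothing features (e.g., CLIP
        # semantics and VAE textures produced by the clothing UNet).
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Learnable gate initialized to zero, so training starts from the
        # frozen model's behavior (a common adapter trick; an assumption here).
        self.scale = nn.Parameter(torch.zeros(1))

    def forward(self, hidden: torch.Tensor, clothing: torch.Tensor) -> torch.Tensor:
        # Frozen branch: standard self-attention over the denoising features.
        out, _ = self.self_attn(hidden, hidden, hidden)
        # Trainable branch: queries from the UNet, keys/values from clothing.
        cloth_out, _ = self.cross_attn(hidden, clothing, clothing)
        # Gated residual fusion of the two branches.
        return out + self.scale * cloth_out

# Toy usage with assumed sizes: batch 2, 77 tokens, 320-dim features.
block = HybridAttention(dim=320)
hidden = torch.randn(2, 77, 320)    # denoising UNet hidden states
clothing = torch.randn(2, 77, 320)  # clothing UNet features
fused = block(hidden, clothing)
print(fused.shape)  # torch.Size([2, 77, 320])
```

Because only the cross-attention (and the gate) receive gradients, such a design would leave the base text-to-image behavior intact while learning to condition on clothing, which matches the paper's stated goal of user-controlled editing with a frozen denoising UNet.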
@article{shen2024IMAGDressing-v1,
title={IMAGDressing-v1: Customizable Virtual Dressing},
author={Shen, Fei and Jiang, Xin and He, Xin and Ye, Hu and Wang, Cong and Du, Xiaoyu and Tang, Jinhui},
journal={Coming Soon},
year={2024}
}