Add CopyAndPaste transform #1225

ternaus · 2022-07-13T18:47:36Z

Specify the object as a PNG with an empty background.
Specify the background image.
Cut out a region in the background image that is larger than the object.
Paste the object into the cut-out region.
Use Poisson blending to inpaint the space between the pasted object and the image.
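
To make the paste step above concrete, here is a minimal NumPy sketch. It assumes that "empty background" means a transparent alpha channel, and it uses plain alpha compositing. It does not perform the cut-out or the Poisson blending steps (OpenCV's `cv2.seamlessClone` is one existing Poisson blending implementation that could be used for that). The function name `paste_object` and its arguments are hypothetical, not part of any proposed API.

```python
import numpy as np


def paste_object(background: np.ndarray, obj_rgba: np.ndarray, top: int, left: int) -> np.ndarray:
    """Alpha-composite an RGBA object onto an RGB background at (top, left).

    Hypothetical helper: assumes the object fits entirely inside the background.
    """
    h, w = obj_rgba.shape[:2]
    out = background.astype(np.float32)            # astype returns a copy
    region = out[top:top + h, left:left + w]       # view into `out`
    alpha = obj_rgba[..., 3:4].astype(np.float32) / 255.0
    # Opaque object pixels replace the background; transparent ones keep it.
    region[:] = alpha * obj_rgba[..., :3] + (1.0 - alpha) * region
    return out.round().astype(background.dtype)
```

The background array passed in is left untouched; the function returns a new image.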

i-aki-y · 2022-09-03T01:46:54Z

@ternaus Hi, I'm interested in the feature.

How can we specify the pasted object in an Albumentations transform? Do we need to introduce a new target keyword like 'paste_image':

transform(image=image, paste_image=paste_image, ...)

Or should we sample segments from the same target image, as yolov5 does?

https://github.com/ultralytics/yolov5/blob/15e82d296720d4be344bf42a34d60ffd57b3eb28/utils/dataloaders.py#L706

Dipet · 2022-09-03T12:52:47Z

I think we could support this:

Add a paste_image_key param to the transform: if this key is set, we will use the provided images.
Add a paste_image_dir param to the transform: if this key is set, we will sample a random image from this directory.
Otherwise, sample a random segment from the image.
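
The fallback order in the three options above could look roughly like the sketch below. It only illustrates the selection logic; the function name `choose_paste_source`, the parameter spellings `paste_image_key` and `paste_image_dir`, the `*.png` file pattern, and the returned tags are all assumptions made for illustration.

```python
import random
from pathlib import Path


def choose_paste_source(data, paste_image_key=None, paste_image_dir=None, rng=random):
    """Return a (source_kind, source) pair following the proposed priority order."""
    if paste_image_key is not None:
        # 1. The caller passed the paste images as an extra target.
        return "provided", data[paste_image_key]
    if paste_image_dir is not None:
        # 2. Sample a random PNG from the directory (raises IndexError if it is empty).
        files = sorted(Path(paste_image_dir).glob("*.png"))
        return "directory", rng.choice(files)
    # 3. Fall back to sampling segments from the input image itself.
    return "self", data["image"]
```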

i-aki-y · 2022-09-23T09:19:38Z

@Dipet thank you for your reply.

Do you mean that paste_image_key is used to pass the path to the PNG file as a target?

transform(image=image, paste_image_key="path/to/png", ...)

If so, I think paste_image_dir is the better choice, since it can be a parameter of the constructor. That way we avoid introducing new targets and keep the standard usage.

Example:

transform = Compose([
    CopyAndPaste(paste_image_dir=objects_dir, …),
    ...
])

transform(image=image, …)   # we can still use only the standard targets, such as bboxes and masks

To get a feel for it, I made a working example in PR #1297 (still a work in progress).

zetyquickly · 2024-04-06T19:26:41Z

Feature description

Add CopyAndPaste Augmentation from https://arxiv.org/abs/2012.07177

Used in github.com/ultralytics/yolov5/blob/ac6c4383bc0c7a2a4f7ca18f8733821b49e916bd/utils/augmentations.py#L19

I checked the yolov5 code. It looks like they don't implement the method the paper describes.

Paper quote:

Our approach for generating new data using Copy-Paste is very simple. We randomly select two images and apply random scale jittering and random horizontal flipping on each of them. Then we select a random subset of objects from one of the images and paste them onto the other image. Lastly, we adjust the ground-truth annotations accordingly: we remove fully occluded objects and update the masks and bounding boxes of partially occluded objects.
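
As a rough illustration of the paste and annotation-update steps in the quote above, here is a minimal NumPy sketch. It assumes both images already have the same size, skips scale jittering and flipping, ignores overlaps between the pasted objects themselves, and tracks masks only (boxes would be recomputed from the masks). The function name `copy_paste` and its signature are hypothetical.

```python
import numpy as np


def copy_paste(dst_img, dst_masks, src_img, src_masks, rng):
    """Paste a random subset of source objects onto the destination image.

    Images are (H, W, 3) arrays of equal size; masks are lists of boolean (H, W) arrays.
    """
    if not src_masks:
        return dst_img.copy(), list(dst_masks)

    k = int(rng.integers(1, len(src_masks) + 1))            # number of objects to paste
    picked = rng.choice(len(src_masks), size=k, replace=False)
    pasted = [src_masks[i] for i in picked]
    paste_area = np.any(pasted, axis=0)                      # union of the pasted masks

    out = dst_img.copy()
    out[paste_area] = src_img[paste_area]

    # Existing objects lose the pixels that are now covered;
    # objects with no visible pixels left are removed.
    kept = [m & ~paste_area for m in dst_masks]
    kept = [m for m in kept if m.any()]
    return out, kept + pasted
```

It can be called with `rng = np.random.default_rng()`; the returned mask list contains the trimmed destination masks followed by the pasted ones.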

What yolov5 does instead:

Take an image and its segments (masks of objects on the image).
Mirror these segments relative to the center of the whole image.
Paste the new mirrored segments back onto the image (if their IOA is low).
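
The three steps above can be sketched in NumPy as follows. This is not the yolov5 code; it is a simplified illustration that assumes the mirroring is a horizontal flip about the image's vertical center line, reads IoA as intersection over the area of the existing box, and checks each mirrored box against the original boxes only. The names `box_ioa` and `mirror_paste` and the 0.3 threshold are assumptions of this sketch.

```python
import numpy as np


def box_ioa(box, boxes, eps=1e-9):
    """Intersection of `box` with each row of `boxes`, divided by that row's area."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + eps)


def mirror_paste(img, masks, boxes, max_ioa=0.3):
    """masks: list of bool (H, W) arrays; boxes: (N, 4) float array of x1, y1, x2, y2."""
    w = img.shape[1]
    out = img.copy()
    flipped_img = img[:, ::-1]
    new_masks, new_boxes = list(masks), [b for b in boxes]
    for mask, (x1, y1, x2, y2) in zip(masks, boxes):
        fbox = np.array([w - x2, y1, w - x1, y2])      # horizontally mirrored box
        if (box_ioa(fbox, boxes) < max_ioa).all():     # paste only into mostly free space
            fmask = mask[:, ::-1]
            out[fmask] = flipped_img[fmask]
            new_masks.append(fmask)
            new_boxes.append(fbox)
    return out, new_masks, np.array(new_boxes)
```

Note that this only reuses objects from the same image, which is the difference from the paper's two-image method.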

zetyquickly · 2024-04-06T19:30:57Z

Regarding this PR.

I haven't checked the implementation in detail, but there are two points so far:

It implements more than what the "Simple Copy-Paste ..." paper suggests, which is great.
Its call signature could be improved. The parameters:

paste_image_dir=object_dir,
get_label_from_path=get_label_from_path

appear suboptimal. It would be better to handle everything in memory, working directly with already loaded images, labels, and masks.

ternaus · 2024-04-06T19:48:41Z

@zetyquickly

Whether to load everything into memory or read from disk is a matter of personal preference.

In the latest transform that needed to load extra data from disk, it looks like this:

https://github.com/albumentations-team/albumentations/blob/main/albumentations/augmentations/mixing/transforms.py

We have a pair of parameters:

reference_data (Optional[Union[Generator[ReferenceImage, None, None], Sequence[Any]]]):
            A sequence or generator of dictionaries containing the reference data for mixing
            If None or an empty sequence is provided, no operation is performed and a warning is issued.

and

read_fn (Callable[[ReferenceImage], Dict[str, Any]]):
            A function to process items from reference_data. It should accept items from reference_data
            and return a dictionary containing processed data:
                - The returned dictionary must include an 'image' key with a numpy array value.
                - It may also include 'mask', 'global_label' each associated with numpy array values.
            Defaults to a function that assumes input dictionary contains numpy arrays and directly returns it.

reference_data can be a generator or a sequence of ids, paths, or images already loaded into memory,

and read_fn is a function that maps an element of reference_data to the data the transform uses.

=> If you want to load everything into memory beforehand, put it in reference_data and use lambda x: x as read_fn.

If you want to read on the fly, let all the work happen in read_fn.
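
To illustrate how the same reference_data / read_fn pair covers both cases, here is a small standalone sketch. It does not use the library's transform; `ReferenceSampler` and `load_by_id` are hypothetical names, the sampler supports sequences only (not generators), and the "disk read" is simulated.

```python
import random
from typing import Any, Callable, Dict, Sequence

import numpy as np


class ReferenceSampler:
    """Hypothetical helper: picks a reference item and converts it with read_fn."""

    def __init__(
        self,
        reference_data: Sequence[Any],
        read_fn: Callable[[Any], Dict[str, Any]] = lambda x: x,
        seed=None,
    ):
        self.reference_data = reference_data
        self.read_fn = read_fn
        self.rng = random.Random(seed)

    def sample(self) -> Dict[str, Any]:
        item = self.rng.choice(self.reference_data)
        return self.read_fn(item)


# Option 1: everything is preloaded, so the default identity read_fn is enough.
in_memory = [{"image": np.zeros((4, 4, 3), dtype=np.uint8)}]
preloaded = ReferenceSampler(in_memory)


# Option 2: reference_data holds lightweight ids and read_fn does the loading.
def load_by_id(item_id: int) -> Dict[str, Any]:
    # Stand-in for disk I/O, e.g. reading an image file for this id.
    return {"image": np.full((4, 4, 3), item_id, dtype=np.uint8)}


lazy = ReferenceSampler([1, 2, 3], read_fn=load_by_id)
```

In both cases `sample()` returns a dictionary with an 'image' key, so the code that consumes it does not need to know where the data came from.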

It looks like a lot of different functionality is added in that PR; I would probably split it into separate PRs.

ternaus added the enhancement and feature request labels and removed the enhancement label Jul 13, 2022

ternaus added the good first issue Good for newcomers label Jul 22, 2022

i-aki-y mentioned this issue Sep 23, 2022

Add CutAndPaste #1297

Open

mjb-oz mentioned this issue Apr 13, 2023

Contextual Augmentation (aka "cut and paste") via Lambda #1438

Closed

zetyquickly mentioned this issue Apr 6, 2024

[Feature Request] Add Copy and Paste tramsform #1616

Closed

ternaus changed the title ~~Add CutAndPaste transform~~ Add CopyAndPaste transform Apr 10, 2024

ternaus mentioned this issue Jun 27, 2024

Add copy paste #1820

Draft
