ImageDupeless

Purpose of the program:

ImageDupeless is designed to:
  • find similar images;
  • catalogue existing image collections (galleries) so that new images can later be compared with an already built collection;
  • compare a gallery with new images without rescanning it, even if the source images of the gallery are unavailable (only the gallery file is used), and compare one gallery with another;
  • find images in a gallery that are similar to a given image;
  • store the results of a duplicate search;
  • export/import lists of duplicate and dissimilar images to/from an external text file;
  • NEW: merge series of images.

Nowadays almost everyone keeps a small (or huge) gallery of images on a hard disk (or CD): pictures of themselves, relatives, pets, etc. Often the same image ends up stored several times. This is certainly not terrible, but personally I find it unpleasant. Programs that search for duplicate files cannot help here, because the images may differ in file format (GIF, JPEG, TIFF, etc.), in size (640x480, 800x600, ...), or by a small shift and/or rotation (inevitable when scanning photos). These days hardly anybody worries about occupied space (though that applies to HDs, not to CDs, whose capacity is strictly limited), but it would still be desirable to keep only one "best" copy of an image and remove all the others.

Comparing images is essentially a task for artificial intelligence, and it may never be solved completely. But producing a list of CANDIDATES, which only MAY be similar, is a task that can be solved by elementary methods. That is what this program does.
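Purely as an illustration of what such an elementary method might look like (an assumption, not necessarily what ImageDupeless does internally), every image can be shrunk to a tiny fixed-size grid of average gray values. The function name `fingerprint`, the 8x8 grid, and the input format (a list of rows of 0-255 gray values) are all hypothetical. Because every image is mapped onto the same grid, its resolution and aspect ratio drop out, so only the contents are compared.

```python
from typing import List

Gray = List[List[int]]  # rows of 0..255 grayscale pixel values


def fingerprint(image: Gray, size: int = 8) -> List[int]:
    """Reduce an image of any resolution to size*size block averages.

    Hypothetical helper: each output cell is the mean gray value of the
    pixels that fall into the corresponding block of the source image.
    """
    height, width = len(image), len(image[0])
    if height < size or width < size:
        raise ValueError("image is smaller than the fingerprint grid")
    cells: List[int] = []
    for by in range(size):
        # Integer block boundaries cover every row exactly once.
        y0, y1 = by * height // size, (by + 1) * height // size
        for bx in range(size):
            x0, x1 = bx * width // size, (bx + 1) * width // size
            total = sum(image[y][x] for y in range(y0, y1) for x in range(x0, x1))
            cells.append(total // ((y1 - y0) * (x1 - x0)))
    return cells


if __name__ == "__main__":
    # A 16x16 picture and a 2x nearest-neighbour enlargement of it.
    small = [[(x * 7 + y * 3) % 256 for x in range(16)] for y in range(16)]
    big = [[small[y // 2][x // 2] for x in range(32)] for y in range(32)]
    print(fingerprint(small) == fingerprint(big))  # True
```

A real tool would also need to decode GIF/JPEG/TIFF files and tolerate small shifts and rotations; this sketch only shows why the picture size stops mattering.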

The workflow is as follows: specify a directory with images, get quick access to the resulting pairs of similar images, view them in your favorite viewer, and decide which images to keep.

When two images are compared, neither file names, file sizes, image resolutions, nor aspect ratios are taken into account, only the CONTENTS of the images. The user sets the maximal level of difference at which images are still considered identical (as a percentage; 0% means very similar images, BUT ONLY from the program's point of view). The result of the comparison is a set of pairs (or groups) of images whose level of difference is less than the limit set by the user.
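The threshold and grouping can be sketched as follows. This is a hypothetical illustration, not ImageDupeless's actual measure: the names `difference_percent` and `find_groups` are assumptions, each image is represented by a fingerprint (an equal-length list of 0-255 values, such as a downscaled grayscale grid), and the difference is the mean absolute difference expressed as a percentage of 255. The sketch treats a difference equal to the limit as still similar, so a 0% limit matches only identical fingerprints. Similar pairs are merged into groups with a union-find structure.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def difference_percent(a: Sequence[int], b: Sequence[int]) -> float:
    """Mean absolute difference of two fingerprints, as % of the 0..255 range."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    return 100.0 * sum(abs(x - y) for x, y in zip(a, b)) / (255 * len(a))


def find_groups(prints: Dict[str, Sequence[int]], max_diff: float) -> List[List[str]]:
    """Group images whose pairwise difference is at or below max_diff percent."""
    parent = {name: name for name in prints}

    def root(name: str) -> str:
        # Follow parent links to the group representative (path halving).
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    # Every pair is compared once, so the work grows with the square of the count.
    for a, b in combinations(prints, 2):
        if difference_percent(prints[a], prints[b]) <= max_diff:
            parent[root(a)] = root(b)

    groups: Dict[str, List[str]] = {}
    for name in prints:
        groups.setdefault(root(name), []).append(name)
    # Only groups with at least two members are candidates.
    return [sorted(g) for g in groups.values() if len(g) > 1]
```

Note that union-find merges transitively: if A is close to B and B is close to C, all three land in one group even when A and C differ by more than the limit, which is one way large groups can end up only loosely related.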

Such a function cannot be run frequently on galleries with many images (tens of thousands), so ImageDupeless can SAVE galleries and later compare new groups of images with an existing gallery. Only the NEW images are then scanned, which speeds up the process considerably. Offline mode is supported: the gallery images do not need to be present when they are compared with new pictures. You can safely keep your images on CD (in a bedside table, for example) and give ImageDupeless only the corresponding gallery file on the hard disk and a directory with new image files. Only the new files will be read, added to the gallery, and compared with the previously added images.
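The idea of a saved gallery can be illustrated with a hypothetical in-memory structure that keeps only fingerprints (lists of 0-255 values) and no pixel data, so new images can be checked while the originals sit on a CD. The `Gallery` class, its methods, the difference measure, and the use of JSON as the file format are all assumptions for this sketch; the real gallery file format is not described here. Only the new fingerprints are computed, and each one is compared with the stored fingerprints and with the new ones added before it.

```python
import json
from typing import Dict, List, Sequence, Tuple


def difference_percent(a: Sequence[int], b: Sequence[int]) -> float:
    """Mean absolute difference of two fingerprints, as % of the 0..255 range."""
    return 100.0 * sum(abs(x - y) for x, y in zip(a, b)) / (255 * len(a))


class Gallery:
    """Hypothetical gallery: image name -> fingerprint, no pixel data kept."""

    def __init__(self) -> None:
        self.prints: Dict[str, List[int]] = {}

    def add_new(self, new: Dict[str, List[int]], max_diff: float) -> List[Tuple[str, str]]:
        """Add new fingerprints and return similar (new, existing) pairs.

        Old images are never rescanned: their stored fingerprints are used.
        """
        pairs: List[Tuple[str, str]] = []
        for name, fp in new.items():
            # Compare with everything already stored, including new images
            # added earlier in this same call.
            for other, other_fp in self.prints.items():
                if difference_percent(fp, other_fp) <= max_diff:
                    pairs.append((name, other))
            self.prints[name] = fp
        return pairs

    def save(self, path: str) -> None:
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.prints, f)

    @classmethod
    def load(cls, path: str) -> "Gallery":
        gallery = cls()
        with open(path, encoding="utf-8") as f:
            gallery.prints = json.load(f)
        return gallery
```

With this design, adding k new images to a gallery of n images costs about k*n + k*(k-1)/2 comparisons instead of rescanning and comparing all n + k images from scratch.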

A few benchmarks:

Computer: Duron 700, OS: Windows 98 SE, ImageDupeless parameters at their default values

1. 4441 images in 49 directories, 389 MB (archive of fantasy pictures on CD, about 67% GIF)

Gallery size: 33 MB
Creation time: 18 min. 24 sec.
Comparison time: 2 min. 41 sec.
Result: 881 similar images in 176 groups

2. 3925 images in 13 directories, 720 MB (photo archive on CD: JPEG, both B/W and color, high-quality images)

Gallery size: 27.7 MB
Creation time: 26 min. 29 sec.
Comparison time: 1 min. 50 sec.
Result: 1024 similar images in 435 groups

Computer: Duron 800, OS: Windows 2000, default settings (unless stated otherwise)

3. 7997 images in 107 directories, 685 MB (archive of pictures on CD: GIF (5.7%) and JPEG, both B/W and color)

Gallery size: 61.9 MB
Creation time: 24 min. 58 sec.
Comparison time: 4 min. 31 sec.
Result: 861 similar images in 259 groups

4. 24314 images, 2.164 GB (archive of pictures on HDD); parameter "Maximal level of difference" = 5%, option "Store thumbnails in gallery files" turned off

Gallery size: 71.74 MB
Creation time: 44 min. (average RAM usage: 85 MB)
Comparison time: 41 min. (average RAM usage: 101.5 MB)
Result: 9285 similar images in 4270 groups

The size of a gallery file, and correspondingly the RAM usage, is proportional to the number of images in the gallery (approximately 7 KB per image when thumbnails are stored in the gallery file and 2.5 KB per image otherwise). Reading time is proportional to the gallery size and depends strongly on the option "Store thumbnails in gallery files" (it is 2 to 2.5 times shorter when the option is turned off). Comparison time is proportional to the square of the number of images in the gallery and depends only weakly on the parameter values.
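As a rough worked example of the quadratic rule (the helper names are hypothetical), the comparison time of benchmark 3 (7997 images, 4 min 31 s) can be scaled by the square of the image-count ratio to predict benchmark 4, which ran on the same Duron 800. The prediction comes to about 42 minutes against the measured 41 minutes, even though benchmark 4 used a 5% limit and no thumbnails, which fits the statement that comparison time depends only weakly on the parameters. Assuming an all-against-all comparison, the quadratic growth comes from the number of distinct pairs, n(n-1)/2.

```python
def pair_count(n: int) -> int:
    """Number of distinct image pairs in an all-against-all comparison."""
    return n * (n - 1) // 2


def scale_comparison_time(ref_images: int, ref_seconds: float, images: int) -> float:
    """Estimate comparison time, assuming it grows with the square of the count."""
    return ref_seconds * (images / ref_images) ** 2


if __name__ == "__main__":
    # Reference: benchmark 3 (7997 images, 4 min 31 s on the Duron 800).
    predicted = scale_comparison_time(7997, 4 * 60 + 31, 24314)
    print(f"predicted: {predicted / 60:.1f} min (measured in benchmark 4: 41 min)")
    print(f"pairs examined: {pair_count(24314):,}")
```

Such an estimate is only a rule of thumb: on a different computer, or with a very different share of similar images, the constant factor will change.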

The minimum recommended RAM for ImageDupeless is 128 MB; 256 MB or more is recommended for comfortable work.

System requirements:

  • Windows 9x/ME/2000/XP;
  • Min. 4.5 MB of free space on hard disk;
  • Min. 256 MB of RAM (9x/ME/2000/XP)

Directions for further development:

  • Implement the ability to browse the gallery by directory in offline mode.
  • Implement the ability to build a single gallery from several root directories or different CDs.
  • Groups of similar images containing more than 5-6 images are usually poorly cohesive; a different algorithm needs to be implemented.
  • Working with shaped images requires a fundamentally different algorithm; this is a long-term task.

ImageDupeless screenshot

 

Copyright 2002-2012 Oleg Tarlapan.