Nowadays almost everyone keeps a small (or huge) gallery of images of themselves, relatives, pets, etc. on a hard disk (or CD). Often the same image ends up stored several times. That is not terrible, of course, but I personally find it unpleasant. Programs that search for duplicate files cannot help in this situation, because the images may differ in file format (GIF, JPEG, TIFF, etc.), in size (640x480, 800x600, ...), or by a small shift and/or rotation (inevitable when scanning photos). These days hardly anyone worries about occupied space (this applies to hard disks but not to CDs, whose capacity is strictly limited), yet it is still desirable to keep only the one "best" image and remove all the others.
Comparing images is essentially a task for artificial intelligence, and one could go on refining a complete solution indefinitely. But producing a list of CANDIDATES that merely MAY be similar is a task that can be solved by elementary methods. That is what this program does.
The order of use is as follows: specify a directory with images, get quick access to a set of pairs of similar images, which can be viewed in your favorite viewer, and decide which images to keep.
When two images are compared, neither file names, file sizes, image resolutions, nor aspect ratios are taken into account, only the CONTENTS of the images. The user sets the maximal level of difference at which images are still considered identical (a percentage; 0% means very similar images, BUT ONLY from the program's point of view). The result of the comparison is a set of pairs (or groups) of images whose measure of difference is less than the limit set by the user.
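The idea of a difference limit can be illustrated with a minimal sketch. The actual measure used by ImageDupeless is not described here, so everything below is an assumption made for illustration: each image is given as a grid of grayscale values (0-255), it is reduced to a hypothetical 8x8 "signature" of block averages so that resolution does not matter, and the difference is the mean absolute difference of two signatures expressed as a percentage. The names `signature`, `difference_percent`, `candidate_pairs` and `half_image` are invented for this example.

```python
from itertools import combinations

GRID = 8  # hypothetical signature size: each image becomes GRID x GRID cells


def signature(pixels, grid=GRID):
    """Reduce a grayscale image (list of rows, values 0-255) to block averages.

    Assumes the image is at least grid x grid pixels.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for gy in range(grid):
        y0, y1 = gy * height // grid, (gy + 1) * height // grid
        for gx in range(grid):
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    return cells


def difference_percent(sig_a, sig_b):
    """Mean absolute difference of two signatures, as a percentage of full scale."""
    total = sum(abs(a - b) for a, b in zip(sig_a, sig_b))
    return 100.0 * total / (255 * len(sig_a))


def candidate_pairs(images, max_difference):
    """Return (name_a, name_b, difference) for every pair below the limit."""
    sigs = {name: signature(pixels) for name, pixels in images.items()}
    pairs = []
    for a, b in combinations(sorted(sigs), 2):
        diff = difference_percent(sigs[a], sigs[b])
        if diff < max_difference:
            pairs.append((a, b, diff))
    return pairs


def half_image(size, left_half_white):
    """Demo helper: a square image that is white on its left half or its top half."""
    if left_half_white:
        return [[255 if x < size // 2 else 0 for x in range(size)] for _ in range(size)]
    return [[255 if y < size // 2 else 0 for _ in range(size)] for y in range(size)]


if __name__ == "__main__":
    images = {
        "left_16": half_image(16, True),   # same content ...
        "left_32": half_image(32, True),   # ... at a different resolution
        "top_16": half_image(16, False),   # different content
    }
    for name_a, name_b, diff in candidate_pairs(images, max_difference=5.0):
        print(name_a, name_b, diff)
```

In this toy setup the two "left" images are reported as a candidate pair although their resolutions differ, while the "top" image is not paired with either. A real comparison would also have to tolerate small shifts and rotations, which this sketch does not attempt.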
On its own, such a comparison cannot be run often on galleries with a large number of images (tens of thousands), so it is possible TO SAVE galleries and then compare new groups of images against the existing gallery. Thus only the NEW images are scanned, which speeds up the process considerably. Off-line operation is supported as well: the gallery images themselves need not be present when they are compared with new pictures. Keep your own images on CD (in a bedside table, for example) and give ImageDupeless only the corresponding gallery file on the hard disk and a directory with the new image files. Only the latter will be read, added to the gallery, and compared with the previously included images.
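The saved-gallery workflow can be sketched roughly as follows. The real gallery file format is not documented here, so this sketch simply assumes a JSON file that maps an image name to a short list of numbers (a hypothetical "signature" computed when the image was first scanned). The function names and the difference measure are likewise assumptions for illustration. The point is only that the original image files are not needed again once their signatures are stored.

```python
import json
import os
import tempfile


def load_gallery(path):
    """Load a saved gallery (name -> signature); a missing file means an empty gallery."""
    if not os.path.exists(path):
        return {}
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def save_gallery(path, gallery):
    """Write the gallery back to disk so the next run can reuse it."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(gallery, f)


def difference_percent(sig_a, sig_b):
    """Assumed measure: mean absolute difference as a percentage of full scale (255)."""
    total = sum(abs(a - b) for a, b in zip(sig_a, sig_b))
    return 100.0 * total / (255 * len(sig_a))


def add_new_images(gallery, new_signatures, max_difference):
    """Compare each new signature with everything already known, then add it.

    Because every new image is added right after it has been compared, later
    new images are also compared with earlier new ones.
    Returns a list of (new_name, known_name, difference).
    """
    matches = []
    for name, sig in new_signatures.items():
        for known_name, known_sig in gallery.items():
            diff = difference_percent(sig, known_sig)
            if diff < max_difference:
                matches.append((name, known_name, diff))
        gallery[name] = sig
    return matches


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "gallery.json")
    # Pretend an earlier run scanned a CD and saved its signatures.
    save_gallery(path, {"cd1/cat.gif": [10, 200, 30, 40]})

    # Later: the CD is in a drawer; only the gallery file and new files are used.
    gallery = load_gallery(path)
    new = {"new/cat.jpg": [12, 198, 30, 41], "new/dog.jpg": [250, 5, 220, 90]}
    for match in add_new_images(gallery, new, max_difference=5.0):
        print(match)
    save_gallery(path, gallery)
```

Only the new signatures have to be computed from image files; the comparison against older images uses what is already stored in the gallery file.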
Computer: Duron 700, OS: Windows 98 SE, ImageDupeless parameters left at their default values
1. 4441 images in 49 directories, 389Mb (archive of fantasy pictures on CD - about 67% GIF)
Gallery size: 33Mb
2. 3925 images in 13 directories, 720Mb (photo archive on CD - JPEG format, both B/W and color, high-quality images)
Gallery size: 27.7Mb
Computer: Duron 800, OS: Windows 2000, default settings (unless stated otherwise)
3. 7997 images in 107 directories, 685Mb (archive of pictures on CD - GIF (5.7%) and JPEG formats, both B/W and color)
Gallery size: 61.9Mb
4. The parameter "Maximal level of difference" = 5%, the option "Store thumbnails in gallery files" is turned off.
Gallery size: 71.74Mb
The size of a gallery file and, correspondingly, RAM consumption are proportional to the number of images in the gallery (approximately 7Kb per image when thumbnails are stored in the gallery file and 2.5Kb per image otherwise). Reading time is proportional to gallery size and depends strongly on the option "Store thumbnails in gallery files" (it is 2-2.5 times shorter when the option is turned off). Comparison time is proportional to the square of the number of images in the gallery and depends only weakly on parameter values.
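These proportions lend themselves to a quick back-of-the-envelope estimate. The sketch below uses only the approximate per-image figures quoted above. It assumes that 1Mb means 1024Kb and that the quadratic comparison time comes from comparing every image with every other one, i.e. n(n-1)/2 pairs; neither assumption is confirmed by the program itself, and the function names are invented for this example.

```python
KB_PER_IMAGE_WITH_THUMBNAILS = 7.0     # approximate figure quoted in the text
KB_PER_IMAGE_WITHOUT_THUMBNAILS = 2.5  # approximate figure quoted in the text


def estimated_gallery_size_mb(image_count, store_thumbnails=True):
    """Rough gallery size in Mb, assuming 1 Mb = 1024 Kb."""
    if store_thumbnails:
        per_image_kb = KB_PER_IMAGE_WITH_THUMBNAILS
    else:
        per_image_kb = KB_PER_IMAGE_WITHOUT_THUMBNAILS
    return image_count * per_image_kb / 1024


def pair_count(image_count):
    """Number of distinct image pairs if every image is compared with every other."""
    return image_count * (image_count - 1) // 2


if __name__ == "__main__":
    for n in (4441, 3925, 7997):
        size = estimated_gallery_size_mb(n)
        print(f"{n} images: about {size:.1f} Mb with thumbnails, {pair_count(n)} pairs")
```

Doubling the number of images roughly doubles the estimated gallery size but roughly quadruples the number of pairs, which is why comparison time grows much faster than reading time.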
128Mb of RAM is the recommended minimum for using ImageDupeless; 256Mb or more is recommended for comfortable work.
Directions for further development:
Copyright © 2002-2012 Oleg Tarlapan.