This is a guide to detecting similar images and photos with imagededup, a Python library for machine learning, deep learning, and AI. It can find similarities that simple hash comparisons miss, such as images with the same content but different quality or resolution, a slightly different angle, or cropping. It is commonly used to filter duplicates out of machine learning image datasets, but it works just as well for general-purpose deduplication.
Create a virtual environment with virtualenv.
virtualenv imagededup_dev
. .\imagededup_dev\Scripts\activate
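The activation command above is for Windows. On Linux, activate the virtual environment with the following instead.
. imagededup_dev/bin/activate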
Install the imagededup package.
pip install imagededup
If virtualenv is not installed, install it.
pip install virtualenv
If Python and pip are not installed, install them. If you install them with choco on Windows, restart the console afterwards and make sure the PATH is set.
(Linux)
sudo apt update
sudo apt install python3-pip python3-dev
(Windows)
choco install python3 --version=3.7.6.20200110
(PowerShell, checking the path)
Start C:\Windows\system32\rundll32.exe sysdm.cpl,EditEnvironmentVariables
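You can also confirm that Python and pip are actually on the path by checking their versions:
python --version
pip --version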
If Choco is not installed on Windows, install it.
Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://chocolatey.org/install.ps1'))
refreshenv
TensorFlow is required to run imagededup. If it is not installed, install it.
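A plain pip install is usually enough (depending on the imagededup version, it may already have been pulled in as a dependency, in which case this step does nothing):
pip install tensorflow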
Below is some sample code. It stores a mapping of the detected duplicate image files in the duplicates variable.
if __name__ == '__main__':
    from imagededup.methods import PHash

    phasher = PHash()
    encodings = phasher.encode_images(image_dir='C:/images/anime')
    duplicates = phasher.find_duplicates(encoding_map=encodings)
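For reference, duplicates is an ordinary dictionary that maps each file name to a list of the file names detected as its duplicates. To see only the images that actually have duplicates, you could append something like this to the block above:
    # Print only the images that have at least one detected duplicate.
    for filename, dupes in duplicates.items():
        if dupes:
            print(filename, dupes)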
Below is a command-line tool I wrote that uses imagededup. No license is specified, so please refrain from reposting it to blogs, GitHub, or other online platforms without permission.
if __name__ == '__main__':
    from imagededup.methods import PHash
    from imagededup.utils import plot_duplicates
    from pathlib import Path
    import json
    import fire

    def main(type="get_json", path=r"C:\pg\ml\_images\hent"):
        if type == "get_json":
            # Hash every image in the folder and save the duplicate map to tmp.json.
            phasher = PHash()
            encodings = phasher.encode_images(image_dir=path)
            duplicates = phasher.find_duplicates(encoding_map=encodings)
            Path('./tmp.json').write_text(json.dumps(duplicates))
        if type == "test":
            # Display each image that has at least one duplicate, based on tmp.json.
            duplicates = json.loads(Path('./tmp.json').read_text())
            for key, value in duplicates.items():
                if value:
                    plot_duplicates(image_dir=path,
                                    duplicate_map=duplicates,
                                    filename=key)
        if type == "get_txt":
            # Write the redundant file names (keeping one image per duplicate group) to duplicates.txt.
            duplicates = json.loads(Path('./tmp.json').read_text())
            content = ""
            delete_values = []
            for key, value in duplicates.items():
                if value and key not in delete_values:
                    for v in value:
                        content = content + v + "\n"
                        delete_values.append(v)
            Path('./duplicates.txt').write_text(content)

    fire.Fire(main)
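Incidentally, imagededup also provides a find_duplicates_to_remove method that returns a flat list of file names to delete while keeping one image per duplicate group, which is roughly what the get_txt branch does by hand. A minimal sketch of an alternative get_txt branch, assuming your installed version has this method:
            phasher = PHash()
            encodings = phasher.encode_images(image_dir=path)
            # Let the library decide which files are redundant.
            to_remove = phasher.find_duplicates_to_remove(encoding_map=encodings)
            Path('./duplicates.txt').write_text("\n".join(to_remove) + "\n")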
In addition to imagededup, the program above uses the fire package. Install it with the following command.
pip install fire
Below are usage examples for the above program.
Please execute the following commands in order.
Specify --path as the path to your image folder.
python imagededup_dev.py --type get_json --path "B:\_images\animeimages" generates the duplicate map as a JSON file (tmp.json).
python imagededup_dev.py --type test --path "B:\_images\animeimages" runs a test based on the JSON file, displaying each set of duplicate images. If there are no duplicate images, nothing is displayed.
python imagededup_dev.py --type get_txt --path "B:\_images\animeimages" creates a text file, duplicates.txt, that lists the file names of redundant duplicates based on the JSON file (one image from each duplicate group is kept).
You can delete image files by hand based on the generated text file, or delete them all at once with a Linux command like the one below. (Note: the command deletes files relative to the current directory/folder, so run it from inside the image folder, and make sure you understand how it works before using it.)
rm $( cat "/path/to/duplicates.txt" )
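If any file names contain spaces, the command substitution above will split them incorrectly. A more robust variant (a sketch assuming GNU xargs, again run from inside the image folder) is:
xargs -d '\n' rm -- < /path/to/duplicates.txt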
For how to use Linux on Windows, search our blog for “wsl” or “bash on windows” to find the relevant guides.