find_duplicates_by_content
find_duplicates_by_content(directory)
Finds duplicate files based on file content (MD5 hash) within a given directory and its subdirectories.
Parameters
| Name | Type | Description | Default |
| directory | str | The path to the directory to search for duplicates. | required |
Returns
| Type | Description |
| dict | A dictionary where keys are file hashes and values are lists of file paths that have that hash. Only includes hashes that appear more than once. |
Examples
>>> import tempfile
>>> import os
>>> with tempfile.TemporaryDirectory() as tmp:
...     for name in ("a.txt", "b.txt"):
...         with open(os.path.join(tmp, name), "w") as f:
...             _ = f.write("same")
...     duplicates = find_duplicates_by_content(tmp)
...     any(len(paths) > 1 for paths in duplicates.values())
True
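The behaviour described above (hash every file under the directory, group paths by hash, keep only groups with more than one path) can be sketched as follows. This is a minimal illustration, not the library's actual implementation: the name `find_duplicates_sketch` and the `chunk_size` parameter are hypothetical, and reading files in chunks is an assumption made here to avoid loading large files into memory.

```python
import hashlib
import os
from collections import defaultdict


def find_duplicates_sketch(directory, chunk_size=65536):
    """Hypothetical stand-in for find_duplicates_by_content (illustration only)."""
    paths_by_hash = defaultdict(list)

    # os.walk visits the directory itself and every subdirectory below it.
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            digest = hashlib.md5()
            # Read in fixed-size chunks so large files are not loaded whole.
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(chunk_size), b""):
                    digest.update(chunk)
            paths_by_hash[digest.hexdigest()].append(path)

    # Keep only hashes shared by more than one file.
    return {h: paths for h, paths in paths_by_hash.items() if len(paths) > 1}
```

In this sketch an unreadable file raises an exception rather than being skipped; whether the real function behaves the same way is not stated in the documentation. MD5 is adequate for spotting accidental duplicates, but it is not collision-resistant against deliberately crafted files.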