find_duplicates_by_content

find_duplicates_by_content(directory)

Finds duplicate files based on file content (MD5 hash) within a given directory and its subdirectories.

Parameters

Name Type Description Default
directory str The path to the directory to search for duplicates. required

Returns

Name Type Description
dict A dictionary where keys are file hashes and values are lists of file paths that have that hash. Only includes hashes that appear more than once.

Examples

>>> import tempfile
>>> import os
>>> with tempfile.TemporaryDirectory() as tmp:
...     _ = open(os.path.join(tmp, "a.txt"), "w").write("same")
...     _ = open(os.path.join(tmp, "b.txt"), "w").write("same")
...     duplicates = find_duplicates_by_content(tmp)
...     any(len(paths) > 1 for paths in duplicates.values())
True