find_duplicates

find_duplicates(directory, method='content')

Finds duplicate files in a given directory and all of its subdirectories. This is the main entry point; it calls the other functions to perform the comparison.

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| directory | str | The path to the directory to search for duplicates. | required |
| method | str | The method to use for finding duplicates. Can be 'name', 'size', or 'content'. | 'content' |

Returns

| Name | Type | Description |
|---|---|---|
| | dict | A dictionary where keys are duplicate identifiers and values are lists of matching file paths. Empty if no duplicates are found. |

Raises

| Type | Description |
|---|---|
| ValueError | If the provided method is not one of 'name', 'size', or 'content'. |
| FileNotFoundError | If the provided directory path does not exist or is not a directory. |
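
The contract described above (the three methods, the returned dictionary, and the two exceptions) could be met roughly as in the sketch below. This is not the library's actual implementation: the name `find_duplicates_sketch`, the use of SHA-256 as the content identifier, and the choice to report only groups with more than one file are all assumptions made for illustration.

```python
import hashlib
import os
from collections import defaultdict


def find_duplicates_sketch(directory, method="content"):
    """Hypothetical re-implementation of the documented behaviour."""
    # Validate the arguments in the way the Raises section describes.
    if method not in ("name", "size", "content"):
        raise ValueError(
            f"method must be 'name', 'size', or 'content', got {method!r}"
        )
    if not os.path.isdir(directory):
        raise FileNotFoundError(
            f"{directory!r} does not exist or is not a directory"
        )

    groups = defaultdict(list)
    # os.walk visits the directory and every subdirectory below it.
    for root, _dirs, files in os.walk(directory):
        for filename in files:
            path = os.path.join(root, filename)
            if method == "name":
                key = filename
            elif method == "size":
                key = os.path.getsize(path)
            else:
                # Assumption: a SHA-256 digest of the whole file is the identifier.
                with open(path, "rb") as handle:
                    key = hashlib.sha256(handle.read()).hexdigest()
            groups[key].append(path)

    # Assumption: only identifiers shared by two or more files are reported.
    return {key: paths for key, paths in groups.items() if len(paths) > 1}
```

Reading each file fully into memory keeps the sketch short; for large files, hashing in fixed-size chunks would be the more careful choice.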

Examples

>>> import tempfile
>>> import os
>>> with tempfile.TemporaryDirectory() as tmp:
...     path_1 = os.path.join(tmp, "a.txt")
...     path_2 = os.path.join(tmp, "b.txt")
...     for path in (path_1, path_2):
...         with open(path, "w") as f:
...             _ = f.write("same")
...     duplicates = find_duplicates(tmp, method="content")
...     any(len(paths) > 1 for paths in duplicates.values())
True
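
Once the dictionary is returned, a common next step is deciding which files are redundant. The helper below is a minimal sketch of one way to do that; `redundant_paths` is a hypothetical name, not part of this API, and it assumes only the documented return shape (identifier mapped to a list of paths). It skips any group with a single path, in case such groups are present.

```python
def redundant_paths(duplicates):
    """Return every path except the first (in sorted order) of each group."""
    extras = []
    for paths in duplicates.values():
        if len(paths) > 1:
            # Sort so the file that is kept is deterministic, then skip it.
            extras.extend(sorted(paths)[1:])
    return extras


# Hand-written dictionary in the documented shape, used here instead of a
# real find_duplicates() result so the snippet runs on its own.
example = {
    "id-1": ["/data/b.txt", "/data/a.txt"],
    "id-2": ["/data/c.txt"],
}
print(redundant_paths(example))  # ['/data/b.txt']
```

Keeping the first path in sorted order is an arbitrary policy; keeping the oldest or the shortest path would work equally well.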