clean_location

Functions

| Name | Description |
| --- | --- |
| clean_location | Identify a free-text entry representing a Canadian province or territory using fuzzy matching and return its unique two-letter identifier. |

clean_location

clean_location.clean_location(text, threshold=85)

Identify a free-text entry representing a Canadian province or territory using fuzzy matching and return its unique two-letter identifier.

The function accepts a province or territory in a variety of English formats, including the full spelling, common abbreviations, and minor misspellings. It fuzzy-matches the input string against a dictionary of province and territory names, acronyms, and shorthands. If no province or territory can be identified, the function raises a ValueError.
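The matching step described above can be sketched with the standard library. This is a minimal illustration, not the function's actual implementation: the `ALIASES` table, the `normalize` helper, the name `sketch_clean_location`, and the choice of `difflib.SequenceMatcher` as the scorer are all assumptions made for this example. The real function may use a different fuzzy-matching library and a much larger dictionary.

```python
from difflib import SequenceMatcher

# Hypothetical, deliberately partial lookup table (an assumption for this
# sketch). The real dictionary is not shown in this documentation.
ALIASES = {
    "BC": ["british columbia", "bc"],
    "ON": ["ontario", "on", "ont"],
    "QC": ["quebec", "qc", "que"],
}


def normalize(text):
    """Lowercase the text, drop punctuation and digits, collapse whitespace."""
    kept = "".join(ch for ch in text.lower() if ch.isalpha() or ch == " ")
    return " ".join(kept.split())


def sketch_clean_location(text, threshold=85):
    """Return the two-letter code of the best-scoring alias, or raise."""
    if not isinstance(text, str):
        raise TypeError("Input is not a string.")

    cleaned = normalize(text)
    best_code, best_score = None, 0.0

    # Score the input against every known name and keep the best match.
    for code, names in ALIASES.items():
        for name in names:
            score = 100 * SequenceMatcher(None, cleaned, name).ratio()
            if score > best_score:
                best_code, best_score = code, score

    # Reject the best candidate if it does not reach the cutoff.
    if best_code is None or best_score < threshold:
        raise ValueError("Province or territory could not be identified.")
    return best_code
```

In this sketch the misspelling "Ontaro" scores about 92 against "ontario", so it is accepted at the default threshold of 85 and rejected if the threshold is raised to 95. Scores from the real function may differ, since its scorer is not specified here.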

This function only processes English text entries made up of the 26 letters of the English alphabet. It may not handle French characters, including accented letters, and may not match French province or territory names correctly.
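One way to work around the accent limitation is to strip diacritics before calling the function. The helper below is a hypothetical preprocessing step, not part of this module: `strip_accents` is a name chosen for this sketch, and it relies only on the standard `unicodedata` module. Whether the stripped text then matches still depends on the names in the function's dictionary, so French spellings that differ from the English ones may still fail.

```python
import unicodedata


def strip_accents(text):
    """Return text with combining accent marks removed (e.g. 'é' becomes 'e')."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep every character that is not a combining mark.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

For example, `strip_accents("Québec")` yields `"Quebec"`, which could then be passed to `clean_location`.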

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| text | str | The input string representing a province or territory in Canada. | required |
| threshold | int | The baseline cutoff score for accepting a fuzzy match, up to 100 (perfect match). | 85 |

Returns

| Name | Type | Description |
| --- | --- | --- |
| | str | The two-letter identifier of the matched province or territory (for example, 'BC'). |

Raises

| Name | Type | Description |
| --- | --- | --- |
| | ValueError | If a valid Canadian province/territory cannot be identified from the input. |
| | TypeError | If the input is not a string. |

Examples

>>> clean_location("British Columbia")
'BC'
>>> clean_location("B.C.")
'BC'
>>> clean_location("Not A Province")
# Raises ValueError: Province or territory could not be identified.
>>> clean_location(1)
# Raises TypeError: Input is not a string.
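When cleaning a whole column of free-text entries, it can be convenient to catch the two documented exceptions rather than stop at the first bad value. The helper below is a hypothetical sketch, not part of this module: `clean_many` is a name chosen for the example, and `cleaner` stands for any callable with the documented behaviour, such as `clean_location`.

```python
def clean_many(entries, cleaner):
    """Apply cleaner to each entry, collecting successes and failures separately."""
    cleaned, rejected = [], []
    for entry in entries:
        try:
            cleaned.append(cleaner(entry))
        except (ValueError, TypeError) as exc:
            # Keep the offending entry and the error message for later review.
            rejected.append((entry, str(exc)))
    return cleaned, rejected
```

Calling `clean_many(["British Columbia", "Not A Province", 1], clean_location)` would place the matched code in the first list and the two failing entries, with their error messages, in the second.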