clean_location
Functions
| Name | Description |
|---|---|
| clean_location | Identify a free-text entry representing a province or territory in Canada using fuzzy matching and return its unique two-letter identifier. |
clean_location
clean_location.clean_location(text, threshold=85)
Identify a free-text entry representing a province or territory in Canada using fuzzy matching and return its unique two-letter identifier.
The function accepts a province or territory in a variety of English formats, including full spelling, common abbreviations, and minor misspellings. It performs fuzzy matching between the input string and a dictionary of province and territory names, acronyms, and shorthands. If no province or territory can be identified, the function raises a ValueError.
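The matching step described above can be sketched roughly as follows. This is a minimal illustration, not the package's implementation: the `ALIASES` table and the name `sketch_clean_location` are hypothetical, and the similarity score comes from the standard library's `difflib.SequenceMatcher`, which may differ from the scorer the package actually uses.

```python
from difflib import SequenceMatcher

# Hypothetical, abbreviated alias table. The package's real dictionary of
# names, acronyms, and shorthands is not shown on this page.
ALIASES = {
    "british columbia": "BC",
    "bc": "BC",
    "ontario": "ON",
    "ont": "ON",
    "on": "ON",
    "quebec": "QC",
    "qc": "QC",
}


def sketch_clean_location(text, threshold=85):
    """Return the two-letter code of the best fuzzy match, or raise."""
    if not isinstance(text, str):
        raise TypeError("Input is not a string.")

    # Normalize: lowercase, then keep only letters and spaces ("B.C." -> "bc").
    key = "".join(ch for ch in text.lower() if ch.isalpha() or ch == " ").strip()

    # Score the input against every alias on a 0-100 scale and keep the best.
    best_code, best_score = None, 0.0
    for alias, code in ALIASES.items():
        score = 100 * SequenceMatcher(None, key, alias).ratio()
        if score > best_score:
            best_code, best_score = code, score

    # Reject the best candidate if it does not reach the cutoff.
    if best_code is None or best_score < threshold:
        raise ValueError("Province or territory could not be identified.")
    return best_code
```

Under these assumptions, a minor misspelling such as "Ontaro" still scores above the default cutoff against "ontario", while an unrelated string falls below it and raises ValueError. Raising `threshold` makes the sketch stricter about misspellings.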
This function can only process English text entries written with the 26 letters of the English alphabet. It may not handle French characters, including accented letters, and may not match French province or territory names correctly.
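Given this limitation, a caller with accented input might strip the accents before passing the text in. The helper below is a hypothetical sketch using only the standard library's `unicodedata` module. It is not part of the package, and whether the stripped text then matches depends on the package's own dictionary, so French names may still fail.

```python
import unicodedata


def strip_accents(text):
    """Replace accented letters with their unaccented base letters."""
    # NFKD splits a letter such as "é" into "e" plus a combining accent mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep everything else unchanged.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("Québec"))  # Quebec
```

The result could then be passed as the `text` argument, for example `clean_location(strip_accents(raw_value))`, where `raw_value` stands for the caller's own input.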
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | The input string representing a province/territory in Canada. | required |
| threshold | int | The baseline cutoff threshold for accepting a fuzzy match, up to 100 (perfect match). | 85 |
Returns
| Name | Type | Description |
|---|---|---|
| | str | The two-letter identifier of the matched province or territory. |
Raises
| Name | Type | Description |
|---|---|---|
| | ValueError | If a valid Canadian province/territory cannot be identified from the input. |
| | TypeError | If the input is not a string. |
Examples
>>> clean_location("British Columbia")
'BC'
>>> clean_location("B.C.")
'BC'
>>> clean_location("Not A Province")
# Raises ValueError: Province or territory could not be identified.
>>> clean_location(1)
# Raises TypeError: Input is not a string.