count_unique_words
count_unique_words(
text,
ignore_words=None,
count_punc=False,
case_sensitive=True,
)
Count the instances of unique words in a text string.
By default, punctuation is removed (except apostrophes inside words). Words are split on any whitespace (spaces/tabs/newlines), not just single spaces.
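The default tokenization described above can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: the helper name `tokenize_default` and both regular expressions are assumptions, and it assumes punctuation is deleted outright rather than replaced with a space.

```python
import re

# Assumption: "punctuation" means any character that is not a word
# character (\w), whitespace, or an apostrophe.
_PUNCT = re.compile(r"[^\w\s']")
# Apostrophes that are NOT flanked by word characters on both sides,
# i.e. leading, trailing, or free-standing apostrophes.
_EDGE_APOSTROPHE = re.compile(r"(?<!\w)'|'(?!\w)")


def tokenize_default(text):
    """Hypothetical helper: drop punctuation, keep in-word apostrophes."""
    cleaned = _PUNCT.sub("", text)
    cleaned = _EDGE_APOSTROPHE.sub("", cleaned)
    # str.split() with no argument splits on any run of whitespace
    # (spaces, tabs, newlines) and discards empty strings.
    return cleaned.split()


print(tokenize_default("Don't stop,\tdon't\n'stop'!"))
```

Note that `\w` also matches digits and underscores, so this sketch treats them as part of a word; the real function may draw that line differently.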
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Text in which to count unique words. | required |
| ignore_words | Iterable[str] \| None | Words to exclude from counting. Accepts any iterable of strings (e.g., list, set, tuple). Matching respects case_sensitive. | None |
| count_punc | bool | If True, punctuation symbols are tokenized and counted as separate tokens. If False, punctuation is removed (apostrophes inside words are kept). | False |
| case_sensitive | bool | If False, words are lowercased before counting, and the entries of ignore_words are lowercased before matching. | True |
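
The interaction between ignore_words and case_sensitive can be sketched as below. This is an illustrative assumption about the counting step only: `count_tokens` is a hypothetical helper that takes an already tokenized list, and the use of `collections.Counter` is a choice made for this sketch, not something the documentation confirms.

```python
from collections import Counter


def count_tokens(tokens, ignore_words=None, case_sensitive=True):
    """Hypothetical helper: count tokens, honoring ignore_words and case."""
    # Materialize into a set so any iterable (list, tuple, set, generator)
    # works and membership checks are fast.
    ignored = set(ignore_words or ())
    if not case_sensitive:
        # Normalize both sides so that "The" and "the" compare equal.
        tokens = [t.lower() for t in tokens]
        ignored = {w.lower() for w in ignored}
    return dict(Counter(t for t in tokens if t not in ignored))


words = ["The", "the", "the", "thing"]
print(count_tokens(words, ignore_words=["the"], case_sensitive=False))
print(count_tokens(words, ignore_words=["the"], case_sensitive=True))
```

With case_sensitive=True the capitalized "The" is not matched by the ignore entry "the", so it is still counted; with case_sensitive=False both sides are lowercased first and only "thing" remains.
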
Returns
| Name | Type | Description |
|---|---|---|
| | dict[str, int] | Dictionary of tokens to counts. |
Raises
| Name | Type | Description |
|---|---|---|
| | TypeError | If input types are incorrect or ignore_words contains non-strings. |
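
The kind of checks that would produce this TypeError might look like the following sketch. The helper name `validate_inputs`, the error messages, and the exact set of checks are assumptions for illustration; the real function may validate differently.

```python
def validate_inputs(text, ignore_words=None, count_punc=False, case_sensitive=True):
    """Hypothetical helper: raise TypeError on bad argument types.

    Returns ignore_words as a list (or None) so that one-shot iterables
    such as generators are not exhausted by the check itself.
    """
    if not isinstance(text, str):
        raise TypeError("text must be a str")
    if not isinstance(count_punc, bool):
        raise TypeError("count_punc must be a bool")
    if not isinstance(case_sensitive, bool):
        raise TypeError("case_sensitive must be a bool")
    if ignore_words is None:
        return None
    try:
        words = list(ignore_words)
    except TypeError:
        # list() raises TypeError for non-iterables such as int.
        raise TypeError("ignore_words must be an iterable of str or None") from None
    if not all(isinstance(w, str) for w in words):
        raise TypeError("ignore_words must contain only strings")
    return words


try:
    validate_inputs("some text", ignore_words=["ok", 3])
except TypeError as exc:
    print(f"TypeError: {exc}")
```
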
Examples
>>> count_unique_words('I go where I go')
{'I': 2, 'go': 2, 'where': 1}
>>> count_unique_words('The the the thing', ignore_words=['the'], case_sensitive=False)
{'thing': 1}