Text Utilities
Welcome to textutils
textutils is a lightweight Python package that provides a small collection of utility functions for basic text processing and manipulation. The package is designed to be simple, beginner-friendly, and easy to integrate into data analysis or general Python workflows where quick text operations are needed without the overhead of large NLP libraries.
Contributors
- Mehmet Imga
- Shi Fan Jin
- Aidan Hew
- Sidharth Malik
Installation
$ pip install -i https://test.pypi.org/simple/ textutils-dsci524
Package Overview
This package includes the following functions:
- word_count(text: str) -> int
  Counts the number of words in a given string. Handles empty strings and raises appropriate errors for invalid inputs.
- remove_punctuation(text: str) -> str
  Removes punctuation characters from a string and returns the cleaned text while preserving spacing and alphanumeric characters.
- most_common_word(text: str) -> str
  Identifies and returns the most frequently occurring word in a string. Ignores punctuation and is case-insensitive by default; an optional second argument enables case-sensitive matching.
- reverse_text(text: str) -> str
  Reverses the input string, by words (the default) or by characters, and returns the result. Validates input types and handles edge cases such as empty strings.
Quick Usage Examples
from textutils.textutils import (
word_count,
remove_punctuation,
most_common_word,
reverse_text,
)
word_count("Hello world!") # returns 2
remove_punctuation("Hello, world!") # returns "Hello world"
most_common_word("apple banana apple orange") # returns "apple"
reverse_text("textutils", mode="char") # returns "slitutxet"
Detailed Usage Examples
word_count
Count the number of words in a string. Handles extra spaces and empty input gracefully.
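To make the described behavior concrete, here is a minimal sketch of how a function like this could work. It is not the package's source: the name word_count_sketch is hypothetical, and raising TypeError for non-string input is an assumption, since this README only mentions "appropriate errors".

```python
def word_count_sketch(text: str) -> int:
    """Count whitespace-separated words (illustrative stand-in only)."""
    if not isinstance(text, str):
        # Assumption: the exact exception type used by textutils is not documented here.
        raise TypeError("text must be a string")
    # str.split() with no arguments splits on runs of whitespace and ignores
    # leading/trailing whitespace, so "" and "   " both produce an empty list.
    return len(text.split())


print(word_count_sketch("Data science is fun"))    # 4
print(word_count_sketch("  This   is a  test  "))  # 4
print(word_count_sketch(""))                       # 0
```

Splitting on whitespace means a token such as "package!" counts as one word, which matches the counts shown in the examples in this section.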
from textutils.textutils import word_count
# Example 1: Simple sentence
text = "Data science is fun"
print(word_count(text)) # Output: 4
# Example 2: Extra spaces between words
messy = "  This   is  a   test  "
print(word_count(messy)) # Output: 4
# Example 3: Empty string
print(word_count("")) # Output: 0
# Example 4: String with only whitespace
print(word_count(" ")) # Output: 0
# Example 5: Real-world use case – counting words in user input
comment = "I really enjoyed using this package!"
num_words = word_count(comment)
print(num_words) # Output: 6
remove_punctuation
Remove all punctuation from text while preserving letters, numbers, spaces, and emojis.
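One simple way to get this behavior is a translation table that deletes ASCII punctuation and leaves every other character untouched. The sketch below is an illustration under that assumption, not the package's implementation; remove_punctuation_sketch is a hypothetical name, and the real function may treat non-ASCII punctuation differently.

```python
import string

# Translation table that maps every ASCII punctuation character to "deleted".
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def remove_punctuation_sketch(text: str) -> str:
    """Delete ASCII punctuation; letters, digits, spaces, and emojis pass through."""
    if not isinstance(text, str):
        # Assumption: the exception type is a guess, not taken from textutils.
        raise TypeError("text must be a string")
    return text.translate(_PUNCT_TABLE)


print(remove_punctuation_sketch("Hello, World! How are you?"))  # Hello World How are you
print(remove_punctuation_sketch("Sale: 50% off! Ends soon! 🎉"))  # Sale 50 off Ends soon 🎉
```

Because characters are deleted rather than replaced with spaces, "5/5" becomes "55" and "It's" becomes "Its", as in the outputs shown in this section.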
from textutils.textutils import remove_punctuation
# Example 1: Basic sentence
text = "Hello, World! How are you?"
result = remove_punctuation(text)
print(result) # Output: "Hello World How are you"
# Example 2: Text with multiple punctuation marks
messy_text = "Wait... What?! That's amazing!!!"
clean_text = remove_punctuation(messy_text)
print(clean_text) # Output: "Wait What Thats amazing"
# Example 3: Preserves numbers and emojis
mixed = "Sale: 50% off! Ends soon! 🎉"
print(remove_punctuation(mixed)) # Output: "Sale 50 off Ends soon 🎉"
# Example 4: Real-world use case - cleaning text data for analysis
reviews = [
"Great product! 5/5 stars!!!",
"Terrible... would NOT recommend.",
"It's okay, nothing special."
]
clean_reviews = [remove_punctuation(r) for r in reviews]
print(clean_reviews)
# Output: ['Great product 55 stars', 'Terrible would NOT recommend', 'Its okay nothing special']
most_common_word
Identify the most common word in a given text.
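The following sketch shows one way the documented behavior (punctuation ignored, optional case sensitivity, ties resolved by first appearance) could be implemented. It is an illustration, not the package's code: the name most_common_word_sketch, the parameter name case_sensitive, and the ValueError for text without words are all assumptions.

```python
import string
from collections import Counter


def most_common_word_sketch(text: str, case_sensitive: bool = False) -> str:
    """Return the most frequent word; on ties, the word seen first wins."""
    if not isinstance(text, str):
        raise TypeError("text must be a string")  # assumed error type
    # Strip ASCII punctuation so "Hello." and "Hello" count as the same word.
    cleaned = text.translate(str.maketrans("", "", string.punctuation))
    if not case_sensitive:
        cleaned = cleaned.lower()
    words = cleaned.split()
    if not words:
        # Assumption: behavior for empty input is not documented in this README.
        raise ValueError("text contains no words")
    counts = Counter(words)
    # Counter keeps insertion order, and max() returns the first maximal key,
    # so equal counts resolve to the word that appeared first.
    return max(counts, key=counts.get)


print(most_common_word_sketch("Hello. Hello. hello. How's your day?"))        # hello
print(most_common_word_sketch("Hello. Hello. hello. How's your day?", True))  # Hello
print(most_common_word_sketch("apple banana apple banana"))                   # apple
```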
from textutils.textutils import most_common_word
# Example 1: Basic sentence (case-insensitive by default)
most_common_word("Hello. Hello. hello. How's your day?") # Output: 'hello'
# Example 2: Case-sensitive matching
most_common_word("Hello. Hello. hello. How's your day?", True) # Output: 'Hello'
# Example 3: Tie; the word that appears first is returned
most_common_word("apple banana apple banana") # Output: 'apple'
# Example 4: Single word
most_common_word("hello") # Output: 'hello'
reverse_text
Reverse text by words (the default) or by characters, selected with the mode argument.
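As a rough illustration of the two modes, the sketch below reverses either the sequence of words or the sequence of characters. It is not the package's implementation: reverse_text_sketch is a hypothetical name, the ValueError for an unknown mode is an assumption, and keeping whitespace runs as tokens is only one possible reading of "preserves spacing".

```python
import re


def reverse_text_sketch(text: str, mode: str = "word") -> str:
    """Reverse text by words (default) or by characters."""
    if not isinstance(text, str):
        raise TypeError("text must be a string")  # assumed error type
    if mode == "char":
        # Slicing with a step of -1 reverses the whole string.
        return text[::-1]
    if mode == "word":
        # The capture group makes re.split keep each whitespace run as its own
        # token, so reversing the token list keeps the gaps between words.
        tokens = re.split(r"(\s+)", text)
        return "".join(reversed(tokens))
    # Assumption: how textutils reports an unsupported mode is not documented here.
    raise ValueError("mode must be 'word' or 'char'")


print(reverse_text_sketch("Data science is fun"))       # fun is science Data
print(reverse_text_sketch("Hello World", mode="char"))  # dlroW olleH
```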
from textutils.textutils import reverse_text
# Example 1: Basic sentence (default word mode)
text = "Hello World"
result = reverse_text(text)
print(result) # Output: "World Hello"
# Example 2: Explicit word-based reversal
sentence = "Data science is fun"
reversed_words = reverse_text(sentence, mode="word")
print(reversed_words) # Output: "fun is science Data"
# Example 3: Character-based reversal
char_text = "Hello World"
reversed_chars = reverse_text(char_text, mode="char")
print(reversed_chars) # Output: "dlroW olleH"
# Example 4: Preserves spacing between words in word mode
messy_spacing = "Hello World again"
print(reverse_text(messy_spacing))
# Output: "again World Hello"
# Example 5: Real-world use case – reversing text for simple transformations
messages = [
"Machine learning is powerful",
"Python makes data analysis easier",
"Reproducibility matters"
]
reversed_messages = [reverse_text(m, mode="word") for m in messages]
print(reversed_messages)
# Output:
# ['powerful is learning Machine',
# 'easier analysis data makes Python',
# 'matters Reproducibility']
Development Setup
To set up the development environment locally using conda:
- Clone the repository:
git clone https://github.com/UBC-MDS/DSCI_524_group34_textutils.git
cd DSCI_524_group34_textutils
- Create and activate the conda environment:
conda env create -f environment.yml
conda activate textutils
- Install the package in editable mode:
pip install -e .
Running Tests
To run the full test suite locally:
pytest
Documentation
Package documentation is generated using quartodoc and deployed automatically to GitHub Pages via GitHub Actions.
To build the documentation locally:
quarto render docs
The deployed documentation can be found at: https://ubc-mds.github.io/DSCI_524_group34_textutils/
Relationship to the Python Ecosystem
Python has several powerful text-processing libraries.
While these libraries provide extensive functionality, they can be unnecessarily complex for simple text manipulation tasks. textutils is intended to complement existing tools by offering a minimal, lightweight alternative for common text operations that do not require full NLP pipelines.
Continuous Integration and Deployment
This project uses GitHub Actions for:
- Continuous integration (running tests and style checks on pushes and pull requests)
- Continuous deployment to TestPyPI on pushes to the main branch
Contributing
Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.
Copyright
- Copyright © 2026 DSCI_524_group34.
- Free software distributed under the MIT License.