Welcome to Text Processor

text_processor is a Python package for processing text files. It lets users extract insights from raw text data and clean it without manually reading the file or writing the result to a new file, which is particularly useful when only a simple operation on the file is needed.

The package consists of the following functions:

  • text_lower:
    • This function converts all characters in a text file to lower case and writes the result to a specified file.
  • text_find:
    • This function finds the index of the first instance of a specified string in a text file. If the string does not exist in the file, it returns -1.
  • text_remove:
    • This function removes all instances of a specified string from a text file, then writes the result to a new file.
  • text_replace:
    • This function replaces all instances of a specified string in a text file with another string, then writes the result to a new file.
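
The behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not the package's implementation: the helper names `lower_file` and `find_in_file`, their parameters, and the UTF-8 encoding are assumptions made for this example, and the actual text_processor signatures may differ.

```python
import tempfile
from pathlib import Path


def lower_file(input_path, output_path):
    """Hypothetical helper: write a lower-cased copy of input_path to output_path."""
    text = Path(input_path).read_text(encoding="utf-8")
    Path(output_path).write_text(text.lower(), encoding="utf-8")


def find_in_file(input_path, target):
    """Hypothetical helper: index of the first occurrence of target, or -1."""
    return Path(input_path).read_text(encoding="utf-8").find(target)


# Demonstrate both helpers on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "input.txt"
    lowered = Path(tmp) / "lowered.txt"
    source.write_text("Hello World, hello file", encoding="utf-8")

    lower_file(source, lowered)
    lowered_text = lowered.read_text(encoding="utf-8")
    first_hello = find_in_file(source, "hello")  # case-sensitive search
    missing = find_in_file(source, "absent")     # string not in the file

print(lowered_text)  # hello world, hello file
print(first_hello)   # 13
print(missing)       # -1
```

The search is case-sensitive in this sketch, so the capitalised "Hello" at index 0 is skipped and the first match is at index 13.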

Installation

To install the package from TestPyPI, use the following command:

pip install --no-cache-dir --index-url https://test.pypi.org/simple/ text-processor

Documentation

Our online documentation can be found here.

Get started

Cloning the Repository

Clone this GitHub repository and navigate to the project folder using the following commands:

git clone https://github.com/UBC-MDS/dsci_524_group28_text_processor.git
cd dsci_524_group28_text_processor

Setting Up the Development Environment

Create and activate the development environment using the environment.yml file:

conda env create -f environment.yml
conda activate 524

Locally Installing the Package

To install the package locally for testing, use the following command while in the root directory:

pip install -e .

Running the Tests

Run the tests using the following command while in the root directory:

pytest
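
For orientation, a pytest test for a file-based function might look like the sketch below. It relies on pytest's built-in tmp_path fixture so that no real files are touched. The function `find_in_file` is a stand-in defined inside the example; it is not imported from text_processor, whose actual signatures are not shown here.

```python
from pathlib import Path


def find_in_file(input_path, target):
    """Stand-in for a file-based search: first index of target, or -1."""
    return Path(input_path).read_text(encoding="utf-8").find(target)


def test_find_returns_first_index(tmp_path):
    # tmp_path is a per-test temporary directory supplied by pytest.
    sample = tmp_path / "sample.txt"
    sample.write_text("abc abc", encoding="utf-8")
    assert find_in_file(sample, "bc") == 1


def test_find_returns_minus_one_when_absent(tmp_path):
    sample = tmp_path / "sample.txt"
    sample.write_text("abc abc", encoding="utf-8")
    assert find_in_file(sample, "xyz") == -1
```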

Building the Documentation

After installing text_processor, the documentation can be built with quartodoc using the following command:

quartodoc build

Deploying the Documentation

The documentation is automatically rendered and deployed once updates are pushed to the deployment branch.

text_processor in the Python Ecosystem

The functions in text_processor are analogous to Python's built-in string methods such as str.lower(), str.find(), and str.replace(). They differ in that they are built to handle text files specifically: they read from and write to files rather than operating on text already held in memory as strings.
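
To make the contrast concrete, the sketch below applies str.replace() to an in-memory string and then wraps the same call in a read-modify-write round trip on a file. The helper name `replace_in_file` and its parameters are hypothetical, chosen for this illustration; they are not the package's API.

```python
import tempfile
from pathlib import Path

# In memory: the built-in method works on a string you already hold.
in_memory = "red apple, red pear".replace("red", "green")


def replace_in_file(input_path, output_path, old, new=""):
    """Hypothetical helper: replace every occurrence of old with new.

    Leaving new as the empty string removes old entirely.
    """
    text = Path(input_path).read_text(encoding="utf-8")
    Path(output_path).write_text(text.replace(old, new), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "fruit.txt"
    source.write_text("red apple, red pear", encoding="utf-8")

    # Same operation, but reading from and writing to files.
    replace_in_file(source, Path(tmp) / "replaced.txt", "red", "green")
    replace_in_file(source, Path(tmp) / "removed.txt", "red ")

    replaced_text = (Path(tmp) / "replaced.txt").read_text(encoding="utf-8")
    removed_text = (Path(tmp) / "removed.txt").read_text(encoding="utf-8")

print(in_memory)      # green apple, green pear
print(replaced_text)  # green apple, green pear
print(removed_text)   # apple, pear
```

In this sketch, removal is treated as replacement with an empty string, which is one straightforward way to relate the remove and replace operations.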

Contributors

  • Aitong Wu
  • Christine Chow
  • Julia Zhang
  • Vy Phan

Citation

  • Please see CITATION.cff.