Contributing
Contributions of all kinds are welcome here, and they are greatly appreciated! Every little bit helps, and credit will always be given.
Example Contributions
You can contribute in many ways, for example:
Report Bugs
Report bugs at https://github.com/UBC-MDS/DSCI_524_G26_Data_Validation/issues.
If you are reporting a bug, please follow the template guidelines. The more detailed your report, the more easily and quickly we can help you.
Fix Bugs
Look through the GitHub issues for bugs. Anything labelled with bug and help wanted is open to whoever wants to implement it. When you decide to work on such an issue, please assign yourself to it and add a comment saying that you are working on it. If you see another issue without the help wanted label, just post a comment; the maintainers are usually happy for any support they can get.
Implement Features
Look through the GitHub issues for features. Anything labelled with enhancement and help wanted is open to whoever wants to implement it. As with bug fixes, please assign yourself to the issue and add a comment saying that you are working on it. If another enhancement catches your fancy but doesn't have the help wanted label, just post a comment; the maintainers are usually happy for any support they can get.
Write Documentation
pyos_data_validation could always use more documentation, whether as part of the official documentation, in docstrings, or even on the web in blog posts, articles, and such. Just open an issue to let us know what you will be working on so that we can provide you with guidance.
Submit Feedback
The best way to send feedback is to file an issue at https://github.com/UBC-MDS/DSCI_524_G26_Data_Validation/issues. If your feedback fits the format of one of the issue templates, please use that. Remember that this is a volunteer-driven project and everybody has limited time.
Get Started!
Follow these steps to set up the project locally and start contributing.
1. Fork the repository on GitHub: https://github.com/UBC-MDS/DSCI_524_G26_Data_Validation

2. Clone your fork locally:

       git clone git@github.com:your_github_username/DSCI_524_G26_Data_Validation.git
       cd DSCI_524_G26_Data_Validation

3. Create and activate the Conda environment using the provided environment.yml file (this environment already includes Hatch):

       conda env create -f environment.yml
       conda activate pyos_data_validation

4. Create a new branch from the default branch (main). Use fix/ or feat/ as a prefix for your branch name:

       git checkout main
       git pull
       git checkout -b fix/short-description

5. Make your changes locally. When finished, run the test suite:

       hatch run test:run

6. Commit your changes and push your branch to GitHub. Please use semantic commit messages:

       git add .
       git commit -m "fix: short description of change"
       git push -u origin fix/short-description

7. Open a pull request against the main branch using the link shown after pushing.
Pull Request Guidelines
Before you submit a pull request, check that it meets these guidelines:
- The pull request should include tests.
- If the pull request adds functionality, the docs should be updated. Put your new functionality into a function with a docstring.
- Your pull request will automatically be checked by the full test suite. All checks must pass before it can be considered for merging.
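As a rough illustration of the first two guidelines, a new feature might arrive as a documented function together with a matching test. The function name `count_missing` and its behavior are hypothetical and not part of the pyos_data_validation API, and the NumPy-style docstring layout is an assumption about the project's docstring convention.

```python
# Hypothetical example: `count_missing` is NOT part of pyos_data_validation.
def count_missing(values):
    """Count the missing entries in a sequence.

    Parameters
    ----------
    values : list
        Values to inspect. ``None`` is treated as missing.

    Returns
    -------
    int
        Number of entries that are ``None``.

    Examples
    --------
    >>> count_missing([1, None, 3, None])
    2
    """
    return sum(1 for value in values if value is None)


# A matching pytest-style test; pytest collects functions named `test_*`.
def test_count_missing():
    assert count_missing([1, None, 3, None]) == 2
    assert count_missing([]) == 0
```

In a real contribution the test would normally live in its own file under the project's test directory so that the test suite picks it up.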
Development Tools, Infrastructure, and Practices
This project applies modern Python development workflows and collaborative practices learned in DSCI 524, with a strong emphasis on reproducibility, automation, and code quality.
Development Tools
- Hatch is used for environment management, testing, and task execution. This ensures consistent developer environments and simplifies common workflows such as running tests and checks.
- Ruff is used for formatting and linting to enforce PEP 8–compliant, readable code and to provide fast feedback during development.
- Pytest is used for automated testing to validate correctness and prevent regressions as the codebase evolves.
- Quartodoc + Quarto are used to generate API documentation directly from docstrings, ensuring documentation stays closely aligned with the code.
GitHub Infrastructure
- GitHub Issues are used to track bugs, feature requests, and documentation improvements, with labels (bug, enhancement, help wanted) to organize work and encourage contributions.
- Pull Requests are the primary mechanism for code review, discussion, and integration. All changes are reviewed before merging.
- GitHub Actions (CI) automatically run tests, formatting checks, and build steps on every pull request to main, ensuring consistent quality standards and preventing broken code from being merged.
- Branch-based development is used, with feature and fix branches (feat/*, fix/*) to keep the main branch stable.
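The branch naming convention can be checked mechanically. The following is a minimal sketch, assuming the only accepted patterns are the two named above; the helper `follows_branch_convention` is hypothetical and not part of the project's tooling.

```python
from fnmatch import fnmatchcase

# Glob patterns taken from the branch convention described above.
BRANCH_PATTERNS = ("feat/*", "fix/*")


def follows_branch_convention(branch_name):
    """Return True if the branch name matches feat/* or fix/*."""
    # fnmatchcase compares case-sensitively on every platform.
    return any(fnmatchcase(branch_name, pattern) for pattern in BRANCH_PATTERNS)


print(follows_branch_convention("fix/short-description"))  # True
print(follows_branch_convention("my-branch"))              # False
```

Note that `*` also matches an empty string, so a bare `fix/` would pass this check; a stricter pattern such as `fix/?*` would require at least one character after the prefix.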
Organizational and Collaboration Practices
- Semantic commit messages (Conventional Commits) improve readability of the project history and support changelog generation.
- Consistent docstring standards ensure functions are easy to understand and maintain, especially for new contributors.
- Clear contribution guidelines lower the barrier to entry for contributors and help standardize collaboration across the team.
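To make the semantic commit format concrete, here is a minimal sketch that checks the first line of a commit message against the general Conventional Commits shape, `type(scope)!: description`. The list of allowed types and the helper name `is_semantic_commit` are assumptions for illustration; the project may accept a different set of types.

```python
import re

# Assumed subset of Conventional Commits types; the project may allow others.
ALLOWED_TYPES = ("feat", "fix", "docs", "test", "refactor", "chore")

# A type, an optional "(scope)", an optional "!", then ": " and a description.
HEADER_PATTERN = re.compile(
    r"^(?P<type>[a-z]+)(\((?P<scope>[^()]+)\))?(?P<breaking>!)?: (?P<description>\S.*)$"
)


def is_semantic_commit(header):
    """Return True if the first line of a commit message looks semantic."""
    match = HEADER_PATTERN.match(header)
    return match is not None and match.group("type") in ALLOWED_TYPES


print(is_semantic_commit("fix: short description of change"))  # True
print(is_semantic_commit("Fixed a bug"))                       # False
```

A check like this could run in a commit hook or in CI, though whether the project enforces the format automatically is not stated here.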
Scaling Considerations
If this project (or a similar one) were to scale to a larger user base or contributor community, the following tools and practices would be adopted or expanded:
- Stricter CI gates, such as required test coverage thresholds and branch protection rules, to maintain code quality at scale.
- Dependency monitoring tools (e.g., Dependabot) to keep dependencies secure and up to date.
- Pre-commit hooks to catch formatting, linting, and documentation issues earlier in the development cycle.
- Expanded documentation and examples, including tutorials and usage guides, to support a broader audience.
- Refined issue and PR templates, ensuring high-quality reports and consistent reviews as contribution volume grows.
These tools and practices help ensure that the project remains maintainable, reliable, and welcoming as it scales in complexity and community size.