This is a data analysis project for DSCI 522 (Data Science Workflows), a course in the Master of Data Science program at the University of British Columbia.
In this project, we set out to classify individuals into two age groups—Seniors (65 years and older) and Adults (under 65 years)—using data from the NHANES 2013-2014 survey. The dataset, consisting of 2,278 entries, was carefully preprocessed to ensure it was clean, well-structured, and balanced for analysis. The dataset can be found here.
We developed a logistic regression model that achieved moderate success, with an accuracy of around 73% and a macro average F1 score of 61% (final metrics pending). While the model performed well in classifying many Seniors and Adults, it left room for improvement, particularly in handling edge cases. Moving forward, we plan to refine the model by engineering new features, experimenting with classification thresholds, and exploring alternative algorithms like K-Nearest Neighbors, SVC, and Naive Bayes. This work provides a solid foundation for using machine learning to support smarter healthcare planning and resource allocation.
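The gap between accuracy and macro-average F1 is easiest to see on a small example. The sketch below uses made-up labels (not the NHANES data and not the project's model output) and plain Python to compute both metrics; the function names are illustrative assumptions rather than code from this repository.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_for_class(y_true, y_pred, label):
    """F1 score treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, lab) for lab in labels) / len(labels)


# Toy labels, invented for illustration: 8 Adults and 2 Seniors.
y_true = ["Adult"] * 8 + ["Senior"] * 2
y_pred = ["Adult"] * 7 + ["Senior"] + ["Adult", "Senior"]

print(f"accuracy: {accuracy(y_true, y_pred):.4f}")  # accuracy: 0.8000
print(f"macro F1: {macro_f1(y_true, y_pred):.4f}")  # macro F1: 0.6875
```

Because macro averaging weights each class equally, weaker performance on one class pulls the macro F1 below the overall accuracy, even when most predictions are correct.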
The final report can be found here.
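If you are using Windows or Mac, make sure Docker Desktop is running.
Clone this GitHub repository:
git clone https://github.com/UBC-MDS/DSCI522-2425-group31_age-group-prediction.git
From the root of the cloned project, launch the container:
docker compose up
In the terminal output, look for a URL that starts with http://127.0.0.1:8888/lab?token= (for an example, see the highlighted text in the terminal below). Copy and paste that URL into your browser.
To run the analysis, run the following commands:
make clean
make all
To view the report, open the reports directory in the root folder and then select age_prediction_report.pdf.
To shut down the container and clean up its resources, press Ctrl + C in the terminal where you launched the container, and then type docker compose rm.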
conda (version 23.9.0 or higher)
conda-lock (version 2.5.7 or higher)
Add the dependency to the environment.yml file on a new branch.
Run conda-lock -k explicit --file environment.yml -p linux-64 to update the conda-linux-64.lock file.
Re-build the Docker image locally to ensure it builds and runs properly.
Push the changes to GitHub. A new Docker image will be built and pushed to Docker Hub automatically. It will be tagged with the SHA for the commit that changed the file.
Update the docker-compose.yml file on your branch to use the new container image (make sure to update the tag specifically).
Send a pull request to merge the changes into the main branch.
Use the same docker compose up command as described in the Running the analysis section above to launch Jupyter Lab.
Tests are run using the pytest command in the root of the project. More details about the test suite can be found in the test directory.
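As an illustration of the kind of test that pytest collects, here is a minimal sketch. The helper assign_age_group is hypothetical and is not taken from this repository's test suite; it only encodes the Senior (65 years and older) and Adult (under 65 years) rule stated in the project summary.

```python
def assign_age_group(age):
    """Hypothetical helper: label an age in years as 'Senior' or 'Adult'."""
    if age < 0:
        raise ValueError("age must be non-negative")
    return "Senior" if age >= 65 else "Adult"


# pytest discovers functions whose names start with `test_`.
def test_boundary_age_is_senior():
    assert assign_age_group(65) == "Senior"


def test_under_65_is_adult():
    assert assign_age_group(64) == "Adult"


def test_negative_age_is_rejected():
    # Written without pytest.raises so the sketch has no imports.
    try:
        assign_age_group(-1)
    except ValueError:
        return
    raise AssertionError("expected ValueError for a negative age")
```

Saved in a file whose name starts with test_, functions like these would be picked up when pytest is run from the project root.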
The analysis report contained herein is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License. See the license file for more information. If re-using or re-mixing, please provide attribution and link to this webpage. The software code contained within this repository is licensed under the MIT license. See the license file for more information.