Becoming a data scientist: My year-long hiatus from medical school

In my second year of medical school, I decided to open an unfamiliar email titled “UBC Centennial Symposium on Health Informatics”. What was health informatics? I had no idea, but I intended to find out. The symposium featured the UK’s National Health Service (NHS) and showcased its use of large population datasets. I was impressed by how the NHS uses data to improve care. I went into medicine because I wanted to help people one-on-one; applying data science to healthcare seemed to offer a way to help millions of people at a time, not just one at a time.

When I first heard about UBC’s Master of Data Science (MDS) program, I saw it as an entry point into the world of data-driven healthcare. The MDS is a 10-month intensive program that builds expertise in statistics, computing and machine learning through work with multiple real-life datasets. It is topped off by a 2-month industry capstone project in which students apply everything they’ve learned. My capstone team worked with the health tech startup QxMD to build a recommendation system for medical research papers. Our recommender outperforms the current system and will be deployed to production for 500,000 users in July 2018.

But let’s back up a bit. I started this Master’s as someone who avoided math and statistics, who knew AI only as a buzzword (or something from a Hollywood movie), and who had nothing more than an introductory computer science course under my belt. I was partway through my clerkship at UBC medical school (in the hospital day and night, studying in between) and felt that it was now or never. I hoped to supplement my medical education with technical literacy so that, in the future, I could actively incorporate data science into my clinical practice.

Here are a few of the many lessons I’ve learned in the past year of MDS:

Understanding the data scientist’s tools… and their limitations
- A data scientist’s role is largely to obtain data, make it usable, decide how to use it, and finally evaluate how it was used. This seems simple enough, but on closer examination each step involves countless decisions, and each decision has its own implications. For instance, the way one handles missing data can significantly change the results of an analysis. The program taught me to cut through the “big data” hype and think critically about the choices being made and their consequences. It also gave me a sense of what is and isn’t possible with data science.
Coding
- While I can now code confidently in Python and R, the larger impact is that I’m no longer intimidated by the prospect of learning a new programming language. I learned how to learn online: finding resources, reading documentation, and acquiring just the knowledge I need to complete a task, whether that is making a personal blog/website, writing a daily email reminder script in JavaScript, or creating a web app with Django.
Thinking statistically
- In medicine, we are taught to think like Bayesians: our belief about a patient’s diagnosis is updated with each additional morsel of information. For example, let’s say a patient comes into the emergency room complaining of chest pain. Are they having a heart attack? If they’re a smoker, diabetic, and have a family history of heart disease (all risk factors for coronary artery disease), the doctor will be more convinced that the patient is indeed having a heart attack. I found the transition from the qualitative Bayesian thinking I’d learned in medical school to the actual math fascinating. A belief about a diagnosis was just a probability distribution (like a Gaussian curve) that was reshaped with each additional piece of information. Soon, other thoughts found their data science counterparts: pursuing a goal in my personal life became finding the global maximum of my “objective function”. In addition to thinking more statistically in my daily life, I have a newfound appreciation (and skepticism) for research. With this foundation, reading research papers, once a foreign exercise, has become approachable and interesting.
Speaking data
- I learned the language of data flow. Notions that previously meant nothing to me, like APIs, databases (SQL vs. NoSQL), data transfer formats (XML, JSON), and Docker, are now dense with meaning, experience, and opinions. My favourite analogy, from an a16z podcast, elegantly connects my two worlds by comparing data flow to blood flow through the body’s cardiovascular system. The beauty of speaking both “medical” and “data” is that I can explain complex concepts to different audiences in terms of their own domain knowledge. In my MDS capstone project at the health tech startup QxMD, I explained advanced machine learning concepts to a medical audience with medical analogies, and then explained medical concepts to a tech audience with tech analogies. Applying machine learning to healthcare requires both data literacy and medical literacy, and I feel confident that MDS has prepared me well.
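The chest-pain example under “Thinking statistically” can be made concrete with a few lines of Python. This is a minimal sketch, not a clinical tool: it assumes each risk factor can be summarized by a single likelihood ratio and that the findings are conditionally independent, and the prior and likelihood ratios are made-up numbers chosen only to show the mechanics. It uses the odds form of Bayes’ theorem, where posterior odds = prior odds × likelihood ratio.

```python
# Sequential Bayesian updating of a diagnostic belief, using the odds form
# of Bayes' theorem. All numbers are HYPOTHETICAL and for illustration only;
# they are not clinical estimates.

def prob_to_odds(p: float) -> float:
    """Convert a probability in (0, 1) to odds."""
    return p / (1 - p)


def odds_to_prob(odds: float) -> float:
    """Convert odds back to a probability."""
    return odds / (1 + odds)


def posterior_trace(prior: float, findings: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Update a prior probability one finding at a time.

    Each finding is (name, likelihood_ratio). Multiplying the odds by each
    likelihood ratio in turn assumes the findings are conditionally independent.
    Returns the belief after each step, starting with the prior.
    """
    odds = prob_to_odds(prior)
    trace = [("prior", prior)]
    for name, likelihood_ratio in findings:
        odds *= likelihood_ratio
        trace.append((name, odds_to_prob(odds)))
    return trace


# Hypothetical pre-test probability of a heart attack for a chest-pain patient.
PRIOR = 0.10
# Hypothetical likelihood ratios for each risk factor.
FINDINGS = [("smoker", 2.0), ("diabetic", 1.5), ("family history", 1.5)]

if __name__ == "__main__":
    for name, p in posterior_trace(PRIOR, FINDINGS):
        print(f"{name:>15}: {p:.3f}")
```

With these illustrative numbers, the belief climbs from 0.10 to about 0.18, 0.25 and finally 0.33 as each risk factor is added, which mirrors how a clinician’s suspicion grows. Real likelihood ratios would come from the clinical literature, and correlated findings would break the independence assumption built into the simple multiplication.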

So, what’s next? The medical field understands that being a good doctor requires lifelong learning. Medicine and data science are both vast, ever-expanding fields, and so is their intersection. MDS has given me the tools and critical thinking skills to take my next steps into that intersection: the rapidly evolving field of health informatics. Having finished my MDS degree, I’m excited to throw myself back into medicine. I envision myself as a data-literate doctor who can understand clinicians and data scientists alike.

Daniel Raff is a fourth-year medical student at the University of British Columbia. He graduated from the UBC MDS program in 2018 and hopes to become a general practitioner after finishing medical school.