Organization Name

Management Information Systems Group, UBC Sauder School of Business

About Your Organization

We are faculty members at the UBC Sauder School of Business. Big Data and Data Science are having a profound impact on many fields, including Information Systems research. Our research interests include Business Analytics, which aims to gain business insights from internal and external data sources in order to make timely, data-driven decisions for competitive advantage in complex business environments.

Brief Description of the Problem/Question

The U.S. Securities and Exchange Commission (SEC) requires all publicly listed firms to report their status in detail via various filings, such as 10-K annual reports (which provide a comprehensive summary of a firm’s status, including current businesses/strategies and financial metrics; an example file from Apple can be found here). These filings constitute a rich, semi-structured big data source for understanding companies and industries. In this project, we seek to build a Big Data framework that leverages SEC filings to obtain industry intelligence. Specifically, we expect the MDS team (1) to construct a “social network” of companies (e.g., competition, acquisition, alliance, supply chain, etc.) based on the textual data from SEC filings (via Named Entity Recognition and other NLP techniques), and then (2) to build an interactive visualization that helps to interpret the network.
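To make step (1) concrete, the following is a minimal sketch of how company mentions in filing text could be turned into a weighted edge list. It is illustrative only: the function name `build_company_network`, the company names, and the filing snippets are all hypothetical, and simple dictionary matching stands in for a real Named Entity Recognition model. It also does not attempt to label the relationship type (competition, supply chain, etc.), which would require further NLP work.

```python
import re
from collections import Counter


def build_company_network(filings, known_companies):
    """Count how often each filer mentions each known company.

    filings: dict mapping filer name -> filing text
    known_companies: iterable of company names to look for
    Returns a Counter keyed by (filer, mentioned_company).
    """
    # Assumption: dictionary lookup is a simple stand-in for a real NER model.
    patterns = {
        name: re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for name in known_companies
    }
    edges = Counter()
    for filer, text in filings.items():
        for name, pattern in patterns.items():
            if name == filer:
                continue  # skip self-mentions
            hits = len(pattern.findall(text))
            if hits:
                edges[(filer, name)] += hits
    return edges


# Toy, made-up filing snippets for illustration only.
toy_filings = {
    "Alpha Corp": "We compete with Beta Inc and source parts from Gamma Ltd. "
                  "Beta Inc is larger.",
    "Beta Inc": "Our main competitor is Alpha Corp.",
}
network = build_company_network(
    toy_filings, ["Alpha Corp", "Beta Inc", "Gamma Ltd"]
)
```

Each key of `network` is a directed edge from the filing company to a company it mentions, and the value is the mention count, which could serve as an edge weight in the visualization of step (2).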

Available Data Sources

We have implemented an information system that collects all filings from SEC EDGAR, feeding structured metadata into a MySQL database and text data into a MongoDB database. As of November 2017, we have a total of 11.5M filings of 735 types from 526K entities (including firms and investors) headquartered in 24K cities. In addition, we have access to Wharton Research Data Services (WRDS) for firm performance data. In terms of computational resources, our research group has a dedicated physical server (20 cores, 256GB RAM) co-located at the UBC Data Centre, as well as access to Compute Canada for further computational needs.

Data Product

As deliverables, we expect the MDS team to provide a project pipeline that constructs corporate social networks, the associated interactive visualization, and a report outlining the students' project process and findings.

Raw data provided to the MDS team cannot be made available to the public. Aggregate statistics and summaries of analyses can be made public with our approval. The data products delivered by the students can be used in academic publications by our group.

Potential Conflicts of Interest


Do you have space available for students to work on site?


Do you anticipate having data scientist job opening(s) after the project?