INTEGER PROGRAMMING MODELS TO DETERMINE
OPTIMAL RUSH HOUR BIKE ALLOCATION:
AN EMPIRICAL CASE STUDY ON BOSTON BLUEBIKES

SPRING 2021

This project formulates integer-programming models that determine the optimal allocation of bikes across stations in a bike-sharing system during rush hour intervals by minimizing total instances of unmet customer demand for available bikes and empty drop-off slots. Applied to Bluebikes ridership data, these integer programs tell operators each station’s optimal bike-fill percentage at the beginning of the morning and evening rush hours; the resulting percentages are heavily correlated with the geographic distribution of Boston’s residential and business districts. Additionally, sensitivity analysis on binding station capacity parameters can be used to determine optimal system expansion strategies.
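To make the objective concrete, here is a minimal sketch of the quantity being minimized, on a toy instance. It is not the report's formulation: it swaps the integer program for brute-force enumeration, treats each station's rush-hour events as a fixed sequence, and all names (`unmet_demand`, `best_allocation`, the event encoding) are hypothetical.

```python
from itertools import product

def unmet_demand(start_bikes, capacity, events):
    """Count failed pick-ups and drop-offs at one station.

    events is a sequence of -1 (pick-up request) and +1 (drop-off request);
    this encoding is an assumption made for the sketch.
    """
    bikes, unmet = start_bikes, 0
    for event in events:
        if event == -1:
            if bikes > 0:
                bikes -= 1          # a bike was available
            else:
                unmet += 1          # empty station: pick-up fails
        else:
            if bikes < capacity:
                bikes += 1          # a dock was free
            else:
                unmet += 1          # full station: drop-off fails
    return unmet

def best_allocation(capacities, events_by_station, total_bikes):
    """Enumerate every starting allocation that uses exactly total_bikes.

    Returns (total_unmet, allocation), or None if no allocation is feasible.
    Enumeration only scales to toy instances; a real model would use a solver.
    """
    best = None
    for alloc in product(*(range(c + 1) for c in capacities)):
        if sum(alloc) != total_bikes:
            continue
        cost = sum(
            unmet_demand(b, c, ev)
            for b, c, ev in zip(alloc, capacities, events_by_station)
        )
        if best is None or cost < best[0]:
            best = (cost, alloc)
    return best
```

For a residential station that sees only pick-ups and a business station that sees only drop-offs, `best_allocation([3, 3], [[-1, -1, -1], [1, 1, 1]], 3)` starts the residential station full and the business station empty, which mirrors the morning pattern described above.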

Report


MUSIC COMPOSITION AND CLASSIFICATION
WITH N-GRAM BASED MARKOV AND
NAIVE BAYES MODELS

SPRING 2021
In Collaboration with Jessica Tian, Hannah Phan, and Nina Uzoigwe

This project aims to compose musical pieces using Markov chain models trained on a piece of the same genre, and to classify the genre of a given piece using an n-gram based Naive Bayes model. Our results suggest that higher-order Markov models generate compositions that sound stylistically similar to the training piece and reflect its musical motifs. Additionally, using a Naive Bayes model trained on bigrams derived from a corpus of Baroque and Romantic pieces, we achieved a test accuracy of 93.33% on identifying Baroque pieces and 92.31% on identifying Romantic pieces.
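The composition side can be illustrated with a minimal sketch of an order-k Markov chain over note names. The note representation, function names, and seeding scheme are assumptions for illustration; the report's actual encoding of pitch and rhythm may differ.

```python
import random
from collections import defaultdict

def train_markov(notes, order=2):
    """Map each run of `order` consecutive notes to the notes that followed it.

    Assumes order >= 1. Repeated followers are kept, so sampling from the
    list reproduces the transition frequencies of the training piece.
    """
    table = defaultdict(list)
    for i in range(len(notes) - order):
        state = tuple(notes[i:i + order])
        table[state].append(notes[i + order])
    return table

def generate(table, seed_state, length, rng=None):
    """Extend seed_state note by note until `length` notes are produced.

    Stops early if the current state never appeared in training.
    """
    rng = rng or random.Random()
    order = len(seed_state)
    out = list(seed_state)
    while len(out) < length:
        followers = table.get(tuple(out[-order:]))
        if not followers:
            break
        out.append(rng.choice(followers))
    return out
```

A higher `order` conditions each note on a longer context, which is one plausible reason higher-order models reproduce more of the training piece's motifs.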

Report


EXTENSION OF THE SEIR EPIDEMIC MODEL
THROUGH AN AGE-STRATIFIED STOCHASTIC INDIVIDUAL
AGENT MODEL FOR COVID-19

SPRING 2021
In Collaboration with Jessica Tian, Hannah Phan, and Nina Uzoigwe

This project aims to extend the SEIR model for COVID-19 through a stochastic individual agent model that incorporates undetected (U) and fatal (F) populations to better capture the nuances of the virus’s effect on different communities. The model is also stratified by age to account for the role age plays in population movement, immunity, symptom development, and prognosis, leading to differing exposure, infection, and mortality rates across age groups. Our results suggest that elderly populations are significantly more likely than any younger age group to develop symptoms after exposure to COVID-19 and are less likely to recover after infection (that is, their infections more often end in fatality than in recovery).

Analysis of our age-stratified stochastic individual agent SEIURF model allows us to understand how the epidemic affects each age group individually, rather than the aggregated population as a whole, while accounting for varying physiological responses to COVID-19 and differing social behaviors. These results can help inform the policy decisions needed to control the spread of the COVID-19 epidemic; for instance, the larger number of undetected/asymptomatic cases among children and young adults suggests that policies such as dedicated shopping hours for senior citizens at household essentials stores are necessary to limit the number of infections and fatalities in the elderly population. Additionally, these results can help inform vaccination strategies. Our age-stratified SEIURF individual agent model can be used to project how system dynamics will change if the vaccine is distributed to certain age groups first, as we can simply change the parameters associated with the vaccinated group to reflect their newly immune status.
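The idea of changing a group's parameters to model vaccination can be sketched with a single-agent daily transition function. Everything here is an assumption made for illustration: the transition structure (S to E, E to I or U, I to R or F, U to R), the parameter names, and the way exposure scales with the infectious fraction are not taken from the report.

```python
import random

def step(state, p, infectious_fraction, rng):
    """Advance one agent by one day under hypothetical SEIURF transitions.

    p holds that agent's age-group probabilities:
      beta        exposure probability per unit infectious fraction
      onset       probability an exposed agent becomes infectious today
      symptomatic probability that onset is detected (I) rather than undetected (U)
      recover     daily recovery probability
      die         daily fatality probability (detected cases only, by assumption)
    """
    r = rng.random()
    if state == "S":
        return "E" if r < p["beta"] * infectious_fraction else "S"
    if state == "E":
        if r < p["onset"]:
            return "I" if rng.random() < p["symptomatic"] else "U"
        return "E"
    if state == "I":
        if r < p["die"]:
            return "F"
        if r < p["die"] + p["recover"]:
            return "R"
        return "I"
    if state == "U":
        return "R" if r < p["recover"] else "U"
    return state  # R and F are absorbing

def vaccinate(p):
    """Return a copy of a group's parameters with exposure switched off."""
    return {**p, "beta": 0.0}
```

Under these assumptions, vaccinating an age group amounts to replacing its parameter dictionary with `vaccinate(p)` before the simulation loop, so susceptible agents in that group never leave S.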

Report


PREDICTING MARKET VOLATILITY
WITH TRUMP TWEETS

FALL 2019
In Collaboration with Ryan Wood and Alice Zhang

We live in a digital world that is ever-changing, from the sources of our information to the credibility we place in various online media and the ease and patterns of disseminating information. Online text is incredibly powerful. Social media is ubiquitous, and Twitter alone produces, on average, 500 million tweets a day. Individual tweets can have a profound effect. Globally, the political landscape is also becoming increasingly polarized, and the United States is experiencing a unique period in its history. Tweets, especially political ones, are therefore influential and ripe for analysis. Over the past four years, President Donald Trump has often tweeted about both market-related and unrelated topics. In this project, we sought to discover a relationship between intraday CBOE Volatility Index (VIX) and S&P 500 data and President Trump's tweets from January 2016 to November 2019.

Code Website


AUTOMATING INDUSTRY CLASSIFICATION
FOR THE ECONOMIC CENSUS

SUMMER 2019

Each business establishment in the United States is assigned a six-digit industry classification code, per the North American Industry Classification System (NAICS), that describes its primary business activity. Industry classification codes are crucial for the United States Census Bureau to describe domestic economic activity and its dynamics. The Census Bureau spends considerable time and resources identifying and classifying the industry of establishments. The burden of collection falls heavily both on business establishments, which are required to fill out lengthy and tedious surveys, and on the Census Bureau, which incurs high mail-out and analyst costs. To ameliorate these issues, I implemented a segment of a machine learning pipeline that uses public data about American business establishments from the Google Places API to generate their six-digit industry classification codes. This technology has projected savings of $1.2 million for the Census Bureau.

The presentation my team delivered to U.S. Census Bureau upper management staff, including Deputy Director and COO Ron Jarmin, and to an audience of 300 consisting of leaders in the civic technology space:

Presentation


MINIML INTERPRETER

SPRING 2019

MiniML is a Turing-complete language based on OCaml, implemented as a metacircular interpreter, that supports basic operations, recursion, and atomic types. The interpreter supports three different models of evaluation: the substitution model, the dynamically scoped environment model, and the lexically scoped environment model.
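The difference between the two environment models shows up in a program such as `let x = 1 in let f = fun y -> x + y in let x = 2 in f 3`. The following is a minimal sketch of a toy evaluator, written in Python rather than OCaml and using a hypothetical tuple-based syntax tree; it is not the MiniML implementation and omits recursion and the substitution model.

```python
def evaluate(expr, env, lexical=True):
    """Evaluate a tiny expression tree under lexical or dynamic scoping."""
    kind = expr[0]
    if kind == "num":
        return expr[1]
    if kind == "var":
        return env[expr[1]]
    if kind == "add":
        return evaluate(expr[1], env, lexical) + evaluate(expr[2], env, lexical)
    if kind == "let":
        _, name, bound, body = expr
        value = evaluate(bound, env, lexical)
        return evaluate(body, {**env, name: value}, lexical)
    if kind == "fun":
        _, param, body = expr
        # Lexical scoping saves the defining environment in a closure;
        # dynamic scoping stores nothing and looks names up at call time.
        return ("closure", param, body, env if lexical else None)
    if kind == "app":
        _, fn_expr, arg_expr = expr
        _, param, body, saved_env = evaluate(fn_expr, env, lexical)
        arg = evaluate(arg_expr, env, lexical)
        base = saved_env if lexical else env
        return evaluate(body, {**base, param: arg}, lexical)
    raise ValueError(f"unknown expression kind: {kind!r}")

# let x = 1 in let f = fun y -> x + y in let x = 2 in f 3
PROGRAM = (
    "let", "x", ("num", 1),
    ("let", "f", ("fun", "y", ("add", ("var", "x"), ("var", "y"))),
     ("let", "x", ("num", 2),
      ("app", ("var", "f"), ("num", 3)))),
)
```

Calling `evaluate(PROGRAM, {}, lexical=True)` resolves `x` where `f` was defined, while `lexical=False` resolves it where `f` is called, so the two models give different answers for the same program.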

Code Writeup


PACK & MATCH

WINTER 2018

Pack&Match is a food management web application that connects Harvard University Dining Services (HUDS) to local food shelters. In 2017, HUDS had over 30,700 meals’ worth of food left over. While HUDS does currently donate these excess meals, its current methods do not allow the shelters to request the specific types of food they need or HUDS to indicate the specific types of food it has left over each day. The Pack&Match web application allows shelters to request specific food categories as late as the day before they require the supplies. An optimization algorithm then ranks the shelters, taking into account each shelter’s specific requests and its distance from HUDS. The top three shelters are then “matched” by the web application to HUDS, which donates that day’s excess meals accordingly.
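The ranking step can be sketched as follows. The scoring rule here (matched categories minus a distance penalty), the field names, and the default weight are all hypothetical stand-ins; the application's actual optimization algorithm is not described above.

```python
def rank_shelters(shelters, available, distance_weight=0.1, top_n=3):
    """Return the top_n shelters by a simple match-minus-distance score.

    shelters: list of dicts with "name", "requests" (food categories),
              and "distance_km" (distance from HUDS); all hypothetical fields.
    available: food categories HUDS has left over that day.
    """
    available = set(available)

    def score(shelter):
        matched = len(set(shelter["requests"]) & available)
        return matched - distance_weight * shelter["distance_km"]

    return sorted(shelters, key=score, reverse=True)[:top_n]
```

A linear penalty keeps the trade-off easy to tune: raising `distance_weight` favors nearby shelters, while lowering it favors shelters whose requests best match the day's leftovers.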

Code Website