-
Analysis of Fine-tuning RoBERTa for Content Moderation on Reddit: Implemented an automatic comment-flagging model, tailored to the rules of a given subreddit, by fine-tuning RoBERTa. See Report
-
COVID-19: War of Twitter Narratives: Analyzed US political discourse on Twitter about the ongoing COVID-19 pandemic using binary classification. Performed data exploration using LDA. Achieved an F1 score of 0.74 using logistic regression on TF-IDF features. See Report
-
Flight Delay Prediction: Designed a model to predict delays for flights departing from three major US airports, based on historical flight delay data, past weather data, and US bank holiday data. Achieved 0.91 accuracy using ensemble methods.
-
Book Recommendation Engine: Implemented a book recommendation engine using Alternating Least Squares (ALS) in Spark. Carried out data exploration using t-SNE. See Report