Vector-based Sentiment Analysis of Movie Reviews
We investigate sentence sentiment using the Pang and Lee dataset as annotated by Socher et al. [1]. Sentiment analysis research focuses on understanding the positive or negative tone of a sentence based on its syntax, structure, and content. Previous research used a tree-based model to label sentence sentiment on a 5-point scale. Our project takes a different approach: abstracting the sentence as a vector and applying vector classification schemes. We explore two components. First, we analyze different sentence representations, such as bag of words, word sentiment location, and negation, and abstract them into a set of features. Second, we classify sentence sentiment using this set of features and compare the effectiveness of different models. While sentiment polarity was analyzed in a previous year’s project, we explore 5 degrees of sentiment labeling (Figure 1). We chose to investigate the sentiment of movie reviews, which can be compared to numeric movie ratings. We looked at a variety of models and feature types in an attempt to capture the context important for accurate sentiment decoding of the phrases of a sentence. While a tree-based feature set by Socher et al. [1] exists, we also wanted to explore how linear feature vectors would fare for sentence sentiment classification.
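As a rough illustration of what abstracting a sentence as a vector can look like, the sketch below builds a bag-of-words count vector with simple negation marking. It is only a minimal example under stated assumptions, not the feature pipeline used in the project: the negator list, the `NOT_` prefix convention, and the function names `mark_negation`, `build_vocab`, and `vectorize` are all hypothetical choices made for this example.

```python
from collections import Counter

# Assumed, illustrative word lists; the project's actual lists are not given.
NEGATORS = {"not", "no", "never", "n't"}
CLAUSE_END = {".", ",", ";", "!", "?"}


def tokenize(sentence):
    """Lowercase and split on whitespace (assumes pre-tokenized text)."""
    return sentence.lower().split()


def mark_negation(tokens):
    """Prefix every token after a negator with NOT_ until clause punctuation."""
    out = []
    negated = False
    for tok in tokens:
        if tok in CLAUSE_END:
            negated = False
            out.append(tok)
        elif tok in NEGATORS:
            negated = True
            out.append(tok)
        else:
            out.append("NOT_" + tok if negated else tok)
    return out


def build_vocab(sentences):
    """Map each distinct (negation-marked) token to a vector index."""
    vocab = {}
    for sentence in sentences:
        for tok in mark_negation(tokenize(sentence)):
            vocab.setdefault(tok, len(vocab))
    return vocab


def vectorize(sentence, vocab):
    """Return a bag-of-words count vector; unknown tokens are ignored."""
    counts = Counter(mark_negation(tokenize(sentence)))
    vec = [0] * len(vocab)
    for tok, count in counts.items():
        if tok in vocab:
            vec[vocab[tok]] = count
    return vec


if __name__ == "__main__":
    train = ["the movie was not good", "the movie was good"]
    vocab = build_vocab(train)
    for sentence in train:
        print(sentence, "->", vectorize(sentence, vocab))
```

With this encoding, "good" and "NOT_good" occupy separate dimensions, so a linear classifier trained on such vectors can assign them different weights. Any standard vector classifier could then be trained on these vectors against the 5-level sentiment labels.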