Using Tweets for single stock price prediction
Social media, as the collective expression of individual opinions and emotions, has a profound, if sometimes subtle, relationship with social events. This is particularly true of public tweets and stock trading. Research has shown that people’s financial decisions are significantly driven by emotions [1], and these emotions, together with people’s opinions, are reflected in tweets in real time. By analyzing relevant tweets with suitable machine learning algorithms, one can therefore gauge the public’s sentiment and attitude towards a stock of interest, which could in turn help predict that stock’s next price move. Previous work has shown that tweets can indeed reflect stock price changes. Bollen et al. (2010) analyzed three months of randomly selected tweets and found, surprisingly, that their ‘Calmness’ score resembled some of the key features of Dow Jones Industrial Average (DJIA) price changes over the same period [2]. Other work focused on a single stock and used a particular person’s tweets to predict that stock’s price changes [3]. However, no published work has yet aimed to predict the price of an arbitrary single stock by applying machine learning to social media data such as tweets. To extend the scope of prediction beyond a market index or one particular stock, we use the keyword frequency (KF) of stock-relevant tweets to predict the price change of any single stock. Assuming that these keywords are not specific to any particular stock, we could in principle predict any stock’s price change in real time. Google has applied a similar methodology to flu trend prediction using its vast volume of search queries, and found that certain query keywords are highly correlated with the current level of flu activity.
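A minimal sketch of a keyword frequency (KF) feature vector for a batch of tweets. It assumes (our choice, not specified above) that tweets are lower-cased and split with a simple regular expression, and that each keyword’s count is normalised by the total number of tokens in the batch. The function names `tokenize` and `keyword_frequencies` and the example keywords are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case a tweet and split it into word-like tokens.
    Assumed tokenizer: letters, digits, '$' (for cashtags) and apostrophes."""
    return re.findall(r"[a-z0-9$']+", text.lower())


def keyword_frequencies(tweets, keywords):
    """Return one frequency per keyword across a batch of tweets,
    normalised by the total number of tokens (an assumed normalisation)."""
    counts = Counter()
    total = 0
    for tweet in tweets:
        tokens = tokenize(tweet)
        counts.update(tokens)
        total += len(tokens)
    if total == 0:
        # No tokens at all: every keyword frequency is zero.
        return [0.0] * len(keywords)
    return [counts[k] / total for k in keywords]


# Usage: a generic keyword list, not tied to any ticker.
batch = ["Stock is going up, buy now", "Sell before the crash"]
print(keyword_frequencies(batch, ["buy", "sell", "crash"]))
```

Because the keyword list is generic rather than tied to a ticker, the same vector can be computed for tweets about any stock, which is what makes the approach stock-agnostic.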
Google’s flu trend study was published in Nature [4]. In our work, we divided the selected tweet data and the stock price data from the same period into time slots of a fixed ‘unit time length’, and trained SVM and Naïve Bayes models on pairs consisting of the keyword frequencies from time slot i and the stock return from time slot i+1. Specifically, our work consists of four main steps:
1. Data collection and wrangling from Twitter and the NASDAQ official website.
2. Keyword selection using the TF-IDF method.
3. Systematic optimization of the number of keywords selected in step 2 and of the unit time length.
4. Prediction testing using both learning algorithms.
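The data preparation behind these steps can be sketched as follows, under several assumptions of ours: each time slot’s tweets are concatenated into one document for TF-IDF, a term’s score is its highest TF-IDF across slots, KF is normalised by token count, and the target is the direction of the price change over the next slot (1 if the price rose, else 0). All function names are hypothetical, prices[i] is assumed to be the closing price at the end of slot i, and the learning step itself is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case text and split it into word-like tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9$']+", text.lower())


def bucket_by_slot(tweets, start, unit, n_slots):
    """Group (timestamp, text) pairs into n_slots consecutive slots of `unit`
    seconds each, beginning at `start`; tweets outside the range are dropped."""
    slots = [[] for _ in range(n_slots)]
    for ts, text in tweets:
        i = int((ts - start) // unit)
        if 0 <= i < n_slots:
            slots[i].append(text)
    return slots


def select_keywords_tfidf(slots, k):
    """Treat each slot's tweets as one document, score each term by its
    highest TF-IDF over all documents, and return the k best terms."""
    docs = [tokenize(" ".join(texts)) for texts in slots]
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))
    best = {}
    for tokens in docs:
        for term, c in Counter(tokens).items():
            # Term frequency within the slot times unsmoothed IDF across slots.
            score = (c / len(tokens)) * math.log(len(docs) / doc_freq[term])
            best[term] = max(best.get(term, 0.0), score)
    # Highest score first; ties broken alphabetically for a deterministic result.
    return sorted(best, key=lambda t: (-best[t], t))[:k]


def keyword_frequencies(texts, keywords):
    """Frequency of each keyword among all tokens in one slot."""
    tokens = tokenize(" ".join(texts))
    counts = Counter(tokens)
    return [counts[kw] / len(tokens) if tokens else 0.0 for kw in keywords]


def lagged_dataset(slots, prices, keywords):
    """Pair the KF vector of slot i with the direction of the return over
    slot i+1, i.e. from the close of slot i to the close of slot i+1."""
    X, y = [], []
    for i in range(len(slots) - 1):
        X.append(keyword_frequencies(slots[i], keywords))
        y.append(1 if prices[i + 1] > prices[i] else 0)
    return X, y


# Usage with toy data: three 60-second slots.
tweets = [(5, "buy buy the dip"), (70, "sell the news"), (130, "the rally continues")]
slots = bucket_by_slot(tweets, start=0, unit=60, n_slots=3)
keywords = select_keywords_tfidf(slots, k=2)
X, y = lagged_dataset(slots, [100.0, 101.0, 99.0], keywords)
```

The resulting X and y could then be passed to off-the-shelf classifiers, for example scikit-learn’s `SVC` and `GaussianNB` through their `fit(X, y)` methods. Step 3 would repeat the whole construction over a grid of values for `k` and `unit`, keeping the pair that predicts best on held-out slots.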