Classifying Online User Behavior Using Contextual Data
Despite the great computational power of machines, there are some tasks, such as interest-based categorization, that only humans can perform instinctively. For example, a human can easily tell whether a tweet is about a book or about a kitchen utensil. To write a rule-based computer program that solves this task, however, a programmer must lay down very precise criteria for these classifications. There has been a massive increase in the amount of structured user-generated content on the Internet, in the form of tweets, reviews on Amazon and eBay, and so on. Unlike stand-alone companies, which leverage their own hubs of data to run behavioral analytics, we strive to gain insights into online user behavior and interests from free and public data. By parsing this data from numerous heterogeneous sources and learning more about a user's preferences, we can classify their interests. The problem of classifying online user behavior is especially interesting, and arguably more complex, because multiple labels can be assigned to the same tweet. For example, a user may tweet about how much they liked reading The Lord of the Rings trilogy and then playing the game on their Xbox. One could use this data to predict either interest, and because a tweet is at most 140 characters long, there is little text to work with, which makes the problem harder.
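To make the multi-label point concrete, here is a minimal sketch of a rule-based tagger in Python. It is not the method used in the paper: the `INTEREST_KEYWORDS` table and the `label_tweet` function are hypothetical names, and the keyword lists are invented purely for illustration. The sketch shows both how explicit the criteria must be and how a single tweet can receive more than one label.

```python
import re

# Hypothetical keyword lists, chosen only for illustration (not from the paper).
INTEREST_KEYWORDS = {
    "books": {"book", "reading", "novel", "trilogy"},
    "gaming": {"xbox", "game", "playing", "console"},
    "kitchen": {"spatula", "whisk", "pan", "utensil"},
}


def label_tweet(text):
    """Return, in alphabetical order, every label whose keywords appear in the tweet."""
    # Lowercase the text and split it into alphanumeric word tokens.
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    # A label applies when at least one of its keywords is among the tokens.
    return sorted(
        label
        for label, keywords in INTEREST_KEYWORDS.items()
        if words & keywords
    )


tweet = "Loved reading the Lord of the Rings trilogy, now playing the game on my Xbox"
labels = label_tweet(tweet)  # this tweet matches both "books" and "gaming"
```

A hand-written table like this breaks down quickly: "reading" or "playing" can appear in tweets that have nothing to do with books or games, and a short tweet may contain none of the listed keywords at all.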
Research Paper Link: Download Paper