Tweet Segmentation and Its Application to Named Entity Recognition

10 Jul 2015 by chintan

Twitter has attracted millions of users to share and disseminate the most up-to-date information, resulting in large volumes of data produced every day. However, many applications in Information Retrieval (IR) and Natural Language Processing (NLP) suffer severely from the noisy and short nature of tweets. In this paper, we propose a novel framework for tweet segmentation in a batch mode, called HybridSeg. By splitting tweets into meaningful segments, the semantic or context information is well preserved and easily extracted by the downstream applications. HybridSeg finds the optimal segmentation of a tweet by maximizing the sum of the stickiness scores of its candidate segments. The stickiness score considers the probability of a segment being a phrase in English (i.e., global context) and the probability of a segment being a phrase within the batch of tweets (i.e., local context). For the latter, we propose and evaluate two models to derive local context by considering the linguistic features and term dependency in a batch of tweets, respectively. HybridSeg is also designed to iteratively learn from confident segments as pseudo feedback. Experiments on two tweet data sets show that tweet segmentation quality is significantly improved by learning both global and local contexts compared with using global context alone. Through analysis and comparison, we show that local linguistic features are more reliable for learning local context compared with term dependency. As an application, we show that high accuracy is achieved in named entity recognition by applying segment-based part-of-speech (POS) tagging.
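To make the "maximize the sum of stickiness scores" idea concrete, here is a minimal sketch of how such a search could be done with dynamic programming. It is not HybridSeg itself: the function names `segment` and `toy_stickiness`, the `max_len` limit, and the hard-coded scores are all assumptions made for illustration. The abstract does not say how the global and local probabilities are combined, so a toy lookup table stands in for the real stickiness function.

```python
from typing import Callable, List, Tuple

Segment = Tuple[str, ...]


def segment(tokens: List[str],
            stickiness: Callable[[Segment], float],
            max_len: int = 3) -> List[Segment]:
    """Split tokens into consecutive segments whose stickiness scores
    sum to the maximum. max_len (an assumed limit) caps segment length."""
    n = len(tokens)
    best = [float("-inf")] * (n + 1)  # best[i]: top score for tokens[:i]
    best[0] = 0.0
    back = [0] * (n + 1)              # back[i]: start of the last segment
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            score = best[start] + stickiness(tuple(tokens[start:end]))
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Walk the back-pointers from the end to recover the segments.
    segments: List[Segment] = []
    end = n
    while end > 0:
        start = back[end]
        segments.append(tuple(tokens[start:end]))
        end = start
    segments.reverse()
    return segments


# Hypothetical scores, invented for this example only.
TOY_SCORES = {
    ("new", "york"): 2.5,
    ("new", "york", "city"): 3.0,
}


def toy_stickiness(seg: Segment) -> float:
    """Stand-in scorer: known phrases score high, single words get a
    small score, and unknown multi-word spans are penalized."""
    if seg in TOY_SCORES:
        return TOY_SCORES[seg]
    return 0.1 if len(seg) == 1 else -1.0


result = segment("i love new york city".split(), toy_stickiness)
print(result)
```

With these toy scores the phrase "new york city" is kept together as one segment, while "i" and "love" remain single-word segments. In a real system, the scorer would be replaced by a function that draws on the global and local context described above.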
