Vector-based Sentiment Analysis of Movie Reviews

Artificial Intelligence & ML, Data mining, Machine Learning
We investigate sentence sentiment using the Pang and Lee dataset as annotated by Socher et al. [1]. Sentiment analysis research focuses on understanding the positive or negative tone of a sentence based on sentence syntax, structure, and content. Previous research used a tree-based model to label sentence sentiment on a scale of 5 points. Our project takes a different approach: abstracting the sentence as a vector and applying vector classification schemes. We explore two components: first, we would like to analyze the use of different sentence representations, such as bag of words, word sentiment location, negation, etc., and abstract them into a set of features. Second, we would like to classify sentence sentiment using this set of features and compare the effectiveness of…
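The idea of abstracting a sentence into a feature vector can be sketched roughly as follows. This is only a minimal illustration, not the project's actual feature set: the vocabulary, the negation cues, and the function names (`tokenize`, `featurize`) are all assumptions made up for the example.

```python
import re
from collections import Counter

# Hypothetical toy vocabulary and negation cues (assumptions, not from the project).
VOCAB = ["good", "bad", "great", "boring", "plot"]
NEGATIONS = {"not", "no", "never"}


def tokenize(sentence):
    """Lowercase the sentence and split it into word tokens (apostrophes kept)."""
    return re.findall(r"[a-z']+", sentence.lower())


def featurize(sentence):
    """Return bag-of-words counts over VOCAB plus one negation-count feature."""
    tokens = tokenize(sentence)
    counts = Counter(tokens)
    bag = [counts[word] for word in VOCAB]
    # Count explicit negation words and contractions such as "wasn't".
    negations = sum(1 for t in tokens if t in NEGATIONS or t.endswith("n't"))
    return bag + [negations]


if __name__ == "__main__":
    print(featurize("The plot is not good, not great."))
```

The resulting fixed-length vector could then be passed to any vector classifier; which classifier works best is exactly the comparison the abstract describes.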

Using Tweets for single stock price prediction

Artificial Intelligence & ML, Data mining, Machine Learning
Social media, as the collective form of individual opinions and emotions, has a profound though perhaps subtle relationship with social events. This is particularly true when it comes to public Tweets and stock trading. In fact, research has shown that when it comes to financial decisions, people are significantly driven by emotions [1]. These emotions, together with people’s opinions, are reflected in real time by tweets. As a result, by analyzing relevant tweets using proper machine learning algorithms, one could grasp the public’s sentiment and attitude towards the price of the stock of interest, which could intuitively predict its next move. Some previous work has shown that tweets can indeed reflect stock price changes. Bollen et al. (2010) randomly selected…

Recommendation based on user experiences

Artificial Intelligence & ML, Data mining, Machine Learning
Recommender systems follow two main strategies: content-based filtering and collaborative filtering. Collaborative filtering is often the preferred approach, as it requires no domain knowledge and no feature-gathering effort. The two primary methods for collaborative filtering are latent factor models and neighborhood methods. In user-user neighborhood methods, similarity between users is measured by transforming them into the item space. Similar logic applies to item-item similarity. In latent factor methods, both users and items are transformed into a latent feature space. An item is recommended to a user if they are similar, that is, if the similarity of their vector representations in the latent feature space is relatively high. We select the latent factor model because it allows us to identify the hidden features of the users. These features are time-independent. We first…
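To make the latent factor idea concrete, here is a small sketch that scores items for a user by the dot product of their latent vectors and recommends the highest-scoring ones. The two-dimensional factors and item names are invented for illustration; in a real system the factors would be learned from rating data, and the project may use a different similarity measure.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def recommend(user_vec, item_vecs, k=2):
    """Return the k item names whose latent vectors score highest for the user."""
    scores = {name: dot(user_vec, vec) for name, vec in item_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical latent factors: dimension 0 ~ "action", dimension 1 ~ "romance".
    user = [0.9, 0.1]
    items = {
        "action_film": [1.0, 0.0],
        "romance_film": [0.0, 1.0],
        "action_romance": [0.6, 0.6],
    }
    print(recommend(user, items))
```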

Learning To Predict Dental Caries For Preschool Children

Artificial Intelligence & ML, Data mining, Machine Learning
Dental caries, or tooth decay/cavity, is a dental disease caused by bacterial infection. Among the different age groups, preschool children require more attention, since caries has become the most common chronic childhood disease. More importantly, a skewed distribution of the disease has been observed in Europe, the US, and Singapore among children or preschoolers, which indicates that a small portion of the population endures a large portion of caries incidence. Therefore, there is still a need to improve current caries control in order to identify high-risk individuals and prevent resurgence in children in developed countries like Singapore. Our project will study data such as questionnaire responses, oral examinations, and biological tests of certain preschoolers from Singapore and use suitable…

Predicting air pollution level in a specific city

Artificial Intelligence & ML, Data mining, Machine Learning
The regulation of air pollutant levels is rapidly becoming one of the most important tasks for the governments of developing countries, especially China. Among the pollutant indices, fine particulate matter (PM2.5) is a significant one because it is a major concern for people's health when its level in the air is relatively high. PM2.5 refers to tiny particles in the air that reduce visibility and cause the air to appear hazy when levels are elevated. However, the relationships between the concentration of these particles and meteorological and traffic factors are poorly understood. To shed some light on these connections, some advanced techniques have been introduced into air quality research. These studies utilized selected techniques, such as Support Vector Machine (SVM)…

Sentiment Analysis on Movie Reviews

Artificial Intelligence & ML, Data mining, Machine Learning
Sentiment analysis is a well-known task in the realm of natural language processing. Given a set of texts, the objective is to determine the polarity of each text. [9] provides a comprehensive survey of various methods, benchmarks, and resources of sentiment analysis and opinion mining. The sentiments can consist of different classes. In this study, we consider two cases: 1) A movie review is positive (+) or negative (-). This is similar to [2], where they also employ a novel similarity measure. In [10], the authors perform sentiment analysis after summarizing the text. 2) A movie review is very negative (- -), somewhat negative (-), neutral (o), somewhat positive (+), or very positive (+ +). For the first case, we picked a Kaggle competition called “Bag…

Predicting Soccer Results in the English Premier League

Artificial Intelligence & ML, Data mining, Machine Learning
There were many displays of genius during the 2010 World Cup, ranging from Andrés Iniesta to Thomas Müller, but none were as unusual as that of Paul the Octopus. This sea dweller correctly chose the winner of a match all eight times that he was tested. This accuracy contrasts sharply with the World Cup predictions of one of our team members, who was correct only about half the time. Due to love of the game, and partly from the shame of being outdone by an octopus, we have decided to attempt to predict the outcomes of soccer matches. This has real-world applications for gambling, coaching improvements, and journalism. Out of the many leagues we could have chosen, we decided upon the…

Classifying Online User Behavior Using Contextual Data

Artificial Intelligence & ML, Data mining, Machine Learning
Despite the great computational power of machines, there are some things, like interest-based segregation, that only humans can instinctively distinguish. For example, a human can easily tell whether a tweet is about a book or about a kitchen utensil. However, to write a rule-based computer program to solve this task, a programmer must lay down very precise criteria for these classifications. There has been a massive increase in the amount of structured user-generated content on the Internet in the form of tweets, reviews on Amazon and eBay, etc. As opposed to stand-alone companies, which leverage their own hubs of data to run behavioral analytics, we strive to gain insights into online user behavior and interests based on free and public data. By…

Extracting Word Relationships from Unstructured Data

Artificial Intelligence & ML, Machine Learning
Robots are advancing rapidly in their behavioural functionality, allowing them to perform sophisticated tasks. However, their ability to take Natural Language instructions is still in its infancy. Parsing, Semantic Interpretation and Dialogue Management are typically performed only on a limited set of primitives, thus limiting the set of instructions that could be given to a robot. This limits a robot’s applicability in unconstrained natural environments (like households and offices) [8]. In this project, we are only addressing the problem of semantic interpretation of human instructions. Specifically, our Extracto algorithm provides a method to extract potential actions (verbs) that could be performed given two household objects (nouns). For example, given the nouns “Coffee” and “Cup”, Extracto identifies the action (verb) “pour” indicating that ‘coffee should…

Bird Species Identification from an Image

Artificial Intelligence & ML, Image Processing, Machine Learning
In daily life we can hear the sounds of a variety of creatures, including human speech, dog barks, birdsongs, frog calls, etc. Many animals generate sounds either for communication or as a by-product of their living activities, such as eating, moving, flying, mating, etc. Bird species identification is a well-known problem to ornithologists, and it has been considered a scientific task since antiquity. Birds and their sounds are in many ways important for our culture. They can be heard even in big cities, and most people can recognize at least a few of the most common species by their sounds. Biologists have tried to investigate species richness, presence or absence of indicator species, and the population sizes of…

Predicting ground shaking intensities using DYFI data and estimating event terms to identify induced earthquakes

Artificial Intelligence & ML, Machine Learning
There has been a dramatic increase in seismicity in CEUS in recent years (Ellsworth 2013). There is a possibility that this increased seismicity in CEUS is caused by anthropogenic processes and is referred to as induced or triggered seismicity. The earthquakes are a nuisance for people and some larger magnitude earthquakes have also caused structural damage. Hence, it is important to quantify seismic hazard and risk from this increased seismicity. One of the major components in determining seismic hazard and risk is the expected level of ground shaking at a site. Level of ground shaking from a given earthquake is typically estimated using previously collected ground motion data in a region. However, in CEUS due…

Identifying Gender From Facial Features

Artificial Intelligence & ML, Image Processing, Machine Learning
Previous research has shown that our brain has specialized nerve cells responding to specific local features of a scene, such as lines, edges, angles or movement. Our visual cortex combines these scattered pieces of information into useful patterns. Automatic face recognition aims to extract these meaningful pieces of information and put them together into a useful representation in order to perform a classification/identification task on them. While we attempt to identify gender from facial features, we are often curious about what features of the face are most important in determining gender. Are localized features such as eyes, nose and ears more important, or are overall features such as head shape, hair line and face contour more important? There is a plethora of successful and robust face…

Analyzing Positional Play in Chess using Machine Learning

Artificial Intelligence & ML, Machine Learning
Chess has two broad approaches to game-play: tactical and positional. Tactical play is the approach of calculating maneuvers and employing tactics that take advantage of short-term opportunities, while positional play is dominated by long-term maneuvers for advantage and requires judgement more than calculations. Current generation chess engines predominantly employ tactical play and thus outplay top human players given their much superior computational abilities. Engines do so by searching game trees of depths typically between 20 and 30 moves and calculating a large number of variations. However, human play is often a combination of both tactical and positional approaches, since humans have some intuition about which board positions are intrinsically better than others. In our project, we use machine learning to identify elements…

PREDICTING HOSPITAL READMISSIONS IN THE MEDICARE POPULATION

Artificial Intelligence & ML, Data mining, Machine Learning, MSC IT
Avoidable hospital readmissions cost taxpayers billions of dollars each year. The Medicare Payment Advisory Commission has estimated that almost $12 billion is spent annually by Medicare on potentially preventable readmissions within 30 days of a patient’s discharge from a hospital [1]. The Medicare program has begun to apply financial penalties to hospitals that have excessive risk-adjusted readmission rates. There is much interest in the health policy and medical communities in the ability to accurately predict which patients are at high risk of being readmitted. Not only are there strong financial reasons to avoid readmissions, readmission to the hospital can be a sign of poor clinical care and can indicate a worsening of a patient’s condition [2]. If doctors and nurses were aware of…

Object Detection for Semantic SLAM using Convolution Neural Networks

Artificial Intelligence & ML, Data mining, Image Processing, Machine Learning
Conventional SLAM (Simultaneous Localization and Mapping) systems typically provide odometry estimates and point-cloud reconstructions of an unknown environment. While these outputs can be used for tasks such as autonomous navigation, they lack any semantic information. Our project implements a modular object detection framework that can be used in conjunction with a SLAM engine to generate semantic scene reconstructions. A semantically-augmented reconstruction has many potential applications. Some examples include:
• Discriminating between pedestrians, cars, bicyclists, etc. in an autonomous driving system.
• Loop-closure detection based on object-level descriptors.
• Smart household bots that can retrieve objects given a natural language command.
An object detection algorithm designed for these applications has a unique set of requirements and constraints. The algorithm needs to be…

Sentiment as a Predictor of Wikipedia Editor Activity

Artificial Intelligence & ML, Machine Learning
Wikipedia, the world’s largest encyclopedia, is created by millions of unpaid editors online. Every user can edit every article, and the project is protected against vandalism and low-quality contributions only through version control and a system of (again unpaid) reviewers. Somewhat hidden from most casual readers of the encyclopedia, Wikipedia also features a simple social network: every user has a personal user profile and a “user talk page” which acts as a publicly accessible guestbook where users can leave messages to each other. The messages exchanged in user talk pages are often related to a user’s editing behavior. For example, senior users may welcome new users, or congratulate them on their first edits. Administrators may officially warn culprits after transgressions of Wikipedia’s…

Application of Neural Network In Handwriting Recognition

Artificial Intelligence & ML, Image Processing, Machine Learning
Handwriting recognition can be divided into two categories, namely on-line and off-line handwriting recognition. On-line recognition involves live transformation of characters written by a user on a tablet or a smartphone. In contrast, off-line recognition is more challenging, as it requires automatic conversion of scanned images or photos into a computer-readable text format. Motivated by interesting applications of off-line recognition technology, for instance the USPS address recognition system and the Chase QuickDeposit system, this project will mainly focus on discovering algorithms that allow an accurate, fast, and efficient character recognition process. The report will cover data acquisition, image processing, feature extraction, model training, results analysis, and future work.

Re-clustering of Constellations through Machine Learning

Artificial Intelligence & ML, Machine Learning
For thousands of years, people around the world have been looking up into the sky, trying to find patterns in the distribution of visible stars, and dividing them into different groups called constellations. Originally, constellations were recognized and organized by people’s imaginations based on the shapes of the star distribution. The two most famous groups of stars are the “Big Dipper” and “Orion”. In modern astronomy, the International Astronomical Union (IAU) has defined constellations as specific areas of the celestial sphere. These areas have their origins in star patterns from which the constellations take their names. In total, there are 88 officially recognized constellations. On the other hand, certain stars are grouped together primarily because they are close to each other and far away…

Collaborative Filtering Recommender Systems

Artificial Intelligence & ML, Data mining, Machine Learning
Collaborative filtering (CF) predicts user preferences in item selection based on the known user ratings of items. As one of the most common approaches to recommender systems, CF has proved to be effective for solving the information overload problem. CF can be divided into two main branches: memory-based and model-based. Most present research improves the accuracy of memory-based algorithms only by improving the similarity measures. Few studies have focused on the prediction score models, which we believe are more important than the similarity measures. The most well-known model-based algorithm is matrix factorization. Compared to memory-based algorithms, matrix factorization generally has higher accuracy. However, matrix factorization may fall into a local optimum during the learning process, which leads to inadequate…
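The distinction between a similarity measure and a prediction score model in memory-based CF can be illustrated with a short sketch. It uses cosine similarity over co-rated items and a similarity-weighted average as the prediction score; both are common textbook choices picked here as assumptions, not necessarily the ones studied in the paper, and the ratings are made up.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two users, computed over co-rated items only."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    num = sum(a[i] * b[i] for i in common)
    den = sqrt(sum(a[i] ** 2 for i in common)) * sqrt(sum(b[i] ** 2 for i in common))
    return num / den if den else 0.0


def predict(ratings, user, item):
    """Prediction score model: similarity-weighted average of neighbours' ratings.

    Returns None when no other user has rated the item.
    """
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine(ratings[user], other_ratings)
        num += sim * other_ratings[item]
        den += abs(sim)
    return num / den if den else None


if __name__ == "__main__":
    # Hypothetical ratings on a 1-5 scale.
    ratings = {
        "ann": {"a": 5, "b": 3},
        "bob": {"a": 5, "b": 3, "c": 4},
        "cat": {"a": 1, "b": 5, "c": 2},
    }
    print(predict(ratings, "ann", "c"))
```

Swapping `cosine` for another measure changes the similarity step, while changing the body of `predict` changes the prediction score model.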

Blowing up the Twittersphere: Predicting the Optimal Time to Tweet

Artificial Intelligence & ML, Data mining, Machine Learning
We can separate our problem into a few different steps. First, we need to model information about a tweet and how successful a given tweet is. Second, given a tweet, user, and post time, we must predict how successful that tweet will be. Finally, we then need to use our predictor to determine the optimal time for a given user to post a specific tweet, i.e. what time maximizes our success prediction for a specific user and tweet. We considered two papers that address similar problems of using Machine Learning to understand interactions in social media and predict success of online content. Lakkaraju, McAuley, and Leskovec consider the connections between title, content and community in social media. From their work,…
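The final step, choosing the post time that maximizes the success prediction, amounts to an argmax over candidate times. The sketch below assumes a predictor with the hypothetical signature `predict_success(tweet, user, hour)`; the toy predictor is a stand-in for whatever model the first two steps produce.

```python
def best_post_hour(predict_success, tweet, user, hours=range(24)):
    """Return the candidate hour with the highest predicted success."""
    return max(hours, key=lambda hour: predict_success(tweet, user, hour))


def toy_predictor(tweet, user, hour):
    """Placeholder model (an assumption): success peaks at the user's busiest hour."""
    return -abs(hour - user["peak_hour"])


if __name__ == "__main__":
    user = {"peak_hour": 18}  # hypothetical user profile
    print(best_post_hour(toy_predictor, "Hello, world!", user))
```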

Recognition and Classification of Fast Food Images

Artificial Intelligence & ML, Data mining, Machine Learning
Food recognition is of great importance nowadays for multiple purposes. On one hand, people who want to get a better understanding of food that they are not familiar with, or have never even seen before, can simply take a picture and get to know more details about it. On the other hand, the increasing demand for dietary assessment tools to record calories and nutrition has also been a driving force in the development of food recognition techniques. Therefore, automatic food recognition is very important and has great application potential. However, food varies greatly in appearance (e.g., shape, colors) with tons of different ingredients and assembling methods. This makes food recognition a difficult task for current state-of-the-art classification methods, and…

Predicting Heart Attacks

Artificial Intelligence & ML, Data mining, Machine Learning
In the field of medical science, there is a huge amount of data. Data mining techniques are being used to discover hidden patterns from these data. Advanced data mining techniques have been developed in recent years. The efficiency of these techniques is compared using sensitivity, specificity, accuracy, and error rate. Some well-known data mining classification techniques are Decision Tree, Artificial Neural Networks, Support Vector Machine, and Naïve Bayes Classifier. In this paper, we introduce a new method based on the fitness value of the attribute to predict the heart disease problem. We use 10 attributes for our proposed method and use simple calculation. In our everyday life, several examples exist where we have to analyze historical data; for example, a bank loans officer needs analysis…
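The four comparison measures named above follow directly from a binary confusion matrix. A minimal sketch, with made-up counts and a hypothetical function name:

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification measures from confusion-matrix counts.

    tp/fn: diseased patients predicted positive/negative;
    tn/fp: healthy patients predicted negative/positive.
    """
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    for name, value in classification_metrics(tp=40, fp=5, tn=45, fn=10).items():
        print(f"{name}: {value:.2f}")
```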

E-Commerce Sales Prediction Using Listing Keywords

Artificial Intelligence & ML, Data mining, Machine Learning
Small online retailers usually set themselves apart from brick-and-mortar stores, traditional brand names, and giant online retailers by offering goods at an exceptional value. In addition to price, they compete for shoppers’ attention via descriptive listing titles, whose effectiveness as search keywords can help drive sales. In this study, machine learning techniques will be applied to online retail data to measure the link between keywords and sales volumes.

Prediction and Classification of Cardiac Arrhythmia

Artificial Intelligence & ML, Data mining, Machine Learning
Irregularity in heartbeat may be harmless or life-threatening. Hence, both accurate detection of the presence of arrhythmia and its classification are important. Arrhythmia can be diagnosed by measuring the heart activity using an instrument called ECG or electrocardiograph and then analyzing the recorded data. Different parameter values can be extracted from the ECG waveforms and can be used along with other information about the patient, like age, medical history, etc., to detect arrhythmia. However, sometimes it may be difficult for a doctor to look at these long-duration ECG recordings and find minute irregularities. Therefore, using machine learning for automating arrhythmia diagnosis can be very helpful. The project aims at using different machine learning algorithms like Naive Bayes, SVM, Random Forests and Neural Networks…

Sentiment Analysis for Hotel Reviews

Artificial Intelligence & ML, Data mining, Machine Learning
Travel planning and hotel booking on websites have become an important commercial use of the web. Sharing on the web has become a major tool for expressing customer thoughts about a particular product or service. Recent years have seen rapid growth in online discussion groups and review sites (e.g., www.tripadvisor.com), where a crucial characteristic of a customer’s review is its sentiment or overall opinion; for example, a review that contains words like ‘great’, ‘best’, ‘nice’, ‘good’, or ‘awesome’ is probably a positive comment, whereas a review that contains words like ‘bad’, ‘poor’, ‘awful’, or ‘worse’ is probably a negative review. However, TripAdvisor’s star rating does not express the exact experience of the customer. Most of the ratings are meaningless; a large chunk of reviews fall in the…
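The word-list heuristic described above can be sketched in a few lines. The two lexicons reuse only the example words from the text; the scoring rule (count of positive words minus count of negative words) and the function name are assumptions for illustration, and a real system would need a far larger lexicon and handling of negation.

```python
import re

# Example words taken from the text above; a real lexicon would be much larger.
POSITIVE = {"great", "best", "nice", "good", "awesome"}
NEGATIVE = {"bad", "poor", "awful", "worse"}


def label_review(text):
    """Label a review by comparing counts of positive and negative lexicon words."""
    words = re.findall(r"[a-z]+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(label_review("Great location and nice staff"))
    print(label_review("Awful room, poor service, good breakfast"))
```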

Mood Detection with Tweets

Artificial Intelligence & ML, Data mining, Machine Learning
Emotional states of individuals, also known as moods, are central to the expression of thoughts, ideas, and opinions, and in turn, impact attitudes and behavior. Social media tools like Twitter are increasingly used by individuals to broadcast their day-to-day happenings or to report on an external event of interest. Understanding the rich “landscape” of moods will help us better interpret millions of individuals. This paper describes a rule-based approach, which detects the emotion or mood of a tweet and classifies the Twitter message under the appropriate emotional category. The accuracy of the system is 85%. With the proposed system, it is possible to understand the deeper, finer-grained levels of emotion instead of coarse-grained sentiment. The sentiment says whether the tweet is positive…

3D Scene Retrieval from Text with Semantic Parsing

Artificial Intelligence & ML, Machine Learning
We look at the task of 3D scene retrieval: given a natural-language description and a set of 3D scenes, identify a scene matching the description. Geometric specifications of 3D scenes are part of the craft of many graphical computing applications, including computer animation, games, and simulators. Large databases of such scenes have become available in recent years as a result of improvements in the ease of use of tools for 3D scene design. A system that can identify a 3D scene from a natural language description is useful for making such databases of scenes readily accessible. Natural language has evolved to be well-suited to describing our (three-dimensional) world, and it provides a convenient way of specifying the space of acceptable scenes: a…

Parking Occupancy Prediction and Pattern Analysis

Artificial Intelligence & ML, Data mining, Machine Learning
According to the Department of Parking and Traffic, San Francisco has more cars per square mile than any other city in the US [1]. The search for an empty parking spot can become an agonizing experience for the city’s urban drivers. A recent article claims that drivers cruising for a parking spot in SF generate 30% of all downtown congestion [2]. These wasted miles not only increase traffic congestion, but also lead to more pollution and driver anxiety. In order to alleviate this problem, the city armed 7000 metered parking spaces and 12,250 garage spots (total of 593 parking lots) with sensors and introduced a mobile application called SFpark [3], which provides real-time information about the availability of a parking lot to drivers. However,…

Stock Trend Prediction with Technical Indicators using SVM

Artificial Intelligence & ML, Machine Learning
Short-term prediction of stock price trends has potential applications for personal investment without high-frequency-trading infrastructure. Unlike a market index (as explored by previous years’ projects), a single stock price tends to be affected by large noise, and its long-term trend inherently converges to the company’s market performance. So this project focuses on short-term (1-10 days) prediction of stock price trend and takes the approach of analyzing time series indicators as features to classify the trend (Raise or Down). The validation model is chosen so that the testing set always follows the training set in time, to simulate real prediction. Cross-validated grid search over the parameters of an RBF-kernel SVM is performed to fit the training data while balancing bias and variance. Although…
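A validation scheme in which the test set always follows the training set in time can be sketched as an expanding-window (walk-forward) split. This is one common way to do it, written here as an assumption about the general idea rather than the project's exact procedure; the function name and parameters are hypothetical. scikit-learn's `TimeSeriesSplit` offers a comparable ready-made splitter.

```python
def walk_forward_splits(n_samples, n_folds, min_train):
    """Yield (train_indices, test_indices) pairs for time-ordered data.

    Every test index is strictly later than every train index, so the model
    is never evaluated on data that precedes what it was trained on.
    """
    fold = (n_samples - min_train) // n_folds
    if fold < 1:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_folds):
        train_end = min_train + k * fold
        test_end = train_end + fold
        yield list(range(train_end)), list(range(train_end, test_end))


if __name__ == "__main__":
    # Ten trading days, three folds, at least four days of training data.
    for train, test in walk_forward_splits(10, 3, 4):
        print("train:", train, "test:", test)
```

Each fold's training indices could then be used for the grid search over SVM parameters, with the following test window held out for evaluation.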

Predicting Usefulness of Yelp Reviews

Artificial Intelligence & ML, Data mining, Machine Learning
The Yelp Dataset Challenge makes a huge set of user, business, and review data publicly available for machine learning projects. They wish to find interesting trends and patterns in all of the data they have accumulated. Our goal is to predict how useful a review will prove to be to users. We can use review upvotes as a metric. This could have immediate applications: many people rely on Yelp to make consumer choices, so predicting the most helpful reviews to display on a page before they have actually been rated would have a serious impact on user experience.

Multiclass Classifier Building with Amazon Data to Classify Customer Reviews into Product Categories

Artificial Intelligence & ML, Data mining, Machine Learning
E-commerce refers to electronic commerce and is defined as the buying and selling of products over electronic systems such as the Internet. With the widespread use of the Internet, the trade conducted electronically (online) has grown extraordinarily. E-commerce companies have a large database of products and a number of consumers that use these data. To address this data and information explosion, e-commerce stores are applying machine learning to identify and customize the product category information. Data scientists in this field are utilizing the potential of machine learning to build unmatched competitiveness in the market by finding purchase preferences, customer churn, product suggestions, etc. Applying popular machine learning algorithms to huge datasets brought new challenges for the ML…

An Energy Efficient Seizure Prediction Algorithm

Artificial Intelligence & ML, Machine Learning
Epileptic seizures afflict over 1% of the world’s population. If seizures could be predicted before they occur, fast-acting therapies could be delivered to prevent the attack and restore a normal quality of life to patients. Over the last two decades, several studies have explored the use of EEG signals to predict seizures using principles from machine learning [1]–[3]. It is thought that such an algorithm could be implemented in real-time with a wireless, implanted EEG sensor. However, there are two main constraints for such a portable system. First, due to limited battery life, energy consumption must be minimal. Second, due to limited bandwidth, the data transmitted between the sensor and the central processing device (such as mobile phone, tablet, personal computer, etc.) should be…

Classifier Comparisons On Credit Approval Prediction

Artificial Intelligence & ML, Machine Learning, MSC IT
The objective of this work is to investigate the performance of different classification algorithms using the WEKA tool for credit card approval. A major problem in financial analysis is to build an ultimate model that yields fruitful results on certain given information. No single data mining model fulfills all business requirements, nor does a business need depend on a single model. Different models must be evaluated to attain the ultimate model. This kind of difficulty could be resolved with the aid of machine learning, which could be used directly to obtain the end result with the aid of several artificial intelligence algorithms that perform the role of classifiers. Classification algorithms always find a rule or set of rules to represent data in classes [1].…

Automatic Number Plate Recognition System

Artificial Intelligence & ML, Machine Learning
Automatic number plate recognition (ANPR) is a mass surveillance method that uses optical character recognition on images to read the license plates on vehicles. ANPR systems can use existing closed-circuit television or road-rule enforcement cameras, or ones specifically designed for the task. They are used by various police forces and as a method of electronic toll collection on pay-per-use roads and for monitoring traffic activity, such as red light adherence at an intersection. ANPR can be used to store the images captured by the cameras as well as the text from the license plate, with some configurable to store a photograph of the driver. Systems commonly use infrared lighting to allow the camera to take the picture at any time of the day. A powerful flash…
Read More

Practical Approximate k Nearest Neighbor Queries with Location and Query Privacy

Artificial Intelligence & ML, Data mining, Machine Learning
Practical Approximate k Nearest Neighbor Queries with Location and Query Privacy In mobile communication, spatial queries pose a serious threat to user location privacy because the location of a query may reveal sensitive information about the mobile user. In this paper, we study approximate k nearest neighbor (kNN) queries where the mobile user queries the location-based service (LBS) provider about the approximate k nearest points of interest (POIs) on the basis of his current location. We propose a basic solution and a generic solution for the mobile user to preserve his location and query privacy in approximate kNN queries. The proposed solutions are mainly built on the Paillier public-key cryptosystem and can provide both location and query privacy. To preserve query privacy, our basic solution allows the mobile user to retrieve…
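The excerpt names the Paillier public-key cryptosystem as the main building block. As a rough illustration of why that scheme suits private queries, the sketch below shows its additive homomorphism: multiplying two ciphertexts yields an encryption of the sum of the plaintexts, so a server can combine encrypted values without seeing them. This is a toy under stated assumptions, not the paper's protocol: the function names, the tiny primes 17 and 19, and the simplified generator g = n + 1 are all choices made here for illustration, and real deployments use primes of a thousand bits or more.

```python
import math
import secrets

def keygen(p, q):
    """Build a toy Paillier key pair from two distinct primes (assumed inputs)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                      # common simplified choice of generator
    mu = pow(lam, -1, n)           # modular inverse; valid because g = n + 1
    return (n, g), (lam, mu)

def encrypt(pub, m):
    """Encrypt integer m (0 <= m < n) with fresh randomness r."""
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c):
    """Recover the plaintext from ciphertext c."""
    n, _ = pub
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

def add_encrypted(pub, c1, c2):
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    n, _ = pub
    return (c1 * c2) % (n * n)

pub, priv = keygen(17, 19)         # toy primes, far too small for real use
c_sum = add_encrypted(pub, encrypt(pub, 20), encrypt(pub, 22))
recovered = decrypt(pub, priv, c_sum)
```

Here `recovered` equals 20 + 22 modulo n, even though the party computing `add_encrypted` never holds the private key.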
Read More

QUANTIFYING POLITICAL LEANING FROM TWEETS, RETWEETS, AND RETWEETERS

Artificial Intelligence & ML, Data mining, Machine Learning, MSC IT
QUANTIFYING POLITICAL LEANING FROM TWEETS, RETWEETS, AND RETWEETERS In recent years, big online social media data have found many applications in the intersection of political and computer science. Examples include answering questions in political and social science (e.g., proving/disproving the existence of media bias [3, 30] and the “echo chamber” effect [1, 5]), using online social media to predict election outcomes [46, 31], and personalizing social media feeds so as to provide a fair and balanced view of people’s opinions on controversial issues [36]. A prerequisite for answering the above research questions is the ability to accurately estimate the political leaning of the population involved. If it is not met, either the conclusion will be invalid, the prediction will perform poorly [35, 37] due to a skew towards highly vocal…
Read More

Efficient Algorithms for Mining Top-K High Utility Itemsets

Artificial Intelligence & ML, Data mining, Machine Learning
Efficient Algorithms for Mining Top-K High Utility Itemsets In recent years, online shopping has become more and more popular. When deciding whether to purchase a product online, the opinions of others become important. This presents a great opportunity to share our viewpoints on the products we purchase. However, people face an information overload problem. How to mine valuable information from reviews to understand a user’s preferences and make an accurate recommendation is crucial. Traditional recommender systems consider factors such as the user’s purchase records, product category, and geographic location. This work proposes a sentiment-based rating prediction method to improve prediction accuracy in recommender systems. Firstly, it proposes a social user sentimental measurement approach and calculates each user’s sentiment on items. Secondly, it not only…
Read More

Efficient Algorithms for Mining Top-K High Utility Itemsets

Artificial Intelligence & ML, Data mining, Machine Learning
Efficient Algorithms for Mining Top-K High Utility Itemsets Mining high utility itemsets from databases is an emerging topic in data mining, which refers to the discovery of itemsets with utilities higher than a user-specified minimum utility threshold min_util. Although several studies have been carried out on this topic, setting an appropriate minimum utility threshold is a difficult problem for users. If min_util is set too low, too many high utility itemsets will be generated, which may cause the mining algorithms to become inefficient or even run out of memory. On the other hand, if min_util is set too high, no high utility itemset will be found. Setting appropriate minimum utility thresholds by trial and error is a tedious process for users. In this paper, we address this problem by proposing…
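The threshold problem described above can be made concrete with a small brute-force sketch. It is only an illustration of the definitions, not the efficient algorithms the paper proposes: the transactions, the unit profits, and every function name are hypothetical, and the utility of an itemset is taken here as quantity times unit profit, summed over the transactions that contain the whole itemset.

```python
from itertools import combinations
import heapq

# Hypothetical toy database: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"b": 1, "c": 3},
]
# Hypothetical unit profit of each item.
unit_profit = {"a": 5, "b": 2, "c": 1}

def itemset_utility(itemset, transactions, unit_profit):
    """Sum quantity * profit over transactions containing every item in itemset."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * unit_profit[item] for item in itemset)
    return total

def all_utilities(transactions, unit_profit):
    """Enumerate every itemset (exponential; acceptable only for toy data)."""
    items = sorted(unit_profit)
    utilities = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = itemset_utility(itemset, transactions, unit_profit)
            if u > 0:
                utilities[itemset] = u
    return utilities

def high_utility_itemsets(utilities, min_util):
    """Threshold-based mining: keep itemsets whose utility is at least min_util."""
    return {s: u for s, u in utilities.items() if u >= min_util}

def top_k_itemsets(utilities, k):
    """Top-k mining: the user picks k instead of guessing a threshold."""
    return heapq.nlargest(k, utilities.items(), key=lambda kv: kv[1])

utilities = all_utilities(transactions, unit_profit)
too_low = high_utility_itemsets(utilities, 1)      # returns every itemset
too_high = high_utility_itemsets(utilities, 100)   # returns nothing
best_two = top_k_itemsets(utilities, 2)
```

With this toy data a threshold of 1 returns all six itemsets that occur, a threshold of 100 returns none, and asking for the top two returns a fixed-size answer without any threshold tuning.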
Read More

Mining Facets For Queries From Their Search Results

Artificial Intelligence & ML, Data mining, Machine Learning
Mining Facets For Queries From Their Search Results A query facet is a set of items which describe and summarize one important aspect of a query. Here a facet item is typically a word or a phrase. A query may have multiple facets that summarize the information about the query from different perspectives. For the query “watches”, its query facets cover the knowledge about watches in five unique aspects, including brands, gender categories, supporting features, styles, and colors. The query “visit Beijing” has a facet about popular resorts in Beijing (Tiananmen square, forbidden city, summer palace, ...) and a facet on several travel-related topics (attractions, shopping, dining, ...). Query facets provide interesting and useful knowledge about a query and thus can be used to improve search experiences in many ways.…
Read More

Detecting Malicious Facebook Applications

Artificial Intelligence & ML, Machine Learning
Detecting Malicious Facebook Applications With 20 million installs a day, third-party apps are a major reason for the popularity and addictiveness of Facebook. Unfortunately, hackers have realized the potential of using apps for spreading malware and spam. The problem is already significant, as we find that at least 13% of apps in our dataset are malicious. So far, the research community has focused on detecting malicious posts and campaigns. In this paper, we ask the question: Given a Facebook application, can we determine if it is malicious? Our key contribution is in developing FRAppE (Facebook's Rigorous Application Evaluator), arguably the first tool focused on detecting malicious apps on Facebook. To develop FRAppE, we use information gathered by observing the posting behavior of 111K Facebook apps seen across 2.2 million users on Facebook.…
Read More

Sentiment Analysis of Top Colleges in India Using Twitter Data

Artificial Intelligence & ML, Data mining, Machine Learning
Sentiment Analysis of Top Colleges in India Using Twitter Data Social media has captured the attention of the entire world because it sends thoughts across the globe at great speed, is user-friendly, and is free of cost, requiring only a working internet connection. People use this platform extensively to share their thoughts loud and clear. Twitter is one such well-known micro-blogging site, receiving around 500 million tweets per day. Each user has a daily limit of 2,400 tweets and 140 characters per tweet. Twitter users post (or ‘tweet’) every day about various subjects such as products, services, day-to-day activities, places, and personalities. Hence, Twitter data is highly relevant, as it can be used in various scenarios where companies or brands can utilize a direct connection to almost each…
Read More

Workflow-Based Big Data Analytics in The Cloud Environment

Artificial Intelligence & ML, Machine Learning, MSC IT
Workflow-Based Big Data Analytics in The Cloud Environment Since digital data repositories are increasingly massive and distributed, we need smart data analysis techniques and scalable architectures to extract useful information from them in reduced time. Cloud computing infrastructures offer effective support for addressing both the computational and data storage needs of big data mining applications. In fact, complex data mining tasks involve data- and compute-intensive algorithms that require large and efficient storage facilities together with high-performance processors to get results in acceptable times. In this chapter, we present a Data Mining Cloud Framework designed for developing and executing distributed data analytics applications as workflows of services. In this environment, we use datasets, analysis tools, data mining algorithms and knowledge models that are implemented as single services that can be combined through a visual programming…
Read More

Modeling Urban Behavior by Mining Geotagged Social Data

Artificial Intelligence & ML, Machine Learning
Modeling Urban Behavior by Mining Geotagged Social Data Data generated on location-based social networks provide rich information on the whereabouts of urban dwellers. Specifically, such data reveal who spends time where, when, and on what type of activity (e.g., shopping at a mall, or dining at a restaurant). That information can, in turn, be used to describe city regions in terms of activity that takes place therein. For example, the data might reveal that citizens visit one region mainly for shopping in the morning, while another for dining in the evening. Furthermore, once such a description is available, one can ask more elaborate questions. For example, one might ask what features distinguish one region from another - some regions might be different in terms of the type of venues they…
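The idea of describing a region by the activity that takes place in it can be sketched with a simple count. This is a minimal illustration assuming check-ins arrive as (region, time of day, activity) tuples; the sample records and the function name are hypothetical, and the paper's actual model is not shown in the excerpt.

```python
from collections import Counter, defaultdict

# Hypothetical geotagged check-ins: (region, time of day, activity type).
checkins = [
    ("north", "morning", "shopping"),
    ("north", "morning", "shopping"),
    ("north", "morning", "shopping"),
    ("north", "evening", "dining"),
    ("south", "evening", "dining"),
    ("south", "evening", "dining"),
    ("south", "morning", "shopping"),
]

def dominant_activity(checkins):
    """Return, per region, the most frequent (time of day, activity) pair."""
    counts = defaultdict(Counter)
    for region, period, activity in checkins:
        counts[region][(period, activity)] += 1
    return {region: c.most_common(1)[0][0] for region, c in counts.items()}

profile = dominant_activity(checkins)
```

On this sample, the north region is characterized by morning shopping and the south region by evening dining, mirroring the example in the text.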
Read More