Sentiment Based Movie Rating System

Data mining, Networking, Web Application
Name: Sentiment Based Movie Rating System
Technology: MsSql, Dot NET
Category: Web Application
Description: We usually come across movie rating websites where users are allowed to rate and comment on movies online. These ratings are provided as input to the website rating system. The admin then checks reviews and critics' ratings and displays an online rating for every movie. Here we propose an online system that allows users to post reviews, stores them, and automatically rates movies based on user sentiments. The system then analyzes this data to check for the user sentiment associated with each comment. Our system consists of a sentiment library designed for English as well as Hindi sentiment analysis. The system breaks user comments into words to check for sentiment keywords and predicts the user sentiment associated with each comment. Once…
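The keyword-based sentiment check described above can be illustrated with a minimal Python sketch. It is not the project's implementation (which is stated to use Dot NET and MsSql); the tiny English lexicon, the function names, and the 0 to 5 rating scale are all assumptions made for the example.

```python
import re

# Hypothetical mini-lexicon (assumption): a real sentiment library would be
# far larger and would also cover Hindi keywords.
SENTIMENT_LEXICON = {
    "good": 1, "great": 2, "excellent": 2,
    "boring": -1, "bad": -1, "terrible": -2,
}


def score_comment(comment, lexicon=SENTIMENT_LEXICON):
    """Split a comment into lowercase words and sum their lexicon weights."""
    words = re.findall(r"[a-z']+", comment.lower())
    return sum(lexicon.get(word, 0) for word in words)


def classify(comment):
    """Map the summed score to a coarse sentiment label."""
    score = score_comment(comment)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def movie_rating(comments, scale=5):
    """Rate a movie as the share of positive comments among non-neutral ones."""
    labels = [classify(c) for c in comments]
    positive = labels.count("positive")
    negative = labels.count("negative")
    if positive + negative == 0:
        return None  # no sentiment-bearing comments to rate from
    return round(scale * positive / (positive + negative), 1)
```

Summing keyword weights ignores negation ("not good") and sarcasm, so a sketch like this is only a starting point for the kind of analysis the description outlines.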

Heart Disease Prediction Project

Data mining, Multimedia, Web Application
Name: Heart Disease Prediction Project
Technology: MsSql, Dot NET
Category: Web Application
Description: It may have happened many times that you or someone close to you needed a doctor's help immediately, but no doctor was available for some reason. The Heart Disease Prediction application is an end-user support and online consultation project. Here, we propose a web application that allows users to get instant guidance on their heart disease through an intelligent online system. The application is fed with various details and the heart disease associated with those details. The application allows users to share their heart-related issues. It then processes the user-specific details to check for various illnesses that could be associated with them. Here we use some intelligent data mining techniques to guess the most accurate illness…

Detecting Fraud Apps Using Sentiment Analysis

Data mining, Networking, Web Application
Name: Detecting Fraud Apps Using Sentiment Analysis
Technology: MsSql, Dot NET
Category: Web Application
Description: Most of us use Android and iOS mobiles these days and regularly use the Play Store or App Store. Both stores offer a great number of applications, but unfortunately a few of those applications are fraudulent. Such applications can damage the phone and may also steal data. Hence, such applications must be marked so that store users can identify them. We therefore propose a web application that processes the information, comments, and reviews of an application, making it easier to decide whether the application is fraudulent. Multiple applications can be processed at a time with the web application. Also, users cannot always get correct or…

Secure Mining of Association Rules in Horizontally Distributed Databases

Data mining
Name: Secure Mining of Association Rules in Horizontally Distributed Databases
Technology: Dot net, MS SQL
Category: Data Mining
Description: We propose a protocol for secure mining of association rules in horizontally distributed databases. The current leading protocol is that of Kantarcioglu and Clifton [18]. Our protocol, like theirs, is based on the Fast Distributed Mining (FDM) algorithm of Cheung et al. [8], which is an unsecured distributed version of the Apriori algorithm. The main ingredients in our protocol are two novel secure multi-party algorithms: one that computes the union of private subsets that each of the interacting players hold, and another that tests the inclusion of an element held by one player in a subset held by another. Our protocol offers enhanced privacy with respect to the protocol in [18]. In…
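To make the setting concrete, the following Python sketch shows the unsecured starting point only: each site of a horizontally partitioned database counts itemsets locally, and the counts are summed to find globally frequent itemsets. It is neither the secure protocol proposed in the paper nor the full FDM algorithm (there is no candidate pruning and no privacy protection), and the function names and data layout are assumptions made for illustration.

```python
from itertools import combinations


def local_support_counts(transactions, size):
    """Count how many local transactions contain each itemset of a given size."""
    counts = {}
    for transaction in transactions:
        for itemset in combinations(sorted(set(transaction)), size):
            counts[itemset] = counts.get(itemset, 0) + 1
    return counts


def globally_frequent(sites, size, min_support):
    """Sum local counts across sites and keep itemsets above the threshold.

    `sites` is a list of transaction lists, one per database holder.
    Counts are exchanged in the clear here; hiding them is exactly what a
    secure protocol has to add.
    """
    total_transactions = sum(len(site) for site in sites)
    merged = {}
    for site in sites:
        for itemset, count in local_support_counts(site, size).items():
            merged[itemset] = merged.get(itemset, 0) + count
    return {
        itemset: count
        for itemset, count in merged.items()
        if count / total_transactions >= min_support
    }
```

For example, with two sites holding `[["a", "b"], ["a", "c"]]` and `[["a", "b"], ["b", "c"]]`, only the pair `("a", "b")` reaches a global support of 0.5.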

Infrequent Weighted Itemset Mining Using Frequent Pattern Growth

Data mining
Name: Infrequent Weighted Itemset Mining Using Frequent Pattern Growth
Technology: Dot net, MS SQL
Category: Data Mining
Description: Frequent weighted itemsets represent correlations frequently holding in data in which items may weight differently. However, in some contexts, e.g., when the need is to minimize a certain cost function, discovering rare data correlations is more interesting than mining frequent ones. This paper tackles the issue of discovering rare and weighted itemsets, i.e., the infrequent weighted itemset (IWI) mining problem. Two novel quality measures are proposed to drive the IWI mining process. Furthermore, two algorithms that perform IWI and Minimal IWI mining efficiently, driven by the proposed measures, are presented. Experimental results show efficiency and effectiveness of the proposed approach.
IEEE Paper: Yes
IEEE Paper Year: 2014

Interpreting the Public Sentiment Variations on Twitter

Data mining
Name: Interpreting the Public Sentiment Variations on Twitter
Technology: Dot net, MS SQL
Category: Data Mining
Description: Millions of users share their opinions on Twitter, making it a valuable platform for tracking and analyzing public sentiment. Such tracking and analysis can provide critical information for decision making in various domains. Therefore it has attracted attention in both academia and industry. Previous research mainly focused on modeling and tracking public sentiment. In this work, we move one step further to interpret sentiment variations. We observed that emerging topics (named foreground topics) within the sentiment variation periods are highly related to the genuine reasons behind the variations. Based on this observation, we propose a Latent Dirichlet Allocation (LDA) based model, Foreground and Background LDA (FB-LDA), to distill foreground topics and filter out…

Product Aspect Ranking and Its Applications

Data mining
Name: Product Aspect Ranking and Its Applications
Technology: Dot net, MS SQL
Category: Data Mining
Description: Numerous consumer reviews of products are now available on the Internet. Consumer reviews contain rich and valuable knowledge for both firms and users. However, the reviews are often disorganized, leading to difficulties in information navigation and knowledge acquisition. This article proposes a product aspect ranking framework, which automatically identifies the important aspects of products from online consumer reviews, aiming at improving the usability of the numerous reviews. The important product aspects are identified based on two observations: 1) the important aspects are usually commented on by a large number of consumers and 2) consumer opinions on the important aspects greatly influence their overall opinions on the product. In particular, given the consumer reviews of…

Supporting Privacy Protection in Personalized Web Search

Data mining
Name: Supporting Privacy Protection in Personalized Web Search
Technology: Dot net, MS SQL
Category: Data Mining
Description: Personalized web search (PWS) has demonstrated its effectiveness in improving the quality of various search services on the Internet. However, evidence shows that users' reluctance to disclose their private information during search has become a major barrier for the wide proliferation of PWS. We study privacy protection in PWS applications that model user preferences as hierarchical user profiles. We propose a PWS framework called UPS that can adaptively generalize profiles by queries while respecting user-specified privacy requirements. Our runtime generalization aims at striking a balance between two predictive metrics that evaluate the utility of personalization and the privacy risk of exposing the generalized profile. We present two greedy algorithms, namely GreedyDP and…

Keyword Query Routing

Data mining
Name: Keyword Query Routing
Technology: Dot net, MS SQL
Category: Data Mining
Description: Keyword search is an intuitive paradigm for searching linked data sources on the web. We propose to route keywords only to relevant sources to reduce the high cost of processing keyword search queries over all sources. We propose a novel method for computing top-k routing plans based on their potentials to contain results for a given keyword query. We employ a keyword-element relationship summary that compactly represents relationships between keywords and the data elements mentioning them. A multilevel scoring mechanism is proposed for computing the relevance of routing plans based on scores at the level of keywords, data elements, element sets, and subgraphs that connect these elements. Experiments carried out using 150 publicly available sources on the…

Set Predicates in SQL: Enabling Set-Level Comparisons for Dynamically Formed Groups

Data mining
Name: Set Predicates in SQL: Enabling Set-Level Comparisons for Dynamically Formed Groups
Technology: Dot net, MS SQL
Category: Data Mining
Description: In data warehousing and OLAP applications, scalar-level predicates in SQL become increasingly inadequate to support a class of operations that require set-level comparison semantics, i.e., comparing a group of tuples with multiple values. Currently, complex SQL queries composed of scalar-level operations are often formed to obtain even very simple set-level semantics. Such queries are not only difficult to write but also challenging for a database engine to optimize, and thus can result in costly evaluation. This paper proposes to augment SQL with set predicates, to bring out otherwise obscured set-level semantics. We studied two approaches to processing set predicates: an aggregate function-based approach and a bitmap index-based approach. Moreover,…
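The idea of a set-level comparison over dynamically formed groups can be sketched outside SQL. The Python example below groups (key, value) tuples and compares each group's value set with a target set; it does not reproduce the SQL syntax proposed in the paper or either of its two processing approaches, and the function name and the three mode names are assumptions chosen for the example.

```python
from collections import defaultdict


def groups_matching(rows, target, mode="contains"):
    """Return the group keys whose set of values satisfies a set-level test.

    rows   -- iterable of (group_key, value) tuples
    target -- the set of values each group is compared against
    mode   -- "contains", "contained_by", or "equals" (hypothetical names)
    """
    groups = defaultdict(set)
    for key, value in rows:
        groups[key].add(value)  # groups are formed on the fly from the data

    target = set(target)
    tests = {
        "contains": lambda values: values >= target,
        "contained_by": lambda values: values <= target,
        "equals": lambda values: values == target,
    }
    check = tests[mode]
    return sorted(key for key, values in groups.items() if check(values))
```

With rows such as `("ann", "db")`, `("ann", "ai")`, and `("bob", "db")`, asking for groups that contain `{"db", "ai"}` returns `["ann"]`, which a scalar predicate on a single row cannot express directly.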

An Empirical Performance Evaluation of Relational Keyword Search Techniques

Data mining
Name: An Empirical Performance Evaluation of Relational Keyword Search Techniques
Technology: Dot net, MS SQL
Category: Data Mining
Description: Extending the keyword search paradigm to relational data has been an active area of research within the database and IR community during the past decade. Many approaches have been proposed, but despite numerous publications, there remains a severe lack of standardization for the evaluation of proposed search techniques. Lack of standardization has resulted in contradictory results from different evaluations, and the numerous discrepancies muddle what advantages are proffered by different approaches. In this paper, we present the most extensive empirical performance evaluation of relational keyword search techniques to appear to date in the literature. Our results indicate that many existing search techniques do not provide acceptable performance for realistic retrieval tasks.…

Facilitating Document Annotation Using Content and Querying Value

Data mining
Name: Facilitating Document Annotation Using Content and Querying Value
Technology: Dot net, MS SQL
Category: Data Mining
Description: A large number of organizations today generate and share textual descriptions of their products, services, and actions. Such collections of textual data contain a significant amount of structured information, which remains buried in the unstructured text. While information extraction algorithms facilitate the extraction of structured relations, they are often expensive and inaccurate, especially when operating on top of text that does not contain any instances of the targeted structured information. We present a novel alternative approach that facilitates the generation of the structured metadata by identifying documents that are likely to contain information of interest, where this information is going to be subsequently useful for querying the database. Our approach relies on the idea that humans are more likely to add the necessary metadata during creation time,…

Context-Based Diversification for Keyword Queries Over XML Data

Data mining
Name: Context-Based Diversification for Keyword Queries Over XML Data
Technology: Dot net, MS SQL
Category: Data Mining
Description: While keyword queries empower ordinary users to search vast amounts of data, the ambiguity of keyword queries makes it difficult to answer them effectively, especially when they are short and vague. To address this challenging problem, in this paper we propose an approach that automatically diversifies XML keyword search based on its different contexts in the XML data. Given a short and vague keyword query and XML data to be searched, we first derive keyword search candidates of the query by a simple feature selection model. Then, we design an effective XML keyword search diversification model to measure the quality of each candidate. After that, two efficient algorithms are proposed to incrementally compute top-k qualified query candidates as the diversified search intentions. Two selection criteria are…

Customizable Point-of-Interest Queries in Road Networks

Data mining
Name: Customizable Point-of-Interest Queries in Road Networks
Technology: Dot net, MS SQL
Category: Data Mining
Description: …networks within interactive applications. We show that partition-based algorithms developed for point-to-point shortest path computations can be naturally extended to handle augmented queries such as finding the closest restaurant or the best post office to stop at on the way home, always ranking points of interest (POIs) according to a user-defined cost function. Our solution allows different trade-offs between indexing effort (time and space) and query time. Our most flexible variant allows the road network to change frequently (to account for traffic information or personalized cost functions) and the set of POIs to be specified at query time. Even in this fully dynamic scenario, our solution is fast enough for interactive applications on continental road networks. IEEE…

Entity Linking with a Knowledge Base: Issues, Techniques, and Solutions

Data mining
The large number of potential applications from bridging web data with knowledge bases has led to an increase in the entity linking research. Entity linking is the task to link entity mentions in text with their corresponding entities in a knowledge base. Potential applications include information extraction, information retrieval, and knowledge base population. However, this task is challenging due to name variations and entity ambiguity. In this survey, we present a thorough overview and analysis of the main approaches to entity linking, and discuss various applications, the evaluation of entity linking systems, and future directions.

Tweet Segmentation and Its Application to Named Entity Recognition

Data mining
Name: Tweet Segmentation and Its Application to Named Entity Recognition
Technology: Dot net, MS SQL
Category: Data Mining
Description: Twitter has attracted millions of users to share and disseminate the most up-to-date information, resulting in large volumes of data produced every day. However, many applications in Information Retrieval (IR) and Natural Language Processing (NLP) suffer severely from the noisy and short nature of tweets. In this paper, we propose a novel framework for tweet segmentation in a batch mode, called HybridSeg. By splitting tweets into meaningful segments, the semantic or context information is well preserved and easily extracted by the downstream applications. HybridSeg finds the optimal segmentation of a tweet by maximizing the sum of the stickiness scores of its candidate segments. The stickiness score considers the probability of a segment being a phrase in English (i.e., global context) and the probability of a segment…
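Choosing the segmentation that maximizes the sum of per-segment scores is a classic dynamic programming problem, sketched below in Python. This is an illustration of that optimization step only, not HybridSeg itself: the `stickiness` callable, the `max_len` cap on segment length, and the function name are assumptions, and the paper's actual stickiness score is not reproduced here.

```python
def best_segmentation(tokens, stickiness, max_len=3):
    """Split tokens into consecutive segments maximizing total stickiness.

    tokens     -- list of words in the tweet
    stickiness -- callable taking a tuple of words and returning a score
                  (hypothetical; supplied by the caller)
    max_len    -- longest segment considered (assumed cap for the sketch)
    """
    n = len(tokens)
    best = [float("-inf")] * (n + 1)  # best[i]: top score for tokens[:i]
    back = [0] * (n + 1)              # back[i]: start of the last segment
    best[0] = 0.0

    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            score = best[start] + stickiness(tuple(tokens[start:end]))
            if score > best[end]:
                best[end] = score
                back[end] = start

    # Walk the back-pointers from the end to recover the segments.
    segments = []
    end = n
    while end > 0:
        start = back[end]
        segments.append(tokens[start:end])
        end = start
    segments.reverse()
    return segments, best[n]
```

As a usage example, a toy scorer that gives 2.0 to `("new", "york")`, 0.5 to any single word, and 0.0 to every other multi-word segment leads the function to keep "new york" together in `["i", "love", "new", "york"]`.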

Co-Extracting Opinion Targets and Opinion Words from Online Reviews Based on the Word Alignment Model

Data mining
Name: Co-Extracting Opinion Targets and Opinion Words from Online Reviews Based on the Word Alignment Model
Technology: Dot net, MS SQL
Category: Data Mining
Description: Mining opinion targets and opinion words from online reviews are important tasks for fine-grained opinion mining, the key component of which involves detecting opinion relations among words. To this end, this paper proposes a novel approach based on the partially supervised alignment model, which regards identifying opinion relations as an alignment process. Then, a graph-based co-ranking algorithm is exploited to estimate the confidence of each candidate. Finally, candidates with higher confidence are extracted as opinion targets or opinion words. Compared to previous methods based on the nearest-neighbor rules, our model captures opinion relations more precisely, especially for long-span relations. Compared to syntax-based methods, our word…

Polarity Consistency Checking for Domain Independent Sentiment Dictionaries

Data mining
Polarity classification of words is important for applications such as Opinion Mining and Sentiment Analysis. A number of sentiment word/sense dictionaries have been manually or (semi)automatically constructed. We notice that these sentiment dictionaries have numerous inaccuracies. Besides obvious instances, where the same word appears with different polarities in different dictionaries, the dictionaries exhibit complex cases of polarity inconsistency, which cannot be detected by mere manual inspection. We introduce the concept of polarity consistency of words/senses in sentiment dictionaries in this paper. We show that the consistency problem is NP-complete. We reduce the polarity consistency problem to the satisfiability problem and utilize two fast SAT solvers to detect inconsistencies in a sentiment dictionary. We perform experiments on five sentiment dictionaries and WordNet to show inter- and intra-dictionaries inconsistencies.
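The "obvious instances" mentioned above, where the same word carries different polarities in different dictionaries, can be found with a direct comparison, as in this Python sketch. It covers only that simple case; the complex inconsistencies that the paper detects by reduction to satisfiability need a SAT solver and are not attempted here. The function name and the dictionary layout are assumptions.

```python
def conflicting_words(dictionaries):
    """Find words that are assigned different polarities by different dictionaries.

    dictionaries -- mapping of dictionary name to {word: polarity label}
    Returns {word: {dictionary name: polarity}} for each conflicting word.
    """
    seen = {}
    for name, entries in dictionaries.items():
        for word, polarity in entries.items():
            seen.setdefault(word, {})[name] = polarity

    return {
        word: by_dictionary
        for word, by_dictionary in seen.items()
        if len(set(by_dictionary.values())) > 1
    }
```

For instance, if a hypothetical dictionary "A" marks "cheap" as negative and dictionary "B" marks it as positive, the function reports "cheap" together with both labels, while words on which the dictionaries agree are omitted.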

RRW—A Robust and Reversible Watermarking Technique for Relational Data

Data mining
Name: RRW—A Robust and Reversible Watermarking Technique for Relational Data
Technology: Dot net, MS SQL
Category: Data Mining
Description: Advancement in information technology is playing an increasing role in the use of information systems comprising relational databases. These databases are used effectively in collaborative environments for information extraction; consequently, they are vulnerable to security threats concerning ownership rights and data tampering. Watermarking is advocated to enforce ownership rights over shared relational data and to provide a means for tackling data tampering. When ownership rights are enforced using watermarking, the underlying data undergoes certain modifications, as a result of which the data quality gets compromised. Reversible watermarking is employed to ensure data quality along with data recovery. However, such techniques are usually not robust against malicious attacks and do not provide any mechanism to selectively watermark a particular attribute by taking into account its role in knowledge discovery. Therefore,…

Access Control Mechanisms for Outsourced Data in Cloud

Cloud Computing
Name: Access Control Mechanisms for Outsourced Data in Cloud
Technology: Dot net, MS SQL
Category: Data Mining, Cloud Computing
Description: Traditional access control models often assume that the entity enforcing access control policies is also the owner of data and resources. This assumption no longer holds when data is outsourced to a third-party storage provider, such as the cloud. Existing access control solutions mainly focus on preserving confidentiality of stored data from unauthorized access and the storage provider. However, in this setting, access control policies as well as users' access patterns also become privacy-sensitive information that should be protected from the cloud. We propose a two-level access control scheme that combines coarse-grained access control enforced at the cloud, which makes it possible to get acceptable communication overhead and…

Building Confidential and Efficient Query Services in the Cloud with RASP Data Perturbation

Cloud Computing
Name: Building Confidential and Efficient Query Services in the Cloud with RASP Data Perturbation
Technology: Dot net, MS SQL
Category: Cloud Computing
Description: With the wide deployment of public cloud computing infrastructures, using clouds to host data query services has become an appealing solution for the advantages on scalability and cost-saving. However, some data might be sensitive that the data owner does not want to move to the cloud unless the data confidentiality and query privacy are guaranteed. On the other hand, a secured query service should still provide efficient query processing and significantly reduce the in-house workload to fully realize the benefits of cloud computing. We propose the random space perturbation (RASP) data perturbation method to provide secure and efficient range query and kNN query services for protected data in the cloud. The…

A Location- and Diversity-aware News Feed System for Mobile Users

Mobile Computing, Security and Encryption
Name: A Location- and Diversity-aware News Feed System for Mobile Users
Technology: Dot net, MS SQL
Category: Mobile Computing, Security
Description: A location-aware news feed system enables mobile users to share geo-tagged user-generated messages, e.g., a user can receive nearby messages that are the most relevant to her. In this paper, we present MobiFeed, a framework designed for scheduling news feeds for mobile users. MobiFeed consists of three key functions: location prediction, relevance measure, and news feed scheduler. The location prediction function is designed to predict a mobile user's locations based on an existing path prediction algorithm. The relevance measure function is implemented by combining the vector space model with non-spatial and spatial factors to determine the relevance of a message to a user. The news feed scheduler works…

Generating Searchable Public-Key Ciphertexts with Hidden Structures for Fast Keyword Search

Security and Encryption
Name: Generating Searchable Public-Key Ciphertexts with Hidden Structures for Fast Keyword Search
Technology: Dot net, MS SQL
Category: Security
Description: Existing semantically secure public-key searchable encryption schemes take search time linear with the total number of the ciphertexts. This makes retrieval from large-scale databases prohibitive. To alleviate this problem, this paper proposes Searchable Public-Key Ciphertexts with Hidden Structures (SPCHS) for keyword search as fast as possible without sacrificing semantic security of the encrypted keywords. In SPCHS, all keyword-searchable ciphertexts are structured by hidden relations, and with the search trapdoor corresponding to a keyword, the minimum information of the relations is disclosed to a search algorithm as the guidance to find all matching ciphertexts efficiently. We construct a SPCHS scheme from scratch in which the ciphertexts have a hidden star-like structure.…

Panda: Public Auditing for Shared Data with Efficient User Revocation in the Cloud

Cloud Computing, Data mining, Parallel And Distributed System, Security and Encryption, Web Application
With data storage and sharing services in the cloud, users can easily modify and share data as a group. To ensure shared data integrity can be verified publicly, users in the group need to compute signatures on all the blocks in shared data. Different blocks in shared data are generally signed by different users due to data modifications performed by different users. For security reasons, once a user is revoked from the group, the blocks which were previously signed by this revoked user must be re-signed by an existing user. The straightforward method, which allows an existing user to download the corresponding part of shared data and re-sign it during user revocation, is inefficient due to the large size of shared data in the cloud. In this paper, we propose…

Identity-Based Distributed Provable Data Possession in Multicloud Storage

Cloud Computing
Remote data integrity checking is of crucial importance in cloud storage. It can make the clients verify whether their outsourced data is kept intact without downloading the whole data. In some application scenarios, the clients have to store their data on multi-cloud servers. At the same time, the integrity checking protocol must be efficient in order to save the verifier’s cost. From the two points, we propose a novel remote data integrity checking model: ID-DPDP (identity-based distributed provable data possession) in multi-cloud storage. The formal system model and security model are given. Based on the bilinear pairings, a concrete ID-DPDP protocol is designed. The proposed ID-DPDP protocol is provably secure under the hardness assumption of the standard CDH (computational Diffie-Hellman) problem. In addition to the structural advantage of elimination of…

Query Aware Determinization of Uncertain Objects

Data mining
This paper considers the problem of determinizing probabilistic data to enable such data to be stored in legacy systems that accept only deterministic input. Probabilistic data may be generated by automated data analysis/enrichment techniques such as entity resolution, information extraction, and speech processing. The legacy system may correspond to pre-existing web applications such as Flickr, Picasa, etc. The goal is to generate a deterministic representation of probabilistic data that optimizes the quality of the end-application built on deterministic data. We explore such a determinization problem in the context of two different data processing tasks -- triggers and selection queries. We show that approaches such as thresholding or top-1 selection traditionally used for determinization lead to suboptimal performance for such applications. Instead, we develop a query-aware strategy and show its advantages…
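The two baseline strategies named above, thresholding and top-1 selection, are simple to state in code. The Python sketch below shows only those baselines, which the abstract reports as suboptimal; the query-aware strategy developed in the paper is not reproduced. The function names, the tag-probability layout, and the default threshold of 0.5 are assumptions.

```python
def threshold_determinize(tag_probs, tau=0.5):
    """Keep every candidate value whose probability is at least tau."""
    return sorted(tag for tag, prob in tag_probs.items() if prob >= tau)


def top1_determinize(tag_probs):
    """Keep only the single most probable candidate value."""
    return max(tag_probs, key=tag_probs.get)
```

For a photo whose hypothetical tag probabilities are `{"beach": 0.7, "sunset": 0.55, "dog": 0.2}`, thresholding at 0.5 stores both "beach" and "sunset", while top-1 selection stores only "beach". Neither choice takes into account which queries or triggers will later run over the stored data.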

Discovery of Ranking Fraud for Mobile Apps

Data mining
Ranking fraud in the mobile App market refers to fraudulent or deceptive activities which have a purpose of bumping up the Apps in the popularity list. Indeed, it becomes more and more frequent for App developers to use shady means, such as inflating their Apps’ sales or posting phony App ratings, to commit ranking fraud. While the importance of preventing ranking fraud has been widely recognized, there is limited understanding and research in this area. To this end, in this paper, we provide a holistic view of ranking fraud and propose a ranking fraud detection system for mobile Apps. Specifically, we investigate two types of evidences, ranking based evidences and rating based evidences, by modeling Apps’ ranking and rating behaviors through statistical hypotheses tests. In addition, we propose an optimization…

Control Cloud Data Access Privilege and Anonymity with Fully Anonymous Attribute-Based Encryption

Cloud Computing
Cloud computing is a revolutionary computing paradigm which enables flexible, on-demand and low-cost usage of computing resources, but the data is outsourced to some cloud servers, and various privacy concerns emerge from it. Various schemes based on the Attribute-Based Encryption have been proposed to secure the cloud storage. However, most work focuses on the data contents privacy and the access control, while less attention is paid to the privilege control and the identity privacy. In this paper, we present a semi-anonymous privilege control scheme AnonyControl to address not only the data privacy but also the user identity privacy in existing access control schemes. AnonyControl decentralizes the central authority to limit the identity leakage and thus achieves semi-anonymity. Besides, it also generalizes the file access control to the privilege control, by which…