Facilitating Document Annotation Using Content and Querying Value

Data mining, Web | Desktop Application
A large number of organizations today generate and share textual descriptions of their products, services, and actions. Such collections of textual data contain a significant amount of structured information, which remains buried in the unstructured text. While information extraction algorithms facilitate the extraction of structured relations, they are often expensive and inaccurate, especially when operating on top of text that does not contain any instances of the targeted structured information. We present a novel alternative approach that facilitates the generation of structured metadata by identifying documents that are likely to contain information of interest, where this information will subsequently be useful for querying the database. Our approach relies on the idea that humans are more likely to add the necessary metadata during creation time, if prompted by the interface; or that it is much easier…

Context-Based Diversification for Keyword Queries Over XML Data

Data mining, Web | Desktop Application
While keyword queries empower ordinary users to search vast amounts of data, their ambiguity makes them difficult to answer effectively, especially when they are short and vague. To address this challenging problem, in this paper we propose an approach that automatically diversifies XML keyword search based on its different contexts in the XML data. Given a short and vague keyword query and XML data to be searched, we first derive keyword search candidates of the query by a simple feature selection model. We then design an effective XML keyword search diversification model to measure the quality of each candidate. After that, two efficient algorithms are proposed to incrementally compute top-k qualified query candidates as the diversified search intentions. Two selection criteria are targeted: the k selected query candidates are most relevant to…

Customizable Point-of-Interest Queries in Road Networks

Data mining, Web | Desktop Application
… networks within interactive applications. We show that partition-based algorithms developed for point-to-point shortest path computations can be naturally extended to handle augmented queries such as finding the closest restaurant or the best post office to stop at on the way home, always ranking points of interest (POIs) according to a user-defined cost function. Our solution allows different trade-offs between indexing effort (time and space) and query time. Our most flexible variant allows the road network to change frequently (to account for traffic information or personalized cost functions) and the set of POIs to be specified at query time. Even in this fully dynamic scenario, our solution is fast enough for interactive applications on continental road networks.
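As a rough illustration of a "closest POI under a user-defined cost function" query, the sketch below uses plain Dijkstra search on a toy graph. It is not the partition-based algorithm the abstract refers to; the graph layout, the `closest_poi` function, and the edge attributes `km` and `min` are all hypothetical names chosen for this example, and edge costs are assumed to be non-negative.

```python
import heapq

def closest_poi(graph, source, pois, cost_fn):
    """Return (poi, cost) for the cheapest-to-reach POI, or None if unreachable.

    graph:   dict mapping node -> list of (neighbor, edge_attributes)
    pois:    set of nodes that count as points of interest
    cost_fn: user-defined function mapping edge_attributes -> non-negative cost
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    settled = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in settled:
            continue
        settled.add(node)
        # Dijkstra settles nodes in order of increasing cost, so the first
        # settled POI is the cheapest one under cost_fn.
        if node in pois:
            return node, d
        for neighbor, attrs in graph.get(node, []):
            new_d = d + cost_fn(attrs)
            if new_d < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_d
                heapq.heappush(heap, (new_d, neighbor))
    return None

# Toy road network (hypothetical data): each edge carries a length and a travel time.
graph = {
    "home": [("a", {"km": 1, "min": 5}), ("b", {"km": 4, "min": 3})],
    "a": [("cafe", {"km": 1, "min": 10})],
    "b": [("diner", {"km": 1, "min": 1})],
}
pois = {"cafe", "diner"}

by_distance = closest_poi(graph, "home", pois, lambda e: e["km"])
by_time = closest_poi(graph, "home", pois, lambda e: e["min"])
```

Swapping the cost function changes which POI ranks first: the cafe is closest by distance, while the diner is closest by travel time.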

Entity Linking with a Knowledge Base: Issues, Techniques, and Solutions

Data mining, Web | Desktop Application
The large number of potential applications from bridging web data with knowledge bases has led to an increase in entity linking research. Entity linking is the task of linking entity mentions in text with their corresponding entities in a knowledge base. Potential applications include information extraction, information retrieval, and knowledge base population. However, this task is challenging due to name variations and entity ambiguity. In this survey, we present a thorough overview and analysis of the main approaches to entity linking, and discuss various applications, the evaluation of entity linking systems, and future directions.

Tweet Segmentation and Its Application to Named Entity Recognition

Data mining, Web | Desktop Application
Twitter has attracted millions of users to share and disseminate the most up-to-date information, resulting in large volumes of data produced every day. However, many applications in Information Retrieval (IR) and Natural Language Processing (NLP) suffer severely from the noisy and short nature of tweets. In this paper, we propose a novel framework for tweet segmentation in a batch mode, called HybridSeg. By splitting tweets into meaningful segments, the semantic or context information is well preserved and easily extracted by the downstream applications. HybridSeg finds the optimal segmentation of a tweet by maximizing the sum of the stickiness scores of its candidate segments. The stickiness score considers the probability of a segment being a phrase in English (i.e., global context) and the probability of a segment being a phrase within the batch of tweets (i.e., local…
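The idea of choosing the segmentation that maximizes the sum of per-segment stickiness scores can be sketched with a small dynamic program. This is only an illustration under stated assumptions: the `best_segmentation` function, the `max_len` cap, and the toy scoring table are hypothetical, and the real HybridSeg stickiness function (which combines global and local context) is not reproduced here.

```python
def best_segmentation(words, stickiness, max_len=3):
    """Split `words` into contiguous segments maximizing the summed stickiness.

    stickiness: function taking a tuple of words and returning a score
    max_len:    longest segment considered (an assumption of this sketch)
    """
    n = len(words)
    best = [float("-inf")] * (n + 1)  # best[i] = best score for words[:i]
    best[0] = 0.0
    back = [0] * (n + 1)              # back[i] = start of the last segment
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            score = best[start] + stickiness(tuple(words[start:end]))
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Walk the back-pointers to recover the chosen segments.
    segments = []
    end = n
    while end > 0:
        start = back[end]
        segments.append(words[start:end])
        end = start
    segments.reverse()
    return segments, best[n]

# Toy scores (hypothetical): known phrases score high, single words score low,
# and unknown multi-word segments score zero.
toy_scores = {("new", "york"): 2.0}

def toy_stickiness(segment):
    if segment in toy_scores:
        return toy_scores[segment]
    return 0.1 if len(segment) == 1 else 0.0

segments, score = best_segmentation("i love new york".split(), toy_stickiness)
```

With these toy scores the phrase "new york" is kept together as one segment, while the remaining words stay as single-word segments.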

Co-Extracting Opinion Targets and Opinion Words from Online Reviews Based on the Word Alignment Model

Data mining, Web | Desktop Application
Mining opinion targets and opinion words from online reviews are important tasks for fine-grained opinion mining, the key component of which involves detecting opinion relations among words. To this end, this paper proposes a novel approach based on the partially supervised alignment model, which regards identifying opinion relations as an alignment process. Then, a graph-based co-ranking algorithm is exploited to estimate the confidence of each candidate. Finally, candidates with higher confidence are extracted as opinion targets or opinion words. Compared to previous methods based on the nearest-neighbor rules, our model captures opinion relations more precisely, especially for long-span relations. Compared to syntax-based methods, our word alignment model effectively alleviates the negative effects of parsing errors…

Polarity Consistency Checking for Domain Independent Sentiment Dictionaries

Data mining, Web | Desktop Application
Polarity classification of words is important for applications such as Opinion Mining and Sentiment Analysis. A number of sentiment word/sense dictionaries have been manually or (semi)automatically constructed. We notice that these sentiment dictionaries have numerous inaccuracies. Besides obvious instances, where the same word appears with different polarities in different dictionaries, the dictionaries exhibit complex cases of polarity inconsistency, which cannot be detected by mere manual inspection. We introduce the concept of polarity consistency of words/senses in sentiment dictionaries in this paper. We show that the consistency problem is NP-complete. We reduce the polarity consistency problem to the satisfiability problem and utilize two fast SAT solvers to detect inconsistencies in a sentiment dictionary. We perform experiments on five sentiment dictionaries and WordNet to show inter- and intra-dictionary inconsistencies.

RRW—A Robust and Reversible Watermarking Technique for Relational Data

Data mining, Web | Desktop Application
Advancement in information technology is playing an increasing role in the use of information systems comprising relational databases. These databases are used effectively in collaborative environments for information extraction; consequently, they are vulnerable to security threats concerning ownership rights and data tampering. Watermarking is advocated to enforce ownership rights over shared relational data and to provide a means of tackling data tampering. When ownership rights are enforced using watermarking, the underlying data undergoes certain modifications, as a result of which the data quality is compromised. Reversible watermarking is employed to ensure data quality along with data recovery. However, such techniques are usually not robust against malicious attacks and do not provide any mechanism to selectively watermark a particular attribute by taking into account its role in knowledge discovery. Therefore, reversible watermarking is required that ensures: (i) watermark encoding and decoding by…

Access Control Mechanisms for Outsourced Data in Cloud

Cloud Computing, Web | Desktop Application
Traditional access control models often assume that the entity enforcing access control policies is also the owner of data and resources. This assumption no longer holds when data is outsourced to a third-party storage provider, such as the cloud. Existing access control solutions mainly focus on preserving confidentiality of stored data from unauthorized access and the storage provider. However, in this setting, access control policies as well as users' access patterns also become privacy-sensitive information that should be protected from the cloud. We propose a two-level access control scheme that combines coarse-grained access control enforced at the cloud, which allows acceptable communication overhead and at the same time limits the information that the cloud learns…

Building Confidential and Efficient Query Services in the Cloud with RASP Data Perturbation

Cloud Computing, Web | Desktop Application
With the wide deployment of public cloud computing infrastructures, using clouds to host data query services has become an appealing solution for its advantages in scalability and cost savings. However, some data might be so sensitive that the data owner does not want to move it to the cloud unless data confidentiality and query privacy are guaranteed. On the other hand, a secured query service should still provide efficient query processing and significantly reduce the in-house workload to fully realize the benefits of cloud computing. We propose the random space perturbation (RASP) data perturbation method to provide secure and efficient range query and kNN query services for protected data in the cloud. The RASP data perturbation method combines order preserving encryption, dimensionality expansion, random…

A Location- and Diversity-aware News Feed System for Mobile Users

Android Mobile development, Security and Encryption
A location-aware news feed system enables mobile users to share geo-tagged user-generated messages, e.g., a user can receive nearby messages that are the most relevant to her. In this paper, we present MobiFeed, a framework designed for scheduling news feeds for mobile users. MobiFeed consists of three key functions: location prediction, relevance measure, and news feed scheduler. The location prediction function is designed to predict a mobile user’s locations based on an existing path prediction algorithm. The relevance measure function is implemented by combining the vector space model with non-spatial and spatial factors to determine the relevance of a message to a user. The news feed scheduler works with the other two functions to generate news feeds for…

Generating Searchable Public-Key Ciphertexts with Hidden Structures for Fast Keyword Search

Security and Encryption, Web | Desktop Application
Existing semantically secure public-key searchable encryption schemes take search time linear in the total number of ciphertexts. This makes retrieval from large-scale databases prohibitive. To alleviate this problem, this paper proposes Searchable Public-Key Ciphertexts with Hidden Structures (SPCHS) for keyword search as fast as possible without sacrificing semantic security of the encrypted keywords. In SPCHS, all keyword-searchable ciphertexts are structured by hidden relations, and with the search trapdoor corresponding to a keyword, the minimum information of the relations is disclosed to a search algorithm as the guidance to find all matching ciphertexts efficiently. We construct an SPCHS scheme from scratch in which the ciphertexts have a hidden star-like structure. We prove our scheme to be semantically secure in…

Panda: Public Auditing for Shared Data with Efficient User Revocation in the Cloud

Cloud Computing, Data mining, Parallel And Distributed System, Security and Encryption, Web | Desktop Application
With data storage and sharing services in the cloud, users can easily modify and share data as a group. To ensure shared data integrity can be verified publicly, users in the group need to compute signatures on all the blocks in shared data. Different blocks in shared data are generally signed by different users due to data modifications performed by different users. For security reasons, once a user is revoked from the group, the blocks which were previously signed by this revoked user must be re-signed by an existing user. The straightforward method, which allows an existing user to download the corresponding part of shared data and re-sign it during user revocation, is inefficient due to the…

Identity-Based Distributed Provable Data Possession in Multicloud Storage

Cloud Computing, Web | Desktop Application
Remote data integrity checking is of crucial importance in cloud storage. It enables clients to verify whether their outsourced data is kept intact without downloading the whole data. In some application scenarios, the clients have to store their data on multi-cloud servers. At the same time, the integrity checking protocol must be efficient in order to save the verifier’s cost. Based on these two points, we propose a novel remote data integrity checking model: ID-DPDP (identity-based distributed provable data possession) in multi-cloud storage. The formal system model and security model are given. Based on the bilinear pairings, a concrete ID-DPDP protocol is designed. The proposed ID-DPDP protocol is provably secure under the hardness assumption of the standard CDH (computational Diffie-Hellman) problem. In…

Query Aware Determinization of Uncertain Objects

Data mining, Web | Desktop Application
This paper considers the problem of determinizing probabilistic data to enable such data to be stored in legacy systems that accept only deterministic input. Probabilistic data may be generated by automated data analysis/enrichment techniques such as entity resolution, information extraction, and speech processing. The legacy system may correspond to pre-existing web applications such as Flickr, Picasa, etc. The goal is to generate a deterministic representation of probabilistic data that optimizes the quality of the end-application built on deterministic data. We explore such a determinization problem in the context of two different data processing tasks: triggers and selection queries. We show that approaches such as thresholding or top-1 selection traditionally used for determinization lead to suboptimal performance for such applications. Instead, we develop a…
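To make the two baseline strategies named above concrete, here is a minimal sketch of thresholding and top-1 selection applied to a single uncertain object. The function names, the threshold `tau`, and the tag probabilities are hypothetical illustrations; the query-aware method the paper develops is not shown.

```python
def threshold_determinize(tag_probs, tau=0.5):
    """Keep every tag whose probability is at least tau."""
    return {tag for tag, p in tag_probs.items() if p >= tau}

def top1_determinize(tag_probs):
    """Keep only the single most probable tag (empty set if there are none)."""
    if not tag_probs:
        return set()
    return {max(tag_probs, key=tag_probs.get)}

# Hypothetical output of an automated tagger for one photo.
photo_tags = {"beach": 0.7, "sea": 0.4, "dog": 0.1}

kept_by_threshold = threshold_determinize(photo_tags, tau=0.3)
kept_by_top1 = top1_determinize(photo_tags)
```

Both baselines decide what to keep without looking at the queries that will later run over the stored data, which is the limitation a query-aware approach targets.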

Discovery of Ranking Fraud for Mobile Apps

Data mining
Ranking fraud in the mobile App market refers to fraudulent or deceptive activities intended to bump Apps up the popularity list. Indeed, it is becoming more and more frequent for App developers to use shady means, such as inflating their Apps’ sales or posting phony App ratings, to commit ranking fraud. While the importance of preventing ranking fraud has been widely recognized, there is limited understanding and research in this area. To this end, in this paper, we provide a holistic view of ranking fraud and propose a ranking fraud detection system for mobile Apps. Specifically, we investigate two types of evidence, ranking-based evidence and rating-based evidence, by modeling Apps’ ranking and rating behaviors through statistical hypotheses…

Control Cloud Data Access Privilege and Anonymity with Fully Anonymous Attribute-Based Encryption

Cloud Computing, Web | Desktop Application
Cloud computing is a revolutionary computing paradigm which enables flexible, on-demand and low-cost usage of computing resources, but the data is outsourced to cloud servers, and various privacy concerns emerge from this. Various schemes based on Attribute-Based Encryption have been proposed to secure cloud storage. However, most work focuses on data content privacy and access control, while less attention is paid to privilege control and identity privacy. In this paper, we present a semi-anonymous privilege control scheme, AnonyControl, to address not only data privacy but also user identity privacy in existing access control schemes. AnonyControl decentralizes the central authority to limit identity leakage and thus achieves semi-anonymity. Besides, it…