Interpreting the Public Sentiment Variations on Twitter

Data mining
Interpreting the Public Sentiment Variations on Twitter Millions of users share their opinions on Twitter, making it a valuable platform for tracking and analyzing public sentiment. Such tracking and analysis can provide critical information for decision making in various domains. Therefore it has attracted attention in both academia and industry. Previous research mainly focused on modeling and tracking public sentiment. In this work, we move one step further to interpret sentiment variations. We observed that emerging topics (named foreground topics) within the sentiment variation periods are highly related to the genuine reasons behind the variations. Based on this observation, we propose a Latent Dirichlet Allocation (LDA) based model, Foreground and Background LDA (FB-LDA), to distill foreground topics and filter out longstanding background topics. These foreground topics can give potential interpretations of…

Product Aspect Ranking and Its Applications

Data mining
Product Aspect Ranking and Its Applications Numerous consumer reviews of products are now available on the Internet. Consumer reviews contain rich and valuable knowledge for both firms and users. However, the reviews are often disorganized, leading to difficulties in information navigation and knowledge acquisition. This article proposes a product aspect ranking framework, which automatically identifies the important aspects of products from online consumer reviews, aiming at improving the usability of the numerous reviews. The important product aspects are identified based on two observations: 1) the important aspects are usually commented on by a large number of consumers and 2) consumer opinions on the important aspects greatly influence their overall opinions on the product. In particular, given the consumer reviews of a product, we first identify product aspects by a shallow dependency parser…

Supporting Privacy Protection in Personalized Web Search

Data mining, Web | Desktop Application
Supporting Privacy Protection in Personalized Web Search Personalized web search (PWS) has demonstrated its effectiveness in improving the quality of various search services on the Internet. However, evidence shows that users’ reluctance to disclose their private information during search has become a major barrier to the wide proliferation of PWS. We study privacy protection in PWS applications that model user preferences as hierarchical user profiles. We propose a PWS framework called UPS that can adaptively generalize profiles by queries while respecting user-specified privacy requirements. Our runtime generalization aims at striking a balance between two predictive metrics that evaluate the utility of personalization and the privacy risk of exposing the generalized profile. We present two greedy algorithms, namely GreedyDP and GreedyIL, for runtime generalization. We also provide an online prediction…

Keyword Query Routing

Data mining, Web | Desktop Application
Keyword Query Routing Keyword search is an intuitive paradigm for searching linked data sources on the web. We propose to route keywords only to relevant sources to reduce the high cost of processing keyword search queries over all sources. We propose a novel method for computing top-k routing plans based on their potentials to contain results for a given keyword query. We employ a keyword-element relationship summary that compactly represents relationships between keywords and the data elements mentioning them. A multilevel scoring mechanism is proposed for computing the relevance of routing plans based on scores at the level of keywords, data elements, element sets, and subgraphs that connect these elements. Experiments carried out using 150 publicly available sources on the web showed that valid plans (precision@1 of 0.92) that are…

Set Predicates in SQL: Enabling Set-Level Comparisons for Dynamically Formed Groups

Data mining, Web | Desktop Application
Set Predicates in SQL: Enabling Set-Level Comparisons for Dynamically Formed Groups In data warehousing and OLAP applications, scalar-level predicates in SQL become increasingly inadequate to support a class of operations that require set-level comparison semantics, i.e., comparing a group of tuples with multiple values. Currently, complex SQL queries composed of scalar-level operations are often formed to obtain even very simple set-level semantics. Such queries are not only difficult to write but also challenging for a database engine to optimize, and can thus result in costly evaluation. This paper proposes to augment SQL with set predicates, to bring out otherwise obscured set-level semantics. We studied two approaches to processing set predicates: an aggregate function-based approach and a bitmap index-based approach. Moreover, we designed a histogram-based probabilistic method of set predicate selectivity…

An Empirical Performance Evaluation of Relational Keyword Search Techniques

Data mining
An Empirical Performance Evaluation of Relational Keyword Search Techniques Extending the keyword search paradigm to relational data has been an active area of research within the database and IR community during the past decade. Many approaches have been proposed, but despite numerous publications, there remains a severe lack of standardization for the evaluation of proposed search techniques. Lack of standardization has resulted in contradictory results from different evaluations, and the numerous discrepancies muddle what advantages are proffered by different approaches. In this paper, we present the most extensive empirical performance evaluation of relational keyword search techniques to appear to date in the literature. Our results indicate that many existing search techniques do not provide acceptable performance for realistic retrieval tasks. In particular, memory consumption precludes many search techniques from scaling…

Facilitating Document Annotation Using Content and Querying Value

Data mining, Web | Desktop Application
Facilitating Document Annotation Using Content and Querying Value A large number of organizations today generate and share textual descriptions of their products, services, and actions. Such collections of textual data contain a significant amount of structured information, which remains buried in the unstructured text. While information extraction algorithms facilitate the extraction of structured relations, they are often expensive and inaccurate, especially when operating on top of text that does not contain any instances of the targeted structured information. We present a novel alternative approach that facilitates the generation of the structured metadata by identifying documents that are likely to contain information of interest and this information is going to be subsequently useful for querying the database. Our approach relies on the idea that humans are more likely to add the necessary metadata during creation time, if prompted by the interface; or that it is much easier…

Context-Based Diversification for Keyword Queries Over XML Data

Data mining, Web | Desktop Application
Context-Based Diversification for Keyword Queries Over XML Data While keyword queries empower ordinary users to search vast amounts of data, the ambiguity of keyword queries makes it difficult to answer them effectively, especially when they are short and vague. To address this challenging problem, in this paper we propose an approach that automatically diversifies XML keyword search based on its different contexts in the XML data. Given a short and vague keyword query and XML data to be searched, we first derive keyword search candidates of the query by a simple feature selection model. We then design an effective XML keyword search diversification model to measure the quality of each candidate. After that, two efficient algorithms are proposed to incrementally compute top-k qualified query candidates as the diversified search intentions. Two selection criteria are targeted: the k selected query candidates are most relevant to…

Customizable Point-of-Interest Queries in Road Networks

Data mining, Web | Desktop Application
Customizable Point-of-Interest Queries in Road Networks … networks within interactive applications. We show that partition-based algorithms developed for point-to-point shortest path computations can be naturally extended to handle augmented queries such as finding the closest restaurant or the best post office to stop on the way home, always ranking POIs according to a user-defined cost function. Our solution allows different trade-offs between indexing effort (time and space) and query time. Our most flexible variant allows the road network to change frequently (to account for traffic information or personalized cost functions) and the set of POIs to be specified at query time. Even in this fully dynamic scenario, our solution is fast enough for interactive applications on continental road networks.

Entity Linking with a Knowledge Base: Issues, Techniques, and Solutions

Data mining, Web | Desktop Application
Entity Linking with a Knowledge Base: Issues, Techniques, and Solutions The large number of potential applications from bridging web data with knowledge bases has led to an increase in the entity linking research. Entity linking is the task to link entity mentions in text with their corresponding entities in a knowledge base. Potential applications include information extraction, information retrieval, and knowledge base population. However, this task is challenging due to name variations and entity ambiguity. In this survey, we present a thorough overview and analysis of the main approaches to entity linking, and discuss various applications, the evaluation of entity linking systems, and future directions.

Tweet Segmentation and Its Application to Named Entity Recognition

Data mining, Web | Desktop Application
Tweet Segmentation and Its Application to Named Entity Recognition Twitter has attracted millions of users to share and disseminate the most up-to-date information, resulting in large volumes of data produced every day. However, many applications in Information Retrieval (IR) and Natural Language Processing (NLP) suffer severely from the noisy and short nature of tweets. In this paper, we propose a novel framework for tweet segmentation in a batch mode, called HybridSeg. By splitting tweets into meaningful segments, the semantic or context information is well preserved and easily extracted by the downstream applications. HybridSeg finds the optimal segmentation of a tweet by maximizing the sum of the stickiness scores of its candidate segments. The stickiness score considers the probability of a segment being a phrase in English (i.e., global context) and the probability of a segment being a phrase within the batch of tweets (i.e., local…
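
The idea of choosing the segmentation that maximizes the sum of stickiness scores can be sketched with a small dynamic program. This is only an illustrative sketch, not the HybridSeg implementation: the function names, the `max_len` limit, and the toy lookup-table stickiness function are all assumptions made for the example, and the real score combines global and local context as described in the abstract.

```python
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[str, ...]

def segment_tokens(tokens: Sequence[str],
                   stickiness: Callable[[Segment], float],
                   max_len: int = 3) -> List[List[str]]:
    """Return the split of `tokens` with the highest total stickiness.

    `stickiness` and `max_len` are hypothetical parameters for this sketch.
    """
    n = len(tokens)
    best = [float("-inf")] * (n + 1)  # best[i]: best score for tokens[:i]
    best[0] = 0.0
    back = [0] * (n + 1)              # back[i]: start of the last segment
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            score = best[start] + stickiness(tuple(tokens[start:end]))
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Walk the back-pointers to recover the segments.
    segments: List[List[str]] = []
    end = n
    while end > 0:
        start = back[end]
        segments.append(list(tokens[start:end]))
        end = start
    segments.reverse()
    return segments

def toy_stickiness(segment: Segment) -> float:
    """Toy stand-in: known phrases score high, single words low, the rest zero."""
    known_phrases = {("new", "york"): 1.0}
    if segment in known_phrases:
        return known_phrases[segment]
    return 0.2 if len(segment) == 1 else 0.0

if __name__ == "__main__":
    print(segment_tokens(["new", "york", "is", "great"], toy_stickiness))
```

With the toy scores, the phrase "new york" is kept together because its score exceeds the sum of the scores of its individual words.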

Co-Extracting Opinion Targets and Opinion Words from Online Reviews Based on the Word Alignment Model

Data mining, Web | Desktop Application
Co-Extracting Opinion Targets and Opinion Words from Online Reviews Based on the Word Alignment Model Mining opinion targets and opinion words from online reviews are important tasks for fine-grained opinion mining, the key component of which involves detecting opinion relations among words. To this end, this paper proposes a novel approach based on the partially supervised alignment model, which regards identifying opinion relations as an alignment process. Then, a graph-based co-ranking algorithm is exploited to estimate the confidence of each candidate. Finally, candidates with higher confidence are extracted as opinion targets or opinion words. Compared to previous methods based on the nearest-neighbor rules, our model captures opinion relations more precisely, especially for long-span relations. Compared to syntax-based methods, our word alignment model effectively alleviates the negative effects of parsing errors…

Polarity Consistency Checking for Domain Independent Sentiment Dictionaries

Data mining, Web | Desktop Application
Polarity Consistency Checking for Domain Independent Sentiment Dictionaries Polarity classification of words is important for applications such as Opinion Mining and Sentiment Analysis. A number of sentiment word/sense dictionaries have been manually or (semi)automatically constructed. We notice that these sentiment dictionaries have numerous inaccuracies. Besides obvious instances, where the same word appears with different polarities in different dictionaries, the dictionaries exhibit complex cases of polarity inconsistency, which cannot be detected by mere manual inspection. We introduce the concept of polarity consistency of words/senses in sentiment dictionaries in this paper. We show that the consistency problem is NP-complete. We reduce the polarity consistency problem to the satisfiability problem and utilize two fast SAT solvers to detect inconsistencies in a sentiment dictionary. We perform experiments on five sentiment dictionaries and WordNet to show inter- and intra-dictionaries inconsistencies.

RRW—A Robust and Reversible Watermarking Technique for Relational Data

Data mining, Web | Desktop Application
RRW—A Robust and Reversible Watermarking Technique for Relational Data Advancement in information technology is playing an increasing role in the use of information systems comprising relational databases. These databases are used effectively in collaborative environments for information extraction; consequently, they are vulnerable to security threats concerning ownership rights and data tampering. Watermarking is advocated to enforce ownership rights over shared relational data and to provide a means for tackling data tampering. When ownership rights are enforced using watermarking, the underlying data undergoes certain modifications, as a result of which the data quality gets compromised. Reversible watermarking is employed to ensure data quality along with data recovery. However, such techniques are usually not robust against malicious attacks and do not provide any mechanism to selectively watermark a particular attribute by taking into account its role in knowledge discovery. Therefore, reversible watermarking is required that ensures: (i) watermark encoding and decoding by…

Product Aspect Ranking and Its Applications

Cloud Computing, Data mining, Security and Encryption, Web | Desktop Application
Product Aspect Ranking and Its Applications Numerous consumer reviews of products are now available on the Internet. Consumer reviews contain rich and valuable knowledge for both firms and users. However, the reviews are often disorganized, leading to difficulties in information navigation and knowledge acquisition. This article proposes a product aspect ranking framework, which automatically identifies the important aspects of products from online consumer reviews, aiming at improving the usability of the numerous reviews. The important product aspects are identified based on two observations: 1) the important aspects are usually commented on by a large number of consumers and 2) consumer opinions on the important aspects greatly influence their overall opinions on the product. In particular, given the consumer reviews of a product, we first identify product aspects by a shallow…

Typicality-Based Collaborative Filtering Recommendation

Cloud Computing, Data mining, Security and Encryption
Typicality-Based Collaborative Filtering Recommendation Collaborative filtering (CF) is an important and popular technology for recommender systems. However, current CF methods suffer from such problems as data sparsity, recommendation inaccuracy, and large errors in predictions. In this paper, we borrow ideas of object typicality from cognitive psychology and propose a novel typicality-based collaborative filtering recommendation method named TyCo. A distinct feature of typicality-based CF is that it finds “neighbors” of users based on user typicality degrees in user groups (instead of the co-rated items of users, or common users of items, as in traditional CF). To the best of our knowledge, there has been no prior work on investigating CF recommendation by combining object typicality. TyCo outperforms many CF recommendation methods on recommendation accuracy (in terms of MAE) with an improvement of…

Panda: Public Auditing for Shared Data with Efficient User Revocation in the Cloud

Cloud Computing, Data mining, Parallel And Distributed System, Security and Encryption, Web | Desktop Application
Panda: Public Auditing for Shared Data with Efficient User Revocation in the Cloud With data storage and sharing services in the cloud, users can easily modify and share data as a group. To ensure shared data integrity can be verified publicly, users in the group need to compute signatures on all the blocks in shared data. Different blocks in shared data are generally signed by different users due to data modifications performed by different users. For security reasons, once a user is revoked from the group, the blocks which were previously signed by this revoked user must be re-signed by an existing user. The straightforward method, which allows an existing user to download the corresponding part of shared data and re-sign it during user revocation, is inefficient due to the…

Query Aware Determinization of Uncertain Objects

Data mining, Web | Desktop Application
Query Aware Determinization of Uncertain Objects This paper considers the problem of determinizing probabilistic data to enable such data to be stored in legacy systems that accept only deterministic input. Probabilistic data may be generated by automated data analysis/enrichment techniques such as entity resolution, information extraction, and speech processing. The legacy system may correspond to pre-existing web applications such as Flickr, Picasa, etc. The goal is to generate a deterministic representation of probabilistic data that optimizes the quality of the end-application built on deterministic data. We explore such a determinization problem in the context of two different data processing tasks -- triggers and selection queries. We show that approaches such as thresholding or top-1 selection traditionally used for determinization lead to suboptimal performance for such applications. Instead, we develop a…

Discovery of Ranking Fraud for Mobile Apps

Data mining
Discovery of Ranking Fraud for Mobile Apps Ranking fraud in the mobile App market refers to fraudulent or deceptive activities which have a purpose of bumping up the Apps in the popularity list. Indeed, it becomes more and more frequent for App developers to use shady means, such as inflating their Apps’ sales or posting phony App ratings, to commit ranking fraud. While the importance of preventing ranking fraud has been widely recognized, there is limited understanding and research in this area. To this end, in this paper, we provide a holistic view of ranking fraud and propose a ranking fraud detection system for mobile Apps. Specifically, we investigate two types of evidence, ranking-based evidence and rating-based evidence, by modeling Apps’ ranking and rating behaviors through statistical hypotheses…

A Query Formulation Language for the data web

Data mining
A Query Formulation Language for the data web We present a query formulation language called MashQL in order to easily query and fuse structured data on the web. The main novelty of MashQL is that it allows people with limited IT skills to explore and query one or multiple data sources without prior knowledge about the schema, structure, vocabulary, or any technical details of these sources. More importantly, to be robust and cover most cases in practice, we do not assume that a data source should have an offline or inline schema. This poses several language-design and performance complexities that we fundamentally tackle. To illustrate the query formulation power of MashQL, and without loss of generality, we chose the Data Web scenario. We also chose querying RDF, as it is the…

Efficient and Accurate Discovery of Patterns in Sequence Data Sets

Data mining, Web | Desktop Application
Efficient and Accurate Discovery of Patterns in Sequence Data Sets Existing sequence mining algorithms mostly focus on mining for subsequences. However, a large class of applications, such as biological DNA and protein motif mining, require efficient mining of “approximate” patterns that are contiguous. The few existing algorithms that can be applied to such contiguous approximate pattern mining have drawbacks like poor scalability, lack of guarantees in finding the pattern, and difficulty in adapting to other applications. In this paper, we present a new algorithm called Flexible and Accurate Motif DEtector (FLAME). FLAME is a flexible suffix-tree-based algorithm that can be used to find frequent patterns with a variety of definitions of motif (pattern) models. It is also accurate, as it always finds the pattern if it exists. Using both real…

Mining Web Graphs for Recommendations

Data mining, Web | Desktop Application
Mining Web Graphs for Recommendations With the exponential explosion of content generated on the Web, recommendation techniques have become increasingly indispensable. Innumerable different kinds of recommendations are made on the Web every day, including music, image, and book recommendations, query suggestions, etc. No matter what types of data sources are used for the recommendations, essentially these data sources can be modeled in the form of graphs. In this paper, aiming at providing a general framework on mining Web graphs for recommendations, (1) we first propose a novel diffusion method which propagates similarities between different recommendations; (2) then we illustrate how to generalize different recommendation problems into our graph diffusion framework. The proposed framework can be utilized in many recommendation tasks on the World Wide Web, including query suggestions, image recommendations,…

Improving Aggregate Recommendation Diversity Using Ranking-Based Techniques

Data mining, Web | Desktop Application
Improving Aggregate Recommendation Diversity Using Ranking-Based Techniques Recommender systems are becoming increasingly important to individual users and businesses for providing personalized recommendations. However, while the majority of algorithms proposed in recommender systems literature have focused on improving recommendation accuracy, other important aspects of recommendation quality, such as the diversity of recommendations, have often been overlooked. In this paper, we introduce and explore a number of item ranking techniques that can generate recommendations that have substantially higher aggregate diversity across all users while maintaining comparable levels of recommendation accuracy. Comprehensive empirical evaluation consistently shows the diversity gains of the proposed techniques using several real-world rating datasets and different rating prediction algorithms.

Predicting missing items in shopping cart using fast algorithm

Data mining, Web | Desktop Application
Predicting missing items in shopping cart using fast algorithm Prediction in a shopping cart uses partial information about the contents of the cart to predict what else the customer is likely to buy. In order to reduce the rule mining cost, a fast algorithm that generates frequent itemsets without generating candidate itemsets is proposed. The algorithm uses Boolean vectors with the relational AND operation to discover frequent itemsets and generate the association rules. Association rules are used to identify relationships among a set of items in a database. Initially, a Boolean matrix is generated by transforming the database into Boolean values. The frequent itemsets are generated from the Boolean matrix. Then association rules are generated from the already generated frequent itemsets. The association rules generated form the basis for prediction. The…
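
The Boolean-matrix step can be illustrated with a short sketch in which each item becomes a bit vector over transactions and the support of an itemset is the number of set bits left after AND-ing those vectors. This is a hedged illustration only: it enumerates small itemsets by brute force to keep the code short, so it does not reproduce the candidate-free generation the abstract refers to, and all function names and the `max_size` limit are assumptions.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

def boolean_columns(transactions: List[Set[str]]) -> Dict[str, int]:
    """Encode each item as an integer bit vector: bit r is set if row r has the item."""
    columns: Dict[str, int] = {}
    for row, transaction in enumerate(transactions):
        for item in transaction:
            columns[item] = columns.get(item, 0) | (1 << row)
    return columns

def support(itemset: Tuple[str, ...], columns: Dict[str, int], n_rows: int) -> int:
    """Count the transactions containing every item, using bitwise AND."""
    mask = (1 << n_rows) - 1  # start with all rows selected
    for item in itemset:
        mask &= columns[item]
    return bin(mask).count("1")

def frequent_itemsets(transactions: List[Set[str]],
                      min_support: int,
                      max_size: int = 3) -> Dict[Tuple[str, ...], int]:
    """Brute-force enumeration (an assumption of this sketch) with AND-based support."""
    columns = boolean_columns(transactions)
    n_rows = len(transactions)
    result: Dict[Tuple[str, ...], int] = {}
    for size in range(1, max_size + 1):
        for itemset in combinations(sorted(columns), size):
            count = support(itemset, columns, n_rows)
            if count >= min_support:
                result[itemset] = count
    return result

if __name__ == "__main__":
    carts = [{"bread", "milk"}, {"bread", "butter"},
             {"bread", "milk", "butter"}, {"milk"}]
    for itemset, count in frequent_itemsets(carts, min_support=2).items():
        print(itemset, count)
```

Association rules could then be derived from the resulting supports, for example by comparing the support of an itemset with that of its subsets.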

A Threshold-based Similarity Measure for Duplicate Detection

Data mining, Web | Desktop Application
A Threshold-based Similarity Measure for Duplicate Detection In order to extract beneficial information and recognize a particular pattern from huge data stored in different databases with different formats, data integration is essential. However, the problem that arises here is that data integration may lead to duplication. In other words, due to the availability of data in different formats, there might be some records which refer to the same entity. Duplicate detection or record linkage is a technique which is used to detect and match duplicate records which are generated in the data integration process. Most approaches concentrate on string similarity measures for comparing records. However, they fail to identify records which share the semantic information. So, in this study, a threshold-based method which takes into account both string and semantic similarity…

Efficient Multi-dimensional Fuzzy Search for Personal Information Management Systems

Data mining, Web | Desktop Application
Efficient Multi-dimensional Fuzzy Search for Personal Information Management Systems With the explosion in the amount of semi-structured data users access and store in personal information management systems, there is a critical need for powerful search tools to retrieve often very heterogeneous data in a simple and efficient way. Existing tools typically support some IR-style ranking on the textual part of the query, but only consider structure (e.g., file directory) and metadata (e.g., date, file type) as filtering conditions. We propose a novel multi-dimensional search approach that allows users to perform fuzzy searches for structure and metadata conditions in addition to keyword conditions. Our techniques individually score each dimension and integrate the three dimension scores into a meaningful unified score. We also design indexes and algorithms to efficiently identify the most…

Enabling Multilevel Trust in Privacy Preserving Data Mining

Data mining, Web | Desktop Application
Enabling Multilevel Trust in Privacy Preserving Data Mining Privacy Preserving Data Mining (PPDM) addresses the problem of developing accurate models about aggregated data without access to precise information in individual data records. A widely studied perturbation-based PPDM approach introduces random perturbation to individual values to preserve privacy before data are published. Previous solutions of this approach are limited in their tacit assumption of single-level trust on data miners. In this work, we relax this assumption and expand the scope of perturbation-based PPDM to Multilevel Trust (MLT-PPDM). In our setting, the more trusted a data miner is, the less perturbed copy of the data it can access. Under this setting, a malicious data miner may have access to differently perturbed copies of the same data through various means, and may combine…

Slicing: A New Approach to Privacy Preserving Data Publishing

Data mining, Security and Encryption, Web | Desktop Application
Slicing: A New Approach to Privacy Preserving Data Publishing Several anonymization techniques, such as generalization and bucketization, have been designed for privacy preserving microdata publishing. Recent work has shown that generalization loses a considerable amount of information, especially for high-dimensional data. Bucketization, on the other hand, does not prevent membership disclosure and does not apply to data that do not have a clear separation between quasi-identifying attributes and sensitive attributes.

Advance Mining of Temporal High Utility Itemset

Data mining, Web | Desktop Application
Advance Mining of Temporal High Utility Itemset The stock market domain is a dynamic and unpredictable environment. Traditional techniques, such as fundamental and technical analysis, can provide investors with some tools for managing their stocks and predicting their prices. However, these techniques cannot discover all the possible relations between stocks and thus there is a need for a different approach that will provide a deeper kind of analysis. Data mining can be used extensively in the financial markets and help in stock-price forecasting. Therefore, we propose in this paper a portfolio management solution with business intelligence characteristics. We know that the temporal high utility itemsets are the itemsets with support larger than a pre-specified threshold in the current time window of a data stream. Discovery of temporal high utility itemsets is an…

A Framework for Personal Mobile Commerce Pattern Mining and Prediction

Data mining, Web | Desktop Application
A Framework for Personal Mobile Commerce Pattern Mining and Prediction In many applications, including location based services, queries may not be precise. In this paper, we study the problem of efficiently computing range aggregates in a multidimensional space when the query location is uncertain. Specifically, for a query point Q whose location is uncertain and a set S of points in a multidimensional space, we want to calculate the aggregate (e.g., count, average and sum) over the subset S′ of S such that for each p ∈ S′, Q has at least probability θ within the distance γ to p. We propose novel, efficient techniques to solve the problem following the filtering-and-verification paradigm. In particular, two novel filtering techniques are proposed to effectively and efficiently remove data points from…
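
The range-aggregate definition above can be made concrete with a brute-force sketch. It assumes, purely for illustration, that the uncertain query location Q is given as a finite list of weighted sample positions whose weights sum to 1; the function name and this discrete representation are assumptions, and the filtering techniques mentioned in the abstract are not reproduced here.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def uncertain_range_count(points: Sequence[Point],
                          q_samples: List[Tuple[Point, float]],
                          gamma: float,
                          theta: float) -> int:
    """Count points p for which Q lies within distance gamma of p
    with probability at least theta.

    `q_samples` is a hypothetical discrete distribution for Q:
    (position, probability) pairs whose probabilities sum to 1.
    """
    count = 0
    for p in points:
        # Probability mass of the Q positions that fall within gamma of p.
        prob = sum(weight for q, weight in q_samples
                   if math.dist(p, q) <= gamma)
        if prob >= theta:
            count += 1
    return count

if __name__ == "__main__":
    q = [((0.0, 0.0), 0.5), ((2.0, 0.0), 0.5)]
    data = [(1.0, 0.0), (3.0, 0.0), (10.0, 10.0)]
    print(uncertain_range_count(data, q, gamma=1.5, theta=0.6))
```

Every point is verified against every sample here, which is the cost that a filtering step would aim to avoid by discarding most points before verification.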

Investigation and Analysis of New Approach of Intelligent Semantic Web Search Engines

Data mining, Web | Desktop Application
Investigation and Analysis of New Approach of Intelligent Semantic Web Search Engines The World Wide Web allows people to share huge amounts of information from large database repositories, and the amount of information keeps growing across billions of databases. Searching for particular information in these huge databases therefore requires specialized mechanisms that help to retrieve it efficiently. Nowadays various types of search engines are available, which makes retrieving information difficult. Semantic web search engines play a vital role in providing a better solution to this problem. The main aim of this kind of search engine is to provide the required information in a short time with maximum accuracy.

Sequential Anomaly Detection in the Presence of Noise and Limited Feedback

Data mining, Web | Desktop Application
Sequential Anomaly Detection in the Presence of Noise and Limited Feedback This paper describes a methodology for detecting anomalies from sequentially observed and potentially noisy data. The proposed approach consists of two main elements: (1) filtering, or assigning a belief or likelihood to each successive measurement based upon our ability to predict it from previous noisy observations, and (2) hedging, or flagging potential anomalies by comparing the current belief against a time-varying and data-adaptive threshold. The threshold is adjusted based on the available feedback from an end user. Our algorithms, which combine universal prediction with recent work on online convex programming, do not require computing posterior distributions given all current observations and involve simple primal-dual parameter updates. At the heart of the proposed approach lie exponential-family models which can be…

Clustering Methods in Data Mining with its Applications in High Education

Data mining, Web | Desktop Application
Clustering Methods in Data Mining with its Applications in High Education Data mining is a new technology, developing alongside database and artificial intelligence technologies. It is the process of extracting credible, novel, effective and understandable patterns from databases. Cluster analysis is an important data mining technique used to find data segmentation and pattern information. By clustering the data, people can obtain the data distribution, observe the character of each cluster, and make further study on particular clusters. In addition, cluster analysis usually acts as a preprocessing step for other data mining operations. Therefore, cluster analysis has become a very active research topic in data mining. With the development of data mining, a number of clustering methods have been proposed. The study of clustering technique from the perspective of statistics, based on…

A Novel Algorithm for Automatic Document Clustering

Data mining, Web | Desktop Application
A Novel Algorithm for Automatic Document Clustering The Internet has become an indispensable part of today’s life. The World Wide Web (WWW) is the largest shared information source. Finding relevant information on the WWW is challenging. Even with today’s search engines, it is difficult to search through the large number of documents returned in response to a user query. There is a need to organize a large set of documents into categories through clustering. The documents can be the results of a user query or simply a collection of documents. Document clustering is the task of grouping a set of documents into clusters so that documents within a cluster are more similar to each other than to documents in other clusters. Partitioning and hierarchical algorithms are commonly used for document clustering. Existing partitioning algorithms have the limitation…
Read More

Dynamic Personalized Recommendation on Sparse Data

Data mining, Web | Desktop Application
Dynamic Personalized Recommendation on Sparse Data Recommendation techniques are very important in the fields of E-commerce and other Web-based services. One of the main difficulties is dynamically providing high-quality recommendations on sparse data. In this paper, a novel dynamic personalized recommendation algorithm is proposed, in which information contained in both ratings and profile contents is utilized by exploring latent relations between ratings, a set of dynamic features is designed to describe user preferences in multiple phases, and finally a recommendation is made by adaptively weighting the features. Experimental results on public datasets show that the proposed algorithm has satisfactory performance.
Read More

Efficient Algorithms for Mining High Utility Itemsets from Transactional Databases

Data mining, Web | Desktop Application
Efficient Algorithms for Mining High Utility Itemsets from Transactional Databases Mining high utility itemsets from a transactional database refers to the discovery of itemsets with high utility, such as profit. Although a number of relevant algorithms have been proposed in recent years, they suffer from the problem of producing a large number of candidate itemsets for high utility itemsets. Such a large number of candidate itemsets degrades the mining performance in terms of execution time and space requirement. The situation may become worse when the database contains many long transactions or long high utility itemsets. In this paper, we propose two algorithms, namely utility pattern growth (UP-Growth) and UP-Growth+, for mining high utility itemsets with a set of effective strategies for pruning candidate itemsets. The information of high utility itemsets is…
Read More
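
To make the notion of utility in this abstract concrete, here is a small brute-force sketch. It is not UP-Growth or UP-Growth+: it enumerates every candidate itemset, which is exactly the cost those algorithms are designed to avoid. The function names and the convention that a transaction maps each item to a purchased quantity are assumptions for illustration.

```python
from itertools import combinations


def itemset_utility(itemset, transactions, profit):
    """Sum quantity * unit profit over transactions that contain every item."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * profit[item] for item in itemset)
    return total


def high_utility_itemsets(transactions, profit, min_utility):
    """Brute force: test every itemset (exponential in the number of items)."""
    items = sorted({item for t in transactions for item in t})
    result = {}
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            u = itemset_utility(candidate, transactions, profit)
            if u >= min_utility:
                result[candidate] = u
    return result


# Hypothetical toy database: item -> quantity per transaction.
transactions = [{"a": 1, "b": 2}, {"a": 2, "c": 1}, {"a": 1, "b": 1, "c": 3}]
profit = {"a": 5, "b": 2, "c": 1}
print(high_utility_itemsets(transactions, profit, min_utility=15))
```

Note that utility is not anti-monotone: a superset can have higher utility than its subsets, which is why support-style pruning cannot be applied directly.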

Sensitive Label Privacy Protection on Social Network Data

Data mining, Web | Desktop Application
Sensitive Label Privacy Protection on Social Network Data This paper is motivated by the recognition of the need for a finer grain and more personalized privacy in data publication of social networks. We propose a privacy protection scheme that prevents not only the disclosure of users' identities but also the disclosure of selected features in users' profiles. An individual user can select which features of her profile she wishes to conceal. The social networks are modeled as graphs in which users are nodes and features are labels. Labels are denoted either as sensitive or as non-sensitive. We treat node labels both as background knowledge an adversary may possess, and as sensitive information that has to be protected. We present privacy protection algorithms that allow for graph data to be…
Read More

Privacy against Aggregate Knowledge Attacks

Data mining, Web | Desktop Application
Privacy against Aggregate Knowledge Attacks This paper focuses on protecting the privacy of individuals in publication scenarios where the attacker is expected to have only abstract or aggregate knowledge about each record. Whereas data privacy research usually focuses on defining stricter privacy guarantees that assume increasingly more sophisticated attack scenarios, it is also important to have anonymization methods and guarantees that will address any attack scenario. Enforcing a stricter guarantee than required unnecessarily increases the information loss. Consider, for example, the publication of tax records, where attackers might only know the total income, and not its constituent parts. Traditional anonymization methods would protect user privacy by creating equivalence classes of identical records. Alternatively, in this work we propose an anonymization technique that generalizes attributes, only as much…
Read More

Adapting a Ranking Model for Domain-Specific Search

Data mining, Web | Desktop Application
Adapting a Ranking Model for Domain-Specific Search An adaptation process is described that adapts a ranking model constructed for a broad-based search engine for use with a domain-specific ranking model. It is difficult to apply the broad-based ranking model directly to different domains because of domain differences, while building a unique ranking model for each domain is time-consuming. In this paper, we address these difficulties by proposing an algorithm called ranking adaptation SVM (RA-SVM). Our algorithm requires only the predictions from the existing ranking models, rather than their internal representations or the data from auxiliary domains. The ranking model is adapted for use in a search environment focusing on a specific segment of online content, for example, a specific topic, media type, or genre of content. A domain-specific ranking model…
Read More

Efficient Similarity Search over Encrypted Data

Data mining, Web | Desktop Application
Efficient Similarity Search over Encrypted Data A large amount of data has been stored in the cloud. Although cloud-based services offer many advantages, the privacy and security of sensitive data is a big concern. To mitigate these concerns, it is desirable to outsource sensitive data in encrypted form. Encrypted storage protects the data against illegal access, but it complicates some basic yet important functionality, such as search over the data. To achieve search over encrypted data without compromising privacy, a considerable number of searchable encryption schemes have been proposed in the literature. However, almost all of them handle exact query matching but not similarity matching, a crucial requirement for real-world applications. Although some sophisticated secure multi-party computation based cryptographic techniques are available for similarity tests, they are computationally intensive…
Read More

A Bayesian Approach to Filtering Junk E-Mail

Data mining, Web | Desktop Application
A Bayesian Approach to Filtering Junk E-Mail In addressing the growing problem of junk E-mail on the Internet, we examine methods for the automated construction of filters to eliminate such unwanted messages from a user's mail stream. By casting this problem in a decision theoretic framework, we are able to make use of probabilistic learning methods in conjunction with a notion of differential misclassification cost to produce filters which are especially appropriate for the nuances of this task. While this may appear, at first, to be a straight-forward text classification problem, we show that by considering domain-specific features of this problem in addition to the raw text of E-mail messages, we can produce much more accurate filters. Finally, we show the efficacy of such…
Read More
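
The idea of combining a probabilistic classifier with a differential misclassification cost can be sketched as follows. This is a generic Bernoulli naive Bayes filter with Laplace smoothing, not the authors' implementation; the function names, the `cost_ratio` parameter, and the toy messages are assumptions. If wrongly discarding a legitimate message is taken to be `cost_ratio` times worse than letting junk through, the expected-cost rule is to label a message junk only when P(junk | message) exceeds cost_ratio / (1 + cost_ratio).

```python
import math
from collections import Counter


def train(messages):
    """messages: list of (tokens, is_junk). Assumes both classes are present."""
    counts = {True: Counter(), False: Counter()}
    docs = {True: 0, False: 0}
    for tokens, is_junk_label in messages:
        docs[is_junk_label] += 1
        counts[is_junk_label].update(set(tokens))  # word presence, not frequency
    vocab = set(counts[True]) | set(counts[False])
    return counts, docs, vocab


def junk_probability(tokens, model):
    """Posterior P(junk | message) under Bernoulli naive Bayes."""
    counts, docs, vocab = model
    present = set(tokens)
    log_score = {}
    for label in (True, False):
        lp = math.log(docs[label] / (docs[True] + docs[False]))
        for word in vocab:
            p = (counts[label][word] + 1) / (docs[label] + 2)  # Laplace smoothing
            lp += math.log(p if word in present else 1 - p)
        log_score[label] = lp
    top = max(log_score.values())
    weights = {k: math.exp(v - top) for k, v in log_score.items()}
    return weights[True] / (weights[True] + weights[False])


def is_junk(tokens, model, cost_ratio=9.0):
    """Flag as junk only if the posterior clears the cost-derived threshold."""
    return junk_probability(tokens, model) > cost_ratio / (1 + cost_ratio)


# Hypothetical toy training set.
model = train([
    (["free", "money"], True),
    (["free", "offer"], True),
    (["meeting", "tomorrow"], False),
    (["project", "meeting"], False),
])
```

Raising `cost_ratio` makes the filter more conservative: a message that is flagged at a ratio of 9 may be let through at a ratio of 99.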

Opinion Mining for web search

Data mining, Web | Desktop Application
Opinion Mining for web search Generally, a search engine retrieves information using PageRank, distance vector algorithms, crawling, etc., on the basis of the user’s query. However, the links retrieved by the search engine may or may not be exactly related to the user’s query, and the user has to check every link to learn whether the needed information is present in the document, which becomes a tedious and time-consuming job. Our focus is to cluster different documents based on subjective similarities and dissimilarities. Our proposed tool, ‘Web Search Miner’, is based on the concept of user opinion mining; it uses the k-means search algorithm and a distance measure based on term frequency and web document frequency for mining the search…
Read More

Distributed Association rule mining : Market basket Analysis

Data mining
Distributed Association rule mining : Market basket Analysis Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by retrospective tools typical of decision support systems. Data mining tools can answer business questions that traditionally were too time consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.
Read More

web usage mining using apriori

Data mining, Web | Desktop Application
web usage mining using apriori The enormous amount of information on the World Wide Web makes it an obvious candidate for data mining research. The application of data mining techniques to the World Wide Web is referred to as Web mining, a term that has been used in three distinct ways: Web Content Mining, Web Structure Mining, and Web Usage Mining. E-Learning is one Web-based application that faces a large amount of data. In order to produce the E-Learning portal usage patterns and user behaviors, this paper implements the high-level process of Web Usage Mining using an advanced association rules algorithm called the D-Apriori algorithm. Web Usage Mining consists of three main phases, namely Data Preprocessing, Pattern Discovering and Pattern Analysis. Server log files become a set of raw…
Read More
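
For readers unfamiliar with the pattern discovery phase, the following is a minimal sketch of the classic Apriori algorithm applied to sessions of visited pages. It is plain Apriori, not the D-Apriori variant named in the abstract, and the session data, page names, and function name are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frequent itemset: support} using level-wise candidate generation."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {c for c in current if support(c) >= min_support}
    frequent = {}
    k = 1
    while current:
        for c in current:
            frequent[c] = support(c)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
            and support(c) >= min_support
        }
    return frequent


# Hypothetical sessions, as might be extracted from preprocessed server logs.
sessions = [
    {"home", "courses", "quiz"},
    {"home", "courses"},
    {"home", "forum"},
    {"courses", "quiz"},
]
frequent_pages = apriori(sessions, min_support=0.5)
```

The prune step relies on the fact that support can only shrink as an itemset grows, so a candidate with an infrequent subset never needs to be counted.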

Sales & Inventory Prediction using Data Mining

Data mining, Web | Desktop Application
Sales & Inventory Prediction using Data Mining Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by retrospective tools typical of decision support systems. Data mining tools can answer business questions that traditionally were too time consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.
Read More

Hiding Sensitive Association Rule for Privacy Preservation

Data mining, Web | Desktop Application
Hiding Sensitive Association Rule for Privacy Preservation Data mining techniques have been widely used in various applications. However, the misuse of these techniques may lead to the disclosure of sensitive information. Researchers have recently made efforts at hiding sensitive association rules. Nevertheless, undesired side effects, e.g., non-sensitive rules falsely hidden and spurious rules falsely generated, may be produced in the rule hiding process. In this paper, we present a novel approach that strategically modifies a few transactions in the transaction database to decrease the supports or confidences of sensitive rules without producing these side effects. Since the correlation among rules can make it impossible to achieve this goal, we propose heuristic methods for increasing the number of hidden sensitive rules and reducing the number of modified…
Read More

Effective Pattern Discovery for Text Mining

Data mining, Web | Desktop Application
Effective Pattern Discovery for Text Mining Many data mining techniques have been proposed for mining useful patterns in text documents. However, how to effectively use and update discovered patterns is still an open research issue, especially in the domain of text mining. Since most existing text mining methods adopted term-based approaches, they all suffer from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern (or phrase)-based approaches should perform better than the term-based ones, but many experiments do not support this hypothesis. This paper presents an innovative and effective pattern discovery technique which includes the processes of pattern deploying and pattern evolving, to improve the effectiveness of using and updating discovered patterns for finding relevant and interesting information.
Read More

Medical Disease diagnosis using Data Mining

Data mining, Web | Desktop Application
Medical Disease diagnosis using Data Mining The healthcare industry collects a huge amount of data that is not properly mined and not put to optimum use. The hidden patterns and relationships in this data often go undiscovered. Our research focuses on this aspect of medical diagnosis by learning patterns from collected data on diabetes, hepatitis, and heart disease and by developing intelligent medical decision support systems to help physicians. In this paper, we propose the use of the decision tree algorithms C4.5, ID3, and CART to classify these diseases and compare their effectiveness and correct classification rates.
Read More
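
The decision tree algorithms named above differ mainly in how they score a candidate split: ID3 uses information gain, C4.5 uses gain ratio, and CART commonly uses Gini impurity. The sketch below computes entropy, Gini impurity, and information gain on a made-up toy table; the attribute names, records, and function names are hypothetical and are not taken from the project's data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Made-up toy records for illustration only.
rows = [
    {"glucose": "high", "age": "old"},
    {"glucose": "high", "age": "young"},
    {"glucose": "normal", "age": "old"},
    {"glucose": "normal", "age": "young"},
]
labels = ["diabetic", "diabetic", "healthy", "healthy"]
best = max(["glucose", "age"], key=lambda a: information_gain(rows, labels, a))
```

On this toy table `glucose` separates the classes perfectly while `age` carries no information, so an ID3-style learner would split on `glucose` first.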