Typicality-Based Collaborative Filtering Recommendation

Cloud Computing, Data mining, Security and Encryption
Collaborative filtering (CF) is an important and popular technology for recommender systems. However, current CF methods suffer from problems such as data sparsity, recommendation inaccuracy, and large prediction errors. In this paper, we borrow the idea of object typicality from cognitive psychology and propose a novel typicality-based collaborative filtering recommendation method named TyCo. A distinctive feature of typicality-based CF is that it finds “neighbors” of users based on their typicality degrees in user groups (instead of the co-rated items of users, or the common users of items, as in traditional CF). To the best of our knowledge, there has been no prior work investigating CF recommendation in combination with object typicality. TyCo outperforms many CF recommendation methods on recommendation accuracy (in terms of MAE) with an improvement of…
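As a rough illustration of neighbor selection by typicality rather than by co-rated items, the sketch below assumes each user already has a vector of typicality degrees, one per user group (a hypothetical input; TyCo's own way of computing those degrees is not shown). Neighbors are the users with the most similar typicality vectors under cosine similarity, and a rating is predicted as their similarity-weighted average. The cosine measure and the weighted average are our assumptions, not necessarily TyCo's exact formulas.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length typicality vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def typicality_neighbors(target, typicality, k):
    """Top-k other users whose typicality vectors most resemble the target's.

    typicality: dict user -> list of typicality degrees, one entry per user group
    (hypothetical input format).
    """
    scored = [(cosine(typicality[target], vec), user)
              for user, vec in typicality.items() if user != target]
    scored.sort(reverse=True)
    return scored[:k]

def predict_rating(target, item, typicality, ratings, k=2):
    """Similarity-weighted average of the neighbors' ratings of `item` (None if no neighbor rated it)."""
    num = den = 0.0
    for sim, user in typicality_neighbors(target, typicality, k):
        if item in ratings.get(user, {}):
            num += sim * ratings[user][item]
            den += abs(sim)
    return num / den if den else None
```

For example, with typicality vectors alice [0.9, 0.1], bob [0.8, 0.2], carol [0.1, 0.9], alice's single nearest neighbor is bob, so `predict_rating("alice", "m1", ..., k=1)` returns bob's rating of m1 even though alice and bob need not share any rated items.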

Panda: Public Auditing for Shared Data with Efficient User Revocation in the Cloud

Cloud Computing, Data mining, Parallel And Distributed System, Security and Encryption, Web | Desktop Application
With data storage and sharing services in the cloud, users can easily modify and share data as a group. To ensure that the integrity of shared data can be verified publicly, users in the group need to compute signatures on all the blocks in shared data. Different blocks in shared data are generally signed by different users due to data modifications performed by different users. For security reasons, once a user is revoked from the group, the blocks which were previously signed by this revoked user must be re-signed by an existing user. The straightforward method, which allows an existing user to download the corresponding part of shared data and re-sign it during user revocation, is inefficient due to the…

Discovery of Ranking Fraud for Mobile Apps

Data mining
Ranking fraud in the mobile App market refers to fraudulent or deceptive activities intended to bump up Apps in the popularity list. Indeed, it has become more and more frequent for App developers to use shady means, such as inflating their Apps’ sales or posting phony App ratings, to commit ranking fraud. While the importance of preventing ranking fraud has been widely recognized, there is limited understanding and research in this area. To this end, in this paper, we provide a holistic view of ranking fraud and propose a ranking fraud detection system for mobile Apps. Specifically, we investigate two types of evidence, ranking-based evidence and rating-based evidence, by modeling Apps’ ranking and rating behaviors through statistical hypotheses…

A Query Formulation Language for the data web

Data mining
We present a query formulation language called MashQL to easily query and fuse structured data on the web. The main novelty of MashQL is that it allows people with limited IT skills to explore and query one or multiple data sources without prior knowledge of the schema, structure, vocabulary, or any technical details of these sources. More importantly, to be robust and cover most cases in practice, we do not assume that a data source has a schema, whether offline or inline. This poses several language-design and performance complexities that we fundamentally tackle. To illustrate the query formulation power of MashQL, and without loss of generality, we chose the Data Web scenario. We also chose querying RDF, as it is the…

Efficient and Accurate Discovery of Patterns in Sequence Data Sets

Data mining, Web | Desktop Application
Existing sequence mining algorithms mostly focus on mining for subsequences. However, a large class of applications, such as biological DNA and protein motif mining, require efficient mining of “approximate” patterns that are contiguous. The few existing algorithms that can be applied to such contiguous approximate pattern mining have drawbacks such as poor scalability, lack of guarantees in finding the pattern, and difficulty in adapting to other applications. In this paper, we present a new algorithm called Flexible and Accurate Motif DEtector (FLAME). FLAME is a flexible suffix-tree-based algorithm that can be used to find frequent patterns with a variety of definitions of motif (pattern) models. It is also accurate, as it always finds the pattern if it exists. Using both real…
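To make the target of "contiguous approximate" motif mining concrete, here is a naive brute-force sketch: enumerate every candidate string of a given length, and count how many sequences contain some contiguous window within Hamming distance d of it. This is only a reference definition under that assumed (Hamming-distance) motif model; it is exponential in the motif length and is not FLAME's suffix-tree algorithm, which exists precisely to avoid this enumeration.

```python
from itertools import product

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def occurs_approx(pattern, seq, d):
    """True if some contiguous window of seq is within Hamming distance d of pattern."""
    length = len(pattern)
    return any(hamming(pattern, seq[i:i + length]) <= d
               for i in range(len(seq) - length + 1))

def find_motifs(seqs, length, d, min_support, alphabet="ACGT"):
    """Brute force: try every candidate over the alphabet and keep those that occur
    approximately in at least min_support sequences. Returns (motif, support) pairs."""
    motifs = []
    for letters in product(alphabet, repeat=length):
        candidate = "".join(letters)
        support = sum(occurs_approx(candidate, s, d) for s in seqs)
        if support >= min_support:
            motifs.append((candidate, support))
    return motifs
```

With the sequences "ACGTAC", "TTACGA", and "GGACCT", the motif "ACG" occurs exactly in only two of them, but allowing one mismatch (d=1) it also matches "ACC" in the third, so its support rises to 3.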

Mining Web Graphs for Recommendations

Data mining, Web | Desktop Application
With the exponential growth of content generated on the Web, recommendation techniques have become increasingly indispensable. Innumerable kinds of recommendations are made on the Web every day, including music, image, and book recommendations, query suggestions, etc. No matter what types of data sources are used for the recommendations, these data sources can essentially be modeled in the form of graphs. In this paper, aiming to provide a general framework for mining Web graphs for recommendations, (1) we first propose a novel diffusion method which propagates similarities between different recommendations; (2) then we illustrate how to generalize different recommendation problems into our graph diffusion framework. The proposed framework can be utilized in many recommendation tasks on the World Wide Web, including query suggestions, image recommendations,…
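The idea of propagating similarity over a graph can be sketched with a generic restart-style diffusion: each node's score is repeatedly set to a mix of its own seed score and the weighted average of its neighbors' scores, so nodes closer to the seed end up with higher scores. This is a common diffusion scheme chosen for illustration, not necessarily the paper's exact diffusion process; the graph format and the `alpha` damping parameter are assumptions.

```python
def diffuse(graph, seeds, alpha=0.8, iters=50):
    """Propagate similarity from seed nodes over a weighted graph.

    graph: dict node -> dict neighbor -> edge weight (hypothetical format; list both
           directions for an undirected graph).
    seeds: dict node -> initial score, e.g. 1.0 for the item or query being recommended from.
    Each step: score = (1 - alpha) * seed + alpha * weighted average of neighbors' scores.
    """
    nodes = set(graph) | {m for nbrs in graph.values() for m in nbrs} | set(seeds)
    f0 = {n: seeds.get(n, 0.0) for n in nodes}
    f = dict(f0)
    for _ in range(iters):
        new = {}
        for n in nodes:
            nbrs = graph.get(n, {})
            total = sum(nbrs.values())
            spread = sum(w * f[m] for m, w in nbrs.items()) / total if total else 0.0
            new[n] = (1 - alpha) * f0[n] + alpha * spread
        f = new
    return f
```

On a chain a-b-c with a disconnected node d and a seed on a, the scores decrease along the chain (a > b > c > 0) and d stays at 0, so ranking by score recommends the nodes most strongly connected to the seed.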

Improving Aggregate Recommendation Diversity Using Ranking-Based Techniques

Data mining, Web | Desktop Application
Recommender systems are becoming increasingly important to individual users and businesses for providing personalized recommendations. However, while the majority of algorithms proposed in recommender systems literature have focused on improving recommendation accuracy, other important aspects of recommendation quality, such as the diversity of recommendations, have often been overlooked. In this paper, we introduce and explore a number of item ranking techniques that can generate recommendations that have substantially higher aggregate diversity across all users while maintaining comparable levels of recommendation accuracy. Comprehensive empirical evaluation consistently shows the diversity gains of the proposed techniques using several real-world rating datasets and different rating prediction algorithms.
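One way an item ranking technique can raise aggregate diversity is sketched below, under stated assumptions: items whose predicted rating clears a threshold are ranked least-popular first, and the remaining items follow in the usual order of predicted rating. The threshold keeps recommendations reasonably accurate while the popularity ordering pushes less-recommended items into users' top-n lists. This is one illustrative technique in that spirit; the parameter names and input formats are hypothetical.

```python
def rerank_by_popularity(predictions, popularity, threshold, n):
    """Top-n items for one user.

    predictions: dict item -> predicted rating
    popularity:  dict item -> number of ratings the item has received
    Items predicted at or above `threshold` come first, least popular first
    (ties broken by higher predicted rating); the rest follow by predicted rating.
    """
    above = [i for i, r in predictions.items() if r >= threshold]
    below = [i for i, r in predictions.items() if r < threshold]
    above.sort(key=lambda i: (popularity.get(i, 0), -predictions[i]))
    below.sort(key=lambda i: -predictions[i])
    return (above + below)[:n]
```

For a user with predictions blockbuster 4.9, mid 4.7, niche 4.6, the standard top-2 list is [blockbuster, mid]; with a threshold of 4.5 the re-ranked list becomes [niche, mid], trading a small drop in predicted rating for a long-tail item. Raising the threshold above every prediction recovers the standard ranking.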

Predicting missing items in shopping cart using fast algorithm

Data mining, Web | Desktop Application
Shopping-cart prediction uses partial information about the contents of a shopping cart to predict what else the customer is likely to buy. To reduce the rule mining cost, a fast algorithm is proposed that generates frequent itemsets without generating candidate itemsets. The algorithm uses Boolean vectors and the logical AND operation to discover frequent itemsets and generate association rules. Association rules identify relationships among a set of items in a database. First, a Boolean matrix is generated by transforming the database into Boolean values. The frequent itemsets are generated from the Boolean matrix, and association rules are then generated from these frequent itemsets. The generated association rules form the basis for prediction. The…
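The Boolean-matrix idea can be sketched with one bit vector per item (stored here as a Python int), where bit t is set when transaction t contains the item. ANDing the vectors of an itemset and counting the remaining 1-bits gives its support, and rule confidence follows from two such counts. Note that, for brevity, `frequent_itemsets` simply enumerates combinations of frequent items and tests each with an AND; it is not the candidate-free procedure the abstract describes. Function names and the `max_size` limit are hypothetical.

```python
from itertools import combinations

def boolean_matrix(transactions, items):
    """One bit vector per item: bit t is 1 if transaction t contains the item."""
    vectors = {}
    for item in items:
        bits = 0
        for t, basket in enumerate(transactions):
            if item in basket:
                bits |= 1 << t
        vectors[item] = bits
    return vectors

def support(itemset, vectors):
    """AND the (non-empty) itemset's vectors together and count the surviving 1-bits."""
    items = list(itemset)
    acc = vectors[items[0]]
    for item in items[1:]:
        acc &= vectors[item]
    return bin(acc).count("1")

def frequent_itemsets(vectors, min_count, max_size=3):
    """All itemsets (up to max_size items) with support >= min_count."""
    frequent_items = sorted(i for i, v in vectors.items() if bin(v).count("1") >= min_count)
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(frequent_items, size):
            s = support(combo, vectors)
            if s >= min_count:
                result[combo] = s
    return result

def predict_missing(cart, vectors, min_conf):
    """Items x with confidence(cart -> x) = support(cart + x) / support(cart) >= min_conf."""
    base = support(cart, vectors)
    if base == 0:
        return []
    preds = []
    for item in vectors:
        if item not in cart:
            conf = support(list(cart) + [item], vectors) / base
            if conf >= min_conf:
                preds.append((item, conf))
    return sorted(preds, key=lambda p: -p[1])
```

With the baskets {bread, milk}, {bread, butter, milk}, {bread, butter}, {milk, eggs}, the bread vector is 0b0111, and a cart holding only butter yields the prediction bread with confidence 1.0, since every basket with butter also has bread.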

A Threshold-based Similarity Measure for Duplicate Detection

Data mining, Web | Desktop Application
To extract beneficial information and recognize particular patterns from huge amounts of data stored in different databases with different formats, data integration is essential. However, data integration may lead to duplication: because data are available in different formats, some records may refer to the same entity. Duplicate detection, or record linkage, is a technique used to detect and match duplicate records generated in the data integration process. Most approaches have concentrated on string similarity measures for comparing records. However, they fail to identify records which share semantic information. So, in this study, a threshold-based method which takes into account both string and semantic similarity…
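A minimal sketch of a threshold-based measure that mixes string and semantic similarity, assuming records are dicts of text fields. String similarity uses `difflib.SequenceMatcher.ratio()` from the Python standard library; "semantic" similarity is approximated by token overlap after mapping synonyms to a canonical form through a small hypothetical `SYNONYMS` table (a stand-in for a real thesaurus or ontology). The equal weighting and the 0.8 threshold are illustrative assumptions, not the study's values.

```python
from difflib import SequenceMatcher

# Hypothetical synonym/abbreviation table standing in for a real semantic resource.
SYNONYMS = {"street": "st", "st.": "st", "road": "rd", "rd.": "rd",
            "corp": "corporation", "inc": "incorporated", "inc.": "incorporated"}

def string_sim(a, b):
    """Character-level similarity in [0, 1] from difflib's ratio()."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def canonical_tokens(text):
    """Lower-cased tokens with synonyms mapped to one canonical form."""
    return {SYNONYMS.get(tok, tok) for tok in text.lower().split()}

def semantic_sim(a, b):
    """Jaccard overlap of canonical tokens."""
    ta, tb = canonical_tokens(a), canonical_tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0

def is_duplicate(rec_a, rec_b, threshold=0.8, weight=0.5):
    """Average over shared fields of weight * string_sim + (1 - weight) * semantic_sim,
    compared against the threshold."""
    scores = [weight * string_sim(rec_a[f], rec_b[f])
              + (1 - weight) * semantic_sim(rec_a[f], rec_b[f])
              for f in rec_a.keys() & rec_b.keys()]
    return bool(scores) and sum(scores) / len(scores) >= threshold
```

"12 Main Street" and "12 Main St." differ as strings but map to identical canonical tokens, so records such as {"Acme Corp", "12 Main Street"} and {"Acme Corporation", "12 Main St."} clear the threshold even though a purely string-based score for the name field alone would be only 0.72.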

Efficient Multi-dimensional Fuzzy Search for Personal Information Management Systems

Data mining, Web | Desktop Application
With the explosion in the amount of semi-structured data users access and store in personal information management systems, there is a critical need for powerful search tools to retrieve often very heterogeneous data in a simple and efficient way. Existing tools typically support some IR-style ranking on the textual part of the query, but only consider structure (e.g., file directory) and metadata (e.g., date, file type) as filtering conditions. We propose a novel multi-dimensional search approach that allows users to perform fuzzy searches for structure and metadata conditions in addition to keyword conditions. Our techniques individually score each dimension and integrate the three dimension scores into a meaningful unified score. We also design indexes and algorithms to efficiently identify the most…
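To show what "score each dimension individually, then integrate" can look like, the sketch below gives each file a keyword score (fraction of query terms present), a fuzzy structure score (how much of the queried directory path the file's path shares), and a fuzzy date score (linear decay over a window), then combines them with a weighted sum. All three scoring functions, the 30-day window, and the weights are simplifying assumptions for illustration; the paper's own scoring and integration are more elaborate.

```python
from datetime import date

def keyword_score(query_terms, doc_terms):
    """Fraction of distinct query keywords present in the document."""
    q = set(query_terms)
    return len(q & set(doc_terms)) / len(q) if q else 0.0

def structure_score(query_dir, doc_path):
    """Fuzzy directory match: leading path components shared with the query / query components."""
    q = [p for p in query_dir.split("/") if p]
    d = [p for p in doc_path.split("/") if p]
    shared = 0
    for a, b in zip(q, d):
        if a != b:
            break
        shared += 1
    return shared / len(q) if q else 1.0

def date_score(query_date, doc_date, window_days=30):
    """1.0 on the requested date, decaying linearly to 0 at window_days away."""
    gap = abs((doc_date - query_date).days)
    return max(0.0, 1.0 - gap / window_days)

def unified_score(query, doc, weights=(0.5, 0.25, 0.25)):
    """Weighted sum of the keyword, structure, and date scores."""
    wk, ws, wd = weights
    return (wk * keyword_score(query["terms"], doc["terms"])
            + ws * structure_score(query["dir"], doc["path"])
            + wd * date_score(query["date"], doc["date"]))
```

A file stored under the queried directory but modified 15 days from the requested date still earns a partial date score (0.5) instead of being filtered out, which is the difference between fuzzy conditions and the strict filtering the abstract attributes to existing tools.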

Enabling Multilevel Trust in Privacy Preserving Data Mining

Data mining, Web | Desktop Application
Privacy Preserving Data Mining (PPDM) addresses the problem of developing accurate models about aggregated data without access to precise information in individual data records. A widely studied perturbation-based PPDM approach introduces random perturbation to individual values to preserve privacy before data are published. Previous solutions of this approach are limited by their tacit assumption of single-level trust in data miners. In this work, we relax this assumption and expand the scope of perturbation-based PPDM to Multilevel Trust (MLT-PPDM). In our setting, the more trusted a data miner is, the less perturbed the copy of the data it can access. Under this setting, a malicious data miner may have access to differently perturbed copies of the same data through various means, and may combine…
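The combination risk mentioned above can be seen with a naive multilevel scheme: release one copy per trust level with additive Gaussian noise, a smaller standard deviation for more trusted miners, and draw each copy's noise independently. The sketch below is that naive scheme (trust-level names and sigmas are hypothetical); it is not the paper's MLT-PPDM construction, and it is shown precisely because it is vulnerable.

```python
import random
import statistics

def perturb(values, sigma, rng):
    """Additive zero-mean Gaussian noise with standard deviation sigma."""
    return [v + rng.gauss(0.0, sigma) for v in values]

def release_copies(values, trust_sigmas, seed=0):
    """One perturbed copy per trust level (dict level -> sigma); smaller sigma = more trusted.

    Naive scheme: the noise in each copy is drawn independently.
    """
    rng = random.Random(seed)
    return {level: perturb(values, sigma, rng) for level, sigma in trust_sigmas.items()}

def noise_variance(original, released):
    """Empirical variance of the noise (released minus original values)."""
    return statistics.pvariance([r - o for o, r in zip(original, released)])
```

If a miner obtains two independently perturbed copies, each with noise variance 1, averaging them element-wise leaves noise variance of about 0.5, so the miner ends up with a less perturbed view than its trust level was meant to allow.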

Slicing: A New Approach to Privacy Preserving Data Publishing

Data mining, Security and Encryption, Web | Desktop Application
Several anonymization techniques, such as generalization and bucketization, have been designed for privacy preserving microdata publishing. Recent work has shown that generalization loses a considerable amount of information, especially for high-dimensional data. Bucketization, on the other hand, does not prevent membership disclosure and does not apply to data that do not have a clear separation between quasi-identifying attributes and sensitive attributes.

Advance Mining of Temporal High Utility Itemset

Data mining, Web | Desktop Application
The stock market domain is a dynamic and unpredictable environment. Traditional techniques, such as fundamental and technical analysis, can provide investors with some tools for managing their stocks and predicting their prices. However, these techniques cannot discover all the possible relations between stocks, and thus there is a need for a different approach that provides a deeper kind of analysis. Data mining can be used extensively in the financial markets and can help in stock-price forecasting. Therefore, in this paper we propose a portfolio management solution with business intelligence characteristics. Temporal high utility itemsets are the itemsets with support larger than a pre-specified threshold in the current time window of a data stream. Discovery of temporal high utility itemsets is an…
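The "current time window" part can be sketched as follows: keep only the most recent transactions of the stream and score an itemset over them. The sketch measures itemsets by utility (purchased quantity times unit profit, the usual measure in high utility mining) and checks it against a minimum threshold. The `UtilityWindow` helper, the transaction format, and the count-based window are hypothetical choices for illustration, not the paper's algorithm.

```python
from collections import deque

def itemset_utility(itemset, transaction, profit):
    """u(X, T): sum of quantity * unit profit over the items of X, or 0 if T lacks any of them."""
    if not all(item in transaction for item in itemset):
        return 0
    return sum(transaction[item] * profit[item] for item in itemset)

class UtilityWindow:
    """Holds the `size` most recent transactions of a stream (hypothetical helper).

    Each transaction is a dict item -> purchased quantity; `profit` maps item -> unit profit.
    """

    def __init__(self, size, profit):
        self.window = deque(maxlen=size)  # the oldest transaction drops out automatically
        self.profit = profit

    def add(self, transaction):
        self.window.append(transaction)

    def utility(self, itemset):
        """Utility of the itemset over the transactions currently in the window."""
        return sum(itemset_utility(itemset, t, self.profit) for t in self.window)

    def is_high_utility(self, itemset, min_util):
        return self.utility(itemset) >= min_util
```

With unit profits a=5, b=2, c=1 and a window of two transactions, adding {a:1, b:2} and {a:2, c:3} gives item a a utility of 15; once a third transaction {b:1} arrives, the first one slides out and a's utility in the current window drops to 10.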

A Framework for Personal Mobile Commerce Pattern Mining and Prediction

Data mining, Web | Desktop Application
In many applications, including location-based services, queries may not be precise. In this paper, we study the problem of efficiently computing range aggregates in a multidimensional space when the query location is uncertain. Specifically, for a query point Q whose location is uncertain and a set S of points in a multidimensional space, we want to calculate the aggregate (e.g., count, average, and sum) over the subset S′ of S such that for each p ∈ S′, Q lies within distance γ of p with probability at least θ. We propose novel, efficient techniques to solve the problem following the filtering-and-verification paradigm. In particular, two novel filtering techniques are proposed to effectively and efficiently remove data points from…
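The query semantics can be checked with a brute-force sketch, assuming the uncertain location of Q is represented by equally likely sample points. For each data point p, the probability that Q lies within distance γ is the fraction of samples within γ of p; points reaching probability θ form S′, and the aggregate is computed over them. This only illustrates what is being computed (the verification step, in effect); the paper's filtering techniques exist to avoid testing every point this way. The function names and the (coordinates, value) format are assumptions.

```python
import math

def prob_within(q_samples, p, gamma):
    """Fraction of equally likely samples of the uncertain query Q within distance gamma of p."""
    hits = sum(1 for q in q_samples if math.dist(q, p) <= gamma)
    return hits / len(q_samples)

def range_aggregate(q_samples, points, gamma, theta, agg="count"):
    """Aggregate over S' = {p in S : Pr[dist(Q, p) <= gamma] >= theta}.

    points: list of (coordinates, value) pairs; value is what "sum" and "avg" aggregate.
    """
    selected = [val for coords, val in points
                if prob_within(q_samples, coords, gamma) >= theta]
    if agg == "count":
        return len(selected)
    if agg == "sum":
        return sum(selected)
    if agg == "avg":
        return sum(selected) / len(selected) if selected else None
    raise ValueError(f"unknown aggregate: {agg}")
```

If Q is equally likely to be at any corner of the unit square, the point (0.5, 0.5) is within γ = 1 of every sample (probability 1), while (1.5, 0) is within range of only one corner (probability 0.25), so it belongs to S′ for θ = 0.25 but not for θ = 0.5.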

Investigation and Analysis of New Approach of Intelligent Semantic Web Search Engines

Data mining, Web | Desktop Application
The World Wide Web allows people to share huge amounts of information from large database repositories, and this information now spans billions of databases. Searching for particular information in these huge repositories therefore requires specialized mechanisms that retrieve it efficiently. Although various types of search engines are available today, retrieving the right information remains difficult. Semantic web search engines play a vital role in providing a better solution to this problem: their main aim is to provide the required information in a short time with maximum accuracy.

Sequential Anomaly Detection in the Presence of Noise and Limited Feedback

Data mining, Web | Desktop Application
This paper describes a methodology for detecting anomalies from sequentially observed and potentially noisy data. The proposed approach consists of two main elements: (1) filtering, or assigning a belief or likelihood to each successive measurement based upon our ability to predict it from previous noisy observations, and (2) hedging, or flagging potential anomalies by comparing the current belief against a time-varying and data-adaptive threshold. The threshold is adjusted based on the available feedback from an end user. Our algorithms, which combine universal prediction with recent work on online convex programming, do not require computing posterior distributions given all current observations and involve simple primal-dual parameter updates. At the heart of the proposed approach lie exponential-family models which can be…
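The filter-then-hedge loop can be illustrated with a deliberately simple stand-in: an exponentially weighted mean and variance predict the next value, the Gaussian log-likelihood of the actual value serves as the belief, a value is flagged when its belief falls below a threshold τ, and end-user feedback nudges τ up or down. The Gaussian predictor and the fixed-step threshold update are our assumptions; the paper instead uses exponential-family models with universal prediction and primal-dual updates from online convex programming.

```python
import math

class SequentialDetector:
    """Toy filter-and-hedge loop (hypothetical; a simplified stand-in for the paper's method).

    Filtering: an exponentially weighted mean and variance predict the next value, and the
    Gaussian log-likelihood of the actual value is used as the belief.
    Hedging: a value is flagged when its belief falls below the threshold tau.
    """

    def __init__(self, tau=-4.0, decay=0.9, step=0.5):
        self.mean, self.var = 0.0, 1.0
        self.tau, self.decay, self.step = tau, decay, step

    def observe(self, x):
        belief = (-0.5 * math.log(2 * math.pi * self.var)
                  - (x - self.mean) ** 2 / (2 * self.var))
        flagged = belief < self.tau
        # Update the running estimates only after scoring x, so x cannot predict itself.
        diff = x - self.mean
        incr = (1 - self.decay) * diff
        self.mean += incr
        self.var = max(self.decay * (self.var + diff * incr), 1e-9)
        return belief, flagged

    def feedback(self, flagged, is_anomaly):
        """Adapt tau from end-user labels: a false alarm lowers it, a missed anomaly raises it."""
        if flagged and not is_anomaly:
            self.tau -= self.step
        elif not flagged and is_anomaly:
            self.tau += self.step
```

After a stretch of values oscillating around zero, an ordinary value keeps a high belief and passes, while a sudden jump to 10 receives a very low belief and is flagged. If the user then reports a flag as a false alarm, τ drops and the detector becomes slightly less eager to flag.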

Clustering Methods in Data Mining with its Applications in High Education

Data mining, Web | Desktop Application
Data mining is a relatively new technology that has developed alongside databases and artificial intelligence. It is the process of extracting credible, novel, effective, and understandable patterns from databases. Cluster analysis is an important data mining technique used to find data segmentation and pattern information. By clustering the data, people can obtain the data distribution, observe the characteristics of each cluster, and make further study of particular clusters. In addition, cluster analysis usually acts as a preprocessing step for other data mining operations. Therefore, cluster analysis has become a very active research topic in data mining. As data mining has developed, a number of clustering methods have been proposed. The study of clustering techniques from the perspective of statistics, based on…
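As a concrete example of how a clustering method segments data, here is a minimal k-means sketch (one classic partitioning method, chosen here for illustration; the paper surveys several methods). It alternates between assigning each point to its nearest center and moving each center to the mean of its points, stopping when the centers no longer change. Random initial centers are drawn with a fixed seed so runs are repeatable.

```python
import math
import random

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c])) for p in points]

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means: returns (centers, labels)."""
    rng = random.Random(seed)
    centers = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old center if a cluster becomes empty.
            new_centers.append(tuple(sum(x) / len(members) for x in zip(*members))
                               if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)
```

On two well-separated groups, such as three points near (0, 0) and three near (10, 10), the labels split the data into those two groups, and each center lands at its group's mean, which is the "data distribution" view of clusters described above.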

A Novel Algorithm for Automatic Document Clustering

Data mining, Web | Desktop Application
The Internet has become an indispensable part of today’s life. The World Wide Web (WWW) is the largest shared information source, and finding relevant information on it is challenging. In response to a user query, today’s search engines return so many documents that searching through them is difficult. There is a need to organize a large set of documents into categories through clustering. The documents can be the results of a user query or simply a collection of documents. Document clustering is the task of grouping a set of documents into clusters so that documents within a cluster are more similar to each other than to documents in other clusters. Partitioning and hierarchical algorithms are commonly used for document clustering. Existing partitioning algorithms have the limitation…

Dynamic Personalized Recommendation on Sparse Data

Data mining, Web | Desktop Application
Recommendation techniques are very important in the fields of e-commerce and other Web-based services. One of the main difficulties is dynamically providing high-quality recommendations on sparse data. In this paper, a novel dynamic personalized recommendation algorithm is proposed in which information contained in both ratings and profile contents is utilized by exploring latent relations between ratings; a set of dynamic features is designed to describe user preferences in multiple phases; and finally, a recommendation is made by adaptively weighting the features. Experimental results on public datasets show that the proposed algorithm has satisfying performance.

Efficient Algorithms for Mining High Utility Itemsets from Transactional Databases

Data mining, Web | Desktop Application
Mining high utility itemsets from a transactional database refers to the discovery of itemsets with high utility, such as profits. Although a number of relevant algorithms have been proposed in recent years, they suffer from the problem of producing a large number of candidate itemsets for high utility itemsets. Such a large number of candidate itemsets degrades the mining performance in terms of execution time and space requirements. The situation may become worse when the database contains many long transactions or long high utility itemsets. In this paper, we propose two algorithms, namely utility pattern growth (UP-Growth) and UP-Growth+, for mining high utility itemsets with a set of effective strategies for pruning candidate itemsets. The information of high utility itemsets is…
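Why candidate pruning is needed, and how it can be done safely, can be sketched with the standard transaction-weighted utility (TWU) bound from earlier high utility mining work. An itemset's utility is not anti-monotone (a superset can have higher utility), but its TWU, the summed utility of all transactions containing it, never underestimates its utility and never increases for supersets. So any item whose TWU is below the threshold can be discarded up front. UP-Growth's own strategies tighten such overestimates further and are not reproduced here; the transaction format and names are assumptions.

```python
def transaction_utility(transaction, profit):
    """TU(T): utility of the whole transaction (dict item -> quantity)."""
    return sum(qty * profit[item] for item, qty in transaction.items())

def contains(transaction, itemset):
    return all(item in transaction for item in itemset)

def utility(itemset, db, profit):
    """Exact utility of an itemset over the database."""
    return sum(sum(t[i] * profit[i] for i in itemset) for t in db if contains(t, itemset))

def twu(itemset, db, profit):
    """Transaction-weighted utility: summed TU of every transaction containing the itemset.
    It never underestimates utility(itemset), and a superset's TWU never exceeds its subset's."""
    return sum(transaction_utility(t, profit) for t in db if contains(t, itemset))

def promising_items(db, profit, min_util):
    """Items with TWU >= min_util; no itemset containing any other item can be high utility."""
    items = {item for t in db for item in t}
    return {item for item in items if twu((item,), db, profit) >= min_util}
```

With unit profits a=5, b=2, c=1 and transactions {a:1, b:2}, {a:2, c:3}, {b:1, c:1}, item b has TWU 12, so with a minimum utility of 14 every itemset containing b is pruned without computing its exact utility, while a (TWU 22, utility 15) and c (TWU 16) remain candidates.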

Sensitive Label Privacy Protection on Social Network Data

Data mining, Web | Desktop Application
This paper is motivated by the recognition of the need for finer-grained and more personalized privacy in data publication of social networks. We propose a privacy protection scheme that prevents not only the disclosure of the identity of users but also the disclosure of selected features in users' profiles. An individual user can select which features of her profile she wishes to conceal. The social networks are modeled as graphs in which users are nodes and features are labels. Labels are denoted either as sensitive or as non-sensitive. We treat node labels both as background knowledge an adversary may possess and as sensitive information that has to be protected. We present privacy protection algorithms that allow for graph data to be…

Privacy against Aggregate Knowledge Attacks

Data mining, Web | Desktop Application
This paper focuses on protecting the privacy of individuals in publication scenarios where the attacker is expected to have only abstract or aggregate knowledge about each record. Whereas data privacy research usually focuses on defining stricter privacy guarantees that assume increasingly sophisticated attack scenarios, it is also important to have anonymization methods and guarantees that address any attack scenario. Enforcing a stricter guarantee than required unnecessarily increases information loss. Consider, for example, the publication of tax records, where attackers might only know the total income and not its constituent parts. Traditional anonymization methods would protect user privacy by creating equivalence classes of identical records. Alternatively, in this work we propose an anonymization technique that generalizes attributes only as much…

Adapting a Ranking Model for Domain-Specific Search

Data mining, Web | Desktop Application
An adaptation process is described for adapting a ranking model constructed for a broad-based search engine so that it can serve as a domain-specific ranking model. It is difficult to apply the broad-based ranking model directly to different domains because of domain differences, while building a separate ranking model for each domain is time-consuming because each model must be trained. In this paper, we address these difficulties by proposing an algorithm called ranking adaptation SVM (RA-SVM). Our algorithm only requires the predictions from the existing ranking models, rather than their internal representations or the data from auxiliary domains. The ranking model is adapted for use in a search environment focusing on a specific segment of online content, for example, a specific topic, media type, or genre of content. A domain-specific ranking model…

Efficient Similarity Search over Encrypted Data

Data mining, Web | Desktop Application
Large amounts of data have been stored in the cloud. Although cloud-based services offer many advantages, the privacy and security of sensitive data are big concerns. To mitigate these concerns, it is desirable to outsource sensitive data in encrypted form. Encrypted storage protects the data against illegal access, but it complicates some basic yet important functionality, such as search on the data. To achieve search over encrypted data without compromising privacy, a considerable number of searchable encryption schemes have been proposed in the literature. However, almost all of them handle exact query matching but not similarity matching, a crucial requirement for real-world applications. Although some sophisticated secure multi-party computation based cryptographic techniques are available for similarity tests, they are computationally intensive…

A Bayesian Approach to Filtering Junk E-Mail

Data mining, Web | Desktop Application
In addressing the growing problem of junk E-mail on the Internet, we examine methods for the automated construction of filters to eliminate such unwanted messages from a user's mail stream. By casting this problem in a decision theoretic framework, we are able to make use of probabilistic learning methods in conjunction with a notion of differential misclassification cost to produce filters which are especially appropriate for the nuances of this task. While this may appear, at first, to be a straightforward text classification problem, we show that by considering domain-specific features of this problem in addition to the raw text of E-mail messages, we can produce much more accurate filters. Finally, we show the efficacy of such…
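A minimal sketch of the probabilistic, cost-sensitive idea: a Bernoulli naive Bayes classifier over word presence (with Laplace smoothing) estimates P(junk | message), and a message is filtered only when that probability exceeds a high threshold, which encodes that discarding a legitimate message costs far more than letting junk through. The 0.999 default is an illustrative choice, and the sketch uses words only; domain-specific features of the kind the abstract mentions could be added as extra binary tokens.

```python
import math
from collections import Counter

class NaiveBayesFilter:
    """Bernoulli naive Bayes over word presence, with Laplace smoothing."""

    def __init__(self):
        self.doc_counts = Counter()  # class -> number of training messages
        self.word_counts = {"junk": Counter(), "legit": Counter()}
        self.vocab = set()

    def train(self, text, label):
        words = set(text.lower().split())
        self.doc_counts[label] += 1
        self.word_counts[label].update(words)
        self.vocab |= words

    def prob_junk(self, text):
        """Posterior P(junk | words), computed in log space for numerical stability."""
        words = set(text.lower().split())
        total = sum(self.doc_counts.values())
        logp = {}
        for c in ("junk", "legit"):
            n = self.doc_counts[c]
            lp = math.log((n + 1) / (total + 2))
            for w in self.vocab:
                p = (self.word_counts[c][w] + 1) / (n + 2)
                lp += math.log(p if w in words else 1 - p)
            logp[c] = lp
        m = max(logp.values())
        ej, el = math.exp(logp["junk"] - m), math.exp(logp["legit"] - m)
        return ej / (ej + el)

    def is_junk(self, text, threshold=0.999):
        # A high threshold reflects that losing a legitimate message is much costlier
        # than letting a junk message through.
        return self.prob_junk(text) > threshold
```

Trained on three junk and three legitimate toy messages, the filter assigns "free money now" a junk probability of about 0.992: it would be rejected under a 0.5 threshold but is kept under the default 0.999 threshold, showing how the misclassification cost, not just the posterior, drives the decision.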

Sales & Inventory Prediction using Data Mining

Data mining, Web | Desktop Application
Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by retrospective tools typical of decision support systems. Data mining tools can answer business questions that traditionally were too time-consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.

Hiding Sensitive Association Rule for Privacy Preservation

Data mining, Web | Desktop Application
Data mining techniques have been widely used in various applications. However, the misuse of these techniques may lead to the disclosure of sensitive information. Researchers have recently made efforts at hiding sensitive association rules. Nevertheless, undesired side effects, e.g., non-sensitive rules falsely hidden and spurious rules falsely generated, may be produced in the rule hiding process. In this paper, we present a novel approach that strategically modifies a few transactions in the transaction database to decrease the supports or confidences of sensitive rules without producing the side effects. Since the correlation among rules can make it impossible to achieve this goal, we propose heuristic methods for increasing the number of hidden sensitive rules and reducing the number of modified…
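The basic mechanism of lowering a sensitive rule's confidence by editing a few transactions can be sketched as follows: for a rule X → Y, delete one item of Y from transactions that support X ∪ Y (shortest transactions first, to touch as little data as possible) until the confidence falls below the mining threshold. This is a naive heuristic for illustration only, not the paper's approach; in particular it makes no attempt to avoid the side effects the abstract describes, since deleting an item also lowers the support of any non-sensitive rule that uses it.

```python
def support(itemset, db):
    """Number of transactions (sets) containing every item of itemset."""
    return sum(1 for t in db if itemset <= t)

def confidence(lhs, rhs, db):
    """conf(lhs -> rhs) = support(lhs | rhs) / support(lhs)."""
    s = support(lhs, db)
    return support(lhs | rhs, db) / s if s else 0.0

def hide_rule(db, lhs, rhs, min_conf):
    """Naive hiding heuristic: delete one RHS item from supporting transactions
    (shortest first) until conf(lhs -> rhs) drops below min_conf.
    Returns a modified copy of db and the number of transactions edited."""
    db = [set(t) for t in db]
    victim = min(rhs)  # deterministic choice of which RHS item to delete
    modified = 0
    supporting = sorted((t for t in db if (lhs | rhs) <= t), key=len)
    for t in supporting:
        if confidence(lhs, rhs, db) < min_conf:
            break
        t.discard(victim)  # t is the same set object stored in db
        modified += 1
    return db, modified
```

For the rule {a} → {b} with confidence 0.75 in the transactions {a, b}, {a, b, c}, {a, b}, {a}, {b}, hiding it at a minimum confidence of 0.5 edits the two short {a, b} transactions and leaves {a, b, c} untouched, bringing the confidence down to 0.25.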

Effective Pattern Discovery for Text Mining

Data mining, Web | Desktop Application
Many data mining techniques have been proposed for mining useful patterns in text documents. However, how to effectively use and update discovered patterns is still an open research issue, especially in the domain of text mining. Because most existing text mining methods adopt term-based approaches, they suffer from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern (or phrase)-based approaches should perform better than term-based ones, but many experiments do not support this hypothesis. This paper presents an innovative and effective pattern discovery technique, which includes the processes of pattern deploying and pattern evolving, to improve the effectiveness of using and updating discovered patterns for finding relevant and interesting information.

Medical Disease diagnosis using Data Mining

Data mining, Web | Desktop Application
The healthcare industry collects a huge amount of data which is not properly mined and not put to optimum use, so the hidden patterns and relationships in it often go unexploited. Our research focuses on this aspect of medical diagnosis: learning patterns from collected data on diabetes, hepatitis, and heart disease in order to develop intelligent medical decision support systems that help physicians. In this paper, we propose using the C4.5, ID3, and CART decision tree algorithms to classify these diseases and compare their effectiveness and correct classification rates.
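The three decision tree algorithms differ mainly in how they choose the attribute to split on. ID3 picks the attribute with the highest information gain, the drop in class entropy after splitting, as sketched below; C4.5 normalizes this into a gain ratio, and CART typically uses the Gini index. The attribute names and records in the usage note are hypothetical toy values, not clinical data.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction from splitting on attribute `attr` (the ID3 criterion)."""
    n = len(labels)
    remainder = 0.0
    for value in set(r[attr] for r in rows):
        subset = [lab for r, lab in zip(rows, labels) if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def best_attribute(rows, labels, attrs):
    """Attribute ID3 would place at this node."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))
```

In a toy set of four records where a "glucose" attribute separates the two classes perfectly and an "age" attribute is unrelated to them, glucose has an information gain of 1 bit and age 0 bits, so ID3 would split on glucose first.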