Project List

Student Career And Personality Prediction Android Application

Android Mobile development
As students go through their academics and pursue the courses that interest them, it is very important for them to assess their capabilities and identify their interests, so that they know which career area those interests and capabilities suit. This helps them improve their performance and stay motivated, so that they are directed towards their targeted career and settle in it. Recruiters, who assess candidates on many different aspects, can also use this kind of career recommender system to decide which job role a candidate should be placed in, based on his or her performance and other evaluations. This paper mainly concentrates on the career area prediction…

Heart And Diabetes Disease Prediction Using Machine Learning

Machine Learning
Heart disease (HD) is considered one of the most complex and deadly human diseases in the world. In heart disease, the heart is usually unable to pump the required amount of blood to other parts of the body to sustain normal function, and this ultimately leads to heart failure. The rate of heart disease in the United States is very high. The symptoms of heart disease include shortness of breath, physical weakness, swollen feet, and fatigue, with related signs such as elevated jugular venous pressure and peripheral edema caused by functional cardiac or noncardiac abnormalities. The investigation techniques used in the early stages to identify heart disease were complicated, and…

Movie Success Prediction Using Machine Learning

Machine Learning
A movie's revenue depends on various components such as the cast, the production budget, film critics' reviews, the movie's rating, its release year, etc. Because of these multiple components, there is no formula for predicting how much revenue a particular movie will generate. However, by analysing the revenues generated by previous movies, a model can be built that helps predict the expected revenue for a particular movie. Such a prediction could be very useful for the studios producing the movie, so that they can decide on expenses such as artist compensation, advertising, promotions in various cities, etc. accordingly. Plus it…

Airline Crash Prediction Using Machine Learning

Machine Learning
Extracting useful information from big data has always been a challenging task. Data mining is a powerful technology with great potential to extract knowledge from such data. Predictions can be made from past and related records in different fields. Risk and safety have always been important considerations in aviation. Predicting aircraft accidents will save lives and cost. This paper proposes an accident prediction system built on a large collection of past records, applying predictive data mining techniques such as Decision Tree (DT) and Naive Bayes, which cope well with large and noisy data, to predict accidents more accurately. The methods used prove able to handle noisy, unrelated and missing data.…

Rainfall Prediction Using Machine Learning

Machine Learning
Heavy rainfall prediction is a major problem for meteorological departments, as it is closely associated with the economy and human life. It is a cause of natural disasters like flood and drought, which people across the globe encounter every year. The accuracy of rainfall forecasting is of great importance for countries like India, whose economy is largely dependent on agriculture. Due to the dynamic nature of the atmosphere, statistical techniques fail to provide good accuracy for rainfall forecasting. The nonlinearity of rainfall data makes the Artificial Neural Network a better technique. A review and comparison of the different approaches and algorithms used by researchers for rainfall prediction is shown in tabular form. The intention of this paper is to give non-experts easy access to the techniques and approaches used in…

Web Based Crime Prediction

Web | Desktop Application
Crime is a common social problem affecting the quality of life and the economic growth of a society. It is considered an essential factor in determining whether or not people move to a new city and which places they should avoid when they travel. With the increase in crime, law enforcement agencies continue to demand advanced geographic information systems and new web application approaches to improve crime analytics and better protect their communities. Although crimes can occur anywhere, criminals commonly act on the crime opportunities they encounter in the areas most familiar to them. By providing a technology approach to determine the main crime hotspots and find the type, location and time of committed crimes, we hope to raise people’s awareness regarding the dangerous locations…

Web Based Health Monitoring

Web | Desktop Application
This Health Care Management application works like an online healthcare management service provider with easy-to-use, customizable options. The application is accessible from anywhere to all employees or staff of the hospital, in private or on desktops, tablets, etc. It lessens manual work and improves the quality of maintaining records and other information related to doctors, patients, billing, etc. It reduces the time needed to add any information related to the hospital and thereby reduces complexity too. The modules developed in this application are: Doctor Module, Patient Module, Chemist Module. Doctor Module: The doctor can check his or her schedule and meet the patients as well. He or she can save data related to patient illness, history of the…

Web Based E-commerce

Web | Desktop Application
The internet has changed many aspects of society, from business to recreation, from culture to communication and technology, as well as shopping and travelling. This new form of communication has provided new ways of doing business with the help of technological development. E-commerce is the new way of shopping and doing business. Technology has allowed companies to promote and sell their products in new markets, overcoming geographical borders as never before. Consumers have access to a wider market of products when they use wireless and internet technologies. Mobile devices with wide access to the Internet have allowed companies to reach consumers in more diverse ways, thus ensuring deep market penetration. Electronic commerce, or eCommerce, is a term for a business or business exchange that includes the exchange of data over…

Online Shopping Android Application

Android Mobile development
The internet has changed many aspects of society, from business to recreation, from culture to communication and technology, as well as shopping and travelling. This new form of communication has provided new ways of doing business with the help of technological development. E-commerce is the new way of shopping and doing business. Technology has allowed companies to promote and sell their products in new markets, overcoming geographical borders as never before. Consumers have access to a wider market of products when they use wireless and internet technologies. Mobile devices with wide access to the Internet have allowed companies to reach consumers in more diverse ways, thus ensuring deep market penetration. Electronic commerce, or eCommerce, is a term for a business or business exchange that includes the exchange of data…

Android Based Complaint Management System

Android Mobile development
In India there is no efficient, direct communication between the government and the public for solving problems; i.e., to get a local problem solved we have to bribe officials, and it then takes 2 months to solve something that could actually be solved in 1 month. To overcome this problem, we can develop an Android application through which the public can post petitions or complaints on the site and get them resolved within a specified time, and can also check the status of a lodged complaint or petition at any time. Initially phones were merely used for calling or texting. Nowadays, the scenario has changed. In today’s world, more focus is given on the availability of the…

E-Mechanic Finder using Android Application

MSC IT
In daily life we never know when and where we might get stuck on the road; we may not know where we are, nor be able to find the nearest mechanic. This project aims to develop an Android application that lets the user register after installing the application, look up the nearest mechanic's location and contact him personally; it uses the internet and location permissions to do so. Usually when we are stuck on the road we need to ask people where the nearest mechanic is, walk there, go back to the place where we got stuck, and then we need to get the…

Personality Prediction System Through CV Analysis

Web | Desktop Application
The proposed system is two sided: it can be candidate oriented or organization oriented. In the first case, the system recommends to the candidate a list of jobs that best fit his or her skills. In the second scenario, the recruiter publishes the specifications and requirements of available job positions, and candidates apply by submitting their CVs. The existing e-recruitment system simply scans the submitted CVs and shortlists candidates, whereas the proposed system conducts an online aptitude test and personality test, thereby predicting the personality of the candidate as well as shortlisting the candidate based on skills and decision-making ability. The objectives of the project are as stated below: • To develop a system to provide a…

Employee Timesheet Management System

Web | Desktop Application
Time sheet management is a method for recording and tracking the amount of time an employee spends working. The employee time sheet can report total hours worked or time spent working on a specific task or job. Employee time sheets are primarily used for payroll. The hours worked provide a record of the time to be paid. In many companies, only non-exempt employees have time sheets. This enables a company to accurately track and pay hours worked according to applicable laws and regulations, including the Fair Labor Standards Act (FLSA). Used in project management, employee time sheets improve project execution, decision-making and compliance with labor and government regulations. According to one definition, it is a document or a program that tracks the number of hours you work, either…

Web Based Student Attendance System

Web | Desktop Application
The student attendance management system project is used to maintain school students' attendance records. The attendance project has three user modules to run the system: Admin, Staff and Student. Initially the system will be blank. The Administrator has the rights to create standards and classrooms for the school and, at the same time, has to add staff details. The Administrator generates a unique username and password for each staff member while adding staff details. All staff maintain student attendance and generate reports month-wise and date-wise. The attendance system ASP.NET project has three main modules: Admin Module, Staff Module, Student Module. Admin Functionalities: Add Standard, Add Division / Classroom, Add Staff, Manage Complain, Leave Reports, Manage Reports. Staff Functionalities: Add Student, Fill Attendance, Manage Leave, Manage Complain, Manage Reports, Change…

Web Based Bus Booking System

Web | Desktop Application
In a bus reservation system there is a collection of buses and agents who book tickets for customers' journeys, giving the bus number and departure time of the bus. As its name suggests, it manages the details of all agents, tickets, rentals, timings and so on. It also manages the updating of these objects. The tour detail contains information about the bus that takes customers to their destination. This section also contains the booking time of the seat(s) or the collection time of the tickets, as well as the booking date and the name of the agent (which is optional) through whom the customer can reserve seats for the journey. The Bus no category contains the details of buses…

Intelligent Tutoring System for Enhancing E-Learning

Web | Desktop Application
An ETutoring system is a process of teaching and learning for students of differing abilities in the same class in a more personalised manner. The intent of ETutoring is to maximize each student's growth and individual success by meeting each student where he or she is and assisting in the learning process. Many specialists in the domain of teaching have recently identified ETutoring as a method of helping more students in diverse classroom settings experience success. The entire curriculum is designed to increase flexibility in teaching and decrease the barriers that frequently limit student access to materials and learning in classrooms, based on the student's intellectual knowledge. The system gives students multiple options for taking in information and making sense of ideas, thus in turn helping…

Disease Prediction Android Application using Machine Learning

Android Mobile development, Machine Learning
Disease prediction from patient treatment history and health data, using data mining and machine learning techniques, has been an ongoing struggle for the past decades. Many works have applied data mining techniques to pathological data or medical profiles to predict specific diseases. These approaches tried to predict the recurrence of disease. Some approaches also try to predict the control and progression of disease. The recent success of deep learning in disparate areas of machine learning has driven a shift towards machine learning models that can learn rich, hierarchical representations of raw data with little preprocessing and produce more accurate results. With the development of big data technology, more attention has been paid to disease prediction from the perspective of big…

ARTIFICIAL INTELLIGENT DIETICIAN

MSC IT
Nowadays, people suffer from many health problems, such as fitness problems and difficulty maintaining a proper diet. Therefore we are developing this Android application to provide specialised dietician information and proper exercise knowledge for able-bodied people and for people with disabilities as well. Effective personal dietary guidelines are essential for managing our health and preventing chronic diseases, and interactive diet planning helps a user adjust the plan more easily. The Android application is built around Artificial Intelligence and dietetics. The user fills in the registration form and then logs in to the application. After login, users have to fill in personal information including age, weight, height, gender and exercise level. Age, weight, height, gender and exercise level are necessary for calculating BMI. On…
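As a rough illustration of the BMI step, the sketch below uses the standard formula, weight in kilograms divided by the square of height in metres, and the usual WHO adult categories. Note that the formula itself needs only weight and height; how the application uses age, gender and exercise level is not described in the excerpt, so they appear here only as fields of a hypothetical user profile. The function names and the profile layout are assumptions made for this example.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """Return BMI = weight (kg) divided by height (m) squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / (height_m ** 2)


def bmi_category(bmi: float) -> str:
    """Map a BMI value to the standard WHO adult category."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal"
    if bmi < 30.0:
        return "overweight"
    return "obese"


# Hypothetical user profile; only weight and height enter the BMI itself.
profile = {
    "age": 30,
    "gender": "female",
    "weight_kg": 70.0,
    "height_m": 1.75,
    "exercise_level": "moderate",
}
bmi = body_mass_index(profile["weight_kg"], profile["height_m"])
print(round(bmi, 1), bmi_category(bmi))
```

A diet planner could then branch on the category, with the remaining profile fields feeding whatever calorie or exercise rules the application defines.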

Data Encryption On Cloud using ECC

Security and Encryption
Elliptic curve cryptography (ECC) is a public key encryption technique based on elliptic curve theory that can be used to create faster, smaller, and more efficient cryptographic keys. ECC generates keys through the properties of the elliptic curve equation instead of the traditional method of generation as the product of very large prime numbers. The technology can be used in conjunction with most public key encryption methods, such as RSA and Diffie-Hellman. In this project we have two entities: User and Admin. The admin manages users. The admin can upload and share files with a particular user. The uploaded file is encrypted using the AES algorithm. Users can then decode the shared file using the decryption key. https://www.youtube.com/watch?v=uvGcgu9kFsQ
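To make the idea of deriving keys from the curve equation concrete, here is a minimal sketch of elliptic curve Diffie-Hellman on a tiny textbook curve, y^2 = x^3 + 2x + 2 over the integers modulo 17, with base point (5, 1). The curve, the private keys and the function names are illustrative assumptions; a curve this small offers no security, and the excerpt does not say how the project derives or exchanges its keys. It requires Python 3.8 or later for the modular inverse via `pow`.

```python
# Toy curve y^2 = x^3 + A*x + B over the integers modulo P (illustration only).
P, A, B = 17, 2, 2
G = (5, 1)  # a point on the curve, used as the public base point


def point_add(p1, p2):
    """Add two curve points; None stands for the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # p2 is the inverse of p1
    if p1 == p2:
        slope = (3 * x1 * x1 + A) * pow(2 * y1, -1, P) % P  # tangent line
    else:
        slope = (y2 - y1) * pow(x2 - x1, -1, P) % P  # chord through both points
    x3 = (slope * slope - x1 - x2) % P
    y3 = (slope * (x1 - x3) - y1) % P
    return (x3, y3)


def scalar_mult(k, point):
    """Compute k * point with the double-and-add method."""
    result = None
    addend = point
    while k > 0:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


# Hypothetical private keys for the two entities, Admin and User.
admin_private, user_private = 3, 7
admin_public = scalar_mult(admin_private, G)
user_public = scalar_mult(user_private, G)

# Each side combines its own private key with the other side's public key.
admin_shared = scalar_mult(admin_private, user_public)
user_shared = scalar_mult(user_private, admin_public)
print(admin_shared == user_shared)
```

In a real system the shared point would be passed through a key derivation function to obtain a symmetric key, for instance for AES, and a vetted cryptography library with a standard curve would replace this hand-written arithmetic.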

Book Recommendation System using Collaborative Filtering

Data mining
Online recommendation systems have become a trend. Nowadays, rather than going out and buying items themselves, people shop online, because online recommendation provides an easier and quicker way to buy items, and transactions are also quick when done online. Recommender systems are a powerful new technology that helps users find items they want to buy. A recommendation system is broadly used to recommend the most appropriate products to end users. Online book selling websites nowadays compete with each other on many attributes. A recommendation system is one of the strongest tools for increasing profits and retaining buyers. The existing systems lead to extraction of irrelevant information and a lack of user satisfaction. Book…

Online Voting System

MSC IT, Web | Desktop Application
The purpose of this system is to describe the behavior of an online voting system, named Online Voting System. The system provides an online tool for clients to vote. It has two main pages: the Admin page and the Voting page. From the Admin page, the administrator can design the voting application. From the Voting page, clients can open their election pages and vote for candidates. Based on the login credentials, the system determines whether the user is an administrator or a regular user and opens the pages those credentials give access to. From a technical viewpoint, elections are made up of the following components: • calling of elections, • registration of candidates,…

Web based Student Performance Analysis

Web | Desktop Application
Students are the main asset of any college. Colleges and students both play an important role in producing graduates of high quality, as reflected in their academic performance achievement. Academic performance achievement is the level of achievement of the student's educational goals that can be measured and tested through examinations, assessments and other forms of measurement. However, academic performance achievement varies, as different kinds of students may have different levels of performance achievement. Student academic performance is usually stored in a student management system in different formats such as files, documents, records, images and other formats, which is further useful for scholarships and company placement. https://www.youtube.com/watch?v=x5-_QDzf0h8

Web based Expense Tracking System

Web | Desktop Application
Expenses are an integral part of society. Expense tracking involves recording and analyzing the income and expenses of a person or an organization over a particular period of time. Today, since we are living in a hurry-up-and-get-it-done society, many people are looking for efficient ways to manage their time and money. In recent years, some research has been carried out on expense management. It has been noted that in most cases expense management is done mentally and never put on paper, which makes expense tracking very difficult. This is probably because many people do not know how to do it or do not have an appropriate means that will do Expense Tracking…

Employee Performance Evaluation using Sentimental Analysis

MSC IT
Departments are required to establish a system of performance evaluations for staff employees that reflects an impartial rating of each staff member’s performance and potential for further advancement. Appraisals can be a positive means to assist the staff member in improving job performance. Appraisals give a supervisor the opportunity to make known the objectives and goals of the department and the University, and to clarify what is expected of the employee in contributing to the attainment of these goals. Policy and Procedures Letter 3-0741, “Performance Evaluation Program for Staff,” establishes guidance in the evaluation of staff employees. Staff performance evaluations should be conducted on a periodic basis (at least annually) and should not reflect personal prejudice, bias, or favouritism on the part of the supervisor for the…

Real Estate Price Prediction

Data mining
House prices increase every year, so there is a need for a system to predict future house prices. House price prediction can help the developer determine the selling price of a house and can help the customer choose the right time to purchase a house. Three factors influence the price of a house: physical condition, concept and location. This web application aims to predict real estate prices based on location and amenities in Mumbai city using Linear Regression. The predicted result is represented graphically so that the customer can understand it better. https://www.youtube.com/watch?v=gMe4KBJjX6U&t=119s
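The linear regression step can be sketched with ordinary least squares on a single numeric feature. The data below are made up for illustration, and using an amenity count as the only feature is an assumption; the excerpt says the application uses location and amenities but does not describe how they are encoded.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predict the price for feature value x from the fitted line."""
    return slope * x + intercept


# Hypothetical training data: number of amenities vs. price (arbitrary units).
amenity_counts = [1, 2, 3, 4]
prices = [12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(amenity_counts, prices)
print(slope, intercept, predict(5, slope, intercept))
```

With several features, such as an encoded locality plus individual amenities, the same idea extends to multiple linear regression, which is usually fitted with a numerical library rather than by hand.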

Real Estate Booking

MSC IT, Web | Desktop Application
Today’s world is a fast, competitive one in which every second of the business process counts. Property companies currently operate a very traditional and lengthy procedure for booking properties. The clerical staff has to record each and every booking, and also maintain a report of currently booked and available properties. They also have to maintain a list of cancellations of previous bookings. This complete process is too time consuming and error prone if done with pen and paper; moreover, with real estate companies growing larger and larger, it is difficult to maintain the list of booked and available properties across all offices and branches. Our Real Estate Booking Software is meant to lessen the work load and…

Vector-based Sentiment Analysis of Movie Reviews

Artificial Intelligence & ML, Data mining, Machine Learning
We investigate sentence sentiment using the Pang and Lee dataset as annotated by Socher et al. [1]. Sentiment analysis research focuses on understanding the positive or negative tone of a sentence based on sentence syntax, structure, and content. Previous research used a tree-based model to label sentence sentiment on a 5-point scale. Our project takes a different approach: abstracting the sentence as a vector and applying vector classification schemes. We explore two components: first, we would like to analyze the use of different sentence representations, such as bag of words, word sentiment location, negation, etc., and abstract them into a set of features. Second, we would like to classify sentence sentiment using this set of features and compare the effectiveness of…
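One simple way to turn a sentence into a vector is a bag-of-words count over a fixed vocabulary, with negation handled by tagging the word that follows a negation term. The sketch below shows that idea only; the tokenizer, the negation list, the `NOT_` prefix and the vocabulary are assumptions for this example and are not taken from the project.

```python
from collections import Counter

# Hypothetical list of negation words for this example.
NEGATIONS = {"not", "no", "never"}


def tokenize(sentence):
    """Lowercase the sentence and split it on whitespace, dropping . and ,"""
    return sentence.lower().replace(".", " ").replace(",", " ").split()


def mark_negation(tokens):
    """Prefix the token that follows a negation word with 'NOT_'."""
    marked, negate = [], False
    for tok in tokens:
        if tok in NEGATIONS:
            negate = True
            continue
        marked.append("NOT_" + tok if negate else tok)
        negate = False
    return marked


def bag_of_words(sentence, vocabulary):
    """Return the count of each vocabulary term in the sentence."""
    counts = Counter(mark_negation(tokenize(sentence)))
    return [counts[term] for term in vocabulary]


vocabulary = ["good", "NOT_good", "movie", "boring"]
vector = bag_of_words("The movie was not good, not good at all.", vocabulary)
print(vector)
```

The resulting vector can be fed to any vector classifier; richer representations would add the other features the excerpt mentions, such as the location of sentiment-bearing words.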

Using Tweets for single stock price prediction

Artificial Intelligence & ML, Data mining, Machine Learning
Social media, as the collective form of individual opinions and emotions, has a very profound, though perhaps subtle, relationship with social events. This is particularly true when it comes to public tweets and stock trading. In fact, research has shown that when it comes to financial decisions, people are significantly driven by emotions [1]. These emotions, together with people’s opinions, are reflected in real time by tweets. As a result, by analyzing relevant tweets using proper machine learning algorithms, one could grasp the public’s sentiment and attitude towards the price of the stock of interest, which could intuitively predict its next move. Some previous work has shown that tweets can indeed reflect stock price changes. Bollen et al. (2010) randomly selected…

Recommendation based on user experiences

Artificial Intelligence & ML, Data mining, Machine Learning
Recommender systems follow 2 main strategies: content-based filtering and collaborative filtering. Collaborative filtering is often the preferred approach, as it requires no domain knowledge and no feature gathering effort. The 2 primary methods for collaborative filtering are latent factor models and neighborhood methods. In user-user neighbourhood methods, similarity between users is measured by transforming them into the item space. Similar logic applies to item-item similarity. In latent factor methods, both users and items are transformed into a latent feature space. An item is recommended to a user if they are similar, that is, if the similarity of their vector representations in the latent feature space is relatively high. We select the latent factor model because it allows us to identify the hidden features of the users. These features are time independent. We first…
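The recommendation rule for latent factor models can be sketched as follows, assuming the user and item vectors have already been learned and that similarity is measured by the dot product. The two-dimensional vectors, the item names and the choice of dot product are assumptions made for this example; the excerpt does not specify the similarity measure or how the factors are trained.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def recommend(user_vec, item_vecs, seen, top_n=2):
    """Rank unseen items by the dot product of user and item latent vectors."""
    scores = {
        item: dot(user_vec, vec)
        for item, vec in item_vecs.items()
        if item not in seen
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical learned latent vectors (two hidden features).
user = [0.9, 0.1]
items = {
    "item_a": [1.0, 0.0],
    "item_b": [0.0, 1.0],
    "item_c": [0.8, 0.3],
    "item_d": [0.5, 0.5],
}

print(recommend(user, items, seen={"item_a"}))
```

In practice the vectors come from factorizing the user-item rating matrix, and the score is often extended with user and item bias terms.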

Learning To Predict Dental Caries For Preschool Children

Artificial Intelligence & ML, Data mining, Machine Learning
Dental caries, or tooth decay/cavity, is a dental disease caused by bacterial infection. Among people of different age groups, preschool children require more attention, since caries has become the most common chronic childhood disease. More importantly, a skewed distribution of the disease has been observed in Europe, the US and Singapore among children or preschoolers, which indicates that a small portion of the population endures a large portion of caries incidence. Therefore, there is still a need to improve current caries control to identify high-risk individuals and prevent resurgence in children in developed countries like Singapore. Our project will study data such as questionnaire responses, oral examinations and biological tests of certain preschoolers from Singapore and use suitable…

Predicting air pollution level in a specific city

Artificial Intelligence & ML, Data mining, Machine Learning
The regulation of air pollutant levels is rapidly becoming one of the most important tasks for the governments of developing countries, especially China. Among the pollutant indices, fine particulate matter (PM2.5) is a significant one, because it is a big concern for people's health when its level in the air is relatively high. PM2.5 refers to tiny particles in the air that reduce visibility and cause the air to appear hazy when levels are elevated. However, the relationships between the concentration of these particles and meteorological and traffic factors are poorly understood. To shed some light on these connections, some advanced techniques have been introduced into air quality research. These studies utilized selected techniques, such as Support Vector Machine (SVM)…

Sentiment Analysis on Movie Reviews

Artificial Intelligence & ML, Data mining, Machine Learning
Sentiment analysis is a well-known task in the realm of natural language processing. Given a set of texts, the objective is to determine the polarity of each text. [9] provides a comprehensive survey of various methods, benchmarks, and resources for sentiment analysis and opinion mining. The sentiments can consist of different classes. In this study, we consider two cases: 1) A movie review is positive (+) or negative (-). This is similar to [2], where the authors also employ a novel similarity measure. In [10], the authors perform sentiment analysis after summarizing the text. 2) A movie review is very negative (- -), somewhat negative (-), neutral (o), somewhat positive (+), or very positive (+ +). For the first case, we picked a Kaggle competition called “Bag…

Predicting Soccer Results in the English Premier League

Artificial Intelligence & ML, Data mining, Machine Learning
There were many displays of genius during the 2010 World Cup, ranging from Andrés Iniesta to Thomas Müller, but none were as unusual as that of Paul the Octopus. This sea dweller correctly chose the winner of a match all eight times that he was tested. This accuracy contrasts sharply with the World Cup predictions of one of our team members, who was correct only about half the time. Partly out of love of the game, and partly out of the shame of being outdone by an octopus, we have decided to attempt to predict the outcomes of soccer matches. This has real-world applications for gambling, coaching improvements, and journalism. Out of the many leagues we could have chosen, we decided upon the…

Classifying Online User Behavior Using Contextual Data

Artificial Intelligence & ML, Data mining, Machine Learning
Despite the great computational power of machines, there are some things, like interest-based segregation, that only humans can instinctively distinguish. For example, a human can easily tell whether a tweet is about a book or about a kitchen utensil. However, to write a rule-based computer program to solve this task, a programmer must lay down very precise criteria for these classifications. There has been a massive increase in the amount of structured user-generated content on the Internet in the form of tweets, reviews on Amazon and eBay, etc. As opposed to stand-alone companies, which leverage their own hubs of data to run behavioral analytics, we strive to gain insights into online user behavior and interests based on free and public data. By…

Extracting Word Relationships from Unstructured Data

Artificial Intelligence & ML, Machine Learning
Robots are advancing rapidly in their behavioural functionality, allowing them to perform sophisticated tasks. However, their ability to take natural language instructions is still in its infancy. Parsing, semantic interpretation and dialogue management are typically performed only on a limited set of primitives, thus limiting the set of instructions that can be given to a robot. This limits a robot’s applicability in unconstrained natural environments (like households and offices) [8]. In this project, we address only the problem of semantic interpretation of human instructions. Specifically, our Extracto algorithm provides a method to extract potential actions (verbs) that could be performed given two household objects (nouns). For example, given the nouns “Coffee” and “Cup”, Extracto identifies the action (verb) “pour” indicating that ‘coffee should…

Bird Species Identification from an Image

Artificial Intelligence & ML, Image Processing, Machine Learning
In daily life we can hear a variety of creatures: human speech, dog barks, birdsongs, frog calls, etc. Many animals generate sounds either for communication or as a by-product of living activities such as eating, moving, flying, mating, etc. Bird species identification is a well-known problem to ornithologists, and it has been considered a scientific task since antiquity. Birds and their sounds are in many ways important for our culture. They can be heard even in big cities, and most people can recognize at least a few of the most common species by their sounds. Biologists tried to investigate species richness, presence or absence of indicator species, and the population sizes of…
Read More

Predicting ground shaking intensities using DYFI data and estimating event terms to identify induced earthquakes

Artificial Intelligence & ML, Machine Learning
Predicting ground shaking intensities using DYFI data and estimating event terms to identify induced earthquakes There has been a dramatic increase in seismicity in CEUS in recent years (Ellsworth 2013). There is a possibility that this increased seismicity in CEUS is caused by anthropogenic processes and is referred to as induced or triggered seismicity. The earthquakes are a nuisance for people and some larger magnitude earthquakes have also caused structural damage. Hence, it is important to quantify seismic hazard and risk from this increased seismicity. One of the major components in determining seismic hazard and risk is the expected level of ground shaking at a site. Level of ground shaking from a given earthquake is typically estimated using previously collected ground motion data in a region. However, in CEUS due…
Read More

Identifying Gender From Facial Features

Artificial Intelligence & ML, Image Processing, Machine Learning
Identifying Gender From Facial Features Previous research has shown that our brain has specialized nerve cells responding to specific local features of a scene, such as lines, edges, angles or movement. Our visual cortex combines these scattered pieces of information into useful patterns. Automatic face recognition aims to extract these meaningful pieces of information and put them together into a useful representation in order to perform a classification/identification task on them. While we attempt to identify gender from facial features, we are often curious about what features of the face are most important in determining gender. Are localized features such as eyes, nose and ears more important or overall features such as head shape, hair line and face contour more important? There is a plethora of successful and robust face…
Read More

Analyzing Positional Play in Chess using Machine Learning

Artificial Intelligence & ML, Machine Learning
Analyzing Positional Play in Chess using Machine Learning Chess has two broad approaches to game-play, tactical and positional. Tactical play is the approach of calculating maneuvers and employing tactics that take advantage of short-term opportunities, while positional play is dominated by long-term maneuvers for advantage and requires judgement more than calculations. Current generation chess engines predominantly employ tactical play and thus outplay top human players given their much superior computational abilities. Engines do so by searching game trees of depths typically between 20 and 30 moves and calculating a large number of variations. However, human play is often a combination of both, tactical and positional approaches, since humans have some intuition about which board positions are intrinsically better than others. In our project, we use machine learning to identify elements…
Read More

PREDICTING HOSPITAL READMISSIONS IN THE MEDICARE POPULATION

Artificial Intelligence & ML, Data mining, Machine Learning, MSC IT
PREDICTING HOSPITAL READMISSIONS IN THE MEDICARE POPULATION Avoidable hospital readmissions cost taxpayers billions of dollars each year. The Medicare Payment Advisory Commission has estimated that almost $12 billion is spent annually by Medicare on potentially preventable readmissions within 30 days of a patient’s discharge from a hospital [1]. The Medicare program has begun to apply financial penalties to hospitals that have excessive risk-adjusted readmission rates. There is much interest in the health policy and medical communities in the ability to accurately predict which patients are at high risk of being readmitted. Not only are there strong financial reasons to avoid readmissions, but readmission to the hospital can also be a sign of poor clinical care and can indicate a worsening of a patient’s condition [2]. If doctors and nurses were aware of…
Read More

Attribution of Contested and Anonymous Ancient Greek Works

8051 Microcontroller, Artificial Intelligence & ML, Data mining
Attribution of Contested and Anonymous Ancient Greek Works Authorship attribution has been a persistent problem in the Classical genre, as texts that reach us from antiquity are often corrupted, edited, or forged over the thousands of years since their initial production. Scholars have worked on identifying writers’ stylistic differences in an attempt to distinguish genuine texts from fakes, and to attribute an author to previously anonymous works. Increasing computing power allows the derivation of more complex features, giving us new information about each author’s linguistic signature and writing style. Our system is able to accurately predict the author of a complete anonymous work, as well as many text fragments that currently have contested authorship. We experimented with using semantic and lexical features, and explored both discriminative and generative classification algorithms.…
Read More

Object Detection for Semantic SLAM using Convolution Neural Networks

Artificial Intelligence & ML, Data mining, Image Processing, Machine Learning
Object Detection for Semantic SLAM using Convolution Neural Networks Conventional SLAM (Simultaneous Localization and Mapping) systems typically provide odometry estimates and point-cloud reconstructions of an unknown environment. While these outputs can be used for tasks such as autonomous navigation, they lack any semantic information. Our project implements a modular object detection framework that can be used in conjunction with a SLAM engine to generate semantic scene reconstructions. A semantically-augmented reconstruction has many potential applications. Some examples include:
• Discriminating between pedestrians, cars, bicyclists, etc. in an autonomous driving system.
• Loop-closure detection based on object-level descriptors.
• Smart household bots that can retrieve objects given a natural language command.
An object detection algorithm designed for these applications has a unique set of requirements and constraints. The algorithm needs to be…
Read More

Sentiment as a Predictor of Wikipedia Editor Activity

Artificial Intelligence & ML, Machine Learning
Sentiment as a Predictor of Wikipedia Editor Activity Wikipedia, the world’s largest encyclopedia, is created by millions of unpaid editors online. Every user can edit every article, and the project is protected against vandalism and low-quality contributions only through version control and a system of (again unpaid) reviewers. Somewhat hidden from most casual readers of the encyclopedia, Wikipedia also features a simple social network: every user has a personal user profile and a “user talk page” which acts as a publicly accessible guestbook where users can leave messages to each other. The messages exchanged in user talk pages are often related to a user’s editing behavior. For example, senior users may welcome new users, or congratulate them on their first edits. Administrators may officially warn culprits after transgressions of Wikipedia’s…
Read More

Application of Neural Network In Handwriting Recognition

Artificial Intelligence & ML, Image Processing, Machine Learning
Application of Neural Network In Handwriting Recognition Handwriting recognition can be divided into two categories, namely on-line and off-line handwriting recognition. On-line recognition involves live transformation of characters written by a user on a tablet or a smart phone. In contrast, off-line recognition is more challenging, as it requires automatic conversion of scanned images or photos into a computer-readable text format. Motivated by the interesting applications of off-line recognition technology, for instance the USPS address recognition system and the Chase QuickDeposit system, this project will mainly focus on discovering algorithms that allow an accurate, fast, and efficient character recognition process. The report will cover data acquisition, image processing, feature extraction, model training, results analysis, and future work.
Read More

Re-clustering of Constellations through Machine Learning

Artificial Intelligence & ML, Machine Learning
Re-clustering of Constellations through Machine Learning For thousands of years, people around the world have been looking up into the sky, trying to find patterns in the distribution of visible stars, and dividing them into different groups called constellations. Originally, constellations were recognized and organized by people’s imaginations based on the shapes of the star distribution. The two most famous groups of stars are the “Big Dipper” and “Orion”. In modern astronomy, the International Astronomical Union (IAU) has defined constellations as specific areas of the celestial sphere. These areas have their origins in star patterns from which the constellations take their names. In total, there are 88 officially recognized constellations. On the other hand, certain stars are grouped together primarily because they are close to each other and far away…
Read More

Collaborative Filtering Recommender Systems

Artificial Intelligence & ML, Data mining, Machine Learning
Collaborative Filtering Recommender Systems Collaborative filtering (CF) predicts user preferences in item selection based on the known user ratings of items. As one of the most common approaches to recommender systems, CF has proven effective for solving the information overload problem. CF can be divided into two main branches: memory-based and model-based. Most current research improves the accuracy of memory-based algorithms only by improving the similarity measures, while little research has focused on the prediction score models, which we believe are more important than the similarity measures. The best-known model-based algorithm is matrix factorization. Compared to the memory-based algorithms, matrix factorization generally has higher accuracy. However, matrix factorization may fall into a local optimum in the learning process which leads to inadequate…
Read More
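
To make the memory-based branch described in the excerpt above concrete, here is a minimal sketch of user-based collaborative filtering: a similarity measure between users, followed by a prediction score computed as a similarity-weighted average. The toy ratings, the function names, and the choice of cosine similarity are illustrative assumptions, not details taken from the project.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}. Not from the project.
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 4, "B": 2, "C": 5, "D": 4},
    "carol": {"A": 1, "B": 5, "D": 2},
}


def cosine_similarity(u, v):
    """Cosine similarity over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine_similarity(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None  # None: nobody comparable rated it


print(predict(ratings, "alice", "D"))
```

The similarity measure and the prediction score model are separate functions here, which mirrors the distinction the excerpt draws between the two; either could be swapped out independently.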

Blowing up the Twittersphere: Predicting the Optimal Time to Tweet

Artificial Intelligence & ML, Data mining, Machine Learning
Blowing up the Twittersphere: Predicting the Optimal Time to Tweet We can separate our problem into a few different steps. First, we need to model information about a tweet and how successful a given tweet is. Second, given a tweet, user, and post time, we must predict how successful that tweet will be. Finally, we then need to use our predictor to determine the optimal time for a given user to post a specific tweet, i.e. what time maximizes our success prediction for a specific user and tweet. We considered two papers that address similar problems of using Machine Learning to understand interactions in social media and predict success of online content. Lakkaraju, McAuley, and Leskovec consider the connections between title, content and community in social media. From their work,…
Read More
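
The final step in the excerpt above, choosing the post time that maximizes the predicted success, can be sketched as a search over candidate times. Everything below is an assumption made for illustration: `toy_predictor` is a hypothetical stand-in for a trained model, and the hourly candidate grid is an arbitrary choice.

```python
def best_post_time(predict_success, tweet, user, candidate_hours=range(24)):
    """Return the candidate hour with the highest predicted success."""
    return max(candidate_hours, key=lambda h: predict_success(tweet, user, h))


def toy_predictor(tweet, user, hour):
    """Hypothetical stand-in for a learned model (not the project's).

    Scores are highest at the user's assumed peak hour and fall off
    linearly with distance from it.
    """
    return 0.01 * len(tweet) - abs(hour - user["peak_hour"])


user = {"peak_hour": 18}  # hypothetical user feature
print(best_post_time(toy_predictor, "hello world", user))
```

Keeping the predictor as a parameter means the search step does not depend on how success is modelled.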

Recognition and Classification of Fast Food Images

Artificial Intelligence & ML, Data mining, Machine Learning
Recognition and Classification of Fast Food Images Food recognition is of great importance nowadays for multiple purposes. On one hand, people who want a better understanding of food that they are not familiar with, or have never seen before, can simply take a picture and get to know more details about it. On the other hand, the increasing demand for dietary assessment tools to record calories and nutrition has also been a driving force in the development of food recognition techniques. Therefore, automatic food recognition is very important and has great application potential. However, food varies greatly in appearance (e.g., shape, colors) with many different ingredients and assembling methods. This makes food recognition a difficult task for current state-of-the-art classification methods, and…
Read More

Predicting Heart Attacks

Artificial Intelligence & ML, Data mining, Machine Learning
Predicting Heart Attacks In the field of Medical Science, there is a huge amount of data. Data mining techniques are being used to discover hidden patterns from these data. Advanced data mining techniques have been developed in recent years. The efficiency of these techniques is compared using sensitivity, specificity, accuracy and error rate. Some well-known data mining classification techniques are Decision Tree, Artificial Neural Networks, Support Vector Machine and Naïve Bayes Classifier. In this paper, we introduce a new method based on the fitness value of the attribute to predict the heart disease problem. We use 10 attributes for our proposed method and use simple calculation. In our everyday life, there are several examples where we have to analyze historical data, for example, a bank loans officer needs analysis…
Read More
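
The four measures named in the excerpt above (sensitivity, specificity, accuracy and error rate) all follow from the counts in a binary confusion matrix. A minimal sketch, using made-up counts that are not results from the paper:

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,  # equals 1 - accuracy
    }


# Hypothetical counts chosen only to illustrate the arithmetic.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```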

E-Commerce Sales Prediction Using Listing Keywords

Artificial Intelligence & ML, Data mining, Machine Learning
E-Commerce Sales Prediction Using Listing Keywords Small online retailers usually set themselves apart from brick and mortar stores, traditional brand names, and giant online retailers by offering goods at an exceptional value. In addition to price, they compete for shoppers’ attention via descriptive listing titles, whose effectiveness as search keywords can help drive sales. In this study, machine learning techniques will be applied to online retail data to measure the link between keywords and sales volumes.
Read More

Prediction and Classification of Cardiac Arrhythmia

Artificial Intelligence & ML, Data mining, Machine Learning
Prediction and Classification of Cardiac Arrhythmia Irregularity in heartbeat may be harmless or life-threatening. Hence both accurate detection of the presence, as well as classification of arrhythmia, are important. Arrhythmia can be diagnosed by measuring the heart activity using an instrument called ECG or electrocardiograph and then analyzing the recorded data. Different parameter values can be extracted from the ECG waveforms and can be used along with other information about the patient like age, medical history, etc to detect arrhythmia. However, sometimes it may be difficult for a doctor to look at these long-duration ECG recordings and find minute irregularities. Therefore, using machine learning for automating arrhythmia diagnosis can be very helpful. The project aims at using different machine learning algorithms like Naive Bayes, SVM, Random Forests and Neural Networks…
Read More

Sentiment Analysis for Hotel Reviews

Artificial Intelligence & ML, Data mining, Machine Learning
Sentiment Analysis for Hotel Reviews Travel planning and hotel booking have become one of the important commercial uses of the web. Sharing on the web has become a major tool in expressing customer thoughts about a particular product or service. Recent years have seen rapid growth in online discussion groups and review sites (e.g. www.tripadvisor.com) where a crucial characteristic of a customer’s review is its sentiment or overall opinion. For example, if a review contains words like ‘great’, ‘best’, ‘nice’, ‘good’, ‘awesome’, it is probably a positive comment, whereas if it contains words like ‘bad’, ‘poor’, ‘awful’, ‘worse’, it is probably a negative review. However, TripAdvisor’s star rating does not express the exact experience of the customer. Most of the ratings are meaningless, a large chunk of reviews fall in the…
Read More
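
The word-list heuristic mentioned in the excerpt above can be written down as a tiny lexicon-based classifier. The two word sets come straight from the excerpt; the scoring rule (count of positive words minus count of negative words) and the "neutral" fallback are assumptions added for this sketch and are not claimed to be the project's method.

```python
import re

POSITIVE = {"great", "best", "nice", "good", "awesome"}
NEGATIVE = {"bad", "poor", "awful", "worse"}


def classify_review(text):
    """Label a review by counting lexicon hits (assumed scoring rule)."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify_review("Great location and nice staff"))
print(classify_review("Awful room, poor service"))
```

A heuristic like this ignores negation and context ("not good"), which is one reason learned models are usually preferred for this task.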

Mood Detection with Tweets

Artificial Intelligence & ML, Data mining, Machine Learning
Mood Detection with Tweets Emotional states of individuals, also known as moods, are central to the expression of thoughts, ideas, and opinions, and in turn, impact attitudes and behavior. Social media tools like Twitter are increasingly used by individuals to broadcast their day-to-day happenings or to report on an external event of interest. Understanding the rich “landscape” of moods will help us to better interpret millions of individuals. This paper describes a Rule-Based approach, which detects the emotion or mood of the tweet and classifies the Twitter message under the appropriate emotional category. The accuracy of the system is 85%. With the proposed system it is possible to understand the deeper levels of emotions i.e., finer grained instead of sentiment i.e., coarse-grained. The sentiment says whether the tweet is positive…
Read More

3D Scene Retrieval from Text with Semantic Parsing

Artificial Intelligence & ML, Machine Learning
3D Scene Retrieval from Text with Semantic Parsing We look at the task of 3D scene retrieval: given a natural-language description and a set of 3D scenes, identify a scene matching the description. Geometric specifications of 3D scenes are part of the craft of many graphical computing applications, including computer animation, games, and simulators. Large databases of such scenes have become available in recent years as a result of improvements in the ease of use of tools for 3D scene design. A system that can identify a 3D scene from a natural language description is useful for making such databases of scenes readily accessible. Natural language has evolved to be well-suited to describing our (three-dimensional) world, and it provides a convenient way of specifying the space of acceptable scenes: a…
Read More

Parking Occupancy Prediction and Pattern Analysis

Artificial Intelligence & ML, Data mining, Machine Learning
Parking Occupancy Prediction and Pattern Analysis According to the Department of Parking and Traffic, San Francisco has more cars per square mile than any other city in the US [1]. The search for an empty parking spot can become an agonizing experience for the city’s urban drivers. A recent article claims that drivers cruising for a parking spot in SF generate 30% of all downtown congestion [2]. These wasted miles not only increase traffic congestion, but also lead to more pollution and driver anxiety. In order to alleviate this problem, the city armed 7000 metered parking spaces and 12,250 garage spots (total of 593 parking lots) with sensors and introduced a mobile application called SFpark [3], which provides real-time information about availability of a parking lot to drivers. However,…
Read More

Stock Trend Prediction with Technical Indicators using SVM

Artificial Intelligence & ML, Machine Learning
Stock Trend Prediction with Technical Indicators using SVM Short-term prediction of stock price trends has potential applications for personal investment without high-frequency-trading infrastructure. Unlike predicting a market index (as explored by previous years’ projects), a single stock price tends to be affected by large noise, and its long-term trend inherently converges to the company’s market performance. So this project focuses on short-term (1-10 days) prediction of stock price trend and takes the approach of analyzing the time series indicators as features to classify trend (Raise or Down). The validation model is chosen so that the testing set always follows the training set in the time span to simulate real prediction. Cross-validated Grid Search on parameters of RBF-kernelized SVM is performed to fit the training data to balance the bias and variance. Although…
Read More
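
The validation rule in the excerpt above, where the testing set always follows the training set in time, is often implemented as an expanding-window ("walk-forward") split. The sketch below shows one such scheme under assumed parameters; the function name, fold sizes and sample counts are illustrative and are not the project's actual configuration.

```python
def walk_forward_splits(n_samples, n_folds, min_train):
    """Yield (train_idx, test_idx) pairs for time-ordered samples.

    Every test index is strictly later than every train index, so the
    model is never evaluated on data that precedes its training window.
    """
    fold_size = (n_samples - min_train) // n_folds
    if fold_size < 1:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_folds):
        train_end = min_train + k * fold_size
        test_end = train_end + fold_size
        yield list(range(train_end)), list(range(train_end, test_end))


# Ten daily observations, three folds, at least four days of training data.
for train_idx, test_idx in walk_forward_splits(10, 3, 4):
    print(train_idx, test_idx)
```

A parameter search would then score each candidate setting only on these forward-looking test windows, in place of a shuffled k-fold split.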

Predicting Usefulness of Yelp Reviews

Artificial Intelligence & ML, Data mining, Machine Learning
Predicting the Usefulness of Yelp Reviews The Yelp Dataset Challenge makes a huge set of user, business, and review data publicly available for machine learning projects. They wish to find interesting trends and patterns in all of the data they have accumulated. Our goal is to predict how useful a review will prove to be to users. We can use review upvotes as a metric. This could have immediate applications – many people rely on Yelp to make consumer choices, so predicting the most helpful reviews to display on a page before they have actually been rated would have a serious impact on user experience.
Read More

Facial Keypoints Detection

Artificial Intelligence & ML, Image Processing
Facial Keypoints Detection Nowadays, facial key points detection has become a very popular topic, and its applications, including Snapchat and How old are you, have attracted a large number of users. The objective of facial key points detection is to find the facial key points in a given face, which is very challenging because facial features differ greatly from person to person. Deep learning approaches, such as neural networks and cascaded neural networks, have been applied to this problem, and the results of these structures are significantly better than state-of-the-art methods, like feature extraction and dimension reduction algorithms. In our project, we would like to locate the key points in a given image using deep architectures to not only obtain lower loss for the detection task but also…
Read More

Multiclass Classifier Building with Amazon Data to Classify Customer Reviews into Product Categories

Artificial Intelligence & ML, Data mining, Machine Learning
Multiclass Classifier Building with Amazon Data to Classify Customer Reviews into Product Categories E-commerce (electronic commerce) is defined as the buying and selling of products over electronic systems such as the Internet. With the widespread use of the Internet, the trade conducted electronically (online) has grown extraordinarily. E-commerce companies have a large database of products and a number of consumers that use these data. To address this data and information explosion, e-commerce stores are applying machine learning to identify and customize the product category information. Data scientists in this field are utilizing machine learning potential to build unmatched competitiveness in the market by finding purchase preferences, customer churn and product suggestions etc. Applying popular Machine Learning algorithms to huge datasets brought new challenges for the ML…
Read More

An Energy Efficient Seizure Prediction Algorithm

Artificial Intelligence & ML, Machine Learning
An Energy Efficient Seizure Prediction Algorithm Epileptic seizures afflict over 1% of the world’s population. If seizures could be predicted before they occur, fast-acting therapies could be delivered to prevent the attack and restore a normal quality of life to patients. Over the last two decades, several studies have explored the use of EEG signals to predict seizures using principles from machine learning [1]–[3]. It is thought that such an algorithm could be implemented in real-time with a wireless, implanted EEG sensor. However, there are two main constraints for such a portable system. First, due to limited battery life, energy consumption must be minimal. Second, due to limited bandwidth, the data transmitted between the sensor and the central processing device (such as mobile phone, tablet, personal computer, etc.) should be…
Read More

Classifier Comparisons On Credit Approval Prediction

Artificial Intelligence & ML, Machine Learning, MSC IT
Classifier Comparisons On Credit Approval Prediction The objective of this work is to investigate the performance of different classification algorithms using the WEKA tool for credit card approval. A major problem in financial analysis is to build an ultimate model that yields fruitful results on certain given information. Neither a single data mining model fulfills all business requirements nor does a business need depend on a single model. Different models must be evaluated to attain the ultimate model. This kind of difficulty could be resolved with the aid of machine learning, which could be used directly to obtain the end result with the aid of several artificial intelligence algorithms that perform the role of classifiers. Classification algorithms always find a rule or set of rules to represent data in classes [1].…
Read More

Automatic Number Plate Recognition System

Artificial Intelligence & ML, Machine Learning
Automatic Number Plate Recognition System Automatic number plate recognition (ANPR) is a mass surveillance method that uses optical character recognition on images to read the license plates on vehicles. ANPR systems can use existing closed-circuit television or road-rule enforcement cameras, or ones specifically designed for the task. They are used by various police forces and as a method of electronic toll collection on pay-per-use roads and monitoring traffic activity, such as red light adherence in an intersection. ANPR can be used to store the images captured by the cameras as well as the text from the license plate, with some configurable to store a photograph of the driver. Systems commonly use infrared lighting to allow the camera to take the picture at any time of the day. A powerful flash…
Read More

Practical Approximate k Nearest Neighbor Queries with Location and Query Privacy

Artificial Intelligence & ML, Data mining, Machine Learning
Practical Approximate k Nearest Neighbor Queries with Location and Query Privacy In mobile communication, spatial queries pose a serious threat to user location privacy because the location of a query may reveal sensitive information about the mobile user. In this paper, we study approximate k nearest neighbor (kNN) queries where the mobile user queries the location-based service (LBS) provider about approximate k nearest points of interest (POIs) on the basis of his current location. We propose a basic solution and a generic solution for the mobile user to preserve his location and query privacy in approximate kNN queries. The proposed solutions are mainly built on the Paillier public-key cryptosystem and can provide both location and query privacy. To preserve query privacy, our basic solution allows the mobile user to retrieve…
Read More
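
The excerpt above names the Paillier public-key cryptosystem as the main building block. The property that makes Paillier useful for private queries is that it is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. The toy sketch below demonstrates only that property. The tiny primes and fixed random values are assumptions chosen for readability and are completely insecure, and the code does not reproduce the paper's query protocol.

```python
from math import gcd


def keygen(p, q):
    """Toy key generation with generator g = n + 1 (insecure prime sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; needs Python 3.8+
    return n, (lam, mu)


def encrypt(n, m, r):
    """c = (1 + n)^m * r^n mod n^2, using (1 + n)^m = 1 + m*n mod n^2."""
    n2 = n * n
    return ((1 + m * n) % n2) * pow(r, n, n2) % n2


def decrypt(n, priv, c):
    """m = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n


n, priv = keygen(17, 19)       # assumed toy primes, n = 323
c1 = encrypt(n, 20, r=5)       # r must be coprime to n; fixed here so the
c2 = encrypt(n, 22, r=7)       # run is reproducible (real use: random r)
c_sum = (c1 * c2) % (n * n)    # homomorphic addition of the plaintexts
print(decrypt(n, priv, c_sum))
```

Because the party holding only ciphertexts can still add encrypted values, a service can compute on data it cannot read, which is the general idea such privacy-preserving schemes build on.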

QUANTIFYING POLITICAL LEANING FROM TWEETS, RETWEETS, AND RETWEETERS

Artificial Intelligence & ML, Data mining, Machine Learning, MSC IT
QUANTIFYING POLITICAL LEANING FROM TWEETS, RETWEETS, AND RETWEETERS In recent years, big online social media data have found many applications in the intersection of political and computer science. Examples include answering questions in political and social science (e.g., proving/disproving the existence of media bias [3, 30] and the “echo chamber” effect [1, 5]), using online social media to predict election outcomes [46, 31], and personalizing social media feeds so as to provide a fair and balanced view of people’s opinions on controversial issues [36]. A prerequisite for answering the above research questions is the ability to accurately estimate the political leaning of the population involved. If it is not met, either the conclusion will be invalid, the prediction will perform poorly [35, 37] due to a skew towards highly vocal…
Read More

Efficient Algorithms for Mining Top-K High Utility Itemsets

Artificial Intelligence & ML, Data mining, Machine Learning
Efficient Algorithms for Mining Top-K High Utility Itemsets In recent years, shopping online has become more and more popular. When deciding whether to purchase a product online, the opinions of others become important. This presents a great opportunity to share our viewpoints on the products we purchase. However, people face the information overload problem. How to mine valuable information from reviews to understand a user’s preferences and make an accurate recommendation is crucial. Traditional recommender systems consider some factors, such as user’s purchase records, product category, and geographic location. This work proposes a sentiment-based rating prediction method to improve prediction accuracy in recommender systems. Firstly, it proposes a social user sentimental measurement approach and calculates each user’s sentiment on items. Secondly, it not only…
Read More

Efficient Algorithms for Mining Top-K High Utility Itemsets

Artificial Intelligence & ML, Data mining, Machine Learning
Efficient Algorithms for Mining Top-K High Utility Itemsets Mining high utility itemsets from databases is an emerging topic in data mining, which refers to the discovery of itemsets with utilities higher than a user-specified minimum utility threshold min_util. Although several studies have been carried out on this topic, setting an appropriate minimum utility threshold is a difficult problem for users. If min_util is set too low, too many high utility itemsets will be generated, which may cause the mining algorithms to become inefficient or even run out of memory. On the other hand, if min_util is set too high, no high utility itemset will be found. Setting appropriate minimum utility thresholds by trial and error is a tedious process for users. In this paper, we address this problem by proposing…
Read More
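
To illustrate the problem stated in the excerpt above, the sketch below computes itemset utilities on a toy database and returns the k itemsets with the highest utility, so that no `min_util` threshold has to be chosen by hand. It is a brute-force enumeration written only to make the definitions concrete; it is not the efficient algorithm the paper proposes, and the transactions, unit profits and function names are all hypothetical.

```python
import heapq
from itertools import combinations

# Hypothetical data: unit profit per item, and purchase quantities
# per transaction. None of this comes from the paper.
profit = {"a": 5, "b": 2, "c": 1}
transactions = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 3},
    {"b": 1, "c": 1},
]


def utility(itemset, transactions, profit):
    """Sum of quantity * profit over transactions containing the whole itemset."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * profit[item] for item in itemset)
    return total


def top_k_itemsets(transactions, profit, k):
    """Brute force: score every non-empty itemset and keep the k best."""
    items = sorted(profit)
    scored = []
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = utility(itemset, transactions, profit)
            if u > 0:
                scored.append((u, itemset))
    return heapq.nlargest(k, scored)


print(top_k_itemsets(transactions, profit, k=2))
```

Enumerating every itemset is exponential in the number of items, which is why practical miners need pruning strategies; the sketch only shows what "top-k by utility" means.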

Crowdsourcing for Top-K Query Processing over Uncertain Data

Artificial Intelligence & ML, Data mining
Crowdsourcing for Top-K Query Processing over Uncertain Data Querying uncertain data has become a prominent application due to the proliferation of user-generated content from social media and of data streams from sensors. When data ambiguity cannot be reduced algorithmically, crowdsourcing proves a viable approach, which consists in posting tasks to humans and harnessing their judgment for improving the confidence about data values or relationships. This paper tackles the problem of processing top-K queries over uncertain data with the help of crowdsourcing to quickly converge to the real ordering of relevant results. Several offline and online approaches for addressing questions to a crowd are defined and contrasted on both synthetic and real datasets, with the aim of minimizing the crowd interactions necessary to find the real ordering of the result set.…
Read More

Cyberbullying Detection based on Semantic-Enhanced Marginalized Denoising Auto-Encoder

Artificial Intelligence & ML, Data mining
Cyberbullying Detection based on Semantic-Enhanced Marginalized Denoising Auto-Encoder As a side effect of increasingly popular social media, cyberbullying has emerged as a serious problem afflicting children, adolescents, and young adults. Machine learning techniques make automatic detection of bullying messages in social media possible, and this could help to construct a healthy and safe social media environment. In this meaningful research area, one critical issue is robust and discriminative numerical representation learning of text messages. In this paper, we propose a new representation learning method to tackle this problem. Our method named semantic-enhanced marginalized denoising auto-encoder (smSDA) is developed via a semantic extension of the popular deep learning model stacked denoising autoencoder (SDA). The semantic extension consists of semantic dropout noise and sparsity constraints, where the semantic dropout noise is designed…
Read More

Mining Facets For Queries From Their Search Results

Artificial Intelligence & ML, Data mining, Machine Learning
Mining Facets For Queries From Their Search Results A query facet is a set of items which describe and summarize one important aspect of a query. Here a facet item is typically a word or a phrase. A query may have multiple facets that summarize the information about the query from different perspectives. For the query “watches”, its query facets cover the knowledge about watches in five unique aspects, including brands, gender categories, supporting features, styles, and colors. The query “visit Beijing” has a facet about popular resorts in Beijing (Tiananmen Square, Forbidden City, Summer Palace, ...) and a facet on several travel-related topics (attractions, shopping, dining, ...). Query facets provide interesting and useful knowledge about a query and thus can be used to improve search experiences in many ways.…
Read More

Detecting Malicious Facebook Applications

Artificial Intelligence & ML, Machine Learning
Detecting Malicious Facebook Applications With 20 million installs a day, third-party apps are a major reason for the popularity and addictiveness of Facebook. Unfortunately, hackers have realized the potential of using apps for spreading malware and spam. The problem is already significant, as we find that at least 13% of apps in our dataset are malicious. So far, the research community has focused on detecting malicious posts and campaigns. In this paper, we ask the question: Given a Facebook application, can we determine if it is malicious? Our key contribution is in developing FRAppE (Facebook's Rigorous Application Evaluator), arguably the first tool focused on detecting malicious apps on Facebook. To develop FRAppE, we use information gathered by observing the posting behavior of 111K Facebook apps seen across 2.2 million users on Facebook.…
Read More

Sentiment Analysis of Top Colleges in India Using Twitter Data

Artificial Intelligence & ML, Data mining, Machine Learning
Sentiment Analysis of Top Colleges in India Using Twitter Data Social media has captured the attention of the entire world because it sends thoughts across the globe extremely fast, is user-friendly, and is free of cost, requiring only a working internet connection. People use this platform extensively to share their thoughts loud and clear. Twitter is one such well-known micro-blogging site, receiving around 500 million tweets per day. Each user has a daily limit of 2,400 tweets and 140 characters per tweet. Twitter users post (or ‘tweet’) every day about various subjects like products, services, day-to-day activities, places, personalities, etc. Hence, Twitter data is highly relevant, as it can be used in various scenarios where companies or brands can utilize a direct connection to almost each…
Read More

FiDoop: Parallel Mining of Frequent Itemsets Using MapReduce

Artificial Intelligence & ML, Data mining, Hadoop
FiDoop: Parallel Mining of Frequent Itemsets Using MapReduce Data mining is the process of discovering patterns in huge amounts of data. There are many data mining techniques, such as clustering, classification, and association rule mining. The most popular one is association rule mining, which is divided into two parts: i) generating the frequent itemsets and ii) generating association rules from all itemsets. Frequent itemset mining (FIM) is the core problem in association rule mining. A sequential FIM algorithm suffers from performance deterioration when it operates on a huge amount of data on a single machine. To address this problem, parallel FIM algorithms were proposed. There are two types of algorithms that can be used for mining frequent itemsets: the first is the candidate-itemset generation approach and the second is the approach without candidate itemset generation.…
Read More
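
To make the frequent itemset step in the FiDoop abstract above concrete, here is a minimal single-machine sketch. It is not FiDoop and does not use MapReduce; it simply enumerates small candidate itemsets per transaction and sums their counts, which loosely mirrors an emit-then-aggregate pattern. The function name `frequent_itemsets`, the `max_size` cap, and the sample transactions are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: count} for itemsets found in at least min_support transactions.

    Hypothetical helper: brute-force candidate enumeration, only practical for
    small transactions and small max_size.
    """
    counts = Counter()
    for transaction in transactions:
        # Deduplicate and sort so each itemset has one canonical tuple form.
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1  # "emit" one occurrence of this candidate
    # Keep only the candidates that meet the support threshold.
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Made-up sample data for illustration only.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]
result = frequent_itemsets(transactions, min_support=3)
pairs = frequent_itemsets(transactions, min_support=2)
```

With a support threshold of 3 only the single items survive in this sample; lowering it to 2 also keeps the pairs. A parallel version would split the transactions across workers and merge the per-worker counters.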

Workflow-Based Big Data Analytics in The Cloud Environment

Artificial Intelligence & ML, Machine Learning, MSC IT
Workflow-Based Big Data Analytics in The Cloud Environment Since digital data repositories are more and more massive and distributed, we need smart data analysis techniques and scalable architectures to extract useful information from them in reduced time. Cloud computing infrastructures offer an effective support for addressing both the computational and data storage needs of big data mining applications. In fact, complex data mining tasks involve data- and compute-intensive algorithms that require large and efficient storage facilities together with high-performance processors to get results in acceptable times. In this chapter, we present a Data Mining Cloud Framework designed for developing and executing distributed data analytics applications as workflows of services. In this environment, we use datasets, analysis tools, data mining algorithms and knowledge models that are implemented as single services that can be combined through a visual programming…
Read More

Modeling Urban Behavior by Mining Geotagged Social Data

Artificial Intelligence & ML, Machine Learning
Modeling Urban Behavior by Mining Geotagged Social Data Data generated on location-based social networks provide rich information on the whereabouts of urban dwellers. Specifically, such data reveal who spends time where, when, and on what type of activity (e.g., shopping at a mall, or dining at a restaurant). That information can, in turn, be used to describe city regions in terms of activity that takes place therein. For example, the data might reveal that citizens visit one region mainly for shopping in the morning, while another for dining in the evening. Furthermore, once such a description is available, one can ask more elaborate questions. For example, one might ask what features distinguish one region from another - some regions might be different in terms of the type of venues they…
Read More

ClubCF: A Clustering-based Collaborative Filtering Approach for Big Data Application

Hadoop
ClubCF: A Clustering-based Collaborative Filtering Approach for Big Data Application Spurred by service computing and cloud computing, an increasing number of services are emerging on the Internet. As a result, service-relevant data become too big to be effectively processed by traditional approaches. In view of this challenge, a Clustering-based Collaborative Filtering approach (ClubCF) is proposed in this paper, which aims at recruiting similar services in the same clusters to recommend services collaboratively. Technically, this approach is enacted around two stages. In the first stage, the available services are divided into small-scale clusters, in logic, for further processing. At the second stage, a collaborative filtering algorithm is imposed on one of the clusters. Since the number of the services in a cluster is much less than the total number of the…
Read More

Dache: A Data Aware Caching for Big-Data Applications Using the MapReduce Framework

Hadoop
Dache: A Data Aware Caching for Big-Data Applications Using the MapReduce Framework The buzzword big-data refers to the large-scale distributed data processing applications that operate on exceptionally large amounts of data. Google’s MapReduce and Apache’s Hadoop, its open-source implementation, are the de facto software systems for big-data applications. An observation of the MapReduce framework is that the framework generates a large amount of intermediate data. Such abundant information is thrown away after the tasks finish, because MapReduce is unable to utilize it. In this paper, we propose Dache, a data-aware cache framework for big-data applications. In Dache, tasks submit their intermediate results to the cache manager. A task queries the cache manager before executing the actual computing work. A novel cache description scheme and a cache request and reply protocol are…
Read More
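
The query-before-compute pattern described in the Dache abstract above can be sketched as follows. This is only an illustration of the general idea, not Dache's cache description scheme or its request and reply protocol; the `CacheManager` class, the `(input_id, operation)` key, and the `run_task` helper are hypothetical names chosen for this example.

```python
class CacheManager:
    """Hypothetical in-memory cache keyed by (input identifier, operation name)."""

    def __init__(self):
        self._store = {}
        self.hits = 0

    def lookup(self, input_id, operation):
        """Return (found, value) so that a cached None is not mistaken for a miss."""
        key = (input_id, operation)
        if key in self._store:
            self.hits += 1
            return True, self._store[key]
        return False, None

    def submit(self, input_id, operation, result):
        """Store an intermediate result for later tasks."""
        self._store[(input_id, operation)] = result


def run_task(cache, input_id, operation, data, compute):
    """Query the cache first; compute and submit the result only on a miss."""
    found, cached = cache.lookup(input_id, operation)
    if found:
        return cached
    result = compute(data)
    cache.submit(input_id, operation, result)
    return result


def word_count(lines):
    """Toy 'map' work standing in for an expensive task."""
    counts = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


cache = CacheManager()
first = run_task(cache, "split-0", "word_count", ["a b a"], word_count)
second = run_task(cache, "split-0", "word_count", ["a b a"], word_count)
```

The second call returns the stored result without running `word_count` again. A real system would also need to decide when cached results become stale, which this sketch does not address.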

Cost Minimization for Big Data Processing in Geo-Distributed Data Centers

Hadoop
Cost Minimization for Big Data Processing in Geo-Distributed Data Centers The explosive growth of demands on big data processing imposes a heavy burden on computation, storage, and communication in data centers, which hence incurs considerable operational expenditure to data center providers. Therefore, cost minimization has become an emergent issue for the upcoming big data era. Different from conventional cloud services, one of the main features of big data services is the tight coupling between data and computation as computation tasks can be conducted only when the corresponding data is available. As a result, three factors, i.e., task assignment, data placement and data movement, deeply influence the operational expenditure of data centers. In this paper, we are motivated to study the cost minimization problem via a joint optimization of these three…
Read More

KASR: A Keyword-Aware Service Recommendation Method on MapReduce for Big Data

Hadoop
KASR: A Keyword-Aware Service Recommendation Method on MapReduce for Big Data Applications Service recommender systems have been shown to be valuable tools for providing appropriate recommendations to users. In the last decade, the number of customers, services and online information has grown rapidly, yielding the big data analysis problem for service recommender systems. Consequently, traditional service recommender systems often suffer from scalability and inefficiency problems when processing or analysing such large-scale data. Moreover, most existing service recommender systems present the same ratings and rankings of services to different users without considering diverse users' preferences, and therefore fail to meet users' personalized requirements. In this paper, we propose a Keyword-Aware Service Recommendation method, named KASR, to address the above challenges. It aims at presenting a personalized service recommendation list and recommending…
Read More

Authorized Public Auditing of Dynamic Big Data Storage on Cloud with Efficient Verifiable Fine-grained Updates

Hadoop
Authorized Public Auditing of Dynamic Big Data Storage on Cloud with Efficient Verifiable Fine-grained Updates Cloud computing opens a new era in IT as it can provide various elastic and scalable IT services in a pay-as-you-go fashion, where its users can reduce the huge capital investments in their own IT infrastructure. In this philosophy, users of cloud storage services no longer physically maintain direct control over their data, which makes data security one of the major concerns of using cloud. Existing research work already allows data integrity to be verified without possession of the actual data file. When the verification is done by a trusted third party, this verification process is also called data auditing, and this third party is called an auditor. However, such schemes in existence suffer from…
Read More

Privacy Preserving Data Analytics for Smart Homes

Hadoop
Privacy Preserving Data Analytics for Smart Homes A framework for maintaining security and preserving privacy in the analysis of sensor data from smart homes, without compromising data utility, is presented. Storing the personally identifiable data as hashed values withholds identifiable information from any computing nodes. However, the very nature of smart home data analytics is establishing preventive care. Data processing results should be identifiable to certain users responsible for direct care. Through a separate encrypted identifier dictionary with hashed and actual values of all unique sets of identifiers, we suggest re-identification of any data processing results. However, the level of re-identification needs to be controlled, depending on the type of user accessing the results. Generalization and suppression on identifiers from the identifier dictionary before re-introduction could achieve different levels of…
Read More
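
The hash-then-re-identify idea in the smart homes abstract above might look roughly like the sketch below. It is an assumption-laden illustration, not the framework from the paper: the identifier dictionary is kept as a plain in-memory dict rather than an encrypted one, access control is reduced to a single boolean, and the names `hash_identifier`, `pseudonymize`, `reidentify`, and the record fields are all hypothetical.

```python
import hashlib


def hash_identifier(identifier, salt):
    """Return a salted SHA-256 hex digest of an identifier."""
    return hashlib.sha256((salt + identifier).encode("utf-8")).hexdigest()


def pseudonymize(records, salt):
    """Replace the identifier in each record with its hash.

    Also builds the identifier dictionary (hash -> actual value), which in a
    real deployment would be stored separately and encrypted (omitted here).
    """
    dictionary = {}
    hashed_records = []
    for record in records:
        digest = hash_identifier(record["resident"], salt)
        dictionary[digest] = record["resident"]
        hashed_records.append({"resident": digest, "reading": record["reading"]})
    return hashed_records, dictionary


def reidentify(result, dictionary, allowed):
    """Restore the actual identifier only for users allowed to see it."""
    if not allowed:
        return result
    return {**result, "resident": dictionary[result["resident"]]}


# Made-up sample data for illustration only.
records = [{"resident": "resident-17", "reading": 36.9}]
hashed, id_dictionary = pseudonymize(records, salt="example-salt")
for_carer = reidentify(hashed[0], id_dictionary, allowed=True)
for_analyst = reidentify(hashed[0], id_dictionary, allowed=False)
```

Compute nodes would only ever see `hashed`, while a user responsible for direct care gets the identifier restored. The graded levels of re-identification via generalization and suppression mentioned in the abstract are not modeled here.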

MRPrePost-A parallel algorithm adapted for mining big data

Hadoop
MRPrePost-A parallel algorithm adapted for mining big data With the explosive growth in data, using data mining techniques to mine association rules and thereby find valuable information hidden in big data has become increasingly important. Various existing data mining techniques derive association rules and relevant knowledge by mining frequent itemsets, but with the rapid arrival of the era of big data, traditional data mining algorithms have been unable to meet the analysis needs of large data. In view of this, this paper proposes a parallel algorithm adapted to big data mining, MRPrePost. MRPrePost is a parallel algorithm based on the Hadoop platform, which improves PrePost by adding a prefix pattern and, on this basis, incorporates parallel design ideas, so that the MRPrePost algorithm can adapt to mining…
Read More

Enabling Efficient Access Control with Dynamic Policy Updating for Big Data in the Cloud

Hadoop
Enabling Efficient Access Control with Dynamic Policy Updating for Big Data in the Cloud Due to the high volume and velocity of big data, it is an effective option to store big data in the cloud, because the cloud has capabilities of storing big data and processing a high volume of user access requests. Attribute-Based Encryption (ABE) is a promising technique to ensure the end-to-end security of big data in the cloud. However, policy updating has always been a challenging issue when ABE is used to construct access control schemes. A trivial implementation is to let data owners retrieve the data and re-encrypt it under the new access policy, and then send it back to the cloud. This method incurs a high communication overhead and heavy computation burden on data…
Read More

Load Balancing for Privacy-Preserving Access to Big Data in Cloud

Hadoop
Load Balancing for Privacy-Preserving Access to Big Data in Cloud In the era of big data, many users and companies start to move their data to cloud storage to simplify data management and reduce data maintenance cost. However, security and privacy issues become major concerns because third-party cloud service providers are not always trustworthy. Although data contents can be protected by encryption, the access patterns that contain important information are still exposed to clouds or malicious attackers. In this paper, we apply the ORAM algorithm to enable privacy-preserving access to big data that are deployed in distributed file systems built upon hundreds or thousands of servers in a single or multiple geo-distributed cloud sites. Since the ORAM algorithm would lead to serious access load imbalance among storage servers, we…
Read More

Secure Sensitive Data Sharing on a Big Data Platform

Hadoop
Secure Sensitive Data Sharing on a Big Data Platform Users store vast amounts of sensitive data on a big data platform. Sharing sensitive data will help enterprises reduce the cost of providing users with personalized services and provide value-added data services. However, secure data sharing is problematic. This paper proposes a framework for secure sensitive data sharing on a big data platform, including secure data delivery, storage, usage, and destruction on a semi-trusted big data sharing platform. We present a proxy re-encryption algorithm based on heterogeneous ciphertext transformation and a user process protection method based on a virtual machine monitor, which provides support for the realization of system functions. The framework protects the security of users’ sensitive data effectively and shares these data safely. At the same time, data owners…
Read More

Building a Big Data Analytics Service Framework for Mobile Advertising and Marketing

Hadoop
Building a Big Data Analytics Service Framework for Mobile Advertising and Marketing The unprecedented growth in mobile device adoption and the rapid advancement of mobile technologies and wireless networks have created new opportunities in mobile marketing and advertising. The opportunities for mobile marketers and advertisers include real-time customer engagement, improved customer experience, stronger brand loyalty, increased revenues, and higher customer satisfaction. The challenges for marketers and advertisers, however, include how to analyze the troves of data that mobile devices emit and how to derive customer engagement insights from the mobile data. This research paper addresses the challenge by developing a Big Data mobile marketing analytics and advertising recommendation framework. The proposed framework supports both offline and online advertising operations in which the selected analytics techniques are used to provide advertising recommendations…
Read More

PaWI: Parallel Weighted Itemset Mining by means of MapReduce

Hadoop
PaWI: Parallel Weighted Itemset Mining by means of MapReduce Frequent itemset mining is an exploratory data mining technique that has fruitfully been exploited to extract recurrent co-occurrences between data items. Since in many application contexts items are enriched with weights denoting their relative importance in the analyzed data, pushing item weights into the itemset mining process, i.e., mining weighted itemsets rather than traditional itemsets, is an appealing research direction. Although many efficient in-memory weighted itemset mining algorithms are available in literature, there is a lack of parallel and distributed solutions which are able to scale towards Big Weighted Data. This paper presents a scalable frequent weighted itemset mining algorithm based on the MapReduce paradigm. To demonstrate its actionability and scalability, the proposed algorithm was tested on a real Big dataset collecting…
Read More

Performance Analysis of Scheduling Algorithms for Dynamic Workflow Applications

Hadoop
Performance Analysis of Scheduling Algorithms for Dynamic Workflow Applications In recent years, Big Data has changed how we do computing. Even though we have large-scale infrastructure such as Cloud computing and several platforms such as Hadoop available to process the workloads, Big Data has introduced a high level of uncertainty in how an application processes the data. Data in general comes in different formats, at different speeds and in different volumes. Processing consists of not just one application but several applications combined to form a workflow to achieve a certain goal. As the data vary and arrive at different speeds, application execution and resource needs will also vary at runtime. These are called dynamic workflows. One can say that we can just throw more and more…
Read More

BFC: High-Performance Distributed Big-File Cloud Storage Based On Key-Value Store

Hadoop
BFC: High-Performance Distributed Big-File Cloud Storage Based On Key-Value Store Nowadays, cloud-based storage services are rapidly growing and becoming an emerging trend in the data storage field. There are many problems when designing an efficient storage engine for cloud-based systems with requirements such as big-file processing, lightweight meta-data, low latency, parallel I/O, deduplication, distribution, and high scalability. Key-value stores have played an important role and shown many advantages in solving those problems. This paper presents Big File Cloud (BFC) with its algorithms and architecture to handle most of the problems in a big-file cloud storage system based on a key-value store. It does so by proposing a low-complexity, fixed-size meta-data design, which supports fast and highly concurrent, distributed file I/O, several algorithms for resumable upload and download, and a simple data deduplication method for static data. This…
Read More

Privacy Preserving Data Analysis in Mental Health Research

Hadoop
Privacy Preserving Data Analysis in Mental Health Research The digitalization of mental health records and psychotherapy notes has made individual mental health data more readily accessible to a wide range of users including patients, psychiatrists, researchers, statisticians, and data scientists. However, increased accessibility of highly sensitive mental records threatens the privacy and confidentiality of psychiatric patients. The objective of this study is to examine privacy concerns in mental health research and develop a privacy preserving data analysis approach to address these concerns. In this paper, we demonstrate the key inadequacies of the existing privacy protection approaches applicable to use of mental health records and psychotherapy notes in records-based research. We then develop a privacy-preserving data analysis approach that enables researchers to protect the privacy of people with mental illness once…
Read More

Recent Advances in Autonomic Provisioning of Big Data Applications on Clouds

Hadoop
Recent Advances in Autonomic Provisioning of Big Data Applications on Clouds Cloud computing [1] assembles large networks of virtualized ICT services such as hardware resources (such as CPU, storage, and network), software resources (such as databases, application servers, and web servers) and applications. In industry these services are referred to as infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS). Mainstream ICT powerhouses such as Amazon, HP, and IBM are heavily investing in the provision and support of public cloud infrastructure. Cloud computing is rapidly becoming a popular infrastructure of choice among all types of organisations. Despite some initial security concerns and technical issues, an increasing number of organisations have moved their applications and services into “The Cloud”. These applications range from generic…
Read More

Processing Geo-Dispersed Big Data in an Advanced MapReduce Framework

Hadoop
Processing Geo-Dispersed Big Data in an Advanced MapReduce Framework Big data takes many forms, including messages in social networks, data collected from various sensors, captured videos, and so on. Big data applications aim to collect and analyze large amounts of data, and efficiently extract valuable information from the data. A recent report shows that the amount of data on the Internet is about 500 billion GB. With the fast increase of mobile devices that can perform sensing and access the Internet, large amounts of data are generated daily. In general, big data has three features: large volume, high velocity and large variety [1]. The International Data Corporation (IDC) predicted that the total amount of data generated in 2020 globally will be about 35 ZB. Facebook needs to process about 1.3…
Read More

Deduplication on Encrypted Big Data in Cloud

Hadoop
Deduplication on Encrypted Big Data in Cloud Cloud computing offers a new way of service provision by re-arranging various resources over the Internet. The most important and popular cloud service is data storage. In order to preserve the privacy of data holders, data are often stored in the cloud in an encrypted form. However, encrypted data introduce new challenges for cloud data deduplication, which becomes crucial for big data storage and processing in the cloud. Traditional deduplication schemes cannot work on encrypted data. Existing solutions for encrypted data deduplication suffer from security weaknesses. They cannot flexibly support data access control and revocation. Therefore, few of them can be readily deployed in practice. In this paper, we propose a scheme to deduplicate encrypted data stored in the cloud based on ownership challenge and proxy…
Read More

Big data, big knowledge: big data for personalised healthcare

Hadoop
Big data, big knowledge: big data for personalised healthcare The idea that the purely phenomenological knowledge that we can extract by analysing large amounts of data can be useful in healthcare seems to contradict the desire of VPH researchers to build detailed mechanistic models for individual patients. But in practice no model is ever entirely phenomenological or entirely mechanistic. We propose in this position paper that big data analytics can be successfully combined with VPH technologies to produce robust and effective in silico medicine solutions. In order to do this, big data technologies must be further developed to cope with some specific requirements that emerge from this application. Such requirements are: working with sensitive data; analytics of complex and heterogeneous data spaces, including non-textual information; distributed data management under security…
Read More

A Time Efficient Approach for Detecting Errors in Big Sensor Data on Cloud

Hadoop
A Time Efficient Approach for Detecting Errors in Big Sensor Data on Cloud Big sensor data is prevalent in both industry and scientific research applications, where the data is generated with such high volume and velocity that it is difficult to process using on-hand database management tools or traditional data processing applications. Cloud computing provides a promising platform for addressing this challenge as it provides a flexible stack of massive computing, storage, and software services in a scalable manner at low cost. Some techniques have been developed in recent years for processing sensor data on cloud, such as sensor-cloud. However, these techniques do not provide efficient support for fast detection and locating of errors in big sensor data sets. For fast data error detection in big sensor data sets, in this paper, we…
Read More

A data mining framework to analyze road accident data

Hadoop
A data mining framework to analyze road accident data Road and traffic accidents are uncertain and unpredictable incidents and their analysis requires knowledge of the factors affecting them. Road and traffic accidents are defined by a set of variables which are mostly of discrete nature. The major problem in the analysis of accident data is its heterogeneous nature [1]. Thus, heterogeneity must be considered during analysis of the data; otherwise, some relationships within the data may remain hidden. Although researchers have used segmentation of the data to reduce this heterogeneity using measures such as expert knowledge, there is no guarantee that this will lead to an optimal segmentation consisting of homogeneous groups of road accidents [2]. Therefore, cluster analysis can assist the segmentation of road accidents.
Read More

Big Data Challenges in Smart Grid IoT (WAMS) Deployment

Hadoop
Big Data Challenges in Smart Grid IoT (WAMS) Deployment Internet of Things adoption across industries has proven to be beneficial in providing business value by transforming the way data is utilized in decision making and visualization. The power industry has long struggled with traditional ways of operating and has suffered from issues like instability, blackouts, etc. The move towards the smart grid has thus received a lot of acceptance. This paper presents the Internet of Things deployment in the grid, namely WAMS, and the challenges it presents in terms of the Big Data it aggregates. Better insight into the problem is provided with the help of Indian Grid case studies.
Read More

Review Based Service Recommendation for Big Data

Hadoop
Review Based Service Recommendation for Big Data The success of web 2.0 brings online information overload. An exponential growth of customers, services and online information has been observed in the last decade. It yields a big data investigation problem for service recommendation systems. Traditional recommender systems often suffer from scalability, lack of security and efficiency problems. Users' preferences are almost ignored. So, the need for a robust recommendation system is greater nowadays. In this paper, we present review based service recommendation to dynamically recommend services to the users. Keywords are extracted from passive users' reviews and a rating value is given to every new keyword observed in the dataset. Sentiment analysis is performed on these rating values and a top-k services recommendation list is provided to users. To make the system more…
Read More

A Profile-Based Big Data Architecture for Agricultural Context

Hadoop
A Profile-Based Big Data Architecture for Agricultural Context Bringing Big data technologies into agriculture presents a significant challenge; at the same time, this technology contributes effectively to many countries’ economic and social development. In this work, we study environmental data provided by precision agriculture information technologies, which represent a crucial source of data that needs to be wisely managed and analyzed with appropriate methods and tools in order to extract meaningful information. Our main purpose in this paper is to propose an effective Big data architecture based on a profiling system which can assist (among others) producers, consulting companies, public bodies and research laboratories to make better decisions by providing them with real-time data processing, and a dynamic big data service composition method, to enhance and monitor the agricultural…
Read More

Achieving Efficient and Privacy-Preserving Cross-Domain Big Data Deduplication in Cloud

Hadoop
Achieving Efficient and Privacy-Preserving Cross-Domain Big Data Deduplication in Cloud Secure data deduplication can significantly reduce the communication and storage overheads in cloud storage services, and has potential applications in our big data-driven society. Existing data deduplication schemes are generally designed to either resist brute-force attacks or ensure efficiency and data availability, but not both. We are also not aware of any existing scheme that achieves accountability, in the sense of reducing duplicate information disclosure (e.g., to determine whether plaintexts of two encrypted messages are identical). In this paper, we investigate a three-tier cross-domain architecture, and propose an efficient and privacy-preserving big data deduplication scheme in cloud storage (hereafter referred to as EPCDD). EPCDD achieves both privacy preservation and data availability, and resists brute-force attacks. In addition, we take accountability…
Read More

A Queuing Method for Adaptive Censoring in Big Data Processing

Hadoop
A Queuing Method for Adaptive Censoring in Big Data Processing As more than 2.5 quintillion bytes of data are generated every day, the era of big data is undoubtedly upon us. Running analysis on extensive datasets is a challenge. Fortunately, a significant percentage of the data accrued can be omitted while maintaining a certain quality of statistical inference in many cases. Censoring provides us a natural option for data reduction. However, the data chosen by censoring occur nonuniformly, which may not relieve the computational resource requirement. In this paper, we propose a dynamic queuing method to smooth out the data processing without sacrificing the convergence performance of censoring. The proposed method entails simple, closed-form updates, and has no loss in terms of accuracy compared to the original adaptive censoring method. Simulation…
Read More
