Prediction and Classification of Cardiac Arrhythmia
An irregular heartbeat may be harmless or life-threatening, so both detecting an arrhythmia and classifying its type accurately are important. Arrhythmia can be diagnosed by recording the heart's electrical activity with an electrocardiograph (ECG) and then analyzing the recorded data. Parameter values extracted from the ECG waveforms can be combined with other patient information, such as age and medical history, to detect arrhythmia. However, it can be difficult for a doctor to examine long-duration ECG recordings and spot minute irregularities, so automating arrhythmia diagnosis with machine learning can be very helpful. This project applies several machine learning algorithms, including Naive Bayes, SVM, Random Forests, and Neural Networks, to predict the presence of arrhythmia and classify it into different categories.

Data Set

The dataset for the project is taken from the UCI Machine Learning Repository, https://archive.ics.uci.edu/ml/datasets/Arrhythmia (1 CSV file, 1 information file). There are 452 rows, each representing the medical record of a different patient, and 279 attributes, such as age, weight, and the patient’s ECG-related data. The data set is labeled with 16 classes. Class 1 corresponds to a normal ECG with no arrhythmia, classes 2 to 15 correspond to different types of arrhythmia, and class 16 covers patients left unclassified. The data set is heavily biased towards the no-arrhythmia case: 245 instances belong to class 1, 185 instances are split among the 14 arrhythmia classes, and the remaining 22 are unclassified. Three of the classes, those related to the degree of AV block, do not appear in the dataset at all. The labels were obtained from cardiologists and are treated as the gold standard.
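To make the class imbalance concrete, the following is a minimal sketch in Python that groups the 16 class labels into the three categories described above and computes the accuracy a trivial "always predict normal" classifier would reach. The function names `group_of` and `summarize` are hypothetical, and the label list is synthetic: it reproduces only the group totals quoted in the text (245, 185, 22), not the real per-class breakdown.

```python
from collections import Counter


def group_of(label):
    """Map a class label (1..16) to the coarse group described in the text."""
    if label == 1:
        return "normal"
    if 2 <= label <= 15:
        return "arrhythmia"
    if label == 16:
        return "unclassified"
    raise ValueError(f"unexpected class label: {label}")


def summarize(labels):
    """Return group counts, group shares, and the majority-class baseline."""
    counts = Counter(group_of(y) for y in labels)
    total = sum(counts.values())
    shares = {group: n / total for group, n in counts.items()}
    # Accuracy of a classifier that always predicts the largest group.
    baseline = max(counts.values()) / total
    return counts, shares, baseline


# Synthetic labels: all arrhythmia records are placed in class 2 as a
# placeholder, because the text gives only the total for classes 2..15.
labels = [1] * 245 + [2] * 185 + [16] * 22

counts, shares, baseline = summarize(labels)
print(dict(counts))
print({group: round(share, 3) for group, share in shares.items()})
print("majority-class baseline accuracy:", round(baseline, 3))
```

With these totals, a model that always answers "normal" is already right for about 54% of the records, so plain accuracy on its own says little about how well the arrhythmia classes are recognized.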
The main challenges in processing this data set are the small number of training examples relative to the number of features (452 records against 279 attributes), the heavy bias towards normal ECGs, missing feature values (about 0.33%), and a mix of continuous and categorical features.
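Two of these challenges, missing values and class imbalance, have common generic remedies. The sketch below illustrates them in plain Python under stated assumptions; it is not necessarily what this project used. It assumes missing entries are marked with the token `?`, fills them with the column median, and computes inverse-frequency class weights of the form n_samples / (n_classes * count). The helper names and the toy rows are hypothetical.

```python
from collections import Counter
from statistics import median

MISSING = "?"  # assumed marker for a missing feature value


def missing_fraction(rows):
    """Fraction of all cells that carry the missing-value marker."""
    cells = [value for row in rows for value in row]
    return sum(value == MISSING for value in cells) / len(cells)


def impute_median(rows):
    """Convert cells to float, replacing missing ones with the column median.

    For categorical columns the most frequent value would be a more natural
    fill; the median is used for every column here to keep the sketch short.
    """
    n_cols = len(rows[0])
    medians = []
    for j in range(n_cols):
        observed = [float(row[j]) for row in rows if row[j] != MISSING]
        # Fall back to 0.0 if a column has no observed values at all.
        medians.append(median(observed) if observed else 0.0)
    return [
        [medians[j] if row[j] == MISSING else float(row[j]) for j in range(n_cols)]
        for row in rows
    ]


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * m) for c, m in counts.items()}


# Toy records (not real patient data): three features, two missing cells.
rows = [
    ["63", "80", "?"],
    ["45", "?", "1"],
    ["51", "64", "0"],
    ["70", "72", "1"],
]
labels = [1, 1, 1, 2]

filled = impute_median(rows)
weights = balanced_class_weights(labels)
print("missing fraction:", round(missing_fraction(rows), 3))
print("imputed rows:", filled)
print("class weights:", weights)
```

Weights of this kind make errors on rare classes cost more during training, which is one way to counter the bias towards normal ECGs. They do not address the small ratio of records to features, which usually calls for feature selection or regularization.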