22 Feb
Problem 4: Statistical Description of Multivariate Data for a Real-World Dataset [40 points]
To complete this task, use the crx.data file, which contains data collected from credit card applications. All attribute names and values have been changed to meaningless symbols to protect the confidentiality of the data. The dataset was downloaded from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets.php).
This dataset is interesting because there is a good mix of attributes — continuous, nominal with small numbers of values, and nominal with larger numbers of values. There are also a few missing values. Read the data in R using the following command.
data <- read.table("path/crx.data", sep = ",")
Here, replace path with the location of the file crx.data on your computer. After loading the data in R, you can access each column using data[ , 1], data[ , 2], … , data[ , 15]. Some numeric columns may be loaded in character format (read.table() does this when a column contains non-numeric entries such as missing-value markers), so you will have to convert those columns from character to numeric using the as.numeric() function as follows. You can view the data using the View(data) command.
attribute1 <- as.numeric(data[ , 2])
For missing values, NAs will be introduced by coercion.
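The conversion step can be sketched on a small toy data frame, since the real file path is left as a placeholder above. This is only an illustration: the column values are made up, and it assumes that missing entries appear as the "?" marker, which should be checked against the actual file.

```r
# Toy stand-in for two columns of crx.data; the values are made up for illustration.
# Assumption: missing entries are marked with "?" in the file.
toy <- data.frame(V1 = c("a", "b", "?", "a"),
                  V2 = c("30.83", "?", "24.50", "27.83"),
                  stringsAsFactors = FALSE)

# as.numeric() turns entries that are not numbers (here "?") into NA.
# R normally warns "NAs introduced by coercion"; the warning is silenced here.
v2 <- suppressWarnings(as.numeric(toy[, 2]))

# Count the missing values in the converted column.
n_missing <- sum(is.na(v2))
print(v2)
print(n_missing)
```

If the marker really is "?", another option is to pass na.strings = "?" to read.table(), so that the marker is read as NA directly and the numeric columns are parsed without a separate conversion.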
There are 16 columns in the data: the first 15 columns are the attributes and the 16th column is the label. You only have to analyze the attributes.
- Find which attributes are the nominal attributes and which are continuous attributes.
- Identify the attribute/attributes with missing values (having NA). Drop the attributes with missing values from the data.
- Calculate the central tendency of the remaining attributes. Remember that for nominal attributes you can only calculate the mode.
- Calculate the five-number summary of the numeric attributes.
- Show box plots for the numeric attributes and identify the attributes having outliers.
- Show pairwise scatter plots of the numeric attributes. Inspect the scatter plots and state whether the attributes in each pair are negatively correlated, positively correlated, or uncorrelated.
*Do not forget to label the axes of the plots.
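The steps in the list above might be sketched in base R as follows. This is a minimal illustration on a made-up toy data frame, not a solution for crx.data: the column names and values are hypothetical, and stat_mode is a hypothetical helper, because base R has no built-in function for the statistical mode.

```r
# Toy data frame standing in for the converted attributes; values are made up.
toy <- data.frame(
  A1 = c("a", "b", "a", "a", "b", "a"),        # nominal
  A2 = c(30.8, 58.7, NA, 27.8, 20.2, 32.1),    # continuous, one missing value
  A3 = c(0.0, 4.5, 0.5, 1.5, 5.6, 4.0),        # continuous
  A8 = c(1.25, 3.04, 1.50, 3.75, 1.71, 2.50),  # continuous
  stringsAsFactors = FALSE
)

# Drop every attribute (column) that contains at least one NA.
has_na <- colSums(is.na(toy)) > 0
clean <- toy[, !has_na, drop = FALSE]

# Hypothetical helper: most frequent value of a vector (first one on ties).
stat_mode <- function(x) {
  counts <- table(x)
  names(counts)[which.max(counts)]
}

# Split the remaining attributes into numeric and nominal ones.
is_num <- sapply(clean, is.numeric)
numeric_part <- clean[, is_num, drop = FALSE]
nominal_part <- clean[, !is_num, drop = FALSE]

# Central tendency: mode for nominal attributes, mean and median for numeric ones.
modes <- sapply(nominal_part, stat_mode)
means <- sapply(numeric_part, mean)
medians <- sapply(numeric_part, median)

# fivenum() returns minimum, lower hinge, median, upper hinge, maximum.
five <- sapply(numeric_part, fivenum)

# Box plots of the numeric attributes, with labeled axes.
boxplot(numeric_part, xlab = "Attribute", ylab = "Value")

# Pairwise scatter plots; the panel labels name each attribute.
pairs(numeric_part, labels = names(numeric_part))

# Correlation matrix as a numeric cross-check of the visual impression.
cor_matrix <- cor(numeric_part)
print(five)
print(cor_matrix)
```

Points drawn beyond the whiskers of a box plot are the candidates for outliers, and the sign of each entry in cor_matrix can be compared with the direction of the trend seen in the corresponding scatter plot.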