Problem 1. Explain what each of the following R expressions does. You can run them in R and check the
results.
(a) c(1, 17, -6, 3)
(b) seq(1, 5, by=0.5)
(c) seq(0, 10, length=5)
(d) rep(0, 5)
(e) rep(1:3, 4)
(f) rep(4:6, 1:3)
(g) sample(1:3)
(h) sample(1:5, size=3, replace=FALSE)
(i) sample(c(2,5,3), size=4, replace=TRUE)
(j) sample(1:2, size=10, prob=c(1,3), replace=TRUE)
(k) c(1, 2, 3) + c(4, 5, 6)
(l) max(1:10)
(m) min(1:10)
(n) range(1:10)
(o) matrix(1:12, nrow=3, ncol=4)
(q) Let a <- c(1, 2, 3), b <- c(10, 20, 30), c <- c(100, 200, 300), d <- c(1000, 2000, 3000). What does
the function rbind(a, b, c, d) do? What does cbind(a, b, c, d) do?
HOMEWORK 2 DUE DATE: FRIDAY, SEPTEMBER 25 AT 11:59 PM
(r) Let C be the following matrix:
    a   b    c     d
    1  10  100  1000
    2  20  200  2000
    3  30  300  3000
What is sum(C)? What is apply(C, 1, sum)? What is apply(C, 2, sum)?
(s) Let movies <- c("SPYDERMAN", "BATMAN", "VERTIGO", "CHINATOWN"). What does
lapply(movies, tolower) do? Notice that tolower converts the characters of a character vector to
lower case.
(t) Let x <- factor(c("alpha", "beta", "gamma", "alpha", "beta")). What does the function levels(x) return?
(u) c <- 35:50
(v) c(1, 2, 3) + c(4, 5, 6)
(w) c(1, 2, 3, 4) + c(10, 20)
(x) sqrt(c(100, 225, 400))
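Several of the items above rely on element-wise arithmetic, recycling of shorter vectors, and the MARGIN argument of apply(). The following is a minimal sketch of those behaviours. The vectors u and v and the matrix m are made up for illustration and are deliberately different from the exercise inputs.

```r
# Hypothetical inputs, chosen so they do not repeat the exercise values.
u <- c(2, 4, 6, 8)
v <- c(1, 10)

# Equal lengths: the operation is applied element by element.
same_length <- u + u

# Unequal lengths: the shorter vector is recycled to the longer length,
# so v is treated as c(1, 10, 1, 10) here.
recycled <- u + v

# rep() with a vector of counts repeats each element that many times.
repeated <- rep(c("x", "y"), times = c(2, 1))

# apply() with MARGIN = 1 works over rows; MARGIN = 2 works over columns.
m <- matrix(1:6, nrow = 2)        # filled column by column
row_sums <- apply(m, 1, sum)
col_sums <- apply(m, 2, sum)

print(recycled)
print(repeated)
print(row_sums)
print(col_sums)
```

If the longer length is not a multiple of the shorter one, R still recycles but issues a warning.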
Problem 2. Create the following vectors in R.
a = (5, 10, 15, 20, …, 160)
b = (87, 86, 85, …, 56)
Use vector arithmetic to multiply these vectors and call the result d. Select subsets of d to identify the
following.
(a) What are the 19th, 20th, and 21st elements of d?
(b) What are all of the elements of d which are less than 2000?
(c) How many elements of d are greater than 6000?
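The three selections in Problem 2 correspond to three common subsetting patterns: by position, by logical condition, and counting matches. A small sketch on a made-up vector z (not the vectors a, b, or d from the problem) might look like this.

```r
# Hypothetical vector built with vector arithmetic, for illustration only.
z <- seq(10, 100, by = 10) * (10:1)

# Select by position: the 2nd through 4th elements.
by_position <- z[2:4]

# Select by condition: a logical vector picks the elements where it is TRUE.
below_200 <- z[z < 200]

# Count matches: TRUE counts as 1 and FALSE as 0 when summed.
n_above_250 <- sum(z > 250)

print(by_position)
print(below_200)
print(n_above_250)
```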
Problem 3. This exercise relates to the College data set, which can be found in the file College.csv. It
contains a number of variables for 777 different universities and colleges in the US. The variables are
• Private : Public/private indicator
• Apps : Number of applications received
• Accept : Number of applicants accepted
• Enroll : Number of new students enrolled
• Top10perc : New students from top 10% of high school class
• Top25perc : New students from top 25% of high school class
• F.Undergrad : Number of full-time undergraduates
BUSINESS DATA MINING (IDS 472)
• P.Undergrad : Number of part-time undergraduates
• Outstate : Out-of-state tuition
• Room.Board : Room and board costs
• Books : Estimated book costs
• Personal : Estimated personal spending
• PhD : Percent of faculty with Ph.D.’s
• Terminal : Percent of faculty with terminal degree
• S.F.Ratio : Student/faculty ratio
• perc.alumni : Percent of alumni who donate
• Expend : Instructional expenditure per student
• Grad.Rate : Graduation rate
(a) Read the data into R. Call the loaded data “college”. Explain how you do this.
(b) How many variables are in this data set? What are their measurement types? How do you obtain this
information?
(c) Use the function colnames() to change the “Top10perc” and “Top25perc” variable names to
“Top10” and “Top25”.
(d) Look at the data. You should notice that the first column is just the name of each university.
We don’t really want R to treat this as data. However, it may be handy to have these names
for later. Try the following command:
> rownames(college) <- college[,1]
You should see that there is now a row.names column with the name of each university recorded.
This means that R has given each row a name corresponding to the appropriate university. R
will not try to perform calculations on the row names. However, we still need to eliminate the
first column in the data where the names are stored. Write code to eliminate the first column.
(e) Add a column to indicate the acceptance rate for each university (acceptance rate = number of
accepted applications / number of applications received).
(f) Provide summary statistics for the numerical variables in the data set.
(g) Use the pairs() function to produce a scatterplot matrix of the first ten columns or variables of
the data. Recall that you can reference the first ten columns of a matrix A using A[,1:10]. Can
you observe any useful information in the plots?
(h) Use the boxplot() function to produce side-by-side boxplots of Outstate versus Private. Do you
observe any useful information in this plot?
(i) Create a new qualitative variable, called Elite, by binning the Top10perc variable. We are going
to divide universities into two groups based on whether or not the proportion of students coming
from the top 10% of their high school classes exceeds 50%. Follow the code below.
> Elite <- rep("No", nrow(college))
> Elite[college$Top10perc > 50] = "Yes"
> Elite = as.factor(Elite)
> college = data.frame(college,Elite)
i. Explain each line of the above code.
ii. Use the summary() function to see how many elite universities there are. Now use the
plot() function to produce side-by-side boxplots of Outstate versus Elite.
(j) Use the hist() function to produce some histograms with differing numbers of bins for a few of
the quantitative variables. You may find the command par(mfrow=c(2,2)) useful: it will divide
the print window into four regions so that four plots can be made simultaneously. Modifying
the arguments to this function will divide the screen in other ways.
(k) What is the average room and board cost of private schools?
(l) Create a new binary variable that is 1 if the student/faculty ratio is greater than 0.5 and 0
otherwise.
(m) Compare the distribution of out of state tuition for private and public colleges.
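Several parts of Problem 3 use the same handful of data frame operations: moving a name column into the row names, dropping a column, renaming a column, adding a derived column, and summarising by group. The sketch below shows one way these could look on a tiny made-up data frame. The data frame toy, all of its values, and the 0.75 threshold are assumptions for illustration and are not taken from College.csv.

```r
# Toy stand-in for a data set whose first column holds names.
# All values are invented for illustration.
toy <- data.frame(
  X        = c("School A", "School B", "School C", "School D"),
  Private  = c("Yes", "No", "Yes", "No"),
  Apps     = c(1000, 2000, 500, 4000),
  Accept   = c(800, 1000, 400, 3000),
  Outstate = c(20000, 9000, 15000, 11000)
)
# With a real file, the first step would be something like
# read.csv("College.csv"); the path depends on where the file is saved.

rownames(toy) <- toy[, 1]   # use the name column as row names
toy <- toy[, -1]            # then drop that column from the data

str(toy)                    # number of variables and their types

# Rename one column by matching its current name.
colnames(toy)[colnames(toy) == "Apps"] <- "Applications"

# Derived column and a 1/0 indicator built from a comparison.
toy$AcceptRate <- toy$Accept / toy$Applications
toy$HighRate   <- as.integer(toy$AcceptRate > 0.75)

# Mean of a numeric column within each level of a categorical column.
avg_outstate <- tapply(toy$Outstate, toy$Private, mean)
print(avg_outstate)
```

One thing to watch for: once a column has been renamed, as in part (c), later code that refers to the old name (for example college$Top10perc in part (i)) needs to use the new name instead.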
Problem 4. This exercise involves the “Auto” data set.
(a) Remove the missing values from this data set.
(b) What is the range of each quantitative predictor? You can answer this using the range() function.
(c) What are the mean and standard deviation of each quantitative predictor?
(d) Remove the 10th through 85th observations. What is the range, mean, and standard deviation
of each predictor in the subset of the data that remains?
(e) Using the full data set, investigate the predictors graphically, using scatterplots or other tools of
your choice. Create some plots highlighting the relationships among the predictors. Comment
on your findings.
(f) Suppose that we wish to predict gas mileage (mpg) on the basis of the other variables. Do your
plots suggest that any of the other variables might be useful in predicting mpg? Justify your
answer.
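The first four parts of Problem 4 can be approached with na.omit(), sapply(), and negative row indexing. The following is a rough sketch on a made-up data frame; the values and the rows removed are assumptions for illustration and do not come from the Auto data.

```r
# Toy data with missing values; all numbers are invented.
toy <- data.frame(
  mpg        = c(18, 15, NA, 16, 17, 30),
  horsepower = c(130, 165, 150, NA, 140, 70),
  name       = c("a", "b", "c", "d", "e", "f")
)

# Drop every row that contains at least one NA.
clean <- na.omit(toy)

# Keep only the numeric columns, then apply a function to each column.
num    <- clean[, sapply(clean, is.numeric)]
ranges <- sapply(num, range)   # 2 x k matrix: min in row 1, max in row 2
means  <- sapply(num, mean)
sds    <- sapply(num, sd)

# Negative indices remove rows: here the 2nd through 3rd remaining rows.
remaining <- num[-(2:3), ]

print(ranges)
print(means)
print(sds)
print(nrow(remaining))
```

If a file codes missing values with a marker other than NA, the na.strings argument of read.csv() can map that marker to NA when the data is read.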
Problem 5. FiveThirtyEight, a data journalism site devoted to politics, sports, science, economics,
and culture, recently published a series of articles on gun deaths in America. Gun violence in the
United States is a significant political issue, and while reducing gun deaths is a noble goal, we must first
understand the causes and patterns in gun violence in order to craft appropriate policies. As part of the
project, FiveThirtyEight collected data from the Centers for Disease Control and Prevention, as well as
other governmental agencies and non-profits, on all gun deaths in the United States from 2012-2014. You
can find this dataset, called "gun deaths.csv", on Blackboard.
(a) Generate a data frame that summarizes the number of gun deaths per month.
(b) Generate a bar chart of the number of gun deaths per month with labels on the x-axis. That is, each month should be labeled “Jan”,
“Feb”, “Mar”, and so on.
(c) Generate a bar chart that shows the number of gun deaths associated with each type of intent
(cause of death). The bars should be sorted from highest to lowest values.
(d) Generate a boxplot visualizing the age of gun death victims, by sex. Print the average age of
female gun death victims.
Answer the following questions. Generate appropriate figures/tables to support your conclusions.
(e) How many white males with at least a high school education were killed by guns in 2012?
(f) Which season of the year has the most gun deaths? Assume that
– Winter = January – March
– Spring = April – June
– Summer = July – September
– Fall = October – December
– Hint: You need to convert a continuous variable into a categorical variable.
(g) Are whites who are killed by guns more likely to die because of suicide or homicide? How does
this compare to blacks and Hispanics?
(h) Are police-involved gun deaths significantly different from other gun deaths? Assess the relationship between police involvement and other variables.
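Counting records per category, labelling months, sorting bars, and converting a numeric month into a season can all be done with base R functions such as table(), factor(), sort(), and cut(). The sketch below uses a handful of made-up records. The column names month and intent are assumptions and may not match the columns in the actual file.

```r
# Made-up records for illustration; column names are assumptions.
deaths <- data.frame(
  month  = c(1, 1, 2, 7, 7, 7, 12),
  intent = c("Suicide", "Homicide", "Suicide", "Suicide",
             "Accidental", "Homicide", "Suicide")
)

# Turn the month number into a factor with abbreviated labels so that
# all twelve months appear, including those with zero records.
deaths$month_lab <- factor(deaths$month, levels = 1:12, labels = month.abb)

# Counts per month as a data frame with columns `month` and `Freq`.
per_month <- as.data.frame(table(month = deaths$month_lab))
barplot(per_month$Freq, names.arg = as.character(per_month$month))

# Counts per intent, sorted from highest to lowest before plotting.
intent_counts <- sort(table(deaths$intent), decreasing = TRUE)
barplot(intent_counts)

# cut() bins a numeric variable into labelled intervals:
# (0,3] Winter, (3,6] Spring, (6,9] Summer, (9,12] Fall.
deaths$season <- cut(deaths$month, breaks = c(0, 3, 6, 9, 12),
                     labels = c("Winter", "Spring", "Summer", "Fall"))
season_counts <- table(deaths$season)
print(season_counts)
```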
and the Multifactor Leadership Questionnaire relates to transformational and transactional styles of leadership. After putting together all of the data and information, they found a strong positive relationship between motivation and the leadership styles that teachers use.
I find articles like this worthwhile because they relate to the real world: the researchers questioned actual people and learned how they feel about things. I learned what the Achievement Motivation Inventory is, and that many organizations use it when trying to find out how work relates to motivation. This article relates to another article that I read about transactional and transformational leadership styles. Because I had read that article explaining these leadership styles, I understood right away what this article was talking about.
https://doi.org/10.1016/j.ijnurstu.2018.04.016
Leadership Styles and Outcome Patterns for the Nursing Workforce and Work Environment: A Systematic Review
This journal article is about the approach taken to learn more about the nature of leadership and how it can be achieved. The main goal was to examine the relationships between the different leadership styles used in the nursing workplace, the work environment, and the outcomes of using these styles. To do this, the authors used electronic databases, quality assessments, and data extraction and analysis. The research produced a lot of useful information. For example, relational leadership styles were linked to higher nurse job satisfaction, and task-focused leadership styles were linked to lower nurse job satisfaction. Overall, the authors established that hiring employees who lean towards relational leadership styles can lead to positive outcomes for a nursing organization. Any occupation in the healthcare industry should aim towards relational leadership styles for the most job satisfaction among employees.
I decided to look up information about leadership in the nursing field because most of the information I have found is about the general workforce or athletes, and nursing is an occupation that many of my peers are in and an important occupation in the world. I thought that it would be interesting to see the similarities and differences between this topic and other topics that I have gone over. Although I haven’t read much on this type of information, I did learn that relational leadership had high job satisfaction and task-focused had a lower job sa