General notice about submissions
In every problem we ask you to create one or more functions with specific names. Suppose you are asked to create functions named foo() and bar(). When you submit your code, the code must have ONLY the definitions of the functions, and not any CALLS to them. Also, you must not include any set.seed() statements in your submissions.
Good example:
# mysubmission.r
foo <- function(x){
  return(2*x)
}
bar <- function(x, y){
  return(x+y)
}
baz <- function(N){
  runif(N)*2
}
Bad example:
# mysubmission.r
foo <- function(x){
  return(2*x)
}
foo(c(1,2,3)) # don’t call the function here
bar <- function(x, y){
  return(x+y)
}
mysum <- bar(3,2) # no assignment, no function call.
set.seed(111) # don’t set the random seed here
baz <- function(N){
  runif(N)*2
}
The reason is that when we run your code for testing purposes, such calls may create some unwanted side effects and interfere with the grading.
When you work on your programs, feel free to call your functions to check that they give correct results. However, in the final submission keep only the function definitions. Your submission should not do anything except for defining the functions.
Keep in mind that we are going to test your programs with new data that has the same structure but different content. Write your programs to be as general as possible within the description.
I suggest that you come up with further test cases on your own, and check that they give the expected output. Make your final tests after restarting R, so that existing variables do not confound your program.
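One possible way to check a file before submitting is to source it into a fresh environment and confirm that everything it defines is a function. The sketch below is only an illustration: the helper name only_functions() and the two temporary files are made up for this example. It detects leftover variables such as mysum, but it does not detect a bare function call or a set.seed() statement that leaves no object behind.

```r
# Hypothetical helper: source a file into an empty environment and
# report whether every object it creates is a function.
only_functions <- function(path) {
  env <- new.env()
  source(path, local = env)
  objs <- ls(env)
  all(vapply(objs, function(n) is.function(get(n, envir = env)), logical(1)))
}

# Two throwaway files standing in for a submission (illustrative only).
good <- tempfile(fileext = ".R")
writeLines(c("foo <- function(x){", "  return(2*x)", "}"), good)

bad <- tempfile(fileext = ".R")
writeLines(c("bar <- function(x, y){", "  return(x+y)", "}",
             "mysum <- bar(3, 2)"), bad)

good_ok <- only_functions(good)  # file holds only a function definition
bad_ok  <- only_functions(bad)   # file also creates the variable mysum
```

Here good_ok is TRUE and bad_ok is FALSE, because the second file leaves a non-function object behind.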
You must work on the assignment by yourself. Giving or receiving any help will not be tolerated.
Notice on data files: When writing your code, you should assume that the given data files are in the same directory as the source files. Do NOT call setwd() in your code. You can use the getwd() command to find your current working directory and copy your data files there.
Problem 1: Country comparisons
For this problem, you will need the files country_pop.csv, country_realgdp.csv and country_totalfertilityrate.csv. These files store population, real gross domestic product (purchasing power parity), and the total fertility rate (number of children per woman), respectively. You can use the read.csv() function to read each file into a dataframe.
(A) (10 points) Write a function named mergetables(popfile, gdpfile, tfrfile) that takes the three file names as input and returns a data frame in which the data are merged on the country name. The resulting data frame must have the column names shown below, and no other columns may be present.
Note that each file has a different number of rows, and the name columns are ordered differently.
Example
> merged <- mergetables("country_pop.csv", "country_realgdp.csv", "country_totalfertilityrate.csv")
> head(merged)
            name population     RealGDP  TFR
1    Afghanistan   37466414 7.85570e+10 4.72
2        Albania    3088385 3.98590e+10 1.53
3        Algeria   43576691 4.95564e+11 2.55
4 American Samoa      46366 6.58000e+08 2.28
5        Andorra      85645 3.32700e+09 1.44
6         Angola   33642646 2.12285e+11 5.90
> tail(merged)
                 name population    RealGDP  TFR
218    Virgin Islands     105870 3.8720e+09 2.01
219 Wallis and Futuna      15851 6.0000e+07 1.71
220         West Bank    2949246 2.1220e+10 3.02
221             Yemen   30399243 7.3630e+10 3.10
222            Zambia   19077816 6.1985e+10 4.63
223          Zimbabwe   14829988 4.1533e+10 3.91
(Your output might show decimal commas instead of decimal points; this is acceptable.)
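As a reminder of how merge() joins two tables on a key, here is a minimal sketch on made-up data. The table contents and the column name country are assumptions for illustration and are not taken from the assignment files; check the actual headers with names() after reading each file.

```r
# Toy tables (made-up values); the key column is named differently in each
# and the rows are in a different order.
pop <- data.frame(name = c("A", "B", "C"), population = c(100, 200, 300))
gdp <- data.frame(country = c("C", "A", "D"), RealGDP = c(30, 10, 40))

# by.x / by.y give the key column in the first and second table.
# By default only keys present in both tables are kept, and the
# key column in the result takes its name from the first table.
both <- merge(pop, gdp, by.x = "name", by.y = "country")
```

The result has the rows for A and C only, with columns name, population and RealGDP. The all, all.x and all.y arguments of merge() control whether unmatched rows are kept; which behaviour is needed depends on the expected output.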
(B) (10 points) Write a function named plot_country(df) that takes a data frame in the form described in part (A) and generates a scatterplot of the total fertility rate against the real GDP per capita (real GDP divided by population), with one point per country. (See the example below.)
> merged <- mergetables("country_pop.csv", "country_realgdp.csv", "country_totalfertilityrate.csv")
> plot_country(merged)
Set the labels and axis limits of the plot as shown.
For further information on the interpretation of this data, see https://en.wikipedia.org/wiki/Income_and_fertility
Problem 2: Wholesale orders
For this problem, you will need the wholesaledata.csv file.
The file stores the annual sales of one wholesale distributor to each of its clients. One row corresponds to one client.
The Channel is a nominal factor with levels 1 or 2. Level 1 indicates that the client is a Hotel/Restaurant/Cafe (“Horeca”), and level 2 indicates that it is a retail store.
The Region is a nominal factor with levels 1, 2, or 3. Level 1 indicates that the client is in Istanbul, 2 indicates that it is in Ankara, 3 indicates other locations.
The other columns indicate the yearly spending of the client on a particular type of item (fresh food, milk, groceries, frozen food, detergents and cleaning paper, and delicacies).
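(A) (10 points) Write a function named readdata(filename) that reads the data file and returns a data frame containing the values. Channel and Region should be factor variables, and their level values should be replaced with the names given above: for Channel, 1 becomes Horeca and 2 becomes Retail; for Region, 1 becomes Istanbul, 2 becomes Ankara, and 3 becomes Other.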
Example
> df <- readdata("wholesaledata.csv")
> head(df)
Channel Region Fresh Milk Grocery Frozen Detergents_Paper Delicacies
1 Retail Other 12669 9656 7561 214 2674 1338
2 Retail Other 7057 9810 9568 1762 3293 1776
3 Retail Other 6353 8808 7684 2405 3516 7844
4 Horeca Other 13265 1196 4221 6404 507 1788
5 Retail Other 22615 5410 7198 3915 1777 5185
6 Retail Other 9413 8259 5126 666 1795 1451
> summary(df)
   Channel        Region        Fresh             Milk          Grocery
 Horeca:222   Ankara  : 47   Min.   :     3   Min.   :   55   Min.   :    3
 Retail:118   Istanbul: 77   1st Qu.:  3286   1st Qu.: 1606   1st Qu.: 2366
              Other   :216   Median :  8726   Median : 3664   Median : 5146
                             Mean   : 12441   Mean   : 6175   Mean   : 8442
                             3rd Qu.: 16934   3rd Qu.: 7612   3rd Qu.:10830
                             Max.   :112151   Max.   :73498   Max.   :92780
     Frozen       Detergents_Paper    Delicacies
 Min.   :   33   Min.   :    3,0   Min.   :    3,0
 1st Qu.:  744   1st Qu.:  283,8   1st Qu.:  416,5
 Median : 1500   Median :  833,0   Median :  982,5
 Mean   : 3131   Mean   : 3112,8   Mean   : 1615,1
 3rd Qu.: 3708   3rd Qu.: 4125,0   3rd Qu.: 1795,8
 Max.   :60869   Max.   :40827,0   Max.   :47943,0
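Numeric codes can be turned into a labelled factor with the levels and labels arguments of factor(). The following is a small sketch on a made-up vector of channel codes; the vector itself is an assumption for illustration, while the mapping of 1 to Horeca and 2 to Retail comes from the description above.

```r
# Made-up channel codes, standing in for one column of the data file.
codes <- c(2, 2, 1, 2)

# levels lists the codes as stored; labels gives the name for each code,
# in the same order. The factor's levels follow the order of labels.
channel <- factor(codes, levels = c(1, 2), labels = c("Horeca", "Retail"))

counts <- table(channel)  # number of entries per level
```

With this input, channel holds Retail, Retail, Horeca, Retail, and counts reports 1 for Horeca and 3 for Retail.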
(B) (10 points) Write a function named annual_revenue(df, channel, region) that returns a vector of the total annual revenue from each item type, given the channel and the region. The parameter df should be the data frame that is returned from readdata().
Example
> df <- readdata("wholesaledata.csv")
> annual_revenue(df, "Retail", "Ankara")
           Fresh             Milk          Grocery           Frozen
          138506           174625           310200            29271
Detergents_Paper       Delicacies
          159795            23541
(C) (10 points) Write a function named nclients(df, channel, region) that returns the number of clients in the given channel and region (note that each row corresponds to a unique client). The parameter df should be the data frame that is returned from readdata().
Example
> df <- readdata("wholesaledata.csv")
> nclients(df, "Retail", "Ankara")
[1] 19
(D) (10 points) Write a function named itemtotal(df, item) that takes an item category (Fresh, Milk, Grocery, etc.) and returns a table of the total revenue from this item, broken down by region and channel. The parameter df should be the data frame that is returned from readdata().
Example
> df <- readdata("wholesaledata.csv")
> itemtotal(df, "Fresh")
          Horeca Retail
Ankara    326215 138506
Istanbul  761233  93600
Other    2085912 824627
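One base R tool for totals grouped by two factors is tapply(). The sketch below uses a made-up five-row data frame, so the values are illustrative only. It shows the general technique and is not necessarily the intended solution; xtabs() or aggregate() are alternatives.

```r
# Made-up data with two grouping columns and one numeric column.
toy <- data.frame(
  Channel = c("Horeca", "Retail", "Horeca", "Retail", "Horeca"),
  Region  = c("Ankara", "Ankara", "Other", "Other", "Ankara"),
  Fresh   = c(10, 20, 30, 40, 50)
)

# Sum Fresh within every Region/Channel combination.
# The first grouping vector gives the rows, the second the columns.
totals <- tapply(toy$Fresh, list(toy$Region, toy$Channel), sum)
```

The result is a 2-by-2 matrix with regions as rows and channels as columns; for example, totals["Ankara", "Horeca"] is 60.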
Problem 3: Covid-19 data analysis
For this problem, you will need the file owid-covid-data.csv, which you can download from the Our World in Data repository here: https://github.com/owid/covid-19-data/tree/master/public/data
In this problem, you are going to visualize daily total cases and daily total vaccinations for different countries.
(A) (10 points) Write a function named read_covid_data(filename) that reads the file and returns a dataframe with the following columns only: date, iso_code, location, total_cases, and people_fully_vaccinated.
Example
> df <- read_covid_data("owid-covid-data.csv")
> head(df)
date iso_code location total_cases people_fully_vaccinated
1 2020-02-24 AFG Afghanistan 1 NA
2 2020-02-25 AFG Afghanistan 1 NA
3 2020-02-26 AFG Afghanistan 1 NA
4 2020-02-27 AFG Afghanistan 1 NA
5 2020-02-28 AFG Afghanistan 1 NA
6 2020-02-29 AFG Afghanistan 1 NA
> tail(df)
date iso_code location total_cases people_fully_vaccinated
92126 2021-05-24 ZWE Zimbabwe 38696 281286
92127 2021-05-25 ZWE Zimbabwe 38706 288437
92128 2021-05-26 ZWE Zimbabwe 38819 293509
92129 2021-05-27 ZWE Zimbabwe 38854 305268
92130 2021-05-28 ZWE Zimbabwe 38918 320166
92131 2021-05-29 ZWE Zimbabwe 38933 NA
Your output of tail(df) may be different depending on the date of the download.
(B) (15 points) Write a function named plot_cases_vacc(df, isocode) that takes the ISO code of a country, and plots total_cases and people_fully_vaccinated for that country over time. The parameter df should be the output of read_covid_data().
The title of the plot should use the location value corresponding to the given ISO code.
Set the ylim parameter of the plot so that all the data is visible (Hint: Use the max() function with the na.rm=T setting). The lower limit must be 0.
For other formatting specifications see the example.
Example
> df <- read_covid_data("owid-covid-data.csv")
> plot_cases_vacc(df, "TUR")
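The hint about max() with na.rm can be illustrated on two short made-up series that contain missing values. The vectors below are assumptions for illustration, not values from the data file.

```r
# Made-up series with missing values, as in the vaccination column.
cases <- c(5, 12, NA, 30)
vacc  <- c(NA, NA, 8, 45)

# Without na.rm = TRUE, a single NA makes the maximum NA.
naive_upper <- max(cases, vacc)

# With na.rm = TRUE, missing values are ignored.
upper <- max(cases, vacc, na.rm = TRUE)

# Axis limits that start at 0 and cover both series.
ylim <- c(0, upper)
```

Here naive_upper is NA, while ylim is c(0, 45). A vector like this can be passed as the ylim argument of plot().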
(C) (15 points) Write a function named comp_country_plot(df, isocode1, isocode2, column) that plots the specified column against time (in days) for two countries, given as isocode1 and isocode2. The parameter df should be the output of read_covid_data().
Set the ylim parameter of the plot so that all the data is visible (Hint: Use the max() function with the na.rm=T setting). The lower limit must be 0.
For other formatting specifications see the example.
Examples
> df <- read_covid_data("owid-covid-data.csv")
> comp_country_plot(df, "DEU", "TUR", "total_cases")
> comp_country_plot(df, "TUR", "BRA", "people_fully_vaccinated")
Submission template
The function definitions should be submitted as a single source file named assignment3.R, with the following contents:
mergetables <- function (popfile, gdpfile, tfrfile){
# your code here
}
plot_country <- function(df){
# your code here
}
readdata <- function(filename){
# your code here
}
annual_revenue <- function(df, channel, region){
# your code here
}
nclients <- function(df, channel, region){
# your code here
}
itemtotal <- function(df, item){
# your code here
}
read_covid_data <- function(filename){
# your code here
}
plot_cases_vacc <- function(df, isocode){
# your code here
}
comp_country_plot <- function(df, isocode1, isocode2, column){
# your code here
}
You don’t need to submit the plot images generated by your functions.