The R function

General notice about submissions

In every problem we ask you to create one or more functions with specific names. Suppose you are asked to create functions named foo() and bar(). When you submit your code, the code must have ONLY the definitions of the functions, and not any CALLS to them. Also, you must not include any set.seed() statements in your submissions.

Good example:

# mysubmission.r

foo <- function(x){
  return(2*x)
}

bar <- function(x, y){
  return(x+y)
}

baz <- function(N){
  runif(N)*2
}

Bad example:

# mysubmission.r

foo <- function(x){
  return(2*x)
}

foo(c(1,2,3)) # don't call the function here

bar <- function(x, y){
  return(x+y)
}

mysum <- bar(3,2) # no assignment, no function call

set.seed(111) # don't set the random seed here

baz <- function(N){
  runif(N)*2
}

The reason is that when we run your code for testing purposes, such calls may create some unwanted side effects and interfere with the grading.

When you work on your programs, feel free to call your functions to check that they give correct results. However, in the final submission keep only the function definitions. Your submission should not do anything except for defining the functions.

Keep in mind that we are going to test your programs with new data that has the same structure but different content. Write your programs to be as general as possible within the description.

I suggest that you come up with further test cases on your own, and check that they give the expected output. Make your final tests after restarting R, so that existing variables do not confound your program.
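As a rough self-check before submitting, you can source your file into a separate environment and confirm that every object it creates is a function. The sketch below writes a small hypothetical submission to a temporary file in place of your real one. It catches stray top-level assignments such as `mysum <- bar(3,2)`, but not statements whose only effect is a side effect (printing, plotting, or `set.seed()`), so it complements rather than replaces testing in a freshly restarted R session.

```r
# Hypothetical stand-in for a submission file such as assignment3.R.
path <- tempfile(fileext = ".R")
writeLines(c(
  "foo <- function(x) {",
  "  return(2 * x)",
  "}",
  "bar <- function(x, y) {",
  "  return(x + y)",
  "}"
), path)

# Evaluate the file in a new environment so that the objects it defines
# are kept apart from the current workspace and can be listed.
env <- new.env()
sys.source(path, envir = env)

# Every object the file defined should be a function.
objs <- mget(ls(env), envir = env)
all_functions <- all(vapply(objs, is.function, logical(1)))
print(all_functions)  # TRUE for this file
```

Calling the sourced functions, for example `env$foo(3)`, is a quick way to check their results without leaving any calls in the file itself.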

You must work on the assignment by yourself. Giving or receiving any help will not be tolerated.

Notice on data files: When writing your code, you should assume that the given data files are in the same directory as the source files. Do NOT call setwd() in your code. You can use the getwd() command to find your current working directory and copy your data files there.

Problem 1: Country comparisons

For this problem, you will need the files country_pop.csv, country_realgdp.csv and country_totalfertilityrate.csv. These files store population, real gross domestic product (purchasing power parity), and the total fertility rate (number of children per woman), respectively. You can use the read.csv() function to read each file into a dataframe.

(A) (10 points) Write a function named mergetables(popfile, gdpfile, tfrfile) that takes the names of the three files as input and returns a data frame in which the data are merged by country name. The resulting data frame must have the column names shown below; no other columns may be present.

Note that each file has a different number of rows, and the column holding the country names is named differently in each file.

Example

> merged <- mergetables("country_pop.csv", "country_realgdp.csv", "country_totalfertilityrate.csv")
> head(merged)
            name population     RealGDP  TFR
1    Afghanistan   37466414 7.85570e+10 4.72
2        Albania    3088385 3.98590e+10 1.53
3        Algeria   43576691 4.95564e+11 2.55
4 American Samoa      46366 6.58000e+08 2.28
5        Andorra      85645 3.32700e+09 1.44
6         Angola   33642646 2.12285e+11 5.90
> tail(merged)
                 name population    RealGDP  TFR
218    Virgin Islands     105870 3.8720e+09 2.01
219 Wallis and Futuna      15851 6.0000e+07 1.71
220         West Bank    2949246 2.1220e+10 3.02
221             Yemen   30399243 7.3630e+10 3.10
222            Zambia   19077816 6.1985e+10 4.63
223          Zimbabwe   14829988 4.1533e+10 3.91

(Depending on your locale settings, your output may show commas instead of decimal points; this is acceptable.)
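Because the country-name column is labeled differently in each file, `merge()` with its `by.x` and `by.y` arguments is one way to join the tables. The sketch below uses small toy data frames in place of the CSV files; the key column names `country`, `Country` and `Name` are hypothetical, so check the real headers (for example with `names(read.csv(popfile))`) before relying on them. Note that `merge()` performs an inner join by default, so a country missing from any one file is dropped.

```r
# Toy data frames standing in for the three CSV files.
# The key column names below are hypothetical.
pop <- data.frame(country = c("Albania", "Andorra", "Angola"),
                  population = c(3088385, 85645, 33642646))
gdp <- data.frame(Country = c("Angola", "Albania"),
                  RealGDP = c(2.12285e+11, 3.98590e+10))
tfr <- data.frame(Name = c("Albania", "Angola", "Andorra"),
                  TFR = c(1.53, 5.90, 1.44))

# by.x / by.y match key columns with different names; the result keeps
# the key name from the first (x) table and is sorted by the key.
step1  <- merge(pop, gdp, by.x = "country", by.y = "Country")
merged <- merge(step1, tfr, by.x = "country", by.y = "Name")

# Rename the key column and keep only the required columns, in order.
names(merged)[names(merged) == "country"] <- "name"
merged <- merged[, c("name", "population", "RealGDP", "TFR")]
print(merged)
```

Here Andorra disappears because it has no row in the toy GDP table; whether that is the desired behavior for the real files is worth checking against the expected row count.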

(B) (10 points) Write a function named plot_country(df) that takes a data frame in the form described in part (A) and generates a scatterplot of the total fertility rate against real GDP per capita, with one point per country. (See the example below.)

> merged <- mergetables("country_pop.csv", "country_realgdp.csv", "country_totalfertilityrate.csv")
> plot_country(merged)

Set the labels and axis limits of the plot as shown.

For further information on the interpretation of this data, see https://en.wikipedia.org/wiki/Income_and_fertility
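Real GDP per capita is not itself a column of the merged data frame; it can be computed as `RealGDP / population`. A minimal sketch using three rows copied from the example output of part (A); the axis labels and limits here are placeholders that you would adjust to match the example figure.

```r
# Three rows taken from the mergetables() example output.
df <- data.frame(name = c("Albania", "Angola", "Zambia"),
                 population = c(3088385, 33642646, 19077816),
                 RealGDP = c(3.98590e+10, 2.12285e+11, 6.1985e+10),
                 TFR = c(1.53, 5.90, 4.63))

# GDP per capita: total real GDP divided by population.
gdp_per_capita <- df$RealGDP / df$population

# Placeholder labels and limits; starting both axes at 0 is an assumption.
plot(gdp_per_capita, df$TFR,
     xlab = "Real GDP per capita", ylab = "Total fertility rate",
     xlim = c(0, max(gdp_per_capita)), ylim = c(0, max(df$TFR)))
```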

Problem 2: Wholesale s

For this problem, you will need the wholesaledata.csv file.

The file stores the annual sales of one wholesale distributor to each of its clients. One row corresponds to one client.

The Channel is a nominal factor with levels 1 or 2. Level 1 indicates that the client is a Hotel/Restaurant/Cafe (“Horeca”), and level 2 indicates that it is a retail store.

The Region is a nominal factor with levels 1, 2, or 3. Level 1 indicates that the client is in Istanbul, 2 indicates that it is in Ankara, 3 indicates other locations.

Other columns indicate the yearly spending of the client on a particular type of item (fresh food, milk, groceries, frozen food, detergents and cleaning paper, and delicacies).

(A) (10 points) Write a function named readdata(filename) that reads the data file and returns a data frame containing the values. The channel and region should be factor variables, and their level values should be replaced as follows:

Channel values: “Horeca” for 1, “Retail” for 2
Region values: “Istanbul” for 1, “Ankara” for 2, “Other” for 3
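One way to map the numeric codes to labels is to index a character vector by the codes and wrap the result in `factor()`. Because `factor()` sorts its levels alphabetically, this gives the level order Ankara, Istanbul, Other, which is consistent with the `summary(df)` output below; `factor(x, levels = 1:3, labels = ...)` would instead keep the order Istanbul, Ankara, Other. The toy data frame is a made-up stand-in for the result of `read.csv()`.

```r
# Made-up rows standing in for read.csv("wholesaledata.csv").
raw <- data.frame(Channel = c(2, 2, 1),
                  Region  = c(3, 1, 2),
                  Fresh   = c(12669, 7057, 13265))

channel_labels <- c("Horeca", "Retail")              # codes 1, 2
region_labels  <- c("Istanbul", "Ankara", "Other")   # codes 1, 2, 3

# Indexing by the numeric code picks the matching label; factor() then
# builds a factor whose levels are the sorted unique labels.
raw$Channel <- factor(channel_labels[raw$Channel])
raw$Region  <- factor(region_labels[raw$Region])

print(levels(raw$Region))  # "Ankara" "Istanbul" "Other"
```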

Example

> df <- readdata("wholesaledata.csv")
> head(df)
  Channel Region Fresh Milk Grocery Frozen Detergents_Paper Delicacies
1  Retail  Other 12669 9656    7561    214             2674       1338
2  Retail  Other  7057 9810    9568   1762             3293       1776
3  Retail  Other  6353 8808    7684   2405             3516       7844
4  Horeca  Other 13265 1196    4221   6404              507       1788
5  Retail  Other 22615 5410    7198   3915             1777       5185
6  Retail  Other  9413 8259    5126    666             1795       1451

> summary(df)
   Channel         Region        Fresh             Milk          Grocery
 Horeca:222   Ankara  : 47   Min.   :     3   Min.   :   55   Min.   :    3
 Retail:118   Istanbul: 77   1st Qu.:  3286   1st Qu.: 1606   1st Qu.: 2366
              Other   :216   Median :  8726   Median : 3664   Median : 5146
                             Mean   : 12441   Mean   : 6175   Mean   : 8442
                             3rd Qu.: 16934   3rd Qu.: 7612   3rd Qu.:10830
                             Max.   :112151   Max.   :73498   Max.   :92780
     Frozen      Detergents_Paper    Delicacies
 Min.   :   33   Min.   :    3,0   Min.   :    3,0
 1st Qu.:  744   1st Qu.:  283,8   1st Qu.:  416,5
 Median : 1500   Median :  833,0   Median :  982,5
 Mean   : 3131   Mean   : 3112,8   Mean   : 1615,1
 3rd Qu.: 3708   3rd Qu.: 4125,0   3rd Qu.: 1795,8
 Max.   :60869   Max.   :40827,0   Max.   :47943,0

(B) (10 points) Write a function named annual_revenue(df, channel, region) that returns a named vector with the total annual revenue from each item type for the given channel and region. The parameter df should be the data frame returned by readdata().

Example

> df <- readdata("wholesaledata.csv")
> annual_revenue(df, "Retail", "Ankara")
           Fresh             Milk          Grocery           Frozen
          138506           174625           310200            29271
Detergents_Paper       Delicacies
          159795            23541
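A possible approach is to select the matching rows with a logical condition and sum the item columns with `colSums()`, which returns a named vector like the one above. The toy data frame and its values are made up for illustration; treating every column other than Channel and Region as an item column is an assumption based on the column list of the data file.

```r
# Made-up toy data in the shape returned by readdata().
df <- data.frame(
  Channel = factor(c("Retail", "Retail", "Horeca")),
  Region  = factor(c("Ankara", "Ankara", "Other")),
  Fresh   = c(100, 200, 50),
  Milk    = c(10, 20, 5)
)

channel <- "Retail"
region  <- "Ankara"

# Logical vector marking rows in the requested channel and region.
rows <- df$Channel == channel & df$Region == region

# Item columns are all columns except the two factor columns.
item_cols <- setdiff(names(df), c("Channel", "Region"))

# drop = FALSE keeps a data frame even if only one item column exists.
totals <- colSums(df[rows, item_cols, drop = FALSE])
print(totals)  # Fresh 300, Milk 30
```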

(C) (10 points) Write a function named nclients(df, channel, region) that returns the number of clients in the given channel and region (note that each row corresponds to a unique client). The parameter df should be the data frame returned by readdata().

Example

> df <- readdata("wholesaledata.csv")
> nclients(df, "Retail", "Ankara")
[1] 19
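Since each row is one client, counting clients amounts to counting the rows that satisfy the condition. Summing a logical vector does this directly, because `TRUE` is counted as 1 and `FALSE` as 0. The toy values below are made up.

```r
# Made-up toy data with only the two factor columns.
df <- data.frame(
  Channel = factor(c("Retail", "Retail", "Horeca", "Retail")),
  Region  = factor(c("Ankara", "Ankara", "Ankara", "Other"))
)

# Count rows where both conditions hold.
n <- sum(df$Channel == "Retail" & df$Region == "Ankara")
print(n)  # 2
```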

(D) (10 points) Write a function named itemtotal(df, item) that takes an item category (Fresh, Milk, Grocery, etc.) and returns a table of the total revenue from that item, broken down by region and channel. The parameter df should be the data frame returned by readdata().

Example

> df <- readdata("wholesaledata.csv")
> itemtotal(df, "Fresh")
          Horeca Retail
Ankara    326215 138506
Istanbul  761233  93600
Other    2085912 824627
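`tapply()` with two grouping factors produces a matrix of this shape: rows follow the levels of the first factor (Region) and columns those of the second (Channel). Because the item name arrives as a string, the column is selected with `df[[item]]`. The toy data are made up; note that a region/channel combination with no clients comes out as `NA` unless you pass `tapply()`'s `default` argument (for example `default = 0`).

```r
# Made-up toy data in the shape returned by readdata().
df <- data.frame(
  Channel = factor(c("Horeca", "Retail", "Retail", "Horeca")),
  Region  = factor(c("Ankara", "Ankara", "Other", "Istanbul")),
  Fresh   = c(10, 20, 30, 40)
)

item <- "Fresh"

# Sum the chosen item within each Region x Channel cell.
tab <- tapply(df[[item]], list(df$Region, df$Channel), sum)
print(tab)
#          Horeca Retail
# Ankara       10     20
# Istanbul     40     NA
# Other        NA     30
```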

Problem 3: Covid-19 data analysis

For this problem, you will need the file owid-covid-data.csv, which you can download from the Our World in Data repository here: https://github.com/owid/covid-19-data/tree/master/public/data

In this problem, you are going to visualize the cumulative number of confirmed cases and of fully vaccinated people over time for different countries.

(A) (10 points) Write a function named read_covid_data(filename) that reads the file and returns a dataframe with the following columns only: date, iso_code, location, total_cases, and people_fully_vaccinated.

Example

> df <- read_covid_data("owid-covid-data.csv")
> head(df)
        date iso_code    location total_cases people_fully_vaccinated
1 2020-02-24      AFG Afghanistan           1                      NA
2 2020-02-25      AFG Afghanistan           1                      NA
3 2020-02-26      AFG Afghanistan           1                      NA
4 2020-02-27      AFG Afghanistan           1                      NA
5 2020-02-28      AFG Afghanistan           1                      NA
6 2020-02-29      AFG Afghanistan           1                      NA
> tail(df)
            date iso_code location total_cases people_fully_vaccinated
92126 2021-05-24      ZWE Zimbabwe       38696                  281286
92127 2021-05-25      ZWE Zimbabwe       38706                  288437
92128 2021-05-26      ZWE Zimbabwe       38819                  293509
92129 2021-05-27      ZWE Zimbabwe       38854                  305268
92130 2021-05-28      ZWE Zimbabwe       38918                  320166
92131 2021-05-29      ZWE Zimbabwe       38933                      NA

Your output of tail(df) may be different depending on the date of the download.
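Selecting the required columns by name both drops the other columns and puts the kept ones in the order shown in the example. The sketch uses a toy data frame in place of `read.csv(filename)`; its extra columns (`continent`, `new_cases`) only mimic the wider file. Converting `date` with `as.Date()` is an optional choice, not a requirement stated above, but it gives the plots in parts (B) and (C) a proper time axis.

```r
# Toy stand-in for read.csv(filename); values follow the head(df) example.
raw <- data.frame(
  iso_code = c("AFG", "AFG"),
  continent = c("Asia", "Asia"),
  location = c("Afghanistan", "Afghanistan"),
  date = c("2020-02-24", "2020-02-25"),
  total_cases = c(1, 1),
  new_cases = c(1, 0),
  people_fully_vaccinated = c(NA, NA)
)

# Keep only the required columns, in the required order.
keep <- c("date", "iso_code", "location", "total_cases", "people_fully_vaccinated")
df <- raw[, keep]

# Optional: store dates as Date objects for plotting.
df$date <- as.Date(df$date)
print(df)
```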

(B) (15 points) Write a function named plot_cases_vacc(df, isocode) that takes the ISO code of a country, and plots total_cases and people_fully_vaccinated for that country over time. The parameter df should be the output of read_covid_data().

The title of the plot should use the location value corresponding to the given ISO code.

Set the ylim parameter of the plot so that all the data is visible (Hint: Use the max() function with the na.rm=T setting). The lower limit must be 0.

For other formatting specifications see the example.

Example

> df <- read_covid_data("owid-covid-data.csv")
> plot_cases_vacc(df, "TUR")
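The ylim hint can be implemented by taking the maximum over both series with `na.rm = TRUE`, since `people_fully_vaccinated` is `NA` on many days. A minimal sketch using four Zimbabwe rows from the `tail(df)` output in part (A); the line types, axis labels, and legend position are placeholders for the formatting in the example figure.

```r
# Four rows taken from the tail(df) example output.
df <- data.frame(
  date = as.Date(c("2021-05-26", "2021-05-27", "2021-05-28", "2021-05-29")),
  iso_code = "ZWE",
  location = "Zimbabwe",
  total_cases = c(38819, 38854, 38918, 38933),
  people_fully_vaccinated = c(293509, 305268, 320166, NA)
)

isocode <- "ZWE"
sub <- df[df$iso_code == isocode, ]

# Upper y limit covering both series; NA values are ignored.
ymax <- max(sub$total_cases, sub$people_fully_vaccinated, na.rm = TRUE)

# Title taken from the location matching the ISO code.
plot(sub$date, sub$total_cases, type = "l", ylim = c(0, ymax),
     xlab = "Date", ylab = "Count", main = sub$location[1])
lines(sub$date, sub$people_fully_vaccinated, lty = 2)
legend("topleft", legend = c("total_cases", "people_fully_vaccinated"),
       lty = c(1, 2))
```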

(C) (15 points) Write a function named comp_country_plot(df, isocode1, isocode2, column) that plots the specified column against date for two countries, given by the ISO codes isocode1 and isocode2. The parameter df should be the output of read_covid_data().

Set the ylim parameter of the plot so that all the data is visible (Hint: Use the max() function with the na.rm=T setting). The lower limit must be 0.

For other formatting specifications see the example.

Examples

> df <- read_covid_data("owid-covid-data.csv")
> comp_country_plot(df, "DEU", "TUR", "total_cases")
> comp_country_plot(df, "TUR", "BRA", "people_fully_vaccinated")
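Because `column` arrives as a string, the column has to be selected with `[[ ]]` rather than `$`. The y limit comes from the maximum over both countries, and the x range should span both countries' dates, since their reporting periods may differ. The sketch below uses made-up values for two hypothetical ISO codes, AAA and BBB; line types and labels are placeholders for the formatting in the example figures.

```r
# Made-up toy values for two hypothetical country codes.
df <- data.frame(
  date = as.Date("2021-01-01") + c(0:2, 1:3),
  iso_code = rep(c("AAA", "BBB"), each = 3),
  total_cases = c(10, 20, NA, 5, 15, 40)
)

isocode1 <- "AAA"
isocode2 <- "BBB"
column <- "total_cases"

c1 <- df[df$iso_code == isocode1, ]
c2 <- df[df$iso_code == isocode2, ]

# Column chosen by name; NA values are ignored when finding the maximum.
ymax <- max(c1[[column]], c2[[column]], na.rm = TRUE)
# Date range covering both countries.
xr <- range(c(c1$date, c2$date))

plot(c1$date, c1[[column]], type = "l", xlim = xr, ylim = c(0, ymax),
     xlab = "Date", ylab = column)
lines(c2$date, c2[[column]], lty = 2)
legend("topleft", legend = c(isocode1, isocode2), lty = c(1, 2))
```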

Submission template

The function definitions should be submitted as a single source file named assignment3.R, with the following contents:

mergetables <- function(popfile, gdpfile, tfrfile){
  # your code here
}

plot_country <- function(df){
  # your code here
}

readdata <- function(filename){
  # your code here
}

annual_revenue <- function(df, channel, region){
  # your code here
}

nclients <- function(df, channel, region){
  # your code here
}

itemtotal <- function(df, item){
  # your code here
}

read_covid_data <- function(filename){
  # your code here
}

plot_cases_vacc <- function(df, isocode){
  # your code here
}

comp_country_plot <- function(df, isocode1, isocode2, column){
  # your code here
}

You don’t need to submit the plot images generated by your functions.
