The R function

 

General notice about submissions

In every problem we ask you to create one or more functions with specific names. Suppose you are asked to create functions named foo() and bar(). When you submit your code, the code must have ONLY the definitions of the functions, and not any CALLS to them. Also, you must not include any set.seed() statements in your submissions.

Good example:

# mysubmission.r

foo <- function(x){
  return(2*x)
}

bar <- function(x, y){
  return(x+y)
}

baz <- function(N){
  runif(N)*2
}

Bad example:

# mysubmission.r

foo <- function(x){
  return(2*x)
}

foo(c(1,2,3)) # don't call the function here

bar <- function(x, y){
  return(x+y)
}

mysum <- bar(3,2) # no assignment, no function call

set.seed(111) # don't set the random seed here

baz <- function(N){
  runif(N)*2
}

The reason is that when we run your code for testing purposes, such calls may create some unwanted side effects and interfere with the grading.

When you work on your programs, feel free to call your functions to check that they give correct results. However, in the final submission keep only the function definitions. Your submission should not do anything except for defining the functions.

Keep in mind that we are going to test your programs with new data that has the same structure but different content. Write your programs to be as general as possible within the description.

I suggest that you come up with further test cases on your own, and check that they give the expected output. Make your final tests after restarting R, so that existing variables do not confound your program.
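One way to follow this advice is to keep the checks in a separate script that loads the submission with source(). The sketch below is only an illustration: it writes a hypothetical two-function submission to a temporary file (in practice this would be your own source file), loads it into a separate environment, and runs the checks there. The names submission and env are assumptions made for this example.

```r
# Hypothetical stand-in for a submission file: definitions only, no calls.
submission <- tempfile(fileext = ".R")
writeLines(c(
  "foo <- function(x){",
  "  return(2*x)",
  "}",
  "bar <- function(x, y){",
  "  return(x+y)",
  "}"
), submission)

# Load the definitions into a separate environment so that they do not
# overwrite anything that already exists in the workspace.
env <- new.env()
source(submission, local = env)

# The checks live in the test script, never in the submission itself.
stopifnot(identical(env$foo(c(1, 2, 3)), c(2, 4, 6)))
stopifnot(env$bar(3, 2) == 5)
```

Because the calls sit in the test script, the submission file itself still does nothing except define the functions.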

You must work on the assignment by yourself. Giving or receiving any help will not be tolerated.

Notice on data files: When writing your code, you should assume that the given data files are in the same directory as the source files. Do NOT call setwd() in your code. You can use the getwd() command to find your current working directory and copy your data files there.

Problem 1: Country comparisons

For this problem, you will need the files country_pop.csv, country_realgdp.csv and country_totalfertilityrate.csv. These files store population, real gross domestic product (purchasing power parity), and the total fertility rate (number of children per woman), respectively. You can use the read.csv() function to read each file into a dataframe.

(A) (10 points) Write a function named mergetables(popfile, gdpfile, tfrfile) that takes the names of the three files as input and returns a data frame in which the data are merged on the column called name. The resulting data frame must have the column names shown below. No other columns may be present.

Note that each file has a different number of rows, and the name columns are ordered differently.
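To see how base R's merge() behaves when two tables have different numbers of rows, consider the toy sketch below. The data frames and their values are invented for illustration and are not taken from the assignment files. By default merge() keeps only the rows whose key appears in both tables and sorts the result by the key.

```r
# Toy tables (hypothetical values): different row counts, different row order.
pop <- data.frame(name = c("B", "A", "C"), population = c(20, 10, 30))
gdp <- data.frame(name = c("A", "B", "D"), RealGDP = c(1.5, 2.5, 4.5))

# Join on the shared key column. "C" and "D" are dropped because each
# appears in only one of the two tables.
merged <- merge(pop, gdp, by = "name")
print(merged)
```

Whether unmatched rows should be dropped or kept (see the all, all.x and all.y arguments of merge()) depends on the required output, so check the result against the expected row count.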

Example

> merged <- mergetables("country_pop.csv", "country_realgdp.csv", "country_totalfertilityrate.csv")
> head(merged)
            name population     RealGDP  TFR
1    Afghanistan   37466414 7.85570e+10 4.72
2        Albania    3088385 3.98590e+10 1.53
3        Algeria   43576691 4.95564e+11 2.55
4 American Samoa      46366 6.58000e+08 2.28
5        Andorra      85645 3.32700e+09 1.44
6         Angola   33642646 2.12285e+11 5.90

> tail(merged)
                 name population    RealGDP  TFR
218    Virgin Islands     105870 3.8720e+09 2.01
219 Wallis and Futuna      15851 6.0000e+07 1.71
220         West Bank    2949246 2.1220e+10 3.02
221             Yemen   30399243 7.3630e+10 3.10
222            Zambia   19077816 6.1985e+10 4.63
223          Zimbabwe   14829988 4.1533e+10 3.91

(Your output might show commas instead of decimal points, depending on your locale; this is acceptable.)

(B) (10 points) Write a function named plot_country(df) that takes a data frame that is in the form described in part (A), and generates a scatterplot of total fertility rate against the real GDP per capita, for each country. (See the example below.)

> merged <- mergetables("country_pop.csv", "country_realgdp.csv", "country_totalfertilityrate.csv")

> plot_country(merged)

Set the labels and axis limits of the plot as shown.

For further information on the interpretation of this data, see https://en.wikipedia.org/wiki/Income_and_fertility

Problem 2: Wholesale orders

For this problem, you will need the wholesaledata.csv file.

The file stores the annual sales of one wholesale distributor to each of its clients. One row corresponds to one client.

The Channel is a nominal factor with levels 1 or 2. Level 1 indicates that the client is a Hotel/Restaurant/Cafe (“Horeca”), and level 2 indicates that it is a retail store.

The Region is a nominal factor with levels 1, 2, or 3. Level 1 indicates that the client is in Istanbul, 2 indicates that it is in Ankara, 3 indicates other locations.

The other columns indicate the yearly spending of the client on a particular type of item (fresh food, milk, groceries, frozen food, detergents and cleaning paper, and delicacies).

(A) (10 points) Write a function named readdata(filename) that reads the data file and returns a data frame containing the values. The channel and region should be factor variables, and their level values should be replaced as follows:

  • Channel values: “Horeca” for 1, “Retail” for 2
  • Region values: “Istanbul” for 1, “Ankara” for 2, “Other” for 3
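Replacing numeric codes with descriptive labels is commonly done with the levels and labels arguments of factor(). The sketch below uses invented codes and labels that are unrelated to the assignment data; it only illustrates the mechanism.

```r
# Hypothetical numeric codes and labels, for illustration only.
codes <- c(2, 1, 3, 2)

# levels lists the codes as stored; labels gives the replacement for each
# code, in the same order.
size <- factor(codes, levels = c(1, 2, 3), labels = c("small", "medium", "large"))

print(size)
print(levels(size))
```

Listing the levels explicitly fixes the mapping between codes and labels, so it does not depend on which codes happen to occur in the data.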

Example

> df <- readdata("wholesaledata.csv")
> head(df)
  Channel Region Fresh Milk Grocery Frozen Detergents_Paper Delicacies
1  Retail  Other 12669 9656    7561    214             2674       1338
2  Retail  Other  7057 9810    9568   1762             3293       1776
3  Retail  Other  6353 8808    7684   2405             3516       7844
4  Horeca  Other 13265 1196    4221   6404              507       1788
5  Retail  Other 22615 5410    7198   3915             1777       5185
6  Retail  Other  9413 8259    5126    666             1795       1451

> summary(df)
   Channel         Region        Fresh             Milk          Grocery
 Horeca:222   Ankara  : 47   Min.   :     3   Min.   :   55   Min.   :    3
 Retail:118   Istanbul: 77   1st Qu.:  3286   1st Qu.: 1606   1st Qu.: 2366
              Other   :216   Median :  8726   Median : 3664   Median : 5146
                             Mean   : 12441   Mean   : 6175   Mean   : 8442
                             3rd Qu.: 16934   3rd Qu.: 7612   3rd Qu.:10830
                             Max.   :112151   Max.   :73498   Max.   :92780
     Frozen      Detergents_Paper    Delicacies
 Min.   :   33   Min.   :    3,0   Min.   :    3,0
 1st Qu.:  744   1st Qu.:  283,8   1st Qu.:  416,5
 Median : 1500   Median :  833,0   Median :  982,5
 Mean   : 3131   Mean   : 3112,8   Mean   : 1615,1
 3rd Qu.: 3708   3rd Qu.: 4125,0   3rd Qu.: 1795,8
 Max.   :60869   Max.   :40827,0   Max.   :47943,0

(B) (10 points) Write a function named annual_revenue(df, channel, region) that returns a vector of the total annual revenue from each item type for the given channel and region. The parameter df should be the data frame returned by readdata().

Example

> df <- readdata("wholesaledata.csv")
> annual_revenue(df, "Retail", "Ankara")
           Fresh             Milk          Grocery           Frozen
          138506           174625           310200            29271
Detergents_Paper       Delicacies
          159795            23541

(C) (10 points) Write a function named nclients(df, channel, region) that returns the number of clients in the given channel and region (note that each row corresponds to a unique client). The parameter df should be the data frame returned by readdata().

Example

> df <- readdata("wholesaledata.csv")
> nclients(df, "Retail", "Ankara")

[1] 19

(D) (10 points) Write a function named itemtotal(df, item) that takes an item category (Fresh, Milk, Grocery, etc.) and returns a table of the total revenue from this item, broken down by region and channel. The parameter df should be the data frame returned by readdata().

Example

> df <- readdata("wholesaledata.csv")
> itemtotal(df, "Fresh")
          Horeca Retail
Ankara    326215 138506
Istanbul  761233  93600
Other    2085912 824627
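Totals broken down by two grouping columns can be computed in several ways in base R. The sketch below shows one of them, tapply() with a list of two grouping vectors, on an invented toy data frame; the column names store, kind and amount are hypothetical and unrelated to the assignment data.

```r
# Toy sales records (hypothetical values).
sales <- data.frame(
  store  = c("north", "north", "south", "south", "south"),
  kind   = c("a", "b", "a", "a", "b"),
  amount = c(10, 20, 5, 7, 1)
)

# Sum amount within every (store, kind) combination.
# Rows of the result are stores, columns are kinds.
totals <- tapply(sales$amount, list(sales$store, sales$kind), sum)
print(totals)
```

Note that tapply() returns a matrix here; xtabs() is an alternative that returns an object of class table, which may matter if the exact return type is checked.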

Problem 3: Covid-19 data analysis

For this problem, you will need the file owid-covid-data.csv, which you can download from the Our World in Data repository here: https://github.com/owid/covid-19-data/tree/master/public/data

In this problem, you are going to visualize daily total cases and daily total vaccinations for different countries.

(A) (10 points) Write a function named read_covid_data(filename) that reads the file and returns a dataframe with the following columns only: date, iso_code, location, total_cases, and people_fully_vaccinated.

Example

> df <- read_covid_data("owid-covid-data.csv")
> head(df)
        date iso_code    location total_cases people_fully_vaccinated
1 2020-02-24      AFG Afghanistan           1                      NA
2 2020-02-25      AFG Afghanistan           1                      NA
3 2020-02-26      AFG Afghanistan           1                      NA
4 2020-02-27      AFG Afghanistan           1                      NA
5 2020-02-28      AFG Afghanistan           1                      NA
6 2020-02-29      AFG Afghanistan           1                      NA

> tail(df)
            date iso_code location total_cases people_fully_vaccinated
92126 2021-05-24      ZWE Zimbabwe       38696                  281286
92127 2021-05-25      ZWE Zimbabwe       38706                  288437
92128 2021-05-26      ZWE Zimbabwe       38819                  293509
92129 2021-05-27      ZWE Zimbabwe       38854                  305268
92130 2021-05-28      ZWE Zimbabwe       38918                  320166
92131 2021-05-29      ZWE Zimbabwe       38933                      NA

Your output of tail(df) may be different depending on the date of the download.

(B) (15 points) Write a function named plot_cases_vacc(df, isocode) that takes the ISO code of a country, and plots total_cases and people_fully_vaccinated for that country over time. The parameter df should be the output of read_covid_data().

The title of the plot should use the location value corresponding to the given ISO code.

Set the ylim parameter of the plot so that all the data is visible (Hint: Use the max() function with the na.rm=T setting). The lower limit must be 0.
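The reason for the na.rm setting in the hint is that max() returns NA as soon as any input value is missing. The sketch below demonstrates this on two short invented series (cases and vacc are hypothetical toy vectors, not the assignment data) and uses the result as the upper axis limit.

```r
# Toy series with missing values (hypothetical numbers).
cases <- c(1, 5, 12, NA, 30)
vacc  <- c(NA, NA, 3, 8, 45)

# Without na.rm the maximum is NA and cannot be used as an axis limit.
print(max(cases, vacc))

# With na.rm = TRUE the missing values are ignored.
upper <- max(cases, vacc, na.rm = TRUE)

# Lower limit fixed at 0, upper limit covering both series.
plot(seq_along(cases), cases, type = "l", ylim = c(0, upper),
     xlab = "index", ylab = "count")
lines(seq_along(vacc), vacc, lty = 2)
```

Taking the maximum over both series at once ensures that neither line is cut off at the top of the plot.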

For other formatting specifications see the example.

Example

> df <- read_covid_data("owid-covid-data.csv")
> plot_cases_vacc(df, "TUR")

(C) (15 points) Write a function named comp_country_plot(df, isocode1, isocode2, column) that plots the specified column in days for two countries, given as isocode1 and isocode2. The parameter df should be the output of read_covid_data().

Set the ylim parameter of the plot so that all the data is visible (Hint: Use the max() function with the na.rm=T setting). The lower limit must be 0.

For other formatting specifications see the example.

Examples

> df <- read_covid_data("owid-covid-data.csv")
> comp_country_plot(df, "DEU", "TUR", "total_cases")
> comp_country_plot(df, "TUR", "BRA", "people_fully_vaccinated")

Submission template

The function definitions should be submitted as a single source file named assignment3.R, with the following contents:

mergetables <- function(popfile, gdpfile, tfrfile){
  # your code here
}

plot_country <- function(df){
  # your code here
}

readdata <- function(filename){
  # your code here
}

annual_revenue <- function(df, channel, region){
  # your code here
}

nclients <- function(df, channel, region){
  # your code here
}

itemtotal <- function(df, item){
  # your code here
}

read_covid_data <- function(filename){
  # your code here
}

plot_cases_vacc <- function(df, isocode){
  # your code here
}

comp_country_plot <- function(df, isocode1, isocode2, column){
  # your code here
}

You don’t need to submit the plot images generated by your functions.

 
