The aim of this tutorial is to introduce you to functional coding, and to explain how to code in a structured way.
Let’s now create a .Rproj file for this session to save all your notes (always start projects this way).
Use fmxdat::make_project to create the default folder structure that you should use.
Do this by creating any folder (e.g. create C:/Temp_Go/someproject), copy the address, and then simply run the following code from any R session’s console:
fmxdat::make_project(Open = T)
This should create a folder structure looking like:
Two broad schools of coding thought have emerged over time: object-oriented and functional.
While the progression in coding paradigms has not been linear and clean, most would agree that these two are the main structures.
Since the development of Fortran in 1957 (Formula Translator, an unstructured and intentionally mathematical language), other paradigms have developed - specifically seeing the advent of variable assignments, conditionals and loops.
COBOL in 1959 was an attempt to make programming languages more like natural language - easier to learn and more accessible, but more limiting as well. Since then, various other languages and paradigms have emerged - all with associated benefits and costs.
As systems became more complex and interlinked, a need for structure in code emerged. Edsger Dijkstra is credited with highlighting the need for, and developing, the first structured coding paradigm in 1969.
This was arguably the beginning of functional, or structured, programming, whereby the coder is forced to assemble arguments, building structure and allowing the ability to test their validity.
The main aim is to move from speed of coding and result, to process consideration and stability.
OO ideas trace back to Simula in the 1960s (developed by Ole-Johan Dahl and Kristen Nygaard), with Alan Kay later coining the term ‘object-oriented’. The main idea is polymorphism (e.g. a man can be a father, a son and a husband at the same time - i.e. multiple roles depending on context).
Functional coding paradigms seek to constrain tasks - writing code that does one thing (e.g. Return_Calculation) and produces no side effects.
This means breaking your workflow down into smaller parts - allowing the ability to test and check functions, while also avoiding having functions interact with environment variables.
The following is an ideal example of what we are trying to achieve:
x <- 10
# Specify function with input variable Y:
Vegas <- function(Y){
  x <- Y ^ 2
  x
}
# Execute function, setting output equal to new variable `result`
result <- Vegas(Y = 100)
# Notice now, that although x was assigned in the function... it stayed only assigned in the function.
# So x remains 10, while result is equal to 10 000
print(x)
## [1] 10
print(result)
## [1] 10000
Notice too from this example that Y is not in your environment. Basically, what happens in Vegas stays in Vegas.
In summary, functions are used to take inputs and transform them to an output. Crucially, inputs are not changed in any way, and functions produce no side effects. That is a virtue in itself.
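The point that inputs are never changed follows from R's copy-on-modify semantics: a function works on a copy of its argument, so the caller's object survives untouched. A small sketch (Scale_Returns is an illustrative name):

```r
# R functions work on copies: modifying an argument inside a
# function never changes the caller's object (copy-on-modify).
Scale_Returns <- function(x) {
  x <- x * 100   # modifies only the local copy
  x
}

v <- c(0.01, 0.02)
scaled <- Scale_Returns(v)

print(v)       # still 0.01 0.02 - the input is untouched
print(scaled)
```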
Java, C++, Python and more can be bucketed in the OO paradigm of polymorphism.
Effectively, it is a messaging paradigm, where context of a command matters.
So if I run plot(X) and plot(Y), the output will be determined by the type and nature of X and Y (same function, but the input object determines the output). This is polymorphism: whether I am now a husband, dad or son depends on the context.
Functional Programming seeks to be indifferent to context and structured - plot_histogram seeks to produce a histogram, always.
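To see this dispatch mechanic in R itself, here is a minimal sketch using S3 classes: one generic call, different behaviour per input class (describe is an illustrative name, not a base R function):

```r
# A minimal sketch of OO-style dispatch in R (S3 classes):
# the same generic call behaves differently depending on the class of its input.
describe <- function(x) UseMethod("describe")
describe.numeric   <- function(x) "I summarise numbers"
describe.character <- function(x) "I summarise text"

describe(10)       # dispatches to describe.numeric
describe("hello")  # dispatches to describe.character
```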
In summary, we use a fluid combination of both OO and functional. I certainly use both in any project. This session describes both, and there is no right or wrong at all times. Horses for courses - but please try to prize stability and structure in everything you do, which is core to functional coding.
As mentioned, when running a function - it does so in a completely new environment.
This way, if you execute a function multiple times, the exact same outcome is produced:
foo <- function(x){
  x <- x^3
  x
}
foo(x = 10)
## [1] 1000
foo(x = 10)
## [1] 1000
foo(x = 10)
## [1] 1000
Below we illustrate the use of functions in R with examples - followed by explanations for what is happening.
x = rnorm(1000)
# Let's now create a function to replace all positive values with the word positive:
Cap_Randoms <- function(vector, Threshold = 0, Replace_Name) {
  # Just for illustration:
  z <- 1000
  # What function is doing:
  vector[vector >= Threshold] <- Replace_Name
  vector
}
head( Cap_Randoms(vector = x, Replace_Name = "Positive"), 10)
## [1] "-1.32513427435075" "Positive" "Positive"
## [4] "Positive" "Positive" "Positive"
## [7] "-0.313836352808351" "Positive" "Positive"
## [10] "Positive"
Notice from the above:
The function has been named Cap_Randoms, with 3 possible inputs given in the brackets.
Notice that Threshold was given a default value of zero, which can be overwritten when calling the function.
Replace_Name has no default, and must be provided.
The only thing coming out of the function is vector (last thing function does)
This can be explicitly set using return(vector), but it could also simply be the last element as above.
Check that z has not been saved in your work environment - running print(z) after the above script will produce an error.
This last point is vital to understanding the power of R functions - the function's environment is sanitized.
It only deals with what comes in - and only produces what you want it to produce.
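The implicit-versus-explicit return point above can be sketched in two lines (illustrative function names): both versions return the same value, one via return(), the other by ending on the value itself.

```r
# Two equivalent ways to return a value from a function:
add_one_explicit <- function(x) {
  return(x + 1)   # explicit exit
}
add_one_implicit <- function(x) {
  x + 1           # the last evaluated expression is returned
}

add_one_explicit(9)  # 10
add_one_implicit(9)  # 10
```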
Let’s play with the above function some more:
# Let's overwrite the default to set Threshold to 2, and name it "TWO or above"
Result <- Cap_Randoms(vector = x, Threshold = 2, Replace_Name = "TWO or above")
# Notice that z, which is created internal to the function, is not created in our environment:
exists("z")
## [1] FALSE
# We can also add messages in functions, e.g.:
pacman::p_load(glue)
Cap_Randoms_Msg <- function(vector, Threshold = 0, Replace_Name) {
  # Just for illustration:
  Count <- sum(vector > 0)
  SUM <- length(vector)
  # What function is doing:
  vector[vector >= Threshold] <- Replace_Name
  message( glue::glue("Percentage of positive elements = {Count/SUM * 100}%"))
  vector
}
Result <- Cap_Randoms_Msg(vector = x, Replace_Name = "Positive")
In the above example, notice that a message was printed. This message was not saved in the environment, and neither were Count or SUM. The only element passed from function -> environment was vector.
This is to be preferred.
Notice from the last schematic, the output of the generic function plot is guided by the input.
So - dependent on the input, the output of plot is different.
This means that functions can be generic (e.g. plot), and inputs are specific in an OO workflow (e.g. Result and Result2).
This means your objects have multiple facets - it is not just e.g. a dataframe, but has other elements too (e.g. guiding plot to produce a lineplot).
Functional, however, forces your function to be specific: it requires a particular input in a particular form (ideally - there are exceptions of course)
This means you’d have two functions: plot_line and plot_bar, where the function guides the output (not the input).
Think about the above and reason for yourself which process you are most comfortable with.
In this course - I prefer the workflow of having many functions - each having a specific role - as opposed to one long script that does 1000 things.
Although functions in R are flexible and can be used as inputs, have flexible parameter inputs, etc. - one thing you should strive toward (nearly at all times) is that your functions are pure.
Purity in functions implies:
The output only depends on the inputs (no environment variable dependencies. Ever.)
The function has no side-effects (e.g. changing a global variable value). What comes out of a function should be explicit.
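Both points can be made concrete with a short contrast (impure_add and pure_add are illustrative names): the impure version leans on and mutates a global, so its answer drifts; the pure version depends only on its inputs.

```r
# Contrasting an impure and a pure function.
counter <- 0

impure_add <- function(x) {
  counter <<- counter + 1   # side effect: mutates a global variable
  x + counter               # output depends on hidden state
}

pure_add <- function(x, increment) {
  x + increment             # output depends only on the inputs
}

impure_add(10)   # answer drifts with every call...
impure_add(10)
pure_add(10, 1)  # ...while this is 11, every single time
pure_add(10, 1)
```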
You could, in principle, structure an entire project around functions that each do one thing.
E.g.:
Universes <- list( c("AUS", "NZ", "SA", "UK", "US" ) )
# Load
df <- load_foo( Load_List = Universes)
# Wrangle
df_wrangled <- wrangle_foo(df)
# Plot
Plot <- plot_foo(df_wrangled)
# Print plot
plot(Plot)
Object-oriented coding, in contrast, creates class objects to which you could, e.g., apply various calls.
Many older R packages still make extensive use of class objects.
An example of this principle would be:
X <- do_all_Foo( Load_List = Universes )
# To produce a histogram plot, use 'plot' on the object X:
plot( X )
Y <- do_something_else_Foo( Load_List = Universes )
# To now produce a lineplot for Y, use again 'plot' on Y:
plot( Y )
Notice that X and Y are plotted simply by calling ‘plot,’ which then produces a histogram plot for X, and a lineplot for Y.
The output of ‘plot’ is determined by the input class - the first, say, produces an output with class ‘My_Foo’, with a structure that allows a histogram to be produced by plot (and, alternatively, a lineplot for Y based on its object class).
The question is: should we be explicit about the plot created (i.e. have an explicit function called histogram_plot and line_plot producing each) - or should the output of ‘plot’ be governed by the output object?
Your answer to this question should guide you in terms of which paradigm you are most comfortable coding in. Although it might seem tedious to have functions for all your operations, it ultimately leads to cleaner, more clearly defined and easier-to-debug code. The choice is yours, but at least be consistent in how you code.
Functions in R are like meat factories.
You specify a name (Joe’s meats), explicitly state what goes in and then explicitly state what goes out.
Simple. That’s it.
Your meat factory should:
Have all internal elements listed as parameters
Check this, otherwise you could easily be disappointed with the results of your efforts, as this guy learned (he did not ensure that all internal objects in his functions were explicitly defined):
Vague name, an external dependency not defined as a parameter (y), and also not explicit about what exits the function:
foo <- function(x){
  z <- x * y
}
y <- 100
Result <- foo(x = 10)
Result
## [1] 1000
No external dependencies, explicit about what exits (although a great function would’ve had a more informative name obvs):
foo <- function(x, y){
  z <- x * y
  return(z)
}
Result <- foo(x = 10, y = 100)
Quick note on return use:
It stops a function and returns what you specify - so any code that follows is never run:
foo <- function(x, y){
  z <- x * y
  return(z)
  g <- z * 2000   # never reached
  return(g)
}
Result <- foo(x = 10, y = 100)
print(Result)
## [1] 1000
In the above case you might as well not use return at all, and simply have the function end with the value you wish it to produce.
Use return and stop to break out of functions early and return what you want.
Let’s show this below. I’m also going to throw in (for free) an illustration of how to handle possible breaks without R screaming and stopping in a tantrum: purrr::safely.
safe_function <- purrr::safely(any_function) creates a wrapped function; calling it returns a list with two elements, result and error - one of which will be NULL. See the example below of how to use it:
pacman::p_load(purrr)

foo <- function(x, y){
  z <- x * y
  if(z > 999) stop( glue::glue("\n\nOh NO!\nvalue of z exceeds 999. The value was {z}. \nStop and try again please... Cheers, the developer.") )
  z
}
safe_foo <- purrr::safely(foo)

# Here it will fail:
Result <- safe_foo(x = 10, y = 100)
print(Result$result)
print(Result$error)

# Here it succeeds:
Result <- safe_foo(x = 1, y = 100)
print(Result$result)
print(Result$error)

# So now save the result, and you are off to the races:
Result <- Result$result
Use safe wrappers for your functions where you envisage possible breaks - but where a warning or message will suffice. This way, you can carry on with a loop even if an error occurs.
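Under the hood, a safe wrapper like purrr::safely is roughly a tryCatch that turns an error into data. A base-R sketch of the idea (safely_base and risky_log are our own illustrative names, not library functions):

```r
# A base-R sketch of what a safe wrapper does: tryCatch turns an
# error into a list element, so a loop over it can keep running.
safely_base <- function(f) {
  function(...) {
    tryCatch(list(result = f(...), error = NULL),
             error = function(e) list(result = NULL, error = e))
  }
}

risky_log <- function(x) {
  if (x <= 0) stop("x must be positive")
  log(x)
}
safe_log <- safely_base(risky_log)

res_ok  <- safe_log(exp(1))  # $result is 1, $error is NULL
res_bad <- safe_log(-5)      # $result is NULL, $error holds the condition
```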
Let’s create a few more examples to ground the functional intuition.
These functions might require you to spend time understanding what is going on - I guarantee it will aid your understanding if you spend time with this.
Below is a simple example of a function that you can use for calculating the standardized return values of the BRICSTRI file.
We will use it with apply to apply the function to all columns:
df_tri <- fmxdat::BRICSTRI

my_std <- function(Column){
  Column = Column / lag(Column) - 1
  mean( Column, na.rm = T) / sd( Column, na.rm = T )
}
# Now we can use this function directly in apply:
apply(df_tri[,-1], 2, my_std)
## brz chn ind rus zar
## NaN NaN NaN NaN NaN
# Note we had to ignore the date column using [, -1].
# (Also note: base R's stats::lag does not shift a plain vector, so the computed
# returns are all zero and mean/sd gives NaN - dplyr::lag would shift properly.)
# This could also be explicitly built into the function to ignore columns that are not numeric.
# We will use here the more generic sapply to identify classes.
# Note what it does:
sapply(df_tri, class)
## Date brz chn ind rus zar
## "Date" "numeric" "numeric" "numeric" "numeric" "numeric"
# The base function class describes what an object is. E.g. class(10) and class("Text")
# Let's build this check into our function, while also rounding results to three decimals:
my_safer_std <- function(Column){

  if( all( class(Column) %in% "numeric") ) {
    Column = Column / lag(Column) - 1
    Result <- mean( Column, na.rm = T) / sd( Column, na.rm = T )
    Result <- round( as.numeric(Result), 3)
    Result
  } else {
    Result <- "Not numeric column soz."
    Result
  }

}
lapply(df_tri, my_safer_std) # Gives list result
## $Date
## [1] "Not numeric column soz."
##
## $brz
## [1] NaN
##
## $chn
## [1] NaN
##
## $ind
## [1] NaN
##
## $rus
## [1] NaN
##
## $zar
## [1] NaN
sapply(df_tri, my_safer_std) # Gives unlisted result (vector)
## Date brz chn
## "Not numeric column soz." "NaN" "NaN"
## ind rus zar
## "NaN" "NaN" "NaN"
Notice from the above that the result of sapply (the vector version of lapply) is a single vector containing both character and numeric values - and since a vector cannot hold both types, R coerces everything to character.
This would be akin to trying:
mixed_vector <- c( 10, 12, "Hello", 18)
# Note by adding "Hello", everything is made character.
You can nest functions (i.e. have functions in functions) - but be careful with this. The following works, e.g., because each level defines its own f, and evaluation proceeds from the inside out:
f <- function(x) {
  f <- function(x) {
    f <- function(x) {
      x ^ 2 # first
    }
    f(x) + 1 # second
  }
  f(x) * 2 # third
}
f(10)
## [1] 202
The above works, but is extremely poor coding…
You should always strive to explicitly define and label your functions appropriately so as to avoid confusion.
Functions can of course also be applied in loops (note that loops are old-school - you could often rather use one of the apply functions…).
Let’s e.g. loop through a list and replace all values above 1 with 1000:
# Create open list:
df_List <- list()
df_List$Name <- "Some name"
df_List$x <- rnorm(1000)
df_List$y <- rnorm(1000)
df_List$z <- rnorm(1000)

capper <- function(dfl, cap_thresh = 1, replacer = 1000){

  if( class( dfl ) == "character" ) return(dfl)
  dfl[ which( dfl > cap_thresh) ] <- replacer
  dfl

}

# Let's now create a loop for all the list entries:
for(i in 1:length(df_List)){
  df_List[[i]] <- capper(df_List[[i]])
}

# Incidentally, we could use lapply below - which applies the function, capper, to each drawer in my list cupboard...
df_List <- lapply(df_List, capper)
Notice above the loop adjusts the df_List object directly - a loop is not a functional environment!
This follows as loops merely repeat a command, whereas a function has a defined input and output.
Ideally, repeating a function many times should not affect the outcome, i.e. Result <- foo(df = X) should always have the same answer. Even if you repeat it a hundred times.
Simply put, if you ask me if I like to braai a 1000 times, my answer should be yes 1000 times.
Ideally you should build your functions & ensure they are sanitized (knowing exactly what goes in and out, with some nice descriptions too), after which they are saved with an informative name.
They can then be sourced at anytime (even in functions) and used in various places in your project.
Let’s take a safe and very generic return calculator function, which first identifies columns that are numeric - and then calculates the returns for the column.
pacman::p_load(dplyr)

# First note, using sapply we can apply a function to all the columns of a dataframe:
sapply(df_tri, class)
##      Date       brz       chn       ind       rus       zar 
##    "Date" "numeric" "numeric" "numeric" "numeric" "numeric"
# We will now use this in our function to find the columns that are numeric:
Return_Calculator <- function(df){

  # Let's create a vector of names of numeric columns:
  Return_Columns <- names( which( sapply(df, class) == "numeric") )

  if( length(Return_Columns) == 0 ) stop("No numeric columns! Try another dataframe....")
  # names is the vector equivalent of colnames used before...

  # Now we can create a function, in a function, and immediately use it:
  Return_Creator <- function(column){
    column / lag(column) - 1
  }

  # Right, let's use this function now:
  Returns_df <- apply( df[, which(colnames(df) %in% Return_Columns)], 2, Return_Creator)

  # Let's now append this back to the other dataframe information:
  df_done <-
    bind_cols(df[, which(!colnames(df) %in% Return_Columns)],
              as_tibble(Returns_df))

  # Ensure similar ordering for completeness:
  df_done[, colnames(df)]

}
# Create the data to use in the function:
df_tri <- fmxdat::BRICSTRI
# Let's add a column of text to try and break Return_Calculator:
df_tri <- bind_cols( df_tri, Text_Column = rep( "Some Random TEXT!", nrow(df_tri)) )

# Let's now test this bugger:
Return_Calculator(df = df_tri)
## # A tibble: 790 x 7
## Date brz chn ind rus zar Text_Column
## <date> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 2000-01-14 NA NA NA NA NA Some Random TEXT!
## 2 2000-01-21 -0.0267 -0.0172 0.0232 -0.0263 -0.0107 Some Random TEXT!
## 3 2000-01-28 -0.0279 -0.00169 0.0427 -0.106 -0.101 Some Random TEXT!
## 4 2000-02-04 0.0615 -0.0158 0.0561 -0.0305 0.0396 Some Random TEXT!
## 5 2000-02-11 -0.00709 -0.102 0.111 0.0404 0.00702 Some Random TEXT!
## 6 2000-02-18 -0.0315 0.0428 0.107 -0.0450 -0.0178 Some Random TEXT!
## 7 2000-02-25 0.00960 -0.105 -0.0858 -0.0261 -0.0300 Some Random TEXT!
## 8 2000-03-03 0.0392 -0.0420 -0.0256 0.114 -0.0576 Some Random TEXT!
## 9 2000-03-10 -0.0107 0.0455 -0.0248 0.185 0.0279 Some Random TEXT!
## 10 2000-03-17 -0.0289 0.00407 -0.107 -0.0339 -0.00839 Some Random TEXT!
## # ... with 780 more rows
# Winning!
Obviously the above was much more elaborate than needed (intentionally so), but it illustrates quite a few nice tricks to make your function robust.
Before proceeding, let’s build a structured framework that you must get used to following.
As stressed, the README in your project should be a diary as you progress through a project, and a manual once you are done. Be thorough in documenting your readme.
All R codes should be saved in code, with source scripts loaded as shown in the README.
VERY IMPORTANT: the code folder should only be for fully contained functions.
This means, a script like the following does not belong in code:
#----------------------------------------
# script content:
df <- read_rds("data/Some_Data.rds")

XX <- Some_Foo(df)

Some_Other_Foo(Y = df, X = rnorm(100))

write_csv(XX, "C:/Somewhere/Something.csv")
#----------------------------------------
Can you understand why the above should not be saved and called in an R script in ‘code?’
Where would this (or a similar sequential execution of code) belong? The README of course!
Sourcing the above would execute everything in the script into your environment, as it is not wrapped in a function(){ } blanket.
The following should also never be done (where there is uncommented code outside your function, as this will be executed if you source the function):
#----------------------------------------
# script content:
x <- 10

y <- 35

some_foo <- function(X){

  result <- X^2

}
#----------------------------------------
Put all your enclosed functions in the ‘code’ folder
Use fmxdat::source_all("code") to load all your functions into the README (or point to another location to load other scripts).
The source_all function will only load scripts ending in ".R". Again, make sure these files contain only enclosed functions!
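For contrast with the anti-patterns above, here is a sketch of a script that DOES belong in ‘code’ - say code/calc_returns.R (an illustrative file and function name): nothing but one fully enclosed function, so sourcing it only creates the function and nothing else lands in your environment.

```r
# code/calc_returns.R (illustrative): a fully contained function.
# Sourcing this file defines calc_returns - no stray objects, no execution.
calc_returns <- function(df, price_cols) {
  # base-R lag: shift each column down by one observation
  ret <- function(x) x / c(NA, head(x, -1)) - 1
  df[price_cols] <- lapply(df[price_cols], ret)
  df
}
```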
I want you to now practice creating your own function, while explaining your process in your README.
I want you to do the following:
In a different chunk in your README:
Load BRICS.rds, and save this as ‘df_Brics’ in your environment.
Next, create a new chunk where you source a function, ‘filter_df.R,’ that does the following:
filter your dataframe to only consider weekdays between two dates (set by default to 2008 and 2010).
Hint. use: rmsfuns::dateconverter and filter the rows using ‘which’;
For now still, use base R coding (apply family and square bracket truncation).
Create and source a function that calculates the simple returns \(\frac{X}{lag(X)}-1\) for all the columns in your dataframe.
Now that you’ve seen how to construct and source functions, let’s discuss functional programming in R a bit more.
In R, functions are first-class elements - implying you can do anything with functions that you can do with vectors: assign them to variables, store them in lists, pass them as arguments to other functions, create them inside functions, and even return them as the result of a function.
You have seen this perhaps without fully understanding that you are using functions as inputs - in the apply family.
Notice that these are called functionals, as they require a function as an input (the third argument in apply, e.g.).
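These first-class properties can be shown in a few lines (summaries and make_power are illustrative names): functions stored in a list, passed to a functional, and returned from another function.

```r
# Functions as first-class objects in R:
summaries <- list(avg = mean, biggest = max)   # stored in a list

x <- c(1, 2, 3, 4)
results <- sapply(summaries, function(f) f(x)) # passed to a functional
results              # named vector: avg = 2.5, biggest = 4

# A function that builds and returns another function:
make_power <- function(p) function(x) x ^ p
square <- make_power(2)
square(5)            # 25
```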
Before we construct a functional, let’s look at another piece of syntax: the ellipsis (…).
The ellipsis allows us to keep the door open for additional parameters to be passed on to functions that are used within our function.
As an example, suppose we create a function, Return_Calc, within which we use the PerformanceAnalytics package’s PerformanceAnalytics::Return.annualized function, which has many potential input parameters.
I can now either specify all the parameters for PerformanceAnalytics::Return.annualized inside my Return_Calc function, or simply use an ellipsis as a placeholder for specifying additional parameters.
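Before the conceptual example, a minimal runnable sketch of the mechanic (mean_wrapper is an illustrative name): any argument the wrapper does not name itself is forwarded untouched, here to base R's mean.

```r
# The ellipsis forwards unnamed-by-the-wrapper arguments to the inner call:
mean_wrapper <- function(x, ...) {
  mean(x, ...)   # ... passes e.g. na.rm or trim straight through to mean
}

v <- c(1, 2, 3, NA, 100)
mean_wrapper(v)                # NA, since na.rm defaults to FALSE
mean_wrapper(v, na.rm = TRUE)  # 26.5
```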
Conceptual Example:
pacman::p_load(PerformanceAnalytics)

dfuse <- fmxdat::BRICSTRI

Return_Calc <- function(dfuse, ...){

  # ...pretend some data wrangling happens here...
  wrangled_foo <- wrangle_foo(dfuse)
  # ...pretend some more data wrangling happens here...

  result <- PerformanceAnalytics::Return.annualized( wrangled_foo, ...)

}
# This now enables me to add any parameter to my function that will ultimately be considered
# by PerformanceAnalytics::Return.annualized inside it - e.g. the parameter `geometric = TRUE`
# below is passed to Return.annualized (despite not being explicitly specified in Return_Calc
# at the top, but indirectly by channeling the ... part to the function).
Returns <- Return_Calc(dfuse, geometric = TRUE)
Let’s construct a more elaborate functional to explain the concept further:
# Let's first create a quick (and elaborate) dataframe with a column checking if the
# rounded sum of the rows (i.e. integer) is even or uneven ( this is completely arbitrary,
# but allows us to practice our base R skills a bit...).
# Notice here I introduce 'strsplit'- which allows splitting up of character vectors.
# So we first make that value a character, split its string - consider the last digit and
# check if even, before making it a numeric again.
# Of course, you could've used e.g. X / 2 and tested that the value is rounded - but below is more elaborate for practice.
Even_Uneven_Sum <- function(Row){
  ifelse( as.numeric( last( strsplit( as.character(round(Row, 0)), "")[[1]])) %in%
            c(0,2,4,6,8,10), "Even", "Uneven")
}

df_to_Use <-
  bind_cols( fmxdat::SectorTRIs,
             tibble(Info = apply(fmxdat::SectorTRIs[,-1], 1, Even_Uneven_Sum))
  )

# Now, let's create a functional that applies a function to the rows flagged "Even" in the Info column:
Max_Foo <- function(X, Scalar_Choice){
  max( X, na.rm=T ) * Scalar_Choice
}
Min_Foo <- function(X, Number_Multiplied, Check_Positive_Neg){

  Result <- min( X, na.rm=T ) * Number_Multiplied

  if( Check_Positive_Neg ){
    Final_Result <- ifelse( Result * rnorm(1) > 0, paste0("Positive: ", Result), paste0("Negative: ", Result) )
  } else {
    Final_Result <- Result
  }

  Final_Result
}
Apply_Numerics_Function <- function(df_to_Use, Function_To_Apply, ...){

  Numeric_Cols <- colnames( df_to_Use[, sapply(df_to_Use, is.numeric)] )

  df_Applied <-
    as_tibble(
      cbind(
        df_to_Use[ which(df_to_Use$Info == "Even"), which(!colnames(df_to_Use) %in% Numeric_Cols)],
        Even_Foo_Applied = apply( df_to_Use[ which(df_to_Use$Info == "Even"), Numeric_Cols], 1, Function_To_Apply, ...)
        # Notice the use of ..., the ellipsis. This is a placeholder for function inputs.
      )
    )

  df_Applied
}
# Max function applied:
Apply_Numerics_Function(df_to_Use = df_to_Use,
                        Function_To_Apply = Max_Foo,
                        Scalar_Choice = 1000)

# Notice how the ellipsis could now contain a multitude of possible function inputs:
# Min function applied with positive / negative transform:
Apply_Numerics_Function(df_to_Use = df_to_Use,
                        Function_To_Apply = Min_Foo,
                        Number_Multiplied = 10,
                        Check_Positive_Neg = TRUE)

Apply_Numerics_Function(df_to_Use = df_to_Use,
                        Function_To_Apply = Min_Foo,
                        Number_Multiplied = 20,
                        Check_Positive_Neg = FALSE)
Although the above examples are elaborate (ask not why, but rather how please :) ) they show something really useful:
The ellipsis can be used as a placeholder for potential inputs.
Functions can be passed as arguments to other functions.
The above is how ‘apply’ (and other similar variants) work - they allow the provision of additional arguments in ellipses, and functions as inputs.
Please go through the above chunks and understand the different forms that functions could take in R.
Later sections will explore uses of functions in more advanced settings, so please be comfortable with these base examples.
We next explore the revolution in coding called tidy coding.
Suggested solution (note, you have to save each step as a separate function and source it all in a neat README. Try this yourself):
# Filter function:
filter_df <- function(StartDate = as.Date("2008-01-01"),
                      EndDate = as.Date("2010-01-01"),
                      df_Use){

  days_filter <- rmsfuns::dateconverter(StartDate = StartDate,
                                        EndDate = EndDate,
                                        Transform = "weekdays")

  df_Trimmed <- df_Use[ which( df_Use$Date %in% days_filter), ]

  df_Trimmed
}

Trimmed_df <-
  filter_df(StartDate = as.Date("2008-01-01"),
            EndDate = as.Date("2010-01-01"),
            df_Use = fmxdat::BRICSTRI)
# Returns function:
Return_Foo <- function(Trimmed_df){

  Aux_Return_Func <- function(R){
    R / lag(R) - 1
  }

  df_Returns <-
    cbind( Trimmed_df[,1],
           apply(Trimmed_df[,-1], 2, Aux_Return_Func)
    )

  df_Returns
}

Result <- Return_Foo(Trimmed_df)
From next time, the notation will be much easier… almost there!