Tidy Programming in R

Introducing you to the tidyverse

N.F. Katzke
02-19-2024

In this practical we introduce the tidyverse

It is safe to say that the tidyverse, or tidy coding, requires a complete rethink on how you approach data analytics.

Much of how we go about data analysis is tainted by our use of excel, and our propensity to think in terms of many columns, fewer rows and many (oh so many) excel tabs.

Bonus practical (more examples)

In this bonus practical, you will see a Practice Section at the end.

Please do this yourself, before looking at the proposed solutions.

For the code for the second part, first try it - and then look at the solution that will be posted here in time:

library(tidyverse)

# Cap-weight index return per country:
df_use <- 

  fmxdat::Fin_Data_SA_US_NKY %>% group_by(Universe) %>% mutate(YM = format(date, "%Y%B")) %>% 
  # For easier naming:
  rename(Sector = BICS_LEVEL_1_SECTOR_NAME)
  

# Cap-weighted index returns per month, universe and sector:

df_rets <- 

df_use %>% group_by(Universe, Sector, YM) %>% 
  
  mutate(weight = Market.Cap / sum(Market.Cap, na.rm=T)) %>% 
  
  group_by(Tickers, YM) %>% 
  
  filter(date == last(date)) %>% 
  
  # Date gap issue resolved:
  arrange(date) %>% group_by(Tickers) %>% 
  
  mutate(Returns = ifelse(date - lag(date) >= 35, NA, TRI / lag(TRI)-1)) %>% 
  
  filter(date > first(date)) %>% ungroup() %>% 
  
  # Drop some redundant columns:
  select(-Short.Name, -contains("BICS"))
  

# So for the first question:
  
Rets_per_sector <- df_rets %>% group_by(YM, Universe, Sector) %>% summarise(Returns = sum(weight*Returns, na.rm=T))


# Question 2:

Rets_per_Universe <- 
  
  df_rets %>% 
  # Drop the Sector grouping:
  group_by(YM, Universe) %>% mutate(weight = Market.Cap / sum(Market.Cap, na.rm=T)) %>% 
  
  summarise(Index_Returns = sum(weight*Returns, na.rm=T), EW_Returns = sum(Returns*1/n(), na.rm=T)) %>% ungroup()


Rets_per_Universe %>% group_by(Universe) %>% 
  
  # Chaining the returns set, i.e. (1+Ret_1) * (1+Ret_2) * .... (1+Ret_20)
  summarise(Full_Ret = prod(1+Index_Returns)-1, Full_SD = sd(Index_Returns),
            Full_Ret_EW = prod(1+EW_Returns)-1, Full_SD_EW = sd(EW_Returns),
            Sharpe = Full_Ret / Full_SD, Sharpe_EW = Full_Ret_EW / Full_SD_EW) %>% 
  
  # Rearrange: (notice everything for 'everything else')
  
  select(Universe, contains("Ret"), contains("SD"), everything())



# Question 3:


df_rets %>% filter(Universe == "JALSHAll") %>% 
  
  # Again, drop the Sector grouping:
  group_by(YM, Universe) %>% mutate(weight = Market.Cap / sum(Market.Cap, na.rm=T)) %>% 
  
  group_by(YM) %>% arrange(date, desc(weight)) %>% 
  
  mutate(CumsumW = cumsum(weight)) %>% filter(CumsumW <= 0.95) %>% 
  
  group_by(YM) %>% summarise(Top95_Index_Return = sum(weight * Returns, na.rm=T)) %>% 
  
  summarise(Full_Ret = prod(1+Top95_Index_Return)-1, Full_SD = sd(Top95_Index_Return),Sharpe = Full_Ret / Full_SD)
  
  
# DIY: see if you can annualise these numbers to boot....

Citation

For attribution, please cite this work as

Katzke (2024, Feb. 19). Data Science for Economics and Finance: Tidy Programming in R. Retrieved from https://www.datsci.nfkatzke.com/posts/2020-02-24-tidyprogramming/

BibTeX citation

@misc{katzke2024tidy,
  author = {Katzke, N.F.},
  title = {Data Science for Economics and Finance: Tidy Programming in R},
  url = {https://www.datsci.nfkatzke.com/posts/2020-02-24-tidyprogramming/},
  year = {2024}
}