Tidy Programming in R

Introducing you to the tidyverse


Author

Affiliation

N.F. Katzke

 

Published

Feb. 18, 2025

Citation

Katzke, 2025


In this practical we introduce the tidyverse

It is safe to say that the tidyverse, or tidy coding, requires a complete rethink on how you approach data analytics.

Much of how we go about data analysis is tainted by our use of excel, and our propensity to think in terms of many columns, fewer rows and many (oh so many) excel tabs.

Bonus practical (more examples)

In this bonus practical, you will see a Practice Section at the end.

Please do this yourself, before looking at the proposed solutions.

For the code for the second part, first try it - and then look at the solution that will be posted here in time:

library(tidyverse)

# Cap-weight index return per country:
df_use <- 

  fmxdat::Fin_Data_SA_US_NKY %>% group_by(Universe) %>% mutate(YM = format(date, "%Y%B")) %>% 
  # For easier naming:
  rename(Sector = BICS_LEVEL_1_SECTOR_NAME)
  

# Cap-weighted index returns per month, universe and sector:

df_rets <- 

df_use %>% group_by(Universe, Sector, YM) %>% 
  
  mutate(weight = Market.Cap / sum(Market.Cap, na.rm=T)) %>% 
  
  group_by(Tickers, YM) %>% 
  
  filter(date == last(date)) %>% 
  
  # Date gap issue resolved:
  arrange(date) %>% group_by(Tickers) %>% 
  
  mutate(Returns = ifelse(date - lag(date) >= 35, NA, TRI / lag(TRI)-1)) %>% 
  
  filter(date > first(date)) %>% ungroup() %>% 
  
  # Drop some redundant columns:
  select(-Short.Name, -contains("BICS"))
  

# So for the first question:
  
Rets_per_sector <- df_rets %>% group_by(YM, Universe, Sector) %>% summarise(Returns = sum(weight*Returns, na.rm=T))


# Question 2:

Rets_per_Universe <- 
  
  df_rets %>% 
  # Drop the Sector grouping:
  group_by(YM, Universe) %>% mutate(weight = Market.Cap / sum(Market.Cap, na.rm=T)) %>% 
  
  summarise(Index_Returns = sum(weight*Returns, na.rm=T), EW_Returns = sum(Returns*1/n(), na.rm=T)) %>% ungroup()


Rets_per_Universe %>% group_by(Universe) %>% 
  
  # Chaining the returns set, i.e. (1+Ret_1) * (1+Ret_2) * .... (1+Ret_20)
  summarise(Full_Ret = prod(1+Index_Returns)-1, Full_SD = sd(Index_Returns),
            Full_Ret_EW = prod(1+EW_Returns)-1, Full_SD_EW = sd(EW_Returns),
            Sharpe = Full_Ret / Full_SD, Sharpe_EW = Full_Ret_EW / Full_SD_EW) %>% 
  
  # Rearrange: (notice everything for 'everything else')
  
  select(Universe, contains("Ret"), contains("SD"), everything())



# Question 3:


df_rets %>% filter(Universe == "JALSHAll") %>% 
  
  # Again, drop the Sector grouping:
  group_by(YM, Universe) %>% mutate(weight = Market.Cap / sum(Market.Cap, na.rm=T)) %>% 
  
  group_by(YM) %>% arrange(date, desc(weight)) %>% 
  
  mutate(CumsumW = cumsum(weight)) %>% filter(CumsumW <= 0.95) %>% 
  
  group_by(YM) %>% summarise(Top95_Index_Return = sum(weight * Returns, na.rm=T)) %>% 
  
  summarise(Full_Ret = prod(1+Top95_Index_Return)-1, Full_SD = sd(Top95_Index_Return),Sharpe = Full_Ret / Full_SD)
  
  
# DIY: see if you can annualise these numbers to boot....

Footnotes

    Citation

    For attribution, please cite this work as

    Katzke (2025, Feb. 19). Data Science for Economics and Finance: Tidy Programming in R. Retrieved from https://www.datsci.nfkatzke.com/posts/2020-02-24-tidyprogramming/

    BibTeX citation

    @misc{katzke2025tidy,
      author = {Katzke, N.F.},
      title = {Data Science for Economics and Finance: Tidy Programming in R},
      url = {https://www.datsci.nfkatzke.com/posts/2020-02-24-tidyprogramming/},
      year = {2025}
    }