Introduction

  • The aim of this session is to recap dplyr and the tidyverse, and to give you more examples and exposure.

Tidyverse

In summary: the tidyverse package is a collection of other packages, including:

  • dplyr : awesome for data wrangling

  • readr : awesome for importing data

  • ggplot2 : the standard for plotting in a tidy framework

  • tidyr : essential for manipulating dataframes

  • purrr : makes your code (and you) purrr with delight… you’ll see. Absolute syntactic sugar. Pure elation.

  • lubridate : really the only package worth using when working with dates

Piping

Remember that a key command in using the tidyverse is the pipe operator %>%.

  • x %>% f is equivalent to f(x)

  • x %>% f(y) is equivalent to f(x,y)

  • We can use a dot to refer to the data input: x %>% f(.) is equivalent to f(x)

Type CTRL + SHIFT + M in RStudio and the pipe will appear. This will be your friend from now on.
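For instance, a minimal sketch using the built-in mtcars data to illustrate the equivalences above:

library(dplyr)  # the pipe %>% is re-exported by dplyr

head(mtcars)        # f(x)
mtcars %>% head()   # x %>% f
mtcars %>% head(.)  # x %>% f(.) - the dot stands for the piped-in data
mtcars %>% head(3)  # x %>% f(y), i.e. head(mtcars, 3)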

You have already installed dplyr when you installed tidyverse, but just to be sure, execute install.packages("dplyr").

E.g., using dplyr on our previous practical’s dataset:

# Do not run:
library(rmsfuns)          # gives us load_pkg, which installs (if needed) and loads packages
load_pkg("tidyverse")
load_pkg("readr")         # readr ships with the tidyverse, but loading it again is harmless
data <- fmxdat::BRICSTRI  # BRICS total return index (TRI) data

data %>% select(Date, zar)

Notice that only the Date and zar columns are now displayed. Also, VERY NB: piping does not change the structure of the data unless you save the result.

This is VERY useful, as it allows you to look at what the data would look like before saving it. To check this, type out View(data) and see that it has not changed the data frame after piping the select command.

To save a change, use the <- as follows:

# Do not run:
data_zar <- data %>% select(Date, zar)

dplyr

  • So, time to reboot how you think about dataframes.

  • dplyr requires you to think in a tidy way about your data - in particular, that your data should be in a long format.

  • The thinking behind dplyr is to organize your data in a way that makes analysis simpler.

  • The idea behind tidy data can be described as follows: each variable forms a column, each observation forms a row, and each value is a cell.

  • What this means practically is that your dataframes should be long as opposed to wide.
library(tidyverse)
dailydata <- 
  fmxdat::DailyTRIs

Basic premise of tidy and non-tidy

Run the following code to see tidy vs non-tidy broken down in simple terms:

library(rmsfuns)
pkgstoload <- c("tidyverse","lubridate", "ggplot2")
load_pkg(pkgstoload)

datawide <- fmxdat::Wide_Example

head(datawide)
## # A tibble: 6 × 15
##   date       SubFactors AAA SJ…¹ BBB S…² CCC S…³ DDD S…⁴ EEE S…⁵ FFF S…⁶ TTT S…⁷
##   <date>     <chr>         <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 2002-01-31 EBIT.EV       15.8     7.78   14.8    13.6     12.5   10.4   -1.23 
## 2 2002-01-31 FCF.EV        10.5     6.78   12.3    16.0     13.5   12.0   -1.47 
## 3 2002-01-31 ROA           13.4    15.7    12.6     9.71    10.5    8.55  -1.42 
## 4 2002-01-31 ROE            3.22   15.9    11.9    10.9     11.9   14.3    0.499
## 5 2002-01-31 TRR6M1M       13.4    14.4     9.76   24.6     17.4   18.4    0.482
## 6 2002-01-31 Volat.D.60    11.4     9.82   12.5    13.0     15.6   21.3   -0.138
## # … with 6 more variables: `UUU SPGLOB Equity` <dbl>,
## #   `VVV SPGLOB Equity` <dbl>, `XXX SPGLOB Equity` <dbl>,
## #   `YYY SPGLOB Equity` <dbl>, `ZZZ SPGLOB Equity` <dbl>, AnotherColumn <chr>,
## #   and abbreviated variable names ¹​`AAA SJ Equity`, ²​`BBB SJ Equity`,
## #   ³​`CCC SJ Equity`, ⁴​`DDD SJ Equity`, ⁵​`EEE SJ Equity`, ⁶​`FFF SJ Equity`,
## #   ⁷​`TTT SPGLOB Equity`

Notice that in the df above, all the Tickers have their own columns. But, as mentioned, by the definition of tidiness all columns must be variables. As the wide columns share a common trait (they are all Tickers), they belong in a single column called Tickers!

So we need to make the wide data tidy by gathering all the ticker columns into a single column. I deliberately included two non-Ticker columns (SubFactors and AnotherColumn) to show you how to gather only the Ticker columns…

data_tidy <- 
datawide %>% gather(key = Tickers, value = Scores, contains(" Equity"))

head(data_tidy)
## # A tibble: 6 × 5
##   date       SubFactors AnotherColumn Tickers       Scores
##   <date>     <chr>      <chr>         <chr>          <dbl>
## 1 2002-01-31 EBIT.EV    Random        AAA SJ Equity  15.8 
## 2 2002-01-31 FCF.EV     Random        AAA SJ Equity  10.5 
## 3 2002-01-31 ROA        Random        AAA SJ Equity  13.4 
## 4 2002-01-31 ROE        Random        AAA SJ Equity   3.22
## 5 2002-01-31 TRR6M1M    Random        AAA SJ Equity  13.4 
## 6 2002-01-31 Volat.D.60 Random        AAA SJ Equity  11.4

Note how easy that was if you know the three inputs above:

  • key is the name of the new column that distinguishes the observations (here: Tickers)

  • value names the column that holds the observation values (here: Scores)

  • third, the columns to gather. Note I used contains(" Equity"), as all the Tickers end with Equity. Selecting columns this way is useful; other helpers include ends_with, one_of(…), etc. (see the cheatsheet!).

    You can also pass in vectors to gather or select by - test this yourself.
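Aside: in recent versions of tidyr, gather is superseded by pivot_longer. A sketch of the same reshape with the newer verb (same result, different argument names):

# Sketch: the gather call above, rewritten with pivot_longer
data_tidy_alt <- 
datawide %>% pivot_longer(cols = contains(" Equity"), 
                          names_to = "Tickers", values_to = "Scores")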

Let’s first create a universe distinction between SA and SPGLOB, determined by the " SJ Equity" or " SPGLOB Equity" part of Tickers:

data_tidy <- 
  
data_tidy %>% 
  
  mutate(Universe = ifelse(grepl(" SJ Equity", Tickers), "SA",
                           
                           ifelse(grepl(" SPGLOB Equity", Tickers), "SPGLOB", NA)))

If we now want to, e.g., standardize each stock's factor scores within its universe on each date (calculating Z-scores), we simply pipe it in dplyr (don’t forget the dplyr cheatsheet):

data_tidy %>% 
  group_by(date, Universe, SubFactors) %>% 
  filter(!is.na(Scores)) %>%  # Keep only valid Scores
  mutate( ZScores = (Scores - mean(Scores, na.rm = TRUE)) / 
            sd(Scores, na.rm = TRUE)) %>% ungroup() # ZScores column created
## # A tibble: 144 × 7
##    date       SubFactors AnotherColumn Tickers       Scores Universe ZScores
##    <date>     <chr>      <chr>         <chr>          <dbl> <chr>      <dbl>
##  1 2002-01-31 EBIT.EV    Random        AAA SJ Equity  15.8  SA         1.11 
##  2 2002-01-31 FCF.EV     Random        AAA SJ Equity  10.5  SA        -0.430
##  3 2002-01-31 ROA        Random        AAA SJ Equity  13.4  SA         0.627
##  4 2002-01-31 ROE        Random        AAA SJ Equity   3.22 SA        -1.85 
##  5 2002-01-31 TRR6M1M    Random        AAA SJ Equity  13.4  SA        -0.574
##  6 2002-01-31 Volat.D.60 Random        AAA SJ Equity  11.4  SA        -0.616
##  7 2002-02-28 EBIT.EV    Random        AAA SJ Equity   5.83 SA        -0.834
##  8 2002-02-28 FCF.EV     Random        AAA SJ Equity  11.6  SA         0.655
##  9 2002-02-28 ROA        Random        AAA SJ Equity   4.32 SA        -1.84 
## 10 2002-02-28 ROE        Random        AAA SJ Equity  15.0  SA         0.409
## # … with 134 more rows

Notice that a ZScores column has now been added to our dataframe with minimal effort. What you need to get right, though (and focus on), is grouping correctly and mutating accurately. Tip: use ViewXL to check your calculation in Excel if you are uncertain.
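A quick sanity check of the grouping logic (a sketch): within each group, the Z-scores should have a mean of roughly 0 and a standard deviation of roughly 1:

# Sanity check: each group's ZScores should have mean ~0 and sd ~1
data_tidy %>% 
  group_by(date, Universe, SubFactors) %>% 
  filter(!is.na(Scores)) %>% 
  mutate(ZScores = (Scores - mean(Scores, na.rm = TRUE)) / sd(Scores, na.rm = TRUE)) %>% 
  summarise(Mean_Z = mean(ZScores), SD_Z = sd(ZScores))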

Let’s look at our data:

head(data_tidy)
## # A tibble: 6 × 6
##   date       SubFactors AnotherColumn Tickers       Scores Universe
##   <date>     <chr>      <chr>         <chr>          <dbl> <chr>   
## 1 2002-01-31 EBIT.EV    Random        AAA SJ Equity  15.8  SA      
## 2 2002-01-31 FCF.EV     Random        AAA SJ Equity  10.5  SA      
## 3 2002-01-31 ROA        Random        AAA SJ Equity  13.4  SA      
## 4 2002-01-31 ROE        Random        AAA SJ Equity   3.22 SA      
## 5 2002-01-31 TRR6M1M    Random        AAA SJ Equity  13.4  SA      
## 6 2002-01-31 Volat.D.60 Random        AAA SJ Equity  11.4  SA
  • Now look back at dailydata, which we loaded earlier: there each variable (Ticker) has its own column - by the definition we saw earlier, this is not tidy.

    • To get it tidy, let’s gather the variables and their values. Notice that all the columns are Tickers, and their values are prices - so let’s gather accordingly:
Tidy_dailydata <- 
dailydata %>% gather(Ticker, Prices, -Date)
# Translated into English: gather all columns except Date, call the variable column "Ticker" and the corresponding values "Prices".

# Check it out:
  head(Tidy_dailydata)
## # A tibble: 6 × 3
##   Date       Ticker Prices
##   <date>     <chr>   <dbl>
## 1 2003-01-02 ABI SJ   4880
## 2 2003-01-03 ABI SJ   4880
## 3 2003-01-06 ABI SJ   4880
## 4 2003-01-07 ABI SJ   4925
## 5 2003-01-08 ABI SJ   4881
## 6 2003-01-09 ABI SJ   4881
# To spread this again, use:
Tidy_dailydata %>% spread(Ticker, Prices)
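As with gather, spread has a newer tidyr equivalent, pivot_wider - a sketch:

# Sketch: the spread call above, rewritten with pivot_wider
Tidy_dailydata %>% pivot_wider(names_from = Ticker, values_from = Prices)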
  • So what is so amazing about it now being tidy?

    • See how easy it is now to perform analysis on our dataframe:
Tidy_dailydata %>% 
  group_by(Ticker) %>%  # Effectively you are looping across Tickers 
  mutate(Returns = Prices / lag(Prices) - 1)   # You are now creating a column called Returns...
## # A tibble: 36,573 × 4
## # Groups:   Ticker [73]
##    Date       Ticker Prices  Returns
##    <date>     <chr>   <dbl>    <dbl>
##  1 2003-01-02 ABI SJ   4880 NA      
##  2 2003-01-03 ABI SJ   4880  0      
##  3 2003-01-06 ABI SJ   4880  0      
##  4 2003-01-07 ABI SJ   4925  0.00922
##  5 2003-01-08 ABI SJ   4881 -0.00893
##  6 2003-01-09 ABI SJ   4881  0      
##  7 2003-01-10 ABI SJ   4900  0.00389
##  8 2003-01-13 ABI SJ   4950  0.0102 
##  9 2003-01-14 ABI SJ   4980  0.00606
## 10 2003-01-15 ABI SJ   4910 -0.0141 
## # … with 36,563 more rows

Notice the use above of the all-powerful group_by call… Also note that mutate adds a column.

  • Notice how you were able to skip the need for a loop by using group_by! Of course you can group by many other column definitions.

Let’s check our data and remove Tickers that have only NA prices in a given month:

Tidy_dailydata_Adjusted_No_Full_NA_Per_Month <- 
  
Tidy_dailydata %>% 
  
  mutate(Year_Month = format(Date, "%Y%B")) %>% 
  
  group_by(Year_Month, Ticker) %>% 
  
  mutate(N_Obs = n(), N_NA = sum(is.na(Prices)) ) %>%  # Count the number of dates in the month (N_Obs) and the number of NAs per month per Ticker (N_NA)
  
  filter( N_Obs != N_NA) %>% ungroup()

print(Tidy_dailydata)
## # A tibble: 36,573 × 3
##    Date       Ticker Prices
##    <date>     <chr>   <dbl>
##  1 2003-01-02 ABI SJ   4880
##  2 2003-01-03 ABI SJ   4880
##  3 2003-01-06 ABI SJ   4880
##  4 2003-01-07 ABI SJ   4925
##  5 2003-01-08 ABI SJ   4881
##  6 2003-01-09 ABI SJ   4881
##  7 2003-01-10 ABI SJ   4900
##  8 2003-01-13 ABI SJ   4950
##  9 2003-01-14 ABI SJ   4980
## 10 2003-01-15 ABI SJ   4910
## # … with 36,563 more rows

Notice the use of filter above. We were able to filter out stocks in months where all of that stock’s values were NA. That’s a lot happening in only a few lines!

Also notice that we should ALWAYS ALWAYS ALWAYS ungroup() at the end of a pipe. This stops us from accidentally carrying grouping information downstream unintentionally…
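To convince yourself the filter did something, compare row counts before and after (a quick sketch):

# How many rows did the all-NA-per-month filter remove?
nrow(Tidy_dailydata) - nrow(Tidy_dailydata_Adjusted_No_Full_NA_Per_Month)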

Let’s now, e.g., create a few monthly statistics, in particular the following:

  • Monthly returns

  • Monthly minimum and maximum daily returns
# To achieve the above, we need to now do the following:
  # Step one is to calculate the daily returns, and then calculate the Monthly minimum and maximum for each stock.
  # Only then do we calculate the monthly returns for each stock:

Tidy_dailydata_Adjusted_No_Full_NA_Per_Month %>% 
  
  
  mutate(Year_Month = format(Date, "%Y%B")) %>%
  
  
  arrange(Date) %>%  # Make sure your date column is arranged from earliest to most recent...
  
  group_by(Ticker) %>% 
  
  mutate(DailyReturn = Prices / lag(Prices) - 1) %>% 
  
  group_by( Year_Month, Ticker) %>% 
  
  mutate( across(DailyReturn, .fns = list( Min = ~min(., na.rm = TRUE), Max = ~max(., na.rm = TRUE), Mean = ~mean(., na.rm = TRUE)) ) ) %>% 
  
  filter(Date == last(Date)) %>% 
  
  group_by(Ticker) %>% # Careful - if you kept group_by(Year_Month, Ticker), it would have led to only NAs, as each Year_Month-Ticker pair now has only one value - we need to group_by Ticker only...
  
  mutate(Monthly_Returns = Prices / lag(Prices) - 1) %>% arrange(Ticker) %>% 
  
  ungroup() %>% 
  
  select(Date, Ticker, DailyReturn:Monthly_Returns)
## # A tibble: 830 × 7
##    Date       Ticker DailyReturn DailyReturn_Min DailyReturn_…¹ DailyR…² Month…³
##    <date>     <chr>        <dbl>           <dbl>          <dbl>    <dbl>   <dbl>
##  1 2003-01-31 ABI SJ   -0.000213        -0.0244          0.0102 -1.76e-3 NA     
##  2 2003-02-28 ABI SJ    0.0265          -0.0408          0.0265  1.68e-3  0.0319
##  3 2003-03-31 ABI SJ   -0.0215          -0.0396          0.0269 -3.07e-3 -0.0619
##  4 2003-04-30 ABI SJ    0.0314          -0.0213          0.0314  3.99e-3  0.0769
##  5 2003-05-30 ABI SJ    0               -0.00786         0.0398  1.95e-3  0.0408
##  6 2003-06-30 ABI SJ    0.00368         -0.0265          0.0378  3.44e-3  0.0686
##  7 2003-07-31 ABI SJ   -0.0128          -0.0191          0.0493  2.61e-3  0.0596
##  8 2003-08-29 ABI SJ    0               -0.0185          0.0188  1.87e-3  0.0390
##  9 2003-09-30 ABI SJ    0               -0.0207          0.0290  4.74e-5  0     
## 10 2003-10-31 ABI SJ   -0.00952         -0.0455          0.0427  1.86e-3  0.0400
## # … with 820 more rows, and abbreviated variable names ¹​DailyReturn_Max,
## #   ²​DailyReturn_Mean, ³​Monthly_Returns

Notice above the use of across. This is powerful, as it allows you to apply many functions to many columns within a group_by call - let the power of that sink in…

Notice too our use of select. select can also be used in the following forms:

select(starts_with("Pattern"))

select(ends_with("Pattern"))

select(contains("Pattern"))
  • Let’s take it one step further and calculate each stock’s monthly Sharpe Ratio, i.e. mean/SD of daily returns for each month… this requires us to use summarise; joining the result back onto another dataframe is sketched after the output below.

Notice in the following code the use of summarise. summarise collapses the data to a single row per group specified in group_by:

SD_df <- 
Tidy_dailydata_Adjusted_No_Full_NA_Per_Month %>% 
  
  mutate(Year_Month = format(Date, "%Y%B")) %>% 
  
  arrange(Date) %>%  # Make sure your date column is arranged from earliest to most recent...
  
  group_by(Ticker) %>% 
  
  mutate(DailyReturn = Prices / lag(Prices) - 1) %>% 
  
  group_by(Year_Month, Ticker) %>% 
  
  # Calculate each stock's Sharpe Ratio:
  summarise(Sharpe = mean(DailyReturn, na.rm = TRUE) / sd(DailyReturn, na.rm = TRUE)) %>% ungroup()

# Notice that we have now collapsed our dataframe using summarise to do exactly what it says: for each Ticker and month, we summarise the daily returns into a single Sharpe ratio. Read this sentence again and make sure you understand the intuition behind summarise.
print(SD_df)
## # A tibble: 830 × 3
##    Year_Month Ticker  Sharpe
##    <chr>      <chr>    <dbl>
##  1 2003April  ABI SJ  0.305 
##  2 2003April  ACL SJ -0.112 
##  3 2003April  AGL SJ -0.322 
##  4 2003April  AMS SJ  0.267 
##  5 2003April  ANG SJ  0.165 
##  6 2003April  AOD SJ -0.270 
##  7 2003April  AVG SJ -0.274 
##  8 2003April  BAW SJ -0.0345
##  9 2003April  BGA SJ  0.272 
## 10 2003April  BIL SJ -0.231 
## # … with 820 more rows
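The join alluded to earlier might look as follows (a sketch): attach the monthly Sharpe ratios in SD_df back onto the daily data with left_join, matching on Year_Month and Ticker:

# Sketch: join the monthly Sharpe ratios back onto the daily data
Tidy_dailydata_Adjusted_No_Full_NA_Per_Month %>% 
  mutate(Year_Month = format(Date, "%Y%B")) %>% 
  left_join(SD_df, by = c("Year_Month", "Ticker"))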
Wrangling dates

To filter on dates, use filter as per normal, e.g.:

Tidy_dailydata_Adjusted_No_Full_NA_Per_Month %>% 
  
  mutate(Date = ymd(Date) ) %>% # Make Date column a lubridate column
  
  filter(Date >= ymd(20160806) & Date <= ymd(20170101)) 
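lubridate also makes it easy to derive grouping columns from dates - a small sketch computing each Ticker's average price per calendar month:

# Sketch: derive Year and Month columns with lubridate, then summarise
Tidy_dailydata %>% 
  mutate(Year = year(Date), Month = month(Date, label = TRUE)) %>% 
  group_by(Year, Month, Ticker) %>% 
  summarise(Avg_Price = mean(Prices, na.rm = TRUE))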

DIY section

More Advanced Dataset

  • To really appreciate the full power of dplyr, let’s pull in a more advanced dataset.

    • The following is the daily prices of SA, SP500 and NKY stocks from 2017.

    • Let’s play:

library(tidyverse)

Daily_TRI <- 
  
  fmxdat::Fin_Data_SA_US_NKY

Let’s look at our dataframe:

Daily_TRI %>% head()
## # A tibble: 6 × 8
##   date       Tickers         TRI Short.Name      Marke…¹ BICS_…² BICS_…³ Unive…⁴
##   <date>     <chr>         <dbl> <chr>             <dbl> <chr>   <chr>   <chr>  
## 1 2017-01-02 ACT SJ Equity  1.17 AFROCENTRIC IN…  2.83e8 Health… Health… JALSHA…
## 2 2017-01-03 ACT SJ Equity  1.13 AFROCENTRIC IN…  2.83e8 Health… Health… JALSHA…
## 3 2017-01-04 ACT SJ Equity  1.18 AFROCENTRIC IN…  2.83e8 Health… Health… JALSHA…
## 4 2017-01-05 ACT SJ Equity  1.18 AFROCENTRIC IN…  2.83e8 Health… Health… JALSHA…
## 5 2017-01-06 ACT SJ Equity  1.17 AFROCENTRIC IN…  2.83e8 Health… Health… JALSHA…
## 6 2017-01-09 ACT SJ Equity  1.24 AFROCENTRIC IN…  2.83e8 Health… Health… JALSHA…
## # … with abbreviated variable names ¹​Market.Cap, ²​BICS_LEVEL_1_SECTOR_NAME,
## #   ³​BICS_LEVEL_2_INDUSTRY_GROUP_NAME, ⁴​Universe
  • So, we have the market cap, sector and subsector for each country (NKY, SP500 and JALSH).

  • Let’s calculate the value of $1 invested in each of DAIKIN, SUZUKI and KAWASAKI since 2017:

  • Step 1: Find Ticker names for each stock:

# Below is a quick way of pulling and looking at the unique Short.Names to find the stocks' Short.Names:

Daily_TRI %>% pull(Short.Name ) %>% unique()

Notice that pull extracts a column as a vector. This is at times extremely useful, as the vector can then, e.g., be used as an input to another function.
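E.g., a pulled vector can feed straight into a later filter - a sketch keeping only the Japanese universe's stocks:

# Sketch: use a pulled vector as input to a subsequent filter
NKY_Names <- Daily_TRI %>% filter(Universe == "NKY") %>% pull(Short.Name) %>% unique()
Daily_TRI %>% filter(Short.Name %in% NKY_Names)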

  • So after finding the Short.Names, let’s filter our dataset to only these stocks, and compare their Dollar returns since 2017:
Daily_TRI_Three <- 
  
Daily_TRI %>% 
  
  filter(grepl("DAIKIN|KAWASAKI|SUZUKI", Short.Name)) %>% 
  
  group_by(Tickers) %>% mutate(Returns = TRI/lag(TRI) - 1) %>% ungroup()
  • Now how do we calculate the cumulative returns of each stock? Remember - if there is an NA, it will propagate through the entire cumulative chain…

Note below the use of coalesce: coalesce(Column, 0) - this can be interpreted as saying that NAs in Column should be replaced by 0.
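A tiny illustration of coalesce on a toy vector:

# coalesce replaces each NA with the fallback value:
coalesce(c(1.5, NA, 2.1, NA), 0)
## [1] 1.5 0.0 2.1 0.0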

  • So, back to our comparison:
Daily_TRI_Three %>% 
  
  mutate(Returns = coalesce(Returns, 0)) %>%  # Replace NAs in the Returns column with 0...
  
  group_by(Tickers) %>%
  
  mutate(Cum_Return = cumprod( 1 + Returns)) 
## # A tibble: 849 × 10
## # Groups:   Tickers [3]
##    date       Tickers       TRI Short…¹ Marke…² BICS_…³ BICS_…⁴ Unive…⁵  Returns
##    <date>     <chr>       <dbl> <chr>     <dbl> <chr>   <chr>   <chr>      <dbl>
##  1 2017-01-02 6367 JP Eq…  108. DAIKIN… 2.91e10 Indust… Electr… NKY      0      
##  2 2017-01-03 6367 JP Eq…  108. DAIKIN… 2.91e10 Indust… Electr… NKY      0      
##  3 2017-01-04 6367 JP Eq…  111. DAIKIN… 2.91e10 Indust… Electr… NKY      0.0326 
##  4 2017-01-05 6367 JP Eq…  112. DAIKIN… 2.91e10 Indust… Electr… NKY      0.00886
##  5 2017-01-06 6367 JP Eq…  111. DAIKIN… 2.91e10 Indust… Electr… NKY     -0.00856
##  6 2017-01-09 6367 JP Eq…  111. DAIKIN… 2.91e10 Indust… Electr… NKY      0      
##  7 2017-01-10 6367 JP Eq…  113. DAIKIN… 2.91e10 Indust… Electr… NKY      0.0172 
##  8 2017-01-11 6367 JP Eq…  113. DAIKIN… 2.91e10 Indust… Electr… NKY     -0.00408
##  9 2017-01-12 6367 JP Eq…  115. DAIKIN… 2.91e10 Indust… Electr… NKY      0.0195 
## 10 2017-01-13 6367 JP Eq…  115. DAIKIN… 2.91e10 Indust… Electr… NKY      0.00118
## # … with 839 more rows, 1 more variable: Cum_Return <dbl>, and abbreviated
## #   variable names ¹​Short.Name, ²​Market.Cap, ³​BICS_LEVEL_1_SECTOR_NAME,
## #   ⁴​BICS_LEVEL_2_INDUSTRY_GROUP_NAME, ⁵​Universe

Note how we can also include tables in our HTML output:

  Daily_TRI_Three %>% 
  
  arrange(date) %>% 
  
  group_by(Tickers) %>% 
  mutate(Return = TRI / lag(TRI) - 1) %>% 
  
  mutate( Return = coalesce(Return, 0 )) %>% 
  
  mutate( CumProd = cumprod(1+Return)  ) %>% 
  
  filter(date == last(date)) %>% knitr::kable()
date        Tickers         TRI        Short.Name      Market.Cap   BICS_LEVEL_1_SECTOR_NAME  BICS_LEVEL_2_INDUSTRY_GROUP_NAME  Universe    Returns     Return   CumProd
2018-01-31  6367 JP Equity  141.68359  DAIKIN INDS     35098091520  Industrials               Electrical Equipment              NKY      -0.0194480 -0.0194480  1.316599
2018-01-31  7269 JP Equity   64.45889  SUZUKI MOTOR    27963246592  Consumer Discretionary    Automotive                        NKY      -0.0115126 -0.0115126  1.632005
2018-01-31  9107 JP Equity   35.81591  KAWASAKI KISEN   2485033728  Industrials               Transportation & Logistics        NKY      -0.0200886 -0.0200886  1.164372

Practice Section:

Question 1

Use the Crypto currency database, fmxdat::cryptos, and do the following:

  1. Calculate performance consistency (N-period rolling):
  • Calculate the monthly return ranking of each crypto for the last N years.

  • Then aggregate these rankings into a single consistency ranking across all the rolling periods.

Note: Create an appropriate folder, with a descriptive README, and store the functions that do the wrangling.

Question 2

  1. Index Return Calculation

Using the following data file: fmxdat::Fin_Data_SA_US_NKY

Please note this is a small subset of the universes and so NOT indicative of actual returns!

  • Create an organized folder called: Index_Return_Comparison

  • Have a proper README that details every step of your analysis (you will send this to me on github when completed).

  • Then do the following:

    • Calculate the cap-weighted monthly index returns for every sector for each country (assume Market Cap applies to first day of the month).

      • Careful: some stocks might have different last trading days of the month - calculate monthly returns using each stock’s own last day of the month.

      • Some stocks may have date gaps (i.e. be in the index from 2005 - 2009, drop out, and re-enter in 2013). This will cause return distortions if Px/lag(Px) - 1 is naively used.

      • This means you should set returns to NA if the date gap is larger than 35 days.

    • Compare the Sharpe ratios for the equally- and cap-weighted broad market indexes by country since 2017.

    • Calculate the JALSH index returns by filtering the top 95% of Market Cap (tip: use arrange and cumsum of weights… careful with your group_bys…).