An Analysis of Metrics in Predicting Economic Performance based on the Modern Portfolio Theory

Note: This work is adapted from a project done by Daniel Kim, Josiah Wedgwood, and Michael Thompson for MATH420 for Spring 2020

Introduction

Every day, we face decisions that make us consider the risk of obtaining a reward and whether we've seen that decision before and how that might help us now and in the future. This applies not only to individuals, but all the way up to big corporations, governments, and international organizations. For this tutorial, I consider the idea of the risk-reward trade-off as well as the idea that the past can affect decisions made now and in the future in the context of the stock market, with a specific grouping of inherently risky assets.

In Section 1, I first demonstrate the data collection and preparation by detailing which risky assets I will consider for this tutorial as well as define certain fundamental metrics to assist in my later analysis. Using the values of these terms for the assets, I perform some preliminary analysis to identify "key years" and time periods that may be of interest in my later analysis and do some data fitting to demonstrate why these fundamental metrics are not sufficient in predicting future economic performance.

In Section 2, I define the metrics I will use to attempt in predicting future economic performance and analyze the results of these metrics on the select group of risky assets.

In Section 3, I apply regression using the fundamental metrics to determine if these metrics can serve as a reliable indicator of future economic performance

In the Conclusion, I summarize the results from above

Section 1: Data Collection and Preparation

Section 1.1: The data

The assets I selected are

  • VFINX: the Vanguard 500 Index Fund
  • VBMFX: the Vanguard Total Bond Market Index Fund
  • VGSLX: the Vanguard Real Estate Index Fund (VGSLX)
  • VGTSX: the Vanguard Total International Stock Index Fund
  • AAPL : Apple
  • MS : Morgan Stanley
  • XOM : Exxon-Mobil

I chose a mix of index funds and individual assets to represent a typical investor's portfolio, as index funds represent a certain sector of the market (in essence, you would be "investing in the market itself") and the individual assets represent common lucrative investments. I considered adjusted closing share prices from 2005-2019 compiled from Yahoo Finance in this tutorial. By considering adjusted closing share prices, I automatically account for any corporate actions such as dividends and stock splits (for more information seek this link. The time period was chosen arbitrarily.

The following code creates a pandas dataframe containing adjusted closing prices for each of the seven assets above for the chosen time period

In [12]:
# if not installed, run the following:
# !pip install yfinance
import yfinance as yf

ASSETS = ["VFINX", "VBMFX", "VGSLX", "VGTSX", "AAPL", "MS", "XOM"]

l = []

# download the data for each asset
for asset in ASSETS:
    temp = yf.download(asset, start="2005-01-01", end="2020-01-01")
    temp = temp.dropna() # remove any rows with Nan values
    temp = temp[['Adj Close']] # take only the adjusted closing prices
    temp.columns = [asset] # rename the column as the name of the asset
    l.append(temp)
    
# build the adjust closing prices dataframe
adj_closing = l[0]

for i in range(1, len(l)):
    adj_closing = adj_closing.join(l[i], how='inner')
    
print(adj_closing)
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
                 VFINX      VBMFX       VGSLX      VGTSX       AAPL  \
Date                                                                  
2005-01-03   81.628288   5.993173   43.053074   8.289939   0.974949   
2005-01-04   80.684631   5.975669   42.514774   8.204066   0.984962   
2005-01-05   80.397194   5.975669   40.948814   8.131405   0.993589   
2005-01-06   80.699440   5.981504   41.236992   8.091776   0.994359   
2005-01-07   80.581459   5.975669   41.231556   8.071958   1.066760   
...                ...        ...         ...        ...        ...   
2019-12-24  293.603973  10.807795  126.259598  17.530840  70.459007   
2019-12-26  295.134216  10.817575  126.923157  17.599895  71.856941   
2019-12-27  295.144043  10.827355  127.342743  17.668953  71.829674   
2019-12-30  293.475647  10.827355  127.430565  17.580164  72.255997   
2019-12-31  294.354279  10.831276  128.396622  17.619627  72.783936   

                   MS        XOM  
Date                              
2005-01-03  34.644173  30.436514  
2005-01-04  34.272320  30.229927  
2005-01-05  34.074001  30.071934  
2005-01-06  34.879677  30.454744  
2005-01-07  34.743336  30.254225  
...               ...        ...  
2019-12-24  49.210163  64.841568  
2019-12-26  49.617897  64.943436  
2019-12-27  49.598480  64.721184  
2019-12-30  49.472275  64.341515  
2019-12-31  49.627602  64.619324  

[3775 rows x 7 columns]

Section 1.2: Fundamental Metrics

In order to gauge market performance per asset, the adjusted closing prices need to interpreted in terms of return on investments, rather than a raw price difference. For example, consider assets $A$ and $B$, valued at \$1 and \\$1000 total respectively, and at the end of the year they were valued at \$1.50 and \\$1010 total respectively. Interpretting just the adjusted closing prices, asset $A$ gained just \$0.50 whereas asset $B$ gained \\$10. However, the return on investment for $A$ was 50% yet for $B$ it was just 1%.

To further demonstrate this point, the following code plots the adjusted closing prices over time per asset

In [13]:
import matplotlib.pyplot as plt

# plot the adjusted closing prices per asset over time
adj_closing.plot(kind='line', figsize=(10,6))
plt.title("Adjusted Closing Prices over Time")
plt.xlabel("Year")
plt.ylabel("adjusted closing price")
plt.show()

From the above graph, it is not immediately obvious which assets performed better economically. With this in mind, I introduce the following metrics:

Let $s_i(d)$ be the share price of asset $i$ at the close of the $d^\textrm{th}$ trading day of a period that has $D$ trading days. Because the share price of any given asset has little economic significance as shown above, the price ratio over the course of a given day $d$ is given by

$$\frac{s_i(d)}{s_i(d-1)}$$

I further modify this price ratio to obtain the so-called daily return which is defined as

$$r_i(d) \equiv \frac{s_i(d)}{s_i(d-1)} - 1$$

For the given time period, I define the daily return to be 0 for all assets on the first trading day of the given time period.

Using the daily returns per asset per year, I can calculate the return mean per asset per year, which measures the trend of the share price over the course of that year:

$$m_i = \frac{1}{D}\sum_{d=1}^{D}r_i(d)$$

The following code creates two dataframes containing the daily return and return mean of each asset per year using the adjusted closing prices parsed from above. Additionally, I plot the return means to get a better understanding of which assets performed better economically.

In [14]:
# build the daily returns dataframe
daily_return = adj_closing.copy()

for row in range(1, len(daily_return)):
    daily_return.iloc[row] = adj_closing.iloc[row] / adj_closing.iloc[row-1]
    daily_return.iloc[row] -= 1
    
daily_return.iloc[0] = 0 # by definition
    
print(daily_return)
               VFINX     VBMFX     VGSLX     VGTSX      AAPL        MS  \
Date                                                                     
2005-01-03  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000   
2005-01-04 -0.011560 -0.002921 -0.012503 -0.010359  0.010270 -0.010733   
2005-01-05 -0.003562  0.000000 -0.036833 -0.008857  0.008759 -0.005787   
2005-01-06  0.003759  0.000976  0.007038 -0.004874  0.000775  0.023645   
2005-01-07 -0.001462 -0.000975 -0.000132 -0.002449  0.072811 -0.003909   
...              ...       ...       ...       ...       ...       ...   
2019-12-24 -0.000101  0.000906  0.001626  0.000000  0.000951 -0.000591   
2019-12-26  0.005212  0.000905  0.005256  0.003939  0.019840  0.008286   
2019-12-27  0.000033  0.000904  0.003306  0.003924 -0.000379 -0.000391   
2019-12-30 -0.005653  0.000000  0.000690 -0.005025  0.005935 -0.002545   
2019-12-31  0.002994  0.000362  0.007581  0.002245  0.007307  0.003140   

                 XOM  
Date                  
2005-01-03  0.000000  
2005-01-04 -0.006787  
2005-01-05 -0.005226  
2005-01-06  0.012730  
2005-01-07 -0.006584  
...              ...  
2019-12-24 -0.003841  
2019-12-26  0.001571  
2019-12-27 -0.003422  
2019-12-30 -0.005866  
2019-12-31  0.004318  

[3775 rows x 7 columns]
In [15]:
import pandas as pd

YEARS = [i for i in range(2005, 2020)]

# build the return means dataframe
temp = daily_return.copy()
temp = temp.reset_index()
temp['Date'] = pd.to_datetime(temp['Date'])

# average by year
return_mean = temp.groupby(temp['Date'].dt.year).transform('mean')
return_mean = return_mean.drop_duplicates().drop(columns=['Date'])
return_mean['Year'] = YEARS
return_mean = return_mean.set_index(['Year'])

print(return_mean)
            VFINX        VBMFX        VGSLX        VGTSX         AAPL  \
Year                                                                    
2005  0.000238319  9.73445e-05  0.000485789  0.000611337   0.00355838   
2006  0.000599128  0.000170276   0.00118684  0.000981405  0.000949497   
2007  0.000259956  0.000271044  -0.00064783  0.000643396   0.00366398   
2008  -0.00149448  0.000204022 -0.000809288  -0.00191891  -0.00264584   
2009   0.00108036  0.000233599   0.00185905   0.00141025   0.00381879   
2010  0.000616142  0.000246222   0.00115063  0.000507284   0.00183181   
2011  0.000184494  0.000294101  0.000507075  -0.00048604    0.0010398   
2012  0.000619835  0.000161052  0.000688646  0.000718009   0.00129869   
2013    0.0011317 -8.79042e-05  0.000142592  0.000585688   0.00047173   
2014  0.000528962  0.000224118   0.00107754  -0.00014865    0.0014465   
2015  9.60714e-05  1.49288e-05  0.000151577 -0.000131778  1.99162e-05   
2016   0.00047767   0.00010063  0.000380222   0.00023556  0.000574664   
2017   0.00079027  0.000137661  0.000211385  0.000975027   0.00163656   
2018 -0.000127602 -3.16139e-06  -0.00019414  -0.00058582 -5.72621e-05   
2019   0.00111292  0.000330832   0.00103735   0.00079264    0.0026646   

               MS          XOM  
Year                            
2005  0.000227675  0.000639695  
2006   0.00158682   0.00138792  
2007 -0.000637033  0.000979271  
2008  -0.00125814 -3.48981e-05  
2009   0.00377683  -0.00040058  
2010 -8.40429e-05  0.000446822  
2011   -0.0015929  0.000806785  
2012   0.00129923  0.000227342  
2013   0.00215492  0.000761229  
2014  0.000980883 -0.000193789  
2015 -0.000594363  -0.00044303  
2016   0.00143401  0.000791899  
2017   0.00102624 -0.000129908  
2018 -0.000880669 -0.000556578  
2019   0.00123031  0.000343658  
/opt/conda/lib/python3.8/site-packages/pandas/core/groupby/generic.py:1513: FutureWarning: DataFrame.mean and DataFrame.median with numeric_only=None will include datetime64 and datetime64tz columns in a future version.
  fast_path = lambda group: getattr(group, func)(*args, **kwargs)
In [16]:
# plot the return means per asset over time
return_mean.plot(kind='line', figsize=(10,6))
plt.title("Return Means over Time")
plt.xlabel("Year")
plt.ylabel("return mean")
plt.show()

With this above plot, we get a much better sense of which assets performed better economically. A concrete example would be the difference between VFINX and AAPL, the Vanguard 500 Index Fund and Apple respectively. From the Adjusted Closing Price graph, VFINX seemed to be doing incredible well, as its share price soared in later years while AAPL didn't appear so. However, the above graph of Return Means reveals that in fact AAPL was doing much better in terms of return on investment overall compared to VFINX. Essentially, if you were to put an equal amount of investment in both assets, AAPL would yield a better return than VFINX.

Section 1.3: Preliminary Analysis

The Return Mean graph highlights interesting trends that were hidden by not completely undetectable in the Adjusted Closing Price graph, which allows us to pick out "key years" and time periods to more closely examine with my metrics in Section 2.

The first notable event that is shown in the graph is that there is a major drop in return means for all assets in 2008 (link). This reflects the U.S. Housing Market Crash at that time which severely affected the U.S. economy and was dubbed "The Great Recession".

Another notable event is directly after the Housing Market Crash, in 2009. From the graph, almost all of the return means recover (that is, become non-negative), except for Exxon-Mobil. Even though the stock market began to recover after the Housing Market Crash in 2008, the price of crude oil was still dropping at the time, which explains why Exxon-Mobil's return mean fell in 2009.

The next notable event was in 2011, where Morgan Stanley's return mean dropped sharply as the Security Exchange Commission (SEC) of the United States launched an investigation into the company for Improper Fee Arrangment (see this link for details). Initially, because many of the other assets in 2011 also experienced a drop in return means, we believed that Morgan Stanley's return mean was also dropping because of some common reason, which is true to a certain extent. This was during the time of the Sovereign Debt Crisis, which directly affected Morgan Stanley as a financial institution and indirectly affected our other assets. In other words, 2011 was a very difficult year for Morgan Stanley because of two separate reasons, and the return means for the company in that year reflects this.

Similar to 2008, there is a small dip in return means for all of our assets in 2015. This trend is explained by China's unstable stock market at the time (link), which was caused by investors making highly leveraged investments in China based on overly-optimistic predictions. This is eerily similar to the US Housing Market Crash in 2008, which was caused by similar highly leveraged investments. This trend is seen again in 2018, where the return means for all assets dropped, but this time it is explained by the start of the US-China trade war.

From the graph, Apple's return means seem to fluctuate wildly year-to-year, and sometimes follows the trends we've identified above and other times don't. This can be explained due to the launch of a new iPod in 2005, the introduction of the iPhone in 2007, and the revitalization of the iMac in 2009 (source).

Based on this preliminary analysis based purely on trends seen in the return means, we identified 2008, 2011, 2015, and 2018 to be years of note for the efficiency and proximity metrics. To further affirm the selection of these years, I used return variances to see if the explanations for the return means are repeated.

I define the return variance for asset $i$ to be

$$v_i = \frac{1}{D}\sum_{d=1}^{D}(r_i(d) - m_i)^2$$

where $D$ is the number of trading days in a given year, $r_i(d)$ is the daily return of asset $i$ on day $d$ of that year, and $m_i$ is the return mean of asset $i$ of that year from the previous section.

The following code creates a dataframe containing the return variance of each asset per year using the daily return and return mean dataframes above. Additionally, I plot the return variances to affirm the trends discussed above.

In [17]:
# build the return variance dataframe
temp = daily_return.copy()
temp = temp.reset_index()
temp['Date'] = pd.to_datetime(temp['Date'])

for row in range(len(temp)):
    year = int(temp.iloc[row]['Date'].year)
    temp.iloc[row,1:] = (temp.iloc[row,1:]-return_mean.loc[year])**2
    
# average by year
return_variance = temp.groupby(temp['Date'].dt.year).transform('mean')
return_variance = return_variance.drop_duplicates().drop(columns=['Date'])
return_variance['Year'] = YEARS
return_variance = return_variance.set_index(['Year'])

print(return_variance)
            VFINX        VBMFX        VGSLX        VGTSX         AAPL  \
Year                                                                    
2005  4.16158e-05  4.24186e-06  9.64084e-05   4.2461e-05  0.000595144   
2006  3.98834e-05  4.45727e-06  8.35149e-05  8.01144e-05  0.000586437   
2007  0.000101022  6.16865e-06  0.000258721  0.000137227  0.000562164   
2008   0.00066283  1.55624e-05   0.00220054  0.000750947   0.00133905   
2009  0.000293236  8.41315e-06   0.00178964  0.000336244  0.000454815   
2010  0.000128899  6.48708e-06  0.000310599  0.000177361  0.000283008   
2011  0.000214277  7.37523e-06  0.000359912  0.000268202  0.000272468   
2012  6.42568e-05  3.06934e-06  7.39406e-05  0.000101474  0.000343329   
2013  4.84226e-05  4.54375e-06   9.5874e-05  5.89287e-05  0.000322231   
2014  5.11867e-05  3.14745e-06  5.22538e-05  4.60883e-05   0.00018539   
2015  9.48514e-05  6.35477e-06  0.000117145  9.17535e-05  0.000282558   
2016   6.7788e-05  5.00811e-06  0.000114825  0.000110529  0.000215284   
2017  1.76214e-05  3.45482e-06  4.05479e-05  2.02884e-05   0.00012251   
2018  0.000115084  3.30998e-06  0.000103234  7.25374e-05  0.000326516   
2019  6.14483e-05  4.96946e-06  5.88541e-05  4.28754e-05   0.00027004   

               MS          XOM  
Year                            
2005  0.000178013  0.000211006  
2006  0.000160217  0.000145679  
2007  0.000530021  0.000224403  
2008   0.00762734   0.00105012  
2009   0.00262373  0.000267719  
2010  0.000440874  0.000127965  
2011    0.0014021   0.00025417  
2012  0.000602739  8.70691e-05  
2013  0.000315894  6.72202e-05  
2014  0.000185172  0.000107737  
2015  0.000264206  0.000200201  
2016  0.000419688  0.000144547  
2017  0.000171748   4.9407e-05  
2018  0.000294661  0.000189152  
2019  0.000215337  0.000133057  
/opt/conda/lib/python3.8/site-packages/pandas/core/groupby/generic.py:1513: FutureWarning: DataFrame.mean and DataFrame.median with numeric_only=None will include datetime64 and datetime64tz columns in a future version.
  fast_path = lambda group: getattr(group, func)(*args, **kwargs)
In [18]:
# plot the return variance per asset over time
return_variance.plot(kind='line', figsize=(10,6))
plt.title("Return Variance over Time")
plt.xlabel("Year")
plt.ylabel("return variance")
plt.show()

With this above plot, we can confirm that 2008 is a key year as there is major return variance in that year for most assets, and there seems to be some variance in 2009 as well. We can also confirm that 2011 is a key year, esepcially for Morgan Stanley.

Based on this preliminary analysis based purely on trends seen in the return means and return variances, I decided to partition the years 2005-2019 into the following:

  • 2005-2008, as 2008 was the year of the Housing Market Crash
  • 2008-2011, as 2011 was the year of the Sovereign Debt Crisis
  • 2011-2015, as 2015 was the year of the Chinese Stock Market Crash
  • 2015-2019, as 2018 was the beginning of the US-China trade war

Section 2: Analysis

Section 2.1: Modern Portfolio Theory

In 1952, Harry Markowitz published this paper detailing what he dubbed the Modern Portfolio Theory. This paper marked a turning point in asset management as before then convential wisdom was to focus on individual assets rather than consider them as a whole portfolio.

Portfolio theories in general strive to maximize reward for a given risk, or on a related note minimize risk for a given reward. To quantify this, the theories quantify the notions of reward and risk and identify a class of ideal portfolios that meet this criteria. Markowitz in his paper chose to use the return mean as the proxy for reward of a portfolio and the volatility (the square root of the variance) as the proxy for its risk.

What was revoluntionary in Markowitz's theory is that he found that single asset portfolio is efficient (more on this later) and that by diversifying the assets in a given portfolio, the risk can be lowered while maintaining the same reward. A portfolio is defined to be more efficient than another if it promises a greater reward for the same risk. In terms of Markowitz's theory, we can quantify this as a portfolio with a greater return mean for no greater volatility.

While the exact calculation of this so-called efficient portfolio frontier, that is, the exact relation in the reward-risk plane that identifies the most efficient portfolio for a given reward or risk, is beyond the scope of this tutorial (for the mathematically-inclined reader see the above linked paper), I can use this idea of an idealized most-efficient portfolio to compare each asset's relative efficiency, and thus its relative performance.

Section 2.2: Efficiency and Proximity Metrics

I define efficiency $e_i$ and proximity $p_i$ of asset $i$ for a given year to be

$$e_i = \frac{m_i - m_{if}(\sigma_i)}{m_{ef}(\sigma_i) - m{if}(\sigma_i)}$$

$$p_i= \frac{\sigma_{ef}(m_i)}{\sigma_i}$$

where

  • ($m_i, \sigma_i$) is the return mean and volatility for asset $i$
  • $m_{if}(\sigma_i)$ is the most inefficient return mean for the given volatility of asset $i$
  • $m_{ef}(\sigma_i)$ is the most efficient return mean for the given volatility of asset $i$
  • $\sigma_{ef}(m_i)$ is the most efficient return volatility for the given return mean of asset $i$

The efficiency of an asset can be described as the proportion to which that asset's return mean lies between the return means of the most efficient and most inefficient return means for that asset's volatility. The proximity of an asset can likewise be described as the proportion between its volatility and the most efficient volatility for that asset's return mean.

The following code creates two maps containing the efficiency and proximity values of each asset per year using the return mean dataframe calculated from Section 1.2. To do this, I also calculate the covariance matrices between all assets for all years to model the volatility of a portfolio containing all of the assets. I then plot these two metrics over time for all assets.

In [19]:
import numpy as np

def calc_m_ef(vol, m_mv, v_as, vol_mv):
    '''
    vol - a 1xn array representing the volatilties of all the assets for a
    given year
    
    m_mv - a decimal value representing the return mean associated with the
    minimum volatility
    
    v_as - a decimal value representing the asympotote of the efficient
    frontier
    
    vol_mv - a decimal value representing the minimum volatility
    
    returns a 1xn array representing the return mean associated with the most 
    efficient portfolio
    '''
    vol = vol.astype(np.float64)
    return m_mv + v_as * np.sqrt(vol**2 - vol_mv**2)

def calc_m_if(vol, m_mv, v_as, vol_mv):
    '''
    vol - a 1xn array representing the volatilties of all the assets for a
    given year
    
    m_mv - a decimal value representing the return mean associated with the
    minimum volatility
    
    v_as - a decimal value representing the asympotote of the efficient
    frontier
    
    vol_mv - a decimal value representing the minimum volatility
    
    returns a 1xn array representing the return mean associated with the most 
    inefficient portfolio
    '''
    vol = vol.astype(np.float64)
    return m_mv - v_as * np.sqrt(vol**2 - vol_mv**2)

def calc_efficiency(m, V):
    '''
    m - a 1xn array representing the return means for all the assets for a
    given year
    
    V - a nxn matrix representing the covariance for all the assets for a given
    year
    
    returns a 1xn array representing the efficiency values for all assets for a
    given year
    '''
    y = np.linalg.solve(V.astype(np.float64), np.ones(len(V)))
    z = np.linalg.solve(V.astype(np.float64), m.astype(np.float64))
    a = np.sum(y)
    b = np.sum(z)
    c = np.dot(m, z)
    
    m_mv = b / a
    vol_mv = 1 / np.sqrt(a)
    v_as = np.sqrt((a*c-b**2)/a)
    
    vols = np.diagonal(V)
    
    m_ef = calc_m_ef(vols, m_mv, v_as, vol_mv)
    m_if = calc_m_if(vols, m_mv, v_as, vol_mv)
    
    return (m - m_if)/(m_ef - m_if)

def calc_proximity(V):
    '''
    V - a nxn matrix representing the covariance for all the assets for a given
    year
    
    returns a 1xn array representing the proximity values for all assets for a
    given year
    '''
    y = np.linalg.solve(V.astype(np.float64), np.ones(len(V)))
    a = np.sum(y)
    
    vol_mv = 1 / np.sqrt(a)
    
    vols = np.diagonal(V)
    
    return vol_mv/vols

# build the covariance map
covariance = dict.fromkeys(YEARS)

for year in YEARS:
    covariance[year] = np.zeros((len(ASSETS), len(ASSETS)))

temp = daily_return.copy()
temp = temp.reset_index()

for row in range(len(daily_return)):
    year = int(temp.iloc[row]['Date'].year)
    v = daily_return.iloc[row] - return_mean.loc[year]
    covariance[year] = np.add(covariance[year], np.outer(v, v))

# build the efficiency and proximity metric maps
efficiency = dict.fromkeys(YEARS)
proximity = dict.fromkeys(YEARS)

for year in YEARS:
    efficiency[year] = calc_efficiency(return_mean.loc[year], covariance[year])
    proximity[year] = calc_proximity(covariance[year])
<ipython-input-19-14632e459126>:20: RuntimeWarning: invalid value encountered in sqrt
  return m_mv + v_as * np.sqrt(vol**2 - vol_mv**2)
<ipython-input-19-14632e459126>:39: RuntimeWarning: invalid value encountered in sqrt
  return m_mv - v_as * np.sqrt(vol**2 - vol_mv**2)
In [20]:
# plot the metrics
plt.figure(figsize=(10,6))
plt.plot(list(efficiency.keys()), list(efficiency.values()))
plt.title("Efficiency of all assets")
plt.xlabel("year")
plt.ylabel("efficiency")
plt.legend(ASSETS)
plt.show()

plt.figure(figsize=(10,6))
plt.plot(list(proximity.keys()), list(proximity.values()))
plt.title("Proximity of all assets")
plt.xlabel("year")
plt.ylabel("proximity")
plt.legend(ASSETS)
plt.show()

From the above figures, the real estate index VGSLX efficiency plummets in 2006 to 2007. The efficiency for all of the other non-bond index funds also decreases from 2006 to 2007, but not as significantly. Also, we see every index fund's proximity fall farther away from the frontier from 2006 to 2007, and then all indices but VGSLX approached the frontier again in 2008. VGSLX fell even farther from the frontier in 2008.

The United States faced a housing bubble prior to 2007, just following the dotcom bubble burst by 2000. Speculators and investors from the technology sector moved onto real estate as a safer investment. The U.S. government at the same time implemented policies to combat the uncertainty in the market following the dotcom burst and even the September 11th Attacks in 2001 that cut interest rates and maintained those low rates overtime. The housing market grew saturated with money and credit such that home prices rose despite even more people trading houses.

As home prices continued to climb during the housing bubble, the prolonged, low interest rates on loans gained even more attraction from mortgage lenders. This led to banks rapidly collecting loans and then bundling them together into mortgage-backed securities and collateralized debt obligations to then resell to borrowers. These securities however could consist of unfulfilled loans from other borrowers, and these securities appealed to subprime borrowers because these securities had even lower interest rates than traditional loans. When the US house price dropped in 2006 it triggered widespread default on those who borrowed subprime mortgages through securities.

Given the market drop in proximity coincides with the housing bubble burst in 2007 and then the overall market collapse in 2008, we observe firsthand how the proximity of index funds is a rough leading indicator of market health within that index. The same cannot be said for the efficiency of an index fund, as it appears to be dependently fluctuate on current market conditions. Thus we see that efficiency is a roughly contemporary indicator.

Section 3: Machine Learning

Section 3.1: Exponential Regression on the Adjusted Closing Prices

From Section 1.2, there seems to be some form of exponential relation between an given asset's adjusted closing price and time. I test this idea with the following code:

In [21]:
from sklearn.metrics import r2_score

# plot adjusted closing data
adj_closing.plot(kind='line', figsize=(10,6))
plt.title("Adjusted Closing Prices over Time")
plt.xlabel("Year")
plt.ylabel("adjusted closing price")

# perform exponential regression
x = np.array([i+1 for i in range(len(adj_closing))])
for asset in ASSETS:
    coeff = np.polyfit(x, np.log(adj_closing[asset]), 1)
    y_hat = np.exp(coeff[1])*np.exp(coeff[0]*x)
    plt.plot(adj_closing.index, y_hat)
    print("R^2 for " + asset + ":", r2_score(adj_closing[asset], y_hat))
    
plt.show()
R^2 for VFINX: 0.9020955784820038
R^2 for VBMFX: 0.9316762833988496
R^2 for VGSLX: 0.8328275495478811
R^2 for VGTSX: 0.6053877549616555
R^2 for AAPL: 0.9052039130167352
R^2 for MS: -0.023345523797600576
R^2 for XOM: 0.6679475455065298

While exponential regression fits some of the asset's adjusted closing prices data fairly well, for other assets it does quite poorly. Therefore, we cannot concluded that adjusted closing prices will be an accurate model for predicting the future economic performance.

Conclusion

As it is clear from this tutorial, there is no accurate or fail-proof way to determine future economic performance of a given asset even with all of the asset's history. If this was the case, then quite literally anyone could simply perform some form of analysis on past market data in order to predict the outcome of future asset values.

However, something quite valuable about the efficiency and especially the proximity metrics that I defined above is that while it doesn't predict day-to-day performance of assets, what it can tell you is general market trends, especially before economic disasters such as the 2008 US Housing crisis and the 2015 China Stock Market Exchange collapse.

While this tutorial focused on a handful of selected assets, these results can easily be reproduced with any other assets that are publicly traded given that the price history for that asset is available. This tutorial is just a glimpse into the complicated world that is stock market trading, and there are thousands of other studies that attempt to find some way to gain an edge in the stock market.