Note: This work is adapted from a project done by Daniel Kim, Josiah Wedgwood, and Michael Thompson for MATH420 in Spring 2020.
Every day, we face decisions that weigh the risk of an action against its potential reward, and we draw on similar decisions we have faced before to guide our choice now and in the future. This applies not only to individuals but also to large corporations, governments, and international organizations. In this tutorial, I consider the risk-reward trade-off, as well as the idea that the past can inform present and future decisions, in the context of the stock market, using a specific group of inherently risky assets.
In Section 1, I demonstrate the data collection and preparation by detailing which risky assets I consider for this tutorial, as well as defining certain fundamental metrics to assist in my later analysis. Using the values of these metrics for the assets, I perform some preliminary analysis to identify "key years" and time periods of interest, and do some data fitting to demonstrate why these fundamental metrics are not sufficient for predicting future economic performance.
In Section 2, I define the metrics I will use to attempt to predict future economic performance and analyze the results of these metrics on the selected group of risky assets.
In Section 3, I apply regression using the fundamental metrics to determine whether these metrics can serve as a reliable indicator of future economic performance.
In the Conclusion, I summarize the results from above.
The assets I selected are:

VFINX: the Vanguard 500 Index Fund
VBMFX: the Vanguard Total Bond Market Index Fund
VGSLX: the Vanguard Real Estate Index Fund
VGTSX: the Vanguard Total International Stock Index Fund
AAPL: Apple
MS: Morgan Stanley
XOM: Exxon-Mobil

I chose a mix of index funds and individual assets to represent a typical investor's portfolio, as the index funds each represent a certain sector of the market (in essence, investing in the market itself) and the individual assets represent common lucrative investments. In this tutorial I consider adjusted closing share prices from 2005-2019, compiled from Yahoo Finance. By considering adjusted closing share prices, I automatically account for any corporate actions such as dividends and stock splits (for more information, see this link). The time period was chosen arbitrarily.
The following code creates a pandas dataframe containing adjusted closing prices for each of the seven assets above for the chosen time period:
# if not installed, run the following:
# !pip install yfinance
import yfinance as yf

ASSETS = ["VFINX", "VBMFX", "VGSLX", "VGTSX", "AAPL", "MS", "XOM"]
l = []
# download the data for each asset
for asset in ASSETS:
    temp = yf.download(asset, start="2005-01-01", end="2020-01-01")
    temp = temp.dropna()        # remove any rows with NaN values
    temp = temp[['Adj Close']]  # take only the adjusted closing prices
    temp.columns = [asset]      # rename the column as the name of the asset
    l.append(temp)

# build the adjusted closing prices dataframe
adj_closing = l[0]
for i in range(1, len(l)):
    adj_closing = adj_closing.join(l[i], how='inner')
print(adj_closing)
In order to gauge market performance per asset, the adjusted closing prices need to be interpreted in terms of return on investment rather than as a raw price difference. For example, consider assets $A$ and $B$, valued at \$1 and \$1000 respectively, and suppose that at the end of the year they were valued at \$1.50 and \$1010 respectively. Interpreting just the adjusted closing prices, asset $A$ gained just \$0.50 whereas asset $B$ gained \$10. However, the return on investment for $A$ was 50% while for $B$ it was just 1%.
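The arithmetic in this example can be checked in a couple of lines (the prices here are the hypothetical values from the example, not market data):

```python
# hypothetical start and end-of-year prices from the example above
start = {"A": 1.00, "B": 1000.00}
end = {"A": 1.50, "B": 1010.00}

for name in start:
    raw_gain = end[name] - start[name]  # raw price difference
    roi = end[name] / start[name] - 1   # return on investment
    print(f"{name}: gained ${raw_gain:.2f}, ROI = {roi:.0%}")
```

This prints "A: gained $0.50, ROI = 50%" and "B: gained $10.00, ROI = 1%", matching the text.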
To further demonstrate this point, the following code plots the adjusted closing prices over time per asset:
import matplotlib.pyplot as plt
# plot the adjusted closing prices per asset over time
adj_closing.plot(kind='line', figsize=(10,6))
plt.title("Adjusted Closing Prices over Time")
plt.xlabel("Year")
plt.ylabel("adjusted closing price")
plt.show()
From the above graph, it is not immediately obvious which assets performed better economically. With this in mind, I introduce the following metrics:
Let $s_i(d)$ be the share price of asset $i$ at the close of the $d^\textrm{th}$ trading day of a period that has $D$ trading days. Because the share price of any given asset has little economic significance as shown above, I consider the price ratio over the course of a given day $d$, given by

$$\rho_i(d) = \frac{s_i(d)}{s_i(d-1)}.$$

I further modify this price ratio to obtain the so-called daily return, which is defined as

$$r_i(d) = \rho_i(d) - 1 = \frac{s_i(d)}{s_i(d-1)} - 1.$$

For the given time period, I define the daily return to be 0 for all assets on the first trading day of the given time period.
Using the daily returns per asset per year, I can calculate the return mean per asset per year, which measures the trend of the share price over the course of that year:

$$m_i = \frac{1}{D}\sum_{d=1}^{D} r_i(d).$$
The following code creates two dataframes containing the daily return and return mean of each asset per year, using the adjusted closing prices parsed above. Additionally, I plot the return means to get a better understanding of which assets performed better economically.
# build the daily returns dataframe
daily_return = adj_closing.copy()
for row in range(1, len(daily_return)):
    daily_return.iloc[row] = adj_closing.iloc[row] / adj_closing.iloc[row-1]
    daily_return.iloc[row] -= 1
daily_return.iloc[0] = 0  # by definition
print(daily_return)
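As a sanity check on the loop above, pandas' built-in `pct_change` computes the same ratio-minus-one in one step; below is a minimal sketch on a made-up price series (the tickers and prices are hypothetical):

```python
import pandas as pd

# made-up adjusted closing prices for two hypothetical tickers
prices = pd.DataFrame({"AAA": [10.0, 11.0, 9.9],
                       "BBB": [100.0, 100.0, 105.0]})

# pct_change computes s(d) / s(d-1) - 1; the first row comes out NaN,
# which is set to 0 to match the daily-return definition above
returns = prices.pct_change().fillna(0)
print(returns)
```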
import pandas as pd

YEARS = list(range(2005, 2020))
# build the return means dataframe by averaging the daily returns by year
return_mean = daily_return.groupby(daily_return.index.year).mean()
return_mean.index.name = 'Year'
print(return_mean)
# plot the return means per asset over time
return_mean.plot(kind='line', figsize=(10,6))
plt.title("Return Means over Time")
plt.xlabel("Year")
plt.ylabel("return mean")
plt.show()
With the above plot, we get a much better sense of which assets performed better economically. A concrete example is the difference between VFINX and AAPL, the Vanguard 500 Index Fund and Apple respectively. From the Adjusted Closing Price graph, VFINX seemed to be doing incredibly well, as its share price soared in later years, while AAPL's did not appear to. However, the Return Means graph reveals that AAPL was in fact doing much better than VFINX in terms of overall return on investment. Essentially, if you were to put an equal amount of investment into both assets, AAPL would yield a better return than VFINX.
The Return Mean graph highlights interesting trends that were hidden, but not completely undetectable, in the Adjusted Closing Price graph, which allows us to pick out "key years" and time periods to examine more closely with my metrics in Section 2.
The first notable event that is shown in the graph is that there is a major drop in return means for all assets in 2008 (link). This reflects the U.S. Housing Market Crash at that time which severely affected the U.S. economy and was dubbed "The Great Recession".
Another notable event is directly after the Housing Market Crash, in 2009. From the graph, almost all of the return means recover (that is, become non-negative), except for Exxon-Mobil. Even though the stock market began to recover after the Housing Market Crash in 2008, the price of crude oil was still dropping at the time, which explains why Exxon-Mobil's return mean fell in 2009.
The next notable event was in 2011, when Morgan Stanley's return mean dropped sharply as the United States Securities and Exchange Commission (SEC) launched an investigation into the company for improper fee arrangements (see this link for details). Initially, because many of the other assets also experienced a drop in return means in 2011, we believed that Morgan Stanley's return mean was dropping for some common reason, which is true to a certain extent: this was during the Sovereign Debt Crisis, which directly affected Morgan Stanley as a financial institution and indirectly affected our other assets. In other words, 2011 was a very difficult year for Morgan Stanley for two separate reasons, and the company's return mean for that year reflects this.
Similar to 2008, there is a small dip in return means for all of our assets in 2015. This trend is explained by China's unstable stock market at the time (link), which was caused by investors making highly leveraged investments in China based on overly-optimistic predictions. This is eerily similar to the US Housing Market Crash in 2008, which was caused by similar highly leveraged investments. This trend is seen again in 2018, where the return means for all assets dropped, but this time it is explained by the start of the US-China trade war.
From the graph, Apple's return means seem to fluctuate wildly year-to-year, sometimes following the trends we've identified above and other times not. This can be explained by the launch of a new iPod in 2005, the introduction of the iPhone in 2007, and the revitalization of the iMac in 2009 (source).
From this preliminary analysis, based purely on trends seen in the return means, I identified 2008, 2011, 2015, and 2018 as years of note for the efficiency and proximity metrics. To further affirm the selection of these years, I use return variances to see if the explanations for the return means are repeated.
I define the return variance for asset $i$ to be

$$v_i = \frac{1}{D}\sum_{d=1}^{D}\left(r_i(d) - m_i\right)^2,$$

where $D$ is the number of trading days in a given year, $r_i(d)$ is the daily return of asset $i$ on day $d$ of that year, and $m_i$ is the return mean of asset $i$ for that year from the previous section.
The following code creates a dataframe containing the return variance
of each asset per year using the daily return
and return mean
dataframes above. Additionally, I plot the return variances to affirm the trends discussed above.
# build the return variance dataframe
temp = daily_return.copy()
temp = temp.reset_index()
temp['Date'] = pd.to_datetime(temp['Date'])
for row in range(len(temp)):
    year = int(temp.iloc[row]['Date'].year)
    temp.iloc[row, 1:] = (temp.iloc[row, 1:] - return_mean.loc[year])**2
# average by year
return_variance = temp.groupby(temp['Date'].dt.year).transform('mean')
return_variance = return_variance.drop_duplicates().drop(columns=['Date'])
return_variance['Year'] = YEARS
return_variance = return_variance.set_index(['Year'])
print(return_variance)
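Note that the computation above is a population variance (dividing by $D$ rather than $D-1$), which matches pandas' built-in `var` with `ddof=0`. A minimal sketch on made-up returns grouped into two short "years":

```python
import pandas as pd

# made-up daily returns for one hypothetical asset over two short "years"
idx = pd.to_datetime(["2005-01-03", "2005-01-04",
                      "2006-01-03", "2006-01-04"])
ret = pd.DataFrame({"AAA": [0.01, 0.03, -0.02, 0.02]}, index=idx)

# population variance per year: mean squared deviation from the yearly mean
var_by_year = ret.groupby(ret.index.year).var(ddof=0)
print(var_by_year)
```

For 2005 the yearly mean is 0.02 and the deviations are ±0.01, giving a variance of 0.0001; for 2006 the mean is 0 and the deviations are ±0.02, giving 0.0004.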
# plot the return variance per asset over time
return_variance.plot(kind='line', figsize=(10,6))
plt.title("Return Variance over Time")
plt.xlabel("Year")
plt.ylabel("return variance")
plt.show()
With the above plot, we can confirm that 2008 is a key year, as there is major return variance in that year for most assets, and there seems to be some variance in 2009 as well. We can also confirm that 2011 is a key year, especially for Morgan Stanley.
From this preliminary analysis, based purely on trends seen in the return means and return variances, I decided to partition the years 2005-2019 into the following:

2005-2008, as 2008 was the year of the Housing Market Crash
2008-2011, as 2011 was the year of the Sovereign Debt Crisis
2011-2015, as 2015 was the year of the Chinese Stock Market Crash
2015-2019, as 2018 was the beginning of the US-China trade war

In 1952, Harry Markowitz published this paper detailing what he dubbed Modern Portfolio Theory. The paper marked a turning point in asset management, as before then conventional wisdom was to focus on individual assets rather than consider them as a whole portfolio.
Portfolio theories in general strive to maximize reward for a given risk or, equivalently, to minimize risk for a given reward. To make this concrete, such theories quantify the notions of reward and risk and identify a class of ideal portfolios that meet these criteria. In his paper, Markowitz chose the return mean as the proxy for a portfolio's reward and the volatility (the square root of the variance) as the proxy for its risk.
What was revolutionary in Markowitz's theory is that he found that no single-asset portfolio is efficient (more on this later) and that by diversifying the assets in a given portfolio, the risk can be lowered while maintaining the same reward. A portfolio is defined to be more efficient than another if it promises a greater reward for the same risk; in terms of Markowitz's theory, this means a greater return mean for no greater volatility.
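The risk-lowering effect of diversification can be verified numerically. The sketch below uses two hypothetical assets with identical return means and volatilities but imperfect correlation (all numbers are made up); an equal-weight portfolio keeps the same expected return while its volatility falls below either asset's:

```python
import numpy as np

m = np.array([0.05, 0.05])  # hypothetical return means
sigma = 0.2                 # each asset's volatility (hypothetical)
rho = 0.3                   # correlation between the two assets
V = sigma**2 * np.array([[1.0, rho],
                         [rho, 1.0]])  # covariance matrix

w = np.array([0.5, 0.5])       # equal-weight portfolio
port_mean = w @ m              # unchanged reward
port_vol = np.sqrt(w @ V @ w)  # risk below sigma whenever rho < 1

print(port_mean, port_vol)
```

Here the portfolio volatility is $\sqrt{0.026} \approx 0.161$, below the $0.2$ volatility of either asset alone, while the expected return stays at $0.05$.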
While the exact calculation of this so-called efficient portfolio frontier, that is, the exact relation in the reward-risk plane that identifies the most efficient portfolio for a given reward or risk, is beyond the scope of this tutorial (for the mathematically-inclined reader see the above linked paper), I can use this idea of an idealized most-efficient portfolio to compare each asset's relative efficiency, and thus its relative performance.
I define the efficiency $e_i$ and proximity $p_i$ of asset $i$ for a given year to be

$$e_i = \frac{m_i - m_i^{\mathrm{if}}}{m_i^{\mathrm{ef}} - m_i^{\mathrm{if}}}, \qquad p_i = \frac{\sigma_{\mathrm{mv}}}{\sigma_i},$$

where

$$m_i^{\mathrm{ef}} = m_{\mathrm{mv}} + \nu\sqrt{\sigma_i^2 - \sigma_{\mathrm{mv}}^2}, \qquad m_i^{\mathrm{if}} = m_{\mathrm{mv}} - \nu\sqrt{\sigma_i^2 - \sigma_{\mathrm{mv}}^2}.$$

Here $m_i$ and $\sigma_i$ are the return mean and volatility of asset $i$, $m_{\mathrm{mv}}$ and $\sigma_{\mathrm{mv}}$ are the return mean and volatility of the minimum-volatility portfolio, $\nu$ is the slope of the efficient frontier's asymptote, and $m_i^{\mathrm{ef}}$ and $m_i^{\mathrm{if}}$ are the return means of the most efficient and most inefficient portfolios at asset $i$'s volatility.

The efficiency of an asset can be described as the proportion to which that asset's return mean lies between the most efficient and most inefficient return means for that asset's volatility. The proximity of an asset can likewise be described as the ratio of the minimum attainable volatility to that asset's own volatility.
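The frontier quantities appearing in these definitions fall out of two linear solves against the covariance matrix. The sketch below computes $m_{\mathrm{mv}}$, $\sigma_{\mathrm{mv}}$, and the asymptote slope for a made-up two-asset example (the return means and covariances are hypothetical):

```python
import numpy as np

# made-up return means and covariance matrix for two hypothetical assets
m = np.array([0.04, 0.10])
V = np.array([[0.020, 0.004],
              [0.004, 0.090]])

# standard Markowitz scalars: a = 1' V^-1 1, b = 1' V^-1 m, c = m' V^-1 m
y = np.linalg.solve(V, np.ones(len(V)))
z = np.linalg.solve(V, m)
a, b, c = y.sum(), z.sum(), m @ z

m_mv = b / a                        # return mean at minimum volatility
vol_mv = 1 / np.sqrt(a)             # minimum attainable volatility
v_as = np.sqrt((a * c - b**2) / a)  # slope of the frontier's asymptote

print(m_mv, vol_mv, v_as)
```

The minimum-volatility weights are $w = V^{-1}\mathbf{1}/a$, and plugging them back in recovers $\sigma_{\mathrm{mv}}^2 = w^\top V w$.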
The following code creates two maps (dictionaries keyed by year) containing the efficiency and proximity values of each asset per year, using the return mean dataframe calculated in Section 1.2. To do this, I also calculate the covariance matrices between all assets for all years to model the volatility of a portfolio containing all of the assets. I then plot these two metrics over time for all assets.
import numpy as np

def calc_m_ef(vol, m_mv, v_as, vol_mv):
    '''
    vol - a 1xn array representing the volatilities of all the assets for a
          given year
    m_mv - a decimal value representing the return mean associated with the
           minimum volatility
    v_as - a decimal value representing the asymptote of the efficient
           frontier
    vol_mv - a decimal value representing the minimum volatility
    returns a 1xn array representing the return means associated with the most
    efficient portfolios
    '''
    vol = vol.astype(np.float64)
    return m_mv + v_as * np.sqrt(vol**2 - vol_mv**2)

def calc_m_if(vol, m_mv, v_as, vol_mv):
    '''
    vol - a 1xn array representing the volatilities of all the assets for a
          given year
    m_mv - a decimal value representing the return mean associated with the
           minimum volatility
    v_as - a decimal value representing the asymptote of the efficient
           frontier
    vol_mv - a decimal value representing the minimum volatility
    returns a 1xn array representing the return means associated with the most
    inefficient portfolios
    '''
    vol = vol.astype(np.float64)
    return m_mv - v_as * np.sqrt(vol**2 - vol_mv**2)

def calc_efficiency(m, V):
    '''
    m - a 1xn array representing the return means of all the assets for a
        given year
    V - an nxn matrix representing the covariance of all the assets for a
        given year
    returns a 1xn array representing the efficiency values of all assets for a
    given year
    '''
    y = np.linalg.solve(V.astype(np.float64), np.ones(len(V)))
    z = np.linalg.solve(V.astype(np.float64), m.astype(np.float64))
    a = np.sum(y)
    b = np.sum(z)
    c = np.dot(m, z)
    m_mv = b / a                      # return mean at minimum volatility
    vol_mv = 1 / np.sqrt(a)           # minimum volatility
    v_as = np.sqrt((a*c - b**2) / a)  # slope of the frontier's asymptote
    vols = np.diagonal(V)
    m_ef = calc_m_ef(vols, m_mv, v_as, vol_mv)
    m_if = calc_m_if(vols, m_mv, v_as, vol_mv)
    return (m - m_if) / (m_ef - m_if)

def calc_proximity(V):
    '''
    V - an nxn matrix representing the covariance of all the assets for a
        given year
    returns a 1xn array representing the proximity values of all assets for a
    given year
    '''
    y = np.linalg.solve(V.astype(np.float64), np.ones(len(V)))
    a = np.sum(y)
    vol_mv = 1 / np.sqrt(a)
    vols = np.diagonal(V)
    return vol_mv / vols
# build the covariance map
covariance = dict.fromkeys(YEARS)
for year in YEARS:
    covariance[year] = np.zeros((len(ASSETS), len(ASSETS)))
temp = daily_return.copy()
temp = temp.reset_index()
for row in range(len(daily_return)):
    year = int(temp.iloc[row]['Date'].year)
    v = daily_return.iloc[row] - return_mean.loc[year]
    covariance[year] = np.add(covariance[year], np.outer(v, v))

# build the efficiency and proximity metric maps
efficiency = dict.fromkeys(YEARS)
proximity = dict.fromkeys(YEARS)
for year in YEARS:
    efficiency[year] = calc_efficiency(return_mean.loc[year], covariance[year])
    proximity[year] = calc_proximity(covariance[year])
# plot the metrics
plt.figure(figsize=(10,6))
plt.plot(list(efficiency.keys()), list(efficiency.values()))
plt.title("Efficiency of all assets")
plt.xlabel("year")
plt.ylabel("efficiency")
plt.legend(ASSETS)
plt.show()
plt.figure(figsize=(10,6))
plt.plot(list(proximity.keys()), list(proximity.values()))
plt.title("Proximity of all assets")
plt.xlabel("year")
plt.ylabel("proximity")
plt.legend(ASSETS)
plt.show()
From the above figures, the efficiency of the real estate index fund VGSLX plummets from 2006 to 2007. The efficiency of the other non-bond index funds also decreases from 2006 to 2007, but not as significantly. We also see every index fund's proximity fall farther from the frontier from 2006 to 2007; all indices but VGSLX then approached the frontier again in 2008, while VGSLX fell even farther from it.
The United States experienced a housing bubble prior to 2007, just after the dotcom bubble burst around 2000. Speculators and investors from the technology sector moved into real estate as a safer investment. At the same time, to combat the market uncertainty following the dotcom burst and the September 11th attacks in 2001, the U.S. government implemented policies that cut interest rates and kept them low over time. The housing market grew so saturated with money and credit that home prices rose even as more and more people traded houses.
As home prices continued to climb during the housing bubble, the prolonged low interest rates on loans attracted even more mortgage lending. Banks rapidly collected loans and bundled them into mortgage-backed securities and collateralized debt obligations to resell to investors. These securities, however, could contain risky loans made to subprime borrowers, who had been drawn in because such loans carried even lower interest rates than traditional ones. When US home prices dropped in 2006, this triggered widespread defaults on those subprime mortgages.
Given that the market-wide drop in proximity coincides with the housing bubble bursting in 2007 and the overall market collapse in 2008, we observe firsthand how the proximity of index funds is a rough leading indicator of market health within that index. The same cannot be said for the efficiency of an index fund, which appears to fluctuate with current market conditions. Thus efficiency is a roughly contemporaneous indicator.
The following code plots the adjusted closing prices again and overlays an exponential regression curve for each asset, obtained by linear regression on the logarithm of the prices, reporting the $R^2$ value of each fit.
from sklearn.metrics import r2_score
# plot adjusted closing data
adj_closing.plot(kind='line', figsize=(10,6))
plt.title("Adjusted Closing Prices over Time")
plt.xlabel("Year")
plt.ylabel("adjusted closing price")
# perform exponential regression
x = np.array([i+1 for i in range(len(adj_closing))])
for asset in ASSETS:
    coeff = np.polyfit(x, np.log(adj_closing[asset]), 1)
    y_hat = np.exp(coeff[1]) * np.exp(coeff[0] * x)
    plt.plot(adj_closing.index, y_hat)
    print("R^2 for " + asset + ":", r2_score(adj_closing[asset], y_hat))
plt.show()
While exponential regression fits some assets' adjusted closing price data fairly well, for other assets it does quite poorly. Therefore, we cannot conclude that adjusted closing prices will yield an accurate model for predicting future economic performance.
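For contrast with these mixed results, the sketch below runs the same polyfit-on-log procedure on a synthetic series that is exactly exponential (the scale and growth rate are made up); here the recovered rate matches and $R^2$ is essentially 1, confirming that the regression itself works whenever the underlying growth really is exponential:

```python
import numpy as np

# synthetic "price" series with exact exponential growth (hypothetical rate)
x = np.arange(1, 101)
prices = 50.0 * np.exp(0.01 * x)

# linear regression on the log prices, then transform back
coeff = np.polyfit(x, np.log(prices), 1)
y_hat = np.exp(coeff[1]) * np.exp(coeff[0] * x)

# coefficient of determination, as computed by sklearn's r2_score
ss_res = np.sum((prices - y_hat) ** 2)
ss_tot = np.sum((prices - prices.mean()) ** 2)
print("R^2:", 1 - ss_res / ss_tot)
```

The poor fits above therefore indicate that those assets' prices simply did not grow exponentially over the period, not a failure of the fitting procedure.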
As this tutorial makes clear, there is no accurate or fail-proof way to determine the future economic performance of a given asset, even with the asset's entire history. If there were, then quite literally anyone could perform some form of analysis on past market data to predict future asset values.
However, what is quite valuable about the efficiency and especially the proximity metrics defined above is that while they do not predict the day-to-day performance of assets, they can reveal general market trends, especially before economic disasters such as the 2008 US Housing Market Crash and the 2015 Chinese Stock Market Crash.
While this tutorial focused on a handful of selected assets, these results can easily be reproduced with any other assets that are publicly traded given that the price history for that asset is available. This tutorial is just a glimpse into the complicated world that is stock market trading, and there are thousands of other studies that attempt to find some way to gain an edge in the stock market.