Use Your Computer to Make Informed Decisions in Stock Trading: Practical Introduction — Part 8: Portfolio Optimisation
[HINT] You can read my previous articles on Medium or on the website PythonInvest (which features more dynamic content). The complete Python code (Colab notebooks) can be found on Github.
There are cases when you’d like to buy several stocks and don’t know how to split the money among them. In other cases you may want to rebalance your existing portfolio from time to time, as market dynamics are constantly changing and you may feel that you’re missing the opportunity to get more from your assets.
Finding the optimal portfolio weights for a selected set of stocks (you may hear similar terms like ‘finding optimal asset allocation’ or ‘building an efficient portfolio’) may seem an easy or unnecessary task in a period of prosperity and growth. But we will argue that this is a must-have step in a regular investment process with a thorough approach, because it can save you from collapse when the stock market crashes.
We provide an overview of portfolio metrics and build a scoring system that combines them all for a final decision. Alternatively, we show a shortcut: using the PyPortfolioOpt library to find a portfolio for a long-term investment horizon.
Introduction
Let’s imagine you have a certain sum of money ($2000 for the purposes of this article) to invest and you decide to buy several stocks.
You should also have an idea of where to invest. Let’s assume this set of stocks for the purpose of the article: ‘BUD’, ‘CHTR’, ‘NKE’, ‘NVDA’, ‘PTR’, ‘SHOP’, ‘XOM’, ‘BA’. I’ve chosen these, but it can be any arbitrary set of stocks. Keep in mind that the number of possible combinations grows very quickly with every added stock, so you should try to limit the number of companies to invest in.
The immediate problem you will run up against is how to split the sum into multiple investments.
It is a well known fact that you need to diversify assets to hedge risks (buying more stocks is generally better), but what does that mean exactly?
While many people will try intuitively to select an equal-split investment (or even bet everything on the best-performing stock), this is the wrong solution in most cases.
This article will explain why by looking into the main portfolio metrics, trying external portfolio optimisation routines (from the PyPortfolioOpt module), and checking all possible combinations of stocks (the brute force algorithm) to manually select the most suitable portfolio. Finally, the results will be supported with graphs illustrating the idea of portfolio optimisation.
Executive Summary
During the course of this article we learned how to measure the performance of a portfolio of stocks using different metrics of profitability and risk. As a quick solution we looked at the standard portfolio optimisation routines from the PyPortfolioOpt library, found the max-Sharpe and min-volatility portfolios, and the closest optimal discrete portfolio. We also reviewed the case of repetitive short-term 3-day trading, where the standard functions didn’t work and we had to write all the portfolio search and optimisation algorithms from scratch.
Summary of Results
- Correlation is important. More pairs of less correlated (or negatively correlated) stocks allow you to build a “better” and more balanced portfolio. Probably not all companies from the input set will be included in the final portfolio, but it is important to have some options to select from.
- Mean return is the most common measure of performance. Mean return can vary greatly from negative to positive values and with a big range (e.g. -18% to 9% return in 3 days for all portfolios from the selected set of stocks). Bigger returns often come with more volatility and a higher risk of loss.
- Mean return-per-volatility introduces the idea of controlling risks. Mean return-per-volatility is highly correlated with mean returns, but adds another perspective on measuring performance. E.g. it ranged from -0.05 to 0.15 for the portfolios from the selected set of stocks. The mean return was very different (from 0.1% to 0.5%, a 5x gap) for high values of return-per-volatility (0.1 to 0.15). It is possible to have both metrics of returns at a high value.
- Max Drawdown is a measure of loss in the worst-case event. The highest mean returns often come with a high risk of a big drawdown in a (rare) negative event. If you consider yourself a ‘risk-averse’ person, it is better to sacrifice some portion of the expected return and achieve a lower max_drawdown.
- Total score is used to get the overall picture of the performance. There is no single recipe for choosing the best portfolio. You should check all the metrics above and eventually select from the top-performing portfolios, which have the highest aggregated score.
Portfolio Metrics and the Investment Goal
Portfolio formation is a multi-factor optimisation problem, which creates a lot of difficulties for an inexperienced investor, as it is quite heavy on calculations and very technical.
First and foremost, you should understand the rationale behind the upcoming optimisation: by combining several stocks you can reach a similar “performance” of a portfolio while significantly improving its “defensive” characteristics, i.e. limiting the loss in bad cases.
The intuition behind portfolio optimisation is the correlation (corr) between stocks in the portfolio. Corr(x,y) is a measure of how two time series x and y move together, and it takes values in [-1, 1]. If all stocks are highly correlated (corr(x,y) ~ [0.7..1], i.e. all stocks are companies from the same vertical with similar business performance and risks), then there is not much you can do in the portfolio optimisation.
You have many more optimisation possibilities when there are loosely correlated stocks in your selection (corr(x,y) ~ [0..0.3]), or even better, negatively correlated pairs (corr(x,y) < 0).
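To make the intuition concrete, here is a minimal sketch on synthetic data (not real market prices). The tickers A, B, C and all the numbers are made up for illustration: A and B share a common driver, while C is constructed to move against it.

```python
import numpy as np
import pandas as pd

# Synthetic daily returns (NOT real market data): A and B share a common driver,
# C is built to move against that driver.
rng = np.random.default_rng(0)
common = rng.normal(0, 0.01, 500)
returns = pd.DataFrame({
    "A": common + rng.normal(0, 0.003, 500),
    "B": common + rng.normal(0, 0.003, 500),
    "C": -common + rng.normal(0, 0.003, 500),
})

# Pairwise Pearson correlation, every value lies in [-1, 1]
corr = returns.corr()
print(corr.round(2))

# An equal split of A and B keeps almost all of the volatility,
# while an equal split of A and C cancels most of it.
vol_ab = (0.5 * returns["A"] + 0.5 * returns["B"]).std()
vol_ac = (0.5 * returns["A"] + 0.5 * returns["C"]).std()
print(f"volatility A+B: {vol_ab:.4f}, volatility A+C: {vol_ac:.4f}")
```

The highly correlated pair leaves the volatility almost untouched, while the negatively correlated pair removes most of it. Real stocks are rarely this cleanly opposed, so expect a much smaller effect in practice.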
Now let’s switch to the portfolio metrics:
- The most common metric of portfolio performance is a simple return over a period of time (= [Price_end_period]/[Price_begin_period]), which can be compared with some benchmark (like S&P500 returns, or available investment alternatives like bonds or real estate). In this article we cover a short-term investment horizon of 3 days, so we’ll use the return_3d notation (similar for all other metrics).
- A more advanced metric is return-per-risk (=[return_3d]/[std_deviation_3d]), as it gives a relative number per unit of risk, taking into account the potential range of price movements (or standard deviation in statistical terms). We will calculate two similar numbers: mean_return_per_volatility_3d and median_return_per_volatility_3d. We track median_return_per_volatility on top of mean_return_per_volatility because it is less sensitive to outliers (big positive or negative price jumps) and gives a more realistic picture of what to expect.
- Last but not least are the measures of a potential loss from big negative events: max_drawdown (=[maximum historical drawdown of a portfolio in 3 days]) and 25percent_drawdown (=[drawdown in 25% worst cases]). The logic is that max_drawdown prepares you for the worst-case scenario, while 25percent_drawdown measures smaller negative movements that occur much more often.
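The metrics above can be sketched in a few lines of pandas. This is an illustration on a synthetic price series, not the notebook’s code. Two choices here are assumptions: volatility is taken as a 30-day rolling standard deviation of 3-day returns, and one is subtracted from the gross return before dividing by volatility (which matches the magnitudes reported later in the article).

```python
import numpy as np
import pandas as pd

# Synthetic daily portfolio value (NOT real data): a geometric random walk
rng = np.random.default_rng(42)
dates = pd.bdate_range("2018-01-01", periods=750)
value = pd.Series(
    1600 * np.exp(np.cumsum(rng.normal(0.0005, 0.012, len(dates)))),
    index=dates, name="value",
)

df = value.to_frame()
# Gross 3-day return: value at the end of the period / value at its start
df["return_3d"] = df["value"] / df["value"].shift(3)
# ASSUMPTION: volatility = rolling std of 3-day returns over 30 trading days
df["volatility_3d"] = df["return_3d"].rolling(30).std()
# ASSUMPTION: return above one per unit of volatility
df["return_per_volatility_3d"] = (df["return_3d"] - 1) / df["volatility_3d"]
df = df.dropna()

metrics = {
    "mean_return_3d": df["return_3d"].mean() - 1,
    "mean_return_per_volatility_3d": df["return_per_volatility_3d"].mean(),
    "median_return_per_volatility_3d": df["return_per_volatility_3d"].median(),
    "25percent_drawdown_3d": df["return_3d"].quantile(0.25) - 1,
    "max_drawdown_3d": df["return_3d"].min() - 1,
}
for name, val in metrics.items():
    print(f"{name}: {val:.4f}")
```

The drawdowns are expressed as negative returns, so a value closer to zero is better for every risk metric.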
Now you are ready to formulate your investment goal:
It can be something like this: “I want to achieve a 1% return in 3 days, with a return-per-risk ratio of 1 and a max_drawdown not higher than 10%.”
This will help you to discard overly risky opportunities (like an investment in a new altcoin), and to sleep well if your portfolio starts to lose money. You can even place an automatic sell order that triggers if your portfolio reaches the max_drawdown rate, or somewhere close to it.
Technically, any investment goal can be translated to a set of restrictions that reduces the list of portfolio combinations to select from. It is important to set realistic expectations, so that the group of available options doesn’t shrink too much.
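As a rough sketch of that translation, a goal becomes a set of thresholds applied to a table of candidate portfolios. The four portfolios and all the numbers below are hypothetical and only illustrate the filtering step.

```python
import pandas as pd

# Hypothetical metrics for four candidate portfolios (illustrative numbers only)
candidates = pd.DataFrame(
    {
        "mean_return_3d": [0.0045, 0.0020, 0.0060, -0.0010],
        "mean_return_per_volatility_3d": [0.156, 0.090, 0.120, -0.020],
        "max_drawdown_3d": [-0.127, -0.100, -0.280, -0.150],
    },
    index=["P1", "P2", "P3", "P4"],
)

# The goal as explicit thresholds (hypothetical values, tune them to your own goal)
MIN_RETURN = 0.002          # at least 0.2% expected return in 3 days
MIN_RETURN_PER_VOL = 0.1    # at least 0.1 return-per-volatility
MAX_DRAWDOWN = -0.15        # worst 3-day loss no deeper than 15%

mask = (
    (candidates["mean_return_3d"] >= MIN_RETURN)
    & (candidates["mean_return_per_volatility_3d"] >= MIN_RETURN_PER_VOL)
    & (candidates["max_drawdown_3d"] >= MAX_DRAWDOWN)
)
shortlist = candidates[mask]
print(shortlist)
```

Only P1 passes all three restrictions here. Tightening any threshold a little further would leave an empty shortlist, which is why realistic expectations matter.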
The Shortcut Solution (PyPortfolioOpt Library)
It can be beneficial to check what is publicly available before we go any further — potentially you could get a good starting point with no need to write tons of code to receive similar results.
I found the PyPortfolioOpt project on Github. As of 5 August 2021 it had 2200 stars and 549 forks, and its latest release was issued on 17 June 2021. Such usage stats are impressive: this is one of the most popular open-source projects on Portfolio Optimisation.
We encourage you to read the User Guide, which has a great historical overview of the Portfolio Optimisation approaches and easy examples for a quick start.
The downside of this approach is that it may not fully suit your needs, and it can be quite hard to understand all the details of the implementation and to debug the code if it fails.
You need to install and import the library first (it should return some version number as an output):
# https://github.com/robertmartin8/PyPortfolioOpt
!pip install PyPortfolioOpt
import pypfopt
print(f'\n Library version: {pypfopt.__version__}')
#Output: Library version: 1.4.2
The next step is to call a function that estimates the expected returns (it can be CAPM, EMA-historical, or mean-historical), get the covariance matrix using another function (it can be derived from pairwise correlations), and use both in the final optimisations. We found two optimal portfolios: one has the minimum volatility, and the other has the maximum return-per-volatility (Sharpe ratio).
Check this code:
# json: for pretty print of a dictionary: https://stackoverflow.com/questions/44689546/how-to-print-out-a-dictionary-nicely-in-python/44689627
import json

import pandas as pd
from pypfopt import EfficientFrontier, expected_returns, risk_models

# get all prices for one day in a row (stocks_prices is the long-format DataFrame loaded earlier)
df_pivot = stocks_prices.pivot(index='Date', columns='Ticker', values='Close').reset_index()

mu = expected_returns.capm_return(df_pivot.set_index('Date'))
# Other options for the returns values: expected_returns.ema_historical_return(df_pivot.set_index('Date'))
# Other options for the returns values: expected_returns.mean_historical_return(df_pivot.set_index('Date'))
print(f'Expected returns from each stock: {mu} \n')

S = risk_models.CovarianceShrinkage(df_pivot.set_index('Date')).ledoit_wolf()

# Weights between 0 and 1: we don't allow shorting
ef = EfficientFrontier(mu, S, weight_bounds=(0, 1))
ef.min_volatility()
weights_min_volatility = ef.clean_weights()

print(f'Portfolio weights for min volatility optimisation (lowest level of risk): {json.dumps(weights_min_volatility, indent=4, sort_keys=True)} \n')
# Risk-free rate: 10Y TBonds rate on 21-Jul-2021 https://www.cnbc.com/quotes/US10Y
print(f'Portfolio performance: {ef.portfolio_performance(verbose=True, risk_free_rate=0.0124)} \n')

pd.Series(weights_min_volatility).plot.barh(title='Optimal Portfolio Weights (min volatility) by PyPortfolioOpt');

# A fresh instance for the second optimisation: newer PyPortfolioOpt releases
# may refuse to re-optimise an instance that has already been solved
ef = EfficientFrontier(mu, S, weight_bounds=(0, 1))
ef.max_sharpe()
weights_max_sharpe = ef.clean_weights()

print(f'Portfolio weights for max Sharpe optimisation (highest return-per-risk): {json.dumps(weights_max_sharpe, indent=4, sort_keys=True)} \n')
print(f'Portfolio performance: {ef.portfolio_performance(verbose=True, risk_free_rate=0.0124)} \n')

#OUTPUT:
#Expected returns from each stock: Ticker
#BA 0.214378
#BUD 0.140177
#CHTR 0.128496
#NKE 0.198096
#NVDA 0.395522
#PTR 0.220099
#SHOP 0.292879
#XOM 0.129524
#Name: mkt, dtype: float64

#Portfolio weights for min volatility optimisation (lowest level of risk): {
# "BA": 0.0,
# "BUD": 0.37214,
# "CHTR": 0.35668,
# "NKE": 0.03077,
# "NVDA": 0.0,
# "PTR": 0.04423,
# "SHOP": 0.14425,
# "XOM": 0.05194
#}
# Expected annual return: 16.3%
# Annual volatility: 8.9%
# Sharpe Ratio: 1.69
# Portfolio performance: (0.16280187012757663, 0.08898359416885636, 1.6902202201697114)

# Portfolio weights for max Sharpe optimisation (highest return-per-risk): {
# "BA": 0.01318,
# "BUD": 0.23948,
# "CHTR": 0.15528,
# "NKE": 0.0412,
# "NVDA": 0.07844,
# "PTR": 0.11107,
# "SHOP": 0.36135,
# "XOM": 0.0
#}
# Expected annual return: 22.6%
# Annual volatility: 10.7%
# Sharpe Ratio: 1.92
# Portfolio performance: (0.2258118694269962, 0.10704509290593174, 1.9226651483021082)
You can see from the output above that the stocks with the highest expected returns (39.6% for NVDA, 29.3% for SHOP) do not always get a big weight in the optimal portfolios: NVDA has a zero weight in the min-volatility portfolio and only 7.8% in the max-Sharpe one.
Another observation is that not all stocks participate in the resulting portfolios. It is often better to include only a few stocks to get the maximum returns, and a few more to hedge them against the downside risk.
Here is what we have in the end:
Min-volatility (low risk) portfolio:
Expected annual return: 16.3%
Annual volatility: 8.9%
Sharpe Ratio: 1.69
Max-Sharpe portfolio:
Expected annual return: 22.6%
Annual volatility: 10.7%
Sharpe Ratio: 1.92
There can be difficulties caused by a broker, though.
One thing that can prevent you from buying one of these portfolios is that you can only buy a whole number of shares of each stock. You simply may not be able to build a portfolio with the exact optimal weights if you have only $2000 and some companies cost $200–500 or more per share.
In this case you can use another function to get the discrete combination of stocks that is closest to the optimal one (for the max-Sharpe portfolio in this case):
from pypfopt import DiscreteAllocation

# INVESTMENT is assumed to be defined earlier (the sum to invest, $2000 in this article)
latest_prices = df_pivot.set_index('Date').iloc[-1]  # prices as of the day you are allocating
da = DiscreteAllocation(weights_max_sharpe, latest_prices, total_portfolio_value=INVESTMENT, short_ratio=0.0)
alloc, leftover = da.lp_portfolio()
print(f"Discrete allocation performed with ${leftover:.2f} leftover")
alloc

# OUTPUT (5-Aug-2021)
# Discrete allocation performed with $115.07 leftover
# {'BUD': 8, 'CHTR': 1, 'NKE': 1, 'NVDA': 1, 'PTR': 6}
Now you have found the answer:
{‘BUD’: 8, ‘CHTR’: 1, ‘NKE’: 1, ‘NVDA’: 1, ‘PTR’: 6}
You can simply stop at this portfolio if you’re selecting an option for a long-term investment. It is a good idea to repeat the exercise in a year to rebalance the portfolio, but otherwise it is quite robust to daily price fluctuations.
The Experiment For Short-term Investment
The solution from the previous section is not suitable if you decide to trade daily (or every several days), as there may be no easy way to tune the standard routines for your purposes. The only way forward is to write your own optimisation functions and to better understand the problems that can arise.
I like to do short-term trading (for 3–30 days), as I can quickly test my ideas, although many people do not share this approach and find it too risky. This is a barrier to using the standard library and we need to invent something else — The Experiment.
Let’s construct all possible (discrete) portfolio combinations that you can afford with the investment amount, hold them for 3 days, and then sell them to lock in the profit or loss. The more stocks you choose for the portfolio, the more possibilities you have to find negative correlations and build a diversified portfolio. The downside is that the total set of options grows exponentially, and you can quickly run out of memory or time while trying to check all possible combinations.
One portfolio can look like this:
{‘BA’: 2, ‘BUD’: 1, ‘CHTR’: 1, ‘NKE’: 1, ‘NVDA’: 2, ‘PTR’: 1, ‘SHOP’: 0, ‘XOM’: 1}
(2 shares of BA, 1 share of BUD, 1 share of CHTR, 1 share of NKE, 2 shares of NVDA, 1 share of PTR, and 1 share of XOM)
If you buy a selected set of stocks on some random date and construct 3-day returns for all possible portfolio combinations, you will get some idea of the range of returns that can be achieved with those stocks. The problem with this is the selection bias of whatever date you buy the stocks on: the result will be radically different at different moments in time.
The obvious way to solve that is to repeat the experiment. In this case we try to maximise the ‘expected return’ (while keeping the risk metrics low), which means simulating a buy-hold-sell procedure many times (on 10% of dates, selected randomly) over the last 3.5 years.
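The repeated experiment can be sketched as follows. This is an illustration on synthetic prices, not the notebook’s code: the tickers AAA, BBB, CCC and the holdings are hypothetical, and the 3-day outcome is measured forward from each entry date.

```python
import numpy as np
import pandas as pd

# Synthetic daily close prices for three hypothetical tickers (NOT real data)
rng = np.random.default_rng(7)
dates = pd.bdate_range("2018-01-01", periods=880)  # roughly 3.5 years of trading days
prices = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0.0005, 0.015, (len(dates), 3)), axis=0)),
    index=dates, columns=["AAA", "BBB", "CCC"],
)

holdings = pd.Series({"AAA": 2, "BBB": 1, "CCC": 0})  # shares per ticker

# Daily portfolio value = sum over tickers of shares * price
value = prices.mul(holdings, axis=1).sum(axis=1)

# Outcome of buying on each date and selling 3 trading days later
return_3d = (value.shift(-3) / value).dropna()

# Repeat the experiment: enter on a random 10% of the dates
sampled = return_3d.sample(frac=0.1, random_state=42)
print(f"entries simulated: {len(sampled)}")
print(f"mean 3-day return: {sampled.mean() - 1:.2%}")
print(f"worst 3-day return: {sampled.min() - 1:.2%}")
```

Fixing random_state keeps the sample reproducible, so different portfolios can be compared on the same set of entry dates.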
Once you run the experiment you will see how the routine tests all stock combinations and finds thousands of different options (mine delivered 22569).
Let’s look at the price of the selected portfolio over time. It started slightly over $1600 in 2018 and now it is almost $3000. It was very profitable to own it over these 3.5 years (if you knew the exact combination of stocks to hold!), as it was growing by 20% per year on average.
In this case we want to have liquid assets and invest only for 3 days. So sometimes you could get a yearly return of 20% in 3 days (if you invested just after the crash in Apr-2020), but it could also be a devastating experience if you bought it just before Apr-2020.
You could experience different feelings if you held this portfolio during all 3.5 years: there were periods of growth and fall. You can observe below zero return_per_volatility at the end of 2018, end 2019, and second quarter of 2020 (COVID outbreak — big negative drop and soaring volatility reaching max values of 0.12, or 12% fluctuation in 3 days). All other months’ return_per_volatility was more than zero. In the first half of 2021 until July it reached 0.8, as it was a period of constant growth with very low volatility.
Now let’s imagine you hold the portfolio for a small period of time (3 days) and buy-hold-sell it in random periods (you can select specific days later using your trading strategy).
Let’s look at the distribution of the portfolio {‘SHOP’:0, ‘NVDA’:2, ‘CHTR’:1, ‘BA’:2, ‘NKE’:1, ‘BUD’:1, ‘XOM’:1, ‘PTR’:1} metrics:
The price indicator doesn’t give a lot of insight, as it misses the time dimension. What we can say is that the price fluctuated mostly between $1500 and $3200, so you would not expect more than 2x growth over the period of holding the portfolio. Of course, it can be more than that if you repeat the buy-hold-sell procedure several times and select the correct dates just before the rise.
Return_3d is roughly normally distributed around one, with a slight skew to the right (the portfolio was growing more often than falling, which resulted in a 2x increase after the 3.5 years).
Volatility_3d is quite low overall (0.02 to 0.04), but sometimes it spikes to 0.12.
Return_per_volatility_3d visually resembles the standard normal distribution with its peak (and mean) to the right of 0. It is important to see how far away it is. At a given moment it can be as big as 3, but in the long term all portfolios with this number consistently close to 1 are considered to be very good.
In order to replicate the nature of short-term trading we generated the descriptive statistics on 10% of randomly generated dates (the first code block), and then selected several values from the table that describe the performance the best (the second code block):
# 10% sampling: we invest not on all days, but only on 10% of days in the last 3.5 years
df_stats = df_portfolio_value_daily.sample(frac=0.1, random_state=42).describe()
df_stats

# Defined previously: SELECTED_PORTFOLIO = {'BA': 2, 'BUD': 1, 'NKE': 1, 'PTR': 1, 'SHOP': 0, 'XOM': 1, 'NVDA': 2, 'CHTR': 1}
# Get all major metrics for one portfolio:
print(f'Selected portfolio: {SELECTED_PORTFOLIO}')
print(f"Mean return in 3 days = {df_stats['return_3d']['mean']-1 :.2%}")
print(f"Volatility in 3 days = {df_stats['volatility_3d']['mean'] :.1%}")
print(f"Mean return_per_volatility in 3 days = {df_stats['return_per_volatility_3d']['mean'] :.3f}")
print(f"Median return_per_volatility in 3 days = {df_stats['return_per_volatility_3d']['50%'] :.3f}")
print(f"25% drawdown in 3 days = {df_stats['return_3d']['25%']-1 :.1%}")
print(f"Max drawdown in 3 days = {df_stats['return_3d']['min']-1 :.1%}")
All Portfolios Analysis
We’re done with the analysis of one portfolio and can extend it to all portfolios.
We will skip the detailed description of the recursive function that generates all possible combinations of stocks for a given amount ($2000), as it may be hard to understand for an inexperienced developer. But we highly recommend checking the code on Github (generate_all_portfolios()).
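For readers who want the general idea without opening the notebook, here is a minimal sketch of such an enumeration. It is not the author’s generate_all_portfolios(); the function name all_portfolios, the min_invested parameter, and the share prices are hypothetical.

```python
from typing import Dict, Iterator, List


def all_portfolios(prices: Dict[str, int], budget: int,
                   min_invested: int = 0) -> Iterator[Dict[str, int]]:
    """Yield every share-count combination costing between min_invested and budget."""
    tickers: List[str] = list(prices)

    def recurse(i: int, remaining: int, current: Dict[str, int]):
        if i == len(tickers):
            # All tickers decided: keep the portfolio if enough money is invested
            if budget - remaining >= min_invested:
                yield current
            return
        ticker = tickers[i]
        max_shares = remaining // prices[ticker]
        for n in range(max_shares + 1):
            # Buy n shares of this ticker, then decide on the remaining tickers
            yield from recurse(i + 1, remaining - n * prices[ticker],
                               {**current, ticker: n})

    yield from recurse(0, budget, {})


prices = {"AAA": 700, "BBB": 450, "CCC": 300}  # hypothetical share prices
portfolios = list(all_portfolios(prices, budget=2000, min_invested=1800))
print(len(portfolios), "portfolios, e.g.", portfolios[0])
```

The min_invested filter keeps only portfolios that use most of the budget. Each extra ticker multiplies the number of branches, which is why the search gets slow very quickly.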
Let’s imagine we have all combinations generated now (22569 portfolios). You can check the random examples of portfolios (under $2000, but close to it) in the picture below:
The Visual Story
In this section you will see the visual story of a scaled analysis for all possible portfolio combinations.
One portfolio on each scatterplot is exactly one point on the graph (so it can be up to 22 569 points visible). The vertical and horizontal dimensions represent one of the portfolio metrics, the colour is another metric. Sometimes we add opacity and size to highlight the most important points.
Correlation Matrix
The idea: find the most and the least correlated stocks, and check for negative correlations.
What we think: the selected stocks are a good set, because we can construct a highly diversified portfolio from them. For example, PTR is weakly (or negatively) correlated with many other stocks (the row PTR has all types of colours), while the (NVDA, NKE, CHTR, BUD) subset seems to be highly correlated in one direction (almost all values in the upper triangle are green).
Mean Return vs. Volatility
The idea: better returns are often connected with more volatility. An investor should take into consideration the return-per-volatility (or return-per-risk) metric on top of a simple max return strategy and try to limit the potential drawdown.
What we think: the expected return lies between -0.4% and 0.5%, and it is important to select a portfolio with a positive expected return. The same return can be obtained with many combinations of stocks (different points on the graph) whose volatility differs by up to a few percentage points (e.g. a 0.2% return comes with a volatility between 3.1% and 4.3%).
Mean Return vs. Mean Return Per Volatility
The idea: choose the best options by comparing returns (mean_return_3d_over_one) and return-per-volatility (mean_return_per_vol_3d).
What we think: the previous graph introduced the idea of volatility (or risk of a portfolio), but it was not directly actionable, as you want to optimise return-per-volatility rather than pure volatility. This graph shows that the points with positive returns (mean_return_3d_over_one > 0) are a minority, while the most interesting points are in the top right segment (mean_return_3d_over_one > 0.2%).
Another risk metric, “max_drawdown_3d”, shows the worst possible scenario (it ranges from -10% to -30% of value drop in 3 days).
The points of interest (mean_return_3d_over_one > 0.2%) also show the best return-per-risk (0.1 to 0.15). Luckily, the most profitable portfolios don’t show a big drawdown of 20–30%. They are just ‘green’ (which is a 10–15% max drawdown), but you should keep an eye on it.
So you can safely select one of the top 10–15 points in the upper right corner.
Max Drawdown vs. 25 Percent Drawdown
The idea: there can still be many options to select from the most profitable portfolios. You may want to look once again at the risk metrics profile.
What we think: this graph shows that the most profitable portfolios have a big max_drawdown risk, which is hard to mitigate (one way to battle against it is to set up a stop-loss order that automatically sells the position in a bad scenario). But the moderate-risk cases (25% worst cases) bring a loss between -2% and -1.2% in 3 days, which is more comparable with the expected return we want to achieve.
You may notice that the most profitable “green” portfolios (with a high mean_return_3d_over_one) lie on the same line of max_drawdown_3d (-12%..-13%), but can have from -1.2% to -2% of returns in the 25% of the worst cases. So you may want to select the rightmost ‘green’ portfolio, with approx. -1.5% of 25_percent_drawdown, just to improve the metrics in an ‘average’ case.
Scoring
We have five different metrics related to the quality of a portfolio (each in its own way):
mean_return_over_one_3d, mean_return_per_volatility_3d, median_return_per_volatility_3d, 25%_drawdown_in_3d, max_drawdown_in_3d.
How would you combine them if you want to select the best one among many portfolios?
We propose to define a total Score, which is a simple sum of all of the above.
The metrics have different ranges and min/max values, so it is not very accurate to sum them without any modification: that would create an imbalance in which one metric has a higher weight or range than another. We will apply a normalisation procedure, which moves their values to approximately the same interval (-4, 4), where a highly negative (bad) influence is -4 and a highly positive one is 4. This is the same idea as the standard normal random variable.
So let’s look at the distribution of metrics before and after on the graphs below. You will notice that the distributions didn’t change, but min/max values became very similar after the transformation:
You can quickly calculate that all scores for the sum of 5 metrics (ranging from -4 to 4 each) will generally lie in the range [-20, 20]. It is a very rare case when all metrics are highly positive for one portfolio, so you will need to find some trade-off between them.
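A minimal sketch of such a scoring, assuming the normalisation is a plain z-score (subtract the mean, divide by the standard deviation). The notebook may use a different transformation, and the metric values below are synthetic.

```python
import numpy as np
import pandas as pd

# Synthetic metrics for 1000 hypothetical portfolios (NOT the article's data)
rng = np.random.default_rng(1)
n = 1000
metrics = pd.DataFrame({
    "mean_return_over_one_3d": rng.normal(0.001, 0.002, n),
    "mean_return_per_volatility_3d": rng.normal(0.05, 0.04, n),
    "median_return_per_volatility_3d": rng.normal(0.08, 0.05, n),
    "25%_drawdown_in_3d": rng.normal(-0.016, 0.003, n),
    "max_drawdown_in_3d": rng.normal(-0.15, 0.04, n),
})

# ASSUMPTION: z-score each metric (subtract its mean, divide by its std).
# Drawdowns are stored as negative returns, so "higher is better" holds for every column.
normalised = (metrics - metrics.mean()) / metrics.std()

# Total score = simple sum of the five normalised metrics
score = normalised.sum(axis=1)
best = score.idxmax()
print(f"best portfolio index: {best}, score: {score[best]:.2f}")
print(metrics.loc[best])
```

After the z-score every column has mean 0 and standard deviation 1, so no single metric dominates the sum just because of its units.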
Here are two graphs on the optimal score portfolios (one is static covering all 23k portfolios, another is dynamic to see the details for the top portfolios):
You can see a screenshot below which shows the winning combination: {SHOP:1, BA:0, NVDA:1, NKE:1, BUD:0, XOM:0, PTR:1}. It earns a very high score, above 16, as it delivers an expected return of 0.45%, mean_return_per_vol = 0.156, median_return_per_vol = 0.22, max_drawdown_3d = -12.7%, and 25_percent_drawdown_3d = -1.9%.
If you trade this portfolio every 3 days and get the same return of 0.45% each time, the total yearly return (without transaction fees) can potentially reach 1.0045^(252/3) = 1.46, or a very solid 46% growth. Of course, the mean case doesn’t happen all the time, and the next year can be totally different from the previous 3 years. So you shouldn’t take the annual growth as given, but rather use it to compare different portfolio combinations.
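The compounding arithmetic above can be checked with a few lines. The helper name annualise is hypothetical, and 252 trading days per year is the convention used in the formula above.

```python
TRADING_DAYS_PER_YEAR = 252


def annualise(return_per_period: float, period_days: int) -> float:
    """Compound a per-period return over one trading year (no fees, no taxes)."""
    periods_per_year = TRADING_DAYS_PER_YEAR / period_days
    return (1 + return_per_period) ** periods_per_year - 1


yearly = annualise(0.0045, 3)
print(f"0.45% every 3 days compounds to {yearly:.0%} per year")
```

The same helper can be applied to the mean 3-day return of any other portfolio to compare candidates on a yearly scale.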
Conclusion
We looked at the Optimal Portfolio selection problem from the standpoint of an individual person, who is trying to select the best combination of stocks to buy for a long-term investment or a short-term trading.
The article covers different metrics of a portfolio performance, such as an expected return, return-per-volatility (or Sharpe ratio), and max_drawdown.
For a long-term investor it is quite easy, in most cases, to use the standard library and get the result in just over 10 lines of code. It is a much more complicated process to generate all possible combinations and write an optimisation routine if you have a non-standard goal (e.g. trading every 3 days).