Use Your Computer to Make Informed Decisions in Stock Trading: Practical Introduction — Part 9: Macroeconomic Indicators Affecting Stock Market

Baltimore Beacon, Ireland

[HINT] You can read my previous articles on Medium or on the website PythonInvest (sometimes it has more of the dynamic content). The full Python code (Colab notebooks) is on Github. An intro video covering the logic of the Colab code is on YouTube.

You may have heard terms like ‘rising inflation’, ‘decelerating growth’, ‘central bank interventions’, and ‘unemployment’ many times, but are not sure whether they matter for your investment process.
You’re not alone: not everyone took courses in Economics and understands the potential links, and even fewer can estimate the connection between macro indicators and the stock market.
In this article we’ll show how to download and process the most popular macroeconomic time series, find the correlation with major indexes growth, and build a naive explanatory system to identify the most important signals.

Introduction

A number of different aspects to stock investments inform the approaches that investors take: value (growth and dividends), technical indicators, daily vs. long-term, arbitrage, as well as many others.
“Bear” and “Bull” stock markets, which can often be mapped to periods of economic growth or contraction, add complexity to the strategies, as all the aforementioned approaches must respond to these basic cycles.

History teaches us that everything is interdependent, especially when something big and unexpected happens. One only has to look at any of the major financial events of the last 20 or so years to appreciate the extent to which these cycles affect investments. Any event can trigger a big chain reaction around the world, resulting in a recession (e.g. the oil price shock of 1973, the default on Russian government bond obligations in 1998, the Dot-Com bubble crisis in 2000, the US subprime mortgage crisis in 2008). Investors can anticipate a drawdown of 5% (3 times a year on average), and even 10% once a year, but not a 30% drop in one day. There have been at least 23 deep and long crashes in the last 35 years (source: “Stock market crashes and Bear Markets”), which in most cases caused structural market shifts and panic in financial markets.

It is, therefore, vital to have a good understanding of economic development, as well as potential changes in behaviour (confidence in the future, financial stress, expectations, etc.) of individual investors and institutions (ETF funds, Mutual funds, etc.) in order to react quickly to adverse scenarios when they arise.

Executive Summary

In this article we’ve examined more than 65 macro indicators (166 transformed time series) and found various correlations with stock market growth. It is important to understand that macroeconomics can’t accurately predict stock market dynamics by itself, but it can have a big impact in some (adverse) scenarios. Thus, you should look at factors like GDP growth, Inflation and Consumer Prices, Debt, Unemployment, Financial Stress and Market Volatility, and many others, and draw your conclusions before the news appears and the stock market reacts.

It is quite straightforward to get macro data with Python using Pandas Datareader, but some tricks are needed to transform and merge the data. Check out our Github page for the full implementation code (Part 9 “Macro Indicators vs. Stock Indexes Growth”).

Summary of Results

  1. There are a number of macro series that are widely used in conjunction with stock market analysis: GDP, CPI, interest rates, unemployment, consumer confidence, debt burden, etc.
  2. Most of the data sources are FREELY available on FRED and other databases. We’ve chosen a good set of 65 metrics from FRED, Nasdaq Data (formerly Quandl), and STOOQ to start with (not a comprehensive set).
  3. Several macro series are highly correlated with the CURRENT growth of the S&P 500 and DJI indexes. The most important ones: Gold Volatility Index, Financial Stress Index, Industrial Production, Shiller P/E ratio, etc.
  4. The list of macro stats most correlated with the FUTURE growth of the S&P 500 and DJI is different, and the correlation is weaker. The most important ones: Velocity of Money, Personal Saving Rate, US Dollar Index, Total Public Debt, 10- and 5-Year Breakeven Inflation Rate, etc.
  5. We estimated the marginal impact of each indicator using a Decision Tree model. Top 5 indicators: Velocity of Money (M1), Dividend Ratio, Volatility Index VIX, Corporate Profits After Tax, Non-cyclical Rate of Unemployment.

Types of Economic Factors and Their Potential Influence

To this end, here we list the fundamental macro factors that can potentially influence financial market performance. There is no unambiguous consensus on the direction of impact of each individual factor, but combined, they tend to play a big overall role.

Growth (link): when the economy is expanding, consumers are confident in the future and spend more, which leads to higher profits for producers and higher earnings and dividends at the end of the year. Companies can also raise capital more easily by issuing new stock, e.g. in IPOs, when there is strong demand from investors eager to buy new shares.

Prices and Inflation (link): when prices go up, the real purchasing power of money goes down. Consumers try to preserve their capital by investing in riskier assets to ‘cover’ the potential loss of value from inflation.

Money Supply (link): an increase in money supply leads to lower interest rates (as the Fed mostly ‘prints’ more money by buying T-bills) and makes stocks more attractive.

Interest Rates (link): interest rates are the cost someone pays for the use of someone else’s money. In a broad sense, they set the price at which banks lend money to other banks, as well as the rates on personal loans and mortgages. High interest rates also raise bond yields, so bonds start to earn more risk-free income and compete with stocks for investors’ money.

Unemployment: falling unemployment is a good sign in an expanding economy (businesses are growing fast and need to hire more people), while rising unemployment is a bad sign during a contraction. Some indicators, like weekly unemployment claims, can show early signs of big changes before the quarterly GDP stats are collected and published.

Income and Expenditure (link): when disposable income increases, households have more money to either save or spend, and spending naturally leads to growth in consumption. More consumption has a knock-on effect: the stronger the demand for goods, the more profitable companies are, the more jobs they create, the higher the wages, and the greater the potential income to invest.

Government Debt (link, link2): countries issue more debt to cover the yearly budget deficit (current US debt is >100% of its GDP) or finance unexpected “force majeure”-type scenarios (like Covid-19). “In the future, countries will be forced to pay debt either through raising taxes or by printing more money to pay for that debt, which could end up slowing growth or risking higher inflation. Both of those things can impact equity and bond markets,” [Rhea Thomas, senior economist at Wilmington Trust in Wilmington, Delaware].

FX Rates (e.g. USD/EUR): a strong national currency makes domestic production appealing to investors and may attract more foreign investors.

Alternative investments (prices of oil/gold/bitcoin/etc.): growing returns in alternative classes of investments can shift money away from stocks.

2020–2021 Macro Snapshot

Here, we will briefly describe the latest changes in the macro setup for 2020 and 2021, and try to apply the forces from the previous paragraph to an analysis of the current situation.

The world eagerly awaits (maybe too optimistically) the bounce-back from COVID-19 (49% of the global population has received at least one dose of a COVID-19 vaccine), hoping to get back on track with GDP growth (US GDP: from -9% in Q2'20 to +3% in Q2'21). During the pandemic, central banks (the Fed and the ECB) launched massive support programs (“Quantitative Easing (QE)”) by printing money to help businesses and people struggling with the lockdowns, while government debt rose from 107% to 125% of GDP. The high money supply is driving inflation expectations, and more retail investors are considering stocks to protect their capital from inflation.

The household savings rate rose from 10% to 30% during the first months of COVID-19 in 2020 and was back to 10% in Aug-2021 (probably the population is more relaxed now, with less risk from the deadly virus, hoping that governments have the situation under control). Market volatility is low, and the financial stress index is returning to its normal level. Weekly unemployment claims have dropped sharply to a minimum, showing a quick recovery from last year’s COVID-related peak. The S&P 500 and DJI indexes reached all-time highs this year in Feb’21 and end-Aug’21 (“Closing milestones of the S&P 500”), despite a massive GDP shock the previous year. There is a worrying uptrend in the Shiller P/E ratio, which has almost reached its previous peak from the 2000 “DotCom Bubble” (the ratio divides the S&P 500 price by the average inflation-adjusted earnings of the previous 10 years).

Getting the Macro Data

‘The St. Louis Fed’s “FRED” database is arguably the most amazing economics datasite on the internet.’ [Business Insider, 2013]

While the website contains more than 148,000 economic time series overall, we’ve focused on the most popular ones (in relation to the stock market), which are grouped into several categories: “Growth”, “Prices and Inflation”, “Money Supply”, “Interest Rates”, “Employment”, “Income and Expenditure”, “Government Debt”, “Other (Uncategorised) Indicators”.

Here is the full list of FRED indicators used:

# Macroeconomic indicators (mostly US) from the FRED database
# Detailed info on each indicator: https://fred.stlouisfed.org/series/<indicator_name>
# DOC with the metrics and external exploratory Colab: https://docs.google.com/document/d/1Cf4C3Xz4_yitlzPaLEknHoDlw7KMXey4c49kZ7ucQEE/edit?usp=sharing
FRED_INDICATORS = ['GDP', 'GDPC1', 'GDPPOT', 'NYGDPMKTPCDWLD',         # 1. Growth
                   'CPIAUCSL', 'CPILFESL', 'GDPDEF',                      # 2. Prices and Inflation
                   'M1SL', 'WM1NS', 'WM2NS', 'M1V', 'M2V', 'WALCL',        # 3. Money Supply
                   'DFF', 'DTB3', 'DGS5', 'DGS10', 'DGS30', 'T5YIE',       # 4. Interest Rates
                   'T10YIE', 'T5YIFR', 'TEDRATE', 'DPRIME',                # 4. Interest Rates
                   'UNRATE', 'NROU', 'CIVPART', 'EMRATIO',                 # 5. Employment
                   'UNEMPLOY', 'PAYEMS', 'MANEMP', 'ICSA', 'IC4WSA',       # 5. Employment
                   'CDSP', 'MDSP', 'FODSP', 'DSPIC96', 'PCE', 'PCEDG',     # 6. Income and Expenditure
                   'PSAVERT', 'DSPI', 'RSXFS',                             # 6. Income and Expenditure
                   'INDPRO', 'TCU', 'HOUST', 'GPDI', 'CP', 'STLFSI2',      # 7. Other indicators
                   'DCOILWTICO', 'DTWEXAFEGS', 'DTWEXBGS',                 # 7. Other indicators
                   'GFDEBTN', 'GFDEGDQ188S',                               # 8. Government debt
                   'DEXUSEU', 'GVZCLS', 'VIXCLS', 'DIVIDEND',              # 9. Additional indicators from IVAN
                   'MORTGAGE30US', 'SPCS20RSA']                            # 9. Additional indicators from IVAN

You can view the full description and historical values of any indicator on its corresponding Web-page by typing in the following address and putting the indicator name you are interested in at the end: https://fred.stlouisfed.org/series/<indicator_name>. E.g. for GDP, type the following link: https://fred.stlouisfed.org/series/GDP.

As not all the indicators were available on FRED, I went to another data source called Nasdaq Data (formerly QUANDL) to get a few more daily indicators, and to another one called STOOQ to get the historical S&P 500 (SPX) and Dow Jones Industrial Average (DJI) index values.

# Macro indicators from QUANDL (Nasdaq Data)
QUANDL_INDICATORS = {'BCHAIN/MKPRU', 'USTREASURY/YIELD', 'USTREASURY/REALYIELD',
                     'MULTPL/SHILLER_PE_RATIO_MONTH', 'LBMA/GOLD'}  # 9. Additional indicators from IVAN

# Stock market indexes (all indexes: https://stooq.com/t/)
STOOQ_INDICATORS = {'^DJI', '^SPX'}

We used those three databases to construct a dictionary of separate time series, which we then transformed to relative levels and joined together at a later stage to produce a unified dataset.
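A minimal sketch of how such a dictionary could be assembled, assuming pandas-datareader for the FRED downloads. The helpers fetch_from_fred, download_indicators and fake_fetch are hypothetical names, not the functions from the author's Colab. The fetcher is passed in as a parameter, so the Nasdaq Data and STOOQ sources could plug in their own download functions (pandas-datareader also has 'quandl' and 'stooq' readers, but check their current status before relying on them). The key clean-up mirrors the names printed below, e.g. 'BCHAIN/MKPRU' becomes 'BCHAIN_MKPRU' and '^SPX' becomes 'SPX'.

```python
import pandas as pd


def fetch_from_fred(name, start, end):
    """Download one FRED series (requires pandas-datareader and network access)."""
    import pandas_datareader.data as web  # lazy import: the rest runs without it
    return web.DataReader(name, "fred", start, end)


def download_indicators(names, start, end, fetch=fetch_from_fred):
    """Return {clean_name: DataFrame} with one entry per indicator.

    'BCHAIN/MKPRU' -> 'BCHAIN_MKPRU' and '^SPX' -> 'SPX', matching the key
    names used in the rest of the article.
    """
    series = {}
    for name in names:
        key = name.replace("/", "_").lstrip("^")
        series[key] = fetch(name, start, end)
    return series


# Offline example with a stub fetcher that returns toy data instead of real data
def fake_fetch(name, start, end):
    idx = pd.date_range(start, end, freq="D")
    return pd.DataFrame({name: range(len(idx))}, index=idx)


toy = download_indicators(["GDP", "BCHAIN/MKPRU", "^SPX"],
                          "2021-01-01", "2021-01-05", fetch=fake_fetch)
print(list(toy.keys()))  # ['GDP', 'BCHAIN_MKPRU', 'SPX']
```

With network access, download_indicators(FRED_INDICATORS, '2000-01-01', '2021-09-01') should return one DataFrame per FRED series, each still on its own native frequency.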

Here is the result for the 65 downloaded features (a simple concatenation of the 3 lists above):

for i, value in enumerate(macro_indicators.keys()):
    if i % 6 == 0:
        print('\n')
    print(value, end=", ")
# OUTPUT:
# GDP, GDPC1, GDPPOT, NYGDPMKTPCDWLD, CPIAUCSL, CPILFESL,
# GDPDEF, M1SL, WM1NS, WM2NS, M1V, M2V,
# WALCL, DFF, DTB3, DGS5, DGS10, DGS30,
# T5YIE, T10YIE, T5YIFR, TEDRATE, DPRIME, UNRATE,
# NROU, CIVPART, EMRATIO, UNEMPLOY, PAYEMS, MANEMP,
# ICSA, IC4WSA, CDSP, MDSP, FODSP, DSPIC96,
# PCE, PCEDG, PSAVERT, DSPI, RSXFS, INDPRO,
# TCU, HOUST, GPDI, CP, STLFSI2, DCOILWTICO,
# DTWEXAFEGS, DTWEXBGS, GFDEBTN, GFDEGDQ188S, DEXUSEU, GVZCLS,
# VIXCLS, DIVIDEND, MORTGAGE30US, SPCS20RSA, BCHAIN_MKPRU, USTREASURY_YIELD,
# MULTPL_SHILLER_PE_RATIO_MONTH, USTREASURY_REALYIELD, LBMA_GOLD, SPX, DJI,

Data Transformations

There is one major pain point for many of the ‘always growing’ factors like GDP ($b), SPX (points): we can’t include them unchanged in the dataframe.

Unlike ‘stationary’ indicators like Mortgage rates (usually, 1–5% rate) or Savings rates (usually, 10%-30% of the total household income), other indicators may grow higher and higher for years to come (non-stationary time series).

If you add those factors without any transformation, you will get bad results during the prediction phase, as any model will learn on the small levels of the first years and won’t know what to predict when the input data is much higher (because it never saw such high levels during training).

That’s why the ‘growth’ transformations are introduced for all non-stationary time series. These are Day-on-Day (DoD), Week-on-Week (WoW), Month-on-Month (MoM), Quarter-on-Quarter (QoQ), Year-on-Year (YoY) growth rates in percentage.

Check out the Colab function get_macro_shift_transformation(macro_indicators_dict) for more details.

Here are several examples (the original ‘non-stationary’ indicators are removed from the dataset in most cases):

Fig.1–1 Monthly indicator CPILFESL after transformations
Fig.1–2 Quarterly indicator GDP after transformations
Fig.1–3 Weekly indicator ICSA after transformations
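The get_macro_shift_transformation function itself lives in the Colab notebook and is not reproduced here. As a rough sketch of the same idea, the hypothetical helper add_growth_columns below adds percentage-growth columns to a single series. It assumes the series has a regular frequency without gaps, so that a lookback of n rows equals n periods (1 row = MoM and 12 rows = YoY for a monthly series); the suffixes follow the naming seen in the dataset (e.g. CPIAUCSL_mom).

```python
import pandas as pd


def add_growth_columns(series: pd.Series, periods: dict) -> pd.DataFrame:
    """Percentage growth of `series` over several lookbacks.

    `periods` maps a column suffix (e.g. 'mom', 'yoy') to a lookback in rows.
    Assumes a regular frequency, so n rows back == n periods back.
    """
    out = pd.DataFrame(index=series.index)
    for suffix, n in periods.items():
        # (x_t / x_{t-n} - 1) * 100; the first n rows are NaN (no earlier value)
        out[f"{series.name}_{suffix}"] = (series / series.shift(n) - 1) * 100
    return out


# Toy monthly index growing 10% per month (not real CPI values)
cpi = pd.Series([100.0, 110.0, 121.0, 133.1],
                index=pd.date_range("2021-01-01", periods=4, freq="MS"),
                name="CPIAUCSL")
res = add_growth_columns(cpi, {"mom": 1, "qoq": 3})
print(res)
```

Computing the ratio with shift instead of pct_change keeps NaN handling explicit: gaps stay NaN rather than being silently padded.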

Joining All Together Into a Single Dataframe

The next problem you’re likely to face is how to join all the time series into one dataframe. Here are the potential difficulties you need to take into account:

  • The first subset of series is updated daily from Monday to Friday (no stats on weekends or when the stock market is closed). The second is updated weekly on Wednesday or Sunday (or another day of the week), the third arrives monthly, and the fourth is updated every quarter or even yearly.
  • The data are not published exactly when the period is over, but only after several days, weeks, or even months (you may want to forward-fill the missing values with the latest available value).
  • Some periods are indexed by the period-end (e.g. weekly stats from FRED), and others by the period-start (e.g. monthly stats from FRED).

Here is what you can do to overcome all of these problems:

  • Start with all records of the daily data (e.g. SPX and DJI indexes, Gold prices, most of the interest rates, etc.) and join them together on Date.
  • Add the weekly, monthly, quarterly, and yearly series to the daily dataset from the step above by broadcasting the latest available value to all days until it is refreshed.

You can find more details in the function get_daily_macro_stats_df(daily_df, macro_ind_df, regime='LAST'), which ‘intelligently’ aligns time series of every frequency to a daily basis. The ‘merge’ code that follows it stacks all sets of series into a single dataframe.
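A simplified sketch of the ‘broadcast the latest available value’ step, using pandas.merge_asof (a swapped-in technique, not necessarily what the notebook does; align_to_daily is a hypothetical name). For every trading day, it takes the most recent lower-frequency observation dated on or before that day. Note that it keys on the observation date, not the publication date: with period-start dates (FRED monthly data) and reporting lags, it can attach values that were not yet public on that day, so a real pipeline may want to shift each series by its release lag first.

```python
import pandas as pd


def align_to_daily(daily_df: pd.DataFrame, low_freq_df: pd.DataFrame) -> pd.DataFrame:
    """Attach the latest available weekly/monthly/quarterly values to each daily row.

    Both frames need a DatetimeIndex. For every daily date, merge_asof picks the
    last low-frequency row whose date is <= that day (direction='backward'),
    which carries each value forward until it is refreshed.
    """
    return pd.merge_asof(
        daily_df.sort_index(),
        low_freq_df.sort_index(),
        left_index=True,
        right_index=True,
        direction="backward",
    )


# Toy values: a daily index level and a monthly rate dated at period-start
daily = pd.DataFrame(
    {"SPX": [1.0, 2.0, 3.0, 4.0]},
    index=pd.to_datetime(["2021-01-29", "2021-02-01", "2021-02-02", "2021-03-01"]),
)
monthly = pd.DataFrame(
    {"UNRATE": [5.0, 6.0, 7.0]},
    index=pd.to_datetime(["2021-01-01", "2021-02-01", "2021-03-01"]),
)
out = align_to_daily(daily, monthly)
print(out)
```

Compared with a plain join followed by ffill, merge_asof never adds rows for non-trading days, so the result keeps exactly the daily index.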

You will end up with about 2.5x more indicators (166 in total), all of the new ones being growth transformations of the existing series:

i = 1
for value in macro_df.keys():
    if 'future' not in value:
        print(value, end=", ")
        if i % 8 == 0:
            print('\n')
        i += 1
# OUTPUT:
# WM1NS_wow, WM1NS_mom, WM2NS_wow, WM2NS_mom, WALCL_wow, WALCL_mom, DFF, DTB3,
# DGS5, DGS10, DGS30, T5YIE, T10YIE, T5YIFR, TEDRATE, DPRIME,
# ICSA_wow, ICSA_mom, IC4WSA_wow, IC4WSA_mom, STLFSI2, STLFSI2_wow, STLFSI2_mom, DCOILWTICO,
# DCOILWTICO_growth_1d, DCOILWTICO_growth_3d, DCOILWTICO_growth_7d, DCOILWTICO_growth_30d, DCOILWTICO_growth_90d, DCOILWTICO_growth_365d, DTWEXAFEGS, DTWEXBGS,
# DEXUSEU, GVZCLS, VIXCLS, MORTGAGE30US, MORTGAGE30US_wow, MORTGAGE30US_mom, BCHAIN_MKPRU, BCHAIN_MKPRU_growth_1d,
# BCHAIN_MKPRU_growth_3d, BCHAIN_MKPRU_growth_7d, BCHAIN_MKPRU_growth_30d, BCHAIN_MKPRU_growth_90d, BCHAIN_MKPRU_growth_365d, LBMA_GOLD, LBMA_GOLD_growth_1d, LBMA_GOLD_growth_3d,
# LBMA_GOLD_growth_7d, LBMA_GOLD_growth_30d, LBMA_GOLD_growth_90d, LBMA_GOLD_growth_365d, SPX, SPX_growth_1d, SPX_growth_3d, SPX_growth_7d,
# SPX_growth_30d, SPX_growth_90d, SPX_growth_365d, DJI, DJI_growth_1d, DJI_growth_3d, DJI_growth_7d, DJI_growth_30d,
# DJI_growth_90d, DJI_growth_365d, GDP_qoq, GDP_yoy, GDPC1_qoq, GDPC1_yoy, GDPPOT_qoq, GDPPOT_yoy,
# NYGDPMKTPCDWLD_yoy, CPIAUCSL_mom, CPIAUCSL_yoy, CPILFESL_mom, CPILFESL_yoy, GDPDEF, GDPDEF_qoq, GDPDEF_yoy,
# M1SL_mom, M1SL_yoy, M1V, M1V_qoq, M1V_yoy, M2V, M2V_qoq, M2V_yoy,
# UNRATE, UNRATE_mom, UNRATE_yoy, NROU, NROU_qoq, NROU_yoy, CIVPART, CIVPART_mom,
# CIVPART_yoy, EMRATIO, EMRATIO_mom, EMRATIO_yoy, UNEMPLOY_mom, UNEMPLOY_yoy, PAYEMS_mom, PAYEMS_yoy,
# MANEMP_mom, MANEMP_yoy, CDSP, CDSP_qoq, CDSP_yoy, MDSP, MDSP_qoq, MDSP_yoy,
# FODSP, FODSP_qoq, FODSP_yoy, DSPIC96_mom, DSPIC96_yoy, PCE_mom, PCE_yoy, PCEDG_mom,
# PCEDG_yoy, PSAVERT, PSAVERT_mom, PSAVERT_yoy, DSPI_mom, DSPI_yoy, RSXFS_mom, RSXFS_yoy,
# INDPRO, INDPRO_mom, INDPRO_yoy, TCU, TCU_mom, TCU_yoy, HOUST_mom, HOUST_yoy,
# GPDI_qoq, GPDI_yoy, div_ratio, CP_qoq, CP_yoy, GFDEBTN_qoq, GFDEBTN_yoy, GFDEGDQ188S,
# GFDEGDQ188S_qoq, GFDEGDQ188S_yoy, DIVIDEND_qoq, DIVIDEND_yoy, SPCS20RSA, SPCS20RSA_mom, SPCS20RSA_yoy, MULTPL_SHILLER_PE_RATIO_MONTH,
# MULTPL_SHILLER_PE_RATIO_MONTH_mom, MULTPL_SHILLER_PE_RATIO_MONTH_yoy,

Correlation Analysis

Now we’re ready to move on to the exciting stuff.
As mentioned in the Introduction, (macro)economics and financial markets are interdependent, but it’s not clear how tightly, or which time series are the most correlated. Here are the top results for the S&P 500 365-day growth:

Fig.2–1 Top correlated features with SNP&500 365d growth

Most negatively correlated factors: GVZCLS (CBOE Gold ETF Volatility Index), STLFSI2 (St. Louis Fed Financial Stress Index), FODSP (Household Financial Obligations as a Percent of Disposable Personal Income), VIXCLS (CBOE Volatility Index VIX), CDSP (Consumer Debt Service Payments as a Percent of Disposable Personal Income).

Most positively correlated factors: INDPRO_yoy (Industrial Production YoY growth), MANEMP_yoy (All Employees in Manufacturing YoY), MULTPL_SHILLER_PE_RATIO_MONTH_yoy (Shiller PE Ratio YoY growth), DJI_growth_365d (Dow Jones Industrial Average 365d growth).
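The charts come from the Colab notebook; here is a minimal sketch of how such a ranking could be computed with pandas, assuming a dataframe of numeric columns that includes the target (e.g. SPX_growth_365d). top_correlations is a hypothetical helper. Ranking by absolute value lets strong negative factors (like GVZCLS) and strong positive ones (like INDPRO_yoy) compete in one list.

```python
import pandas as pd


def top_correlations(df: pd.DataFrame, target: str, n: int = 5) -> pd.Series:
    """Pearson correlation of every column with `target`, strongest |corr| first."""
    features = df.drop(columns=[target])
    corr = features.corrwith(df[target])  # one coefficient per feature column
    order = corr.abs().sort_values(ascending=False, kind="stable").index
    return corr.loc[order].head(n)  # signed values, ordered by magnitude


# Toy data: 'A' moves with the target, 'B' against it, 'C' only loosely
toy_df = pd.DataFrame({
    "SPX_growth_365d": [1.0, 2.0, 3.0, 4.0, 5.0],
    "A": [2.0, 4.0, 6.0, 8.0, 10.0],
    "B": [5.0, 4.0, 3.0, 2.0, 1.0],
    "C": [1.0, 3.0, 2.0, 5.0, 4.0],
})
top = top_correlations(toy_df, "SPX_growth_365d", n=3)
print(top)
```

Keep in mind that overlapping 365-day growth windows on daily data are strongly autocorrelated, so the effective number of independent observations behind each coefficient is much smaller than the row count.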

You can find similar patterns for top correlated factors with DJI 365days growth (DJI is another popular stock market index):

Fig.2–2 Top correlated features with DJI 365d growth

There is only one new top negative factor: UNEMPLOY_yoy (Unemployment Level YoY growth).
You can also check the same correlation stats for the S&P 500 90d and 30d growth in the Colab notebook.

Result (correlations with the CURRENT growth of an index): as you can see from the numbers above, some factors are strongly correlated with the growth of the stock market indexes (SPX and DJI), with coefficients from about -0.6 to -0.4 on the negative side and from about +0.6 to +0.75 on the positive side. Unfortunately, we can’t easily retrieve the marginal impact of each indicator to find the most significant ones and those causing a ‘chain’ reaction for other indicators to follow. The good news is that the top correlated factors are mostly the same for SPX growth (30d, 90d, or 365d) and DJI_growth_365d, which suggests the results are reasonably robust.

All of this is quite appealing, but not very helpful if you want to predict what will happen in the future with the stock market by knowing all recent changes in the macroeconomic indicators. The cause of the problem is the ‘simultaneous’ correlation values we were looking at (current macro stats correlated against the current growth of an index, and not forward-looking growth).

One step forward is to start looking for the correlation of the same macro features with the changes in stock market indexes in the future (link to Github — check the section “4) Correlation Analysis”).

But before doing that, let’s exclude all *_future_growth_* indicators from the feature list (the resulting list of column names is macro_df_no_future_ind): the future indicators are mostly correlated with each other, while we want to predict one future indicator (the label) from the current values:

# Future growth indicators are mostly correlated with each other
future_ind = []
for ind in macro_df.keys():
    if 'future' in ind:
        future_ind.append(ind)
print(future_ind)
# OUTPUT:
# ['SPX_future_growth_1d', 'SPX_future_growth_3d', 'SPX_future_growth_7d', 'SPX_future_growth_30d', 'SPX_future_growth_90d', 'SPX_future_growth_365d', 'DJI_future_growth_1d', 'DJI_future_growth_3d', 'DJI_future_growth_7d', 'DJI_future_growth_30d', 'DJI_future_growth_90d', 'DJI_future_growth_365d']

# include all features first
macro_df_no_future_ind = macro_df.keys()
# do not use future_ind in the list to find correlations with the label (which is a future indicator)
macro_df_no_future_ind = macro_df_no_future_ind.drop(future_ind)
Fig.2–3 Top correlated features with FUTURE SNP&500 365d growth
Fig.2–4 Top correlated features with FUTURE SNP&500 90d growth
Fig.2–5 Top correlated features with FUTURE SNP&500 30d growth

Result (correlations with the FUTURE growth of an index): the top correlated indicators common to two or three of the experiments are: M1V (Velocity of Money), T10YIE and T5YIE (10- and 5-Year Breakeven Inflation Rate), PSAVERT (Personal Saving Rate), DTWEXAFEGS (Nominal Advanced Foreign Economies U.S. Dollar Index).

Another interesting fact is that the general level of top correlations goes down (from ±0.4 for 365 days to ±0.15 for 30 days) when we try to predict growth over a shorter horizon.
This makes sense, as stock markets are chaotic and volatile in the short term and very hard to predict. Also, you shouldn’t expect an immediate impact from the macro inputs (their influence builds up over time).

Overall, the correlations of macro factors with forward-looking data (the growth of a stock index in the future) are weaker (|corr| between 0.15 and 0.45) than the simultaneous correlations (|corr| between 0.5 and 0.7).
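A minimal sketch of the forward-looking set-up, assuming the horizon is counted in rows of the daily dataframe (i.e. trading days), which may differ from how the notebook defines SPX_future_growth_90d. future_growth and corr_with_future are hypothetical helpers. shift(-horizon) moves future prices back onto today's row, so today's features are compared with what happened next; the last horizon rows have no label yet and are skipped by the correlation.

```python
import pandas as pd


def future_growth(price: pd.Series, horizon: int) -> pd.Series:
    """Growth in % from today's value to the value `horizon` rows ahead.

    shift(-horizon) pulls the future value back onto today's row, so the
    last `horizon` rows are NaN: their future is not known yet.
    """
    return (price.shift(-horizon) / price - 1) * 100


def corr_with_future(features: pd.DataFrame, price: pd.Series, horizon: int) -> pd.Series:
    """Correlation of today's features with the index growth over the next `horizon` rows."""
    label = future_growth(price, horizon)
    # corrwith aligns on the index; pairs with a NaN label are skipped
    return features.corrwith(label).sort_values()


# Toy example: 'X' happens to move in step with next-row growth, 'Y' the opposite
price = pd.Series([100.0, 200.0, 300.0, 600.0, 600.0])
features = pd.DataFrame({"X": [2.0, 1.0, 2.0, 0.0, 5.0],
                         "Y": [-2.0, -1.0, -2.0, 0.0, -5.0]})
fg = future_growth(price, 1)
result = corr_with_future(features, price, 1)
print(fg.tolist())
print(result)
```

Building the label this way makes the look-ahead explicit: the label is the only column allowed to contain future information, which is why the future_growth columns are kept out of the feature list.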

Decision Trees for Features Importance

The previous results gave us a head start, but they may not be good enough. The reason is that the features can be correlated with each other, so an analyst can’t identify what causes the change in an index in the first place. Thus, we’ll find the marginal impact of each individual indicator and rank the indicators accordingly.
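To see why marginal importance can differ from plain correlation, here is a toy experiment on synthetic data (not the macro dataset): 'driver' actually determines the target, 'echo' is only a noisy copy of the driver, and 'noise' is unrelated. Both driver and echo correlate strongly with the target, but a scikit-learn DecisionTreeRegressor should put nearly all of its impurity-based importance on driver. Keep in mind that impurity-based importances are computed on the training data and can be biased toward features with many distinct values, so they describe the fitted tree rather than causality.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 1_000
driver = rng.normal(size=n)                # the factor that actually moves the target
echo = driver + rng.normal(size=n)         # correlated with driver, no effect of its own
noise = rng.normal(size=n)                 # unrelated feature
y = 2 * driver + 0.1 * rng.normal(size=n)  # synthetic stand-in for "future growth"

X = pd.DataFrame({"driver": driver, "echo": echo, "noise": noise})

# Plain correlation flags both 'driver' and 'echo' as related to y ...
correlations = X.corrwith(pd.Series(y))
# ... while the tree's impurity-based importance concentrates on 'driver'
tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)
importance = pd.Series(tree.feature_importances_, index=X.columns).sort_values(ascending=False)

print(correlations.round(2))
print(importance.round(2))
```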

A Decision Tree (DecisionTreeRegressor from the Scikit-learn library) is one of the most convenient models to predict the future growth of the S&P 500 (90 days) and to get the features’ importance.

The model is not very precise (check the code and picture below), as it is nearly impossible to predict future stock market movements well only by looking at macro indicators (they have limited explanatory power). So any conclusions about the importance/ranking of the features should be taken with a grain of salt.

Anyway, it’s better to have something and hope that those relations and sorted order will persist in the ‘full’ model that has all relevant features.

Here is the sample code and the graph for actual vs. predicted values:

# imports
from collections import OrderedDict
from sklearn.tree import DecisionTreeRegressor
from matplotlib import pyplot

# all features should be numeric
for key in macro_df.keys():
    macro_df[key] = macro_df[key].astype(float)

# include all features
X_keys = macro_df.keys()
# do not use future ind to predict
X_keys = X_keys.drop(future_ind)

# deep copy of the dataframe not to change the original df;
# rows whose 90-day future growth is not known yet (the most recent ~90 days) are dropped,
# otherwise fillna(0) below would turn them into fake "0% growth" labels
macro_copy = macro_df.dropna(subset=['SPX_future_growth_90d']).copy(deep=True)
macro_copy.fillna(0, inplace=True)
# macro_copy.dropna(inplace=True)

# get all features in X and the dependent variable in y
X = macro_copy[X_keys]
y = macro_copy['SPX_future_growth_90d']

# returns a list of (feature, importance) pairs, sorted by importance (ascending)
def get_importance_features(model):
    importance = model.feature_importances_
    feat_imp = OrderedDict()
    # summarize feature importance
    for i, v in enumerate(importance):
        feat_imp[X.keys()[i]] = v
    # https://stackoverflow.com/questions/613183/how-do-i-sort-a-dictionary-by-value
    sorted_feat_imp = sorted(feat_imp.items(), key=lambda kv: kv[1])
    return sorted_feat_imp

# init the class and fit the model
decision_tree_model = DecisionTreeRegressor()
decision_tree_model.fit(X, y)
decision_feat_imp = get_importance_features(decision_tree_model)
Fig.3–1 Actual (blue) vs. Predicted (orange) graph of SNP500 future 90-days growth for the DecisionTree model

Here are the 10 most important features when predicting the 90-day future growth of the S&P 500:

Fig.3–2 Top 10 macro stats by their (marginal) importance (bigger coef. — better)

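As you can see, the most important indicators look familiar from the correlation analysis, but some are new. They may not be strongly correlated with index growth directly, but they have a sizable marginal impact. The top 5 are Velocity of Money (M1), Dividend Ratio, Volatility Index VIX, Corporate Profits After Tax, and Non-cyclical Rate of Unemployment.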

Conclusion

In this article we walked through the whole process of getting macroeconomic factors and embedding them into stock market analysis. We started by identifying the most important (and widely acknowledged) macro series and explained the potential ways they influence the stock market. We continued by using Python’s Pandas Datareader to get the data as a set of time series, generated derived statistics, and merged everything into one dataframe.
Finally, we conducted a correlation analysis and built a Decision Tree to select the most powerful indicators and sort them by the magnitude of influence.

Data and Product Analyst at Google. I run a website PythonInvest.com on Python programming, analytics, and stock investing.

Ivan Brigida
