Use Your Computer to Make Informed Decisions in Stock Trading: Practical Introduction — Part 5: Developing a Short Term Investment Strategy Based on Earnings-Per-Share (EPS) Data

Abandoned island Inishmurray, Ireland


  • Do all companies try to report very close to estimates? What is big enough Surprise(%) to cause a shock in a short-time stock prices?
  • If the spike caused a rapid change in the stock price, does it remain for a several days and can we predict the direction of its movement (so that one can use the knowledge to make a short term investments)?
  • Does consistent reporting of a slight positive surprise bring to long-term stocks growth?
  • Does a gradual increase in EPS increase the valuation of a company?

Preparation Steps

import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np

Functions For Scraping

  • Takes URL and URL_PARAMs as input arguments
  • Finds one existing table in the HTML table
  • Saves the table column manes (there is one hack for Most-active page, as one column is not visible)
  • Retrieves the values row-by-row
  • Returns a dataframe with all fields as an object
# Example of a full URL with one stock Symbol
# url = “"
def get_scraped_yahoo_finance_page(url, url_params):
# url = “"
# url_params = {‘symbol’:ticker}
r = requests.get(url, params=url_params)
soup = BeautifulSoup(r.text)
table = soup.find_all(‘table’)
# DEBUG: Just 1 table found which is good
print(‘We\’ve found tables:’, len(table))
if len(table)!=1:
return None
# Get all column names
if len(soup.table.find_all(‘thead’))==0:
return None
spans = soup.table.thead.find_all(‘span’)
columns = []
for span in spans:
# Hack: due to some reason one of the columns are not <span> tag for calendar/earnings and is not discovered in the columns list.
# we manually add this column
if url.find(‘') != -1 :
columns.insert(len(columns)-2,”Market Cap”)
rows = soup.table.tbody.find_all(‘tr’)
# read row by row
stocks_df = pd.DataFrame(columns=columns)
for row in rows:
elems = row.find_all(‘td’)
dict_to_add = {}
for i,elem in enumerate(elems):
dict_to_add[columns[i]] = elem.text
stocks_df = stocks_df.append(dict_to_add, ignore_index=True)
return stocks_df
# The only record per stock appears with this pattern : the next earnings date
def get_next_earnings_records(stocks_df):
filter1 = stocks_df[‘EPS Estimate’]!=’-’
filter2 = stocks_df[‘Surprise(%)’]==’-’
filter3 = stocks_df[‘Reported EPS’]==’-’
rez_df = stocks_df[filter1 & filter2 & filter3]
return rez_df
# Remove all records with not filled stats and cast to float values
def clean_earnings_history_df(stocks_df):
filter1 = stocks_df[‘EPS Estimate’]!=’-’
filter2 = stocks_df[‘Surprise(%)’]!=’-’
filter3 = stocks_df[‘Reported EPS’]!=’-’
stocks_df_noMissing = stocks_df[filter1 & filter2 & filter3]
stocks_df_noMissing[‘EPS Estimate’] = stocks_df_noMissing[‘EPS Estimate’].astype(float)
stocks_df_noMissing[‘Reported EPS’] = stocks_df_noMissing[‘Reported EPS’].astype(float)
stocks_df_noMissing[‘Surprise(%)’] = stocks_df_noMissing[‘Surprise(%)’].astype(float)
return stocks_df_noMissing
ticker = 'F'f_df = get_scraped_yahoo_finance_page(url = "",url_params = {'symbol':ticker})f_df_clean = clean_earnings_history_df(f_df)
Fig. 1 Ford company (F) last 5 EPS results

Getting the Top 200 Traded Stocks

num_stocks = 200
most_active_stocks = get_scraped_yahoo_finance_page(url = “",url_params = {‘count’:num_stocks})
Fig. 2 Top 10 stocks for the last trading (20 Nov. 2020)
Fig. 3 Scraped dataframe fields types
import rePOWERS = {‘T’: 10 ** 12,’B’: 10 ** 9, ‘M’: 10 ** 6, ‘%’: 0.01, ‘1’:1}“””Read a string (with M/B/T/% values)
Return a correct numeric value
def convert_str_to_num(num_str):
match =”([0–9\.-]+)(M|B|T|%)?”, num_str)
if match is None:
return None
quantity =
if is None:
magnitude = ‘1’ # no letter in the end->don’t change the value
magnitude =
# print(quantity, magnitude)
return float(quantity) * POWERS[magnitude]
columns_to_apply = [ ‘Price (Intraday)’, ‘Change’, ‘% Change’, ‘Volume’, ‘Avg Vol (3 month)’, ‘Market Cap’,
‘PE Ratio (TTM)’]
for col in columns_to_apply:
most_active_stocks[col] = most_active_stocks[col].apply(convert_str_to_num)
most_active_stocks[“log_market_cap”] = np.log10(most_active_stocks[“Market Cap”])
Fig.4 Most traded stocks are large on Market Cap: $B (10⁹) to $T (10¹²)
most_active_stocks[“log_market_cap_binned”] = pd.qcut(most_active_stocks.log_market_cap,3)
Fig.5 Three equal size clusters of the most traded stocks
Fig6. Average values for 3 clusters of stocks by market cap
def remove_outliers(df, column_name, quantile_threshold):
q_low = df[column_name].quantile(quantile_threshold)
q_hi = df[column_name].quantile(1-quantile_threshold)
rez = df[(df[column_name] < q_hi) & (df[column_name] > q_low)]
return rez
tmp = remove_outliers(df = most_active_stocks, column_name = “% Change”,quantile_threshold = 0.02)
tmp[“% Change”].hist(bins=100)
Fig.7 One day %Change happen to be mostly between -5% to 5% (can differ for other trading days)
tmp = remove_outliers(df = most_active_stocks, column_name = “Change”,quantile_threshold = 0.02)# The difference vs. previous graph: we draw the abs. daily change here vs. relative change in % in the previous graph
Fig.8 Abs. daily Change is concentrated around 0 in normal trading days
# We want to get some idea of the stock was traded today vs. 3-month average volumemost_active_stocks[“relative_volume”] = most_active_stocks[“Volume”]/most_active_stocks[“Avg Vol (3 month)”]
import seaborn as sns
data = most_active_stocks, x=”% Change”, y=”relative_volume”, hue=”log_market_cap_binned”, fill=True,
Fig.9 One day % Change vs. relative_volume (1 day volume/3-months avg. volume)
data = most_active_stocks, x=”% Change”, hue=”log_market_cap_binned”,
fill=True, common_norm=False,
# palette=”rocket”,
Fig.10 Histogram of 1-day %Change for three classes of stocks

Getting a Dataframe with All EPS Historical for the Most Traded Stocks

Fig.11 200 most active stocks for Friday, 20 Nov. 2020
NEW_TICKERS = [x for x in ADDITIONAL_TICKERS if x not in set(most_active_stocks.Symbol)]TICKERS_LIST = most_active_stocks.Symbol.append(pd.Series(NEW_TICKERS))
from random import randint
from time import sleep
# Empty dataframe
all_tickers_info = pd.DataFrame({‘A’ : []})
for i,ticker in enumerate(TICKERS_LIST):
current_ticker_info = get_scraped_yahoo_finance_page(url = “",url_params = {‘symbol’:ticker})
print(f’Finished with ticker {ticker}, record no {i}’)
if all_tickers_info.empty:
all_tickers_info = current_ticker_info
all_tickers_info = pd.concat([all_tickers_info,current_ticker_info], ignore_index=True)
# Random sleep 1–3 sec
next_earnings_dates = get_next_earnings_records(all_tickers_info)
next_earnings_dates.sort_values(by=’Earnings Date’).tail(30)
Fig.12 Next reporting dates are only in January 2021
all_tickers_info_clean = clean_earnings_history_df(all_tickers_info)
Fig.13 All tickers all EPS history
Fig.14 Aggregate stats for EPS
Fig.15 Actual EPS figures are close to the estimates during the whole history
all_tickers_info_clean[[‘EPS Estimate’,’Reported EPS’]]
Fig. 16 EPS Estimate/Reported EPS have many outliers in both positive and negative directions
import scipy
# calculate z-scores of `df`
z_scores = scipy.stats.zscore(all_tickers_info_clean[[‘EPS Estimate’,’Reported EPS’]])
abs_z_scores = np.abs(z_scores)
filtered_entries = (abs_z_scores < 2).all(axis=1)
new_df = all_tickers_info_clean[[‘EPS Estimate’,’Reported EPS’]][filtered_entries]
Fig.17 Box Plot with removed outliers
from datetime import datetime 
from datetime import timedelta
all_tickers_info_clean[‘Earnings Date 2’] = all_tickers_info_clean[‘Earnings Date’].apply(lambda x:datetime.strptime(x[:-3], ‘%b %d, %Y, %H %p’) )
all_tickers_info_clean[‘PK’] = all_tickers_info_clean.Symbol + “|”+ all_tickers_info_clean[“Earnings Date 2”].apply(lambda x : x.strftime(‘%Y-%m-%d’))
Fig.18 Historical EPS for top traded stocks and the Primary Key (PK)

Getting All Available History of Stock Prices for the Selected Tickers

!pip install yfinance
import yfinance as yf
# Start from an empty dataframe
df_stocks_prices = pd.DataFrame({‘A’ : []})
# Download all history of stock prices and calculate the future returns for 1–7 days, 30d, 90d, 365d
# That is: we are very interested if we buy stock at some date (e.g. high EPS) -> if it is going to be a profitable decision
for i,ticker in enumerate(TICKERS_LIST):
yf_ticker = yf.Ticker(ticker)
historyPrices = yf_ticker.history(period=’max’)
historyPrices[‘Ticker’] = ticker
# Sometimes there is a problem with .index value → use try
historyPrices[‘Year’]= historyPrices.index.year
historyPrices[‘Month’] = historyPrices.index.month
historyPrices[‘Weekday’] = historyPrices.index.weekday
historyPrices[‘Date’] =
except AttributeError:
# !!! Important: we do historyPrices[‘Close’].shift(1) — to get the Close market price 1 day BEFORE current
# !!! Important: we do historyPrices[‘Close’].shift(-i)) — to get the Close market price the i days AFTER current
# If you divide second on first -> you get the returns from holding “i days” the stock that you bought the day before financial reporting occurred
for i in [1,2,3,4,5,6,7,30,90,365]:
historyPrices[‘r_future’+str(i)] = np.log(historyPrices[‘Close’].shift(-i) / historyPrices[‘Close’].shift(1) )
historyPrices[‘years_from_now’] = historyPrices[‘Year’].max()- historyPrices[‘Year’]
historyPrices[‘ln_volume’]= np.log(historyPrices[‘Volume’])
if df_stocks_prices.empty:
df_stocks_prices = historyPrices
df_stocks_prices = pd.concat([df_stocks_prices,historyPrices], ignore_index=True)
df_stocks_prices[“PK”] = df_stocks_prices.Ticker + “|”+ df_stocks_prices[“Date”].apply(lambda x : x.strftime(‘%Y-%m-%d’))
filter1 = df_stocks_prices.Ticker==’GE’
filter2 = df_stocks_prices.Year == 2020
filter3 = df_stocks_prices.Month == 10
df_stocks_prices[filter1 & filter2 & filter3].head(2)
Fig 19. Truncated (on rows) dataset for a daily stock prices and their future returns
all_tickers_info_clean = all_tickers_info_clean.drop_duplicates(subset=[‘PK’], keep=’first’)
df_stocks_prices = df_stocks_prices.drop_duplicates(subset=[‘PK’], keep=’first’)
merged_df = pd.merge(all_tickers_info_clean, df_stocks_prices, on=”PK”, validate=”one_to_one”)
def draw_plot(symbol):
filter = (merged_df.Symbol== symbol) & (merged_df.Year>=2010)
df = merged_df[filter][[“EPS Estimate”,”Reported EPS”,”Surprise(%)”,”r_future1",”r_future7",”Date”]]
with pd.option_context(‘display.max_rows’, None, ‘display.max_columns’, None): # more options can be specified also


#Graph1: EPS estimate vs. Reported EPS
df[[“EPS Estimate”,”Reported EPS”,”Date”]].plot.line(x=”Date”, figsize=(20,6), title=” EPS Estimate vs. Reported”)

#Graph2: Surprise in %
df[[“Surprise(%)”,”Date”]].plot.line(x=”Date”, figsize=(20,6), title=”Surprise % (=Reported EPS/EPS Estimate)”)
#Graph3: 1- and 7-days returns
df[[“r_future1”,”r_future7",”Date”]].sort_values(by=’Date’).plot(x=”Date”, kind=’bar’, figsize=(20,6), title=”Stock jump”)
  • In most of the periods Reported EPS is higher than Predicted EPS
  • Four datapoints (quarters) show the negative Surprise, which didn’t cause a big shock for a stock price for the first time, but caused a 10–20% dip for other times
  • The last report Q3'20 showed a positive dynamics moving EPS from a negative to positive values, which resulted in 3.7% rise in the first day, and 13% stock rise in 7 days
  • The last quarter results pose the following investment idea: if the Surprise (EPS actual vs. EPS predicted) is positive and big, then the stock can jump in one day (r1_future) and continue its growth for the whole week after that (r7_future). An investor can monitor such occasions and buy a stock just after the very successful reporting date aiming to sell it in a short period of time.
Fig.20 GE stock: EPS Estimate vs. Reported EPS
Fig.21 GE stock:Surprise %
Fig.22 GE stock: Stock price jump 1 and 7 days after the reporting date
  • Microsoft (MSFT)tend to have the the actual EPS different from an estimates on -10% .. +20%
  • There was one date in the end of 2017 with surprise 40%, which did cause a positive spike in returns, although r1 and r7 are not very different (which probably happened due to low absolute value for EPS <0.01)
  • Last 6 quarters the stock showed the gradual EPS rising trend, with r7_future was always higher than r1_future. Which means that MSFT stock was a good opportunity to invest all quarters last 1.5 years.
  • Investment idea: try to find stocks with growing EPS over 1–2 years period of time and invest into it around the reporting day.
Fig.23 MSFT stock: EPS Estimate vs. Reported EPS
Fig.24 MSFT stock: Surprise %
Fig.22 MSFT stock: Stock price jump 1 and 7 days after the reporting date

Scaled Analysis

The first example is 1 and 7-days returns for all stocks grouped by year

import matplotlib.pyplot as pltprint(‘Count observations: ‘,merged_df.groupby(by=’Year’).count()[‘r_future1’])ax = merged_df.groupby(by=’Year’).mean()[[‘r_future1’,’r_future7']].plot.line(figsize=(20,6))
vals = ax.get_yticks()
ax.set_yticklabels([‘{:,.1%}’.format(x) for x in vals])
plt.axhline(y=0, color=’r’, linestyle=’-’)
plt.title(“1 and 7 days returns of stocks after the quarterly earnings results announcement”)
Fig.23 Aggregated r_future1 and r_future7

Parametrise the function (with filtering and groupby conditions)

def draw_returns(groupby_factor, filter): 
filter_year = merged_df.Year>=2000
print(‘Count observations: ‘,merged_df[filter & filter_year].groupby(by=groupby_factor).count()[‘r_future1’])
ax = merged_df[filter & filter_year].groupby(by=groupby_factor).mean()[[‘r_future1’,’r_future7']].plot.line(figsize=(20,6))
vals = ax.get_yticks()
ax.set_yticklabels([‘{:,.0%}’.format(x) for x in vals])
plt.axhline(y=0, color=’r’, linestyle=’-’)
plt.title(“1 and 7 days returns of stocks after the quarterly earnings results announcement”)
if groupby_factor==’Year’:
draw_returns(‘Year’, merged_df[“Reported EPS”]<0)
draw_returns(‘Year’, merged_df[“Reported EPS”]<-1)
draw_returns(‘Year’, merged_df[“Reported EPS”]>0)
draw_returns(‘Year’, merged_df[“Reported EPS”]>1)
draw_returns(‘Year’, merged_df[“Reported EPS”]>2)
Fig.24 EPS less than 0
Fig.25 EPS less than -1
Fig.26 EPS greater than 0
Fig.27 EPS greater than 1
Fig.28 EPS greater than 2

Slicing the Data on the Volume of Trade

merged_df.ln_volume.replace([np.inf, -np.inf], np.nan).hist(bins=100)
Fig.29 Ln daily volume of trade distribution
merged_df[“ln_volume_binned”] = pd.qcut(merged_df[“ln_volume”],10)
(17.674, 21.613] 1226
(6.396, 14.344] 1226
(17.111, 17.674] 1225
(16.74, 17.111] 1225
(16.436, 16.74] 1225
(16.157, 16.436] 1225
(15.85, 16.157] 1225
(15.49, 15.85] 1225
(15.037, 15.49] 1225
(14.344, 15.037] 1225 Name: ln_volume_binned, dtype: int64
draw_returns(‘ln_volume_binned’, merged_df[“Year”]>2000)
draw_returns(‘ln_volume_binned’, merged_df[“Year”]==2020)
draw_returns(‘ln_volume_binned’, (merged_df[“Year”]==2020) & (merged_df[“Reported EPS”]<0))
Fig 30. r1_future and r7_future returns split by volume of trade for year>2000
Fig 31. R1 and R7 returns split by volume of trade for year==2020
Fig 32. R1 and R7 returns split by volume of trade, year==2020, and reported_EPS<0

Analysing the Surprise % Value

merged_df[“surprise_%_binned”] = pd.qcut(merged_df[“Surprise(%)”],10)
(-31360.231, -17.356] 1226
(-17.356, -3.39] 1226
(-3.39, 0.27] 1229
(0.27, 1.87] 1221
(1.87, 3.73] 1226
(3.73, 6.446] 1223
(6.446, 10.677] 1225
(10.677, 17.52] 1226
(17.52, 36.469] 1224
(36.469, 6900.0] 1226
Name: surprise_%_binned, dtype: int64
draw_returns(‘surprise_%_binned’, True)
Fig.33 R1 and R7 returns split by the Surprise%


  • 1 day change in price of a stock around the reporting date lies in the interval [-5%.. +5%]. Smaller stocks tend to have larger jumps (and potential profits for an investor), but they bear the larger risk as well;
  • Average reported EPS is $0.52 which is close to the estimated EPS $0.50. The most of observations have only a few percents difference between these values (Surprise(%) <2% in many cases). There are many outliers in the dataset with extremely high/low absolute values of EPS or Surprise(%);
  • There are some long-term trends for individual stocks (like GE and MSFT), when a gradual increase in EPS makes investors happy and raises the r1 and r7 prices for several quarters in a row. These trends can be generalised to the “successful” years when average r1_future>0 and average r7_future>0;
  • It is hard to find a consistent cluster of stocks (with some conditions on EPS, Surprise, year, volume, etc.) which will have the risk-less profit (returns after 7 days are higher than after 1 day: r7>r1);
  • There are certain thresholds around -+10% for Surprise(%), which cause a massive sell or buy behaviour from investors and move the prices to a negative or positive side. If the Surprise(%) is greater than 17% (~20% of all data points in 20 years), then the average difference between r7 and r1 is positive: Investors continue to buy a rising stock during not only one day after the announcement, but a longer period of time (one week or more).



Data and Product Analyst at Google. I run a website on Python programming, analytics, and stock investing.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Ivan Brigida

Data and Product Analyst at Google. I run a website on Python programming, analytics, and stock investing.