Use Your Computer to Make Informed Decisions in Stock Trading: Practical Introduction — Part 4: Scraping Earnings Per Share (EPS)
Introduction
I know the exact day when I started trading: 28 April 2020, when Google announced its Q1 earnings and rose more than 7% in one day. I figured that Facebook had a similar business and was releasing its report the next day, so I invested my first $1,000 in Facebook stock. It was a quick win: the stock rose more than 10% in one day after the company showed "stability in ad revenue after the fall in March". The same approach worked for Facebook on its Q2 earnings date as well: the stock soared 8% after reporting earnings per share of $1.80 instead of the expected $1.39. After the second success, I finally decided to run a proper analysis at scale and write an article about it, which you can read below.
The hypothesis is: "Financial results become extremely important during hard times (e.g. COVID-19 in 2020): some resilient verticals gain disproportionately high trading volume and price growth."
I wish I could find at least one strong idea that works most of the time, as reliably as the Metal Man of Sligo, who has been pointing ships toward the entrance to Sligo harbour for a couple of centuries now.
The Approach
Stock ownership gives you the right to share in the profits of the company. In an ideal world, the price of a stock should depend strongly on the company's earnings, since it represents the discounted value of future profits. If, at some point, a company earns more than before, it may mean that its growth is accelerating and the stock should be priced higher. That is why, among the other financial indicators to follow, earnings per share (EPS) is one of the most important. Every quarter, analysts predict company profits (or losses), and those predictions are then checked against the actually reported EPS. If the company does better than predicted, the stock price should rise, and vice versa.
In this article, we aim to test this at scale, for hundreds of stocks that reported earnings in Q2 2020. We will check how a stock's price fluctuation depends on the actual EPS, the predicted EPS, and the Surprise (= actual_EPS / predicted_EPS - 1). Below is a quick overview of the sections and topics covered in the article:
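As a quick sanity check of that formula, here is the surprise for the Facebook report mentioned in the introduction, computed directly in Python:

# Toy example: EPS surprise for the Facebook Q2'20 report mentioned above
actual_eps = 1.80      # reported EPS
predicted_eps = 1.39   # analysts' consensus estimate
surprise = actual_eps / predicted_eps - 1
print(f'Surprise: {surprise:.2%}')  # ~29.5%, matching Yahoo's Surprise(%) column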
- In the Scraping Yahoo Finance: Earnings-Per-Share section, you'll learn how to obtain the earnings-per-share information for a wide range of companies over a specified period of time (starting with one day), scraping it from the Yahoo Finance website.
- In the Packing Everything in One Scrape Function section, you'll combine everything you learned in the previous section into a single function that gets weekly stats on the dates and EPS.
- In the Getting Stock Prices for a Company section, you'll look at how to get data on stock returns and volume for a certain company.
- In the Getting S&P 500 Stats section, you'll obtain S&P 500 data to evaluate how a certain symbol performs against the index.
- In the Getting Stock Returns and Volume from Yahoo Finance section, you'll learn how to obtain data on stock returns and volume for all tickers found on Yahoo Finance.
- In the Merging All the Pieces Together section, you'll build a combined dataframe out of the stats from all the previous parts.
- Finally, in the Analysis and Visualisation section, you'll see examples of graphs built on the dataset.
This article is the fourth part in a series that covers how to take advantage of computer technologies to make informed decisions in stock trading. Refer to part 1 for guidance on setting up the working environment needed to follow along with the examples provided in the rest of the series. Part 2 covered several well-known finance APIs that allow you to obtain and analyse stock data programmatically. In part 3, you explored whether the stock market is influenced by the news.
Scraping Yahoo Finance: Earnings-Per-Share
You might be interested in the earnings-per-share information for a wide range of companies over a certain period of time. You can scrape this information from the Yahoo Finance website at https://finance.yahoo.com/calendar/earnings:
To do scraping, you’ll need to install the BeautifulSoup library in your Colab:
!pip install beautifulsoup4
Then, make sure to import the following libraries:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
Suppose you want to obtain data for 2020–07–27. You’ll need to specify the following URL:
url = "https://finance.yahoo.com/calendar/earnings?from=2020-07-26&to=2020-08-01&day=2020-07-27"
Then, send the following request:
r = requests.get(url)
To make sure it has worked as expected, check the status of the request:
r.ok
If everything is OK, you should see:
True
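If you see False instead, Yahoo may be rejecting the default requests User-Agent; a common workaround (my assumption here, not something this particular run required) is to retry with a browser-like header:

# Hypothetical fallback: retry with a browser-like User-Agent if Yahoo rejects the request
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
r = requests.get(url, headers=headers)
print(r.ok)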
Now you can move on to the content:
r.content
Your task, however, is to find table data within it. This can be easily accomplished with the help of BeautifulSoup as follows:
soup = BeautifulSoup(r.text)
table = soup.find_all('table')
len(table)
# output:
1
Just one table has been found, which is good. Let’s now get all the column names of the table:
spans = soup.table.thead.find_all('span')
columns = []
for span in spans:
    print(span.text)
    columns.append(span.text)
Here are the columns (refer back to the screenshot in Figure 1 to make sure that all the columns have been found):
Symbol
Company
Earnings Call Time
EPS Estimate
Reported EPS
Surprise(%)
Now you can move on to the rows:
rows = soup.table.tbody.find_all('tr')
len(rows)
As you can see, you have 100 rows in the table scraped from the page. In the next code snippet, you load the rows to a pandas dataframe, reading row by row:
stocks_df = pd.DataFrame(columns=columns)

for row in rows:
    elems = row.find_all('td')
    dict_to_add = {}
    for i, elem in enumerate(elems):
        dict_to_add[columns[i]] = elem.text
    stocks_df = stocks_df.append(dict_to_add, ignore_index=True)
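A side note: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so on a recent pandas the same loop can collect plain dicts and build the frame once; a minimal equivalent sketch:

# Equivalent row loading for pandas >= 2.0, where DataFrame.append no longer exists
rows_data = []
for row in rows:
    elems = row.find_all('td')
    rows_data.append({columns[i]: elem.text for i, elem in enumerate(elems)})
stocks_df = pd.DataFrame(rows_data, columns=columns)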
As a result, the data in the dataframe should look as follows:
stocks_df
You should have 100 rows scraped. We will use all columns, but note that Earnings Call Time values are not supplied in many cases. Some other problems in the dataset are:
- missing values: some values for EPS Estimate, Reported EPS, and Surprise are unknown
- the values in these columns are stored as text: you need to convert them to float
To get rid of missing values, you can apply the following filters to the dataset:
filter1 = stocks_df['Surprise(%)'] != '-'
filter2 = stocks_df['EPS Estimate'] != '-'
filter3 = stocks_df['Reported EPS'] != '-'

stocks_df_noMissing = stocks_df[filter1 & filter2 & filter3]
You should see that the number of rows has reduced after that:
len(stocks_df_noMissing)
# output:
79
In the next step, you solve another problem and convert all the values in the EPS Estimate, Reported EPS, and Surprise columns to float:
stocks_df_noMissing['EPS Estimate'] = stocks_df_noMissing['EPS Estimate'].astype(float)
stocks_df_noMissing['Reported EPS'] = stocks_df_noMissing['Reported EPS'].astype(float)
stocks_df_noMissing['Surprise(%)'] = stocks_df_noMissing['Surprise(%)'].astype(float)
As a result, you should have the following dataframe:
stocks_df_noMissing.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 79 entries, 0 to 99
Data columns (total 6 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   Symbol              79 non-null     object
 1   Company             79 non-null     object
 2   Earnings Call Time  79 non-null     object
 3   EPS Estimate        79 non-null     float64
 4   Reported EPS        79 non-null     float64
 5   Surprise(%)         79 non-null     float64
Packing Everything in One Scrape Function
Now you are ready to pack everything into one scrape function that returns stocks_df for one week. You can then call the function for a period of several weeks:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
from datetime import datetime, timedelta, date

# Need to supply weekly stats as you see them on the website, e.g.:
# from_dt = '2020-07-26'
# to_dt = '2020-08-01'
def get_scrapped_week(from_dt, to_dt):
    # initially look at the first 100 stocks with earnings on the first day of the week (from_dt)
    # FULL URL with PARAMS example:
    # https://finance.yahoo.com/calendar/earnings?from=2020-07-26&to=2020-08-01&day=2020-07-27
    url = 'https://finance.yahoo.com/calendar/earnings'
    offset = 0
    size = 100
    fst = 1
    # scrape every date in the submitted interval
    # (note: range(6) covers the first 6 days of the week, matching the sample output below;
    # use range(7) if you want the full week)
    for day_date in (datetime.strptime(from_dt, '%Y-%m-%d') + timedelta(n) for n in range(6)):
        day_dt = datetime.strftime(day_date, '%Y-%m-%d')
        print(day_dt)
        # inner cycle iterating with an offset, if more than 100 stock earnings happened on that date
        while True:
            # make the URL request with the params
            params = {'from': from_dt, 'to': to_dt, 'day': day_dt, 'offset': offset, 'size': size}
            r = requests.get(url, params=params)
            soup = BeautifulSoup(r.text)
            # scrape the table column names on the first pass to create a correct dataframe
            if fst == 1:
                spans = soup.table.thead.find_all('span')
                columns = []
                for span in spans:
                    print(span.text)
                    columns.append(span.text)
                stocks_df = pd.DataFrame(columns=columns)
                fst = 0
            # scrape the body with the row values
            rows = soup.table.tbody.find_all('tr')
            for row in rows:
                elems = row.find_all('td')
                dict_to_add = {}
                dict_to_add['Date'] = day_dt
                for i, elem in enumerate(elems):
                    dict_to_add[columns[i]] = elem.text
                stocks_df = stocks_df.append(dict_to_add, ignore_index=True)
            if len(rows) != 100:
                print(len(rows) + offset)
                offset = 0
                break
            else:
                offset = offset + 100
    return stocks_df
Let’s try it for a certain week:
stocks_df = get_scrapped_week('2020-07-05', '2020-07-11')
Here is the output:
2020-07-05
Symbol
Company
Earnings Call Time
EPS Estimate
Reported EPS
Surprise(%)
8
2020-07-06
29
2020-07-07
23
2020-07-08
23
2020-07-09
23
2020-07-10
4
You can obtain data for more weeks, appending the results to stocks_df:
stocks_df = stocks_df.append(get_scrapped_week('2020-07-12', '2020-07-18'))
stocks_df = stocks_df.append(get_scrapped_week('2020-07-19', '2020-07-25'))
stocks_df = stocks_df.append(get_scrapped_week('2020-07-26', '2020-08-01'))
Before going any further, don’t forget to clean up, so that the final dataset has no missing values, and transform text to float:
filter1 = stocks_df['Surprise(%)'] != '-'
filter2 = stocks_df['EPS Estimate'] != '-'
filter3 = stocks_df['Reported EPS'] != '-'

stocks_df_noMissing = stocks_df[filter1 & filter2 & filter3]

stocks_df_noMissing['EPS Estimate'] = stocks_df_noMissing['EPS Estimate'].astype(float)
stocks_df_noMissing['Reported EPS'] = stocks_df_noMissing['Reported EPS'].astype(float)
stocks_df_noMissing['Surprise(%)'] = stocks_df_noMissing['Surprise(%)'].astype(float)
To simplify search operations within the dataset, you might want to set the Symbol column as the index:
stocks_df_noMissing.set_index('Symbol')
Here is how you can take advantage of it:
    Symbol        Company  Earnings Call Time  EPS Estimate  Reported EPS  Surprise(%)        Date
505  GOOGL  Alphabet Inc.   Time Not Supplied          8.21         10.13        23.42  2020-07-29
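For example, once Symbol is the index, a single label lookup replaces boolean filtering (GOOGL is just the row shown above; indexed_df is a name I introduce for illustration):

# Look up a single company by ticker once Symbol is the index
indexed_df = stocks_df_noMissing.set_index('Symbol')
print(indexed_df.loc['GOOGL'])  # Alphabet Inc. row: EPS Estimate 8.21, Reported EPS 10.13, ...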
Getting Stock Prices for a Company
In this section, we’ll look at how you can get data on stock returns and volume for a certain company. To start with, you’ll need to install the yfinance library needed to obtain stock data:
!pip install yfinance
Let's start with an attempt to get one stock for one date, calculating the returns and the volume jump.
import yfinance as yf
import numpy as np
import pandas as pd
from datetime import datetime
from datetime import timedelta
This is what we can see for the FB symbol:
row = stocks_df_noMissing[stocks_df_noMissing['Symbol'] == 'FB']
print(row)

    Symbol         Company  Earnings Call Time  EPS Estimate  Reported EPS  Surprise(%)        Date
776     FB  Facebook, Inc.   Time Not Supplied          1.39           1.8        29.59  2020-07-29
You can easily extract the date from it:
date = row['Date'].values[0]
print(date)
# output:
2020-07-29
Let’s now obtain stock data for this date and the dates near it, using the yfinance library:
date = datetime.strptime(row['Date'].values[0], '%Y-%m-%d')

print(date + timedelta(days=3))
print(date - timedelta(days=1))

ticker = yf.Ticker('FB')
hist = yf.download('FB', start=date - timedelta(days=1), end=date + timedelta(days=3))
The output should look as follows:
2020-08-01 00:00:00
2020-07-28 00:00:00
[*********************100%***********************] 1 of 1 completed
If you check out the hist variable, it should contain the following data:
In the next step, you determine the stock price and volume rises for the last two days of observation.
hist['r2'] = np.log(hist['Open'] / hist['Open'].shift(2))
hist['volume_rise'] = np.log(hist['Volume'] / hist['Volume'].shift(2))
So the updated dataset shows you the volume and price rises for Facebook:
If you want to look at the last value of returns in the dataset (r2, the 2-day return on the morning after the event), you can extract it as follows:
hist.r2.values[-1]
# output:
0.10145051589492579
And the rise in trading volume can be viewed as follows:
hist.volume_rise.values[-1]
# output:
1.361648662790037
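Since r2 is a log return, converting it back to a simple percentage takes one np.exp call; a quick check against the FB figures above:

# Convert the 2-day log return into a simple percentage return
r2 = hist.r2.values[-1]   # 0.1014... for FB
print(np.exp(r2) - 1)     # ~0.1068, i.e. roughly a 10.7% price rise over 2 days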
Getting S&P 500 Stats
As mentioned in part 2, it is common practice to compare stock performance with the S&P 500 index. To begin with, let's obtain the S&P 500 index data for a specified period of time:
import pandas_datareader.data as pdr
from datetime import datetime

start = datetime(2020, 7, 1)
end = datetime(2020, 8, 10)
print(f'Period 1 month until today: {start} to {end}')

spx_index = pdr.get_data_stooq('^SPX', start, end)

# S&P 500 was growing almost all of July 2020 -> need to adjust stock growth after the reporting date
spx_index['Open'].plot.line()
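If Stooq happens to be unavailable, the same series can be pulled from Yahoo Finance via yfinance; note the different ticker, ^GSPC (this is an alternative sketch, not the source the article's numbers were built on):

# Alternative source for the S&P 500 series, using yfinance instead of Stooq
spx_yahoo = yf.download('^GSPC', start=start, end=end)
spx_yahoo['Open'].plot.line()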
You can apply the same technique to the index that you used in the previous section to determine the stock price rise, calculating 2-day returns for each day in July and August 2020:
spx_index['r2'] = np.log(np.divide(spx_index['Open'], spx_index['Open'].shift(2)))
spx_index['r2'].plot.line()
In the tabular form, the same data would look as follows:
spx_index.head(30)
               Open     High      Low    Close      Volume        r2
Date
2020-08-10  3356.04  3363.29  3335.44  3360.47  2565981272       NaN
2020-08-07  3340.05  3352.54  3328.72  3351.28  2279160879       NaN
2020-08-06  3323.17  3351.03  3318.14  3349.16  2414075395 -0.009843
2020-08-05  3317.37  3330.77  3317.37  3327.77  2452040105 -0.006813
2020-08-04  3289.92  3306.84  3286.37  3306.51  2403695283 -0.010056
2020-08-03  3288.26  3302.73  3284.53  3294.61  2379546705 -0.008814
...
In the next code snippet, you fill an array of S&P 500 returns for each stock. As an important note, if there is a "gap" for a particular date, we take the closest previous value:
array_returns_snp500 = []

for index, row in stocks_df_noMissing.iterrows():
    start_dt = datetime.strptime(row['Date'], '%Y-%m-%d') - timedelta(days=1)
    end_dt = datetime.strptime(row['Date'], '%Y-%m-%d') + timedelta(days=3)
    # we don't have gaps of more than 4 days -> try to find the closest
    # previous value of S&P 500 returns in the dataframe:
    cur_dt = end_dt
    while cur_dt >= start_dt:
        rez_df = spx_index[cur_dt.strftime('%Y-%m-%d')]
        if len(rez_df) > 0:
            array_returns_snp500.append(rez_df.r2.values[0])
            break
        else:
            cur_dt = cur_dt - timedelta(days=1)
To make sure that it has worked as expected, you can check the length of both datasets: the newly created array_returns_snp500 and stocks_df_noMissing introduced earlier in this article:
len(array_returns_snp500)
# output:
1698

len(stocks_df_noMissing)
# output:
1698
In both cases, you should have the same number.
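To make the check explicit, a one-line assertion will fail fast if the arrays ever get out of sync:

# Fail fast if the S&P 500 returns are not aligned one-to-one with the stocks dataframe
assert len(array_returns_snp500) == len(stocks_df_noMissing), 'S&P 500 returns misaligned with stocks'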
Getting Stock Returns and Volume from Yahoo Finance
In this section, we'll look at how you can get data on stock returns and volume for all tickers found in Yahoo Finance. In the following script, you calculate, for each ticker, the 2-day return on the open price after earnings relative to the price 2 days earlier:
array_tickers = []
array_returns = []
array_volume_rise = []
array_volume_usd = []
array_snp500 = []

for index, row in stocks_df_noMissing.iterrows():
    start_dt = datetime.strptime(row['Date'], '%Y-%m-%d') - timedelta(days=1)
    end_dt = datetime.strptime(row['Date'], '%Y-%m-%d') + timedelta(days=3)
    hist = yf.download(row['Symbol'], start=start_dt, end=end_dt)
    # We need full data (volume and price for all dates) to calculate the returns and volume rise.
    # ALSO: if end_dt is a non-trading day (Sat, Sun) -> we can't directly calc the returns
    if len(hist) < 4:
        continue
    hist['r2'] = np.log(np.divide(hist['Open'], hist['Open'].shift(2)))
    hist['volume_rise'] = np.log(np.divide(hist['Volume'], hist['Volume'].shift(2)))
    hist['volume_usd'] = hist['Volume'] * hist['Open']
    print(row)
    print(index)
    print(' ------------- ')
    array_tickers.append(row['Symbol'])
    array_returns.append(hist.r2.values[-1])
    array_volume_rise.append(hist.volume_rise.values[-1])
    array_volume_usd.append(hist.volume_usd.values[-1])
    # We only append S&P values for the stocks that have all the data
    array_snp500.append(array_returns_snp500[index])
The script generates a huge output (about 1,000 entries, as you'll see below). Here is just a fragment:
[*********************100%***********************] 1 of 1 completed
Symbol                                        AEOJF
Company            AEON Financial Service Co., Ltd.
Earnings Call Time                Time Not Supplied
EPS Estimate                                  14.03
Reported EPS                                     -5
Surprise(%)                                 -135.67
Date                                     2020-07-07
Name: 37, dtype: object
37
 -------------
[*********************100%***********************] 1 of 1 completed
Symbol                                         BBBY
Company                      Bed Bath & Beyond Inc.
Earnings Call Time                Time Not Supplied
EPS Estimate                                  -1.22
Reported EPS                                  -1.96
Surprise(%)                                  -60.39
Date                                     2020-07-07
Name: 43, dtype: object
43
 -------------
...
To sum up, it is interesting to see how many stocks ended up with all of these financials: volume of trade, volume rise (in number of shares), and returns:
len(array_tickers)
# output:
1003
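When downloading well over a thousand tickers in a row, individual yf.download calls can fail transiently (rate limits, delisted symbols). A small retry wrapper, my own addition rather than part of the original run, makes such a loop more robust:

import time

def download_with_retry(symbol, start, end, retries=3, pause=2.0):
    """Retry yf.download a few times before giving up on a symbol."""
    for attempt in range(retries):
        hist = yf.download(symbol, start=start, end=end)
        if len(hist) > 0:
            return hist
        time.sleep(pause)  # back off before the next attempt
    return hist  # still empty: the caller should skip this symbol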
Merging All the Pieces Together
Finally, let’s merge all the financials we have obtained so far to see the entire “picture” for each of the traded stocks. For that, we’ll create a dataframe:
returns_df = pd.DataFrame(columns=['Ticker', 'Returns', 'Volume Rise', 'Volume Trade USD', 'Returns S&P500'])
And load it with data from the datasets we have created so far:
returns_df = pd.DataFrame([array_tickers, array_returns, array_volume_rise, array_volume_usd, array_snp500]).transpose()
returns_df.columns = ['Ticker', 'Returns', 'Volume Rise', 'Volume Trade USD', 'Returns S&P500']
returns_df.set_index('Ticker', inplace=True)
returns_df.dropna(inplace=True)

returns_df['Returns'] = returns_df['Returns'].astype(float)
returns_df['Volume Rise'] = returns_df['Volume Rise'].astype(float)
returns_df['Volume Trade USD'] = returns_df['Volume Trade USD'].astype(float)
returns_df['Returns S&P500'] = returns_df['Returns S&P500'].astype(float)

returns_df['Returns in %'] = np.exp(returns_df['Returns'])
returns_df['Volume Rise in %'] = np.exp(returns_df['Volume Rise'])
It would also be interesting to learn which companies had returns above the S&P 500 and to add this information to the dataset:
# Returns above S&P 500
returns_df['Adj. Returns'] = returns_df['Returns'] - returns_df['Returns S&P500']
returns_df['Adj. Returns in %'] = np.exp(returns_df['Adj. Returns'])
The Adj. Returns metric serves as an indicator of growth relative to the overall S&P 500 index.
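Because both series are log returns, subtracting them corresponds to dividing the growth factors, so the adjusted figure really is growth relative to the index; a tiny numeric illustration:

# Worked example: a stock up 10% while the index is up 2% over the same 2 days
r_stock = np.log(1.10)   # stock log return
r_index = np.log(1.02)   # index log return
adj = r_stock - r_index
print(np.exp(adj))       # ~1.078: the stock grew ~7.8% relative to the index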
You might want to create a set of histograms, one for each column in the returns_df dataframe, to see how the data is distributed. If you want the histograms without inf values, you can replace them in the dataframe first:
returns_df = returns_df.replace([np.inf, -np.inf], np.nan)
returns_df.hist(figsize=(20,10), bins=100)
In the next step, you join the returns_df dataframe with the stocks_df_noMissing dataframe:
stocks_and_returns = stocks_df_noMissing.set_index('Symbol').join(returns_df)
stocks_and_returns.head()
You might want to remove the INF values from the final dataset:
stocks_and_returns_no_missing = stocks_and_returns.replace([np.inf, -np.inf], np.nan).dropna()
That is what you should finally have:
stocks_and_returns_no_missing.info()

<class 'pandas.core.frame.DataFrame'>
Index: 997 entries, AA to ZEN
Data columns (total 14 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   Company             997 non-null    object
 1   Earnings Call Time  997 non-null    object
 2   EPS Estimate        997 non-null    float64
 3   Reported EPS        997 non-null    float64
 4   Surprise(%)         997 non-null    float64
 5   Date                997 non-null    object
 6   Returns             997 non-null    float64
 7   Volume Rise         997 non-null    float64
 8   Volume Trade USD    997 non-null    float64
 9   Returns S&P500      997 non-null    float64
 10  Returns in %        997 non-null    float64
 11  Volume Rise in %    997 non-null    float64
 12  Adj. Returns        997 non-null    float64
 13  Adj. Returns in %   997 non-null    float64
dtypes: float64(11), object(3)
memory usage: 116.8+ KB
Time to get results from our analysis. What are the TOP 50 most traded stocks around the date they published their quarterly reports?
top50_volume = stocks_and_returns_no_missing.sort_values(by='Volume Trade USD', ascending=False).head(50)
print(top50_volume)
Now, what are the TOP 200 most traded stocks? Run this code to find out:
top200_volume = stocks_and_returns_no_missing.sort_values(by='Volume Trade USD', ascending=False).head(200)
print(top200_volume)
Analysis and Visualisation
Let's play with our dataset and try to extract interesting information from it. In particular, it is worth looking at the distribution of returns. To get a visual summary of the information, we'll build several plots.
In the following plot, you can see the Surprise(%) column plotted against the Returns in % column for the top50_volume dataframe. This information can be quite useful if you want to start investing in the most lucrative stocks.
top50_volume[['Surprise(%)', 'Returns in %']].plot.scatter(x='Surprise(%)', y='Returns in %')
There are several immediate results:
- most of the stocks reported results close to the expected EPS
- two Surprise outliers reported roughly 60x and 80x the predicted EPS, yet showed only slightly positive returns
In the next plot, you can see the same plotting but for the top200_volume dataframe.
top200_volume[['Surprise(%)', 'Returns in %']].plot.scatter(x='Surprise(%)', y='Returns in %')
A few things to add on this graph:
- there are some new outliers with Surprise(%) between 100% and 2000%, which showed very impressive returns of 10-30% just 2 days after the quarterly results announcement
- a strongly negative Surprise doesn't automatically mean a price decrease: the five leftmost points kept 0.9-1.1 of their value over 2 days, i.e. -10% to +10% Returns
What if 'Returns in %' depends not only on the relative 'Surprise(%)', but more on the absolute 'Reported EPS' value?
In the next plot, we're plotting Surprise and Reported EPS vs. Returns in % for the TOP 50. We use subplots to show the axis values:
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
top50_volume[['Surprise(%)', 'Reported EPS', 'Returns in %']].plot.scatter(x='Reported EPS', y='Surprise(%)', c='Returns in %', colormap='RdYlGn', ax=ax)
In general, we see many of the stocks reporting EPS between $0 and $2.5, with a small positive surprise and moderate growth (light green: <5% Returns).
In the next plot, you can see the same plotting but for TOP 200:
fig, ax = plt.subplots()
top200_volume[['Surprise(%)', 'Reported EPS', 'Returns in %']].plot.scatter(x='Reported EPS', y='Surprise(%)', c='Returns in %', colormap='RdYlGn', ax=ax)
In the next plot, we’re plotting Surprise and Reported EPS vs. Adj. Returns in % for TOP 50.
fig, ax = plt.subplots()
top50_volume[['Surprise(%)', 'Reported EPS', 'Adj. Returns in %']].plot.scatter(x='Reported EPS', y='Surprise(%)', c='Adj. Returns in %', colormap='RdYlGn', ax=ax)
As you can see, Figure 7 and Figure 9 are not very different: individual stock shocks are much larger than the average S&P 500 moves, so Adj. Returns and Returns are very close.
In the next plot, you can see the same plotting but for TOP 200.
Again, Figure 10 is very similar to Figure 8 (adjusted vs. non-adjusted returns).
In the next plot, we’re plotting Reported EPS and EPS Estimate vs Returns in % for TOP 50.
fig, ax = plt.subplots()
top50_volume.plot.scatter(x='Reported EPS', y='EPS Estimate', c='Returns in %', colormap='RdYlGn', ax=ax)
In the next plot, we’re plotting Reported EPS and EPS Estimate vs. Adj.Returns in % for TOP 50.
You'll see a similar picture when plotting Adj. Returns instead of Returns:
fig, ax = plt.subplots()
top50_volume.plot.scatter(x='Reported EPS', y='EPS Estimate', c='Adj. Returns in %', colormap='RdYlGn', ax=ax)
In the following histogram, we compare the returns of the top 50 and top 200 volume-traded stocks. You may notice that the top 200 distribution (blue) is more "bell-shaped" around zero and slightly positive returns than the top 50 distribution (orange):
top200_volume['Adj. Returns in %'].hist(bins=50, alpha=0.5)
top50_volume['Adj. Returns in %'].hist(bins=50, alpha=0.5)
Conclusion
We've shown how to scrape financial predictions from a website and how to connect them with stock returns. Q2 2020 seems to have been a very successful quarter for the top 50 stocks (by volume of trade): most of them showed a positive surprise over the expected earnings per share (EPS) and high short-term returns. The result remains strong even after the corresponding S&P 500 index returns are deducted (i.e. the top 50 stocks grew more than the average index). When scaled to the top 200 stocks, the picture is not that simple: the average returns are smaller, and there is more variation in both EPS and returns.