Use Your Computer to Make Informed Decisions in Stock Trading: Practical Introduction — Part 10: Passive Investing with ETFs in Europe

Ivan Brigida
15 min read · May 1, 2022
Old Head of Kinsale, Ireland

[HINT] You can read my previous articles on Medium or on the website PythonInvest (which features more dynamic content). The complete Python code (Colab notebooks) can be found on Github. A YouTube intro video explains the Colab code logic.

Why ETFs?

In the US, passive investment already accounts for more than 50% of equity (Bloomberg, 2021), whereas in Europe index-tracking funds account for only 20%. Europe lags significantly behind the US in this area, but all the indications are that it is catching up: the passive share of total investment has doubled since 2010 (link). Awareness of the topic among retail investors remains minimal because professional market participants have little interest in selling this product: ETF fees are low and are not shared with resellers, which removes any incentive on their part.

There are many benefits to investing in ETFs (details in the article “What is an ETF”):

  • wide diversification of investments (e.g. some index ETFs replicate indexes composed of more than 1600 stocks)
  • a multitude of niche ETF funds specialising in a particular sector (e.g. “Energy”) or equity strategy/theme (e.g. “Cloud Security”)
  • low management fees

The downsides can be:

  • potentially lower returns from a highly diversified portfolio than from an individual stock (e.g. a stock after a recent IPO, or a small-cap stock)
  • better financial literacy and knowledge required — there are some regulatory and tax issues to be considered (compared to an easy direct stock investment through a broker app)
  • good judgement needed to understand when to rebalance an investment in a changed macroeconomic environment (there is no active asset manager, who will decide it for you)

Executive Summary

In this work we explore 1000+ of the most popular European ETFs: extract the list of unique identifiers (ISINs) for the comparison, obtain the detailed list of the features for each ETF to investigate, build a visualisation and top-level comparison of funds.

After reading the article, you’ll be able to name the main classes and subclasses of exchange-traded funds (ETFs) and compare their short-term (year-to-date) and long-term (up to 3 years) returns.

You will see the new 2021 trend of CryptoETFs (available on a big stock exchange), and the most recent 2022 shift of positive returns from Equity to Precious Metals and Commodities.

Summary of Results

  1. ETFs offer seven broad categories to invest in: Equity ETFs, Bonds ETFs, Precious Metals ETFs, Cryptocurrencies ETFs, Commodities ETFs, Real Estate ETFs, Money Market ETFs
  2. Equity ETFs are the most crowded category, with more than 750 options in Europe. It includes funds covering stocks in the US, World, Europe, standalone country indexes, sectors, and specific ideas (e.g. Social/Environmental)
  3. Equity, Commodities, and Precious Metals ETFs showed double-digit returns over 3 years (30–60% cumulative growth), with Equity accelerating in 2020–2021 and Commodities/Precious Metals performing strongly in 2022.
  4. Cryptocurrency is the new trend in the ETF space. These funds showed explosive 7x growth over the last 3 years (but -17% YTD in 2022); they have an average age of around 1 year and 4x higher commissions than other categories.
  5. Comparing the current quote with the 52-week range reveals the ETFs that have fallen the most. The quote is currently low (and potentially has some room to grow) for Emerging Markets, Germany, and China Equity ETFs.

ETF Features Explained

Here we list the top features used in the study and give examples.

Name: fund name normally breaks down as follows: “iShares Core S&P 500 UCITS ETF (Acc)”:

  • iShares — the company issuing the ETF
  • Core S&P 500 — the strategy
  • UCITS ETF (Acc) — additional information related to the fund;

ISIN: International Securities Identification Number, the primary key of the dataset on European ETF funds that we’re building;

Description: more details about the fund. In the case of the previously discussed fund “iShares Core S&P 500 UCITS ETF (Acc)” it is “The S&P 500® index tracks the 500 largest US stocks.”;

Exchange: the exchange venue, where the ETF is traded. Many of the top EU ETF funds are traded on XETRA (“the leading trading venue in Europe for ETFs with more than 2000 traded ETFs&ETNs”);

Labels: groups assigned to each fund. For “iShares Core S&P 500 UCITS ETF (Acc)” there are 3 labels: S&P 500(19), Equity(1190), United States (214). The number in brackets gives the total number of comparable funds (within 2–3 levels of categorisation). In this example, there are 19 EU funds tracking the S&P 500 index, 214 funds related to the US, and 1190 Equity funds (on the website). These numbers include both accumulating and distributing funds. We remove the distributing ones and keep the accumulating funds, which are easier to work with because of the tax implications. To build more structure among the groups, we compose a hierarchy from each fund’s top 3 labels by popularity, Category->subCategory->subCategory2 (in this case: “Equity -> United States -> S&P 500”);

Fund_size: total assets under management (€ m). More assets generally means greater popularity;

TER (Total Expense Ratio): fund fees, usually treated as a proxy for the Total Cost of Ownership (TCO). Read the full article for more details. TER is normally quite low for ETFs compared to actively managed funds (0.05%-0.2% for passive ETFs vs. 1–2% for active funds), but it can sometimes reach 2%, for example for the new crypto investment funds;

Replication: the replication method, one of Physical (Full replication), Physical (Optimised sampling), or Synthetic. Physical (Full replication) gives an investor the smallest tracking error (relative to the index tracked), but it may be costly and hard for a fund to implement: MSCI World, for example, has 1600 stocks in it, and full replication means buying all of them (and rebalancing whenever new money flows into the fund). Read the article ‘Replication methods of ETFs’ for more details;

Strategy risk: only ‘Long-only’ strategy funds are included in the dataset, meaning that we cover the strategy of price growth (buy low, sell higher);

Fund currency: the original currency of a fund. For example, if the fund replicates some broad index from the US Stocks (S&P 500, MSCI World, etc.) — the fund currency will be USD;

Currency risk: you’ll see “Currency unhedged” in most situations, as the fund currency is “USD” in many cases, while returns are reported and trading takes place in “EUR” on European stock exchanges. Thus, the financial results are exposed to fluctuations in the USD/EUR exchange rate;

Distribution policy: can be Distributing (paying off the dividends) or Accumulating (reinvesting the dividends in growth). We’ll consider only Accumulating in this article, as you don’t need to pay taxes on the dividends received;

Fund domicile: the country where a fund is registered. More than 85% of European ETFs are registered in Ireland and Luxembourg (source). The top 2 differences in the funds from these countries are the replication strategy (more Full/Optimised replication funds in Ireland, and Synthetic funds in Luxembourg), and by the asset classes (more Bond funds in Ireland and Equity funds in Luxembourg, also more exotic funds like Alternatives and Money market in Luxembourg);

Return_<period>: data points on the returns: YTD (year-to-date), 2021, 2020, 2019, 2018, 1month, 3months, 6months, 1year, 3years. These indicators are the most important ones, as they define the revenue generated by owning the fund. We want all of them to be positive (green) and higher than TER (to have positive net income), but this rarely happens in reality, as there are periods of ups and downs. The longest-period return (3years, if available) is the first candidate to look at, as it smooths out short-term fluctuations.
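To make the “Currency unhedged” point concrete, here is a minimal sketch of how a USD-denominated return translates into the return seen by a EUR investor. The function name and the exchange rates are hypothetical and are not taken from the dataset.

```python
def return_in_eur(return_in_usd, eurusd_start, eurusd_end):
    """Convert a USD-denominated return into the return seen by a EUR investor.

    Rates are quoted as USD per 1 EUR (illustrative inputs, not dataset values).
    """
    # 1 EUR buys `eurusd_start` USD at the start, the USD amount grows by the
    # fund return, and the result is converted back at `eurusd_end` USD per EUR.
    growth_in_eur = (1 + return_in_usd) * eurusd_start / eurusd_end
    return growth_in_eur - 1


# Hypothetical year: the dollar strengthens from 1.20 to 1.10 USD per EUR.
flat_fund = return_in_eur(0.00, eurusd_start=1.20, eurusd_end=1.10)
growing_fund = return_in_eur(0.10, eurusd_start=1.20, eurusd_end=1.10)
print(f"flat in USD: {flat_fund:.1%} in EUR, +10% in USD: {growing_fund:.1%} in EUR")
```

Even a fund that is flat in USD shows a non-zero EUR return when the exchange rate moves, which is why the returns of unhedged funds mix index performance with currency performance.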

Web Scraping with Selenium

There is no easily available open database listing EU ETF funds with all their features, so we decided to construct one by scraping the website. When you know what you’re looking for, you’ll find this page convenient with its basic search and comparison functionality. Getting the data in a ‘table’ view (ideally, as a clean Pandas dataframe) makes life much easier for an analyst.

Web scraping (or Web data extraction) is the practice of collecting data from the Web by writing scripts. These can mimic a user’s behaviour by opening many web links in one session in a real browser.

We’ve already introduced scraping in the previous articles: Part4. Scraping EPS and Part5. Long-term EPS. The general idea of the approach outlined there was to generate a unique URL for scraping with several parameters (like this “https://<address>?param_1=value_1&…&param_n=value_n”) and get the source code for every page (for various sets of params = {value_1,…,value_n}).
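As a reminder of that URL-based approach, here is a minimal sketch that generates one URL per combination of parameter values. The base address and the parameter names are hypothetical placeholders, not the ones used in the earlier articles.

```python
from itertools import product
from urllib.parse import urlencode

BASE_URL = "https://example.com/search"  # hypothetical address


def build_urls(base_url, param_grid):
    """Return one URL for every combination of the parameter values."""
    names = list(param_grid)
    urls = []
    for values in product(*(param_grid[name] for name in names)):
        # urlencode escapes the values and joins them as name=value&name=value
        query = urlencode(dict(zip(names, values)))
        urls.append(f"{base_url}?{query}")
    return urls


urls = build_urls(
    BASE_URL,
    {"ticker": ["AAPL", "MSFT"], "period": ["annual", "quarterly"]},
)
for url in urls:
    print(url)
```

Each generated URL can then be requested separately, which works only as long as the page content is fully determined by its URL.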

The source code (which is an HTML document representing DOM — Document Object Model) allows anyone to quickly search (or scrape) any text in the tree-like DOM document using CSS Selector and/or XPath language.
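A small illustration of searching the DOM tree with XPath is sketched below. It uses only the Python standard library and a made-up, well-formed snippet. Real pages are rarely valid XML, so in practice a dedicated HTML parser would likely be needed. The table layout, class names, and identifiers are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up page fragment; class names and identifiers are placeholders.
PAGE_SOURCE = """
<html><body>
  <table id="etfs">
    <tr class="fund"><td class="isin">XX0000000001</td><td class="name">Fund A</td></tr>
    <tr class="fund"><td class="isin">XX0000000002</td><td class="name">Fund B</td></tr>
    <tr class="total"><td>2 funds</td></tr>
  </table>
</body></html>
"""

root = ET.fromstring(PAGE_SOURCE.strip())

rows = []
# XPath: every <tr> with class "fund", anywhere below the root.
# The equivalent CSS selector would be "tr.fund".
for row in root.findall(".//tr[@class='fund']"):
    isin = row.find("td[@class='isin']").text
    name = row.find("td[@class='name']").text
    rows.append((isin, name))

print(rows)
```

ElementTree supports only a subset of XPath, which is enough for simple tag and attribute lookups like these.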

The main point of this article is that we can’t confine ourselves to that (convenient) approach anymore.

There are three main reasons for this:

  • On many occasions, the content of a webpage is generated dynamically after a user presses a button on the page. The easiest example is pagination with “Next” and “Previous” buttons, which change the loaded content while the page URL remains the same, so you can’t get the new page data by repeating the same query.
  • Some websites fail to fully load with a plain URL request (which is what we used in previous articles), and there are additional resources (like Javascript) to be downloaded and executed to update the content (they may be slow to load or may wait for a human action before loading). To verify this, you can try to disable the Javascript code in your browser and check some of your favourite websites — you will be surprised how the layout (colours, script size, padding, etc.) and text can change.
  • Even if the website loads eventually, you may need to wait some time (while the important elements are loaded) before retrieving the page source within a script.

We recommend the Selenium library as a solution to the aforementioned problems. Please read the tutorials on Locating elements and Waits, as they helped me to get the full text of a page and find the specific HTML elements.

With this new set-up, you may be able to “pretend” to be a real human, who opens the browser, presses the buttons, and waits until all scripts are loaded and executed (human analogy: all page elements are “visible” in the Browser). Then you can download the source code received by a programmatic “browser” and work as usual when you do scraping (described in the previous articles).
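The loop below is a rough sketch of that idea, not the code from the Colab notebook. The CSS selectors are hypothetical, and the polling helper is a hand-written stand-in for the explicit waits that Selenium offers. The pagination logic accepts any driver-like object, so Selenium itself is imported only inside the function that launches a real browser.

```python
import time


def wait_for(condition, timeout=10.0, poll=0.5):
    """Poll `condition()` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within timeout")
        time.sleep(poll)


def collect_page_sources(driver, row_selector, next_selector, max_pages=100):
    """Click through a paginated table and return the HTML of every page.

    `driver` must behave like a Selenium WebDriver: it needs a
    `find_elements(by, value)` method and a `page_source` attribute.
    """
    sources = []
    for _ in range(max_pages):
        # Wait until the page scripts have rendered at least one table row.
        # Caveat: rows from the previous page may still be present right after
        # a click, so a real scraper may need a stricter condition here.
        wait_for(lambda: driver.find_elements("css selector", row_selector))
        sources.append(driver.page_source)
        # "css selector" is the string value behind Selenium's By.CSS_SELECTOR.
        next_buttons = driver.find_elements("css selector", next_selector)
        if not next_buttons:
            break  # assumption: the "Next" button disappears on the last page
        next_buttons[0].click()
    return sources


def run_with_chrome(url, row_selector, next_selector):
    """Hypothetical entry point; requires the selenium package and Chrome."""
    from selenium import webdriver  # imported lazily, only for a real run

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        return collect_page_sources(driver, row_selector, next_selector)
    finally:
        driver.quit()
```

Each returned page source can then be parsed with the same CSS Selector or XPath queries as before.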

You can find the full code for scraping the unique identifiers (‘ISINs’) of 1000+ ETFs, and then downloading the specific feature information from each ETF’s product page, on GitHub.
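Since the ISIN serves as the primary key, it can be worth sanity-checking scraped identifiers before requesting the product pages. The sketch below is an optional extra step (it is not described as part of the original notebook) that verifies the standard ISIN check digit with the Luhn algorithm.

```python
import re

# Two-letter country code, nine alphanumeric characters, one check digit.
ISIN_PATTERN = re.compile(r"[A-Z]{2}[A-Z0-9]{9}[0-9]")


def is_valid_isin(isin):
    """Return True if `isin` has the right shape and a correct check digit."""
    if not ISIN_PATTERN.fullmatch(isin):
        return False
    # Letters become two-digit numbers (A=10 ... Z=35); digits stay unchanged.
    digits = "".join(str(int(ch, 36)) for ch in isin)
    total = 0
    for position, ch in enumerate(reversed(digits)):
        value = int(ch)
        if position % 2 == 1:  # double every second digit, counting from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


print(is_valid_isin("LU1615092217"))  # ISIN of a fund discussed later in the article
print(is_valid_isin("LU1615092218"))  # same ISIN with the last digit altered
```

A failed check usually points to a scraping glitch (a truncated or shifted string) rather than to a real fund.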

ETF Classes and Aggregated Statistics

We’ve now managed to collect basic information on 1000+ EU-listed ETF funds — no mean feat!

Let’s explore the macro-universe of the available ETF options as a first approximation: Equity ETFs (758 funds), Bond ETFs (183 funds), Precious Metals (37), Cryptocurrencies (36), Commodities (28), Real Estate (10), Money Market (5).

Fig1. ETF funds split by category

(dynamic version of the graph is available on the PythonInvest website)
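The category shares behind the pie chart can be reproduced from the counts quoted above. This is a plain-Python sketch; the original notebook presumably works on a Pandas dataframe instead.

```python
# Number of funds per category, as listed in the text above.
fund_counts = {
    "Equity": 758,
    "Bonds": 183,
    "Precious Metals": 37,
    "Cryptocurrencies": 36,
    "Commodities": 28,
    "Real Estate": 10,
    "Money Market": 5,
}

total = sum(fund_counts.values())
shares = {category: count / total for category, count in fund_counts.items()}

print(f"Total funds: {total}")
for category, share in shares.items():
    print(f"{category:<18}{share:6.1%}")
```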

While we wrote mostly about the largest US stocks in the previous articles, here, as you can see from the pie chart, we’re going to look at data on a much wider set of options (via ETFs):

  • Sectoral, broad US and European indexes, worldwide indexes covered by the “Equity” funds
  • Corporate and Government bonds (including inflation-protected TIPS) in “Bonds”
  • Gold, Platinum, Silver, Palladium in “Precious Metals”
  • Individual cryptocurrencies (Bitcoin, Ethereum, etc.) and baskets of currencies in “Cryptocurrencies”
  • Energy, precious metals, industrial metals, livestock and agriculture in “Commodities”
  • The largest Real estate public companies and REITs (Real Estate Investment Trust) in “Real Estate”
  • Highly liquid, low risk (and low return) funds paying overnight and short-term rates (Eurozone) in “Money Market”

Let’s look at the aggregated statistics in order to understand the generic “attractiveness” of different asset classes:

Descriptive analysis of the main ETF classes

Here are the details on the columns:

  • ter_compare_min, ter_compare_max: min and max Total Expense Ratio (TER) in the category. These parameters often intersect between different categories (e.g. [0.05%-0.75%] for Bonds ETFs vs. [0.04%-1.38%] for Equity ETFs), and you need to check the distribution of the TER in more detail and compare the concrete fund you’re interested in against it. You can also approximately estimate the max “cost” of owning that class of assets: for Bonds, a 0.75% fee could almost kill the profitability of returns in 2021 (0.87%), while for Commodities, a 0.89% fee is a much smaller portion of the 38% yearly returns of 2021;
  • years_since_inception: average years of an ETF since its inception. You can observe that, on average, funds are 5–7 years old, while Cryptocurrency funds are less than 1 year old (“new” and emerging things are always more expensive to buy), and MoneyMarket funds are 13 years old (highest average age);
  • fund_size, €m: the average size of a fund’s assets. The highest is for Precious Metal ETFs with €2.1b of assets, while for all other classes it is hundreds of millions, and less than €100m for Cryptocurrencies. Indirectly, it can indicate the ‘popularity’ of a class among investors;
  • ter: the average total expense ratio. It explains the difference between ETF classes better than the min and max values, because the category-level TER intervals overlap. All other classes have less than 0.4% TER on average, and only Cryptocurrency ETFs show 1.6% TER (4x more expensive);
  • Volatility 1 year: market volatility of the last 365 days (~ Mar’21 — Apr’22). This is a measure of risk, which varies from as little as 1% for Money Market ETFs, to 91% for Cryptocurrency ETFs;
  • Sharpe_1Y: this is an approximation of the real Sharpe ratio, assuming a zero risk-free return and dividing return_2021 by ‘Volatility 1 year’ (it should be the volatility over all months of 2021, but we’ve used the available Mar’21-Apr’22 data as a ‘proxy’ for volatility in 2021). The metric shows a positive and good (higher than 1) retrospective risk-return profile for ETFs in Commodities, Cryptocurrency, Equity, and Real Estate. Things are different in 2022, so you shouldn’t rely on this metric alone when selecting the asset class to invest in;
  • return_YTD (year-to-date, 1Jan — 9 Apr 2022), return_2021, return_2020: 3 data points of the returns, which show the most recent stats (YTD), and the last 2 years.
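Two of the columns above boil down to simple ratios, sketched here. The fee examples reuse the Bonds and Commodities figures quoted in the list, while the Sharpe inputs are hypothetical numbers chosen only to show the calculation.

```python
def sharpe_proxy(annual_return, volatility, risk_free=0.0):
    """Approximate Sharpe ratio: excess return divided by volatility."""
    return (annual_return - risk_free) / volatility


def fee_share_of_return(ter, annual_return):
    """Fraction of a (positive) yearly return that is consumed by the fee."""
    return ter / annual_return


# Figures from the text, in percent: max TER vs. the 2021 return of the class.
bonds_fee_share = fee_share_of_return(ter=0.75, annual_return=0.87)
commodities_fee_share = fee_share_of_return(ter=0.89, annual_return=38.0)
print(f"Bonds: {bonds_fee_share:.0%} of the 2021 return goes to fees")
print(f"Commodities: {commodities_fee_share:.1%} of the 2021 return goes to fees")

# Hypothetical fund: 20% return in 2021 with 16% volatility over one year.
example_sharpe = sharpe_proxy(annual_return=20.0, volatility=16.0)
print(f"Sharpe proxy: {example_sharpe:.2f}")
```

Both ratios are only meaningful for positive returns and non-zero volatility, so a real analysis would need to handle those edge cases.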

Now let’s add longer-term return stats (the last 1 year and the last 3 years) to the previous list of aggregated statistics:

Short- and Long-term returns for the classes of ETFs

You probably shouldn’t expect the same explosive 7x growth (719% in 3 years) for risky Cryptocurrency ETFs, although it may be a good idea to have some portion of “risky” assets for diversification. Another trend is clearly visible here: Commodities, Equity, and Precious Metals also show high growth (32%-59% in 3 years) — this adds more complexity in the decision making when you try to select the best performing fund for your needs.

Distribution of Returns in the Sequence of Years

We can go even further and analyse the returns distribution at the individual ETF level, while also looking at the changes in performance from year to year.

The first thing to do is to compare the returns distributions for 2022 (‘return_YTD_ann’, the annualised forecast in this case), 2021 (‘return_2021’), and 2020 (‘return_2020’):

Distribution of returns 2020–2022 (YTD)

Commentary: 1) 2020 (green) was a “normal” year when the returns were almost symmetrically distributed around 0; 2) 2021 (orange) showed ‘abnormal’ positive returns (the orange distribution is to the right of the green one); 3) 2022 generally started with negative returns (the blue distribution is to the left), but note the long blue tail to the right: it represents the funds that grew by 10–40% in just the first three months of 2022 (if they keep growing at that pace, the annualised figure can reach hundreds of percent).
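The article does not spell out how ‘return_YTD_ann’ is computed. One common way to annualise a partial-year return, assuming the same growth rate compounds for the rest of the year, is sketched below. The 10% input is a hypothetical example.

```python
from datetime import date


def annualise(period_return, start, end, days_in_year=365):
    """Scale a return earned between `start` and `end` to a full year.

    Assumes compounding at a constant rate, which is an assumption of this
    sketch rather than a documented detail of the original analysis.
    """
    elapsed_days = (end - start).days
    return (1 + period_return) ** (days_in_year / elapsed_days) - 1


# YTD window used in the article: 1 Jan to 9 Apr 2022.
ytd_start, ytd_end = date(2022, 1, 1), date(2022, 4, 9)
annualised = annualise(0.10, ytd_start, ytd_end)  # hypothetical +10% YTD
print(f"+10% YTD over {(ytd_end - ytd_start).days} days -> {annualised:.1%} annualised")
```

Annualising a short window amplifies both gains and losses, so these projections are better read as a pace indicator than as a forecast.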

The next idea is to look at the joint distribution of 2022 (annualised) vs. 2021:

Joint distribution of returns 2021 vs. 2022YTD

If this is your first time seeing this kind of chart, it might be useful to think of it just as a height ‘contour’ map, where the most concentrated regions have more datapoints.

There is one “peak” where most of the values are concentrated.
The centre is approximately at +25% returns in 2021 and -20% returns in 2022, while there is a long tail of highly successful funds in 2022 (annual projection of returns).

This means that, on average, highly successful funds in 2021 tend to lose value in 2022.

Equity ETFs Subcategories

Idea: we want to have smaller groups of ETFs to look at.

Equity ETFs have a large variety of options (758 funds) and represent around 70% of EU ETFs. This number grows to 78% if we look at the total assets share.

We introduce more granularity by adding subcategories derived from the fund labels set on the website: we take the top 3 labels for each fund. For example, the fund LU1615092217 (BNP Paribas Easy MSCI World SRI S-Series PAB 5% Capped UCITS ETF EUR Acc) has 4 labels:

  • MSCI World SRI S-Series PAB 5% Capped (2)
  • Equity (1205)
  • World (296)
  • Social/Environmental (97)

We now can identify that Equity is the largest size label (1205 funds), then it is World (296 funds), then Social/Environmental (97). We ignore the fourth label here.
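The label ranking described above can be sketched as follows. The function and pattern names are hypothetical, and the label strings are assumed to follow the “Name (count)” format shown in the examples.

```python
import re

# A label looks like "World (296)": a name followed by a count in brackets.
LABEL_PATTERN = re.compile(r"^(?P<name>.+?)\s*\((?P<count>\d+)\)$")


def build_hierarchy(labels, depth=3):
    """Return the `depth` most popular label names, largest group first."""
    parsed = []
    for label in labels:
        match = LABEL_PATTERN.match(label.strip())
        if match:
            parsed.append((match["name"], int(match["count"])))
    parsed.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in parsed[:depth]]


labels = [
    "MSCI World SRI S-Series PAB 5% Capped (2)",
    "Equity (1205)",
    "World (296)",
    "Social/Environmental (97)",
]
hierarchy = build_hierarchy(labels)
print(" -> ".join(hierarchy))
```

Sorting by the bracketed count puts the broadest group first, so the narrowest label (here the index name with only 2 funds) is the one that drops out.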

The full pie with three levels of hierarchy is below (you can locate the ETF example above in the outermost blue layer Equity->World->Social/Environmental):

3-level hierarchy for categorising ETF funds (NOT weighted by funds size)

(dynamic version of the graph is available on the PythonInvest website)

Description: 758 Equity ETF funds are divided further into the World, Europe, United States, Emerging Markets, Japan, and other subcategories. Each subcategory has next-level subcategories: for example, for Equity->World they are Social/Environmental, Technology, Basic Materials, and others.

Commentary: The 3-level pie chart roughly outlines the set of the most popular investment ETFs in Europe.

3-level hierarchy for categorising ETF funds (weighted by funds size)

(dynamic version of the graph is available on the PythonInvest website)

Commentary: if you redraw the previous pie chart weighted by fund size, you can see that assets are concentrated in the largest indexes (S&P 500, MSCI USA, MSCI World, DAX), the technology sector, Gold among precious metals, and Corporate World or Europe Government Bonds.

Treecharts of Returns

Idea: to ascertain the market-size split of European ETFs and their long-term/short-term profitability.

3 YR (Apr 2019–Apr 2022) returns of EU ETFs

(dynamic version of the graph is available on the PythonInvest website)

Commentary: The dominance of green in the chart shows robust double-digit growth in 3-year returns (10–100% in 3 years). The obvious thing to note is the highly positive returns of Technology Equity ETFs (and of many other Equity sectors in the USA).

2022 YTD (Jan 2022–Apr 2022) returns of EU ETFs

(dynamic version of the graph is available on the PythonInvest website)

Commentary: 2022 year-to-date returns (1 Jan to 9 Apr) are much less optimistic: a lot of funds are coloured yellow (returns around 0) or light red (slightly negative returns). As you can see, in Equity ETFs there are only a few ‘green’ items with positive returns, many of which have ‘Energy’ in their labels. Commodities and Precious Metals are often considered safe havens, and they grow in popularity during hard times of high market volatility, investor madness, and high observed inflation.

Current Quote to 52-weeks Range

Idea: the next treechart aims to find the most undervalued ETFs as of today (28 March 2022). The more saturated the blue colour, the closer the current quote is to the minimum of its 52-week range. Not all distressed ETFs will necessarily come back to high prices, but some of them are likely to rebound.

The most fallen ETFs are dark blue (the current value is close to the min value in the last 52weeks)

(dynamic version of the graph is available on the PythonInvest website)

Commentary: This picture introduces another angle of analysis and highlights items like Europe, Emerging Markets, Germany, and China Equity ETFs, as well as some Bond ETFs, that fell considerably over the last year and may start growing soon.
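The colour scale of this treechart can be thought of as the position of the current quote inside its 52-week range. Here is a minimal sketch of that calculation with hypothetical prices. The exact formula behind the chart is not stated in the article, so this is an assumption.

```python
def range_position(current, low_52w, high_52w):
    """Position of the current quote in its 52-week range: 0 = at the low, 1 = at the high."""
    if high_52w == low_52w:
        raise ValueError("52-week high and low are equal; the range is empty")
    return (current - low_52w) / (high_52w - low_52w)


# Hypothetical quotes: the first fund sits near its yearly low, the second near its high.
near_low = range_position(current=55.0, low_52w=50.0, high_52w=100.0)
near_high = range_position(current=95.0, low_52w=50.0, high_52w=100.0)
print(f"near the low: {near_low:.0%}, near the high: {near_high:.0%}")
```

Values close to 0 correspond to the dark blue tiles, meaning funds trading near their 52-week minimum.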


In this article we were able to achieve a high-level understanding of investing in ETFs in Europe. We’ve learned how to scrape the list of ETF identifiers (ISINs) and their major features, built a multi-level categorisation of asset classes, and compared the short- and long-term returns.

Thus, of the 1000+ ETFs in Europe, more than 750 are in equity markets, with many competing offerings that cover well-known indexes such as S&P 500, MSCI World, and MSCI USA.

There is an emerging trend of 20+ new funds (with age<1 year) in Crypto (although in 2022 it shows negative returns).

Lastly, the most recent news from January-March brings evidence of rising returns in Energy Equity ETFs and Commodities/Precious Metals.



Ivan Brigida

Data and Product Analyst at Google. I run a website on Python programming, analytics, and stock investing.