
Alphalens Example Tear Sheet

Alphalens is designed to aid in the analysis of "alpha factors," data transformations that are used to predict future price movements of financial instruments. Alpha factors take the form of a single value for each asset on each day. The units or scale of these values are not necessarily important. We evaluate an alpha factor by considering daily factor values relative to one another.
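To make the "one value per asset per day" shape concrete, here is a small pandas sketch with made-up values and hypothetical tickers (AAA, BBB, CCC). It stores the factor as a Series indexed by (date, asset), the same shape as the pipeline output below, and shows that ranking values within each date ignores their overall scale.

```python
import pandas as pd

# Hypothetical toy factor: one value per asset per day, indexed by (date, asset).
dates = pd.to_datetime(["2015-06-30", "2015-07-01"])
assets = ["AAA", "BBB", "CCC"]
index = pd.MultiIndex.from_product([dates, assets], names=["date", "asset"])
factor = pd.Series([0.5, -0.2, 0.1, 0.3, 0.0, -0.4], index=index, name="factor")

# Only the ordering of values within each day matters, so rescaling the
# factor (here by 100) leaves the per-day ranks unchanged.
ranks = factor.groupby(level="date").rank()
scaled_ranks = (factor * 100).groupby(level="date").rank()

print(ranks.tolist())              # [3.0, 1.0, 2.0, 3.0, 2.0, 1.0]
print(ranks.equals(scaled_ranks))  # True
```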

It is important to note the difference between an alpha factor and a trading algorithm. A trading algorithm uses an alpha factor, or a combination of alpha factors, to generate trades. Trading algorithms cover execution and risk constraints: the business of turning predictions into profits. Alpha factors, on the other hand, are focused solely on making predictions. This difference in scope lends itself to a difference in the methodologies used to evaluate alpha factors and trading algorithms. Alphalens does not contain analyses of things like transaction costs, capacity, or portfolio construction. Those interested in more implementation-specific analyses are encouraged to check out pyfolio (https://github.com/quantopian/pyfolio), a library specifically geared towards the evaluation of trading algorithms.

In [1]:
import numpy as np
import pandas as pd
from quantopian.research import run_pipeline
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.factors import CustomFactor, Returns, AverageDollarVolume
from quantopian.pipeline.classifiers.morningstar import Sector
In [2]:
universe_screen = AverageDollarVolume(window_length=20).top(500)
In [3]:
pipe = Pipeline(
    columns={
        'Momentum' : Returns(window_length=252, mask=universe_screen),
        'Sector': Sector(mask=universe_screen),
    },
    screen=universe_screen
)
In [4]:
results = run_pipeline(pipe, '2015-06-30', '2016-06-30')
results = results.fillna(value=0.)
In [5]:
momentum_factor = results["Momentum"]
momentum_factor.head()
Out[5]:
2015-06-30 00:00:00+00:00  Equity(2 [ARNC])    -0.230705
                           Equity(24 [AAPL])    0.363660
                           Equity(62 [ABT])     0.224101
                           Equity(64 [ABX])    -0.402359
                           Equity(67 [ADSK])   -0.106047
Name: Momentum, dtype: float64

The pricing data passed to alphalens should reflect the next available price after a factor value was observed at a given timestamp. The price must not be included in the calculation of the factor for that time. Always double check to ensure you are not introducing lookahead bias to your study.

In our example, before trading starts on 2015-07-01, we observe the previous day's (2015-06-30) factor value. The price we should pass to alphalens is the next available price after that factor observation: the open price on 2015-07-01.
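If you build a factor yourself from closing prices, one way to respect this rule is sketched below, assuming hypothetical daily closes for a single asset and a hypothetical helper two_day_momentum (not part of alphalens). The signal is shifted forward one day so the value labeled D uses closes through D-1 only. The check at the end confirms that changing the close on D leaves the factor for D unchanged.

```python
import pandas as pd

# Hypothetical daily closing prices for one asset.
dates = pd.bdate_range("2015-06-29", periods=5)
closes = pd.Series([10.0, 10.5, 10.2, 10.8, 11.0], index=dates)

def two_day_momentum(prices):
    # Return over the last two closes, shifted one row so the value labeled D
    # uses closes through D-1 only and can be acted on at the open of D.
    return (prices / prices.shift(2) - 1).shift(1)

factor = two_day_momentum(closes)

# Lookahead check: changing the last day's close must not change the
# factor value labeled with that same day.
tampered = closes.copy()
tampered.iloc[-1] = 999.0
tampered_factor = two_day_momentum(tampered)
print(factor.iloc[-1] == tampered_factor.iloc[-1])  # True
```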

In [6]:
assets = results.index.levels[1].unique()
# We need to get a little more pricing data than the 
# length of our factor so we can compare forward returns.
# We'll tack on another month in this example.
pricing = get_pricing(assets, start_date='2015-06-30', end_date='2016-07-31', fields='open_price')
In [7]:
pricing.head()
Out[7]:
Equity(2 [ARNC]) Equity(24 [AAPL]) Equity(62 [ABT]) Equity(64 [ABX]) Equity(67 [ADSK]) Equity(76 [TAP]) Equity(114 [ADBE]) Equity(122 [ADI]) Equity(128 [ADM]) Equity(154 [AEM]) ... Equity(49139 [FIT]) Equity(49141 [CPGX]) Equity(49183 [WRK]) Equity(49209 [BXLT]) Equity(49229 [KHC]) Equity(49242 [PYPL]) Equity(49506 [HPE]) Equity(49515 [RACE]) Equity(49563 [SYF_WI]) Equity(49865 [HOT_WI])
2015-06-30 00:00:00+00:00 34.35 125.57 49.51 10.74 51.16 71.35 82.17 64.34 48.87 28.77 ... 35.05 28.67 55.380 NaN NaN NaN NaN NaN NaN NaN
2015-07-01 00:00:00+00:00 33.60 126.85 49.40 10.63 50.09 70.36 81.57 64.89 48.71 28.30 ... 39.29 29.40 54.847 NaN NaN NaN NaN NaN NaN NaN
2015-07-02 00:00:00+00:00 33.27 126.43 49.80 10.50 50.43 70.04 81.19 64.58 48.82 27.88 ... 41.82 NaN NaN 31.59 NaN NaN NaN NaN NaN NaN
2015-07-06 00:00:00+00:00 32.85 124.94 48.85 10.52 49.78 69.40 80.02 63.85 48.16 28.21 ... 41.15 29.56 57.565 30.80 71.00 NaN NaN NaN NaN NaN
2015-07-07 00:00:00+00:00 32.88 125.89 49.96 10.50 51.01 69.39 80.77 63.98 48.11 28.98 ... 42.36 30.85 57.710 31.70 73.99 37.73 NaN NaN NaN NaN

5 rows × 824 columns

Often, we want to know how our factor behaves across sectors. To generate sector-level breakdowns, you'll need to pass alphalens a sector mapping for each traded name.

This mapping can come in the form of a MultiIndexed Series (with the same date/symbol index as your factor value) if you want to provide a sector mapping for each symbol on each day.

If you'd like to use constant sector mappings, you may pass symbol-to-sector mappings as a dict.

If your sector mappings come in the form of codes (as they do in this tutorial), you may also pass alphalens a dict of sector names to use in place of sector codes.
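As a rough illustration of these forms, this sketch expands a constant asset-to-code dict into a MultiIndexed Series aligned with a hypothetical two-asset, two-day factor index, then swaps the codes for names. The codes 101 and 311 follow the Morningstar table in the next cell; how alphalens itself consumes these inputs is not shown here.

```python
import pandas as pd

# Hypothetical (date, asset) index shaped like the pipeline output.
dates = pd.to_datetime(["2015-06-30", "2015-07-01"])
index = pd.MultiIndex.from_product([dates, ["AAPL", "ABX"]], names=["date", "asset"])

# Constant mapping: asset -> sector code, plus code -> readable name.
sector_codes = {"AAPL": 311, "ABX": 101}
code_names = {101: "Basic Materials", 311: "Technology"}

# Expand the dict into a Series aligned with the factor's (date, asset) index,
# so every asset carries a sector on every day.
sectors = pd.Series(
    index.get_level_values("asset").map(sector_codes).to_numpy(), index=index)

# Replace numeric codes with names.
sector_names = sectors.map(code_names)
print(sector_names.tolist())
# ['Technology', 'Basic Materials', 'Technology', 'Basic Materials']
```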

In [8]:
MORNINGSTAR_SECTOR_CODES = {
    -1: 'Misc',
    101: 'Basic Materials',
    102: 'Consumer Cyclical',
    103: 'Financial Services',
    104: 'Real Estate',
    205: 'Consumer Defensive',
    206: 'Healthcare',
    207: 'Utilities',
    308: 'Communication Services',
    309: 'Energy',
    310: 'Industrials',
    311: 'Technology',
}
In [9]:
sectors = results["Sector"]

Importing Alphalens

In [10]:
import alphalens

Formatting input data

Alphalens contains a handy data formatting function to transform your factor and pricing data into the exact inputs expected by the rest of the plotting and performance functions. This get_clean_factor_and_forward_returns function is the first call in create_factor_tear_sheet.
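We have not reproduced the internals of get_clean_factor_and_forward_returns here. Judging only from the outputs shown below, its result pairs each factor value with its forward returns and adds a group index level. The sketch approximates that shape with plain pandas through a hypothetical helper, clean_inputs. It drops rows with any missing value, which is our assumption about how gaps are handled.

```python
import numpy as np
import pandas as pd

def clean_inputs(factor, forward_returns, groups):
    """Hypothetical helper, not the alphalens implementation: align a
    (date, asset)-indexed factor with forward returns, drop rows where
    anything is missing, and append a 'group' index level."""
    merged = forward_returns.copy()
    merged["factor"] = factor
    merged["group"] = groups
    merged = merged.dropna()
    merged = merged.set_index("group", append=True)
    return merged["factor"], merged.drop(columns="factor")

idx = pd.MultiIndex.from_product(
    [pd.to_datetime(["2015-06-30"]), ["AAPL", "ABX", "XYZ"]], names=["date", "asset"])
factor = pd.Series([0.36, -0.40, 0.10], index=idx)
fwd = pd.DataFrame({1: [0.010, -0.010, np.nan], 5: [-0.007, -0.033, 0.02]}, index=idx)
groups = pd.Series(["Technology", "Basic Materials", "Energy"], index=idx)

clean_factor, clean_fwd = clean_inputs(factor, fwd, groups)
print(list(clean_factor.index.names))  # ['date', 'asset', 'group']
print(len(clean_factor))               # 2 (XYZ dropped: missing 1 day return)
```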

In [13]:
factor, forward_returns = alphalens.utils.get_clean_factor_and_forward_returns(momentum_factor,
                                                                               pricing,
                                                                               groupby=sectors,
                                                                               groupby_labels=MORNINGSTAR_SECTOR_CODES,
                                                                               periods=(1,5,10))

Let's see what that gave us...

In [14]:
factor.head()
Out[14]:
date                       asset              group          
2015-06-30 00:00:00+00:00  Equity(2 [ARNC])   Basic Materials   -0.230705
                           Equity(24 [AAPL])  Technology         0.363660
                           Equity(62 [ABT])   Healthcare         0.224101
                           Equity(64 [ABX])   Basic Materials   -0.402359
                           Equity(67 [ADSK])  Technology        -0.106047
Name: factor, dtype: float64
In [15]:
forward_returns.head()
Out[15]:
                                                                       1         5        10
date                       asset              group
2015-06-30 00:00:00+00:00  Equity(2 [ARNC])   Basic Materials  -0.021834 -0.043668 -0.059389
                           Equity(24 [AAPL])  Technology        0.010194 -0.007327  0.001195
                           Equity(62 [ABT])   Healthcare       -0.002222  0.012725  0.006665
                           Equity(64 [ABX])   Basic Materials  -0.010242 -0.032588 -0.088454
                           Equity(67 [ADSK])  Technology       -0.020915  0.030493  0.039875

You'll notice that our factor doesn't look much different. The only addition is an index level describing the sector of each name, which will come in handy as we perform sector-level aggregations in our performance and plotting functions.

The forward_returns dataframe holds, for each period N in periods, the percent change in open price over the N trading days after a timestamp. For example, the 1 day forward return for ARNC on 2015-06-30 is the percent change from the ARNC open on 2015-06-30 (34.35) to the open on 2015-07-01 (33.60): 33.60 / 34.35 - 1 ≈ -0.021834, matching the table above. The 5 day forward return is the percent change from the 2015-06-30 open to the open 5 trading days later. It is the cumulative change over the whole period, not a per-day average.
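The 1 day figure can be reproduced with a shifted percent change. This minimal sketch uses the ARNC open prices from the pricing.head() output and a hypothetical helper, forward_returns (our own function, not the alphalens one). The trailing NaN it produces is why the pricing request earlier extends a month past the factor's end date.

```python
import pandas as pd

# ARNC open prices copied from the pricing.head() output above.
dates = pd.to_datetime(["2015-06-30", "2015-07-01", "2015-07-02", "2015-07-06", "2015-07-07"])
opens = pd.Series([34.35, 33.60, 33.27, 32.85, 32.88], index=dates)

def forward_returns(prices, period):
    # Percent change from the open on each date to the open `period` rows
    # (trading days) later; the last `period` rows are NaN because no
    # later price is available.
    return prices.shift(-period) / prices - 1

fwd_1 = forward_returns(opens, 1)
print(round(fwd_1.iloc[0], 6))  # -0.021834, matching the forward_returns output
print(fwd_1.isna().sum())       # 1
```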

Returns Analysis

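Returns analysis gives us a direct description of a factor's value: it measures the forward returns associated with different factor values, showing the factor's power in terms of actual returns rather than a summary statistic.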

In [16]:
quantized_factor = alphalens.performance.quantize_factor(factor)
In [17]:
quantized_factor.head()
Out[17]:
date                       asset              group          
2015-06-30 00:00:00+00:00  Equity(2 [ARNC])   Basic Materials    1
                           Equity(24 [AAPL])  Technology         5
                           Equity(62 [ABT])   Healthcare         4
                           Equity(64 [ABX])   Basic Materials    1
                           Equity(67 [ADSK])  Technology         2
Name: quantile, dtype: int64

One of the most basic ways to look at a factor's predictive power is to look at the mean return of each factor quantile.
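Conceptually, quantize_factor and mean_return_by_quantile can be approximated in a few lines of pandas: bucket each day's cross-section into five equal-count quantiles, average the forward returns within each (date, quantile), then average over dates. This sketch uses randomly generated factor values for hypothetical tickers and a made-up, perfectly predictive 1 day return, so the quantile means come out in increasing order. Alphalens' exact handling of ties and missing values may differ.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.bdate_range("2015-06-30", periods=3)
assets = ["A%d" % i for i in range(10)]  # hypothetical tickers
idx = pd.MultiIndex.from_product([dates, assets], names=["date", "asset"])

# Made-up data: the 1 day forward return is exactly proportional to the factor.
factor = pd.Series(rng.normal(size=len(idx)), index=idx)
df = pd.DataFrame({"factor": factor, "fwd_1d": 0.01 * factor}).reset_index()

# Bucket each day's cross-section into 5 equal-count quantiles (1 = lowest).
df["quantile"] = df.groupby("date")["factor"].transform(
    lambda x: pd.qcut(x, 5, labels=False) + 1)

# Mean forward return per (date, quantile), then averaged across dates.
daily_means = df.groupby(["date", "quantile"])["fwd_1d"].mean()
mean_by_q = daily_means.groupby(level="quantile").mean()
print(mean_by_q)
```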

In [21]:
mean_return_by_q_daily, std_err = alphalens.performance.mean_return_by_quantile(quantized_factor, forward_returns,
                                                                                by_group=False,
                                                                                by_date='D')
In [22]:
mean_return_by_q_daily.head()
Out[22]:
                                           1         5        10
date                      quantile
2015-06-30 00:00:00+00:00 1        -0.004784 -0.023814 -0.046939
                          2        -0.001886 -0.002682 -0.002961
                          3        -0.000711  0.004590  0.003581
                          4         0.002174  0.008346  0.018499
                          5         0.005200  0.013605  0.027855
In [24]:
mean_return_by_q, std_err_by_q = alphalens.performance.mean_return_by_quantile(quantized_factor,
                                                                               forward_returns,
                                                                               by_group=False)
In [25]:
mean_return_by_q.head()
Out[25]:
                 1         5        10
quantile
1        -0.000012 -0.000067  0.000320
2        -0.000227 -0.000883 -0.001645
3         0.000016 -0.000437 -0.000467
4         0.000188  0.000789  0.000988
5         0.000035  0.000598  0.000802
In [26]:
alphalens.plotting.plot_quantile_returns_bar(mean_return_by_q);

By looking at the mean forward return by quantile we can see how well the factor differentiates forward returns across signal values. We want securities with a stronger signal to exhibit higher returns, so for a good factor we'd expect negative values in the lower quantiles and positive values in the upper quantiles.

In [27]:
alphalens.plotting.plot_quantile_returns_violin(mean_return_by_q_daily);

This violin plot is similar to the bar chart before it, but it is built from the daily quantile mean returns (mean_return_by_q_daily) rather than their overall average, so it shows more about the underlying data: the range of values, the median, and the interquartile range. The width of each violin at a given return reflects an estimate of the probability density of the daily values at that level.

In [28]:
quant_return_spread, std_err_spread = alphalens.performance.compute_mean_returns_spread(mean_return_by_q_daily, 5, 1, std_err)
In [49]:
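try:
    alphalens.plotting.plot_mean_quantile_returns_spread_time_series(quant_return_spread, std_err_spread, ax=None)
except Exception as e:
    # Report plotting failures instead of silently swallowing them.
    print("Could not plot the quantile returns spread: {}".format(e))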

This rolling forward returns spread graph lets us look at the raw spread, in basis points, between the top and bottom quantiles (here quantile 5 minus quantile 1) over time. The green line is the daily returns spread, while the orange line is a 1 month average that smooths the data and makes it easier to read.
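A sketch of how such a spread series might be computed, assuming hypothetical daily mean returns for quantiles 5 and 1. We use a 22 trading day window as a stand-in for "1 month"; we have not checked which window alphalens uses.

```python
import numpy as np
import pandas as pd

dates = pd.bdate_range("2015-06-30", periods=60)
rng = np.random.default_rng(1)
# Hypothetical daily mean 1 day returns for the top (5) and bottom (1) quantiles.
q5 = pd.Series(rng.normal(0.0005, 0.01, len(dates)), index=dates)
q1 = pd.Series(rng.normal(-0.0005, 0.01, len(dates)), index=dates)

# Daily top-minus-bottom spread in basis points (1 bp = 0.0001).
spread_bps = (q5 - q1) * 10000

# Trailing ~1 month (22 trading day) mean; the first 21 values are NaN.
spread_smoothed = spread_bps.rolling(window=22).mean()
print(spread_smoothed.dropna().head())
```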

In [50]:
alphalens.plotting.plot_cumulative_returns_by_quantile(mean_return_by_q_daily);

By looking at the cumulative returns by factor quantile we can get an intuition for which quantiles are contributing the most to the factor and at what time. Ideally we would like to see these curves originate at the same value on the left and spread out like a fan as they move to the right through time, with the higher quantiles on top.
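Cumulative curves like these come from compounding daily returns. A minimal sketch with made-up 1 day returns for quantiles 1 and 5 follows. Only the 1 day period should be compounded this way; chaining overlapping 5 or 10 day returns day by day would count the same price moves several times.

```python
import pandas as pd

dates = pd.bdate_range("2015-06-30", periods=3)
# Hypothetical daily mean 1 day returns for two quantiles (columns).
daily = pd.DataFrame({1: [-0.01, 0.02, -0.005], 5: [0.01, 0.0, 0.015]}, index=dates)

# Compound the daily returns: each value is the growth of 1 unit invested
# at the start, so the curves begin at 1 + the first day's return.
cumulative = (1 + daily).cumprod()
print(cumulative.round(6))
```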

In [51]:
ls_factor_returns = alphalens.performance.factor_returns(factor, forward_returns)
In [52]:
ls_factor_returns.head()
Out[52]:
                                  1         5        10
date
2015-06-30 00:00:00+00:00  0.008775  0.026183  0.047551
2015-07-01 00:00:00+00:00  0.006157  0.015147  0.041300
2015-07-02 00:00:00+00:00  0.002873  0.013388  0.045471
2015-07-06 00:00:00+00:00  0.013099  0.022425  0.052212
2015-07-07 00:00:00+00:00 -0.006675  0.007369  0.045142
In [53]:
alphalens.plotting.plot_cumulative_returns(ls_factor_returns[1]);

While looking at quantiles is important, we must also look at the factor returns as a whole. The cumulative factor long/short returns plot lets us view the combined effect of our entire factor over time.
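One common way to turn a factor into a single long/short return series is to weight each asset by its demeaned factor value, scaled so the absolute weights sum to 1 each day. We believe alphalens' factor_returns works along these lines, but treat the weighting scheme below as an assumption. The tickers and numbers are hypothetical.

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [pd.to_datetime(["2015-06-30"]), ["A", "B", "C", "D"]], names=["date", "asset"])
factor = pd.Series([2.0, 1.0, -1.0, -2.0], index=idx)
fwd_1d = pd.Series([0.02, 0.01, -0.01, 0.00], index=idx)

def long_short_weights(x):
    # Demean within the day so long and short weights net to zero, then
    # scale so gross exposure (sum of absolute weights) is 1.
    demeaned = x - x.mean()
    return demeaned / demeaned.abs().sum()

weights = factor.groupby(level="date").transform(long_short_weights)

# Daily long/short return: weighted sum of each asset's forward return.
ls_returns = (weights * fwd_1d).groupby(level="date").sum()
print(weights.round(4).tolist())  # [0.3333, 0.1667, -0.1667, -0.3333]
print(round(ls_returns.iloc[0], 6))
```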

In [58]:
alpha_beta = alphalens.performance.factor_alpha_beta(factor, forward_returns,
                                                     factor_returns=ls_factor_returns)
In [59]:
alpha_beta
Out[59]:
                   1         5        10
Ann. alpha -0.008988 -0.010916 -0.013903
beta       -0.320250 -0.451156 -0.418124

A very important part of factor returns analysis is determining the alpha and how significant it is. Here we surface the annualized alpha and the beta of the factor's returns for each period.
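To see what alpha, beta, and significance mean here, this sketch runs an ordinary least squares regression of factor returns on a benchmark return and computes a t-stat for the intercept. The benchmark (an equal-weighted universe mean), the simulated data, and the annualization convention are all our assumptions, not necessarily what factor_alpha_beta does.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical daily returns: an equal-weighted universe mean, and factor
# returns built with a known beta of -0.3 and a tiny daily alpha.
n = 250
universe = rng.normal(0.0003, 0.01, n)
factor_rets = 0.0001 - 0.3 * universe + rng.normal(0, 0.002, n)

# OLS fit of factor_rets = alpha + beta * universe.
X = np.column_stack([np.ones_like(universe), universe])
(alpha, beta), *_ = np.linalg.lstsq(X, factor_rets, rcond=None)

# Standard error of alpha from the residual variance, then its t-stat.
resid = factor_rets - X @ np.array([alpha, beta])
sigma2 = resid @ resid / (n - 2)
se_alpha = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[0, 0])
t_alpha = alpha / se_alpha

# One possible annualization (assumption): compound the daily alpha over 252 days.
ann_alpha = (1 + alpha) ** 252 - 1
print(round(beta, 2), round(t_alpha, 2), round(ann_alpha, 4))
```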

Information Analysis

Information Analysis is a way for us to evaluate the predictive value of a factor without the confounding effects of transaction costs. The main tool we use for this is the Information Coefficient (IC).

To learn more about the Information Coefficient and Spearman Rank Correlation check out the Spearman Rank Correlation lecture from the Quantopian Lecture Series.
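The IC for each date is the Spearman rank correlation between that day's factor values and forward returns across assets. This sketch computes it as the Pearson correlation of the ranks, using two hypothetical dates: one where returns are perfectly ordered by the factor and one where the order is reversed. The helper spearman_ic is our own illustration, not an alphalens function.

```python
import numpy as np
import pandas as pd

def spearman_ic(df):
    # Spearman rank correlation = Pearson correlation of the ranks.
    r_factor = df["factor"].rank()
    r_fwd = df["fwd"].rank()
    return np.corrcoef(r_factor, r_fwd)[0, 1]

idx = pd.MultiIndex.from_product(
    [pd.to_datetime(["2015-06-30", "2015-07-01"]), ["A", "B", "C", "D"]],
    names=["date", "asset"])
data = pd.DataFrame({
    "factor": [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0],
    "fwd":    [0.01, 0.02, 0.03, 0.04, 0.04, 0.03, 0.02, 0.01],
}, index=idx)

# One IC per date, computed across that day's assets.
ic = data.groupby(level="date").apply(spearman_ic)
print(ic.round(6).tolist())  # [1.0, -1.0]
```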

In [60]:
ic = alphalens.performance.factor_information_coefficient(factor, forward_returns)
In [61]:
ic.head()
Out[61]:
                                  1         5        10
date
2015-06-30 00:00:00+00:00  0.333019  0.302986  0.459407
2015-07-01 00:00:00+00:00  0.263502  0.256613  0.435593
2015-07-02 00:00:00+00:00  0.080600  0.270333  0.469977
2015-07-06 00:00:00+00:00  0.458503  0.417616  0.485181
2015-07-07 00:00:00+00:00 -0.190300  0.239619  0.445897
In [65]:
alphalens.plotting.plot_ic_ts(ic);