
FactSet Estimates Examples

Today, we introduced a new dataset to the platform: FactSet Estimates. This is a new source of data that you can use to write contest algorithms. Importantly, it is the first source of estimates data available on Quantopian, which makes it a great opportunity to research and develop new strategies that aren't correlated to algorithms that have already been licensed. Having a unique algorithm is an important component of Quantopian's evaluation process, so if you're interested in getting an allocation, developing a strategy with estimates data is highly encouraged.

Reminder: there is a 1-year holdout on FactSet Estimates data. This means you cannot conduct research using the most recent year of data. However, this restriction does not apply to the contest or the allocation evaluation process.

What's in this notebook?

The majority of this notebook is focused on providing example pipelines that use estimates data. Each example has a brief explanation of what the pipeline is doing and highlights important features or concepts that are new to the example.

Importantly, this notebook doesn't get into too much detail about the new API features that were added to pipeline with FactSet Estimates. If you want to learn more about the new API features, see the first notebook attached to this post.

Usage Overview

There are two FactSet Estimates dataset families currently available in pipeline:

  • PeriodicConsensus - This provides access to consensus estimates data.
  • Actuals - This provides access to actual reported values corresponding to the metrics from consensus estimates.

How to Import

In [1]:
import quantopian.pipeline.data.factset.estimates as fe

fe.PeriodicConsensus
fe.Actuals

# Or

from quantopian.pipeline.data.factset.estimates import (
    PeriodicConsensus,
    Actuals
)

About the Datasets

PeriodicConsensus and Actuals are both DataSetFamily objects, which means they need to be sliced before they can be used to build a factor. Slicing either family with 3 parameters, item, freq, and period_offset, creates a standard pipeline DataSet. A short slicing example follows the parameter list below.

  • item: The report item. Valid options are listed in the Data Reference.
  • freq: The period frequency. Valid options are 'qf' (quarterly reports), 'saf' (semi-annual reports), and 'af' (annual reports). Note that quarterly and annual reports have more estimates than semi-annual reports.
  • period_offset: The relative offset of the period being estimated. Valid options for PeriodicConsensus are currently -128 to 127, which denote the number of fiscal periods to look forward (+) or backward (0 or -). For Actuals, valid options are -128 to 0.
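For example, here is a minimal slicing sketch using the same calls that appear in Example 1 later in this notebook. Each slice produces a regular pipeline DataSet whose columns (e.g. mean, actual_value) can be turned into factors with .latest:

import quantopian.pipeline.data.factset.estimates as fe

# Mean consensus EPS estimate for the upcoming fiscal quarter (fq1).
fq1_eps_cons = fe.PeriodicConsensus.slice('EPS', 'qf', 1)
fq1_eps_cons_mean = fq1_eps_cons.mean.latest

# Actual reported EPS from the most recently published fiscal quarter (fq0).
fq0_eps_act = fe.Actuals.slice('EPS', 'qf', 0)
fq0_eps_value = fq0_eps_act.actual_value.latest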

The remainder of this notebook walks through example pipelines that use FactSet Estimates.

Examples

The following imports need to be run before running the example pipelines. Note that all of our examples that run over the US_EQUITIES domain are screened down to the QTradableStocksUS (QTU), as contest algorithms are required to trade within the QTU.

In [2]:
from quantopian.pipeline import Pipeline, CustomFactor
import quantopian.pipeline.data.factset.estimates as fe
from quantopian.pipeline.domain import US_EQUITIES, CA_EQUITIES, DE_EQUITIES
from quantopian.pipeline.filters import QTradableStocksUS

from quantopian.research import run_pipeline

Example 1 - EPS Estimate and Actual

This pipeline gets the latest mean consensus EPS estimate for the upcoming fiscal quarter (fq1), as well as the actual EPS from the most recently reported quarter (fq0). The output includes the 'period label' of each of these fiscal periods, which corresponds to the end date of the period being estimated or reported. Note that since pipeline terms are defined using relative period offsets, the period labels are expected to change over the simulation period. The example is run over the US_EQUITIES domain from May 2015 to May 2016.

In [3]:
# Create a dataset of EPS estimates for the upcoming fiscal quarter (fq1).
fq1_eps_cons = fe.PeriodicConsensus.slice('EPS', 'qf', 1)

# Define a pipeline factor that gets the latest mean estimate EPS for fq1.
fq1_eps_cons_mean = fq1_eps_cons.mean.latest

# Define a pipeline factor with the period label (end date) of fq1.
fq1_eps_period_date = fq1_eps_cons.period_label.latest

# Create a dataset of EPS actuals from the most recently published fiscal quarter (fq0).
fq0_eps_act = fe.Actuals.slice('EPS', 'qf', 0)

# Define a pipeline factor that gets the latest actual EPS for fq0.
fq0_eps_value = fq0_eps_act.actual_value.latest

# Define a pipeline factor with the period label (end date) of fq0.
fq0_eps_period_date = fq0_eps_act.period_label.latest

pipe1 = Pipeline(
    columns={
        'fq1_eps_cons_mean': fq1_eps_cons_mean,
        'fq1_eps_period_date': fq1_eps_period_date,
        'fq0_eps_value': fq0_eps_value,
        'fq0_eps_period_date': fq0_eps_period_date,
    },
    domain=US_EQUITIES,
    screen=QTradableStocksUS(),
)
In [4]:
df1 = run_pipeline(pipe1, '2015-05-01', '2016-05-01')

Here is a look at the daily result for AAPL (sid 24). Note that the EPS estimate (fq1_eps_cons_mean) can change from day to day as analysts change their estimates or new estimates are made.

In [5]:
df1.dropna().xs(24, level=1).head()
Out[5]:
fq0_eps_period_date fq0_eps_value fq1_eps_cons_mean fq1_eps_period_date
2015-05-01 00:00:00+00:00 2015-03-31 2.33 1.390365 2015-06-30
2015-05-04 00:00:00+00:00 2015-03-31 2.33 1.737424 2015-06-30
2015-05-05 00:00:00+00:00 2015-03-31 2.33 1.737424 2015-06-30
2015-05-06 00:00:00+00:00 2015-03-31 2.33 1.737424 2015-06-30
2015-05-07 00:00:00+00:00 2015-03-31 2.33 1.740277 2015-06-30

Here is a look at AAPL's results again, but zoomed in on a time period where a report for AAPL is published, meaning the relative fiscal quarter 'hops' to the next quarter.

In [6]:
df1.dropna().xs(24, level=1).loc['2015-07-20':'2015-07-24']
Out[6]:
fq0_eps_period_date fq0_eps_value fq1_eps_cons_mean fq1_eps_period_date
2015-07-20 00:00:00+00:00 2015-03-31 2.33 1.806457 2015-06-30
2015-07-21 00:00:00+00:00 2015-03-31 2.33 1.808164 2015-06-30
2015-07-22 00:00:00+00:00 2015-03-31 2.33 1.812554 2015-06-30
2015-07-23 00:00:00+00:00 2015-06-30 1.85 1.859802 2015-09-30
2015-07-24 00:00:00+00:00 2015-06-30 1.85 1.861586 2015-09-30

And here is a cross-sectional look at the result. Note that a given simulation date can have different period end dates for fq0 and fq1, as companies have different fiscal calendars and publish reports at different times.

In [7]:
df1.dropna().head()
Out[7]:
fq0_eps_period_date fq0_eps_value fq1_eps_cons_mean fq1_eps_period_date
2015-05-01 00:00:00+00:00 Equity(2 [ARNC]) 2015-03-31 0.84 0.783570 2015-06-30
Equity(24 [AAPL]) 2015-03-31 2.33 1.390365 2015-06-30
Equity(31 [ABAX]) 2014-12-31 0.26 0.263333 2015-03-31
Equity(39 [DDC]) 2015-01-31 -0.01 0.290000 2015-04-30
Equity(41 [ARCB]) 2014-12-31 0.53 0.107500 2015-03-31

Example 2 - Estimated Sales Growth Factor

This pipeline defines an estimated quarterly sales growth factor. The growth factor is defined by taking the relative difference between the sales estimates for the upcoming quarter (fq1) and the following quarter (fq2). The example is run over the CA_EQUITIES domain between May 2016 and May 2017.

In [8]:
# Create datasets of sales estimates for the upcoming two fiscal quarters (fq1 and fq2).
fq1_sales_cons = fe.PeriodicConsensus.slice('SALES', 'qf', 1)
fq2_sales_cons = fe.PeriodicConsensus.slice('SALES', 'qf', 2)

# Define factors that get the latest mean sales estimate for fq1 and fq2.
fq1_sales_cons_mean = fq1_sales_cons.mean.latest
fq2_sales_cons_mean = fq2_sales_cons.mean.latest

# Define an estimated sales growth factor as the relative difference between
# the fq2 mean sales estimate and the fq1 mean sales estimate.
estimated_growth_factor = (fq2_sales_cons_mean - fq1_sales_cons_mean) / fq1_sales_cons_mean

pipe2 = Pipeline(
    columns={
        'fq1_sales_mean': fq1_sales_cons_mean,
        'fq2_sales_mean': fq2_sales_cons_mean,
        'estimated_sales_growth_factor': estimated_growth_factor,
    },
    domain=CA_EQUITIES,
)
In [9]:
df2 = run_pipeline(pipe2, '2016-05-05', '2017-05-05')
In [10]:
df2.dropna().head()
Out[10]:
estimated_sales_growth_factor fq1_sales_mean fq2_sales_mean
2016-05-05 00:00:00+00:00 Equity(1178884003878983 [PNC.A]) -0.052301 2.390000e+08 2.265000e+08
Equity(1178892628414550 [CET]) -0.299003 4.423830e+07 3.101090e+07
Equity(1178904908683334 [FAS]) 4.028571 2.100000e+04 1.056000e+05
Equity(1178904990660426 [FSZ]) 0.045201 6.601600e+07 6.900000e+07
Equity(1178905443717962 [SJR.B]) -0.056980 1.398740e+09 1.319040e+09

Example 3 - Earnings Surprise Factor

This pipeline defines an earnings surprise factor, computed as the relative difference between the actual EPS from the most recently published quarterly report (fq0) and the last mean consensus estimate prior to the report being published. The surprise factor represents the difference (or 'surprise') between what analysts expected and what the company actually reported. The example is run over the US_EQUITIES domain from May 2016 to May 2017.

In [11]:
# Create a dataset of EPS estimates for the most recently published quarterly report (fq0).
fq0_eps_cons = fe.PeriodicConsensus.slice('EPS', 'qf', 0)

# Create a dataset of actual EPS values for the most recently published quarterly report (fq0).
fq0_eps_act = fe.Actuals.slice('EPS', 'qf', 0)

# Define a factor of the last mean consensus EPS estimate prior to the fq0 report being published.
fq0_eps_cons_mean = fq0_eps_cons.mean.latest

# Define a factor of the actual EPS from fq0.
fq0_eps_act_value = fq0_eps_act.actual_value.latest


# Define a surprise factor to be the relative difference between the actual EPS and the final
# mean estimate made prior to the report being published. When the consensus estimate is positive,
# a positive value means the company beat analyst expectations and a negative value means it
# missed them (the interpretation flips when the consensus estimate is negative).
fq0_surprise = (fq0_eps_act_value - fq0_eps_cons_mean) / fq0_eps_cons_mean

pipe3 = Pipeline(
    columns={
        'fq0_eps_cons': fq0_eps_cons_mean,
        'fq0_eps_act': fq0_eps_act_value,
        'fq0_surprise_factor': fq0_surprise,
    },
    domain=US_EQUITIES,
    screen=QTradableStocksUS(),
)
In [12]:
df3 = run_pipeline(pipe3, '2016-05-05', '2017-05-05')
In [13]:
df3.dropna().head()
Out[13]:
fq0_eps_act fq0_eps_cons fq0_surprise_factor
2016-05-05 00:00:00+00:00 Equity(2 [ARNC]) 0.21 0.321216 -0.346234
Equity(24 [AAPL]) 1.90 2.316661 -0.179854
Equity(31 [ABAX]) 0.39 0.320000 0.218750
Equity(39 [DDC]) -0.41 0.175000 -3.342857
Equity(41 [ARCB]) -0.23 -0.108795 1.114068

Example 4 - Up/Down/Total Number of Revisions Over the Next 8 Quarters and Next 2 Years

This example computes the number of up/down/total revisions to estimates over the most recent consensus window. As a reminder, the 'consensus window' usually includes analyst estimates from the last 100 days (see the reference for more information). The up/down/total revisions factors are defined for the next 8 quarters ('qf') and the next 2 years ('af') using for loops and the new get_column function. This example is run over the DE_EQUITIES domain in August 2016. Note that this example defines each factor and stores them all in a dictionary called pipeline_columns. At the end of the cell, a pipeline is constructed with all of the pipeline columns.

In [14]:
# The set of columns that we would like to get from the datasets we slice from PeriodicConsensus.
data_column_names = ['up', 'down', 'num_est']

# We will add columns to this dictionary and pass this to the Pipeline constructor later.
pipeline_columns = {}

# Create a quarterly EPS estimate dataset for each of the next 8 fiscal quarters (fq1 to fq8). 
# For each of these datasets, define 3 factors to be the latest 'up', 'down', and 'num_est' values
# for that quarter and add the factor to our pipeline_columns dictionary.
for per_rel_q in range(1, 9):
    fq = fe.PeriodicConsensus.slice('EPS', 'qf', per_rel_q)
    for col in data_column_names:
        dataset_col = fq.get_column(col)
        pipeline_columns['fq%s_%s' % (per_rel_q, dataset_col.name)] = dataset_col.latest

# Similar to the first for loop, create an annual EPS estimate dataset for the next 2 fiscal years
# (fy1 and fy2). Define 3 factors for each of those years to be the latest 'up', 'down', and 'num_est'
# values and add them to our pipeline_columns dictionary.
for per_rel_y in range(1, 3):
    fy = fe.PeriodicConsensus.slice('EPS', 'af', per_rel_y)
    for col in data_column_names:
        dataset_col = fy.get_column(col)
        pipeline_columns['fy%s_%s' % (per_rel_y, dataset_col.name)] = dataset_col.latest

# Create a pipeline with all of our columns.
pipe4 = Pipeline(columns=pipeline_columns, domain=DE_EQUITIES)
In [15]:
df4 = run_pipeline(pipe4, '2016-08-05', '2016-09-06')
In [16]:
df4.dropna().head()
Out[16]:
fq1_down fq1_num_est fq1_up fq2_down fq2_num_est fq2_up fq3_down fq3_num_est fq3_up fq4_down ... fq7_up fq8_down fq8_num_est fq8_up fy1_down fy1_num_est fy1_up fy2_down fy2_num_est fy2_up
2016-08-05 00:00:00+00:00 Equity(1178978509931082 [HTJ]) 2.0 12.0 6.0 6.0 13.0 4.0 5.0 7.0 2.0 2.0 ... 1.0 1.0 1.0 0.0 11.0 19.0 4.0 6.0 14.0 5.0
Equity(1179060014167627 [PUM]) 0.0 1.0 1.0 1.0 1.0 0.0 0.0 1.0 0.0 0.0 ... 0.0 0.0 1.0 0.0 8.0 18.0 6.0 2.0 2.0 0.0
Equity(1179987811189325 [DWNI]) 0.0 4.0 1.0 0.0 2.0 1.0 0.0 1.0 1.0 0.0 ... 0.0 0.0 1.0 1.0 5.0 16.0 4.0 2.0 18.0 7.0
Equity(1179995930245446 [AE9]) 4.0 16.0 9.0 0.0 1.0 1.0 0.0 2.0 2.0 2.0 ... 0.0 0.0 1.0 0.0 4.0 12.0 2.0 3.0 13.0 6.0
Equity(1180060908536406 [AIXA]) 1.0 4.0 0.0 0.0 3.0 1.0 0.0 1.0 0.0 1.0 ... 1.0 0.0 1.0 0.0 2.0 24.0 3.0 5.0 21.0 5.0

5 rows × 30 columns

Example 5 - Changes In Mean Consensus Estimate Over Last 6 Months

This example computes the percent change in the mean fq1 CFPS (cash flow per share) estimate over the last 126 trading days, as well as the change in the number of estimates contributing to the consensus value over the same trailing 126 days. The percent change factor is defined using a custom factor called PctChange, while the change in the number of estimates is defined using a custom factor called Difference. The latest number of fq1 estimates is also included as a column in the pipeline to give a sense of how many estimates are included in the consensus. The example is computed over the US_EQUITIES domain for a single day at the start of 2013.

In [17]:
# Compute the relative difference between the most recent value and the oldest value in 
# a lookback window. We will use this to compute the change in the mean consensus CFPS
# estimate over the last 126 days.
class PctChange(CustomFactor):
    def compute(self, today, asset_ids, out, values):
        out[:] = (values[-1] - values[0]) / values[0]

# Compute the absolute difference between the most recent value and the oldest value in 
# a lookback window. We will use this to compute the change in the number of estimates
# in the consensus window over the last 126 days.
class Difference(CustomFactor):
    def compute(self, today, asset_ids, out, values):
        out[:] = values[-1] - values[0]
In [18]:
# Create a dataset of CFPS estimates for the upcoming fiscal quarter (fq1).
fq1_cfps_cons = fe.PeriodicConsensus.slice('CFPS', 'qf', 1)

# Define a factor as the percent change of mean CFPS estimate over the last 126 days.
# Note that the entire timeseries supplied to PctChange represents estimates for the same
# fiscal quarter. The estimates in this window all correspond to the upcoming report (fq1)
# relative to pipeline's simulation date. If the pipeline is simulating computations
# for 03-02-2015, and the next fiscal quarter for a company ends on 03-31-2015, then the
# timeseries of mean CFPS estimates on that day will all correspond to the quarter ending
# on 03-31-2015.
fq1_cfps_cons_mean_pct_change = PctChange(inputs=[fq1_cfps_cons.mean], window_length=126)

# Define a factor as the difference in number of CFPS estimates over the last 126 days.
# The difference is defined as the number of estimates in the consensus window as of yesterday
# minus the number of estimates in the consensus window 126 days ago.
fq1_cfps_cons_num_est_change = Difference(inputs=[fq1_cfps_cons.num_est], window_length=126)


pipe5 = Pipeline(
    columns={
        'fq1_cfps_cons_mean_pct_change': fq1_cfps_cons_mean_pct_change,
        'fq1_cfps_cons_num_est_change': fq1_cfps_cons_num_est_change,
        'fq1_cfps_num_est': fq1_cfps_cons.num_est.latest,
    },
    domain=US_EQUITIES,
    screen=QTradableStocksUS(),
)
In [19]:
df5 = run_pipeline(pipe5, '2013-01-01', '2013-01-01')
In [20]:
df5.dropna().head()
Out[20]:
fq1_cfps_cons_mean_pct_change fq1_cfps_cons_num_est_change fq1_cfps_num_est
2013-01-02 00:00:00+00:00 Equity(2 [ARNC]) -0.030425 0.0 5.0
Equity(24 [AAPL]) -0.765808 2.0 3.0
Equity(39 [DDC]) 0.071993 0.0 2.0
Equity(64 [GOLD]) -0.147692 0.0 8.0
Equity(67 [ADSK]) -0.069838 0.0 2.0

There are many other ideas that you can explore with this dataset. As a reminder, we recently increased the limit on the number of contest entries allowed per person. Try poking through the new data to see if you can come up with ways to incorporate it into existing strategies, or come up with new ideas altogether and submit them to the contest!