FactSet Estimates and Dataset Families

Today, we added Pipeline API support for FactSet Consensus Estimates and FactSet Actuals.

We believe that this release is an exciting opportunity for the Quantopian community. Together, FactSet Estimates and Actuals represent one of the largest additions of new data we've made to the Quantopian platform. Using the new estimates data, you can create signals and strategies that were not previously possible. For example, you can now:

  • Account for current analyst expectations of future earnings.
  • Examine how analyst projections of future earnings have changed over time.
  • Compute earnings surprises by comparing estimated values to company-provided actuals.

Algorithms that use FactSet estimates data are eligible for entry into the Quantopian Contest. They are also eligible to be considered for allocations.

The addition of estimates data to Quantopian also comes with new platform concepts. This notebook introduces estimates data and the new features we added to enable the usage of estimates data in pipeline.

What's in This Notebook?

This notebook is focused on teaching new concepts and API features associated with the recent release of estimates data on Quantopian. In this notebook, we will:

  • Provide a brief introduction to financial estimates.
  • Introduce DataSetFamily, a new pipeline feature that, along with a new slice method, allows us to create DataSets by specifying coordinates.
  • Introduce PeriodicConsensus and Actuals, two new DataSetFamilys that allow us to use consensus estimates and actual reported values in pipeline.

After reading through this notebook, be sure to check out the notebook attached to the first comment of the announcement post. The second notebook runs through several example pipelines using estimates and is meant to be read as a follow-up to this notebook.

Estimates

In this section, we give a brief overview of important estimates concepts.

What Are Financial Estimates?

Publicly traded companies issue quarterly, semi-annual, and annual financial reports. These reports provide investors with standardized, quantitative measures of a company's financial performance, including metrics like Earnings per Share (EPS), Dividends per Share (DPS), and Cash Flow per Share (CFPS).

Investors want to know how companies will perform in the future, so they hire analysts to make estimates of future company performance. These estimates are usually tied to a particular fiscal quarter or year, allowing investors to compare analysts' estimates with actual values reported by companies.

Consensus Estimates

For large companies, there are usually many analysts making estimates for upcoming fiscal periods, so a common method for working with estimates data is to aggregate estimates from many individual analysts into a single "consensus" estimate. There are many ways to summarize individual estimates into a consensus value.

FactSet provides the following summary statistics in their Consensus Estimates data:

  • Mean
  • Median
  • Lowest Estimate
  • Highest Estimate
  • Number of Estimates
  • Standard Deviation
  • Number of "Up" Estimates Since Previous Revision
  • Number of "Down" Estimates Since Previous Revision
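To make these statistics concrete, here is a minimal sketch of how a set of individual analyst estimates could be summarized into the fields listed above. The analyst ids and values are hypothetical, and two details are assumptions rather than FactSet's documented methodology: the standard deviation uses the population formula, and "up"/"down" count analysts whose current estimate is above/below their estimate at the previous revision.

```python
import statistics


def summarize_consensus(estimates, previous=None):
    """Summarize individual analyst estimates into consensus statistics.

    estimates: dict mapping an analyst id to that analyst's current estimate.
    previous:  optional dict mapping analyst id to the estimate at the
               previous revision; used for the "up" and "down" counts.
    """
    values = list(estimates.values())
    previous = previous or {}
    # Only analysts with an earlier estimate can count as revising up or down.
    up = sum(1 for a, v in estimates.items() if a in previous and v > previous[a])
    down = sum(1 for a, v in estimates.items() if a in previous and v < previous[a])
    return {
        'mean': statistics.mean(values),
        'median': statistics.median(values),
        'low': min(values),
        'high': max(values),
        'num_est': len(values),
        # Population standard deviation (an assumption; FactSet's exact
        # formula is not specified here).
        'std_dev': statistics.pstdev(values),
        'up': up,
        'down': down,
    }


# Hypothetical estimates from four analysts, three of whom had a prior estimate.
current = {'a': 1.00, 'b': 1.10, 'c': 0.90, 'd': 1.20}
prior = {'a': 0.95, 'b': 1.10, 'c': 1.00}
summary = summarize_consensus(current, prior)
print(summary['num_est'], summary['up'], summary['down'])  # 4 1 1
```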

Relative Fiscal Periods

When we work with estimates data in a simulation (e.g., a pipeline or a backtest), we're usually not interested in one fiscal period for the entire simulation. Instead, we're interested in "nearby" fiscal periods, where "nearby" is defined relative to the current simulation time. For a simulation running in early 2014, for example, we're interested in estimates for periods ending in 2013 through 2015-ish, but for a simulation running in 2004, we're more likely interested in estimates for periods ending in 2003 through 2005-ish.

Pipeline API Additions

In this section, we discuss new additions to the Quantopian platform that make it possible to work with FactSet Estimates data in the Pipeline API.

For more information on the Pipeline API generally, see the Pipeline Tutorial.

Background: DataSets, Columns, and Logical Daily Timeseries

When working with the Pipeline API, we describe inputs to expressions using DataSet objects, which are collections of named BoundColumn objects. Each column of a dataset picks out a logical daily timeseries of input data. To be a logical daily timeseries means that, from any simulation perspective, we can look back over history and pick out a timeseries for each asset such that:

  1. We have at most one data point per asset per day.
  2. Values within the timeseries are meaningfully comparable to one another.

The exact definition of "meaningfully comparable" varies from dataset to dataset. In the case of pricing data, for example, it means that values in the timeseries have been adjusted to account for splits and dividends. A good general intuition is that values in a timeseries are meaningfully comparable if it's sensible to average them.
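As an illustration of "meaningfully comparable", the sketch below adjusts a hypothetical price history for a 2-for-1 split. Averaging the raw closes would mix two different share bases; after dividing pre-split prices by the split ratio, the values can be sensibly averaged. The function name and prices are made up for this example, and dividend adjustments are left out.

```python
from datetime import date


def split_adjust(prices, split_date, ratio):
    """Adjust a price history for a stock split.

    prices:     list of (date, raw closing price) pairs.
    split_date: first date that trades on the post-split share basis.
    ratio:      new shares per old share, e.g. 2.0 for a 2-for-1 split.
    """
    # Scale every pre-split price onto the post-split share basis.
    return [(d, p / ratio if d < split_date else p) for d, p in prices]


# Hypothetical closes around a 2-for-1 split on 2020-01-06.
raw = [(date(2020, 1, 2), 100.0), (date(2020, 1, 3), 102.0), (date(2020, 1, 6), 51.5)]
adjusted = split_adjust(raw, date(2020, 1, 6), 2.0)
print([p for _, p in adjusted])  # [50.0, 51.0, 51.5]
```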

Identifying Logical Daily Timeseries for Estimates

To identify a logical daily timeseries of consensus estimates data, we need to fix four variables:

  • Estimate Item (e.g., Earnings per Share, Sales).
  • Reporting Frequency (quarterly, semi-annual, annual).
  • Consensus Aggregation (e.g., mean, median, high, low).
  • Period Offset (e.g., next fiscal quarter, previous fiscal year).

Item, frequency, and consensus aggregation are hopefully straightforward. Period offset requires some explanation.

When working with estimates as timeseries data, it's not useful to simply look at consensus updates in the order they arrive, because doing so would interleave estimates for different fiscal periods. To produce a coherent timeseries, we need to group the records for each equity by fiscal period.
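The grouping step can be sketched in plain Python. The function below is a hypothetical helper, not part of the Pipeline API: it takes consensus updates for one equity in arrival order, interleaved across fiscal periods, and splits them into one timeseries per period label. Dates are ISO strings for brevity, and the values are hypothetical.

```python
from collections import defaultdict


def group_by_period(updates):
    """Split interleaved consensus updates for one equity by fiscal period.

    updates: iterable of (asof_date, period_label, value) in arrival order.
    Returns {period_label: [(asof_date, value), ...]}, each list sorted by date.
    """
    grouped = defaultdict(list)
    for asof_date, period_label, value in updates:
        grouped[period_label].append((asof_date, value))
    return {label: sorted(points) for label, points in grouped.items()}


# Hypothetical updates: estimates for two quarters arrive interleaved.
updates = [
    ('2016-01-05', '2015-12-31', 3.26),
    ('2016-01-05', '2016-03-31', 2.22),
    ('2016-01-07', '2015-12-31', 3.25),
    ('2016-01-07', '2016-03-31', 2.21),
]
by_period = group_by_period(updates)
print(by_period['2015-12-31'])  # [('2016-01-05', 3.26), ('2016-01-07', 3.25)]
```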

In the context of a rolling simulation, the most natural way to do this grouping is to select the "offset" of the fiscal period we're interested in, relative to the current simulation time. This allows us to specify, for example, that we want the timeseries of estimates for the next or previous fiscal quarter/year.

For estimates data in Pipeline, we've adopted the following conventions for relative period offsets:

  • period_offset=1 refers to the period that will be reported next.
  • period_offset=2 refers to the period after that.
  • period_offset=0 refers to the most recently reported period.
  • period_offset=-1 refers to the period prior to most recent.

This convention corresponds to the common industry convention of using "FQ1" as shorthand for "next to be announced fiscal quarter" and "FQ2" for the quarter after "FQ1", etc. Similar conventions are common for "FY1" and "FY2".
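A minimal sketch of the offset convention above, assuming we know each fiscal period's end date and the date its results were reported. The function name, calendar, and report dates are hypothetical. It finds the most recently reported period as of the simulation date and counts forward or backward from it.

```python
from datetime import date


def resolve_period(period_ends, report_dates, as_of, period_offset):
    """Map a relative period_offset to an absolute fiscal period end date.

    period_ends:   chronologically sorted fiscal period end dates.
    report_dates:  dict mapping a period end to the date its results were
                   reported (periods not yet reported are simply absent).
    as_of:         the simulation date.
    period_offset: 0 = most recently reported, 1 = next to be reported,
                   2 = the one after that, -1 = the one before the most recent.
    Returns the period end date, or None if the offset falls outside period_ends.
    """
    # Index of the most recently reported period (-1 if nothing is reported yet).
    last_reported = max(
        (i for i, end in enumerate(period_ends)
         if end in report_dates and report_dates[end] <= as_of),
        default=-1,
    )
    target = last_reported + period_offset
    if 0 <= target < len(period_ends):
        return period_ends[target]
    return None


# Hypothetical quarterly calendar and report dates for one company.
ends = [date(2015, 9, 30), date(2015, 12, 31), date(2016, 3, 31), date(2016, 6, 30)]
reports = {date(2015, 9, 30): date(2015, 10, 27), date(2015, 12, 31): date(2016, 1, 28)}

print(resolve_period(ends, reports, date(2016, 1, 27), 1))  # 2015-12-31
print(resolve_period(ends, reports, date(2016, 1, 28), 1))  # 2016-03-31
```

The same offset resolves to a different absolute period once a new report date is reached, which is the "roll" behavior that makes relative offsets convenient in a rolling simulation.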

Given an item, a frequency, an aggregation method, and a period offset, we can identify a logical daily timeseries of estimates data in a way that's useful in the context of a rolling simulation. In the next section, we show how the new DataSetFamily class allows us to provide these attributes programmatically.

Estimates in Pipeline

Before today, new additions to the Pipeline API came in the form of new DataSet objects like EquityPricing or Fundamentals. Each dataset was a collection of columns and each column picked out a unique logical timeseries of data.

Representing datasets as collections of named columns works well for simple tables with only a handful of columns, but as we worked on integrating FactSet Estimates, we found that we needed a more expressive model. In particular, we found that the existing DataSet API had two significant shortcomings:

  1. DataSet doesn't provide a way to group related columns.
  2. DataSet doesn't provide a way to programmatically select columns based on attributes of the columns.

These shortcomings are particularly acute for estimates data. In the section above, we observed that to identify a logical daily timeseries of consensus estimates data, we needed to fix four variables: item, frequency, aggregation, and offset. For many estimates use cases, it's important to be able to manipulate columns based on these attributes. For example, to calculate an earnings surprise, we need to be able to match up historical estimates for a given (item, frequency, period) with the "actual" value reported by the company.
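For example, a percentage earnings surprise can be sketched by joining consensus and actual values on the shared (item, frequency, period) key. Dividing by the absolute value of the estimate is one common convention (it keeps a "beat" positive even when the estimate is negative), not necessarily the definition used elsewhere on Quantopian. The function, keys, and numbers below are hypothetical.

```python
def earnings_surprise(consensus, actuals):
    """Percentage surprise of reported actuals versus the consensus estimate.

    consensus: dict mapping (item, freq, period_label) to the consensus mean
               that was current just before the actual was released.
    actuals:   dict mapping (item, freq, period_label) to the reported value.
    Returns {key: (actual - estimate) / abs(estimate)} for keys found in both
    dicts; zero estimates are skipped because the ratio is undefined.
    """
    surprises = {}
    for key, actual in actuals.items():
        estimate = consensus.get(key)
        if estimate is None or estimate == 0:
            continue
        surprises[key] = (actual - estimate) / abs(estimate)
    return surprises


# Hypothetical values keyed by (item, freq, period_label).
consensus = {
    ('EPS', 'qf', '2016-03-31'): 2.00,
    ('EPS', 'qf', '2016-06-30'): -0.50,
}
actuals = {
    ('EPS', 'qf', '2016-03-31'): 2.10,   # beat a positive estimate
    ('EPS', 'qf', '2016-06-30'): -0.40,  # smaller loss than expected: also a beat
    ('EPS', 'af', '2016-12-31'): 8.00,   # no matching estimate, so skipped
}
for key, value in sorted(earnings_surprise(consensus, actuals).items()):
    print(key, round(value, 4))
```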

To solve the problems above, we introduced a new DataSetFamily class to the Pipeline API. The primary purpose of DataSetFamily is to make it easier to programmatically manipulate collections of related columns based on common attributes.

There are two DataSetFamilys in this update: PeriodicConsensus and Actuals. Both families are importable from the new quantopian.pipeline.data.factset.estimates module.

In [1]:
from quantopian.pipeline.data.factset.estimates import Actuals, PeriodicConsensus
In [2]:
Actuals
Out[2]:
<DataSetFamily: 'Actuals', extra_dims=['item', 'freq', 'period_offset']>
In [3]:
PeriodicConsensus
Out[3]:
<DataSetFamily: 'PeriodicConsensus', extra_dims=['item', 'freq', 'period_offset']>

As the name suggests, a DataSetFamily is not a DataSet, but rather a collection of DataSets, all of which have the same columns. Each member of a family has an associated tuple of named attributes, which we call its coordinates. To select a member from a dataset family, you call the family's .slice method, passing the coordinates of the desired member.

Slices from both Actuals and PeriodicConsensus have three coordinates:

  • item: The estimated/reported company metric, e.g. 'EPS', 'CFPS'. See the Data Reference for the full list of available items.
  • freq: The reporting frequency of the estimate or actual. Choices are 'qf' (quarterly), 'saf' (semi-annual) and 'af' (annual).
  • period_offset: The relative offset between the current simulation time and the estimated/reported period of interest.
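To illustrate the idea of coordinates, here is a toy, pure-Python model of a dataset family. It is not Quantopian's implementation: the ToyDataSetFamily and ToyDataSet classes are hypothetical. The sketch accepts coordinates positionally or by keyword, validates them against each dimension's allowed values (the freq choices come from the list above), and caches members so the same coordinates always return the same object; that caching is an assumption made for this sketch.

```python
class ToyDataSet:
    """One member of a ToyDataSetFamily: fixed coordinates plus columns."""

    def __init__(self, family_name, coords, columns):
        self.coords = coords
        self.columns = columns
        args = ', '.join('%s=%r' % kv for kv in coords.items())
        self.name = '%s.slice(%s)' % (family_name, args)


class ToyDataSetFamily:
    """A toy model of a dataset family: members indexed by named coordinates."""

    def __init__(self, name, columns, dims):
        # dims maps each coordinate name (in order) to its allowed values,
        # or to None when any value is accepted.
        self.name = name
        self.columns = tuple(columns)
        self.dims = dict(dims)
        self._members = {}

    def slice(self, *args, **kwargs):
        names = list(self.dims)
        if len(args) > len(names):
            raise TypeError('expected at most %d coordinates' % len(names))
        coords = dict(zip(names, args))
        for key, value in kwargs.items():
            if key not in self.dims or key in coords:
                raise TypeError('unexpected or duplicate coordinate: %r' % key)
            coords[key] = value
        missing = [n for n in names if n not in coords]
        if missing:
            raise TypeError('missing coordinates: %s' % missing)
        for n in names:
            allowed = self.dims[n]
            if allowed is not None and coords[n] not in allowed:
                raise ValueError('invalid %s: %r' % (n, coords[n]))
        # Put coordinates in dimension order and cache members, so equal
        # coordinates always return the same dataset object.
        coords = {n: coords[n] for n in names}
        key = tuple(coords.values())
        if key not in self._members:
            self._members[key] = ToyDataSet(self.name, coords, self.columns)
        return self._members[key]


family = ToyDataSetFamily(
    'PeriodicConsensus',
    columns=['mean', 'median', 'period_label'],
    dims={'item': None, 'freq': ('qf', 'saf', 'af'), 'period_offset': None},
)
fq1_eps = family.slice('EPS', 'qf', 1)
print(fq1_eps.name)  # PeriodicConsensus.slice(item='EPS', freq='qf', period_offset=1)
print(fq1_eps is family.slice(item='EPS', freq='qf', period_offset=1))  # True
```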

Periodic Consensus

Each slice from PeriodicConsensus provides consensus summaries of analysts' estimates for a specific item, frequency, and relative offset:

In [4]:
# Estimates for next quarter's EPS.
FQ1_EPS_est = PeriodicConsensus.slice(item='EPS', freq='qf', period_offset=1)
FQ1_EPS_est
Out[4]:
<DataSet: "PeriodicConsensus.slice(item='EPS', freq='qf', period_offset=1)", domain=GENERIC>
In [5]:
# Estimates for Sales two fiscal years out.
FY2_SALES_est = PeriodicConsensus.slice(item='SALES', freq='af', period_offset=2)
FY2_SALES_est
Out[5]:
<DataSet: "PeriodicConsensus.slice(item='SALES', freq='af', period_offset=2)", domain=GENERIC>

PeriodicConsensus slices each contain the following columns:

In [6]:
sorted(c.name for c in FQ1_EPS_est.columns)
Out[6]:
['asof_date',
 'down',
 'high',
 'low',
 'mean',
 'median',
 'num_est',
 'period_label',
 'std_dev',
 'timestamp',
 'up']

The mean, median, high, low, up, down, num_est, and std_dev columns all provide consensus aggregations as calculated by FactSet.

The asof_date, timestamp, and period_label columns provide metadata for values in the other columns:

  • asof_date tells us the date on which the consensus was updated.
  • timestamp tells us the datetime at which Quantopian learned about the consensus update.
  • period_label tells us the label of the period being estimated.
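The relationship between asof_date and timestamp can be sketched as a point-in-time lookup: a value is only usable once its timestamp has passed, and among usable values the one with the latest asof_date wins. This is a simplified model for intuition with hypothetical records; the exact rules the pipeline engine uses (for example, how timestamps are aligned to trading sessions) are not described in this notebook.

```python
from datetime import datetime


def latest_known(records, sim_time):
    """Return the latest value that was knowable at sim_time, or None.

    records: list of dicts with 'asof_date', 'timestamp', and 'value' keys.
    A record is usable only once its timestamp (when the data became known)
    is at or before sim_time. Among usable records, the latest asof_date wins,
    with ties broken by the later timestamp.
    """
    usable = [r for r in records if r['timestamp'] <= sim_time]
    if not usable:
        return None
    best = max(usable, key=lambda r: (r['asof_date'], r['timestamp']))
    return best['value']


# Hypothetical consensus updates: each becomes known a little after its asof_date.
records = [
    {'asof_date': datetime(2016, 1, 4), 'timestamp': datetime(2016, 1, 5, 2, 0), 'value': 3.20},
    {'asof_date': datetime(2016, 1, 5), 'timestamp': datetime(2016, 1, 6, 2, 0), 'value': 3.26},
]
print(latest_known(records, datetime(2016, 1, 5, 9, 30)))  # 3.2
print(latest_known(records, datetime(2016, 1, 6, 9, 30)))  # 3.26
```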

For more info on asof_date and timestamp within the Pipeline API generally, see the Data Reference.

For detailed information on PeriodicConsensus, see Data Reference: FactSet Estimates - Consensus.

Actuals

Each slice from Actuals provides corporate fiscal period "actual" values. Data in Actuals is primarily meant to be used in conjunction with data from PeriodicConsensus.

Actual values are collected via two methodologies: "Actuals" and "Broker Actuals". Whenever possible, the actual value is derived from the corporate press release; when no press release is available for a specific item, the Broker Actual value is surfaced instead. Broker Actuals are calculated when there is no available press release, or when the estimated item is not one of the regularly collected company actuals and no company guidance is available for that item.
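The selection rule described above can be sketched as a small function. The flag codes '1' (company reported) and '3' (broker actual) come from the actual_flag_code column described below; the function and its inputs are hypothetical.

```python
COMPANY_REPORTED = '1'
BROKER_ACTUAL = '3'


def choose_actual(press_release_value, broker_actual_value):
    """Pick the value surfaced as the "actual" for one item and period.

    Prefers the value taken from the corporate press release and falls back
    to the broker actual when no press-release value exists. Returns
    (value, actual_flag_code), or None if neither source has a value.
    """
    # Compare against None explicitly so a reported value of 0.0 is kept.
    if press_release_value is not None:
        return press_release_value, COMPANY_REPORTED
    if broker_actual_value is not None:
        return broker_actual_value, BROKER_ACTUAL
    return None


print(choose_actual(2.10, 2.05))  # (2.1, '1')
print(choose_actual(None, 2.05))  # (2.05, '3')
```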

In [7]:
# Actual values for previous quarter EPS.
FQ0_EPS_act = Actuals.slice(item='EPS', freq='qf', period_offset=0)
FQ0_EPS_act
Out[7]:
<DataSet: "Actuals.slice(item='EPS', freq='qf', period_offset=0)", domain=GENERIC>

There are six columns on each Actuals slice:

  • actual_value
  • asof_date
  • timestamp
  • period_label
  • actual_flag_code
  • publication_date

For most applications, the primary column of interest in Actuals is actual_value, which contains the company reported or broker actual value for the specified item, frequency, and offset.

As with PeriodicConsensus, asof_date, timestamp, and period_label all provide metadata for values in the other columns:

  • asof_date tells us the date that an actual release impacted trading.
  • timestamp tells us the date on which Quantopian learned about an actual release from FactSet.
  • period_label tells us the label of the period for which the actual was reported.

publication_date and actual_flag_code are additional metadata columns specific to Actuals:

  • publication_date tells us the exact datetime at which an actual was released.
  • actual_flag_code is a code denoting the type of actual value. '1' = company reported, '3' = broker actual.
In [8]:
sorted(c.name for c in FQ0_EPS_act.columns)
Out[8]:
['actual_flag_code',
 'actual_value',
 'asof_date',
 'period_label',
 'publication_date',
 'timestamp']

For detailed information on Actuals, see Data Reference: FactSet Estimates - Actuals.

Examples

In this section, we show examples that demonstrate some of the new concepts associated with estimates data.

Use Case: EPS Estimates for Upcoming Quarter

One of the most common use cases of estimates data is to look at consensus estimates for the next upcoming fiscal period. In order to do this, we first need to create a dataset by slicing PeriodicConsensus. Once we have a dataset, we can use its columns to create factors just like any other pipeline dataset.

In the example below, we take a slice of PeriodicConsensus with coordinates ('EPS', 'qf', 1), producing a dataset of FQ1 EPS estimates. We then construct a Latest expression from the .mean column, using the .latest attribute. The result is a factor that produces the most up-to-date FQ1 EPS estimate every day.

For more info on pipeline factors and the .latest attribute, see the Pipeline API Tutorial - Factors.

In [9]:
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.factset.estimates import (
    PeriodicConsensus,
    Actuals,
)

from quantopian.research import run_pipeline

# Create a dataset of EPS estimates for the upcoming fiscal quarter (fq1).
fq1_eps_cons = PeriodicConsensus.slice('EPS', 'qf', 1)

# Define a pipeline factor that gets the latest mean estimate EPS for fq1.
fq1_eps_cons_mean = fq1_eps_cons.mean.latest

Now that we've defined a factor, fq1_eps_cons_mean, let's see what this factor looks like by adding it to a pipeline and running it over a year.

In [10]:
pipe = Pipeline(
    columns={
        'fq1_eps_cons_mean': fq1_eps_cons_mean,
    },
)

df = run_pipeline(pipe, '2016-01-01', '2017-01-01')
df.dropna().head()
Out[10]:
                                    fq1_eps_cons_mean
2016-01-04 00:00:00+00:00 Equity(2 [ARNC])   0.925662
                          Equity(24 [AAPL])  3.200413
                          Equity(31 [ABAX])  0.331250
                          Equity(39 [DDC])   0.280000
                          Equity(41 [ARCB])  0.575825

The way to interpret this output is: "On January 4th, 2016, the latest mean EPS consensus estimate for ARNC for the upcoming quarter is 0.925662. For AAPL, the latest mean EPS estimate for the upcoming quarter is 3.200413. Etc."

Zooming in on AAPL, we can see how the EPS estimate for the upcoming quarter changes over time.

In [11]:
# Index into the dataframe to look at the estimates for AAPL. Recall that AAPL has sid=24.
df.xs(24, level=1).head(10)
Out[11]:
fq1_eps_cons_mean
2016-01-04 00:00:00+00:00 3.200413
2016-01-05 00:00:00+00:00 3.255739
2016-01-06 00:00:00+00:00 3.255739
2016-01-07 00:00:00+00:00 3.250215
2016-01-08 00:00:00+00:00 3.246643
2016-01-11 00:00:00+00:00 3.239349
2016-01-12 00:00:00+00:00 3.239349
2016-01-13 00:00:00+00:00 3.236968
2016-01-14 00:00:00+00:00 3.236968
2016-01-15 00:00:00+00:00 3.236730

Rolling to the Next Fiscal Period

Every day, fq1_eps_cons_mean should be interpreted as the mean consensus EPS estimate for the upcoming fiscal quarter. In the previous output, we saw how AAPL's mean EPS estimate for the upcoming quarter changes day over day. But if we look at a different date range (again for AAPL) of the same pipeline output, we see what appears to be a big change in the FQ1 estimate on 2016-01-28.

In [12]:
df.xs(24, level=1).loc['2016-01-22':'2016-02-02']
Out[12]:
fq1_eps_cons_mean
2016-01-22 00:00:00+00:00 3.230009
2016-01-25 00:00:00+00:00 3.230009
2016-01-26 00:00:00+00:00 3.230009
2016-01-27 00:00:00+00:00 3.228289
2016-01-28 00:00:00+00:00 2.008742
2016-01-29 00:00:00+00:00 2.008742
2016-02-01 00:00:00+00:00 2.000993
2016-02-02 00:00:00+00:00 2.000993

In this case, the jump in estimate isn't coming from analysts changing their prediction for AAPL's EPS in the next quarter. Instead, AAPL published their quarterly report, and now fq1_eps_cons_mean is referring to the subsequent quarter. This is best demonstrated by adding period_label as a new column to our pipeline.

In [13]:
fq1_period_label = fq1_eps_cons.period_label.latest

pipe = Pipeline(
    columns={
        'fq1_eps_cons_mean': fq1_eps_cons_mean,
        'fq1_period_label': fq1_period_label,
    },
)

df = run_pipeline(pipe, '2016-01-01', '2017-01-01')
In [14]:
df.xs(24, level=1).loc['2016-01-22':'2016-02-02']
Out[14]:
fq1_eps_cons_mean fq1_period_label
2016-01-22 00:00:00+00:00 3.230009 2015-12-31
2016-01-25 00:00:00+00:00 3.230009 2015-12-31
2016-01-26 00:00:00+00:00 3.230009 2015-12-31
2016-01-27 00:00:00+00:00 3.228289 2015-12-31
2016-01-28 00:00:00+00:00 2.008742 2016-03-31
2016-01-29 00:00:00+00:00 2.008742 2016-03-31
2016-02-01 00:00:00+00:00 2.000993 2016-03-31
2016-02-02 00:00:00+00:00 2.000993 2016-03-31

By adding the period label, we can clearly see that fq1_eps_cons_mean switches from getting the most recent estimate for AAPL's quarter ending in Dec. 2015 to AAPL's quarter ending in Mar. 2016 on January 28th.
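One way to find these roll dates programmatically is to look for days where the period label differs from the previous day's. The sketch below is a hypothetical plain-Python helper, applied to the fq1_period_label values copied from the output above.

```python
def roll_dates(rows):
    """Find the dates on which a relative period such as FQ1 rolls forward.

    rows: (date, period_label) pairs sorted by date for one equity.
    Returns (date, old_label, new_label) for every day the label changes.
    """
    rolls = []
    for (_, prev_label), (day, label) in zip(rows, rows[1:]):
        if label != prev_label:
            rolls.append((day, prev_label, label))
    return rolls


# fq1_period_label values for AAPL, copied from the output above.
aapl_fq1_labels = [
    ('2016-01-22', '2015-12-31'),
    ('2016-01-25', '2015-12-31'),
    ('2016-01-26', '2015-12-31'),
    ('2016-01-27', '2015-12-31'),
    ('2016-01-28', '2016-03-31'),
    ('2016-01-29', '2016-03-31'),
    ('2016-02-01', '2016-03-31'),
    ('2016-02-02', '2016-03-31'),
]
print(roll_dates(aapl_fq1_labels))  # [('2016-01-28', '2015-12-31', '2016-03-31')]
```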

To further demonstrate this point, let's add the mean EPS estimate for FQ2 and FQ3 to our pipeline, along with their period labels. We can do this by slicing PeriodicConsensus with period_offset values of 2 and 3.

In [15]:
fq2_eps_cons = PeriodicConsensus.slice('EPS', 'qf', 2)
fq2_eps_cons_mean = fq2_eps_cons.mean.latest
fq2_period_label = fq2_eps_cons.period_label.latest

fq3_eps_cons = PeriodicConsensus.slice('EPS', 'qf', 3)
fq3_eps_cons_mean = fq3_eps_cons.mean.latest
fq3_period_label = fq3_eps_cons.period_label.latest

pipe = Pipeline(
    columns={
        'fq1_eps_cons_mean': fq1_eps_cons_mean,
        'fq1_period_label': fq1_period_label,
        'fq2_eps_cons_mean': fq2_eps_cons_mean,
        'fq2_period_label': fq2_period_label,
        'fq3_eps_cons_mean': fq3_eps_cons_mean,
        'fq3_period_label': fq3_period_label,
    },
)

df = run_pipeline(pipe, '2016-01-01', '2016-10-15')

For clarity, we've spelled out each expression in the above pipeline explicitly, which makes our code a bit verbose. A shorter way to define the same pipeline is to loop over the period offsets and column names we're interested in, using slice to select each dataset and get_column to look up each column by name.

In [16]:
def make_pipeline_with_less_code():
    expressions = {}
    
    for offset in 1, 2, 3:
        dataset = PeriodicConsensus.slice('EPS', 'qf', offset)
        
        for column_name in 'mean', 'period_label':
            output_name = "fq{}_{}".format(offset, column_name)
            expressions[output_name] = dataset.get_column(column_name).latest
            
    return Pipeline(expressions)

# Show that the two pipelines contain the same expressions.
equivalent_pipe = make_pipeline_with_less_code()
assert set(equivalent_pipe.columns.values()) == set(pipe.columns.values())
In [17]:
df.xs(24, level=1).loc['2016-01-23':'2016-02-04']
Out[17]:
fq1_eps_cons_mean fq1_period_label fq2_eps_cons_mean fq2_period_label fq3_eps_cons_mean fq3_period_label
2016-01-25 00:00:00+00:00 3.230009 2015-12-31 2.219051 2016-03-31 1.903893 2016-06-30
2016-01-26 00:00:00+00:00 3.230009 2015-12-31 2.219051 2016-03-31 1.903893 2016-06-30
2016-01-27 00:00:00+00:00 3.228289 2015-12-31 2.212514 2016-03-31 1.900579 2016-06-30
2016-01-28 00:00:00+00:00 2.008742 2016-03-31 1.789279 2016-06-30 2.005598 2016-09-30
2016-01-29 00:00:00+00:00 2.008742 2016-03-31 1.789279 2016-06-30 2.005598 2016-09-30
2016-02-01 00:00:00+00:00 2.000993 2016-03-31 1.773876 2016-06-30 1.988305 2016-09-30
2016-02-02 00:00:00+00:00 2.000993 2016-03-31 1.773876 2016-06-30 1.988305 2016-09-30
2016-02-03 00:00:00+00:00 2.000993 2016-03-31 1.773876 2016-06-30 1.988305 2016-09-30
2016-02-04 00:00:00+00:00 2.000993 2016-03-31 1.773876 2016-06-30 1.988305 2016-09-30

The most important thing to take away from this output is that on any given simulation date, fq1, fq2, and fq3 reference consecutive fiscal periods. When a quarterly report is published, these relative period references "roll" over to the next fiscal quarter (as seen on 2016-01-28).

The next cell plots the mean EPS estimate for the quarter ending on 2016-06-30 over the course of the simulation to demonstrate how it moves from fq3 to fq2 to fq1 over the span of several months.

In [18]:
from matplotlib import pyplot as plt
import pandas as pd
In [19]:
aapl_eps_est = df.xs(24, level=1)
fq1_series = aapl_eps_est[(aapl_eps_est.fq1_period_label == '2016-06-30')].fq1_eps_cons_mean
fq2_series = aapl_eps_est[(aapl_eps_est.fq2_period_label == '2016-06-30')].fq2_eps_cons_mean
fq3_series = aapl_eps_est[(aapl_eps_est.fq3_period_label == '2016-06-30')].fq3_eps_cons_mean
period_2016_06_30 = pd.concat([fq1_series, fq2_series, fq3_series]).sort_index()

plt.plot(
    period_2016_06_30, 
    linestyle='--', 
    color='black', 
    alpha=0.6, 
    label='fiscal quarter ending on 2016-06-30'
)
plt.plot(fq1_series, label='fq1')
plt.plot(fq2_series, label='fq2')
plt.plot(fq3_series, label='fq3')
plt.legend()
plt.title('EPS Estimate For AAPL Quarter Ending On 2016-06-30');

As seen above, as we move forward in our pipeline simulation, the period ending on 2016-06-30 moves from being referenced by our fq3 factor to our fq2 factor to fq1.

In the plot above, we fixed the absolute period label to be 2016-06-30 and plotted the EPS estimates for that fiscal quarter. In order to do that, we had to plot segments of fq3_eps_cons_mean, fq2_eps_cons_mean, and fq1_eps_cons_mean because, as pipeline moves through a simulation, the relative offset of AAPL's 2016-06-30 quarter moves closer to our current perspective.

In the next plot, we'll change things up and fix the relative period label to FQ1 and demonstrate how the absolute period label changes over the course of a simulation. Note that for this plot, we only have to use fq1_eps_cons_mean.

In [20]:
plt.plot(
    aapl_eps_est.fq1_eps_cons_mean, 
    linestyle='--', 
    color='black',
    alpha=0.6,
    label='EPS Estimate For Next Fiscal Quarter (fq1_eps_cons_mean)'
)

for period_label in aapl_eps_est.fq1_period_label.unique():
    fq1_segment = aapl_eps_est[aapl_eps_est.fq1_period_label == period_label].fq1_eps_cons_mean
    plt.plot(fq1_segment, label='Fiscal Period Ending On %s' % period_label)
    
plt.legend()
plt.title("EPS Estimate for AAPL's 'Next' Fiscal Quarter (FQ1) Throughout 2016");

In this plot, each colored line segment represents a different absolute fiscal quarter for AAPL in 2016. Over the course of the year, we see that FQ1 'rolls' from one fiscal quarter to the next, referencing a new absolute fiscal quarter each time AAPL publishes a quarterly report. This matches what we saw earlier in the tabular pipeline output.

Fixed Simulation Date; One Logical Daily Timeseries

In the last two plots, we ran the following two tests:

  1. Fix the absolute period label and see how the relative period label changes over the course of a simulation.
  2. Fix the relative period label and see how the absolute period label changes over the course of a simulation.

In the next plot, we will fix the simulation date and ask for a 6-month history of EPS estimates for FQ1 to see how analysts have changed their opinion of the upcoming quarter's EPS over the last half year. To demonstrate this, we will write a CustomFactor that plots the EPS estimates timeseries for AAPL over the specified window_length. Note that this use of CustomFactor is good for visualizing a timeseries passed to compute, but doesn't actually generate any output (i.e. don't use it in an algorithm!).

In [21]:
from quantopian.pipeline import CustomFactor
In [22]:
class PlotAAPLFactor(CustomFactor):
    """A CustomFactor that takes a single input and plots the data it receives for AAPL"""
    
    def compute(self, today, sids, out, values):
        # Index into our 2D array of estimates to zoom in on the estimates for AAPL (sid=24).
        plt.plot(values[:, sids.searchsorted(24)]);
In [23]:
fq1_eps_cons = PeriodicConsensus.slice('EPS', 'qf', 1)

pipe = Pipeline(
    columns={
        # This column doesn't actually output anything, we just add it as a column so
        # that pipeline executes the compute function that plots the timeseries.
        'plot_factor': PlotAAPLFactor(inputs=[fq1_eps_cons.mean], window_length=126)
    },
    screen=(fq1_eps_cons.num_est.latest > 2)
)
In [24]:
df = run_pipeline(pipe, '2016-01-01', '2016-01-01')

Unlike earlier plots in this notebook, there aren't any significant jumps in this timeseries. This is because on a single simulation date, if we ask pipeline to get us a historical window of estimates for a relative period, pipeline first figures out which absolute period we're asking for, then gets us a timeseries of estimates for that period. The result is a continuous timeseries along which we can make comparisons. Earlier, we referred to this as a logical daily timeseries.
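The windowing behavior can be sketched as follows, assuming we already know which absolute period the relative offset resolves to on the simulation date. The helper is hypothetical; the values for each quarter are copied from the AAPL fq1/fq2 output earlier in this notebook, where the quarter ending 2016-03-31 was referenced as FQ2 through 2016-01-27 and as FQ1 from 2016-01-28.

```python
def window_for_period(daily_history, period_label, sim_date, window_length):
    """Trailing window of daily values for one absolute fiscal period.

    daily_history: dict mapping period_label to (date, value) pairs, one per
                   trading day and sorted by date.
    period_label:  the absolute period the relative offset resolves to on sim_date.
    Returns up to window_length values dated on or before sim_date, all for
    the same fiscal period, so the window never mixes periods.
    """
    points = [value for day, value in daily_history.get(period_label, []) if day <= sim_date]
    return points[max(len(points) - window_length, 0):]


# AAPL mean EPS estimates per absolute quarter, copied from the output above.
history = {
    '2015-12-31': [('2016-01-25', 3.230009), ('2016-01-26', 3.230009), ('2016-01-27', 3.228289)],
    '2016-03-31': [('2016-01-25', 2.219051), ('2016-01-26', 2.219051), ('2016-01-27', 2.212514),
                   ('2016-01-28', 2.008742), ('2016-01-29', 2.008742)],
}
# On 2016-01-29, FQ1 refers to the quarter ending 2016-03-31.
print(window_for_period(history, '2016-03-31', '2016-01-29', 3))  # [2.212514, 2.008742, 2.008742]
```

By contrast, reading the last three rows of the stitched fq1_eps_cons_mean column would give [3.228289, 2.008742, 2.008742], mixing two different quarters.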

It's worth noting that this behavior is not unique to CustomFactors. Any time a column from a PeriodicConsensus or Actuals slice is passed as an input to a custom factor or a built-in factor (e.g. Returns), pipeline will get a timeseries of values for the same absolute period.

Use Case: Percentage Change CustomFactor

Now that we've covered many of the new concepts and features related to estimates, let's build an example pipeline that uses estimates data.

Specifically, let's build a pipeline with a percent change factor: every day, the factor finds the upcoming fiscal quarter (FQ1) and computes the percent change in that quarter's mean EPS estimate over the last 6 months (126 trading days). This factor uses the same kind of single-period timeseries we just looked at. We'll also add a column that ranks all assets in the QTradableStocksUS by the calculated percent change. (Remember, estimates slices work just like any other pipeline dataset, so we can use them with all the existing features of the Pipeline API.)
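Before wiring this into a pipeline, here is a plain-Python sketch of the two computations: the percent change over a window, using the same formula as the PercentChange factor defined below, and an ascending rank where 1 is the smallest change. The helpers and data are hypothetical. Unlike the numpy version, this sketch returns nan for a zero starting value, and ties are broken by input order, which may differ from the pipeline's own rank method.

```python
import math


def percent_change(window):
    """(last - first) / first over a lookback window.

    Returns nan when the first value is zero. A negative first value flips
    the sign of the result.
    """
    first, last = window[0], window[-1]
    if first == 0:
        return math.nan
    return (last - first) / first


def rank_ascending(values):
    """Rank keys by value (1 = smallest), skipping nan values.

    Ties are broken by input order.
    """
    ordered = sorted(
        (value, position, key)
        for position, (key, value) in enumerate(values.items())
        if not math.isnan(value)
    )
    return {key: rank for rank, (_, _, key) in enumerate(ordered, start=1)}


# Hypothetical windows (only the first and last values matter) for four assets.
windows = {'A': [2.0, 1.0], 'B': [1.0, 1.1], 'C': [0.0, 0.5], 'D': [-1.0, -0.5]}
changes = {asset: percent_change(w) for asset, w in windows.items()}
print(rank_ascending(changes))  # {'A': 1, 'D': 2, 'B': 3}
```

Asset D's estimate improved from -1.0 to -0.5, yet its percent change is -0.5 because the starting value is negative, so it ranks next to A, whose estimate halved. This is worth keeping in mind when applying percent changes to EPS, which can be negative.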

In [25]:
from quantopian.pipeline import CustomFactor

class PercentChange(CustomFactor):
    """A CustomFactor that computes the percent change from start to end of a lookback window.
    """
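    def compute(self, today, assets, out, data):
        # data has one row per day in the window and one column per asset.
        # A zero starting estimate produces inf or nan, and a negative one
        # flips the sign of the change, so read results for small or
        # negative estimates with care.
        out[:] = (data[-1] - data[0]) / data[0]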
In [26]:
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.factset.estimates import PeriodicConsensus
from quantopian.pipeline.filters import QTradableStocksUS

from quantopian.research import run_pipeline

fq1_eps_cons = PeriodicConsensus.slice('EPS', 'qf', 1)
fq1_eps_cons_mean_pct_change = PercentChange(inputs=[fq1_eps_cons.mean], window_length=126)

pipe = Pipeline(
    columns={
        'fq1_eps_cons_mean_pct_change': fq1_eps_cons_mean_pct_change,
        'pct_change_rank': fq1_eps_cons_mean_pct_change.rank(mask=QTradableStocksUS()),
    },
    screen=QTradableStocksUS(),
)
In [27]:
df = run_pipeline(pipe, '2017-01-01', '2018-01-01')
In [28]:
df.head()
Out[28]:
                                             fq1_eps_cons_mean_pct_change  pct_change_rank
2017-01-03 00:00:00+00:00 Equity(2 [ARNC])                      -0.340167            261.0
                          Equity(24 [AAPL])                      0.033001           1533.0
                          Equity(31 [ABAX])                     -0.037038           1054.0
                          Equity(41 [ARCB])                     -0.233066            371.0
                          Equity(52 [ABM])                      -0.053267            926.0

This is just a simple example. In this notebook, we focused primarily on one slice of the PeriodicConsensus dataset family (EPS, quarterly frequency, one period forward). There are many more ideas to explore: you can combine PeriodicConsensus with Actuals, analyze a variety of items, and look at estimates for different frequencies and period offsets.

Be sure to check out the other notebook attached to the first comment of the Estimates Announcement Post. The other notebook has several examples that use the PeriodicConsensus and Actuals dataset families in ways we didn't cover in this notebook.

Conclusion

In this notebook, we introduced Quantopian's newest integrated source of data: FactSet Estimates.

Here's a quick recap of the topics we discussed:

  • We gave a brief introduction to financial estimates.
  • We introduced DataSetFamily, a new pipeline feature that, along with a new slice method, allows us to create DataSets by specifying coordinates.
  • We introduced PeriodicConsensus and Actuals, two new DataSetFamilys that allow us to use consensus estimates and actual reported values in pipeline.

In addition to the above topics, we also created several visualizations to illustrate how relative fiscal periods roll from one period to the next and how we can get an extended historical window for a particular fiscal period.

What's Next?

Estimates are the newest dataset allowed in the Quantopian Contest. Try adding estimates to one of your entries or submit a new entry altogether. If you like to learn from examples or you're looking for a place to get some ideas, check out the other notebook attached to the first comment of the Estimates Announcement Post that has several examples using estimates.