## Notebook to Accompany "Driven to Distraction" Algorithm

In the paper "Driven to Distraction: Extraneous Events and Underreaction to Earnings News", the authors compare post-earnings-announcement drift (PEAD) for stocks that announce earnings during peak earnings season with PEAD for stocks that announce when there are fewer competing announcements.

I backtested this strategy in Quantopian. Every day, I check whether the number of companies announcing earnings that day falls in the top or bottom quintile of announcement counts. If the day is in the top quintile of announcements, I go long PEAD by buying the top quintile of earnings beaters and shorting the bottom quintile of earnings missers. If the day is in the bottom quintile of announcements, I do the opposite PEAD trade: I short the top quintile of beaters and go long the bottom quintile of missers.
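The trading rule described above can be sketched as a small pure-Python function. This is only an illustration, not the backtest code: the function name `pead_position` and its parameter names are hypothetical, the default cutoffs are the values this notebook prints further down, and treating the cutoffs as inclusive boundaries is an assumption.

```python
def pead_position(n_announcements, forecast_error,
                  n_high=35, n_low=7, fe_high=0.0016, fe_low=-0.00023):
    """Return +1 (long), -1 (short) or 0 (no trade) for one announcing stock.

    All names and the inclusive comparisons are illustrative assumptions.
    """
    if forecast_error >= fe_high:
        surprise = 1       # top-quintile earnings beat
    elif forecast_error <= fe_low:
        surprise = -1      # bottom-quintile earnings miss
    else:
        return 0           # middle of the surprise distribution: no trade

    if n_announcements >= n_high:
        return surprise    # busy day: standard PEAD (long beaters, short missers)
    if n_announcements <= n_low:
        return -surprise   # quiet day: reversed PEAD trade
    return 0               # ordinary day: no trade


print(pead_position(40, 0.003))   # beat on a busy day
print(pead_position(3, 0.003))    # beat on a quiet day
```

With these defaults, a large beat is bought on a busy day and shorted on a quiet day, which mirrors the two branches of the strategy.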

To estimate the quintile cutoffs for beaters and missers, as well as the quintile cutoffs for stocks on high and low announcement days, I created this notebook.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from quantopian.pipeline import Pipeline
from quantopian.pipeline import CustomFactor
from quantopian.research import run_pipeline
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.filters import Q500US

from quantopian.pipeline.data.eventvestor import EarningsCalendar

from quantopian.pipeline.factors.eventvestor import (
    BusinessDaysSincePreviousEarnings,
)

from quantopian.pipeline.data.zacks import EarningsSurprises


First we'll find the distribution of earnings announcements and compute some summary statistics. We'll get data from the last four quarters (this may take a while, and you may need to shut down some open notebooks).

In [2]:
pipe = Pipeline()
# Business days since each stock's previous earnings announcement
# (0 means the stock announced today); later cells filter on this column
pipe.add(BusinessDaysSincePreviousEarnings(), 'business_days_since')
pipe.set_screen(Q500US() & EarningsSurprises.eps_mean_est.latest.notnan())
pipe_output = run_pipeline(pipe, start_date='2015-10-01',end_date='2016-09-30')
pipe_output[:20]

Out[2]:
2015-10-01 00:00:00+00:00 Equity(2 [ARNC]) 61.0
Equity(24 [AAPL]) 52.0
Equity(62 [ABT]) 51.0
Equity(76 [TAP]) 40.0
Equity(161 [AEP]) 50.0
Equity(166 [AES]) 38.0
Equity(168 [AET]) 42.0
Equity(185 [AFL]) 47.0
Equity(216 [HES]) 436.0
Equity(239 [AIG]) 43.0
Equity(300 [ALK]) 505.0
Equity(337 [AMAT]) 35.0
Equity(357 [TWX]) 41.0
Equity(368 [AMGN]) 45.0
Equity(438 [AON]) 44.0
Equity(448 [APA]) 40.0

To get the number of companies announcing earnings on each day, we first keep only the rows where business_days_since is zero, i.e. the stocks that announced that day:

In [3]:
earners=pipe_output[pipe_output['business_days_since']==0]
earners[:20]

Out[3]:
2015-10-01 00:00:00+00:00 Equity(5121 [MU]) 0.0
2015-10-06 00:00:00+00:00 Equity(5885 [PEP]) 0.0
Equity(17787 [YUM]) 0.0
2015-10-07 00:00:00+00:00 Equity(22140 [MON]) 0.0
Equity(24873 [STZ]) 0.0
2015-10-08 00:00:00+00:00 Equity(2 [ARNC]) 0.0
2015-10-13 00:00:00+00:00 Equity(1937 [CSX]) 0.0
Equity(2696 [FAST]) 0.0
Equity(4151 [JNJ]) 0.0
Equity(4485 [LLTC]) 0.0
Equity(25006 [JPM]) 0.0
2015-10-14 00:00:00+00:00 Equity(700 [BAC]) 0.0
Equity(6068 [PNC]) 0.0
Equity(8151 [WFC]) 0.0
Equity(8344 [XLNX]) 0.0
Equity(20689 [BLK]) 0.0
Equity(23709 [NFLX]) 0.0
Equity(33729 [DAL]) 0.0
2015-10-15 00:00:00+00:00 Equity(1335 [C]) 0.0
Equity(4221 [KEY]) 0.0

The Pipeline output is a hierarchically indexed (multi-indexed) DataFrame, indexed on date and security. We can use the unstack method to pivot the securities into columns, giving a conventional two-dimensional DataFrame with one row per date (alternatively, you can use groupby(level=0)), and then count how many stocks announce on each day.

In [4]:
earnings_count=earners.unstack().count(axis=1)
earnings_count.sort_index()

Out[4]:
2015-10-01 00:00:00+00:00     1
2015-10-06 00:00:00+00:00     2
2015-10-07 00:00:00+00:00     2
2015-10-08 00:00:00+00:00     1
2015-10-13 00:00:00+00:00     5
2015-10-14 00:00:00+00:00     7
2015-10-15 00:00:00+00:00    12
2015-10-16 00:00:00+00:00     6
2015-10-19 00:00:00+00:00     6
2015-10-20 00:00:00+00:00    17
2015-10-21 00:00:00+00:00    23
2015-10-22 00:00:00+00:00    37
2015-10-23 00:00:00+00:00    10
2015-10-26 00:00:00+00:00     6
2015-10-27 00:00:00+00:00    37
2015-10-28 00:00:00+00:00    35
2015-10-29 00:00:00+00:00    35
2015-10-30 00:00:00+00:00    13
2015-11-02 00:00:00+00:00    13
2015-11-03 00:00:00+00:00    24
2015-11-04 00:00:00+00:00    23
2015-11-05 00:00:00+00:00    18
2015-11-06 00:00:00+00:00     3
2015-11-09 00:00:00+00:00     3
2015-11-10 00:00:00+00:00     3
2015-11-11 00:00:00+00:00     1
2015-11-12 00:00:00+00:00     7
2015-11-13 00:00:00+00:00     1
2015-11-16 00:00:00+00:00     1
2015-11-17 00:00:00+00:00     3
..
2016-08-04 00:00:00+00:00    21
2016-08-05 00:00:00+00:00     6
2016-08-08 00:00:00+00:00     6
2016-08-09 00:00:00+00:00    16
2016-08-10 00:00:00+00:00     2
2016-08-11 00:00:00+00:00     4
2016-08-12 00:00:00+00:00     1
2016-08-15 00:00:00+00:00     1
2016-08-16 00:00:00+00:00     5
2016-08-17 00:00:00+00:00     9
2016-08-18 00:00:00+00:00     5
2016-08-19 00:00:00+00:00     3
2016-08-23 00:00:00+00:00     3
2016-08-24 00:00:00+00:00     4
2016-08-25 00:00:00+00:00     9
2016-08-29 00:00:00+00:00     1
2016-08-30 00:00:00+00:00     2
2016-08-31 00:00:00+00:00     1
2016-09-01 00:00:00+00:00     4
2016-09-07 00:00:00+00:00     1
2016-09-08 00:00:00+00:00     1
2016-09-09 00:00:00+00:00     1
2016-09-15 00:00:00+00:00     1
2016-09-20 00:00:00+00:00     3
2016-09-21 00:00:00+00:00     3
2016-09-22 00:00:00+00:00     2
2016-09-26 00:00:00+00:00     1
2016-09-27 00:00:00+00:00     1
2016-09-28 00:00:00+00:00     1
2016-09-29 00:00:00+00:00     3
dtype: int64
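To make the two counting approaches concrete, here is a minimal sketch on a toy multi-indexed DataFrame. The three rows reuse dates and tickers from the output above, but the frame itself is built by hand for illustration and the variable names are hypothetical.

```python
import pandas as pd

# Toy stand-in for the filtered pipeline output: (date, security) index
index = pd.MultiIndex.from_tuples(
    [("2015-10-06", "PEP"), ("2015-10-06", "YUM"), ("2015-10-08", "ARNC")],
    names=["date", "security"],
)
toy = pd.DataFrame({"business_days_since": [0.0, 0.0, 0.0]}, index=index)

# Approach 1: pivot securities into columns, then count non-missing cells per row
count_unstack = toy.unstack().count(axis=1)

# Approach 2: group on the first index level (date) and count rows per group
count_groupby = toy.groupby(level=0).size()

print(count_unstack.tolist())
print(count_groupby.tolist())
```

Both approaches give two announcements on the first date and one on the second. The groupby version avoids building a wide, mostly empty table, which may matter for longer date ranges.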

The histogram below shows the distribution of the number of announcements per day:

In [5]:
earnings_count.hist(bins=20)
plt.ylabel("Frequency")
plt.xlabel("Number of Earnings Announcements")

Out[5]:
<matplotlib.text.Text at 0x7fccb2249390>

Below we compute the quintiles of this distribution.

Because many companies announce on the busiest 20% of days and few announce on the quietest days, these cutoffs do not divide the universe into quintiles with equal numbers of stocks (we do that calculation later in the notebook).

In [6]:
D5=earnings_count.quantile(.8)
D1=earnings_count.quantile(.2)
print('Number of earnings announcements on the top 20%% announcement days: %d' % D5)
print('Number of earnings announcements on the bottom 20%% announcement days: %d' % D1)

Number of earnings announcements on the top 20% announcement days: 16
Number of earnings announcements on the bottom 20% announcement days: 1


Here is the number of announcements per day over the four quarters (we first have to add back the dates that had zero earnings announcements, which were filtered out above). The peaks occur around four weeks after the end of each quarter.

In [7]:
no_ann_dates=list(set(pipe_output.unstack().index).symmetric_difference(earners.unstack().index))
no_earnings=pd.Series(np.zeros(len(no_ann_dates)),index=no_ann_dates)
earnings_count=pd.concat([earnings_count,no_earnings])
earnings_count.sort_index().plot()
plt.ylabel("Number of Earnings Announcements")

Out[7]:
<matplotlib.text.Text at 0x7fccb0a33b10>
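As an alternative way to add back the zero-announcement dates, one could reindex the counts against the full list of trading days. The sketch below uses a hand-built toy series (the two non-zero dates and counts match the first rows of the output above; the list of trading days is an assumption for illustration).

```python
import pandas as pd

# Hypothetical full calendar of trading days for the toy example
all_days = pd.to_datetime(["2015-10-01", "2015-10-02", "2015-10-05", "2015-10-06"])

# Counts only exist for days with at least one announcement
counts = pd.Series([1, 2], index=pd.to_datetime(["2015-10-01", "2015-10-06"]))

# reindex inserts the missing days and fills them with zero
filled = counts.reindex(all_days, fill_value=0)

print(filled.tolist())  # [1, 0, 0, 2]
```

This keeps the series sorted by date and preserves the integer dtype, whereas concatenating a series of `np.zeros` converts the counts to floats.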

Here is how I get the $NRANK$ quintiles. Suppose we have 500 stocks in our universe (Q500US actually has slightly fewer than that). For the bottom quintile, we want the number of earnings announcements on the day when the 100th company announces, counting companies from the quietest days upward. We sort the days by the number of announcements; say there are 10 days when only one company announces earnings, 15 days when two companies announce earnings (so 30 more companies), and so on, until we reach the 100th company. We find the row number (idx below) and see how many announcements that corresponds to. We do the same for the top quintile.

In [8]:
earnings_count_sort=earnings_count.sort_values()
earnings_count_sort.index=range(len(earnings_count))
earnings_count_cumsum=earnings_count_sort.cumsum()
earnings_count_cumsum.index=range(len(earnings_count))
# For top quintile
p=.8
cutoff=p*earnings_count_sort.sum()
idx=(earnings_count_cumsum-cutoff).abs().idxmin()
N5=earnings_count_sort[idx]
# For bottom quintile
p=.2
cutoff=p*earnings_count_sort.sum()
idx=(earnings_count_cumsum-cutoff).abs().idxmin()
N1=earnings_count_sort[idx]
print('Number of earnings announcements for the top 20%% companies: %d' % N5)
print('Number of earnings announcements for the bottom 20%% companies: %d' % N1)

Number of earnings announcements for the top 20% companies: 35
Number of earnings announcements for the bottom 20% companies: 7
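The same sort-and-cumulate idea can be checked by hand on a small made-up example. The sketch below is a pure-Python restatement of the cell above, not a replacement for it; the function name `company_weighted_cutoff` and the toy day counts are hypothetical.

```python
from itertools import accumulate


def company_weighted_cutoff(daily_counts, p):
    """Announcement count on the day where the cumulative number of
    announcing companies (quietest days first) is closest to fraction p."""
    ordered = sorted(daily_counts)
    running = list(accumulate(ordered))      # cumulative companies so far
    target = p * running[-1]                 # p times the total announcements
    # First index whose cumulative total is closest to the target
    idx = min(range(len(ordered)), key=lambda i: abs(running[i] - target))
    return ordered[idx]


# Toy data: 10 days with 1 announcement, 15 with 2, 5 with 10, 2 with 30
toy_counts = [1] * 10 + [2] * 15 + [10] * 5 + [30] * 2   # 150 announcements

low_cutoff = company_weighted_cutoff(toy_counts, 0.2)
high_cutoff = company_weighted_cutoff(toy_counts, 0.8)
print(low_cutoff, high_cutoff)  # 2 30
```

In this toy data the 30th announcement (20% of 150) falls on a two-announcement day and the 120th falls on a thirty-announcement day, even though most individual days have only one or two announcements.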


## Earnings Beat and Miss Quintiles

To get the earnings beat and miss quintile cutoffs, we'll download Zacks data for the last four quarters (again, this may take a while, and you may need to shut down some open notebooks). We define the earnings surprise, or forecast error $FE$, as $$FE=\frac{E-F}{P}$$ where $E$ is the actual earnings, $F$ is the consensus forecast, and $P$ is the stock price at the time of the earnings announcement (the code below uses the latest closing price).

In [9]:
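from quantopian.pipeline.factors.eventvestor import BusinessDaysSincePreviousEarnings
pipe = Pipeline()
# Forecast error: (actual EPS - consensus estimate) / latest close price
earnings_beat_normalized=((EarningsSurprises.eps_act.latest-EarningsSurprises.eps_mean_est.latest)/
                          USEquityPricing.close.latest)
pipe.add(earnings_beat_normalized, 'earnings_beat_normalized')
# Keep only stocks whose most recent earnings announcement is today
earn_today = BusinessDaysSincePreviousEarnings().eq(0)
pipe.set_screen(Q500US() & EarningsSurprises.eps_mean_est.latest.notnan() & earn_today)
FE_output = run_pipeline(pipe, start_date='2015-10-01',end_date='2016-09-30')
FE_output[:20]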

Out[9]:
earnings_beat_normalized
2015-10-01 00:00:00+00:00 Equity(5121 [MU]) -0.002004
2015-10-06 00:00:00+00:00 Equity(5885 [PEP]) 0.000939
Equity(17787 [YUM]) 0.000722
2015-10-07 00:00:00+00:00 Equity(22140 [MON]) 0.003883
Equity(24873 [STZ]) 0.000305
2015-10-08 00:00:00+00:00 Equity(2 [ARNC]) -0.003660
2015-10-13 00:00:00+00:00 Equity(1937 [CSX]) 0.001057
Equity(2696 [FAST]) 0.000258
Equity(4151 [JNJ]) 0.000208
Equity(4485 [LLTC]) -0.000485
Equity(25006 [JPM]) 0.001621
2015-10-14 00:00:00+00:00 Equity(700 [BAC]) 0.005795
Equity(6068 [PNC]) 0.001241
Equity(8151 [WFC]) -0.000193
Equity(8344 [XLNX]) 0.000450
Equity(20689 [BLK]) 0.000507
Equity(23709 [NFLX]) 0.000091
Equity(33729 [DAL]) 0.001047
2015-10-15 00:00:00+00:00 Equity(1335 [C]) 0.001972
Equity(4221 [KEY]) -0.000787

We can simply compute the desired quintiles of $FE$:

In [10]:
E5=FE_output['earnings_beat_normalized'].quantile(.8)
E1=FE_output['earnings_beat_normalized'].quantile(.2)
print('The top 20%% earnings beat is: %.5f' % E5)
print('The bottom 20%% earnings beat is: %.5f' % E1)

The top 20% earnings beat is: 0.00160
The bottom 20% earnings beat is: -0.00023
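For a sense of scale, here is a worked example of the $FE$ formula with made-up numbers. The function name and the EPS and price values are hypothetical; only the two cutoffs come from the output above.

```python
def forecast_error(actual_eps, forecast_eps, price):
    """FE = (E - F) / P, the earnings surprise scaled by share price."""
    return (actual_eps - forecast_eps) / price


E5 = 0.00160    # top-quintile cutoff printed above
E1 = -0.00023   # bottom-quintile cutoff printed above

# Hypothetical stock: beats a $1.00 consensus by 10 cents at a $50 share price
fe_beat = forecast_error(actual_eps=1.10, forecast_eps=1.00, price=50.0)

# Hypothetical stock: misses a $1.00 consensus by 2 cents at a $50 share price
fe_miss = forecast_error(actual_eps=0.98, forecast_eps=1.00, price=50.0)

print(round(fe_beat, 6), fe_beat >= E5)   # in the top quintile of beats
print(round(fe_miss, 6), fe_miss <= E1)   # in the bottom quintile of misses
```

A 10-cent beat on a $50 stock gives $FE \approx 0.002$, just above the top-quintile cutoff, while a 2-cent miss gives $FE \approx -0.0004$, below the bottom-quintile cutoff. The asymmetry of the two cutoffs reflects that most announcements in this sample are small beats.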
