# Researching & Developing a Market Neutral Strategy

The process involves the following steps:

• Researching partner data.
• Designing a pipeline.
• Analyzing an alpha factor with Alphalens.
• Implementing our factor in the IDE (see backtest in next comment).
• Evaluating the backtest using Pyfolio.

## Part 1 - Investigate the Data with Blaze

To start out, let's investigate a partner dataset using Blaze. Blaze allows you to define expressions for selecting and transforming data without loading all of the data into memory. This makes it a nice tool for interacting with large amounts of data in research.

In [1]:
import matplotlib.pyplot as plt
import pandas as pd

# http://blaze.readthedocs.io/en/latest/index.html
import blaze as bz

from zipline.utils.tradingcalendar import get_trading_days

from quantopian.interactive.data.sentdex import sentiment


Interactive datasets are Blaze expressions. Blaze expressions have a similar API to pandas, with some differences.

In [2]:
type(sentiment)

Out[2]:
<class 'blaze.expr.expressions.Field'>

Let's start by looking at a sample of the data in the Sentdex Sentiment Analysis dataset for AAPL.

In [3]:
aapl_sid = symbols('AAPL').sid

# Look at a sample of AAPL sentiment data starting from 2013-12-01.
sentiment[(sentiment.sid == aapl_sid) & (sentiment.asof_date >= '2013-12-01')].peek()

Out[3]:
    symbol  sentiment_signal  sid   asof_date   timestamp
 0    AAPL               1.0   24  2013-12-01  2013-12-02
 1    AAPL               6.0   24  2013-12-02  2013-12-03
 2    AAPL               5.0   24  2013-12-03  2013-12-04
 3    AAPL               6.0   24  2013-12-04  2013-12-05
 4    AAPL               4.0   24  2013-12-05  2013-12-06
 5    AAPL               2.0   24  2013-12-06  2013-12-07
 6    AAPL              -1.0   24  2013-12-07  2013-12-08
 7    AAPL              -1.0   24  2013-12-08  2013-12-09
 8    AAPL              -3.0   24  2013-12-09  2013-12-10
 9    AAPL              -1.0   24  2013-12-10  2013-12-11
10    AAPL               6.0   24  2013-12-11  2013-12-12

Let's see how many securities this dataset covers. Note that the count below is taken over the whole dataset, since the expression applies no date filter.

In [4]:
num_sids = bz.compute(sentiment.sid.distinct().count())
print 'Number of sids in the data: %d' % num_sids

Number of sids in the data: 586


Let's go back to AAPL and look at the sentiment signal each day. To do this, we create one Blaze expression that selects trading days and another that selects the AAPL sid (24).

In [5]:
# Mask for trading days.
date_mask = sentiment.asof_date.isin(
    get_trading_days(pd.Timestamp('2013-12-01'), pd.Timestamp('2014-12-01'))
)

# Mask for AAPL.
stock_mask = (sentiment.sid == aapl_sid)

# Blaze expression for AAPL sentiment on trading days between 12/2013 and 12/2014
sentiment_2014_expr = sentiment[date_mask & stock_mask].sort('asof_date')


Compute the expression. This returns the result as a pandas DataFrame.

In [6]:
sentiment_2014_df = bz.compute(sentiment_2014_expr)
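
For readers outside the research environment, the same two masks can be sketched in plain pandas. This is only an illustration under stated assumptions: the second security (sid 9999) is hypothetical and its values are made up, and `pd.bdate_range` stands in for `get_trading_days` (it skips weekends but knows nothing about exchange holidays). The four AAPL rows reuse values from the sample shown earlier.

```python
import pandas as pd

# Stand-in for the sentiment table. The sid 24 rows copy the AAPL sample above;
# sid 9999 is a hypothetical second security with made-up values.
df = pd.DataFrame({
    'sid': [24, 24, 24, 24, 9999, 9999],
    'asof_date': pd.to_datetime([
        '2013-12-06', '2013-12-07', '2013-12-08', '2013-12-09',
        '2013-12-06', '2013-12-09',
    ]),
    'sentiment_signal': [2.0, -1.0, -1.0, -3.0, 4.0, 5.0],
})

# Assumption: weekdays approximate trading days (exchange holidays are ignored).
trading_days = pd.bdate_range('2013-12-01', '2013-12-31')

# Mask for trading days and mask for AAPL, combined as in the Blaze cell.
date_mask = df['asof_date'].isin(trading_days)
stock_mask = df['sid'] == 24

aapl_trading = df[date_mask & stock_mask].sort_values('asof_date')
print(aapl_trading)
```

Only the Friday (2013-12-06) and Monday (2013-12-09) rows for sid 24 survive; the weekend rows and the other sid are dropped.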


Plot the sentiment signal for AAPL.

In [7]:
sentiment_2014_df.plot(x='asof_date', y='sentiment_signal')

Out[7]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fc9e0630bd0>

The sentiment signal is noisy from day to day. Let's smooth it by plotting its 5-day mean with the pandas.rolling_mean function. Note that we set the index of the DataFrame to asof_date so that the x-axis is formatted as dates.

In [8]:
pd.rolling_mean(sentiment_2014_df.set_index('asof_date').sentiment_signal, window=5).plot()

Out[8]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fc9e0660590>
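
Note that `pd.rolling_mean` was deprecated and later removed from pandas, so the cell above only runs on older versions such as the one in this notebook. A minimal sketch of the same smoothing with the `.rolling()` method, using the first seven AAPL values from the sample above as a stand-in series (calendar days, not filtered to trading days):

```python
import pandas as pd

# First seven AAPL sentiment values from the sample shown earlier.
signal = pd.Series(
    [1.0, 6.0, 5.0, 6.0, 4.0, 2.0, -1.0],
    index=pd.date_range('2013-12-01', periods=7, name='asof_date'),
    name='sentiment_signal',
)

# 5-observation rolling mean; the first four entries are NaN because
# a full window of five values is not yet available.
smoothed = signal.rolling(window=5).mean()
print(smoothed)
```

Calling `smoothed.plot()` would then draw the smoothed line, as in the cell above.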

Great! Now let's use this data in a pipeline.

## Part 2 - Define Our Factor

Now that we have a dataset that we want to use, let's use it in a pipeline. In addition to the sentiment dataset, we will also use the EventVestor Earnings Calendar dataset to avoid trading around earnings announcements, and the EventVestor Mergers & Acquisitions dataset to avoid trading acquisition targets. We will work with the free versions of these datasets.

In [5]:
from quantopian.pipeline import Pipeline
from quantopian.research import run_pipeline

from quantopian.pipeline.factors import SimpleMovingAverage
from quantopian.pipeline.filters.morningstar import Q1500US

# Sentdex Sentiment free from 15 Oct 2012 to 1 month ago.
from quantopian.pipeline.data.sentdex import sentiment

# EventVestor Earnings Calendar free from 01 Feb 2007 to 1 year ago.
from quantopian.pipeline.factors.eventvestor import (
    BusinessDaysUntilNextEarnings,
    BusinessDaysSincePreviousEarnings,
)

# EventVestor Mergers & Acquisitions free from 01 Feb 2007 to 1 year ago.
from quantopian.pipeline.filters.eventvestor import IsAnnouncedAcqTarget

from quantopian.pipeline.factors import BusinessDaysSincePreviousEvent

In [6]:
def make_pipeline():

    # 5-day sentiment moving average factor.
    sentiment_factor = SimpleMovingAverage(inputs=[sentiment.sentiment_signal], window_length=5)

    # Filter for stocks that are not within 2 days of an earnings announcement.
    not_near_earnings_announcement = ~((BusinessDaysUntilNextEarnings() <= 2)
                                       | (BusinessDaysSincePreviousEarnings() <= 2))

    # Filter for stocks that are not announced acquisition targets.
    not_announced_acq_target = ~IsAnnouncedAcqTarget()

    # Filter for stocks that had their sentiment signal updated in the last day.
    new_info = (BusinessDaysSincePreviousEvent(inputs=[sentiment.asof_date.latest]) <= 1)

    # Our universe is made up of stocks that have a non-null sentiment signal that was updated in
    # the last day, are not within 2 days of an earnings announcement, are not announced acquisition
    # targets, and are in the Q1500US.
    universe = (Q1500US()
                & sentiment_factor.notnull()
                & not_near_earnings_announcement
                & not_announced_acq_target
                & new_info)

    # Our pipeline is defined to have the rank of the sentiment_factor as the only column. It is
    # screened by our universe filter.
    pipe = Pipeline(
        columns={
            'sentiment': sentiment_factor.rank(mask=universe, method='average'),
        },
        screen=universe
    )

    return pipe

In [7]:
result = run_pipeline(make_pipeline(), start_date='2013-12-01', end_date='2014-12-01')
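
To make the factor logic concrete, here is a rough pandas sketch of what the pipeline computes on each day: a 5-day moving average, a universe mask, and a cross-sectional rank restricted to that mask. It is not the Pipeline API. The tickers A through D and their signals are made up, and the `other_filters` frame is a hypothetical stand-in for the earnings, acquisition, and Q1500US filters.

```python
import numpy as np
import pandas as pd

# Made-up daily sentiment signals for four hypothetical tickers.
dates = pd.date_range('2014-01-06', periods=6, freq='B')
signals = pd.DataFrame(
    {
        'A': [1, 2, 3, 4, 5, 6],
        'B': [6, 5, 4, 3, 2, 1],
        'C': [3, 3, 3, 3, 3, 3],
        'D': [0, 0, 6, 6, 6, np.nan],
    },
    index=dates,
    dtype=float,
)

# 5-day simple moving average per ticker (NaN until five values are available).
sma = signals.rolling(window=5).mean()

# Hypothetical stand-in for the non-sentiment filters: ticker C is always excluded.
other_filters = pd.DataFrame(True, index=dates, columns=signals.columns)
other_filters['C'] = False

# Universe: passes the other filters and has a non-null factor value.
universe = other_filters & sma.notnull()

# Rank each day's factor values across the tickers in the universe only.
# Rank 1 is the lowest value; ties would share the average of their ranks.
ranked = sma.where(universe).rank(axis=1, method='average')
print(ranked.dropna(how='all'))
```

On the fifth day the ranks are A=1, D=2, B=3. On the sixth day D drops out because its moving-average window contains a missing value, leaving B=1, A=2.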


## Part 3 - Analyze Our Factor Using Alphalens

Now we can analyze our sentiment factor with Alphalens. To do this, we need to get pricing data using get_pricing.

In [8]:
# All assets that were returned in the pipeline result.
assets = result.index.levels[1].unique()

# We need to get a little more pricing data than the length of our factor so we
# can compare forward returns. We'll tack on another month in this example.
pricing = get_pricing(assets, start_date='2013-12-01', end_date='2015-01-01', fields='open_price')
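
The reason for the extra month of prices is easiest to see in a small sketch of forward returns. The example below assumes that an n-day forward return is the return from one day's open to the open n rows later; the tickers X and Y and their prices are made up, and `forward_returns` is a hypothetical helper, not an Alphalens function.

```python
import pandas as pd

# Made-up open prices for two hypothetical tickers over six business days.
dates = pd.date_range('2014-01-06', periods=6, freq='B')
open_prices = pd.DataFrame(
    {
        'X': [10.0, 11.0, 12.0, 12.0, 13.0, 15.0],
        'Y': [20.0, 19.0, 19.0, 20.0, 21.0, 18.0],
    },
    index=dates,
)


def forward_returns(prices, periods):
    """Return from each row's price to the price `periods` rows later."""
    return prices.shift(-periods) / prices - 1.0


fwd_1 = forward_returns(open_prices, 1)
fwd_5 = forward_returns(open_prices, 5)
print(fwd_5)
```

With six rows of prices, only the first row has a 5-day forward return; the rest are NaN because the later price does not exist yet. Factor values near the end of the sample need prices beyond the factor's end date for the same reason.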


Then we run a factor tearsheet on our factor. We will analyze 3 quantiles, looking at 1, 5, and 10-day lookahead periods.

In [9]:
import alphalens

alphalens.tears.create_factor_tear_sheet(factor=result['sentiment'],
                                         prices=pricing,
                                         quantiles=3,
                                         periods=(1,5,10))

Returns Analysis

                                                    1       5      10
Ann. alpha                                      0.019   0.005  -0.002
beta                                           -0.017  -0.028  -0.027
Mean Period Wise Return Top Quantile (bps)      1.407   0.238  -0.125
Mean Period Wise Return Bottom Quantile (bps)  -1.000  -0.552  -0.181
Mean Period Wise Spread (bps)                   2.291   0.690  -0.080

Information Analysis

                  1       5      10
IC Mean       0.005   0.002  -0.005
IC Std.       0.051   0.055   0.058
t-stat(IC)    1.503   0.574  -1.343
p-value(IC)   0.134   0.567   0.181
IC Skew      -0.021  -0.107  -0.398
IC Kurtosis   0.294  -0.051   0.386
Ann. IR       1.503   0.574  -1.343

Turnover Analysis

                              1
Quantile 1 Mean Turnover  0.056
Quantile 2 Mean Turnover  0.145
Quantile 3 Mean Turnover  0.085

                                      1
Mean Factor Rank Autocorrelation  0.982

<matplotlib.figure.Figure at 0x7f57e80e2510>
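
The Information Analysis rows above are built on the information coefficient (IC), which is the Spearman rank correlation between factor values and forward returns, computed across assets on each day. The sketch below illustrates that calculation in plain pandas rather than reproducing the Alphalens implementation. The tickers, factor values, and returns are made up, and `daily_ic` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('2014-01-06', periods=4, freq='B')
tickers = list('ABCD')

# Made-up factor values (rows are days, columns are hypothetical tickers).
factor = pd.DataFrame(
    [[1, 2, 3, 4],
     [1, 2, 3, 4],
     [4, 3, 2, 1],
     [1, 3, 2, 4]],
    index=dates, columns=tickers, dtype=float,
)

# Made-up forward returns for the same days and tickers.
fwd_returns = pd.DataFrame(
    [[-0.02, -0.01, 0.01, 0.02],
     [0.02, 0.01, -0.01, -0.02],
     [0.03, 0.01, -0.01, -0.02],
     [-0.01, 0.00, 0.01, 0.02]],
    index=dates, columns=tickers,
)


def daily_ic(factor, fwd_returns):
    """Spearman IC per day: Pearson correlation of cross-sectional ranks."""
    return factor.rank(axis=1).corrwith(fwd_returns.rank(axis=1), axis=1)


ic = daily_ic(factor, fwd_returns)

# Summary statistics in the spirit of the table: mean, std, and a
# one-sample t-statistic of the daily ICs against zero.
ic_mean = ic.mean()
ic_std = ic.std()
t_stat = ic_mean / (ic_std / np.sqrt(len(ic)))
print(ic)
```

Here the first and third days have a factor ordering that matches the return ordering exactly (IC of 1), the second day is exactly reversed (IC of -1), and the fourth day has two adjacent assets swapped (IC of 0.8). A real factor, like the one above with an IC mean of 0.005, sits much closer to zero.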