
Researching & Developing a Market Neutral Strategy

The process involves the following steps:

  • Researching partner data.
  • Designing a pipeline.
  • Analyzing an alpha factor with Alphalens.
  • Implementing our factor in the IDE (see backtest in next comment).
  • Evaluating the backtest using Pyfolio.

Part 1 - Investigate the Data with Blaze

To start out, let's investigate a partner dataset using Blaze. Blaze lets you define expressions that select and transform data without loading all of the data into memory, which makes it well suited to exploring large datasets in research.
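To make deferred evaluation concrete, here is a minimal pure-Python sketch. It is not Blaze: the `LazyTable` class and its `where`, `peek`, and `count` methods are hypothetical names chosen for illustration. Defining the expression reads no rows; rows are streamed only when `peek` is called, and `peek` stops as soon as it has enough matches.

```python
from itertools import islice


class LazyTable(object):
    """Hypothetical lazy table: stores filters, reads rows only on demand."""

    def __init__(self, source, predicates=()):
        self.source = source  # zero-argument callable that yields rows
        self.predicates = tuple(predicates)

    def where(self, predicate):
        # Returns a new expression; no rows are read here.
        return LazyTable(self.source, self.predicates + (predicate,))

    def _rows(self):
        # Stream rows one at a time, keeping those that pass every filter.
        for row in self.source():
            if all(p(row) for p in self.predicates):
                yield row

    def peek(self, n=10):
        # Pull only as many rows as needed to return the first n matches.
        return list(islice(self._rows(), n))

    def count(self):
        # Full scan, still one row at a time rather than all in memory.
        return sum(1 for _ in self._rows())


reads = {"n": 0}


def make_rows():
    # Simulated large table of 1,000,000 rows; counts rows actually read.
    for i in range(1000000):
        reads["n"] += 1
        yield {"sid": i % 600, "signal": float(i % 13 - 6)}


expr = LazyTable(make_rows).where(lambda r: r["sid"] == 24)
reads_before_peek = reads["n"]  # 0: defining the expression read nothing
sample = expr.peek(3)           # reads rows 0..1224, then stops
```

Only 1,225 of the million simulated rows are touched to produce three matches, which is the property that makes `peek()` cheap on a large dataset.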

In [1]:
import matplotlib.pyplot as plt
import pandas as pd

# http://blaze.readthedocs.io/en/latest/index.html
import blaze as bz

from zipline.utils.tradingcalendar import get_trading_days

from quantopian.interactive.data.sentdex import sentiment

Interactive datasets are Blaze expressions. Blaze expressions have a similar API to pandas, with some differences.

In [2]:
type(sentiment)
Out[2]:
<class 'blaze.expr.expressions.Field'>

Let's start by looking at a sample of the data in the Sentdex Sentiment Analysis dataset for AAPL.

In [3]:
# `symbols` is available without an import in the Quantopian research environment.
aapl_sid = symbols('AAPL').sid

# Look at a sample of AAPL sentiment data starting from 2013-12-01.
sentiment[(sentiment.sid == aapl_sid) & (sentiment.asof_date >= '2013-12-01')].peek()
Out[3]:
    symbol  sentiment_signal  sid   asof_date   timestamp
0     AAPL               1.0   24  2013-12-01  2013-12-02
1     AAPL               6.0   24  2013-12-02  2013-12-03
2     AAPL               5.0   24  2013-12-03  2013-12-04
3     AAPL               6.0   24  2013-12-04  2013-12-05
4     AAPL               4.0   24  2013-12-05  2013-12-06
5     AAPL               2.0   24  2013-12-06  2013-12-07
6     AAPL              -1.0   24  2013-12-07  2013-12-08
7     AAPL              -1.0   24  2013-12-08  2013-12-09
8     AAPL              -3.0   24  2013-12-09  2013-12-10
9     AAPL              -1.0   24  2013-12-10  2013-12-11
10    AAPL               6.0   24  2013-12-11  2013-12-12

Let's see how many securities are covered by this dataset. Note that this counts distinct sids over the entire dataset, not only between 12/2013 and 12/2014.

In [4]:
num_sids = bz.compute(sentiment.sid.distinct().count())
print 'Number of sids in the data: %d' % num_sids
Number of sids in the data: 586

Let's go back to AAPL and look at its sentiment signal on each trading day. To do this, we create one Blaze expression that selects trading days and another that selects the AAPL sid (24).

In [5]:
# Mask for trading days.
date_mask = sentiment.asof_date.isin(
    get_trading_days(pd.Timestamp('2013-12-01'), pd.Timestamp('2014-12-01'))
)

# Mask for AAPL.
stock_mask = (sentiment.sid == aapl_sid)

# Blaze expression for AAPL sentiment on trading days between 12/2013 and 12/2014
sentiment_2014_expr = sentiment[date_mask & stock_mask].sort('asof_date')
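The two masks are combined with `&`, so a row survives only if it falls on a trading day and belongs to AAPL; the result is then sorted by `asof_date`. The same logic in plain Python, as a sketch: the rows are built from the AAPL values in the `peek()` output above plus one row for a hypothetical second sid (99), and, as a simplifying assumption, weekdays stand in for trading days (the notebook uses zipline's `get_trading_days`, which also accounts for exchange holidays).

```python
from datetime import date, timedelta

AAPL_SID = 24

# AAPL sentiment_signal values from the peek() output above (asof_date
# 2013-12-01 through 2013-12-11), plus one row for a hypothetical sid 99.
signals = [1.0, 6.0, 5.0, 6.0, 4.0, 2.0, -1.0, -1.0, -3.0, -1.0, 6.0]
rows = [
    {"sid": AAPL_SID, "asof_date": date(2013, 12, 1) + timedelta(days=i),
     "sentiment_signal": s}
    for i, s in enumerate(signals)
]
rows.append({"sid": 99, "asof_date": date(2013, 12, 2), "sentiment_signal": 3.0})
rows.reverse()  # scramble the order so the sort step below matters


def is_trading_day(row):
    # Assumption: weekdays stand in for trading days (no holiday calendar).
    return row["asof_date"].weekday() < 5


def is_aapl(row):
    return row["sid"] == AAPL_SID


# Keep rows where BOTH masks are true (like date_mask & stock_mask),
# then order by date (like .sort('asof_date')).
aapl_trading = sorted(
    (r for r in rows if is_trading_day(r) and is_aapl(r)),
    key=lambda r: r["asof_date"],
)
kept_dates = [r["asof_date"].isoformat() for r in aapl_trading]
```

The weekend rows (2013-12-01, 12-07, 12-08) and the sid 99 row are dropped, leaving eight AAPL trading days in date order.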

Compute the expression. This returns the result in a pandas DataFrame.

In [6]:
sentiment_2014_df = bz.compute(sentiment_2014_expr)

Plot the sentiment signal for AAPL.

In [7]:
sentiment_2014_df.plot(x='asof_date', y='sentiment_signal')
[Plot: AAPL sentiment_signal by asof_date, 12/2013 to 12/2014]

The sentiment signal jumps around quite a bit. Let's smooth it by plotting the 5-day rolling mean with the pandas.rolling_mean function (newer pandas versions replace it with Series.rolling(window=5).mean()). Note that we set the DataFrame's index to asof_date so that the x-axis is formatted as dates.
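A 5-day rolling mean replaces each value with the average of that day and the four before it, and leaves the first four positions empty because their window is incomplete (pandas fills them with NaN by default). A minimal stdlib sketch of that computation, using the first eleven `sentiment_signal` values from the AAPL `peek()` output purely as sample input (those rows include weekends, unlike the trading-day series in the notebook); `rolling_mean` is a hypothetical helper, not a pandas function:

```python
def rolling_mean(values, window):
    # Trailing mean of the last `window` values. Positions with fewer than
    # `window` values get None, like the leading NaNs from pd.rolling_mean.
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        out.append(running / window if i >= window - 1 else None)
    return out


# First eleven sentiment_signal values from the AAPL peek() output above.
signals = [1.0, 6.0, 5.0, 6.0, 4.0, 2.0, -1.0, -1.0, -3.0, -1.0, 6.0]
smoothed = rolling_mean(signals, window=5)
```

Keeping a running sum makes each step O(1) instead of re-summing the whole window; `smoothed[4]` is 4.4, the mean of the first five values.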

In [8]:
pd.rolling_mean(sentiment_2014_df.set_index('asof_date').sentiment_signal, window=5).plot()
[Plot: 5-day rolling mean of AAPL sentiment_signal]

Great! Now let's use this data in a pipeline.

Part 2 - Define Our Factor

Our pipeline uses the sentiment dataset together with two others: the EventVestor Earnings Calendar dataset, to avoid trading around earnings announcements, and the EventVestor Mergers & Acquisitions dataset, to avoid trading announced acquisition targets. We will work with the free versions of these datasets.

In [5]:
from quantopian.pipeline import Pipeline
from quantopian.research import run_pipeline

from quantopian.pipeline.factors import SimpleMovingAverage
from quantopian.pipeline.filters.morningstar import Q1500US

# Sentdex Sentiment free from 15 Oct 2012 to 1 month ago.
from quantopian.pipeline.data.sentdex import sentiment

# EventVestor Earnings Calendar free from 01 Feb 2007 to 1 year ago.
from quantopian.pipeline.factors.eventvestor import (
    BusinessDaysUntilNextEarnings,
    BusinessDaysSincePreviousEarnings,
)

# EventVestor Mergers & Acquisitions free from 01 Feb 2007 to 1 year ago.
from quantopian.pipeline.filters.eventvestor import IsAnnouncedAcqTarget

from quantopian.pipeline.factors import BusinessDaysSincePreviousEvent

In [6]:
def make_pipeline():
    
    # 5-day sentiment moving average factor.
    sentiment_factor = SimpleMovingAverage(inputs=[sentiment.sentiment_signal], window_length=5)
    
    # Filter for stocks that are not within 2 days of an earnings announcement.
    not_near_earnings_announcement = ~((BusinessDaysUntilNextEarnings() <= 2)
                                       | (BusinessDaysSincePreviousEarnings() <= 2))

    # Filter out stocks that are announced acquisition targets.
    not_announced_acq_target = ~IsAnnouncedAcqTarget()
    
    # Filter for stocks that had their sentiment signal updated in the last day.
    new_info = (BusinessDaysSincePreviousEvent(inputs=[sentiment.asof_date.latest]) <= 1)
    
    # Our universe is made up of stocks that have a non-null sentiment signal that was updated in
    # the last day, are not within 2 days of an earnings announcement, are not announced acquisition
    # targets, and are in the Q1500US.
    universe = (Q1500US() 
                & sentiment_factor.notnull() 
                & not_near_earnings_announcement
                & not_announced_acq_target
                & new_info)
    
    # Our pipeline is defined to have the rank of the sentiment_factor as the only column. It is
    # screened by our universe filter.
    pipe = Pipeline(
        columns={
            'sentiment': sentiment_factor.rank(mask=universe, method='average'),
        },
        screen=universe
    )
    
    return pipe
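The universe in `make_pipeline` is the intersection (`&`) of five conditions. As a sketch of what that screen does on a single day, here is the same logic in plain Python over a few stocks; every ticker, field name, and value is invented for illustration and none of them are pipeline column names. Note that the earnings windows are inclusive: a stock exactly 2 business days from an announcement is excluded.

```python
# Hypothetical one-day snapshot; every ticker and value here is invented.
stocks = {
    "AAA": dict(in_q1500us=True, sentiment_5d=2.4, days_until_earnings=10,
                days_since_earnings=40, acq_target=False, days_since_update=1),
    "BBB": dict(in_q1500us=True, sentiment_5d=1.0, days_until_earnings=2,
                days_since_earnings=60, acq_target=False, days_since_update=0),
    "CCC": dict(in_q1500us=True, sentiment_5d=-0.6, days_until_earnings=25,
                days_since_earnings=30, acq_target=True, days_since_update=1),
    "DDD": dict(in_q1500us=True, sentiment_5d=3.0, days_until_earnings=20,
                days_since_earnings=35, acq_target=False, days_since_update=3),
    "EEE": dict(in_q1500us=True, sentiment_5d=-1.2, days_until_earnings=3,
                days_since_earnings=3, acq_target=False, days_since_update=1),
    "FFF": dict(in_q1500us=False, sentiment_5d=4.0, days_until_earnings=15,
                days_since_earnings=45, acq_target=False, days_since_update=1),
    "GGG": dict(in_q1500us=True, sentiment_5d=None, days_until_earnings=15,
                days_since_earnings=45, acq_target=False, days_since_update=1),
}


def in_universe(s):
    # Mirrors ~((days_until <= 2) | (days_since <= 2)) from make_pipeline.
    near_earnings = s["days_until_earnings"] <= 2 or s["days_since_earnings"] <= 2
    return (s["in_q1500us"]                    # Q1500US()
            and s["sentiment_5d"] is not None  # sentiment_factor.notnull()
            and not near_earnings              # not_near_earnings_announcement
            and not s["acq_target"]            # not_announced_acq_target
            and s["days_since_update"] <= 1)   # new_info


universe = sorted(name for name, s in stocks.items() if in_universe(s))
```

Each invented stock after AAA fails exactly one condition, except EEE, which sits just outside both earnings windows and passes.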
In [7]:
result = run_pipeline(make_pipeline(), start_date='2013-12-01', end_date='2014-12-01')
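The pipeline's only column is `sentiment_factor.rank(mask=universe, method='average')`: each day, stocks inside the universe are ranked in ascending order of their 5-day sentiment, tied stocks share the mean of the ranks they span, and stocks outside the mask are not ranked. A plain-Python sketch of that cross-sectional ranking for one day, with invented values; `average_rank` is a hypothetical helper, not the pipeline API:

```python
def average_rank(values):
    # Ascending ranks starting at 1; tied values share the mean of their ranks.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1  # positions i..j -> ranks i+1..j+1
        i = j + 1
    return ranks


# Invented 5-day sentiment values for one day.
sentiment_5d = {"AAA": 2.4, "BBB": -1.0, "CCC": 2.4, "DDD": 5.0}
universe = {"AAA", "BBB", "CCC"}  # DDD is masked out, e.g. near earnings

names = sorted(universe)
ranked = dict(zip(names, average_rank([sentiment_5d[n] for n in names])))
```

AAA and CCC tie for ranks 2 and 3, so both get 2.5; DDD has the highest sentiment but receives no rank because it is outside the mask.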

Part 3 - Analyze Our Factor Using Alphalens

Now we can analyze our sentiment factor with Alphalens. Alphalens computes forward returns from prices, so we first load open prices for every asset in the pipeline output using get_pricing.

In [8]:
# All assets that were returned in the pipeline result.
assets = result.index.levels[1].unique()

# We need to get a little more pricing data than the length of our factor so we 
# can compare forward returns. We'll tack on another month in this example.
pricing = get_pricing(assets, start_date='2013-12-01', end_date='2015-01-01', fields='open_price')
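The extra month of prices is needed because a factor value on day t is scored against the return from t to t + N trading days, so the last N days of any price series have no forward return. A minimal sketch of N-day forward returns from a list of hypothetical open prices; `forward_returns` is an illustrative helper, not Alphalens' internal code:

```python
def forward_returns(prices, period):
    # Return from row t to row t + period: p[t + period] / p[t] - 1.
    # The last `period` rows have no future price, so they get None.
    out = []
    for t in range(len(prices)):
        if t + period < len(prices):
            out.append(prices[t + period] / prices[t] - 1.0)
        else:
            out.append(None)
    return out


opens = [100.0, 101.0, 99.0, 102.0, 104.0, 103.0]  # hypothetical open prices
fwd_1 = forward_returns(opens, 1)
fwd_5 = forward_returns(opens, 5)
```

With six prices, only the first day has a 5-day forward return; the longest period (10 days here) therefore determines how much price history must extend past the last factor date.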

Then we create a factor tear sheet. We split the factor into 3 quantiles and look at 1-, 5-, and 10-day forward-return periods.
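With `quantiles=3`, each day's factor values are split into three equal-sized buckets, and the tear sheet reports forward returns per bucket and the spread between the top and bottom buckets. A one-day sketch of that idea with invented factor values and 1-day forward returns; the bucketing rule in the hypothetical `quantile_labels` helper (by sorted position) is a simplifying assumption, and Alphalens' own bucketing and averaging across days may differ in detail:

```python
def quantile_labels(values, n_quantiles):
    # Label 1..n_quantiles by sorted position; 1 holds the lowest values.
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for pos, i in enumerate(order):
        labels[i] = pos * n_quantiles // len(values) + 1
    return labels


def mean_return_by_quantile(labels, returns):
    # Average forward return of the stocks in each quantile.
    buckets = {}
    for q, r in zip(labels, returns):
        buckets.setdefault(q, []).append(r)
    return {q: sum(rs) / len(rs) for q, rs in buckets.items()}


# Invented one-day cross-section: factor value and 1-day forward return per stock.
factor = [0.5, 2.0, 1.0, 3.0, -1.0, 2.5]
fwd_1d = [0.001, 0.004, -0.002, 0.006, -0.003, 0.002]

labels = quantile_labels(factor, 3)
means = mean_return_by_quantile(labels, fwd_1d)
spread_bps = (means[3] - means[1]) * 10000  # top minus bottom, in basis points
```

Here the top bucket averages +0.4% and the bottom -0.1%, a 50 bps spread; a positive spread is what a useful long/short factor should show.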

In [9]:
import alphalens

alphalens.tears.create_factor_tear_sheet(factor=result['sentiment'],
                                         prices=pricing,
                                         quantiles=3,
                                         periods=(1,5,10))
Returns Analysis (columns: forward-return period in days)
                                                     1       5      10
Ann. alpha                                       0.019   0.005  -0.002
beta                                            -0.017  -0.028  -0.027
Mean Period Wise Return Top Quantile (bps)       1.407   0.238  -0.125
Mean Period Wise Return Bottom Quantile (bps)   -1.000  -0.552  -0.181
Mean Period Wise Spread (bps)                    2.291   0.690  -0.080

Information Analysis (columns: forward-return period in days)
                   1       5      10
IC Mean        0.005   0.002  -0.005
IC Std.        0.051   0.055   0.058
t-stat(IC)     1.503   0.574  -1.343
p-value(IC)    0.134   0.567   0.181
IC Skew       -0.021  -0.107  -0.398
IC Kurtosis    0.294  -0.051   0.386
Ann. IR        1.503   0.574  -1.343

Turnover Analysis (column: period in days)
                                       1
Quantile 1 Mean Turnover           0.056
Quantile 2 Mean Turnover           0.145
Quantile 3 Mean Turnover           0.085
Mean Factor Rank Autocorrelation   0.982
[Figures: Alphalens factor tear sheet plots]
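The Information Analysis table is built from the information coefficient (IC): for each day, the Spearman rank correlation between factor values and forward returns. A sketch of one day's IC, computed as the Pearson correlation of average ranks, followed by a one-sample t-statistic over a series of daily ICs (mean divided by its standard error). All inputs are invented and the helper names are hypothetical; treat this as an illustration of the statistics, not Alphalens' code.

```python
import math


def average_rank(values):
    # Ascending ranks starting at 1; tied values share the mean of their ranks.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_ic(factor, fwd_returns):
    # Spearman correlation = Pearson correlation of the ranks.
    return pearson(average_rank(factor), average_rank(fwd_returns))


# One invented day: five stocks' factor values and next-day returns.
ic_day = spearman_ic([1.0, 2.0, 3.0, 4.0, 5.0], [0.01, 0.02, 0.015, 0.03, 0.05])

# Invented daily IC series; t = mean / (sample std / sqrt(n)).
daily_ic = [ic_day, -0.1, 0.3, 0.2]
n = len(daily_ic)
mean_ic = sum(daily_ic) / n
std_ic = math.sqrt(sum((v - mean_ic) ** 2 for v in daily_ic) / (n - 1))
t_stat = mean_ic / (std_ic / math.sqrt(n))
```

Only the ordering of returns matters to the IC: one swapped pair out of five gives 0.9. Small mean ICs like the ones in the table above can still be meaningful, but only when they are consistent enough across days to produce a large t-statistic.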