Notebook
In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pykalman import KalmanFilter
import statsmodels.api as sm

<center>

Using the Kalman Filter in Algorithmic Trading

Dr. Aidan O'Mahony

QuantCon Singapore, November 2016

</center>

Introduction

  • Markets are dynamic and ever changing
  • Traders and trading algorithms must adapt
    • Remain profitable
    • Reduce risk and market exposure
  • How do we build adaptive algorithms?
    • Update parameters monthly, weekly etc.
    • Rolling windows of data to calculate parameters
    • Machine learning techniques
  • Demonstrate an adaptive strategy
    • Simple pair trading arbitrage strategy
    • Apply Kalman filter

Pair Trading

  • Common technique involving two or more assets
  • Assets have a cointegrating and mean-reverting relationship
  • Exploit mispricing of assets
  • Pair trading involves the following steps:
    1. Identify possible pairs of assets
    2. Construct spread from asset relationship
    3. Test for cointegration
    4. Open long-short position when mispricing occurs
    5. Profit from future correction to mispricing

Pair Trading

  • Pros:
    • Negligible beta and therefore minimal exposure to the market
    • Returns are uncorrelated to market returns
  • Cons:
    • Implementation and execution are relatively complex
    • Identification of pairs is difficult and computationally expensive
    • Cointegrating relationship can change or break at any time

PEP - KO Pair Trade Example

In [2]:
secs = ['PEP', 'KO']
# get_pricing and symbols are built-ins of the Quantopian research environment
data = get_pricing(
    symbols(secs), start_date='2006-1-1', end_date='2008-8-1', 
    fields='close_price', frequency='daily')
data.columns = [sec.symbol for sec in data.columns]
data.index.name = 'Date'
In [3]:
(1 + data.pct_change()).cumprod().plot();
plt.ylabel('Cumulative Return');

PEP - KO Relationship

  • Plot the data and use a colormap to indicate the date each point corresponds to

In [4]:
cm = plt.get_cmap('jet')
colors = np.linspace(0.1, 1, len(data))
sc = plt.scatter(data[secs[0]], data[secs[1]], s=30, c=colors, cmap=cm, edgecolor='k', alpha=0.7)
cb = plt.colorbar(sc)
cb.ax.set_yticklabels([str(p.date()) for p in data[::len(data)//9].index])
plt.xlabel(secs[0])
plt.ylabel(secs[1]);

Construction of Spread

  • Linear regression:
$$y({\bf x}) = \beta^T {\bf x} + \epsilon$$
$$\beta^T = (\beta_0, \beta_1, \ldots, \beta_p)$$
$$\epsilon \sim \mathcal{N}(\mu, \sigma^2)$$
  • For one-dimensional case:
$$\beta^T = (\beta_0, \beta_1)$$
$${\bf x} = \begin{pmatrix} 1 \\ x \end{pmatrix}$$
  • In our example:
$$x = p^\text{PEP}$$
$$y({\bf x}) = p^\text{KO}$$
  • Spread is constructed by:
$$\epsilon = p^\text{KO} - (\beta_0, \beta_1 ) \begin{pmatrix} 1 \\ p^\text{PEP} \end{pmatrix}$$
$$\epsilon = p^\text{KO} - \beta_1 p^\text{PEP} - \beta_0$$
In [5]:
x = sm.add_constant(data[secs[0]], prepend=False)
ols = sm.OLS(data[secs[1]], x).fit()
beta = ols.params
y_fit = [x.min().dot(beta), x.max().dot(beta)]
In [6]:
print(ols.summary2())
                  Results: Ordinary least squares
==================================================================
Model:              OLS              Adj. R-squared:     0.885    
Dependent Variable: KO               AIC:                2043.5547
Date:               2016-11-04 06:16 BIC:                2052.5086
No. Observations:   650              Log-Likelihood:     -1019.8  
Df Model:           1                F-statistic:        5002.    
Df Residuals:       648              Prob (F-statistic): 6.54e-307
R-squared:          0.885            Scale:              1.3539   
-------------------------------------------------------------------
            Coef.    Std.Err.     t      P>|t|    [0.025    0.975] 
-------------------------------------------------------------------
PEP          0.6419    0.0091   70.7230  0.0000    0.6240    0.6597
const      -16.7652    0.5982  -28.0280  0.0000  -17.9397  -15.5906
------------------------------------------------------------------
Omnibus:              4.083         Durbin-Watson:           0.075
Prob(Omnibus):        0.130         Jarque-Bera (JB):        4.052
Skew:                 -0.193        Prob(JB):                0.132
Kurtosis:             2.995         Condition No.:           864  
==================================================================
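
As a worked check of the spread formula, the fitted coefficients from the summary above can be plugged in directly. The two price pairs below are hypothetical values chosen for illustration, and `spread_value` is a helper name introduced here, not part of the original notebook.

```python
# Coefficients copied from the OLS summary above.
beta_1 = 0.6419     # slope on PEP
beta_0 = -16.7652   # intercept (const)


def spread_value(p_ko, p_pep):
    """epsilon = p_KO - beta_1 * p_PEP - beta_0"""
    return p_ko - beta_1 * p_pep - beta_0


# Hypothetical prices, not taken from the data set.
print(round(spread_value(25.0, 65.0), 4))   # 0.0417
print(round(spread_value(26.0, 65.0), 4))   # 1.0417
```

With PEP held at 65, a one-dollar rise in KO moves the spread up by exactly one dollar, so a positive spread means KO is expensive relative to the fitted relationship.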

Linear Regression

In [7]:
cm = plt.get_cmap('jet')
colors = np.linspace(0.1, 1, len(data))
sc = plt.scatter(data[secs[0]], data[secs[1]], s=50, c=colors, cmap=cm, 
                 edgecolor='k', alpha=0.7, label='Price Data')
plt.plot([x.min().iloc[0], x.max().iloc[0]], y_fit, '--b', linewidth=3, label='OLS Fit')
plt.legend()
cb = plt.colorbar(sc)
cb.ax.set_yticklabels([str(p.date()) for p in data[::len(data)//9].index])
plt.xlabel(secs[0])
plt.ylabel(secs[1]);

PEP - KO Spread

In [8]:
spread = pd.DataFrame(data[secs[1]] - np.dot(sm.add_constant(data[secs[0]], prepend=False), beta))
spread.columns = [secs[0] + '-' + secs[1] + ' Spread']
In [9]:
spread.plot(style=['g']);

Test for Cointegration

In [10]:
# check for cointegration
adf = sm.tsa.stattools.adfuller(spread['PEP-KO Spread'], maxlag=1)
print('ADF test statistic: %.02f' % adf[0])
print('p-value: %.03f' % adf[1])
ADF test statistic: -3.28
p-value: 0.016
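  • Augmented Dickey-Fuller test for cointegration:
    • ADF test statistic: -3.28
    • p-value: 0.016
    • The p-value is below 0.05, so the unit-root null hypothesis is rejected at the 5% level and the spread is treated as stationary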
In [11]:
spread['Middle'] = spread['PEP-KO Spread'].mean()
std = spread['PEP-KO Spread'].std()
spread['Upper'] = spread['Middle'] + std
spread['Lower'] = spread['Middle'] - std

Trading Rules

In [12]:
spread.plot(style=['g', '--b', '--y', '--y']);

Trading Rules

In [13]:
trades = pd.DataFrame(np.nan, index=spread.index, columns=['Buy', 'Sell'])
In [14]:
s = spread['PEP-KO Spread']

# Buy side: flag 1 when the spread crosses below the lower band,
# 0 when it crosses back above the middle line
trades.loc[(s.shift(1) > spread['Lower']) & (s < spread['Lower']), 'Buy'] = 1
trades.loc[(s.shift(1) < spread['Middle']) & (s > spread['Middle']), 'Buy'] = 0

# Carry the flag forward, then keep only the changes: 1 = open, -1 = close
trades['Buy'] = trades['Buy'].ffill()
trades['Buy'] = trades['Buy'].diff().shift(-1)
trades.loc[trades['Buy'] == 0, 'Buy'] = np.nan
trades.loc[trades['Buy'] == -1, 'Buy'] = 0
# Scale for plotting: opens sit on the lower band, closes at zero
trades['Buy'] *= spread['Lower']

# Sell side: flag 1 when the spread crosses above the upper band,
# 0 when it crosses back below the middle line
trades.loc[(s.shift(1) < spread['Upper']) & (s > spread['Upper']), 'Sell'] = 1
trades.loc[(s.shift(1) > spread['Middle']) & (s < spread['Middle']), 'Sell'] = 0

trades['Sell'] = trades['Sell'].ffill()
trades['Sell'] = trades['Sell'].diff().shift(-1)
trades.loc[trades['Sell'] == 0, 'Sell'] = np.nan
trades.loc[trades['Sell'] == -1, 'Sell'] = 0
# Scale for plotting: opens sit on the upper band, closes at zero
trades['Sell'] *= spread['Upper']
In [15]:
spread.plot(style=['g', '--b', '--y', '--y'])
plt.plot(trades['Buy'], 'm^', markersize=12, label='Buy')
plt.plot(trades['Sell'], 'cv', markersize=12, label='Sell')
plt.legend(loc=0);
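
The same band rules can be written as an explicit state machine, which some readers may find easier to follow than the vectorized version. This is a simplified restatement under assumptions, not the notebook's own code: the helper `band_positions` is hypothetical, it tracks a single position (+1 long the spread, -1 short, 0 flat), and it does not reproduce the one-bar shift used for the plot markers.

```python
import pandas as pd


def band_positions(spread, lower, middle, upper):
    """Position in the spread per bar: +1 long, -1 short, 0 flat."""
    position = 0
    out = []
    prev = spread.iloc[0]
    for value in spread:
        if position == 0:
            if prev > lower and value < lower:
                position = 1       # spread fell through the lower band: buy
            elif prev < upper and value > upper:
                position = -1      # spread rose through the upper band: sell
        elif position == 1 and prev < middle and value > middle:
            position = 0           # long exits on an upward cross of the middle
        elif position == -1 and prev > middle and value < middle:
            position = 0           # short exits on a downward cross of the middle
        out.append(position)
        prev = value
    return pd.Series(out, index=spread.index)


# Toy spread with bands at -1, 0 and +1 (hypothetical values).
toy = pd.Series([0.0, -1.5, -0.5, 0.5, 1.5, 0.5, -0.5])
print(band_positions(toy, -1.0, 0.0, 1.0).tolist())
# [0, 1, 1, 0, -1, -1, 0]
```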

Out of Sample

In [16]:
secs = ['PEP', 'KO']
data_oos = get_pricing(
    symbols(secs), start_date='2008-8-1', end_date='2010-1-1', 
    fields='close_price', frequency='daily')
data_oos.columns = [sec.symbol for sec in data_oos.columns]
data_oos.index.name = 'Date'
In [17]:
spread_oos = spread.reindex(spread.index.union(data_oos.index))
In [18]:
spread_oos['PEP-KO Spread OOS'] = data_oos[secs[1]] - np.dot(
        sm.add_constant(data_oos[secs[0]], prepend=False), beta)
In [19]:
spread_oos[['Middle', 'Upper', 'Lower']] = spread_oos[['Middle', 'Upper', 'Lower']].ffill()
In [20]:
spread_oos.plot(style=['g', '--b', '--y', '--y', 'r']);

Why?

In [21]:
data_all = pd.concat([data, data_oos])
cm = plt.get_cmap('jet')
colors = np.linspace(0.1, 1, len(data_all))
sc = plt.scatter(data_all[secs[0]], data_all[secs[1]], s=50, c=colors, cmap=cm, 
                 edgecolor='k', alpha=0.7, label='Price Data')
plt.plot([x.min().iloc[0], x.max().iloc[0]], y_fit, '--b', linewidth=3, label='OLS Fit')
plt.legend()
cb = plt.colorbar(sc)
cb.ax.set_yticklabels([str(p.date()) for p in data_all[::len(data_all)//9].index])
plt.xlabel(secs[0])
plt.ylabel(secs[1]);
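
One way to quantify what the scatter plot suggests is to fit the regression separately on each period and compare the coefficients. The sketch below uses synthetic stand-ins for the two periods; every number in it is a hypothetical choice for illustration and none is derived from the PEP and KO data.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic stand-ins for the two periods (hypothetical numbers).
x_in = rng.uniform(55, 75, 300)
y_in = -17.0 + 0.64 * x_in + rng.normal(0, 1.0, 300)
x_out = rng.uniform(45, 65, 300)
y_out = -5.0 + 0.50 * x_out + rng.normal(0, 1.0, 300)

# Fit a straight line to each period separately.
slope_in, intercept_in = np.polyfit(x_in, y_in, 1)
slope_out, intercept_out = np.polyfit(x_out, y_out, 1)

# Applying the in-sample fit to the later period leaves residuals
# that are no longer centred on zero.
resid_oos = y_out - (slope_in * x_out + intercept_in)
print('in-sample slope %.2f, out-of-sample slope %.2f' % (slope_in, slope_out))
print('mean OOS residual under in-sample fit: %.2f' % resid_oos.mean())
```

When the relationship shifts like this, a spread built from fixed coefficients drifts away from its historical mean, so bands computed in sample stop being meaningful.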

Solution

  • Dynamically update beta coefficients
  • How?
    • Calculate OLS regression coefficients every n days
    • Use moving window data to perform OLS regression
    • State space model of OLS regre