In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pykalman import KalmanFilter
import statsmodels.api as sm



# Using the Kalman Filter in Algorithmic Trading


# Introduction

• Markets are dynamic and ever changing
• Remain profitable
• Reduce risk and market exposure
• How do we build adaptive algorithms?
• Update parameters monthly, weekly etc.
• Rolling windows of data to calculate parameters
• Machine learning techniques
• Simple pair trading arbitrage strategy
• Apply Kalman filter

• Pair trading is a common technique involving two or more assets
• Assets have a cointegrating and mean-reverting relationship
• Exploit mispricing of assets
• Pair trading involves the following steps:
1. Identify possible pairs of assets
2. Construct spread from asset relationship
3. Test for cointegration
4. Open long-short position when mispricing occurs
5. Profit from future correction to mispricing

• Pros:
  • Negligible beta and therefore minimal exposure to the market
  • Returns are uncorrelated with market returns
• Cons:
  • Implementation and execution are relatively complex
  • Identification of pairs is difficult and computationally expensive
  • Cointegrating relationship can change or break at any time

# PEP - KO Pair Trade Example


In [2]:
secs = ['PEP', 'KO']
data = get_pricing(
    symbols(secs), start_date='2006-1-1', end_date='2008-8-1',
    fields='close_price', frequency='daily')
data.columns = [sec.symbol for sec in data.columns]
data.index.name = 'Date'

In [3]:
(1 + data.pct_change()).cumprod().plot();
plt.ylabel('Cumulative Return');

• Plot data and use colormap to indicate the date each point corresponds to

# PEP - KO Relationship

In [4]:
cm = plt.get_cmap('jet')
colors = np.linspace(0.1, 1, len(data))
sc = plt.scatter(data[secs[0]], data[secs[1]], s=30, c=colors, cmap=cm, edgecolor='k', alpha=0.7)
cb = plt.colorbar(sc)
cb.ax.set_yticklabels([str(p.date()) for p in data[::len(data)//9].index])
plt.xlabel(secs[0])
plt.ylabel(secs[1]);


• Linear regression:
$$y({\bf x}) = \beta^T {\bf x} + \epsilon$$
$$\beta^T = (\beta_0, \beta_1, \ldots, \beta_p)$$
$$\epsilon \sim \mathcal{N}(\mu, \sigma^2)$$
• For one-dimensional case:
$$\beta^T = (\beta_0, \beta_1)$$
$${\bf x} = \begin{pmatrix} 1 \\ x \end{pmatrix}$$
• In our example:
$$x = p^\text{PEP}$$
$$y({\bf x}) = p^\text{KO}$$
$$\epsilon = p^\text{KO} - (\beta_0, \beta_1 ) \begin{pmatrix} 1 \\ p^\text{PEP} \end{pmatrix}$$
$$\epsilon = p^\text{KO} - \beta_1 p^\text{PEP} - \beta_0$$
In [5]:
x = sm.add_constant(data[secs[0]], prepend=False)
ols = sm.OLS(data[secs[1]], x).fit()
beta = ols.params
y_fit = [x.min().dot(beta), x.max().dot(beta)]

In [6]:
print(ols.summary2())

                  Results: Ordinary least squares
==================================================================
Dependent Variable: KO               AIC:                2043.5547
Date:               2017-01-17 22:34 BIC:                2052.5086
No. Observations:   650              Log-Likelihood:     -1019.8
Df Model:           1                F-statistic:        5002.
Df Residuals:       648              Prob (F-statistic): 6.54e-307
R-squared:          0.885            Scale:              1.3539
-------------------------------------------------------------------
              Coef.  Std.Err.         t    P>|t|    [0.025    0.975]
-------------------------------------------------------------------
PEP          0.6419    0.0091   70.7230  0.0000    0.6240    0.6597
const      -16.7652    0.5982  -28.0280  0.0000  -17.9397  -15.5906
------------------------------------------------------------------
Omnibus:              4.083         Durbin-Watson:           0.075
Prob(Omnibus):        0.130         Jarque-Bera (JB):        4.052
Skew:                 -0.193        Prob(JB):                0.132
Kurtosis:             2.995         Condition No.:           864
==================================================================
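
As a worked instance of the spread definition, substituting the fitted coefficients from the summary above ($\beta_1 = 0.6419$ for PEP and $\beta_0 = -16.7652$ for the constant) gives:

$$\epsilon = p^\text{KO} - 0.6419\, p^\text{PEP} + 16.7652$$

For illustration only, at a hypothetical PEP price of 70 the fitted KO price would be $0.6419 \times 70 - 16.7652 = 28.1678$, and the spread is the observed KO price minus that value.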



# Linear Regression

In [7]:
cm = plt.get_cmap('jet')
colors = np.linspace(0.1, 1, len(data))
sc = plt.scatter(data[secs[0]], data[secs[1]], s=50, c=colors, cmap=cm,
                 edgecolor='k', alpha=0.7, label='Price Data')
plt.plot([x.min().iloc[0], x.max().iloc[0]], y_fit, '--b', linewidth=3, label='OLS Fit')
plt.legend()
cb = plt.colorbar(sc)
cb.ax.set_yticklabels([str(p.date()) for p in data[::len(data)//9].index])
plt.xlabel(secs[0])
plt.ylabel(secs[1]);


In [8]:
spread = pd.DataFrame(data[secs[1]] - np.dot(sm.add_constant(data[secs[0]], prepend=False), beta))
spread.columns = ['PEP-KO Spread']  # name the column so later cells can reference it

In [9]:
spread.plot(style=['g']);


# Test for Cointegration

In [10]:
# check for cointegration

ADF test statistic: -3.28
p-value: 0.016

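• Augmented Dickey-Fuller test on the spread:
  • p-value: 0.016, below 0.05, so the unit-root null hypothesis is rejected at the 5% level and the spread is treated as stationary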
In [11]:
spread['Middle'] = spread['PEP-KO Spread'].mean()
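
Later cells reference `Upper` and `Lower` columns alongside `Middle`, but the lines that define them are not shown here. One common construction places the bands a fixed number of standard deviations around the mean. The sketch below assumes that approach; the multiplier `k` and the `demo` frame are hypothetical and not taken from the notebook.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical stand-in for the spread DataFrame.
demo = pd.DataFrame({'PEP-KO Spread': rng.normal(size=250)})

k = 2.0  # assumed band width in standard deviations (not from the notebook)

# Compute the statistics before adding any new columns.
mu = demo['PEP-KO Spread'].mean()
sigma = demo['PEP-KO Spread'].std()

demo['Middle'] = mu
demo['Upper'] = mu + k * sigma
demo['Lower'] = mu - k * sigma
```

The column order (spread, middle, upper, lower) lines up with the four plot styles `['g', '--b', '--y', '--y']` used in the plotting cells.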


In [12]:
spread.plot(style=['g', '--b', '--y', '--y']);


In [13]:
trades = pd.DataFrame(np.nan, index=spread.index, columns=['Buy', 'Sell'])

In [14]:
trades['Buy'][(spread['PEP-KO Spread'].shift(1) > spread['Lower']) &


In [15]:
spread.plot(style=['g', '--b', '--y', '--y'])
plt.legend(loc=0);
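
The cell that fills `trades` is cut off above, so the exact rule is not visible. A minimal sketch of one band-crossing rule that is consistent with the visible fragment (previous value above `Lower`) is given here. The second half of each condition, the fixed band levels, and the `spread_demo` and `trades_demo` names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical spread values with fixed bands at -2 and +2.
spread_demo = pd.DataFrame({
    'PEP-KO Spread': [0.0, -1.5, -2.5, -1.0, 1.0, 2.5, 1.5, 0.0],
})
spread_demo['Lower'] = -2.0
spread_demo['Upper'] = 2.0

s = spread_demo['PEP-KO Spread']
prev = s.shift(1)

# A crossing happens when yesterday was inside the band and today is outside.
crossed_below = (prev > spread_demo['Lower']) & (s < spread_demo['Lower'])
crossed_above = (prev < spread_demo['Upper']) & (s > spread_demo['Upper'])

trades_demo = pd.DataFrame(np.nan, index=spread_demo.index, columns=['Buy', 'Sell'])

# Record the spread level at each crossing; .loc avoids chained assignment.
trades_demo.loc[crossed_below, 'Buy'] = s[crossed_below]
trades_demo.loc[crossed_above, 'Sell'] = s[crossed_above]
```

In this toy series the spread crosses below the lower band at index 2 and above the upper band at index 5, so those are the only rows with a recorded trade.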


# Out of Sample

In [16]:
secs = ['PEP', 'KO']
data_oos = get_pricing(
    symbols(secs), start_date='2008-8-1', end_date='2010-1-1',
    fields='close_price', frequency='daily')
data_oos.columns = [sec.symbol for sec in data_oos.columns]
data_oos.index.name = 'Date'

In [17]:
spread_oos = spread.reindex(spread.index.union(data_oos.index))

In [18]:
spread_oos['PEP-KO Spread OOS'] = data_oos[secs[1]] - np.dot(

In [19]:
spread_oos[['Middle', 'Upper', 'Lower']] = spread_oos[['Middle', 'Upper', 'Lower']].ffill()

In [20]:
spread_oos.plot(style=['g', '--b', '--y', '--y', 'r']);


# Why?

In [21]:
data_all = pd.concat([data, data_oos])
cm = plt.get_cmap('jet')
colors = np.linspace(0.1, 1, len(data_all))
sc = plt.scatter(data_all[secs[0]], data_all[secs[1]], s=50, c=colors, cmap=cm,
                 edgecolor='k', alpha=0.7, label='Price Data')
plt.plot([x.min().iloc[0], x.max().iloc[0]], y_fit, '--b', linewidth=3, label='OLS Fit')
plt.legend()
cb = plt.colorbar(sc)
cb.ax.set_yticklabels([str(p.date()) for p in data_all[::len(data_all)//9].index])
plt.xlabel(secs[0])
plt.ylabel(secs[1]);


# Solution

• Dynamically update beta coefficients
• How?
• Calculate OLS regression coefficients every n days
• Use moving window data t