Notebook

Quantopian's slippage model for futures

This notebook is an overview of Quantopian's slippage model for futures, VolatilityVolumeShare. If you aren't familiar with what slippage is, consider a simple example: buying futures contracts for coffee. We want to buy five contracts at \$500 apiece. We buy the first four, but by the time we buy the fifth, the price of the coffee contract has increased to \$503. This change in price is what is known as slippage. It will typically work against us: our own buys in a market increase demand, which in turn increases price.

Our slippage model for futures should, within the backtester, slightly increase the price on a buy and slightly decrease the price on a sell. We can do this similarly to how we apply slippage to stocks. If $MI$ is our market impact, $P_0$ is the original price of the future, and $P$ is the price with slippage applied, then

$ P = P_0 \left(1 + MI\right) $

$MI$ will be positive for buys and negative for sells.

To model slippage here, we need to model $MI$. One way to model $MI$ is with

$ MI = \eta \sigma \sqrt{\psi}$

where $\sigma$ is volatility (in this case, 20-day annualized daily volatility), $\psi = \frac{our\ volume\ traded\ in\ bar}{20-day\ ADV}$, and $\eta$ is a constant we fit for every continuous future.
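Concretely, these two formulas amount to just a few lines. Here's a minimal sketch (the function name and signature are illustrative, not part of Quantopian's API):

```python
import math

def fill_price(price, eta, sigma, psi, side):
    """Apply the slippage model MI = eta * sigma * sqrt(psi),
    positive for buys and negative for sells."""
    mi = eta * sigma * math.sqrt(psi)
    if side == 'sell':
        mi = -mi
    return price * (1 + mi)

# Buying at $2000 with eta = 0.047, 9% volatility, trading 100 of 1.4M ADV
print(fill_price(2000.0, 0.047, 0.09, 100 / 1400000.0, 'buy'))
```

A sell at the same parameters would fill slightly below \$2000, since $MI$ flips sign.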

To fit the constant $\eta$ for each future, we first compute it once per month for the asset. We do this by looking at the trailing 60 days of price and volume minute bars, and regressing on

$ \left|r\right| = \eta\sigma\sqrt{J} $

where $r$ is the percentage change of the asset's price in each bar (which we take the absolute value of) and $J$ is $\frac{market\ volume\ traded\ in\ bar}{20-day\ ADV}$ for every bar. This method is known as Kyle's Lambda.
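The per-window fit is an ordinary least-squares regression through the origin. A toy sketch on synthetic bars (the data here is made up purely to show the mechanics, using the same `np.linalg.lstsq` call as the notebook below):

```python
import numpy as np

rng = np.random.RandomState(0)

# Synthetic minute bars generated from a known "true" eta
true_eta, sigma = 0.05, 0.20
J = rng.uniform(1e-6, 1e-3, size=5000)         # bar volume / 20-day ADV
noise = rng.normal(0.0, 1e-6, size=5000)
abs_r = true_eta * sigma * np.sqrt(J) + noise  # |bar return|

# Regress |r| on sigma * sqrt(J) with no intercept, as in Kyle's Lambda
x = (sigma * np.sqrt(J))[:, np.newaxis]
eta_hat = np.linalg.lstsq(x, abs_r, rcond=None)[0][0]  # rcond=None for modern NumPy
print(eta_hat)  # recovers a value close to 0.05
```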

Once we have a value of $\eta$ for each month in a certain time frame (5 years maximum for each future, in this case), we average these values together to get the final value of $\eta$ we will use for each continuous future. This notebook is meant to calculate these $\eta$ values. It can also be run without the volatility ($\sigma$) term.

Examples

  • Say we're trying to buy 100 front contracts of ES, the E-Mini S&P 500, in one minute. Suppose ES is trading at \$2000 with an ADV of 1.4 million contracts/day and a volatility of 9%, and that eta for ES is 0.047. Our market impact is then $MI = 0.047 \cdot 0.09 \cdot \sqrt{\frac{100}{1,400,000}} = 0.00003575 = 0.3575 \ \text{bps}$ (a [basis point](https://en.wikipedia.org/wiki/Basis_point), or bp, is one-hundredth of one percent). Therefore, the price we would buy ES at here, according to our slippage model, is $ \$2000 \cdot \left(1 + 0.00003575\right) = \$2000.0715 $.

  • In this example, we'll pretend to buy 50 contracts of CL, the crude oil future, in one minute. Suppose CL is trading at \$60 with an ADV of 400,000 contracts and a volatility of 23%, and that eta is 0.049. In this case, our market impact is $ MI = 0.049 \cdot 0.23 \cdot \sqrt{\frac{50}{400,000}} = 0.000126 = 1.26 \ \text{bps}$. Our buy price is then $ \$60 \cdot \left(1 + 0.000126\right) = \$60.0076 $.
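As a sanity check, both examples can be reproduced in a few lines (the prices, ADVs, volatilities, and etas are the figures quoted above):

```python
from math import sqrt

# ES: eta = 0.047, vol = 9%, buying 100 of 1.4M ADV, price $2000
mi_es = 0.047 * 0.09 * sqrt(100 / 1400000.0)
print(round(2000 * (1 + mi_es), 4))  # ~2000.0715

# CL: eta = 0.049, vol = 23%, buying 50 of 400k ADV, price $60
mi_cl = 0.049 * 0.23 * sqrt(50 / 400000.0)
print(round(60 * (1 + mi_cl), 4))    # ~60.0076
```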

In [1]:
from __future__ import division
import numpy as np
import pandas as pd
import empyrical
from quantopian.research.experimental import continuous_future, history

# This can take a while to run, so we'll only do a few here
root_symbols = ['HG', 'HO', 'GC'] # 'HU', 'QM', 'PA', 'XG', 'PL', 'CL',
                #'QG', 'YS', 'SV', 'NG', 'XB', 'EU', 'NZ', 'JY', 'EC',
                #'ME', 'AD', 'SF', 'EL', 'EE', 'BP', 'JE', 'CD', 'LB',
                #'WC', 'ET', 'SM', 'FC', 'CM', 'CN', 'MW', 'PB', 'LC',
                #'SB', 'MS', 'OA', 'SY', 'LH', 'RR', 'BO', 'UB', 'FS',
                #'TB', 'TY', 'FV', 'TS', 'TU', 'FF', 'MB', 'FI', 'TN',
                #'US', 'ED', 'MI', 'NK', 'MD', 'RM', 'DJ', 'MG', 'NQ',
                #'ER', 'BD', 'EI', 'ES', 'VX', 'SP', 'ND', 'YM', 'AI']

# Only fit for some data. Before '05, there's not a lot of electronic trades.
max_days = pd.Timedelta(days=365 * 5.0)

# Frequency. 'D' for calculating eta once/day, 'M' for once/month.
eta_freq = 'M'

# If False, fit abs(bar_returns) = eta*sqrt(pct_of_ADV)
# If True, fit abs(bar_returns) = eta*volatility*sqrt(pct_of_ADV)
include_volatility = True
In [2]:
# Computes eta for a given root symbol, as described above
# This can take a while to run
def compute_eta(symbol):
    cf = continuous_future(symbol, offset=0, roll='volume', adjustment='mul')
    cf_end_date = cf.end_date - pd.Timedelta(days=1)
    cf_start_date = max(cf_end_date - max_days,
                        cf.start_date,
                        pd.Timestamp('2002-02-01', tz='UTC'))

    if cf_start_date > cf_end_date:
        return None

    market_data = history(
        cf,
        fields=['price', 'volume'], # We want the forward-filled price
        frequency='minute',
        start_date=cf_start_date + pd.Timedelta(hours=12),
        end_date=cf_end_date + pd.Timedelta(hours=12)
    )

    market_data.index = market_data.index.tz_convert('EST')

    # Get daily volume & close
    daily_volume = market_data.volume.resample('1D').sum().dropna()
    daily_close = market_data.price.resample('1D').last().dropna()
    daily_returns = daily_close.pct_change().dropna()

    # Calculate 20 day volatility and ADV. Only include previous days.
    vol_20d = daily_returns.rolling(21).agg(lambda x: empyrical.annual_volatility(x[:-1]))
    adv_20d = daily_volume.rolling(21).agg(lambda x: np.mean(x[:-1]))

    # Calculate regression variables
    market_data['returns'] = market_data.price.pct_change()
    market_data['abs_returns'] = market_data.returns.abs()
    market_data['poadv'] = market_data.apply(lambda x: x.volume / daily_volume[x.name.normalize()]
                                             if x.name.normalize() in daily_volume else np.nan, axis=1)
    market_data['sqrt_poadv'] = market_data.poadv.map(np.sqrt)
    market_data['vol_20d'] = market_data.apply(lambda x: vol_20d[x.name.normalize()]
                                               if x.name.normalize() in vol_20d else np.nan, axis=1)
    market_data['vol_x_sqrt_poadv'] = market_data.vol_20d * market_data.sqrt_poadv

    # Delete rows where the previous row wasn't 1 minute earlier
    # We only want to capture minute-to-minute price changes
    index_series = market_data.index.to_series()
    rows_to_drop = index_series[index_series.diff() != pd.Timedelta(minutes=1)].index
    market_data = market_data.drop(rows_to_drop)

    # If volume is 0, the price shouldn't change
    market_data = market_data[market_data.volume > 0]

    if include_volatility:
        market_data = market_data.dropna(subset=['sqrt_poadv', 'abs_returns', 'vol_20d'])
    else:
        market_data = market_data.dropna(subset=['sqrt_poadv', 'abs_returns'])

    sample_size = len(market_data)
    print symbol, 'samples:', sample_size
    if sample_size < 2: # Confirm the sample size is large enough
        return None

    etas = []
    days = pd.date_range(cf_start_date + pd.Timedelta(days=60), cf_end_date, freq=eta_freq)
    x_col = 'vol_x_sqrt_poadv' if include_volatility else 'sqrt_poadv'
    for day in days:
        sample_market_data = market_data[day - pd.Timedelta(days=60):day]
        if len(sample_market_data) < 2:
            return None
        # Regress abs(bar returns) on the chosen regressor, through the origin
        x = sample_market_data[x_col].values[:, np.newaxis]
        y = sample_market_data.abs_returns.values
        slope, _, _, _ = np.linalg.lstsq(x, y)
        etas.append(slope[0])

    return etas, sample_size
In [3]:
all_etas = {}
all_sample_sizes = {}

# Iterate through each future
for symbol in root_symbols:
    try:
        res = compute_eta(symbol)
        if res is not None:
            all_etas[symbol] = res[0]
            all_sample_sizes[symbol] = res[1]
    except Exception as e:
        print symbol, ':', e

# Mean eta for each symbol
mean_eta = {symbol: np.mean(etas) for symbol, etas in all_etas.iteritems()}
HG samples: 1546475
HO samples: 1086430
GC samples: 1724718
In [4]:
# Only take etas with a large enough sample size
# Also compute a sample-size weighted mean eta, as a fallback for thinly-traded names

min_sample_size = 10000

clean_etas = {}
weighted_sum = 0
total_weight = 0
for symbol, eta in mean_eta.iteritems():
    ss = all_sample_sizes[symbol]
    if eta < 1: # The result must be reasonable
        weighted_sum += eta * ss
        total_weight += ss
        if ss > min_sample_size:
            clean_etas[symbol] = eta

grand_mean_eta = weighted_sum / total_weight

print 'Symbol etas:', clean_etas
print 'Default eta: ', grand_mean_eta
Symbol etas: {'GC': 0.04891437779901682, 'HG': 0.052224109334103576, 'HO': 0.045047038237878125}
Default eta:  0.0491247730894
In [ ]: