Notebook

Log Returns

By Delaney Mackenzie

Log returns are commonly used in quant research because real financial data tends to have fat tails. Fat tails are common in self-reinforcing/autocorrelated systems like financial markets, and the consequence is that normal distributions will heavily underestimate the likelihood of rare events. See our full lecture on this here:

https://www.quantopian.com/lectures/autocorrelation-and-ar-models

Some stock prices are closer to a normal distribution when log transformed, so in the course of doing research it can be helpful to log transform your data before fitting models. Remember that at the end of the day prices are still prices, so don't assume that just because log-transformed returns are well behaved you're not vulnerable to tail events.

Also, many stock prices are not log-normally distributed.
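Before turning to the Quantopian research API, here is a minimal sketch of the transform itself, using a hypothetical synthetic price series and plain pandas/numpy (not the notebook's data):

```python
import numpy as np
import pandas as pd

# Hypothetical daily closing prices (synthetic data, not real quotes).
prices = pd.Series([100.0, 102.0, 101.0, 103.5, 104.0])

# Simple (percent) returns: P_t / P_{t-1} - 1
simple_R = prices.pct_change().dropna()

# Log returns: log(P_t / P_{t-1}) = log(1 + simple return)
log_R = np.log(prices / prices.shift(1)).dropna()

# For small moves the two are close: log(1 + R) ~ R to first order.
print(simple_R.iloc[0])  # ~0.02
print(log_R.iloc[0])     # log(1.02), slightly smaller
```

The equivalence `log_R = np.log1p(simple_R)` is exact, which is the identity the rest of this notebook relies on.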

Warning

In this example, you'll see that the asset's returns are not truly log-normal; a small sample size simply caused them to pass the normality test.

This notebook is just a quick piece showing how to log transform prices.

In [1]:
import numpy as np
import pandas as pd

# This is a plotting library for pretty pictures.
import matplotlib.pyplot as plt
In [2]:
# Research environment functions
from quantopian.research import returns, log_returns, symbols

# Select a time range to inspect
period_start = '2012-01-01'
period_end = '2012-06-01'

# Query returns data for XLE
# over the selected time range
R = returns(
    assets=symbols('XLE'),
    start=period_start,
    end=period_end,
)

log_R = log_returns(
    assets=symbols('XLE'),
    start=period_start,
    end=period_end,
)

# Display first 10 rows
R.head(10)
Out[2]:
2012-01-03 00:00:00+00:00    0.027352
2012-01-04 00:00:00+00:00    0.002955
2012-01-05 00:00:00+00:00   -0.006034
2012-01-06 00:00:00+00:00   -0.004666
2012-01-09 00:00:00+00:00    0.004403
2012-01-10 00:00:00+00:00    0.009051
2012-01-11 00:00:00+00:00   -0.014017
2012-01-12 00:00:00+00:00   -0.008370
2012-01-13 00:00:00+00:00   -0.004731
2012-01-17 00:00:00+00:00    0.006617
Freq: C, Name: Equity(19655 [XLE]), dtype: float64

Let's look at the data distribution.

In [3]:
plt.hist(R, bins=20)
plt.xlabel('Return')
plt.ylabel('Observations');

Let's also run a statistical normality check.

In [4]:
from scipy.stats import normaltest
In [5]:
significance_level = 0.05

result = normaltest(R)
if result.pvalue < significance_level:
    print('Data likely not normally distributed.')
else:
    print('Data likely normally distributed.')
Data likely normally distributed.

A log transform returns NaN on negative inputs, and percent returns are frequently negative. So before taking logs we have to convert percent returns R to gross returns 1 + R (called rational returns here), which are positive as long as the asset doesn't lose all its value.

In [6]:
np.log(R).tail()
Out[6]:
2012-05-25 00:00:00+00:00         NaN
2012-05-29 00:00:00+00:00   -4.157503
2012-05-30 00:00:00+00:00         NaN
2012-05-31 00:00:00+00:00         NaN
2012-06-01 00:00:00+00:00         NaN
Freq: C, Name: Equity(19655 [XLE]), dtype: float64
In [7]:
rational_R = R + 1
In [8]:
np.log(rational_R).tail()
Out[8]:
2012-05-25 00:00:00+00:00   -0.003063
2012-05-29 00:00:00+00:00    0.015525
2012-05-30 00:00:00+00:00   -0.031296
2012-05-31 00:00:00+00:00   -0.008922
2012-06-01 00:00:00+00:00   -0.024190
Freq: C, Name: Equity(19655 [XLE]), dtype: float64
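One practical reason log returns are so convenient: they add across time, whereas gross returns compound multiplicatively. A quick numerical check with hypothetical return values (plain numpy, not the series above):

```python
import numpy as np

# Hypothetical daily percent returns (synthetic values).
R = np.array([0.01, -0.02, 0.015, 0.005])

# Gross returns 1 + R compound multiplicatively over the period...
total_gross = np.prod(1 + R)

# ...while the corresponding log returns simply sum.
log_R = np.log(1 + R)
print(np.isclose(log_R.sum(), np.log(total_gross)))  # True
```

This additivity is also why sums of log returns are a natural candidate for central-limit-type normality arguments in the first place.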

However, the built-in method already gave us log returns, so we'll just use those.

In [9]:
log_R.tail()
Out[9]:
2012-05-25 00:00:00+00:00   -0.003063
2012-05-29 00:00:00+00:00    0.015525
2012-05-30 00:00:00+00:00   -0.031296
2012-05-31 00:00:00+00:00   -0.008922
2012-06-01 00:00:00+00:00   -0.024190
Freq: C, Name: Equity(19655 [XLE]), dtype: float64
In [10]:
plt.hist(log_R, bins=20)
plt.xlabel('Return')
plt.ylabel('Observations');
In [11]:
significance_level = 0.05

result = normaltest(log_R)
if result.pvalue < significance_level:
    print('Data likely not normally distributed.')
else:
    print('Data likely normally distributed.')
Data likely normally distributed.

WARNING AGAIN

Often, though, returns will still not be normally distributed even after a log transform. Don't apply this blindly without checking.

Here we see that just by expanding the window and gathering more data, the test gains power and distinguishes the observed returns distribution from a normal one. The true process was likely never normal; we simply had too few samples to detect this.
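This sample-size effect can be reproduced directly. The sketch below (an illustration with synthetic data, not the notebook's returns) draws from a fat-tailed Student-t distribution, which is definitely not normal, and runs the same `normaltest` on a small and a large sample:

```python
from scipy.stats import normaltest, t

# Draws from a fat-tailed Student-t distribution (df=4), which is NOT normal.
# Fixed seeds make the example reproducible.
small = t.rvs(df=4, size=100, random_state=0)
large = t.rvs(df=4, size=100000, random_state=1)

# With few samples the test often lacks the power to reject normality;
# with many samples the heavy tails become statistically unmistakable.
print('small sample p-value:', normaltest(small).pvalue)
print('large sample p-value:', normaltest(large).pvalue)
```

With 100,000 samples the p-value is vanishingly small; with 100 samples it can easily land above 0.05 even though the underlying process is fat-tailed.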

In [12]:
# Select a time range to inspect
period_start = '2012-01-01'
period_end = '2016-01-01'

# Query returns data for XLE
# over the selected time range
R = returns(
    assets=symbols('XLE'),
    start=period_start,
    end=period_end,
)

rational_R = R + 1

log_R = np.log(rational_R)
log_R.tail()
Out[12]:
2015-12-24 00:00:00+00:00   -0.009053
2015-12-28 00:00:00+00:00   -0.018686
2015-12-29 00:00:00+00:00    0.007089
2015-12-30 00:00:00+00:00   -0.013729
2015-12-31 00:00:00+00:00    0.004818
Freq: C, Name: Equity(19655 [XLE]), dtype: float64
In [13]:
plt.hist(log_R, bins=20)
plt.xlabel('Return')
plt.ylabel('Observations');
In [14]:
significance_level = 0.05

result = normaltest(log_R)
if result.pvalue < significance_level:
    print('Data likely not normally distributed.')
else:
    print('Data likely normally distributed.')
Data likely not normally distributed.

Let's use Pipeline to get returns for a large universe of stocks.

In [15]:
# Pipeline imports
from quantopian.research import run_pipeline
from quantopian.pipeline import Pipeline
from quantopian.pipeline.factors import Returns

# Pipeline definition
def make_pipeline():

    returns = Returns(window_length=2)

    return Pipeline(
        columns={
            'daily_returns': returns,
        },
    )

# Pipeline execution
data_output = run_pipeline(
    make_pipeline(),
    start_date='2012-1-1',
    end_date='2013-1-1'
)
In [16]:
data_output = data_output.unstack()
data_output = np.log(data_output+1)
data_output.head()
Out[16]:
daily_returns
Equity(2 [ARNC]) Equity(21 [AAME]) Equity(24 [AAPL]) Equity(25 [ARNC_PR]) Equity(31 [ABAX]) Equity(39 [DDC]) Equity(41 [ARCB]) Equity(51 [ABL]) Equity(52 [ABM]) Equity(53 [ABMD]) ... Equity(43746 [GHY]) Equity(43747 [COR_PRACL]) Equity(43802 [NSP_WD]) Equity(43803 [SSW_PRD]) Equity(43804 [TCF_PRCCL]) Equity(43805 [RPAI_PRACL]) Equity(43806 [RIOM]) Equity(43835 [RDHL]) Equity(43836 [STI_PRECL]) Equity(43853 [XNET])
2012-01-03 00:00:00+00:00 0.002315 -0.009598 -0.000272 NaN -0.000722 0.021854 -0.001038 NaN -0.008207 0.004885 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2012-01-04 00:00:00+00:00 0.064900 0.000000 0.014974 0.015100 0.018960 0.064597 0.005695 0.081964 0.018620 -0.013628 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2012-01-05 00:00:00+00:00 0.024613 -0.007132 0.005676 -0.004725 -0.004973 -0.010629 0.017908 0.052348 -0.011566 -0.006608 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2012-01-06 00:00:00+00:00 -0.009559 0.036641 0.010873 0.010990 -0.002853 0.019401 0.008081 NaN -0.021064 0.009348 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2012-01-09 00:00:00+00:00 -0.022667 -0.024446 0.010685 NaN -0.002861 -0.025654 0.021891 NaN 0.007398 -0.006590 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 8647 columns

In [17]:
data_output.shape
Out[17]:
(251, 8647)
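Outside the Quantopian environment, the same wide returns matrix can be sketched with plain pandas on a hypothetical price frame: `pct_change` plays the role of the `Returns(window_length=2)` factor, one column per asset (synthetic data, assumed column names):

```python
import numpy as np
import pandas as pd

# Hypothetical wide price frame: one column per asset (synthetic data).
prices = pd.DataFrame({
    'A': [10.0, 10.1, 10.05, 10.2],
    'B': [20.0, 19.8, 20.1, 20.0],
})

# Daily percent returns, analogous to Returns(window_length=2).
daily_returns = prices.pct_change()

# Log transform as in the cell above: log(1 + R).
log_returns = np.log(daily_returns + 1)
print(log_returns.shape)  # (4, 2)
```

The first row is NaN (there is no prior price to compute a return from), mirroring the NaN entries for assets that lack data inside the pipeline window.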

Warning

This will incur a lot of multiple comparisons bias.

https://www.quantopian.com/lectures/p-hacking-and-multiple-comparisons-bias
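To see why running thousands of tests at a 5% significance level is dangerous, consider this sketch (synthetic data, fixed seed assumed): even when every series is normal by construction, roughly 5% of them get flagged anyway.

```python
import numpy as np
from scipy.stats import normaltest

rng = np.random.default_rng(42)
significance_level = 0.05
num_series = 1000

# 1000 series that ARE normal by construction,
# each roughly one year of daily observations.
false_rejections = 0
for _ in range(num_series):
    sample = rng.normal(size=250)
    if normaltest(sample).pvalue < significance_level:
        false_rejections += 1

# By chance alone, about 5% of truly normal series fail the test.
print(false_rejections / num_series)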

In [18]:
num_assets = data_output.shape[1]

num_normal = 0

for i in range(num_assets):
    # Get the return series for this asset. Columns that are all NaN
    # (assets with no data in the window) yield a NaN p-value, so the
    # comparison below counts them as not normal.
    log_R = data_output.iloc[:, i]
    result = normaltest(log_R)
    if result.pvalue >= significance_level:
        num_normal += 1
In [19]:
print('The percent of stocks which are likely normally distributed: %s%%' % (float(num_normal) / num_assets * 100))
The percent of stocks which are likely normally distributed: 12.2470220886%

This presentation is for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation for any security; nor does it constitute an offer to provide investment advisory or other services by Quantopian, Inc. ("Quantopian"). Nothing contained herein constitutes investment advice or offers any opinion with respect to the suitability of any security, and any views expressed herein should not be taken as advice to buy, sell, or hold any security or as an endorsement of any security or company. In preparing the information contained herein, Quantopian, Inc. has not taken into account the investment needs, objectives, and financial circumstances of any particular investor. Any views expressed and data illustrated herein were prepared based upon information, believed to be reliable, available to Quantopian, Inc. at the time of publication. Quantopian makes no guarantees as to their accuracy or completeness. All information is subject to change and may quickly become unreliable for various reasons, including changes in market conditions or economic circumstances.