Backtesting With Zipline II
Introduction
In this post, we play around a little more with Python and the pandas library. You may want to read the first part of this series first. There we backtested a simple moving-average crossover strategy in pandas. We had a long (slow) moving average over the last 40 days and a short (fast) moving average over the last 20 days. When the stock price rockets skywards, the short moving average rises above the long moving average, and we buy. When the short moving average crosses the long moving average again from above, we sell. If you remember, or if you have read the article again, we caught most of the upturns but then sold too late.
Here you have a nice image of the averages on the price of Amazon stock.
We reinvested everything and plotted the returns, which looked like the following.
So, how can we sell sooner and take all the money, so that we will be rich and bathing in gold like the mighty Dagobert Duck? Not so fast. First we will rewrite the strategy in zipline, an event-based framework for backtesting trading strategies. Last time we used pandas, which kind of sucked, since we could run into issues with look-ahead bias. Zipline's design rules this out. It works approximately like this: each price change is fed to the trading algorithm as an event, and the algorithm decides what it wants to do. This is slower, since the events have to be processed one after another, but in exchange we only ever trade on past prices and never look into the future. Looking into the future leads to bugs that are hard to find and that you may not even notice.
With zipline it will also be easier to tweak the algorithm later and, for example, set stop-losses to sell earlier.
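To make the look-ahead problem concrete, here is a minimal sketch in plain Python (no pandas, no zipline). The prices are made up, and the "signal" is deliberately silly: it is derived from the very return we then trade on. The point is only that forgetting a one-day lag makes a strategy look far better than it is.

```python
# Toy daily closing prices (made-up numbers, not real AMZN data).
prices = [100.0, 102.0, 101.0, 105.0, 104.0, 108.0]

# returns[i] is the return earned from day i to day i + 1.
returns = [prices[i + 1] / prices[i] - 1.0 for i in range(len(prices) - 1)]

# A signal that is only known at the END of day i + 1:
# 1 if the price rose from day i to day i + 1, else 0.
signal = [1 if r > 0 else 0 for r in returns]


def total_return(positions, returns):
    """Compound the returns of the days on which we hold a position."""
    value = 1.0
    for pos, r in zip(positions, returns):
        value *= 1.0 + pos * r
    return value - 1.0


# Biased: each return is traded with a signal computed from that same return.
biased = total_return(signal, returns)

# Honest: a signal can only be applied to the NEXT day's return.
lagged_signal = [0] + signal[:-1]
honest = total_return(lagged_signal, returns)

print(f"with look-ahead: {biased:+.2%}, without: {honest:+.2%}")
```

In a vectorized backtest, the difference between the two variants is a single forgotten shift, which is exactly why such bugs are easy to miss.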
The Code
The code is heavily commented, since I do not have much experience
with Zipline, so it may also serve as a reference for future
programming adventures.
A zipline algorithm consists mainly of two functions: initialize and handle_data. initialize is called only once at the start of the trading algorithm. Here we need to set up everything. The function handle_data is executed at each bar. This is the event-based approach I mentioned earlier. Because of it, it is impossible to accidentally look into the future and introduce hard-to-find errors.
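The two-function structure can be sketched without zipline at all. The following is a toy re-creation in plain Python, not zipline's API: the driver function run, the history list passed to handle_data, and the price path are all assumptions made for illustration. Only the names initialize and handle_data and the 20/40-day windows come from the text above.

```python
from types import SimpleNamespace

SHORT_WINDOW = 20  # fast moving average
LONG_WINDOW = 40   # slow moving average


def initialize(context):
    """Called once before the first bar: set up all state here."""
    context.invested = False
    context.trades = []  # (bar index, action) pairs, kept for inspection


def handle_data(context, history):
    """Called once per bar with the prices seen SO FAR, never future ones."""
    if len(history) < LONG_WINDOW:
        return  # not enough bars yet to compute the slow average
    short_mavg = sum(history[-SHORT_WINDOW:]) / SHORT_WINDOW
    long_mavg = sum(history[-LONG_WINDOW:]) / LONG_WINDOW
    bar = len(history) - 1
    if short_mavg > long_mavg and not context.invested:
        context.invested = True
        context.trades.append((bar, "buy"))
    elif short_mavg < long_mavg and context.invested:
        context.invested = False
        context.trades.append((bar, "sell"))


def run(prices):
    """Hypothetical driver: feed bars to the algorithm one at a time."""
    context = SimpleNamespace()
    initialize(context)
    for i in range(len(prices)):
        handle_data(context, prices[: i + 1])
    return context


# Made-up price path: 60 bars rising, then 60 bars falling.
prices = [100.0 + i for i in range(60)] + [160.0 - i for i in range(60)]
result = run(prices)
print(result.trades)
```

Because handle_data only ever receives the slice of prices up to the current bar, there is no way for it to peek ahead. On this made-up path the peak is at bar 60, but the sell signal comes well after it, which is the same "sold too late" behaviour we saw in the first post.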
If you run this code before any data has been ingested, it will fail because there are no data sources. For me it said:
ValueError: no data for bundle 'quantopian-quandl' on or before 2016-08-13 10:50:03.317389+00:00
maybe you need to run: $ zipline ingest quantopian-quandl
Though this suggestion did not work:
(venv) me@localhost:~$ zipline ingest quantopian-quandl
Usage: zipline ingest [OPTIONS]
Error: Got unexpected extra argument (quantopian-quandl)
Instead I had to run zipline ingest -b quantopian-quandl, to say that it is a bundle.
Results
Here is the resulting value of our portfolio:
and here the computations for the short and long moving average for debugging purposes:
You can see that the stock values look like those in our simulation with pandas, so we are not completely off track.
You may notice that our portfolio, on the other hand, does not look at all like our previous result. This is due to multiple factors. First, we have a much more realistic backtester. In zipline I check whether AMZN is tradable on that day, which I did not do in pandas. Second, zipline also handles slippage: my orders are not filled immediately at the current price, but instead have an effect on the order book. Zipline simulates this and thus destroys our toy example even more. Third, in the pandas simulation we got the trading data from Yahoo Finance, whereas now we get it from Quandl.
And of course there may be simple programming mistakes. In particular, I wonder why we do not have the plateaus anymore like in the previous post.
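To give a feeling for what slippage does to a fill, here is a toy model in plain Python. It is loosely in the spirit of a volume-share slippage model, where only a fraction of a bar's volume can be filled and the fill price moves against the order with the square of that fraction. The helper fill_order and its parameter values are illustrative assumptions, not zipline's API or its exact defaults.

```python
def fill_order(shares_wanted, bar_price, bar_volume,
               volume_limit=0.025, price_impact=0.1):
    """Toy fill model (hypothetical helper, not part of zipline).

    At most `volume_limit` of the bar's volume can be filled, and the
    fill price moves against us with the square of our volume share.
    Positive `shares_wanted` is a buy, negative a sell.
    """
    max_shares = int(bar_volume * volume_limit)
    filled = min(abs(shares_wanted), max_shares)
    if filled == 0:
        return 0, bar_price
    direction = 1 if shares_wanted > 0 else -1
    volume_share = filled / bar_volume
    fill_price = bar_price * (1.0 + direction * price_impact * volume_share ** 2)
    return direction * filled, fill_price


# Made-up bar: we want 10,000 shares, but only 200,000 traded that day.
filled, price = fill_order(shares_wanted=10_000, bar_price=750.0,
                           bar_volume=200_000)
print(filled, price)
```

In this sketch only half of the order is filled in the bar, and at a slightly worse price than quoted. A naive pandas backtest that assumes a full, instant fill at the closing price will therefore look better than an event-based simulation with a fill model like this.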
Addendum
I just realized that in the previous post we did not sell when the short moving average was below the long moving average. Instead we only liquidated the position. That explains the missing plateaus in the PnL graph.