The data obtained from the CBOE is messy and incomplete. It takes a great deal of time for even the experienced nerd to sift through it, understand it, correct it and put it into a usable format.
One problem of many is that the data contains not just the symbol you ordered but odd days here and there from other symbols. Some strikes are duplicated. Some expirations start and then go nowhere, so you need to exclude them. And so on.
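A minimal sketch of that kind of clean-up in pandas follows. It is an illustration, not the code from the Gist. The column names (`underlying_symbol`, `quote_date`, `expiration`, `strike`, `option_type`), the `^VIX` symbol string and the `min_days` threshold are all assumptions, so adjust them to whatever your own file uses.

```python
import pandas as pd

# Hypothetical column names; rename to match your own CBOE file.
CONTRACT_KEY = ["quote_date", "expiration", "strike", "option_type"]

def clean_chain(df, symbol="^VIX", min_days=5):
    """Drop stray symbols, duplicated contracts and short-lived expirations."""
    # Keep only rows for the symbol that was actually ordered.
    out = df[df["underlying_symbol"] == symbol]
    # A contract should appear once per quote date; keep the first copy.
    out = out.drop_duplicates(subset=CONTRACT_KEY, keep="first")
    # Count the distinct quote dates on which each expiration appears.
    days_quoted = out.groupby("expiration")["quote_date"].transform("nunique")
    # Exclude expirations that start and then go nowhere.
    return out[days_quoted >= min_days].reset_index(drop=True)
```

The threshold of five quote dates is arbitrary. Inspect the counts per expiration first before settling on a cutoff.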
The code below assumes you already have a CSV file of data from the CBOE and want to update it with a couple of years of further data, namely 2017 and 2018.
Oddly, the CBOE data does not give you futures prices. I say oddly because the VIX options are in fact options on futures, not on the cash VIX index. “Moneyness” should therefore arguably be measured off the futures, not the cash VIX. Hence I went through a long rigmarole of imputing futures prices from put-call parity at each strike and adding them to the data series, only to discover that it was not that useful and you may as well measure “moneyness” off the cash price.
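For the curious, the parity idea can be sketched as follows. For European options on a future, C - P = exp(-rT) * (F - K), so F = K + exp(rT) * (C - P). The function name, its arguments and the choice to use the strike where call and put mid prices are closest are assumptions made for illustration, not necessarily what the Gist does.

```python
import math

def implied_future(strikes, call_mids, put_mids, rate=0.0, years_to_expiry=0.0):
    """Impute a futures price for one expiration from put-call parity.

    The three lists must be aligned: call_mids[i] and put_mids[i] are the
    mid prices of the call and put struck at strikes[i].
    """
    # Use the strike where call and put prices are closest (roughly at the
    # money), where quotes tend to be tightest.
    i = min(range(len(strikes)), key=lambda j: abs(call_mids[j] - put_mids[j]))
    # Parity for options on futures: F = K + exp(r * T) * (C - P).
    growth = math.exp(rate * years_to_expiry)
    return strikes[i] + growth * (call_mids[i] - put_mids[i])
```

With a zero rate, strikes of 14, 15 and 16, call mids of 2.1, 1.4 and 0.8, and put mids of 0.6, 0.9 and 1.5, the 15 strike is chosen and the imputed future is 15 + (1.4 - 0.9) = 15.5.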
I then separated the data into puts and calls files for later processing.
The later processing consists of taking the put and call files, choosing strikes relative to the level of cash VIX, and stringing the chosen strikes together into monthly series for later backtesting.
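The strike-selection step might look roughly like this. It is a sketch under assumptions: the column names (`quote_date`, `expiration`, `strike`), the `vix_close` series indexed by quote date and the `moneyness` multiplier are all hypothetical, and stitching the picks into monthly series is left out.

```python
import pandas as pd

def pick_strikes(options, vix_close, moneyness=1.0):
    """For each quote date and expiration, keep the strike nearest to
    cash VIX times `moneyness` (1.0 = at the money, 1.2 = 20% above)."""
    df = options.copy()
    # Look up the cash VIX close for each row's quote date.
    df["target"] = df["quote_date"].map(vix_close) * moneyness
    # Rows with no matching VIX close cannot be ranked, so drop them.
    df = df.dropna(subset=["target"])
    df["dist"] = (df["strike"] - df["target"]).abs()
    # Index label of the closest strike within each (date, expiration) group.
    idx = df.groupby(["quote_date", "expiration"])["dist"].idxmin()
    return df.loc[idx].drop(columns=["dist"]).reset_index(drop=True)
```

Run it once on the calls file and once on the puts file. Each result has one row per quote date and expiration, which is the raw material for a rolling monthly series.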
Obsessive nerds in the financial markets and those familiar with Python may find some interest in this post. Others are advised to walk on by, swiftly, lest they get corrupted.
Read the code in the display below or go directly to Gist.