Data is the lifeblood of markets.
At the most basic level, a market is nothing more than the most up-to-date information about the highest price someone is willing to pay and the lowest price someone is willing to sell at. With only three cells in a spreadsheet you have a ticker, a bid price, and an offer price.
Things escalate quickly from here. Size is a logical next step. How many shares of TWTR are available to buy at $54.20? (All of them!) With fragmented equity markets, we also have to ask ‘where’ you can buy them - NYSE? BATS? Barclays LX? Now your data spec is twice the size it originally was, and that’s only for the top of book.
Last trade is often added so investors know what and when the last print was. Two dollars and two days out means very little; within a penny and a minute gives a relatively strong indication of value. For an options quote you might want to know what the underlying was at the time, or some derived component like the implied volatility. The list goes on…
Market data is not just necessary for first-order reasons (you can't trade if you don't know the price); as a saleable value proposition, it also underpins the exchange businesses. My simple mental model for publicly listed exchanges is that they build fancy computer systems and matching engines to pair trades, yet what they really monetize is the data about those trades.
Transactions are but a volume game with lots of footnotes. While the operations are technically complex in a highly regulated environment, ultimately pairing orders is commoditized and margins are competed away.
What trading really provides is eyeballs: participants to buy the exchanges' higher-value offerings. The value-add of data products is nebulous and saleable, so those services come with a much bigger price tag. They accounted for almost twice as much revenue at Nasdaq last quarter.
Beyond the exchanges, there are numerous market participants creating their own data sets and analyzing the markets in different and interesting ways. Those businesses themselves have become easy acquisition targets for exchanges (see NASDAQ's purchase of Quandl, or CBOE acquiring Trade Alert, Hanweck, and FT Options).
Institutions are the prime customers, gobbling up everything from implied vol percentiles to figures on retail website traffic in search of an incremental edge. Increasingly, retail has joined the party, and services offering gamma-based predictions have become popular as more traders flock to the options markets.
The cost of real-time data has continued to drop, even as the number of symbols and the trade count increase. Buying historical options quotes was measured in the tens of thousands of dollars a decade ago; now all of history can be had for a fraction of that. If you have the capacity to process and store them, one-minute updates of every option series can be yours for only $399 a month.
As the cost of base level data continues to drop, both the breadth and value of analysis expand. More players are able to access the data as costs drop, bringing unique insights to the table. (That’s a SQL joke.) Because we are awash in data, we need both a good method to synthesize it, and a good nose for when something’s rotten.
If I told you the straddle price (cost of an ATM call + ATM put) in the SPX was $75 today, that likely means very little. Even if you knew the expiration date and the current underlying price, the sum of two market prices communicates one specific thing, and requires unique knowledge to process.
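To see what that processing looks like, here's the standard back-of-the-envelope conversion traders apply: an ATM straddle price approximates the expected move to expiration. A minimal sketch (the underlying level here is hypothetical):

```python
# Back-of-the-envelope: an ATM straddle approximates the expected move.
spot = 4500.00      # hypothetical SPX level at the time of the quote
straddle = 75.00    # ATM call + ATM put

implied_move = straddle / spot
print(f"Market-implied move to expiration: {implied_move:.2%}")  # ~1.67%
```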
On the other hand, if I told you that the VIX closed yesterday at 23.93, I bet that has a lot more meaning. You know that 23 is relatively high, but not as high as the readings in the 30s last month when the market was 10% lower. Or that even 30 isn't as bad as the 80+ print we saw when a pandemic rattled the global economy. This statistic is way more useful for decision making.
The VIX is an index derived from options prices on the SPX. The calculation looks at raw market data, forged by billions of competing dollars, then weights, smooths, and interpolates these values to derive a single number that has independent context. Variations in the VIX are meaningful because the calculation standardizes options prices that naturally change as time passes.
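The variance estimate for a single expiration follows the formula in Cboe's published white paper. Here is a compressed sketch in Python (function and variable names are mine, and the real methodology layers quote-selection rules on top):

```python
import math

def expiry_variance(strikes, mids, forward, t, r):
    """Variance contribution of one expiration, per the published formula:
    sigma^2 = (2/T) * sum(dK_i / K_i^2 * e^(RT) * Q(K_i))
              - (1/T) * (F/K0 - 1)^2
    strikes: sorted strikes with two-sided quotes
    mids: OTM option midpoint prices Q(K_i) at each strike
    """
    k0 = max(k for k in strikes if k <= forward)  # first strike at/below forward
    total = 0.0
    for i, k in enumerate(strikes):
        # dK is the average distance to neighboring strikes (one-sided at edges)
        if i == 0:
            dk = strikes[1] - strikes[0]
        elif i == len(strikes) - 1:
            dk = strikes[-1] - strikes[-2]
        else:
            dk = (strikes[i + 1] - strikes[i - 1]) / 2.0
        total += dk / k**2 * math.exp(r * t) * mids[i]
    return (2.0 / t) * total - (1.0 / t) * (forward / k0 - 1.0) ** 2
```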
Because the VIX is always measuring a constant thirty-day expiration, there's no need to ask if that $75 SPX straddle expires tomorrow or next week. VIX levels today are comparable to ten years ago, but straddle prices are not. And further, by looking at a full strip of options, the index calculation gains dimension.
In addition to abstracting context, properly derived data can provide nuanced or layered meaning. A simple snapshot of the ATM option price - even if weighted at a constant term - won't tell you anything about the implied kurtosis of the option price distribution. In other words, the ATM option tells you how much the market is expected to move, while the downside skew tells you how "fat" the left tail will be when things break. The VIX takes both of these into account.
One of the main techniques used in deriving useful statistics about options markets is interpolation. While there are literally millions of options prices available across strikes and expirations, sometimes they might not be *exactly* what you’re looking for, and for accurate comparisons across time there needs to be some normalization.
Continuing with the VIX example, to calculate a constant 30-day expiration, the data first gets scrubbed in a couple of different ways. Only weekly expirations are used in this calculation, and only those between 23 and 37 days out. "No-bid" options are dropped, so that only quotes in series with two-sided markets are considered.
The prices of the different series are then weighted according to their distance from the ideal thirty-day value. The quotes for an expiration 32 days out will have greater weight than the expiration 25 days out. As time passes, both of those options will decay a little bit, and the weighting will shift slightly more in favor of the expiration that is now only 31 days out. This well-designed methodology is both robust and intuitive.
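In code, that weighting comes down to a linear blend in total-variance space. A minimal sketch building on the `expiry_variance()` function above (names are mine; the official calculation counts time in minutes rather than years):

```python
import math

def constant_30d_vix(var_near, t_near, var_next, t_next, target=30.0 / 365.0):
    """Blend two expirations' variances to a constant 30-day horizon.
    t_near < target < t_next, all in years.
    """
    w = (t_next - target) / (t_next - t_near)  # weight on the near expiration
    total_var = t_near * var_near * w + t_next * var_next * (1.0 - w)
    return 100.0 * math.sqrt(total_var / target)  # annualized, in VIX points
```

With expirations 25 and 32 days out, the near term gets weight 2/7 and the far term 5/7; one day later those weights shift to 1/7 and 6/7, exactly the drift described above.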
Interpolation also has practical use cases for market participants. When new expirations get listed, liquidity providers need to come up with a new implied volatility curve before any trading happens. One of the simplest and best ways to do this is to take a weighted average of the values on either side of the new unknown.
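A minimal sketch of that seeding step, interpolating in total variance so values stay comparable across the term structure (function and inputs are hypothetical):

```python
def seed_new_expiry_vol(t_new, t_lo, vol_lo, t_hi, vol_hi):
    """Seed an implied vol for a freshly listed expiration by weighting
    its two neighbors. Interpolating in total variance (vol^2 * t) rather
    than in raw vol behaves better across the term structure.
    """
    w = (t_hi - t_new) / (t_hi - t_lo)
    total_var = w * vol_lo**2 * t_lo + (1.0 - w) * vol_hi**2 * t_hi
    return (total_var / t_new) ** 0.5

# e.g. seed a 21-day expiry from a 14-day at 22 vol and a 28-day at 20 vol
print(seed_new_expiry_vol(21 / 365, 14 / 365, 0.22, 28 / 365, 0.20))  # ~0.207
```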
Where the theoretical curve hits the stock market open is the ultimate bullshit detector. Calculations that look pristine on paper face the acid test of cutthroat capitalism. No one points out your miscalculations better than a counterparty.
One of the dangers of interpolation is that the texture of the volatility distribution might not be uniform between those two known values. If there is a major event, depending on the timing, it could skew implied vol differently than a simple weighted average would suggest. As soon as that statistical estimate becomes a tradeable market, you quickly get a better idea.
It’s also important to be careful with exactly what kind of value is being interpolated. Implied volatility is good because it’s standardized across time dimensions. Options prices or their resultant greeks are not as clean, because they behave non-linearly.
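A quick demonstration of that non-linearity, using the common ATM approximation price ≈ 0.4 · S · σ · √T (all numbers hypothetical):

```python
import math

S, SIGMA = 100.0, 0.20   # hypothetical spot and a flat 20% vol

def atm_price(days):
    # Common ATM approximation: price ~ 0.4 * S * sigma * sqrt(T)
    return 0.4 * S * SIGMA * math.sqrt(days / 365.0)

naive = (atm_price(10) + atm_price(30)) / 2.0
print(round(naive, 3), round(atm_price(20), 3))   # 1.809 vs 1.873

# Averaging prices understates the 20-day value because price grows with
# sqrt(T). Total variance (sigma^2 * T) is linear in T, so averaging the
# 10- and 30-day total variances recovers the 20-day value exactly.
```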
As we squeeze numbers harder and harder, it's important to stay centered on true north. Data manipulation is full of pitfalls. Layers of analysis might generate deep insight, but there are many potential traps along the way. The purest test of something's accuracy and importance comes when it is exposed to market forces.
While the VIX is powerful, you can't buy a share or contract on the actual index. There is an arcane complex of products to provide exposure to the value of the VIX on a given day (futures), the probability of that value (options), or daily changes in VIX value (ETPs like SVIX or UVIX). It's actually pretty complicated to just "get long VIX", and that's the most tradeable (non-equity index) market indicator.
Unlike the VIX, most derived statistics don't have anything approaching liquid markets. If there isn't a market with capital on the line, you always have to be suspicious of the statistic's significance (distinct from its statistical significance). Well-designed products let market participants express their opinions on specific outcomes, and capture information that arithmetic can't.
The outgoing twin of interpolation is extrapolation. Perhaps a newly listed month is the furthest-out term, or a downside strike has no lower neighbor to compare against. Projecting into the unknown is difficult, and naïve extrapolation can be dangerous.
In vol space, a new strike or month opens up an entirely different can of worms. A well-parameterized model on currently listed strikes makes it easy to price the next lowest strike. But if the level happens to have significance that is exogenous to the current volatility structure, only a tradeable market can tell you that.
A good example would be if the new strike happens to be a knock-out level for a convertible or debt financing. If the underlying follows that path, the volatility might pick up significantly more than parameterized extrapolation would imply. Smart money will buy the strike, and the market will bend to incorporate that new information.
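Mechanically, naïve extrapolation is nothing more than continuing a slope into the unknown. A sketch (names and numbers are illustrative; real desks fit parameterized smiles) that, by construction, cannot know about that knock-out level:

```python
def extrapolate_downside_vol(k_new, k1, vol1, k2, vol2):
    """Extend the smile below the lowest listed strike by continuing the
    slope of the last two listed strikes (k_new < k1 < k2).
    Purely mechanical: it has no idea k_new might be a knock-out level.
    """
    slope = (vol2 - vol1) / (k2 - k1)
    return vol1 + slope * (k_new - k1)

# e.g. 80-strike at 32 vol, 85-strike at 30 vol -> 75-strike seeded at 34 vol
print(extrapolate_downside_vol(75.0, 80.0, 0.32, 85.0, 0.30))  # 0.34
```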
The quantities of data are multiplying as fast as our techniques to analyze them. Machine learning and AI will identify patterns and relationships that would have otherwise been beyond our grasp. That’s why it’s extremely important to use them intentionally and consciously. As beautiful as the aurora borealis is, it’s not a guiding light. You make money from what the market says, not what it’s supposed to say according to your data.
The title for this post is a nod to one of the oldest pieces of global market infrastructure - the transatlantic wire laid between New York and London in the mid-19th century to trade currencies. "Cable" is still slang for the Pound/Dollar exchange rate. If rapidly expanding American businesses wanted to buy Sterling, the world's reserve currency, they had to go through that pipe.
The cable is the source of truth, because that's where the market trades. The most representative information comes from right there; everything else is a statistic. You can crunch Big Mac index data to come up with an accurate price of a quarter pounder in pounds, but the truest data about the market is where you find liquidity.