Alternative Data can help make accurate KPI predictions, but investors should beware of overfitting
Founder & CEO
An explosion in the number of Alternative Data sources has fueled a sort of “quantification” of traditional investment managers and produced a new source of Alpha for data-savvy managers: predicting KPI surprises by developing a data edge.
To predict KPIs accurately, managers rushed to onboard new and more exotic data sources, but herein lies a problem. Alternative Data differs from traditional economic and market data, and the two can’t be treated the same way when building forecast models. Analysts who are relatively new to Alternative Data are tempted to build complicated multi-factor models, with factors and factor weightings optimized for R-squared. However, these models often prove difficult to explain to the PM and constantly disappoint in the real world.
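One standard way to see why optimizing for raw R-squared rewards complexity: in-sample $R^2$ never falls when a factor is added to a least-squares model, whereas adjusted $R^2$ charges for each factor. With $n$ observations and $k$ factors, and using purely hypothetical figures chosen only to illustrate the arithmetic (12 quarterly observations, a one-factor model versus a six-factor model):

```latex
\begin{aligned}
R^2 &= 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}, \qquad
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1} \\[4pt]
n = 12,\ k = 1,\ R^2 = 0.65 &\;\Rightarrow\; \bar{R}^2 = 1 - 0.35 \cdot \tfrac{11}{10} = 0.615 \\
n = 12,\ k = 6,\ R^2 = 0.80 &\;\Rightarrow\; \bar{R}^2 = 1 - 0.20 \cdot \tfrac{11}{5} = 0.56
\end{aligned}
```

The six-factor model wins on raw $R^2$ but loses once factor count is penalized. Note that adjusted $R^2$ only accounts for the factors in the final model; it does not account for how many variants were tried before those factors were chosen.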
We discuss this in detail in our recent eBook: Simplicity Over Complexity: The Hidden Risks of Multi-Factor KPI Models Using Alternative Data.
Unlike time series data in traditional econometrics, where sources are in relative agreement on key indicators such as inflation or share prices, Alternative Data presents a unique challenge when choosing the variables that form the basis of your forecasting models. Namely, Alternative Data can be cut in many ways, yielding dozens of potential variations of each independent variable in your models.
For example, consider spending data sourced from credit card transactions. Investors often segment transactions by debit vs. credit card, or offline vs. online spending. In addition, you can apply different outlier detection models, panel normalization techniques, panel demographics (low/high income, geographies), aggregation techniques (consistent shoppers vs. total spend) and so forth. Where a traditional data source gives you one or just a handful of series to test against, a single Alternative Data source yields a vast number of candidate independent variables, because of the multitude of fields and dimensions available in this rich data.
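To make the combinatorics concrete, here is a minimal sketch that enumerates every variant of a single card-spending feed. The dimension names and values are hypothetical stand-ins for the cuts listed above (geographies are left out), not any vendor's actual schema; the point is only that a few modest choices multiply quickly.

```python
from itertools import product

# Hypothetical dimensions for one card-spending panel, loosely following
# the cuts described above. Real vendors expose different fields.
DIMENSIONS = {
    "card_type": ["debit", "credit", "all"],
    "channel": ["online", "offline", "all"],
    "outlier_filter": ["none", "iqr", "zscore"],
    "normalization": ["raw", "panel_size"],
    "income_band": ["low", "high", "all"],
    "aggregation": ["consistent_shoppers", "total_spend"],
}


def enumerate_variants(dimensions):
    """Return every combination of dimension values as a dict (one candidate series each)."""
    names = list(dimensions)
    return [dict(zip(names, combo)) for combo in product(*dimensions.values())]


variants = enumerate_variants(DIMENSIONS)
print(len(variants))  # 3 * 3 * 3 * 2 * 3 * 2 = 324 candidate series
```

Adding even a two-way geography split would double the count to 648, before any second data source enters the mosaic.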
Keep in mind that for most investors, a single source of data is almost never enough; the picture is more like a data mosaic made of spending data, geolocation data, satellite imagery, click stream, app data (and more). Each of these data sources comes with its own dimensions and flavors, further adding to the complexity of model building. At the same time, observations are limited: KPIs are typically reported quarterly, and many Alternative Data sources have only a few years of history. These two problems compound each other. With few observations and numerous factors to test, some factors will fit past KPIs well purely by chance, and a model built on them can be flipped on its head once new data arrives.
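A small simulation illustrates how limited observations and many candidate factors combine. In this hypothetical setup, the KPI and every candidate factor are independent random noise, so there is no real relationship to find. The sketch picks the candidate with the best in-sample R-squared over 12 "quarters" and then scores that same fitted line on 8 held-out quarters. The parameter values and the function name `best_noise_factor` are illustrative assumptions, not a recommended testing protocol.

```python
import random


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(x, y, slope, intercept):
    """1 - SSE/SST, measured against the mean of the y values passed in."""
    y_mean = sum(y) / len(y)
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - y_mean) ** 2 for b in y)
    return 1 - sse / sst


def best_noise_factor(n_train, n_test, n_candidates, seed=0):
    """Select the candidate with the best in-sample R^2 against a pure-noise KPI.

    Every series is independent Gaussian noise, so any fit is spurious.
    Returns (in_sample_r2, holdout_r2) for the selected candidate.
    """
    rng = random.Random(seed)
    n = n_train + n_test
    kpi = [rng.gauss(0, 1) for _ in range(n)]
    best = None
    for _ in range(n_candidates):
        x = [rng.gauss(0, 1) for _ in range(n)]
        slope, intercept = fit_line(x[:n_train], kpi[:n_train])
        fit = r_squared(x[:n_train], kpi[:n_train], slope, intercept)
        if best is None or fit > best[0]:
            # Score the selected line on quarters it never saw.
            holdout = r_squared(x[n_train:], kpi[n_train:], slope, intercept)
            best = (fit, holdout)
    return best


for k in (1, 10, 100, 1000):
    in_r2, out_r2 = best_noise_factor(n_train=12, n_test=8, n_candidates=k)
    print(f"{k:>4} candidates: in-sample R^2 {in_r2:.2f}, holdout R^2 {out_r2:.2f}")
```

Because the random draws are seeded, each larger run tests a superset of the smaller run's candidates, so the best in-sample R-squared can only rise as candidates are added. The holdout figure has no such tendency; it tends to hover around zero or below, since the "winning" factor was noise all along. Exact numbers depend on the seed.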
Critically, these are not standard problems faced in the hard sciences or in classical econometric models, which study relatively stable relationships with very clean and mature data inputs. For the reasons discussed, importing approaches from these more established domains is not ideal, yet that is precisely the approach many data analysts and investors new to Alternative Data end up taking. This often leads to subpar performance even when the models look great on paper.
In our latest eBook, we discuss the reasons why investors are better served with models based on intuitive factors with causal relationships to the modeled KPIs: Simplicity Over Complexity: The Hidden Risks of Multi-Factor KPI Models Using Alternative Data.