The Phantom Pattern Problem: The Mirage of Big Data
Gary Smith and Jay Cordes
Abstract
Pattern recognition prowess served our ancestors well. However, today we are confronted by a deluge of data that are far more abstract, complicated, and difficult to interpret than were annual seasons and the sounds of predators. The number of possible patterns that can be identified relative to the number that are genuinely useful has grown exponentially, which means that the chance that a discovered pattern is useful is rapidly approaching zero. Coincidental streaks, clusters, and correlations are the norm, not the exception. Computer algorithms can easily identify an essentially unlimited number of phantom patterns and relationships that vanish when confronted with fresh data. The paradox of big data is that the more data we ransack for patterns, the more likely it is that what we find will be worthless. Our challenge is to overcome our inherited inclination to think that all patterns are meaningful.
Keywords: pattern recognition, big data, self-selection bias, randomized controlled trial, backtesting
Bibliographic Information
Print publication date: 2020
Print ISBN-13: 9780198864165
Published to Oxford Scholarship Online: October 2020
DOI: 10.1093/oso/9780198864165.001.0001
Authors
Affiliations are at time of print publication.
Gary Smith, author
Pomona College, Fletcher Jones Professor of Economics
Jay Cordes, author
Data Scientist