By Aman Adukoorie
In 2006, Clive Humby, a British mathematician and data scientist, coined the phrase “data is the new oil”. Humby explained that the large-scale extraction and use of oil spurred the industrial revolution, ushering in an era of prosperity. But for the economic gains from oil to be realised, it first had to be refined, converted into derivative products such as petrol and plastics, and combined with other innovations like the combustion engine.
Today, similarly, an unprecedented amount of data is being produced. According to Rivery, a data infrastructure company, over 147 zettabytes of data were produced globally in 2024. This data could yield significant economic value, but, just as with oil, harnessing that value will require the data to be refined, transformed and combined with other technological innovations.
Hedge funds, investment firms that manage money on behalf of their investors, have been at the forefront of using big data. A hedge fund's performance is typically judged not just by its return on investment but by its ‘Alpha’, that is, the return it generates in excess of the market's return. When a fund discovers a successful investment opportunity or trading strategy, other funds often copy it quickly, eroding the original fund's Alpha, a phenomenon known as crowding.
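As a rough illustration of the simplified definition of Alpha above, the sketch below averages a fund's per-period return in excess of a market benchmark. The function name and the return figures are hypothetical. Note that more formal measures, such as Jensen's alpha, also adjust for the fund's exposure to the market (its beta); this sketch deliberately does not.

```python
def simple_alpha(fund_returns: list[float], market_returns: list[float]) -> float:
    """Average per-period return of a fund in excess of the market.

    Simplified definition (fund return minus market return); it ignores
    risk adjustment such as beta.
    """
    if not fund_returns or len(fund_returns) != len(market_returns):
        raise ValueError("need two non-empty return series of equal length")
    excess = [f - m for f, m in zip(fund_returns, market_returns)]
    return sum(excess) / len(excess)


# Hypothetical annual returns, as decimals, for a fund and a market index.
fund = [0.12, 0.08, -0.02]
market = [0.10, 0.05, -0.04]
print(f"{simple_alpha(fund, market):.4f}")  # prints 0.0233
```

If other funds crowd into the same strategy, the fund's excess returns shrink toward zero, and so does this number.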
While most hedge funds have traditionally relied on financial data such as stock prices and balance sheets to inform their investment decisions, some are beginning to use non-standard sources such as credit card transaction data, satellite imagery and social media posts. These sources, known in the industry as alternative data, can give investors unique insights into consumer behaviour and macroeconomic trends, helping them stay ahead of their competitors. Recent advances in AI have also made it quicker and more efficient for funds to extract insights from these sources.
Monitoring Consumer Behaviour
Consumer data such as credit card statements can offer valuable insights into the future prices of financial instruments. For example, an analyst at a hedge fund with access to credit card statements from a large number of consumers in a given country could estimate the inflation rate before the official statistics are released.
This would give them an early view of where bond prices might move, since bond prices are closely linked to inflation. Similarly, if the analyst were able to aggregate the transactions into categories such as consumer electronics or education, they would have an early view of how stocks in those industries might perform.
To extract insights from credit card statements, the statements must first be collected at scale and converted into an interpretable format. While hedge funds can readily buy anonymised credit card data, either directly from credit card providers or from third parties, the data they receive is usually messy: the abbreviations used to label transactions are typically not standardised. Historically, to parse, standardise and tabulate data of this kind, programmers had to write explicit rules specifying how each variant of each abbreviation should be interpreted.
Writing such programs is time-consuming and tedious. Fortunately, modern large language models (LLMs) can handle this task well. Because LLMs learn from statistical patterns rather than rigid rules, they can recognise that transactions labelled ‘Uber’, ‘UBR*Trip’ or ‘Ubr’ are all Uber rides, and relabel and aggregate them accordingly. Once a large volume of transaction data has been processed this way, a hedge fund has a clean, aggregated dataset of consumer spending that it can use to predict trends in asset prices.
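To make the older, rule-based approach concrete, here is a minimal sketch of hand-written normalisation followed by aggregation into spending categories. The regular expressions, merchant names, categories and sample transactions are all hypothetical. In the LLM-based workflow described above, the `normalise` step would be delegated to a model instead of a growing list of hand-maintained patterns.

```python
import re
from collections import defaultdict

# Hypothetical hand-written rules: each pattern maps a family of raw
# transaction descriptors to a canonical merchant and a spending category.
MERCHANT_RULES = [
    (re.compile(r"^(UBER|UBR)\b", re.IGNORECASE), "Uber", "Transport"),
    (re.compile(r"^(AMZN|AMAZON)\b", re.IGNORECASE), "Amazon", "Retail"),
]


def normalise(descriptor: str) -> tuple[str, str]:
    """Return (merchant, category) for a raw descriptor, or a fallback."""
    cleaned = descriptor.strip()
    for pattern, merchant, category in MERCHANT_RULES:
        if pattern.match(cleaned):
            return merchant, category
    # Anything no rule recognises; a programmer would have to add a new rule.
    return "Unknown", "Other"


def spend_by_category(transactions: list[tuple[str, float]]) -> dict[str, float]:
    """Sum transaction amounts per normalised category."""
    totals: dict[str, float] = defaultdict(float)
    for descriptor, amount in transactions:
        _, category = normalise(descriptor)
        totals[category] += amount
    return dict(totals)


sample = [
    ("Uber", 12.50),
    ("UBR*Trip", 8.00),
    ("Ubr", 15.25),
    ("AMZN Mktp UK", 30.00),
    ("Corner Cafe", 4.20),
]
print(spend_by_category(sample))
# prints {'Transport': 35.75, 'Retail': 30.0, 'Other': 4.2}
```

The weakness the article points to is visible here: every new descriptor variant needs another pattern, whereas an LLM can generalise from examples it has seen.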
Forecasting the Macroeconomic Environment
The early days of COVID were a challenging period for investors. When the virus broke out in Wuhan, investors were unsure how the pandemic would affect the macroeconomic environment. Many investors felt that the official Chinese government figures on infections and deaths were unreliable and could not be used to accurately assess the severity of the outbreak.
It was also clear that the lockdowns announced at the time would disrupt global supply chains, but official trade statistics would arrive only after a delay. The US Bureau of Economic Analysis’s quarterly trade report, for example, is released only two months after the end of each quarter. Investors wanted reliable, real-time data on COVID transmission rates and trade, and some turned to alternative data sources.
Funds began buying datasets of satellite images of ports, as well as large corpora of posts scraped from social media sites. Using AI-based computer vision models, investors could analyse the satellite images and count the cargo ships entering, leaving or docked at a port on a given day. Comparing these counts with those from previous weeks or months allowed hedge funds to measure how badly supply chains were being disrupted.
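The comparison step can be sketched as a simple ratio of recent activity to a baseline. This assumes the daily ship counts have already been produced by a computer vision model; the function name, the 28-day baseline and the 7-day recent window are illustrative choices, not anything the article specifies.

```python
from statistics import mean


def disruption_ratio(daily_counts: list[int], baseline_days: int = 28,
                     recent_days: int = 7) -> float:
    """Ratio of the recent average ship count to the preceding baseline average.

    Values well below 1.0 suggest port activity has fallen relative to the
    baseline window. Window lengths are hypothetical defaults.
    """
    if len(daily_counts) < baseline_days + recent_days:
        raise ValueError("not enough history for the chosen windows")
    recent = daily_counts[-recent_days:]
    baseline = daily_counts[-(baseline_days + recent_days):-recent_days]
    base_avg = mean(baseline)
    if base_avg == 0:
        raise ValueError("baseline window contains no ships")
    return mean(recent) / base_avg


# Hypothetical counts: four normal weeks of 40 ships a day, then a week of 22.
counts = [40] * 28 + [22] * 7
print(f"{disruption_ratio(counts):.2f}")  # prints 0.55
```

A real analysis would also need to account for weekly and seasonal patterns in port traffic before reading a low ratio as disruption.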
Chinese social media posts also offered a view into the early dynamics of COVID, as many users were posting about symptoms and cases. Using AI-based language models, funds could scan these corpora and identify the posts that discussed symptoms or cases. Because posts are timestamped and often tagged with a location, funds could then build a time-series dataset tracking mentions of cases by location.
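The step of turning classified posts into a location-by-date time series might look like the sketch below. A simple keyword check stands in for the AI-based language model the article describes; the keyword list, the `Post` structure and the sample posts are hypothetical, and real posts would of course be in Chinese.

```python
from collections import Counter
from datetime import date
from typing import NamedTuple


class Post(NamedTuple):
    text: str
    city: str
    posted_on: date


# Hypothetical keyword list, standing in for a language-model classifier.
SYMPTOM_TERMS = ("fever", "cough", "shortness of breath")


def mentions_symptoms(text: str) -> bool:
    """Crude stand-in classifier: does the post mention any symptom term?"""
    lowered = text.lower()
    return any(term in lowered for term in SYMPTOM_TERMS)


def mention_counts(posts: list[Post]) -> dict[tuple[str, date], int]:
    """Count symptom-related posts per (city, date), sorted by key."""
    counts = Counter((p.city, p.posted_on) for p in posts if mentions_symptoms(p.text))
    return dict(sorted(counts.items()))


posts = [
    Post("Bad fever since yesterday", "Wuhan", date(2020, 1, 20)),
    Post("My neighbour has a cough and fever", "Wuhan", date(2020, 1, 20)),
    Post("Lovely weather today", "Wuhan", date(2020, 1, 20)),
    Post("Dry cough all night", "Wuhan", date(2020, 1, 21)),
    Post("Running a fever, staying home", "Shanghai", date(2020, 1, 21)),
]
for (city, day), n in mention_counts(posts).items():
    print(city, day.isoformat(), n)
# prints:
# Shanghai 2020-01-21 1
# Wuhan 2020-01-20 2
# Wuhan 2020-01-21 1
```

Swapping the keyword check for a model-based classifier would change only `mentions_symptoms`; the aggregation into a time series stays the same.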
By feeding this data into epidemiological models, hedge funds could estimate the virus's actual transmission rate. Armed with insights from satellite imagery and social media posts, investors would have an early view of supply chain disruptions and the severity of COVID, allowing them to make informed investment decisions before their competitors.
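The article does not say which epidemiological models funds used. As a much simpler stand-in, the sketch below fits an exponential growth rate to a daily series (for example, the mention counts from the previous step) by ordinary least squares on the logarithm of the counts, and converts it into a doubling time. This is only a rough proxy: growth in social media mentions is not the same thing as growth in infections, and compartmental models such as SIR are considerably richer.

```python
import math


def growth_rate_and_doubling_time(daily_counts: list[float]) -> tuple[float, float]:
    """Fit log(count) = a + r * t by least squares.

    Returns the daily exponential growth rate r and the doubling time
    ln(2) / r in days (infinity if the series is not growing).
    """
    if len(daily_counts) < 2 or any(c <= 0 for c in daily_counts):
        raise ValueError("need at least two strictly positive counts")
    n = len(daily_counts)
    ts = range(n)
    logs = [math.log(c) for c in daily_counts]
    t_mean = (n - 1) / 2
    y_mean = sum(logs) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, logs))
    var = sum((t - t_mean) ** 2 for t in ts)
    r = cov / var
    doubling = math.log(2) / r if r > 0 else math.inf
    return r, doubling


# Hypothetical daily counts that double every day.
r, d = growth_rate_and_doubling_time([10, 20, 40, 80])
print(f"{r:.4f} {d:.2f}")  # prints 0.6931 1.00
```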
Just the Beginning
Investors have always looked for data that could give them an edge in the markets. Grain speculators in ancient Babylonia would wait at the docks to observe the flags on ships entering port, which gave them an early view of that day's grain prices. The field has come a long way since then.
Hedge funds are now using AI to analyse large and diverse datasets, and there is more to come. Neudata, an alternative data provider, recently reported that 95% of hedge funds planned to maintain or increase their alternative data budgets in 2025. As AI grows more sophisticated and new streams of alternative data emerge, investors are likely to rely on alternative data even more to stay ahead in the markets.
The author is an experienced quantitative analyst. Views are personal.

