How to Extract Value from Data

C WorldWide has been in a strategic partnership with Harbor Capital, Inc. for almost three years. C WorldWide has published thought leadership on several relevant topics, including this piece focusing on how to extract value from data. Harbor is pleased to share C WorldWide’s thinking for your portfolio construction consideration.

Data has shaped human decisions since long before computers, the internet, and AI (Artificial Intelligence) emerged. According to 365 Data Science (365datascience.com) in April 2024, the first evidence of data collection and analysis dates back to 18,000 BC. In modern-day Congo, Paleolithic tribespeople marked lines into bones. The bones, known as Ishango bones, were used to monitor trading activity and forecast the longevity of food supplies. As societies evolved, libraries became attempts at mass data storage. Fast forward to the 21st century and the data explosion, where AI and Large Language Models (LLMs) are expected to be catalysts for extracting even more value out of data. Data is likely to continue to grow rapidly, but the big question is which companies will be able to extract long-term value from it.

In our analysis, we differentiate between pure data companies and companies exploiting data to complement their existing business. Among the pure data companies, we distinguish between “Standard Bearers” and “Data Librarians.”

We conclude that while some companies are favorably positioned to capture long-term value from a data-driven business model, for the majority, data will be a prerequisite to remain competitive and not necessarily a long-term driver of shareholder value.

Data is a Growing Industry

The amount of data libraries could store was constrained by physical limitations. Scarce capacity made data expensive to store, while its physical nature made it difficult to utilize. Only with the invention of magnetic tape, semiconductors, and computers did it become possible to store data digitally. By 1996, digital storage had become more cost-effective than paper. The amount of data produced grew exponentially with the gradual adoption of the internet, sensors, and digital tools. Data became a byproduct of everything, generated from digital and physical activity in transport, production facilities, and weather patterns. As shown in Figure 1, digital data has grown by 60% annually since 1999 and 48% since 2006. With lower processing costs, companies became better at managing and using data, increasing demand for storage and tools for managing data. Industry experts expect the amount of data produced to grow 10x between 2020 and 2030.
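
The growth figures above can be put on a common footing with a little compounding arithmetic. The sketch below is only an illustration: the function names are ours, and the only inputs taken from the text are the 10x expectation for 2020 to 2030 and the 60% annual rate.

```python
def implied_annual_growth(multiple: float, years: int) -> float:
    """Constant annual growth rate that compounds to `multiple` over `years`."""
    return multiple ** (1.0 / years) - 1.0


def compounded_multiple(annual_rate: float, years: int) -> float:
    """Total growth multiple after compounding `annual_rate` for `years`."""
    return (1.0 + annual_rate) ** years


# A 10x increase between 2020 and 2030 (10 years) implies roughly 26% a year.
rate_2020s = implied_annual_growth(10.0, 10)
print(f"Implied annual growth, 2020-2030: {rate_2020s:.1%}")

# For comparison, 60% a year sustained for a decade compounds to about 110x.
print(f"60% a year for 10 years: {compounded_multiple(0.60, 10):.0f}x")
```

Read this way, a 10x decade is still rapid growth, but it is a slower annual pace than the 60% and 48% historical rates cited above.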

Evolutions in data have made certain business models irrelevant, enabling new ones to emerge and others to evolve. While most companies use data, only a few have built a profitable business model around monetizing the data itself or the insights it provides. Platform companies like Alphabet, Amazon, and Meta have successfully capitalized on data through targeted advertising. Other businesses have been accumulating large amounts of proprietary and trusted data for decades. As data became digital, the latter companies could extract more knowledge from the same data. Furthermore, some traditional industrial companies have pivoted towards extracting data and knowledge from their large installed base by installing sensors and building intelligent solutions, a theme we call “The intelligent tangible world.”

Figure 1: An Explosion of New Digital Data

Source: Michael Lesk (1998), Peter Lyman (2000), The Economist (2010), Wonder (2012), and Statista (2024).

Note: Because there is no unambiguous way to measure the size of digital information, definitions vary over time, and the figures may be inconclusive or estimates. 1 Exabyte = 1,000,000,000,000 MB, or more than 222,000,000 full-length movies.

Data is Abundant, but Not All Can Gain a Sustainable Competitive Advantage

Data collection and interpretation can help improve decisions and make resource allocation more efficient. This trend has only continued with stronger computing power. Although data is important, it is not the new oil. Only a few companies can process data at a scale or with a brand that increases entry barriers and cements permanent competitive advantages. Like oil, data will likely fuel the digital economy. That doesn’t mean every company will be an oil company. Rather, data is more like sand. It is available almost everywhere, but only a few companies can make it into silica, the raw material of silicon used in semiconductors.

AI and LLMs do not change this. Businesses should use data to improve efficiencies and outcomes. While more data is generally better, and LLMs boost the capacity to extract and analyze data, the current degree of obsession and expectation is likely to end in disappointment. For most companies, gains from data will not translate into lasting competitive advantages. They will provide efficiencies, but competitors will likely mimic those. Like the effects data analytics had on basketball (see Case Study box), once every team optimized its playing style, all teams returned to a similar baseline. This is also true for the effect data will have on most companies. In certain cases, however, one company is the sole or dominant provider of relevant industry data. These companies can occupy a sweet spot and charge a toll for usage. We call these data companies.

"Data is like sand. It is available almost everywhere, but only a few companies can make it into silica, the raw material of silicon used in semiconductors."

Case Study: Basketball

Data can convey knowledge and change behavior.

Data and data analytics convey knowledge that can change behavior. These effects can be seen in everything from how businesses inform pricing decisions to how basketball is played. Kirk Goldsberry famously observed the drastic changes in how basketball was played between the early 2000s and the early 2020s. Spatial tracking and data analytics enabled detailed analysis of, for example, the rationale for attempting a shot behind the three-point line. The simple conclusion: the value of attempting to score three rather than two points outweighed the increased difficulty of moving a few meters out of the midrange zone. Data analytics changed basketball. Today, whole teams and offensive attacks are structured around the three-point shot.
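
The reasoning behind the three-point shift comes down to expected points per attempt. The sketch below illustrates that arithmetic only; the shooting percentages are hypothetical placeholders chosen for illustration, not figures from Goldsberry's analysis or from NBA data.

```python
def expected_points(shot_value: int, make_probability: float) -> float:
    """Average points scored per attempt for a given shot type."""
    return shot_value * make_probability


def breakeven_three_point_rate(two_point_rate: float) -> float:
    """Three-point make rate at which a three is worth as much as a two."""
    return 2.0 * two_point_rate / 3.0


# Hypothetical make rates, assumed only for this example.
midrange_rate = 0.40
three_point_rate = 0.36

print(f"Midrange two:  {expected_points(2, midrange_rate):.2f} points per attempt")
print(f"Three-pointer: {expected_points(3, three_point_rate):.2f} points per attempt")
print(f"Breakeven three-point rate: {breakeven_three_point_rate(midrange_rate):.1%}")
```

Under these assumed rates, the longer shot is worth more per attempt even though it is made less often, which is the trade-off the paragraph above describes.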

Most common jump shot locations in the NBA

Data Companies Can Have Attractive Unit Economics

Data becomes valuable when the producer has unique access to valuable data or when consumers agree that data from one company is the single source of truth (i.e., it becomes the industry standard). Once established, these businesses often become monopolies or duopolies in their business area. Dominant data companies typically have broad distribution and manage a large amount of data, underpinning better offerings and stronger unit economics. This network effect cements high barriers to entry and winner-takes-most markets. As companies become dependent upon data, data companies become deeply entrenched in the workflow of customers through data analytics. This underpins the ability of these companies to effectively cross-sell and upsell new products to customers, establishing attractive reinvestment opportunities.

Once data is gathered, cleaned, and optimized, it can be sold repeatedly without incurring additional costs. Additional sales generate nearly 100% incremental margins. While operating margins are initially low, significant operating leverage enables meaningful margin expansion as these companies scale. We monitor a select group of data companies. These have average gross margins of 69% and Free Cash Flow (FCF) margins of 30%, compared to 33% and 9% for the S&P 500, as illustrated in Figure 2 below.
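
The operating leverage described above can be sketched with a simple cost model: a large fixed cost to gather and clean the data, and a near-zero cost for each additional sale. All numbers and names below are hypothetical assumptions for illustration; they are not drawn from the companies we monitor.

```python
def operating_margin(customers: int, price: float,
                     fixed_cost: float, marginal_cost: float) -> float:
    """Operating margin when one fixed data asset is sold to many customers."""
    revenue = customers * price
    total_cost = fixed_cost + customers * marginal_cost
    return (revenue - total_cost) / revenue


def incremental_margin(price: float, marginal_cost: float) -> float:
    """Margin earned on one additional sale of the same data."""
    return (price - marginal_cost) / price


# Hypothetical inputs: a 5,000 fixed cost to build the data set,
# a price of 100 per customer, and a delivery cost of 2 per customer.
PRICE, FIXED_COST, MARGINAL_COST = 100.0, 5_000.0, 2.0

for customers in (60, 200, 1_000):
    margin = operating_margin(customers, PRICE, FIXED_COST, MARGINAL_COST)
    print(f"{customers:>5} customers: operating margin {margin:.0%}")

print(f"Incremental margin per extra sale: {incremental_margin(PRICE, MARGINAL_COST):.0%}")
```

In this toy model the margin starts low and climbs towards the incremental margin as the customer base grows, because the fixed cost is spread over more sales.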

Figure 2: Gross Margin and FCF Margin, Data Companies

Hypothetical for illustrative purposes only

Performance data shown represents past performance and is no guarantee of future results.

Source: Bloomberg (2024) and C WorldWide, November 2024.

Standard Bearers and Data Librarians

Two types of data companies embody these characteristics: Standard Bearers and Data Librarians. Standard Bearers have a unique brand that makes their data more valuable because it has become accepted as an industry standard. This is like the meter, the standard through which distance is measured and communicated, or the Twenty-foot Equivalent Unit (TEU), the standard measure of shipping container capacity. Data Librarians have unique access or processes around data collection. Their data is either impossible or very cumbersome to replicate.

Standard Bearers utilize various data sources to create benchmarks upon which whole ecosystems rely. For example, MSCI is the leading global provider of index data and analytics tools for the asset management industry. MSCI is often synonymous with measuring and describing the performance of global equity markets. While asset managers pay for MSCI data, asset owners, such as pension funds and endowments, require external managers to report against a benchmark they know and trust. It is, therefore, not the data itself that is valuable; it is the MSCI brand. The underlying data can be replicated, but the deep integration into contracts and workflows of asset owners, decades of consistency and brand building, and trust in their data specifically are challenging to replicate.

Two other Standard Bearers are Moody’s (credit ratings) and S&P Global. S&P utilizes credit information, stock prices, and other publicly available data, turning them into benchmarks. S&P is the Standard Bearer of how corporate credit quality is communicated (ratings), how the performance of the US equity market is tracked (S&P 500), how commodity contracts are settled (Platts), and how vehicle histories are documented (Carfax). S&P has bolstered these standards with complementary analytics.

“Two types of data companies stand out: Standard Bearers, whose data gains value by becoming the industry standard, and Data Librarians, who have exclusive access or processes that make their data hard to replicate.”

Data Librarians have unique access to data. One such example is Verisk Analytics. In 1971, new regulations in the US required insurance companies to share data with state regulators in a standardized and accurate format. Rather than each building their own, 280 insurance companies consolidated their data aggregation and cleansing in a not-for-profit called ISO. Due to its initial success, insurers contributed claims and loss data to the consortium. This data was standardized and shared with all members, forming the basis of fraud detection, pricing decisions, and risk assessments. ISO remained a nonprofit until 1997. Verisk Analytics was established as the parent company of ISO in 2008 and became a public company in 2009. Starting as a data company, it developed analytical products utilizing the shared data. Today, Verisk Analytics operates in monopolies and duopolies within the US property insurance industry, banking, and energy markets, supplying data and the analytics around it.

RELX is another example of a Data Librarian. RELX has unique and dominant data assets within various industries. RELX traces its origins to Elsevier, a news magazine founded in 1880. Over five decades, Elsevier built and acquired industry-specific magazines and journals, ultimately becoming the largest Business-to-Business (B2B) publisher in Europe. RELX holds leading positions in oligopolistic data and business analytics markets (markets that a few firms tend to control) within the legal industry, US auto insurance, banking, security service, aviation, medical research, academic publishing, and chemical pricing. The professional readers of the journals often contribute the proprietary data.

In 1971, Elsevier became the first company to store journal information in a computerized database. This transitioned Elsevier from a media company to an analytics company, charging customers for access to data and analytics instead of journals. Clients have few alternatives and rely on this data for critical decisions and operations; therefore, RELX can enjoy potential pricing power around the data.

In Table 1, we have listed some of the most prominent companies that can be characterized as Standard Bearers and Data Librarians.

Table 1: Examples of Prominent Data Companies

Standard Bearers | Data Librarians
FICO | Experian
MSCI | TransUnion
S&P Global | Equifax
Moody's | RELX
CME | Verisk
ICE | LSEG
Gartner | Deutsche Börse
Nielsen Holdings | CoStar
IQVIA | FactSet
Morningstar
Visa
Mastercard
Meta
Alphabet

Source: C WorldWide, November 2024.

Data Companies Also Have Risks

Most data companies have existed for decades, even centuries, and built strong barriers around their right to win. Despite the seemingly insurmountable structural factors and dynamics underpinning their strength, history has shown that even these can fail. As Benedict Evans, an independent technology analyst and former partner at Andreessen Horowitz, puts it, a company’s structural competitive advantage can either be knocked down on the orders of a king, or what it protects may become irrelevant (Benedict Evans: “How to lose a Monopoly,” 2020).

The king, often a regulator, may have granted a company its strong position. If the company acts irresponsibly or anticompetitively, the king can choose to break down the barriers of protection. Dun & Bradstreet (D&B) was given a monopoly in 1996, when the federal government required all companies working with the government or filing certain documents to have a DUNS number. The DUNS number, operated by D&B, is an individual identification number for businesses, linking ownership and trade credit data to specific companies. This established DUNS as a standard. Many businesses adopted the DUNS number to determine whether companies were eligible for loans and were compliant business partners. This put D&B in a sweet spot. While it remained a standard, D&B did not transition into analytics. In 2022, the government moved from the DUNS number to an open-source code, breaking down the most entrenched barrier.

Another risk is customers destroying the competitive advantages. RELX and Verisk receive data from customers. This data is cleaned, refined, and sold to the same clients. Contributory data models are very powerful but hard to achieve. While these are often natural monopolies, they serve as a common good for the industry. If they exploit high pricing power by appearing as toll takers, customers can create a new data consortium. In the late 1990s, RELX increased prices significantly within its academic journals business. In this business, RELX receives the academic work of researchers and publishes it in journals. Price increases resulted in a backlash from both contributors and clients. This led to the emergence of alternative forms of distribution — most notably open access. While RELX remains the leader today, it recognized that unutilized pricing power may be even more powerful for contributory data models than realized pricing power. Lower prices incentivize the adoption of new products, enabling more data and thereby generating more customer value, further entrenching RELX into the workflow of customers.

“Despite the seemingly insurmountable structural factors and dynamics underpinning companies with decades of experience and strength, history has shown that even they can fail.”

The source around which a company has built a competitive advantage may also run dry. Competitive advantages can become irrelevant, not because they are diminished, but because the lay of the land has changed, rendering the very thing they protect immaterial. Nielsen Holdings was an aspirational data company. Founded in 1923, it coined the very concept of market share and became synonymous with TV ratings, retail share, and market data. Nielsen had access to valuable proprietary data and was the standard on which more than USD 80 billion of television advertising spending was based. Nielsen had a network of 100,000 Americans carrying a small device when watching TV. The company also embedded a digital watermark in the audio of television programs, enabling the device to recognize when, how much, and which television programs were watched. It became the standard bearer, fueling the TV advertising market. However, the internet changed the lay of the land. Consumers migrated towards streaming and mobile phones. Advertising moved to Facebook, YouTube, and Google. The media landscape became fragmented. Although Nielsen remains the standard bearer for TV ratings, the medium itself has become less relevant.

Finally, the sensitive nature of information makes data companies a target for hackers. This reinforces the political emphasis on data sovereignty and privacy. Data sovereignty may ultimately limit data companies’ growth opportunities and business development. Data privacy regulations like the General Data Protection Regulation (GDPR), which defines the protection and processing of personal data, increase the implicit costs of storing and transacting with data, and with this follows the risk of increased political scrutiny.

Data is an Interesting Fishing Pond

Data is, without a doubt, growing and becoming more important to all businesses. AI and other tools will further enable more valuable insights to be extracted from data, effectively increasing productivity for most companies. For example, leading industrial companies are utilizing sensors to extract data from their machines and selling such data to their customers. As discussed in other Insights, this solidifies their market positions with intangible intelligence around their products.

In this paper, we looked at companies that have made data the backbone of their business model. We made the distinction between Standard Bearers and Data Librarians. Both types of data companies often develop into natural monopolies with deeply entrenched rights to win, very attractive unit economics, high margins, and returns on capital with strong underlying growth drivers and attractive reinvestment opportunities.

As long-term stock pickers, we see this area as an attractive fishing pond. However, selectivity is required as the area is not without risks.


Important Information

This information has been provided by C WorldWide and was published in December 2024 for informational purposes only. It does not constitute or form part of any offer to issue or sell, or any solicitation of any offer to subscribe or to purchase, shares, units or other interests in investments that may be referred to herein and must not be construed as investment or financial product advice. Neither Harbor nor C WorldWide has considered any reader’s financial situation, objective or needs in providing the relevant information.

Investing entails risks and there can be no assurance that any investment will achieve profits or avoid incurring losses. Past performance is not necessarily a guide to future performance or returns. C WorldWide has taken all reasonable care to ensure that the information contained in this material is accurate at the time of its distribution, but no representation or warranty, express or implied, is made as to the accuracy, reliability or completeness of such information.

The views expressed herein may not be reflective of current opinions and are subject to change without prior notice. This material does not constitute investment advice and should not be viewed as a current or past recommendation or a solicitation of an offer to buy or sell any securities or to adopt any investment strategy.

This material may contain forward-looking information that is not purely historical in nature. Such information may include, among other things, projections and forecasts. There is no guarantee that any of these views will come to pass.

Copyright © (2024) Harbor Capital Advisors, Inc. All Rights Reserved.

Harbor Funds Distributors, Inc. is the Distributor of the Harbor Mutual Funds.
Foreside Fund Services, LLC is the Distributor of the Harbor ETFs.
Investing involves risk and the potential loss of capital.

Investors should carefully consider the investment objectives, risks, charges and expenses of a fund before investing. To obtain a summary prospectus or prospectus for this and other information, click here or call 800-422-1050. Read it carefully before investing.

All trademarks or product names mentioned herein are the property of their respective owners. Copyright © 2025 Harbor Capital Advisors, Inc. All rights reserved.