Overcoming Data Inequality - Valutrics

Businesses say they want to integrate external data with in-house data, but bias and a lack of tools stand in the way.

For many organizations, the goal of business intelligence (BI) and Big Data is to better understand and leverage the complete data landscape for commercial advantage. Internal company data most often serves as the fuel for BI and Big Data initiatives, but in reality this is only part of the picture.

Consider a traditional oil and gas company. Back in the day, operations were much simpler with seemingly endless demand growth, a relatively small universe of known peers, and few reasonable alternatives to traditional fossil fuels. The combined skills of a company’s workforce, its proprietary technological approach, and other intrinsic factors about how the company functioned set it apart from its competition, and no one knew that information space better than the company’s executives.

Today, the industry and data landscape for this type of company is profoundly different and more complex. Between volatile global oil prices, the technology that fostered the development of pre-salt oil resources offshore Brazil, the technological rethink that spurred the US shale revolution, and an increasingly environmentally mindful global community, the profit potential of traditional oil and gas exploration is vastly more complicated and difficult to assess.

In this context, internal data must be paired with external data to reveal the most complete picture, one that can guide accurate strategic decisions. A whole industry exists to supply companies with this kind of external data.
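To make the idea of pairing concrete, here is a minimal sketch in Python. It assumes an in-house monthly production series and an externally supplied monthly benchmark price series, both keyed by month. All names and figures are hypothetical and serve only to illustrate the join; they do not come from any real provider.

```python
# Hypothetical figures for illustration only; not taken from any real source.
internal_output = {  # month -> barrels produced (in-house operational data)
    "2015-01": 1000,
    "2015-02": 1200,
    "2015-03": 900,
}
external_price = {  # month -> benchmark price, USD per barrel (external provider)
    "2015-01": 50.0,
    "2015-02": 55.0,
}


def pair_by_month(internal, external):
    """Join two month-keyed series, keeping only months present in both."""
    paired = {}
    for month in sorted(internal.keys() & external.keys()):
        barrels = internal[month]
        price = external[month]
        paired[month] = {
            "barrels": barrels,
            "price": price,
            "revenue_estimate": barrels * price,
        }
    return paired


combined = pair_by_month(internal_output, external_price)
for month, row in combined.items():
    print(month, row)
```

Note that the month with no external price is dropped from the combined view. Even in a toy case, the internal record alone cannot answer the commercial question.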

Barriers remain

So why is it so hard for executives to find and access the data they need? Why do information initiatives continue to favor internal data analysis, often excluding external data?

We see a strong, ongoing inequality between internal and external data, and four common barriers impeding data “commingling”: cost, findability, complexity, and connectivity. Addressing these barriers is the key to achieving the future vision of a consistent, easy, affordable data discovery experience for information workers. Currently, there are data providers, data marketplaces, business intelligence and enterprise search solutions serving different needs. Yet no one is delivering on this vision in full.

Cost. Internal data is free in the sense that a company does not need to pay for access to it. It’s an internal by-product of normal day-to-day operations and shareable with any employee with an information need. External data typically carries a significant ongoing cost and a variety of potential restrictions and roadblocks to its use. For example, Bloomberg terminals are powerful tools but they are expensive and often only accessible to a select few users with terminal access rights. Furthermore, the terminal is designed to be used as a standalone instrument, not in combination with other data resources.

Also consider that while several free, open-source BI solutions are available, and are highly effective at accessing and analyzing internal data and deriving insights, there are no comparable offerings for external data. BI for internal data is being democratized, while the same cannot be said for external data.

Findability. Even for enterprises with access to external data sources, it is nearly impossible to provide a consistent data discovery experience to their information workers. Most enterprise search engines today are designed to search across documents, so they can miss valuable data that consists of points plotted on a chart with little textual metadata attached. Data records can also be many orders of magnitude greater in scale than typical document storage, making it even harder to unearth valuable data.

Specialized data-first search engines are needed to work around the issues that make data searches different from typical document searches. Data marketplaces are the information industry’s starting point for amalgamating data and making data discovery simpler. But these marketplaces still need highly precise, AI-powered search capabilities specifically tailored to unearthing data. These search capabilities must evolve constantly, learning from past searches and incorporating new algorithms to continually sharpen relevancy.

Complexity. Proprietary data designed for corporate consumption is often spread across dozens of external systems (examples include DataStream global financial data; Experian and Dun & Bradstreet credit reports; Acxiom demographic information). There are hundreds of publishers, delivery approaches and pricing models, which often can’t be easily integrated into a single data warehouse or enterprise BI solution. All of this external data presents far too much hassle for IT teams, who naturally are more comfortable with their own data warehouses and BI solutions.

Connectivity. Companies often say they want external data (industry and market data) to be available through their favorite database and business intelligence solutions. Today, several data marketplaces offer APIs that allow external dataset providers to push their data into the data marketplace. But we are seeing very little in terms of BI and database vendors opening their systems to the data marketplaces. The industry badly needs new methods not just for finding and accessing external data, but also for incorporating it into the analytics flow.

The current state of the data market is very similar to the late 1990s or early 2000s, when there were many points of entry to the Internet (directories, catalogs and portals) but no dominant player. Then Google emerged and blew everyone else away by making Internet data discovery consistent, easy and affordable; in a sense, trivial. Huge opportunities abound for forward-thinking industry leaders who can do for data, both internal and external, what Google did for the Internet.

Vladimir Bougay is chief executive officer of Knoema.