Today’s headlines are filled with news, stories and predictions about the power and value of big data analytics. Most companies are scrambling to develop new ways to monetize their information assets. Yet big data often falls off the radar when it comes to legal and risk compliance. As big data becomes operational, it needs the same governance disciplines that we apply to traditional data management. Unless compliance managers fully understand what makes big data different from traditional business analytics, and how those differences shape the way we approach big data governance, they will not be able to govern it successfully. This understanding can also show us how to make big data platforms work to our advantage in reducing overall risk.
Contrary to a common misconception, big data does not simply refer to bigger database systems or to business analytics efforts that leverage larger data sets. Big data projects often share these attributes, but the attributes neither define big data nor distinguish it from traditional relational databases. Beyond the technology itself, the primary distinction that sets big data apart is the way information is ingested into and stored by the platforms. Traditional databases are highly structured systems built on interrelated tables of predefined fields. For them to work, data must first be made to conform to this known structure and shoehorned into it through a process called extract, transform and load (ETL). Originally built as a solution for searching the Internet, big data platforms do away with this requirement. Instead, they can ingest massive amounts of raw data, much of which has no preexisting need or use, in nearly any format it arrives in. Structure is not applied until the analysis phase, which reorders the process into extract, load and transform (ELT).
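To make the ETL/ELT distinction concrete, here is a minimal Python sketch. The schema, field names and functions (`CUSTOMER_SCHEMA`, `etl_load`, `elt_load`, `elt_query`) are hypothetical illustrations, not the API of any particular database or big data platform, and real systems operate at vastly larger scale.

```python
import json
from typing import Any

# Hypothetical fixed schema that a traditional (ETL) table would enforce.
CUSTOMER_SCHEMA = ("customer_id", "name", "email")


def etl_load(raw_records: list[str]) -> list[tuple]:
    """ETL: transform each record into the known schema *before* loading.

    Fields outside the schema are dropped; records missing a schema field
    are rejected, so only known, predefined columns ever reach storage.
    """
    table = []
    for line in raw_records:
        record = json.loads(line)
        if all(field in record for field in CUSTOMER_SCHEMA):
            table.append(tuple(record[field] for field in CUSTOMER_SCHEMA))
    return table


def elt_load(raw_records: list[str]) -> list[str]:
    """ELT: load raw records as-is; no structure is imposed at ingestion."""
    return list(raw_records)


def elt_query(store: list[str], fields: tuple[str, ...]) -> list[dict[str, Any]]:
    """Structure is applied only at analysis time, for the requested fields."""
    results = []
    for line in store:
        record = json.loads(line)
        results.append({field: record.get(field) for field in fields})
    return results
```

A record carrying an unexpected `ssn` field illustrates the difference: `etl_load` drops the field because the schema has no place for it, while `elt_load` stores it unnoticed until some query happens to ask for it.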
This single distinction, loading data before transforming it, has massive consequences for information governance and privacy compliance. ETL allows you to work within a predefined environment where the tables and fields holding sensitive or regulated data, such as personally identifiable information (PII), are known and easily located. This lets you wrap proper controls around them. With massive amounts of raw data being ingested through ELT, often from real-time data feeds and a variety of internal and external sources, the content of the data remains unknown until analysis is conducted, and even then only within the scope of that analysis. The result is large data pools with little understanding of what is in them or what compliance obligations attach to them. The distinction also makes possible a wide array of predictions and conclusions that were previously out of reach, drawn from new and often unknown sources of information. This very point is what prompted the Federal Trade Commission to issue guidance last month on which uses of big data could run afoul of consumer protection laws.
Big data does not need to be a big headache. It simply requires that we appreciate what differentiates it from traditional data analytics and how those distinctions affect our compliance programs. It also requires counsel to get involved in the early design phases of a program and to build privacy and compliance in from the start. Successful work with big data begins with a clear understanding of the data being ingested and with ensuring that the analytics layer includes a compliance component for masking, anonymizing and otherwise securing sensitive data as it comes into the company. The saving grace is that big data platforms are inherently well suited to doing just that. They only need the right stakeholders involved to ensure it is done right.
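A masking step at ingestion might look something like the following minimal Python sketch. It assumes records arrive as JSON with no known schema, so it scans every string value, including nested ones, rather than relying on known field names. The two regular expressions and the names `PII_PATTERNS`, `mask_value` and `ingest` are hypothetical; pattern-based redaction is a simple stand-in here, not a complete anonymization technique, and production PII detection needs far broader coverage.

```python
import json
import re
from typing import Any

# Hypothetical, illustrative patterns only; real PII detection needs much more.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def mask_value(value: Any) -> Any:
    """Recursively redact PII-looking substrings in any JSON-like value."""
    if isinstance(value, str):
        for label, pattern in PII_PATTERNS.items():
            value = pattern.sub(f"[{label} REDACTED]", value)
        return value
    if isinstance(value, dict):
        # Keys are kept as-is; only values are scanned.
        return {key: mask_value(item) for key, item in value.items()}
    if isinstance(value, list):
        return [mask_value(item) for item in value]
    return value  # numbers, booleans and None pass through unchanged


def ingest(raw_line: str) -> str:
    """Mask a raw JSON record before it is written to the data store."""
    return json.dumps(mask_value(json.loads(raw_line)))
```

Because the scan walks the whole record rather than a list of known columns, it fits the ELT situation described above, where no one yet knows which fields carry regulated data.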
Published March 3, 2016.