Frequently Asked Questions:
What is a commercial data warehouse?
What is the difference between a data warehouse and a database?
Aren't data warehouses regulated by U.S. law?
What are the Fair Information Practices?
Why do federal agencies have contracts with commercial data warehouses?
Why do commercial data warehouses rely so heavily on public records?

What is a commercial data warehouse?

Commercial data warehouses consolidate data from various sources and resell it to third parties. Data warehouse sources include public records such as birth, marriage, and death records; voter registrations; court files; arrest records; property ownership and tax information; driver's license information; occupational licenses; Securities and Exchange Commission filings; and Census data. They also include, but are not limited to, media reports, merchant records, credit reports, change of address information, phone records, student surveys, and private investigative records.

What is the difference between a data warehouse and a database?

The following industry definition draws a clear distinction between a data warehouse and a database:

"So, what is a data warehouse? Basically, a data warehouse is a collection of data with certain characteristics. This collection of data is typically used for analysis purposes, is subject-oriented, time-variant, integrated and cleansed to conform to a standard understanding or definition of content and meaning, and optimized to support the analysis/DSS/BI tools and functions of an organization....Most data warehouses will contain vast amounts of historical data, from three to as many as ten years' worth.

A database is also a collection of data. However, that is about the limit of the commonality between the two entities. While a data warehouse can be, and usually is, implemented in a database, there are many other databases in most organizations which do not qualify as data warehouses. These databases are the OLTP databases that support the daily business activities of an organization. Conceptually, these are the record keeping tools of the organization. They are designed and optimized to support the input of individual transactions..."

Source: For more information about databases and data warehouses, see Collections of Data: Bases, Marts, Warehouses from ComputerWorld.
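
The distinction quoted above can be illustrated with a small sketch. It is not drawn from the ComputerWorld article; the `sales` table, its columns, and the function names are hypothetical, and a real warehouse would normally be a separate, integrated, and cleansed store rather than the same table. The sketch only contrasts the two access patterns: writing one transaction at a time versus summarizing accumulated history by subject and time period.

```python
import sqlite3

# Hypothetical in-memory table standing in for an organization's records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, region TEXT, amount REAL)")


def record_sale(sale_date, region, amount):
    """OLTP-style operation: record a single transaction as it happens."""
    conn.execute(
        "INSERT INTO sales VALUES (?, ?, ?)", (sale_date, region, amount)
    )
    conn.commit()


def yearly_totals():
    """Warehouse-style operation: summarize history by year and region."""
    return conn.execute(
        "SELECT substr(sale_date, 1, 4) AS year, region, SUM(amount) "
        "FROM sales GROUP BY year, region ORDER BY year, region"
    ).fetchall()


# Day-to-day record keeping: individual inserts.
for sale in [
    ("2001-03-14", "East", 120.0),
    ("2001-07-02", "East", 80.0),
    ("2002-01-20", "West", 50.0),
]:
    record_sale(*sale)

# Analysis over the accumulated history.
totals = yearly_totals()
print(totals)
```

The insert function touches one row per call, which is the workload an OLTP database is tuned for. The summary query scans the whole history, which is the workload a data warehouse is tuned for.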

Aren't data warehouses regulated by U.S. law?

There is no comprehensive data protection law in the U.S., as there is in most developed countries. Consumers have virtually no control over secondary uses of their data.

Beth Givens, Director of the Privacy Rights Clearinghouse, states: "It would not be impossible for an insurance company to attempt to purchase a list of people who have diabetes or asthma in order to compare that information with their own data in order to screen out individuals with pre-existing conditions, or to raise the rates of those who have indicated elsewhere that they have those ailments."

Givens recommends that the U.S. "pass laws that prohibit personal data from being used for unintended secondary purposes. Such laws should be based on the Fair Information Practices (FIPs), a set of privacy principles first established in the 1970s by the U.S. Department of Health, Education and Welfare, and then expanded by the Organization for Economic Cooperation and Development in 1980. The FIPs form the foundation of several privacy-related laws such as the Fair Credit Reporting Act (15 U.S.C. sec. 1681 et seq.) and the Cable Communications Policy Act (47 U.S.C. sec. 551)."

Source: The Information Marketplace: Merging and Exchanging Consumer Data.

What are the Fair Information Practices?

Currently, the Federal Trade Commission believes industry self-regulation is the least intrusive and most efficient means to ensure fair information practices. Self-regulation is based upon the Code of Fair Information Practices (FIP), originally drafted in 1972.

FIP is based on five principles:
• There must be no personal data record-keeping systems whose very existence is secret.
• There must be a way to find out what personal information is in a record and how it is used.
• There must be a way to prevent personal information that was obtained for one purpose from being used or made available for other purposes without consent.
• There must be a way to correct or amend a record of personally identifiable information.
• Any organization creating, maintaining, using, or disseminating records of identifiable personal data must assure the reliability of the data for their intended use and must take precautions to prevent misuses of the data.

Jerry Berman, Executive Director of the Center for Democracy & Technology, states: "Bad actors will not self-regulate: the clueless or new on the scene may not have the resources or wherewithal to participate in regulating their own behavior. Law is critical to spreading the word and ensuring widespread compliance with fair, privacy protective standards....On the public policy front, [the U.S.] must adopt legislation that incorporates into law Fair Information Practices -- long-accepted principles specifying that individuals should be able to determine for themselves when, how, and to what extent information about them is shared."

Sources: EPIC's The Code of Fair Information Practices, CDT's Privacy Basics: Fair Information Practices, Privacy Online: Fair Information Practices in the Electronic Marketplace

Why do federal agencies have contracts with commercial data warehouses?

The Privacy Act of 1974 places restrictions on the collection, use and dissemination of personal information by government agencies, but places no limitations on the private sector. Therefore, government agencies have begun to rely on the huge databases that are freely maintained by private companies in order to retrieve information.

ChoicePoint (CP) is the largest commercial data warehouse supplying personal data to federal agencies. The FBI, Department of Justice, and IRS all have multi-million dollar accounts with CP. CP holds 10 billion records and has contracts to share data with 35 federal agencies.

Caron Carlson reports in eWeek: "Thus far, the government appears unconcerned about regulating its sources of personal data. The FBI's use of commercial databases has grown 9,600 percent over the last decade, according to EPIC. The bureau uses credit records, property records, professional licenses, driver's licenses and other data purchased from companies such as ChoicePoint Inc., of Alpharetta, Ga., and LexisNexis, of Dayton, Ohio, as well as credit reporting agencies such as Atlanta-based Equifax Inc., Experian Information Solutions Inc., of Costa Mesa, Calif., and Trans Union LLC, of Chicago. But none of these companies is held accountable for the truth or accuracy of the information it sells."

Sources: Hearing on Data Mining: Current Applications and Future Possibilities, Who's Minding Your Data?

Why do commercial data warehouses rely so heavily on public records?

Public records are open to inspection by anyone. The definition of what records are public varies depending on state and federal law. The definition may include government contracts with businesses, birth, marriage, and death records, court files, arrest records, property ownership and tax information, minutes of meetings of government entities, driver's license information, occupational licenses, and Securities and Exchange Commission filings.

Many of these public records contain personal information. This information is required to be divulged when citizens interact with state or federal bureaucracies to vote, receive public benefits, and enjoy privileges. However, once a record becomes public, there is generally no restriction on the way the personal information in the record is used.

Commercial data warehouses are, therefore, consolidating public information and reselling it for a profit. People are willing to pay data warehouses for free information because the data warehouses make it easy to access the data online and deliver it in a database format ready for instant use.

This commercial trend is undermining one of the original purposes of making this information public in the first place: to keep powerful government bureaucracies in check and accountable to their citizens. Daniel Solove, an expert in information privacy law, writes: "A growing number of large corporations are assembling dossiers on practically every individual by combining information in public records with information collected in the private sector such as one's purchases, spending habits, magazine subscriptions, web surfing activity, and credit history. Increasingly, these dossiers of fortified public record information are sold back to government agencies for use in investigating people." Solove believes there is a need to regulate the use of public records, restricting commercial access and use in light of the new technologies of the Information Age.

Sources: EPIC's Privacy and Public Records, Daniel Solove's Access and Aggregation: Public Records, Privacy and the Constitution