Three Reasons Your Data is Ugly
Sorry to be the one to break the news to you, but…well…your data is ugly.
There. I said it.
Don’t feel too bad, though. Everyone’s data is ugly. When you’re building a Business Intelligence (BI) program, all data is ugly.
We see three main reasons for this:
- The system was not designed with BI in mind
- Business rules and work processes are inconsistent, and have changed over time
- Inconsistent definitions
1. The system was not designed with BI in mind
A system and its associated data can be perfectly good for their intended purpose, say shipping an order, yet be almost useless for answering BI questions about those shipped orders. For example, you want to enter an order, ship it, invoice it, and collect money.
You can probably do all of that with a lot of specific data missing from the individual orders. You may even be able to serve a retail customer without knowing their name. Or with an incorrect ZIP code. Or by leaving default values in many of the fields, even though they are incorrect. The order still goes out the door, the customer is happy, and you still get paid. Collecting or verifying the correct data takes extra time, you don’t want to slow down the customer, and it doesn’t seem to matter anyway because no one seems to be looking at it. Until BI comes along.
Then, when you start grouping by, sorting by, and summarizing by that missing, default, or incorrect data, problems show up. Sometimes they’re obvious; sometimes the numbers just don’t seem to make sense. In the worst cases, no one notices, but decisions are made based on the erroneous data.
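To make the grouping problem concrete, here is a minimal Python sketch. Everything in it is hypothetical: the order rows, the field names, and the idea that "00000" is the default ZIP code a system leaves behind. It counts how many rows carry a usable value before grouping, and it puts the suspect rows into an explicit "UNKNOWN" bucket instead of letting them pose as real ZIP codes.

```python
from collections import Counter

# Hypothetical order rows; the field names and the "00000" default are assumptions.
orders = [
    {"order_id": 1, "zip": "30301", "amount": 120.0},
    {"order_id": 2, "zip": "00000", "amount": 80.0},
    {"order_id": 3, "zip": None, "amount": 45.5},
    {"order_id": 4, "zip": "30301", "amount": 60.0},
    {"order_id": 5, "zip": "", "amount": 20.0},
]

# Values treated as missing or default (an assumption for this example).
SUSPECT_ZIPS = {None, "", "00000"}


def zip_quality(rows):
    """Count rows with a usable ZIP versus a missing or default one."""
    counts = Counter()
    for row in rows:
        counts["suspect" if row["zip"] in SUSPECT_ZIPS else "usable"] += 1
    return counts


def sales_by_zip(rows):
    """Sum order amounts per ZIP, routing suspect values to 'UNKNOWN'."""
    totals = {}
    for row in rows:
        key = "UNKNOWN" if row["zip"] in SUSPECT_ZIPS else row["zip"]
        totals[key] = totals.get(key, 0.0) + row["amount"]
    return totals


quality = zip_quality(orders)
totals = sales_by_zip(orders)
print(quality)
print(totals)
```

In this made-up sample, three of the five orders land in the "UNKNOWN" bucket. That is the kind of number worth showing business stakeholders before anyone trusts a sales-by-ZIP report.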
When you have the opportunity to design or implement a new system, spend some time thinking about the information the business stakeholders will want to get out of it, not just on correctly processing the transactions on a one-at-a-time basis.
2. Business rules and work processes are inconsistent, and have changed over time
Your business grows and changes. You enter new markets, and create new products. Maybe you start selling through distributors if you’ve always sold direct. Or vice versa. And then there is always “Special Markets”, where they do EVERYTHING differently from everyone else.
Is that transaction taxable or not? How are discounts calculated? What minimum data is required to set up a new account or a new product?
This column now has a drop-down validation list, but it used to be free-form text. Business users have appropriated a column that was intended for one purpose and used it for another. But only sometimes, and only on some rows.
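One way to see how much of a column predates its drop-down list is to compare every stored value against the current list. The sketch below is only an illustration: the allowed values and the sample column contents are invented, and the three buckets ("valid", "fixable", "review") are just one possible way to sort the results.

```python
# Hypothetical drop-down list and column contents, for illustration only.
ALLOWED_CHANNELS = {"Direct", "Distributor", "Special Markets"}

raw_values = [
    "Direct",
    "direct ",
    "Distributor",
    "dist.",
    "PO 4471",  # looks like the column was borrowed for a PO number
    "Special Markets",
    "",
]


def classify(value, allowed=ALLOWED_CHANNELS):
    """Return 'valid', 'fixable' (differs only by case), or 'review'."""
    cleaned = value.strip()
    if cleaned in allowed:
        return "valid"
    by_lowercase = {item.lower(): item for item in allowed}
    if cleaned.lower() in by_lowercase:
        return "fixable"
    return "review"


# Group the raw values by classification so each bucket can be inspected.
report = {}
for value in raw_values:
    report.setdefault(classify(value), []).append(value)

print(report)
```

The "review" bucket is where the business conversation starts: only the people who used the column can say what "PO 4471" was doing there.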
When examined individually, any of these conditions makes sense, and may be a rational response to changing circumstances and business needs. But taken together, they create a bit of chaos, rather than clear context surrounding each business transaction. And often, that chaos is first really visible to the business in a BI app. And it’s ugly.
3. Inconsistent definitions
How many definitions of margin are there in your company? How many changes have happened to sales territories over the last year or three? Do Marketing and Manufacturing legitimately need to group products differently? How many products get reclassified over time? So if I ask a simple business question like “What is gross margin by territory and product line over the last two years?” there is a LOT of room for interpretation:
- Is “gross margin” the same thing as “margin”?
- What is the actual calculation for “gross margin”? Is that what it’s always been? Is Accounting on board with that calculation?
- Which products should be included in each product line, since many may have moved over time? Do you mean the products that were in the product lines at the time of the sale transaction, or do you mean the way the products fit into the product lines today? Both may be legitimate business questions.
- Which customers should be included in each territory, since they, too, have moved over time? Similarly, do you mean the customers as they were at the time of the sale, or as they are today? Or something else?
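The difference between "as it was at the time of the sale" and "as it is today" can be sketched in a few lines of Python. The product codes, product line names, dates, and margin figures below are all made up, and the effective-dated history table is an assumed structure, not something every source system keeps.

```python
from datetime import date

# Hypothetical history: for each product, (start date, product line),
# sorted by start date. P100 moved from "Outdoor" to "Sport" in July 2023.
PRODUCT_LINE_HISTORY = {
    "P100": [(date(2022, 1, 1), "Outdoor"), (date(2023, 7, 1), "Sport")],
    "P200": [(date(2022, 1, 1), "Outdoor")],
}

# Hypothetical sales rows with a precomputed margin amount.
sales = [
    {"product": "P100", "sold_on": date(2023, 3, 15), "margin": 40.0},
    {"product": "P100", "sold_on": date(2023, 9, 1), "margin": 25.0},
    {"product": "P200", "sold_on": date(2023, 9, 1), "margin": 10.0},
]


def line_as_of(product, when):
    """Return the product line in effect for `product` on the date `when`."""
    current = None
    for start, line in PRODUCT_LINE_HISTORY[product]:
        if start <= when:
            current = line
    return current


def margin_by_line(rows, as_of=None):
    """Total margin per product line.

    With as_of=None each sale uses the line in effect on its sale date;
    otherwise every sale uses the line in effect on `as_of`.
    """
    totals = {}
    for row in rows:
        when = as_of or row["sold_on"]
        line = line_as_of(row["product"], when)
        totals[line] = totals.get(line, 0.0) + row["margin"]
    return totals


as_was = margin_by_line(sales)
as_is = margin_by_line(sales, as_of=date(2024, 1, 1))
print(as_was)  # {'Outdoor': 50.0, 'Sport': 25.0}
print(as_is)   # {'Sport': 65.0, 'Outdoor': 10.0}
```

Same sales, same margin, two different answers. Neither is wrong, which is why the question has to be settled with the business before the report is built. The same pattern would apply to customers and territories.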
It takes work
There are also plenty of other reasons data may be ugly – not immediately well suited to the task of BI. But in all cases, there is real work involved in figuring out the correct way the BI program needs to handle each of these. It is really a business problem, much more than a technical one.
Put your business analyst hat on, and go talk to business process owners, system users, analysts, etc. IT can guide the process, but they generally won’t have the perspective of the business stakeholders without going out and asking for it. And if these issues go unaddressed, they will undermine the usefulness and trustworthiness of the BI program, potentially to the point of failure. It’s naïve to think it will be OK to simply dump transactional data directly into a BI or data visualization tool and give it to business users. Unfortunately, there is no “easy button” for ugly data.