Sunday, October 31, 2010

Zion Business Intelligence

How about a hypothetical scenario -

You are an Enterprise Architect at a global investment bank, tasked with finding a solution to the data fragmentation problem.

And this is how it happened -
This particular bank has many data files flowing every which way between various groups in the organization. Product controllers take a feed from front office systems, market risk takes another feed, and credit analysts take a third, except the 3 feeds are generated differently, have slightly different points of view, and in some cases even cover different trade populations. Then someone comes along and says they need to see 3 numbers side by side: 1 from market risk, 1 from the credit analysts, and 1 from the product controllers. And you say that's impossible, because it will take months just to analyze whether the data populations are the same; and even if they were, the numbers are produced at different levels; and even if they weren't, there is no way to show them on the same report. And even if there were, this is a 6 month project, and who is going to pay for it? And then they say, "Aren't you an architect?", and you hesitantly say "yes", and then they say, "So go fix it."

Rabbit Hole
So, how to fix it? Core data primarily originates in front office systems and then flows through the rest of the organization. You can think of it as a stream of water branching off into many smaller sub-streams, which branch off further, until eventually the stream is too weak to branch off and just seeps into the ground.

Well, easy breezy, you say. Pervasive BI, data warehousing, Online Analytical Processing, bottom-up data warehousing, top-down, bus architecture, centralized architecture, federation, hub and spoke, relational, dimensional, operational stores, data marts, Inmon, Kimball, conformed dimensions, Boyce-Codd normal form, 3NF, ...

We just need to take this mix, shake it up, and we'll get ourselves a fancy enterprise data architecture, or perhaps something strong enough, maybe with a cherry, so we can forget the whole thing ever happened.

Blue pill or red pill
What to do.

Well, Kimball likes the bus architecture, so let's give that a go. The bus architecture is a bottom-up (or was it top-down?) approach where you basically start off with a bunch of data marts, each modeled as a star schema around a business process, which together make up the data warehouse.
The problem, of course, is that you effectively already have a ton of little data marts all over the place, and they don't conform. Ah, that's the problem: we need conformed dimensions. Right, and we do that how exactly? The other problem, of course, is that this is the same data being treated in slightly different ways, perhaps with a different trade population, attribute set, or granularity, or with a different temporal point of view. It seems awfully wrong to have a bunch of data marts storing the same data, which you then have to reconcile with one another.
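To make that reconciliation burden concrete, here is a minimal Python sketch comparing two feeds that are supposed to describe the same trades. It assumes each feed can be reduced to a dict mapping a trade identifier (the natural key) to a single value such as notional; the function name, feed names, and contents are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical feed shape: trade_id (natural key) -> notional amount.
Feed = Dict[str, float]


def reconcile(feed_a: Feed, feed_b: Feed,
              tolerance: float = 0.01) -> Tuple[Set[str], Set[str], Set[str]]:
    """Compare two feeds of what should be the same trade population.

    Returns (only_in_a, only_in_b, mismatched): trade ids missing from one
    side, plus ids present in both whose values differ by more than tolerance.
    """
    keys_a, keys_b = set(feed_a), set(feed_b)
    only_in_a = keys_a - keys_b          # population breaks on side A
    only_in_b = keys_b - keys_a          # population breaks on side B
    mismatched = {
        trade_id
        for trade_id in keys_a & keys_b  # trades both feeds agree exist
        if abs(feed_a[trade_id] - feed_b[trade_id]) > tolerance
    }
    return only_in_a, only_in_b, mismatched


if __name__ == "__main__":
    # Hypothetical extracts that disagree on population and on one value.
    market_risk = {"T1": 1_000_000.0, "T2": 250_000.0, "T3": 75_000.0}
    product_control = {"T1": 1_000_000.0, "T2": 245_000.0, "T4": 10_000.0}
    print(reconcile(market_risk, product_control))
    # ({'T3'}, {'T4'}, {'T2'})
```

Even this toy version only compares one attribute at one granularity. With n feeds of the same data there are n(n-1)/2 pairs to reconcile, before differences in granularity or temporal point of view are even considered.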

Right, you say, let's go the other way. Inmon likes top-down, so let's create a big data warehouse, which then populates the data marts. So, who exactly is going to build this monstrosity? Well, it can't be the individual business groups, because they are not stupid enough to take on a project like this, so it would have to be some central group away from any particular business line and close to senior management, 'cause this is going to cost a lot of money. So the group is created, except they don't know what they are doing: very technical guys who don't understand the business at all. And if by some miracle they do, they can't keep up with it. Even if this project actually succeeds, which is highly unlikely, and actually captures the right data, which is frankly impossible, it will still fail, because the business moves just too damn fast, and at the end of the day the central group will never be able to fully understand and own what it is storing. So, after some time and a whole lot of money, it will be dramatically killed off.

What about federation, you say? We just need a magical vendor, and all our problems will disappear. You see, this vendor will create an abstraction on top of our asylum, and this way we will present a clear, simple view that shields the end user from the underlying complexity. Easy, breezy. I suppose that could work if it were actually possible to build enough complexity into this thing to bridge something that is fundamentally diverging, while still performing at the required speed and remaining supportable. So let's just kill that idea for now, before we embarrass ourselves any further. Perhaps it works for some of you readers out there; if you happen to like this federation thing, check out Composite, Inc. It's all the rage these days.

Red pill
Reality it is, then. Well, it seems to me we need a new philosophy. Let's call it Zion. It rests on a few core principles:

1. Each piece of data has one and only one owner.
2. The owner is the only one that can change this piece of data.
3. Each piece of data has a natural key and a surrogate key.
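The three principles can be sketched as a small Python record type. This is a minimal illustration under stated assumptions: the class and field names (`Record`, `owner`, `natural_key`, `surrogate_key`), the rule that every update must name the requesting group, and the counter used to mint surrogate keys are all hypothetical, not part of any product or of the original design.

```python
import itertools
from dataclasses import dataclass, field
from typing import Any, Dict

# Hypothetical surrogate key generator: a simple in-process counter.
_surrogate_ids = itertools.count(1)


@dataclass
class Record:
    natural_key: str             # business identifier, e.g. a trade id (principle 3)
    owner: str                   # exactly one owning business group (principle 1)
    attributes: Dict[str, Any]
    # Meaningless system-assigned identifier (principle 3).
    surrogate_key: int = field(default_factory=lambda: next(_surrogate_ids))

    def update(self, requested_by: str, **changes: Any) -> None:
        """Apply changes only when requested by the owner (principle 2)."""
        if requested_by != self.owner:
            raise PermissionError(
                f"{requested_by} cannot change {self.natural_key}; "
                f"owner is {self.owner}"
            )
        self.attributes.update(changes)


if __name__ == "__main__":
    trade = Record("TRADE-42", owner="Front Office",
                   attributes={"notional": 1_000_000})
    trade.update("Front Office", notional=900_000)  # allowed: the owner
    try:
        trade.update("Market Risk", notional=0)     # rejected: not the owner
    except PermissionError as err:
        print(err)  # Market Risk cannot change TRADE-42; owner is Front Office
```

Storing the owner as a single field makes "one and only one owner" a structural property rather than a convention. The natural key identifies the data in business terms, while the surrogate key gives downstream consumers a stable identifier that does not change even if the business identifier does.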

What distinguishes this theory from all of the above is that it says you first need to understand what the data is before you start deciding how to deal with it. And the first question to answer is: who is responsible for it? Responsible does not mean which IT group owns the data warehouse; it means which business group is responsible for this data. If you need to change it, who best knows how to change it? How to evolve it? What it means?

Once you answer this question, you know who is going to build the data store for this type of data, and that store becomes the golden source for this data.

To be continued....