Wednesday, February 13, 2008

Business Intelligence

Imagine a world where the people who own the data actually have access to it. Sounds obvious, but think about it. Unless the business user is also the developer, this is never the case.

Your system users own the data that's in your system. Unfortunately for them, your system is also what's keeping them from their data. Every little thing they need has to go through you. The irony is that all you care about is serving the business user.

So, what are the parts of the problem? We have data that has some logical structure and some semantic definition of that structure. The business user implicitly understands the semantic definition and wants to exploit it to its full potential. A semantic definition defines relationships such as: a CUSIP is an attribute of a stock, quantity is an attribute of a trade, stocks are traded on exchanges, and so on. Now the user wants to aggregate quantity by CUSIP across all trades within an exchange. Fine, you say. You go off and come back a few days later with a beautiful brand new report. Great, the user says, now I want to see the average quantity traded. Well, off you go again, and so on.
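To make the request concrete, here is a minimal sketch in Python of what the user is really asking for: one grouping operation where the measure and the aggregate function are parameters, so "total" and "average" are the same question asked twice. The trade records, field names, and the `aggregate` helper are all made up for illustration; they are not taken from any particular product.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trades; field names and values are invented for illustration.
TRADES = [
    {"exchange": "NYSE", "cusip": "111111111", "quantity": 100},
    {"exchange": "NYSE", "cusip": "111111111", "quantity": 300},
    {"exchange": "NYSE", "cusip": "222222222", "quantity": 50},
    {"exchange": "NASDAQ", "cusip": "111111111", "quantity": 70},
]


def aggregate(trades, group_by, measure, fn):
    """Group trades by the given attributes and apply fn to the measure values."""
    groups = defaultdict(list)
    for trade in trades:
        key = tuple(trade[attr] for attr in group_by)
        groups[key].append(trade[measure])
    return {key: fn(values) for key, values in groups.items()}


# The first report: total quantity by CUSIP within each exchange.
total_by_cusip = aggregate(TRADES, ("exchange", "cusip"), "quantity", sum)

# The follow-up request: same grouping, different aggregate function.
average_by_cusip = aggregate(TRADES, ("exchange", "cusip"), "quantity", mean)
```

The point of the sketch is that the second request costs one changed argument rather than another few days of development, as long as the grouping attributes come from the semantic definition the user already understands.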

So, the data has some semantic definition. The same semantic definition exists in the user's head. The user exploits the structure. This is analytics. The user should be able to manipulate the data with the only constraint being the semantic definition. At the moment, this space is filled by cube technology on the data warehouse side, and Business Objects (BO) on the relational side. The only real difference between BO and cube technology is the size of the dataset. Cubes are pre-aggregated, while BO issues real-time SQL. It would be interesting to link cube technology to BO for drillthrough.

So, once you have the data, you pump it into a rich visualization component. But be careful not to couple the visualization to the data. Each piece of technology is independent, but has a well-defined interface for how to leverage it. The visualization component only needs to be able to receive data.

So, now we have our analytics and visualization. The next part is to take both pieces and generate a static report that can be presented to senior management. This report can be saved, automatically updated with new data, or archived daily, quarterly, and so on.

So, not too bad. But I also want to understand how good my data is. I want to understand the integrity of the data at the lowest level. I need to know the story of every data point. This is where rule engines come into play. The user defines the rules that validate integrity. The trick is to have the rule engine tell you how good or bad your data is at any aggregated level. The data isn't discarded, just measured.
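As a rough sketch of "measured, not discarded", the Python below scores every trade against a set of user-defined rules and rolls the pass rate up to whatever attribute you group by. The rules, the sample trades, and the `quality_by` function are hypothetical; a real rule engine would be far richer, but the shape of the idea is the same.

```python
from collections import defaultdict

# Hypothetical integrity rules: rule name -> predicate over a single trade.
RULES = {
    "quantity_positive": lambda t: t["quantity"] > 0,
    "cusip_has_9_chars": lambda t: len(t["cusip"]) == 9,
}

# Hypothetical trades, two of which break a rule.
TRADES = [
    {"exchange": "NYSE", "cusip": "111111111", "quantity": 100},
    {"exchange": "NYSE", "cusip": "22222", "quantity": 50},
    {"exchange": "NASDAQ", "cusip": "111111111", "quantity": -10},
]


def quality_by(trades, rules, group_by):
    """Return the fraction of rule checks passed per group.

    Every trade is scored; none is filtered out.
    """
    passed = defaultdict(int)
    checked = defaultdict(int)
    for trade in trades:
        key = trade[group_by]
        for rule in rules.values():
            checked[key] += 1
            passed[key] += bool(rule(trade))
    return {key: passed[key] / checked[key] for key in checked}


quality_by_exchange = quality_by(TRADES, RULES, "exchange")
quality_by_cusip = quality_by(TRADES, RULES, "cusip")
```

Because the score is computed per check and then grouped, the same rules give a quality number at the exchange level, the CUSIP level, or any other level the semantic definition allows.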

So far, the user has the data, can analyze it, visualize it, knows its quality, and can report on it. The next step is to manipulate it. Often, analytics takes on the flavor of what-if analysis. The user should be able to locally modify any data point, then analyze the impact, visualize it, report on it, and so on.
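A minimal sketch of a local what-if, again in Python: the user's changes live in an overlay keyed by trade id, and the analysis runs over a copy with the overlay applied, so the real data is never touched. The `id` field, the sample trades, and the `what_if` helper are assumptions made for the example.

```python
# Hypothetical trades; the "id" field is assumed so overrides can target a row.
TRADES = [
    {"id": 1, "cusip": "111111111", "quantity": 100},
    {"id": 2, "cusip": "111111111", "quantity": 300},
    {"id": 3, "cusip": "222222222", "quantity": 50},
]


def what_if(trades, overrides):
    """Return new trade records with overrides applied; the originals are untouched."""
    return [{**trade, **overrides.get(trade["id"], {})} for trade in trades]


def total_quantity(trades):
    """The 'analysis' step: here, simply the sum of quantity."""
    return sum(trade["quantity"] for trade in trades)


# What if trade 2 had been for 1000 instead of 300?
scenario = what_if(TRADES, {2: {"quantity": 1000}})

baseline_total = total_quantity(TRADES)
scenario_total = total_quantity(scenario)
impact = scenario_total - baseline_total
```

The same analysis, visualization, and reporting steps can then run against `scenario` exactly as they would against the real data.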

Well, are we done? Have we satisfied everything the user wants? No. No. No. Now that you have some analysis, you need to act on it. The data that you derived has attributes which, via rules, can be mapped to certain actions. One action can be to feed it into another system for further enrichment.
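One way to picture rules mapping derived data to actions is a list of (condition, action) pairs, sketched below in Python. Everything here is hypothetical: the derived rows, the thresholds, and the two actions, which just append to a list as a stand-in for calling another system.

```python
# Hypothetical derived rows, e.g. the output of an earlier aggregation.
DERIVED = [
    {"cusip": "111111111", "total_quantity": 400},
    {"cusip": "222222222", "total_quantity": 50},
]


def send_to_enrichment(row, outbox):
    """Stand-in for feeding a row to another system; here it only queues it."""
    outbox.append(("enrichment", row["cusip"]))


def flag_for_review(row, outbox):
    """Stand-in for raising the row with a person."""
    outbox.append(("review", row["cusip"]))


# Each rule pairs a condition on the derived attributes with an action.
ACTION_RULES = [
    (lambda r: r["total_quantity"] >= 100, send_to_enrichment),
    (lambda r: r["total_quantity"] < 100, flag_for_review),
]


def act(rows, action_rules):
    """Run every matching action for every row and return what was queued."""
    outbox = []
    for row in rows:
        for condition, action in action_rules:
            if condition(row):
                action(row, outbox)
    return outbox


OUTBOX = act(DERIVED, ACTION_RULES)
```

Keeping the conditions as data rather than burying them in code is what would let the business user, not the developer, decide what happens next.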

Are we done now? Damn it, no. Once you have all this, you can take the data to the next step. You can mine the data for patterns. The patterns can then feed back to calibrate the data integrity rules.

As the user analyzes the data, the system watches the user. The more analysis is done, the better the system can understand the user's intent. At this point, the system can start to infer what the user is trying to do. Now we are starting to take on the flavor of having the system solve equations and then act on the outcome.

Think about it, but think about it in the context of a massively large data repository and a Wall Street-type firm.

In the interest of buying a solution, I present a vendor list:
Microsoft Analysis Services
Business Objects Web Intelligence
Panorama
Microsoft SharePoint
Microsoft Excel 2007 (has a lot of cube technology)
Business Objects Dashboard + Reporting + etc...
ILOG JRules