Tuesday, October 02, 2007

Philosophy of Architecture

I have recently come face to face with two distinct philosophies of architecture.

The first philosophy holds the business in the highest esteem, to the detriment of the system. Every project is run as a tactical effort: management pushes the development team to deliver as soon as possible, even with a sub-optimal solution. This is what's commonly termed "getting it done". Under this philosophy, requirements tend to be spotty or nonexistent; in most cases, the requirements document is written after development has already completed. Development is incremental and the system follows an incremental evolution. The business receives the minimum of what it asked for, but with the impression of quick delivery. Unfortunately, incremental evolution causes development time to continuously increase, because code is only ever added and rarely removed. Removing code requires analysis and refactoring: time that is not factored into the project schedule. Adding code this way balloons the system and makes every future enhancement or change incrementally more difficult.

The second philosophy is more methodical in its approach. Development goes through established cycles: understanding what needs to be built, designing, reviewing, and finally building. This approach has a larger upfront cost before actual development begins, but it causes the system to move in revolutionary jumps rather than in continuous, ever-increasing steps. With revolutionary jumps, the system tends to get more compact as code is refactored and multiple functionalities are folded into a single framework.

Most shops follow the first philosophy. It is the more natural fit for organic growth: when the user tells you to do something, and they are paying your salary, you do it. With the second philosophy, you need the guts to tell the user: no, wait, let me first understand what you're trying to accomplish, then we'll review, and then I'll build. This is very difficult. For example, most, if not all, Wall Street firms follow the "getting it done" model. The "beauty" of the system is secondary; delivering the project comes above all else.

My argument is that beyond creating a simple report, no project should follow the "getting it done" philosophy. Every project needs a more methodical approach. Building the first thing that comes to mind is dangerous and stupid when working on an enterprise system. Every project needs proper analysis: what already exists, what should change, what the user wants, and what else they might want. Then draw up the architecture, review it, and only then build it.

Friday, September 14, 2007

Data Warehousing

I have recently been immersed in the world of BI, OLAP, XMLA, MDX, DW, UDM, Cube, ROLAP, MOLAP, HOLAP, star schema, snowflake, dimensions and facts.

A data warehouse is a special form of repository that sacrifices storage space for ease of retrieval. The data is stored in a special denormalized form whose diagram literally looks like a star, hence the star schema; if you change one attribute, an entire row is duplicated. The data is denormalized in a way that eases retrieval and reduces table joins. The data warehouse is nothing but a giant relational database whose schema design makes using plain SQL downright ugly.

On top of this repository sit one or more cubes that represent an aggregated view of the massive amounts of data. The cube comes in multiple forms: multi-dimensional online analytical processing (MOLAP), relational online analytical processing (ROLAP), and hybrid online analytical processing (HOLAP). A ROLAP cube is nothing but a special engine that converts user requests into SQL and passes them to the relational database; a MOLAP cube is pre-aggregated, giving the user fast retrieval without constantly hitting the underlying data store; and HOLAP is a hybrid of the two approaches. The point of cube technology is that it lets the user slice and dice massive amounts of data online without any developer involvement.

On top of the cube technology sit a set of user front ends, either web-based or desktop; one such company is Panorama. Each GUI tool communicates with the cube in a standard language called MDX, a multi-dimensional expression language, which is carried over XML by the XMLA protocol. Microsoft bought out Panorama's original OLAP technology and developed it further into what is today Microsoft Analysis Services 2005, a leading cube framework.
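To make the ROLAP/MOLAP distinction concrete, here is a toy sketch in plain Java. All table, key, and measure names are hypothetical, and real engines such as Analysis Services are vastly richer; the point is only the trade-off: ROLAP generates SQL per request, while MOLAP answers from a pre-aggregated store.

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of the ROLAP vs. MOLAP trade-off. All names are
// hypothetical; this is a sketch of the idea, not of any real engine.
public class CubeSketch {

    // ROLAP: translate a slice request into SQL against the star schema.
    static String rolapQuery(String measure, String dimension) {
        return "SELECT d." + dimension + ", SUM(f." + measure + ") " +
               "FROM fact_trades f JOIN dim_" + dimension + " d " +
               "ON f." + dimension + "_key = d." + dimension + "_key " +
               "GROUP BY d." + dimension;
    }

    // MOLAP: the same answer served from a pre-aggregated store.
    static final Map<String, Double> preAggregated = new HashMap<>();
    static {
        preAggregated.put("notional|region=US", 1_250_000.0);
        preAggregated.put("notional|region=EU", 980_000.0);
    }

    static Double molapQuery(String measure, String slice) {
        return preAggregated.get(measure + "|" + slice); // O(1), no SQL
    }

    public static void main(String[] args) {
        System.out.println(rolapQuery("notional", "region"));   // hits the RDBMS
        System.out.println(molapQuery("notional", "region=US")); // hits the cube
    }
}
```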

So to summarize:
UDB (Relational Database)
Microsoft Analysis Services 2005 (Cube)
Panorama (GUI)

Now, the price: for a full-blown BI (Business Intelligence) solution, you're easily looking at millions on storage alone, not to mention the license costs of the products. There are free solutions, at least on the GUI side; one good one is JPivot.

A data warehouse is a very powerful concept. It lets you analyze your data in real time. Business users use a friendly GUI to slice and dice their data, aggregate the numbers in different ways, generate reports, and so on. The concept lets you see ALL your data any way the user imagines, or at least along the dimensions defined on your cube. A dimension, by the way, is an attribute that it makes sense to slice by; date or type columns, for example, make good dimensions. A fact, on the other hand, is the business item you're aggregating; a trade, for example, would be considered a fact.
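A minimal sketch of the idea in plain Java (all names hypothetical): the trade is the fact carrying the measure, and slicing is nothing more than grouping that measure by whichever dimension you pick.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Minimal sketch of facts vs. dimensions (hypothetical names).
// A trade is the fact; trade date and product type are dimensions.
public class SliceAndDice {
    record Trade(String tradeDate, String productType, double notional) {}

    // "Slicing" is just summing the measure grouped by a chosen dimension.
    static Map<String, Double> sliceBy(List<Trade> facts,
                                       Function<Trade, String> dimension) {
        return facts.stream().collect(Collectors.groupingBy(
                dimension, Collectors.summingDouble(Trade::notional)));
    }

    public static void main(String[] args) {
        List<Trade> facts = List.of(
                new Trade("2007-09-13", "BOND", 100.0),
                new Trade("2007-09-13", "SWAP", 250.0),
                new Trade("2007-09-14", "BOND", 300.0));
        System.out.println(sliceBy(facts, Trade::tradeDate));   // by date
        System.out.println(sliceBy(facts, Trade::productType)); // by type
    }
}
```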

Once you have a data warehouse, the next logical extension is KPIs (Key Performance Indicators). Imagine looking at a dashboard with pretty greens, yellows, and reds telling you how much money you're making or losing at that moment. KPIs are special rules applied to the data at the lowest level. As you aggregate up, the colors change depending on how you're slicing the data. This lets you start at the very top, spot the region that isn't doing so well, and then drill down to the very desk that's losing money.
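A toy sketch of the KPI idea, with made-up thresholds: the same rule is applied at every level of aggregation, which is exactly why a region can show green while one of its desks shows red, and why drilling down pays off.

```java
// Toy KPI: thresholds are defined once, applied at the lowest level,
// and re-evaluated at whatever level you aggregate to. Numbers are
// hypothetical.
public class KpiSketch {
    enum Status { GREEN, YELLOW, RED }

    static Status pnlStatus(double pnl) {
        if (pnl >= 0) return Status.GREEN;       // making money
        if (pnl > -50_000) return Status.YELLOW; // small loss, watch it
        return Status.RED;                       // drill down to find the desk
    }

    public static void main(String[] args) {
        double deskA = 120_000, deskB = -80_000;
        System.out.println("Desk A: " + pnlStatus(deskA));         // GREEN
        System.out.println("Desk B: " + pnlStatus(deskB));         // RED
        System.out.println("Region: " + pnlStatus(deskA + deskB)); // GREEN
    }
}
```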

A further extension of data warehousing is data mining. This is an offshoot of AI and covers areas such as cluster detection, association rules, etc. There will be future posts covering this in more detail.

So, if you have a huge budget, I recommend you give this a try. Your company will thank you for it (later). And if you don't have a huge budget, figure out whether your problem fits the BI world, and ask for a huge budget. I've seen too many companies take the cheap route and end up with half-baked solutions that have no future.

Sunday, August 26, 2007

Rule Engines

Recently, there has been a proliferation of rule engines. A rule engine is a by-product of AI research. The basic premise is that a user creates a collection of atomic units of knowledge: rules. When the rule engine is presented with a state of the world, the matching rules all fire. After the firings have settled down, the new state of the world is the outcome. A lot of problems are easier to implement with a rule engine than with conventional programming; for example, a system that relies on heavy usage of knowledge with deep decision trees. Imagine if/elif logic nested many layers deep.
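A naive sketch of that premise in plain Java, not how a real engine is built: rules fire against a working memory of facts until nothing new is produced, and the settled memory is the new state of the world.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Naive sketch of the rule-engine premise (not RETE): atomic rules fire
// against a working memory of facts until a fixpoint is reached.
public class ForwardChaining {
    // A rule: if the premise fact is present, assert the conclusion fact.
    record Rule(String premise, String conclusion) {}

    static Set<String> run(List<Rule> rules, Set<String> facts) {
        Set<String> memory = new HashSet<>(facts);
        boolean changed = true;
        while (changed) {                 // loop until the firings settle down
            changed = false;
            for (Rule r : rules) {
                if (memory.contains(r.premise()) && memory.add(r.conclusion())) {
                    changed = true;       // a rule fired; re-scan everything
                }
            }
        }
        return memory;                    // the new state of the world
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
                new Rule("new trade", "needs booking"),
                new Rule("needs booking", "needs confirmation"));
        System.out.println(run(rules, Set.of("new trade")));
    }
}
```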

There are a couple of major contenders. In the corporate world, there are ILOG and FairIsaac. In open source, there are JBoss Rules and Jess. Jess is the original Java rule engine and the closest to NASA's original CLIPS system, the system that started rule engines. Personally, I am most familiar with JBoss Rules and ILOG, and to a much lesser degree with Jess. This should not be taken as a diss on FairIsaac or any other rule engine.

Each rule engine, at its core, is based on the RETE algorithm. There are many variations and enhancements, but every rule engine implements the core algorithm. The algorithm finds the rules that need to be executed for a given world state. With thousands of rules, a good matching algorithm becomes critical to a useful rule engine. RETE plays the role in a rule engine that control flow plays in a regular language.
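A toy illustration of the core idea only, not the actual RETE algorithm: each condition keeps a memory of the facts that matched it, so asserting one new fact means testing that fact alone rather than re-scanning the whole working memory against every rule.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Toy illustration of the key RETE insight: per-condition memories are
// cached, so an assertion only propagates the delta. Real RETE also
// shares condition nodes between rules and joins partial matches.
public class AlphaMemorySketch {
    private final Map<String, Predicate<Integer>> conditions = new HashMap<>();
    private final Map<String, Set<Integer>> memories = new HashMap<>();

    void addCondition(String name, Predicate<Integer> test) {
        conditions.put(name, test);
        memories.put(name, new HashSet<>());
    }

    // On assert, only the new fact is evaluated; prior matches stay cached.
    void assertFact(Integer fact) {
        conditions.forEach((name, test) -> {
            if (test.test(fact)) memories.get(name).add(fact);
        });
    }

    public static void main(String[] args) {
        AlphaMemorySketch net = new AlphaMemorySketch();
        net.addCondition("large", n -> n > 1000);
        net.addCondition("even", n -> n % 2 == 0);
        List.of(5, 2000, 42, 3001).forEach(net::assertFact);
        System.out.println(net.memories); // e.g. {large=[2000, 3001], even=[2000, 42]}
    }
}
```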

The major obstacle to wide adoption of rule engines is their dynamic nature and unpredictability. Once you define a thousand rules, it becomes difficult to know how they will interact in every situation. This means testing and scenario generation are critical, which in turn demands a much more mature infrastructure and process than most organizations have. But the advantages are huge. You can explain to your user exactly how a given outcome was reached. You can display the rules, modify the rules, add rules, all dynamically. You can even simplify the rule model to the point where your users create their own rules.

The next obstacle is the rule language itself. The language has many requirements. Some people want it to have a natural-language feel; others want clean interaction with the existing Java system; still others seek a middle ground in a scripting language. ILOG does this very well, with a natural-language translation tool. JBoss Rules has a more rudimentary natural-language translation (DRL to DSL) but supports a wider range of languages.

I find JBoss Rules easier to get started with, but a large, mature organization should probably look at a vendor product for the scenario generation and rule-management infrastructure, something JBoss doesn't quite have yet. The vendors also have much more mature rule-editing GUIs.

Saturday, July 07, 2007

Supply of Money

I know this should be a technology-oriented blog, but I am starting to be afraid, because I don't understand what is happening.

Money is intrinsically worthless:

"Paper money eventually returns to its intrinsic value - zero." ~ Voltaire - 1729

Our economy is one of exponentially increasing debt. All money (dollars) is loaned at interest from the Fed, which creates the money by printing it at basically zero cost. This means that to pay interest you need to borrow more money (take out a loan), thereby creating more money. Notice the exponential function in all of this. The US economy basically no longer produces anything and imports everything necessary for basic survival. To import requires purchasing; to purchase requires money; money must be borrowed; borrowing requires paying interest. And how does the government borrow? It borrows from the Fed, which prints more money.
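A back-of-the-envelope sketch of that exponential, under the simplifying assumption (mine, not stated above) that every interest payment is itself financed by new borrowing at a constant rate r:

```latex
% If the interest r M_t owed each period must itself be borrowed,
% the money stock M_t obeys:
M_{t+1} = M_t + r\,M_t = (1+r)\,M_t
\quad\Longrightarrow\quad
M_t = M_0\,(1+r)^t
```

Exponential in t: under that assumption there is no steady state where the books balance.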

The interesting thing is the bond market, which acts as a money sponge. A US treasury bond pays a certain yield, and Japan has historically bought billions and billions of US treasuries, to the tune of 16% of all US treasury bonds. This is interesting: Japan buys a $100 bond paying a 4% yield. That means Japan hands over $100 to the US government in exchange for the 4% yield. In essence, $100 disappears from circulation and is replaced by a continuous stream of $4 payments. Now, that $4 has to come from somewhere: it's borrowed from the Fed. This is an ever-increasing cycle, growing exponentially fast. Whatever money exists in circulation was borrowed at interest. I think all this means that money can never be destroyed; it can only ever exponentially increase.

What happens on the way back? What happens if the money were to be repaid to the Fed? The dollars would need to traverse the entire route back. I don't understand how that's possible, but if it were to happen, money would return to its intrinsic value of zero.

A little confusing? Right now, Japan's interest rate is extremely low, around 1 percent, while the US and the rest of the Western world are at 4 to 5 percent, and the yen is trading at about 125 to the dollar. This means you can get cheap money in Tokyo, convert it into dollars, buy US bonds, and earn the spread between the two rates without doing anything; the arithmetic is sketched below. You can also leverage the position by taking on more risk: instead of buying the yen outright, you commit to buying it later while using money you don't yet own. In essence, you've just created even more money supply. One day you will need to reverse the position by actually buying the yen you promised to buy; that will cause the supply of yen to drop, the demand to skyrocket, and the price to act accordingly. The US dollar is going to continue to drop, or, in other words, prices in dollars will go up. The currency must continue to weaken, as it will take more and more dollars to service the exponentially increasing debt.
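The carry-trade arithmetic, back of the envelope, using the rates quoted above and ignoring the one thing that actually matters, a move in the exchange rate:

```latex
% Borrow in yen at ~1%, buy US treasuries yielding ~4.5%:
\text{unleveraged spread} \approx 4.5\% - 1\% = 3.5\% \text{ per year}
% With leverage L applied to capital C, the profit scales to roughly:
\text{profit} \approx (4.5\% - 1\%) \times L \times C
```

The trade works only while the yen stays cheap; a sharp yen rally forces the reversal described above and wipes out years of carry.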

China and India will undoubtedly delay the inevitable, but the world economy must and will collapse. An exponential function cannot last indefinitely. This is the conclusion I am drawing, but I must admit I don't understand all the factors. All I know is that I am becoming increasingly uneasy.



Sunday, May 27, 2007

Black Swan

I am becoming obsessed with randomness and probability. What follows is based very heavily on Nassim Nicholas Taleb's research. Imagine a turkey on a farm. Every day the turkey has ever known, the farmer comes in the morning and feeds it. From the turkey's point of view, the farmer is a friend, a trusted being. Then, one morning, the farmer kills the turkey. From the point of view of the turkey, a black swan has occurred: a completely unexpected event.

Take our stock market, heck, take the entire global market: companies and global economies have created multiple layers to guard against risk. Options trading, derivatives, options on derivatives, credit default swaps, and so on, and on, and on. Each product is designed to allow some risk, some profit, and some safety. Some products, such as derivatives, have two sides, allowing a company to sell its risk to others. Risk, actually, is an interesting side of the coin. Companies keep large staffs of risk professionals, calculating and guarding said corporations from risk. Recently, companies started to realize that risk comes in many forms, and a new area was born: "operational risk". This is the risk that an employee goes crazy and shoots everyone. So, you would argue that all this guards these companies from risk. Nassim Taleb, and I with him, believe that it actually enhances risk. All this calculating simply creates an impression of safety. Like the turkey, we go day in and day out believing we are safe, until one day the farmer kills the turkey.

The basic problem is that we can't understand the future. In fact, we can't understand that we can't understand the future. We keep believing in things, looking for correlations, for patterns in randomness. We find them; in fact, we tend to create patterns out of randomness. Are the markets random? I would argue no. In fact, I would argue that the markets are becoming very much un-random. The markets are starting to be governed by machines following very concrete rules. There are also very few players with enough weight to move markets, and a lot of those players are using machines. All of this is very scary.

Another interesting example is China. An unprecedented number of ordinary people are investing heavily in the market. And the market is going up and up and up. But, like everything else in life, it will come down, and boy will it come down hard. And there will be ripples through the global markets and global economies. But this isn't the black swan I am afraid of. I am afraid of something more, something we don't know is going to happen.

Global Development

It is all the rage these days to do global development: one "system", one global implementation. The idea is economies of scale: any region can perform development, allowing the other regions to reap the rewards. There are different ways for a single system to achieve global development.

1. The system is developed by one region. All global requirements are funneled to that region. The actual system may be run centrally or locally within the regions.

2. Each region has a separate system which, based on an agreed protocol, feeds a shared central system.

Ah, but there is another way. You may be able to have a single system and yet global, parallel development. You can split the system into areas of concern and assign different parts to different regions. Unfortunately, at one point or another, the areas will overlap. This brings up an interesting scenario: a single system, many teams split across different time zones, answering to different management, with different requirements, different users, different schedules, etc. Quite a mess. And yet each region is actually working toward a common goal. The system is a specific system serving a specific purpose, just with different masters.

The trick is to split the system into a common framework and a regional implementation. The regions use the same system; a core of the system is indeed universal, but there is also an aspect of it that is very much unique to each region. Understand the problem the system is solving, then understand the fundamental aspects of the system, the raw materials, if you will. This is the common framework. Each region may modify the framework, but in doing so it is enhancing the breadth of the system.

Imagine a graph: links and nodes going every which way, with dark, unexplored areas. These dark areas represent parts of the system developed by other regions but not yet used locally. When a given region matures to that functionality, it will be there waiting. The unexplored areas of the graph become used, and therefore visible. This seems a very interesting way to create a global enterprise architecture: model the system as a graph, allow each region to build out the graph, but in such a way that other regions use only what they need. Then allow the graph to be customized to each region's needs. Done correctly, the system becomes a set of loosely shared modules with concrete implementations in each region. The regions decide how the modules are used and how they link, though some linkages are predefined. Regions may enhance existing modules, build new ones, or create region-specific extensions to existing ones.
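A minimal sketch of the module-graph idea, with hypothetical module and region names: regions contribute modules to a shared graph and link only to the ones they use, so modules built elsewhere stay "dark" for a region until it needs them.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the module graph: regions contribute modules
// (nodes) and declare the links they actually use (edges). A module
// built by one region stays "dark" to another until it links to it.
public class ModuleGraph {
    private final Set<String> modules = new HashSet<>();
    private final Map<String, Set<String>> regionLinks = new HashMap<>();

    void contribute(String module) { modules.add(module); }

    void link(String region, String module) {
        if (!modules.contains(module))
            throw new IllegalArgumentException("unknown module: " + module);
        regionLinks.computeIfAbsent(region, r -> new HashSet<>()).add(module);
    }

    // What a region "sees": only the part of the graph it has linked to.
    Set<String> visibleTo(String region) {
        return regionLinks.getOrDefault(region, Set.of());
    }

    public static void main(String[] args) {
        ModuleGraph g = new ModuleGraph();
        g.contribute("trade-capture");   // built by New York
        g.contribute("fx-settlement");   // built by Tokyo
        g.link("NY", "trade-capture");
        g.link("Tokyo", "trade-capture");
        g.link("Tokyo", "fx-settlement");
        System.out.println(g.visibleTo("NY"));    // fx-settlement still "dark"
        System.out.println(g.visibleTo("Tokyo")); // both visible
    }
}
```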

Sunday, April 22, 2007

Equilibrium

I had a chat with a rabbi the other day. He told me a story from his life. When he was a young man, he had trouble sleeping; he would sleep at most 4 hours a night. He was worried that he had a sleeping disorder, so he found a top doctor on sleeping disorders. The doctor had him track the number of hours he slept every night for a month. At the end, the doctor determined that the rabbi slept an average of 4 hours every night. Sometimes 4:15, other times 3:50, but on average 4 hours. What the doctor told the rabbi was that he was one of the lucky ones: most people are in the middle and require 7 hours of sleep, while the rabbi was an extreme exception on the far side of the curve, requiring only 4. He was lucky because he has 3 more hours a day than everyone else.

This story is interesting because, in this day and age, in this country, the rabbi would be put on sleeping medication. I am pretty sure the number of hours people sleep fits a bell curve. Most people are in the middle, sleeping somewhere between 6 and 8 hours, but the tails of the curve extend in both directions: some require more, like 9 or 10 hours, while others require less, like 4 or 5. Yet the established medical principle of the day is to fit everyone into the middle, with no tails.

I see this in everything. For example, the medical community preaches that cholesterol should be below 200. What makes 200 a magic number that applies to the entire population regardless of background? I would imagine that cholesterol, like everything else, follows a bell curve. Most people's normal average is 200, but the tails of the curve go out in both directions: some have a high average cholesterol that is normal for their bodies, while others have a low average. It is very troubling that most things are being applied indiscriminately. We, as a society, are losing the equilibrium in favor of the standard.

Monday, February 05, 2007

I usually stay away from such posts, but I can't resist. Check out these two sites:

http://www.zoho.com/
http://services.alphaworks.ibm.com/ManyEyes/

Saturday, February 03, 2007

A priori

A priori is a term that describes a sequence of events in time. More specifically, development is a sequence of steps, of events, that produce a desired result. The question that bothers me is why it takes so freaking long.

A colleague of mine was recently complaining that his users were upset at how long it takes his team to develop seemingly simple functionality. Why does it take weeks to read some data, apply some business rules, send some messages, and produce a report?

The world of business tools can be thought of as a giant, ever-growing graveyard. Business tools are continuously and artificially given life. Like little Frankensteins, they roam the earth, used and abused by both users and developers, growing up, until they are killed off and replaced with younger Frankensteins doomed to the same fate.

Excel is the only tool that comes to mind that has escaped this fate. It allows the business user to solve his own problems, unthinkable to a crack-smoking code monkey. The user can load data, build his models, produce reports, and export them out. The power is in the user's hands. On the other side, the developer attempts to give the user exactly what the user asked for and nothing, and I mean nothing, else. In fact, the majority of the time the developer understands neither the business user, nor the business, nor the problem being solved.

I think the industry is starting to realize this and is attempting to shift the power back to the business user. For example, specs like BPEL and the hype surrounding web services are all meant to give more power to the business user and reduce the turn-around time of development. I believe software will become less like software and more like Legos. Individual pieces will still need to be built, but it is the business user who will put the Legos together to produce a result. Things like forms, business rules, reports, data loading, and data extraction will go away as development tasks. Instead, time will be spent producing richer widgets that do more sophisticated things. Honestly, how many developers does it take to build a relatively large system that does a whole lot of variations of the five things mentioned above? 1, 2, 5, 7, 10, 40? How big is your team?

Friday, January 26, 2007

I've been away for a long time. For that, I am sorry. But, now I am back.

What's been on my mind lately is whether it's possible to encode a business intention in an intermediary language and then build an interpreter to read this language. One system would encode an intention; a second system would evaluate it. Interesting, no? Perhaps all this means is that system A sends a message to system B, and system B reads the message and, based on hard-coded business rules, performs the work. But let's say there are no hard-coded business rules. Let's say the message is the rules and the data. Would that be possible? What would this language look like? It would need to contain meta-data that could be evaluated and mapped to business rules.

Let's step back a little. What's the point of this? System B is a specific system that does a specific thing. It should know what to do with the message without needing system A to tell it. A new trade message arrives; your system receives the trade. It knows it's a new trade, because the message says so. What is the action? Book the trade. So your system dynamically looks up all the supported actions and passes the data set to that rule set. Now, some of you are thinking: great, all this and he describes a bloody factory pattern. But wait, forget messages. It's an event. Something, somehow, raises an event that says there is a new action with a given payload. A controller accepts the event and routes it to the appropriate implementation for that event, or perhaps a set of implementations, or better yet triggers a workflow.

Now we're getting somewhere. The event name maps to a business intention, which is specified as a workflow. But a workflow is a generic concept; it's not real unless there is code behind it. So we build a bunch of modularized code that performs specific functions, wire it together with dependency injection, and have a dynamic workflow define the execution path.
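A minimal sketch of where that train of thought ends up (all names hypothetical): an event name maps to a workflow, and the workflow is just an ordered list of modular steps wired together up front, the sort of wiring a dependency-injection container would do.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch of the idea above: an event name maps to a
// workflow, and a workflow is an ordered list of modular steps wired
// together at configuration time.
public class IntentRouter {
    private final Map<String, List<Consumer<Map<String, Object>>>> workflows
            = new HashMap<>();

    // Wiring: bind a business intention (event name) to a step sequence.
    void define(String intention, List<Consumer<Map<String, Object>>> steps) {
        workflows.put(intention, steps);
    }

    // Routing: the event carries the intention; the controller looks up
    // the workflow and runs the payload through each step in order.
    void raise(String intention, Map<String, Object> payload) {
        List<Consumer<Map<String, Object>>> steps = workflows.get(intention);
        if (steps == null) throw new IllegalStateException(
                "no workflow for intention: " + intention);
        steps.forEach(step -> step.accept(payload));
    }

    public static void main(String[] args) {
        IntentRouter router = new IntentRouter();
        router.define("new-trade", List.of(
                p -> System.out.println("validate " + p),  // modular steps
                p -> System.out.println("book " + p),
                p -> System.out.println("confirm " + p)));
        router.raise("new-trade", Map.of("id", 42, "product", "BOND"));
    }
}
```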