Tuesday, October 02, 2007

Philosophy of Architecture

I have recently come face to face with two distinct philosophies of architecture.

The first philosophy holds the business in the highest esteem, to the detriment of the system. All projects are done as tactical ones. This means that management pushes the development team to deliver as soon as possible, with a sub-optimal solution. This is what's commonly termed "getting it done". With this philosophy, requirements tend to be spotty or nonexistent. In most cases, the requirements document is created after development has already completed. Development is incremental, and the system follows an incremental evolution. The business receives the minimum of what was asked for, but with the impression of quick delivery. Unfortunately, incremental evolution causes development time to continuously increase. The increase happens because code is only added and rarely removed. Removing code requires analysis and refactoring: time which is not factored into the project schedule. Adding code in this way balloons the system and makes every future enhancement or change incrementally more difficult.

The second philosophy is more methodical in its approach. Here, development goes through established cycles: understanding what needs to be built, designing, reviewing, and finally building. This approach has a higher upfront cost before actual development begins, but it causes the system to move in revolutionary jumps rather than in continuous, ever-increasing steps. With revolutionary jumps, the system tends to get more compact as code gets refactored and multiple functionalities are folded into a single framework.

Most shops follow the first philosophy. It is the more natural fit for organic growth. When the user tells you to do something, and they are paying your salary, you do it. With the second philosophy, you need the guts to tell the user: no, wait, let me first understand what you're trying to accomplish, then we'll review, and then I'll build. This is very difficult. For example, most, if not all, Wall Street firms follow the "getting it done" model. The "beauty" of the system is secondary; delivering the project is primary above all else.

My argument is that beyond creating a simple report, no project should follow the "getting it done" philosophy. Every project needs a more methodical approach. Building the first thing that comes to mind is dangerous and stupid when working with an enterprise system. Every project needs proper analysis: what already exists, what should change, what the user wants, and what else they might want. Then draw up the architecture, review it, and only then build it.

Friday, September 14, 2007

Data Warehousing

I have recently been immersed in the world of BI, OLAP, XMLA, MDX, DW, UDM, Cube, ROLAP, MOLAP, HOLAP, star schema, snowflake, dimensions and facts.

A data warehouse is a special form of repository that sacrifices storage space for ease of retrieval. The data is stored in a special denormalized form whose diagram literally looks like a star, hence "star schema". If you change one attribute, an entire row is duplicated. The data is denormalized in a way that eases retrieval and reduces table joins. The data warehouse is nothing but a giant relational database whose schema design makes using plain SQL downright ugly. On top of this repository lie one or more cubes that represent an aggregated view of the massive amounts of data. The cube comes in multiple forms: multi-dimensional online analytical processing (MOLAP), relational online analytical processing (ROLAP), and hybrid online analytical processing (HOLAP). A ROLAP cube is nothing but a special engine that converts user requests into SQL and passes them to the relational database; a MOLAP cube is pre-aggregated, allowing the user fast retrieval without constantly querying the underlying data store; and a HOLAP cube is a hybrid of the two approaches. The point of cube technology is that it lets the user slice and dice massive amounts of data online without any developer involvement. On top of the cube technology sit a set of user front ends, either web-based or desktop; one such company is Panorama. Each GUI tool communicates with the cube in a standard language called MDX, a multi-dimensional expression language, which also has an XML-based protocol form called XMLA. The underlying cube technology was originally developed by the GUI company Panorama; Microsoft bought out their original tool and further developed it into what is today called Microsoft Analysis Services 2005, a leading cube framework.
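To make the ROLAP idea concrete, here is a minimal sketch of translating a slice request into star-schema SQL. The table and column names (fact_trades, dim_date, and so on) and the join convention are invented for illustration; a real ROLAP engine reads all of this from cube metadata.

```python
def rolap_sql(fact, measure, dims, filters=None):
    """Build the SQL a ROLAP engine might emit for a slice request.

    dims is a list of (dimension_table, column) pairs. We assume every
    dimension table joins to the fact table via "<table>_key" -- an
    illustrative convention, not a universal one.
    """
    cols = [f"{table}.{col}" for table, col in dims]
    sql = f"SELECT {', '.join(cols)}, SUM({fact}.{measure}) AS total FROM {fact}"
    for table, _ in dims:
        sql += f" JOIN {table} ON {fact}.{table}_key = {table}.id"
    if filters:
        sql += " WHERE " + " AND ".join(f"{k} = '{v}'" for k, v in filters.items())
    return sql + " GROUP BY " + ", ".join(cols)

# Slice notional by year and desk, filtered to one region.
query = rolap_sql("fact_trades", "notional",
                  [("dim_date", "year"), ("dim_desk", "name")],
                  {"dim_desk.region": "NY"})
```

The ugly part the author mentions is visible even in this toy: every extra dimension means another join, which is exactly the work the star schema is shaped to keep cheap.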

So to summarize:
UDB (Relational Database)
Microsoft Analysis Services 2005 (Cube)
Panorama (GUI)

Now, the price: for a full-blown BI (Business Intelligence) solution, you're easily looking at millions on storage alone, not to mention the license costs of the products. There are free solutions, at least on the GUI side; one good one is jpivot.

A data warehouse is a very powerful concept. It allows you to literally analyze your data in real time. Business users use a friendly GUI to slice and dice their data, aggregate the numbers in different ways, generate reports, etc... The concept lets you see ALL your data in any way the user imagines, or at least along the dimensions defined on your cube. A dimension, by the way, is an attribute that it makes sense to slice by; date or type columns are good dimensions, for example. A fact, on the other hand, is the business item you're aggregating; a trade, for example, would be a fact.
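Slicing facts by a dimension is, at bottom, just a group-by aggregation. A toy sketch, with the trade rows and column names invented for illustration:

```python
from collections import defaultdict

# A toy fact table: each row is one trade (the fact), tagged with
# dimension attributes (date and desk).
trades = [
    {"date": "2007-09-01", "desk": "FX",    "notional": 100},
    {"date": "2007-09-01", "desk": "Rates", "notional": 250},
    {"date": "2007-09-02", "desk": "FX",    "notional": 50},
]

def slice_by(facts, dimension, measure):
    """Aggregate a measure along one dimension: the essence of slicing."""
    totals = defaultdict(int)
    for row in facts:
        totals[row[dimension]] += row[measure]
    return dict(totals)
```

Slicing these facts by desk gives {"FX": 150, "Rates": 250}; slicing by date gives per-day totals instead, with no developer involved in either case.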

Once you have a data warehouse, the next logical extension is KPIs (Key Performance Indicators). Imagine looking at a dashboard with pretty green, yellow, and red colors telling you how much money you're making or losing at that moment. KPIs are special rules applied to the data at the lowest level. As you aggregate up, the colors change depending on how you're slicing the data. This allows you to start at the very top, spot the region that isn't doing so well, and then drill down to the very desk that's losing money.
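The KPI idea can be sketched as a simple threshold rule. The dollar thresholds below are invented for illustration; in a real cube the rule is evaluated at every aggregation level as you drill down.

```python
def kpi_color(pnl, red_below=0, green_above=100_000):
    """Map an aggregated P&L figure to a traffic-light color."""
    if pnl < red_below:
        return "red"
    if pnl >= green_above:
        return "green"
    return "yellow"

# The same rule at two levels of aggregation: a region can be green
# overall while one of its desks is bleeding red underneath.
region = {"desk A": 150_000, "desk B": -20_000}
region_color = kpi_color(sum(region.values()))
desk_colors = {desk: kpi_color(pnl) for desk, pnl in region.items()}
```

This is exactly the drill-down story above: green at the top, and a red desk waiting one level down.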

A further extension of data warehousing is data mining. This is an offshoot of AI and covers areas such as cluster detection, association rules, etc... There will be further posts covering this in more detail.

So, if you have a huge budget, I recommend you give this a try. Your company will thank you for it (later). And if you don't have a huge budget, figure out whether your problem fits in the BI world, and then ask for a huge budget. I've seen too many companies take the cheap route and end up with half-baked solutions that have no future.

Sunday, August 26, 2007

Rule Engines

Recently, there has been a proliferation of rule engines. A rule engine is a by-product of AI research. The basic premise is that a user creates a bunch of atomic units of knowledge: rules. When the rule engine is presented with a state of the world, the applicable rules all fire. After the firings have settled down, the new state of the world is the outcome. A lot of problems are easier to implement with rule engines than with conventional programming: for example, a system that relies heavily on knowledge with deep decision trees. Imagine if/else logic nested many layers deep.
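The fire-until-settled premise can be sketched as a naive forward-chaining loop. The rules here are toy examples, and a real engine uses the RETE algorithm rather than re-scanning every rule on each pass:

```python
def run_rules(facts, rules):
    """Fire every applicable rule until the set of facts stops changing.

    facts is a set of strings; each rule is a (condition, conclusions)
    pair. The loop's fixpoint is the 'settled down' state of the world.
    """
    changed = True
    while changed:
        changed = False
        for condition, conclusions in rules:
            if condition(facts) and not conclusions <= facts:
                facts |= conclusions
                changed = True
    return facts

# Two toy units of knowledge that chain off each other.
rules = [
    (lambda f: "rain" in f,       {"wet_ground"}),
    (lambda f: "wet_ground" in f, {"slippery"}),
]
```

Starting from the single fact "rain", the engine infers "wet_ground" and then "slippery" without anyone writing the nested if/else by hand; the chain of firings is the knowledge.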

There are a couple of major contenders. For the corporate world, there are ILOG and FairIsaac. On the open-source side, there are JBoss Rules and Jess. Jess is the original Java rule engine and the closest to NASA's original CLIPS system, the system from which rule engines sprang. Personally, I am most familiar with JBoss Rules and ILOG, and to a much lesser degree with Jess. This should not be taken as a diss on FairIsaac or any other rule engine.

Each rule engine, at its core, is based on the RETE algorithm. There are many variations and enhancements, but every rule engine implements the core algorithm. The algorithm finds the rules that need to be executed for a given world state. Imagine thousands of rules, and a good matching algorithm becomes critical to a useful rule engine. The RETE algorithm plays the role that control flow plays in a regular language.

The major obstacle to wide adoption of rule engines is their dynamic nature and unpredictability. If you define a thousand rules, it becomes difficult to know how they will interact in every situation. This means testing and scenario generation are critical, which in turn demands a much more mature infrastructure and process than most organizations have. The advantages, though, are huge. You can explain to your user exactly how a given outcome was reached. You can display the rules, modify the rules, add rules, all dynamically. You can even simplify the rule model so that your users can create their own rules.

The next obstacle is the rule language itself. The language has many requirements. Some people want it to have a natural-language feel; others want clean interaction with an existing Java system; still others seek a middle ground with a scripting language. ILOG does this very well, with a natural-language translation tool. JBoss Rules has a more rudimentary natural-language translation (DRL to DSL) but supports a wider range of languages.

I find JBoss Rules easier to get started with, but a large, mature organization should probably look at a vendor product for the scenario generation and rule-management infrastructure, something JBoss doesn't quite have yet. The vendors also have much more mature rule-editing GUIs.

Saturday, July 07, 2007

Supply of Money

I know this should be a technology-oriented blog, but I am starting to get scared, because I don't understand what is happening.

Money is intrinsically worthless:

"Paper money eventually returns to its intrinsic value - zero." ~ Voltaire - 1729

Our economy is one of exponentially increasing debt. All money (dollars) is loaned at interest from the Fed. The Fed creates money by printing it at essentially zero cost. This means that to pay interest you need to borrow more money (get a loan), thereby creating more money. Notice the exponential function in all of this. The US economy basically no longer produces anything, and imports everything necessary for basic survival. To import requires purchasing; to purchase requires money; money needs to be borrowed; borrowing requires paying interest. How does the government borrow? It borrows from the Fed, which prints more money.
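The compounding described above can be sketched numerically. This is a deliberately crude illustration of "interest can only be borrowed into existence", not a model of actual Fed mechanics, and the numbers are invented:

```python
def money_supply(initial, rate, years):
    """If every year's interest must itself be borrowed into existence,
    the supply compounds: supply_n = initial * (1 + rate) ** n."""
    supply = initial
    for _ in range(years):
        supply += supply * rate   # new money borrowed to pay the interest
    return supply
```

At 5 percent, $100 of debt-money becomes roughly $265 in 20 years and over $1,100 in 50: the exponential function at work.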

The interesting thing is the bond market, which acts as a money sponge. A US treasury bond pays a certain yield. Japan has historically bought billions and billions of US treasuries, to the tune of 16% of all US treasury bonds. This is interesting: Japan buys a $100 bond paying a 4% yield. That means Japan hands over $100 to the US government in exchange for the yield. In essence, $100 disappears from circulation and is replaced by a continuous stream of $4 payments. Now, that $4 has to come from somewhere: it's borrowed from the Fed. This is an ever-expanding cycle, growing exponentially fast. Whatever money exists in circulation was borrowed at interest. I think all this means that money can never be destroyed. It can only ever exponentially increase.

What happens on the way back? What happens if the money were to be repaid to the Fed? The dollars would need to traverse the entire route back. I don't understand how that's possible, but if it were to happen, money would return to its intrinsic value of zero.

A little confusing? Right now, Japan's interest rate is extremely low, and the yen is trading at about 125 to the dollar. Japan's rate is around 1 percent, while the US and the rest of the Western world are at 4 to 5 percent. This means you can borrow cheap money in Tokyo, convert it into dollars, buy US bonds yielding 4.5 percent, and pocket the roughly 3.5 percent spread without doing anything. But you can also leverage your position by taking on more risk: you don't buy the yen now, but promise to buy it later, while simultaneously using what you don't yet own. In essence, you've just created even more money supply. One day, you will need to reverse your position by actually buying the yen you promised to buy. This will cause the supply of yen to drop, demand to skyrocket, and the price to act accordingly. The US dollar is going to continue to drop; in other words, the yen will go up. The dollar must continue to weaken, as it will take more dollars to service the exponentially increasing debt.
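The carry trade above reduces to simple arithmetic. A sketch, with the rates and notional invented; it deliberately ignores the currency move, which is exactly the risk the trade carries:

```python
def carry_profit(notional, borrow_rate, invest_rate, leverage=1):
    """Annual profit of borrowing cheap yen and buying US bonds.

    With leverage > 1 you control more than you own, multiplying both
    the spread you earn and the damage when the yen snaps back.
    """
    return notional * leverage * (invest_rate - borrow_rate)
```

Borrowing at 1 percent to invest $1,000,000 at 4.5 percent earns about $35,000 a year for doing nothing; at 10x leverage it is $350,000, right up until the unwind.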

China and India will undoubtedly delay the inevitable, but the world economy must and will collapse. An exponential function cannot last indefinitely. This is the conclusion I am drawing, but I must admit I don't understand all the factors. All I know is that I am becoming increasingly uneasy.



Sunday, May 27, 2007

Black Swan

I am becoming obsessed with randomness and probability. What follows draws very heavily on Nassim Nicholas Taleb's research. Imagine a turkey on a farm. Every morning of the turkey's life, the farmer has come and fed it. From the turkey's point of view, the farmer is a friend, a trusted being. Then, one morning, the farmer kills the turkey. From the turkey's point of view, a black swan has occurred: a completely unexpected event.

Take our stock market; heck, take the entire global market. Companies and economies have created multiple levels of protection against risk: options trading, derivatives, options on derivatives, credit default swaps, and so on, and on, and on. Each product is designed to allow some risk, some profit, and some safety. Some products, such as derivatives, have two sides, allowing a company to sell its risk to others. Risk, actually, is an interesting side of the coin. Companies have large staffs of risk professionals calculating and guarding the corporation from risk. Recently, companies started to realize that risk comes in many forms, and a new area was born: "operational risk". This is the risk that an employee goes crazy and shoots everyone. So, you might argue that all this guards companies from risk. Nassim Taleb, and I with him, believe that it actually enhances risk. All this calculating simply creates an impression of safety. Like the turkey, we go day in and day out believing we are safe, until one morning the farmer kills the turkey.

The basic problem is that we can't understand the future. In fact, we can't understand that we can't understand the future. We keep believing in things, looking for correlations and patterns in randomness. We find them; in fact, we tend to invent patterns in randomness. Are the markets random? I would argue no. In fact, I would argue that the markets are becoming very much un-random. The markets are increasingly governed by machines following very concrete rules. There are also very few players with the weight to move markets, and many of those players are using machines. All of this is very scary.

Another interesting example is China. An unprecedented number of ordinary people are investing heavily in the market. And the market is going up and up and up. But, like everything else in life, it will come down, and boy will it come down hard. And there will be ripples through the global markets and global economies. But this isn't the black swan I am afraid of. I am afraid of something more, something we don't know is going to happen.

Global Development

It is all the rage these days to do global development: one "system", one global implementation. The idea is economy of scale: any region can perform development, allowing the other regions to reap the rewards. There are different ways for a single system to achieve global development.

1. The system is developed by one region. All global requirements are funneled to this region. The actual system may be run centrally or locally within the regions.

2. Each region has a separate system which, based on an agreed protocol, feeds a shared central system.

Ah, but there is another way. You may be able to have a single system and yet global, parallel development. You can split the system into areas of concern and assign different parts to different teams. Unfortunately, at some point the areas will overlap. This brings up an interesting scenario: a single system, with many teams split across different time zones, answering to different management, with different requirements, different users, different schedules, etc... Quite a mess. And yet each region is working toward a common goal: the system is a specific system serving a specific purpose, just with different masters.

The trick is to split the system into a common framework and a regional implementation. The regions use the same system, and there is a core which is indeed universal, but there is also an aspect of the system which is very much unique to a given region. Understand the problem the system is solving. Then understand the fundamental aspects of the system, the raw materials, if you will. This is the common framework. Each region may modify the framework, but in doing so it is enhancing the breadth of the system.

Imagine a graph, links and nodes going every which way. Imagine dark areas of the graph, unexplored. These dark areas represent parts of the system developed by other regions but not yet used locally. When a given region matures to that functionality, it will be there waiting. The unexplored areas of the graph become used, and therefore visible. This seems a very interesting way to create a global enterprise architecture: model the system as a graph, allow each region to build out the graph, but in such a way that other regions use only what they need. Then allow the graph to be customized to each region's needs. Done correctly, the system becomes a set of loosely shared modules with concrete implementations by each region. The regions decide how the modules are used and how they link. Of course, some linkage is predefined. Regions may enhance existing modules, build new ones, or create region-specific enhancements to existing ones.
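The graph model above can be sketched in a few lines. The module names, the dependency layout, and the region attributions are all invented for illustration:

```python
# A shared module graph. "core" is the universal framework; the other
# modules were contributed by different (hypothetical) regions.
modules = {
    "core":        [],
    "booking":     ["core"],     # built by, say, New York
    "reporting":   ["core"],     # built by, say, London
    "affirmation": ["booking"],  # built by, say, Tokyo
}

def visible(entry_points, graph):
    """The part of the graph a region actually 'lights up': its own
    entry points plus everything they transitively depend on."""
    seen = set()
    stack = list(entry_points)
    while stack:
        module = stack.pop()
        if module not in seen:
            seen.add(module)
            stack.extend(graph[module])
    return seen
```

A region that only does reporting sees {"reporting", "core"}; booking and affirmation stay dark but are there the moment that region matures to them.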

Sunday, April 22, 2007

Equilibrium

I had a chat with a Rabbi the other day. He told me a story from his life. When he was a young man, he had trouble sleeping: he would sleep at most 4 hours a night. Worried that he had a sleeping disorder, he found a top doctor on sleeping disorders. The doctor had him track the number of hours he slept every night for a month. At the end, the doctor found that the Rabbi slept an average of 4 hours every night. Sometimes 4:15, other times 3:50, but on average, 4 hours. What the doctor told the Rabbi was that he was one of the lucky ones. Most people are in the middle and require 7 hours of sleep; the Rabbi was an extreme exception on the far side of the curve, requiring only 4. He was lucky because he has 3 more hours a day than everyone else.

This story is interesting in that in this day and age, in this country, the Rabbi would be put on sleeping medication. I am pretty sure the number of hours people sleep fits a bell curve. Most people are in the middle, sleeping somewhere between 6 and 8 hours, but the tails of the curve extend in both directions: some require more, like 9 or 10 hours, while others require less, like 4 or 5. Now, the established medical principle of the day is to fit everyone into the middle, with no tails.

I see this in everything. For example, the medical community preaches that cholesterol should be below 200. What makes 200 a magic number that applies to the entire population regardless of background? I would imagine that cholesterol, like everything else, follows a bell curve. Most people's normal average is 200, but the tails of the curve go out in both directions: some have a high average cholesterol number that is normal for their bodies, while others have a low average. It is very troubling that most standards are applied indiscriminately. We, as a society, are losing the equilibrium in favor of the standard.

Monday, February 05, 2007

I usually stay away from such posts, but I can't resist. Check out these two sites:

http://www.zoho.com/
http://services.alphaworks.ibm.com/ManyEyes/

Saturday, February 03, 2007

A priori

A priori is a term that describes a sequence of events in time. More specifically, development is a sequence of steps, of events, that produce a desired result. The question that bothers me is: why does it take so freaking long?

A colleague of mine was recently complaining that his users were upset at how long it takes his team to develop seemingly simple functionality. Why does it take weeks to read some data, apply some business rules, send some messages, and produce a report?

The world of business tools can be thought of as a giant, ever-growing graveyard. Business tools are continuously and artificially given life. Like little Frankensteins, they roam the earth, used and abused by both users and developers, growing up, until being killed off and replaced with younger Frankensteins that are doomed to the same fate.

Excel is the only tool that comes to mind that has escaped this fate. It allows business users to solve their own problems: unthinkable to a crack-smoking code monkey. The user can load data, build models, produce reports, and export them. The power is in the user's hands. On the other side, the developer attempts to give the user exactly what the user asked for and nothing, and I mean nothing, else. In fact, the majority of the time, the developer understands neither the business user, nor the business, nor the problem being solved.

I think the industry is starting to realize this and is attempting to shift the power back to the business user. For example, specs like BPEL and the hype surrounding web services are all meant to give more power to the business user and reduce development turnaround time. I believe software will become less like software and more like Legos. Individual pieces will still need to be built, but the business user is the one who will put the Legos together to produce a result. Things like forms, business rules, reports, data loading, and data extraction will go away. Instead, time will be spent producing richer widgets that do more sophisticated things. Honestly, how many developers does it take to build a relatively large system that does a whole lot of variations of the 5 things mentioned above? 1, 2, 5, 7, 10, 40? How big is your team?

Friday, January 26, 2007

I've been away for a long time. For that, I am sorry. But, now I am back.

What's been on my mind lately is whether it's possible to encode a business intention in an intermediary language and then build an interpreter to read that language. One system would encode an intention; a second system would evaluate it. Interesting, no? Perhaps all this means is that system A sends a message to system B. System B reads the message and, based on hard-coded business rules, performs the work. But let's say there are no hard-coded business rules. Let's say the message is the rules and the data. Would that be possible? What would this language look like? It would need to contain meta-data that could be evaluated and mapped to business rules. Let's step back a little. What's the point of this? System B is a specific system that does a specific thing. It should know what to do with the message without needing system A to tell it. A new trade message arrives, and your system receives the trade. It knows it's a new trade, because the message says so. What is the action? Book the trade. So your system dynamically looks up all the supported actions and passes the data set to that rule set. Now, some of you are thinking: great, all this and he describes a bloody factory pattern. But wait, forget messages. It's an event. Something, somehow, raises an event that says there is a new action with a given payload. Some controller accepts the event and routes it to the appropriate implementation for that event, or perhaps a set of implementations, or, even better, triggers a workflow. Now we're getting somewhere. The event name maps to a business intention, which is specified as a workflow. But the workflow is a generic concept; it's not real unless there is code behind it. So we build a bunch of modularized code that does specific functions, wire it together with dependency injection, and have a dynamic workflow define the execution path.
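The routing idea above can be sketched as a registry where the event name, not the sender, selects the implementation. The event name and handler here are invented for illustration:

```python
# Event name -> list of registered implementations (the "controller"'s
# routing table).
handlers = {}

def on(event_name):
    """Register a function as an implementation of a business intention."""
    def register(fn):
        handlers.setdefault(event_name, []).append(fn)
        return fn
    return register

@on("new_trade")
def book_trade(payload):
    # In a real system this would kick off the booking workflow.
    return f"booked trade {payload['id']}"

def dispatch(event_name, payload):
    """Route an event to every implementation registered for it."""
    return [handler(payload) for handler in handlers[event_name]]
```

System A never names system B's code; it just raises "new_trade" with a payload, and the registry decides what runs. Wiring the registry via dependency injection instead of decorators is the same idea, one step more dynamic.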

Tuesday, October 17, 2006

An Ode to a Slacker

I need to get around to writing this, and more in general, to finish some of the shit I start.

A week later:
Of course, this entry is a lot more complicated than a simple statement such as "get shit done." The issue is a delicate balance of life and work, and the Venn diagram where they cross.

Projects start as fun little side things. You play around with them for a few hours, put some things together, and call it a day. Then you get an email from the user saying: hey, this is pretty cool, but it needs to be a lot more to be useful. Sure, you say, I'll add a few more lines of code. Unfortunately, now it's not a small little project but a junior system. And not only that, it's a junior system that is poorly tested. You try to maintain the same schedule, but you realize that you can't add the kind of functionality that's needed, or maintain the level of quality necessary for a production system. You make mistakes and take shortcuts. Before you know it, your users are pretty angry. They are starting to question the whole thing. And frankly, so are you. You want to finish it. You desperately need to finish it. You've come this close, dedicated this much, but you realize that finishing will require even more.

This is an interesting struggle. The lucky few among us actually enjoy building things, so this little side project may seem like work to some but is actually more of a hobby. Unfortunately, some people are relying on your hobby, and that's when the pressure kicks in and the problems start. On the other hand, unless you had a user who wanted something, you probably wouldn't have chosen to build this particular thing as your hobby.

The other interesting observation is that you are starting to see this project as something more. Maybe this project is the way out of the rat race. If it works, it could be your ticket. But it's so much work, you say.

How do you maintain the delicate balance? Is it even possible? You're working with fixed quantities. There is a fixed amount of time, which is then reduced by constants such as actual work hours, sleeping, eating, showering, and spending time with the family.
A week has 168 hours: 45 hours are spent at work, 49 hours sleeping, 14 hours eating, and 4 hours on toiletries. What remains is 56 hours to spend time with the family, work on side projects, wash the dishes, do laundry, go to the movies, sleep in, watch TV, do the bills, etc... What ends up happening is you can probably take maybe 9 of those hours for the week: 1 per workday and 2 per weekend day. Unfortunately, as everyone knows, spending 1 hour programming is like watching ballet dancers do hip hop (it's not right). You can't accomplish anything major in 1 hour or even 2. So you may start, but you tend to aim lower and make a lot of mistakes in the process.

Wish me luck!

Thursday, August 24, 2006

Randomness

Here is an interesting question: if you know the past, can you guard against a similar event in the future? You know that the Great Depression took place. A lot of research has been done to understand what led to the Great Depression, and a lot of research has been done to understand how to get out of it. In fact, the current chairman of the Federal Reserve is a specialist on the Great Depression. So, after all that, do you think it can happen again? With all this acquired knowledge, would we see it coming and be able to guard against it?

This question has been occupying me lately, and I am leaning towards a no. I don't think we'll see it coming. We may know how to guard against that specific event, but I am starting to believe that history never repeats itself. Events may seem similar, but there are infinite combinations of how they are triggered, how we react to those triggers, their consequences, possibilities, and, of course, conclusions. If history never repeats, then studying history may not provide much value beyond protecting us from that exact event.

My other opinion is that the world is getting more interconnected and more complicated. By this I mean that connections are forming that we may not even realize exist. The world of the past will never happen again, and if an event of the past happens in the current world, the consequences will be quite different than before. Unfortunately, some other event may take place that brings the same kind of devastation. Basically, my theory is that history can never repeat itself because the world is continuously changing.

There is some kind of random undertone to the world. Some people call it luck, others misfortune. Let's say you trade stocks. You've read all there is about a company. You think you understand the Fed, the government, the currency, etc... You believe very strongly in the financials of this company. You buy the stock. First it goes up, but then it drops like a rock. It turns out the company depended on the knowledge of a single engineer, who was hit by a train. An unforeseen circumstance knocked you out of the market. What is that circumstance? Is it randomness? Can you foresee it? Can you calculate its probability of occurrence? Do you understand its impact? I don't know, but it doesn't seem likely, especially with our current understanding of probability. What is more likely is that we get a false feeling of security from our acquired knowledge, or perhaps our previous fortune, and this, if nothing else, will lead us to ruin.

The other item I wanted to cover is noise. This blog is noise. CNN is noise. In fact, a large part of the internet is noise. First off, the question is whether more information is noise or valuable artifact. And if more information is noise, is it harmful? Does having more information actually increase your probability of making a wrong decision? Can you measure which information is valuable and which is noise? These statements seem very counterintuitive. What I am basically saying is that knowledge may actually be bad for you. Our brains seem to have adapted to this by reducing large amounts of knowledge into manageable chunks: a lot of knowledge is simply forgotten, while other knowledge gets reduced into basic concepts and understandings. Does learning everything there is to know about a company, all its news statements, its financial statements, statements made by its peers, etc..., somehow take away from the bigger picture?

If anyone out there has an answer, please, do write a comment.

Friday, July 07, 2006

Reflexive Theory

The word reflexive means to direct back on itself. Don't confuse this with reflection, which means careful consideration, or self-reflection, which means careful consideration of oneself.

Reflexive Theory was originally created by the Russian mathematician Vladimir Lefebvre, who is now part of a US think tank dealing with terrorism. Reflexive Theory was born in Russia during the Cold War as a response to game theory, which had been widely adopted by the West.

What brought this theory to my attention is an article by Jonathan David Farley in the San Francisco Chronicle, "The torturer's dilemma: the math on fire with fire", which was published on the Econophysics blog. I started trying to get a bit more information on Reflexive Theory: Wikipedia had nothing, Google came up short; in fact, the only reference I could find is a link from a Russian site to a very old publication (go to page 86 for the relevant paper). And even there, the theory is never defined; it is just applied in a simplified mathematical model of protecting a border from terrorism. I wonder whether the near-total absence of Reflexive Theory on the web has anything to do with the founder of the theory now being employed by the United States government. Of course, this thought pattern is better pursued on a Big Brother paranoia blog.

Reflexive Theory tries to explain mathematically why individuals take certain actions and what the consequences of those actions are. The theory takes into consideration how individuals perceive themselves, whether as good or evil, and whether those perceptions are valid or not.

The interesting thing about reflexivity is that it's derived from psychology. The term actually implies that "reality and identity are reflexive": one implies the other. What we perceive is how we view ourselves and what we believe is true. This is a very powerful statement. It means that our reality is based on what we know, which is derived from our perceptions, which are based on our reality. This is a bit tough to swallow, but stay with me a bit longer. The whole point is that our reality defines us and influences our actions. In order to get a better understanding of our actions and their consequences, and to make additional evolutionary leaps, we need to step outside of our reality and view our knowledge and actions from that outside vantage. I wonder whether traveling across realities is simply an evolutionary step where we can let go of our reality and understand the possibility of another. Alright, this last sentence is something that belongs in a sci-fi book rather than a blog on technology.

O.K., this blog is about technology, not philosophy, psychology, or even mathematical models of terrorism. I am still working out how to tie this to technology. It's doable, but a bit theoretical, so I'll leave it for future entries.

Tuesday, June 20, 2006

Scene 1

Scene 1:
The following conversation took place between two co-workers over an instant messaging product called Sparc. The setting is a corporate office with many cubicles. The two co-workers have been asked to design an enterprise, grid-enabled architecture. They received a single 8-by-11 Visio diagram of the architecture. They were also directed to leverage FpML (Financial products Markup Language) as a messaging protocol between enterprise systems.

Ben is an existentialist with a bent for fatalism. Mike is generally an optimist, unless Ben gets too fatalistic.

[4:45 PM] Ben: Boss wants to discuss FpMl
[4:52 PM] Mike: As long as we don't have to use it internally ....
[5:00 PM] Ben: I love it when the architecture is dictated from above. It makes designing so much easier.
[5:04 PM] Mike: And simpler, too. The whole system fits into a picture with a few boxes in it
[5:05 PM] Ben: it's pretty cool.
[5:05 PM] Ben: What's wrong with using FpML internally? You suck!
[5:06 PM] Ben: It's a nice intermediary format that allows systems to communicate in a well defined language. Honestly Mike, what kind of an architect do you call yourself?
[5:06 PM] Mike: Hey, we've got enough on our plate writing our own database, our own operating system and our own programming language. I just don't need to have to use FpML as well as all that. I think it could jeopardize the entire project!
[5:08 PM] Ben: come on, that's crap. We can introduce this conversion in our custom db level, or even add it natively into our custom language.
[5:08 PM] Ben: think about it seamless integration with FpML, beautiful. Too bad FpML only covers derivatives, haha
[5:09 PM] Ben: I am sure we can work through that. We just need to work with the FpML working group to add a few parts to their spec.
[5:10 PM] Ben: 10-4?
[5:11 PM] Mike: Yeah, OK, it's taking me a while to think of a witty reply. 10-4
[5:11 PM] Ben: sorry to rush you, take your time. I just thought you were ignoring me because you are working.
[5:13 PM] Mike: Does FpML support images? In case we need to attach screenshots when we're reporting errors in market data?
[5:14 PM] Ben: Not yet, but I think we should bring this up when we discuss with them about expanding their specification to support other products.
[5:15 PM] Mike: I think whatever solution we go with, it's vital that we can scavenge unused cycles from people's mobile phones
[5:15 PM] Ben: and PDA's
[5:16 PM] Mike: and pacemakers
[5:16 PM] Ben: and watches
[5:16 PM] Mike: and elevators
[5:16 PM] Ben: maybe we can work something out where we can use the employee’s home machines.
[5:17 PM] Mike: Or people with Bluetooth devices in their briefcases as they wander past the building
[5:17 PM] Ben: good thinking, what about fax machines?
[5:17 PM] Mike: I wouldn't like to see any proposal signed off until we've really considered all these factors
[5:18 PM] Ben: I am glad at least you and me are on the same page.
[5:18 PM] Ben: We need to write up a document, someone will sign off, and then we can proceed with the development.
[5:19 PM] Ben: I think this conversation is sufficient as design.
[5:19 PM] Mike: Especially once we've deleted the Sparc logs
[5:19 PM] Ben: man, if someone has a sniffer, we're doomed.
[5:20 PM] Mike: That would be sad - especially as you introduced this product into CompanyX
[5:21 PM] Ben:
[5:21 PM] Mike: Do I have to 10-4 the smileys ?
[5:21 PM] Ben: no worries, I believe encryption is on.
[5:22 PM] Ben: no, don't worry about the smileys
[5:24 PM] Mike: I don't think we should restrict ourselves to FpML, either. We should have a meta-markup framework where we can just plug-in any standard that comes along - in case we need to support fPml or FPml or fPmL later on
[5:24 PM] Ben: I love it. Consider it added to the spec.
[5:24 PM] Mike: The spec which I hope is written in specML ?
[5:25 PM] Ben: should we look into whether we can leverage specML along side FpML as the messaging protocol?
[5:25 PM] Mike: Absolutely
[5:26 PM] Ben: http://www.mozilla.org/rhino/
[5:26 PM] Ben: I think we should use the Rhino tool to build our entire framework
[5:26 PM] Ben: think about it, we can release partial code, no reason to compile.
[5:27 PM] Mike: I've used Rhino before (indirectly) - it's built into JWebUnit
[5:27 PM] Ben: so, what do you think of using it as our core language?
[5:30 PM] Mike: Might be a bit low-level. I want something high-level that maps 4 boxes on a diagram into a fully built-out, productionised, resilient, performant, scalable, internationalised system that runs on everything from a supercomputer to a Beowulf cluster to a digital watch.
[5:32 PM] Ben: do you have a copy of our entire conversation; I think it would make a wonderful blog entry.

Saturday, June 10, 2006

Commodity, Speed of Light, EAB

This post was actually going to be about technology commodity. I actually even wrote part of it. I had all kinds of things in there - a definition of the word commodity from Wikipedia, a reference to Karl Marx, processing grids, service grids, data grids, etc. It was going to be a pretty good post before I erased it.

What the hell? Well, I had no point; I was just writing the obvious. Let me start over. Any enterprise architecture is going to be distributed, but that's not enough. Systems need to communicate and share data. Some systems may provide a service to other systems. Some may be in charge of routing messages, others of doing calculations, and still others provide auxiliary services like calendars or caching. The point is that a whole lot of systems are going to be communicating. In fact, in some cases, that communication will be very heavy and may become a liability: an Enterprise Architecture Bottleneck (EAB). You gotta love acronyms. They make everything sound so much more impressive.

In order to reduce EAB, your system will need to reduce the amount of data being transferred, figure out a faster transfer method, go faster than the speed of light, or all of the above. For the sake of simplicity, let's assume the last point is currently not feasible. For the second item, you can buy a bigger pipe, but you are still stuck with a certain latency: the cost to transfer a bit from NY to London will always be bound by the speed of light. So, can the system reduce the amount transferred? I think it's possible if the system is aware of the data patterns and can remove any unnecessary or redundant information. For example, let's say you and I are carrying on a conversation. Certain things are obvious; other things you can deduce without me saying anything. For still other things, I may only need to say a little for you to understand much more, and in some cases you may already know certain things because I've already mentioned them. What does this all mean? The sending system will need to analyze the data stream and learn to reduce the load. Of course, this assumes that the sender and receiver have agreed on some transfer protocol.
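One way a sender and receiver can exploit shared context, roughly in the spirit described above, is a preset compression dictionary: both sides agree on the recurring vocabulary up front, so the wire only carries what the receiver can't already deduce. A minimal sketch using Python's zlib and its `zdict` support - the message contents and dictionary are made up for illustration:

```python
import zlib

# The "agreed transfer protocol": a preset dictionary of phrases that
# recur in the data stream. Both sender and receiver hold a copy.
SHARED_DICT = b"trade price instrument currency=USD settlement=T+2 "

def send(message: bytes) -> bytes:
    """Compress using the shared dictionary the receiver also holds."""
    c = zlib.compressobj(zdict=SHARED_DICT)
    return c.compress(message) + c.flush()

def receive(payload: bytes) -> bytes:
    """Decompress using the same shared dictionary."""
    d = zlib.decompressobj(zdict=SHARED_DICT)
    return d.decompress(payload) + d.flush()

msg = b"trade price=101.5 instrument=IBM currency=USD settlement=T+2"
wire = send(msg)
naive = zlib.compress(msg)  # no shared context
assert receive(wire) == msg
print(len(msg), len(naive), len(wire))  # shared context shrinks the payload
```

The same idea generalizes beyond compression: any agreed-upon schema or reference data that both ends already know is information you don't have to pay latency and bandwidth for.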

Well, this post is still pretty obvious, but maybe a bit more interesting than another tirade on processing grids.

Thursday, June 08, 2006

Business Process Modeling

I was discussing a system process with a business client, and during the conversation it hit me that the client has no idea what the business process is. At some point during the development of the system, a business specification was written, perhaps even by this same person, but over time the system evolved and the process was forgotten. Now, my business client has no idea what the system is actually doing. To him, the system is a black box, one big scary unknown. He has a general idea of what it does - it takes data from some place, does some filtering and calculation, populates a few tables, does some more calculations and filtering, and eventually generates some results. The business analyst is then charged with verifying the results - the irony being that the business analyst isn't completely sure how the number was generated. He has a certain level of trust, but over time that trust erodes and he is left feeling uneasy and unsure.

Later that day, I sat in a meeting where a discussion was taking place about a new business system. It was a usual meeting: many people spoke about many things, most of the conversations strayed far from the actual deliverable or even reality, nothing was agreed, a few people tried to cock fight each other, and, overall, the meeting reversed progress - a usual meeting. Anyways, during the meeting, specifically during one of the cock fights, one of the analysts spoke up and said something very profound and interesting (this was followed by a cock block, but that's a different story). His interesting statement was that he believed quality assurance testing should not end when the system goes to production but should be an ongoing part of the system. He believed it was important to have a stable validation layer in the system in order to provide basic sanity checks that the system is performing as expected amid an endless parade of changing data. My teammates rose up in anger against him; some claimed he was a heretic, others threatened excommunication. I sat silent, listening and wondering.

Each system is basically a workflow. Once you remove some of the techy parts, you end up with a business process. In fact, at some point this system was a very clean Visio diagram. Each box was then blown up into a class diagram, and then some crack smoking code monkey (developer) defecated all over it - an enterprise system is born. This workflow is then overlaid with data. The workflow reacts differently to different pieces of data, but it's still functionally a flow - actually, more of a graph. The graph is a mix of generalized technical aspects and business logic. The problem these days is that the business logic is sprinkled all over the system, making it very hard to re-create exactly what happened.

So, I wonder if it would be possible to overlay an actual system with a meta-system. Would it be possible to create a set of, let's say, annotations to add alongside code, and possibly some additional hooks, to allow another system to walk the system code, generate the graph, and overlay the graph with the business documentation sprinkled throughout the code? The end result could be a self-documenting system. No, I am not talking about javadoc or an external specification. I am talking about a tool for the business user to verify what a given system is doing. Because the documentation and the code live side by side, perhaps are even the same thing, the business user can be confident in what they are seeing.
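As a rough illustration of the meta-system idea, here is a sketch using Python decorators as a stand-in for the annotations imagined above. The step names, documentation strings, and the `BUSINESS_GRAPH` registry are all hypothetical; the point is only that business documentation attached at the code level can be walked to produce a business-facing process graph:

```python
# Registry the "meta-system" builds up as the code is loaded.
BUSINESS_GRAPH = {}

def business_step(doc, feeds=()):
    """Attach business-facing documentation and graph edges to a function."""
    def wrap(fn):
        BUSINESS_GRAPH[fn.__name__] = {"doc": doc, "feeds": list(feeds)}
        return fn
    return wrap

@business_step("Load raw trades from the overnight feed", feeds=["filter_trades"])
def load_trades(): ...

@business_step("Drop cancelled and test trades", feeds=["compute_pnl"])
def filter_trades(): ...

@business_step("Aggregate P&L per desk")
def compute_pnl(): ...

def describe(step, depth=0):
    """Walk the graph and print the business process, not the code."""
    node = BUSINESS_GRAPH[step]
    print("  " * depth + f"{step}: {node['doc']}")
    for nxt in node["feeds"]:
        describe(nxt, depth + 1)

describe("load_trades")
```

Because the documentation lives on the functions themselves, regenerating the process description after a code change is just re-running the walker, which is exactly the side-by-side property argued for above.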

The second part is that a lot of data-centric systems live and die by the data they are receiving. Garbage in, garbage out, they say. Well, I am not quite sure this statement needs to be true. After a long, deep thought, I agreed with the business analyst and took a stand to support him. I think he is right: QA should not end once the system is in production. Each system should be built to be in a constant state of testing itself. The point isn't to test the code; the point is to test the data. The data is the most important thing. As developers and architects, we treat data as a second-class citizen. What comes into the system should be checked, what happens in the system should be checked, and what comes out of the system should be checked. It would help if the checks were framed as hypothesis tests. The analyst proposed having a parallel testing dataset. He figured that a constant check against a stable dataset may provide a basic sanity check, or at least raise some red flags if the data is too far from the norm. Of course, this type of test is context specific, but I think the basic principle has value. Data isn't just data; it's the most important thing. When the business analyst receives the end result, and the end result is wrong, the analyst spends hours trying to narrow down what went wrong. Sometimes the problem is the inputs, sometimes the problem is the business logic, and other times he just doesn't know.
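A minimal sketch of the analyst's parallel-dataset idea, assuming a numeric feed: compare each incoming batch against a stored baseline and raise a red flag when it drifts too far from the norm. The baseline values and the three-sigma threshold are illustrative, not a real test plan:

```python
import statistics

# Hypothetical baseline dataset kept alongside the production system;
# in practice this would be curated reference data for the feed.
BASELINE = [100.1, 99.8, 100.3, 100.0, 99.9, 100.2, 100.1, 99.7]

def sanity_check(batch, max_sigmas=3.0):
    """Return True if the batch mean is within max_sigmas of the baseline mean.

    A crude z-test-style check: it won't catch every data problem, but it
    turns 'the number looks wrong' into an automatic, repeatable flag.
    """
    mu = statistics.mean(BASELINE)
    sigma = statistics.stdev(BASELINE)
    drift = abs(statistics.mean(batch) - mu)
    return drift <= max_sigmas * sigma

assert sanity_check([100.0, 99.9, 100.2])       # normal data passes
assert not sanity_check([250.0, 260.0, 255.0])  # garbage raises a red flag
```

Running a check like this on inputs, intermediate tables, and outputs is one concrete way a system can stay "in a constant state of testing itself" after it ships.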

I wanted to get this post out, but overall I am still thinking through a lot of these concepts. I think there is something conceptually there, but it's a bit foggy.

Tuesday, May 23, 2006

Robo Blogger

I recently had a discussion with an associate of mine, let's call him Philip. I made a wager with Philip. I bet him that if I wrote a robo-blogger, his blog would be more popular than mine. The robo-blogger is going to subscribe to all the popular blogs. He will summarize the posts and then comment on them. The comments will range from outright bashing to a more supportive tone. The bashing will also occur in different regional dialects, such as Australian. For example, "this blog is just piss-farting around" - translation: this blog is just wasting time. Another example, "this program is so cactus" - translation: this program is just wrong. For more: http://en.wikipedia.org/wiki/Australian_words
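For what it's worth, the comment stage of the robo-blogger could start out as something this dumb. The tones, dialects, and phrases here are all invented for illustration; the feed-subscription and summarization stages are left as an exercise:

```python
import random

# Toy slang table: tone x dialect -> canned comment text.
SLANG = {
    "australian": {
        "bashing": "This post is just piss-farting around.",
        "supportive": "Good on ya, this post is spot on.",
    },
    "plain": {
        "bashing": "This post is a waste of time.",
        "supportive": "Nice post, keep it up.",
    },
}

def robo_comment(summary, tone=None, dialect=None):
    """Produce a canned comment, picking tone/dialect at random if unspecified."""
    tone = tone or random.choice(["bashing", "supportive"])
    dialect = dialect or random.choice(list(SLANG))
    return f"{SLANG[dialect][tone]} ({summary})"

print(robo_comment("a post about enterprise architecture",
                   tone="bashing", dialect="australian"))
```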

I believe the only way I can lose this bet is if I don't build it. Watch out Philip, YOU ARE GOING DOWN!!!!

Sunday, April 23, 2006

Ontology

Ontology is "...a systematic arrangement of all of the important categories of objects or concepts which exist in some field of discourse, showing the relations between them. When complete, an ontology is a categorization of all of the concepts in some field of knowledge, including the objects and all of the properties, relations, and functions needed to define the objects and specify their actions."

So, an ontology is a language for representing all the objects in your field, along with their properties and relationships. For example: a trade contains an instrument. A trade contains a price. An option instrument is a type of instrument. An option instrument contains an underlying instrument. An index instrument is a type of instrument. An index instrument consists of sub-instruments. And so on. I am thinking that, in theory, you could represent an entire industry or body of knowledge using an ontology. Once you have such a representational space, then given a new instance of an object, you could apply the object to the ontology, find your place, and walk the ontology to figure out all the possible causes and effects.

There has been some work done in this area. The W3C came out with a language specification, OWL (Web Ontology Language). Also, see the links below for more references.



Links
http://jena.sourceforge.net/
http://sweetrules.projects.semwebcentral.org
http://protege.stanford.edu/

Friday, March 31, 2006

Metaverse continued ...

I recently came upon this site: http://www.youos.com/

and this site: http://www.ajaxwrite.com/

and I've found these to be absolutely exciting. YouOS in particular is great; they've redefined the concept of an operating system.

I think what needs to happen is for applications to break out of the browser. YouOS and AjaxWrite are the beginning of this. What is the point of the browser? It is a very heavy and limiting medium. What if you created a browser wrapper, something that lives neither here nor there? With Ajax taking over, it is possible to move a piece of a website to your desktop and work with that chunk as if it were a real application running on your PC. The whole browser only confuses the matter. It's really not needed. A browser is nothing but an interpreter for an interpreted language. The only difference is that, right now, the browser also imposes a look and feel and constrains the interaction between the user and the service.

Imagine a powerful application like AjaxWrite. Now remove the browser and create a link on your current desktop that, instead of opening a browser, just runs the app. The app decides the look and feel, etc. I know there are security concerns, but for the sake of progress I would rather ignore them for the moment. Now you have a part of your desktop running a web-based application. The whole thing is running on some other server, and you are simply interacting with it. The difference is that you have a seamless integration with your environment.

At the moment, there is such a clear separation between desktop and web. In my opinion, they really are the same thing. What is the difference between running something locally and running something remotely and only bringing back the display? Both systems react to user events; the only difference is that the desktop system is bound to your machine.