Monday, December 22, 2008

Lehman Brothers

The last 4 months have been a crazy whirlwind. I was an employee of Lehman Brothers during its demise. For those who know, that weekend went fast. Without a single doubt, each of us expected a normal resolution: maybe an industry consortium bailout, maybe the Fed would extend a loan, maybe a last-minute buyer. None of us expected a bankruptcy filing. That Monday felt like being shell-shocked. Everything you worked for, all your hard work, gone, abandoned. All those meetings about tiny little minutiae of detail. All that time coming up with the perfect design, the clean integration, the cleanest build file, the good interface - poof.

Some of the guys have worked their whole lives at the Firm. Think about it: you spend the last 20 years of your life building something. Sometimes you work late, sometimes even on weekends. You meet tough deadlines, you build out the system, you have projects, and Jira items, and future ideas for improvements, and then one Sunday evening it all disappears in a single 20-second news clip.

What can I say about that Monday; a lot of guys were afraid. Single earners, mortgage, kids, car payments, private school, dance lessons... It was never a problem before; the Firm was known for its bonuses. The irony of having more money is that your expenses rise proportionately. Some of the guys were hit very hard. A lot of the bonus is paid in Lehman stock, part of the 401K is in Lehman, and the firm also promoted personal investing in the stock. All in all, if you were with the firm for 20 years, you had a major percentage of your net worth tied to the company, and then it's gone.

The next few weeks gyrated between moments of team camaraderie and private, lonely introspection. Every person came to work on time and stayed for the full day. Most of the guys, myself included, kept to the Lehman dress code of full business attire. I think it was out of respect for the Firm, or perhaps just routine.

Barclays' purchase of the firm provided a rare glimmer of hope, a possibility of normality. It wasn't so much having your job saved, which is of course very important, as the possibility of saving what you've built. Someone using what you've worked so hard to create. As time dragged on, it became clear that Barclays had no intention of taking our system, so the glimmer of hope slowly changed into despair with a dash of anger. Barclays, of course, wound up laying off most of us. The system I'd worked on was trashed.

Today, I am at another company, assigned the task of comparing Lehman's systems to my new employer's systems. This is how Barclays employees must have felt when they found out their company was buying Lehman. It's us or them. And so I do what is requested of me. I criticize and destroy the very systems I promoted just a few months ago.

I have the deepest respect for the Firm. I have never worked for a company like that. They really tried to create something more than the sum of themselves. I shall miss it.

Friday, October 24, 2008

Intentional Software

After working on the Drools Flex Editor, I started thinking a lot more about what it would take for it to be usable by non-developers, i.e. domain experts. This narrative of thought brought me to heavyweights such as Charles Simonyi, founder of Intentional Software, and Gregor Kiczales, founder of AOP (and his latest paper). It also led me to a lot of other concepts such as the law of leaky abstractions, the history of software, the Lagom Process, along with more obtuse concepts such as the omega number, and people like Gregory Chaitin.

And of course, what would our industry be without its acronyms: MDA, DSL, BPM, UML, DDD, SOA. There are plenty of others, but I think I've made my point.

That's a lot of information to digest, especially if you looked up Chaitin, which would have led you to our founding fathers: Leibniz, Turing, and Gödel. Now ask yourself: what kind of thought narrative can take a person from Adobe Flex to Gödel? It would almost be funny.

Now, let's step back for a moment. A lot of people argue that software development should be reduced to visual tools. A counter-argument is that mathematicians do not use visual tools to draw up their equations. They use blackboards and chalk, the most primitive of tools. Excel has been praised as the most successful intent-based system. But if you look at it, it's not visual at all. At best, it's a basic grid, with cell coordinates and a blank text input box allowing manipulation of cells. Another argument is that business people are somehow not smart enough to program, that it takes a special kind of mind to generate code. The fallacy with this statement is that business people already code, just not in the typical "tech" way, but rather in their own domain. Their interaction with Excel, their domain expertise, the manipulation of that domain expertise - all of it can be considered coding. They manipulate their symbols to achieve their goals. The only thing separating technology developers and domain experts is which domain they are experts in.

There was a quote from Charles Simonyi that went something like this: if we don't expect business people to learn how to code, why do we expect coders to learn the business? Each path is extremely inefficient and rife with problems. So, instead, let's allow each group to focus on what they do best. Developers should stick to technology; business people should stick to business.

So, we have established that a business person is capable of performing some form of "development" to encode their domain expertise into a set of steps, "their intent". It is also probably safe to assume that a business person understands concepts like if-then-else, for-each and standard algebra. It is also safe to assume that they know nothing of JSP, Servlets, JMS, EJB, transactions, XA, JDBC, SQL, Java, Class, public/private, encapsulation, polymorphism, design pattern, singleton, facade, heap, stack, binary search tree, NP-complete, and on, and on, and on, .... So, where does this leave us? I think it means that software development stops being the pure domain of developers, and instead is split between developers and business people.

If we look at a typical business system, we can see that it has inputs (JMS, GUI, etc.), a concrete data representation model in the form of a database schema, complex output in the form of varied reports, and processes that criss-cross the system, driven by triggers such as external events (JMS, schedule, user, etc.). There is also business logic in the form of calculations, business steps, if-blocks, etc., sprinkled through the system. Some of it lives embedded in the report logic, some in the processes, and some, perhaps, even implicitly in the data storage or data format.

I think we can start to take steps to separate the domains. Process flows attempt to separate the process logic from the system logic. Web services attempt to expose individual services and thereby reduce the hard linking between them. Business intelligence is attempting to expose the data to the users and allow ad-hoc manipulation. The proliferation of domain-specific languages, online compilers, and rule engines is a sign of the desire to separate the system from the business rules. Hibernate, JDO, etc. are attempting to isolate the system from the underlying data stores and map out the data definitions. Ontologies are attempting to bridge the interaction between human-defined relationships and a system. Mashups - i.e. http://www.programmableweb.com/, Yahoo Pipes - are yet more examples of technical concepts being exposed to non-technical people. All these things, in my opinion, are converging on the same topic of intentional programming.

Tuesday, October 21, 2008

Alpha Release of Drools Flex Editor - 0.5

Orange Mile is proud to announce the long awaited alpha release of a Drools Rule Editor in Flex.

http://code.google.com/p/drools-flex-editor/

The current release includes all the Flex pieces, but not the rather basic server-side code for rule compilation and code completion. That will be available in the final 1.0 release.

This is the beginning of having components rich enough within the system to allow the user/admin to directly manipulate the business rules without the long development cycle.

Orange Mile is Expanding

We are proud to welcome Venu to the Orange Mile Team. Venu will also act as a contributing author to the blog.

Although Orange Mile started as a small math with a single avout, it has since expanded, grown, and matured.

We have become a devout following. At times looked down upon by the saecular world, but always pursuing our dreams.

Entry written in the style of Anathem.

Thursday, October 16, 2008

Orange Mile Security Release - 1.1

Orange Mile is proud to release version 1.1 of Dynamic Rule Based Security.

The new features include:
1. A complete example based on Spring Security - see orangemile-security-test.war
2. isGranted JSTL Tag

http://code.google.com/p/dynamic-rule-security

Friday, October 10, 2008

Doom and Gloom and the Economy

Although this is a tech blog, I strongly believe that a good developer must be erudite. Given this, I feel justified in writing about the economy in this blog.

We are living through historic times. Mid last year, I started getting very scared and started blogging about randomness and the economy.
http://orangemile.blogspot.com/2007/05/black-swan.html
http://orangemile.blogspot.com/2007/07/supply-of-money.html

You will notice that in the Supply of Money entry, I actually wrote about the likelihood of a collapse of the world economy due to the unsustainable supply of fiat money.

What's interesting about those entries and that time period is why my mind shifted to the arcane topics of money supplies, carry trades, and fiat money, when the blog entries before and after clearly deal with the arcane topics of technology. Perhaps my subconscious started to pick up on the feelings of uneasiness in the global market; hiccups, if you will. I can't possibly attribute those entries to knowledge, because I am simply not qualified to speak of money supplies, fiat currency, and carry trades.

So, what is happening today is a global loss of confidence. What is interesting is that it is companies, specifically banks, that are hoarding cash rather than people. I would argue that if people started hoarding cash then we're all doomed. The global economy would come to a screeching halt. The Chinese economy would collapse, probably throwing that country into either martial law or revolution. America and Europe would fall into a severe and prolonged depression, taking the rest of the civilized world with them. Africa would fall to an even lower level of sustainability, with probably wide-ranging civil wars due to lack of food and an acute demand for natural resources such as diamonds, gold, and oil. India's economy would also take a severe beating, but I think they would remain a loose democracy. If they position themselves well, they may end up being the next superpower.

Right now, the US government is printing money at an ever faster clip, giving it away almost for free, and nationalizing large areas of the financial industry. Money is flooding the global economy. What's interesting is that we are actually in a period where money is disappearing. As the perceived value of assets falls, money disappears. The US Government then tries to fill in the gap of lost money by providing more money to the institutions whose wealth disappeared, hoping against all hope that the newly provided money will be used by those institutions to create more money.

Let's recap how money gets created. A person decides to do something on credit, let's say buy a house. They go to the bank and say: give me 300k to buy a house. The bank gives you the money, and you go buy a house. The thing is that the 300k is actually some other depositor's money; money the bank doesn't actually have. What's happening now is that the bank thought it had a 300k loan asset, but instead the 300k is really only a 200k asset. This means that if you default, the bank loses 100k of someone else's money. If enough loans do this, the bank won't be able to cover the losses, confidence in the bank erodes, people start to retrieve their deposits, and of course, after some interval, the bank simply runs out of money to give out to depositors. This is why the FDIC was created. This is a standard Ponzi scheme. In other words, if the asset side of the bank's balance sheet starts to shrink, the bank will reduce the amount of new loans it can give out, and thereby reduce lending, which will probably drive interest rates up because there are fewer institutions lending. This actually means the opposite of what I said earlier: money isn't destroyed, the rate of creation just slows.

The banking industry is structured as a very calibrated entity, with a minor hiccup in cash flows, or perceived cash flows, destabilizing the entire industry. The Fed is trying to erase the perceived losses from the banks' balance sheets and thereby restart the loan process. Another interesting thing is how many industries rely on having a continuous supply of new loans. Imagine a Ponzi scheme applied to the car industry. Ford borrows money to pay its workers, hoping that in the future it will sell enough cars to pay back the loan, except it doesn't, so it borrows more to keep going. At some point it actually needs to borrow from Person A to pay Person B, and on and on. This should end at some point when there is no one else willing to lend to the said company. But, unfortunately, for most companies there is always someone willing to lend. This is partly due to the obscurity/opaqueness of the financial industry. Now consider our current scenario, where the said company can't get a loan, not because of the financial condition of the company, but because of the financial condition of the lender; all lenders.

What the Fed is trying to do now is fight the deflationary path. Money is becoming a scarce resource, not because there is not enough of it, but because banks are hoarding it. A lot of people are also talking about a hyper-inflationary model. I don't see this happening, even if I believed it for a while. In a hyper-inflationary model, you have too much money. This is unlikely because the Fed can always mop up the money supply, and because the world is dollar-denominated, at least for the foreseeable future. I think what is more likely is nationalization of a number of areas, and banks, a significant increase of money available to the banks, guarantees of bank assets, a forced reduction of the inter-bank rate, and a forced reduction of the interest rate paid by home owners.

Interesting reads:
http://en.wikipedia.org/wiki/Fractional_reserve_banking
http://en.wikipedia.org/wiki/Credit_default_swap

There is another animal that hasn't received much news: credit default swaps. This instrument is a form of insurance against default. The problem is that it is not backed by anything, and the current amount of outstanding CDS is a few times larger than all money ever produced, combined. This means that the government needs to do its darnedest to make sure those CDS contracts never come due, because if they do, all financial institutions will file for bankruptcy, governments will default, end of the world, etc.

So, what is the government to do? Money must be made cheaper, to a point where it is basically free. This will allow the banks to start giving out loans again, which would spur the market for re-financing, which should save some borrowers. At the same time, the government will probably start with the regulation. We will see a period of slow down - a recession - in which the disaster of the day will become a distant memory, and we will start up with the next bubble: maybe energy, maybe the housing sector again in a lesser form, although that will probably be regulated to the gills. My guess is energy or commodities. But the bubble won't start until people regain confidence, which will take a few years.

I repeat, the Fed must make money cheap. After a recovery, they will attempt to make money more expensive again to stave off another bubble, but they will tread very lightly to avoid any more panic. This means that rates will stay cheap, or will only start to go up very gradually over a long interval.

Of course, there is another animal in this picture: US treasury bonds backed by our taxes. I don't fully understand this animal and its relationship to the money supply, but hopefully, in the next few blog entries...

Tuesday, October 07, 2008

Release of Microsoft Analysis Services 2005 Automation SDK

I am proud to announce the long awaited release of the Microsoft Analysis Services 2005 Automation SDK.

http://code.google.com/p/mssas-automation/

The library allows a Java developer to automate the creation and modification of a MSSAS 2005 cube. The design consists of codifying most of the XMLA 1.1 specification into Java POJOs via the JiBX binding framework. On top of this core library, it then becomes trivial to codify specific design patterns or utilities to automate or speed up the creation/modification of a cube.

Thursday, October 02, 2008

How not to be a turkey - a dead turkey!

The idea is to build a little web app that will scan the common news sources nightly and compile a score for different words based on how negatively or positively the topic is described. For example, regarding the economy, the system should pick up speeches from the Fed, congressional discussions, etc. The idea behind all this is from the Black Swan book. The theory goes that the night before Thanksgiving, the turkey has the highest confidence in the goodness of humans.

To achieve this, I will need an NLP mood analyzer, or in other words, Sentiment Analysis. Some open source tools to accomplish this are:

NLP Libraries:

Knowledge Understanding

News Sources

Ekman's research on universal facial expressions
[happy, sad, anger, fear, disgust, surprise]

Frustration – Repetition of low-magnitude anger
Relief – Fear followed by happy
Horror – Sudden high-magnitude fear
Contentment – Persistent low-level happy
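
A rough sketch of the word-scoring idea (the word lists and the weights are invented for illustration; a real implementation would lean on one of the NLP/sentiment libraries above):

import java.util.*;

/**
 * A minimal sketch of the nightly word-scoring idea: count gloomy vs. upbeat
 * words per article, and attribute the article's mood to the topics it mentions.
 * The lexicons here are made up for the example.
 */
public class NaiveMoodScorer {

    private static final Set<String> NEGATIVE = new HashSet<String>(
            Arrays.asList("crisis", "collapse", "default", "recession", "fear"));
    private static final Set<String> POSITIVE = new HashSet<String>(
            Arrays.asList("growth", "recovery", "confidence", "stable", "gain"));

    /** Returns a score per tracked topic: the more negative, the gloomier the coverage. */
    public Map<String, Integer> score(List<String> articles, Set<String> topics) {
        Map<String, Integer> scores = new HashMap<String, Integer>();
        for (String article : articles) {
            String text = article.toLowerCase();
            int mood = 0;
            for (String word : text.split("\\W+")) {
                if (NEGATIVE.contains(word)) mood--;
                if (POSITIVE.contains(word)) mood++;
            }
            // attribute the article's mood to every tracked topic it mentions
            for (String topic : topics) {
                if (text.contains(topic)) {
                    Integer current = scores.get(topic);
                    scores.put(topic, (current == null ? 0 : current) + mood);
                }
            }
        }
        return scores;
    }
}

Run nightly, the per-topic scores become the time series the turkey should have been watching.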

Tuesday, September 23, 2008

Informatica PowerCenter Automation SDK 1.0.0 Released!

I am proud to announce the release of Informatica PowerCenter Automation SDK; brought to you by the good people at Orange Mile, Inc.

http://code.google.com/p/informatica-powercenter-automation

The library allows the automation of repetitive patterns when creating or changing Informatica PowerCenter Mappings.

With the proliferation of development abstraction platforms like Informatica PowerCenter, Tibco BusinessWorks, BPM engines, rule engines, etc., it becomes more and more possible to build automation software for the development abstraction software. In other words, these tools provide a meta definition from which to build certain services. In the case of Informatica, those meta definitions are geared towards ETL tasks. The automation software is then able to manipulate the meta pieces to automatically generate those services. In other words, you have a system that knows how to manage the meta service pieces.
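
As a rough sketch of the general pattern (this is not the SDK's actual API; the element names and file names are illustrative), the idea is simply to treat the tool's exported meta definition as data and generate variations of it:

import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/**
 * Sketch of "automating the automation tool": load an exported mapping
 * definition as XML, stamp out copies of a repetitive piece, write it back.
 * Element and attribute names below are illustrative, not Informatica's schema.
 */
public class MappingCloner {
    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(new File("exported-mapping.xml"));

        // take the first transformation as a template and clone it per column
        NodeList transformations = doc.getElementsByTagName("TRANSFORMATION");
        Element template = (Element) transformations.item(0);
        for (String column : new String[] { "TRADE_ID", "AMOUNT", "CURRENCY" }) {
            Element copy = (Element) template.cloneNode(true);
            copy.setAttribute("NAME", "EXP_TRIM_" + column);
            template.getParentNode().appendChild(copy);
        }

        // write the generated definition back out, ready to be imported into the tool
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.transform(new DOMSource(doc), new StreamResult(new File("generated-mapping.xml")));
    }
}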

Monday, September 22, 2008

Dynamic Rule Security is Released!!!

I would like to announce the release of the 1.0 version of Dynamic Rule Security brought to you by the good people at Orange Mile, Inc.

http://code.google.com/p/dynamic-rule-security/

I think this release may very well revolutionize the way application-level security is handled. Although the first release is somewhat simplistic in how it manages the rules, I believe it will serve a large majority of the systems out there.

The next major release will focus on expanding the rule management, adding tag libraries, and adding support for direct instantiation.

Thursday, July 24, 2008

Let me count the ways... I HATE Spring Security ACL

Elizabeth Barrett Browning wrote a love poem, "How do I love thee? Let me count the ways..." It's a lovely poem, but I am forced to use the same core phrase in a very negative connotation, and apply it to technology, which "some" believe has no soul. More on that in a different post.

Now, let me count the ways in which I hate the Spring Security ACL implementation. In any other setting, I would have written this off as some poor wanking by some poor wanker, but unfortunately, in my prior post, I vowed to add property-based security via a rule engine as an add-on for Spring Security. What I failed to realize at that writing is that Spring Security seems to be split into 2 sections. The core security section, which has things like app server plugins, role and principal management, etc., seems decent enough. Perhaps a bit configuration heavy, but hey, that's Spring for ya. Now, this other section, the ACL section, is a complete and utter fuckup. The irony is that this is a re-write of an even worse implementation.

Now, listen you Spring theists:
Why create an ObjectIdentity interface that wraps a serializable identifier, and then implement an ObjectIdentityImpl, only to cast the serializable identifier to a Long in both the BasicLookupStrategy and the JdbcMutableAclService? As a side note, keep with the fucking naming convention: if you're going to prefix all the db accessors with Jdbc, then why name the jdbc lookup class BasicLookupStrategy? And oh yeah, what's the point of the LookupStrategy pattern considering that you already have a lookup strategy pattern called MutableAclService, which has a jdbc accessor called JdbcMutableAclService?

So, even if I extend ObjectIdentity and add support for property management, the implementation will go to hell if someone decides to use any of the persistence classes. Oh, almost forgot: for all the bloody abstraction and interfaces, the BasicLookupStrategy accepts an ObjectIdentity, yet performs a direct instantiation of ObjectIdentityImpl, with a Long as the serializable id. So, there goes the ability to extend the class, or to define anything but a long as an identifier. So, what's the point of creating the ObjectIdentity interface? And, what's the point of making the identifier serializable?

Ah, there is support for an ACL tree via parent/child ACLs. I could create a parent ACL to represent the object, and then subsequent children for each of the properties; ah, but the damn ObjectIdentity cast to a long kills that as well.

What would be quite nice is to add property-level support directly to the Access Control Entry. Of course, there is an interface, and an implementation, and supporting classes that require the implementation, making yet another useless interface. What's needed here is a factory pattern.

I am sorry I am angry. I've been reading Buddhist books lately, and they teach you to channel your anger, understand its source, and manage your emotions, so as to balance the negative and positive of karma. The problem is that all this is going to force me to break from the ACL implementation in Spring, which would mean yet another ACL implementation with a subset of the features. Spring, for all its problems, seems to provide a large feature set, and if at all possible, I prefer to enhance rather than replace.

Ok, back to Spring Security ACL bashing. The Acl interface and the AclImpl class are capable of encompassing the entire Sid structure. So, if I have 10k users, then my poor little Acl class will start to look like an ACL cache rather than the simple pojo it was meant to be. What the ACL object should be is a representation of an object, which has properties, and is an instance of security for a single Sid. I highly disagree that a single Acl needs to support multiple Sids. Granted, that approach is more flexible, but flexible to the point that there will be a single ACL class in the system, with a large array of all permissions. Acl is not a cache; it's a simple wrapper around what a single user/principal/granted authority has access to for the given object. The ACL Entry is actually supposed to be a wrapper around a property and a permission mask. That's the whole point of having a permission mask. A mask is an int, which means that you have a single integer (all those bits) representing all the possible access control rights for a single property of a single object. The beauty of adding property support is that you're no longer limited to 31 possible permissions, but rather unlimited, with a limit of 31 per property of an object. This means that you can conceivably have different rights per object attribute. And we all know that some objects have a lot more than 32 attributes. So, if you just wrapped the permission mask in an ACL Entry class, then what was the point of an ACL Entry class? You could simply collapse the whole structure into the ACL class and be done with it.
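
Here is a sketch of the structure I'd actually want (my own illustration, not Spring Security's API): one permission mask per property, for a single Sid:

import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of a property-aware ACL for a single Sid: one int mask per attribute,
 * so each attribute gets up to 31 independent permission bits.
 * This is the shape I want, not Spring Security's actual classes.
 */
public class PropertyAcl {
    public static final int READ = 1;        // bit 0
    public static final int WRITE = 1 << 1;  // bit 1

    private final Object objectIdentity;     // the secured object
    private final Object sid;                // the single principal/authority this ACL is for
    private final Map<String, Integer> propertyMasks = new HashMap<String, Integer>();

    public PropertyAcl(Object objectIdentity, Object sid) {
        this.objectIdentity = objectIdentity;
        this.sid = sid;
    }

    public void grant(String property, int permission) {
        Integer mask = propertyMasks.get(property);
        propertyMasks.put(property, (mask == null ? 0 : mask) | permission);
    }

    public boolean isGranted(String property, int permission) {
        Integer mask = propertyMasks.get(property);
        return mask != null && (mask & permission) == permission;
    }
}

The presentation tier would then just ask something like acl.isGranted("counterparty", READ) and hide the field accordingly.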

Deep breaths. I was reading another blog, which was talking about another blog that mentioned that "Every time you use Acegi... a fairy dies." My daughter loves fairies.

Saturday, July 19, 2008

Drools + Spring Security + Annotations + AOP= ?

I am starting a new open source project:

http://code.google.com/p/dynamic-rule-security/

No code has been released yet, but I am hoping to have an alpha version out soon. The project integrates the Drools rule engine with Spring Security to provide dynamic, rule-based, field-level ACL security to a system.

Once complete, the system administrator will be able to create business rules to restrict fields, objects, pages, content, whatever, based on dynamic rules. But that's not all. The current crop of security requires the security logic to be embedded in the code, and it gets quite brittle and complex when security rules become very granular. For example, imagine having to implement a requirement that says: when a trade belongs to account "abc", hide the trade from anyone not in group "abc-allowed". No problem, you say. You create the security group "abc-allowed". Now you have some choices regarding implementation: you can integrate the rule at the data retrieval layer, at the presentation tier, or in the middle. Either way, somewhere in your system, you'll have a chunk of code like this: if ( trade.account == "abc" && !isUserInRole("abc-allowed") ) then hide.

That was easy. Probably only took 10 minutes to write, 10 minutes to test, and a few days to get it deployed to production. No problem.

A few days go by, and the user comes back and says, I need to expand that security. It seems that group efg can actually see abc account trades but only when the trading amount is less than $50m. Ok, you say. A bit messy, but do-able. So, you create security group "efg-allowed", and change your prior rule to say:
if ( trade.account == "abc" && !isUserInRole("abc-allowed") && !( isUserInRole("efg-allowed") && trade.amount < 50 ) ) then hide.

Probably only took 10 minutes to code, and another 10 minutes to test, but then there is QA, UAT, and the production release. A few days later, you finally release the new feature.
Aren't you glad that's over. A few more days go by, and the user says he forgot that the efg group can't change the trader name on the trade, and can't see the counterparty, but should be able to see and change everything else. Oh, one more thing: they can change the trader name if the trader is "Jack", because trader Jack's accounts are actually managed by the efg group even if the account belongs to the "abc" group.

Crap, you say, that's going to be a bit of work. You may need to change the presentation tier to hide the fields in some cases but not others. And boy, how much does it suck to hard-code the trader's name somewhere.

Anyways, you get the point. Security rules may get very complex and very specific to the data they interact with and the context of the request. This means that the rule needs to be aware of the data and who is requesting it. The rule is then capable of setting the security ACL. The presentation tier then only needs to worry about following the ACL rather than actually dealing with the security rules themselves. Not only that, but the security rules will be in a single place rather than being sprinkled throughout the system. You can also change them on the fly, allowing you to react very quickly to additional security requests.
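
To give a flavor of where this is going (the Trade and User facts and the acl helper are illustrative here, not the project's final API), the requirement above could be captured as a Drools rule along these lines:

global com.orangemile.security.AclContext acl // hypothetical helper that collects grants/denies

rule "efg group sees abc trades under 50m, but not the counterparty"
when
$trade : Trade( account == "abc", amount < 50 )
User( roles contains "efg-allowed" )
then
// the rule populates the ACL; the presentation tier only consults the ACL
acl.grant( $trade, "view" );
acl.deny( $trade, "counterparty", "view" );
acl.deny( $trade, "traderName", "edit" );
end

The point is that the "if account is abc and the trader is Jack and..." mess lives in rules that can be edited on the fly, not in the application code.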

How to retrieve the fields used in a Drools Rule (DRL)

Sometimes it may be useful to know what fields a rule set relies on. For example, let's imagine you have a freaky dynamic system that's able to populate beans with only the data needed. The problem then becomes: how do you know what data is needed by your vast set of dynamic rules?

One way to do this is to assume that you're dealing with standard pojos. This means that each variable is private and has associated getVar and setVar methods. Drools currently supports its own language, DRL, Java (backed by the Janino compiler), and MVEL. I will present how to retrieve the fields from DRL and Java. I am sure the same principles can be applied to MVEL.

First, your pojo:

package com.orangemile.ruleengine;

public class Trade {
private String traderName;
private double amount;
private String currency;
public String getTraderName() {
return traderName;
}
public void setTraderName(String traderName) {
this.traderName = traderName;
}
public double getAmount() {
return amount;
}
public void setAmount(double amount) {
this.amount = amount;
}
public String getCurrency() {
return currency;
}
public void setCurrency(String currency) {
this.currency = currency;
}
}


Now the magic:


package com.orangemile.ruleengine;

import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

import org.codehaus.janino.Java;
import org.codehaus.janino.Parser;
import org.codehaus.janino.Scanner;
import org.codehaus.janino.Java.MethodInvocation;
import org.codehaus.janino.util.Traverser;
import org.drools.compiler.DrlParser;
import org.drools.lang.DrlDumper;
import org.drools.lang.descr.EvalDescr;
import org.drools.lang.descr.FieldConstraintDescr;
import org.drools.lang.descr.ImportDescr;
import org.drools.lang.descr.PackageDescr;
import org.drools.lang.descr.PatternDescr;
import org.drools.lang.descr.RuleDescr;

/**
* @author OrangeMile, Inc
*/
public class DRLFieldExtractor extends DrlDumper {

private PackageDescr packageDescr;
private Map<String, Entry> variableNameToEntryMap = new HashMap<String, Entry>();
private List<Entry> entries = new ArrayList<Entry>();
private Entry currentEntry;

public Collection getEntries() {
return entries;
}

/**
* Main Entry point - to retrieve the fields call getEntries()
*/
public String dump( String str ) {
try {
DrlParser parser = new DrlParser();
PackageDescr packageDescr = parser.parse(new StringReader(str));
String ruleText = dump( packageDescr );
return ruleText;
} catch ( Exception e ){
throw new RuntimeException(e);
}
}

/**
* Main Entry point - to retrieve the fields call getEntries()
*/
@Override
public synchronized String dump(PackageDescr packageDescr) {
this.packageDescr = packageDescr;
String ruleText = super.dump(packageDescr);
List<RuleDescr> rules = (List<RuleDescr>) packageDescr.getRules();
for ( RuleDescr rule : rules ) {
evalJava( (String) rule.getConsequence() );
}
return ruleText;
}

/**
* Parses the eval statement
*/
@Override
public void visitEvalDescr(EvalDescr descr) {
evalJava( (String) descr.getContent() );
super.visitEvalDescr(descr);
}

/**
* Retrieves the variable bindings from DRL
*/
@Override
public void visitPatternDescr(PatternDescr descr) {
currentEntry = new Entry();
currentEntry.classType = descr.getObjectType();
currentEntry.variableName = descr.getIdentifier();
variableNameToEntryMap.put(currentEntry.variableName, currentEntry);
entries.add( currentEntry );
super.visitPatternDescr(descr);
}

/**
* Retrieves the field names used in the DRL
*/
@Override
public void visitFieldConstraintDescr(FieldConstraintDescr descr) {
currentEntry.fields.add( descr.getFieldName() );
super.visitFieldConstraintDescr(descr);
}

/**
* Parses out the fields from a chunk of java code
* @param code
*/
@SuppressWarnings("unchecked")
private void evalJava(String code) {
try {
StringBuilder java = new StringBuilder();
List<ImportDescr> imports = (List<ImportDescr>) packageDescr.getImports();
for ( ImportDescr i : imports ) {
java.append(" import ").append( i.getTarget() ).append("; ");
}
java.append("public class Test { ");
java.append(" static {");
for ( Entry e : variableNameToEntryMap.values() ) {
java.append( e.classType ).append(" ").append( e.variableName ).append(" = null; ");
}
java.append(code).append("; } ");
java.append("}");
Traverser traverser = new Traverser() {
@Override
public void traverseMethodInvocation(MethodInvocation mi) {
// skip anything that is not a no-arg getter invoked on a bound variable
if ((mi.arguments != null && mi.arguments.length > 0)
|| !mi.methodName.startsWith("get") || mi.optionalTarget == null) {
super.traverseMethodInvocation(mi);
return;
}
Entry entry = variableNameToEntryMap.get(mi.optionalTarget.toString());
if ( entry != null ) {
String fieldName = mi.methodName.substring("get".length());
fieldName = Character.toLowerCase(fieldName.charAt(0)) + fieldName.substring(1);
entry.fields.add( fieldName );
}
super.traverseMethodInvocation(mi);
}
};
System.out.println( java );
StringReader reader = new StringReader(java.toString());
Parser parser = new Parser(new Scanner(null, reader));
Java.CompilationUnit cu = parser.parseCompilationUnit();
traverser.traverseCompilationUnit(cu);
} catch (Exception e) {
throw new RuntimeException(e);
}
}


/**
* Utility storage class
*/
public static class Entry {
public String variableName;
public String classType;
public HashSet fields = new HashSet();

public String toString() {
return "[variableName: " + variableName + ", classType: " + classType + ", fields: " + fields + "]";
}
}
}



And now, how to run it:


public static void main( String args [] ) {
String rule = "package com.orangemile.ruleengine;" +
" import com.orangemile.ruleengine.*; " +
" rule \"test rule\" " +
" when " +
" trade : Trade( amount > 5 ) " +
" then " +
" System.out.println( trade.getTraderName() ); " +
" end ";

DRLFieldExtractor e = new DRLFieldExtractor();
e.dump(rule);
System.out.println( e.getEntries() );
}



The basic principle is that the code relies on the AST produced by DRL and Janino. In the case of the Janino walk, the code only looks for method calls that have a target, start with "get", and take no arguments. In the case of DRL, the API is helpful enough to provide callbacks when a variable declaration or a field is hit, making the code trivial.

That's it. Hope this helps someone.

Wednesday, July 16, 2008

Drools - Fact Template Example

The JBoss Rule Engine ( Drools ) primarily works off an object model. In order to define a rule and have it compile, the data referenced in the rule needs to exist somewhere on the classpath. This is easy enough to accomplish by using any of the dynamic bytecode libraries such as ASM, CGLIB, or Javassist. Once your class is defined, you can either inject your own implementation of a ClassLoader or change the permissions on the system classloader and call defineClass manually.
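
For completeness, a minimal sketch of the classloader trick (plain Java, nothing Drools-specific; the bytecode itself would come from whatever generation library you pick):

/**
 * Minimal sketch: expose defineClass so bytecode generated at runtime
 * (e.g. by ASM or CGLIB) can be loaded and then referenced by compiled rules.
 */
public class DynamicClassLoader extends ClassLoader {
    public DynamicClassLoader(ClassLoader parent) {
        super(parent);
    }

    public Class<?> define(String className, byte[] bytecode) {
        // defineClass is protected on ClassLoader; this subclass simply exposes it
        return defineClass(className, bytecode, 0, bytecode.length);
    }
}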

But there is another way, which is a bit oversimplistic, but may be useful for some of you out there. Drools has introduced support for fact templates, a concept borrowed from CLIPS. A fact template is basically the definition of a flat class:


template "Trade"
String tradeId
Double amount
String cusip
String traderName
end


This template can then be naturally used in the when part of a rule:

rule "test rule"
when
$trade : Trade(tradeId == 5 )
then
System.out.println( $trade.getFieldValue("traderName") );
end


But, there is a cleaner way to do all of this using the MVEL dialect introduced in Drools 4.0.
You can code your own Fact implementation that's backed by a Map.

package com.orangemile.ruleengine;

import java.util.HashMap;
import java.util.concurrent.atomic.AtomicLong;

import org.drools.facttemplates.Fact;
import org.drools.facttemplates.FactTemplate;
import org.drools.facttemplates.FieldTemplate;

/**
* @author OrangeMile, Inc
*/
public class HashMapFactImpl extends HashMap implements Fact {

private static AtomicLong staticFactId = new AtomicLong();

private FactTemplate factTemplate;
private long factId;

public HashMapFactImpl( FactTemplate factTemplate ) {
factId = staticFactId.addAndGet(1);
this.factTemplate = factTemplate;
}

@Override
public long getFactId() {
return factId;
}

@Override
public FactTemplate getFactTemplate() {
return factTemplate;
}

@Override
public Object getFieldValue(int index) {
FieldTemplate field = factTemplate.getFieldTemplate(index);
return get(field.getName());
}

@Override
public Object getFieldValue(String key) {
return get(key);
}

@Override
public void setFieldValue(int index, Object value) {
FieldTemplate field = factTemplate.getFieldTemplate(index);
put( field.getName(), value );
}

@Override
public void setFieldValue(String key, Object value) {
put(key, value);
}
}


To use this class, you would then do this:

String rule = "package com.orangemile.ruleengine.test;" +
" template \"Trade\" " +
" String traderName " +
" int id " +
" end " +
" rule \"test rule\" " +
" dialect \"mvel\" " +
" when " +
" $trade : Trade( id == 5 ) " +
" then " +
" System.out.println( $trade.traderName ); " +
" end ";

MVELDialectConfiguration dialect = new MVELDialectConfiguration();
PackageBuilderConfiguration conf = dialect.getPackageBuilderConfiguration();
PackageBuilder builder = new PackageBuilder(conf);
builder.addPackageFromDrl(new StringReader(rule));
org.drools.rule.Package pkg = builder.getPackage();
RuleBase ruleBase = RuleBaseFactory.newRuleBase();
ruleBase.addPackage(pkg);

HashMapFactImpl trade = new HashMapFactImpl(pkg.getFactTemplate("Trade"));
trade.put("traderName", "Bob Dole");
trade.put("id", 5);

StatefulSession session = ruleBase.newStatefulSession();
session.insert(trade);
session.fireAllRules();
session.dispose();



Notice that in the then clause, to output the traderName, the syntax is:
$trade.traderName
rather than the cumbersome:
$trade.getFieldValue("traderName")

What makes this possible is that the Fact is backed by a Map and the dialect is MVEL, which supports this type of operation when the map keys are strings.

The interesting thing about using a fact template is that it makes it easy to perform lazy variable resolution. You may extend the above HashMapFactImpl to add field resolvers that contain specific logic to retrieve field values. To do this with an object tree, especially with dynamic objects, would require either intercepting the call that retrieves the field via AOP and injecting the appropriate lazy value, or setting the value to a dynamic proxy which performs the lazy retrieval once triggered. In either case, this simple fact template solution may be all that you need.
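
As a minimal sketch of that extension (the FieldResolver interface is invented for this example; it is not a Drools API):

package com.orangemile.ruleengine;

import java.util.HashMap;
import java.util.Map;

import org.drools.facttemplates.FactTemplate;

/**
 * Sketch of lazy field resolution on top of the map-backed fact above.
 * A resolver is consulted only when a field is first requested by a rule.
 */
public class LazyHashMapFactImpl extends HashMapFactImpl {

    /** Knows how to fetch one field's value on demand (e.g. from a DAO). */
    public interface FieldResolver {
        Object resolve(String fieldName);
    }

    private final Map<String, FieldResolver> resolvers = new HashMap<String, FieldResolver>();

    public LazyHashMapFactImpl(FactTemplate factTemplate) {
        super(factTemplate);
    }

    public void addResolver(String fieldName, FieldResolver resolver) {
        resolvers.put(fieldName, resolver);
    }

    @Override
    public Object getFieldValue(String key) {
        Object value = super.getFieldValue(key);
        if (value == null && resolvers.containsKey(key)) {
            value = resolvers.get(key).resolve(key);   // fetch lazily on first access
            setFieldValue(key, value);                 // cache for subsequent evaluations
        }
        return value;
    }
}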

Thursday, June 12, 2008

Corticon Rule Engine Review

I've recently had an opportunity to give Corticon a test run. Corticon consists of a desktop rule studio and a server component that runs the rules. The server component can be embedded inside a Java application or run as a standalone server. The studio is used to create a vocabulary and the rule sets.

The approach Corticon uses is quite novel in comparison to the other rule engine vendors. They don't use the Rete algorithm; instead they rely on compile-time rule linking. This means that the rules are deployed to the server already compiled, and the rule firing order is already calculated. The concept is undoubtedly more palatable to some organizations that find traditional rule engines a little unyielding.

The other novel idea is that the rules are compiled against a vocabulary (an ontology, if you will) rather than an object model. This means that you can submit dynamic datasets to the rule engine rather than relying on strict structures, as is the case with JBoss Rules. It also seems that Corticon has licensed the Oracle XML SDK, which allows them to query the database, retrieve the result in XML, pipe it to the rule engine, and produce a result, all without any custom code. The actual rule writing and vocabulary management occur in the desktop rule studio. The studio gives the user a decision-tree type of interface, but with some enhancements. The rule writing is done in a 4th generation language, meaning that the rule author uses the vocabulary, boolean algebra, truth tables, discrete-math-style programming, "table-driven algorithm programming". It's quite nifty, but it comes off as a little limiting to a hard-core developer.

And now for the negative:
In order to extend the language, you need to write java code, and then add it to the classpath of the desktop rule studio. This means that if you distributed the desktop rule studio to your business analyst, now you have to redistribute your jar file. You will need to redistribute a new jar file every time you extend the library with a new function. This is almost impossible to do in large organizations with dumb terminals, software packaging, etc...

The actual desktop rule studio is clunky and is missing some basic keyboard shortcuts, like the "delete" key. There is also no security of any type. The rule author is able to change the vocabulary and any rules. The actual 4GL language at times seems more confusing than its 3GL counterpart.

For the developer, Corticon is a disaster. The API library consists of a handful of classes: one to create a server and run the rules, another to deploy the rules to the server, and yet another to do some admin RMI calls to the desktop rule studio. Mind you, RMI calls for an enterprise-level application are a little laughable. The functionality of the RMI calls is also a little silly: you can start the studio, shut it down, and open a vocabulary. In comparison, ILOG has an immense API library. I would wager that you can pretty much control any aspect of ILOG. And of course, JBoss Rules is open source, which makes it quite friendly to developers.

And yet more negativity:
There is no way to control the rule firings. There is also no way to disable certain rules from firing, or to manage the rules in a central location. The mere fact that the rules are compiled in the desktop rule studio by the rule author and written out as a binary file makes any form of enterprise rule management a joke. A nice web 2.0 GUI for rule management, like with Drools or ILOG, would also be nice. Why is it that I need a desktop app to manage my rules?

In summary, I think Corticon is quite a nifty concept, but it is not a mature enterprise framework. It may be useful in some limited fashion as glue logic, but it doesn't belong as an enterprise-class rule engine. At least, not yet.

An addendum: Tibco iProcess has licensed the Corticon rule engine for its decision server. This does not change my opinion of Corticon, but it reduces my opinion of Tibco. As a side topic, I think Tibco BusinessWorks is an impressive tool, but the entire iProcess stack is quite bad and convoluted. I think Tibco just wanted to put something out quickly, so they pieced together some varying half-finished technologies.

Sunday, May 11, 2008

J2ee, Jee, EJB 2 and 3, Spring, Weblogic,

Bla, bla, bla, bla. First, I'd like to say that I hate EJB 2, and I'm starting to seriously dislike EJB 3. J2EE, and the stupid rebranding to JEE. There are many things wrong with EJB 2. One of them is the security model, another is the QL language, another is the retarded persistence approach. The only gain, and even that's a stretch, would be the transaction management. But even there, the EJB authors completely f'd up and created a convoluted model.

EJBs, originally, were supposed to save the day: usher in a day where corporate code monkeys could get a bit dumber, focus on the "business logic", and forget all that complicated plumbing like transaction management and threading. Threading, ha, said the EJB spec writers. No threads needed. The container will take care of that for you. Don't worry your pretty little head, little developer, just go play in your little sandbox with your pretty little business logic. Well, many developers did just that. And in time, they realized that not only did EJBs complicate everything, but you needed EJB experts just to support the frankenstein monstrosity that got produced.

Somewhere around there Spring dawned, and flowers bloomed. Now, Rod had the right idea, but the implementation lacked. Spring has proved to be another monster. The issue really is in the complexity. Getting any value from Spring, or even understanding how Spring works, requires deep internal knowledge, something that both Spring and EJB fail to mention. Spring has an insane dependency on configuration, to such a degree as to make the code unreadable. A developer now has to funnel through massive amounts of XML config, plus massive amounts of useless interfaces, only to find some impl class that's AOP'd into the damn DAO. Now, don't get me wrong, all you Spring wankers, I am more than aware of all the benefits of IoC and AOP and unit testing. I know that's what's running through your head. Heck, he doesn't understand proper test-driven development, bla, bla, bla. How dare he criticize that which is holy and has saved the day. Well, I don't like it. It replaces something really bad, I agree with that. I believe in simplicity of design and readability of code. I agree with the value, in principle, of AOP and IoC, but I wonder if there is a better and cleaner way to achieve some of the same things Spring set out to do.

The EJB 3.0 persistence part is yet another over-arching bit of spec-iness from the writers. It's basically useless for anything slightly larger than a pet-store website. The goals are definitely admirable, but they must have known how far off the target they would actually hit. They are attempting to map pojos onto a relational table structure. What they don't tell you is that it's impossible to accomplish without seriously hampering the design of your relational model. Now, if you had an object database, perhaps I wouldn't be saying that. Perhaps a more valid attempt at this will come from the space of semantic networks, and companies like Business Objects with the concept of a universe design, which provides a layer above the raw data.

To return to EJB and Spring bashing, I disagree with the basic notion of the goals. Each attempts to reduce how much a developer needs to know. Of course the reason for that is to dumb down the developer, replace him/her with a body from a cheap foreign country, and keep on churning code. A different model would be to replace the army of code monkeys with a few diligent developers, but move the responsibility of business development to the BUSINESS. At the end of the day, no one knows the end goal except the business. And no developer will ever be better than the business user at solving the business problem. Now, a lot of developers try, and they are usually bad developers. So, they are bad at business, and they are bad at programming. And yes, they need yet another framework to keep them from hurting themselves. I think we should focus our attention on reducing our field to a few competent experts who deserve the title of developer, and who focus on the technical development that enables the business users to do the thing that most developers do today: business coding.

A better compression algorithm

I am presented with an interesting problem. Usually, my employer burns money indiscriminately, but lately, with the market in a tailspin, all costs are being evaluated. To avoid being one of those costs, I need to find a way to save money for the company. One of those ways is file storage space for document management systems. Unlike the basic $50, 100GB drives that you buy at Circuit City for your Dell, corporate disk storage is highly expensive, with EMC SRDF storage running around $1m for 1TB.

Audit and regulatory rules require that basically all files are kept. A large number of those files are data feeds from external systems. The files are structured and are in a readable format such as fixed length, delimited, or XML. My idea, which is not that unique, is to apply a heuristic compression algorithm to the data files. I am going to leverage the work done by the FIX Protocol committee on the FAST specification, which defines a number of optimal heuristic encoding schemes. FAST defines a compression algorithm for market data, but the same principles apply to file storage.

http://www.fixprotocol.org/fast

The concept is quite interesting. The compression algorithm basically attempts to find data patterns in the file and encode them away. Let's say you have a column that's an incrementing number: 1, 2, 3, ... n, n+1. The encoder will identify that this is an incrementing column and encode it as algo: { previous + 1, starting with 0 }. We've just encoded away an entire column, and it took no space to do it. Let's try another example: abcdefg, abcdefe, abcdefn, abcdef5, etc. In this case, the first "abcdef" is the same in all the rows and only the last character changes. We can encode the prefix as a constant and only send the last character: g, e, n, 5, etc.
There are a lot more sophisticated algorithms defined in the FAST protocol, but you get the idea.
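
A toy sketch of the two operators above, just to make the idea concrete (my own illustration, not the FAST reference implementation or wire format):

import java.util.ArrayList;
import java.util.List;

/**
 * Toy illustration of two FAST-style field operators: "increment" (store
 * nothing when the value is previous + 1) and "constant prefix" (store the
 * prefix once, then only the changing tail).
 */
public class ColumnEncoderSketch {

    /** Incrementing column: only deviations from previous+1 need to be stored. */
    static List<String> encodeIncrement(long[] column) {
        List<String> encoded = new ArrayList<String>();
        long previous = -1;
        for (long value : column) {
            encoded.add(value == previous + 1 ? "" : Long.toString(value));
            previous = value;
        }
        return encoded;   // an all-empty list means the whole column compressed away
    }

    /** Column sharing a common prefix: store the prefix once, then only the tails. */
    static List<String> encodeCommonPrefix(String[] column, String prefix) {
        List<String> encoded = new ArrayList<String>();
        encoded.add(prefix);                              // stored once, in the template
        for (String value : column) {
            encoded.add(value.startsWith(prefix)
                    ? value.substring(prefix.length())    // usual case: just the tail
                    : value);                             // fallback: keep the full value
        }
        return encoded;
    }

    public static void main(String[] args) {
        System.out.println(encodeIncrement(new long[] { 1, 2, 3, 4, 5 }));
        System.out.println(encodeCommonPrefix(
                new String[] { "abcdefg", "abcdefe", "abcdefn", "abcdef5" }, "abcdef"));
    }
}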

The data in the file starts to mean something. The encoder actually attempts to represent the patterns present in the file. The patterns have the potential to save a lot more space than a traditional compression algorithm based on Huffman encoding. How much space? How about an average case of > 80%, compared with a best case of 40% for ZIP. And don't forget, the result can still be zipped.

The program will read a file, scan through all the data points, figure out the optimal encoding algorithm, and then actually do the compression. The encoding algorithm will be needed to decompress the file. The first field in the file will carry the number of bytes needed for the encoding algorithm, followed by the encoding algorithm itself, and finally the data. This allows us to store the encoding scheme with the file.
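
The container layout itself can be as simple as a length-prefixed header (again, a sketch of my intended format rather than anything standard):

import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

/**
 * Sketch of the proposed container: [scheme length][encoding scheme][encoded data],
 * so the file carries enough information to decode itself.
 */
public class EncodedFileWriter {
    public static void write(String path, byte[] encodingScheme, byte[] encodedData) throws IOException {
        DataOutputStream out = new DataOutputStream(new FileOutputStream(path));
        try {
            out.writeInt(encodingScheme.length);  // first field: bytes needed for the encoding algorithm
            out.write(encodingScheme);            // the encoding algorithm / template itself
            out.write(encodedData);               // finally, the encoded data
        } finally {
            out.close();
        }
    }
}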

One enhancement to FAST would be to allow the pre-processor to re-arrange the file. Data optimization is mostly based on previous records, so the more similar subsequent entries are, the higher the compression rate. Another enhancement may be to bitmap away typed fields. If a million-entry file has 100 unique types, it might be more optimal to encode the bitmap separately, and then encode away the type id. Another extension may be to see whether a correlation exists between fields rather than between subsequent records.

Another extension to this architecture is to write the files in a way that improves lookup cost: index the files, and provide an intuitive UI for the user to jump to the needed entry.

I have high hopes for this algorithm. If it can really encode away 90% of the file, then the space savings just might save my job. Well, at least until the next round of cost cutting.

Wednesday, February 13, 2008

Business Intelligence

Imagine a world where the people that own the data, actually have access to it. Sounds obvious, but think about it. Unless the business user is also the developer, this is never the case.

Your system users own the data that's in your system. Unfortunately for them, your system is also what's keeping them from their data. Every little thing they need has to go through you. The irony is that all you care about is serving the business user.

So, what are the parts of the problem? We have data that has some logical structure and some semantic definition to that structure. The business user implicitly understands the semantic definition and wants to exploit it to its full potential. A semantic definition will define relationships such as: a cusip is an attribute of a stock, quantity is an attribute of a trade, stocks are traded on exchanges, etc. Now, the user wants to aggregate quantity by cusip across all trades within an exchange. Fine, you say. You go off and come back a few days later with a beautiful brand new report. Great, the user says, now I want to see the average quantity traded. Well, off you go again, etc.

So, the data has some semantic definition. The same semantic definition exists in the user's head. The user exploits the structure. This is analytics. The user should be able to manipulate the data with the only constraint being the semantic definition. At the moment, this space is filled by cube technology on the data warehouse side and Business Objects on the relational side. The only real difference between BO and cube technology is the size of the dataset: cubes are pre-aggregated while BO is real-time SQL. It would be interesting to link cube technology to BO for drill-through. So, once you have the data, you pump it into a rich visualization component. But be careful not to link the visualization with the data. Each piece of technology is independent, but has a well-defined interface describing how to leverage it. The visualization component can receive data. So, now we have our analytics and visualization. The next part is to take both pieces and generate a static report that can be presented to senior management. This report can be saved, or automatically updated with new data; archived daily, quarterly, etc.

So, not too bad. But I also want to understand how good my data is. I want to understand the integrity of the data at the lowest level. I need to know the story of every data point. This is where rule engines come into play. The user will define the rules that validate integrity. The trick is to have the rule engine tell you how good or bad your data is at any aggregation level. The data isn't discarded, just measured.
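
As a tiny sketch of what I mean (the Trade fact and the qualityScore accumulator are made-up names, not any product's API), an integrity rule would score the data rather than reject it:

global com.orangemile.bi.QualityScore qualityScore // hypothetical accumulator that can roll up by any dimension

rule "trade quantity must be positive"
when
$trade : Trade( quantity <= 0 )
then
// don't discard the record - just measure how good or bad the data is
qualityScore.flag( $trade, "non-positive quantity" );
end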

So far, the user has the data, can analyze it, visualize it, knows its quality, and can report on it. The next step is to manipulate it. A lot of times, analytics takes the flavor of what-if analysis. The user should be able to locally modify any data point, analyze the impact, visualize, report, etc.

Well, are we done? Have we satisfied everything the user wants? No. No. No. Now that you have some analysis, you need to act on it. The data that you derived has some attributes which, via rules, can be mapped to certain actions. One action can be to feed it into another system for further enrichment.

Are we done now? Damn it, no. Once you have all this, you can take the data to the next step. You can mine the data for patterns. The patterns can then feed back to calibrate the data integrity rules.

As the user analyzes the data, the system watches the user. The more analysis done, the more the system can understand the user's intent. At this point, the system can start to infer what the user is trying to do. Now, we are starting to take the flavor of having the system solve equations and then acting on the outcome.

Think about it, but think about it in a context of a massively large data repository, and a Wall Street type firm.

In the interest of buying a solution, I present a vendor list:
Microsoft Analysis Services
Business Objects Web Intelligence
Panorama
Microsoft Sharepoint
Microsoft Excel 2007 (has a lot of cube technology)
Business Objects Dashboard + Reporting + etc...
ILog Jrules