Thursday, July 24, 2008

Let me count the ways... I HATE Spring Security ACL

Elizabeth Barrett Browning wrote a love poem, "How do I love thee? Let me count the ways..." It's a lovely poem, but I am forced to use the same core phrase with a very negative connotation, and to apply it to technology, which "some" believe has no soul. More on that in a different post.

Now, let me count the ways in which I hate the Spring Security ACL implementation. In any other setting, I would have written this off as some poor wanking by some poor wanker, but unfortunately, in my prior post, I vowed to add property-based security via a rule engine as an add-on for Spring Security. What I failed to realize at that writing is that Spring Security is split into two sections. The core security section, which has things like app server plugins, role and principal management, etc., seems decent enough. Perhaps a bit configuration heavy, but hey, that's Spring for ya. Now, this other section, the ACL section, is a complete and utter fuckup. The irony is that this is a rewrite of an even worse implementation.

Now, listen you Spring theists:
Why create an ObjectIdentity interface that wraps a serializable identifier, and then implement an ObjectIdentityImpl, only to cast the serializable identifier to a Long in both the BasicLookupStrategy and the JdbcMutableAclService? As a side note, keep with the fucking naming convention. If you're going to prefix all the db accessors with Jdbc, then why name the jdbc lookup class BasicLookupStrategy? And oh yeah, what's the point of the LookupStrategy pattern considering that you already have a lookup strategy called MutableAclService, which has a Jdbc accessor called JdbcMutableAclService?

So, even if I extend the ObjectIdentity and add support for property management, the implementation will go to hell if someone decides to use any of the persistence classes. Oh, almost forgot: for all the bloody abstraction and interfaces, the jdbc lookup strategy accepts an ObjectIdentity, yet performs a direct instantiation of ObjectIdentityImpl, with a Long as the serializable id. So, there goes the ability to extend the class, or to define anything but a Long as an identifier. So, what's the point of creating the ObjectIdentity interface? And what's the point of making the identifier serializable?
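To make the complaint concrete, here is a rough paraphrase of the shape of the problem. This is my own illustration, not the actual Spring Security source: the interface promises any Serializable identifier, while the lookup code quietly assumes a Long.

import java.io.Serializable;

// Rough paraphrase of the complaint; illustrative only, not the Spring Security source.
interface ObjectIdentity {
    Serializable getIdentifier();   // the contract: any Serializable identifier
}

class LookupIllustration {
    void lookup(ObjectIdentity oid) {
        // ...while the persistence code quietly casts, so only Long ever works;
        // a String or UUID identifier dies with a ClassCastException at runtime
        Long id = (Long) oid.getIdentifier();
        System.out.println("select ... from acl_object_identity where object_id_identity = " + id);
    }
}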

Ah, there is support for an Acl tree via parent/child Acls. I could create a parent Acl to represent the object, and then subsequent children for each of the properties, but the damn ObjectIdentity cast to a Long kills that as well.

What would be quite nice is to add property-level support directly to the Access Control Entry. Of course, there is an interface, an implementation, and supporting classes that require the implementation, making for another useless interface. What's needed here is a factory pattern.
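As a sketch of the idea (a hypothetical interface of mine, nothing that exists in Spring Security), a property-aware entry only needs to carry the property name next to the permission mask, plus a factory so the persistence code builds entries through an interface rather than new-ing a concrete impl:

// Hypothetical sketch of a property-aware ACL entry and its factory;
// none of this exists in Spring Security.
public interface PropertyAccessControlEntry {

    String getPropertyName();   // e.g. "traderName"; null could mean the whole object
    int getPermissionMask();    // bit mask of rights for this single property

    interface Factory {
        PropertyAccessControlEntry create(String propertyName, int permissionMask);
    }
}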

I am sorry I am angry. I've been reading Buddhist books lately, and they teach you to channel your anger, understand its source, and manage your emotions, so as to balance the negative and positive of karma. The problem is that all this is going to force me to break from the Acl implementation in Spring, which would mean yet another Acl implementation with a subset of the feature set. Spring, for all its problems, seems to provide a large feature set, and if at all possible, I prefer to enhance rather than replace.

Ok, back to Spring Security ACL bashing. The Acl interface and the AclImpl class are capable of encompassing the entire Sid structure. So, if I have 10k users, then my poor little Acl class will start to look like an ACL cache rather than the simple pojo it was meant to be. What the ACL object should be is a representation of an object, which has properties, and is an instance of security for a single Sid. I highly disagree that a single Acl needs to support multiple Sids. Granted, your approach is more flexible, but flexible to the point that there will be a single ACL class in the system, with a large array of all permissions. An Acl is not a cache; it's a simple wrapper around what a single user/principal/granted authority has access to for the given object.

The ACL Entry is actually supposed to be a wrapper around a property and a permission mask. That's the whole point of having a permission mask. A mask is an int, which means that you have a single integer (all those bits) representing all the possible access control rights for a single property of a single object. The beauty of adding property support is that you're no longer limited to 31 possible permissions, but rather unlimited, with a limit of 31 per property of an object. This means that you can conceivably have different rights per object attribute. And we all know that some objects have a lot more than 32 attributes. So, if you just wrapped the permission mask in an ACL Entry class, then what was the point of an ACL Entry class? You could simply collapse the whole structure into the ACL class and be done with it.
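A minimal sketch of the mask arithmetic, with one int per property; the class and constants are my own, not Spring Security API:

import java.util.HashMap;
import java.util.Map;

// Minimal sketch of per-property permission masks; hypothetical names, not Spring's.
public class PropertyPermissions {

    public static final int READ  = 1;       // bit 0
    public static final int WRITE = 1 << 1;  // bit 1
    public static final int ADMIN = 1 << 2;  // bit 2

    public static void main(String[] args) {
        // one mask per property of the object, instead of one mask per object
        Map<String, Integer> masks = new HashMap<String, Integer>();
        masks.put("traderName", READ | WRITE);
        masks.put("counterparty", READ);

        int counterpartyMask = masks.get("counterparty");
        boolean canWrite = (counterpartyMask & WRITE) != 0;  // false: read-only property
        System.out.println("can write counterparty: " + canWrite);
    }
}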

Deep breaths. I was reading another blog, which was talking about another blog that mentioned that "Every time you use Acegi... a fairy dies." My daughter loves fairies.

Saturday, July 19, 2008

Drools + Spring Security + Annotations + AOP= ?

I am starting a new open source project:

http://code.google.com/p/dynamic-rule-security/

No code has been released yet, but I am hoping to have an alpha version out soon. The project integrates Drools Rule Engine with Spring Security to provide dynamic, rule based, field level ACL security to a system.

Once complete, the system administrator will be able to create business rules to restrict fields, objects, pages, content, whatever, based on dynamic rules. But that's not all. The current crop of security requires the security logic to be embedded in the code, and it gets quite brittle and complex when security rules become very granular. For example, imagine having to implement a requirement that says: when a trade belongs to account "abc", hide the trade from anyone not in group "abc-allowed". No problem, you say. You create the security group "abc-allowed". Now you have some choices regarding implementation: you can integrate the rule at the data retrieval layer, at the presentation tier, or in the middle. Whichever way you choose, somewhere in your system you'll have a chunk of code like this: if ( trade.account == "abc" && !isUserInRole("abc-allowed") ) then hide.

That was easy. Probably only took 10 minutes to write, 10 minutes to test, and a few days to get it deployed to production. No problem.

A few days go by, and the user comes back and says: I need to expand that security. It seems that group efg can actually see abc account trades, but only when the trading amount is less than $50m. Ok, you say. A bit messy, but do-able. So, you create security group "efg-allowed", and change your prior rule to say:
if ( trade.account == "abc" && !isUserInRole("abc-allowed") && !( trade.amount < 50 && isUserInRole("efg-allowed") ) ) then hide.

Probably only took 10 minutes to code and another 10 minutes to test, but damn, there is QA, UAT, and the production release. A few days later, you finally release the new feature.
Aren't you glad that's over? A few more days go by, and the user says: wait, he forgot that the efg group can't change the trader name on the trade, and can't see the counterparty, but should be able to see and change everything else. Oh, one more thing: they can change the trader name if the trader is "Jack", because trader Jack's accounts are actually managed by the efg group even if the account belongs to the "abc" group.

Crap, you say, that's going to be a bit of work. You may need to change the presentation tier to hide the fields in some cases but not others. And boy, how much does it suck to hard-code the trader's name somewhere?

Anyways, you get the point. Security rules may get very complex and very specific to the data they interact with and the context of the request. This means that the rule needs to be aware of the data and of who is requesting it. The rule is then capable of setting the security ACL. The presentation tier only needs to worry about following the ACL rather than actually dealing with the security rules themselves. Not only that, but the security rules will live in a single place rather than being sprinkled throughout the system. You can also change them on the fly, allowing you to react very quickly to additional security requests.
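To give a flavor of where this is headed, the first version of the trade requirement might read something like the DRL sketch below. This is a hypothetical sketch: Trade, User, and AclDecision are placeholder fact classes, not released project code.

rule "hide abc trades from non abc-allowed users"
when
    $user : User( roles not contains "abc-allowed" )
    $trade : Trade( account == "abc" )
then
    // record the decision as an ACL fact; the presentation tier just follows the ACL
    insert( new AclDecision( $user, $trade, "HIDE" ) );
end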

How to retrieve the fields used in a Drools Rule (DRL)

Sometimes it may be useful to know what fields a rule set relies on. For example, let's imagine you have a freaky dynamic system that's able to populate beans with only the data needed. The problem then becomes: how do you know what data is needed by your vast set of dynamic rules?

One way to do this is to assume that you're dealing with standard pojos. This means that each variable is private and has associated getVar and setVar methods. Drools currently supports its own language, DRL, Java (backed by the Janino compiler), and MVEL. I will show how to retrieve the fields from DRL and Java. I am sure the same principles can be applied to MVEL.

First, your pojo:

package com.orangemile.ruleengine;

public class Trade {
    private String traderName;
    private double amount;
    private String currency;

    public String getTraderName() {
        return traderName;
    }
    public void setTraderName(String traderName) {
        this.traderName = traderName;
    }
    public double getAmount() {
        return amount;
    }
    public void setAmount(double amount) {
        this.amount = amount;
    }
    public String getCurrency() {
        return currency;
    }
    public void setCurrency(String currency) {
        this.currency = currency;
    }
}


Now the magic:


package com.orangemile.ruleengine;

import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

import org.codehaus.janino.Java;
import org.codehaus.janino.Parser;
import org.codehaus.janino.Scanner;
import org.codehaus.janino.Java.MethodInvocation;
import org.codehaus.janino.util.Traverser;
import org.drools.compiler.DrlParser;
import org.drools.lang.DrlDumper;
import org.drools.lang.descr.EvalDescr;
import org.drools.lang.descr.FieldConstraintDescr;
import org.drools.lang.descr.ImportDescr;
import org.drools.lang.descr.PackageDescr;
import org.drools.lang.descr.PatternDescr;
import org.drools.lang.descr.RuleDescr;

/**
 * @author OrangeMile, Inc
 */
public class DRLFieldExtractor extends DrlDumper {

    private PackageDescr packageDescr;
    private Map<String, Entry> variableNameToEntryMap = new HashMap<String, Entry>();
    private List<Entry> entries = new ArrayList<Entry>();
    private Entry currentEntry;

    public Collection<Entry> getEntries() {
        return entries;
    }

    /**
     * Main entry point - to retrieve the fields call getEntries()
     */
    public String dump(String str) {
        try {
            DrlParser parser = new DrlParser();
            PackageDescr packageDescr = parser.parse(new StringReader(str));
            String ruleText = dump(packageDescr);
            return ruleText;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /**
     * Main entry point - to retrieve the fields call getEntries()
     */
    @Override
    @SuppressWarnings("unchecked")
    public synchronized String dump(PackageDescr packageDescr) {
        this.packageDescr = packageDescr;
        String ruleText = super.dump(packageDescr);
        List<RuleDescr> rules = (List<RuleDescr>) packageDescr.getRules();
        for (RuleDescr rule : rules) {
            evalJava((String) rule.getConsequence());
        }
        return ruleText;
    }

    /**
     * Parses the eval statement
     */
    @Override
    public void visitEvalDescr(EvalDescr descr) {
        evalJava((String) descr.getContent());
        super.visitEvalDescr(descr);
    }

    /**
     * Retrieves the variable bindings from the DRL
     */
    @Override
    public void visitPatternDescr(PatternDescr descr) {
        currentEntry = new Entry();
        currentEntry.classType = descr.getObjectType();
        currentEntry.variableName = descr.getIdentifier();
        variableNameToEntryMap.put(currentEntry.variableName, currentEntry);
        entries.add(currentEntry);
        super.visitPatternDescr(descr);
    }

    /**
     * Retrieves the field names used in the DRL
     */
    @Override
    public void visitFieldConstraintDescr(FieldConstraintDescr descr) {
        currentEntry.fields.add(descr.getFieldName());
        super.visitFieldConstraintDescr(descr);
    }

    /**
     * Parses out the fields from a chunk of java code
     * @param code
     */
    @SuppressWarnings("unchecked")
    private void evalJava(String code) {
        try {
            // wrap the consequence code in a dummy class so Janino can parse it,
            // declaring each DRL-bound variable so the snippet compiles as plain java
            StringBuilder java = new StringBuilder();
            List<ImportDescr> imports = (List<ImportDescr>) packageDescr.getImports();
            for (ImportDescr i : imports) {
                java.append(" import ").append(i.getTarget()).append("; ");
            }
            java.append("public class Test { ");
            java.append(" static {");
            for (Entry e : variableNameToEntryMap.values()) {
                java.append(e.classType).append(" ").append(e.variableName).append(" = null; ");
            }
            java.append(code).append("; } ");
            java.append("}");

            Traverser traverser = new Traverser() {
                @Override
                public void traverseMethodInvocation(MethodInvocation mi) {
                    // only no-arg getters invoked on a known DRL-bound variable are of interest
                    if ((mi.arguments != null && mi.arguments.length > 0)
                            || !mi.methodName.startsWith("get") || mi.optionalTarget == null) {
                        super.traverseMethodInvocation(mi);
                        return;
                    }
                    Entry entry = variableNameToEntryMap.get(mi.optionalTarget.toString());
                    if (entry != null) {
                        // getTraderName -> traderName
                        String fieldName = mi.methodName.substring("get".length());
                        fieldName = Character.toLowerCase(fieldName.charAt(0)) + fieldName.substring(1);
                        entry.fields.add(fieldName);
                    }
                    super.traverseMethodInvocation(mi);
                }
            };

            System.out.println(java); // debug: show the generated wrapper class
            StringReader reader = new StringReader(java.toString());
            Parser parser = new Parser(new Scanner(null, reader));
            Java.CompilationUnit cu = parser.parseCompilationUnit();
            traverser.traverseCompilationUnit(cu);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /**
     * Utility storage class
     */
    public static class Entry {
        public String variableName;
        public String classType;
        public HashSet<String> fields = new HashSet<String>();

        public String toString() {
            return "[variableName: " + variableName + ", classType: " + classType + ", fields: " + fields + "]";
        }
    }
}



And now, how to run it:


public static void main(String[] args) {
    String rule = "package com.orangemile.ruleengine;" +
            " import com.orangemile.ruleengine.*; " +
            " rule \"test rule\" " +
            " when " +
            "     trade : Trade( amount > 5 ) " +
            " then " +
            "     System.out.println( trade.getTraderName() ); " +
            " end ";

    DRLFieldExtractor e = new DRLFieldExtractor();
    e.dump(rule);
    System.out.println(e.getEntries());
}



The basic principle is that the code relies on the AST produced by DRL and Janino. In the case of the Janino walk, the code only looks for method calls that have a target, start with "get", and take no arguments. In the case of DRL, the API is helpful enough to provide callbacks when a variable declaration or field is hit, making the code trivial.

That's it. Hope this helps someone.

Wednesday, July 16, 2008

Drools - Fact Template Example

The JBoss Rule Engine (Drools) primarily works off an object model. In order to define a rule and have it compile, the data referenced in the rule needs to exist somewhere in the classpath. This is easy enough to accomplish by using any of the dynamic bytecode libraries such as ASM or CGLIB. Once your class is generated, you can either inject your own ClassLoader implementation or change the permissions on the system classloader and call defineClass manually.
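For the classloader route, a minimal sketch, assuming the generated bytecode is already in hand as a byte array (e.g. produced by ASM or CGLIB):

// Minimal sketch: expose defineClass through a custom ClassLoader so generated
// bytecode can be loaded at runtime without touching the system classloader.
public class DynamicClassLoader extends ClassLoader {

    public DynamicClassLoader(ClassLoader parent) {
        super(parent);
    }

    public Class<?> define(String className, byte[] bytecode) {
        return defineClass(className, bytecode, 0, bytecode.length);
    }
}

// usage, assuming 'bytes' holds the generated class file for the fact class:
// Class<?> factClass = new DynamicClassLoader(Thread.currentThread().getContextClassLoader())
//         .define("com.orangemile.generated.Trade", bytes);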

But there is another way, which is a bit simplistic, but may be useful for some of you out there. Drools has introduced support for fact templates, a concept borrowed from CLIPS. A fact template is basically the definition of a flat class:


template "Trade"
String tradeId
Double amount
String cusip
String traderName
end


This template can then be naturally used in the when part of a rule:

rule "test rule"
when
$trade : Trade(tradeId == 5 )
then
System.out.println( trade.getFieldValue("traderName") );
end


But, there is a cleaner way to do all of this using the MVEL dialect introduced in Drools 4.0.
You can code your own Fact implementation that's backed by a Map.

package com.orangemile.ruleengine;

import java.util.HashMap;
import java.util.concurrent.atomic.AtomicLong;

import org.drools.facttemplates.Fact;
import org.drools.facttemplates.FactTemplate;
import org.drools.facttemplates.FieldTemplate;

/**
 * @author OrangeMile, Inc
 */
public class HashMapFactImpl extends HashMap<String, Object> implements Fact {

    private static AtomicLong staticFactId = new AtomicLong();

    private FactTemplate factTemplate;
    private long factId;

    public HashMapFactImpl(FactTemplate factTemplate) {
        factId = staticFactId.addAndGet(1);
        this.factTemplate = factTemplate;
    }

    @Override
    public long getFactId() {
        return factId;
    }

    @Override
    public FactTemplate getFactTemplate() {
        return factTemplate;
    }

    @Override
    public Object getFieldValue(int index) {
        FieldTemplate field = factTemplate.getFieldTemplate(index);
        return get(field.getName());
    }

    @Override
    public Object getFieldValue(String key) {
        return get(key);
    }

    @Override
    public void setFieldValue(int index, Object value) {
        FieldTemplate field = factTemplate.getFieldTemplate(index);
        put(field.getName(), value);
    }

    @Override
    public void setFieldValue(String key, Object value) {
        put(key, value);
    }
}


To use this class, you would then do this:

String rule = "package com.orangemile.ruleengine.test;" +
" template \"Trade\" " +
" String traderName " +
" int id " +
" end " +
" rule \"test rule\" " +
" dialect \"mvel\" " +
" when " +
" $trade : Trade( id == 5 ) " +
" then " +
" System.out.println( $trade.traderName ); " +
" end ";

MVELDialectConfiguration dialect = new MVELDialectConfiguration();
PackageBuilderConfiguration conf = dialect.getPackageBuilderConfiguration();
PackageBuilder builder = new PackageBuilder(conf);
builder.addPackageFromDrl(new StringReader(rule));
org.drools.rule.Package pkg = builder.getPackage();
RuleBase ruleBase = RuleBaseFactory.newRuleBase();
ruleBase.addPackage(pkg);

HashMapFactImpl trade = new HashMapFactImpl(pkg.getFactTemplate("Trade"));
trade.put("traderName", "Bob Dole");
trade.put("id", 5);

StatefulSession session = ruleBase.newStatefulSession();
session.insert(trade);
session.fireAllRules();
session.dispose();



Notice that in the then clause, to output the traderName, the syntax is:
$trade.traderName
rather than the cumbersome:
$trade.getFieldValue("traderName")

What makes this possible is that the Fact is backed by a Map and the dialect is MVEL, which supports this type of property access when the map keys are strings.

The interesting thing about using the fact template is that it makes it easy to perform lazy variable resolution. You may extend the above HashMapFactImpl to add field resolvers that contain specific logic to retrieve field values. To do this with an object tree, especially dynamic objects, would require either intercepting the call that retrieves the field via AOP and injecting the appropriate lazy value, or setting the value to a dynamic proxy, which then performs the lazy variable retrieval once triggered. In either case, this simple fact template solution may be all that you need.
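A sketch of what that extension might look like; LazyFactImpl and FieldResolver are my own names, not part of Drools:

package com.orangemile.ruleengine;

import java.util.HashMap;
import java.util.Map;

import org.drools.facttemplates.FactTemplate;

// Hypothetical sketch: lazy field resolution on top of the map-backed fact.
public class LazyFactImpl extends HashMapFactImpl {

    public interface FieldResolver {
        Object resolve(String fieldName);
    }

    private final Map<String, FieldResolver> resolvers = new HashMap<String, FieldResolver>();

    public LazyFactImpl(FactTemplate factTemplate) {
        super(factTemplate);
    }

    public void addResolver(String fieldName, FieldResolver resolver) {
        resolvers.put(fieldName, resolver);
    }

    @Override
    public Object getFieldValue(String key) {
        Object value = super.getFieldValue(key);
        if (value == null && resolvers.containsKey(key)) {
            value = resolvers.get(key).resolve(key); // fetch lazily on first access
            setFieldValue(key, value);               // cache for subsequent lookups
        }
        return value;
    }
}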

Thursday, June 12, 2008

Corticon Rule Engine Review

I've recently had an opportunity to give Corticon a test run. Corticon consists of a desktop rule studio and a server component that runs the rules. The server component can be embedded inside a Java application or run as a standalone server. The studio is used to create a vocabulary and the rule sets.

The approach Corticon uses is quite novel in comparison to the other rule engine vendors. They don't use the Rete Algorithm, instead they rely on compile time rule linking. This means that the rules are deployed to the server compiled, and the rule firing order is already calculated. The concept is undoubtedly more palatable to some organizations that find traditional rule engines a little unyielding.

The other novel idea is that the rules are compiled against a vocabulary (an ontology, if you will) rather than an object model. This means that you can submit dynamic datasets to the rule engine rather than relying on strict structures as is the case with JBoss Rules. It also seems that Corticon has licensed the Oracle XML SDK, which allows them to query the database, retrieve the result in XML, pipe it to the rule engine, and produce a result, all without any custom code. The actual rule writing and vocabulary management occur in the desktop rule studio. The studio gives the user a decision-tree type of interface, but with some enhancements. The rule writing is done in a 4th generation language, meaning that the rule author uses the vocabulary, boolean algebra, truth tables, discrete-math style programming: "table driven algorithm programming". It's quite nifty but comes off a little limiting to a hard core developer.

And now for the negative:
In order to extend the language, you need to write Java code and then add it to the classpath of the desktop rule studio. This means that if you distributed the desktop rule studio to your business analysts, you now have to redistribute your jar file. You will need to redistribute a new jar file every time you extend the library with a new function. This is almost impossible to do in large organizations with dumb terminals, software packaging, etc...

The actual desktop rule studio is clunky and is missing some basic keyboard shortcuts, like the "delete" key. There is also no security of any kind. The rule author is able to change the vocabulary and any rules. The actual 4GL language at times seems more confusing than its 3GL counterpart.

For the developer, Corticon is a disaster. The API library consists of a handful of classes: one to create a server and run the rules, another to deploy the rules to the server, and yet another to do some admin RMI calls to the desktop rule studio. Mind you, RMI calls for an enterprise-level application are a little laughable. The functionality of the RMI calls is also a little silly. You can start the studio, shut it down, and open a vocabulary. In comparison, ILOG has an immense API library. I would wager that you can pretty much control any aspect of ILOG. And of course, JBoss Rules is open source, which makes it quite friendly to developers.

And yet more negativity:
There is no way to control the rule firings. There is also no way to disable certain rules from firing, or to manage the rules in a central location. The mere fact that the rules are compiled in the desktop rule studio by the rule author and written to a file in binary format makes any form of enterprise rule management a joke. A web 2.0 GUI for rule management, like Drools or ILOG have, would also be nice. Why is it that I need a desktop app to manage my rules?

In summary, I think Corticon is quite a nifty concept, but it is not a mature enterprise framework. It may be useful in some limited fashion as glue logic, but it doesn't belong as an enterprise-class rule engine. At least, not yet.

An addendum: Tibco iProcess has licensed the Corticon rule engine for its decision server. This does not change my opinion of Corticon, but it does reduce my opinion of Tibco. As a side topic, I think Tibco BusinessWorks is an impressive tool, but the entire iProcess stack is quite bad and convoluted. I think Tibco just wanted to put something out quickly, so they pieced together some varying half-finished technologies.

Sunday, May 11, 2008

J2ee, Jee, EJB 2 and 3, Spring, Weblogic,

Bla, bla, bla, bla. First, I'd like to say that I hate EJB 2 and am starting to seriously dislike EJB 3. J2EE, and the stupid rebranding to JEE. There are many things that are wrong with EJB 2. One of them is the security model, another is the QL language, another is the retarded persistence approach. The only gain, and even that's a stretch, would be the transaction management. But even there, the EJB authors completely f'd up and created a convoluted model.

EJBs, originally, were supposed to save the day. Usher in a day where corporate code monkeys could get a bit dumber, focus on the "business logic", and forget all that complicated plumbing like transaction management and threading. Threading, ha, said the EJB spec writers. No threads needed. The container will take care of that for you. Don't worry your pretty little head, little developer, just go play in your little sandbox with your pretty little business logic. Well, many developers did just that. And in time, they realized that not only did EJBs complicate everything, but you needed EJB experts just to support the Frankenstein monstrosity that gets produced.

Somewhere around there, Spring dawned, and flowers bloomed. Now, Rod had the right idea, but the implementation lacked. Spring has proved to be another monster. The issue really is the complexity. Getting any value from Spring, or even understanding how Spring works, requires deep internal knowledge, something that both Spring and EJB fail to mention. Spring has an insane dependency on configuration, to such a degree as to make the code unreadable. A developer now has to funnel through massive amounts of XML config, plus massive amounts of useless interfaces, only to find some impl class that's AOP'd into the damn dao. Now, don't get me wrong, all you Spring wankers, I am more than aware of all the benefits of IoC and AOP and unit testing. I know that's what's running through your head: heck, he doesn't understand proper test driven development, bla, bla, bla, how dare he criticize that which is holy and has saved the day. Well, I don't like it. It replaces something really bad; I agree with that. I believe in the simplicity of design and the readability of the code. I agree with the principal value of AOP and IoC, but I wonder if there is a better and cleaner way to achieve some of the same things Spring set out to do.

The EJB 3.0 persistence part is yet another over-arching spec-iness of the writers. It's basically useless for anything slightly larger than a pet-store website. The goals are definitely admirable, but they must have known how far off the target they would actually hit. They are attempting to map pojos into a relational table structure. What they don't tell you is that it's impossible to accomplish without seriously hampering the design of your relational model. Now, if you had an object database, perhaps I wouldn't be saying that. Perhaps a more valid attempt at this will come from the space of semantic networks, and companies like Business Objects with the concept of a universe design, which provides a layer above the raw data.

To return to EJB and Spring bashing, I disagree with the basic notion of the goals. Each attempts to reduce how much a developer needs to know. Of course the reason for that is to dumb down the developer, replace him/her with a body from a cheap foreign country, and keep on churning code. A different model would be to replace the army of code monkeys with a few diligent developers, but move the responsibility of business development to the BUSINESS. At the end of the day, no one knows the end goal except the business. And no developer will ever be better than the business user at solving the business problem. Now, a lot of developers try, and they are usually bad developers. So, they are bad at business, and they are bad at programming. And yes, they need yet another framework to keep them from hurting themselves. I think we should focus our attention on reducing our field to a few competent experts who deserve the title of developer and who focus on the technical development that enables the business users to do the thing that most developers do today: business coding.

A better compression algorithm

I am presented with an interesting problem. Usually, my employer burns money indiscriminately, but lately, with the market in a tailspin, all costs are being evaluated. To avoid being one of those costs, I need to find a way to save money for the company. One of those ways is file storage space for document management systems. Unlike the basic $50, 100GB drives that you buy at Circuit City for your Dell, corporate disk storage is highly expensive, with EMC SRDF storage running around $1m per TB.

Audit and regulatory rules require that basically all files are kept. A large number of those files are data feeds from external systems. The files are structured and are in a readable format such as fixed length, delimited, or XML. My idea, which is not that unique, is to apply a heuristic compression algorithm to the data files. I am going to leverage the work done by the FIX Protocol committee on the FAST specification, which defines a number of optimal heuristic encoding schemes. FAST defines a compression algorithm for market data, but the same principles apply to file storage.

http://www.fixprotocol.org/fast

The concept is quite interesting. The compression algorithm basically attempts to find data patterns in the file and encode them away. Let's say you have a column that's an incrementing number: 1, 2, 3, ... n, n+1. The encoder will identify that this is an incrementing column and encode it as algo: { previous + 1, starting with 0 }. We've just encoded away an entire column, and it took almost no space to do it. Let's try another example: abcdefg, abcdefe, abcdefn, abcdef5, etc... In this case, the first "abcdef" is the same in all the values, and only the last character changes. We can encode this as a constant and only send the last character: g, e, n, 5, etc...
There are a lot more sophisticated algorithms defined in the FAST protocol, but you get the idea.
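A toy sketch of the two encodings above (delta for the incrementing column, common prefix for the strings); the field names and structure are mine, not FAST's:

import java.util.Arrays;
import java.util.List;

// Toy sketch of the two encodings described above; not the FAST wire format.
public class ToyColumnEncoder {

    // incrementing column: store only the start value, the rest is "previous + 1"
    static long[] decodeIncrement(long start, int count) {
        long[] values = new long[count];
        for (int i = 0; i < count; i++) {
            values[i] = start + i;
        }
        return values;
    }

    // constant-prefix column: store the shared prefix once, then only the suffixes
    static String commonPrefix(List<String> values) {
        String prefix = values.get(0);
        for (String v : values) {
            while (!v.startsWith(prefix)) {
                prefix = prefix.substring(0, prefix.length() - 1);
            }
        }
        return prefix;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(decodeIncrement(1, 5)));           // [1, 2, 3, 4, 5]
        List<String> col = Arrays.asList("abcdefg", "abcdefe", "abcdefn", "abcdef5");
        String prefix = commonPrefix(col);                                     // "abcdef"
        for (String v : col) {
            System.out.println(prefix + "|" + v.substring(prefix.length()));   // store only the suffix
        }
    }
}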

The data in the file starts to mean something. The encoder actually attempts to represent the patterns present in the file. The patterns have the potential to save a lot more space than a traditional compression algorithm based on Huffman encoding. How much space? How about an average case of > 80%, compared with a best case of around 40% for ZIP. And don't forget, the result can still be zipped.

The program will read a file, scan through all the data points, figure out the optimal encoding algorithm, and then actually do the compression. The encoding algorithm will be needed to decompress the file, so the first field in the file will carry the number of bytes needed for the encoding algorithm, followed by the encoding algorithm itself, and finally the data. This allows us to store the encoding scheme with the file.
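A rough sketch of that layout (a header length, the encoding template bytes, then the encoded data); the framing here is my own illustration, not FAST's:

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Rough sketch of the proposed file layout; the framing is illustrative.
public class EncodedFileWriter {

    static byte[] frame(byte[] encodingTemplate, byte[] encodedData) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(encodingTemplate.length); // first field: size of the encoding template
        out.write(encodingTemplate);           // the encoding algorithm/template itself
        out.write(encodedData);                // finally, the compressed data
        out.flush();
        return bytes.toByteArray();
    }
}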

One enhancement to FAST would be to allow the pre-processor to re-arrange the file. Data optimization is mostly based on previous records, so the more similar subsequent entries are, the higher the compression rate. Another enhancement may be to bitmap away typed fields. If a million-entry file has 100 unique types, it might be more optimal to encode the bitmap separately, and then encode away the type id. Another extension may be to see whether a correlation exists between fields rather than between subsequent records.

Another extension to this architecture is to write the files in a way that improves lookup cost: index the files, and provide an intuitive UI for the user to jump to the needed entry.

I have high hopes for this algorithm. If it can really encode away 90% of the file, then the space savings just might save my job. Well, at least until the next round of cost cutting.

Wednesday, February 13, 2008

Business Intelligence

Imagine a world where the people that own the data, actually have access to it. Sounds obvious, but think about it. Unless the business user is also the developer, this is never the case.

Your system users own the data that's in your system. Unfortunately for them, your system is also what's keeping them from their data. Every little thing they need has to go through you. The irony is that all you care about is serving the business user.

So, what are the parts of the problem? We have data that has some logical structure and some semantic definition to that structure. The business user implicitly understands the semantic definition and wants to exploit it to its full potential. A semantic definition defines relationships such as: a cusip is an attribute of a stock, quantity is an attribute of a trade, stocks are traded on exchanges, etc... Now, the user wants to aggregate quantity by cusip across all trades within an exchange. Fine, you say. You go off and come back a few days later with a beautiful brand new report. Great, the user says, now I want to see the average quantity traded. Well, off you go again, etc....

So, the data has some semantic definition. The same semantic definition exists in the user's head. The user exploits the structure. This is analytics. The user should be able to manipulate the data with the only constraint being the semantic definition. At the moment, this space is filled by cube technology on the data warehouse side and Business Objects on the relational side. The only real difference between BO and cube technology is the size of the dataset. Cubes are pre-aggregated while BO is real-time SQL. It would be interesting to link cube technology to BO for drill-through. So, once you have the data, you pump it into a rich visualization component. But be careful not to couple the visualization with the data. Each piece of technology is independent, but has a well defined interface describing how to leverage it. The visualization component can receive data. So, now we have our analytics and visualization. The next part is to take both pieces and generate a static report that can be presented to senior management. This report can be saved, or automatically updated with new data; archived daily, quarterly, etc...

So, not too bad. But I also want to understand how good my data is. I want to understand the integrity of the data at the lowest level. I need to know the story of every data point. This is where rule engines come into play. The user will define the rules that validate integrity. The trick is to have the rule engine tell you how good or bad your data is at any aggregation level. The data isn't discarded, just measured.

So far, the user has the data, can analyze it, visualize it, knows its quality and can report it. The next step is to manipulate it. A lot of times, analytics takes a flavor of what-if analysis. The user should be able to locally modify any data point, analyze the impact, visualize, report, etc...

Well, are we done? Have we satisfied everything the user wants? No. No. No. Now that you have some analysis, you need to act on it. The data that you derived has some attributes which, via rules, can be mapped to certain actions. One action can be to feed it into another system for further enrichment.

Are we done now? Damn it no. Once you have all this, you can take the data to the next step. You can mine the data for patterns. The patterns can then feed back to calibrate the data integrity rules.

As the user analyzes the data, the system watches the user. The more analysis done, the more the system can understand the user's intent. At this point, the system can start to infer what the user is trying to do. Now, we are starting to take the flavor of having the system solve equations and then acting on the outcome.

Think about it, but think about it in a context of a massively large data repository, and a Wall Street type firm.

In the interest of buying a solution, I present a vendor list:
Microsoft Analysis Services
Business Objects Web Intelligence
Panorama
Microsoft Sharepoint
Microsoft Excel 2007 (has a lot of cube technology)
Business Objects Dashboard + Reporting + etc...
ILog Jrules

Tuesday, October 02, 2007

Philosophy of Architecture

I have recently come to face with two distinct philosophies of architecture.

The first philosophy holds the business in the highest esteem, to the detriment of the system. All projects are done as strategic. This means that management pushes the development team to deliver as soon as possible and with a sub-optimal solution. This is what's commonly termed "getting it done". With this philosophy, requirements tend to be spotty or nonexistent. In most cases, the requirement document is created after development has already completed. Development is incremental and the system follows an incremental evolution. The business receives the minimum of what was asked, but with the impression of quick delivery. Unfortunately, an incremental evolution causes the development time to continuously increase. This increase happens because code is only added and rarely removed. Removing code requires analysis and refactoring: time which is not factored into the project schedule. Adding code in this way will balloon the system and make adding any future enhancements/changes incrementally more difficult.

The second philosophy is more methodical in its approach. In this case, development goes through established cycles such as understanding what needs to be built, designing, reviewing, and finally building. This approach has a longer upfront cost before actual development begins, but it causes the system to move in revolutionary jumps rather than in continuous, increasing steps. With revolutionary jumps, the system tends to get more compact as code gets refactored and multiple functionalities are folded into a single framework.

Most shops follow the first philosophy. This philosophy is more natural to organic growth. When the user tells you to do something, and they are paying your salary, you do it. With the second philosophy, you would need to have the guts to tell the user: no, wait, let me first understand what you're trying to accomplish, then we'll review, and then I'll build. This is very difficult. For example, most, if not all, Wall Street firms follow the "getting it done" model. The "beauty" of the system is secondary; delivering the project is primary above all else.

My argument is that beyond creating a simple report, no project should follow the "getting it done" philosophy. Every project needs to have a more methodical approach. Building the first thing that comes to your mind is dangerous and stupid when working with an enterprise system. All projects need proper analysis: what already exists, what should change, what the user wants, what else might they want. Then, draw up the architecture, review it, and only then, build it.

Friday, September 14, 2007

Data Warehousing

I have recently been immersed in the world of BI, OLAP, XMLA, MDX, DW, UDM, Cube, ROLAP, MOLAP, HOLAP, star schema, snowflake, dimensions and facts.

A data warehouse is a special form of repository that sacrifices storage space for ease of retrieval. The data is stored in a special denormalized form that literally looks like a star schema. If you change one attribute, an entire row is duplicated. The data is structured in a way that eases retrieval and reduces table joins. The data warehouse is nothing but a giant relational database whose schema design makes using plain SQL downright ugly. On top of this repository lie one or more cubes that represent an aggregated view of the massive amounts of data. There are multiple forms of cube: multi-dimensional online analytical processing (MOLAP), relational online analytical processing (ROLAP), and hybrid online analytical processing (HOLAP). A ROLAP cube is nothing but a special engine that converts the user requests into SQL and passes them to the relational database; a MOLAP cube is pre-aggregated, which allows the user fast retrieval without constantly hitting the underlying data store; and a HOLAP cube is a hybrid of those two approaches. The reason for cube technology is that it allows the user to slice and dice massive amounts of data online without any developer involvement. On top of the cube technology, there is a set of user front ends, either web based or desktop. One such company is Panorama. Each GUI tool communicates with the cube in a standard language called MDX, a multi-dimensional expression language. An XML version of this language is the XMLA protocol, which was originally invented by the cube GUI company Panorama. Microsoft bought out their original tool and further developed it into what is today called Microsoft Analysis Services 2005, which is a leading cube framework.

So to summarize:
UDB(Relational Database)
Microsoft Analysis Services 2005 (Cube)
Panorama (GUI)

Now the price: well, for a full blown BI (Business Intelligence) solution, you're easily looking at millions just on storage alone, not to mention the license costs of the products. There are free solutions, at least on the GUI side: one good one is JPivot.

A Data warehouse is a very powerful concept. It allows you to literally analyze your data in real-time. The business users use a friendly GUI to slice and dice their data, aggregate the numbers in different ways, generate reports, etc... The concept allows you to see ALL your data in any way the user imagines or at least in the number of dimensions defined on your cube. A dimension, by the way, is an attribute that it makes sense to slice by. For example, dates or type columns are good dimensions. A fact on the other hand is the business item that you're aggregating. For example, a trade would be considered a fact.

Once you have a data warehouse, the next logical extension is KPIs (Key Performance Indicators). Imagine looking at a dashboard and having pretty colors with green, yellow, and red telling you how much money you're making/losing at that point. KPIs are special rules that are applied to the data at the lowest level. When you aggregate up, the colors change depending on how you're slicing the data. This allows you to start at the very top, with the region that isn't doing so well, and then drill down to the very desk that's losing money.

A further extension of data warehousing is data mining. This is an offshoot of AI and covers areas such as cluster detection, association rules, etc... There will be further blogs covering this in more detail.

So, if you have a huge budget, I recommend you give this a try. Your company will thank you for it (later). And if you don't have a huge budget, understand whether your problem fits in the BI world, and ask for a huge budget. I've seen too many companies take the cheap route and end up with half-baked solutions that have no future.

Sunday, August 26, 2007

Rule Engines

Recently, there has been a proliferation of rule engines. A rule engine is a by-product of AI research. The basic premise is that a user is able to create a bunch of atomic units of knowledge. When the rule engine is presented with a state of the world, the rules all fire. After all the firings have settled down, the new state of the world is the outcome. A lot of problems are easier to implement using rule engines than with more conventional programming: for example, a system that relies on heavy usage of knowledge with deep decision trees - imagine many layers of nested if/else.

There are a couple of major contenders. For the corporate world, there are ILOG and Fair Isaac. For open source, there are JBoss Rules and Jess. Jess is the original Java rule engine and the closest to the original NASA CLIPS system, CLIPS being the system that gave rise to these rule engines. Personally, I am most familiar with JBoss Rules and ILOG, and to a much lesser degree with Jess. This should not be taken as a diss on Fair Isaac or any other rule engine.

Each rule engine, at its core, is based on the RETE algorithm. There are a lot of variations and enhancements, but each rule engine implements the core algorithm. The algorithm is used to find the rules that need to be executed for a given world state. Imagine thousands of rules, and a good search algorithm becomes critical to a useful rule engine. The RETE algorithm plays the role that control flow plays in a regular language.

The major blocking point to wide adoption of rule engines is their dynamic nature and unpredictability. If you define a thousand rules, it becomes difficult to know how the rules will interact in every situation. This means testing and scenario generation are critical. This also means a much more mature infrastructure and process than most organizations have. The advantages are huge. You can explain to your user exactly how a given outcome was reached. Display the rules, modify the rules, add rules, all dynamically. You can even simplify the rule model such that your users can create their own rules.

The next blocking point is the rule language itself. The language has many requirements. For example, some people want the language to have a natural-language feel. Others want a clean interaction with the existing Java system, while others seek some middle ground with a scripting language. ILOG does this very well, with a natural language translation tool. JBoss Rules has a more rudimentary natural language translation (DRL to DSL) but supports a wider language group.

I find JBoss Rules easier to get started with, but a large and mature organization should probably take a look at a vendor product for the scenario generation and rule management infrastructure, something JBoss doesn't quite have yet. The vendors also have much more mature rule editing GUIs.

Saturday, July 07, 2007

Supply of Money

I know this should be a technology oriented blog, but I am starting to be afraid, because I don't understand what is happening.

Money is intrinsically worthless:

"Paper money eventually returns to its intrinsic value - zero." ~ Voltaire - 1729

Our economy is one of exponentially increasing debt. All money (dollars) is loaned at interest from the Fed. The Fed creates money by printing it at basically zero cost. This means that to pay interest you need to borrow more money (get a loan), thereby creating more money. Notice the exponential function in all of this. The US economy basically no longer produces anything, and imports everything necessary for basic survival. To import requires purchasing, to purchase requires money, money that needs to be borrowed. Borrowing requires paying interest. How does the government borrow? It borrows from the Fed, which prints more money.

The interesting thing is the bond market, which acts as a money sponge. A US treasury bond pays a certain yield. Japan has historically bought billions and billions of US treasuries, to the tune of 16% of all US treasury bonds. This is interesting: Japan buys a bond of $100 paying a 4% yield. This means that Japan hands over 100 dollars to the US government in exchange for a 4% yield. In essence, $100 disappears from circulation and is replaced by a continuous stream of $4. Now, those $4 have to come from somewhere; they're borrowed from the Fed. This is an ever increasing cycle, growing exponentially fast. Whatever money exists in circulation was borrowed at interest. I think all this means is that money can never be destroyed. It can only ever exponentially increase.

What happens on the way back? What happens if the money were to be repaid to the Fed? The dollar would need to traverse the entire route back. I don't understand how that's possible, but if it were to happen, money would return to its intrinsic value of 0.

A little confusing. Right now, Tokyo's interest rate is extremely low. The yen is also trading at about 125 to a dollar. Tokyo's rate is around 1 percent, while the US and the rest of the western world are at 4 to 5 percent. This means you can get cheap money from Tokyo, convert it into dollars, buy US bonds, and earn a hefty 4.5 percent without doing anything. But you can also leverage your position by taking on more risk. In this case, you don't buy the yen now, but plan to buy it later, while simultaneously using what you don't own. In essence, you've just created even more supply of money. One day, you will need to reverse your position by actually buying the yen you promised to buy. This will cause the supply of yen to drop, the demand to skyrocket, and the price to act accordingly. The US dollar is going to continue to drop; or, in other words, the yen will go up. The currency must continue to weaken, as it will take more dollars to service the exponentially increasing debt.

China and India will undoubtedly delay the inevitable, but the world economy must and will collapse. An exponential function cannot last indefinitely. This is the conclusion I am drawing, but I must admit I don't understand all the factors. All I know is that I am becoming increasingly uneasy.



Sunday, May 27, 2007

Black Swan

I am becoming obsessed with randomness and probability. What follows is based very heavily on Nassim Nicholas Taleb's research. Imagine a turkey on a farm. Every day the turkey has ever known, the farmer has come in the morning and fed it. From the turkey's point of view, the farmer is a friend, a trusted being. Then, one morning, the farmer kills the turkey. A black swan has occurred from the point of view of the turkey. A completely unexpected event.

Take our stock market, heck, take the entire global market: companies and global economies have created multiple levels of protection to guard against risk. Options trading, derivatives, options on derivatives, credit default swaps, and so on, and on, and on. Each product is designed to allow some risk, some profit, and some safety. Some products have two components, such as derivatives, allowing a company to sell its risk to others. Risk, actually, is an interesting side of the coin. Companies have large staffs of risk professionals, calculating and guarding the said corporations from risk. Recently, companies started to realize that risk comes in many forms, and a new area was born: "operational risk". This is the risk where an employee goes crazy and shoots everyone. So, you would argue that all this guards the said companies from risk. Now, Nassim Taleb, and I, actually believe that this enhances risk. All this calculating is simply creating an impression of safety. Like the turkey, we go day in and day out believing we are safe, until one day the farmer kills the turkey.

The basic problem is that we can't understand the future. In fact, we can't understand that we can't understand the future. We keep believing in things, looking for correlations, patterns in randomness. We find them, in fact we tend to create patterns in randomness. Are the markets random? I would argue no. In fact, I would argue that the markets are becoming very much un-random. The markets are starting to be governed by machines following very concrete rules. There are also very few players in the market that have the weight to move markets, and a lot of those players are using machines. All of this is very scary.

Another interesting example is China. An unprecedented number of common people are investing heavily in the market. And the market is going up and up and up. But, like everything else in life, it will come down, and boy will it come down hard. And there will be ripples through the global markets and global economies. But this isn't the black swan I am afraid of. I am afraid of something more. I am afraid of something we don't know is going to happen.

Global Development

It is all the rage these days to do global development. One "system", global implementation. The idea being economy of scale: any region can perform development, allowing other regions to reap the rewards. There are different ways for a single system to achieve global development.

1. The system is developed by one region. All global requirements are funneled to this region. The actual system may be run centrally or locally within the regions.

2. Each region has a separate system which, based on an agreed protocol, feeds a shared central system.

Ah, but there is another way. You may be able to have a single system, and yet global, parallel development. You can split the system into areas of concern, and assign different parts to different regions. Unfortunately, at one point or another, the areas will overlap. This brings up an interesting scenario: a single system, many teams split across different timezones, answering to different management, with different requirements, different users, schedules, etc... Quite a mess. Now, each region is actually working on a common goal. The system is a specific system, serving a specific goal, but different masters. The trick is to split the system into a common framework and a regional implementation. The regions are using the same system, and there is a core of the system which is indeed universal, but there is also an aspect of the system which is very much unique to a given region. Understand the problem the system is solving. Then understand the fundamental aspect of the system, the raw materials, if you will. This is the common framework. Each region may modify the framework, but what they are doing is enhancing the breadth of the system. Imagine a graph, links and nodes going every which way. Imagine dark areas of the graph, unexplored. These dark areas represent parts of the system developed by other regions, but not yet used locally. When a given region matures to that functionality, it will be there for it. The unexplored areas of the graph become used, and therefore visible. This seems a very interesting way to create a global enterprise architecture. Model the system as a graph, allow each region to build out the graph, but in such a way as to allow other regions to use only what they need. Then allow the graph to be customized to the region's needs. If done correctly, the system will become a set of loosely shared modules, with concrete implementations by each region. The regions decide how the modules are used and how they link. Of course, some linkage is defined. Regions may enhance existing modules, build new ones, or create region-specific enhancements to existing ones.

Sunday, April 22, 2007

Equilibrium

I had a chat with a Rabbi the other day. He told me a story from his life. When he was a young man, he had trouble sleeping. He would sleep at most 4 hours a night. He was worried that he had a sleeping disorder, so he found a top doctor on sleeping disorders. The doctor had him keep track of the number of hours he slept every night for a month. At the end, the doctor identified that the Rabbi slept an average of 4 hours every night. Sometimes 4:15, other times 3:50, but on average 4 hours. What the doctor told the Rabbi was that he was one of the lucky ones. Most people are in the middle and require 7 hours of sleep. The Rabbi was an extreme exception on the far side of the curve, requiring only 4 hours. The Rabbi was lucky because he has 3 hours more a day than everyone else. This story is interesting in that, in this day and age, in this country, the Rabbi would be put on sleeping medication. I am pretty sure that the number of hours people sleep fits a bell curve. Most people are in the middle, sleeping somewhere between 6 and 8 hours. But the tails of the curve extend in both directions; some require more, like 9 or 10, while others require less, like 4 or 5. Now, the established medical principle of the day and age is to fit everyone into the middle with no tails. I see this in everything. For example, the medical community preaches that cholesterol should be below 200. Now, what makes 200 a magic number that applies to the entire population regardless of background? I would imagine that cholesterol, like everything else, follows a bell curve. Most people's normal average is 200, but the tails of the curve go out in both directions. Some have a high average cholesterol number, and that is considered normal for their bodies, while others have a low average. It is very troubling that most things are being applied indiscriminately. We, as a society, are losing the equilibrium in favor of the standard.

Monday, February 05, 2007

I usually stay away from such posts, but I can't resist. Check out these two sites:

http://www.zoho.com/
http://services.alphaworks.ibm.com/ManyEyes/

Saturday, February 03, 2007

A priori

A priori is a term that describes a sequence of events, time. More specifically, development is a sequence of steps, events, that produce a desired result. The question that bothers me is why it takes so freaking long.

A colleague of mine was recently complaining that his users were upset that it takes his team a long time to develop seemingly simple functionality. Why does it take weeks to read some data, apply some business rules, send some messages, and produce a report.

The world of business tools can be thought of as a giant, ever increasing graveyard. The business tools are continuously and artificially given life. Like little Frankensteins, they roam the earth, used and abused by both users and developers, growing up, until being killed off and replaced with younger Frankensteins that are doomed to the same fate.

Excel is the only tool that comes to mind that has escaped this fate. It allows the business user to solve his own problems. Unthinkable to a crack smoking code monkey. The user can load data, build his models, produce reports, and export them out. The power is in the user's hands. On the other side, the developer attempts to give the user exactly what the user asked for and nothing, and I mean nothing, else. In fact, the majority of the time, the developer understands neither the business user, nor the business, nor the problem being solved.

I think the industry is starting to realize this and is attempting to shift the power back to the business user. For example, specs like BPEL and the hype surrounding web services are all meant to give more power to the business user and reduce the turnaround time of development. I believe software will become less like software and more like legos. Individual pieces will still need to be built, but the business user is the one who will put the legos together to produce a result. Things like forms, business rules, reports, data loading, and data extraction will go away as developer tasks. Instead, time will be spent producing richer widgets that do more sophisticated things. Honestly, how many developers does it take to build a relatively large system that does a whole lot of variations of the 5 things mentioned above? 1, 2, 5, 7, 10, 40? How big is your team?
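
To make the lego idea a bit more concrete, here is a minimal sketch in plain Java. Everything in it (the block names, the data, the run method) is invented for illustration; the point is only that developers ship small, well-tested blocks, and the person assembling them supplies nothing but the order in which they snap together.

    import java.util.*;
    import java.util.function.UnaryOperator;

    // A rough sketch of the "lego" idea, not any real product or framework.
    public class LegoPipelineSketch {

        // Each block takes the current working data set and returns a new one.
        static final Map<String, UnaryOperator<List<String>>> BLOCKS =
                Map.<String, UnaryOperator<List<String>>>of(
                        "load",   rows -> new ArrayList<>(List.of("row-1", "row-2", "row-3")),
                        "filter", rows -> rows.stream().filter(r -> !r.endsWith("2")).toList(),
                        "report", rows -> { rows.forEach(System.out::println); return rows; });

        // The assembly step: the business user supplies only the block names, in order.
        static List<String> run(List<String> blockNames) {
            List<String> data = new ArrayList<>();
            for (String name : blockNames) {
                data = BLOCKS.get(name).apply(data);
            }
            return data;
        }

        public static void main(String[] args) {
            run(List.of("load", "filter", "report"));
        }
    }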

Friday, January 26, 2007

I've been away for a long time. For that, I am sorry. But, now I am back.

What's been on my mind lately is whether it's possible to encode a business intention in an intermediary language and then build an interpreter to read this language. One system would encode an intention, the second system would evaluate it. Interesting, no? Perhaps all this means is that system A sends a message to system B. System B reads the message and, based on hard-coded business rules, performs the work. But let's say there are no hard-coded business rules. Let's say the message is the rules and the data. Would that be possible? What would this language look like? It would need to contain metadata that could be evaluated and mapped to business rules.

Let's step back a little. What's the point of this? System B is a specific system that does a specific thing. It should know what to do with the message without needing system A to tell it. A new trade message arrives, your system receives the trade. It knows it's a new trade, because it says so on the message. What is the action? Book the trade. So, your system dynamically looks up all the supported actions and passes the data set to that rule set. Now, some of you are thinking: great, all this and he describes a bloody factory pattern.

But wait, forget messages. It's an event. Something, somehow, raises an event that says there is a new action with the given payload. Some controller accepts the event and routes it to the appropriate implementation for that event, or perhaps a set of implementations, or, even better, triggers the work-flow. Now we're getting somewhere. The event name maps to a business intention, which is specified as a work-flow. But the work-flow is a generic concept. It's not real unless there is code behind it. So, we build a bunch of modularized code that does specific functions, we wire it together with dependency injection, and we have a dynamic work-flow define the execution path.
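
Here is roughly what that controller-plus-work-flow could look like, stripped of any real framework. All of the names (BusinessEvent, Step, Controller, the "trade.new" event) are made up for this sketch; in a real system the registrations would be wired up through dependency injection and configuration rather than hard-coded in main().

    import java.util.*;

    public class EventRouterSketch {

        // The event is nothing more than a named business intention plus a payload.
        record BusinessEvent(String name, Map<String, Object> payload) {}

        // One modularized piece of work; several of these, in order, make a work-flow.
        interface Step {
            void execute(Map<String, Object> payload);
        }

        // The controller maps an event name to its work-flow and runs the steps in order.
        static class Controller {
            private final Map<String, List<Step>> workflows = new HashMap<>();

            void register(String eventName, List<Step> steps) {
                workflows.put(eventName, steps);
            }

            void dispatch(BusinessEvent event) {
                for (Step step : workflows.getOrDefault(event.name(), List.of())) {
                    step.execute(event.payload());
                }
            }
        }

        public static void main(String[] args) {
            Controller controller = new Controller();

            // Hypothetical "new trade" work-flow: validate, book, publish a confirmation.
            controller.register("trade.new", List.of(
                    payload -> System.out.println("validating " + payload),
                    payload -> System.out.println("booking " + payload),
                    payload -> System.out.println("confirming " + payload)));

            controller.dispatch(new BusinessEvent("trade.new", Map.of("id", "T-123", "qty", 100)));
        }
    }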

Tuesday, October 17, 2006

An Ode to a Slacker

I need to get around to writing this and, more generally, to finishing some of the shit I start.

A week later:
Of course, this entry is a lot more complicated than a simple statement such as "get shit done." The issue is the delicate balance of life and work, and the Venn diagram where they overlap.

Projects start as fun little side things. You play around with them for a few hours, put some things together, and call it a day. Then you get an email from the user saying, hey, this is pretty cool, but it needs to be a lot more to be useful. Sure, you say, I'll add a few more lines of code. Unfortunately, now it's no longer a small little project but a junior system. And not only that, it's a junior system that is poorly tested. You try to maintain the same schedule, but you realize that you can't add the kind of functionality that's needed, or maintain the level of quality necessary for a production system. You make mistakes, take shortcuts. Before you know it, your users are pretty angry. They are starting to question the whole thing. And frankly, so are you. You want to finish it. You desperately need to finish it. You've come this close, dedicated this much, but you realize that finishing will require even more.

This is an interesting struggle. The lucky few of us actually enjoy building things. So, this little side project may seem like work to some, but it's really more of a hobby. Unfortunately, some people are relying on your hobby, and that's when the pressure kicks in and the problems start. On the other hand, unless you had a user who wanted something, you probably wouldn't have chosen to build this particular thing as your hobby.

The other interesting observation is that you are starting to see this project as something more. Maybe this project is the way out of the rat race. If it works, it could be your ticket. But it's so much work, you say.

How do you maintain the delicate balance? Is it even possible to maintain the balance? You're working with fixed items. There is a fixed amount of time, and that amount is then reduced by constants such as actual work hours, sleeping, eating, showering, and spending time with the family.
A week has 168 hours: 45 are spent at work, 49 sleeping, 14 eating, and 4 on toiletries. What remains is 56 hours to spend time with the family, work on the side projects, wash the dishes, do laundry, go to the movies, sleep in, watch TV, do the bills, etc... What ends up happening is you can probably take maybe 9 hours for the week - 1 per workday and 2 per weekend day. Unfortunately, as everyone knows, spending 1 hour programming is like watching ballet dancers do hip hop (it's not right). You can't accomplish anything major in 1 hour or even 2 hours. So you may start, but you tend to aim lower and make a lot of mistakes in the process.
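
For what it's worth, here is the tally above as a tiny sketch, using the same numbers from the paragraph (your constants will differ):

    // A quick back-of-the-envelope tally of the weekly hours, nothing more.
    public class WeeklyBudget {
        public static void main(String[] args) {
            int week = 7 * 24;                    // 168 hours in a week
            int committed = 45 + 49 + 14 + 4;     // work + sleep + eating + toiletries
            System.out.println(week - committed); // 56 hours left to divide up
        }
    }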

Wish me luck!

Thursday, August 24, 2006

Randomness

Here is an interesting question: if you know the past, can you guard against a similar event in the future? You know that the Great Depression took place. A lot of research has been done to understand what led to the Great Depression, and a lot of research has been done to understand how to get out of it. In fact, the current chairman of the Federal Reserve is a specialist on the Great Depression. So, after all that, do you think it can happen again? With all this acquired knowledge, would we see it coming and be able to guard against it?

This question has been occupying me lately, and I am leaning towards a no. I don't think we'll see it coming. We may know how to guard against that specific event, but I am starting to believe that history never repeats itself. Events may seem similar, but there are infinite combinations of how they are triggered, how we react to those triggers, consequences, possibilities, and, of course, conclusions. If history never repeats, then studying history may not provide much value other than protecting us from that exact event.

My other opinion is that the world is getting more interconnected and more complicated. By this I mean that connections are forming that we may not even realize exist or are correlated. The world of the past will never happen again, and if the event of the past happens in the current world, the consequences will be quite different than before. Unfortunately, some other event may take place that brings us the same kind of devastation. Basically, my theory is that history can never repeat itself because the world is continuously changing.

There is some kind of random undertone to the world. Some people call it luck, others misfortune. Let's say you trade stocks. You've read all there is about the company. You think you understand the Fed, the government, the currency, etc... You believe very strongly in the financials of this company. You buy the stock. First it goes up, but then it drops like a rock. It turns out that this company was dependent on the knowledge of a single engineer who was hit by a train. An unforeseen circumstance knocked you out of the market. What is that circumstance? Is it randomness? Can you foresee it? Can you calculate its probability of occurrence? Do you understand its impact? I don't know, but it doesn't seem likely, especially with our current understanding of probability. What is more likely is that we get a false sense of security from our acquired knowledge, or perhaps from our previous fortune, and this, if nothing else, will lead us to ruin.

The other item I wanted to cover was noise. This blog is noise. CNN is noise. In fact, a large part of the internet is noise. First off, the question is whether more information is noise or valuable artifacts. And if more information is noise, is it harmful? Does having more information actually increase your probability of making a wrong decision? Can you measure what information is valuable and what is noise? These statements seem very counterintuitive. What I am basically saying is that knowledge may actually be bad for you. Our brains seem to have adapted to this by reducing the large amounts of knowledge into manageable chunks. A lot of knowledge is simply forgotten; other knowledge gets reduced into some basic concepts and understandings. Does learning everything there is to know about a company (all their news statements, their financial statements, statements made by their peers, etc...) somehow take away from the bigger picture?

If anyone out there has an answer, please, do write a comment.