Friday, December 30, 2005

Grid

There is a shift in the tech community towards grids. Oracle came out with its 10g database, now Sun has released the Sun Grid and is offering $1 per CPU-hour to run apps on it. At the moment, grids are still a novelty. Berkeley introduced the grid to the masses through the search for aliens (SETI). Their system, BOINC, is a very nice and simple grid, but it is very university-geared. There are a number of vendors and a couple of enterprise-strength open-source solutions.

Vendor Tools
http://www.entropia.com
http://www.datasynapse.com/
http://www.avaki.com
http://www.platform.com/
http://www.ud.com
http://www.gridsystems.com/

Open-Source
http://www.zetagrid.net
http://www.globus.org
http://www.cs.wisc.edu/condor/
http://gridengine.sunsource.net/
http://boinc.berkeley.edu/
http://www.globus.org/cog/java/

The other interesting development is that people are talking about grids in the same breath as web services. Personally, I am not a big fan of web services, and have always considered them a solution to a very specific and narrow problem space. A grid can be thought of as a single computer with a whole lot of CPUs. I guess running apps on the grid is really just a service, and web services, in theory, are attempting to standardize that communication. A service-oriented world is interesting, but I am not convinced the world works that way.

On top of all this babble, I am proposing to build a PSG (Pretty Simple Grid) as an open-source project. This grid, unlike the others, will be geared for small teams, whether inside large organizations or not. It must be simple to install and simple to use. Its features will be limited to what is required and what doesn't require much setup. It will be written in Java; the client will be downloadable over the internet and will run as a service, screen-saver, etc. The goal is to allow individual technology teams to leverage the grid in their offices. Most of us working for the Man probably work in pretty small teams, but are part of a large entity that is impossible to convince to install a large grid. A small grid used solely by a small team with their 10 PCs is simple to install and provides a lot of value.
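
To make the idea a bit more concrete, here is a minimal sketch of what a PSG worker node might look like. Everything here is an assumption on my part: the class and interface names (PsgWorker, Task, TaskSource) are invented, and the actual transport (sockets, HTTP, RMI) is left out.

```java
import java.util.concurrent.*;

/**
 * Hypothetical sketch of a PSG worker node: it polls a coordinator for work,
 * runs the task locally, and reports the result back. All names here
 * (Task, TaskSource, PsgWorker) are made up for illustration only.
 */
public class PsgWorker {

    /** A unit of work shipped to the worker. */
    interface Task extends Callable<String> {}

    /** Where the worker gets tasks from; could be sockets, HTTP, or RMI. */
    interface TaskSource {
        Task nextTask() throws InterruptedException;   // blocks until work is available
        void report(Task task, String result);
    }

    private final TaskSource source;
    private final ExecutorService pool = Executors.newSingleThreadExecutor();

    PsgWorker(TaskSource source) {
        this.source = source;
    }

    /** Main loop: fetch, execute, report. A service or screen-saver wrapper would call this. */
    void run() throws Exception {
        while (!Thread.currentThread().isInterrupted()) {
            Task task = source.nextTask();
            Future<String> result = pool.submit(task);
            source.report(task, result.get());
        }
    }
}
```

The point of keeping the worker this dumb is that installation stays trivial: the screen-saver or Windows service just wraps the run() loop, and all the scheduling intelligence lives on the coordinator.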

Tuesday, December 06, 2005

Are great developers dopamine addicts?

The post below is not complete and not fully developed. I believe there is some sort of connection between dopamine and great developers, though dopamine may be an effect rather than the cause. In any case, enjoy the post, but don't judge the author.

Dopamine is a natural chemical produced by the body. Medical studies have shown that during periods of excitement and satisfaction, dopamine levels in the brain increase. One such study, discussed in the Wall Street Journal, suggests that the reason people enjoy shopping is the excitement of trying on something new, or experiencing something new. For example, researchers found that people tend to buy more when they shop in a new environment, such as another city.

Great developers tend to be people who are constantly searching for new challenges. They are in a constant pursuit of the unknown. Give a great developer a non-trivial project and ask them what they think of it. A great developer will tell you that the project is fun and interesting; a bad developer will complain that the project is hard. Give a great developer a simple project that they've already mastered and ask them what they think of it. A great developer will complain of boredom and the lameness of the project. A bad developer will seem happy to have received an easy task.

So, where does dopamine come in? I am thinking that the reason great developers are in a constant pursuit of the unknown is because the pursuit is exciting; pursuing the unknown gives them pleasure. It's the excitement of the chase. Once the chase is over, the excitement is over, the dopamine level decreases, and the developer becomes sad and bored until the next challenge.

Sunday, October 16, 2005

Autonomic Computing

Autonomic Computing is a very popular buzzword at IBM. The term describes a system that has self-configuration, self-healing, self-optimization, and self-protection (the list is copied from IBM).

A system that has all these attributes is excellent. The system does not require configuration: put it on a production machine and it will automatically recognize that it's in production and identify the production configuration settings. In case of an error, the system will attempt to identify where the error originated, fix the problem, and correct side effects by, for example, re-processing business logic. The system is also able to monitor itself and perform tuning. For example, it would recognize that certain data structures tend to be of specific sizes, and initialize them with the necessary size rather than continuously performing costly re-size operations. It might also identify processing that can be done in parallel, and automatically split that processing. The system will also attempt to survive: if the production server is inadvertently stopped, the system will migrate to a different server, re-configure, and continue; if the database fails, the system will switch to a different storage medium.
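
The self-tuning example about data-structure sizes could be as simple as the toy sketch below. It is not anything IBM ships; the AdaptiveListFactory name and its behavior are invented here purely to illustrate the idea of a system observing itself and adjusting.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy illustration of self-optimization: the factory watches how big its
 * lists end up being and pre-sizes new ones accordingly, instead of paying
 * for repeated internal re-size operations. Purely illustrative.
 */
public class AdaptiveListFactory {

    private int observedMax = 16;           // best guess so far

    public <T> List<T> newList() {
        return new ArrayList<T>(observedMax);
    }

    /** Call when a list is finished being built so the factory can learn. */
    public void observe(List<?> finished) {
        if (finished.size() > observedMax) {
            observedMax = finished.size();
        }
    }
}
```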

Great, absolutely great; really hard to do. An interesting problem arises when building non-deterministic systems: they are very hard to test. More specifically, it is very hard to know exactly how a system will react in a scenario that hasn't been considered. For example, a system might rerun certain jobs numerous times causing data corruption, or inadvertently switch servers causing data fragmentation. The system might fix a data error by tweaking variables in a way that causes data problems without generating any errors. The bottom line is, for risk-critical systems, non-deterministic machines have the potential to cause more harm than good. This might be why the business community has been wary of AI-ish technologies.

I am a great believer in non-deterministic systems. I think there is a great benefit to them. The problem is how to introduce them in a way that makes them more deterministic. One answer might be to build more complex systems. Another answer might be more descriptive languages: each function might carry attributes telling the system what can safely be done with it. If the function modifies data, then it probably is not idempotent, etc. The system almost has to understand what its limits are, and work within the given confines.
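
In Java, those function attributes could be expressed today with annotations. The sketch below is one possible shape for this; the @Behavior annotation and its fields are hypothetical names I made up, not part of any existing framework.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/**
 * A hypothetical way to tell the runtime what it may safely do with a method:
 * re-run it after a failure, parallelize it, and so on.
 */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Behavior {
    boolean idempotent() default false;     // safe to re-run after a failure
    boolean modifiesData() default true;    // has side effects on shared state
}

class OrderService {

    @Behavior(idempotent = true, modifiesData = false)
    public double priceOrder(long orderId) {
        return 0.0; // pure calculation: the system may retry or parallelize this freely
    }

    @Behavior(idempotent = false, modifiesData = true)
    public void bookOrder(long orderId) {
        // writes to the database: the system must not blindly re-run this
    }
}
```

A self-healing runtime could read these attributes through reflection at the point where it is deciding whether to re-process business logic, which is exactly the limit-awareness described above.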

Monday, September 26, 2005

SOX - Security Policy

I was asked to solve an interesting problem today: hide the database password from everyone but the production system that uses it. The db account information is currently stored in properties files, which sit in plain view on the production boxes, in the version control system, etc. The goal is to have the password reside in a single spot, in such a way that it is still accessible by a couple of production systems across the globe, but is not known by anyone except the senior manager and the dba. Hmmmm. It should also be possible to change the password by modifying it in that one spot and have every system automatically start using the new password. Hmmm again.

I thought about this for a bit and came up with using public/private key cryptography. The idea is to put a private key on each machine that needs to use the production db account. The private key is only accessible by the system account. The system properties file contains a guest database account that has access to a password table. The password table contains the account information encrypted by the dba or the manager using the associated public key. So, the dba encrypts the db account using the public key and writes the ciphertext into the table. Each system uses the guest account to read the table, and has access to the private key, which decrypts the account. The system then drops the guest connection and re-creates the db pool using the decrypted production db account. The solution sounds good, but has a major flaw: it requires a guest account on a production database system. The guest account might not sound very dangerous, but it allows a hooligan to start from within the database rather than having to figure out how to even connect to it.
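
For the record, here is roughly what the decryption step would look like with the standard Java crypto APIs. This is only a sketch of the idea above: the class name is invented, the table read is omitted, and how the private key file is locked down to the system account is left to the OS.

```java
import java.security.KeyFactory;
import java.security.PrivateKey;
import java.security.spec.PKCS8EncodedKeySpec;
import javax.crypto.Cipher;

/**
 * Sketch of the startup step: the dba has encrypted the real password with
 * the public key and stored the ciphertext in the password table; the system
 * reads it back via the guest account and decrypts it with the private key
 * that only the system account can read.
 */
public class PasswordVault {

    /** Load the RSA private key from its PKCS#8 encoding (e.g. a file readable only by the system account). */
    static PrivateKey loadPrivateKey(byte[] pkcs8Bytes) throws Exception {
        KeyFactory factory = KeyFactory.getInstance("RSA");
        return factory.generatePrivate(new PKCS8EncodedKeySpec(pkcs8Bytes));
    }

    /** Decrypt the ciphertext pulled from the password table. */
    static String decryptPassword(byte[] ciphertext, PrivateKey key) throws Exception {
        Cipher cipher = Cipher.getInstance("RSA");
        cipher.init(Cipher.DECRYPT_MODE, key);
        return new String(cipher.doFinal(ciphertext), "UTF-8");
    }
}
```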

Friday, September 09, 2005

Humility

I saw something today that I would not wish upon my worst enemies. I saw humanity at its core. There was nothing that could have been done. Nothing in the world could have changed it.
Our bodies are extremely fragile, and it all ends as quickly as it starts.

The Rabbi said it was meant to be. Our lives are pre-ordained, he said. It is what it is. Fate.

He went on to say that we come into this world with our hands closed, and leave this world with our hands open. In the beginning we are selfish, and want for ourselves; at the end we take nothing except for who we were.

There was nothing to do. I stood at the edge watching as her husband shoveled dirt. It was an unbearable sight, but it had to be done. And he had to do it. To be in that position is absolutely unthinkable, the absolute misery. But it had to be done. It was very important.

That's it. That was the end. There was nothing that anyone could have done. How can our lives be so fragile, and yet we spend them so recklessly? The Rabbi said that what we take from this world is who we were, what we accomplished, our respect, our dignity, humility.

Time goes by so quickly. It rushes by, going quicker and quicker. I feel it now; every day time goes faster and faster. Hours roll into days, weeks, months, years. Years go by as fast as minutes. Ideas, moments, events, opportunities, gone as quickly as they appear. Some are forgotten and lost, others remain as a memory, a feeling, wrapping themselves around us, forming who we are, what we shall take.

I need to take life more seriously, or perhaps the goal is to take it less seriously. Be a person, said the Rabbi; that's it.

Sunday, August 21, 2005

Steganography

There was a post on Slashdot about steganography. Now I am not much of a Slashdot reader, but chaos prevailed, and I was on Slashdot. The concept is great: the ability to hide data within other data, such as an image. There are a couple of algorithms available out there, but it seems none are mature enough for general usage.

There is a good article here, or at least an article that ranks high on Google:
http://www.guillermito2.net/stegano/ideas.html

The guy makes a very good point that the data out there is not truly random; it seems random, but in fact there is order to the chaos. Encryption algorithms tend to go against that grain. It's like dumping a pink elephant in the middle of Times Square and asking what's out of the ordinary. Most encrypted files beckon for decryption. They sit out there in plain sight with nothing hiding them but the encryption algorithm.

Encryption is very complicated; steganography seems even more complicated. I am wondering if it's possible to use the actual image as the data, versus adding something to the image or modifying part of the data stream, like the header or certain insignificant bits. Given a file and an image, would it be possible to produce a mapping such that the file maps to the image, and, given the image and the key, it is possible to reproduce the file? The key is obviously very complex and very specific to the file. The key is meaningless by itself, and so is the image. Of course, the weakness is that the key must be protected. The beauty is that there is no encrypted file sitting anywhere; instead there are trillions of images sitting in the public domain. I have no algorithm defined, but it should be possible. For example, with a neural net it is possible to get a desired result: the image is the outcome, the file is the input, and the net figures out the weights on the nodes to produce the right result. The weights then make up the key, assuming the reverse mapping works, which I am not so sure about.
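
For contrast, the "insignificant bits" approach I mention above is usually done roughly like the sketch below: the payload is hidden in the least significant bit of each pixel. This is my own toy illustration of the conventional technique, not code from any of the tools or the linked article, and it leaves out file I/O entirely.

```java
/**
 * Toy sketch of conventional least-significant-bit steganography: each bit of
 * the payload is hidden in the lowest bit of a successive pixel value.
 */
public class LsbStego {

    /** Hide each bit of the payload in the lowest bit of successive pixels. */
    static int[] embed(int[] pixels, byte[] payload) {
        int[] out = pixels.clone();
        for (int i = 0; i < payload.length * 8 && i < out.length; i++) {
            int bit = (payload[i / 8] >> (i % 8)) & 1;
            out[i] = (out[i] & ~1) | bit;   // clear the low bit, then set it to the payload bit
        }
        return out;
    }

    /** Read the bits back out in the same order. */
    static byte[] extract(int[] pixels, int payloadLength) {
        byte[] out = new byte[payloadLength];
        for (int i = 0; i < payloadLength * 8 && i < pixels.length; i++) {
            out[i / 8] |= (pixels[i] & 1) << (i % 8);
        }
        return out;
    }
}
```

The mapping idea above is different in kind: nothing would be added to or changed in the image at all, which is exactly what makes it appealing and also what makes it hard to define.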


Tuesday, August 02, 2005

Software Complexity

I spend a lot of time developing large systems. I find myself continuously wrestling with the fact that in order to make a system easy to use, it needs to be exceedingly complex. You could almost draw a direct parallel: as the system becomes easier to use, its complexity increases by the same factor. The side effect is that maintaining a large system becomes that much more complex. Each system has certain drawbacks that were probably rationalized as functionality, or perhaps functionality that later became a drawback. For example, let's say a system maintains some internal data cache. The data needs to be refreshed every day.

Solution 1 is to enforce that the system is restarted every day causing the internal cache to be refreshed.

Solution 2 is to build a timer that automatically refreshes the internal cache every midnight. The timer locks the system for that period and refreshes the cache.

Solution 3 is to build something dynamic that is able to identify a slow period, then lock the system and perform the refresh.

Solution 4 is to build a partial cache refresh, where only changed items are refreshed. The system automatically scans the data source every few minutes, identifies the changes, and inserts them into the cache, locking only the data source.

Solution 5, the external data source notifies the system when data changes occur, and the system performs a specific cache refresh, locking only the items that changed.

Solution 6 encompasses solution 5, except instead of locking, a separate cache is built and swapped in during a slow interval, etc.

There are a lot of solutions to this problem. In fact, there are a lot of solutions without even considering AI, where you could have neural nets try to predict slow intervals, optimal times for refresh, or which data is likely to change, etc. Each solution makes the system more flexible and much smarter, but with an ever-increasing complexity cost. Solution 6 will probably yield the most flexible system with the least amount of downtime, with perhaps some AI thrown in. But solution 6 will also be very complex, with quite a bit of code involved in making it work: a separate system that needs to be aware of data changes, a method of notification, the ability to modify and lock parts of the cache, the ability to identify slow periods or periods where the specific data is not used, etc. Solution 6, if built correctly, will probably require little day-to-day maintenance, but once it fails it will be very complex to troubleshoot and fix.

So, what's the point of all this? I don't know. I like to make complex systems simple, but that's a losing battle. Perhaps the curve is not really a straight line but more of a bell curve, where after a certain point the system complexity drops. The system becomes so complex and so smart that it is actually easy to maintain.
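
To show what the core of solutions 5 and 6 might boil down to, here is a rough sketch: the data source notifies the cache of changed keys, the changes are applied to a separate copy, and the copy is swapped in so readers never block on a refresh. The class and interface names are placeholders, and all the hard parts (slow-period detection, distribution, failure handling) are exactly what the paragraph above says would pile on.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of a notification-driven, swap-based cache (solutions 5 and 6):
 * changes are applied to a copy built off to the side, then swapped in.
 */
public class SwappingCache<K, V> {

    interface DataSource<K, V> {
        V load(K key);                      // fetch the current value for one key
    }

    private final DataSource<K, V> source;
    private volatile Map<K, V> live = new HashMap<K, V>();

    public SwappingCache(DataSource<K, V> source) {
        this.source = source;
    }

    public V get(K key) {
        return live.get(key);               // readers always see a complete map, never a half-refreshed one
    }

    /** Called by the data source (solution 5) when it knows these keys changed. */
    public synchronized void onChanged(Iterable<K> changedKeys) {
        Map<K, V> next = new HashMap<K, V>(live);   // build the replacement off to the side
        for (K key : changedKeys) {
            next.put(key, source.load(key));
        }
        live = next;                                 // atomic swap (solution 6)
    }
}
```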

Sunday, July 24, 2005

Sarbanes-Oxley

The organization I am currently consulting for is in the heat of trying to comply with Sarbanes-Oxley. I hear this name mentioned at least twice a week, if not more, as the reason for a number of wrong and potentially dangerous decisions. The act, through legal wording, goes into great detail to specify how auditing shall be done, how it's paid for, who is going to do it, what the deliverable will be, etc. The act explicitly assumes that auditing is a major fix for all the accounting problems we are having: as long as the auditors do their job, everything will be fine. The act also touches upon record keeping, and briefly mentions, I believe in one sentence, the requirement to maintain safe access to a production system.

It is very important to question what the auditors say. Some of my friends are these auditors; they are overworked, just out of college, being promised partnerships and the sky. The act is designed for auditing companies, if not written by them, not for the companies being audited. The requests the auditors are making are going to cause more harm than good. Yes, the requests sound excellent on paper, but they will create a huge mess on the ground. The implementation of this act is also being unnecessarily rushed, probably for political reasons. DO NOT RUSH THIS! Fixing the mess will be much more expensive.

I am also having some trouble finding the exact places that outline all these requests mentioned by the auditor. Perhaps the act simply gives the auditor control, and assumes that the auditor will be impartial and correct. The act then simply takes the auditor's statements as right, and the company is required to comply.

If you take away anything, question everything. The auditors are following subject lines rather than diving down and understanding the text.

Compliance is, of course, required by law, but it should be done in a way that develops a limber organization that is able to adapt and evolve, rather than an organization stuck in a paper trail and an ever-growing mountain of red tape.

Monday, July 11, 2005

Distributed Groove

I spent my day troubleshooting JGroups. For those unfamiliar, JGroups is an open-source group communication / middleware library. The problem was that sometimes, when a client system was restarted, the client would fail to see the cluster and would instead become the coordinator of its own cluster. After a bit of research, the problem turned out to be connected to long garbage-collection delays on the real coordinator. These caused the server to stop responding to heartbeat requests, which in turn caused the client to think that it was alone and therefore become its own coordinator in its own cluster.

To solve the problem, I started looking through the JGroups source code and stumbled upon a reference to Lamport's timestamp algorithm. I remember studying the algorithm in school. The basic premise is the ability to determine a logical order of messages in a distributed environment based on a notion of time. His paper goes into quite a bit of detail, some of it awfully trivial, and some of it awfully complicated. This brought me to his website, where I discovered a stockpile of research papers.
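
The algorithm itself is tiny. Here is a minimal sketch of a Lamport logical clock (my own illustration, not the JGroups code): every process keeps a counter, ticks it on local events and sends, and on receive jumps ahead of the sender's timestamp, which yields a consistent logical ordering of messages without synchronized wall clocks.

```java
/**
 * Minimal Lamport logical clock.
 */
public class LamportClock {

    private long time = 0;

    /** A local event or a send: advance the clock and stamp the event. */
    public synchronized long tick() {
        return ++time;
    }

    /** A receive: move past both our own time and the timestamp on the message. */
    public synchronized long onReceive(long senderTimestamp) {
        time = Math.max(time, senderTimestamp) + 1;
        return time;
    }
}
```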

He covers a lot of very interesting concepts, most of them somewhat over my head. The Byzantine systems are very interesting: the ability to write systems that can react to any type of error. He also goes on to describe a truly parallel garbage-collection algorithm, something that would be quite nice in Java.

It's very interesting. Most of the code being written these days, mine included, is written to get it out quickly and cheaply. The code does what it is supposed to do, but is by no means very efficient or bullet-proof. It does the work, but is at best a temporary solution. Definitely not elegant. Then I get to see all these research papers dealing explicitly with the elegance of programming. It is a nice feeling: raw computer science.

Monday, May 23, 2005

First Entry

For those of you wondering what Pons Asinorum means, the literal translation is "Bridge of Asses." The phrase refers to Euclid's fifth proposition in Book One of the Elements, which states that the base angles of an isosceles triangle are equal. The proposition came to be seen as the first test of intelligence and a bridge to the harder problems that follow.

The reason I chose this name is twofold. On the one hand, this blog will be used as a dumping ground for half-baked ideas and half-baked knowledge; it is a means to explore those ideas and that knowledge in further detail, to perhaps derive something more interesting and useful. So the first part of the bridge is my own test of intelligence. The second part is the bridge to something else that the ideas and knowledge combined will hopefully produce.