Sunday, August 21, 2005

Steganography

There was a post on Slashdot about steganography. Now I am not much of a Slashdot reader, but chaos prevailed, and there I was. The concept is great: the ability to hide data within other data, such as an image. There are a couple of algorithms available out there, but it seems none are mature enough for general use.

There is a good article here, or at least an article that rates high on Google:
http://www.guillermito2.net/stegano/ideas.html

The guy makes a very good point that most data out there is not random: it seems random, but in fact there is order to the chaos. Encryption algorithms go against that grain. It's like dumping a pink elephant in the middle of Times Square and asking what's out of the ordinary. Most encrypted files beckon for decryption. They sit out there in plain sight with nothing hiding them but the encryption algorithm.

Encryption is very complicated; steganography seems even more complicated. I am wondering if it's possible to use the actual image as the data, versus adding something to the image or modifying part of the data stream like the header or certain insignificant bits. Given a file and an image, would it be possible to produce a mapping such that the file maps to the image, and, given the image and the key, to produce the file? The key is obviously very complex and very specific to the file. The key is meaningless by itself, and so is the image. Of course, the weakness is that the key must be protected. The beauty is that there is no encrypted file sitting anywhere. Instead there are trillions of images sitting in the public domain. I have no algorithm defined, but it should be possible. For example, given a neural net, it is possible to get the correct result: the image is the outcome, the file is the input, and the net figures out the weights on the nodes to get the right result. The weights then make up the key, assuming the reverse is true, which I am not so sure about.
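
As a toy sketch of what I mean, treat the image as a codebook, a bit like a book cipher: the key is just a list of offsets into the image, one per file byte, so the image alone and the key alone are both meaningless. All the names here are mine, and this is a back-of-the-envelope illustration, not a real algorithm:

    import random

    def make_key(file_bytes, image_bytes):
        # Index every offset in the image by the byte value found there.
        positions = {}
        for offset, value in enumerate(image_bytes):
            positions.setdefault(value, []).append(offset)
        # For each file byte, pick a matching offset at random.
        # This fails if some byte value never occurs in the image.
        return [random.choice(positions[b]) for b in file_bytes]

    def recover_file(key, image_bytes):
        # The public image plus the private key reproduces the file.
        return bytes(image_bytes[offset] for offset in key)

    image = bytes(range(256)) * 4   # stand-in for real image data
    secret = b"meet at noon"
    key = make_key(secret, image)
    assert recover_file(key, image) == secret

The obvious catch is that the key ends up as long as the file itself, so this version just moves the problem around. The neural-net idea would be an attempt to compress that mapping down into a small set of weights.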


Tuesday, August 02, 2005

Software Complexity

I spend a lot of time developing large systems. I find myself continuously wrestling with the fact that in order to make a system easy to use, it needs to be exceedingly complex. You could almost draw a direct parallel: as the system becomes easier to use, its complexity increases by the same factor. The side effect is that maintaining a large system becomes that much more complex. Each system has certain drawbacks that were probably rationalized as functionality, or perhaps functionality that later became a drawback. For example, let's say a system maintains some internal data cache. The data needs to be refreshed every day.

Solution 1 is to enforce that the system is restarted every day causing the internal cache to be refreshed.

Solution 2 is to build a timer that automatically refreshes the internal cache every midnight. The timer locks the system for that period and refreshes the cache.

Solution 3 is to build something dynamic that is able to identify a slow period, then lock the system and perform the refresh.

Solution 4 is to build a partial cache refresh, where only changed items are refreshed. The system automatically scans the data source every few minutes, identifies the changes, and inserts them into the cache, locking only the data source.

Solution 5, the external data source notifies the system when data changes occur, and the system performs a specific cache refresh, locking only the items that changed.

Solution 6 encompasses solution 5, except instead of locking, a separate cache is built and swapped in during a slow interval, etc. (see the sketch below).
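
To make the swap idea concrete, here is a minimal sketch, assuming the cache is a plain dictionary and load_fn is whatever pulls fresh data from the source (both names are mine):

    import threading

    class SwappableCache:
        def __init__(self, load_fn):
            self._load = load_fn            # pulls fresh data from the source
            self._cache = load_fn()
            self._lock = threading.Lock()   # guards only the swap, not reads

        def get(self, key):
            # Readers never block on a rebuild in progress.
            return self._cache.get(key)

        def refresh(self):
            # The slow part happens off to the side...
            fresh = self._load()
            # ...and the swap itself is near-instant.
            with self._lock:
                self._cache = fresh

The price is double the memory while a rebuild is in flight; everything around it is where the real complexity lives.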

There are a lot of solutions to this problem. In fact, there are a lot of solutions without even considering AI, where you can start having neural nets try to predict slow intervals, optimal times for refresh, or which data is likely to change, etc. Each solution makes the system more flexible and much smarter, but with an ever increasing complexity cost. Solution 6 will probably yield the most flexible system with the least amount of downtime, with perhaps some AI thrown in. But solution 6 will also be very complex, with quite a bit of code involved in making it work. There is going to be a separate system that needs to be aware of data changes, a method of notification, the ability to modify and lock parts of the cache, the ability to identify slow periods or periods where the specific data is not used, etc. Solution 6, if built correctly, will probably require little day-to-day maintenance, but once it fails it will be very complex to troubleshoot and fix.
So, what's the point of all this? I don't know. I like to make complex systems simple, but that's a losing battle. Perhaps the curve is not really a straight line but more of a bell curve, where after a certain point the system complexity drops. The system becomes so complex and so smart that it is actually easy to maintain.