Sunday, October 31, 2010
Zion Business Intelligence
You are an Enterprise Architect in a Global Investment Bank and tasked with finding a solution to the data fragmentation problem.
And this is how it happened -
This particular bank has many data files flowing every which way between various groups in the organization. Product controllers take a feed from front office systems, and market risk another feed, and credit analysts take a third feed, except the 3 feeds are generated differently, and have slightly different points of view, and in some cases, even different trade populations. But then, someone comes along and says they need to see 3 numbers - 1 from market risk, 1 from credit analysts, and 1 from product controllers - side by side. And you say - that's impossible because it will take months just to analyze whether the data populations are the same, and even if they were, the numbers are produced at different levels, and even if they weren't, there is no way to show them on the same report. And even if there was, this is a 6 month project, and who is going to pay for it? And then they say, "aren't you an architect", and you hesitantly say "yes", and then they say "so go fix it."
Rabbit Hole
So, how to fix it. Core data primarily originates in front office systems and then flows through the rest of the organization. You can think of it as streams of water branching off into many smaller sub-streams, and then further branching off, until eventually the stream is too weak to branch off and simply seeps into the ground.
Well, easy breezy you say. Pervasive BI, data warehousing, Online Analytical Processing, bottom up data-warehousing, top down, bus architecture, centralized architecture, federation, hub and spoke, relational, dimensional, operational stores, data marts, Inmon, Kimball, conformed dimensions, Boyce-Codd normal form, 3NF, .....
We just need to take this mix - shake it up - and we'll get ourselves a fancy enterprise data architecture, or perhaps something strong, maybe with a cherry - so we can forget the whole thing ever happened.
Blue pill or the red pill
What to do.
Well, Kimball likes the bus architecture, so let's give that a go. The bus architecture is a bottom-up (or was it top-down?) approach where you basically start off with a bunch of data marts, which then flow into a data warehouse. The data marts are primarily operational-store-type structures, while the data warehouse is a pure data warehouse, star schema and all.
The problem, of course, is that you effectively already have a ton of little data marts all over the place, which don't conform. Ah, that's the problem, we need to have conformed dimensions. Right, and we do that how exactly? The other problem, of course, is that this is the same data being treated in a slightly different way, maybe with a slightly different trade population or attribute set or granularity or perhaps with a different temporal point of view. Seems awfully wrong to have a bunch of data marts storing the same data, which you then have to reconcile all together.
Right you say, let's go the other way. Inmon likes top down, so let's create a big data warehouse, which then populates the data marts. So, who is going to build this monstrosity exactly? Well, it can't be the individual business groups, because they are not stupid enough to take on a project like this, so it would have to be some central group away from any particular business line, and close to senior management, 'cause this is going to cost a lot of money. So, the group is created, except they don't know what they are doing - very technical guys, but they don't understand the business at all. And if by some miracle they do, they can't keep up with it. If this project actually succeeds, which is highly unlikely, and actually captures the right data, which is frankly impossible, it will still fail, because the business moves just too damn fast, and at the end of the day, they will never ever be able to fully understand and own what they are storing. So, after some time, and a whole lot of money, this will be dramatically killed off.
What about federation you say? We just need a magical vendor, and all our problems will disappear. You see, this vendor will create an abstraction on top of our asylum, and this way, we will present a clear simple view shielding the end user from the underlying complexity - easy, breezy. I suppose that could work if it were actually possible to build enough intelligence into this thing to bridge data that is fundamentally diverging, have it perform at the required speed, and keep it supportable. So, let's just kill that idea for now, before we embarrass ourselves much further. Perhaps it works for you, readers out there - if you happen to like this federation thing, check out Composite, Inc. - it's all the rage these days.
Red pill
Reality it is then. Well, seems to me we need a new philosophy. Let's call it Zion. There are a few core principles of this theory:
1. Each piece of data has 1 and only 1 owner.
2. The owner is the only one that can change this piece of data.
3. Each piece of data has a natural key and a surrogate key.
The distinction between this theory and all of the above is that this one says you first need to understand what the data is before you start deciding how to deal with it. And the first question to answer is: who is responsible for it? Responsible not in the sense of which IT group owns the data warehouse, but responsible as in which business group is responsible for this data. If you need to change it, who best knows how to change it? How to evolve it? What it means?
If you can answer this question, you now know who is going to build the data store for this type of data - and that store will become the golden source for this data.
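The three principles can be sketched in code. A toy Python illustration (the class, group names, and keys are all invented for this example, not part of any real system):

```python
# A toy sketch of the Zion principles: each piece of data has exactly one
# owning group, a natural key paired with a surrogate key, and only the
# owner may change it.

class GoldenSource:
    def __init__(self):
        self._owners = {}      # natural key -> owning business group
        self._surrogates = {}  # natural key -> surrogate key
        self._data = {}        # surrogate key -> record
        self._next_id = 1

    def register(self, owner, natural_key, record):
        # Principle 1: one and only one owner per piece of data.
        if natural_key in self._owners:
            raise ValueError("already owned by %s" % self._owners[natural_key])
        self._owners[natural_key] = owner
        # Principle 3: pair the natural key with a surrogate key.
        surrogate = self._next_id
        self._next_id += 1
        self._surrogates[natural_key] = surrogate
        self._data[surrogate] = record
        return surrogate

    def update(self, owner, natural_key, record):
        # Principle 2: only the owner can change this piece of data.
        if self._owners.get(natural_key) != owner:
            raise PermissionError("only %s may change %s"
                                  % (self._owners.get(natural_key), natural_key))
        self._data[self._surrogates[natural_key]] = record

gs = GoldenSource()
sid = gs.register("ProductControl", ("TRADE", "T-1001"), {"notional": 5_000_000})
gs.update("ProductControl", ("TRADE", "T-1001"), {"notional": 6_000_000})
```

Market risk and credit can read from this store all they like, but an attempted update from anyone but ProductControl raises an error - that is the whole point.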
To be continued....
Friday, May 21, 2010
Social Networking
What is interesting is where all of this is going. I think the next step will be that teens will record their life continuously and post bits of it for the rest of the world to see.
There are some examples of this already occurring:
http://research.microsoft.com/en-us/projects/mylifebits/
http://qik.com/
It needs to be effortless, continuously recording - maybe an add-on device behind the ear, connected to an iPhone in your pocket.
After something interesting happens, you can then open the iPhone, scan back, cut out the relevant section and share it via some social networking site.
Thursday, September 17, 2009
Connecting to Sybase IQ 12 from Analysis Services 2008
If you have the Sybase 12 client installed, you will have the "Sybase ASE OLE DB Provider". If you also install the Sybase IQ client, you'll additionally have the "Sybase Adaptive Server Anywhere OLE DB Provider 9.0"
On the ODBC side, you should have the "Sybase IQ", "Adaptive Server Anywhere 9.0", and "Sybase ASE ODBC Driver".
1. Create an ODBC connection using "Sybase IQ" data source.
a. Give it a name in the Data source name field in the ODBC tab
b. Provide your username and password in the Login tab.
c. Enter the server field in the Database tab. You'll need to get this information from your DBA. For me, it was <hostname>_<instance name>
d. For the network tab, select only TCP/IP, and enter: host=<host>;port=<port>
2. Open Analysis Services, create a new data source using "Native OLE DB\Sybase Adaptive Server Anywhere OLE DB Provider 9.0".
a. In the server or file name entry put the name of your ODBC data source that you created in step 1.
b. Press Test Connection, everything should work.
3. In the data source views section, create a new data source view using the data source you just created. It seems importing the tables directly fails with some arcane error.
a. Don't select any tables, just click next, until an empty view is created.
b. You can now create named queries like "select * from <table name>"
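For reference, the DSN built in step 1 is roughly equivalent to the following odbc.ini-style fragment (a sketch only - the parameter names follow the SQL Anywhere/IQ convention, and all values are placeholders you must get from your DBA):

```ini
; Hypothetical DSN definition mirroring steps 1a-1d above
[MyIQSource]
Driver=Sybase IQ
ServerName=<hostname>_<instance name>
Uid=<username>
Pwd=<password>
CommLinks=tcpip(host=<host>;port=<port>)
```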
Tuesday, August 18, 2009
Healthcare
What's at stake seems to be a lot more fundamental.
Is it or is it not the role of Government to protect the life of its citizens? Must the Government protect the "inalienable rights" of man - life, liberty and the pursuit of happiness? Or is such responsibility more fickle, dependent on an array of variables such as politics, budgets, costs, and personal gains?
Let's see if we can look at this problem in an even more basic way. Life is fundamentally unpredictable - complexity theory. Tomorrow we can either be hit by a bus, get cancer, both, or neither. In the case of neither, we humans are conditioned to ignore the perils until one or both of the above occurs. Now, is it not the role of a stable, evolved society to protect its weak and young? What does it say about our society when we consciously ignore the weak and the young? Are we as a society egotistical, thinking like every other great society before us that we can do no wrong - the strong survive, the weak perish, one must accept the law of nature?
Some statistics on the makeup of America, total uninsured by various categories:
http://www.kff.org/uninsured/upload/7451_04_Data_Tables.pdf
http://facts.kff.org/chartbooks/State Variation and Health Reform.pdf
Tuesday, February 17, 2009
XML/A via Flex
The spec is available at http://www.xmlforanalysis.com/xmla1.1.doc
<SOAP-ENV:Body>
<cxmla:ExecuteResponse xmlns:cxmla="urn:schemas-microsoft-com:xml-analysis">
<cxmla:return>
<root xmlns="urn:schemas-microsoft-com:xml-analysis:rowset" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:EX="urn:schemas-microsoft-com:xml-analysis:exception">
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="urn:schemas-microsoft-com:xml-analysis:rowset" xmlns="urn:schemas-microsoft-com:xml-analysis:rowset" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:sql="urn:schemas-microsoft-com:xml-sql" elementFormDefault="qualified">
<xsd:complexType name="row">
<xsd:sequence>
<xsd:element minOccurs="0" name="_x005b_Counterpart_x005d_._x005b_All_x0020_Counterparts_x005d_._x005b_ABN_x0020_AMRO_x0020_Bank_x0020_N.V._x005d_" sql:field="[Counterpart].[All Counterparts].[ABN AMRO Bank N.V.]"/>
...
</xsd:sequence>
</xsd:complexType>
</xsd:schema>
<row>
<_x005b_Counterpart_x005d_._x005b_All_x0020_Counterparts_x005d_._x005b_ABN_x0020_AMRO_x0020_Bank_x0020_N.V._x005d_ xsi:type="xsd:double">10</_x005b_Counterpart_x005d_._x005b_All_x0020_Counterparts_x005d_._x005b_ABN_x0020_AMRO_x0020_Bank_x0020_N.V._x005d_>
...
</row>
</root>
</cxmla:return>
</cxmla:ExecuteResponse>
</SOAP-ENV:Body>
This is a sample response. In this case, I've used Pentaho Mondrian as the cube provider.
This XML can be rather easily walked with Adobe Flex.
var message:XML = ...
var soapEnv:Namespace = message.namespace("SOAP-ENV");
var cxmla:Namespace = new Namespace("cxmla", "urn:schemas-microsoft-com:xml-analysis");
message.addNamespace(cxmla);
var xsd:Namespace = new Namespace("xsd", "http://www.w3.org/2001/XMLSchema");
message.addNamespace(xsd);
var sql:Namespace = new Namespace("sql", "urn:schemas-microsoft-com:xml-sql");
message.addNamespace(sql);
// the rowset namespace is needed later to select the data rows
var rowset:Namespace = new Namespace("rowset", "urn:schemas-microsoft-com:xml-analysis:rowset");
message.addNamespace(rowset);
var body:XMLList = message.soapEnv::Body;
var executeResponse:XMLList = body.cxmla::ExecuteResponse;
var schema:XMLList = executeResponse..xsd::schema;
var complexType:XMLList = schema.xsd::complexType.(@name == "row"); // note: ==, not =
var elements:XMLList = complexType.xsd::sequence.xsd::element;
The above will return an XMLList of the elements within the complexType named row.
A slight change will produce a list of the actual data rows.
var rows:XMLList = executeResponse..rowset::row;
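For readers outside the Flex world, the same walk can be done in Python with the standard library's ElementTree. This is a sketch against a stripped-down version of the response above:

```python
import xml.etree.ElementTree as ET

# Namespaces from the XML/A response; rowset holds both the schema and the rows.
NS = {
    "soap": "http://schemas.xmlsoap.org/soap/envelope/",
    "xmla": "urn:schemas-microsoft-com:xml-analysis",
    "rowset": "urn:schemas-microsoft-com:xml-analysis:rowset",
    "xsd": "http://www.w3.org/2001/XMLSchema",
}

# A heavily simplified version of the ExecuteResponse shown above.
sample = """<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
 <SOAP-ENV:Body>
  <cxmla:ExecuteResponse xmlns:cxmla="urn:schemas-microsoft-com:xml-analysis">
   <cxmla:return>
    <root xmlns="urn:schemas-microsoft-com:xml-analysis:rowset">
     <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <xsd:complexType name="row"/>
     </xsd:schema>
     <row><cell>10</cell></row>
     <row><cell>20</cell></row>
    </root>
   </cxmla:return>
  </cxmla:ExecuteResponse>
 </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""

doc = ET.fromstring(sample)
# Walk down to the rowset root and collect the data rows, like the E4X query.
rows = doc.findall(".//rowset:root/rowset:row", NS)
values = [cell.text for row in rows for cell in row]
```

The namespace dictionary plays the same role as the `Namespace` declarations in the ActionScript version.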
Saturday, January 10, 2009
Problems for your CIO
1. The problem of data.
Every organization struggles with the problem of data. Either it's too much data, or not enough data.
In the case of too much data, the analysts and the IT teams struggle to make sense of the barrage of data. A lot of times IT is forced to archive data, and in doing so remove it from the analysts' reach. In other cases, large amounts of data are simply stored, but no viable reporting is available, or only a small segment is reported on.
In the case of not enough data, local analysts are unable to get a complete picture because data is stored in multiple systems and, a lot of times, is localized to regions. Imagine what that means for financial risk: What is my exposure to Bear Stearns? I don't know; it will take a few days to compile the report from various places. Don't believe me? This was the case at Lehman, and is the case at a global Asiatic bank. I would also bet it's the case in almost every other institution.
2. Business doesn't trust IT to deliver.
The business simply does not trust IT to deliver the solutions it needs. When solutions are delivered, they become large monolithic entities, locking the organization in and becoming a constant expense. Any change to the systems becomes an expense at times larger than the cost to build the system in the first place. At other times, IT simply looks like a dog chasing its own tail. They move very fast, and have short time-lines, but the end products are not even remotely close to what the business wanted. IT seems to always have an excuse. I was once in a 3 hour meeting. The business analyst wanted account numbers. He wanted a system where he could debit one account and credit another account. For that, he needed account numbers, and amounts within those accounts. The developer for the system was trying to explain that the analyst didn't need accounts, but only had to enter the transactions and attributes of the money movement. The system could then aggregate those transactions in whatever way the analyst wanted. You see, the system was designed generically, and adding account numbers would intrude on that design. And so they went, back and forth, neither understanding the other. The end result: nothing. No account numbers. It was deemed that this feature wasn't go-live critical, and could be addressed in the next phase.
The fault is actually neither IT's nor the business's. The problem is more basic. The business is very agile. It's able to move quickly and easily. If you've ever read a legal document or seen financial models, you'll understand. They can be very complex and very nuanced. Most technical architectures are not built to be nuanced. They are built to solve concrete Boolean problems. IT wants the problem defined in concrete terms, but the business is only able to articulate its current understanding of the world. Unfortunately, tomorrow it will be a new understanding. And so IT builds a monolith, because that's what it knows how to build, the business changes, and the organization suffers.
This really saddens me. Technology is capable of solving so many problems, but instead it's relegated to addressing the most basic of operational problems, and even there, we struggle. Consider all the technical possibilities, neural networks, expert systems, machine learning, data mining ....
3. Business data is not clean.
So much of the data is simply bad. Imagine how complex some of these systems are, and then imagine the possibility of an error. An error that doesn't generate an exception - perhaps rounding, perhaps a logic error, perhaps an unforeseen condition. The output data becomes bad, and so it flows through the system. A lot of times, enterprise architectures don't even do basic reconciliation. Reconciliation requires active design, time and thought. Most architectures are organically produced, and are feature driven rather than following any thoughtful design. The end result is Frankenstein architectures and garbage data.
4. There are a lot of manual processes.
How many times have you created a new system which actually creates more manual processes than it removes? New systems sometimes require users to enter data in multiple places, verify multiple places, etc... Some firms have massive operational groups, hundreds of people, whose sole job is to do what the systems fail to do: enter data in multiple places, reconcile, data entry, data massaging, normalization, etc... This is a horrible model. It's extremely error prone, not to mention expensive.
5. Heavy reliance on people.
I believe that machines and people have their roles. Unfortunately, a lot of jobs done by people today should be done by machines. Machines and people need to find a harmony. Some things we do very well; other things, machines. Sifting through large amounts of data should be the role of machines. Alerting us to unforeseen circumstances should be machines. Allowing us access to data should be the role of machines.
6. Presentation
So few firms give thought to the presentation tier. In most cases, it's an afterthought. Some UIs are desktop, others web, others something else. The end user is required to memorize what features are available where, what report has what, etc... God forbid we standardize and unify the disparate systems. The user should have a single place to go, a single way to do something. The learning curve to learn how to use everything should either not exist, or be extremely small.
Monday, December 22, 2008
Lehman Brothers
Some of the guys have worked their whole lives at the Firm. Think about it: you spend the last 20 years of your life building something. Sometimes you work late, sometimes even on weekends. You meet tough deadlines, you build out the system, you have projects, and Jira items, and future ideas for improvements, and then one Sunday evening, it all disappears in a single 20-second news clip.
What can I say about that Monday - a lot of guys were afraid. Single earners, mortgage, kids, car payments, private school, dance lessons, ... It was never a problem before; the Firm was known for its bonuses. The irony of having more money is that your expenses rise proportionately. Some of the guys were hit very hard. A lot of the bonus is paid in Lehman stock, part of the 401K is in Lehman, and the firm also promoted personal investing in the stock. All in all, if you were with the firm for 20 years, you had a major percentage of your net worth tied to the company, and then it's gone.
The next few weeks gyrated between moments of team camaraderie and private, lonely introspection. Every person came to work on time and stayed for the full day. Most of the guys, myself included, kept to the Lehman dress code of full business attire. I think it was out of respect for the Firm, or perhaps just routine.
Barclays' purchase of the firm provided a rare glimmer of hope, a possibility of normality. It wasn't so much having your job saved, which is of course very important, as the possibility of saving what you've built - someone using what you've worked so hard to create. As time dragged on, it became clear that Barclays had no intention of taking our system, so the glimmer of hope slowly changed into despair with a dash of anger. Barclays, of course, wound up laying off most of us. The system I worked on was trashed.
Today, I am at another company, assigned the task of comparing Lehman's systems to my new employer's systems. This is how Barclays employees must have felt when they found out their company was buying Lehman. It's us or them. And so I do what is requested of me. I criticize and destroy the very systems I promoted just a few months ago.
I have the deepest respect for the Firm. I have never worked for a company like that. They really tried to create something more than the sum of themselves. I shall miss it.
Friday, October 24, 2008
Intentional Software
Gregor Kiczales, the founder of AOP, and his latest paper. It also led me to a lot of other concepts such as the law of leaky abstractions, the history of software, and the Lagom Process, along with more obtuse concepts such as the omega number, and people like Gregory Chaitin.
And of course, what would our industry be without its acronyms: MDA, DSL, BPM, UML, DDD, SOA. There are plenty of others, but I think I've made my point.
That's a lot of information to digest, especially if you looked up Chaitin, which would have led you to our Founding Fathers: Leibniz, Turing, and Gödel. Now ask yourself: what kind of thought narrative can take a person from Adobe Flex to Gödel? It would almost be funny.
Now, let's step back for a moment. A lot of people argue that software development should be reduced to visual tools. A counter-argument is that mathematicians do not use visual tools to draw up their equations. They use blackboards and chalk, the most primitive of tools. Excel has been praised as the most successful intent-based system, but if you look at it, it's not visual at all. At best, it's a basic grid with cell coordinates and a blank text input box allowing manipulation of cells. Another argument is that business people are somehow not smart enough to program - that it takes a special kind of mind to generate code. The fallacy in this statement is that business people already code, just not in the typical "tech" way, but rather in their own domain. Their interaction with Excel, their domain expertise, the manipulation of that domain expertise - it can all be considered coding. They manipulate their symbols to achieve their goals. The only thing separating technology developers and domain experts is which domain they are experts in.
There was a quote from Charles Simonyi that went something like this: if we don't expect business people to learn how to code, why do we expect coders to learn the business? Each path is extremely inefficient and is rife with problems. So, instead, let's allow each group to focus on what they do best. Developers should stick to technology; business people should stick to business.
So, we have established that a business person is capable of performing some form of "development" to encode their domain expertise into a set of steps, "their intent". It is also probably safe to assume that a business person understands concepts like if-then-else, for-each and standard algebra. It is also safe to assume that they know nothing of JSP, Servlets, JMS, EJB, transactions, XA, JDBC, SQL, Java, Class, public/private, encapsulation, polymorphism, design pattern, singleton, facade, heap, stack, binary search tree, NP-complete, and on, and on, and on, .... So, where does this leave us? I think it means that software development stops being the pure domain of developers, and instead is split between developers and business people.
If we look at a typical business system, we can see that it has inputs (JMS, GUI, etc...), a concrete data representation model in the form of a database schema, complex output in the form of varied reports, and processes that criss-cross the system, fired by triggers such as external events (JMS, schedule, user, etc...). There is also business logic in the form of calculations, business steps, if-blocks, etc... sprinkled through the system. Some of it lives embedded in the report logic, some in the processes, and some, perhaps, even implicitly in the data storage or data format.
I think we can start to take steps to separate the domains. Process flows attempt to separate the process logic from the system logic. Web services attempt to expose individual services and thereby reduce the hard linking between them. Business intelligence is attempting to expose the data to the users and allow ad-hoc manipulation. The proliferation of domain specific languages, online compilers, and rule engines is a sign of the desire to separate the system from the business rules. Hibernate, JDO, etc... are attempting to isolate the system from the underlying data stores and map out the data definitions. Ontologies are attempting to bridge the interaction between human-defined relationships and a system. Mashups - i.e. http://www.programmableweb.com/, Yahoo Pipes - are yet more examples of technical concepts being exposed to non-technical people. All these things, in my opinion, are converging on the same topic of intentional programming.
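As a toy illustration of the "separate the business rules from the system" idea, here is a minimal Python sketch where the rules live as data and the engine stays generic (the rule names, fields, and thresholds are invented for the example):

```python
# Business rules expressed as data (condition + action), kept out of the
# engine code, so in principle a domain expert could edit them without
# a full development cycle.
rules = [
    {"name": "large-trade", "when": lambda t: t["notional"] > 1_000_000,
     "then": lambda t: t.setdefault("flags", []).append("REVIEW")},
    {"name": "stale-price", "when": lambda t: t["price_age_days"] > 5,
     "then": lambda t: t.setdefault("flags", []).append("STALE")},
]

def run_rules(trade, rules):
    """Generic engine: evaluates every rule; knows nothing about trading."""
    for rule in rules:
        if rule["when"](trade):
            rule["then"](trade)
    return trade

trade = run_rules({"notional": 2_000_000, "price_age_days": 1}, rules)
```

A real rule engine (Drools, for instance) adds a proper rule language, compilation, and conflict resolution, but the separation of concerns is the same.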
Tuesday, October 21, 2008
Alpha Release of Drools Flex Editor - 0.5
Orange Mile is proud to announce the long awaited alpha release of a Drools Rule Editor in Flex.
http://code.google.com/p/drools-flex-editor/
The current release includes all the Flex pieces without the rather basic server side code for rule compilation and code completion. This will be available in the final 1.0 release. This is the beginning of having rich enough components available within the system to allow the user/admin to directly manipulate the business rules without the long development cycle.
Orange Mile is Expanding
Although, Orange Mile started as a small math with a single avout, it has since expanded, grown and matured.
We have become a devout following. At times, looked down upon by the saecular world, but always, pursuing our dreams.
Entry written in the style of Anathem.
Thursday, October 16, 2008
Orange Mile Security Release - 1.1
The new features include:
1. A complete example based on Spring Security - see orangemile-security-test.war
2. isGranted JSTL Tag
http://code.google.com/p/dynamic-rule-security
Friday, October 10, 2008
Doom and Gloom and the Economy
We are living through historic times. In the middle of last year, I got very scared and started blogging about randomness and the economy.
http://orangemile.blogspot.com/2007/05/black-swan.html
http://orangemile.blogspot.com/2007/07/supply-of-money.html
You will notice that in the Supply of Money entry, I actually wrote about the likelihood of the collapse of the world economy due to the unsustainable supply of fiat money.
What's interesting about those entries and that time period is why my mind shifted to the arcane topics of money supplies, carry trades, and fiat money, when the blog entries before and after clearly deal with the arcane topics of technology. Perhaps my subconscious started to pick up on the feelings of uneasiness in the global market; hiccups, if you will. I can't possibly attribute those entries to knowledge, because I am simply not qualified to speak of money supplies, fiat currency, and carry trades.
So, what is happening today is a global loss of confidence. What is interesting is that companies, specifically banks, are hoarding cash rather than people. I would argue that if people started hoarding cash then we're all doomed. The global economy would come to a screeching halt. The Chinese economy would collapse, probably throwing that country into either martial law or revolution. America and Europe would fall into a severe and prolonged depression, taking the rest of the civilized world with them. Africa would fall into an even lower level of sustainability, with probably wide-ranging civil wars due to lack of food and an acute demand for natural resources such as diamonds, gold, and oil. India's economy would also take a severe beating, but I think they would remain a loose democracy. If they position themselves well, they may end up being the next superpower.
Right now, the US government is printing money at an ever faster clip, giving it away almost for free, and nationalizing large areas of the financial industry. Money is flooding the global economy. What's interesting is that we are actually in a period where money is disappearing. As the perceived value of assets falls, money disappears. The US Government then tries to fill in the gap of lost money by providing more money to the institutions whose wealth disappeared, hoping against all hope that the newly provided money will be used by the institutions to create more money. Let's recap how money gets created. A person decides to do something on credit, let's say buy a house. They go to the bank and say: give me 300k to buy a house. The bank gives you the money, and you go buy a house. The thing is that the 300k is actually some other depositor's money - money the bank doesn't actually have. What's happening now is that the bank thought it had a 300k loan asset, but instead, the 300k is really only a 200k asset. This means that if you default, the bank loses 100k of someone else's money. If enough loans do this, the bank won't be able to cover the losses, confidence in the bank erodes, people start to retrieve their deposits, and of course, after some interval, the bank simply runs out of money to give out to depositors. This is why the FDIC was created. This is a standard Ponzi scheme. In other words, if the asset side of the bank balance sheet starts to shrink, the bank will reduce the amount of new loans it can give out, and thereby reduce lending, which will probably drive interest rates up because there are fewer institutions lending. This actually means the opposite of what I said earlier: money isn't destroyed, the rate of creation just slows. The banking industry is structured as a very calibrated entity, with a minor hiccup in cash flows, or perceived cash flows, destabilizing the entire industry.
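The lending loop described above is just a geometric series. A quick Python sketch of fractional-reserve money creation (the 10% reserve ratio is an illustrative assumption, not a claim about actual requirements):

```python
def money_created(initial_deposit, reserve_ratio, rounds=1000):
    """Each deposit is re-lent minus the required reserve; sum the series.

    In the limit this converges to initial_deposit / reserve_ratio,
    the classic money multiplier."""
    total, deposit = 0.0, float(initial_deposit)
    for _ in range(rounds):
        total += deposit                    # this round's deposit counts as money
        deposit *= (1.0 - reserve_ratio)    # the part the bank may lend out again
    return total

total = money_created(300_000, 0.10)
```

With a 10% reserve, that original 300k deposit eventually supports roughly 3 million of broad money - which is also why a write-down on the asset side unwinds so much more than its face value.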
The Fed is trying to erase the perceived losses from the banks' balance sheets, and thereby restart the loan process. Another interesting thing is how many industries rely on having a continuous supply of new loans. Imagine a Ponzi scheme applied to the car industry. Ford borrows money to pay its workers, hoping that in the future it will sell enough cars to pay back the loan, except it doesn't, so it borrows more to keep going. At some point it actually needs to borrow from Person A to pay Person B, and on and on. This should end at some point when there is no one else willing to lend to the company. But, unfortunately, for most companies there is always someone willing to lend. This is partly due to the obscurity/opaqueness of the financial industry. Now consider our current scenario, where the company can't get a loan not because of its own financial condition, but because of the financial condition of the lender - all lenders.
What the Fed is trying to do now is fight the deflationary path. Money is becoming a scarce resource, not because there is not enough of it, but because banks are hoarding it. A lot of people are also talking about a hyper-inflationary model. I don't see this happening, even though I believed it for a while. In a hyper-inflationary model, you have too much money. This is unlikely because the Fed can always mop up the money supply, and because the world is dollar denominated, at least for the foreseeable future. I think what is more likely is nationalization of a number of areas and banks, a significant increase in money available to the banks, guarantees of bank assets, a forced reduction of the inter-bank rate, and a forced reduction of the interest rate paid by home owners.
Interesting reads:
http://en.wikipedia.org/wiki/Fractional_reserve_banking
http://en.wikipedia.org/wiki/Credit_default_swap
There is another animal that hasn't received much news: credit default swaps. This instrument is a form of insurance against default. The problem is that it is not backed by anything, and the notional amount of outstanding CDS is several times larger than all the money ever produced. This means that the government needs to do its darnedest to make sure those CDS contracts never come due, because if they do, all financial institutions will file for bankruptcy, governments will default, end of the world, etc....
So, what is the government to do? Money must be made cheaper, to a point where it is basically free. This will allow the banks to start giving out loans again, which would spur the market for re-financing, which should save some borrowers. At the same time, the government will probably start in with regulation. We will see a period of slow down - recession - in which the disaster of the day will become a distant memory, and we will start up the next bubble: maybe energy, maybe the housing sector again in a lesser form, although this will probably be regulated to the gills. My guess: energy or commodities. But the bubble won't start until people regain confidence, which will take a few years.
I repeat, the Fed must make money cheap. After a recovery, they will attempt to make money more expensive again to stave off another bubble, but they will tread very lightly to avoid any more panic. This means that rates will stay cheap, or will start to go up only very gradually over a long interval.
Of course, there is another animal in this picture: US treasury bonds backed by our taxes. I don't fully understand this animal and its relationship to the money supply, but hopefully, in the next few blog entries...
Tuesday, October 07, 2008
Release of Microsoft Analysis Services 2005 Automation SDK
http://code.google.com/p/mssas-automation/
The library allows a Java developer to automate the creation and modification of an MSSAS 2005 cube. The design codifies most of the XMLA 1.1 specification into Java POJOs via the JiBX binding framework. On top of this core library, it then becomes trivial to codify specific design patterns or utilities to automate or speed up the creation and modification of a cube.
Thursday, October 02, 2008
How not to be a turkey - a dead turkey!
The idea is to build a little web app that will scan the common news sources nightly, and compile a score for different topics based on how negatively or positively they are described. For example, regarding the economy, the system should pick up speeches from the Fed, congressional discussions, etc... The idea behind all this is from the Black Swan book. The theory goes that the night before Thanksgiving, the turkey has the highest confidence in the goodness of humans.
To achieve this, I will need an NLP mood analyzer, or in other words, Sentiment Analysis. Some open source tools to accomplish this are:
NLP Libraries:
- http://www.opencalais.com/ (Reuters Web Service)
- RapidMiner
- JavaNLP
- LingPipe
- Jane16
- http://garraf.epsevg.upc.es/freeling
Knowledge Understanding
- http://commons.media.mit.edu/en/
- http://www.opencyc.org/
- http://wordnet.princeton.edu/
- http://framenet.icsi.berkeley.edu/
News Sources
- http://cookbook.daylife.com/
- http://thomas.loc.gov/home/c110query.html (Government Enrolled Bills)
Ekman's research on universal facial expressions
[happy, sad, anger, fear, disgust, surprise]
Frustration – Repetition of low-magnitude anger
Relief – Fear followed by happy
Horror – Sudden high-magnitude fear
Contentment – Persistent low-level happy
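The nightly scoring idea can be sketched in plain Java before any NLP library enters the picture. This is a minimal sketch, not the planned system: the lexicon words and weights below are invented for illustration, and a real implementation would lean on the sentiment analysis tools listed above.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Minimal sketch of the nightly mood scoring idea: count lexicon hits in a
 * piece of text and sum their weights. The lexicon words and weights are
 * invented for illustration only.
 */
public class NaiveMoodScorer {
    private final Map<String, Integer> lexicon = new HashMap<String, Integer>();

    public NaiveMoodScorer() {
        // hypothetical seed lexicon: negative words score below zero
        lexicon.put("crisis", -3);
        lexicon.put("recession", -3);
        lexicon.put("bailout", -2);
        lexicon.put("recovery", 2);
        lexicon.put("confidence", 1);
    }

    /** Sum the scores of all known words in the text (case-insensitive). */
    public int score(String text) {
        int total = 0;
        for (String token : text.toLowerCase().split("\\W+")) {
            Integer weight = lexicon.get(token);
            if (weight != null) {
                total += weight;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        NaiveMoodScorer scorer = new NaiveMoodScorer();
        System.out.println(scorer.score("Fed signals recovery, confidence returns")); // 3
        System.out.println(scorer.score("Recession deepens as crisis spreads"));      // -6
    }
}
```

Run nightly over the news feeds, a score series like this is exactly what the turkey never had.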
Tuesday, September 23, 2008
Informatica PowerCenter Automation SDK 1.0.0 Released!
http://code.google.com/p/informatica-powercenter-automation
The library allows the automation of repetitive patterns when creating or changing Informatica PowerCenter Mappings.
With the proliferation of development abstraction platforms like Informatica PowerCenter, Tibco BusinessWorks, BPM suites, rule engines, etc., it becomes more and more feasible to build software that automates the abstraction tools themselves. These tools provide a meta definition for building certain services; in the case of Informatica, those meta definitions are geared towards ETL tasks. Automation software can then manipulate the meta pieces to generate those services automatically. In other words, you end up with a system that knows how to manage the meta service pieces.
Monday, September 22, 2008
Dynamic Rule Security is Released!!!
http://code.google.com/p/dynamic-rule-security/
I think this release may very well revolutionize the way application level security is handled. Although the first release is somewhat simplistic in how it manages the rules, I believe it will serve the large majority of systems out there.
The next major release will focus on expanding the rule management, adding tag libraries, and adding support for direct instantiation.
Thursday, July 24, 2008
Let me count the ways... I HATE Spring Security ACL
Now, let me count the ways in which I hate the Spring Security Acl implementation. In any other setting, I would have written this off as some poor wanking by some poor wanker, but unfortunately, in my prior post, I vowed to add property based security via a rule engine as an add-on for Spring Security. What I failed to realize at that writing is that Spring Security is split into 2 sections. The core security section, which has things like app server plugins, role and principal management, etc., seems decent enough. Perhaps a bit configuration heavy, but hey, that's Spring for ya. Now, this other section, the Acl section, is a complete and utter fuckup. The irony is that this is a re-write of an even worse implementation.
Now, listen you Spring theists:
Why create an ObjectIdentity interface that wraps a serializable identifier, and then implement an ObjectIdentityImpl, only to cast the serializable identifier to a Long in both the BasicLookupStrategy and the JdbcMutableAclService? As a side note, keep with the fucking naming convention: if you're going to prefix all the db accessors with Jdbc, then why name the jdbc lookup class BasicLookupStrategy? And oh yeah, what's the point of the LookupStrategy pattern considering that you already have a lookup strategy called MutableAclService, which has a Jdbc accessor called JdbcMutableAclService?
So, even if I extend ObjectIdentity and add support for property management, the implementation will go to hell if someone decides to use any of the persistence classes. Oh, almost forgot: for all the bloody abstraction and interfaces, the BasicLookupStrategy accepts an ObjectIdentity, yet performs a direct instantiation of ObjectIdentityImpl, with a Long as the serializable id. So, there goes the ability to extend the class, or to use anything but a Long as an identifier. So, what's the point of creating the ObjectIdentity interface? And what's the point of making the identifier serializable?
Ah, there is support for an Acl tree via parent/child Acls. I could create a parent Acl to represent the object, and then subsequent children for each of the properties. Ah, but the damn cast of the ObjectIdentity identifier to a Long kills that as well.
What would be quite nice is to add property level support directly to the Access Control Entry. Of course, there is an interface, and an implementation, and supporting classes that require the implementation, making for yet another useless interface. What's needed here is a factory pattern.
I am sorry I am angry. I've been reading Buddhist books lately, and they teach you to channel your anger, understand its source, and manage your emotions, so as to balance the negative and positive of karma. The problem is that all this is going to force me to break from the Acl implementation in Spring, which would mean yet another Acl implementation with a subset of the features. Spring, for all its problems, seems to provide a large feature set, and if at all possible, I prefer to enhance rather than replace.
Ok, back to Spring Security Acl bashing. The Acl interface and the AclImpl class are capable of encompassing the entire Sid structure. So, if I have 10k users, then my poor little Acl class will start to look like an ACL cache rather than the simple POJO it was meant to be. What the ACL object should be is a representation of an object, which has properties, and is an instance of security for a single Sid. I highly disagree that a single Acl needs to support multiple Sids. Granted, your approach is more flexible, but flexible to the point that there will be a single ACL class in the system with a large array of all permissions. An Acl is not a cache; it's a simple wrapper around what a single user/principal/granted authority has access to for the given object.
The ACL Entry is actually supposed to be a wrapper around a property and a permission mask. That's the whole point of having a permission mask. A mask is an int, which means you have a single integer (all those bits) representing all the possible access control rights for a single property of a single object. The beauty of adding property support is that you're no longer limited to 31 possible permissions, but effectively unlimited, with a limit of 31 per property of an object. This means you can conceivably have different rights per object attribute. And we all know that some objects have a lot more than 31 attributes. So, if you just wrapped the permission mask in an ACL Entry class, then what was the point of the ACL Entry class? You could simply collapse the whole structure into the ACL class and be done with it.
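The mask arithmetic described above can be sketched in a few lines of plain Java. This is an illustrative sketch only; the permission names and class are mine, not Spring Security's:

```java
/**
 * Sketch of a per-property permission mask: a single int holds up to 31
 * permission bits for one property of one object. Permission names are
 * illustrative, not Spring Security's.
 */
public class PropertyPermissionMask {
    public static final int READ   = 1;      // bit 0
    public static final int WRITE  = 1 << 1; // bit 1
    public static final int DELETE = 1 << 2; // bit 2

    private int mask;

    /** Turn the given permission bits on. */
    public void grant(int permission) {
        mask |= permission;
    }

    /** Turn the given permission bits off. */
    public void revoke(int permission) {
        mask &= ~permission;
    }

    /** True if every bit in the given permission is set. */
    public boolean isGranted(int permission) {
        return (mask & permission) == permission;
    }

    public static void main(String[] args) {
        PropertyPermissionMask traderName = new PropertyPermissionMask();
        traderName.grant(READ | WRITE);
        System.out.println(traderName.isGranted(READ));  // true
        traderName.revoke(WRITE);
        System.out.println(traderName.isGranted(WRITE)); // false
    }
}
```

One such mask per property per object is all the ACL Entry really needs to carry.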
Deep breaths. I was reading another blog, which was talking about another blog that mentioned that "Every time you use Acegi... a fairy dies." My daughter loves fairies.
Saturday, July 19, 2008
Drools + Spring Security + Annotations + AOP= ?
http://code.google.com/p/dynamic-rule-security/
No code has been released yet, but I am hoping to have an alpha version out soon. The project integrates Drools Rule Engine with Spring Security to provide dynamic, rule based, field level ACL security to a system.
Once complete, the system administrator will be able to create business rules to restrict fields, objects, pages, content, whatever, based on dynamic rules. But that's not all. The current crop of security frameworks requires the security logic to be embedded in the code, which becomes quite brittle and complex when security rules get very granular. For example, imagine having to implement a requirement that says: when a trade belongs to account "abc", hide the trade from anyone not in group "abc-allowed". No problem, you say. You create the security group "abc-allowed". Now you have some choices regarding implementation: you can integrate the rule at the data retrieval layer, at the presentation tier, or in the middle. Either way, somewhere in your system, you'll have a chunk of code like this: if ( trade.account == "abc" && !isUserInRole("abc-allowed") ) then hide.
That was easy. Probably only took 10 minutes to write, 10 minutes to test, and a few days to get it deployed to production. No problem.
A few days go by, and the user comes back and says, I need to expand that security. It seems that group efg can actually see abc account trades but only when the trading amount is less than $50m. Ok, you say. A bit messy, but do-able. So, you create security group "efg-allowed", and change your prior rule to say:
if ( trade.account == "abc" && !isUserInRole("abc-allowed") && !( trade.amount < 50 && isUserInRole("efg-allowed") ) ) then hide.
Probably only took 10 minutes to code, and another 10 minutes to test, but damn there is QA, UAT, production release. A few days later, you finally release the new feature.
Aren't you glad that's over. A few more days go by, and the user says, wait, he forgot that the efg group can't change the trader name on the trade, and can't see the counterparty, but should be able to see and change everything else. Oh, one more thing, they can change the trader name if the trader is "Jack", because trader Jack's accounts are actually managed by the efg group even if the account belongs to the "abc" group.
Crap you say, that's going to be a bit of work. You may need to change the presentation tier, to hide the fields in some cases, but not others. And boy, how much does it suck to hard code the trader's name somewhere.
Anyways, you get the point. Security Rules may get very complex and very specific to the data they interact with and the context of the request. This means that the rule needs to be aware of the data, and who is requesting it. The rule is then capable of setting the security ACL. The presentation tier then only needs to worry about following the ACL rather than actually dealing with the security rules themselves. Not only that, but security rules will be in a single place rather than being sprinkled throughout the system. You can also change them on the fly allowing you to react very quickly to additional security requests.
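As a sketch of where this is headed, the running trade example might look like this once externalized into a Drools rule. The Trade and User facts and their helper methods are hypothetical, not part of any release:

```
rule "hide abc trades"
when
    $trade : Trade( account == "abc" )
    $user  : User( )
    eval( !$user.isInRole("abc-allowed")
          && !( $trade.getAmount() < 50 && $user.isInRole("efg-allowed") ) )
then
    $trade.setHidden( true );
end
```

The rule lives in one place, knows about both the data and the requester, and can be changed without a release cycle.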
How to retrieve the fields used in a Drools Rule (DRL)
One way to do this is to assume that you're dealing with standard POJOs. This means that each variable is private and has associated getVar and setVar methods. Drools currently supports its own language (DRL), Java (backed by the Janino compiler), and MVEL. I will present how to retrieve the fields from DRL and Java. I am sure the same principles can be applied to MVEL.
First, your pojo:
package com.orangemile.ruleengine;
public class Trade {
private String traderName;
private double amount;
private String currency;
public String getTraderName() {
return traderName;
}
public void setTraderName(String traderName) {
this.traderName = traderName;
}
public double getAmount() {
return amount;
}
public void setAmount(double amount) {
this.amount = amount;
}
public String getCurrency() {
return currency;
}
public void setCurrency(String currency) {
this.currency = currency;
}
}
Now the magic:
package com.orangemile.ruleengine;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import org.codehaus.janino.Java;
import org.codehaus.janino.Parser;
import org.codehaus.janino.Scanner;
import org.codehaus.janino.Java.MethodInvocation;
import org.codehaus.janino.util.Traverser;
import org.drools.compiler.DrlParser;
import org.drools.lang.DrlDumper;
import org.drools.lang.descr.EvalDescr;
import org.drools.lang.descr.FieldConstraintDescr;
import org.drools.lang.descr.ImportDescr;
import org.drools.lang.descr.PackageDescr;
import org.drools.lang.descr.PatternDescr;
import org.drools.lang.descr.RuleDescr;
/**
* @author OrangeMile, Inc
*/
public class DRLFieldExtractor extends DrlDumper {
private PackageDescr packageDescr;
private Map<String, Entry> variableNameToEntryMap = new HashMap<String, Entry>();
private List<Entry> entries = new ArrayList<Entry>();
private Entry currentEntry;
public Collection<Entry> getEntries() {
return entries;
}
/**
* Main Entry point - to retrieve the fields call getEntries()
*/
public String dump( String str ) {
try {
DrlParser parser = new DrlParser();
PackageDescr packageDescr = parser.parse(new StringReader(str));
String ruleText = dump( packageDescr );
return ruleText;
} catch ( Exception e ){
throw new RuntimeException(e);
}
}
/**
* Main Entry point - to retrieve the fields call getEntries()
*/
@Override
public synchronized String dump(PackageDescr packageDescr) {
this.packageDescr = packageDescr;
String ruleText = super.dump(packageDescr);
List<RuleDescr> rules = (List<RuleDescr>) packageDescr.getRules();
for ( RuleDescr rule : rules ) {
evalJava( (String) rule.getConsequence() );
}
return ruleText;
}
/**
* Parses the eval statement
*/
@Override
public void visitEvalDescr(EvalDescr descr) {
evalJava( (String) descr.getContent() );
super.visitEvalDescr(descr);
}
/**
* Retrieves the variable bindings from DRL
*/
@Override
public void visitPatternDescr(PatternDescr descr) {
currentEntry = new Entry();
currentEntry.classType = descr.getObjectType();
currentEntry.variableName = descr.getIdentifier();
variableNameToEntryMap.put(currentEntry.variableName, currentEntry);
entries.add( currentEntry );
super.visitPatternDescr(descr);
}
/**
* Retrieves the field names used in the DRL
*/
@Override
public void visitFieldConstraintDescr(FieldConstraintDescr descr) {
currentEntry.fields.add( descr.getFieldName() );
super.visitFieldConstraintDescr(descr);
}
/**
* Parses out the fields from a chunk of java code
* @param code
*/
@SuppressWarnings("unchecked")
private void evalJava(String code) {
try {
StringBuilder java = new StringBuilder();
List<ImportDescr> imports = (List<ImportDescr>) packageDescr.getImports();
for ( ImportDescr i : imports ) {
java.append(" import ").append( i.getTarget() ).append("; ");
}
java.append("public class Test { ");
java.append(" static {");
for ( Entry e : variableNameToEntryMap.values() ) {
java.append( e.classType ).append(" ").append( e.variableName ).append(" = null; ");
}
java.append(code).append("; } ");
java.append("}");
Traverser traverser = new Traverser() {
@Override
public void traverseMethodInvocation(MethodInvocation mi) {
if ((mi.arguments != null && mi.arguments.length > 0)
|| !mi.methodName.startsWith("get") || mi.optionalTarget == null) {
// not a zero-argument getter with a target - keep walking and bail out
super.traverseMethodInvocation(mi);
return;
}
Entry entry = variableNameToEntryMap.get(mi.optionalTarget.toString());
if ( entry != null ) {
String fieldName = mi.methodName.substring("get".length());
fieldName = Character.toLowerCase(fieldName.charAt(0)) + fieldName.substring(1);
entry.fields.add( fieldName );
}
super.traverseMethodInvocation(mi);
}
};
System.out.println( java );
StringReader reader = new StringReader(java.toString());
Parser parser = new Parser(new Scanner(null, reader));
Java.CompilationUnit cu = parser.parseCompilationUnit();
traverser.traverseCompilationUnit(cu);
} catch (Exception e) {
throw new RuntimeException(e);
}
}
/**
* Utility storage class
*/
public static class Entry {
public String variableName;
public String classType;
public HashSet<String> fields = new HashSet<String>();
public String toString() {
return "[variableName: " + variableName + ", classType: " + classType + ", fields: " + fields + "]";
}
}
}
And now, how to run it:
public static void main( String args [] ) {
String rule = "package com.orangemile.ruleengine;" +
" import com.orangemile.ruleengine.*; " +
" rule \"test rule\" " +
" when " +
" trade : Trade( amount > 5 ) " +
" then " +
" System.out.println( trade.getTraderName() ); " +
" end ";
DRLFieldExtractor e = new DRLFieldExtractor();
e.dump(rule);
System.out.println( e.getEntries() );
}
The basic principle is that the code relies on the AST produced by DRL and Janino. In the Janino walk, the code only looks for method calls that have a target, start with "get", and take no arguments. In the case of DRL, the API is helpful enough to provide callbacks when a variable declaration or field is hit, making the code trivial.
That's it. Hope this helps someone.
Wednesday, July 16, 2008
Drools - Fact Template Example
But, there is another way, which is a bit over-simplistic, but maybe useful for some of you out there. Drools has introduced support for fact templates, a concept borrowed from CLIPS. A fact template is basically the definition of a flat class:
template "Trade"
String tradeId
Double amount
String cusip
String traderName
end
This template can then be naturally used in the when part of a rule:
rule "test rule"
when
$trade : Trade( tradeId == "5" )
then
System.out.println( $trade.getFieldValue("traderName") );
end
But, there is a cleaner way to do all of this using the MVEL dialect introduced in Drools 4.0.
You can code your own Fact implementation that's backed by a Map.
package app.java.com.orangemile.ruleengine;
import java.util.HashMap;
import java.util.concurrent.atomic.AtomicLong;
import org.drools.facttemplates.Fact;
import org.drools.facttemplates.FactTemplate;
import org.drools.facttemplates.FieldTemplate;
/**
* @author OrangeMile, Inc
*/
public class HashMapFactImpl extends HashMap<String, Object> implements Fact {
private static AtomicLong staticFactId = new AtomicLong();
private FactTemplate factTemplate;
private long factId;
public HashMapFactImpl( FactTemplate factTemplate ) {
factId = staticFactId.addAndGet(1);
this.factTemplate = factTemplate;
}
@Override
public long getFactId() {
return factId;
}
@Override
public FactTemplate getFactTemplate() {
return factTemplate;
}
@Override
public Object getFieldValue(int index) {
FieldTemplate field = factTemplate.getFieldTemplate(index);
return get(field.getName());
}
@Override
public Object getFieldValue(String key) {
return get(key);
}
@Override
public void setFieldValue(int index, Object value) {
FieldTemplate field = factTemplate.getFieldTemplate(index);
put( field.getName(), value );
}
@Override
public void setFieldValue(String key, Object value) {
put(key, value);
}
}
To use this class, you would then do this:
String rule = "package com.orangemile.ruleengine.test;" +
" template \"Trade\" " +
" String traderName " +
" int id " +
" end " +
" rule \"test rule\" " +
" dialect \"mvel\" " +
" when " +
" $trade : Trade( id == 5 ) " +
" then " +
" System.out.println( $trade.traderName ); " +
" end ";
MVELDialectConfiguration dialect = new MVELDialectConfiguration();
PackageBuilderConfiguration conf = dialect.getPackageBuilderConfiguration();
PackageBuilder builder = new PackageBuilder(conf);
builder.addPackageFromDrl(new StringReader(rule));
org.drools.rule.Package pkg = builder.getPackage();
RuleBase ruleBase = RuleBaseFactory.newRuleBase();
ruleBase.addPackage(pkg);
HashMapFactImpl trade = new HashMapFactImpl(pkg.getFactTemplate("Trade"));
trade.put("traderName", "Bob Dole");
trade.put("id", 5);
StatefulSession session = ruleBase.newStatefulSession();
session.insert(trade);
session.fireAllRules();
session.dispose();
Notice that in the then clause, to output the traderName, the syntax is:
$trade.traderName rather than the cumbersome:
$trade.getFieldValue("traderName")
What makes this possible is that the Fact is backed by a Map, and the dialect is MVEL, which supports this type of property access when the map keys are strings.
The interesting thing about using the fact template is that it makes it easy to perform lazy variable resolution. You may extend the above HashMapFactImpl to add field resolvers that contain specific logic to retrieve field values. To do this with an object tree, especially with dynamic objects, would require either intercepting the call that retrieves the field via AOP and injecting the appropriate lazy value, or setting the value to a dynamic proxy, which then performs the lazy variable retrieval once triggered. In either case, this simple fact template solution may be all that you need.
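A standalone sketch of the lazy-resolution idea follows. It deliberately omits the Drools Fact interface so it runs on its own, and the resolver wiring (registerResolver, Callable) is my assumption for illustration, not part of any release:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;

/**
 * Sketch of a map-backed fact with lazy field resolvers: if a field has no
 * value yet but a resolver is registered, the resolver computes the value on
 * first access and the result is cached. Omits the Drools Fact interface.
 */
public class LazyFact {
    private final Map<String, Object> values = new HashMap<String, Object>();
    private final Map<String, Callable<Object>> resolvers = new HashMap<String, Callable<Object>>();

    public void put(String field, Object value) {
        values.put(field, value);
    }

    /** Register logic that produces the field value on first access. */
    public void registerResolver(String field, Callable<Object> resolver) {
        resolvers.put(field, resolver);
    }

    public Object getFieldValue(String field) {
        if (!values.containsKey(field) && resolvers.containsKey(field)) {
            try {
                values.put(field, resolvers.get(field).call()); // resolve once, then cache
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
        return values.get(field);
    }

    public static void main(String[] args) {
        LazyFact trade = new LazyFact();
        trade.put("traderName", "Bob Dole");
        trade.registerResolver("amount", new Callable<Object>() {
            public Object call() {
                return Double.valueOf(42.0); // stand-in for a DB or service lookup
            }
        });
        System.out.println(trade.getFieldValue("traderName")); // Bob Dole
        System.out.println(trade.getFieldValue("amount"));     // 42.0
    }
}
```

The same shape fits into HashMapFactImpl by overriding getFieldValue(String) to consult the resolver map before falling back to the plain get.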