Tuesday, October 22, 2013

... Java so fast it blows women's clothes off!

The question I am trying to answer is: what does it take to get Java to perform so fast it blows women's clothes off? The goal is not to understand what is fast, but simply to look at techniques which are faster than anything else available.

1. Never use java.util collections. They generate a tremendous amount of garbage and are slow.
2. Avoid garbage collection.
3. Reuse strings.
4. Async logging - i.e. don't spend all your time writing logs.
5. Don't use the heap - i.e. no heap = no GC.
6. Know what the heck your system is doing. Maybe it's the machine and not your crappy code?
7. Use a ring instead of a queue for thread message passing.
8. Maybe the problem isn't with your code at all - but with the watch you use?
9. Keep your methods short and sweet - helps with HotSpot.
10. Try some exotic features.
11. CAS / optimistic locking / lock-free.
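
To make the ring and lock-free items a bit more concrete, here is a minimal sketch of a single-producer/single-consumer ring buffer. The class name SpscRing and its methods are made up for illustration; this is not taken from any particular library, and real implementations also deal with cache-line padding and batching, which are left out here. It assumes exactly one thread calls offer and exactly one thread calls poll.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative single-producer/single-consumer ring buffer (hypothetical sketch). */
final class SpscRing<T> {
    private final Object[] slots;
    private final int mask;                           // capacity - 1; capacity is a power of two
    private final AtomicLong head = new AtomicLong(); // next sequence the consumer reads
    private final AtomicLong tail = new AtomicLong(); // next sequence the producer writes

    SpscRing(int capacityPowerOfTwo) {
        if (capacityPowerOfTwo <= 0 || Integer.bitCount(capacityPowerOfTwo) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        slots = new Object[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    /** Producer thread only. Returns false when the ring is full. */
    boolean offer(T value) {
        long t = tail.get();
        if (t - head.get() == slots.length) {
            return false; // full: the consumer has not caught up yet
        }
        slots[(int) (t & mask)] = value;
        tail.lazySet(t + 1); // ordered store: publishes the slot write to the consumer
        return true;
    }

    /** Consumer thread only. Returns null when the ring is empty. */
    @SuppressWarnings("unchecked")
    T poll() {
        long h = head.get();
        if (h == tail.get()) {
            return null; // empty
        }
        int i = (int) (h & mask);
        T value = (T) slots[i];
        slots[i] = null;     // drop the reference so the slot can be reused
        head.lazySet(h + 1); // ordered store: hands the slot back to the producer
        return value;
    }
}
```

The slots array is allocated once and reused, so passing a message allocates nothing beyond the message itself. With a single producer and a single consumer, no lock and no CAS loop is needed; multiple producers would need a compareAndSet on tail to claim a slot.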

DemiGods in this space - 










Monday, October 14, 2013

JVM Monitoring List

Yes - it's another list. In the last few days, I've had a perverse desire to make lists. This one is for tools that allow for JVM monitoring, both internal and external.


Tools:
Open Source:

Vendors:


Metrics:
Log Aggregation


Tutorials:

Distributed File System List

LP Solvers List

I am feeling like making lists. I guess it's the human condition to want to organize and categorize. For this post, I will focus on linear programming solver libraries and links:


Open Source:
Commercial:

Saturday, October 12, 2013

GC Parameters List

I am always on the lookout for common GC parameters. So I figured I'll compile a list of some of the common parameters and places where I found them:

Super Lists of All Parameters:
  1. http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html 
  2. http://stas-blogspot.blogspot.fi/2011/07/most-complete-list-of-xx-options-for.html
  3. http://blog.ragozin.info/2011/09/hotspot-jvm-garbage-collection-options.html
  4. http://reins.altervista.org/java/A_Collection_of_JVM_Options_MP.html 

Blog Roll
  • http://practicingtechie.wordpress.com/2013/06/15/java-vm-options/
    • -XX:+UseConcMarkSweepGC
    • -XX:+HeapDumpOnOutOfMemoryError
    • -XX:HeapDumpPath=$APP_HOME_DIR
    • -XX:OnOutOfMemoryError=
    • -XX:OnError=
    • -XX:+PrintGCDetails
    • -XX:+PrintGCTimeStamps
    • -Xloggc:$APP_HOME_DIR/gc.log
    • -XX:+UseGCLogFileRotation
    • -XX:GCLogFileSize=
    • -XX:NumberOfGCLogFiles=
  • http://forum.openspaces.org/thread.jspa?messageID=9277 
    • -Xms2g 
    • -Xmx2g 
    • -XX:+UseConcMarkSweepGC 
    • -XX:+CMSIncrementalMode 
    • -XX:+CMSIncrementalPacing 
    • -XX:CMSIncrementalDutyCycleMin=10 
    • -XX:CMSIncrementalDutyCycle=50 
    • -XX:ParallelGCThreads=8 
    • -XX:+UseParNewGC 
    • -Xmn150m 
    • -XX:MaxGCPauseMillis=2000 
    • -XX:GCTimeRatio=10 
    • -XX:+DisableExplicitGC 
  • http://java-is-the-new-c.blogspot.com/
    • -Xms11g 
    • -Xmx11g 
    • -verbose:gc 
    • -XX:-UseAdaptiveSizePolicy 
    • -XX:SurvivorRatio=12 
    • -XX:NewSize=100m 
    • -XX:MaxNewSize=100m 
    • -XX:MaxTenuringThreshold=2
  • http://blog.igorminar.com/2010/07/dgc-ii-jvm-tuning.html 
    • -XX:+UseConcMarkSweepGC
    • -XX:+UseParNewGC
    • -XX:CMSInitiatingOccupancyFraction=68
    • -XX:MaxTenuringThreshold=31
    • -XX:+CMSParallelRemarkEnabled
    • -XX:SurvivorRatio=6
    • -XX:TargetSurvivorRatio=90 
    • -XX:+AggressiveOpts
    • -XX:+DoEscapeAnalysis 
    • -Xloggc:/some/path/
    • -XX:+PrintGCDetails 
    • -XX:+PrintGCTimeStamps
    • -XX:+PrintGCDateStamps
    • -XX:+PrintTenuringDistribution
    • -XX:+HeapDumpOnOutOfMemoryError
    • -Xmn2818m 
  • http://blog.performize-it.com/2013/09/jvm-params-everyone-should-have-in.html
    • -Xms{#MB}m -Xmx{#MB}m
    • -XX:PermSize={#MB}m -XX:MaxPermSize={#MB}m
    • -XX:+HeapDumpOnOutOfMemoryError
    • -XX:+PrintFlagsFinal
    • -server
    • -XX:+PrintGCDetails
    • -XX:+PrintGCDateStamps 
    • -XX:+PrintTenuringDistribution 
    • -XX:+PrintGCApplicationStoppedTime 
    • -XX:+PrintGCApplicationConcurrentTime  
    • -XX:+UseGCLogFileRotation
    • -XX:NumberOfGCLogFiles={#files}
    • -XX:GCLogFileSize={#MB}M
    • -Xloggc:{some gc log file}.gc 
    • -Dcom.sun.management.jmxremote
    • -Dcom.sun.management.jmxremote.port={a port}
    • -Dcom.sun.management.jmxremote.authenticate=false
  • Big Bank System with a very large Heap (~80gb)
    • -d64
    • -server
    • -XX:+AggressiveOpts
    • -XX:+UseConcMarkSweepGC
    • -XX:+UseParNewGC
    • -XX:ParallelGCThreads=4
    • -XX:NewRatio=4
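
With this many flags floating around, it helps to check which ones a running JVM actually received and which collectors it ended up with. The snippet below is a rough sketch using the standard java.lang.management API. The class name GcFlagReport and the helper gcRelated are made up for this example, and the prefix filter is only a crude heuristic (it keeps every -XX: option, not just the GC ones).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: prints the heap/GC flags and collectors of the current JVM. */
final class GcFlagReport {

    /** Keeps arguments that look like heap, GC log, or -XX settings (crude prefix match). */
    static List<String> gcRelated(List<String> jvmArgs) {
        List<String> out = new ArrayList<>();
        for (String arg : jvmArgs) {
            if (arg.startsWith("-XX:")          // e.g. -XX:+UseConcMarkSweepGC
                    || arg.startsWith("-Xm")    // -Xms, -Xmx, -Xmn
                    || arg.startsWith("-Xlog")  // -Xloggc:...
                    || arg.startsWith("-verbose:gc")) {
                out.add(arg);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself, not to the main class.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Heap/GC flags: " + gcRelated(jvmArgs));

        // One bean per collector in use (young and old generation, typically).
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
    }
}
```

Running it with a couple of the flags above on the command line shows them echoed back along with the collector names the JVM picked, which is a quick sanity check before digging into the GC log.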

Tools
  1. https://github.com/foursquare/heapaudit
  2. https://github.com/twitter/jvmgcprof
  3. https://github.com/Netflix/gcviz
  4. https://github.com/chewiebug/GCViewer

Tutorials
  1. http://www.slideshare.net/aszegedi/everything-i-ever-learned-about-jvm-performance-tuning-twitter
  2. http://java.dzone.com/articles/how-tame-java-gc-pauses
  3. http://blog.ragozin.info/p/garbage-collection.html
  4. http://blog.ragozin.info/2011/10/techtalk-garbage-collection-in-java.html
  5. https://blogs.oracle.com/jonthecollector/entry/the_second_most_important_gc
  6. http://stackoverflow.com/questions/17009961/understanding-the-java-memory-model-and-garbage-collection
  7. http://mechanical-sympathy.blogspot.com/2013/07/java-garbage-collection-distilled.html
  8. http://blog.mgm-tp.com/2013/03/garbage-collection-tuning/
  9. http://www.youtube.com/watch?v=o6qx_zvpOyI  
  10. http://www.infoq.com/presentations/Virtualizing-Tuning-JVM
  11. https://blog.codecentric.de/en/2013/10/useful-jvm-flags-part-7-cms-collector/
  12. http://blog.headius.com/2009/01/my-favorite-hotspot-jvm-flags.html 
  13. http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html
  14. http://java.dzone.com/articles/java-garbage-collection-0 

Tuesday, July 02, 2013

Greenfield Enterprise Architecture for an IT BU

Let's say you had the power to have a completely greenfield development for an entire Enterprise Architecture for an IT BU - what would it look like?

Well, which IT BU you might ask? Does it really matter?

You need a big database. What kind of enterprise architecture can you have without a database?

Next, we need a bunch of ETL to load data from source systems, because there is always a source system to source from. We can either code it or buy it - doesn't matter which.

And there shall be database performance problems with the data-load.

Next, we need some engine code. Let's call it "The System". And it shall be in Java. And it shall have memory problems regardless of whether it's 32-bit or 64-bit or how much memory you allocate to it.

And next, there shall be a web UI, for which countless hours will be spent on things that will never be used. And there shall be an Excel download link on every page - because that's the only feature the users seem to care about.

And there shall be data quality problems, and performance problems, and scalability problems, and extensibility problems, and bulk upload requirements, and usability issues, and technical debt, and The Business will cry out for Salvation.

Surely, there must be another way.

I have spent the last 5 years as an Enterprise Architect at a Tier 1 Investment Bank designing systems that solve Big and Expensive problems. There are a few observations I would like to make for things that worked and things that didn't.

1. Users are smart, love Excel, and are better at coding than the H1B coder you got in the 2-for-1 sale from a body shop.
2. Build functions not systems - and expose your functions to your users. Allow the users to create a managed ecosystem around the functionality.
3. Make sure your functions work natively in Excel - think COM C# library.
4. Use elastic infrastructure like HDFS, and compute clouds, and data-grids, etc... - don't build dedicated systems, build services that have clean inputs and outputs and can run on scalable hardware like compute clouds.
5. The database and ETL have been my Achilles heel. The database schema is too rigid for the fast pace of change. On the other hand, the rigidity is required given how central the data is to everything. I have yet to really embrace the NoSQL movement, given the lack of ACID qualities. There are some promising developments in the form of Impala, which is closer to a pure MPP database running on commodity elastic hardware. Perhaps a middle ground can be found between the strict data model of a traditional database and the loose schema of a NoSQL database.

To be continued....




Wednesday, May 22, 2013

Cross Language



Java to C to C++/CLI to C# to Java via IKVM to HTTP REST to Java Servlet


and it all works!!!

Sunday, February 26, 2012

Platform Building

As architects, the most natural thing to do is to build systems. Building a system is a lot like building a skyscraper. There is always a foundation and a strong central core. Stuff around the core is a frame, and everything else is just eye candy - windows, walls, carpets, furniture, etc... The interesting distinction is that construction architects don't erect skyscrapers in the middle of the Sahara, while system architects do. In almost all cases, systems start in a vacuum and proceed from there. Once the system is built, only the most rudimentary of neurons is spent on deciding how to integrate the system into the existing ecosystem. In most cases, this means that the author slaps together a messaging bus or maybe some kind of REST, WS, or maybe something newfangled like Protocol Buffers, etc... The system is still a big giant monolith with some little shitty input and output.

Now what if we decided to build a platform instead of a system - what does that mean:

1. User interface split into a Bloomberg-like approach - I can jump to any screen directly by entering the right tag and parameters
2. Analytics & Calculators - segregated engines that can be called as libraries or services
3. Commons Core - all the technical generic plumbing libraries and services (calendar, security, etc...)
4. Modular controllers - business logic segregated by type
5. Physical services - compute farm, data storage, distributed locking schemes like ZooKeeper, etc...
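
As a rough illustration of the first two points, here is a toy sketch of a registry where every function is reachable by a short tag plus parameters, Bloomberg style, and where the same function could just as well be called directly as a library. Everything in it (FunctionRegistry, register, invoke, the string-only parameters) is hypothetical and only meant to show the shape of the idea.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical registry: each platform function is addressed by a short tag. */
final class FunctionRegistry {
    private final Map<String, Function<String[], String>> functions = new HashMap<>();

    /** Registers a function under a case-insensitive tag. */
    void register(String tag, Function<String[], String> fn) {
        functions.put(tag.toUpperCase(Locale.ROOT), fn);
    }

    /** Splits a command such as "ADD 2 3" into tag and parameters, then calls the function. */
    String invoke(String commandLine) {
        String[] parts = commandLine.trim().split("\\s+");
        Function<String[], String> fn = functions.get(parts[0].toUpperCase(Locale.ROOT));
        if (fn == null) {
            return "unknown tag: " + parts[0];
        }
        // Everything after the tag is passed through as parameters.
        String[] params = new String[parts.length - 1];
        System.arraycopy(parts, 1, params, 0, params.length);
        return fn.apply(params);
    }
}
```

A calculator registered as `registry.register("add", p -> String.valueOf(Integer.parseInt(p[0]) + Integer.parseInt(p[1])))` can then be reached with `registry.invoke("ADD 2 3")` from a UI command box, a service endpoint, or a spreadsheet add-in, without any of those callers knowing about each other.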

But you also need orchestration of all of this:
1. Workflows, schedulers, coordinators