Thursday, September 3, 2009

Attempting to Quickly Diagnose Production Memory Leaks with JRockit

I just wanted to mention my favourite, not well known, JRockit diagnostic command which I've used a few times recently to resolve tricky JVM heap related issues in production environments.

Imagine the situation: You've performance tested and tuned your server and application to the hilt, but on day 3 after going live, the production servers are exhibiting excessive heap memory usage issues. You've been pushed into the noisy server room, the pressure's on and the project sponsor is pacing up and down behind you. Until a sys-admin sorts out a non-headless machine for you to run JRockit's Mission Control on, you're stuck with the command line on the production server you've just SSH'd into. You're desperate to be able to just issue a command or two into the shell to quickly see what's happening in the JVM heap before the sys-admin bounces the server to prevent it from going pop with out of memory errors. Well there is such a command that might just save you....

Using JRockit's JRCMD command line tool you can issue a command called heap_diagnostics.

This command was new in JRockit R26.4 which was bundled with some of the later WLS 9.x versions and with all WLS versions from 10.0 onwards. It produces a diagnostic text dump of the state of the live JRockit heap, into sys-out of your command shell. The dump includes a summary of the current heap size and heap free, the layout of the native memory used by the JVM, the layout of the JVM's heap, and the state of soft/weak/phantom references and finalizers at the last garbage collection. Most crucially though, the dump also includes the number, percentage and size of every object type currently held in the heap, in order of the largest consumer first. If you suspect a memory leak and look at the names of the classes listed at the top of this list, you have a good chance of very quickly discovering which objects are being leaked. This may not directly point you at the root cause, but it will give you a much better idea of where the problem may lie.

As an example, to produce a diagnostic dump on a Unix based host environment, run the following commands to: (i) make the JRockit JRE and tools visible to your shell, (ii) determine the operating system's process ID of the WebLogic instance you want to probe, and (iii) generate the diagnostic dump into a text file:
$ . /opt/oracle/WLS103/wlserver_10.3/server/bin/setWLSEnv.sh

$ jps -l | grep weblogic
9681 weblogic.Server

$ jrcmd 9681 heap_diagnostics > heapdump.txt

The command takes a second or two to complete (depending on the current size of your heap and how much is in there). In the environment where I was chasing a memory leak, the dump text file was about 0.5 MB in size, which I've truncated and shown below.

In this specific example, by looking at the first few entries in the "Detailed Heap Statistics" section of the dump, showing largest heap consumers first, I was able to take an educated guess that the cause of my memory leak was due to some SNMP related classes in the package 'monfox' in WebLogic 10.0.1 rather than in my application code. In my case, having so many of these types of objects in the heap was not to be expected. However in other cases, top consuming objects may be legitimate and not be leaked objects, so this is only a potential indicator. Incidentally, if you need a patch for this specific SNMP issue, raise an Oracle Support case using Metalink refering to Bug#8185278.
======== BEGIN OF HEAPDIAGNOSTIC =========================

Total memory in system: 3434975232 bytes
Available physical memory in system: 2001797120 bytes
-Xmx (maximal heap size) is 536870912 bytes
Heapsize: 536870912 bytes
Free heap-memory: 470024072 bytes

mmStartCompaction = 0x8c00000, mmEndCompaction = 0xac00000

Memory layout:
00000000-00002000 ---p 00000000 00:00 0
08048000-08057000 r-xp 00000000 08:06 11421468   /opt/oracle/wls1001/jrockit_150_11/bin/java
08057000-08058000 rwxp 0000f000 08:06 11421468   /opt/oracle/wls1001/jrockit_150_11/bin/java
08060000-080e0000 rwxp 08060000 00:00 0
08100000-08901000 rwxp 08100000 00:00 0
.........truncated.........

--------- Detailed Heap Statistics: ---------
26.1% 12083k   108775   -541k [C
12.8% 5924k     5586    +18k  [B
 7.2% 3338k   142444    -52k  java/lang/String
 6.4% 2974k    54389     -4k  monfox/toolkit/snmp/util/OidTree$Node
 3.8% 1753k    16028     +0k  [Lmonfox/toolkit/snmp/util/OidTree$Node;
 2.6% 1218k    38978    -39k  monfox/toolkit/snmp/agent/ext/table/SnmpMibTableAdaptor$AdaptorMibNode
 1.2% 555k     7069     +0k   java/lang/Class
 0.5% 225k     3213     -2k   java/util/TreeMap$Entry
.........truncated.........
     46258kB total ---
--------- End of Detailed Heap Statistics ---

------------------- Printing heap ---------------------
"o"/"p" = 1k normal/pinned objects
"O"/"P" = 50k normal/pinned objects
"."/"/" = <1k br="" fragmentation="" normal="" objects="" pinned="">" "/"_" = 1k/50k free space
-------------------------------------------------------
OOOOOOOOOOOOOOOOooooooooooooooooooo.oooooooooooo..      0x8ccb5e8
..o.oooooooooooooooooo.O.............OOooooooooooo      0x8cfb770
ooooooooooo.Oooooooooooooooooooooooooooooooooooooo      0x8d13a28
.........truncated.........
__________________________________________________      0x28a36420
____________________________________                    0x28bfd178
         
-------------- Done printing heap ---------------------

--- Verbose reference objects statistics - old collection --------
456.7 MB free memory (of 512.0 MB) after last heap GC, finished 0.137 s ago.
Soft references: 326 (146 only soft reachable, 0 cleared this GC)
    java/lang/ref/SoftReference: 215 (140, 0)
           71 ( 32, 0) java/lang/reflect/Method
           45 ( 45, 0) [Ljava/lang/reflect/Method;
.........truncated.........
    Softly reachable referents not used for at least 228.349 s cleared.
Weak references: 15268 (0 cleared this GC)
    java/lang/ref/WeakReference: 12834 (0)
        11355 (  0)    java/lang/Class
          888 (  0)    com/sun/jmx/interceptor/DefaultMBeanServerInterceptor$ListenerWrapper
.........truncated.........
Final handles: 743 (0 pending finalization, 0 became pending this GC)
          244 (  0, 0) java/util/zip/Inflater
          228 (  0, 0) java/util/jar/JarFile
           48 (  0, 0) com/rsa/jsafe/JA_RSAPublicKey
.........truncated.........
Weak object handles: 42096 (0 cleared this GC)
        26308 (  0)    java/lang/String
        12262 (  0)    sun/misc/Launcher$AppClassLoader
.........truncated.........
Phantom references: 366 (330 only phantom reachable, 0 became phantom reachable this GC)
    com/bea/xbean/store/Locale$Ref: 330 (330, 0)
          330 (330, 0) com/bea/xbean/store/Cursor
    jrockit/vm/ObjectMonitor: 22 (0, 0)
            4 (  0, 0) weblogic/work/ExecuteThread
            3 (  0, 0) weblogic/kernel/ServerExecuteThread
.........truncated.........
--- End of reference objects statistics - old collection ---------

Dark matter: 7207024 bytes
Heap size is not locked

======== END OF HEAPDIAGNOSTIC ===========================


Song for today: Turn Around by Whiskeytown