Hints and tips for analyzing Java heap dumps
- Free: Eclipse's Memory Analyzer Tool (MAT) - https://www.eclipse.org/mat/
- Free: VisualVM - https://visualvm.github.io
- Great but licensed: YourKit Java Profiler - https://www.yourkit.com
There are a lot more!
- Examine threads to get an idea of what's going on. Your tool may identify the thread in which the OOM happened. This can be a lead to the cause, but not necessarily: that thread may merely be the tipping point, the one that happened to request memory when the heap was already nearly full. On Tomcat, a thread "catalina-exec-xyz" can be matched to a URL (see later).
- Examine the biggest objects: Dominators, Top Consumers, Leak Suspects that the tool provides.
Some known objects
PersistenceManager / Bundle Cache
Corresponds to the repository, also known as the bundle cache. Sizes up to 200MB are not unusual, depending on data size and settings, e.g. bundle cache size.
org.hippoecm.hst.repository.jackrabbit.HippoLocalItemStateManager objects correspond to JCR sessions. Sizes up to 20MB are not unusual. There should not be too many of them; some tens are normal.
org.hippoecm.hst.configuration.model.HstManagerImpl objects contain the HST model. They can be a couple of tens of MBs, depending on the HST configuration involved. Expect only 1 or 2 instances, for example one for live and one for preview.
org.hippoecm.hst.site.request.HstRequestContextImpl, one for an HST request, typically from a browser. Commonly tied to a 'catalina-exec' thread. Some valuable properties are contained in the baseURL object within this class:
If present, a CMS was deployed, else just a site/delivery tier.
Matching a big object to a URL
1) Match a big object to an HTTP request.
From an object, use the "Path to GC roots" function (or similar). It should show the root object, which is 'catalina-exec-xyz' if the object is part of an incoming request to Tomcat, typically a browser request.
2) Match HST request context to the same HTTP request.
Try to find an HstRequestContextImpl that is tied to the same 'catalina-exec-xyz' thread as found earlier. Its baseURL property shows what the request was. Search for HstRequestContextImpl objects using OQL if that is supported, see below.
Using Object Query Language (OQL)
If supported by your tool, use OQL to list objects, see https://en.wikipedia.org/wiki/Object_Query_Language
Examples: all HST Request Contexts, HST Requests, HST Managers:
select * from org.hippoecm.hst.site.request.HstRequestContextImpl
select * from org.hippoecm.hst.core.component.HstRequestImpl
select * from org.hippoecm.hst.configuration.model.HstManagerImpl
Some known out-of-memory cases
A thread with a lot of org.apache.lucene.search.* objects like BooleanQuery, BooleanScorer2, TermScorer: it can be an exploding query, meaning one that has too many hits, taking up too much memory while processing those.
For analysis, find the URL involved from the HstRequestContext to map it back to the page and components involved. If it's an HST query, the actual JCR XPath queries themselves can be found within the thread.
An exploding query can also be the result of a misconfigured Faceted Navigation, e.g. when no limits are set or when an inefficient faceted query is used.
Custom cache with node references
Custom cache implementations have been seen to cause OOM when they cache JCR nodes or properties, because these will then never be cleaned up. The simple rule here is: never put JCR node/property references into a cache.
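The rule above can be sketched as a cache that only ever holds plain values copied out of a node, keyed by the node path. This is a minimal illustration, not an existing product class: the name NodeValueCache, the choice of String values and the LRU size bound are all assumptions made for the example, and the caller is assumed to read the value from the node before caching it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical bounded LRU cache that holds plain values copied out of
 * JCR nodes, keyed by node path. It never stores Node or Property
 * objects, so cached entries hold no reference back to a JCR session.
 */
class NodeValueCache {
    private final int maxEntries;
    private final Map<String, String> values;

    NodeValueCache(int maxEntries) {
        this.maxEntries = maxEntries;
        // accessOrder=true keeps the least recently used entry first
        this.values = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                // Evict the eldest entry once the bound is exceeded
                return size() > NodeValueCache.this.maxEntries;
            }
        };
    }

    /** Stores a value that the caller already extracted from the node. */
    synchronized void put(String nodePath, String value) {
        values.put(nodePath, value);
    }

    /** Returns the cached value, or null if absent or evicted. */
    synchronized String get(String nodePath) {
        return values.get(nodePath);
    }

    synchronized int size() {
        return values.size();
    }
}
```

A caller would copy what it needs out of the node (for instance a title string) and pass only that copy to put(), so nothing in the cache keeps the node itself reachable.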
Too many JCR sessions
A lot of HippoLocalItemStateManager objects may mean an exhausted JCR session pool.
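One possible cause, stated here as an assumption rather than a diagnosis, is code that obtains a session and does not log out on every code path. The sketch below shows the usual try/finally shape. ClosableSession is a hypothetical stand-in for the logout() method of javax.jcr.Session so that the example compiles without the JCR API, and SessionTemplate is an illustrative name, not a product class.

```java
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical stand-in for the one javax.jcr.Session method used here. */
interface ClosableSession {
    void logout();
}

class SessionTemplate {
    /**
     * Obtains a session, runs the work, and always logs out,
     * also when the work throws an exception.
     */
    static <S extends ClosableSession, R> R withSession(Supplier<S> login, Function<S, R> work) {
        S session = login.get();
        try {
            return work.apply(session);
        } finally {
            session.logout();
        }
    }
}
```

With the real API, the same shape applies to a javax.jcr.Session directly: log in, do the work inside try, call logout() inside finally.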
Groovy script with large JCR session
A custom Groovy script can take up a lot of memory if it tries to save a lot of changes at once.
Pointers: the thread view shows a thread involving org.onehippo.repository.update.UpdaterExecutor, and the bundle cache objects are large.
URL rewriter with infinite redirects
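A common way to keep the pending change set small is to save in batches instead of once at the end. The following is a minimal sketch of that idea in plain Java; BatchedUpdater and its parameters are hypothetical names, and the save callback stands in for a call such as javax.jcr.Session.save(). It does not describe how the updater itself is implemented.

```java
import java.util.List;
import java.util.function.Consumer;

class BatchedUpdater {
    /**
     * Applies the change to every item and calls save after each full
     * batch, plus once at the end for a remaining partial batch.
     * Returns the number of times save was called.
     */
    static <T> int process(List<T> items, int batchSize,
                           Consumer<? super T> change, Runnable save) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        int pending = 0; // changes made since the last save
        int saves = 0;
        for (T item : items) {
            change.accept(item);
            pending++;
            if (pending == batchSize) {
                save.run();
                saves++;
                pending = 0;
            }
        }
        if (pending > 0) {
            // Flush the last, partially filled batch
            save.run();
            saves++;
        }
        return saves;
    }
}
```

With five items and a batch size of two, save runs three times, so at most two unsaved changes are held in memory at any moment.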
When wrongly configured, e.g. with a redirect loop, the URL rewriter can pollute the memory with large String objects containing for example: "/https:/www.mydomain.nl/https:/www.mydomain.nl/https:/www.mydomain.nl..."
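When inspecting suspicious String values from a dump, or when guarding a redirect target, the repetition itself is easy to test for. The sketch below counts how often a host name occurs in a string and flags anything above one. RedirectLoopCheck is a hypothetical helper written for this example and is not part of the URL rewriter.

```java
class RedirectLoopCheck {
    /** Counts non-overlapping occurrences of token in text. */
    static int countOccurrences(String text, String token) {
        if (token.isEmpty()) {
            return 0;
        }
        int count = 0;
        int from = 0;
        while ((from = text.indexOf(token, from)) >= 0) {
            count++;
            from += token.length();
        }
        return count;
    }

    /** A URL that contains the same host more than once is suspicious. */
    static boolean looksLikeRedirectLoop(String url, String host) {
        return countOccurrences(url, host) > 1;
    }
}
```

For the example string above, looksLikeRedirectLoop(value, "www.mydomain.nl") returns true, while a normal URL with a single host occurrence returns false.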
Large YAML (XML) download from console
If a user tries to download an oversized node structure from the console, this can cause OOM.
Pointers: the thread view shows a thread involving org.onehippo.cm.engine.ConfigurationServiceImpl#exportContent or similar.