Skip to main content

A case of Apache HTTP client becoming unusable due to java.lang.OutOfMemoryError

This post describes a problem where an instance of the Apache HTTP client becomes unusable if the JVM throws a java.lang.Error when executing the library code and possible ways of dealing with that situation.

Problem description

Apache HTTP libraries are widely used in the JVM ecosystem, some examples include the official AWS SDK, Spring Framework, Hadoop, and many more. They are often used as building blocks by other libraries and frameworks to handle low-level details of the HTTP protocol.

At one point in time, “my” team was responsible for a data aggregation pipeline that aggregated data from multiple tables present in a proprietary data store and saved those aggregations in S3. Systems downstream of that data store could be notified about saves to specific tables by subscribing their SQS queues to a notification stream produced by that data store. The data aggregation pipeline polled the SQS queue and when a new save notification arrived the data was loaded in memory and saved in S3. Due to the number of saves to that datastore, the application was deployed on multiple servers and it relied on SQS to distribute the work between the nodes/workers. We could assume that a single row in the data store won’t change after the initial write and it wasn’t an issue if the data aggregations in S3 would contain a “small” number of duplicated entries.

High level architecture diagram

High level architecture diagram

We’ve routinely observed that every couple of weeks some of the nodes would stop processing the messages correctly, either by not being able to consume messages from SQS or failing to fetch the data from the data store. In both cases, the logs were filled with the following exception suggesting that something wrong was happening with the Apache HTTP client used internally by both AWS SDK and a client library used to interact with the data store.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
java.lang.IllegalStateException: Connection pool shut down
    at org.apache.http.util.Asserts.check(Asserts.java:34) 
    at org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:191)
    at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.requestConnection(PoolingHttpClientConnectionManager.java:267) 
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:176) 
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185) 
    at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
    at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) 
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) 
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) 
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108) 
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56) 

Based on the exception message we’ve investigated a possibility that the application process was being shut down but with a very long timeout or that something was preventing the full shutdown altogether (for whatever reason). After inspecting application logs and looking at dates and times of deployments and system restarts we could not find any evidence of an external actor initiating the JVM shutdown process.

After ruling out a possibility of an actual shutdown happening we’ve started to read the source code of the Apache HTTP client to understand under what conditions the connection pool shuts down. It turned out that there exists logic inside the library that shuts down the pool if an instance of java.lang.Error is thrown when preparing the request, sending it, and receiving a response from the server. Here is an excerpt from the org.apache.http.impl.execchain.MainClientExec#execute method:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
@Override
public CloseableHttpResponse execute(
        final HttpRoute route,
        final HttpRequestWrapper request,
        final HttpClientContext context,
        final HttpExecutionAware execAware) throws IOException, HttpException {
    
    // Acquire a connection to use for executing the request
    final ConnectionHolder connHolder = new ConnectionHolder(this.log, this.connManager, managedConn);
    try {
        // Send the request and receive the response
    } catch (final ConnectionShutdownException ex) {
        final InterruptedIOException ioex = new InterruptedIOException(
                "Connection has been shut down");
        ioex.initCause(ex);
        throw ioex;
    } catch (final HttpException ex) {
        connHolder.abortConnection();
        throw ex;
    } catch (final IOException ex) {
        connHolder.abortConnection();
        if (proxyAuthState.isConnectionBased()) {
            proxyAuthState.reset();
        }
        if (targetAuthState.isConnectionBased()) {
            targetAuthState.reset();
        }
        throw ex;
    } catch (final RuntimeException ex) {
        connHolder.abortConnection();
        if (proxyAuthState.isConnectionBased()) {
            proxyAuthState.reset();
        }
        if (targetAuthState.isConnectionBased()) {
            targetAuthState.reset();
        }
        throw ex;
    } catch (final Error error) {
        connManager.shutdown();
        throw error;
    }
}

We can see that the code catches different types of exceptions but it also catches instances of java.lang.Error. Both java.lang.Exception and java.lang.Error classes are defined in the Java standard library and both of them extend from the top-level class for any things that can be thrown, java.lang.Throwable. The difference is that an error “indicates serious problems that a reasonable application should not try to catch” (based on the Java docs), things like java.lang.LinkageError or java.lang.StackOverflowError. Once we’ve figured out that the connection pool could be shut down if a subclass of java.lang.Error was thrown by the JVM during HTTP request processing we understood what was happening.

Java exception hierarchy

Java exception hierarchy

On a side note, that catch (final Error error) clause was introduced as part of the HTTPCLIENT-1924 in the 4.5.6 version of the library.

To lower the fleet cost we’ve made an operational decision in the past to allow situations where the rate of messages added to the queue was higher than the rate of messages consumed and processed by the fleet. We could do that because the aggregations in S3 could be behind the live data by hours or days and the volume of messages sent to the SQS queue from the data store depended on the time of day (more messages during the day, fewer messages during the night). Even if messages accumulated during the day the surplus was processed in the night.

Sometimes when the nodes were processing messages at their full capacity the memory pressure on the system was too high (e.g. too many threads trying to load too much data in memory) and java.lang.OutOfMemoryError was thrown by the JVM. On the high level, we did not have any problem with that, even if a processing thread was terminated and a message was not processed by this specific node we relied on an SQS mechanism called visibility timeout to process the same message on another consumer. In the worst-case scenario, the same message would be processed twice and there would be some duplication in the aggregated data set but, as mentioned before, that situation was acceptable. We did not expect though that some components in the system would stop working altogether if they encountered an error like that.

The solution

Once we understood the problem we decided to pass the -XX:+ExitOnOutOfMemoryError option to the JVM process. It instructs the JVM to exit on the first occurrence of an out-of-memory error. In our case, the JVM processes were managed by a supervisor that would restart them if they crash or exit by themselves. Even if a process was processing some SQS messages they would be picked up again after the visibility timeout period. That was a quick and easy fix to the problem in our case.

Some alternative approaches come to my mind if we could not kill the processes without breaking some of the business requirements:

  1. Implement a wrapper around the client libraries that use Apache HTTP under the hood. That wrapper would handle the java.lang.Errors and replace old clients with new clients.
  2. Scale the fleet, horizontally or vertically, to decrease the memory pressure on individual hosts.
  3. Implement a dynamic backpressure mechanism based on heap memory occupancy or recent GC activity. Hosts would dynamically decrease the speed of their processing to not end up with java.lang.OutOfMemoryError thrown during message processing.

Reproducing the problem

I’ve tried to prepare an example that would reproduce the problem described above using the client/server architecture. The full source code is available on my GitHub.

The server uses a not very well-known (from my experience) part of the Java standard library, com.sun.net.httpserver.HttpServer. Despite having the sun in the package name it’s a part of the standard library. The server listens on a local interface on the port 12345 and implements two operations, randomNumberOfHeaders and noAdditionalHeadersHandler. The first operation returns a String Response for every request and additionally sends an increasing number of HTTP headers in the response. I’ve settled on this strategy to allow the client JVM to warm up and then gradually construct bigger and bigger responses that will trigger a java.lang.OutOfMemoryError on the client-side when receiving an HTTP response. noAdditionalHeadersHandler just returns a string Response in the response with no extra headers.

Initially, I wanted to construct responses with bigger and bigger response body but it turns out that the Apache HTTP client does not load the response body to memory automatically, instead the client code needs to consume it in a form of java.io.InputStream. Consuming bigger and bigger response bodies generated java.lang.OutOfMemoryError but it originated not in the Apache HTTP library itself but in the code written by me and did not result in the connection pool shutdown.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
private static final class RandomNumberOfHeadersHandler implements HttpHandler {
    private static final AtomicInteger NUMBER_OF_HEADERS = new AtomicInteger(100);

    @Override
    public void handle(HttpExchange t) throws IOException {
        int number_of_headers = NUMBER_OF_HEADERS.addAndGet(10);

        if (number_of_headers % 100 == 0) {
            logger.info("NUMBER_OF_HEADERS is {}", number_of_headers);
        }
        for (int i = 0; i < number_of_headers; ++i) {
            t.getResponseHeaders().add("Header-" + i, String.valueOf(number_of_headers));
        }

        String response = "Response";
        t.sendResponseHeaders(200, response.length());
        OutputStream os = t.getResponseBody();
        os.write(response.getBytes());
        os.close();
    }
} 

The client spawns multiple threads, each thread uses Apache HTTP client to call the randomNumberOfHeaders operation in an “infinite” loop until java.lang.IllegalStateException is received which indicates that a thread caught java.lang.OutOfMemoryError and shut down the connection manager. After catching java.lang.IllegalStateException one last request is made to confirm that the client is now unusable. Any other exception will be logged and the infinite loop will continue.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
private static void workerThreadLogic() {
    // We will loop infinitely until we receive IllegalStateException indicating that the library has shut down
    // the connection pool
    while (true) {
        try {
            HttpGet request = new HttpGet(RANDOM_NUMBER_OF_HEADERS_ADDRESS);
            HttpResponse response = CLIENT.execute(request);
            // We need to consume the response in order to release the connection in the pool maintained
            // by the Apache client library
            EntityUtils.consume(response.getEntity());

            Thread.sleep(ThreadLocalRandom.current().nextInt(150));
        } catch (IllegalStateException e) {
            logger.info("Illegal state exception received, stopping the loop");
            break;
        } catch (Throwable t) {
            logger.info(
                "Exception other than IllegalStateException received, logging and retrying the connection",
                t);
        }
    }

    try {
        HttpGet request = new HttpGet(RANDOM_NUMBER_OF_HEADERS_ADDRESS);
        request.setConfig(
            RequestConfig.custom().setSocketTimeout(1000000).build()
        );
        CLIENT.execute(request);
    } catch (Exception e) {
        logger.error("Last request", e);
    }
} 

And here is an example output from the Client execution. I’ve explicitly specified -Xms9m -Xmx9m to set the heap size to (roughly) 9MB out of the box. As you can see it took around 5 seconds to get to a state where java.lang.OutOfMemoryError was thrown.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
adebski@adebski-VirtualBox:~/IdeaProjects/ApacheHTTPClientOOM$ java -Xms9m -Xmx9m -cp target/ApacheHTTPClientOOM-1.0-jar-with-dependencies.jar com.adebski.Client
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
22:31:47.306 [main] INFO  com.adebski.Client - PID 18173
22:31:52.254 [WorkerThread-0] INFO  com.adebski.Client - Illegal state exception received, stopping the loop
22:31:52.254 [WorkerThread-1] INFO  com.adebski.Client - Illegal state exception received, stopping the loop
22:31:52.255 [WorkerThread-3] INFO  org.apache.http.impl.execchain.RetryExec - I/O exception (org.apache.http.ConnectionClosedException) caught when processing request to {}->http://localhost:12345: Connection is closed
22:31:52.257 [WorkerThread-3] INFO  org.apache.http.impl.execchain.RetryExec - Retrying request to {}->http://localhost:12345
22:31:52.257 [WorkerThread-3] INFO  com.adebski.Client - Illegal state exception received, stopping the loop
22:31:52.253 [WorkerThread-2] INFO  com.adebski.Client - Exception other than IllegalStateException received, logging and retrying the connection
java.lang.OutOfMemoryError: Java heap space
...
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272) ~[ApacheHTTPClientOOM-1.0-jar-with-dependencies.jar:?]
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185) ~[ApacheHTTPClientOOM-1.0-jar-with-dependencies.jar:?]
    at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89) ~[ApacheHTTPClientOOM-1.0-jar-with-dependencies.jar:?]
    at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) ~[ApacheHTTPClientOOM-1.0-jar-with-dependencies.jar:?]
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) ~[ApacheHTTPClientOOM-1.0-jar-with-dependencies.jar:?]
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) ~[ApacheHTTPClientOOM-1.0-jar-with-dependencies.jar:?]
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108) ~[ApacheHTTPClientOOM-1.0-jar-with-dependencies.jar:?]
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56) ~[ApacheHTTPClientOOM-1.0-jar-with-dependencies.jar:?]
    at com.adebski.Client.workerThreadLogic(Client.java:70) [ApacheHTTPClientOOM-1.0-jar-with-dependencies.jar:?]
    at com.adebski.Client$$Lambda$102/0x000000084017cc40.run(Unknown Source) [ApacheHTTPClientOOM-1.0-jar-with-dependencies.jar:?]
    at java.lang.Thread.run(Thread.java:834) [?:?]
22:31:52.255 [WorkerThread-0] ERROR com.adebski.Client - Last request
java.lang.IllegalStateException: Connection pool shut down
...
22:31:52.260 [WorkerThread-3] ERROR com.adebski.Client - Last request
java.lang.IllegalStateException: Connection pool shut down
...
22:31:52.255 [WorkerThread-1] ERROR com.adebski.Client - Last request
java.lang.IllegalStateException: Connection pool shut down
...
22:31:52.262 [WorkerThread-2] INFO  com.adebski.Client - Illegal state exception received, stopping the loop
22:31:52.263 [WorkerThread-2] ERROR com.adebski.Client - Last request
java.lang.IllegalStateException: Connection pool shut down
    at org.apache.http.util.Asserts.check(Asserts.java:34) ~[ApacheHTTPClientOOM-1.0-jar-with-dependencies.jar:?]
    at org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:191) ~[ApacheHTTPClientOOM-1.0-jar-with-dependencies.jar:?]
    at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.requestConnection(PoolingHttpClientConnectionManager.java:267) ~[ApacheHTTPClientOOM-1.0-jar-with-dependencies.jar:?]
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:176) ~[ApacheHTTPClientOOM-1.0-jar-with-dependencies.jar:?]
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185) ~[ApacheHTTPClientOOM-1.0-jar-with-dependencies.jar:?]
    at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89) ~[ApacheHTTPClientOOM-1.0-jar-with-dependencies.jar:?]
    at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) ~[ApacheHTTPClientOOM-1.0-jar-with-dependencies.jar:?]
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) ~[ApacheHTTPClientOOM-1.0-jar-with-dependencies.jar:?]
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) ~[ApacheHTTPClientOOM-1.0-jar-with-dependencies.jar:?]
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108) ~[ApacheHTTPClientOOM-1.0-jar-with-dependencies.jar:?]
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56) ~[ApacheHTTPClientOOM-1.0-jar-with-dependencies.jar:?]
    at com.adebski.Client.workerThreadLogic(Client.java:91) ~[ApacheHTTPClientOOM-1.0-jar-with-dependencies.jar:?]
    at java.lang.Thread.run(Thread.java:834) [?:?]
22:31:52.264 [main] INFO  com.adebski.Client - Expected exception received!
comments powered by Disqus