Connection pool leak via Apache Async Http Client? #923

SwonVIP · 2024-12-23T12:38:36Z

Java API client version

8.16.1

Java version

21

Elasticsearch Version

8.16.1

Problem description

Hello,

we are suffering from a hard-to-debug resource leak in our components that use the Elasticsearch client, and we are currently investigating it. The symptoms appear on deployments after several days or weeks and cause the affected pod to get stuck: requests that reach the pod hang, and no new requests are routed to it until it is restarted.
Unfortunately, the issue is hard to reproduce locally. From a heap dump of an affected instance I was able to retrieve the following information, which pointed us toward the Apache Async HTTP Client used by the low-level Elasticsearch REST client.

One instance of org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager loaded by org.springframework.boot.loader.launch.LaunchedClassLoader @ 0xa515d078 occupies 131,150,024 (67.48%) bytes. The memory is accumulated in one instance of java.util.LinkedList, loaded by <system class loader>, which occupies 130,898,736 (67.36%) bytes.

Thread java.lang.Thread @ 0xa6ef74e0 elasticsearch-rest-client-0-thread-1 has a local variable or reference to org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager @ 0xa6ef7730 which is on the shortest path to java.util.LinkedList @ 0xa703f370. The thread java.lang.Thread @ 0xa6ef74e0 elasticsearch-rest-client-0-thread-1 keeps local variables with total size 1,928 (0.00%) bytes.

Significant stack frames and local variables

org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(Lorg/apache/http/nio/reactor/IOEventDispatch;)V (PoolingNHttpClientConnectionManager.java:221)
org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager @ 0xa6ef7730 retains 131,150,024 (67.48%) bytes
The stacktrace of this Thread is available. See stacktrace. See stacktrace with involved local variables.

Keywords

org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager
org.springframework.boot.loader.launch.LaunchedClassLoader
java.util.LinkedList
org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(Lorg/apache/http/nio/reactor/IOEventDispatch;)V
PoolingNHttpClientConnectionManager.java:221

elasticsearch-rest-client-0-thread-1
  at sun.nio.ch.EPoll.wait(IJII)I (EPoll.java(Native Method))
  at sun.nio.ch.EPollSelectorImpl.doSelect(Ljava/util/function/Consumer;J)I (EPollSelectorImpl.java:121)
  at sun.nio.ch.SelectorImpl.lockAndDoSelect(Ljava/util/function/Consumer;J)I (SelectorImpl.java:130)
  at sun.nio.ch.SelectorImpl.select(J)I (SelectorImpl.java:142)
  at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(Lorg/apache/http/nio/reactor/IOEventDispatch;)V (AbstractMultiworkerIOReactor.java:343)
  at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(Lorg/apache/http/nio/reactor/IOEventDispatch;)V (PoolingNHttpClientConnectionManager.java:221)
  at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run()V (CloseableHttpAsyncClientBase.java:64)
  at java.lang.Thread.runWith(Ljava/lang/Object;Ljava/lang/Runnable;)V (Thread.java:1596)
  at java.lang.Thread.run()V (Thread.java:1583)


Class Name | Shallow Heap (bytes) | Retained Heap (bytes)
-- | -- | --
java.util.LinkedList @ 0xa703f370 | 32 | 130,898,736
└─ leasingRequests org.apache.http.impl.nio.conn.CPool @ 0xa703e868 | 88 | 131,084,168
└─ pool org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager @ 0xa6ef7730 | 32 | 131,150,024
+ <Java Local> java.lang.Thread @ 0xa6ef74e0 elasticsearch-rest-client-0-thread-1 Thread | 104 | 1,928
+ val$connmgr org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1 @ 0xa6ef77c0 | 32 | 32
+ connmgr, connmgr org.apache.http.impl.nio.client.InternalHttpAsyncClient @ 0xa618fb18 | 72 | 112

We are using the Elasticsearch client in a conventional way from a reactive context like this:

	@Override
	@Retry(name = CB_ELASTIC_CLIENT)
	@CircuitBreaker(name = CB_ELASTIC_CLIENT, fallbackMethod = "searchFallback")
	public <T extends IndexedObject> Mono<SearchResponse<T>> search(SearchRequest searchRequest, String clientId,
		Class<T> clazz) {
		return Mono.fromFuture(() -> elasticsearchClient
			.withTransportOptions(t -> t.addHeader("X-Opaque-Id", clientId != null ? clientId : "unknown"))
			.search(searchRequest, clazz))
			.subscribeOn(Schedulers.boundedElastic())
			.doOnNext(x -> log.debug("Search response: {}", x));
	}

We are wondering whether you have observed similar issues in the past, or whether you have an idea what the source of the issue could be.
The issue also seems to have been present in versions prior to 8.16.1.

Thanks a lot!

Best Regards
Sven S.

Edit:
A workaround we have found so far is to set a short time-to-live (TTL) on the connections of the HTTP client itself.
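
The thread does not show how the TTL was set. One possible way, sketched here under assumptions, is to call `setConnectionTimeToLive` on the Apache `HttpAsyncClientBuilder` inside the REST client's HTTP client config callback. The class name `TtlRestClientFactory` and the 60-second value are hypothetical and not taken from the report.

```java
import java.util.concurrent.TimeUnit;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;

public class TtlRestClientFactory {
    // Hypothetical value for illustration; the report does not state the TTL used.
    private static final long CONNECTION_TTL_SECONDS = 60;

    public static RestClient create(String host, int port, String scheme) {
        return RestClient.builder(new HttpHost(host, port, scheme))
            .setHttpClientConfigCallback(httpClientBuilder -> httpClientBuilder
                // Sets the maximum time to live for persistent (pooled) connections.
                .setConnectionTimeToLive(CONNECTION_TTL_SECONDS, TimeUnit.SECONDS))
            .build();
    }
}
```

A TTL like this only limits how long pooled connections live, so it may mask the underlying leak rather than fix it.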


l-trotta · 2025-01-13T15:16:22Z

Hello! This is the first report of this kind, so I don't really know what the cause could be; it could well be an internal issue in the Apache Async HTTP Client. Could you share a snippet of the client initialization code, where the transport (the one that uses the Apache client) is created and passed to the Java client? Also, since I see that Spring is involved, I'd suggest double-checking whether there are other Apache dependencies that could interfere with the transitive one from the Java client.

SwonVIP · 2025-01-14T11:53:41Z

Hi, thanks for your reply! I need to check the transitive dependencies, that's a good point!

Below is a snippet of how the client is initialised. For now we explicitly set values for the connection timeout etc. to mitigate the persistent connections that stayed open in the connection pool. We have not observed this behaviour with other clients, only with connections made through the Elasticsearch Java client using the Apache Async HTTP Client.

	@Bean
	@Lazy
	@ConditionalOnMissingBean
	public ElasticsearchAsyncClient rawElasticsearchClient(ObjectMapper objectMapper) {
		var options = new RestClientOptions(RequestOptions.DEFAULT.toBuilder().build(), false);
		var transport = new RestClientTransport(restClient(), new JacksonJsonpMapper(objectMapper));
		return new ElasticsearchAsyncClient(transport, options);
	}

	@Bean
	public ReactiveElasticsearchClient elasticsearchClient(ElasticsearchAsyncClient rawElasticsearchClient) {
		return new DefaultReactiveElasticsearchClient(rawElasticsearchClient);
	}

	private RestClient restClient() {
		return RestClient.builder(
			new HttpHost(properties.host.getHost(), properties.host.getPort(), properties.host.getScheme()))
			// Both timeouts go in one callback: a second setRequestConfigCallback call replaces the first
			.setRequestConfigCallback(b -> b.setSocketTimeout(SOCKET_TIMEOUT).setConnectTimeout(CONNECT_TIMEOUT))
			.setHttpClientConfigCallback(httpClientBuilder -> {
				HttpAsyncClientBuilder httpAsyncClientBuilderWithAuth =
					httpClientBuilder.setDefaultCredentialsProvider(auth())
						.setMaxConnTotal(240)
						.setMaxConnPerRoute(20);

				return Optional.ofNullable(httpClientConfigCallback)
					.map(callback -> callback.customizeHttpClient(httpAsyncClientBuilderWithAuth))
					.orElse(httpAsyncClientBuilderWithAuth);
			})
			.build();
	}

Thanks a lot & have a nice day!
Sven S.

l-trotta · 2025-01-16T16:47:16Z

So, it looks like the leasingRequests LinkedList held by PoolingNHttpClientConnectionManager (through its CPool) keeps growing. I found an interesting old issue in the Elasticsearch server repository that looks very similar to what's happening here and was solved by removing the maxRetryTimeout parameter. Just to be sure, I'd again suggest checking the transitive dependencies to make sure that the transport version (the elasticsearch-rest-client dependency) matches the Java client version (8.16.1). We'll investigate this in more depth once we are sure that all versions match.
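
To make the suspected mechanism concrete, here is a minimal toy model. It is not the Apache `CPool` code, and the class `LeaseQueueModel` and its methods are hypothetical names for illustration. It models a pool with a fixed number of connections that parks lease requests in a queue when none is free. If leased connections are never returned and pending requests never expire, the queue can only grow, which would be consistent with the retained size of `leasingRequests` in the heap dump.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Toy model of a bounded connection pool with a queue of pending lease requests.
 * Hypothetical illustration only; this is not the Apache CPool implementation.
 */
public class LeaseQueueModel {
    private final int maxConnections;
    private int leased = 0;
    // Requests that could not get a connection immediately wait here.
    private final Deque<String> pendingLeases = new ArrayDeque<>();

    public LeaseQueueModel(int maxConnections) {
        this.maxConnections = maxConnections;
    }

    /** Tries to lease a connection; queues the request if the pool is exhausted. */
    public boolean lease(String requestId) {
        if (leased < maxConnections) {
            leased++;
            return true;
        }
        pendingLeases.addLast(requestId);
        return false;
    }

    /** Returns a connection; hands it to the oldest pending request, if any. */
    public void release() {
        if (leased == 0) {
            throw new IllegalStateException("no leased connection to release");
        }
        if (pendingLeases.pollFirst() == null) {
            leased--; // nobody is waiting: the connection goes back to the pool
        }
        // Otherwise the connection stays leased, now owned by the dequeued request.
    }

    public int pendingCount() {
        return pendingLeases.size();
    }

    public int leasedCount() {
        return leased;
    }

    public static void main(String[] args) {
        LeaseQueueModel pool = new LeaseQueueModel(20);
        for (int i = 0; i < 1_000; i++) {
            pool.lease("req-" + i); // connections are never released
        }
        // prints "leased=20 pending=980"
        System.out.println("leased=" + pool.leasedCount() + " pending=" + pool.pendingCount());
    }
}
```

In this model the queue drains only through `release()`, so anything that keeps connections from being returned, or keeps queued requests from timing out, shows up as unbounded queue growth.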

l-trotta added the Area: Transport label Jan 13, 2025
