Improve response message when a connection fails to respond within a timeout. #196

Open
tommairs opened this issue Jun 3, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@tommairs
Collaborator

tommairs commented Jun 3, 2024

While recently testing a deployment, I noticed that connections to a port that seemed open but never returned an EHLO response also did not seem to record any error message. In this case, we were trying to use STARTTLS on port 465, which was using SMTPS, and KumoMTA did not get the response it was expecting, so it behaved like a tarpit. The timeout seemed to expire, but a descriptive error message was not logged anywhere I was able to find.

Ideally, a connection error should be logged showing that the remote server did not respond to the EHLO.

I will continue to investigate.

@wez
Collaborator

wez commented Jun 4, 2024

When connecting to an SMTPS port, the remote host will not volunteer any data; it is waiting on the client to initiate the TLS handshake.
This is at odds with SMTP, in which the client connects and waits for the remote host to send the banner.
So a misconfigured client will essentially "deadlock" its conversation when talking to an SMTPS host until either party times out.
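
For illustration only, here is a minimal Python sketch of that stall, assuming a local listening socket is a fair stand-in for an SMTPS port: it accepts the TCP connection but never writes anything. Python's standard `smtplib.SMTP` client expects the server to speak first, so it waits for a banner until its timeout fires. The listener, address and 0.5s timeout are made up for the demo; none of this is KumoMTA code.

```python
import smtplib
import socket

# Stand-in for an SMTPS (implicit TLS) port: the kernel completes the TCP
# handshake for a listening socket, but nothing is ever written to the client.
silent = socket.socket()
silent.bind(("127.0.0.1", 0))
silent.listen(1)
port = silent.getsockname()[1]

error = None
try:
    # smtplib.SMTP waits for the server's 220 banner before sending anything.
    smtplib.SMTP("127.0.0.1", port, timeout=0.5)
except OSError as err:  # smtplib's exceptions derive from OSError
    error = err
finally:
    silent.close()

print("banner wait failed:", repr(error))
```

A client configured for implicit TLS (for example `smtplib.SMTP_SSL`) would instead start the TLS handshake immediately after connecting, which is what an SMTPS listener is waiting for.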

In KumoMTA, we will wait for the connect_timeout to expire before attempting to connect to the next host in the connection plan. Only once the plan has been exhausted (or we successfully connected somewhere) will the set of connection failures be logged.
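
The behavior described above can be sketched roughly as follows. This is a hypothetical Python illustration of "collect failures while walking the plan, report them once at the end", not KumoMTA's actual implementation; `walk_plan`, `PlanResult`, `fake_connect` and the host names are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class PlanResult:
    connected_to: Optional[str] = None
    failures: List[str] = field(default_factory=list)

def walk_plan(hosts: List[str], try_connect: Callable[[str], None]) -> PlanResult:
    """Try each host in order, gathering failures instead of logging each one."""
    result = PlanResult()
    for host in hosts:
        try:
            try_connect(host)
        except OSError as err:
            result.failures.append(f"{host}: {err}")
            continue
        result.connected_to = host
        break
    return result

def fake_connect(host: str) -> None:
    # Hypothetical connector: only the last MX in the plan is reachable.
    if host != "mx3.example.com":
        raise TimeoutError("timed out waiting for connection")

plan = ["mx1.example.com", "mx2.example.com", "mx3.example.com"]
result = walk_plan(plan, fake_connect)

# Failures are reported only after the walk has finished.
for line in result.failures:
    print("connection failure:", line)
print("connected to:", result.connected_to)
```

With this shape, nothing is reported for a slow host until the whole walk finishes, which matches the delay before anything shows up in the logs.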

We could consider adding a new connection failure log event for this case, but it will result in increased log volume and IO pressure.

The outbound tracing stuff will also help to diagnose this situation without impacting logging.

wez added a commit that referenced this issue Jul 12, 2024
The motivation for this is:

My test environment is not permitted to reach outbound port 25.
If I run an ad-hoc test without setting up an explicit sink,
I end up with messages that try to reach the public internet.
Since they are blocked at a firewall, each of the MX hosts in
the connection plan is subject to a 60s wait before trying the next
thing.

In addition, this can cause the shutdown to take longer while
we wait for the in-flight delivery attempts to complete.

Making a separate configuration option allows the local administrator
to decide how to split the time waiting for a connection from
the time waiting for the banner.

refs: #196
@wez
Collaborator

wez commented Jul 12, 2024

In main, connect_timeout has now been split into connect_timeout (for the raw connection establishment) and banner_timeout (for the reading of the initial 220).

While it doesn't directly address the introspective side of this issue, it does allow, e.g., setting connect_timeout to something fairly short while keeping the banner timeout at a more RFC-appropriate value.
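
To make the split concrete, here is a small Python sketch of a client that budgets the two phases separately and reports which one failed. The `connect_timeout` and `banner_timeout` parameter names mirror the options mentioned above, but the `probe` function, its messages and the demo sockets are assumptions made for illustration; this is not how KumoMTA implements or logs it.

```python
import socket

def probe(host, port, connect_timeout, banner_timeout):
    """Return (phase, detail) where phase is 'connect', 'banner' or 'ok'."""
    try:
        sock = socket.create_connection((host, port), timeout=connect_timeout)
    except OSError as err:
        return ("connect", f"connect failed: {err}")
    with sock:
        # From here on, only the banner budget applies.
        sock.settimeout(banner_timeout)
        try:
            data = sock.recv(512)
        except socket.timeout:
            return ("banner", f"connected, but no banner within {banner_timeout}s")
        except OSError as err:
            return ("banner", f"error while waiting for banner: {err}")
    if data.startswith(b"220"):
        return ("ok", data.decode(errors="replace").strip())
    return ("banner", f"unexpected greeting: {data[:40]!r}")

# Demo 1: a listener that accepts TCP but never speaks (SMTPS-like behavior).
silent = socket.socket()
silent.bind(("127.0.0.1", 0))
silent.listen(1)
silent_result = probe("127.0.0.1", silent.getsockname()[1], 0.5, 0.5)
silent.close()

# Demo 2: a port that was just released, so the connect phase itself fails.
tmp = socket.socket()
tmp.bind(("127.0.0.1", 0))
closed_port = tmp.getsockname()[1]
tmp.close()
closed_result = probe("127.0.0.1", closed_port, 0.5, 0.5)

print(silent_result[0], "|", closed_result[0])
```

Keeping the phases separate is what lets the connect budget be short (so firewalled hosts fail fast) without shortening the wait for the initial 220.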

This won't help with trying to send to an SMTPS destination due to misconfiguration.

However, also in main is kcli trace-smtp-client, which can provide insight into the connection attempts being made for a given client session.
