PVA's port field in search request makes it not NAT/firewall friendly #197

EmilioPeJu · 2024-07-26T15:26:53Z

I would like to report a use-case that has problems with PVA, and though the problem is more a protocol-related problem, I couldn't find an issues page for just the protocol (not a specific implementation), so I assumed this was the best place to do it.

The use case is running a container with a PVA server and exposing ports 5075-5076 to the host. This exposure mechanism usually involves some NAT. If we send a search request from the host, it starts as:
127.0.0.1:49155 -> 127.0.0.1:5076 with payload specifying Port: 49155
The network plug-in converts that into something like:
172.20.255.250:33851 -> 172.20.255.250:5076 with payload specifying Port: 49155
The PVA server then tries to respond to port 49155 instead of the NAT-ed one (33851).
Because the network plug-in doesn't know anything about that port, the reply fails and the server gets an ICMP destination unreachable message.
Please keep in mind that this is not container-specific: it will be a problem for any NAT or firewall doing something similar.
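
The failure can be reproduced on paper with a small model. The sketch below is not taken from any PVA implementation; `Datagram`, `nat_rewrite` and `server_reply_target` are hypothetical names, and the model assumes the NAT rewrites only the UDP/IP headers while the server replies to the port named in the payload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datagram:
    src: tuple[str, int]   # (IP, port) taken from the UDP/IP headers
    dst: tuple[str, int]
    response_port: int     # responsePort carried inside the PVA payload


def nat_rewrite(pkt, nat_src, nat_dst):
    """Rewrite header addresses only; the NAT does not look into the payload."""
    return Datagram(src=nat_src, dst=nat_dst, response_port=pkt.response_port)


def server_reply_target(pkt):
    """Model of the current behaviour: header IP, but payload port."""
    return (pkt.src[0], pkt.response_port)


sent = Datagram(("127.0.0.1", 49155), ("127.0.0.1", 5076), response_port=49155)
seen = nat_rewrite(sent, ("172.20.255.250", 33851), ("172.20.255.250", 5076))
target = server_reply_target(seen)

print(target)              # ('172.20.255.250', 49155)
print(target == seen.src)  # False: the NAT only has a mapping for port 33851
```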

FYI: @coretl and @gilesknap

EmilioPeJu · 2024-07-26T15:29:09Z

A possible solution is to change the PVA protocol so that a special value for the port field (say 0) is interpreted by the server as "answer to the port in the UDP header". This is already done for the address, i.e. address 0.0.0.0 tells the server to answer to the IP in the header.
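
A minimal sketch of that rule, as I imagine the server-side decision. This is only an illustration of the proposal, not existing code in any implementation; `reply_endpoint` is a hypothetical helper.

```python
import ipaddress


def reply_endpoint(header_src, response_addr, response_port):
    """Pick where to send the search reply.

    header_src    -- (IP, port) from the UDP/IP headers of the request
    response_addr -- responseAddress from the payload, as a string
    response_port -- responsePort from the payload
    """
    header_ip, header_port = header_src
    # Existing convention: an unspecified address means "use the header IP".
    if ipaddress.ip_address(response_addr).is_unspecified:
        ip = header_ip
    else:
        ip = response_addr
    # Proposed convention: port 0 means "use the header port".
    port = header_port if response_port == 0 else response_port
    return (ip, port)


nat_src = ("172.20.255.250", 33851)
print(reply_endpoint(nat_src, "0.0.0.0", 49155))  # ('172.20.255.250', 49155)
print(reply_endpoint(nat_src, "0.0.0.0", 0))      # ('172.20.255.250', 33851)
```

With port 0 the reply goes back through the mapping the NAT already holds, which is the point of the proposal.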

mdavidsaver · 2024-07-29T19:38:33Z

The handoff from UDP to TCP is a feature of both CA and PVA protocols. I don't think there is any getting around the fact that UDP and TCP port numbers are in effect different namespaces.

One place where this is possible is in the case of search over TCP. When there are only TCP port numbers involved a special (0) value would make sense. I think the main complication would be in coordinating a minor protocol version increment. Existing clients do not recognize zero as special, and would blindly try to connect to port zero.

@kasemir fyi.

EmilioPeJu · 2024-07-29T20:57:07Z

Hi @mdavidsaver, and sorry, I didn't express myself properly.
I wasn't talking about the handoff from UDP to TCP. I was referring only to the search request and reply, which happen entirely over UDP. There is a big difference between CA and PVA here: PVA specifies the port on which the search requester (the client) wants to receive the search reply.
To be more specific, the PVA spec document defines a search request as:

struct searchRequest {
  int searchSequenceID;
  byte flags;
  byte[3] reserved;
  byte[16] responseAddress;
  short responsePort;
  string[] protocols;
  struct {
    int searchInstanceID;
    string channelName;
  } channels[];
};

The field this issue is about is responsePort. I cannot think of a case in which it is useful. The same applies to responseAddress, but in my traffic capture I can see it is ignored and just set to 0.0.0.0 (actually the IPv6 representation of it).
Those two fields make even less sense if the search request is sent over TCP.
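
To make the layout concrete, here is a rough sketch that packs and unpacks only the fixed-size fields at the start of the struct above. It assumes big-endian byte order (PVA signals byte order in the message header, which is not modelled here), treats responsePort as unsigned so that values like 49155 fit, and uses 16 zero bytes as a stand-in for the unspecified address. The variable-length protocols and channels fields are left out, and the helper names are hypothetical.

```python
import struct

# int searchSequenceID; byte flags; byte[3] reserved;
# byte[16] responseAddress; short responsePort (read as unsigned here)
FIXED = struct.Struct(">iB3x16sH")


def pack_search_prefix(seq_id, flags, response_addr, response_port):
    """Encode the fixed-size head of a search request payload."""
    return FIXED.pack(seq_id, flags, response_addr, response_port)


def unpack_search_prefix(payload):
    """Decode the fixed-size head; trailing bytes are ignored."""
    seq_id, flags, addr, port = FIXED.unpack_from(payload)
    return {
        "searchSequenceID": seq_id,
        "flags": flags,
        "responseAddress": addr,
        "responsePort": port,
    }


payload = pack_search_prefix(1, 0, bytes(16), 49155)
fields = unpack_search_prefix(payload)
print(len(payload), fields["responsePort"])  # 26 49155
```

Because responsePort sits inside these bytes, a NAT that rewrites only the UDP header leaves it at 49155.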

mdavidsaver · 2024-07-29T23:56:17Z

> The field this issue is about is responsePort

Ah. So this is a duplicate of #159?

I agree that allowing these indirect replies is a bad design. PVXS and core.pva servers do what may be done compatibly to be friendly to stateful firewalls matching request with reply. This does not cover NAT though. Fully eliminating this misfeature would require a protocol version increment. Possible, but tedious enough that it hasn't happened so far.

EmilioPeJu · 2024-07-30T08:07:15Z

No, this is not a duplicate of #159. That issue is about the source port of the search response packet being a random one, whereas this issue is about the search request packet specifying a response port, which becomes the destination port of the response packet.

P.S. An extra detail I just noticed: we didn't have the problem in #159 because we were talking to a PVA gateway (which doesn't use this PVA implementation). The problem described in the current issue (#197) is common to both implementations though.

mdavidsaver · 2024-08-02T00:24:58Z

> not a duplicate of #159

Ok. So a different consequence of the same protocol design decision.

> Fully eliminating this misfeature would require a protocol version increment. Possible, but tedious enough that it hasn't happened so far.

On further reflection, I'm not sure that a minor (compatible) protocol increment would work. Testing the minor version works when it can be negotiated between client and server. aka. over TCP connections after the initial handshake, or with a UDP reply.

With a UDP request, the sender has no idea of the protocol minor versions (likely plural) supported by the recipients.

So I think it would be an incompatible change to start sending SEARCH requests with responsePort==0, regardless of protocol minor version.

Right now, the only way I can think of directly "fixing" this issue would be to introduce a new, second, search request message format. Maintaining compatibility would then require that clients concurrently send both messages. This would double the bandwidth used, which in 2024 is I think probably not an issue. Although I expect some would disagree with me on this.

A second option, which I like far less, would be to introduce handling of responsePort==0 on RX now, with the idea of "eventually" starting to send it.

As a note: responsePort is also necessary to implement (what I call) the local multicast "hack", where the recipient of a unicast UDP search re-sends it via multicast to 127.0.0.1 to reach all PVA peers. This is how PVA avoids the problems CA has with unicast search to hosts with multiple IOC processes. I think this could be accommodated by appending the origin port to the ORIGIN_TAG message prefixed to forwarded messages.

EmilioPeJu · 2024-08-02T09:27:02Z

Thanks @mdavidsaver for the information.
I think the server side can be fixed (in both implementations) without breaking any old version, given that older clients will never use 0 as responsePort. Regarding the client side, of all the options you mentioned, sending two search requests seems the least problematic. To be less invasive, this behavior could be enabled by setting an environment variable (for example EPICS_PVA_ALLOW_NAT=yes).
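
As a sketch of the opt-in only: EPICS_PVA_ALLOW_NAT is just the example name from above, not an existing setting, and the "legacy" and "nat-friendly" labels are placeholders for whatever the two message formats would end up being.

```python
import os


def nat_search_enabled(env=None):
    """True if the (hypothetical) EPICS_PVA_ALLOW_NAT switch is turned on."""
    env = os.environ if env is None else env
    value = env.get("EPICS_PVA_ALLOW_NAT", "no")
    return value.strip().lower() in {"yes", "true", "1"}


def search_variants(env=None):
    """Which search messages a client would send on each search cycle."""
    variants = ["legacy"]             # always sent, so old servers still answer
    if nat_search_enabled(env):
        variants.append("nat-friendly")  # second message, only when opted in
    return variants


print(search_variants({}))                              # ['legacy']
print(search_variants({"EPICS_PVA_ALLOW_NAT": "yes"}))  # ['legacy', 'nat-friendly']
```

Sites that don't set the variable would see no extra search traffic.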

anjohnson · 2024-08-02T15:18:42Z

I would be one of those people who isn't keen on the idea of sending out duplicate searches.

This may be the same problem that we are hoping to solve by putting a PVA name-server in between the two networks. Is that something which could be done with containers too?

kasemir · 2024-08-02T16:56:49Z

> solve by putting a PVA name-server in between

Maybe that's a better approach.

With containers, we've said for a while that you need to use --network=host. The OP said this isn't container-specific, NAT in general will cause problems. Well, yes. Network infrastructure may be aware of the http protocol and can accordingly patch URLs. But firewalls etc. don't understand PVA and won't update the responseAddress & Port.
So do we require a name server or PVA gateway to go across such network infrastructure?

mdavidsaver · 2024-08-02T19:13:46Z

> With containers, we've said for a while that you need to use --network=host ...

Maybe "need" is too strict. Although imo. doing otherwise is asking for some avoidable pain.

gilesknap · 2024-08-13T07:29:21Z

> Maybe "need" is too strict. Although imo. doing otherwise is asking for some avoidable pain.

In preparing for the EPICS collaboration meeting I have gone to some effort to stop using network=host for IOCs in order that we can run a workshop with lots of people using the same PVs on the same network. I've been meaning to get around to this for some time.

I have been entirely successful in doing this by running all the IOCs plus one ca-gateway in the same container network and having the ca-gateway bind to the CA ports on the Host (using the loopback adapter for local development - but this could also be used to bind to an actual NIC).

Next, I tried to use the PVA plugin to show images out of areaDetector. I could not achieve the same thing with PVAGW and asked Emilio to help me diagnose it. That is when we found the issue that Emilio reported here.

> This may be the same problem that we are hoping to solve by putting a PVA name-server in between the two networks. Is that something which could be done with containers too?

This would work for containers as long as the port that the request comes in on is the port that you should reply to. If at any point the protocol requires passing port numbers in the application layer, any NAT will fail (including container network NATs).

mdavidsaver · 2024-08-14T02:03:45Z

> Next, I tried to use the PVA plugin to show images out of Areadetector. I could not achieve the same thing with PVAGW ...

You might be able to make this work with a pair of gateways communicating by TCP only. As I sometimes do with SSH tunneling. aka. EPICS_PVA_NAME_SERVERS=.... Then I think all of the port numbers would be under your control.

Although, looking more closely, I realize that I have PVXS ignoring responseAddr and responsePort in SEARCH received over TCP.

https://github.com/epics-base/pvxs/blob/5fa743d4c87377859953012af3c0fbcd1b063129/src/serverchan.cpp#L183

gilesknap · 2024-08-14T07:36:20Z

> Although, looking more closely, I realize that I have PVXS ignoring responseAddr and responsePort in SEARCH received over TCP.

My interpretation of this is that it would work with PVXS because TCP just replies back to where the SEARCH came from. The PVGW I'm using is from P4P so this would work?

It's not ideal because instead of having everything running nicely inside of a container with no host installs required, we now need a gateway outside too (if I'm interpreting you correctly).
--edit--
Maybe not. Both gateways could be in containers, one running in host network the other in the shared container network.

gilesknap · 2024-12-04T16:19:45Z

After a conversation with Michael I was able to get PVA servers inside of containers connected to clients outside.

The caveats are:

- client and server must be PVXS
- the client must specify the server's IP with EPICS_PVA_NAME_SERVERS

Together, these mean that a single TCP connection is used, which passes through NAT with no issues.
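
For illustration, the client side of this could be set up from a launcher script roughly like this. The address 127.0.0.1:5075 is an assumption (the container's PVA TCP port published on the host loopback) and `name_server_env` is a hypothetical helper; only the EPICS_PVA_NAME_SERVERS variable itself comes from the setup described above.

```python
import os


def name_server_env(servers, base=None):
    """Return a copy of the environment with EPICS_PVA_NAME_SERVERS set.

    servers -- list of "host:port" strings, joined with spaces
    base    -- environment to copy; defaults to the current process environment
    """
    env = dict(os.environ if base is None else base)
    env["EPICS_PVA_NAME_SERVERS"] = " ".join(servers)
    return env


# Assumed address: the container's TCP port 5075 published on the host.
env = name_server_env(["127.0.0.1:5075"], base={})
print(env)  # {'EPICS_PVA_NAME_SERVERS': '127.0.0.1:5075'}
```

The resulting dictionary can be passed as `env=` to `subprocess.run` when starting a client such as `pvxget`, so that the search goes over the TCP connection to the listed server.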

I've written up my findings here https://epics-containers.github.io/4.1.0b1/explanations/epics_protocols.html

mdavidsaver · 2024-12-10T00:13:54Z

> client and server must be PVXS

Phoebus (with Kay's core.pva client) also supports EPICS_PVA_NAME_SERVERS.

gilesknap · 2024-12-10T08:16:26Z

> > client and server must be PVXS
>
> Phoebus (with Kay's core.pva client) also supports EPICS_PVA_NAME_SERVERS.

Thanks @mdavidsaver, yes I should have mentioned that as I have this working nicely with phoebus too.

anjohnson · 2024-12-10T18:52:16Z

APS has an implementation of EPICS_PVA_NAME_SERVERS support for pvAccessCPP in PR #192, but that needs rework after review comments and development on it appears to have stalled for now.

gilesknap mentioned this issue Jul 29, 2024

PVA's port field in search request makes it not NAT/firewall friendly epics-base/pvxs#70

Open
