Graph remembers Gateway IP address for listing OCM shares #10846

Open
wkloucek opened this issue Jan 9, 2025 · 11 comments · May be fixed by #10916
Comments

@wkloucek
Contributor

wkloucek commented Jan 9, 2025

Describe the bug

I'm testing OCM sharing with owncloud/ocis-charts#840 and cs3org/reva#5033

Steps to reproduce

  1. start the ocm-install deployment example in minikube
  2. log in to each of the two oCIS installations with a different user
  3. establish an OCM trust relationship between those two users
  4. share a file via OCM to user x on instance B
  5. as user x, list the OCM shares and see that you received a file share
  6. run kubectl rollout restart deploy -n $ns gateway, where $ns is the namespace of instance B / user x
  7. wait until the previous gateway pod is gone and only the new gateway pod is there
  8. list shares again

Expected behavior

Everything works as before

Actual behavior

We don't see the OCM share anymore.

Graph service complains:

graph-587d848cbf-2shrr graph {"level":"debug","service":"graph","error":"generalException: stat:rpc error: code = Unavailable desc = last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 10.244.7.190:9142: connect: connection reset by peer\"","shareid":"b7e559ef-54dc-4184-9284-b4c6493c6d97","remoteshareid":"815ea2cb-95de-434c-8eaf-d682a420e607","time":"2025-01-09T15:06:50Z","line":"github.com/owncloud/ocis/v2/services/graph/pkg/service/v0/utils.go:565","message":"could not stat received ocm share, skipping"}

The gateway IP being used is that of the old gateway pod:

k get pods -n $ns -o wide
NAME                                 READY   STATUS        RESTARTS        AGE     IP             NODE       NOMINATED NODE   READINESS GATES
...
gateway-66665b8659-85f4s             1/1     Running       0               14s     10.244.7.202   minikube   <none>           <none>
gateway-84d55df46d-pdn25             1/1     Terminating   1 (6m33s ago)   6m36s   10.244.7.190   minikube   <none>           <none>

Setup

see issue description

Additional context

Restarting the graph service doesn't help here.

@kobergj kobergj moved this from Qualification to Prio 1 in Infinite Scale Team Board Jan 9, 2025
@kobergj
Collaborator

kobergj commented Jan 9, 2025

Prio 1 to find out if this is a release blocker. Can be deprioritized if not.

@dj4oC
Contributor

dj4oC commented Jan 9, 2025

It is working until the pods are restarted, right?
In that case, it would not block an RC1, but it would block a final release, right?

@wkloucek
Contributor Author

It is working until the pods are restarted, right?

We can never guarantee that pods will NOT be restarted, so it might break sooner than you'd wish.

@wkloucek
Contributor Author

A similar but different message shows up during OCM sharing:

ocm-64f9c566d8-vdxp7 ocm {"level":"error","service":"ocm","pkg":"rgrpc","traceid":"53cb70a07c24fe9cab98c0ffc1aa22dc","error":"rpc error: code = Unavailable desc = last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 10.96.5.119:9142: i/o timeout\"","status":{"code":15,"message":"error listing spaces","trace":"53cb70a07c24fe9cab98c0ffc1aa22dc"},"filters":[{"type":4,"Term":{"SpaceType":"+grant"}},{"type":6,"Term":{"User":{"idp":"https://xxxx","opaque_id":"xxx"}}}],"time":"2025-01-13T19:51:53Z","line":"github.com/cs3org/reva/[email protected]/internal/grpc/services/storageprovider/storageprovider.go:580","message":"failed to list storage spaces"}

@kobergj
Collaborator

kobergj commented Jan 13, 2025

It is probably not related to OCM but to the service registry issues. It should be reproducible without OCM.

@kobergj kobergj self-assigned this Jan 22, 2025
@kobergj
Collaborator

kobergj commented Jan 22, 2025

Insight: Restarting the ocm service fixes the problem

@wkloucek
Contributor Author

Insight: Restarting the ocm service fixes the problem

it fixes the error in the graph service? 😆

@kobergj
Collaborator

kobergj commented Jan 22, 2025

Yes. The connection issue with the gateway is coming from the ocm service. It talked to the gateway using a gateway client that was initialized only at startup, so it never asked NATS for recent registry changes. The fix is in the PR above.
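
For illustration, here is a minimal Go sketch of that difference. The type and function names are hypothetical, not the actual reva/oCIS code (the real change is in the PR linked above): the broken pattern resolves the gateway address once at startup and keeps dialing it, while the fixed pattern asks the registry for the current address on every call.

```go
package main

import "fmt"

// registry is a stand-in for the NATS-backed service registry (hypothetical).
type registry interface {
	Lookup(service string) (string, error)
}

// staleGateway resolves the gateway address once, at construction time.
// After a gateway rollout the cached address points at the terminated pod,
// producing errors like "connection reset by peer".
type staleGateway struct{ addr string }

func newStaleGateway(r registry) (*staleGateway, error) {
	addr, err := r.Lookup("gateway")
	if err != nil {
		return nil, err
	}
	return &staleGateway{addr: addr}, nil // addr is frozen from here on
}

func (g *staleGateway) Stat(ref string) error { return dial(g.addr, ref) }

// freshGateway asks the registry for the current address on every call,
// so it follows the new pod after a kubectl rollout restart.
type freshGateway struct{ reg registry }

func (g *freshGateway) Stat(ref string) error {
	addr, err := g.reg.Lookup("gateway")
	if err != nil {
		return err
	}
	return dial(addr, ref)
}

func dial(addr, ref string) error {
	fmt.Printf("stat %s via gateway at %s\n", ref, addr)
	return nil
}

// fakeRegistry exists only to make the sketch runnable.
type fakeRegistry map[string]string

func (f fakeRegistry) Lookup(s string) (string, error) { return f[s], nil }

func main() {
	reg := fakeRegistry{"gateway": "10.244.7.202:9142"}
	fresh := freshGateway{reg: reg}
	_ = fresh.Stat("ocm share b7e559ef")
}
```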

@kobergj kobergj assigned 2403905 and unassigned kobergj Jan 22, 2025
@wkloucek
Contributor Author

Yes. The connection issue with gateway is coming from ocm service.

Ah, the graph service just "repeats" the error message of OCM in this case?

@kobergj
Collaborator

kobergj commented Jan 22, 2025

Ah, the graph service just "repeats" the error message of OCM in this case?

Yes, exactly. We thought it was the graph:gateway connection, but in fact it is ocm:gateway that throws this error. Therefore we should be able to find the same error first in the ocm logs; it is then repeated by the gateway(?) and graph.
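
For context, a minimal, purely illustrative Go sketch (not the actual graph code) of what the log line from utils.go:565 suggests: graph iterates over the received OCM shares, stats each one, and skips any share whose stat fails, which is why the share silently disappears from the listing instead of surfacing an error.

```go
package main

import (
	"errors"
	"fmt"
)

// receivedShare and statFunc are illustrative stand-ins, not the graph service's real types.
type receivedShare struct {
	ID       string
	RemoteID string
}

type statFunc func(share receivedShare) error

// listOCMShares mirrors the behaviour visible in the log: shares that cannot be
// stat'ed (e.g. because the gateway connection is broken further down the chain)
// are skipped, so the listing simply comes back shorter instead of failing.
func listOCMShares(shares []receivedShare, stat statFunc) []receivedShare {
	var listed []receivedShare
	for _, s := range shares {
		if err := stat(s); err != nil {
			fmt.Printf("could not stat received ocm share %s, skipping: %v\n", s.ID, err)
			continue
		}
		listed = append(listed, s)
	}
	return listed
}

func main() {
	shares := []receivedShare{{ID: "b7e559ef", RemoteID: "815ea2cb"}}
	// Simulate the broken ocm:gateway connection: every stat fails, so the listing comes back empty.
	down := func(receivedShare) error { return errors.New("connection reset by peer") }
	fmt.Println("listed shares:", listOCMShares(shares, down))
}
```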

@kobergj
Collaborator

kobergj commented Jan 22, 2025

Fun Fact: The SQL invite manager has the same flaw. But we don't care because we don't support it.

@unbekanntes-pferd unbekanntes-pferd moved this from Prio 1 to In progress in Infinite Scale Team Board Jan 23, 2025