Increasing Memory Consumption #2365

Open
steffenbeermann opened this issue Nov 12, 2024 · 5 comments
Labels: bug (Something isn't working)

Comments

steffenbeermann commented Nov 12, 2024

Describe the bug
We noticed that we sometimes run out of memory on the devices running IoT Edge with the OPC Publisher. After investigating, we found that the OPC Publisher module consumes more and more memory over time, see:
[Image: OPC Publisher module memory usage increasing over time]

Restarting the module resets the memory usage (see the dip at the start of the graph).
We noticed the same behavior on OPC Publisher 2.9.11 and 2.9.0.

To Reproduce
We have the following config:

{
  "Hostname": "publisher_axh_active",
  "Cmd": [
    "--cl=5",
    "--cf",
    "--aa",
    "--pf=/mount/publishednodes.json",
    "--PkiRootPath=/mount/pki",
    "--si=90",
    "--di=3600"
  ],
  "HostConfig": {
    "Binds": [
      "/home/edge/OPCPublisherMount_Active:/mount"
    ]
  }
}

and we have around 20 endpoints configured with a total of around 200,000 nodes in multiple subscriptions.

On the older version 2.8.1 it seems to run fine, without memory increasing over time:
[Image: memory usage staying flat over time on version 2.8.1]

Expected behavior
The OPC Publisher performs its own garbage collection and does not build up high memory usage.

@marcschier marcschier added the bug Something isn't working label Nov 15, 2024
@marcschier marcschier added this to the 2.9.12 milestone Nov 15, 2024
@marcschier (Collaborator) commented

Could you run without Prometheus by setting --em=False?

Also, how does the OOM show up for you?
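
As a minimal sketch, assuming the container create options from the original post stay otherwise unchanged, the flag would simply be appended to the Cmd array:

{
  "Cmd": [
    "--cl=5",
    "--cf",
    "--aa",
    "--pf=/mount/publishednodes.json",
    "--PkiRootPath=/mount/pki",
    "--si=90",
    "--di=3600",
    "--em=False"
  ]
}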

@steffenbeermann (Author) commented Nov 15, 2024

We rely on the metrics of the OPC Publisher module. With Prometheus disabled, will the edge metrics collector still function?

We noticed the issue because the IoT Edge runtime crashed and restarted all modules. It stayed down for a few hours until it came up again. See the drop in memory on the 10th of November:
[Image: memory drop across all modules on 10 November]

This coincided with a downtime of all modules of about 3 hours, during which they did not respond. After some automatic restarts, all modules came up again on their own.

@marcschier (Collaborator) commented

Could you take a look at the diagnostic log and check the numbers of the data flow pipeline (encoder related, send queue, egress, etc.)? Check whether they are low or increasing. If they are increasing, that could mean we need to tune the data flow path so it does not hold on to incoming data it cannot send out.

My colleague also found a resource leak in the security handshake of the OPC UA stack, but that is likely not enough to explain the increases, especially the huge jump at the end of the run from 2 GB to 4 GB.
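
As a minimal sketch (not part of OPC Publisher, just one way to watch those numbers over time), you could pipe the module log through a small script that prints timestamped values of the pipeline counters, so it is easy to see whether they stay low or keep growing. The counter names are the fields from the OPC Publisher diagnostic output; the regex is an assumption about how they appear in your log text.

# Sketch only: extract data flow pipeline counters from the module log so you
# can see whether they stay low or keep growing over time. The counter names
# are fields from the OPC Publisher diagnostic output; the regex is an
# assumption about how they appear in the log text.
import re
import sys
from datetime import datetime, timezone

COUNTERS = (
    "ingressBatchBlockBufferSize",
    "encodingBlockInputSize",
    "encodingBlockOutputSize",
    "encoderNotificationsDropped",
    "outgressInputBufferCount",
    "outgressInputBufferDropped",
)
pattern = re.compile(r'"?({0})"?\s*[:=]\s*([0-9.]+)'.format("|".join(COUNTERS)))

for line in sys.stdin:
    for name, value in pattern.findall(line):
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        print("{0},{1},{2}".format(stamp, name, value), flush=True)

Usage would be something like iotedge logs <publisher module name> | python3 watch_pipeline.py >> pipeline.csv, where the module name and the script file name are placeholders for your setup.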

@steffenbeermann (Author) commented

I have this diagnostic info for one endpoint:

    "sentMessagesPerSec": 0.12591434486187222,
    "ingestionDuration": "6.17:47:16.9692019",
    "ingressDataChanges": 3494295,
    "ingressValueChanges": 12381772,
    "ingressBatchBlockBufferSize": 48,
    "encodingBlockInputSize": 0,
    "encodingBlockOutputSize": 0,
    "encoderNotificationsProcessed": 3495567,
    "encoderNotificationsDropped": 0,
    "encoderIoTMessagesProcessed": 73343,
    "encoderAvgNotificationsMessage": 47.660540201519055,
    "encoderAvgIoTMessageBodySize": 37767.330201928016,
    "encoderAvgIoTChunkUsage": 9.220539600080082,
    "estimatedIoTChunksPerDay": 0,
    "outgressInputBufferCount": 0,
    "outgressInputBufferDropped": 0,
    "outgressIoTMessageCount": 73336,
    "connectionRetries": 0,
    "opcEndpointConnected": true,
    "monitoredOpcNodesSucceededCount": 6862,
    "monitoredOpcNodesFailedCount": 0,
    "ingressEventNotifications": 0,
    "ingressEvents": 0,
    "encoderMaxMessageSplitRatio": 0,
    "ingressDataChangesInLastMinute": 360,
    "ingressValueChangesInLastMinute": 1144,
    "ingressHeartbeats": 0,
    "ingressCyclicReads": 0

I don't exactly understand what you mean by the data flow pipeline.

@steffenbeermann (Author) commented

> Could you run without Prometheus by setting --em=False?
>
> Also, how does the OOM show up for you?

I tried with the setting --em=False but it seems it did not solve the problem, as there is still a slow increase over the last two weeks:
[Image: slow memory increase over the last two weeks with --em=False set]
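
To keep an eye on the trend with the Prometheus endpoint disabled, here is a minimal sketch of sampling the container memory from the device itself, assuming the docker CLI is available there; the container name below is taken from the Hostname in the config above and may differ on the actual device.

# Sketch only: sample the container's memory usage once a minute via the
# docker CLI and print a timestamped CSV line, so the memory trend can still
# be followed with the Prometheus endpoint (--em=False) disabled.
import subprocess
import time
from datetime import datetime, timezone

# Assumption: taken from the "Hostname" in the create options above; the
# actual container/module name on the device may be different.
CONTAINER = "publisher_axh_active"

while True:
    result = subprocess.run(
        ["docker", "stats", "--no-stream",
         "--format", "{{.Name}},{{.MemUsage}}", CONTAINER],
        capture_output=True, text=True,
    )
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    print("{0},{1}".format(stamp, result.stdout.strip()), flush=True)
    time.sleep(60)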

@marcschier marcschier modified the milestones: 2.9.12, 2.9.13 Jan 22, 2025