Timestamp Inconsistencies Between CPU, Memory, and GPU Records #1254

Closed
bharathappali opened this issue Aug 9, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@bharathappali
Member

Description:

We have identified a critical issue: the timestamps recorded for CPU, memory, and GPU metrics are significantly inconsistent with one another. The issue arises when only a few data points are available for CPU and memory while GPU metrics are recorded far more frequently. For instance, in a recent workload, only two data points were available for CPU and memory over a given time range, whereas GPU had 30 entries. Of those 30 GPU records, 28 had no corresponding CPU or memory data within a reasonable time proximity.
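A minimal sketch of how such a mismatch could be detected, assuming timestamps are epoch seconds and that "reasonable time proximity" means a fixed tolerance. The function name `unmatched`, the 15-second tolerance, and the sample timestamps are all hypothetical; only the counts (2 CPU/memory points, 30 GPU entries, 28 unmatched) mirror the workload described above.

```python
from bisect import bisect_left


def unmatched(gpu_ts, ref_ts, tolerance_s):
    """Return the GPU timestamps that have no reference (CPU/memory)
    timestamp within tolerance_s seconds. Hypothetical helper."""
    ref = sorted(ref_ts)
    missing = []
    for t in gpu_ts:
        i = bisect_left(ref, t)
        # Only the nearest reference points on either side can be in range.
        neighbours = ref[max(i - 1, 0):i + 1]
        if not any(abs(t - r) <= tolerance_s for r in neighbours):
            missing.append(t)
    return missing


# Made-up data: 30 GPU samples 30 s apart, CPU/memory only at both ends.
gpu_timestamps = [i * 30 for i in range(30)]
cpu_mem_timestamps = [0, 870]

orphans = unmatched(gpu_timestamps, cpu_mem_timestamps, tolerance_s=15)
print(len(orphans))
```

With this made-up data, only the first and last GPU samples have a CPU/memory neighbour, so the remaining 28 are reported as unmatched.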

Impact:

When these records are mapped, the absence of matching CPU and memory timestamps creates new entries in the GPU map that have no corresponding CPU and memory metrics. As a result, the metrics map for a given pod can contain only GPU values. Pod-level calculations that rely on a complete set of metrics (CPU, memory, and GPU) then encounter null values, causing errors or crashes.
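The failure mode can be illustrated with a small sketch. It assumes each metric arrives as a timestamp-to-value dictionary and is merged into one map keyed by timestamp; `merge_by_timestamp`, `complete_entries`, and the sample values are hypothetical and are not taken from the project's code or from the fix in #1255.

```python
REQUIRED = ("cpu", "memory", "gpu")


def merge_by_timestamp(cpu, memory, gpu):
    """Merge per-metric {timestamp: value} series into
    {timestamp: {metric_name: value}}. Hypothetical helper."""
    merged = {}
    for name, series in (("cpu", cpu), ("memory", memory), ("gpu", gpu)):
        for ts, value in series.items():
            # A timestamp seen only in the GPU series yields a GPU-only entry.
            merged.setdefault(ts, {})[name] = value
    return merged


def complete_entries(merged):
    """Keep only the entries that carry every required metric."""
    return {
        ts: metrics
        for ts, metrics in merged.items()
        if all(metrics.get(name) is not None for name in REQUIRED)
    }


# Made-up values: GPU has an extra sample at t=30 with no CPU/memory match.
cpu = {0: 0.5, 900: 0.7}
memory = {0: 512, 900: 640}
gpu = {0: 0.2, 30: 0.3, 900: 0.4}

merged = merge_by_timestamp(cpu, memory, gpu)
print(merged[30])                        # GPU-only entry
print(sorted(complete_entries(merged)))  # timestamps safe for pod-level math
```

Filtering to complete entries before pod-level calculations is one possible guard; skipping or interpolating the missing metrics are alternatives with different trade-offs.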

@bharathappali bharathappali added the bug Something isn't working label Aug 9, 2024
@bharathappali bharathappali self-assigned this Aug 9, 2024
@bharathappali
Member Author

Fixed in #1255
