Incorrect historic average values #2920
Comments
This issue isn't inherently a bug but rather a limitation of the current query logic and how the data is stored and interpreted. Here's why:
1. Expected Behavior Based on Storage Model
OpenEMS is designed to store time-series data efficiently by persisting values only when they change or periodically (e.g., every five minutes). This model reduces storage overhead and aligns with typical industrial monitoring requirements. The query logic reflects this design, so when calculating the average, only the recorded points are considered, without inferring the duration a value persisted.
2. Gap Filling Is a Value-Added Operation
Filling gaps with the last known value is not a universal requirement and depends on the specific analysis context. For example:
The query as implemented prioritizes simplicity and raw data fidelity, which is valid for many use cases.
3. InfluxDB Default Behavior
InfluxDB’s
4. Trade-Offs of Gap Filling
Gap filling introduces complexity and computational overhead, especially with large datasets or high-frequency data. Not all users or scenarios require this level of precision, so implementing it as a default could:
5. Documentation and User Education
The current behavior is consistent with OpenEMS's data storage and querying principles. To address the confusion:
Why This Matters
In the getParams you could add:
to it, so it is correct for you maybe?
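The storage model described in this comment (persist a value when it changes, or at least every five minutes) can be sketched as follows. This is only an illustration of the rule as stated in the thread, not the actual OpenEMS implementation; the function name `persisted_points`, the start timestamp, and the one-second input readings are assumptions made for the example.

```python
from datetime import datetime, timedelta

def persisted_points(readings, max_interval=timedelta(minutes=5)):
    """Keep a reading if its value differs from the last kept reading,
    or if at least max_interval has passed since the last kept reading.
    Hypothetical helper; readings must be sorted by timestamp."""
    kept = []
    for ts, value in readings:
        if not kept or value != kept[-1][1] or ts - kept[-1][0] >= max_interval:
            kept.append((ts, value))
    return kept

# One reading per second (assumed input), using the values from the issue's sample.
start = datetime(2024, 1, 1, 9, 30, 0)  # arbitrary date, chosen for illustration
values = [5, 10, 10, 10, 10, 5]
readings = [(start + timedelta(seconds=i), v) for i, v in enumerate(values)]

kept = persisted_points(readings)
for ts, value in kept:
    print(ts.time(), value)
```

Under these assumptions only three of the six readings are kept (at 09:30:00, 09:30:01 and 09:30:05), which is the shape of the stored data the issue description starts from.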
I see, thank you for your explanation. I labelled it as a bug because in multiple cases we need time-weighted averages in OpenEMS to check the history (e.g. the energy monitor in the UI). But I also understand your arguments.
Just because I am curious: what happens if the data is empty for 2 days and you use fill() for a timeframe that lies within this missing data?
You're right. In this case, the gaps would still be filled with the previous value, which is also not a perfect solution.
Let me give you some background information on why we encountered this issue. In early November, we transitioned from release 2024.9.0 to the current dev version (as of November 6, 2024). Prior to this update, values like _sum/EssActivePower, _sum/GridActivePower, and _sum/EssSoc were stored in Influx every second, regardless of changes. With the update, however, this behavior changed, and data is now only stored when there are changes, or at least every 5 minutes.
As we need to analyze the power usage of our batteries at various resolutions for accurate forecasting, it is essential to have technically precise data. Ultimately, the solution with fill() was just a workaround for the underlying issue of missing data points. Unfortunately, I haven't yet determined what caused this change.
Since time-series databases like Influx are optimized for storing large amounts of data and use various compression algorithms, the advantage of consistent data for _sum-channels should outweigh the additional data storage. From my perspective, at least for the past month, the data should be available in the corresponding resolution. There are a lot of use cases where high-resolution data plays an important role.
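On the question of filling across a long outage: one way to avoid carrying a stale value over days of missing data is to forward-fill only up to a maximum age. The sketch below assumes a limit of five minutes, taken from the "at least every 5 minutes" persistence interval mentioned in this thread; the helper `fill_with_limit`, the timestamps, and the values are hypothetical and not part of OpenEMS or InfluxDB.

```python
from datetime import datetime, timedelta

def fill_with_limit(points, grid, max_age=timedelta(minutes=5)):
    """For each grid timestamp (ascending), return the latest stored value at or
    before it, or None if that value is older than max_age (or none exists)."""
    points = sorted(points)
    filled = []
    idx = -1  # index of the latest point at or before the current grid time
    for t in grid:
        while idx + 1 < len(points) and points[idx + 1][0] <= t:
            idx += 1
        if idx >= 0 and t - points[idx][0] <= max_age:
            filled.append((t, points[idx][1]))
        else:
            filled.append((t, None))  # treat as missing instead of filling
    return filled

# Illustrative data: one point, then a two-day outage, then another point.
t0 = datetime(2024, 1, 1, 9, 0, 0)  # arbitrary date, chosen for illustration
points = [(t0, 5.0), (t0 + timedelta(days=2), 7.0)]
grid = [
    t0,
    t0 + timedelta(minutes=4),
    t0 + timedelta(minutes=10),
    t0 + timedelta(days=2),
]

filled_values = [v for _, v in fill_with_limit(points, grid)]
print(filled_values)
```

With these inputs the value at 09:10 is reported as missing rather than filled, so an average computed afterwards can skip the outage instead of counting the last known value for two days.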
That is true, but in a production environment, it's not feasible to have second-by-second data available, especially with more than 10,000 devices. The scalability and storage requirements make this impractical. Perhaps we could consider alternative approaches like adaptive sampling rates or data aggregation to ensure the needed accuracy without impacting system performance.
Thank you for your message. I have set up an Influx-DB on the edges, which shows that the data is saved with a resolution of every second. However, I still get the data from the backend. If you look at https://github.com/OpenEMS/openems/blob/develop/io.openems.backend.core/src/io/openems/backend/core/jsonrpcrequesthandler/EdgeRpcRequestHandler.java lines 77 and 126, you can see that an edgeRpc call first attempts to load the data from the backend and only loads it from the edge if this is not possible. So I have the following questions and maybe someone can help me answer one of them:
Ultimately, I need some solution to retrieve the data in the correct resolution. Otherwise we can't calculate our predictions reliably.
Well, then I guess you need to develop some new lines of code to achieve this :) More in detail maybe later on. Greetings
But with a little bit of searching you could have found it ;)
If, in the future, a comprehensive solution addressing all of these issues is identified, we can certainly consider creating a Pull Request to implement it. For the time being, however, there doesn’t appear to be any fundamental defect or urgent need for changes within the OpenEMS UI, the Edge, or the Backend as currently implemented. Therefore, this “non-issue” can be closed for now. Should circumstances change or additional requirements arise, we can revisit this matter and proceed with a targeted PR at that time. @sfeilmeier
Description
We have identified a bug when querying the historic _sum/EssActivePower for one of our simulated edges. The query incorrectly returns the average of all existing entries for the requested resolution (e.g., 15 minutes) without considering that OpenEMS persists a value only when it changes (or at least every five minutes).
Sample
09:30:00: 5 kW
09:30:01: 10 kW
09:30:05: 5 kW
Querying the average between 09:30:00 and 09:30:05 yields:
(5+10+5)/3 = 6.67 kW.
However, the correct average should account for the persistence of values over time: (5+10+10+10+10+5)/6 = 8.33 kW.
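The two calculations above can be reproduced with a short sketch. It assumes a one-second grid from 09:30:00 to 09:30:05 (both ends included) and fills each grid timestamp with the most recent stored value; the function names and the calendar date are made up for the example and are not part of OpenEMS.

```python
from datetime import datetime, timedelta

def plain_mean(points):
    """Mean over the stored points only, ignoring how long each value lasted."""
    return sum(value for _, value in points) / len(points)

def forward_filled_mean(points, start, end, step=timedelta(seconds=1)):
    """Mean over a regular grid from start to end (inclusive), where each grid
    timestamp takes the most recent stored value at or before it."""
    points = sorted(points)
    samples = []
    idx = -1  # index of the latest point at or before the current grid time
    t = start
    while t <= end:
        while idx + 1 < len(points) and points[idx + 1][0] <= t:
            idx += 1
        if idx >= 0:  # grid times before the first stored value are skipped
            samples.append(points[idx][1])
        t += step
    return sum(samples) / len(samples)

day = datetime(2024, 1, 1)  # arbitrary date, chosen for illustration
points = [
    (day.replace(hour=9, minute=30, second=0), 5.0),
    (day.replace(hour=9, minute=30, second=1), 10.0),
    (day.replace(hour=9, minute=30, second=5), 5.0),
]
start, end = points[0][0], points[-1][0]

naive = plain_mean(points)
filled = forward_filled_mean(points, start, end)
print(round(naive, 2), round(filled, 2))  # 6.67 8.33
```

Note that the result depends on the grid convention: including both end points gives six samples and 8.33 kW as above, whereas weighting each value by the seconds it was held over the five-second interval would give (5·1 + 10·4)/5 = 9 kW.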
Setup
Branch: Current dev-branch
Environment: OpenEMS backend with simulated edge
Database: InfluxDB v2.7.10
Explanation
The field queried is _sum/EssActivePower. The issue is illustrated in the influx graph below:
During this time range, 74 entries exist in InfluxDB:
2 entries before the power increase, with a 5-minute gap between them.
35 entries while the power increases to 800,000 kW.
35 entries while the power decreases back to near 0 kW.
2 entries after the decrease, again with a 5-minute gap.
Requesting historic data via Postman for this range returns a mean value of 377,539 kW. This matches the result obtained using the mean function in InfluxDB.
Influx Query:
Postman Query:
Expected behaviour
When accounting for gaps using the last existing value to fill missing timestamps, the correct average would be 121,724 kW.
Adjusted Influx Query:
This approach correctly considers the persistence of values over time, resulting in the expected average.
Screenshots
No response
Operating System
No response
How to reproduce the Error?
Set up an OpenEMS instance with Influx as time series database
Produce sample data for the field _sum/EssActivePower with constant values
Option 1: Use a self-consumption optimization controller and a simulated production/consumption datasource (with entries like 5, 10, 10, 10, 10, 5).
Option 2: Use the "Fix Active Power" controller to produce a constant power for a short time. Note that this controller does not appear to use the PID filter.
Query the configured time range with the history method queryHistoricTimeseriesData via Postman. You could also check the number of entries in Influx for this time range.