Add memory usage information #2
I'm attaching a sample implementation for Linux. It includes stack, heap, and mmap. To avoid page-size alignment overhead you would need Valgrind's Massif, but I'd argue that overhead should be included: almost everyone uses 4 KiB pages, and if a codec makes many small allocations it will incur the same overhead in real usage.
Thanks, but AFAIK statm provides an instantaneous measurement. We can hardly ask each codec to call this code at the exact point when it happens to be using the most memory; we need a high-water measurement. It also suffers from all the problems I mentioned in the original report.
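For reference, a minimal sketch of the statm approach under discussion (assumption: Linux, field layout per proc(5); the helper name is mine, not from the thread). It illustrates the limitation above: a single sample reflects only the instant it is taken and cannot see an earlier peak.

```c
/* Minimal sketch (Linux-only): /proc/self/statm reports *current*
 * sizes in pages, so a single sample only captures usage at the
 * instant it is taken; it cannot recover an earlier peak. */
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: current virtual size in KiB, or -1 on error. */
long current_vm_kib(void) {
    long size_pages = 0;
    FILE *f = fopen("/proc/self/statm", "r");
    if (!f) return -1;
    if (fscanf(f, "%ld", &size_pages) != 1) size_pages = -1;
    fclose(f);
    if (size_pages < 0) return -1;
    /* statm counts pages; convert using the system page size. */
    return size_pages * (sysconf(_SC_PAGESIZE) / 1024);
}
```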
The VmPeak value is available from the /proc/PID/status file, which provides the maximum value (however, there is no way to reset it without spawning a new process). Massif, as mentioned, would support everything.
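For concreteness, a small sketch of reading those peaks, assuming Linux procfs (the helper name is mine, not from the thread). As noted, the values are process-lifetime peaks and cannot be reset.

```c
/* Hedged sketch (assumption: Linux): read a peak field such as
 * VmPeak (peak virtual size) or VmHWM (peak RSS) from
 * /proc/self/status. These are lifetime peaks; there is no way to
 * reset them short of spawning a new process. */
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns the named field in KiB, or -1 on
 * failure. `field` is e.g. "VmPeak:" or "VmHWM:". */
long proc_status_kib(const char *field) {
    FILE *f = fopen("/proc/self/status", "r");
    if (!f) return -1;
    char line[256];
    long kib = -1;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, field, strlen(field)) == 0) {
            sscanf(line + strlen(field), "%ld", &kib);
            break;
        }
    }
    fclose(f);
    return kib;
}
```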
Using /proc/$PID/status also suffers from all the problems I mentioned in the original report. I think it is much better to provide no number than a wildly inaccurate one. Providing an inaccurate number could lead people to the wrong conclusions, when in reality they could/should just test the codecs they are interested in with their own software to see whether performance meets their needs. Squash even makes this trivial; changing codecs typically requires changing only a single string. Massif would kill performance, which is far more important to most people than memory usage.
Massif would kill performance, which is far more important to most people than memory usage.

This is true, but why not make it a non-default option, used only for measuring memory usage? That is the main thing I was missing from your site (I came there to find out about brotli's memory usage, and was disappointed).
AFAIK it would require a significant rewrite of the benchmark, since it would have to fork()/exec() Massif, and a second executable would need to be created to actually run the benchmark. That's a pretty big effort for a non-default option. Also, the data wouldn't be included in the web interface, as it would simply be too slow for me to keep running the benchmark. On the fastest computer it already takes almost 24 hours to run, and the slowest computer takes a few hours shy of a week. IIRC Massif usually slows things down by about an order of magnitude… I can't give up the computers I actually use for two weeks, and I can't wait two months for results from the slower machines.
Are the benchmarks run on Linux? If so, an LD_PRELOAD export with dlmalloc, plus a few tweaks over there, could capture total RAM consumption and peaks.
Yes, they are currently run exclusively on Linux. I don't think LD_PRELOAD would be necessary; you could get the same effect with a glibc malloc hook. Unfortunately it would miss memory allocated by C++'s new. To be viable I think we need to be able to measure the high-water mark for heap, stack, and mmap usage, without having a significant effect on performance.
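To illustrate the interposition idea being discussed, here is a heavily simplified LD_PRELOAD-style sketch, assuming glibc on Linux. It is not a complete solution: calloc/realloc wrappers, thread safety, and mmap tracking are omitted, and as noted above it only covers C++ new to the extent that the C++ runtime routes it through malloc.

```c
/* Heavily simplified LD_PRELOAD-style interposer (assumption:
 * glibc/Linux). Tracks a high-water mark of live heap bytes using
 * malloc_usable_size(), so allocator rounding overhead is included.
 * Omitted for brevity: calloc/realloc wrappers, thread safety (the
 * counters would need atomics), and mmap() tracking. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <malloc.h>
#include <stddef.h>

static long long live_bytes;  /* signed, in case of untracked frees */
static long long peak_bytes;

void *malloc(size_t n) {
    static void *(*real_malloc)(size_t);
    if (!real_malloc)
        real_malloc = (void *(*)(size_t))dlsym(RTLD_NEXT, "malloc");
    void *p = real_malloc(n);
    if (p) {
        live_bytes += malloc_usable_size(p);
        if (live_bytes > peak_bytes) peak_bytes = live_bytes;
    }
    return p;
}

void free(void *p) {
    static void (*real_free)(void *);
    if (!real_free)
        real_free = (void (*)(void *))dlsym(RTLD_NEXT, "free");
    if (p) live_bytes -= malloc_usable_size(p);
    real_free(p);
}
```

Using malloc_usable_size() rather than recording the requested size means the counters reflect what the allocator actually hands out, including its rounding, which is part of real-world usage.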
I feel like launching a process per codec run and using the OS high-water counters is probably the most complete and promising approach. As you point out, though, that is a lot of work, although perhaps the process-per-run model will have other advantages too, in terms of being able to read the /proc numbers to learn other interesting things.
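A sketch of that process-per-run idea, assuming Linux (the helper name and workload are hypothetical): run the codec in a forked child and read its peak RSS from the rusage filled in at wait time, so the counter starts fresh for every run and never needs resetting.

```c
/* Sketch of the process-per-run model (assumption: Linux). The child
 * runs the workload and exits; wait4() then reports the child's peak
 * RSS. Note: on Linux, ru_maxrss is in kilobytes. */
#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs fn() in a child process and returns the
 * child's peak RSS in KiB, or -1 on error. */
long peak_rss_kib(void (*fn)(void)) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {           /* child: do the work, then exit */
        fn();
        _exit(0);
    }
    int status;
    struct rusage ru;
    if (wait4(pid, &status, 0, &ru) < 0) return -1;
    return ru.ru_maxrss;
}

/* Hypothetical workload: touch ~16 MiB of heap, one byte per page. */
static void demo_workload(void) {
    long size = 16 << 20;
    char *p = malloc(size);
    if (!p) _exit(1);
    for (long i = 0; i < size; i += 4096) p[i] = 1;
    free(p);
}
```

One caveat, consistent with the concerns raised in this thread: RSS counts whole pages the process actually touched, so allocator freelists, fragmentation, and page rounding are all folded into the number.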
My main concern with that is memory which …
The obvious route for heap usage (fork() and wait3()) also has some issues when considering things like preexisting freelists in malloc implementations, fragmentation, and malloc requesting more memory than it needs (e.g., next highest power of two, a multiple of the page size, etc.).
I think the only way to do this accurately would be to override malloc/realloc/free/new/delete/mmap, but I still need to find a reliable solution for measuring the stack size.