
Add memory usage information #2

Open
nemequ opened this issue Mar 30, 2015 · 10 comments

Comments

@nemequ
Member

nemequ commented Mar 30, 2015

The obvious route for heap usage (fork() and wait3()) also has some issues when considering things like preexisting freelists in malloc implementations, fragmentation, and malloc requesting more memory than it needs (e.g., next highest power of two, a multiple of the page size, etc.).

I think the only way to do this accurately would be to override malloc/realloc/free/new/delete/mmap, but I still need to find a reliable solution for measuring the stack size.

@clbr

clbr commented Sep 23, 2015

I'm attaching a sample implementation for Linux; it covers stack, heap, and mmap. If you want to avoid counting page-size alignment overhead you would need valgrind's massif, but I'd argue that overhead should be included.

Almost everyone uses 4 kB pages, and if a codec makes many small allocations it will incur that same overhead in real usage.

http://pastebin.ca/3171788

@nemequ
Member Author

nemequ commented Sep 23, 2015

Thanks, but AFAIK statm provides an instantaneous measurement. We can hardly ask each codec to call this code at the exact moment it happens to be using the most memory; we need a high-water measurement. It also suffers from all the problems I mentioned in the original report.

@clbr

clbr commented Sep 23, 2015 via email

@nemequ
Member Author

nemequ commented Sep 23, 2015

Using /proc/$PID/status also suffers from all the problems I mentioned in the original report. I think it is much better to provide no number than a wildly inaccurate one. An inaccurate number could lead people to the wrong conclusions, when in reality they could (and should) just test the codecs they are interested in within their own software to see whether performance meets their needs. Squash even makes this trivial: switching codecs typically requires changing a single string.

Massif would kill performance, which is far more important to most people than memory usage.

@clbr

clbr commented Sep 23, 2015 via email

@nemequ
Member Author

nemequ commented Sep 23, 2015

AFAIK it would require a significant rewrite of the benchmark, since it would have to fork()/exec() massif, and a second executable would need to be created to actually run the benchmark. That's a pretty big effort for a non-default option.

Also, the data wouldn't be included in the web interface, as it would simply be too slow for me to keep running the benchmark. On the fastest computer it already takes almost 24 hours to run, and the slowest takes a few hours shy of a week. IIRC massif usually slows things down by about an order of magnitude… I can't give up the computers I actually use for two weeks, and I can't wait two months for results from the slower machines.

@r-lyeh-archived

Are the benchmarks run on Linux? If so, an LD_PRELOAD'd dlmalloc with a few tweaks could report total RAM consumption and peak usage.

@nemequ
Member Author

nemequ commented Sep 23, 2015

Yes, they are currently run exclusively on Linux. I don't think LD_PRELOAD would be necessary; you could get the same effect from a glibc malloc hook.

Unfortunately it would miss memory allocated by C++'s new keyword (several plugins use it). It also wouldn't take into account plugins which use buffers on the stack. Finally, it would miss anonymous mappings from mmap and other allocators, but honestly I don't think that is a problem; I haven't done an exhaustive search but I'm not aware of any plugins which use either.

To be viable I think we need to be able to measure the high-water mark for:

  • malloc/realloc/calloc
  • new in C++
  • stack size

all without having a significant effect on performance.

@travisdowns

I feel like launching a process per codec run and using the OS high-water counters is probably the most complete and promising approach. As you point out, though, that is a lot of work, although the process-per-run model may have other advantages too, such as being able to read the /proc numbers to learn other interesting things.

@nemequ
Member Author

nemequ commented Sep 26, 2015

My main concern with that is that memory malloc has sitting in a pool when you start the process wouldn't be counted. For codecs which require a lot of memory that wouldn't be a big deal, but for codecs which require very little it could account for everything they use.
