
Add memory usage information #2

Open
nemequ opened this issue Mar 30, 2015 · 10 comments

Comments

@nemequ
Member

nemequ commented Mar 30, 2015

The obvious route for heap usage (fork() and wait3()) also has some issues when considering things like preexisting freelists in malloc implementations, fragmentation, and malloc requesting more memory than it needs (e.g., next highest power of two, a multiple of the page size, etc.).

I think the only way to do this accurately would be to override malloc/realloc/free/new/delete/mmap, but I still need to find a reliable solution for measuring the stack size.

@clbr

clbr commented Sep 23, 2015

I'm attaching a sample implementation for Linux; it covers stack, heap, and mmap. If you want to avoid counting page-size alignment overhead you would need valgrind's massif, but I'd argue that overhead should be included.

Almost everyone uses 4 kB pages, and if a codec makes many small allocations it will incur that same overhead in real usage.

http://pastebin.ca/3171788

@nemequ
Member Author

nemequ commented Sep 23, 2015

Thanks, but AFAIK statm provides an instantaneous measurement. We can hardly ask each codec to call this code at the exact moment it happens to be using the most memory; we need a high-water measurement. It also suffers from all the problems I mentioned in the original report.

@clbr

clbr commented Sep 23, 2015 via email

@nemequ
Member Author

nemequ commented Sep 23, 2015

Using /proc/$PID/status also suffers from all the problems I mentioned in the original report. I think it is much better to provide no number than a wildly inaccurate one. An inaccurate number could lead people to the wrong conclusions, when in reality they could (and should) just test the codecs they are interested in within their own software to see whether performance meets their needs. Squash even makes this trivial: switching codecs typically requires changing a single string.

Massif would kill performance, which is far more important to most people than memory usage.

@clbr

clbr commented Sep 23, 2015 via email

@nemequ
Member Author

nemequ commented Sep 23, 2015

AFAIK it would require a significant rewrite of the benchmark, since it would have to fork()/exec() massif, and a second executable would need to be created to actually run the benchmark. That's a pretty big effort for a non-default option.

Also, the data wouldn't be included in the web interface, as it would simply be too slow for me to keep running the benchmark. On the fastest computer it already takes almost 24 hours to run, and the slowest takes a few hours shy of a week. IIRC massif usually slows things down by about an order of magnitude… I can't give up the computers I actually use for two weeks, and I can't wait two months for results from the slower machines.

@r-lyeh-archived

Are the benchmarks run on Linux? If so, an LD_PRELOAD'd dlmalloc with a few tweaks could report total RAM consumption and peak usage.

@nemequ
Member Author

nemequ commented Sep 23, 2015

Yes, they are currently run exclusively on Linux. I don't think LD_PRELOAD would be necessary; you could get the same effect from a glibc malloc hook.

Unfortunately it would miss memory allocated by C++'s new keyword (several plugins use it). It also wouldn't take into account plugins which use buffers on the stack. Finally, it would miss anonymous mappings from mmap and other allocators, but honestly I don't think that is a problem; I haven't done an exhaustive search but I'm not aware of any plugins which use either.

To be viable I think we need to be able to measure the high-water mark for:

  • malloc/realloc/calloc
  • new in C++
  • stack size

all without having a significant effect on performance.

@travisdowns

I feel like launching a process per codec run and using the OS high-water counters is probably the most complete and promising approach. As you point out, though, that is a lot of work, although the process-per-run model may have other advantages too, such as being able to read the /proc numbers to learn other interesting things.

@nemequ
Member Author

nemequ commented Sep 26, 2015

My main concern with that is that memory malloc has sitting in a pool when you start the process wouldn't be counted. For codecs which require a lot of memory that wouldn't be a big deal, but for codecs which require very little it could account for everything they use.
