Export and Import metrics #1

Open
BrianHicks opened this issue May 11, 2018 · 3 comments

Comments

@BrianHicks
Collaborator

Issue by BrianHicks
Friday Feb 10, 2017 at 16:18 GMT
Originally opened as BrianHicks/elm-benchmark#5


We should be able to export benchmark results for safekeeping and comparison against new data. This should be independent of running benchmarks (and possibly related to #4). My ideal workflow:

  1. Run benchmarks
  2. Save data (CSV, JSON, whatever… as long as it's consistent; see the sketch at the end of this comment)
  3. Improve code
  4. Import prior data for analysis

Also:

  1. Run benchmarks and save data ("a")
  2. Run improved benchmarks and save data ("b")
  3. Import benchmark data sets "a" and "b" for analysis

This may mean that the analysis ends up separate from the benchmarks. That wouldn't be the end of the world, and it could pretty easily be hosted somewhere static and public.
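Purely as a sketch of what "save data" could mean here, one consistent shape per benchmark might look like the record below. The type and field names are made up for illustration; nothing like this exists in elm-benchmark yet.

-- Hypothetical shape for one saved data point; runs "a" and "b" above would
-- each be a List SavedBenchmark serialized to JSON or CSV.
type alias SavedBenchmark =
    { name : String         -- e.g. "Dict.get"
    , runsPerSample : Int   -- how many calls each sample timed
    , samples : List Float  -- wall-clock time per sample, in milliseconds
    }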

@BrianHicks
Collaborator Author

Comment by chadbrewbaker
Saturday Feb 11, 2017 at 14:10 GMT


Yes, this is pretty much it: BOS's Criterion split into two pieces, joined by a common benchmark CSV format. One piece benchmarks Elm code; the other is a separate chunk of code that takes benchmark data from some source and produces Criterion-style graphs.

BenchmarkGenerator :: [config] -> InternalBenchmark
BenchmarkExport :: InternalBenchmark -> CSV
BenchmarkImport :: CSV -> InternalBenchmark
ReportKDE :: InternalBenchmark -> KDEReport -- The default
ReportHisto :: InternalBenchmark -> HistoReport

See http://www.serpentine.com/criterion/tutorial.html

For this to really take off, I think there has to be out-of-the-box support for benchmarking across a WASM FFI: Elm -> JS -> LLVM IR -> WASM. That way you can also use Elm to benchmark any language that compiles to LLVM IR: C/C++, Haskell, ...

As for WebAssembly, I'm not sure what the plans are for Elm. IMHO Elm should support a second type of JS port called PurePort: if you have WebAssembly or JS that you can prove to be pure, then Elm should have a simplified port structure for calling that code and assuming no side effects.

@BrianHicks
Collaborator Author

Comment by zwilias
Friday Mar 10, 2017 at 10:46 GMT


I think this is definitely the way to go, especially in terms of separation of concerns.

Let the runner give you only very basic info and dump a JSON blob when benchmarking is done. Allow (and encourage) projects to then take such blobs and produce pretty, well-formatted information based on them.

I imagine such reporters could exist as SPAs online, too, no installation required ;)
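As a minimal sketch of that "dump a JSON blob" step, assuming the hypothetical SavedBenchmark record from the first comment and the Elm 0.18 Json.Encode API that was current when this thread was written:

import Json.Encode as Encode

-- Encode one saved data point; a runner would emit a list of these when
-- benchmarking finishes, and reporters would decode the same shape.
encodeSaved : SavedBenchmark -> Encode.Value
encodeSaved saved =
    Encode.object
        [ ( "name", Encode.string saved.name )
        , ( "runsPerSample", Encode.int saved.runsPerSample )
        , ( "samples", Encode.list (List.map Encode.float saved.samples) )
        ]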

@BrianHicks
Collaborator Author

Comment by zwilias
Thursday Mar 23, 2017 at 18:05 GMT


Let's go all hypothetical for a moment and take it a bit further.

Imagine if elm-benchmark were actually two things:

  • an elm-benchmark-primitives library with the core benchmark and benchmark1 through benchmark8 functions, with one difference: rather than taking a string, make them take an arbitrary piece of data for identification. Edit: upon further inspection, I'm completely wrong and this is basically just Benchmark.LowLevel
  • elm-benchmark, providing a nice higher-level API to work with these primitives (i.e. Benchmark.describe, Benchmark.compare, etc.), a minimalistic but sane browser runner, and a JSON encoder. Edit: so yeah, that's basically already what it is -_-

Splitting the primitives into their own library and keeping them as primitive and limited as possible allows for a richer ecosystem of tools leveraging those. Edit: Then again, there's nothing stopping anyone from doing so right now.

Say you need an 8-way comparison with one of them marked as a baseline; you should be able to just do that. The current structure requires nested describes to get relevant grouping, and doesn't support saying "this benchmark tests input of type X and size Y". There's no particular reason it must be this way, other than the convenience of having a sane default. And that brings us full circle: the default runner could provide sane defaults. Edit: same goes here, that is totally feasible right now.
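For reference, the nested-describe grouping mentioned above looks roughly like this today; the signatures are approximated from the 0.18-era elm-benchmark API and the data is made up purely for illustration, so treat this as a sketch rather than exact code:

import Array
import Benchmark exposing (Benchmark)
import Dict

-- Group a two-way comparison under a describe; an 8-way comparison with a
-- designated baseline has no direct equivalent and has to be faked with nesting.
suite : Benchmark
suite =
    Benchmark.describe "get by key"
        [ Benchmark.compare "Dict vs Array"
            (Benchmark.benchmark1 "Dict.get" (Dict.get 50) sampleDict)
            (Benchmark.benchmark1 "Array.get" (Array.get 50) sampleArray)
        ]

sampleDict : Dict.Dict Int Int
sampleDict =
    Dict.fromList (List.map (\n -> ( n, n )) (List.range 0 99))

sampleArray : Array.Array Int
sampleArray =
    Array.fromList (List.range 0 99)

A primitives layer that attaches arbitrary data to each benchmark would let a reporter express "eight candidates, one baseline, input of type X and size Y" directly instead of encoding it through nesting.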

Edit: I kind of had a major brainfart. See edits.
