-
Notifications
You must be signed in to change notification settings - Fork 308
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* chore: benchmark revamp * foramt and add more documentation
- Loading branch information
Showing
4 changed files
with
450 additions
and
45 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,8 +1,128 @@ | ||
# BigQuery Benchmark | ||
This directory contains benchmarks for BigQuery client. | ||
This directory contains benchmark scripts for BigQuery client. It is created primarily for project | ||
maintainers to measure library performance. | ||
|
||
## Usage | ||
`python benchmark.py queries.json` | ||
`python benchmark.py` | ||
|
||
BigQuery service caches requests so the benchmark should be run | ||
at least twice, disregarding the first result. | ||
|
||
### Flags | ||
Run `python benchmark.py -h` for detailed information on available flags. | ||
|
||
`--reruns` can be used to override the default number of times a query is rerun. Must be a positive | ||
integer. Default value is 3. | ||
|
||
`--projectid` can be used to run benchmarks in a different project. If unset, the GOOGLE_CLOUD_PROJECT | ||
environment variable is used. | ||
|
||
`--queryfile` can be used to override the default file which contains queries to be instrumented. | ||
|
||
`--table` can be used to specify a table to which benchmarking results should be streamed. The format | ||
for this string is in BigQuery standard SQL notation without escapes, e.g. `projectid.datasetid.tableid` | ||
|
||
`--create_table` can be used to have the benchmarking tool create the destination table prior to streaming. | ||
|
||
`--tag` allows arbitrary key:value pairs to be set. This flag can be specified multiple times. | ||
|
||
When `--create_table` flag is set, must also specify the name of the new table using `--table`. | ||
|
||
### Example invocations | ||
|
||
Setting all the flags | ||
``` | ||
python benchmark.py \ | ||
--reruns 5 \ | ||
--projectid test_project_id \ | ||
--table logging_project_id.querybenchmarks.measurements \ | ||
--create_table \ | ||
--tag source:myhostname \ | ||
--tag somekeywithnovalue \ | ||
--tag experiment:special_environment_thing | ||
``` | ||
|
||
Or, a more realistic invocation using shell substitions: | ||
``` | ||
python benchmark.py \ | ||
--reruns 5 \ | ||
--table $BENCHMARK_TABLE \ | ||
--tag origin:$(hostname) \ | ||
--tag branch:$(git branch --show-current) \ | ||
--tag latestcommit:$(git log --pretty=format:'%H' -n 1) | ||
``` | ||
|
||
## Stream Results To A BigQuery Table | ||
|
||
When streaming benchmarking results to a BigQuery table, the table schema is as follows: | ||
``` | ||
[ | ||
{ | ||
"name": "groupname", | ||
"type": "STRING" | ||
}, | ||
{ | ||
"name": "name", | ||
"type": "STRING" | ||
}, | ||
{ | ||
"name": "tags", | ||
"type": "RECORD", | ||
"mode": "REPEATED", | ||
"fields": [ | ||
{ | ||
"name": "key", | ||
"type": "STRING" | ||
}, | ||
{ | ||
"name": "value", | ||
"type": "STRING" | ||
} | ||
] | ||
}, | ||
{ | ||
"name": "SQL", | ||
"type": "STRING" | ||
}, | ||
{ | ||
"name": "runs", | ||
"type": "RECORD", | ||
"mode": "REPEATED", | ||
"fields": [ | ||
{ | ||
"name": "errorstring", | ||
"type": "STRING" | ||
}, | ||
{ | ||
"name": "start_time", | ||
"type": "TIMESTAMP" | ||
}, | ||
{ | ||
"name": "query_end_time", | ||
"type": "TIMESTAMP" | ||
}, | ||
{ | ||
"name": "first_row_returned_time", | ||
"type": "TIMESTAMP" | ||
}, | ||
{ | ||
"name": "all_rows_returned_time", | ||
"type": "TIMESTAMP" | ||
}, | ||
{ | ||
"name": "total_rows", | ||
"type": "INTEGER" | ||
} | ||
] | ||
}, | ||
{ | ||
"name": "event_time", | ||
"type": "TIMESTAMP" | ||
} | ||
] | ||
``` | ||
|
||
The table schema is the same as the [benchmark in go](https://github.com/googleapis/google-cloud-go/tree/main/bigquery/benchmarks), | ||
so results from both languages can be streamed to the same table. | ||
|
||
## BigQuery Benchmarks In Other Languages | ||
* Go: https://github.com/googleapis/google-cloud-go/tree/main/bigquery/benchmarks | ||
* JAVA: https://github.com/googleapis/java-bigquery/tree/main/benchmark |
Oops, something went wrong.