chore: benchmark revamp (#1694)
* chore: benchmark revamp

* format and add more documentation
Linchin authored Oct 23, 2023
1 parent 345e06e commit 76d0e5a
Showing 4 changed files with 450 additions and 45 deletions.
128 changes: 124 additions & 4 deletions benchmark/README.md
@@ -1,8 +1,128 @@
# BigQuery Benchmark
This directory contains benchmark scripts for the BigQuery client. It was created primarily for project
maintainers to measure library performance.

## Usage
`python benchmark.py`

The BigQuery service caches requests, so the benchmark should be run
at least twice and the first result disregarded.
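
As a rough illustration of discarding the first result, the sketch below times a callable several times and averages only the later runs. The helper names `time_runs` and `steady_state_mean` are hypothetical and are not taken from `benchmark.py`; the callable stands in for whatever issues the query.

```python
import time
from statistics import mean


def time_runs(fn, reruns=3):
    """Call fn() `reruns` times and return per-run wall-clock durations in seconds."""
    durations = []
    for _ in range(reruns):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def steady_state_mean(durations):
    """Average the durations after dropping the first run, as the README advises."""
    if len(durations) < 2:
        raise ValueError("need at least two runs to discard the first one")
    return mean(durations[1:])
```

Keeping the raw durations and applying the discard step separately makes it easy to inspect the first run when it is of interest.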

### Flags
Run `python benchmark.py -h` for detailed information on available flags.

`--reruns` can be used to override the default number of times a query is rerun. The value must be a
positive integer; the default is 3.

`--projectid` can be used to run benchmarks in a different project. If unset, the GOOGLE_CLOUD_PROJECT
environment variable is used.

`--queryfile` can be used to override the default file which contains queries to be instrumented.

`--table` can be used to specify a table to which benchmarking results should be streamed. The string
uses BigQuery standard SQL notation without escapes, e.g. `projectid.datasetid.tableid`.

`--create_table` can be used to have the benchmarking tool create the destination table prior to streaming.

`--tag` allows arbitrary key:value pairs to be set. This flag can be specified multiple times.

When the `--create_table` flag is set, the name of the new table must also be specified using `--table`.
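
The flag rules above can be sketched with `argparse`. This is a minimal illustration under stated assumptions, not the parser in `benchmark.py`: the helper names are hypothetical, and the `queries.json` default for `--queryfile` is an assumption.

```python
import argparse
import os


def positive_int(text):
    """argparse type that accepts only integers greater than zero."""
    value = int(text)
    if value <= 0:
        raise argparse.ArgumentTypeError("must be a positive integer")
    return value


def build_parser():
    parser = argparse.ArgumentParser(description="Hypothetical benchmark CLI sketch")
    parser.add_argument("--reruns", type=positive_int, default=3)
    # Falls back to the GOOGLE_CLOUD_PROJECT environment variable when unset.
    parser.add_argument("--projectid", default=os.environ.get("GOOGLE_CLOUD_PROJECT"))
    # The default file name here is an assumption.
    parser.add_argument("--queryfile", default="queries.json")
    parser.add_argument("--table")
    parser.add_argument("--create_table", action="store_true")
    # May be given multiple times; each occurrence is appended to a list.
    parser.add_argument("--tag", action="append", default=[])
    return parser


def parse_args(argv=None):
    parser = build_parser()
    args = parser.parse_args(argv)
    # --create_table only makes sense when a destination table is named.
    if args.create_table and not args.table:
        parser.error("--create_table requires --table")
    return args
```

Checking the `--create_table` / `--table` dependency after parsing keeps the rule in one place and reports it through the parser's usual error path.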

### Example invocations

Setting all the flags
```
python benchmark.py \
--reruns 5 \
--projectid test_project_id \
--table logging_project_id.querybenchmarks.measurements \
--create_table \
--tag source:myhostname \
--tag somekeywithnovalue \
--tag experiment:special_environment_thing
```
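
To show how `key:value` tags such as those above might be turned into the `key`/`value` records used by the results table schema later in this README, here is a small sketch. The function name `parse_tags` is hypothetical, and mapping a tag without a colon to an empty value is an assumption rather than documented behavior.

```python
def parse_tags(raw_tags):
    """Split each 'key:value' string on the first colon into a key/value record."""
    tags = []
    for raw in raw_tags:
        # partition() returns an empty value when no colon is present.
        key, _, value = raw.partition(":")
        tags.append({"key": key, "value": value})
    return tags
```

Splitting on the first colon only means values that themselves contain colons are kept intact.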

Or, a more realistic invocation using shell substitutions:
```
python benchmark.py \
--reruns 5 \
--table $BENCHMARK_TABLE \
--tag origin:$(hostname) \
--tag branch:$(git branch --show-current) \
--tag latestcommit:$(git log --pretty=format:'%H' -n 1)
```
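
Where shell substitution is not available, the same tag values could be gathered from Python. The sketch below is only an illustration: `git_output` and `environment_tags` are hypothetical helpers, and silently skipping a tag when `git` is missing or fails is a design assumption.

```python
import socket
import subprocess


def git_output(*args):
    """Return the stripped stdout of a git command, or None if git is unavailable or fails."""
    try:
        result = subprocess.run(
            ["git", *args], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def environment_tags():
    """Build 'key:value' strings mirroring the shell example above."""
    tags = {"origin": socket.gethostname()}
    branch = git_output("branch", "--show-current")
    commit = git_output("log", "--pretty=format:%H", "-n", "1")
    # Empty or missing values (e.g. detached HEAD, no repository) are left out.
    if branch:
        tags["branch"] = branch
    if commit:
        tags["latestcommit"] = commit
    return [f"{key}:{value}" for key, value in tags.items()]
```

Each returned string can then be passed as the value of a separate `--tag` flag.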

## Stream Results To A BigQuery Table

When streaming benchmarking results to a BigQuery table, the table schema is as follows:
```
[
{
"name": "groupname",
"type": "STRING"
},
{
"name": "name",
"type": "STRING"
},
{
"name": "tags",
"type": "RECORD",
"mode": "REPEATED",
"fields": [
{
"name": "key",
"type": "STRING"
},
{
"name": "value",
"type": "STRING"
}
]
},
{
"name": "SQL",
"type": "STRING"
},
{
"name": "runs",
"type": "RECORD",
"mode": "REPEATED",
"fields": [
{
"name": "errorstring",
"type": "STRING"
},
{
"name": "start_time",
"type": "TIMESTAMP"
},
{
"name": "query_end_time",
"type": "TIMESTAMP"
},
{
"name": "first_row_returned_time",
"type": "TIMESTAMP"
},
{
"name": "all_rows_returned_time",
"type": "TIMESTAMP"
},
{
"name": "total_rows",
"type": "INTEGER"
}
]
},
{
"name": "event_time",
"type": "TIMESTAMP"
}
]
```
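
For illustration, a row matching this schema could be assembled as a plain dictionary before streaming. Everything below is a sketch with made-up values: `build_row` and `iso` are hypothetical helpers, writing timestamps as ISO 8601 strings is an assumption, and using `None` for `errorstring` on a successful run is also an assumption.

```python
import json
from datetime import datetime, timedelta, timezone


def iso(ts):
    """Render a timezone-aware datetime as an ISO 8601 string in UTC."""
    return ts.astimezone(timezone.utc).isoformat()


def build_row(groupname, name, sql, tags, runs):
    """Assemble one result row using the field names from the schema above."""
    return {
        "groupname": groupname,
        "name": name,
        "tags": tags,
        "SQL": sql,
        "runs": runs,
        "event_time": iso(datetime.now(timezone.utc)),
    }


# Made-up timings for a single successful run.
start = datetime(2023, 10, 23, 12, 0, 0, tzinfo=timezone.utc)
run = {
    "errorstring": None,
    "start_time": iso(start),
    "query_end_time": iso(start + timedelta(seconds=2)),
    "first_row_returned_time": iso(start + timedelta(seconds=3)),
    "all_rows_returned_time": iso(start + timedelta(seconds=4)),
    "total_rows": 1,
}
row = build_row(
    groupname="example_group",
    name="example_query",
    sql="SELECT 1",
    tags=[{"key": "source", "value": "myhostname"}],
    runs=[run],
)
print(json.dumps(row, indent=2))
```

A JSON-serializable row like this could be handed to a streaming insert call such as the Python client's `insert_rows_json`; whether `benchmark.py` streams results that way is not stated here.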

The table schema is the same as that of the [Go benchmark](https://github.com/googleapis/google-cloud-go/tree/main/bigquery/benchmarks),
so results from both languages can be streamed to the same table.

## BigQuery Benchmarks In Other Languages
* Go: https://github.com/googleapis/google-cloud-go/tree/main/bigquery/benchmarks
* Java: https://github.com/googleapis/java-bigquery/tree/main/benchmark