
[Merged by Bors] - sync: enable rate limiting for servers #5151

Closed
wants to merge 33 commits

Conversation

dshulyak
Contributor

@dshulyak dshulyak commented Oct 13, 2023

closes: #4977
closes: #4603

this change introduces two configuration parameters for every server:

  • requests-per-interval pace, for example 10 req/s; this caps the maximum bandwidth that every server can use
  • queue size, set so that requests are served within the expected latency. Every request beyond that is dropped immediately so the client can retry with a different node. The timeout is currently set to 10 s, so the queue should be roughly 10 times larger than the rps

it doesn't provide a global bandwidth limit, but we do limit the number of peers, and an honest peer doesn't run many concurrent queries. What we really want to handle is peers with intentionally malicious behavior, but that's not a pressing issue.
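The two knobs can be sketched as a bounded queue plus a per-interval request budget. This is an illustrative stdlib-only model, not the PR's actual implementation in `p2p/server`; all names here are made up:

```go
package main

import "fmt"

// limiter models the two per-server knobs: a bounded queue and a
// requests-per-interval budget. Illustrative sketch only.
type limiter struct {
	queue    []string
	queueCap int
	tokens   int
}

// submit enqueues a request, or drops it immediately when the queue is
// full so the client can retry against a different node.
func (l *limiter) submit(req string) bool {
	if len(l.queue) >= l.queueCap {
		return false
	}
	l.queue = append(l.queue, req)
	return true
}

// refill runs once per interval and restores the request budget.
func (l *limiter) refill(requests int) {
	l.tokens = requests
}

// serve handles one queued request if budget remains in this interval.
func (l *limiter) serve() (string, bool) {
	if l.tokens == 0 || len(l.queue) == 0 {
		return "", false
	}
	l.tokens--
	req := l.queue[0]
	l.queue = l.queue[1:]
	return req, true
}

func main() {
	l := &limiter{queueCap: 3}
	accepted, dropped := 0, 0
	for i := 0; i < 5; i++ {
		if l.submit(fmt.Sprintf("req-%d", i)) {
			accepted++
		} else {
			dropped++
		}
	}
	fmt.Println("accepted:", accepted, "dropped:", dropped) // accepted: 3 dropped: 2

	l.refill(2) // budget of 2 requests for this interval
	served := 0
	for {
		if _, ok := l.serve(); !ok {
			break
		}
		served++
	}
	fmt.Println("served this interval:", served) // served this interval: 2
}
```

The sizing rule above falls out of this model: with a 10 s client timeout and 100 req/s of budget, anything queued deeper than ~1000 entries could not be served before the client gives up, so a larger queue only wastes work.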

example configuration:

```json
"fetch": {
        "servers": {
            "ax/1": {"queue": 10, "requests": 1, "interval": "1s"},
            "ld/1": {"queue": 1000, "requests": 100, "interval": "1s"},
            "hs/1": {"queue": 2000, "requests": 200, "interval": "1s"},
            "mh/1": {"queue": 1000, "requests": 100, "interval": "1s"},
            "ml/1": {"queue": 100, "requests": 10, "interval": "1s"},
            "lp/2": {"queue": 10000, "requests": 1000, "interval": "1s"}
        }
    }
```

go-spacemesh/fetch/fetch.go, lines 130 to 144 in 3cf0214:

```go
// serves 1 MB of data
atxProtocol: {Queue: 10, Requests: 1, Interval: time.Second},
// serves 1 KB of data
lyrDataProtocol: {Queue: 1000, Requests: 100, Interval: time.Second},
// serves atxs, ballots, active sets
// atx - 1 KB
// ballots > 300 bytes
// often queried after receiving gossip message
hashProtocol: {Queue: 2000, Requests: 200, Interval: time.Second},
// serves at most 100 hashes - 3KB
meshHashProtocol: {Queue: 1000, Requests: 100, Interval: time.Second},
// serves all malicious ids (id - 32 byte) - 10KB
malProtocol: {Queue: 100, Requests: 10, Interval: time.Second},
// 64 bytes
OpnProtocol: {Queue: 10000, Requests: 1000, Interval: time.Second},
```
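Multiplying each default rps by the response size estimated in the comments gives the implied per-protocol bandwidth ceiling. A back-of-the-envelope sketch (the sizes are the code comments' estimates, not measurements):

```go
package main

import "fmt"

func main() {
	// protocol -> rps and approximate response size in bytes,
	// taken from the comment estimates above.
	type proto struct {
		name     string
		rps      float64
		respSize float64
	}
	protos := []proto{
		{"ax/1 (atx blobs)", 1, 1 << 20},     // ~1 MB
		{"ld/1 (layer data)", 100, 1 << 10},  // ~1 KB
		{"hs/1 (hashes)", 200, 1 << 10},      // ~1 KB
		{"mh/1 (mesh hashes)", 100, 3 << 10}, // ~3 KB
		{"ml/1 (malfeasance)", 10, 10 << 10}, // ~10 KB
		{"lp/2 (opinions)", 1000, 64},        // 64 bytes
	}
	for _, p := range protos {
		// ceiling = rps x response size
		fmt.Printf("%-22s ~ %.1f KB/s\n", p.name, p.rps*p.respSize/1024)
	}
}
```

Every protocol's ceiling lands in the hundreds of KB/s or below, which is why a per-server cap (rather than a global one) is considered enough here.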

metrics are per server:

```go
targetQueue = metrics.NewGauge(
	"target_queue",
	namespace,
	"target size of the queue",
	[]string{protoLabel},
)
queue = metrics.NewGauge(
	"queue",
	namespace,
	"actual size of the queue",
	[]string{protoLabel},
)
targetRps = metrics.NewGauge(
	"rps",
	namespace,
	"target requests per second",
	[]string{protoLabel},
)
requests = metrics.NewCounter(
	"requests",
	namespace,
	"requests counter",
	[]string{protoLabel, "state"},
)
clientLatency = metrics.NewHistogramWithBuckets(
	"client_latency_seconds",
	namespace,
	"latency since initiating a request",
	[]string{protoLabel, "result"},
	prometheus.ExponentialBuckets(0.01, 2, 10),
)
serverLatency = metrics.NewHistogramWithBuckets(
	"server_latency_seconds",
	namespace,
	"latency since accepting new stream",
	[]string{protoLabel},
	prometheus.ExponentialBuckets(0.01, 2, 10),
)
```

These metrics have to be enabled for all servers with:

```json
"fetch": {
        "servers-metrics": true
    }
```

@codecov

codecov bot commented Oct 13, 2023

Codecov Report

Merging #5151 (e2b4fff) into develop (b168b41) will increase coverage by 0.0%.
The diff coverage is 89.3%.

```diff
@@           Coverage Diff            @@
##           develop   #5151    +/-   ##
========================================
  Coverage     77.6%   77.7%
========================================
  Files          261     262     +1
  Lines        30995   31100   +105
========================================
+ Hits         24083   24181    +98
- Misses        5406    5411     +5
- Partials      1506    1508     +2
```

| Files | Coverage Δ |
| --- | --- |
| fetch/interface.go | 100.0% <ø> (ø) |
| p2p/server/metrics.go | 100.0% <100.0%> (ø) |
| fetch/fetch.go | 83.0% <95.0%> (+0.6%) ⬆️ |
| p2p/server/server.go | 82.2% <85.3%> (+4.1%) ⬆️ |

... and 1 file with indirect coverage changes

@dshulyak dshulyak marked this pull request as ready for review October 17, 2023 07:04
@dshulyak
Contributor Author

bors try

bors bot added a commit that referenced this pull request Oct 17, 2023
@bors

bors bot commented Oct 17, 2023

try

Build failed:

@dshulyak dshulyak force-pushed the sync/rate-limit-server branch from 8e2a0aa to e668c58 Compare October 21, 2023 07:20
@dshulyak
Contributor Author

bors try

bors bot added a commit that referenced this pull request Oct 21, 2023
@bors

bors bot commented Oct 21, 2023

try

Build succeeded!

The publicly hosted instance of bors-ng is deprecated and will go away soon.

If you want to self-host your own instance, instructions are here.
For more help, visit the forum.

If you want to switch to GitHub's built-in merge queue, visit their help page.

@dshulyak
Contributor Author

bors try

bors bot added a commit that referenced this pull request Oct 21, 2023
@bors

bors bot commented Oct 21, 2023

try

Build failed:

@dshulyak
Contributor Author

bors try

bors bot added a commit that referenced this pull request Oct 21, 2023
@bors

bors bot commented Oct 21, 2023

try

Build succeeded!


@dshulyak
Contributor Author

bors try

bors bot added a commit that referenced this pull request Oct 21, 2023
@bors

bors bot commented Oct 21, 2023

try

Build failed:

@dshulyak
Contributor Author

bors merge

bors bot pushed a commit that referenced this pull request Oct 22, 2023
@bors

bors bot commented Oct 22, 2023

Build failed:

@dshulyak
Contributor Author

bors merge

bors bot pushed a commit that referenced this pull request Oct 22, 2023
@bors

bors bot commented Oct 22, 2023

Build failed:

@dshulyak
Contributor Author

bors cancel

@dshulyak
Contributor Author

bors merge

bors bot pushed a commit that referenced this pull request Oct 22, 2023
@bors

bors bot commented Oct 22, 2023

Pull request successfully merged into develop.

Build succeeded!


@bors bors bot changed the title sync: enable rate limiting for servers [Merged by Bors] - sync: enable rate limiting for servers Oct 22, 2023
@bors bors bot closed this Oct 22, 2023
Successfully merging this pull request may close these issues:

  • optimize atx sync codepath
  • sync: limit resources spent on serving data to peers