
[Merged by Bors] - sync: enable rate limiting for servers #5151

Closed
wants to merge 33 commits

Conversation

dshulyak
Contributor

@dshulyak dshulyak commented Oct 13, 2023

closes: #4977
closes: #4603

this change introduces two configuration parameters for every server:

  • requests-per-interval pace, for example 10 req/s; this caps the maximum bandwidth that every server can use
  • queue size, set so that requests are served within the expected latency. Every request beyond that is dropped immediately so the client can retry with a different node. The timeout is currently set to 10 s, so the queue should be roughly 10 times larger than the rps

it doesn't provide a global bandwidth limit, but we do limit the number of peers, and an honest peer doesn't run many concurrent queries. What we really want to handle is peers with intentionally malicious behavior, but that's not a pressing issue.
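The two knobs can be sketched as a bounded queue plus a per-interval request budget. This is an illustrative stdlib-only model, not the PR's actual implementation in `p2p/server`; all names here are made up:

```go
package main

import "fmt"

// limiter models the two per-server knobs: a bounded queue and a
// requests-per-interval budget. Illustrative sketch only.
type limiter struct {
	queue    []string
	queueCap int
	tokens   int
}

// submit enqueues a request, or drops it immediately when the queue is
// full so the client can retry against a different node.
func (l *limiter) submit(req string) bool {
	if len(l.queue) >= l.queueCap {
		return false
	}
	l.queue = append(l.queue, req)
	return true
}

// refill runs once per interval and restores the request budget.
func (l *limiter) refill(requests int) {
	l.tokens = requests
}

// serve handles one queued request if budget remains in this interval.
func (l *limiter) serve() (string, bool) {
	if l.tokens == 0 || len(l.queue) == 0 {
		return "", false
	}
	l.tokens--
	req := l.queue[0]
	l.queue = l.queue[1:]
	return req, true
}

func main() {
	l := &limiter{queueCap: 3}
	accepted, dropped := 0, 0
	for i := 0; i < 5; i++ {
		if l.submit(fmt.Sprintf("req-%d", i)) {
			accepted++
		} else {
			dropped++
		}
	}
	fmt.Println("accepted:", accepted, "dropped:", dropped) // accepted: 3 dropped: 2

	l.refill(2) // budget of 2 requests for this interval
	served := 0
	for {
		if _, ok := l.serve(); !ok {
			break
		}
		served++
	}
	fmt.Println("served this interval:", served) // served this interval: 2
}
```

The sizing rule above falls out of this model: with a 10 s client timeout and 100 req/s of budget, anything queued deeper than ~1000 entries could not be served before the client gives up, so a larger queue only wastes work.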

example configuration:

```json
"fetch": {
        "servers": {
            "ax/1": {"queue": 10, "requests": 1, "interval": "1s"},
            "ld/1": {"queue": 1000, "requests": 100, "interval": "1s"},
            "hs/1": {"queue": 2000, "requests": 200, "interval": "1s"},
            "mh/1": {"queue": 1000, "requests": 100, "interval": "1s"},
            "ml/1": {"queue": 100, "requests": 10, "interval": "1s"},
            "lp/2": {"queue": 10000, "requests": 1000, "interval": "1s"}
        }
    }
```

go-spacemesh/fetch/fetch.go, lines 130 to 144 in 3cf0214:

```go
// serves 1 MB of data
atxProtocol: {Queue: 10, Requests: 1, Interval: time.Second},
// serves 1 KB of data
lyrDataProtocol: {Queue: 1000, Requests: 100, Interval: time.Second},
// serves atxs, ballots, active sets
// atx - 1 KB
// ballots > 300 bytes
// often queried after receiving gossip message
hashProtocol: {Queue: 2000, Requests: 200, Interval: time.Second},
// serves at most 100 hashes - 3KB
meshHashProtocol: {Queue: 1000, Requests: 100, Interval: time.Second},
// serves all malicious ids (id - 32 byte) - 10KB
malProtocol: {Queue: 100, Requests: 10, Interval: time.Second},
// 64 bytes
OpnProtocol: {Queue: 10000, Requests: 1000, Interval: time.Second},
```
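Multiplying each default rps by the response size estimated in the comments gives the implied per-protocol bandwidth ceiling. A back-of-the-envelope sketch (the sizes are the code comments' estimates, not measurements):

```go
package main

import "fmt"

func main() {
	// protocol -> rps and approximate response size in bytes,
	// taken from the comment estimates above.
	type proto struct {
		name     string
		rps      float64
		respSize float64
	}
	protos := []proto{
		{"ax/1 (atx blobs)", 1, 1 << 20},     // ~1 MB
		{"ld/1 (layer data)", 100, 1 << 10},  // ~1 KB
		{"hs/1 (hashes)", 200, 1 << 10},      // ~1 KB
		{"mh/1 (mesh hashes)", 100, 3 << 10}, // ~3 KB
		{"ml/1 (malfeasance)", 10, 10 << 10}, // ~10 KB
		{"lp/2 (opinions)", 1000, 64},        // 64 bytes
	}
	for _, p := range protos {
		// ceiling = rps x response size
		fmt.Printf("%-22s ~ %.1f KB/s\n", p.name, p.rps*p.respSize/1024)
	}
}
```

Every protocol's ceiling lands in the hundreds of KB/s or below, which is why a per-server cap (rather than a global one) is considered enough here.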

metrics are per server:

```go
targetQueue = metrics.NewGauge(
	"target_queue",
	namespace,
	"target size of the queue",
	[]string{protoLabel},
)
queue = metrics.NewGauge(
	"queue",
	namespace,
	"actual size of the queue",
	[]string{protoLabel},
)
targetRps = metrics.NewGauge(
	"rps",
	namespace,
	"target requests per second",
	[]string{protoLabel},
)
requests = metrics.NewCounter(
	"requests",
	namespace,
	"requests counter",
	[]string{protoLabel, "state"},
)
clientLatency = metrics.NewHistogramWithBuckets(
	"client_latency_seconds",
	namespace,
	"latency since initiating a request",
	[]string{protoLabel, "result"},
	prometheus.ExponentialBuckets(0.01, 2, 10),
)
serverLatency = metrics.NewHistogramWithBuckets(
	"server_latency_seconds",
	namespace,
	"latency since accepting new stream",
	[]string{protoLabel},
	prometheus.ExponentialBuckets(0.01, 2, 10),
)
```

These metrics have to be enabled for all servers with:

```json
"fetch": {
        "servers-metrics": true
    }
```

@codecov

codecov bot commented Oct 13, 2023

Codecov Report

Merging #5151 (e2b4fff) into develop (b168b41) will increase coverage by 0.0%.
The diff coverage is 89.3%.

```diff
@@           Coverage Diff            @@
##           develop   #5151    +/-   ##
========================================
  Coverage     77.6%   77.7%
========================================
  Files          261     262     +1
  Lines        30995   31100   +105
========================================
+ Hits         24083   24181    +98
- Misses        5406    5411     +5
- Partials      1506    1508     +2
```

| Files | Coverage Δ |
| --- | --- |
| fetch/interface.go | 100.0% <ø> (ø) |
| p2p/server/metrics.go | 100.0% <100.0%> (ø) |
| fetch/fetch.go | 83.0% <95.0%> (+0.6%) ⬆️ |
| p2p/server/server.go | 82.2% <85.3%> (+4.1%) ⬆️ |

... and 1 file with indirect coverage changes

@dshulyak dshulyak marked this pull request as ready for review October 17, 2023 07:04
@dshulyak
Contributor Author

bors try

bors bot added a commit that referenced this pull request Oct 17, 2023
@bors

bors bot commented Oct 17, 2023

try

Build failed:

@dshulyak dshulyak force-pushed the sync/rate-limit-server branch from 8e2a0aa to e668c58 Compare October 21, 2023 07:20
@dshulyak
Contributor Author

bors try

bors bot added a commit that referenced this pull request Oct 21, 2023
@bors

bors bot commented Oct 21, 2023

try

Build succeeded!

The publicly hosted instance of bors-ng is deprecated and will go away soon.

If you want to self-host your own instance, instructions are here.
For more help, visit the forum.

If you want to switch to GitHub's built-in merge queue, visit their help page.

@dshulyak
Contributor Author

bors try

bors bot added a commit that referenced this pull request Oct 21, 2023
@bors

bors bot commented Oct 21, 2023

try

Build failed:

@dshulyak
Contributor Author

bors try

bors bot added a commit that referenced this pull request Oct 21, 2023
@bors

bors bot commented Oct 21, 2023

try

Build succeeded!


@dshulyak
Contributor Author

bors try

bors bot added a commit that referenced this pull request Oct 21, 2023
@bors

bors bot commented Oct 21, 2023

try

Build failed:

@dshulyak
Contributor Author

bors merge

bors bot pushed a commit that referenced this pull request Oct 22, 2023
@bors

bors bot commented Oct 22, 2023

Build failed:

@dshulyak
Contributor Author

bors merge

bors bot pushed a commit that referenced this pull request Oct 22, 2023
@bors

bors bot commented Oct 22, 2023

Build failed:

@dshulyak
Contributor Author

bors cancel

@dshulyak
Contributor Author

bors merge

bors bot pushed a commit that referenced this pull request Oct 22, 2023
@bors

bors bot commented Oct 22, 2023

Pull request successfully merged into develop.

Build succeeded!


@bors bors bot changed the title sync: enable rate limiting for servers [Merged by Bors] - sync: enable rate limiting for servers Oct 22, 2023
@bors bors bot closed this Oct 22, 2023
Successfully merging this pull request may close these issues:

  • optimize atx sync codepath
  • sync: limit resources spent on serving data to peers