relay_v2 performance anomaly on small machines #2619

winksaville · 2022-04-19T18:26:26Z

I’ve had trouble getting the hole_punching tutorial working. Specifically, when I run libp2p-lookup direct --address /ip4/$RELAY_SERVER_IP/tcp/4001 in step 3 of the “Setting up the relay server” section, it sometimes fails.

The first time I run relay_v2 and then use libp2p-lookup, it works. But most of the time, if I stop relay_v2, restart it, and then try libp2p-lookup again, I get a "Lookup failed: Timeout" error.

Here are the steps I use to reproduce the problem.

I built rust-libp2p master in an Ubuntu 20.04 VirtualBox VM:

wink@ubuntu-20-04-3 22-04-19T15:19:49.177Z:~/prgs/rust/myrepos/rust-libp2p (master)
$ git log -1 --pretty=oneline
22fbce34d5c2c3c7fccaf7c7ce9f80a92406a641 (HEAD -> master, origin/master, origin/HEAD) *: Fix clippy warnings (#2615)

wink@ubuntu-20-04-3 22-04-19T15:21:02.106Z:~/prgs/rust/myrepos/rust-libp2p (master)
$ rustc --version
rustc 1.60.0 (7737e0b5c 2022-04-04)

wink@ubuntu-20-04-3 22-04-19T15:22:15.514Z:~/prgs/rust/myrepos/rust-libp2p (master)
$ cargo build --example relay_v2 -p libp2p-relay
    Finished dev [unoptimized + debuginfo] target(s) in 0.21s

I copied it to do-sfo3-01, the smallest and cheapest single-CPU DigitalOcean VM:

wink@ubuntu-20-04-3 22-04-19T15:22:32.752Z:~/prgs/rust/myrepos/rust-libp2p (master)
$ scp target/debug/examples/relay_v2 do-sfo3-01:~/bin/relay_v2.master-22fbce34.stable-1.60.0-7737e0b5c.debug
relay_v2

I rebooted do-sfo3-01:

wink@do-sfo3-01 22-04-19T16:17:30.029Z:~
$ sudo reboot now
[sudo] password for wink: 
Connection to 164.92.118.108 closed by remote host.
Connection to 164.92.118.108 closed.
wink@3900x 22-04-19T16:17:40.926Z:~

Then I logged back in:

wink@3900x 22-04-19T16:17:40.926Z:~
$ ssh do-sfo3-01 
Welcome to Ubuntu 20.04.4 LTS (GNU/Linux 5.4.0-107-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Tue Apr 19 16:18:02 UTC 2022

  System load:           0.92
  Usage of /:            17.3% of 24.06GB
  Memory usage:          21%
  Swap usage:            0%
  Processes:             116
  Users logged in:       0
  IPv4 address for eth0: 164.92.118.108
  IPv4 address for eth0: 10.48.0.5
  IPv6 address for eth0: 2604:a880:4:1d0::60b:0
  IPv4 address for eth1: 10.124.0.2

0 updates can be applied immediately.

Last login: Tue Apr 19 15:52:27 2022 from 23.119.164.150
wink@do-sfo3-01 22-04-19T16:18:04.313Z:~

Run relay_v2:

wink@do-sfo3-01 22-04-19T16:18:04.313Z:~
$ relay_v2.master-22fbce34.stable-1.60.0-7737e0b5c.debug --port 4001 --secret-key-seed 0
opt: Opt { use_ipv6: None, secret_key_seed: 0, port: 4001 }
Local peer id: PeerId("12D3KooWDpJ7As7BWAwRMfu1VU2WCqNjvq387JEYKDBj4kx6nXTN")
Listening on "/ip4/127.0.0.1/tcp/4001"
Listening on "/ip4/164.92.118.108/tcp/4001"
Listening on "/ip4/10.48.0.5/tcp/4001"
Listening on "/ip4/10.124.0.2/tcp/4001"

Then, from my desktop (3900x), I used libp2p-lookup to reach relay_v2, and all is well:

wink@3900x 22-04-19T16:18:49.675Z:~
$ libp2p-lookup direct --address /ip4/164.92.118.108/tcp/4001
Lookup for peer with id PeerId("12D3KooWDpJ7As7BWAwRMfu1VU2WCqNjvq387JEYKDBj4kx6nXTN") succeeded.

Protocol version: "/TODO/0.0.1"
Agent version: "rust-libp2p/0.36.0"
Observed address: "/ip4/23.119.164.150/tcp/53584"
Listen addresses:
	- "/ip4/127.0.0.1/tcp/4001"
	- "/ip4/164.92.118.108/tcp/4001"
	- "/ip4/10.48.0.5/tcp/4001"
	- "/ip4/10.124.0.2/tcp/4001"
Protocols:
	- "/libp2p/circuit/relay/0.2.0/hop"
	- "/ipfs/ping/1.0.0"
	- "/ipfs/id/1.0.0"
	- "/ipfs/id/push/1.0.0"

wink@3900x 22-04-19T16:18:54.148Z:~

I then used Wireshark on the desktop to capture the traffic on the wire. Everything looks fine!

I then stop relay_v2 using Ctrl-C and run it again:

^C
wink@do-sfo3-01 22-04-19T16:19:00.012Z:~
$ relay_v2.master-22fbce34.stable-1.60.0-7737e0b5c.debug --port 4001 --secret-key-seed 0
opt: Opt { use_ipv6: None, secret_key_seed: 0, port: 4001 }
Local peer id: PeerId("12D3KooWDpJ7As7BWAwRMfu1VU2WCqNjvq387JEYKDBj4kx6nXTN")
Listening on "/ip4/127.0.0.1/tcp/4001"
Listening on "/ip4/164.92.118.108/tcp/4001"
Listening on "/ip4/10.48.0.5/tcp/4001"
Listening on "/ip4/10.124.0.2/tcp/4001"

I then run libp2p-lookup a second time from my desktop, but this time it fails with "Lookup failed: Timeout":

wink@3900x 22-04-19T16:18:54.148Z:~
$ libp2p-lookup direct --address /ip4/164.92.118.108/tcp/4001
[2022-04-19T16:19:35Z ERROR libp2p_lookup] Lookup failed: Timeout.
wink@3900x 22-04-19T16:19:35.871Z:~
$

Here is the Wireshark screenshot:

And here is the .zip pcapng file:
wireshark-timeout-libp2p-relay_v2.all.pcapng.zip

My interpretation of packet 32 is that do-sfo3-01 received the packet, but relay_v2 has a performance issue that kept it from "accepting" it. Furthermore, the release build with excessive trace! logging can fail the same way, although it typically works.

Anyway, from the debugging I've done, it seems to me that too much polling is being done on the main thread. My suggested fix is to run the epoll code on a high-priority thread by itself. Of course, this is the opinion of a retired programmer with no-expertise with libp2p :)

mxinden · 2022-04-22T21:00:29Z · Collaborator

Very much appreciate the detailed bug report @winksaville. Sorry for the late reply.

Can you post the logs when you run the relay server with RUST_LOG=debug?

> Of course, this is the opinion of a retired programmer with no-expertise with libp2p :)

:D The help is very welcome!

winksaville · 2022-04-22T21:34:17Z · Author

Below is the RUST_LOG=debug log, captured as follows:

wink@do-sfo3-01 22-04-22T21:19:29.897Z:~
$ sudo reboot now
[sudo] password for wink: 
Connection to 164.92.118.108 closed by remote host.
Connection to 164.92.118.108 closed.
wink@3900x 22-04-22T21:19:42.785Z:~
$ ssh do-sfo3-01 
Welcome to Ubuntu 20.04.4 LTS (GNU/Linux 5.4.0-109-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

 System information disabled due to load higher than 1.0

0 updates can be applied immediately.


Last login: Fri Apr 22 21:03:37 2022 from 23.119.164.150
wink@do-sfo3-01 22-04-22T21:19:59.918Z:~
$ RUST_LOG=debug relay_v2.master-22fbce.stable-1.60.0-7737e0.debug --port 4001 --secret-key-seed 0 &> relay_v2.master-22fbce.stable-1.60.0-7737e0.debug.1.rust_log-debug.log
^C
wink@do-sfo3-01 22-04-22T21:26:12.237Z:~
$ zip relay_v2.master-22fbce.stable-1.60.0-7737e0.debug.1.rust_log-debug.log.zip relay_v2.master-22fbce.stable-1.60.0-7737e0.debug.1.rust_log-debug.log
  adding: relay_v2.master-22fbce.stable-1.60.0-7737e0.debug.1.rust_log-debug.log (deflated 100%)
wink@do-sfo3-01 22-04-22T21:27:36.179Z:~
$ scp relay_v2.master-22fbce.stable-1.60.0-7737e0.debug.1.rust_log-debug.log.zip 3900xz:~/
relay_v2.master-22fbce.stable-1.60.0-7737e0.debug.1.rust_log-debug.log.zip   100%  624KB   7.8MB/s   00:00

Here is the libp2p-lookup run from 3900x:

wink@3900x 22-04-22T21:04:54.544Z:~/prgs/downloads/virtualbox
$ libp2p-lookup direct --address /ip4/164.92.118.108/tcp/4001
[2022-04-22T21:21:34Z ERROR libp2p_lookup] Lookup failed: Timeout.
wink@3900x 22-04-22T21:21:34.473Z:~/prgs/downloads/virtualbox

relay_v2.master-22fbce.stable-1.60.0-7737e0.debug.1.rust_log-debug.log.zip

winksaville · 2022-04-22T21:41:09Z · Author

For good measure, here is a trace log, again taken after rebooting:

wink@do-sfo3-01 22-04-22T21:34:33.313Z:~
$ sudo reboot now
Connection to 164.92.118.108 closed by remote host.
Connection to 164.92.118.108 closed.
wink@3900x 22-04-22T21:34:54.992Z:~
$ ssh do-sfo3-01 
Welcome to Ubuntu 20.04.4 LTS (GNU/Linux 5.4.0-109-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

 System information disabled due to load higher than 1.0

0 updates can be applied immediately.


Last login: Fri Apr 22 21:19:59 2022 from 23.119.164.150
wink@do-sfo3-01 22-04-22T21:35:00.221Z:~
$ RUST_LOG=trace relay_v2.master-22fbce.stable-1.60.0-7737e0.debug --port 4001 --secret-key-seed 0 &> relay_v2.master-22fbce.stable-1.60.0-7737e0.debug.1.rust_log-trace.log
^C
wink@do-sfo3-01 22-04-22T21:35:52.570Z:~
$ zip relay_v2.master-22fbce.stable-1.60.0-7737e0.debug.1.rust_log-trace.log.zip relay_v2.master-22fbce.stable-1.60.0-7737e0.debug.1.rust_log-trace.log
  adding: relay_v2.master-22fbce.stable-1.60.0-7737e0.debug.1.rust_log-trace.log (deflated 99%)
wink@do-sfo3-01 22-04-22T21:37:10.826Z:~
$ scp relay_v2.master-22fbce.stable-1.60.0-7737e0.debug.1.rust_log-trace.log.zip 3900xz:~/
relay_v2.master-22fbce.stable-1.60.0-7737e0.debug.1.rust_log-trace.log.zip

The lookup:

wink@3900x 22-04-22T21:21:34.473Z:~/prgs/downloads/virtualbox
$ libp2p-lookup direct --address /ip4/164.92.118.108/tcp/4001
[2022-04-22T21:35:49Z ERROR libp2p_lookup] Lookup failed: Timeout.
wink@3900x 22-04-22T21:35:49.859Z:~/prgs/downloads/virtualbox

relay_v2.master-22fbce.stable-1.60.0-7737e0.debug.1.rust_log-trace.log.zip

winksaville · 2022-04-22T22:16:56Z · Author

One other tidbit, which I'm sure you know, but just in case you don't: the timeout occurs 20 seconds after the lookup starts. I added the UTC time to my prompt, so if you take the prompt time after the timeout and subtract 20 seconds, you get the approximate time when the initial request from my desktop (3900x) would have been received by the device running the relay (do-sfo3-01). So, looking at the two timeouts:

wink@3900x 22-04-22T21:04:54.544Z:~/prgs/downloads/virtualbox
$ libp2p-lookup direct --address /ip4/164.92.118.108/tcp/4001
[2022-04-22T21:21:34Z ERROR libp2p_lookup] Lookup failed: Timeout.
wink@3900x 22-04-22T21:21:34.473Z:~/prgs/downloads/virtualbox
$ libp2p-lookup direct --address /ip4/164.92.118.108/tcp/4001
[2022-04-22T21:35:49Z ERROR libp2p_lookup] Lookup failed: Timeout.
wink@3900x 22-04-22T21:35:49.859Z:~/prgs/downloads/virtualbox

Taking 21:21:34 - 20 s = 21:21:14, that should be about when the RUST_LOG=debug run received the connection. Open relay_v2.master-22fbce.stable-1.60.0-7737e0.debug.1.rust_log-debug.log in an editor, or use a grep tool, and search for 21:21:14. Bingo: there is a 6-second gap between lines 151 and 152. This is where relay_v2 was waiting (in epoll) for a connection:

wink@3900x 22-04-22T22:02:32.901Z:~
$ rg -C 5 -m 1 -n '21:21:14' relay_v2.master-22fbce.stable-1.60.0-7737e0.debug.1.rust_log-debug.log
147-[2022-04-22T21:21:08Z DEBUG netlink_proto::codecs] NetlinkCodec: decoding next message
148-[2022-04-22T21:21:08Z DEBUG netlink_proto::connection] forwarding unsolicited messages to the connection handle
149-[2022-04-22T21:21:08Z DEBUG netlink_proto::connection] forwaring responses to previous requests to the connection handle
150-[2022-04-22T21:21:08Z DEBUG netlink_proto::connection] handling requests
151-[2022-04-22T21:21:08Z DEBUG netlink_proto::connection] sending messages
152:[2022-04-22T21:21:14Z DEBUG netlink_proto::connection] reading incoming messages
153:[2022-04-22T21:21:14Z DEBUG netlink_proto::codecs] NetlinkCodec: decoding next message
154:[2022-04-22T21:21:14Z DEBUG netlink_proto::connection] forwarding unsolicited messages to the connection handle
155:[2022-04-22T21:21:14Z DEBUG netlink_proto::connection] forwaring responses to previous requests to the connection handle
156:[2022-04-22T21:21:14Z DEBUG netlink_proto::connection] handling requests
157:[2022-04-22T21:21:14Z DEBUG netlink_proto::connection] sending messages
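The manual correlation above (failure time minus the 20-second lookup timeout, then look for a silent stretch in the log) can be scripted. Below is a minimal, hypothetical std-only Rust sketch; the helper names are made up for illustration, and it assumes each log line starts with a [YYYY-MM-DDTHH:MM:SS timestamp prefix like the env_logger lines quoted here. It ignores date changes across midnight when comparing lines.

```rust
/// Parse "HH:MM:SS" into seconds since midnight (hypothetical helper).
fn parse_hms(hms: &str) -> Option<u32> {
    let mut parts = hms.split(':').map(|p| p.parse::<u32>().ok());
    let (h, m, s) = (parts.next()??, parts.next()??, parts.next()??);
    if h > 23 || m > 59 || s > 59 {
        return None;
    }
    Some(h * 3600 + m * 60 + s)
}

/// Seconds since midnight from a line like "[2022-04-22T21:21:08Z DEBUG ...".
/// Assumes the prefix layout shown in this thread; other lines yield None.
fn line_secs(line: &str) -> Option<u32> {
    // Byte 11 is the 'T' between date and time; the time is bytes 12..20.
    if !line.starts_with('[') || line.as_bytes().get(11) != Some(&b'T') {
        return None;
    }
    parse_hms(line.get(12..20)?)
}

/// "HH:MM:SS" minus `secs`, wrapping around midnight.
fn minus_secs(hms: &str, secs: u32) -> Option<String> {
    const DAY: u32 = 24 * 3600;
    let t = (parse_hms(hms)? + DAY - secs % DAY) % DAY;
    Some(format!("{:02}:{:02}:{:02}", t / 3600, t % 3600 / 60, t % 60))
}

/// Largest jump between consecutive timestamped lines, returned as
/// (1-based line number of the line after the gap, gap in seconds).
fn largest_gap(log: &str) -> Option<(usize, u32)> {
    let mut prev: Option<u32> = None;
    let mut best: Option<(usize, u32)> = None;
    for (i, line) in log.lines().enumerate() {
        if let Some(t) = line_secs(line) {
            if let Some(p) = prev {
                let gap = t.saturating_sub(p);
                if best.map_or(true, |(_, g)| gap > g) {
                    best = Some((i + 1, gap));
                }
            }
            prev = Some(t);
        }
    }
    best
}

fn main() {
    // The lookup failed at 21:21:34, 20 s after it started.
    println!("request expected around {}", minus_secs("21:21:34", 20).unwrap());

    // Three lines around lines 151-152 of the debug log above.
    let excerpt = "\
[2022-04-22T21:21:08Z DEBUG netlink_proto::connection] handling requests
[2022-04-22T21:21:08Z DEBUG netlink_proto::connection] sending messages
[2022-04-22T21:21:14Z DEBUG netlink_proto::connection] reading incoming messages";
    if let Some((line, gap)) = largest_gap(excerpt) {
        println!("largest gap: {} s, ending at excerpt line {}", gap, line);
    }
}
```

Run against the excerpt, this should report a 6-second gap ending at its third line, matching the jump from line 151 to 152 found with rg.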

The RUST_LOG=trace run has more detail, so here it is as well. Again there is a 6-second gap (oddly consistent) between lines 472 and 473:

wink@3900x 22-04-22T22:04:50.638Z:~
$ rg -C 5 -m 1 -n '21:35:29' relay_v2.master-22fbce.stable-1.60.0-7737e0.debug.1.rust_log-trace.log
468-[2022-04-22T21:35:23Z DEBUG netlink_proto::connection] sending messages
469-[2022-04-22T21:35:23Z TRACE netlink_proto::connection] poll_send_messages called
470-[2022-04-22T21:35:23Z TRACE netlink_proto::connection] poll_send_messages done
471-[2022-04-22T21:35:23Z TRACE netlink_proto::connection] poll_flush called
472-[2022-04-22T21:35:23Z TRACE netlink_proto::connection] done polling Connection
473:[2022-04-22T21:35:29Z TRACE polling::epoll] new events: epoll_fd=4, res=1
474:[2022-04-22T21:35:29Z TRACE polling::epoll] modify: epoll_fd=4, fd=5, ev=Event { key: 18446744073709551615, readable: true, writable: false }
475:[2022-04-22T21:35:29Z TRACE async_io::reactor] react: 1 ready wakers
476:[2022-04-22T21:35:29Z TRACE async_io::driver] main_loop: waiting on I/O
477:[2022-04-22T21:35:29Z TRACE async_io::reactor] process_timers: 0 ready wakers
478:[2022-04-22T21:35:29Z TRACE polling] Poller::wait(_, None)

winksaville · 2022-04-23T03:23:54Z · Author

One other observation: looking at your libp2p-perf benchmark, I think something is wrong when go-libp2p is twice as fast as rust-libp2p. Has the reason for this been investigated?

winksaville · 2022-04-25T15:09:09Z · Author

@mxinden , Ping to bump this up in your inbox :)

mxinden · 2022-04-27T12:52:59Z · Collaborator

Sorry for the late reply here. I haven't found time to debug this yet.

> One other observation, looking at your libp2p-perf benchmark, I think there is something wrong when go-libp2p is twice as fast as rustlibp2p. Has the reason for this been investigated?

Note that the numbers on the libp2p-perf page are based on a run on localhost only. I would question whether they have much value beyond comparing concrete changes. I encourage folks to run libp2p-perf between two machines with reasonable network delays. This is not to say that rust-libp2p is slower or faster than go-libp2p.

Cross referencing #2601 here which might be related.

My first suspicion is that we are missing a wake-up somewhere. In case you do want to investigate further, would you mind testing with https://github.com/mxinden/rust-libp2p-server/ as a relay server?

winksaville · 2022-04-28T05:20:19Z · Author

I understand this isn't a high priority. I'm willing to do experiments and investigations, so if you can spend a small amount of time reviewing results and suggesting further ones, I think we can make progress.

It does seem we're missing wake-ups. In one experiment I saw a wake-up happen shortly after the 20-second timeout, but since the initiator had already timed out, it was too late and the connection couldn't be established.
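To make the "missing wake-up" suspicion concrete: a future that returns Poll::Pending must arrange for the waker from its Context to be called once it can make progress. If that wake is lost, the executor never polls the task again and it stalls until something unrelated wakes it. The toy below is a hypothetical std-only illustration of that bug class, not async-io's actual code; Event stands in for an epoll readiness notification, and a counting waker shows whether the task would have been re-polled.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Counts how many times something asked for the task to be re-polled.
struct CountingWaker(AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Hypothetical event source standing in for an epoll readiness notification.
#[derive(Default)]
struct Event {
    fired: AtomicBool,
    waiter: Mutex<Option<Waker>>,
}

impl Event {
    fn fire(&self) {
        self.fired.store(true, Ordering::SeqCst);
        // Take the stored waker (lock released at end of statement), then wake it.
        let waiter = self.waiter.lock().unwrap().take();
        if let Some(w) = waiter {
            w.wake();
        }
    }
}

/// Waits for `event`; `register == false` models the missing-wake-up bug.
struct WaitFor {
    event: Arc<Event>,
    register: bool,
}

impl Future for WaitFor {
    type Output = ();

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.register {
            // Register before checking, so a fire() racing with this poll is not lost.
            *self.event.waiter.lock().unwrap() = Some(cx.waker().clone());
        }
        if self.event.fired.load(Ordering::SeqCst) {
            return Poll::Ready(());
        }
        // With register == false, nobody will ever wake this task again.
        Poll::Pending
    }
}

/// Poll once, fire the event, and report how many wake-ups were delivered.
fn wakeups_after_fire(register: bool) -> usize {
    let counter = Arc::new(CountingWaker(AtomicUsize::new(0)));
    let waker = Waker::from(counter.clone());
    let mut cx = Context::from_waker(&waker);
    let event = Arc::new(Event::default());
    let mut fut = WaitFor { event: event.clone(), register };
    assert!(Pin::new(&mut fut).poll(&mut cx).is_pending());
    event.fire();
    counter.0.load(Ordering::SeqCst)
}

fn main() {
    println!("waker registered: {} wake-up(s)", wakeups_after_fire(true));
    println!("waker dropped:    {} wake-up(s)", wakeups_after_fire(false));
}
```

The registered case gets exactly one wake-up when the event fires; the buggy case gets none, so a real executor would leave that task parked.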

Anyway, I didn't get a chance to run your rust-libp2p-server today, as I spent the day trying to learn more about how the server polls. I didn't make much progress, but I have a slightly better feel for it. I'll try your server tomorrow.

winksaville · 2022-04-28T15:35:28Z · Author

Your server has a Timeout error too:

wink@3900x 22-04-28T15:20:03.354Z:~
$ libp2p-lookup direct --address /ip4/164.92.118.108/tcp/4001
[2022-04-28T15:20:32Z ERROR libp2p_lookup] Lookup failed: Timeout.
wink@3900x 22-04-28T15:20:32.552Z:~

I forked your repo and built from master:

wink@ubuntu-20-04-3 22-04-28T15:16:37.244Z:~/prgs/rust/myrepos/mxinden-libp2p-server (master)
$ git log -1 --pretty=oneline
f88de40848e70000957b980f240f51791fd0e078 (HEAD -> master, origin/master, origin/HEAD) Merge pull request #63 from mxinden/dependabot/cargo/libp2p-0.44.0
wink@ubuntu-20-04-3 22-04-28T15:31:31.134Z:~/prgs/rust/myrepos/mxinden-libp2p-server (master)

Built it with stable 1.60.0:

wink@ubuntu-20-04-3 22-04-28T15:15:52.453Z:~
$ rustup show
Default host: x86_64-unknown-linux-gnu
rustup home:  /home/wink/.rustup

installed toolchains
--------------------

stable-x86_64-unknown-linux-gnu (default)
nightly-x86_64-unknown-linux-gnu

active toolchain
----------------

stable-x86_64-unknown-linux-gnu (default)
rustc 1.60.0 (7737e0b5c 2022-04-04)

wink@ubuntu-20-04-3 22-04-28T15:32:25.994Z:~

And did a debug build:

wink@ubuntu-20-04-3 22-04-28T15:12:44.348Z:~/prgs/rust/myrepos/mxinden-libp2p-server (master)
$ cargo build
  Downloaded byte-pool v0.2.3
  Downloaded crypto-mac v0.10.1
  Downloaded crossbeam-queue v0.3.2
...
   Compiling libp2p-request-response v0.17.0
   Compiling libp2p-metrics v0.5.0
   Compiling libp2p v0.44.0
   Compiling libp2p-server v0.5.4 (/home/wink/prgs/rust/myrepos/mxinden-libp2p-server)
    Finished dev [unoptimized + debuginfo] target(s) in 39.07s
wink@ubuntu-20-04-3 22-04-28T15:13:33.894Z:~/prgs/rust/myrepos/mxinden-libp2p-server (master)

I then copied it to the small server on digital ocean:

wink@ubuntu-20-04-3 22-04-28T15:16:18.903Z:~/prgs/rust/myrepos/mxinden-libp2p-server (master)
$ scp target/debug/libp2p-server do-sfo3-01:~/bin/mxinden-libp2p-server.master-f88de4.stable-1.60.0-7737e0.debug
libp2p-server   100%  203MB  94.8MB/s   00:02
wink@ubuntu-20-04-3 22-04-28T15:16:37.244Z:~/prgs/rust/myrepos/mxinden-libp2p-server (master)

Here is how I ran it with RUST_LOG=trace, redirecting the output to a file:

wink@do-sfo3-01 22-04-28T15:17:04.945Z:~
$ RUST_LOG=trace mxinden-libp2p-server.master-f88de4.stable-1.60.0-7737e0.debug &> mxinden-libp2p-server.master-f88de4.stable-1.60.0-7737e0.debug.1.rust_log-trace.log
^C
wink@do-sfo3-01 22-04-28T15:20:40.360Z:~

Here is the log. I haven't even looked at it yet, but the server behaves exactly like the relay_v2 example.

mxinden-libp2p-server.master-f88de4.stable-1.60.0-7737e0.debug.1.rust_log-trace.log.zip

winksaville · 2022-04-28T16:19:31Z · Author

My initial hypothesis was that the epoll code was on the main thread and should be moved to a separate high-priority thread. That was wrong. It is on its own thread, which I determined by outputting the current thread id (although it's probably not high priority; I haven't checked):

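    /// Waits for I/O events with an optional timeout.
    pub fn wait(&self, events: &mut Events, timeout: Option<Duration>) -> io::Result<()> {
        // Added trace line. Note: ThreadId::as_u64() is unstable (nightly feature `thread_id_value`).
        log::trace!("wait:+ tid={} epoll_fd={}, timeout={:?}", std::thread::current().id().as_u64(), self.epoll_fd, timeout);
        // ... remainder of the function unchanged ...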

And here is the log where you can see the swarm is on tid=1 and epoll is on tid=4:

[2022-04-28T04:47:55.020893Z TRACE libp2p_swarm] Swarm<TBehavior>::poll_next_event:- tid=1 behaviour_poll Poll::Pending listeners_not_ready: true && connections_not_ready: true, res.is_ready: false
[2022-04-28T04:47:55.020896Z TRACE libp2p_swarm] Swarm<TBehavior>::Stream::poll_next:- tid=1 res.is_ready(): false
[2022-04-28T04:47:55.020938Z TRACE async_io::driver] main_loop: waiting on I/O
[2022-04-28T04:47:55.020943Z TRACE async_io::reactor] process_timers: 0 ready wakers
[2022-04-28T04:47:55.020946Z TRACE polling] Poller::wait(_, None)
[2022-04-28T04:47:55.020948Z TRACE polling::epoll] wait:+ tid=4 epoll_fd=4, timeout=None

I still "feel" there is too much polling, but I think it's because the information epoll can provide isn't being used. It seems to me the server uses epoll simply as a "signal" that something is ready, and then starts polling everything to find out what it was.

This is just a feeling/guess from my interpretation of what I see in the extra verbose logs I've created and could be totally wrong.

mxinden · 2022-05-03T09:12:35Z · Collaborator

A couple of notes; again, thanks for the follow-ups.

You could try running rust-libp2p-server with --enable-kademlia, just to force more wake-ups across the stack.

Yes, polling on the main event loop is not ideal, though conceptually it should not involve a lot of work, given that most work is offloaded to the per-connection tasks. A wild guess is that the many wake-ups stem from the interface watching that happens on the main event loop (rust-libp2p/transports/tcp/src/lib.rs, lines 467 to 472 in f46fecd):

    /// The IP addresses of network interfaces on which the listening socket
    /// is accepting connections.
    ///
    /// If the listen socket listens on all interfaces, these may change over
    /// time as interfaces become available or unavailable.
    in_addr: InAddr<T::IfWatcher>,

Currently, all NetworkBehaviours are polled once any one of them wakes up. Long term, we should use neat tricks like the one FuturesUnordered uses, so that we only poll the one able to make progress.

Just tested the tutorial once more on my setup. Still not able to reproduce your failures :/

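The difference between "poll every NetworkBehaviour on any wake-up" and "poll only the one that was woken" can be illustrated with a toy std-only Rust model. This is not rust-libp2p's or the futures crate's code: Shared, ChildWaker, and Parent are hypothetical names, and the children are just indices standing in for behaviours. Each child gets its own waker that records its index before waking the parent task, so the parent can drain that queue and poll only the woken children, which is roughly the ready-queue idea FuturesUnordered is built on.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::task::{Wake, Waker};

/// State shared by all per-child wakers (hypothetical names, not libp2p's).
struct Shared {
    ready: Mutex<VecDeque<usize>>, // indices of children that asked to be polled
    parent: Waker,                 // the task that owns all children
}

/// Waker handed to child `index`: records the index, then wakes the parent.
struct ChildWaker {
    index: usize,
    shared: Arc<Shared>,
}

impl Wake for ChildWaker {
    fn wake(self: Arc<Self>) {
        {
            let mut ready = self.shared.ready.lock().unwrap();
            if !ready.contains(&self.index) {
                ready.push_back(self.index);
            }
        } // release the lock before waking the parent
        self.shared.parent.wake_by_ref();
    }
}

/// Counts wake-ups delivered to the parent task.
struct CountingWaker(AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Toy parent owning `n` children; `polls` counts child polls performed.
struct Parent {
    shared: Arc<Shared>,
    child_wakers: Vec<Waker>,
    polls: usize,
}

impl Parent {
    fn new(n: usize, parent: Waker) -> Self {
        let shared = Arc::new(Shared { ready: Mutex::new(VecDeque::new()), parent });
        let child_wakers: Vec<Waker> = (0..n)
            .map(|index| Waker::from(Arc::new(ChildWaker { index, shared: shared.clone() })))
            .collect();
        Parent { shared, child_wakers, polls: 0 }
    }

    /// Selective strategy: poll only the children whose waker fired.
    fn poll_woken(&mut self) -> Vec<usize> {
        let woken: Vec<usize> = self.shared.ready.lock().unwrap().drain(..).collect();
        // A real combinator would poll child i here, passing child_wakers[i].
        self.polls += woken.len();
        woken
    }

    /// Naive strategy: any wake-up costs one poll per child.
    fn poll_all(&mut self) {
        self.shared.ready.lock().unwrap().clear();
        self.polls += self.child_wakers.len();
    }
}

fn main() {
    let counter = Arc::new(CountingWaker(AtomicUsize::new(0)));

    let mut selective = Parent::new(10, Waker::from(counter.clone()));
    selective.child_wakers[3].wake_by_ref(); // only child 3 has work
    let woken = selective.poll_woken();
    println!("selective: polled {:?}, {} child poll(s)", woken, selective.polls);

    let mut naive = Parent::new(10, Waker::from(counter.clone()));
    naive.child_wakers[3].wake_by_ref();
    naive.poll_all();
    println!("naive: {} child poll(s)", naive.polls);
}
```

With ten children and one wake-up, the selective parent performs one child poll while the naive one performs ten; the gap grows with the number of behaviours and the rate of spurious wake-ups.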
winksaville · 2022-05-03T19:24:09Z · Author

Based on the suggestion at today's community call, I tested a version of relay_v2 using Tokio. I ran relay_v2-tokio with RUST_LOG=trace in both release and debug builds. As expected, both responded to libp2p-lookup without problems.

Here are the quick and dirty changes I made as suggested in the call:

This further suggests the bug is in async-io. I'm going to still pursue finding a fix for my own education and I'm assuming it would be appreciated by libp2p as well as async-io.

It was great seeing and talking with everyone today, TXS!

mxinden · May 3, 2022 · Collaborator

Thanks for the testing and the update here.

> I'm going to still pursue finding a fix for my own education and I'm assuming it would be appreciated by libp2p as well as async-io.

Very much appreciated. As of today we want to continue supporting async-io and thus a fix would help.

> It was great seeing and talking with everyone today, TXS!

Same here.

winksaville · 2022-05-06T18:01:57Z · Author

Quick update: I've created issue "Missing waker.wake events" #78 and PR #79 on async-io. If anyone wants to experiment with that async-io version in libp2p, I've created this branch on my fork, which simply adds a [patch.crates-io] section to libp2p's workspace-wide Cargo.toml:

wink@3900x 22-05-06T18:00:10.752Z:~/prgs/rust/myrepos/rust-libp2p (use-async-io.fix-missing-waker.wake.v2)
$ git diff master use-async-io.fix-missing-waker.wake.v2
diff --git a/Cargo.toml b/Cargo.toml
index d2f361cb..15e28b87 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -171,3 +171,6 @@ required-features = ["gossipsub"]
 [[example]]
 name = "ipfs-private"
 required-features = ["gossipsub"]
+
+[patch.crates-io]
+async-io = { git = "https://github.com/winksaville/async-io", tag = "fix-missing-waker.wake.v2" }

From my testing this fixes the timeout issue I was seeing.

mxinden · May 30, 2022 · Collaborator

Again, thanks for following all the way through here and opening the issue and pull request upstream.

I don't have the capacity to drive this upstream. Am I correct in assuming that you are not blocked on this given that you can use tokio instead @winksaville?

winksaville · 2022-05-30T15:14:23Z · Author

Using Tokio is good enough for me. But if I'm correct, IMHO, async-io should be removed as a dependency from libp2p.

@mxinden, thoughts?

mxinden · Jun 1, 2022 · Collaborator

Documenting an out-of-band conversation: We will not remove support for async-io. Potentially making the bug easier to reproduce will help upstream maintainers to look into it.
