Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update esp-hal, esp-wifi, esp-idf and edge-net dependencies #63

Merged
merged 13 commits into from
Jan 9, 2025

Conversation

ivmarkov
Copy link
Collaborator

@ivmarkov ivmarkov commented Jan 5, 2025

esp-mbedtls library:

  • esp-wifi 0.10 -> 0.11 (non-optional dep, (ab)used as a malloc provider, we'll get rid of this soon)
  • edge-net (all sub-crates) 0.3 GIT master patch -> 0.4 (optional dep, with feature edge-net only)
  • esp-idf-sys 0.35 -> 0.36 (optional dep, for the -espidf targets only)
  • embuild 0.32 -> 0.33 (optional dep, for the -espidf targets only)
  • esp-hal 0.21 -> 0.22 (optional dep, with feature esp-hal only)

Then for the examples, additionally:

  • esp-hal-embassy 0.4 -> 0.5
  • embassy-executor "=0.6" + GIT patch => 0.6
  • embassy-net 0.4 => 0.5

Then, in the examples:

  • For now we need to temporarily depend on both smoltcp 0.11 and smoltcp 0.12 because of reasons; will be fixed with the next esp-hal/esp-wifi release I think
  • We need to depend on blocking-network-stack = { git = "https://github.com/bjoernQ/blocking-network-stack", rev = "1c581661d78e0cf0f17b936297179b993fb149d7" }, as the blocking API of esp-wifi is now removed. Either that, or we have to get rid of all sync_* examples.

@ivmarkov
Copy link
Collaborator Author

ivmarkov commented Jan 5, 2025

Ah! did not notice #62. Should be mostly identical, except this one also updates edge-net and esp-idf-*. Let me quickly check...

@ivmarkov
Copy link
Collaborator Author

ivmarkov commented Jan 5, 2025

@bjoernQ @AnthonyGrondin It seems my PR is a strict super-set of #62 in that it does everything #62 does, and additionally updates to latest edge-net and esp-idf-* (i.e. no patch.crates-io anymore)

Unfortunately, same issue of course - memchr missing. Perhaps we can do a quick workaround with our own little implementation, in the library, so that we don't depend on patched versions etc.? Let me try this.

I'll also quickly check what is going on with edge-server not accepting connections.

@yanshay
Copy link
Contributor

yanshay commented Jan 5, 2025

I'll also quickly check what is going on with edge-server not accepting connections.

@ivmarkov I don't know what this issue is exactly about, but could it be related to the issue I face, reported in #60?

@ivmarkov
Copy link
Collaborator Author

ivmarkov commented Jan 5, 2025

I'll also quickly check what is going on with edge-server not accepting connections.

@ivmarkov I don't know what this issue is exactly about, but could it be related to the issue I face, reported in #60?

No it is not. Your issue is that validating certificates with esp-mbedtls simply does not work, and you have to switch it off until we figure out what is going on. This does not work since months.

@yanshay
Copy link
Contributor

yanshay commented Jan 5, 2025

I'll also quickly check what is going on with edge-server not accepting connections.

@ivmarkov I don't know what this issue is exactly about, but could it be related to the issue I face, reported in #60?

No it is not. Your issue is that validating certificates with esp-mbedtls simply does not work, and you have to switch it off until we figure out what is going on. This does not work since months.

My issue started with commit d5e29f8 on Nov 5, so not that long ago, and it is the server that does the rejection if I get it right (though I don't understand well how it all works exactly), and I don't use certificates at all.
Does that align with the esp-mbedtls issue you refer to? Is that because that underlying esp-mbedtls was upgraded on that commit?

If it does, how do I switch off whatever needs to be switched off to get this working? I don't care much about security there, I just need to get the connectivity going.

@ivmarkov
Copy link
Collaborator Author

ivmarkov commented Jan 5, 2025

I'll also quickly check what is going on with edge-server not accepting connections.

@ivmarkov I don't know what this issue is exactly about, but could it be related to the issue I face, reported in #60?

No it is not. Your issue is that validating certificates with esp-mbedtls simply does not work, and you have to switch it off until we figure out what is going on. This does not work since months.

My issue started with commit d5e29f8 on Nov 5, so not that long ago, and it is the server that does the rejection if I get it right (though I don't understand well how it all works exactly), and I don't use certificates at all. Does that align with the esp-mbedtls issue you refer to? Is that because that underlying esp-mbedtls was upgraded on that commit?

If it does, how do I switch off whatever needs to be switched off to get this working? I don't care much about security there, I just need to get the connectivity going.

Please let's not hijack this thread with something which is unrelated to this PR.
I'll put my suggestions in your issue #57 shortly.

@ivmarkov
Copy link
Collaborator Author

ivmarkov commented Jan 5, 2025

@bjoernQ I've temporarily upgraded the nightly in CI from nightly-2024-07-12 to nightly-2024-12-01 instead of just nightly until this PR is merged.

@ivmarkov
Copy link
Collaborator Author

ivmarkov commented Jan 5, 2025

@bjoernQ @AnthonyGrondin This is now ready for review.
Summary of the changes:

  • All changes from the original Update dependencies for esp-hal, esp-wifi and embassy-net #62 PR
  • Also changes to upgrade to latest edge-net & esp-idf-*; required a more recent nightly as well
  • Polyfill with tinyrlibc for the missing memchr. Check the comment on the polyfill though. I think this should actually become a pattern and we should support such polyfills for all libc functions used by mbedtls in future, if the concrete baremetal HAL / user does not declare that it supports its own polyfills
  • Fixed a potential issue when writing in esp-mbedtls/lib.rs
  • Addressed all clippy issues induced from the new nightly (mainly in the examples, but also a few formatting fixes in the actual library)
  • Regarding the issue in edge_server that it does not work:
    • It does work with latest-released edge-net. There was a regression in that after the upgrade to embassy-net 0.5, the lib was unable to listen to port 0.0.0.0, but this is now fixed
    • ALSO: It seems we need quite a bit more heap now (?!) I had to increase the heap in the edge_server from 110 to 140 KB so that the server works in both Chrome and Firefox. However, I doubt this is induced by edge-net itself, as it is no-alloc. Did something change in esp-hal / esp-wifi? The reason why I believe we need more memory is because otherwise I'm faced with a lot of either -17168 (private key error) or -32512 (ssl alloc failed), where I believe both are caused by not enough memory though. Note that an upcoming subsequent PR of mine is hopefully reducing the heap usage by re-using the instantiated public/private key pairs. We are currently having separate ones for each Session, even if the cert+key is the same.

@AnthonyGrondin
Copy link
Collaborator

Thank you for this contribution. Yeah it's a superset of #62 and I'll close it once this one is merged, as yours provides more contribution value.

  • Can you document the reasons behind the socket count increase in dea21dc
  • Can you document the reasons behind the fixes provided in 8a9e7b0, perhaps describe in the code why EOF needs to be returned / the steps of the io loop

Polyfill with tinyrlibc for the missing memchr. Check the comment on the polyfill though. I think this should actually become a pattern and we should support such polyfills for all libc functions used by mbedtls in future, if the concrete baremetal HAL / user does not declare that it supports its own polyfills.

This is a better fix than waiting for a new update for esp-hal with even more breaking changes (and forcing Rust +1.83).
I like the feature flag fix so that tinyrlibc is only pulled when building for esp32c3, but this comes at the tradeoff of requiring more granular control, than using #[linkage = "weak"] and always provide a polyfill, no matter the target.

That being said for now, I don't mind the granular path, we'll see once we add more targets outside of the esp ecosystem (if any).


It seems we need quite a bit more heap now (?!) I had to increase the heap in the edge_server from 110 to 140 KB so that the server works in both Chrome and Firefox.

There is indeed a significant memory increase that should be investigated. I've profiled both the current main (for a more deterministic approach, I patched edge-net to 4c0940b) and here's the result I've got after doing a request, waiting for timeout and then doing another request:

Current main (edge-net fixed at 4c0940b2b6e550a577e2602af97ea57548714915)

Session connection time:

Benchmark 1: curl -k https://192.168.69.166 -H 'connection: close'
  Time (mean ± σ):      2.310 s ±  0.030 s    [User: 0.004 s, System: 0.003 s]
  Range (min … max):    2.287 s …  2.385 s    10 runs

Memory usage:

WARN - Before request: 41616
INFO - Creating 2 handler tasks, memory: 4120B
INFO - new session
WARN - Before init_ssl: 41740
WARN - After mbedtls_ssl_setup: 85612
WARN - Got new connection: 85524
WARN - Connection done: 85524
WARN - Got new connection: 85524
WARN - Connection done: 85524
WARN - Handler task 0: Error when handling request: Connection(Io(Timeout))
WARN - Before dropping: 85524
WARN - After dropping: 41740
INFO - new session
WARN - Before init_ssl: 41740
WARN - After mbedtls_ssl_setup: 85612
WARN - Got new connection: 85524
WARN - Connection done: 85524
WARN - Got new connection: 85524
WARN - Connection done: 85524
WARN - Handler task 1: Error when handling request: Connection(Io(Timeout))
WARN - Before dropping: 85524
WARN - After dropping: 41740

With update (this PR)

Session connection time:

Benchmark 1: curl -k https://192.168.69.166 -H 'connection: close'
  Time (mean ± σ):      2.323 s ±  0.016 s    [User: 0.003 s, System: 0.004 s]
  Range (min … max):    2.308 s …  2.352 s    10 runs

Note: I've also noticed that running the benchmark will sometimes cause a new session to be established while the previous one is still in memory, create a 2 * N memory usage, where N is the amount of memory required per session. I haven't noticed the current head to be doing this at any time, so far (would require extensive testing to confirm).

Memory usage:

WARN - Before request: 45368
INFO - Creating 2 handler tasks, memory: 4104B
INFO - new session
WARN - Before init_ssl: 45496
WARN - After mbedtls_ssl_setup: 89368
WARN - Got new connection: 89328
WARN - Connection done: 89328
WARN - Got new connection: 89328
WARN - Connection done: 89328
WARN - Handler task 0: Error when handling request: Connection(Io(Timeout))
WARN - Before dropping: 89328
WARN - After dropping: 45544
INFO - new session
WARN - Before init_ssl: 45544
WARN - After mbedtls_ssl_setup: 89416
WARN - Got new connection: 89328
WARN - Connection done: 89328
WARN - Got new connection: 89328
WARN - Connection done: 89328
WARN - Handler task 1: Error when handling request: Connection(Io(Timeout))
WARN - Before dropping: 89328
WARN - After dropping: 45544

P.S. I'm really looking forward to the future improvements to this library, and its memory usage / compatibility, and certificate management.

Copy link
Collaborator

@AnthonyGrondin AnthonyGrondin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left a review on the edge_server example, but I believe some of it applies to almost all of the async examples. I just didn't want to make too much redundant noise.

In term of socket count, perhaps we should use a const variable for every example, like we do in edge_server, instead of hardcoding a fixed value without much description.

examples/edge_server.rs Outdated Show resolved Hide resolved
examples/edge_server.rs Outdated Show resolved Hide resolved
@ivmarkov
Copy link
Collaborator Author

ivmarkov commented Jan 8, 2025

  • Can you document the reasons behind the fixes provided in 8a9e7b0, perhaps describe in the code why EOF needs to be returned / the steps of the io loop

Agreed and addressed.

In retrospective, the flush_write-related change reasoning is not obvious (or else I would've done it from the get-go).
The "EOF" cases should be more obvious, but I've commented these as well, as I'm not 100% convinced unrolling the stack when these happens is exactly right. Shall we rather return 0 instead? In any case, we should do something and what exactly is not trivial to think out quickly, so unrolling the stack (for now, until we get more intelligent there and see how the lib behaves in actual conditions) is IMO good enough.

Polyfill with tinyrlibc for the missing memchr. Check the comment on the polyfill though. I think this should actually become a pattern and we should support such polyfills for all libc functions used by mbedtls in future, if the concrete baremetal HAL / user does not declare that it supports its own polyfills.

This is a better fix than waiting for a new update for esp-hal with even more breaking changes (and forcing Rust +1.83). I like the feature flag fix so that tinyrlibc is only pulled when building for esp32c3, but this comes at the tradeoff of requiring more granular control, than using #[linkage = "weak"] and always provide a polyfill, no matter the target.

Hm, how would you do #[linkage = "weak"]?

In any case, I completely changed how we polyfill the missing memchr symbol (look at the examples). Now, this new approach I really like a lot, because (a) esp-mbedtls is (once again) not concerned how users are providing these functions (b) users are free to use tinyrlibc or whatever to provide these. A bit like your use esp-wifi as _ except not in esp-mbedtls itself (so no need for a feature-per-libc-function) and of course we should get rid of the esp-wifi dependency for other reasons.

There is indeed a significant memory increase that should be investigated. I've profiled both the current main (for a more deterministic approach, I patched edge-net to 4c0940b) and here's the result I've got after doing a request, waiting for timeout and then doing another request:

Hmmm, in your benchmarks I actually see only an increase of 5K, which is not that much. And this is in the heap by the way, so
can't be caused by edge-http which is no-alloc. Perhaps esp-wifi or esp-hal?

I think I had to increase the memory much more (30K) because of the "two edge-http connections" issue actually. Read on.

Note: I've also noticed that running the benchmark will sometimes cause a new session to be established while the previous one is still in memory, create a 2 * N memory usage, where N is the amount of memory required per session. I haven't noticed the current head to be doing this at any time, so far (would require extensive testing to confirm).

I'm surprised actually that the current head is not doing this? See, if you configure edge-http with 2 connections/handlers (as we do) nothing stops the browser from opening two TLS connections to the server, isn't it? And the server would just try to handle these?

P.S. I'm really looking forward to the future improvements to this library, and its memory usage / compatibility, and certificate management.

This is also coming. I am putting big bets that the improved certificate management should reduce the memory considerable, when you have > 1 connection to the same server with the same certificates. Fingers crossed.

@AnthonyGrondin
Copy link
Collaborator

Thank you for the added documentation.

I like the way the polyfill is provided too. It's completely independent of esp-mbedtls. Also, this specific issue has been fixed in esp-rs/esp-hal#2896 so in a later update, we could remove it.

The 5K RAM increase mostly comes from esp-wifi, I believe, considering that it's noticeable even before initializing the TLS Session.

I'm surprised actually that the current head is not doing this? See, if you configure edge-http with 2 connections/handlers (as we do) nothing stops the browser from opening two TLS connections to the server, isn't it? And the server would just try to handle these?

I did a little bit of testing with hyperfine --runs 100 -i "curl -k https://<IP> -H 'connection: close'" on revision 1431835 while running edge_server and there happens to be double connections, causing overflow sometimes. In a run of 100 connections, it happened twice, where it overflows. Doing the same but on this PR, I get about 28 overflows this time. Not sure if this is worth investigating, as this is a timing issue that could come from anywhere, where intrinsically, the real issue is that the allocator doesn't have enough space to allocate for the number of maximum parallel connections at the same time, which is itself just a ticking bomb.

In retrospective, the flush_write-related change reasoning is not obvious (or else I would've done it from the get-go).
The "EOF" cases should be more obvious, but I've commented these as well, as I'm not 100% convinced unrolling the stack when these happens is exactly right. Shall we rather return 0 instead? In any case, we should do something and what exactly is not trivial to think out quickly, so unrolling the stack (for now, until we get more intelligent there and see how the lib behaves in actual conditions) is IMO good enough.

I agree with this approach, we should go case by case and by the needs to the real world.

Overall looks good to me. I see the TcpSplit trait added, and there was a discussion a bit ago about supporting full-duplex. #49 (comment)

I will open an issue about it to solve in a future PR.

Copy link
Collaborator

@AnthonyGrondin AnthonyGrondin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P.S. It would be great if you could also include the fix for SHA in this PR to avoid a subsequent PR, since it's just a 6 lines diff.

@ivmarkov
Copy link
Collaborator Author

ivmarkov commented Jan 9, 2025

P.S. It would be great if you could also include the fix for SHA in this PR to avoid a subsequent PR, since it's just a 6 lines diff.

Will do shortly, yes

@ivmarkov
Copy link
Collaborator Author

ivmarkov commented Jan 9, 2025

P.S. It would be great if you could also include the fix for SHA in this PR to avoid a subsequent PR, since it's just a 6 lines diff.

Done now. @bjoernQ I think this is ready for review from your side and possibly merge.

@yanshay
Copy link
Contributor

yanshay commented Jan 9, 2025

Is this supposed to work with latest releases of esp-hal and esp-wifi so I can test it as well?

Copy link
Collaborator

@bjoernQ bjoernQ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the effort here!

Regarding the increased use of dynamic memory: Indeed, in esp-wifi we now prefer dynamic allocations over static allocations which explains it

@bjoernQ bjoernQ merged commit 9cce6c2 into esp-rs:main Jan 9, 2025
1 check passed
@AnthonyGrondin
Copy link
Collaborator

Is this supposed to work with latest releases of esp-hal and esp-wifi so I can test it as well?

Yes, could you test and see if this fixes #60

@yanshay
Copy link
Contributor

yanshay commented Jan 10, 2025

Is this supposed to work with latest releases of esp-hal and esp-wifi so I can test it as well?

Yes, could you test and see if this fixes #60

Yes, it's working, great to be on the main branch once again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants