Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Base64 Encoding simplification + optimization #549

Merged
merged 9 commits into from
Jul 15, 2024

Conversation

to11mtm
Copy link
Collaborator

@to11mtm to11mtm commented Jul 6, 2024

Resolves #518

tl;dr - we are optimizing a lot of our base64 encode parsing by using System.Text.Buffers as it has NetStandard2.0 compatibility, which will also allow us to have less code to maintain long term. :)

This is a semi-breaking change however, as it does the following:

  1. System.Text.Json is updated to the newest version, to support ReadOnlyMemory<byte>
    1. this version is compatible with NetStandard2.0 and Net6, so, win-win?
    2. Other win-win, by doing this, we instead rely on the implementations of how System.Text.Json handles ReadOnlyMemory<byte>, i.e. it does the Base64 decoding internally and should be 'good' for all versions (if there is issue we can always PR them!)
  2. We use ReadOnlyMemory<Byte> for our model here now. If someone is somehow relying on that type for some test or other behavior it will be a breaking change
  3. FWIW this should be a -compile- breaking change, but if someone has a dis-proof of that I'm all ears.

Copy link
Collaborator Author

@to11mtm to11mtm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some thoughts/reasoning to the changes given.

Comment on lines 38 to 41
<ItemGroup>
<PackageReference Include="System.Text.Json" Version="8.0.3" />
</ItemGroup>

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Admittedly, IDK if I should be putting this elsewhere, but this was the 'lowest need' for the PR.

As noted, this works gr8 with netstandard2.0 and net6, so the biggest 'hazard' is consumers may have to deal with other changes.

Comment on lines 543 to 560
var bytes = ArrayPool<byte>.Shared.Rent(response.Message.Data.Length);
ObjectMetadata data;
try
{
if (System.Buffers.Text.Base64.DecodeFromUtf8(response.Message.Data.Span, bytes.AsSpan(),out _, out var written) == OperationStatus.Done)
{
var buffer = new ReadOnlySequence<byte>(bytes.AsMemory(0, written));
data = NatsObjJsonSerializer<ObjectMetadata>.Default.Deserialize(buffer) ?? throw new NatsObjException("Can't deserialize object metadata");
}
else
{
throw new NatsObjException("Can't decode data message value");
}
}
finally
{
ArrayPool<byte>.Shared.Return(bytes);
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ironically, this optimizes NET8 too 😁. Not sure how much of a hot path this can be but why not?

That said, If there is one regret, IDK how to make this easily 'composable where this doesn't feel like a copypaste of the parse with buffer'. OTOH I had had a colleague spend some time teaching me go so the overton window for pragmatism has changed. 🤷

@to11mtm
Copy link
Collaborator Author

to11mtm commented Jul 8, 2024

Thinking these errors -may- be related to some whitespace/etc present in the payload.

Convert.FromBase64String tends to be more tolerant of that... FWIW we should still be able to 'safely' trim things if that's the case (and I don't forsee a perf penalty since FromBaase64String is doing it too...)

Just gotta get my local test flow working again. 😅

@mtmk
Copy link
Collaborator

mtmk commented Jul 8, 2024

Convert.FromBase64String tends to be more tolerant of that... FWIW we should still be able to 'safely' trim things if that's the case (and I don't forsee a perf penalty since FromBaase64String is doing it too...)

it's not only us we should be compatible with other clients as well, but I'd say that probably won't be a problem once we verified we're able to read and write proper base64. we should double check once local tests are passing (I can do that as well)

Comment on lines 542 to 545
var base64String = Convert.FromBase64String(response.Message.Data);
var data = NatsObjJsonSerializer<ObjectMetadata>.Default.Deserialize(new ReadOnlySequence<byte>(base64String)) ?? throw new NatsObjException("Can't deserialize object metadata");
ObjectMetadata data;
var buffer = new ReadOnlySequence<byte>(response.Message.Data);
data = NatsObjJsonSerializer<ObjectMetadata>.Default.Deserialize(buffer) ?? throw new NatsObjException("Can't deserialize object metadata");
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, should still clean this up.

@to11mtm
Copy link
Collaborator Author

to11mtm commented Jul 8, 2024

Convert.FromBase64String tends to be more tolerant of that... FWIW we should still be able to 'safely' trim things if that's the case (and I don't forsee a perf penalty since FromBaase64String is doing it too...)

it's not only us we should be compatible with other clients as well, but I'd say that probably won't be a problem once we verified we're able to read and write proper base64. we should double check once local tests are passing (I can do that as well)

🤯

I am amazed.

Tests were -actually- failing because ReadOnlyMemory<byte> in JSON Schema, will handle Base64 all on it's own! 🤯

Once I was able to get tests (kinda) running locally it was pretty obvious to see what was happening. (will discuss to see if we should file something for the sake of doc or easing for others in future).

On one hand, we are semi 'at mercy' of however System.Text.Json is doing the Base64 decode, OTOH it's a lot less code for us to maintain and I'm guessing that we are still winning overall on alloc/perf since we are avoiding a UTF8->String conversion here.

@mtmk
Copy link
Collaborator

mtmk commented Jul 8, 2024

Tests were -actually- failing because ReadOnlyMemory<byte> in JSON Schema, will handle Base64 all on it's own! 🤯

woow! very cool

@galvesribeiro
Copy link
Collaborator

Tests were -actually- failing because ReadOnlyMemory in JSON Schema, will handle Base64 all on it's own!

Yeah, there is a converter for it. Similar to byte[].

On one hand, we are semi 'at mercy' of however System.Text.Json is doing the Base64 decode, OTOH it's a lot less code for us to maintain and I'm guessing that we are still winning overall on alloc/perf since we are avoiding a UTF8->String conversion here.

Yeah, I would leave that to STJ. There is a huge work over the past couple of major releases to optimize the UTF8 handling in it. I don't see why we should need to roll our own in this case.

What we should do is to make sure all types that come from Nats are using STJ Source Generators. That would avoid the reflection costs at runtime. Besides that, I won't see why would we go lower than that. I mean, yeah, we can always optimize something but it gets to a point of diminish return.

@to11mtm to11mtm force-pushed the base64utf8-via-system-buffers-text branch 2 times, most recently from ef69b72 to 0678400 Compare July 9, 2024 21:13
@to11mtm
Copy link
Collaborator Author

to11mtm commented Jul 9, 2024

Yeah, there is a converter for it. Similar to byte[].

Yeah, I would leave that to STJ. There is a huge work over the past couple of major releases to optimize the UTF8 handling in it. I don't see why we should need to roll our own in this case.

Agreed.

What we should do is to make sure all types that come from Nats are using STJ Source Generators. That would avoid the reflection costs at runtime. Besides that, I won't see why would we go lower than that. I mean, yeah, we can always optimize something but it gets to a point of diminish return.

Agreed, if we do see an opportunity we can always contrib to upstream if needed. :)

I believe there's already sourcegen in play for STJ (I remember seeing some errors when I had bad composition), however your comment did lead me to look and I think we can also obviate a decent amount of code in ObjectStore with this change...

Although, not sure if we should keep that in same PR or separate, just for sake of sizing... open to opinions here.

Edit: Actually I may be off on that on some level, Base64URL != Base64... so I'd say keep it separated.

@galvesribeiro
Copy link
Collaborator

Although, not sure if we should keep that in same PR or separate, just for sake of sizing... open to opinions here.

Yeah, lets do on a separated round.

@to11mtm
Copy link
Collaborator Author

to11mtm commented Jul 9, 2024

Although, not sure if we should keep that in same PR or separate, just for sake of sizing... open to opinions here.

Yeah, lets do on a separated round.

Created an issue to handle that (#557)

... TBH I think I'm good to pull this out of draft, my only concern is the weird flicker on unit tests. This latest push failed on test/dotnet (2.9) pull_request, the previous push failed on test/donet(latest).

I am no stranger to flickery CI but will defer to the opinions of core team here.

@galvesribeiro
Copy link
Collaborator

galvesribeiro commented Jul 9, 2024

my only concern is the weird flicker on unit tests.

When I was discussing the netstandard2.0 support, the tests flakiness were mentioned. I think it is essentially the fixtures as your test results are looking good.

I suggested using TestContainers on it to get the fixtures working properly. I'll take a look on how to improve it and will create an issue. Created #558 to look at it.

@to11mtm to11mtm marked this pull request as ready for review July 10, 2024 20:46
@to11mtm to11mtm force-pushed the base64utf8-via-system-buffers-text branch from 2141b92 to 192ee79 Compare July 11, 2024 21:28
@to11mtm to11mtm requested a review from mtmk July 14, 2024 18:15
@to11mtm to11mtm force-pushed the base64utf8-via-system-buffers-text branch from acd5f4b to 295239e Compare July 14, 2024 18:23
Copy link
Collaborator

@mtmk mtmk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks @to11mtm

@mtmk mtmk merged commit b09b9ae into nats-io:main Jul 15, 2024
10 checks passed
mtmk added a commit that referenced this pull request Jul 18, 2024
* Add logging enhancements and improve error handling (#570)
* Base64 Encoding simplification + optimization (#549)
* Use string.Create when building a new inbox string (#551)
@mtmk mtmk mentioned this pull request Jul 18, 2024
mtmk added a commit that referenced this pull request Jul 18, 2024
* Add logging enhancements and improve error handling (#570)
* Base64 Encoding simplification + optimization (#549)
* Use string.Create when building a new inbox string (#551)
@mtmk mtmk added the breaking label Jul 19, 2024
mtmk added a commit that referenced this pull request Aug 2, 2024
* Handle various protocol errors and socket exceptions (#584)
* Fetch idle timeout default fix (#582)
* Obj store empty list fixed (#578)
* KV never set duplicate window (#577)
* Resolved issue of ObjStore GetInfo MTime returning 0 (#567)
* Add logging enhancements and improve error handling (#570)
* Base64 Encoding simplification + optimization (#549)
* Use string.Create when building a new inbox string (#551)
@mtmk mtmk mentioned this pull request Aug 2, 2024
mtmk added a commit that referenced this pull request Aug 2, 2024
* Handle various protocol errors and socket exceptions (#584)
* Fetch idle timeout default fix (#582)
* Obj store empty list fixed (#578)
* KV never set duplicate window (#577)
* Resolved issue of ObjStore GetInfo MTime returning 0 (#567)
* Add logging enhancements and improve error handling (#570)
* Base64 Encoding simplification + optimization (#549)
* Use string.Create when building a new inbox string (#551)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

StoredMessage.Data as System.Buffers.Text
3 participants