-
Notifications
You must be signed in to change notification settings - Fork 336
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] HTTP Compression Support #1368
base: master
Are you sure you want to change the base?
Conversation
Properly handle no compressors enabled
|
src/nxt_http_compression.c
Outdated
nxt_str_t enc; | ||
nxt_http_comp_scheme_t scheme; | ||
|
||
tkn = strtok_r(str, ", ", &saveptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ac000,
strtok_r(3) isn't very good for this, since it will also split on
.
Let's take the following example, taken literally from RFC 9110:
Accept-Encoding: gzip;q=1.0, identity; q=0.5, *;q=0
One of the tokens would be q=0.5
.
What you want is to separate on ,
, and after that trim whitespace.
Another problem of strtok_r(3) is that it merges adjacent delimiters, which would be bogus input and should be rejected.
strsep(3) doesn't have that issue.
You probably want:
#define stpspn(s, accept) (s + strspn(s, accept))
c = stpspn(strsep(&str, ","), " ");
w = stpsep(c, ";");
stpsep(c, " ");
if (w != NULL) {
w = stpspn(w, " ");
stpsep(w, " ");
}
With stpsep() being a function of mine similar to strsep(3).
Bad part: strsep(3) is a GNU extension.
break; | ||
} | ||
|
||
qptr = strstr(tkn, ";q="); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is brittle. RFC 9110 accepts whitespace after ;
.
On Tue, 23 Jul 2024 08:04:55 -0700 Alejandro Colomar ***@***.***> wrote:
@alejandro-colomar commented on this pull request.
> +
+ str = strndup((char *)token->start, token->length);
+
+ for ( ; ; str = NULL) {
+ char *tkn, *qptr;
+ double qval = -1.0;
+ nxt_uint_t ecidx;
+ nxt_str_t enc;
+ nxt_http_comp_scheme_t scheme;
+
+ tkn = strtok_r(str, ", ", &saveptr);
+ if (tkn == NULL) {
+ break;
+ }
+
+ qptr = strstr(tkn, ";q=");
This is brittle. RFC 9110 accepts whitespace after `;`.
As I said, this likely needs some work...
|
On Tue, 23 Jul 2024 08:03:33 -0700 Alejandro Colomar ***@***.***> wrote:
With stpsep() being a function of mine similar to strsep(3).
Bad part: strsep(3) is a GNU extension.
Which is why I didn't use it.
|
However if it's available on macOS (seems so) the BSD's (looks like it probably is) and musl libc, then it may be a goer... |
Hi Andrew,
On Tue, Jul 23, 2024 at 09:07:07AM GMT, Andrew Clayton wrote:
> On Tue, 23 Jul 2024 08:03:33 -0700 Alejandro Colomar ***@***.***> wrote: With stpsep() being a function of mine similar to strsep(3). Bad part: strsep(3) is a GNU extension.
> Which is why I didn't use it.
However if it's available on macOS (seems so) the BSD's (looks like it
probably is) and musl libc, then it may be a goer...
(I shouldn't have called it a GNU extension.)
HISTORY
4.4BSD.
The strsep() function was introduced as a replacement for str‐
tok(3), since the latter cannot handle empty fields.
It is available on the BSDs:
***@***.***:~/src/bsd$ find -type f \
| grep '\.c$' \
| grep -v -e /external/ -e /crypto/ -e /contrib/ \
| xargs grepc -tfd strsep \
| grep -A3 '\.c:';
./openbsd/master/lib/libc/string/strsep.c:char *
strsep(char **stringp, const char *delim)
{
char *s;
--
./freebsd/main/lib/libc/string/strsep.c:char *
strsep(char **stringp, const char *delim)
{
char *s;
--
./freebsd/main/sys/libkern/strsep.c:char *
strsep(char **stringp, const char *delim)
{
char *s;
--
./netbsd/trunk/common/lib/libc/string/strsep.c:char *
strsep(char **stringp, const char *delim)
{
char *s;
And on musl:
***@***.***:~/src/musl/libc/master$ grepc -tfd strsep . | grep -A3 '\.c:';
./src/string/strsep.c:char *strsep(char **str, const char *sep)
{
char *s = *str, *end;
if (!s) return NULL;
It's probably unavailable on some of those dying SysV-based Unix
systems. Which is probably why POSIX hasn't taken it.
Cheers,
Alex
…
|
Rebased with master...
|
Fix compilation (nxt_tstr_query_failed() is no longer a thing)
|
How close to merge? |
|
TODO
NOTE: It still contains a tonne of debugging and a few compiler warnings (build with E=0)
|
Do this once at startup (for each enabled compressor) rather then checking and setting it on each request. TODO
NOTE: It still contains a tonne of debugging and a few compiler warnings (build with E=0)
|
TODO
NOTE: It still contains a tonne of debugging and a few compiler warnings (build with E=0)
|
TODO
NOTE: It still contains a tonne of debugging and a few compiler warnings (build with E=0)
|
TODO
NOTE: It still contains a tonne of debugging and a few compiler warnings (build with E=0)
|
|
This is to store the MIME type of the response which will be used by the HTTP compression patches as part of determining whether or not to compress the response. Signed-off-by: Andrew Clayton <[email protected]>
Rebased with master
|
Move the debug commit to the HEAD |
This is the initial step to enabling HTTP compression on both static and application responses. This code itself doesn't do any actual compression, that will come in subsequent commits. It just contains the core functions for initialising structures that describe the available compressors and functions for checking if compression should be done depending on various criteria. Signed-off-by: Andrew Clayton <[email protected]>
This adds support for both deflate & gzip compressors. Signed-off-by: Andrew Clayton <[email protected]>
Signed-off-by: Andrew Clayton <[email protected]>
Signed-off-by: Andrew Clayton <[email protected]>
This allows to actually build unit with support fro zlib, zstd and brotli compression. Any or all can be specified. E.g. $ ./configure --zlib ... $ ./configure --zlib --zstd --brotli ... During configure you will see if support for the requested compressions has been found and what version of the library is being used. E.g. ... checking for zlib ... found + zlib version: 1.3.1.zlib-ng checking for zstd ... found + zstd version: 1.5.6 checking for brotli ... found + brotli version: 1.1.0 ... Unit configuration summary: ... zlib support: .............. YES zstd support: .............. YES brotli support: ............ YES ... Co-authored-by: Alejandro Colomar <[email protected]> Signed-off-by: Alejandro Colomar <[email protected]> Signed-off-by: Andrew Clayton <[email protected]>
This exposes a new "settings.http.compression" configuration object. Under which are types & compressors objects. types is used to specify what MIME types should be considered compressible. compressors is used to configure an array of compressors that are available. For each of these, you specify the encoding, e.g gzip and optional level and min_length parameters. Where level is what compression level to use and min_length is the minimum length of data that should be compressed. By default the default compression level for the specified compressor is used and there is no minimum data length considered for compression. It may look something like "settings": { "http": { "server_version": true, "static": { "mime_types": { "text/x-c": [ ".c", ".h" ] } }, "compression": { "types": [ "text/*" ], "compressors": [ { "encoding": "gzip", "level": 3, "min_length": 2048 }, { "encoding": "deflate", "min_length": 1024 }, { "encoding": "zstd", "min_length": 2048 }, { "encoding": "br", "min_length": 256 } ] } } }, Currently this is a global option that will effect both static and application responses. In future it should be possible to add per-application (and perhaps even per-static) configuration. Signed-off-by: Andrew Clayton <[email protected]>
This adds two helper functions that will be used in subsequent commits. nxt_http_comp_compress() does the actual compression. nxt_http_comp_bound() returns the maximum compressed size for the given size. Signed-off-by: Andrew Clayton <[email protected]>
Signed-off-by: Andrew Clayton <[email protected]>
Co-authored-by: Alejandro Colomar <[email protected]> Signed-off-by: Alejandro Colomar <[email protected]> Signed-off-by: Andrew Clayton <[email protected]>
While the first bit of the above was true., it seems that the output buffer will be at least 16KiB and then multiples of there after, up to some as yet undetermined maximum.
Scrub that, the bigger the input buffer the bigger the protential max compressed data size, which pretty quickly is always larger than the originally allocated buffer... nuts...
|
First things first, as the title suggests, this is very much a 'Work In Progress'.
The code is in a mixture of Unit and Kernel coding style, full of debugging,
subject to change and it's all in one big patch.
I'm posting this in its current form to show current progress and the overall
approach taken.
This has gone through several iterations up to this point
V1
Initial version which worked similarly to @alejandro-colomar's version by
doing the compression from
nxt_http_request_send()
This has some issues
We can't set the Content-Length, it's too late at this point, so
compressed responses are sent chunked.
Creates a new buffer (via
malloc(3)
) to store the compressed data, thencopies it over to the response buffer (overwriting what was there).
We have an issue if we are compressing data which then takes up more
space than we have in the response buffer, this would likely hit for
small data sizes where the compressed data + meta data would be larger
than the original data. This can be handled as @alejandro-colomar did by
allocating a new buffer and swapping it with the current one.
Slightly convoluted logic to determine if we are on the last buffer,
so the compressor can know to properly finish up.
This approach did have the advantage of also handling application responses.
V2
In this version the compression is done in
nxt_http_static_buf_completion()
.This has the advantage that we can allocate large enough output buffers in
nxt_http_static_body_handler()
to allow for the maximum compressed size ofthe data.
It is also easier to tell if we are on the last buffer.
This still has issue (1) above.
This uses a static buffer of
NXT_HTTP_STATIC_BUF_SIZE
(128KiB) as atemporary buffer to read in chunks of the file to be compressed.
V3
This version uses
mmap(2)
to map the file to be compressed into memory, wecan then read from this mapping directly without the need to read the file
into a temporary buffer.
V4 (this version)
Like V3 above we
mmap(2)
the file to be compressed. However we do this directlyin
nxt_http_static_send_ready()
where we we read the file to be compressed in.We also
mmap(2)
a temporary output file (created viamkstemp(3)
) initially sizedto the maximum compressed size of the file.
We then compress the input mmap'd file into the output mmap'd file, negating the
need for any intermediate buffering, saving extraneous copies and stack space (or
the overhead of heap allocations) for said buffer.
Finally, we
ftruncate(2)
this new output file to its actual size.This file is what is then used in
nxt_http_static_body_handler()
to do the outputbuffer allocations and so we don't need to make any modifications to that function.
This also allows us to correctly set the
Content-Length
header.Other than V1, these don't handle application responses which will need
handling separately.
What works
This supports; deflate, gzip, zstd & brotli compression. This is all opt-in, e.g
The compressors
src/nxt_{zlib,zstd,brotli}.c
themselves are nicely isolated from the Unit core and arejust the bare minimum required to do the actual compression.
Configuration may look like
This adds a new
settings.http.compression
config option. Under here we can define the mime-types we want to compress and what compressors we want to use. For each compressor we can set the compression level and the minimum length of content to compress.This could be extended for example to allow for per-compressor mime-type overrides.
Todo
As mentioned above this currently only handles compressing static share content. Compressing application responses needs to be handled separately.
While this tries to handle more complex
Accept-Encoding
headers e.ggzip;q=1.0, identity;q=0.5, *;q=0
this no doubt requires a little more work.