can't get any other compression library to recognize this format #15

Open
thejoshwolfe opened this issue Apr 9, 2016 · 1 comment

Using python:

>>> import zlib
>>> zlib.decompress("w7NIw43DicOJBwA=".decode("base64"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
zlib.error: Error -3 while decompressing data: incorrect header check

using node.js:

> zlib.inflateSync(new Buffer("w7NIw43DicOJBwA=", "base64"))
Error: incorrect header check
...
> zlib.inflateRawSync(new Buffer("w7NIw43DicOJBwA=", "base64"))
Error: invalid distance too far back
...
> zlib.gunzipSync(new Buffer("w7NIw43DicOJBwA=", "base64"))
Error: incorrect header check
...

But I can get Python's compressed blobs to be accepted by node's inflateSync. (And I have experience using Python for PNG formatting, and node for ZIP file formatting.)

Is this project compliant with the DEFLATE spec?

grawity commented Jul 4, 2019

Looks like the compressor/decompressor works fine internally; the problem is with test/base64.js. When you use the provided module to base64-encode the compressed data, it treats the input as Unicode code points instead of raw bytes.

  • For example, compressing "Hello" produces bytes [f3 48 cd c9 c9 07 00].

  • However, Base64.toBase64() treats the input as Unicode characters [U+00F3 U+0048 U+00CD U+00C9 U+00C9 U+0007 U+0000] and uses UTF-8 to encode them to bytes, resulting in U+00F3 ⇒ [c3 b3], U+0048 ⇒ [48], U+00CD ⇒ [c3 8d], and so on.

  • So after the UTF-8 encoding, you get [c3 b3 48 c3 8d c3 89 c3 89 07 00].
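The three steps above can be reproduced outside the project. The following is a minimal Python 3 sketch, not the code from test/base64.js: `mangle_as_text` is a hypothetical helper that only imitates the described behaviour (bytes read as code points, then UTF-8 encoded).

```python
import base64

# Raw DEFLATE bytes for "Hello", as quoted in the list above.
raw = bytes.fromhex("f348cdc9c90700")


def mangle_as_text(data: bytes) -> bytes:
    """Hypothetical stand-in for the bug: read each byte as a Unicode
    code point (Latin-1), then encode the resulting string as UTF-8."""
    return data.decode("latin1").encode("utf-8")


mangled = mangle_as_text(raw)
print(mangled.hex())                              # c3b348c38dc389c3890700
print(base64.b64encode(mangled).decode("ascii"))  # w7NIw43DicOJBwA=
```

The second printed value is the string from the original report, which suggests the reported blob is the UTF-8-expanded form of the correct bytes.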

To recover incorrectly encoded data, do the opposite: run it through a UTF-8 decoder and write the resulting code points back out as single bytes (Latin-1):

> bad_buf = Buffer.from("w7NIw43DicOJBwA=", "base64");
<Buffer c3 b3 48 c3 8d c3 89 c3 89 07 00>
> good_buf = Buffer.from(bad_buf.toString("utf8"), "latin1");
<Buffer f3 48 cd c9 c9 07 00>
> zlib.inflateRawSync(good_buf).toString("latin1")
'Hello'

Or from a shell:

$ cat bad_data | base64 -d | iconv -f utf8 -t latin1 | base64 > good_data
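For anyone reproducing this from Python rather than node, the same recovery can be sketched in Python 3. `recover` is a hypothetical helper name, and the sketch assumes the payload is raw DEFLATE with no zlib or gzip header, which is why a negative window-bits value is passed to `zlib.decompress`.

```python
import base64
import zlib


def recover(b64_text: str) -> bytes:
    """Undo the accidental UTF-8 expansion: base64-decode, decode the
    result as UTF-8, then map each code point back to a single byte."""
    mangled = base64.b64decode(b64_text)
    return mangled.decode("utf-8").encode("latin1")


fixed = recover("w7NIw43DicOJBwA=")
print(fixed.hex())                  # f348cdc9c90700
# wbits=-15 asks zlib for a raw DEFLATE stream (no header or checksum).
print(zlib.decompress(fixed, -15))  # b'Hello'
```

A plain `zlib.decompress(fixed)` would still fail with "incorrect header check", because the default expects a zlib wrapper around the DEFLATE data.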
