Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash on latin1 character #31

Open
Glandos opened this issue Jul 26, 2023 · 6 comments
Open

Crash on latin1 character #31

Glandos opened this issue Jul 26, 2023 · 6 comments
Labels
bug Something isn't working

Comments

@Glandos
Copy link

Glandos commented Jul 26, 2023

Welcome to the rude world of character encoding:

~ ❯ echo "é" | iconv -f utf8 -t latin1 | /tmp/tailspin/spin 
Error reading from stdin: stream did not contain valid UTF-8
thread 'main' panicked at 'Could not receive EOF signal from oneshot channel: RecvError(())', /home/compil/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tailspin-1.2.1/src/main.rs:107:10
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Here, the error message is clear, but I have logs with Latin1 encoded value, and the application crashes. It shouldn't :)
I suspect other encoding not compatible with UTF-8 generate the same crash.

@bensadeh
Copy link
Owner

Thanks for opening this issue.

How does your terminal handle non-utf8 encodings in general? Have you set any of these env variables: LANG, LC_ALL, LC_CTYPE, LC_COLLATE? And if so, what are they set to?

Are you able to cat or less the log files with no issues?

@Glandos
Copy link
Author

Glandos commented Jul 27, 2023

/tmp/tailspin ❯ echo "é" | iconv -t latin1 > test.log
/tmp/tailspin ❯ cat test.log 
�
/tmp/tailspin ❯ less test.log 
/tmp/tailspin ❯ xxd test.log 
00000000: e90a                                     ..
/tmp/tailspin ❯ ./spin test.log 
thread 'main' panicked at 'Could not receive EOF signal from oneshot channel: RecvError(())', /home/compil/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tailspin-1.2.1/src/main.rs:107:10
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
/tmp/tailspin [101] ❯ echo $LANG , $LC_ALL , $LC_CTYPE , $LC_COLLATE .
fr_FR.UTF-8 , , , .

You already guessed I'm French, giving the "chosen character" 😉

I hope this test is long enough. I have the same behavior with LANG=C by the way.

@bensadeh
Copy link
Owner

Ahh, I see now, thanks for explaining it to me in detail. I was afraid to start implementing specific encodings for non-utf8 types at this stage, but I agree that non utf8 encodings should at the very least not crash.

This is indeed a bug, and we should handle foreign characters gracefully. My goal is to have the same behavior as both cat and less, which in this case is "don't crash".

@bensadeh bensadeh added the bug Something isn't working label Jul 27, 2023
@Glandos
Copy link
Author

Glandos commented Jul 27, 2023

cat and less do nothing with encoding. cat transliterate with UNKNOWN_CHARACTER and less output an escape sequence with <E9> on a white background.

I guess the best behavior is:

  • detect it's not a valid UTF-8 character
  • in this case display the escaped byte sequence

@bensadeh
Copy link
Owner

bensadeh commented Sep 6, 2023

Since 1.3.0, tailspin gracefully handles different encodings similar to cat when using stdin. So this should resolve the explicit example above.

For reading files on disk, we use a library that provides the actual reading and tailing functionality, and at the moment it crashes on these non-utf-8 encodings. I've opened a ticket in the repo upstream and will implement a fix on our end once a solution is in place.

@caseymdk
Copy link

Chiming in to confirm this is still present, I just discovered it trying to tail some postgres log files.

Workaround as mentioned is simply to tail -f file.log | tspin rather than tspin file.log

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants