better detection of charset utf-8 in html: #219

Merged
merged 3 commits into from
Jan 1, 2025

@ikreymer ikreymer commented Jan 1, 2025

  • detect if <meta charset='utf-8'> is set and switch parsing to utf-8, if not already
  • parse first 512 bytes of html buffer first in case <meta charset> is present in the beginning, so that subsequent parsing may use the correct encoding
  • fixes Possible encoding issue with some &nbsp; #218

- detect if <meta charset='utf-8'> is set and switch parsing to utf-8 if not already
- parse first 1k of html buffer first in case <meta charset> is present and in first 1k bytes, and subsequent parsing may use the correct encoding
- fixes #218
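
The approach described above (sniff a short prefix of the HTML buffer for a charset declaration, then decode with utf-8 if one is found) can be sketched roughly as follows. This is an illustrative sketch only, not the code from this PR: the names `SNIFF_BYTES`, `sniff_meta_charset`, and `decode_html`, the regular expression, and the `latin-1` fallback are all assumptions made for the example.

```python
import re
from typing import Optional

# Hypothetical prefix size; the description mentions 512 and the commit message 1k.
SNIFF_BYTES = 1024

# Assumed pattern: a charset=... value anywhere inside a <meta ...> tag.
META_CHARSET_RE = re.compile(
    rb"""<meta[^>]*?charset\s*=\s*["']?\s*([\w-]+)""", re.IGNORECASE
)


def sniff_meta_charset(buf: bytes, limit: int = SNIFF_BYTES) -> Optional[str]:
    """Return the lowercased charset declared in the first `limit` bytes, if any."""
    match = META_CHARSET_RE.search(buf[:limit])
    return match.group(1).decode("ascii").lower() if match else None


def decode_html(buf: bytes, default_encoding: str = "latin-1") -> str:
    """Decode as utf-8 when the prefix declares it, else use the assumed default."""
    encoding = default_encoding
    if sniff_meta_charset(buf) == "utf-8":
        encoding = "utf-8"
    return buf.decode(encoding, errors="replace")
```

A UTF-8 non-breaking space is the byte pair `C2 A0`; decoded as `latin-1` it shows up as `Â` followed by a non-breaking space, which may be the kind of symptom reported in #218. With the sketch above, a buffer that declares `<meta charset="utf-8">` within the sniffed prefix is decoded as utf-8 instead, while a declaration that appears after the prefix is not detected.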
@ikreymer ikreymer merged commit 1259d09 into main Jan 1, 2025
2 checks passed
Development

Successfully merging this pull request may close these issues.

Possible encoding issue with some &nbsp;