Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement encoding in .rules #2319

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jokesper
Copy link

@jokesper jokesper commented Jan 21, 2025

An implementation for encoding in rules file:
The cases are:

  • -f *.rules
  • -f *.csv
  • -f *.csv --rules *.rules

The second and third case are effectively the same. The problem is that for them we call a Readers rReadFn value taking the contents rather then the filepath / handle.
So we either would have to change Reader to take a FilePath instead of Text, which would break how we do Tests,
or we add an extra field rGetContents :: FilePath -> IO Text.
I would rather get some feedback on this before I implement a potentially bigger change.

If implemented this PR should:

We use `text-icu` as [recommended by `text`](https://hackage.haskell.org/package/text-2.1.2/docs/Data-Text-Encoding.html):

> To gain access to a much larger family of encodings, use the `text-icu` package.
@jokesper jokesper force-pushed the feat-rules-encoding branch from e30d9ad to 5966c8a Compare January 21, 2025 22:43
@jokesper jokesper changed the title Implement encoding for -f *.rules Implement encoding in .rules Jan 21, 2025
@simonmichael
Copy link
Owner

Thanks for exploring this. It's probably a good idea. But what if rules files made it easier to run a preprocessor, eg

source !iconv -f CP1250 FILE.csv

That requires a little more expertise from the user, I guess ?

@simonmichael simonmichael added A-WISH Some kind of improvement request, hare-brained proposal, or plea. i18n Internationalisation/localisation-related. csv The csv file format, csv output format, or generally CSV-related. labels Jan 22, 2025
@simonmichael
Copy link
Owner

The case of reading CSV from stdin might also be worth thinking about.

@jokesper
Copy link
Author

But what if rules files made it easier to run a preprocessor

I like the idea, though implementing it with the other cases is a bit wierd, since we ignore source with -f.

We could split it into two options with the other being preprocessor iconv -f CP1250 {}.
This has the drawback of either reserving a replacement or allowing one to specify it.

The case of reading CSV from stdin might also be worth thinking about.

Is it even possible to read a csv from stdin?

  • -f - --rules *.rules doesn't work, since we first figure out whether it's a csv based on the file ending.
  • source - doesn't work, since it searches for - as a file and can't find it.
  • source /dev/stdin works, but is a bit hacky.

Though if they would work, they would be handled the same as the cases mentioned above.
Opening the handle and reading from it are indpendent.

@simonmichael
Copy link
Owner

simonmichael commented Jan 22, 2025 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-WISH Some kind of improvement request, hare-brained proposal, or plea. csv The csv file format, csv output format, or generally CSV-related. i18n Internationalisation/localisation-related.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CSV import rules, encoding
2 participants