Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unparse: EscapeChar is wrongly set when quoteChar is changed #1068

Open
janisdd opened this issue Sep 29, 2024 · 0 comments
Open

unparse: EscapeChar is wrongly set when quoteChar is changed #1068

janisdd opened this issue Sep 29, 2024 · 0 comments

Comments

@janisdd
Copy link
Contributor

janisdd commented Sep 29, 2024

While looking into #1035 I noticed that the escapeChar is wrongly set when the quoteChar is changed. This is only the case for Papap.unparse.

Example:

Papa.unparse([['a', 'x+y']], {quoteChar: '+'})

which gives currently

a,+x""y+

As you can see, the escapeChar is still set to "" and not ++.

The issue was introduced in this commit: a627547#diff-fbb8d0b19dc4586fadd80f52a7b49415d2d41a5fdccaeaa728f7fca51cfbbb29R277

This can be easily be fixed when changing line https://github.com/mholt/PapaParse/blob/4af6882e0ea91b0baad30345f190b38d50e7535e/papaparse.js#L364C4-L364C29

if (typeof _config.quoteChar === 'string')
  _quoteChar = _config.quoteChar;

into

if (typeof _config.quoteChar === 'string') {
  _quoteChar = _config.quoteChar;
  _escapedQuote = _quoteChar + _quoteChar;
}

And here is a test-case for the UNPARSE_TESTS

{
  description: "Other quote char but without setting escape char (should be 2x the quote char)",
  input: [['a', 'x+y']],
  config: { quoteChar: "+"},
  expected: 'a,+x++y+'
}

However, fixing this will result in

a,x++y

The correct result would be

a,+x++y+

The other part of the issue is that in line

|| hasAny(escapedQuoteStr, Papa.BAD_DELIMITERS)

needsQuotes = needsQuotes
			|| _quotes === true
			|| (typeof _quotes === 'function' && _quotes(str, col))
			|| (Array.isArray(_quotes) && _quotes[col])
			|| hasAny(escapedQuoteStr, Papa.BAD_DELIMITERS)
			|| escapedQuoteStr.indexOf(_delimiter) > -1
			|| escapedQuoteStr.charAt(0) === ' '
			|| escapedQuoteStr.charAt(escapedQuoteStr.length - 1) === ' ';

Will always set needsQuotes=true if the field contains a " because Papa.BAD_DELIMITERS is defined as ['\r', '\n', '"', Papa.BYTE_ORDER_MARK].

There are different ways to fix this, here is one

  • make a new constants, e.g. Papa.NEED_QUOTES_CHARS=['\r', '\n'] and use it here instead of Papa.BAD_DELIMITERS
  • add an additional check whether the field contains the actual quoteChar, e.g. str.indexOf(_quoteChar) > -1

These changed will also fix issue #1035

At this point, a PR would probably be easier... However, this can be implemented in different ways, so I'm hesitant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant