Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Polymorphic insertAt for Text #350

Open
wants to merge 5 commits into
base: performance
Choose a base branch
from

Conversation

lightsprint09
Copy link
Member

This could resolve #326.

It introduces a the possibility to pass normal strings into insertAt.

Automerge.change(s1, doc => doc.text.insertAt(0, 'string'))

If one passes a string it gets split up into characters inside insertAt and processed as as list of single characters

@lightsprint09 lightsprint09 changed the base branch from main to performance April 30, 2021 13:10
frontend/text.js Outdated Show resolved Hide resolved
test/text_test.js Outdated Show resolved Hide resolved
@nemanja-tosic
Copy link
Contributor

What is the expected behavior for text.insertAt(0, 'abc')?

One test specifying "should support inserting a single element" is asserting that the text equals abc: https://github.com/automerge/automerge/blob/performance/test/typescript_test.ts#L327-L331

This test is in contradiction with the PR and should fail, and is only passing because there is an assertion on the stringified value of text, which will be 'abc' in either case - single element or multiple chars.

Another test is asserting that the text object contains an with the value of abc: https://github.com/automerge/automerge/blob/performance/test/typescript_test.ts#L355-L361

As a corollary to this, what should happen if i do:

text.insertAt(0, 'abc', 'abc', 'abc')

The PR does not cover this case and the outcome is

Text = ['a', 'b', 'c']

where the latter two strings are rejected without an error.

@lightsprint09
Copy link
Member Author

As a corollary to this, what should happen if i do:

text.insertAt(0, 'abc', 'abc', 'abc')

The PR does not cover this case and the outcome is

Text = ['a', 'b', 'c']

where the latter two strings are rejected without an error.

What do you expect to happen. Should we throw an error or support it it with a result of: Text = ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c']

@lightsprint09
Copy link
Member Author

What is the expected behavior for text.insertAt(0, 'abc')?

One test specifying "should support inserting a single element" is asserting that the text equals abc: https://github.com/automerge/automerge/blob/performance/test/typescript_test.ts#L327-L331

This test is in contradiction with the PR and should fail, and is only passing because there is an assertion on the stringified value of text, which will be 'abc' in either case - single element or multiple chars.

I am not sure about that. I don't know how smart TypeScript converts types

@nemanja-tosic
Copy link
Contributor

nemanja-tosic commented Apr 30, 2021

I'm not a maintainer so i wouldn't know.

I'm pointing out how the tests prior to the PR were handling the insertion of strings and it appears to me that the tests weren't asserting the same thing across the board. For example, https://github.com/automerge/automerge/blob/performance/test/typescript_test.ts#L327-L331, is specifying that inserting a string leads to an insertion of a single element, however the assertion doesn't check for that. This PR inserts a string one character at a time. So, it feels that there is a conflict between the previous specs and the behavior this PR introduces, or it's me who is misunderstanding the specs.

Given that, whether or not we throw an error is dependent upon the question "What is the expected behavior for text.insertAt(0, 'abc')?" :)

@josharian
Copy link
Contributor

The discussion in #326 seems to have landed at:

we have a single insertAt method as now, but have it accept an array of characters or a string (which gets split) as second argument

That doesn't seem to be what this PR implements. That plan should also make these calls unambiguous, because the sole arg (it's not variadic any more) is either a string or an array.

@lightsprint09
Copy link
Member Author

@josharian

So the call side would look like the following?

text.insertAt(0, ["a", "b", "c"])

or

text.insertAt(0, "abc")

@josharian
Copy link
Contributor

That's how I read it, yes.

@lightsprint09
Copy link
Member Author

I close this for now

@pvh
Copy link
Member

pvh commented May 2, 2021

Thanks for a good discussion of the situation and especially @nemanja-tosic and @josharian for review. We've definitely already agreed to do something like this so I hope you'll find a way to clean up this patch so we can merge it, @lightsprint09. If you need help, I have been meaning to work on this so just let me know.

@ept ept reopened this May 5, 2021
@ept
Copy link
Member

ept commented May 5, 2021

Thanks for taking the initiative for this change! I've updated the PR to allow to forms of Text.insertAt():

// means Automerge should split the string into individual characters
text.insertAt(index, 'hello')

// takes a string that has already been split;
// Automerge does not do any further splitting.
text.insertAt(index, ['h', 'e', 'l', 'l', 'o'])

Passing an array rather than using varargs avoids @josharian's problem of hitting the maximum number of arguments. Also, because the array form does not perform any further splitting, it is suitable for app developers who want to use custom splitting logic (e.g. splitting by Unicode grapheme cluster).

The API also needs to support inserting non-string values (such as marker objects for rich text). For now I've set it up so that the following two are equivalent:

text.insertAt(index, {bold: true})
text.insertAt(index, [{bold: true}])

It seems weird that those two calls do the same thing, but I'm not sure what a better alternative would be. One option would be to only allow the latter (where the control object appears inside an array), but that's a bit clumsy to use, and would be inconsistent with the API for inserting items into lists. What do folks think about this? Happy to have a little bit of bikeshedding…

@pvh
Copy link
Member

pvh commented May 5, 2021

I think I lean towards only supporting the latter: it either takes a string or an array of items to insert. All else is incorrect usage. This would prevent problems caused by accidentally passing other unrelated objects in.

@ept
Copy link
Member

ept commented May 5, 2021

@pvh If we require any non-string values to be wrapped in an array, should we also change the API for lists to be consistent? That API currently works like this:

let x = Automerge.change(Automerge.init(), doc => {
  doc.list = [1, 2, 4]
})
x = Automerge.change(x, doc => {
  doc.list.insertAt(2, 3) // insert the number 3 at index 2
})
// x is now { list: [ 1, 2, 3, 4 ] }

To be consistent with the Automerge.Text API we should then also change the list API so that you have to call doc.list.insertAt(2, [3]) if you want to insert the number 3 into a list — but that seems a bit weird to me. Moreover, it raises the question of what happens when you want to insert a nested list as a list element — would you then have to write doc.list.insertAt(2, [[...]])? This doesn't seem very nice. But having the Text and list APIs inconsistent with each other is not nice either.

@crn420
Copy link

crn420 commented May 7, 2021

polymorphic-insertAt

1 similar comment
@crn420
Copy link

crn420 commented May 7, 2021

polymorphic-insertAt

@crn420
Copy link

crn420 commented May 7, 2021

@lightsprint09 `

@crn420
Copy link

crn420 commented May 7, 2021

#350

@crn420
Copy link

crn420 commented May 7, 2021

Duplicate of #

@LiraNuna
Copy link

What is the expected behavior when splitting emojis? Given the code, I think it will split 👨‍👩‍👦‍👦 into '👨', ZWJ, '👩', ZWJ, '👦', ZWJ, '👦', is this desired?

@ept
Copy link
Member

ept commented Apr 20, 2022

Yes, this code will split text into Unicode scalar values, which means some emoji will get broken up. I have previously argued to use grapheme cluster segmentation, which would keep such emoji together as a single string. However, we discussed this recently in automerge/automerge#330 and came to the conclusion that it's okay to split into Unicode scalar values.

echarles pushed a commit to datalayer-externals/automerge-classic-arch that referenced this pull request Feb 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

accept strings in Automerge.Text.insertAt
7 participants