
♻️ [Tasks] JSON spec: text-generation #468

Merged
merged 8 commits into from
Feb 12, 2024

Conversation

SBrandeis
Contributor

@SBrandeis SBrandeis commented Feb 6, 2024

TL;DR

@SBrandeis SBrandeis requested review from Narsil and julien-c February 6, 2024 14:59
Comment on lines 27 to 29
* Best of
*/
doSample?: boolean;
bestOf?: number;
Contributor Author


@Narsil @OlivierDehaene I'm not sure what this parameter represents


do_sample: bool -> Whether the generation should be greedy (False) or sampled (True). (It cannot be set to True if any other sampling param is set, but sometimes you do not want to specify temperature, top_p, or anything else, so setting do_sample=True is there for that case.)

best_of: int -> Run n sampling queries, return the best (in terms of total logprob). Clashes with streaming.
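The interplay described above could be sketched as follows. This is a hypothetical TypeScript shape for illustration only (the `checkParams` helper and its messages are assumptions, not part of the merged spec):

```typescript
// Hypothetical sketch, not the merged spec: how doSample/bestOf relate
// to each other and to streaming, per the explanation above.
interface TextGenerationParameters {
  // true => sampling; false/undefined => greedy decoding
  doSample?: boolean;
  // run `bestOf` sampling queries and return the one with the highest
  // total logprob; clashes with streaming
  bestOf?: number;
}

// Returns an error message for invalid combinations, or null if OK.
function checkParams(params: TextGenerationParameters, stream: boolean): string | null {
  if (params.bestOf !== undefined && params.bestOf > 1) {
    if (stream) return "best_of clashes with streaming";
    if (params.doSample === false) return "best_of requires sampling (do_sample)";
  }
  return null;
}
```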

@osanseviero
Contributor

Given https://github.com/huggingface/moon-landing/pull/8723, do we really need the conversational spec?

@SBrandeis
Contributor Author

@osanseviero I think speccing the conversational API can still be useful for the widget and inference clients, wdyt?

@SBrandeis SBrandeis changed the title [Tasks] JSON spec: text-generation & conversational [Tasks] JSON spec: text-generation Feb 7, 2024
@SBrandeis SBrandeis marked this pull request as ready for review February 7, 2024 12:49
@SBrandeis SBrandeis changed the title [Tasks] JSON spec: text-generation ♻️ [Tasks] JSON spec: text-generation Feb 7, 2024
@SBrandeis SBrandeis requested a review from coyotte508 February 8, 2024 13:01
@SBrandeis SBrandeis force-pushed the add-missing-generate-params branch from a8a59e3 to 3cad908 Compare February 9, 2024 13:08
Contributor

@Wauplin Wauplin left a comment


Sorry for being late to the show here. I'm currently reviewing this PR :) (don't merge it too quickly 🙏)

Contributor

@Wauplin Wauplin left a comment


Tried to list all the differences I've spotted between these specs, the TGI documentation, and the Python client implementation. I think we should stick to the existing TGI parameters, since they are already in use and "newer" than the transformers-based parameters.

@@ -3,7 +3,6 @@
*
Contributor


(commenting on inference.ts but more related to ./spec)

TGI server has an OpenAPI documentation to document its inputs and outputs (see swagger ui and openapi.json). I've spotted a few differences compared to the specs here. Some of these properties are currently specific to TGI and not available in the "transformers-based inference" endpoints.

In TextGenerationParameters:

  • max_new_tokens (integer) => Maximum number of generated tokens.
  • repetition_penalty (number) => The parameter for repetition penalty. A value of 1.0 means no penalty. See this paper for more details.
  • seed (integer) => Random sampling seed.
  • stop (array of string) => Stop generating tokens if a member of stop is generated. (currently named stop_sequences in these specs / stop in TGI => same purpose => to harmonize?)
  • top_n_tokens (integer) => ??? (no idea what this is; we don't have it in the Python client either)
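The TGI-side fields listed above might be rendered like this in TypeScript. This is a hypothetical illustration mirroring TGI's OpenAPI names (the `toTgiStop` helper is an assumption showing one way to harmonize the `stop_sequences`/`stop` naming difference):

```typescript
// Hypothetical TypeScript rendering of the TGI parameters listed above;
// field names follow TGI's OpenAPI schema, not the spec in this PR.
interface TgiParameters {
  max_new_tokens?: number;     // maximum number of generated tokens
  repetition_penalty?: number; // 1.0 means no penalty
  seed?: number;               // random sampling seed
  stop?: string[];             // stop when a member of `stop` is generated
  top_n_tokens?: number;       // TGI-specific; purpose unclear (see above)
}

// One possible harmonization: map the spec's `stopSequences` onto
// TGI's `stop`, defaulting to an empty list.
function toTgiStop(specParams: { stopSequences?: string[] }): TgiParameters {
  return { stop: specParams.stopSequences ?? [] };
}
```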

In the Python client, the request payload contains a stream (bool) value alongside inputs (string) and parameters (TextGenerationParameters). If True, the output type is different (same as /generate_stream instead of /generate in TGI). I don't see it documented here, so I wonder if these docs might be a bit outdated (ping @Narsil, you might know more?). In any case, this option currently works and is used, so it's worth keeping.
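The payload layout described above could be illustrated like this. The type and helper names here are assumptions for illustration, not the client's actual types:

```typescript
// Illustration of the point above: `stream` sits next to `inputs` and
// `parameters` in the request payload and selects the TGI route.
interface TextGenerationRequest {
  inputs: string;
  parameters?: Record<string, unknown>;
  stream?: boolean;
}

// Streaming requests go to /generate_stream in TGI; otherwise /generate.
function routeFor(req: TextGenerationRequest): string {
  return req.stream ? "/generate_stream" : "/generate";
}
```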

In TextGenerationOutputDetails:

  • best_of_sequences (array of BestOfSequence) => Additional sequences when using the best_of parameter.

BestOfSequence (currently missing):

  • generated_text (string) => The generated text.
  • finish_reason (FinishReason) => The reason for the generation to finish, represented by a FinishReason value.
  • generated_tokens (integer) => The number of generated tokens in the sequence.
  • seed (integer) => The sampling seed if sampling was activated.
  • prefill (array of InputToken) => The decoder input tokens. Empty if decoder_input_details is False.
  • tokens (array of Token) => The generated tokens.

The BestOfSequence object is actually very similar to TextGenerationOutputDetails, just one level of nesting below.
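Following the field list above, the missing type might look like this. A hypothetical sketch: the FinishReason, InputToken, and Token stubs are assumptions, not the merged spec, and `isConsistent` is an illustrative helper:

```typescript
// Hypothetical shape for the missing BestOfSequence type, following
// the field list above.
type FinishReason = "length" | "eos_token" | "stop_sequence";
interface InputToken { id: number; text: string; logprob?: number }
interface Token { id: number; text: string; logprob: number; special: boolean }

interface BestOfSequence {
  generated_text: string;
  finish_reason: FinishReason;
  generated_tokens: number;
  seed?: number;          // only set if sampling was activated
  prefill: InputToken[];  // empty if decoder_input_details is false
  tokens: Token[];
}

// generated_tokens should agree with the length of `tokens`.
function isConsistent(seq: BestOfSequence): boolean {
  return seq.generated_tokens === seq.tokens.length;
}
```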

Contributor Author


I deliberately left out the stream parameter because it is API/transport-specific (not inference-specific).

I'm happy to revisit that if there's a consensus, though.

@SBrandeis SBrandeis requested a review from Wauplin February 9, 2024 16:34
Contributor

@Wauplin Wauplin left a comment


Thanks for making the changes @SBrandeis! Re-reviewed it, and it now looks good to me to use in the Python client. Fair enough about not adding the stream parameter (let's revisit if we realize we really need it at some point).

@SBrandeis SBrandeis merged commit ea2d471 into main Feb 12, 2024
1 of 2 checks passed
@SBrandeis SBrandeis deleted the add-missing-generate-params branch February 12, 2024 11:01
6 participants