✨ [Widgets] Enable streaming in the conversational widget #486

Merged: 22 commits into main, Feb 22, 2024

Conversation

@SBrandeis (Contributor) commented on Feb 16, 2024:

Linked to #360 and #410.

Should unlock the conversational widget on Mistral if I'm not mistaken?

TL;DR

  • Leverage the inference types from @huggingface/tasks to type the input and output of the inference client
  • Use the inference client to call the serverless Inference API
  • Use the streaming API when the model supports it (see the sketch below)

@SBrandeis SBrandeis requested a review from coyotte508 February 16, 2024 16:24
@SBrandeis (Contributor Author) commented:
Screen recording showcase (first answer probably cached)

Screen.Recording.2024-02-16.at.17.30.31.mov

@julien-c (Member) left a comment:

wow, thanks for working on this.

This is starting to all fit together very nicely!

Base automatically changed from 8578-switch-conversational-to-text-generation to main February 20, 2024 09:31
@SBrandeis SBrandeis force-pushed the chat-widget-streaming branch from bfc7fbe to 79fcfa3 on February 20, 2024 10:17
@SBrandeis SBrandeis changed the title (wip) Enable streaming in the conversational widget ✨ [Widgets] Enable streaming in the conversational widget Feb 20, 2024
@SBrandeis SBrandeis marked this pull request as ready for review February 20, 2024 14:10
</a>
{#if $tgiSupportedModels?.has(model.id)}
<p class="text-xs text-gray-400">
Streaming with <a href="https://huggingface.co/docs/text-generation-inference" class="underline">TGI</a>
A Collaborator commented:

Suggested change
Streaming with <a href="https://huggingface.co/docs/text-generation-inference" class="underline">TGI</a>
Streaming with <a href="https://huggingface.co/docs/text-generation-inference" class="hover:underline">TGI</a>

Changed it to hover:underline.

@@ -85,7 +86,7 @@ export interface TextGenerationStreamOutput {
  * Use to continue text from a prompt. Same as `textGeneration` but returns generator that can be read one token at a time
  */
 export async function* textGenerationStream(
-	args: TextGenerationArgs,
+	args: BaseArgs & TextGenerationInput,
 	options?: Options
 ): AsyncGenerator<TextGenerationStreamOutput> {
@mishig25 (Collaborator) commented on Feb 21, 2024:
Should TextGenerationStreamOutput live in @huggingface/tasks/src/tasks/text-generation/inference just like TextGenerationInput & TextGenerationOutput?

@SBrandeis (Contributor Author) replied:

Good question - cc @Wauplin with whom we discussed that previously

The current philosophy is to not type the streaming mode, because it's transfer-specific, not inference-specific.

@Wauplin (Contributor) replied:

Indeed, we mentioned it here once: #468 (comment).

As a matter of fact, I came to the conclusion today that we should specify the stream parameter and the streamed output in our JS specs. I am currently starting to use the generated types in Python (see the ongoing PR), and for now I've kept text_generation apart since I'm missing TextGenerationStreamResponse (defined here, but I don't want to mix the auto-generated types with the previous definitions). I agree it's more "transfer-specific" than "inference-specific", but setting stream=True modifies the output format, so we need to document that somewhere.

…getWrapper/WidgetWrapper.svelte

Co-authored-by: Mishig <[email protected]>
@gary149 gary149 requested a review from krampstudio February 21, 2024 17:38
@gary149 (Collaborator) left a comment:

Really nice!

@krampstudio (Collaborator) left a comment:

Nice. Just one optional comment.

Comment on lines 39 to 41
if (!$tgiSupportedModels) {
$tgiSupportedModels = await getTgiSupportedModels(apiUrl);
}
@krampstudio (Collaborator) commented on Feb 21, 2024:

Do we need it for every widget?

If yes, it's OK to initialize it here; otherwise, you can do something like this:

in store.js

import { get, writable } from "svelte/store";

const tgiSupportedModels = writable<Set<string> | undefined>(undefined);

export async function getTgiSupportedModels(apiUrl: string) {
	if (!get(tgiSupportedModels)) {
		const response = await fetch(`${apiUrl}/framework/text-generation-inference`);
		const output = await response.json();
		if (response.ok) {
			tgiSupportedModels.set(
				new Set(
					(output as { model_id: string; task: string }[])
						.filter(({ task }) => task === "text-generation")
						.map(({ model_id }) => model_id)
				)
			);
		}
	}
	return tgiSupportedModels;
}

So inside a widget you always get the store with const tgiSupportedModels = await getTgiSupportedModels(apiUrl), and the data is fetched only when needed.

@SBrandeis (Contributor Author) replied:

so nice! thank you!

@SBrandeis SBrandeis requested a review from mishig25 February 22, 2024 14:46
</a>
<div class="flex gap-4 items-center mb-1.5">
<a
class={TASKS_DATA[task] ? "hover:underline" : undefined}
A Collaborator commented:

Suggested change
class={TASKS_DATA[task] ? "hover:underline" : undefined}
class:hover:underline={TASKS_DATA[task]}

Didn't test. Can we use the new Svelte syntax?

@mishig25 (Collaborator) left a comment:

lgtm!

@SBrandeis SBrandeis force-pushed the chat-widget-streaming branch from ae2b965 to 78196e5 on February 22, 2024 16:05
@SBrandeis SBrandeis merged commit bea807a into main Feb 22, 2024
2 checks passed
@SBrandeis SBrandeis deleted the chat-widget-streaming branch February 22, 2024 16:14
mishig25 pushed a commit that referenced this pull request May 16, 2024
### Fix `Model Loading` behaviour in Conversational Widget.

Follow-up to #486.
Due to #486, the Conversational widget differs from the other widgets in how it calls the HF Inference API: the Conversational widget uses the `@huggingface/inference` client while the other widgets use a regular `fetch`.

The equivalent of the lines below was missing in the Conversational widget:
https://github.com/huggingface/huggingface.js/blob/f2e9ce3c11822910d293ae3455e22bad093026a3/packages/widgets/src/lib/components/InferenceWidget/widgets/TextGenerationWidget/TextGenerationWidget.svelte#L144-L149

This PR adds that missing function.

#### Screencast


https://github.com/huggingface/huggingface.js/assets/11827707/2201c471-964f-4943-8455-8a51800ade12

Note: in the demo above, `mrfakename/refusal-old` is not supported by TGI, so the output is not streamed.