✨ [Widgets] Enable streaming in the conversational widget #486
Conversation
Screen recording showcase (first answer probably cached): Screen.Recording.2024-02-16.at.17.30.31.mov
wow, thanks for working on this.
This is starting to all fit together very nicely!
Force-pushed from bfc7fbe to 79fcfa3 (Compare)
Resolved review threads (outdated):
- packages/widgets/src/lib/components/InferenceWidget/shared/WidgetHeader/WidgetHeader.svelte
- packages/widgets/src/lib/components/InferenceWidget/shared/WidgetWrapper/WidgetWrapper.svelte
- packages/widgets/src/lib/components/InferenceWidget/shared/WidgetWrapper/WidgetWrapper.svelte
```svelte
</a>
{#if $tgiSupportedModels?.has(model.id)}
	<p class="text-xs text-gray-400">
		Streaming with <a href="https://huggingface.co/docs/text-generation-inference" class="underline">TGI</a>
```
Suggested change:
```diff
-Streaming with <a href="https://huggingface.co/docs/text-generation-inference" class="underline">TGI</a>
+Streaming with <a href="https://huggingface.co/docs/text-generation-inference" class="hover:underline">TGI</a>
```
Changed it to `hover:underline`.
cc @gary149
```diff
@@ -85,7 +86,7 @@ export interface TextGenerationStreamOutput {
  * Use to continue text from a prompt. Same as `textGeneration` but returns generator that can be read one token at a time
  */
 export async function* textGenerationStream(
-	args: TextGenerationArgs,
+	args: BaseArgs & TextGenerationInput,
 	options?: Options
 ): AsyncGenerator<TextGenerationStreamOutput> {
```
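For context, a minimal sketch of how this generator is consumed, reading one token at a time (the access token and model name below are placeholders, not taken from the PR):

```ts
import { textGenerationStream } from "@huggingface/inference";

// Stream tokens as they arrive instead of waiting for the full completion.
for await (const output of textGenerationStream({
	accessToken: "hf_...", // placeholder
	model: "mistralai/Mistral-7B-Instruct-v0.2", // placeholder
	inputs: "What is token streaming?",
	parameters: { max_new_tokens: 50 },
})) {
	// Each TextGenerationStreamOutput carries the newly generated token;
	// generated_text is only filled in on the final chunk.
	process.stdout.write(output.token.text);
}
```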
Should `TextGenerationStreamOutput` live in `@huggingface/tasks/src/tasks/text-generation/inference`, just like `TextGenerationInput` & `TextGenerationOutput`?
Good question - cc @Wauplin with whom we discussed that previously
The current philosophy is to not type the streaming mode because it's transfer-specific, not inference-specific
Indeed, we mentioned it here once: #468 (comment). As a matter of fact, I came to the conclusion today that we should specify the `stream` parameter and the streamed output in our JS specs. I am currently starting to use the generated types in Python (see ongoing PR), and for now I've kept `text_generation` apart since I'm missing `TextGenerationStreamResponse` (defined here, but I don't want to mix the auto-generated types with the previous definitions). I agree it's more "transfer-specific" than "inference-specific", but the thing is that setting `stream=True` modifies the output format, so we need to document that somewhere.
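To illustrate that point, a rough sketch (approximate shapes only, not the exact published types) of how streaming changes what the caller receives:

```ts
// Approximate shapes for illustration; see the actual definitions in the repo.

// stream=false: a single object containing the full completion.
type NonStreamingOutputSketch = { generated_text: string };

// stream=true: a sequence of chunks, one per token; the full text only
// arrives on the final chunk, so the output format itself is different.
type StreamingChunkSketch = {
	token: { id: number; text: string; special: boolean };
	generated_text: string | null;
};
```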
…getWrapper/WidgetWrapper.svelte Co-authored-by: Mishig <[email protected]>
Really nice!
Nice.
Just one optional comment
```svelte
if (!$tgiSupportedModels) {
	$tgiSupportedModels = await getTgiSupportedModels(apiUrl);
}
```
Do we need it for every widget? If yes, it's fine to initialize it here; otherwise, you can do something like this in `store.ts`:

```ts
// store.ts
import { get, writable } from "svelte/store";

const tgiSupportedModels = writable<Set<string> | undefined>(undefined);

// Lazily fetches the list of TGI-supported models; subsequent calls reuse the cached set.
export async function getTgiSupportedModels(apiUrl: string): Promise<typeof tgiSupportedModels> {
	if (!get(tgiSupportedModels)) {
		const response = await fetch(`${apiUrl}/framework/text-generation-inference`);
		const output = await response.json();
		if (response.ok) {
			tgiSupportedModels.set(
				new Set(
					(output as { model_id: string; task: string }[])
						.filter(({ task }) => task === "text-generation")
						.map(({ model_id }) => model_id)
				)
			);
		}
	}
	return tgiSupportedModels;
}
```
So inside widgets you always get the store with `const tgiSupportedModels = await getTgiSupportedModels(apiUrl)`, and the data is fetched only when needed.
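A minimal sketch of the widget-side call under that suggestion (the helper name and import path are illustrative, not from the PR):

```ts
import { get } from "svelte/store";
import { getTgiSupportedModels } from "./store"; // path is an assumption

// Illustrative helper: resolves whether a given model can stream via TGI.
// The shared store is populated on the first call; later calls reuse the cached set.
async function isTgiSupported(apiUrl: string, modelId: string): Promise<boolean> {
	const tgiSupportedModels = await getTgiSupportedModels(apiUrl);
	return get(tgiSupportedModels)?.has(modelId) ?? false;
}
```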
so nice! thank you!
```svelte
</a>
<div class="flex gap-4 items-center mb-1.5">
	<a
		class={TASKS_DATA[task] ? "hover:underline" : undefined}
```
Suggested change:
```diff
-class={TASKS_DATA[task] ? "hover:underline" : undefined}
+class:hover:underline={TASKS_DATA[task]}
```
Didn't test. Can we use the new Svelte syntax?
lgtm !
Force-pushed from ae2b965 to 78196e5 (Compare)
### Fix `Model Loading` behaviour in Conversational Widget

Follow-up to #486. Due to #486, the Conversational Widget differs from the other widgets in how it calls the HF api-inference: the Conversational Widget uses the `@huggingface/inference` client while the other widgets use a regular `fetch`. The equivalent of the lines below was missing in the Conversational Widget:

https://github.com/huggingface/huggingface.js/blob/f2e9ce3c11822910d293ae3455e22bad093026a3/packages/widgets/src/lib/components/InferenceWidget/widgets/TextGenerationWidget/TextGenerationWidget.svelte#L144-L149

This PR adds the missing function.

#### Screen cast

https://github.com/huggingface/huggingface.js/assets/11827707/2201c471-964f-4943-8455-8a51800ade12

Note: in the demo above, `mrfakename/refusal-old` is not supported by TGI, therefore the output is not streamed.
Linked to #360 #410
Should unlock the conversational widget on Mistral if I'm not mistaken?
TL;DR: use `@huggingface/tasks` to type the input and output of the inference client.
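As a rough illustration of that TL;DR (assuming the task types are re-exported from the package root; token and model name are placeholders):

```ts
import { HfInference } from "@huggingface/inference";
import type { TextGenerationInput } from "@huggingface/tasks";

const hf = new HfInference("hf_..."); // placeholder token

// The task-spec type constrains what we send to the inference client.
const input: TextGenerationInput = {
	inputs: "Once upon a time",
	parameters: { max_new_tokens: 20 },
};

const output = await hf.textGeneration({
	model: "mistralai/Mistral-7B-Instruct-v0.2", // placeholder model
	...input,
});
console.log(output.generated_text);
```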