✨ [Tasks] JSON Schema spec for Inference types + TS type generation #449
Ping @coyotte508 for visibility
looks 😍😍😍
Nice!
/**
 * Inputs for Audio Classification inference
 */
export interface AudioClassificationInput {
Audio data is usually passed through as raw data
https://huggingface.co/docs/api-inference/detailed_parameters#audio-classification-task
Re-flagging this comment in case it was lost
Same for images. The JSON Schema cannot specify this, since sending raw data and sending JSON are two different things, so for now it's a bit of a blind spot. If we provide an OpenAPI schema for our APIs in the future, it will be possible to document it. OpenAPI integrates easily with JSON Schema, so having these schemas is already a good first step.
(The difference between a JSON Schema as in this PR and an OpenAPI description is that this PR describes objects with their attributes, while an OpenAPI description would also include things like server routes, accepted headers, etc.)
(^ only my understanding of the specs, anyone feel free to correct me 😄)
Yes, sorry for the delay in answering.
Leaving the image/audio data as `unknown` was intentional, to give more flexibility to the libraries.
Image & audio data can be passed in several different forms (raw binary data, path to a local or remote file, base64-encoded data...) and I did not want to constrain downstream users of those types to one single representation.
> (The difference between a JSON Schema as in this PR and an OpenAPI description is that this PR describes objects with their attributes, while an OpenAPI description would also include things like server routes, accepted headers, etc.)

Yes, that is correct. There will be some additional work necessary to generate an OpenAPI spec for an inference API (including actually specifying how we expect the binary data to be represented).
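To illustrate why the generated types leave media input as `unknown`, here is a minimal sketch of how a client library might normalize the several representations mentioned above into one tagged union. The `MediaInput` type, the `normalizeMediaInput` name, and the string heuristics (data URL means base64, `http(s)://` means remote file, anything else means local path) are all hypothetical; they are not part of this PR.

```typescript
// Hypothetical normalized forms for media input left as `unknown` in the generated types.
type MediaInput =
  | { kind: "bytes"; data: Uint8Array }
  | { kind: "url"; url: string }
  | { kind: "path"; path: string }
  | { kind: "base64"; data: string };

// Heuristic classification (assumption); a real library would need stricter rules.
function normalizeMediaInput(input: unknown): MediaInput {
  if (input instanceof Uint8Array) {
    return { kind: "bytes", data: input };
  }
  if (input instanceof ArrayBuffer) {
    return { kind: "bytes", data: new Uint8Array(input) };
  }
  if (typeof input === "string") {
    // Data URLs carry a base64 payload, e.g. "data:audio/wav;base64,AAAA".
    const match = /^data:[^;,]*;base64,([A-Za-z0-9+\/=]*)$/.exec(input);
    if (match) {
      return { kind: "base64", data: match[1] };
    }
    if (/^https?:\/\//i.test(input)) {
      return { kind: "url", url: input };
    }
    // Anything else is treated as a local file path (assumption).
    return { kind: "path", path: input };
  }
  throw new TypeError(`Unsupported media input of type ${typeof input}`);
}
```

Each library can pick its own heuristics on top of `unknown`, which is the flexibility the reply above argues for.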
@@ -216,6 +216,7 @@ export interface TaskData {
  datasets: ExampleRepo[];
  demo: TaskDemo;
  id: PipelineType;
  canonicalId?: PipelineType;
Added this property to express one task being a "subtask" of another (e.g., summarization being a subtask of text2text-generation).
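A minimal sketch of how a consumer could use `canonicalId` to map a subtask to the task it derives from. `TaskInfo` is a hypothetical stand-in for `TaskData` with only the two relevant fields, and plain `string` stands in for `PipelineType`; following the chain (rather than a single hop) and the cycle check are defensive choices of this sketch, not behavior described in the PR.

```typescript
// Hypothetical stand-in for TaskData: only the fields relevant to canonicalId.
interface TaskInfo {
  id: string;
  canonicalId?: string;
}

// Follow canonicalId links until reaching a task that has none.
function resolveCanonicalTask(id: string, tasks: Record<string, TaskInfo>): string {
  const seen = new Set<string>();
  let current = id;
  while (!seen.has(current)) {
    seen.add(current);
    const next = tasks[current]?.canonicalId;
    if (next === undefined) {
      return current; // no parent task: this is the canonical one
    }
    current = next;
  }
  throw new Error(`Cycle in canonicalId chain starting at ${id}`);
}
```

With `summarization` declaring `canonicalId: "text2text-generation"`, resolving `summarization` yields `text2text-generation`, while a task without a `canonicalId` resolves to itself.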
1b9c6e2 to 6b10c4d
I added a "post-process" script using the TypeScript API to generate the appropriate array type while glideapps/quicktype#2481 is being handled.
/**
 * The function to apply to the model outputs in order to retrieve the scores.
 */
functionToApply?: AudioClassificationOutputTransform;
This is not supported by any library, AFAIK.
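For context on what `functionToApply` would do if a library did support it, here is a sketch of the usual logits-to-scores transforms. The value set `"sigmoid" | "softmax" | "none"` is an assumption modeled on the `function_to_apply` option of text classification in transformers; the actual members of `AudioClassificationOutputTransform` are not shown in this diff.

```typescript
// Assumed value set for the output transform (not confirmed by the diff).
type OutputTransform = "sigmoid" | "softmax" | "none";

// Turn raw model logits into scores according to the chosen transform.
function applyTransform(logits: number[], fn: OutputTransform): number[] {
  switch (fn) {
    case "sigmoid":
      // Independent per-label probabilities (multi-label setting).
      return logits.map((x) => 1 / (1 + Math.exp(-x)));
    case "softmax": {
      // Subtract the max logit for numerical stability before exponentiating.
      const max = Math.max(...logits);
      const exps = logits.map((x) => Math.exp(x - max));
      const sum = exps.reduce((a, b) => a + b, 0);
      return exps.map((e) => e / sum);
    }
    case "none":
      // Return the raw logits unchanged (copied to avoid aliasing).
      return logits.slice();
  }
}
```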
packages/tasks/src/tasks/automatic-speech-recognition/inference.ts
"items": {
  "description": "The output depth labels"
}
IIRC, the output is a dictionary with two entries: `depth`, which is a depth estimation image, and `predicted_depth`, which is the tensor. See https://huggingface.co/docs/transformers/main/tasks/monocular_depth_estimation
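A sketch of the output shape the comment above describes, as opposed to an array of labels. The interface name is hypothetical, the snake_case keys follow the transformers pipeline output named in the comment, `depth` is left `unknown` like other image data, and representing `predicted_depth` as a 2D array of floats is an assumption made for illustration.

```typescript
// Hypothetical depth-estimation output: an image plus the raw depth tensor.
interface DepthEstimationOutput {
  depth: unknown; // depth map rendered as an image; representation left open
  predicted_depth: number[][]; // raw tensor, assumed here to be height x width floats
}

// Runtime check that a value has the shape described above.
function isDepthEstimationOutput(value: unknown): value is DepthEstimationOutput {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const v = value as { [key: string]: unknown };
  const pd = v["predicted_depth"];
  return (
    "depth" in v &&
    Array.isArray(pd) &&
    pd.every((row) => Array.isArray(row) && row.every((x) => typeof x === "number"))
  );
}
```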
/**
 * The answer to the question.
 */
answer: string;
end: number;
/**
 * The probability associated to the answer.
 */
score: number;
start: number;
/**
 * The index of each word/box pair that is in the answer
 */
I guess the alphabetical order is a bit weird with the docstrings: we have `The answer to the question.`, then `answer`, then `end`, and only much later `start`.
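For comparison, a hand-ordered version of the same fields, where each docstring sits directly above its field and the span fields are grouped. The interface name and the `formatAnswer` helper are hypothetical, the field that follows the last docstring in the diff is omitted because its name is not shown, and the units of `start`/`end` are deliberately left unspecified.

```typescript
// Hypothetical hand-ordered variant of the answer type under review.
interface QuestionAnsweringAnswer {
  /** The answer to the question. */
  answer: string;
  /** The probability associated to the answer. */
  score: number;
  /** Start of the answer span (unit not specified in the diff). */
  start: number;
  /** End of the answer span (unit not specified in the diff). */
  end: number;
}

// Small helper that reads the fields in the same logical order.
function formatAnswer(a: QuestionAnsweringAnswer): string {
  return `${a.answer} (score ${a.score.toFixed(2)}, span ${a.start}-${a.end})`;
}
```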
parameters?: { [key: string]: unknown };
[property: string]: unknown;
None of this would work out of the box with the `sentence_transformers` API, but I guess we can add it later if needed.
"$id": "/inference/schemas/feature-extraction/output.json",
"$schema": "http://json-schema.org/draft-06/schema#",
"description": "The embedding for the input text, as a nested list (tensor) of floats",
"type": "array",
Note: it's an array in sentence-transformers (one embedding per input), a list of lists in transformers (one embedding per token), and a list of lists of lists in the Inference API (for batching), IIRC.
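Since the same schema has to cover three nesting depths, a consumer may want to normalize them. Below is a sketch that converts any of the three shapes to batch x tokens x dim. The type names and `toBatched` are hypothetical, and the sketch assumes a 2D output means "one embedding per token for a single input"; in practice a 2D array could also be a batch of sentence embeddings, so the shape alone is ambiguous.

```typescript
// The three nesting depths mentioned above (names are illustrative).
type Embedding1D = number[]; // one embedding per input
type Embedding2D = number[][]; // one embedding per token
type Embedding3D = number[][][]; // batched: per input, per token

type FeatureExtractionOutput = Embedding1D | Embedding2D | Embedding3D;

// Normalize to batch x tokens x dim by wrapping lower-rank outputs.
function toBatched(output: FeatureExtractionOutput): Embedding3D {
  if (output.length === 0) {
    return [];
  }
  const first = output[0];
  if (typeof first === "number") {
    // 1D: treat as a single input with a single pooled "token".
    return [[output as Embedding1D]];
  }
  if (first.length === 0 || typeof first[0] === "number") {
    // 2D: assumed to be per-token embeddings of a single input.
    return [output as Embedding2D];
  }
  return output as Embedding3D;
}
```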
@@ -0,0 +1,12 @@
{
Note that this one is not exported
399f484 to 49a8151
/**
 * Parametrization of the text generation process
 */
generate?: GenerationParameters;
We should also support forward params so we can pass things such as `speaker_embeddings` in SpeechT5: https://huggingface.co/microsoft/speecht5_tts
Let's do it as a follow-up
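A sketch of what the follow-up could look like: a separate bag of forward params next to `generate`, so model-specific inputs such as SpeechT5's `speaker_embeddings` have a place to go. The `forwardParams` field name, the `TextToAudioParameters` interface, and the `{ inputs, parameters }` payload shape are assumptions for illustration, not part of this PR.

```typescript
// Subset of generation parameters; extra keys allowed like the generated types.
interface GenerationParameters {
  temperature?: number;
  [property: string]: unknown;
}

// Hypothetical parameters: `forwardParams` is the proposed addition.
interface TextToAudioParameters {
  generate?: GenerationParameters;
  forwardParams?: { [key: string]: unknown };
}

// Build a request payload, only including parameter groups that were provided.
function buildPayload(inputs: string, parameters: TextToAudioParameters = {}) {
  const payload: { inputs: string; parameters?: TextToAudioParameters } = { inputs };
  const cleaned: TextToAudioParameters = {};
  if (parameters.generate) cleaned.generate = parameters.generate;
  if (parameters.forwardParams) cleaned.forwardParams = parameters.forwardParams;
  if (Object.keys(cleaned).length > 0) payload.parameters = cleaned;
  return payload;
}
```

Keeping forward params in their own object avoids mixing model-input tensors with sampling knobs such as `temperature`.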
/**
 * I can be the papa you'd be the mama
 */
temperature?: number;
Should we add the others?
I have added a bunch in 826181a - there are still a lot of other parameters to add
@@ -0,0 +1,53 @@
/**
 * Inference code generated from the JSON schema spec in ./spec
Should we use this opportunity to unify `text-generation` and `text2text-generation`?
Yes, probably
/**
 * The strategy used to fuse tokens based on model predictions
 */
aggregationStrategy?: TokenClassificationAggregationStrategy;
A bit strange, as this is actually a load parameter, not an inference parameter; see https://huggingface.co/docs/transformers/main/en/main_classes/pipelines#transformers.TextToAudioPipeline
Ah, you're right, but shouldn't it be supported by the call method too?
I would expect yes, but maybe it changes how the model is loaded?
whoop whoop 🚀
Follow-up to #449. Review with whitespace off.

TL;DR
- `quicktype-core` as a dev dependency (from our fork of quicktype: https://github.com/huggingface/quicktype/releases/tag/pack-18.0.15)

TODO
- `text2text-generation` task to serve as a "canonical reference" for `summarization` & `translation`
- `text-to-audio` task to serve as a "canonical reference" for `text-to-speech`
- `any` types to `unknown`
- `sentence-similarity`
- `feature-extraction` -> Let's do that later?
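The point of the "`any` types to `unknown`" item, in a nutshell: a value typed `any` can be used in any way without checks (e.g. `value.toUpperCase()` compiles even when the value is a number), while `unknown` forces the consumer to narrow it first. A minimal sketch with a hypothetical `describeParameter` helper:

```typescript
// With `unknown`, each use must be preceded by a type check the compiler can see.
function describeParameter(value: unknown): string {
  if (typeof value === "string") return `string(${value.length})`;
  if (typeof value === "number") return `number(${value})`;
  if (Array.isArray(value)) return `array(${value.length})`;
  if (value === null) return "null";
  // Remaining cases: boolean, undefined, object, function, bigint, symbol.
  return typeof value;
}
```

Calling `value.length` before the `typeof value === "string"` check would be a compile error here, which is exactly the safety the generated types gain from `unknown`.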