[Inference] compatibility with third-party Inference providers #1077

Merged: 25 commits merged into main from inference-providers on Jan 9, 2025

Conversation

julien-c (Member) commented on Dec 16, 2024:

TL;DR

Allow users to request 3rd-party inference providers (Sambanova, Replicate, Together, Fal) with @huggingface/inference for a curated set of models on the HF Hub.

For now, requesting a 3rd-party inference provider requires users to pass an API key from that provider as a parameter to the inference function.
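For illustration, a minimal sketch of what such a call could look like; the provider option and the way the key is passed are assumptions inferred from this PR's description, not necessarily the merged API:

import { HfInference } from "@huggingface/inference";

// Sketch: pass the 3rd-party provider's own API key as the access token
// and select the provider on the call (assumed option names).
const client = new HfInference("YOUR_SAMBANOVA_API_KEY");

const res = await client.chatCompletion({
	model: "meta-llama/Llama-3.1-8B-Instruct", // HF Hub model ID from the curated set
	provider: "sambanova", // one of the supported 3rd-party providers
	messages: [{ role: "user", content: "Hello!" }],
});
console.log(res.choices[0].message.content);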

@HuggingFaceDocBuilderDev (bot) commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

julien-c (Member, Author) commented:

cc @xenova fyi

coyotte508 (Member) commented:

LGTM. When I have time I'll look at the tests.

julien-c (Member, Author) commented:

@coyotte508 I think I can use VCR tapes, looks like a cool feature.

coyotte508 (Member) commented:

cc @Aschen ⬆️ :)

julien-c (Member, Author) commented on lines +120 to +121:

/// TODO we will proxy the request server-side (using our own keys) and handle billing for it on the user's HF account.
throw new Error("Inference proxying is not implemented yet");
@SBrandeis self-assigned this on Jan 6, 2025
@SBrandeis force-pushed the inference-providers branch from bf47b3b to 238567a on January 7, 2025 at 14:25
@SBrandeis force-pushed the inference-providers branch from daec1eb to 3939438 on January 8, 2025 at 15:42
@SBrandeis self-requested a review on January 9, 2025 at 11:33
coyotte508 (Member) left a review:

Nice!

coyotte508 (Member) commented on lines +71 to +81 (Jan 9, 2025):

switch (provider) {
	case "replicate":
		return REPLICATE_MODEL_IDS[model];
	case "sambanova":
		return SAMBANOVA_MODEL_IDS[model];
	case "together":
		return TOGETHER_MODEL_IDS[model]?.id;
	case "fal-ai":
		return FAL_AI_MODEL_IDS[model];
	default:
		return model;
}
I think we should maybe at least return REPLICATE_MODEL_IDS[model] ?? model; to fall back to the provided ID in case the user directly provided a Replicate model ID and not an HF model ID. Same with the others.

Note that we could maybe maintain a mapping in the backend and, in case of errors, try to load it only once (like we do for default models associated to tasks). Just a thought for the future, but it would enable new mappings without updating the lib.
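For illustration, the suggested fallback would look like this (a sketch of the proposal above, not the merged code):

switch (provider) {
	case "replicate":
		// fall back to the caller-supplied ID when no mapping exists
		return REPLICATE_MODEL_IDS[model] ?? model;
	case "sambanova":
		return SAMBANOVA_MODEL_IDS[model] ?? model;
	case "together":
		return TOGETHER_MODEL_IDS[model]?.id ?? model;
	case "fal-ai":
		return FAL_AI_MODEL_IDS[model] ?? model;
	default:
		return model;
}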

A contributor replied:

> I think we should maybe at least return REPLICATE_MODEL_IDS[model] ?? model; to fall back to the provided ID in case the user directly provided a Replicate model ID and not an HF model ID. Same with the others.

cc @julien-c - we discussed it and decided to stick to HF model IDs for now

> Note that we could maybe maintain a mapping in the backend and, in case of errors, try to load it only once (like we do for default models associated to tasks). Just a thought for the future, but it would enable new mappings without updating the lib.

Yes, we definitely want a way for 3rd-party providers to expose the mapping HF model ID -> provider ID that does not require hardcoding / updating the huggingface.js lib.
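A rough sketch of what such a dynamically fetched mapping could look like, loaded at most once per provider; the endpoint URL and response shape below are hypothetical, invented for illustration:

// Hypothetical endpoint returning { [hfModelId]: providerModelId }.
const mappingCache = new Map<string, Record<string, string>>();

async function providerModelId(provider: string, hfModelId: string): Promise<string> {
	let mapping = mappingCache.get(provider);
	if (!mapping) {
		// Hypothetical route; a real one would be defined by the Hub and the providers.
		const res = await fetch(`https://huggingface.co/api/partners/${provider}/model-mapping`);
		mapping = res.ok ? ((await res.json()) as Record<string, string>) : {};
		mappingCache.set(provider, mapping); // cache so we only load once
	}
	return mapping[hfModelId] ?? hfModelId; // fall back to the HF ID itself
}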

julien-c (Member, Author) replied:

> we discussed it and decided to stick to HF model IDs for now

Yes, simpler to always be "Hub-centric".

julien-c (Member, Author) added:

Ah yes, and I now remember: the ?? model in my mind was to work out of the box for models that have the same ID on the inference provider as the HF ID, NOT to work if you pass the provider's (different) ID.

julien-c (Member, Author) commented on Jan 9, 2025:

I think this is ready to merge 🎉 🌮

@SBrandeis merged commit 86b1f2e into main on Jan 9, 2025 (6 checks passed)
@SBrandeis deleted the inference-providers branch on January 9, 2025 at 15:40
Aschen (Contributor) commented on Jan 10, 2025:

Hey @julien-c @coyotte508, glad to see that it's still here and that it's useful ;-)

Keep up the great work here, I like it 😄
