Update tasks and 🛳️ mask-generation and zero-shot-object-detection #462

Merged
merged 26 commits into from
Jan 31, 2024
Changes from 25 commits

26 commits
- ce1a841 Improve snippet (merveenoyan, Jan 26, 2024)
- 98e002c Update about.md (merveenoyan, Jan 26, 2024)
- d6ad547 Add ASR blogs (merveenoyan, Jan 26, 2024)
- b983487 Add new blogs for t2i (merveenoyan, Jan 26, 2024)
- 66e6bb6 Add resources to text-generation (merveenoyan, Jan 26, 2024)
- a695d22 Update about.md (merveenoyan, Jan 26, 2024)
- 8a89dcf 🛳️ mask generation and zero shot object detection (merveenoyan, Jan 26, 2024)
- 49bfcc8 Add SetFit ABSA blog (merveenoyan, Jan 26, 2024)
- 432445f add resources to t2i (merveenoyan, Jan 26, 2024)
- ec9b0e1 Update packages/tasks/src/tasks/text-generation/about.md (merveenoyan, Jan 26, 2024)
- f5f20c4 Update packages/tasks/src/tasks/text-to-image/about.md (merveenoyan, Jan 26, 2024)
- 04e6dc6 Update about.md (merveenoyan, Jan 26, 2024)
- 1a46966 Update packages/tasks/src/tasks/automatic-speech-recognition/about.md (merveenoyan, Jan 29, 2024)
- eb60bd7 Update packages/tasks/src/tasks/mask-generation/about.md (merveenoyan, Jan 30, 2024)
- 795cf91 Update packages/tasks/src/tasks/mask-generation/about.md (merveenoyan, Jan 30, 2024)
- 826c52b Update packages/tasks/src/tasks/mask-generation/about.md (merveenoyan, Jan 30, 2024)
- 43aea49 Update packages/tasks/src/tasks/text-to-image/about.md (merveenoyan, Jan 30, 2024)
- 8f3099c Update packages/tasks/src/tasks/text-to-image/about.md (merveenoyan, Jan 30, 2024)
- 9a7edbe Merge branch 'main' into update_tasks_2 (merveenoyan, Jan 30, 2024)
- afd5b8d Update packages/tasks/src/tasks/mask-generation/about.md (merveenoyan, Jan 30, 2024)
- 42a44b2 Update packages/tasks/src/tasks/mask-generation/about.md (merveenoyan, Jan 30, 2024)
- 21a7edd Update packages/tasks/src/tasks/text-generation/about.md (merveenoyan, Jan 30, 2024)
- f211ca6 addressed Pedro's comments (merveenoyan, Jan 30, 2024)
- 19da8da nit (merveenoyan, Jan 30, 2024)
- 2435d58 nit (merveenoyan, Jan 30, 2024)
- 82b277d Update packages/tasks/src/tasks/mask-generation/about.md (merveenoyan, Jan 31, 2024)
@@ -83,6 +83,8 @@ These events help democratize ASR for all languages, including low-resource lang
- [Massively Multilingual ASR: 50 Languages, 1 Model, 1 Billion Parameters](https://arxiv.org/pdf/2007.03001.pdf)
- An ASR toolkit made by [NVIDIA: NeMo](https://github.com/NVIDIA/NeMo) with code and pretrained models useful for new ASR models. Watch the [introductory video](https://www.youtube.com/embed/wBgpMf_KQVw) for an overview.
- [An introduction to SpeechT5, a multi-purpose speech recognition and synthesis model](https://huggingface.co/blog/speecht5)
- [Fine-tune Whisper For Multilingual ASR with 🤗Transformers](https://huggingface.co/blog/fine-tune-whisper)
- [Automatic speech recognition task guide](https://huggingface.co/docs/transformers/tasks/asr)
- [Speech Synthesis, Recognition, and More With SpeechT5](https://huggingface.co/blog/speecht5)
- [Fine-Tune W2V2-Bert for low-resource ASR with 🤗 Transformers](https://huggingface.co/blog/fine-tune-w2v2-bert)
- [Speculative Decoding for 2x Faster Whisper Inference](https://huggingface.co/blog/whisper-speculative-decoding)
6 changes: 4 additions & 2 deletions packages/tasks/src/tasks/index.ts
@@ -11,6 +11,7 @@ import imageClassification from "./image-classification/data";
import imageToImage from "./image-to-image/data";
import imageToText from "./image-to-text/data";
import imageSegmentation from "./image-segmentation/data";
import maskGeneration from "./mask-generation/data";
import objectDetection from "./object-detection/data";
import depthEstimation from "./depth-estimation/data";
import placeholder from "./placeholder/data";
@@ -33,6 +34,7 @@ import videoClassification from "./video-classification/data";
import visualQuestionAnswering from "./visual-question-answering/data";
import zeroShotClassification from "./zero-shot-classification/data";
import zeroShotImageClassification from "./zero-shot-image-classification/data";
import zeroShotObjectDetection from "./zero-shot-object-detection/data";

import type { ModelLibraryKey } from "../model-libraries";

@@ -131,7 +133,7 @@ export const TASKS_DATA: Record<PipelineType, TaskData | undefined> = {
"image-to-image": getData("image-to-image", imageToImage),
"image-to-text": getData("image-to-text", imageToText),
"image-to-video": undefined,
"mask-generation": getData("mask-generation", maskGeneration),
"multiple-choice": undefined,
"object-detection": getData("object-detection", objectDetection),
"video-classification": getData("video-classification", videoClassification),
@@ -162,7 +164,7 @@ export const TASKS_DATA: Record<PipelineType, TaskData | undefined> = {
"voice-activity-detection": undefined,
"zero-shot-classification": getData("zero-shot-classification", zeroShotClassification),
"zero-shot-image-classification": getData("zero-shot-image-classification", zeroShotImageClassification),
"zero-shot-object-detection": getData("zero-shot-object-detection", zeroShotObjectDetection),
"text-to-3d": getData("text-to-3d", placeholder),
"image-to-3d": getData("image-to-3d", placeholder),
} as const;
43 changes: 37 additions & 6 deletions packages/tasks/src/tasks/mask-generation/about.md
@@ -6,25 +6,56 @@ When filtering for an image, the generated masks might serve as an initial filte

### Masked Image Modelling

Generating masks can facilitate learning, especially in semi- or unsupervised learning. For example, the [BEiT model](https://huggingface.co/docs/transformers/model_doc/beit) uses masked image patches during pre-training.
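To make the idea of masked patches concrete, here is a minimal NumPy sketch of patch masking. It is a simplification under stated assumptions: BEiT itself uses blockwise masking, while the hypothetical helpers `random_patch_mask` and `apply_patch_mask` below pick patches uniformly at random and overwrite their embeddings with a mask token. The 14 x 14 grid, the 0.4 ratio, and the 8-dimensional embeddings are illustrative values only.

```python
import numpy as np


def random_patch_mask(grid_h: int, grid_w: int, mask_ratio: float, seed: int = 0) -> np.ndarray:
    """Return a boolean (grid_h, grid_w) grid in which True marks a masked patch.

    Patches are chosen uniformly at random; BEiT's blockwise masking is not reproduced here.
    """
    rng = np.random.default_rng(seed)
    num_patches = grid_h * grid_w
    num_masked = int(round(mask_ratio * num_patches))
    # pick which flattened patch indices to hide, without repeats
    masked_idx = rng.choice(num_patches, size=num_masked, replace=False)
    mask = np.zeros(num_patches, dtype=bool)
    mask[masked_idx] = True
    return mask.reshape(grid_h, grid_w)


def apply_patch_mask(patch_embeddings: np.ndarray, mask: np.ndarray, mask_token: np.ndarray) -> np.ndarray:
    """Replace the rows of a (num_patches, dim) embedding array that are masked with mask_token."""
    out = patch_embeddings.copy()
    out[mask.reshape(-1)] = mask_token
    return out


# A 224 x 224 image cut into 16 x 16 patches gives a 14 x 14 grid (illustrative numbers).
mask = random_patch_mask(14, 14, mask_ratio=0.4)
embeddings = np.random.default_rng(1).normal(size=(14 * 14, 8))
masked = apply_patch_mask(embeddings, mask, mask_token=np.zeros(8))
print(int(mask.sum()), "of", mask.size, "patches masked")
```

During pre-training, the model would then be asked to predict something about the hidden patches (visual tokens, in BEiT's case) from the visible ones.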

### Human-in-the-loop Computer Vision Applications

For applications where humans are in the loop, masks highlight certain regions of images for humans to validate.

## Task Variants

### Segmentation

Image Segmentation divides an image into segments where each pixel is mapped to an object. This task has multiple variants, such as instance segmentation, panoptic segmentation, and semantic segmentation. You can learn more about segmentation on its [task page](https://huggingface.co/tasks/image-segmentation).
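The link between the two tasks can be sketched in a few lines: a set of binary masks can be folded into the kind of per-pixel map that segmentation produces. The helper `masks_to_label_map` below is hypothetical and assumes every mask is a boolean array with the same (H, W) shape; where masks overlap, the higher-scoring one wins.

```python
from typing import Optional, Sequence

import numpy as np


def masks_to_label_map(masks: Sequence[np.ndarray], scores: Optional[Sequence[float]] = None) -> np.ndarray:
    """Merge binary (H, W) masks into one (H, W) integer map.

    0 marks background; value k marks pixels belonging to masks[k - 1].
    Where masks overlap, the higher-scoring mask wins (the later mask if no scores are given).
    """
    if len(masks) == 0:
        raise ValueError("need at least one mask")
    # paint the lowest-scoring masks first so that better masks overwrite them
    order = range(len(masks)) if scores is None else np.argsort(scores, kind="stable")
    label_map = np.zeros(np.asarray(masks[0]).shape, dtype=np.int32)
    for idx in order:
        label_map[np.asarray(masks[idx], dtype=bool)] = int(idx) + 1
    return label_map


# two horizontal bands that overlap on row 1
top = np.zeros((4, 4), dtype=bool)
top[:2, :] = True
middle = np.zeros((4, 4), dtype=bool)
middle[1:3, :] = True
label_map = masks_to_label_map([top, middle], scores=[0.9, 0.5])
print(label_map[:, 0])
```

Unlike true instance or semantic segmentation, the resulting labels carry no class names; they only say which generated mask a pixel belongs to.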

## Inference

Mask generation models often work in two modes: segment-everything mode and prompt mode.
The example below runs in segment-everything mode, which returns many masks.

```python
from transformers import pipeline

generator = pipeline("mask-generation", model="Zigeng/SlimSAM-uniform-50", points_per_batch=64, device="cuda")
image_url = "https://huggingface.co/ybelkada/segment-anything/resolve/main/assets/car.png"
outputs = generator(image_url)
outputs["masks"]
# one binary mask per generated segment
```

**Review comment (Member):** Would it be possible to show them (or a few)?
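One way to look at a few of the generated masks is to blend them onto the image in random colors. This is a minimal NumPy sketch; `overlay_masks` is a hypothetical helper, and it assumes the image is an RGB `uint8` array of shape (H, W, 3) and that each mask converts to a boolean (H, W) array through `np.asarray`.

```python
from typing import Iterable

import numpy as np


def overlay_masks(image: np.ndarray, masks: Iterable, max_masks: int = 5, alpha: float = 0.5, seed: int = 0) -> np.ndarray:
    """Blend up to max_masks binary masks onto an RGB uint8 image of shape (H, W, 3)."""
    rng = np.random.default_rng(seed)
    blended = image.astype(np.float32)  # astype returns a copy, so the input is untouched
    for i, mask in enumerate(masks):
        if i >= max_masks:
            break
        region = np.asarray(mask, dtype=bool)
        color = rng.integers(0, 256, size=3).astype(np.float32)
        # mix a random color into the pixels covered by this mask only
        blended[region] = (1.0 - alpha) * blended[region] + alpha * color
    return blended.clip(0, 255).astype(np.uint8)


# toy example: a gray 4 x 4 image and one mask covering the top-left pixel
image = np.full((4, 4, 3), 100, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=bool)
mask[0, 0] = True
preview = overlay_masks(image, [mask], alpha=1.0)
```

With the pipeline example, something like `Image.fromarray(overlay_masks(np.array(pil_image), outputs["masks"]))` should give a viewable picture, assuming `pil_image` is the same image loaded with PIL in RGB mode.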

Prompt mode takes in three types of prompts:

- **Point prompt:** The user can select a point on the image, and a meaningful segment around the point will be returned.
- **Box prompt:** The user can draw a box on the image, and a meaningful segment within the box will be returned.
- **Text prompt:** The user can enter text, and objects of that type will be segmented. Note that this capability has not yet been released and has only been explored in research.
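A box prompt usually starts as two corners that the user drags out, in any order. The hypothetical helper below normalizes them into `[x_min, y_min, x_max, y_max]`, clips the box to the image, and wraps it as one image with one box. That nesting is an assumption about what `SamProcessor` accepts through its `input_boxes` argument; check the processor documentation for the model you use.

```python
from typing import List, Tuple


def box_prompt_from_corners(
    corner_a: Tuple[int, int],
    corner_b: Tuple[int, int],
    image_size: Tuple[int, int],
) -> List[List[List[int]]]:
    """Turn two dragged corners into a one-image, one-box prompt [[[x_min, y_min, x_max, y_max]]].

    image_size is (width, height), the order PIL's Image.size uses.
    """
    width, height = image_size
    (xa, ya), (xb, yb) = corner_a, corner_b
    # the user may drag in any direction, so sort each axis
    x_min, x_max = sorted((xa, xb))
    y_min, y_max = sorted((ya, yb))
    # clip the box so it stays inside the image
    x_min, y_min = max(0, x_min), max(0, y_min)
    x_max, y_max = min(width, x_max), min(height, y_max)
    return [[[x_min, y_min, x_max, y_max]]]


# dragged from the top-right toward the bottom-left, overshooting the bottom edge of an 800 x 600 image
prompt = box_prompt_from_corners((700, 300), (100, 900), image_size=(800, 600))
print(prompt)
```

Under that assumption, the result would be passed as `processor(raw_image, input_boxes=prompt, return_tensors="pt")`.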

**Review comment (Member):** Suggested change for the text-prompt item: "Note that this capability is not available in all models and has only been explored in research." If no model supports it yet maybe we should not list it?


Below you can see how to use an input-point prompt. It also demonstrates direct model inference without the `pipeline` abstraction. The input prompt here is a nested list: the outermost list is the batch, the next list holds the points, and the innermost list holds the coordinates of a single point.
merveenoyan marked this conversation as resolved.
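The nesting of the point prompt is easy to get wrong, so here is a small pure-Python sketch of the [image][point][coordinate] structure described above. `point_prompt` is a hypothetical helper that takes one list of (x, y) pairs per image; NumPy is only used to show the resulting shape. The two-image batch assumes you also pass two images to the processor.

```python
from typing import List, Sequence, Tuple

import numpy as np


def point_prompt(points_per_image: Sequence[Sequence[Tuple[float, float]]]) -> List[List[List[float]]]:
    """Build the nested [image][point][coordinate] list passed as input_points.

    points_per_image holds one list of (x, y) pairs per image.
    """
    prompt = []
    for points in points_per_image:
        image_points = []
        for point in points:
            if len(point) != 2:
                raise ValueError(f"expected two coordinates per point, got {point!r}")
            image_points.append([float(point[0]), float(point[1])])
        prompt.append(image_points)
    return prompt


# the single-point prompt from the example: one image, one point, two coordinates
single = point_prompt([[(450, 600)]])
print(single, np.array(single).shape)

# two images with two points each
batch = point_prompt([[(450, 600), (500, 620)], [(100, 200), (120, 210)]])
print(np.array(batch).shape)
```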

```python
import requests
import torch
from PIL import Image
from transformers import SamModel, SamProcessor

model = SamModel.from_pretrained("Zigeng/SlimSAM-uniform-50").to("cuda")
processor = SamProcessor.from_pretrained("Zigeng/SlimSAM-uniform-50")

image_url = "https://huggingface.co/ybelkada/segment-anything/resolve/main/assets/car.png"
raw_image = Image.open(requests.get(image_url, stream=True).raw).convert("RGB")

# one image, one point, two coordinates: pointing to the car window
input_points = [[[450, 600]]]

inputs = processor(raw_image, input_points=input_points, return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = model(**inputs)

# resize the low-resolution predicted masks back to the original image size
masks = processor.post_process_masks(
    outputs.pred_masks.cpu(), inputs["original_sizes"].cpu(), inputs["reshaped_input_sizes"].cpu()
)
scores = outputs.iou_scores
```

**Review comment (Member):** maybe repeat the url here, so the example is self-contained?

**Review comment (Member):** maybe briefly explain the shape? (bs, number_of_points, 2 (coords per point)). Also, does the processor reshape to include the batch size if we just pass a list of points?
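SAM-style models typically return several candidate masks per prompt (three by default) together with a predicted IoU score for each, and a common follow-up is to keep only the best candidate. The sketch below assumes per-image shapes of `(num_prompts, num_candidates, H, W)` for the masks and `(num_prompts, num_candidates)` for the scores; `best_masks` is a hypothetical helper.

```python
from typing import Tuple

import numpy as np


def best_masks(image_masks, image_scores) -> Tuple[np.ndarray, np.ndarray]:
    """Keep the highest-scoring candidate mask for each prompt of one image.

    image_masks:  array-like of shape (num_prompts, num_candidates, H, W)
    image_scores: array-like of shape (num_prompts, num_candidates)
    Returns masks of shape (num_prompts, H, W) and scores of shape (num_prompts,).
    """
    masks = np.asarray(image_masks)
    scores = np.asarray(image_scores)
    best = scores.argmax(axis=1)  # index of the best candidate for each prompt
    rows = np.arange(masks.shape[0])
    return masks[rows, best], scores[rows, best]


# one prompt with three candidate 2 x 2 masks; the middle candidate scores best
candidates = np.zeros((1, 3, 2, 2), dtype=bool)
candidates[0, 1] = True
candidate_scores = np.array([[0.2, 0.9, 0.5]])
mask, score = best_masks(candidates, candidate_scores)
```

For the example above, the call would presumably be `best_masks(masks[0].numpy(), scores[0].detach().cpu().numpy())`, selecting from the first (and only) image.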

## Useful Resources
1 change: 1 addition & 0 deletions packages/tasks/src/tasks/text-classification/about.md
@@ -150,6 +150,7 @@ classifier("I will walk to home when I went through the bus.")

Would you like to learn more about the topic? Awesome! Here you can find some curated resources that you may find helpful!

- [SetFitABSA: Few-Shot Aspect Based Sentiment Analysis using SetFit](https://huggingface.co/blog/setfit-absa)

**Review comment (Member):** This is a great resource, but maybe a bit advanced to place it the first in the list (not sure)

- [Course Chapter on Fine-tuning a Text Classification Model](https://huggingface.co/course/chapter3/1?fw=pt)
- [Getting Started with Sentiment Analysis using Python](https://huggingface.co/blog/sentiment-analysis-python)
- [Sentiment Analysis on Encrypted Data with Homomorphic Encryption](https://huggingface.co/blog/sentiment-analysis-fhe)
37 changes: 24 additions & 13 deletions packages/tasks/src/tasks/text-generation/about.md
@@ -110,25 +110,36 @@ Would you like to learn more about the topic? Awesome! Here you can find some cu
- [ChatUI Docker Spaces](https://huggingface.co/docs/hub/spaces-sdks-docker-chatui)
- [Causal language modeling task guide](https://huggingface.co/docs/transformers/tasks/language_modeling)
- [Text generation strategies](https://huggingface.co/docs/transformers/generation_strategies)
- [Course chapter on training a causal language model from scratch](https://huggingface.co/course/chapter7/6?fw=pt)

### Model Inference & Deployment

- [T0 Discussion with Victor Sanh](https://www.youtube.com/watch?v=Oy49SCW_Xpw&ab_channel=HuggingFace)
- [Hugging Face Course Workshops: Pretraining Language Models & CodeParrot](https://www.youtube.com/watch?v=ExUR7w6xe94&ab_channel=HuggingFace)
- [Training CodeParrot 🦜 from Scratch](https://huggingface.co/blog/codeparrot)
- [Optimizing your LLM in production](https://huggingface.co/blog/optimize-llm)

**Review comment (Contributor):** Let's leave a new line between headings and lists, as some editors won't work well otherwise

- [Open-Source Text Generation & LLM Ecosystem at Hugging Face](https://huggingface.co/blog/os-llms)
- [Llama 2 is at Hugging Face](https://huggingface.co/blog/llama2)
- [Guiding Text Generation with Constrained Beam Search in 🤗 Transformers](https://huggingface.co/blog/constrained-beam-search)
- [Code generation with Hugging Face](https://huggingface.co/spaces/codeparrot/code-generation-models)
- [🌸 Introducing The World's Largest Open Multilingual Language Model: BLOOM 🌸](https://huggingface.co/blog/bloom)
- [The Technology Behind BLOOM Training](https://huggingface.co/blog/bloom-megatron-deepspeed)
- [Assisted Generation: a new direction toward low-latency text generation](https://huggingface.co/blog/assisted-generation)
- [Introducing RWKV - An RNN with the advantages of a transformer](https://huggingface.co/blog/rwkv)
- [How to generate text: using different decoding methods for language generation with Transformers](https://huggingface.co/blog/how-to-generate)
- [Faster Text Generation with TensorFlow and XLA](https://huggingface.co/blog/tf-xla-generate)

### Model Fine-tuning/Training

- [Non-engineers guide: Train a LLaMA 2 chatbot](https://huggingface.co/blog/Llama2-for-non-engineers)
- [Training CodeParrot 🦜 from Scratch](https://huggingface.co/blog/codeparrot)
- [Creating a Coding Assistant with StarCoder](https://huggingface.co/blog/starchat-alpha)
- [StarCoder: A State-of-the-Art LLM for Code](https://huggingface.co/blog/starcoder)
- [Open-Source Text Generation & LLM Ecosystem at Hugging Face](https://huggingface.co/blog/os-llms)
- [Llama 2 is at Hugging Face](https://huggingface.co/blog/llama2)

### Advanced Concepts Explained Simply

- [Mixture of Experts Explained](https://huggingface.co/blog/moe)

### Advanced Fine-tuning/Training Recipes

- [Fine-tuning Llama 2 70B using PyTorch FSDP](https://huggingface.co/blog/ram-efficient-pytorch-fsdp)
- [The N Implementation Details of RLHF with PPO](https://huggingface.co/blog/the_n_implementation_details_of_rlhf_with_ppo)
- [Preference Tuning LLMs with Direct Preference Optimization Methods](https://huggingface.co/blog/pref-tuning)
- [Fine-tune Llama 2 with DPO](https://huggingface.co/blog/dpo-trl)

### Notebooks

13 changes: 11 additions & 2 deletions packages/tasks/src/tasks/text-to-image/about.md
@@ -53,14 +53,23 @@ await inference.textToImage({

## Useful Resources

### Model Inference
merveenoyan marked this conversation as resolved.

- [Hugging Face Diffusion Models Course](https://github.com/huggingface/diffusion-models-class)
- [Getting Started with Diffusers](https://huggingface.co/docs/diffusers/index)
- [Text-to-Image Generation](https://huggingface.co/docs/diffusers/using-diffusers/conditional_image_generation)
- [MinImagen - Build Your Own Imagen Text-to-Image Model](https://www.assemblyai.com/blog/minimagen-build-your-own-imagen-text-to-image-model/)
- [Using LoRA for Efficient Stable Diffusion Fine-Tuning](https://huggingface.co/blog/lora)
- [Using Stable Diffusion with Core ML on Apple Silicon](https://huggingface.co/blog/diffusers-coreml)
- [A guide on Vector Quantized Diffusion](https://huggingface.co/blog/vq-diffusion)
- [🧨 Stable Diffusion in JAX/Flax](https://huggingface.co/blog/stable_diffusion_jax)
- [Running IF with 🧨 diffusers on a Free Tier Google Colab](https://huggingface.co/blog/if)
- [Introducing Würstchen: Fast Diffusion for Image Generation](https://huggingface.co/blog/wuerstchen)
- [Efficient Controllable Generation for SDXL with T2I-Adapters](https://huggingface.co/blog/t2i-sdxl-adapters)
- [Welcome aMUSEd: Efficient Text-to-Image Generation](https://huggingface.co/blog/amused)

### Model Fine-tuning
merveenoyan marked this conversation as resolved.

- [Finetune Stable Diffusion Models with DDPO via TRL](https://huggingface.co/blog/trl-ddpo)
- [LoRA training scripts of the world, unite!](https://huggingface.co/blog/sdxl_lora_advanced_script)
- [Using LoRA for Efficient Stable Diffusion Fine-Tuning](https://huggingface.co/blog/lora)

**Review comment (Member), on lines +71 to +73:** I'd put them exactly in reverse order lol. Also https://huggingface.co/blog/annotated-diffusion is a great, great resource (and not only for fine-tuning).

**Reply (Contributor, author):** They weren't in particular order honestly

This page was made possible thanks to the efforts of [Ishan Dutta](https://huggingface.co/ishandutta), [Enrique Elias Ubaldo](https://huggingface.co/herrius) and [Oğuz Akif](https://huggingface.co/oguzakif).
2 changes: 2 additions & 0 deletions packages/tasks/src/tasks/text-to-speech/about.md
@@ -61,3 +61,5 @@ await inference.textToSpeech({
- [An introduction to SpeechT5, a multi-purpose speech recognition and synthesis model](https://huggingface.co/blog/speecht5)
- [A guide on Fine-tuning Whisper For Multilingual ASR with 🤗Transformers](https://huggingface.co/blog/fine-tune-whisper)

**Review comment (Member):** this is not tts, is it?

- [Speech Synthesis, Recognition, and More With SpeechT5](https://huggingface.co/blog/speecht5)
- [Optimizing a Text-To-Speech model using 🤗 Transformers](https://huggingface.co/blog/optimizing-bark)
-
@@ -1,5 +1,7 @@
## Use Cases

Zero-shot object detection models can be used in any object detection application where the detection involves text queries for objects of interest.

### Object Search

Zero-shot object detection models can be used in image search. Smartphones, for example, use zero-shot object detection models to detect entities (such as specific places or objects) and allow the user to search for the entity on the internet.
@@ -8,6 +10,10 @@ Zero-shot object detection models can be used in image search. Smartphones, for

Zero-shot object detection models are used to count instances of objects in a given image. This can include counting the objects in warehouses or stores or the number of visitors in a store. They are also used to manage crowds at events to prevent disasters.
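Counting mostly reduces to tallying confident detections per label. This sketch assumes the `zero-shot-object-detection` pipeline output format, a list of dictionaries with `score`, `label`, and `box` keys; `count_objects` and the 0.3 threshold are hypothetical choices that you would tune per application. The predictions below are hand-written, not real model output.

```python
from collections import Counter
from typing import Dict, Iterable


def count_objects(predictions: Iterable[Dict], threshold: float = 0.3) -> Counter:
    """Count detections per label, ignoring those scoring below threshold."""
    return Counter(p["label"] for p in predictions if p["score"] >= threshold)


# hand-written predictions in the assumed pipeline output format
predictions = [
    {"score": 0.91, "label": "box", "box": {"xmin": 10, "ymin": 20, "xmax": 60, "ymax": 80}},
    {"score": 0.74, "label": "box", "box": {"xmin": 70, "ymin": 20, "xmax": 120, "ymax": 80}},
    {"score": 0.12, "label": "box", "box": {"xmin": 5, "ymin": 5, "xmax": 15, "ymax": 15}},
    {"score": 0.55, "label": "person", "box": {"xmin": 200, "ymin": 10, "xmax": 260, "ymax": 190}},
]
print(count_objects(predictions))
```

The threshold matters: set it too low and duplicate or spurious boxes inflate the count; set it too high and partially visible objects are missed.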

### Object Tracking

Zero-shot object detectors can be used to track objects in videos when their per-frame detections are linked across frames.
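A detector sees one frame at a time, so the linking step has to come from elsewhere. One simple technique, swapped in here for illustration and not something the pipeline does for you, is greedy IoU matching between consecutive frames. `box_iou` and `match_detections` are hypothetical helpers, and boxes are assumed to be dictionaries with `xmin`, `ymin`, `xmax`, and `ymax` keys, like the `box` entry of each pipeline prediction.

```python
from typing import Dict, List, Tuple

Box = Dict[str, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes given as dicts with xmin, ymin, xmax, ymax."""
    inter_w = max(0.0, min(a["xmax"], b["xmax"]) - max(a["xmin"], b["xmin"]))
    inter_h = max(0.0, min(a["ymax"], b["ymax"]) - max(a["ymin"], b["ymin"]))
    inter = inter_w * inter_h
    area_a = (a["xmax"] - a["xmin"]) * (a["ymax"] - a["ymin"])
    area_b = (b["xmax"] - b["xmin"]) * (b["ymax"] - b["ymin"])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_detections(previous: List[Box], current: List[Box], iou_threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Greedily pair boxes from consecutive frames, highest IoU first.

    Returns (previous_index, current_index) pairs; unpaired boxes are left out.
    """
    candidates = sorted(
        ((box_iou(p, c), i, j) for i, p in enumerate(previous) for j, c in enumerate(current)),
        reverse=True,
    )
    used_prev, used_curr = set(), set()
    pairs = []
    for iou, i, j in candidates:
        if iou < iou_threshold:
            break  # the remaining candidates overlap even less
        if i in used_prev or j in used_curr:
            continue
        used_prev.add(i)
        used_curr.add(j)
        pairs.append((i, j))
    return pairs


# two objects that each moved by one pixel, listed in a different order in the new frame
frame_1 = [{"xmin": 0, "ymin": 0, "xmax": 10, "ymax": 10}, {"xmin": 50, "ymin": 50, "xmax": 60, "ymax": 60}]
frame_2 = [{"xmin": 51, "ymin": 51, "xmax": 61, "ymax": 61}, {"xmin": 1, "ymin": 1, "xmax": 11, "ymax": 11}]
print(match_detections(frame_1, frame_2))
```

Only matching boxes that share a label, and starting a new track for every unmatched box, are natural extensions; dedicated trackers add motion models on top of this kind of association.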

## Inference

You can run inference with zero-shot object detection models through the `zero-shot-object-detection` pipeline. When calling the pipeline, you only need to pass a path or HTTP link to an image and the candidate labels.