
Do you have plan to support yolov obb model? #132

Open
saurabh-git-dev opened this issue Oct 11, 2024 · 5 comments

Comments

@saurabh-git-dev

Is there any plan to support one of the latest YOLO OBB models in the near future?
The main obstacle is the post-processing, which is not easy for everyone to write, so I can't move forward without it.

@tan199954

tan199954 commented Oct 28, 2024

@saurabh-git-dev Have you found a solution yet?

@saurabh-git-dev
Author

@tan199954

@saurabh-git-dev
I used assistance from ChatGPT, and my code is now working:

import numpy as np
import math

REGRESSION_LENGTH = 15
STRIDES = [8, 16, 32]
names = ['plane', 'ship', 'storage tank', 'baseball diamond', 'tennis court', 'basketball court', 
          'ground track field', 'harbor', 'bridge', 'large vehicle', 'small vehicle', 
          'helicopter', 'roundabout', 'soccer ball field', 'swimming pool']

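def softmax(x):
    # Subtract the per-row maximum before exponentiating so large logits do not overflow.
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)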

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def _yolov8_obb_decoding(raw_boxes, angles, strides, image_dims, reg_max):
    boxes = None
    for box_distribute, stride, angle in zip(raw_boxes, strides, angles):
        # create grid
        shape = [int(x / stride) for x in image_dims]
        grid_x = np.arange(shape[1]) + 0.5
        grid_y = np.arange(shape[0]) + 0.5
        grid_x, grid_y = np.meshgrid(grid_x, grid_y)
        ct_row = grid_y.flatten() * stride
        ct_col = grid_x.flatten() * stride
        center = np.stack((ct_col, ct_row), axis=1)

        # box distribution to distance
        reg_range = np.arange(reg_max + 1)
        box_distribute = np.reshape(
            box_distribute, (-1, box_distribute.shape[1] * box_distribute.shape[2], 4, reg_max + 1)
        )
        box_distance = softmax(box_distribute)
        box_distance = box_distance * np.reshape(reg_range, (1, 1, 1, -1))
        box_distance = np.sum(box_distance, axis=-1)

        lt = box_distance[...,:2]
        rb = box_distance[...,2:]
        cos = np.cos(angle)
        sin = np.sin(angle)

        xf, yf = np.split((rb - lt) / 2, 2, axis=-1)       
        x = xf * cos - yf * sin
        y = xf * sin + yf * cos

        xy = np.concatenate([x, y], axis=-1)
        xywh_box = np.concatenate([xy, lt + rb], axis=-1) * stride
        xywh_box[..., :2] += np.expand_dims(center, axis=0)

        boxes = xywh_box if boxes is None else np.concatenate([boxes, xywh_box], axis=1)
    return boxes


def generate_yolo_predictions(endnodes):
    """
    endnodes is a list of 9 tensors, three per scale. The decoder pairs each
    bbox tensor with the matching entry of STRIDES = [8, 16, 32], so for a
    640x640 input the largest grid has to come first:
        endnodes[0]:  bbox output with shape (BS, 80, 80, 64)
        endnodes[1]:  scores output with shape (BS, 80, 80, 15)
        endnodes[2]:  angles output with shape (BS, 80, 80, 1)
        endnodes[3]:  bbox output with shape (BS, 40, 40, 64)
        endnodes[4]:  scores output with shape (BS, 40, 40, 15)
        endnodes[5]:  angles output with shape (BS, 40, 40, 1)
        endnodes[6]:  bbox output with shape (BS, 20, 20, 64)
        endnodes[7]:  scores output with shape (BS, 20, 20, 15)
        endnodes[8]:  angles output with shape (BS, 20, 20, 1)
    Returns:
        numpy.ndarray: A concatenated array of shape (BS, total_predictions, 5 + num_classes) where:
            - `total_predictions` is the sum of predictions across all scales (80x80, 40x40, 20x20).
            - Each prediction contains, in this order:
                - `4` values for the bounding box in the format [x, y, w, h].
                - `num_classes` values for the confidence scores for each class.
                - `1` value representing the angle of rotation in radians.
    """
    image_dims = (640, 640)
    raw_boxes = endnodes[:7:3]
    angles = [np.reshape(s, (-1, s.shape[1] * s.shape[2], 1)) for s in endnodes[2::3]]
    angles = [(sigmoid(x) - 0.25) * math.pi for x in angles]
    decoded_boxes = _yolov8_obb_decoding(raw_boxes, angles, STRIDES, image_dims, REGRESSION_LENGTH)
    scores = [np.reshape(s, (-1, s.shape[1] * s.shape[2], len(names))) for s in endnodes[1:8:3]]
    scores = np.concatenate(scores, axis=1)
    angles = np.concatenate(angles, axis=1)
    return np.concatenate([decoded_boxes, scores, angles], axis=2)
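To check the decoded output visually, each row can be filtered by confidence and turned into four corner points for drawing. This is only a minimal sketch, assuming the column order produced by the final np.concatenate above ([x, y, w, h], then the class scores, then the angle in radians as the last column). The helper names `filter_predictions` and `xywhr_to_corners` are hypothetical and not part of any library.

```python
import numpy as np


def filter_predictions(preds, conf_thres=0.25):
    """Keep rows of one image whose best class score reaches conf_thres.

    preds: (N, 4 + num_classes + 1) array laid out as
           [x, y, w, h, class scores..., angle] (assumed layout).
    Returns (xywhr, scores, class_ids) for the surviving rows.
    """
    preds = np.asarray(preds, dtype=np.float64)
    class_scores = preds[:, 4:-1]
    class_ids = class_scores.argmax(axis=1)
    best = class_scores.max(axis=1)
    keep = best >= conf_thres
    # Put the angle right after the box so each row is [cx, cy, w, h, angle].
    xywhr = np.concatenate([preds[keep, :4], preds[keep, -1:]], axis=1)
    return xywhr, best[keep], class_ids[keep]


def xywhr_to_corners(boxes):
    """Convert (N, 5) rows of [cx, cy, w, h, angle_rad] to (N, 4, 2) corners."""
    boxes = np.asarray(boxes, dtype=np.float64)
    center = boxes[:, :2]
    w, h, r = boxes[:, 2], boxes[:, 3], boxes[:, 4]
    cos, sin = np.cos(r), np.sin(r)
    # Half-extent vectors along the box's own width and height axes.
    v_w = np.stack([w / 2 * cos, w / 2 * sin], axis=-1)
    v_h = np.stack([-h / 2 * sin, h / 2 * cos], axis=-1)
    return np.stack(
        [center + v_w + v_h, center + v_w - v_h,
         center - v_w - v_h, center - v_w + v_h],
        axis=1,
    )
```

The corner arrays can be passed to any polygon-drawing routine once they are cast to integers.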

@saurabh-git-dev
Author

saurabh-git-dev commented Nov 6, 2024

@tan199954
Are you able to post-process the output and see rotated detections?

I think you also need to implement the Rotated NMS.
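For reference, rotated NMS can be sketched in plain Python. This is a minimal, unoptimized illustration and not the routine any particular library uses: it measures overlap with exact polygon intersection (Sutherland-Hodgman clipping), assumes boxes are given as [cx, cy, w, h, angle in radians], and all function names are hypothetical.

```python
import math


def corners(box):
    """[cx, cy, w, h, angle_rad] -> 4 corners, counter-clockwise in x/y axes."""
    cx, cy, w, h, r = box
    c, s = math.cos(r), math.sin(r)
    offsets = ((w / 2, h / 2), (-w / 2, h / 2), (-w / 2, -h / 2), (w / 2, -h / 2))
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in offsets]


def polygon_area(pts):
    """Shoelace formula; returns the absolute area."""
    total = 0.0
    for i in range(len(pts)):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % len(pts)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def clip_polygon(subject, clip):
    """Sutherland-Hodgman: clip `subject` against a convex, counter-clockwise `clip`."""
    output = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        inputs, output = output, []
        if not inputs:
            break

        def side(p):
            # Positive when p lies to the left of the directed edge a -> b (inside).
            return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

        prev = inputs[-1]
        for cur in inputs:
            sp, sc = side(prev), side(cur)
            if (sc >= 0) != (sp >= 0):
                # The segment prev -> cur crosses the edge; add the crossing point.
                t = sp / (sp - sc)
                output.append((prev[0] + t * (cur[0] - prev[0]),
                               prev[1] + t * (cur[1] - prev[1])))
            if sc >= 0:
                output.append(cur)
            prev = cur
    return output


def rotated_iou(b1, b2):
    """Intersection over union of two rotated rectangles."""
    inter_poly = clip_polygon(corners(b1), corners(b2))
    inter = polygon_area(inter_poly) if len(inter_poly) >= 3 else 0.0
    union = b1[2] * b1[3] + b2[2] * b2[3] - inter
    return inter / union if union > 0 else 0.0


def rotated_nms(boxes, scores, iou_thres=0.45):
    """Greedy NMS: visit boxes by descending score, drop those overlapping a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(rotated_iou(boxes[i], boxes[j]) <= iou_thres for j in keep):
            keep.append(i)
    return keep
```

The pairwise loop is quadratic in the number of boxes, so a confidence filter should run first to keep the candidate list short.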

@tan199954

@saurabh-git-dev
I'm using the non_max_suppression function from ultralytics with torch on the CPU.
I convert the output of the generate_yolo_predictions function to a torch tensor and then transpose it to (batch_size, num_classes + 5, num_boxes).
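The layout change described above can be sketched with NumPy alone. This is a rough illustration under stated assumptions: the helper name `to_channels_first` is hypothetical, the dummy array stands in for the output of generate_yolo_predictions with 15 classes, and the torch conversion and NMS call are only mentioned in comments because the exact NMS signature depends on the installed ultralytics version.

```python
import numpy as np


def to_channels_first(preds):
    """(BS, num_boxes, 4 + nc + 1) -> (BS, 4 + nc + 1, num_boxes)."""
    preds = np.asarray(preds)
    if preds.ndim != 3:
        raise ValueError("expected a 3-D array of shape (BS, num_boxes, channels)")
    # Swap the last two axes and make the result contiguous in memory.
    return np.ascontiguousarray(np.transpose(preds, (0, 2, 1)))


# Stand-in for the decoder output: 1 image, 80*80 + 40*40 + 20*20 = 8400 boxes,
# 4 box values + 15 class scores + 1 angle = 20 channels.
dummy = np.zeros((1, 8400, 4 + 15 + 1), dtype=np.float32)
chw = to_channels_first(dummy)
print(chw.shape)  # (1, 20, 8400)

# With torch installed, torch.from_numpy(chw) wraps the array as a tensor
# that can then be handed to the NMS routine.
```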
