Q.1 Describe what would be the best instance segmentation model to get the highest precision. Explain why you think it will be the best-performing model.
Ans. I have read about the following instance segmentation models: FCIS, Mask R-CNN, YOLACT and YOLACT++.
I have used Mask R-CNN and YOLACT in my projects.
**We can also look at the benchmark results.**
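In my view, YOLACT is the best instance segmentation model. It extends a one-stage object detector and breaks instance segmentation into two parallel subtasks: generating a set of prototype masks for the whole image, and predicting a set of mask coefficients for each instance. This lets it forgo an explicit localization step (feature repooling such as RoIAlign in Mask R-CNN).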
The network learns to localize masks on its own: visually, spatially and semantically similar instances are separated by different activations in the prototypes. The number of prototype masks in YOLACT is independent of the number of categories, so the network learns a distributed representation in which each instance mask is a combination of prototypes shared across categories. This leads to the following behaviour in the prototype space:
- Some prototypes spatially partition the image.
- Some localize the instances.
- Some detect instance contours.
- Some encode position-sensitive directional maps.
- Others do a combination of the above.
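To make the prototype/coefficient idea concrete, here is a minimal sketch of the assembly step: each instance mask is the sigmoid of a linear combination of the shared prototypes, weighted by that instance's coefficients, then thresholded. It is plain Python on a toy 4×4 grid; the prototype values, the coefficients and the 0.5 threshold are hypothetical, and the real network also crops each assembled mask to its predicted box.

```python
import math
from typing import List

Grid = List[List[float]]


def assemble_mask(prototypes: List[Grid], coeffs: List[float]) -> Grid:
    """Sigmoid of the coefficient-weighted sum of the prototype masks."""
    h, w = len(prototypes[0]), len(prototypes[0][0])
    mask = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Linear combination of the k prototypes at this pixel.
            z = sum(c * p[y][x] for c, p in zip(coeffs, prototypes))
            mask[y][x] = 1.0 / (1.0 + math.exp(-z))
    return mask


def binarize(mask: Grid, threshold: float = 0.5) -> List[List[int]]:
    """Keep pixels whose probability is strictly above the threshold."""
    return [[int(v > threshold) for v in row] for row in mask]


# Hypothetical toy prototypes: one fires on the left half,
# the other on the top half of a 4x4 image.
left = [[4.0, 4.0, -4.0, -4.0] for _ in range(4)]
top = [[4.0] * 4, [4.0] * 4, [-4.0] * 4, [-4.0] * 4]
prototypes = [left, top]

# Two instances reuse the same prototypes with different coefficients.
instance_a = binarize(assemble_mask(prototypes, [1.0, 1.0]))   # top-left quadrant
instance_b = binarize(assemble_mask(prototypes, [-1.0, 1.0]))  # top-right quadrant
print(instance_a)  # [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(instance_b)  # [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
```

Both instances are built from the same two prototypes and differ only in their coefficients, which is why the number of prototypes does not have to match the number of instances or categories.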
This approach also has several practical advantages:
- Lightweight mask assembly, thanks to the parallel structure of the two subtasks.
- Only a marginal amount of computational overhead on top of a one-stage detector (e.g. one with a ResNet-101 [5] backbone).
- High mask quality.
- The general idea of generating prototypes and mask coefficients could be added to almost any modern object detector.
For large objects, the mask quality is even better than that of two-stage detectors.
Q.2 List some possible challenges of putting such a model into production. For this question, assume that you will have enough images for training the model.
Challenges:
Localization failure: If there are too many objects in one spot in a scene, the network can fail to localize each object in its own prototype. In these cases, it outputs something closer to a foreground mask than an instance segmentation for some objects in the group. For example, suppose we capture field images at low resolution (10 m per pixel) and some fields in the region are small. At this resolution a small field covers only a few pixels, so when several small fields lie next to each other, the masks predicted by our model are likely to overlap.
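To put the 10 m case into numbers, the sketch below estimates how many pixels a small field covers and how wide it is on the prototype grid. The 1 ha field size is a hypothetical example, the image is assumed to be used at its native resolution, and the prototype stride of 4 is an assumption based on YOLACT's default 550×550 input producing 138×138 prototypes.

```python
import math


def field_footprint(area_m2: float, gsd_m: float, proto_stride: int = 4):
    """Return (pixels covered, side in pixels, side in prototype cells).

    Assumes a square field, an image used at its native ground sampling
    distance (gsd_m metres per pixel) and prototypes at 1/proto_stride
    of the input resolution.
    """
    pixels = area_m2 / (gsd_m ** 2)
    side_px = math.sqrt(pixels)
    side_cells = side_px / proto_stride
    return pixels, side_px, side_cells


# Hypothetical 1 ha (10,000 m^2) field imaged at 10 m resolution.
pixels, side_px, side_cells = field_footprint(10_000, 10)
print(f"{pixels:.0f} px, {side_px:.1f} px per side, {side_cells:.1f} prototype cells per side")
# prints: 100 px, 10.0 px per side, 2.5 prototype cells per side
```

A field only about 2.5 prototype cells wide leaves very little room to separate it from its neighbours, and a boundary narrower than one cell (4 px, i.e. 40 m under these assumptions) is likely to be blurred together with the fields on either side.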
Leakage: The network leverages the fact that masks are cropped after assembly, and makes no attempt to suppress noise outside the cropped region. This works fine when the bounding box is accurate, but when it is not, that noise can creep into the instance mask, creating some “leakage” from outside the cropped region. This can also happen when two instances are far apart, because the network has learned that it does not need to localize far-away instances; the cropping will take care of it.
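A toy version of the leakage effect: the sketch below crops the same assembled binary mask once with an accurate box and once with a box that is too wide. The mask values, the noise pixel and the box coordinates are hypothetical; only the crop-after-assembly step reflects the behaviour described above.

```python
from typing import List, Tuple

Mask = List[List[int]]
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive


def crop(mask: Mask, box: Box) -> Mask:
    """Zero every pixel outside the box (crop applied after assembly)."""
    x0, y0, x1, y1 = box
    return [
        [v if (x0 <= x < x1 and y0 <= y < y1) else 0 for x, v in enumerate(row)]
        for y, row in enumerate(mask)
    ]


# Hypothetical assembled mask on a 1x8 strip: the object occupies
# columns 1-3, and unsuppressed noise remains at column 6.
assembled = [[0, 1, 1, 1, 0, 0, 1, 0]]

tight = crop(assembled, (1, 0, 4, 1))  # accurate box: noise is cut away
loose = crop(assembled, (1, 0, 7, 1))  # box too wide: noise leaks in

print(tight)  # [[0, 1, 1, 1, 0, 0, 0, 0]]
print(loose)  # [[0, 1, 1, 1, 0, 0, 1, 0]]
```

Nothing in the assembled mask suppresses the noise pixel, so whether it ends up in the output depends entirely on the predicted box.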
Q.3 Experiment with an instance segmentation model using the images provided (you can do image augmentation) and explain the results. It can be any model (you can re-use anything you have done previously or clone any repo of your choice and use it); it does not need to be the model described in 1). We are not testing precision only; we are going to evaluate how you solve open challenges.
The images above show the outputs of the trained model.
- Lowering the image size results in a large drop in accuracy, demonstrating that instance segmentation naturally demands larger images.
- Increasing the image size significantly reduces speed but improves accuracy, as expected.
- In addition to the base ResNet-101 backbone, we can also test ResNet-50 and DarkNet-53 to obtain even faster results.
- If higher speed is preferable, use ResNet-50 or DarkNet-53 instead of lowering the image size, as these configurations are much more accurate while being only slightly slower.
- In two-stage methods such as Mask R-CNN, the masks depend heavily on the region proposals from the first stage. YOLACT is a one-stage detector, so even if the model predicts different boxes across frames, the prototypes are not affected, which yields much more temporally stable masks.
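As a back-of-the-envelope check on the image-size trade-off: for a fully convolutional network, the cost of most layers grows roughly with the number of input pixels, i.e. with the square of the side length. This is only an approximation (it ignores fixed costs such as NMS), and the 400/550/700 sizes are used as examples because they are the input resolutions reported for YOLACT.

```python
def relative_pixels(size: int, base: int = 550) -> float:
    """Pixel count of a size x size input relative to a base x base input."""
    return (size / base) ** 2


for size in (400, 550, 700):
    print(f"{size}x{size}: {relative_pixels(size):.2f}x the pixels of 550x550")
# prints:
# 400x400: 0.53x the pixels of 550x550
# 550x550: 1.00x the pixels of 550x550
# 700x700: 1.62x the pixels of 550x550
```

Under this approximation, going from 550 to 700 means processing about 1.6 times as many pixels, while dropping to 400 roughly halves the work, which is in line with the speed and accuracy trends listed above.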
Based on my experience with instance segmentation, this is the best architecture currently available for balancing the trade-off between accuracy and speed.