Arthur A Goshtasby edited this page Jan 21, 2025 · 1 revision

Welcome to the MMR-Multi-model-Recognizer wiki!

An image recognizer based on multiple recognizers is designed, implemented, and evaluated. The multi-model recognizer is found to produce consistently higher accuracy than any one of its individual recognizers.

Recognition accuracy is the most important measure of a recognizer. Different recognizers use different features of objects and different relations between the features to recognize an object. As a result, one recognizer may do better than the others on a particular set of objects in a class. The objective is to combine the strengths of the individual recognizers so that the recognizer that performs best on an object type has the biggest role in determining the identity of an object of that type. Through an evaluation process, the accuracy of each recognizer in identifying each object type is determined. The multi-model recognizer then collects the votes of the individual recognizers on the identity of an object, and the identity receiving the highest total vote is taken as the identity of the object.
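
The voting step can be sketched as follows. This is a minimal illustration, not the repository's implementation: it assumes each recognizer's vote is weighted by its evaluated accuracy on the label it predicts, which is one plausible way to give the better recognizer the bigger role. The function name `combine_votes`, the recognizer names, and all accuracy values are hypothetical.

```python
from collections import defaultdict


def combine_votes(predictions, class_accuracy):
    """Return the label with the highest total weighted vote.

    predictions:    {recognizer_name: predicted_label}
    class_accuracy: {recognizer_name: {label: accuracy in [0, 1]}}
                    measured beforehand on a validation dataset.
    Assumption: a vote counts as much as the recognizer's accuracy
    on the label it voted for (0.0 if that accuracy is unknown).
    """
    totals = defaultdict(float)
    for name, label in predictions.items():
        totals[label] += class_accuracy[name].get(label, 0.0)
    # Iterate labels in sorted order so ties resolve deterministically.
    return max(sorted(totals), key=lambda label: totals[label])


# Hypothetical per-class accuracies from an evaluation run.
class_accuracy = {
    "recognizer_a": {"Penny": 0.95, "Dime": 0.60},
    "recognizer_b": {"Penny": 0.70, "Dime": 0.40},
    "recognizer_c": {"Penny": 0.65, "Dime": 0.45},
}

# Two weak "Dime" votes (0.40 + 0.45) lose to one strong "Penny" vote (0.95).
predictions = {
    "recognizer_a": "Penny",
    "recognizer_b": "Dime",
    "recognizer_c": "Dime",
}
winner = combine_votes(predictions, class_accuracy)
print(winner)
```

With these made-up numbers, the single recognizer that is reliable on pennies outweighs two recognizers that are unreliable on dimes, which a plain majority vote would not allow.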

The multi-model recognizer idea is demonstrated using simple rigid objects in an image. U.S. coins are used as example objects. Five coins in circulation are considered: Penny, Nickel, Dime, Quarter, and Dollar. A segmentation method to extract coins from an image is implemented. Because coins are reflective, the segmentation process includes means to dull specularities and normalize scene lighting so that the resulting images are similar. Industrial parts that come in different sizes and shapes can be used instead of coins. The only requirement is that the objects be rigid and have well-defined boundaries.

After extracting coins from an image, each coin is mapped to a 129x129 image with a blank background and interactively labeled. The labeled coins in an image are then saved as a validation dataset. The same dataset is used to create a training dataset that contains various orientations of the same coin. Typically, 3 to 5 orientations are considered. The orientations are not random but rather represent the most dominant orientations of a coin. A nonsymmetric object has a number of dominant orientations. A recognizer trained to remember an object in several of its most dominant orientations will recognize that object better when it is seen in one of those orientations than when it is seen in a random orientation. Similarly, people recognize a person's face or a complex object more easily when it is right side up than when it is upside down or sideways.
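
Building the training dataset from a labeled coin image might look like the sketch below. It is only an illustration under stated assumptions: the image is a plain 129x129 list of rows, rotation uses nearest-neighbor sampling about the image center, and the angles passed in are placeholders. The text does not say how the dominant orientations of a coin are chosen, so `rotate_image`, `make_training_set`, and the example angles are hypothetical.

```python
import math

SIZE = 129  # side length of each extracted coin image, as stated above


def rotate_image(image, angle_deg, background=0):
    """Rotate a square image about its center with nearest-neighbor sampling.

    Pixels that map outside the source image are filled with `background`,
    matching the blank background of the extracted coins.
    """
    n = len(image)
    c = (n - 1) / 2.0  # center pixel; an integer index when n is odd
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[background] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            dx, dy = x - c, y - c
            # Inverse mapping: locate the source pixel for each output pixel.
            sx = cos_t * dx + sin_t * dy + c
            sy = -sin_t * dx + cos_t * dy + c
            ix, iy = int(round(sx)), int(round(sy))
            if 0 <= ix < n and 0 <= iy < n:
                out[y][x] = image[iy][ix]
    return out


def make_training_set(image, label, angles_deg):
    """Return (rotated_image, label) pairs, one per requested orientation."""
    return [(rotate_image(image, a), label) for a in angles_deg]


# Placeholder coin image: blank background with a single marked pixel.
coin = [[0] * SIZE for _ in range(SIZE)]
coin[10][64] = 1

# Placeholder angles standing in for 3 dominant orientations.
training = make_training_set(coin, "Penny", [0, 90, 180])
print(len(training))
```

Inverse mapping is used so that every output pixel receives a value and no holes appear in the rotated image. A production version would likely use an image library with interpolation instead of this pure-Python loop.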
