M1 - Error - command buffer exited with error status. #196

pretbc · 2023-11-20T13:46:59Z

I set the device to 'mps' and then tried to run detection on an image:

def extract_fau_from_images(images: list[str], extractor: ExtendedFeatDetector, **kwargs) -> torch.Tensor:
    """Extract face action units + emotion scores"""
    det_fex = extractor.detect_image(images[1])
    return _fex_to_tensor(det_fex, **kwargs)

error occurred:

ValueError: when using a batch_size > 1 all images must have the same dimensions or output_size must not be None so py-feat can rescale images to output_size. See pytorch error: 
slow_conv2d_forward_mps: input(device='cpu') and weight(device=mps:0')  must be on the same device

So I started from /feat/detector.py", line 492, in detect_facepose to trace where the input and the weights end up on different devices.

When I added the line marked ADDED below to scale_and_predict in img2pose_test.py:

    def scale_and_predict(self, img, euler=True):
        """Runs a prediction on the passed image. Returns detected faces and associates poses.
        Args:
            img (tensor): A torch tensor image
            euler (bool): set to True to obtain euler angles, False to obtain rotation vector

        Returns:
            dict: key 'pose' contains array - [yaw, pitch, roll], key 'boxes' contains 2D array of bboxes
        """

        # Transform image to improve model performance. Resize the image so that both dimensions are in the range [MIN_SIZE, MAX_SIZE]
        scale = 1
        border_size = 0
        if min(img.shape[-2:]) < self.MIN_SIZE or max(img.shape[-2:]) > self.MAX_SIZE:
            logging.info(
                f"img2pose: RESCALING WARNING: img2pose has a min img size of {self.MIN_SIZE} and a max img size of {self.MAX_SIZE} but checked value is {img.shape[-2:]}."
            )
            transform = Compose([Rescale(self.MAX_SIZE, preserve_aspect_ratio=True)])
            transformed_img = transform(img)
            img = transformed_img["Image"]
            scale = transformed_img["Scale"]
        img = img.to('mps')  # <----- ADDED

the device mismatch error is gone, but the M1 now fails with a Metal error:

  0%|          | 0/1 [00:00<?, ?it/s]Error: command buffer exited with error status.
	The Metal Performance Shaders operations encoded on it may not have completed.
	Error: 
	(null)
	Internal Error (0000000e:Internal Error)
	<AGXG13GFamilyCommandBuffer: 0x2aee094b0>
    label = <none> 
    device = <AGXG13GDevice: 0x29928c800>
        name = Apple M1 
    commandQueue = <AGXG13GFamilyCommandQueue: 0x29931e800>
        label = <none> 
        device = <AGXG13GDevice: 0x29928c800>
            name = Apple M1 
    retainedReferences = 1
Error: command buffer exited with error status.
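
The one-line change above hard-codes 'mps'. A minimal sketch of an alternative, assuming the model is a regular PyTorch module whose parameters all live on a single device, is to read the device from the model and move the input there. The helper name move_to_model_device is hypothetical and is not part of py-feat. This only addresses the cpu/mps mismatch; it is not expected to fix the command buffer error shown in the log.

```python
def move_to_model_device(model, tensor):
    """Return `tensor` on the same device as the model's first parameter.

    Assumptions (not taken from py-feat): `model` exposes `parameters()`
    like torch.nn.Module, has at least one parameter, and keeps all of
    its parameters on one device.
    """
    # Read the device from the weights instead of hard-coding 'mps'.
    device = next(model.parameters()).device
    # Tensor.to returns the tensor unchanged if it is already on `device`.
    return tensor.to(device)
```

With a helper like this, the added line would become something like img = move_to_model_device(model, img), where model stands for whichever module runs the convolution. The same code then works whether the detector was created on 'cpu' or 'mps'.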


ljchang · 2024-01-03T16:20:02Z

Thanks for reporting this @pretbc. This is a known issue, also reported in #187, but I'm hoping it will eventually resolve itself as PyTorch 2.0 matures and gradually adds MPS support across its base models.

We have seen inconsistent results with MPS, where it is sometimes slower than just using the CPU. I suspect this will only get better in the future. We have started tracking speeds and which models are supported by MPS in issue #184.
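
For anyone who wants to check whether MPS is actually faster than CPU on their own machine, here is a rough timing sketch. It is not the method used for the tracking in #184; the function names are hypothetical, and the helper takes an optional sync callable because GPU backends queue work asynchronously, so the clock should only be read after the queued work has finished.

```python
import time


def mean_seconds(fn, repeats=5, warmup=1, sync=None):
    """Return the mean wall-clock seconds per call of fn().

    `sync`, if given, is called before each clock read so that queued
    asynchronous device work is included in the measurement.
    """
    # Warm-up calls absorb one-time costs such as kernel compilation.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    if sync is not None:
        sync()
    return (time.perf_counter() - start) / repeats


def faster_device(timings):
    """Return the device name with the smallest measured time."""
    return min(timings, key=timings.get)
```

When timing MPS work with PyTorch 2.0 or later, torch.mps.synchronize can be passed as sync; for CPU runs sync can be left as None. Collecting the results in a dict such as {"cpu": ..., "mps": ...} and passing it to faster_device gives the device to prefer for that model and input size.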

ljchang added the known issue label Jan 6, 2024

ljchang mentioned this issue Aug 5, 2024

Create a faster version of detection using pytorch computational graph #229

Closed
