🐛 Bug: Ersilia fetch/serve fails but model appears on catalog #1505

Open
GemmaTuron opened this issue Jan 14, 2025 · 3 comments
Labels
bug Something isn't working

Comments

@GemmaTuron
Member

GemmaTuron commented Jan 14, 2025

eos69p9_serve.txt

Describe the bug.

Hi,

I tried to get a model through the CLI directly using the serve command (Docker was inactive, so it tries to fetch from S3), but it crashed (see the attached error log). Nonetheless, if I run ersilia catalog --local --more immediately afterwards, the model appears:

(ersilia) gturon@pujarnol:~$ ersilia catalog --local --more
┌───────┬────────────┬──────────────────────┬──────────────────────────────────────────────────────────────────────────┬────────────────────┬─────────────┬─────────────────┬──────────────┬──────────────┐
│ Index | Identifier | Slug                 | Title                                                                    | Task               | Input Shape | Output          | Output Shape | Model Source │
├───────┼────────────┼──────────────────────┼──────────────────────────────────────────────────────────────────────────┼────────────────────┼─────────────┼─────────────────┼──────────────┼──────────────┤
│ 1     | eos78ao    | mordred              | Mordred chemical descriptors                                             | ['Representation'] | Single      | ['Descriptor']  | List         | DockerHub    │
├───────┼────────────┼──────────────────────┼──────────────────────────────────────────────────────────────────────────┼────────────────────┼─────────────┼─────────────────┼──────────────┼──────────────┤
│ 2     | eos5axz    | morgan-counts        | Morgan counts fingerprints                                               | ['Representation'] | Single      | ['Descriptor']  | List         | DockerHub    │
├───────┼────────────┼──────────────────────┼──────────────────────────────────────────────────────────────────────────┼────────────────────┼─────────────┼─────────────────┼──────────────┼──────────────┤
│ 3     | eos69p9    | ssl-gcn-tox21        | Toxicity prediction across the Tox21 panel with semi-supervised learning | ['Classification'] | Single      | ['Probability'] | List         |              │
├───────┼────────────┼──────────────────────┼──────────────────────────────────────────────────────────────────────────┼────────────────────┼─────────────┼─────────────────┼──────────────┼──────────────┤
│ 4     | eos2gw4    | eosce                | Ersilia Compound Embeddings                                              | ['Representation'] | Single      | ['Descriptor']  | List         | DockerHub    │
├───────┼────────────┼──────────────────────┼──────────────────────────────────────────────────────────────────────────┼────────────────────┼─────────────┼─────────────────┼──────────────┼──────────────┤
│ 5     | eos3b5e    | molecular-weight     | Molecular weight                                                         | ['Regression']     | Single      | ['Other value'] | Single       | DockerHub    │
├───────┼────────────┼──────────────────────┼──────────────────────────────────────────────────────────────────────────┼────────────────────┼─────────────┼─────────────────┼──────────────┼──────────────┤
│ 6     | eos4avb    | image-mol-embeddings | Molecular representation learning                                        | ['Representation'] | Single      | ['Descriptor']  | Matrix       | DockerHub    │
├───────┼────────────┼──────────────────────┼──────────────────────────────────────────────────────────────────────────┼────────────────────┼─────────────┼─────────────────┼──────────────┼──────────────┤
│ 7     | eos4u6p    | cc-signaturizer      | Chemical Checker signaturizer                                            | ['Representation'] | Single      | ['Descriptor']  | List         | DockerHub    │
├───────┼────────────┼──────────────────────┼──────────────────────────────────────────────────────────────────────────┼────────────────────┼─────────────┼─────────────────┼──────────────┼──────────────┤
│ 8     | eos3cf4    | molfeat-chemgpt      | ChemGPT-4.7                                                              | ['Representation'] | Single      | ['Descriptor']  | List         | DockerHub    │
├───────┼────────────┼──────────────────────┼──────────────────────────────────────────────────────────────────────────┼────────────────────┼─────────────┼─────────────────┼──────────────┼──────────────┤
│ 9     | eos7w6n    | grover-embedding     | Large-scale graph transformer                                            | ['Representation'] | Single      | ['Descriptor']  | List         | DockerHub    │
├───────┼────────────┼──────────────────────┼──────────────────────────────────────────────────────────────────────────┼────────────────────┼─────────────┼─────────────────┼──────────────┼──────────────┤
│ 10    | eos7jio    | rdkit-fingerprint    | Path-based fingerprint                                                   | ['Representation'] | Single      | ['Descriptor']  | List         | DockerHub    │
└───────┴────────────┴──────────────────────┴──────────────────────────────────────────────────────────────────────────┴────────────────────┴─────────────┴─────────────────┴──────────────┴──────────────┘

The Model Source column is empty for eos69p9, which indicates that the fetch failed. It would be good to have a way to catch this, or to add a note telling users that a certain model is not working.
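
One way to surface this could be to flag models whose folder exists but has no model-source file. The following is only a minimal sketch under stated assumptions: it assumes each fetched model lives in its own folder under a common root and that the source file is written only after a successful fetch. The function name `find_incomplete_models` and the file name passed to it are hypothetical, not part of Ersilia.

```python
from pathlib import Path


def find_incomplete_models(models_root, source_filename):
    """Return the sorted names of model folders that lack the model-source file.

    Hypothetical helper: `models_root` and `source_filename` are assumptions
    about the on-disk layout, not Ersilia's actual paths.
    """
    root = Path(models_root)
    return sorted(
        d.name
        for d in root.iterdir()
        if d.is_dir() and not (d / source_filename).is_file()
    )
```

A catalog command could then print a warning next to every identifier returned by such a check, instead of silently leaving the column empty.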

I'll tag this as an addition as it is not critical

Describe the steps to reproduce the behavior

No response

Operating environment

Ubuntu 24.02 LTS

@GemmaTuron GemmaTuron added the bug Something isn't working label Jan 14, 2025
@DhanshreeA
Member

DhanshreeA commented Jan 15, 2025

This bug is very similar to the one @Abellegese described running into with BentoML. In any case, we should just delete the model and its artifacts if it fails to fetch. At present we only do that if the model fails to generate a Standard Model Example, as you can see in this snippet from the referenced code:

        fr = await self._fetch(model_id)
        if fr.fetch_success:
            try:
                self._standard_csv_example(model_id)
            except StandardModelExampleError:
                self.logger.debug("Standard model example failed, deleting artifacts")
                do_delete = yes_no_input(
                    "Do you want to delete the model artifacts? [Y/n]",
                    default_answer="Y",
                )
                if do_delete:
                    md = ModelFullDeleter(overwrite=False)
                    md.delete(model_id)
                return FetchResult(
                    fetch_success=False,
                    reason="Could not successfully run a standard example from the model.",
                )
            else:
                self.logger.debug("Writing model source to file")
                model_source_file = os.path.join(
                    self._model_path(model_id), MODEL_SOURCE_FILE
                )
                try:
                    os.makedirs(self._model_path(model_id), exist_ok=True)
                except OSError as error:
                    self.logger.error(f"Error during folder creation: {error}")
                with open(model_source_file, "w") as f:
                    f.write(self.model_source)
                return FetchResult(
                    fetch_success=True, reason="Model fetched successfully"
                )
        else:
            return fr

I think we should wrap all BentoML-related subprocess calls so that any failure raises a general BentoMLError, catch that error in downstream code, and then, as in the snippet above, delete the model artifacts if fetching fails for whatever reason.
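
A rough sketch of that idea, to make the proposal concrete. Everything here is an assumption rather than existing Ersilia code: `BentoMLError` is the proposed exception, `run_bentoml_command` and `fetch_with_cleanup` are hypothetical helper names, and `shutil.rmtree` stands in for whatever `ModelFullDeleter` does in the real code base.

```python
import shutil
import subprocess
from pathlib import Path


class BentoMLError(Exception):
    """Proposed umbrella error for any failed BentoML subprocess call."""


def run_bentoml_command(cmd):
    """Run a command (a list of arguments) and raise BentoMLError on failure."""
    try:
        completed = subprocess.run(
            cmd, capture_output=True, text=True, check=True
        )
    except (subprocess.CalledProcessError, FileNotFoundError) as err:
        # Non-zero exit code or missing executable: report one error type.
        raise BentoMLError(f"Command {cmd!r} failed: {err}") from err
    return completed.stdout


def fetch_with_cleanup(model_dir, cmd):
    """Return True on success; on BentoMLError remove model_dir and return False."""
    model_dir = Path(model_dir)
    model_dir.mkdir(parents=True, exist_ok=True)
    try:
        run_bentoml_command(cmd)
    except BentoMLError:
        # Stand-in for ModelFullDeleter: drop the partially fetched artifacts.
        shutil.rmtree(model_dir, ignore_errors=True)
        return False
    return True
```

With a wrapper like this, downstream code only has to handle one exception type, and a failed fetch would leave no folder behind for the catalog to list.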

@Abellegese
Contributor

Yes, exactly, @DhanshreeA @GemmaTuron. A quick fix is to do the following:

pip uninstall bentoml
# then just call this bentoml command
bentoml --version

@DhanshreeA
Member

@OlawumiSalaam this might be interesting, and definitely more of a deep dive than your current task. Please take a look when you can.

Projects
Status: On Hold
Development

No branches or pull requests

3 participants