Performance Improvement on the run CLI (ersilia-os#1299) #1351

Closed
wants to merge 33 commits into from
33 commits
58de64d
Revert "Fetch CLI unecessary docker and model checkup removal(#1299)"
Abellegese Oct 9, 2024
dc28679
Fetch CLI unecessary docker and model checkup removal(#1299)
Abellegese Oct 9, 2024
90f0551
Fetch CLI asynchronous implementation(#1299)
Abellegese Oct 9, 2024
bda3ebe
Revert "Fetch CLI asynchronous implementation(#1299)"
Abellegese Oct 9, 2024
a1b4825
Fetch CLI asynchronous implementation(#1299)
Abellegese Oct 9, 2024
e04ef49
Fetch CLI asynchronous model testing (#1299)
Abellegese Oct 9, 2024
a839749
Performance Improvement on the Airtable Interface
Abellegese Oct 23, 2024
e8271cf
Performace Improvement on the run CLI(#1299)
Abellegese Oct 23, 2024
b48ec59
Merge branch 'master' into improv-2
DhanshreeA Oct 24, 2024
570a00d
Merge branch 'master' into improv-2
DhanshreeA Oct 24, 2024
66a0b5d
Make pyairtable install on the fly
DhanshreeA Oct 24, 2024
143ce5d
Fixing Test Error: async_pull -> pull
Abellegese Oct 24, 2024
c454d7e
Fixing Test Error: Asyncing functions
Abellegese Oct 24, 2024
44c171a
Fixing Test Error: Asyncing functions
Abellegese Oct 24, 2024
9110816
Fixing pytest error: Adding a session to the compound identifier func…
Abellegese Oct 24, 2024
0262954
Fixing pytest error: Adding a test case for smiles checker
Abellegese Oct 24, 2024
1d537ca
Fixing pytest error: updating the test_model_2 function
Abellegese Oct 24, 2024
88af484
Fixing pytest error: Removing is_smile test case
Abellegese Oct 24, 2024
7f10a36
Fixing PR/Build Error: Adding event loop on async execution
Abellegese Oct 24, 2024
1cd73df
Fixing PR/Build Error: Adding event loop on async execution
Abellegese Oct 24, 2024
6caa551
Fixing PR/Build Error: Adding event loop on async execution
Abellegese Oct 24, 2024
f682dd2
Fixing PR/Build Error: Adding event loop on async execution
Abellegese Oct 24, 2024
7080f49
Fixing PR/Build Error: Adding event loop on async execution
Abellegese Oct 24, 2024
eb487d2
Fixing PR/Build Error: Adding event loop on async execution
Abellegese Oct 24, 2024
4342a12
Fixing PR/Build Error: Adding event loop on async execution
Abellegese Oct 24, 2024
2892f35
Fixing PR/Build Error: Adding event loop on async execution
Abellegese Oct 24, 2024
7a85922
Fixing PR/Build Error: Adding event loop on async execution
Abellegese Oct 24, 2024
2de4e98
Merge branch 'master' into improv-2
DhanshreeA Oct 25, 2024
a2d5bd6
Adding the subshell command functionality
Abellegese Oct 26, 2024
511829f
Fixing build error: Applying the nest-asyncio to handle multiple events
Abellegese Oct 26, 2024
c0ed8b5
Fixing build error: Applying the nest-asyncio to handle multiple events
Abellegese Oct 26, 2024
4d4b608
Fixing build error: Applying the nest-asyncio to handle multiple events
Abellegese Oct 26, 2024
9bc5432
Update compound.py
Abellegese Oct 26, 2024

.github/workflows/pr_check.yml: 30 changes (17 additions, 13 deletions)
@@ -45,38 +45,42 @@ jobs:

- name: Hub catalog
run: |
ersilia catalog

(
ersilia catalog
)
- name: Fetch model from GitHub
run: |
source activate
ersilia -v fetch molecular-weight --from_github
echo "Serving molecular-weight model."
ersilia serve molecular-weight
ersilia info
ersilia run -i "CC(=O)OC1=CC=CC=C1C(=O)O" | grep "180.15899"
ersilia close

(
source activate
ersilia -v fetch molecular-weight --from_github
echo "Serving molecular-weight model."
ersilia serve molecular-weight
ersilia info
ersilia run -i "CC(=O)OC1=CC=CC=C1C(=O)O" | grep "180.15899"
ersilia close
)
- name: Fetch model from S3
run: |
(
source activate
ersilia -v fetch molecular-weight --from_s3
echo "Serving molecular-weight model."
ersilia serve molecular-weight
ersilia info
ersilia run -i "CC(=O)OC1=CC=CC=C1C(=O)O" | grep "180.15899"
ersilia close

)
- name: Fetch model from DockerHub
run: |
(
source activate
ersilia -v fetch molecular-weight --from_dockerhub
echo "Serving molecular-weight model."
ersilia serve molecular-weight
ersilia info
ersilia run -i "CC(=O)OC1=CC=CC=C1C(=O)O" | grep "180.15899"
ersilia close

)
- name: Local catalog
run: |
ersilia catalog --local
(ersilia catalog --local)
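
Wrapping each step's commands in `( ... )` runs them in a subshell, so variable assignments, `cd`, and whatever `source activate` changes stay inside the parentheses. GitHub Actions already starts a fresh shell for every `run:` block, so the isolation mainly matters between commands within a single step. A minimal sketch, assuming a POSIX `sh` on the PATH; `run_script` is a hypothetical helper, not part of Ersilia:

```python
import subprocess


def run_script(script: str) -> list[str]:
    """Run a POSIX shell script and return its stdout lines (hypothetical helper)."""
    result = subprocess.run(
        ["sh", "-c", script], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


# The assignment inside ( ... ) happens in a child shell, so the outer value survives.
print(run_script('X=outer; (X=inner; echo "$X"); echo "$X"'))  # ['inner', 'outer']
```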

ersilia/cli/commands/fetch.py: 19 changes (17 additions, 2 deletions)
@@ -1,4 +1,20 @@
import subprocess
import sys

def install_package(package):
subprocess.check_call([sys.executable, '-m', 'pip', 'install', package])

try:
import nest_asyncio
except ImportError:
print("nest_asyncio not found. Installing...")
install_package('nest_asyncio')
import nest_asyncio

nest_asyncio.apply()

import click
import asyncio
from . import ersilia_cli
from .. import echo
from ...hub.fetch.fetch import ModelFetcher
@@ -9,8 +25,7 @@ def fetch_cmd():
"""Create fetch commmand"""

def _fetch(mf, model_id):
mf.fetch(model_id)

asyncio.run(mf.fetch(model_id))
# Example usage: ersilia fetch {MODEL}
@ersilia_cli.command(
short_help="Fetch model from Ersilia Model Hub",
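
Because `ModelFetcher.fetch` is now a coroutine, the synchronous click command drives it with `asyncio.run`. `asyncio.run` refuses to start while another event loop is already running in the same thread (for example inside a notebook or an async test runner), which is the situation `nest_asyncio.apply()` patches around. A minimal, stdlib-only sketch of that failure mode; `fetch`, `fetch_sync`, and `call_run_inside_loop` are hypothetical stand-ins, not Ersilia code:

```python
import asyncio


async def fetch(model_id: str) -> str:
    """Hypothetical stand-in for the async ModelFetcher.fetch."""
    await asyncio.sleep(0)  # yield to the event loop once
    return f"fetched {model_id}"


def fetch_sync(model_id: str) -> str:
    """Synchronous entry point, as a click command would need."""
    return asyncio.run(fetch(model_id))


async def call_run_inside_loop() -> str:
    """Calling asyncio.run while a loop is running raises RuntimeError."""
    coro = fetch("eos-demo")
    try:
        asyncio.run(coro)
    except RuntimeError as error:
        coro.close()  # the coroutine never started; close it to avoid a warning
        return str(error)
    return "no error"


print(fetch_sync("eos-demo"))  # fetched eos-demo
print(asyncio.run(call_run_inside_loop()))
```

Installing `nest_asyncio` with pip at import time, as the diff does, changes the user's environment as a side effect of importing the CLI; listing it as a regular package dependency is the more conventional alternative.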

ersilia/db/hubdata/interfaces.py: 31 changes (18 additions, 13 deletions)
@@ -7,15 +7,16 @@
AIRTABLE_MAX_ROWS = 100000
AIRTABLE_PAGE_SIZE = 100


class AirtableInterface(ErsiliaBase):
def __init__(self, config_json):
ErsiliaBase.__init__(self, config_json=config_json)
super().__init__(config_json=config_json)
self.base_id = AIRTABLE_MODEL_HUB_BASE_ID
self.table_name = AIRTABLE_MODEL_HUB_TABLE_NAME
self.max_rows = AIRTABLE_MAX_ROWS
self.page_size = AIRTABLE_PAGE_SIZE
self.write_api_key = None

self.read_only_api_key = None
self.table = self._create_table(api_key=self._get_read_only_airtable_api_key())

def _create_table(self, api_key):
@@ -26,25 +27,29 @@ def _create_table(self, api_key):
pyairtable = importlib.import_module("pyairtable")
return pyairtable.Table(api_key, self.base_id, self.table_name)

@staticmethod
def _get_read_only_airtable_api_key():
def _get_read_only_airtable_api_key(self):
if self.read_only_api_key:
return self.read_only_api_key

url = "https://ersilia-model-hub.s3.eu-central-1.amazonaws.com/read_only_keys.json"
r = requests.get(url)
data = r.json()
return data["AIRTABLE_READONLY_API_KEY"]
response = requests.get(url)
response.raise_for_status()
data = response.json()
self.read_only_api_key = data["AIRTABLE_READONLY_API_KEY"]
return self.read_only_api_key

def set_write_api_key(self, write_api_key):
self.write_api_key = write_api_key
self.table = self._create_table(api_key=write_api_key)

def items(self):
"""Efficiently yield records using pagination."""
for records in self.table.iterate(
page_size=self.page_size, max_records=self.max_rows
page_size=self.page_size,
max_records=self.max_rows
):
for record in records:
yield record
yield from records

def items_all(self):
records = self.table.all()
for record in records:
yield record
"""Avoid using `all` to prevent memory overload on large tables."""
return self.items()
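
The Airtable changes combine two ideas: the read-only API key is fetched once and cached on the instance (which is why `@staticmethod` is dropped, so the method can read `self`), and records are streamed page by page with `yield from` instead of loading the whole table through `all()`. The diff relies on pyairtable's `Table.iterate` yielding one list of records per page. A minimal sketch of both ideas under that assumption; `FakeTable`, `CachedKeyInterface`, and `fake_fetch_key` are hypothetical names, and no network calls are made:

```python
from typing import Callable, Dict, Iterator, List, Optional

Record = Dict[str, object]


class FakeTable:
    """Hypothetical stand-in for a table whose iterate() yields pages (lists) of records."""

    def __init__(self, rows: List[Record]):
        self.rows = rows

    def iterate(self, page_size: int, max_records: int) -> Iterator[List[Record]]:
        limit = min(len(self.rows), max_records)
        for start in range(0, limit, page_size):
            yield self.rows[start:min(start + page_size, limit)]


class CachedKeyInterface:
    """Hypothetical sketch: cached API key plus lazily flattened pages."""

    def __init__(self, fetch_key: Callable[[], str], table: FakeTable,
                 page_size: int = 100, max_rows: int = 100000):
        self._fetch_key = fetch_key  # e.g. a function wrapping an HTTP request
        self._api_key: Optional[str] = None
        self.table = table
        self.page_size = page_size
        self.max_rows = max_rows

    def api_key(self) -> str:
        # Only the first call fetches; later calls reuse the cached key.
        if self._api_key is None:
            self._api_key = self._fetch_key()
        return self._api_key

    def items(self) -> Iterator[Record]:
        # Flatten pages lazily: callers receive one record at a time, and a
        # later page is not requested until the earlier ones are consumed.
        for page in self.table.iterate(page_size=self.page_size, max_records=self.max_rows):
            yield from page


key_requests = []


def fake_fetch_key() -> str:
    key_requests.append("GET")  # count simulated network calls
    return "read-only-key"


iface = CachedKeyInterface(fake_fetch_key, FakeTable([{"id": i} for i in range(250)]),
                           page_size=100, max_rows=230)
iface.api_key()
iface.api_key()
print(len(key_requests))              # 1
print(sum(1 for _ in iface.items()))  # 230
```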

ersilia/db/hubdata/json_models_interface.py: 1 change (0 additions, 1 deletion)
@@ -10,7 +10,6 @@
class JsonModelsInterface(ErsiliaBase):
def __init__(self, config_json):
ErsiliaBase.__init__(self, config_json=config_json)
# self.cache_dir =
self.json_file_name = MODELS_JSON
self.url = f"https://{ERSILIA_MODEL_HUB_S3_BUCKET}.s3.eu-central-1.amazonaws.com/{MODELS_JSON}"


ersilia/default.py: 1 change (1 addition, 0 deletions)
@@ -45,6 +45,7 @@
"model/framework/example.csv",
"example.csv",
]
UNPROCESSABLE_INPUT ="UNPROCESSABLE_INPUT"
DEFAULT_ERSILIA_ERROR_EXIT_CODE = 1
METADATA_JSON_FILE = "metadata.json"
METADATA_YAML_FILE = "metadata.yml"

ersilia/hub/fetch/fetch.py: 27 changes (18 additions, 9 deletions)
@@ -13,6 +13,7 @@
NotInstallableWithFastAPI,
NotInstallableWithBentoML,
)
from .register.standard_example import ModelStandardExample
from ...utils.exceptions_utils.throw_ersilia_exception import throw_ersilia_exception
from ...default import PACK_METHOD_BENTOML, PACK_METHOD_FASTAPI, EOS, MODEL_SOURCE_FILE
from . import STATUS_FILE, DONE_TAG
@@ -156,9 +157,13 @@ def _fetch_not_from_dockerhub(self, model_id):
else:
self.logger.debug("Model already exists in your local, skipping fetching")

def _fetch_from_dockerhub(self, model_id):
def _standard_csv_example(self, model_id):
ms = ModelStandardExample(model_id=model_id, config_json=self.config_json)
ms.run()

async def _fetch_from_dockerhub(self, model_id):
self.logger.debug("Fetching from DockerHub")
self.model_dockerhub_fetcher.fetch(model_id=model_id)
await self.model_dockerhub_fetcher.fetch(model_id=model_id)

def _fetch_from_hosted(self, model_id):
self.logger.debug("Fetching from hosted")
@@ -213,14 +218,14 @@ def exists(self, model_id):
return True
else:
return False
def _fetch(self, model_id):

async def _fetch(self, model_id):

self.logger.debug("Starting fetching procedure")
do_dockerhub = self._decide_if_use_dockerhub(model_id=model_id)
if do_dockerhub:
self.logger.debug("Decided to fetch from DockerHub")
self._fetch_from_dockerhub(model_id=model_id)
await self._fetch_from_dockerhub(model_id=model_id)
return
do_hosted = self._decide_if_use_hosted(model_id=model_id)
if do_hosted:
@@ -233,10 +238,14 @@ def _fetch(self, model_id):
self.logger.debug("Fetching in your system, not from DockerHub")
self._fetch_not_from_dockerhub(model_id=model_id)

def fetch(self, model_id):
self._fetch(model_id)
async def fetch(self, model_id):
await self._fetch(model_id)
self._standard_csv_example(model_id)
self.logger.debug("Writing model source to file")
model_source_file = os.path.join(self._model_path(model_id), MODEL_SOURCE_FILE)
try:
os.makedirs(self._model_path(model_id), exist_ok=True)
except OSError as error:
print(f"Error during folder creation: {error}")
with open(model_source_file, "w") as f:
f.write(self.model_source)

f.write(self.model_source)
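
In the diff, `fetch` and `_fetch` become coroutines, but only the DockerHub branch awaits anything. `_fetch_not_from_dockerhub`, `_fetch_from_hosted`, and `_standard_csv_example` remain ordinary blocking calls, so on those paths the event loop is stalled for their full duration. One way to keep a blocking branch off the loop is `asyncio.to_thread` (Python 3.9+). A minimal sketch with hypothetical stand-in functions, not Ersilia's implementation:

```python
import asyncio
import time


def fetch_not_from_dockerhub(model_id: str) -> str:
    """Hypothetical blocking path, e.g. cloning a repo and installing dependencies."""
    time.sleep(0.05)  # blocks whichever thread runs it
    return f"{model_id}: built locally"


async def fetch_from_dockerhub(model_id: str) -> str:
    """Hypothetical path that awaits non-blocking I/O."""
    await asyncio.sleep(0.05)  # yields to the event loop while waiting
    return f"{model_id}: pulled"


async def fetch(model_id: str, use_dockerhub: bool) -> str:
    if use_dockerhub:
        return await fetch_from_dockerhub(model_id)
    # Run the blocking path in a worker thread so the loop stays free
    # to service other coroutines in the meantime.
    return await asyncio.to_thread(fetch_not_from_dockerhub, model_id)


async def main() -> list:
    # The two fetches overlap: one waits on the loop, the other in a thread.
    return await asyncio.gather(fetch("model-a", True), fetch("model-b", False))


print(asyncio.run(main()))  # ['model-a: pulled', 'model-b: built locally']
```

For a CLI that fetches a single model the difference is small; it starts to matter only when other coroutines share the same loop.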

ersilia/hub/fetch/lazy_fetchers/dockerhub.py: 71 changes (35 additions, 36 deletions)
@@ -1,5 +1,6 @@
import os
import json
import asyncio
from ..register.register import ModelRegisterer

from .... import ErsiliaBase, throw_ersilia_exception
@@ -19,10 +20,9 @@
from ....utils.exceptions_utils.fetch_exceptions import DockerNotActiveError
from .. import STATUS_FILE


class ModelDockerHubFetcher(ErsiliaBase):
def __init__(self, overwrite=None, config_json=None):
ErsiliaBase.__init__(self, config_json=config_json, credentials_json=None)
super().__init__(config_json=config_json, credentials_json=None)
self.simple_docker = SimpleDocker()
self.overwrite = overwrite

@@ -42,28 +42,28 @@ def is_available(self, model_id):
return True
return False

def write_apis(self, model_id):
async def write_apis(self, model_id):
self.logger.debug("Writing APIs")
di = PulledDockerImageService(
model_id=model_id, config_json=self.config_json, preferred_port=None
)
di.serve()
di.close()

def _copy_from_bentoml_image(self, model_id, file):
fr_file = "/root/eos/dest/{0}/{1}".format(model_id, file)
to_file = "{0}/dest/{1}/{2}".format(EOS, model_id, file)
self.simple_docker.cp_from_image(
async def _copy_from_bentoml_image(self, model_id, file):
fr_file = f"/root/eos/dest/{model_id}/{file}"
to_file = f"{EOS}/dest/{model_id}/{file}"
await self.simple_docker.cp_from_image(
img_path=fr_file,
local_path=to_file,
org=DOCKERHUB_ORG,
img=model_id,
tag=DOCKERHUB_LATEST_TAG,
)

def _copy_from_ersiliapack_image(self, model_id, file):
fr_file = "/root/{0}".format(file)
to_file = "{0}/dest/{1}/{2}".format(EOS, model_id, file)
async def _copy_from_ersiliapack_image(self, model_id, file):
fr_file = f"/root/{file}"
to_file = f"{EOS}/dest/{model_id}/{file}"
self.simple_docker.cp_from_image(
img_path=fr_file,
local_path=to_file,
@@ -72,30 +72,30 @@ def _copy_from_ersiliapack_image(self, model_id, file):
tag=DOCKERHUB_LATEST_TAG,
)

def _copy_from_image_to_local(self, model_id, file):
async def _copy_from_image_to_local(self, model_id, file):
pack_method = resolve_pack_method_docker(model_id)
if pack_method == PACK_METHOD_BENTOML:
self._copy_from_bentoml_image(model_id, file)
else:
self._copy_from_ersiliapack_image(model_id, file)

def copy_information(self, model_id):
async def copy_information(self, model_id):
self.logger.debug("Copying information file from model container")
self._copy_from_image_to_local(model_id, INFORMATION_FILE)

def copy_metadata(self, model_id):
async def copy_metadata(self, model_id):
self.logger.debug("Copying api_schema_file file from model container")
self._copy_from_image_to_local(model_id, API_SCHEMA_FILE)

def copy_status(self, model_id):
async def copy_status(self, model_id):
self.logger.debug("Copying status file from model container")
self._copy_from_image_to_local(model_id, STATUS_FILE)
def copy_example_if_available(self, model_id):
# TODO This also needs to change to accomodate ersilia pack

async def copy_example_if_available(self, model_id):
# This needs to accommodate ersilia pack
for pf in PREDEFINED_EXAMPLE_FILES:
fr_file = "/root/eos/dest/{0}/{1}".format(model_id, pf)
to_file = "{0}/dest/{1}/{2}".format(EOS, model_id, "input.csv")
fr_file = f"/root/eos/dest/{model_id}/{pf}"
to_file = f"{EOS}/dest/{model_id}/input.csv"
try:
self.simple_docker.cp_from_image(
img_path=fr_file,
@@ -108,12 +108,9 @@ def copy_example_if_available(self, model_id):
except:
self.logger.debug("Could not find example file in docker image")

def modify_information(self, model_id):
async def modify_information(self, model_id):
"""
Modify the information file being copied from docker container to the host machine.
:param file: The model information file being copied.
:param service_class_file: File containing the model service class.
:size_file: File containing the size of the pulled docker image.
"""
information_file = os.path.join(self._model_path(model_id), INFORMATION_FILE)
mp = ModelPuller(model_id=model_id, config_json=self.config_json)
@@ -124,24 +121,26 @@ def modify_information(self, model_id):
self.logger.error("Information file not found, not modifying anything")
return None

# Using this literal here to prevent a file read
# from service class file for a model fetched through DockerHub
# since we already know the service class.
data["service_class"] = "pulled_docker"
data["size"] = mp._get_size_of_local_docker_image_in_mb() # TODO this should probably be a util function
data["size"] = mp._get_size_of_local_docker_image_in_mb()

with open(information_file, "w") as outfile:
json.dump(data, outfile, indent=4)

@throw_ersilia_exception
def fetch(self, model_id):
async def fetch(self, model_id):
mp = ModelPuller(model_id=model_id, config_json=self.config_json)
self.logger.debug("Pulling model image from DockerHub")
mp.pull()
# Asynchronous pulling
await mp.pull()
mr = ModelRegisterer(model_id=model_id, config_json=self.config_json)
mr.register(is_from_dockerhub=True)
self.write_apis(model_id)
self.copy_information(model_id)
self.modify_information(model_id)
self.copy_metadata(model_id)
self.copy_status(model_id)
self.copy_example_if_available(model_id)
# Asynchronous and concurent execution
await asyncio.gather(
mr.register(is_from_dockerhub=True),
self.write_apis(model_id),
self.copy_information(model_id),
self.modify_information(model_id),
self.copy_metadata(model_id),
self.copy_status(model_id),
self.copy_example_if_available(model_id)
)
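
`asyncio.gather` starts every coroutine it is given at once, so it is only safe for steps that do not depend on each other. Here `modify_information` reads the information file that `copy_information` writes, yet both are passed to the same `gather`. In addition, as shown in this diff, `copy_information`, `copy_metadata`, and `copy_status` call the coroutine `_copy_from_image_to_local` without `await`, and `_copy_from_image_to_local` likewise calls the async copy helpers without awaiting them, so those copies never execute. A minimal sketch of dependency-aware gathering; the step functions are hypothetical stand-ins that only simulate delays:

```python
import asyncio
from typing import List


async def copy_information(log: List[str]) -> None:
    await asyncio.sleep(0.02)  # simulate a slow copy out of the image
    log.append("copy_information")


async def modify_information(log: List[str]) -> None:
    # Reads the file copy_information produces, so it must run afterwards.
    if "copy_information" not in log:
        raise FileNotFoundError("information file not copied yet")
    log.append("modify_information")


async def copy_metadata(log: List[str]) -> None:
    await asyncio.sleep(0.01)
    log.append("copy_metadata")


async def copy_status(log: List[str]) -> None:
    await asyncio.sleep(0.01)
    log.append("copy_status")


async def fetch_all_at_once(log: List[str]) -> None:
    # Mirrors the diff: every step is started concurrently.
    await asyncio.gather(
        copy_information(log), modify_information(log),
        copy_metadata(log), copy_status(log),
    )


async def fetch_with_dependencies(log: List[str]) -> None:
    async def information_steps() -> None:
        await copy_information(log)    # produce the file first,
        await modify_information(log)  # then edit it
    # Independent steps still overlap with the dependent chain.
    await asyncio.gather(information_steps(), copy_metadata(log), copy_status(log))


try:
    asyncio.run(fetch_all_at_once([]))
except FileNotFoundError as error:
    print("race:", error)  # race: information file not copied yet

ordered: List[str] = []
asyncio.run(fetch_with_dependencies(ordered))
print(ordered)
```

Real overlap also requires the copy helpers to await genuinely non-blocking work (for instance a subprocess started with `asyncio.create_subprocess_exec`); an `async def` whose body only makes blocking calls runs to completion as soon as it starts.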

ersilia/hub/fetch/register/register.py: 2 changes (1 addition, 1 deletion)
@@ -120,7 +120,7 @@ def register_not_from_hosted(self):
with open(file_name, "w") as f:
json.dump(data, f)

def register(self, is_from_dockerhub=False, is_from_hosted=False):
async def register(self, is_from_dockerhub=False, is_from_hosted=False):
if is_from_dockerhub and is_from_hosted:
raise Exception
if is_from_dockerhub and not is_from_hosted:
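
Turning `register` into a coroutine changes its calling contract: a caller that still writes `mr.register(...)` without `await` gets a coroutine object back and no registration happens (Python only emits a `RuntimeWarning` once that coroutine is garbage-collected). The body still performs synchronous file I/O, so marking it `async` does not by itself make it run concurrently with anything. A minimal sketch; `Registerer` is a hypothetical stand-in, and it raises `ValueError` with a message where the diff raises a bare `Exception`:

```python
import asyncio


class Registerer:
    """Hypothetical stand-in for ModelRegisterer after register() became async."""

    def __init__(self) -> None:
        self.registered = False

    async def register(self, is_from_dockerhub: bool = False,
                       is_from_hosted: bool = False) -> None:
        if is_from_dockerhub and is_from_hosted:
            raise ValueError("a model cannot be both from DockerHub and hosted")
        self.registered = True  # stands in for writing the registration file


r = Registerer()
pending = r.register(is_from_dockerhub=True)  # missing await: the body has not run
print(type(pending).__name__, r.registered)   # coroutine False
pending.close()  # discard it explicitly rather than leaving it to the garbage collector

asyncio.run(r.register(is_from_dockerhub=True))  # actually executed by an event loop
print(r.registered)  # True
```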