🐕 Batch: Add Output Validation for Model API Results in Playground and Model Tester Pipelines #1488
Comments
Hi @Abellegese I am a bit confused about the changes that will be implemented in the test command itself vs the testing playground. Wasn't the test command already checking that the output was correct? Would it pass the existing test if the output was None or Null in the version previous to the refactoring?
Hi @GemmaTuron yes, the test does indeed check the output, but it reads the values from a csv (json and h5 are missing). Here is the code that checks the values extracted from the csv output. It compares two outputs, generated 1) from the ersilia run CLI and 2) from running the model directly:

```python
def validate_output(output1, output2):
    # Outputs from the CLI run and the direct model run must be of the same type
    if not isinstance(output1, type(output2)):
        raise texc.InconsistentOutputTypes(self.model_id)

    if output1 is None:
        return

    # Numeric outputs must agree within an RMSE of 0.1 and a Spearman rho >= 0.5
    if isinstance(output1, (float, int)):
        rmse = compute_rmse([output1], [output2])
        if rmse > 0.1:
            raise texc.InconsistentOutputs(self.model_id)
        rho, _ = spearmanr([output1], [output2])
        if rho < 0.5:
            raise texc.InconsistentOutputs(self.model_id)
    elif isinstance(output1, list):
        rmse = compute_rmse(output1, output2)
        if rmse > 0.1:
            raise texc.InconsistentOutputs(self.model_id)
        rho, _ = spearmanr(output1, output2)
        if rho < 0.5:
            raise texc.InconsistentOutputs(self.model_id)
    # String outputs must be at least 95% similar
    elif isinstance(output1, str):
        if _compare_output_strings(output1, output2) <= 95:
            raise texc.InconsistentOutputs(self.model_id)
```
So @GemmaTuron here I want to build a more general testing system for the three output file types, also guided by ersilia's general data structure definition:

```python
# Supported data structures
ds = {
    "Single": lambda x: isinstance(x, list) and len(x) == 1,
    "List": lambda x: isinstance(x, list)
    and len(x) > 1
    and all(isinstance(item, (int, float)) for item in x),
    "Flexible List": lambda x: isinstance(x, list)
    and all(isinstance(item, (str, int, float)) for item in x),
    "Matrix": lambda x: isinstance(x, list)
    and all(
        isinstance(row, list)
        and all(isinstance(item, (int, float)) for item in row)
        for row in x
    ),
    "Serializable Object": lambda x: isinstance(x, dict),
}
```

In the model tester I proposed a double-layer check, meaning decoupling the model API result (accessing the raw response from the model) from the ersilia system that converts those API results into the specified output files. In the first layer we perform several checks on the model API result, guided by the data structures above and by the invalid-values rule (no None, null, ...). The reason is that the three output files are generated by ersilia, so if something changes in the ersilia system we might conclude that the model has something wrong, which would be entirely false, because the final files are ersilia-system dependent and are not sufficient to decide model healthiness. Once the first-layer checks pass, any file check that comes after and fails won't be a model problem; this lets us know the source of the problem. In the playground test we have all of those checks except the double-check layer, which is more important for testing models that are under development or maintenance. Does it make sense?
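A minimal sketch of what such a first-layer check could look like, assuming the raw API results have already been parsed into Python objects and reusing the `ds` dict above; the function and helper names are illustrative, not the actual ersilia implementation:

```python
import math


def _is_invalid(value):
    # Invalid-values rule: None/null, empty strings and NaN are rejected
    if value is None:
        return True
    if isinstance(value, str) and value.strip() == "":
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return False


def _scalars(result):
    # Yield every scalar inside a Single, List, Matrix or dict result
    if isinstance(result, dict):
        for v in result.values():
            yield from _scalars(v)
    elif isinstance(result, list):
        for v in result:
            yield from _scalars(v)
    else:
        yield result


def check_api_results(results, expected_structure, ds):
    # results: one parsed API output per input molecule
    # expected_structure: a key of the ds dict, e.g. "List" or "Matrix"
    validator = ds[expected_structure]
    for result in results:
        if not validator(result):  # structure must match the declared one
            return False
        if any(_is_invalid(v) for v in _scalars(result)):  # no invalid values
            return False
    return True
```

If a check like this returns True and a later file-level check still fails, the problem is in the file generation step rather than in the model itself.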
Hi @Abellegese I completely understand your point about decoupling systems and testing them individually rather than the integration of the "ersilia CLI + model" set up. However, it does not do much for us, since the final outcome, i.e. the model working or not working, will be considered in light of the entire system. That is, if the model API does indeed return a correct response but Ersilia doesn't handle it correctly, the eventual effect will be that the model won't be considered "working", because it will not be useful for the users. I am considering the tradeoff here in terms of usefulness and implementation time - as a proxy measure you can see that the model API has returned something by looking at the response status code from the API in the logs from making a prediction request to it. In any case, the test that runs the bash script which generates a CSV file is a good enough test that the model itself is working. I don't see a strong argument for the decoupling approach.
Description
The current models process SMILES inputs (string, list, or `.csv`) and generate various types of outputs, including numeric values, text (e.g., for generative models), boolean values and others. Checking and validating the output from those models helps ensure consistent functionality when they are exposed to refactors and changes. Currently Ersilia has two main testing pipelines to check whether models work properly and to report issues otherwise, utilising the `test` command and the playground test. These pipelines lack features to perform several checks on the output of the model APIs. This issue focuses on defining and implementing validation checks for outputs returned by the Ersilia Model Hub APIs. The goal is to:

- Add a `check` function in the `CheckService` class for the model tester pipeline to validate results programmatically.

Details
To get the most out of checking and validating model outputs, and to cover several output types, we can use information from the `metadata`, a file that has all the necessary information about the model.

Ersilia Test Command Features (Double Layer Checks)
a) First Level Checks: focusing on two pieces of info for now, `Output Type` and `Output Shape`, given input as a string, a list, or a `.csv` file. This can be achieved by creating a simple POST request to the API after `serve` (see the sketch after this list). This is important because this command tests model healthiness, so accessing the raw output from the API and validating it would be invaluable. The checks cover:

- Invalid values such as `null`, `''`, `None`, and `NAN`.
- `DataType` (for instance `integer`, `float`, `bool`) and `DataStructure` (List, Single, ...).
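A small sketch of that POST-then-check step; the endpoint URL, port and payload shape are placeholders for whatever the served model actually exposes, and the type names are examples of what the metadata's `Output Type` might hold:

```python
import math

import requests  # assumes the model has been served locally beforehand


def post_and_check(url, smiles_list, output_type):
    # POST a few SMILES to the served model and run first-level checks on the
    # raw API response. `output_type` would come from the model's metadata.
    resp = requests.post(url, json=smiles_list, timeout=60)
    resp.raise_for_status()
    results = resp.json()

    type_map = {"Float": float, "Integer": int, "String": str}
    expected = type_map.get(output_type, object)

    for result in results:
        values = result if isinstance(result, list) else [result]
        for v in values:
            # Invalid-values rule: no None/null, empty strings or NaN
            if v is None or v == "" or (isinstance(v, float) and math.isnan(v)):
                return False
            # Metadata-guided type check (ints are acceptable where floats are declared)
            if expected is float and isinstance(v, (int, float)):
                continue
            if not isinstance(v, expected):
                return False
    return True


# Illustrative call, not the real endpoint:
# post_and_check("http://localhost:3000/run", ["CCO", "c1ccccc1"], "Float")
```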
b) After all the first-level checks pass, we secondly check that each output file type (`json`, `h5`) is correctly created and has proper values in it. Since the values are validated in the first check, if things fail at this stage it is likely because of ersilia, not the models themselves. Decoupling the systems is really important.
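A possible shape for the file-level part of this second layer, shown for the `json` output; the record layout assumed here (a list of dicts keyed by output column) is an assumption, the real schema is whatever ersilia writes:

```python
import json
import math


def check_json_output(path):
    # Second-layer check: the file must exist, parse, and contain no invalid values.
    with open(path) as f:
        records = json.load(f)
    if not records:
        return False
    for record in records:
        values = record.values() if isinstance(record, dict) else [record]
        for v in values:
            items = v if isinstance(v, list) else [v]
            for item in items:
                if item is None or item == "":
                    return False
                if isinstance(item, float) and math.isnan(item):
                    return False
    return True
```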
Playground Test:

The playground test checks for invalid values such as `null`, `''`, `None` and `NAN`, as well as `DataType`, `DataStructure`, and length matching between input and output.
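The length-matching rule in particular can be expressed as a simple comparison between the input and output files; the CSV-based sketch below assumes one output row per input row, and the set of "invalid" cell values is illustrative:

```python
import csv


def check_lengths_and_values(input_csv, output_csv):
    # Playground-style rule: every input row must produce exactly one output row,
    # and no output cell may be empty or a null-like placeholder.
    with open(input_csv, newline="") as f:
        n_inputs = sum(1 for _ in csv.DictReader(f))
    with open(output_csv, newline="") as f:
        rows = list(csv.DictReader(f))

    if len(rows) != n_inputs:  # length matching between input and output
        return False

    bad = {"", "null", "none", "nan"}
    for row in rows:
        if any((cell or "").strip().lower() in bad for cell in row.values()):
            return False
    return True
```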
Tasks

Playground Test: Add Rule-Based Checks and Feature Update
- `config.yml` from a `nox` session.

Model Tester: Add `CheckService` Function
- Implement a `check` function in the `CheckService` class.
- Return `True`/`False` results.
- Parse the output files (`json`, `h5`) and validate the `check` function to ensure its correctness.

Integration and Documentation
- Update the `ersilia test` command user guide to explain how rules apply during testing.

Objective(s)
No response
Documentation
No response