Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

move fragment types to constant #263

Open
wants to merge 4 commits into
base: development
Choose a base branch
from
Open

Conversation

GeorgWa
Copy link
Collaborator

@GeorgWa GeorgWa commented Dec 29, 2024

So far the definition of what is a valid fragment type was somewhat loose.
This PR:

  • defines all valid fragment types as FRAGMENT_TYPES which allows checking if a type is supported
  • Drops partial support for non charge fragment types
  • Issues depracation warnings for old datastructures
  • Have defined order for charged frag types so mismatch of prediction models, libraries is mitigated

@GeorgWa GeorgWa requested review from jalew188 and mschwoer December 29, 2024 15:30
@GeorgWa
Copy link
Collaborator Author

GeorgWa commented Dec 29, 2024

This currently fails because the pinned version rdkit==2024.3.3 is being ignored

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Copy link
Collaborator

@jalew188 jalew188 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

"y_modloss": FragmentType(
name="y_modloss",
ref_ion="y",
formula="N(-1)H(-2)",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

loss formula?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check others

"c_lossH": "b+N(1)H(2)",
"z_addH": "y+N(-1)H(-1)",

class DIRECTION:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Direction (also LOSS, LOSS_INVERSE.. )

@@ -66,11 +302,20 @@ def parse_all_frag_type_representation():
parse_all_frag_type_representation()


def sort_charged_frag_types(charged_frag_types: List[str]) -> List[str]:
"""charged frag types are sorted by (no-loss, loss) and then alphabetically"""
has_loss = [f.count("_") > 1 for f in charged_frag_types]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

has_loss = [f.replace(FRAGMENT_CHARGE_SEPARATOR, "").count("_") > 0 for f in charged_frag_types]

@@ -93,9 +338,12 @@ def get_charged_frag_types(
"""
charged_frag_types = []
for _type in frag_types:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_type->frag_type, _ch -> charge?


if FRAGMENT_CHARGE_SEPARATOR in charged_frag_type:
_type, _ch = charged_frag_type.split(FRAGMENT_CHARGE_SEPARATOR)
return _type, int(_ch)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here you could raise if _ch is not int

return _type, int(_ch)

if FRAGMENT_CHARGE_SEPARATOR in charged_frag_type:
_type, _ch = charged_frag_type.split(FRAGMENT_CHARGE_SEPARATOR)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_type->frag_type, _ch -> charge?

else:
frag_directions.append(0)
for charged_frag_type in fragment_mz_df.columns.values:
frag_type, charge = parse_charged_frag_type(charged_frag_type)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you knew it all along!

for charged_frag_type in fragment_mz_df.columns.values:
frag_type, charge = parse_charged_frag_type(charged_frag_type)
frag_charges.append(charge)
frag_types.append(FRAGMENT_TYPES[frag_type].series)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the new implementation really offers enhanced maintainability and readability!


import pandas as pd
import tqdm


# Create a warning class for deprecation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I could improve only one thing in code generated by LLMs, it would be "drop this comment describing what the next line of code does".


# because we dont know the loss, we assume every loss type is phospho
class LOSS:
MODLOSS = 98
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe give some context here what these numbers mean?

Comment on lines +28 to 44
class LOSS:
MODLOSS = 98
H2O = 18
NH3 = 17
LOSSH = 1
ADDH = 2
NONE = 0


LOSS_INVERSE = {
18: "H2O",
17: "NH3",
98: "modloss",
1: "lossH",
0: "",
2: "addH",
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using this idiom:

class Losses:
  """String contants defining losses.""
  H2O = "H2O"
...

LOSS_MAPPING = {
  """Mapping of loss names to ... .""
   Losses.H2O: 18,
...
}

LOSS_MAPPING_INV = { v : k for k, v in LOSS_MAPPING.items() }

(same for SERIES)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants