Is your feature request related to a problem? Please describe.
Data extraction from ORCA computational chemistry log files currently relies on rule-based pattern matching. This approach has several limitations, among them that the rules need constant maintenance and adapt poorly to output differences between ORCA versions.
Describe the solution you'd like
Analyze NLP approaches that could provide more robust and flexible data extraction:
- Text Chunking/Segmentation Methods
  - Evaluate algorithms for identifying section boundaries
  - Compare with the current rule-based approach
- Section Classification Methods
  - TF-IDF vs Word Embeddings
  - BERT vs simpler approaches
- Data Structure Recognition
  - Methods for table/coordinate data extraction
  - Accuracy requirements
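To make the comparison concrete, here is a minimal sketch of two of the candidates listed above: a rule-based section splitter (the baseline) and a TF-IDF nearest-example classifier for section bodies. Everything in it is an assumption for illustration only: the names `split_sections` and `TfidfNearestLabel`, the label set, the tiny training examples, the sample log excerpt and its numbers, and the premise that a section starts with a title line between two dashed lines. It uses only the Python standard library and is not the project's current parser.

```python
import math
import re
from collections import Counter

BANNER = re.compile(r"^-{3,}\s*$")  # a line consisting only of dashes


def split_sections(text):
    """Rule-based baseline: split on dashed-line / TITLE / dashed-line banners."""
    lines = text.splitlines()
    sections = []                      # list of (title, body_text)
    title, body = "PREAMBLE", []
    i = 0
    while i < len(lines):
        is_banner = (
            i + 2 < len(lines)
            and BANNER.match(lines[i])
            and lines[i + 1].strip()
            and not BANNER.match(lines[i + 1])
            and BANNER.match(lines[i + 2])
        )
        if is_banner:
            if body or title != "PREAMBLE":
                sections.append((title, "\n".join(body).strip()))
            title, body = lines[i + 1].strip(), []
            i += 3
        else:
            body.append(lines[i])
            i += 1
    sections.append((title, "\n".join(body).strip()))
    return sections


def tokenize(text):
    """Lowercase words; collapse numbers into placeholders so tables share features."""
    tokens = []
    for tok in re.findall(r"-?\d+\.\d+|\d+|[A-Za-z]+", text):
        if re.fullmatch(r"-?\d+\.\d+", tok):
            tokens.append("<float>")
        elif tok.isdigit():
            tokens.append("<int>")
        else:
            tokens.append(tok.lower())
    return tokens


class TfidfNearestLabel:
    """Label a text with the most cosine-similar labeled example (TF-IDF weights)."""

    def __init__(self, labeled_docs):
        self.labels = [label for label, _ in labeled_docs]
        counts = [Counter(tokenize(text)) for _, text in labeled_docs]
        n = len(counts)
        df = Counter(term for c in counts for term in c)  # document frequency
        # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
        self.idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
        self.vectors = [self._vectorize(c) for c in counts]

    def _vectorize(self, counts):
        # Terms never seen during fitting are dropped.
        vec = {t: tf * self.idf[t] for t, tf in counts.items() if t in self.idf}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        return {t: v / norm for t, v in vec.items()}

    def predict(self, text):
        query = self._vectorize(Counter(tokenize(text)))
        scores = [
            sum(w * vec.get(t, 0.0) for t, w in query.items())
            for vec in self.vectors
        ]
        best = max(range(len(scores)), key=scores.__getitem__)
        return self.labels[best], scores[best]


# Hypothetical labeled examples; a real study would need many per label.
TRAINING = [
    ("coordinates", "C 0.000 0.000 0.000\nO 0.000 0.000 1.128"),
    ("orbital_energies",
     "NO OCC E(Eh) E(eV)\n0 2.0000 -20.5504 -559.2049\n1 2.0000 -1.3352 -36.3327"),
    ("scf_energy",
     "Total Energy : -75.98 Eh\nNuclear Repulsion : 9.25 Eh\nElectronic Energy : -85.23 Eh"),
]

# Illustrative excerpt in the style of an ORCA log; the numbers are made up.
SAMPLE_LOG = """\
Program banner text
---------------------------------
CARTESIAN COORDINATES (ANGSTROEM)
---------------------------------
  O      0.000000    0.000000    0.117790
  H      0.000000    0.755453   -0.471161
  H      0.000000   -0.755453   -0.471161

----------------
TOTAL SCF ENERGY
----------------
Total Energy       :   -76.02653 Eh
Nuclear Repulsion  :     9.11277 Eh
"""

if __name__ == "__main__":
    clf = TfidfNearestLabel(TRAINING)
    for section_title, section_body in split_sections(SAMPLE_LOG):
        print(section_title, "->", clf.predict(section_body))
```

The classifier looks only at section bodies, not titles, so it could in principle still label a section whose header wording differs between ORCA versions. A score of 0 means the text shares no terms with any example; such a prediction should be treated as "unknown" rather than trusted. The same harness could be reused to compare embeddings or BERT against this baseline on identical labeled sections.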
Acceptance Criteria
- Analysis document comparing different NLP methods
- Recommendations for the best approach
- Implementation considerations