refactor: remove BEM-specific code and generalize PDF processing #159
LGTM, thanks for the generalization of the PDF code!
app/src/ingestion/pdf_stylings.py
@@ -319,15 +319,11 @@ def _create_phrase(self, parent_node: Element | None, child: Text) -> Phrase | N
         return Phrase(text=child.data, bold=bolded)


-class BemTagExtractor(TagExtractor):
+class GenericTagExtractor(TagExtractor):
I wouldn't call this Generic, since it was implemented specifically for BEM PDFs. Either remove it entirely or leave it as is.
Will remove. If I'm not mistaken, this whole file is BEM-specific.
Removed this file, pdf_stylings.py, as it was only used by BEM.
app/src/ingest_policy_pdfs.py
@KevinJBoyer Should this file go away as well (probably in another PR)? I think this was for some Michigan PDF that was different from the BEM PDFs; see #31.
9a4e794 to 8db3a3d
app/src/format.py
-def _get_bem_documents_to_show(
+def _get_documents_to_show(
If this is not called by any remaining code, remove it.
Removed
Co-authored-by: Yoom Lam <[email protected]>
Ticket
https://navalabs.atlassian.net/browse/DST-620
Changes
Removed BEM (Bridges Eligibility Manual) specific components:
- BridgesEligibilityManualEngine class from chat engine
- BemFormattingConfig class and related formatting logic
- bem_util.py
- ingest_bem_pdfs.py, ingest_policy_pdfs.py
Updated tests:
Updated configurations:
Context for reviewers
This PR removes all Bridges Eligibility Manual (BEM) specific code as part of our effort to make the codebase more generic and maintainable. The PDF processing functionality has been generalized to work with any PDF document, not just BEM files.
Testing
Run make init start to start the application.
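As a supplementary check, a small script could scan the Python sources for leftover references to the removed names. This is only a sketch and not part of the PR: the name list is collected from the description and diffs above, and the `app` search root and the `find_leftover_references` helper are assumptions made for illustration.

```python
import re
from pathlib import Path

# Identifiers and module names mentioned as removed or renamed in this PR.
REMOVED_NAMES = (
    "BridgesEligibilityManualEngine",
    "BemFormattingConfig",
    "BemTagExtractor",
    "bem_util",
    "ingest_bem_pdfs",
    "ingest_policy_pdfs",
    "pdf_stylings",
    "_get_bem_documents_to_show",
)


def find_leftover_references(root, names=REMOVED_NAMES):
    """Return (path, line number, name) for each whole-word match in *.py files under root."""
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in pattern.finditer(line):
                hits.append((path, lineno, match.group(1)))
    return hits


if __name__ == "__main__":
    # "app" is an assumed source root, based on the file paths shown in the review.
    for path, lineno, name in find_leftover_references("app"):
        print(f"{path}:{lineno}: {name}")
```

An empty result suggests no remaining imports or call sites in Python files. Configuration files and docs would need a separate search.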