Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: remove BEM-specific code and generalize PDF processing #159

Merged
merged 14 commits into from
Dec 18, 2024

Conversation

fg-nava
Copy link
Contributor

@fg-nava fg-nava commented Dec 17, 2024

Ticket

https://navalabs.atlassian.net/browse/DST-620

Changes

  • Removed BEM (Bridges Eligibility Manual) specific components:

    • Removed BridgesEligibilityManualEngine class from chat engine
    • Removed BemFormattingConfig class and related formatting logic
    • Removed BEM-specific utility files (bem_util.py)
    • Removed BEM PDF ingestion code (ingest_bem_pdfs.py, ingest_policy_pdfs.py)
  • Updated tests:

    • Removed BEM-specific test files
    • Updated remaining tests to use generic functionality
    • Made embedding tests more resilient with mock transformers
  • Updated configurations:

    • Removed BEM-related targets from Makefile
    • Removed BEM-related scripts from pyproject.toml

Context for reviewers

PR removes all Bridges Eligibility Manual (BEM) specific code as part of our effort to make the codebase more generic and maintainable. The PDF processing functionality has been generalized to work with any PDF document, not just BEM files.

Testing

  1. Ran tests:
$ make lint # All checks passed
$ make test # All 164 tests passed
  1. Functional testing:
  • Ran make init start to start the application
  • Verified chat functionality works with the CA EDD Web engine

@fg-nava fg-nava self-assigned this Dec 17, 2024
Copy link

github-actions bot commented Dec 17, 2024

☂️ Python Coverage

current status: ✅

Overall Coverage

Lines Covered Coverage Threshold Status
2822 2526 90% 80% 🟢

New Files

No new covered files...

Modified Files

File Coverage Status
app/src/chat_engine.py 83% 🟢
app/src/format.py 96% 🟢
app/src/ingestion/pdf_elements.py 100% 🟢
app/src/ingestion/pdf_postprocess.py 94% 🟢
TOTAL 93% 🟢

updated for commit: 32b5b90 by action🐍

@fg-nava fg-nava marked this pull request as ready for review December 17, 2024 00:07
@fg-nava fg-nava requested a review from KevinJBoyer December 17, 2024 00:07
Copy link
Contributor

@KevinJBoyer KevinJBoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the generalization of the PDF code!

@@ -319,15 +319,11 @@ def _create_phrase(self, parent_node: Element | None, child: Text) -> Phrase | N
return Phrase(text=child.data, bold=bolded)


class BemTagExtractor(TagExtractor):
class GenericTagExtractor(TagExtractor):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't call this Generic since it was implemented specifically for BEM pdfs. Either remove it entirely or leave it as is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will remove -- I noticed this file was all specific BEM if I'm not mistaken.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this file pdf_stylings.py as it was only used by BEM

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@KevinJBoyer Should this file go away as well (probably in another PR)? I think this was for some Michigan PDF that was different from BEM pdfs -- #31

@fg-nava fg-nava requested a review from yoomlam December 18, 2024 18:34
@fg-nava fg-nava force-pushed the DST-620-remove-bem-code-functions branch from 9a4e794 to 8db3a3d Compare December 18, 2024 18:47
Comment on lines 95 to 68
def _get_bem_documents_to_show(
def _get_documents_to_show(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is not called by any remaining code, remove it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

app/pyproject.toml Outdated Show resolved Hide resolved
@fg-nava fg-nava merged commit 7ac53b5 into main Dec 18, 2024
4 checks passed
@fg-nava fg-nava deleted the DST-620-remove-bem-code-functions branch December 18, 2024 21:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants