test: test that PyPDF can extract passages so that they are detect by DocumentSplitter #8739

Merged: davidsbatista merged 1 commit into main from pypdf-passage-fix on Jan 17, 2025

davidsbatista
Contributor

@davidsbatista davidsbatista commented Jan 16, 2025

Related Issues

Proposed Changes:

  • I added a test ensuring that, when the converter is configured as PyPDFToDocument(extraction_mode=PyPDFExtractionMode.LAYOUT), text passages are extracted from the PDF correctly, so that DocumentSplitter can detect them
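
For context, a minimal sketch of why the extraction mode matters for passage splitting. It assumes passages are separated by a blank line ("\n\n"), which is how I understand DocumentSplitter handles split_by="passage"; the split_passages helper and the two sample strings are hypothetical and only stand in for what an extractor might return, they are not the test added in this PR.

```python
def split_passages(text: str) -> list[str]:
    """Split text into passages on blank lines, dropping empty chunks."""
    # Assumption: a passage boundary is a blank line, i.e. two consecutive newlines.
    return [chunk for chunk in text.split("\n\n") if chunk.strip()]


# Hypothetical extractor output that keeps the blank line between paragraphs.
with_blank_lines = "First paragraph, line one.\nLine two.\n\nSecond paragraph."

# Hypothetical extractor output where the blank line was lost during extraction.
without_blank_lines = "First paragraph, line one.\nLine two.\nSecond paragraph."

passages_kept = split_passages(with_blank_lines)
passages_lost = split_passages(without_blank_lines)

print(len(passages_kept))  # two passages when the blank line survives
print(len(passages_lost))  # a single passage when it does not
```

If the extracted text never contains a blank line, any splitter keyed on that separator returns the whole document as one passage, which matches the symptom described in the linked issue.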

How did you test it?

  • unit tests, manual verification
  • CI tests

Checklist

  • I have read the contributors guidelines and the code of conduct
  • I have updated the related issue with new insights and changes
  • I added unit tests and updated the docstrings
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test: and added ! in case the PR includes breaking changes.
  • I documented my code
  • I ran pre-commit hooks and fixed any issue

@coveralls
Collaborator

coveralls commented Jan 16, 2025

Pull Request Test Coverage Report for Build 12811890480

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 2 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.001%) to 91.293%

Files with Coverage Reduction | New Missed Lines | %
components/converters/pypdf.py | 2 | 96.0%

Totals
  • Change from base Build 12794162077: 0.001%
  • Covered Lines: 8839
  • Relevant Lines: 9682

💛 - Coveralls

@davidsbatista davidsbatista changed the title test: test that PyPDF can extract passages so that they are detect b DocumentSplitter test: test that PyPDF can extract passages so that they are detect by DocumentSplitter Jan 16, 2025
@davidsbatista davidsbatista added the ignore-for-release-notes PRs with this flag won't be included in the release notes. label Jan 16, 2025
@davidsbatista davidsbatista marked this pull request as ready for review January 16, 2025 15:14
@davidsbatista davidsbatista requested a review from a team as a code owner January 16, 2025 15:14
@davidsbatista davidsbatista requested review from Amnah199 and removed request for a team January 16, 2025 15:15
@anakin87 anakin87 self-requested a review January 16, 2025 17:31
@davidsbatista davidsbatista merged commit 2c84266 into main Jan 17, 2025
28 of 29 checks passed
@davidsbatista davidsbatista deleted the pypdf-passage-fix branch January 17, 2025 09:56
Labels
ignore-for-release-notes PRs with this flag won't be included in the release notes. topic:tests
Development

Successfully merging this pull request may close these issues.

Document Splitter always returns 1 document for split_type="passage" in pdfs
3 participants