-
I am trying to split a PDF into pages. The original PDF is 71 MB and has 898 pages, yet the first page written out as its own PDF is 69 MB.

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
macOS-15.1.1-arm64-arm-64bit-Mach-O
$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==5.1.0, crypt_provider=('local_crypt_fallback', '0.0.0'), PIL=none

Code + PDF

PDF: https://cdn-resources.ableton.com/resources/pdfs/live-manual/12/2024-11-13/live12-manual-en.pdf

from pypdf import PdfReader, PdfWriter
from pathlib import Path
# Assumes the downloaded PDF is in the same directory as this script
# Change if needed
reader = PdfReader("./live12-manual-en.pdf")
writer = PdfWriter()
writer.add_page(reader.pages[0])
# Doesn't seem to do anything
# for page in writer.pages:
#     page.compress_content_streams()
output_path = Path("./page_1.pdf")
writer.write(output_path)
size = output_path.stat().st_size / 1024 / 1024
print(f"Page 1 size: {size} M")

Traceback

N/A
-
If you look at the internal structure of the generated PDF file, you will see that even the one-page file contains all images. This is not obvious from the content of the original PDF file, but after decoding the content streams (
If you search for further occurrences of this XObject 902 (the number might differ due to the work done by pdftk for the decompression), you will discover that it is referenced by basically every page.
This comes down to how the original PDF was created: the file has no clean per-page separation of resources, so every extracted page ends up nearly the same size as the original file.
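To check this sharing without decompressing the file by hand, something like the following sketch could be used. It counts, per object number, how many pages list that XObject in their /Resources. The helper names (`page_xobject_ids`, `count_xobject_usage`, `report`) are made up for this illustration and are not part of pypdf; only the dictionary access and `raw_get()` calls come from the library.

```python
from collections import Counter


def page_xobject_ids(page):
    """Map the XObject names in a pypdf page's /Resources to object numbers."""
    if "/Resources" not in page:
        return {}
    resources = page["/Resources"]
    if "/XObject" not in resources:
        return {}
    xobjects = resources["/XObject"]
    # raw_get() returns the unresolved entry, so indirect references keep
    # their idnum; direct (inline) objects have none and are stored as None.
    return {name: getattr(xobjects.raw_get(name), "idnum", None) for name in xobjects}


def count_xobject_usage(pages_xobjects):
    """Count on how many pages each object number is listed."""
    usage = Counter()
    for mapping in pages_xobjects:
        # A set, so an object listed twice on one page is counted once.
        usage.update({obj_id for obj_id in mapping.values() if obj_id is not None})
    return usage


def report(pdf_path, top=10):
    """Print the most widely listed XObjects of a PDF (hypothetical helper)."""
    # Imported here so the two helpers above stay usable without pypdf.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    per_page = [page_xobject_ids(page) for page in reader.pages]
    for obj_id, count in count_xobject_usage(per_page).most_common(top):
        print(f"object {obj_id}: listed by {count} of {len(per_page)} pages")
```

Calling `report("./live12-manual-en.pdf")` should then show whether a handful of objects are listed by almost every page. Note that this only inspects what each page lists in /Resources, not what its content stream actually draws.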
This PDF is an example of a file where the resources are shared between all the pages.
The full /Resources dictionary is shared between all pages, so cleaning should be done carefully.
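As an illustration of what "carefully" involves, here is a minimal sketch of the first step of any cleanup: finding which XObject names a page's content stream actually paints with the `Do` operator, so that the remaining entries become candidates for removal. The function names are hypothetical. The regex is deliberately naive: it does not decode `#xx` escapes in names, may match text inside string literals (which only causes extra entries to be kept), and ignores XObjects drawn from inside form XObjects, which carry their own resources. With pypdf, the decoded bytes could presumably be obtained via `page.get_contents().get_data()`.

```python
import re

# "/Name Do" paints the XObject called /Name from the current resources.
_DO_PATTERN = re.compile(rb"/([^\s/\[\]<>(){}%]+)\s+Do\b")


def used_xobject_names(content: bytes) -> set[str]:
    """Return the XObject names invoked by `Do` in a decoded content stream."""
    return {"/" + name.decode("latin-1") for name in _DO_PATTERN.findall(content)}


def unused_xobject_names(content: bytes, resource_names) -> set[str]:
    """Return the listed names that the content stream never paints."""
    return set(resource_names) - used_xobject_names(content)
```

Because the /Resources dictionary is one shared object, removing the unused names would have to be done on a per-page copy of the dictionary rather than in place; otherwise the entries would disappear for every other page that still needs them.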
Considering a case where many pages could be added, this is a (draft) proposal for cleanup: