Some exceptions. Also jbig2 vs. jbig2enc. #13

Open
DiagonalArg opened this issue Jun 2, 2023 · 2 comments

@DiagonalArg

Looks like a nice tool, thanks. I'm running scans2pdf on the output of scantailor-advanced. Some exceptions occurred, so here is the output of running it on the first page. I can provide the image if it would be useful.

Note that Ubuntu 22.04 has a snap that provides jbig2enc, whereas you're looking for jbig2.
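
For the naming mismatch, one option would be to accept either executable name. This is only a minimal sketch: `find_program` and its parameters are hypothetical and not part of djpdf, and the two candidate names are simply the ones mentioned in this report.

```python
import shutil
from typing import Optional, Sequence


def find_program(candidates: Sequence[str], path: Optional[str] = None) -> Optional[str]:
    """Return the full path of the first candidate found, or None.

    `path` is an optional search path string; when None, the PATH
    environment variable is used (standard shutil.which behaviour).
    """
    for name in candidates:
        found = shutil.which(name, path=path)
        if found is not None:
            return found
    return None


# Hypothetical usage: prefer "jbig2", fall back to "jbig2enc".
jbig2_path = find_program(["jbig2", "jbig2enc"])
if jbig2_path is None:
    print("WARNING: neither jbig2 nor jbig2enc was found")
```

Whether the two names really refer to the same command-line interface would still need checking before relying on a fallback like this.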

$ scans2pdf -v Feigon-001-000.crop_2R.tif scans2.pdf
WARNING:Program not found: jbig2
DEBUG:Using selector: EpollSelector
DEBUG:Running command: ['convert', '-colorspace', 'sRGB', '-profile', '/home/user/.local/pipx/venvs/djpdf/lib/python3.10/site-packages/djpdf/argyllcms-srgb.icm', '-background', '#ffffff', '-alpha', 'remove', '-alpha', 'off', '-type', 'TrueColor', '/home/user/Feigon-001-000.crop_2R.tif', '/var/tmp/djpdf-xwxq05cu/image.png']
DEBUG:convert-im6.q16: profile 'icc': 'RGB ': RGB color space not permitted on grayscale PNG `/var/tmp/djpdf-xwxq05cu/image.png' @ warning/png.c/MagickPNGWarningHandler/1668.

DEBUG:Running command: ['convert', '-fill', '#000000', '-opaque', '#000000', '-fill', '#000000', '-opaque', '#000000', '-threshold', '0', '/var/tmp/djpdf-xwxq05cu/image.png', '/var/tmp/djpdf-nma9ouby/image.png']
DEBUG:Running command: ['identify', '-units', 'PixelsPerInch', '-format', '%x %y', '/var/tmp/djpdf-xwxq05cu/image.png']
DEBUG:Running command: ['convert', '-fill', '#ffffff', '-opaque', '#000000', '-resize', '50%', '/var/tmp/djpdf-xwxq05cu/image.png', '/var/tmp/djpdf-_4oeppyf/image.png']
DEBUG:Running command: ['identify', '-format', '%w %h', '/var/tmp/djpdf-xwxq05cu/image.png']
DEBUG:Running command: ['convert', '-format', '%c', '/var/tmp/djpdf-nma9ouby/image.png', 'histogram:info:-']
DEBUG:Running command: ['convert', '-format', '%c', '/var/tmp/djpdf-_4oeppyf/image.png', 'histogram:info:-']
DEBUG:Running command: ['convert', '-fill', '#000000', '-opaque', '#000000', '-fill', '#000000', '-opaque', '#000000', '-threshold', '0', '/var/tmp/djpdf-xwxq05cu/image.png', '/var/tmp/djpdf-0h9szie5/image.png']
DEBUG:Running command: ['tesseract', '-l', 'eng', '--dpi', '600', '/var/tmp/djpdf-0h9szie5/image.png', '/var/tmp/djpdf-hogdbirb/ocr', 'hocr']
DEBUG:Tesseract Open Source OCR Engine v4.1.1 with Leptonica

INFO:Can't extract textangle from ocr_line: bbox 716 941 826 987; baseline 0 0; x_size 61; x_descenders 15.25; x_ascenders 15.25
DEBUG:Exception occurred:
Traceback (most recent call last):
  File "/home/user/.local/pipx/venvs/djpdf/lib/python3.10/site-packages/djpdf/hocr.py", line 46, in extract_text
    textangle = textangle_regex.search(line.attrib["title"]).group(1)
AttributeError: 'NoneType' object has no attribute 'group'

INFO:Can't extract textangle from ocr_line: bbox 643 1005 898 1043; baseline -0.008 -6; x_size 37; x_descenders 7; x_ascenders 8
DEBUG:Exception occurred:
Traceback (most recent call last):
  File "/home/user/.local/pipx/venvs/djpdf/lib/python3.10/site-packages/djpdf/hocr.py", line 46, in extract_text
    textangle = textangle_regex.search(line.attrib["title"]).group(1)
AttributeError: 'NoneType' object has no attribute 'group'

DEBUG:Running command: ['convert', '-alpha', 'remove', '-alpha', 'off', '-colorspace', 'gray', '-threshold', '50%', '-compress', 'fax', '/var/tmp/djpdf-nma9ouby/image.png', '/var/tmp/djpdf-82nt536j/image.pdf']
DEBUG:Running command: ['qpdf', '--stream-data=preserve', '--object-streams=preserve', '--normalize-content=n', '--newline-before-endstream', '--linearize', '/var/tmp/djpdf-1p_vtodx/temp.pdf', '/home/user/scans2.pdf']
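
Both tracebacks above come from calling `.group(1)` on a regex search that returned `None`: neither `title` string contains a `textangle` property. The INFO lines suggest djpdf already catches this and carries on. A minimal sketch of handling the missing property without raising follows. The pattern and the function name are assumptions, since the real `textangle_regex` in hocr.py is not shown in this log, and treating a missing property as 0 degrees is also an assumption.

```python
import re

# Hypothetical pattern; the real textangle_regex in djpdf may differ.
TEXTANGLE_RE = re.compile(r"\btextangle\s+(-?\d+(?:\.\d+)?)")


def extract_textangle(title: str, default: float = 0.0) -> float:
    """Return the textangle found in an hOCR title string, or `default`."""
    match = TEXTANGLE_RE.search(title)
    if match is None:
        # No textangle property on this line: fall back instead of raising.
        return default
    return float(match.group(1))


# Title string copied from the first INFO line above.
title = ("bbox 716 941 826 987; baseline 0 0; x_size 61; "
         "x_descenders 15.25; x_ascenders 15.25")
print(extract_textangle(title))
print(extract_textangle("bbox 0 0 10 10; textangle 90; x_size 20"))
```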

@DiagonalArg
Author

Symlinking jbig2 to jbig2enc also produces errors:

$ scans2pdf -v Feigon-001-000.crop_2R.tif scans2.pdf
DEBUG:Using selector: EpollSelector
DEBUG:Running command: ['convert', '-colorspace', 'sRGB', '-profile', '/home/dev/.local/pipx/venvs/djpdf/lib/python3.10/site-packages/djpdf/argyllcms-srgb.icm', '-background', '#ffffff', '-alpha', 'remove', '-alpha', 'off', '-type', 'TrueColor', '/home/user/Lee.finished/Feigon-Images/out/Feigon-001-000.crop_2R.tif', '/var/tmp/djpdf-kb6b1v1n/image.png']
DEBUG:convert-im6.q16: profile 'icc': 'RGB ': RGB color space not permitted on grayscale PNG `/var/tmp/djpdf-kb6b1v1n/image.png' @ warning/png.c/MagickPNGWarningHandler/1668.

DEBUG:Running command: ['convert', '-fill', '#000000', '-opaque', '#000000', '-fill', '#000000', '-opaque', '#000000', '-threshold', '0', '/var/tmp/djpdf-kb6b1v1n/image.png', '/var/tmp/djpdf-3yk3pn39/image.png']
DEBUG:Running command: ['identify', '-units', 'PixelsPerInch', '-format', '%x %y', '/var/tmp/djpdf-kb6b1v1n/image.png']
DEBUG:Running command: ['convert', '-fill', '#ffffff', '-opaque', '#000000', '-resize', '50%', '/var/tmp/djpdf-kb6b1v1n/image.png', '/var/tmp/djpdf-ftx7rfk1/image.png']
DEBUG:Running command: ['identify', '-format', '%w %h', '/var/tmp/djpdf-kb6b1v1n/image.png']
DEBUG:Running command: ['convert', '-format', '%c', '/var/tmp/djpdf-3yk3pn39/image.png', 'histogram:info:-']
DEBUG:Running command: ['convert', '-format', '%c', '/var/tmp/djpdf-ftx7rfk1/image.png', 'histogram:info:-']
DEBUG:Running command: ['convert', '-fill', '#000000', '-opaque', '#000000', '-fill', '#000000', '-opaque', '#000000', '-threshold', '0', '/var/tmp/djpdf-kb6b1v1n/image.png', '/var/tmp/djpdf-amjl3335/image.png']
DEBUG:Running command: ['tesseract', '-l', 'eng', '--dpi', '600', '/var/tmp/djpdf-amjl3335/image.png', '/var/tmp/djpdf-9ap6ewya/ocr', 'hocr']
DEBUG:Tesseract Open Source OCR Engine v4.1.1 with Leptonica

INFO:Can't extract textangle from ocr_line: bbox 716 941 826 987; baseline 0 0; x_size 61; x_descenders 15.25; x_ascenders 15.25
DEBUG:Exception occurred:
Traceback (most recent call last):
  File "/home/dev/.local/pipx/venvs/djpdf/lib/python3.10/site-packages/djpdf/hocr.py", line 46, in extract_text
    textangle = textangle_regex.search(line.attrib["title"]).group(1)
AttributeError: 'NoneType' object has no attribute 'group'

INFO:Can't extract textangle from ocr_line: bbox 643 1005 898 1043; baseline -0.008 -6; x_size 37; x_descenders 7; x_ascenders 8
DEBUG:Exception occurred:
Traceback (most recent call last):
  File "/home/dev/.local/pipx/venvs/djpdf/lib/python3.10/site-packages/djpdf/hocr.py", line 46, in extract_text
    textangle = textangle_regex.search(line.attrib["title"]).group(1)
AttributeError: 'NoneType' object has no attribute 'group'

WARNING:Lossy JBIG2 compression can alter text in a way that is not noticeable as corruption (e.g. the numbers '6' and '8' get replaced)
DEBUG:Running command: ['convert', '-alpha', 'remove', '-alpha', 'off', '-colorspace', 'gray', '-threshold', '50%', '/var/tmp/djpdf-3yk3pn39/image.png', '/var/tmp/djpdf-lykzi68v/input.0.png']
DEBUG:Running command: ['jbig2', '-p', '-s', '-t', '0.9', '/var/tmp/djpdf-lykzi68v/input.0.png']
DEBUG:Error in fopenReadStream: file not found
Unable to open "/var/tmp/djpdf-lykzi68v/input.0.png"

ERROR:Command '['jbig2', '-p', '-s', '-t', '0.9', '/var/tmp/djpdf-lykzi68v/input.0.png']' returned non-zero exit status 1
DEBUG:Exception occurred:
Traceback (most recent call last):
  File "/home/dev/.local/pipx/venvs/djpdf/lib/python3.10/site-packages/djpdf/scans2pdfcli.py", line 392, in main
    asyncio.run(build_pdf(pages, out_file))
  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete
    return future.result()
  File "/home/user/.local/pipx/venvs/djpdf/lib/python3.10/site-packages/djpdf/scans2pdf.py", line 589, in build_pdf
    return await pdf_builder.write(
  File "/home/user/.local/pipx/venvs/djpdf/lib/python3.10/site-packages/djpdf/djpdf.py", line 867, in write
    await asyncio.gather(
  File "/home/user/.local/pipx/venvs/djpdf/lib/python3.10/site-packages/djpdf/djpdf.py", line 734, in make_page
    await asyncio.gather(
  File "/home/user/.local/pipx/venvs/djpdf/lib/python3.10/site-packages/djpdf/djpdf.py", line 576, in pdf_image
    return await self._image.pdf_image(psem)
  File "/home/user/.local/pipx/venvs/djpdf/lib/python3.10/site-packages/djpdf/djpdf.py", line 433, in pdf_image
    return await self._cache.get(self._pdf_image(psem))
  File "/home/user/.local/pipx/venvs/djpdf/lib/python3.10/site-packages/djpdf/util.py", line 121, in get
    self._content = await content_future
  File "/home/user/.local/pipx/venvs/djpdf/lib/python3.10/site-packages/djpdf/djpdf.py", line 514, in _pdf_image
    (jbig2_images, jbig2_globals), image_masks = await asyncio.gather(
  File "/home/user/.local/pipx/venvs/djpdf/lib/python3.10/site-packages/djpdf/djpdf.py", line 496, in get_jbig2_images
    await run_command(cmd, psem, cwd=temp_dir)
  File "/home/user/.local/pipx/venvs/djpdf/lib/python3.10/site-packages/djpdf/util.py", line 169, in run_command
    raise CalledProcessError(proc.returncode, args, None)
subprocess.CalledProcessError: Command '['jbig2', '-p', '-s', '-t', '0.9', '/var/tmp/djpdf-lykzi68v/input.0.png']' returned non-zero exit status 1.

CRITICAL:Operation failed

$ jbig2 -p -s -t 0.9 ~/Constitution.png 
JBIG2 compression complete. pages:1 symbols:127 log2:7
$ echo $?
0

@Unrud
Owner

Unrud commented Jun 2, 2023

Unable to open "/var/tmp/djpdf-lykzi68v/input.0.png"

Temporary files are stored in /var/tmp/. My guess is that the Snap version of jbig2 doesn't have access to this folder on the host, or that there is some similar sandboxing issue.
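
One way to test this guess is to run the same command on a copy of a known-good image placed in different directories and compare the results. A rough sketch, not part of djpdf: `can_read_from` is a hypothetical helper, and the jbig2 flags in the usage comment are taken from the log above.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Sequence


def can_read_from(cmd: Sequence[str], directory: str, sample: Path) -> bool:
    """Copy `sample` into a fresh temp dir under `directory`, run `cmd` on
    the copy, and report whether the command exited with status 0."""
    with tempfile.TemporaryDirectory(prefix="probe-", dir=directory) as tmp:
        target = Path(tmp) / sample.name
        shutil.copy(sample, target)
        # Run inside the temp dir so any output files are cleaned up with it.
        result = subprocess.run(
            list(cmd) + [str(target)], cwd=tmp, capture_output=True
        )
        return result.returncode == 0


# Hypothetical usage, assuming jbig2 is installed and the image exists:
#   cmd = ["jbig2", "-p", "-s", "-t", "0.9"]
#   sample = Path.home() / "Constitution.png"
#   for d in ("/var/tmp", str(Path.home())):
#       print(d, can_read_from(cmd, d, sample))
```

If the command succeeds under the home directory but fails under /var/tmp, that would be consistent with a sandboxing restriction rather than a problem with the image itself.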
