15_EventMaxiumSpeed_Qualifying.PDF can't be loaded because of overly strict startxref parsing #318

Open
jrmuizel opened this issue Sep 3, 2024 · 5 comments

Comments

@jrmuizel
Contributor

jrmuizel commented Sep 3, 2024

Loading http://spasummerclassic.alkamelsystems.com/Results/03_2021/01_Spa%20Summer%20Classic/320_Spa%203%20Hours/202106251945_Qualifying/15_EventMaxiumSpeed_Qualifying.PDF
gives Error: Custom { kind: Other, error: "Invalid cross-reference table (invalid start value)" }

@jrmuizel
Contributor Author

jrmuizel commented Sep 3, 2024

@Heinenen
Collaborator

Heinenen commented Sep 4, 2024

The code that creates this error is

None => Err(Error::Xref(XrefError::Start)),
or rather

lopdf/src/nom_parser.rs

Lines 452 to 458 in ba237d1

pub fn xref_start(input: &[u8]) -> Option<i64> {
    strip_nom(delimited(
        pair(tag(b"startxref"), eol),
        integer,
        tuple((eol, tag(b"%%EOF"), space)),
    )(input))
}

It only accepts EOL markers (\n, \r\n or \r), whereas the file has \n.

The relevant part of the spec is:

[..] The last line of the file shall contain
only the end-of-file marker, %%EOF. The two preceding lines shall contain, one per line and in order,
the keyword startxref and the byte offset in the decoded stream from the beginning of the PDF file to
the beginning of the xref keyword in the last cross-reference section. The startxref line shall be
preceded by the trailer dictionary [..]

The spec IMO isn't perfectly accurate here, but as I understand it, lopdf follows the spec in this regard (feel free to disagree).
Note that most PDF processors try to salvage as much as they can from PDFs, even if the files are not strictly spec compliant. Implementing something like what is proposed in #41 would help solve this issue.

Even if this is solved by #41, this issue should stay open as a reminder of what exactly needs to be addressed in #41.
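To make the "salvage as much as possible" idea concrete, here is a minimal sketch of a more tolerant startxref parser. It is not lopdf's code and does not use nom: `lenient_xref_start`, `is_pdf_whitespace` and `skip_whitespace` are hypothetical names, written against the standard library only. It assumes that accepting any run of PDF whitespace (rather than exactly one EOL marker) between `startxref`, the offset and `%%EOF` is acceptable behaviour for a lenient mode.

```rust
/// PDF whitespace characters: NUL, TAB, LF, FF, CR and SPACE.
fn is_pdf_whitespace(b: u8) -> bool {
    matches!(b, 0x00 | b'\t' | b'\n' | 0x0C | b'\r' | b' ')
}

/// Returns the input with any leading PDF whitespace removed.
fn skip_whitespace(input: &[u8]) -> &[u8] {
    let n = input.iter().take_while(|b| is_pdf_whitespace(**b)).count();
    &input[n..]
}

/// Hypothetical lenient counterpart to a strict `xref_start`:
/// `input` must begin at the `startxref` keyword.
pub fn lenient_xref_start(input: &[u8]) -> Option<i64> {
    const KEYWORD: &[u8] = b"startxref";
    const EOF_MARKER: &[u8] = b"%%EOF";

    if !input.starts_with(KEYWORD) {
        return None;
    }
    let after_keyword = &input[KEYWORD.len()..];

    // Require at least one whitespace byte after the keyword, but accept
    // any mix of spaces and line endings (assumption: lenient mode).
    let rest = skip_whitespace(after_keyword);
    if rest.len() == after_keyword.len() {
        return None;
    }

    // Parse the decimal byte offset.
    let digits = rest.iter().take_while(|b| b.is_ascii_digit()).count();
    if digits == 0 {
        return None;
    }
    let offset: i64 = std::str::from_utf8(&rest[..digits]).ok()?.parse().ok()?;

    // Whitespace before %%EOF is optional here; trailing bytes are ignored.
    let rest = skip_whitespace(&rest[digits..]);
    if rest.starts_with(EOF_MARKER) {
        Some(offset)
    } else {
        None
    }
}
```

With this sketch, both `b"startxref\n1234\n%%EOF\n"` and a variant with stray spaces before the line endings yield the same offset, while input with a missing `%%EOF` or a non-numeric offset is still rejected.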

@jrmuizel
Contributor Author

jrmuizel commented Sep 6, 2024

I filed an issue with the spec.

@vcsmarts

vcsmarts commented Jan 5, 2025

+1
It also seems that the code looks for the marker within the last 512 bytes. For one of the PDF files I have here, the search would have to cover the last 35416809 bytes.

https://github.com/J-F-Liu/lopdf/blob/main/src/reader.rs#L429-L431
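For illustration, a rough sketch of the difference between a fixed tail window and a search over the whole buffer. This is not the code at the link above: `find_last_startxref` and `find_startxref_in_tail` are hypothetical helpers using only the standard library, and the 512-byte figure is taken from the observation in this comment rather than verified against the reader.

```rust
/// Returns the index of the last `startxref` keyword in `buffer`,
/// scanning backwards from the end (hypothetical helper).
pub fn find_last_startxref(buffer: &[u8]) -> Option<usize> {
    const KEYWORD: &[u8] = b"startxref";
    // `windows` yields nothing if the buffer is shorter than the keyword,
    // so short inputs simply return None.
    buffer
        .windows(KEYWORD.len())
        .rposition(|window| window == KEYWORD)
}

/// Same search, restricted to the last `tail_len` bytes of `buffer`.
/// The returned index is relative to the start of `buffer`.
pub fn find_startxref_in_tail(buffer: &[u8], tail_len: usize) -> Option<usize> {
    let start = buffer.len().saturating_sub(tail_len);
    find_last_startxref(&buffer[start..]).map(|pos| start + pos)
}
```

A file with a lot of trailing data after `%%EOF` is missed by `find_startxref_in_tail(buffer, 512)` but found by `find_last_startxref(buffer)`. The trade-off is that the unbounded search touches the whole buffer in the worst case, so a lenient reader might try the small window first and fall back to the full scan only on failure.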
