While validating WARCs at the National Archives of the Netherlands we encountered the hopsFromSeed field. We could not find an explanation of its values, other than on Twitter or in the source code of WARC tools. Please add the possible values (or those known to you) to the documentation. E.g. (from a Twitter thread from 2015):
L link, E embed, X speculative embed (probably from JavaScript), P prerequisite (robots.txt, DNS)
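To make the codes above concrete, here is a minimal Python sketch that decodes a hopsFromSeed string into readable hop types. It assumes the value is a sequence of single-character codes, one per hop from the seed, and it only knows the four codes quoted above; other codes may exist and are reported as unknown. The names `HOP_TYPES` and `describe_hops` are made up for this illustration and do not come from any WARC tool.

```python
# Hop codes as quoted in the thread above; this table is likely incomplete.
HOP_TYPES = {
    "L": "link",
    "E": "embed",
    "X": "speculative embed",
    "P": "prerequisite",
}


def describe_hops(hops_from_seed: str) -> list[str]:
    """Map each character of a hopsFromSeed value to a readable hop type.

    Assumption: one character per hop. Codes missing from HOP_TYPES
    are reported as unknown rather than raising an error.
    """
    return [HOP_TYPES.get(ch, f"unknown ({ch})") for ch in hops_from_seed]


if __name__ == "__main__":
    # Example: two links followed by an embed.
    print(describe_hops("LLE"))
```

An empty string yields an empty list, which under this reading would correspond to the seed itself.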
The documentation of Heritrix's discovery path is in Heritrix's Glossary but indeed it wasn't very discoverable. I have slightly expanded the explanation and added a mention of hopsFromSeed so hopefully it will eventually start turning up in search results now. I agree that since the WARC standard mentions hopsFromSeed it should include an explanation of the values.
Can this be closed now that hopsFromSeed is documented in the Community Annotations? Or is the motivation here that it should be in a new version of the specification?
See also the Heritrix source code: https://github.com/internetarchive/heritrix3/blob/d0ebd405782b0c33131ad72e3a76406a475bbf3f/modules/src/main/java/org/archive/modules/extractor/Hop.java