Module sndjvu_format::parsing


Low-level parser for the DjVu transfer format.

“Low-level” means that only the grittiest details of the file format are abstracted away, and that flexibility is prioritized over convenience in the design of the interface. Callers only need to load a relatively small amount of data into memory to begin parsing, and can choose exactly which parts of the document they want to parse.

Entry points are the document and indirect_component free functions; other key items are Progress and the pointer-like types ComponentP and ElementP.

Incremental parsing

A DjVu document in the transfer format is a sequence of bytes. A major design goal of this module is that you can parse parts of a document without having all the bytes in memory at any one time. The parser keeps track of a correspondence between positions in an abstract byte stream and features of the document structure, and the caller provides a chunk of bytes from a specific position when it wants to parse the corresponding feature. The functions that work this way include the entry points document and indirect_component, and the methods ComponentP::feed and ElementP::feed.

The return type of each of these looks like Result<Progress<T, N>, Error>, which is how we represent this set of possible outcomes:

some part of the provided data was invalid (Result::Err)
not enough bytes were provided to parse the requested document feature (Result::Ok(Progress::None))
parsing succeeded (Result::Ok(Progress::Advanced))
there is no feature to parse at this position (Result::Ok(Progress::End))—this is a possible outcome only for ComponentP::feed and ElementP::feed
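
The four outcomes listed above can be illustrated with a small, self-contained sketch. Everything in it is a stand-in written for this illustration: the local Progress and Error types only mirror the general shape described above, and their payloads (the byte-count hint in None, the consumed field in Advanced), the toy record format, and the feed and parse_all functions are assumptions, not this crate's definitions. The point is the calling pattern: offer bytes from a known position, and on "not enough bytes" offer a larger window from the same position.

```rust
// Stand-in types for illustration only; NOT the crate's actual definitions.
#[derive(Debug, PartialEq)]
enum Progress<T, N> {
    /// Not enough bytes were provided; `N` is a hint (here: total bytes needed).
    None(N),
    /// Parsing succeeded; `consumed` says how far to advance in the stream.
    Advanced { value: T, consumed: usize },
    /// There is nothing left to parse at this position.
    End,
}

#[derive(Debug, PartialEq)]
struct Error(&'static str);

/// Hypothetical toy format: one length byte, then that many bytes of UTF-8.
/// A length byte of zero marks the end of the records.
fn feed(data: &[u8]) -> Result<Progress<String, usize>, Error> {
    let Some((&len, rest)) = data.split_first() else {
        return Ok(Progress::None(1)); // need at least the length byte
    };
    if len == 0 {
        return Ok(Progress::End);
    }
    let len = len as usize;
    if rest.len() < len {
        return Ok(Progress::None(1 + len)); // length byte plus payload
    }
    let text = std::str::from_utf8(&rest[..len])
        .map_err(|_| Error("payload is not UTF-8"))?;
    Ok(Progress::Advanced { value: text.to_owned(), consumed: 1 + len })
}

/// Drives `feed` while offering only a small window of `stream` at a time.
fn parse_all(stream: &[u8], chunk: usize) -> Result<Vec<String>, Error> {
    let mut out = Vec::new();
    let mut pos = 0; // position in the abstract byte stream
    let mut want = chunk; // size of the window offered at `pos`
    loop {
        let end = (pos + want).min(stream.len());
        match feed(&stream[pos..end])? {
            Progress::Advanced { value, consumed } => {
                out.push(value);
                pos += consumed;
                want = chunk;
            }
            Progress::None(needed) => {
                if end == stream.len() {
                    return Err(Error("unexpected end of stream"));
                }
                // Retry from the same position with a strictly larger window.
                want = needed.max(want + 1);
            }
            Progress::End => return Ok(out),
        }
    }
}
```

In this sketch the caller never needs more of the stream in memory than the window for the record currently being parsed; a real caller would fill that window from a file or network source instead of slicing an in-memory buffer.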

Structs

Bookmark
A single bookmark record from the (decompressed) NAVM chunk.
Bundled
Data from the DIRM chunk that’s only present in bundled documents.
BundledIter
Iterator over “pointers” to each component of a bundled multi-page document.
Chunk
A chunk from a DJVU or DJVI component whose type wasn’t recognized.
ComponentMeta
Basic metadata about one component of a multi-page document.
ComponentP
Pointer-like immutable cursor to the start or end of a component.
DecodedAntz
Contents of an ANTz chunk after BZZ decompression.
DecodedNavm
Contents of the NAVM chunk after BZZ decompression.
Dirm
Parsed representation of the DIRM chunk.
DirmExtra
Compressed portion of the DIRM chunk.
ElementP
Pointer-like immutable cursor to the start or end of an element.
Error
An error encountered while parsing.
Fgbz
Parsed representation of an FGbz chunk.
FgbzIndices
Compressed portion of an FGbz chunk.
Info
Parsed representation of an INFO chunk.
Iw44
Parsed representation of an FG44, BG44, or TH44 chunk.
Mmr
Raw MMR-compressed data from an Smmr chunk.
MmrStripes
“Striped” MMR-compressed data from an Smmr chunk.
ParsingAnnots
Fallible iterator for parsing the annotations from a string.
ParsingBookmarks
Fallible iterator for parsing the bookmarks from a decompressed NAVM chunk.
ParsingStripes
Fallible iterator for parsing the stripe data from an Smmr chunk.
RawAnta
Unparsed representation of an ANTa chunk.
RawAntz
Unparsed representation of an ANTz chunk.
RawBg44
Unparsed representation of a BG44 chunk.
RawBgjp
Represents a BGjp chunk, which doesn’t need parsing.
RawDirm
Unparsed representation of the DIRM chunk.
RawDjbz
Represents a Djbz chunk, which doesn’t need further parsing.
RawFg44
Unparsed representation of an FG44 chunk.
RawFgbz
Unparsed representation of an FGbz chunk.
RawFgjp
Represents an FGjp chunk, which doesn’t need parsing.
RawIncl
Represents an INCL chunk, which doesn’t need parsing.
RawInfo
Unparsed representation of an INFO chunk.
RawNavm
Unparsed representation of the NAVM chunk.
RawSjbz
Represents an Sjbz chunk, which doesn’t need further parsing.
RawSmmr
Unparsed representation of an Smmr chunk.
RawTh44
Unparsed representation of a TH44 chunk.
RawTxta
Unparsed representation of a TXTa chunk.
RawTxtz
Unparsed representation of a TXTz chunk.
Smmr
Parsed representation of an Smmr chunk.
ThumbnailP
Pointer-like immutable cursor to the start or end of a thumbnail.
Txt
Parsed representation of a TXTa or (decompressed) TXTz chunk.

Enums

ComponentHead
Parsed representation of the start of a component.
DocumentHead
Parsed representation of the start of a document.
Element
An unparsed chunk that represents a page element.
Iw44Kind
Subtypes of the FG44/BG44/TH44 chunk types.
Progress
The outcome of a parsing operation, if no Error was encountered.
SmmrBody
Possible formats for the MMR data in an Smmr chunk.

Functions

annots
Start parsing annotations from the string content of an ANTa or ANTz chunk.
annots_lizardtech
document
Start parsing a DjVu document from some bytes.
indirect_component
Start parsing a component of an indirect multi-page DjVu document from some bytes.