Module sndjvu_format::parsing
Low-level parser for the DjVu transfer format.
“Low-level” means that only the grittiest details of the file format are abstracted away, and that flexibility is prioritized over convenience in the design of the interface. Callers only need to load a relatively small amount of data into memory to begin parsing, and can choose exactly which parts of the document they want to parse.
Entry points are the document and indirect_component free functions; other key items are Progress and the pointer-like types ComponentP and ElementP.
Incremental parsing
A DjVu document in the transfer format is a sequence of bytes. A major design goal of this module is that you can parse parts of a document without having all the bytes in memory at any one time. The parser keeps track of a correspondence between positions in an abstract byte stream and features of the document structure, and the caller provides a chunk of bytes from a specific position when it wants to parse the corresponding feature. The functions that work this way are:
The return type of each of these looks like Result<Progress<T, N>, Error>, which is how we represent this set of possible outcomes:
- some part of the provided data was invalid (Result::Err)
- not enough bytes were provided to parse the requested document feature (Result::Ok(Progress::None))
- parsing succeeded (Result::Ok(Progress::Advanced))
- there is no feature to parse at this position (Result::Ok(Progress::End)); this is a possible outcome only for ComponentP::feed and ElementP::feed
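The caller-side loop implied by these outcomes can be sketched as follows. This is a self-contained illustration, not this crate's API: the Progress and Error types below are stand-ins whose variant payloads are assumptions, and feed and parse_all are hypothetical functions that parse a toy format (a one-byte length followed by that many payload bytes, with a length of 0 marking the end and 0xFF treated as invalid). The real types in this module may carry different data.

```rust
/// Stand-in for the module's `Progress` type. The variant payloads are
/// assumptions made for this sketch, not the crate's actual definition.
#[derive(Debug, PartialEq)]
enum Progress<T, N> {
    /// Not enough bytes were provided to parse the requested feature.
    None,
    /// Parsing succeeded: the parsed value and the number of bytes consumed.
    Advanced(T, usize),
    /// There is no feature to parse at this position.
    End(N),
}

/// Stand-in error type (hypothetical).
#[derive(Debug, PartialEq)]
struct Error;

/// Hypothetical parser for one record of the toy format described above.
fn feed(data: &[u8]) -> Result<Progress<Vec<u8>, ()>, Error> {
    let Some(&len) = data.first() else {
        return Ok(Progress::None);
    };
    match len {
        0 => Ok(Progress::End(())),
        0xFF => Err(Error),
        n => {
            let n = n as usize;
            match data.get(1..1 + n) {
                Some(payload) => Ok(Progress::Advanced(payload.to_vec(), 1 + n)),
                None => Ok(Progress::None),
            }
        }
    }
}

/// Caller-side loop: offers the parser a window of bytes starting at the
/// current stream position and widens the window whenever it asks for more.
fn parse_all(stream: &[u8], chunk: usize) -> Result<Vec<Vec<u8>>, Error> {
    let chunk = chunk.max(1); // a zero-sized window could never make progress
    let mut records = Vec::new();
    let mut pos = 0; // position in the abstract byte stream
    let mut window = chunk; // number of bytes currently offered
    loop {
        let end = (pos + window).min(stream.len());
        match feed(&stream[pos..end])? {
            Progress::Advanced(record, used) => {
                records.push(record);
                pos += used;
                window = chunk;
            }
            // The stream is exhausted but the parser still wants more bytes.
            Progress::None if end == stream.len() => return Err(Error),
            Progress::None => window += chunk,
            Progress::End(()) => return Ok(records),
        }
    }
}
```

Here parse_all holds the whole stream in a slice only for simplicity; the same loop shape applies when each window is read on demand from a file or network source, which is the situation incremental parsing is designed for.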
Structs
- A single bookmark record from the (decompressed) NAVM chunk.
- Data from the DIRM chunk that’s only present in bundled documents.
- Iterator over “pointers” to each component of a bundled multi-page document.
- A chunk from a DJVU or DJVI component whose type wasn’t recognized.
- Basic metadata about one component of a multi-page document.
- Pointer-like immutable cursor to the start or end of a component.
- Contents of an ANTz chunk after BZZ decompression.
- Contents of the NAVM chunk after BZZ decompression.
- Parsed representation of the DIRM chunk.
- Compressed portion of the DIRM chunk.
- Pointer-like immutable cursor to the start or end of an element.
- An error encountered while parsing.
- Parsed representation of an FGbz chunk.
- Compressed portion of an FGbz chunk.
- Parsed representation of an INFO chunk.
- Parsed representation of an FG44, BG44, or TH44 chunk.
- Raw MMR-compressed data from an Smmr chunk.
- “Striped” MMR-compressed data from an Smmr chunk.
- Fallible iterator for parsing the annotations from a string.
- Fallible iterator for parsing the bookmarks from a decompressed NAVM chunk.
- Fallible iterator for parsing the stripe data from an Smmr chunk.
- Unparsed representation of an ANTa chunk.
- Unparsed representation of an ANTz chunk.
- Unparsed representation of a BG44 chunk.
- Represents a BGjp chunk, which doesn’t need parsing.
- Unparsed representation of the DIRM chunk.
- Represents a Djbz chunk, which doesn’t need further parsing.
- Unparsed representation of an FG44 chunk.
- Unparsed representation of an FGbz chunk.
- Represents an FGjp chunk, which doesn’t need parsing.
- Represents an INCL chunk, which doesn’t need parsing.
- Unparsed representation of an INFO chunk.
- Unparsed representation of the NAVM chunk.
- Represents an Sjbz chunk, which doesn’t need further parsing.
- Unparsed representation of an Smmr chunk.
- Unparsed representation of an Smmr chunk.
- Unparsed representation of a TXTa chunk.
- Unparsed representation of a TXTz chunk.
- Parsed representation of an Smmr chunk.
- Pointer-like immutable cursor to the start or end of a thumbnail.
- Parsed representation of a TXTa or (decompressed) TXTz chunk.
Enums
- Parsed representation of the start of a component.
- Parsed representation of the start of a document.
- An unparsed chunk that represents a page element.
- Subtypes of the FG44/BG44/TH44 chunk types.
- The outcome of a parsing operation, if no Error was encountered.
- Possible formats for the MMR data in an Smmr chunk.
Functions
- Start parsing annotations from the string content of an ANTa or ANTz chunk.
- Start parsing a DjVu document from some bytes.
- Start parsing a component of an indirect multi-page DjVu document from some bytes.