Module sndjvu_format::parsing

Low-level parser for the DjVu transfer format.

“Low-level” means that only the grittiest details of the file format are abstracted away, and that flexibility is prioritized over convenience in the design of the interface. Callers only need to load a relatively small amount of data into memory to begin parsing, and can choose exactly which parts of the document they want to parse.

Entry points are the document and indirect_component free functions; other key items are Progress and the pointer-like types ComponentP and ElementP.

Incremental parsing

A DjVu document in the transfer format is a sequence of bytes. A major design goal of this module is that you can parse parts of a document without having all the bytes in memory at any one time. The parser keeps track of a correspondence between positions in an abstract byte stream and features of the document structure, and the caller provides a chunk of bytes from a specific position when it wants to parse the corresponding feature. The functions that work this way include the entry points document and indirect_component and the methods ComponentP::feed and ElementP::feed.

The return type of each of these looks like Result<Progress<T, N>, Error>, which is how we represent this set of possible outcomes:

  • some part of the provided data was invalid (Result::Err)
  • not enough bytes were provided to parse the requested document feature (Result::Ok(Progress::None))
  • parsing succeeded (Result::Ok(Progress::Advanced))
  • there is no feature to parse at this position (Result::Ok(Progress::End))—this is a possible outcome only for ComponentP::feed and ElementP::feed
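
The four outcomes above can be illustrated with a small, self-contained sketch. It does not use this module's API: the Progress enum below is a hypothetical stand-in whose payloads (hint, value, next_pos) are assumptions, and feed and read_all are illustrative helpers that parse a toy record format (a tag byte, a length byte, then the payload), not DjVu data. The point is only the shape of the caller's loop: supply a chunk from a known position, and on Progress::None load more bytes and try again.

```rust
/// Hypothetical stand-in for the module's `Progress<T, N>`.
/// The real variant payloads may differ; these fields are assumptions.
#[derive(Debug, PartialEq)]
enum Progress<T> {
    /// Not enough bytes; `hint` is the chunk length needed to make progress.
    None { hint: usize },
    /// A feature was parsed; `next_pos` is the stream position just after it.
    Advanced { value: T, next_pos: usize },
    /// There is nothing to parse at this position.
    End,
}

/// Hypothetical stand-in for the module's `Error`.
#[derive(Debug, PartialEq)]
struct Error(&'static str);

/// Toy parser: `chunk` holds bytes starting at stream position `pos`.
/// A record is tag 0x01, a length byte, then that many payload bytes.
/// Tag 0x00 marks the end of the stream.
fn feed(chunk: &[u8], pos: usize) -> Result<Progress<Vec<u8>>, Error> {
    match chunk.first() {
        None => Ok(Progress::None { hint: 1 }),
        Some(0x00) => Ok(Progress::End),
        Some(0x01) => {
            let Some(&len) = chunk.get(1) else {
                return Ok(Progress::None { hint: 2 });
            };
            let total = 2 + len as usize;
            if chunk.len() < total {
                return Ok(Progress::None { hint: total });
            }
            Ok(Progress::Advanced {
                value: chunk[2..total].to_vec(),
                next_pos: pos + total,
            })
        }
        Some(_) => Err(Error("unknown record tag")),
    }
}

/// Caller-side loop: hand the parser `step` bytes at a time and widen the
/// window whenever it reports that it needs more.
fn read_all(source: &[u8], step: usize) -> Result<Vec<Vec<u8>>, Error> {
    let mut records = Vec::new();
    let mut pos = 0;
    let mut want = step;
    loop {
        let end = (pos + want).min(source.len());
        match feed(&source[pos..end], pos)? {
            Progress::Advanced { value, next_pos } => {
                records.push(value);
                pos = next_pos;
                want = step;
            }
            Progress::None { hint } => {
                if end == source.len() {
                    // No more bytes exist, so the request can never be met.
                    return Err(Error("unexpected end of input"));
                }
                want = hint.max(want + 1);
            }
            Progress::End => return Ok(records),
        }
    }
}
```

In this sketch `source` is an in-memory slice for simplicity; in a real caller the widened window would correspond to reading more bytes from a file or network stream at the same position. Invalid data surfaces through the `?` operator as Result::Err, separately from the "need more bytes" case.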

Structs

  • A single bookmark record from the (decompressed) NAVM chunk.
  • Data from the DIRM chunk that’s only present in bundled documents.
  • Iterator over “pointers” to each component of a bundled multi-page document.
  • A chunk from a DJVU or DJVI component whose type wasn’t recognized.
  • Basic metadata about one component of a multi-page document.
  • Pointer-like immutable cursor to the start or end of a component.
  • Contents of an ANTz chunk after BZZ decompression.
  • Contents of the NAVM chunk after BZZ decompression.
  • Parsed representation of the DIRM chunk.
  • Compressed portion of the DIRM chunk.
  • Pointer-like immutable cursor to the start or end of an element.
  • An error encountered while parsing.
  • Parsed representation of an FGbz chunk.
  • Compressed portion of an FGbz chunk.
  • Parsed representation of an INFO chunk.
  • Parsed representation of an FG44, BG44, or TH44 chunk.
  • Raw MMR-compressed data from an Smmr chunk.
  • “Striped” MMR-compressed data from an Smmr chunk.
  • Fallible iterator for parsing the annotations from a string.
  • Fallible iterator for parsing the bookmarks from a decompressed NAVM chunk.
  • Fallible iterator for parsing the stripe data from an Smmr chunk.
  • Unparsed representation of an ANTa chunk.
  • Unparsed representation of an ANTz chunk.
  • Unparsed representation of a BG44 chunk.
  • Represents a BGjp chunk, which doesn’t need parsing.
  • Unparsed representation of the DIRM chunk.
  • Represents a Djbz chunk, which doesn’t need further parsing.
  • Unparsed representation of an FG44 chunk.
  • Unparsed representation of an FGbz chunk.
  • Represents an FGjp chunk, which doesn’t need parsing.
  • Represents an INCL chunk, which doesn’t need parsing.
  • Unparsed representation of an INFO chunk.
  • Unparsed representation of the NAVM chunk.
  • Represents an Sjbz chunk, which doesn’t need further parsing.
  • Unparsed representation of an Smmr chunk.
  • Unparsed representation of an Smmr chunk.
  • Unparsed representation of a TXTa chunk.
  • Unparsed representation of a TXTz chunk.
  • Parsed representation of an Smmr chunk.
  • Pointer-like immutable cursor to the start or end of a thumbnail.
  • Parsed representation of a TXTa or (decompressed) TXTz chunk.

Enums

  • Parsed representation of the start of a component.
  • Parsed representation of the start of a document.
  • An unparsed chunk that represents a page element.
  • Subtypes of the FG44/BG44/TH44 chunk types.
  • The outcome of a parsing operation, if no Error was encountered.
  • Possible formats for the MMR data in an Smmr chunk.

Functions

  • Start parsing annotations from the string content of an ANTa or ANTz chunk.
  • Start parsing a DjVu document from some bytes.
  • Start parsing a component of an indirect multi-page DjVu document from some bytes.