Crate sndjvu_format

sndjvu_format is a library for working with the transfer format for DjVu documents.

The “transfer format” is the canonical DjVu file format defined by the DjVu v3 standard. You can use this library to parse a DjVu file or create one programmatically. The lowest-level details of the format are abstracted away, but you still need to understand the structure of a DjVu document at the “chunk” level (see below) to use this library effectively.

Overview of the DjVu v3 document model

(This overview is not intended to substitute for reading the relevant parts of the DjVu v3 standard.)

A DjVu document is either single-page or multi-page. A single-page document consists of a single component; a multi-page document consists of zero or more components, plus some metadata.

DjVu components come in three types: DJVU, DJVI, and THUM. A DJVU component represents a page, a DJVI component holds data that’s shared between several pages, and a THUM component holds thumbnail images for several pages. The single component of a single-page document must be of type DJVU.
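The document and component structure described above can be sketched as a pair of Rust enums. This is only an illustration of the model: the names `ComponentKind`, `Document`, `is_well_formed`, and `page_count` are hypothetical and are not claimed to match the types this crate actually exports.

```rust
/// Hypothetical model of the three component types (not this crate's API).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ComponentKind {
    /// A page.
    Djvu,
    /// Data shared between several pages.
    Djvi,
    /// Thumbnail images for several pages.
    Thum,
}

/// Hypothetical model of a document, reduced to the kinds of its components.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Document {
    /// Exactly one component.
    SinglePage(ComponentKind),
    /// Zero or more components (the multi-page metadata is omitted here).
    MultiPage(Vec<ComponentKind>),
}

impl Document {
    /// Checks the one structural rule stated above: the single component
    /// of a single-page document must be of type DJVU.
    fn is_well_formed(&self) -> bool {
        match self {
            Document::SinglePage(kind) => *kind == ComponentKind::Djvu,
            Document::MultiPage(_) => true,
        }
    }

    /// Counts pages, i.e. components of type DJVU.
    fn page_count(&self) -> usize {
        match self {
            Document::SinglePage(kind) => usize::from(*kind == ComponentKind::Djvu),
            Document::MultiPage(kinds) => kinds
                .iter()
                .filter(|kind| **kind == ComponentKind::Djvu)
                .count(),
        }
    }
}
```

In this sketch a multi-page document with no components is still well formed, and only DJVU components count as pages.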

Every piece of data in a DjVu document is contained in a chunk, and each chunk has a type. Most chunks are contained in a component; the exceptions are the DIRM and NAVM chunks, which contain the metadata for a multi-page document. A chunk of type INFO can only appear at the start of a DJVU component (and is mandatory in that position); it describes some basic properties of the corresponding page, like its width and height in pixels. Other than the INFO chunk, the same types of chunk can appear in DJVU and DJVI components. A chunk of one of these types is called an element, and describes one aspect of the page or pages with which it is associated (image data, OCRed text, annotations, etc.).
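The INFO placement rule can be written as a small check over the chunk types of one component. This is a minimal sketch, not the crate's API: `ComponentKind`, `InfoError`, and `check_info_placement` are hypothetical names. It assumes chunk types are four-byte ASCII identifiers, as the names used in this document suggest, and it assumes a THUM component never holds an INFO chunk.

```rust
/// Hypothetical component type (not this crate's API).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ComponentKind {
    Djvu,
    Djvi,
    Thum,
}

/// Hypothetical error type for violations of the INFO placement rule.
#[derive(Debug, PartialEq, Eq)]
enum InfoError {
    /// A DJVU component does not start with an INFO chunk.
    MissingInfo,
    /// An INFO chunk appears somewhere it is not allowed.
    MisplacedInfo { index: usize },
}

/// Checks that INFO appears first in a DJVU component and nowhere else.
/// `chunk_ids` lists the chunk types of one component, in order.
fn check_info_placement(
    kind: ComponentKind,
    chunk_ids: &[[u8; 4]],
) -> Result<(), InfoError> {
    const INFO: [u8; 4] = *b"INFO";

    // In a DJVU component the first chunk must be INFO; the search for
    // stray INFO chunks then starts at the second chunk.
    let first_unchecked = match kind {
        ComponentKind::Djvu => {
            if chunk_ids.first() != Some(&INFO) {
                return Err(InfoError::MissingInfo);
            }
            1
        }
        // Assumption: no INFO chunk at all in DJVI or THUM components.
        ComponentKind::Djvi | ComponentKind::Thum => 0,
    };

    match chunk_ids
        .iter()
        .skip(first_unchecked)
        .position(|id| *id == INFO)
    {
        Some(offset) => Err(InfoError::MisplacedInfo {
            index: first_unchecked + offset,
        }),
        None => Ok(()),
    }
}
```

For example, a DJVU component whose chunks are INFO followed by BG44 passes the check, while one that starts with BG44 fails with `MissingInfo`.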

Modules

  • Representation of document annotations from the ANTa and ANTz chunks.
  • Low-level parser for the DjVu transfer format.
  • Serialization to the DjVu transfer format.

Structs

  • Version information associated with the DIRM chunk.
  • Version information associated with the FGbz chunk.
  • Version information associated with the INFO chunk.
  • Version information associated with the BG44, FG44, and TH44 chunks.
  • A BGR8 color from the palette of an FGbz chunk.
  • Index into the color palette of an FGbz chunk.
  • Version information associated with the TXTa and TXTz chunks.
  • Zone record from a TXTa or TXTz chunk.

Enums

  • The type of a component in a multi-page DjVu document.
  • Color space of the image data in a BG44, FG44, or TH44 chunk.
  • Uninhabited type, used to customize the Progress type.
  • The orientation of an encoded page.
  • The type of a zone record in a TXTa or TXTz chunk.