DjVu Reference

7 Component pieces (IFF chunks) of DjVu documents and images [top]

This section describes the DjVu file format at a middle level. This includes types of chunks which can go into various types of documents but not a detailed layout of the contents of those chunks.

DjVu documents are IFF85 files (see Reference 2 for details). The IFF85 structure provides a hierarchy of containers which hold various types of information in a DjVu file. The containers are called “chunks”. How the chunk is used (what it holds) can be determined by its “chunk type” or “chunk ID”. For example, the list of files contained in a multi-page document is held in the DIRM (“directory”) chunk; annotations are held in a ANTz chunk.

FORM chunks are composite (contain other chunks). Their specific use is exposed by a secondary chunk ID. For example, a single page consists of several different chunks all contained within a single FORM:DJVU chunk. A multi-page document consists of several pages (and other chunks) all contained in a FORM:DJVM chunk.

This section discusses the various kinds of DjVu documents and the corresponding chunks of which they consist.

7.1 Single-page documents

A single-page document is composed of a single FORM:DVJU composite chunk. This composite chunk always begins with one INFO chunk describing the image size, resolution and related information (see §8.3.11). The document contains exactly one DjVu image, whose content varies as described below.

7.1.1 Photo DjVu image

Photo DjVu image files are best used for encoding photographic images in colors or in shades of gray. The data compression model relies on the IW44 wavelet representation. This format is designed such that the IW44 decoder is able to quickly perform progressive rendering of any image segment using only a small amount of memory. One or more additional BG44 chunks contain the image data encoded with the IW44 representation. The image size specified in the INFO chunk and the image size specified in the IW44 data must be equal.

7.1.2 Bi-level DjVu image

Bi-level DjVu image files are used to compress black and white images representing text and simple drawings. The JB2 data compression model uses the soft pattern-matching technique, which essentially consists of encoding each character by describing how it differs from a well-chosen already-encoded character. A Sjbz chunk contains the bi-level data encoded with the JB2 representation (see Appendix 2). The image size specified in the INFO chunk and the image size specified in the JB2 data must be equal.

7.1.3 Compound DjVu image

Compound DjVu image files are an extremely efficient way to compress high-resolution compound document images containing both images and text, such as a page of a magazine. Compound DjVu files represent the document image using two layers. The background layer is used for encoding the pictures and the paper texture.

The foreground layer is used for encoding the text and the drawings. Additional chunks hold the components of either the foreground or the background layers.

The main component of the foreground layer is a bi-level image named the foreground mask. The pixel size of the foreground mask is equal to the size of the DjVu image. It contains a black-on-white representation of the text and the drawings. This image is encoded by a Sjbz chunk using the JB2 representation. There may also be a companion chunk Djbz containing a shape dictionary that defines bi-level shapes referenced by the Sjbz chunk.

7.1.3.1 Foreground encoding

The foreground colors can be encoded according to two models:

Using a small color image, the foreground color image, encoded as a single FG44 chunk using the IW44 representation (see IW44Image.h).
By specifying one solid color per object described by the JB2-encoded mask. These JB2 colors are color-quantized and stored in a single FGbz chunk (see §6.3.10). Such compound DjVu images are rendered by painting each foreground object on top of the background color image using the solid color specified by the FGbz chunk.

7.1.3.2 Background encoding

The background layer is a color image, the background color image, encoded by an arbitrary number of BG44 chunks containing successive IW44 refinements (see Appendix 1). The size of this image is computed by rounding up the quotient of the mask size by an integer sub-sampling factor ranging from 1 to 12. Most compound DjVu images use a background sub-sampling factor equal to 3. Smaller sub-sampling factors are adequate for images with a very rich paper texture. Larger sub-sampling factors are adequate for images containing no pictures.

There are no ordering or interleaving constraints on these chunks except that (a) the INFO chunk must appear first, and (b) the successive BG44 refinements must appear with their natural order. The chunk order simply affects the progressive rendering of DjVu images in a web browser.

7.1.3.3 Alternative encodings

Besides the JB2 and IW44 encoding schemes, the DjVu format supports alternative encdoding methods for its components.

The foreground mask may be represented by a single Smmr chunk instead of Sjbz. The Smmr chunk contains a bi-level image encoded with the Fax-G4/MMR method. Although the resulting files are typically six times larger, this capability can be useful when DjVu is used as a front-end for fax machines and scanners with embedded Fax-G4/MMR capabilities.

The background color image may be represented by a single BGjp chunk instead of several BG44 chunks. The BGjp chunk contains a JPEG-encoded color image (see JPEGDecoder.cpp). The resulting files are significantly larger and lack the progressivity of the usual DjVu files. This is useful because some scanners have embedded JPEG capabilities.

The foreground color image may be represented by a single FGjp chunk instead of a single FG44 chunk. This is useful because some scanners have embedded JPEG capabilities.

7.1.3.4 Annotations and textual information

All types of DjVu images may contain annotation chunks. Annotation chunks are used to describe hyperlinks, to specify more viewer settings (page background, initial zoom, etc.), and to hold metadata information. Annotations are contained in ANTa or ANTz chunks.

All types of DjVu image files may also contain a computer-readable description of the text appearing on the page. This information is contained in either a TXTa chunk or a TXTz chunk.

7.2 Multi-page documents

A multi-page document is composed of a FORM:DJVM whose first chunk is a DIRM chunk containing the document directory. This directory lists all component files composing the given document, helps to access every component file and identifies the pages of the document.

In a bundled multi-page file, the component files are stored immediately after the DIRM chunk, within the FORM:DJVM composite chunk.

In an indirect multi-page file, the component files are stored in different files whose URLs are composed using information stored in the DIRM chunk.

7.2.1 Component files

A multi-page DjVu document necessarily references other FORM (composite) chunks. Specifically:

Each page is a single-page document (FORM:DJVU chunk).
Embedded thumbnails (if any) are contained in one or more FORM:THUM chunks.
Shared annotations (if any) and shape dictionaries (if any) are contained in one or more FORM:DJVI chunks.

Each of these composite chunks (FORM:DJVU, FORM:THUM, FORM:DJVI) is a well-formed IFF byte stream in its own right and can be held in a separate disk file. In the context of a multi-page — either bundled or indirect — document, we refer to these composite chunks as component files.

7.2.2 Including shared information

In many cases, efficiencies can be achieved by sharing JB2 shape definitions and/or annotations across pages. To facilitate this, any DjVu image file contained in a multi-page file may contain an INCL chunk containing the ID of a shared component file. The decoder processes the chunks contained in the shared component file as if the DjVu image file contained them. All relevant pages include this shared component file. Although they appear in several pages, these shared shapes are encoded only once in the document.

A shared component file is composed of a single FORM:DJVI potentially containing any information otherwise allowed in a DjVu image file (except for the INFO chunk of course).

8 Low-level chunk structure and definition [top]

This section describes the DjVu file format at a low level. This includes the binary layout of the IFF85 wrapper and, of course, the layout of each contained chunk.

8.1 Header

The first four bytes of a DjVu file are 0x41 0x54 0x26 0x54. This preamble is not part of the EA IFF85 format, but it is required in order to identify DjVu files.

8.2 DjVu file structure

8.2.1 IFF wrapper

An IFF file consists of a number of chunks. Each chunk is laid out in three fields:

BYTE[4]	Chunk ID. Describes the use of the chunk. The strings that identify the types of chunks used in DjVu are listed below.
BE32	The length of the data.
BYTE[]	The contained data.

A chunk whose type is not recognized by the application is to be ignored. In the IFF format, chunks may be nested: a chunk may contain other chunks as part of its data. In the DjVu format, there is only one chunk at the outermost nesting level, a FORM chunk. All other chunks appear within the FORM chunk, sequentially, with no nesting.^[1]

Example:

Offset	Bytes	Interpretation
00000000	41 54 26 54	"AT&T"; magic described in §8.1.
00000004	46 4f 52 4d	"FORM"; chunk ID = FORM
00000008	00 00 68 a6	0xa668 = 26970; length of this FORM chunk
0000000B	44 4a 56 55	"DJVU"; first four bytes of contained data. Since this is a FORM chunk, this starts with the sub-identifier. This is a FORM:DJVU chunk, a single-page document.

8.2.2 Chunk summary

The chunks used in the DjVu file format are summarized in Table 1.

Table 1: chunk summary
Chunk ID	Usage
FORM	The composite chunk. The first four bytes of the FORM chunk are a secondary identifier. Such chunks are referred to as FORM:XXXX where XXXX stands for the secondary identifier.
FORM:DJVM	A multi-page DjVu document. Composite chunk that contains the DIRM chunk, possible shared/included chunks and subsequent FORM:DJVU chunks which make up a multi-page document.
FORM:DJVU	A DjVu page/single-page DjVu document. Composite chunk that contains the chunks that make up a page in a DjVu document.
FORM:DJVI	A “shared” DjVu file which is included via the INCL chunk. Shared annotations, shared shape dictionary.
FORM:THUM	Composite chunk that contains the TH44 chunks that are the embedded thumbnails.
DIRM	Page name information for multi-page documents.
NAVM	Bookmark information.
ANTa, ANTz	Annotations, including both initial view settings and overlaid hyperlinks, text boxes, etc.
TXTa, TXTz	Unicode text and layout information.
Djbz	Shared shape table.
Sjbz	BZZ-compressed JB2 bitonal data used to store mask.
FG44	IW44 data used to store foreground.
BG44	IW44 data used to store background.
TH44	IW44 data used to store embedded thumbnail images.
WMRM	JB2 data required to remove a watermark.^[1]
FGbz	Color JB2 data. Provides a color for each [blit or shape?] in the corresponding Sjbz chunk.
INFO	Information about a DjVu page.
INCL	The ID of an included FORM:DJVI chunk.
BGjp	JPEG-encoded background.
FGjp	JPEG-encoded foreground.
Smmr	G4-encoded mask.

8.3 IFF chunk types

8.3.1 Container chunk: FORM

The FORM chunk is used as a chunk container. The first four bytes of the FORM chunk are a secondary ID used to identify the contained chunks.

8.3.1.1 FORM:DJVM

As discussed in §7.2, a multi-page DjVu document is composed of a single (composite) FORM:DJVM chunk. The first nested chunk is always a DIRM chunk containing the document directory (see DjVmDir.h), which represents the list of the component files that make up the document. An optional NAVM chunk, which describes the outline of the document, may follow the DIRM chunk.

Example:

FORM:DJVM [126475 bytes]
  DIRM [59 bytes] Document directory (bundled, 3 files, 2 pages)
  FORM:DJVI [3493 bytes] dict0002.iff
  FORM:DJVU [115016 bytes] p0001.djvu
  FORM:DJVU [7869 bytes] p0002.djvu

8.3.1.2 FORM:DJVU

As discussed in §7.1, a single page in a DjVu document is contained in a single (composite) FORM:DJVU chunk. The nested first chunk must be the INFO chunk. The chunks after the INFO chunk may occur in any order, although the order of the BG44 chunks, if there is more than one, is significant.

Example:

FORM:DJVU [26790 bytes]
  INFO [10 bytes] 2202 × 967, version 26, 300 dpi, gamma = 2.2
  Sjbz [13133 bytes] JB2 bi-level data
  FG44 [185 bytes] IW44 data #1, 76 slices, version 1.2 (color), 184 × 81
  BG44 [935 bytes] IW44 data #1, 74 slices, version 1.2 (color), 734 × 323
  BG44 [1672 bytes] IW44 data #2, 10 slices
  BG44 [815 bytes] IW44 data #3, 4 slices
  BG44 [9976 bytes] IW44 data #4, 9 slices

8.3.1.3 FORM:DJVI

Multi-page DjVu documents can share information between pages by nesting a chunk inside a FORM:DJVI chunk (which is itself held inside the FORM:DJVM chunk) and referencing the contained chunk from within a page. Individual pages reference the shared chunks via the INCL chunk.

Example:

FORM:DJVM [126475 bytes]
  DIRM [59 bytes] Document directory (bundled, 3 files, 2 pages)
  FORM:DJVI [3493 bytes] dict0002.iff
    Djbz [3481 bytes] JB2 shared dictionary
  FORM:DJVU [115016 bytes] p0001.djvu
    INFO [10 bytes] 2539 × 3295, version 25, 300 dpi, gamma = 2.2
    INCL [12 bytes] Indirection chunk → dict0002.iff
    Sjbz [70497 bytes] JB2 bi-level data

8.3.1.4 FORM:THUM

Pre-rendered thumbnails may be included. This allows very large documents to render thumbnails of pages without downloading and decoding them. FORM:THUM chunks contain several TH44 chunks. Each of these chunks contains the thumbnails of the pages that follow.

Example:

FORM:DJVM [2272012 bytes]
  DIRM [108 bytes] Document directory (bundled, 7 files, 4 pages)
  FORM:THUM [5960 bytes] p0001.thumb
    TH44 [5984 bytes] Thumbnail icon for page 1
  FORM:DJVU [1413380 bytes] p0001.djvu
    INFO [10 bytes] 4728 × 6300, version 25, 600 dpi, gamma = 2.2
    …
  FORM:THUM [12148 bytes] p0004.thumb
    TH44 [3418 bytes] Thumbnail icon for page 2
    TH44 [4150 bytes] Thumbnail icon for page 3
    TH44 [4552 bytes] Thumbnail icon for page 4
  FORM:DJVU [777858 bytes] p0002.djvu
    …

8.3.2 Directory chunk: DIRM

As described in §7.2, a multi-page document will contain “component files” such as individual pages (FORM:DJVU) or shared annotations (FORM:DJVI).

The first contained chunk in a FORM:DJVM composite chunk is the DIRM chunk containing the document directory. It contains information the decoder will need to access the component files (see §7.2).

8.3.2.1 Unencoded data

The first part of the DIRM chunk is unencoded:

BYTE

Bit 7 (the MSB) is the bundled flag: 1 for bundled, 0 for indirect.

Bits 6 through 0 are the version, currently 1.

BE16

Number of component files.

BE32[]

When the document is a bundled document (i.e. the flag bundled is set), the header above is followed by the offsets of each of the component files within the FORM:DJVM. These offsets allow for random component file access.

When the document is indirect, these offsets are omitted.

8.3.2.2 BZZ-encoded data

The rest of the chunk is entirely compressed with the BZZ general-purpose compressor. We describe now the data fed into (or retrieved from) the BZZ codec (see BSByteStream.cpp and Appendix 4).

BE24[]

Size of each component file. May be 0 for indirect documents.

BYTE[]

Flag byte for each component file.

0b⟨hasname⟩⟨hastitle⟩000000 for a file included by other files
0b⟨hasname⟩⟨hastitle⟩000001 for a file representing a page
0b⟨hasname⟩⟨hastitle⟩000010 for a file containing thumbnails

Flag hasname is set when the name of the file is different from the file ID. Flag hastitle is set when the title of the file is different from the file ID. These flags are used to avoid encoding the same string three times.

Note: in practice, the hasname and hastitle bits are poorly tested and not used.

ZSTR[]

There are one to three zero-terminated strings per component file. The first string contains the ID of the component file. If hasname is set, then there is a second string which contains the name of the component file (in the case of an indirect file, this is the disk filename). If hastitle is set, then there is a third string which contains the name of the component (for display — for example, alternate page numberings in the Foreword or Preface).

Examples:

Bytes	Interpretation
unencoded portion
81	bundled document, version 1
00 03	3 files
00 00 00 54 00 00 0e 02 00 01 cf 52	file offsets: 84, 3586, 118610

encoded portion (as passed to encoder)
00 0d ad 01 c1 50 00 1e c5	file sizes: 3501, 115024, 7877
00 01 01	flag bytes: included file, `hasname` = 0, `hastitle` = 0; page file, `hasname` = 0, `hastitle` = 0; page file, `hasname` = 0, `hastitle` = 0
64 69 63 74 30 30 30 32 2e 69 66 66 00 70 30 30 30 31 2e 64 6a 76 75 00 70 30 30 30 32 2e 64 6a 76 75 00	file IDs: dict0002.iff, p0001.djvu, p0002.djvu

Bytes	Interpretation
unencoded portion
01	indirect document, version 1
00 03	3 files
encoded portion (as passed to encoder)
00 0d ad 01 c1 50 00 1e c5	file sizes: 3501, 115024, 7877
00 01 01	flag bytes: included file, `hasname` = 0, `hastitle` = 0; page file, `hasname` = 0, `hastitle` = 0; page file, `hasname` = 0, `hastitle` = 0
64 69 63 74 30 30 30 32 2e 69 66 66 00 70 30 30 30 31 2e 64 6a 76 75 00 70 30 30 30 32 2e 64 6a 76 75 00	file IDs: dict0002.iff, p0001.djvu, p0002.djvu

8.3.3 Document outline chunk: NAVM

The NAVM chunk contains bookmarks which describe an outline of the document. The intent is to allow content authors to create an electronic table of contents which gives users rapid access to various parts of the document.

This chunk is optional, but, if present, must immediately follow the DIRM chunk.

The entire chunk is BZZ-encoded and starts with a single field specifying the total number of bookmark records:

BE16	The total number of bookmarks in the document.

And then the individual bookmark records, nested as necessary:

BYTE	The number of immediate child bookmark records.
BE24	Size of the description text.
STR	The description text.
BE24	Size of the URL text.
STR	The URL text. This may (and typically does) use the syntax described for the URLs in the ANTa chunk (and similarly, is not URL-encoded).

Example (as passed to the BZZ encoder):

Consider a small document outline as follows:

Table of Contents
1. Introduction
2. Datasheet
  1. For More Info (Online)

There is no hyperlink associated with the single root entry "Table of Contents". At a binary level, the chunk looks like this [before encoding]:

Offset	Bytes	Interpretation
0012F06C	00 04	4 bookmarks
first record
0012F06E	02	2 children
0012F06F	00 00 11	17 bytes of description text
0012F072	54 61 62 6c 65 20 6f 66 20 43 6f 6e 74 65 6e 74 73	description text: "Table of Contents"
0012F083	00 00 00	0 bytes of URL text
second record
0012F086	00	0 children
0012F087	00 00 0c	12 bytes of description text
0012F08A	49 9e 74 72 6f 64 75 63 74 69 6f 6e	description text: "Introduction"
0012F096	00 00 0b	11 bytes of URL text
0012F099	23 70 30 30 30 31 2e 64 6a 76 75	URL text: "#p0001.djvu"
third record
0012F0A4	01	1 child
0012F0A5	00 00 09	9 bytes of description text
0012F0A8	44 61 74 61 73 68 65 65 74	description text: "Datasheet"
0012F0B1	00 00 0b	11 bytes of URL text
0012F0B4	23 70 30 30 30 32 2e 64 6a 76 75	URL text: "#p0002.djvu"
fourth record
0012F0BF	00	0 children
0012F0C0	00 00 16	22 bytes of description text
0012F0C3	46 6f 72 20 4d 64 72 65 20 49 6e 66 6f 20 28 4f 6e 6c 69 6e 65 29	description text: "For More Info (Online)"
0012F0D9	00 00 19	25 bytes of URL text
0012F0DC	68 74 74 70 3a 2f 2f 77 77 77 2e 6c 69 7a 61 72 64 74 65 63 68 2e 63 6f 6d	URL text: "http://www.lizardtech.com"

8.3.4 Annotation chunk: ANTa, ANTz

Annotations are contained in ANTa or ANTz chunks. The ANTa chunks contain the annotation in plain text. The ANTz chunks contain the same information compressed with the BZZ encoder (see BSByteStream.h).

The use of the ANTa chunk is discouraged.

Pages can share annotations using an INCL chunk as explained in §7.2.2. The complete annotation text is obtained by concatenating all annotation chunks present in the page. A restriction of the current reference library implementation limits the number of shared annotation files to one.^[1]

The syntax of the annotation text uses a simple parenthesized notation. All text is standard UTF-8.

8.3.4.1 Initial document view

8.3.4.1.1 Background color

8.3.4.1.2 Initial zoom

8.3.4.1.3 Initial display level

8.3.4.1.4 Alignment

8.3.4.2 Maparea (overprinted annotations)

8.3.4.2.1 `url`

8.3.4.2.2 `comment`

8.3.4.2.3 `area`

8.3.5 Text chunk: TXTa, TXTz

Text is contained in TXTa or TXTz chunks. The TXTa chunks contain the text unencoded. The TXTz chunks contain the same information compressed with the BZZ encoder (see BSByteStream.h).

The use of the TXTa chunk is discouraged.

The chunk begins with the UTF-8-encoded text of the page:

BE24	size of the text string in bytes
STR	UTF-8-encoded string
BYTE	version (currently 1)

10 Appendix 1: IW44 coding [top]

This section describes the coding of chunks of type BG44, FG44, PM44, BM44, and TH44. Chunks of type BG44, FG44, and TH44 may be color or grayscale chunks. Chunks of type PM44 are color chunks. Chunks of type BM44 are grayscale chunks. All of these color and grayscale chunk types have the same structure. The chunk consists of a chunk header followed by arithmetically-coded wavelet coefficient updates. The coefficients are organized in a hierarchical fashion.

10.1 Definitions

Color component

Compound DjVu images and photo DjVu images contain color or grayscale image data. Color IW44 images contain color image data. Grayscale IW44 images contain grayscale image data. Color image data is coded using three color components, called Y, Cb, and Cr. These correspond to the usual YCbCr color space, adjusted to facilitate transformation to the RGB color space. Grayscale image data is coded using one color component, called Y. This corresponds to the grayscale intensity of the image.

Color layer

A color layer is any of:

The foreground layer of a compound DjVu image, coded in one FG44 chunk
The background layer of compound DjVu image, coded in one or more BG44 chunks
The only layer of a photo DjVu image, coded in one or more BG44 chunks
The only layer of a color IW44 image, coded in one or more PM44 chunks
The only layer of a grayscale IW44 image, coded in one or more BM44 chunks

Color chunk

A color chunk is a chunk of type BG44, FG44, PM44, or BM44. A color chunk contains wavelet coefficient update information for one or three color components.

Block

A rectangular array of pixels of size 32 × 32 or less. The blocks are numbered starting in the lower left corner of the image. All blocks are 32 × 32 except possible those along the right edge or top edge; those blocks may be smaller if the image dimensions are not divisible by 32.

Block count

The number of blocks in the image, denoted by NB.

Wavelet block

The set of coefficients associated with one block of the image, in one color component. There are 1024 wavelet coefficients in a wavelet block, numbered 0 through 1023. The coefficients in a wavelet block have effects on the reconstruction of other blocks in the image, but for coding purposes they are considered to be localized within the block in which they are coded.

Bucket

A particular set of 16 wavelet coefficients within a wavelet block. A wavelet block consists of 64 buckets, numbered 0 through 63. Table 2 gives the correspondence between coefficients and buckets.

Band

A subset of wavelet coefficients for a given color component. There are 10 bands. The correspondence among band numbers, coefficient indices, and bucket coefficients is given by Table 2.

Cycle

Data for one color component consisting of coefficient updates for all coefficients, that is, for all 10 bands, starting with band 0. Within one band, only some coefficients are updated, but within a cycle, all coefficients are updated. The last cycle of a color component may have fewer than 10 bands.

Color band number

The current band number for a color component. Each color component’s color band number starts at 0, and increases by 1 at the end of selected slices until it reaches 9; then it is reset to 0.

Color band

A collection of update information for a subset of the coefficients of one color component of the image, consisting of updates of all the coefficients in the image whose indices within their respective blocks are those corresponding to the current color band’s color band number.

Slice

A slice is the highest-level subdivision of a color chunk. A slice contains data for one color band for each of the color components in a color layer, that is, for three color components for a color image, or for one color component for a grayscale image.

Block band

A collection of update information for a subset of the coefficients of one color component of a wavelet block, consisting of updates of the coefficients in the block whose indices are those corresponding to a given band.

Chrominance delay counter

An integer counter that indicates how many slices in a color layer contain a color band only for the Y color component, and not for the Cb and Cr color components. The chrominance delay counter is initially set to the value specified in the INFO chunk for the color layer, and decremented by 1 after each slice in the color layer until it reaches 0. See §10.4.1.1.

Step size table

A table that indicates the precision to which each coefficient in a color component is currently stored. There are three such tables for a given color layer, one for each color component. Each such table has 16 entries. Each entry specifies the current step size for 1, 4, 16, 64, or 256 different coefficient indices, according to Table 3.

10.2 Color chunks within a DjVu file

There may be more than one BG44, PM44, or BM44 chunk in a DjVu file. If there is more than one such color chunk, the coefficient updating is continuous across the chunks, and the data is taken from the chunks in the order in which they appear in the file. Nothing is reinitialized at the beginning of chunks after the first color chunk of these types, except for the low-level arithmetic coder. The probability estimates for the arithmetic coder are not reinitialized.

In a compound DjVu image file, in which both an FG44 chunk and one or more BG44 chunks appear in the same file, the coding of the foreground layer, using the FG44 chunk, is independent of the coding of the background layer, using the BG44 chunks.

Each color layer is coded using a Dubuc–Deslauriers–Lemire (4, 4) interpolative wavelet transform. Each layer of the image is transformed into a set of wavelet coefficients, one wavelet coefficient for each pixel in the original image. This transform is especially effective for coding images at high compression ratios.

The value of each coefficient is coded in a distributed fashion, through a number of cycles. Within one cycle, each coefficient is updated once (that is, in only one of the 10 bands), and receives approximately one additional bit of information. Specifically, from cycle to cycle the absolute value of a coefficient is first narrowed down by eliminating possible values for the most significant non-zero bit until the correct most significant non-zero bit is found. The sign is coded in the same cycle in which the most significant non-zero bit is found. Then in each subsequent cycle, one additional bit of the value is coded.

10.3 Color chunk data headers

A color chunk begins with a data header consisting of 2 or 9 octets, as follows:

BYTE	Serial number	The serial number of the first chunk of a given chunk type is 0. Successive chunks are assigned consecutive serial numbers.
BYTE	Number of slices	The number of slices coded in the chunk.
BYTE	Major version number and color type	One octet containing two values, present only if the serial number is 0. The least significant seven bits designate the major version number of the standard being implemented by the decoder. For this version of the standard, the major version number is 1. The most significant bit is the color type bit. The color type bit is 0 if the chunk describes three color components. The color type bit is 1 if the chunk describes one color component.
BYTE	Minor version number	A one-octet unsigned integer, present only if the serial number is 0. This octet designates the minor version number of the standard being implemented by the decoder. For this version of the standard, the minor version number is 0.
BE16	Image width	A two-octet unsigned integer, most significant octet first, present only if the serial number is 0. This field indicates the number of pixels in each row of the image described by the current chunk. The image width will be less than the width of the original image if the chunk describes a layer coded at lower resolution than the original image. For a BG44 or FG44 chunk, if `W` is the width of the original image specified in the INFO chunk, and `w` is the width of the image described by the current chunk, then the allowable values of `w` are $⌈ W / 1 ⌉$ , $⌈ W / 2 ⌉$ , $⌈ W / 3 ⌉$ , $⌈ W / 4 ⌉$ , $⌈ W / 5 ⌉$ , $⌈ W / 6 ⌉$ , $⌈ W / 7 ⌉$ , $⌈ W / 8 ⌉$ , $⌈ W / 9 ⌉$ , $⌈ W / 10 ⌉$ , $⌈ W / 11 ⌉$ , and $⌈ W / 12 ⌉$ . For a BM44 or PM44 chunk, there are no restrictions on the image width.
BE16	Image height	A two-octet unsigned integer, most significant octet first, present only if the serial number is 0. This field indicates the number of pixels in each column of the image described by the current chunk. The image height will be less than the height of the original image if the chunk describes a layer coded at lower resolution than the original image. For a BG44 or FG44 chunk, if `H` is the height of the original image specified in the INFO chunk, and `h` is the height of the image described by the current chunk, then the allowable values of `h` are $⌈ H / 1 ⌉$ , $⌈ H / 2 ⌉$ , $⌈ H / 3 ⌉$ , $⌈ H / 4 ⌉$ , $⌈ H / 5 ⌉$ , $⌈ H / 6 ⌉$ , $⌈ H / 7 ⌉$ , $⌈ H / 8 ⌉$ , $⌈ H / 9 ⌉$ , $⌈ H / 10 ⌉$ , $⌈ H / 11 ⌉$ , and $⌈ H / 12 ⌉$ . For a BM44 or PM44 chunk, there are no restrictions on the image height.
BYTE	Initial value of chrominance delay counter	A one-octet unsigned integer, present only if the serial number is 0. Only the least significant seven bits are used. The most significant bit is ignored, but should be set to 1 by an encoder. This field specifies the initial value of the chrominance delay counter, used as described below.

10.4 Color chunk data

10.4.1 Hierarchical structure of a coded color layer

The data coded in a color chunk consists of information needed to reconstruct wavelet coefficients. There are one or three color components; each color component has its own set of wavelet coefficients. Within a color component, there are 1024 wavelet coefficients for each 32 × 32 block of the image.

Within one layer (background or foreground for DjVu image, or the only layer for an IW44 image), coding is divided into a series of slices. All the slices may be coded in one chunk, or they may be separated into a number of chunks. The only difference it makes whether the slices are coded in one chunk or in several chunks is in the order of progressive rendering; the final reconstructed image will be the same. The number of slices in each chunk is specified in the color chunk data header. One slice contains refinement data for one color band for each color component. Within a color component, all coefficients in a slice are in the same band.

A color chunk describes the full image at the spatial resolution implied by the image width and image height fields in the data header of the first chunk of the same type as the current color chunk.

The sequence of color components within a slice is: first Y, then Cb, then Cr, although the Cb and Cr components are not present in a slice if the chunk describes grayscale data or if the chrominance delay counter is not equal to 0 at the time the slice is coded.

A color band is made of up coefficient updates for all blocks in the image, but only for coefficients that are in the currently active band for the color component. Each block’s set of updates within a color band is called a block band. The block bands are coded block by block, first from left to right within the bottom row, then by rows moving up the image, left to right within each row.

Within a block band, there are 16, 64, or 256 coefficient updates. The coefficients being updated are divided into buckets, each bucket containing 16 coefficients. Thus, a block band contains 1, 4, or 16 buckets. The buckets and coefficients being updated are determined by the color band number according to Table 2.

10.4.1.1 Band counting

The header of the first color chunk contains an initial value for the chrominance delay counter. It may be 0 or a positive integer.

At the beginning of the first color chunk, the color band number for each of the three color components is set to 0.

At the beginning of each slice, the chrominance delay counter is tested. If the chrominance delay counter is 0, and if the slice describes color image data, then all three color components are present. If the chrominance delay counter is greater than 0, or if the chunk describes grayscale image data, only the Y color component is present for the slice.

At the end of a slice, the following actions take place:

The color band number is increased by 1 for the Y component. If the new color band number exceeds 9, it is set to 0.
If the chrominance delay counter is 0, the color band numbers for the Cb and Cr components are increased by 1. If the new color band numbers exceed 9, they are set to 0. (Note: the color band numbers for the Cb and Cr components are always equal to each other.)
If the chrominance delay counter is greater than 0, it is decreased by 1.

A color chunk ends when the number of slices specified in the color chunk header have been coded. At the beginning of each color chunk after the first for a given color layer, the chrominance delay counter and color band numbers retain the values they had at the end of the previous color chunk.

10.4.2 Quantization of coefficients

At each point during the decoding process, each wavelet coefficient has been determined to a certain precision. The current value a of the coefficient is stored, and a current step size S is associated with the coefficient. The current step size for each coefficient is governed by a step size table. The index of the entry in the step size table that contains the step size for a given coefficient that contains the step size for a given coefficient is given in Table 3.

11 Appendix 2: JB2 coding [top]

11.1 General considerations

Selection layer coding is used in compound DjVu images. In such images, there are three layers. The foreground layer is coded in one FG44 chunk, and is rendered as described in Appendix 1. The background layer is coded in one or more BG44 chunks, and is rendered as described in Appendix 1. The selection layer is coded using one Sjbz chunk. Black pixels in the selection layer specify those pixels that are to be rendered using the foreground color. All other pixels are to be rendered using the background color.

Black-and-white coding is used in bi-level DjVu images. In such images, there are three layers. The foreground layer is black. The background layer is white. The selection layer is coded using one Sjbz chunk. The selection layer specifies those pixels that are to be rendered in black. All other pixels are to be rendered in white.

An Sjbz chunk contains a single arithmetically-coded data stream, coded using the Z′-Coder (Appendix 3). All data, including headers and record types, is coded in this arithmetically-coded stream.

11.2 Arithmetic coding

The arithmetically-coded data in an Sjbz chunk consists logically of records. The record types are listed in Table 6, and described in §11.4. The records consist of fields. The fields present for records of each record type are listed in Table 6. The fields within a record are coded in the order listed in Table 6 for records of that type. Details of the coding for each field appear in §11.5.

A field may contain one or more data elements. The data elements consist of flags, pixel colors, and integers. Because of the nature of arithmetic coding, the records, fields, and data elements are not of fixed sizes, and do not necessarily begin on bit boundaries within the data stream.

Flags are binary decisions, each coded using the Z′-Coder with a particular context. There are two different contexts for flags, the eventual image refinement context and the offset type context.

Pixel colors are binary decisions, coded using the Z′-Coder with a particular context. For pixel colors, there are 3072 different contexts. There are 1024 contexts used for direct coding of bitmaps; these correspond to the $2^{10} = 1024$ different combinations of values that the pixels in the direct coding template can assume. There are 2048 contexts used for refinement coding of bitmaps; these correspond to the $2^{11} = 2048$ different combinations of values that the pixels in the refinement coding template can assume.

Integers are coded using the multivalue extension to the Z′-Coder, described below. There are 15 contexts for coding multivalued integers, as described in Table 7.

11.2.1 Initialization of the Z′-Coder

All Z′-Coder contexts are initialized to the value 0. This applies both to contexts used to encode single-bit values, including pixel colors, and to contexts that are part of an integer context used by the multivalue extension to the Z′-Coder.

11.2.2 The multivalue extension to the Z′-Coder for coding of numeric data

Quantities that can take on multiple values are coded as integers using the multivalue extension to the Z′-Coder. This extension of the Z′-Coder allows all data in the bit stream to be coded using the same coder, the Z′-Coder. There are 15 integer contexts, specified in Table 7. A single integer context includes a number of binary contexts.

One integer context consists of a binary decision tree. See Figure 1 for an example of part of such a tree. The root node of the tree corresponds to the decision about the sign of the number n being decoded. Each of the two sub-trees under the root corresponds to a set of decisions that eventually identify a range in which n lies. The sub-trees under the nodes corresponding to identified ranges are complete binary trees that identify the exact value of n.

Each node of the binary decision tree for an integer context maintains its own binary probability estimation context for the Z′-Coder. The trees for different integer contexts are completely independent. Thus each node of a tree contains probability information conditioned on a conditioning context. The conditioning context consists of both the type of value being coded (i.e., the selection of the integer context), and of the values of the decisions coded so far when encoding the current integer.

11.2.3 Record types

Table 6: record types and fields coded for each record type
Record type coded value	Record type	Fields coded
0	Start of image	Record type Image size Eventual image refinement flag
1	New symbol, add to image and library	Record type Absolute symbol size Bitmap by direct coding Location relative to a previous symbol
2	New symbol, add to library only	Record type Absolute symbol size Bitmap by direct coding
3	New symbol, add to image only	Record type Absolute symbol size Bitmap by direct coding Location relative to a previous symbol
4	Matched symbol with refinement, add to image and library	Record type Index of matching symbol in bitmap library Relative symbol size Bitmap by refinement coding Location relative to a previous symbol
5	Matched symbol with refinement, add to library only	Record type Index of matching symbol in bitmap library Relative symbol size Bitmap by refinement coding
6	Matched symbol with refinement, add to image only	Record type Index of matching symbol in bitmap library Relative symbol size Bitmap by refinement coding Location relative to a previous symbol
7	Matched symbol, copy to image without refinement	Record type Index of matching symbol in bitmap library Location relative to a previous symbol
8	Non-symbol data	Record type Absolute symbol size Bitmap by direct coding Absolute location
9	Shared dictionary or reset	Record type Shared dictionary size
10	Comment	Record type Comment length Comment data
11	End of data	Record type

11.2.4 Fields and contexts

Table 7: multivalued integer contexts for arithmetic coding
Context name	Integer data coded using this context
Record type	Record type
Image size	Image height; image width
Matching symbol index	Index within the symbol library of the symbol matching the current symbol
Symbol width	Width of the current symbol in pixels
Symbol height	Height of the current symbol in pixels
Symbol width difference	Number of pixels that must be added to the width of the matching symbol to obtain the width of the current symbol
Symbol height difference	Number of pixels that must be added to the height of the matching symbol to obtain the height of the current symbol
Symbol column number	Column number of the absolute location of the left edge of the current symbol (leftmost column of the image is column number 1)
Symbol row number	Row number of the absolute location of the top edge of the current symbol (bottom row of the image is row number 1)
Same line column offset	Number of pixels that must be added to the column number of the right edge of the previous symbol on the current text line to obtain the column number of the left edge of the current symbol
Same line row offset	Number of pixels that must be added to the row number of the current baseline on the current text line to obtain the row number of the bottom edge of the current symbol
New line column offset	Number of pixels that must be added to the column number of the left edge of the first symbol on the current text line to obtain the column number of the left edge of the current symbol
New line row offset	Number of pixels that must be added to the row number of the bottom edge of the first symbol on the current text line to obtain the row number of the top edge of the current symbol
Comment length	The number of octets in the current comment
Comment octet	One octet in the current comment
Dictionary size	Number of shapes in the shared dictionary

11.2.5 Coding phases

Figure 1: Part of the coding tree for multivalue arithmetic coding. Each internal node represents one context with its own probability information, to be used by the Z′-Coder. The square node at the root of the tree represents the Phase 1 decision, whether the integer n being coded is negative. The filled circles are the Phase 2 nodes, moving down the tree in ever-increasing ranges. The open circles represent Phase 3 decisions, traversing a complete binary sub-tree to reach the specific value of n. A decoded value of 0 indicates a left branch in this tree. A decoded value of 1 indicates a right branch.

This method allows high compression efficiency by allowing the coder to adapt to the statistics of the data. In effect, the binary probability information stored collectively in the nodes of the decision tree closely approximates the probability distribution of the underlying multivalued integer.

The allowable range of values for n is always specified. The smallest value that n could possibly take is denoted by l. The largest value that n could possibly take is denoted by h. When l and h are equal, n is equal to both of them, and no Z′-Coder decoding is performed.

The decoder maintains a non-negative intermediate value v, defined as follows: $v = {\begin{matrix} | n | & if n \geq 0 \\ | n | - 1 & if n < 0 . \end{matrix}$ At the end of the process of decoding an integer, v is converted to n, the value of the decoded integer.

The value of an integer is coded by making a sequence of binary decisions, each one narrowing the set of values that the integer can possibly take. The decisions are based on traversing a binary decision tree to one of its leaves. Note: although the tree conceptually has a large number of nodes, it is possible in an implementation to allocate memory only for those nodes actually traversed. Decoding proceeds in four phases.

11.2.5.1 Phase 1

Phase 1 determines the sign of n. A value of 0 returned by the Z′-Coder means that $n < 0$ . A value of 1 returned by the Z′-Coder means that $n \geq 0$ .

11.2.5.2 Phase 2

Phase 2 determines a range of possible values for v. The Z′-Coder is invoked repeatedly to answer the question “Is the value of v in the range being tested?” The sequence of ranges tested is given in Table 8. A value of 0 returned by the Z′-Coder means that v is not in the specified range, and the next range in the sequence must be tested. A value of 1 returned by the Z′-Coder means that v is in the specified range, and decoding is to proceed to Phase 3.

Table 8: sequence of ranges in which `v` may fall
0
1–2
3–6
7–14
15–30
31–62
63–126
127–254
255–510
511–1022
1023–2046
2047–4094
4095–8190
8191–16382
16383–32766
32767–65534
65535–131070
131071–262142

11.2.5.3 Phase 3

Phase 3 consists of determining the exact value of v within the range determined in Phase 2. If Phase 2 determined that $v = 0$ , then Phase 3 is skipped. Otherwise, since the size of the range is a power of 2, the corresponding sub-tree is a complete binary tree. The sequence of coding decisions is based directly on traversing the binary tree. At each node, 0 returned by the Z′-Coder means left branch (smaller values of v) and 1 means right branch (larger values of v. The bits returned by the Z′-Coder during Phase 3 are the bits of v, most significant bit first.

11.2.5.4 Phase 4

In Phase 4, the unsigned value v is converted to n, the signed value to be returned, as follows: $n = {\begin{matrix} v & if n is non-negative, as determined in Phase 1; \\ - v - 1 & if n is negative. \end{matrix}$

In any of the phases, if the input values of l and h (the range of allowable values) predetermine any decision, then the coding for that decision is not performed; the predetermined decision is assumed.

Each type of integer has its own set of binary contexts. Thus the probability information will reflect the underlying probability distribution of the particular type of integer. The Z′-Coder probability state indices of all the binary nodes are initialized to 0.

11.3 Image reconstruction

Records in an Sjbz chunk are interpreted in the order in which they appear. A "Start of data" record specifies the dimensions of the image. An "Image refinement data" record indicates […]^[1]. An "End of data" record indicates the end of the Sjbz chunk. A "Comment" record contains uninterpreted data.

A record identified by any other record type describes one bitmap. The model used in DjVu for the selection layer is based on symbol-based coding. Bitmaps are placed into the reconstructed image as follows: The image is initially entirely white. When a bitmap is placed into the image, the pixels that are black in the current symbol become black at the appropriate position in the reconstructed image. Once a pixel in the reconstructed image becomes black, it remains black.

Because symbols in the document images are often similar to each other, it is often possible to obtain more efficient coding by making use of previously coded symbols. As symbols are decoded, their bitmaps may be placed into a symbol bitmap library. There is exactly one symbol bitmap library. Once a symbol has been placed into the symbol bitmap library, later records may cause copies of the symbol to be placed into the image, or may define a new bitmap by refining the bitmap in the library.

Depending on the record type, the symbol bitmap may be described by direct coding, by refinement coding, or by a copy operation. In direct coding, all pixels of the bitmap are coded directly, without reference to any other bitmap. In refinement coding, all pixels of the bitmap are also coded directly, but a bitmap in the library is used to make the coding more efficient. In a copy operation, the pixels of the bitmap are the same as the pixels of a bitmap in the library.

Depending on the record type, the bitmap may or may not be placed into the image. If the bitmap is placed into the image, then depending on the record type, it may be placed either at an absolute location or at a location relative to a previously placed bitmap.

Depending on the record type, the bitmap may or may not be placed into the symbol bitmap library. The first symbol placed into the library has index 0. Subsequent symbols are assigned consecutive integer indices.

The pixels of the reconstructed image are arranged in a rectangular coordinate system. For the pixel in the lower left corner of the image, the column number is 1 and the row number is 1. All coordinates refer to the pixels themselves, not to the edges between pixels.

11.4 Records

Records in Sjbz chunks have the following interpretations.

11.4.1 Start of image

This record is the first record in an Sjbz chunk. It specifies the dimensions of the image.

11.4.2 New symbol, add to image and library

This record specifies the bitmap of a symbol that is coded directly and placed into the reconstructed image and into the symbol bitmap library.

11.4.3 New symbol, add to library only

This record specifies the bitmap of a symbol that is coded directly and placed into the symbol bitmap library but not into the image.

11.4.4 New symbol, add to image only

This record specifies the bitmap of a symbol that is coded directly and placed into the reconstructed image but not into the symbol bitmap library.

11.4.5 Matched symbol with refinement, add to image and library

This record specifies the bitmap of a symbol that is coded by refinement of a symbol in the symbol bitmap library and placed into the reconstructed image and into the symbol bitmap library.

11.4.6 Matched symbol with refinement, add to library only

This record specifies the bitmap of a symbol that is coded by refinement of a symbol in the symbol bitmap library and placed into the symbol bitmap library, but not into the reconstructed image.

11.4.7 Matched symbol with refinement, add to image only

This record specifies the bitmap of a symbol that is coded by refinement of a symbol in the symbol bitmap library and placed into the reconstructed image, but not into the symbol bitmap library.

11.4.8 Matched symbol, copy to image without refinement

This record specifies the location at which the bitmap of a symbol in the symbol bitmap library is to be placed into the reconstructed image.

11.4.9 Non-symbol data

This record specifies a direct-coded bitmap to be placed at an absolute location in the reconstructed image. A bitmap of non-symbol data is not placed into the symbol bitmap library.

11.4.10 Shared dictionary or reset

This record is overloaded and its meaning depends on its context. If the record occurs before a "Start of data" record, then it is a "Required dictionary" record. If the record occurs after a "Start of data" record then it is a "Reset" record.

11.4.10.1 Shared shape dictionaries

Starting with version 21, the JB2 format provides support for sharing symbol definitions between the pages of a document. To achieve this objective, the JB2 image data chunk must be able to address symbols defined elsewhere by a JB2 dictionary data chunk shared by all the pages of a document.

A "Shared dictionary or reset" record can appear before the "Start of data" record. The record type field is followed by a single number arithmetically encoded using the sixteenth dictionary size context. This record appears when the JB2 data chunk requires symbols encoded in a separate JB2 dictionary data chunk. The number (the dictionary size) indicates how many symbols should have been defined by the JB2 dictionary data chunk. The decoder should simply load these symbols into the symbol library and proceed as usual. New symbols potentially defined by the subsequent JB2 image data records will therefore be numbered with integers greater than or equal to the dictionary size.

11.4.10.2 Numcoder reset

The encoding of numbers potentially uses an unbounded number of binary coding contexts. These contexts are normally allocated when they are used for the first time (see ).

Starting with version 21, a "Shared dictionary or reset" record can appear after the "Start of data" record. The decoder should proceed with the next record after clearing all binary contexts used for coding numbers. This operation implies that all binary contexts previously allocated for coding numbers can be deallocated.

Starting with version 21, the JB2 encoder should insert a "Shared dictionary or reset" record whenever the number of these allocated binary contexts exceeds 20000. Only very large documents ever reach such a large number of allocated binary contexts (e.g. large maps). Hardware implementation however can benefit greatly from a hard bound on the total number of binary coding contexts. Old JB2 decoders will treat this record type as an "End of data" record and cleanly stop decoding (see ).

11.4.10.3 Record types in a shared dictionary

The shared JB2 dictionary data format is a pure subset of the JB2 image data format:

Required dictionary
Start of data
New symbol, add to library only
Matched symbol with refinement, add to library only
Numcoder reset
Comment
End of data

Note that each shared dictionary can itself include another shared dictionary.

The JB2 dictionary data is usually located in an Djbz chunk. Each page may directly contain a Djbz chunk, or may indirectly point to such a chunk using an INCL chunk (see ).

11.4.11 Comment

This record contains data whose interpretation is not specified by the standard.

11.4.12 End of data

This record is the last record of an Sjbz chunk.

11.5 Fields

The following fields are coded in records of the types specified in Table 6 and in §11.4.

11.5.1 Record type

The record type is coded by the multivalue extension to the Z′-Coder using the record type context. The range of allowable record types is from 0 to 11. The coded values are specified in the first column of Table 6.

11.5.2 Image size

The width and height of the image are coded by the multivalue extension to the Z′-Coder using the image size context. The width is coded first, then the height. The range of allowable values is from 0 to 262142. The width and height of a compound DjVu image or bi-level DjVu image must be the same as the width and height of the image specified in the INFO chunk.

11.5.3 Eventual image refinement flag

The eventual image refinement flag is coded once, in the "Start of image" record, to notify the decoder whether image refinement data will eventually be provided. It is a binary value, coded by the Z′-Coder using the eventual image refinement context. The coded value 1 means TRUE and the coded value 0 means FALSE. Note: this flag is always FALSE in the current version of the standard, but it may be TRUE in later versions.

11.5.4 Index of matching symbol in bitmap library

The index of a matching symbol in the bitmap library is coded with the multivalue extension to the Z′-Coder, using the matching symbol index context. The range of allowable values is from 0 to one less than the number of symbols currently in the bitmap library.

11.5.5 Absolute symbol size

The width of a symbol is coded by the multivalue extension to the Z′-Coder using the symbol width context. Then the height of the symbol is coded by the multivalue extension to the Z′-Coder using the symbol height context. The range of allowable values for both of these data elements is from 0 to 262142.

11.5.6 Relative symbol size

The signed differences between the width and height of the current symbol and the width and heigh of the matching symbol are coded by the multivalue extension to the Z′-Coder using the symbol width different context and the symbol height difference context for the height. The width difference is coded first, then the height difference. The coded signed difference is added to the width or height of the matching symbol to obtain the width or height of the current symbol. The range of allowable values for both of these data elements is from -262143 to 262142.

11.5.7 Absolute location

The horizontal and vertical positions of the upper left corner of the bitmap are coded by the multivalue extension to the Z′-Coder using the symbol column number context for the horizontal position and the symbol row number context for the vertical position. The horizontal position is coded first, then the vertical position. The range of allowable values for the horizontal position is from 1 to the width of the image in pixels. The range of allowable values for the vertical position is from 1 to the height of the image in pixels.

11.5.8 Location relative to a previous symbol

The offset type flag is coded by the Z′-Coder using the offset type context. It indicates the reference symbol for coding the offset of the location of the current symbol. The coded value 1 means FIRST, which means that the location of the current symbol is being specified relative to the first symbol on the current text line. The value 0 means PREVIOUS, which means that the location of the current symbol is being specified relative to the most recently coded symbol on the current text line.

If the offset type flag is FIRST, then the reference symbol is the first symbol on the current text line. The horizontal offset is the signed difference between the left edge of the current symbol and the left edge of the reference symbol. It is coded with the multivalue extension to the Z′-Coder using the new line column offset context.

11.5.9 Bitmap by direct coding

Non-symbol bitmaps and symbol bitmaps with no sufficiently closely matching symbol in the symbol library are coded directly. A directly-coded bitmap is coded by repeated applications of the Z′-Coder to the pixels of the bitmap left to right across the rows, starting with the top row. When one row has been coded, the next lower row is coded. Each pixel is coded by the Z′-Coder using an appropriate context based on the values of 10 previously coded pixels. A coded value of 1 means the pixel is black. A coded value of 0 means the pixel is white.

The colors of the pixels numbered 1 through 10 in Figure 2, taken collectively, form a 10-bit value. Each of these values is an index into a table of 1024 different directly-coded bitmap contexts. The pixel labeled P in Figure 2 is coded using the context indexed by the collective values of the other 10 numbered pixels in the template. Pixels outside the bounding box are considered to be white.

Figure 2: template for direct coding

11.5.10 Bitmap by refinement coding

Some bitmaps are coded by making use of data from another bitmap; this process is called refinement coding. Matched symbols other than those to be copied are coded using refinement coding.

A bitmap coded by refinement coding is coded by repeated application of the Z′-Coder to the pixels of the bitmap, left to right across the rows. When one row has been coded, the next lower row is coded. Each pixel is coded by the Z′-Coder using an appropriate context based on the values of four previously coded pixels from the bitmap being coded and seven pixels from the matching bitmap. (The pixels numbered 1 through 4 in Figure 3 are from the current symbol; the pixels numbered 5 through 11 are from the matching symbol.) A coded value of 1 means the pixel is black. A coded value of 0 means the pixel is white. The colors of the pixels numbered 1 through 11 in Figure 3, taken collectively, form an 11-bit value. Each of these values is an index into a table of 2048 different refinement-coded bitmap contexts. The pixel labeled P in Figure 3 is coded using the context indexed by the collective values of the 11 numbered pixels in the template. Pixel 7 is in the position in the matching symbol that corresponds to the position of pixel P in the current symbol when the two symbols are aligned.

Alignment of the current bitmap and the matching bitmap proceeds as follows. For matched symbols, the current symbol and the matching symbol are aligned according to the geometric centers of their bounding rectangles. If the number of columns or rows is even, the geometric center falls between two columns or rows, respectively. In this case, the leftmost of the two central columns or the lowermost of the two central rows is considered to be the central column or row, respectively.

Figure 3: template for refinement coding. (a) Pixels from symbol being coded from matching symbol. (b) Pixels from matching symbol.

It is possible for the current symbol to have empty edge rows or columns. These empty rows and columns are coded, and are included in the bounding rectangle. For symbols added to the library, the symbol is added to the library after it has been placed into the image. Any empty edge rows and columns are removed before the symbol is added to the library.

11.5.11 Comment length

The comment length is the number of octets in the comment. It is coded by the multivalue extension to the Z′-Coder using the comment length context. The range of allowable values for the comment length is from 0 to 262142.

11.5.12 Comment data

Comment data consists of the individual octets of the comment. The number of octets in the comment is given by the comment length field. Each of the octets is coded using the multivalue extension to the Z′-Coder using the comment octet context. The range of allowable values for each octet is from 0 to 255.

12 Appendix 3: Z′ coding [top]

The Z′-Coder is an approximate binary arithmetic coder. Decoding proceeds as follows.^[1]

See also files ZPCodec.h and ZPCodec.cpp in DjVuLibre.

12.1 Registers and data storage

In Figure 1 and Figure 2, the values of variables A, C, D, and Z are stored in registers of at least 16 bits each. A and C retain their values between invocations of the Z′-Coder. Note: if register overflow can be ignored, storing variables A and C in registers of exactly 16 bits allows a simplification of lines 11, 12, 16, and 17 of Figure 1 and lines 8, 9, 12, and 13 of Figure 2.

At the beginning of a chunk, the values of A and C are reinitialized. When the decoder is decoding a chunk, it may require more bits than are present within the chunk’s data. In this case, all additional required bits are to be assumed by the decoder to be 1. If there are excess bits at the end of a chunk, they are ignored.

K is conceptually an array with a single 8-bit entry for each binary decision context. (In practice, K consists of a number of of individual values, arrays, and tree nodes, but each one has a specific address and a single 8-bit value at any time.) This array is indexed by the value of i, which is the input to the decoder. $K [i]$ is the current value of the probability state index for context i. $K [i]$ may be updated as part of the decoding process.

In pass-through mode, the decoder is invoked with no input argument. No context is involved.

B is the 1-bit value returned by the decoder.

The Z′-Coder is state-based. Decoding is governed by four fixed tables, given in Table 9. The tables are indexed by $K [i]$ , the probability state index for the current context. All probability state indices are initialized to 0. That is, at the beginning of coding, for all i, $K [i] = 0$ . These values are not reinitialized at the beginning of chunks after the first.

The more probably symbol is denoted by MPS. The MPS is 1 if the probability state index is an odd integer, and 0 if the probability state index is an even integer. The less probable symbol is denoted by LPS. The LPS is 0 if the probability state index is an odd integer, and 1 if the probability state context is an even integer.

$Δ_{k}$ is the amount by which the current arithmetic coding interval is reduced if the decoded symbol is the MPS. $θ_{k}$ is the threshold above which an MPS triggers a probability state update. $μ_{k}$ is the next probability state index for context k after an MPS triggers a probability state index update. An LPS always triggers a probability state index update. $λ_{k}$ is the next probability state index for context k after an LPS.

12.2 Initialization

Initially, A is set to 0x0000. Two octets are read from the input data stream into the lowest 16 bits of C. If the bits of C are numbered such that bit 15 is the most significant bit and bit 0 is the least significant bit, then the first input octet is stored in bits 15 through 8, and the second input octet is stored in bits 7 through 0.

12.3 Decoding

Figure 1 shows the steps involved in decoding a single binary decision. The input to the decoder is the index i of the appropriate context for the binary decision being decoded. The output from the decoder is a single bit B.

Z := A + Δ_K[i]
D := 0x6000 + (Z + A)/4
if (Z > D) { Z := D }
if (C > Z) {
    B := K[i] mod 2
    if (A ≥ θ_K[i]) { K[i] = μ_K[i] }
    A := Z
}
else {
    B := 1 - (K[i] mod 2)
    A := A + 0x10000 - Z
    C := C + 0x10000 - Z
    K[i] = λ_K[i]
}
while (A ≥ 0x8000) {
    A := A + A - 0x10000
    A := C + C - 0x10000 + next code bit
}
return B

Figure 1: decoder for the Z′-Coder

12.3.1 Notes on specific lines of Figure 1

Line 2: The division is a right shift, discarding the two least significant bits.
Lines 4–8: These lines are executed when the decoded bit is the MPS.
Line 5: This line determines the MPS from from the parity of the probability state index.
Line 6: Sometimes an MPS event triggers an update of the probability state index, based on the value of $θ_{k}$ . Note that when the probability state index $k = 0$ or $k > 83$ , $θ_{k} = 0$ , so an MPS will trigger an update of the probability state index. All probability state indices are initialized to 0, but the first coded decision for a context causes the index to become at least 83. When $k = 0$ or $k \geq 83$ , the probability estimate for the context is in its early estimation phase. When $0 < k < 83$ , the probability estimate for the context is in its steady-state phase, which it never leaves.
Lines 9–14: These lines are executed when the decoded bit is the LPS.
Line 10: This line determines the LPS from the parity of the probability state index.
Line 13: An LPS always triggers an update of the probability state index.
Lines 15–18: When the values in the registers are too large, they must be renormalized.
Lines 16–17: $A + A$ and $C + C$ may be accomplished by left shifts, leaving the least significant bit equal to 0.
Line 17: The least significant bit of C is filled with the next bit from the input stream. Bits are taken from each octet in the input stream most significant bit first.

12.4 Pass-through decoding

Figure 2 shows the steps involved in decoding a single binary decision using the Z′-Coder in pass-through mode. No input is required. No context is involved. No probability state index values are updated. The output from the decoder is the single bit B.

Z := 0x8000 + (A + A + A)/8
if (C > Z) {
    B := 0
    A := Z
}
else {
    B := 1
    A := A + 0x10000 - Z
    C := C + 0x10000 - Z
}
while (A ≥ 0x8000) {
    A := A + A - 0x10000
    A := C + C - 0x10000 + next code bit
}
return B

Figure 2: decoder for the Z′-Coder operating in pass-through mode

12.4.1 Notes on specific lines of Figure 2

Line 1: The division is a right shift, discarding the three least significant bits.
Lines 2–5: These lines are executed when the decoded bit is 0.
Lines 6–10: These lines are executed when the decoded bit is 1.
Lines 11–14: When the values in the registers are too large, they must be renormalized.
Lines 12–13: $A + A$ and $C + C$ may be accomplished by left shifts, leaving the least significant bit equal to 0.
Line 13: The least significant bit of C is filled with the next bit from the input stream. Bits are taken from each octet in the input stream most significant bit first.

13 Appendix 4: BZZ coding [top]

Numerous streams in the DjVu file format are compressed using the general-purpose compressor described here called “BZZ”. BZZ transforms the input data using the well-documented Burrows–Wheeler transform. However, the traditional “Move To Front” permutation table is augmented with a frequency estimation provided by the Z′-Coder.

13.1 Encoding

BZZ first takes as input a 24-bit integer as block size between 10K and 4M and an input stream (to be compressed). The stream is partitioned into blocks terminated with a special ⟨EOB⟩ symbol. It is then transformed using the well-documented Burrows–Wheeler (BW or “block sorting”) transform. Then, one block at a time, the block size and resulting output stream are passed as input to be compressed using the Z′-Coder (Appendix 3).

13.2 Decoding

We describe the decoding algorithm by means of pseudo-code.

13.2.1 Decoding pseudo-code

13.2.2 Notes

13.2.2.1 Overview of decoding a block

For each block, one must decode

the block size (with decode_raw)
the estimation speed FSHIFT ∈ {0, 1, 2} (two bits^[1] with the pass-through decoder)
the sequence of symbols representing the Burrows–Wheeler transform of the block. At this point, the sequence of symbols is logically encoded as a sequence of numbers representing the position of each symbol in the MTF array.

Then one must perform the inverse Burrows–Wheeler transform to recover the decoded block.

The following points are significant when recovering the BWT and are discussed below:

The MTF array is reordered after decoding each number.
The numbers themselves are arithmetically encoded.

13.2.2.2 MTF array reordering

The MTF array contains 256 bytes initialized with the identity mapping, that is MTF[0] = 0, MTF[1] = 1, … , MTF[255] = 255.

Whenever one decodes a number MTFNO, the corresponding symbol to store in the Burrows–Wheeler buffer is MTF[MTFNO] (except for the ⟨EOB⟩ symbol — see §13.2.2.3) and the contents of the MTF array are rotated. The rotation moves the symbol that was at position MTF[MTFNO] to a position M that can be 0, 1, 2, or 3. Meanwhile the symbols MTF[M] to MTF[MTFNO - 1] are moved to positions M + 1 to MTFNO.

The position M is chosen using an estimate of the frequency of the symbol MTF[MTFNO]. One strives to position the most frequent symbols at the beginning of the MTF array. To that end, one maintains an array FREQ[0..3] that contains numbers representative of the instantaneous frequencies of the symbols MTF[0..3].

Of course this array must also be “rotated” when the rotation of the MTF array affects its first four elements.

Consider the frequency $F (T)$ of a particular symbol S measured after decoding the Tth symbol. Ideally, $F (T) = λ F (T - 1) + D$ where

$0 < λ \leq 1$ . This models how quickly one forgets past information, and
$D = 1$ if the Tth symbol is S, and $D = 0$ otherwise. This allows $F (T)$ to grow each time the symbol S occurs.

To avoid multiplying all the frequencies by $λ$ , the FREQ array contains instead $G (T) = F (T) / λ^{T}$ It is then easy to see that $G (T) = G (T - 1) + D / λ^{T}$ Therefore we only need to update the G corresponding to the symbol being decoded (i.e. $D = 1$ ), since the G for the other symbols does not change.

A dedicated variable FADD contains $λ^{T}$ . Before each rotation we divide FADD by $λ^{T}$ . This is accomplished by the line

FADD = FADD + SHIFTRIGHT(FADD, FSHIFT)

The values 0, 1 or 2 of variable FSHIFT correspond to $λ = 1 / 2$ , $2 / 3$ , or $4 / 5$ . To avoid overflows we divide everything (FADD and FREQ[0..3]) by 0x10000000 whenever FADD becomes bigger than 0x10000000. This happens rarely enough to take very little time.

The $G (T)$ of the freshly decoded symbol is therefore $G (T - 1) +$ FADD. We can only compute this exactly when $S$ is one of the first four symbols of the MTF because we only store FREQ[0..3]. If the decoded number MTFNO is greater than 3, we assume that $G (T - 1) = 0$ and simply consider $G (T) =$ FADD.

The number M is then chosen to make sure the array FREQ remains sorted in decreasing order after the rotation.

13.2.2.3 Decoding the number `MTFNO`

Now we can discuss how the numbers MTFNO are stored. There are 262 arithmetic coding contexts. These are initialized to zero at the beginning of the stream decoding process. They should not be reset to zero at the beginning of the block decoding process.

Because the most frequently used symbols should appear near the front of the array, we expect small values for MTFNO (the index into the MTF array). By design, the number of bits and the number of contexts required to decode increases for larger values of MTFNO:

A first bit is decoded using context 0, 1 or 2. Context 0 or 1 is used if the previous MTFNO was 0 or 1. Otherwise context 2 is used. If this bit is set, the new MTFNO is 0.
Otherwise a second bit is decoded using context 3, 4 or 5. Context 3 or 4 is used if the previous MTFNO was 0 or 1. Otherwise context 5 is used. If this bit is set, the new MTFNO is 1.
Otherwise a third bit is decoded using context 6. If this bit is set, the new MTFNO is obtained by adding 2 to a 1-bit number decoded with decode_bin using context 7.
Otherwise a fourth bit is decoded using context 8. If this bit is set, the new MTFNO is obtained by adding 4 to a 2-bit number decoded with decode_bin using contexts 9..11.
And so forth until…
Otherwise a ninth bit is decoded using context 132. If this bit is set, the new MTFNO is obtained by adding 128 to a 7-bit number decoded with decode_bin using contexts 133..261.
Otherwise the next symbol is the ⟨EOB⟩ symbol. Since there is only one ⟨EOB⟩ symbol, we store a zero in the Burrows–Wheeler buffer and record its position in variable MARKERPOS.

13.2.2.4 Inverse Burrows–Wheeler transform

After decoding the BLOCKSIZE symbols composing the Burrows–Wheeler buffer, we need to perform the inverse Burrows–Wheeler transform to recover the BLOCKSIZE - 1 decoded bytes followed by the ⟨EOB⟩ symbol.

To start, we

copy the buffer into an array POSC[0..BLOCKSIZE - 1],
prepare an array COUNT[0..255] that counts how many occurrences of each symbol are found,
prepare an array POSN[0..BLOCKSIZE - 1] that indicates the rank of each occurrence of a symbol in the buffer.

Imagine that we are sorting the buffer in symbol order (⟨EOB⟩ being the smallest symbol). The buffer would be composed of a single ⟨EOB⟩, followed by a run of COUNT[0] symbols 0, followed by a run of COUNT[1] symbols 1, etc.

Using the COUNT array, we compute the position SORTEDPOS[0..255] of each run of symbol in this array.

To perform the inverse Burrows–Wheeler transform, it is now sufficient to follow the thread backwards:

The array DATA[0..BLOCKSIZE - 2] then contains the decoded bytes of the block.

Document Date:	November 2005
From:	Lizardtech, a Celartem Company
Status of Standard:	Released

DjVu Reference

1 Introduction [top]

2 Document organization [top]

3 Overview [top]

3.1 DjVu images

3.2 DjVu documents

3.2.1 Bundled multi-page documents

3.2.2 Indirect multi-page documents

3.3 Non-raster data

3.3.1 Annotations

3.3.2 Hidden text

3.3.3 Thumbnails

4 What’s new in the DjVu file format [top]

5 Acknowledgements [top]

6 References [top]

6.1 DjVu 2

6.2 IFF

6.3 JPEG

6.4 TIFF

6.5 G4

6.6 UTF-8

6.7 DjVuLibre

7 Component pieces (IFF chunks) of DjVu documents and images [top]

7.1 Single-page documents

7.1.1 Photo DjVu image

7.1.2 Bi-level DjVu image

7.1.3 Compound DjVu image

7.1.3.1 Foreground encoding

7.1.3.2 Background encoding

7.1.3.3 Alternative encodings

7.1.3.4 Annotations and textual information

7.2 Multi-page documents

7.2.1 Component files

7.2.2 Including shared information

8 Low-level chunk structure and definition [top]

8.1 Header

8.2 DjVu file structure

8.2.1 IFF wrapper

8.2.2 Chunk summary

8.3 IFF chunk types

8.3.1 Container chunk: FORM

8.3.1.1 FORM:DJVM

8.3.1.2 FORM:DJVU

8.3.1.3 FORM:DJVI

8.3.1.4 FORM:THUM

8.3.2 Directory chunk: DIRM

8.3.2.1 Unencoded data

8.3.2.2 BZZ-encoded data

8.3.3 Document outline chunk: NAVM

8.3.4 Annotation chunk: ANTa, ANTz

8.3.4.1 Initial document view

8.3.4.1.1 Background color

8.3.4.1.2 Initial zoom

8.3.4.1.3 Initial display level

8.3.4.1.4 Alignment

8.3.4.2 Maparea (overprinted annotations)

8.3.4.2.1 url

8.3.4.2.2 comment

8.3.4.2.3 area

8.3.5 Text chunk: TXTa, TXTz

9 DjVu in the raw (binary and IFF-level dumps) [top]

10 Appendix 1: IW44 coding [top]

10.1 Definitions

10.2 Color chunks within a DjVu file

10.3 Color chunk data headers

10.4 Color chunk data

10.4.1 Hierarchical structure of a coded color layer

10.4.1.1 Band counting

10.4.2 Quantization of coefficients

11 Appendix 2: JB2 coding [top]

11.1 General considerations

11.2 Arithmetic coding

11.2.1 Initialization of the Z′-Coder

11.2.2 The multivalue extension to the Z′-Coder for coding of numeric data

11.2.3 Record types

11.2.4 Fields and contexts

11.2.5 Coding phases

11.2.5.1 Phase 1

11.2.5.2 Phase 2

11.2.5.3 Phase 3

8.3.4.2.1 `url`

8.3.4.2.2 `comment`

8.3.4.2.3 `area`

13.2.2.3 Decoding the number `MTFNO`