Document Date: | November 2005 |
From: | Lizardtech, a Celartem Company |
Status of Standard: | Released |
Although the Internet has given us a worldwide infrastructure on which to build the universal library, much of the world’s knowledge, history, and literature is still trapped on paper in the basements of the world’s traditional libraries. Many libraries and content owners are in the process of digitizing their collections. While many such efforts involve the painstaking process of converting paper documents to computer-friendly form, such as SGML-based formats, the high cost of such conversions limits their extent. Scanning documents and distributing the resulting images electronically is not only considerably cheaper, but also more faithful to the original document because it preserves its visual aspect.
Despite the quickly-improving speed of network connections and computers, the number of scanned document images accessible on the Web today is relatively small. There are several reasons for this.
The first reason is the relatively high cost of scanning anything else but unbound sheets in black and white. This problem is slowly going away with the appearance of fast and low-cost color scanners with sheet feeders.
The second reason is that long-established image compression standards and file formats have proved inadequate for distributing scanned documents at high resolution, particularly color documents. Not only are the file sizes and download times impractical, the decoding and rendering times are also prohibitive. A typical magazine page scanned in color at 100 dpi in JPEG would typically occupy 100 KB to 200 KB, but the text would be hardly readable: insufficient for screen viewing and totally unacceptable for printing. The same page at 300 dpi would have sufficient quality for viewing and printing, but the file size would be 300 KB to 1000 KB at best, which is impractical for remote access. Another major problem is that a fully-decoded 300 dpi color image of a letter-size page occupies 24 MB of memory and easily causes disk swapping.
The third reason is that digital documents are more than just a collection of individual page images. Pages in a scanned document have a natural serial order. Special provision must be made to ensure that flipping pages be instantaneous and effortless so as to maintain a good user experience. Even more important, most existing document formats force users to download the entire document first before displaying a chosen page. However, users often want to jump to individual pages of the document without waiting for the entire document to download. Efficient browsing requires efficient random page access, fast sequential page flipping, and quick rendering. This can be achieved with a combination of advanced compression, pre-fetching, pre-decoding, caching, and progressive rendering. DjVu decomposes each page into multiple components (text, backgrounds, images, libraries of common shapes…) that may be shared by several pages and downloaded on demand. This allows a suitably designed DjVu-viewing application to handle on-demand downloading, pre-fetching, decoding, caching, and progressive rendering of the page images.
This document describes the DjVu file format. It is written “from top down”, providing first a high-level understanding of the features and techniques used in DjVu (see §3), then a mid-level view at the IFF85 level (see §7), and finally a very detailed description of the underlying algorithms and byte-by-byte makeup of DjVu files (see §8).
This section describes the DjVu file format at a high level: how DjVu uses the Mixed Raster Content model, how images are composed into documents, and the non-raster data that such documents can also contain.
The principal imaging model used in DjVu is the “Mixed Raster Content” (MRC) model described in ITU-T Recommendation T.44, ISO/IEC 16485. In this model, an image is decomposed into foreground and background layers. To select whether a particular pixel comes from the foreground or background, a bitonal “selection” or “mask” layer is provided. These three layers are compressed separately using techniques which are optimized for each type of data. The foreground and background layers are compressed using a wavelet-based continuous-tone image compression technique known as “IW44”. The mask layer is compressed using a bitonal image compression technique that takes advantage of repetitions of nearly identical shapes on the page (such as characters) to efficiently compress text images. A DjVu image need not contain all three layers and alternative compression techniques are available for each layer. DjVu documents can be single- or multi-page. Each page consists of a DjVu image as described above (photo, bitonal, or an MRC-based composition). Such a page, by itself, is a valid DjVu document. Multi-page documents can take either of two forms: bundled or indirect. A bundled multi-page DjVu document uses a single file to represent the entire document. This single file contains all the pages as well as ancillary information (e.g. the page directory, data shared by several pages, thumbnails, etc.). Using a single-file format is very convenient for storing documents or for sending email attachments. There are problems inherent to storing multiple pages in a single file. A viewer may not be able to utilize a byte-serving mechanism such as that available in HTTP/1.1. Therefore any request for any page of such a file will necessarily result in the entire document being transmitted. Furthermore, a reasonable work pattern is to read the first few pages (perhaps a table of contents) and then navigate to a page much further into the document. However, in such a file, data for page 100 can not be viewed until after data for pages 1–99 have been downloaded. Indirect multi-page documents address these problems. Such a document is composed of several files. The main file is named the index file. You can view document using the URL of the index file, just like you do with a bundled multi-page document. However, the index file is very small. It simply contains the document directory and the URLs of secondary files containing the page data. When you view an indirect multi-page document, the viewer only needs to download the files corresponding to the pages you are viewing. Every DjVu image optionally includes several different kinds of annotations. These annotations are often used to define hyperlinks to other document pages or to arbitrary Web pages. They can also be used for other purposes such as setting the initial viewing mode of a page and defining highlighted zones. Every DjVu image optionally includes a hidden text layer that associates graphical features with the corresponding text. The hidden text layer is usually generated by running optical character recognition software. This textual information provides for indexing DjVu documents and copying/pasting text from DjVu page images. DjVu documents sometimes contain pre-computed page thumbnails. These allow a viewer to display a graphical representation of many pages by downloading a very small “thumbnail” file instead of the actual pages themselves.3.1 DjVu images
3.2 DjVu documents
3.2.1 Bundled multi-page documents
3.2.2 Indirect multi-page documents
3.3 Non-raster data
3.3.1 Annotations
3.3.2 Hidden text
3.3.3 Thumbnails
Since the last update to the file format documentation, Reference 1, the file format has been extended to include:
This work is significantly based on Reference 1 and the summary of file format changes described in the DjVuLibre project maintained by Leon Bottou and others.
The DjVu file format specification that was originally released by AT&T in 1999. http://djvuzone.org/djvu/djvu/djvuspec/001.djvu
EA IFF 85 format, Electronic Arts’ public-domain IFF standard for an Interchange File Format, released in January, 1985. http://www.dcs.ed.ac.uk/home/mxr/gfx/2d/IFF.txt
JPEG File Interchange Format, Version 1.01 (ISO DIS 10918-1, JPEG JFIF). The specification is located at http://www.w3.org/Graphics/JPEG/jfif.txt.
http://partners.adobe.com/public/developer/en/tiff/TIFF6.pdf
ITU-T (CCITT) T.6. Facsimile Coding Schemes and Coding Control Functions for Group 4 Facsimile Apparatus. https://www.itu.int/rec/T-REC-T.6-198811-I/en
All text in DjVu files is Unicode-encoded using the UTF-8 encoding. http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf
An open-source reference implementation of this file format specification is available at http://sourceforge.net/projects/djvu/. Throughout this specification, there are numerous references to source files in this implementation.
This section describes the DjVu file format at a middle level. This includes types of chunks which can go into various types of documents but not a detailed layout of the contents of those chunks.
DjVu documents are IFF85 files (see Reference 2 for details). The IFF85 structure provides a hierarchy of containers which hold various types of information in a DjVu file. The containers are called “chunks”. How the chunk is used (what it holds) can be determined by its “chunk type” or “chunk ID”. For example, the list of files contained in a multi-page document is held in the DIRM (“directory”) chunk; annotations are held in a ANTz chunk.
FORM chunks are composite (contain other chunks). Their specific use is exposed by a secondary chunk ID. For example, a single page consists of several different chunks all contained within a single FORM:DJVU chunk. A multi-page document consists of several pages (and other chunks) all contained in a FORM:DJVM chunk.
This section discusses the various kinds of DjVu documents and the corresponding chunks of which they consist.
A single-page document is composed of a single FORM:DVJU composite chunk. This composite chunk always begins with one INFO chunk describing the image size, resolution and related information (see §8.3.11). The document contains exactly one DjVu image, whose content varies as described below.
Photo DjVu image files are best used for encoding photographic images in colors or in shades of gray. The data compression model relies on the IW44 wavelet representation. This format is designed such that the IW44 decoder is able to quickly perform progressive rendering of any image segment using only a small amount of memory. One or more additional BG44 chunks contain the image data encoded with the IW44 representation. The image size specified in the INFO chunk and the image size specified in the IW44 data must be equal.
Bi-level DjVu image files are used to compress black and white images representing text and simple drawings. The JB2 data compression model uses the soft pattern-matching technique, which essentially consists of encoding each character by describing how it differs from a well-chosen already-encoded character. A Sjbz chunk contains the bi-level data encoded with the JB2 representation (see Appendix 2). The image size specified in the INFO chunk and the image size specified in the JB2 data must be equal.
Compound DjVu image files are an extremely efficient way to compress high-resolution compound document images containing both images and text, such as a page of a magazine. Compound DjVu files represent the document image using two layers. The background layer is used for encoding the pictures and the paper texture.
The foreground layer is used for encoding the text and the drawings. Additional chunks hold the components of either the foreground or the background layers.
The main component of the foreground layer is a bi-level image named the foreground mask. The pixel size of the foreground mask is equal to the size of the DjVu image. It contains a black-on-white representation of the text and the drawings. This image is encoded by a Sjbz chunk using the JB2 representation. There may also be a companion chunk Djbz containing a shape dictionary that defines bi-level shapes referenced by the Sjbz chunk.
The foreground colors can be encoded according to two models:
The background layer is a color image, the background color image, encoded by an arbitrary number of BG44 chunks containing successive IW44 refinements (see Appendix 1). The size of this image is computed by rounding up the quotient of the mask size by an integer sub-sampling factor ranging from 1 to 12. Most compound DjVu images use a background sub-sampling factor equal to 3. Smaller sub-sampling factors are adequate for images with a very rich paper texture. Larger sub-sampling factors are adequate for images containing no pictures.
There are no ordering or interleaving constraints on these chunks except that (a) the INFO chunk must appear first, and (b) the successive BG44 refinements must appear with their natural order. The chunk order simply affects the progressive rendering of DjVu images in a web browser.
Besides the JB2 and IW44 encoding schemes, the DjVu format supports alternative encdoding methods for its components.
The foreground mask may be represented by a single Smmr chunk instead of Sjbz. The Smmr chunk contains a bi-level image encoded with the Fax-G4/MMR method. Although the resulting files are typically six times larger, this capability can be useful when DjVu is used as a front-end for fax machines and scanners with embedded Fax-G4/MMR capabilities.
The background color image may be represented by a single BGjp chunk instead of several BG44 chunks. The BGjp chunk contains a JPEG-encoded color image (see JPEGDecoder.cpp). The resulting files are significantly larger and lack the progressivity of the usual DjVu files. This is useful because some scanners have embedded JPEG capabilities.
The foreground color image may be represented by a single FGjp chunk instead of a single FG44 chunk. This is useful because some scanners have embedded JPEG capabilities.
All types of DjVu images may contain annotation chunks. Annotation chunks are used to describe hyperlinks, to specify more viewer settings (page background, initial zoom, etc.), and to hold metadata information. Annotations are contained in ANTa or ANTz chunks.
All types of DjVu image files may also contain a computer-readable description of the text appearing on the page. This information is contained in either a TXTa chunk or a TXTz chunk.
A multi-page document is composed of a FORM:DJVM whose first chunk is a DIRM chunk containing the document directory. This directory lists all component files composing the given document, helps to access every component file and identifies the pages of the document.
In a bundled multi-page file, the component files are stored immediately after the DIRM chunk, within the FORM:DJVM composite chunk.
In an indirect multi-page file, the component files are stored in different files whose URLs are composed using information stored in the DIRM chunk.
A multi-page DjVu document necessarily references other FORM (composite) chunks. Specifically:
Each of these composite chunks (FORM:DJVU, FORM:THUM, FORM:DJVI) is a well-formed IFF byte stream in its own right and can be held in a separate disk file. In the context of a multi-page — either bundled or indirect — document, we refer to these composite chunks as component files.
In many cases, efficiencies can be achieved by sharing JB2 shape definitions and/or annotations across pages. To facilitate this, any DjVu image file contained in a multi-page file may contain an INCL chunk containing the ID of a shared component file. The decoder processes the chunks contained in the shared component file as if the DjVu image file contained them. All relevant pages include this shared component file. Although they appear in several pages, these shared shapes are encoded only once in the document.
A shared component file is composed of a single FORM:DJVI potentially containing any information otherwise allowed in a DjVu image file (except for the INFO chunk of course).
This section describes the DjVu file format at a low level. This includes the binary layout of the IFF85 wrapper and, of course, the layout of each contained chunk.
The first four bytes of a DjVu file are 0x41 0x54 0x26 0x54. This preamble is not part of the EA IFF85 format, but it is required in order to identify DjVu files.
An IFF file consists of a number of chunks. Each chunk is laid out in three fields:
BYTE[4] | Chunk ID. Describes the use of the chunk. The strings that identify the types of chunks used in DjVu are listed below. |
BE32 | The length of the data. |
BYTE[] | The contained data. |
A chunk whose type is not recognized by the application is to be ignored. In the IFF format, chunks may be nested: a chunk may contain other chunks as part of its data. In the DjVu format, there is only one chunk at the outermost nesting level, a FORM chunk. All other chunks appear within the FORM chunk, sequentially, with no nesting.[1]
Example:
Offset | Bytes | Interpretation |
---|---|---|
00000000 | 41 54 26 54 | "AT&T"; magic described in §8.1. |
00000004 | 46 4f 52 4d | "FORM"; chunk ID = FORM |
00000008 | 00 00 68 a6 | 0xa668 = 26970; length of this FORM chunk |
0000000B | 44 4a 56 55 | "DJVU"; first four bytes of contained data. Since this is a FORM chunk, this starts with the sub-identifier. This is a FORM:DJVU chunk, a single-page document. |
The chunks used in the DjVu file format are summarized in Table 1.
Chunk ID | Usage |
---|---|
FORM | The composite chunk. The first four bytes of the FORM chunk are a secondary identifier. Such chunks are referred to as FORM:XXXX where XXXX stands for the secondary identifier. |
FORM:DJVM | A multi-page DjVu document. Composite chunk that contains the DIRM chunk, possible shared/included chunks and subsequent FORM:DJVU chunks which make up a multi-page document. |
FORM:DJVU | A DjVu page/single-page DjVu document. Composite chunk that contains the chunks that make up a page in a DjVu document. |
FORM:DJVI | A “shared” DjVu file which is included via the INCL chunk. Shared annotations, shared shape dictionary. |
FORM:THUM | Composite chunk that contains the TH44 chunks that are the embedded thumbnails. |
DIRM | Page name information for multi-page documents. |
NAVM | Bookmark information. |
ANTa, ANTz | Annotations, including both initial view settings and overlaid hyperlinks, text boxes, etc. |
TXTa, TXTz | Unicode text and layout information. |
Djbz | Shared shape table. |
Sjbz | BZZ-compressed JB2 bitonal data used to store mask. |
FG44 | IW44 data used to store foreground. |
BG44 | IW44 data used to store background. |
TH44 | IW44 data used to store embedded thumbnail images. |
WMRM | JB2 data required to remove a watermark.[1] |
FGbz | Color JB2 data. Provides a color for each [blit or shape?] in the corresponding Sjbz chunk. |
INFO | Information about a DjVu page. |
INCL | The ID of an included FORM:DJVI chunk. |
BGjp | JPEG-encoded background. |
FGjp | JPEG-encoded foreground. |
Smmr | G4-encoded mask. |
The FORM chunk is used as a chunk container. The first four bytes of the FORM chunk are a secondary ID used to identify the contained chunks.
As discussed in §7.2, a multi-page DjVu document is composed of a single (composite) FORM:DJVM chunk. The first nested chunk is always a DIRM chunk containing the document directory (see DjVmDir.h), which represents the list of the component files that make up the document. An optional NAVM chunk, which describes the outline of the document, may follow the DIRM chunk.
Example:
FORM:DJVM [126475 bytes] DIRM [59 bytes] Document directory (bundled, 3 files, 2 pages) FORM:DJVI [3493 bytes] dict0002.iff FORM:DJVU [115016 bytes] p0001.djvu FORM:DJVU [7869 bytes] p0002.djvu
As discussed in §7.1, a single page in a DjVu document is contained in a single (composite) FORM:DJVU chunk. The nested first chunk must be the INFO chunk. The chunks after the INFO chunk may occur in any order, although the order of the BG44 chunks, if there is more than one, is significant.
Example:
FORM:DJVU [26790 bytes] INFO [10 bytes] 2202 × 967, version 26, 300 dpi, gamma = 2.2 Sjbz [13133 bytes] JB2 bi-level data FG44 [185 bytes] IW44 data #1, 76 slices, version 1.2 (color), 184 × 81 BG44 [935 bytes] IW44 data #1, 74 slices, version 1.2 (color), 734 × 323 BG44 [1672 bytes] IW44 data #2, 10 slices BG44 [815 bytes] IW44 data #3, 4 slices BG44 [9976 bytes] IW44 data #4, 9 slices
Multi-page DjVu documents can share information between pages by nesting a chunk inside a FORM:DJVI chunk (which is itself held inside the FORM:DJVM chunk) and referencing the contained chunk from within a page. Individual pages reference the shared chunks via the INCL chunk.
Example:
FORM:DJVM [126475 bytes] DIRM [59 bytes] Document directory (bundled, 3 files, 2 pages) FORM:DJVI [3493 bytes] dict0002.iff Djbz [3481 bytes] JB2 shared dictionary FORM:DJVU [115016 bytes] p0001.djvu INFO [10 bytes] 2539 × 3295, version 25, 300 dpi, gamma = 2.2 INCL [12 bytes] Indirection chunk → dict0002.iff Sjbz [70497 bytes] JB2 bi-level data
Pre-rendered thumbnails may be included. This allows very large documents to render thumbnails of pages without downloading and decoding them. FORM:THUM chunks contain several TH44 chunks. Each of these chunks contains the thumbnails of the pages that follow.
Example:
FORM:DJVM [2272012 bytes] DIRM [108 bytes] Document directory (bundled, 7 files, 4 pages) FORM:THUM [5960 bytes] p0001.thumb TH44 [5984 bytes] Thumbnail icon for page 1 FORM:DJVU [1413380 bytes] p0001.djvu INFO [10 bytes] 4728 × 6300, version 25, 600 dpi, gamma = 2.2 … FORM:THUM [12148 bytes] p0004.thumb TH44 [3418 bytes] Thumbnail icon for page 2 TH44 [4150 bytes] Thumbnail icon for page 3 TH44 [4552 bytes] Thumbnail icon for page 4 FORM:DJVU [777858 bytes] p0002.djvu …
As described in §7.2, a multi-page document will contain “component files” such as individual pages (FORM:DJVU) or shared annotations (FORM:DJVI).
The first contained chunk in a FORM:DJVM composite chunk is the DIRM chunk containing the document directory. It contains information the decoder will need to access the component files (see §7.2).
The first part of the DIRM chunk is unencoded:
BYTE | Bit 7 (the MSB) is the bundled flag: 1 for bundled, 0 for indirect. Bits 6 through 0 are the version, currently 1. |
BE16 | Number of component files. |
BE32[] | When the document is a bundled document (i.e. the flag bundled is set), the header above is followed by the offsets of each of the component files within the FORM:DJVM. These offsets allow for random component file access. When the document is indirect, these offsets are omitted. |
The rest of the chunk is entirely compressed with the BZZ general-purpose compressor. We describe now the data fed into (or retrieved from) the BZZ codec (see BSByteStream.cpp and Appendix 4).
BE24[] | Size of each component file. May be 0 for indirect documents. |
BYTE[] |
Flag byte for each component file.
Flag hasname is set when the name of the file is different from the file ID. Flag hastitle is set when the title of the file is different from the file ID. These flags are used to avoid encoding the same string three times. Note: in practice, the hasname and hastitle bits are poorly tested and not used. |
ZSTR[] | There are one to three zero-terminated strings per component file. The first string contains the ID of the component file. If hasname is set, then there is a second string which contains the name of the component file (in the case of an indirect file, this is the disk filename). If hastitle is set, then there is a third string which contains the name of the component (for display — for example, alternate page numberings in the Foreword or Preface). |
Examples:
Bytes | Interpretation |
---|---|
unencoded portion | |
81 | bundled document, version 1 |
00 03 | 3 files |
00 00 00 54 00 00 0e 02 00 01 cf 52 |
file offsets: 84, 3586, 118610 |
encoded portion (as passed to encoder) | |
00 0d ad 01 c1 50 00 1e c5 |
file sizes: 3501, 115024, 7877 |
00 01 01 |
flag bytes: included file, hasname = 0, hastitle = 0; page file, hasname = 0, hastitle = 0; page file, hasname = 0, hastitle = 0 |
64 69 63 74 30 30 30 32 2e 69 66 66 00 70 30 30 30 31 2e 64 6a 76 75 00 70 30 30 30 32 2e 64 6a 76 75 00 |
file IDs: dict0002.iff, p0001.djvu, p0002.djvu |
Bytes | Interpretation |
---|---|
unencoded portion | |
01 | indirect document, version 1 |
00 03 | 3 files |
encoded portion (as passed to encoder) | |
00 0d ad 01 c1 50 00 1e c5 |
file sizes: 3501, 115024, 7877 |
00 01 01 |
flag bytes: included file, hasname = 0, hastitle = 0; page file, hasname = 0, hastitle = 0; page file, hasname = 0, hastitle = 0 |
64 69 63 74 30 30 30 32 2e 69 66 66 00 70 30 30 30 31 2e 64 6a 76 75 00 70 30 30 30 32 2e 64 6a 76 75 00 |
file IDs: dict0002.iff, p0001.djvu, p0002.djvu |
The NAVM chunk contains bookmarks which describe an outline of the document. The intent is to allow content authors to create an electronic table of contents which gives users rapid access to various parts of the document.
This chunk is optional, but, if present, must immediately follow the DIRM chunk.
The entire chunk is BZZ-encoded and starts with a single field specifying the total number of bookmark records:
BE16 | The total number of bookmarks in the document. |
And then the individual bookmark records, nested as necessary:
BYTE | The number of immediate child bookmark records. |
BE24 | Size of the description text. |
STR | The description text. |
BE24 | Size of the URL text. |
STR | The URL text. This may (and typically does) use the syntax described for the URLs in the ANTa chunk (and similarly, is not URL-encoded). |
Example (as passed to the BZZ encoder):
Consider a small document outline as follows:
There is no hyperlink associated with the single root entry "Table of Contents". At a binary level, the chunk looks like this [before encoding]:
Offset | Bytes | Interpretation |
---|---|---|
0012F06C | 00 04 | 4 bookmarks |
first record | ||
0012F06E | 02 | 2 children |
0012F06F | 00 00 11 | 17 bytes of description text |
0012F072 | 54 61 62 6c 65 20 6f 66 20 43 6f 6e 74 65 6e 74 73 |
description text: "Table of Contents" |
0012F083 | 00 00 00 | 0 bytes of URL text |
second record | ||
0012F086 | 00 | 0 children |
0012F087 | 00 00 0c | 12 bytes of description text |
0012F08A | 49 9e 74 72 6f 64 75 63 74 69 6f 6e |
description text: "Introduction" |
0012F096 | 00 00 0b | 11 bytes of URL text |
0012F099 | 23 70 30 30 30 31 2e 64 6a 76 75 | URL text: "#p0001.djvu" |
third record | ||
0012F0A4 | 01 | 1 child |
0012F0A5 | 00 00 09 | 9 bytes of description text |
0012F0A8 | 44 61 74 61 73 68 65 65 74 |
description text: "Datasheet" |
0012F0B1 | 00 00 0b | 11 bytes of URL text |
0012F0B4 | 23 70 30 30 30 32 2e 64 6a 76 75 | URL text: "#p0002.djvu" |
fourth record | ||
0012F0BF | 00 | 0 children |
0012F0C0 | 00 00 16 | 22 bytes of description text |
0012F0C3 | 46 6f 72 20 4d 64 72 65 20 49 6e 66 6f 20 28 4f 6e 6c 69 6e 65 29 |
description text: "For More Info (Online)" |
0012F0D9 | 00 00 19 | 25 bytes of URL text |
0012F0DC | 68 74 74 70 3a 2f 2f 77 77 77 2e 6c 69 7a 61 72 64 74 65 63 68 2e 63 6f 6d |
URL text: "http://www.lizardtech.com" |
Annotations are contained in ANTa or ANTz chunks. The ANTa chunks contain the annotation in plain text. The ANTz chunks contain the same information compressed with the BZZ encoder (see BSByteStream.h).
The use of the ANTa chunk is discouraged.
Pages can share annotations using an INCL chunk as explained in §7.2.2. The complete annotation text is obtained by concatenating all annotation chunks present in the page. A restriction of the current reference library implementation limits the number of shared annotation files to one.[1]
The syntax of the annotation text uses a simple parenthesized notation. All text is standard UTF-8.
Text is contained in TXTa or TXTz chunks. The TXTa chunks contain the text unencoded. The TXTz chunks contain the same information compressed with the BZZ encoder (see BSByteStream.h).
The use of the TXTa chunk is discouraged.
The chunk begins with the UTF-8-encoded text of the page:
BE24 | size of the text string in bytes |
STR | UTF-8-encoded string |
BYTE | version (currently 1) |
This section describes the coding of chunks of type BG44, FG44, PM44, BM44, and TH44. Chunks of type BG44, FG44, and TH44 may be color or grayscale chunks. Chunks of type PM44 are color chunks. Chunks of type BM44 are grayscale chunks. All of these color and grayscale chunk types have the same structure. The chunk consists of a chunk header followed by arithmetically-coded wavelet coefficient updates. The coefficients are organized in a hierarchical fashion.
There may be more than one BG44, PM44, or BM44 chunk in a DjVu file. If there is more than one such color chunk, the coefficient updating is continuous across the chunks, and the data is taken from the chunks in the order in which they appear in the file. Nothing is reinitialized at the beginning of chunks after the first color chunk of these types, except for the low-level arithmetic coder. The probability estimates for the arithmetic coder are not reinitialized.
In a compound DjVu image file, in which both an FG44 chunk and one or more BG44 chunks appear in the same file, the coding of the foreground layer, using the FG44 chunk, is independent of the coding of the background layer, using the BG44 chunks.
Each color layer is coded using a Dubuc–Deslauriers–Lemire (4, 4) interpolative wavelet transform. Each layer of the image is transformed into a set of wavelet coefficients, one wavelet coefficient for each pixel in the original image. This transform is especially effective for coding images at high compression ratios.
The value of each coefficient is coded in a distributed fashion, through a number of cycles. Within one cycle, each coefficient is updated once (that is, in only one of the 10 bands), and receives approximately one additional bit of information. Specifically, from cycle to cycle the absolute value of a coefficient is first narrowed down by eliminating possible values for the most significant non-zero bit until the correct most significant non-zero bit is found. The sign is coded in the same cycle in which the most significant non-zero bit is found. Then in each subsequent cycle, one additional bit of the value is coded.
A color chunk begins with a data header consisting of 2 or 9 octets, as follows:
BYTE | Serial number | The serial number of the first chunk of a given chunk type is 0. Successive chunks are assigned consecutive serial numbers. |
BYTE | Number of slices | The number of slices coded in the chunk. |
BYTE | Major version number and color type | One octet containing two values, present only if the serial number is 0. The least significant seven bits designate the major version number of the standard being implemented by the decoder. For this version of the standard, the major version number is 1. The most significant bit is the color type bit. The color type bit is 0 if the chunk describes three color components. The color type bit is 1 if the chunk describes one color component. |
BYTE | Minor version number | A one-octet unsigned integer, present only if the serial number is 0. This octet designates the minor version number of the standard being implemented by the decoder. For this version of the standard, the minor version number is 0. |
BE16 | Image width | A two-octet unsigned integer, most significant octet first, present only if the serial number is 0. This field indicates the number of pixels in each row of the image described by the current chunk. The image width will be less than the width of the original image if the chunk describes a layer coded at lower resolution than the original image. For a BG44 or FG44 chunk, if W is the width of the original image specified in the INFO chunk, and w is the width of the image described by the current chunk, then the allowable values of w are , , , , , , , , , , , and . For a BM44 or PM44 chunk, there are no restrictions on the image width. |
BE16 | Image height | A two-octet unsigned integer, most significant octet first, present only if the serial number is 0. This field indicates the number of pixels in each column of the image described by the current chunk. The image height will be less than the height of the original image if the chunk describes a layer coded at lower resolution than the original image. For a BG44 or FG44 chunk, if H is the height of the original image specified in the INFO chunk, and h is the height of the image described by the current chunk, then the allowable values of h are , , , , , , , , , , , and . For a BM44 or PM44 chunk, there are no restrictions on the image height. |
BYTE | Initial value of chrominance delay counter | A one-octet unsigned integer, present only if the serial number is 0. Only the least significant seven bits are used. The most significant bit is ignored, but should be set to 1 by an encoder. This field specifies the initial value of the chrominance delay counter, used as described below. |
The data coded in a color chunk consists of information needed to reconstruct wavelet coefficients. There are one or three color components; each color component has its own set of wavelet coefficients. Within a color component, there are 1024 wavelet coefficients for each 32 × 32 block of the image.
Within one layer (background or foreground for DjVu image, or the only layer for an IW44 image), coding is divided into a series of slices. All the slices may be coded in one chunk, or they may be separated into a number of chunks. The only difference it makes whether the slices are coded in one chunk or in several chunks is in the order of progressive rendering; the final reconstructed image will be the same. The number of slices in each chunk is specified in the color chunk data header. One slice contains refinement data for one color band for each color component. Within a color component, all coefficients in a slice are in the same band.
A color chunk describes the full image at the spatial resolution implied by the image width and image height fields in the data header of the first chunk of the same type as the current color chunk.
The sequence of color components within a slice is: first Y, then Cb, then Cr, although the Cb and Cr components are not present in a slice if the chunk describes grayscale data or if the chrominance delay counter is not equal to 0 at the time the slice is coded.
A color band is made of up coefficient updates for all blocks in the image, but only for coefficients that are in the currently active band for the color component. Each block’s set of updates within a color band is called a block band. The block bands are coded block by block, first from left to right within the bottom row, then by rows moving up the image, left to right within each row.
Within a block band, there are 16, 64, or 256 coefficient updates. The coefficients being updated are divided into buckets, each bucket containing 16 coefficients. Thus, a block band contains 1, 4, or 16 buckets. The buckets and coefficients being updated are determined by the color band number according to Table 2.
The header of the first color chunk contains an initial value for the chrominance delay counter. It may be 0 or a positive integer.
At the beginning of the first color chunk, the color band number for each of the three color components is set to 0.
At the beginning of each slice, the chrominance delay counter is tested. If the chrominance delay counter is 0, and if the slice describes color image data, then all three color components are present. If the chrominance delay counter is greater than 0, or if the chunk describes grayscale image data, only the Y color component is present for the slice.
At the end of a slice, the following actions take place:
A color chunk ends when the number of slices specified in the color chunk header have been coded. At the beginning of each color chunk after the first for a given color layer, the chrominance delay counter and color band numbers retain the values they had at the end of the previous color chunk.
At each point during the decoding process, each wavelet coefficient has been determined to a certain precision. The current value a of the coefficient is stored, and a current step size S is associated with the coefficient. The current step size for each coefficient is governed by a step size table. The index of the entry in the step size table that contains the step size for a given coefficient that contains the step size for a given coefficient is given in Table 3.
Selection layer coding is used in compound DjVu images. In such images, there are three layers. The foreground layer is coded in one FG44 chunk, and is rendered as described in Appendix 1. The background layer is coded in one or more BG44 chunks, and is rendered as described in Appendix 1. The selection layer is coded using one Sjbz chunk. Black pixels in the selection layer specify those pixels that are to be rendered using the foreground color. All other pixels are to be rendered using the background color.
Black-and-white coding is used in bi-level DjVu images. In such images, there are three layers. The foreground layer is black. The background layer is white. The selection layer is coded using one Sjbz chunk. The selection layer specifies those pixels that are to be rendered in black. All other pixels are to be rendered in white.
An Sjbz chunk contains a single arithmetically-coded data stream, coded using the Z′-Coder (Appendix 3). All data, including headers and record types, is coded in this arithmetically-coded stream.
The arithmetically-coded data in an Sjbz chunk consists logically of records. The record types are listed in Table 6, and described in §11.4. The records consist of fields. The fields present for records of each record type are listed in Table 6. The fields within a record are coded in the order listed in Table 6 for records of that type. Details of the coding for each field appear in §11.5.
A field may contain one or more data elements. The data elements consist of flags, pixel colors, and integers. Because of the nature of arithmetic coding, the records, fields, and data elements are not of fixed sizes, and do not necessarily begin on bit boundaries within the data stream.
Flags are binary decisions, each coded using the Z′-Coder with a particular context. There are two different contexts for flags, the eventual image refinement context and the offset type context.
Pixel colors are binary decisions, coded using the Z′-Coder with a particular context. For pixel colors, there are 3072 different contexts. There are 1024 contexts used for direct coding of bitmaps; these correspond to the different combinations of values that the pixels in the direct coding template can assume. There are 2048 contexts used for refinement coding of bitmaps; these correspond to the different combinations of values that the pixels in the refinement coding template can assume.
Integers are coded using the multivalue extension to the Z′-Coder, described below. There are 15 contexts for coding multivalued integers, as described in Table 7.
All Z′-Coder contexts are initialized to the value 0. This applies both to contexts used to encode single-bit values, including pixel colors, and to contexts that are part of an integer context used by the multivalue extension to the Z′-Coder.
Quantities that can take on multiple values are coded as integers using the multivalue extension to the Z′-Coder. This extension of the Z′-Coder allows all data in the bit stream to be coded using the same coder, the Z′-Coder. There are 15 integer contexts, specified in Table 7. A single integer context includes a number of binary contexts.
One integer context consists of a binary decision tree. See Figure 1 for an example of part of such a tree. The root node of the tree corresponds to the decision about the sign of the number n being decoded. Each of the two sub-trees under the root corresponds to a set of decisions that eventually identify a range in which n lies. The sub-trees under the nodes corresponding to identified ranges are complete binary trees that identify the exact value of n.
Each node of the binary decision tree for an integer context maintains its own binary probability estimation context for the Z′-Coder. The trees for different integer contexts are completely independent. Thus each node of a tree contains probability information conditioned on a conditioning context. The conditioning context consists of both the type of value being coded (i.e., the selection of the integer context), and of the values of the decisions coded so far when encoding the current integer.
Record type coded value | Record type | Fields coded |
---|---|---|
0 | Start of image |
|
1 | New symbol, add to image and library |
|
2 | New symbol, add to library only |
|
3 | New symbol, add to image only |
|
4 | Matched symbol with refinement, add to image and library |
|
5 | Matched symbol with refinement, add to library only |
|
6 | Matched symbol with refinement, add to image only |
|
7 | Matched symbol, copy to image without refinement |
|
8 | Non-symbol data |
|
9 | Shared dictionary or reset |
|
10 | Comment |
|
11 | End of data |
|
Context name | Integer data coded using this context |
---|---|
Record type | Record type |
Image size | Image height; image width |
Matching symbol index | Index within the symbol library of the symbol matching the current symbol |
Symbol width | Width of the current symbol in pixels |
Symbol height | Height of the current symbol in pixels |
Symbol width difference | Number of pixels that must be added to the width of the matching symbol to obtain the width of the current symbol |
Symbol height difference | Number of pixels that must be added to the height of the matching symbol to obtain the height of the current symbol |
Symbol column number | Column number of the absolute location of the left edge of the current symbol (leftmost column of the image is column number 1) |
Symbol row number | Row number of the absolute location of the top edge of the current symbol (bottom row of the image is row number 1) |
Same line column offset | Number of pixels that must be added to the column number of the right edge of the previous symbol on the current text line to obtain the column number of the left edge of the current symbol |
Same line row offset | Number of pixels that must be added to the row number of the current baseline on the current text line to obtain the row number of the bottom edge of the current symbol |
New line column offset | Number of pixels that must be added to the column number of the left edge of the first symbol on the current text line to obtain the column number of the left edge of the current symbol |
New line row offset | Number of pixels that must be added to the row number of the bottom edge of the first symbol on the current text line to obtain the row number of the top edge of the current symbol |
Comment length | The number of octets in the current comment |
Comment octet | One octet in the current comment |
Dictionary size | Number of shapes in the shared dictionary |
The value of an integer is coded by making a sequence of binary decisions, each one narrowing the set of values that the integer can possibly take. The decisions are based on traversing a binary decision tree to one of its leaves. Note: although the tree conceptually has a large number of nodes, it is possible in an implementation to allocate memory only for those nodes actually traversed. Decoding proceeds in four phases.
Phase 1 determines the sign of n. A value of 0 returned by the Z′-Coder means that . A value of 1 returned by the Z′-Coder means that .
Phase 2 determines a range of possible values for v. The Z′-Coder is invoked repeatedly to answer the question “Is the value of v in the range being tested?” The sequence of ranges tested is given in Table 8. A value of 0 returned by the Z′-Coder means that v is not in the specified range, and the next range in the sequence must be tested. A value of 1 returned by the Z′-Coder means that v is in the specified range, and decoding is to proceed to Phase 3.
0 |
1–2 |
3–6 |
7–14 |
15–30 |
31–62 |
63–126 |
127–254 |
255–510 |
511–1022 |
1023–2046 |
2047–4094 |
4095–8190 |
8191–16382 |
16383–32766 |
32767–65534 |
65535–131070 |
131071–262142 |
Phase 3 consists of determining the exact value of v within the range determined in Phase 2. If Phase 2 determined that , then Phase 3 is skipped. Otherwise, since the size of the range is a power of 2, the corresponding sub-tree is a complete binary tree. The sequence of coding decisions is based directly on traversing the binary tree. At each node, 0 returned by the Z′-Coder means left branch (smaller values of v) and 1 means right branch (larger values of v. The bits returned by the Z′-Coder during Phase 3 are the bits of v, most significant bit first.
In Phase 4, the unsigned value v is converted to n, the signed value to be returned, as follows:
In any of the phases, if the input values of l and h (the range of allowable values) predetermine any decision, then the coding for that decision is not performed; the predetermined decision is assumed.
Each type of integer has its own set of binary contexts. Thus the probability information will reflect the underlying probability distribution of the particular type of integer. The Z′-Coder probability state indices of all the binary nodes are initialized to 0.
Records in an Sjbz chunk are interpreted in the order in which they appear. A "Start of data" record specifies the dimensions of the image. An "Image refinement data" record indicates […][1]. An "End of data" record indicates the end of the Sjbz chunk. A "Comment" record contains uninterpreted data.
A record identified by any other record type describes one bitmap. The model used in DjVu for the selection layer is based on symbol-based coding. Bitmaps are placed into the reconstructed image as follows: The image is initially entirely white. When a bitmap is placed into the image, the pixels that are black in the current symbol become black at the appropriate position in the reconstructed image. Once a pixel in the reconstructed image becomes black, it remains black.
Because symbols in the document images are often similar to each other, it is often possible to obtain more efficient coding by making use of previously coded symbols. As symbols are decoded, their bitmaps may be placed into a symbol bitmap library. There is exactly one symbol bitmap library. Once a symbol has been placed into the symbol bitmap library, later records may cause copies of the symbol to be placed into the image, or may define a new bitmap by refining the bitmap in the library.
Depending on the record type, the symbol bitmap may be described by direct coding, by refinement coding, or by a copy operation. In direct coding, all pixels of the bitmap are coded directly, without reference to any other bitmap. In refinement coding, all pixels of the bitmap are also coded directly, but a bitmap in the library is used to make the coding more efficient. In a copy operation, the pixels of the bitmap are the same as the pixels of a bitmap in the library.
Depending on the record type, the bitmap may or may not be placed into the image. If the bitmap is placed into the image, then depending on the record type, it may be placed either at an absolute location or at a location relative to a previously placed bitmap.
Depending on the record type, the bitmap may or may not be placed into the symbol bitmap library. The first symbol placed into the library has index 0. Subsequent symbols are assigned consecutive integer indices.
The pixels of the reconstructed image are arranged in a rectangular coordinate system. For the pixel in the lower left corner of the image, the column number is 1 and the row number is 1. All coordinates refer to the pixels themselves, not to the edges between pixels.
Records in Sjbz chunks have the following interpretations.
This record is the first record in an Sjbz chunk. It specifies the dimensions of the image.
This record specifies the bitmap of a symbol that is coded directly and placed into the reconstructed image and into the symbol bitmap library.
This record specifies the bitmap of a symbol that is coded directly and placed into the symbol bitmap library but not into the image.
This record specifies the bitmap of a symbol that is coded directly and placed into the reconstructed image but not into the symbol bitmap library.
This record specifies the bitmap of a symbol that is coded by refinement of a symbol in the symbol bitmap library and placed into the reconstructed image and into the symbol bitmap library.
This record specifies the bitmap of a symbol that is coded by refinement of a symbol in the symbol bitmap library and placed into the symbol bitmap library, but not into the reconstructed image.
This record specifies the bitmap of a symbol that is coded by refinement of a symbol in the symbol bitmap library and placed into the reconstructed image, but not into the symbol bitmap library.
This record specifies the location at which the bitmap of a symbol in the symbol bitmap library is to be placed into the reconstructed image.
This record specifies a direct-coded bitmap to be placed at an absolute location in the reconstructed image. A bitmap of non-symbol data is not placed into the symbol bitmap library.
This record is overloaded and its meaning depends on its context. If the record occurs before a "Start of data" record, then it is a "Required dictionary" record. If the record occurs after a "Start of data" record then it is a "Reset" record.
Starting with version 21, the JB2 format provides support for sharing symbol definitions between the pages of a document. To achieve this objective, the JB2 image data chunk must be able to address symbols defined elsewhere by a JB2 dictionary data chunk shared by all the pages of a document.
A "Shared dictionary or reset" record can appear before the "Start of data" record. The record type field is followed by a single number arithmetically encoded using the sixteenth dictionary size context. This record appears when the JB2 data chunk requires symbols encoded in a separate JB2 dictionary data chunk. The number (the dictionary size) indicates how many symbols should have been defined by the JB2 dictionary data chunk. The decoder should simply load these symbols into the symbol library and proceed as usual. New symbols potentially defined by the subsequent JB2 image data records will therefore be numbered with integers greater than or equal to the dictionary size.
The encoding of numbers potentially uses an unbounded number of binary coding contexts. These contexts are normally allocated when they are used for the first time (see ).
Starting with version 21, a "Shared dictionary or reset" record can appear after the "Start of data" record. The decoder should proceed with the next record after clearing all binary contexts used for coding numbers. This operation implies that all binary contexts previously allocated for coding numbers can be deallocated.
Starting with version 21, the JB2 encoder should insert a "Shared dictionary or reset" record whenever the number of these allocated binary contexts exceeds 20000. Only very large documents ever reach such a large number of allocated binary contexts (e.g. large maps). Hardware implementation however can benefit greatly from a hard bound on the total number of binary coding contexts. Old JB2 decoders will treat this record type as an "End of data" record and cleanly stop decoding (see ).
The shared JB2 dictionary data format is a pure subset of the JB2 image data format:
Note that each shared dictionary can itself include another shared dictionary.
The JB2 dictionary data is usually located in an Djbz chunk. Each page may directly contain a Djbz chunk, or may indirectly point to such a chunk using an INCL chunk (see ).
This record contains data whose interpretation is not specified by the standard.
This record is the last record of an Sjbz chunk.
The following fields are coded in records of the types specified in Table 6 and in §11.4.
The record type is coded by the multivalue extension to the Z′-Coder using the record type context. The range of allowable record types is from 0 to 11. The coded values are specified in the first column of Table 6.
The width and height of the image are coded by the multivalue extension to the Z′-Coder using the image size context. The width is coded first, then the height. The range of allowable values is from 0 to 262142. The width and height of a compound DjVu image or bi-level DjVu image must be the same as the width and height of the image specified in the INFO chunk.
The eventual image refinement flag is coded once, in the "Start of image" record, to notify the decoder whether image refinement data will eventually be provided. It is a binary value, coded by the Z′-Coder using the eventual image refinement context. The coded value 1 means TRUE
and the coded value 0 means FALSE
. Note: this flag is always FALSE
in the current version of the standard, but it may be TRUE
in later versions.
The index of a matching symbol in the bitmap library is coded with the multivalue extension to the Z′-Coder, using the matching symbol index context. The range of allowable values is from 0 to one less than the number of symbols currently in the bitmap library.
The width of a symbol is coded by the multivalue extension to the Z′-Coder using the symbol width context. Then the height of the symbol is coded by the multivalue extension to the Z′-Coder using the symbol height context. The range of allowable values for both of these data elements is from 0 to 262142.
The signed differences between the width and height of the current symbol and the width and heigh of the matching symbol are coded by the multivalue extension to the Z′-Coder using the symbol width different context and the symbol height difference context for the height. The width difference is coded first, then the height difference. The coded signed difference is added to the width or height of the matching symbol to obtain the width or height of the current symbol. The range of allowable values for both of these data elements is from -262143 to 262142.
The horizontal and vertical positions of the upper left corner of the bitmap are coded by the multivalue extension to the Z′-Coder using the symbol column number context for the horizontal position and the symbol row number context for the vertical position. The horizontal position is coded first, then the vertical position. The range of allowable values for the horizontal position is from 1 to the width of the image in pixels. The range of allowable values for the vertical position is from 1 to the height of the image in pixels.
The offset type flag is coded by the Z′-Coder using the offset type context. It indicates the reference symbol for coding the offset of the location of the current symbol. The coded value 1 means FIRST
, which means that the location of the current symbol is being specified relative to the first symbol on the current text line. The value 0 means PREVIOUS
, which means that the location of the current symbol is being specified relative to the most recently coded symbol on the current text line.
If the offset type flag is FIRST
, then the reference symbol is the first symbol on the current text line. The horizontal offset is the signed difference between the left edge of the current symbol and the left edge of the reference symbol. It is coded with the multivalue extension to the Z′-Coder using the new line column offset context.
Non-symbol bitmaps and symbol bitmaps with no sufficiently closely matching symbol in the symbol library are coded directly. A directly-coded bitmap is coded by repeated applications of the Z′-Coder to the pixels of the bitmap left to right across the rows, starting with the top row. When one row has been coded, the next lower row is coded. Each pixel is coded by the Z′-Coder using an appropriate context based on the values of 10 previously coded pixels. A coded value of 1 means the pixel is black. A coded value of 0 means the pixel is white.
The colors of the pixels numbered 1 through 10 in Figure 2, taken collectively, form a 10-bit value. Each of these values is an index into a table of 1024 different directly-coded bitmap contexts. The pixel labeled P in Figure 2 is coded using the context indexed by the collective values of the other 10 numbered pixels in the template. Pixels outside the bounding box are considered to be white.
Some bitmaps are coded by making use of data from another bitmap; this process is called refinement coding. Matched symbols other than those to be copied are coded using refinement coding.
A bitmap coded by refinement coding is coded by repeated application of the Z′-Coder to the pixels of the bitmap, left to right across the rows. When one row has been coded, the next lower row is coded. Each pixel is coded by the Z′-Coder using an appropriate context based on the values of four previously coded pixels from the bitmap being coded and seven pixels from the matching bitmap. (The pixels numbered 1 through 4 in Figure 3 are from the current symbol; the pixels numbered 5 through 11 are from the matching symbol.) A coded value of 1 means the pixel is black. A coded value of 0 means the pixel is white. The colors of the pixels numbered 1 through 11 in Figure 3, taken collectively, form an 11-bit value. Each of these values is an index into a table of 2048 different refinement-coded bitmap contexts. The pixel labeled P in Figure 3 is coded using the context indexed by the collective values of the 11 numbered pixels in the template. Pixel 7 is in the position in the matching symbol that corresponds to the position of pixel P in the current symbol when the two symbols are aligned.
Alignment of the current bitmap and the matching bitmap proceeds as follows. For matched symbols, the current symbol and the matching symbol are aligned according to the geometric centers of their bounding rectangles. If the number of columns or rows is even, the geometric center falls between two columns or rows, respectively. In this case, the leftmost of the two central columns or the lowermost of the two central rows is considered to be the central column or row, respectively.
It is possible for the current symbol to have empty edge rows or columns. These empty rows and columns are coded, and are included in the bounding rectangle. For symbols added to the library, the symbol is added to the library after it has been placed into the image. Any empty edge rows and columns are removed before the symbol is added to the library.
The comment length is the number of octets in the comment. It is coded by the multivalue extension to the Z′-Coder using the comment length context. The range of allowable values for the comment length is from 0 to 262142.
Comment data consists of the individual octets of the comment. The number of octets in the comment is given by the comment length field. Each of the octets is coded using the multivalue extension to the Z′-Coder using the comment octet context. The range of allowable values for each octet is from 0 to 255.
The Z′-Coder is an approximate binary arithmetic coder. Decoding proceeds as follows.[1]
See also files ZPCodec.h and ZPCodec.cpp in DjVuLibre.
In Figure 1 and Figure 2, the values of variables A, C, D, and Z are stored in registers of at least 16 bits each. A and C retain their values between invocations of the Z′-Coder. Note: if register overflow can be ignored, storing variables A and C in registers of exactly 16 bits allows a simplification of lines 11, 12, 16, and 17 of Figure 1 and lines 8, 9, 12, and 13 of Figure 2.
At the beginning of a chunk, the values of A and C are reinitialized. When the decoder is decoding a chunk, it may require more bits than are present within the chunk’s data. In this case, all additional required bits are to be assumed by the decoder to be 1. If there are excess bits at the end of a chunk, they are ignored.
K is conceptually an array with a single 8-bit entry for each binary decision context. (In practice, K consists of a number of of individual values, arrays, and tree nodes, but each one has a specific address and a single 8-bit value at any time.) This array is indexed by the value of i, which is the input to the decoder. is the current value of the probability state index for context i. may be updated as part of the decoding process.
In pass-through mode, the decoder is invoked with no input argument. No context is involved.
B is the 1-bit value returned by the decoder.
The Z′-Coder is state-based. Decoding is governed by four fixed tables, given in Table 9. The tables are indexed by , the probability state index for the current context. All probability state indices are initialized to 0. That is, at the beginning of coding, for all i, . These values are not reinitialized at the beginning of chunks after the first.
The more probably symbol is denoted by MPS. The MPS is 1 if the probability state index is an odd integer, and 0 if the probability state index is an even integer. The less probable symbol is denoted by LPS. The LPS is 0 if the probability state index is an odd integer, and 1 if the probability state context is an even integer.
is the amount by which the current arithmetic coding interval is reduced if the decoded symbol is the MPS. is the threshold above which an MPS triggers a probability state update. is the next probability state index for context k after an MPS triggers a probability state index update. An LPS always triggers a probability state index update. is the next probability state index for context k after an LPS.
Initially, A is set to 0x0000. Two octets are read from the input data stream into the lowest 16 bits of C. If the bits of C are numbered such that bit 15 is the most significant bit and bit 0 is the least significant bit, then the first input octet is stored in bits 15 through 8, and the second input octet is stored in bits 7 through 0.
Figure 1 shows the steps involved in decoding a single binary decision. The input to the decoder is the index i of the appropriate context for the binary decision being decoded. The output from the decoder is a single bit B.
Figure 2 shows the steps involved in decoding a single binary decision using the Z′-Coder in pass-through mode. No input is required. No context is involved. No probability state index values are updated. The output from the decoder is the single bit B.
Numerous streams in the DjVu file format are compressed using the general-purpose compressor described here called “BZZ”. BZZ transforms the input data using the well-documented Burrows–Wheeler transform. However, the traditional “Move To Front” permutation table is augmented with a frequency estimation provided by the Z′-Coder.
See also file BSByteStream.cpp.
BZZ first takes as input a 24-bit integer as block size between 10K and 4M and an input stream (to be compressed). The stream is partitioned into blocks terminated with a special ⟨EOB⟩
symbol. It is then transformed using the well-documented Burrows–Wheeler (BW or “block sorting”) transform. Then, one block at a time, the block size and resulting output stream are passed as input to be compressed using the Z′-Coder (Appendix 3).
We describe the decoding algorithm by means of pseudo-code.
For each block, one must decode
decode_raw
)FSHIFT ∈ {0, 1, 2}
(two bits[1] with the pass-through decoder)Then one must perform the inverse Burrows–Wheeler transform to recover the decoded block.
The following points are significant when recovering the BWT and are discussed below:
The MTF array contains 256 bytes initialized with the identity mapping, that is MTF[0] = 0
, MTF[1] = 1
, … , MTF[255] = 255
.
Whenever one decodes a number MTFNO
, the corresponding symbol to store in the Burrows–Wheeler buffer is MTF[MTFNO]
(except for the ⟨EOB⟩
symbol — see §13.2.2.3) and the contents of the MTF array are rotated. The rotation moves the symbol that was at position MTF[MTFNO]
to a position M
that can be 0, 1, 2, or 3. Meanwhile the symbols MTF[M]
to MTF[MTFNO - 1]
are moved to positions M + 1
to MTFNO
.
The position M
is chosen using an estimate of the frequency of the symbol MTF[MTFNO]
. One strives to position the most frequent symbols at the beginning of the MTF array. To that end, one maintains an array FREQ[0..3]
that contains numbers representative of the instantaneous frequencies of the symbols MTF[0..3]
.
Of course this array must also be “rotated” when the rotation of the MTF array affects its first four elements.
Consider the frequency of a particular symbol S measured after decoding the Tth symbol. Ideally, where
To avoid multiplying all the frequencies by , the FREQ
array contains instead
It is then easy to see that
Therefore we only need to update the G corresponding to the symbol being decoded (i.e. ),
since the G for the other symbols does not change.
A dedicated variable FADD
contains
.
Before each rotation we divide FADD
by
.
This is accomplished by the line
FADD = FADD + SHIFTRIGHT(FADD, FSHIFT)
The values 0, 1 or 2 of variable FSHIFT
correspond to
,
, or
.
To avoid overflows we divide everything (FADD
and FREQ[0..3]
) by 0x10000000
whenever FADD
becomes bigger than 0x10000000
. This happens rarely enough to take very little time.
The
of the freshly decoded symbol is therefore
FADD
.
We can only compute this exactly when is one of the first four symbols of the MTF because we only store FREQ[0..3]
. If the decoded number MTFNO
is greater than 3, we assume that
and simply consider
FADD
.
The number M
is then chosen to make sure the array FREQ
remains sorted in decreasing order after the rotation.
MTFNO
Now we can discuss how the numbers MTFNO
are stored. There are 262 arithmetic coding contexts. These are initialized to zero at the beginning of the stream decoding process. They should not be reset to zero at the beginning of the block decoding process.
Because the most frequently used symbols should appear near the front of the array, we expect small values for MTFNO
(the index into the MTF array). By design, the number of bits and the number of contexts required to decode increases for larger values of MTFNO
:
MTFNO
was 0 or 1. Otherwise context 2 is used. If this bit is set, the new MTFNO
is 0.MTFNO
was 0 or 1. Otherwise context 5 is used. If this bit is set, the new MTFNO
is 1.MTFNO
is obtained by adding 2 to a 1-bit number decoded with decode_bin
using context 7.MTFNO
is obtained by adding 4 to a 2-bit number decoded with decode_bin
using contexts 9..11
.MTFNO
is obtained by adding 128 to a 7-bit number decoded with decode_bin
using contexts 133..261
.⟨EOB⟩
symbol. Since there is only one ⟨EOB⟩
symbol, we store a zero in the Burrows–Wheeler buffer and record its position in variable MARKERPOS
.After decoding the BLOCKSIZE
symbols composing the Burrows–Wheeler buffer, we need to perform the inverse Burrows–Wheeler transform to recover the BLOCKSIZE - 1
decoded bytes followed by the ⟨EOB⟩
symbol.
To start, we
POSC[0..BLOCKSIZE - 1]
,COUNT[0..255]
that counts how many occurrences of each symbol are found,POSN[0..BLOCKSIZE - 1]
that indicates the rank of each occurrence of a symbol in the buffer.Imagine that we are sorting the buffer in symbol order (⟨EOB⟩
being the smallest symbol). The buffer would be composed of a single ⟨EOB⟩
, followed by a run of COUNT[0]
symbols 0, followed by a run of COUNT[1]
symbols 1, etc.
Using the COUNT
array, we compute the position SORTEDPOS[0..255]
of each run of symbol in this array.
To perform the inverse Burrows–Wheeler transform, it is now sufficient to follow the thread backwards:
The array
DATA[0..BLOCKSIZE - 2]
then contains the decoded bytes of the block.