844 lines
31 KiB
Plaintext
844 lines
31 KiB
Plaintext
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
Network Working Group S. Pfeiffer
|
|||
|
Request for Comments: 3533 CSIRO
|
|||
|
Category: Informational May 2003
|
|||
|
|
|||
|
|
|||
|
The Ogg Encapsulation Format Version 0
|
|||
|
|
|||
|
Status of this Memo
|
|||
|
|
|||
|
This memo provides information for the Internet community. It does
|
|||
|
not specify an Internet standard of any kind. Distribution of this
|
|||
|
memo is unlimited.
|
|||
|
|
|||
|
Copyright Notice
|
|||
|
|
|||
|
Copyright (C) The Internet Society (2003). All Rights Reserved.
|
|||
|
|
|||
|
Abstract
|
|||
|
|
|||
|
This document describes the Ogg bitstream format version 0, which is
|
|||
|
a general, freely-available encapsulation format for media streams.
|
|||
|
It is able to encapsulate any kind and number of video and audio
|
|||
|
encoding formats as well as other data streams in a single bitstream.
|
|||
|
|
|||
|
Terminology
|
|||
|
|
|||
|
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
|
|||
|
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
|
|||
|
document are to be interpreted as described in BCP 14, RFC 2119 [2].
|
|||
|
|
|||
|
Table of Contents
|
|||
|
|
|||
|
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 2
|
|||
|
2. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 2
|
|||
|
3. Requirements for a generic encapsulation format . . . . . . . 3
|
|||
|
4. The Ogg bitstream format . . . . . . . . . . . . . . . . . . . 3
|
|||
|
5. The encapsulation process . . . . . . . . . . . . . . . . . . 6
|
|||
|
6. The Ogg page format . . . . . . . . . . . . . . . . . . . . . 9
|
|||
|
7. Security Considerations . . . . . . . . . . . . . . . . . . . 11
|
|||
|
8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 12
|
|||
|
A. Glossary of terms and abbreviations . . . . . . . . . . . . . 13
|
|||
|
B. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 14
|
|||
|
Author's Address . . . . . . . . . . . . . . . . . . . . . . . 14
|
|||
|
Full Copyright Statement . . . . . . . . . . . . . . . . . . . 15
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
Pfeiffer Informational [Page 1]
|
|||
|
|
|||
|
RFC 3533 OGG May 2003
|
|||
|
|
|||
|
|
|||
|
1. Introduction
|
|||
|
|
|||
|
The Ogg bitstream format has been developed as a part of a larger
|
|||
|
project aimed at creating a set of components for the coding and
|
|||
|
decoding of multimedia content (codecs) which are to be freely
|
|||
|
available and freely re-implementable, both in software and in
|
|||
|
hardware for the computing community at large, including the Internet
|
|||
|
community. It is the intention of the Ogg developers represented by
|
|||
|
Xiph.Org that it be usable without intellectual property concerns.
|
|||
|
|
|||
|
This document describes the Ogg bitstream format and how to use it to
|
|||
|
encapsulate one or several media bitstreams created by one or several
|
|||
|
encoders. The Ogg transport bitstream is designed to provide
|
|||
|
framing, error protection and seeking structure for higher-level
|
|||
|
codec streams that consist of raw, unencapsulated data packets, such
|
|||
|
as the Vorbis audio codec or the upcoming Tarkin and Theora video
|
|||
|
codecs. It is capable of interleaving different binary media and
|
|||
|
other time-continuous data streams that are prepared by an encoder as
|
|||
|
a sequence of data packets. Ogg provides enough information to
|
|||
|
properly separate data back into such encoder created data packets at
|
|||
|
the original packet boundaries without relying on decoding to find
|
|||
|
packet boundaries.
|
|||
|
|
|||
|
Please note that the MIME type application/ogg has been registered
|
|||
|
with the IANA [1].
|
|||
|
|
|||
|
2. Definitions
|
|||
|
|
|||
|
For describing the Ogg encapsulation process, a set of terms will be
|
|||
|
used whose meaning needs to be well understood. Therefore, some of
|
|||
|
the most fundamental terms are defined now before we start with the
|
|||
|
description of the requirements for a generic media stream
|
|||
|
encapsulation format, the process of encapsulation, and the concrete
|
|||
|
format of the Ogg bitstream. See the Appendix for a more complete
|
|||
|
glossary.
|
|||
|
|
|||
|
The result of an Ogg encapsulation is called the "Physical (Ogg)
|
|||
|
Bitstream". It encapsulates one or several encoder-created
|
|||
|
bitstreams, which are called "Logical Bitstreams". A logical
|
|||
|
bitstream, provided to the Ogg encapsulation process, has a
|
|||
|
structure, i.e., it is split up into a sequence of so-called
|
|||
|
"Packets". The packets are created by the encoder of that logical
|
|||
|
bitstream and represent meaningful entities for that encoder only
|
|||
|
(e.g., an uncompressed stream may use video frames as packets). They
|
|||
|
do not contain boundary information - strung together they appear to
|
|||
|
be streams of random bytes with no landmarks.
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
Pfeiffer Informational [Page 2]
|
|||
|
|
|||
|
RFC 3533 OGG May 2003
|
|||
|
|
|||
|
|
|||
|
Please note that the term "packet" is not used in this document to
|
|||
|
signify entities for transport over a network.
|
|||
|
|
|||
|
3. Requirements for a generic encapsulation format
|
|||
|
|
|||
|
The design idea behind Ogg was to provide a generic, linear media
|
|||
|
transport format to enable both file-based storage and stream-based
|
|||
|
transmission of one or several interleaved media streams independent
|
|||
|
of the encoding format of the media data. Such an encapsulation
|
|||
|
format needs to provide:
|
|||
|
|
|||
|
o framing for logical bitstreams.
|
|||
|
|
|||
|
o interleaving of different logical bitstreams.
|
|||
|
|
|||
|
o detection of corruption.
|
|||
|
|
|||
|
o recapture after a parsing error.
|
|||
|
|
|||
|
o position landmarks for direct random access of arbitrary positions
|
|||
|
in the bitstream.
|
|||
|
|
|||
|
o streaming capability (i.e., no seeking is needed to build a 100%
|
|||
|
complete bitstream).
|
|||
|
|
|||
|
o small overhead (i.e., use no more than approximately 1-2% of
|
|||
|
bitstream bandwidth for packet boundary marking, high-level
|
|||
|
framing, sync and seeking).
|
|||
|
|
|||
|
o simplicity to enable fast parsing.
|
|||
|
|
|||
|
o simple concatenation mechanism of several physical bitstreams.
|
|||
|
|
|||
|
All of these design considerations have been taken into consideration
|
|||
|
for Ogg. Ogg supports framing and interleaving of logical
|
|||
|
bitstreams, seeking landmarks, detection of corruption, and stream
|
|||
|
resynchronisation after a parsing error with no more than
|
|||
|
approximately 1-2% overhead. It is a generic framework to perform
|
|||
|
encapsulation of time-continuous bitstreams. It does not know any
|
|||
|
specifics about the codec data that it encapsulates and is thus
|
|||
|
independent of any media codec.
|
|||
|
|
|||
|
4. The Ogg bitstream format
|
|||
|
|
|||
|
A physical Ogg bitstream consists of multiple logical bitstreams
|
|||
|
interleaved in so-called "Pages". Whole pages are taken in order
|
|||
|
from multiple logical bitstreams multiplexed at the page level. The
|
|||
|
logical bitstreams are identified by a unique serial number in the
|
|||
|
|
|||
|
|
|||
|
|
|||
|
Pfeiffer Informational [Page 3]
|
|||
|
|
|||
|
RFC 3533 OGG May 2003
|
|||
|
|
|||
|
|
|||
|
header of each page of the physical bitstream. This unique serial
|
|||
|
number is created randomly and does not have any connection to the
|
|||
|
content or encoder of the logical bitstream it represents. Pages of
|
|||
|
all logical bitstreams are concurrently interleaved, but they need
|
|||
|
not be in a regular order - they are only required to be consecutive
|
|||
|
within the logical bitstream. Ogg demultiplexing reconstructs the
|
|||
|
original logical bitstreams from the physical bitstream by taking the
|
|||
|
pages in order from the physical bitstream and redirecting them into
|
|||
|
the appropriate logical decoding entity.
|
|||
|
|
|||
|
Each Ogg page contains only one type of data as it belongs to one
|
|||
|
logical bitstream only. Pages are of variable size and have a page
|
|||
|
header containing encapsulation and error recovery information. Each
|
|||
|
logical bitstream in a physical Ogg bitstream starts with a special
|
|||
|
start page (bos=beginning of stream) and ends with a special page
|
|||
|
(eos=end of stream).
|
|||
|
|
|||
|
The bos page contains information to uniquely identify the codec type
|
|||
|
and MAY contain information to set up the decoding process. The bos
|
|||
|
page SHOULD also contain information about the encoded media - for
|
|||
|
example, for audio, it should contain the sample rate and number of
|
|||
|
channels. By convention, the first bytes of the bos page contain
|
|||
|
magic data that uniquely identifies the required codec. It is the
|
|||
|
responsibility of anyone fielding a new codec to make sure it is
|
|||
|
possible to reliably distinguish his/her codec from all other codecs
|
|||
|
in use. There is no fixed way to detect the end of the codec-
|
|||
|
identifying marker. The format of the bos page is dependent on the
|
|||
|
codec and therefore MUST be given in the encapsulation specification
|
|||
|
of that logical bitstream type. Ogg also allows but does not require
|
|||
|
secondary header packets after the bos page for logical bitstreams
|
|||
|
and these must also precede any data packets in any logical
|
|||
|
bitstream. These subsequent header packets are framed into an
|
|||
|
integral number of pages, which will not contain any data packets.
|
|||
|
So, a physical bitstream begins with the bos pages of all logical
|
|||
|
bitstreams containing one initial header packet per page, followed by
|
|||
|
the subsidiary header packets of all streams, followed by pages
|
|||
|
containing data packets.
|
|||
|
|
|||
|
The encapsulation specification for one or more logical bitstreams is
|
|||
|
called a "media mapping". An example for a media mapping is "Ogg
|
|||
|
Vorbis", which uses the Ogg framework to encapsulate Vorbis-encoded
|
|||
|
audio data for stream-based storage (such as files) and transport
|
|||
|
(such as TCP streams or pipes). Ogg Vorbis provides the name and
|
|||
|
revision of the Vorbis codec, the audio rate and the audio quality on
|
|||
|
the Ogg Vorbis bos page. It also uses two additional header pages
|
|||
|
per logical bitstream. The Ogg Vorbis bos page starts with the byte
|
|||
|
0x01, followed by "vorbis" (a total of 7 bytes of identifier).
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
Pfeiffer Informational [Page 4]
|
|||
|
|
|||
|
RFC 3533 OGG May 2003
|
|||
|
|
|||
|
|
|||
|
Ogg knows two types of multiplexing: concurrent multiplexing (so-
|
|||
|
called "Grouping") and sequential multiplexing (so-called
|
|||
|
"Chaining"). Grouping defines how to interleave several logical
|
|||
|
bitstreams page-wise in the same physical bitstream. Grouping is for
|
|||
|
example needed for interleaving a video stream with several
|
|||
|
synchronised audio tracks using different codecs in different logical
|
|||
|
bitstreams. Chaining on the other hand, is defined to provide a
|
|||
|
simple mechanism to concatenate physical Ogg bitstreams, as is often
|
|||
|
needed for streaming applications.
|
|||
|
|
|||
|
In grouping, all bos pages of all logical bitstreams MUST appear
|
|||
|
together at the beginning of the Ogg bitstream. The media mapping
|
|||
|
specifies the order of the initial pages. For example, the grouping
|
|||
|
of a specific Ogg video and Ogg audio bitstream may specify that the
|
|||
|
physical bitstream MUST begin with the bos page of the logical video
|
|||
|
bitstream, followed by the bos page of the audio bitstream. Unlike
|
|||
|
bos pages, eos pages for the logical bitstreams need not all occur
|
|||
|
contiguously. Eos pages may be 'nil' pages, that is, pages
|
|||
|
containing no content but simply a page header with position
|
|||
|
information and the eos flag set in the page header. Each grouped
|
|||
|
logical bitstream MUST have a unique serial number within the scope
|
|||
|
of the physical bitstream.
|
|||
|
|
|||
|
In chaining, complete logical bitstreams are concatenated. The
|
|||
|
bitstreams do not overlap, i.e., the eos page of a given logical
|
|||
|
bitstream is immediately followed by the bos page of the next. Each
|
|||
|
chained logical bitstream MUST have a unique serial number within the
|
|||
|
scope of the physical bitstream.
|
|||
|
|
|||
|
It is possible to consecutively chain groups of concurrently
|
|||
|
multiplexed bitstreams. The groups, when unchained, MUST stand on
|
|||
|
their own as a valid concurrently multiplexed bitstream. The
|
|||
|
following diagram shows a schematic example of such a physical
|
|||
|
bitstream that obeys all the rules of both grouped and chained
|
|||
|
multiplexed bitstreams.
|
|||
|
|
|||
|
physical bitstream with pages of
|
|||
|
different logical bitstreams grouped and chained
|
|||
|
-------------------------------------------------------------
|
|||
|
|*A*|*B*|*C*|A|A|C|B|A|B|#A#|C|...|B|C|#B#|#C#|*D*|D|...|#D#|
|
|||
|
-------------------------------------------------------------
|
|||
|
bos bos bos eos eos eos bos eos
|
|||
|
|
|||
|
In this example, there are two chained physical bitstreams, the first
|
|||
|
of which is a grouped stream of three logical bitstreams A, B, and C.
|
|||
|
The second physical bitstream is chained after the end of the grouped
|
|||
|
bitstream, which ends after the last eos page of all its grouped
|
|||
|
logical bitstreams. As can be seen, grouped bitstreams begin
|
|||
|
|
|||
|
|
|||
|
|
|||
|
Pfeiffer Informational [Page 5]
|
|||
|
|
|||
|
RFC 3533 OGG May 2003
|
|||
|
|
|||
|
|
|||
|
together - all of the bos pages MUST appear before any data pages.
|
|||
|
It can also be seen that pages of concurrently multiplexed bitstreams
|
|||
|
need not conform to a regular order. And it can be seen that a
|
|||
|
grouped bitstream can end long before the other bitstreams in the
|
|||
|
group end.
|
|||
|
|
|||
|
Ogg does not know any specifics about the codec data except that each
|
|||
|
logical bitstream belongs to a different codec, the data from the
|
|||
|
codec comes in order and has position markers (so-called "Granule
|
|||
|
positions"). Ogg does not have a concept of 'time': it only knows
|
|||
|
about sequentially increasing, unitless position markers. An
|
|||
|
application can only get temporal information through higher layers
|
|||
|
which have access to the codec APIs to assign and convert granule
|
|||
|
positions or time.
|
|||
|
|
|||
|
A specific definition of a media mapping using Ogg may put further
|
|||
|
constraints on its specific use of the Ogg bitstream format. For
|
|||
|
example, a specific media mapping may require that all the eos pages
|
|||
|
for all grouped bitstreams need to appear in direct sequence. An
|
|||
|
example for a media mapping is the specification of "Ogg Vorbis".
|
|||
|
Another example is the upcoming "Ogg Theora" specification which
|
|||
|
encapsulates Theora-encoded video data and usually comes multiplexed
|
|||
|
with a Vorbis stream for an Ogg containing synchronised audio and
|
|||
|
video. As Ogg does not specify temporal relationships between the
|
|||
|
encapsulated concurrently multiplexed bitstreams, the temporal
|
|||
|
synchronisation between the audio and video stream will be specified
|
|||
|
in this media mapping. To enable streaming, pages from various
|
|||
|
logical bitstreams will typically be interleaved in chronological
|
|||
|
order.
|
|||
|
|
|||
|
5. The encapsulation process
|
|||
|
|
|||
|
The process of multiplexing different logical bitstreams happens at
|
|||
|
the level of pages as described above. The bitstreams provided by
|
|||
|
encoders are however handed over to Ogg as so-called "Packets" with
|
|||
|
packet boundaries dependent on the encoding format. The process of
|
|||
|
encapsulating packets into pages will be described now.
|
|||
|
|
|||
|
From Ogg's perspective, packets can be of any arbitrary size. A
|
|||
|
specific media mapping will define how to group or break up packets
|
|||
|
from a specific media encoder. As Ogg pages have a maximum size of
|
|||
|
about 64 kBytes, sometimes a packet has to be distributed over
|
|||
|
several pages. To simplify that process, Ogg divides each packet
|
|||
|
into 255 byte long chunks plus a final shorter chunk. These chunks
|
|||
|
are called "Ogg Segments". They are only a logical construct and do
|
|||
|
not have a header for themselves.
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
Pfeiffer Informational [Page 6]
|
|||
|
|
|||
|
RFC 3533 OGG May 2003
|
|||
|
|
|||
|
|
|||
|
A group of contiguous segments is wrapped into a variable length page
|
|||
|
preceded by a header. A segment table in the page header tells about
|
|||
|
the "Lacing values" (sizes) of the segments included in the page. A
|
|||
|
flag in the page header tells whether a page contains a packet
|
|||
|
continued from a previous page. Note that a lacing value of 255
|
|||
|
implies that a second lacing value follows in the packet, and a value
|
|||
|
of less than 255 marks the end of the packet after that many
|
|||
|
additional bytes. A packet of 255 bytes (or a multiple of 255 bytes)
|
|||
|
is terminated by a lacing value of 0. Note also that a 'nil' (zero
|
|||
|
length) packet is not an error; it consists of nothing more than a
|
|||
|
lacing value of zero in the header.
|
|||
|
|
|||
|
The encoding is optimized for speed and the expected case of the
|
|||
|
majority of packets being between 50 and 200 bytes large. This is a
|
|||
|
design justification rather than a recommendation. This encoding
|
|||
|
both avoids imposing a maximum packet size as well as imposing
|
|||
|
minimum overhead on small packets. In contrast, e.g., simply using
|
|||
|
two bytes at the head of every packet and having a max packet size of
|
|||
|
32 kBytes would always penalize small packets (< 255 bytes, the
|
|||
|
typical case) with twice the segmentation overhead. Using the lacing
|
|||
|
values as suggested, small packets see the minimum possible byte-
|
|||
|
aligned overhead (1 byte) and large packets (>512 bytes) see a fairly
|
|||
|
constant ~0.5% overhead on encoding space.
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
Pfeiffer Informational [Page 7]
|
|||
|
|
|||
|
RFC 3533 OGG May 2003
|
|||
|
|
|||
|
|
|||
|
The following diagram shows a schematic example of a media mapping
|
|||
|
using Ogg and grouped logical bitstreams:
|
|||
|
|
|||
|
logical bitstream with packet boundaries
|
|||
|
-----------------------------------------------------------------
|
|||
|
> | packet_1 | packet_2 | packet_3 | <
|
|||
|
-----------------------------------------------------------------
|
|||
|
|
|||
|
|segmentation (logically only)
|
|||
|
v
|
|||
|
|
|||
|
packet_1 (5 segments) packet_2 (4 segs) p_3 (2 segs)
|
|||
|
------------------------------ -------------------- ------------
|
|||
|
.. |seg_1|seg_2|seg_3|seg_4|s_5 | |seg_1|seg_2|seg_3|| |seg_1|s_2 | ..
|
|||
|
------------------------------ -------------------- ------------
|
|||
|
|
|||
|
| page encapsulation
|
|||
|
v
|
|||
|
|
|||
|
page_1 (packet_1 data) page_2 (pket_1 data) page_3 (packet_2 data)
|
|||
|
------------------------ ---------------- ------------------------
|
|||
|
|H|------------------- | |H|----------- | |H|------------------- |
|
|||
|
|D||seg_1|seg_2|seg_3| | |D|seg_4|s_5 | | |D||seg_1|seg_2|seg_3| | ...
|
|||
|
|R|------------------- | |R|----------- | |R|------------------- |
|
|||
|
------------------------ ---------------- ------------------------
|
|||
|
|
|||
|
|
|
|||
|
pages of |
|
|||
|
other --------| |
|
|||
|
logical -------
|
|||
|
bitstreams | MUX |
|
|||
|
-------
|
|||
|
|
|
|||
|
v
|
|||
|
|
|||
|
page_1 page_2 page_3
|
|||
|
------ ------ ------- ----- -------
|
|||
|
... || | || | || | || | || | ...
|
|||
|
------ ------ ------- ----- -------
|
|||
|
physical Ogg bitstream
|
|||
|
|
|||
|
In this example we take a snapshot of the encapsulation process of
|
|||
|
one logical bitstream. We can see part of that bitstream's
|
|||
|
subdivision into packets as provided by the codec. The Ogg
|
|||
|
encapsulation process chops up the packets into segments. The
|
|||
|
packets in this example are rather large such that packet_1 is split
|
|||
|
into 5 segments - 4 segments with 255 bytes and a final smaller one.
|
|||
|
Packet_2 is split into 4 segments - 3 segments with 255 bytes and a
|
|||
|
|
|||
|
|
|||
|
|
|||
|
Pfeiffer Informational [Page 8]
|
|||
|
|
|||
|
RFC 3533 OGG May 2003
|
|||
|
|
|||
|
|
|||
|
final very small one - and packet_3 is split into two segments. The
|
|||
|
encapsulation process then creates pages, which are quite small in
|
|||
|
this example. Page_1 consists of the first three segments of
|
|||
|
packet_1, page_2 contains the remaining 2 segments from packet_1, and
|
|||
|
page_3 contains the first three pages of packet_2. Finally, this
|
|||
|
logical bitstream is multiplexed into a physical Ogg bitstream with
|
|||
|
pages of other logical bitstreams.
|
|||
|
|
|||
|
6. The Ogg page format
|
|||
|
|
|||
|
A physical Ogg bitstream consists of a sequence of concatenated
|
|||
|
pages. Pages are of variable size, usually 4-8 kB, maximum 65307
|
|||
|
bytes. A page header contains all the information needed to
|
|||
|
demultiplex the logical bitstreams out of the physical bitstream and
|
|||
|
to perform basic error recovery and landmarks for seeking. Each page
|
|||
|
is a self-contained entity such that the page decode mechanism can
|
|||
|
recognize, verify, and handle single pages at a time without
|
|||
|
requiring the overall bitstream.
|
|||
|
|
|||
|
The Ogg page header has the following format:
|
|||
|
|
|||
|
0 1 2 3
|
|||
|
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
|
|||
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
| capture_pattern: Magic number for page start "OggS" | 0-3
|
|||
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
| version | header_type | granule_position | 4-7
|
|||
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
| | 8-11
|
|||
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
| | bitstream_serial_number | 12-15
|
|||
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
| | page_sequence_number | 16-19
|
|||
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
| | CRC_checksum | 20-23
|
|||
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
| |page_segments | segment_table | 24-27
|
|||
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
| ... | 28-
|
|||
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|||
|
|
|||
|
The LSb (least significant bit) comes first in the Bytes. Fields
|
|||
|
with more than one byte length are encoded LSB (least significant
|
|||
|
byte) first.
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
Pfeiffer Informational [Page 9]
|
|||
|
|
|||
|
RFC 3533 OGG May 2003
|
|||
|
|
|||
|
|
|||
|
The fields in the page header have the following meaning:
|
|||
|
|
|||
|
1. capture_pattern: a 4 Byte field that signifies the beginning of a
|
|||
|
page. It contains the magic numbers:
|
|||
|
|
|||
|
0x4f 'O'
|
|||
|
|
|||
|
0x67 'g'
|
|||
|
|
|||
|
0x67 'g'
|
|||
|
|
|||
|
0x53 'S'
|
|||
|
|
|||
|
It helps a decoder to find the page boundaries and regain
|
|||
|
synchronisation after parsing a corrupted stream. Once the
|
|||
|
capture pattern is found, the decoder verifies page sync and
|
|||
|
integrity by computing and comparing the checksum.
|
|||
|
|
|||
|
2. stream_structure_version: 1 Byte signifying the version number of
|
|||
|
the Ogg file format used in this stream (this document specifies
|
|||
|
version 0).
|
|||
|
|
|||
|
3. header_type_flag: the bits in this 1 Byte field identify the
|
|||
|
specific type of this page.
|
|||
|
|
|||
|
* bit 0x01
|
|||
|
|
|||
|
set: page contains data of a packet continued from the previous
|
|||
|
page
|
|||
|
|
|||
|
unset: page contains a fresh packet
|
|||
|
|
|||
|
* bit 0x02
|
|||
|
|
|||
|
set: this is the first page of a logical bitstream (bos)
|
|||
|
|
|||
|
unset: this page is not a first page
|
|||
|
|
|||
|
* bit 0x04
|
|||
|
|
|||
|
set: this is the last page of a logical bitstream (eos)
|
|||
|
|
|||
|
unset: this page is not a last page
|
|||
|
|
|||
|
4. granule_position: an 8 Byte field containing position information.
|
|||
|
For example, for an audio stream, it MAY contain the total number
|
|||
|
of PCM samples encoded after including all frames finished on this
|
|||
|
page. For a video stream it MAY contain the total number of video
|
|||
|
|
|||
|
|
|||
|
|
|||
|
Pfeiffer Informational [Page 10]
|
|||
|
|
|||
|
RFC 3533 OGG May 2003
|
|||
|
|
|||
|
|
|||
|
frames encoded after this page. This is a hint for the decoder
|
|||
|
and gives it some timing and position information. Its meaning is
|
|||
|
dependent on the codec for that logical bitstream and specified in
|
|||
|
a specific media mapping. A special value of -1 (in two's
|
|||
|
complement) indicates that no packets finish on this page.
|
|||
|
|
|||
|
5. bitstream_serial_number: a 4 Byte field containing the unique
|
|||
|
serial number by which the logical bitstream is identified.
|
|||
|
|
|||
|
6. page_sequence_number: a 4 Byte field containing the sequence
|
|||
|
number of the page so the decoder can identify page loss. This
|
|||
|
sequence number is increasing on each logical bitstream
|
|||
|
separately.
|
|||
|
|
|||
|
7. CRC_checksum: a 4 Byte field containing a 32 bit CRC checksum of
|
|||
|
the page (including header with zero CRC field and page content).
|
|||
|
The generator polynomial is 0x04c11db7.
|
|||
|
|
|||
|
8. number_page_segments: 1 Byte giving the number of segment entries
|
|||
|
encoded in the segment table.
|
|||
|
|
|||
|
9. segment_table: number_page_segments Bytes containing the lacing
|
|||
|
values of all segments in this page. Each Byte contains one
|
|||
|
lacing value.
|
|||
|
|
|||
|
The total header size in bytes is given by:
|
|||
|
header_size = number_page_segments + 27 [Byte]
|
|||
|
|
|||
|
The total page size in Bytes is given by:
|
|||
|
page_size = header_size + sum(lacing_values: 1..number_page_segments)
|
|||
|
[Byte]
|
|||
|
|
|||
|
7. Security Considerations
|
|||
|
|
|||
|
The Ogg encapsulation format is a container format and only
|
|||
|
encapsulates content (such as Vorbis-encoded audio). It does not
|
|||
|
provide for any generic encryption or signing of itself or its
|
|||
|
contained content bitstreams. However, it encapsulates any kind of
|
|||
|
content bitstream as long as there is a codec for it, and is thus
|
|||
|
able to contain encrypted and signed content data. It is also
|
|||
|
possible to add an external security mechanism that encrypts or signs
|
|||
|
an Ogg physical bitstream and thus provides content confidentiality
|
|||
|
and authenticity.
|
|||
|
|
|||
|
As Ogg encapsulates binary data, it is possible to include executable
|
|||
|
content in an Ogg bitstream. This can be an issue with applications
|
|||
|
that are implemented using the Ogg format, especially when Ogg is
|
|||
|
used for streaming or file transfer in a networking scenario. As
|
|||
|
|
|||
|
|
|||
|
|
|||
|
Pfeiffer Informational [Page 11]
|
|||
|
|
|||
|
RFC 3533 OGG May 2003
|
|||
|
|
|||
|
|
|||
|
such, Ogg does not pose a threat there. However, an application
|
|||
|
decoding Ogg and its encapsulated content bitstreams has to ensure
|
|||
|
correct handling of manipulated bitstreams, of buffer overflows and
|
|||
|
the like.
|
|||
|
|
|||
|
8. References
|
|||
|
|
|||
|
[1] Walleij, L., "The application/ogg Media Type", RFC 3534, May
|
|||
|
2003.
|
|||
|
|
|||
|
[2] Bradner, S., "Key words for use in RFCs to Indicate Requirement
|
|||
|
Levels", BCP 14, RFC 2119, March 1997.
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
Pfeiffer Informational [Page 12]
|
|||
|
|
|||
|
RFC 3533 OGG May 2003
|
|||
|
|
|||
|
|
|||
|
Appendix A. Glossary of terms and abbreviations
|
|||
|
|
|||
|
bos page: The initial page (beginning of stream) of a logical
|
|||
|
bitstream which contains information to identify the codec type
|
|||
|
and other decoding-relevant information.
|
|||
|
|
|||
|
chaining (or sequential multiplexing): Concatenation of two or more
|
|||
|
complete physical Ogg bitstreams.
|
|||
|
|
|||
|
eos page: The final page (end of stream) of a logical bitstream.
|
|||
|
|
|||
|
granule position: An increasing position number for a specific
|
|||
|
logical bitstream stored in the page header. Its meaning is
|
|||
|
dependent on the codec for that logical bitstream and specified in
|
|||
|
a specific media mapping.
|
|||
|
|
|||
|
grouping (or concurrent multiplexing): Interleaving of pages of
|
|||
|
several logical bitstreams into one complete physical Ogg
|
|||
|
bitstream under the restriction that all bos pages of all grouped
|
|||
|
logical bitstreams MUST appear before any data pages.
|
|||
|
|
|||
|
lacing value: An entry in the segment table of a page header
|
|||
|
representing the size of the related segment.
|
|||
|
|
|||
|
logical bitstream: A sequence of bits being the result of an encoded
|
|||
|
media stream.
|
|||
|
|
|||
|
media mapping: A specific use of the Ogg encapsulation format
|
|||
|
together with a specific (set of) codec(s).
|
|||
|
|
|||
|
(Ogg) packet: A subpart of a logical bitstream that is created by the
|
|||
|
encoder for that bitstream and represents a meaningful entity for
|
|||
|
the encoder, but only a sequence of bits to the Ogg encapsulation.
|
|||
|
|
|||
|
(Ogg) page: A physical bitstream consists of a sequence of Ogg pages
|
|||
|
containing data of one logical bitstream only. It usually
|
|||
|
contains a group of contiguous segments of one packet only, but
|
|||
|
sometimes packets are too large and need to be split over several
|
|||
|
pages.
|
|||
|
|
|||
|
physical (Ogg) bitstream: The sequence of bits resulting from an Ogg
|
|||
|
encapsulation of one or several logical bitstreams. It consists
|
|||
|
of a sequence of pages from the logical bitstreams with the
|
|||
|
restriction that the pages of one logical bitstream MUST come in
|
|||
|
their correct temporal order.
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
Pfeiffer Informational [Page 13]
|
|||
|
|
|||
|
RFC 3533 OGG May 2003
|
|||
|
|
|||
|
|
|||
|
(Ogg) segment: The Ogg encapsulation process splits each packet into
|
|||
|
chunks of 255 bytes plus a last fractional chunk of less than 255
|
|||
|
bytes. These chunks are called segments.
|
|||
|
|
|||
|
Appendix B. Acknowledgements
|
|||
|
|
|||
|
The author gratefully acknowledges the work that Christopher
|
|||
|
Montgomery and the Xiph.Org foundation have done in defining the Ogg
|
|||
|
multimedia project and as part of it the open file format described
|
|||
|
in this document. The author hopes that providing this document to
|
|||
|
the Internet community will help in promoting the Ogg multimedia
|
|||
|
project at http://www.xiph.org/. Many thanks also for the many
|
|||
|
technical and typo corrections that C. Montgomery and the Ogg
|
|||
|
community provided as feedback to this RFC.
|
|||
|
|
|||
|
Author's Address
|
|||
|
|
|||
|
Silvia Pfeiffer
|
|||
|
CSIRO, Australia
|
|||
|
Locked Bag 17
|
|||
|
North Ryde, NSW 2113
|
|||
|
Australia
|
|||
|
|
|||
|
Phone: +61 2 9325 3141
|
|||
|
EMail: Silvia.Pfeiffer@csiro.au
|
|||
|
URI: http://www.cmis.csiro.au/Silvia.Pfeiffer/
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
Pfeiffer Informational [Page 14]
|
|||
|
|
|||
|
RFC 3533 OGG May 2003
|
|||
|
|
|||
|
|
|||
|
Full Copyright Statement
|
|||
|
|
|||
|
Copyright (C) The Internet Society (2003). All Rights Reserved.
|
|||
|
|
|||
|
This document and translations of it may be copied and furnished to
|
|||
|
others, and derivative works that comment on or otherwise explain it
|
|||
|
or assist in its implementation may be prepared, copied, published
|
|||
|
and distributed, in whole or in part, without restriction of any
|
|||
|
kind, provided that the above copyright notice and this paragraph are
|
|||
|
included on all such copies and derivative works. However, this
|
|||
|
document itself may not be modified in any way, such as by removing
|
|||
|
the copyright notice or references to the Internet Society or other
|
|||
|
Internet organizations, except as needed for the purpose of
|
|||
|
developing Internet standards in which case the procedures for
|
|||
|
copyrights defined in the Internet Standards process must be
|
|||
|
followed, or as required to translate it into languages other than
|
|||
|
English.
|
|||
|
|
|||
|
The limited permissions granted above are perpetual and will not be
|
|||
|
revoked by the Internet Society or its successors or assigns.
|
|||
|
|
|||
|
This document and the information contained herein is provided on an
|
|||
|
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
|
|||
|
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
|
|||
|
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
|
|||
|
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
|
|||
|
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
|
|||
|
|
|||
|
Acknowledgement
|
|||
|
|
|||
|
Funding for the RFC Editor function is currently provided by the
|
|||
|
Internet Society.
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
Pfeiffer Informational [Page 15]
|
|||
|
|