H.264/MPEG-4 AVC
H.264, or MPEG-4 Part 10, is a high-compression digital video codec standard written by the ITU-T Video Coding Experts Group (VCEG) together with the ISO/IEC Moving Picture Experts Group (MPEG) as the product of a collective partnership effort known as the Joint Video Team (JVT). The ITU-T H.264 standard and the ISO/IEC MPEG-4 Part 10 standard (formally, ISO/IEC 14496-10) are technically identical, and the technology is also known as AVC, for Advanced Video Coding. The final drafting work on the first version of the standard was completed in May 2003.
The name H.264 follows the ITU-T line of H.26x video standards, while the name AVC comes from the ISO/IEC MPEG side of the partnership project that completed the work on the standard, after earlier development in the ITU-T as a project called H.26L. The standard is commonly referred to as H.264/AVC (or AVC/H.264, H.264/MPEG-4 AVC, or MPEG-4/H.264 AVC) to emphasize the common heritage. The name H.26L, harking back to its ITU-T history, is far less common but still in use. Occasionally, the standard has also been referred to as "the JVT codec", in reference to the JVT organization that developed it. (Such partnership and multiple naming is not unprecedented: the video codec standard known as MPEG-2 also arose from a partnership between MPEG and the ITU-T, and MPEG-2 video is also known in the ITU-T community as H.262.)
The intent of the H.264/AVC project has been to create a standard capable of providing good video quality at bit rates that are substantially lower (e.g., half or less) than what previous standards would need (e.g., relative to MPEG-2, H.263, or MPEG-4 Part 2), and to do so without so much of an increase in complexity as to make the design impractical (expensive) to implement. An additional goal was to do this in a flexible way that would allow the standard to be applied to a very wide variety of applications (e.g., for both low and high bit rates, and low and high resolution video) and to work well on a very wide variety of networks and systems (e.g., for broadcast, DVD storage, RTP/IP packet networks, and ITU-T multimedia telephony systems).
The JVT recently completed the development of some extensions to the original standard that are known as the Fidelity Range Extensions (FRExt). These extensions support higher-fidelity video coding by supporting increased sample accuracy (including 10-bit and 12-bit coding) and higher-resolution color information (including sampling structures known as YUV 4:2:2 and YUV 4:4:4). Several other features are also included in the Fidelity Range Extensions project (such as adaptive switching between 4×4 and 8×8 integer transforms, encoder-specified perceptual-based quantization weighting matrices, efficient inter-picture lossless coding, support of additional color spaces, and a residual color transform). The design work on the Fidelity Range Extensions was completed in July of 2004, and the drafting was finished in September of 2004.
Since the completion of the original version of the standard in May of 2003, the JVT has also done one round of "corrigendum" errata corrections, and an additional round of such corrigendum work was recently completed and approved in the ITU-T and will soon also be finished in MPEG.
Features
H.264/AVC contains a number of new features that allow it to compress video much more effectively than older codecs and to provide more flexibility for application to a wide variety of network environments. In particular, some such key features include:
- Multi-picture motion compensation using previously-encoded pictures as references in a much more flexible way than in past standards, thus allowing up to 32 reference pictures to be used in some cases (unlike in prior standards, where the limit was typically one or, in the case of conventional "B pictures", two). This particular feature usually allows modest improvements in bit rate and quality in most scenes. But in certain types of scenes, for example scenes with rapid repetitive flashing or back-and-forth scene cuts or uncovered background areas, it allows a very significant reduction in bit rate.
- Variable block-size motion compensation (VBSMC) with block sizes as large as 16×16 and as small as 4×4, enabling very precise segmentation of moving regions.
- Six-tap filtering for derivation of half-pel luma sample predictions, in order to lessen aliasing and provide sharper images.
- Macroblock pair structure, allowing 16×16 macroblocks in field mode (vs. 16×8 in MPEG-2).
- Quarter-pixel precision for motion compensation, enabling very precise description of the displacements of moving areas. Chroma is typically sampled at half the luma resolution (see 4:2:0), so the same motion vectors give one-eighth-pixel precision on the chroma sample grid.
- Weighted prediction, allowing an encoder to specify the use of a scaling and offset when performing motion compensation, and providing a significant benefit in performance in special cases—such as fade-to-black, fade-in, and cross-fade transitions.
- An in-loop deblocking filter which helps prevent the blocking artifacts common to other DCT-based image compression techniques.
- An exact-match integer 4×4 spatial block transform (similar to the well-known DCT design), and in the case of the new FRExt "High" profiles, the ability for the encoder to adaptively select between a 4×4 and 8×8 transform block size for the integer transform operation.
- A secondary Hadamard transform performed on "DC" coefficients of the primary spatial transform (for chroma DC coefficients and also luma in one special case) to obtain even more compression in smooth regions.
- Spatial prediction from the edges of neighboring blocks for "intra" coding (rather than the "DC"-only prediction found in MPEG-2 Part 2 and the transform coefficient prediction found in H.263+ and MPEG-4 Part 2).
- Context-adaptive binary arithmetic coding (CABAC), a technique that losslessly compresses syntax elements in the video stream by using the probabilities of those syntax elements in a given context.
- Context-adaptive variable-length coding (CAVLC), which is a lower-complexity alternative to CABAC for the coding of quantized transform coefficient values. Although lower complexity than CABAC, CAVLC is more elaborate and more efficient than the methods typically used to code coefficients in other prior designs.
- A common, simple, and highly structured variable length coding (VLC) technique for many of the syntax elements not coded by CABAC or CAVLC, referred to as Exponential-Golomb (Exp-Golomb) coding.
- A network abstraction layer (NAL) definition allowing the same video syntax to be used in many network environments, including features such as sequence parameter sets (SPSs) and picture parameter sets (PPSs) that provide more robustness and flexibility than provided in prior designs.
- Switching slices (called SP and SI slices), features that allow an encoder to direct a decoder to jump into an ongoing video stream for such purposes as video streaming bit rate switching and "trick mode" operation. When a decoder jumps into the middle of a video stream using the SP/SI feature, it can get an exact match to the decoded pictures at that location in the video stream despite using different pictures (or no pictures at all) as references prior to the switch.
- Flexible macroblock ordering (FMO, also known as slice groups) and arbitrary slice ordering (ASO), which are techniques for restructuring the ordering of the representation of the fundamental regions (called macroblocks) in pictures. Typically considered an error/loss robustness feature, FMO and ASO can also be used for other purposes.
- Data partitioning (DP), a feature providing the ability to separate more important and less important syntax elements into different packets of data, enabling the application of unequal error protection (UEP) and other types of improvement of error/loss robustness.
- Redundant slices (RS), an error/loss robustness feature allowing an encoder to send an extra representation of a picture region (typically at lower fidelity) that can be used if the primary representation is corrupted or lost.
- A simple automatic process for preventing the accidental emulation of start codes, which are special sequences of bits in the coded data that allow random access into the bitstream and recovery of byte alignment in systems that can lose byte synchronization.
- Supplemental enhancement information (SEI) and video usability information (VUI), which are extra information that can be inserted into the bitstream to enhance the use of the video for a wide variety of purposes.
- Auxiliary pictures, which can be used for such purposes as alpha compositing.
- Frame numbering, a feature that allows the creation of "sub-sequences" (enabling temporal scalability by optional inclusion of extra pictures between other pictures), and the detection and concealment of losses of entire pictures (which can occur due to network packet losses or channel errors).
- Picture order count, a feature that serves to keep the ordering of the pictures and the values of samples in the decoded pictures isolated from timing information (allowing timing information to be carried and controlled/changed separately by a system without affecting decoded picture content).
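As an illustration of the Exponential-Golomb coding listed among the features above, the following is a minimal sketch of the unsigned and signed code mappings. The function names and the use of text strings of "0" and "1" in place of a real bit writer are assumptions made for readability; they are not taken from the reference software.

```python
from typing import Tuple


def ue_encode(code_num: int) -> str:
    """Unsigned Exp-Golomb: write code_num + 1 in binary, preceded by
    as many zero bits as there are bits after the leading one."""
    if code_num < 0:
        raise ValueError("unsigned Exp-Golomb needs a non-negative value")
    info = bin(code_num + 1)[2:]          # binary digits of code_num + 1
    return "0" * (len(info) - 1) + info   # zero prefix, then the digits


def ue_decode(bits: str, pos: int = 0) -> Tuple[int, int]:
    """Decode one unsigned codeword starting at pos.
    Returns (value, position of the next unread bit)."""
    zeros = 0
    while bits[pos + zeros] == "0":       # count the zero prefix
        zeros += 1
    end = pos + 2 * zeros + 1             # prefix + (zeros + 1) info bits
    return int(bits[pos + zeros:end], 2) - 1, end


def se_encode(value: int) -> str:
    """Signed Exp-Golomb: positive v maps to 2v - 1, non-positive v to -2v."""
    code_num = 2 * value - 1 if value > 0 else -2 * value
    return ue_encode(code_num)


def se_decode(bits: str, pos: int = 0) -> Tuple[int, int]:
    """Inverse of se_encode: odd code numbers are positive values."""
    code_num, end = ue_decode(bits, pos)
    value = (code_num + 1) // 2 if code_num % 2 else -(code_num // 2)
    return value, end
```

With this mapping the unsigned values 0 through 4 become the codewords 1, 010, 011, 00100, and 00101, so small values, which are the most frequent for many syntax elements, receive the shortest codes.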
These techniques, along with several others, help H.264 to perform significantly better than any prior standard can, under a wide variety of circumstances in a wide variety of application environments. H.264 can often perform radically better than MPEG-2 video—typically obtaining the same quality at half of the bit rate or less.
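The six-tap half-pel filtering and quarter-pixel precision named in the feature list can be sketched in one dimension as follows. The tap values (1, -5, 20, 20, -5, 1) with a divisor of 32 are those of the H.264/AVC luma interpolation filter; the helper names, the restriction to a single row of 8-bit samples, and the clamping of indices at the row ends are simplifying assumptions of this sketch, not a description of a complete decoder.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # six-tap filter, coefficients sum to 32


def clip8(x: int) -> int:
    """Clip to the 8-bit sample range (8-bit video assumed here)."""
    return max(0, min(255, x))


def half_pel(row, i: int) -> int:
    """Half-sample value midway between row[i] and row[i + 1].
    Indices outside the row are clamped to the nearest edge sample
    (an assumption standing in for picture-edge padding)."""
    last = len(row) - 1
    acc = 0
    for k, tap in enumerate(TAPS):
        idx = min(max(i - 2 + k, 0), last)   # samples i-2 .. i+3
        acc += tap * row[idx]
    return clip8((acc + 16) >> 5)            # round, divide by 32, clip


def quarter_pel(row, i: int) -> int:
    """Quarter-sample value between row[i] and the half-sample to its
    right: the rounded average of the two neighbours."""
    return (row[i] + half_pel(row, i) + 1) >> 1
```

For a sharp edge such as the row 0, 0, 0, 255, 255, 255, the half-sample position on the edge evaluates to 128 and the quarter-sample position just left of it to 64, while the negative taps cause over- and undershoot next to the edge that the final clip removes.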
Like other ISO/IEC MPEG video standards, H.264/AVC has a reference software implementation that can be freely downloaded. Its main purpose is to give examples of H.264/AVC features, rather than being a useful application per se. (See the links section for a pointer to that software.) Some reference hardware design work is also under way in MPEG.
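As a further illustration, the start-code emulation prevention process listed among the features above can be sketched as a simple byte-stuffing routine: whenever two zero bytes would be followed by a byte with value 0x03 or less, the encoder inserts an extra 0x03 byte, and the decoder removes it again. The function names are hypothetical, and the sketch assumes the payload does not end in a zero byte (the normal case, because coded payloads end with a stop bit).

```python
def add_emulation_prevention(rbsp: bytes) -> bytes:
    """Insert 0x03 after any two zero bytes that would otherwise be
    followed by a byte in the range 0x00..0x03."""
    out = bytearray()
    zeros = 0                      # consecutive zero bytes written so far
    for b in rbsp:
        if zeros >= 2 and b <= 3:
            out.append(3)          # emulation prevention byte
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


def remove_emulation_prevention(payload: bytes) -> bytes:
    """Drop each 0x03 byte that directly follows two zero bytes."""
    out = bytearray()
    zeros = 0
    for b in payload:
        if zeros >= 2 and b == 3:
            zeros = 0              # skip the stuffing byte
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)
```

After stuffing, the three-byte start code prefix 00 00 01 can no longer appear inside a payload, which is what lets a receiver search for start codes to regain synchronization.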
Patent licensing
As with MPEG-2 Parts 1 and 2 and MPEG-4 Part 2, the vendors of H.264/AVC products and services are expected to pay patent licensing royalties for the patented technology that their products use. The primary source of licenses for patents applying to this standard is a private organization known as MPEG LA, LLC (http://www.mpegla.com/avc/), which is not affiliated in any way with the MPEG standardization organization, but which also administers patent pools for MPEG-2 Part 1 Systems, MPEG-2 Part 2 Video, MPEG-4 Part 2 Video, and other technologies.
Applications
Both of the major rival candidate formats for next-generation DVD, planned for product deployment in late 2005, include the H.264/AVC High Profile as a mandatory player feature:
- The HD-DVD format of the DVD Forum
- The Blu-ray Disc format of the Blu-ray Disc Association (BDA)
The Digital Video Broadcast (DVB) standards body in Europe approved the use of H.264/AVC for broadcast television in Europe in late 2004.
The prime minister of France, Jean-Pierre Raffarin, announced the selection of H.264/AVC as a requirement for receivers of HDTV and pay TV channels for digital terrestrial broadcast television services (referred to as "TNT") in France in late 2004.
The Advanced Television Systems Committee (ATSC) standards body in the United States is in final consideration work on potential use of H.264/AVC for U.S. broadcast television.
The Digital Multimedia Broadcast (DMB) service in the Republic of Korea will use H.264/AVC.
Mobile-segment terrestrial broadcast services of ISDB-T in Japan will use the H.264/AVC codec, including major broadcasters:
- NHK
- Tokyo Broadcasting System (TBS)
- Nippon Television (NTV)
- TV Asahi
- Fuji TV
- TV Tokyo
Direct broadcast satellite TV services will use the new standard, including:
- News Corp. / DirecTV (in the United States)
- Echostar / Dish Network / Voom TV (in the United States)
- Euro1080 (in Europe)
- Premiere (in Germany)
- BSkyB (in the United Kingdom and Ireland)
The 3rd Generation Partnership Project (3GPP) has approved the inclusion of H.264/AVC as an optional feature in release 6 of its mobile multimedia telephony services specifications.
The Motion Imagery Standards Board (MISB) of the United States Department of Defense (DoD) has adopted H.264/AVC as its preferred video codec for essentially all applications.
The Internet Engineering Task Force (IETF) has completed a payload packetization format (RFC 3984) for carrying H.264/AVC video using its Real-time Transport Protocol (RTP).
The Internet Streaming Media Alliance (ISMA) has adopted H.264/AVC for its new ISMA 2.0 specifications.
The Moving Picture Experts Group (MPEG) has fully integrated support of H.264/AVC into its system standards (e.g., MPEG-2 and MPEG-4 systems) and its ISO media file format specification.
The International Telecommunication Union Telecommunication Standardization Sector (ITU-T) has adopted H.264/AVC in its H.32x suite of multimedia telephony systems specifications. Based on the ITU-T standards, H.264/AVC is already widely used for videoconferencing, including its support in products of the two main companies in that market (Polycom and Tandberg). Essentially all new videoconferencing products now include support for H.264/AVC.
H.264 will probably be used by various video-on-demand services on the Internet to provide films and television shows directly to computers.
Products and Implementations
Several companies are producing custom chips capable of decoding H.264/AVC video. As of January 2005, sample quantities are available from Broadcom (the BCM7411), Conexant (the CX2418X), Neomagic (MiMagic 6), and STMicroelectronics (the STB7100). Sigma Designs predicts samples for March 2005. Such chips will allow widespread deployment of low-cost devices capable of playing H.264/AVC video at standard-definition and high-definition television resolutions. Four out of five of these chips (all but the Neomagic chip, which is targeted for low-power applications) will include HDTV video quality capability, and most will support the new High profile of the standard.
Apple Computer has integrated H.264 into Mac OS X version 10.4 (Tiger), as well as QuickTime version 7, which was released on April 29, 2005 with Tiger. In April 2005, Apple Computer updated its version of DVD Studio Pro to support authoring HD content. DVD Studio Pro allows for the burning of HD-DVD content to both standard DVDs and HD-DVD media (even though no HD-DVD burners are available). For playing back HD-DVD content burnt onto a standard DVD, Apple requires a PowerPC G5, Apple DVD Player v4.6, and Mac OS X v10.4 or later.
Envivio, Inc. is shipping broadcast H.264 encoders for standard definition live encoding and off-line encoders for High Definition (720p, 1080i, 1080p). Envivio also supplies H.264 decoders for Windows, Linux and Macintosh as well as H.264 Video Servers and Authoring tools.
Modulus Video is shipping broadcast-quality H.264 standard definition real-time encoders to broadcasters (including telephone companies) and has announced its high definition real-time encoder (the ME6000) for shipment in mid-2005. The Modulus Video HD encoder technology was demonstrated at NAB in April 2004, where it won a "Pick Hit" award. The Modulus design uses technology from LSI Logic.
Tandberg Television has announced a real-time high-definition encoder product (the EN5990). DirecTV and BSkyB have selected that product for their DBS deployments.
Harmonic has announced a real-time encoder product (the DiviCom MV 100). TF1 (the French broadcaster) and the Video Networks Limited (VNL) Homechoice video on demand service in London have announced the use of that product.
ATI Technologies has announced that its next-generation graphics processing unit (GPU), codenamed R520, would feature hardware acceleration of H.264. [1] (http://www.xbitlabs.com/news/video/display/20050526100319.html) [2] (http://apps.ati.com/ir/PressReleaseText.asp?compid=105421&releaseID=713852)
The Premiere DBS deployment will use set-top boxes from Pace Micro.
The PlayStation Portable console features hardware decoding of video files in the H.264 format.
The Nero Digital package, co-developed by Nero AG and Ateme, includes an H.264 encoder which was judged best overall by Doom9 in its 2004 codec shoot-out [3] (http://www.doom9.org/codecs-104-1.htm).
Sorenson offers an implementation of H.264. The Sorenson AVC Pro codec is available in Sorenson Squeeze 4.1 for MPEG-4.
The free x264 codec has been released under the terms of the GPL and is used in the free VideoLAN and MPlayer multimedia players. A Video for Windows frontend is also available.
External links
- H.264/AVC overview paper including new FRExt enhancements (Sullivan, Topiwala, and Luthra) (http://www.fastvdo.com/spie04/)
- Various papers on H.264/AVC and related topics (Wiegand) (http://iphome.hhi.de/wiegand/pubs.htm)
- More papers on H.264/AVC and related topics (Marpe) (http://iphome.hhi.de/marpe/pub.htm)
- H.264/AVC Software Coordination (Suehring) (http://iphome.hhi.de/suehring/tml/)
- H.264/MPEG-4 Part 10 Tutorials (Richardson) (http://www.vcodex.com/h264.html)
- Book: H.264 and MPEG-4 Video Compression (Richardson) (http://www.vcodex.com/h264mpeg4/)
- H.264/AVC Textbook (in Japanese: Okubo, Kadono, Kikuchi, and Suzuki) (http://internet.impress.co.jp/books/1983/)
- JVT Experts Group document archive (http://ftp3.itu.ch/av-arch/jvt-site)
- MPEG LA Terms of H.264/MPEG-4 AVC Patent License (http://www.mpegla.com/news/n_03-11-17_avc.html)
- A fast GPL H.264 encoder library with support for most H.264 features (http://www.videolan.org/x264.html)
- MPEG Industry Forum (http://www.m4if.org/)
- ITU-T official publication page (http://www.itu.int/rec/recommendation.asp?type=folders&lang=e&parent=T-REC-H.264)
- ISO official publication page (http://www.iso.ch/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=40890&ICS1=35&ICS2=40&ICS3=)
- W&W Communications H.264 Overview and IEEE Paper (http://www.wwcoms.com/technology/standard.htm)
- Apple's H.264 Gallery (http://www.apple.com/quicktime/hdgallery/)