Bob Currier, Synthetic Aperture
In part one of this article we looked at the need for compressing digital video and some of the methods people have come up with to accomplish that compression. This time let's look at some of the specific codecs which implement those methods, and when we might use them.
Most of these codecs are block-oriented. That is, they work by first dividing the frame image into regular blocks, usually 8x8 pixels square. Each block is examined for uniformity. If it is uniform, it is encoded as-is. If it is not, then it is further subdivided. This block-orientation is what leads to the blocky-pixel appearance when the codec breaks down. While most are block-oriented, they differ in how the blocks are encoded.
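To make the subdivision idea concrete, here is a minimal sketch in Python. It is not taken from any particular codec: the uniformity test (brightest pixel minus darkest pixel), the `threshold` value, and the `min_size` limit are all assumptions chosen for illustration.

```python
def subdivide(frame, x, y, size, threshold=16, min_size=2):
    """Return a list of (x, y, size) blocks that are uniform enough to encode whole."""
    pixels = [frame[y + j][x + i] for j in range(size) for i in range(size)]
    # Assumed uniformity test: brightest and darkest pixels are close together.
    if size <= min_size or max(pixels) - min(pixels) <= threshold:
        return [(x, y, size)]
    # Not uniform: split into four quadrants and examine each one in turn.
    half = size // 2
    blocks = []
    for dy in (0, half):
        for dx in (0, half):
            blocks += subdivide(frame, x + dx, y + dy, half, threshold, min_size)
    return blocks


# An 8x8 grayscale block: flat background with one bright 2x2 detail in a corner.
frame = [[10] * 8 for _ in range(8)]
for j in range(2):
    for i in range(2):
        frame[j][i] = 200

blocks = subdivide(frame, 0, 0, 8)
```

Only the quadrant containing the bright detail is split further; the three flat quadrants stay whole. When a codec runs out of bits, it has to stop subdividing early, and the large blocks become visible.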
Cinepak
Cinepak was originally developed by SuperMac for use with Apple's QuickTime on the Macintosh. Since then it has been implemented for QuickTime for Windows, Microsoft's Video for Windows, 3DO and Nintendo game machines, and has been licensed for hardware implementation on PC-based video cards.
Cinepak is a lossy, block-oriented compressor, using vector quantization (VQ) to perform its compression. It has built-in data rate limiting, making it ideal for bandwidth-limited applications, such as delivering multimedia video from CD-ROM. It uses both spatial and temporal compression.
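The core of vector quantization can be sketched in a few lines of Python. This is a generic illustration, not Cinepak's actual format: the four-entry codebook, the 2x2 block size, and the function names are all hypothetical.

```python
def nearest(codebook, vector):
    """Index of the codebook entry with the smallest squared distance to vector."""
    def dist(entry):
        return sum((a - b) ** 2 for a, b in zip(entry, vector))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def vq_encode(vectors, codebook):
    # Encoding searches the whole codebook for every block.
    return [nearest(codebook, v) for v in vectors]


def vq_decode(indices, codebook):
    # Decoding is a plain table lookup.
    return [codebook[i] for i in indices]


# Hypothetical codebook of 2x2 grayscale blocks, flattened to 4 values each.
codebook = [
    (0, 0, 0, 0),          # all dark
    (255, 255, 255, 255),  # all bright
    (0, 0, 255, 255),      # dark top, bright bottom
    (0, 255, 0, 255),      # dark left, bright right
]
blocks = [(5, 3, 250, 251), (240, 255, 250, 249), (2, 0, 1, 4)]
indices = vq_encode(blocks, codebook)
decoded = vq_decode(indices, codebook)
```

Each block is stored as a single small index instead of four pixel values, and the decoded blocks are close to the originals but not identical. That difference is where the loss comes from.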
It is a software-only codec, not requiring hardware for compression or decompression, although some hardware implementations do exist. It is asymmetric, in that it takes much longer to compress the video than it does to decompress and display it, taking minutes to compress a single frame in extreme cases. Because it is software-only, playback size and frame rate depend on the computer you are using, with the fastest systems achieving 30 fields per second, 640x480 pixel playback, but with 15 fps, 320x240 pixel images more common.
Because it was designed specifically for multimedia playback from CD-ROM, it excels at that task. Unlike other codecs designed for that purpose, it handles video which contains a lot of motion quite well. However, in more static video, such as talking head shots, there is noticeable temporal aliasing, or "pixel crawl." Cinepak is currently the most popular codec for delivering video for multimedia applications.
Indeo 3.2
The Intel Indeo 3.2 codec grew out of the DVI codec, which Intel purchased from General Electric's Sarnoff Labs. Originally introduced for Video for Windows, Indeo has gone through several iterations, and is now available under Video for Windows, QuickTime, and some Unix systems.
Indeo is a lossy, block-oriented codec, using VQ to perform its compression. The latest version has built-in data rate limiting, and uses both spatial and temporal compression.
Originally intended for use with Intel's i750 hardware implementation, Indeo is now available as pure software. It is asymmetric, but compression typically is faster than with Cinepak. As with Cinepak, playback rate and frame size are limited to multimedia applications, unless hardware decompression is used.
Indeo's roots in hardware compression severely limited its software-only performance in early versions. However, the latest version, 3.2, is a worthy competitor to Cinepak in the software-only area. Indeo maintains a sharper, more color-correct image with low-motion video, but still loses to Cinepak on video with high motion content. Intel continues to refine its algorithm--as does SuperMac--so expect a continuing battle between these two codecs for multimedia supremacy.
Motion JPEG
Motion JPEG, or M-JPEG, is a video adaptation of the JPEG standard for still photos. It simply treats a video stream as a series of still photos, compressing each individually, with no interframe compression. Because it uses no interframe compression, it is ideal for editing; arbitrary cuts are not complicated by the loss of key frames.
JPEG was one of the first image compression standards to have special hardware support built for it. Adapting these still-image JPEG chips to video was a matter of feeding the chip a succession of images from the video stream. Because of the ready availability of these chips, many non-linear editing systems have been built around M-JPEG, including the Avid systems, the Radius VideoVision and Telecaster products, and the ImMix VideoCube.
JPEG is a lossy, block-oriented hardware-based compression method, using the discrete cosine transform (DCT) to perform the compression. Compression can be accomplished in real-time.
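As a rough illustration of why the DCT helps, the Python sketch below transforms one 8x8 block and then quantizes the result. It is deliberately simplified: real JPEG uses a per-coefficient quantization table, a level shift, and entropy coding, none of which appear here. The single `step` value is an assumption made for the example.

```python
import math

N = 8


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an NxN block (direct and unoptimized)."""
    def c(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * total
    return out


def quantize(coeffs, step):
    # Assumed uniform quantizer: divide every coefficient by one step size.
    return [[round(value / step) for value in row] for row in coeffs]


# A perfectly flat block: every pixel has the same value.
flat = [[100] * N for _ in range(N)]
coeffs = dct_2d(flat)
quantized = quantize(coeffs, step=16)
nonzero = sum(1 for row in quantized for q in row if q != 0)
```

For the flat block, all of the energy lands in the single top-left (DC) coefficient and the other 63 values quantize to zero, so 64 pixels collapse to one number. Blocks with more detail leave more nonzero coefficients, and a larger step discards more of them.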
While the underlying JPEG compression method is a standard, the ways in which various implementations adapt JPEG to video differ, making the different systems' compressed video incompatible with each other. For example, Radius has implemented an adaptive compression method that varies the level of compression to maintain a constant data rate out to the hard disk.
While primarily used for video capture and non-linear editing, M-JPEG has found limited use in kiosk and other multimedia applications.
MPEG-1
The MPEG codecs are the only ones that can claim to be true standards. Based on the work of the Moving Picture Experts Group, a joint committee of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC), they enjoy widespread acceptance and support.
MPEG-1 was designed to deliver 30-field-per-second video from bandwidth-limited sources such as CD-ROM. MPEG-1 is a lossy, block-oriented compression method, using DCT to do the compression. It uses both spatial and temporal compression.
MPEG differs from other codecs in the way that it performs interframe compression. As with other codecs, it uses key frames--called I-frames in MPEG--which contain all of the information for the frame. But MPEG then uses two different types of interframe compression. The first, called P-frames, are frames that are based on past frames, with only the differences encoded, the traditional method of doing temporal compression. However, the second, called B-frames, are bidirectionally encoded, based on both past and future frames in the video stream. These B-frames can be very highly compressed because they are based on the information contained in two other frames, making the differences which must be encoded quite small.
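One consequence of B-frames is that a decoder needs the future reference frame before it can rebuild the B-frames that depend on it, so frames cannot simply be processed in display order. The Python sketch below shows that reordering. The frame pattern `IBBPBBP` and the function name are assumptions picked for illustration; real encoders choose their own patterns.

```python
def decode_order(display):
    """Reorder frames so each B-frame comes after both of its reference frames.

    `display` is a sequence of frame types in display order. Returns a list of
    (display_index, frame_type) pairs in the order a decoder would need them.
    """
    out = []
    pending_b = []
    for index, kind in enumerate(display):
        if kind == "B":
            # Hold B-frames back until the next I- or P-frame has been emitted.
            pending_b.append((index, kind))
        else:
            out.append((index, kind))
            out.extend(pending_b)
            pending_b = []
    out.extend(pending_b)  # trailing B-frames with no later reference in this sketch
    return out


display = list("IBBPBBP")
order = decode_order(display)
order_indices = [index for index, _ in order]
```

Here frame 3 (a P-frame) must be handled before frames 1 and 2, even though it is displayed after them. This dependency chain is also why cutting an MPEG stream at an arbitrary frame is awkward.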
MPEG-1 was designed to use a frame size of 352x240 pixels, with each pixel horizontally and vertically doubled during playback, yielding a grainy picture, charitably called "VHS-quality." However, as with most standards, people have taken it on themselves to "expand" the standard, this time to encode 640x480 pixel frames. By using such non-standard changes, MPEG-1 has been extended well beyond its original CD-ROM playback origins to be used as the basis of some of the current DBS satellite TV systems.
While both compression and decompression of MPEG-1 are possible in software, it was designed to use special-purpose hardware. To achieve the highest quality of MPEG-1 compression, a lot of hardware horsepower must be used, making compression an expensive proposition. Playback can be done with lower-cost, consumer-level hardware. With the increasing computing power of PCs, software-only playback of MPEG-1 will likely become common.
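Pixel doubling is the simplest possible way to enlarge a frame, which is why the result looks coarse. A minimal Python sketch, assuming a frame stored as a list of rows (the function name is hypothetical):

```python
def pixel_double(frame):
    """Repeat every pixel horizontally and every row vertically."""
    out = []
    for row in frame:
        doubled = [pixel for pixel in row for _ in range(2)]
        out.append(doubled)
        out.append(list(doubled))  # second copy of the same row
    return out


# A blank 352x240 frame (240 rows of 352 pixels) becomes 704x480.
small = [[0] * 352 for _ in range(240)]
large = pixel_double(small)
```

No new detail is created: each source pixel simply becomes a 2x2 square on screen.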
Some vendors are experimenting with using MPEG hardware in editing systems but, in general, MPEG should be considered a delivery system and not an editing system, due to the high level of interframe compression. And while MPEG-1 has received a lot of verbal support, it has yet to really take off in the consumer market, its original target.
MPEG-2
MPEG-2 was designed to build on the MPEG-1 standard and be used in high-bandwidth applications such as satellite delivery. It delivers 60-field-per-second video at full CCIR 601 resolution.
Because it took a while to finalize the MPEG-2 standard, many users have extended the MPEG-1 standard instead, delivering systems often called MPEG-1.5. This is likely to be a temporary situation as the MPEG-2 standard becomes more widely available, driving costs down.
MPEG-2 requires special high-speed hardware for compression and playback. Real-time compression of MPEG-2 is not yet generally available, requiring all video to be pre-compressed. This is a major stumbling block to its use in systems that must cover live events.
As with MPEG-1, MPEG-2 is not well suited to editing applications.
Fractal Compression
Fractal compression is based on the patented work of Dr. Michael Barnsley. Fractal compression offers the advantage of being resolution-independent: in theory you can scale up an image without loss of resolution.
Like many of the other codecs we've discussed, fractal compression is block-oriented. But rather than representing similar blocks in a lookup dictionary, fractal compression represents them as mathematical (fractal) equations.
Fractal compression is highly asymmetric because determining the mathematical equations is very compute-intensive. However, decoding the image for display is very fast.
While there is great promise in fractal compression, it has yet to gain significant use.
Wavelet Compression
Wavelet compression performs compression by breaking each frame apart based on frequency. This allows it to preserve high-frequency information (edges, fine detail) by using a lower level of compression, while compressing lower-frequency content to a greater degree.
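The frequency split can be illustrated with the Haar transform, the simplest wavelet. The Python sketch below runs one level of it over a single row of pixels. Treat it as a toy: a real codec would use a more elaborate wavelet, work in two dimensions, and repeat the split several times.

```python
def haar_step(signal):
    """One Haar level: pairwise averages (low band) and half-differences (high band)."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    low = [(a + b) / 2 for a, b in pairs]
    high = [(a - b) / 2 for a, b in pairs]
    return low, high


def haar_inverse(low, high):
    """Rebuild the original samples from the two bands."""
    out = []
    for avg, diff in zip(low, high):
        out += [avg + diff, avg - diff]
    return out


row = [10, 10, 10, 10, 200, 200, 50, 52]
low, high = haar_step(row)
restored = haar_inverse(low, high)
```

The low band is a half-size, smoothed copy of the row, and the high band holds the detail that was removed. With the two bands separated, a codec is free to apply a different level of compression to each.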
Wavelet compression is symmetric, compressing and decompressing quite quickly. The most widely known use of wavelet compression was the "Captain Crunch" codec, announced--but never released--by the now-bankrupt Media Vision.
Proprietary Codecs
In addition to the codecs already discussed, there are numerous proprietary codecs in use. These range from ones designed for very high compression rates in video games, to the ones used in Sony's Digital BetaCam and Ampex's DCT, which use very low levels of compression to maintain very high quality. Most of these are built into specific pieces of equipment, so we have little choice in what to use; the manufacturer has already decided for us.
Summary
Much work continues to be done in developing codecs. The different applications--multimedia, broadcast, acquisition--all have different requirements, and the increasing power of inexpensive processors keeps changing the equation of whether hardware-assisted or software-only solutions are best. While we may eventually settle on one or two standards for each application, we are still a long way from determining what those standards will be.
| Codec | Data Rate (Mb/sec) | Lossy | Spatial Compr. | Temporal Compr. | Editable | Frame Size | Field Rate | Special Hardware | Quality |
|---|---|---|---|---|---|---|---|---|---|
| Cinepak | 0.1-4 | Yes | Yes | Yes | Yes | 160x120-640x480 | 8-30 | No | Good |
| Indeo 3.2 | 0.1-4 | Yes | Yes | Yes | Yes | 160x120-320x240 | 8-30 | Optional | Good |
| M-JPEG | 1-10 | Yes | Yes | No | Yes | 160x120-640x480 | 60 | Yes | Better |
| MPEG-1 | 1.5 | Yes | Yes | Yes | No | 352x240 | 30 | Yes | Good |
| MPEG-2 | 1.5-100 | Yes | Yes | Yes | No | 720x480 | 60 | Yes | Better |
| Fractal | 0.1-4 | Yes | Yes | Yes | Yes | 160x120-640x480 | 8-30 | Yes | Good |
| Wavelet | 0.1-4 | Yes | Yes | Yes | Yes | 160x120-320x240 | 8-30 | No | Good |
| Digital BetaCam | 135 | Yes | Yes | No | Yes | 720x480 | 60 | Yes | Best |
| CCIR 601 | 270 | No | No | No | Yes | 720x480 | 60 | Yes | Very Best |
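To put the data rates in the table in perspective, we can compare each one to the uncompressed CCIR 601 rate. The short Python calculation below does this for a few rows. It is a rough guide only: where the table gives a range the upper bound is used, and the comparison ignores the fact that several of these codecs normally run at smaller frame sizes and lower rates than CCIR 601.

```python
# Data rates in Mb/sec from the table above (upper bound where a range is given).
UNCOMPRESSED = 270  # CCIR 601
rates = {
    "Cinepak": 4,
    "M-JPEG": 10,
    "MPEG-1": 1.5,
    "Digital BetaCam": 135,
}

# Ratio of the uncompressed rate to each codec's rate.
ratios = {name: UNCOMPRESSED / rate for name, rate in rates.items()}

for name, ratio in ratios.items():
    print(f"{name}: roughly {ratio:.0f}:1 relative to CCIR 601")
```

By this measure Digital BetaCam squeezes the signal by only 2:1, while MPEG-1 works with 180 times less data than the uncompressed stream.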
Bob Currier is President of Synthetic Aperture, a multimedia production company specializing in digital video and QuickTime VR. He also serves as Sysop of the Macintosh Multimedia Forum on CompuServe.
He can be reached at rcurrier@synthetic-ap.com. Be sure to visit the Synthetic Aperture web site at <http://www.synthetic-ap.com/> for more tutorial information, sample content, and information on new media services.
This article originally appeared in a slightly different form in Computer Video Production magazine.