An Apple patent (number 8018994) covers selecting encoding types and predictive modes for encoding video data.
In some embodiments, a method of determining encoding type and predictive mode(s) selections for a macroblock of a video frame is provided. In some embodiments, a general method 1) selects the encoding type (16×16 or 4×4) that is initially considered for a macroblock using an encoding type selection algorithm (based on an attribute of the macroblock that is easy to compute), 2) if the 16×16 encoding type is selected in step 1, considers the four 16×16 prediction modes that may be used on the macroblock using conventional methods or an improved 16×16 predictive mode search algorithm based on distortion thresholds, and 3) if the 4×4 encoding type is selected in step 1, selects the 4×4 prediction mode to be used for each of the sixteen 4×4 blocks of the macroblock using conventional methods or an improved 4×4 predictive mode search algorithm based on the positional relationships between predictive modes. The inventors are Xin Tong and Xiaocun Nie.
Here's Apple's background and summary of the invention: "A video stream is comprised of a sequence of video frames where each frame is comprised of multiple macroblocks. Each macroblock is typically a 16×16 array of pixels, although other sizes of macroblocks are also possible. Video codecs (COmpressor-DECompressor) are software, hardware, or combined software and hardware implementations of compression algorithms designed to encode/compress and decode/decompress video data streams to reduce the size of the streams for faster transmission and smaller storage space. While lossy, video codecs attempt to maintain video quality while compressing the binary data of a video stream. Examples of popular video codecs include WMV, RealVideo, as well as implementations of compression standards such as MPEG-2, MPEG-4, H.261, H.263, and H.264.
"Under H.264 compression standards, a macroblock of a video frame can be intra encoded as a 16×16 pixel array, the pixel values of the array being predicted using values calculated from previously encoded macroblocks. A 16×16 macroblock can also be intra encoded as sixteen 4×4 pixel arrays, where pixel values in each 4×4 array are predicted using values calculated from previously encoded 4×4 arrays. There are 4 possible intra prediction modes for 16×16 arrays (luma blocks) and 9 possible intra prediction modes for 4×4 arrays (luma blocks).
"As such, in encoding a macroblock, two determinations (selections) must be made: 1) whether the macroblock is to be encoded as a 16×16 array (referred to herein as 16×16 encoding) or as sixteen 4×4 arrays (referred to herein as 4×4 encoding), and 2) the predictive mode(s) to be used to encode the macroblock. For example, if it is determined that the macroblock is to be encoded as a 16×16 array, it must also be determined which of the four predictive modes for the 16×16 array is to be used.
"If it is determined that the macroblock is to be encoded as sixteen 4×4 arrays, it must also be determined, for each of the sixteen 4×4 arrays, which of the nine predictive modes for the 4×4 array is to be used. Step 1 is referred to herein as encoding type selection and step 2 is referred to herein as predictive mode selection.
"Encoding type selection and predictive mode selection are made using cost functions. For example, cost functions are typically used to determine whether a macroblock is to be encoded as a 16×16 array or as sixteen 4×4 arrays where the type of encoding (16×16 or 4×4 encoding) having the lower cost is chosen. Cost is typically equal to the distortion or the weighted average of distortion plus an estimate of the number of bits produced by the prediction mode, where an increase in distortion and/or number of bits increases the cost.
"Distortion reflects the difference between original pixel values and predicted (or encoded) values and can be measured in various ways. For example, distortion can be measured as the sum of the absolute differences between the original pixel values and predicted (or encoded) values.
"An exhaustive search approach to selecting an optimal encoding type (16.times.16 or 4.times.4 encoding) and optimal predictive mode(s) for a macroblock involves determining costs of all four 16.times.16 prediction modes and all combinations of nine 4.times.4 prediction modes for sixteen 4.times.4 blocks in the macroblock, where a 16.times.16 prediction mode or a particular combination of 4.times.4 prediction modes that gives the lowest cost is selected. For each macroblock, the exhaustive search approach requires consideration of 9^16 different combinations of 4.times.4 prediction modes, rendering the exhaustive search approach practically infeasible.
"As such, the following operations are typically performed to determine the encoding type and predictive mode(s) for a macroblock: 1) Compute the cost of all four possible 16.times.16 predictive modes. 2) For each of the sixteen 4.times.4 blocks, select the predictive mode (among the 9 predictive modes) having the lowest cost, and then compute the total cost of the resulting combination (i.e., the sum cost of the sixteen determined costs). 3) Compare the cost determined at step 1 with the cost determined at step 2 and select the lowest one. This selection provides both the encoding type selection and the predictive mode(s) selection.
"The conventional approach, however, still involves determining costs for 9.times.16 different combinations of the 4.times.4 predictive modes plus the costs for the four 16.times.16 predictive modes."
-- Dennis Sellers