Order-independent transparency and anti-aliasing

An order-independent transparency (OIT) rendering algorithm is one that allows triangles with partially transparent textures to be drawn in any order, without having to process them in back-to-front or front-to-back order as in traditional alpha blending. Research has produced many approaches to this (see references below), each with its own benefits and drawbacks.

This engine's goal is to render dense foliage efficiently. To maximize triangle and pixel throughput, it approximates transparency by randomly deciding whether to draw each pixel, using opacity as the probability (known as alpha stippling or dithering). On its own, this would produce far too much noise, so anti-aliasing with specific tweaks is used to reduce the noise without costing too much performance.
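A minimal sketch of the basic stochastic decision, written in Python for readability rather than as shader code. Python's `random` module stands in for the engine's hash-based per-pixel random number (described in the Random seed section); the seed and pixel count are arbitrary.

```python
import random


def stipple(alpha: float, rng: random.Random) -> bool:
    """Return True (draw the pixel fully opaque) with probability alpha,
    otherwise False (discard it, i.e. fully transparent)."""
    return rng.random() < alpha


# Each pixel is all-or-nothing, but the covered fraction approaches alpha on average.
rng = random.Random(1234)
num_pixels = 100_000
fraction = sum(stipple(0.3, rng) for _ in range(num_pixels)) / num_pixels
print(f"covered fraction: {fraction:.3f}")  # roughly 0.3, with per-pixel noise
```

The average is right, but neighboring pixels are uncorrelated all-or-nothing decisions, which is exactly the noise the rest of this article works to suppress.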

Because each subpixel is drawn either fully opaque or fully transparent, no order-dependent blending is needed, and the triangles and pixels can be drawn in any order. Furthermore, the depth buffer can be used to skip pixels (or quads, or Hi-Z/Early-Z blocks) of foliage that is behind other foliage. Objects are still coarsely sorted in front-to-back order to help the depth buffer reduce overdraw. The z axis in front of the camera is divided into ranges of exponentially increasing size, and within each depth range, objects are sorted by drawing state (shader, texture, etc.), which reduces the number of draw calls and GPU state changes and improves texture cache locality.
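The depth-range bucketing and state sorting can be sketched as a sort key. This is an illustration under assumptions: the `near` distance, the `growth` factor between ranges and the `state_key` field (standing in for whatever identifies the shader, texture and other state) are hypothetical, since the exact values and data layout are not given here.

```python
import math
from dataclasses import dataclass


@dataclass
class DrawItem:
    name: str
    depth: float    # distance along the camera's z axis
    state_key: int  # hypothetical ID for shader/texture/other draw state


def depth_range(depth: float, near: float = 1.0, growth: float = 2.0) -> int:
    """Index of the exponentially growing depth range that contains `depth`.

    Range i covers [near * growth**i, near * growth**(i + 1));
    depths at or below `near` also go to range 0.
    """
    if depth <= near:
        return 0
    return math.floor(math.log(depth / near) / math.log(growth))


def sort_for_drawing(items):
    # Coarse front-to-back order by depth range, then grouped by draw state
    # within each range to cut state changes and draw calls.
    return sorted(items, key=lambda item: (depth_range(item.depth), item.state_key))


items = [
    DrawItem("far bush", 50.0, 1),
    DrawItem("near tree B", 3.0, 2),
    DrawItem("near tree A", 3.5, 1),
    DrawItem("grass", 1.5, 2),
]
order = [item.name for item in sort_for_drawing(items)]
# "near tree A" is drawn before the slightly closer "near tree B" because both
# are in the same depth range and A's draw state sorts first.
```

The ranges keep the order roughly front-to-back for the depth buffer, while their growing size lets many distant objects share a range and therefore be grouped by state.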

This system renders large amounts of foliage with much better performance than, for example, full alpha blending (which has a lot of overdraw and requires sorting), and with nearly as good image quality when sufficient noise reduction is used. It also allows efficient cross-fading between different detail-level versions of a 3D model.


Random seed

The decision whether to draw a pixel is made using a pseudo-random number calculated from a random seed. The seed combines the pixel's screen position with an ID of the surface being drawn. For foliage, a surface is usually a quad of two triangles, so the index of the quad within the model is used as the surface ID. Different surfaces should have different seeds: if surfaces covering the same pixels made the same "random" transparency decision at each pixel, the surface in front would entirely occlude the surface behind it (which is what happens with typical GPU alpha-to-coverage algorithms).
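A sketch of how such a seed could be built and used. The hash is a PCG-style 32-bit integer hash of the kind commonly used on GPUs, chosen here for illustration only; the engine's actual hash and the way it mixes pixel coordinates with the surface ID are not specified, so `pixel_seed` and its chaining order are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def pcg_hash(value: int) -> int:
    """PCG-style 32-bit integer hash (uint arithmetic emulated with masking)."""
    state = ((value & MASK32) * 747796405 + 2891336453) & MASK32
    word = (((state >> ((state >> 28) + 4)) ^ state) * 277803737) & MASK32
    return (word >> 22) ^ word


def pixel_seed(x: int, y: int, surface_id: int) -> int:
    # Hypothetical mixing: chain the hash over surface ID, then y, then x.
    return pcg_hash(x + pcg_hash(y + pcg_hash(surface_id)))


def draws_pixel(x: int, y: int, triangle_index: int, alpha: float) -> bool:
    # Both triangles of a foliage quad share one surface ID: the quad index.
    quad_index = triangle_index // 2
    random01 = pixel_seed(x, y, quad_index) / 2**32  # in [0, 1)
    return random01 < alpha


# Two overlapping quads (triangles 0-1 and 2-3) at 50% opacity over a 64x64 area.
front = [draws_pixel(x, y, 0, 0.5) for y in range(64) for x in range(64)]
back = [draws_pixel(x, y, 2, 0.5) for y in range(64) for x in range(64)]
differing = sum(f != b for f, b in zip(front, back)) / len(front)
```

Both triangles of a quad make identical decisions, while two different quads get largely independent patterns, so the quad behind shows through wherever the one in front was not drawn.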

Time is not included in the random seed unless temporal anti-aliasing is enabled, because without it, time would only add temporal noise when the camera is standing still. With temporal anti-aliasing, a little temporal noise may actually provide more input for the temporal anti-aliasing algorithm and help it calculate better color averages, although it is unclear whether this brings any additional benefit when camera jittering is used anyway. The "time" used in the calculations is actually the lowest few bits of the current frame number, with the bits reversed to produce a better random distribution across successive frames.
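The bit reversal can be sketched as follows; the number of bits (3 here) is a hypothetical choice. Reversing the low bits turns the frame counter 0, 1, 2, ... into a sequence that alternates between the two halves of the range (the van der Corput ordering), so consecutive frames get well-separated time values instead of nearly identical ones. How the result is then mixed into the seed is not shown.

```python
def frame_time_bits(frame_number: int, bits: int = 3) -> int:
    """Take the lowest `bits` bits of the frame number and reverse their order."""
    low = frame_number & ((1 << bits) - 1)
    result = 0
    for _ in range(bits):
        result = (result << 1) | (low & 1)  # move the lowest remaining bit to the top
        low >>= 1
    return result


sequence = [frame_time_bits(f) for f in range(8)]
print(sequence)  # [0, 4, 2, 6, 1, 5, 3, 7]
```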

Multisampling

Multi-sample anti-aliasing (MSAA) is used to reduce the noise of the order-independent transparency algorithm. Without multisampling, the pixel shader writes each pixel as either fully opaque or fully transparent; with multisampling, each subpixel sample can be either opaque or transparent. During post-processing, the samples of each pixel are averaged to produce its final color, which effectively lets the pixel shader output a few different transparency levels (with 4 samples, for example, 0 to 4 covered samples give five levels).

The alpha value in the pixel shader determines how many subpixel samples are written as opaque and how many as transparent. The resulting number is randomly rounded up or down, with probabilities chosen so that the final pixel has the correct opacity on average. Which exact subpixels are written as opaque is also chosen randomly, instead of, for example, always choosing the first n samples, so that overlapping surfaces get the correct opacity as described in the Random seed section above.

Note that the decision to write an opaque or transparent sample is not made independently for each subpixel. Compared with always writing either the rounded-up or the rounded-down number of subpixels, independent decisions would only add randomness, and thus noise, without any benefit.
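The rounding and random sample selection described above can be sketched as building an MSAA coverage mask. This is a Python illustration, not shader code: `rng` stands in for the seeded per-pixel random numbers, and `resolved_coverage` models the resolve as a plain average of covered samples.

```python
import random


def coverage_mask(alpha: float, num_samples: int, rng: random.Random) -> int:
    """Build a coverage mask whose number of set bits is correct on average.

    alpha * num_samples is rounded up with probability equal to its fractional
    part, otherwise down. The covered samples are then picked at random
    instead of always taking the first ones.
    """
    exact = alpha * num_samples
    count = int(exact)                   # rounded-down count
    if rng.random() < exact - count:     # round up with probability = fraction
        count += 1
    count = min(count, num_samples)
    mask = 0
    for s in rng.sample(range(num_samples), count):
        mask |= 1 << s
    return mask


def resolved_coverage(mask: int, num_samples: int) -> float:
    # The resolve averages the samples, so opacity = fraction of covered samples.
    return bin(mask).count("1") / num_samples
```

For alpha = 0.3 with 4 samples, 0.3 * 4 = 1.2, so every mask has either 1 or 2 covered samples, and the 2-sample case occurs with probability 0.2; the resolved opacity therefore averages 0.8 * 0.25 + 0.2 * 0.5 = 0.3.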

Multisampling consumes more render buffer memory and bandwidth, although some GPUs can optimize writing multiple samples within the same pixel so that the color data crosses the memory bus only once. Multisampling also reduces performance because it becomes more likely that at least one sample within a pixel (or group of pixels) is not yet covered by the semi-transparent objects already drawn in front, so fewer pixels can be skipped by depth buffer checks.

Temporal anti-aliasing

Whereas regular supersampling anti-aliasing reduces aliasing by taking multiple color samples from slightly different positions, temporal anti-aliasing (TAA) takes samples from slightly earlier points in time, that is, from previous frames.

When an object is rendered, the pixel shader determines how much the corresponding point on the object's surface has moved on screen since the previous frame, and writes this screen position difference as a 2D vector to a "velocity buffer". After the scene has been rendered, each pixel is processed, and a weighted average is calculated between the pixel's color on the current frame and its color on the previous frame (fetched from the position given by the velocity). Because the previous frame's color was itself an average with the frame before it, and so on, the result is effectively a weighted average of the color of the same surface point over multiple earlier frames.
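A minimal sketch of the reprojection and blending step, using single-channel colors stored as a list of rows. The sign convention (velocity = current position minus previous position), the nearest-neighbor fetch and the blend weight of 0.1 are assumptions for illustration; the engine's actual weights and filtering are not given here, and the tweaks described in the next paragraph are omitted.

```python
def reproject(history, x, y, velocity):
    """Previous frame's color at the position this surface point had last frame.

    Assumes velocity = current screen position - previous screen position
    (in pixels), nearest-neighbor sampling and clamping to the image edges.
    """
    vx, vy = velocity
    px = min(max(round(x - vx), 0), len(history[0]) - 1)
    py = min(max(round(y - vy), 0), len(history) - 1)
    return history[py][px]


def taa_blend(current, history_color, current_weight=0.1):
    """Weighted average of this frame's color and the reprojected history color.

    current_weight is a hypothetical value; smaller means more smoothing.
    """
    return current_weight * current + (1.0 - current_weight) * history_color


# Usage: a static pixel whose stipple decision alternates between drawn (1.0)
# and not drawn (0.0) on successive frames.
accumulated = 0.0
for frame in range(200):
    sample = 1.0 if frame % 2 == 0 else 0.0
    accumulated = taa_blend(sample, accumulated)
```

Because each frame keeps 90% of the accumulated history, the result is an exponentially weighted average of earlier frames, which is why a stipple pattern flickering between 0 and 1 settles near its mean.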

In terms of performance, this is a relatively efficient way of collecting more color samples for a pixel, because the data from previous frames is readily available as an image without having to actually render more samples. Enabling temporal anti-aliasing reduced framerate by approximately 10-15% and almost completely removed noise from foliage.

In practice, the algorithm requires several tweaks to avoid blurring and ghosting caused by inaccuracies and special situations, such as objects moving behind each other or emerging from behind each other. See the links below for published methods. In this engine, temporal anti-aliasing is mainly tuned to reduce the noise of the order-independent transparency algorithm, and uses techniques such as velocity dilation, neighborhood clipping, jittering and image sharpening. It also supports disabling temporal anti-aliasing for individual pixels by writing a special value to the velocity buffer, which is needed for surfaces whose pixel velocities cannot be calculated, such as reflective water. The velocity buffer uses a 16-bit fixed-point format and the same number of multisamples as the color buffer (4 at maximum detail settings). When resolving the multisampled velocities, the largest velocity sample is used, ignoring samples that hold the TAA-disabled value.
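The velocity resolve rule can be sketched like this. The fixed-point scale (`SCALE`, raw units per pixel) and the reserved `TAA_DISABLED` raw value are hypothetical, as the text only says that a 16-bit fixed-point format and a special value are used. Returning `None` when every sample is disabled, meaning TAA would be skipped for that pixel, is also an assumption.

```python
from typing import Optional, Sequence, Tuple

RawVelocity = Tuple[int, int]   # signed 16-bit fixed-point components
Velocity = Tuple[float, float]  # decoded, in pixels

SCALE = 1.0 / 64.0                            # hypothetical: 64 raw units per pixel
TAA_DISABLED: RawVelocity = (-32768, -32768)  # hypothetical reserved value


def decode(raw: RawVelocity) -> Velocity:
    return raw[0] * SCALE, raw[1] * SCALE


def resolve_velocity(samples: Sequence[RawVelocity]) -> Optional[Velocity]:
    """Pick the largest velocity among the pixel's MSAA samples.

    Samples holding the TAA-disabled marker are skipped. Returns None if every
    sample is disabled (assumed to mean: no TAA for this pixel).
    """
    best: Optional[Velocity] = None
    best_len2 = -1.0
    for raw in samples:
        if raw == TAA_DISABLED:
            continue
        vx, vy = decode(raw)
        len2 = vx * vx + vy * vy
        if len2 > best_len2:
            best, best_len2 = (vx, vy), len2
    return best
```

Skipping the marker matters: a reserved value like the one assumed here would otherwise decode to the largest velocity of all and win the resolve.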

When MSAA and TAA are used together, the multisample resolve must also collect the min/max colors and the maximum velocity. To avoid wasting bandwidth on writing these to temporary buffers, a compute shader processes the image in 16x16 tiles: it performs the multisample resolve for a slightly larger 18x18 tile (the extra border is needed for velocity dilation and neighborhood clipping) and writes the min/max colors and velocities to group-shared memory. Each thread first fetches its own pixel inside the 16x16 tile, and then, if necessary, one of the edge pixels of the 18x18 tile (the edge pixels are divided systematically among the threads).
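How the 18x18 tile's fetches might be divided among the 256 threads of a 16x16 group can be sketched in plain Python. The scheme (the first 68 threads each take one edge pixel, in row-major order) is a hypothetical example of a systematic division; in a real compute shader, each fetch would also perform the multisample resolve and write min/max colors and velocity to group-shared memory, followed by a barrier before neighbors are read.

```python
TILE = 16                 # inner tile handled by one 16x16 thread group
APRON = 1                 # one-pixel border for velocity dilation / clipping
OUTER = TILE + 2 * APRON  # 18


def border_pixels():
    """Row-major list of the 18x18 tile's pixels that lie outside the inner 16x16."""
    coords = []
    for y in range(OUTER):
        for x in range(OUTER):
            inner = APRON <= x < APRON + TILE and APRON <= y < APRON + TILE
            if not inner:
                coords.append((x, y))
    return coords


BORDER = border_pixels()  # 18*18 - 16*16 = 68 pixels


def pixels_for_thread(thread_index):
    """Tile-relative pixels fetched (and resolved) by one thread, in order."""
    tx, ty = thread_index % TILE, thread_index // TILE
    fetches = [(tx + APRON, ty + APRON)]      # 1) own pixel in the 16x16 tile
    if thread_index < len(BORDER):            # 2) hypothetical: threads 0..67
        fetches.append(BORDER[thread_index])  #    each take one edge pixel
    return fetches
```

Every pixel of the 18x18 tile is fetched exactly once, and no thread needs more than two fetches.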

No anti-aliasing. Note the noisy stipple pattern in the foliage and the jagged edge of the tree.

MSAA 4x.

MSAA 4x and TAA. A bigger benefit not seen in these static images is the reduction of temporal noise when the camera is moving.

View the comparison videos below at the highest resolution and in full-screen to see the difference in the amount of noise (although video compression still filters out some of it).

Cross-fade

With traditional alpha blending, cross-fading between e.g. LOD versions of a model cannot be done by simply rendering all surfaces of both models with an opacity factor, because e.g. drawing both models with 50% opacity would make the inside of the models as well as the background partially visible.

With the order-independent transparency algorithm, we can instead form a bit mask in the pixel shader that specifies which subpixel samples each of the two models is allowed to draw to, and then draw the models as usual, except that the coverage is combined with this mask using a bitwise AND so each model only draws to its allowed samples. The number of samples assigned to each version follows the cross-fade factor (a number within 0..1): the factor is multiplied by the total number of samples, and the result is rounded up or down randomly, with a probability that depends on how close it is to the upper or lower integer, so the share is correct on average. As explained earlier, this kind of rounding causes less noise than simply choosing each mask bit randomly according to the cross-fade factor.
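The mask construction can be sketched as below. `crossfade_masks` and `draw_mask` are hypothetical helpers: version B gets a randomly rounded share of the samples, picked at random, version A gets the complementary samples, and each model's own stochastic coverage mask is ANDed with its version mask. Python's `random` stands in for the shader's seeded random numbers.

```python
import random


def crossfade_masks(fade: float, num_samples: int, rng: random.Random):
    """Return (mask_a, mask_b): sample masks for the old (A) and new (B) version.

    fade (0..1) is B's share of the samples. fade * num_samples is rounded up
    with probability equal to its fractional part, otherwise down, so B's
    share is correct on average. B's samples are picked at random and A gets
    the complementary samples.
    """
    exact = fade * num_samples
    count_b = int(exact)
    if rng.random() < exact - count_b:
        count_b += 1
    count_b = min(count_b, num_samples)
    mask_b = 0
    for s in rng.sample(range(num_samples), count_b):
        mask_b |= 1 << s
    all_samples = (1 << num_samples) - 1
    mask_a = all_samples & ~mask_b
    return mask_a, mask_b


def draw_mask(coverage_mask: int, version_mask: int) -> int:
    # A model writes only to samples allowed by both its own (stochastic
    # transparency) coverage and its version mask.
    return coverage_mask & version_mask
```

Because the two masks are complementary, two fully opaque versions together still cover every sample, so the background cannot show through during the fade.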

This solves the see-through problem: a solid surface in front covers all samples assigned to its model version, so surfaces behind it in the same version are completely hidden. Surfaces of any opacity also get the correct average opacity. The background is fully covered if it is fully covered by both versions, and covered with the correct opacity if both versions are partially transparent.

A zoomed image of a tree top before cross-fade (low-detail billboard).

Tree top during cross-fade. The pattern could probably be made less regular with a better random distribution, but it is not bothersome in a non-zoomed image.

Tree top after cross-fade (high-detail 3D model).

Other considerations

The quality/performance trade-off of this algorithm can be adjusted by changing the number of multisamples and disabling temporal anti-aliasing or changing its parameters.

Much of the foliage consists of sets of two or three intersecting quads, each containing e.g. several leaves. When a quad is viewed at an oblique angle, its flatness becomes more visible and the leaves look less natural. To hide this, quads are gradually faded to transparent as the viewing angle approaches edge-on.
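One way to compute such a fade factor is sketched below, assuming normalized vectors and double-sided quads (hence the absolute value). The thresholds `fade_start` and `fade_end` and the smoothstep curve are hypothetical choices. The result would multiply the texture alpha before the stochastic coverage decision, so the fade needs no sorting either.

```python
import math


def oblique_fade(normal, view_dir, fade_start=0.5, fade_end=0.2):
    """Opacity multiplier for a foliage quad based on the viewing angle.

    cos_angle = |dot(normal, view_dir)| is 1 when the quad faces the camera
    and 0 when it is seen edge-on. Above fade_start the quad is fully kept,
    below fade_end it is fully faded, with a smoothstep in between.
    """
    cos_angle = abs(sum(n * v for n, v in zip(normal, view_dir)))
    t = (cos_angle - fade_end) / (fade_start - fade_end)
    t = min(max(t, 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)  # smoothstep


# A quad viewed halfway through the fade band (cos_angle = 0.35) keeps half its opacity.
half = oblique_fade((0.0, 0.0, 1.0), (math.sqrt(1.0 - 0.35 ** 2), 0.0, 0.35))
```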

Unlike traditional alpha-testing and alpha-to-coverage methods, this algorithm does not require textures to be preprocessed so that the fraction of texels exceeding each alpha-test/coverage threshold stays the same in all mipmap levels, which would otherwise be necessary to prevent textures from becoming more transparent in the distance (see Computing Alpha Mipmaps). It also avoids alpha-to-coverage's problem of overlapping semi-transparent surfaces looking too transparent when one is behind another: both surfaces occupy the same subsamples, so the surface behind adds little or no coverage beyond that of the surface in front.

Links

Order-independent transparency:

Temporal anti-aliasing: