Implemented path tracing using Direct3D 12 and DirectX Raytracing (DXR)
Implemented Reservoir-based Spatiotemporal Importance Resampling (ReSTIR) to sample lights during the direct lighting phase in many-light scenes, gaining a performance boost while retaining visual quality comparable to other importance sampling techniques.
Added a joint bilateral filter for denoising, producing a low-noise final image
Pipeline Overview
The pipeline is structured around five phases:
GBuffer and Reservoir Sampling: Stores the scene's geometric data in several GBuffer textures and performs reservoir sampling to choose the light that contributes most to the point the pixel is currently rendering.
Temporal Reuse: Combines the pixel's reservoir from the previous frame with its current-frame reservoir, effectively giving the reservoir access to many more light candidates at little extra cost.
Spatial Reuse: Combines each pixel's reservoir with up to 5 neighboring reservoirs chosen at random from within a 32-pixel radius.
Lighting: Uses the light chosen by the reservoir to color a point for direct lighting. Performs importance-sampled multi-bounce indirect lighting.
Denoise: A joint bilateral filter denoises the render by averaging each pixel with its neighbors, weighted by spatial distance, world-position similarity, and normal similarity to preserve edges, trading a small amount of detail for significantly reduced noise.
GBuffer and Reservoir Sampling
GBuffers in order: Position, BaseColor, Pixel normal, Surface Normal, Roughness, Metalness, Depth
Motion Vector/Velocity Buffer
The GBuffer pass has two parts. The first populates seven textures, each storing a distinct surface attribute needed by later passes:
Position Buffer - World-space hit position for each pixel's primary camera ray.
Shading Normal Buffer - Per-pixel normals derived from normal maps, used in lighting calculations.
Surface Normal Buffer - Geometric normals from the mesh, used to offset ray origins slightly above the surface to prevent self-shadowing.
Depth Buffer - Distance from each world-space point to the camera.
Motion Vector Buffer - Per-pixel 2D screen-space offset between the current and previous frame, used for temporal reprojection.
Base Color Buffer - Surface albedo sampled from the diffuse texture.
Roughness-Metalness Buffer - Surface material properties packed into the green (roughness) and blue (metalness) channels of a single texture.
Part two of this phase is reservoir sampling. Each pixel picks 32 lights from the scene uniformly at random, regardless of visibility or of where the light sits relative to the point being shaded. Each candidate is weighted by how much it would contribute to the point's color, and with probability proportional to that weight it replaces the light currently stored in the pixel's reservoir structure.
After all the candidates are processed, if the reservoir holds a light, its weight is recomputed from the number of candidates the reservoir has seen and the sum of their weights.
White pixels are the ones that chose shadowed lights
Render after reservoir sampling (no denoising or frame accumulation). Noise is the result of choosing shadowed lights
Temporal Reuse
During temporal reuse, each pixel's current-frame reservoir is combined with the previous frame's reservoir that shaded the same point. Motion vectors locate the corresponding pixel, which is validated by comparing normals and depth. When valid, merging the two reservoirs is equivalent to having sampled up to 640 candidate lights (20x the initial 32) at minimal additional cost, noticeably improving visual quality.
Image after temporal reuse. Slightly cleaner and better defined around the lights compared to the previous pass
Spatial Reuse
For spatial reuse, each pixel samples 5 other pixels at random from within a 32-pixel radius and validates them by comparing depth, rejecting neighbors that differ too much to avoid blending samples across geometrically unrelated surfaces. This lets good light samples found at one pixel propagate quickly across the image, producing a sharper and cleaner result than temporal reuse alone.
Image after spatial reuse. No significant difference; slightly better lit than before in some areas. Multiple spatial reuse passes could help, but at the cost of performance
Lighting
First, the direct lighting color of a point is evaluated using the light chosen by the pixel's reservoir. If the reservoir is invalid or the point is shadowed, the direct lighting contribution is zero. Otherwise, the surface's diffuse and specular BRDFs are evaluated with the Lambert and microfacet models respectively, based on the point's metalness and roughness; combining them with the light's incoming radiance, scaled by its reservoir weight, produces the final lighting color. Any remaining noise is eliminated by a few frames of accumulation.
Direct lighting only, after a few frames of accumulation
The bulk of the thesis is direct lighting. To close the path tracing loop while keeping things simple, I implemented 3-6 bounce indirect lighting using importance sampling to pick a direction and weighted reservoir sampling to choose a light. Starting at the primary ray's hit position, a bounce direction is sampled from the surface's BRDF, biased towards specular or diffuse based on the surface material, and a ray is traced in that direction. At each bounce hit, a light is selected and evaluated the same way as in direct lighting, with the contribution scaled by the remaining throughput at that bounce. Paths that carry too little energy are terminated early via Russian roulette, and any remaining noise is handled by accumulation across frames.
Direct and indirect lighting after a few frames of accumulation, with the denoiser enabled
Denoise
A joint bilateral filter is applied as a post process over the noisy render output to reduce noise. For each pixel, a weighted average is computed over a square neighborhood, where each neighbor's contribution is determined by three edge-stopping terms multiplied together: a Gaussian spatial falloff based on pixel distance, a position-based weight that rejects samples from geometrically distant surfaces, and a normal-based weight that prevents blurring across surface boundaries. The result is an image with very little noise; as a trade-off, some detail is lost and the image is slightly blurry.