A professional-grade, high-performance 3D software rendering engine written in C. This project implements a full custom graphics pipeline from scratch, optimised for CPU-bound environments. It features volumetric extrusion, multithreaded rasterisation, shadow mapping, distance-based LOD, and a configurable projection system with supersampling anti-aliasing and a full cinematic post-processing stack.
```bash
./renderer -t 10000000 \
  -sid 200 -seg 200 \
  -aces \
  -cz -70 -lz -70 -ly 2 \
  -color 240,20,30 \
  -threads 8 -iw 7680 -ih 4320 \
  -ow 1920 -oh 1080 -sw 4096 -sh 4096 -png -o stress.png
```

```bash
./renderer \
  -t 800000 \
  -seg 80 -sid 60 \
  -r 0.04 \
  -scale 1.8 \
  -ry 1.3 -rz 0.7 -rx 0.15 \
  -as 0.0025 \
  -cz -100 -cy 1.5 \
  -fov 95 \
  -focus 0,-2,-10 \
  -lx 12 -ly 8 -lz -22 \
  -bg 240,240,240 \
  -fog -fogcolor 60,80,140 -fogdensity 0.15 \
  -vignette 0.25 \
  -aces \
  -sw 1920 -sh 1080 \
  -iw 3840 -ih 2160 \
  -ow 1920 -oh 1080 \
  -png \
  -o cinematic_final.png
```

```bash
./renderer \
  -t 2000000 \
  -seg 150 -sid 250 \
  -r 0.04 \
  -scale 1.8 \
  -ry 1.3 -rz 0.7 -rx 0.15 \
  -as 0.0025 \
  -cz -75 -cy 1 \
  -fov 150 \
  -focus 0,-2,-10 \
  -lx 12 -ly 8 -lz -22 \
  -bg 250,250,180 -color 255,10,10 \
  -fog -fogcolor 70,90,150 -fogdensity 0.12 \
  -vignette 0.45 \
  -aces \
  -dof -focal 0.4 -aperture 3.5 \
  -bloom -bloomthreshold 0.65 -bloomintensity 0.35 \
  -sw 4096 -sh 4096 \
  -threads 8 \
  -iw 3840 -ih 2160 \
  -ow 1920 -oh 1080 \
  -png \
  -o ultimate_test.png
```

The system is designed as a modular pipeline where data flows from abstract mathematical definitions to a discrete pixel grid.
- `types.h`: Defines the core data structures (`Vec3`, `Mat4`, `AABB`) and engine constants, with runtime-configurable resolutions and thread count.
- `main.c`: Command-line interface, system initialisation, and the final output supersampling/downscaling plus all post-processing effects (background, fog, ACES, vignette, DOF, bloom).
- `geometry.c`: Implementation of Cubic Bezier evaluation (de Casteljau) and stable reference frame generation.
- `scene.c`: Orchestrates thread management (pthreads) and high-level pass logic (Shadow Pass vs. Render Pass) with dynamic thread count.
- `renderer.c`: The rasterisation engine, managing scan-line filling, Gouraud shading, shadow depth-testing, and LOD distance culling.
- `math.c`: Linear algebra suite including matrix multiplication, vertex transformation, and frustum culling logic.
The engine does not store static meshes. Instead, it extrudes geometry along a Cubic Bezier Curve.
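A cubic Bezier curve is defined by four control points, P0, P1, P2 and P3. `bezier_eval` calculates the 3D position at any parameter `t` in [0, 1] by repeated linear interpolation (de Casteljau's algorithm):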
```c
Vec3 bezier_eval(BezierCubic b, float t){
    float u = 1.0f - t;
    // First level of interpolation
    Vec3 a = { u*b.p0.x + t*b.p1.x, u*b.p0.y + t*b.p1.y, u*b.p0.z + t*b.p1.z };
    Vec3 c = { u*b.p1.x + t*b.p2.x, u*b.p1.y + t*b.p2.y, u*b.p1.z + t*b.p2.z };
    Vec3 d = { u*b.p2.x + t*b.p3.x, u*b.p2.y + t*b.p3.y, u*b.p2.z + t*b.p3.z };
    // Second level
    Vec3 e = { u*a.x + t*c.x, u*a.y + t*c.y, u*a.z + t*c.z };
    Vec3 f = { u*c.x + t*d.x, u*c.y + t*d.y, u*c.z + t*d.z };
    // Final point
    return (Vec3){ u*e.x + t*f.x, u*e.y + t*f.y, u*e.z + t*f.z };
}
```

To create a tube we also need an orientation. The `bezier_tangent` function computes the first derivative of the curve, which for control points P0..P3 is `B'(t) = 3(1-t)^2 (P1 - P0) + 6(1-t)t (P2 - P1) + 3t^2 (P3 - P2)`.
That tangent serves as the "forward" direction. By taking the cross product of the tangent and a chosen up-vector (with a fallback when they are parallel) we generate a stable local coordinate frame at every segment. A 2D circle of vertices is then rotated to be always perpendicular to the path, preventing the tube from becoming flat or twisted.
Because software rasterisation is computationally heavy, scene.c employs a data-parallel architecture.
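The engine divides the total `tube_count` into equal chunks, one per thread (the thread count is set with `-threads`), and the last thread also takes any remainder. Each thread processes its own chunk of tubes independently.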
```c
int per = scene.tube_count / num_threads;
for(int t = 0; t < num_threads; t++){
    jobs[t].start = t * per;
    jobs[t].end = (t == num_threads-1) ? scene.tube_count : (t+1)*per;
    pthread_create(&threads[t], NULL, render_thread, &jobs[t]);
}
```

- Shadow Pass - All threads render their tubes from the light's point of view into private depth buffers that are then merged into the global `shadow_map`.
- Render Pass - All threads render from the camera's perspective, performing depth-tests against the shared `zbuffer` and occlusion tests against the `shadow_map`.
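The merge step of the Shadow Pass can be pictured as a per-texel minimum over the private buffers, run once every worker has been joined with `pthread_join`. The sketch below is an assumption about how such a merge could look: the function name and signature are hypothetical, and it assumes smaller depth values are nearer to the light, which matches the comparisons used elsewhere in this README.

```c
#include <assert.h>
#include <stddef.h>

/* Keep the nearest (smallest) depth per texel across all per-thread buffers. */
void merge_depth_buffers(float *shadow_map, float *const *thread_bufs,
                         int num_threads, size_t texels) {
    assert(shadow_map && thread_bufs && num_threads > 0);
    for (size_t i = 0; i < texels; i++) {
        float nearest = thread_bufs[0][i];
        for (int t = 1; t < num_threads; t++)
            if (thread_bufs[t][i] < nearest)
                nearest = thread_bufs[t][i];
        shadow_map[i] = nearest;
    }
}
```

Because each thread writes only to its own buffer during the pass, no locking is needed until this single-threaded merge.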
To maintain interactive performance with millions of tubes, geometry outside the view frustum is discarded early.
Every tube is enclosed in an Axis-Aligned Bounding Box (AABB). The function `aabb_in_frustum` transforms the eight corners of this box into clip space using the View-Projection matrix. If at least one corner falls inside the canonical clip volume, the tube is treated as visible; otherwise it is culled:
```c
int aabb_in_frustum(AABB box, Mat4 mvp){
    Vec3 corners[8] = {
        {box.min.x, box.min.y, box.min.z}, …
    };
    for(int i = 0; i < 8; i++){
        Vec4 c = mat4_mul_vec4(mvp, (Vec4){corners[i].x, corners[i].y, corners[i].z, 1.0f});
        if(c.w > 0){
            float nx = c.x/c.w, ny = c.y/c.w, nz = c.z/c.w;
            if(nx >= -1 && nx <= 1 && ny >= -1 && ny <= 1 && nz >= 0 && nz <= 1.1f)
                return 1; // visible
        }
    }
    return 0; // culled
}
```

Once geometry is projected to screen space, `renderer.c` fills the resulting triangles with a scanline rasteriser.
The engine uses Gouraud shading: lighting intensity is computed per vertex and then linearly interpolated across the triangle. For every pixel the pipeline performs:
- Z-Buffer Test - Compare the pixel's depth against `zbuffer[y][x]`. If farther, skip.
- Shadow Test - The light-space coordinates of the pixel are likewise interpolated. The interpolated depth is compared to the value in the pre-computed `shadow_map`; if the pixel lies deeper than the stored depth (with a small bias), it is in shadow and its intensity is reduced.
```c
// Inside the scanline loop for each pixel
if(depth >= zrow[x]) continue;
float intensity = …, shadow_factor = 1.0f;
if(intensity > 0.35f){
    // Interpolate light-space coordinates
    LightCoord lc = { la.x + t*(lb.x - la.x), … };
    if(lc.x >= 0.0f && lc.x <= 1.0f && lc.y >= 0.0f && lc.y <= 1.0f){
        int sxi = (int)(lc.x * (shadow_w - 1));
        int syi = (int)(lc.y * (shadow_h - 1));
        if(lc.z > shadow_map[syi * shadow_w + sxi] + 0.001f)
            intensity *= 0.25f; // in shadow
    }
}
zrow[x] = depth;
row[x].r = (unsigned char)(intensity * cr);
// …
```

In addition, a distance-based LOD system inside `render_tube` dynamically reduces the segment and side counts for tubes far from the camera, trading detail for speed while preserving visual quality.
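One way to picture the LOD reduction is a linear falloff between a near and a far distance. The README does not give the actual thresholds or curve used by `render_tube`, so the function name, parameters and the linear mapping below are all hypothetical.

```c
#include <assert.h>

/* Scale a segment or side count from `full` (at or inside near_d) down to
   `minimum` (at or beyond far_d). All thresholds are hypothetical. */
int lod_count(int full, int minimum, float dist, float near_d, float far_d) {
    assert(minimum >= 1 && full >= minimum && far_d > near_d);
    if (dist <= near_d) return full;
    if (dist >= far_d)  return minimum;
    float k = (dist - near_d) / (far_d - near_d);      /* 0 near .. 1 far */
    int n = (int)((float)full + k * (float)(minimum - full) + 0.5f);
    return n < minimum ? minimum : n;
}
```

For example, with 80 segments at full detail and a floor of 8, a tube halfway between the two distances would be drawn with 44 segments.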
The renderer draws to a high-resolution internal framebuffer (configurable via `-iw`/`-ih`). For each final output pixel, the corresponding region of the high-res buffer is averaged using a box filter. At the default 2x ratio (3840x2160 internal, 1920x1080 output) this approximates 2x2 ordered-grid supersampling (SSAA) and smooths jagged edges while keeping fine detail.
```c
// Downsampling loop
for(int y = 0; y < out_h; y++){
    for(int x = 0; x < out_w; x++){
        float u = (x + 0.5f) / out_w, v = (y + 0.5f) / out_h;
        float du = 0.5f / out_w, dv = 0.5f / out_h;
        int sx0 = (int)((u-du)*render_width);  if(sx0 < 0) sx0 = 0;
        int sx1 = (int)((u+du)*render_width);  if(sx1 >= render_width) sx1 = render_width-1;
        int sy0 = (int)((v-dv)*render_height); if(sy0 < 0) sy0 = 0;
        int sy1 = (int)((v+dv)*render_height); if(sy1 >= render_height) sy1 = render_height-1;
        int r=0, g=0, b=0, count=0;
        for(int sy = sy0; sy <= sy1; sy++)
            for(int sx = sx0; sx <= sx1; sx++){
                Pixel *p = &fb[sy * render_width + sx];
                r += p->r; g += p->g; b += p->b; count++;
            }
        unsigned char px[3] = { r/count, g/count, b/count };
        fwrite(px, 1, 3, outfile);
    }
}
```

The internal and output resolutions are independent, giving users full control over quality vs. performance. Output is supported in PPM (default) or PNG via a flag.
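Writing three raw bytes per pixel, as the downsampling loop does, matches the binary PPM (P6) layout, which only needs a short text header in front of the pixel data. The helpers below are a sketch under that assumption; `ppm_header` and `write_ppm` are hypothetical names, not functions from `main.c`.

```c
#include <assert.h>
#include <stdio.h>

/* Format a binary PPM (P6) header: magic, width, height, max channel value.
   Returns the header length, or -1 if it does not fit in buf. */
int ppm_header(char *buf, size_t cap, int width, int height) {
    assert(buf && width > 0 && height > 0);
    int n = snprintf(buf, cap, "P6\n%d %d\n255\n", width, height);
    return (n > 0 && (size_t)n < cap) ? n : -1;
}

/* Write the header followed by width*height packed RGB triples. */
int write_ppm(FILE *f, const unsigned char *rgb, int width, int height) {
    char hdr[64];
    int n = ppm_header(hdr, sizeof hdr, width, height);
    if (n < 0 || fwrite(hdr, 1, (size_t)n, f) != (size_t)n) return -1;
    size_t bytes = (size_t)width * (size_t)height * 3;
    return fwrite(rgb, 1, bytes, f) == bytes ? 0 : -1;
}
```

The file should be opened in binary mode (`"wb"`) so that the pixel bytes are not altered on platforms that translate line endings.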
Raw linear RGB values can clip harshly in bright areas, losing detail in highlights and making the image look synthetic. The engine optionally applies an ACES filmic tone map, a curve that smoothly rolls off the highlights while preserving shadow detail, giving every render a cinematic, film-like quality.
The implementation uses the Narkowicz approximation to the ACES Reference Render Transform, a rational function of the form `f(x) = x(2.51x + 0.03) / (x(2.43x + 0.59) + 0.14)`, applied to each colour channel and clamped to [0, 1]:

```c
// ACES applied per-channel as a fast post-process
float r = pixel.r / 255.0f;
r = (r * (2.51f * r + 0.03f)) / (r * (2.43f * r + 0.59f) + 0.14f);
// clamp and convert back to 8-bit
pixel.r = (unsigned char)(fmaxf(0.0f, fminf(1.0f, r)) * 255.0f);
```

The engine offers optional post-processing effects that are applied as fast per-pixel passes over the finished framebuffer.
- Background Fill: Replaces untouched pixels (sky) with a user-defined colour.
- Exponential Depth Fog: Simulates atmospheric scattering by blending tube colours toward a fog colour as depth increases, using `fog_factor = 1 - e^(-depth * density)`. Fog colour and density are configurable.
- Vignette: Darkens the image corners with a soft radial gradient, drawing the viewer's eye to the centre. Strength is adjustable.

All three are toggleable independently via the CLI.
A configurable depth-of-field effect blurs objects that are far from the focal plane, simulating camera lens focus. The implementation uses a pre-blurred image pyramid (full, half, quarter resolution) and bilinear sampling to produce a smooth, physically plausible blur with minimal performance impact.
Bloom adds a soft glow around bright areas. The effect downsamples the framebuffer to half resolution, thresholds to isolate bright pixels, applies a separable Gaussian blur, and then up-samples and adds the result back to the original image. Threshold, intensity, and blur width are fully controllable.
- `-t <int>`: Total number of tubes (default 100000).
- `-seg <int>`: Longitudinal segments per tube. Higher values give smoother curves.
- `-sid <int>`: Radial sides per tube. `3` = triangular, `12`+ = smooth cylinder.
- `-r <float>`: Tube radius.
- `-scale <float>`: Global scale of the scene.
- `-p0 <x,y,z>`: Origin point of the spline.
- `-p1 <x,y,z>`: First control point (influences curve exit).
- `-p2 <x,y,z>`: Second control point (influences curve entry).
- `-p3 <x,y,z>`: Destination point of the spline.
- `-rx`, `-ry`, `-rz <float>`: Rotation multipliers (per-tube angle step x index).
- `-rcx`, `-rcy`, `-rcz <float>`: Constant rotation offsets.
- `-as <float>`: Angle step - increment applied to rotation per successive tube.
- `-tx`, `-ty`, `-tz <float>`: Global translation (world position offset).
- `-mtx`, `-mty`, `-mtz <float>`: Translation multipliers that scale the per-index step.
- `-ts <float>`: Translate step - linear offset per tube index, used with the multipliers.
- `-cx`, `-cy`, `-cz <float>`: Camera position.
- `-fov <float>`: Field of View in degrees.
- `-focus <x,y,z>`: LookAt target (default 0,0,0).
- `-lx`, `-ly`, `-lz <float>`: Point light position (affects shading and shadows).
- `-rgb`: Enable rainbow-cycling colours based on tube index.
- `-color <r,g,b>`: Set a static colour (e.g. `255,128,0`).
- `-cycles <float>`: Number of full hue cycles when using `-rgb` (default 1.0).
- `-aces`: Applies ACES filmic tone mapping.
- `-bg <r,g,b>`: Background colour for empty pixels.
- `-fog`: Enables exponential depth fog.
- `-fogcolor <r,g,b>`: Fog colour (default 180,200,255).
- `-fogdensity <float>`: Fog density (default 0.15, higher = thicker).
- `-vignette [strength]`: Enables vignette darkening with optional strength (default 0.4).
- `-dof`: Enables depth of field.
- `-focal <float>`: Focal depth in NDC (0..1, default 0.5).
- `-aperture <float>`: Blur amount for DOF (default 8.0).
- `-bloom`: Enables bloom (glow).
- `-bloomthreshold <float>`: Brightness threshold for bloom (0..1, default 0.7).
- `-bloomintensity <float>`: Bloom strength (default 0.4).
- `-threads <int>`: Number of rendering threads (default 8).
- `-iw <int>`: Internal render width (default 3840).
- `-ih <int>`: Internal render height (default 2160).
- `-ow <int>`: Output image width (default 1920).
- `-oh <int>`: Output image height (default 1080).
- `-sw <int>`: Shadow map width (default 1024).
- `-sh <int>`: Shadow map height (default 1024).
- `-png`: Output as PNG instead of PPM.
- `-o <string>`: Output filename (default `output.ppm`, or `output.png` if `-png` is used).
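Several options above take a comma-separated `r,g,b` triple. A strict parser for that form could look like the sketch below; `parse_rgb` is a hypothetical helper, and the actual parsing and validation rules in `main.c` may differ.

```c
#include <assert.h>
#include <stdio.h>

/* Parse "r,g,b" with each component in 0..255.
   Returns 1 on success, 0 on malformed or out-of-range input. */
int parse_rgb(const char *arg, unsigned char out[3]) {
    int r, g, b;
    char extra;
    assert(arg && out);
    /* The trailing %c only converts if junk follows the third number. */
    if (sscanf(arg, "%d,%d,%d%c", &r, &g, &b, &extra) != 3) return 0;
    if (r < 0 || r > 255 || g < 0 || g > 255 || b < 0 || b > 255) return 0;
    out[0] = (unsigned char)r;
    out[1] = (unsigned char)g;
    out[2] = (unsigned char)b;
    return 1;
}
```

Rejecting out-of-range values up front avoids silent wrap-around when the components are later stored as 8-bit channels.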
The renderer is written in standard C99 (compatible with C11) and uses POSIX threads.
You can build it with either Make (Linux/macOS/MinGW) or CMake (all platforms including MSVC).
A Makefile is included for users who prefer a traditional build.
```bash
# Build (use -j$(nproc) for parallel compilation)
make -j$(nproc)
# Clean build artifacts
make clean
# (Optional) Generate compile_commands.json for LSP/IDE support
bear -- make
```

Compile flags used by the Makefile:
| Flag | Purpose |
|---|---|
| `-O3` | Maximum optimisation |
| `-march=native` | Enable all instruction-set extensions supported by the build machine's CPU |
| `-ffast-math` | Relax IEEE float compliance for speed |
| `-funroll-loops` | Unroll small loops |
| `-flto` | Link-time optimisation |
| `-pthread` | Link POSIX threads |
A `CMakeLists.txt` is provided for a modern, cross-platform build.
It automatically handles compiler flags, threading, and maths linking.

```bash
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build
```

On Windows (MSVC), first install pthreads via vcpkg (one-time):

```powershell
git clone https://github.com/Microsoft/vcpkg.git
cd vcpkg
.\bootstrap-vcpkg.bat
.\vcpkg integrate install
.\vcpkg install pthreads:x64-windows
```

Then build (replace `[vcpkg-root]` with your actual vcpkg path):

```powershell
cmake -B build -DCMAKE_BUILD_TYPE=Release `
  -DCMAKE_TOOLCHAIN_FILE=[vcpkg-root]\scripts\buildsystems\vcpkg.cmake
cmake --build build --config Release
```

The executable will be `build/TubeRenderer` (or `build/Release/TubeRenderer.exe` on Windows).

```bash
./build/TubeRenderer -ow 1920 -oh 1080 -png -o output.png
```

(On Windows use `.\build\Release\TubeRenderer.exe`.)
- Thread count is auto-detected; you can override it with `-threads <N>`.
- Pthreads is built in on Unix and supplied by vcpkg on Windows, so no further setup is required.
- The CMake build applies the same high-performance flags as the Makefile when possible, adapting them for MSVC automatically.
## License
This source code is provided as an open-source reference for high-performance software rendering techniques.







