
GPU Gems 3





Chapter 1. Generating Complex Procedural Terrains Using the GPU

Ryan Geiss
NVIDIA Corporation

1.1 Introduction

Procedural terrains have traditionally been limited to height fields that are generated by the CPU and rendered by the GPU. However, the serial processing nature of the CPU is not well suited to generating extremely complex terrains—a highly parallel task. Plus, the simple height fields that the CPU can process do not offer interesting terrain features (such as caves or overhangs).

To generate procedural terrains with a high level of complexity, at interactive frame rates, we look to the GPU. By utilizing several new DirectX 10 capabilities such as the geometry shader (GS), stream output, and rendering to 3D textures, we can use the GPU to quickly generate large blocks of complex procedural terrain. Together, these blocks create a large, detailed polygonal mesh that represents the terrain within the current view frustum. Figure 1-1 shows an example.

1.2 Marching Cubes and the Density Function

Conceptually, the terrain surface can be completely described by a single function, called the density function. For any point in 3D space (x, y, z), the function produces a single floating-point value. These values vary over space—sometimes positive, sometimes negative. If the value is positive, then that point in space is inside the solid terrain.

If the value is negative, then that point is located in empty space (such as air or water). The boundary between positive and negative values—where the density value is zero—is the surface of the terrain. It is along this surface that we wish to construct a polygonal mesh.

We use the GPU to generate polygons for one "block" of terrain at a time, but we further subdivide the block into 32x32x32 smaller cells, or voxels. Figure 1-2 illustrates the coordinate system. It is within these voxels that we will construct polygons (triangles) that represent the terrain surface. The marching cubes algorithm allows us to generate the correct polygons within a single voxel, given, as input, the density value at its eight corners. As output, it will produce anywhere from zero to five polygons. If the densities at the eight corners of a cell all have the same sign, then the cell is entirely inside or outside the terrain, so no polygons are output. In all other cases, the cell lies on the boundary between rock and air, and anywhere from one to five polygons will be generated.


Figure 1-2 The Coordinate System Used for Voxel Space

1.2.1 Generating Polygons Within a Cell

The generation of polygons within a cell works as follows: As shown in Figure 1-3, we take the density values at the eight corners and determine whether each value is positive or negative. From each one we make a bit. If the density is negative, we set the bit to zero; if the density is positive, we set the bit to one.


Figure 1-3 A Single Voxel with Known Density Values at Its Eight Corners

We then logically concatenate (with a bitwise OR operation) these eight bits to produce a byte—also called the case—in the range 0–255. If the case is 0 or 255, then the cell is entirely inside or outside the terrain and, as previously described, no polygons will be generated. However, if the case is in the range [1..254], some number of polygons will be generated.
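
To make the bit packing concrete, here is a small CPU-side sketch (in Python, not part of the original demo) that builds the case index from eight corner densities. The corner-to-bit ordering is a convention; any fixed ordering works as long as the lookup tables use the same one:

```python
def corner_densities_to_case(densities):
    """Pack the signs of the 8 corner densities into a marching cubes
    case index in [0, 255]. Corner i contributes bit i."""
    case = 0
    for i, d in enumerate(densities):
        if d > 0:          # positive density = inside the terrain
            case |= 1 << i
    return case

# All corners in air -> case 0 (no polygons generated)
print(corner_densities_to_case([-1.0] * 8))   # 0
# All corners in rock -> case 255 (no polygons generated)
print(corner_densities_to_case([1.0] * 8))    # 255
```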

If the case is not 0 or 255, it is used to index into various lookup tables (on the GPU, constant buffers are used) to determine how many polygons to output for that case, as well as how to build them. Each polygon is created by connecting three points (vertices) that lie somewhere on the 12 edges of the cell. Figure 1-4 illustrates the basic cases resulting from application of the marching cubes algorithm.


Figure 1-4 The 14 Fundamental Cases for Marching Cubes

Exactly where a vertex is placed along an edge is determined by interpolation. The vertex should be placed where the density value is approximately zero. For example, if the density at end A of the edge is 0.1 and at end B is -0.3, the vertex would be placed 25 percent of the way from A to B.
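
The interpolation step is simple enough to state in a few lines. The following Python snippet (an illustration, not demo code) computes the fractional position of the zero crossing along an edge:

```python
def edge_vertex_t(d_a, d_b):
    """Fraction of the way from corner A to corner B at which the
    density crosses zero (linear interpolation). Assumes d_a and d_b
    have opposite signs."""
    return d_a / (d_a - d_b)

# Density 0.1 at A and -0.3 at B places the vertex 25% of the way from A to B.
print(edge_vertex_t(0.1, -0.3))  # approximately 0.25
```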

Figure 1-5 illustrates one case. After the case is used to index into lookup tables, the blue dots indicate which edges must have vertices placed on them. Gray areas show how these vertices will be connected to make triangles. Note that where the blue dots actually appear along the edges depends on the density values at the ends of the edge.

Our output is a triangle list, so every three vertices that are output create a triangle, and the next vertex output begins a new triangle. If a certain case requires that we generate N polygons, we will need to generate a vertex (somewhere along one of the cell's edges) 3xN times.

1.2.2 Lookup Tables

Two primary lookup tables are at work here. The first, when indexed by the case number, tells us how many polygons to create for that case:

          int case_to_numpolys[256];        

The second lookup table is much larger. Once it receives the case number, the table provides the information needed to build up to five triangles within the cell. Each of the five triangles is described by just an int3 value (three integers); the three values are the edge numbers [0..11] on the cube that must be connected in order to build the triangle. Figure 1-6 shows the edge-numbering scheme.

          int3 edge_connect_list[256][5];        


Figure 1-6 The Edge Numbers Assigned to the 12 Edges of a Voxel

For example, if the case number is 193, then looking up case_to_numpolys[193] would tell us how many polygons we need to generate, which is three. Next, the edge_connect_list[193][] lookups would return the following values:

          int3 edge_connect_list[193][0]:  11  5 10
          int3 edge_connect_list[193][1]:  11  7  5
          int3 edge_connect_list[193][2]:   8  3  0
          int3 edge_connect_list[193][3]:  -1 -1 -1
          int3 edge_connect_list[193][4]:  -1 -1 -1

To build the triangles within this cell, a geometry shader would generate and stream out nine vertices (at the appropriate places along the edges listed)—forming three triangles—to a vertex buffer for storage. Note that the last two int3 values are -1; these values will never even be sampled, though, because we know there are only three triangles for this case. The GPU would then move on to the next cell.
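
The table-driven flow for a single cell can be sketched on the CPU as follows. The Python tables below are hypothetical fragments containing only the row for case 193 quoted above; the real tables have 256 entries each:

```python
# Hypothetical fragments of the two lookup tables, filled in only for
# case 193 (values taken from the text); the real tables have 256 rows.
case_to_numpolys = {193: 3}
edge_connect_list = {
    193: [(11, 5, 10), (11, 7, 5), (8, 3, 0), (-1, -1, -1), (-1, -1, -1)]
}

def triangles_for_case(case):
    """Return the list of edge triples to connect for this case;
    the unused -1 entries are never read."""
    n = case_to_numpolys[case]
    return edge_connect_list[case][:n]

print(triangles_for_case(193))  # [(11, 5, 10), (11, 7, 5), (8, 3, 0)]
```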

We encourage you to copy the lookup tables from the demo on this book's accompanying DVD, because generating the tables from scratch can be time-consuming. The tables can be found in the file models\tables.nma.

1.3 An Overview of the Terrain Generation System

We divide the world into an infinite number of equally sized cubic blocks, as already described. In the world-space coordinate system, each block is 1x1x1 in size. However, within each block are 32^3 voxels that potentially contain polygons. A pool of around 300 vertex buffers is dynamically assigned to the blocks currently visible in the view frustum, with higher priority given to the closest blocks. As new blocks come into the view frustum (as the user moves around), the farthest-away or newly view-cullable vertex buffers are evicted and reused for the newly needed blocks.

Not all blocks contain polygons. Whether they do or not depends on complex calculations, so usually we won't know if they contain polygons until we try to generate the blocks. As each block is generated, a stream-out query asks the GPU if any polygons were actually created. Blocks that don't produce polygons—this is common—are flagged as "empty" and put into a list so they won't be uselessly regenerated. This also prevents those empty blocks from unnecessarily occupying a vertex buffer.

For each frame, we sort all the vertex buffers (their bounding boxes are known) from front to back. We then generate any new blocks that are needed, evicting the most distant block whenever we need a free vertex buffer. Finally, we render the sorted blocks from front to back so that the GPU doesn't waste time shading pixels that might be occluded by other parts of the terrain.

1.3.1 Generating the Polygons Within a Block of Terrain

Conceptually, generating a block of terrain involves two main steps. We outline the steps here and then elaborate upon them in the following subsections.

  1. First, we use the GPU's pixel shader (PS) unit to evaluate the complex density function at every cell corner within the block and store the results in a large 3D texture. The blocks are generated one at a time, so one 3D texture can be shared universally. However, because the texture stores the density values at the cell corners, the texture is 33x33x33 in size, rather than 32x32x32 (the number of cells in the block).
  2. Next, we visit each voxel and generate actual polygons within it, if necessary. The polygons are streamed out to a vertex buffer, where they can be kept and repeatedly rendered to the screen until they are no longer visible.

1.3.2 Generating the Density Values

Rendering to a 3D texture is a somewhat new idea, and it's worthy of some explanation here. On the GPU, a 3D texture is implemented as an array of 2D textures. To run a PS that writes to every pixel in a slice, we draw two triangles that, together, cover the viewport. To cover all the slices, we use instancing. In DirectX, this means nothing more than calling ID3D10Device::DrawInstanced() (rather than the ordinary Draw() function) with the numInstances parameter set to 33. This procedure effectively draws the triangle pair 33 times.

The vertex shader (VS) knows which instance is being drawn by specifying an input attribute using the SV_InstanceID semantic; these values will range from 0 to 32, depending on which instance is being drawn. The VS can then pass this value on to the geometry shader, which writes it out to an attribute with the SV_RenderTargetArrayIndex semantic. This semantic determines to which slice of the 3D texture (the render target array) the triangle actually gets rasterized. In this way, the PS is run on every pixel in the entire 3D texture. On the book's DVD, see shaders\1b_build_density_vol.vsh and .gsh.

Conceptually, the PS that shades these triangles takes, as input, a world-space coordinate and writes, as output, a single, floating-point density value. The math that converts the input into the output is our density function.

1.3.3 Making an Interesting Density Function

The sole input to the density function is this:

          float3 ws;        

This value is the world-space coordinate. Luckily, shaders give us plenty of useful tools to translate this value into an interesting density value. Some of the tools at our disposal include the following:

  • Sampling from source textures, such as 1D, 2D, 3D, and cube maps
  • Constant buffers, such as lookup tables
  • Mathematical functions, such as cos(), sin(), pow(), exp(), frac(), floor(), and arithmetic

For example, a good starting point is to place a ground plane at y = 0:

          float density = -ws.y;        

This divides the world into positive values, those below the y = 0 plane (let's call that earth), and negative values, those above the plane (which we'll call air). A good start! Figure 1-7 shows the result.

Next, let's make the ground more interesting by adding a bit of randomness, as shown in Figure 1-8. We simply use the world-space coordinate (ws) to sample from a small (16^3) repeating 3D texture full of random ("noise") values in the range [-1..1], as follows:

          density += noiseVol.Sample(TrilinearRepeat, ws).x;        

Figure 1-8 shows how the ground plane warps when a single octave of noise is added.

Note that we can scale ws prior to the texture lookup to change the frequency (how quickly the noise varies over space). We can also scale the result of the lookup before adding it to the density value; scaling changes the amplitude, or strength, of the noise. To generate the image in Figure 1-9, the following line of shader code uses a noise "function" with twice the frequency and half the amplitude of the previous example:

          density += noiseVol.Sample(TrilinearRepeat, ws*2).x*0.5;        


Figure 1-9 One Octave of Noise with Twice the Frequency but Half the Amplitude of the Previous Example

One octave (sample) of noise isn't that interesting; using three octaves is an improvement, as Figure 1-10 shows. To be optimal, the amplitude of each octave should be half that of the previous octave, and the frequency should be roughly double the frequency of the previous octave. It's important not to make the frequency exactly double, though. The interference of two overlapping, repeating signals at slightly different frequencies is beneficial here because it helps break up repetition. Note that we also use three different noise volumes.


Figure 1-10 Three Octaves at High Frequency Generate More Details

          density += noiseVol1.Sample(TrilinearRepeat, ws*4.03).x*0.25;
          density += noiseVol2.Sample(TrilinearRepeat, ws*1.96).x*0.50;
          density += noiseVol3.Sample(TrilinearRepeat, ws*1.01).x*1.00;

If more octaves of noise are added at progressively lower frequencies (and higher amplitudes), larger terrain structures begin to emerge, such as large mountains and trenches. In practice, we need about nine octaves of noise to create a world that is rich in both of these low-frequency features (such as mountains and canyons), but that also retains interesting high-frequency features (random detail visible at close range). See Figure 1-11.
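
The octave-summing scheme can be mimicked on the CPU. The Python sketch below substitutes a cheap sine-hash function for the tiled 3D noise textures (purely an assumption for illustration; the demo samples real noise volumes), but the structure (a ground plane plus octaves at roughly doubling frequency and halving amplitude) is the same:

```python
import math

def fake_noise(x, y, z):
    """Stand-in for a tiling 3D noise texture lookup, returning a
    deterministic pseudo-random value in [-1, 1]. A real implementation
    would sample a precomputed volume with trilinear filtering."""
    return math.sin(x * 12.9898 + y * 78.233 + z * 37.719)

def density(ws):
    x, y, z = ws
    d = -y                                   # ground plane at y = 0
    # Octaves: roughly doubling frequency, halving amplitude; the
    # frequencies are deliberately not exact multiples of each other.
    d += fake_noise(x * 4.03, y * 4.03, z * 4.03) * 0.25
    d += fake_noise(x * 1.96, y * 1.96, z * 1.96) * 0.50
    d += fake_noise(x * 1.01, y * 1.01, z * 1.01) * 1.00
    return d

# Deep below the ground plane the density is positive (solid)...
print(density((0.0, -5.0, 0.0)) > 0)  # True
# ...and far above it, negative (air).
print(density((0.0, 5.0, 0.0)) < 0)   # True
```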


Figure 1-11 Adding Lower Frequencies at Higher Amplitude Creates Mountains

Sampling Tips

It's worth going over a few details concerning sampling the various octaves. First, the world-space coordinate can be used, without modification, to sample the seven or eight highest-frequency octaves. However, for the lowest octave or two, the world-space coordinate should first be slightly rotated (by 3x3 rotation matrices) to reduce repetition of the most salient features of the terrain. It's also wise to reuse three or four noise textures among your nine octaves to improve cache coherency; this reuse will not be noticeable at significantly different scales.

Finally, precision can begin to break down when we sample extremely low frequency octaves of noise. Error begins to show up as (unwanted) high-frequency noise. To work around this, we manually implement trilinear interpolation when sampling the very lowest octave or two, using full floating-point precision. For more information on how to do this, see the comments and code in shaders\density.h.

Using many octaves of noise creates detail that is isotropic (equal in all directions), which sometimes looks a bit too regular. One way to break up this homogeneity is to warp the world-space coordinate by another (low-frequency) noise lookup, before using the coordinate for the nine noise lookups. At a medium frequency and mild amplitude, the warp creates a surreal, ropey, organic-looking terrain, as shown in Figure 1-12. At a lower frequency and higher amplitude, the warp can increase the occurrence of caves, tunnels, and arches. The effects can easily be combined, of course, by summing two octaves.

          // Do this before using 'ws' to sample the nine octaves!
          float3 warp = noiseVol2.Sample( TrilinearRepeat, ws*0.004 ).xyz;
          ws += warp * 8;


Figure 1-12 Warping the World-Space Coordinate Creates a Surreal-Looking Terrain

We can also introduce a hard "floor" to the scene by very sharply boosting the density values below a certain y coordinate. To some degree, this mimics sediment deposition in nature, as eroded sediment finds its way to lower areas and fills them in. Visually, it makes the scene feel less aquatic and more landlike, as shown in Figure 1-13.

          float hard_floor_y = -13;
          density += saturate((hard_floor_y - ws_orig.y)*3)*40;
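
In Python terms (with saturate() written out explicitly), the hard floor behaves like this; the floor height of -13 and the constants 3 and 40 are the values from the shader snippet above:

```python
def saturate(x):
    """HLSL-style clamp to [0, 1]."""
    return max(0.0, min(1.0, x))

def apply_hard_floor(density, y, hard_floor_y=-13.0):
    """Sharply boost density just below hard_floor_y. The ramp
    saturates within 1/3 of a world unit, so the boost of 40 switches
    on almost immediately below the floor."""
    return density + saturate((hard_floor_y - y) * 3) * 40

# Just above the floor: density is unchanged.
print(apply_hard_floor(-1.0, -12.0))  # -1.0
# Below the floor: density is boosted by the full 40.
print(apply_hard_floor(-1.0, -14.0))  # 39.0
```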

Many other interesting effects can be achieved by adjusting the density value based on the world-space y coordinate. We can make the adjustment by using either the prewarp or the post-warp world-space coordinate. If we use the post-warp coordinate, then under its influence any effects that look like shelves or terraces will appear to melt and sag, along with the rest of the terrain. Figures 1-14 and 1-15 show two examples.

These are very basic techniques. Myriad effects are possible, of course, and can be easily prototyped by modifying shaders\density.h.

1.3.4 Customizing the Terrain

All of the techniques we have discussed so far contribute to producing a nice, organic-looking terrain. However, for practical use, we must be able to control the shape of the terrain. There are many ways to do this.

Use a Hand-Painted 2D Texture

Stretch a hand-painted 2D texture over a very large area (one that would take, say, ten minutes to "walk across" in your demo or game). The density function could sample this 2D texture using ws.xz and use the result to drive the noise lookups. The red channel of the texture could influence the scaling of ws.y before using it for the lookups, resulting in the noise appearing vertically squished in some areas or stretched in others. The green channel could modulate the amplitudes of the higher-frequency octaves of noise, causing the terrain to look rocky in some places and smoother in others. And the blue channel could even modulate the warp effect discussed earlier, so that some areas look more "alien" than others.

Add Manually Controlled Influences

It's also possible to add manually controlled influences to the density field. So, if your game level needs a flat spot for a ship to land, as in Figure 1-16, you could pass data describing it to the pixel shader in a constant buffer. The shader code might then look something like the following:

          float distance_from_flat_spot = length(ws.xz - flat_spot_xz_coord);
          float flatten_amount = saturate( (outer_radius - distance_from_flat_spot) /
                                           (outer_radius - inner_radius) ) * 0.9;
          density = lerp(density, ws.y - flat_spot_height, flatten_amount);


Figure 1-16 This "Man-Made" Flat Spot Creates an Ideal Landing Place for a Ship

Here, the density function will be 90 percent replaced by the "flat spot" function within inner_radius world-space units of its center; however, by a distance of outer_radius, the influence drops off to zero (saturate() clamps a value to the 0..1 range). In addition, many flat spots could be used across your terrain—at varying heights, weights, and radii of influence. These flat spots are obtained for the cost of just one flat spot, as long as there is enough distance between them that an individual block of terrain will only be affected by, at most, one flat spot at a time. (Of course, dynamic looping could also be used if you need more than one per block.) The application is responsible for updating the constant buffer with the relevant data whenever a block is to be generated.
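
The flat-spot math is easy to verify on the CPU. In the Python sketch below, the radii and center are made-up example values; the 0.9 blend factor and the lerp toward the plane ws.y - flat_spot_height follow the shader code above:

```python
import math

def saturate(x):
    return max(0.0, min(1.0, x))

def lerp(a, b, t):
    return a + (b - a) * t

def flatten_density(density, ws, flat_xz=(0.0, 0.0), flat_height=0.0,
                    inner_radius=4.0, outer_radius=8.0):
    """Blend the density toward a flat plane at flat_height near the
    flat-spot center; the influence falls from 0.9 at inner_radius
    down to 0 at outer_radius. (Radii/center are example values.)"""
    dist = math.hypot(ws[0] - flat_xz[0], ws[2] - flat_xz[1])
    amount = saturate((outer_radius - dist) /
                      (outer_radius - inner_radius)) * 0.9
    return lerp(density, ws[1] - flat_height, amount)

# At the center, the density is 90% replaced by the flat-plane term.
print(flatten_density(5.0, (0.0, 0.0, 0.0)))   # approximately 0.5
# Beyond outer_radius, the density is untouched.
print(flatten_density(5.0, (20.0, 0.0, 0.0)))  # 5.0
```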

Add Miscellaneous Effects

Other effects are possible by combining all the previous techniques—from caves, to cliffs, to painted maps of rivers. If you can imagine it, you can probably code it. Want a spherical planet, as in Figure 1-17? Instead of using the y plane as a ground, try this (plus noise):

          float rad = 80;
          float density = rad - length(ws - float3(0, -rad, 0));

Or, for a never-ending 3D network of caves, try this (plus noise):

          // This positive starting bias gives us a little more rock than open space.
          float density = 12;

The density function used in the demo can be modified by editing shaders\density.h, which is included by several other shaders in the demo. If you'd like to make changes in real time, do this: Run the demo in windowed mode (see bin\args.txt), edit the density function to your liking, and then press F9 (Reload Shaders) followed by "Z" (Regenerate Terrain).

1.4 Generating the Polygons Within a Block of Terrain

There are many ways to break up the task of generating a block of terrain on the GPU. In the simplest approach, we generate density values throughout a 3D texture (representing the corners of all the voxels in the block) in one render pass. We then run a second render pass, where we visit every voxel in the density volume and use the GS to generate (and stream out to a vertex buffer) anywhere from 0 to 15 vertices in each voxel. The vertices will be interpreted as a triangle list, so every 3 vertices make a triangle.

For now, let's focus on what we need to do to generate just one of the vertices. There are several pieces of information we'd like to know and store for each vertex:

  • The world-space coordinate
  • The world-space normal vector (used for lighting)
  • An "ambient occlusion" lighting value

These data can be easily represented by the seven floats in the following layout. Note that the ambient occlusion lighting value is packed into the .w channel of the first float4.

          struct rock_vertex {
             float4 wsCoordAmbo;
             float3 wsNormal;
          };

The normal can be computed easily, by taking the gradient of the density function (the partial derivative, or instantaneous rate of change, in the x, y, and z directions) and then normalizing the resulting vector. This is easily accomplished by sampling the density volume six times. To determine the rate of change in x, we sample the density volume at the next texel in the +x direction, then again at the next texel in the -x direction, and take the difference; this is the rate of change in x. We repeat this calculation in the y and z directions, for a total of six samples. The three results are put together in a float3, and then normalized, producing a very high quality surface normal that can subsequently be used for lighting. Listing 1-1 shows the shader code.

Listing 1-1. Computing the Normal via a Gradient

          float d = 1.0/(float)voxels_per_block;
          float3 grad;
          grad.x = density_vol.Sample(TrilinearClamp, uvw + float3( d, 0, 0)) -
                   density_vol.Sample(TrilinearClamp, uvw + float3(-d, 0, 0));
          grad.y = density_vol.Sample(TrilinearClamp, uvw + float3( 0, d, 0)) -
                   density_vol.Sample(TrilinearClamp, uvw + float3( 0,-d, 0));
          grad.z = density_vol.Sample(TrilinearClamp, uvw + float3( 0, 0, d)) -
                   density_vol.Sample(TrilinearClamp, uvw + float3( 0, 0,-d));
          output.wsNormal = -normalize(grad);
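
The same central-difference scheme works on any scalar density field. Here is a small Python analogue (illustrative only, not demo code) using an analytic density for a sphere, where the correct normal is known in advance:

```python
import math

def density(x, y, z):
    # Example density: a solid sphere of radius 3 centered at the origin.
    return 3.0 - math.sqrt(x*x + y*y + z*z)

def surface_normal(x, y, z, d=0.01):
    """Central-difference gradient of the density field, negated and
    normalized. The negation makes the normal point out of the rock,
    because density decreases toward the air."""
    grad = (
        density(x + d, y, z) - density(x - d, y, z),
        density(x, y + d, z) - density(x, y - d, z),
        density(x, y, z + d) - density(x, y, z - d),
    )
    length = math.sqrt(sum(g * g for g in grad))
    return tuple(-g / length for g in grad)

# On the sphere's surface at (3, 0, 0), the normal points along +x.
print(surface_normal(3.0, 0.0, 0.0))  # approximately (1.0, 0.0, 0.0)
```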

The ambient occlusion lighting value represents how much light, in general, is likely to reach the vertex, based on the surrounding geometry. This value is responsible for darkening the vertices that lie deep within nooks, crannies, and trenches, where less light would be able to penetrate. Conceptually, we could generate this value by first placing a large, uniform sphere of ambient light that shines on the vertex. Then we trace rays inward to see what fraction of the rays could actually reach the vertex without colliding with other parts of the terrain. Or we could think of it as casting many rays out from the vertex and tracking the fraction of rays that can get to a certain distance without penetrating the terrain. The latter variant is the method our terrain demo uses.

To compute an ambient occlusion value for a point in space, we cast out 32 rays. A constant Poisson distribution of points on the surface of a sphere works well for this. We store these points in a constant buffer. We can—and should—reuse the same set of rays over and over for each vertex for which we want ambient occlusion. (Note: You can use our Poisson distribution instead of generating your own; search for "g_ray_dirs_32" in models\tables.nma on the book's DVD.) For each of the rays cast, we take 16 samples of the density value along the ray—again, just by sampling the density volume. If any of those samples yields a positive value, the ray has hit the terrain and we consider the ray fully blocked. Once all 32 rays are cast, the fraction of them that were blocked—usually from 0.5 to 1—becomes the ambient occlusion value. (Few vertices have ambient occlusion values less than 0.5, because most rays traveling in the hemisphere toward the terrain will quickly be occluded.)

Later, when the rock is drawn, the lighting will be computed as usual, but the final light amount (diffuse and specular) will be modulated based on this value before we apply it to the surface color. We recommend multiplying the light by saturate(2*(1 - ambient_occlusion)), which translates an occlusion value of 0.5 into a light multiplier of 1, and an occlusion value of 1 into a light multiplier of 0. The multiplier can also be run through a pow() function to artistically influence the falloff rate.
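
The occlusion-to-light mapping described above (0.5 maps to fully lit, 1 to fully dark) can be written as a one-liner. In this Python sketch, the pow() exponent is exposed as a parameter for the artistic falloff control mentioned in the text:

```python
def saturate(x):
    return max(0.0, min(1.0, x))

def ao_light_multiplier(ambient_occlusion, falloff=1.0):
    """Map an occlusion value in [0.5, 1] to a light multiplier in
    [1, 0]; the optional exponent shapes the falloff rate."""
    return saturate(2.0 * (1.0 - ambient_occlusion)) ** falloff

print(ao_light_multiplier(0.5))  # 1.0 (fully lit)
print(ao_light_multiplier(1.0))  # 0.0 (fully occluded)
```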

1.4.1 Margin Data

You might notice, at this point, that some of the occlusion-testing rays go outside the current block of known density values, yielding bad information. This scenario would create lighting artifacts where two blocks meet. However, this is easily solved by enlarging our density volume slightly and using the extra space to generate density values a bit beyond the borders of our block. The block might be divided into 32^3 voxels for tessellation, but we might generate density values for, say, a 44^3 density volume, where the extra "margin" voxels represent density values that are actually physically outside of our 32^3 block. Now we can cast occlusion rays a little farther through our density volume and get more-accurate results. The results still might not be perfect, but in practice, this ratio (32 voxels versus six voxels of margin data at each edge) produces nice results without noticeable lighting artifacts. Keep in mind that these dimensions represent the number of voxels in a block; the density volume (which corresponds to the voxel corners) will contain one more element in each dimension.

Unfortunately, casting such short rays fails to respect large, low-frequency terrain features, such as the shadowing that should happen inside large gorges or holes. To account for these low-frequency features, we also take a few samples of the real density function along each ray, but at a longer range—intentionally outside the current block. Sampling the real density function is much more computationally expensive, but fortunately, we need to perform sampling only about four times for each ray to get good results. To lighten some of the processing load, we can also use a "lightweight" version of the density function. This version ignores the higher-frequency octaves of noise because they don't matter so much across large ranges. In practice, with nine octaves of noise, it's safe to ignore the three highest-frequency octaves.

The block of pseudocode shown in Listing 1-2 illustrates how to generate ambient occlusion for a vertex.

Note the use of saturate(-d * 9999), which lets any positive density sample, even a tiny one, completely block the ray. However, values deep within the terrain tend to have progressively higher density values, and values farther from the terrain surface tend to become progressively more negative. Although the density function is not strictly a signed distance function, it often resembles one, and we take advantage of that here.

Listing 1-2. Pseudocode for Generating Ambient Occlusion for a Vertex

          float visibility = 0;
          for (ray = 0 .. 31)
          {
            float3 dir = ray_dir[ray];       // From constant buffer
            float this_ray_visibility = 1;
            // Short-range samples from density volume:
            for (step = 1 .. 16)             // Don't start at zero
            {
              float d = density_vol.Sample( ws + dir * step );
              this_ray_visibility *= saturate(-d * 9999);
            }
            // Long-range samples from density function:
            for (step = 1 .. 4)              // Don't start at zero!
            {
              float d = density_function( ws + dir * big_step );
              this_ray_visibility *= saturate(-d * 9999);
            }
            visibility += this_ray_visibility;
          }
          return (1 - visibility/32.0);      // Returns occlusion

During ray casting, instead of strictly interpreting each sample as black or white (hit or miss), we allow things to go "fuzzy." A partial occlusion happens when the sample is near the surface (or not too deep into the surface). In the demo on the book's DVD, we use a multiplier of 8 (rather than 9999) for short-range samples, and we use 0.5 for long-range samples. (Note that these values are relative to the range of values that are output by your particular density function.) These lower multipliers are especially beneficial for the long-range samples; it becomes difficult to tell that there are only four samples being taken. Figures 1-18 through 1-20 show some examples.
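
The effect of the multiplier on a single sample can be seen in isolation. This Python sketch adopts the convention that a positive density (rock) blocks the ray, so the per-sample visibility term is saturate applied to the negated density times the multiplier; with a huge multiplier the test is essentially binary, while a small one lets near-surface samples block only partially:

```python
def saturate(x):
    return max(0.0, min(1.0, x))

def sample_visibility(d, k):
    """Per-sample visibility term: a sample inside the rock (d > 0)
    should block the ray. A huge k makes the test binary; a small k
    lets samples near the surface block only partially ("fuzzy")."""
    return saturate(-d * k)

# A sample just outside the surface (d = -0.05):
print(sample_visibility(-0.05, 9999))  # 1.0 (binary test: unblocked)
print(sample_visibility(-0.05, 8))     # 0.4 (fuzzy: partially blocked)
# A sample inside the rock (d = 0.1) blocks fully in both cases:
print(sample_visibility(0.1, 9999))    # 0.0
print(sample_visibility(0.1, 8))       # 0.0
```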


Figure 1-19 Both Long-Range and Short-Range Ambient Occlusion


Figure 1-20 The Regular Scene, Shaded Using Ambient Occlusion

1.4.2 Generating a Block: Method 1

This section outlines three methods for building a block. As we progress from method 1 to method 3, the techniques become successively more complex, but faster.

The first (and simplest) method for building a block of terrain is the most straightforward, and requires only two render passes, as shown in Table 1-1.

Table 1-1. Method 1 for Generating a Block

Pass name: build_densities
  Description: Fill the density volume with density values.
  Geometry shader output struct: N/A

Pass name: gen_vertices
  Description: Visit each (nonmargin) voxel in the density volume. The geometry shader generates and streams out up to 15 vertices (5 triangles) per voxel.
  Geometry shader output struct:
      float4 wsCoordAmbo;
      float3 wsNormal;
  Count: 0/3/6/9/12/15

However, this method is easily optimized. First, the execution speed of a geometry shader tends to decrease as the maximum size of its output (per input primitive) increases. Here, our maximum output is 15 vertices, each consisting of 7 floats—for a whopping 105 floats. If we could reduce the floats to 32 or less—or even 16 or less—the GS would run a lot faster.

Another factor to consider is that a GS is not as fast as a VS because of the geometry shader's increased flexibility and stream-out capability. Moving most of the vertex generation work, especially the ambient occlusion ray casting, into a vertex shader would be worthwhile. Fortunately, we can accomplish this, and reduce our GS output size, by introducing an extra render pass.

1.4.3 Generating a Block: Method 2

The problems described in method 1—extremely large geometry shader output (per input primitive) and the need to migrate work from the geometry shaders to the vertex shaders—are resolved by this design, shown in Table 1-2, which is an impressive 22 times faster than method 1.

Table 1-2. Method 2 for Generating a Block

Pass Name: build_densities
Description: Fill the density volume with density values.
Geometry Shader Output Struct: N/A

Pass Name: list_triangles
Description: Visit each voxel in the density volume; stream out a lightweight marker point for each triangle to be generated. Use a stream-out query to skip the remaining passes if there is no output here.
Geometry Shader Output Struct:
    uint z6_y6_x6_edge1_edge2_edge3;
    Count: 0–5

Pass Name: gen_vertices
Description: March through the triangle list, using the vertex shader to do most of the work of generating the vertices. The geometry shader is a pass-through, merely streaming the result out to a buffer.
Geometry Shader Output Struct:
    float4 wsCoordAmbo; float3 wsNormal;
    Count: 3

Here, the gen_vertices pass has been broken into list_triangles and gen_vertices. The list_triangles pass has a much smaller maximum output; it outputs, at most, 5 marker points. Each point represents a triangle that will be fleshed out later, but for now, it's only a single uint in size (an unsigned integer, the same size as a float). Our maximum output size has gone from 105 to 5, so the geometry shader will execute much faster now.

The crucial data for generating each triangle is packed into the uint:

          struct triangle_marker_point {
            uint z6_y6_x6_edge1_edge2_edge3;
          };

Six integer values are packed into this one uint, which tells us everything we need to build a triangle within this voxel. The x, y, and z bit fields (6 bits each) indicate which voxel, within the current block, should contain the generated triangle. And the three edge fields (each 4 bits) indicate the edge [0..11] along which each vertex should be placed. This information, plus access to the density volume, is all the vertex shader in the final pass needs to generate the three vertices that make up the triangle. In that final pass, all three vertices are generated in a single execution of the vertex shader and then passed to the geometry shader together, in a large structure, like this:

          struct v2gConnector {
            float4 wsCoordAmbo1;
            float3 wsNormal1;
            float4 wsCoordAmbo2;
            float3 wsNormal2;
            float4 wsCoordAmbo3;
            float3 wsNormal3;
          };

The GS then writes out three separate vertices from this one large structure. This produces a triangle list identical to what method 1 produced, but much more quickly.
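The bit packing behind the marker uint can be sketched on the CPU side. The exact bit positions below are an assumption (6-bit voxel coordinates in the high bits, three 4-bit edge indices in the low bits), chosen only to match the field's name; the shaders on the DVD define the authoritative layout.

```python
def pack_marker(x, y, z, e1, e2, e3):
    # Pack z6_y6_x6_edge1_edge2_edge3: voxel coords in 6-bit fields,
    # three marching-cubes edge indices [0..11] in 4-bit fields.
    assert 0 <= x < 64 and 0 <= y < 64 and 0 <= z < 64
    assert all(0 <= e <= 11 for e in (e1, e2, e3))
    return (z << 24) | (y << 18) | (x << 12) | (e1 << 8) | (e2 << 4) | e3

def unpack_marker(m):
    # Recover the six fields with shifts and masks, as the final pass's
    # vertex shader would.
    return ((m >> 12) & 0x3F, (m >> 18) & 0x3F, (m >> 24) & 0x3F,
            (m >> 8) & 0xF, (m >> 4) & 0xF, m & 0xF)
```

Six fields at 6+6+6+4+4+4 = 30 bits fit comfortably in one 32-bit uint, which is why a single marker point per triangle suffices.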

Adding another render pass is helpful because it lets us skip the final (and most expensive) pass if we find that there are no triangles in the block. The test to determine whether any triangles were generated simply involves surrounding the list_triangles pass with a stream output query (ID3D10Query with D3D10_QUERY_SO_STATISTICS), which returns the number of primitives streamed out. This is another reason why we see such a huge speed boost between methods 1 and 2.

Method 2 is faster and introduces the useful concept of adding a new render pass to migrate heavy GS work into the VS. However, method 2 has one major flaw: it generates each final vertex once for each triangle that uses it. A vertex is normally shared by an average of about five triangles, so we're doing five times more work than we need to.

1.4.4 Generating a Block: Method 3

This method generates each vertex once, rather than an average of five times, as in the previous methods. Despite having more render passes, method 3 is still about 80 percent faster than method 2. Instead of producing a simple, nonindexed triangle list in the form of many vertices (many of them redundant), method 3 creates a vertex pool and a separate index list. The indices in the list are references to the vertices in the vertex pool, and every three indices denote a triangle.

To produce each vertex just once, we limit vertex production within a cell to edges 3, 0, and 8 only. A vertex on any other edge will be produced by another cell: the one in which the vertex does, conveniently, fall on edge 3, 0, or 8. This successfully produces all the needed vertices, just once.

Within a cell, knowing whether a vertex is needed along a particular edge (3, 0, or 8) is as simple as checking whether the case number bits differ at the two corners that the edge connects. See shaders\4_list_vertices_to_generate.vsh for an example of how to do this, but note that it could also be done easily using a lookup table.
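The corner-bit test can be sketched as follows, assuming the usual marching-cubes numbering (corner i's inside/outside state is bit i of the 8-bit case index; edge 0 joins corners 0 and 1, edge 3 joins corners 3 and 0, and edge 8 joins corners 0 and 4):

```python
# Corner pairs for the three "owned" edges. This mapping is an
# assumption based on the standard marching-cubes tables; published
# tables vary slightly in numbering.
EDGE_CORNERS = {3: (3, 0), 0: (0, 1), 8: (0, 4)}

def needs_vertex(case_index, edge):
    # A vertex lies on this edge iff the surface crosses it, i.e. the
    # two corner bits of the case index differ.
    a, b = EDGE_CORNERS[edge]
    return ((case_index >> a) & 1) != ((case_index >> b) & 1)
```

For example, case 0 (all corners empty) needs no vertices, while case 1 (only corner 0 inside the terrain) needs a vertex on all three owned edges, since each of them touches corner 0.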

The render passes to generate vertices uniquely are shown in Table 1-3.

Table 1-3. Method 3 for Generating a Block

Pass Name: build_densities
Description: Fill the density volume with density values.
Geometry Shader Output Struct: N/A

Pass Name: list_nonempty_cells
Description: Visit each voxel in the density volume; stream out a lightweight marker point for each voxel that needs triangles in it. Use a stream-out query to skip the remaining passes if there is no output here. Note: includes an extra row of voxels at the far end of x, y, and z.
Geometry Shader Output Struct:
    uint z8_y8_x8_case8;
    Count: 0–1

Pass Name: list_verts_to_generate
Description: Marches nonempty_cell_list and looks only at edges 3, 0, and 8; streams out a marker point for each one that needs a vertex on it.
Geometry Shader Output Struct:
    uint z8_y8_x8_null4_edge4;
    Count: 0–3

Pass Name: gen_vertices
Description: Marches vert_list and generates the final (real) vertices. The VS does the bulk of the work to generate each vertex; the GS just streams it out.
Geometry Shader Output Struct:
    float4 wsCoordAmbo; float3 wsNormal;
    Count: 1

The previous passes generate our vertices without redundancy, but we still need to generate our index list. This is the most difficult concept in method 3. To make it work, we'll need a temporary volume texture, VertexIDVol. This is a 32^3 volume texture of the format DXGI_FORMAT_R32_UINT, so each voxel can hold a single uint.

The problem is that when generating the indices for a given (nonempty) cell, we have no idea where the needed vertices are in the vertex buffer, that magical storage structure where all those streamed-out vertices have been consolidated for us. (Remember, only a small fraction of cells actually generate vertices.) The 3D texture is our solution; we use it to splat the vertex ID (or index within the vertex buffer) of each vertex into a structure that we can randomly access. Sampling this volume can then act as a function that takes, as input, a 3D location (within the block) and provides, as output, the vertex ID (or index in the vertex buffer) of the vertex that was generated at that location. This is the missing information we need to be able to generate an index list to connect our vertices into triangles. The two extra render passes are described in Table 1-4.

Table 1-4. Method three for Generating the Index Buffer

Pass Name

Description

Geometry Shader Output Struct

splat_vertex_ids

Marches vert_list and splats each i'due south SV_VertexID to VertexIDVol.

(No stream output; pixel shader simply writes out SV_VertexID.)

gen_indices

Marches nonempty_cell_list and streams out up to fifteen uints per cell—the indices to brand upwards to five triangles. Samples VertexIDVol to get this data.

uint alphabetize;

Count: xv

Note: Practise not output any indices for cells in whatsoever final row (in x/y/z).

The splat is accomplished by drawing a single point into a voxel of the 3D texture. The xy coordinates are taken from the block-space xy coordinate, and using the block-space z coordinate, the GS routes the point to the correct slice of the 3D texture. The value written out by the PS is the value that came into the VS as SV_VertexID, which is a system-generated value indicating the zero-based index of the vertex in our vert_list vertex buffer. Note that it's not necessary to clear VertexIDVol prior to splatting.

When sampling the volume to fetch a vertex ID, note that if you need the vertex ID for a vertex on an edge other than 3, 0, or 8, you will instead have to sample the appropriate neighbor voxel, along whose edge 3, 0, or 8 the vertex falls. The mechanics of this actually turn out to be trivial; see shaders\7_gen_indices.gsh on the DVD.

You might notice that we would often be splatting up to three vertices per voxel, but we're only writing to a one-channel texture! Thus, if a single cell had vertices on edges 3 and 8, VertexIDVol could hold the index for only one of them. The easy solution is to triple the width of the vertex ID volume (making it 3NxNxN in size). When splatting, multiply the integer block-space x coordinate by 3, then add 0, 1, or 2, depending on which edge you're splatting the vertex ID for (3, 0, or 8, respectively). (See shaders\5b_splat_vertex_IDs.vsh.) In the final pass, when you're sampling the results, adjust the x coordinate similarly, depending on whether you want the vertex ID along edge 3, 0, or 8 within that voxel. (See shaders\7_gen_indices.gsh.)
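The addressing into the tripled-width volume can be sketched as below. The cube corner/edge numbering (and therefore the neighbor-offset table) is an assumption based on the standard marching-cubes layout; the actual tables live in the DVD shaders.

```python
# Slot within the tripled-width volume, following the text:
# edge 3 -> +0, edge 0 -> +1, edge 8 -> +2.
EDGE_SLOT = {3: 0, 0: 1, 8: 2}

# For every cell edge [0..11]: the offset of the neighbor cell that
# "owns" it, and the owned edge (3, 0, or 8) it maps to there.
EDGE_OWNER = {
    0: ((0, 0, 0), 0),  1: ((1, 0, 0), 3),  2: ((0, 1, 0), 0),
    3: ((0, 0, 0), 3),  4: ((0, 0, 1), 0),  5: ((1, 0, 1), 3),
    6: ((0, 1, 1), 0),  7: ((0, 0, 1), 3),  8: ((0, 0, 0), 8),
    9: ((1, 0, 0), 8), 10: ((1, 1, 0), 8), 11: ((0, 1, 0), 8),
}

def vertex_id_texel(cx, cy, cz, edge):
    # Texel in the 3N x N x N VertexIDVol that holds the vertex ID for
    # 'edge' of cell (cx, cy, cz).
    (dx, dy, dz), owned = EDGE_OWNER[edge]
    return ((cx + dx) * 3 + EDGE_SLOT[owned], cy + dy, cz + dz)
```

For instance, cell (4, 5, 6) finds its own edge-3 vertex at texel (12, 5, 6), while its edge-10 vertex (owned by the diagonal neighbor) lives at texel (17, 6, 6).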

Also note that in method 3, the list_nonempty_cells pass includes an extra layer of cells at the far ends of the x, y, and z axes. Sometimes vertices will be needed that fall on an edge other than 3, 0, or 8 in the final row of cells, so this ensures that those vertices are all created, even if the cell in which they do map onto edge 3, 0, or 8 happens to be in a neighboring block. In the gen_indices pass, which operates on nonempty_cell_list, index generation is skipped for cells that are actually beyond the current block. (See shaders\7_gen_indices.gsh for an example.)

Method 3 allows us to share vertices and, as a result, generate about one-fifth as many of them. This is quite a savings, considering that each vertex samples the complex density function 128 times (32 rays x 4 long-range occlusion samples per ray).

Using the density function and settings that ship with the demo for this chapter, an NVIDIA GeForce 8800 GPU can generate about 6.6 blocks per second using method 1, about 144 blocks per second using method 2, and about 260 blocks per second using method 3.

1.5 Texturing and Shading

One challenge with procedural generation of terrains, or any shape of arbitrary topology, is texturing; specifically, generating texture coordinates, commonly known as UVs. How can we seamlessly map textures onto these polygons with minimal distortion? A single planar projection of a seamless, repeating 2D texture, like the one shown in Figure 1-21, looks good from one angle but inferior from others, because of intense stretching (see also Figure 1-23).

Figure 1-21 A Single Planar Projection Is Plagued by Distortion

A simple way to resolve this is to use triplanar texturing: three different planar projections, one along each of the three primary axes (x, y, and z). At any given point, we use the projection that offers the least distortion (stretching) at that point, with the projections blending in the in-between areas, as in Figure 1-22. For example, a surface point whose normal vector points mostly in the +x or -x direction would use the yz planar projection. A blending range of roughly 10 to 20 degrees tends to work well; see Figure 1-23 for an illustration of how wide the blending range is.

Figure 1-22 Three Planar Projections of the Same Texture, Blended Together Based on the Surface Normal Vector

When we use bump mapping with triplanar texturing, we need to make sure the tangent basis is generated from the normal separately for each projection. For example, for the x-projection, a very rough world-space tangent basis could simply be the vectors <0, 1, 0> and <0, 0, 1>. However, we can do much better than this. Fortunately, it is very easy to generate a real tangent basis from the normal. Here, 90-degree rotations of vectors amount to just swizzling two of the components and flipping the sign on one of them. For example, for the x-projection, you might use the vectors <normal.z, normal.y, -normal.x> and <normal.y, -normal.x, normal.z>. However, note that the location of the negative sign depends on the way the data is stored in the bump maps.

Triplanar texturing can be done in a single render pass, as shown in the pixel shader code sample in Listing 1-3. The sample fetches the color values and bump vectors for each of the three planar projections and then blends them together based on the normal. Finally, the blended bump vector is applied to the vertex-interpolated normal to yield a bumped normal.

Another convenient way to texture the terrain is by mapping certain height values to certain colors, creating striations. With graphics hardware, this translates into a 1D texture lookup that uses the world-space y coordinate; or, better, a 2D texture lookup. In the texture, let the color scheme vary across the u axis, and let the v axis represent the elevation. When doing the lookup, use a single octave of very low-frequency noise to drive the u coordinate, so that the color scheme changes as the viewer roams.

Example 1-3. Texture Planar Projection

          // Determine the blend weights for the 3 planar projections.
          // N_orig is the vertex-interpolated normal vector.
          float3 blend_weights = abs( N_orig.xyz );
          // Tighten up the blending zone:
          blend_weights = (blend_weights - 0.2) * 7;
          blend_weights = max(blend_weights, 0);
          // Force weights to sum to 1.0 (very important!)
          blend_weights /= (blend_weights.x + blend_weights.y + blend_weights.z).xxx;

          // Now determine a color value and bump vector for each of the 3
          // projections, blend them, and store the composite results in these
          // two vectors:
          float4 blended_color;     // .w holds spec value
          float3 blended_bump_vec;
          {
            // Compute the UV coords for each of the 3 planar projections.
            // tex_scale (default ~ 1.0) determines how big the textures appear.
            float2 coord1 = v2f.wsCoord.yz * tex_scale;
            float2 coord2 = v2f.wsCoord.zx * tex_scale;
            float2 coord3 = v2f.wsCoord.xy * tex_scale;

            // This is where you would apply conditional displacement mapping.
            //if (blend_weights.x > 0) coord1 = . . .
            //if (blend_weights.y > 0) coord2 = . . .
            //if (blend_weights.z > 0) coord3 = . . .

            // Sample color maps for each projection, at those UV coords.
            float4 col1 = colorTex1.Sample(coord1);
            float4 col2 = colorTex2.Sample(coord2);
            float4 col3 = colorTex3.Sample(coord3);

            // Sample bump maps too, and generate bump vectors.
            // (Note: this uses an oversimplified tangent basis.)
            float2 bumpFetch1 = bumpTex1.Sample(coord1).xy - 0.5;
            float2 bumpFetch2 = bumpTex2.Sample(coord2).xy - 0.5;
            float2 bumpFetch3 = bumpTex3.Sample(coord3).xy - 0.5;
            float3 bump1 = float3(0, bumpFetch1.x, bumpFetch1.y);
            float3 bump2 = float3(bumpFetch2.y, 0, bumpFetch2.x);
            float3 bump3 = float3(bumpFetch3.x, bumpFetch3.y, 0);

            // Finally, blend the results of the 3 planar projections.
            blended_color = col1.xyzw * blend_weights.xxxx +
                            col2.xyzw * blend_weights.yyyy +
                            col3.xyzw * blend_weights.zzzz;
            blended_bump_vec = bump1.xyz * blend_weights.xxx +
                               bump2.xyz * blend_weights.yyy +
                               bump3.xyz * blend_weights.zzz;
          }

          // Apply bump vector to vertex-interpolated normal vector.
          float3 N_for_lighting = normalize(N_orig + blended_bump_vec);

To make it even more interesting, make the low-frequency noise vary slowly in x and z but quickly in y; this will make the color scheme vary with altitude, as well. It's also fun to add a small amount of the normal vector's y component to the u coordinate for the lookup; this helps break up the horizontal nature of the striations. Figures 1-24 and 1-25 illustrate these techniques.
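The lookup-coordinate computation described above can be sketched as follows; the noise stand-in, frequencies, and weights are illustrative assumptions, not values from the chapter:

```python
import math

def lowfreq_noise(x, y, z):
    # Stand-in for one octave of very low-frequency noise; a real
    # implementation would sample a precomputed noise volume.
    return 0.5 + 0.5 * math.sin(0.11 * x + 0.05 * z + 0.43 * y)

def striation_uv(ws, normal_y, y_scale=0.1):
    # u: a slow noise octave chooses the color scheme; sampling y at a
    # higher frequency makes the scheme also drift with altitude, and a
    # dash of the normal's y component breaks up horizontal banding.
    u = lowfreq_noise(ws[0], ws[1] * 4.0, ws[2]) + 0.1 * normal_y
    # v: world-space height selects the striation color.
    v = ws[1] * y_scale
    return (u % 1.0, v % 1.0)
```

With a repeating (wrapped) striation texture, both coordinates can simply be taken modulo 1.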

Planar 2D projections and altitude-based lookups are very useful for projecting detailed, high-resolution 2D textures onto terrain. However, texturing can also be done procedurally. For example, a few octaves of noise, based on the world-space coordinate, can be used to perturb the normal vector, adding a few extra octaves of perceived detail, as shown in Listing 1-4 and Figure 1-26.

Example 1-4. Normal Vector Perturbation

          // Further perturb normal vector by a few octaves of procedural noise.
          float3 v = 0;
          v += noiseVol1.Sample(ws* 3.97)*1.00;
          v += noiseVol2.Sample(ws* 8.06)*0.50;
          v += noiseVol3.Sample(ws*15.96)*0.25;
          N = normalize(N + v);

The base surface color can also be generated or modified procedurally. For example, a simple marble texture can be created by warping the world-space coordinate by several octaves of lower-frequency noise and then taking sin(ws.y), as shown in Listing 1-5 and Figure 1-27.

Texture projection and procedural texturing are very powerful tools that can be combined easily to achieve a large variety of effects. Used alone, procedural texturing tends to lack the high-resolution detail that a texture (such as a photograph or a painted map) can offer. On the other hand, texture projection used alone looks repetitive over a very large space. However, simple and creative combinations of both tools can solve these issues and create beautifully textured landscapes with impressive detail and variety.

Example 1-5. Marble Texture Generation

          // Use warped world-space coordinate to generate a marble texture.
          float3 v = 0;
          v += noiseVol2.Sample(ws*0.47)*1.00;
          v += noiseVol3.Sample(ws*1.06)*0.50;
          v += noiseVol1.Sample(ws*1.96)*0.25;
          float3 ws_warped = ws + v;
          float is_marble = pow( saturate( sin(ws_warped.y)*1.1 ), 3.0 );
          float3 marble_color = 1;
          blended_color = lerp(blended_color, marble_color, is_marble);

1.6 Considerations for Real-World Applications

1.6.1 Level of Detail

In an ideal 3D scene, all polygons would show up at about the same size on the screen. In practice, though, this rarely happens. The terrain technique presented so far creates a tessellation that is roughly uniform in world space, but definitely not uniform in screen space. As a result, distant polygons appear very tiny (one pixel or less in size), which is wasteful and introduces aliasing artifacts. To alleviate this problem, we'd like to divide blocks into three groups: close, medium, and far. Blocks at close range will have polygons at a regular size, and blocks at a medium distance will have larger polygons (in world space). Finally, blocks at a far distance will have the largest polygons. To implement this approach, we can choose between two basic schemes: one in which lower-level-of-detail (LOD) blocks have fewer polygons in them, and another in which lower-LOD blocks simply represent a larger space.

In the "fewer polygons" scheme, all blocks remain 1x1x1 in world space, but faraway blocks have a smaller internal grid size (16^3 or 8^3 instead of 32^3). Unfortunately, this scheme causes the number of blocks to bloat very quickly, which rapidly decreases performance, both for rendering and, especially, for generating blocks.

The "bigger blocks" scheme is a better approach. Here, all blocks keep a constant 32^3 internal grid size, but the world-space size of the terrain that a block represents changes, based on its distance from the viewer. Nearby blocks occupy a cube that is 1x1x1 in world space, while larger blocks (for terrain that is farther away) cover a 2x2x2 cube in world space, and some even larger cubes (4x4x4) out to a great distance. At draw time, we draw the large (faraway) blocks first, then the medium blocks, then the fine (1x1x1) blocks on top. Because the number of blocks remains manageable, this is the preferred scheme.
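Choosing a block's world-space extent then reduces to a distance test against the viewer; a minimal sketch, with illustrative thresholds not taken from the chapter:

```python
import math

def block_world_size(block_center, eye, near=40.0, far=160.0):
    # Pick the world-space extent of a 32^3 block from viewer distance:
    # 1x1x1 up close, 2x2x2 at medium range, 4x4x4 far away.
    d = math.dist(block_center, eye)
    if d < near:
        return 1.0
    if d < far:
        return 2.0
    return 4.0
```

In a real implementation the thresholds would be tuned so that each LOD's polygons land near a target screen-space size.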

As with many LOD schemes, however, switching a section of terrain from one LOD to another creates a sudden, visual popping artifact. The easiest way to deal with this problem is to draw both LODs during the transition period: draw the low LOD first and slowly alpha-fade the higher-LOD block in, or out, over some short period of time. However, this works well only if the z-buffer (depth) values at every pixel are consistently closer for the higher-LOD block; otherwise, the higher-LOD block won't alpha-blend over the lower-LOD block.

Therefore, a small amount of negative bias on the z (depth) value should be used in the vertex shader when we're drawing higher-LOD blocks. Even better, we can generate the lower-LOD blocks using a small negative bias in the density function. This approach isotropically "erodes" the blocks and is similar to shrinking the surface along its surface normals (but better, because it does not result in any pinch points). As a result, the higher-LOD chunks will usually enclose the lower-LOD chunks, and will not have z-fighting problems when alpha blending over the top of them.

1.6.2 Collisions and Lighting of Foreign Objects

Collisions

In an interactive or game environment, many movable objects, such as insects, birds, characters' feet, and so on, must be able to detect, and respond to, collisions with the terrain. "Intelligent" flying creatures might need to cast rays out ahead of time (as the dragonflies do in the "Cascades" demo) in order to steer clear of terrain features. And surface-crawling objects, such as growing vines (a hidden feature in the "Cascades" demo), spiders, or flowing water, must stick to the terrain surface as they move around. Thrown or launched objects also need to know when they hit the terrain, so that they can stop moving (such as a spear hitting the ground), bounce (as a soccer ball would), or trigger some kind of event.

These object-terrain interactions are easy to compute if the object's movement is primarily driven by the GPU from within a shader. It's easiest to do this computation in a geometry shader, where a small buffer containing a single element (vertex) for each moving object is run through the geometry shader each frame. In order for the geometry shader to know about the terrain, the density function must be placed in a separate file that can be included (via #include) by other shaders. The geometry shader can then include the file and use the function, querying it when needed, to test whether a point in space is inside or outside the terrain.

For example, if a soccer ball were sailing toward the terrain, the geometry shader could test the density function at the previous frame's position and at the new frame's position. If the ball was previously in the air but the new position would be inside the terrain, then the exact location where the ball first hit the surface could be determined by interpolating the density values to find where the density would equal zero. Or we could use an iterative refinement technique, such as interval halving. Once we find the exact point of collision, we can compute the gradient at that point (via six more samples). Finally, knowing the velocity of the ball and the normal of the terrain, we can compute the bounce direction, and then we can output the proper new position and velocity of the ball.
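The whole sequence (interval halving to the zero crossing, a six-sample gradient, then a reflection) can be sketched in scalar code. This is a CPU-side illustration of the math, not the chapter's shader; recall that positive density means solid terrain, so the outward normal points opposite the gradient.

```python
def find_surface_hit(density, p_air, p_solid, iters=16):
    # Interval halving for the zero crossing of the density function,
    # given density(p_air) < 0 (air) and density(p_solid) > 0 (solid).
    a, b = p_air, p_solid
    for _ in range(iters):
        mid = tuple((u + v) * 0.5 for u, v in zip(a, b))
        if density(*mid) < 0.0:
            a = mid
        else:
            b = mid
    return b

def surface_normal(density, p, eps=0.01):
    # Gradient via six extra density samples; density increases into
    # the solid, so the outward normal is the negated gradient.
    g = [(density(*[c + eps * (i == k) for k, c in enumerate(p)]) -
          density(*[c - eps * (i == k) for k, c in enumerate(p)]))
         for i in range(3)]
    n = [-gi for gi in g]
    length = sum(c * c for c in n) ** 0.5
    return tuple(c / length for c in n)

def bounce(velocity, normal):
    # Reflect the incoming velocity about the surface normal.
    d = sum(v * n for v, n in zip(velocity, normal))
    return tuple(v - 2.0 * d * n for v, n in zip(velocity, normal))
```

With a toy density of 1 - y (flat ground at y = 1), a ball falling from y = 2 hits at y = 1, the normal comes out as (0, 1, 0), and a downward velocity reflects straight back up.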

Lighting

If the soccer ball of the previous example falls into a cave, the viewer will expect it to look darker because of the general occlusion of ambient light that happens inside the cave (assuming it's not a magical, light-producing soccer ball). Fortunately, this is easily achieved by casting ambient occlusion rays out from the center of the object each frame (if the object is moving), just like the ray casting used when generating a single terrain vertex.

The only difference is that here the density function must be used instead of the density volume, because the density volume data for this block is likely long gone. Using the density function is much slower, but if these occlusion rays are cast for only a few dozen moving, visible objects per frame, the impact is not noticeable.
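A minimal sketch of that per-object occlusion estimate follows; the ray set, step size, and march length are illustrative assumptions, and the (slow) density function stands in for the discarded density volume:

```python
def ambient_occlusion(density, center, rays, step=0.5, num_steps=8):
    # Fraction of rays from the object's center that escape the
    # terrain; 0.0 = fully occluded, 1.0 = fully open sky.
    visible = 0
    for d in rays:
        blocked = False
        for i in range(1, num_steps + 1):
            p = [c + dk * step * i for c, dk in zip(center, d)]
            if density(*p) > 0.0:  # positive density = inside solid
                blocked = True
                break
        if not blocked:
            visible += 1
    return visible / len(rays)
```

An object hovering over flat ground with one ray cast up and one cast down would report roughly half-open surroundings, while an object deep in a cave would see most rays blocked and so shade darker.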

1.7 Conclusion

We have presented a way to use the GPU to generate compelling, unbounded 3D terrains at interactive frame rates. We have also shown how to texture and procedurally shade the terrain for display, build a level-of-detail scheme for the terrain, allow foreign objects to interact with the terrain, and light these objects to match the terrain.

We encourage you to try these techniques and build on them. The techniques we covered in this chapter are only the basics for, potentially, a new paradigm of GPU-powered terrain generation. What is fully possible now is largely unexplored, and what is possible with future generations of chips holds even greater promise.

1.8 References

Ebert, David S., F. Kenton Musgrave, Darwyn Peachey, Ken Perlin, and Steven Worley. 2003. Texturing & Modeling: A Procedural Approach, 3rd ed. Academic Press. Chapters 9–13, by F. Kenton Musgrave, are recommended.

Geiss, Ryan, and Michael Thompson. 2007. "NVIDIA Demo Team Secrets—Cascades." Presentation at Game Developers Conference 2007. Available online at http://developer.download.nvidia.com/presentations/2007/gdc/CascadesDemoSecrets.zip.

NVIDIA Corporation. 2007. "Cascades." Demo. More information available online at http://www.nzone.com/object/nzone_cascades_home.html.


Source: https://developer.nvidia.com/gpugems/gpugems3/part-i-geometry/chapter-1-generating-complex-procedural-terrains-using-gpu
