Revealing The Power of DirectX 11



No matter whether we've got a low end or high end system, we all
expect the realtime 3D revolution to continue until we achieve near
parity with reality. The push forward is backed by many factors
including pure hardware performance and brilliant advances in
techniques for better approximating what we see. But there's another
side to the equation beyond just hardware and developers: there is the
graphics API.

Unlike CPUs, GPUs do not share a common instruction
set upon which tools and software can be built. In order to get the
power of the hardware out to the public, we need a common interface that
works no matter what GPU is underneath. It's left to the graphics
hardware designer to take the code generated by this application
programming interface (API) and translate it into something that their
chip can use. Because it's the developer's single point of contact,
the graphics API is incredibly important. It defines how much
flexibility programmers have in using hardware and shapes the world of
high performance realtime 3D graphics.

Some of the key work done through a graphics API is taking descriptions
of 3D objects in a 3D world, sending those objects and other resources
to the hardware, and then telling the hardware what to do with them.
This follows a step by step process that we generally call a
pipeline. Graphics API pipelines have different
stages where different work is done. Here's the general structure of a
3D graphics pipeline:

First, vertex data (information about the position of the corners of
shapes) is taken in and processed. Those shapes can then be
further manipulated and re-processed if needed. After this, the 3D
shapes are projected onto the 2D screen and broken down into fragments
called pixels (this step is called rasterization), and then these
pixels are each processed by looking up texture information and using
lighting techniques and so on. When pixels are finished processing,
they are output and displayed on the screen. And that's the mile high
overview of how 3D graphics work.
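
To make those stages a little more concrete, here is a toy software
version of the same flow in Python. This is only a sketch under our own
assumptions: the function names, the tiny character "framebuffer" and
the projection math are made up for illustration and have nothing to do
with how DirectX or any GPU is actually implemented.

```python
# Toy pipeline: vertex processing -> rasterization -> pixel processing.
# Every name here is illustrative; none of it is a real graphics API.

def project(vertex, width, height):
    """Vertex stage: perspective-project a 3D point onto the 2D screen."""
    x, y, z = vertex
    sx = (x / z * 0.5 + 0.5) * width
    sy = (y / z * 0.5 + 0.5) * height
    return sx, sy

def edge(a, b, p):
    """Signed area test: which side of the edge a->b the point p is on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterize(tri2d, width, height):
    """Rasterization: yield the pixels whose centers the triangle covers."""
    a, b, c = tri2d
    for py in range(height):
        for px in range(width):
            p = (px + 0.5, py + 0.5)  # sample at the pixel center
            w0, w1, w2 = edge(b, c, p), edge(c, a, p), edge(a, b, p)
            all_pos = w0 >= 0 and w1 >= 0 and w2 >= 0
            all_neg = w0 <= 0 and w1 <= 0 and w2 <= 0
            if all_pos or all_neg:
                yield px, py

def shade(px, py):
    """Pixel stage: stand-in for texture lookups and lighting."""
    return "#"

def render(triangle3d, width=16, height=8):
    framebuffer = [["." for _ in range(width)] for _ in range(height)]
    tri2d = [project(v, width, height) for v in triangle3d]
    for px, py in rasterize(tri2d, width, height):
        framebuffer[py][px] = shade(px, py)
    return ["".join(row) for row in framebuffer]

triangle = [(-1.0, -1.0, 2.0), (1.0, -1.0, 2.0), (0.0, 1.0, 2.0)]
image = render(triangle)
print("\n".join(image))
```

Real hardware does all of this massively in parallel and with far more
stages, but the order of operations is the same.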

For the past dozen years (it seems longer doesn't it?), we've seen
makers of 3D graphics hardware accelerate two very prominent APIs:
OpenGL and DirectX.

We recently touched on advancements tangential to OpenGL in our OpenCL
article, but today our focus will be on DirectX. Microsoft's DirectX
graphics API is much more heavily used in game engines than OpenGL in
good part because DirectX tends to move much more quickly and sets the
bar for both hardware and OpenGL in terms of feature set and
flexibility. Which always makes upcoming versions of DirectX exciting
to talk about: they define the future capabilities of hardware and
expose improved tools to developers. Upcoming DirectX versions are
glimpses into our graphical future. Currently we have a lot of DirectX
9 and DirectX 10 games available and in development, but DirectX 11
looms on the horizon.

As usual, Microsoft will be trying to time the release of their next
DirectX revision with the release of compatible graphics hardware. As
with last time, DirectX 11 will also be released with Windows 7. With
the Windows 7 Beta already under way, we expect the OS to be done some
time this year.

Microsoft has been rather aggressive with Windows 7 scheduling in
light of the rejection of Vista, so it appears they are stepping up to
the plate to get everything out sooner rather than later. There was a
little more than 4 years between the release of DirectX 9 and DirectX
10. As it hit the streets with Vista in January of 2007, DirectX 10
has just turned 2 and we are already anticipating its replacement in
the very near future. As we will learn, this speedy transition should
be very good for DirectX 11 adoption as DirectX 10 hasn't even become
pervasive yet: many games are still DirectX 9 only.

But let's take a closer look at what we are talking about before we go
any further.

Introducing DirectX 11: The Pipeline and Features

This is DirectX 10.

We all remember him from our G80 launch article back in the day when
no one knew how much Vista would really suck. Some of the shortfalls
of DirectX 10 have been in operating system support, driver support,
time to market issues, and other unfortunate roadblocks that kept
developers from making full use of all the cool new features and tools
DirectX 10 brought.

Meet DirectX 11.

She's much cooler than her older brother, and way hotter too. Many
under-the-hood enhancements mean higher performance for features
available but less used under DX10. The major changes to the pipeline
mark revolutionary steps in graphics hardware and software
capabilities. Tessellation (made up of the hull shader, tessellator
and domain shader) and the Compute Shader are major developments that
could go far in assisting developers in closing the gap between
reality and unreality. These features have gotten a lot of press
already, but we feel the key to DirectX 11 adoption (and thus
exploitation) is in some of the more subtle elements. But we'll get
into all that in due time.

Along with the pipeline changes, we see a whole host of new tweaks and
adjustments. DirectX 11 is actually a strict superset of DirectX 10.1,
meaning that all of those features are completely encapsulated in and
unchanged by DirectX 11. This simple fact means that all DX11 hardware
will include the changes required to be DX 10.1 compliant (which only
AMD can claim at the moment). In addition to these tweaks, we also see
these further extensions:

While changes in the pipeline allow developers to write programs to
accomplish different types of tasks, these more subtle changes allow
those programs to be more complex, higher quality, and/or more
performant. Beyond all this, Microsoft has also gone out of its way to
help make parallel programming a little bit easier for game
developers.

From Evolution to Expansion and Multi-Threading: The Mile High Overview

The November DirectX SDK update was the first to include some DirectX
11 features for developers to try out. Of course, there is no DX11
hardware yet, but what is included will run on the current DX10 setup
with DX10 hardware under Vista and the beta Windows 7. This, combined
with the fact that Khronos finished the OpenCL specification last
month, marks two major developments on the path to more general purpose
computing on the GPU. Of course, DX11 is more geared toward realtime
3D and OpenCL is targeted at real general purpose data parallel
programming (across multiple CPUs and GPUs) distinct from graphics,
but these two programming APIs are major milestones in the future
history of computing.

There is more than just the compute shader included in DX11, and since
our first real briefing about it at this year's NVISION, we've had the
chance to do a little more research, reading slides and listening to
presentations from SIGGRAPH and GameFest 2008 (from which we've
included slides to help illustrate this article). The most interesting
things to us are more subtle than just the inclusion of a tessellator
or the addition of the Compute Shader. And the introduction of DX11
will also bring benefits to owners of current DX10 and DX10.1
hardware, provided AMD and NVIDIA keep up with appropriate driver
support anyway.

Many of the new aspects of DirectX 11 seem to indicate to us that the
landscape is ripe for a fairly quick adoption especially if Microsoft
brings Windows 7 out sooner rather than later. Adjustments to HLSL
that should make it much more attractive to developers, the good
transitional implications of DX10 being a subset of DX11, and changes
that make parallel programming much easier should all go a long way to
helping developers pick up the API quickly. DirectX 11 will be
available for Vista, so there won't be
as many complications from a lack of users upgrading, and Windows 7
may also inspire Windows XP gamers to upgrade meaning a larger install
base for developers to target as well.

The bottom line is that while DirectX 10 promised features that could
bring a revolution in visual fidelity and rendering techniques,
DirectX 11 may actually deliver the goods while helping developers
make the API transition faster than we've seen in the past. We might
not see techniques that take advantage of the exclusive DirectX 11
features right off the bat, but adoption of the new version of the API
itself will go a long way to inspiring amazing advances in realtime 3D
graphics.

From DirectX 6 through DirectX 9, Microsoft steadily evolved their
graphics programming API from a fixed function vehicle for setting
state and moving data structures around to a rich, programmable
environment enabling deep control of graphics hardware. The step from
DX9 to DX10 was the final break from the old ways, opening up and
expanding on the programmability of DX9 to add more depth and
flexibility enabled by newer hardware. Microsoft also forced a shift
in the driver model with the DX10 transition to leave the rest of the
legacy behind and try and help increase stability and flexibility when
using DX10 hardware. But DirectX 11 is different.

Rather than throwing out old constructs in order to move towards more
programmability, Microsoft has built DirectX 11 as a strict superset
of DirectX 10/10.1, which enables some curious possibilities.
Essentially, DX10 code will be DX11 code that chooses not to implement
some of the advanced features. On the flipside, DX11 will be able to
run on down level hardware. Of course, all of the features of DX11
will not be available, but it does mean that developers can stick with
DX11 and target both DX10 and DX11 hardware without the need for two
completely separate implementations: they're both the same but one
targets a subset of functionality. Different code paths will be
necessary if something DX11 only (like the tessellator or compute
shader) is used, but this will still definitely be a benefit in
transitioning to DX11 from DX10.
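
As a rough illustration of what "one implementation, two hardware
targets" could look like, here is a small Python sketch. The
HardwareLevel values and the pass names are hypothetical and exist only
for this example; the point is simply that most of the frame is shared
and only the DX11-only features need their own branch.

```python
from enum import IntEnum

class HardwareLevel(IntEnum):
    """Hypothetical capability levels; each one is a superset of the last."""
    DX10 = 100
    DX10_1 = 101
    DX11 = 110

def build_frame_passes(level):
    """One renderer: shared passes plus branches for DX11-only features."""
    passes = ["shadow_maps"]
    if level >= HardwareLevel.DX11:
        passes.append("tessellated_terrain")       # needs the tessellator
    else:
        passes.append("prebuilt_lod_terrain")      # fallback code path
    passes.append("opaque_geometry")
    if level >= HardwareLevel.DX11:
        passes.append("compute_shader_post_processing")
    else:
        passes.append("pixel_shader_post_processing")
    return passes

for level in HardwareLevel:
    print(level.name, build_frame_passes(level))
```

Everything outside the two branches is written once and runs on both
classes of hardware.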

Running on lower spec'd hardware will be important, and this could
make the transition from DX10 to DX11 one of the fastest we have ever
seen. In fact, with lethargic movement away from DX9 (both by
developers and consumers), the rush to bring out Windows 7 and slow
adoption of Vista, we could end up looking back at DX10 as merely a
transitional API rather than the revolutionary paradigm shift it could
have been. Of course, Microsoft continues to push that the fastest
route to DX11 is to start developing DX10.1 code today. With DX11 as a
superset of DX10, this is certainly true, but developers' time will
very likely be better spent putting the bulk of their effort into a
high quality DX9 path with minimal DX10 bells and whistles while
saving the truly fundamental shifts in technique made possible by DX10
for games targeted at DX11 hardware and timeframe.

We are especially hopeful about a faster shift to DX11 because of the
added advantages it will bring even to DX10 hardware. The major
benefit I'm talking about here is multi-threading. Yes, eventually
everything will need to be drawn, rasterized, and displayed (linearly
and synchronously), but DX11 adds multi-threading support that allows
applications to simultaneously create resources or manage state and
issue draw commands, all from an arbitrary number of threads. This may
not significantly speed up the graphics subsystem (especially if we
are already very GPU limited), but it does make it much easier to
explicitly and heavily thread a game and take advantage of
the increasing number of CPU cores on the desktop.

With 8 and 16 logical processor systems coming soon to a system near
you, we need developers to push beyond the very coarse grained and
heavy threads they are currently using that run well on two core
systems. The cost/benefit of developing a game that is significantly
assisted by the availability of more than 2 cores is very poor at this
point. It is too difficult to extract enough parallelism to matter on
quad core and beyond in most video games. But enabling simple parallel
creation of resources and display lists by multiple threads could
really open up opportunities for parallelizing game code that would
otherwise have remained single threaded. Rather than one thread to
handle all the DX state change and draw calls (or very well behaved
and heavily synchronized threads sharing the responsibility),
developers can more naturally create threads to manage types or groups
of objects or parts of a world, opening up the path to the future
where every object or entity can be managed by its own thread (which
would be necessary to extract performance when we eventually expand
into hundreds of logical cores).

The fact that Microsoft has planned multi-threading support for DX11
games running on DX10 hardware is a major bonus. The only caveat here
is that AMD and NVIDIA will need to do a little driver work for their
existing DX10 hardware to make this work to its fullest extent (it
will "work" even without a driver change, but not as well). Of course,
we expect that NVIDIA and especially AMD (as they are also a multi-
core CPU company) will be very interested in making this happen. And,
again, this provides major incentives for game developers to target
DX11 even before DX11 hardware is widely available or deployed.

All this is stacking up to make DX11 look like the go-to technology.
The additions to and expansions of DX10, the timing and the ability to
run on down level hardware could create a perfect storm for a
relatively quick uptake. By relatively quick, we are still looking at
years for pervasive use of DX11, but we expect that the attractiveness
of the new features and benefit to the existing install base will
provide a bigger motivation for game developers to transition than
we've seen before.

If only Microsoft would (and could) back-port DX11 to Windows XP,
there would be no reason for game developers to maintain legacy code
paths. I know, I know, that'll never (and can't by design) happen.
While we wholeheartedly applaud the idea of imposing strict minimum
requirements on hardware for a new operating system, unnecessarily
cutting off an older OS at the knees is not the way to garner support.
If Windows 7 ends up being a more expensive Vista in a shiny package,
we may still have some pull towards DX9, especially for very
mainstream or casual games that tend to lag a bit anyway (and as some
readers have pointed out because consoles will still be DX9 for the
next few years). It's in these incredibly simple but popular games and
console games that the true value of amazing realtime 3D graphics
could be brought to the general computing populace, but craptacular
low end hardware and limiting API accessibility on popular operating
systems further contribute to the retardation of graphics in the

But that's the overview. Let's take some time to drill down a bit
further into some of the technology.

Drilling Down: DX11 And The Multi-Threaded Game Engine

In spite of the fact that multithreaded programming has been around
for decades, mainstream programmers didn't start focusing on parallel
programming until multicore CPUs started coming along. Much general
purpose code is straightforward as a single thread; extracting
performance via parallel programming can be difficult and isn't always
obvious. Even with talented programmers, Amdahl's Law is a bitch: your
speed up from parallelization is limited by the percent of code that
is necessarily sequential.
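
For the curious, Amdahl's Law says that if a fraction p of the work can
be parallelized across n cores, the best possible speedup is
1 / ((1 - p) + p / n). The short Python sketch below just evaluates
that formula; the 60% parallel fraction is a number we picked for
illustration, not a measurement from any real game.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of the work can be threaded."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

# Assume (purely for illustration) that 60% of a frame can be threaded.
for cores in (2, 4, 8, 16):
    print(cores, "cores:", round(amdahl_speedup(0.6, cores), 2), "x")
```

With those assumed numbers the speedup can never reach 2.5x no matter
how many cores are added, which is why shrinking the sequential part
(rendering included) matters so much.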

Currently, in game development, rendering is one of those
"necessarily" sequential tasks. DirectX 10 isn't set up to
appropriately handle multiple threads all throwing commands at the
GPU. That doesn't mean parallelization of renders can't happen, but it
does limit speed up because costly synchronization techniques or
management threads need to be implemented in order to make sure
nothing steps out of line. All this limits the benefit of
parallelization and discourages programmers from trying too hard.
After all, it's a better idea to put more of your effort into areas
where performance can be improved more significantly. John Carmack put
it really well once, but I can't remember the quote. And I'm doing too
much benchmarking to go look for it now. :p

No matter what anyone does, some stuff in the renderer will need to be
sequential. Programs, textures and resources must be loaded up,
geometry happens before pixel processing, draw calls intended to be
executed while a certain state is active must have that state set
first and not changed until completion. Even in such a massively
parallel machine, order must be maintained for many things. But order
doesn't /always/ matter.

By making more things thread-safe through an extended device interface
using multiple contexts, and by making a lot of synchronization overhead
the responsibility of the API and/or graphics driver, Microsoft has
enabled game developers to more easily thread not
only their rendering code, but their game code as well. These things
will also work on DX10 hardware running on a system with DX11, though
some missing hardware optimizations will reduce the performance
benefit. But the fundamental ability to write code differently will go
a long way to getting programmers more used to and better at
parallelization. Let's take a look at the tools available to
accomplish this in DX11.

First up is free threaded asynchronous resource loading. That's a bit
of a mouthful, but this feature gives developers the ability to upload
programs, textures, state objects, and all resources in a thread-safe
way and, if desired, concurrent with the rendering process. This
doesn't mean that all this stuff will get pushed up in parallel with
rendering, as the driver will manage what gets sent to the GPU and
when based on priority, but it does mean the developer no longer has
to think about synchronizing or manually prioritizing resource
loading. Multiple threads can start loading whatever resources they
need whenever they need them. The fact that this can also be done
concurrently with rendering could improve performance for games that
stream in data for massive open worlds in addition to enabling
multithreaded opportunities.
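
The idea can be sketched in a few lines of Python. ToyDevice and
stream_region are hypothetical names invented for this example (this is
not the Direct3D interface); the lock inside the class stands in for
the synchronization that the API and driver take off the developer's
hands.

```python
import threading

class ToyDevice:
    """Stand-in for a device whose resource creation is thread-safe."""

    def __init__(self):
        self._lock = threading.Lock()   # hidden from the calling threads
        self._resources = {}

    def create_resource(self, name, data):
        with self._lock:
            self._resources[name] = bytes(data)

    def resource_names(self):
        with self._lock:
            return sorted(self._resources)

def stream_region(device, region, count):
    """Loader thread: uploads whatever it needs, whenever it needs it."""
    for i in range(count):
        device.create_resource(f"{region}/texture{i}", [i % 256] * 4)

device = ToyDevice()
loaders = [threading.Thread(target=stream_region, args=(device, region, 3))
           for region in ("forest", "city", "desert")]
for loader in loaders:
    loader.start()
# A render loop could keep drawing here while the loaders run.
for loader in loaders:
    loader.join()
print(device.resource_names())
```

None of the loader threads ever touches a lock directly, which is the
whole point.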

In order to enable this and other threading, the D3D device interface
is now split into three separate interfaces: the Device, the Immediate
Context, and the Deferred Context. Resource creation is done through
the Device. The Immediate Context is the interface for setting device
state, draw calls, and queries. There can only be one Device and one
Immediate Context. The Deferred Context is another interface for state
and draw calls, but many can exist in one program and can be used as
the per-thread interface (Deferred Contexts themselves are thread
unsafe though). Deferred Contexts and the free threaded resource
creation through the device are where DX11 gets its multithreaded
support.

Multiple threads submit state and draw calls to their Deferred Context,
which compiles a display list that is eventually executed by the
Immediate Context. Games will still need a render thread, and this
thread will use the Immediate Context to execute state and draw calls
and to consume the display lists generated by Deferred Contexts. In
this way, the ultimate destination of all state and draw calls is the
Immediate Context, but fine grained synchronization is handled by the
API and the display driver so that parallel threads can be better used
to contribute to the rendering process. Some limitations on Deferred
Contexts include the fact that they cannot query the device and they
can't download or read back anything from the GPU. Deferred Contexts
can, however, consume the display lists generated by other Deferred
Contexts.
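
Here is a minimal Python model of that record-then-execute flow. The
class and method names are our own simplification and should not be
read as the real Direct3D interfaces; the sketch only mirrors the shape
described above: one Deferred Context per worker thread, each producing
a display list, and a single Immediate Context on the render thread
that consumes them in a fixed order.

```python
import threading

class DeferredContext:
    """Records state and draw calls; meant for one thread at a time."""

    def __init__(self):
        self._commands = []

    def set_state(self, name, value):
        self._commands.append(("state", name, value))

    def draw(self, mesh):
        self._commands.append(("draw", mesh))

    def finish(self):
        """Stop recording and hand back an immutable display list."""
        display_list, self._commands = tuple(self._commands), []
        return display_list

class ImmediateContext:
    """The single context that actually submits work, in order."""

    def __init__(self):
        self.submitted = []

    def execute(self, display_list):
        self.submitted.extend(display_list)

scene = {
    "terrain": ["hills", "river"],
    "characters": ["hero", "villain", "crowd"],
    "effects": ["smoke"],
}
groups = list(scene)
recorded = [None] * len(groups)   # one slot per worker, no sharing

def record_group(slot, group):
    ctx = DeferredContext()       # each thread owns its own context
    ctx.set_state("shader", group)
    for mesh in scene[group]:
        ctx.draw(mesh)
    recorded[slot] = ctx.finish()

workers = [threading.Thread(target=record_group, args=(i, g))
           for i, g in enumerate(groups)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

immediate = ImmediateContext()    # lives on the render thread
for display_list in recorded:     # the render thread decides the order
    immediate.execute(display_list)
print(len(immediate.submitted), "commands submitted")
```

Recording happens in parallel, but the order in which work reaches the
GPU is still decided in exactly one place.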

The end result of all this is that the future will be more parallel
friendly. As two and four core CPUs become more and more popular and 8
and 16 (logical) core CPUs appear on the horizon, we need all the help we
can get when trying to extract performance from parallelism. This is a
good move for DirectX and we hope it will help push game engines to
more fully utilize more than 2 or even 4 cores when the time comes.

Going Deeper: The DX11 Compute Shader and OpenCL/OpenGL

Many developers are excited about the added flexibility of the Compute
Shader (also referred to as the CS). This addition to the pipeline
steps further from a render-centric API and enables more general
purpose algorithms. We see added flexibility in both the type of
operations that can be performed on data and the type of data that can
be operated on.

In other pipeline stages, we see limitations that are designed
to speed up execution but get in the way of general purpose code.
Although we can shoehorn general purpose algorithms into a pixel
shader program, we don't have the freedom to use data structures like
trees, sharing data between pixels (and thus threads) is difficult and
costly, and we have to go through the motions of drawing triangles and
mapping solutions onto this.

Enter DirectX11 and the CS. Developers have the option to pass data
structures over to the Compute Shader and run more general purpose
algorithms on them. The Compute Shader, like the other fully
programmable stages of the DX10 and DX11 pipeline, will share a single
set of physical resources (shader processors).

This hardware will need to be a little more flexible than it currently
is: when it runs CS code it will have to support random reads and
writes and irregular arrays (rather than simple streams or fixed size
2D arrays), multiple outputs, direct invocation of individual or
groups of threads as per the programmer's needs, 32k of shared register
space and thread group management, atomic instructions,
synchronization constructs and the ability to perform unordered IO.

At the same time, the CS loses some features as well. As each thread
is no longer treated as a pixel, the association with geometry is
lost (unless specifically passed in a data structure). This means
that, although CS programs can still use texture samplers,
trilinear LOD calculations are not automatic (LOD must be specified).
Additionally, depth culling, antialiasing, alphablending, and other
operations that have no meaning to generic data cannot be performed
inside a CS program.

The range of new applications opened up by the CS is practically
unlimited. But the most immediate interest will come from game
developers looking to augment their graphics engines with fancy
techniques not possible in the Pixel Shader. Some of these
applications include A-Buffer techniques to allow very high quality
antialiasing and order independent transparency, more advanced
deferred shading techniques, advanced post processing effects and
convolution, FFTs (fast Fourier transforms) for frequency domain
operations, and summed area tables.

Beyond the rendering specific applications, game developers may wish
to do things like IK (inverse kinematics), physics, AI, and other
traditionally CPU specific tasks on the GPU. Having this data on the
GPU by performing calculations in the CS means that the data is more
quickly available for use in rendering and some algorithms may be much
faster on the GPU as well. It might even be an option to run things
like AI or physics on both the GPU and the CPU if algorithms that
always yield the same result on both types of processors can be found
(which would essentially substitute compute power for bandwidth).

Even though the code will run on the same hardware, PS and CS code
will perform very differently based on the algorithms being
implemented. One of the interesting things to look at is exposure and
histogram data often used in HDR rendering. Calculating this data in
the PS requires several passes and tricks to take all the pixels and
either bin them or average them. Although sharing data between
threads slows things down quite a bit, it can still be much faster
than running many passes, and this makes the CS an ideal stage for such
algorithms.
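
To show the shape of a single-pass histogram, here is a Python sketch
that walks through the thread groups one after another. On real
hardware the groups and the threads inside them would run in parallel
and the increments would be atomic adds; the bin count, the group size
and the function name are all assumptions made for this example.

```python
def luminance_histogram(pixels, bins=8, group_size=64):
    """Bin luminance values (0.0 to 1.0) the way thread groups might."""
    global_hist = [0] * bins
    for start in range(0, len(pixels), group_size):
        group = pixels[start:start + group_size]
        shared = [0] * bins            # stands in for per-group shared memory
        for luminance in group:        # each "thread" handles one pixel
            index = min(int(luminance * bins), bins - 1)
            shared[index] += 1         # would be an atomic add on a GPU
        for i, count in enumerate(shared):
            global_hist[i] += count    # one merge per bin per group
    return global_hist

# A simple gradient: every luminance step from black to white, once.
pixels = [i / 255 for i in range(256)]
histogram = luminance_histogram(pixels)
print(histogram)
```

Each group only touches the global histogram once per bin, which keeps
the expensive shared traffic small compared to binning every pixel
directly into global memory or reducing the image over many passes.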

A while back we took a look at OpenCL, and we know that OpenCL will be
able to share data structures with OpenGL. We haven't yet gotten a
developer's take on comparing OpenCL and the DX11 CS, but at first
blush it seems that the possibilities opened up for game developers
and graphics processing with DX11 and the Compute Shader will also be
possible with OpenGL+OpenCL. Although the CS can be used as a general
purpose hardware accelerated GPU computing interface, OpenCL is
targeted more at that arena, and its independence from Microsoft and
DirectX will likely mean wider adoption as a GPU compute language for
general purpose tasks.

The use of OpenGL has declined significantly in the game developer
community over the last five years. While OpenCL may enable DX11 like
applications to be written in combination with OpenGL, it is more
likely that this will be the venue of workstation applications like
CAD/CAM and simulations that require visualization. While I'm a fan of
OpenGL myself, I don't see the flexibility of OpenCL as a significant
boon to its adoption in game engines.

So What's a Tessellator?

This has been covered before now in other articles about DirectX 11,
but we first touched on the subject back with the R600 launch. Both
R6xx and R7xx hardware have tessellators, but since these are
proprietary implementations, they won't be directly compatible with
DirectX 11, which uses a much more sophisticated setup. While neither
AMD's tessellator nor the DX11 tessellator itself is programmable, DX11
includes programmable input to and output of the tessellator (TS) through two
additional pipeline stages called the Hull Shader (HS) and the Domain
Shader (DS).

The tessellator can take coarse shapes and break them up into smaller
parts. It can also take these smaller parts and reshape them to form
geometry that is much more complex and that more closely approximates
reality. It can take a cube and turn it into a sphere with very little
overhead and far lower space requirements. Quality, performance and
manageability benefit.

The Hull Shader takes in patches and control points and outputs data
on how to configure the tessellator. Patches are a new primitive (like
vertices and pixels) that define a segment of a plane to be
tessellated. Control points are used to define the parametric shape of
the desired surface (like a curve or something). If you've ever used
the pen tool in Photoshop, then you know what control points are:
these just apply to surfaces (patches) instead of lines. The Hull
Shader uses the control points to determine how to set up the
tessellator and then passes them forward to the Domain Shader.

The tessellator just tessellates: it breaks up the patch fed to it by
the Hull Shader based on the parameters set by the Hull Shader per patch.
It outputs a stream of points to the Domain Shader which then needs to
finish up the process. While programmers must write HS programs for
their code, there isn't any programming required for the TS. It's just
a fixed function block that processes input based on parameters.

The Domain Shader takes points generated by the tessellator and
manipulates them to form the appropriate geometry based on control
points and/or displacement maps. It performs this manipulation by
running developer designed DS programs which can manipulate how the
newly generated points are further shifted or displaced based on
control points and textures. The Domain Shader, after processing a
point, outputs a vertex. These vertices can be further processed by the
Geometry Shader which can also feed them back up to the Vertex Shader
using stream out functionality. More likely than heading back up for a
second pass, we will probably see most output of the Domain Shader
head straight on to rasterization so that its geometry can be broken
down into screen space fragments for Pixel Shader processing.
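
The three stages can be mimicked in a few lines of Python. This is a
heavily simplified sketch under our own assumptions: a single
tessellation factor per patch chosen from camera distance, a regular
grid of points from the "tessellator", and a "domain shader" that
pushes points on a flat cube face out onto a sphere. The function names
and the distance formula are invented for the example.

```python
import math

def hull_shader(distance_to_camera, max_factor=16):
    """Pick a tessellation factor: closer patches get more subdivisions."""
    factor = int(max_factor / max(distance_to_camera, 1.0))
    return max(1, min(max_factor, factor))

def tessellator(factor):
    """Fixed function: emit (u, v) points on a regular grid over the patch."""
    return [(i / factor, j / factor)
            for j in range(factor + 1)
            for i in range(factor + 1)]

def domain_shader(u, v, corners, radius=1.0):
    """Place a point on the flat patch, then displace it onto a sphere."""
    p00, p10, p01, p11 = corners
    flat = [(1 - u) * (1 - v) * a + u * (1 - v) * b
            + (1 - u) * v * c + u * v * d
            for a, b, c, d in zip(p00, p10, p01, p11)]
    length = math.sqrt(sum(x * x for x in flat))
    return tuple(radius * x / length for x in flat)

# One face of a cube centered on the origin (the patch's control points).
face = [(-1, -1, 1), (1, -1, 1), (-1, 1, 1), (1, 1, 1)]
factor = hull_shader(distance_to_camera=4.0)
vertices = [domain_shader(u, v, face) for u, v in tessellator(factor)]
print(factor, "subdivisions per edge ->", len(vertices), "vertices")
```

Only the four corners of the face need to be stored; moving the camera
closer just raises the factor and produces a rounder sphere from the
same data.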

That covers the basics of what the tessellator can do and how it
does it. But do you find yourself wondering: "self, can't the
Geometry Shader just be used to create tessellated surfaces and move
the resulting vertices around?" Well, you would be right. That is
technically possible, but not practical at this point. Let's dive
into that a bit more.

Tessellation: Because The GS Isn't Fast Enough

Microsoft and AMD tend to get the most excited about tessellation
whenever the topic of DX11 comes up. AMD jumped on the tessellation
bandwagon long ago, and perhaps it does make sense for consoles like
the Xbox 360. Adding fixed function hardware to quickly and
efficiently handle a task that improves memory footprint has major
advantages in the living room. We still aren't sold on the need for a
tessellator on the desktop, but who's to argue with progress?

Or is it really progressive? The tessellator itself is fixed function
rather than programmable. Sure, the input to and output of the
tessellator can be manipulated a bit through the Hull Shader and
Domain Shader, but the heart of the beast is just not that flexible.
The Geometry Shader is the programmable block in the pipeline that is
capable of tessellation as well as much more, but it just doesn't have
the power to do tessellation on any useful scale. So while most
everything has been moving towards programmability in the rendering
pipe, we have sort of a step backward here. But why?

The argument between fixed function and programmable hardware is
always one of performance versus flexibility and usefulness. In the
beginning, fixed function was necessary to get the desired
performance. As time went on, it became clear that adding in more
fixed function hardware to graphics chips just wasn't feasible. The
transistors put into specialized hardware just go unused if developers
don't program to take advantage of it. This made a shift toward
architectures with an expanding pool of compute resources that could
be shared and used for many different tasks a much more
attractive way to go. In the general case anyway. But that doesn't
mean that fixed function hardware doesn't have its place.

We do still have the problem that all the transistors put into the
tessellator are worthless unless developers take advantage of the
hardware. But the reason it makes sense is that the ROI (return on
investment: what you get for what you put in) on those transistors is
huge if developers do take advantage of the hardware: it's much easier
to get huge tessellation performance out of a fixed function
tessellator than to put the necessary resources into the Geometry
Shader to allow it to be capable of the same tessellation performance
programmatically. This doesn't mean we'll start to see a renaissance
of fixed function blocks in our graphics hardware, just that
significantly advanced features going forward may still require the
sacrifice of programmability in favor of early adoption of a feature.
The majority of tasks will continue to be enabled in a flexible
programmable way, and in the future we may see more flexibility
introduced into the tessellator until it becomes fully programmable as
well (or ends up just being merged into some future version of the
Geometry Shader).

Now don't let this technical assessment of fixed function tessellation
make you think we aren't interested in reaping the benefits of the
tessellator. Currently, artists need to create different versions of
their objects for different LODs (Level of Detail -- reducing or
increasing complexity as the object moves further or nearer the
viewer), and geometry simulation through texturing at each LOD needs
to be done by pixel shaders. This requires extra work from both
artists and programmers and costs a good bit in terms of performance.
There are also some effects that can only be done with more geometry.

Tessellation is a great way to get that geometry in there for more
detail, shadowing, and smooth edges. High geometry also allows really
cool displacement mapping effects. Currently, much geometry is
simulated through textures and techniques like bump mapping or
parallax occlusion mapping. Even with high
geometry, we will want to have large normal maps for our lighting
algorithms to use, but we won't need to do so much work to make things
like cracks, bumps, ridges, and small detail geometry appear to be
there when it isn't because we can just tessellate and displace in a
single pass through the pipeline. This is fast, efficient, and can
produce very detailed effects while freeing up pixel shader resources
for other uses. With tessellation, artists can create one subdivision
surface that gets a dynamic LOD free of charge from a simple hull
shader, and a displacement map applied in the domain shader will save
a lot of work, increase quality, and improve performance quite a bit.
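
As a rough illustration of the tessellate-and-displace idea described
above, here is a minimal CPU-side sketch in Python. It is an analogy
only, not HLSL and not the Direct3D 11 API: every function name, the
distance range, the maximum factor of 16, and the sine-shaped "height
map" are assumptions made up for this example. It shows one coarse
edge getting more or fewer vertices depending on viewer distance, and
those new vertices being pushed along a normal.

```python
import math

def tess_factor(distance, near=1.0, far=50.0, max_factor=16):
    """Hull-shader-like step (hypothetical): pick a subdivision factor
    from viewer distance. near/far/max_factor are assumed values."""
    # Clamp distance into [near, far], then map near -> max_factor, far -> 1.
    t = (min(max(distance, near), far) - near) / (far - near)
    return max(1, round(max_factor * (1.0 - t)))

def tessellate_edge(p0, p1, factor):
    """Tessellator-like step: split one edge into `factor` equal segments."""
    return [
        tuple(a + (b - a) * i / factor for a, b in zip(p0, p1))
        for i in range(factor + 1)
    ]

def height(u):
    """Stand-in for a displacement map lookup (assumed: one smooth bump)."""
    return 0.25 * math.sin(math.pi * u)

def displace(points, normal):
    """Domain-shader-like step: move each generated vertex along the normal."""
    n = len(points) - 1
    out = []
    for i, p in enumerate(points):
        h = height(i / n)  # parametric position along the edge, 0..1
        out.append(tuple(c + h * nc for c, nc in zip(p, normal)))
    return out

def build_edge(distance):
    """Run all three steps for one unit-length edge lying on the x axis."""
    factor = tess_factor(distance)
    flat = tessellate_edge((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), factor)
    return displace(flat, normal=(0.0, 1.0, 0.0))

if __name__ == "__main__":
    # The same source edge yields many vertices up close and few far away.
    print(len(build_edge(distance=1.0)), len(build_edge(distance=50.0)))
```

The artist-facing point is that only the single coarse edge and the
height function are authored; the vertex count is decided per frame
from distance rather than by swapping between hand-built LOD models.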

If developers adopt tessellation, we could see cool things, and with
the move to DX11 class hardware both NVIDIA and AMD will be making
parts with tessellation capability. But we may not see developers just
start using tessellation (or the compute shader for that matter) right
away. DirectX 11 will run on down-level hardware, and at the release
of DX11 we will already have a huge number of cards on the market
capable of running a subset of DX11. That subset brings with it a
better, more refined programming language in the new version of HLSL
and seamless parallelization optimizations, so we will very likely see
the first DX11 games only implementing features that can run
completely on DX10 hardware.

Of course, at that point developers can be fully confident of
exploiting all the aspects of DX10 hardware, which they still aren't
completely taking advantage of. Many people still want and need a DX9
path because of Vista's failure, which means DX10 code tends to be
more or less an enhanced DX9 path rather than something fundamentally
different. So when DirectX 11 finally debuts, we will start to see
what developers could really do with DX10.

Certainly there will be developers experimenting with tessellation,
but these will probably just be simple amplification to get rid of
those jagged edges around curved surfaces at first. It will take time
for the real advanced tessellation techniques everyone is excited
about to come to fruition.

One Last Thing and Closing Thoughts

The final bit of DX11 we'll touch on is the update to HLSL (MS's High
Level Shader Language) in version 5.0 which brings some very developer
friendly adjustments. While HLSL has always been similar in syntax to
C, 5.0 adds support for classes and interfaces. We still don't get to
use pointers though.

These changes are being made because of the sheer size of shader code.
Programmers and artists need to build or generate either a single
massive shader or tons of smaller shader programs for any given game.
These code resources are huge and can be hard to manage without OOP
(Object Oriented Programming) constructs. But there are some
differences from how things work in other OOP languages. For instance,
there is no need for memory management (because there are no pointers)
or constructors / destructors in HLSL. Tasks like initialization are
handled through updates to constant buffers, which generally reflect
member data.

Aside from the programmability aspect, classes and interfaces were
added to support dynamic shader linkage to combat the intricacy of
developing with huge numbers of resources and effects. Dynamic linking
allows the application to decide at runtime what shaders to compile
and link and enables interfaces to be left ambiguous until runtime. At
runtime, shaders are dynamically linked and, based on what is linked,
all possible function bodies are then compiled and optimized. Compiled
hardware-native code isn't inlined until the appropriate SetShader
function is called.

The flexibility this provides will enable development of much more
complex and dynamic shader code, as it won't all need to be in one
giant block with lots of "if"s, nor will there need to be thousands of
smaller shaders cluttering up the developer's mind. Performance of the
shaders will still limit what can be done, but with this step DirectX
helps reduce code complexity as a limiting factor in development.
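
To make the interface idea a little more concrete, here is a small
Python analogy. It is not HLSL 5.0 syntax and not the real runtime
linking mechanism; the class names, the `shade` method, and the
numeric values are all hypothetical. It only illustrates the shape of
the design: the shader body calls through an interface, and the
application decides at runtime which concrete class fills that slot,
instead of branching on a type flag inside one giant shader.

```python
from abc import ABC, abstractmethod

class Light(ABC):
    """Plays the role of an interface: a slot the shader calls through."""
    @abstractmethod
    def shade(self, n_dot_l):
        ...

class DiffuseLight(Light):
    def __init__(self, intensity):
        # Member data; per the article, HLSL would fill this from a
        # constant buffer rather than from a constructor.
        self.intensity = intensity

    def shade(self, n_dot_l):
        return self.intensity * max(n_dot_l, 0.0)

class AmbientLight(Light):
    def __init__(self, level):
        self.level = level

    def shade(self, n_dot_l):
        return self.level

def uber_shader(light_type, param, n_dot_l):
    """The style being replaced: one block with an 'if' per variant."""
    if light_type == "diffuse":
        return param * max(n_dot_l, 0.0)
    if light_type == "ambient":
        return param
    raise ValueError(light_type)

def pixel_shader(light, n_dot_l):
    """The linked style: no type checks, just a call through the slot."""
    return light.shade(n_dot_l)

if __name__ == "__main__":
    # The application picks the implementation at runtime.
    bound = {"diffuse": DiffuseLight(0.8), "ambient": AmbientLight(0.2)}
    for name, light in bound.items():
        print(name, pixel_shader(light, 0.5))
```

Adding a new light type in this style means adding one class, while
the `uber_shader` version needs another branch in the shared block.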

With all of this, plus the ability to perform unordered memory
accesses, multi-threading, tessellation, and the Compute Shader, DX11
is pretty aggressive. The complexity of the upgrade, however, is
mitigated by the fact that this is nothing like the wholesale changes
made in the move from DX9 to DX10: DX11 is really just a superset of
DX10 in terms of features. This allows DX11 to run on down-level
hardware (where DX11 specific features are not used), which, combined
with the enhancements to HLSL with OOP and dynamic shader linking,
means that developers should really have fewer qualms about moving
from DX10 to DX11 than we saw with the transition from DX9.

To be fair, the OS upgrade requirement also threw a wrench in the
gears. That won't be a problem this time, as Vista still sucks but
will be getting DX11 support and Windows 7 looks like a better upgrade
option for XP users than Vista. Developers who haven't already moved
from DX9 may well skip DX10 altogether in favor of DX11 depending on
the predicted ship dates of their titles. All signs point to DX11 as
setting the time frame in which we start to see the revolution
promised with the move to DX10 take place. Developers have had time to
familiarize themselves with the extended advantages of programmability
offered by DX10, and coding for DX11 will be much easier through OOP
constructs and multithreaded support. If the features don't entice
them, the ability to run on down-level hardware with a better coding
environment might just seal the deal.

I'm still an OpenGL developer at this point, and I've dabbled a bit
with DirectX at times. But DirectX 11 (and my disappointment with
OpenGL 3.0) marks the first time I think I might actually make the
switch. The first preview of DX11 is already available in the latest


Good article, but I read it at the original source, which you didn't
bother to include. Why not?

