Ian Munsie edited this page Nov 16, 2015 · 2 revisions

There are a few pieces you need to get an auto crosshair working, and you will need to adjust them for each game.

Step 1: Copy the depth buffer

Firstly, you will need to get access to the depth buffer in the crosshair's vertex shader, which you can do with the resource copying feature introduced in 3DMigoto 1.2.4.

Option 1: Simple

If the game leaves the depth buffer assigned while it is drawing UI elements, you can simply do this to copy it to texture 110 in the vertex shader:

[ShaderOverrideCrosshair]
Hash = ...
vs-t110 = oD

This may or may not work in a given game. You can use frame analysis to dump out the depth buffer for this shader to check, or just try it and see.

There may also be some other problems with this method in terms of performance or vram usage, since it performs a full copy every time this shader is encountered. This is likely to happen multiple times per frame, and if you are adjusting the entire UI you may end up with copies performed for each UI element (eating performance) and copies hanging around for each unique UI shader (consuming vram). Resource renaming (a hardware feature that trades vram for performance) may also cause more vram to be consumed than we would naively expect. This should be fairly well contained though - if you see ram or vram usage continue to increase without limit while using this feature let me know, as that may signify a bug (which is entirely possible as this is all very new code).

Option 2: Reduced copies, better performance, less vram usage

You can limit the number of copies to once per frame for a given resource to eliminate these issues, though it uses slightly different code:

[ResourceDepthBuffer]
max_copies_per_frame=1

[ShaderOverrideCrosshair]
Hash = ...
; Since we are limiting the number of copies, use the 'unless_null' keyword to
; make sure we don't end up with a blank buffer if some draw call doesn't have
; a depth buffer bound:
ResourceDepthBuffer = oD unless_null
vs-t110 = ResourceDepthBuffer

The idea there is that a depth buffer will only be copied to ResourceDepthBuffer the first time a UI shader is encountered in a frame, eliminating excess copies and reducing the performance overhead. Using an intermediate resource means we can get this benefit even if multiple UI shaders are used. The "copy" of the temporary resource to a texture happens by reference (by default), so there is no overhead there.

Edit: This is supported as of 3DMigoto 1.2.5, which added the 'unless_null' keyword.

Option 3: Copy from another shader (may make step 5 easier)

If the depth buffer is not assigned when the UI shader is drawn you can't just copy it directly and will need to copy it from a separate shader. This should not be too difficult as the depth buffer is typically available all over the place. You might even be able to find a version that has been pre-scaled to world coordinates (a "W-buffer"), which will make step 5 trivial.

e.g. you might copy it out of a shadow shader:

[ResourceDepthBuffer]
[ShaderOverrideShadow]
Hash = ...
; There's a good chance that copying by reference will work as we are copying a
; texture for use in a texture slot (same type of binding), and if it works this
; will save us the copy and extra storage:
ResourceDepthBuffer = reference ps-t0
; But if it doesn't work (maybe the game overwrites the texture afterwards),
; use a full copy instead, which is the default when copying to a temporary
; resource so we don't need to explicitly say so:
;ResourceDepthBuffer = ps-t0

[ShaderOverrideCrosshair]
Hash = ...
vs-t110 = ResourceDepthBuffer

or perhaps you have found (using frame analysis) a shader run once at the start of post processing that renders the depth buffer to a render target (this seems to be fairly common):

[ResourceDepthBuffer]
[ShaderOverridePostProcessingDepthBuffer]
Hash = ...
; Unlike the above we almost certainly need to do a full copy here as we are
; copying a render target for use in a texture, which are different types of
; bindings (unless the game created the texture with both bind flags, which is
; unlikely). This is the default behaviour when copying to a temporary
; resource, so we don't need to explicitly say so:
;
; Since we are copying a render target we want to wait until after the draw
; call has finished before we copy it, which requires the "post" keyword
; introduced in 3DMigoto 1.2.5:
post ResourceDepthBuffer = o0

[ShaderOverrideCrosshair]
Hash = ...
vs-t110 = ResourceDepthBuffer

Edit: The 'post' keyword was added in 3DMigoto 1.2.5 to copy render targets after the draw call.

Step 2: Declare the copied depth buffer

Once you have the depth buffer copied into the vertex shader you need to add this declaration to the top, adjusting register(t110) to match whichever texture slot you copied the depth buffer to:

// Depth buffer copied to this input with 3DMigoto:
Texture2D<float> DepthBuffer : register(t110);

Note that if you copied the resource from another texture (as opposed to a depth buffer) you should use the same declaration as the shader you copied it from (except for the register number). If you copied it from another render target you may also need to change float to float4 to match the oN register in the original pixel shader. If either of these means using a float2/float3/float4, you may also need to adjust which channel the depth is read from where it is used below.

Step 3: Copy and paste the auto crosshair code

Then, you will want to paste this code into the shader before the main() function. You will need to change some things (either near & far, or the scaling applied in world_z_from_depth_buffer), but we will come back to that:

static const float near = 0.1;
static const float far = 40000;

float world_z_from_depth_buffer(float x, float y)
{
	uint width, height;
	float z;

	DepthBuffer.GetDimensions(width, height);

	x = min(max((x / 2 + 0.5) * width, 0), width - 1);
	y = min(max((-y / 2 + 0.5) * height, 0), height - 1);
	z = DepthBuffer.Load(int3(x, y, 0));
	if (z == 1)
		return 0;

	// Derive world Z from depth buffer. This is a kluge since I don't know
	// the correct scaling, and the Z buffer seems to be (1 - what I expected).
	// Might be able to determine the correct way to scale it from other shaders.
	return far*near/(((1-z)*near) + (far*z));
}

float adjust_from_depth_buffer(float x, float y)
{
	float4 stereo = StereoParams.Load(0);
	float separation = stereo.x;
	float convergence = stereo.y;
	float old_offset, offset, w, sampled_w, distance;
	uint i;

	// Stereo cursor: To improve the accuracy of the stereo cursor, we
	// sample a number of points on the depth buffer, starting at the near
	// clipping plane and working towards original x + separation.
	//
	// You can think of this as a line in three dimensional space that
	// starts at each eye and stretches out towards infinity. We sample 255
	// points along this line (evenly spaced in the X axis) and compare
	// with the depth buffer to find where the line is first intersected.
	//
	// Note: The reason for sampling 255 points came from a restriction in
	// DX9/SM3 where loops had to run a constant number of iterations and
	// there was no way to set that number from within the shader itself.
	// I'm not sure if the same restriction applies in DX11 with SM4/5 - if
	// it doesn't, we could change this to check each pixel instead for
	// better accuracy.
	//
	// Based on DarkStarSword's stereo crosshair code originally developed
	// for Miasmata, adapted to Unity, then translated to HLSL.

	offset = (near - convergence) * separation;	// Z = X offset from center
	distance = separation - offset;			// Total distance to cover (separation - starting X offset)

	old_offset = offset;
	for (i = 0; i < 255; i++) {
		offset += distance / 255.0;

		// Calculate depth for this point on the line:
		w = (separation * convergence) / (separation - offset);

		sampled_w = world_z_from_depth_buffer(x + offset, y);
		if (sampled_w == 0)
			return 0;

		// If the sampled depth is closer than the calculated depth,
		// we have found something that intersects the line, so exit
		// the loop and return the last point that was not intersected:
		if (w > sampled_w)
			break;

		old_offset = offset;
	}

	return old_offset;
}
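
Before wiring this into a game, the maths above can be sanity checked on the CPU. The following is a hedged Python port of the scaling formula and the sampling loop (sample_depth is a stand-in callback for the depth buffer lookup, and the near/far values are the placeholders from the code above, not any particular game's):

```python
# CPU-side port of the auto crosshair maths, for sanity checking only.
near = 0.1
far = 40000.0

def world_z(z):
    # Same kluge as world_z_from_depth_buffer(): note the (1 - z) term,
    # so z = 0 maps to the far plane and z = 1 maps to the near plane.
    return far * near / ((1 - z) * near + far * z)

def adjust_from_depth_buffer(x, y, separation, convergence, sample_depth):
    # sample_depth(x, y) stands in for the depth buffer lookup; it should
    # return 0 for "no geometry" just like the HLSL version.
    offset = (near - convergence) * separation  # X offset at the near plane
    distance = separation - offset              # total X distance to cover
    old_offset = offset
    for i in range(255):
        offset += distance / 255.0
        # Depth of this point along the eye ray:
        w = (separation * convergence) / (separation - offset)
        sampled_w = sample_depth(x + offset, y)
        if sampled_w == 0:
            return 0
        # First intersection: return the last point in front of the scene
        if w > sampled_w:
            break
        old_offset = offset
    return old_offset

# For a flat scene at constant depth D, the loop should converge on the
# standard stereo offset: separation * (1 - convergence / D)
flat = adjust_from_depth_buffer(0, 0, 1.0, 1.0, lambda x, y: 2.0)
```

With separation 1.0, convergence 1.0 and a flat scene at depth 2.0, the result lands within one step (distance / 255) of the expected 0.5, confirming the line sampling behaves as described.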

Step 4: Hook up the auto crosshair code

Option 1: Adjust based on the center of the screen

Then, somewhere in the body of the code you call this function and pass it the coordinates on the depth buffer you want to check. For example, if you are adjusting a crosshair you probably want to sample around the center of the screen (0,0):

o0.x += adjust_from_depth_buffer(0, 0);

This assumes that o0.w == 1 and the UI element was being displayed at screen depth originally. If that is not the case, you would need to change the adjustment to compensate (e.g. by multiplying by o0.w, and/or subtracting the nvidia formula and/or normalising the coordinate so that it is at depth==1 or depth==convergence) - standard UI adjustment stuff.
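
As a minimal illustration of the o0.w compensation mentioned above (in Python rather than HLSL, with made-up numbers): scaling the offset by w means the on-screen shift survives the GPU's perspective divide unchanged:

```python
# Sketch: compensating for o0.w != 1 by scaling the screen-space offset
# by w, so the adjustment is unchanged after the perspective divide.
def adjust_clip_space_x(x, w, offset):
    return x + offset * w

# After the perspective divide (x / w), the on-screen shift is 'offset':
x, w, offset = 0.25, 4.0, 0.1
shift = adjust_clip_space_x(x, w, offset) / w - x / w
```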

Option 2: Adjust each UI element separately

I've generally found it's simpler to adjust the whole UI to the crosshair depth, or better to only adjust the crosshair, but if you want to experiment with automatically adjusting the entire UI you could do something like this to adjust each vertex individually. The result looks very similar to how the UI appears in compatibility mode, with the UI distorted over the geometry, so I don't necessarily recommend it:

o0.x += adjust_from_depth_buffer(o0.x, o0.y);

You might also experiment with finding a consistent point to sample for a given UI element. You might be able to multiply a point at the origin (or some fixed offset) by the MVP matrix passed to the shader to find a point that will be consistent for all corners, or perhaps you could copy a vertex buffer in to look up the coordinates of other vertices (I haven't tried this yet).
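
As a rough sketch of the MVP idea (the matrix and the column-vector convention here are assumptions - the real matrix would come from one of the shader's constant buffers): transforming the point (0, 0, 0, 1) just extracts the matrix's last column, and the perspective divide then gives a single NDC sample point you could feed to adjust_from_depth_buffer():

```python
def project_origin(mvp):
    # Multiplying (0, 0, 0, 1) by a 4x4 matrix (column-vector convention)
    # simply selects the matrix's last column:
    x, y, z, w = (row[3] for row in mvp)
    # The perspective divide gives the NDC coordinates of the element's origin:
    return (x / w, y / w)

# Hypothetical MVP that is a pure translation by (0.5, -0.25, 0):
mvp = [[1, 0, 0, 0.5],
       [0, 1, 0, -0.25],
       [0, 0, 1, 0],
       [0, 0, 0, 1]]
```

Every vertex of the element would then sample the same depth buffer location, avoiding the distortion of per-vertex adjustment.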

Step 5: Figure out the right scaling

Finally, you will need to figure out the correct way to scale a value from the depth buffer to world Z, and the right answer will vary from game to game.

Option 1: Adjust near + far clipping planes

A simple way that might work is to find or guess the values of the near and far clipping planes and adjust the definitions at the top of the code. You might be able to find these in a constant buffer somewhere and use them directly, or dump them out using frame analysis to find their values and hardcode them (which will only work if they don't change during the game - check a few levels & cutscenes to make sure they are ok).

You can also use the convergence to guess these - find something in the game that clips through the camera (or, failing that, something as close to the camera as possible), then adjust the convergence to put the point where it clips at screen depth. At that point the convergence will be an upper bound for the near clipping plane. Then, find something far away (like a mountain, but probably not the sky box) and adjust the convergence until that is at screen depth, which will give you a lower bound for the far clipping plane. Plug those values into the code, then use trial and error to tune them until the crosshair rests on whatever you are aiming at.

If this is totally not working, it may be that the game is fundamentally scaling its depth buffer differently (e.g. linear vs exponential), and the formula in world_z_from_depth_buffer() may need to be adjusted (the one above works in The Witcher 3 and Mad Max).
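
For reference, the two most common conventions look like this in Python form (a sketch only, with placeholder plane values - a given game may use yet another variant). world_z_projective is the standard D3D perspective mapping; world_z_linear is a straight lerp between the planes:

```python
near, far = 0.1, 1000.0  # placeholder clipping planes

def world_z_projective(z):
    # Standard D3D perspective depth: z = 0 at the near plane, z = 1 at far
    return near * far / (far - z * (far - near))

def world_z_linear(z):
    # Linear ("W-buffer" style) depth: a straight lerp between the planes
    return near + z * (far - near)
```

The two disagree wildly in the middle of the range (projective depth spends most of its precision near the camera), which is why guessing the wrong convention leaves the crosshair at nonsense depths.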

Option 2: Do what the game does

A better way is to look at other places the game uses the depth buffer to calculate a world coordinate (such as in most deferred lighting shaders) and replace the "return far*near/..." line in the world_z_from_depth_buffer() function with code that does the same scaling as the game. You may find you also need some values which may or may not already be available in the crosshair shader - if they aren't, you can copy them in, or use frame analysis to dump them out and hardcode them (which will only work if they don't change during the game).

Option 3: Revisit step 1

The game may have a pre-scaled copy of the depth buffer available somewhere (a "W-buffer"), allowing you to use its values directly with no scaling whatsoever. This won't be available if you copied a depth buffer from oD, so you may wish to go back to step 1 and look for alternative sources of depth information.

e.g. in The Witcher 3 the first shader run during post-processing scales the Z-buffer into a W-buffer, so using it allows us to skip scaling it ourselves:

[ResourceWBuffer]
[ShaderOverrideHBAODepthPass]
Hash = 170486ed36efcc9e
; This shader converts the Z buffer into a W buffer. Save it off for later use
; in HUD shaders:
post ResourceWBuffer = o0