Nvidia recently introduced a new memory type for data streaming
(awesome!), but yuzu was assuming that all heaps had enough memory
for the assumed stream buffer size (256 MiB).
This worked fine on AMD but Nvidia's new memory heap was smaller than
256 MiB. This commit changes this assumption and allocates a bit less
than the size of the preferred heap, with a maximum of 256 MiB (to avoid
allocating all system memory on integrated devices).
- Fixes a crash on NVIDIA 450.82.0.0
Some variables aren't used, so we can remove these.
Unfortunately, diagnostics are still reported on structured bindings
even when annotated with [[maybe_unused]], so we need to unpack the
elements that we want to use manually.
Implement indexed quads (GL_QUADS used with glDrawElements*) with a
compute pass conversion.
The compute shader converts from uint8/uint16/uint32 indices to uint32.
The format is passed through push constants to avoid having different
variants of the same shader.
- Used by Fast RMX
- Used by Xenoblade Chronicles 2 (it still has graphical due to
synchronization issues on Vulkan)
The original idea of returning pointers is that handles can be moved.
The problem is that the implementation didn't take that in mind and made
everything harder to work with. This commit drops pointer to handles and
returns the handles themselves. While it is still true that handles can
be invalidated, this way we get an old handle instead of a dangling
pointer.
This problem can be solved in the future with sparse buffers.
Allows reporting more cases where logic errors may exist, such as
implicit fallthrough cases, etc.
We currently ignore unused parameters, since we currently have many
cases where this is intentional (virtual interfaces).
While we're at it, we can also tidy up any existing code that causes
warnings. This also uncovered a few bugs as well.
This can result in silent logic bugs within code, and given the amount
of times these kind of warnings are caused, they should be flagged at
compile-time so no new code is submitted with them.
When the dynamic state is specified, pViewports and pScissors are
ignored, quoting the specification:
pViewports is a pointer to an array of VkViewport structures, defining
the viewport transforms. If the viewport state is dynamic, this member
is ignored.
That said, AMD's proprietary driver itself seem to read it regardless of
what the specification says.
This is a simple optimization as Buffer Copies are mostly used for texture recycling. They are, however, useful when games abuse undefined behavior but most 3D APIs forbid it.
This reverts commit 05cf270836.
Apparently the first approach using floats instead of bitfieldInert
worked better for Fire Emblem: Three Houses. Reverting to get that
behavior back.
From my testing on a Splatoon 2 shader that takes 3800ms on average to
compile changing to FullDecompile reduces it to 900ms on average.
The shader decoder will automatically fallback to a more naive method if
it can't use full decompile.
Adds optional support for Nsight Aftermath. It is enabled through
ENABLE_NSIGHT_AFTERMATH in cmake. A path to the SDK has to be provided
by the environment variable NSIGHT_AFTERMATH_SDK.
Nsight Aftermath allows an application to generate "minidumps" of the
GPU state when a device loss happens. By analysing these on Nsight we
can know what a game was doing and why it triggered a device loss.
The dump is generated inside %APPDATA%\yuzu\log\gpucrash and this
directory is deleted every time a new instance is initialized with
Nsight enabled.
To enable it on yuzu there has a to be a driver and device capable of
running Nsight Aftermath on Vulkan. That means only Turing based GPUs
on the latest stable driver, beta drivers won't work for now.
It is manually enabled in Configuration>Debug>Enable Graphics Debugging
because when using all debugging capabilities there is a runtime cost.
The base level is already included in the texture view. If we specify
the base level in the texture again, this will end up in the incorrect
level and potentially out of bounds.
This also fixes Turing issues but it avoids doing more bitcasts. This
should improve the generated code while also avoiding more points where
compilers can flush floats.
Implements the common usages for VMNMX. Inputs with a different size
than 32 bits are not supported and sign mismatches aren't supported
either.
VMNMX works as follows:
It grabs Ra and Rb and applies a maximum/minimum on them (this is
defined by .MX), having in mind the input sign. This result can then be
saturated. After the intermediate result is calculated, it applies
another operation on it using Rc. These operations are merges,
accumulations or another min/max pass.
This instruction allows to implement with a more flexible approach GCN's
min3 and max3 instructions (for instance).
preserve_contents was always true. We can't assume we don't have to
preserve clears because scissored and color masked clears exist.
This removes preserve_contents and assumes it as true at all times.
Since commit e22816a5bb we handle type mismatches from the CPU.
We don't need to hack our shader decoder due to game bugs anymore.
Removed in this commit.
This is a reversed look up table extracted from
https://gist.github.com/rygorous/2203834#file-gistfile1-cpp-L41-L62
that is used in
04d4e9e587/source/maxwell/tsc_generate.cpp (L38)
Games usually bind 0xFD expecting a float texture border of 1.0f.
The conversion previous to this commit was multiplying the uint8 sRGB
texture border color by 255. This is close to 1.0f but when that
difference matters, some graphical glitches appear.
This look up table is manually changed in the edges, clamping towards
0.0f and 1.0f.
While we are at it, move this logic to its own translation unit.
Reimplements I2I adding sign extension, saturation (clamp source value
to the destination), selection and destination sizes that are not 32
bits wide.
It doesn't implement CC yet.
Implements a reduction operation. It's an atomic operation that doesn't
return a value.
This commit introduces another primitive because some shading languages
might have a primitive for reduction operations.
Credits go to gdkchan and Ryujinx. The pull request used for this can
be found here: https://github.com/Ryujinx/Ryujinx/pull/1082
yuzu was already using the header for interpolation, but it was missing
the FragCoord.w multiplication described in the linked pull request.
This commit finally removes the FragCoord.w == 1.0f hack from the shader
decompiler.
While we are at it, this commit renames some enumerations to match
Nvidia's documentation (linked below) and fixes component declaration
order in the shader program header (z and w were swapped).
https://github.com/NVIDIA/open-gpu-doc/blob/master/Shader-Program-Header/Shader-Program-Header.html
The intention behind a Vulkan wrapper is to drop Vulkan-Hpp.
The issues with Vulkan-Hpp are:
- Regular breaks of the API.
- Copy constructors that do the same as the aggregates (fixed recently)
- External dynamic dispatch that is hard to remove
- Alias KHR handles with non-KHR handles making it impossible to use
smart handles on Vulkan 1.0 instances with extensions that were included
on Vulkan 1.1.
- Dynamic dispatchers silently change size depending on preprocessor
definitions. Different files will have different dispatch definitions,
generating all kinds of hard to debug memory issues.
In other words, Vulkan-Hpp is not "production ready" for our needs and
this wrapper aims to replace it without losing RAII and exception
safety.
This information is required to properly implement SULD.B. It might also
be handy for all image operations, since it would allow us to implement
them on devices that require the image format to be specified (on
desktop, this would be AMD on OpenGL and Intel on OpenGL and Vulkan).
Rounding operations only matter when the conversion size of source and
destination is the same, i.e. .F16.F16, .F32.F32 and .F64.F64.
When there is a mismatch (.F16.F32), these bits are used for IEEE
rounding, we don't emulate this because GLSL and SPIR-V don't support
configuring it per operation.
Changes the GraphicsContext to be managed by the GPU core. This
eliminates the need for the frontends to fool around with tricky
MakeCurrent/DoneCurrent calls that are dependent on the settings (such
as async gpu option).
This also refactors out the need to use QWidget::fromWindowContainer as
that caused issues with focus and input handling. Now we use a regular
QWidget and just access the native windowHandle() directly.
Another change is removing the debug tool setting in FrameMailbox.
Instead of trying to block the frontend until a new frame is ready, the
core will now take over presentation and draw directly to the window if
the renderer detects that its hooked by NSight or RenderDoc
Lastly, since it was in the way, I removed ScopeAcquireWindowContext and
replaced it with a simple subclass in GraphicsContext that achieves the
same result