This moves dynamic state present in VK_EXT_extended_dynamic_state to a
separate structure in FixedPipelineState. This is structure is at the
bottom allowing us to hash and memcmp only when the extension is not
supported.
Avoid illegal copies. This intercepts the last step of a copy to avoid
generating validation errors or corrupting the driver on some instances.
We can create views and emit copies accordingly in future commits and
remove this last-step validation.
Add a flat table to test if it's legal to create a texture view between
two formats or copy betweem them.
This table is based on ARB_copy_image and ARB_texture_view. Copies are
more permissive than views.
After marking buffers as resident, Nvidia's driver seems to take a
slow path. To workaround this issue, copy to a STREAM_READ buffer and
then call GetNamedBufferSubData on it.
This is a temporary solution until we have asynchronous flushing.
Previously if applications would send faulty buffers(example homebrew) it would lead to us returning uninitalized data. Switching from ASSERT_MSG to ASSERT_OR_EXECUTE_MSG allows us to have a fail safe to prevent crashes but also continue execution without introducing undefined behavior
Making the stream buffer resident increases GPU usage significantly on
some games. This seems to be addressed invalidating the stream buffer
with InvalidateBufferData instead of using a Unmap + Map (with
invalidation flags).
Switch games are allowed to bind less data than what they use in a
vertex buffer, the expected behavior here is that these values are read
as zero. At the moment of writing this only D3D12, OpenGL and NVN through
NV_vertex_buffer_unified_memory support vertex buffer with a size limit.
In theory this could be emulated on Vulkan creating a new VkBuffer for
each (handle, offset, length) tuple and binding the expected data to it.
This is likely going to be slow and memory expensive when used on the
vertex buffer and we have to do it on all draws because we can't know
without analyzing indices when a game is going to read vertex data out
of bounds.
This is not a problem on OpenGL's BufferAddressRangeNV because it takes
a length parameter, unlike Vulkan's CmdBindVertexBuffers that only takes
buffers and offsets (the length is implicit in VkBuffer). It isn't a
problem on D3D12 either, because D3D12_VERTEX_BUFFER_VIEW on
IASetVertexBuffers takes SizeInBytes as a parameter (although I am not
familiar with robustness on D3D12).
Currently this only implements buffer ranges for vertex buffers,
although indices can also be affected. A KHR_robustness profile is not
created, but Nvidia's driver reads out of bound vertex data as zero
anyway, this might have to be changed in the future.
- Fixes SMO random triangles when capturing an enemy, getting hit, or
looking at the environment on certain maps.
Make stream buffer and cached buffers as resident and query their
address. This allows us to use GPU addresses for several proprietary
Nvidia extensions.
Expose NV_vertex_buffer_unified_memory when the driver supports it.
This commit adds a function the determine if a GL_RENDERER is a Turing
GPU. This is required because on Turing GPUs Nvidia's driver crashes
when the buffer is marked as resident or on DeleteBuffers. Without a
synchronous debug output (single threaded driver), it's likely that
the driver will crash in the first blocking call.
Add HSET2_IMM. Due to the complexity of the encoding avoid using
BitField unions and read the relevant bits from the code itself.
This is less error prone.
Update validation layer string to VK_LAYER_KHRONOS_validation.
While we are at it, properly check for available validation layers
before enabling them.
Enable GL_EXT_texture_shadow_lod if available. If this extension is not available, such as on Intel/AMD proprietary drivers, use textureGrad as a workaround.
Occurs when doing a local compile in MSVC build. The compiler I'm using is as below:
Microsoft Visual Studio Community 2019 Preview
Version 16.6.0 Preview 5.0
Fixes this error:
CVTRES : fatal error CVT1100: duplicate resource. type:MANIFEST, name:1, language:0x0409
LINK : fatal error LNK1123: failure during conversion to COFF: file invalid or corrupt
I have put 0 since previous name was 1. If have other names in mind, please let me know.
Co-Authored-By: dragios <dragios@users.noreply.github.com>
Variables that are marked as const cannot have the move constructor
invoked when returning from a function (the move constructor requires a
non-const variable so it can "steal" the resources from it.
Check() can throw an exception if the Vulkan result isn't successful.
We remove the check so that std::terminate isn't outright called and
allows for better debugging (should it ever actually fail).
Renames some variables to prevent ones in inner scopes from shadowing
outer-scoped variables.
The Copy* functions have no shadowing, but we rename them anyways to
remain consistent with the other functions.
We can reduce the capture scope so that it's not possible for both "reg"
variables to clash with one another.
While we're at it, we can prevent unnecessary copies while we're at it.
There's no need to load contents from the CPU when a clear resets all
the contents of the underlying memory. This is already implemented on
OpenGL and the texture cache.
maxwell_to_vk: Reorder filtering modes to start with None, then Nearest, then Linear.
maxwell_to_vk: Logs filter modes under UNREACHABLE_MSG instead of UNIMPLEMENTED_MSG, since any unknown filter modes are invalid and not unimplemented.
maxwell_to_vk: Return VK_SAMPLER_MIPMAP_MODE_NEAREST instead of VK_SAMPLER_MIPMAP_MODE_LINEAR when mipmap_filter is None with the description from the VkSamplerCreateInfo(3) man page.
maxwell_to_gl: Log unimplemented features under UNIMPLEMENTED_MSG instead of LOG_ERROR to bring into parity with maxwell_to_vk
maxwell_to_gl: Deduplicate logging in VertexType(), merging them into one.
maxwell_to_gl: Return GL_NEAREST instead of GL_LINEAR if an unknown texture filter mode is encountered.
maxwell_to_gl: Log the mipmap filter mode if an unknown value is passed in.
maxwell_to_gl: Reorder filtering modes to start with None, then Nearest, then Linear.
Due to the limitation of GL_MAX_IMAGE_UNITS being low (8) on Intel's and Nvidia's proprietary drivers, we have to reserve an appropriate amount of image bindings for each of the stages. So far games have been observed to use 4 image bindings on the fragment stage (Kirby Star Allies) and 1 on the vertex stage (TWD series).
No games thus far in my limited testing used more than 4 images concurrently and across all currently active programs.
This fixes shader compilation errors on Kirby Star Allies on OpenGL (GLSL/GLASM)
All registers are now callee-save registers.
RBX and RBP selected for STATE and RESULT because these are most commonly accessed; this is to avoid the REX prefix.
RBP not used for STATE because there are some SIB restrictions, RBX emits smaller code.
Emit code compatible with NV_gpu_program5.
This should emit code compatible with Fermi, but it wasn't tested on
that architecture. Pascal has some issues not present on Turing GPUs.
GetTotalPhysicalMemoryAvailableWithoutSystemResource & GetTotalPhysicalMemoryUsedWithoutSystemResource seem to subtract the resource size from the usage.
Instead of using as template argument a shared pointer, use the
underlying type and manage shared pointers explicitly. This can make
removing shared pointers from the cache more easy.
While we are at it, make some misc style changes and general
improvements (like insert_or_assign instead of operator[] + operator=).
Vertex buffers bindings become invalid after the stream buffer is
invalidated. We were originally doing this, but it got lost at some
point.
- Fixes Animal Crossing: New Horizons, but it affects everything.
This allows rendering to 3D textures with more than one slice.
Applications are allowed to render to more than one slice of a texture
using gl_Layer from a VTG shader.
This also requires reworking how 3D texture collisions are handled, for
now, this commit allows rendering to slices but not to miplevels. When a
render target attempts to write to a mipmap, we fallback to the previous
implementation (copying or flushing as needed).
- Fixes color correction 3D textures on UE4 games (rainbow effects).
- Allows Xenoblade games to render to 3D textures directly.
Implement a generic shader cache for fast lookups and invalidations.
Invalidations are cheap but expensive when a shader is invalidated.
Use two mutexes instead of one to avoid locking invalidations for
lookups and vice versa. When a shader has to be removed, lookups are
locked as expected.
Skip fast buffer uploads on Nvidia 443.24 Vulkan beta driver on OpenGL.
This driver throws the following error when calling BufferSubData or
BufferData on buffers that are candidates for fast constant buffer
uploads. This is the equivalens to push constants on Vulkan, except that
they can access the full buffer. The error:
Unknown internal debug message. The NVIDIA OpenGL driver has encountered
an out of memory error. This application might
behave inconsistently and fail.
If this error persists on future drivers, we might have to look deeper
into this issue. For now, we can black list it and log it as a temporary
solution.
Games using D3D idioms can join images and samplers when a shader
executes, instead of baking them into a combined sampler image. This is
also possible on Vulkan.
One approach to this solution would be to use separate samplers on
Vulkan and leave this unimplemented on OpenGL, but we can't do this
because there's no consistent way of determining which constant buffer
holds a sampler and which one an image. We could in theory find the
first bit and if it's in the TIC area, it's an image; but this falls
apart when an image or sampler handle use an index of zero.
The used approach is to track for a LOP.OR operation (this is done at an
IR level, not at an ISA level), track again the constant buffers used as
source and store this pair. Then, outside of shader execution, join
the sample and image pair with a bitwise or operation.
This approach won't work on games that truly use separate samplers in a
meaningful way. For example, pooling textures in a 2D array and
determining at runtime what sampler to use.
This invalidates OpenGL's disk shader cache :)
- Used mostly by D3D ports to Switch
NV_transform_feedback, NV_transform_feedback2 and
ARB_transform_feedback3 with NV_transform_feedback interactions allows
implementing transform feedbacks as dynamic state.
Maxwell implements transform feedbacks as dynamic state, so using these
extensions with TransformFeedbackStreamAttribsNV allows us to properly
emulate transform feedbacks without having to recompile shaders when the
state changes.
On Intel's proprietary drivers, gl_Layer and gl_ViewportIndex are not allowed members of gl_PerVertex block, causing the shader to fail to compile. Fix this by declaring these variables outside of gl_PerVertex.
This avoids using Nvidia's ASTC decoder on OpenGL.
The last time it was profiled, it was slower than yuzu's decoder.
While we are at it, fix a bug in the texture cache when native ASTC is
not supported.
Previously we were disabling compute shaders on Intel's proprietary driver due to broken compute. This has been fixed in the latest Intel drivers. Re-enable compute for Intel proprietary drivers and remove the check for broken compute.
Geometry shaders built from Nvidia's compiler check for bits[16:23] to
be less than or equal to 0 with VSETP to default to a "safe" value of
0x8000'0000 (safe from hardware's perspective). To avoid hitting this
path in the shader, return 0x00ff'0000 from S2R INVOCATION_INFO.
This seems to be the maximum number of vertices a geometry shader can
emit in a primitive.
Implement more surface reconstruct cases. Allow overlaps with more than
one layer and mipmap and copies all of them to the new texture.
- Fixes textures moving around objects on Xenoblade games
Changes many patch_manager functions to use a case-less variant of
GetSubdirectory. Fixes patches not showing up on *nix systems when
patch directories are named with odd cases, i.e. `exeFS'.
Avoid copying to a staging buffer on non-granular memory addresses.
Add a callable argument to StreamBufferUpload to be able to copy to the
staging buffer directly from ReadBlockUnsafe.
Stop ignoring image swizzles on depth and stencil images.
This doesn't fix a known issue on Xenoblade Chronicles 2 where an OpenGL
texture changes swizzles twice before being used. A proper fix would be
having a small texture view cache for this like we do on Vulkan.
While Vulkan was assuming we had no negative viewports, OpenGL code
was assuming we had them. Port the old code from Vulkan to OpenGL,
checking if the first viewport is negative before flipping faces.
This is not a complete implementation since we only check for the first
viewport to be negative. That said, unless a game is using Vulkan,
OpenGL and NVN games should be fine here, and we can always compare with
our Vulkan backend to see if there's a difference.
The check to flip faces when viewports are negative were a left over
from the old OpenGL code. This is not required on Vulkan where we have
negative viewports.
Hardware S2R special registers match gl_Thread*MaskNV. We can trivially
implement these using Nvidia's extension on OpenGL or naively stubbing
them with the ARB instructions to match. This might cause issues if the
host device warp size doesn't match Nvidia's. That said, this is
unlikely on proper shaders.
Refer to the attached url for more documentation about these flags.
https://www.khronos.org/registry/OpenGL/extensions/NV/NV_shader_thread_group.txt
Some operations like atomicMin were ignored because they returned were
being stored to RZ. This operations have a side effect and it was being
ignored.
Drop the std::list hack to allocate memory indefinitely.
Instead use a custom allocator that keeps references valid until
destruction. This allocates fixed chunks of memory and puts pointers in
a free list. When an allocation is no longer used put it back to the
free list, this doesn't heap allocate because std::vector doesn't change
the capacity. If the free list is empty, allocate a new chunk.
Most overlaps in the buffer cache only contain one mapped address.
We can avoid close to all heap allocations once the buffer cache is
warmed up by using a small_vector with a stack size of one.
Instead of using boost::icl::interval_map for caching, use
boost::intrusive::set. interval_map is intended as a container where the
keys can overlap with one another; we don't need this for caching
buffers and a std::set-like data structure that allows us to search with
lower_bound is enough.
This has been wrong since 0432af5ad1
I haven't found a game that called this function (and I haven't tried this on a real Switch), and because of this I haven't been able to check if the number in assert OR the string in the assert is wrong, but one of the two is wrong:
NetworkProfileData is 0x18E, while SfNetworkProfileData is 0x17C, according to Switchbrew
Switchbrew doesn't officially say that NetworkProfileData's size is 0x18E but it's possible to calculate its size since Switchbrew provides the size and the offset of all the components of NetworkProfileData (which isn't currently implemented in yuzu, alongside SfNetworkProfileData)
NetworkProfileData documentation: https://switchbrew.org/wiki/Network_Interface_services#NetworkProfileData
SfNetworkProfileData documentation: https://switchbrew.org/wiki/Network_Interface_services#SfNetworkProfileData
Since I trust ogniK's work on reversing NIFM, I'd assume this was just a typo in the string
Previously, we were reading the keys everytime a KeyManager object was created, causing yuzu to reread the keys file multiple hundreds of times when loading the game list.
With this change, it is only loaded once.
On my system, this decreased game list loading times by a factor of 20.
Add code required to use OpenGL assembly programs based on
NV_gpu_program5. Decompilation for ARB programs is intended to be added
in a follow up commit. This does **not** include ARB decompilation and
it's not in an usable state.
The intention behind assembly programs is to reduce shader stutter
significantly on drivers supporting NV_gpu_program5 (and other required
extensions). Currently only Nvidia's proprietary driver supports these
extensions.
Add a UI option hidden for now to avoid people enabling this option
accidentally.
This code path has some limitations that OpenGL compatibility doesn't
have:
- NV_shader_storage_buffer_object is limited to 16 entries for a single
OpenGL context state (I don't know if this is an intended limitation, an
specification issue or I am missing something). Currently causes issues
on The Legend of Zelda: Link's Awakening.
- NV_parameter_buffer_object can't bind buffers using an offset
different to zero. The used workaround is to copy to a temporary buffer
(this doesn't happen often so it's not an issue).
On the other hand, it has the following advantages:
- Shaders build a lot faster.
- We have control over how floating point rounding is done over
individual instructions (SPIR-V on Vulkan can't do this).
- Operations on shared memory can be unsigned and signed.
- Transform feedbacks are dynamic state (not yet implemented).
- Parameter buffers (uniform buffers) are per stage, matching NVN and
hardware's behavior.
- The API to bind and create assembly programs makes sense, unlike
ARB_separate_shader_objects.
Constant attributes (in OpenGL known disabled attributes) are not
supported on Vulkan, even with extensions. To emulate this behavior we
return zero on reads from disabled vertex attributes in shader code.
This has no caching cost because attribute formats are not dynamic state
on Vulkan and we have to store it in the pipeline cache anyway.
- Fixes Animal Crossing: New Horizons terrain borders
This was a left over from OpenGL when disabled buffers where not properly
emulated. We no longer have to assert this as it is checked in vertex
buffer initialization.
Previously we never cleared the states of the entries and the key would stay held down, also looping over the key bytes for each key lead to setting every bit for the key state instead of the key we wanted
"Not equal" operators on GLSL seem to behave as unordered when we expect
an ordered comparison.
Manually emulate this checking for LGE values (numbers, not-NaNs).
* Remove git submodules that will be loaded through conan
* Move custom Find modules to their own folder
* Use conan for downloading missing external dependencies
* CI: Change the yuzu source folder user to the user that the containers run on
* Attempt to remove dirty mingw build hack
* Install conan on the msvc build
* Only set release build type when using not using multi config generator
* Re-add qt bundled to workaround an issue with conan qt not downloading prebuilt binaries
* Add workaround for submodules that use legacy CMAKE variables
* Re-add USE_BUNDLED_QT on the msvc build bot
This should fix grass interactions on Breath of the Wild on Vulkan.
It is currently untested against validation layers.
Nvidia's Windows 443.09 beta driver or Linux 440.66.12 is required for
now.
While èis generally representable in some language encodings, in some
it isn't and will result in compilation warnings occurring. To remain
friendly with other language's codepages on Windows, we normalize it to
an ASCII e.
In file included from src/video_core/renderer_opengl/renderer_opengl.cpp:25:
In file included from src/./video_core/renderer_opengl/gl_rasterizer.h:26:
In file included from src/./video_core/renderer_opengl/gl_fence_manager.h:11:
src/./video_core/fence_manager.h:91:32: error: use 'template' keyword
to treat 'Write' as a dependent template name
memory_manager.Write<u32>(current_fence->GetAddress(), current_fence->GetPayload());
^
template
src/./video_core/fence_manager.h:137:32: error: use 'template'
keyword to treat 'Write' as a dependent template name
memory_manager.Write<u32>(current_fence->GetAddress(), current_fence->GetPayload());
^
template
Return the proper state of vr mode for IsVrModeEnabled
We should not return an error for SetVrModeEnabled. When VR Mode is turned on, it signals to lbl to turn vr mode on, not return an error code
Reduces some header churn and reduces rebuilds when some header
internals change.
While we're at it we can also resolve a missing include in buffer_cache.
Xenoblade 2 invokes a draw call with zero vertices.
This is likely due to indirect drawing (glDrawArraysIndirect).
This causes a crash in the staging buffer pool when trying to create a
buffer with a size of zero. To workaround this, skip index buffer setup
entirely when the number of indices is zero.
Drop MemoryBarrier from the buffer cache and use Maxwell3D's register
WaitForIdle.
To implement this on OpenGL we just call glMemoryBarrier with the
necessary bits.
Vulkan lacks this synchronization primitive, so we set an event and
immediately wait for it. This is not a pretty solution, but it's what
Vulkan can do without submitting the current command buffer to the queue
(which ends up being more expensive on the CPU).
Using deko3d as reference:
4e47ba0013/source/maxwell/gpu_3d_state.cpp (L42)
We were using bits 3 and 4 to determine depth clamping, but these are
the same both enabled and disabled:
state->depthClampEnable ? 0x101A : 0x181D
The same happens on Nvidia's OpenGL driver, where they do something like
this (default capabilities, GL 4.5 compatibility):
(state & DEPTH_CLAMP) != 0 ? 0x201a : 0x281c
There's always a difference between the first bits in this register, but
bit 11 is consistently disabled on both deko3d/NVN and OpenGL. This
commit changes yuzu's behaviour to use bit 11 to determine depth
clamping.
- Fixes depth issues on Super Mario Odyssey's intro.
This reverts commit 94b0e2e5da.
preserve_contents proved to be a meaningful optimization. This commit
reintroduces it but properly implemented on OpenGL.
We have to make sure the clear removes all the previous contents of the
image.
It's not currently implemented on Vulkan because we can do smart things
there that's preferred to be introduced in a separate commit.
Deduplicate code shared between vk_pipeline_cache and gl_shader_cache as
well as shader decoder code.
While we are at it, fix a bug in gl_shader_cache where compute shaders
had an start offset of a stage shader.
Signed integer addition overflow might be undefined behavior. It's free
to change operations to UAdd and use unsigned integers to avoid
potential bugs.
P2R CC takes the state of condition codes and puts them into a register.
We already have this implemented for PR (predicates). This commit
implements CC over that.
Sometimes for unknown reasons NVN games can bind a render target format
of 0. This may be a yuzu bug.
With the commits before this the formats were specified without being
"packed", assuming all formats and texceptions will be written like in
the color_attachments vector.
To address this issue, iterate all render targets and pack them as they
are valid. This way they will match color_attachments.
- Fixes validation errors and graphical issues on Breath of the Wild.
Currently SetBufferCount doesn't write to the out buffer which then contains uninitialized data. This leads to non-zero data which leads to responding with different error codes
We can also allow unicorn to be constructed in 32-bit mode or 64-bit
mode to satisfy the need for both interpreter instances.
Allows this code to compile successfully of non x86-64 architectures.
Any time the lambda function is called, the permission being used in the
capture would be passed in as an argument to the lambda, so the capture
is unnecessary.
The encoding for negation and absolute value was wrong.
Extracting is now done manually. Similar instructions having different
encodings is the rule, not the exception. To keep sanity and readability
I preferred to extract the desired bit manually.
This is implemented against nxas:
8dbc389957/table.h (L68)
That is itself tested against nvdisasm (Nvidia's official disassembler).
This allows deducing some properties from the texture instruction before
asking the runtime. By doing this we can handle type mismatches in some
instructions from the renderer instead of the shader decoder.
Fixes texelFetch issues with games using 2D texture instructions on a 1D
sampler.
Patch the RomFS with the selected updates before dumping. Previously the resulting RomFS only contained data from the original title.
To dump the RomFS without updates the user can disable the update under Properties before choosing Dump RomFS.
The intention behind this was to assign a float to from an uint32_t, but
it was unintentionally being copied directly into the std::optional.
Copy to a temporary and assign that temporary to std::optional. This can
be replaced with std::bit_cast<float> once we are in C++20.