Commit graph

4033 commits

Author SHA1 Message Date
Rodrigo Locatti 64cd46579b
Merge pull request #3303 from lioncash/reorder
control_flow: Silence -Wreorder warning for CFGRebuildState
2020-01-14 16:15:18 -03:00
Lioncash a1eee1749e control_flow: Silence -Wreorder warning for CFGRebuildState
Organizes the initializer list in the same order that the variables
would actually be initialized in.
2020-01-14 13:28:48 -05:00
Lioncash f10ea944e0 gl_shader_cache: Remove unused STAGE_RESERVED_UBOS constant
Given this isn't used, this can be removed entirely.
2020-01-14 13:16:52 -05:00
Lioncash 4cd5ad90f3 gl_shader_cache: std::move entries in CachedShader constructor
Avoids several reallocations of std::vector instances where applicable.
2020-01-14 13:14:16 -05:00
Lioncash 15a6840e7a gl_shader_cache: Remove unused entries variable in BuildShader()
Eliminates a few unnecessary constructions of std::vectors.
2020-01-14 13:11:49 -05:00
bunnei 55f95e7f26
Merge pull request #3287 from ReinUsesLisp/ldg-stg-16
shader_ir/memory: Implement u16 and u8 for STG and LDG
2020-01-14 09:57:08 -05:00
bunnei 15788ffcde
Merge pull request #3288 from ReinUsesLisp/uncurse-aoffi
shader_ir/texture: Simplify AOFFI code
2020-01-13 23:52:12 -05:00
bunnei 6985eea519
Merge pull request #3290 from ReinUsesLisp/gl-clamp
maxwell_to_vk: Implement GL_CLAMP hacking Nvidia's driver
2020-01-13 19:16:06 -05:00
ReinUsesLisp 09e17fbb0f vk_texture_cache: Implement generic texture cache on Vulkan
It currently ignores PBO linearizations since these should be dropped as
soon as possible on OpenGL.
2020-01-13 20:37:50 -03:00
ReinUsesLisp 2b2712fa95 texture_cache/surface_params: Make GetNumLayers public 2020-01-13 20:35:43 -03:00
Rodrigo Locatti b1138e5ea1
vk_compute_pass: Address feedback
Comment hardcoded SPIR-V modules.
2020-01-10 22:46:34 -03:00
ReinUsesLisp 3d46709b7f maxwell_to_vk: Implement GL_CLAMP hacking Nvidia's driver
Nvidia's driver defaults invalid enumerations to GL_CLAMP. Vulkan
doesn't expose GL_CLAMP through its API, but we can hack it on Nvidia's
driver using the internal driver defaults.
2020-01-10 17:12:50 -03:00
ReinUsesLisp 13021b534c shader_ir/texture: Simplify AOFFI code 2020-01-09 03:50:37 -03:00
ReinUsesLisp e2a2a556b9 shader_ir/memory: Implement u16 and u8 for STG and LDG
Using the same technique we used for u8 on LDG, implement u16.

In the case of STG, load memory and insert the value we want to set
into it with bitfieldInsert. Then set that value.
2020-01-09 02:12:29 -03:00
ReinUsesLisp 908e085d02 vk_compute_pass: Add compute passes to emulate missing Vulkan features
This currently only supports quad arrays and u8 indices.

In the future we can remove quad arrays with a table written from the
CPU, but this was used to bootstrap the other passes helpers and it
was left in the code.

The blob code is generated from the "shaders/" directory. Read the
instructions there to know how to generate the SPIR-V.
2020-01-08 19:24:26 -03:00
ReinUsesLisp 82a64da077 vk_shader_util: Add helper to build SPIR-V shaders 2020-01-08 19:22:20 -03:00
ReinUsesLisp 6888d776ff vk_pipeline_cache: Initial implementation
Given a pipeline key, this cache returns a pipeline abstraction (for
graphics or compute).
2020-01-06 22:02:26 -03:00
ReinUsesLisp 2effdeb924 vk_graphics_pipeline: Initial implementation
This abstractio represents the state of the 3D engine at a given draw.
Instead of changing individual bits of the pipeline how it's done in
APIs like D3D11, OpenGL and NVN; on Vulkan we are forced to put
everything together into a single, immutable object.

It takes advantage of the few dynamic states Vulkan offers.
2020-01-06 22:02:26 -03:00
ReinUsesLisp dc96a59fa0 vk_compute_pipeline: Initial implementation
This abstraction represents a Vulkan compute pipeline.
2020-01-06 22:02:26 -03:00
ReinUsesLisp b392a5986e vk_pipeline_cache: Add file and define descriptor update template filler
This function allows us to share code between compute and graphics
pipelines compilation.
2020-01-06 22:02:26 -03:00
ReinUsesLisp 3142f1b597 fixed_pipeline_state: Add depth clamp 2020-01-06 22:02:26 -03:00
ReinUsesLisp 9c548146ca vk_rasterizer: Add placeholder 2020-01-06 22:02:26 -03:00
bunnei 5be00cba15
Merge pull request #3276 from ReinUsesLisp/pipeline-reqs
vk_update_descriptor/vk_renderpass_cache: Add pipeline cache dependencies
2020-01-06 17:03:34 -05:00
ReinUsesLisp 5aeff9aff5 vk_renderpass_cache: Initial implementation
The renderpass cache is used to avoid creating renderpasses on each
draw. The hashed structure is not currently optimized.
2020-01-06 18:28:32 -03:00
ReinUsesLisp 322d6a0311 vk_update_descriptor: Initial implementation
The update descriptor is used to store in flat memory a large chunk of
staging data used to update descriptor sets through templates. It
provides a push interface to easily insert descriptors following the
current pipeline. The order used in the descriptor update template has
to be implicitly followed. We can catch bugs here using validation
layers.
2020-01-06 18:28:32 -03:00
ReinUsesLisp 5b01f80a12 vk_stream_buffer/vk_buffer_cache: Avoid halting and use generic cache
The stream buffer before this commit once it was full (no more bytes to
write before looping) waiting for all previous operations to finish.
This was a temporary solution and had a noticeable performance penalty
in performance (from what a profiler showed).

To avoid this mark with fences usages of the stream buffer and once it
loops wait for them to be signaled. On average this will never wait.
Each fence knows where its usage finishes, resulting in a non-paged
stream buffer.

On the other side, the buffer cache is reimplemented using the generic
buffer cache. It makes use of the staging buffer pool and the new
stream buffer.
2020-01-06 18:13:41 -03:00
ReinUsesLisp ceb851b590 vk_memory_manager: Misc changes
* Allocate memory in discrete exponentially increasing chunks until the
128 MiB threshold. Allocations larger thant that increase linearly by
256 MiB (depending on the required size). This allows to use small
allocations for small resources.

* Move memory maps to a RAII abstraction. To optimize for debugging
tools (like RenderDoc) users will map/unmap on usage. If this ever
becomes a noticeable overhead (from my profiling it doesn't) we can
transparently move to persistent memory maps without harming the API,
getting optimal performance for both gameplay and debugging.

* Improve messages on exceptional situations.

* Fix typos "requeriments" -> "requirements".

* Small style changes.
2020-01-06 18:13:41 -03:00
ReinUsesLisp 85bb6a6f08 vk_buffer_cache: Temporarily remove buffer cache
This is intended for a follow up commit to avoid circular dependencies.
2020-01-06 17:58:46 -03:00
bunnei 89fc75d769
Merge pull request #3257 from degasus/no_busy_loops
video_core: Block in WaitFence.
2020-01-06 00:09:57 -05:00
Fernando Sahmkow 56e450a3f7
Merge pull request #3264 from ReinUsesLisp/vk-descriptor-pool
vk_descriptor_pool: Initial implementation
2020-01-05 15:54:41 -04:00
bunnei cd0a7dfdbc
Merge pull request #3258 from FernandoS27/shader-amend
Shader_IR: add the ability to amend code in the shader ir.
2020-01-04 14:05:17 -05:00
Fernando Sahmkow 3dd6b55851 Shader_IR: Address Feedback 2020-01-04 14:40:57 -04:00
Fernando Sahmkow a1667a7b46 Shader_IR: Implement TXD Array.
This commit extends the compilation of TXD to support array samplers on
TXD.
2020-01-04 13:28:02 -04:00
Rodrigo Locatti 6e347d8d1b
Update src/video_core/renderer_vulkan/vk_descriptor_pool.cpp
Co-Authored-By: Mat M. <mathew1800@gmail.com>
2020-01-03 17:34:30 -03:00
ReinUsesLisp 0d6d8129c4 yuzu: Remove Maxwell debugger
This was carried from Citra and wasn't really used on yuzu. It also adds
some runtime overhead. This commit removes it from yuzu's codebase.
2020-01-02 23:09:44 -03:00
bunnei ae0e481677
Merge pull request #3243 from ReinUsesLisp/topologies
maxwell_to_gl: Implement missing primitive topologies
2020-01-01 20:33:33 -05:00
ReinUsesLisp 1fe7df4517 vk_descriptor_pool: Initial implementation
Create a large descriptor pool where we allocate all our descriptors
from. It has to be wide enough to support any pipeline, hence its large
numbers.

If the descritor pool is filled, we allocate more memory at that moment.
This way we can take advantage of permissive drivers like Nvidia's that
allocate more descriptors than what the spec requires.
2020-01-01 16:44:06 -03:00
bunnei 028b2718ed
Merge pull request #3239 from ReinUsesLisp/p2r
shader/p2r: Implement P2R Pr
2019-12-31 20:37:16 -05:00
Fernando Sahmkow b3371ed09e Shader_IR: add the ability to amend code in the shader ir.
This commit introduces a mechanism by which shader IR code can be
amended and extended. This useful for track algorithms where certain
information can derived from before the track such as indexes to array
samplers.
2019-12-30 15:31:48 -04:00
Fernando Sahmkow 7bd447355f
Merge pull request #3248 from ReinUsesLisp/vk-image
vk_image: Add an image object abstraction
2019-12-30 14:25:14 -04:00
Rodrigo Locatti 4cbb363d3f
vk_image: Avoid unnecesary equals 2019-12-30 13:28:23 -03:00
Fernando Sahmkow 287d5921cf
Merge pull request #3249 from ReinUsesLisp/vk-staging-buffer-pool
vk_staging_buffer_pool: Add a staging pool for temporary operations
2019-12-30 12:25:59 -04:00
Markus Wick cb9dd01ffd video_core: Block in WaitFence.
This function is called rarely and blocks quite often for a long time.
So don't waste power and let the CPU sleep.

This might also increase the performance as the other cores might be allowed to clock higher.
2019-12-30 13:04:53 +01:00
Rodrigo Locatti f2c61bbe13
vk_staging_buffer_pool: Initialize last epoch to zero 2019-12-29 19:19:43 -03:00
Fernando Sahmkow f846e3d6d0
Merge pull request #3250 from ReinUsesLisp/empty-fragment
gl_rasterizer: Allow rendering without fragment shader
2019-12-28 14:33:53 -04:00
bunnei 8a76f816a4
Merge pull request #3228 from ReinUsesLisp/ptp
shader/texture: Implement AOFFI and PTP for TLD4 and TLD4S
2019-12-26 21:43:44 -05:00
ReinUsesLisp 5b989f189f
gl_rasterizer: Allow rendering without fragment shader
Rendering without a fragment shader is usually used in depth-only
passes.
2019-12-26 16:38:49 -03:00
ReinUsesLisp 3813af2f3c
vk_staging_buffer_pool: Add a staging pool for temporary operations
The job of this abstraction is to provide staging buffers for temporary
operations. Think of image uploads or buffer uploads to device memory.

It automatically deletes unused buffers.
2019-12-25 18:12:17 -03:00
ReinUsesLisp c83bf7cd1e
vk_image: Add an image object abstraction
This object's job is to contain an image and manage its transitions.
Since Nvidia hardware doesn't know what a transition is but Vulkan
requires them anyway, we have to state track image subresources
individually.

To avoid the overhead of tracking each subresource in images with many
subresources (think of cubemap arrays with several mipmaps), this commit
tracks when subresources have diverged. As long as this doesn't happen
we can check the state of the first subresource (that will be shared
with all subresources) and update accordingly.

Image transitions are deferred to the scheduler command buffer.
2019-12-25 18:00:16 -03:00
Fernando Sahmkow 5619d24377
Merge pull request #3244 from ReinUsesLisp/vk-fps
fixed_pipeline_state: Define structure and loaders
2019-12-25 14:31:29 -04:00
bunnei 4af569ee47
Merge pull request #3236 from ReinUsesLisp/rasterize-enable
gl_rasterizer: Implement RASTERIZE_ENABLE
2019-12-24 22:54:10 -05:00
ReinUsesLisp b9e3f5eb36
fixed_pipeline_state: Define symetric operator!= and mark as noexcept
Marks as noexcept Hash, operator== and operator!= for consistency.
2019-12-24 18:24:08 -03:00
ReinUsesLisp 4a3026b16b
fixed_pipeline_state: Define structure and loaders
The intention behind this hasheable structure is to describe the state
of fixed function pipeline state that gets compiled to a single graphics
pipeline state object. This is all dynamic state in OpenGL but Vulkan
wants it in an immutable state, even if hardware can edit it freely.

In this commit the structure is defined in an optimized state (it uses
booleans, has paddings and many data entries that can be packed to
single integers). This is intentional as an initial implementation that
is easier to debug, implement and review. It will be optimized in later
stages, or it might change if Vulkan gets more dynamic states.
2019-12-22 22:59:11 -03:00
ReinUsesLisp 5770418fb3
maxwell_3d: Add depth bounds registers 2019-12-22 22:55:06 -03:00
ReinUsesLisp 91d35559e5
maxwell_to_gl: Implement missing primitive topologies
Many of these topologies are exclusively available in OpenGL.
2019-12-22 22:33:01 -03:00
bunnei e976d0e924
Merge pull request #3241 from ReinUsesLisp/gl-shader-cache
gl_shader_cache: Style changes
2019-12-22 16:23:46 -05:00
bunnei 1e76655f83
Merge pull request #3238 from ReinUsesLisp/vk-resource-manager
vk_resource_manager: Catch device losses and other changes
2019-12-22 15:57:16 -05:00
bunnei 0f3ac9cfeb
Merge pull request #3203 from FernandoS27/tex-cache-fixes
Texture Cache: Add HLE methods for building 3D textures
2019-12-22 14:25:13 -05:00
Fernando Sahmkow 3dc585d011
Merge pull request #3237 from ReinUsesLisp/vk-shader-decompiler
vk_shader_decompiler: Misc changes
2019-12-22 12:36:56 -04:00
Fernando Sahmkow 218ee18417 Texture Cache: Improve documentation 2019-12-22 12:29:23 -04:00
Fernando Sahmkow a3916588b6 Texture Cache: Address Feedback 2019-12-22 12:24:34 -04:00
Fernando Sahmkow 51c9e98677 Texture Cache: Add HLE methods for building 3D textures within the GPU in certain scenarios.
This commit adds a series of HLE methods for handling 3D textures in
general. This helps games that generate 3D textures on every frame and
may reduce loading times for certain games.
2019-12-22 12:24:34 -04:00
Fernando Sahmkow aea978e037
Merge pull request #3230 from ReinUsesLisp/vk-emu-shaders
renderer_vulkan/shader: Add helper GLSL shaders
2019-12-22 11:23:09 -04:00
Fernando Sahmkow 27efcc15e9
Merge pull request #3240 from ReinUsesLisp/decomp-cond-code
vk_shader_decompiler: Use Visit instead of reimplementing it
2019-12-22 11:20:55 -04:00
bunnei 16dcfacbfc
Merge pull request #3235 from ReinUsesLisp/ldg-u8
shader/memory: Implement LDG.U8 and unaligned U8 loads
2019-12-21 22:50:28 -05:00
ReinUsesLisp 1e16023d60
gl_shader_cache: Update commentary for shared memory
Remove false commentary. Not dividing by 4 the size of shared memory is
not a hack; it describes the number of integers, not bytes.

While we are at it sort the generated code to put preprocessor lines on
the top.
2019-12-20 22:51:21 -03:00
ReinUsesLisp 486c6a5316
gl_shader_cache: Remove unused entry in GetPrimitiveDescription 2019-12-20 22:49:30 -03:00
ReinUsesLisp af93909c9c
vk_shader_decompiler: Use Visit instead of reimplementing it
ExprCondCode visit implements the generic Visit. Use this instead of
that one.

As an intended side effect this fixes unwritten memory usages in cases
when a negation of a condition code is used.
2019-12-20 21:36:25 -03:00
ReinUsesLisp 38d3a48873
shader/p2r: Implement P2R Pr
P2R dumps predicate or condition codes state to a register. This is
useful for unit testing.
2019-12-20 18:02:41 -03:00
ReinUsesLisp cf27b59493
shader/r2p: Refactor P2R to support P2R 2019-12-20 17:55:42 -03:00
bunnei 7be65c6a68
Merge pull request #3234 from ReinUsesLisp/i2f-u8-selector
shader/conversion: Implement byte selector in I2F
2019-12-19 22:36:26 -05:00
bunnei 6d55b14cc0
Merge pull request #3233 from ReinUsesLisp/mismatch-sizes
shader/texture: Properly shrink unused entries in size mismatches
2019-12-19 20:40:27 -05:00
ReinUsesLisp e41da22c8d
vk_resource_manager: Add entry to VKFence to test its usage 2019-12-19 16:31:34 -03:00
ReinUsesLisp ec983a2451
vk_reosurce_manager: Add assert for releasing fences
Notify the programmer when a request to release a fence is invalid
because the fence is already free.
2019-12-19 16:31:34 -03:00
ReinUsesLisp 6ddffa010a
vk_resource_manager: Implement VKFenceWatch move constructor
This allows us to put VKFenceWatch inside a std::vector without storing
it in heap. On move we have to signal the fences where the new protected
resource is, adding some overhead.
2019-12-19 16:31:34 -03:00
ReinUsesLisp 54747d60bc
vk_device: Add entry to catch device losses
VK_NV_device_diagnostic_checkpoints allows us to push data to a Vulkan
queue and then query it even after a device loss. This allows us to push
the current pipeline object and see what was the call that killed the
device.
2019-12-19 16:31:33 -03:00
ReinUsesLisp 2a63b3bdb9
vk_shader_decompiler: Fix full decompilation
When full decompilation was enabled, labels were not being inserted and
instructions were misused. Fix these bugs.
2019-12-19 16:24:45 -03:00
ReinUsesLisp de918ebeb0
vk_shader_decompiler: Skip NDC correction when it is native
Avoid changing gl_Position when the NDC used by the game is [0, 1]
(Vulkan's native).
2019-12-19 16:24:45 -03:00
ReinUsesLisp 485c21eac3
vk_shader_decompiler: Normalize output fragment attachments
Some games write from fragment shaders to an unexistant framebuffer
attachment or they don't write to one when it exists in the framebuffer.
Fix this by skipping writes or adding zeroes.
2019-12-19 16:24:45 -03:00
bunnei 1eb4a95d2b
Merge pull request #3232 from ReinUsesLisp/gl-decompiler-images
gl_shader_decompiler: Add missing DeclareImages
2019-12-19 11:32:47 -05:00
bunnei 253aa52351
Merge pull request #3231 from ReinUsesLisp/tld4s-encoding
shader_bytecode: Fix TLD4S encoding
2019-12-19 11:32:25 -05:00
ReinUsesLisp f4a25f854c
vk_device: Add query for RGBA8Uint 2019-12-19 02:08:29 -03:00
ReinUsesLisp abb33d4aec
vk_shader_decompiler: Update sirit and implement Texture AOFFI 2019-12-19 01:42:13 -03:00
bunnei d53cf05513
Merge pull request #3221 from ReinUsesLisp/vk-scheduler
vk_scheduler: Delegate commands to a worker thread and state track
2019-12-18 22:04:08 -05:00
ReinUsesLisp da0aa4da6b
gl_rasterizer: Implement RASTERIZE_ENABLE
RASTERIZE_ENABLE is the opposite of GL_RASTERIZER_DISCARD. Implement it
naturally using this.

NVN games expect rasterize to be enabled by default, reflect that in our
initial GPU state.
2019-12-18 19:28:23 -03:00
ReinUsesLisp ae8d4b6c0c
shader/memory: Implement LDG.U8 and unaligned U8 loads
LDG can load single bytes instead of full integers or packs of integers.
These have the advantage of loading bytes that are not aligned to 4
bytes.

To emulate these this commit gets the byte being referenced (by doing
"address & 3" and then using that to extract the byte from the loaded
integer:

result = bitfieldExtract(loaded_integer, (address % 4) * 8, 8)
2019-12-18 01:21:46 -03:00
ReinUsesLisp a7d6bd1ef1
shader/conversion: Implement byte selector in I2F
I2F's byte selector is used to choose what bytes to convert to float.
e.g. if the input is 0xaabbccdd and the selector is ".B3" it will
convert 0xaa. The default (when it's not shown in nvdisasm) is ".B0", in
that example the default would convert 0xdd to float.
2019-12-18 00:41:22 -03:00
ReinUsesLisp 15a753b9a5
shader/texture: Properly shrink unused entries in size mismatches
When a image format mismatches we were inserting zeroes to the texture
itself. This was not handling cases were the mismatch uses less
coordinates than the guest shader code. Address that by resizing the
vector.
2019-12-17 23:38:10 -03:00
ReinUsesLisp e438079b50
gl_shader_decompiler: Add missing DeclareImages 2019-12-17 23:34:15 -03:00
ReinUsesLisp 8b26b4228b
shader_bytecode: Fix TLD4S encoding 2019-12-17 23:32:10 -03:00
ReinUsesLisp b52297767e
renderer_vulkan/shader: Add helper GLSL shaders
These shaders are used to specify code that is not dynamically generated
in the Vulkan backend. Instead of packing it inside the build system,
it's manually built and copied to the C++ file to avoid adding
unnecessary build time dependencies.

quad_array should be dropped in the future since it can be emulated with
a memory pool generated from the CPU.
2019-12-16 17:59:08 -03:00
bunnei 65b1b05e05
Merge pull request #3182 from ReinUsesLisp/renderer-opengl
renderer_opengl: Miscellaneous clean ups
2019-12-16 13:01:04 -05:00
ReinUsesLisp e09c1fbc1f
shader/texture: Implement TLD4.PTP 2019-12-16 04:09:24 -03:00
ReinUsesLisp 844e4a297b
shader/texture: Enable arrayed TLD4 2019-12-16 02:37:21 -03:00
ReinUsesLisp a87c85eba2
gl_shader_decompiler: Rename "sepparate" to "separate" 2019-12-16 02:12:51 -03:00
ReinUsesLisp 3d2c44848b
shader/texture: Implement AOFFI for TLD4S 2019-12-16 02:06:42 -03:00
ReinUsesLisp 3d9fff82c0
shader/texture: Remove unnecesary parenthesis 2019-12-16 01:52:33 -03:00
Rodrigo Locatti eac075692b
Merge pull request #3219 from FernandoS27/fix-bindless
Corrections and fixes to TLD4S & bindless samplers failing
2019-12-16 01:26:11 -03:00
bunnei 3d51153611
Merge pull request #3222 from ReinUsesLisp/maxwell-to-vk
maxwell_to_vk: Use VK_EXT_index_type_uint8 and misc changes
2019-12-14 22:30:12 -05:00
bunnei 035ec7d9de
Merge pull request #3213 from ReinUsesLisp/intel-mesa
gl_device: Enable compute shaders for Intel Mesa drivers
2019-12-14 16:04:31 -05:00
bunnei 2b650543c6
Merge pull request #3212 from ReinUsesLisp/fix-smem-lmem
gl_shader_cache: Add missing new-line on emitted GLSL
2019-12-13 21:35:29 -05:00
ReinUsesLisp e3ea583893
maxwell_to_vk: Improve image format table and add more formats
A1B5G5R5 uses A1R5G5B5. This is flipped with image view swizzles;
flushing is still not properly implemented on Vulkan for this particular
format.
2019-12-13 03:12:29 -03:00
ReinUsesLisp f27b21077d
maxwell_to_vk: Implement more vertex formats 2019-12-13 03:12:28 -03:00
ReinUsesLisp 8db8631d81
maxwell_to_vk: Implement more primitive topologies
Add an extra argument to query device capabilities in the future. The
intention behind this is to use native quads, quad strips, line loops
and polygons if these are released for Vulkan.
2019-12-13 03:12:28 -03:00
ReinUsesLisp 15513f0801
maxwell_to_vk: Approach GL_CLAMP closer to the GL spec
The OpenGL spec defines GL_CLAMP's formula similarly to CLAMP_TO_EDGE
and CLAMP_TO_BORDER depending on the filter mode used. It doesn't
exactly behave like this, but it's the closest we can get with what
Vulkan offers without emulating it by injecting shader code.
2019-12-13 03:12:28 -03:00
ReinUsesLisp f845df8651
maxwell_to_vk: Use VK_EXT_index_type_uint8 when available 2019-12-13 02:37:23 -03:00
ReinUsesLisp 2df9a2dcaf
vk_scheduler: Delegate commands to a worker thread and state track
Introduce a worker thread approach for delegating Vulkan work derived
from dxvk's approach. https://github.com/doitsujin/dxvk

Now that the scheduler is what handles all Vulkan work related to
command streaming, store state tracking in itself. This way we can know
when to reupload Vulkan dynamic state to the queue (since this one is
invalidated between command buffers unlike NVN). We can also store the
renderpass state and graphics pipeline bound to avoid redundant binds
and renderpass begins/ends.
2019-12-13 02:24:48 -03:00
bunnei 8fc49a83b6
Merge pull request #3217 from jhol/fix-boost-include
Added missing include
2019-12-11 22:21:24 -05:00
Fernando Sahmkow c0ee0aa1a8 Shader_IR: Correct TLD4S Depth Compare. 2019-12-11 19:53:17 -04:00
Fernando Sahmkow af89723fa3 Shader_Ir: Correct TLD4S encoding and implement f16 flag. 2019-12-11 19:53:17 -04:00
Fernando Sahmkow 84a158c977 Gl_Shader_compiler: Correct Depth Compare for Texture Gather operations. 2019-12-11 19:53:16 -04:00
Fernando Sahmkow 271a3264f3 Shader_Ir: default failed tracks on bindless samplers to null values. 2019-12-11 19:53:16 -04:00
Fernando Sahmkow 1d2ba3cc97 Gl_Rasterizer: Skip Tesselation Control and Eval stages as they are un implemented.
This commit ensures the OGL backend does not execute tesselation shader 
stages as they are currently unimplemented.
2019-12-11 15:41:26 -04:00
bunnei 1a66cde175
Merge pull request #3210 from ReinUsesLisp/memory-barrier
shader: Implement MEMBAR.GL
2019-12-11 14:24:39 -05:00
Joel Holdsworth e9faa1617c Added missing include 2019-12-11 18:11:49 +00:00
ReinUsesLisp f564eaebed
gl_device: Enable compute shaders for Intel Mesa drivers
Previously we naively checked for "Intel" in GL_VENDOR, but this
includes both Intel's proprietary driver and the mesa driver. Re-enable
compute shaders for mesa.
2019-12-11 00:00:30 -03:00
ReinUsesLisp 48e16c4c49
gl_shader_cache: Add missing new-line on emitted GLSL
Add missing new-line. This caused shaders using local memory and shared
memory to inject a preprocessor GLSL line after an expression (resulting
in invalid code).

It looked like this:
shared uint smem[8];#define LOCAL_MEMORY_SIZE 16

It should look like this (addressed by this commit):
shared uint smem[8];
\#define LOCAL_MEMORY_SIZE 16
2019-12-10 23:52:51 -03:00
Fernando Sahmkow 7ffb672f61 Maxwell3D: Implement Depth Mode.
This commit finishes adding depth mode that was reverted before due to
other unresolved issues.
2019-12-10 19:51:46 -04:00
ReinUsesLisp 425a254fa2
shader: Implement MEMBAR.GL
Implement using memoryBarrier in GLSL and OpMemoryBarrier on SPIR-V.
2019-12-10 16:45:03 -03:00
ReinUsesLisp 233ed96a5c
vk_shader_decompiler: Fix build issues on old gcc versions 2019-12-10 01:55:38 -03:00
ReinUsesLisp d30cf51d7d
vk_shader_decompiler: Reduce YNegate's severity 2019-12-09 23:52:28 -03:00
ReinUsesLisp 0b5b93053d
shader_ir/other: Implement S2R InvocationId 2019-12-09 23:52:28 -03:00
ReinUsesLisp ecbfa416f0
vk_shader_decompiler: Misc changes
Update Sirit and its usage in vk_shader_decompiler. Highlights:
- Implement tessellation shaders
- Implement geometry shaders
- Implement some missing features
- Use native half float instructions when available.
2019-12-09 23:51:57 -03:00
ReinUsesLisp 9ad6327fbd
shader: Keep track of shaders using warp instructions 2019-12-09 23:40:41 -03:00
ReinUsesLisp 6233b1db08
shader_ir/memory: Implement patch stores 2019-12-09 23:25:21 -03:00
ReinUsesLisp 19ce0d4f1a
vk_device: Misc changes
- Setup more features and requirements.
- Improve logging for missing features.
- Collect telemetry parameters.
- Add queries for more image formats.
- Query push constants limits.
- Optionally enable some extensions.
2019-12-09 01:04:48 -03:00
bunnei faf5ae6a50
Merge pull request #3198 from ReinUsesLisp/tessellation-maxwell
maxwell_3d: Add tessellation state entries
2019-12-08 22:28:25 -05:00
ReinUsesLisp 7ea362e134
externals: Update Vulkan-Headers 2019-12-08 22:08:19 -03:00
ReinUsesLisp f632d00eb1
vk_swapchain: Add support for swapping sRGB
We don't know until the game is running if it's using an sRGB color
space or not. Add support for hot-swapping swapchain surface formats.
2019-12-06 22:42:08 -03:00
ReinUsesLisp 36651f215a
maxwell_3d: Add tessellation tess level registers 2019-12-06 22:08:22 -03:00
ReinUsesLisp 707bf41c6f
maxwell_3d: Add tessellation mode register 2019-12-06 22:07:31 -03:00
ReinUsesLisp d2b50c5ebd
maxwell_3d: Add patch vertices register 2019-12-06 22:06:53 -03:00
ReinUsesLisp 74f515e8b6
shader_bytecode: Remove corrupted character 2019-12-06 20:31:56 -03:00
bunnei e36814d6d5
Merge pull request #3109 from FernandoS27/new-instr
Implement FLO & TXD Instructions on GPU Shaders
2019-12-06 18:18:16 -05:00
bunnei 3c1b6b5723
Merge pull request #2987 from FernandoS27/texture-invalid
Texture_Cache: Redo invalid Surfaces handling.
2019-12-02 12:07:05 -05:00
bunnei 930b7c18a6
Merge pull request #3184 from ReinUsesLisp/framebuffer-cache
gl_framebuffer_cache: Optimize framebuffer cache management
2019-11-30 18:46:40 -05:00
ReinUsesLisp ff64c3951a
texture_cache/surface_base: Fix out of bounds texture views
Some texture views were being created out of bounds (with more layers or
mipmaps than what the original texture has). This is because of a
miscalculation in mipmap bounding. end_layer and end_mipmap are out of
bounds (e.g. layer 6 in a cubemap), there's no need to add one more
there.

Fixes OpenGL errors and Vulkan crashes on Splatoon 2.
2019-11-29 16:51:14 -03:00
ReinUsesLisp fb6cf12a17
gl_framebuffer_cache: Optimize framebuffer key
Pack color attachment enumerations into a single u32. To determine the
number of buffers, the highest color attachment with a shared pointer
that doesn't point to null is used.
2019-11-28 23:02:20 -03:00
ReinUsesLisp c34da106ed
gl_rasterizer: Re-enable framebuffer cache for clear buffers 2019-11-28 23:02:20 -03:00
ReinUsesLisp e6a0a30334
renderer_opengl: Make ScreenRectVertex's constructor constexpr 2019-11-28 20:36:02 -03:00
ReinUsesLisp dee7844443
renderer_opengl: Remove C casts 2019-11-28 20:28:27 -03:00
ReinUsesLisp 3a44faff11
renderer_opengl: Use explicit binding for presentation shaders 2019-11-28 20:25:56 -03:00
ReinUsesLisp 75cc501d52
renderer_opengl: Drop macros for message decorations 2019-11-28 20:15:25 -03:00
ReinUsesLisp 056f049b26
renderer_opengl: Move static definitions to anonymous namespace 2019-11-28 20:14:40 -03:00
ReinUsesLisp 4589582eaf
renderer_opengl: Move commentaries to header file 2019-11-28 20:11:03 -03:00
bunnei e3ee017e91
Merge pull request #3169 from lioncash/memory
core/memory: Deglobalize memory management code
2019-11-28 11:43:17 -05:00
Rodrigo Locatti 913d0bb269
Merge pull request #3174 from lioncash/optional
video_core/gpu_thread: Tidy up SwapBuffers()
2019-11-27 20:35:31 -03:00
Lioncash aed6d8bef5 video_core/gpu_thread: Tidy up SwapBuffers()
We can just use std::nullopt and std::make_optional to make this a
little bit less noisy.
2019-11-27 17:46:11 -05:00
Lioncash 9403979c22 video_core/const_buffer_locker: Make use of std::tie in HasEqualKeys()
Tidies it up a little bit visually.
2019-11-27 05:53:43 -05:00
Lioncash 930e311526 video_core/const_buffer_locker: Remove unused includes 2019-11-27 05:51:13 -05:00
Lioncash 9341ca7979 video_core/const_buffer_locker: Remove #pragma once from cpp file
Silences a compiler warning.
2019-11-27 05:50:51 -05:00
Lioncash 849581075a core/memory: Migrate over RasterizerMarkRegionCached() to the Memory class
This is only used within the accelerated rasterizer in two places, so
this is also a very trivial migration.
2019-11-26 21:55:38 -05:00
Lioncash 3f08e8d8d4 core/memory: Migrate over GetPointer()
With all of the interfaces ready for migration, it's trivial to migrate
over GetPointer().
2019-11-26 21:55:38 -05:00
Lioncash 536fc7f0ea core: Prepare various classes for memory read/write migration
Amends a few interfaces to be able to handle the migration over to the
new Memory class by passing the class by reference as a function
parameter where necessary.

Notably, within the filesystem services, this eliminates two ReadBlock()
calls by using the helper functions of HLERequestContext to do that for
us.
2019-11-26 21:55:37 -05:00
bunnei 6df6caaf5f
Merge pull request #3143 from ReinUsesLisp/indexing-bug
gl_device: Deduce indexing bug from device instead of heuristic
2019-11-26 21:53:12 -05:00
ReinUsesLisp ef4446cb11
gl_shader_decompiler: Fix casts from fp32 to f16
Casts from f32 to f16 zeroes the higher half of the target register.
2019-11-25 22:22:33 -03:00
ReinUsesLisp 410d44ce05
gl_device: Deduce indexing bug from device instead of heuristic
The heuristic to detect AMD's driver was not working properly since it
also included Intel. Instead of using heuristics to detect it, compare
the GL_VENDOR string.
2019-11-25 16:15:22 -03:00
bunnei 2899c93818
Merge pull request #3158 from ReinUsesLisp/srgb-blit
gl_texture_cache: Apply sRGB on blits
2019-11-24 20:47:13 -05:00
bunnei 33a6b45a6c
Merge pull request #3155 from bunnei/fix-asynch-gpu-wait
gpu_thread: Don't spin wait if there are no GPU commands.
2019-11-24 20:19:25 -05:00
bunnei b03242067d
Merge pull request #3098 from ReinUsesLisp/shader-invalidations
gl_shader_cache: Miscellaneous changes to shaders
2019-11-24 19:36:30 -05:00
ReinUsesLisp 74fff717aa
gl_texture_cache: Apply sRGB on blits
glBlitFramebuffer keeps in mind GL_FRAMEBUFFER_SRGB's state. Enable this
depending on the target surface pixel format.
2019-11-24 18:13:33 -03:00
bunnei b7031b2b9d
Merge pull request #3105 from ReinUsesLisp/fix-stencil-reg
maxwell_3d: Fix stencil_back_func_mask offset
2019-11-24 13:53:23 -05:00
bunnei e81e0036b4
Merge pull request #3145 from ReinUsesLisp/buffer-cache-init
buffer_cache: Remove brace initialized for objects with default constructor
2019-11-24 02:55:02 -05:00
bunnei 9ec84fc592 gpu_thread: Don't spin wait if there are no GPU commands. 2019-11-23 15:17:28 -05:00
bunnei 4ed183ee42
Merge pull request #3141 from ReinUsesLisp/gl-position
gl_shader_gen: Apply default value to gl_Position
2019-11-23 13:23:46 -05:00
ReinUsesLisp dc2e83fa31
gl_device: Reserve base bindings on limited devices
SSBOs and other resources are limited per pipeline on Intel and AMD.
Heuristically reserve resources per stage having in mind the reported
OpenGL limits.
2019-11-22 21:28:50 -03:00
ReinUsesLisp e3d7334be9
gl_state: Skip null texture binds
glBindTextureUnit doesn't support null textures. Skip binding these.
2019-11-22 21:28:50 -03:00
ReinUsesLisp 919ac2c4d3
gl_rasterizer: Disable compute shaders on Intel
Intel's proprietary driver enters in a corrupt state when compute
shaders are executed. For now, disable these.
2019-11-22 21:28:50 -03:00
ReinUsesLisp 894ad74b87
gl_shader_cache: Hack shared memory size
The current shared memory size seems to be smaller than what the game
actually uses. This makes Nvidia's driver consistently blow up; in the
case of FE3H it made it explode on Qt's SwapBuffers while SDL2 worked
just fine. For now keep this hack since it's still progress over the
previous hardcoded shared memory size.
2019-11-22 21:28:49 -03:00
ReinUsesLisp e35b9597ef
gl_shader_decompiler: Normalize image bindings 2019-11-22 21:28:49 -03:00
ReinUsesLisp 36d9b409fc
gl_shader_decompiler: Normalize cbuf bindings
Stage and compute shaders were using a different binding counter.
Normalize these.
2019-11-22 21:28:49 -03:00
ReinUsesLisp f936b86c7c
gl_rasterizer: Add missing cbuf counter reset on compute 2019-11-22 21:28:49 -03:00
ReinUsesLisp 180417c514
gl_shader_cache: Remove dynamic BaseBinding specialization 2019-11-22 21:28:49 -03:00
ReinUsesLisp c8a48aacc0
video_core: Unify ProgramType and ShaderStage into ShaderType 2019-11-22 21:28:48 -03:00
ReinUsesLisp 0f23359a44
gl_rasterizer: Bind graphics images to draw commands
Images were not being bound to draw invocations because these would
require a cache invalidation.
2019-11-22 21:28:48 -03:00
ReinUsesLisp 287ae2b9e8
gl_shader_cache: Specialize local memory size for compute shaders
Local memory size in compute shaders was stubbed with an arbitary size.
This commit specializes local memory size from guest GPU parameters.
2019-11-22 21:28:48 -03:00
ReinUsesLisp dbeb523879
gl_shader_cache: Specialize shared memory size
Shared memory was being declared with an undefined size. Specialize from
guest GPU parameters the compute shader's shared memory size.
2019-11-22 21:28:47 -03:00
ReinUsesLisp 4f5d8e4342
gl_shader_cache: Specialize shader workgroup
Drop the usage of ARB_compute_variable_group_size and specialize compute
shaders instead. This permits compute to run on AMD and Intel
proprietary drivers.
2019-11-22 21:28:47 -03:00
ReinUsesLisp dc9961f341
shader/texture: Handle TLDS texture type mismatches
Some games like "Fire Emblem: Three Houses" bind 2D textures to offsets
used by instructions of 1D textures. To handle the discrepancy this
commit uses the the texture type from the binding and modifies the
emitted code IR to build a valid backend expression.

E.g.: Bound texture is 2D and instruction is 1D, the emitted IR samples
a 2D texture in the coordinate ivec2(X, 0).
2019-11-22 21:28:47 -03:00
ReinUsesLisp 32c1bc6a67
shader/texture: Deduce texture buffers from locker
Instead of specializing shaders to separate texture buffers from 1D
textures, use the locker to deduce them while they are being decoded.
2019-11-22 21:28:47 -03:00
ReinUsesLisp 73aaf365e7
buffer_cache: Remove brace initialized for objects with default constructor 2019-11-20 16:00:40 -03:00
Fernando Sahmkow cc81c0ce64 Texture_Cache: Redo invalid Surfaces handling.
This commit aims to redo the full setup of invalid textures and
guarantee correct behavior across backends in the case of finding one by
using black dummy textures that match the target of the expected
texture.
2019-11-20 14:59:35 -04:00
ReinUsesLisp 24f4198cee
shader/other: Reduce DEPBAR log severity
While DEPBAR is stubbed it doesn't change anything from our end. Shading
languages handle what this instruction does implicitly. We are not
getting anything out fo this log except noise.
2019-11-19 21:26:40 -03:00
ReinUsesLisp bc10714dcf
gl_shader_gen: Apply default value to gl_Position
Nvidia has sane default output values for varyings, but the other
vendors don't apply these. To properly emulate this we would have to
analyze the shader header. For the time being, apply the same default
Nvidia applies so we get the same behaviour on non-Nvidia drivers.
2019-11-19 20:32:01 -03:00
bunnei b0819e2ffb
Merge pull request #3086 from ReinUsesLisp/format-lookups
texture_cache: Use a flat table instead of switch for texture format lookups
2019-11-19 18:29:17 -05:00
Fernando Sahmkow c8473f399e Shader_IR: Address Feedback 2019-11-18 07:34:34 -04:00
bunnei a8295d2c53
Merge pull request #3047 from ReinUsesLisp/clip-control
gl_rasterizer: Emulate viewport flipping with ARB_clip_control
2019-11-15 12:09:19 -05:00
ReinUsesLisp 4681381a34
format_lookup_table: Address feedback
format_lookup_table: Drop bitfields

format_lookup_table: Use std::array for definition table

format_lookup_table: Include <limits> instead of <numeric>
2019-11-14 20:57:30 -03:00
ReinUsesLisp 80eacdf89b
texture_cache: Use a table instead of switch for texture formats
Use a large flat array to look up texture formats. This allows us to
properly implement formats with different component types. It should
also be faster.
2019-11-14 20:57:10 -03:00
ReinUsesLisp 48a1687f51
texture_cache: Drop abstracted ComponentType
Abstracted ComponentType was not being used in a meaningful way.
This commit drops its usage.

There is one place where it was being used to test compatibility between
two cached surfaces, but this one is implied in the pixel format.
Removing the component type test doesn't change the behaviour.
2019-11-14 18:21:42 -03:00
greggameplayer c6bc13d0aa correct the implementation of RGBA16UI 2019-11-14 21:37:39 +01:00
Fernando Sahmkow cd0f5dfc17 Shader_IR: Implement TXD instruction. 2019-11-14 11:15:27 -04:00
Fernando Sahmkow f3d1b370aa Shader_IR: Implement FLO instruction. 2019-11-14 11:15:27 -04:00
Fernando Sahmkow 95137a04e1 Shader_Bytecode: Add encodings for FLO, SHF and TXD 2019-11-14 11:15:26 -04:00
Fernando Sahmkow b6f6733131
Merge pull request #3081 from ReinUsesLisp/fswzadd-shuffles
shader: Implement FSWZADD and reimplement SHFL
2019-11-14 10:27:27 -04:00
ReinUsesLisp 7990220df7
maxwell_3d: Fix stencil_back_func_mask offset
stencil_back_func_mask and stencil_back_mask were misplaced. This commit
addresses that issue.
2019-11-13 16:35:17 -03:00
Rodrigo Locatti cf770a68a5
Merge pull request #3084 from ReinUsesLisp/cast-warnings
video_core: Treat implicit conversions as errors
2019-11-13 02:16:22 -03:00
Rodrigo Locatti fb9418798d
video_core: Enable sign conversion warnings
Enable sign conversion warnings but don't treat them as errors.
2019-11-11 18:00:37 -03:00
bunnei 0fc596de6e
Merge pull request #3082 from ReinUsesLisp/fix-lockers
gl_shader_cache: Fix locker constructors
2019-11-09 13:58:36 -05:00
ReinUsesLisp 18c1cb68fd video_core: Treat implicit conversions as errors 2019-11-08 22:49:39 +00:00
ReinUsesLisp 096f339a2a video_core: Silence implicit conversion warnings 2019-11-08 22:48:50 +00:00
bunnei a056d8de16
Merge pull request #3080 from FernandoS27/glsl-fix
GLSLDecompiler: Correct Texture Gather Offset.
2019-11-08 15:56:29 -05:00
ReinUsesLisp bfa973a62b
gl_shader_cache: Fix locker constructors
Properly pass engine when a shader is being constructed from memory.
2019-11-07 20:43:31 -03:00
ReinUsesLisp 3ab0514698
gl_shader_cache: Enable extensions only when available
Silence GLSL compilation warnings.
2019-11-07 20:08:42 -03:00
ReinUsesLisp cd66395944
gl_shader_decompiler: Add safe fallbacks when ARB_shader_ballot is not available 2019-11-07 20:08:42 -03:00
ReinUsesLisp 56e237d1f9
shader_ir/warp: Implement FSWZADD 2019-11-07 20:08:41 -03:00
ReinUsesLisp 08b2b1080a
gl_shader_decompiler: Reimplement shuffles with platform agnostic intrinsics 2019-11-07 20:08:41 -03:00
Fernando Sahmkow 3d7c284e0f GLSLDecompiler: Correct Texture Gather Offset.
This commit corrects the argument ordering in textureGatherOffset.
2019-11-07 11:43:56 -04:00
bunnei b6ae48966d
Merge pull request #3032 from ReinUsesLisp/simplify-control-flow-brx
shader/control_flow: Abstract repeated code chunks in BRX tracking
2019-11-07 01:30:01 -05:00
Morph 0e8a3bf3e5 buffer_cache: Add missing includes (#3079)
`boost::make_iterator_range` is available when `boost/range/iterator_range.hpp` is included.
Also include `boost/icl/interval_map.hpp` and `boost/icl/interval_set.hpp`.
2019-11-07 06:25:53 +00:00
bunnei 344d15f61e
Merge pull request #3070 from ReinUsesLisp/shader-warnings
shader_ir: Reduce severity of warnings
2019-11-07 00:47:24 -05:00
ReinUsesLisp e9d2fad984
gl_rasterizer: Remove front facing hack 2019-11-07 01:52:18 -03:00
ReinUsesLisp f1facaeaef
gl_shader_decompiler: Fix typo "y_negate"->"y_direction" 2019-11-07 01:52:18 -03:00
ReinUsesLisp e2ea0c3e11
gl_shader_manager: Remove unused variable in SetFromRegs 2019-11-07 01:52:18 -03:00
ReinUsesLisp f019817f8f
gl_rasterizer: Emulate viewport flipping with ARB_clip_control
Emulates negative y viewports with ARB_clip_control. This allows us to
more easily emulated pipelines with tessellation and/or geometry shader
stages. It also avoids corrupting games with transform feedbacks and
negative viewports (gl_Position.y was being modified).
2019-11-07 01:52:18 -03:00
Rodrigo Locatti ff5a0f370c
shader/control_flow: Specify constness on caller lambdas
Update src/video_core/shader/control_flow.cpp

Co-Authored-By: Mat M. <mathew1800@gmail.com>

Update src/video_core/shader/control_flow.cpp

Co-Authored-By: Mat M. <mathew1800@gmail.com>

Update src/video_core/shader/control_flow.cpp

Co-Authored-By: Mat M. <mathew1800@gmail.com>

Update src/video_core/shader/control_flow.cpp

Co-Authored-By: Mat M. <mathew1800@gmail.com>

Update src/video_core/shader/control_flow.cpp

Co-Authored-By: Mat M. <mathew1800@gmail.com>

Update src/video_core/shader/control_flow.cpp

Co-Authored-By: Mat M. <mathew1800@gmail.com>
2019-11-07 01:44:09 -03:00
ReinUsesLisp 7b069252f8
shader/control_flow: Use callable template instead of std::function 2019-11-07 01:44:08 -03:00
ReinUsesLisp 46c3047283
shader/control_flow: Abstract repeated code chunks in BRX tracking
Remove copied and pasted for cycles into a common templated function.
2019-11-07 01:44:08 -03:00
ReinUsesLisp ae7dfa93be
shader/control_flow: Silence Intellisense cast warnings 2019-11-07 01:44:08 -03:00
ReinUsesLisp deb1b54eed
shader/control_flow: Remove brace initializer in std containers
These containers have a default constructor.
2019-11-07 01:44:08 -03:00
ReinUsesLisp 39c66abd91
shader/decode: Reduce severity of arithmetic rounding warnings 2019-11-07 01:43:38 -03:00
ReinUsesLisp c4374d0d41
shader/arithmetic: Reduce RRO stub severity 2019-11-07 01:43:38 -03:00
ReinUsesLisp 35d40b74b3
shader/texture: Remove NODEP warnings
These warnings don't offer meaningful information while decoding
shaders. Remove them.
2019-11-07 01:43:38 -03:00
bunnei 468576284d
Merge pull request #3057 from ReinUsesLisp/buffer-sub-data
gl_rasterizer: Upload constant buffers with glNamedBufferSubData
2019-11-06 10:08:55 -05:00
Rodrigo Locatti 654b77d2ec
Merge pull request #3039 from ReinUsesLisp/cleanup-samplers
shader/node: Unpack bindless texture encoding
2019-11-06 04:54:11 +00:00
bunnei 21e07df7b7
Merge pull request #2914 from FernandoS27/fermi-fix
Fermi2D: limit blit area to only available area
2019-11-05 20:45:24 -05:00
bunnei 1bdae0fe29 common_func: Use std::array for INSERT_PADDING_* macros.
- Zero initialization here is useful for determinism.
2019-11-03 22:22:41 -05:00
ReinUsesLisp 442a1cc021
gl_rasterizer: Re-enable stream buffer memory due to global memory
Global memory is still using the stream buffer when it shouldn't. As a
temporary fix re-enable the stream buffer on compute.
2019-11-02 13:19:19 -03:00
ReinUsesLisp 76ca2a5f82
gl_rasterizer: Upload constant buffers with glNamedBufferSubData
Nvidia's OpenGL driver maps gl(Named)BufferSubData with some requirements
to a fast. This path has an extra memcpy but updates the buffer without
orphaning or waiting for previous calls. It can be seen as a better
model for "push constants" that can upload a whole UBO instead of 256
bytes.

This path has some requirements established here:
http://on-demand.gputechconf.com/gtc/2014/presentations/S4379-opengl-44-scene-rendering-techniques.pdf#page=24

Instead of using the stream buffer, this commits moves constant buffers
uploads to calls of glNamedBufferSubData and from my testing it brings a
performance improvement. This is disabled when the vendor is not Nvidia
since it brings performance regressions.
2019-11-02 05:05:34 -03:00
Fernando Sahmkow 23cabc98db Shader_IR: Fix regression on TLD4
Originally on the last commit I thought TLD4 acted the same as TLD4S and 
didn't have a mask. It actually does have a component mask. This commit 
corrects that.
2019-10-30 21:14:57 -04:00
Rodrigo Locatti 658489ebf7
Merge pull request #3050 from FernandoS27/fix-tld4
shader_ir: Fix TLD4 and add bindless variant
2019-10-30 18:37:17 +00:00
Fernando Sahmkow 9293c3a0f2 Shader_IR: Fix TLD4 and add Bindless Variant.
This commit fixes an issue where not all 4 results of tld4 were being
written, the color component was defaulted to red, among other things.
It also implements the bindless variant.
2019-10-30 12:02:03 -04:00
bunnei 2382bbe3ac
Merge pull request #3046 from ReinUsesLisp/clean-gl-state
gl_state: Miscellaneous clean up
2019-10-29 22:50:04 -04:00
bunnei b5138f3c35
Merge pull request #3035 from ReinUsesLisp/rasterizer-accelerated
rasterizer_accelerated: Add intermediary for GPU rasterizers
2019-10-29 22:06:41 -04:00
Rodrigo Locatti 3d0cde6a75
gl_state: Use std::array::fill instead of std::fill
Co-Authored-By: Mat M. <mathew1800@gmail.com>
2019-10-30 01:30:31 +00:00
ReinUsesLisp ce20ed8e4e
gl_state: Move dirty checks to individual apply calls instead of Apply
This requires removing constness from some methods, but for consistency
it's removed in all methods.
2019-10-29 21:27:25 -03:00
ReinUsesLisp 3c6557c235
gl_state: Remove ApplyDefaultState
OpenGL has defaults values we can trust. Remove these.
2019-10-29 21:27:25 -03:00
ReinUsesLisp d3651b0b82
gl_state: Change SetDefaultViewports to use default constructor 2019-10-29 21:27:24 -03:00
ReinUsesLisp c7698d0bc8
gl_state: Minor style changes 2019-10-29 21:27:24 -03:00
ReinUsesLisp a14d202ac2
gl_state: Remove unused Citra TextureUnits 2019-10-29 21:27:24 -03:00
ReinUsesLisp 28fece8e9b
gl_state: Move initializers from constructor to class declaration 2019-10-29 21:27:23 -03:00
ReinUsesLisp a993df1ee2
shader/node: Unpack bindless texture encoding
Bindless textures were using u64 to pack the buffer and offset from
where they come from. Drop this in favor of separated entries in the
struct.

Remove the usage of std::set in favor of std::list (it's not std::vector
to avoid reference invalidations) for samplers and images.
2019-10-29 20:53:48 -03:00
Rodrigo Locatti 2ec5b55ee3
Merge pull request #3004 from ReinUsesLisp/maxwell3d-cleanup
maxwell_3d: Remove unused entries
2019-10-29 23:46:33 +00:00
Rodrigo Locatti c5d9589942
Merge pull request #3037 from FernandoS27/new-formats
video_core: Implement texture format E5B9G9R9_SHAREDEXP.
2019-10-28 01:36:58 -03:00
ReinUsesLisp fa31e5b868
maxwell_3d/kepler_compute: Remove unused arguments in GetTexture 2019-10-28 00:23:42 -03:00
ReinUsesLisp 538ddd220e
video_core/textures: Remove unused index entry in FullTextureInfo 2019-10-28 00:14:38 -03:00
ReinUsesLisp 961fe4d19b
maxwell_3d: Remove unused method GetStageTextures 2019-10-28 00:14:29 -03:00
Fernando Sahmkow 3f9262195b Video_Core: Implement texture format E5B9G9R9_SHAREDEXP.
This commit implements the E5B9G9R9 Texture format into the general 
system and OpenGL backend.
2019-10-27 16:44:09 -04:00
bunnei 6909b2f0f9
Merge pull request #3034 from ReinUsesLisp/w4244-maxwell3d
maxwell_3d: Silence implicit conversion warnings
2019-10-27 15:08:59 -04:00
ReinUsesLisp 3e469cecc1
maxwell_3d: Silence implicit conversion warnings
While we are at it, unify types for dirty reg pointers.
2019-10-27 15:22:17 -03:00
ReinUsesLisp bd2aff3e26
rasterizer_accelerated: Add intermediary for GPU rasterizers
Add an intermediary class that implements common functions across GPU
accelerated rasterizers. This avoids code repetition on different
backends.
2019-10-27 03:40:08 -03:00
ReinUsesLisp a5aa1bb174
astc: Silence implicit conversion warnings 2019-10-27 03:04:50 -03:00
Rodrigo Locatti 26f3e18c5c
Merge pull request #2976 from FernandoS27/cache-fast-brx-rebased
Implement Fast BRX, fix TXQ and addapt the Shader Cache for it
2019-10-26 16:56:13 -03:00
Fernando Sahmkow be856a38d6 Shader_IR: Address Feedback. 2019-10-26 15:38:30 -04:00
Rodrigo Locatti a0d79085c4
Merge pull request #3027 from lioncash/lookup
shader_ir: Use std::array with std::pair instead of std::unordered_map
2019-10-26 05:49:15 -03:00
Rodrigo Locatti d52598173d
Merge pull request #3013 from FernandoS27/tld4s-fix
Shader_Ir: Fix TLD4S from using a component mask.
2019-10-25 20:06:26 -03:00
Fernando Sahmkow e3afd6595a Shader_IR: Clang format 2019-10-25 09:01:32 -04:00
ReinUsesLisp 78f3e8a757 gl_shader_cache: Implement locker variants invalidation 2019-10-25 09:01:32 -04:00
ReinUsesLisp ec85648af3 gl_shader_disk_cache: Store and load fast BRX 2019-10-25 09:01:31 -04:00
ReinUsesLisp fa2c297f3e const_buffer_locker: Minor style changes 2019-10-25 09:01:31 -04:00
ReinUsesLisp 7b81ba4d8a gl_shader_decompiler: Move entries to a separate function 2019-10-25 09:01:31 -04:00
Fernando Sahmkow 1244f2d368 Shader_IR: Implement Fast BRX and allow multi-branches in the CFG. 2019-10-25 09:01:31 -04:00
Fernando Sahmkow a05120ec0b Shader_IR: Correct typo in Consistent method. 2019-10-25 09:01:30 -04:00
Fernando Sahmkow 33fcec3502 Shader_IR: allow lookup of texture samplers within the shader_ir for instructions that don't provide it 2019-10-25 09:01:30 -04:00
Fernando Sahmkow 8909f52166 Shader_IR: Implement Fast BRX and allow multi-branches in the CFG. 2019-10-25 09:01:30 -04:00
Fernando Sahmkow acd6441134 Shader_Cache: setup connection of ConstBufferLocker 2019-10-25 09:01:29 -04:00
Fernando Sahmkow 1a58f45d76 VideoCore: Unify const buffer accessing along engines and provide ConstBufferLocker class to shaders. 2019-10-25 09:01:29 -04:00
Fernando Sahmkow 2ef696c85a Shader_IR: Implement BRX tracking. 2019-10-25 09:01:29 -04:00
Rodrigo Locatti 5062728669
Merge pull request #3028 from lioncash/constexpr
shader_bytecode: Make Matcher constexpr capable
2019-10-24 15:10:40 -03:00
Lioncash 7fdf991097 shader_bytecode: Make Matcher constexpr capable
Greatly shrinks the amount of generated code for GetDecodeTable().

Collapses an assembly output of 9000+ lines down to ~3621 with Clang,
and 6513 down to ~2616 with GCC, given it's now allowed to construct all
the entries as a sequence of constant data.
2019-10-24 01:10:10 -04:00
Lioncash 382717172e shader_ir: Use std::array with pair instead of unordered_map
Given the overall size of the maps are very small, we can use arrays of
pairs here instead of always heap allocating a new map every time the
functions are called. Given the small size of the maps, the difference
in container lookups are negligible, especially given the entries are
already sorted.
2019-10-24 00:25:38 -04:00
Lioncash 1f5401c89c video_core/shader: Resolve instances of variable shadowing
Silences a few -Wshadow warnings.
2019-10-23 23:00:31 -04:00
Fernando Sahmkow c4a0aa9207
Merge pull request #2995 from ReinUsesLisp/ignore-gmem
shader_ir/memory: Ignore global memory when tracking fails
2019-10-22 13:22:43 -04:00
Fernando Sahmkow 7ecf9f7228
Merge pull request #2983 from lioncash/fallthrough
gl_shader_decompiler/vk_shader_decompiler: Resolve implicit fallthrough cases
2019-10-22 13:16:46 -04:00
Fernando Sahmkow 1509d2ffbd Shader_Ir: Fix TLD4S from using a component mask.
TLD4S always outputs 4 values, the previous code checked a component 
mask and omitted those values that weren't part of it. This commit 
corrects that and makes sure all 4 values are set.
2019-10-22 10:59:07 -04:00
ReinUsesLisp 1ea07954fb shader_ir/memory: Ignore global memory when tracking fails
Ignore global memory operations instead of invoking undefined behaviour
when constant buffer tracking fails and we are blasting through asserts,
ignore the operation.

In the case of LDG this means filling the destination registers with
zeroes; for STG this means ignore the instruction as a whole.

The default behaviour is still to abort execution on failure.
2019-10-22 02:49:17 -03:00
ReinUsesLisp e3107788e6
maxwell_3d: Reduce FlushMMEInlineDraw logging to Trace 2019-10-20 03:43:17 -03:00
Rodrigo Locatti dc5eedef71
Merge pull request #2994 from lioncash/fmt
video_core/shader/ast: Minor changes to ASTPrinter
2019-10-18 01:05:25 -03:00
Lioncash 074b38b7a9 video_core/shader/ast: Make ShowCurrentState() and SanityCheck() const member functions
These can also trivially be made const member functions, with the
addition of a few consts.
2019-10-17 20:59:48 -04:00
Lioncash 222f4b45eb video_core/shader/ast: Make ASTManager::Print a const member function
Given all visiting functions never modify the nodes, we can trivially
make this a const member function.
2019-10-17 20:56:39 -04:00
Rodrigo Locatti fd922ddb01
Merge pull request #2993 from lioncash/vulkan-expr
vk_shader_decompiler: Mark operator() function parameters as const references
2019-10-17 21:46:49 -03:00
Lioncash 7831e86c34 video_core/shader/ast: Make ExprPrinter members private
This member already has an accessor, so there's no need for it to be
public.
2019-10-17 20:39:36 -04:00
Lioncash a2eccbf075 video_core/shader/ast: Make Indent() return a string_view
The returned string is simply a substring of our constexpr tabs
string_view, so we can just use a string_view here as well, since the
original string_view is guaranteed to always exist.

Now the function is fully non-allocating.
2019-10-17 20:29:00 -04:00
Lioncash 15d177a6ac video_core/shader/ast: Make Indent() private
It's never used outside of this class, so we can narrow its scope down.
2019-10-17 20:26:13 -04:00
Lioncash 7f6a8a33d4 video_core/shader/ast: Rename Ident() to Indent()
This can be confusing, given "ident" is generally used as a shorthand
for "identifier".
2019-10-17 20:26:13 -04:00
Lioncash 081530686c video_core/shader/ast: Make use of fmt where applicable
Makes a few strings nicer to read and also eliminates a bit of string
churn with operator+.
2019-10-17 20:26:10 -04:00
Lioncash c6bec9aa10 vk_shader_decompiler: Mark operator() function parameters as const references
These parameters aren't actually modified in any way, so they can be
made const references.
2019-10-17 19:44:00 -04:00
Rodrigo Locatti 219fdcb9d9
Merge pull request #2966 from FernandoS27/astc-formats
Implement a series of ASTC formats and R4G4B4A4 format
2019-10-17 19:24:11 -03:00
Rodrigo Locatti a21b88ef8f
Merge pull request #2979 from lioncash/macro
video_core/macro_interpreter: Make definitions of most private enums/unions hidden
2019-10-17 19:21:09 -03:00
Fernando Sahmkow c0eb1aecfd Fermi2D: Use a different formula for delimiting blit areas. 2019-10-17 18:21:01 -04:00
Lioncash 125caf5d6e video_core/macro_interpreter: Make definitions of most private enums/unions hidden
This allows the implementation of these types to change without
requiring a rebuild of everything that includes the macro interpreter
header.
2019-10-17 17:55:46 -04:00
bunnei 9fe8072c67
Merge pull request #2980 from lioncash/warn
maxwell_3d: Silence truncation warnings
2019-10-17 14:02:16 -04:00
Fernando Sahmkow 57a46c69f1 Fermi2D: limit blit area to only available area
Normaly OpenGL does not care if the areas exceed the texture regions but
other backends such as Vulkan do care about the limits of this areas.
This PR crops the areas of the blit in order that they don't surpass the
limits of the textures. This should help Vulkan and faulty OpenGL
drivers
2019-10-17 10:38:44 -04:00
Rodrigo Locatti 60c602e4e7
Merge pull request #2978 from lioncash/doxygen
video_core/texture_cache: Amend Doxygen references
2019-10-16 22:09:40 -03:00
Rodrigo Locatti e00b529a89
Merge pull request #2982 from lioncash/surface
texture_cache: Avoid unnecessary surface copies within PickStrategy() and TryReconstructSurface()
2019-10-16 19:43:32 -03:00
bunnei ef9b31783d
Merge pull request #2912 from FernandoS27/async-fixes
General fixes to Async GPU
2019-10-16 10:34:48 -04:00
Rodrigo Locatti 60315060b1
Merge pull request #2984 from lioncash/fallthrough2
video_core/surface: Add missing break in PixelFormatFromTextureFormat()
2019-10-15 23:08:34 -03:00
Lioncash cf9e13c255 video_core/surface: Add missing break in PixelFormatFromTextureFormat()
Prevents fallthrough into the following case.
2019-10-15 21:53:15 -04:00
Rodrigo Locatti 14f3cebcd4
Merge pull request #2981 from lioncash/copy
gl_shader_decompiler: Minor cleanup-related changes
2019-10-15 21:07:25 -03:00
Lioncash 6947bf8e44 vk_shader_decompiler: Resolve fallthrough within ExprDecompiler's ExprCondCode operator()
This would previously result in NeverExecute and UnusedIndex being
treated as regular predicates.
2019-10-15 19:40:58 -04:00
Lioncash b42a74ff2c gl_shader_decompiler: Resolve fallthrough within ExprDecompiler's ExprCondCode operator()
This would previously result in NeverExecute and UnusedIndex being
treated as regular predicates.
2019-10-15 19:38:55 -04:00
Lioncash a24e8bf9cf texture_cache: Avoid unnecessary surface copies within PickStrategy() and TryReconstructSurface()
We can take these by const reference and avoid making unnecessary
copies, preventing some atomic reference count increments and
decrements.
2019-10-15 19:31:33 -04:00
Lioncash 77b4916b33 control_flow: Silence truncation warnings
This can be trivially fixed by making the input size a size_t.
CFGRebuildState's constructor parameter is already a std::size_t, so
this just makes the size type fully conform with it.
2019-10-15 19:10:28 -04:00
Lioncash 4f16ce9294 gl_shader_decompiler: Make ExprDecompiler's GetResult() a const member function
This is only ever used to read, but not write, the resulting string, so
we can enforce this by making it a const member function.
2019-10-15 19:02:59 -04:00
Lioncash 67df3f7742 gl_shader_decompiler: Use a std::string_view with GetDeclarationWithSuffix()
This allows the function to be completely non-allocating for inputs of
all sizes (i.e. there's no heap cost for an input to convert to a
std::string_view).
2019-10-15 19:00:48 -04:00
Lioncash 04a1161354 gl_shader_decompiler: Fold flow_var constant into GetFlowVariable()
This is only ever used within this function, so we can narrow it's scope
down.
2019-10-15 18:58:36 -04:00
Lioncash 2f2ab9b5bc gl_shader_decompiler: Mark ASTDecompiler/ExprDecompiler parameters as const references where applicable
These member functions don't actually modify the input parameter, so we
can make this explicit with the use of const.
2019-10-15 18:57:02 -04:00
Lioncash b8a62adcf1 gl_shader_decompiler: Pass by reference to GenerateTextureArgument()
Avoids an unnecessary atomic reference count increment and decrement.
2019-10-15 18:29:37 -04:00
Lioncash d1d7ce74d2 gl_shader_decompiler: Use std::holds_alternative within GenerateTexture()
This only ever queries if the type exists within the variant, but
doesn't actually do anything with the return value. We can just use
std::holds_alternative for this use case.
2019-10-15 18:25:48 -04:00
Lioncash 67658dd6e8 shader/node: std::move Meta instance within OperationNode constructor
Allows usages of the constructor to avoid an unnecessary copy.
2019-10-15 18:21:59 -04:00
Lioncash 9760795bfb gl_shader_decompiler: Avoid unnecessary copies of MetaImage
MetaImage contains a std::vector, so copying here could result in
unnecessary reallocations. Given the operation lives throughout the
entire scope, this is safe to do.
2019-10-15 18:14:55 -04:00
Lioncash c9c75f9587 maxwell_3d: Silence truncation warnings
A trivial warning caused by not using size_t as the argument types
instead of u32.
2019-10-15 17:51:35 -04:00
bunnei 2299950de1
Merge pull request #2972 from lioncash/system
{bcat, gpu, nvflinger}: Remove trivial usages of the global system accessor
2019-10-15 17:49:12 -04:00
Lioncash b25b94400e video_core/gpu: Remove use of the global system accessor
We can just make use of the reference member variable instead of
accessing the global system instance.
2019-10-15 16:39:30 -04:00
Lioncash 524eb15513 video_core/texture_cache: Amend Doxygen references
Amends the doxygen comments so that they properly resolve. While we're
at it, we can correct some typos and fix up some of the comments'
formatting in order to make them slightly nicer to read.
2019-10-15 15:40:00 -04:00
Lioncash ac4dbd3b25 common: Rename binary_find.h to algorithm.h
Makes the header more general for other potential algorithms in the
future. While we're at it, include a missing <functional> include to
satisfy the use of std::less.
2019-10-15 15:24:50 -04:00
Fernando Sahmkow cfc2f30dc4 AsyncGpu: Address Feedback 2019-10-11 13:41:15 -04:00
bunnei 2ba273e49e
Merge pull request #2928 from ReinUsesLisp/dirty-depth-bounds
maxwell_3d: Add dirty flags for depth bounds values
2019-10-09 15:44:30 -04:00
bunnei 6b5e50d20e
Merge pull request #2927 from ReinUsesLisp/polygon-offset-units
gl_rasterizer: Fix polygon offset units
2019-10-09 15:38:52 -04:00
Fernando Sahmkow f32a49d3d8 Surfaces: Implement R4G4B4A4U format. 2019-10-09 12:57:02 -04:00
Fernando Sahmkow b9ddb517b1 Surfaces: Implement ASTC 6x6 10x10 12x12 8x6 6x5 2019-10-09 12:44:31 -04:00
ReinUsesLisp 3d0f357307
shader/half_set_predicate: Fix HSETP2 for constant buffers
HSETP2 when used with a constant buffer parses the second operand type
as F32. This is not configurable.
2019-10-07 14:49:47 -03:00
ReinUsesLisp 632c9e4ee3
shader/half_set_predicate: Reduce DEBUG_ASSERT to LOG_DEBUG 2019-10-07 14:48:58 -03:00
ReinUsesLisp 58b597c5ec
gl_shader_disk_cache: Properly ignore existing cache
Previously old entries where appended to the file even if the shader
cache was ignored at boot. Address that issue.
2019-10-06 18:00:20 -03:00
Lioncash f883cd4f0e video_core/control_flow: Eliminate variable shadowing warnings 2019-10-05 09:14:27 -04:00
Lioncash 25702b6256 video_core/control_flow: Eliminate pessimizing moves
These can inhibit the ability of a compiler to perform RVO.
2019-10-05 09:14:27 -04:00
Lioncash d82b181d44 video_core/ast: Unindent most of IsFullyDecompiled() by one level 2019-10-05 09:14:27 -04:00
Lioncash 6c41d1cd7e video_core/ast: Make ShowCurrentState() take a string_view instead of std::string
Allows the function to be non-allocating in terms of the output string.
2019-10-05 09:14:27 -04:00
Lioncash 3c54edae24 video_core/ast: Eliminate variable shadowing warnings 2019-10-05 09:14:26 -04:00
Lioncash 5a0a9c7449 video_core/ast: Replace std::string with a constexpr std::string_view
Same behavior, but without the need to heap allocate
2019-10-05 09:14:26 -04:00
Lioncash 3a20d9734f video_core/ast: Default the move constructor and assignment operator
This is behaviorally equivalent and also fixes a bug where some members
weren't being moved over.
2019-10-05 09:14:26 -04:00
Lioncash 43503a69bf video_core/{ast, expr}: Organize forward declaration
Keeps them alphabetically sorted for readability.
2019-10-05 09:14:26 -04:00
Lioncash 50ad745585 video_core/expr: Supply operator!= along with operator==
Provides logical symmetry to the interface.
2019-10-05 09:14:26 -04:00
Lioncash 8eb1398f8d video_core/{ast, expr}: Use std::move where applicable
Avoids unnecessary atomic reference count increments and decrements.
2019-10-05 09:14:23 -04:00
Lioncash 8e0c80f269 video_core/ast: Supply const accessors for data where applicable
Provides const equivalents of data accessors for use within const
contexts.
2019-10-05 08:22:03 -04:00
David 3728bbc22a
Merge pull request #2888 from FernandoS27/decompiler2
Shader_IR: Implement a full control flow decompiler for the shader IR.
2019-10-05 21:52:20 +10:00
ReinUsesLisp fe7f20e659 maxwell_3d: Add dirty flags for depth bounds values
This is useful in Vulkan where we want to update depth bounds without
caring if it's enabled or disabled through vkCmdSetDepthBounds.
2019-10-05 04:07:47 +00:00
Fernando Sahmkow 538f5880ff GL_Renderer: Remove lefting snippet. 2019-10-04 19:59:55 -04:00
Fernando Sahmkow 9f2719d1a4 Gl_Rasterizer: Protect CPU Memory mapping from multiple threads. 2019-10-04 19:59:53 -04:00
Fernando Sahmkow 3f104464de Core: Wait for GPU to be idle before shutting down. 2019-10-04 19:59:53 -04:00
Fernando Sahmkow ffc2ce89a0 Nvdrv: Do framelimiting only in the CPU Thread 2019-10-04 19:59:50 -04:00
Fernando Sahmkow 5b5e60ffec GPU_Async: Correct fences, display events and more.
This commit uses guest fences on vSync event instead of an articial fake 
fence we had.
It also corrects to keep signaling display events while loading the game 
as the OS is suppose to send buffers to vSync during that time.
2019-10-04 19:59:48 -04:00
Fernando Sahmkow ab47a660c8 Texture_Cache: Blit Deduction corrections and simplifications. 2019-10-04 18:53:47 -04:00
Fernando Sahmkow 2036504a82 TextureCache: Add the ability to deduce if two textures are depth on blit. 2019-10-04 18:53:46 -04:00
Fernando Sahmkow e6eae4b815 Shader_ir: Address feedback 2019-10-04 18:52:57 -04:00
Fernando Sahmkow 3c09d9abe6 Shader_Ir: Address Feedback and clang format. 2019-10-04 18:52:57 -04:00
Fernando Sahmkow 507a9c6a40 vk_shader_decompiler: Correct Branches inside conditionals. 2019-10-04 18:52:56 -04:00
Fernando Sahmkow 000ad558dd vk_shader_decompiler: Clean code and be const correct. 2019-10-04 18:52:55 -04:00
Fernando Sahmkow 7c756baa77 Shader_IR: clean up AST handling and add documentation. 2019-10-04 18:52:55 -04:00
Fernando Sahmkow 5ea740beb5 Shader_IR: Correct OutwardMoves for Ifs 2019-10-04 18:52:54 -04:00
Fernando Sahmkow 100a4bd988 vk_shader_compiler: Don't enclose branches with if(true) to avoid crashing AMD 2019-10-04 18:52:54 -04:00
Fernando Sahmkow 189a50bc2a gl_shader_decompiler: Refactor and address feedback. 2019-10-04 18:52:53 -04:00
Fernando Sahmkow b3c46d6948 Shader_IR: corrections and clang-format 2019-10-04 18:52:53 -04:00
Fernando Sahmkow 466cd52ad4 vk_shader_compiler: Correct SPIR-V AST Decompiling 2019-10-04 18:52:52 -04:00
Fernando Sahmkow 2e9a810423 Shader_IR: allow else derivation to be optional. 2019-10-04 18:52:52 -04:00
Fernando Sahmkow ca9901867e vk_shader_compiler: Implement the decompiler in SPIR-V 2019-10-04 18:52:51 -04:00
Fernando Sahmkow 0366c18d87 Shader_IR: mark labels as unused for partial decompile. 2019-10-04 18:52:51 -04:00
Fernando Sahmkow 47e4f6a52c Shader_Ir: Refactor Decompilation process and allow multiple decompilation modes. 2019-10-04 18:52:50 -04:00
Fernando Sahmkow 38fc995f6c gl_shader_decompiler: Implement AST decompiling 2019-10-04 18:52:50 -04:00
Fernando Sahmkow 6fdd501113 shader_ir: Declare Manager and pass it to appropiate programs. 2019-10-04 18:52:49 -04:00
Fernando Sahmkow 8be6e1c522 shader_ir: Corrections to outward movements and misc stuffs 2019-10-04 18:52:48 -04:00
Fernando Sahmkow 4fde66e609 shader_ir: Add basic goto elimination 2019-10-04 18:52:48 -04:00
Fernando Sahmkow c17953978b shader_ir: Initial Decompile Setup 2019-10-04 18:52:47 -04:00
ReinUsesLisp 69c806feb6
gl_rasterizer: Fix polygon offset units
For some reason hardware divides polygon offset units by two. This is
visible since drivers multiply the application requested polygon offset
by two.
2019-10-01 02:00:23 -03:00
ReinUsesLisp f926230ab1
gl_shader_decompiler: Add tailing return for HUnpack2 2019-09-24 01:03:59 -03:00
ReinUsesLisp 25bfaffdff
gl_shader_decompiler: Fix clang build issues 2019-09-24 01:03:27 -03:00
bunnei 376f1a4432
Merge pull request #2869 from ReinUsesLisp/suld
shader/image: Implement SULD and fix SUATOM
2019-09-23 21:47:03 -04:00
David 9d69206cd0
Merge pull request #2870 from FernandoS27/multi-draw
Implement a MME Draw commands Inliner and correct host instance drawing
2019-09-22 23:13:02 +10:00
Fernando Sahmkow 822ca65d69
Merge pull request #2891 from FearlessTobi/rod-tex
video_core: Implement RGBX16F and lower Surface Copy log severity
2019-09-22 09:11:28 -04:00
David 3bfba23362
Merge pull request #2867 from ReinUsesLisp/configure-framebuffers-clean
gl_rasterizer: Remove unused code paths from ConfigureFramebuffers
2019-09-22 23:10:07 +10:00
Fernando Sahmkow 68f5aff64f Maxwell3D: Corrections and refactors to MME instance refactor 2019-09-22 07:23:13 -04:00
FearlessTobi 01fc969a5f Fix clang-format 2019-09-22 02:21:56 +02:00
FearlessTobi 366e900376 fermi_2d: Lower surface copy log severity to DEBUG 2019-09-22 02:18:57 +02:00
FearlessTobi 55d272efe6 video_core: Implement RGBX16F PixelFormat 2019-09-22 02:16:44 +02:00
Rodrigo Locatti 9286976948
Merge pull request #2878 from FernandoS27/icmp
shader_ir: Implement ICMP
2019-09-21 18:06:07 -03:00
ReinUsesLisp 44000971e2
gl_shader_decompiler: Use uint for images and fix SUATOM
In the process remove implementation of SUATOM.MIN and SUATOM.MAX as
these require a distinction between U32 and S32. These have to be
implemented with imageCompSwap loop.
2019-09-21 17:33:52 -03:00
ReinUsesLisp 675f23aedc
shader/image: Implement SULD and remove irrelevant code
* Implement SULD as float.
* Remove conditional declaration of GL_ARB_shader_viewport_layer_array.
2019-09-21 17:32:48 -03:00
ReinUsesLisp 4de0f1e1c8
shader_bytecode: Add SULD encoding 2019-09-21 17:31:46 -03:00
Fernando Sahmkow 527b841c15 Shader_IR: ICMP corrections and fixes 2019-09-21 14:28:03 -04:00
David 9ad42fb0cf
Merge pull request #2868 from ReinUsesLisp/fix-mipmaps
maxwell_to_gl: Fix mipmap filtering
2019-09-21 19:57:09 +10:00
David Marcec 01a4afee42 Mark DrawArrays as LOG_TRACE
There's no reason to clog logs with DrawArray.
2019-09-21 15:43:58 +10:00
bunnei bbe82d62b0
Merge pull request #2846 from ReinUsesLisp/fixup-viewport-index
gl_shader_decompiler: Avoid writing output attribute when unimplemented
2019-09-20 17:11:20 -04:00
bunnei 88d857499b
Merge pull request #2855 from ReinUsesLisp/shfl
shader_ir/warp: Implement SHFL for Nvidia devices
2019-09-20 17:10:42 -04:00
Fernando Sahmkow 433e764bb0 Rasterizer: Correct introduced bug where a conditional render wouldn't stop a draw call from executing 2019-09-20 15:44:28 -04:00
Fernando Sahmkow 4b81d19a1a Shader_IR: Implement ICMP. 2019-09-19 20:56:29 -04:00
Fernando Sahmkow 7761e44d18 Rasterizer: Refactor and simplify DrawBatch Interface. 2019-09-19 11:41:33 -04:00
Fernando Sahmkow d2ea592ddb Rasterizer: Address Feedback and conscerns. 2019-09-19 11:41:32 -04:00
Fernando Sahmkow c17655ce74 Rasterizer: Refactor draw calls, remove deadcode and clean up. 2019-09-19 11:41:31 -04:00
Fernando Sahmkow 7606da5611 VideoCore: Corrections to the MME Inliner and removal of hacky instance management. 2019-09-19 11:41:29 -04:00
Fernando Sahmkow ba02d564f8 Video Core: initial Implementation of InstanceDraw Packaging 2019-09-19 11:41:27 -04:00
bunnei b31880dc5e
Merge pull request #2784 from ReinUsesLisp/smem
shader_ir: Implement shared memory
2019-09-18 16:26:05 -04:00
ReinUsesLisp 0526bf1895 shader_ir/warp: Implement SHFL 2019-09-17 17:44:07 -03:00
ReinUsesLisp 2dd6411753 maxwell_to_gl: Fix mipmap filtering
OpenGL texture filters follow GL_<texture_filter>_MIPMAP_<mipmap_filter>
but we were using them in the opposite way.
2019-09-17 03:32:24 -03:00
ReinUsesLisp af809b491e gl_rasterizer: Remove unused code paths from ConfigureFramebuffers 2019-09-17 02:50:42 -03:00
Fernando Sahmkow 393cc3ef2f
Merge pull request #2851 from ReinUsesLisp/srgb
renderer_opengl: Fix sRGB blits
2019-09-15 10:38:10 -04:00
Fernando Sahmkow b8b1747704
Merge pull request #2824 from ReinUsesLisp/mme
Revert "Revert #2466" and stub FirmwareCall 4
2019-09-15 06:17:04 -04:00
Rodrigo Locatti 193bfefce4
maxwell_3d: Update firmware 4 call stub commentary 2019-09-14 22:51:18 -03:00
Fernando Sahmkow daae327e86
Merge pull request #2857 from ReinUsesLisp/surface-srgb
video_core/surface: Add function to detect sRGB surfaces
2019-09-14 03:53:21 -04:00
Fernando Sahmkow 18fac59050
Merge pull request #2858 from ReinUsesLisp/vk-device
vk_device: Add miscellaneous features and minor style changes
2019-09-14 03:52:06 -04:00
ReinUsesLisp 01d96e1136 vk_device: Add miscellaneous features and minor style changes
* Increase minimum Vulkan requirements
* Require VK_EXT_vertex_attribute_divisor
* Require depthClamp, samplerAnisotropy and largePoints features
* Search and expose VK_KHR_uniform_buffer_standard_layout
* Search and expose VK_EXT_index_type_uint8
* Search and expose native float16 arithmetics
* Track current driver with VK_KHR_driver_properties
* Query and expose SSBO alignment
* Query more image formats
* Improve logging overall
* Minor style changes
* Minor rephrasing of commentaries
2019-09-13 02:10:07 -03:00
ReinUsesLisp 99e23bd0fd video_core/surface: Add function to detect sRGB surfaces
This is required for proper conversion to RGBA8_UNORM or RGBA8_SRGB
surfaces when a backend can target both native and converted ASTC.
2019-09-13 00:27:04 -03:00
ReinUsesLisp 6b997c8f7f renderer_opengl: Fix rebase mistake 2019-09-11 00:09:37 -03:00
ReinUsesLisp 36abf67e79 shader/image: Implement SUATOM and fix SUST 2019-09-10 20:22:31 -03:00
Fernando Sahmkow e60d281a01 gl_rasterizer: Correct sRGB Fix regression 2019-09-10 19:31:42 -03:00
ReinUsesLisp 78574746bd renderer_opengl: Fix sRGB blits
Removes the sRGB hack of tracking if a frame used an sRGB rendertarget
to apply at least once to blit the final texture as sRGB. Instead of
doing this apply sRGB if the presented image has sRGB.

Also enable sRGB by default on Maxwell3D registers as some games seem to
assume this.
2019-09-10 19:31:42 -03:00
bunnei 34b2c60f95
Merge pull request #2823 from ReinUsesLisp/shr-clamp
shader/shift: Implement SHR wrapped and clamped variants
2019-09-10 11:56:17 -04:00
bunnei c7ec7bc1f5
Merge pull request #2810 from ReinUsesLisp/mme-opt
maxwell_3d: Avoid moving macro_params
2019-09-10 11:55:45 -04:00
ReinUsesLisp 17a9b0178d gl_shader_decompiler: Avoid writing output attribute when unimplemented 2019-09-06 15:02:12 -03:00
ReinUsesLisp 1f43e5296f gl_shader_decompiler: Keep track of written images and mark them as modified 2019-09-05 23:26:05 -03:00
ReinUsesLisp 7228e22098 texture_cache: Minor changes 2019-09-05 23:25:15 -03:00
ReinUsesLisp 322d0200c8 gl_rasterizer: Apply textures and images state 2019-09-05 20:35:51 -03:00
ReinUsesLisp 80ec2feee8 gl_rasterizer: Add samplers to compute dispatches 2019-09-05 20:35:51 -03:00
ReinUsesLisp 954fc02fdd gl_rasterizer: Minor code changes 2019-09-05 20:35:51 -03:00
ReinUsesLisp 04cdecb7a1 gl_state: Split textures and samplers into two arrays 2019-09-05 20:35:51 -03:00
ReinUsesLisp 6170337001 gl_rasterizer: Implement image bindings 2019-09-05 20:35:51 -03:00
ReinUsesLisp 5edf24b510 gl_state: Add support for glBindImageTextures 2019-09-05 20:35:51 -03:00
ReinUsesLisp 2424eefad2 texture_cache: Pass TIC to texture cache 2019-09-05 20:35:51 -03:00
ReinUsesLisp 3a450c1395 kepler_compute: Implement texture queries 2019-09-05 20:35:51 -03:00
ReinUsesLisp 2e5b5c2358 gl_rasterizer: Split SetupTextures 2019-09-05 20:35:51 -03:00
Fernando Sahmkow 4ee9949639
Merge pull request #2804 from ReinUsesLisp/remove-gs-special
gl_shader_cache: Remove special casing for geometry shaders
2019-09-05 16:03:46 -04:00
bunnei 03badbdd9b
Merge pull request #2833 from ReinUsesLisp/fix-stencil
gl_rasterizer: Fix stencil testing
2019-09-05 15:27:31 -04:00
ReinUsesLisp 0f7b813d65 gl_shader_decompiler: Implement shared memory 2019-09-05 01:40:24 -03:00
ReinUsesLisp 4de04eba39 shader_ir: Implement LD_S
Loads from shared memory.
2019-09-05 01:38:37 -03:00
ReinUsesLisp f17415d431 shader_ir: Implement ST_S
This instruction writes to a memory buffer shared with threads within
the same work group. It is known as "shared" memory in GLSL.
2019-09-05 01:38:37 -03:00
David d34fa7c4fa
Merge pull request #2802 from ReinUsesLisp/hsetp2-pred
half_set_predicate: Fix HSETP2 predicate assignments
2019-09-05 12:26:39 +10:00
ReinUsesLisp 6177cbdbe1 gl_shader_decompiler: Fixup slow path 2019-09-04 15:03:51 -03:00
ReinUsesLisp 7bbc98cfc3 gl_rasterizer: Fix stencil testing
* Fix stencil dirty flags tracking when stencil is disabled
* Attach stencil on clears (previously it only attached depth)
* Attach stencil on drawing regardless of stencil testing being enabled
2019-09-04 01:59:09 -03:00
ReinUsesLisp 5f309b88db Revert "Revert #2466" and stub FirmwareCall 4 2019-09-04 01:55:45 -03:00
ReinUsesLisp 77ef4fa907 shader/shift: Implement SHR wrapped and clamped variants
Nvidia defaults to wrapped shifts, but this is undefined behaviour on
OpenGL's spec. Explicitly mask/clamp according to what the guest shader
requires.
2019-09-04 01:55:24 -03:00
ReinUsesLisp 701dedcfad maxwell_3d: Avoid moving macro_params 2019-09-04 01:55:01 -03:00
ReinUsesLisp 42e1bb6d46 gl_shader_cache: Remove special casing for geometry shaders
Now that ProgramVariants holds the primitive topology we no longer need
to keep track of individual geometry shaders topologies.
2019-09-04 01:54:43 -03:00
ReinUsesLisp dfae2d141a half_set_predicate: Fix predicate assignments 2019-09-04 01:54:23 -03:00
ReinUsesLisp 9cf52d027d gl_device: Disable precise in fragment shaders on bugged drivers 2019-09-04 01:54:00 -03:00
ReinUsesLisp 03276e7490 gl_shader_decompiler: Fixup AMD's slow path type 2019-09-04 01:54:00 -03:00
ReinUsesLisp 6c449793b8 gl_shader_decompiler: Rework GLSL decompiler type system
GLSL decompiler type system was broken. We converted all return values
to float except for some cases where returning we couldn't and
implicitly broke the rule of returning floats (e.g. for bools or bool
pairs).

Instead of doing this introduce class Expression that knows what type a
return value has and when a consumer wants to use the string it asks for
it with a required type, emitting a runtime error if types are
incompatible.

This has the disadvantage that there's more C++ code, but we can emit
better GLSL code that's easier to read.
2019-09-04 01:54:00 -03:00
bunnei 19af91434e
Merge pull request #2793 from ReinUsesLisp/bgr565
renderer_opengl: Implement RGB565 framebuffer format
2019-09-03 22:36:32 -04:00
bunnei 81fbc5370d
Merge pull request #2812 from ReinUsesLisp/f2i-selector
shader_ir/conversion: Implement F2I and F2F F16 selector
2019-09-03 22:35:33 -04:00
bunnei d4f33b822b
Merge pull request #2811 from ReinUsesLisp/fsetp-fix
float_set_predicate: Add missing negation bit for the second operand
2019-09-03 22:34:34 -04:00
bunnei 137d165672
Merge pull request #2826 from ReinUsesLisp/macro-binding
maxwell_3d: Fix macro binding cursor
2019-09-03 22:32:42 -04:00
bunnei 50b5bb44a0
Merge pull request #2765 from FernandoS27/dma-fix
MaxwellDMA: Fixes, corrections and relaxations.
2019-09-01 13:13:05 -04:00
ReinUsesLisp 52a41f482f maxwell_3d: Fix macro binding cursor 2019-09-01 05:01:11 -03:00
Rodrigo Locatti 4d4f9cc104 video_core: Silent miscellaneous warnings (#2820)
* texture_cache/surface_params: Remove unused local variable

* rasterizer_interface: Add missing documentation commentary

* maxwell_dma: Remove unused rasterizer reference

* video_core/gpu: Sort member declaration order to silent -Wreorder warning

* fermi_2d: Remove unused MemoryManager reference

* video_core: Silent unused variable warnings

* buffer_cache: Silent -Wreorder warnings

* kepler_memory: Remove unused MemoryManager reference

* gl_texture_cache: Add missing override

* buffer_cache: Add missing include

* shader/decode: Remove unused variables
2019-08-30 14:08:00 -04:00
ReinUsesLisp 878adee0a3 gl_buffer_cache: Add missing include
RasterizerInterface was considered an incomplete object by clang.
2019-08-29 22:02:52 +00:00
bunnei a67c4e6e02
Merge pull request #2742 from ReinUsesLisp/fix-texture-buffers
gl_texture_cache: Miscellaneous texture buffer fixes
2019-08-29 15:59:17 -04:00
bunnei e424615839
Merge pull request #2783 from FernandoS27/new-buffer-cache
Implement a New LLE Buffer Cache
2019-08-29 13:07:01 -04:00
bunnei f8cc5668f8
Merge pull request #2758 from ReinUsesLisp/packed-tid
shader/decode: Implement S2R Tic
2019-08-29 12:58:43 -04:00
ReinUsesLisp e3534700d7 shader_ir/conversion: Split int and float selector and implement F2F H1 2019-08-28 16:09:33 -03:00
ReinUsesLisp b13fbc25b8 shader_ir/conversion: Implement F2I F16 Ra.H1 2019-08-27 23:40:40 -03:00
ReinUsesLisp 6207751b00 float_set_predicate: Add missing negation bit for the second operand 2019-08-27 21:57:43 -03:00
ReinUsesLisp 4e35177e23 shader_ir: Implement VOTE
Implement VOTE using Nvidia's intrinsics. Documentation about these can
be found here
https://developer.nvidia.com/reading-between-threads-shader-intrinsics

Instead of using portable ARB instructions I opted to use Nvidia
intrinsics because these are the closest we have to how Tegra X1
hardware renders.

To stub VOTE on non-Nvidia drivers (including nouveau) this commit
simulates a GPU with a warp size of one, returning what is meaningful
for the instruction being emulated:

* anyThreadNV(value) -> value
* allThreadsNV(value) -> value
* allThreadsEqualNV(value) -> true

ballotARB, also known as "uint64_t(activeThreadsNV())", emits

VOTE.ANY Rd, PT, PT;

on nouveau's compiler. This doesn't match exactly to Nvidia's code

VOTE.ALL Rd, PT, PT;

Which is emulated with activeThreadsNV() by this commit. In theory this
shouldn't really matter since .ANY, .ALL and .EQ affect the predicates
(set to PT on those cases) and not the registers.
2019-08-21 14:50:38 -03:00