Commit graph

232 commits

Author SHA1 Message Date
David Marcec fabdf5d385 Addressed issues 2020-06-24 12:09:03 +10:00
David Marcec 6ce5f3120b Macro HLE support 2020-06-24 12:09:01 +10:00
bunnei c2ea1e1bcb
Merge pull request #4049 from ReinUsesLisp/separate-samplers
shader/texture: Join separate image and sampler pairs offline
2020-06-13 13:48:27 -04:00
ReinUsesLisp c95c254f3e texture_cache: Implement rendering to 3D textures
This allows rendering to 3D textures with more than one slice.
Applications are allowed to render to more than one slice of a texture
using gl_Layer from a VTG shader.

This also requires reworking how 3D texture collisions are handled, for
now, this commit allows rendering to slices but not to miplevels. When a
render target attempts to write to a mipmap, we fallback to the previous
implementation (copying or flushing as needed).

- Fixes color correction 3D textures on UE4 games (rainbow effects).
- Allows Xenoblade games to render to 3D textures directly.
2020-06-08 05:01:00 -03:00
ReinUsesLisp 5b2b6d594c shader/texture: Join separate image and sampler pairs offline
Games using D3D idioms can join images and samplers when a shader
executes, instead of baking them into a combined sampler image. This is
also possible on Vulkan.

One approach to this solution would be to use separate samplers on
Vulkan and leave this unimplemented on OpenGL, but we can't do this
because there's no consistent way of determining which constant buffer
holds a sampler and which one an image. We could in theory find the
first bit and if it's in the TIC area, it's an image; but this falls
apart when an image or sampler handle use an index of zero.

The used approach is to track for a LOP.OR operation (this is done at an
IR level, not at an ISA level), track again the constant buffers used as
source and store this pair. Then, outside of shader execution, join
the sample and image pair with a bitwise or operation.

This approach won't work on games that truly use separate samplers in a
meaningful way. For example, pooling textures in a 2D array and
determining at runtime what sampler to use.

This invalidates OpenGL's disk shader cache :)

- Used mostly by D3D ports to Switch
2020-06-05 00:24:51 -03:00
David Marcec 411f5527d4 Mark parameters as const 2020-06-03 16:33:38 +10:00
David Marcec 3a20e74f40 Pass by reference instead of copying parameters 2020-06-02 16:37:06 +10:00
David Marcec b032ebdfee Implement macro JIT 2020-05-30 11:40:04 +10:00
bunnei 50c27d5ae1
Merge pull request #3885 from ReinUsesLisp/viewport-swizzles
video_core: Implement viewport swizzles with NV_viewport_swizzle
2020-05-08 15:16:53 -04:00
bunnei 41682e0888
Merge pull request #3815 from FernandoS27/command-list-2
GPU: More optimizations to GPU Command List Processing and DMA Copy Optimizations
2020-05-05 17:12:42 -04:00
ReinUsesLisp 2dbf5290f2 vk_graphics_pipeline: Implement viewport swizzles with NV_viewport_swizzle 2020-05-04 18:31:17 -03:00
ReinUsesLisp 9b8e962368 maxwell_3d: Add viewport swizzles 2020-05-04 17:50:59 -03:00
bunnei 2aff0b4733
Merge pull request #3808 from ReinUsesLisp/wait-for-idle
{maxwell_3d,buffer_cache}: Implement memory barriers using 3D registers
2020-05-03 02:43:18 -04:00
Fernando Sahmkow 9df67b2095 Clang Format and Documentation. 2020-04-28 14:02:51 -04:00
ReinUsesLisp fe931ac976 {maxwell_3d,buffer_cache}: Implement memory barriers using 3D registers
Drop MemoryBarrier from the buffer cache and use Maxwell3D's register
WaitForIdle.

To implement this on OpenGL we just call glMemoryBarrier with the
necessary bits.

Vulkan lacks this synchronization primitive, so we set an event and
immediately wait for it. This is not a pretty solution, but it's what
Vulkan can do without submitting the current command buffer to the queue
(which ends up being more expensive on the CPU).
2020-04-28 02:18:12 -03:00
Fernando Sahmkow 90e5694230 VideoCore/Engines: Refactor Engines CallMethod. 2020-04-27 21:47:58 -04:00
ReinUsesLisp bb1ed66d99 maxwell_3d: Fix depth clamping register
Using deko3d as reference:
4e47ba0013/source/maxwell/gpu_3d_state.cpp (L42)

We were using bits 3 and 4 to determine depth clamping, but these are
the same both enabled and disabled:

state->depthClampEnable ? 0x101A : 0x181D

The same happens on Nvidia's OpenGL driver, where they do something like
this (default capabilities, GL 4.5 compatibility):

(state & DEPTH_CLAMP) != 0 ? 0x201a : 0x281c

There's always a difference between the first bits in this register, but
bit 11 is consistently disabled on both deko3d/NVN and OpenGL. This
commit changes yuzu's behaviour to use bit 11 to determine depth
clamping.

- Fixes depth issues on Super Mario Odyssey's intro.
2020-04-27 20:50:14 -03:00
bunnei 6c7d8073be
Merge pull request #3742 from FernandoS27/command-list
Optimize GPU Command Lists and Introduce Fast GPU Time Option
2020-04-27 00:18:46 -04:00
Fernando Sahmkow 3fedcc2f6e DMAPusher: Propagate multimethod writes into the engines. 2020-04-23 08:52:55 -04:00
ReinUsesLisp 0bbae63300 gl_rasterizer: Fix buffers without size
On NVN buffers can be enabled but have no size. According to deko3d and
the behavior we see in Animal Crossing: New Horizons these buffers get
the special address of 0x1000 and limit themselves to 0xfff.

Implement buffers without a size by binding a null buffer to OpenGL
without a side.

1d1930beea/source/maxwell/gpu_3d_vbo.cpp (L62-L63)
2020-04-21 19:55:44 -03:00
ReinUsesLisp ab6704f20c fixed_pipeline_state: Pack attribute state
Reduce FixedPipelineState's size from 1384 to 664 bytes
2020-04-18 19:21:19 -03:00
ReinUsesLisp 6dfcabc800 gl_rasterizer: Implement constant vertex attributes
Credits go to gdkchan from Ryujinx for finding constant attributes are
used in retail games.
2020-04-14 17:58:53 -03:00
ReinUsesLisp 76615b9f34 gl_rasterizer: Implement line widths and smooth lines
Implements "legacy" features from OpenGL present on hardware such as
smooth lines and line width.
2020-04-13 01:30:34 -03:00
ReinUsesLisp a7baf6fee4 video_core: Add MSAA registers in 3D engine and TIC
This adds the registers used for multisampling. It doesn't implement
anything for now.
2020-04-12 00:21:27 -03:00
namkazy f66743cd0c maxwell_3d: change declaration order 2020-03-22 13:41:16 +07:00
namkazy 7051dc1902 maxwell_3d: update comments for shadow ram usage 2020-03-22 11:35:26 +07:00
Nguyen Dac Nam dbfbe352e0 maxwell_3d: implement MME shadow RAM 2020-03-22 10:53:35 +07:00
ReinUsesLisp afebdda203 maxwell_3d: Add padding words to XFB entries
Use INSERT_UNION_PADDING_WORDS instead of alignas to ensure a size
requirement.
2020-03-13 18:33:05 -03:00
ReinUsesLisp 8e9f23f393 gl_rasterizer: Implement transform feedback bindings 2020-03-13 18:33:04 -03:00
Rodrigo Locatti 244fe13219
Merge branch 'master' into shader-purge 2020-03-13 16:44:06 -03:00
ReinUsesLisp e4bc3c3342 gl_rasterizer: Implement polygon modes and fill rectangles 2020-03-09 20:39:58 -03:00
ReinUsesLisp eb5861e0a2 engines/maxwell_3d: Add TFB registers and store them in shader registry 2020-03-09 18:40:53 -03:00
ReinUsesLisp 042256c6bb state_tracker: Remove type traits with named structures 2020-02-28 17:56:43 -03:00
ReinUsesLisp 15cadc3948 maxwell_3d: Use two tables instead of three for dirty flags 2020-02-28 17:56:42 -03:00
ReinUsesLisp 9b08698a0c maxwell_3d: Change write dirty flags to a bitset 2020-02-28 17:56:42 -03:00
ReinUsesLisp 9e74e6988b maxwell_3d: Flatten cull and front face registers 2020-02-28 17:56:41 -03:00
ReinUsesLisp eed789d0d1 video_core: Reintroduce dirty flags infrastructure 2020-02-28 17:56:41 -03:00
ReinUsesLisp 1eee891f6e gl_state: Remove clip distances tracking 2020-02-28 17:26:26 -03:00
ReinUsesLisp d3e433a380 gl_state: Remove viewport and depth range tracking 2020-02-28 17:25:18 -03:00
ReinUsesLisp 96ac3d518a gl_rasterizer: Remove dirty flags 2020-02-28 16:39:27 -03:00
bunnei e22ad52cdb
Merge pull request #3425 from ReinUsesLisp/layered-framebuffer
texture_cache: Implement layered framebuffer attachments
2020-02-24 10:14:50 -05:00
ReinUsesLisp 6a0220b2e1 texture_cache: Implement layered framebuffer attachments
Layered framebuffer attachments is a feature that allows applications to
write attach layered textures to a single attachment. What layer the
fragments are written to is decided from the shader using gl_Layer.
2020-02-16 04:19:32 -03:00
ReinUsesLisp aae8c180cb gl_query_cache: Implement host queries using a deferred cache
Instead of waiting immediately for executed commands, defer the query
until the guest CPU reads it. This way we get closer to what the guest
program is doing.

To archive this we have to build a dependency queue, because host APIs
(like OpenGL and Vulkan) use ranged queries instead of counters like
NVN.

Waiting for queries implicitly uses fences and this requires a command
being queued, otherwise the driver will lock waiting until a timeout. To
fix this when there are no commands queued, we explicitly call glFlush.
2020-02-14 17:33:13 -03:00
ReinUsesLisp 2b58652f08 maxwell_3d: Slow implementation of passed samples (query 21)
Implements GL_SAMPLES_PASSED by waiting immediately for queries.
2020-02-14 17:27:17 -03:00
bunnei 3563af2364
Merge pull request #3395 from FernandoS27/queries
GPU: Refactor queries implementation and correct GPU Clock.
2020-02-13 20:18:26 -05:00
bunnei 37f1cf8cbd
Merge pull request #3376 from ReinUsesLisp/point-sprite
gl_rasterizer: Implement GL_POINT_SPRITE
2020-02-11 08:26:07 -05:00
Fernando Sahmkow 0cb3bcfbb7 Maxwell3D: Correct query reporting. 2020-02-10 10:41:43 -04:00
ReinUsesLisp 7da52673d0 gl_rasterizer: Implement GL_POINT_SPRITE
OpenGL core defaults to GL_POINT_SPRITE, meanwhile on OpenGL
compatibility we have to explicitly enable it. This fixes
gl_PointCoord's behaviour.
2020-02-04 15:19:45 -03:00
ReinUsesLisp 4eed744277 maxwell_3d: Fix stencil back mask 2020-02-02 17:50:46 -03:00
Fernando Sahmkow b97608ca64 Shader_IR: Allow constant access of guest driver. 2020-01-24 16:43:30 -04:00