In case of redundant yields, the scheduler will now idle the core for
it's timeslice, in order to avoid continuously yielding the same thing
over and over.
This only encourages the use of the global system instance (which will
be phased out long-term). Instead, we use the direct system function
call directly to remove the appealing but discouraged short-hand.
If an unmapping operation fails, we shouldn't be decrementing the amount
of memory mapped and returning that the operation was successful. We
should actually be returning the error code in this case.
Avoids potentially expensive (depending on the size of the memory block)
allocations by reserving the necessary memory before performing both
insertions. This avoids scenarios where the second insert may cause a
reallocation to occur.
Avoids needing to read the same long sequence of code in both code
paths. Also makes it slightly nicer to read and debug, as the locals
will be able to be shown in the debugger.
This commit ensures that all backing memory allocated for the Guest CPU
is aligned to 256 bytes. This due to how gpu memory works and the heavy
constraints it has in the alignment of physical memory.
This was initially necessary when AArch64 JIT emulation was in its
infancy and all memory-related instructions weren't implemented.
Given the JIT now has all of these facilities implemented, we can remove
these functions from the CPU interface.
Prior to PR, Yuzu did not restore memory to RW-
on unmap of mirrored memory or unloading of NRO.
(In fact, in the NRO case, the memory was unmapped
instead of reprotected to --- on Load, so it was
actually lost entirely...)
This PR addresses that, and restores memory to RW-
as it should.
This fixes a crash in Super Smash Bros when creating
a World of Light save for the first time, and possibly
other games/circumstances.
This sets the DeviceMapped attribute for GPU-mapped memory blocks,
and prevents merging device mapped blocks. This prevents memory
mapped from the gpu from having its backing address changed by
block coalesce.
This implements svcMapPhysicalMemory/svcUnmapPhysicalMemory for Yuzu,
which can be used to map memory at a desired address by games since
3.0.0.
It also properly parses SystemResourceSize from NPDM, and makes
information available via svcGetInfo.
This is needed for games like Super Smash Bros. and Diablo 3 -- this
PR's implementation does not run into the "ASCII reads" issue mentioned
in the comments of #2626, which was caused by the following bugs in
Yuzu's memory management that this PR also addresses:
* Yuzu's memory coalescing does not properly merge blocks. This results
in a polluted address space/svcQueryMemory results that would be
impossible to replicate on hardware, which can lead to game code making
the wrong assumptions about memory layout.
* This implements better merging for AllocatedMemoryBlocks.
* Yuzu's implementation of svcMirrorMemory unprotected the entire
virtual memory range containing the range being mirrored. This could
lead to games attempting to map data at that unprotected
range/attempting to access that range after yuzu improperly unmapped
it.
* This PR fixes it by simply calling ReprotectRange instead of
Reprotect.
Prior to execution within a process beginning, the process establishes
its own TLS region for uses (as far as I can tell) related to exception
handling.
Now that TLS creation was decoupled from threads themselves, we can add
this behavior to our Process class. This is also good, as it allows us
to remove a stub within svcGetInfo, namely querying the address of that
region.
Provides a more accurate name for the memory region and also
disambiguates between the map and new map regions of memory, making it
easier to understand.
Handles the placement of the stack a little nicer compared to the
previous code, which was off in a few ways. e.g.
The stack (new map) region, shouldn't be the width of the entire address
space if the size of the region calculation ends up being zero. It
should be placed at the same location as the TLS IO region and also have
the same size.
In the event the TLS IO region contains a size of zero, we should also
be doing the same thing. This fixes our memory layout a little bit and
also resolves some cases where assertions can trigger due to the memory
layout being incorrect.
Extracts out all of the thread local storage management from thread
instances themselves and makes the owning process handle the management
of the memory. This brings the memory management slightly more in line
with how the kernel handles these allocations.
Furthermore, this also makes the TLS page management a little more
readable compared to the lingering implementation that was carried over
from Citra.
This will be necessary for making our TLS slot management slightly more
straightforward. This can also be utilized for other purposes in the
future.
We can implement the existing simpler overload in terms of this one
anyways, we just pass the beginning and end of the ASLR region as the
boundaries.
The old implementation had faulty Threadsafe methods where events could
be missing. This implementation unifies unsafe/safe methods and makes
core timing thread safe overall.
This is performing more work than would otherwise be necessary during
VMManager's destruction. All we actually want to occur in this scenario
is for any allocated memory to be freed, which will happen automatically
as the VMManager instance goes out of scope.
Anything else being done is simply unnecessary work.
Given we don't currently implement the personal heap yet, the existing
memory querying functions are essentially doing what the memory querying
types introduced in 6.0.0 do.
So, we can build the necessary machinery over the top of those and just
use them as part of info types.
These are only used from within this translation unit, so they don't
need to have external linkage. They were intended to be marked with this
anyways to be consistent with the other service functions.
Renames the members to more accurately indicate what they signify.
"OneShot" and "Sticky" are kind of ambiguous identifiers for the reset
types, and can be kind of misleading. Automatic and Manual communicate
the kind of reset type in a clearer manner. Either the event is
automatically reset, or it isn't and must be manually cleared.
The "OneShot" and "Sticky" terminology is just a hold-over from Citra
where the kernel had a third type of event reset type known as "Pulse".
Given the Switch kernel only has two forms of event reset types, we
don't need to keep the old terminology around anymore.
This reduces the boilerplate that services have to write out the current thread explicitly. Using current thread instead of client thread is also semantically incorrect, and will be a problem when we implement multicore (at which time there will be multiple current threads)
These are actually quite important indicators of thread lifetimes, so
they should be going into the debug log, rather than being treated as
misc info and delegated to the trace log.
Makes the code much nicer to follow in terms of behavior and control
flow. It also fixes a few bugs in the implementation.
Notably, the thread's owner process shouldn't be accessed in order to
retrieve the core mask or ideal core. This should be done through the
current running process. The only reason this bug wasn't encountered yet
is because we currently only support running one process, and thus every
owner process will be the current process.
We also weren't checking against the process' CPU core mask to see if an
allowed core is specified or not.
With this out of the way, it'll be less noisy to implement proper
handling of the affinity flags internally within the kernel thread
instances.
This is a holdover from Citra, where the 3DS has both
WaitSynchronization1 and WaitSynchronizationN. The switch only has one
form of wait synchronizing (literally WaitSynchonization). This allows
us to throw out code that doesn't apply at all to the Switch kernel.
Because of this unnecessary dichotomy within the wait synchronization
utilities, we were also neglecting to properly handle waiting on
multiple objects.
While we're at it, we can also scrub out any lingering references to
WaitSynchronization1/WaitSynchronizationN in comments, and change them
to WaitSynchronization (or remove them if the mention no longer
applies).
The actual behavior of this function is slightly more complex than what
we're currently doing within the supervisor call. To avoid dumping most
of this behavior in the supervisor call itself, we can migrate this to
another function.
This member variable is entirely unused. It was only set but never
actually utilized. Given that, we can remove it to get rid of noise in
the thread interface.
Essentially performs the inverse of svcMapProcessCodeMemory. This unmaps
the aliasing region first, then restores the general traits of the
aliased memory.
What this entails, is:
- Restoring Read/Write permissions to the VMA.
- Restoring its memory state to reflect it as a general heap memory region.
- Clearing the memory attributes on the region.
This gives us significantly more control over where in the
initialization process we start execution of the main process.
Previously we were running the main process before the CPU or GPU
threads were initialized (not good). This amends execution to start
after all of our threads are properly set up.
Initially required due to the split codepath with how the initial main
process instance was initialized. We used to initialize the process
like:
Init() {
main_process = Process::Create(...);
kernel.MakeCurrentProcess(main_process.get());
}
Load() {
const auto load_result = loader.Load(*kernel.GetCurrentProcess());
if (load_result != Loader::ResultStatus::Success) {
// Handle error here.
}
...
}
which presented a problem.
Setting a created process as the main process would set the page table
for that process as the main page table. This is fine... until we get to
the part that the page table can have its size changed in the Load()
function via NPDM metadata, which can dictate either a 32-bit, 36-bit,
or 39-bit usable address space.
Now that we have full control over the process' creation in load, we can
simply set the initial process as the main process after all the loading
is done, reflecting the potential page table changes without any
special-casing behavior.
We can also remove the cache flushing within LoadModule(), as execution
wouldn't have even begun yet during all usages of this function, now
that we have the initialization order cleaned up.
Our initialization process is a little wonky than one would expect when
it comes to code flow. We initialize the CPU last, as opposed to
hardware, where the CPU obviously needs to be first, otherwise nothing
else would work, and we have code that adds checks to get around this.
For example, in the page table setting code, we check to see if the
system is turned on before we even notify the CPU instances of a page
table switch. This results in dead code (at the moment), because the
only time a page table switch will occur is when the system is *not*
running, preventing the emulated CPU instances from being notified of a
page table switch in a convenient manner (technically the code path
could be taken, but we don't emulate the process creation svc handlers
yet).
This moves the threads creation into its own member function of the core
manager and restores a little order (and predictability) to our
initialization process.
Previously, in the multi-threaded cases, we'd kick off several threads
before even the main kernel process was created and ready to execute (gross!).
Now the initialization process is like so:
Initialization:
1. Timers
2. CPU
3. Kernel
4. Filesystem stuff (kind of gross, but can be amended trivially)
5. Applet stuff (ditto in terms of being kind of gross)
6. Main process (will be moved into the loading step in a following
change)
7. Telemetry (this should be initialized last in the future).
8. Services (4 and 5 should ideally be alongside this).
9. GDB (gross. Uses namespace scope state. Needs to be refactored into a
class or booted altogether).
10. Renderer
11. GPU (will also have its threads created in a separate step in a
following change).
Which... isn't *ideal* per-se, however getting rid of the wonky
intertwining of CPU state initialization out of this mix gets rid of
most of the footguns when it comes to our initialization process.
Some objects declare their handle type as const, while others declare it
as constexpr. This makes the const ones constexpr for consistency, and
prevent unexpected compilation errors if these happen to be attempted to be
used within a constexpr context.
We need to ensure dynarmic gets a valid pointer if the page table is
resized (the relevant pointers would be invalidated in this scenario).
In this scenario, the page table can be resized depending on what kind
of address space is specified within the NPDM metadata (if it's
present).
Adjusts the interface of the wrappers to take a system reference, which
allows accessing a system instance without using the global accessors.
This also allows getting rid of all global accessors within the
supervisor call handling code. While this does make the wrappers
themselves slightly more noisy, this will be further cleaned up in a
follow-up. This eliminates the global system accessors in the current
code while preserving the existing interface.
Keeps the return type consistent with the function name. While we're at
it, we can also reduce the amount of boilerplate involved with handling
these by using structured bindings.
We need to be checking whether or not the given address is within the
kernel address space or if the given address isn't word-aligned and bail
in these scenarios instead of trashing any kernel state.
Given server sessions can be given a name, we should allow retrieving
it instead of using the default implementation of GetName(), which would
just return "[UNKNOWN KERNEL OBJECT]".
The AddressArbiter type isn't actually used, given the arbiter itself
isn't a direct kernel object (or object that implements the wait object
facilities).
Given this, we can remove the enum entry entirely.
Similarly like svcGetProcessList, this retrieves the list of threads
from the current process. In the kernel itself, a process instance
maintains a list of threads, which are used within this function.
Threads are registered to a process' thread list at thread
initialization, and unregistered from the list upon thread destruction
(if said thread has a non-null owning process).
We assert on the debug event case, as we currently don't implement
kernel debug objects.
Now that ShouldWait() is a const qualified member function, this one can
be made const qualified as well, since it can handle passing a const
qualified this pointer to ShouldWait().
Previously this was performing a u64 + int sign conversion. When dealing
with addresses, we should generally be keeping the arithmetic in the
same signedness type.
This also gets rid of the static lifetime of the constant, as there's no
need to make a trivial type like this potentially live for the entire
duration of the program.
This doesn't really provide any benefit to the resource limit interface.
There's no way for callers to any of the service functions for resource
limits to provide a custom name, so all created instances of resource
limits other than the system resource limit would have a name of
"Unknown".
The system resource limit itself is already trivially identifiable from
its limit values, so there's no real need to take up space in the object to
identify one object meaningfully out of N total objects.
Since C++17, the introduction of deduction guides for locking facilities
means that we no longer need to hardcode the mutex type into the locks
themselves, making it easier to switch mutex types, should it ever be
necessary in the future.
Since C++17, we no longer need to explicitly specify the type of the
mutex within the lock_guard. The type system can now deduce these with
deduction guides.
The kernel makes sure that the given size to unmap is always the same
size as the entire region managed by the shared memory instance,
otherwise it returns an error code signifying an invalid size.
This is similarly done for transfer memory (which we already check for).
Reports the (mostly) correct size through svcGetInfo now for queries to
total used physical memory. This still doesn't correctly handle memory
allocated via svcMapPhysicalMemory, however, we don't currently handle
that case anyways.
This will make operating with the process-related SVC commands much
nicer in the future (the parameter representing the stack size in
svcStartProcess is a 64-bit value).
In some cases, our callbacks were using s64 as a parameter, and in other
cases, they were using an int, which is inconsistent.
To make all callbacks consistent, we can just use an s64 as the type for
late cycles, given it gets rid of the need to cast internally.
While we're at it, also resolve some signed/unsigned conversions that
were occurring related to the callback registration.
One behavior that we weren't handling properly in our heap allocation
process was the ability for the heap to be shrunk down in size if a
larger size was previously requested.
This adds the basic behavior to do so and also gets rid of HeapFree, as
it's no longer necessary now that we have allocations and deallocations
going through the same API function.
While we're at it, fully document the behavior that this function
performs.
Makes it more obvious that this function is intending to stand in for
the actual supervisor call itself, and not acting as a general heap
allocation function.
Also the following change will merge the freeing behavior of HeapFree
into this function, so leaving it as HeapAllocate would be misleading.
In cases where HeapAllocate is called with the same size of the current
heap, we can simply do nothing and return successfully.
This avoids doing work where we otherwise don't have to. This is also
what the kernel itself does in this scenario.
Another holdover from citra that can be tossed out is the notion of the
heap needing to be allocated in different addresses. On the switch, the
base address of the heap will always be managed by the memory allocator
in the kernel, so this doesn't need to be specified in the function's
interface itself.
The heap on the switch is always allocated with read/write permissions,
so we don't need to add specifying the memory permissions as part of the
heap allocation itself either.
This also corrects the error code returned from within the function.
If the size of the heap is larger than the entire heap region, then the
kernel will report an out of memory condition.
The use of a shared_ptr is an implementation detail of the VMManager
itself when mapping memory. Because of that, we shouldn't require all
users of the CodeSet to have to allocate the shared_ptr ahead of time.
It's intended that CodeSet simply pass in the required direct data, and
that the memory manager takes care of it from that point on.
This means we just do the shared pointer allocation in a single place,
when loading modules, as opposed to in each loader.
Makes it more evident that one is for actual code and one is for actual
data. Mutable and static are less than ideal terms here, because
read-only data is technically not mutable, but we were mapping it with
that label.
Given this is utilized by the loaders, this allows avoiding inclusion of
the kernel process definitions where avoidable.
This also keeps the loading format for all executable data separate from
the kernel objects.
Rather than make a global accessor for this sort of thing. We can make
it a part of the thread interface itself. This allows getting rid of a
hidden global accessor in the kernel code.
This condition was checking against the nominal thread priority, whereas
the kernel itself checks against the current priority instead. We were
also assigning the nominal priority, when we should be assigning
current_priority, which takes priority inheritance into account.
This can lead to the incorrect priority being assigned to a thread.
Given we recursively update the relevant threads, we don't need to go
through the whole mutex waiter list. This matches what the kernel does
as well (only accessing the first entry within the waiting list).
Makes it an instantiable class like it is in the actual kernel. This
will also allow removing reliance on global accessors in a following
change, now that we can encapsulate a reference to the system instance
in the class.
Within the kernel, shared memory and transfer memory facilities exist as
completely different kernel objects. They also have different validity
checking as well. Therefore, we shouldn't be treating the two as the
same kind of memory.
They also differ in terms of their behavioral aspect as well. Shared
memory is intended for sharing memory between processes, while transfer
memory is intended to be for transferring memory to other processes.
This breaks out the handling for transfer memory into its own class and
treats it as its own kernel object. This is also important when we
consider resource limits as well. Particularly because transfer memory
is limited by the resource limit value set for it.
While we currently don't handle resource limit testing against objects
yet (but we do allow setting them), this will make implementing that
behavior much easier in the future, as we don't need to distinguish
between shared memory and transfer memory allocations in the same place.
With this, all kernel objects finally have all of their data members
behind an interface, making it nicer to reason about interactions with
other code (as external code no longer has the freedom to totally alter
internals and potentially messing up invariants).
There's no real need to use a shared lifetime here, since we don't
actually expose them to anything else. This is also kind of an
unnecessary use of the heap given the objects themselves are so small;
small enough, in fact that changing over to optionals actually reduces
the overall size of the HLERequestContext struct (818 bytes to 808
bytes).
Now that we have the address arbiter extracted to its own class, we can
fix an innaccuracy with the kernel. Said inaccuracy being that there
isn't only one address arbiter. Each process instance contains its own
AddressArbiter instance in the actual kernel.
This fixes that and gets rid of another long-standing issue that could
arise when attempting to create more than one process.
Similar to how WaitForAddress was isolated to its own function, we can
also move the necessary conditional checking into the address arbiter
class itself, allowing us to hide the implementation details of it from
public use.