This gives us significantly more control over where in the
initialization process we start execution of the main process.
Previously we were running the main process before the CPU or GPU
threads were initialized (not good). This amends execution to start
after all of our threads are properly set up.
Neither of these functions require the use of shared ownership of the
returned pointer. This makes it more difficult to create reference
cycles with, and makes the interface more generic, as std::shared_ptr
instances can be created from a std::unique_ptr, but the vice-versa
isn't possible. This also alters relevant functions to take NCA
arguments by const reference rather than a const reference to a
std::shared_ptr. These functions don't alter the ownership of the memory
used by the NCA instance, so we can make the interface more generic by
not assuming anything about the type of smart pointer the NCA is
contained within and make it the caller's responsibility to ensure the
supplied NCA is valid.
A process should never require being reference counted in this
situation. If the handle to a process is freed before this function is
called, it's definitely a bug with our lifetime management, so we can
put the requirement in place for the API that the process must be a
valid instance.