~~Ontakeio~~ · (This post was last modified: 04-24-2014, 09:48 PM by Ontakeio.)

TL;DR: you can skip the first part below if you just want to see my ideas.

I would like to first say that I have been a long-time observer of this emulator, and have dug into the source a lot and noticed many issues that can be fixed. Also, speed looks like a problem for the future, as single-die chips will not have the power needed. I have followed these forums for many months and have seen progress, but I notice that some simple problems could be sorted out by creating separate builds and a co-design plan to cater for higher-end machines that can handle the parallelism the PS3 needs.

Here are some of my ideas (more to be added) on how RPCS3 could benefit:
1. JIT or JITIL emulation:
A JIT compiles the PowerPC code into x86 code (or whatever target is wanted), copies it into a cache, then executes it. While the code is in the cache it runs faster, since it does not need to be translated again each time it executes. JITIL is more experimental but can be optimized further: it does basically the same thing, but compiles the PS3's machine code to an intermediate language before native execution, and can cache some of that as well. For some of the PS3's FPU instructions, caching and translating code through an IL will benefit some game code. At the least, it should be considered, since Dolphin uses this and it works wonders for some games, giving a performance boost.
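
To make the "translate once, cache, then execute" idea concrete, here is a minimal sketch in C++. It is not RPCS3 or Dolphin code: the guest instruction format, `GuestState`, and `BlockCache` are all hypothetical names invented for illustration, and a closure stands in for the host machine code that a real JIT would emit.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical toy guest ISA (not PowerPC): "add an immediate to a register".
struct GuestInstr {
    std::uint8_t reg;   // destination register, 0..3
    std::int32_t imm;   // value to add
};

struct GuestState {
    std::array<std::int32_t, 4> gpr{};  // zero-initialized guest registers
};

// A translated block. A real JIT would emit host machine code into executable
// memory; a closure stands in for that here so the sketch stays portable.
using HostBlock = std::function<void(GuestState&)>;

class BlockCache {
public:
    explicit BlockCache(std::vector<GuestInstr> program)
        : program_(std::move(program)) {}

    // Execute the block that starts at guest index `pc`,
    // translating it only the first time it is seen.
    void run(std::size_t pc, GuestState& state) {
        auto it = cache_.find(pc);
        if (it == cache_.end()) {
            it = cache_.emplace(pc, translate(pc)).first;
            ++translations_;
        }
        it->second(state);
    }

    int translations() const { return translations_; }

private:
    // "Compile" everything from pc to the end of the program into one block.
    // The per-register totals are folded once, at translation time.
    HostBlock translate(std::size_t pc) const {
        assert(pc <= program_.size());
        std::array<std::int32_t, 4> delta{};
        for (std::size_t i = pc; i < program_.size(); ++i) {
            delta[program_[i].reg & 3u] += program_[i].imm;
        }
        return [delta](GuestState& s) {
            for (std::size_t r = 0; r < delta.size(); ++r) s.gpr[r] += delta[r];
        };
    }

    std::vector<GuestInstr> program_;
    std::unordered_map<std::size_t, HostBlock> cache_;
    int translations_ = 0;
};
```

Calling `run(0, state)` twice on the same `BlockCache` translates the block once and reuses the cached result the second time, which is where the saving over re-decoding every instruction comes from.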

2. Supercomputer optimization:
Though it is not practical, adding a specific option that optimizes the parallelism of the RSX and the Cell's PPU for supercomputing platforms would greatly benefit performance. I am not talking about ten-thousand-dollar supercomputers, but a lighter cluster that can take advantage of something like MPI (Message Passing Interface, a system designed for parallelism, typically used in supercomputers), and definitely multi-GPU and GPGPU (get a build set up to work with Mantle and some low-level shaders, implement some GPGPU to offload the main CPU, and use one GPU for drawing to the screen and another logical GPU for computation). A Linux release could benefit from a similar technology such as Open MPI, as could Mac/Unix, etc. This supercomputer could be an array of several i7s connected through MPI or similar, with multiple GPUs ($2000-$3000), so that tasks can be split up and processed by many machines at once, producing the desired output faster. This may not seem worthwhile, but it is not a total waste and would definitely be promising if done right. In the long run any PS3 game should run perfectly this way, and making it an option for rpcs3 as another execution core might suffice.

3. Take advantage of lower-level code:
Some parts of the code can be optimized much better in assembly language for the desired platforms. By doing so you reduce C/C++ overhead, even if you think your compiler can beat hand-written assembly all the time. Doing this for one subroutine is hardly worthwhile, but if you optimize many parts of execution (subroutines, loops, iterations, etc.) with lower-level, faster algorithms, the whole program will benefit. Some high-level C++ code is better written in target assembly, which can take advantage of features C++ cannot offer. It also avoids some use of the C++ runtime, which can be replaced with assembly that uses fewer instructions to get the same amount of work done, with (sometimes) millions of clock cycles saved, depending on the optimization.

If this thread is still open, I will come back and add a few more ideas, but right now I have to go somewhere and do something.

***ssshadow*** · (This post was last modified: 04-24-2014, 10:10 PM by ssshadow.)

I don't feel like I can comment on this since I don't have that deep an understanding of how rpcs3 works, so instead I am going to ask: can you do this? Feel free to ;)

***Bigpet*** · (This post was last modified: 04-24-2014, 10:58 PM by Bigpet.)

I appreciate the sentiment, but I don't think you are correctly assessing the situation. We currently spend most of the time in our inefficient memory system. We aren't really strapped for "ideas"; what we're lacking is manpower and time. But again, I appreciate the will to help.

Darkriot · 04-24-2014, 10:59 PM

Amm...i bad know english, and know little bit about Rpcs3, but if I understand this text, this guy wants to help Rpcs3?
(Sorry, for this terrible english)

logan · 04-25-2014, 12:14 AM

Great ideas, but you should start on GitHub.
Start writing code.

***notq*** · 04-25-2014, 12:16 AM

(04-24-2014, 10:59 PM)fenix0082 Wrote: Amm...i bad know english, and know little bit about Rpcs3, but if I understand this text, this guy wants to help Rpcs3?
(Sorry, for this terrible english)

He is just telling the devs about his ideas. But I guess the devs already know about these methods and know better what they are doing.

derpf · 04-25-2014, 01:10 AM

1. A PPU recompiler (or two or three?) is planned, and an SPU recompiler is already in place. I am very certainly no expert on the matter, but going through an IL will just be extra indirection when you could optimize the target assembly anyway.

2. The only reason to do this would be for an academic research project. If you feel you are an academic, feel free to try doing it, but that is not the goal of most or all existing emulation projects.

3. Inline assembly is almost never worth it. rpcs3 already uses SSE intrinsics on some platforms. Assembly just convolutes the code while simultaneously making it unportable, for extremely little to no benefit whatsoever. You will have a much better time increasing cache coherence or, even better, fixing the larger architectural or implementation issues (like the memory manager as Bigpet points out.)

Thanks for the write-up. Anything else in that head of yours? :P

Triple1Truth · 05-11-2014, 10:10 PM

But building your own recompiler from scratch, like PCSX2 did, seems really, extremely legit. In some cases, even though the PS2 is different from the Wii, PCSX2 runs better on most AMD CPUs than Dolphin does. And this time, AMD can really be helpful with their AMD FX 6300 / 8350. The Intel Core i7-4960X is a 6-core CPU, and developers won't just go around saying "Go get an i7-4960X with a GeForce GTX 750 / Radeon HD 6850 (something that possibly won't bottleneck the i7-4960X)" to people with $500-$600 budgets.

***hlide*** · 05-11-2014, 10:41 PM

PS2 has several dedicated cores, so I guess PCSX2 is exploiting the cores of a PC to make the emulator run faster. There is a small difference, though. There is a huge gap between the CPU frequencies of the PS2 and PS3, and the PS3 can be considered as having 7/8 real cores (it is not clear whether the 7th SPU can be used by a game), whereas AMD FX and i7 (4-core) only offer 4 real physical cores (8 logical processors which cannot fully run in parallel).

***[Unknown]*** · 05-13-2014, 07:54 AM

To be sure, there are i7's with 6/12 cores:
http://ark.intel.com/products/63696/Inte...o-3_90-GHz

Something I read indicated that the Cell PPU supported basically hyperthreading. And I will say, hyperthreading does work. For example, the uber slow softgpu in ppsspp runs much faster on my i7 with 8 threads than with 4, but worse with more than 8. They're not as good as two real cores, but they're way better than just one.

IMHO, writing assembly for certain routines is sometimes a very good idea. Some reasons:
* Hand coded assembly does not need to conform to the parameter passing ABI. This can have a large impact on performance in tight code (in cases where inlining might even hurt performance.)
* Optimizers / compilers are sometimes stupid. This is more relevant when targeting ARM / etc.
* If generated at runtime, it can allow you to more conveniently use/not use features based on the host CPU.
* It will not bloat the executable nearly as much as using a ton of templates would.

For example, the vertexdecoder jit in ppsspp gave great performance gains on both x86 and ARM, but it's not a recompiler - it just generates assembly instead of calling an array of functions, and ignores the C++ ABI.

That said, it's often not a good idea unless you've tried everything else first (especially for portability reasons.) A much smarter thing is to look at the assembly that is being produced, and first try to understand and resolve poor codegen by the compiler. For example, MSVC iirc will not optimize accesses to a member variable. You will often get better performance by doing this (only for hot loops):

const int x = m_x;
// tight loop
m_x = x;

than by using m_x directly inside the loop. This doesn't require writing assembly to figure out, and if you think about multiple threads you might even realize why the compiler can't safely optimize it.
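
A fuller version of the snippet above might look like the following sketch. The `Accumulator` class and its method names are made up for illustration, and the local is left non-const because this loop modifies it. Whether a given compiler really fails to keep the member in a register is something to confirm by reading the generated assembly or by measuring, not something this example proves.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical class used only to illustrate the pattern.
class Accumulator {
public:
    // Straightforward version: the member is read and written on every
    // iteration unless the optimizer can prove that is unnecessary.
    void add_all_member(const std::vector<int>& values) {
        for (std::size_t i = 0; i < values.size(); ++i) {
            m_x += values[i];
        }
    }

    // Hoisted version: copy the member into a local, work on the local
    // inside the hot loop, and write it back once at the end.
    void add_all_local(const std::vector<int>& values) {
        int x = m_x;
        for (std::size_t i = 0; i < values.size(); ++i) {
            x += values[i];
        }
        m_x = x;
    }

    int value() const { return m_x; }

private:
    int m_x = 0;
};

// Runs both variants on the same input and checks that they agree.
int sum_both_ways(const std::vector<int>& values) {
    Accumulator a;
    Accumulator b;
    a.add_all_member(values);
    b.add_all_local(values);
    assert(a.value() == b.value());
    return a.value();
}
```

The two methods are only interchangeable when nothing else reads or writes `m_x` while the loop runs, which is exactly the guarantee the programmer can give and the compiler often cannot.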

Anyway, moving things more low level would help in some areas for sure. Some things are very unnecessarily abstracted. The more I look, the less I know where to start to improve things.

Bigpet: to be sure, there are multiple areas. Even in my lazy approach to mapping memory, which did help a little, performance did not change much because it became dominated by the PPU interpreter (and its X vtable lookups, breakpoint checks, and thread status checks per single CPU instruction.) There are definitely multiple areas which are slow right now.
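
As a rough illustration of that per-instruction overhead, the sketch below compares polling the thread status before every instruction with polling once per block of instructions. Everything here (`CpuThread`, `Instr`, the function names) is hypothetical and far simpler than the real PPU interpreter; a `switch` stands in for the dispatch, and the `checks` counter exists only to make the difference visible.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, heavily simplified CPU thread state.
struct CpuThread {
    std::int64_t acc = 0;         // single accumulator register
    bool stop_requested = false;  // stands in for breakpoint/status flags
    std::size_t checks = 0;       // how many times the status was polled
};

enum class Op : std::uint8_t { Add, Sub };

struct Instr {
    Op op;
    std::int32_t imm;
};

// Executes one instruction with a plain switch (no virtual call).
inline void execute(CpuThread& t, const Instr& in) {
    switch (in.op) {
        case Op::Add: t.acc += in.imm; break;
        case Op::Sub: t.acc -= in.imm; break;
    }
}

// Polls the status before every single instruction.
void run_checked_per_instruction(CpuThread& t, const std::vector<Instr>& code) {
    for (const Instr& in : code) {
        ++t.checks;
        if (t.stop_requested) return;
        execute(t, in);
    }
}

// Polls the status once per block of `block_size` instructions.
void run_checked_per_block(CpuThread& t, const std::vector<Instr>& code,
                           std::size_t block_size) {
    assert(block_size > 0);
    for (std::size_t start = 0; start < code.size(); start += block_size) {
        ++t.checks;
        if (t.stop_requested) return;
        const std::size_t end = std::min(code.size(), start + block_size);
        for (std::size_t i = start; i < end; ++i) {
            execute(t, code[i]);
        }
    }
}
```

The trade-off is latency: with per-block checks, a stop request or breakpoint can be noticed up to `block_size` instructions late, so a real emulator would still need a precise path for debugging.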

-[Unknown]
