Posts: 10
Threads: 4
Joined: Jun 2010
Reputation:
0
We have tried the Nanodesktop GRAPHDEMO application under the JPCSP
emulator.
It seems that the emulation of the PSP thread priorities
is incorrect.
You can download a copy of GRAPHDEMO for the real PSP and for
JPCSP here:
http://www.megaupload.com/?d=8FY7GZM8
Note: copy the GRAPHDEMO folder to ms0:/ before starting the
application.
Please compare the behaviour of the program on a real PSP
and under the JPCSP emulator. Under JPCSP, the mouse pointer is very,
very slow because the PhoenixMouse thread is emulated too slowly.
Double-clicking the icons doesn't work for the same reason.
Thanks in advance for your support and... Merry Christmas
Posts: 2,420
Threads: 30
Joined: Dec 2009
Reputation:
50
(12-26-2010, 06:21 PM)pegasus2000 Wrote: We have tried the Nanodesktop GRAPHDEMO application under JPCSP
emulator. [...]
Hi!
The PhoenixMouse thread (priority=0x22) is not scheduled as often as on the PSP because the threads "thread1" and "thread2" (priority=0x21) are doing CPU-intensive work (spent in ndHAL_WindowsRender.c). A lower value means a higher priority on the PSP, so these two threads always run first. On a real PSP, there are still some free CPU cycles between two VBLANKs for PhoenixMouse; on Jpcsp, the processing is a little slower and no free cycles are left while thread1/thread2 are running. I tried changing the priority of the PhoenixMouse thread to 0x20, and the mouse responds much better. Would it be a problem to raise the priority of this thread on a real PSP?
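To make the scheduling argument concrete, here is a toy model in C. It is not Jpcsp or PSP kernel code: it simply assumes strict priority scheduling within one VBLANK interval, where the lower priority value runs first. The function name and all cycle numbers are made up for illustration.

```c
/* Toy model, not real scheduler code: cycles the mouse thread receives
   during one frame (the interval between two VBLANKs).
   Assumption: strict priority scheduling, lower value = runs first.
   Equal priorities are not modeled. */
#define RENDER_PRIORITY 0x21   /* thread1/thread2 in the post */

static int min_int(int a, int b) { return a < b ? a : b; }

static int mouse_cycles_per_frame(int frame_budget, int render_wanted,
                                  int mouse_wanted, int mouse_priority)
{
    if (mouse_priority < RENDER_PRIORITY) {
        /* Mouse thread outranks the render threads: it is served first. */
        return min_int(mouse_wanted, frame_budget);
    }
    /* Render threads run first; the mouse thread only gets what is left. */
    int left = frame_budget - min_int(render_wanted, frame_budget);
    return min_int(mouse_wanted, left);
}
```

With a made-up budget of 1000 cycles per frame, a render load of 900 still leaves the 0x22 mouse thread the 50 cycles it wants, while a render load of 1200 leaves it nothing. At priority 0x20 the mouse thread gets its 50 cycles in both cases, which matches the behaviour described above.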
BTW, you would get much better performance (even on a PSP) by using the PSP graphics engine instead of software rendering, e.g. for the blend operations in ndHAL_WindowsRender.c (see sceGuBlendFunc()).
Profiling on Jpcsp shows that around 50% of the CPU cycles are spent in the rendering routines from ndHAL_WindowsRender.c. If switching to the PSP graphics engine is not an option for you, you might have a look at the code generated for ndHAL_WindowsRender.c and see whether it can be optimized.
Here is an example of the generated code doing MathBlend for the X-axis:
Code:
089123C0:[952D004A]: lhu $t5, 74($t1)
089123C4:[95240054]: lhu $a0, 84($t1)
089123C8:[95230048]: lhu $v1, 72($t1)
089123CC:[014D5823]: subu $t3, $t2, $t5
089123D0:[3162FFFF]: andi $v0, $t3, -1
089123D4:[00820018]: mult $a0, $v0
089123D8:[02083821]: addu $a3, $s0, $t0
089123DC:[00072840]: sll $a1, $a3, 0x0001
089123E0:[0103C823]: subu $t9, $t0, $v1
089123E4:[00B13021]: addu $a2, $a1, $s1
089123E8:[94CB0000]: lhu $t3, 0($a2)
089123EC:[3323FFFF]: andi $v1, $t9, -1
089123F0:[00081040]: sll $v0, $t0, 0x0001
089123F4:[7D6508C0]: ext $a1, $t3, 3, 2
089123F8:[7D660A00]: ext $a2, $t3, 8, 2
089123FC:[7D670B40]: ext $a3, $t3, 13, 2
08912400:[004E5821]: addu $t3, $v0, $t6
08912404:[250D0001]: addiu $t5, $t0, 1
08912408:[31A8FFFF]: andi $t0, $t5, -1
0891240C:[0308682B]: sltu $t5, $t8, $t0
08912410:[00002012]: mflo $a0
08912414:[0083C821]: addu $t9, $a0, $v1
08912418:[00192040]: sll $a0, $t9, 0x0001
0891241C:[008F1021]: addu $v0, $a0, $t7
08912420:[94590000]: lhu $t9, 0($v0)
08912424:[7F232280]: ext $v1, $t9, 10, 5
08912428:[3324001F]: andi $a0, $t9, 31
0891242C:[7F222140]: ext $v0, $t9, 5, 5
08912430:[00673821]: addu $a3, $v1, $a3
08912434:[00852021]: addu $a0, $a0, $a1
08912438:[00461021]: addu $v0, $v0, $a2
0891243C:[2CF90020]: sltiu $t9, $a3, 32
08912440:[2C850020]: sltiu $a1, $a0, 32
08912444:[17200002]: bne $t9, $zr, 0x08912450
08912448:[2C460020]: sltiu $a2, $v0, 32
0891244C:[2407001F]: addiu $a3, $zr, 31 <=> li $a3, 31
08912450:[14A00002]: bne $a1, $zr, 0x0891245C
08912454:[00071A80]: sll $v1, $a3, 0x000A
08912458:[2404001F]: addiu $a0, $zr, 31 <=> li $a0, 31
0891245C:[14C00002]: bne $a2, $zr, 0x08912468
08912460:[00641821]: addu $v1, $v1, $a0
08912464:[2402001F]: addiu $v0, $zr, 31 <=> li $v0, 31
08912468:[00023940]: sll $a3, $v0, 0x0005
0891246C:[24E58000]: addiu $a1, $a3, -32768
08912470:[00653021]: addu $a2, $v1, $a1
08912474:[11A0FFD2]: beq $t5, $zr, 0x089123C0
08912478:[A5660000]: sh $a2, 0($t3)
Some pointer calculations could be moved outside the loop (leaving only a pointer increment inside it), and there are also some unnecessary unsigned short to int conversions (e.g. "andi $v0, $t3, -1", which masks the value with 0xFFFF on every iteration).
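As a rough illustration of both points, here is a sketch in C. This is not the Nanodesktop source: the function names, parameters and the exact blend formula are my reading of what the listing seems to do (add the top two bits of each channel of one 5551 pixel to another pixel, clamp each channel to 31, set bit 15). The first loop recomputes every address from unsigned short indices, the second one only increments pointers.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp one 5-bit colour channel to its maximum of 31. */
static unsigned sat5(unsigned v) { return v > 31u ? 31u : v; }

/* Hypothetical blend of two 5551 pixels, modeled on the listing:
   add the top 2 bits of each tint channel to src, saturate, set bit 15. */
static uint16_t blend_pixel(uint16_t src, uint16_t tint)
{
    unsigned r = sat5((src & 31u)         + ((tint >> 3)  & 3u));
    unsigned g = sat5(((src >> 5) & 31u)  + ((tint >> 8)  & 3u));
    unsigned b = sat5(((src >> 10) & 31u) + ((tint >> 13) & 3u));
    return (uint16_t)(0x8000u | (b << 10) | (g << 5) | r);
}

/* Index-based variant: unsigned short counters force a 16-bit wrap
   after each increment, and each address is rebuilt from base + index. */
static void blend_row_indexed(uint16_t *dst, const uint16_t *src,
                              const uint16_t *tint,
                              unsigned short start, unsigned short width)
{
    for (unsigned short x = 0; x < width; x++)
        dst[start + x] = blend_pixel(src[start + x], tint[start + x]);
}

/* Pointer-based variant: the caller offsets the pointers once, and the
   loop body only increments them. */
static void blend_row_pointer(uint16_t *dst, const uint16_t *src,
                              const uint16_t *tint, size_t width)
{
    const uint16_t *end = src + width;
    while (src < end)
        *dst++ = blend_pixel(*src++, *tint++);
}
```

Both variants produce the same pixels. Whether the second one is actually faster depends on the compiler and optimization level, so it is worth comparing the generated code again after such a change.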
Merry Christmas!
Posts: 10
Threads: 4
Joined: Jun 2010
Reputation:
0
(12-27-2010, 05:05 PM)gid15 Wrote: [...] I've tried to change the priority of the PhoenixMouse thread to 0x20 and the mouse is responding much better. Is it a problem to increase the priority of this thread on a real PSP? [...]
Thanks for your support.
Changing the priority of the PhoenixMouse thread is no problem:
I can simply change the priority for the JPCSP HAL and keep it
unchanged for the other HALs.
Thanks for the advice on MathBlend: I'll check whether an optimization can
be done.
I have a proposal for you: we are planning to set up an SVN repository for
the upcoming Nanodesktop 0.5 source code. I could add you to the
list of authorized members if you agree.
That way, you could submit patches for our source code and we could make
the integration between JPCSP and Nanodesktop even closer.
What do you think about our idea?