FS#6469 - Add a 32bpp SSE2 palette animator.

Attached to Project: OpenTTD
Opened by J G Rennison (JGR) - Wednesday, 25 May 2016, 18:55 GMT
Last edited by andythenorth (andythenorth) - Sunday, 03 September 2017, 09:48 GMT
Type Patch
Category Core
Status New
Assigned To No-one
Operating System All
Severity Medium
Priority High
Reported Version trunk
Due in Version Undecided
Due Date Undecided
Percent Complete 0%
Votes 0
Private No


Add a 32bpp SSE2 palette animator. When tested, this was approximately 4x faster than 32bpp-anim's palette animator.
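The following is a minimal sketch of where an SSE2 speedup in palette animation could come from. It is not JGR's actual code. It assumes the 32bpp-anim layout, in which each 16-bit anim-buffer entry holds a palette index in the low byte and a brightness in the high byte, and it assumes that animated colours start at palette index 227. The names `AnimateScalar`, `AnimateSSE2` and `ScaleBrightness` are hypothetical, and `ScaleBrightness` is a simplified stand-in for the real brightness adjustment. The SSE2 version tests eight entries per compare and skips any group that contains no animated colour. It only falls back to the per-pixel path where work is needed. It processes one line at a time, and its aligned load requires each line to start on a 16-byte boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Assumed constant: first animated index in the 8bpp palette.
constexpr uint8_t kPaletteAnimStart = 227;

// Hypothetical, simplified brightness adjustment: scales each colour channel by
// brightness / 128 and saturates at 255. Not the real 32bpp AdjustBrightness().
static uint32_t ScaleBrightness(uint32_t argb, uint8_t brightness)
{
	uint32_t out = argb & 0xFF000000u;  // keep alpha unchanged
	for (int shift = 0; shift < 24; shift += 8) {
		uint32_t c = ((argb >> shift) & 0xFF) * brightness / 128;
		out |= (c > 255 ? 255 : c) << shift;
	}
	return out;
}

// Scalar reference for one line: rewrite every pixel whose palette index is animated.
static void AnimateScalar(const uint16_t *anim, uint32_t *dst, size_t count, const uint32_t palette[256])
{
	for (size_t i = 0; i < count; i++) {
		uint8_t colour = anim[i] & 0xFF;  // low byte: palette index
		if (colour >= kPaletteAnimStart) dst[i] = ScaleBrightness(palette[colour], anim[i] >> 8);
	}
}

// SSE2 sketch for one line: test 8 entries at once and skip groups with no
// animated colour, which is the common case on most screens.
static void AnimateSSE2(const uint16_t *anim, uint32_t *dst, size_t count, const uint32_t palette[256])
{
	// _mm_load_si128 requires a 16-byte aligned address.
	assert(reinterpret_cast<uintptr_t>(anim) % 16 == 0);
	const __m128i low_byte = _mm_set1_epi16(0x00FF);
	const __m128i threshold = _mm_set1_epi16(kPaletteAnimStart - 1);
	size_t i = 0;
	for (; i + 8 <= count; i += 8) {
		__m128i v = _mm_load_si128(reinterpret_cast<const __m128i *>(anim + i));
		__m128i idx = _mm_and_si128(v, low_byte);           // palette indices 0..255
		__m128i animated = _mm_cmpgt_epi16(idx, threshold); // idx >= kPaletteAnimStart
		if (_mm_movemask_epi8(animated) == 0) continue;     // nothing to update in this group
		AnimateScalar(anim + i, dst + i, 8, palette);       // rare path: per-pixel update
	}
	AnimateScalar(anim + i, dst + i, count - i, palette);   // leftover tail (< 8 entries)
}
```

How much this kind of skipping gains depends on how few on-screen pixels use animated colours. The sketch only illustrates the idea and makes no claim to reproduce the ~4x figure reported above.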

Create a new blitter mode: 32bpp-sse2-anim, which is 32bpp-anim + this palette animator.
32bpp-sse2-anim is now used by default where 32bpp-anim would have been.
Also use this palette animator with the 32bpp-sse4-anim blitter.
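Here is a rough sketch of the blitter relationships described above, under a simplified class hierarchy. The structs `AnimBlitter`, `SSE2AnimBlitter`, `SSE4AnimBlitter` and `CpuFeatures` and the function `SelectAnimBlitter` are hypothetical. They do not reflect OpenTTD's real `Blitter` classes or the selection code in `gfxinit.cpp`. The sketch shows 32bpp-sse2-anim as 32bpp-anim with a different palette animator, and 32bpp-sse4-anim reusing that animator through inheritance. It also shows the SSE2 variant being picked wherever 32bpp-anim would otherwise have been the default.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for the plain 32bpp-anim blitter.
struct AnimBlitter {
	virtual ~AnimBlitter() = default;
	virtual std::string Name() const { return "32bpp-anim"; }
	virtual std::string Animator() const { return "scalar"; }
};

// 32bpp-anim plus the SSE2 palette animator.
struct SSE2AnimBlitter : AnimBlitter {
	std::string Name() const override { return "32bpp-sse2-anim"; }
	std::string Animator() const override { return "sse2"; }
};

// The SSE4 animated blitter inherits the SSE2 palette animator.
struct SSE4AnimBlitter : SSE2AnimBlitter {
	std::string Name() const override { return "32bpp-sse4-anim"; }
};

struct CpuFeatures {
	bool sse2;
	bool sse41;
};

// Where 32bpp-anim used to be the default, prefer an SSE variant if the CPU allows it.
std::unique_ptr<AnimBlitter> SelectAnimBlitter(const CpuFeatures &cpu)
{
	assert(!cpu.sse41 || cpu.sse2);  // any SSE4.1 CPU also supports SSE2
	if (cpu.sse41) return std::make_unique<SSE4AnimBlitter>();
	if (cpu.sse2) return std::make_unique<SSE2AnimBlitter>();
	return std::make_unique<AnimBlitter>();
}
```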

This changes the alignment requirement of the palette animation buffer and of each line within it, which is the reason for the buffer offset changes in other parts of the blitters.
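This is a minimal sketch of what a per-line alignment requirement can mean for buffer offsets. It assumes a 16-byte requirement (the width of an SSE2 register) and a buffer of 16-bit entries. `AnimBuffer`, `AlignedPitch` and `kAlign` are hypothetical names. Each line's length is rounded up to a multiple of eight entries, so the pitch can be larger than the width. Any code that computed an offset as `y * width + x` then has to use `y * pitch + x` instead. The sketch uses C++17 `std::aligned_alloc`, which MSVC does not provide.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

constexpr size_t kAlign = 16;                                  // assumed SSE2 alignment in bytes
constexpr size_t kEntriesPerAlign = kAlign / sizeof(uint16_t); // 8 entries per 16 bytes

// Round a line width (in entries) up so every line starts on a 16-byte boundary.
size_t AlignedPitch(size_t width)
{
	return (width + kEntriesPerAlign - 1) / kEntriesPerAlign * kEntriesPerAlign;
}

struct AnimBuffer {
	uint16_t *data = nullptr;
	size_t width = 0, height = 0, pitch = 0;  // pitch is in entries; pitch >= width

	AnimBuffer() = default;
	AnimBuffer(const AnimBuffer &) = delete;             // owns raw memory: no copies
	AnimBuffer &operator=(const AnimBuffer &) = delete;
	~AnimBuffer() { std::free(data); }

	void Allocate(size_t w, size_t h)
	{
		std::free(data);
		width = w;
		height = h;
		pitch = AlignedPitch(w);
		// aligned_alloc requires the size to be a multiple of the alignment;
		// pitch * sizeof(uint16_t) is a multiple of 16, so this holds.
		data = static_cast<uint16_t *>(std::aligned_alloc(kAlign, pitch * height * sizeof(uint16_t)));
		assert(data != nullptr || pitch * height == 0);
	}

	// Offset of pixel (x, y): lines are pitch entries apart, not width entries.
	size_t Offset(size_t x, size_t y) const { return y * pitch + x; }
};
```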

This does not change rendering or any other non-palette-animation blitter functionality.

Comment by andythenorth (andythenorth) - Saturday, 02 September 2017, 07:11 GMT
Fails to apply to r27908. I didn't paste all the .rej contents; let me know if you want them.

openttd-trunk(master)$ curl /task/6469/getfile/10530/32bpp-anim-sse2-palette-animator.diff | patch -p0
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 20416 100 20416 0 0 106k 0 --:--:-- --:--:-- --:--:-- 106k
patching file source.list
patching file src/blitter/32bpp_anim.cpp
Hunk #2 FAILED at 39.
Hunk #3 FAILED at 281.
Hunk #4 FAILED at 292.
Hunk #5 FAILED at 305.
Hunk #6 FAILED at 319.
Hunk #7 FAILED at 333.
Hunk #8 FAILED at 347.
Hunk #9 FAILED at 357.
Hunk #10 FAILED at 370.
Hunk #11 FAILED at 401.
Hunk #12 FAILED at 410.
Hunk #13 FAILED at 422.
Hunk #14 FAILED at 436.
Hunk #15 FAILED at 457.
Hunk #16 FAILED at 484.
Hunk #17 FAILED at 515.
16 out of 17 hunks FAILED -- saving rejects to file src/blitter/32bpp_anim.cpp.rej
patching file src/blitter/32bpp_anim.hpp
Hunk #1 FAILED at 18.
Hunk #2 succeeded at 62 (offset 4 lines).
1 out of 2 hunks FAILED -- saving rejects to file src/blitter/32bpp_anim.hpp.rej
patching file src/blitter/32bpp_anim_sse2.cpp
patching file src/blitter/32bpp_anim_sse2.hpp
patching file src/blitter/32bpp_anim_sse4.cpp
Hunk #1 FAILED at 35.
Hunk #2 FAILED at 353.
2 out of 2 hunks FAILED -- saving rejects to file src/blitter/32bpp_anim_sse4.cpp.rej
patching file src/blitter/32bpp_anim_sse4.hpp
patching file src/blitter/32bpp_base.cpp
patching file src/blitter/32bpp_base.hpp
patching file src/blitter/32bpp_sse_func.hpp
patching file src/gfxinit.cpp
Hunk #1 succeeded at 284 (offset 18 lines).
patching file src/stdafx.h
Hunk #1 succeeded at 535 (offset 19 lines).
Comment by andythenorth (andythenorth) - Sunday, 03 September 2017, 09:44 GMT
I tried this in JGR's patchpack where it's included.

In subjective testing against trunk r27910, this blitter is substantially faster on OS X (judged by fast-forward speed, using the same savegame in both versions).

There might be other performance reasons why JGR PP is faster than trunk, but this blitter change looks worth reviewing.

I suspect this also eliminates or mitigates FS#6546.
Comment by Charles Pigott (LordAro) - Sunday, 03 September 2017, 10:55 GMT
The patch was broken by r27796. This patch completely supersedes that change, though, so reverting r27796 lets this patch apply again.

Git patch attached. The diff is slightly cleaned up, but nothing significant has changed.
Comment by andythenorth (andythenorth) - Sunday, 03 September 2017, 12:01 GMT
Applied LordAro's patch on trunk r27911. The same subjective but obvious speed improvement is visible when using full animation: it is substantially faster on OS X 10.12.6 / 3.3GHz i7.