FS#3789 - Moving m7 to the tile struct

Attached to Project: OpenTTD
Opened by Xander Hoogendoorn (xahodo) - Saturday, 24 April 2010, 09:55 GMT
Last edited by Zdeněk Sojka (SmatZ) - Saturday, 24 April 2010, 13:48 GMT
Type Patch
Category Core
Status Closed
Assigned To No-one
Operating System All
Severity Low
Priority Normal
Reported Version trunk
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No



This patch gets rid of _me and TileExtended.
It moves m7 to the tile struct.

Also some profiling data is attached (with and without this patch). Gprof and a well developed game were used for profiling. For completeness I also included the unprocessed data in addition to the patch.

I created this patch to make life simpler for me when I start on the new map array.
This task depends upon

Closed by  Zdeněk Sojka (SmatZ)
Saturday, 24 April 2010, 13:48 GMT
Reason for closing:  Won't implement
Additional comments about closing:  Decreases performance and increases memory requirements.
Comment by Zdeněk Sojka (SmatZ) - Saturday, 24 April 2010, 11:27 GMT

that not only makes access to data more complicated, but also makes the struct 10 bytes long because of alignment requirements.
Those profiling data are useless, they are not run on the same data/for the same time. You can use "-v null:ticks=500" parameter to run game for 500 ticks without video output. Also keep in mind the compiler can have different decisions about inlining at many places when you change such a low-level thing.
Comment by Remko Bijker (Rubidium) - Saturday, 24 April 2010, 12:45 GMT
There has been a lot of profiling back when m7 was added with all kinds of different ways to add this extra byte; packed, unpacked, wasting 7 full bytes and a separate array. The separate array was always the fastest, primarily because m7 isn't used that often and as such not having to "load" it for reading the other ones means less cache misses and such.

As the map array is accessed like millions of times a second, having fast access is important. However, profiling with e.g. gprof won't show it as a "bottleneck" because the inlined code is spread *all* over the functions.

Just to "prove" my millions: RunTileLoop loops over the map every 256 ticks, giving 4 million TileLoop calls (on 2kx2k) which all use several map accessors, i.e. 15-30 million map accesses wouldn't be odd (just look at clear tiles that have at least 4 map accesses per tile loop and trees have even a few more). 256 ticks is roughly 7.5 seconds. That gives 2-4 million map accesses per second with ONLY one function; no ship/road/rail pathfinding, no path reservation, no signal updates, etc. So even when you can't find it in your profiles, it might be very visible when looking at it from a step futher away.
Comment by Zdeněk Sojka (SmatZ) - Saturday, 24 April 2010, 13:47 GMT
I did a quick test with TIC()/TOC() in RunTileLoop() loop. x86_64-linux, gcc 4.4.3, 1.6GHz (forced) Phenom X4

Without the patch, common results:
# taskset -c 1 nice -n -19 bin/openttd -g /home/smatz/.openttd/save/openttdcoop/PublicServerGame_136_Final.sav -v null:ticks=1000 -s null -m null
dbg: [misc] [RTL] 34385400 [avg: 343.9]
dbg: [misc] [RTL] 34981298 [avg: 349.8]
dbg: [misc] [RTL] 34555169 [avg: 345.6]
dbg: [misc] [RTL] 34567775 [avg: 345.7]
dbg: [misc] [RTL] 34758364 [avg: 347.6]
dbg: [misc] [RTL] 34502906 [avg: 345.0]
dbg: [misc] [RTL] 34594530 [avg: 345.9]
dbg: [misc] [RTL] 34553789 [avg: 345.5]
dbg: [misc] [RTL] 34329414 [avg: 343.3]
dbg: [misc] [RTL] 34821060 [avg: 348.2]
dbg: [misc] [RTL] 34455456 [avg: 344.6]
dbg: [misc] [RTL] 34452131 [avg: 344.5]
dbg: [misc] [RTL] 34634198 [avg: 346.3]
dbg: [misc] [RTL] 34689166 [avg: 346.9]
dbg: [misc] [RTL] 34883137 [avg: 348.8]
dbg: [misc] [RTL] 34260323 [avg: 342.6]
dbg: [misc] [RTL] 34377560 [avg: 343.8]
dbg: [misc] [RTL] 34246259 [avg: 342.5]
dbg: [misc] [RTL] 34774051 [avg: 347.7]
dbg: [misc] [RTL] 34668791 [avg: 346.7]

With the patch, common results:
# taskset -c 1 nice -n -19 bin/openttd -g /home/smatz/.openttd/save/openttdcoop/PublicServerGame_136_Final.sav -v null:ticks=1000 -s null -m null
dbg: [misc] [RTL] 37031031 [avg: 370.3]
dbg: [misc] [RTL] 37621455 [avg: 376.2]
dbg: [misc] [RTL] 37102894 [avg: 371.0]
dbg: [misc] [RTL] 37262759 [avg: 372.6]
dbg: [misc] [RTL] 37515277 [avg: 375.2]
dbg: [misc] [RTL] 37085875 [avg: 370.9]
dbg: [misc] [RTL] 37254597 [avg: 372.5]
dbg: [misc] [RTL] 37130335 [avg: 371.3]
dbg: [misc] [RTL] 36965037 [avg: 369.7]
dbg: [misc] [RTL] 37504985 [avg: 375.0]
dbg: [misc] [RTL] 37012636 [avg: 370.1]
dbg: [misc] [RTL] 37102777 [avg: 371.0]
dbg: [misc] [RTL] 37168996 [avg: 371.7]
dbg: [misc] [RTL] 37231533 [avg: 372.3]
dbg: [misc] [RTL] 37471573 [avg: 374.7]
dbg: [misc] [RTL] 36828152 [avg: 368.3]
dbg: [misc] [RTL] 36829345 [avg: 368.3]
dbg: [misc] [RTL] 36729934 [avg: 367.3]
dbg: [misc] [RTL] 37148927 [avg: 371.5]
dbg: [misc] [RTL] 37115159 [avg: 371.2]

~7% performance decrease is a lot.