FS#6689 - reduce hash collisions in CHashTableT

Attached to Project: OpenTTD
Opened by George Koehler (kernigh2) - Friday, 23 March 2018, 18:07 GMT
Type Patch
Category Vehicles → YAPF
Status New
Assigned To No-one
Operating System All
Severity Low
Priority Normal
Reported Version trunk
Due in Version Undecided
Due Date Undecided
Percent Complete 0%
Votes 0
Private No


YAPF for ships uses a hash table of closed nodes with 4096 buckets. If a ship is travelling in a straight line along tiles with the same x-coordinate, then the table tends to have hash collisions. Here's an example of a hash with 4084 items, where 3232 (out of 4096) buckets are empty. (Buckets are also known as slots.)

(trunk r27993)
3232 buckets with 0 items
69 buckets with 1 items
93 buckets with 2 items
69 buckets with 3 items
104 buckets with 4 items
175 buckets with 5 items
206 buckets with 6 items
103 buckets with 7 items
32 buckets with 8 items
12 buckets with 9 items
1 buckets with 10 items
4084 items total

I attach a patch (openttd-hash175.diff) that changes CHashTableT::CalcHash() to use multiplication by prime numbers. With this patch, the same example uses more buckets and has fewer collisions.

(trunk r27993 + openttd-hash175.diff)
1106 buckets with 0 items
1974 buckets with 1 items
940 buckets with 2 items
74 buckets with 3 items
2 buckets with 4 items
4084 items total

My example uses a 256x256 map. Collisions seem to happen because the number of buckets (4096) is a multiple of the map size (256). The prime number seems to reduce collisions because 4096 or 256 is not divisible by a prime.

Here's how trunk picks a bucket. My game allows 90 degree turns, so it uses CYapfNodeKeyExitDir::CalcHash() in src/pathfinder/yapf/yapf_node.hpp. This packs bits in a hash:
- Bits 0..1 are the exit direction.
- Bits 2..9 are the x-coordinate.
- Bits 10..17 are the y-coordinate.
- Bits 18..31 are zero.

CHashTableT::CalcHash() in src/misc/hashtable.hpp folds the bits into a smaller value, then takes the modulo number of slots. Because there are 4096 slots (Thash_bits == 12), this operation is
hash ^= hash >> 24;
hash ^= hash >> 12;
hash &= 4095;

In my example, hash >> 24 is zero, and hash >> 12 extracts 5 bits of the y-coordinate. If several keys have the same x-coordinate, then bits 6..9 are the same in their hashes.

My patch (openttd-hash175.diff) changes the operation in CHashTableT::CalcHash() to
hash -= (hash >> 17); // * 131071 / 131072
hash -= (hash >> 5); // * 31 / 32
hash &= 4095;

This uses the Mersenne prime 31 = (1 << 5) - 1. This is not my idea; 31 appears in the string hashing functions in the one true awk and in java.lang.String. The product 31 * hash is hash << 5 - hash, but the low bits caused collisions, so I drop the 5 low bits using hash - hash >> 5. The previous line using 131071 == (1 << 17) - 1 does almost nothing in my example; I put it there so the hash wouldn't ignore the high bits. I also tried using 127 = (1 << 7) - 1 instead of 31, but 127 seems to cause more collisions than 31.

To count buckets, I attach another patch (openttd-hashdump.diff) to dump the first hash to be destroyed with more than 3000 items. The script dumps to /tmp/hashdump.txt; if you can't write to /tmp, you must edit the path in the code. I attach an awk script (hash.awk) to count the buckets using `awk -f hash.awk /tmp/hashdump.txt`. I attach a scenario (Mardhattan Transport) that triggers a hash dump as soon as it is unpaused.

The reduced collisions (openttd-hash175.diff) don't change YAPF's performance in a significant way; YAPF is still as slow as ever. To test performance with Mardhattan Transport:
1. I open the console and enter `set.yapf.max_search_nodes 99999`. This raises the limit from 10000 so YAPF can find longer paths.
2. I delete all orders using buoys. The ships using buoys all go through Tennton Buoy (near the middle of the canal), so I open the ship list for Tennton Buoy and delete all buoys from their orders.
3. I close the canal by bombing the lock near Senville Coal Mine west of Tennton Buoy. This causes YAPF to route ships along the southwest coast.
4. I open Donningville Ship Depot and clone 18 ships on the Donningville-Grenninghead route. I start all 18 ships at once, and centre my view on 1 of them.
5. After the mass of 18 ships leaves Donningville Docks, the scrolling starts and stops as YAPF runs 18 times whenever the mass enters the next tile.
6. The animation starts to smooth after the mass passes Hendworth Falls Oilfield. This seems to be the same whether or not I'm using openttd-hash175.diff to reduce collisions.
This task depends upon