FS#5867 - [Windows] Critical Crash when selecting zBase

Attached to Project: OpenTTD
Opened by Memman (memman32) - Sunday, 19 January 2014, 02:12 GMT
Last edited by fonsinchen (fonsinchen) - Sunday, 23 February 2014, 14:32 GMT
Type Bug
Category Core
Status Closed
Assigned To No-one
Operating System Windows
Severity Critical
Priority Normal
Reported Version 1.4.0-beta2
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No


Switched to zBase before starting a game. Once I x'd out, the game immediately crashed. I also manually loaded the latest version of zBase, but it still crashed. No issues in 1.33, but I may go back and try beta 1 to see if it works there.
This task depends upon

Closed by  fonsinchen (fonsinchen)
Sunday, 23 February 2014, 14:32 GMT
Reason for closing:  Fixed
Additional comments about closing:  In r26365
Comment by frosch (frosch) - Sunday, 19 January 2014, 11:40 GMT
Information from crash.dmp:
00000000`001db840 00000001`3ff304cc : 00000000`00000002 00000000`001db990 00000000`00000000 00000000`00000000 : openttd!Blitter_32bppSSE4_Anim::Draw+0xcfa [c:\bamboo-agent-home\xml-data\build-dir\ottd-rls-w64bit\src\blitter\32bpp_anim_sse4.cpp @ 379]
00000000`001db960 00000001`3ff2edd1 : 00000000`00000000 00000000`0000013d 00000000`00000018 00000001`00000000 : openttd!GfxBlitter<1,1>+0x448 [c:\bamboo-agent-home\xml-data\build-dir\ottd-rls-w64bit\src\gfx.cpp @ 947]
00000000`001dba00 00000001`3ff2f4ec : 00000001`40b801b8 00000000`00000000 00000000`00000000 00000000`0da7bb40 : openttd!DrawSprite+0xb5 [c:\bamboo-agent-home\xml-data\build-dir\ottd-rls-w64bit\src\gfx.cpp @ 818]
00000000`001dba50 00000001`401e67bc : 00000000`00000000 00000000`001dbad9 00000000`04380780 00000000`00000006 : openttd!DrawMouseCursor+0x184 [c:\bamboo-agent-home\xml-data\build-dir\ottd-rls-w64bit\src\gfx.cpp @ 1239]
00000000`001dba90 00000000`77399bd1 : 00000000`00000005 00000000`00000001 00000000`00000000 00000000`00000006 : openttd!WndProcGdi+0x270 [c:\bamboo-agent-home\xml-data\build-dir\ottd-rls-w64bit\src\video\win32_v.cpp @ 916]
00000000`001dbb40 00000000`00000005 : 00000000`00000001 00000000`00000000 00000000`00000006 00000000`00000000 : user32+0x19bd1
00000000`001dbb48 00000000`00000001 : 00000000`00000000 00000000`00000006 00000000`00000000 00000000`00000000 : 0x5
00000000`001dbb50 00000000`00000000 : 00000000`00000006 00000000`00000000 00000000`00000000 00000000`00000001 : 0x1

Most likely duplicate of  FS#5854  and  FS#5855 , and already fixed in nightlies.
Comment by frosch (frosch) - Sunday, 19 January 2014, 11:41 GMT
Can you please check nightlies, whether it works there?
Comment by MJP (MJP) - Monday, 20 January 2014, 22:23 GMT
 FS#5854  and  FS#5855  could not happen on Windows so it must be something else.

I can't reproduce the crash with the OpenTTD 1.4.0 beta 2 64 bits for Windows.
(tried windowed, fullscreen and with the NewGRFs required by crash.sav)

When I switch from OpenGFX to zBase, the only oddity I see is the temporary
"black colour remap" (see black.png, it happens with all 32bpp-anim too).
If the palette is not initialized maybe this is the same for another variable
which could cause the crash if the switch to zBase occurs at the "right" time.
   black.png (257.2 KiB)
Comment by Memman (memman32) - Monday, 20 January 2014, 23:06 GMT
Apparently it did not like the fact I had selected 1600 for my resolution. I maxed the resolution out and it now works fine. Thanks for your help folks. :)
Comment by MJP (MJP) - Friday, 24 January 2014, 04:17 GMT
OK so the crash is trigged by:
- Windows resolution set on 1920*1080
- OpenTTD set on fullscreen, any other resolution and graphicsset = "OpenGFX"
- change from 8bpp to any 32bpp graphics set (so OpenGFX to NightGFX or zBase)

With a game resolution different than the desktop one, after the validation of a 32bpp set, there are 2 resolution changes: in my case, a quick return to 1920*1080 then back to what is specified in openttd.cfg. During the transition, there is a time where graphics seem to be unavailable and any try to draw a sprite leads to a crash.

The fix consists to delay the screen redraw.
Comment by fonsinchen (fonsinchen) - Sunday, 02 February 2014, 20:27 GMT
I can reproduce this.
- Putting another MarkWholeScreenDirty() in there doesn't do anything as we do the same thing already from GameSizeChanged(). Just removing the UpdateWindows() without adding anything already fixes it. I don't know why, though.
- Putting the mutex lock before AllocateDibSection doesn't change anything.
Comment by fonsinchen (fonsinchen) - Sunday, 02 February 2014, 21:20 GMT
It seems the sprite data for the mouse cursor is corrupt when trying to draw it in the way the above stacktrace shows. The exact location of the crash is 32bpp_anim_sse4.cpp:64, where src_rgba_line is invalid. That is because si has an implausibly high sprite_offset. zoom is ZOOM_LVL_OUT_4X or 2 there. sd does have seemingly valid data at offesets 0 and 1. As the sprite is picked from the sprite cache in DrawSprite() (gfx.cpp:806ff), I suspect the spritecache needs some refreshing when we change the bit depth and we're not doing that in time. That doesn't really explain why it doesn't crash if you use your display's native resolution, though.
Comment by MJP (MJP) - Monday, 03 February 2014, 14:14 GMT
There is indeed a better way to fix the problem.
Comment by frosch (frosch) - Monday, 03 February 2014, 19:23 GMT
LoadSpriteTables resets savegame-sensitive NewGRF data. Calling it should break stuff, but calling it shouldn't be needed either.

If the issue is with drawing the mouse: Maybe it needs some Undraw/DrawMouseCursor before/after blitter switch somewhere?
Comment by MJP (MJP) - Tuesday, 04 February 2014, 01:05 GMT
"LoadSpriteTables resets savegame-sensitive NewGRF data." <-- Good to know, this is not obvious.

I tested with only UpdateCursorSize() added at the end of CheckBlitter() and it did not crash.
Comment by MJP (MJP) - Tuesday, 04 February 2014, 13:52 GMT
And what about that?

void CheckBlitter()
if (!SwitchNewGRFBlitter()) return;
GfxInitPalettes(); // fix  FS#5889 
UpdateCursorSize(); // fix  FS#5867 
Comment by fonsinchen (fonsinchen) - Tuesday, 04 February 2014, 19:20 GMT
I'd much rather like to hear an argument why those fixes actually work than more and more unexplained suggestions. This is obviously a thread synchronization issue. The video driver thread seems to access sprites while they're being updated by the main thread. If we accept a fix that just incidentally "works" by changing the thread scheduling a bit then someone will soon trigger the same thing again in a different way. What we need is a fix that makes sure the video driver thread always sees clean data, no matter how it's scheduled.
Comment by fonsinchen (fonsinchen) - Saturday, 08 February 2014, 10:48 GMT
The CheckBlitter fix above might fix  FS#5889  and  FS#5867 . One could argue that when we're clearing the sprite cache we're probably allowed to change the cursor sprites, too. However,  FS#5863  and  FS#5892  are still a mystery then. Those change the blitters during map generation which should not call CheckBlitter but GfxLoadSprites. GfxLoadSprites already does GfxInitPalettes and UpdateCursorSize. Maybe those bugs are different after all.
Comment by fonsinchen (fonsinchen) - Saturday, 08 February 2014, 11:45 GMT
Trying to bisect this I notice that the bug is way older than the CheckBlitter() method ... in fact this is reproducible all the way back to OpenTTD 1.0.0. I didn't try any further because I think it actually has different problems with loading NightGFX back then.
Comment by frosch (frosch) - Saturday, 22 February 2014, 14:29 GMT
For reference: r26360 reverts r25550, which previously tried to fix  FS#5571 , which equals this task.
Comment by frosch (frosch) - Saturday, 22 February 2014, 15:11 GMT
I've compared the Win32 video driver with the other ones.

- Win32 is the only one which does not immediately set _screen.dst_ptr after the old buffer has been destroyed and a new one has been created.
- Win32 is the only one which calls DrawMouseCursor when the cursor enters the window. All others only call UndrawMouseCursor immediately.

The second thing is kind of weird, no idea about that. The first thing looks very likely to me to be the issue of all this.

Attached is a diff for that, though I have no means for testing it.
Comment by frosch (frosch) - Saturday, 22 February 2014, 15:19 GMT
Comparing more of Win32 with SDL leads up to this diff.

ChangeResolution, ToggleFullscreen snd AfterBlitterChange need a (recursive) mutex lock of the backend.

EditBoxLostFocus does not exist for SDL, but is also called from generic OTTD code and interacts with the backend, so it should also require the mutex.
Comment by fonsinchen (fonsinchen) - Saturday, 22 February 2014, 21:17 GMT
The patches don't help. Generally it seems that we should protect calls to MakeWindow by a mutex. Event with the patches there are 2 seemingly unprotected occurences: One in WndProcGdi() and one in Start(). I suspect the occurence in WndProcGdi should be protected as that can spontaneuously called from the window system (if I get that right).
Comment by fonsinchen (fonsinchen) - Saturday, 22 February 2014, 22:01 GMT
Actually it also crashes if the mutex is locked to the thread trying to read the mouse cursor sprite when it crashes. This makes me think that either the update of the sprite cache doesn't lock (which seems improbable) or the sprite is actually not properly updated before we try to use it.
Comment by frosch (frosch) - Saturday, 22 February 2014, 22:05 GMT
I expected WndProcGdi would be called via win32_v.cpp:1249: PeekMessage and DispatchMessage (which hold the _draw_mutex).

If WndProcGdi is called from an entirely different thread than MainLoop is running with, then I have no idea how stuff works at all. GameLoop does not hold any mutex (_draw_mutex is released just before), and WndProcGdi has tons of calls to generic OTTD code (like HandleMouseEvents).
Comment by fonsinchen (fonsinchen) - Saturday, 22 February 2014, 22:56 GMT
The problem seems to be that the 32bit blitters expect the mouse cursor sprite to look different than the 8bit blitters - or possibly the sprite gets deleted/not updated/whatever somewhere. I probably won't find out about that anymore today without going crazy in the process (I really can't understand why people like Visual Studio so much). Anyway, the stacktrace looks like this now:

openttd.exe!Blitter_32bppSSE4_Anim::Draw<0,1,2,1,1>(const Blitter::BlitterParams * bp, ZoomLevel zoom) Zeile 64 + 0x3 Bytes C++
openttd.exe!Blitter_32bppSSE4_Anim::Draw(Blitter::BlitterParams * bp, BlitterMode mode, ZoomLevel zoom) Zeile 363 C++
openttd.exe!GfxBlitter<1,1>(const Sprite * const sprite, int x, int y, BlitterMode mode, const SubSprite * const sub, unsigned int sprite_id, ZoomLevel zoom) Zeile 946 + 0x2c Bytes C++
openttd.exe!GfxMainBlitter(const Sprite * sprite, int x, int y, BlitterMode mode, const SubSprite * sub, unsigned int sprite_id, ZoomLevel zoom) Zeile 957 C++
openttd.exe!DrawSprite(unsigned int img, unsigned int pal, int x, int y, const SubSprite * sub, ZoomLevel zoom) Zeile 818 C++
openttd.exe!DrawMouseCursor() Zeile 1239 C++
openttd.exe!UpdateWindows() Zeile 3029 C++
openttd.exe!ClientSizeChanged(int w, int h) Zeile 201 C++
openttd.exe!WndProcGdi(HWND__ * hwnd, unsigned int msg, unsigned int wParam, long lParam) Zeile 911 C++
[Unten angegebene Rahmen sind möglicherweise nicht korrekt und/oder fehlen, keine Symbole geladen für user32.dll]
openttd.exe!WndProcGdi(HWND__ * hwnd, unsigned int msg, unsigned int wParam, long lParam) Zeile 1027 + 0x18 Bytes C++
openttd.exe!WndProcGdi(HWND__ * hwnd, unsigned int msg, unsigned int wParam, long lParam) Zeile 1019 + 0xc Bytes C++
openttd.exe!VideoDriver_Win32::MakeWindow(bool full_screen) Zeile 266 + 0xe Bytes C++

The mutex is actually locked at that point and it still crashes.
Comment by fonsinchen (fonsinchen) - Saturday, 22 February 2014, 23:03 GMT
Just another really systematic description on how I trigger that:

1. Start OpenTTD in windowed mode with OpenGFX
2. Open game options, set resolution to something that's close to your native resolution, but not quite the right one. In my case I choose 1280x768 because my display's native resolution is 1280x800.
3. Switch to fullscreen (and admire the ugliness)
4. Switch the blitter to zBase
5. Close the game options window.

This crashes the game with the above stacktrace and actually leaves the whole windowing system in a "broken" state where I have a strip of 32 pixels of garbage at the bottom below the task bar. It stays like that until I start openttd again and switch to the native resolution in 8bit fullscreen.
Comment by fonsinchen (fonsinchen) - Saturday, 22 February 2014, 23:10 GMT
In some instances it didn't crash when I set breakpoints in certain places. I cannot remember which places those were. Maybe the sprite updates and the drawing don't really happen in parallel, but rather not in the correct order. That order might be determined by thread scheduling.
Comment by frosch (frosch) - Sunday, 23 February 2014, 11:39 GMT
Is that trace from 22:56 with my patches applied?

In unmodified trunk ClientSizeChanged calles PostResize and GameSizeChanged before assigning _screen.dst_ptr. Which is likely wrong.

The cursor is "removed" via "GameSizeChanged -> ScreenSizeChanged -> _cursor.visible = false", so at least we do not have to worry about UndrawMouseCursor trying to undraw the cursor using the screenbackup from a different blitter.
Comment by fonsinchen (fonsinchen) - Sunday, 23 February 2014, 11:42 GMT
If I haven't royally messed up yesterday then that trace is with your patches and possibly some additional _draw_mutex->BeginCritical(true) somewhere.
Comment by fonsinchen (fonsinchen) - Sunday, 23 February 2014, 13:33 GMT
UpdateWindows() and the DrawMouseCursor() in there is called after AfterBlitterChange() and before the next GameLoop(), which triggers UpdateCursorSize(). So DrawMouseCursor sees inconsistent data about the cursor and crashes. As UpdateWindows() is always called after GameLoop() anyway the idea of just dropping the extra call in ClientSizeChanged() is actually valid. If the resolution isn't "weird" we don't get that call to UpdateWindows() from ClientSizeChanged() and thus evade the crash.