Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Windows] Critical Crash when selecting zBase #5867

Closed
DorpsGek opened this issue Jan 19, 2014 · 27 comments
Closed

[Windows] Critical Crash when selecting zBase #5867

DorpsGek opened this issue Jan 19, 2014 · 27 comments
Labels
flyspray This issue is imported from FlySpray (https://bugs.openttd.org/)

Comments

@DorpsGek
Copy link
Member

memman32 opened the ticket and wrote:

Switched to zBase before starting a game. Once I x'd out, the game immediately crashed. I also manually loaded the latest version of zBase, but it still crashed. No issues in 1.33, but I may go back and try beta 1 to see if it works there.

Attachments

Reported version: 1.4.0-beta2
Operating system: Windows


This issue was imported from FlySpray: https://bugs.openttd.org/task/5867
@DorpsGek
Copy link
Member Author

frosch wrote:

Information from crash.dmp:
00000000`001db840 00000001`3ff304cc : 00000000`00000002 00000000`001db990 00000000`00000000 00000000`00000000 : openttd!Blitter_32bppSSE4_Anim::Draw+0xcfa [c:\bamboo-agent-home\xml-data\build-dir\ottd-rls-w64bit\src\blitter\32bpp_anim_sse4.cpp @ 379]
00000000`001db960 00000001`3ff2edd1 : 00000000`00000000 00000000`0000013d 00000000`00000018 00000001`00000000 : openttd!GfxBlitter<1,1>+0x448 [c:\bamboo-agent-home\xml-data\build-dir\ottd-rls-w64bit\src\gfx.cpp @ 947]
00000000`001dba00 00000001`3ff2f4ec : 00000001`40b801b8 00000000`00000000 00000000`00000000 00000000`0da7bb40 : openttd!DrawSprite+0xb5 [c:\bamboo-agent-home\xml-data\build-dir\ottd-rls-w64bit\src\gfx.cpp @ 818]
00000000`001dba50 00000001`401e67bc : 00000000`00000000 00000000`001dbad9 00000000`04380780 00000000`00000006 : openttd!DrawMouseCursor+0x184 [c:\bamboo-agent-home\xml-data\build-dir\ottd-rls-w64bit\src\gfx.cpp @ 1239]
00000000`001dba90 00000000`77399bd1 : 00000000`00000005 00000000`00000001 00000000`00000000 00000000`00000006 : openttd!WndProcGdi+0x270 [c:\bamboo-agent-home\xml-data\build-dir\ottd-rls-w64bit\src\video\win32_v.cpp @ 916]
00000000`001dbb40 00000000`00000005 : 00000000`00000001 00000000`00000000 00000000`00000006 00000000`00000000 : user32+0x19bd1
00000000`001dbb48 00000000`00000001 : 00000000`00000000 00000000`00000006 00000000`00000000 00000000`00000000 : 0x5
00000000`001dbb50 00000000`00000000 : 00000000`00000006 00000000`00000000 00000000`00000000 00000000`00000001 : 0x1

Most likely duplicate of #5854 and #5855, and already fixed in nightlies.


This comment was imported from FlySpray: https://bugs.openttd.org/task/5867#comment12928

@DorpsGek
Copy link
Member Author

frosch wrote:

Can you please check nightlies, whether it works there?


This comment was imported from FlySpray: https://bugs.openttd.org/task/5867#comment12929

@DorpsGek
Copy link
Member Author

MJP wrote:

#5854 and #5855 could not happen on Windows so it must be something else.

I can't reproduce the crash with the OpenTTD 1.4.0 beta 2 64 bits for Windows.
(tried windowed, fullscreen and with the NewGRFs required by crash.sav)

When I switch from OpenGFX to zBase, the only oddity I see is the temporary
"black colour remap" (see black.png, it happens with all 32bpp-anim too).
If the palette is not initialized maybe this is the same for another variable
which could cause the crash if the switch to zBase occurs at the "right" time.

Attachments


This comment was imported from FlySpray: https://bugs.openttd.org/task/5867#comment12935

@DorpsGek
Copy link
Member Author

memman32 wrote:

Apparently it did not like the fact I had selected 1600 for my resolution. I maxed the resolution out and it now works fine. Thanks for your help folks. :)


This comment was imported from FlySpray: https://bugs.openttd.org/task/5867#comment12937

@DorpsGek
Copy link
Member Author

MJP wrote:

OK so the crash is trigged by:
- Windows resolution set on 1920*1080
- OpenTTD set on fullscreen, any other resolution and graphicsset = "OpenGFX"
- change from 8bpp to any 32bpp graphics set (so OpenGFX to NightGFX or zBase)

With a game resolution different than the desktop one, after the validation of a 32bpp set, there are 2 resolution changes: in my case, a quick return to 1920*1080 then back to what is specified in openttd.cfg. During the transition, there is a time where graphics seem to be unavailable and any try to draw a sprite leads to a crash.

The fix consists to delay the screen redraw.

Attachments


This comment was imported from FlySpray: https://bugs.openttd.org/task/5867#comment12939

@DorpsGek
Copy link
Member Author

DorpsGek commented Feb 2, 2014

fonsinchen wrote:

I can reproduce this.
- Putting another MarkWholeScreenDirty() in there doesn't do anything as we do the same thing already from GameSizeChanged(). Just removing the UpdateWindows() without adding anything already fixes it. I don't know why, though.
- Putting the mutex lock before AllocateDibSection doesn't change anything.


This comment was imported from FlySpray: https://bugs.openttd.org/task/5867#comment13001

@DorpsGek
Copy link
Member Author

DorpsGek commented Feb 2, 2014

fonsinchen wrote:

It seems the sprite data for the mouse cursor is corrupt when trying to draw it in the way the above stacktrace shows. The exact location of the crash is 32bpp_anim_sse4.cpp:64, where src_rgba_line is invalid. That is because si has an implausibly high sprite_offset. zoom is ZOOM_LVL_OUT_4X or 2 there. sd does have seemingly valid data at offesets 0 and 1. As the sprite is picked from the sprite cache in DrawSprite() (gfx.cpp:806ff), I suspect the spritecache needs some refreshing when we change the bit depth and we're not doing that in time. That doesn't really explain why it doesn't crash if you use your display's native resolution, though.


This comment was imported from FlySpray: https://bugs.openttd.org/task/5867#comment13002

@DorpsGek
Copy link
Member Author

DorpsGek commented Feb 3, 2014

MJP wrote:

There is indeed a better way to fix the problem.

Attachments


This comment was imported from FlySpray: https://bugs.openttd.org/task/5867#comment13005

@DorpsGek
Copy link
Member Author

DorpsGek commented Feb 3, 2014

frosch wrote:

LoadSpriteTables resets savegame-sensitive NewGRF data. Calling it should break stuff, but calling it shouldn't be needed either.

If the issue is with drawing the mouse: Maybe it needs some Undraw/DrawMouseCursor before/after blitter switch somewhere?


This comment was imported from FlySpray: https://bugs.openttd.org/task/5867#comment13009

@DorpsGek
Copy link
Member Author

DorpsGek commented Feb 4, 2014

MJP wrote:

"LoadSpriteTables resets savegame-sensitive NewGRF data." <-- Good to know, this is not obvious.

I tested with only UpdateCursorSize() added at the end of CheckBlitter() and it did not crash.


This comment was imported from FlySpray: https://bugs.openttd.org/task/5867#comment13010

@DorpsGek
Copy link
Member Author

DorpsGek commented Feb 4, 2014

MJP wrote:

And what about that?

void CheckBlitter()
{
if (!SwitchNewGRFBlitter()) return;
ClearFontCache();
GfxClearSpriteCache();
GfxInitPalettes(); // fix #5889
UpdateCursorSize(); // fix #5867
}


This comment was imported from FlySpray: https://bugs.openttd.org/task/5867#comment13011

@DorpsGek
Copy link
Member Author

DorpsGek commented Feb 4, 2014

fonsinchen wrote:

I'd much rather like to hear an argument why those fixes actually work than more and more unexplained suggestions. This is obviously a thread synchronization issue. The video driver thread seems to access sprites while they're being updated by the main thread. If we accept a fix that just incidentally "works" by changing the thread scheduling a bit then someone will soon trigger the same thing again in a different way. What we need is a fix that makes sure the video driver thread always sees clean data, no matter how it's scheduled.


This comment was imported from FlySpray: https://bugs.openttd.org/task/5867#comment13012

@DorpsGek
Copy link
Member Author

DorpsGek commented Feb 8, 2014

fonsinchen wrote:

The CheckBlitter fix above might fix #5889 and #5867. One could argue that when we're clearing the sprite cache we're probably allowed to change the cursor sprites, too. However, #5863 and #5892 are still a mystery then. Those change the blitters during map generation which should not call CheckBlitter but GfxLoadSprites. GfxLoadSprites already does GfxInitPalettes and UpdateCursorSize. Maybe those bugs are different after all.


This comment was imported from FlySpray: https://bugs.openttd.org/task/5867#comment13027

@DorpsGek
Copy link
Member Author

DorpsGek commented Feb 8, 2014

fonsinchen wrote:

Trying to bisect this I notice that the bug is way older than the CheckBlitter() method ... in fact this is reproducible all the way back to OpenTTD 1.0.0. I didn't try any further because I think it actually has different problems with loading NightGFX back then.


This comment was imported from FlySpray: https://bugs.openttd.org/task/5867#comment13028

@DorpsGek
Copy link
Member Author

frosch wrote:

For reference: r26360 reverts r25550, which previously tried to fix #5571, which equals this task.


This comment was imported from FlySpray: https://bugs.openttd.org/task/5867#comment13093

@DorpsGek
Copy link
Member Author

frosch wrote:

I've compared the Win32 video driver with the other ones.

- Win32 is the only one which does not immediately set _screen.dst_ptr after the old buffer has been destroyed and a new one has been created.
- Win32 is the only one which calls DrawMouseCursor when the cursor enters the window. All others only call UndrawMouseCursor immediately.

The second thing is kind of weird, no idea about that. The first thing looks very likely to me to be the issue of all this.

Attached is a diff for that, though I have no means for testing it.

Attachments


This comment was imported from FlySpray: https://bugs.openttd.org/task/5867#comment13094

@DorpsGek
Copy link
Member Author

frosch wrote:

Comparing more of Win32 with SDL leads up to this diff.

ChangeResolution, ToggleFullscreen snd AfterBlitterChange need a (recursive) mutex lock of the backend.

EditBoxLostFocus does not exist for SDL, but is also called from generic OTTD code and interacts with the backend, so it should also require the mutex.

Attachments


This comment was imported from FlySpray: https://bugs.openttd.org/task/5867#comment13095

@DorpsGek
Copy link
Member Author

fonsinchen wrote:

The patches don't help. Generally it seems that we should protect calls to MakeWindow by a mutex. Event with the patches there are 2 seemingly unprotected occurences: One in WndProcGdi() and one in Start(). I suspect the occurence in WndProcGdi should be protected as that can spontaneuously called from the window system (if I get that right).


This comment was imported from FlySpray: https://bugs.openttd.org/task/5867#comment13097

@DorpsGek
Copy link
Member Author

fonsinchen wrote:

Actually it also crashes if the mutex is locked to the thread trying to read the mouse cursor sprite when it crashes. This makes me think that either the update of the sprite cache doesn't lock (which seems improbable) or the sprite is actually not properly updated before we try to use it.


This comment was imported from FlySpray: https://bugs.openttd.org/task/5867#comment13098

@DorpsGek
Copy link
Member Author

frosch wrote:

I expected WndProcGdi would be called via win32_v.cpp:1249: PeekMessage and DispatchMessage (which hold the _draw_mutex).

If WndProcGdi is called from an entirely different thread than MainLoop is running with, then I have no idea how stuff works at all. GameLoop does not hold any mutex (_draw_mutex is released just before), and WndProcGdi has tons of calls to generic OTTD code (like HandleMouseEvents).


This comment was imported from FlySpray: https://bugs.openttd.org/task/5867#comment13099

@DorpsGek
Copy link
Member Author

fonsinchen wrote:

The problem seems to be that the 32bit blitters expect the mouse cursor sprite to look different than the 8bit blitters - or possibly the sprite gets deleted/not updated/whatever somewhere. I probably won't find out about that anymore today without going crazy in the process (I really can't understand why people like Visual Studio so much). Anyway, the stacktrace looks like this now:

openttd.exe!Blitter_32bppSSE4_Anim::Draw<0,1,2,1,1>(const Blitter::BlitterParams * bp, ZoomLevel zoom) Zeile 64 + 0x3 Bytes C++
openttd.exe!Blitter_32bppSSE4_Anim::Draw(Blitter::BlitterParams * bp, BlitterMode mode, ZoomLevel zoom) Zeile 363 C++
openttd.exe!GfxBlitter<1,1>(const Sprite * const sprite, int x, int y, BlitterMode mode, const SubSprite * const sub, unsigned int sprite_id, ZoomLevel zoom) Zeile 946 + 0x2c Bytes C++
openttd.exe!GfxMainBlitter(const Sprite * sprite, int x, int y, BlitterMode mode, const SubSprite * sub, unsigned int sprite_id, ZoomLevel zoom) Zeile 957 C++
openttd.exe!DrawSprite(unsigned int img, unsigned int pal, int x, int y, const SubSprite * sub, ZoomLevel zoom) Zeile 818 C++
openttd.exe!DrawMouseCursor() Zeile 1239 C++
openttd.exe!UpdateWindows() Zeile 3029 C++
openttd.exe!ClientSizeChanged(int w, int h) Zeile 201 C++
openttd.exe!WndProcGdi(HWND__ * hwnd, unsigned int msg, unsigned int wParam, long lParam) Zeile 911 C++
user32.dll!7e368734()
[Unten angegebene Rahmen sind möglicherweise nicht korrekt und/oder fehlen, keine Symbole geladen für user32.dll]
user32.dll!7e368816()
user32.dll!7e37bf15()
user32.dll!7e378d77()
openttd.exe!WndProcGdi(HWND__ * hwnd, unsigned int msg, unsigned int wParam, long lParam) Zeile 1027 + 0x18 Bytes C++
user32.dll!7e368734()
user32.dll!7e368816()
user32.dll!7e378ea0()
user32.dll!7e3a95ed()
user32.dll!7e3a9641()
openttd.exe!WndProcGdi(HWND__ * hwnd, unsigned int msg, unsigned int wParam, long lParam) Zeile 1019 + 0xc Bytes C++
user32.dll!7e368734()
user32.dll!7e368816()
user32.dll!7e378ea0()
user32.dll!7e378eec()
ntdll.dll!7c91e473()
user32.dll!7e37b1a8()
openttd.exe!VideoDriver_Win32::MakeWindow(bool full_screen) Zeile 266 + 0xe Bytes C++
90909090()

The mutex is actually locked at that point and it still crashes.


This comment was imported from FlySpray: https://bugs.openttd.org/task/5867#comment13100

@DorpsGek
Copy link
Member Author

fonsinchen wrote:

Just another really systematic description on how I trigger that:

  1. Start OpenTTD in windowed mode with OpenGFX
  2. Open game options, set resolution to something that's close to your native resolution, but not quite the right one. In my case I choose 1280x768 because my display's native resolution is 1280x800.
  3. Switch to fullscreen (and admire the ugliness)
  4. Switch the blitter to zBase
  5. Close the game options window.

This crashes the game with the above stacktrace and actually leaves the whole windowing system in a "broken" state where I have a strip of 32 pixels of garbage at the bottom below the task bar. It stays like that until I start openttd again and switch to the native resolution in 8bit fullscreen.


This comment was imported from FlySpray: https://bugs.openttd.org/task/5867#comment13101

@DorpsGek
Copy link
Member Author

fonsinchen wrote:

In some instances it didn't crash when I set breakpoints in certain places. I cannot remember which places those were. Maybe the sprite updates and the drawing don't really happen in parallel, but rather not in the correct order. That order might be determined by thread scheduling.


This comment was imported from FlySpray: https://bugs.openttd.org/task/5867#comment13102

@DorpsGek
Copy link
Member Author

frosch wrote:

Is that trace from 22:56 with my patches applied?

In unmodified trunk ClientSizeChanged calles PostResize and GameSizeChanged before assigning _screen.dst_ptr. Which is likely wrong.

The cursor is "removed" via "GameSizeChanged -> ScreenSizeChanged -> _cursor.visible = false", so at least we do not have to worry about UndrawMouseCursor trying to undraw the cursor using the screenbackup from a different blitter.


This comment was imported from FlySpray: https://bugs.openttd.org/task/5867#comment13104

@DorpsGek
Copy link
Member Author

fonsinchen wrote:

If I haven't royally messed up yesterday then that trace is with your patches and possibly some additional _draw_mutex->BeginCritical(true) somewhere.


This comment was imported from FlySpray: https://bugs.openttd.org/task/5867#comment13105

@DorpsGek
Copy link
Member Author

fonsinchen wrote:

UpdateWindows() and the DrawMouseCursor() in there is called after AfterBlitterChange() and before the next GameLoop(), which triggers UpdateCursorSize(). So DrawMouseCursor sees inconsistent data about the cursor and crashes. As UpdateWindows() is always called after GameLoop() anyway the idea of just dropping the extra call in ClientSizeChanged() is actually valid. If the resolution isn't "weird" we don't get that call to UpdateWindows() from ClientSizeChanged() and thus evade the crash.


This comment was imported from FlySpray: https://bugs.openttd.org/task/5867#comment13107

@DorpsGek
Copy link
Member Author

fonsinchen closed the ticket.

Reason for closing: Fixed

In r26365


This comment was imported from FlySpray: https://bugs.openttd.org/task/5867

@DorpsGek DorpsGek added Core flyspray This issue is imported from FlySpray (https://bugs.openttd.org/) labels Apr 7, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flyspray This issue is imported from FlySpray (https://bugs.openttd.org/)
Projects
None yet
Development

No branches or pull requests

1 participant