Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Link graph job thread-join schedule changes #6470

Closed
DorpsGek opened this issue May 25, 2016 · 10 comments
Closed

Link graph job thread-join schedule changes #6470

DorpsGek opened this issue May 25, 2016 · 10 comments
Assignees
Labels
flyspray This issue is imported from FlySpray (https://bugs.openttd.org/) patch from FlySpray This issue is in fact a Patch, but imported from FlySrpay

Comments

@DorpsGek
Copy link
Member

JGR opened the ticket and wrote:

Pause the game instead of blocking when link graph jobs lag.

Check if the job is still running one date fract tick before it is due to join and if so pause the game until its done.
This avoids the main thread being blocked on a thread join, which appears to the user as if the game is unresponsive, as the UI does not repaint and cannot be interacted with.
Show if pause is due to link graph job in status bar, update network messages.
This does not apply for network clients.

  1. (applies on top of 1)
    Join more than one link graph job at once where possible.

This can occur when the link graph recalculation times are changed.
In particular, if the duration or interval are reduced, the new jobs
will be delayed until the completion of the previous long job,
and then remain backlogged thereafter as jobs are dequeued at the same
rate as they are queued.

Attachments

Reported version: trunk
Operating system: All


This issue was imported from FlySpray: https://bugs.openttd.org/task/6470
@DorpsGek
Copy link
Member Author

JGR wrote:

The uploaded diffs somehow ended up as 0 bytes, uploading again...

Attachments


This comment was imported from FlySpray: https://bugs.openttd.org/task/6470#comment14195

@DorpsGek
Copy link
Member Author

fonsinchen wrote:

I like the lag/pause patch, but there are some stylistic issues with it:

  1. The IsFinished() method should be renamed as the name is quite ambiguous, now. Make it NeedsToBeJoined() or similar.
  2. Line wrapping could be improved, in particular the comment in LinkGraphSchedule::Run() is hard to read.
  3. The "if" cascade in StateGameLoop_LinkGraphPauseControl() can be simplified.
  4. The "extern" declaration in StateGameLoop() is unnecessary. Instead, StateGameLoop_LinkGraphPauseControl() should be declared in linkgraphschedule.h, which is already included in openttd.cpp.

The multi-join patch is somewhat problematic. I did this on purpose because the merging back of link graph state takes place on the main thread and joining multiple components in a row causes more lag than only joining one. I understand that this can hold up processing of the job queue and increase the delay between game changes and link graph reactions. Maybe some more intelligent mechanism could be found that joins at most N link graph edges at a time or triggers the joining more often when the queue is long.


This comment was imported from FlySpray: https://bugs.openttd.org/task/6470#comment14217

@DorpsGek
Copy link
Member Author

fonsinchen wrote:

Unfortunately the flag transfer between threads is undefined behavior in C++11. Also, even before C++11, the hardware is allowed to take an arbitrarily long time to actually make the write in the link graph thread visible to the read in the main thread. The compiler might even realize that the write is never done in the main thread and optimize the whole thing away. We should use a mutex there for older compilers, and a std::atomic with a stricter memory model for C++11. That is somewhat annoying, of course. Maybe I can come up with something better.


This comment was imported from FlySpray: https://bugs.openttd.org/task/6470#comment14219

@DorpsGek
Copy link
Member Author

fonsinchen wrote:

It's terribly ugly, but "volatile" would probably do the trick here. As we cannot generally use C++11 atomics, we cannot establish any proper synchronization without a using a mutex. volatile does exactly the right thing on MSVC, and at least prevents the compiler from optimizing the whole thing away on other platforms. Even though not specifically guaranteed, it's probably safe to assume that the change to job_completed becomes available to the main thread fairly soon after the worker thread has terminated. The CPU's cache coherency protocol should take care of that, provided that we actually reread the value from memory (or cache) every time we need it.

The # ifdef'd atomic operations are fairly useless in their current form as the relaxed memory model doesn't actually give us the guarantees we are looking for here.


This comment was imported from FlySpray: https://bugs.openttd.org/task/6470#comment14220

@DorpsGek
Copy link
Member Author

JGR wrote:

On your first comment:

  1. Perhaps IsScheduledToBeJoined() ?
  2. Noted, I'll see about reflowing that comment, though it probably needs re-writing anyway given your second and third comments.
  3. Agreed. Looking at it now, the else if clause and its nested ifs can be collapsed.
  4. Agreed.

As for merging multiple link graph jobs, merging two instead of one where available could be a way forward.
It prevents the queue from being backlogged indefinitely, but shouldn't cause an excessive latency spike.

For comments 2 and 3:
On MSVC volatile reads/writes have acquire/release semantics, which is definitely correct, but stricter than necessary.
Relaxed ordering semantics are sufficient as there are no other writes by the link graph worker thread which need to be ordered with respect to the visibility in the main thread of the change to the job_completed flag, until the worker thread exits which acts as a full memory barrier. There is only one shared write, so any ordering will do for both the reader and writer.
On non MSVC platforms volatile can be assumed to be at least relaxed ordering, and to prevent the compiler removing the load/store.
An ordinary instead of atomic write should also be OK as there is no write contention, so volatile ought to be good enough.


This comment was imported from FlySpray: https://bugs.openttd.org/task/6470#comment14221

@DorpsGek
Copy link
Member Author

JGR wrote:

I've attached an updated version of the lag/pause patch (diff 1), with changes based on the comments above.

For the multiple join patch (diff 2), instead of joining as many jobs as possible in the same tick, the patch now adds a second join check at a later date_fract in the same day as the original join check.
As joins are now checked more often than jobs are spawned, any backlog will be gradually reduced, without joining multiple jobs in a tick or excessively increasing the rate at which jobs are joined.

Attachments


This comment was imported from FlySpray: https://bugs.openttd.org/task/6470#comment14239

@DorpsGek DorpsGek added Core flyspray This issue is imported from FlySpray (https://bugs.openttd.org/) patch from FlySpray This issue is in fact a Patch, but imported from FlySrpay labels Apr 7, 2018
@frosch123 frosch123 removed the Core label Apr 14, 2018
@andythenorth andythenorth added the stale Stale issues label Jan 5, 2019
@andythenorth
Copy link
Contributor

@JGRennison is this still something you want to pursue? :) @LordAro is interested in getting it through.

@andythenorth andythenorth removed the stale Stale issues label Jan 12, 2019
@JGRennison
Copy link
Contributor

Patch 1 is the interesting one really. I'm happy to make further updates as necessary.
It would be useful to know what the policy on C++11 currently is with reference to atomic operations and fonsinchen's comments above.

@LordAro
Copy link
Member

LordAro commented Jan 13, 2019

It's still somewhat up in the air with regards to how C++11+ is used, but now that the compile farm fully supports C++11 properly, and that -std=c++11 is forced on in the makefile anyway, I think you should be good to go. I imagine no one would complain if you used anything C++14 either.

I seem to recall some issues with using with MinGW, but I just tested that and it seems to be working fine, so go nuts :)

JGRennison added a commit to JGRennison/Upstream-OpenTTD that referenced this issue Jan 22, 2019
)

Check if the job is still running one date fract tick before it is due
to join and if so pause the game until its done.
This avoids the main thread being blocked on a thread join, which appears
to the user as if the game is unresponsive, as the UI does not repaint
and cannot be interacted with.
Show if pause is due to link graph job in status bar, update network
messages.
This does not apply for network clients.
JGRennison added a commit to JGRennison/Upstream-OpenTTD that referenced this issue Mar 10, 2019
)

Check if the job is still running two date fract ticks before it is due
to join, and if so pause the game until its done.
When loading a game, check if the game would block immediately due to
a job which is scheduled to be joined within two date fract ticks,
and if so pause the game until its done.
This avoids the main thread being blocked on a thread join, which appears
to the user as if the game is unresponsive, as the UI does not repaint
and cannot be interacted with.
Show if pause is due to link graph job in status bar, update network
messages.
This does not apply for network clients.
JGRennison added a commit to JGRennison/Upstream-OpenTTD that referenced this issue Mar 10, 2019
)

Check if the job is still running two date fract ticks before it is due
to join, and if so pause the game until its done.
When loading a game, check if the game would block immediately due to
a job which is scheduled to be joined within two date fract ticks,
and if so pause the game until its done.
This avoids the main thread being blocked on a thread join, which appears
to the user as if the game is unresponsive, as the UI does not repaint
and cannot be interacted with.
Show if pause is due to link graph job in status bar, update network
messages.
This does not apply for network clients.
JGRennison added a commit to JGRennison/Upstream-OpenTTD that referenced this issue Mar 11, 2019
)

Check if the job is still running two date fract ticks before it is due
to join, and if so pause the game until its done.
When loading a game, check if the game would block immediately due to
a job which is scheduled to be joined within two date fract ticks,
and if so pause the game until its done.
This avoids the main thread being blocked on a thread join, which appears
to the user as if the game is unresponsive, as the UI does not repaint
and cannot be interacted with.
Show if pause is due to link graph job in status bar, update network
messages.
This does not apply for network clients.
@LordAro
Copy link
Member

LordAro commented Dec 31, 2019

Closed in favour of #7081

@LordAro LordAro closed this as completed Dec 31, 2019
TrueBrain pushed a commit to JGRennison/Upstream-OpenTTD that referenced this issue Dec 22, 2020
)

Check if the job is still running two date fract ticks before it is due
to join, and if so pause the game until its done.
When loading a game, check if the game would block immediately due to
a job which is scheduled to be joined within two date fract ticks,
and if so pause the game until its done.
This avoids the main thread being blocked on a thread join, which appears
to the user as if the game is unresponsive, as the UI does not repaint
and cannot be interacted with.
Show if pause is due to link graph job in status bar, update network
messages.
This does not apply for network clients.
TrueBrain pushed a commit that referenced this issue Dec 22, 2020
Check if the job is still running two date fract ticks before it is due
to join, and if so pause the game until its done.
When loading a game, check if the game would block immediately due to
a job which is scheduled to be joined within two date fract ticks,
and if so pause the game until its done.
This avoids the main thread being blocked on a thread join, which appears
to the user as if the game is unresponsive, as the UI does not repaint
and cannot be interacted with.
Show if pause is due to link graph job in status bar, update network
messages.
This does not apply for network clients.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flyspray This issue is imported from FlySpray (https://bugs.openttd.org/) patch from FlySpray This issue is in fact a Patch, but imported from FlySrpay
Projects
None yet
Development

No branches or pull requests

5 participants