FS#6389 - name is truncated in in-game content downloader

Attached to Project: OpenTTD
Opened by David P. Kendal (dpk) - Tuesday, 10 November 2015, 12:01 GMT
Type Bug
Category Interface
Status New
Assigned To No-one
Operating System All
Severity Medium
Priority Normal
Reported Version 1.5.2
Due in Version Undecided
Due Date Undecided
Percent Complete 0%
Votes 0
Private No


The name of my scenario is "dpk’s UK and Ireland Map (1900)". On the website this is displayed fine, but in game it's truncated to "dpk’s UK and Ireland Map (190".

Comment by Leif Linse (Zuu) - Saturday, 14 November 2015, 16:32 GMT
The reason for this is that ContentInfo::name is limited to 32 characters. This is also the number of characters OpenTTD receives from the content server.
Comment by David P. Kendal (dpk) - Saturday, 14 November 2015, 16:34 GMT
Then the limit should be made larger, or the content service upload system should warn you about this, or (even better) forbid you from uploading things with names longer than that.
Comment by Eearslya (Eearslya) - Saturday, 14 November 2015, 17:19 GMT
The problem is actually a little more complex; your scenario's name is only 31 characters long, but the apostrophe used is a special Unicode character (’, U+2019) that takes up 3 bytes in UTF-8 instead of 1. The web frontend counted it as one character, but the actual server and game code count bytes, so the name comes to 33 bytes and is truncated.
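The byte and character counts mentioned above can be checked with a short sketch. CountCodePoints and kReportedName are hypothetical names used only for illustration, not OpenTTD code, and the helper assumes its input is valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not an OpenTTD function): counts Unicode code points
// in a UTF-8 string by counting every byte that is not a continuation byte
// (continuation bytes have the bit pattern 10xxxxxx).
// Assumes the input is valid UTF-8.
std::size_t CountCodePoints(const std::string &s)
{
	std::size_t n = 0;
	for (unsigned char c : s) {
		if ((c & 0xC0) != 0x80) n++;
	}
	return n;
}

// The scenario name from the report, with U+2019 spelled out as its three
// UTF-8 bytes (E2 80 99) so the source file encoding does not matter.
const std::string kReportedName = "dpk\xE2\x80\x99s UK and Ireland Map (1900)";

// Usage example: the name is 31 code points but 33 bytes.
void CheckReportedName()
{
	assert(CountCodePoints(kReportedName) == 31);
	assert(kReportedName.size() == 33);
}
```

The two numbers differ by exactly 2 because the single non-ASCII apostrophe occupies three bytes where an ASCII apostrophe would occupy one.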
Comment by David P. Kendal (dpk) - Saturday, 14 November 2015, 21:21 GMT
Then the server and game code should count Unicode codepoints for the limit, not bytes. (Incidentally, what happens if a UTF-8 codepoint is split in the middle by the truncator?)
Comment by Eearslya (Eearslya) - Saturday, 14 November 2015, 21:59 GMT
Unfortunately, that's just not how C++ works. Not without throwing the string through a UTF-8 -> ASCII parser, at least. The strings are all stored as null-terminated byte arrays. And if it did happen to truncate in the middle of a UTF-8 character, it would probably either render a garbage character, or nothing at all for that byte.
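On the question of a character being split by truncation: a byte-limited truncation can avoid this without decoding the string, because UTF-8 continuation bytes are recognizable on their own. The following is a minimal sketch under stated assumptions, not the code OpenTTD or the content server uses. TruncateUtf8 and kScenarioName are hypothetical names, the input is assumed to be valid UTF-8, and the 31-byte budget assumes a 32-byte buffer with one byte reserved for the terminating NUL, which matches the truncated name in the report.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not an OpenTTD function): returns the longest prefix
// of `s` that is at most `max_bytes` long and does not end in the middle of
// a multi-byte UTF-8 sequence. Assumes the input is valid UTF-8.
std::string TruncateUtf8(const std::string &s, std::size_t max_bytes)
{
	if (s.size() <= max_bytes) return s;

	std::size_t end = max_bytes;
	// s[end] is the first byte that would be dropped. If it is a
	// continuation byte (10xxxxxx), the cut falls inside a sequence,
	// so back up until the cut sits just before a lead or ASCII byte.
	while (end > 0 && (static_cast<unsigned char>(s[end]) & 0xC0) == 0x80) end--;
	return s.substr(0, end);
}

// The scenario name from the report, with U+2019 as UTF-8 bytes E2 80 99.
const std::string kScenarioName = "dpk\xE2\x80\x99s UK and Ireland Map (1900)";

// Usage example: a 31-byte budget reproduces the name shown in game, and a
// budget that would split the apostrophe drops it whole instead.
void CheckTruncation()
{
	assert(TruncateUtf8(kScenarioName, 31) == "dpk\xE2\x80\x99s UK and Ireland Map (190");
	assert(TruncateUtf8(kScenarioName, 4) == "dpk");
}
```

This keeps the limit in bytes, so the buffer size does not change; the only difference from a plain byte cut is that a partially included character is dropped rather than left as stray bytes.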