Log files (MLSD output) in UTF-8 encoding

dma_k

I am using WinSCP 5.13.4. Needless to say, Unicode support works fine for directory listings and file transfers. However, what gets written to the log file looks like this:
< 2018-09-03 23:12:28.394 150 Opening ASCII mode data connection for MLSD
. 2018-09-03 23:12:28.397 Session ID reused
. 2018-09-03 23:12:28.425 Data connection closed
. 2018-09-03 23:12:28.426 modify=20180903211226;perm=adfrw;size=1521103;type=file;unique=3AU1934D;UNIX.group=100;UNIX.mode=0644;UNIX.owner=1000; Белоснежка и семь гномов.jpg
< 2018-09-03 23:12:28.426 226 Transfer complete
. 2018-09-03 23:12:28.426 Directory listing successful
As you can see, the filename is CP1251-encoded, and that is then UTF-8 encoded. I think it would be more practical to write already interpreted (decoded) information to the log rather than the raw server output. For example, the PWD output is written in UTF-8, as I would expect:
> 2018-09-03 23:12:58.644 PWD
< 2018-09-03 23:12:59.159 257 "/ftp/images/Белоснежкa" is the current directory
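One way to read "CP1251-encoded, then UTF-8 encoded" is as a double encoding: the raw CP1251 bytes are mapped to characters by some single-byte charset, and those characters are then written out as UTF-8. The Python sketch below assumes Latin-1 as that intermediate mapping (an assumption; the thread does not say which mapping is involved) and shows how such a log line could be turned back into the real filename.

```python
# Sketch only. ASSUMPTION: the intermediate byte-to-character mapping is
# Latin-1, which maps every byte 0x00-0xFF to exactly one character.

def simulate_log_line(name: str) -> bytes:
    raw = name.encode("cp1251")      # bytes as the server sends them
    garbled = raw.decode("latin-1")  # each byte becomes one (wrong) character
    return garbled.encode("utf-8")   # what would end up in a UTF-8 log file


def recover_name(log_bytes: bytes) -> str:
    # Undo the steps above in reverse order.
    garbled = log_bytes.decode("utf-8")
    raw = garbled.encode("latin-1")
    return raw.decode("cp1251")


name = "Белоснежка и семь гномов.jpg"
logged = simulate_log_line(name)
print(logged.decode("utf-8"))  # mojibake as seen in a UTF-8 viewer
print(recover_name(logged))    # the original filename
```

The recovery works only because Latin-1 is a lossless one-to-one mapping of bytes; with a different intermediate charset some bytes might not survive the round trip.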
I also wonder: at which debug level is the auto-detected charset of the FTP server logged?

martin
Site Admin
Location:
Prague, Czechia

Re: Log files (MLSD output) in UTF-8 encoding

WinSCP actually logs both the raw listing and the parsed listing.
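For context on what "parsed listing" means for MLSD: per RFC 3659, each entry is a run of `fact=value;` pairs, a single space, then the pathname. The Python sketch below is an illustration of that layout, not WinSCP's parser, and it assumes the line has already been decoded to text (which is exactly where the charset question comes in). The sample line is the one from the log above.

```python
def parse_mlsd_line(line: str) -> tuple[str, dict[str, str]]:
    """Split one decoded MLSD entry into (pathname, facts).

    Layout (RFC 3659): "fact=value;fact=value; pathname".
    """
    facts_part, _, pathname = line.partition(" ")  # facts contain no spaces
    facts: dict[str, str] = {}
    for fact in facts_part.split(";"):
        if not fact:
            continue  # the trailing ';' leaves an empty item
        key, _, value = fact.partition("=")
        facts[key.lower()] = value  # fact names are case-insensitive
    return pathname, facts


entry = ("modify=20180903211226;perm=adfrw;size=1521103;type=file;"
         "unique=3AU1934D;UNIX.group=100;UNIX.mode=0644;UNIX.owner=1000; "
         "Белоснежка и семь гномов.jpg")
path, facts = parse_mlsd_line(entry)
print(path, facts["size"])
```

Splitting at the first space keeps pathnames with embedded spaces intact, since the facts part never contains a space.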

WinSCP does not detect the code page. It supports only UTF-8 and the legacy Ansi encoding configured in your system (which is CP1251 on your [Russian?] system). If the encoding is set to the default "Auto" in the session settings, you should see a message like:
Server does not send proper UTF-8, falling back to local charset
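The "Auto" behaviour described here can be sketched in Python as "decode strictly as UTF-8, and if that fails, log the message and use the local charset". This is not WinSCP's code: WinSCP leaves the Ansi conversion to Windows, so the `ansi_codec` parameter below is a hypothetical stand-in for the system code page.

```python
# Sketch of "Auto" mode: try UTF-8 first, fall back to a local charset.
# ASSUMPTION: ansi_codec stands in for the system's Ansi code page.

FALLBACK_MESSAGE = "Server does not send proper UTF-8, falling back to local charset"


def decode_auto(raw: bytes, ansi_codec: str = "cp1251", log=print) -> str:
    try:
        return raw.decode("utf-8")  # strict: invalid UTF-8 raises
    except UnicodeDecodeError:
        log(FALLBACK_MESSAGE)
        return raw.decode(ansi_codec, errors="replace")


# CP1251 bytes are not valid UTF-8, so this logs the message, then prints the name.
print(decode_auto("Белоснежка".encode("cp1251")))
```

Cyrillic text in CP1251 is practically never valid UTF-8 (for example, byte 0xC1 for "Б" can never start a UTF-8 sequence), which is what makes this simple test usable.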

dma_k

WinSCP actually logs both the raw listing and the parsed listing.
Thanks, I got the idea.
Server does not send proper UTF-8, falling back to local charset
I would appreciate it if the message read:
Server does not send proper UTF-8, falling back to local charset CP1251
or whatever is configured for the system. Actually, I am not sure where WinSCP takes this setting from... Is it the setting for console applications that don't support Unicode (Control panel→Region and Language→Administrative→Change system locale), or the location (Control panel→Region and Language→Location)? IMHO, neither of them is good to use...

Does this message appear in the log file when the client and server have agreed to send data in UTF-8, like below?
< 2018-09-03 23:25:40.387 211-Features:
< 2018-09-03 23:25:40.398  UTF8
...
< 2018-09-03 23:25:40.409 211 End
> 2018-09-03 23:25:40.409 OPTS UTF8 ON
< 2018-09-03 23:25:40.412 200 UTF8 set to on
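Reading the FEAT reply above is mechanical: per RFC 2389, feature lines start with a single space between the `211-` opening line and the `211 End` closing line. The Python sketch below only shows how to pull the advertised features (such as `UTF8`) out of such a reply, with timestamps stripped; it is not WinSCP's logic for deciding whether to send `OPTS UTF8 ON`.

```python
def parse_feat_reply(lines: list[str]) -> set[str]:
    """Return the feature names listed in a multi-line FEAT reply.

    RFC 2389 layout: a "211-..." first line, one feature per line starting
    with a single space, and a closing "211 End" line.
    """
    features = set()
    for line in lines:
        if line.startswith(" "):
            # The first token is the feature name; anything after is parameters.
            name = line.strip().split(" ", 1)[0]
            features.add(name.upper())  # compared case-insensitively here
    return features


reply = ["211-Features:", " UTF8", "211 End"]  # from the log, timestamps dropped
print("UTF8" in parse_feat_reply(reply))  # True
```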

martin
Site Admin
Location:
Prague, Czechia

dma_k wrote:

I would appreciate it if the message read:
Server does not send proper UTF-8, falling back to local charset CP1251
or whatever is configured for the system.
OK, I will consider that. But please make sure you understand that WinSCP does not actually know or care about that. It lets the system do the conversion. WinSCP does not need to know what the Ansi encoding really is.

Actually, I am not sure where WinSCP takes this setting from... Is it the setting for console applications that don't support Unicode (Control panel→Region and Language→Administrative→Change system locale), or the location (Control panel→Region and Language→Location)? IMHO, neither of them is good to use...
Again, WinSCP does not read the information from anywhere. But when it asks the system to do the conversion, the system uses the setting from Control panel→Region and Language→Administrative→Change system locale. Why do you think it's not good?

Does this message appear in the log file when the client and server have agreed to send data in UTF-8, like below?
WinSCP always assumes the server uses UTF-8, until it finds out that it does not.
WinSCP sends OPTS UTF8 ON only when it believes the server needs it to actually use UTF-8. If it is confident that the server uses UTF-8 even without that command, it won't send it.
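"Assume UTF-8 until it does not" can be pictured as a small piece of per-session state. The Python sketch below is a hypothetical illustration: it assumes that once one listing fails to decode as UTF-8, the session stays on the local charset, and it again uses `ansi_codec` as a stand-in for the system code page. The thread does not describe WinSCP's internals at this level.

```python
# Sketch of "assume UTF-8 until it does not". ASSUMPTION: once a listing
# fails to decode as UTF-8, the session stays on the local charset.

class ListingDecoder:
    def __init__(self, ansi_codec: str = "cp1251") -> None:
        self.ansi_codec = ansi_codec  # stand-in for the system Ansi code page
        self.utf8 = True              # optimistic starting assumption

    def decode(self, raw: bytes) -> str:
        if self.utf8:
            try:
                return raw.decode("utf-8")
            except UnicodeDecodeError:
                self.utf8 = False     # evidence the server is not UTF-8
        return raw.decode(self.ansi_codec, errors="replace")


decoder = ListingDecoder()
print(decoder.decode("Белоснежка".encode("cp1251")), decoder.utf8)  # Белоснежка False
```

Keeping the decision sticky avoids flip-flopping between encodings on names that happen to be valid in both, such as plain ASCII.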

dma_k

Again, WinSCP does not read the information from anywhere.
I see. I have read about OPTS UTF8 ON in this draft, and it says that the command only affects the control connection (e.g. the CWD argument or the PWD output). I am not an expert on the FTP protocol, but it seems there is no agreement on how the encoding of data sent over the data connection should be treated (at least for MLSD output); perhaps there is no way for a server to tell a client which encoding it speaks.
But when it asks the system to do the conversion, the system uses the setting from Control panel→Region and Language→Administrative→Change system locale. Why do you think it's not good?
The description of this setting reads "Language for non-Unicode programs", so it controls (perhaps among other things) which glyphs are used for code positions 128-255 in console applications. WinSCP is a Unicode program, so it should not rely on this setting, or rather, it should fall back to it only as a last resort. For example, NetBox (which, I believe, is built on top of WinSCP) has a dedicated option for that.

[Attached screenshot: netbox_encoding.png (NetBox encoding option)]
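The fallback order proposed here (an explicit per-session encoding first, UTF-8 next, the system Ansi code page last) can be sketched in Python as follows. The `configured` parameter is a hypothetical stand-in for an option like the one shown in the NetBox screenshot; its exact semantics in NetBox are not described in the thread.

```python
from typing import Optional


def decode_name(raw: bytes, configured: Optional[str] = None,
                ansi_codec: str = "cp1251") -> str:
    """Hypothetical fallback order: explicit setting, then UTF-8, then Ansi."""
    if configured:
        # An explicit per-session encoding wins over any guessing.
        return raw.decode(configured, errors="replace")
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # The system Ansi code page only as the last resort.
        return raw.decode(ansi_codec, errors="replace")


print(decode_name("Белоснежка".encode("koi8_r"), configured="koi8_r"))
```

Without the explicit setting, a KOI8-R server would be decoded with the system code page and produce wrong characters, which is the case the dedicated option is meant to cover.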

martin
Site Admin

I do not disagree. But there is little demand for such an option. Most servers either speak UTF-8 or use the Ansi encoding of their users' language.
