Support UTF-8 BOM when in text mode


chris
Guest

Support UTF-8 BOM when in text mode

Windows XP Notepad adds a UTF-8 byte-order-mark (BOM) (EF BB BF) to the start of UTF-8 files. But many parts of Linux (and Unix in general) don't support that. I would like to request that WinSCP strip out the BOM when uploading files in text mode.

Alternatively, add a checkbox giving you the option to strip the BOM from UTF-8 files.
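The requested behaviour can be sketched in a few lines. This is only an illustration of the idea, not WinSCP's actual implementation; the function name `strip_utf8_bom` is hypothetical.

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present.

    Hypothetical helper for illustration; not part of WinSCP.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

# A file saved by Notepad as UTF-8: BOM followed by the script text.
uploaded = strip_utf8_bom(b"\xef\xbb\xbf#!/usr/bin/perl\n")
```

Only a BOM at the very start of the data is removed; the same three bytes appearing later in the file are left alone.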

How to reproduce:
1. Open Windows Notepad and enter this Perl script:

#!/usr/bin/perl
print "Hello, world!\n";

2. Save As mytest.pl, selecting an encoding of UTF-8.
3. Use WinSCP to transfer the file to a Unix box, with permissions rwxr-xr-x.
4. Try to run the program on the Unix box with
./mytest.pl
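The steps above typically fail because the system only treats a file as an interpreter script when its first two bytes are `#!`. With a BOM, the file starts with EF BB BF instead. A minimal sketch of that check, assuming the script content from step 1 (the helper name `starts_with_shebang` is made up for this example):

```python
import codecs

def starts_with_shebang(data: bytes) -> bool:
    """Check whether the first two bytes are '#!' (illustrative helper)."""
    return data[:2] == b"#!"

# The script from step 1, as raw bytes without and with a UTF-8 BOM.
script = b'#!/usr/bin/perl\nprint "Hello, world!\\n";\n'
with_bom = codecs.BOM_UTF8 + script

print(starts_with_shebang(script))    # shebang is at offset 0
print(starts_with_shebang(with_bom))  # BOM pushes the shebang to offset 3
```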

In my case, we are transferring gettext po files between platforms, but the essential problem is the same. I will also submit a request that the gettext tools add support for BOMs, but still I think it would be a useful option to have in WinSCP since so many Unix programs don't support BOMs (and often they refuse to add support).


jmeijer
Guest

stripping BOM is unexpected behaviour

I have just spent a half an hour tracking down why the hell my files weren't the SAME after an upload. I did not imagine that a file-transfer program would CHANGE my files.

You can imagine my appreciation on reading the 'bug' filed here, which motivated the 'feature' added to WinSCP. Because one user chose the wrong program, one that has many alternatives, WinSCP now changes its behaviour for all use cases.

If the problem described does indeed occur:

1) It should be optional and NOT the default, or it should WARN about this. It is, after all, destroying the integrity of my files.
2) If not optional, WinSCP might detect changes in the BOM when completing a download-upload cycle and offer to correct the problem, always preserving the server-side version. On the first upload, it should not do anything to the file.

You have made the worst error possible, file-integrity has been violated! This harms credibility of WinSCP and that of its creators.

Thanks for making a fine product. Thumbs down for harming it.

Jan


martin
Site Admin
Posts: 41,454
Location: Prague, Czechia

Re: stripping BOM is unexpected behaviour

jmeijer wrote:

I have just spent a half an hour tracking down why the hell my files weren't the SAME after an upload. I did not imagine that a file-transfer program would CHANGE my files.

...

You have made the worst error possible, file-integrity has been violated! This harms credibility of WinSCP and that of its creators.

The BOM is stripped in text mode transfers only. A text mode transfer violates file integrity by definition. If you want to preserve file integrity, you must use binary transfer mode.
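To illustrate the distinction, here is a rough sketch of the two modes. It assumes that a text mode upload strips a leading UTF-8 BOM and converts CRLF line endings to LF; the line-ending conversion is an assumption about typical text mode behaviour, not something stated in this thread, and both function names are hypothetical.

```python
import codecs

def text_mode_upload(data: bytes) -> bytes:
    """Assumed text mode behaviour: drop a leading BOM, convert CRLF to LF."""
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data.replace(b"\r\n", b"\n")

def binary_upload(data: bytes) -> bytes:
    """Binary mode: bytes are transferred unchanged."""
    return data

# A Windows-style file: BOM plus CRLF line endings.
local = codecs.BOM_UTF8 + b'print "hi";\r\n'

print(text_mode_upload(local) == local)  # content was rewritten in transit
print(binary_upload(local) == local)     # content is byte-for-byte identical
```

Under these assumptions, a byte-for-byte comparison after a text mode transfer is expected to fail, which matches the behaviour reported above.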


hartnegg
Guest

winscp fools itself

The problem is even worse.

If I upload a UTF-8 file with a byte order mark using WinSCP and then edit it with WinSCP, the internal editor correctly detects that it's UTF-8.

But when I save the changed file and edit it again, the editor fails to detect that the file uses UTF-8 encoding, because the byte order mark is missing.

So the transfer function removes the mark that is needed by the edit function.

An alternative would be to include better automatic detection of UTF-8, not just looking at the byte order mark. UTF-8 is the only encoding where auto-detection is extremely reliable, so this would be a very good option.
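One common way to do such detection is to attempt a strict UTF-8 decode and treat success as a positive result. The sketch below shows that approach; it is not how the WinSCP editor works, and `looks_like_utf8` is a hypothetical name.

```python
import codecs

def looks_like_utf8(data: bytes) -> bool:
    """Guess whether data is UTF-8, with or without a BOM (illustrative only)."""
    if data.startswith(codecs.BOM_UTF8):
        return True
    try:
        data.decode("utf-8")  # strict decoding rejects invalid byte sequences
    except UnicodeDecodeError:
        return False
    return True

print(looks_like_utf8("café".encode("utf-8")))    # valid multi-byte sequence
print(looks_like_utf8("café".encode("latin-1")))  # lone 0xE9 is not valid UTF-8
```

Two caveats: pure ASCII also passes this check, since ASCII is a subset of UTF-8, and checking only a prefix of a file can cut a multi-byte sequence in half and produce a false negative.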
