Number: 1632
Title: Gvedit can't deal with the UTF-8 encoded input file
Submitter: xeontz
Date: Sat Apr 18 04:57:35 2009
Subsys: GVedit
Version: 2.22
System: x86-Windows-XP sp3
Severity: major
Problem:
A UTF-8 encoded dot file (with a BOM header) is accepted and processed correctly by dot.exe in the cmd environment. For example:

"C:dot UTF8.dot -Tpng -o UTF8.png" works correctly.

However, if the file is opened in Gvedit.exe, the editor does not process it correctly. Gvedit appears to modify the file contents by default after opening it.

The exact changes depend on the encoding of the input file.

If the file is encoded as UTF-8, the editor removes the UTF-8 BOM header. This causes dot.exe to report an error when it processes the file: "Error: Invalid 2-byte UTF8 found in input. Perhaps "-Gcharset=Latin1" is needed?"

If the file is encoded as UTF-16 (little endian), the editor removes all content except the first 3 bytes, and displays either nothing or a few strange characters. This also results in a syntax error when dot.exe processes the file:


"Error: ... syntax error near line 0
context: >>> <<<"
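
To confirm what the editor actually did to a file, the leading bytes can be inspected before and after opening it in Gvedit. The following Python sketch is only an illustration and is not part of Graphviz; the helper names detect_bom and describe_file are hypothetical. It reports which byte-order mark, if any, a file starts with, and the file's total size.

```python
import codecs

# Known byte-order marks. UTF-32 LE is checked before UTF-16 LE because
# the UTF-32 LE mark begins with the same two bytes as the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return (encoding_name, bom_length), or (None, 0) if no BOM is present."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def describe_file(path):
    """Hypothetical helper: summarize the size and BOM of a file on disk."""
    with open(path, "rb") as f:
        data = f.read()
    name, bom_len = detect_bom(data)
    return {"size": len(data), "bom": name, "bom_length": bom_len}
```

Running describe_file on the same file before and after it passes through the editor should show whether the BOM was removed or the content was truncated.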

Input:
digraph G{
a [label=" 好"]
}
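
The two test files described above can be regenerated from this input with a short script. This is a sketch under assumptions, not the submitter's procedure: the output file names and the functions encode_variants and write_variants are made up for illustration.

```python
import codecs

# The graph from the Input field, including the non-ASCII label.
DOT_SOURCE = 'digraph G{\na [label=" 好"]\n}\n'


def encode_variants(text):
    """Encode the text as UTF-8 with BOM and as UTF-16 little endian with BOM."""
    return {
        "utf8_bom": codecs.BOM_UTF8 + text.encode("utf-8"),
        "utf16_le_bom": codecs.BOM_UTF16_LE + text.encode("utf-16-le"),
    }


def write_variants(text, stem="bomtest"):
    """Write one .dot file per encoding; the file names are hypothetical."""
    paths = []
    for label, data in encode_variants(text).items():
        path = f"{stem}_{label}.dot"
        with open(path, "wb") as f:
            f.write(data)
        paths.append(path)
    return paths
```

Calling write_variants(DOT_SOURCE) writes both files to the current directory, so each can be opened in Gvedit and then passed to dot.exe to compare behavior.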
Comments:
[arif] This is a known issue with gvedit. I will work on it as soon as I have a chance. Meanwhile, all I can recommend is using a different editor. Thanks for the bug report.

[erg] This is the same bug as 1423 and 1538.
Owner: *
Status: *