Number: 189
Title: Russian-only or Japanese-only labels disappear
Submitter: Martin Duerst
Date: Thu Aug 15 09:02:30 2002
Subsys: Common
Version: 1.7.4
System: x86-Linux-
Severity: major
Problem:
Labels with only Russian text or only Japanese text disappear completely in output (in particular, but not only, in SVG output). If some Latin characters are added into these labels, things work (modulo font settings).
Input file: b189.dot
Output file: b189.svg
Fix:
Comments: [erg] New versions of svg appear to handle this, at least in that the svg output has character codes for both the Cyrillic and Japanese text. I have only seen the Cyrillic displayed, presumably because the Arial font used by the svg viewer does not have glyphs for the Japanese text.

Even if this is only a problem now in getting the right font, the box for the Cyrillic appears far too wide.

The user confirms that the lack of labels is fixed with the new release, provided he uses MS Arial Unicode to pick up the Japanese glyphs. This still leaves the extra-wide box.

[duerst] Could it be that somewhere bytes are counted instead of characters (which would just lead to a factor of 2)?
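
To illustrate the hypothesis above, here is a minimal Python sketch. It is not Graphviz code; the sample strings and the helper name `byte_to_char_ratio` are made up for illustration. It compares the UTF-8 byte count of a label with its character count.

```python
def byte_to_char_ratio(text: str) -> float:
    """Return the UTF-8 byte length divided by the number of characters."""
    return len(text.encode("utf-8")) / len(text)


cyrillic = "Привет"  # 6 characters, 2 bytes each in UTF-8
japanese = "日本語"  # 3 characters, 3 bytes each in UTF-8
latin = "Hello"      # 5 characters, 1 byte each in UTF-8

for label in (cyrillic, japanese, latin):
    # Print the label, its character count, its byte count, and the ratio.
    print(label, len(label), len(label.encode("utf-8")), byte_to_char_ratio(label))
```

If a label were sized by its byte count, a Cyrillic-only label would come out about twice as wide as needed, which would be consistent with the overly wide box. This remains a guess about the cause, not a confirmed diagnosis.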

[ellson] I'm trying to work out the right way to deal with this.

Can you tell me what character encoding you are using to put Russian and Japanese text into the graph?

This information may come from some sort of localization data. Do you happen to know if there is an environment variable that dot should be looking at to find out what character encoding is being used?

On Linux at least, you can tell what character encoding the current locale uses with the command: locale charmap

Can you let me know what this returns on your system?

[duerst] The data is encoded as UTF-8. I know this because I produce it in a Java servlet, where I set the encoding explicitly, and I have also checked the actual encoding with od -hc (the Unix octal dump command), the Unicode book, and some of my personal utilities.

Please note that 'locale charmap' will only coincide with the actual encoding of the input file if the input file was created with the same setting, and by a program that uses the locale settings for deciding which encoding to use for the output. For data from the Web, for example, that doesn't apply.
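
The distinction above can be sketched in Python. This is an illustration only: `looks_like_utf8` and `locale_encoding` are hypothetical helper names, not part of dot. The sketch checks the bytes themselves for valid UTF-8 rather than trusting the locale, and reports the locale's encoding separately for comparison.

```python
import locale


def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def locale_encoding() -> str:
    """Encoding implied by the current locale; it may differ from the data."""
    return locale.getpreferredencoding(False)


utf8_bytes = "Привет".encode("utf-8")
koi8_bytes = "Привет".encode("koi8-r")  # same text in a different encoding

# The result depends on the bytes, not on the locale of the machine.
print(looks_like_utf8(utf8_bytes), looks_like_utf8(koi8_bytes), locale_encoding())
```

Note that a successful decode is only a necessary condition: pure ASCII input also passes, so the check can rule out UTF-8 but cannot prove which encoding the author intended.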
Owner: ellson
Status: Fixed