FaqNonAscii

 

More generally, how do I use non-ASCII character sets?

The following applies to Graphviz 2.8 and later. (In older versions of Graphviz, you can sometimes get away with simply putting Latin-1 or UTF-8 characters in the input stream, but the results are not always correct.)

Input: the general idea is to find the Unicode value for the glyph you want, and enter it within a text string "...." or HTML-like label <...>.

For example, the mathematical forall sign (∀) has the value 0x2200. There are several ways this can be inserted into a file. One is to write out the ASCII representation: "&#<nnn>;" where <nnn> is the decimal representation of the value. The decimal value of 0x2200 is 8704, so the character can be specified as "&#8704;" . Alternatively, Graphviz accepts UTF-8 encoded input. In the case of forall, its UTF-8 representation is 3 bytes whose decimal values are 226 136 128. For convenience, you would probably enter this using your favorite editor, tuned to your character set of choice. You can then use the iconv program to map the graph from your character set to UTF-8 or Latin-1.
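The numbers quoted above can be checked with a few lines of Python. This is only a sketch for verifying the arithmetic; Python is not part of Graphviz, and the variable names are illustrative.

```python
# Code point of the mathematical "for all" sign, U+2200.
code_point = 0x2200

# Numeric character reference: "&#", the decimal value, then ";".
entity = "&#{};".format(code_point)

# UTF-8 encoding of the same character, as decimal byte values.
utf8_bytes = list(chr(code_point).encode("utf-8"))

print(entity)      # &#8704;
print(utf8_bytes)  # [226, 136, 128]
```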

We also accept the HTML symbolic names for Latin-1 characters as suggested in FaqSymbols. (Go to http://www.research.att.com/~john/docs/html/index.htm and click on Special symbols and Entities.) For example, the cent sign (Unicode and Latin-1 value decimal 162) can be inserted as

&cent;
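As a cross-check of the value quoted for the cent sign, Python's standard library carries its own table of HTML entity names. The sketch below uses that table only to confirm the number; it says nothing about how Graphviz itself looks names up.

```python
from html.entities import name2codepoint

# Python's table of HTML entity names (not Graphviz's own lookup).
value = name2codepoint["cent"]

# The same character as a single Latin-1 byte.
latin1_byte = chr(value).encode("latin-1")

print(value)        # 162
print(latin1_byte)  # b'\xa2'
```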

Note that the graph file must always be a plain text document, not a Word or other rich format file. Any characters not enclosed in "..." or <...> must be ordinary ASCII characters. In particular, all of the DOT keywords such as digraph or subgraph must be ASCII.
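The rule about where non-ASCII characters may appear can be checked mechanically. The following is a rough sketch, not a DOT parser: the function name is hypothetical, comments are not handled, and quoted attribute values inside HTML-like labels are not tracked.

```python
def non_ascii_outside_strings(dot_source):
    """Return (line, column, char) for each non-ASCII character found
    outside "..." strings and <...> HTML-like labels."""
    problems = []
    in_quotes = False
    escaped = False      # previous character was a backslash inside "..."
    html_depth = 0       # nesting depth of <...>
    line, col = 1, 0
    for ch in dot_source:
        col += 1
        if ch == "\n":
            line, col = line + 1, 0
        if in_quotes:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_quotes = False
        elif html_depth > 0:
            if ch == "<":
                html_depth += 1
            elif ch == ">":
                html_depth -= 1
        elif ch == '"':
            in_quotes = True
        elif ch == "<":
            html_depth = 1
        elif ord(ch) > 127:
            problems.append((line, col, ch))
    return problems


# Hypothetical graph: the first two labels are fine, the node name on
# line 4 is not, because it sits outside any quoted or HTML-like string.
sample = (
    'digraph G {\n'
    '  a [label="\u2200x"];\n'
    '  b [label=<\u2200<b>y</b>>];\n'
    '  \u00e9 -> a;\n'
    '}\n'
)
print(non_ascii_outside_strings(sample))
```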

Because we cannot always guess the encoding, you should set the graph attribute charset to UTF-8, Latin1 (alias ISO-8859-1 or ISO-IR-100) or Big-5 for Traditional Chinese. This can be done in the graph file or on the command line. For example, charset=Latin1.

Output: It is essential that a font which has the glyphs for your specified characters is available at final rendering time. The choice of this font depends on the target code generator. For the gd-based raster generators (PNG, GIF, etc.) you need a TrueType or Type-1 font file on the machine running the Graphviz program. If Graphviz is built with the fontconfig library, it will be used to find the specified font. Otherwise, Graphviz will look in various default directories for the font. The directories to be searched include those specified by the fontpath attribute, related environment or shell variables (see the fontpath entry), and known system font directories. (The table at http://www.graphviz.org/doc/char.html points out that its glyphs are from the times.ttf font. With fontconfig, it's hard to specify this font: Times usually gets resolved to the Adobe Type-1 Times font, which doesn't have all the glyphs seen on that page.)

For PostScript, the input must be either the ASCII subset of UTF-8 or Latin-1. (We have looked for more general solutions, but it appears that UTF-8 and Unicode are handled differently for every kind of font type in PostScript, and we don't have time to hack this case-by-case. If someone wants to volunteer to work on this, let us know.)
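Before targeting PostScript, it can be useful to test whether a label stays within Latin-1. The helper below is a hypothetical sketch that relies on Python's own latin-1 codec; it looks only at the raw characters, so a label that uses a numeric reference such as &#8704; would pass even though the character it names is outside Latin-1.

```python
def fits_latin1(text):
    """Return True if every character in text can be encoded as Latin-1."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


print(fits_latin1("5\u00a2"))   # cent sign: within Latin-1
print(fits_latin1("\u2200x"))   # forall sign: outside Latin-1
```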

For SVG output, we just pass the raw UTF-8 (or other encoding) straight through to the generated code.

As you can see, this is a sad state of affairs. Our plan is to eventually migrate Graphviz to the pango text formatting library, to ameliorate the worst of these complications.

GraphvizWiki: FaqNonAscii (last edited 2008-01-17 17:59:03 by H-135-207-131-158)
