Number: 1029
Title: Declared codepage not consistent with data when producing SVG
Submitter: Michel TERRISSE
Date: Fri Oct 6 04:45:39 2006
Subsys: Dot
Version: 2.8
System: x86-Windows-XP
Severity: minor
Problem:
I have node Reseau [label="RÉSEAU"] in a dot file saved in the Windows Latin 1 codepage (1252, very similar to ISO-8859-1 = ISO Latin 1). When I produce SVG, Graphviz generates the header <?xml version="1.0" encoding="UTF-8" standalone="no"?>, which declares that the file is encoded in UTF-8, but it writes 'RÉSEAU' in the Windows Latin 1 codepage. As a result, the SVG file is not well-formed and cannot be opened correctly in any SVG viewer.
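Why the mismatch makes the SVG malformed can be shown with a small Python sketch. It assumes a hypothetical minimal SVG document (not Graphviz's actual output) that declares UTF-8 and embeds the label bytes verbatim, and uses Python's standard XML parser to check well-formedness. In cp1252, 'É' is the single byte 0xC9; in UTF-8 that byte must be followed by a continuation byte (0x80 to 0xBF), and 'S' (0x53) is not one, so the parser rejects the document.

```python
import xml.etree.ElementTree as ET

label = "RÉSEAU"
latin1_bytes = label.encode("cp1252")  # b'R\xc9SEAU': bytes as saved in the Windows Latin 1 file
utf8_bytes = label.encode("utf-8")     # b'R\xc3\x89SEAU': bytes that encoding="UTF-8" promises

def svg_text(label_bytes: bytes) -> bytes:
    """Build a hypothetical minimal SVG that declares UTF-8 and embeds raw label bytes."""
    return (b'<?xml version="1.0" encoding="UTF-8" standalone="no"?>'
            b'<svg xmlns="http://www.w3.org/2000/svg"><text>'
            + label_bytes
            + b'</text></svg>')

def is_well_formed(doc: bytes) -> bool:
    """Return True if the XML parser accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

print(is_well_formed(svg_text(utf8_bytes)))    # True
print(is_well_formed(svg_text(latin1_bytes)))  # False: 0xC9 is not followed by a continuation byte
```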

Note that to type 'É' on a US keyboard, you can use Alt+0201.

Regards,

Michel
Input:

digraph CodePageBug
{
  Reseau [label="RÉSEAU"]
}
Comments:
[erg] Since about April 2005, Latin-1 input must declare this explicitly by setting charset=latin1. See

http://www.graphviz.org/doc/info/attrs.html#d:charset

In fact, more recent versions detect non-UTF-8 input and warn about it.
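Detecting non-UTF-8 input comes down to attempting a strict UTF-8 decode and reporting where it fails. The following is a minimal Python sketch of such a check; `check_utf8` is a hypothetical helper, and its warning text is illustrative only, not the message Graphviz prints.

```python
import sys

def check_utf8(data: bytes, source_name: str = "<input>") -> bool:
    """Return True if data is valid UTF-8; otherwise print a warning to stderr and return False.

    Hypothetical helper: the warning format is illustrative, not Graphviz's.
    """
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError as err:
        bad = data[err.start:err.end]  # the offending byte(s)
        print(f"warning: {source_name}: invalid UTF-8 byte(s) {bad!r} at offset {err.start}; "
              "if the input is Latin-1, set charset=latin1", file=sys.stderr)
        return False

dot_source = 'digraph CodePageBug { Reseau [label="RÉSEAU"] }'
check_utf8(dot_source.encode("utf-8"))   # valid: returns True, no warning
check_utf8(dot_source.encode("cp1252"))  # invalid: warns about b'\xc9', returns False
```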

If you run dot -Tsvg -Gcharset=latin1 on your input, you should find that the SVG output is correct.
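The conversion that charset=latin1 enables can be sketched in Python as below. This is a conceptual illustration of Latin-1 to UTF-8 transcoding, not Graphviz's C implementation, and `latin1_to_utf8` is a hypothetical name. It relies on every ISO-8859-1 byte value being equal to its Unicode code point. Windows codepage 1252 differs from ISO-8859-1 in bytes 0x80 to 0x9F (for example, 0x80 is '€' in cp1252), so a cp1252 file that uses those bytes would be mis-converted; 'É' (0xC9) is the same in both.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Transcode ISO-8859-1 bytes to UTF-8.

    Bytes below 0x80 are ASCII and are copied unchanged. Bytes 0x80..0xFF are
    code points U+0080..U+00FF and become the two-byte form 110xxxxx 10xxxxxx.
    """
    out = bytearray()
    for b in data:
        if b < 0x80:
            out.append(b)
        else:
            out.append(0xC0 | (b >> 6))    # lead byte: top 2 bits of the code point
            out.append(0x80 | (b & 0x3F))  # continuation byte: low 6 bits
    return bytes(out)

src = "RÉSEAU".encode("latin-1")  # b'R\xc9SEAU'
converted = latin1_to_utf8(src)
print(converted)                  # b'R\xc3\x89SEAU'
print(converted.decode("utf-8"))  # RÉSEAU
```

The hand-written loop gives the same result as Python's built-in `src.decode("latin-1").encode("utf-8")`; it is spelled out only to make the byte-level mapping visible.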

[mterrisse] Indeed, if I specify charset="iso-8859-1", node labels are correctly converted to UTF-8 when producing SVG. However, I noticed that this does not work for subgraph labels:

digraph DependancesFichiersBaseCommune
{
  charset="iso-8859-1";

  subgraph clusterFichiersGeneres
  {
    label="Fichiers générés";
    FichierGenere [label="Fichier généré"];
  }
}

In the generated SVG, the subgraph label "Fichiers générés" is not correctly converted to UTF-8, whereas the node label "Fichier généré" is.

[erg] Yes, you're correct. However, this problem has been fixed in the current version.
Owner: ellson
Status: Fixed