subscripts & greek letters in dot edge labels

[Env: graphviz 2.38, Windows 7]

I'm working on a project to create path diagrams for structural equation models in R.
The sem package contains a function, pathDiagram, that does this reasonably well, by constructing the required code for dot.

We use two back-end renderers: dot itself, with -Tpdf, and the R DiagrammeR package, which uses the JavaScript libraries grViz and mermaid.

We recently added code to allow rendering edge labels using Greek letters and subscripts, by
using the UTF-8 character equivalents, e.g.
"Beta" "Β" "Β"
subscripts <- c("₀", "₁", "₂", "₃", "₄", "₅", "₆",
"₇", "₈", "₉")

We find that this works perfectly with DiagrammeR. With dot, we get the Greek letters, but
nothing we have tried allows us to get subscripts from the standard command
dot -T pdf -o file.pdf
All we get are those little boxes with the 4-digit character code.

Is this a bug or limitation of dot? Is there any work-around?

Here is an example of a dot file generated by our software that illustrates this behavior:

digraph "union.sem" {
node [fontname="Helvetica" fontsize=14 fillcolor="transparent" shape=box style=filled];
edge [fontname="Helvetica" fontsize=10];
{rank=min "x1"}
{rank=min "x2"}
"y1" [fillcolor="transparent"]
"y2" [fillcolor="transparent"]
"y3" [fillcolor="transparent"]
"x2" -> "y1" [label="γ̂₁₂=-0.09" color=red penwidth=1.001];
"y1" -> "y2" [label="β₂₁=-0.28" color=red penwidth=1.001];
"x2" -> "y2" [label="γ₂₂=0.06" color=black penwidth=1.001];
"y1" -> "y3" [label="β₃₁=-0.22" color=red penwidth=1.001];
"y2" -> "y3" [label="β₃₁=0.85" color=black penwidth=1.001];
"x1" -> "y3" [label="γ₃₁=0.86" color=black penwidth=1.001];
"x1" -> "x2" [label="σ₁₂=7.14" dir=both color=black penwidth=1.001];
// variable labels:
"y1" [label="Deference"];
"y2" [label="Activism"];
"y3" [label="Sentiment"];
"x1" [label="Years"];
"x2" [label="Age"];
}

Attachment: pathDiagram.pdf (31.31 KB)

renders OK in

The subscripts render correctly in Graphviz 2.39 on Mac OS X, and I'm sure in most versions of Graphviz that support UTF-8 and CairoPango. It appears that on Windows 7, the little box with the numbers is the way the CairoPango renderer indicates that it could not find the needed glyph. I don't have Windows, so I can't try it here.

This discussion is interesting:

It was claimed that the Windows font "Arial Unicode MS" is missing some useful technical or scientific characters, maybe including the subscripts? I wonder if it would help to try other fonts, or if that's somehow wired into the UTF-8 rendering already.

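One way to test the font theory is a minimal sketch like the one below, which renders a single edge label with a different font. The font name "DejaVu Sans" is only an assumption here: it covers the subscript digits U+2080 to U+2089, but it is not installed on Windows by default, so substitute any installed font known to contain those glyphs.

```dot
// fonttest.gv (hypothetical file name)
digraph fonttest {
  // Assumption: "DejaVu Sans" is installed and visible to Graphviz.
  edge [fontname="DejaVu Sans" fontsize=10];
  "y1" -> "y2" [label="β₂₁=-0.28"];
}
```

Rendering it with dot -Tpdf -o fonttest.pdf fonttest.gv should show whether the boxes go away; if they do, the problem lies in the font rather than in dot's UTF-8 handling.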

For most font features, we rely on the underlying renderers such as cairo and svg. You can get access to these features by using HTML-like labels. For example:

digraph G {
  c -> b [label=<&#x392;<sub>21</sub>>]
  a -> b [label=<&#x392;<sub><font point-size="8">21</font></sub>>]
}

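Applied to two of the edges from the file in the question, the HTML-like label approach might look like the sketch below. The Greek letters are left as literal UTF-8, since those reportedly render fine with dot, and only the subscript digits move into <sub> elements. This is untested on Windows 7.

```dot
digraph "union.sem" {
  edge [fontname="Helvetica" fontsize=10];
  // Labels are delimited by < > instead of quotes, so dot parses them as HTML-like labels.
  "y1" -> "y2" [label=<β<sub>21</sub>=-0.28> color=red penwidth=1.001];
  "x2" -> "y2" [label=<γ<sub>22</sub>=0.06> color=black penwidth=1.001];
}
```

Because the digits are then ordinary ASCII, the font no longer needs to supply any subscript glyphs.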


yeah but

who knows what is really happening behind the scenes in R?
