Graphviz Issue Tracker - graphviz
View Issue Details
0002573 graphviz Dot public 2015-10-11 06:02 2015-10-21 08:25
SNoiraud 
 
high major always
new open
Ubuntu trusty 15.04
0002573: dot -Xdot gives incorrect length in T record with utf-8 strings
With the following source:
digraph GRAMPS_graph
{
  _a [ shape="box" style="solid" label=<<TABLE BORDER="0"><TR><TD>ëï éà€ùǜ
Next line</TD></TR></TABLE>> URL="P_a" ];
}

xdot reports a length of 16 for "ëï éà€ùǜ" instead of 8. (16 is the UTF-8 byte count of the string; 8 is its character count.)
To reproduce, download the bug.dot file and run: dot -Txdot -o bug.out bug.dot

In the generated output, we get:
_ldraw_="F 14 11 -Times-Roman c 7 -#000000 T 15 27.3 -1 46 16 -ëï éà€ùǜ F 14 ...
instead of:
_ldraw_="F 14 11 -Times-Roman c 7 -#000000 T 15 27.3 -1 46 8 -ëï éà€ùǜ F 14 ...
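
The two lengths in question can be checked with a short Python sketch. It only illustrates the difference between counting characters and counting UTF-8 bytes for the label above; the variable names are illustrative and are not part of graphviz or of any xdot parser.

```python
# The label from bug.dot, written with escapes so the code points are unambiguous:
# "ëï éà€ùǜ"
label = "\u00eb\u00ef \u00e9\u00e0\u20ac\u00f9\u01dc"

char_count = len(label)                  # number of Unicode code points
byte_count = len(label.encode("utf-8"))  # number of bytes once encoded as UTF-8

print(char_count, byte_count)  # 8 16
```

The value 16 in the T record matches the byte count, which suggests that dot counts bytes rather than characters here.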
No tags attached.
dot bug.dot (382) 2015-10-11 06:02
http://www.graphviz.org/mantisbt/file_download.php?file_id=488&type=bug
Issue History
2015-10-11 06:02 SNoiraud New Issue
2015-10-11 06:02 SNoiraud File Added: bug.dot
2015-10-11 06:08 SNoiraud Note Added: 0000988
2015-10-13 11:56 SNoiraud Note Edited: 0000988 bug_revision_view_page.php?bugnote_id=988#r271
2015-10-13 11:57 SNoiraud Note Edited: 0000988 bug_revision_view_page.php?bugnote_id=988#r272
2015-10-21 08:25 SNoiraud Note Added: 0000990

Notes
(0000988)
SNoiraud   
2015-10-11 06:08   
(edited on: 2015-10-13 11:57)
The graphviz version is 2.36. I found no similar bugs in the issue tracker.
The same problem occurs with graphviz version 2.38 on Arch Linux and Ubuntu 15.10.

I made a mistake in the summary: read -Txdot, not -Xdot.

(0000990)
SNoiraud   
2015-10-21 08:25   
For the moment, I use the following workaround in Python:

        num = self.read_int()
        pos = self.buf.find("-", self.pos) + 1
        npos = pos + num
        # Workaround for graphviz < 2.39: num is larger than the character
        # count for non-ASCII text, so pos + num overshoots the end of the string.
        if float(dotversion_str) < 2.39:
            # Count the non-ASCII characters in the candidate span.
            nb_utf8_chars = sum(1 for c in self.buf[pos:npos] if ord(c) >= 128)
            if nb_utf8_chars > 0:
                # The text ends where the next " F " begins.
                # If " F " is not found, the string is the last field in the buffer.
                end_pos = self.buf.find(" F ", pos)
                if end_pos >= 0:
                    npos = end_pos
        # End of workaround.
        self.pos = npos
        res = self.buf[pos:self.pos]

It works for all my cases.
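
An alternative to searching for " F " is to parse the xdot buffer as UTF-8 bytes and decode the text afterwards. The sketch below assumes the count in the T record is a byte count, which is what the value 16 above suggests. The function `read_xdot_text` and its signature are hypothetical and are not taken from any existing parser.

```python
def read_xdot_text(buf: bytes, pos: int):
    """Read '<n> -<text>' from buf starting at pos.

    Assumes <n> is the length of <text> in bytes (hypothetical helper).
    Returns the decoded text and the position just after it.
    """
    end_of_count = buf.index(b" ", pos)        # the count ends at the next space
    num = int(buf[pos:end_of_count])
    start = buf.index(b"-", end_of_count) + 1  # the text starts right after '-'
    end = start + num                          # slice by bytes, not characters
    return buf[start:end].decode("utf-8"), end


# The label from bug.dot ("ëï éà€ùǜ"), followed by the start of the next operation.
LABEL = "\u00eb\u00ef \u00e9\u00e0\u20ac\u00f9\u01dc"
record = ("16 -" + LABEL + " F 14").encode("utf-8")

text, new_pos = read_xdot_text(record, 0)
print(text == LABEL, record[new_pos:])  # True b' F 14'
```

Because the slice is taken on the encoded bytes, the count 16 lands exactly at the end of the label, and no version check is needed under that assumption.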