Order of input in .dot file impacts pdf generation

Hi,

I was wondering why (and if there are some rules to know, e.g., which elements are important) the order of elements in the .dot files impact the generated output.
Considering the example file: https://raw.githubusercontent.com/ellson/graphviz/master/doc/dotguide/as...
If I create asde91_2.dot with:
$ diff asde91.dot asde91_2.dot
3a4,9
>
> { rank = same;
> "Software IS"; "Configuration Mgt"; "Architecture & Libraries";
> "Process";
> };
>
17,20d22
< { rank = same;
< "Software IS"; "Configuration Mgt"; "Architecture & Libraries";
< "Process";
< };

The output (made with dot -Tpdf asde91_2.dot > asde91_2.pdf) is really different, and far from what is wanted -- see the "history line" on the right.
Thanks.

.Y

If a graph is

If a graph is underconstrained (especially if it has cycles, as this one does), the dot code resolves things by giving preference to nodes and edges that appear earlier in the graph. This is a reasonable default, as people usually create graphs by starting with "root" nodes. (It is also why using a sophisticated algorithm to break cycles usually produces an unexpected output.) In this case, switching the lines as you did causes the depth-first search used to remove cycles to start with the main component. It finds nmake before ksh, so ksh will appear above nmake. The rank=same constraints thus force 1985 to appear above 1983.

So, (vague) Rule 1: Put important nodes first.

Rule 2: If your graph has cycles and you expect a specific order on output, add sufficient constraints to force the ordering.

 

 

Order made by libcgraph: is there an issue?

Thank you for your answer.
I'm actually not too surprised and wanted confirmation of such an intuitive fact.
I'm trying to make the link with the problem I'm having with Pygraphviz, and the answer I had about its use of libcgraph (see https://github.com/pygraphviz/pygraphviz/issues/14 for the whole small discussion).

The order specified by my python script, which analyzes a logfile to make the according graph, is not the same in the .dot output. I thought that it could be inherent to some ordered structure used in python, but I've been told that Pygraphviz does not interfere in the order, using "libcgraph's agwrite agwrite function" to export the graph to .dot.
I haven't used the c library. But I'm wondering if there is any issue in libcgraph concerning the order managed with the API and the generated order in output.

To a large extent, the output

To a large extent, the output done via agwrite() reflects the same biases the occur in layout, since both use the order of nodes and edges inherent in the graph due to how the graph was built (or parsed). If you are finding some disparity between what you are generating withb pygraphviz and what you are seeing in the dot file, please post a small example and perhaps I can see what's happening.

Order of adding a subgraph not taken into account

The script builds the .dot that you can find here: http://pastebin.com/raw.php?i=ePZ614kb
The output.pdf is not what I want, the subgraphes cluster$i box not beginning at the same level.
Trying to tune it by hand, I knew that having the following subgraph containing the first elements of the clusters at the end of the .dot would give the correct output.pdf but with a warning like "201_0 was already in a rankset, ignored in cluster _anonymous_0", and taking much time to generate with the whole graphe.

        {
                graph [rank=same];
                "201_0"          [rank=source,
                        style=invis];
                "201_1"          [rank=source];
                "201_2"          [rank=source];
                "201_3"          [rank=source,
                        style=invis];
        } 
Thus I added right before generating the .dot a cluster with the following lines:
g=G.add_subgraph(None, rank='same')
for n in subgraph_info.keys():
    g.add_node( str(201)+'_'+str(n) )

Compared to the previous output.dot, the resulting script:
- adds the subgraph prior to the cluster$i that were added before -- this is the point that I don't understand.
- removes each first element of each cluster (i.e.,   "201_0"          [rank=source, .....")
- adds the element "201_$i", for each cluster$i
- prints a warning like "Warning: 0 -> 201_0: head not inside head cluster cluster0"
Here's the output.dot: http://pastebin.com/raw.php?i=qbUuGsRQ

Short answer: remove

Short answer: remove style=invis from the nodes, add newrank=true to your graph, lay out the graph, and then look for further constraints to add. Then put back the style=invis.

Long answer: Don't start with things invisible. You won't see tell-tale anomalies.

You want some very specific constraints on the layout, and relying on the order in which nodes, edges and clusters are added is not strong enough, especially as pygraphviz is just a veneer on top of the graphviz internals.

In your case, you want clusters and a same rank constraint across nodes in different clusters. The default rank algorithm doesn't allow this. A node can only be in one constrained subgraph. This is why you are getting the warning. The node 201_0 is not put into cluster 0 because it is in the same rank subgraph. You would see this if the node weren't invisible.

A while back we introduced a new ranking algorithm that allows nodes to be multiply constrained. If you add newrank=true, you should be getting close to what you want.

By the way, you force 201_1 to be on the same rank as 201_3, which is above 401_3, but then there is an edge from 401_3 to 201_1, which necessarily will be pointing up. Is that your intention?

 

I want "clusters and a same

I want "clusters and a same rank constraint across nodes in different clusters" indeed. And thus yes, it is my intention to have all nodes prefixed with the same number at the same rank in the root graph: edges pointing up or down is one of the interesting information of the generated graphes.
Having tried different things, I came to use invisible: making all subgraphes with the same number of nodes and with relations between each other seemed the best, and maybe the only, thing to do at my level of understanding and from some searches on the net.
I just tried manually the addition of newrank=true in the definition of G on the first previous example.  There is no difference in the generated pdf. Am I using this uncorrectly?

Node 201_0 is indeed not put in cluster 0 in the second graph. And dropping the 'invis' is not necessary to realize this from the generated pdf: even if all 201_$i were 'invis', which is not the case, there are edges from $i to 201_$i making things quiet clear. I'm more understanding the warning because of the assumption that "A node can only be in one constrained subgraph".

Yet don't forget that output.dot is generated from a script. And I wonder if
- it is an expected behavior that the generated .dot does not take into account the subgraph addition order.
  -- regardless my problem and the presence of errors or not...
- if at least there shouldn't be an error instead of having the API generate a .dot after having removed a node from a subgraph to put it into another subgraph, and finally added another node with the same name minus its characteristics to the first subgraph.
  -- regardless the position the subgraph is added in the generated .dot
 

Adding invisible components

Adding invisible components is sometimes necessary to force certain global structure. My recommendation was to first keep everything visible, to see what's happening in order to make adjustments, and then make them invisible again. I find this helpful; it certainly helped me understand what your graph. If you find you can understand the layout without this, fine.

newrank=true won't affect the first graph since that graph doesn't try to do multiple subgraph constraints on nodes.

- it is an expected behavior that the generated .dot does not take into account the subgraph addition order.

This is expected behavior, in that writing out a dot file makes no guarantees as to preserving any order. The order used often reflects the addition order, at least for nodes and subgraphs, as these are kept in ordered lists, with new objects appended at the end. On output, these lists are written from start to end. However, if you need certain constraints, you can't necessarily rely on the ordering; the constraints should be made explicit.

- if at least there shouldn't be an error instead of having the API generate a .dot after having removed a node from a subgraph to put it into another subgraph, and finally added another node with the same name minus its characteristics to the first subgraph.

It has always been our approach to try to generate something, even if there are problems. You did receive a warning message that something didn't work. The layout did not remove a node from one subgraph and put it into another, nor add any new nodes. Without newrank=true, your input graph is unacceptable. dot simply interpreted the graph structure as best it could, ignoring certain specifications and generating a warning.

 

Actually, newrank=true does

Actually, newrank=true does not affect the second graph either... and I have the same warning at the pdf generation. - it is an expected behavior that the generated .dot does not take into account the subgraph addition order. You made the choice to not respect the order of input made with the API when generating the .dot output. I have trouble to understand why: it makes it harder to debug a code generating the .dot, and it is much less what the API user would like à priori -- here again, regardless the wanted constraints. - if at least there shouldn't be an error instead of having the API generate a .dot after having removed a node from a subgraph to put it into another subgraph, and finally added another node with the same name minus its characteristics to the first subgraph. Actually the generation of the pdf with 'dot' indeed generates warnings. But there is nothing during the .dot generation, no warning, no error... And the .dot is generated with a call to G.write(). Do you use 'dot' to write the graph to disk? And yes, I confirm that a node is more or less removed and added: in my python script, - I add nodes with their label in the cluster_$i subgraphes, and - I add the '2constraint' subgraph at the very end, and with it (containing) the first nodes of cluster_$i, but without the label. - Writing the graph to disk (in the resulting .dot file then) shows theses nodes with the label in the last added subgraph, and the nodes without the label at the corresponding place in the cluster_$i subgraphes, place where I added them with their label. So Maybe the internal structures used by the API cannot handle this, or maybe it is done when the graph is written to disk, but it's really confusing... I understand that 'dot' tries to make everything possible to accept the graph during the .pdf generation, but here it's the writing of the wanted graph, input of the program dot.

Actually, newrank=true does

Actually, newrank=true does not affect the second graph either... and I have the same warning at the pdf generation.

If that's the case, you must be using an older version of graphviz.

it is an expected behavior that the generated .dot does not take into account the subgraph addition order. You made the choice to not respect the order of input made with the API when generating the .dot output. I have trouble to understand why: it makes it harder to debug a code generating the .dot, and it is much less what the API user would like à priori -- here again, regardless the wanted constraints.

Except for lexical issues, it is good language design, especially for declarative languages such as dot, to make semantics independent of concrete syntax. If there are features you want during layout, they should be specified explicitly.

if at least there shouldn't be an error instead of having the API generate a .dot after having removed a node from a subgraph to put it into another subgraph, and finally added another node with the same name minus its characteristics to the first subgraph. Actually the generation of the pdf with 'dot' indeed generates warnings. But there is nothing during the .dot generation, no warning, no error...

If you feel this is the case, please submit a bug report. I believe you are confusing what is allowed in the language versus what is supported by the layout program. There is no reason to generate an error or warning when you write the dot file. The output is perfectliy legal dot, and can be used by the various Graphviz programs without problems, or even with dot if you use newrank=true. What you are encountering is a limitation in the dot layout program. It cannot handle the graph you have given it, so it gives a warning and makes some choices to resolve the things it can't handle and produces a layout. To you, this looks like nodes are being deleted and added. Rather, the dot layout picks some set of consistent constraints and uses them. I suspect the number of nodes in the input graph and the number of nodes in the picture are the same. If not, please send me an example.

I add nodes with their label in the cluster_$i subgraphes, and - I add the '2constraint' subgraph at the very end, and with it (containing) the first nodes of cluster_$i, but without the label. - Writing the graph to disk (in the resulting .dot file then) shows theses nodes with the label in the last added subgraph, and the nodes without the label at the corresponding place in the cluster_$i subgraphes, place where I added them with their label.

It sounds like what you are saying is that you create the graph

digraph {
  subgraph cluster0 {
     "301_0"          [label=< 301<br/>1693 >];
     "401_0"
     "301_0" -> "401_0"
     "501_0"          [label=< 501<br/>2140 >];
     "401_0" -> "501_0"  
  }
  subgraph cluster1 {
     "301_1"          [label=< 301<br/>3461 >];
     "401_1"          [label=< 401<br/>2116 >];
     "301_1" -> "401_1"
     "501_1"          [label=< 501<br/>1671 >];
     "401_1" -> "501_1"    
  }
  subgraph {
    rank=same
    "301_0" "301_1"
    
  }
}
but when you write this to a file, you see

digraph {
        {
                graph [rank=same];
                "301_0"          [label=< 301<br/>1693 >];
                "301_1"          [label=< 301<br/>3461 >];
        }
        subgraph cluster0 {
                "301_0";
                "301_0" -> "401_0";
                "501_0"          [label=< 501<br/>2140 >];
                "401_0" -> "501_0";
        }
        subgraph cluster1 {
                "301_1";
                "401_1"          [label=< 401<br/>2116 >];
                "301_1" -> "401_1";
                "501_1"          [label=< 501<br/>1671 >];
                "401_1" -> "501_1";
        }
}

Semantically, these are the same graphs. Both graphs have a single node call 301_1. It has label=< 301<br/>3461 > and it is a member of cluster1 and the anonymous subgraph.

 

 

 

Recent comments