Tips on usefully visualizing non-trivial graphs

I'm working on providing some visualization of the dependency graph for a new build tool. While I was easily able to export the basic data in DOT format and to visualize toy examples, any real project seems to end up an unwieldy mess.

Here are my results from a small hobby project which I am using as a test case:
dependency mess.svg
dependency mess.gv
   sfdp -Gsize=60! -Goverlap=prism -Tsvg -o out.svg in.gv

To be honest, I'm very new to the whole business of automatic visualization, so I'm probably doing this all wrong. My gut feeling is that a statically rendered diagram might not be of much use for this application; some type of interactive visualization that lets the user trace local sub-sections of the graph might be preferable.

Any tips on the "correct" visualization tools or improved rendering switches would be much appreciated.

Dependency graphs usually

Dependency graphs usually have an inherent "flow", so you will probably get a more insightful drawing by using dot rather than a force-directed algorithm like sfdp. That said, if you have a complex graph with lots of dependencies, it is hard to get a clear drawing. There are some non-domain-specific tricks you can use.

First, you can use the tred filter to do a transitive reduction, which removes edges that are already implied by a longer path.
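To illustrate what a transitive reduction does, here is a minimal Python sketch. It is not how tred is implemented; it assumes the graph is acyclic and given as a list of (source, target) pairs without duplicates, and the function name is made up for the example.

```python
def transitive_reduction(edges):
    """Return the edges of a DAG that are not implied by a longer path."""
    # Build a successor map that contains every node, even sinks.
    succ = {}
    for src, dst in edges:
        succ.setdefault(src, set()).add(dst)
        succ.setdefault(dst, set())

    def reachable(start):
        # All nodes reachable from `start` via one or more edges.
        seen, stack = set(), list(succ[start])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(succ[node])
        return seen

    kept = []
    for src, dst in edges:
        # An edge is redundant if dst can be reached via another successor of src.
        redundant = any(dst in reachable(mid) for mid in succ[src] if mid != dst)
        if not redundant:
            kept.append((src, dst))
    return kept


# "app" -> "util" is implied by "app" -> "lib" -> "util", so it is dropped.
edges = [("app", "lib"), ("lib", "util"), ("app", "util")]
print(transitive_reduction(edges))
```

The reduced graph has the same reachability as the original, which is why the drawing loses clutter without losing dependency information.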

Second, you have lots of nodes with high fan-out or fan-in. You can use the unflatten filter to stagger these. (Alternatively, replace the nodes with a single node representing all of them. Also, in dependency graphs, there can be nodes that are part of the generic background and have high degree, but don't really add much information to the drawing while causing a good deal of clutter. It is best just to remove these from the graph. For example, in a C program, you probably don't want to show the dependencies on libc.)
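Removing such background nodes is easy to do before the data ever reaches Graphviz. A rough Python sketch, assuming the graph is available as (source, target) pairs; the function name, its parameters, and the sample file names are all hypothetical:

```python
from collections import Counter


def drop_background_nodes(edges, names=(), max_in_degree=None):
    """Remove every edge touching a node that is listed in `names`
    or that has more than `max_in_degree` incoming edges."""
    in_degree = Counter(dst for _, dst in edges)
    dropped = set(names)
    if max_in_degree is not None:
        dropped |= {node for node, deg in in_degree.items() if deg > max_in_degree}
    return [(s, d) for s, d in edges if s not in dropped and d not in dropped]


edges = [("a.o", "libc"), ("b.o", "libc"), ("prog", "a.o"), ("prog", "b.o")]
# Drop by explicit name ...
print(drop_background_nodes(edges, names={"libc"}))
# ... or by a degree threshold.
print(drop_background_nodes(edges, max_in_degree=1))
```

An explicit name list is the safer choice; a degree threshold can also remove nodes that are busy but meaningful.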

It would also be good to add some extra vertical space between rows.

Combining these, one could do

   tred mess.gv | unflatten -f -l10 | dot -Granksep=3 -Tsvg > out.svg

To my eyes, this cleans up the drawing a good deal. You can see a lot of the structure but there is still a lot that can be done.

The labels on some of your nodes are extremely long. You would probably do better to use line breaks in the labels, or to use something like tooltips to pop up the full label in an interactive viewer.
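As a sketch of the label idea, the exporter could wrap each label and keep the full text in the tooltip attribute, which SVG output shows on hover. The helper below is hypothetical, and it assumes node_id is already a valid DOT identifier:

```python
import textwrap


def node_statement(node_id, full_label, width=30):
    """Build a DOT node statement with a wrapped label and the
    unwrapped text as tooltip."""
    def esc(text):
        # Escape backslashes and double quotes for a DOT quoted string.
        return text.replace("\\", "\\\\").replace('"', '\\"')

    # Wrap on whitespace only; over-long words are split at `width`.
    wrapped = textwrap.wrap(full_label, width=width, break_on_hyphens=False)
    if not wrapped:
        wrapped = [full_label]
    # A literal backslash-n is the DOT escape for a centered line break.
    label = "\\n".join(esc(line) for line in wrapped)
    return f'{node_id} [label="{label}", tooltip="{esc(full_label)}"];'


print(node_statement("n1", "cc -c main.c -o main.o", width=10))
```

The width is a matter of taste; something near the length of a typical file name tends to keep nodes compact.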

At this point, you may need to consider domain-specific modifications. There are still some very high-degree nodes that cause a lot of clutter. As I wrote above, if these are generic, maybe they should be removed. If not, maybe there is some other way to represent them. For example, you have the one node starting "tools\ld65..." Perhaps many of the nodes pointing into it should be put in a cluster, with a single edge from the cluster to the ld65 node. In general, do you really need all of the information in a single graph? One might factor the dependency information using different scales, or across different dimensions.
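The cluster idea could look roughly like the DOT produced by this Python sketch. It relies on the compound graph attribute and the ltail edge attribute, which are honored by dot only, and on the rule that a subgraph is drawn as a cluster when its name starts with "cluster". The function name, cluster name, and node names are placeholders, and names are assumed to contain no quotes or backslashes.

```python
def cluster_fan_in(sources, target, cluster_name="cluster_inputs"):
    """Emit DOT that groups `sources` in one cluster and draws a single
    edge from the cluster boundary to `target`."""
    if not sources:
        raise ValueError("need at least one source node")
    lines = [
        "digraph deps {",
        "  compound=true;",  # allow edges to be clipped at cluster borders
        f"  subgraph {cluster_name} {{",
    ]
    for name in sources:
        lines.append(f'    "{name}";')
    lines.append("  }")
    # The edge formally starts at one member; ltail clips it at the cluster.
    lines.append(f'  "{sources[0]}" -> "{target}" [ltail={cluster_name}];')
    lines.append("}")
    return "\n".join(lines)


print(cluster_fan_in(["a.o", "b.o", "c.o"], "ld65"))
```

This trades precision for readability: the drawing no longer shows which individual inputs feed the target, only that the group does.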


That's much better

Thank you! That's much better.

As you correctly point out, these dependency graphs are typically highly regular and contain much irrelevant cruft which is best left out. To be clear, I'm not interested in this particular graph per se; rather, I am working on a visualization feature for (yet another) make replacement to help users analyse and debug their scripts. Unfortunately this means that it is difficult for the build tool to make any general assumptions about what may or may not be left out, or about how the data is logically organized.

I guess what you're saying is that some type of annotation allowing the user to filter and/or weigh the relevant paths is key. For instance, let the user mark the "primary" sources of a rule (e.g. the C file being compiled and not the thousand-and-one header files included) instead of relying on long and unwieldy command lines to identify nodes.

Perhaps part of the problem is also that I'm attempting to "visualize" the graph in the abstract without a clear task in mind. While a general big-picture view might be useful, it seems more likely to me that this feature would be pulled out when a user wants to figure out why touching a particular file rebuilds half the project or, conversely, why something which ought to be rebuilt is missed.
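For the "why does touching this file rebuild half the project" case, the relevant sub-graph is just everything that transitively depends on the touched file. A minimal Python sketch, assuming edges are (target, dependency) pairs; the function and file names are invented for illustration:

```python
from collections import deque


def affected_by(edges, touched):
    """Return every target that transitively depends on `touched`."""
    # Invert the edges: dependency -> set of targets that use it directly.
    dependents = {}
    for target, dep in edges:
        dependents.setdefault(dep, set()).add(target)

    # Breadth-first walk up the dependency chain.
    seen, queue = set(), deque([touched])
    while queue:
        node = queue.popleft()
        for target in dependents.get(node, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


edges = [
    ("prog", "a.o"), ("prog", "b.o"),
    ("a.o", "a.c"), ("a.o", "common.h"),
    ("b.o", "b.c"), ("b.o", "common.h"),
]
print(sorted(affected_by(edges, "common.h")))
print(sorted(affected_by(edges, "a.c")))
```

Rendering only the returned nodes plus the touched file would give a much smaller picture than the full graph.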

I'll need to give this a bit of thought and do some experimentation. In retrospect I was naive in expecting Graphviz to somehow magically produce a clear picture of inherently ugly data.
