Graphviz Issue Tracker
Mantis Bug Tracker

ID: 0002425
Project: graphviz
Category: Output Generation
View Status: public
Date Submitted: 2014-02-25 14:20
Last Update: 2014-03-25 10:05
Reporter: sparr
Assigned To:
Priority: normal
Severity: important
Reproducibility: random
Status: resolved
Resolution: fixed
Platform: x86_64
OS: OSX
OS Version: 10.9
Summary: 0002425: random segfault in png:cairo:cairo and png:quartz:quartz in common call to fread() then flockfile()
Description: I am working with large dot files (100-400kb), containing either a single large digraph (200-400 nodes, 300-600 edges) or many (100+) smaller digraphs (3-30 nodes, 2-50 edges).

While using dot or neato to render these graphs, I am encountering a "Segmentation Fault: 11" at a randomly selected point during the render (sometimes on the 10th step, sometimes on the 100th, sometimes on the 1000th). The problem occurs whether I am using png:quartz:quartz or png:cairo:cairo, as their code paths coincide in the relevant spot. I have tracked the problem down as follows:

The segfault is caused by an EXC_BAD_ACCESS in libsystem_c.dylib flockfile(). This happens when flockfile is called by fread or fseek (different code paths for cairo and quartz, respectively) with a null file pointer. That null file pointer is produced when gvusershape_find() is called with an empty string as a parameter (the string should usually be the path/name of the image file), returning a usershape_t with the 'f' field set to zero.

That string should be initialized in shapes.c:2887 as follows:
    name = ND_shape(n)->name;

This is as far down the rabbit hole as I was able to go on my own.
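
The failure mode described above, a lookup record whose file handle is null reaching fread(), can be sketched in isolation. This is a minimal illustration, not Graphviz code: `shape_file` and `read_shape_bytes` are hypothetical names, and the struct only models the 'f' field of usershape_t mentioned in the description.

```c
#include <stdio.h>

/* Hypothetical stand-in for the part of usershape_t that matters here:
 * a name and the FILE handle stored in its 'f' field. */
typedef struct {
    const char *name;
    FILE *f;
} shape_file;

/* Reads up to len bytes from the shape's file.
 * Returns the number of bytes read, or -1 when there is no open file,
 * instead of handing a null FILE pointer to fread(). */
static long read_shape_bytes(const shape_file *us, unsigned char *buf, size_t len)
{
    if (us == NULL || us->f == NULL)
        return -1;
    return (long)fread(buf, 1, len, us->f);
}
```

A guard like this only turns the crash into a load failure; it does not explain why the name was empty in the first place.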

I am attaching a tarball containing three files:
1.png is a randomly chosen small png file from my normal usage
test.dot.lua is a lua script that outputs a large dot file (50 digraphs, 400 nodes and 800 edges each)
test.sh symlinks 1.png to 2.png through 200.png, then runs test.dot.lua and pipes the output to dot
Steps To Reproduce:
1. ungzip/untar the attached file
2. run test.sh (which requires lua and dot)
Tags: No tags attached.
Attached Files: graphviz_png_segfault_test.tar.gz (Attachment missing)

-  Notes
(0000690)
sparr (reporter)
2014-02-25 14:21

The problem persists with #define MAX_USERSHAPE_FILES_OPEN 1

The problem does not occur with jpg instead of png input images
(0000691)
ellson (administrator)
2014-02-25 16:04

Great test case, thanks. I was able to reproduce the problem on Fedora-20.

I'm trying this diff, hoping to get an error message instead of a crash, but now I'm not seeing the bug at all. Can you see if it helps you?

diff --git a/plugin/pango/gvloadimage_pango.c b/plugin/pango/gvloadimage_pango.c
index 41c93ce..2d2325d 100644
--- a/plugin/pango/gvloadimage_pango.c
+++ b/plugin/pango/gvloadimage_pango.c
@@ -41,6 +41,8 @@ typedef enum {
 static cairo_status_t
 reader (void *closure, unsigned char *data, unsigned int length)
 {
+    if ((FILE *)closure == NULL)
+        return CAIRO_STATUS_READ_ERROR;
     if (length == fread(data, 1, length, (FILE *)closure)
      || feof((FILE *)closure))
         return CAIRO_STATUS_SUCCESS;
(0000692)
sparr (reporter)
2014-02-25 16:37

I already implemented a similar patch. This does resolve the segfault, but of course results in the files in question not being loaded (and for a thousand-image graph, there might be 2-10 instances of this bug).

I don't know if/when CAIRO_STATUS_READ_ERROR would result in an error message visible to the user.
(0000693)
ellson (administrator)
2014-02-25 17:14

we can also try returning CAIRO_STATUS_SUCCESS ... trying now ...
(0000694)
sparr (reporter)
2014-02-25 17:19

The return value isn't important. Skipping the fread means that the file won't be loaded.

I think the solution here is to be found upwards in the call stack. I determined that the null file pointer is coming from an empty string where there should be a filename being passed to gvusershape_find(), and that empty string is coming from ND_shape(n)->name in shapes.c on line 2887.

I am hopeful that someone will see this bug who is knowledgeable enough about the internal object representation to follow that a few steps further up.
(0000695)
sparr (reporter)
2014-02-26 10:04

crash logs and stack traces using cairo and quartz, respectively:

Process 10671 stopped
* thread #1: tid = 0x2fcdc4, 0x00007fff92d7af6a libsystem_c.dylib`flockfile + 18, queue = 'com.apple.main-thread, stop reason = EXC_BAD_ACCESS (code=1, address=0x68)
    frame #0: 0x00007fff92d7af6a libsystem_c.dylib`flockfile + 18
libsystem_c.dylib`flockfile + 18:
-> 0x7fff92d7af6a: movq 104(%rbx), %rdi
   0x7fff92d7af6e: cmpq %r14, 72(%rdi)
   0x7fff92d7af72: je 0x7fff92d7af92 ; flockfile + 58
   0x7fff92d7af74: addq $8, %rdi
(lldb) bt
* thread #1: tid = 0x2fcdc4, 0x00007fff92d7af6a libsystem_c.dylib`flockfile + 18, queue = 'com.apple.main-thread, stop reason = EXC_BAD_ACCESS (code=1, address=0x68)
    frame #0: 0x00007fff92d7af6a libsystem_c.dylib`flockfile + 18
    frame #1: 0x00007fff92d7cca6 libsystem_c.dylib`fread + 31
    frame #2: 0x0000000100111d7f libgvplugin_pango.6.dylib`reader(closure=0x0000000000000000, data=<unavailable>, length=<unavailable>) + 31 at gvloadimage_pango.c:44
    frame #3: 0x000000010038ccd5 libcairo.2.dylib`stream_read_func + 43
    frame #4: 0x0000000100771b90 libpng15.15.dylib`png_read_sig + 66
    frame #5: 0x000000010076baf5 libpng15.15.dylib`png_read_info + 46
    frame #6: 0x000000010038c8e0 libcairo.2.dylib`read_png + 186
    frame #7: 0x000000010038cc5a libcairo.2.dylib`cairo_image_surface_create_from_png_stream + 25
    frame #8: 0x0000000100111cb0 libgvplugin_pango.6.dylib`cairo_loadimage(job=<unavailable>, us=0x000000010485cf30) + 128 at gvloadimage_pango.c:78
    frame #9: 0x0000000100111dce libgvplugin_pango.6.dylib`pango_loadimage_cairo(job=<unavailable>, us=0x000000010485cf30, b=boxf at 0x00007fff5fbff4a0, filled=<unavailable>) + 30 at gvloadimage_pango.c:99
    frame #10: 0x000000010000ba39 libgvc.6.dylib`gvloadimage(job=<unavailable>, us=<unavailable>, b=boxf at 0x00007fff5fbff580, filled=<unavailable>, target=<unavailable>) + 249 at gvloadimage.c:62
    frame #11: 0x000000010000a9a7 libgvc.6.dylib`gvrender_usershape(job=<unavailable>, name=<unavailable>, a=<unavailable>, n=<unavailable>, filled='\0', imagescale=<unavailable>) + 1079 at gvrender.c:777
    frame #12: 0x000000010003caa9 libgvc.6.dylib`poly_gencode(job=0x000000010120e600, n=<unavailable>) + 2457 at shapes.c:2926
    frame #13: 0x000000010004bf1e libgvc.6.dylib`emit_node(job=0x000000010120e600, n=0x00000001048471d0) + 2286 at emit.c:1908
    frame #14: 0x000000010004a582 libgvc.6.dylib`emit_graph [inlined] emit_view(flags=13056, job=<unavailable>, g=<unavailable>) + 486 at emit.c:3338
    frame #15: 0x000000010004a39c libgvc.6.dylib`emit_graph [inlined] emit_page(job=0x000000010120e600, g=0x00000001048781e0) + 2792 at emit.c:3439
    frame #16: 0x00000001000498b4 libgvc.6.dylib`emit_graph(job=0x000000010120e600, g=0x00000001048781e0) + 2660 at emit.c:3496
    frame #17: 0x000000010005075c libgvc.6.dylib`gvRenderJobs(gvc=0x00000001002040a0, g=0x00000001048781e0) + 6108 at emit.c:4100
    frame #18: 0x0000000100003d05 dot`main(argc=<unavailable>, argv=<unavailable>) + 1301 at dot.c:192
    frame #19: 0x00007fff8addd5fd libdyld.dylib`start + 1
    frame #20: 0x00007fff8addd5fd libdyld.dylib`start + 1


Process 65216 stopped
* thread #1: tid = 0x30d140, 0x00007fff92d7af6a libsystem_c.dylib`flockfile + 18, queue = 'com.apple.main-thread, stop reason = EXC_BAD_ACCESS (code=1, address=0x68)
    frame #0: 0x00007fff92d7af6a libsystem_c.dylib`flockfile + 18
libsystem_c.dylib`flockfile + 18:
-> 0x7fff92d7af6a: movq 104(%rbx), %rdi
   0x7fff92d7af6e: cmpq %r14, 72(%rdi)
   0x7fff92d7af72: je 0x7fff92d7af92 ; flockfile + 58
   0x7fff92d7af74: addq $8, %rdi
(lldb) bt
* thread #1: tid = 0x30d140, 0x00007fff92d7af6a libsystem_c.dylib`flockfile + 18, queue = 'com.apple.main-thread, stop reason = EXC_BAD_ACCESS (code=1, address=0x68)
    frame #0: 0x00007fff92d7af6a libsystem_c.dylib`flockfile + 18
    frame #1: 0x00007fff92d7d48f libsystem_c.dylib`fseek + 53
    frame #2: 0x00007fff8afaec60 CoreGraphics`CGAccessSessionCreate + 126
    frame #3: 0x00007fff8af88322 CoreGraphics`CGDataProviderCopyData + 220
    frame #4: 0x00007fff8ef1ad38 ImageIO`CGImageReadCreateWithProvider + 229
    frame #5: 0x00007fff8ef1abbf ImageIO`CGImageSourceCreateWithDataProvider + 202
    frame #6: 0x000000010010ca72 libgvplugin_quartz.6.dylib`quartz_loadimage_quartz [inlined] quartz_loadimage(job=0x00000001002061c0) + 200 at gvloadimage_quartz.c:131
    frame #7: 0x000000010010c9aa libgvplugin_quartz.6.dylib`quartz_loadimage_quartz(job=0x00000001002061c0, us=0x000000010035c250, b=boxf at 0x00007fff5fbff4a0, filled=<unavailable>) + 26 at gvloadimage_quartz.c:164
    frame #8: 0x000000010000ba39 libgvc.6.dylib`gvloadimage(job=<unavailable>, us=<unavailable>, b=boxf at 0x00007fff5fbff580, filled=<unavailable>, target=<unavailable>) + 249 at gvloadimage.c:62
    frame #9: 0x000000010000a9a7 libgvc.6.dylib`gvrender_usershape(job=<unavailable>, name=<unavailable>, a=<unavailable>, n=<unavailable>, filled='\0', imagescale=<unavailable>) + 1079 at gvrender.c:777
    frame #10: 0x000000010003caa9 libgvc.6.dylib`poly_gencode(job=0x00000001002061c0, n=<unavailable>) + 2457 at shapes.c:2926
    frame #11: 0x000000010004bf1e libgvc.6.dylib`emit_node(job=0x00000001002061c0, n=0x0000000103136260) + 2286 at emit.c:1908
    frame #12: 0x000000010004a582 libgvc.6.dylib`emit_graph [inlined] emit_view(flags=205568, job=<unavailable>, g=<unavailable>) + 486 at emit.c:3338
    frame #13: 0x000000010004a39c libgvc.6.dylib`emit_graph [inlined] emit_page(job=0x00000001002061c0, g=0x0000000103146ce0) + 2792 at emit.c:3439
    frame #14: 0x00000001000498b4 libgvc.6.dylib`emit_graph(job=0x00000001002061c0, g=0x0000000103146ce0) + 2660 at emit.c:3496
    frame #15: 0x000000010005075c libgvc.6.dylib`gvRenderJobs(gvc=0x00000001002040a0, g=0x0000000103146ce0) + 6108 at emit.c:4100
    frame #16: 0x0000000100003d05 dot`main(argc=<unavailable>, argv=<unavailable>) + 1301 at dot.c:192
    frame #17: 0x00007fff8addd5fd libdyld.dylib`start + 1
    frame #18: 0x00007fff8addd5fd libdyld.dylib`start + 1
(0000696)
ellson (administrator)
2014-02-26 10:38

This should fix it:

diff --git a/lib/common/shapes.c b/lib/common/shapes.c
index 74d0c89..9b967c1 100644
--- a/lib/common/shapes.c
+++ b/lib/common/shapes.c
@@ -2886,9 +2886,9 @@ static void poly_gencode(GVJ_t * job, node_t * n)
     if (ND_shape(n)->usershape) {
        name = ND_shape(n)->name;
        if (streq(name, "custom"))
-           name = agget(n, "shapefile");
-       usershape_p = TRUE;
-    } else if ((name = agget(n, "image"))) {
+           if ((name = agget(n, "shapefile")) && name[0])
+               usershape_p = TRUE;
+    } else if ((name = agget(n, "image")) && name[0]) {
        usershape_p = TRUE;
     }
     if (usershape_p) {
(0000697)
sparr (reporter)
2014-02-26 15:08

I think that patch just leaves usershape_p = FALSE when agget(n, "shapefile") returns an empty string. It seems like this will just cause the renderer to skip the image in question, instead of segfaulting when failing to read it. This is hard to verify since it's a single image out of thousands, but I will run the large test case and compare the different results to confirm.
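
To make the concern concrete, here is a standalone sketch of the branch logic as the patch reads. It is not the Graphviz source: `wants_usershape` is a hypothetical function, `is_usershape` and `shape_name` model ND_shape(n)->usershape and ND_shape(n)->name, and `shapefile` and `image` model the values returned by agget(n, "shapefile") and agget(n, "image").

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Mirrors the patched branch: the flag is only set when the chosen
 * attribute is present and non-empty. *name_out receives whatever
 * 'name' ended up pointing at, which may be NULL or "". */
static bool wants_usershape(bool is_usershape, const char *shape_name,
                            const char *shapefile, const char *image,
                            const char **name_out)
{
    const char *name = NULL;
    bool usershape_p = false;

    if (is_usershape) {
        name = shape_name;
        if (strcmp(name, "custom") == 0)
            if ((name = shapefile) && name[0])
                usershape_p = true;
    } else if ((name = image) && name[0]) {
        usershape_p = true;
    }
    *name_out = name;
    return usershape_p;
}
```

With an empty `image` string the function returns false, so the node would be drawn without its image rather than crashing. Read literally, the patched branch also leaves the flag unset for a user shape whose name is not "custom".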
(0000698)
sparr (reporter)
2014-02-26 15:27

confirmed. running with that patch appears to eliminate the segfaults, at the expense of images being unpredictably lost. running through the test case produces 50 output pngs that should be identical. for me, 20 of them (in no discernable pattern) are identical and complete while the remaining 30 have varying numbers of missing images, including three with no images at all (which indicates 200 consecutive image loading failures).

the file sizes are a rough indication of the number of missing images, with 2244330 being no images present, and 2807002 indicating an image in every node:

-rw-r--r-- 1 sparr staff 2244330 Feb 26 15:18 test.dot.5.png
-rw-r--r-- 1 sparr staff 2244330 Feb 26 15:17 test.dot.3.png
-rw-r--r-- 1 sparr staff 2244330 Feb 26 15:18 test.dot.22.png
-rw-r--r-- 1 sparr staff 2252604 Feb 26 15:18 test.dot.33.png
-rw-r--r-- 1 sparr staff 2265291 Feb 26 15:18 test.dot.43.png
-rw-r--r-- 1 sparr staff 2276297 Feb 26 15:18 test.dot.31.png
-rw-r--r-- 1 sparr staff 2299425 Feb 26 15:18 test.dot.36.png
-rw-r--r-- 1 sparr staff 2301078 Feb 26 15:18 test.dot.12.png
-rw-r--r-- 1 sparr staff 2355241 Feb 26 15:17 test.dot
-rw-r--r-- 1 sparr staff 2413776 Feb 26 15:18 test.dot.24.png
-rw-r--r-- 1 sparr staff 2424190 Feb 26 15:18 test.dot.39.png
-rw-r--r-- 1 sparr staff 2446325 Feb 26 15:18 test.dot.19.png
-rw-r--r-- 1 sparr staff 2452920 Feb 26 15:18 test.dot.27.png
-rw-r--r-- 1 sparr staff 2485990 Feb 26 15:18 test.dot.41.png
-rw-r--r-- 1 sparr staff 2485990 Feb 26 15:18 test.dot.13.png
-rw-r--r-- 1 sparr staff 2498113 Feb 26 15:18 test.dot.45.png
-rw-r--r-- 1 sparr staff 2512318 Feb 26 15:18 test.dot.28.png
-rw-r--r-- 1 sparr staff 2520703 Feb 26 15:18 test.dot.29.png
-rw-r--r-- 1 sparr staff 2527932 Feb 26 15:18 test.dot.40.png
-rw-r--r-- 1 sparr staff 2535168 Feb 26 15:18 test.dot.15.png
-rw-r--r-- 1 sparr staff 2536991 Feb 26 15:19 test.dot.50.png
-rw-r--r-- 1 sparr staff 2551833 Feb 26 15:18 test.dot.17.png
-rw-r--r-- 1 sparr staff 2553893 Feb 26 15:19 test.dot.49.png
-rw-r--r-- 1 sparr staff 2559449 Feb 26 15:19 test.dot.48.png
-rw-r--r-- 1 sparr staff 2660060 Feb 26 15:18 test.dot.35.png
-rw-r--r-- 1 sparr staff 2719506 Feb 26 15:18 test.dot.21.png
-rw-r--r-- 1 sparr staff 2731353 Feb 26 15:18 test.dot.18.png
-rw-r--r-- 1 sparr staff 2784432 Feb 26 15:18 test.dot.11.png
-rw-r--r-- 1 sparr staff 2784837 Feb 26 15:18 test.dot.38.png
-rw-r--r-- 1 sparr staff 2784837 Feb 26 15:18 test.dot.20.png
-rw-r--r-- 1 sparr staff 2798747 Feb 26 15:18 test.dot.26.png
-rw-r--r-- 1 sparr staff 2807002 Feb 26 15:17 test.dot.png
-rw-r--r-- 1 sparr staff 2807002 Feb 26 15:18 test.dot.9.png
-rw-r--r-- 1 sparr staff 2807002 Feb 26 15:18 test.dot.8.png
-rw-r--r-- 1 sparr staff 2807002 Feb 26 15:18 test.dot.7.png
-rw-r--r-- 1 sparr staff 2807002 Feb 26 15:18 test.dot.6.png
-rw-r--r-- 1 sparr staff 2807002 Feb 26 15:19 test.dot.47.png
-rw-r--r-- 1 sparr staff 2807002 Feb 26 15:18 test.dot.46.png
-rw-r--r-- 1 sparr staff 2807002 Feb 26 15:18 test.dot.44.png
-rw-r--r-- 1 sparr staff 2807002 Feb 26 15:18 test.dot.42.png
-rw-r--r-- 1 sparr staff 2807002 Feb 26 15:17 test.dot.4.png
-rw-r--r-- 1 sparr staff 2807002 Feb 26 15:18 test.dot.37.png
-rw-r--r-- 1 sparr staff 2807002 Feb 26 15:18 test.dot.34.png
-rw-r--r-- 1 sparr staff 2807002 Feb 26 15:18 test.dot.32.png
-rw-r--r-- 1 sparr staff 2807002 Feb 26 15:18 test.dot.30.png
-rw-r--r-- 1 sparr staff 2807002 Feb 26 15:18 test.dot.25.png
-rw-r--r-- 1 sparr staff 2807002 Feb 26 15:18 test.dot.23.png
-rw-r--r-- 1 sparr staff 2807002 Feb 26 15:17 test.dot.2.png
-rw-r--r-- 1 sparr staff 2807002 Feb 26 15:18 test.dot.16.png
-rw-r--r-- 1 sparr staff 2807002 Feb 26 15:18 test.dot.14.png
-rw-r--r-- 1 sparr staff 2807002 Feb 26 15:18 test.dot.10.png
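
The size comparison above can be turned into a rough estimate of how many images made it into each output. The sketch below assumes the size grows linearly with the number of images present, which is only an approximation for compressed PNG output; `estimated_image_fraction` is a hypothetical helper, and the two constants are the sizes quoted in this note.

```c
/* Sizes quoted above: output with no images and with an image in every node. */
#define SIZE_NO_IMAGES  2244330L
#define SIZE_ALL_IMAGES 2807002L

/* Returns a value in [0, 1]: the estimated share of images present,
 * assuming file size scales linearly with the number of images. */
static double estimated_image_fraction(long size_bytes)
{
    double f = (double)(size_bytes - SIZE_NO_IMAGES)
             / (double)(SIZE_ALL_IMAGES - SIZE_NO_IMAGES);
    if (f < 0.0)
        return 0.0;
    if (f > 1.0)
        return 1.0;
    return f;
}
```

For example, test.dot.41.png at 2485990 bytes lands a little under the halfway mark by this measure.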
(0000716)
erg (administrator)
2014-03-25 10:05

The image cache relies on the node's image attribute string. Before cgraph, these lived in a global table, but cgraph uses a separate string table for each graph. So, when the graph is deleted, the name pointers in the image cache are stale. The current fix is to dup the strings in the global string table. Alternatively, we could re-open the table with each graph, or use some dynamic strategy for cleaning the image cache and restricting its size.
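
A minimal sketch of the idea behind the fix: a cache that copies the name it is given does not depend on the lifetime of the graph that supplied it. This is not the Graphviz image cache or its string table API; `cache_entry`, `cache_insert`, `cache_find` and `dup_string` are hypothetical names, and a plain malloc copy stands in for duplicating the string into the global string table.

```c
#include <stdlib.h>
#include <string.h>

#define CACHE_MAX 16

typedef struct {
    char *name;   /* copy owned by the cache, independent of any graph */
    int handle;   /* stand-in for the cached image data */
} cache_entry;

static cache_entry cache[CACHE_MAX];
static size_t cache_len;

/* Portable replacement for strdup(); returns NULL on allocation failure. */
static char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Stores a private copy of name. Returns 0 on success, -1 if the cache
 * is full or memory is exhausted. */
static int cache_insert(const char *name, int handle)
{
    char *copy;

    if (cache_len == CACHE_MAX)
        return -1;
    copy = dup_string(name);
    if (copy == NULL)
        return -1;
    cache[cache_len].name = copy;
    cache[cache_len].handle = handle;
    cache_len++;
    return 0;
}

/* Looks an entry up by string contents, never by pointer identity. */
static const cache_entry *cache_find(const char *name)
{
    size_t i;

    for (i = 0; i < cache_len; i++)
        if (strcmp(cache[i].name, name) == 0)
            return &cache[i];
    return NULL;
}
```

If the cache had stored the caller's pointer instead of a copy, freeing the caller's string (as happens when a graph and its string table are deleted) would leave the entry pointing at stale memory.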

- Issue History
Date Modified Username Field Change
2014-02-25 14:20 sparr New Issue
2014-02-25 14:20 sparr File Added: graphviz_png_segfault_test.tar.gz
2014-02-25 14:21 sparr Note Added: 0000690
2014-02-25 16:04 ellson Note Added: 0000691
2014-02-25 16:37 sparr Note Added: 0000692
2014-02-25 17:14 ellson Note Added: 0000693
2014-02-25 17:19 sparr Note Added: 0000694
2014-02-26 10:04 sparr Note Added: 0000695
2014-02-26 10:38 ellson Note Added: 0000696
2014-02-26 15:08 sparr Note Added: 0000697
2014-02-26 15:27 sparr Note Added: 0000698
2014-03-25 10:05 erg Note Added: 0000716
2014-03-25 10:05 erg Status new => resolved
2014-03-25 10:05 erg Resolution open => fixed

