Discussion:
[quagga-users 5395] next ospfd
Milan Kocián
2005-09-16 01:35:37 UTC
Hello,

I made changes to the OSPF areas, and since then my test quagga box has
been crashing.
The areas configured on the quagga box itself remained unchanged.
Valgrind is my friend :-), so I am sending valgrind logs (quagga CVS from
21.8.2005, Debian stable, kernel 2.6.11.12):


osadni:~# valgrind /usr/lib/quagga/ospfd
==17866== Memcheck, a memory error detector for x86-linux.
==17866== Copyright (C) 2002-2005, and GNU GPL'd, by Julian Seward et
al.
==17866== Using valgrind-2.4.0, a program supervision framework for
x86-linux.
==17866== Copyright (C) 2000-2005, and GNU GPL'd, by Julian Seward et
al.
==17866== For more details, rerun with: -v
==17866==



==17866== Invalid read of size 4
==17866== at 0x1B9358D7: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936A46: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936B36: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B99604D: thread_call (in /usr/lib/libzebra.so.0.0.0)
==17866== by 0x804945B: main (in /usr/lib/quagga/ospfd)
==17866== Address 0x1BEFB0A0 is 8 bytes inside a block of size 12
free'd
==17866== at 0x1B904B04: free (vg_replace_malloc.c:152)
==17866== by 0x1B9973BD: zfree (in /usr/lib/libzebra.so.0.0.0)
==17866== by 0x1B9355FE: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B9356DE: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B9363CD: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936A23: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936B36: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B99604D: thread_call (in /usr/lib/libzebra.so.0.0.0)
==17866== by 0x804945B: main (in /usr/lib/quagga/ospfd)
==17866==
==17866== Invalid read of size 4
==17866== at 0x1B935902: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936A46: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936B36: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B99604D: thread_call (in /usr/lib/libzebra.so.0.0.0)
==17866== by 0x804945B: main (in /usr/lib/quagga/ospfd)
==17866== Address 0x1BEFB0A0 is 8 bytes inside a block of size 12
free'd
==17866== at 0x1B904B04: free (vg_replace_malloc.c:152)
==17866== by 0x1B9973BD: zfree (in /usr/lib/libzebra.so.0.0.0)
==17866== by 0x1B9355FE: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B9356DE: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B9363CD: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936A23: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936B36: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B99604D: thread_call (in /usr/lib/libzebra.so.0.0.0)
==17866== by 0x804945B: main (in /usr/lib/quagga/ospfd)
==17866==
==17866== Invalid read of size 4
==17866== at 0x1B935916: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936A46: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936B36: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B99604D: thread_call (in /usr/lib/libzebra.so.0.0.0)
==17866== by 0x804945B: main (in /usr/lib/quagga/ospfd)
==17866== Address 0x1BEFB0A0 is 8 bytes inside a block of size 12
free'd
==17866== at 0x1B904B04: free (vg_replace_malloc.c:152)
==17866== by 0x1B9973BD: zfree (in /usr/lib/libzebra.so.0.0.0)
==17866== by 0x1B9355FE: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B9356DE: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B9363CD: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936A23: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936B36: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B99604D: thread_call (in /usr/lib/libzebra.so.0.0.0)
==17866== by 0x804945B: main (in /usr/lib/quagga/ospfd)
==17866==
==17866== Invalid read of size 4
==17866== at 0x1B937B2A: ospf_route_copy_nexthops_from_vertex
(in /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B9373BB: ospf_intra_add_router
(in /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936A58: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936B36: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B99604D: thread_call (in /usr/lib/libzebra.so.0.0.0)
==17866== by 0x804945B: main (in /usr/lib/quagga/ospfd)
==17866== Address 0x1BEFB098 is 0 bytes inside a block of size 12
free'd
==17866== at 0x1B904B04: free (vg_replace_malloc.c:152)
==17866== by 0x1B9973BD: zfree (in /usr/lib/libzebra.so.0.0.0)
==17866== by 0x1B9355FE: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B9356DE: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B9363CD: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936A23: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936B36: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B99604D: thread_call (in /usr/lib/libzebra.so.0.0.0)
==17866== by 0x804945B: main (in /usr/lib/quagga/ospfd)
==17866==
==17866== Invalid read of size 4
==17866== at 0x1B937B34: ospf_route_copy_nexthops_from_vertex
(in /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B9373BB: ospf_intra_add_router
(in /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936A58: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936B36: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B99604D: thread_call (in /usr/lib/libzebra.so.0.0.0)
==17866== by 0x804945B: main (in /usr/lib/quagga/ospfd)
==17866== Address 0x1BEFB09C is 4 bytes inside a block of size 12
free'd
==17866== at 0x1B904B04: free (vg_replace_malloc.c:152)
==17866== by 0x1B9973BD: zfree (in /usr/lib/libzebra.so.0.0.0)
==17866== by 0x1B9355FE: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B9356DE: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B9363CD: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936A23: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936B36: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B99604D: thread_call (in /usr/lib/libzebra.so.0.0.0)
==17866== by 0x804945B: main (in /usr/lib/quagga/ospfd)
==17866==
==17866== Invalid read of size 4
==17866== at 0x1B937B4B: ospf_route_copy_nexthops_from_vertex
(in /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B9373BB: ospf_intra_add_router
(in /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936A58: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936B36: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B99604D: thread_call (in /usr/lib/libzebra.so.0.0.0)
==17866== by 0x804945B: main (in /usr/lib/quagga/ospfd)
==17866== Address 0x1BEFB09C is 4 bytes inside a block of size 12
free'd
==17866== at 0x1B904B04: free (vg_replace_malloc.c:152)
==17866== by 0x1B9973BD: zfree (in /usr/lib/libzebra.so.0.0.0)
==17866== by 0x1B9355FE: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B9356DE: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B9363CD: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936A23: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936B36: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B99604D: thread_call (in /usr/lib/libzebra.so.0.0.0)
==17866== by 0x804945B: main (in /usr/lib/quagga/ospfd)
==17866==
==17866== Invalid read of size 4
==17866== at 0x1B937B50: ospf_route_copy_nexthops_from_vertex
(in /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B9373BB: ospf_intra_add_router
(in /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936A58: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936B36: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B99604D: thread_call (in /usr/lib/libzebra.so.0.0.0)
==17866== by 0x804945B: main (in /usr/lib/quagga/ospfd)
==17866== Address 0x1BEFB098 is 0 bytes inside a block of size 12
free'd
==17866== at 0x1B904B04: free (vg_replace_malloc.c:152)
==17866== by 0x1B9973BD: zfree (in /usr/lib/libzebra.so.0.0.0)
==17866== by 0x1B9355FE: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B9356DE: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B9363CD: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936A23: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936B36: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B99604D: thread_call (in /usr/lib/libzebra.so.0.0.0)
==17866== by 0x804945B: main (in /usr/lib/quagga/ospfd)
==17866==
==17866== Invalid read of size 4
==17866== at 0x1B937B2A: ospf_route_copy_nexthops_from_vertex
(in /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B9374BF: ospf_intra_add_transit
(in /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936A64: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936B36: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B99604D: thread_call (in /usr/lib/libzebra.so.0.0.0)
==17866== by 0x804945B: main (in /usr/lib/quagga/ospfd)
==17866== Address 0x1BEFB098 is 0 bytes inside a block of size 12
free'd
==17866== at 0x1B904B04: free (vg_replace_malloc.c:152)
==17866== by 0x1B9973BD: zfree (in /usr/lib/libzebra.so.0.0.0)
==17866== by 0x1B9355FE: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B9356DE: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B9363CD: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936A23: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936B36: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B99604D: thread_call (in /usr/lib/libzebra.so.0.0.0)
==17866== by 0x804945B: main (in /usr/lib/quagga/ospfd)
==17866==
==17866== Invalid read of size 4
==17866== at 0x1B937B34: ospf_route_copy_nexthops_from_vertex
(in /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B9374BF: ospf_intra_add_transit
(in /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936A64: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936B36: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B99604D: thread_call (in /usr/lib/libzebra.so.0.0.0)
==17866== by 0x804945B: main (in /usr/lib/quagga/ospfd)
==17866== Address 0x1BEFB09C is 4 bytes inside a block of size 12
free'd
==17866== at 0x1B904B04: free (vg_replace_malloc.c:152)
==17866== by 0x1B9973BD: zfree (in /usr/lib/libzebra.so.0.0.0)
==17866== by 0x1B9355FE: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B9356DE: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B9363CD: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936A23: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936B36: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B99604D: thread_call (in /usr/lib/libzebra.so.0.0.0)
==17866== by 0x804945B: main (in /usr/lib/quagga/ospfd)
==17866==
==17866== Invalid read of size 4
==17866== at 0x1B937B4B: ospf_route_copy_nexthops_from_vertex
(in /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B9374BF: ospf_intra_add_transit
(in /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936A64: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936B36: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B99604D: thread_call (in /usr/lib/libzebra.so.0.0.0)
==17866== by 0x804945B: main (in /usr/lib/quagga/ospfd)
==17866== Address 0x1BEFB09C is 4 bytes inside a block of size 12
free'd
==17866== at 0x1B904B04: free (vg_replace_malloc.c:152)
==17866== by 0x1B9973BD: zfree (in /usr/lib/libzebra.so.0.0.0)
==17866== by 0x1B9355FE: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B9356DE: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B9363CD: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936A23: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936B36: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B99604D: thread_call (in /usr/lib/libzebra.so.0.0.0)
==17866== by 0x804945B: main (in /usr/lib/quagga/ospfd)
==17866==
==17866== Invalid read of size 4
==17866== at 0x1B937B50: ospf_route_copy_nexthops_from_vertex
(in /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B9374BF: ospf_intra_add_transit
(in /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936A64: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936B36: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B99604D: thread_call (in /usr/lib/libzebra.so.0.0.0)
==17866== by 0x804945B: main (in /usr/lib/quagga/ospfd)
==17866== Address 0x1BEFB098 is 0 bytes inside a block of size 12
free'd
==17866== at 0x1B904B04: free (vg_replace_malloc.c:152)
==17866== by 0x1B9973BD: zfree (in /usr/lib/libzebra.so.0.0.0)
==17866== by 0x1B9355FE: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B9356DE: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B9363CD: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936A23: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936B36: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B99604D: thread_call (in /usr/lib/libzebra.so.0.0.0)
==17866== by 0x804945B: main (in /usr/lib/quagga/ospfd)
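
A note on reading the reports above: each "Address ... is N bytes inside a block of size 12 free'd" line gives the offset of the stale read within the already-freed block. A minimal C sketch of that arithmetic follows, assuming the block base 0x1BEFB098 taken from the "0 bytes inside" lines. The helper name `offset_in_block` is hypothetical and is not part of valgrind or Quagga.

```c
#include <stdint.h>

/* Hypothetical helper (not part of valgrind or Quagga): returns the byte
 * offset of addr inside the block [base, base + size), or -1 if addr lies
 * outside that block. */
static long offset_in_block(uint32_t base, uint32_t size, uint32_t addr)
{
    if (addr < base || addr - base >= size)
        return -1;
    return (long)(addr - base);
}
```

With base 0x1BEFB098 and size 12, the three addresses in the log (0x1BEFB098, 0x1BEFB09C, 0x1BEFB0A0) map to offsets 0, 4 and 8, so every 4-byte word of the freed block is read after the free.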


Best regards,
--
Milan Kocián <***@wq.cz>
Milan Kocián
2005-09-20 11:41:55 UTC
Hello,

Excuse me, I forgot to attach the crash log:

OSPF: Received signal 11 at 1127214511 (si_addr 0x1b0a, PC 0x242f2a0c); aborting...
Program counter: /usr/lib/libzebra.so.0(listnode_lookup+0x3b)[0x242f2a0c]
Backtrace for 11 stack frames:
/usr/lib/libzebra.so.0(zlog_backtrace_sigsafe+0x28)[0x24306993]
/usr/lib/libzebra.so.0(zlog_signal+0x230)[0x24306963]
/usr/lib/libzebra.so.0[0x2430e1ce]
/lib/tls/libc.so.6[0x24412a10]
/usr/lib/libospf.so.0[0x2429d90d]
/usr/lib/libospf.so.0[0x2429ea47]
/usr/lib/libospf.so.0[0x2429eb37]
/usr/lib/libzebra.so.0(thread_call+0x6f)[0x242fd04e]
/usr/lib/quagga/ospfd(main+0x2c6)[0x804945c]
/lib/tls/libc.so.6(__libc_start_main+0xf4)[0x243ff974]
/usr/lib/quagga/ospfd[0x8049081]
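
The program counter above is inside listnode_lookup, a list walk in libzebra. One way such a walk can fault is when the list still links to a node that has already been freed. That would be consistent with the free'd blocks valgrind reported, although the backtrace alone does not prove it. Below is a minimal C sketch of the safe pattern (unlink first, then free), assuming a simple doubly linked list. The names `node`, `list`, `list_append`, `list_lookup` and `list_remove` are hypothetical and are not Quagga's lib/linklist.c API.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical minimal list, not Quagga's lib/linklist.c. */
struct node {
    struct node *next;
    struct node *prev;
    void *data;
};

struct list {
    struct node *head;
    struct node *tail;
    unsigned int count;
};

/* Append data; returns the new node, or NULL on allocation failure. */
static struct node *list_append(struct list *l, void *data)
{
    struct node *n = calloc(1, sizeof(*n));
    if (n == NULL)
        return NULL;
    n->data = data;
    n->prev = l->tail;
    if (l->tail != NULL)
        l->tail->next = n;
    else
        l->head = n;
    l->tail = n;
    l->count++;
    return n;
}

/* Walk the list and return the node holding data, or NULL.
 * Each n->next dereference would touch freed memory if a node had been
 * freed without being unlinked first. */
static struct node *list_lookup(const struct list *l, const void *data)
{
    for (struct node *n = l->head; n != NULL; n = n->next)
        if (n->data == data)
            return n;
    return NULL;
}

/* Unlink the node from its neighbours first, and only then free it, so
 * that no later walk can reach the freed block. */
static void list_remove(struct list *l, struct node *n)
{
    if (n->prev != NULL)
        n->prev->next = n->next;
    else
        l->head = n->next;
    if (n->next != NULL)
        n->next->prev = n->prev;
    else
        l->tail = n->prev;
    l->count--;
    free(n);
}
```

The point of the sketch is the ordering in `list_remove`: if some other code path calls free() on a node while the list still points at it, the next lookup follows a dangling pointer and can crash at an arbitrary address.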


Sorry, I still haven't compiled quagga with more debug options; I am
working on it. But this segfault is highly reproducible (I can't run
ospfd without valgrind, because it crashes immediately after
startup). Both of my test quagga routers show the same behaviour.
The only changes were in the network topology (more areas).
The other routers are PCs running gated.
Best regards,
--
Milan Kocián <***@wq.cz>
Andrew J. Schorr
2005-09-20 13:55:15 UTC
Hi Milan,
Post by Milan Kocián
OSPF: Received signal 11 at 1127214511 (si_addr 0x1b0a, PC 0x242f2a0c);
aborting...
Program counter: /usr/lib/libzebra.so.0(listnode_lookup
+0x3b)[0x242f2a0c]
/usr/lib/libzebra.so.0(zlog_backtrace_sigsafe+0x28)[0x24306993]
/usr/lib/libzebra.so.0(zlog_signal+0x230)[0x24306963]
/usr/lib/libzebra.so.0[0x2430e1ce]
/lib/tls/libc.so.6[0x24412a10]
/usr/lib/libospf.so.0[0x2429d90d]
/usr/lib/libospf.so.0[0x2429ea47]
/usr/lib/libospf.so.0[0x2429eb37]
/usr/lib/libzebra.so.0(thread_call+0x6f)[0x242fd04e]
/usr/lib/quagga/ospfd(main+0x2c6)[0x804945c]
/lib/tls/libc.so.6(__libc_start_main+0xf4)[0x243ff974]
/usr/lib/quagga/ospfd[0x8049081]
Sorry, I still haven't compiled quagga with more debug options; I am
working on it. But this segfault is highly reproducible (I can't run
ospfd without valgrind, because it crashes immediately after
startup). Both of my test quagga routers show the same behaviour.
The only changes were in the network topology (more areas).
The other routers are PCs running gated.
Good news that it's reproducible, but it will be hard to debug
without getting a good backtrace. When you run the configure script,
I think you need to give the --enable-gcc-rdynamic option. And
compiling with -g may also help.
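
Some background on why the flag matters: a glibc-style backtrace printer can only name functions that appear in the dynamic symbol table. For the main executable that requires linking with -rdynamic, and static functions are never named. Frames without a name print as bare addresses, such as /usr/lib/libospf.so.0[0x2429d90d] in the crash log. Below is a minimal sketch, assuming glibc's `backtrace()` and `backtrace_symbols_fd()` from `<execinfo.h>`. The helper name `dump_backtrace` is hypothetical, and this is not Quagga's actual zlog_backtrace_sigsafe code.

```c
#include <execinfo.h>

#define MAX_FRAMES 32

/* Hypothetical helper: write the current call stack to fd and return the
 * number of frames captured. backtrace_symbols_fd() writes straight to
 * the descriptor without calling malloc(), which is why it is the variant
 * usually picked for use inside signal handlers. */
static int dump_backtrace(int fd)
{
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);

    if (n > 0)
        backtrace_symbols_fd(frames, n, fd);
    return n;
}
```

Calling `dump_backtrace(2)` from a SIGSEGV handler prints one line per frame to stderr. How many of those lines carry a function name depends on the link options discussed above.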

Regards,
Andy
Paul Jakma
2005-09-25 08:49:39 UTC
Post by Milan Kocián
Hello,
OSPF: Received signal 11 at 1127214511 (si_addr 0x1b0a, PC 0x242f2a0c);
aborting...
Program counter: /usr/lib/libzebra.so.0(listnode_lookup
+0x3b)[0x242f2a0c]
Arg, what is this damn crash? Driving me nuts. I can't reproduce it
either. I'm not using areas, but AFAIK neither is Stanislav (is that
correct Stanislav?).

The answers lie in the below somehow, but we need /full/ symbol
information. Can you try to get a valgrind run that shows these 'invalid
reads' and 'address is ... inside block ... freed' traces with full
symbolic information? I.e. compile with --enable-gcc-rdynamic
--enable-shared if you can (CFLAGS="-O -g3" too for good measure).
Post by Milan Kocián
==17866== Invalid read of size 4
==17866== at 0x1B9358D7: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936A46: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936B36: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B99604D: thread_call (in /usr/lib/libzebra.so.0.0.0)
==17866== by 0x804945B: main (in /usr/lib/quagga/ospfd)
==17866== Address 0x1BEFB0A0 is 8 bytes inside a block of size 12
free'd
==17866== at 0x1B904B04: free (vg_replace_malloc.c:152)
==17866== by 0x1B9973BD: zfree (in /usr/lib/libzebra.so.0.0.0)
==17866== by 0x1B9355FE: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B9356DE: (within /usr/lib/libospf.so.0.0.0)
==17866== Invalid read of size 4
==17866== at 0x1B935902: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936A46: (within /usr/lib/libospf.so.0.0.0)
==17866== Address 0x1BEFB0A0 is 8 bytes inside a block of size 12
free'd
==17866== at 0x1B904B04: free (vg_replace_malloc.c:152)
==17866== by 0x1B9973BD: zfree (in /usr/lib/libzebra.so.0.0.0)
==17866== by 0x1B9355FE: (within /usr/lib/libospf.so.0.0.0)
==17866== Invalid read of size 4
==17866== at 0x1B935916: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936A46: (within /usr/lib/libospf.so.0.0.0)
==17866== Address 0x1BEFB0A0 is 8 bytes inside a block of size 12
free'd
==17866== at 0x1B904B04: free (vg_replace_malloc.c:152)
==17866== by 0x1B9973BD: zfree (in /usr/lib/libzebra.so.0.0.0)
==17866== by 0x1B9355FE: (within /usr/lib/libospf.so.0.0.0)
==17866== Invalid read of size 4
==17866== at 0x1B937B2A: ospf_route_copy_nexthops_from_vertex
(in /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B9373BB: ospf_intra_add_router
(in /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936A58: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936B36: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B99604D: thread_call (in /usr/lib/libzebra.so.0.0.0)
==17866== by 0x804945B: main (in /usr/lib/quagga/ospfd)
==17866== Address 0x1BEFB098 is 0 bytes inside a block of size 12
free'd
==17866== at 0x1B904B04: free (vg_replace_malloc.c:152)
==17866== by 0x1B9973BD: zfree (in /usr/lib/libzebra.so.0.0.0)
==17866== by 0x1B9355FE: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B9356DE: (within /usr/lib/libospf.so.0.0.0)
==17866== Invalid read of size 4
==17866== at 0x1B937B34: ospf_route_copy_nexthops_from_vertex
(in /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B9373BB: ospf_intra_add_router
(in /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936A58: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B936B36: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B99604D: thread_call (in /usr/lib/libzebra.so.0.0.0)
==17866== Address 0x1BEFB09C is 4 bytes inside a block of size 12
free'd
==17866== at 0x1B904B04: free (vg_replace_malloc.c:152)
==17866== by 0x1B9973BD: zfree (in /usr/lib/libzebra.so.0.0.0)
==17866== by 0x1B9355FE: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B9356DE: (within /usr/lib/libospf.so.0.0.0)
==17866== by 0x1B9363CD: (within /usr/lib/libospf.so.0.0.0)
regards,
--
Paul Jakma ***@clubi.ie ***@jakma.org Key ID: 64A2FF6A
Fortune:
Real Users find the one combination of bizarre input values that shuts
down the system for days.
s***@wwwcom.ru
2005-09-26 05:45:35 UTC
Hello, Paul Jakma.
Post by Milan Kocián
Hello,
OSPF: Received signal 11 at 1127214511 (si_addr 0x1b0a, PC 0x242f2a0c);
aborting...
Program counter: /usr/lib/libzebra.so.0(listnode_lookup
+0x3b)[0x242f2a0c]
PJ> Arg, what is this damn crash? Driving me nuts. I can't reproduce it
PJ> either. I'm not using areas, but AFAIK neither is Stanislav (is that
PJ> correct Stanislav?).

I use only the backbone area, but if Milan says what kind of changes he
made (just splitting into several areas?), I can try to reproduce it.



---
Best regards,
Stanislav G. Ryabukhin.
Electronic Shield Ltd.
***@wwwcom.ru.
Paul Jakma
2005-09-26 07:08:27 UTC
Post by s***@wwwcom.ru
I use only the backbone area, but if Milan says what kind of
changes he made (just splitting into several areas?), I can try
to reproduce it.
Well, I've set up an ABR here, and it's not crashing either.. :( Mind
you, I don't have valgrind available on the ABR - I hacked together
some basic memory redzone + poison-on-free into lib/memory.c. (It can
only work for fixed-size memory objects though). Hasn't triggered
yet, grr.
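
For readers unfamiliar with the technique: a redzone is a guard pattern placed on both sides of each object so that overruns can be detected, and poison-on-free overwrites a released object with a known byte, so stale reads return obvious garbage and later writes can be detected. Below is a minimal C sketch over a static pool of fixed-size slots. Every name and constant in it (`fz_alloc`, `fz_free`, `fz_poison_intact`, `SLOT_SIZE`, the pattern bytes) is hypothetical, and this is not the actual change made to lib/memory.c.

```c
#include <stddef.h>
#include <string.h>

#define SLOT_SIZE   12    /* payload bytes per object; fixed for the pool */
#define REDZONE     8     /* guard bytes on each side of the payload */
#define NSLOTS      16
#define RED_BYTE    0xFD  /* arbitrary guard pattern */
#define POISON_BYTE 0xDE  /* arbitrary pattern written on free */

/* Layout of bytes[]: front redzone | payload | back redzone.
 * The payload is treated as raw bytes here; a real allocator would also
 * have to take care of alignment. */
struct slot {
    unsigned char bytes[REDZONE + SLOT_SIZE + REDZONE];
    int in_use;
};

static struct slot pool[NSLOTS];

/* Return 1 if all n bytes at p equal v, else 0. */
static int filled_with(const unsigned char *p, size_t n, unsigned char v)
{
    for (size_t i = 0; i < n; i++)
        if (p[i] != v)
            return 0;
    return 1;
}

/* Find the slot whose payload starts at p, or NULL if p is not ours. */
static struct slot *slot_of(const void *p)
{
    for (size_t i = 0; i < NSLOTS; i++)
        if ((const void *)(pool[i].bytes + REDZONE) == p)
            return &pool[i];
    return NULL;
}

/* Hand out a zeroed payload with fresh redzones; NULL if the pool is full. */
static void *fz_alloc(void)
{
    for (size_t i = 0; i < NSLOTS; i++) {
        struct slot *s = &pool[i];
        if (s->in_use)
            continue;
        memset(s->bytes, RED_BYTE, sizeof(s->bytes));
        memset(s->bytes + REDZONE, 0, SLOT_SIZE);
        s->in_use = 1;
        return s->bytes + REDZONE;
    }
    return NULL;
}

/* Release p. Returns 0 on success, -1 if p is unknown or already free,
 * -2 if a redzone was overwritten (the slot is still released). */
static int fz_free(void *p)
{
    struct slot *s = slot_of(p);
    int rc = 0;

    if (s == NULL || !s->in_use)
        return -1;
    if (!filled_with(s->bytes, REDZONE, RED_BYTE) ||
        !filled_with(s->bytes + REDZONE + SLOT_SIZE, REDZONE, RED_BYTE))
        rc = -2;
    memset(s->bytes + REDZONE, POISON_BYTE, SLOT_SIZE);  /* poison on free */
    s->in_use = 0;
    return rc;
}

/* Return 1 if p is a free slot whose poison is untouched, 0 otherwise.
 * A 0 for a free slot means something wrote to it after the free. */
static int fz_poison_intact(const void *p)
{
    struct slot *s = slot_of(p);
    return s != NULL && !s->in_use &&
           filled_with(s->bytes + REDZONE, SLOT_SIZE, POISON_BYTE);
}
```

Because every slot has the same size, the guard and poison checks need no per-allocation size bookkeeping, which may be why the approach is described above as limited to fixed-size objects. In this sketch a double free is reported as -1 and an overrun into the back redzone as -2.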

regards,
--
Paul Jakma ***@clubi.ie ***@jakma.org Key ID: 64A2FF6A
Fortune:
Real wealth can only increase.
-- R. Buckminster Fuller
s***@wwwcom.ru
2005-09-26 12:47:03 UTC
Hello, Paul Jakma.
Post by s***@wwwcom.ru
I use only the backbone area, but if Milan can say what kind of
changes he made (just splitting into several areas?), I can try to
reproduce it.
PJ> Well, I've set up an ABR here, and it's not crashing either.. :( Mind
PJ> you, I don't have valgrind available on the ABR - I hacked together
PJ> some basic memory redzone + poison-on-free into lib/memory.c. (It can
PJ> only work for fixed-size memory objects though). Hasn't triggered
PJ> yet, grr.

I found interesting things....

Network topology is:

Other Routers (area 0, quagga 0.98.5) <-> R1(area 0, quagga 0.99.1) <-> R2(area 0, quagga 0.99.1)

'R1' and 'Other Routers' have the full backbone routing table (sh ip
route) and the full OSPF database (sh ip ospf database). 'R2' has the full
OSPF database, but its routing table contains only external routes and some 'other routes'.
I couldn't find any pattern in which 'other routes' appear.

When I changed quagga on 'R2' from 0.99.1 to 0.98.5, I saw the full
routing table.

When I split the backbone area:

Other Routers (area 0, quagga 0.98.5) <-> R1(area 0, quagga 0.99.1, area 1) <-> R2(area 1, quagga 0.99.1)

No crashes, and on 'R2' I see the full routing table.

As 'quagga 0.99.1' I used the following snapshots:
quagga 0.99.1 from 29-Apr-2005
quagga 0.99.1 from 20050915 + your patch ('connected route')
quagga 0.99.1 from 20050926

As 'quagga 0.98.5' I used the release from 28-Aug-2005.

---
Best regards,
Stanislav G. Ryabukhin.
Electronic Shield Ltd.
***@wwwcom.ru.
Paul Jakma
2005-09-26 12:46:00 UTC
Post by s***@wwwcom.ru
I found interesting things....
Other Routers (area 0, quagga 0.98.5) <-> R1(area 0, quagga 0.99.1) <-> R2(area 0, quagga 0.99.1)
'R1' and 'Other Routers' have the full backbone routing table (sh ip
route) and the full OSPF database (sh ip ospf database). 'R2' has the
full OSPF database, but its routing table contains only external routes
and some 'other routes'. I couldn't find any pattern in which 'other
routes' appear.
Hmm, sounds quite plausible, as the problem is almost certainly in
the SPF code (which transforms the OSPF LSAs into routes, and which
was converted to priority queues in 0.99..)
Post by s***@wwwcom.ru
When I changed quagga on 'R2' from 0.99.1 to 0.98.5, I saw the full
routing table.
Aye, it's the 'scalable SPF' - there's a problem there somewhere.
Post by s***@wwwcom.ru
Other Routers (area 0, quagga 0.98.5) <-> R1(area 0, quagga 0.99.1, area 1) <-> R2(area 1, quagga 0.99.1)
No crashes, and on 'R2' I see the full routing table.
Right.

Could I ask you for a favour: If possible, could you dump the entire
OSPF LSA database, in detail, on R2 and send it to me (privately)?
Eg, maybe using:

vtysh -c 'show ip ospf database <type>'

regards,
--
Paul Jakma ***@clubi.ie ***@jakma.org Key ID: 64A2FF6A
Fortune:
... when fits of creativity run strong, more than one programmer or writer
has been known to abandon the desktop for the more spacious floor.
-- Fred Brooks
Paul Jakma
2005-09-26 12:50:07 UTC
FWIW,

I'm running 0.99.1 as:

- ABR with the GCC 'mudflap' feature, as well as with my
redzone and memory-poison+verify hacks

- plain router on Solaris with libumem to record allocations and
perform guard checks

I've also run 0.99 on a linux/i386 box under valgrind, and I can
*not* get this damn SPF bug to trigger here. My hacked memory
debugger isn't asserting, regular gcore's taken of the solaris ospfd
don't show any problems detected by libumem, arg! :(

regards,
--
Paul Jakma ***@clubi.ie ***@jakma.org Key ID: 64A2FF6A
Fortune:
CBG: Here you go, mutton-chop Yaz.

Milhouse:
I don't want it.

CBG: Freakin' kids.

Three Men and a Comic Book (Episode 7F21)
Milan Kocián
2005-09-26 15:29:50 UTC
Post by s***@wwwcom.ru
Hello, Paul Jakma.
Post by Milan Kocián
Hello,
OSPF: Received signal 11 at 1127214511 (si_addr 0x1b0a, PC 0x242f2a0c);
aborting...
Program counter: /usr/lib/libzebra.so.0(listnode_lookup
+0x3b)[0x242f2a0c]
PJ> Arg, what is this damn crash? Driving me nuts. I can't reproduce it
PJ> either. I'm not using areas, but AFAIK neither is Stanislav (is that
PJ> correct Stanislav?).
I use only the backbone area, but if Milan can say what kind of changes he
made (just splitting into several areas?), I can try to reproduce it.
Hallo,

I think it will be difficult to reproduce, but I'll try to describe the
situation:

I use two test quagga routers on our network. The other routers use
gated. The first is on the backbone only (it is a testing control system). It uses
quagga only to see the other regions of our network via OSPF. It does not
export any networks. The second router is on the backbone and has a
simple non-backbone area on its second interface.

The changes I made:
I added two new areas and two vlinks to join them to the backbone, making a small
backup ring (all of it on the gated routers). We have a couple of similar
rings on the network. Everything worked perfectly until I saw that ospfd on
the quagga routers was down. I could not start ospfd again, except under
valgrind. I tested non-stripped binaries; those are a little more stable.

I only noticed it after all the changes were made, so I can't say which action was fatal.

Best regards,
--
Milan Kocián <***@wq.cz>
Paul Jakma
2005-09-26 15:49:35 UTC
Post by Milan Kocián
I only noticed it after all the changes were made, so I can't say which action was fatal.
I'd love to get a full dump of the OSPF LSA database which causes
this problem... (If you do this with 0.99.1 you likely will need to
hack to set the SPF timers to something really high. High enough to
get the dump before it runs SPF and crashes at least).

I could then try construct something to feed it to the SPF code and
reproduce the bug here offline.

regards,
--
Paul Jakma ***@clubi.ie ***@jakma.org Key ID: 64A2FF6A
Fortune:
This fortune intentionally says nothing.
Milan Kocián
2005-09-29 09:03:43 UTC
Post by Milan Kocián
I think it will be difficult to reproduce, but I'll try to describe
I use two test quagga routers on our network. The other routers use
gated. The first is on the backbone only (it is a testing control system). It uses
quagga only to see the other regions of our network via OSPF. It does not
export any networks. The second router is on the backbone and has a
simple non-backbone area on its second interface.
Hallo,

I have set up a third quagga router for testing purposes, and if you want I can
give you access to it.


Best regards,
--
Milan Kocián <***@wq.cz>
s***@wwwcom.ru
2005-09-29 11:00:33 UTC
Hello, Milan Kocián.

You wrote (Thursday, September 29, 2005) :

MK>==19621== at 0x1B904B04: free (vg_replace_malloc.c:152)
MK>==19621== by 0x1B9993BD: zfree (memory.c:106)
MK>==19621== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
MK>==19621== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
MK>==19621== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
MK>==19621== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
MK>==19621== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
MK>==19621== by 0x1B99804D: thread_call (thread.c:891)
MK>==19621== by 0x804945B: main (ospf_main.c:325)

MK> I have set up a third quagga router for testing purposes, and if you want I can
MK> give you access to it.

It seems that quagga 0.99.1 has a bug in the SPF algorithm code.
Try installing quagga 0.98.5.


---
Best regards,
Stanislav G. Ryabukhin.
Electronic Shield Ltd.
***@wwwcom.ru.
Paul Jakma
2005-09-29 13:07:01 UTC
Post by Milan Kocián
I have set up a third quagga router for testing purposes, and if you
want I can give you access to it.
That's much appreciated. However, I think I can reproduce it now - I
/think/ i've fixed the leak and it's crashing fairly instantly for me
now in the same place. :)

regards,
--
Paul Jakma ***@clubi.ie ***@jakma.org Key ID: 64A2FF6A
Fortune:
They have been at a great feast of languages, and stolen the scraps.
-- William Shakespeare, "Love's Labour's Lost"
Paul Jakma
2005-09-29 14:06:25 UTC
Hi,

Could you try attached patch? (if it crashes, valgrind it please).

That fixes at least one leaky path (but not all), and should possibly
change when it crashes..

regards,
--
Paul Jakma ***@clubi.ie ***@jakma.org Key ID: 64A2FF6A
Fortune:
Herb: All born in wedlock?

Homer: Yeah, though the boy was a close call.

Oh Brother, Where Art Thou?
Milan Kocián
2005-09-29 18:29:29 UTC
Post by Paul Jakma
Hi,
Could you try attached patch? (if it crashes, valgrind it please).
That fixes at least one leaky path (but not all), and should possibly
change when it crashes..
Hallo,

here is the valgrind output (I don't see a difference at first glance):

==30705== Invalid read of size 4
==30705== at 0x1B9378D7: ospf_vertex_add_parent (ospf_spf.c:204)
==30705== by 0x1B938A46: ospf_spf_calculate (ospf_spf.c:1064)
==30705== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==30705== by 0x1B99804D: thread_call (thread.c:891)
==30705== by 0x804952E: main (ospf_main.c:325)
==30705== Address 0x1BE2EBC0 is 8 bytes inside a block of size 12
free'd
==30705== at 0x1B904B04: free (vg_replace_malloc.c:152)
==30705== by 0x1B9993BD: zfree (memory.c:106)
==30705== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==30705== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==30705== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==30705== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==30705== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==30705== by 0x1B99804D: thread_call (thread.c:891)
==30705== by 0x804952E: main (ospf_main.c:325)
==30705==
==30705== Invalid read of size 4
==30705== at 0x1B937902: ospf_vertex_add_parent (ospf_spf.c:207)
==30705== by 0x1B938A46: ospf_spf_calculate (ospf_spf.c:1064)
==30705== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==30705== by 0x1B99804D: thread_call (thread.c:891)
==30705== by 0x804952E: main (ospf_main.c:325)
==30705== Address 0x1BE2EBC0 is 8 bytes inside a block of size 12
free'd
==30705== at 0x1B904B04: free (vg_replace_malloc.c:152)
==30705== by 0x1B9993BD: zfree (memory.c:106)
==30705== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==30705== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==30705== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==30705== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==30705== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==30705== by 0x1B99804D: thread_call (thread.c:891)
==30705== by 0x804952E: main (ospf_main.c:325)
==30705==
==30705== Invalid read of size 4
==30705== at 0x1B937916: ospf_vertex_add_parent (ospf_spf.c:208)
==30705== by 0x1B938A46: ospf_spf_calculate (ospf_spf.c:1064)
==30705== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==30705== by 0x1B99804D: thread_call (thread.c:891)
==30705== by 0x804952E: main (ospf_main.c:325)
==30705== Address 0x1BE2EBC0 is 8 bytes inside a block of size 12
free'd
==30705== at 0x1B904B04: free (vg_replace_malloc.c:152)
==30705== by 0x1B9993BD: zfree (memory.c:106)
==30705== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==30705== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==30705== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==30705== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==30705== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==30705== by 0x1B99804D: thread_call (thread.c:891)
==30705== by 0x804952E: main (ospf_main.c:325)
==30705==
==30705== Invalid read of size 4
==30705== at 0x1B939B2A: ospf_route_copy_nexthops_from_vertex
(ospf_route.c:807)
==30705== by 0x1B9393BB: ospf_intra_add_router (ospf_route.c:411)
==30705== by 0x1B938A58: ospf_spf_calculate (ospf_spf.c:1074)
==30705== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==30705== by 0x1B99804D: thread_call (thread.c:891)
==30705== by 0x804952E: main (ospf_main.c:325)
==30705== Address 0x1BE2EBB8 is 0 bytes inside a block of size 12
free'd
==30705== at 0x1B904B04: free (vg_replace_malloc.c:152)
==30705== by 0x1B9993BD: zfree (memory.c:106)
==30705== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==30705== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==30705== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==30705== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==30705== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==30705== by 0x1B99804D: thread_call (thread.c:891)
==30705== by 0x804952E: main (ospf_main.c:325)
==30705==
==30705== Invalid read of size 4
==30705== at 0x1B939B34: ospf_route_copy_nexthops_from_vertex
(ospf_route.c:809)
==30705== by 0x1B9393BB: ospf_intra_add_router (ospf_route.c:411)
==30705== by 0x1B938A58: ospf_spf_calculate (ospf_spf.c:1074)
==30705== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==30705== by 0x1B99804D: thread_call (thread.c:891)
==30705== by 0x804952E: main (ospf_main.c:325)
==30705== Address 0x1BE2EBBC is 4 bytes inside a block of size 12
free'd
==30705== at 0x1B904B04: free (vg_replace_malloc.c:152)
==30705== by 0x1B9993BD: zfree (memory.c:106)
==30705== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==30705== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==30705== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==30705== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==30705== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==30705== by 0x1B99804D: thread_call (thread.c:891)
==30705== by 0x804952E: main (ospf_main.c:325)
==30705==
==30705== Invalid read of size 4
==30705== at 0x1B939B4B: ospf_route_copy_nexthops_from_vertex
(ospf_route.c:812)
==30705== by 0x1B9393BB: ospf_intra_add_router (ospf_route.c:411)
==30705== by 0x1B938A58: ospf_spf_calculate (ospf_spf.c:1074)
==30705== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==30705== by 0x1B99804D: thread_call (thread.c:891)
==30705== by 0x804952E: main (ospf_main.c:325)
==30705== Address 0x1BE2EBBC is 4 bytes inside a block of size 12
free'd
==30705== at 0x1B904B04: free (vg_replace_malloc.c:152)
==30705== by 0x1B9993BD: zfree (memory.c:106)
==30705== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==30705== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==30705== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==30705== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==30705== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==30705== by 0x1B99804D: thread_call (thread.c:891)
==30705== by 0x804952E: main (ospf_main.c:325)
==30705==
==30705== Invalid read of size 4
==30705== at 0x1B939B50: ospf_route_copy_nexthops_from_vertex
(ospf_route.c:813)
==30705== by 0x1B9393BB: ospf_intra_add_router (ospf_route.c:411)
==30705== by 0x1B938A58: ospf_spf_calculate (ospf_spf.c:1074)
==30705== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==30705== by 0x1B99804D: thread_call (thread.c:891)
==30705== by 0x804952E: main (ospf_main.c:325)
==30705== Address 0x1BE2EBB8 is 0 bytes inside a block of size 12
free'd
==30705== at 0x1B904B04: free (vg_replace_malloc.c:152)
==30705== by 0x1B9993BD: zfree (memory.c:106)
==30705== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==30705== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==30705== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==30705== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==30705== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==30705== by 0x1B99804D: thread_call (thread.c:891)
==30705== by 0x804952E: main (ospf_main.c:325)
==30705==
==30705== Invalid read of size 4
==30705== at 0x1B939B2A: ospf_route_copy_nexthops_from_vertex
(ospf_route.c:807)
==30705== by 0x1B9394BF: ospf_intra_add_transit (ospf_route.c:475)
==30705== by 0x1B938A64: ospf_spf_calculate (ospf_spf.c:1076)
==30705== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==30705== by 0x1B99804D: thread_call (thread.c:891)
==30705== by 0x804952E: main (ospf_main.c:325)
==30705== Address 0x1BE2EBB8 is 0 bytes inside a block of size 12
free'd
==30705== at 0x1B904B04: free (vg_replace_malloc.c:152)
==30705== by 0x1B9993BD: zfree (memory.c:106)
==30705== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==30705== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==30705== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==30705== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==30705== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==30705== by 0x1B99804D: thread_call (thread.c:891)
==30705== by 0x804952E: main (ospf_main.c:325)
==30705==
==30705== Invalid read of size 4
==30705== at 0x1B939B34: ospf_route_copy_nexthops_from_vertex
(ospf_route.c:809)
==30705== by 0x1B9394BF: ospf_intra_add_transit (ospf_route.c:475)
==30705== by 0x1B938A64: ospf_spf_calculate (ospf_spf.c:1076)
==30705== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==30705== by 0x1B99804D: thread_call (thread.c:891)
==30705== by 0x804952E: main (ospf_main.c:325)
==30705== Address 0x1BE2EBBC is 4 bytes inside a block of size 12
free'd
==30705== at 0x1B904B04: free (vg_replace_malloc.c:152)
==30705== by 0x1B9993BD: zfree (memory.c:106)
==30705== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==30705== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==30705== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==30705== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==30705== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==30705== by 0x1B99804D: thread_call (thread.c:891)
==30705== by 0x804952E: main (ospf_main.c:325)
==30705==
==30705== Invalid read of size 4
==30705== at 0x1B939B4B: ospf_route_copy_nexthops_from_vertex
(ospf_route.c:812)
==30705== by 0x1B9394BF: ospf_intra_add_transit (ospf_route.c:475)
==30705== by 0x1B938A64: ospf_spf_calculate (ospf_spf.c:1076)
==30705== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==30705== by 0x1B99804D: thread_call (thread.c:891)
==30705== by 0x804952E: main (ospf_main.c:325)
==30705== Address 0x1BE2EBBC is 4 bytes inside a block of size 12
free'd
==30705== at 0x1B904B04: free (vg_replace_malloc.c:152)
==30705== by 0x1B9993BD: zfree (memory.c:106)
==30705== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==30705== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==30705== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==30705== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==30705== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==30705== by 0x1B99804D: thread_call (thread.c:891)
==30705== by 0x804952E: main (ospf_main.c:325)
==30705==
==30705== Invalid read of size 4
==30705== at 0x1B939B50: ospf_route_copy_nexthops_from_vertex
(ospf_route.c:813)
==30705== by 0x1B9394BF: ospf_intra_add_transit (ospf_route.c:475)
==30705== by 0x1B938A64: ospf_spf_calculate (ospf_spf.c:1076)
==30705== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==30705== by 0x1B99804D: thread_call (thread.c:891)
==30705== by 0x804952E: main (ospf_main.c:325)
==30705== Address 0x1BE2EBB8 is 0 bytes inside a block of size 12
free'd
==30705== at 0x1B904B04: free (vg_replace_malloc.c:152)
==30705== by 0x1B9993BD: zfree (memory.c:106)
==30705== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==30705== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==30705== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==30705== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==30705== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==30705== by 0x1B99804D: thread_call (thread.c:891)
==30705== by 0x804952E: main (ospf_main.c:325)

And here is the crash:

OSPF: Received signal 11 at 1128016551 (si_addr 0x1b0a, PC 0xb7f26c82);
aborting...
Program counter: /usr/lib/libzebra.so.0(listnode_lookup
+0x46)[0xb7f26c82]
Backtrace for 11 stack frames:
/usr/lib/libzebra.so.0(zlog_backtrace_sigsafe+0x2a)[0xb7f406cb]
/usr/lib/libzebra.so.0(zlog_signal+0x2d2)[0xb7f40696]
/usr/lib/libzebra.so.0[0xb7f4a597]
[0xffffe440]
/usr/lib/libospf.so.0[0xb7f958f1]
/usr/lib/libospf.so.0[0xb7f96da5]
/usr/lib/libospf.so.0[0xb7f96eed]
/usr/lib/libzebra.so.0(thread_call+0x96)[0xb7f33e27]
/usr/lib/quagga/ospfd(main+0x35e)[0x804952f]
/lib/tls/libc.so.6(__libc_start_main+0xf4)[0xb7d3a974]
/usr/lib/quagga/ospfd[0x8049081]


Best regards,
--
Milan Kocián <***@wq.cz>
Paul Jakma
2005-09-29 19:20:51 UTC
I'm reasonably sure I have it figured out.

There are actually two problems:

1. The root cause of your crash is that struct vertex_nexthop is
referenced by multiple struct vertex's. This is primarily due to
the call to ospf_nexthop_add_unique() in
ospf_nexthop_calculation() AFAICT.

Hence, if one reference is free'd and then later handed out again -
the other is corrupted, obvious double-free problem (i caught this
with my memory.c hacks :) )

2. The SPF vertices are never cleaned up at the end of SPF
calculation, they're just leaked instead, slowly. *This* is what
made it so damn difficult to reproduce, because it then wholly
depends on your topology as to whether you hit one of the other
vertex_nexthop_free() calls. (My test topology didn't hit them -
it was too simple - Stanislav's seemed to hit them only after
several hours, yours hit them reliably). This is also why all my
tests with valgrind, libumem, etc. were to no avail. :)

Unfortunately, I never noticed problem 2 until this morning, when I
looked at the memory allocation statistics (in my memory.c patches -
Andrew, you listening? :) ).

Once I fixed the leak in 2, my free()/alloc() poison/verify hack
caught the problem!

The fix for 1 is obviously to either refcount struct vertex_nexthop
or deep-copy it.
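
To make problem 1 and the refcount option concrete, here is a minimal sketch. It is not the ospfd code: struct nexthop, its fields and the nexthop_* helpers are hypothetical stand-ins for struct vertex_nexthop, and the live counter exists only so the behaviour can be observed. Each vertex that keeps a pointer takes a reference, and the memory is released only when the last holder drops it, so freeing one vertex can no longer pull the nexthop out from under another.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for ospfd's struct vertex_nexthop. */
struct nexthop {
    unsigned int refcnt;   /* number of vertices pointing at this nexthop */
    unsigned int router;   /* placeholder payload */
};

/* Demonstration only: how many nexthops are allocated and not yet freed. */
static int live_nexthops;

int nexthop_live_count(void)
{
    return live_nexthops;
}

/* Create a nexthop owned by exactly one holder. */
struct nexthop *nexthop_new(unsigned int router)
{
    struct nexthop *nh = calloc(1, sizeof(*nh));
    if (nh == NULL)
        return NULL;
    nh->refcnt = 1;
    nh->router = router;
    live_nexthops++;
    return nh;
}

/* A second vertex wants to share the nexthop: take a reference. */
struct nexthop *nexthop_ref(struct nexthop *nh)
{
    nh->refcnt++;
    return nh;
}

/* A vertex is done with the nexthop: free it only on the last drop. */
void nexthop_unref(struct nexthop *nh)
{
    assert(nh->refcnt > 0);
    if (--nh->refcnt == 0) {
        live_nexthops--;
        free(nh);
    }
}
```

The deep-copy alternative would instead give each vertex its own private copy, trading extra allocations for simpler ownership.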

I'm too tired to do this this evening, tomorrow..

regards,
--
Paul Jakma ***@clubi.ie ***@jakma.org Key ID: 64A2FF6A
Fortune:
You have taken yourself too seriously.
Paul Jakma
2005-09-29 19:40:08 UTC
The fix for 1 is obviously to either refcount struct vertex_nexthop or
deep-copy it.
Uhmm, or only ever create them for the root vertex, and let them be
inherited as appropriate (then, only freed when the root vertex is
freed).

I'll see tomorrow.

regards,
--
Paul Jakma ***@clubi.ie ***@jakma.org Key ID: 64A2FF6A
Fortune:
"Nuclear war would really set back cable."
-- Ted Turner
Paul Jakma
2005-10-11 04:23:06 UTC
Hi Milan, Stanislav,

You might want to try:

http://hibernia.jakma.org/~paul/patches/quagga-ospfd-spf-fix-wip.diff

I'm not happy with it, and it still leaks a bit (but not as badly as
the current ospfd SPF code), but it should stand a better chance of
working.

You'll need to add a definition for MTYPE_OSPF_VERTEX_PARENT to
lib/memtypes.c to make it work.

regards,
--
Paul Jakma ***@clubi.ie ***@jakma.org Key ID: 64A2FF6A
Fortune:
People usually get what's coming to them...unless it's been mailed.
s***@wwwcom.ru
2005-10-12 09:03:33 UTC
Hello, Paul Jakma.

You wrote (Tuesday, October 11, 2005) :

PJ> You might want to try:
PJ> http://hibernia.jakma.org/~paul/patches/quagga-ospfd-spf-fix-wip.diff
PJ> I'm not happy with it, and it still leaks a bit (but not as badly as
PJ> the current ospfd SPF code), but it should stand a better chance of
PJ> working.

Thank you very much...
It seems that everything works properly after applying your patch.
In my case, I can now see all routes on both routers in this topology:

Other Routers (area 0, quagga 0.98.5) <-> R1(area 0, quagga 0.99.1) <-> R2(area 0, quagga 0.99.1)

If all is fine (no crash, error or anything else) until the
weekend, I'll start testing quagga in production use.

PJ> You'll need to add a definition for MTYPE_OSPF_VERTEX_PARENT, to
PJ> lib/memtypes.c to make it work.

I also added MTYPE_OSPF_VERTEX_PARENT to memtypes.h.
I get an error during compilation if MTYPE_OSPF_VERTEX_PARENT
is absent from memtypes.h:

In file included from memtypes.c:12:
zebra.h:247:2: warning: #warning "CMSG_FIRSTHDR is broken on this platform, using a workaround"
memtypes.c:194: `MTYPE_OSPF_VERTEX_PARENT' undeclared here (not in a function)
memtypes.c:194: initializer element is not constant
memtypes.c:194: (near initialization for `memory_list_ospf[24].index')
memtypes.c:194: initializer element is not constant
memtypes.c:194: (near initialization for `memory_list_ospf[24]')
memtypes.c:195: initializer element is not constant
memtypes.c:195: (near initialization for `memory_list_ospf[25]')
*** Error code 1

Stop in /space/quagga-0.99.1/lib.
*** Error code 1


---
Best regards,
Stanislav G. Ryabukhin.
Electronic Shield Ltd.
***@wwwcom.ru.
Paul Jakma
2005-10-12 09:05:51 UTC
Post by s***@wwwcom.ru
Thank you very much...
You're welcome. Thanks very much to yourself and Milan for the work
in getting debugging information too ;).
Post by s***@wwwcom.ru
It seems that all work properly after applying your patch. In my
Other Routers (area 0, quagga 0.98.5) <-> R1(area 0, quagga 0.99.1)
<-> R2(area 0, quagga 0.99.1)
Excellent.
Post by s***@wwwcom.ru
If all is fine (no crash, error or anything else) until the
weekend, I'll start testing quagga in production use.
Please also check the resulting path costs are what you expect, as I
made some changes to code that deals with distances.

Note that this patch does not fix all the leaks the unpatched code
has, so it will leak memory, though slowly. Fixing this requires some
further thought on how to reorganise some aspects of things.. (Ie, i
know why it leaks).
Post by s***@wwwcom.ru
I also added MTYPE_OSPF_VERTEX_PARENT to memtypes.h.
You shouldn't have had to do that, memtypes.h /should/ be rebuilt
automatically by make (using the memtypes.awk script) - it requires
GNU AWK though.
Post by s***@wwwcom.ru
I get an error during compilation if MTYPE_OSPF_VERTEX_PARENT is
See above ;)

regards,
--
Paul Jakma ***@clubi.ie ***@jakma.org Key ID: 64A2FF6A
Fortune:
The early bird gets the coffee left over from the night before.
s***@wwwcom.ru
2005-10-12 10:35:07 UTC
Hello, Paul Jakma.

You wrote (Wednesday, October 12, 2005) :

PJ> Please also check the resulting path costs are what you expect, as I
PJ> made some changes to code that deals with distances.

All path costs look correct...

PJ> Note that this patch does not fix all the leaks the unpatched code
PJ> has, so it will leak memory, though slowly. Fixing this requires some
PJ> further thought on how to reorganise some aspects of things.. (Ie, i
PJ> know why it leaks).

I'll be waiting for the next patch in the near future. ;)


---
Best regards,
Stanislav G. Ryabukhin.
Electronic Shield Ltd.
***@wwwcom.ru.
Paul Jakma
2005-10-12 15:00:02 UTC
Post by s***@wwwcom.ru
All path costs look correct...
Ok, keep an eye on them please.
Post by s***@wwwcom.ru
I'll be waiting for the next patch in the near future. ;)
Done, seems I had already solved the internal interface issue but
forgotten I had - I just had a couple of silly errors. I have updated
the patch at the previous URL - it shouldn't leak, shouldn't crash.

Note that this patch will not work with vlinks. (I haven't updated the
backlink stuff yet to work properly.)

Changes to ospf_spf in that patch:

- struct vertex_nexthop is only allocated /once/, for the first-level
of router vertices down from the root vertex. The canonical
nexthops are then just copied by reference in vertices down from
these vertices.

- a struct vertex_parent is allocated for each vertex down from root
to hold information for each parent (including the reference to the
nexthop information). Struct ospf_spf_edge might be a better name
for it though.

- vertices are only created once, i've removed the part where it
creates a temporary vertex, was silly and was hard to not leak the
temporary vertex.

- To clean up, at the end of SPF we first free the canonical nexthops,
then recursively go through the tree of vertices, using the
vertex_parent list as a refcount to know when to really free the
vertex.

It's still a bit messy though.
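
The cleanup idea in the last point can be modelled roughly as below. This is a simplified sketch, not the code in the patch: struct vertex here is a toy type, a plain parent counter stands in for the length of the vertex_parent list, the fixed-size child array and the vertex_* helper names are assumptions for illustration, and nexthops are left out entirely. A vertex reachable over several equal-cost paths has several parents, so the recursive walk frees it only once the last parent has let go of it.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_CHILDREN 4

/* Toy vertex: nparents stands in for the length of the parent list. */
struct vertex {
    int nparents;
    int nchildren;
    struct vertex *children[MAX_CHILDREN];
};

/* Demonstration only: counts how many vertices have been freed. */
static int freed_vertices;

int vertex_freed_count(void)
{
    return freed_vertices;
}

struct vertex *vertex_new(void)
{
    return calloc(1, sizeof(struct vertex));
}

/* Record that `child` is reachable via `parent`. Returns 0 on success. */
int vertex_link(struct vertex *parent, struct vertex *child)
{
    if (parent->nchildren == MAX_CHILDREN)
        return -1;
    parent->children[parent->nchildren++] = child;
    child->nparents++;
    return 0;
}

/* Free `v`, and every descendant whose last parent has now gone away. */
void vertex_free_tree(struct vertex *v)
{
    for (int i = 0; i < v->nchildren; i++) {
        struct vertex *c = v->children[i];
        assert(c->nparents > 0);
        if (--c->nparents == 0)
            vertex_free_tree(c);
    }
    freed_vertices++;
    free(v);
}
```

With a diamond (root to a and b, both to c), c survives the walk through a and is freed during the walk through b, so each vertex is freed exactly once. The recursion depth follows the depth of the SPF tree.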

regards,
--
Paul Jakma ***@clubi.ie ***@jakma.org Key ID: 64A2FF6A
Fortune:
"Ignorance is the soil in which belief in miracles grows."
-- Robert G. Ingersoll
s***@wwwcom.ru
2005-10-13 07:57:29 UTC
Hello, Paul Jakma.

You wrote (Wednesday, October 12, 2005) :

PJ> I have updated the patch at the previous URL - it shouldn't leak, shouldn't crash.

I applied this patch and got the following:

gw-03# gdb ospfd
GNU gdb 5.2.1 (FreeBSD)
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-undermydesk-freebsd"...
(gdb) run
Starting program: /usr/local/sbin/ospfd

Program received signal SIGSEGV, Segmentation fault.
0x2808d542 in ospf_vertex_free (v=0x80cbd00, area=0x8050800) at ospf_spf.c:163
163 for (ALL_LIST_ELEMENTS (v->children, node, nnode, vc))
(gdb) backtrace
#0 0x2808d542 in ospf_vertex_free (v=0x80cbd00, area=0x8050800) at ospf_spf.c:163
#1 0x2808d55d in ospf_vertex_free (v=0x80d0420, area=0x8050800) at ospf_spf.c:164
#2 0x2808d55d in ospf_vertex_free (v=0x80cb8c0, area=0x8050800) at ospf_spf.c:164
#3 0x2808d55d in ospf_vertex_free (v=0x80c4f80, area=0x8050800) at ospf_spf.c:164
#4 0x2808d55d in ospf_vertex_free (v=0x80c4e20, area=0x8050800) at ospf_spf.c:164
#5 0x2808d55d in ospf_vertex_free (v=0x80c4d60, area=0x8050800) at ospf_spf.c:164
#6 0x2808d55d in ospf_vertex_free (v=0x80c4cc0, area=0x8050800) at ospf_spf.c:164
#7 0x2808d55d in ospf_vertex_free (v=0x80c4c60, area=0x8050800) at ospf_spf.c:164
#8 0x2808e91e in ospf_spf_calculate (area=0x8050800, new_table=0x80adad0, new_rtrs=0x80adf70) at ospf_spf.c:1087
#9 0x2808e9fc in ospf_spf_calculate_timer (thread=0x0) at ospf_spf.c:1120
#10 0x280df624 in thread_call (thread=0xbfbffb34) at thread.c:891
#11 0x080490c6 in main (argc=1, argv=0xbfbffb98) at ospf_main.c:325
#12 0x08048d35 in _start ()



---
Best regards,
Stanislav G. Ryabukhin.
Electronic Shield Ltd.
***@wwwcom.ru.
Paul Jakma
2005-10-13 13:01:04 UTC
Hi Stanislav,
Post by s***@wwwcom.ru
Program received signal SIGSEGV, Segmentation fault.
0x2808d542 in ospf_vertex_free (v=0x80cbd00, area=0x8050800) at ospf_spf.c:163
163 for (ALL_LIST_ELEMENTS (v->children, node, nnode, vc))
Sigh..
Post by s***@wwwcom.ru
(gdb) backtrace
#0 0x2808d542 in ospf_vertex_free (v=0x80cbd00, area=0x8050800) at ospf_spf.c:163
#1 0x2808d55d in ospf_vertex_free (v=0x80d0420, area=0x8050800) at ospf_spf.c:164
#2 0x2808d55d in ospf_vertex_free (v=0x80cb8c0, area=0x8050800) at ospf_spf.c:164
#3 0x2808d55d in ospf_vertex_free (v=0x80c4f80, area=0x8050800) at ospf_spf.c:164
#4 0x2808d55d in ospf_vertex_free (v=0x80c4e20, area=0x8050800) at ospf_spf.c:164
#5 0x2808d55d in ospf_vertex_free (v=0x80c4d60, area=0x8050800) at ospf_spf.c:164
#6 0x2808d55d in ospf_vertex_free (v=0x80c4cc0, area=0x8050800) at ospf_spf.c:164
#7 0x2808d55d in ospf_vertex_free (v=0x80c4c60, area=0x8050800) at ospf_spf.c:164
You have quite a deep network.

Can you do me a favour and set 'debug ospf event' and enable logging
to somewhere. Before it crashes, once the main SPF is done it should
print out the SPF tree, ala:

OSPF: SPF Result: 0 [R] 212.17.55.54
OSPF: SPF Result: 1 [N] 212.17.55.49/29
OSPF: nexthop 0x9d6f418 0.0.0.0 eth0:212.17.55.54
OSPF: SPF Result: 2 [R] 212.17.55.49
OSPF: nexthop 0x9d713c0 212.17.55.49 eth0:212.17.55.54
<etc>

You might have to scroll back a good bit (to before the
ospf_process_stubs() log messages). I'd like to see that tree (send
it to me in private if needs be).

Even better, scroll back and find the corresponding:

"ospf_spf_calculate: running Dijkstra for area 0.0.0.0"

And from that place forward, send me those log messages.

regards,
--
Paul Jakma ***@clubi.ie ***@jakma.org Key ID: 64A2FF6A
Fortune:
He is considered a most graceful speaker who can say nothing in the most words.
Paul Jakma
2005-10-13 14:35:04 UTC
Post by Paul Jakma
Sigh..
Bah, a silly mistake. I've updated the patch. Please try one more
time :).

regards,
--
Paul Jakma ***@clubi.ie ***@jakma.org Key ID: 64A2FF6A
Fortune:
Stop searching forever. Happiness is just next to you.
Paul Jakma
2005-10-13 14:56:16 UTC
Bah, a silly mistake. I've updated the patch. Please try one more time :).
Sigh, I made a mistake and didn't update the patch till just now. If
you had already retrieved the patch before receiving this mail you
may not have retrieved the patch with this latest (hopefully final)
fix. Apologies.

regards,
--
Paul Jakma ***@clubi.ie ***@jakma.org Key ID: 64A2FF6A
Fortune:
The easiest way to figure the cost of living is to take your income and
add ten percent.
Milan Kocián
2005-10-13 22:36:54 UTC
Post by Paul Jakma
Hi Milan, Stanislav,
http://hibernia.jakma.org/~paul/patches/quagga-ospfd-spf-fix-wip.diff
I'm not happy with it, and it still leaks a bit (but not as badly as
the current ospfd SPF code), but it should stand a better chance of
working.
You'll need to add a definition for MTYPE_OSPF_VERTEX_PARENT, to
lib/memtypes.c to make it work.
regards,
Hallo,

still crashing. (not so good news :-))

Output:


mon:~# valgrind --leak-check=full /usr/lib/quagga/ospfd -A 127.0.0.1
==16694== Memcheck, a memory error detector for x86-linux.
==16694== Copyright (C) 2002-2005, and GNU GPL'd, by Julian Seward et
al.
==16694== Using valgrind-2.4.0, a program supervision framework for
x86-linux.
==16694== Copyright (C) 2000-2005, and GNU GPL'd, by Julian Seward et
al.
==16694== For more details, rerun with: -v
==16694==
==16694== Invalid read of size 4
==16694== at 0x1B9378D7: ospf_vertex_add_parent (ospf_spf.c:204)
==16694== by 0x1B938A46: ospf_spf_calculate (ospf_spf.c:1064)
==16694== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==16694== by 0x1B99804D: thread_call (thread.c:891)
==16694== by 0x804952E: main (ospf_main.c:325)
==16694== Address 0x1BE606A8 is 8 bytes inside a block of size 12
free'd
==16694== at 0x1B904B04: free (vg_replace_malloc.c:152)
==16694== by 0x1B9993BD: zfree (memory.c:106)
==16694== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==16694== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==16694== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==16694== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==16694== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==16694== by 0x1B99804D: thread_call (thread.c:891)
==16694== by 0x804952E: main (ospf_main.c:325)
==16694==
==16694== Invalid read of size 4
==16694== at 0x1B937902: ospf_vertex_add_parent (ospf_spf.c:207)
==16694== by 0x1B938A46: ospf_spf_calculate (ospf_spf.c:1064)
==16694== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==16694== by 0x1B99804D: thread_call (thread.c:891)
==16694== by 0x804952E: main (ospf_main.c:325)
==16694== Address 0x1BE606A8 is 8 bytes inside a block of size 12
free'd
==16694== at 0x1B904B04: free (vg_replace_malloc.c:152)
==16694== by 0x1B9993BD: zfree (memory.c:106)
==16694== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==16694== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==16694== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==16694== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==16694== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==16694== by 0x1B99804D: thread_call (thread.c:891)
==16694== by 0x804952E: main (ospf_main.c:325)
==16694==
==16694== Invalid read of size 4
==16694== at 0x1B937916: ospf_vertex_add_parent (ospf_spf.c:208)
==16694== by 0x1B938A46: ospf_spf_calculate (ospf_spf.c:1064)
==16694== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==16694== by 0x1B99804D: thread_call (thread.c:891)
==16694== by 0x804952E: main (ospf_main.c:325)
==16694== Address 0x1BE606A8 is 8 bytes inside a block of size 12
free'd
==16694== at 0x1B904B04: free (vg_replace_malloc.c:152)
==16694== by 0x1B9993BD: zfree (memory.c:106)
==16694== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==16694== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==16694== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==16694== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==16694== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==16694== by 0x1B99804D: thread_call (thread.c:891)
==16694== by 0x804952E: main (ospf_main.c:325)
==16694==
==16694== Invalid read of size 4
==16694== at 0x1B939B2A: ospf_route_copy_nexthops_from_vertex
(ospf_route.c:807)
==16694== by 0x1B9394BF: ospf_intra_add_transit (ospf_route.c:475)
==16694== by 0x1B938A64: ospf_spf_calculate (ospf_spf.c:1076)
==16694== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==16694== by 0x1B99804D: thread_call (thread.c:891)
==16694== by 0x804952E: main (ospf_main.c:325)
==16694== Address 0x1BE606A0 is 0 bytes inside a block of size 12
free'd
==16694== at 0x1B904B04: free (vg_replace_malloc.c:152)
==16694== by 0x1B9993BD: zfree (memory.c:106)
==16694== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==16694== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==16694== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==16694== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==16694== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==16694== by 0x1B99804D: thread_call (thread.c:891)
==16694== by 0x804952E: main (ospf_main.c:325)
==16694==
==16694== Invalid read of size 4
==16694== at 0x1B939B34: ospf_route_copy_nexthops_from_vertex
(ospf_route.c:809)
==16694== by 0x1B9394BF: ospf_intra_add_transit (ospf_route.c:475)
==16694== by 0x1B938A64: ospf_spf_calculate (ospf_spf.c:1076)
==16694== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==16694== by 0x1B99804D: thread_call (thread.c:891)
==16694== by 0x804952E: main (ospf_main.c:325)
==16694== Address 0x1BE606A4 is 4 bytes inside a block of size 12
free'd
==16694== at 0x1B904B04: free (vg_replace_malloc.c:152)
==16694== by 0x1B9993BD: zfree (memory.c:106)
==16694== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==16694== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==16694== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==16694== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==16694== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==16694== by 0x1B99804D: thread_call (thread.c:891)
==16694== by 0x804952E: main (ospf_main.c:325)
==16694==
==16694== Invalid read of size 4
==16694== at 0x1B939B4B: ospf_route_copy_nexthops_from_vertex
(ospf_route.c:812)
==16694== by 0x1B9394BF: ospf_intra_add_transit (ospf_route.c:475)
==16694== by 0x1B938A64: ospf_spf_calculate (ospf_spf.c:1076)
==16694== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==16694== by 0x1B99804D: thread_call (thread.c:891)
==16694== by 0x804952E: main (ospf_main.c:325)
==16694== Address 0x1BE606A4 is 4 bytes inside a block of size 12
free'd
==16694== at 0x1B904B04: free (vg_replace_malloc.c:152)
==16694== by 0x1B9993BD: zfree (memory.c:106)
==16694== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==16694== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==16694== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==16694== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==16694== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==16694== by 0x1B99804D: thread_call (thread.c:891)
==16694== by 0x804952E: main (ospf_main.c:325)
==16694==
==16694== Invalid read of size 4
==16694== at 0x1B939B50: ospf_route_copy_nexthops_from_vertex
(ospf_route.c:813)
==16694== by 0x1B9394BF: ospf_intra_add_transit (ospf_route.c:475)
==16694== by 0x1B938A64: ospf_spf_calculate (ospf_spf.c:1076)
==16694== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==16694== by 0x1B99804D: thread_call (thread.c:891)
==16694== by 0x804952E: main (ospf_main.c:325)
==16694== Address 0x1BE606A0 is 0 bytes inside a block of size 12
free'd
==16694== at 0x1B904B04: free (vg_replace_malloc.c:152)
==16694== by 0x1B9993BD: zfree (memory.c:106)
==16694== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==16694== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==16694== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==16694== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==16694== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==16694== by 0x1B99804D: thread_call (thread.c:891)
==16694== by 0x804952E: main (ospf_main.c:325)
==16694==
==16694== ERROR SUMMARY: 28 errors from 7 contexts (suppressed: 35 from
1)
==16694== malloc/free: in use at exit: 588410 bytes in 25792 blocks.
==16694== malloc/free: 61791 allocs, 35999 frees, 1983530 bytes
allocated.
==16694== For counts of detected errors, rerun with: -v
==16694== searching for pointers to 25792 not-freed blocks.
==16694== checked 904840 bytes.
==16694==
==16694==
==16694== 24 bytes in 1 blocks are possibly lost in loss record 1 of 14
==16694== at 0x1B90459D: malloc (vg_replace_malloc.c:130)
==16694== by 0x1B9C8E21: cap_init (in /lib/libcap.so.1.10)
==16694== by 0x1B9A8E24: zprivs_init (privs.c:270)
==16694== by 0x804943C: main (ospf_main.c:279)
==16694==
==16694==
==16694== 312 (72 direct, 240 indirect) bytes in 2 blocks are definitely
lost in loss record 2 of 14
==16694== at 0x1B90459D: malloc (vg_replace_malloc.c:130)
==16694== by 0x1BB74EE6: (within /lib/tls/libc-2.3.2.so)
==16694== by 0x1BB74788: __nss_database_lookup
(in /lib/tls/libc-2.3.2.so)
==16694== by 0x1BCCC96F: ???
==16694== by 0x1BB35B1B: getpwnam_r (in /lib/tls/libc-2.3.2.so)
==16694== by 0x1BB35440: getpwnam (in /lib/tls/libc-2.3.2.so)
==16694== by 0x1B9A8CD7: zprivs_init (privs.c:194)
==16694== by 0x804943C: main (ospf_main.c:279)
==16694==
==16694==
==16694== 33522 (1680 direct, 31842 indirect) bytes in 140 blocks are
definitely lost in loss record 5 of 14
==16694== at 0x1B904F75: calloc (vg_replace_malloc.c:175)
==16694== by 0x1B99933B: zcalloc (memory.c:79)
==16694== by 0x1B98D3E4: vector_init (vector.c:31)
==16694== by 0x1B99181B: cmd_make_descvec (command.c:364)
==16694== by 0x1B991A50: install_element (command.c:490)
==16694== by 0x1B9955A8: cmd_init (command.c:3507)
==16694== by 0x8049465: main (ospf_main.c:281)
==16694==
==16694==
==16694== 21980 (256 direct, 21724 indirect) bytes in 9 blocks are
definitely lost in loss record 9 of 14
==16694== at 0x1B90459D: malloc (vg_replace_malloc.c:130)
==16694== by 0x1B9992F6: zmalloc (memory.c:63)
==16694== by 0x1B937642: ospf_vertex_new (ospf_spf.c:109)
==16694== by 0x1B93831F: ospf_spf_next (ospf_spf.c:751)
==16694== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==16694== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==16694== by 0x1B99804D: thread_call (thread.c:891)
==16694== by 0x804952E: main (ospf_main.c:325)
==16694==
==16694== LEAK SUMMARY:
==16694== definitely lost: 2008 bytes in 151 blocks.
==16694== indirectly lost: 53806 bytes in 3816 blocks.
==16694== possibly lost: 24 bytes in 1 blocks.
==16694== still reachable: 532572 bytes in 21824 blocks.
==16694== suppressed: 0 bytes in 0 blocks.
==16694== Reachable blocks (those to which a pointer was found) are not
shown.
==16694== To see them, rerun with: --show-reachable=yes

In the log I see:

2005/10/13 23:51:34 OSPF: DR-Election[1st]: DR 212.71.169.2
2005/10/13 23:51:35 OSPF: ospf_vertex_new: 0x80bdbf0
2005/10/13 23:51:35 OSPF: ospf_vertex_new: 0x80c8010
2005/10/13 23:51:35 OSPF: ospf_vertex_new: 0x80c9bf8
2005/10/13 23:51:35 OSPF: ospf_vertex_new: 0x80c0900
2005/10/13 23:51:35 OSPF: ospf_vertex_new: 0x80ef190
2005/10/13 23:51:35 OSPF: ospf_vertex_new: 0x80ef1b0
2005/10/13 23:51:35 OSPF: ospf_vertex_new: 0x80ef1d0
2005/10/13 23:51:35 OSPF: ospf_vertex_new: 0x80ef1f0
2005/10/13 23:51:35 OSPF: ospf_vertex_new: 0x80ef210
2005/10/13 23:51:35 OSPF: ospf_vertex_new: 0x80ef230
2005/10/13 23:51:35 OSPF: ospf_vertex_new: 0x80ef250
2005/10/13 23:51:35 OSPF: ospf_vertex_new: 0x80ef270
2005/10/13 23:51:35 OSPF: ospf_vertex_new: 0x80ef308
2005/10/13 23:51:35 OSPF: ospf_vertex_new: 0x80ef3e0
2005/10/13 23:51:35 OSPF: ospf_vertex_new: 0x80ef400
2005/10/13 23:51:35 OSPF: ospf_vertex_new: 0x80ef420
2005/10/13 23:51:35 OSPF: ospf_vertex_new: 0x80ef440
2005/10/13 23:51:35 OSPF: ospf_vertex_new: 0x80ef460
2005/10/13 23:51:35 OSPF: ospf_vertex_new: 0x80ef538
2005/10/13 23:51:35 OSPF: ospf_vertex_new: 0x80ef558
2005/10/13 23:51:35 OSPF: ospf_vertex_new: 0x80ef578
2005/10/13 23:51:35 OSPF: ospf_vertex_new: 0x80ef598
2005/10/13 23:51:35 OSPF: ospf_vertex_new: 0x80ef728
2005/10/13 23:51:35 OSPF: ospf_vertex_new: 0x80ef748
2005/10/13 23:51:35 OSPF: ospf_vertex_new: 0x80ef768
2005/10/13 23:51:35 OSPF: ospf_vertex_new: 0x80ef9b0
2005/10/13 23:51:35 OSPF: ospf_vertex_new: 0x80ef9d0
2005/10/13 23:51:35 OSPF: ospf_vertex_new: 0x80ef9f0
2005/10/13 23:51:35 OSPF: ospf_vertex_new: 0x80efa10
2005/10/13 23:51:35 OSPF: ospf_vertex_new: 0x80efa30
2005/10/13 23:51:35 OSPF: ospf_vertex_new: 0x80efa50
2005/10/13 23:51:35 OSPF: ospf_vertex_new: 0x80efa70
2005/10/13 23:51:35 OSPF: ospf_vertex_new: 0x80efb48
2005/10/13 23:51:35 OSPF: ospf_vertex_new: 0x80efb68
2005/10/13 23:51:35 OSPF: ospf_vertex_new: 0x80f02b8
2005/10/13 23:51:35 OSPF: ospf_vertex_new: 0x80f0950
2005/10/13 23:51:35 OSPF: ospf_vertex_free: 0x80bdbf0
2005/10/13 23:51:35 OSPF: ospf_vertex_free: 0x80c8010
2005/10/13 23:51:35 OSPF: ospf_vertex_free: 0x80c9bf8
2005/10/13 23:51:35 OSPF: ospf_vertex_free: 0x80ef308
2005/10/13 23:51:35 OSPF: ospf_vertex_free: 0x80c0900
2005/10/13 23:51:35 OSPF: ospf_vertex_free: 0x80ef460
2005/10/13 23:51:35 OSPF: ospf_vertex_free: 0x80ef440
2005/10/13 23:51:35 OSPF: ospf_vertex_free: 0x80ef400
2005/10/13 23:51:35 OSPF: ospf_vertex_free: 0x80ef3e0
2005/10/13 23:51:35 OSPF: ospf_vertex_free: 0x80f0950
2005/10/13 23:51:35 OSPF: ospf_vertex_free: 0x80ef420
2005/10/13 23:51:35 OSPF: ospf_vertex_free: 0x80ef1b0
2005/10/13 23:51:35 OSPF: ospf_vertex_free: 0x80ef578
2005/10/13 23:51:35 OSPF: ospf_vertex_free: 0x80ef598
2005/10/13 23:51:35 OSPF: ospf_vertex_free: 0x80ef558
2005/10/13 23:51:35 OSPF: ospf_vertex_free: 0x80ef538
2005/10/13 23:51:35 OSPF: ospf_vertex_free: 0x80ef230
2005/10/13 23:51:35 OSPF: ospf_vertex_free: 0x80ef270
2005/10/13 23:51:35 OSPF: ospf_vertex_free: 0x80ef728
2005/10/13 23:51:35 OSPF: ospf_vertex_free: 0x80ef768
2005/10/13 23:51:35 OSPF: ospf_vertex_free: 0x80ef748
2005/10/13 23:51:35 OSPF: ospf_vertex_free: 0x80ef250
2005/10/13 23:51:35 OSPF: ospf_vertex_free: 0x80ef1d0
2005/10/13 23:51:35 OSPF: ospf_vertex_free: 0x80ef190
2005/10/13 23:51:35 OSPF: ospf_vertex_free: 0x80ef9b0
2005/10/13 23:51:35 OSPF: ospf_vertex_free: 0x80efa50
2005/10/13 23:51:35 OSPF: ospf_vertex_free: 0x80efa30
2005/10/13 23:51:35 OSPF: ospf_vertex_free: 0x80efa10
2005/10/13 23:51:35 OSPF: ospf_vertex_free: 0x80f02b8
2005/10/13 23:51:35 OSPF: ospf_vertex_free: 0x80ef9f0
2005/10/13 23:51:35 OSPF: ospf_vertex_free: 0x80ef9d0
2005/10/13 23:51:35 OSPF: ospf_vertex_free: 0x80efa70
2005/10/13 23:51:35 OSPF: ospf_vertex_free: 0x80ef1f0
2005/10/13 23:51:35 OSPF: ospf_vertex_free: 0x80efb68
2005/10/13 23:51:35 OSPF: ospf_vertex_free: 0x80efb48
2005/10/13 23:51:35 OSPF: ospf_vertex_free: 0x80ef210
2005/10/13 23:51:35 OSPF: ospf_vertex_free: 0x80ef538
OSPF: Received signal 11 at 1129240295 (si_addr 0x8, PC 0xb7f9573e);
aborting...
Program counter: /usr/lib/libospf.so.0[0xb7f9573e]
Backtrace for 13 stack frames:
/usr/lib/libzebra.so.0(zlog_backtrace_sigsafe+0x2a)[0xb7f406db]
/usr/lib/libzebra.so.0(zlog_signal+0x2d2)[0xb7f406a6]
/usr/lib/libzebra.so.0[0xb7f4a5a7]
[0xffffe440]
/usr/lib/libospf.so.0[0xb7f956f9]
/usr/lib/libospf.so.0[0xb7f956f9]
/usr/lib/libospf.so.0[0xb7f956f9]
/usr/lib/libospf.so.0[0xb7f96e84]
/usr/lib/libospf.so.0[0xb7f96f3b]
/usr/lib/libzebra.so.0(thread_call+0x96)[0xb7f33e37]
/usr/lib/quagga/ospfd(main+0x35e)[0x804952f]
/lib/tls/libc.so.6(__libc_start_main+0xf4)[0xb7d3a974]
/usr/lib/quagga/ospfd[0x8049081]

And the last valgrind run (without --leak-check=full), after running
quagga on another router in the same area:

mon:~# valgrind /usr/lib/quagga/ospfd -A 127.0.0.1
==25899== Memcheck, a memory error detector for x86-linux.
==25899== Copyright (C) 2002-2005, and GNU GPL'd, by Julian Seward et
al.
==25899== Using valgrind-2.4.0, a program supervision framework for
x86-linux.
==25899== Copyright (C) 2000-2005, and GNU GPL'd, by Julian Seward et
al.
==25899== For more details, rerun with: -v
==25899==


==25899== Invalid read of size 4
==25899== at 0x1B9378D7: ospf_vertex_add_parent (ospf_spf.c:204)
==25899== by 0x1B938A46: ospf_spf_calculate (ospf_spf.c:1064)
==25899== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==25899== by 0x1B99804D: thread_call (thread.c:891)
==25899== by 0x804952E: main (ospf_main.c:325)
==25899== Address 0x1BF147B8 is 8 bytes inside a block of size 12
free'd
==25899== at 0x1B904B04: free (vg_replace_malloc.c:152)
==25899== by 0x1B9993BD: zfree (memory.c:106)
==25899== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==25899== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==25899== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==25899== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==25899== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==25899== by 0x1B99804D: thread_call (thread.c:891)
==25899== by 0x804952E: main (ospf_main.c:325)
==25899==
==25899== Invalid read of size 4
==25899== at 0x1B937902: ospf_vertex_add_parent (ospf_spf.c:207)
==25899== by 0x1B938A46: ospf_spf_calculate (ospf_spf.c:1064)
==25899== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==25899== by 0x1B99804D: thread_call (thread.c:891)
==25899== by 0x804952E: main (ospf_main.c:325)
==25899== Address 0x1BF147B8 is 8 bytes inside a block of size 12
free'd
==25899== at 0x1B904B04: free (vg_replace_malloc.c:152)
==25899== by 0x1B9993BD: zfree (memory.c:106)
==25899== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==25899== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==25899== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==25899== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==25899== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==25899== by 0x1B99804D: thread_call (thread.c:891)
==25899== by 0x804952E: main (ospf_main.c:325)
==25899==
==25899== Invalid read of size 4
==25899== at 0x1B937916: ospf_vertex_add_parent (ospf_spf.c:208)
==25899== by 0x1B938A46: ospf_spf_calculate (ospf_spf.c:1064)
==25899== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==25899== by 0x1B99804D: thread_call (thread.c:891)
==25899== by 0x804952E: main (ospf_main.c:325)
==25899== Address 0x1BF147B8 is 8 bytes inside a block of size 12
free'd
==25899== at 0x1B904B04: free (vg_replace_malloc.c:152)
==25899== by 0x1B9993BD: zfree (memory.c:106)
==25899== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==25899== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==25899== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==25899== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==25899== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==25899== by 0x1B99804D: thread_call (thread.c:891)
==25899== by 0x804952E: main (ospf_main.c:325)
==25899==
==25899== Invalid read of size 4
==25899== at 0x1B939B2A: ospf_route_copy_nexthops_from_vertex
(ospf_route.c:807)
==25899== by 0x1B9394BF: ospf_intra_add_transit (ospf_route.c:475)
==25899== by 0x1B938A64: ospf_spf_calculate (ospf_spf.c:1076)
==25899== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==25899== by 0x1B99804D: thread_call (thread.c:891)
==25899== by 0x804952E: main (ospf_main.c:325)
==25899== Address 0x1BF147B0 is 0 bytes inside a block of size 12
free'd
==25899== at 0x1B904B04: free (vg_replace_malloc.c:152)
==25899== by 0x1B9993BD: zfree (memory.c:106)
==25899== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==25899== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==25899== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==25899== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==25899== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==25899== by 0x1B99804D: thread_call (thread.c:891)
==25899== by 0x804952E: main (ospf_main.c:325)
==25899==
==25899== Invalid read of size 4
==25899== at 0x1B939B34: ospf_route_copy_nexthops_from_vertex
(ospf_route.c:809)
==25899== by 0x1B9394BF: ospf_intra_add_transit (ospf_route.c:475)
==25899== by 0x1B938A64: ospf_spf_calculate (ospf_spf.c:1076)
==25899== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==25899== by 0x1B99804D: thread_call (thread.c:891)
==25899== by 0x804952E: main (ospf_main.c:325)
==25899== Address 0x1BF147B4 is 4 bytes inside a block of size 12
free'd
==25899== at 0x1B904B04: free (vg_replace_malloc.c:152)
==25899== by 0x1B9993BD: zfree (memory.c:106)
==25899== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==25899== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==25899== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==25899== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==25899== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==25899== by 0x1B99804D: thread_call (thread.c:891)
==25899== by 0x804952E: main (ospf_main.c:325)
==25899==
==25899== Invalid read of size 4
==25899== at 0x1B939B4B: ospf_route_copy_nexthops_from_vertex
(ospf_route.c:812)
==25899== by 0x1B9394BF: ospf_intra_add_transit (ospf_route.c:475)
==25899== by 0x1B938A64: ospf_spf_calculate (ospf_spf.c:1076)
==25899== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==25899== by 0x1B99804D: thread_call (thread.c:891)
==25899== by 0x804952E: main (ospf_main.c:325)
==25899== Address 0x1BF147B4 is 4 bytes inside a block of size 12
free'd
==25899== at 0x1B904B04: free (vg_replace_malloc.c:152)
==25899== by 0x1B9993BD: zfree (memory.c:106)
==25899== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==25899== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==25899== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==25899== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==25899== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==25899== by 0x1B99804D: thread_call (thread.c:891)
==25899== by 0x804952E: main (ospf_main.c:325)
==25899==
==25899== Invalid read of size 4
==25899== at 0x1B939B50: ospf_route_copy_nexthops_from_vertex
(ospf_route.c:813)
==25899== by 0x1B9394BF: ospf_intra_add_transit (ospf_route.c:475)
==25899== by 0x1B938A64: ospf_spf_calculate (ospf_spf.c:1076)
==25899== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==25899== by 0x1B99804D: thread_call (thread.c:891)
==25899== by 0x804952E: main (ospf_main.c:325)
==25899== Address 0x1BF147B0 is 0 bytes inside a block of size 12
free'd
==25899== at 0x1B904B04: free (vg_replace_malloc.c:152)
==25899== by 0x1B9993BD: zfree (memory.c:106)
==25899== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==25899== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==25899== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==25899== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==25899== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==25899== by 0x1B99804D: thread_call (thread.c:891)
==25899== by 0x804952E: main (ospf_main.c:325)

Best regards,

Milan Kocian
Paul Jakma
2005-10-13 23:04:46 UTC
Post by Milan Kocián
Hallo,
still crashing. (not so good news :-))
Can you confirm which revision of the patch? Before or after I fixed
the silly mistake which Stanislav detected in his testing?

If after, then can I get the logs with 'debug ospf event' enabled?

regards,
--
Paul Jakma ***@clubi.ie ***@jakma.org Key ID: 64A2FF6A
Fortune:
Some performers on television appear to be horrible people, but when
you finally get to know them in person, they turn out to be even worse.
-- Avery
Milan Kocián
2005-10-14 10:50:31 UTC
Post by Paul Jakma
Post by Milan Kocián
Hallo,
still crashing. (not so good news :-))
Can you confirm which revision of the patch? Before or after I fixed
the silly mistake which Stanislav detected in his testing?
I think I use the last revision. I downloaded your patch 4 hours after
your last mail about it:

"Done, seems I had already solved the internal interface issue but
"forgotten I had - i just had a couple of silly errors. I have updated
"the patch at the previous URL - it shouldn't leak, shouldn't crash.
"
"Note that this patch will not work with vlinks. (I havn't updated the
"backlink stuff yet to work properly.).
Post by Paul Jakma
If after, then can I get the logs with 'debug ospf event' enabled?
It is rather long, so I will not send it directly:

http://www.wq.cz/ospfd.log-14.10.2005

It's from start to crash.


Best regards,

Milan Kocian
Paul Jakma
2005-10-14 15:48:38 UTC
Hi Milan, Stanislav,

Sorry, it was another silly mistake. Please try again (updated patch
at same URL).

Hopefully that's it - I'm trying to figure out how I can set up my
test network to have a vertex with multiple parents, but until then I
can't test that aspect of this patch and depend on yourselves to do
so, sorry. ;)
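
A vertex with multiple parents is the case where a recursive free can reach the same vertex twice; the log earlier in the thread shows ospf_vertex_free called twice for 0x80ef538. A minimal sketch of one way to free such a structure exactly once, assuming a hypothetical simplified vertex with a parent counter (the struct, the function names and the counting scheme are illustrative assumptions, not the code in ospf_spf.c or in the patch):

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_CHILDREN 4

/* Hypothetical, simplified vertex: NOT the real struct vertex of ospfd. */
struct vertex {
    struct vertex *children[MAX_CHILDREN];
    int nchildren;
    int parents;            /* number of parents still pointing at us */
};

static int frees_done;      /* counts free() calls, for demonstration only */

static struct vertex *vertex_new(void)
{
    struct vertex *v = calloc(1, sizeof(*v));
    assert(v != NULL);
    return v;
}

static void vertex_add_child(struct vertex *parent, struct vertex *child)
{
    assert(parent->nchildren < MAX_CHILDREN);
    parent->children[parent->nchildren++] = child;
    child->parents++;
}

/* Free v, and recursively each child whose last parent is going away.
 * A child shared by two parents is skipped by the first parent and
 * freed by the second, so it is never freed twice. */
static void vertex_free(struct vertex *v)
{
    for (int i = 0; i < v->nchildren; i++) {
        struct vertex *c = v->children[i];
        if (--c->parents == 0)
            vertex_free(c);
    }
    free(v);
    frees_done++;
}

/* Build root -> {a, b} -> shared, tear it down, return the free() count. */
static int demo_diamond(void)
{
    struct vertex *root = vertex_new();
    struct vertex *a = vertex_new();
    struct vertex *b = vertex_new();
    struct vertex *shared = vertex_new();

    vertex_add_child(root, a);
    vertex_add_child(root, b);
    vertex_add_child(a, shared);
    vertex_add_child(b, shared);

    frees_done = 0;
    vertex_free(root);
    return frees_done;
}
```

In this sketch demo_diamond() frees four vertices with four calls to free(); without the parent counter the shared vertex would be reached once through each parent.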

regards,
--
Paul Jakma ***@clubi.ie ***@jakma.org Key ID: 64A2FF6A
Fortune:
Your ignorance cramps my conversation.
s***@wwwcom.ru
2005-10-17 08:20:15 UTC
Hello Paul, Milan.

You wrote (Friday, October 14, 2005) :
PJ> Sorry, it was another silly mistake. Please try again (updated patch
PJ> at same URL).

PJ> Hopefully that's it - I'm trying to figure out how I can setup my
PJ> test network to have a vertex with multiple parents, but until then I
PJ> can't test that aspect of this patch and depend on yourselves to do
PJ> so, sorry. ;)

Now ospfd works without crashing.
I can see all routes and the full LSDB. All path costs look correct.
I'm happy!
Thank you very much for your patches (RTM_CHANGE and this one).



---
Best regards,
Stanislav G. Ryabukhin.
Electronic Shield Ltd.
***@wwwcom.ru.
Paul Jakma
2005-10-17 10:18:36 UTC
Post by s***@wwwcom.ru
Now ospfd works without crash.
Excellent ;) Now let's hope it also cures the crash on Milan's
network.
Post by s***@wwwcom.ru
I can see all routes and the full LSDB. All path costs look correct.
I'm happy! Thank you very much for your patches (RTM_CHANGE and this
one).
You're welcome!

BTW, I might have another form of the patch for you to test later, if
possible, as I'd like to clean it up a bit before committing.

E.g., I'd like to get rid of the silliness where spf_next does:

for (every link of V to W)
  if (best distance from V to W)
    ospf_nexthop_calculation:
      for (every link of V to W)
        add that link as a path.

That latter loop is why we have the stupidity in
ospf_spf_consider_nexthop() of checking nexthop output costs. It's
wrong, the nexthop calculation should just be per-link (not all
links). ;)
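
The difference between the two loop shapes can be sketched in C. This is only an illustration of the pseudocode above under a loose reading of it: struct link, paths_nested and paths_per_link are made-up names, and "best distance" is taken to mean "cost equal to the lowest cost among the links from V to W". It is not the ospfd code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical link from V to W with an output cost; not a Quagga type. */
struct link {
    int cost;
};

/* Lowest cost among the n links (n must be > 0). */
static int best_cost(const struct link *links, size_t n)
{
    assert(n > 0);
    int best = links[0].cost;
    for (size_t i = 1; i < n; i++)
        if (links[i].cost < best)
            best = links[i].cost;
    return best;
}

/* Nested shape: every best-cost link triggers a second walk over ALL
 * links, so costlier links get added too unless filtered afterwards.
 * Returns the number of "add path" operations performed. */
static int paths_nested(const struct link *links, size_t n)
{
    int best = best_cost(links, n);
    int added = 0;
    for (size_t i = 0; i < n; i++) {
        if (links[i].cost != best)
            continue;
        for (size_t j = 0; j < n; j++)
            added++;        /* link j is added whatever its own cost is */
    }
    return added;
}

/* Per-link shape: each link is considered once, on its own cost. */
static int paths_per_link(const struct link *links, size_t n)
{
    int best = best_cost(links, n);
    int added = 0;
    for (size_t i = 0; i < n; i++)
        if (links[i].cost == best)
            added++;
    return added;
}

/* Three parallel links V->W with costs 10, 10 and 20. */
static const struct link demo_links[] = { {10}, {10}, {20} };

static int demo_nested(void)   { return paths_nested(demo_links, 3); }
static int demo_per_link(void) { return paths_per_link(demo_links, 3); }
```

With costs 10, 10 and 20 the nested shape performs six additions, including the cost-20 link, which is the kind of result a later check on output cost has to filter out; the per-link shape adds only the two cost-10 links.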

regards,
--
Paul Jakma ***@clubi.ie ***@jakma.org Key ID: 64A2FF6A
Fortune:
Duty, n:
What one expects from others.
-- Oscar Wilde
s***@wwwcom.ru
2005-10-17 13:25:49 UTC
Hello, Paul Jakma.

You wrote (Monday, October 17, 2005) :

PJ> BTW, I might have another form of the patch for you test later, if
PJ> possible, as I'd like to clean it up a bit before commiting.

No problem, I am ready to test the next patch. ;)


---
Best regards,
Stanislav G. Ryabukhin.
Electronic Shield Ltd.
***@wwwcom.ru.
Paul Jakma
2005-10-17 14:47:38 UTC
Post by s***@wwwcom.ru
No problem, I am ready to test the next patch. ;)
Well, reload the patch URL so ;) - it runs, but I only have a very
simple topology here.

If it has problems, it will be in the result of SPF computation
(shouldn't crash, we fixed that :) ) - in particular the next-hops
used where you have multiple links to the same neighbour.

Note that the results may end up different to before in certain
situations, but the idea is it should be /better/ than before - I've
cleaned it up and lifted a limitation in the old code.

regards,
--
Paul Jakma ***@clubi.ie ***@jakma.org Key ID: 64A2FF6A
Fortune:
Immanuel doesn't pun, he Kant.
Milan Kocián
2005-10-17 19:49:59 UTC
Hallo,

I downloaded the patch 30 minutes ago and I get this compile error:

ospf_interface.c: In function `ospf_new_if_params':
ospf_interface.c:562: error: structure has no member named
`fast_hello__config'
ospf_interface.c: In function `ospf_free_if_params':
ospf_interface.c:604: error: structure has no member named
`fast_hello__config'
ospf_interface.c: In function `ospf_if_new_hook':
ospf_interface.c:703: error: structure has no member named
`fast_hello__config'
ospf_interface.c:704: error: structure has no member named `fast_hello'
ospf_interface.c:704: error: `OSPF_FAST_HELLO_DEFAULT' undeclared (first
use in this function)
ospf_interface.c:704: error: (Each undeclared identifier is reported
only once
ospf_interface.c:704: error: for each function it appears in.)
make[3]: *** [ospf_interface.lo] Error 1
make[3]: Leaving directory `/home/debs/quagga/cvs/quagga-0.99.1/ospfd'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/home/debs/quagga/cvs/quagga-0.99.1'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/home/debs/quagga/cvs/quagga-0.99.1'
make: *** [build-stamp] Error 2

When I grep the source I see:

grep -r OSPF_FAST_HELLO_DEFAULT *

ospfd/ospf_interface.c: IF_DEF_PARAMS (ifp)->fast_hello =
OSPF_FAST_HELLO_DEFAULT;

It appears only once. Did I miss something?

Best regards,

Milan Kocian
Post by Paul Jakma
Post by s***@wwwcom.ru
Now ospfd works without crash.
Excellent ;) Now let's hope it also cures the crash on Milan's
network.
Post by s***@wwwcom.ru
I can see all routes and the full LSDB. All path costs look correct.
I'm happy! Thank you very much for your patches (RTM_CHANGE and this
one).
You're welcome!
BTW, I might have another form of the patch for you to test later, if
possible, as I'd like to clean it up a bit before committing.
for (every link of V to W)
if (best distance from V to W)
for (every link of V to W)
add that link as a path.
That latter loop is why we have the stupidity in
ospf_spf_consider_nexthop() of checking nexthop output costs. It's
wrong, the nexthop calculation should just be per-link (not all
links). ;)
regards,
Paul Jakma
2005-10-17 20:00:44 UTC
Post by Milan Kocián
Hallo,
ospf_interface.c:562: error: structure has no member named
`fast_hello__config'
It appears only once. Did I miss something?
Oops, bits from an unrelated patch seemed to have snuck in. Please
try again, I editdiff'd it and removed the stray changes.

Speaking of which, anyone fancy testing a patch that adds sub-second
hellos with 1s dead-interval (aka "fast-hello") to ospfd?
Post by Milan Kocián
Milan Kocian
regards,
--
Paul Jakma ***@clubi.ie ***@jakma.org Key ID: 64A2FF6A
Fortune:
Password:
Milan Kocián
2005-10-17 20:52:27 UTC
Hello,
Post by Paul Jakma
Oops, bits from an unrelated patch seemed to have snuck in. Please
try again, I editdiff'd it and removed the stray changes.
Excellent, it is working now. I will let it run and we will see. Is it
the final version? Will it be backported to the stable version?
Post by Paul Jakma
Speaking of which, anyone fancy testing a patch that adds sub-second
hellos with 1s dead-interval (aka "fast-hello") to ospfd?
Interesting. If you want, I can test it on my home network.
Post by Paul Jakma
regards,
--
Best regards and many thanks for your work.


Milan Kocian
Paul Jakma
2005-10-17 21:09:55 UTC
Post by Milan Kocián
excellent, it is working now.
Yay :)
Post by Milan Kocián
I will let it run. We will see.
Iirc, in your topology it crashed pretty much instantly, right?
(Unlike Stanislav who had to wait a good few hours - presumably for
just the right change in his topology).
Post by Milan Kocián
Is it the final version?
I have to fix the 'backlink' index, which is used by virtual-links.
Post by Milan Kocián
Will it be backported to the stable version?
Stable (ie 0.98) does not have this crash AFAIK. The memory
management bugs were specific to the O((M+N)logN) SPF scalability
changes added in 0.99.
Post by Milan Kocián
Interesting. If you want, I can test it on my home network.
Will post a patch soon. It works here at. It should be compatible
with the "fast-hello" feature in IOS.

Also going to add sub-second SPF calculations and adaptive hold-times
(though, "timers spf 0 X" is as good really, from convergence POV at
least (X > 0 is usually a good idea)).

regards,
--
Paul Jakma ***@clubi.ie ***@jakma.org Key ID: 64A2FF6A
Fortune:
Predestination was doomed from the start.
Milan Kocián
2005-10-17 21:37:01 UTC
Post by Paul Jakma
Post by Milan Kocián
excellent, it is working now.
Yay :)
Post by Milan Kocián
I will let it run. We will see.
Iirc, in your topology it crashed pretty much instantly, right?
(Unlike Stanislav who had to wait a good few hours - presumably for
just the right change in his topology).
Yes.
Post by Paul Jakma
Post by Milan Kocián
Is it final version?
I have to fix the 'backlink' index, which is used by virtual-links.
Post by Milan Kocián
Will it be backported to stable version?
Stable (ie 0.98) does not have this crash AFAIK. The memory
management bugs were specific to the O((M+N)logN) SPF scalability
changes added in 0.99.
So, why do I use the devel version? :-))) Yes, I remember, there were problems
with the stable version and Andrew J. Schorr recommended that I test the devel
version. :-)
Post by Paul Jakma
Post by Milan Kocián
Interesting. When you want I can test it in my home network.
Will post a patch soon. It works here, at least. It should be compatible
with the "fast-hello" feature in IOS.
Also going to add sub-second SPF calculations and adaptive hold-times
(though, "timers spf 0 X" is as good really, from convergence POV at
least (X > 0 is usually a good idea)).
ok.
Post by Paul Jakma
regards,
--
Predestination was doomed from the start.
I ran it with valgrind (just to be sure). Here is the output:

mon:~# ==6633== Memcheck, a memory error detector for x86-linux.
==6633== Copyright (C) 2002-2005, and GNU GPL'd, by Julian Seward et al.
==6633== Using valgrind-2.4.0, a program supervision framework for x86-linux.
==6633== Copyright (C) 2000-2005, and GNU GPL'd, by Julian Seward et al.
==6633== For more details, rerun with: -v
==6633==
==6633== Invalid read of size 4
==6633== at 0x1B9378D7: ospf_vertex_add_parent (ospf_spf.c:204)
==6633== by 0x1B938A46: ospf_spf_calculate (ospf_spf.c:1064)
==6633== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==6633== by 0x1B99804D: thread_call (thread.c:891)
==6633== by 0x804952E: main (ospf_main.c:325)
==6633== Address 0x1BE5FA08 is 8 bytes inside a block of size 12 free'd
==6633== at 0x1B904B04: free (vg_replace_malloc.c:152)
==6633== by 0x1B9993BD: zfree (memory.c:106)
==6633== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==6633== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==6633== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==6633== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==6633== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==6633== by 0x1B99804D: thread_call (thread.c:891)
==6633== by 0x804952E: main (ospf_main.c:325)
==6633==
==6633== Invalid read of size 4
==6633== at 0x1B937902: ospf_vertex_add_parent (ospf_spf.c:207)
==6633== by 0x1B938A46: ospf_spf_calculate (ospf_spf.c:1064)
==6633== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==6633== by 0x1B99804D: thread_call (thread.c:891)
==6633== by 0x804952E: main (ospf_main.c:325)
==6633== Address 0x1BE5FA08 is 8 bytes inside a block of size 12 free'd
==6633== at 0x1B904B04: free (vg_replace_malloc.c:152)
==6633== by 0x1B9993BD: zfree (memory.c:106)
==6633== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==6633== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==6633== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==6633== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==6633== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==6633== by 0x1B99804D: thread_call (thread.c:891)
==6633== by 0x804952E: main (ospf_main.c:325)
==6633==
==6633== Invalid read of size 4
==6633== at 0x1B937916: ospf_vertex_add_parent (ospf_spf.c:208)
==6633== by 0x1B938A46: ospf_spf_calculate (ospf_spf.c:1064)
==6633== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==6633== by 0x1B99804D: thread_call (thread.c:891)
==6633== by 0x804952E: main (ospf_main.c:325)
==6633== Address 0x1BE5FA08 is 8 bytes inside a block of size 12 free'd
==6633== at 0x1B904B04: free (vg_replace_malloc.c:152)
==6633== by 0x1B9993BD: zfree (memory.c:106)
==6633== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==6633== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==6633== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==6633== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==6633== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==6633== by 0x1B99804D: thread_call (thread.c:891)
==6633== by 0x804952E: main (ospf_main.c:325)
==6633==
==6633== Invalid read of size 4
==6633== at 0x1B939B2A: ospf_route_copy_nexthops_from_vertex
(ospf_route.c:807)
==6633== by 0x1B9394BF: ospf_intra_add_transit (ospf_route.c:475)
==6633== by 0x1B938A64: ospf_spf_calculate (ospf_spf.c:1076)
==6633== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==6633== by 0x1B99804D: thread_call (thread.c:891)
==6633== by 0x804952E: main (ospf_main.c:325)
==6633== Address 0x1BE5FA00 is 0 bytes inside a block of size 12 free'd
==6633== at 0x1B904B04: free (vg_replace_malloc.c:152)
==6633== by 0x1B9993BD: zfree (memory.c:106)
==6633== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==6633== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==6633== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==6633== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==6633== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==6633== by 0x1B99804D: thread_call (thread.c:891)
==6633== by 0x804952E: main (ospf_main.c:325)
==6633==
==6633== Invalid read of size 4
==6633== at 0x1B939B34: ospf_route_copy_nexthops_from_vertex
(ospf_route.c:809)
==6633== by 0x1B9394BF: ospf_intra_add_transit (ospf_route.c:475)
==6633== by 0x1B938A64: ospf_spf_calculate (ospf_spf.c:1076)
==6633== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==6633== by 0x1B99804D: thread_call (thread.c:891)
==6633== by 0x804952E: main (ospf_main.c:325)
==6633== Address 0x1BE5FA04 is 4 bytes inside a block of size 12 free'd
==6633== at 0x1B904B04: free (vg_replace_malloc.c:152)
==6633== by 0x1B9993BD: zfree (memory.c:106)
==6633== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==6633== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==6633== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==6633== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==6633== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==6633== by 0x1B99804D: thread_call (thread.c:891)
==6633== by 0x804952E: main (ospf_main.c:325)
==6633==
==6633== Invalid read of size 4
==6633== at 0x1B939B4B: ospf_route_copy_nexthops_from_vertex
(ospf_route.c:812)
==6633== by 0x1B9394BF: ospf_intra_add_transit (ospf_route.c:475)
==6633== by 0x1B938A64: ospf_spf_calculate (ospf_spf.c:1076)
==6633== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==6633== by 0x1B99804D: thread_call (thread.c:891)
==6633== by 0x804952E: main (ospf_main.c:325)
==6633== Address 0x1BE5FA04 is 4 bytes inside a block of size 12 free'd
==6633== at 0x1B904B04: free (vg_replace_malloc.c:152)
==6633== by 0x1B9993BD: zfree (memory.c:106)
==6633== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==6633== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==6633== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==6633== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==6633== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==6633== by 0x1B99804D: thread_call (thread.c:891)
==6633== by 0x804952E: main (ospf_main.c:325)
==6633==
==6633== Invalid read of size 4
==6633== at 0x1B939B50: ospf_route_copy_nexthops_from_vertex
(ospf_route.c:813)
==6633== by 0x1B9394BF: ospf_intra_add_transit (ospf_route.c:475)
==6633== by 0x1B938A64: ospf_spf_calculate (ospf_spf.c:1076)
==6633== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==6633== by 0x1B99804D: thread_call (thread.c:891)
==6633== by 0x804952E: main (ospf_main.c:325)
==6633== Address 0x1BE5FA00 is 0 bytes inside a block of size 12 free'd
==6633== at 0x1B904B04: free (vg_replace_malloc.c:152)
==6633== by 0x1B9993BD: zfree (memory.c:106)
==6633== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==6633== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==6633== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==6633== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==6633== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==6633== by 0x1B99804D: thread_call (thread.c:891)
==6633== by 0x804952E: main (ospf_main.c:325)

Best regards,

Milan Kocian
Paul Jakma
2005-10-18 03:26:18 UTC
Post by Milan Kocián
Yes.
So hopefully your problem is gone.
Post by Milan Kocián
So, why do I use the devel version? :-)))
;)
Post by Milan Kocián
Yes, I remember, there were problems with the stable version and Andrew
J. Schorr recommended that I test the devel version. :-)
That Andrew, he's always causing problems :). Likely you were
encountering some other problem which was probably fixed in CVS. (Ie
he wanted you to check something which had changed in 0.99).
Post by Milan Kocián
==6633== Invalid read of size 4
==6633== at 0x1B9378D7: ospf_vertex_add_parent (ospf_spf.c:204)
==6633== by 0x1B938A46: ospf_spf_calculate (ospf_spf.c:1064)
==6633== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==6633== by 0x1B99804D: thread_call (thread.c:891)
==6633== by 0x804952E: main (ospf_main.c:325)
==6633== Address 0x1BE5FA08 is 8 bytes inside a block of size 12 free'd
==6633== at 0x1B904B04: free (vg_replace_malloc.c:152)
==6633== by 0x1B9993BD: zfree (memory.c:106)
==6633== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==6633== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==6633== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
Uh, how is this possible? That's a symptom of the bug I've fixed -
fixed by removing ospf_vertex_free() from this path. With the patch,
ospf_spf_next() does not call ospf_vertex_free(), nor does
ospf_vertex_free() call vertex_nexthop_free().
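
For readers following the traces: the reads at offsets 0, 4 and 8 of a freed 12-byte block are what you would expect when a small record is freed while another structure still points at it. Below is a minimal sketch of that pattern and of one generic way to guard against it. The struct and function names are hypothetical stand-ins, not the real ospf_spf.c code, and the reference count is only an illustration; it is not the fix described above, which removes the free from this path.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a nexthop record shared between SPF vertices. */
struct nexthop {
    int ifindex;       /* outgoing interface (illustrative) */
    unsigned router;   /* next-hop router address (illustrative) */
    int refcnt;        /* number of vertices pointing at this record */
};

struct vertex {
    struct nexthop *nh;
};

/* Create a nexthop that no vertex references yet. */
static struct nexthop *nexthop_new(int ifindex, unsigned router)
{
    struct nexthop *nh = calloc(1, sizeof(*nh));
    if (nh != NULL) {
        nh->ifindex = ifindex;
        nh->router = router;
    }
    return nh;
}

/* Attach nh to v and take one reference on it. */
static void vertex_set_nexthop(struct vertex *v, struct nexthop *nh)
{
    v->nh = nh;
    if (nh != NULL)
        nh->refcnt++;
}

/*
 * Drop v's reference. The record is only freed when the last vertex
 * lets go, so a second vertex never ends up with a dangling pointer.
 * Returns 1 if the record was freed, 0 otherwise.
 */
static int vertex_clear_nexthop(struct vertex *v)
{
    struct nexthop *nh = v->nh;

    v->nh = NULL;
    if (nh != NULL && --nh->refcnt == 0) {
        free(nh);
        return 1;
    }
    return 0;
}
```

Without the count, clearing the first vertex would free the record and any later read through the second vertex would be exactly the kind of access memcheck flags as an invalid read inside a free'd block.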

You sure that's the right binary? (accidentally using an old version
perhaps, or a binary compiled without the patch?)

regards,
--
Paul Jakma ***@clubi.ie ***@jakma.org Key ID: 64A2FF6A
Fortune:
Whom computers would destroy, they must first drive mad.
Milan Kocián
2005-10-18 08:19:51 UTC
Post by Paul Jakma
Uh, how is this possible? That's a symptom of the bug I've fixed -
fixed by removing ospf_vertex_free() from this path. With the patch,
ospf_spf_next() does not call ospf_vertex_free(), nor does
ospf_vertex_free() call vertex_nexthop_free().
You sure that's the right binary? (accidentally using an old version
perhaps, or a binary compiled without the patch?)
regards,
--
Whom computers would destroy, they must first drive mad.
Hallo,

I am sorry, you're right. I was using old non-stripped libraries from my first
experiments (loaded and linked manually), and dpkg does not rewrite the
links to the libraries (I am building a new Debian package now).
Because I use this router for reporting (faster, bigger HDD) and the second
one only to see whether it runs or crashes (it has only CF), all my reports
until now were made with the bad libraries.
On the second router everything is OK. After deleting the old libraries,
everything is OK. Valgrind says nothing.
Interestingly, the first run after installation was always OK, but the
second run produced the known bugs. (I reinstall, run, run, reinstall, run,
run .. I was going crazy from this :-))
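
One way to catch this kind of mix-up is to ask a running process which library files it has actually mapped. The following is a minimal sketch, assuming Linux and the usual /proc/<pid>/maps line layout; the helper names maps_line_path and print_mapped are made up for illustration and are not part of quagga.

```c
#include <stdio.h>
#include <string.h>

/*
 * Extract the pathname field from one line of /proc/<pid>/maps.
 * Returns 1 and copies the path into out if the line names a file,
 * 0 otherwise (anonymous mappings, [heap], [stack], ...).
 */
static int maps_line_path(const char *line, char *out, size_t outlen)
{
    const char *p = strchr(line, '/');
    size_t n;

    if (p == NULL || outlen == 0)
        return 0;
    n = strcspn(p, "\n");          /* path runs to the end of the line */
    if (n >= outlen)
        n = outlen - 1;            /* truncate rather than overflow */
    memcpy(out, p, n);
    out[n] = '\0';
    return 1;
}

/*
 * Print every mapped file whose path contains needle, e.g. "libospf".
 * Returns the number of matching lines, or -1 if the file cannot be opened.
 */
static int print_mapped(const char *maps_file, const char *needle)
{
    char line[512], path[512];
    int hits = 0;
    FILE *fp = fopen(maps_file, "r");

    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof(line), fp) != NULL) {
        if (maps_line_path(line, path, sizeof(path)) &&
            strstr(path, needle) != NULL) {
            printf("%s\n", path);
            hits++;
        }
    }
    fclose(fp);
    return hits;
}
```

Calling print_mapped("/proc/self/maps", "libospf") from inside the process, or pointing it at the maps file of a running ospfd, would show whether the library comes from the package path or from a leftover manual install.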

Conclusion: I am so stupid.

Best regards,

Milan Kocian
Paul Jakma
2005-10-18 08:30:20 UTC
Post by Milan Kocián
On second router is all ok. After deleting old libraries is all ok.
Valgrind says nothing.
Excellent! :)
Post by Milan Kocián
Interestingly, first run after installation was always ok. Even the
second run produce known bugs. (I reinstall, run, run, reinstall,
run, run .. I was crazy from this :-))
Hehe, sorry about that ;).
Post by Milan Kocián
Conclusion: I am so stupid.
Bah, tiny mistake.

So the problem is definitely fixed then. Good news. Thanks to both of
you for your invaluable help!

I've committed the fixes to CVS - I still have to fix the 'backlink'
though, so CVS won't just yet work 100% correctly with Virtual links.
(But you, Milan, had another problem report with virtual links,
didn't you?)

regards,
--
Paul Jakma ***@clubi.ie ***@jakma.org Key ID: 64A2FF6A
Fortune:
Howe's Law:
Everyone has a scheme that will not work.
Milan Kocián
2005-10-18 09:33:01 UTC
Post by Paul Jakma
Post by Milan Kocián
Interestingly, first run after installation was always ok. Even the
second run produce known bugs. (I reinstall, run, run, reinstall,
run, run .. I was crazy from this :-))
Hehe, sorry about that ;).
No need to apologize. It was my mistake. :-)
Post by Paul Jakma
So the problem is definitely fixed then. Good news. Thanks to both of
you for your invaluable help!
I am glad I can help.
Post by Paul Jakma
I've committed the fixes to CVS - I still have to fix the 'backlink'
though, so CVS won't just yet work 100% correctly with Virtual links.
(But you, Milan, had another problem report with virtual links,
didn't you?)
No, I reported a problem with lost neighbors (vlinks too) after deleting
a network from an interface.

But since you mention vlinks, I have another oddity.
Imagine this network:

--bbone-- R1 --Area x(vlink)--- R2 --Area y

when R1 and R2 are gated, all is OK
when R1 and R2 are quagga, all is OK
when R1 is quagga and R2 is gated, all is OK
when R1 is gated and R2 is quagga, there is a problem: quagga sees only
areas x and y, not the backbone's routes

I didn't investigate it very thoroughly, but in case you want to know
something about vlinks :-). The problem may be with gated. When I have a
little time, I will test it in detail. It has been a long time since I
tested it; it was probably version 0.98.3.


Best regards,

Milan Kocian
Paul Jakma
2005-10-18 06:23:47 UTC
Post by Milan Kocián
Post by Paul Jakma
Speaking of which, anyone fancy testing a patch that adds sub-second
hello's with 1s dead-interval (aka "fast-hello") to ospfd?
Interesting. When you want I can test it in my home network.
It's at:

http://hibernia.jakma.org/~paul/patches/quagga-ospfd-fast-hello.diff

Enable it with 'ip ospf dead-interval minimal hello-multiplier X',
for 0 < X <= 10. It must be set on all routers on a segment (though, X
doesn't have to be the same).
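
To make the arithmetic concrete, here is a small sketch of how the hello spacing follows from the multiplier. It assumes the multiplier means "hellos per 1-second dead-interval", as in the IOS feature of the same name; the function and constant names are hypothetical and are not taken from the patch.

```c
#define FAST_HELLO_DEAD_MS  1000  /* "minimal" dead-interval: 1 second */
#define FAST_HELLO_MAX_MULT 10    /* upper bound quoted for this patch */

/*
 * Return the spacing between hellos in milliseconds for a given
 * hello-multiplier, or -1 if the multiplier is outside 0 < X <= 10.
 * Integer division, so X = 3 gives 333 ms rather than 333.3.
 */
static int fast_hello_interval_ms(int multiplier)
{
    if (multiplier <= 0 || multiplier > FAST_HELLO_MAX_MULT)
        return -1;
    return FAST_HELLO_DEAD_MS / multiplier;
}
```

Under that assumption, 'hello-multiplier 4' would send a hello roughly every 250 ms, so a neighbor has to miss four in a row before the 1s dead-interval expires.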

regards,
--
Paul Jakma ***@clubi.ie ***@jakma.org Key ID: 64A2FF6A
Fortune:
ether leak
Andrew J. Schorr
2005-10-18 14:11:09 UTC
Post by Paul Jakma
Post by Paul Jakma
Speaking of which, anyone fancy testing a patch that adds sub-second
hello's with 1s dead-interval (aka "fast-hello") to ospfd?
http://hibernia.jakma.org/~paul/patches/quagga-ospfd-fast-hello.diff
Enable it with 'ip ospf dead-interval minimal hello-multiplier X',
for 0 < X <= 10. It must be set on all routers on a segment (though, X
doesn't have to be the same).
Thanks, this is a good feature to have, and I will try to summon up the courage
to test this. Do you know if this should interoperate properly with similarly
configured Cisco IOS 12.1 routers?

Regards,
Andy
Paul Jakma
2005-10-18 14:22:59 UTC
Post by Andrew J. Schorr
Thanks, this is a good feature to have, and I will try to summon up
the courage to test this.
It seems to work here.

If you have problems you can always go back to normal >= 1s hello and
dead-time. The patch shouldn't have an impact then (hasn't here at
least, on two different machines).
Post by Andrew J. Schorr
Do you know if this should interoperate properly with similarly
configured Cisco IOS 12.1 routers?
Maybe. Want to find out and let me know? :)

regards,
--
Paul Jakma ***@clubi.ie ***@jakma.org Key ID: 64A2FF6A
Fortune:
panic("aha1740.c"); /* Goodbye */
linux-2.2.16/drivers/scsi/aha1740.c
Milan Kocián
2005-09-26 15:36:40 UTC
Post by Paul Jakma
Post by Milan Kocián
Hello,
OSPF: Received signal 11 at 1127214511 (si_addr 0x1b0a, PC 0x242f2a0c);
aborting...
Program counter: /usr/lib/libzebra.so.0(listnode_lookup
+0x3b)[0x242f2a0c]
Arg, what is this damn crash? Driving me nuts. I can't reproduce it
either. I'm not using areas, but AFAIK neither is Stanislav (is that
correct Stanislav?).
The answers lie in the below somehow, but we need /full/ symbol
information. Can you try to get a valgrind run that shows these 'invalid
reads' and 'address is ... inside block ... freed' traces with full
symbolic information? Ie compile with --enable-gcc-rdynamic
--enable-shared if you can (CFLAGS="-O -g3" too for good measure).
Ok, I will compile with it and I will try to test it.
--
Milan Kocián <***@wq.cz>
Milan Kocián
2005-09-29 08:41:09 UTC
Post by Paul Jakma
The answers lie in the below somehow, but we need /full/ symbol
information. Can you try to get a valgrind run that shows these 'invalid
reads' and 'address is ... inside block ... freed' traces with full
symbolic information? Ie compile with --enable-gcc-rdynamic
--enable-shared if you can (CFLAGS="-O -g3" too for good measure).
Hallo,

I compiled the binaries with the requested options; here is the valgrind
output with more info:

==19621== Invalid read of size 4
==19621== at 0x1B9378D7: ospf_vertex_add_parent (ospf_spf.c:204)
==19621== by 0x1B938A46: ospf_spf_calculate (ospf_spf.c:1064)
==19621== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==19621== by 0x1B99804D: thread_call (thread.c:891)
==19621== by 0x804945B: main (ospf_main.c:325)
==19621== Address 0x1BEA0C78 is 8 bytes inside a block of size 12
free'd
==19621== at 0x1B904B04: free (vg_replace_malloc.c:152)
==19621== by 0x1B9993BD: zfree (memory.c:106)
==19621== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==19621== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==19621== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==19621== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==19621== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==19621== by 0x1B99804D: thread_call (thread.c:891)
==19621== by 0x804945B: main (ospf_main.c:325)
==19621==
==19621== Invalid read of size 4
==19621== at 0x1B937902: ospf_vertex_add_parent (ospf_spf.c:207)
==19621== by 0x1B938A46: ospf_spf_calculate (ospf_spf.c:1064)
==19621== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==19621== by 0x1B99804D: thread_call (thread.c:891)
==19621== by 0x804945B: main (ospf_main.c:325)
==19621== Address 0x1BEA0C78 is 8 bytes inside a block of size 12
free'd
==19621== at 0x1B904B04: free (vg_replace_malloc.c:152)
==19621== by 0x1B9993BD: zfree (memory.c:106)
==19621== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==19621== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==19621== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==19621== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==19621== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==19621== by 0x1B99804D: thread_call (thread.c:891)
==19621== by 0x804945B: main (ospf_main.c:325)
==19621==
==19621== Invalid read of size 4
==19621== at 0x1B937916: ospf_vertex_add_parent (ospf_spf.c:208)
==19621== by 0x1B938A46: ospf_spf_calculate (ospf_spf.c:1064)
==19621== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==19621== by 0x1B99804D: thread_call (thread.c:891)
==19621== by 0x804945B: main (ospf_main.c:325)
==19621== Address 0x1BEA0C78 is 8 bytes inside a block of size 12
free'd
==19621== at 0x1B904B04: free (vg_replace_malloc.c:152)
==19621== by 0x1B9993BD: zfree (memory.c:106)
==19621== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==19621== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==19621== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==19621== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==19621== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==19621== by 0x1B99804D: thread_call (thread.c:891)
==19621== by 0x804945B: main (ospf_main.c:325)
==19621==
==19621== Invalid read of size 4
==19621== at 0x1B939B2A: ospf_route_copy_nexthops_from_vertex
(ospf_route.c:807)
==19621== by 0x1B9393BB: ospf_intra_add_router (ospf_route.c:411)
==19621== by 0x1B938A58: ospf_spf_calculate (ospf_spf.c:1074)
==19621== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==19621== by 0x1B99804D: thread_call (thread.c:891)
==19621== by 0x804945B: main (ospf_main.c:325)
==19621== Address 0x1BEA0C70 is 0 bytes inside a block of size 12
free'd
==19621== at 0x1B904B04: free (vg_replace_malloc.c:152)
==19621== by 0x1B9993BD: zfree (memory.c:106)
==19621== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==19621== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==19621== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==19621== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==19621== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==19621== by 0x1B99804D: thread_call (thread.c:891)
==19621== by 0x804945B: main (ospf_main.c:325)
==19621==
==19621== Invalid read of size 4
==19621== at 0x1B939B34: ospf_route_copy_nexthops_from_vertex
(ospf_route.c:809)
==19621== by 0x1B9393BB: ospf_intra_add_router (ospf_route.c:411)
==19621== by 0x1B938A58: ospf_spf_calculate (ospf_spf.c:1074)
==19621== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==19621== by 0x1B99804D: thread_call (thread.c:891)
==19621== by 0x804945B: main (ospf_main.c:325)
==19621== Address 0x1BEA0C74 is 4 bytes inside a block of size 12
free'd
==19621== at 0x1B904B04: free (vg_replace_malloc.c:152)
==19621== by 0x1B9993BD: zfree (memory.c:106)
==19621== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==19621== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==19621== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==19621== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==19621== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==19621== by 0x1B99804D: thread_call (thread.c:891)
==19621== by 0x804945B: main (ospf_main.c:325)
==19621==
==19621== Invalid read of size 4
==19621== at 0x1B939B4B: ospf_route_copy_nexthops_from_vertex
(ospf_route.c:812)
==19621== by 0x1B9393BB: ospf_intra_add_router (ospf_route.c:411)
==19621== by 0x1B938A58: ospf_spf_calculate (ospf_spf.c:1074)
==19621== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==19621== by 0x1B99804D: thread_call (thread.c:891)
==19621== by 0x804945B: main (ospf_main.c:325)
==19621== Address 0x1BEA0C74 is 4 bytes inside a block of size 12
free'd
==19621== at 0x1B904B04: free (vg_replace_malloc.c:152)
==19621== by 0x1B9993BD: zfree (memory.c:106)
==19621== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==19621== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==19621== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==19621== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==19621== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==19621== by 0x1B99804D: thread_call (thread.c:891)
==19621== by 0x804945B: main (ospf_main.c:325)
==19621==
==19621== Invalid read of size 4
==19621== at 0x1B939B50: ospf_route_copy_nexthops_from_vertex
(ospf_route.c:813)
==19621== by 0x1B9393BB: ospf_intra_add_router (ospf_route.c:411)
==19621== by 0x1B938A58: ospf_spf_calculate (ospf_spf.c:1074)
==19621== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==19621== by 0x1B99804D: thread_call (thread.c:891)
==19621== by 0x804945B: main (ospf_main.c:325)
==19621== Address 0x1BEA0C70 is 0 bytes inside a block of size 12
free'd
==19621== at 0x1B904B04: free (vg_replace_malloc.c:152)
==19621== by 0x1B9993BD: zfree (memory.c:106)
==19621== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==19621== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==19621== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==19621== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==19621== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==19621== by 0x1B99804D: thread_call (thread.c:891)
==19621== by 0x804945B: main (ospf_main.c:325)
==19621==
==19621== Invalid read of size 4
==19621== at 0x1B939B2A: ospf_route_copy_nexthops_from_vertex
(ospf_route.c:807)
==19621== by 0x1B9394BF: ospf_intra_add_transit (ospf_route.c:475)
==19621== by 0x1B938A64: ospf_spf_calculate (ospf_spf.c:1076)
==19621== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==19621== by 0x1B99804D: thread_call (thread.c:891)
==19621== by 0x804945B: main (ospf_main.c:325)
==19621== Address 0x1BEA0C70 is 0 bytes inside a block of size 12
free'd
==19621== at 0x1B904B04: free (vg_replace_malloc.c:152)
==19621== by 0x1B9993BD: zfree (memory.c:106)
==19621== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==19621== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==19621== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==19621== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==19621== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==19621== by 0x1B99804D: thread_call (thread.c:891)
==19621== by 0x804945B: main (ospf_main.c:325)
==19621==
==19621== Invalid read of size 4
==19621== at 0x1B939B34: ospf_route_copy_nexthops_from_vertex
(ospf_route.c:809)
==19621== by 0x1B9394BF: ospf_intra_add_transit (ospf_route.c:475)
==19621== by 0x1B938A64: ospf_spf_calculate (ospf_spf.c:1076)
==19621== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==19621== by 0x1B99804D: thread_call (thread.c:891)
==19621== by 0x804945B: main (ospf_main.c:325)
==19621== Address 0x1BEA0C74 is 4 bytes inside a block of size 12
free'd
==19621== at 0x1B904B04: free (vg_replace_malloc.c:152)
==19621== by 0x1B9993BD: zfree (memory.c:106)
==19621== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==19621== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==19621== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==19621== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==19621== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==19621== by 0x1B99804D: thread_call (thread.c:891)
==19621== by 0x804945B: main (ospf_main.c:325)
==19621==
==19621== Invalid read of size 4
==19621== at 0x1B939B4B: ospf_route_copy_nexthops_from_vertex
(ospf_route.c:812)
==19621== by 0x1B9394BF: ospf_intra_add_transit (ospf_route.c:475)
==19621== by 0x1B938A64: ospf_spf_calculate (ospf_spf.c:1076)
==19621== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==19621== by 0x1B99804D: thread_call (thread.c:891)
==19621== by 0x804945B: main (ospf_main.c:325)
==19621== Address 0x1BEA0C74 is 4 bytes inside a block of size 12
free'd
==19621== at 0x1B904B04: free (vg_replace_malloc.c:152)
==19621== by 0x1B9993BD: zfree (memory.c:106)
==19621== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==19621== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==19621== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==19621== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==19621== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==19621== by 0x1B99804D: thread_call (thread.c:891)
==19621== by 0x804945B: main (ospf_main.c:325)
==19621==
==19621== Invalid read of size 4
==19621== at 0x1B939B50: ospf_route_copy_nexthops_from_vertex
(ospf_route.c:813)
==19621== by 0x1B9394BF: ospf_intra_add_transit (ospf_route.c:475)
==19621== by 0x1B938A64: ospf_spf_calculate (ospf_spf.c:1076)
==19621== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==19621== by 0x1B99804D: thread_call (thread.c:891)
==19621== by 0x804945B: main (ospf_main.c:325)
==19621== Address 0x1BEA0C70 is 0 bytes inside a block of size 12
free'd
==19621== at 0x1B904B04: free (vg_replace_malloc.c:152)
==19621== by 0x1B9993BD: zfree (memory.c:106)
==19621== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==19621== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==19621== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==19621== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==19621== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==19621== by 0x1B99804D: thread_call (thread.c:891)
==19621== by 0x804945B: main (ospf_main.c:325)

But the crash log is unchanged.


Best regards,
--
Milan Kocián <***@wq.cz>
Hasso Tepper
2005-09-29 10:41:21 UTC
Post by Milan Kocián
==19621== Invalid read of size 4
==19621== at 0x1B9378D7: ospf_vertex_add_parent (ospf_spf.c:204)
==19621== by 0x1B938A46: ospf_spf_calculate (ospf_spf.c:1064)
==19621== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==19621== by 0x1B99804D: thread_call (thread.c:891)
==19621== by 0x804945B: main (ospf_main.c:325)
==19621== Address 0x1BEA0C78 is 8 bytes inside a block of size 12
free'd
==19621== at 0x1B904B04: free (vg_replace_malloc.c:152)
==19621== by 0x1B9993BD: zfree (memory.c:106)
==19621== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==19621== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==19621== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==19621== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==19621== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==19621== by 0x1B99804D: thread_call (thread.c:891)
==19621== by 0x804945B: main (ospf_main.c:325)
[snip]
Post by Milan Kocián
==19621== Invalid read of size 4
==19621== at 0x1B939B50: ospf_route_copy_nexthops_from_vertex
(ospf_route.c:813)
==19621== by 0x1B9394BF: ospf_intra_add_transit (ospf_route.c:475)
==19621== by 0x1B938A64: ospf_spf_calculate (ospf_spf.c:1076)
==19621== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==19621== by 0x1B99804D: thread_call (thread.c:891)
==19621== by 0x804945B: main (ospf_main.c:325)
==19621== Address 0x1BEA0C70 is 0 bytes inside a block of size 12
free'd
==19621== at 0x1B904B04: free (vg_replace_malloc.c:152)
==19621== by 0x1B9993BD: zfree (memory.c:106)
==19621== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==19621== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==19621== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
==19621== by 0x1B938A23: ospf_spf_calculate (ospf_spf.c:1047)
==19621== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==19621== by 0x1B99804D: thread_call (thread.c:891)
==19621== by 0x804945B: main (ospf_main.c:325)
OK. That should be really helpful. But before digging into it ... Vincenzo,
it's your code, so you should at least have an idea of what exactly happens
and why? ;)
--
Hasso Tepper
Elion Enterprises Ltd.
WAN administrator
Paul Jakma
2005-09-29 10:52:36 UTC
Post by Milan Kocián
==19621== Invalid read of size 4
==19621== at 0x1B9378D7: ospf_vertex_add_parent (ospf_spf.c:204)
==19621== by 0x1B938A46: ospf_spf_calculate (ospf_spf.c:1064)
==19621== by 0x1B938B36: ospf_spf_calculate_timer (ospf_spf.c:1126)
==19621== by 0x1B99804D: thread_call (thread.c:891)
==19621== by 0x804945B: main (ospf_main.c:325)
==19621== Address 0x1BEA0C78 is 8 bytes inside a block of size 12
free'd
==19621== at 0x1B904B04: free (vg_replace_malloc.c:152)
==19621== by 0x1B9993BD: zfree (memory.c:106)
==19621== by 0x1B9375FE: vertex_nexthop_free (ospf_spf.c:87)
==19621== by 0x1B9376DE: ospf_vertex_free (ospf_spf.c:136)
==19621== by 0x1B9383CD: ospf_spf_next (ospf_spf.c:796)
Hmm, it must be ospf_nexthop_merge() which calls ospf_vertex_free.

I think the reason I can't reproduce your problem is because my
network just happens to result in an SPF where I never hit any of
these vertex_free's (further, there's a leak somewhere, because neither
struct vertex's nor struct vertex_nexthop's are ever freed for me).
That explains why my poison/verify free/alloc hack doesn't pick up
anything.
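
For anyone unfamiliar with the idea, a poison/verify allocator hack could look roughly like the sketch below. This is a guess at the general technique with hypothetical names, not the actual hack referred to here. Freed blocks are filled with a pattern and parked in a small quarantine instead of going back to the heap, and a later check reports any block that was written to after the free.

```c
#include <stdlib.h>
#include <string.h>

#define POISON_BYTE      0xDE
#define QUARANTINE_SLOTS 64

/* Hidden header in front of each payload; records the payload size.
 * The payload is only aligned to sizeof(size_t), enough for a sketch. */
struct dbg_hdr {
    size_t size;
};

static struct dbg_hdr *quarantine[QUARANTINE_SLOTS];
static size_t quarantine_used;

/* Allocate size bytes with the size stored just before the payload. */
static void *dbg_alloc(size_t size)
{
    struct dbg_hdr *h = malloc(sizeof(*h) + size);

    if (h == NULL)
        return NULL;
    h->size = size;
    return h + 1;
}

/* Poison the payload and park the block instead of really freeing it. */
static void dbg_free(void *p)
{
    struct dbg_hdr *h;

    if (p == NULL)
        return;
    h = (struct dbg_hdr *)p - 1;
    memset(p, POISON_BYTE, h->size);
    if (quarantine_used < QUARANTINE_SLOTS)
        quarantine[quarantine_used++] = h;
    else
        free(h);   /* quarantine full: stop tracking this block */
}

/* Count quarantined blocks whose poison was overwritten after the free. */
static size_t dbg_verify(void)
{
    size_t i, j, bad = 0;

    for (i = 0; i < quarantine_used; i++) {
        const unsigned char *b = (const unsigned char *)(quarantine[i] + 1);

        for (j = 0; j < quarantine[i]->size; j++) {
            if (b[j] != POISON_BYTE) {
                bad++;
                break;
            }
        }
    }
    return bad;
}
```

A scheme like this only reacts when something is actually freed, which fits the observation above: if the SPF run never reaches the free path, there is nothing to poison and nothing to verify. It also catches stale writes far more readily than stale reads, which merely return the poison pattern.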

Hmm.. getting warmer at least. So now we need to find out exactly
where the leak is.

The attached patch /might/ stop the crash, or make it crash 'better'
- but would still leave the slow leak. And likely isn't the /correct/
answer. Hmm.. (not tested - waiting on compile to finish).

regards,
--
Paul Jakma ***@clubi.ie ***@jakma.org Key ID: 64A2FF6A
Fortune:
You need to upgrade your VESA local bus to a MasterCard local bus.
Paul Jakma
2005-09-29 11:55:23 UTC
Post by Paul Jakma
The attached patch /might/ stop the crash, or make it crash
'better' - but would still leave the slow leak. And likely isn't
the /correct/ answer. Hmm.. (not tested - waiting on compile to
finish).
Don't try it - working on a better one (i think i have the leak fixed
at least, which should make the memory corruption a lot easier to
trigger).

regards,
--
Paul Jakma ***@clubi.ie ***@jakma.org Key ID: 64A2FF6A
Fortune:
Acting is an art which consists of keeping the audience from coughing.