Discussion:
[quagga-users 5744] ospf hello packets not received
Mike Smith
2005-10-27 19:29:33 UTC
I'm running quagga ospf 0.98.5 on two linux boxes with a point-to-point link
between them. Most of the time everything works fine, but occasionally after
restarting the ospfd I find that one of the boxes does not see hello packets
from the other. This results in the other box reporting a neighbor in
init/DROther state, and of course no routes get exchanged.
Running tcpdump I see hello packets reported on both ends of the link.
Turning ospf debug on I see both ends sending hello packets but only one end
reports receiving hello packets.
Can anyone suggest what I should look at to figure out why ospfd does not
report receiving hello packets?

Thanks,
Mike
Andrew J. Schorr
2005-10-27 19:45:44 UTC
A first step would be to look at OSPF group memberships.
Compare the output of 'netstat -g' and ospfd "show ip ospf interface"

Regards,
Andy
Andrew J. Schorr
2005-10-27 20:00:10 UTC
Post by Andrew J. Schorr
A first step would be to look at OSPF group memberships.
Compare the output of 'netstat -g' and ospfd "show ip ospf interface"
Oops, I meant to say "multicast group memberships".

Regards,
Andy
MIke Smith
2005-10-27 20:02:53 UTC
I'm not sure what that tells me. The output looks the same when the link
is working and when it isn't. The following is from the case where
hellos are not being received. The point-to-point link is hdlc0

sh-2.05b# netstat -g
IPv6/IPv4 Group Memberships
Interface       RefCnt Group
--------------- ------ ---------------------
lo              1      224.0.0.1
eth0            1      224.0.0.6
eth0            1      224.0.0.5
eth0            1      224.0.0.1
hdlc0           1      224.0.0.5
hdlc0           1      224.0.0.1
hdlc1           1      224.0.0.5
hdlc1           1      224.0.0.1
sh-2.05b#


localhost> show ip ospf interface hdlc0
hdlc0 is up
Internet Address 192.168.171.209/32, Peer 192.168.171.192, Area 0.0.0.0
Router ID 192.168.171.209, Network Type POINTOPOINT, Cost: 174
Transmit Delay is 1 sec, State Point-To-Point, Priority 1
No designated router on this network
No backup designated router on this network
Timer intervals configured, Hello 2, Dead 8, Wait 8, Retransmit 3
Hello due in 00:00:00
Neighbor Count is 0, Adjacent neighbor count is 0
localhost>
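
For anyone repeating this check across several restarts, the comparison can be scripted. The following is only a rough sketch: it assumes `netstat -g` prints three whitespace-separated columns (interface, reference count, group) as in the output above, and the function names are made up for illustration. Note that, as observed in this thread, the kernel's table can look identical whether or not ospfd is actually receiving hellos, so a positive result here does not prove the daemon is subscribed.

```python
OSPF_ALL_ROUTERS = "224.0.0.5"  # group that OSPF hello packets are sent to


def parse_group_memberships(netstat_output):
    """Return {interface: set of groups} parsed from `netstat -g` style text."""
    memberships = {}
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows have exactly three fields with a numeric reference count;
        # titles, column headers, separators and shell prompts are skipped.
        if len(fields) != 3 or not fields[1].isdigit():
            continue
        interface, _refcnt, group = fields
        memberships.setdefault(interface, set()).add(group)
    return memberships


def has_joined(memberships, interface, group=OSPF_ALL_ROUTERS):
    """True if the interface is listed as a member of the given group."""
    return group in memberships.get(interface, set())


# Sample text copied from the output posted in this thread.
SAMPLE = """\
IPv6/IPv4 Group Memberships
Interface       RefCnt Group
--------------- ------ ---------------------
lo              1      224.0.0.1
eth0            1      224.0.0.6
eth0            1      224.0.0.5
eth0            1      224.0.0.1
hdlc0           1      224.0.0.5
hdlc0           1      224.0.0.1
hdlc1           1      224.0.0.5
hdlc1           1      224.0.0.1
"""

if __name__ == "__main__":
    table = parse_group_memberships(SAMPLE)
    for name in sorted(table):
        print(name, "joined" if has_joined(table, name) else "NOT joined")
```

On a live box the text would come from running `netstat -g` (for example via `subprocess.run`) instead of the pasted sample.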
Andrew J. Schorr
2005-10-27 20:11:09 UTC
Post by MIke Smith
I'm not sure what that tells me. The output looks the same when the link
is working and when it isn't. The following is from the case where
hellos are not being received. The point-to-point link is hdlc0
sh-2.05b# netstat -g
IPv6/IPv4 Group Memberships
Interface       RefCnt Group
--------------- ------ ---------------------
lo              1      224.0.0.1
eth0            1      224.0.0.6
eth0            1      224.0.0.5
eth0            1      224.0.0.1
hdlc0           1      224.0.0.5
hdlc0           1      224.0.0.1
hdlc1           1      224.0.0.5
hdlc1           1      224.0.0.1
sh-2.05b#
That seems to look OK (224.0.0.5 is the OSPF-all-routers multicast
group that the hello packets are sent to).
Post by MIke Smith
localhost> show ip ospf interface hdlc0
hdlc0 is up
Internet Address 192.168.171.209/32, Peer 192.168.171.192, Area 0.0.0.0
Router ID 192.168.171.209, Network Type POINTOPOINT, Cost: 174
Transmit Delay is 1 sec, State Point-To-Point, Priority 1
No designated router on this network
No backup designated router on this network
Timer intervals configured, Hello 2, Dead 8, Wait 8, Retransmit 3
Hello due in 00:00:00
Neighbor Count is 0, Adjacent neighbor count is 0
Ah, I forgot that the patch to show the interface multicast group memberships
is not in 0.98, you'd need 0.99.1 or a CVS snapshot to include that.
But no matter, it still probably ought to be working.

Are you seeing any interesting error messages in the log file (presuming
you have logging enabled)? If not, you may need to turn on debugging
to see if the hello packets are being received and discarded for
some reason. Or perhaps use strace to see whether ospfd is receiving
the packets...

Regards,
Andy
MIke Smith
2005-10-27 20:51:27 UTC
Post by Andrew J. Schorr
Are you seeing any interesting error messages in the log file (presuming
you have logging enabled)? If not, you may need to turn on debugging
to see if the hello packets are being received and discarded for
some reason. Or perhaps use strace to see whether ospfd is receiving
the packets...
Regards,
Andy
I didn't see any interesting error messages in the log, but I'm not sure
which of the debug options would be useful. I had "debug ospf packet
hello recv detail" set but no entries were logged.

I ran strace on the ospf process and I didn't see any recvfrom entries,
so I'm guessing that means none of the hellos were being passed to the
ospf process (tcpdump still shows them being received though).
Unfortunately, stopping the strace killed the ospf process as
well, and when I restarted ospf everything was working correctly again.

So maybe this means that the local ospf process didn't successfully join
the multicast groups. Short of trying version 0.99.1, is there any way I
can confirm this, and is there a debug log I can turn on that might tell
me why the join failed if that is the case?

Thanks,
Mike
Andrew J. Schorr
2005-10-27 21:00:17 UTC
Post by MIke Smith
I didn't see any interesting error messages in the log, but I'm not sure
which of the debug options would be useful. I had "debug ospf packet
hello recv detail" set but no entries were logged.
I ran strace on the ospf process and I didn't see any recvfrom entries,
so I'm guessing that means none of the hellos were being passed to the
ospf process (tcpdump still shows them being received though).
That sounds correct. For some reason, ospfd must not be subscribed to those
multicast messages.
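
To make "subscribed" concrete: at the socket level, a process asks the kernel for a multicast group with the `IP_ADD_MEMBERSHIP` socket option, while tcpdump captures packets at the link level whether or not anyone has joined. The sketch below is not ospfd's code; it is a minimal illustration using the standard sockets API, with hypothetical function names, and it assumes the local address of the point-to-point interface is passed in. Opening the raw socket needs root.

```python
import socket

OSPF_PROTOCOL = 89             # IP protocol number assigned to OSPF
ALL_SPF_ROUTERS = "224.0.0.5"  # group that hello packets are sent to


def build_mreq(group, local_address):
    """Pack a struct ip_mreq: group address, then local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(local_address)


def open_ospf_listener(local_address):
    """Open a raw OSPF socket and join AllSPFRouters on one interface.

    Assumption: the interface is identified by its IPv4 address.
    Requires root privileges because it uses a raw socket.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_RAW, OSPF_PROTOCOL)
    sock.setsockopt(
        socket.IPPROTO_IP,
        socket.IP_ADD_MEMBERSHIP,
        build_mreq(ALL_SPF_ROUTERS, local_address),
    )
    return sock
```

If the `setsockopt` call fails, or is never made for an interface, the process would not be expected to see hellos on that interface even though they show up in tcpdump.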
Post by MIke Smith
Unfortunately, stopping the strace killed the ospf process as
well, and when I restarted ospf everything was working correctly again.
So maybe this means that the local ospf process didn't successfully join
the multicast groups. Short of trying version 0.99.1, is there any way I
can confirm this, and is there a debug log I can turn on that might tell
me why the join failed if that is the case?
Well, the code will always log an info message whenever it joins a
multicast group. So if you have logging configured at level info or debug,
for example like this (in ospfd.conf):

log file /var/log/quagga/ospfd.log informational

then you should see entries like this in ospfd.log whenever a multicast
group is joined:

interface <interface IP address> join AllSPFRouters Multicast group.

If you do not see those messages, then it means the daemon is not joining
the group.
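
Checking the log for those entries can be automated. This is a rough sketch that relies only on the message text quoted above; the timestamp prefix in the sample lines and the function name are assumptions made for illustration, and the interface is identified by whatever token follows the word "interface".

```python
import re

# Matches the join message quoted above; any prefix on the line is ignored.
JOIN_PATTERN = re.compile(r"interface (\S+) join AllSPFRouters Multicast group")


def joined_interfaces(log_lines):
    """Return the set of interface addresses that logged an AllSPFRouters join."""
    found = set()
    for line in log_lines:
        match = JOIN_PATTERN.search(line)
        if match:
            found.add(match.group(1))
    return found


if __name__ == "__main__":
    # Made-up sample lines; real entries carry whatever prefix ospfd writes.
    sample = [
        "2005/10/27 20:00:00 OSPF: interface 192.168.171.209 join AllSPFRouters Multicast group.",
        "2005/10/27 20:00:01 OSPF: some unrelated message",
    ]
    print(joined_interfaces(sample))
```

For a real check, pass the lines of the configured log file (here /var/log/quagga/ospfd.log) and confirm the address of the point-to-point interface is in the result after each restart.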

Note that this is an area (logic to control multicast group membership) where
there are important patches in 0.99.1, so your best bet might be to upgrade to
0.99.1 (or a CVS snapshot) and see if you still have problems. I have a sense
that CVS may be better than 0.99.1, but I'm not sure. Paul has recently fixed
several OSPF algorithm bugs that were probably in 0.99.1.

Regards,
Andy