Thursday, April 2, 2026

FRR Troubleshooting: Fixing Kernel Route Redistribution Issues on Boot

If you are running FRRouting (FRR), you might have encountered a frustrating quirk: after a system reboot, kernel routes (like a static default route) fail to redistribute into RIP or OSPF. Curiously, as soon as you manually restart the FRR service, everything works perfectly.

Here is a breakdown of why this happens and how to fix it for good.

The Problem

During a cold boot, FRR starts its daemons (Zebra, RIPd, etc.), but routes defined at the OS level aren’t being advertised to neighbors. This breaks connectivity after an automated reboot and forces manual intervention (systemctl restart frr), which defeats the purpose of an automated routing stack.

The Root Cause: The Startup "Race Condition"

The culprit is typically a race condition between your network configuration layer (Netplan with systemd-networkd or NetworkManager, for example) and the FRR service.


    Interface Renaming: Interfaces are often renamed during boot (e.g., from eth0 to enp1s0), either by udev's predictable naming or by a Netplan set-name rule. If FRR initializes while an interface is in transition, Zebra may ignore routes associated with a "missing" or "changing" interface.

    The "Up" State: If Zebra scans the kernel routing table before the physical or virtual interface is fully marked as "UP" and operational, the routing protocols (like RIP) will deem the route invalid for redistribution.

    Dependency Timing: By default, the FRR service might attempt to load before the network is "fully online," leading to a failed initial synchronization between the kernel and Zebra.
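One way to check whether renaming is in play on a given host is to look for rename messages in the kernel log of the current boot. The sketch below assumes the usual kernel wording ("<new>: renamed from <old>"); list_renames is a hypothetical helper name, not part of FRR or systemd.

```shell
# Hypothetical helper: read kernel log text on stdin and print "old -> new"
# for every interface rename it finds.
list_renames() {
  grep -E 'renamed from' |
    sed -E 's/.* ([^ :]+): renamed from ([^ ]+).*/\2 -> \1/'
}

# On the router itself (kernel messages from the current boot):
#   journalctl -k -b --no-pager | list_renames
```

If this prints a rename for the interface that carries your default route, comparing its timestamp in the journal with the FRR start time shows whether the two overlap.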


Step-by-Step Solution

The fix is a small systemd ordering change that makes FRR wait until the network stack is ready and interface names are finalized.

1. Edit the Service Configuration

Instead of editing the unit file in /lib/systemd/system/ directly, use a systemd override (a drop-in file), which survives package updates. systemctl edit stores it in /etc/systemd/system/frr.service.d/override.conf:

bash


sudo systemctl edit frr.service


2. Adjust Network Dependencies

In the editor that opens, insert the following lines:

ini


[Unit]
After=network-online.target
Wants=network-online.target


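After= orders FRR behind network-online.target, and Wants= makes sure that target is actually pulled into the boot. Together they keep FRR from starting until the system reports the network as up. Note that network-online.target is only as reliable as the wait service behind it (systemd-networkd-wait-online.service or NetworkManager-wait-online.service, depending on your setup), so make sure the one matching your network manager is enabled.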
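For hosts provisioned by script, the same drop-in can be written without the interactive editor. This is a minimal sketch, assuming the standard drop-in location that systemctl edit uses; write_frr_override is a hypothetical helper name, and the optional directory argument exists only so the function can be tried outside /etc.

```shell
# Hypothetical helper: write the FRR drop-in under the given systemd directory
# (defaults to /etc/systemd/system, which requires root).
write_frr_override() {
  dir="${1:-/etc/systemd/system}/frr.service.d"
  mkdir -p "$dir" &&
    printf '%s\n' \
      '[Unit]' \
      'After=network-online.target' \
      'Wants=network-online.target' > "$dir/override.conf"
}

# On the router itself, as root:
#   write_frr_override && systemctl daemon-reload
```

Because the file is written by hand here rather than through systemctl edit, the daemon-reload is required for systemd to notice it.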

3. Reload and Apply

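Save the file and exit the editor. systemctl edit reloads the systemd configuration on its own when you save, so the explicit reload below is only strictly needed if you created the drop-in file by hand; running it either way is harmless. The new ordering applies the next time FRR starts: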

bash


sudo systemctl daemon-reload
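Before rebooting, it can be worth confirming that systemd actually merged the override. The sketch below inspects the After= and Wants= properties that systemctl show prints; check_online_deps is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds only if both After= and Wants= (read from
# stdin) mention network-online.target.
check_online_deps() {
  props=$(cat)
  printf '%s\n' "$props" | grep -Eq '^After=.*network-online\.target' &&
    printf '%s\n' "$props" | grep -Eq '^Wants=.*network-online\.target'
}

# On the router itself:
#   systemctl show frr.service -p After -p Wants | check_online_deps && echo "override active"
#   systemctl cat frr.service    # prints the unit file followed by any drop-ins
```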


Verification

After the next reboot, you can verify that Zebra is picking up the routes by checking your FRR logs. For more detail, enable the following debug output in vtysh:

vtysh


debug zebra kernel
debug rip events
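A quicker check that needs no debug logging is to ask Zebra directly whether it holds the kernel default route. This sketch assumes the usual show ip route layout, where kernel routes start with "K" followed by selection flags; has_kernel_default is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds if the route listing on stdin contains a
# kernel-sourced IPv4 default route (a line such as "K>* 0.0.0.0/0 ...").
has_kernel_default() {
  grep -Eq '^K[>* ]*0\.0\.0\.0/0'
}

# On the router itself:
#   sudo vtysh -c 'show ip route kernel' | has_kernel_default && echo "default route known to Zebra"
#   sudo vtysh -c 'show ip rip'    # the route should also appear in the RIP table
```

If the first check passes right after boot but the route is still missing from the RIP table, the problem lies in the redistribution configuration rather than in startup timing.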



Summary

In modern Linux distributions where interface naming and IP addressing are dynamic, startup timing is everything. Forcing FRR to wait for the network to be "online" ensures that when the daemons ask the kernel for routes, the kernel actually has the final, stable answers ready to give.
