Snarky Brill's techblog: Networking


Wednesday, October 4, 2023

FortiGate PBR needs active route - really?

Imagine a classic IPsec tunnel between HQ and a branch, with a slight twist: you want to carve out a portion of the address space assigned to HQ, route it over the tunnel to the branch, and use those IPs in the branch DMZ.

The picture shows the routes: HQ has the address 5.6.7.8 on wan1, and the ISP has two static routes towards it:

5.6.9.0/24 via 5.6.7.8 (use in HQ LAN)
5.6.10.0/24 via 5.6.7.8 (use in branch DMZ)

So we set up a VPN tunnel, add static routes on the HQ side, assign the IPs, and then we need PBR to make sure that, despite the branch's default route pointing over wan1 to the ISP, traffic from 5.6.10.0/24 to anywhere is routed over the tunnel back to HQ. Why? Because the branch ISP applies RPF and would drop packets sourced from HQ's prefix. Everybody is supposed to apply RPF, because it is best practice... or something (*). [https://www.rfc-editor.org/info/bcp38]
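On paper, the branch forwarding decision is simple: policy routes are checked first, and only unmatched traffic falls through to the routing table. A minimal Python model of that intent follows; it is not FortiOS code, the names are hypothetical, and the LAN host 192.168.1.10 and the Internet address 198.51.100.7 are assumed for illustration. Real FortiGate policy routes can also match incoming interface, protocol and ports.

```python
import ipaddress
from dataclasses import dataclass

ANY = ipaddress.ip_network("0.0.0.0/0")


@dataclass(frozen=True)
class PolicyRoute:
    """Hypothetical, simplified policy route: source/destination match only."""
    src: ipaddress.IPv4Network
    out_interface: str
    dst: ipaddress.IPv4Network = ANY


# Branch side: the DMZ prefix borrowed from HQ must go back through the tunnel.
POLICY_ROUTES = [PolicyRoute(ipaddress.ip_network("5.6.10.0/24"), "vpn0")]
DEFAULT_ROUTE_INTERFACE = "wan1"  # ordinary default route towards the branch ISP


def egress_interface(src, dst):
    """Policy routes are evaluated first; the first match decides the interface."""
    src_ip, dst_ip = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    for rule in POLICY_ROUTES:
        if src_ip in rule.src and dst_ip in rule.dst:
            return rule.out_interface
    # No policy route matched: fall back to the routing table, which here is
    # reduced to the single default route.
    return DEFAULT_ROUTE_INTERFACE


print(egress_interface("5.6.10.25", "198.51.100.7"))     # vpn0
print(egress_interface("192.168.1.10", "198.51.100.7"))  # wan1
```

This model covers only the egress decision; it deliberately says nothing about what happens to traffic arriving on vpn0.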

Anyway, the problem is that PBR on its own is apparently not enough: if you just set it up, it does not work. Since the reason was not immediately apparent to me, I googled questions like "FortiGate PBR not working" and eventually found one "solution", but no real answer as to what is going on. To be specific, the last comment here [https://community.fortinet.com/t5/Support-Forum/Policy-Based-Routing-PBR-not-being-applied/td-p/35673?m=166935] gives instructions that seem to work:

Add a new default route to the routing table. The route needs the same Administrative Distance as the active default route (towards the ISP); a next-hop is not needed, but the interface is vpn0 and the priority has to be lower (= a higher number) than that of the real active default route towards the ISP.

OK, it works, but why? Neither the official manual [https://docs.fortinet.com/document/fortigate/6.4.2/administration-guide/144044/policy-routes] nor this [https://community.fortinet.com/t5/FortiGate/Technical-Tip-Configure-policy-routes-for-route-based-interface/ta-p/193376] is really helpful, and I found a few other speculations on Reddit, Stack Overflow and random blogs, but nothing spot-on. So, the answer is...

RPF.

There is reverse path filtering on the tunnel interface, and FortiGates use the "feasible path" RPF rule by default. This means that an active default route through the tunnel is enough to provide a feasible path, even though the route is de-prioritized (which avoids random load-balancing that would very likely lead to random asymmetric routing and therefore random packet loss in this setting). Btw. the AD has to be the same as that of the real static route; otherwise the tunnel route does not become active and will not be considered a feasible path.
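To make the explanation concrete, here is a minimal Python model of the two RPF flavours as described above. It is a sketch, not FortiOS code: the distance/priority values and the Internet source 198.51.100.7 are assumptions. Only routes with the lowest administrative distance for a prefix are active; priority merely picks the forwarding winner among them. A feasible-path check passes if any active route back to the source uses the ingress interface, while a strict check demands the best one.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class StaticRoute:
    prefix: ipaddress.IPv4Network
    interface: str
    distance: int  # administrative distance: only the lowest one becomes active
    priority: int  # among active routes, the lowest number is used for forwarding


def active_routes(table, addr):
    """Active routes for addr's longest matching prefix (lowest distance only)."""
    ip = ipaddress.ip_address(addr)
    matching = [r for r in table if ip in r.prefix]
    if not matching:
        return []
    longest = max(r.prefix.prefixlen for r in matching)
    candidates = [r for r in matching if r.prefix.prefixlen == longest]
    best = min(r.distance for r in candidates)
    return [r for r in candidates if r.distance == best]


def forwarding_interface(table, dst):
    """Interface actually used to forward towards dst (lowest priority number wins)."""
    routes = active_routes(table, dst)
    return min(routes, key=lambda r: r.priority).interface if routes else None


def rpf_feasible(table, src, in_interface):
    """Feasible-path RPF: any active route back to src via the ingress interface."""
    return any(r.interface == in_interface for r in active_routes(table, src))


def rpf_strict(table, src, in_interface):
    """Strict RPF: the best route back to src must use the ingress interface."""
    return forwarding_interface(table, src) == in_interface


DEFAULT = ipaddress.ip_network("0.0.0.0/0")
isp = StaticRoute(DEFAULT, "wan1", distance=10, priority=0)      # assumed values
tunnel = StaticRoute(DEFAULT, "vpn0", distance=10, priority=10)  # the workaround route
tunnel_higher_ad = StaticRoute(DEFAULT, "vpn0", distance=20, priority=10)

src = "198.51.100.7"  # arbitrary Internet host reaching the DMZ via HQ and vpn0
print(rpf_feasible([isp], src, "vpn0"))                    # False: dropped
print(rpf_feasible([isp, tunnel], src, "vpn0"))            # True: passes
print(forwarding_interface([isp, tunnel], src))            # wan1: ISP route still wins
print(rpf_strict([isp, tunnel], src, "vpn0"))              # False under strict RPF
print(rpf_feasible([isp, tunnel_higher_ad], src, "vpn0"))  # False: route not active
```

The last line mirrors the AD remark: with a higher distance the tunnel route never becomes active, so it cannot serve as a feasible path.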

Another possibility is disabling RPF on the tunnel interface with "set src-check disable", like this:

branch # config system interface 

branch (interface) # edit vpn0

branch (vpn0) # show
config system interface
    edit "vpn0"
        set vdom "root"
        set allowaccess ping
        set type tunnel
        set src-check disable
        set snmp-index 19
        config ipv6
            set ip6-allowaccess ping
        end
        set interface "wan"
    next
end

(*) On a more serious note, mocking BCP 38 is not fair. It is undoubtedly the best community-based, and therefore Internet-native, solution to a great portion of known cyber-attacks.

Thursday, May 8, 2014

Smokeping slavery madness

So once again I have been hit by an insane portion of weird, flawed and inherently unreliable design in Smokeping.

What the fuck is Smokeping? Well, it is a utility for measuring and graphing network latency. It should be pretty straightforward to use: you configure the IP addresses and names of the boxes to monitor and that's it? Well, not so fast... It is a bit poxy, because it is Perl after all. You need some modules, you need multiple configuration files, you have to configure probes, bind them to targets, modify probe arguments in the target configs, etc. There are quite a lot of how-tos for this, and in fact it works almost out of the box on most distributions.

But... the Smokeping beast also has a slave mode. That means you have one master that holds the configuration, performs measurements and generates graphs, exactly as in standalone mode. But it also commands slaves to perform the same measurements and report the results back, so that the master saves them to .rrd files and creates graphs from them.

Communication with the slaves is handled by the CGI in the Smokeping webroot. And here comes the problem: privileges.

When Smokeping runs as a daemon, it is executed as the smokeping user, at least on Debian. So all the files it stores (on Debian these usually end up in /var/lib/smokeping) are owned by smokeping:smokeping with permissions 644. That works for the daemon to store data and for the CGIs to read it.

But if you need the CGIs to store data from slaves, it is not enough. Smokeping (the daemon, perhaps?) creates proper .rrd files for the remote measurements from the slaves, but these files are never updated, so no data are displayed in the graphs.

Solution:

chmod -R g+w /var/lib/smokeping
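If you want to see what actually lacked group write before flipping bits, a minimal Python sketch of roughly the same operation as `chmod -R g+w` is below; the function name is made up. It assumes the user the CGI runs as is a member of the smokeping group, because group write only helps group members. Like GNU `chmod -R`, it leaves symlinks found during the traversal alone.

```python
import os
import stat


def add_group_write(root):
    """Recursively add group-write permission, roughly like `chmod -R g+w root`.

    Returns the list of paths whose mode was actually changed.
    """
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        paths.extend(os.path.join(dirpath, name) for name in dirnames + filenames)

    changed = []
    for path in paths:
        mode = os.lstat(path).st_mode
        if stat.S_ISLNK(mode):
            continue  # skip symlinks encountered during traversal
        if not mode & stat.S_IWGRP:
            os.chmod(path, stat.S_IMODE(mode) | stat.S_IWGRP)
            changed.append(path)
    return changed
```

Usage would be `add_group_write("/var/lib/smokeping")`, run as root or as the smokeping user; the returned list shows which .rrd files and directories were still 644/755.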

Saturday, October 12, 2013

Flaw in common network design pattern (or in understanding how to use it)?

There is an old and not-so-clever network design pattern like this:

I know this from CCN(A|P), but I have no idea who invented it. Cisco, perhaps. The idea is to use an IGP (nowadays naturally OSPF in most cases) to resolve the L3 path towards the subnet in question. The subnet itself is terminated on a gateway; on two different gateways, to be accurate. One acts as the default gateway, which means that it is the active router in HSRP, so it holds the "virtual" IP address as well as the virtual MAC address of the default gateway. That is the address that hosts in the subnet use in their routing tables and subsequently in their ARP tables.

So far so good. Everything is up and running: one gateway is active, the other passive, both routers redistribute connected routes into OSPF, and both are able to deliver traffic from the L3 network towards clients in the L2 subnet in question.

But wait a moment... How do the L3 switches know where to send the traffic over L2? Yes, the routing table and ARP resolution. By default, ARP table entries time out after 4 hours, right?

And how do the L2 switches know which port to forward the traffic to? The MAC address table (or switching table, depending on terminology). The default aging time for the MAC address table? 5 minutes.

So what happens on the passive switch when no traffic from the hosts reaches it, because it does not hold the active default gateway address and nobody talks to its "real IP" over L2? Obviously, its MAC address table entries age out after a while, but the ARP entries stay much, much longer (and they are re-resolved only when traffic arrives and the L3 switch has no valid ARP entry).

But... when there is no MAC address table entry for a particular MAC on the standby L3 switch, the switch just floods the traffic for that MAC out of all ports in the VLAN. And it multiplies further in the network, because if vertical flows prevail in your network, it is likely that no other L2 switch in the chain knows the MAC address either, except the one the destination host is connected to. So you might get a huge multiplication of useless traffic if your L3 switch attracts traffic via OSPF for any subnet for which it is the standby gateway.

In a real-life scenario on a 1 Gig network, this might amount to several megs, or even tens or hundreds of megs, of constant flooded traffic.

And what can you do to avoid this? Think about the design pattern... Set a higher OSPF cost for the redistributed routes on the standby router, so that traffic towards the subnet is attracted to the active gateway instead. Simple but effective, huh?

Sunday, December 30, 2012

OpenWRT insanity, shame and ignorance

I have been hit by an idiotic, unreasonable, insane and stupid setting in OpenWRT:

/etc/sysctl.conf:

net.ipv4.netfilter.ip_conntrack_tcp_timeout_established=3600

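Well, what does it mean? Simple: every long-lasting connection that stays idle for more than one hour breaks. It does not matter whether I have TCP keepalive on, because the TCP keepalive idle time defaults to 7200 seconds, so the first keepalive probe is sent only after the conntrack entry has already expired.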

In practice you will see this:

brill@tapir ~ $ ssh milhouse.backbone.ignum.cz
Linux milhouse 2.6.32-5-amd64 #1 SMP Wed May 18 23:13:22 UTC 2011 x86_64

Keep keep your hands off this server!
Nikdo tu na nic nesahejte!

Last login: Sun Dec 30 00:46:47 2012 from 89.177.24.237

<no activity for more than 1 hour>
brill@milhouse:~$ Write failed: Broken pipe

brill@tapir ~ $
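If you cannot touch the router, one client-side workaround is to make keepalives fire before the 3600-second conntrack timeout. A minimal Python sketch, assuming Linux: TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are exposed by Python's socket module only where the OS supports them, and where one is missing the system default applies. The function names and the 600/60/5 values are illustrative choices.

```python
import socket

CONNTRACK_ESTABLISHED_TIMEOUT_S = 3600  # OpenWRT value from /etc/sysctl.conf above
LINUX_DEFAULT_KEEPALIVE_IDLE_S = 7200   # default idle time before the first probe


def keepalive_survives(idle_s, conntrack_timeout_s=CONNTRACK_ESTABLISHED_TIMEOUT_S):
    """True if the first keepalive probe leaves before the idle conntrack entry expires."""
    return idle_s < conntrack_timeout_s


def enable_keepalive(sock, idle_s=600, interval_s=60, count=5):
    """Enable TCP keepalive on one socket with an idle time below the conntrack timeout."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", count)):
        if hasattr(socket, name):  # absent on some platforms: system default applies
            sock.setsockopt(socket.IPPROTO_TCP, getattr(socket, name), value)


print(keepalive_survives(LINUX_DEFAULT_KEEPALIVE_IDLE_S))  # False: the default loses
print(keepalive_survives(600))                             # True
```

For SSH specifically, OpenSSH's ServerAliveInterval option achieves the same effect at the application layer without touching socket options.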

There was a flame war on this topic in the OpenWRT ticket tracker:

https://dev.openwrt.org/ticket/5777

And the result is: invalid, wontfix, fuckyourself.

Motherfuckers. They break everything. God... You can sacrifice the filesystem, give up the serial port, GPIO, LEDs, whatever, but keep the firewall working normally when you are building a network appliance, you idiots!