Your VPS Server!

Monday, July 8, 2019

BGP: To filter or not to filter by prefix size. That is the question


Introduction


In order to write these post the R+D team and the WARP team joint together after analyzing some security incidents related with BGP, network accessibility, network hijacks and network visibilities.


As you probably know, in the BGP world, there are dozen of ways to filter prefixes. The goal of this post is to show some recommendations in order to have a more stable network, keep the visibility of your prefixes as much as possible and of course reduce the calls to the NOC


Scenario

Many ISPs around the world can not (or do not wish) to receive the full routing table (DFZ) which by the time of writing this text is about 750.000 prefixes (IPv4)


  • The above description could be due to some -not limited- of the following causes:
  • The routers does not have enough RAM to learn all the prefixes (please also note that there could be also several BGP session at the same time)
  • The network admin wants to save CPU cycles in their devices
  • The upstream providers is not announcing the full routing table
  • The network admin wishes to keep his network simple and easy

Anyhow, in the end of the story, the router is not learning the full routing table.



Problem

Not learning the full routing table can bring many partial inconveniences that in the end brings connectivity problems, users complaints, issues accessing some web sites and more.



Why?

Please try to imagine the following case



  1. I have a router (property of EXAMPLE) in Internet that is ONLY learning a partial DFZ
  2. The routers mentioned in “1” is only learning “big network”, over /20. I mean, the router learns /20, /19, /18, etc. (of course, we are talking about IPv4)
  3. Based in the configuration indicated in number “2”, the router is not going to learn prefixes such as /21, /22, /23 nor /24
  4. So far so good. However, somewhere in Internet, some people hijacked an /21 to the company ACME (hijack, bad configuration, whatever)
  5. ACME decides to perform more specific prefix advertisements, so, he takes his /21 and announces 8 /24 prefixes to the DFZ.
  6. Because of the filters configured by EXAMPLE, he will never learn the legitimate /24 prefixes advertised by ACME
  7. EXAMPLE will keep learning the hijacked /21 which obviously will bring connectivity problems aim to the legitimate owner of the network




Topology

The following diagram represents the hypothesis presented in the previous point in a graphic manner to facilitate its understanding.








Recommendation

The following recommendations only for networks that CAN NOT LEARN the full DFZ. One more time, they were found after studying many cases of connectivity problem and network hijacks.:


  • Do not filter more specific network,. We mean, it’s better to learn more specific networks such as: /24, /23, /22 (IPv4 world)
  • Filter using AS_PATH (like, 2,3 or 4 deep ASs)
  • Please create your ROS and use RPKI



Some examples (Cisco like)1. Learning only /22, /23 or /24:


router bgp 65002
  neighbor 10.0.0.1 remote-as 65001
  neighbor 10.0.0.1 route-map FILTRO-IN in
!
ip prefix-list SMALLNETWORKS seq 5 permit 0.0.0.0/0 ge 22 le 24
!
route-map FILTRO-IN permit 10
  match ip address prefix-list SMALLNETWORKS
!



2. Only learning two AS beyond us:

router bgp 65001
  neighbor 10.0.0.2 remote-as 65002
  neighbor 10.0.0.2 route-map ASFILTER-IN in


!
ip as-path access-list 5 permit ^[0-9]+_$
ip as-path access-list 5 permit ^[0-9]+ [0-9]+_$
!
route-map ASFILTER-IN permit 10
  match as-path 5
!



More information

Prefix hijack demonstration 1 / 2 (in Spanish):
https://www.youtube.com/watch?v=X5RNSs8y8Ao&t=39s


Prefix hijack demonstration 2/2 (in Spanish):
https://www.youtube.com/watch?v=m51WtuEZOKI


BGP Prefix-Based Outbound Route Filtering
http://www.cisco.com/c/en/us/td/docs/ios/12_2s/feature/guide/fsbgporf.html


Ejemplo de configuración de BGP con dos prestadores de servicio diferentes (conexiones múltiples)
http://www.cisco.com/cisco/web/support/LA/7/75/75930_27.html


Certificación de Recursos (RPKI)
http://www.lacnic.net/web/lacnic/certificacion-de-recursos-rpki


Información General sobre Certificación de Recursos (RPKI)
http://www.lacnic.net/web/lacnic/informacion-general-rpki




Authors:

Dario Gomez (https://twitter.com/daro_ua)

Alejandro Acosta (https://twitter.com/ITandNetworking)

Thursday, August 24, 2017

Google DNS --- Figuring out which DNS Cluster you are using

(this is -almost- a copy / paste of an email sent by Erik Sundberg to nanog mailing list on August 23).

This post is being posted with his explicit permission.
I sent this out on the outage list, with a lots of good feedback sent to me. So I figured it would be useful to share the information on nanog as well. A couple months ago had to troubleshoot a google DNS issue with Google’s NOC. Below is some helpful information on how to determine which DNS Cluster you are going to. Let’s remember that Google runs DNS Anycast for DNS queries to 8.8.8.8 and 8.8.4.4. Anycast routes your DNS queries to the closes DNS cluster based on the best route / lowest metric to 8.8.8.8/8.8.4.4. Google has deployed multiple DNS clusters across the world and each DNS Cluster has multiple servers. So a DNS query in Chicago will go to a different DNS clusters than queries from a device in Atlanta or New York.

How to get a list of google DNS Cluster’s.
dig -t TXT +short locations.publicdns.goog. @8.8.8.8

How to print this list in a table format. 
-- Script from: https://developers.google.com/speed/public-dns/faq -- 

#!/bin/bash
IFS="\"$IFS"
for LOC in $(dig -t TXT +short locations.publicdns.goog. @8.8.8.8)
do
  case $LOC in
    '') : ;;
    *.*|*:*) printf '%s ' ${LOC} ;;
    *) printf '%s\n' ${LOC} ;;
  esac
done
---------------

Which will give you a list like below. This is all of the IP network’s that 
google uses for their DNS Clusters and their associated locations. 
74.125.18.0/26 iad
74.125.18.64/26 iad
74.125.18.128/26 syd
74.125.18.192/26 lhr
74.125.19.0/24 mrn
74.125.41.0/24 tpe
74.125.42.0/24 atl
74.125.44.0/24 mrn
74.125.45.0/24 tul
74.125.46.0/24 lpp
74.125.47.0/24 bru
74.125.72.0/24 cbf
74.125.73.0/24 bru
74.125.74.0/24 lpp
74.125.75.0/24 chs
74.125.76.0/24 cbf
74.125.77.0/24 chs
74.125.79.0/24 lpp
74.125.80.0/24 dls
74.125.81.0/24 dub
74.125.92.0/24 mrn
74.125.93.0/24 cbf
74.125.112.0/24 lpp
74.125.113.0/24 cbf
74.125.115.0/24 tul
74.125.176.0/24 mrn
74.125.177.0/24 atl
74.125.179.0/24 cbf
74.125.181.0/24 bru
74.125.182.0/24 cbf
74.125.183.0/24 cbf
74.125.184.0/24 chs
74.125.186.0/24 dls
74.125.187.0/24 dls
74.125.190.0/24 sin
74.125.191.0/24 tul
172.217.32.0/26 lhr
172.217.32.64/26 lhr
172.217.32.128/26 sin
172.217.33.0/26 syd
172.217.33.64/26 syd
172.217.33.128/26 fra
172.217.33.192/26 fra
172.217.34.0/26 fra
172.217.34.64/26 bom
172.217.34.192/26 bom
172.217.35.0/24 gru
172.217.36.0/24 atl
172.217.37.0/24 gru
173.194.90.0/24 cbf
173.194.91.0/24 scl
173.194.93.0/24 tpe
173.194.94.0/24 cbf
173.194.95.0/24 tul
173.194.97.0/24 chs
173.194.98.0/24 lpp
173.194.99.0/24 tul
173.194.100.0/24 mrn
173.194.101.0/24 tul
173.194.102.0/24 atl
173.194.103.0/24 cbf
173.194.168.0/26 nrt
173.194.168.64/26 nrt
173.194.168.128/26 nrt
173.194.168.192/26 iad
173.194.169.0/24 grq
173.194.170.0/24 grq
173.194.171.0/24 tpe
2404:6800:4000::/48 bom
2404:6800:4003::/48 sin
2404:6800:4006::/48 syd
2404:6800:4008::/48 tpe
2404:6800:400b::/48 nrt
2607:f8b0:4001::/48 cbf
2607:f8b0:4002::/48 atl
2607:f8b0:4003::/48 tul
2607:f8b0:4004::/48 iad
2607:f8b0:400c::/48 chs
2607:f8b0:400d::/48 mrn
2607:f8b0:400e::/48 dls
2800:3f0:4001::/48 gru
2800:3f0:4003::/48 scl
2a00:1450:4001::/48 fra
2a00:1450:4009::/48 lhr
2a00:1450:400b::/48 dub
2a00:1450:400c::/48 bru
2a00:1450:4010::/48 lpp
2a00:1450:4013::/48 grq

There are
IPv4 Networks: 68
IPv6 Networks: 20
DNS Cluster’s Identified by POP Code’s: 20
DNS Clusters identified by POP Code to City, State, or Country. Not all of these are 
Google’s Core Datacenters, some of them are Edge Points of Presences (POPs). 
https://peering.google.com/#/infrastructure and 
https://www.google.com/about/datacenters/inside/locations/ 


Most of these are airport codes, it did my best to get the location correct.
iad          Washington, DC
syd         Sydney, Australia
lhr          London, UK
mrn        Lenoir, NC
tpe         Taiwan
atl          Altanta, GA
tul          Tulsa, OK
lpp          Findland
bru         Brussels, Belgium
cbf         Council Bluffs, IA
chs         Charleston, SC
dls          The Dalles, Oregon
dub        Dublin, Ireland
sin          Singapore
fra          Frankfort, Germany
bom       Mumbai, India
gru         Sao Paulo, Brazil
scl          Santiago, Chile
nrt          Tokyo, Japan
grq         Groningen, Netherlans

Which Google DNS Server Cluster am I using. I am testing this from Chicago, IL
# dig o-o.myaddr.l.google.com -t txt +short @8.8.8.8 "173.194.94.135" <<<<<<DNS Server IP, reference the list above to get the cluster, Council Bluffs, IA 
"edns0-client-subnet 207.xxx.xxx.0/24" <<<< Your Source IP Block Side note, the google 
dns servers will not respond to DNS queries to the Cluster’s Member’s IP, they will 
only respond to dns queries to 8.8.8.8 and 8.8.4.4. So the following will not work.

dig google.com @173.194.94.135 Now to see the DNS Cluster load balancing in action. 
I am doing a dig query from our Telx\Digital Realty POP in Atlanta, GA. We do peer 
with google at this location. I dig a dig query about 10 times and received the 
following unique dns cluster member ip’s as responses. 

dig o-o.myaddr.l.google.com -t txt +short @8.8.8.8
"74.125.42.138"
"173.194.102.132"
"74.125.177.5"
"74.125.177.74"
"74.125.177.71"
"74.125.177.4"

Which all are Google DNS Networks in Atlanta.
74.125.42.0/24

atl

74.125.177.0/24

atl

172.217.36.0/24

atl

173.194.102.0/24

atl

2607:f8b0:4002::/48

atl



Just thought it would be helpful when troubleshooting google DNS issues.



(this is -almost- a copy / paste of an email sent by Erik Sundberg to nanog mailing list on August 23 2017).
 This post is being posted with his explicit permission.

Monday, August 21, 2017

My humble results. Testing ping to loopback using v4 and v6

Hello there,
  Regarding RTT v4 vs v6 I did something "interesting" recently, would like to know your thoughs.
  If you ping6 your loopback (let´s say 1000 packets) interface with Windows or Linux, v6 is faster.
  Now try the same on MAC (El capitan for example).., v6 is 20-25% slower.
  I did the above with many devices (and asked some friends) and the behavior was pretty much the same.


MAC:
--- 127.0.0.1 ping statistics ---
100 packets transmitted, 100 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.037/0.098/1.062/0.112 ms

--- ::1 ping6 statistics ---
100 packets transmitted, 100 packets received, 0.0% packet loss
round-trip min/avg/max/std-dev = 0.058/0.120/0.194/0.027 ms




Linux:
--- 127.0.0.1 ping statistics ---
100 packets transmitted, 100 received, 0% packet loss, time 98999ms
rtt min/avg/max/mdev = 0.015/0.021/0.049/0.007 ms

--- ::1 ping statistics ---
100 packets transmitted, 100 received, 0% packet loss, time 99013ms
rtt min/avg/max/mdev = 0.019/0.031/0.040/0.004 ms



Windows 10:

Ping statistics for ::1:
    Packets: Sent = 100, Received = 100, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 0ms, Average = 0ms



Ping statistics for 127.0.0.1:
    Packets: Sent = 100, Received = 100, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 4ms, Average = 0ms



Bye,

Monday, February 29, 2016

Read a BGP live stream from CAIDA

Objective
  Read a BGP live stream from CAIDA and insert them into a BGP session

What do we need
  bgpreader from the bgpstream core package provided by Caida
  bgp_simple.pl obtained in github

Overview
  We will read the BGP live stream feed using bgpreader, then the standard output of it will be redirected to a pipe file (mkfifo) where a perl script called bgpsimple will be reading this file. This very same script will established the BGP session against a BGP speaker and announce the prefixes received in the stream.

LAB Topology
  The configuration was already tested in Cisco & Quagga
  The BGP Speaker (Cisco/Quagga) has the IPv4 address 192.168.1.1
  The BGP Simple Linux box has the IP 192.168.1.2

How does it works?
  bgpreader has the ability to write his output in the -m format used by libbgpdump (by RIPENCC), this is the very same format bgpsimple uses as stdin. That's why myroutes is a PIPE file (created with mkfifo).

Steps:  

INSTALL BGP READER - UBUNTU 15.04

First install general some packages:
apt-get install apt-file libsqlite3-dev libsqlite3 libmysqlclient-dev libmysqlclient
apt-get install libcurl-dev libcurl  autoconf git libssl-dev
apt-get install build-essential zlib1g-dev libbz2-dev
apt-get install libtool git
apt-get install zlib1g-dev

Also intall wandio
wandio-1.0.3
git clone https://github.com/alistairking/wandio

./configure

cd wandio
./bootstrap.sh
./configure && ./make && ./make install
wandiocat http://www.apple.com/library/test/success.html

to test wandio:
wandiocat http://www.apple.com/library/test/success.html

Download bgp reader tarball from:
https://bgpstream.caida.org/download

#ldconfig (before testing)

#mkfifo myroutes

to test bgpreader:
./bgpreader -p caida-bmp -w 1453912260 -m
(wait some seconds and then you will see something)

# git clone https://github.com/xdel/bgpsimple


Finally run everything
In two separate terminals (or any other way you would like to do it):

./bgpreader -p caida-bmp -w 1453912260 -m > /usr/src/bgpsimple/myroutes
./bgp_simple.pl -myas 65000 -myip 192.168.1.2 -peerip 192.168.1.1 -peeras 65000 -p myroutes

One more time, what will happen behind this?
bgpreader will read an online feed from a project called caida-bmp with starting timestamp 1453912260 (Jan 27 2016, 16:31) in "-m" format, It means a libbgpdump format (see references). The stardard output of all this will be send to the file /usr/src/bgpsimple/myroutes which is a "pipe file". At the same time, bgp_simple.pl will create an iBGP session againts peer 192.168.1.1/AS65000 (a bgp speaker such as Quagga or Cisco). bgp_simple.pl will read myroutes files and send what it seems in this file thru the iBGP Session.

Important information
- The BGP Session won't be established until there is something in the file myroutes
- eBGP multi-hop session are allowed
- You have to wait short time (few seconds) until bgpreaders start to actually see something and bgp_simple.pl starts to announce to the BGP peer

References / More information:
-Part of the work was based on:
http://evilrouters.net/2009/08/21/getting-bgp-routes-into-dynamips-with-video/

- Caida BGP Stream:
https://bgpstream.caida.org/

- bgpreader info:
https://bgpstream.caida.org/docs/tools/bgpreader

- RIPE NCC libbgpdump:
http://www.ris.ripe.net/source/bgpdump/

- Introduction of "Named Pipes" (pipe files in Linux):
http://www.linuxjournal.com/article/2156

Wednesday, February 17, 2016

Animation: The sad tale of the ISP that did not deploy IPv6


Hello,
  The following animation is based on the story called: "The sad tale of the ISP that didn't deploy IPv6" [1]. Hope you enjoy it:


[1] http://portalipv6.lacnic.net/en/the-sad-tale-of-the-isp-that-didnt-deploy-ipv6/