Thursday, August 22, 2024

A Practical Improvement in DNS Transport over UDP over IPv6

By Hugo Salgado and Alejandro Acosta


Introduction and problem statement

In this document we want to discuss an existing IETF draft (a working document that may become a standard) that caught our attention. This draft involves two fascinating universes: IPv6 and DNS. It introduces some best practices for carrying DNS over IPv6.


Its title is “DNS over IPv6 Best Practices” and it can be found here.


What is the document about and what problem does it seek to solve?

The document describes an approach to how Domain Name Protocol (DNS) should be carried over IPv6 [RFC8200].

Some operational issues have been identified in carrying DNS packets over IPv6 and the draft seeks to address them.


Technical context

The IPv6 protocol requires a minimum link MTU of 1280 octets. According to Section 5 “Packet Size Issues” of RFC8200, every link in the Internet must have an MTU of 1280 octets or greater. If a link cannot convey a 1280-octet packet in one piece, link-specific fragmentation and reassembly must be provided at a layer below IPv6.


Successful operation of PMTUD in an example adapted to 1280-byte MTU

Image source: https://www.slideshare.net/slideshow/naveguemos-por-internet-con-ipv6/34651833#2


Using Path MTU Discovery (PMTUD) and IPv6 fragmentation (source only) allows larger packets to be sent. However, operational experience shows that sending large DNS packets over UDP over IPv6 results in high loss rates. Some studies —quite a few years old but useful for context— found that around 10% of IPv6 routers drop all IPv6 fragments, and 40% block “Packet Too Big” messages, making client negotiation impossible. (“M. de Boer, J. Bosma, “Discovering Path MTU black holes on the Internet using RIPE Atlas”)

Most modern transport protocols like TCP [TCP] and QUIC [QUIC] include packet segmentation techniques that allow them to send larger data streams over IPv6.


A bit of history

The Domain Name System (DNS) was originally defined in RFC1034 and RFC1035. It was designed to run over several different transport protocols, including UDP and TCP, and has more recently been extended to run over QUIC. These transport protocols can be run over both IPv4 and IPv6.

When DNS was designed, the size of DNS packets carried over UDP was limited to 512 bytes. If a message was longer than 512 bytes, it was truncated and the Truncation (TC) bit was set to indicate that the response was incomplete, allowing the client to retry with TCP.

With this original behavior, UDP over IPv6 did not exceed the IPv6 link MTU (maximum transmission unit), avoiding operational issues due to fragmentation. However, with the introduction of EDNS0 extensions (RFC6891), the maximum was extended to a theoretical 64KB. This caused some responses to exceed the 512-byte limit for UDP, which resulted in sizes that exceeded the Path MTU and triggered TCP connections, impacting the scalability of the DNS servers.


Encapsulating a DNS packet in an Ethernet frame


Let’s talk about DNS over IPv6

DNS over IPv6 is designed to run over UDP or other transport protocols like TCP or QUIC. UDP only provides for source and destination ports, a length field, and simple checksum. It is a connectionless protocol. In contrast, TCP and QUIC offer additional features such as packet segmentation, reliability, error correction, and connection state.

DNS over UDP over IPv6 is suitable for small packet sizes, but becomes less reliable with larger sizes, particularly when IPv6 datagram fragmentation is required.

On the other hand, DNS over TCP or QUIC over IPv6 work well with all packet sizes. However, running a stateful protocol such as TCP or QUIC places greater demands on the DNS server’s resources (and other equipment such as firewalls, DPIs, and IDS), which can potentially impact scalability. This may be a reasonable tradeoff for servers that need to send larger DNS response packets.

The draft’s suggestion for DNS over UDP recommends limiting the size of DNS over UDP packets over IPv6 to 1280 octets. This avoids the need for IPv6 fragmentation or Path MTU Discovery, which ensures greater reliability.

Most DNS queries and responses will fit within this packet size limit and can therefore be sent over UDP. Larger DNS packets should not be sent over UDP; instead, they should be sent over TCP or QUIC, as described in the next section.


DNS over TCP and QUIC

When larger DNS packets need to be carried, it is recommended to run DNS over TCP or QUIC. These protocols handle segmentation and reliably adjust their segment size for different link and path MTU values, which makes them much more reliable than using UDP with IPv6 fragmentation.

Section 4.2.2 of [RFC1035] describes the use of TCP for carrying DNS messages, while [RFC9250] explains how to implement DNS over QUIC to provide transport confidentiality. Additionally, operation requirements for DNS over TCP are described in [RFC9210].


Security

Switching from UDP to TCP/QUIC for large responses means that the DNS server must maintain an additional state for each query received over TCP/QUIC. This will consume additional resources on the servers and affect the scalability of the DNS system. This situation may also leave the servers vulnerable to Denial of Service (DoS) attacks.


Is this the correct solution?

While we believe this solution will bring many benefits to the IPv6 and DNS ecosystem, it is a temporary operational fix and does not solve the root problem.

We believe the correct solution is ensuring that source fragmentation works, that PMTUD is not broken along the way, and that security devices allow fragmentation headers. This requires changes across various Internet actors, which may take a long time, but that doesn’t mean that we should abandon our efforts or stop educating others about the importance of doing the right thing.


Sunday, June 9, 2024

Cisco hidden command: bgp bestpath as-path multipath-relax

Hidden command

  bgp bestpath as-path multipath-relax


What for is this?

By default, Cisco does not do load-balance or distribute traffic between different ASs, this command allows it. Important, you must also use the maximum-paths command


Example:

router bgp 65001

 bgp router-id 1.1.1.1

 bgp log-neighbor-changes

 bgp bestpath as-path multipath-relax

 neighbor 2001:DB8:12::2 remote-as 65002

 neighbor 2001:DB8:12:10::2 remote-as 65002

 neighbor 2001:DB8:13:11::3 remote-as 65003

 !

 address-family ipv4

 no neighbor 2001:DB8:12::2 activate

 no neighbor 2001:DB8:12:10::2 activate

 no neighbor 2001:DB8:13:11::3 activate

 exit-address-family

 !

 address-family ipv6

 maximum-paths 3

 neighbor 2001:DB8:12::2 activate

 neighbor 2001:DB8:12:10::2 activate

 neighbor 2001:DB8:13:11::3 activate

 exit-address-family


Output after implementation:

     Network          Next Hop            Metric LocPrf Weight Path

 *m  2001:DB8::4/128  2001:DB8:12:10::2

                                                              0 65002 65004 ?

 *>                   2001:DB8:12::2                         0 65002 65004 ?

 *m                   2001:DB8:13:11::3

                                                              0 65003 65004 ?

 *m  2001:DB8:24:11::/64

                       2001:DB8:12:10::2

                                                              0 65002 65004 ?

 *>                   2001:DB8:12::2                         0 65002 65004 ?

 *m                   2001:DB8:13:11::3

                                                              0 65003 65004 ?

 *m  2001:DB8:34::/64 2001:DB8:12:10::2

                                                              0 65002 65004 ?

 *>                   2001:DB8:12::2                         0 65002 65004 ?

 *m                   2001:DB8:13:11::3

                                                              0 65003 65004 ?

Friday, June 7, 2024

Video: IPv6 LAC Race - May 2014 - Jun 2024

Do you want to know how the evolution of IPv6 has been in LAC? In this video of just a minute you will have your answer #barchartrace #ipv6





Sunday, June 2, 2024

Solved: "The following security updates require Ubuntu Pro with 'esm-apps' enabled"

Situation

  When you want to do some operations in Ubuntu using apt/do-release-upgrade you receive the message:

"The following security updates require Ubuntu Pro with 'esm-apps' enabled"


Solution

 The solution that worked for me was to run this:


cd /etc/apt/sources.list.d

for i in *.list; do mv ${i} ${i}.disabled; donated

apt clean

apt autoclean

sudo do-release-upgrade



Reference

https://askubuntu.com/questions/1085295/error-while-trying-to-upgrade-from-ubuntu-18-04-to-18-10-please-install-all-av 



Monday, April 29, 2024

Solved: Error: eth0 interface name is not allowed for R2 node when network mode is not set to none in containerlab

 Problem:

   Containerlab returns a similar error:

Error: eth0 interface name is not allowed for R2 node when network mode is not set to none


Solution:

 In the .yml file in the node section indicating the topology error specify:


network-mode: none


Example:

topology:

  kinds: 

    linux:

      image: quay.io/frrouting/frr:8.4.1

  nodes:

    R1:

      kind: linux

      image: quay.io/frrouting/frr:8.4.1

      network-mode: none


 Rerun the topology with clab dep -t file.yml and that's it!


Luck.

Friday, March 8, 2024

BGP Stream: An Analysis of One Year of BGP Incidents

04/03/2024


By Alejandro Acosta, R&D Coordinator at LACNIC

LACNIC presents the first webpage designed to show incidents and an analysis of Border Gateway Protocol (BGP) measurement data in Latin America and the Caribbean.

MAIN INCIDENTS. In addition to a summary of the information, the page shows three main types of events: possible network hijacks, BGP outages, and route leaks.

Possible hijacks refers to the illegitimate takeover of groups of IP addresses by corrupting Internet routing tables. This typically occurs when an Autonomous System announces a prefix that it does not originate.

Outages refers to the loss of visibility of network prefixes by a majority group of sensors.

Route leaks, as the name suggests, refers to the —potentially— unintentional announcement of a network prefix via BGP. For example, in a private peering traffic exchange, when one of the participants announces the peer’s prefix to the Internet. This case is the most difficult for algorithms to detect, so some of these incidents are not identified.

How is the data obtained?

This initiative uses Cisco BGP Stream, an automated process that selects the largest and most important incidents, providing information on the nature of the event and the ASNs involved.

The information is openly published, as LACNIC believes that it is important for engineers, network administrators, and organizations to gain insights into the most common incidents in the region and raise awareness about the situation.

This allows quickly investigating events, the rapid development of complex prototypes and tools, as well as large-scale monitoring applications (e.g., detecting connectivity outages, attacks, or BGP hijacks).

Using a system developed by LACNIC’s R&D department, raw data is collected, plotted, identified, cleaned, stored in a database, and later used to produce statistics and graphs. This occurs automatically every 24 hours.

RESULTS. During the study period —February 2023 to February 2024— we found the results shown in the charts below, which compare BGP events worldwide vs BGP events in our region.

A comparison between the global chart and the chart specific to the LAC region shows a similar pattern in the order of the most common incidents, with outages being the most frequent type of incident, followed by possible hijacks, and finally prefix leaks. It should also be noted that outages represent a higher percentage of the total number of incidents in our region than at the global level.

An analysis of the results table showing worldwide BGP events vs BGP events in our region reveals the following:

TOP 5 countries in our region with the highest number of BGP outages

Outages 
CCEvents
BR781
AR99
HT24
MX22
CL17

TOP 5 countries in our region with the highest number of possible Hijacks

Expected CCDetected CCEvents
BRBR67
BRnone35
PYBR24
BRUS22
BRCN9

TOP 3 countries in our region with the highest number of route leaks

Origin CCLeaker CCEvents
VEVE7
MXMX5
CLPA2

Impact

In this first year of operation, LACNIC has observed a reduction in BGP incidents. Several reasons for this have been identified, including a) the deployment and adoption of Resource Certification (RPKI), b) LACNIC’s Internet Routing Registry (IRR), and the adoption of RFC 9234 (Route Leak Prevention and Detection Using Roles in UPDATE and OPEN Messages).

The adoption of these tools is being driven by better operator practices and ISOC’s promotion of MANRS.

Conclusions

Possible hijacks, outages, and route leaks are the most common types of BGP incidents. During the initial year of data collection, a decrease in the number of cases was observed. However, it is expected that they will not disappear entirely in the near future. Implementing robust redundancy and resiliency measures in networks is crucial, as is the early detection and prevention of possible hijacks to ensure the integrity and reliability of Internet routes.

At LACNIC, our goal is to raise awareness and encourage ISPs and organizations to be prepared to handle these incidents efficiently when they occur.

References

https://stats.labs.lacnic.net/BGP/bgpstream-lac-region.html

https://stats.labs.lacnic.net/BGP/bgpstream.html

https://bgpstream.crosswork.cisco.com/ 


Tuesday, February 27, 2024

This is the way to install the telnet command in Alpine Linux (very popular in the container world such as docker)

This is the way to install the telnet command in Alpine Linux (very popular in the container world such as docker)

#apk update

#apk add busybox-extras

Friday, February 23, 2024

The real solution to run ContainerLAB on MAC m1 or m2 apple silicon

Step 1: Install Canonical Multipass your MAC 

$brew install multipass


Step 2: Install the VM called docker

$multipass launch docker --name mydocker


Step 3: Connect to the new VM

$multipass shell mydocker


Step 4: Inside the VM install ContainerLab

$sudo su

#bash -c "$(curl -sL https://get.containerlab.dev)"


Let's try this simple back2back topology of two Linux computers with FRR


-- 2-frr-back2back.yml --

name: ipv6-ws

topology:

   kinds:

     linux:

       image: ghcr.io/hellt/network-multitool

   do not give:

   ROUTERS ###

     A1:

       kind: linux

       image: quay.io/frrouting/frr:8.4.1

       exec:

         - "sysctl -w net.ipv6.conf.all.forwarding=1"

         - "ip address add dev eth1 2001:db8:ffab::1/64"

     A2:

       kind: linux

       image: quay.io/frrouting/frr:8.4.1

       exec:

         - "ip address add dev eth1 2001:db8:ffab::2/64"

         - "sysctl -w net.ipv6.conf.all.forwarding=1"

   links:

     - endpoints: ["R1:eth1", "R2:eth1"]

--- yml --


Step 5: Let's build the topology with clab:

clab dep -t 2-frr-back2back.yml


Step 6: finally we are going to connect to one of the VMs inside ContainerLAB

docker exec -i -t clab-ipv6-ws-R2 bash

Thursday, February 1, 2024

A Much-Needed BGP RFC: AS Path Prepending

Introduction

The Border Gateway Protocol (BGP) plays a critical role in building and maintaining Internet routing tables, so much so that it is considered the “glue” that holds the Internet together. In this context, a long-standing and very popular technique known as ‘AS Path Prepending’ has been devised as a key strategy for influencing route selection and optimizing an AS’s inbound and outbound traffic.

In this document, we will navigate through the IETF draft titled “AS Path Prepending” [1], which includes several ideas and concepts that are of great value to the community.


About draft-ietf-grow-as-path-prepending

This draft has been under discussion within the Global Routing Operation (GROW) Working Group since 2020 and is currently on version 10. The document has seven co-authors: M. McBride, D. Madory, J. Tantsura, R. Raszuk, H. Li., J. Heitz, and G. Mishra. It predominantly received support on the discussion list (including my own). You can read the draft here.


What is AS Path Prepending?

AS Path Prepending is a technique that involves repetitively adding one’s autonomous system identifier (ASN) to the list of ASs in a BGP route path (AS_PATH). Its goal is to influence route selection by making certain paths less attractive to inbound/outbound traffic. In other words, it consists of adding our autonomous system to the AS_PATH and therefore artificially “lengthening the path” to a prefix on the Internet.




In the figure above, without prepends, Router A prefers to go to C via B. However, when three prepends are added on B, router A decides to reach C via D.


Why is AS Path Prepending used and what is it used for?

AS Path prepending is used for multiple reasons. The main reason is undoubtedly traffic engineering, which in turn is used to influence an AS’s inbound and outbound traffic. It is very likely that the AS wishes to achieve one of the following goals:

  • to distribute traffic among two or more upstream providers, or
  • to have an upstream backup provider.
  • Whatever the case, the goal is traffic engineering.


To prepend or not to prepend, that is the question

Prepending is a bit like NAT in that it is often a necessary evil. As we will explain, its excessive and sometimes unnecessary use can become a vulnerability with significant implications for network stability.


What’s wrong with using AS Path Prepending?

We all know that AS Path Prepending is a very common technique to influence BGP decisions. However, its excessive, incorrect, and sometimes unnecessary use can have negative consequences. For example:


  • Creation of suboptimal traffic flows. In other words, we may achieve our goal of distributing traffic in the immediate links; however, beyond our immediate upstream, traffic is not optimized to reach our autonomous system and vice versa.
  • Prefix de-aggregation. When implementing traffic engineering, it is very common to de-aggregate prefixes, which affects the Internet ecosystem.
  • In the event of a route-leak, under normal circumstances, the as-path of our advertisements would be shorter than that of the leaked route. However, if we artificially lengthen the path by prepending, the as-path of the leaked routes may be shorter than those we are legitimately announcing for our legitimate prefix, which would have lower preference, leading to potential route hijacking, attacks, and a long etcetera.
  • Memory consumption. As expected, these AS Path Prepends are learned by BGP Speakers, thus increasing their memory usage. To this I would add that prepending introduces a small additional CPU usage penalty for each prefix.


Given that AS Path Prepend is no longer recommended, what alternatives are available?

There are many techniques for performing traffic engineering in BGP. I will mention some that appear in the draft:


  • Leveraging BGP communities. In addition to the well-known BGP communities, I recommend talking to your BGP peers to optimize traffic. There are numerous BGP communities implemented by providers, which might certainly benefit your setup.
  • Announcing more specific routes to your main upstreams.
  • Manipulating the AS Origin Code. Remember that this attribute is also found in the BGP route selection algorithms.
  • Using Multi Exit Discriminator (MED), a non-transitive attribute that can be used with excellent results for manipulating inbound traffic when we have several links to the same provider
  • Using Local-Preference, another non-transitive attribute, perfect for influencing the traffic that leaves our autonomous system


This is all well and good, but I still need to use AS Path Prepending. Any suggestions?

The draft mentions the best current practices when using AS Pat Prepending, which I will summarize below:

  1. Only use AS Path Prepending if it is absolutely necessary.
  2. Due to some traffic manipulation techniques, when using AS Path Prepending, we may not see significant changes in the traffic distribution, which is why it is important to talk to our peers to see if they will honor the prepends.
  3. Use local-preference on our network.
  4. Don’t prepend ASNs that you don’t own.
  5. Don’t prepend if you are connected to a single ISP using a single link, i.e., single homed (this one is not included in the draft).
  6. If we prepend a prefix, it might not be necessary to use that prepend for all our peers.
  7. There is no need to use more than five prepends. The reason for this is that more than 90% of path are five ASs or fewer in length.



Final Considerations

The use of AS Path Prepending is a valuable strategy but should be used only when necessary and with caution, following best practices. Excessive use of prepends may cause unforeseen events that may affect our autonomous system from a traffic and a security perspective.

We invite you to read the full draft (available here and to join the discussion on the LACNOG mailing list.

We also encourage you to comment on this post to let us know if you are prepending your ASN, as well as why and what you are using this for.


References:

[1] https://datatracker.ietf.org/doc/draft-ietf-grow-as-path-prepending/

Monday, January 8, 2024

Two very short jokes, one TCP and other UDP

TCP:

 ” You wanna hear a TCP joke ?
 You wanna hear a TCP joke ?
 You wanna hear a TCP joke ?
 You wanna hear a TCP joke ?

 [...]”


UDP:

I’d tell you a UDP joke, but you might not get it.