Thursday, June 25, 2009

Thoughts on Floodless in SEATTLE: A Scalable Ethernet Architecture for Large Enterprises

Authors: Changhoon Kim, Matthew Caesar, and Jennifer Rexford

Reading Detail Level: Quick Read

BibTeX:
@inproceedings{seattle,
author = {Kim, Changhoon and Caesar, Matthew and Rexford, Jennifer},
title = {Floodless in seattle: a scalable ethernet architecture for large enterprises},
booktitle = {SIGCOMM '08: Proceedings of the ACM SIGCOMM 2008 conference on Data communication},
year = {2008},
isbn = {978-1-60558-175-0},
pages = {3--14},
location = {Seattle, WA, USA},
doi = {http://doi.acm.org/10.1145/1402958.1402961},
publisher = {ACM},
address = {New York, NY, USA},
}

Summary:

The main idea is to use the switches to form a single-hop DHT. The DHT is accessed by using consistent hashing (function F) to minimize churn when switches fail or recover and keys are re-hashed. SEATTLE switches do the following things:
  1. They use a link-state algorithm to build the network topology (uses broadcast but only for switches)
  2. They form a one-hop DHT that can be used to store (k,v) pairs.
  3. They also participate in a multi-level DHT so that a large network such as an ISP can be divided into regions and queries to other regions get encapsulated at border switches when crossing regions.
  4. For every k a switch inserts, it keeps track of whether the resolver changes (maybe because it died or a new one came up), and inserts the (k,v) at the new location.
  5. If a switch dies, each switch will scan through its list of stored (k,v) and if the dead switch is a stored value, that entry is deleted.
  6. They advertise themselves as multiple virtual switches so that more powerful switches would appear as several virtual switches and would handle a larger load in the DHT.
  7. When a host arrives:
    1. The access switch snoops to find the host's IP and MAC addrs
    2. The access switch stores MAC -> (IP, location) at switch R=F(MAC) in the DHT
    3. The access switch stores IP -> (MAC, location) at switch V=F(IP) in the DHT
  8. To resolve an ARP request, an access switch uses the DHT's IP->(MAC, location) and caches the result so packets go directly to the target location without additional lookups.
  9. If a packet is sent to a MAC address which the access switch does not know:
    1. The packet is encapsulated and sent to switch R=F(MAC).
    2. R forwards the packet to the right location.
    3. R sends the info to the access switch
    4. The access switch caches the info
  10. When a host's MAC and/or IP addresses and/or location change:
    1. The switch at which the host gets attached (or remains) updates the info in the DHT.
    2. For a location change, the old access switch is told of the change; so, if it receives packets to the old address, it informs the sending access switch of the change.
    3. For a MAC address change, 1) the host's access switch maintains a revocation list (IP, MACold, MACnew) for the host. If another host is still using the old MAC address, the switch sends a gratuitous ARP to that host. 2) The switch tells the sender's access switch of the new MAC->location mapping so it doesn't do a lookup.
  11. SEATTLE separates unicast reachability from broadcast domains. It uses groups: sets of hosts that share the same broadcast domain regardless of location. Switches somehow (it is not clear to me how) know what groups a host belongs to, and use a protocol similar to IP multicast to graft a branch onto an existing multicast tree.
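To make the consistent-hashing idea from the list above concrete, here is a minimal sketch of a one-hop ring with virtual switches. It is my own illustration, not the paper's code: the `Ring` class, the SHA-1 hash, the `switch#i` naming of virtual nodes, and the switch names are all assumptions.

```python
import hashlib
from bisect import bisect_left


def h(key: str) -> int:
    """Map a string to a point on the ring (SHA-1 is an assumed choice)."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16)


class Ring:
    """Consistent-hash ring; every switch knows all members, so lookups are one hop."""

    def __init__(self):
        self.points = []  # sorted list of (ring position, switch id)

    def add(self, switch: str, vnodes: int = 1) -> None:
        # A more powerful switch advertises several virtual switches.
        for i in range(vnodes):
            self.points.append((h(f"{switch}#{i}"), switch))
        self.points.sort()

    def remove(self, switch: str) -> None:
        self.points = [p for p in self.points if p[1] != switch]

    def resolver(self, key: str) -> str:
        """F(key): owner of the first ring point at or after hash(key), wrapping around."""
        i = bisect_left(self.points, (h(key), ""))
        return self.points[i % len(self.points)][1]


ring = Ring()
ring.add("sw-core", vnodes=4)  # hypothetical big switch, four virtual switches
ring.add("sw-a")
ring.add("sw-b")

keys = [f"00:00:00:00:00:{i:02x}" for i in range(200)]
before = {k: ring.resolver(k) for k in keys}
ring.remove("sw-b")  # sw-b dies
after = {k: ring.resolver(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Only the keys that `sw-b` was resolving end up in `moved`; everything else keeps its resolver, which is why a switch only needs to re-insert the (k,v) pairs whose resolver changed.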
The DHT can also store mappings from service names to locations (e.g. DHCP_SERVER and PRINTER). This is how they handle DHCP without broadcasts.
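Host arrival, ARP resolution, and the service-name lookup can be sketched the same way. This is a toy model under my own assumptions: the tables are in-memory dicts, the three switch names are made up, and function names such as `host_arrives` and `resolve_arp` are hypothetical rather than taken from the paper.

```python
import hashlib

SWITCHES = ["sw-a", "sw-b", "sw-c"]  # hypothetical switch ids


def point(s: str) -> int:
    """Position of a string on the hash ring (SHA-1 is an assumed choice)."""
    return int(hashlib.sha1(s.encode()).hexdigest(), 16)


def F(key: str) -> str:
    """Resolver for key: first switch at or after hash(key) on the ring, wrapping around."""
    k = point(key)
    candidates = [s for s in SWITCHES if point(s) >= k]
    return min(candidates or SWITCHES, key=point)


store = {s: {} for s in SWITCHES}      # per-switch (k, v) tables
arp_cache = {s: {} for s in SWITCHES}  # per-access-switch cache of resolved IPs


def host_arrives(access: str, ip: str, mac: str) -> None:
    """The access switch publishes both mappings for a newly seen host."""
    store[F(mac)][mac] = (ip, access)  # MAC -> (IP, location) at R = F(MAC)
    store[F(ip)][ip] = (mac, access)   # IP -> (MAC, location) at V = F(IP)


def resolve_arp(access: str, ip: str):
    """Answer an ARP request with one unicast lookup instead of a broadcast."""
    cache = arp_cache[access]
    if ip not in cache:
        cache[ip] = store[F(ip)][ip]   # raises KeyError for an unknown IP
    return cache[ip]


def publish_service(name: str, location: str) -> None:
    store[F(name)][name] = location


def find_service(name: str) -> str:
    return store[F(name)][name]


host_arrives("sw-a", "10.0.0.5", "00:11:22:33:44:55")
result = resolve_arp("sw-c", "10.0.0.5")
publish_service("DHCP_SERVER", "sw-b")
dhcp_location = find_service("DHCP_SERVER")
```

After the first `resolve_arp` call, `sw-c` holds the (MAC, location) pair in its cache, so later packets to that host can go straight to `sw-a` without touching the resolver.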

For evaluation, the authors implemented SEATTLE (on XORP and Click), and used traces from several locations to run simulations. They compared it with Ethernet and ROFL, showing that SEATTLE is superior to both in many ways.

The Good:
The architecture is beautiful in many ways. The DHT approach seems very powerful indeed. I don't know of any other solution to the broadcast problem, let alone one that can handle arbitrary L2 topologies. I liked the notions of virtual switches and groups (with caveats below). The evaluation section is nice, though not perfect.

The Bad:
No paper is perfect, unfortunately.
  • While the authors claim to eliminate broadcast, they do use link state updates which are broadcast, even though these do not happen as often as ARP.
  • Virtual switches are nice, but there is no analysis of their side effects or of how difficult they are to implement.
  • The discussion on areas was completely confusing.
  • The discussion on groups was too abstract. How do switches know what hosts are in what groups? I was not able to visualize how groups would be implemented from the discussion, and so I have to say they seem too difficult.
  • The evaluation section is missing a critical piece. The paper does not mention what has been implemented and what is missing. This makes the evaluation much less credible. How difficult was the implementation? How does it compare to other designs? What is the cost comparison?
  • The per-packet processing cost was not convincing because it was not clear what was implemented and what was not.
Nothing was said on multipath, but that was not the point of the paper anyway.
