Tuesday, March 31, 2009

Thoughts on NIRA: A New Internet Routing Architecture

Authors: Xiaowei Yang (MIT LCS)

BibTeX:
@inproceedings{Yang:NIRA,
author = {Yang,, Xiaowei},
title = {NIRA: a new Internet routing architecture},
booktitle = {FDNA '03: Proceedings of the ACM SIGCOMM workshop on Future directions in network architecture},
year = {2003},
isbn = {1-58113-748-0},
pages = {301--312},
location = {Karlsruhe, Germany},
doi = {http://doi.acm.org/10.1145/944759.944768},
publisher = {ACM},
address = {New York, NY, USA},
}
Summary:
In the intro, the author argues that it is important to allow users to choose their domain-level routes. This fosters competition and innovation and drives costs down. Then she adds a disclaimer, saying this is not provable. NIRA is designed with the following in mind: scalability, robustness, efficiency, heterogeneous user choices, and practical provider compensation. Security is not mentioned.

NIRA provides "valley-free" routes that go up a sender's provider chain and down a receiver's provider chain going through a single provider that is sommon to both chains. In the Core of the Internet, packets cannot be pushed further up. Addresses are 128-bit hierarchical addresses, and each address identifies a unique route segment to the core. A full route is represented as a number of "segments" that are valley-free. Each valley-free segment can be represented by a pair of addresses. Routes that have peers (i.e. are not valley-free because there is no common provider in the up and down chains) are represented with multiple addresses.

The author argues that source routes are not terribly insecure when end-to-end authentication is used (where the attacker spoofs an address and inserts its real address in the return source route to get replies). Address-based authentication should not be used.

The Core forms a "routing region", users don't choose providers in the core, and so for most cases, a pair of addresses suffices to indicate the route. Additional optimizations include extended addressing (adding an address to a peering link) and multiple routing regions.

Addressing feasibility: Looked at the route table dumps from Route Views server. Used heurestics to get provider/customer relationships. Concluded it looks feasible.

TIPP (Topology Information Propagation Protocol) is used instead of a LS protocol to limit topology information propagation, but does not include dynamic conditions of interconnections.

NRRS (Name-to-Route Resolution Service) act like DNS and translate hierarchical names to hard-coded routes.

When a high level provider changes its top-level provider, all customers and customers' customers and so on need to change their addresses. However, AS birth and death rate is less than 15 per day and AS-level link birth and death rate is less than 50. See Chen et al. The Origin of Power Laws in Internet Topologies Revisited. So, rate is low and changes won't happen often.

Error handling is done by a reactive message from a router or a timeout.

Nothing interesting is said about provider compensation. Business relationships are either between directly connected domains or indirectly connected ones. With indirectly connected ones, the problem of authenticating senders and receivers is an issue that is punted.

What happens in the DoS or spam case when the receiver has to pay for routes? The author says this is a problem with user-empowerment in general. Incorrectly, she assumes that because BGP can be used to control which providers announce an edge domain's adress prefixes, the problem is solved today. In NIRA this is still a problem.

The Good:
The paper suggests an interesting addressing and routing architecture that allows senders to select routes and receivers to slightly determine the choice. The architecture requires a small aount of state and does not seem to consume too much packet overhead (although this was not specifically measured).

The Bad:
The case for user choice is not very convincing. It's not clear to me that user choice of route will lead to all the green pastures that were promised in the paper. Issues of route availability and errors were not handled very well. Timeouts are horrible. It is also not clear to me how a receiver can control the path that packets take towards him. It seems that if a receiver is going to be paying money, he wants to have as much control over the route choice as a sender. I think a provider can choose any path it likes to reach a sender, and neither the sender not the receiver necessarily know about it.

The Ugly:
There are many issues that are left open that I thought the paper was going to talk about. The most important of these is the issue of authenticating users and billing them. Another problem is attacking a receiver and making him pay money for worthless traffic. Then again, I guess this is an issue any architecture would face. But this is especially important here because there is an emphasis on the receiver paying money for receiving traffic.

Sunday, March 29, 2009

Recursive and Iterative designs in Verilog

Disclaimer: Note that the examples below for the unary_and have not been compiled and are just there for demonstration (probably have a few minor bugs). The paramatrizeable priority decoder example on the other hand is complete, and is in production use.

The Verilog language allows designers to write recursive modules using a generate block. For example:
module unary_and
#(parameter WIDTH = 32)
(input [WIDTH-1:0] in_and,
output out_and)

generate
if(WIDTH == 1) begin
assign out_and = in_and;
end
else if(WIDTH == 2) begin
assign out_and = in_and[0] & in_and[1];
end
else begin
unary_and #(.WIDTH (WIDTH/2))
unary_and_low
(.in_and (in_and[WIDTH/2-1:0]),
.out_and (out_and_low));

unary_and #(.WIDTH (WIDTH - WIDTH/2))
unary_and_high
(.in_and (in_and[WIDTH-1:WIDTH/2]),
.out_and (out_and_high));

assign out_and = out_and_low & out_and_high;
end
endgenerate
endmodule
Unfortunately, not all synthesis tools can handle such designs. One way to get around this problem is by using using two nested loops to build each level of the tree of the recursive modules. The outer loop iterates over levels, and the inner loop creates the nodes at each level. For simplicity, we change the parameter to be the number of levels to create. Here's an example:
module unary_and
#(parameter NUM_LEVELS = 5,
parameter WIDTH = 2**NUM_LEVELS)
(input [WIDTH-1:0] in_and,
output out_and)

generate
genvar i,j;
// start at the lowest level in the tree
for (i=0; i < NUM_LEVELS; i=i+1) begin:gen_levels
// do nodes at each level
for (j=0; j < 2**(NUM_LEVELS-i-1); j=j+1) begin:gen_nodes
wire in_low;
wire in_high;
wire out;

// at the lowest level, use the module's inputs
if (i==0) begin
assign in_low = in_and[j*2];
assign in_high = in_and[j*2 + 1];
end
// otherwise, use the outputs of the lower level
else begin
assign in_low = gen_levels[i-1].gen_nodes[j*2];
assign in_high = gen_levels[i-1].gen_nodes[j*2 + 1];
end

assign out = in_low & in_high;

end
end
endgenerate

assign out_and = gen_levels[NUM_LEVELS-1].gen_nodes[0].out;

endmodule

A more useful example is the design of an efficient parametrizeable priority decoder. I have looked everywhere and couldn't find a good design that worked for various input widths. I've found one that used a recursive structure, but of course that didn't synthesize. The example below shows a complete priority decoder. There are two implementation styles. If STYLE is set to 0 (default), then the implementation builds a multiplexer tree that multiplexes the value to be output. The other style constructs one bit of the output at each level. Below I give two modules, the priority_encoder itself and a tester to test that it works.
/***************************************************               
* $Id$
*
* Module: priority_encoder.v
* Project: NF2.1
* Author: Jad Naous
* Description: parametrizable priority encoder.
*
* Highest priority by default given to rightmost bit.
*
* Usually just specify the OUTPUT_WIDTH. INPUT_WIDTH
* is optional. Defaults to 2**OUTPUT_WIDTH.
*
***************************************************/
module priority_encoder
#(parameter OUTPUT_WIDTH = 8,
parameter RIGHT_TO_LEFT_PRIORITY = 1)

(input [0:(2**OUTPUT_WIDTH)-1] unencoded_input,
output [OUTPUT_WIDTH-1:0] encoded_output,
output valid);

localparam INPUT_WIDTH = 2**OUTPUT_WIDTH;
localparam INPUT_VAL_WIDTH = INPUT_WIDTH*OUTPUT_WIDTH;
localparam STYLE = 0;

generate
genvar i,j;
if(STYLE==0) begin
for(i=0; i<OUTPUT_WIDTH; i=i+1)
begin:gen_levels
for(j=0; j<INPUT_WIDTH/(2**(i+1)); j=j+1)
begin:gen_nodes
wire [OUTPUT_WIDTH-1:0] value;
wire valid;

wire [OUTPUT_WIDTH-1:0] left_val;
wire left_vld;
wire [OUTPUT_WIDTH-1:0] right_val;
wire right_vld;

if(i==0) begin
assign left_val = j*2;
assign left_vld = unencoded_input[j*2];
assign right_val = j*2 + 1;
assign right_vld = unencoded_input[j*2+1];
end
else begin
assign left_val
= gen_levels[i-1].gen_nodes[j*2].value;
assign left_vld
= gen_levels[i-1].gen_nodes[j*2].valid;
assign right_val
= gen_levels[i-1].gen_nodes[j*2+1].value;
assign right_vld
= gen_levels[i-1].gen_nodes[j*2+1].valid;
end // else: !if(i==0)

assign value = (RIGHT_TO_LEFT_PRIORITY ?
(right_vld ? right_val : left_val) :
(left_vld ? left_val : right_val));
assign valid = right_vld | left_vld;
end // block: gen_nodes
end // block: gen_levels

assign encoded_output
= gen_levels[OUTPUT_WIDTH-1].gen_nodes[0].value;
assign valid
= gen_levels[OUTPUT_WIDTH-1].gen_nodes[0].valid;
end // if (STYLE==0)

else begin
for(i=0; i<OUTPUT_WIDTH; i=i+1)
begin:gen_levels
for(j=0; j<INPUT_WIDTH/(2**(i+1)); j=j+1)
begin:gen_nodes
wire [i:0] value;
wire valid;

if(i==0) begin
assign value = RIGHT_TO_LEFT_PRIORITY
? unencoded_input[j*2+1]
: unencoded_input[j*2];
assign valid = unencoded_input[j*2+1]
| unencoded_input[j*2];
end
else begin
wire [i-1:0] left_val;
wire left_vld;
wire [i-1:0] right_val;
wire right_vld;
assign left_val
= gen_levels[i-1].gen_nodes[j*2].value;
assign left_vld
= gen_levels[i-1].gen_nodes[j*2].valid;
assign right_val
= gen_levels[i-1].gen_nodes[j*2+1].value;
assign right_vld
= gen_levels[i-1].gen_nodes[j*2+1].valid;
assign value
= (RIGHT_TO_LEFT_PRIORITY ?
(right_vld ? {1'b1, right_val}
: {1'b0, left_val}) :
(left_vld ? {1'b0, left_val}
: {1'b1, right_val}));
assign valid = right_vld | left_vld;
end // else: !if(i==0)
end // block: gen_nodes
end // block: gen_levels

assign encoded_output
= gen_levels[OUTPUT_WIDTH-1].gen_nodes[0].value;
assign valid
= gen_levels[OUTPUT_WIDTH-1].gen_nodes[0].valid;
end // else: !if(STYLE==0)
endgenerate
endmodule

// synthesise translate_off
module pri_encode_test ();
reg [0:7] unencoded_input = 8'h0;
wire [2:0] encoded_output;
wire valid;

priority_encoder
#(.OUTPUT_WIDTH(3))
priority_encoder
(.valid (valid),
.encoded_output (encoded_output),
.unencoded_input (unencoded_input));

initial begin
unencoded_input[7] = 1'b1;
unencoded_input[2] = 1'b1;
unencoded_input[3] = 1'b1;

#2 if(encoded_output != 7) $display("%t ERROR", $time);

#2 unencoded_input = 0;
unencoded_input[6] = 1'b1;
unencoded_input[3] = 1'b1;
unencoded_input[0] = 1'b1;

#2 if(encoded_output != 6) $display("%t ERROR", $time);

#2 unencoded_input = 0;
unencoded_input[0] = 1'b1;

#2 if(encoded_output != 0) $display("%t ERROR", $time);

#2 unencoded_input = 0;
unencoded_input[7] = 1'b1;

#2 if(encoded_output != 7) $display("%t ERROR", $time);

#2 unencoded_input = 255;

#2 if(encoded_output != 7) $display("%t ERROR", $time);

#2 $display("%t Test ended.", $time);
end // initial begin
endmodule // pri_encode_test
// synthesise translate_on


Note that in synthesis, and surprisingly, style 0 turns out to be more efficient.