Thursday, August 13, 2009

Thoughts on A System for Authenticated Policy-Compliant Routing (Platypus)

Authors: Barath Raghavan and Alex C. Snoeren
Venue: SIGCOMM 2004
Paper: http://www.cs.ucsd.edu/~snoeren/papers/platypus-sigcomm04.pdf

Summary:

Providers want to forward packets only for entities they can bill. Regular source routing makes this difficult since packets can arrive from anywhere and it is not clear whom to bill. The authors propose a network protocol that allows an ISP to give capabilities to whomever it wants to bill. The capabilities themselves can then be further delegated. Their work assumes that knowledge of alternate routes or waypoints is obtained through some out-of-band means; that issue is not addressed.

Applications:

1. Efficient overlays and on-demand routing where end-hosts can decide how to route packets.

2. Preferential egress points where different peers get routed differently even if they all peer at the same router.

3. Preferential ingress points where multi-homed ASes can select which ISP incoming traffic comes through.

4. Virtual multi-homing where a stub AS can get multi-homing at the provider (i.e. a hop or more away) even though it only has one connection.

The Platypus header sits below (after) the IP header, so it is compatible with IP. A packet carries a list of capabilities, where each capability specifies the resource principal to bill and the waypoint to go to. Packets are forwarded between waypoints by rewriting the destination address in the IP header.

Capabilities are authenticated using a “double MAC” trick. The key server and the router share a single key k. A secret capability key s = MAC(k, capability) is given to the customer. The customer then generates a binding b = MAC(s, payload) and puts b in the packet along with the capability. The router can re-derive s from k and the capability carried in the packet, recompute the MAC over the payload, and thereby authenticate the capability without contacting the key server.
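
A minimal sketch of the double-MAC check (Python, with HMAC-SHA1 standing in for whatever MAC Platypus actually uses; the key and capability contents are made up):

    import hmac, hashlib

    def mac(key: bytes, msg: bytes) -> bytes:
        return hmac.new(key, msg, hashlib.sha1).digest()

    k = b"key-shared-by-key-server-and-router"
    capability = b"resource-principal=alice;waypoint=10.0.0.1"

    # Key server hands the customer the secret capability key s.
    s = mac(k, capability)

    # Customer binds each packet to the capability.
    payload = b"...packet payload..."
    b = mac(s, payload)

    # Router re-derives s from k and the capability in the packet and
    # verifies the binding without ever contacting the key server.
    def router_verify(k, capability, payload, b):
        s_prime = mac(k, capability)
        return hmac.compare_digest(mac(s_prime, payload), b)

    assert router_verify(k, capability, payload, b)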

The capability key s is temporary and expires after some time. The expiration mechanism requires loose time synchronization, but allows for only one valid key value at a time. When a key expires, customers need to get a new key. Customers are given a long-lived master key they can use for this. They encrypt the op code for the request under the master key and embed it in the DNS lookup name. So if a key is requested from a server a.b.edu, the request is sent to E(master key, request code).a.b.edu. The response contains the encoded new key. Requests are thus cacheable.
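
A rough sketch of what building such a key-request name might look like (the cipher, padding, and label encoding here are my placeholders, not the paper's actual format; requires the cryptography package):

    import base64
    from cryptography.hazmat.backends import default_backend
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    def key_request_name(master_key: bytes, request_code: bytes, server: str = "a.b.edu") -> str:
        # Pad the request code to one AES block and encrypt it under the master key.
        block = request_code.ljust(16, b"\x00")
        enc = Cipher(algorithms.AES(master_key), modes.ECB(), default_backend()).encryptor()
        blob = enc.update(block) + enc.finalize()
        # Encode into a DNS-safe label; the DNS response carrying the new
        # (encrypted) capability key can then be cached by ordinary resolvers.
        label = base64.b32encode(blob).decode().rstrip("=").lower()
        return label + "." + server

    # e.g. key_request_name(b"0123456789abcdef", b"renew")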

They also have revocation lists that are pulled from key servers into the routers.

They describe two ways to do delegation on prefixes: one by XORing, and the other by recursive MACing. The first is susceptible to collusion; the second might be too slow (recursive MACing is what we do in ICING).
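
A sketch of my reading of the recursive-MAC variant (not the paper's exact construction): the holder of a key for a prefix can mint a key for any more-specific prefix by extending the MAC chain, and the router can re-derive that chain from the root key alone.

    import hmac, hashlib

    def mac(key: bytes, msg: bytes) -> bytes:
        return hmac.new(key, msg, hashlib.sha1).digest()

    def prefix_key(start_key: bytes, prefix_bits: str) -> bytes:
        # Derive a key by MACing one prefix bit at a time.
        k = start_key
        for bit in prefix_bits:
            k = mac(k, bit.encode())
        return k

    root = b"isp-root-key"
    k16 = prefix_key(root, "0000101000000001")   # key for a /16, given to a customer

    # The customer delegates a /24 under it without contacting the ISP:
    k24 = prefix_key(k16, "00000010")

    # The router, holding only the root key, re-derives the same /24 key:
    assert k24 == prefix_key(root, "0000101000000001" + "00000010")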

They evaluate their system using kernel modules, and it is a little slower than IP forwarding. I did not understand their discussion of clustered waypoints and their proposed solution for traffic engineering, so I won’t say anything about it. I will say that the way they choose their nonces is not very nice.

They punt on issues like accounting, saying it is possible to do some distributed per-packet counting and rate-limiting (details unknown). Their scalability section is a little hand-wavy since they have no idea what it would really take to build it in hardware. Replay attacks are also hand-waved away by saying they will use Bloom filters (but they don’t mention costs or how they would do that in hardware, or what they would hash…)

The Good:

· This is good seminal work on how to verify policy-compliant packets.

· Uses some clever tricks here and there (expiration, double MAC, delegation, key lookups)

· System seems to perform well enough in software

· They present the system and crypto nicely (who has what keys and how they derive them for example)

The Bad:

· They concentrate only on recognizing whom to bill. In reality, policies are probably more complex for security reasons. For example, don’t let AT&T send packets through Sprint even if another customer is paying for it. Or only use trusted ISPs.

· Much has been hand-waved. The paper doesn’t explicitly specify (or maybe I missed it) what has been implemented.

· Revocation lists and polling from key-servers might be a problem.

· Once a capability is in a packet, it is compliant. There is no check whether or not the packet actually follows the path.


Thoughts on HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads

Authors: Azza Abouzeid, Kamil Bajda-Pawlikowski, Daniel Abadi, Avi Silberschatz (Yale University), Alexander Rasin (Brown University)

Venue: VLDB 2009

Paper: http://db.cs.yale.edu/hadoopdb/hadoopdb.pdf

Summary:

The paper starts by saying that DBs are becoming increasingly important and data is growing at a huge rate. Unfortunately, the usual parallel DBMSs do not scale well because they cannot handle failures very well. On the other hand, MapReduce handles failures and stragglers (heterogeneous systems) well and scales well, but does not perform as well as DBMSs.

The paper describes a system that has roughly three layers. The top layer is a SQL layer which accepts SQL queries from users. The SQL queries are translated into MapReduce jobs. The MapReduce jobs are executed in the middle layer, Hadoop. The job is divided into several tasks. Each task itself is a smaller SQL query that is executed on a single node. Each node independently runs the third layer, a single-node DBMS (PostgreSQL in this case). It uses Hadoop for communication, Hive for the translation, and PostgreSQL for the DB.

HadoopDB consists of the following main components:

1. The database connector: provides an interface to independent databases so that worker nodes (TaskTrackers) can get results of SQL queries in MapReduce format (i.e. key-value pairs); see the sketch after this list.

2. Catalog: maintains meta-information about the third-layer DBs---connection parameters (e.g. DB location, driver class, credentials) and metadata (data sets, partitions, replica locations, …)

3. Data Loader: Partitions data and stores it in the local DBs.

4. SQL to MapReduce to SQL (SMS) Planner: extends Hive, which translates SQL queries into MapReduce (Hadoop) jobs, and optimizes the resulting plan to push as much of the work as possible onto the local DB instances in the worker nodes, since the single-node DBMSs can do much better than MapReduce.
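
To make the connector's role concrete, here is a toy sketch of a single task pushing its SQL fragment into the local DB and emitting key-value pairs back to Hadoop (sqlite3 stands in for PostgreSQL, and the query and file names are invented):

    import sqlite3

    def run_task(db_path: str, sql_fragment: str):
        # One TaskTracker's work: run the per-node SQL fragment against the
        # local database and yield (key, value) pairs for the MapReduce job.
        conn = sqlite3.connect(db_path)
        try:
            for row in conn.execute(sql_fragment):
                # First column becomes the key, the rest the value, mirroring
                # how the connector exposes results in MapReduce format.
                yield row[0], row[1:]
        finally:
            conn.close()

    # e.g. a pushed-down partial aggregation:
    # run_task("/data/partition-07.db",
    #          "SELECT sourceIP, SUM(adRevenue) FROM uservisits GROUP BY sourceIP")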

In the eval section, they compare Vertica (a column store DBMS), DBMS-X, HadoopDB, and Hadoop. They tested the systems without failures and with failures. I didn’t quite understand all the differences in workloads, but I did notice that Vertica was much faster than any of the other systems. The authors claim that HadoopDB performs almost as well as the other systems they tested and almost always better than Hadoop. They say that the performance differences are mainly because of the lowest PostgreSQL layer. This seems plausible, but it would have been a much more compelling eval if they tested comparable systems (for example, using compression in PostgreSQL and/or a column store).

When faults and slow nodes were introduced, Vertica performed much worse: a 50% performance hit with failures and 170% with slow nodes. HadoopDB, on the other hand, was only 30% slower. They then say that this slowdown is due to a minor bug that can easily be fixed, which would bring their performance in line with Hadoop’s (which currently handles slow nodes better than HadoopDB does).

The Good:

The paper is interesting and reads well. The system itself is also very smart, and it’s a wonder no one has built it before. I would really like to see what ends up happening with it and how it ends up performing in the real world.

The Bad:

It would have been much more insightful if they used the same DB bottom layer in the eval, or at least evaluated different ones. It would have been compelling to see how Hadoop can parallelize a DBMS. It would have been nice too if they could have put Vertica there to show how HadoopDB can fix the issues with failures. Also, I didn’t think the queries they used were that complex, but I don’t know whether this is of any importance.

Friday, August 7, 2009

Thoughts on Fabric: A Platform for Secure Distributed Computation and Storage

Authors: Jed Liu, Michael D. George, K. Vikram, Xin Qi, Lucas Waye, Andrew C. Myers (Cornell University)
Venue: SOSP 2009

Summary:
The paper describes Fabric: a language that is mainly an extension of Jif into the distributed computing world. The paper starts with motivation, citing patient record sharing as an example of where it is necessary to share data between two domains such that each domain's security policies are enforced.

In Fabric, there are three types of nodes:
  1. Stores: store objects persistently
  2. Workers: perform computation
  3. Dissemination nodes: act as CDNs for stored objects
Naming: Fabric uses an object-oriented DB model. Fabric objects store information, are mutable, and have an object id or oid. The oid is mainly the concatenation of the location where the object is stored (as a DNS name) and a 64-bit number. The oid is permanent. If the object moves, a pointer in the old location will point to the new location.
Certificates are the roots of trust and are used to establish SSL connections.

Labels: Objects have labels describing their integrity and secrecy requirements.

Classes: Class objects create an unforgeable binding between each object and the correct code for implementing that object.

Versions: Fabric objects are mutable. Each has a version number that is incremented when a transaction updates the object. The version number is used to check whether a transaction is operating on a stale or the up-to-date copy of the data.

Threat Model: Timing and termination channels are ignored. It is assumed that the network will deliver messages eventually. It is not clear if this assumption is a strong one. Do they need the network to never fail or else everything crumbles down? Or do they only assume that the network works mostly as it does today (it might be out for a few hours, maybe)?

Another assumption that is used in the rest of the paper is that a program knows a priori the nodes it will use for computation and dissemination, or that they will be input into Fabric somehow. Similarly for storage nodes.

Storage Nodes: Store objects. Related objects are grouped together and returned together when requested, for efficiency. They track delegation, or as they call it, "trust relationships".

Worker Nodes: Execute Fabric programs.
  • Trusted programs can incorporate other intermediate-language programs that don't have security constructs.
  • Objects modified only inside transactions
  • Worker nodes can either request remote data (data shipping) or do remote invocations (function shipping)
  • Remote invocations don't need the object to already exist at the remote worker node
  • The entire method call is executed in a subtransaction whose effects are only visible when the top-level transaction commits. The whole transaction tree is then committed.
  • The caller side of the remote invocation checks its security statically. The callee checks it at run time since it doesn't know who is going to invoke it, i.e. the language has to explicitly check trust: e.g. if a trusts b: then a.invoke(...)
Dissemination Nodes:
  • Dissemination layer is not prescribed, and currently uses FreePastry
  • Object on dissemination nodes encrypted
  • Key stored on object's store
  • Store can verify security and dole out the key when safe. **They say workers need not fetch keys often, but not clear why...p5
  • Stores and dissemination nodes track readers.
  • When object updated, updates are pushed to tracked worker nodes.
  • Dissemination nodes can learn info from request pattern, so they are labeled with some label representing "the maximum confidentiality of information that may leak on this channel". Workers only do fetch requests if this max is not violated. **I do not understand this whole paragraph. How do they quantify the amount of information leaked on this covert channel and turn it into a label?
The Fabric Language: They extend JIF to support
  1. Transactions and consistency
  2. Remote method calls
The interesting thing is they do both of these without leaking information and so that remote calls are authorized. I think this should probably say "where information leakage is explicit" because there will always be some undesired information leakage.

Then they go on to say: "Fabric adds a new trust ordering on information flow labels". This I found confusing and intriguing at the same time. Eventually, I will be disappointed...

In Fabric, everything is in terms of principals.
  • There are no partial privileges or dynamic labels for principals. A principal always executes with its full privileges.
  • They add an acts-for relationship which is basically just delegation of all privileges, and they call this "trust". If a acts for b then b trusts a, or b has delegated all its privileges to a.
  • There's a top principal (I'll call it T, but they use a symbol I'm too lazy to look up) that acts for all other principals.
  • There's a bottom principal (B) that all principals act for.
  • Fabric nodes are also principals, and so also follow the acts-for model.
  • Principals implement their own authentication. The description of how trust is bootstrapped in the system is too confusing. They must have cut too much here. My imagination fails me. **They need to describe bootstrapping more. How does a user delegate trust to a node? How is the authentication turned into a principal?
  • Information flow security policies are expressed in terms of principals. They claim this integrates access control and information flow, though I don't know what that means. Information flow security policies are access control policies!
  • Labels in Fabric look like: {alice -> bob}, meaning alice, the owner of this security policy, allows bob to read from it. {alice <- bob} means alice allows bob to write to it. Since the owner is explicit (alice), only code running with alice's authority (a worker node that acts for alice) can declassify or endorse data with alice's labels.
  • {alice->bob} can flow to {charlie->dora} only if charlie acts for alice and dora acts for bob or dora acts for alice.
Trust Ordering: As well as being ordered by information flow using the flows-to relationship, labels can be ordered by trust (the acts-for relationship). The higher a label's trust requirement, the more a node needs to be trusted. For example, {a->b} actsfor {c->d} iff a actsfor c and b actsfor (d or c); that is, one label can be used in place of the other in that case. This can be used to establish trust in nodes: for an object x with label L_x to be stored on a node n, the node's label L_n={T->n; T<-n} must act for L_x.
e.g. if L_x is {p->q} then either n actsfor p or n actsfor q. In either case, n must be trusted with the contents of x.
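
To keep the two orderings straight, here is a small sketch of the checks as I read them (principals as strings, confidentiality policies only):

    TOP, BOTTOM = "T", "B"
    delegations = {}   # delegations[a] = set of principals that a acts for

    def acts_for(a, b):
        # Reflexive; T acts for everyone, everyone acts for B; plus the
        # transitive closure of the delegation ("trust") relation.
        if a == b or a == TOP or b == BOTTOM:
            return True
        seen, stack = set(), [a]
        while stack:
            p = stack.pop()
            for q in delegations.get(p, set()):
                if q == b:
                    return True
                if q not in seen:
                    seen.add(q)
                    stack.append(q)
        return False

    def flows_to(l1, l2):
        # {o1->r1} may flow to {o2->r2} iff o2 acts for o1
        # and (r2 acts for r1 or r2 acts for o1).
        (o1, r1), (o2, r2) = l1, l2
        return acts_for(o2, o1) and (acts_for(r2, r1) or acts_for(r2, o1))

    def trust_geq(l1, l2):
        # Trust ordering: {o1->r1} acts for {o2->r2} iff o1 acts for o2
        # and r1 acts for (r2 or o2).
        (o1, r1), (o2, r2) = l1, l2
        return acts_for(o1, o2) and (acts_for(r1, r2) or acts_for(r1, o2))

    # A node n can be entrusted with an object labeled l if {T->n} acts for l:
    def node_can_store(n, l):
        return trust_geq((TOP, n), l)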

They mention that the reason this trust ordering is needed is because "Fabric classes may be parametrized with respect to different labels or principals, and so instances of the same class may have different labels. This allows implementation of reusable classes..." However, it is not clear to me that anything other than good ol' delegation is necessary. Of course you have to delegate to a node if that node is to enforce your labels! That is why in D-Star exporters own the categories in the labels they enforce. Hence, the disappointment I alluded to earlier.

As in Jif, implicit flows are tracked with the program counter's label.

For remote invocations, they say that the callee checks at run-time that the caller has the authority to invoke something with another principal's endorsements, but they do not mention how this check is done. Are there certs as in D-Star? **Need to mention how delegation is proved across nodes.

Transactions: Fabric provides transactions that can be distributed. When distributed, they are turned into a transaction tree. The whole tree is committed when the top level worker node starts the commit. It does so by recursively calling the commits of the whole tree (note recursively not iteratively). Most of the things they describe about transactions are how you would expect them to work.
A failure of a node to commit a transaction correctly is as serious as if that node violated trust by not updating the object's fields correctly.
If a transaction coordinator fails midway through a transaction, some changes might be committed while others are not. This can break consistency, but only for objects whose integrity the failing node was entrusted with.

Writer Maps: Used to allow workers to check whether an object has been updated without revealing information to workers who shouldn't learn about the updates. The writer map is a log of writes. Every time an object is updated, an entry hash(oid, xid, key)->E(key, w) is stored, where w is the writer. It allows nodes that know the key to get the latest update and learn who the writer for that update is. But it is not clear how the nodes know what the last transaction id is; it is probably stored in the clear. **Nodes also still know that an object has been updated, since the writer map changes, even if they are not allowed to know what the exact updates are.
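
A toy sketch of what a writer-map entry might look like (the hash and the "encryption" here are my stand-ins, not the paper's primitives):

    import hashlib

    def _keystream(key: bytes, nonce: bytes, n: int) -> bytes:
        out, counter = b"", 0
        while len(out) < n:
            out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
            counter += 1
        return out[:n]

    def writer_map_entry(oid: bytes, xid: bytes, key: bytes, writer: bytes):
        # Only holders of key can compute the lookup hash for (oid, xid)
        # and decrypt the writer's identity; everyone else just sees that
        # the map has grown.
        lookup = hashlib.sha256(oid + xid + key).digest()
        ct = bytes(a ^ b for a, b in zip(writer, _keystream(key, oid + xid, len(writer))))
        return lookup, ct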

Implementation and Eval: The implementation was 33k lines of code on top of Jif and other libs. They use BDB for the store and FreePastry for dissemination.
A few things are left unimplemented, most notably the distributed deadlock detection and timeouts for divergent computations.

For the eval they implemented a course management system and tested it with 55 students. Complex queries were replaced with object-oriented code and this simplified the implementation from 3100 lines of code to 740. However, they only migrated part of the full system. They also only ported it to FabIL, the intermediate language used by Fabric that has no security properties, only consistency. Performance shows that FabIL is faster than all other implementations with consistency. Transaction management and consistency account for about 50% of the performance. **The metric doesn't really show much since only parts of the implementation were done. It is not clear why FabIL is faster. Maybe I didn't understand what FabIL does. Is the implementation on a set of nodes? How does BDB run faster than Oracle? Still, it says nothing about security and performance with full Fabric used.

They also implemented a calendar. They didn't get performance numbers. Is it because the calendar didn't fully work?

They used the OO7 benchmark on Fabric nodes and found that Fabric is 10 times slower than non-persistent Java. They say this is acceptable, but that is not clear.

In summary, the paper is a thorough design of a language for distributed computation and storage when DIFC is involved. But like almost every other paper on the subject, it has left me with more questions than answers.

The Likes:
  • The paper is thorough in the design description
  • The paper is probably the first to combine DIFC with transactions to provide consistency
  • They combine language constructs with run-time checks. It's cool that trust requirements are in the language.
  • They do so much stuff!
The Dislikes:
  • Ah so many questions! See the text (**). I'm too lazy to put them here.
  • Why the trust ordering on labels? Why is delegation not enough? I am not convinced that it is necessary.
  • They have said almost nothing about how runtime checks are implemented. Do they use certs as in DStar to prove delegation?
  • That bootstrapping paragraph was completely useless. It left me more confused than informed. Probably because they did not explain how runtime enforcement is done and how authentication is turned into labels or proofs of labels.
  • The evaluation leaves so much to be desired. There is no full implementation of any system that uses Fabric and no real performance evaluation. In all, that was a disappointing eval section.
  • What is the deal with the dissemination nodes? Yup, they are cool, but where is the explanation of how they help? Are they basically CDNs? How are they used? Why not just replication?
  • I don't like the fact that principals always use their full privileges to access everything. What if a principal wants to operate in an untrusted environment, or doesn't want a node to be responsible for the integrity of so many objects?
  • The paper does not say much about the "control plane": Where do the names of nodes come from? How does a user select nodes? How do programs select/find nodes? How do a user's credentials translate across multiple nodes? I guess all this is orthogonal, but it would've helped me understand Fabric's usage model, which was quite obscure (as in many other DIFC papers).
  • I think my main issue is why is the system not layered? Why aren't transactions built on top of a minimal base whose properties are easy to reason about? Why are dissemination nodes a primary object instead of being implemented on top of worker and storage nodes, i.e. on top of Fabric? Is it convenience? What happens when things need to change? What about the modularity argument?

Sunday, August 2, 2009

Thoughts on Provable Data Possession at Untrusted Stores

Authors: Giuseppe Ateniese, Randal Burns, Reza Curtmola, Joseph Herring, Zachary Peterson (Johns Hopkins University), Lea Kissner (Google), Dawn Song (UC Berkeley / Carnegie Mellon University)

Venue: CCS 2007

Paper: http://www.cs.berkeley.edu/~dawnsong/papers/p598-ateniese

Summary:
The authors describe a way to probabilistically determine whether a file stored at an SSP (storage service provider) exists and has not been modified. I don't quite understand the crypto, but the results are:
  • Fast checks. They are I/O bound, not CPU bound
  • The size of the file stored at the server increases
  • The client needs to store a constant small amount of data per file
  • They use sampling to reduce the load on the server.
  • Uploads are slow because preprocessing is expensive, on the order of 160 kB/s
This work is almost identical to work by Ari Juels (RSA Labs) and Burton Kaliski (EMC Corp), "PORs: Proofs of Retrievability for Large Files". However, in PORs, a client has to decide the number of checks a priori because the system works by encoding a constant number of sentinel blocks into the file; these are queried, and hence consumed, during challenges.

The main difference between PORs and PDPs is: PORs prove to a client that the entire file can be retrieved unmodified. PDPs do not make this claim. They only claim the file blocks tested are uncorrupted.

Thoughts on POTSHARDS: Secure Long-Term Storage Without Encryption

Authors: Mark W. Storer, Kevin M. Greenan, Ethan L. Miller (University of California, Santa Cruz), Kaladhar Voruganti (Network Appliance)
Venue: Usenix 2007
Paper: http://www.usenix.org/events/usenix07/tech/full_papers/storer/storer.pdf

Summary:
The paper describes a confidential storage system over untrusted storage service providers (SSPs) without the use of encryption by using secret splitting. The system is also reliable using RAID techniques and approximate pointers to recover a user's data. It is also resilient to insider attacks at an SSP where the insider might attempt to recover stored data.
For long-term archival use:
  • throughput more important than latency.
  • encryption is a pain because of:
    1- lost/compromised keys
    2- compromised cryptosystems
    3- key management difficult because user who stored data may be unavailable
System designed around three tenets:
  1. No encryption (cryptosystems break over time)
  2. Fulfilling requests needs no information external to the archives, because that data will be lost in time. This is the reason archives are used to begin with.
  3. Individuals are more likely to be malicious than aggregates, so the system builds on SSP cooperation.
The system is also built around an implicit tenet (later made explicit in the paper) that each SSP monitors its security and is distrustful of other SSPs.

Each piece of data is split into an ordered set of shards, a shard-tuple that can be used to reconstruct the data when put together. Without all the shards, the data cannot be reconstructed and no information about the data is revealed. Some redundancy is added at the shard layer, so that not all shards are needed, to improve short-term reliability.

With each shard, an approximate pointer is stored. Approximate pointers point to a region of the namespace where other shards are stored. An exact pointer would allow an attacker to easily reconstruct the data because he would know where each shard is stored. If no pointers are used, then an external index would have to be maintained or there would be no way of knowing which shards go together. With approximate pointers, the attacker has to request on average half the region to get the right shard, which is highly conspicuous.
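
A toy illustration of the two ideas, XOR-based n-of-n secret splitting and an approximate pointer that names only a region of the namespace (the paper's actual split and pointer schemes are more elaborate):

    import os

    def split(secret: bytes, n: int):
        # n-of-n XOR split: all n shards are required; fewer reveal nothing.
        shards = [os.urandom(len(secret)) for _ in range(n - 1)]
        last = secret
        for s in shards:
            last = bytes(a ^ b for a, b in zip(last, s))
        return shards + [last]

    def combine(shards):
        out = bytes(len(shards[0]))
        for s in shards:
            out = bytes(a ^ b for a, b in zip(out, s))
        return out

    def approximate_pointer(true_index: int, region_size: int = 1 << 20) -> int:
        # Name only the region holding the next shard, so an attacker must
        # fetch about half the region on average -- a conspicuously large request.
        return true_index - (true_index % region_size)

    data = b"patient record"
    assert combine(split(data, 3)) == data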

POTSHARDS uses RAID techniques + algebraic signatures for redundancy and verification.

Storage proceeds as follows:
  1. Data preprocessed into objects.
  2. Objects secret-split into fragments for secrecy.
  3. Fragments secret-split into shards for availability. Approximate pointers added here.
  4. Shards are placed across SSPs so that no single SSP can get all shards to make data.
SSPs coordinate between themselves into RAID groups: **Huh?
  • "To deal with the threat of data loss from these events, POTSHARDS utilizes distributed RAID techniques. This is accomplished by dividing each archive into fixed-sized blocks and requiring all archives to agree on distributed, RAID-based methods over these blocks."
For parity updates on placements, a shard stored in a particular block is sent to other archives to update the parity for that block. **What? So archives can get shards that are meant for other archives? I thought the archives are supposed to be mutually distrustful. Nope. They say that "it is assumed that all of the archives are trusted". This seems to me like a contradiction.

For recovery, the SSPs again coordinate and choose one candidate to store the failed data. **Come on, this is stretching too far. How can they choose that if they are mutually distrustful?

Archives also do integrity checking of other archives on behalf of a customer using algebraic signatures. **These guys are doing too much work.

SSPs store exact pointers from users to shards. And shards store approximate pointers to each other. A user can get all her shards, then use the approximate pointers to find chains and try to reconstitute the fragments. An attacker would have to recover all shards in each block pointed to by approximate pointers from each archive. **If an attacker compromises an SSP, he can get the exact index of the shards! Why try to get all the shards and not the index?

The eval section was hand-wavy. I couldn't really understand what the numbers they found meant. How do they relate to the real world? Some x-axes had no labels. PlanetLab vs. local cluster had too much discrepancy. There was no discussion of overhead costs.

In summary, this paper had some good ideas. The motivation is solid and the architecture seems to be built mostly on solid principles. The design they came up with, though, is questionable. There seem to be many weaknesses. For example, they assumed an attacker would brute-force requests of data without getting metadata from an archive. They also assumed that SSPs would be able to do all this great amount of coordination among each other. It was also assumed at one point that the SSPs should "question each other's security", i.e. be mutually untrusted, but then they say that the archives are trusted because they send each other shards while doing parity updates. This work needs significant rework and careful re-evaluation. The explanation of how approximate pointers are used by a user is unintelligible (to me at least).

Thoughts on SafeStore: A Durable and Practical Storage System

Authors: Ramakrishna Kotla, Lorenzo Alvisi, and Mike Dahlin (UT Austin)
Venue: Usenix 2007
Summary:
The authors describe a system---SafeStore---that drastically increases the durability of data stored at storage service providers (SSPs). The system relies on the following points:
  • Use hierarchical erasure coding within and across multiple SSPs
  • SSPs should provide an interface that exposes the redundancy levels they support internally
  • Use a heuristic to decide how/where to store data and with what redundancy levels
  • Use auditing to ensure data is stored correctly and is available.
Auditing works as follows (a small sketch follows this list):
  • When data owner stores data, it gets a signed receipt with object ID and hash.
  • Data owner encodes and stores receipt across SSPs. **Does the receipt need a receipt?
  • Routine audit:
    - Auditor sends salt for particular ID
    - SSP returns signed message with [obj ID, time, H(salt||data)]
    - If SSP honest, and finds that data is corrupted, returns error
    - If SSP dishonest, forced to return bogus H(salt||data) and now we have a crypto proof
  • Spot Check:
    - Auditor verifies some percentage of the responses
    - It does this by retrieving data from owners' cache, SSP, or other SSPs (**Why retrieve whole data? Isn't hash sufficient? i.e. get SSP data, get receipt hash and compare. Several options...)
    - Proof of Misbehavior (POM) can be produced if hash fails.
  • Cost:
    "our audit protocol improves durability of our system by two 9’s over a system with no audit at an additional audit cost of just 20%"
  • All local state except the encryption keys and the list of SSPs used is soft state.
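
A small sketch of the routine audit and spot check described above (HMAC stands in for the SSP's signature, and all names are invented):

    import hashlib, hmac, time

    def ssp_audit_response(ssp_key: bytes, obj_id: str, data: bytes, salt: bytes):
        # Honest SSP: return a signed (obj ID, time, H(salt || data)) message.
        h = hashlib.sha256(salt + data).hexdigest()
        msg = "{}|{}|{}".format(obj_id, int(time.time()), h).encode()
        sig = hmac.new(ssp_key, msg, hashlib.sha256).hexdigest()
        return msg, sig

    def auditor_spot_check(msg: bytes, sig: str, ssp_key: bytes,
                           salt: bytes, reference_data: bytes) -> bool:
        # Spot check: recompute H(salt || data) from an independent copy of
        # the data (owner cache or another SSP). A valid signature over a
        # wrong hash is a proof of misbehavior (POM).
        ok_sig = hmac.compare_digest(
            hmac.new(ssp_key, msg, hashlib.sha256).hexdigest(), sig)
        claimed_hash = msg.decode().split("|")[2]
        return ok_sig and claimed_hash == hashlib.sha256(salt + reference_data).hexdigest()
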
Evaluation:
  • Performance vs NFS: 13% worse
  • Adding snapshots makes performance ~40% worse
  • Over the WAN with large delays, moderate drop in performance: 5%
  • SSFS versioning makes replication cheaper with less space overhead.

Saturday, August 1, 2009

Thoughts on Towards Automated Provisioning of Secure Virtualized Networks

Authors: Serdar Cabuk, Chris I. Dalton, HariGovind Ramasamy, Matthias Schunter

Venue: CCS 2007

Paper: http://www.hpl.hp.com/techreports/2007/HPL-2007-139.pdf

Summary:
The work describes a system in which VMs distributed across multiple hosts are isolated within their own Trusted Virtual Domain (TVD). The TVDs are isolated from each other with confidentiality and integrity guarantees. TVDs are implemented using combinations of VLANs, VPNs, and custom vSwitch software that is installed on the virtualized hosts. TVDs enforce which VMs are allowed to be members using certificates or other attributes. Inter-TVD communication is regulated by a traffic matrix whose elements can be 0 = no communication, 1 = allow any communication, or P = a firewall policy that filters communication.
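
A tiny sketch of how such a traffic matrix might be represented and consulted (TVD names and the example policy are made up):

    # matrix[src][dst] is 0 (no communication), 1 (allow anything),
    # or a predicate P acting as a firewall policy on the packet.
    matrix = {
        "red":   {"red": 1, "blue": 0, "green": lambda pkt: pkt.get("dport") == 443},
        "blue":  {"red": 0, "blue": 1, "green": 0},
        "green": {"red": 0, "blue": 0, "green": 1},
    }

    def allowed(src_tvd: str, dst_tvd: str, pkt: dict) -> bool:
        rule = matrix[src_tvd][dst_tvd]
        if callable(rule):
            return rule(pkt)      # P: filter inter-TVD traffic
        return bool(rule)         # 0 or 1

    # e.g. allowed("red", "green", {"dport": 443}) -> True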

To communicate between hosts, Ethernet-in-IP encapsulation is used. The mapping is done by the vSwitch component which implements a learning switch. For routing, they use dedicated VMs. To connect a non-virtualized host, they use a gateway VM proxy that has two vNICs.
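
A sketch of the mapping the vSwitch presumably maintains for the encapsulation: a plain learning-switch table from guest MAC address to the IP of the physical host it should tunnel to (field names are mine):

    class VSwitchTable:
        def __init__(self):
            self.mac_to_host = {}   # guest MAC -> physical host IP

        def learn(self, src_mac: str, src_host_ip: str):
            # Learn the mapping from every encapsulated frame we receive.
            self.mac_to_host[src_mac] = src_host_ip

        def lookup(self, dst_mac: str):
            # Known MAC -> unicast Ethernet-in-IP tunnel to that host;
            # unknown or broadcast -> flood to all hosts in the TVD (None here).
            return self.mac_to_host.get(dst_mac)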

Each TVD has two VLANs. One that is used for internal communication and is unrestricted, and the other is used for external or inter-TVD communication and passes through a firewall that implements the policies.

They proceed to describe the steps for creating and establishing a TVD and for a VM to join the TVD. They describe how to deploy TVDs in a network and what capabilities of infrastructure should be looked for.

They implemented the system using Xen, and their system has a surprisingly low overhead.

Is this better than Ethane? Well, it is better suited for multiple administrative domains since each TVD can be administered separately. Inter-TVD communication is partially administered locally. They don't need specialized hardware. In a way, though, having OpenFlow-enabled switches would have simplified much of their work and would probably have allowed everything to be implemented without any end-host modification.

The Good:
  • Really cool work. They managed to use a bunch of ad-hoc mechanisms already available in current hardware to build their system.
  • Each TVD is centrally managed and all policies are maintained in one TVD master
  • The capabilities of the underlying infrastructure are fed to the TVD master, which then determines the missing capabilities that need to be instantiated in software. How cool is that!
  • They can connect a number of different platforms along with virtualized and unvirtualized machines
The Bad:
  • Using firewall rules is a very heavy compromise. They cannot stop malicious VMs from communicating. A more thorough approach, such as looking into the VMs' memory or using trusted computing to verify guests, is necessary.
  • The eval section was a little too hand-wavy.