Sunday, August 2, 2009

Thoughts on POTSHARDS: Secure Long-Term Storage Without Encryption

Authors: Mark W. Storer, Kevin M. Greenan, Ethan L. Miller (University of California, Santa Cruz), Kaladhar Voruganti (Network Appliance)
Venue: USENIX Annual Technical Conference 2007
Paper: http://www.usenix.org/events/usenix07/tech/full_papers/storer/storer.pdf

Summary:
The paper describes a confidential storage system built over untrusted storage service providers (SSPs) that avoids encryption by using secret splitting. The system also achieves reliability through RAID techniques, and uses approximate pointers to let a user recover her data. It is also resilient to insider attacks at an SSP, where an insider might attempt to recover stored data.
For long-term archival use:
  • throughput is more important than latency.
  • encryption is a pain because of:
    1- lost/compromised keys
    2- compromised cryptosystems
    3- difficult key management, since the user who stored the data may be unavailable
System designed around three tenets:
  1. No encryption (cryptosystems break after time)
  2. Fulfilling requests needs no information external to the archives, because such external data will be lost over time. That is the reason archives are used to begin with.
  3. Individuals are more likely to be malicious than aggregates, so the system builds on SSP cooperation.
The system is also built around an implicit tenet (later made explicit in the paper) that each SSP monitors its security and is distrustful of other SSPs.

Each piece of data is split into an ordered set of shards, a shard-tuple that can be used to reconstruct the data when put together. Without all the shards, the data cannot be reconstructed and no information about the data is revealed. Some redundancy is added at the shard layer, so that not all shards are needed, to improve short-term reliability.
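The paper does not give code, but the core idea can be sketched with a minimal XOR-based n-of-n secret split (all shards required, no redundancy; POTSHARDS itself uses richer splitting schemes, and the function names here are my own):

```python
import os

def split_secret(data: bytes, n: int) -> list[bytes]:
    """Split data into n random-looking shards; all n are required.
    Hypothetical sketch, not the paper's actual construction."""
    shards = [os.urandom(len(data)) for _ in range(n - 1)]
    last = data
    for s in shards:  # last = data XOR s1 XOR ... XOR s_{n-1}
        last = bytes(a ^ b for a, b in zip(last, s))
    shards.append(last)
    return shards

def combine_shards(shards: list[bytes]) -> bytes:
    """XOR all shards back together to recover the data."""
    out = bytes(len(shards[0]))
    for s in shards:
        out = bytes(a ^ b for a, b in zip(out, s))
    return out
```

Any subset of fewer than n shards is statistically independent of the data, which matches the claim that missing shards reveal nothing.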

With each shard, an approximate pointer is stored. Approximate pointers point to a region of the namespace where other shards are stored. An exact pointer would allow an attacker to easily reconstruct the data because he would know where each shard is stored. If no pointers are used, then an external index would have to be maintained or there would be no way of knowing which shards go together. With approximate pointers, the attacker has to request on average half the region to get the right shard, which is highly conspicuous.
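To make this concrete, here is a hypothetical sketch (the region size and names are my assumptions, not the paper's) of generating an approximate pointer and why the attacker's expected work is half a region:

```python
import secrets

REGION_BITS = 16  # hypothetical: a region spans 2**16 shard names

def approximate_pointer(exact_name: int) -> int:
    """Perturb the exact shard name uniformly within its region, so the
    pointer reveals only the region, not the shard itself."""
    region_base = exact_name - (exact_name % (1 << REGION_BITS))
    return region_base + secrets.randbelow(1 << REGION_BITS)

def region_of(name: int) -> int:
    return name >> REGION_BITS

# An attacker following the pointer must fetch, on average, half of the
# 2**REGION_BITS names in the region before hitting the real shard.
expected_attacker_requests = (1 << REGION_BITS) // 2
```

A legitimate user who already holds her shards only needs the region to group them, while the attacker's bulk requests stand out.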

POTSHARDS uses RAID techniques + algebraic signatures for redundancy and verification.
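As a quick illustration of the RAID side (the algebraic-signature math is beyond a sketch), single-parity recovery over equal-sized blocks works like this:

```python
def parity(blocks: list[bytes]) -> bytes:
    """RAID-5-style parity: byte-wise XOR of equal-sized blocks.
    Losing any one block is survivable: XOR the parity with the
    surviving blocks to rebuild the missing one."""
    out = bytes(len(blocks[0]))
    for b in blocks:
        out = bytes(x ^ y for x, y in zip(out, b))
    return out
```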

Storage proceeds as follows:
  1. Data preprocessed into objects.
  2. Objects secret-split into fragments for secrecy.
  3. Fragments secret-split into shards for availability. Approximate pointers added here.
  4. Shards are placed across SSPs so that no single SSP can get all shards to make data.
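Step 4 above can be sketched as a simple round-robin placement (my own illustration, not the paper's placement algorithm):

```python
def place(shard_tuples: list[list[str]], ssps: list[str]) -> dict[str, list[str]]:
    """Round-robin placement: shard i of each tuple goes to SSP i, so no
    single SSP ever holds a complete tuple (needs len(ssps) >= tuple size)."""
    placement: dict[str, list[str]] = {ssp: [] for ssp in ssps}
    for tup in shard_tuples:
        for i, shard in enumerate(tup):
            placement[ssps[i % len(ssps)]].append(shard)
    return placement
```

With at least as many SSPs as shards per tuple, each SSP receives exactly one shard from any tuple, so compromising one SSP yields nothing reconstructible.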
SSPs coordinate among themselves into RAID groups: **Huh?
  • "To deal with the threat of data loss from these events, POTSHARDS utilizes distributed RAID techniques. This is accomplished by dividing each archive into fixed-sized blocks and requiring all archives to agree on distributed, RAID-based methods over these blocks."
For parity updates on placements, a shard stored in a particular block is sent to other archives to update the parity for that block. **What? So archives can get shards that are meant for other archives? I thought the archives were supposed to be mutually distrustful. Nope. They say that "it is assumed that all of the archives are trusted". This seems to me like a contradiction.

For recovery, the SSPs again coordinate and choose one candidate to store the failed data. **Come on, this is stretching too far. How can they choose that if they are mutually distrustful?

Archives also do integrity checking of other archives on behalf of a customer using algebraic signatures. **These guys are doing too much work.

SSPs store exact pointers from users to shards, and shards store approximate pointers to each other. A user can get all her shards, then use the approximate pointers to find chains and try to reconstitute the fragments. An attacker would have to recover all shards in each block pointed to by approximate pointers from each archive. **If an attacker compromised an SSP, he could get the exact index of the shards! Why try to get all the shards and not the index?
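A sketch of the user-side reconstruction described above (shard names and region size are hypothetical): given the shard names and their approximate pointers, enumerate candidate chains region by region, then test each candidate tuple against the secret-split recombiner, where only the true tuple yields valid data:

```python
REGION_BITS = 16  # hypothetical region size

def candidates(ptr: int, names: list[int]) -> list[int]:
    """Shards whose names fall in the region named by an approximate pointer."""
    return [n for n in names if n >> REGION_BITS == ptr >> REGION_BITS]

def chains(shards: dict[int, int], start: int, length: int) -> list[list[int]]:
    """Enumerate candidate chains of shard names by following approximate
    pointers; `shards` maps a shard name to its approximate pointer."""
    if length == 1:
        return [[start]]
    result = []
    for nxt in candidates(shards[start], list(shards)):
        if nxt != start:
            for tail in chains(shards, nxt, length - 1):
                result.append([start] + tail)
    return result
```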

The eval section was hand-wavy. I couldn't really understand what the numbers they found meant or how they relate to the real world. Some x-axes had no labels, the PlanetLab and local-cluster results had too much discrepancy, and there was no discussion of overhead costs.

In summary, this paper had some good ideas. The motivation is solid, and the architecture seems to be built mostly on solid principles. The design they came up with, though, is questionable; there seem to be many weaknesses. For example, they assumed an attacker would brute-force requests for data without first getting metadata from an archive. They also assumed that SSPs would be able to carry out this great amount of coordination among each other. It was also assumed at one point that the SSPs should "question each other's security", i.e., be mutually untrusted, but then they say that the archives are trusted because they send each other shards while doing parity updates. This work needs significant rework and careful re-evaluation. The explanation of how approximate pointers are used by a user is unintelligible (to me at least).
