Venue: SOSP 2006
Paper: PDF
Summary:
The paper describes, in painful detail, the HiStar operating system and the implementation of two applications on it. HiStar uses distributed information flow control to provide a platform to run untrusted code on sensitive data and provide guarantees for data leakage and integrity.
Motivation:
- Hard to reason about implications of every action by untrusted
code - chroot can break application assumptions
- syscall interposition error-prone
- Easier to specify policy in terms of where to where information can flow
- Unlike other IFC systems, HiStar implements standard OS abstractions on top of lower level building blocks that obey IFC
- Taint tracking mechanism does not leak info itself
There are six kernel object types in HiStar. Of these, threads are the only ones that execute code. Under Distributed Information Flow Control (DIFC), each kernel object has a label. The label describes how it is tainted. An object can be tainted in different sets of categories with different degrees. The degree of taint in a particular category places a restriction on the object. In HiStar, there are 4 levels of taint.
- Cannot be written/modified (lowest taint)
- No restriction
- Cannot be untainted/exported
- Cannot be read/observed (highest taint)
A label looks like {a0, b2,c«, 1}, which means that the object is tainted in category a with degree 0, b with degree 2, c with degree «(i.e. the object owns category c), and in all other categories with degree 1 (i.e. untainted in other categories). The labels on all objects, except for threads, are immutable and have to be specified when the object is created. Objects are created by threads only.
Threads can read restricted objects by raising their label. Threads, however, can only raise their labels to the limit imposed by their clearance label. The clearance label is an additional label on threads, that also places an upper limit on the label a thread can request for a newly created object. A created object’s label must flow to the thread’s label which in turn must flow to the thread’s clearance (bypassing owned categories).
Kernel:
HiStar uses a single-level store. All objects are stored on disk and, on bootup, the entire system state is reloaded from the latest snapshot. All objects are stored in a container hierarchy with quotas.
There are six kernel object types in HiStar: segments, threads, address spaces, containers, gates, and devices.
- Segments: segments are the simplest object type in HiStar. They are basically arrays of bytes and are analogous to other OSes’ files.
- Threads: threads execute code. They can create categories and objects, and change their own label and clearance as long as the new label or clearance can flow to the current clearance ignoring owned categories.
- Containers: All objects are stored in containers that are organized in a hierarchicy rooted at the root container with quotas. Objects are hard-linked into containers, and when they are disconnected from the root, they are garbage-collected. The existence of hard links can leak information, so objects are always specified as a (Container ID, Object ID) pair. To be able to see if a hard link exists, the thread would need to be able to read the container. Quotas on objects do not change so as not to convey information.
- Address Spaces: Each thread has runs in an address space. An address space is a mapping from virtual address -> containers, segments in them and permissions. The address spaces are used to specify code to run and virtual memory. They are created by a thread and used in launching a new thread. They are used when page faults occur to find whether a thread has permission to read some segment and to find where that segment should be mapped.
- Gates: gates “provide protected control transfer, allowing a thread to jump to a pre-defined entry point in another address space with additional privileges”. Basically, they are very similar to RPCs except that the calling thread provides the resources for execution. A thread can create a gate and give it some label (respecting DIFC). Another thread can invoke the gate and request a label and clearance. The requested label and clearance are a combination of the invoking thread’s label and clearance and the gate’s label and clearance. The thread then starts executing in the new address space (i.e. executing some other code or function) using the requested label and clearance. Return values can be stored inside the thread object itself, but there is no implicit return mechanism. The invoking thread needs to create a return gate and label appropriately (the discussion on this is quite interesting).
- Devices: Can be mounted…
The authors implemented a Unix-like environment using the HiStar kernel completely in userspace, and so they all respect IFC. The filesystem is implemented using segments and containers. A directory is a container with a special directory segment that contains a map of names to object IDs and a mutex to synchronize directory operations. To get a consistent view of the directory entries without acquiring the mutex (which needs write permissions), a user atomically reads a generation number and busy flag before and after reading an entry.
Processes in HiStar are containers with two process-specific categories to protect its secrecy and integrity. The process has two containers, a process container and an internal container. The process container exposes objects that are the external interface to the process. The internal container has segments used to implement the file descriptors, stack, heap, and executable code. It also has the thread that executes the code and its address space.
A user has a pair of unique categories that define her read/write privileges. HiStar does not support ACLs.
HiStar also supports IPC using gates, signals, and networking using lwIP. The network device has a label {nr3, nw0, i2, 1} to taint all data read from the network.
Unix leaks information for process exit, quota adjustment, and file creation. For these operations, HiStar uses untainting gates that explicitly allow information to leak.
Applications:
The authors used ClamAV as the motivating example and split it up so that the scanner part cannot export user data.
They also implemented an untrusted user authentication mechanism that cannot leak more than one bit of the user’s password. The discussion for this application is interesting, but too complicated to explain in a summary. The main idea is to split up the login mechanism into a directory and gates and use each gate with different privileges to get the information for authentication, check the password, and get the user’s privileges.
Another application was VPN, where another lwIP stack was run in parallel to the network with a different label so that public Internet data and VPN data do not mix (at least not directly).
Performance:
HiStar runs comparable to Linux and OpenBSD, sometimes fasters, others slower. It is slower in the fork and exec microbenchmark. However, it does allow new consistency primitives such as group sync where no changes are recorded until an application is done execution.
Limitations:
Some useful features are missing from the Unix environment, while the semantics of others are modified. A crash or killed process can leave mutexes locked, and there needs to be support for cleaning those up. There are no CPU quotas and HiStar does not solve timing covert channels.
The Likes:
- The paper was thorough in the description and did not leave questions
unanswered. - DIFC seems very useful, and the OS seems to implement it well.
- It is nice to build a layered system basing the design on a few basic blocks that enforce security and be able to reason about the rest of the system from those.
- The OS has a load of cool and smart things whose composition was carefully designed to build useful libraries.
- It works!
- For non-malicious code, HiStar seems to be able to protect against bugs.
- It’s cool that the taint degrees work out to map into numbers that can be compared using simple integer comparisons.
- The paper was very dense, and takes quite a bit of effort to read. It would’ve been nice if the authors could have focused on the important points and insights instead of just a spewing out of information.
- The userspace environment is not completely compatible with Unix, but ah well.
- The timing channels problem seems to be a major issue
- It looks difficult to build applications using the HiStar primitives.
- What is the killer app for HiStar? What can it be used for given all its limitations? What are all these features good for?
This comment has been removed by a blog administrator.
ReplyDelete