Friday, December 11, 2009

Thoughts on Automatic Patch-Based Exploit Generation is Possible: Techniques and Implications

Authors: David Brumley (CMU), Pongsin Poosankam (CMU), Dawn Song (UC Berkeley), Jiang Zheng (U. Pittsburgh)

Venue: 2008 IEEE Symposium on Security and Privacy

The work describes how to generate exploits for an unpatched binary program P given a patched version P'. This matters because it shows that the patch distribution process itself is security-critical: staggering updates over a long period gives attackers a window in which previously unknown exploits can be generated, in minutes, from the patch alone.

The work, however, only targets input validation checks. For example, if a buffer overflow was fixed by adding a check on the input size or the pointer, the authors' system works. If, on the other hand, it was fixed by increasing the buffer size, the system fails.

The authors generate exploits (user inputs that compromise a program's safety) in four main steps:
  1. "Identify the new sanitization checks added in P'." This is done using off-the-shelf binary differencing tools such as EBDS. EBDS performs syntactic analysis, not semantic analysis: if a check in P is i > 10 while in P' it is i - 1 > 10, the latter may be reported as a new check even when it is not semantically new. The verification step later on weeds out these spurious differences.
  2. For each new check found, generate a candidate exploit that fails the new check in P' by:
    1. Calculating the boolean function F that specifies the weakest constraints on the inputs that would fail the new check. This amounts to deciding which instructions in the program to include in the formula. They use three approaches:
      1. Dynamic: If the constraints on the inputs that exercise the check are already known, they simply take the constraints along the path that leads to the new check and add the constraint that the input must fail it. This applies when the new check is on a common path executed for all inputs. It is the quickest method and generates the easiest constraint functions to solve. In practice, it works by analyzing an execution trace, lifting the executed instructions into the Vine modeling language for easier analysis, then doing the analysis.
      2. Static: (See Creating vulnerability signatures using weakest preconditions) In this approach, the whole program is lifted to the Vine modeling language first. Then the program's control flow graph is analyzed and the program is chopped so that it only includes paths that lead to the new checks. The constraint function is then essentially the whole chopped CFG. This approach is more thorough and can find exploits that the dynamic approach would only reach after much searching, but the generated formulas are usually much larger and harder to solve.
      3. Hybrid: Use the dynamic approach to reach a junction point in the execution closer to the new check, then use the static approach between that point and the new check. This yields smaller constraint functions that are easier to solve, while still covering more paths than the purely dynamic approach.
        They automate the process by iterating through junction points until an exploit is found; their heuristic is to place candidate points at procedure boundaries. While the first two approaches have been used by others before, the authors claim the hybrid approach as a novel contribution.
    2. Solving the constraint function F with an off-the-shelf solver to find the candidate exploit.
  3. Check that the candidate exploit is a real exploit by verifying that it violates the program's "safety policy", where a safety policy is a function mapping the program's state space to a boolean value of safe or unsafe. Examples include type safety and freedom from buffer overflows. In the paper, the authors use policies enforceable by their execution monitor (dynamic taint analysis to detect information leaks and memory-safety violations, dynamic type checking, return-address integrity, etc.).
  4. Generate additional variants x' by solving the new constraint function F'(x') = F(x') ∧ x' ≠ x.
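The four steps can be illustrated with a toy sketch. Everything here is hypothetical: the "programs" are two Python functions where P' adds the input-length sanitization check that P lacks, and the "solver" is brute force over a tiny input space, where the paper's system would lift real binaries to Vine and hand formulas to an off-the-shelf constraint solver.

```python
BUF_SIZE = 8

def P(x: bytes) -> str:
    # Unpatched: copies x into an 8-byte buffer with no length check.
    return "overflow" if len(x) > BUF_SIZE else "ok"

def P_prime(x: bytes) -> str:
    # Patched: the new sanitization check rejects long inputs.
    if len(x) > BUF_SIZE:        # <-- the new check binary diffing would flag
        return "rejected"
    return "ok"

# Step 2a: constraint F(x) = "input reaches and FAILS the new check".
def F(x: bytes) -> bool:
    return len(x) > BUF_SIZE

# Step 2b: "solve" F by brute-force search over candidate inputs.
def solve(constraint, exclude=()):
    for n in range(16):
        cand = b"A" * n
        if constraint(cand) and cand not in exclude:
            return cand
    return None

# Step 3: safety policy check -- running the candidate on the
# unpatched P must drive it into an unsafe state.
def is_exploit(x: bytes) -> bool:
    return P(x) == "overflow"

candidate = solve(F)
assert candidate is not None and is_exploit(candidate)

# Step 4: a variant x' satisfying F'(x') = F(x') and x' != x.
variant = solve(F, exclude={candidate})
assert variant is not None and is_exploit(variant)
```

The interesting structural point survives even this toy version: the exploit is derived entirely from the difference between P and P', without ever knowing the vulnerability in advance.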

If the solver proves that no input satisfies the constraint function, then the path it describes does not contain an exploit. If solving takes too long, a timeout makes the system move on to check a new path, for example by changing the mix (junction) point in the hybrid approach.
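The iterate-with-timeout loop might look like the sketch below. The junction point names and the solver stand-in are invented for illustration; in the paper, candidate points sit at procedure boundaries and the per-attempt budget bounds real solver time.

```python
import time

# Hypothetical candidate junction (mix) points at procedure boundaries.
JUNCTION_POINTS = ["proc_entry_parse", "proc_entry_check", "proc_entry_copy"]

def solve_from(junction: str, budget_s: float):
    # Stand-in for building the constraint function from `junction` to
    # the new check and handing it to a solver. Here we pretend only one
    # junction yields a solvable formula; the others "time out" instantly.
    deadline = time.monotonic() + budget_s
    while time.monotonic() < deadline:
        if junction == "proc_entry_check":
            return b"A" * 9          # candidate exploit input
        break                        # unsolvable within budget: give up
    return None

def find_exploit(budget_s: float = 0.1):
    # Iterate over junction points until one yields a candidate exploit.
    for junction in JUNCTION_POINTS:
        candidate = solve_from(junction, budget_s)
        if candidate is not None:
            return junction, candidate
    return None, None

junction, exploit = find_exploit()
```

The design point is that the timeout converts a potentially unbounded solver call into a search knob: a failed attempt costs at most the budget before the system tries the next junction.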

For evaluation, the authors ran their system on 5 Microsoft updates. Some exploits were found with dynamic analysis alone, others with static analysis, and others required the hybrid approach. Exploit generation took from tens of seconds to under three minutes, and most of the generated exploits resulted in control hijacking.

The techniques also seem useful in other applications, such as automatic deviation detection: checking whether two programs behave the same, or, more concretely, whether compiler optimizations changed a program's behavior. But this does not seem easily attainable to me, since the differences between two programs might be large enough that the techniques break down.
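The deviation-detection idea reduces to searching for a witness input on which two versions disagree. A minimal sketch, with made-up functions standing in for an original program and a buggy "optimized" rewrite, and brute force standing in for constraint solving:

```python
def original(x: int) -> int:
    # Identity on integers: doubling then floor-dividing by 2.
    return (x * 2) // 2

def optimized(x: int) -> int:
    # A buggy "optimization" that mishandles negative inputs.
    return x if x >= 0 else x + 1

def find_deviation(inputs):
    # Return an input on which the two versions diverge, if any.
    for x in inputs:
        if original(x) != optimized(x):
            return x
    return None

witness = find_deviation(range(-4, 5))
```

A constraint solver would search the formula original(x) ≠ optimized(x) symbolically instead of enumerating inputs, which is exactly where the paper's formula-generation machinery would slot in.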

The mitigations they propose for the automatic patch-based exploit generation problem are the obvious ones: obfuscation, patch encryption so that everyone can apply the patch simultaneously, and fast patch distribution using P2P. They mention that offline systems would stay vulnerable, but that is equally true of the current patch distribution architecture.

The Likes:
  • The paper presents a cool attack
  • It works (at least for the cases shown)!

The Dislikes:
  • There was no real quantitative estimate of how important this is in practice or how well the system fares in general. On what percentage of bugs does it work, and through how many of those can an attacker cause real damage?
  • There was too much notation and repetition
  • I did not understand the section on dynamic analysis. The math notation looked horrible. The ideas are probably not too difficult to understand, but I didn't want to wade through them without a helping hand (i.e., a real explanation).
  • This is not quite about the paper, but there are still no satisfactory solutions to the problem they raise. It would have been nice if they could describe one, but that is not the goal here.
  • I wish there were a bit more education on the technical background.
