Objective
Objective
The objective of this project is to construct a formal specification of the Java Virtual Machine (JVM) bytecode loader and verifier, and from that specification formally derive a provably-correct implementation. The specification and program development is being carried out using Kestrel’s Specware System. The security of Java applications depend on type safety and related properties enforced by bytecode verification. Serious Java security flaws have been traced to errors in Sun’s Java bytecode verifier and loader. A formal specification will serve as a reference document for the construction of new JVM implementations for just-in-time compilers, web browsers, smart cards, etc.. The desired safety and security properties of the verifier will be proved as putative properties of the formal specification. The formally-derived implementation can be used as a test oracle to test implementations, or may be incorporated directly into a JVM implementation.
Approach
It is becoming apparent that Java will play a significant role in the nation’s computing base, especially in the context of mobile computing, where security issues are of paramount importance. Unlike other widely-used programming languages, Java’s clean design is amenable to formal analysis. Sun has published an English specification of the JVM and makes available a reference implementation. However, the English specification, while quite good, is nonetheless incomplete and occasionally in conflict with the reference implementation. The reference implementation is a large, complex C program in which security checks are distributed throughout the code making it difficult to understand and impossible to prove anything about. This situation is not acceptable for security-critical applications. Because of the importance of the verifier, and because it is amenable to formal analysis, this is an ideal opportunity to apply formal specification and development technology. Use of formal methods will improve the security of Java applications.
The approach is to first create a specification in naive set theory based on Sun’s English specification and reference implementation. The specification formalizes, fills gaps in, removes bugs, and generalizes the Sun specification. Our specification removes constraints on loading strategies that are in the Sun implementation.
The specification builds heavily on standard mathematical structures, such as stacks, lattices, and product and sum lattice constructors. The bytecode verifier is formalized by the specification as an instance of a constraint satisfaction architecture. Thus, to verify a JVM class description, a constraint satisfaction problem is derived from the class description and then solved. If the constraint problem does not have a (non-trivial) solution, the class fails verification. Reliance on standard architectures and mathematical structures makes the specification easier to comprehend and validate.
The constraint satisfaction architecture is parameterized by a lattice and a collection of monotone functions over the lattice. The lattice formalizes the typing structure of the JVM. The monotone functions are used in constraint inequalities that directly formalize the JVM typing rules. The specification is thus at a high-level of abstraction, and in close correspondence to how a type theorist would specify the JVM type system. Furthermore there is clean separation of the specification of bytecode verification problem as a constraint problem, and its implementation by constraint solving algorithm. The algorithm specified in the constraint solving architecture is a chaotic or non-deterministic fixed-point iteration scheme similar to those used to solve dataflow analysis problems.
Next, the specification is expressed in Specware. It is type checked, and putative properties of the specification are proved. Specware composition capabilities permit a very natural structuring of the specification into components. For example, Specware’s parameterized specifications, which, like ML functors, build specifications (or modules) from specifications, are used to instantiate the constraint satisfaction architecture and lattice-constructing operations used extensively in the specification.
Finally, the specification is refined into code. This requires defining refinements for each of the components of the specification. These refinements are composed to yield an implementation in Lisp or C++. The refinements carry formal proof obligations that guarantee that the implementation is correct. The main work is to derive an efficient data representation for the JVM lattice, and to refine the constraint-solving algorithm into a more efficient form.
The specification of the bytecode verifier has been structured so that can be easily extended to verify additional JVM class properties, such as information flow properties.
Recent Accomplishments
Identified restrictions in the Sun implementation that requires loading of certain classes prior to their execution. Generalized the specification of the bytecode verifier to remove this restriction.
Formalized subtle properties of the verifier in a clean and transparent mathematical framework.
Completed Specware specification of the bytecode verifier. The specification will be published on our web site.
Current Plan
Refine Specware specification to executable code.
Develop an alternative implementation of the bytecode verifier that integrates U.C. Berkeley’s BANE constraint solving technology.
Use a theorem prover to prove the correctness of the derived implementation.
Technology Transition
The Specware system, developed in part with the support of this and other DARPA efforts, is being distributed. It is being used at Motorola, Boeing, academic, and government institutions. Specware is a system for specification and correct refinement of specifications into code. It runs on a SUN UNIX platform.
Via the National Security Agency, technical publications, and direct contacts, our work is being briefed to JavaSoft.