# Approach

It is becoming apparent that Java will play a significant role in the nation’s computing base, especially in the context of mobile computing, where security issues are of paramount importance. Unlike other widely-used programming languages, Java’s clean design is amenable to formal analysis. Sun has published an English specification of the JVM and makes available a reference implementation. However, the English specification, while quite good, is nonetheless incomplete and occasionally in conflict with the reference implementation. The reference implementation is a large, complex C program in which security checks are distributed throughout the code making it difficult to understand and impossible to prove anything about. This situation is not acceptable for security-critical applications. Because of the importance of the verifier, and because it is amenable to formal analysis, this is an ideal opportunity to apply formal specification and development technology. Use of formal methods will improve the security of Java applications.

The approach is to first create a specification in naive set theory based on Sun’s English specification and reference implementation. The specification formalizes, fills gaps in, removes bugs, and generalizes the Sun specification. Our specification removes constraints on loading strategies that are in the Sun implementation.

The specification builds heavily on standard mathematical
structures, such as stacks, lattices, and product and sum lattice
constructors. The bytecode verifier is formalized by the specification
as an *instance of a constraint satisfaction architecture*. Thus,
to verify a JVM class description, a constraint satisfaction problem
is derived from the class description and then solved. If the
constraint problem does not have a (non-trivial) solution, the class
fails verification. Reliance on standard architectures and
mathematical structures makes the specification easier to comprehend
and validate.

The constraint satisfaction architecture is parameterized by a lattice and a collection of monotone functions over the lattice. The lattice formalizes the typing structure of the JVM. The monotone functions are used in constraint inequalities that directly formalize the JVM typing rules. The specification is thus at a high-level of abstraction, and in close correspondence to how a type theorist would specify the JVM type system. Furthermore there is clean separation of the specification of bytecode verification problem as a constraint problem, and its implementation by constraint solving algorithm. The algorithm specified in the constraint solving architecture is a chaotic or non-deterministic fixed-point iteration scheme similar to those used to solve dataflow analysis problems.

Next, the specification is expressed in Specware. It is type
checked, and putative properties of the specification are
proved. Specware composition capabilities permit a very natural
structuring of the specification into components. For example,
Specware’s *parameterized specifications,* which, like ML
functors, build specifications (or modules) from specifications, are
used to instantiate the constraint satisfaction architecture and
lattice-constructing operations used extensively in the specification.

Finally, the specification is refined into code. This requires
defining *refinements* for each of the components of the
specification. These refinements are composed to yield an
implementation in Lisp or C++. The refinements carry formal proof
obligations that guarantee that the implementation is correct. The
main work is to derive an efficient data representation for the JVM
lattice, and to refine the constraint-solving algorithm into a more
efficient form.

The specification of the bytecode verifier has been structured so that can be easily extended to verify additional JVM class properties, such as information flow properties.