# Research Interests of Dr Stephen Fitzpatrick

## Human-centred Automation for Interactive Construction of Plans

Kestrel was involved in RSPACE, a DARPA-sponsored program to dramatically reduce the time and people required to produce plans (air tasking orders) for the US Air Force. There had been several predecessor programs with similar objectives, but they had encountered problems with acceptance because the human operators were reluctant to trust the plans that were produced, since they had little insight into the process. Consequently, a priority for RSPACE was to make sure that the operators could gain confidence in the plans.

I worked with user experience experts at Next Century Corporation to develop concepts for opening up the planning process to the user at an intuitive level. For example, the operator might have to choose from among several air bases for aircraft to accomplish some task. The choice of base would be constrained by geography, time, fuel, compatibility between the type of aircraft and the task, availability of aircraft, etc. Indeed, the operator would commonly have to choose several different types of aircraft, at different bases, that would work together to complete the task.

I developed a Java back-end system that worked with Next Century’s web-based user interface, to exploit the constraints to guide the operator’s choices in exploring possible plans. For example, selecting one base for a given task would narrow down which other bases that might collaborate on the same task, taking into account the time and location of the task, the operating schedules of the other bases, and the bases’ locations and the different speeds of their aircraft (collaboration requiring colocation).

A key technique for achieving human-centred interaction was formulating higher level constraints that involved fewer variables. For example, analysis might show that two particular bases could never collaborate on any task, because of their respective locations and the respective speeds and ranges of their aircraft. Providing the operator with an explanation of why a plan is infeasible using such high level constraints helps to spare the operator endless twiddling with low level details, such as routing or start times, that cannot, in this case, make a difference.

The system was also able to recommend options to the operator. One complication was that there were several metrics for ranking options – such as efficiency, effectiveness, safety and robustness – that were somewhat contradictory. An axiom of the RSPACE program was that these could not be reduced to a single metric because that is not how human operators think about the various characteristics of a plan. Rather, the operator had to be allowed to explore the space of feasible plans along the various metrics (requiring that a Pareto frontier be generated).

## Peer-to-peer Algorithms

I was involved in several projects that dealt with peer-to-peer networks, in which a large number of widely spread nodes could communicated only locally. A common theme was the need to approximate a global object (e.g., a common coordinate system) even though each node was capable of retaining only a small amount of data (e.g., the distances between itself and some nearby nodes, and between pairs of nearby nodes).

My colleague, Lambert Meertens, and I developed several algorithms that demonstrated how to achieve this. See: Scalable, Anytime Constraint Optimization through Iterated, Peer-to-Peer Interaction in Sparsely-Connected Networks

Two examples of peer-to-peer algorithms follow.

### Soft Graph Colouring

In the classical problem of graph colouring, each node in an undirected graph is to be assigned a colour from a fixed set of colours, such that nodes connected by an edge have different colours. (A colour might, for example, represent a radio frequency at which that node broadcasts to its neighbourhood – Connected nodes are nearby nodes that need to have different colours to avoid interference.)

The classical problem is a constraint satisfaction problem, requiring that there be no conflicts (i.e, no connected nodes with the same colour). For peer-to-peer systems, we weakened it to a constraint optimization problem (soft graph colouring), in which the objective is to minimize the number of conflicts. We demonstrated that very simple algorithms can work remarkably well, in a few rounds significantly reducing the number of conflicts below what can be achieved with random colouring. (Random colouring can be achieved without communication and so is a good benchmark for a distributed problem.)

For example, what is perhaps the simplest algorithm operates as follows:

• Initialization: each node chooses a colour at random, and transmits it to each neighbour (i.e., each node to which it is connected by an edge).

• Periodically, each node selects a colour that minimizes the number of conflicts that it has with its neighbours, given what the node knows of those colours, and transmits its new colour to its neighbours. We do not want all of the nodes changing colour at the same time as that causes thrashing, whereby large blocks of nodes have conflicts. To avoid this, a node can introduce random delays between colour selections.

This algorithm has been studied and improved by several researchers and applied to other constraint problems that have a sparse network structure (i.e., the number of constraints per network node does not grow with the total number of nodes).

### Localization in Sensor Networks

Localization is the problem of constructing a common coordinate system and determining coordinates for each node in a network of nodes, based on local measurements.

There are several ways to achieve localization. For example, each node can estimate its distances to some nearby nodes and communicate those measurements to nearby nodes. Each node thus acquires a local distance matrix. Using multilateralization, a node can construct a local map of nearby node positions from its distance matrix.

Each distance matrix generates its own coordinate system, since the network has not yet established a common origin or orientation. So nearby nodes exchange their local maps, identify common nodes, and use their positions to align the maps (using translation, rotation and perhaps reflection). After aligning a neighbour’s map, a node reconciles it with its own map. For example, each map almost certainly contains a somewhat different position for any common node, due to measurement error and simplifications inherent in the model on which this algorithm is based; the different positions can be reconciled by averaging.

Eventually, the local maps become aligned throughout the network. This process can be accelerated using various techniques.

An alternative approach uses anchor nodes (e.g., nodes equipped with a GPS sensor), local broadcasts, and an upper bound on the range of a local broadcast. Each anchor node broadcasts its own position estimate, which is a region rather than a point due to the finite precision of the anchors’ sensors. A nearby node receives the position estimate, and reasons that its own position must lie within the region it has just received, dilated by the maximum communication range (a known constant). It updates its own position by intersecting any estimate it already has with the dilated, received region, and transmits its updated estimate to its own neighbours.

Each of these position estimates, of course, becomes less and less accurate with each communication hop (and the associated dilation). However, once a given node starts to receive estimates that originated from different anchors, then the intersection of estimates starts to narrow down the estimate. Eventually, depending on the number of neighbours and the number of anchors, each node obtains a crude but nonetheless informative estimate of its own position.

The following video shows the localization process in a simulated network of 1,000 nodes, each of which can talk to its 5 nearest neighbours. The white dots indicate the anchors (a random 2.5% of the nodes are anchors, plus 8 around the edges). The blue dots indicate the nodes that must determine their positions. A coloured line indicates the error in a node's estimate of its position — the line connects the node's true position with its estimated position. The colour of the line is determined by the magnitude of the error, with green representing a low error, then yellow, orange, red and finally purple indicating a high error.

The next video shows 20,000 nodes, each communicating with its 10 nearest neighbours. This time, the anchors are only around the edges, plus one in the middle. This is probably not a practical configuration, but the patterns displayed by the position estimates do illustrate the flow of information from the anchors through the network of nodes.

## Synthetic Diversity and Runtime Monitoring

Synthetic diversity is the use of multiple, semantically equivalent data representations (and corresponding algorithms) for a single abstract data type. A simple example is a set: we can choose to represent a set using a non-repeating list (perhaps ordered), using a list that may allow repetitions, using a tree (if we have an ordering), using hashing, etc. A set of boolean flags has an additional representation as a bit vector. In turn, a list can be represented using an (expandable) array, using linked elements, or even as a finite function (map) from index to element.

A programmer may choose to use a particular representation for a particular variable when implementing a given application. However, using synthetic diversity, my colleague, Stephen Westfold, and I generated multiple implementations of the application, each implementation using different combinations of representations. The different implementations can be used simultaneously on different computers (spatial diversity), or rotated on a single computer over time (temporal diversity).

One reason for using different implementations is to make it more difficult for an attacker to study an implementation and identify and exploit potential vulnerabilities.

Using synthetic diversity, we can also choose to use multiple representations simultaneously for a single variable in a single application. For example, we may choose to represent a complex number using its real and imaginary components, or using its radius and angle; or we may choose to use both representations, performing each operation on the complex number twice, once with each representation. The equivalence between the representations allows us to generate cross-checking code to detect errors in the calculations or corruption of the values, maybe arising through bugs or attacks.

If we use three or more representations, we can also introduce code to resolve discrepancies through voting, and to automatically reconstitute representations found to be corrupt.

This approach can be augmented using semantic properties of the abstract data type and the representations. For example, if we represent a set using a non-repeating list, then we know that the list should not contain repetitions. We can generate code that checks for repetitions and removes any that are found. Such repairs to data structures are not guaranteed to be correct (since we do not know how the repetition originated), but they fall within an approach that tries to “fight through” anomalies in the belief that continued operation will be better with the repaired data than with the corrupted data.

The above example uses a data type invariant (i.e., no repetitions). Other semantic invariants arise from function pre- and post-conditions, and in the refinement of an abstract type (such as a set) into a representation (such as a list).

## Software Synthesis and Transformation

At Kestrel, I used semi-automated refinement techniques to generate implementations from abstract specifications of applications (e.g., for planning and scheduling). Refinements can operate at various levels: introducing algorithms to solve problems specified non-constructively; introducing data type representations (e.g., representing a set as a list); performing optimizations (e.g., applying finite differencing to a repeated computation); and generating low-level code.

At Queen’s University, I used automated transformations to convert high-level functional specifications of numerical algorithms (e.g., eigensolvers) into architecture-specific Fortran implementations (e.g., implicitly vectorizable form or explicitly data-parallel form). I also use automated transformations to introduce sparse representations (e.g.: tridiagonal matrices; or matrices in which the number of sparse elements per row is fixed but their locations are not, requiring a secondary matrix to record the column numbers).

For example, the $nxn$ product of two $nxn$ matrices, $A$ and $B$, can be defined as: $A × B = matrix([n, n], λ i, j · row(A, i) . col(B, j))$ where $U.V$ is the dot product of vectors $U$ and $V$: $U.V = sum(U * V)$ where the $*$ operator here performs elementwise multiplication of the two vectors. The sum function is a reduction: $sum(U) = reduce([n], λ i · U[i], +, 0)$

Suppose $P={A}^{T}×B$ where ${A}^{T}$ is the transpose of $A$: $A T = matrix([n, n], λ i, j · A[j,i])$

Transformation for unfolding definitions and algebraic simplification can eliminate the definitions and intermediate terms to produce the following: $P = matrix([n, n], λ i, j · reduce([n], λ k · A[k,i] * B[k,j]), +, 0)$

A data-parallel architecture is best used for whole-array operations. Transformations designed to optimize computations for such an architecture swap the reduction and the matrix creation, so that each step of the reduction is a whole-array operation: $P = reduce([n], λ k · matrix([n, n], λ i, j · A[k,i] * B[k,j]), +, 0)$

Other transformations propagate the matrix creation inwards (converting the single-element multiplication into an element-wise multiplication on whole arrays, for example) and implement the reduction as an explicit loop: ``` P = 0 do k=1, n     P = P + matc(A(k, ), n) * matr(B(k, ), n) end ``` where the $+$ and $*$ operators apply elementwise to whole arrays, $A\left(k, \right)$ extracts the $k$th column of $A$, $matc\left(U,n\right)$ copies vector $U$ across $n$ columns, and $matr\left(U,n\right)$ copies vector $U$ across $n$ rows. This is an implementation of $P={A}^{T}×B$ that is efficient on a data-parallel architecture.