Coordination in Large Resource Networks

Kestrel's e-Merge-ANT project is part of the ANTS programme sponsored by DARPA/ITO to investigate the problem of coordinating a large distributed network of resources to accomplish high-value tasks that may have stringent real-time requirements when the network may be subject to high loads.

For example, resources may be sensors and tasks may be target tracking (the tasks may be considered "high-value" when, for example, the targets being tracked are missiles). The load on the network is determined by the number and type of targets, and may vary from light (no targets present, just do background scans) to heavy (more targets than the network could track even if operating at full efficiency).

Coordination among the resources is required to ensure that targets get assigned to appropriate sensors (e.g., ones that are close), and to ensure that individual sensors do not become over-loaded (a single sensor is able to track only a finite number of targets simultaneously). Moreover, a single task may require multiple sensors to act coherently (e.g., to take simultaneous measurements to triangulate a target's position).

Tasks are not generally known a priori: e.g., targets are detected at essentially random positions and times by background scans and may be present for unknown durations. The real-time requirements arise from the tasks themselves having dynamic states: e.g., the fact that a target is moving means that there is a finite amount of time between its being detected and its moving out of sensor range, and that time must be used to determine which sensors should track it, to inform those sensors to begin tracking, and to actually carry out the tracking. (In other words, the more time spent deciding who should track, the less time left for actually tracking.)

It is an assumption in the ANTS programme that the resource network may be large (possible containing tens of thousands of resources) and that the network's communication has significant latency (large enough that it must be taken into account when trying to coordinate the resources) and may be unreliable; for example, communication may take place through low-power radio communication that can be jammed.

Moreover, communication is assumed to be expensive, partly because it consumes energy, but more importantly because it reveals the resources' positions to adversaries (e.g., a target may be able to locate and attack a passive sensor based on the sensor's radio emissions).

Finally, the resources may not be continuously available because the resources themselves may be damaged by adversaries.

Technical Challenges

Any system that is designed to coordinate the resource network must address the following technical challenges:

The coordination system must be able to deal with very large networks: networks containing tens of thousands of resources are feasible using current technology, and their size is likely to grow dramatically.
Real-time responsiveness:
Tasks are assumed to have real-time properties (e.g., deadlines for completion) that impose limits on how long the coordination system has available to assign the task to appropriate resources. In general, the coordination system must balance time it spends coordinating against time resources spend actually accomplishing tasks. Similarly, it must balance its own use of resources such as CPU time and communication bandwidth against the use of these resources to accomplish tasks.
The coordination system should tolerate intermittent resource and communication failures. In particular, the coordination system must tolerate the failure of resources on which it itself depends. Clearly, if all of the resources fail, the coordination system will not be able to compensate, but partial resource failure should result in graceful degradation of the network's performance rather than total collapse.
The coordination system should incur low resource consumption costs (e.g., low communication usage). In particular, when the load on the network is low, the coordination system should incur almost zero costs.
The effectiveness of the coordination system may be characterized in two regimes. (1) When the load on the network is low, the coordination system should ensure that every task is accomplished with high quality (e.g., each target is tracked accurately). (2) When the load on the network is high, the network may not be able to accomplish all of the tasks with high quality. Nevertheless, because each task is considered to be of high-value, the coordination system should strive to ensure that the resources perform as well as could be expected, taking into account the tasks' real-time constraints. In other words, it is worthwhile developing a coordination system that will perform better than, say, a simple heuristic system.
Dynamic loads & phase transitions:
When a network is lightly loaded, it is expected that assigning the tasks to resources will be computationally easy (since there are lots of assignments which are satisfactory). When a network is over-loaded, determining that there is no way to assign all of the tasks may also be computationally easy, because the requirements imposed by tasks quickly reduce the search space of possible assignments.

In contrast, when the network is critically loaded, the computational costs may become much greater: if the load is just below critical, the number of ways to assign all of the tasks to resources is small and so finding one is difficult; if the load is just above critical, determining that the tasks' requirements are not met by any assignment may take a long time.

This peak in the computational complexity near some critical region is called a phase transition. The practical ramifications of the existence of phase transitions is that the coordination system must find ways to avoid the critical regions if it is to achieve real-time responsiveness as the load varies.