Soft graph coloring is a generalization of traditional graph coloring: the objective is to assign a color to each node in an undirected graph so that the number of edges that connect nodes of the same color is minimized. (In traditional graph coloring, the objective is to ensure that **no** edge connects nodes of the same color.)

Soft graph coloring can be viewed as an abstraction/simplification of resource allocation problems (involving resources that are mutually exclusive). For example:

- The nodes can represent resources.
- The colors can represent time slots in a cyclic schedule of resource activity. The number of time slots, and hence colors, may be fixed in advance.
- An edge between two nodes thus represents mutual exclusion between the corresponding resources (i.e., the resources should not be active at the same time).
- An edge that connects two nodes of the same color represents a resource conflict; such an edge incurs a penalty (and thus their number is to be minimized) because a resource conflict will degrade the performance of the resources.

Kestrel's project deals primarily with large, distributed networks of resources that use radio communication. Communication is considered to be expensive: it tends to have relatively high latency and can be detrimental in a physical environment that includes adversaries who are trying to determine the location of resources. Moreover, the tasks that the resource network is servicing have real-time aspects and so the resource allocation algorithm must be real-time, and the resources may be unreliable, so the algorithm must be robust. Finally, the network may be over-loaded (too many tasks) but the resource allocation algorithm must continue to behave well (it cannot just fall down or start consuming all of the CPU time trying to find perfect resource allocations when none exists).

More detail on the relationship between resource management and graph coloring is given in the section on the ANTS challenge problem. Here, it is taken for granted that the coloring algorithms must be real-time, scalable, decentralized and fault-tolerant, and must incur low communication costs. Moreover, they must behave well when they have too few colors to eliminate all conflicts.

The essential objective of a soft graph colorer is to find an assignment of colors to nodes that minimizes the number of *color conflicts*, that is, the number of edges that connect nodes of the same color. To date, we have mainly investigated coloring with a fixed number of colors - the case where the number of colors is variable, and must be minimized, will be investigated soon.

The algorithms that we have looked at are decentralized, anytime, local-repair algorithms:

- Decentralized
- Each node determines its own color based on information about its immediate neighbors' colors.
- Anytime
- Initially, the colors are randomly chosen. Then each node executes a continuous cycle of updating its own color and informing its neighbors of any changes. Thus, at any given time, a complete coloring for the graph exists, and the quality of the coloring is expected to improve over time (if the nodes are properly coordinated - see blow).
- Local-repair
- Each node makes local improvements to the coloring (it directly changes only its own color).

There are two major design decisions to be made: what sort of local repairs are to be made and how to coordination repairs.

- Local repairs - min-conflicts
- Each node is informed of its neighbors' colors. When a node updates its own color, it chooses one that minimizes the number of conflicts with those colors. In other words, it chooses a color that it believes is least used by its neighbors, although its beliefs may be incorrect because its neighbors may also be in the process of updating their colors.
- Coordinating updates - stochastic synchronous activation
- When a node updates its own color, it does so based on what it knows of its neighbors' colors. But if a neighbor is also updating its color, then the first node's information may be incorrect. In the worst case, if all nodes continuously update their colors, thrashing may occur, in which the coloring does not stabilize and communication costs soar.
At the other extreme, a total order could be imposed on the nodes to ensure that only one changes color at any given time. This obviously is not a scalable solution.

To avoid thrashing while still maintaining a high degree of parallelism, we use stochastic, synchronous updating: the nodes are all synchronized and at each time step, they determine an

*activation probability*and may change their color only if a randomly generated probability falls below the activation level. The activation level can be adjusted to strike a balance between coordinating neighbors' updates and maintaining parallelism.(An asynchronous variant of this approach also seems to be promising, based on very preliminary investigations.)

Most of our investigation has focused on a very simple method for determining the activation level: in the *Fixed Probability* method, each node sets a uniform, constant activation level. Values for this constant of around 0.5 seem to work well, although it is expected that the optimal value would depend on the complexity of the graph (e.g., the mean degree). Also, highly irregular graphs (in which the degree varies significantly from node to node) may benefit from non-uniform activation levels. These are topics of activate investigation.

Decentralized, anytime, local-repair graph coloring can be seen in action in this animation: AVI format. Nodes and edges are displayed brightly only when they are conflicts, so the progress of the colorer can be gauged by how few nodes and edges remain easily visible.

To assess how well the soft graph colorer performs in practice, we developed a simulator and measured the number of color conflicts over time for various graphs. We also measured the number of color changes that occurred, since each color change incurs communication costs (for the node to inform its neighbors of the change). Details of the experiments are given below; here is a summary:

- Strong convergence
- Moderate activation levels produce strongly convergent improvement in colorings. Thrashing occurs only at very high activation levels.
- Quick conflict reduction
- The number of conflicts is quickly reduced for moderate activation levels.
- Scalability
- The costs and performance of the algorithm are dependent on the mean degree of the graph rather than on the number of nodes.
- Robustness
- The algorithm adapts well to changes in topology and is tolerant of communication failure.

For moderate activation levels, the Fixed Probability algorithm converges to high-quality (though not necessarily optimal) colorings. Thrashing occurs at high activation levels. The long term behavior represents what happens in a resource network when the coordination mechanism is given time to work without intereference from changing tasks or resources - the results indicate that the Fixed Probability algorithm will produce high quality solutions.

The diagram below shows typical behavior of the Fixed Probability algorithm for various activation levels. Each curve shows the number of conflicts against the number of synchronous steps for a fixed activation level. The number of conflicts is normalized by dividing by the total number of edges (to eliminate dependence of the metric on the size of the graph) and by multiplying by the number of colors (because coloring with more colors is inherently easier). Using the normalized scale, a random assignment of colors has an expected value of 100%.

Performance for activation levels of 10%, 30%, 50%, 80% & 90%

The effect of the activation level can be seen in this animation: AVI format. Copies of the same graph are 4-colored using different activation probabilities; clockwise from top left:

- Low activation threshold (10%) - too slow in removing conflicts.
- Medium activation threshold (30%) - fast removal of most conflicts.
- Medium activation threshold (50%) - fast removal of most conflicts.
- High activation threshold (90%) - thrashing; worse performance than a random colorer.

Performance for activation levels

of 10%, 30%, 50%, 70%, 80% & 90%

Even for over-constrained colorings (where the number of colors is insufficient to color the graph with zero conflicts) the Fixed Probability colorer is well behaved and performs well for moderate activation levels.

For example, the two diagrams to the right show the *mean* number of conflicts and color changes over the first 50 steps versus the number of colors for various activation levels.^{1} The means are measured because these represent how well the resources would be able to accomplish tasks while their schedules are being improved, and the cost incurred to achieve the improvement.

The graphs being colored can all be colored with zero conflicts using 4 colors, so these diagrams show the performance when the coloring is over-constrained (2 & 3 colors), critically constrained (4 colors) and under-constrained (5 & 6 colors).

In the conflicts diagram, for two colors, the theoretical lowest value that could be achieved for conflicts is 50%; the colorer achieves a value of around 58% after 50 steps and a mean of 65% (for moderate activation values); in the long term, it achieves a value of around 52%.

The communication diagram shows the mean fraction of nodes that change color (per step); this is proportional to the communication costs. For moderate activation levels, the rate of color change is low, less than 10%.

A straightforward analysis of the Fixed Probability colorer shows that its per-node, per-step computational, storage and communication costs are independent of the number of nodes. The costs do depend on the topology of the graph (roughly speaking, the mean degree) but we expect to deal only with graphs of (approximately) constant mean degree since these are the type that arise in large resource networks.

Experimentally, it was found that the *performance* of the Fixed Probability algorithm was independent of the size of the graph.

To assess the robustness of the Fixed Probability algorithm, the topology of the graph was adjusted while it was being colored by removing and subsequently reinserting nodes (and adjacent edges). This approximately corresponds to resources failing and recovering in a resource network.

It was found that low level, continuous change in the topology produced small increases in the number of conflicts. In contrast, high level, intermittent changes caused transient spikes in the number of conflicts - if the changes were not too frequent, the coloring robustly adapted to the changes and continued to improve over time.

We also investigated the effect of communication noise/failure and found that low levels of noise/failure induced only small increases in the number of conflicts.

- Paper:
*An Experimental Assessment of a Stochastic, Anytime, Decentralized, Soft Colourer for Sparse Graphs* - Presentation at SAGA 2001 of preceeding paper.
- Our presentation at the spring 2001 ANTS meeting also included further technical details and some more coloring animations.
- The section on the ANTS challenge problem gives more detail on how graph coloring is used.

^{1}The results for short-term improvement and robustness are for a variant of the Fixed Probability colorer that has better performance when under-constrained.