Packet Network failure, Sparse vs Congested traffic volumes

This document attempts to show how timeouts and disconnects happen when a packet network becomes overloaded. We'll look at how much traffic constitutes an overload, which packet networks are immune, and why packet networks with overload potential are still useful.

I'm sorry this appears so dry. We're going to bring in a good SFX group and make a cool PG-13 live action full-length.
Catch it at your local deluxe dinner theatre next year.

Collisions

A packet collision occurs when two stations transmit at the same time and both are within range of a receiver. The receiver then fails to decode either packet.
Even if one station starts transmitting after the other (while the first is still transmitting), and the second station is much stronger, the receiving station will still fail to copy the second packet: it treats the second packet as part of the first, and that wrecks the receiver's perception of the second packet's contents.

Connected vs Unconnected

There are two common modes of packet radio messaging we're concerned with. One is used for APRS, the other for BBS, DxCluster, Chat and Nodes.

APRS - send-and-forget

APRS stations send messages via digipeaters, but they are unconcerned whether a particular packet makes it to the destination, because each packet is mostly redundant with the initiator's previous packet and the next packet. Furthermore, in APRS, there can be several appropriate receivers of the transmission, any one of which is perfectly adequate, and any or all of which would redundantly copy the packet and hand it off to an Internet server.
In an APRS network, the loss of one packet to any or all potential receivers does not result in additional traffic. If the number of stations increases suddenly, the traffic volume only goes up proportionally, instead of exponentially, and collisions have minimal negative consequences.

This mode is called Unconnected mode or Unproto mode.

BBS, DxCluster, CHAT, Nodes - Guaranteed Delivery with Retries

In a guaranteed-delivery use-case, the initiator of a message is sending it to one destination. The initiator of a data message (a data message contains data/text) will retransmit each data message repeatedly until it is acknowledged by the destination, or a retry timeout occurs.

This is called Connected mode.

Connected mode has a retry mechanism in order to overcome transient channel problems, like link fading or noise.

Guaranteed-delivery/connected-mode packet networks fail when there are multiple stations generating traffic into the channel, where the channel is near or over the occupancy-limit (discussed in a moment) and where one of the messages is lost, possibly due to collisions, necessitating a retry. The amount of traffic seems to explode once the network starts failing.

Above the Occupancy Limit

Above the occupancy-limit, the chance of collisions (in a guaranteed-delivery network) goes up rapidly. This was discussed in a paper called "The Aloha System" or ALOHANET, from 1971. Once collisions start, the retry mechanism makes the traffic volume increase, causing a catastrophic network collapse in which the participating stations receive a DISCONNECT. No hardware is damaged in this collapse, so the participants can reconnect and resume operations, but unless most or all of the participants change their behavior to reduce the traffic volume, the failure will repeat.
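
To get a feel for why retries turn a few collisions into a collapse, here is a rough sketch. It borrows the textbook pure-ALOHA assumption that a packet survives with probability e^(-2G) when the total channel load is G (new packets plus retries, counted in packets per packet-time). That model, the name settled_load, and the cut-off values are all assumptions made for illustration; a real CSMA channel with TNC timers will differ in detail.

```python
import math

def settled_load(new_traffic, max_rounds=200, collapse_at=10.0):
    """Total channel load (new packets + retries) once retries settle.

    Assumes the pure-ALOHA survival probability exp(-2 * load).
    Returns math.inf if the load keeps growing (a collapse).
    """
    load = new_traffic
    for _ in range(max_rounds):
        survive = math.exp(-2.0 * load)
        # Every new packet must eventually get through, so lost packets
        # come back as retries and raise the total load.
        next_load = new_traffic / survive
        if next_load > collapse_at:
            return math.inf
        if abs(next_load - load) < 1e-9:
            return next_load
        load = next_load
    return load

if __name__ == "__main__":
    for new_traffic in (0.05, 0.10, 0.15, 0.25):
        print(new_traffic, settled_load(new_traffic))
```

Under these assumptions a new-traffic level of 0.10 settles at a steady total load, while 0.25 never settles: each round of retries causes more collisions than the one before it.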

Occupancy-Limit

In a CSMA data channel (which is what ham radio packet usually is), the TNCs decide when to transmit, based on configuration and on the TNC's observation of the channel. In any CSMA data channel with multiple contributors/initiators, there is an optimum amount of channel time which must be left empty for the next initiator's transmission to occupy. Once the channel gets too full, such that the optimum is no longer left unused, the total throughput on the channel will actually fall, rather dramatically.

The occupancy-limit is usually discussed in terms of how much of the channel is permitted to be occupied, i.e. the complement of the amount left unused.
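
The way throughput rises and then falls as the channel fills can be illustrated with the textbook pure-ALOHA formula S = G * e^(-2G), where G is the offered load and S is the traffic that actually gets through. This is only a sketch under that assumption: it is not a model of a CSMA channel, and the function name is ours.

```python
import math

def pure_aloha_throughput(offered_load):
    """Successful traffic S for offered load G under the pure-ALOHA model."""
    return offered_load * math.exp(-2.0 * offered_load)

# Scan offered loads from 0.01 to 2.00 and find where throughput peaks.
loads = [i / 100 for i in range(1, 201)]
best_load = max(loads, key=pure_aloha_throughput)

if __name__ == "__main__":
    print("peak at offered load", best_load)
    print("peak throughput", round(pure_aloha_throughput(best_load), 4))
    for g in (0.25, 0.5, 1.0, 2.0):
        print(g, round(pure_aloha_throughput(g), 4))
```

In this model the peak sits at an offered load of one half, with a throughput of 1/(2e), a bit over 18%. Pushing more traffic in past that point gets less traffic out.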

Depending on the network architecture, i.e. whether there are hidden transmitters involved, the occupancy-limit will vary dramatically. In the TNC design, carrier-sense-multiple-access, or CSMA, is part of the system and is a major design consideration. Using CSMA, the TNC will check the channel and decide to transmit only if the channel has been clear for a short time. P-Persistent CSMA, built into all TNCs, adds a probability factor and a Slot-Time so that a station does not claim the channel immediately even when it is quiet.
StackOverflow on P-Persistence may be a good source of reading on this topic.
If the network is topographically flat, i.e. everybody can see each other, the P-Persistence model works pretty well, and the occupancy-limit is higher. But we don't have topographically flat towns, counties, and states.

Because the layout of a packet radio network across terrain is almost always far more complicated than any of the mathematical models, we can't come up with a percentage number, as the Alohanet paper did. In that paper the Pure-Aloha network was a star, and the stations, except for the mountain-top, could not hear each other at all. In the Pure-Aloha case the percentage number for the occupancy-limit was around 18%. In a ham radio packet network based on the Sparse model, the occupancy-limit will probably be higher than that number.
In one easy-to-imagine example, however, it will be worse. See the graphic:

WEST-A                                                 EAST-E
WEST-B -------- WESTMTN ---------- EASTMTN ----------- EAST-F

Presume that WEST-A can see WEST-B and WESTMTN, but not EASTMTN or EAST-E or EAST-F.
Similarly, EAST-E and F can't see the WEST stations or WESTMTN.
To reach EAST-F, WEST-A would connect through WESTMTN and EASTMTN. Traffic launched from EAST-F to WEST-A would go EAST-F to EASTMTN to WESTMTN to WEST-A, and then WEST-A acknowledges back to WESTMTN, to EASTMTN to EAST-F.
If WEST-A reads a file from EAST-E at the same time EAST-F is reading a file from WEST-B, the occupancy percentages would have to be under 18% for this to function.
The reason for the low number is that neither EASTMTN nor the EAST stations know that WEST is trying to send data, so they will (apparently maliciously) jam the receiver at WESTMTN by trying to move their own traffic. The same thing happens from the other side.
This is an experiment worth running. From a god's-eye view, this is extraordinary to watch.
See also: FAQ-HTS
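
The hidden-transmitter problem in the example above can be spelled out with a small table of who hears whom. This is a sketch only. The text says what WEST-A can and cannot see; the rest of the table (that WEST-B hears the same stations as WEST-A, that EAST-E and EAST-F hear each other, and that the two mountain sites hear each other) is our assumption, as are the names HEARS and hidden_pairs.

```python
from itertools import combinations

# Who can hear whom. Entries beyond WEST-A's are assumed by symmetry.
HEARS = {
    "WEST-A": {"WEST-B", "WESTMTN"},
    "WEST-B": {"WEST-A", "WESTMTN"},
    "WESTMTN": {"WEST-A", "WEST-B", "EASTMTN"},
    "EASTMTN": {"WESTMTN", "EAST-E", "EAST-F"},
    "EAST-E": {"EASTMTN", "EAST-F"},
    "EAST-F": {"EASTMTN", "EAST-E"},
}

def hidden_pairs(receiver):
    """Pairs of stations the receiver hears that cannot hear each other.

    Carrier sense cannot keep such a pair from colliding at the receiver.
    """
    neighbours = sorted(HEARS[receiver])
    return [(a, b) for a, b in combinations(neighbours, 2)
            if b not in HEARS[a]]

if __name__ == "__main__":
    for site in ("WESTMTN", "EASTMTN"):
        print(site, hidden_pairs(site))
```

With this table, WESTMTN hears EASTMTN and both WEST stations, but EASTMTN and the WEST stations cannot hear each other, so CSMA gives WESTMTN's receiver no protection. EASTMTN is in the mirror-image position.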

A 'good' Sparse Network

There are single-frequency packet network designs which do not fail at the occupancy-limit. The APRS Beacon function is a good example. APRS is well known in the ham radio community, but there are other designs where the traffic will never reach the occupancy-limit.

A packet network built for sparse traffic can have digipeaters, single-port nodes, and multiple data sources. We'll call these networks "Flat Networks" because they have a trivial single-channel hierarchy. If the traffic volume is always less than some low occupancy-limit, then it is possible to build a very inexpensive network with multiple originators. Examples of these networks include utility meter reporting, alarm reporting and control, home automation, and paging.
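
A back-of-envelope way to check whether a design stays sparse is to add up the airtime. The sketch below does that arithmetic. Every number in it (station count, report rate, packet size, 1200 baud, 0.3 seconds of key-up overhead) is a made-up illustration, and the function name is ours; substitute figures from your own network.

```python
def channel_occupancy(stations, packets_per_hour, packet_bytes,
                      baud=1200, overhead_s=0.3, transmissions_per_packet=1):
    """Fraction of channel time in use, from 0.0 (idle) upward.

    overhead_s stands in for key-up delay and framing (an assumption).
    transmissions_per_packet counts the original plus any digipeater
    repeats made on the same frequency.
    """
    airtime_s = packet_bytes * 8 / baud + overhead_s
    busy_s_per_hour = (stations * packets_per_hour * airtime_s
                       * transmissions_per_packet)
    return busy_s_per_hour / 3600.0

if __name__ == "__main__":
    # 50 hypothetical meters, each sending a 100-byte report 6 times an hour.
    direct = channel_occupancy(50, 6, 100)
    # The same traffic relayed once by a digipeater on the same frequency.
    relayed = channel_occupancy(50, 6, 100, transmissions_per_packet=2)
    print(round(direct, 3), round(relayed, 3))
```

The second call shows why relaying on a single frequency matters: each repeat of a packet occupies the channel again, so one digipeater hop doubles the occupancy for the same amount of useful traffic.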

Congested model

In a connected-mode packet network, historically, each station has a configuration that defines the likelihood that a waiting packet will be transmitted. That configuration includes a figure for P-Persist and a figure for slot-time. P-Persist defines the likelihood that this station will transmit a pending packet. Each time a slot expires, the P-Persist chance is tested, and if it comes up true and the channel is clear, the station transmits. Slot-time is how long a slot lasts. Slots are not synchronized between stations; instead, slot-time is meant to cover the time it takes, after any station stops listening for channel-clear and goes into transmit, for its transmission to become detectable by this station. The P-Persist and slot-time configuration tries to keep channel occupancy below the threshold at which collisions become inevitable.

A network run near the saturation point can be called a congested model network.
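
The P-Persist and slot-time rule described above can be sketched in a few lines. This follows the description in this document rather than any particular TNC's firmware; the function names, the channel_clear callback, and the 0.25 figure used in the demonstration are assumptions for illustration.

```python
import random

def slots_until_transmit(p_persist, channel_clear, rng, max_slots=1000):
    """Count how many slot-times expire before the station transmits.

    p_persist: chance (0.0 to 1.0) of winning the roll at each slot.
    channel_clear: function taking the slot number, True if channel is quiet.
    Returns None if the station never transmitted within max_slots.
    """
    for slot in range(1, max_slots + 1):
        wins_roll = rng.random() < p_persist
        if wins_roll and channel_clear(slot):
            return slot
    return None

def average_wait(p_persist, trials=20000, seed=1):
    """Average wait, in slots, on a channel that is always quiet."""
    rng = random.Random(seed)
    waits = [slots_until_transmit(p_persist, lambda slot: True, rng)
             for _ in range(trials)]
    return sum(waits) / len(waits)

if __name__ == "__main__":
    print(average_wait(0.25))
```

On an idle channel the average wait comes out near 1 / P-Persist slots, so a lower P-Persist or a longer slot-time spreads stations out at the cost of leaving the channel quiet for longer.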

Backoff-And-Retry

Modern wireless data networks designed with the Congested Model use a replacement for P-Persist called backoff-and-retry. In backoff-and-retry the stations will maintain a delay count used to space out their data transmissions.
Each time a transmission fails to be acknowledged, the failing station will increase the delay by as much as 2x.
Each time a transmission is acknowledged, the station will decrease the delay by a small amount, 10% or so.
Eventually the station will learn what amount of data can be injected into the network without failure.
This results in a network which is adequately below the occupancy-limit; however, it also results in a network which uses [much] less than the theoretical bandwidth of the data modems.
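
The delay adjustment described above can be sketched as follows. The doubling on failure is the upper bound the text mentions ("as much as 2x"), and the 10% reduction on success is the figure given; the function name and the minimum and maximum clamps are assumptions added so the delay stays in a sensible range.

```python
def next_delay(delay_s, acknowledged, min_delay_s=0.1, max_delay_s=60.0):
    """Return the delay to use before the next transmission.

    Failure: back off hard (double the delay).
    Success: creep forward gently (shave 10% off the delay).
    """
    if acknowledged:
        delay_s *= 0.9
    else:
        delay_s *= 2.0
    # Clamp so the delay never reaches zero or grows without bound.
    return min(max(delay_s, min_delay_s), max_delay_s)

if __name__ == "__main__":
    delay = 1.0
    # Two lost packets followed by a run of acknowledged ones.
    for acked in (False, False, True, True, True, True):
        delay = next_delay(delay, acked)
        print("acked" if acked else "lost ", round(delay, 3))
```

Because the delay grows much faster than it shrinks, a station settles at a sending rate somewhat below what the channel could carry, which is the trade-off noted above.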

When to use a Sparse Packet Network

Sparse networks are useful so long as:

The amount of traffic stays below the occupancy-limit,
AND, even if collisions occur, the loss of traffic either doesn't cripple the desired functionality, or retries don't push the traffic volume over the occupancy-limit.

Another class of applications which works adequately on a sparse-traffic packet network is where there is a single source of data, even if there are multiple destinations. The single source can modulate its output rate to verify delivery to the multiple destinations.

Situations that do not work on a sparse-traffic packet network include cases where there are multiple simultaneous data sources and the sum of traffic exceeds the occupancy-limit, or where there are high numbers of individual small-data sources, again going over the occupancy-limit.

Traditional Ham Radio Packet-Network

Single frequency Ham Radio packet-radio-networks, the most common type of Ham Radio packet-radio-network, are traditionally built on the Sparse-Network model, but are then used as congested or saturated networks, and without backoff-and-retry. Ham Radio packet-radio-networks, if actually used, reach their own occupancy-limit frequently, and the involved stations are stalled and then disconnected. This results in tremendous frustration.

Saturated Network

A TARPN is designed to never see this problem by moving each step in the link chain to a different frequency and/or band, so the relayed information never interferes with the originator.