First read FAQ HTS and FAQ CSMA

Networking on Purpose -- by Tadd, KA2DEW

By the late 80s it became possible to configure a packet network router with two or more radios. It was now possible to multiply the bandwidth of the network, and also to avoid interference between separate channels and sites. Having multiple channels upped the cost of the network routers, made some stations not see each other since they'd be on different channels, and made network routing difficult. However, having the ability to select frequencies gave us the opportunity to pick and plan networking strategies and to ease congestion.

This web page talks about different options we have when building out a network of packet stations and compares each. If you can think of a configuration I missed or can critique this page, please email me at my QRZ-listed address.

The TARPN packet network is made up primarily of low technology digital modems and off-the-shelf radio hardware. Packet radio with this kind of equipment has a bad reputation because packet networks with no design intent, or with poor design, have performed badly. Typically, network operators attempt to fix badly performing networks by insisting on user access policies, regulating who gets access to all or parts of the network, and when.

I think with good design we can get much better results. For the purposes of this discussion, consistently getting 40 characters per second between any neighboring stations, regardless of number of participants or content, would be good results.

Here is a fiction (based mostly in fact) about how the systems we have seen come and go were implemented.

Five hams make a packet network -- mesh-on-channel

If you have five people sharing a single 1200 baud channel, each using a radio and a TNC, it looks like the group would have a maximum of 1200 bits per second (max of about 90 characters per second in packets) to share between the five people.
[Figure: csma_5_stations002_no_hill]

Because the stations would naturally transmit at the same time if you let them, each of the five stations is set up to implement CSMA using SlotTime and Ppersistence calculations. If Ppersistence and SlotTime were not used, two stations at a time could achieve 90 characters per second, but that rate would fall to 0 very quickly if a 3rd station joined in. This collapse to a zero rate is called catastrophic network failure.
With Ppersistence and SlotTime, the channel is slowed down to reduce the chance of collision. With no collisions, under Ppersistence and SlotTime, an 80 character packet message and acknowledgement takes 5 seconds to exchange on this shared channel, or 16 characters per second. The group of five is now sharing 16 characters per second. If there were a magical supervisor guiding the stations to transmit at the appropriate times, the channel could achieve 90 characters per second, but there is no magic here. Instead we have Ppersistence and SlotTime.
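
The Ppersistence and SlotTime mechanism can be sketched in a few lines. This is a rough illustration, not the firmware of any particular TNC. It assumes Ppersistence is a 0 to 255 value compared against a random byte each slot, which is how the parameter is commonly described, and the function names are made up for this example.

```python
import random

def slots_before_transmit(persistence, channel_busy, rng):
    """Count the SlotTime periods a station waits before it may key up.

    persistence  -- Ppersistence value, 0..255 (assumed KISS-style scale)
    channel_busy -- callable returning True while another carrier is heard
    rng          -- random.Random instance, passed in so runs are repeatable
    """
    slots = 0
    while True:
        # Transmit only if the channel is clear AND the dice roll passes.
        if not channel_busy() and rng.randint(0, 255) <= persistence:
            return slots
        slots += 1  # otherwise sit out one SlotTime and try again

def average_wait_seconds(persistence, slot_time, trials=10000, seed=1):
    """Average delay on an idle channel, estimated by simulation."""
    rng = random.Random(seed)
    idle = lambda: False  # nobody else is transmitting
    total = sum(slots_before_transmit(persistence, idle, rng) for _ in range(trials))
    return slot_time * total / trials
```

With the channel idle and a Ppersistence of 63 (a one-in-four chance per slot), a station sits out about three slot times on average before keying up. The wait grows quickly as Ppersistence is lowered to suit more stations.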

Practically speaking it is worse than that, because as the number of users grows, not everybody can hear everybody. Take this example:
[Figure: csma_5_stations002_one_frequency]
The grey thick lines are radio paths between the stations. Everybody is on the same frequency.
Sergey can hear Bob and Bimaldo, but not Sigmund or Anna-Mae. That means that Sigmund might start transmitting while Sergey is already on the air.
Before reading further, check out this article: FAQ HTS

If Sergey sends a packet to Bimaldo, and Sigmund goes on the air at the same time, Bimaldo would miss both packets. Sergey will not retry until his FRACK delay runs down. Using TNC default numbers, that's a delay of about 4 seconds. If Sigmund is sending to Bimaldo, his packet also would have to be resent. Sigmund would be able to talk to Anna-Mae, and get acknowledgements despite Sergey's retries. However, when Bimaldo finally does answer Sergey, that could interfere with Anna-Mae and Sigmund's communications. These packet collisions result in retries which result in more transmission time just to keep up with the packets ready to go out. This temporarily decreases the channel's available capacity. Also, in order for Bob to talk to Anna-Mae, his packets have to survive the gauntlet of collisions, and the delays of Ppersistence 3 times out, and 3 times back. Even if his packet were the only message in flight, it would take 18 seconds for him to get a response, best case. If the channel were loaded with all five stations being active, it would take much longer than that. Collisions in an HTS environment are really a bad problem.
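
To make the hidden transmitter situation concrete, here is a small sketch that lists which pairs of stations can collide at a given receiver without hearing each other. The chain of radio paths below is an assumption: the text only states who Sergey can and cannot hear, and the rest is filled in to be consistent with the example. The names HEARS and hidden_pairs are invented for this illustration.

```python
from itertools import combinations

# Assumed radio paths (who can hear whom). Only Sergey's entry is stated
# outright in the text; the others are a guess that fits the example.
HEARS = {
    "Bob": {"Sergey"},
    "Sergey": {"Bob", "Bimaldo"},
    "Bimaldo": {"Sergey", "Sigmund"},
    "Sigmund": {"Bimaldo", "Anna-Mae"},
    "Anna-Mae": {"Sigmund"},
}

def hidden_pairs(hears, receiver):
    """Pairs of stations the receiver hears that cannot hear each other.

    Each such pair can transmit at the same time without knowing it,
    and the receiver loses both packets.
    """
    neighbors = sorted(hears[receiver])
    return [(a, b) for a, b in combinations(neighbors, 2) if b not in hears[a]]

print(hidden_pairs(HEARS, "Bimaldo"))
```

Under these assumed paths, Sergey and Sigmund show up as a hidden pair at Bimaldo's receiver, which is the collision described above.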

A more modern management of the stations (employed by AX.25 built into Linux) would use a backoff-and-retry scheme where the time between retries doubles with each failure and then very slowly shrinks with each success. Mathematically this could eventually reach a situation approaching the Aloha limits where the channel is only 18% busy at each receiver. While this performance is the same as I detailed above, it scales automatically. The failure of automatic backoff-and-retry occurs in a mesh system when success is available to the stronger station that captures the target receiver (see capture effect) while a weaker station consistently fails to capture on the same receiver. Eventually the weaker station is forced to slow way down while waiting for the channel to be nearly completely clear, before being able to access the required receiver. A manually (and correctly) configured Ppersistence model would give the weaker transmitter much better access to the required receiver. On a channel with transient users you can't correctly and manually configure the Ppersistence with existing systems, because the Ppersistence would have to change each time a new user joined or left and this information isn't propagated by any of the existing systems.
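
A backoff-and-retry rule of this kind, along with the textbook pure ALOHA throughput curve behind the 18% figure, can be sketched as follows. This is not the Linux AX.25 code. The doubling on failure comes from the description above, while the shrink factor, floor and ceiling are placeholder assumptions chosen only to show the shape of the behavior.

```python
import math

def next_retry_delay(delay, succeeded, floor=1.0, ceiling=64.0, shrink=0.9):
    """Double the delay after a failure, ease it down slowly after a success.

    floor, ceiling and shrink are illustrative assumptions, in seconds
    and as a multiplier respectively.
    """
    if succeeded:
        return max(floor, delay * shrink)
    return min(ceiling, delay * 2.0)

def pure_aloha_throughput(offered_load):
    """Fraction of channel time carrying good packets under pure ALOHA.

    Textbook formula S = G * exp(-2G), where G is the offered load.
    """
    return offered_load * math.exp(-2.0 * offered_load)

# A weak station that loses three captures in a row, then gets through once.
delay = 1.0
for outcome in (False, False, False, True):
    delay = next_retry_delay(delay, outcome)

peak = pure_aloha_throughput(0.5)  # the curve peaks at G = 0.5
```

Three failures push the weak station's delay from 1 second to 8, and one success only brings it back to 7.2. A station that keeps losing the capture contest ends up waiting far longer than the stronger station it is competing with.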

Mesh On Channel Advantages:

trivial to implement
trivial to reconfigure
automatic best-radio-link selection is easy, i.e. MESH
No server vs user class system.
Serving as a relay point is trivial and built in to the basic equipment. This invites competition to provide relay capability.

Disadvantages:

works badly when multiple stations are active if stations don't use Ppersistence and SlotTime effectively.
favors stronger links over weaker links in the case of capture effect
antennas for packet stations should be omnidirectional (less gain per cost) in order to reduce hidden transmitter problems.
maximum power/maximum signal footprint is preferred for collision reduction (in any small area) and for increased relay performance.
network will be slow when used for long-range multi-hop connects
alignment (fine tuning/audio adjustment) of radio and TNC involves many unknowns, so performance will be much reduced
extremely difficult path to upgrade since all hams must upgrade together
interference source or jamming would require coordination to spot. No avoidance would be possible due to fixed frequency.
performance of the LAN is exponentially reduced by increase in the number of on-line users and is subject to increasingly frequent catastrophic network failure as on-line activity grows

One feature which is worth noting is that Bob and Sergey can converse without interfering with Anna-Mae and Sigmund.

Mountain top relay -- mesh-on-channel

Historically, the next advancement for this little network would be to activate a relay station on Hill Mountain, enabling Bob to talk to Anna-Mae by way of the hilltop.
[Figure: csma_5_stations002_digipeater]

Adding a node or digipeater on the mountaintop makes things better, but only if nobody is actually sending packets through that mountaintop station. If 3 stations go on the air via the mountain top during the same time-period, the mountaintop makes things much worse for the stations. It has some social results as well.
Since the mountaintop relay exists, operators no longer need to maximize the antenna at their home station to encompass as much of the network as possible. Instead they can focus on getting a good signal into and out of the mountain. This results in the creation of even more hidden transmitters. Before, when you only had stations operating from their houses, a collision would occur when the several other stations within range of one station were all on the air. Now a collision can occur with any two stations trying to send to the mountain that are not within range of each other.

The mountain-top station may also be located at a site which cannot fall under the same service regimen as the home stations. This is especially true if the access to the mountain is conditional on 3rd parties. Hams tend to fix their own stuff, especially during an emergency and regardless of worsening conditions. Commercial vendors tend to fix the high priority stuff but sometimes not during emergencies. It is not unheard of that the mountaintop site, if commercial or government, would be unusable at the very moment when it is most needed.

Impact of inconsistent channel capacity
Key factors in keeping collisions under control are the Ppersistence and SlotTime figures. The point of these figures is to delay transmission, making it less likely that two stations transmit at the same time. In a local network where everybody can hear everybody else, the SlotTime figure is set to the amount of time that a station is unable to listen when it is switching to transmit. The Ppersistence covers the likelihood that two or more stations might want to go from listening to transmitting at the same time. Once the station is actually transmitting, there is no longer danger that another station may mistake the channel for clear. Ppersistence is calculated from the number of other stations that can collide at the receiver of the message, not the transmitter.

In a situation where the participants are transmitting to a mountain-top relay, the stations will be heard by the mountain-top for the entire duration of the transmission but not by most of the stations transmitting to the mountain. If a mountain top location can hear 20 stations currently involved in packet operations, and most stations can't hear the other stations, then every station coming into the mountain needs to set Ppersistence to work out to a chance of one transmission in 20. SlotTime will then be the total duration of the receive to transmit switching time, plus the on-the-air time. That's 4 seconds or so. That could mean that on average, a station will wait for 20 x 4 seconds before transmitting, even if transmitting a retry. That's slow and the participants will never put up with it. Even if agreements were reached to make things work that way, human nature would have some of the participants cheating. The default value for Ppersistence in a KPC-3plus is 64, accounting for 5 stations on the LAN. The default value for SlotTime is 100 ms or 1/10th of a second. For a mountain top situation the user station delay numbers are completely ineffective. This means that as soon as there are more than a few stations on-the-air, retries will start occurring, resulting in an even higher loading of the channel. The extra loading will build once it starts, resulting in all of the packet stations getting disconnected -- 0 characters per second. This is what we call a catastrophic channel meltdown.
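
The arithmetic in the paragraph above can be sketched like this. The mapping from "a chance of one in N" to a 0 to 255 Ppersistence value is an assumption based on the usual byte-sized scale, not a documented formula for any particular TNC, and both function names are invented here. The waiting time uses the same rule of thumb as the text: N stations times SlotTime.

```python
def persistence_for(contenders):
    """Ppersistence value (0..255) giving roughly a 1-in-N chance per slot.

    Assumes the TNC transmits when a random byte is <= the stored value,
    so a chance of 1/N corresponds to about 256/N - 1.
    """
    return max(0, min(255, round(256 / contenders) - 1))

def rule_of_thumb_wait(contenders, slot_time_seconds):
    """Average wait before transmitting, as estimated in the text."""
    return contenders * slot_time_seconds

mountain_persistence = persistence_for(20)      # 20 stations heard by the hilltop
mountain_wait = rule_of_thumb_wait(20, 4.0)     # SlotTime covers the whole packet
default_style_wait = rule_of_thumb_wait(5, 0.1) # small LAN, 100 ms SlotTime
```

For 20 hidden stations and a 4 second SlotTime the wait works out to 80 seconds, against half a second for a five station LAN with a 100 ms SlotTime. That gap is why settings tuned for a small LAN do nothing useful on a mountaintop channel.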

Practically speaking this means that the packet channel will only be useful so long as only 2 stations are on the air within the range of the mountain top relay station, or if the stations sending data are doing it at much less than the channel capacity, leaving the channel 80% empty, from the perspective of the mountaintop receiver, i.e. as low as 8 characters per second. In calculating the channel bandwidth it is not necessary to count the transmissions from the mountaintop relay since everybody can hear it, but that is only true if there is one and only one mountaintop relay on that frequency and in range of the mountaintops!
Note! One way to dramatically increase channel capacity is to strike the requirement for acknowledgements and retries. This makes collisions much less relevant. This is the mode APRS runs in.

To restate: If the TNCs were set up for optimal operation via the mountaintop relay, the total channel capacity (all users combined) would be about 4 characters per second through the mountain top relay. If the TNCs were set up to default values, the channel capacity would be about 10 characters per second but it would only work if there were only 2 stations plus the mountain. However, the likelihood is that as soon as the hungry packeteers see that the channel is actually working again, they will join in and the capacity now goes back to 0.

Advantages of using a mountaintop relay:

trivial to implement
trivial to configure, except for Ppersistence and SlotTime
since all users are set on the same frequency, fallback in case of mountaintop relay failure is easier
automatic best-radio-path selection is easy, i.e. MESH

Disadvantages:

works badly in all cases where channel into the mountaintop relay gets more than 20% saturation
in real-use situations the mountaintop can often hear another mountaintop on the same frequency, drastically reducing available bandwidth
very difficult to come up with reasonable Ppersistence and SlotTime settings
impossible to assure that bullies don't grab the channel and cause jamming (unintentional or otherwise)
favors stronger station over weaker
no path to upgrade since all hams using the mountaintop relay must upgrade together
antennas for packet stations should be omnidirectional (less gain per cost) in order to reduce hidden transmitter problems.
maximum power/maximum signal footprint is still preferred for collision reduction (in any small area) and for increased relay performance.
value of mountaintop relay creates a development trend where low-profile users no longer work on range to neighbors, resulting in more HTS problems and less survivability
mountaintop relay is subject to Exposed Receiver Syndrome making it nearly useless during prime time
interference source or jamming impacts the relay or could impact other users and would require coordination to spot. No avoidance would be possible due to fixed frequency.
due to high coverage, the mountaintop becomes a required asset causing apparent loss of value of 'normal' locations
mountaintop location, achieving high importance to the network, would devastate local emergency preparedness if it has limited service access due to security and management control
performance of the LAN is exponentially reduced by increase in the number of on-line users and is subject to increasingly frequent catastrophic network failure as on-line activity grows
supports a sysop vs users class system

Digital repeater as packet relay

One solution for the hidden transmitter problem is to convert the Hill Mtn site into a repeater. The repeater would use a TNC modem for reception and for transmission but no error checking would be done. Very low latency is required to permit CSMA to operate.
[Figure: csma_5_stations002_repeater]

Advantage to a repeater vs a single-frequency relay:

everybody on the channel can hear everybody else -- no hidden transmitters
latency is very good since the packet is not decoded and resent by the repeater. Everybody is the same distance
alignment (fine tuning/audio adjustment) of user station equipment need only be optimized to talk to the repeater receiver and transmitter, allowing for optimal audio alignment at the user stations
the TNC default timing values for CSMA are now within the realm they need to be to work correctly
CSMA system can actually work. It still depends on Ppersistence being approximately lined up with the number of stations on the air at a time
lower power may be used for some stations and yagis may be used.
likelihood for catastrophic melt-down is very much reduced compared to the single frequency solution

The total capacity of a repeater channel is about 20 characters per second with five users if the stations are setting Ppersistence correctly.

Disadvantages:

individual packet stations are now built without a requirement for being able to reach any neighbors directly. The discipline and expertise required for surviving a mountaintop site outage are not practiced.
automatic best-radio-path selection is not possible since client radio is not on shared frequency outside of repeater LAN
the repeater is also likely to be located at a site not serviceable by most of the hams, or any, depending on the site.
Emergency survivability is not assured.
The radios used by most of the hams are set for duplex to operate the repeater. This means the radios at the user site can't ever be used as relays themselves anymore unless they are manually reconfigured.
repeater users have to set Ppersistence and SlotTime appropriately and thus to not hog the channel
favors stronger station over weaker
channel occupancy still has to be somewhat low. If the system is set up for five users then you would see about 4 seconds or more per packet
exponential throughput degradation as user population increases
interference source or jamming impacts all users and would be difficult to spot
no path to upgrade since all hams using the mountaintop relay must upgrade together
supports a sysop vs users class system
requires specialized node hardware and configuration -- some station or stations in the network needs to be using a network routing system in order to get traffic in and out of the repeater LAN.

With the repeater in place, Anna-Mae and Bob can see each other's stations on the channel. If they have their Ppersistence and SlotTime set up appropriately (the defaults actually) then the latency on the link will be about 4 seconds and they will be able to send an 80 character packet through in about 4 seconds assuming they are alone on the repeater. If there are five different transmissions with acknowledgements in flight, they will have considerably less throughput, between 20 seconds per packet (4 characters per second) and much slower. Latency would go up to a variable rate between 4 seconds and maybe 30 seconds.

Dedicated point‑to‑point links

Every channel has only two stations on it in any given area. Stations will have the same number of TNCs as they have neighbors. The TNCs are set up so Ppersistence and SlotTime are removed from the equation. The rate of packet transfer is one packet every 2 seconds giving us a throughput of from 40 to 90 characters per second depending on radio tx/rx/tx switchover times and depending on size of packet.
[Figure: csma_5_stations002_dedicated_links]
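
The throughput figures can be checked with one line of arithmetic. The sketch below uses only numbers quoted in this article: an 80 character packet exchanged in 5 seconds on the shared channel, and one packet every 2 seconds on a dedicated link. The function name is invented for the example.

```python
def chars_per_second(packet_chars, seconds_per_packet):
    """Throughput when one packet of the given size moves per interval."""
    return packet_chars / seconds_per_packet

shared_channel = chars_per_second(80, 5.0)   # five stations sharing one channel
dedicated_link = chars_per_second(80, 2.0)   # two stations on their own channel
```

The dedicated link gives each pair of neighbors 40 characters per second to themselves, while the shared channel gives 16 characters per second to be split among all five stations, and only when nothing collides.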

While Hill Mountain may still be applied as a hilltop relay, we're using it only as a link between Bob and Anna-Mae. Having a complete loop gives two paths from any station to any other. This improves throughput and allows a redundant path between any two stations. The mountain, while possibly harder to be physically visited, is not unique in that if it were to go away, the stations still have the other route.
The latency across any link is dependent on packet backlog but generally works out to about 2 seconds per hop for a large packet and 1 second per hop for short packets. There aren't any collisions or usage-caused catastrophic channel meltdowns. The cost of the system is about twice what it would cost for single radios at each site but the throughput is from 5 to 500 times as good, depending on loading. Each of the network sites is notionally expandable by adding more radios, though doing so at a commercial tower site can be a bit hard. Adding links at the individual ham-shack level is pretty easy.

Anna-Mae and Bob can connect through the network to each other. If they go through the Hill Mountain site they will have a latency of as low as 4 seconds for an 80 character packet and a throughput of better than 40 characters per second. If they went around the mountain through Sergey and Bimaldo the latency goes up to at least 6 seconds but the throughput is still better than 40 characters per second, assuming they are the only channel users. If the channel usership goes up (due to stations elsewhere in the network passing through the same nodes), the latency will increase and the throughput will decrease, but at no point does everybody get dumped. The throughput is deterministic in that there are no exponential scaling costs of adding more users. Collisions do not occur. Life is good.
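
The two routes can be compared with a short sketch. It uses the figure of about 2 seconds per hop for a large packet quoted earlier in this section. The order of stations on the route around the mountain is an assumption drawn from the sentence above, and the names are invented for the example.

```python
SECONDS_PER_HOP = 2.0  # large-packet figure quoted in the text

def path_latency(path, seconds_per_hop=SECONDS_PER_HOP):
    """Best-case one-way latency: one link traversal per hop, no backlog."""
    hops = len(path) - 1
    return hops * seconds_per_hop

via_hill = ["Bob", "Hill Mountain", "Anna-Mae"]
around = ["Bob", "Sergey", "Bimaldo", "Anna-Mae"]  # assumed station order

hill_seconds = path_latency(via_hill)
around_seconds = path_latency(around)
```

This gives 4 seconds through the hilltop and 6 seconds around it. Each extra hop adds a fixed cost, so latency grows in a straight line with path length instead of blowing up with the number of users.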

There is a timeout (L4TIMEOUT) for each station-to-station connection through the network. If it takes more than L4TIMEOUT to transact any single packet across the network, then that packet is regenerated, or the session will get disconnected. L4TIMEOUT is usually quite a bit higher than the typical latency. The disadvantage of a high L4TIMEOUT is that if the link is broken for some reason, it takes you a long time to find out.
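
The timeout decision can be sketched as a small function. This is an illustration of the behavior described above, not the logic of any particular node software. In particular, the retry limit that decides between regenerating the packet and disconnecting is an assumption, and the names are hypothetical.

```python
def l4_action(seconds_waiting, l4_timeout, retries_used, max_retries):
    """Decide what to do with a packet that has not been acknowledged yet.

    max_retries is an assumed limit; the text only says the packet is
    regenerated or the session is disconnected.
    """
    if seconds_waiting < l4_timeout:
        return "wait"          # still within L4TIMEOUT, keep waiting
    if retries_used < max_retries:
        return "resend"        # timed out, regenerate the packet
    return "disconnect"        # timed out too many times, drop the session
```

With a timeout of T seconds and R retries allowed, a dead link is only reported after roughly T x (R + 1) seconds, which is the long wait mentioned above.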

Advantages for dedicated links over any other system:

no magic supervisor needed to get best case performance as there are 0 collisions, ever
pairs of transceiver/TNC combinations could be optimized for each other. Frequencies chosen, antennas aimed, timing and levels customized
no need to use Ppersistence or SlotTime and no local arguments about the best values for this kind of congestion control parameter
power levels can be reduced to whatever is needed for a link, instead of max all-the-time
Antennas can also be chosen to suit the links
path to upgrade involves only two stations at a time so this can even be done experimentally
interference source or jamming only impacts one link and could be trivially configured out via a frequency change or antenna change
at 1200 baud, 40 characters per second is easy, 90 is possible
increases in traffic just increase latency, but never result in catastrophic meltdowns
excessive retries or other failures are trivial to diagnose since there is only one other station
automatic routing is performed within the context of the defined radio paths
everybody gets to run a backbone connected node. No sysop vs users mechanism imposed
hilltops, while still valuable, only get unique value if an around the hill network is impractical -- there is justification for linking around the hill

Disadvantages:

requires specialized node hardware and configuration
all connections are pre-arranged -- while this increases camaraderie, it doesn't come naturally
non-portable in the context of one station moving to a new location
automatic best-radio-path selection is not practical since pointing a radio at a new destination radio (and frequency and direction) necessarily removes the radio from prior agreed upon link
since all stations are relay points and always up, this ties up station radios even if the ham isn't on packet
antennas are also tied up in the packet operation
bands are (at least partially) tied up due to use of multiple bands at same site just for packet. There are only so many ham bands
The biggest performance problem is bad links. In many cases a lack of redundancy (non-mesh) forces a high percentage of network traffic across a bad link, resulting in bad throughput for everybody. Since the network topology does not typically permit mesh-on-channel, the bad link becomes a focus. It would take a strong community to keep all links in repair, or a high station density to permit rerouting around bad links.

Restating the in-obvious

The typical packet radio network (mesh-on-channel) consists of frequencies designated for specific kinds of service, and where the usage of that frequency spans far beyond the simplex range of any of the participants. What this means is that each frequency can be looked at as a mountain-top digipeater/node supported LAN where there are multiple mountaintops. Each mountaintop sees only a segment of the users, and doesn't even see all of the mountaintops. Over the large area, and with half a dozen users/services on line at a time, the performance swings back and forth between slow (a few lines of text per minute) and outright failure with disconnects, making it completely useless for live operators who would be totally frustrated with the performance.

A network with hidden transmitters like that described in the previous paragraph, with a dozen or so users on it at a time, will give a throughput per user measured in fractions of a character per second. It is mind-bogglingly slow. The station cost per unit of throughput is on the order of multiple hundreds of $ per character per second. That's ridiculous. It is a fundamental problem. Even multiplying the bit rate by 48 (i.e. to 56Kbaud) will only increase the available bitrate to 10s of characters per user per second and at a cost which is not much better. If the network becomes popular after that upgrade, it will gain more users and, because of the fundamental problem, will again be brought to its knees.

A dedicated link based network will consistently deliver text in a stream, leave connections up for days, and allow for 10-hop wide networks to have latencies of under a minute, all at 1200 baud. A dedicated link network could be from hundreds of times to thousands of times faster, with the same basic radio and modem equipment. Even with a 4 port node at every station, the cost of bandwidth is only about $1000 divided by 40 characters per second, or $25 per character per second. That would build a network which could reliably deliver bandwidth, assuming links which were set up well enough (and that IS going to be the hard part).
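
The cost and latency claims in the paragraph above reduce to simple arithmetic, sketched here. The $1000 station cost and 40 characters per second come from the text. The 2 seconds per hop is the large-packet figure quoted earlier in this article, applied to a 10-hop path as a best case with no backlog. The function name is invented for the example.

```python
def cost_per_cps(station_cost_dollars, chars_per_second):
    """Dollars spent per character-per-second of delivered throughput."""
    return station_cost_dollars / chars_per_second

dedicated_cost = cost_per_cps(1000, 40)   # 4 port node at every station
ten_hop_latency = 10 * 2.0                # seconds, best case with no backlog
```

That works out to $25 per character per second and a best-case 20 seconds across ten hops, which leaves room for some backlog before the one minute mark is reached.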

There are ways to fix an old-school network, knowing what we know.
The first is to move some of the devices (users or services) to HTS-free controlled zones, or to repeaters. The second is to move some of the devices to dedicated links. Perhaps both fixes would be implemented in stages and in various places. Eventually every system which generates traffic at rates higher than 80 characters (one line) every minute onto a channel with out-of-sight stations should be moved to a dedicated link. LANs should be broken up into no-HTS zones of no more than 10 stations.

Moving wholesale to dedicated links has many advantages but is very hard and expensive when considered as a network and all at once. Perhaps the way to start is to work on a whole-new parallel network, maybe not even connected to the existing network, and definitely not connected to the Internet. Once the parallel network gets big enough (and it will take a long time) you can start co-opting services. Getting keyboard-ops (live human operators) over to the parallel network is actually pretty easy if you can find any survivors. Demos of the parallel network are always interesting. It either lights up their eyes or antagonizes the crap out of them.

Which disadvantages to dedicated links matter?

Pre-arranging connections, and tying up radios, are the big objections for most poll respondents. That shouldn't affect service providers, however.

In this day and age, the biggest real limits to implementing a dedicated link network from scratch are:

Interested stations tend to be too far apart for non-hilltop connections.
Supporting multiple radios at the hilltop or tower relay location is difficult or not permitted. Certainly existing leftover commercial antennas are not going to be enough to support a multi-band relay device
Since hams these days tend to think of vhf and uhf as talking-to-the-repeater, they are not prepared to own and operate, much less dedicate, decent terrestrial antennas. This is probably the biggest problem with making a terrestrial-only packet radio dedicated link system.
The second biggest real problem is the bad-links scenario where a link can be present, but not be strong enough to be reliable, i.e. too far apart. The participants in the network need to keep in mind that redundancy is the key to dependable network operation.

TARPN project -- FAQ: Networking‑On‑Purpose summary

The TARPN project aims to make setting up networks of dedicated links practical. We want friendly, easy, cheap, reliable, and expandable. We're prepared to try it without commercial tower locations though that may not be reasonable in mountainous environs.

Dedicated links give us excellent performance and easy upgrade paths.