The 100-Million-Mile Network

By David F. Carr  |  Posted 2004-02-06

Eighteen days after landing on Mars, the robotic explorer named Spirit squawked in distress and went silent for nearly 24 hours.

Listening anxiously for any sign of life were navigators at the Jet Propulsion Laboratory (JPL) in Pasadena, Calif. They had to fix a broken interplanetary communications link that reached more than 100 million miles (and counting: the distance keeps growing as Earth and Mars draw apart in their orbits).

"The most difficult thing is to know how to talk to the spacecraft when you're getting no response from it," says Douglas J. Mudgway, a former National Aeronautics and Space Administration (NASA) engineer who managed communications with the Viking landers in the 1970s and helped save the Galileo mission in the early 1990s.

Spirit was exploring the Gusev crater on Mars on Jan. 21, and was already sending back spectacular photographic images. The wandering robot had rolled out of its landing nets and had approached a rock to take measurements using an appendage called the Rock Abrasion Tool.

Diagnosing what was wrong with Spirit depended on interpreting squawks, tones and other sounds traveling along a conduit dubbed the Deep Space Network.

Operators of this interplanetary signaling system send commands to and listen for data from "nodes" such as Spirit and its twin rover, Opportunity, using three facilities spaced roughly one-third of the way around Earth from one another. These communications complexes are in Goldstone, Calif.; near Madrid, Spain; and near Canberra, Australia.

This geographic separation means that, as the Earth rotates, at least one of these listening posts will be able to point its antennae toward the spacecraft being tracked at any given moment. Designed much like radio telescopes, the antennae are parabolic dishes as large as 70 meters in diameter (although the trend for the future is to use arrays of smaller antennae).
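The coverage argument can be checked with a rough sketch. The station coordinates below are approximate, and the 6-degree minimum elevation and the assumption that the spacecraft sits over Earth's equator (declination 0) are simplifications chosen for illustration, not figures from the Deep Space Network.

```python
import math

# Approximate station coordinates: (latitude, longitude) in degrees, east positive.
STATIONS = {
    "Goldstone": (35.4, -116.9),
    "Madrid": (40.4, -4.2),
    "Canberra": (-35.4, 149.0),
}

def elevation_deg(lat, lon, sub_lon, declination=0.0):
    """Elevation of a very distant spacecraft as seen from (lat, lon).

    sub_lon is the longitude on Earth directly "under" the spacecraft;
    it sweeps through all values once per rotation of the planet.
    """
    lat_r = math.radians(lat)
    dec_r = math.radians(declination)
    dlon = math.radians(lon - sub_lon)
    cos_zenith = (math.sin(lat_r) * math.sin(dec_r)
                  + math.cos(lat_r) * math.cos(dec_r) * math.cos(dlon))
    cos_zenith = max(-1.0, min(1.0, cos_zenith))  # guard against rounding
    return 90.0 - math.degrees(math.acos(cos_zenith))

def visible_stations(sub_lon, min_elevation=6.0, declination=0.0):
    """Names of stations that can see the spacecraft above min_elevation."""
    return [
        name
        for name, (lat, lon) in STATIONS.items()
        if elevation_deg(lat, lon, sub_lon, declination) >= min_elevation
    ]

# Count visible stations for every whole degree of Earth's rotation.
coverage = [len(visible_stations(lon)) for lon in range(-180, 180)]
print("fewest stations in view at any moment:", min(coverage))
```

Under these assumptions the count never drops to zero, which is the property the three-site layout is meant to provide. A real visibility calculation would also account for the spacecraft's actual declination and each site's local horizon.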

During normal operations, the rovers communicate directly with Earth when receiving instructions or sending back diagnostic information. They send back the bulk of their scientific data and photographs by using NASA's Mars Odyssey and Mars Global Surveyor probes as relay stations. These unmanned craft orbit the red planet carrying cameras, high-gain and ultra-high-frequency (UHF) antennae, and other scientific instruments.

The omnidirectional mast antenna sticking up from each rover's top like a dorsal fin knows when to transmit by listening for a signal that one of the orbiters is passing overhead. The orbiter then uses its more-powerful antenna to send as many as one million bits of data per second back to Earth. While fairly fast for an attenuated radio connection, that's only about a tenth of the speed of a cable-modem connection for the average home-computer user.

The rover-to-orbiter link uses UHF radio, the same basic technology used for broadcasting channels 14 and higher to television sets in the United States, while long-haul communications to Earth use X-band radio, which is a higher frequency (about 8 gigahertz) and easier to focus into a tight beam.

For critical commands, the rovers do communicate directly with Earth over X-band. Each rover has directional antennae that provide relatively strong signals that make it easier for the ground stations on Earth to filter out space noise and terrestrial interference. The omnidirectional antenna can also send and receive X-band when the directional one is not aimed properly.

Despite all this radio power, it's not unusual for a connection to be lost, at least temporarily. When Spirit landed the night of Jan. 3, the cheering in the JPL control room (over a series of simple radio tones indicating the lander had survived its fiery descent and dropped to the surface within a protective cluster of airbags) abruptly ended with the announcement, "We currently do not have signal from the spacecraft."

Cosmic Cheering Resumes

For 15 nail-biting minutes, everyone from NASA Administrator Sean O'Keefe to the most junior member of the Mars Exploration Rover project waited for the signal indicating the lander was still alive. The cheering resumed when the signal came.

Spirit's outage weeks after that brief silence left engineers guessing. JPL engineers could keep sending new commands to the rover, but they had no way of knowing whether it was listening or whether new instructions might do more harm than good.

Often, a communications breakdown spells the end of a mission, as appears to be the case for the European Space Agency's Beagle 2 Mars lander, which failed to radio back after touchdown in December. On the other hand, the rovers are designed to be as autonomous and resilient as possible, meaning that they will try to debug their own problems and radio diagnostic information back to Earth even if they are not receiving commands.

For Joseph Wackley, the Deep Space Network's mission system operations manager, the silence meant he had to fortify the network to make sure Spirit's signal would not be missed when, or if, it came. NASA brought on additional staff and powered up onsite generators to guarantee that antennae would be up and running.

"That's exactly the nightmare," says Wackley, who was at a Pasadena facility of subcontractor ITT Industries the night Spirit landed. "We have to make sure it's not us contributing to why they are not seeing the signals."

When trying to regain a connection, the Deep Space Network puts its reception equipment in a "closed-loop" mode, continually scanning a range of frequencies around the expected one, looking for some kind of signal that it could "lock on" to. The closed-loop mode kicked in for the 15 silent minutes during Spirit's landing and again in the latest communications crisis.
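The scanning behavior described above can be illustrated with a toy search loop. This is only a sketch of the general idea, not the Deep Space Network's receiver logic: the function names, the step size, the threshold and the simulated signal are all hypothetical.

```python
def sweep_for_carrier(measure_power, expected_hz, span_hz, step_hz, threshold):
    """Step outward from the expected frequency, alternating above and below it.

    measure_power is any callable that returns received power at a frequency.
    Returns the first frequency whose power meets the threshold, or None if
    nothing is found within +/- span_hz.
    """
    steps = int(span_hz // step_hz)
    for k in range(steps + 1):
        offsets = (0,) if k == 0 else (k * step_hz, -k * step_hz)
        for offset in offsets:
            freq = expected_hz + offset
            if measure_power(freq) >= threshold:
                return freq  # strong enough to "lock on" to
    return None

# Hypothetical scenario: the carrier has drifted 3,500 Hz above where
# the ground station expects it (about 8 GHz, per the X-band figure above).
EXPECTED_HZ = 8_000_000_000
TRUE_CARRIER_HZ = EXPECTED_HZ + 3_500

def simulated_power(freq_hz):
    """Strong reading near the true carrier, faint noise elsewhere."""
    return 1.0 if abs(freq_hz - TRUE_CARRIER_HZ) <= 250 else 0.01

found = sweep_for_carrier(simulated_power, EXPECTED_HZ,
                          span_hz=10_000, step_hz=500, threshold=0.5)
print("locked at offset (Hz):", None if found is None else found - EXPECTED_HZ)
```

Searching outward from the expected frequency means small drifts are found quickly, while a silent spacecraft simply exhausts the span and returns nothing.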

JPL finally caught a break Jan. 23. After several rounds of sending instructions that were not acknowledged, JPL received a transmission Spirit sent on its own initiative. But engineers still had trouble getting Spirit to respond to commands or send back intelligible data. One communications session relayed via the Mars Global Surveyor orbiter picked up static, as if the UHF antenna had been left on but wasn't controlled by Spirit's computer.

Gradually, JPL was able to rebuild the communications link through trial and error.

Although project manager Pete Theisinger originally told a press conference that an electrical or mechanical failure was suspected, the investigation subsequently pointed to a software-only problem.

It turned out the rover had become trapped in a cycle of continual reboots, crashing each time it tried to access the two flash-memory devices it uses for storage of images and other data. The cycle of some 60 reboots over 30 hours also prevented Spirit from going to sleep overnight when no solar power was available, causing it to run down its batteries.

To get the robot's software to work normally, JPL had to disable the flash-memory devices so the onboard computer would boot using only Random Access Memory (RAM), which stores information for active use by a computer only when power is present.

But this left Spirit operating in a crippled state, since data held in RAM evaporates when a computer is powered down. Like a digital camera, the rover uses flash memory for temporary storage of data to be sent to Earth later. But apparently the scientists had hoarded data too aggressively, filling flash storage with data collected during the cruise between Earth and Mars as well as data from the surface exploration. Eventually, just keeping track of all those files consumed so much memory that Spirit's software was unable to function normally.
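A toy model shows why file count alone, rather than the amount of data stored, can stop a system from starting. Every number and name here is hypothetical; the sketch assumes a fixed bookkeeping cost per file and a fixed memory budget, and does not describe the rover's actual software.

```python
BYTES_PER_FILE_ENTRY = 64      # hypothetical bookkeeping cost per stored file
RAM_BUDGET_BYTES = 1_000_000   # hypothetical memory available for the file table

def file_table_bytes(file_count):
    """Memory needed just to keep track of the files held in flash."""
    return file_count * BYTES_PER_FILE_ENTRY

def can_boot(file_count, use_flash=True):
    """True if the startup bookkeeping fits within the memory budget.

    With flash disabled there is no file table to build, so the
    system starts, but nothing survives a power-down.
    """
    if not use_flash:
        return True
    return file_table_bytes(file_count) <= RAM_BUDGET_BYTES

hoarded = 20_000   # files left over from cruise plus surface operations
print(can_boot(hoarded))                   # table too large: startup fails
print(can_boot(hoarded, use_flash=False))  # RAM-only mode starts
print(can_boot(5_000))                     # after deleting excess files
```

The same three states map onto the recovery described in the article: a failing start with flash enabled, a working but crippled RAM-only mode, and normal operation once the excess files are gone.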

The solution: Delete excess files and send a patch instructing the rovers how to use their flash memory and RAM more conservatively.

This patch was also sent to the second rover, Opportunity, which had meanwhile experienced a flawless landing on Jan. 24. That mobile explorer maintained communications with Earth even while it was bouncing to a stop. Then it flipped itself upright and began sending back images from its 1-megapixel stereo cameras.

By the end of January, the Mars exploration program's head scientist, Steve Squyres, said he was optimistic both rovers ultimately will work well beyond the three months originally planned. "We built margin [of error] on top of margin [of error], specifically to allow for the fact that things go wrong on a place like Mars," he says.

The resuscitation of Spirit continued a record of long-distance network recoveries for the space program. In 1991, the Galileo spacecraft sent to Jupiter suffered what could have been a mission-ending failure when its umbrella-like main antenna failed to unfold properly, but JPL managed to reprogram the spacecraft in flight. In that case, mission managers sent compression software that allowed Galileo to transmit data and high-resolution images over a backup antenna.

The rehabilitation of Spirit also came as explorations of Mars were putting unprecedented demands on the Deep Space Network. If Beagle 2 had remained in contact, NASA also would have assisted the Europeans with communications for that lander.

To accommodate Spirit and Opportunity, the Deep Space Network has to maintain 'round-the-clock communication. Because they are on opposite sides of the planet, the two rovers operate on roughly opposite shifts. When one is in daylight, it gathers power through its solar panels, while the other powers down for the night.

For the $860-million mission to be completely successful, scientists wanted both rovers actively searching for signs that liquid water existed on Mars.

But, in any event, sending twin rovers to Mars served as insurance for NASA in case one robot was lost: redundant outposts of the 100-million-mile network.

What You Should Do To Run A Space Network
  • Automate processes.
    Encode many operations in a remote device, so it can solve its own problems.
  • Bulletproof your gear.
    Refine systems under your direct control, like Deep Space Network antennas, to make sure they aren't the cause of an outage.
  • Be persistent.
    Analyze any shred of communication. Build theories. Exploit small wins.
  • Simulate potential problems.
    Test theories on duplicate devices, under your control, even if conditions aren't alike.