Cosmic Cheering Resumes
By David F. Carr | Posted 2004-02-06
Think your network is hard to manage? Try remote diagnosis and repair when you're relying on radio signals from Mars.
For 15 nail-biting minutes, everyone from NASA Administrator Sean O'Keefe to the most junior member of the Mars Exploration Rover project waited for the signal indicating the lander was still alive. The cheering resumed when the signal came.
Spirit's outage, weeks after that brief silence, left engineers guessing. JPL engineers could keep sending new commands to the rover, but they had no way of knowing whether it was listening or whether new instructions might do more harm than good.
Often, a communications breakdown spells the end of a mission, as appears to be the case for the European Space Agency's Beagle 2 Mars lander, which failed to radio back after touchdown in December. On the other hand, the rovers are designed to be as autonomous and resilient as possible, meaning that they will try to debug their own problems and radio diagnostic information back to Earth even if they are not receiving commands.
For Joseph Wackley, the Deep Space Network's mission system operations manager, the silence meant he had to fortify the network to make sure Spirit's signal would not be missed when, or if, it came. NASA brought on additional staff and powered up onsite generators to guarantee that antennae would be up and running.
"That's exactly the nightmare," says Wackley, who was at a Pasadena facility of subcontractor ITT Industries the night Spirit landed. "We have to make sure it's not us contributing to why they are not seeing the signals."
When trying to regain a connection, the Deep Space Network puts its reception equipment in a "closed-loop" mode, continually scanning a range of frequencies around the expected one, looking for some kind of signal that it could "lock on" to. The closed-loop mode kicked in for the 15 silent minutes during Spirit's landing and again in the latest communications crisis.
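The scanning idea can be pictured with a short sketch. This is only an illustration of sweeping outward from an expected frequency until something crosses a detection threshold; it is not the Deep Space Network's receiver logic. The function names, the step size, the threshold and the 8.4 GHz figure in the usage note are all assumptions made for the example.

```python
def sweep_for_signal(expected_hz, span_hz, step_hz, measure_power, threshold):
    """Search outward from the expected frequency for a usable signal.

    measure_power is a caller-supplied function (hypothetical here) that
    returns the received power at a given frequency. Returns the first
    frequency whose power meets the threshold, or None if the sweep
    finds nothing within +/- span_hz.
    """
    steps = span_hz // step_hz
    for k in range(steps + 1):
        # Try the expected frequency first, then alternate above and below it.
        offsets = (0,) if k == 0 else (k * step_hz, -k * step_hz)
        for offset in offsets:
            freq = expected_hz + offset
            if measure_power(freq) >= threshold:
                return freq  # candidate to "lock on" to
    return None


def make_fake_receiver(actual_hz, bandwidth_hz):
    """Build a stand-in for real hardware: power is 1.0 near actual_hz, else 0.0."""
    def measure_power(freq):
        return 1.0 if abs(freq - actual_hz) < bandwidth_hz else 0.0
    return measure_power
```

For instance, with an assumed expected frequency of 8,400,000,000 Hz and a simulated signal sitting 300 Hz below it, `sweep_for_signal(8_400_000_000, 1000, 100, make_fake_receiver(8_399_999_700, 50), 0.5)` would return the lower frequency after checking the nearer candidates first.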
JPL finally caught a break Jan. 23. After several rounds of sending instructions that were not acknowledged, JPL received a transmission Spirit sent on its own initiative. But engineers still had trouble getting Spirit to respond to commands or send back intelligible data. One communications session relayed via the Surveyor orbiter picked up static, as if the UHF antenna had been left on but wasn't controlled by Spirit's computer.
Gradually, JPL was able to rebuild the communications link through trial and error.
Although project manager Pete Theisinger originally told a press conference that some electrical or mechanical failure was suspected, the investigation subsequently pointed to a software-only problem.
It turned out the rover had become trapped in a cycle of continual reboots, crashing each time it tried to access the two flash-memory devices it uses for storage of images and other data. The cycle of some 60 reboots over 30 hours also prevented Spirit from going to sleep overnight when no solar power was available, causing it to run down its batteries.
To get the robot's software to work normally, JPL had to disable the flash-memory devices so the onboard computer would boot using only Random Access Memory (RAM), which stores information for active use by a computer only when power is present.
But this left Spirit operating in a crippled state, since data held in RAM evaporates when a computer is powered down. Like a digital camera, the rover uses flash memory for temporary storage of data to be sent to Earth later. But apparently the scientists had hoarded data too aggressively, filling flash storage with data collected during the cruise between Earth and Mars as well as data from the surface exploration. Eventually, just keeping track of all those files consumed so much memory that Spirit's software was unable to function normally.
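A toy model may help show why too many stored files can stop a computer from starting, and why bypassing flash breaks the cycle. The sketch below is not the rover's flight software: the per-file bookkeeping cost, the memory budget and every function name are hypothetical, chosen only to illustrate the failure and the recovery path described above.

```python
BYTES_PER_ENTRY = 64  # hypothetical RAM cost of tracking one stored file


def index_size(file_count, bytes_per_entry=BYTES_PER_ENTRY):
    """RAM needed just to keep track of the files held in flash."""
    return file_count * bytes_per_entry


def boot(file_count, ram_budget, use_flash=True):
    """Return the mode the computer comes up in.

    Raises MemoryError when the file index alone exceeds the RAM budget,
    which stands in for the crash that triggered each reboot.
    """
    if not use_flash:
        return "ram-only"  # crippled but stable: nothing survives power-down
    if index_size(file_count) > ram_budget:
        raise MemoryError("file index does not fit in RAM")
    return "normal"


def recover(file_count, ram_budget, keep):
    """Boot without flash, delete excess files, then boot normally."""
    first_mode = boot(file_count, ram_budget, use_flash=False)
    remaining = min(file_count, keep)
    return first_mode, boot(remaining, ram_budget), remaining
```

With a budget sized for 200 files, `boot(1000, 64 * 200)` raises `MemoryError` every time, while `recover(1000, 64 * 200, 150)` comes up in RAM-only mode, trims the store to 150 files and then boots normally.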
The solution: Delete excess files and send a patch instructing the rovers how to use their flash and random-access memories more conservatively.
This patch was also sent to the second rover, Opportunity, which had meanwhile experienced a flawless landing on Jan. 24. That mobile explorer maintained communications with Earth even while it was bouncing to a stop. Then it flipped itself upright and began sending back images from its 20-megapixel stereo cameras.
By the end of January, the Mars exploration program's head scientist, Steve Squyres, said he was optimistic both rovers ultimately will work well beyond the three months originally planned. "We built margin [of error] on top of margin [of error], specifically to allow for the fact that things go wrong on a place like Mars," he says.
The resuscitation of Spirit continued a record of long-distance network recoveries for the space program. In 1991, the Galileo spacecraft sent to Jupiter suffered what could have been a mission-ending failure when its umbrella-like main antenna failed to unfold properly, but JPL managed to reprogram the spacecraft in flight. In that case, mission managers sent compression software that allowed Galileo to transmit data and high-resolution images over a backup antenna.
The rehabilitation of Spirit also came as explorations of Mars were putting unprecedented demands on the Deep Space Network. If Beagle 2 had remained in contact, NASA also would have assisted the Europeans with communications for that lander.
To accommodate Spirit and Opportunity, the Deep Space Network has to maintain 'round-the-clock communication. Because they are on opposite sides of the planet, the two rovers operate on roughly opposite shifts. When one is in daylight, it gathers power through its solar panels, while the other powers down for the night.
For the $860-million mission to be completely successful, scientists wanted both rovers actively searching for signs that liquid water existed on Mars.
But, in any event, sending twin rovers to Mars served as insurance for NASA in case one robot was lost: redundant outposts of the 100-million-mile network.