Network Computing is part of the Informa Tech Division of Informa PLC

Laying A Foundation For Distributed Computing's Next-Gen

As part of this project on recovery-oriented computing, Stanford researchers have done work on tools to pinpoint why a system has failed. They have also done work on micro-reboots to restart components of a system rather than a whole system.

These ideas are being put into a version of Java J2EE to evaluate them. We are getting some company uptake where people are trying to put some of this in their products.
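
The micro-reboot idea described above can be illustrated with a minimal sketch. It is not the Stanford or J2EE implementation; the `Component` and `Supervisor` classes and the component names are hypothetical, and a boolean flag stands in for real failure detection. It only shows the core point: a failed component is restarted on its own while the rest of the system keeps its state.

```python
class Component:
    """Hypothetical component that holds its own restartable state."""

    def __init__(self, name):
        self.name = name
        self.healthy = True
        self.restarts = 0

    def fail(self):
        # Simulate a fault; a real system would detect this through monitoring.
        self.healthy = False

    def restart(self):
        # Reinitialize only this component's state.
        self.healthy = True
        self.restarts += 1


class Supervisor:
    """Hypothetical supervisor that restarts failed components individually."""

    def __init__(self, components):
        self.components = {c.name: c for c in components}

    def micro_reboot(self):
        # Restart only the unhealthy components and report which ones they were.
        restarted = []
        for component in self.components.values():
            if not component.healthy:
                component.restart()
                restarted.append(component.name)
        return restarted


system = Supervisor([Component("cart"), Component("catalog"), Component("checkout")])
system.components["cart"].fail()
restarted = system.micro_reboot()
print(restarted)  # prints ['cart']
```

In this sketch the healthy components are never touched, which is the contrast with a whole-system restart that the interview draws.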

EET: What's next?

Patterson: The next phase of our project will look at ways to make distributed systems more reliable and adaptive. One of the things we are trying to do is get more data on why systems crash. We think it would be good if people knew why their systems were crashing. We are trying to collect our own set of data we can publicize. Then we will use statistical learning theory to try to analyze the large amount of systems-monitoring data.

This is the state of the art for people providing online services using distributed computers that millions of people depend on: They have a big network operations center with lots of human beings watching monitors to see what's going on. If something goes wrong, they react.