August 19, 2010

Possible Data Mining Age in the Future of Exoplanet Science

Exoplanets. Distant worlds light-years away, yet they continue to move ever closer to humanity’s psyche, enticing us to declare them worthy of serious research.

“The study of exoplanets is of such enormous intellectual importance and it helps us understand our place in the universe,” said Louis Friedman of the Planetary Society.

“The search for exoplanets is one of the most exciting subjects in all of astronomy,” the Astronomy Decadal Survey committee wrote enthusiastically in its report, New Worlds, New Horizons in Astronomy and Astrophysics.

Thus, the report recommended a program “to explore the diversity and properties of planetary systems around other stars, and to prepare for the long-term goal of discovering and investigating nearby, habitable planets.” The survey proposed the Wide-Field Infrared Survey Telescope (WFIRST), a $1.6 billion telescope that would provide us with more exoplanet data. The report states that “in addition to determining just the planetary statistics, a critical element of the committee’s exoplanet strategy is to continue to build the inventory of planetary systems around specific nearby stars.”

Now look closer at those words: “inventory,” “census,” and “statistics” of planetary systems. Clearly, the current focus is to keep “gathering” more and more data. We have just turned into proverbial planet hunter-gatherers!

Kidding aside, the excitement about adding more exoplanets to our database is wonderful. But I feel that something is missing, something that must be done in addition to all this gathering.

I believe that the “processing” of this data must also be planned in parallel with the construction of these telescopes.

We are continuously building instruments to gather more and more exoplanet data, but we are not preparing the infrastructure to handle the imminent data explosion in exoplanetary science.

I anticipate a bottleneck when all these planet-hunting telescopes are finally deployed and copious amounts of data come flooding in. On what basis am I saying this? Take this example: dozens of Earth-mass planets could already be sitting in the Kepler team's database right now, even as we wait for their announcement. That is why I think that not just one Earth-sized planet will be announced at a time, but several. The Kepler scientists are simply being extra careful about what could be one of the greatest discoveries in human history, making sure their findings are solid before they reveal them to the whole world.

Perhaps it’s understandable to withhold data about historic “firsts,” because so much is at stake for the scientists making the announcement.

The raw data will eventually be shared with the public for citizen scientists to gobble up. But I can't help thinking that, in some way, this delay is a mini-bottleneck. And looking ahead, there is no infrastructure for outside help to “harvest” planets from the raw data that will eventually pour in. For now, only professional scientists can harvest the goods.

In my previous post, I asked these questions: How will scientists keep up with petabytes of raw data? How will web technology keep up? How will the internet enable citizens to contribute to science for the love of it? What needs to be done?

Looking at how the enabling power of web technology is changing the face of science, we can see the important contributions of a swarm of minds working together toward scientific goals.

Take a look at Galaxy Zoo, Foldit, Stardust@home, Rosetta@home, Einstein@Home and SETI@home. These are awesome engines that harness volunteers, and in some cases their idle computers, to analyze massive amounts of data and produce novel scientific discoveries. Despite the public excitement in the field of exoplanets, I'm surprised that there is no program like them focused on exoplanetary science. I have reason to think that the same idea of "distributed processing" can be applied to the deluge of exoplanet data that will arrive in the coming decades.

"We're at the dawn of a new era, in which computation between humans and machines is being mixed," says Michael Kearns, a computer scientist who studies the concept of distributed thinking.

So how can we apply the power of distributed thinking to exoplanetary science? Who should jump-start a Galaxy Zoo for exoplanets? Should NASA or ESA do it? Should ExoPAG work on it? Universities? Or should the private sector do it? Should someone write a grant proposal? Or perhaps open a Kickstarter project for it?

It’s been said that we are currently in the Golden Age of planetary discovery. But if we don’t begin to create some way to collectively "process" the huge amount of data gathered by planet-hunting telescopes, a “bottleneck era” might follow. It will be an era in which we are virtually “aware” that planets lie hidden within massive amounts of data stored on hard drives locked away at some facility. And as with the exoplanet that was rediscovered in the Hubble archives, it could take years to mine these planets out of the digital ones and zeros before they are announced as belated exoplanet discoveries.

I am guessing that such an era is likely to occur if we don't build a system to process the data collectively. Of course, we would never call it a bottleneck era. We would give it a much better-sounding name: The Data Mining Age of Exoplanetary Science.

New Worlds, New Horizons in Astronomy and Astrophysics:
$1.6 billion telescope would seek out alien planets:
Dark Energy and Exoplanets Top List of Astronomy Priorities:
Citizen science, People power: