January 22, 2010

A Data Pipe Dream

I have a dream, a data pipe dream. I'd like to share a vision on how to maximize the usefulness of data in the field of Exoplanet Science.

Even before I built the so-called Exoplanet Seeker, I had something else in mind. I wanted to build a "mashup" from the treasure trove of data available about exoplanets. Much of this data is freely available, but there is no standard for how it is shared, and it is not structured in a way that other web applications can use. There is also no way to channel it to other applications. In other words, there are no "pipes" to send the data through in order to make it more useful.

I am thankful that exoplanet researchers are sharing their data online. But now we must move to the next logical step. Most of the data providers do not expose their data in a structured format like XML or JSON. So far, the only one I have seen that is friendly to programmers is LookUP, which shares data in XML, JSON, and AVM formats.

Currently, there's no way for web developers to create compelling Web Apps or "mashups" from the data available at The Exoplanet Encyclopedia, NStED, or Exoplanets.org. The best I could do was to come up with a very simple exoplanet seeker, in the hope that it signals what is to come in the next few years as more exoplanet discoveries pour in.

Think of the hundreds of thousands of exoplanets that will fill scattered databases across the earth. Think of how many petabytes of data all these discoveries will amount to. And finally, think of how wonderful it would be if all the exoplanet researchers and planet-hunters in the world shared their data in a form that is usable by other Scientists, and open to Citizens as well.

I believe that when an interoperability standard is developed (perhaps something like an "Exoplanet Markup Language", ExoML), or when exoplanet data sources start sharing their data semantically, a new form of Science will emerge from this abundance of structured Exoplanetary Science data. Awesome things can emerge from this information pipe infrastructure. Perhaps even some novel discoveries can come out of it.

If programmers can readily create visualizations from open data, run simulations using structured data, or cross-reference planetary characteristics with the qualities of their host stars, then perhaps new patterns can be discovered, much as researchers discovered the relationship between the lithium content of stars and the presence of planets around them. I'm sure there are other clues hidden in all this data, just waiting to be mined. Even old data can still contain as-yet-undiscovered treasures, as when an exoplanet was found hidden within Hubble's old data archives.
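To make the cross-referencing idea concrete, here is a minimal sketch in Python of what a programmer could do if planet and star data arrived as structured JSON. Everything in it is an assumption for illustration: the two feeds, the field names (`planet`, `host`, `mass_mjup`, `star`, `lithium_abundance`), and the helper functions are hypothetical, and all values are invented placeholders rather than real measurements or any real catalog's schema.

```python
import json
from statistics import mean

# Hypothetical feeds. Field names and values are invented placeholders,
# not real measurements and not the schema of any existing catalog.
PLANETS_JSON = """
[
  {"planet": "Placeholder-1 b", "host": "Placeholder-1", "mass_mjup": 1.0},
  {"planet": "Placeholder-2 b", "host": "Placeholder-2", "mass_mjup": 0.5},
  {"planet": "Placeholder-2 c", "host": "Placeholder-2", "mass_mjup": 2.0}
]
"""
STARS_JSON = """
[
  {"star": "Placeholder-1", "lithium_abundance": 1.0},
  {"star": "Placeholder-2", "lithium_abundance": 2.0},
  {"star": "Placeholder-3", "lithium_abundance": 3.0}
]
"""


def cross_reference(planets, stars):
    """Attach each star's planets to its record, keyed on the star name."""
    by_star = {s["star"]: {**s, "planets": []} for s in stars}
    for p in planets:
        host = by_star.get(p["host"])
        if host is not None:  # skip planets whose host is missing from the star feed
            host["planets"].append(p)
    return list(by_star.values())


def mean_by_planet_presence(joined, field):
    """Average a stellar field separately for stars with and without known planets."""
    with_planets = [s[field] for s in joined if s["planets"]]
    without = [s[field] for s in joined if not s["planets"]]
    return (
        mean(with_planets) if with_planets else None,
        mean(without) if without else None,
    )


joined = cross_reference(json.loads(PLANETS_JSON), json.loads(STARS_JSON))
hosts_avg, non_hosts_avg = mean_by_planet_presence(joined, "lithium_abundance")
print(hosts_avg, non_hosts_avg)
```

The point is not the numbers, which are made up, but how little code the join takes once both sources share a structure and a common key. With today's unstructured pages, most of the effort would go into scraping and cleaning instead.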

This data pipe dream must have been apparent in two of my previous posts: "The Extremophile Zone", which hints at some aspects of this data infrastructure (including how social media can play a role in science), and "Life is a Pattern", a flash fiction that gives a glimpse of this vision in a metaphorical narrative.

I hope that some organization picks up on this thread to enable a collaborative data pipeline among exoplanet researchers around the world.