November 4, 2010

My First Foray into Data Visualization with Exoplanets

Einstein said, "If I can't picture it, I can't understand it." That proved true on my first foray into Data Visualization. It was great fun to play around with all the data using the excellent tool provided by "Many Eyes", which was instrumental in revealing insights that lay hidden within those rows and columns of numbers. And of course, none of it would be possible without the data! So I am thankful to the folks who provided the dataset.

Then I realized something else: "If you can't explain it simply, you don't understand it well enough." Yes, Einstein also said that. And I think that is the final part of Data Visualization (which I presume overlaps with Data Journalism in some way, too). Generating the visualizations was fun, and analyzing them to find insights was challenging, but the job is not complete without the ultimate goal: communicating what I have discovered.

Sure, the pictures speak for themselves (in more than a thousand words), but there is an added benefit in formulating a simple statement to convey the insight: it makes you learn the concept even more. Finally, being able to share that understanding completes the joy.

Let me warn you that I am not a scientist, so please bear with me as I briefly explain my own original findings from these pictures (and please correct me if I'm wrong). Let me also add that I'm not a designer or an artist, so the graphics presented in this post are no match for the aesthetics of David McCandless (Visual Miscellaneum) or the analytical prowess of Edward Tufte (Envisioning Information).

The conclusion I've drawn from each visual explains it concisely. The additional notes are supplementary.

Fig. 1: Visual Magnitude vs Discovery Method (used to find the exoplanet)
Conclusion: Host stars whose planets were discovered via the Transit Method are generally dimmer than host stars whose planets were discovered via the Radial Velocity (RV) Method. Here's another proof.
The Transit Method works best on dimmer stars because a dim star does not drown out its own planets in its glare as much as a bright star would. The Radial Velocity (RV) Method works best on brighter stars because they allow better spectroscopic analysis.

Fig. 1: Visual Magnitude vs Discovery Method

Notes: The labels delineating the two methods are not noticeable, so I need to point out that the left section shows the host stars whose planets were discovered via the RV method. The smaller section on the right, with a noticeably darker shade, shows the stars with transiting planets.
The dimmer tint for high values of Visual Magnitude (V) fits well with the fact that stars with a higher value of magnitude (V) are actually dimmer than those with a lower value.
[Link to the interactive version at Many Eyes]
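
For anyone who finds the "higher magnitude means dimmer" rule in the notes above confusing, here is a minimal Python sketch of the standard magnitude scale, where a difference of 5 magnitudes corresponds to a factor of 100 in brightness. The function name and the sample magnitudes are my own illustration and are not taken from the dataset.

```python
def brightness_ratio(mag_dim, mag_bright):
    """Return how many times brighter the star with the lower magnitude appears.

    Uses the standard relation: ratio = 10 ** (0.4 * (mag_dim - mag_bright)).
    """
    return 10 ** (0.4 * (mag_dim - mag_bright))


# Illustrative values only (not rows from the dataset):
# a V = 7 star compared with a V = 12 star.
ratio = brightness_ratio(12.0, 7.0)
print(f"The V=7 star appears about {ratio:.0f} times brighter than the V=12 star")
```

So a star that sits far to the "high V" side of the chart is not slightly dimmer, it is dimmer by a large factor.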

Fig. 2: Discovery Method vs. Exoplanet Names
Conclusion: Exoplanets discovered via RV method have more "generic" sounding names than those discovered via transit method.
Here's a nice tip on how to quickly guess whether an exoplanet was discovered via the RV or the transit method: if the name is generic-sounding, or if it bears the host star's name (like "HD blah-blah"), then chances are it was discovered via Radial Velocity!

Fig. 2: Discovery Method vs. Exoplanet Names
Notes: Exoplanet names are commonly derived from the host star's name plus a letter that indexes the planet in order of discovery. Often, the name of the instrument used to discover the planet is used in lieu of the star name (for example: Kepler-4 b, or CoRoT-7b).

[Link to the interactive version at Many Eyes]
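
The naming tip above can be turned into a rough guessing game in code. This is only a sketch of my own rule of thumb, not an official classification: the prefix lists contain just the examples mentioned in this post (HD, Kepler, CoRoT), and the function names are made up for illustration.

```python
import re

# Prefixes taken from the examples in this post; extend the lists as needed.
INSTRUMENT_PREFIXES = ("Kepler", "CoRoT")  # planet named after the instrument
CATALOG_PREFIXES = ("HD",)                 # planet named after the host star


def split_name(planet_name):
    """Split 'Kepler-4 b' into ('Kepler-4', 'b').

    The planet letter must follow a space or a digit, so 'CoRoT-7b' also works.
    Returns (name, None) when no trailing planet letter is found.
    """
    name = planet_name.strip()
    match = re.match(r"^(.+?)\s*(?<=[\s\d])([b-z])$", name)
    if match:
        return match.group(1), match.group(2)
    return name, None


def guess_method(planet_name):
    """Guess the discovery method from the name alone (a heuristic, not a fact)."""
    host, _letter = split_name(planet_name)
    if host.startswith(INSTRUMENT_PREFIXES):
        return "transit"
    if host.startswith(CATALOG_PREFIXES):
        return "radial velocity"
    return "unknown"


for name in ("Kepler-4 b", "CoRoT-7b", "HD 10180 c"):
    print(name, "->", guess_method(name))
```

The guess will be wrong for plenty of planets, which is exactly why it is fun to check it against the real dataset.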

These are only some of the visualizations I came up with while playing with the exoplanet dataset for an hour or so. There are definitely many more correlations you can uncover with the other visualization types at Many Eyes or with other tools. Make sure you also play around with the data and let other people know about the insights you come up with. Head over to Many Eyes and use the same data I uploaded to generate your own visualizations.

1) These graphs and visualizations show only what we can statistically glean from the data currently available. The trends I pointed out in this post will change in the next few years. For example: when Kepler starts to announce its discoveries, the number of exoplanets discovered via the Transit Method will suddenly skyrocket and overtake the total number of exoplanets discovered via Radial Velocity (RV). (It will be interesting to see how to visualize that trend once we add time as a variable.)
2) Also, these visualizations do not take into account exoplanets discovered by a mash-up of two or more methods (such as those discovered by using the Transit method in tandem with RV).
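
On the first point, one simple way to bring time into the picture is a running total of discoveries per method, year by year. The sketch below shows the idea in Python. The rows are placeholders I made up purely to show the shape of the calculation; they are NOT real discovery counts, and the function name is my own.

```python
from collections import Counter
from itertools import accumulate

# Placeholder rows of (discovery year, method). NOT real data.
rows = [
    (2008, "RV"), (2008, "RV"), (2008, "Transit"),
    (2009, "RV"), (2009, "Transit"),
    (2010, "Transit"), (2010, "Transit"), (2010, "RV"),
]


def cumulative_by_method(rows):
    """Return {method: [(year, running_total), ...]} with years in ascending order."""
    years = sorted({year for year, _ in rows})
    methods = sorted({method for _, method in rows})
    per_year = Counter(rows)  # counts each (year, method) pair; missing pairs count as 0
    result = {}
    for method in methods:
        totals = accumulate(per_year[(year, method)] for year in years)
        result[method] = list(zip(years, totals))
    return result


for method, series in cumulative_by_method(rows).items():
    print(method, series)
```

Plotting one line per method from these running totals would show the moment one method overtakes the other.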

Many Eyes
Dataset from