Wednesday, June 5, 2013

How well are predicted times reflected in actual swim rankings in a pool swim triathlon?

Subtitle: fun with non-parametric statistics!
Sub-subtitle: getting my nerd on. 

On my run yesterday, when I should have been thinking about cadence or something like that, I was busy wondering whether our swim order for the time trial start at Sunday's pool swim triathlon was close to the order of our actual swim times. We were supposed to be lined up fastest to slowest, but were we really?

We were "seeded" fastest to slowest based on the times we submitted - our 100y swim time that we could maintain for 300y.  Our bib numbers reflected that seeding. So the athlete who submitted the fastest time was "1" and the next fastest was "2" and so on. I was 44.

The researcher in me asked, "How well did we self-seed?" Did we submit (or guess) the right times, and did that put us in the right order?

So I pulled the Salem Sprint Tri results into a spreadsheet and got to work.
  • I took out the two late entries I knew of: they registered the morning of the race, so their high bib numbers did not reflect their seeding. There may have been others, but I didn't assume so.
  • I renumbered the seeding and the final swim rankings to account for the two pulled entries and any no-shows, resulting in two columns of ranked data, 1 to 162.
  • I plotted the data, shown on the graph below.
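For anyone who would rather do the cleanup outside a spreadsheet, here is a minimal Python sketch of the first two steps. The bib numbers and swim times are made up for illustration (they are not the race data), `rerank` is a hypothetical helper name, and the sketch assumes no tied swim times.

```python
def rerank(values):
    """Return each value's 1-based position in sorted order (assumes no ties)."""
    position = {v: i + 1 for i, v in enumerate(sorted(values))}
    return [position[v] for v in values]

# Hypothetical (bib, swim_seconds) rows; bib 160 stands in for a race-morning entry.
results = [(1, 250), (3, 262), (4, 258), (7, 300), (160, 255), (9, 310)]
late_bibs = {160}

# Step 1: drop the late entries whose bib does not reflect a seed.
kept = [(bib, secs) for bib, secs in results if bib not in late_bibs]

# Step 2: renumber both columns so each runs 1..n with no gaps.
seed_rank = rerank([bib for bib, _ in kept])    # gaps from no-shows close up
swim_rank = rerank([secs for _, secs in kept])  # fastest swim gets rank 1

print(seed_rank)
print(swim_rank)
```

Renumbering both columns the same way matters: once the gaps are closed, a perfectly seeded field has identical numbers in the two lists.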

The X axis is our swim order (denoted by our bib number) and the Y axis is our actual swim rank based on our 300y swim time (plus the run to the transition mat, which introduces error). If we'd ordered ourselves perfectly, the data would all fall on the red line. Those above the line were seeded faster (too far up in the swim lineup) than they should have been; those below were seeded slower (too far back in the swim lineup) than they should have been. I am the red dot on the graph. I was pretty close to right on - seeded 38, finished 39 with the adjusted rankings.
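The above/below-the-line reading boils down to one subtraction per swimmer. A small sketch of that, where `seeding_error` is a hypothetical helper name and the only real numbers are my own adjusted seed and finish:

```python
def seeding_error(seed_rank, swim_rank):
    """Return (error, label); error > 0 means the dot sits above the line."""
    error = swim_rank - seed_rank
    if error > 0:
        label = "seeded too fast"   # finished further back than the seed
    elif error < 0:
        label = "seeded too slow"   # finished further up than the seed
    else:
        label = "seeded exactly right"
    return error, label

# Adjusted seed 38, adjusted swim rank 39: one spot above the line.
print(seeding_error(38, 39))
```

The bigger the absolute error, the further the dot sits from the red line.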

If you are a dot far away from the line, you probably created a logjam and should re-evaluate your estimate or get a new watch!

I calculated the Spearman rank-order correlation coefficient and got 0.806 with p less than 0.0001, meaning the correlation is statistically significant. A correlation of 1.0 would be perfect, meaning we'd all fall on the line. (I couldn't use Pearson since I only had ordinal data.) Really, we didn't do too bad a job.
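For the curious, when both columns are plain ranks with no ties, Spearman's coefficient reduces to the textbook shortcut rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the difference between a swimmer's two ranks. Here is a minimal Python sketch of that shortcut on made-up ranks. It is not necessarily how the 0.806 above was computed, it does not calculate the p-value, and `spearman_rho` is just an illustrative name.

```python
def spearman_rho(rank_x, rank_y):
    """Spearman's rho via the no-ties shortcut formula on two rank lists."""
    n = len(rank_x)
    d_squared = sum((x - y) ** 2 for x, y in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Hypothetical five-swimmer field: two neighbors swapped places.
seed_rank = [1, 2, 3, 4, 5]
swim_rank = [1, 3, 2, 4, 5]

rho = spearman_rho(seed_rank, swim_rank)
print(round(rho, 3))

# Sanity checks: perfect seeding gives 1, fully reversed seeding gives -1.
print(spearman_rho(seed_rank, seed_rank))
print(spearman_rho(seed_rank, seed_rank[::-1]))
```

If there were tied ranks, the shortcut would no longer be exact and a Pearson correlation on the (averaged) ranks would be the safer route.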

The funniest thing is how long it seems to take us to put ourselves in numerical order from low to high on race morning even with giant numbers on our arms.

If I'd had the estimated times, I could have correlated those with actual times and used Pearson... but I digress.

I found no significant gender differences, so men and women were equally capable (or not, depending on your view) of predicting swim time/self-seeding.

I'd like to thank my PhD adviser Dr. Tonya Smith Jackson for helping me to appreciate the beauty of statistics, especially non-parametrics, which are still my favorite!!