
Basic Statistics

- Published on November 29, 2015 at 10:37 am
- Updated on April 28, 2017 at 6:24 pm

Today's post is a brief introduction to circular statistics (sometimes referred to as directional statistics). Circular statistics is a branch of statistics dealing with observations that are directions or angles, represented as points on a unit circle. As an example, imagine measuring birth times at a hospital over a 24-hour cycle, or the directional dispersion of a group of migratory animals. This type of data appears in a variety of fields, such as ecology, climatology, and biochemistry. Because the observations lie on a circle, hypothesis testing requires a different approach. Distributions need to be “wrapped” around the circle to be of use, and conventional estimators such as the arithmetic sample mean or sample variance can be misleading.

In this post, we will conduct *Rao’s Spacing Test* to assess the uniformity of a circular dataset. This is a basic procedure and should be thought of as an introduction to handling circular data.

We are going to conduct a hypothesis test on *turtles*, a small dataset consisting of the arrival angles of 10 green sea turtles to their nesting island. Our goal is to determine whether the arrival angles show signs of directionality or are more indicative of a random scatter.

First, install the `circular` package and attach the *turtles* dataset.

    install.packages("circular")
    require(circular)
    attach(turtles)

The `circular` package contains its own plotting function, `plot.circular`. Let’s observe the arrival angles of the turtles.

    plot.circular(arrival)

By eye, the observations appear to be spread fairly evenly around the circle. To test formally whether the data are consistent with a uniform distribution, we need a test statistic that works with angular data.

What is a good statistic for us to use? The arithmetic sample mean doesn’t tell us much about the direction of the data (180 degrees is not a useful mean of 2 degrees and 358 degrees, which both sit right next to 0 degrees). In the following plot, observe how the sample mean is of no use in representing the shape or spread of our data.

    mean(arrival)
    plot.circular(mean(arrival))

    [1] 0.9120794
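To make the 2 degrees and 358 degrees example concrete, here is a minimal base R sketch. It is not part of the original analysis, and the variable names are only illustrative. It compares the arithmetic mean with a direction obtained by averaging the sine and cosine components of the angles, which is one common way to define a circular mean.

```r
# Two angles (in degrees) that sit on either side of 0
deg <- c(2, 358)

# Arithmetic mean: points in the opposite direction
arith_mean <- mean(deg)

# Mean direction: average the unit vectors, then convert back to an angle
rad <- deg * pi / 180
mean_dir <- atan2(mean(sin(rad)), mean(cos(rad))) * 180 / pi

# Angular distance from the mean direction to 0 degrees, in [0, 180]
dist_to_zero <- abs(((mean_dir + 180) %% 360) - 180)

arith_mean    # 180
dist_to_zero  # effectively 0, up to floating point error
```

The arithmetic mean lands on the far side of the circle from both observations, while the vector-based mean direction sits at 0 degrees, between them.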

Instead, we will use a method that assesses directionality by measuring how far the spaces between neighboring observations deviate from equal spacing. This test is called **Rao’s Spacing Test**.

Rao’s Spacing Test was developed to assess the uniformity of circular data. It uses the space between observations to determine if the data shows significant directionality. If the data is uniform, observations should tend to be evenly spaced apart.

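Here is the test statistic \(U\) for Rao’s Spacing Test: $$U = \frac{1}{2}\sum\limits_{i=1}^{n} |T_{i} - \lambda|$$ where \(\lambda = 360/n\), \(T_{i} = f_{i+1} - f_{i}\) for \(i = 1, \dots, n-1\), and \(T_{n} = (360 - f_{n}) + f_{1}\). Here \(f_{1} \le \dots \le f_{n}\) are the observed angles in degrees, sorted in increasing order.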

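In other words, the test statistic adds up how far each gap between consecutive observations deviates from 360/n degrees, the gap we would see if the n points were perfectly evenly spaced, and halves the total. A large \(U\) indicates clustering, while a small \(U\) is consistent with uniformity.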
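The statistic can also be computed by hand in a few lines of base R. The sketch below is only illustrative: `rao_u` is a hypothetical helper name, not a function from the `circular` package, and it assumes the angles are supplied in degrees.

```r
# Rao's spacing statistic U for a vector of angles in degrees
# (rao_u is a hypothetical helper, not part of the circular package)
rao_u <- function(deg) {
  f <- sort(deg %% 360)                       # sorted angles in [0, 360)
  n <- length(f)
  spacings <- c(diff(f), 360 - f[n] + f[1])   # T_1..T_{n-1}, then wrap-around T_n
  lambda <- 360 / n                           # expected spacing if evenly spread
  0.5 * sum(abs(spacings - lambda))
}

u_even      <- rao_u(c(0, 90, 180, 270))  # perfectly even spacing
u_clustered <- rao_u(c(0, 10, 20, 30))    # all points bunched together

u_even       # 0
u_clustered  # 240
```

For the clustered example the spacings are 10, 10, 10 and 330 against an expected spacing of 90, so the absolute deviations are 80, 80, 80 and 240, and half their sum is 240.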

We will use the `rao.spacing.test()` function to run this hypothesis test. Our null hypothesis says the data come from a uniform distribution, while the alternative states the data show signs of directionality. Let’s run the test.

    rao.spacing.test(arrival, alpha = .10)

    Rao's Spacing Test of Uniformity

    Test Statistic = 127.2689
    Level 0.1 critical value = 161.23
    Do not reject null hypothesis of uniformity

With a test statistic of about 127 falling below the 10% critical value of about 161, the data do not show significant directionality. We cannot reject the hypothesis that the turtles’ arrival angles follow a uniform distribution.
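To get a feel for where a critical value like this comes from, one option is to simulate the statistic under the null hypothesis. The sketch below is an assumption-laden illustration, not a description of how `rao.spacing.test()` obtains its critical values: it draws 10 uniform angles many times, computes \(U\) with a hypothetical helper `rao_u`, and takes the 90th percentile of the simulated values. The seed and number of replications are arbitrary choices.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Hypothetical helper: Rao's spacing statistic for angles in degrees
rao_u <- function(deg) {
  f <- sort(deg %% 360)
  n <- length(f)
  spacings <- c(diff(f), 360 - f[n] + f[1])
  0.5 * sum(abs(spacings - 360 / n))
}

n    <- 10      # sample size, matching the turtles data
reps <- 20000   # number of simulated uniform samples

# Null distribution of U: angles drawn uniformly on [0, 360)
u_null <- replicate(reps, rao_u(runif(n, 0, 360)))

# Simulated 10% critical value and an approximate p-value for U = 127.2689
crit_10 <- unname(quantile(u_null, 0.90))
p_approx <- mean(u_null >= 127.2689)

crit_10
p_approx
```

If the simulation is set up correctly, `crit_10` should land near the reported critical value of 161.23, and `p_approx` should be well above 0.10, in line with the decision not to reject uniformity.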

Rao’s Spacing Test found no significant evidence of a directional trend. We cannot reject the null hypothesis of uniformity, so the arrival directions are consistent with a uniform distribution, although failing to reject is not proof of uniformity, especially with only 10 observations. While this post was a relatively basic tutorial, many people in the data science community haven’t worked with circular data before. It is an interesting subtopic to dive into, as well as a young field of statistics that is still evolving.

I would like to extend credit to S. Rao Jammalamadaka PhD, of the University of California, Santa Barbara, and his textbook “Topics in Circular Statistics” for sparking my interest in the field of circular statistics.