Statistics in Python: Measures of Dispersion

In this article I would like to concentrate on four main measures of statistical dispersion: the range (together with the smallest and the largest value), the average absolute deviation, the variance and the standard deviation. In Python, each of them can be computed with a short function.

Range

To compute the range, we have to determine the smallest and the largest value in the data set. The range is the difference between them.

def range_min_max(abclist):
    smallest = abclist[0]
    largest = abclist[0]
    range_of_values = 0
    for item in abclist[1:]:
        if item < smallest:
            smallest = item
        elif item > largest:
            largest = item
    range_of_values = largest - smallest
    return smallest, largest, range_of_values

This function returns the smallest value, the largest value and the range. We start by assuming that the first value in the collection (abclist[0]) is both the smallest and the largest value; at this stage the variable range_of_values is set to 0. Next, for each remaining item in the list we check whether it is less than the currently stored smallest value. If so, we save it as the smallest. Otherwise, we check whether the item is greater than the current largest; if so, we save it as the largest value. All other cases are simply skipped. Finally, we calculate range_of_values and return all three values.
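The same three values can also be obtained with Python's built-in min() and max(). The following is only a minimal sketch: the name range_builtin is hypothetical and not part of the code above, and it assumes a non-empty list of numbers.

```python
def range_builtin(values):
    """Return (smallest, largest, range) using the built-ins min() and max()."""
    # Hypothetical helper; assumes `values` is a non-empty sequence of numbers.
    smallest = min(values)
    largest = max(values)
    return smallest, largest, largest - smallest


print(range_builtin([3, 9, 1, 7]))  # (1, 9, 8)
```

The hand-written loop walks the list once, while this version walks it twice (once for min() and once for max()); for short lists the difference is negligible.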

Average Absolute Deviation, Variance and Standard Deviation

These three measures are interrelated, so I present them as a “cascade” of functions. To calculate them, we first need a function that computes the mean.

def mean(datalist):
    total = 0
    mean = 0
    for item in datalist:
        total += item
    mean = total / float(len(datalist))
    return mean

def avg_dev(thislist):
    average = mean(thislist)
    sum_of_dev = 0
    avg_dev = 0
    for item in thislist:
        sum_of_dev += abs((average - item))
    avg_dev = sum_of_dev / len(thislist)
    return avg_dev

def variance(thatlist):
    average = mean(thatlist)
    sum_of_sq_dev = 0
    variance = 0
    for item in thatlist:
        # accumulate the squared deviation of each value from the mean
        sum_of_sq_dev += (average - item) ** 2
    variance = sum_of_sq_dev / len(thatlist)
    return variance

def std_dev(anotherlist):
    std_dev = variance(anotherlist) ** 0.5
    return std_dev

The average deviation is the arithmetic average of the absolute differences between the values and the mean. From each value we subtract the arithmetic mean of the collection (or vice versa, since we take the absolute value), then we sum all the differences (both actions happen in the statement sum_of_dev += abs((average - item))). Finally, we divide the sum of differences by the number of values and return the result.
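As a quick check of the procedure described above, here is a compact sketch with a tiny data set whose result is easy to verify by hand. The name mean_abs_dev and the sample values are illustrative assumptions, not part of the original code.

```python
def mean_abs_dev(values):
    """Average of the absolute differences between each value and the mean."""
    # Illustrative helper; assumes a non-empty sequence of numbers.
    n = float(len(values))
    average = sum(values) / n
    return sum(abs(average - v) for v in values) / n


# The mean of [2, 4, 6, 8] is 5; the absolute deviations are 3, 1, 1, 3,
# and their average is 2.
print(mean_abs_dev([2, 4, 6, 8]))  # 2.0
```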

The variance is the arithmetic average of the squared deviations of the values from the mean. It is calculated like the average deviation, except that the differences between the values and the mean are squared. Because the sum is divided by the number of values, this is the population variance.

With the variance in hand, computing the standard deviation is trivial: its value is simply the square root of the variance, written here as the variance raised to the power 0.5.
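In Python 3.4 and later, the standard library's statistics module provides pvariance() and pstdev(), which divide by the number of values just as the variance function above does. Its variance() and stdev() functions divide by n - 1 instead (sample estimates), so they give slightly larger results. A small sketch with illustrative data:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values; their mean is 5

# Population versions: divide by n, like the hand-written functions.
pop_var = statistics.pvariance(data)
pop_std = statistics.pstdev(data)

# Sample version: divides by n - 1, so it comes out larger.
sample_var = statistics.variance(data)

print(pop_var, pop_std)
print(sample_var > pop_var)  # True
```

Which version is appropriate depends on whether the data set is the whole population or only a sample drawn from it.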

Interpretation of Results

After printing the results of all the functions for the list crater_diameter:

crater_diameter = [46, 51, 49, 82, 74, 63, 49, 70, 48, 47, 79, 48, 52, 55, 49, 51, 58, 82, 72, 45]

print(range_min_max(crater_diameter))
print(avg_dev(crater_diameter))
print(variance(crater_diameter))
print(std_dev(crater_diameter))

we should get the following output:

[Screenshot of the printed results]

Measures of dispersion tell us how spread out the values in a data set are. Our collection of crater diameters has a range of 37 km. The average deviation from the mean diameter is 11.25 km and the standard deviation is 12.71 km. The variance amounts to 161.45 km², but it is hard to interpret because of the squared unit. The variance is therefore useful mainly in comparative studies of data sets.
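These measures can be recomputed for crater_diameter with the standard library's statistics module (Python 3.4+). This is only a sketch: the variable names and the rounding to two decimal places are my own choices.

```python
import statistics

crater_diameter = [46, 51, 49, 82, 74, 63, 49, 70, 48, 47,
                   79, 48, 52, 55, 49, 51, 58, 82, 72, 45]

average = statistics.mean(crater_diameter)
value_range = max(crater_diameter) - min(crater_diameter)
# The statistics module has no average absolute deviation, so compute it directly.
avg_abs_dev = sum(abs(average - d) for d in crater_diameter) / len(crater_diameter)
pop_variance = statistics.pvariance(crater_diameter)  # divides by n
pop_std_dev = statistics.pstdev(crater_diameter)      # square root of pvariance

print(value_range)             # 37
print(round(avg_abs_dev, 2))   # 11.25
print(round(pop_variance, 2))  # 161.45
print(round(pop_std_dev, 2))   # 12.71
```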
