Central tendency is most commonly described by three measures: the mean (arithmetic average), the median, and the mode. Their manual calculation in Python is presented below.
The arithmetic mean is the sum of a collection of numbers divided by the count of numbers in the collection. We compute it manually with the mean() function, which iterates through a list, adds up all of its numbers, divides that sum by the length of the list, and returns the result.
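    def mean(datalist):
        total = 0
        mean = 0
        # Add up every number in the list
        for item in datalist:
            total += item
        # float() forces true division, so the result keeps its fractional part
        mean = total / float(len(datalist))
        return mean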
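As a quick cross-check, the same calculation can be sketched with the built-in sum(). The name mean_builtin and the explicit empty-list check are assumptions added for illustration; they are not part of the function above, which would raise ZeroDivisionError on an empty list.

```python
def mean_builtin(datalist):
    """Arithmetic mean using the built-in sum(); the name is illustrative."""
    if not datalist:
        # Assumption: treat an empty list as an error instead of dividing by zero
        raise ValueError("mean of an empty list is undefined")
    # float() keeps the division from truncating under Python 2
    return sum(datalist) / float(len(datalist))


print(mean_builtin([2, 4, 9]))  # 15 / 3 -> 5.0
```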
The median, the middle value of a sorted sequence of numbers, is a little more complicated to compute. There are two cases, depending on whether the sequence has an odd or an even number of elements. If the number of elements is odd, the median is the middle element; if it is even, the median is the average of the two middle elements.
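    def median(datalist):
        numsort = sorted(datalist)
        # // is integer division, so mid is a valid list index
        mid = len(numsort) // 2
        median = 1
        if len(numsort) % 2 == 0:
            # Even length: average the two middle elements
            median = (numsort[mid - 1] + numsort[mid]) / 2.0
        else:
            # Odd length: take the single middle element
            median = numsort[(len(numsort) - 1) // 2]
        return median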
median() creates a new, sorted list named numsort. Next, it creates a variable mid, which is the length of the list divided by 2 and rounded down; this is the middle of the list. We also need to define a variable median, to which we can assign the placeholder value 1; it is overwritten before the function returns.
If the list has an even number of elements, the median is the sum of the two values at the middle indices (mid - 1 and mid), divided by 2.0 so that the result is a float. If the number of elements is odd, the median is the value at the middle index of the list.
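The two cases can be traced on small lists. The following is a minimal sketch of the same logic; the name median_sketch and the sample lists are illustrative assumptions.

```python
def median_sketch(datalist):
    """Median by sorting and picking the middle; the name is illustrative."""
    numsort = sorted(datalist)
    n = len(numsort)
    mid = n // 2  # integer division gives a valid index
    if n % 2 == 0:
        # Even length: average the elements at indices mid - 1 and mid
        return (numsort[mid - 1] + numsort[mid]) / 2.0
    # Odd length: the element at index mid is the middle one
    return numsort[mid]


# Odd length: sorted is [1, 5, 7], the middle index is 1
print(median_sketch([7, 1, 5]))     # 5
# Even length: sorted is [1, 3, 5, 7], the middle indices are 1 and 2
print(median_sketch([7, 1, 5, 3]))  # (3 + 5) / 2.0 -> 4.0
```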
The mode is the most frequently occurring number in the list. To compute it, we can use the frequency_distribution() function, which already collects the data we need: a dictionary that maps each value to the number of times it occurs. Instead of printing the frequency distribution, we create a function mode() which iterates through the keys of the dictionary, looks for the highest count, and returns the corresponding key:
    def frequency_distribution(datalist):
        # Map each value to the number of times it occurs
        freqs = dict()
        for item in datalist:
            if item not in freqs.keys():
                freqs[item] = 1
            else:
                freqs[item] += 1
        return freqs

    def mode(datalist):
        d = frequency_distribution(datalist)
        most_often = 0
        mode = 0
        # Keep the key with the highest count seen so far
        for item in d.keys():
            if d[item] > most_often:
                most_often = d[item]
                mode = item
        return mode
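Because mode() compares counts with a strict >, it returns only one value even when several values share the highest count. One possible way to report every tied value is sketched below using collections.Counter from the standard library; the function name all_modes is a hypothetical helper, not part of the code above.

```python
from collections import Counter


def all_modes(datalist):
    """Return every value that shares the highest count (illustrative helper)."""
    counts = Counter(datalist)  # maps value -> number of occurrences
    if not counts:
        return []
    highest = max(counts.values())
    # Sort so the result does not depend on dictionary ordering
    return sorted(value for value, count in counts.items() if count == highest)


print(all_modes([1, 2, 2, 3, 3]))  # 2 and 3 both occur twice -> [2, 3]
print(all_modes([4, 4, 5]))        # [4]
```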
Interpretation of Results
After calling all three functions on the list crater_diameter and printing the results:
    crater_diameter = [46, 51, 49, 82, 74, 63, 49, 70, 48, 47,
                       79, 48, 52, 55, 49, 51, 58, 82, 72, 45]

    # The parenthesized form of print works in both Python 2 and Python 3
    print(mean(crater_diameter))
    print(median(crater_diameter))
    print(mode(crater_diameter))
we should get the following output:

    58.5
    51.5
    49
Measures of central tendency are used to identify the central values in a data set. In our collection of crater diameters, the most frequent value is 49 km. The average diameter is 58.5 km. Half of the values are less than 51.5 km and half are greater.
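These figures can be cross-checked against the standard library. The sketch below assumes Python 3.4 or later, where the statistics module is available; it is a verification aid rather than part of the manual approach described in this article.

```python
import statistics

crater_diameter = [46, 51, 49, 82, 74, 63, 49, 70, 48, 47,
                   79, 48, 52, 55, 49, 51, 58, 82, 72, 45]

# Each call mirrors one of the hand-written functions
check_mean = statistics.mean(crater_diameter)
check_median = statistics.median(crater_diameter)
check_mode = statistics.mode(crater_diameter)

print(check_mean)    # 58.5
print(check_median)  # 51.5
print(check_mode)    # 49
```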