Statistics in Python: Frequency Distribution

Previous articles concentrated on managing and visualizing data with numpy and mathplotlib.pyplot libraries. Now, it is time to count more statistics, but manually, without built-in functions. For the beginning, let’s write some code to show a frequency distribution of the variable crater_diameter.

def frequency_distribution(datalist):
    freqs = dict()
    for item in datalist:
        if item not in freqs:
            freqs[item] = 1
            freqs[item] += 1
    return freqs

def print_dict(freqs):
    for item in freqs:
        print item, freqs[item]

result = frequency_distribution(crater_diameter)

Our variable crater_diameter is a list of values. We want the program to iterate through this list and have an output of values and their frequencies in the list. To do that, I defined the function frequency_distribution(), which takes a list as an argument. Then I created a variable freqs, which is a dictionary. Dictionary is actually the best way to combine values with occurrences. Next, for each element in the list, if the element was not added to the dictionary yet, we set the frequency on 1. Else, if the element was mentioned before, the frequency increases of 1. At the end the function returns the dictionary freqs.

We want also to print this dictionary not as an standard dictionary, but in two columns. For that we need another function print_dict(), which prints key and value for every key in dictionary (names of variables can be different, here they are the same only to be easier to imagine):

Zrzut ekranu 2016-07-27 o 08.54.53

Tagged with: , ,
Posted in data analysis, python

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: