Mars Craters Research Project #4: Visualizing Data

This study shows graphically the distribution of crater ejecta morphologies, a crater location plot, and the relationship between crater diameter and depth. The Python program below is meant to draw the graphs, but when I first ran it, it stopped with a traceback error: the library had been imported as matplot.pyplot, which could not be found (the module is named matplotlib.pyplot). The graphs shown here were therefore made with the Tableau Public application.

import pandas
import numpy
import seaborn
import matplotlib.pyplot as plt

data = pandas.read_csv('marscrater_pds.csv', low_memory=False)
pandas.set_option('display.max_columns', None)
pandas.set_option('display.max_rows', None)
pandas.set_option('display.float_format', lambda x:'%f'%x)

# Convert the variables to numeric; values that cannot be parsed become NaN
data['LATITUDE_CIRCLE_IMAGE'] = pandas.to_numeric(data['LATITUDE_CIRCLE_IMAGE'], errors='coerce')
data['LONGITUDE_CIRCLE_IMAGE'] = pandas.to_numeric(data['LONGITUDE_CIRCLE_IMAGE'], errors='coerce')
data['DIAM_CIRCLE_IMAGE'] = pandas.to_numeric(data['DIAM_CIRCLE_IMAGE'], errors='coerce')
data['DEPTH_RIMFLOOR_TOPOG'] = pandas.to_numeric(data['DEPTH_RIMFLOOR_TOPOG'], errors='coerce')
data['MORPHOLOGY_EJECTA_1'] = data['MORPHOLOGY_EJECTA_1'].replace(' ', numpy.nan)
data['DEPTH_RIMFLOOR_TOPOG'] = data['DEPTH_RIMFLOOR_TOPOG'].replace(0.00, numpy.nan)

# Univariate bar chart: number of craters per ejecta morphology
seaborn.countplot(x='MORPHOLOGY_EJECTA_1', data=data)
plt.xlabel('MORPHOLOGY_EJECTA_1')
plt.title('Number of craters with particular ejecta morphologies')
print 'Describe MORPHOLOGY_EJECTA_1'
ejmorph = data['MORPHOLOGY_EJECTA_1'].describe()
print ejmorph

# Scatter plot of crater locations, drawn on its own figure (no regression line)
plt.figure()
seaborn.regplot(x='LONGITUDE_CIRCLE_IMAGE', y='LATITUDE_CIRCLE_IMAGE', fit_reg=False, data=data)
plt.xlabel('LONGITUDE')
plt.ylabel('LATITUDE')
plt.title('Crater Location Plot')

# Scatter plot of depth against diameter, drawn on its own figure (no regression line)
plt.figure()
seaborn.regplot(x='DIAM_CIRCLE_IMAGE', y='DEPTH_RIMFLOOR_TOPOG', fit_reg=False, data=data)
plt.xlabel('DIAMETER')
plt.ylabel('DEPTH')
plt.title('Relationship Between Crater Diameter and Crater Depth')

# Display all three figures
plt.show()

Number of craters with particular ejecta morphologies

[Figure: bar chart of the number of craters for each ejecta morphology]

The univariate graph of ejecta morphologies shows only the morphologies with at least 200 craters. The graph is unimodal, with its highest peak at the Rd ejecta morphology.
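
The program above does not apply this 200-crater threshold itself. A minimal pandas sketch of such a filter, using a tiny made-up series and a threshold of 2 instead of 200 (the names morphologies, min_craters, common and filtered are illustrative and do not come from the original program):

```python
import pandas

# Hypothetical toy data standing in for data['MORPHOLOGY_EJECTA_1']
morphologies = pandas.Series(['Rd', 'Rd', 'Rd', 'SLEPS', 'SLEPS', 'DLERS'])

# The post uses 200; a threshold of 2 keeps the toy example readable
min_craters = 2

# Count craters per morphology and keep the morphologies above the threshold
counts = morphologies.value_counts()
common = counts[counts >= min_craters]

# Keep only the craters whose morphology passed the threshold
filtered = morphologies[morphologies.isin(common.index)]
print(common)
```

The filtered series could then be passed to the plotting call in place of the full column.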

Crater Location Plot

[Figure: crater location plot, all craters]

The graph above plots latitude against longitude for all craters with a defined ejecta morphology. It can be read as a map: some areas contain many craters and others contain none. The scatter plot does not show a clear relationship or trend between the two variables.

Relationship Between Crater Diameter and Crater Depth

[Figure: crater depth against crater diameter, all craters]

The graph above plots crater depth against crater diameter. The scatter plot shows a weak relationship between the two variables.
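
One way to put a number on such a trend is the Pearson correlation coefficient. The sketch below is only an illustration: the diameter and depth values are made up, not taken from the crater database, and numpy.corrcoef is a technique the original analysis did not use.

```python
import numpy

# Made-up illustrative values in km, NOT taken from marscrater_pds.csv
diameter = numpy.array([1.2, 3.5, 7.8, 12.0, 25.4, 40.1])
depth = numpy.array([0.10, 0.35, 0.40, 0.90, 0.85, 1.60])

# corrcoef returns a 2x2 matrix; the off-diagonal entry is Pearson's r
r = numpy.corrcoef(diameter, depth)[0, 1]
print(r)
```

With the real data, rows with a missing depth would have to be dropped first, because corrcoef returns nan as soon as one input value is nan.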

Posted in data analysis, python

Python Exercises: List Methods and Dictionaries

The exercises presented below come from the course An Introduction to Interactive Programming in Python (Part 2), available on Coursera. All solutions have one goal: to map the names of the days of the week to numbers, starting at 0 and ending at 6.

Exercise 1

Write a function day_to_number(day) that takes the supplied global list day_list and returns the position of the given day in that list. You can either use the Docs to locate the appropriate list method or write a for loop to implement this function.

Each solution is checked with the following test:

print day_to_number('Sunday'), day_to_number('Monday'), day_to_number('Tuesday'), day_to_number('Wednesday'), day_to_number('Thursday'), day_to_number('Friday'), day_to_number('Saturday')

which should print: 0 1 2 3 4 5 6

Solution 1

day_list = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']

def day_to_number(day):
    day = day_list.index(day)
    return day

Solution 2

day_list = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']

def day_to_number(day):
    position = -1
    for daya in day_list:
        position += 1
        if daya == day:
            return position

Solution 3

day_list = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']

def day_to_number(day):
   pos = 0
   for i in range(len(day_list)):
       if day_list[i] == day:
           pos = i
   return pos
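
Solution 3 returns 0 for a name that is not in the list, which cannot be told apart from 'Sunday', and Solution 2 returns None in that case. A possible variant, sketched here with enumerate, raises a ValueError instead, as list.index does in Solution 1 (the wording of the error message is an assumption):

```python
day_list = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']

def day_to_number(day):
    # enumerate yields (position, name) pairs, so no manual counter is needed
    for position, name in enumerate(day_list):
        if name == day:
            return position
    # Unknown names are reported instead of being mapped to a number
    raise ValueError('%s is not in day_list' % day)

print(day_to_number('Wednesday'))
```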

Exercise 2

Create a dictionary day_to_number that converts the days of the week “Sunday”, “Monday”, … into the numbers 0, 1, …, respectively.

The solution is checked with the following test:

print day_to_number['Sunday'], day_to_number['Monday'], day_to_number['Tuesday'], day_to_number['Wednesday'], day_to_number['Thursday'], day_to_number['Friday'], day_to_number['Saturday']

which should print: 0 1 2 3 4 5 6

Solution

day_to_number = {'Sunday': 0, 'Monday': 1, 'Tuesday': 2, 'Wednesday': 3, 'Thursday': 4, 'Friday': 5, 'Saturday': 6}
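
The dictionary can also be built from the list used in Exercise 1 rather than typed by hand. A minimal sketch, assuming day_list is defined as in the solutions to Exercise 1:

```python
day_list = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']

# Pair each name with its position in the list: 'Sunday' -> 0, 'Monday' -> 1, ...
day_to_number = {name: position for position, name in enumerate(day_list)}

print(day_to_number['Saturday'])
```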
Posted in python

Mars Craters Research Project #3: Managing Data

This Python program manages three variables: crater diameter (DIAM_CIRCLE_IMAGE), crater depth (DEPTH_RIMFLOOR_TOPOG) and ejecta morphology (MORPHOLOGY_EJECTA_1, MORPHOLOGY_EJECTA_2).

import pandas
import numpy

data = pandas.read_csv('marscrater_pds.csv', low_memory=False)
pandas.set_option('display.float_format', lambda x:'%f'%x)

#Setting variables to numeric
data['DIAM_CIRCLE_IMAGE'] = pandas.to_numeric(data['DIAM_CIRCLE_IMAGE'], errors='coerce')
data['DEPTH_RIMFLOOR_TOPOG'] = pandas.to_numeric(data['DEPTH_RIMFLOOR_TOPOG'], errors='coerce')

data['DIAMETER'] = pandas.cut(data.DIAM_CIRCLE_IMAGE,[0.99,1.99,2.99,3.99,4.99,5.99,6.99,7.99,8.99,9.99,10.99,11.99,12.99,13.99,14.99,15.99,16.99,17.99,18.99,19.99,20.99,21.99,22.99,23.99,24.99,25.99,26.99,27.99,28.99,29.99,30.99,31.99,32.99,33.99,34.99,35.99,36.99,37.99,38.99,39.99,40.99,41.99,42.99,43.99,44.99,45.99,46.99,47.99,48.99,49.99,54.99,59.99,69.99,79.99,89.99,99.99,499.99,999.99,1499.99])
print 'Counts for DIAMETER'
c4a = data['DIAMETER'].value_counts().sort_index(inplace=False)
print c4a
print 'Percentages for DIAMETER'
p4a = data['DIAMETER'].value_counts(sort=False, normalize=True)
print p4a

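# A depth of 0.00 means "not measured": mark it as NaN before binning so that it lands in the NaN category
data['DEPTH_RIMFLOOR_TOPOG'] = data['DEPTH_RIMFLOOR_TOPOG'].replace(0.00, numpy.nan)
data['DEPTH'] = pandas.cut(data.DEPTH_RIMFLOOR_TOPOG, [-1,-0.01,0.09,0.19,0.29,0.39,0.49,0.59,0.69,0.79,0.89,0.99,1.09,1.19,1.29,1.39,1.49,1.59,1.69,1.79,1.89,1.99,2.49,2.99])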
print 'Counts for DEPTH'
c5a = data['DEPTH'].value_counts(dropna=False).sort_index(inplace=False)
print c5a
print 'Percentages for DEPTH'
p5a = data['DEPTH'].value_counts(sort=False, normalize=True, dropna=False)
print p5a

data['EJECTA'] = data['MORPHOLOGY_EJECTA_1'] + data['MORPHOLOGY_EJECTA_2']
data['EJECTA'] = data['EJECTA'].replace(' ', numpy.nan)
print 'Counts for EJECTA'
c67 = data['EJECTA'].value_counts(sort=True, dropna=False)
pandas.set_option('display.max_rows', len(data))
print c67
print 'Percentages for EJECTA'
p67 = data['EJECTA'].value_counts(sort=True, normalize=True, dropna=False)
pandas.set_option('display.max_rows', len(data))
print p67

Below you can find the output of this program: counts and percentages for the chosen variables.

To organize the diameter distribution better, the variable was cut into ranges: 1 to 1.99 km, 2 to 2.99 km, etc. (the database contains craters with diameter ≥ 1 km). All craters were measured, so there is no missing data for this variable. About 94% of craters have a diameter of less than 10 km.

To organize the depth distribution better, the variable was cut into ranges: -0.99 to -0.01 km, 0 to 0.09 km, 0.1 to 0.19 km, etc. (there were 10 craters with a negative value). Because about 80% of craters were not measured, the category NaN was added to the frequency distribution.
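
A small sketch of this binning on made-up depth values (not taken from the database). It assumes that a depth of 0.00 means "not measured" and marks those values as NaN before calling pandas.cut, so that they end up in the NaN category; the names depths, measured, binned and counts are illustrative.

```python
import numpy
import pandas

# Made-up depths in km; 0.00 stands for "not measured"
depths = pandas.Series([0.00, 0.05, 0.15, 0.00, -0.02, 0.42])

# Mark unmeasured craters as missing before binning
measured = depths.replace(0.00, numpy.nan)

# pandas.cut builds right-closed intervals: (-1, -0.01], (-0.01, 0.09], ...
bins = [-1, -0.01, 0.09, 0.19, 0.29, 0.39, 0.49]
binned = pandas.cut(measured, bins)

# dropna=False keeps the NaN category in the frequency distribution
counts = binned.value_counts(sort=False, dropna=False)
print(counts)
```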

For ejecta morphology, two characteristics are important: MORPHOLOGY_EJECTA_1 and MORPHOLOGY_EJECTA_2. Both variables were joined into one variable. Because about 95% of craters were not identified, the category NaN was added to the frequency distribution.

Posted in data analysis, python

Mars Craters Research Project #2: Running the First Program

All observed craters on Mars were investigated in terms of several characteristics, such as ejecta morphology, crater depth and number of layers. This Python program prints frequency distributions for the three variables mentioned above: MORPHOLOGY_EJECTA_1, DEPTH_RIMFLOOR_TOPOG and NUMBER_LAYERS.

import pandas
import numpy

data = pandas.read_csv('marscrater_pds.csv', low_memory=False)

print len(data.index) #Number of observations/rows
print len(data.columns) #Number of variables/columns

#Setting variables to numeric
data['NUMBER_LAYERS'] = pandas.to_numeric(data['NUMBER_LAYERS'], errors='coerce')
data['DEPTH_RIMFLOOR_TOPOG'] = pandas.to_numeric(data['DEPTH_RIMFLOOR_TOPOG'], errors='coerce')

#Counts and percentages (frequency distributions) for each variable
print 'Counts for MORPHOLOGY_EJECTA_1'
c1 = data['MORPHOLOGY_EJECTA_1'].value_counts(sort=True)
print c1
print 'Percentages for MORPHOLOGY_EJECTA_1'
p1 = data['MORPHOLOGY_EJECTA_1'].value_counts(sort=True, normalize=True)
print p1

print 'Counts for DEPTH_RIMFLOOR_TOPOG'
c2 = data['DEPTH_RIMFLOOR_TOPOG'].value_counts(sort=True)
print c2
print 'Percentages for DEPTH_RIMFLOOR_TOPOG'
p2 = data['DEPTH_RIMFLOOR_TOPOG'].value_counts(sort=True, normalize=True)
print p2

print 'Counts for NUMBER_LAYERS'
c3 = data['NUMBER_LAYERS'].value_counts(sort=True)
print c3
print 'Percentages for NUMBER_LAYERS'
p3 = data['NUMBER_LAYERS'].value_counts(sort=True, normalize=True)
print p3

Below you can find the output of this program: counts and percentages for the chosen variables.

Of the total number of craters, 88.4% have no identified ejecta morphology (no label means not identified). Among the remaining craters, the most common types are Rd, SLEPS, SLERS, SLEPC, SLERC and DLERS.

About 80% of craters have an unknown depth (0.00 means not measured). For the rest, the number of craters decreases linearly as depth increases.

About 94.9% of craters have an unknown number of layers (0 means not measured). The distribution of this variable also shows a linear, inverse relationship: the more layers, the fewer craters.


The output shows only the first 30 and the last 30 lines. To see all lines, add this code: pandas.set_option('display.max_rows', len(data)). To sort the data by index, chain the method sort_index(inplace=False). In the end, my code looked like this:

print 'Counts for DIAM_CIRCLE_IMAGE'
c4 = data['DIAM_CIRCLE_IMAGE'].value_counts().sort_index(inplace=False)
pandas.set_option('display.max_rows', len(data))
print c4
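
A self-contained sketch of the same pattern on a tiny made-up series, showing how value_counts, normalize=True and sort_index fit together (the values and the names diameters, counts and shares are illustrative only):

```python
import pandas

# Made-up diameters, not taken from the crater database
diameters = pandas.Series([1.5, 1.5, 3.0, 1.0, 3.0, 1.5])

# Counts ordered by value (the index) rather than by frequency
counts = diameters.value_counts().sort_index()

# normalize=True returns fractions that sum to 1
shares = diameters.value_counts(normalize=True).sort_index()

print(counts)
print(shares)
```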
Posted in data analysis, python

Python Exercises: Computing Pay

The exercises below come from the book ‘Python for Informatics. Exploring Information.’, version 2.7.0, by Charles Severance.

Exercise 2.3

Write a program to prompt the user for hours and rate per hour to compute gross pay.
Enter Hours: 35 Enter Rate: 2.75 Pay: 96.25.

Solution 1

hrs = raw_input('Enter hours:')
rate = raw_input('Enter rate:')

pay = float(hrs) * float(rate)
print pay

This is a very simple program that multiplies two numbers. The raw_input function asks the user for the number of hours and the rate. Both values are read as strings, so they have to be converted to floating-point numbers with float() before the pay can be computed. This solution is a little risky, though, because the user may enter, for example, a letter instead of a number. The program then fails with an error: Line 4: ValueError: float: Argument: a is not number. That is why we can use try/except to handle such an exception.

Solution 2

hrs = raw_input('Enter hours:')
rate = raw_input('Enter rate:')

try:
  pay = float(hrs) * float(rate)
  print pay
except ValueError:
  print 'Entered value is not a number'

This solution is safer. If the user enters something that is not a number, the program does not crash; it prints the message ‘Entered value is not a number’.
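
The same idea can be wrapped in a function so that it can be tried without typing at a prompt. This is a sketch only: the name parse_pay and the choice to return None for bad input are assumptions, not part of the book's exercise.

```python
def parse_pay(hours_text, rate_text):
    # float() raises ValueError for text such as 'a' or an empty string
    try:
        return float(hours_text) * float(rate_text)
    except ValueError:
        # Signal invalid input to the caller instead of crashing
        return None

print(parse_pay('35', '2.75'))
print(parse_pay('a', '2.75'))
```

Returning None leaves it to the caller to decide whether to print a message or ask again.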

Exercise 3.1

Rewrite your pay computation to give the employee 1.5 times the hourly rate for hours worked above 40 hours. Enter Hours: 45, Enter Rate: 10, Pay: 475.0.

hrs = raw_input('Enter hours:')
h = float(hrs)
rate = raw_input('Enter rate:')
r = float(rate)
pay = 0

if h <= 40:
  pay = h * r
elif h > 40:
  # Overtime hours at 1.5 times the rate, plus the first 40 hours at the normal rate
  pay = (h - 40) * (r * 1.5) + 40 * r

print pay

Now we have to compute overtime, paid at 1.5 times the hourly rate. This time we assume that the user enters only valid values, so we do not use try/except. The program has to distinguish whether the employee worked at most 40 hours or more than 40 hours. To do this, we use an if/elif conditional statement.
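
The expected result can be checked by hand: for 45 hours at a rate of 10, the first 40 hours pay 400 and the 5 overtime hours pay 5 × 15 = 75, which gives 475.0. A short worked sketch of this arithmetic, which also shows why the parentheses around h - 40 matter (the variable names regular, overtime and without_parentheses are illustrative):

```python
h, r = 45.0, 10.0

# First 40 hours at the normal rate
regular = 40 * r
# Hours above 40 at 1.5 times the rate
overtime = (h - 40) * (r * 1.5)
print(regular + overtime)

# Without the parentheses, 40 * (r * 1.5) is evaluated first and subtracted from h
without_parentheses = h - 40 * (r * 1.5) + 40 * r
print(without_parentheses)
```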

Exercise 4.6

Rewrite your pay computation with time-and-a-half for overtime and create a function called computepay which takes two parameters (hours and rate).
Enter Hours: 45, Enter Rate: 10 Pay: 475.0.

def computepay(h,r):
  pay = 0
  if h <= 40:
    pay = h * r
    return pay
  elif h > 40:
    # Overtime hours at 1.5 times the rate, plus the first 40 hours at the normal rate
    pay = (h - 40) * (r * 1.5) + 40 * r
    return pay

hrs = raw_input('Enter hours:')
hrs = float(hrs)
rate = raw_input('Enter rate:')
rate = float(rate)

print computepay(hrs, rate)

This program does the same, but wraps the computation in a function. We can use the same conditional statement, only inside the function.

Posted in python

Mars Craters Research Project #1: Getting Started

This project is a Data Management and Visualization course assignment. All sources were made available by Wesleyan University on Coursera and were created on the basis of ‘Planetary Surface Properties, Cratering Physics, and the Volcanic History of Mars from a New Global Martian Crater Database’ by Stuart James Robbins.

Topic of Interest

The data set provides information about 378,540 craters on Mars, which can help us understand what major events, such as bombardments and impacts, took place on this planet. The main question I would like to ask is: Is the location of a crater associated with its type or main ejecta characteristics? Are craters with a particular ejecta morphology concentrated in one region, or are they scattered and randomly positioned? Is their existence a sign of a particular historical event, or does the ejecta morphology depend on the local surface type?

The secondary topic of interest I would like to explore is: Is the ejecta morphology associated with the diameter and depth of the crater? In other words, is the type of the crater correlated with its size?

Codebook

In my personal codebook I would like to use variables that describe the geographic location of the crater (CRATER_ID, LATITUDE_CIRCLE_IMAGE, LONGITUDE_CIRCLE_IMAGE) and variables that describe the crater itself (DIAM_CIRCLE_IMAGE, DEPTH_RIMFLOOR_TOPOG, MORPHOLOGY_EJECTA_1, MORPHOLOGY_EJECTA_2).

Hypotheses

Particular types of craters (variables MORPHOLOGY_EJECTA_1, MORPHOLOGY_EJECTA_2) are located in specific regions of the planet (variables LATITUDE_CIRCLE_IMAGE, LONGITUDE_CIRCLE_IMAGE). There are studies that show such a correlation with the type of surface, for example: ‘a latitude dependence on the distribution of ice and/or water may explain the occurrence or absence of particular ejecta morphologies’ (1). For some particular types of craters, a dependence on latitude has been demonstrated: ‘SLE morphologies dominate across the entire ±60° latitude zone. […] DLE morphologies are primarily concentrated in the northern plains regions of Mars, especially between 35°N and 60°N’ (2).

Particular types of craters (variables MORPHOLOGY_EJECTA_1, MORPHOLOGY_EJECTA_2) are associated with diameter and depth (DIAM_CIRCLE_IMAGE, DEPTH_RIMFLOOR_TOPOG). Studies show that such a correlation exists: ‘All the ejecta and almost all the interior morphologies studied here show a relationship with crater diameter. This correlation actually reflects the greater depth of excavation associated with larger craters’ (1).

Literature

The articles below were found on Google Scholar using the following key phrases: ‘mars craters diameter’, ‘mars craters distribution’.

  1. Barlow, N.G., and Bradley, T.L.: 1990, ‘Martian Impact Craters: Correlations of Ejecta and Interior Morphologies with Diameter, Latitude, and Terrain’
  2. Barlow, N.G., and Perez, C.B.: 2003, ‘Martian impact crater ejecta morphologies as indicators of the distribution of subsurface volatiles’ 
Posted in data analysis, python

Starting to learn

Fortunately, there are many possibilities and sources today for learning new technologies. There are books, discussion boards, documentation and libraries, online courses and many more teaching aids. You will surely find your own path and methods very easily. Here I would like to present the learning steps I have taken so far and recommend a few sources I have used in learning Python and data analysis.

I started to learn Python at Codecademy. It’s a very easy tool for beginners that goes through all the basic structures, such as strings, conditionals, functions, lists, dictionaries, loops and classes. The exercise instructions are very clear and comprehensible for someone who has never dealt with programming. At Codecademy you can also learn HTML/CSS, JavaScript, jQuery, PHP, Ruby and SQL.

Then I discovered Coursera, where I regularly join new courses. There are plenty of subjects to choose from and you can study almost everything you want. Here are the courses that I have completed. I can recommend them wholeheartedly. Some of them were quite demanding, but the satisfaction after you get credit is really great. It is also worth mentioning platforms like StackOverflow and the Python Documentation, which were very helpful in completing the courses.

Now I am starting a new subject: data analysis with Python, SAS, R and Excel. Below are the courses I have chosen to participate in.

In the future I will post my exercises and assignments for those courses, as well as other interesting problems that come up during the learning process.

Posted in learning, python