Pandas

Pandas is a fast, powerful, flexible, and easy-to-use open-source data analysis and manipulation tool, built on top of the Python programming language.

Install

pip3 install pandas

Import

import pandas as pd

Snippets

Load CSV

data = pd.read_csv("filename.csv")

To parse the values of the start column as dates, pass read_csv the argument parse_dates=['start'].
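As a minimal sketch of the parse_dates argument, the example below reads from an in-memory string instead of a file so that it runs on its own. The CSV contents and the value column are made up for illustration; only the start column name comes from the text above.

```python
import io

import pandas as pd

# Hypothetical CSV contents, standing in for a file on disk.
csv_text = "start,value\n2021-01-01,10\n2021-02-15,20\n"

# parse_dates converts the listed columns to datetime64 while loading.
data = pd.read_csv(io.StringIO(csv_text), parse_dates=["start"])

# Once parsed, the .dt accessor exposes date parts such as the month.
months = data["start"].dt.month.tolist()
print(data.dtypes)
print(months)
```

Without parse_dates the start column would be loaded as plain strings and the .dt accessor would not be available.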

Apply an operation to column data and save the result in another column

# make a simple dataframe
df = pd.DataFrame({'a':[1,2], 'b':[3,4]})
df
#    a  b
# 0  1  3
# 1  2  4

# compute the row-wise sum as a standalone Series (not attached to df)
df.apply(lambda row: row.a + row.b, axis=1)
# 0    4
# 1    6

# do the same, but attach the result to the dataframe as column c
df['c'] = df.apply(lambda row: row.a + row.b, axis=1)
df
#    a  b  c
# 0  1  3  4
# 1  2  4  6
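For simple arithmetic like the sum above, the row-wise apply is not strictly needed. The sketch below shows the vectorized alternative on the same two-row dataframe; it is usually faster on large data, though apply remains useful when the per-row logic cannot be expressed as column operations.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Column arithmetic is applied element-wise, aligned on the index.
df["c"] = df["a"] + df["b"]

print(df)
```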

Get unique values of column

If we want to get the unique values of the name column:

df.name.unique()
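A small self-contained sketch, using made-up names, of what unique returns. It uses the bracket form df["name"], which also works when a column name contains spaces or clashes with a DataFrame attribute.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"name": ["ann", "bob", "ann", "cid"]})

# unique() returns the distinct values in order of first appearance.
unique_names = list(df["name"].unique())

# nunique() returns only how many distinct values there are.
n_unique = df["name"].nunique()

print(unique_names, n_unique)
```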

Extract columns of dataframe

df1 = df[['a','b']]

Remove duplicate rows

df = df.drop_duplicates()
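By default drop_duplicates compares whole rows. The sketch below, on made-up data, shows the subset and keep parameters, which restrict the comparison to selected columns and choose which of the duplicates survives.

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 agree on a and b but differ on c.
df = pd.DataFrame({"a": [1, 1, 2], "b": [3, 3, 4], "c": [5, 6, 7]})

# No two rows are identical across all columns, so nothing is dropped.
full = df.drop_duplicates()

# Comparing only a and b, row 1 duplicates row 0; keep="first" retains row 0.
by_ab = df.drop_duplicates(subset=["a", "b"], keep="first")

print(by_ab)
```

The original index labels are preserved; chain reset_index(drop=True) if a fresh 0..n-1 index is wanted.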

Remove column from dataframe

del df['name']
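The del statement modifies the dataframe in place. If the original should stay untouched, drop with the columns argument returns a new dataframe instead; the sketch below uses made-up data to show the difference.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"name": ["ann", "bob"], "age": [30, 41]})

# drop returns a new dataframe and leaves df unchanged.
trimmed = df.drop(columns=["name"])

print(list(trimmed.columns))
print(list(df.columns))
```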

Count unique combinations of values in selected columns

df1.groupby(['A','B']).size().reset_index().rename(columns={0:'count'})

#      A    B  count
# 0   no   no      1
# 1   no  yes      2
# 2  yes   no      4
# 3  yes  yes      3
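The following sketch builds a made-up ten-row dataframe whose combinations match the counts shown above, and uses reset_index(name='count') to name the count column directly instead of renaming column 0 afterwards.

```python
import pandas as pd

# Hypothetical data chosen so the counts are 1, 2, 4 and 3.
df1 = pd.DataFrame({
    "A": ["no"] * 3 + ["yes"] * 7,
    "B": ["no", "yes", "yes"] + ["no"] * 4 + ["yes"] * 3,
})

# size() counts rows per (A, B) group; reset_index turns the result
# back into a dataframe and names the count column in one step.
counts = df1.groupby(["A", "B"]).size().reset_index(name="count")

print(counts)
```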

Get row that contains the maximum value of a column

df.loc[df['Value'].idxmax()]
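A short sketch on made-up data: idxmax returns the index label of the first occurrence of the maximum, and loc then looks that label up. With a unique index this yields a single row as a Series.

```python
import pandas as pd

# Hypothetical data; only the Value column name comes from the snippet above.
df = pd.DataFrame({"Name": ["x", "y", "z"], "Value": [3, 9, 5]})

# Index label of the largest Value.
label = df["Value"].idxmax()

# The full row at that label, returned as a Series.
row = df.loc[label]

print(row)
```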
