Pandas
Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.
Install
pip3 install pandas
Import
import pandas as pd
Snippets
Load CSV
data = pd.read_csv("filename.csv")
If you want to parse the dates of the start column, give read_csv the argument parse_dates=['start'].
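A minimal sketch of the difference parse_dates makes. The CSV content and the name column are hypothetical, and io.StringIO stands in for a real file so the example runs on its own:

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice you would pass a path such as "filename.csv".
csv_text = """name,start
alice,2021-03-01
bob,2021-04-15
"""

# Without parse_dates, 'start' is read as plain text.
raw = pd.read_csv(io.StringIO(csv_text))

# With parse_dates, 'start' is converted to datetime values.
data = pd.read_csv(io.StringIO(csv_text), parse_dates=["start"])

print(pd.api.types.is_datetime64_any_dtype(raw["start"]))   # False
print(pd.api.types.is_datetime64_any_dtype(data["start"]))  # True

# Once parsed, the .dt accessor gives access to date components.
print(data["start"].dt.month.tolist())  # [3, 4]
```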
Do an operation on column data and save it in another column
# make a simple dataframe
df = pd.DataFrame({'a':[1,2], 'b':[3,4]})
df
# a b
# 0 1 3
# 1 2 4
# compute a new Series (not attached to df) that shares df's index
df.apply(lambda row: row.a + row.b, axis=1)
# 0 4
# 1 6
# do the same, but store the result in df as the new column 'c'
df['c'] = df.apply(lambda row: row.a + row.b, axis=1)
df
# a b c
# 0 1 3 4
# 1 2 4 6
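For simple arithmetic like this, a row-wise apply calls a Python function once per row, which is usually much slower than operating on whole columns at once. A sketch of the vectorized equivalent on the same toy dataframe; the extra column name d is only for illustration:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Vectorized column arithmetic: pandas adds the two columns element-wise,
# aligned on the index, without a per-row Python call.
df["c"] = df["a"] + df["b"]

# assign() returns a new DataFrame with the extra column and leaves df untouched.
df2 = df.assign(d=lambda frame: frame["a"] * frame["b"])

print(df["c"].tolist())   # [4, 6]
print(df2["d"].tolist())  # [3, 8]
```

Keep apply(..., axis=1) for logic that genuinely needs several fields of a row and cannot be expressed with column operations.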
Get unique values of a column
If we want to get the unique values of the name column:
df.name.unique()
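A small sketch with hypothetical data. It uses the bracket form df["name"], which behaves like df.name but also works when the column name is not a valid Python identifier or clashes with a DataFrame attribute:

```python
import pandas as pd

# Hypothetical data with a repeated value in the 'name' column.
df = pd.DataFrame({"name": ["bob", "alice", "bob", "carol"]})

# unique() returns the distinct values in order of first appearance.
names = df["name"].unique()
print(list(names))  # ['bob', 'alice', 'carol']

# nunique() counts the distinct values instead of listing them.
print(df["name"].nunique())  # 3
```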
Extract columns of a dataframe
df1 = df[['a','b']]
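The double brackets matter: a list of labels selects a DataFrame, while a single label selects a Series. A sketch on a hypothetical three-column frame; the .copy() at the end is a defensive habit that makes df1 explicitly independent of df and may avoid SettingWithCopyWarning on pandas versions without copy-on-write:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# A list of column labels (double brackets) returns a DataFrame...
df1 = df[["a", "b"]]
# ...while a single label returns a Series.
col = df["a"]

print(type(df1).__name__, list(df1.columns))  # DataFrame ['a', 'b']
print(type(col).__name__)                      # Series

# Take an explicit copy before modifying the subset.
df1 = df[["a", "b"]].copy()
df1["a"] = 0
print(df["a"].tolist())  # [1, 2]
```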
Remove duplicate rows
df = df.drop_duplicates()
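By default drop_duplicates() compares all columns and keeps the first occurrence. A sketch on hypothetical data showing the subset and keep parameters, which narrow the comparison and choose which copy survives:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2, 2], "b": ["x", "x", "y", "z"]})

# A row is dropped only if it matches an earlier row in every column.
print(len(df.drop_duplicates()))  # 3

# subset restricts the comparison to some columns; keep="last" keeps the
# last occurrence of each duplicate group instead of the first.
dedup = df.drop_duplicates(subset=["a"], keep="last")
print(dedup.index.tolist())  # [1, 3]
```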
Remove a column from a dataframe
del df['name']
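del modifies the dataframe in place. If you would rather keep the original, drop(columns=...) returns a new dataframe instead (it raises KeyError for a missing column unless you pass errors="ignore"). A sketch with hypothetical columns:

```python
import pandas as pd

df = pd.DataFrame({"name": ["bob", "alice"], "age": [30, 25]})

# drop() returns a new DataFrame and leaves df unchanged...
trimmed = df.drop(columns=["name"])
print(list(trimmed.columns))  # ['age']
print(list(df.columns))       # ['name', 'age']

# ...whereas del (like df.pop("name")) removes the column from df itself.
del df["name"]
print(list(df.columns))  # ['age']
```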
Count unique combinations of values in selected columns
df1.groupby(['A','B']).size().reset_index().rename(columns={0:'count'})
     A    B  count
0   no   no      1
1   no  yes      2
2  yes   no      4
3  yes  yes      3
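A self-contained sketch of the same idea. The yes/no data is hypothetical, so the counts differ from the output above; reset_index(name="count") names the count column directly instead of renaming the default 0 column afterwards:

```python
import pandas as pd

# Hypothetical yes/no data; your counts depend entirely on your own df1.
df1 = pd.DataFrame({
    "A": ["no", "no", "yes", "yes", "yes"],
    "B": ["no", "yes", "no", "no", "yes"],
})

# size() counts rows per (A, B) group; reset_index(name=...) turns the
# resulting Series into a DataFrame with a named count column.
counts = df1.groupby(["A", "B"]).size().reset_index(name="count")
print(counts)
```

On pandas 1.1 or later, df1.value_counts(["A", "B"]) gives the same counts as a Series, sorted by frequency rather than by group.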
Get the row that contains the maximum value of a column
df.loc[df['Value'].idxmax()]
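idxmax() returns an index label, which is why it is paired with .loc rather than .iloc. If the maximum appears more than once, only the first matching row is returned. A sketch with a hypothetical Item column that also shows how to get every tied row:

```python
import pandas as pd

df = pd.DataFrame({"Item": ["x", "y", "z"], "Value": [3, 7, 7]})

# idxmax() gives the label of the first occurrence of the maximum;
# .loc looks that label up and returns the whole row as a Series.
row = df.loc[df["Value"].idxmax()]
print(row["Item"])  # y

# To get every row that ties for the maximum, compare against max().
ties = df[df["Value"] == df["Value"].max()]
print(ties["Item"].tolist())  # ['y', 'z']
```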