Pandas Cheatsheet

Set the index of a dataframe from a specific column

df2=df.set_index(['id'])
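
A minimal sketch of the effect, on a made-up frame (the data and the name column are illustrative only). set_index returns a new frame and leaves df unchanged; reset_index moves the index back into a regular column.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({"id": [10, 20, 30], "name": ["a", "b", "c"]})

df2 = df.set_index(["id"])      # 'id' becomes the index of the new frame
print(df2.loc[20, "name"])      # label-based lookup through the new index

restored = df2.reset_index()    # move 'id' back into a regular column
```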

Show rows of dataframe with more than one null value

df[df.isnull().sum(axis=1) > 1]
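
NaN never compares equal to anything, including itself, so null checks are normally written with isnull() rather than ==. A small sketch on made-up data, showing null counts per column, the columns that contain any null, and the rows with more than one null:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan],
    "b": [np.nan, np.nan, 3.0],
    "c": [1.0, 2.0, 3.0],
})

null_counts = df.isnull().sum()                       # number of nulls per column
null_cols = df.columns[df.isnull().any()].tolist()    # columns with at least one null
rows_multi_null = df[df.isnull().sum(axis=1) > 1]     # rows with more than one null

print(null_counts)
print(null_cols)
print(rows_multi_null)
```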

Sum specific column of dataframe, excluding NULLs

df[df['col_name'].notnull()]['col_name'].sum()
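
Worth noting: Series.sum() skips NaN by default (skipna=True), so the explicit notnull() filter is optional. A quick comparison on made-up values:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({"col_name": [1.0, np.nan, 2.5]})

explicit = df[df["col_name"].notnull()]["col_name"].sum()   # filter first, then sum
default = df["col_name"].sum()                              # NaN skipped by default
strict = df["col_name"].sum(skipna=False)                   # NaN propagates instead

print(explicit, default, strict)
```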

Loop over rows of dataframe
With iterrows (produces the index and the row as a Series):

for idx, row in df.iterrows():
    print(idx, row['a'], row['col2'], row['col3'])

With itertuples (produces a named tuple of the index and column values):

for irow in df.itertuples():
    print(irow)
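
A side-by-side sketch of the two loops on a made-up frame. With itertuples, fields are reachable by attribute name and the index is exposed as Index; it is generally the faster of the two, since iterrows builds a Series for every row.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({"a": [1, 2], "col2": ["x", "y"]})

# iterrows: each row arrives as (index, Series)
for idx, row in df.iterrows():
    print(idx, row["a"], row["col2"])

# itertuples: each row arrives as a named tuple
tuples = list(df.itertuples())
for irow in tuples:
    print(irow.Index, irow.a, irow.col2)
```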

Check if dataframe column contains string

df['field_name'].str.contains('trinidad')
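
The call returns a boolean Series, which can be used directly as a row filter. In this sketch (made-up data), case=False makes the match case-insensitive and na=False treats missing values as non-matches, so the mask is safe to index with:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({"field_name": ["Port of Spain, Trinidad", "tobago", np.nan]})

mask = df["field_name"].str.contains("trinidad", case=False, na=False)
matches = df[mask]    # keep only the rows where the substring was found

print(matches)
```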

Increase the column width of dataframe in Jupyter display

pd.options.display.max_colwidth = 100
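
If the wider setting is only wanted for one cell, option_context applies it temporarily and restores the previous value on exit. A small sketch:

```python
import pandas as pd

before = pd.get_option("display.max_colwidth")

# The wider limit applies only inside the with-block
with pd.option_context("display.max_colwidth", 100):
    inside = pd.get_option("display.max_colwidth")

after = pd.get_option("display.max_colwidth")   # back to the earlier value
print(before, inside, after)
```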

Estimate memory usage for dataframe
By column

df.memory_usage(index=True)

Total

df.memory_usage(index=True).sum()
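
For object (string) columns the default figure may count only the references, not the strings themselves. Passing deep=True asks pandas to inspect the actual objects, which is usually the more realistic estimate. A sketch on made-up data:

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({"n": range(1000), "s": ["text"] * 1000})

shallow = df.memory_usage(index=True).sum()             # default estimate
deep = df.memory_usage(index=True, deep=True).sum()     # inspects object contents

print(shallow, deep)
```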

Save numpy array to text file

np.savetxt('data.txt', A, fmt="%s")
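
A round-trip sketch, assuming a small integer array and a temporary directory (both made up for the example): the array is written with savetxt and read back with loadtxt.

```python
import os
import tempfile

import numpy as np

# Hypothetical sample array, for illustration only
A = np.array([[1, 2], [3, 4]])

path = os.path.join(tempfile.mkdtemp(), "data.txt")
np.savetxt(path, A, fmt="%s")      # one row per line, values separated by spaces

B = np.loadtxt(path, dtype=int)    # read the values back as integers
print(B)
```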

Filter rows with null/nan values and write to CSV

rp_pdf[rp_pdf['last_name'].isnull()]\
.to_csv("./rows_missing_last.csv", sep='|', index=False, header=True)
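
The same pattern end to end, on a made-up frame and writing to a temporary directory instead of the current one (both are assumptions for the example). Reading the file back confirms that only the rows with a missing last_name were written.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical sample data, for illustration only
rp_pdf = pd.DataFrame({"first_name": ["Ann", "Bob"], "last_name": ["Lee", np.nan]})

missing = rp_pdf[rp_pdf["last_name"].isnull()]    # rows where last_name is null

out_path = os.path.join(tempfile.mkdtemp(), "rows_missing_last.csv")
missing.to_csv(out_path, sep="|", index=False, header=True)

round_trip = pd.read_csv(out_path, sep="|")       # read back with the same separator
print(round_trip)
```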
