What is Pandas in Python





pandas is a Python library used to analyze, clean, explore, and manipulate data.


Pandas is the tool for you when working with tabular data, such as data stored in spreadsheets or databases.


pandas will help you explore, clean, and process your data. 


In pandas, a data table is called a DataFrame.

The library was created by Wes McKinney.


Pandas gives you answers about the data, such as:

  • Relation between two or more columns
  • Average value
  • Max value
  • Min value
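
The questions listed above map onto a few DataFrame methods. The following is a minimal sketch; the column names and numbers are sample values invented for illustration.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration.
df = pd.DataFrame({
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
})

# Relation between columns: pairwise Pearson correlation of numeric columns.
correlation = df.corr()

# Average, max, and min of individual columns.
average_calories = df["calories"].mean()
max_duration = df["duration"].max()
min_duration = df["duration"].min()

print(correlation)
print(average_calories, max_duration, min_duration)
```

For a quick overview of all numeric columns at once, df.describe() reports count, mean, min, max, and quartiles.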


When cleaning data, we delete rows that:

  • are not relevant, or
  • contain wrong values, such as empty or NULL values.
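
One way to remove rows with empty or NULL values is dropna(). This is a small sketch on made-up data; whether dropping rows is the right cleaning step depends on your dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing entries, invented for illustration.
df = pd.DataFrame({
    "calories": [420, np.nan, 390, 410],
    "duration": [50, 40, None, 45],
})

# Count the missing values in each column before cleaning.
missing_per_column = df.isna().sum()

# dropna() returns a new DataFrame without the rows that contain
# any missing value; the original df is left unchanged.
cleaned = df.dropna()

print(missing_per_column)
print(cleaned)
```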

            

Data structures in pandas:

Series

A Pandas Series is like a column in a table.

syntax:

pandas.Series(data, index, dtype, copy)

It is a one-dimensional array holding data of any type.

example:

import pandas as pd

a = [1, 7, 2]

myvar = pd.Series(a, index = ["x", "y", "z"])

print(myvar.head())
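
As a further sketch of the constructor parameters, a Series can also be built from a dictionary, in which case the keys become the index labels. The labels and values below are illustrative only.

```python
import pandas as pd

# Dictionary keys become the index; dtype forces the values to floats.
s = pd.Series({"x": 1, "y": 7, "z": 2}, dtype="float64")

# Access a single value by its label.
value_y = s["y"]

print(s)
print(value_y)
print(s.dtype)
```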


hasnans: returns True if a Series contains any NaN values.


example:

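import pandas as pd

# None is stored as NaN, so the integers are converted to floats.
s = pd.Series([5, 5, 23, None])

print(s)

output:

0     5.0
1     5.0
2    23.0
3     NaN
dtype: float64


print(s.hasnans)

output:

True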


empty: returns True if a Series contains no elements. A Series that holds only NaN values is not considered empty.


example:

import pandas as pd

import numpy as np

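# A Series holding one element (here a dict) is not empty,
# even though the value inside the dict is NaN.
ser = pd.Series([{'a': np.nan}])

print(ser)

print(ser.empty)

# A Series with no elements is empty. Passing a dtype explicitly
# avoids a warning on older pandas versions.
ser_empty = pd.Series(dtype="float64")

print(ser_empty.empty)

output:
0    {'a': nan}
dtype: object
False
True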


fillna: returns a new Series in which NaN values are replaced with the specified value.

example:
import pandas as pd
s = pd.Series([4, 5, 6, 2, 4, None])
# The original Series s is not modified; fillna returns a copy.
new_s = s.fillna(222222)
print(new_s)
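
A fixed placeholder is not the only option. As a sketch, the missing values can be filled with a statistic of the Series itself, such as the mean; whether that is appropriate depends on your data.

```python
import pandas as pd

s = pd.Series([4, 5, 6, 2, 4, None])

# mean() skips NaN by default, so it averages the five known values.
mean_value = s.mean()

# Replace the missing entry with that mean; s itself stays unchanged.
filled = s.fillna(mean_value)

print(filled)
```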


sort_values(ascending, na_position, inplace): returns a sorted Series. With inplace=True the Series is sorted in place and None is returned.

example:

import pandas as pd
s= pd.Series([4,5,6,2,4,None])
print(s)
print(s.sort_values())
print(s.sort_values(na_position="first"))
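
The ascending parameter reverses the order. A short sketch using the same sample values:

```python
import pandas as pd

s = pd.Series([4, 5, 6, 2, 4, None])

# Sort from largest to smallest; NaN is still placed last by default.
descending = s.sort_values(ascending=False)

print(descending)

# The original Series keeps its order because inplace defaults to False.
print(s)
```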




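loc

loc selects data by referring to the explicit index labels. It is an indexer used with square brackets, not a function call.

example:

import pandas as pd

a = [1, 7, 2, 6, 4, 34, 56, 5, 4, 45]

# No index is passed, so the labels are the default integers 0 to 9.
# (An index list must have the same length as the data.)
myvar = pd.Series(a)

# Label-based slices include both endpoints: this selects labels 3 and 4.
print(myvar.loc[3:4])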
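
The difference between label-based and position-based selection is easier to see with string labels. A minimal sketch, with labels chosen only for illustration:

```python
import pandas as pd

s = pd.Series([1, 7, 2, 6], index=["w", "x", "y", "z"])

# loc slices by label and includes both endpoints.
by_label = s.loc["x":"y"]

# iloc slices by integer position and excludes the stop position.
by_position = s.iloc[1:2]

# A single label returns a single value.
single = s.loc["z"]

print(by_label)
print(by_position)
print(single)
```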


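Importing the pandas module and making a DataFrame from a CSV file

import pandas as pd

# Read the CSV file into a DataFrame.
df = pd.read_csv("nba.csv")

# Selecting one column of a DataFrame already gives a Series.
ser = pd.Series(df['Name'])

# Keep the first 10 entries.
data = ser.head(10)

print(data)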



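Add and subtract the data of one Series and another Series

import pandas as pd

# creating the first series
data = pd.Series([5, 2, 3, 7], index=['a', 'b', 'c', 'd'])

# creating the second series
data1 = pd.Series([1, 6, 4, 9], index=['a', 'b', 'd', 'e'])

print(data, "\n\n", data1)

# add() and sub() return a new Series and leave data unchanged,
# so the result must be stored. fill_value=0 is used for labels
# that exist in only one of the two Series.
added = data.add(data1, fill_value=0)

print(added)

subtracted = data.sub(data1, fill_value=0)

print(subtracted)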
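
To see why fill_value matters, the sketch below contrasts the plain + operator with add() on the same two Series. Values are matched by index label, not by position.

```python
import pandas as pd

data = pd.Series([5, 2, 3, 7], index=['a', 'b', 'c', 'd'])
data1 = pd.Series([1, 6, 4, 9], index=['a', 'b', 'd', 'e'])

# With +, labels present in only one Series ('c' and 'e') become NaN.
plain_sum = data + data1

# With fill_value=0, the missing side is treated as 0 instead.
filled_sum = data.add(data1, fill_value=0)
filled_diff = data.sub(data1, fill_value=0)

print(plain_sum)
print(filled_sum)
print(filled_diff)
```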


date_range(start, end, periods, freq): returns a DatetimeIndex of evenly spaced dates.

example:

# importing pandas as pd

import pandas as pd  

# Specify start and periods, the number of periods (days).

dRan1 = pd.date_range(start ='1-1-2018', periods = 13)

# Specify end and periods, the number of periods (days).

dRan2 = pd.date_range(end ='1-1-2018', periods = 13)

# Specify start, end, and periods; the frequency 

# is generated automatically (linearly spaced).

dRan3 = pd.date_range(start ='01-03-2017', 

            end ='1-1-2018', periods = 13)  

print(dRan1, "\n\n", dRan2, '\n\n', dRan3)
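
The freq parameter controls the spacing between dates. A small sketch with two common frequency strings; the dates are arbitrary examples.

```python
import pandas as pd

# "MS" means month start: the first day of four consecutive months.
monthly = pd.date_range(start="2018-01-01", periods=4, freq="MS")

# "W-MON" means weekly on Mondays, here within January 2018.
weekly = pd.date_range(start="2018-01-01", end="2018-01-31", freq="W-MON")

print(monthly)
print(weekly)
```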


value_counts(): returns a Series containing the number of occurrences of each unique value, sorted from most to least frequent. NaN values are not counted by default.

example:

import pandas as pd

s= pd.Series([4,5,6,2,4,None])

print(s.value_counts())


output:

4.0    2

5.0    1

6.0    1

2.0    1

dtype: int64
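
The None entry does not appear in the output above because missing values are dropped by default. As a sketch, the dropna and normalize parameters change that behaviour:

```python
import pandas as pd

s = pd.Series([4, 5, 6, 2, 4, None])

# Include NaN as its own category in the counts.
counts_with_nan = s.value_counts(dropna=False)

# Return relative frequencies (shares of the non-missing values) instead of counts.
shares = s.value_counts(normalize=True)

print(counts_with_nan)
print(shares)
```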


DataFrames

import pandas as pd


data ={

  "calories": [420, 380, 390],

  "duration": [50, 40, 45]

        }

#load data into a DataFrame object:

df = pd.DataFrame(data,index=["d1","d2","d3"])

print(df) 

print(df.loc["d2"])
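
Building on the same data, here is a brief sketch of three other common operations: selecting a column, selecting several rows by label, and adding a derived column. The name calories_per_minute is made up for this illustration.

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}
df = pd.DataFrame(data, index=["d1", "d2", "d3"])

# Selecting one column returns a Series.
calories = df["calories"]

# Passing a list of labels to loc returns a DataFrame with those rows.
subset = df.loc[["d1", "d3"]]

# New columns can be computed from existing ones, element by element.
df["calories_per_minute"] = df["calories"] / df["duration"]

print(calories)
print(subset)
print(df)
```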




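Skip columns when reading a CSV file

import pandas as pd

file = "data.csv"  # placeholder path; replace with your own CSV file
columns_to_skip = ['foo', 'bar']
# usecols accepts a function that is called with each column name;
# only the columns for which it returns True are loaded.
df = pd.read_csv(file, usecols=lambda x: x not in columns_to_skip)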

print(df) 

print(df.head())
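
To try the column-skipping approach without a file on disk, the sketch below reads CSV text from memory using io.StringIO. The CSV content and column names are invented for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content held in memory instead of a file.
csv_text = "foo,bar,name,score\n1,2,Ann,90\n3,4,Bob,85\n"

columns_to_skip = ["foo", "bar"]

# Only the columns for which the function returns True are loaded.
df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=lambda col: col not in columns_to_skip,
)

print(df.columns.tolist())
print(df)
```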
