What is Pandas in Python
pandas is a Python library for analyzing, cleaning, exploring, and manipulating data.
It is well suited to tabular data, such as data stored in spreadsheets or databases.
In Pandas, a data table is called a DataFrame.
pandas was created by Wes McKinney.
pandas can answer questions about your data, such as:
- Relation between two or more columns
- Average value
- Max value
- Min value
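As a rough sketch of how the questions above map to pandas calls, the snippet below builds a small DataFrame and computes a correlation, a mean, a maximum, and a minimum. The column names and values are invented for illustration and do not come from a real dataset.

```python
import pandas as pd

# Hypothetical workout data, invented for illustration only.
df = pd.DataFrame({
    "duration": [30, 45, 60, 75],
    "calories": [200, 300, 400, 500],
})

# Relation between two columns: Pearson correlation coefficient.
correlation = df["duration"].corr(df["calories"])

# Average, maximum, and minimum of one column.
average = df["calories"].mean()
maximum = df["calories"].max()
minimum = df["calories"].min()

print(correlation, average, maximum, minimum)
```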
When cleaning data, we delete rows that:
- are not relevant, or
- contain wrong values, such as empty or NULL values.
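One possible way to carry out this kind of cleaning is with isna() and dropna(). The sketch below uses made-up names and scores (an assumption for illustration) and drops every row that has at least one missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical data; two rows each contain one missing value.
df = pd.DataFrame({
    "name": ["Ann", "Bob", None, "Dee"],
    "score": [90.0, np.nan, 75.0, 60.0],
})

# Count missing values per column before cleaning.
missing_per_column = df.isna().sum()

# Drop every row that contains at least one missing value.
cleaned = df.dropna()

print(missing_per_column)
print(cleaned)
```

dropna() returns a new DataFrame and leaves df itself unchanged.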
Data structures in pandas:
Series
A pandas Series is like a column in a table. It is a one-dimensional array holding data of any type.
syntax:
pandas.Series(data, index, dtype, copy)
example:
import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a, index = ["x", "y", "z"])
print(myvar.head())
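The dtype parameter from the syntax line can be sketched in the same way. The snippet below reuses the same values and labels, stores them as floats, and reads one value back by its label.

```python
import pandas as pd

# dtype forces the values to be stored as floats instead of integers.
myvar = pd.Series([1, 7, 2], index=["x", "y", "z"], dtype="float64")

# Access a single value by its label.
print(myvar["y"])
print(myvar.dtype)
```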
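hasnans: returns True if a Series contains any NaN values.
example:
import pandas as pd
# None is stored as NaN, so the integers are converted to floats.
s = pd.Series([5, 5, 23, None])
print(s)
print(s.hasnans)
output:
0     5.0
1     5.0
2    23.0
3     NaN
dtype: float64
True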
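empty: returns True if a Series contains no elements.
example:
import pandas as pd
import numpy as np
# A Series holding one element (a dict) is not empty, even though the dict's value is NaN.
ser_not_empty = pd.Series([{'a': np.nan}])
print(ser_not_empty)
print(ser_not_empty.empty)
# A Series with no elements is empty; dtype is given explicitly to avoid a warning in older pandas versions.
ser_empty = pd.Series(dtype="float64")
print(ser_empty.empty)
output:
0    {'a': nan}
dtype: object
False
True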
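loc: selects data by referring to the explicit index (the labels).
example:
import pandas as pd
a = [1, 7, 2, 6, 4, 34, 56, 5, 4, 45]
# No index is passed, so the labels are the default 0 to 9.
myvar = pd.Series(a)
# loc slices by label and includes both endpoints: labels 3 and 4.
print(myvar.loc[3:4])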
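loc works the same way with string labels. The short sketch below (the values and labels are illustrative) shows a label slice, which includes both endpoints, and a list of labels, which returns the rows in the order given.

```python
import pandas as pd

myvar = pd.Series([1, 7, 2, 6], index=["x", "y", "z", "w"])

# Label slices with loc include BOTH endpoints.
subset = myvar.loc["y":"z"]

# A list of labels selects those labels in the given order.
picked = myvar.loc[["w", "x"]]

print(subset)
print(picked)
```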
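Importing the pandas module, making a DataFrame from a CSV file, and taking a Series from one of its columns:
import pandas as pd
df = pd.read_csv("nba.csv")
# df['Name'] is already a Series; head(10) keeps its first 10 rows.
ser = pd.Series(df['Name'])
data = ser.head(10)
print(data)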
Add and subtract the values of one Series and another
import pandas as pd
# creating the first series
data = pd.Series([5, 2, 3, 7], index=['a', 'b', 'c', 'd'])
# creating the second series
data1 = pd.Series([1, 6, 4, 9], index=['a', 'b', 'd', 'e'])
print(data, "\n\n", data1)
# add() and sub() return a new Series and leave data unchanged, so print the result.
# fill_value=0 stands in for labels that exist in only one Series.
print(data.add(data1, fill_value=0))
print(data.sub(data1, fill_value=0))
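To see what fill_value changes, the sketch below compares the plain + operator with add() on the same two Series. Labels are matched by name, so a label present in only one Series yields NaN unless a fill value is supplied.

```python
import pandas as pd

data = pd.Series([5, 2, 3, 7], index=["a", "b", "c", "d"])
data1 = pd.Series([1, 6, 4, 9], index=["a", "b", "d", "e"])

# The + operator aligns on labels; labels found in only one Series give NaN.
plain_sum = data + data1

# add() with fill_value=0 treats the missing side as 0 instead.
filled_sum = data.add(data1, fill_value=0)

print(plain_sum)
print(filled_sum)
```

Here labels c and e are NaN in plain_sum but keep their single available value in filled_sum.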
date_range(): returns a DatetimeIndex of evenly spaced dates.
example:
# importing pandas as pd
import pandas as pd
# Specify start and periods, the number of periods (days).
dRan1 = pd.date_range(start ='1-1-2018', periods = 13)
# Specify end and periods, the number of periods (days).
dRan2 = pd.date_range(end ='1-1-2018', periods = 13)
# Specify start, end, and periods; the frequency
# is generated automatically (linearly spaced).
dRan3 = pd.date_range(start ='01-03-2017',
end ='1-1-2018', periods = 13)
print(dRan1, "\n\n", dRan2, '\n\n', dRan3)
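A quick way to check how start, end, and periods interact is to look at the first and last entries of each range. The sketch below repeats the first two calls and prints only the endpoints.

```python
import pandas as pd

dRan1 = pd.date_range(start="1-1-2018", periods=13)
dRan2 = pd.date_range(end="1-1-2018", periods=13)

# With start and periods, the range runs forward one day at a time.
print(dRan1[0].date(), dRan1[-1].date())

# With end and periods, the range is counted backward from the end date.
print(dRan2[0].date(), dRan2[-1].date())
```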
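value_counts(): returns a Series with the count of each unique value in a given Series. NaN values are not counted by default.
example:
import pandas as pd
s = pd.Series([4, 5, 6, 2, 4, None])
print(s.value_counts())
output:
4.0    2
5.0    1
6.0    1
2.0    1
dtype: int64
(pandas 2.0 and later print the last line as "Name: count, dtype: int64".)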
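Two optional parameters of value_counts() are often useful with data like this. The sketch below, using the same values, counts the missing entry with dropna=False and returns relative frequencies with normalize=True.

```python
import pandas as pd

s = pd.Series([4, 5, 6, 2, 4, None])

# dropna=False also counts the missing value.
with_nan = s.value_counts(dropna=False)

# normalize=True returns relative frequencies instead of counts
# (the missing value is still excluded here).
shares = s.value_counts(normalize=True)

print(with_nan)
print(shares)
```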
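DataFrame
A pandas DataFrame is a two-dimensional data structure, like a table with rows and columns.
example: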
import pandas as pd
data ={
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
#load data into a DataFrame object:
df = pd.DataFrame(data,index=["d1","d2","d3"])
print(df)
print(df.loc["d2"])
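loc accepts more than a single row label. As a small sketch on the same data, the snippet below selects one row, a list of rows, and a single cell by row and column label.

```python
import pandas as pd

df = pd.DataFrame(
    {"calories": [420, 380, 390], "duration": [50, 40, 45]},
    index=["d1", "d2", "d3"],
)

# A single label returns one row as a Series.
row = df.loc["d2"]

# A list of labels returns a DataFrame with those rows.
rows = df.loc[["d1", "d3"]]

# A row label plus a column label returns a single value.
value = df.loc["d3", "duration"]

print(row, rows, value, sep="\n\n")
```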
Skip columns when reading a CSV file
import pandas as pd
# file is the path to your CSV file
columns_to_skip = ['foo', 'bar']
df = pd.read_csv(file, usecols=lambda x: x not in columns_to_skip)
print(df)
print(df.head())
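To try this without a file on disk, the sketch below reads from an in-memory string instead. The CSV text and its column names are made up for illustration.

```python
import io

import pandas as pd

# In-memory stand-in for a CSV file; the column names are made up.
csv_text = "foo,bar,baz,qux\n1,2,3,4\n5,6,7,8\n"

columns_to_skip = ["foo", "bar"]

# usecols accepts a callable that is evaluated against each column name;
# only columns for which it returns True are loaded.
df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=lambda name: name not in columns_to_skip,
)

print(df)
```

Only the baz and qux columns are loaded; foo and bar are never parsed into the DataFrame.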
