Pandas DataFrame

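A Pandas DataFrame is a widely used data structure for two-dimensional, tabular data with labeled axes (rows and columns). It stores data under two separate indexes: the row index and the column index. It has the following properties: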

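Its columns can hold different types, such as int, bool, and so on.
It can be viewed as a dictionary of Series objects in which both rows and columns are labeled. The column labels are called "columns" and the row labels are called "index".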
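The properties above can be checked on a small example. This is a minimal sketch; the column names name, age, and active and the row labels r1 and r2 are made up for illustration. It shows that each column keeps its own dtype, that the column labels live in df.columns and the row labels in df.index, and that selecting one column yields a Series.

```python
import pandas as pd

# Illustrative data: three columns of different types
df = pd.DataFrame(
    {"name": ["Ann", "Bob"], "age": [31, 42], "active": [True, False]},
    index=["r1", "r2"],
)

print(df.dtypes)         # each column has its own dtype
print(list(df.columns))  # column labels ("columns")
print(list(df.index))    # row labels ("index")
print(type(df["age"]))   # a single column is a pandas Series
```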

Creating a DataFrame

A DataFrame can be created from any of the following inputs:

Dictionaries (dict)
Lists
NumPy ndarrays
Series
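This section has no ndarray example, so here is a short sketch of that case. The array values and the labels a, b, c, x, y are arbitrary; because a bare NumPy array carries no labels, columns= and index= supply them.

```python
import numpy as np
import pandas as pd

# A 2-D array provides the values; the labels are passed separately
arr = np.array([[1, 2, 3], [4, 5, 6]])
df = pd.DataFrame(arr, columns=["a", "b", "c"], index=["x", "y"])
print(df)
```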

Creating an empty DataFrame

The following code shows how to create an empty DataFrame in Pandas:

# importing the pandas library
import pandas as pd
df = pd.DataFrame()
print (df)

Output

Empty DataFrame
Columns: []
Index: []

In the code above, the pandas library is first imported under the alias pd, and a variable named df is defined to hold an empty DataFrame. Finally, df is printed by passing it to print().
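A freshly created empty DataFrame has no rows and no columns. As a small sketch, the empty attribute reports this, and column labels can also be declared up front while the frame still has no rows (the ID and Department labels below are only examples).

```python
import pandas as pd

# No rows and no columns
df = pd.DataFrame()
print(df.empty)

# Column labels declared up front, still zero rows
df2 = pd.DataFrame(columns=["ID", "Department"])
print(df2.shape)
print(df2.empty)
```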

Creating a DataFrame from a list

A DataFrame can easily be created from a list in Pandas:

# importing the pandas library
import pandas as pd
# a list of strings
x = ['Python', 'Pandas']

# call the DataFrame constructor on the list
df = pd.DataFrame(x)
print(df)

Output

      0
0   Python
1   Pandas

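In the code above, a variable named x holds a list of string values. The DataFrame constructor is called on that list, which produces a single column labeled 0, and the result is printed.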
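A list of lists produces one row per inner list. The sketch below uses made-up values and passes columns= to replace the default 0, 1, ... column labels.

```python
import pandas as pd

# Each inner list becomes one row of the DataFrame
rows = [["Python", 1], ["Pandas", 2]]
df = pd.DataFrame(rows, columns=["name", "id"])
print(df)
```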

Creating a DataFrame from a dict

# importing the pandas library
import pandas as pd
info = {'ID' :[101, 102, 103],'Department' :['B.Sc','B.Tech','M.Tech',]}
df = pd.DataFrame(info)
print (df)

Output

       ID      Department
0      101        B.Sc
1      102        B.Tech
2      103        M.Tech

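Explanation: In the code above, we define a dictionary named info whose keys, ID and Department, map to lists of values. The dictionary is passed to the DataFrame constructor, the result is stored in df, and df is passed to print().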
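With a dict, each key becomes a column label and the rows are labeled 0, 1, 2 by default. A short sketch of two optional constructor arguments: index= sets custom row labels (s1, s2, s3 here are made up), and columns= keeps only the listed keys.

```python
import pandas as pd

info = {'ID': [101, 102, 103], 'Department': ['B.Sc', 'B.Tech', 'M.Tech']}

# index= replaces the default row labels 0, 1, 2
df = pd.DataFrame(info, index=['s1', 's2', 's3'])
print(df)

# columns= keeps only (and orders) the listed dictionary keys
dept_only = pd.DataFrame(info, columns=['Department'])
print(dept_only)
```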

Creating a DataFrame from a dict of Series

# importing the pandas library
import pandas as pd

info = {'one' : pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f']),
   'two' : pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])}

d1 = pd.DataFrame(info)
print (d1)

Output

        one         two
a       1.0          1
b       2.0          2
c       3.0          3
d       4.0          4
e       5.0          5
f       6.0          6
g       NaN          7
h       NaN          8

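In the code above, the dictionary named info consists of two Series, each with its own index. The dictionary is passed to the DataFrame constructor, the result is stored in d1, and d1 is passed to print(). Because the Series for "one" has no values for the labels g and h, those cells are filled with NaN.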
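The NaN values come from index alignment: the row index of the result is the union of the Series indexes. The sketch below rebuilds the same data and checks this; it also shows that the missing values turn "one" into a float64 column while "two" stays int64, which is why "one" prints as 1.0, 2.0, and so on.

```python
import pandas as pd

one = pd.Series([1, 2, 3, 4, 5, 6], index=list('abcdef'))
two = pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index=list('abcdefgh'))
d1 = pd.DataFrame({'one': one, 'two': two})

# The row index is the union of the two Series indexes
print(list(d1.index))

# 'one' has no values for g and h; they become NaN,
# which turns the column into float64 while 'two' stays int64
print(d1['one'].isna().sum())
print(d1.dtypes)
```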

Selecting a column

Any column can be selected from a DataFrame. The following code shows how:

# importing the pandas library
import pandas as pd

info = {'one' : pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f']),
   'two' : pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])}

d1 = pd.DataFrame(info)
print(d1['one'])

Output

a      1.0
b      2.0
c      3.0
d      4.0
e      5.0
f      6.0
g      NaN
h      NaN
Name: one, dtype: float64
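A single label returns a Series, as the output above shows. A minimal sketch of the list form, with made-up data: passing a list of labels returns a DataFrame containing just those columns, in the order given.

```python
import pandas as pd

info = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
        'two': pd.Series([4, 5, 6], index=['a', 'b', 'c']),
        'three': pd.Series([7, 8, 9], index=['a', 'b', 'c'])}
d1 = pd.DataFrame(info)

col = d1['one']             # a single label returns a Series
sub = d1[['one', 'three']]  # a list of labels returns a DataFrame
print(type(col).__name__)
print(type(sub).__name__)
print(sub)
```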

Adding a column

You can also add new columns to an existing DataFrame. The following code shows how:

# importing the pandas library
import pandas as pd

info = {'one' : pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']),
   'two' : pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])}

df = pd.DataFrame(info)

# add new columns to the existing DataFrame object

print ("Add new column by passing series")
df['three']=pd.Series([20,40,60],index=['a','b','c'])
print (df)

print ("Add new column using existing DataFrame columns")
df['four']=df['one']+df['three']

print (df)

Output

Add new column by passing series
      one     two      three
a     1.0      1        20.0
b     2.0      2        40.0
c     3.0      3        60.0
d     4.0      4        NaN
e     5.0      5        NaN
f     NaN      6        NaN

Add new column using existing DataFrame columns
       one      two       three      four
a      1.0       1         20.0      21.0
b      2.0       2         40.0      42.0
c      3.0       3         60.0      63.0
d      4.0       4         NaN      NaN
e      5.0       5         NaN      NaN
f      NaN       6         NaN      NaN

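Explanation: In the code above, the dictionary named info consists of two Series, each with its own index. It is passed to the DataFrame constructor and the result is stored in df.

To add a new column to the existing DataFrame, a new Series with values for some of the index labels is assigned to a new column name, and the result is printed with print(). Rows whose labels do not appear in that Series (d, e, f) get NaN.

A new column can also be computed from existing columns. The "four" column stores the sum of the "one" and "three" columns; wherever either operand is NaN, the sum is NaN too.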
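Two related patterns, sketched with made-up data: assigning a scalar broadcasts it to every row, and DataFrame.assign() adds a column on a new DataFrame instead of modifying the original in place. The column names flag and total are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3], 'two': [10, 20, 30]})

# Assigning a scalar broadcasts it to every row
df['flag'] = 0

# assign() builds the new column on a copy and leaves df unchanged
df2 = df.assign(total=df['one'] + df['two'])
print(df2)
print('total' in df.columns)
```

assign() is convenient when chaining several operations, because it returns the result rather than mutating df.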

Deleting a column

You can also delete any column from an existing DataFrame. The following code shows how:

# importing the pandas library
import pandas as pd

info = {'one' : pd.Series([1, 2], index= ['a', 'b']), 
   'two' : pd.Series([1, 2, 3], index=['a', 'b', 'c'])}
   
df = pd.DataFrame(info)
print ("The DataFrame:")
print (df)

# delete a column with the del statement
print ("Delete the first column:")
del df['one']
print (df)
# delete a column with the pop() method
print ("Delete another column:")
df.pop('two')
print (df)

Output

The DataFrame:
      one    two
a     1.0     1
b     2.0     2
c     NaN     3

Delete the first column:
     two
a     1
b     2
c     3

Delete another column:
Empty DataFrame
Columns: []
Index: [a, b, c]

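In the code above, df holds the DataFrame built from the info dictionary, and all of its values are printed. A column can be removed from a DataFrame with either the del statement or the pop() method.

In the first case, del removes the "one" column; in the second, pop() removes the "two" column and returns it as a Series.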
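The difference between in-place and copy-returning removal is easy to miss. In this sketch with made-up data, pop() changes df and hands back the removed column, while drop(columns=...) returns a new DataFrame and leaves df as it was.

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2], 'two': [3, 4], 'three': [5, 6]})

# pop() removes the column from df in place and returns it as a Series
removed = df.pop('two')
print(removed.tolist())

# drop() returns a new DataFrame and leaves df unchanged
slim = df.drop(columns=['three'])
print(list(slim.columns))
print(list(df.columns))
```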

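Selecting, adding, and deleting rows

Selecting rows

Rows can be selected, added, or deleted at any time. Selection comes first; rows can be selected in the following ways:

Selection by label: Any row can be selected by passing its row label to the loc indexer.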

# importing the pandas library
import pandas as pd

info = {'one' : pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']), 
   'two' : pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])}

df = pd.DataFrame(info)
print (df.loc['b'])

Output

one    2.0
two    2.0
Name: b, dtype: float64

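Explanation: In the code above, the dictionary named info consists of two Series, each with its own index. To select a row, its label is passed to loc. Because the "one" column contains NaN and is therefore float64, the returned row Series is float64 as well.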
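loc accepts more than a single label. A brief sketch on the same data: a list of labels selects several rows, a label slice includes both endpoints, and a (row, column) pair selects one cell.

```python
import pandas as pd

info = {'one': pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']),
        'two': pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])}
df = pd.DataFrame(info)

rows = df.loc[['b', 'd']]       # a list of labels selects several rows
block = df.loc['b':'d', 'two']  # a label slice includes both endpoints
cell = df.loc['f', 'one']       # row f, column one: missing, so NaN
print(rows)
print(block.tolist())
print(cell)
```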

Selection by position: Rows can also be selected by passing an integer position to the iloc indexer.

# importing the pandas library
import pandas as pd
info = {'one' : pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']),
   'two' : pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])}
df = pd.DataFrame(info)
print (df.iloc[3])

Output

one    4.0
two    4.0
Name: d, dtype: float64

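Explanation: In the code above, a dictionary named info is defined that consists of two Series, each with its own index. To select a row, its integer position is passed to iloc; position 3 is the fourth row, labeled d.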
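iloc works purely with positions, so it follows Python's indexing rules. A short sketch on the same data: negative positions count from the end, slices exclude the end position, and a (row, column) pair of positions selects one cell.

```python
import pandas as pd

info = {'one': pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']),
        'two': pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])}
df = pd.DataFrame(info)

last = df.iloc[-1]    # negative positions count from the end
part = df.iloc[1:3]   # positional slice: rows 1 and 2 (end excluded)
cell = df.iloc[0, 1]  # row position 0, column position 1
print(last)
print(part)
print(cell)
```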

Selection by slice: Multiple rows can be selected with the ':' slice operator.

# importing the pandas library
import pandas as pd
info = {'one' : pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']), 
   'two' : pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])}
df = pd.DataFrame(info)
print (df[2:5])

Output

      one    two
c     3.0     3
d     4.0     4
e     5.0     5

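Explanation: In the code above, the slice 2:5 selects the rows at positions 2, 3, and 4 (the end position is excluded), and the result is printed to the console.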
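Because df[2:5] slices by position while a slice of labels includes its end, it can help to see the explicit forms side by side. This sketch checks that df[2:5], df.iloc[2:5], and df.loc['c':'e'] select the same rows for this data.

```python
import pandas as pd

info = {'one': pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']),
        'two': pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])}
df = pd.DataFrame(info)

by_position = df[2:5]        # integer slice: positions 2, 3, 4 (end excluded)
same_rows = df.iloc[2:5]     # the explicit positional form
by_label = df.loc['c':'e']   # label slice: c through e (end included)

print(by_position.equals(same_rows))
print(by_position.equals(by_label))
```

Writing iloc or loc explicitly makes it clear whether a slice is positional or label-based.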

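Adding rows: New rows can be added to the end of a DataFrame with pd.concat(). Older tutorials use the DataFrame.append() method for this, but it was deprecated in pandas 1.4 and removed in pandas 2.0.

# importing the pandas library
import pandas as pd
d = pd.DataFrame([[7, 8], [9, 10]], columns = ['x','y'])
d2 = pd.DataFrame([[11, 12], [13, 14]], columns = ['x','y'])
# stack the rows of d2 below the rows of d
d = pd.concat([d, d2])
print (d)

Output

    x   y
0   7   8
1   9  10
0  11  12
1  13  14

Explanation: In the code above, two separate DataFrames with the same columns are defined. pd.concat() stacks the rows of d2 below those of d, and the result is printed. The original row labels are kept, so 0 and 1 each appear twice.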
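If the duplicated labels are unwanted, pd.concat() can renumber the result. A small sketch: ignore_index=True produces a fresh 0..n-1 index, and a single new row (the values 15 and 16 are arbitrary) is added the same way by wrapping it in a one-row DataFrame.

```python
import pandas as pd

d = pd.DataFrame([[7, 8], [9, 10]], columns=['x', 'y'])
d2 = pd.DataFrame([[11, 12], [13, 14]], columns=['x', 'y'])

# ignore_index=True renumbers the result 0..n-1 instead of keeping 0, 1, 0, 1
combined = pd.concat([d, d2], ignore_index=True)

# A single row is added the same way, wrapped in a one-row DataFrame
new_row = pd.DataFrame([{'x': 15, 'y': 16}])
combined = pd.concat([combined, new_row], ignore_index=True)
print(combined)
```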

Deleting rows: Any row can be removed from a DataFrame by its index label with drop(). If the label is duplicated, every row with that label is removed.

# importing the pandas library
import pandas as pd

a_info = pd.DataFrame([[4, 5], [6, 7]], columns = ['x','y'])
b_info = pd.DataFrame([[8, 9], [10, 11]], columns = ['x','y'])

a_info = pd.concat([a_info, b_info])

# Drop rows with label 0
a_info = a_info.drop(0)
print(a_info)

Output

    x   y
1   6   7
1  10  11

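Explanation: In the code above, two separate DataFrames with a few rows and columns are defined and combined into one, which leaves the row labels 0 and 1 duplicated.

drop(0) then removes every row whose index label is 0, so both of those rows are dropped.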
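Dropping by label is one option; rows can also be removed by a condition. The sketch below reuses the data from the example above, keeps only the rows whose x is greater than 4 (an arbitrary threshold), and then uses reset_index(drop=True) to replace the duplicated labels with a fresh 0, 1, 2 index.

```python
import pandas as pd

a_info = pd.DataFrame([[4, 5], [6, 7]], columns=['x', 'y'])
b_info = pd.DataFrame([[8, 9], [10, 11]], columns=['x', 'y'])
a_info = pd.concat([a_info, b_info])

# Keep only the rows that satisfy a condition; the kept labels are 1, 0, 1
kept = a_info[a_info['x'] > 4]

# Discard the old labels and renumber the rows 0..n-1
kept = kept.reset_index(drop=True)
print(kept)
```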

DataFrame functions

Many functions are available for working with a DataFrame, including the following:

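Function	Description
Pandas DataFrame.append()	Appends the rows of another DataFrame to the end of the given DataFrame. Deprecated in pandas 1.4 and removed in 2.0; use pandas.concat() instead.
Pandas DataFrame.apply()	Applies a user-supplied function along an axis of the DataFrame, that is, to each column or to each row.
Pandas DataFrame.assign()	Adds new columns to a DataFrame, returning a new object.
Pandas DataFrame.astype()	Casts a pandas object to a specified dtype.
pandas.concat()	Concatenates pandas objects along an axis. It is a top-level function, not a DataFrame method.
Pandas DataFrame.count()	Counts the non-NA cells for each column or row.
Pandas DataFrame.describe()	Computes descriptive statistics such as the percentiles, mean, and standard deviation of the numeric values of a Series or DataFrame.
Pandas DataFrame.drop_duplicates()	Removes duplicate rows from a DataFrame.
Pandas DataFrame.groupby()	Splits the data into groups.
Pandas DataFrame.head()	Returns the first n rows of the object, based on position.
Pandas DataFrame.hist()	Draws a histogram of each numeric column, dividing its values into "bins".
Pandas DataFrame.iterrows()	Iterates over the rows as (index, Series) pairs.
Pandas DataFrame.mean()	Returns the mean of the values over the requested axis.
Pandas DataFrame.melt()	Unpivots a DataFrame from wide format to long format.
Pandas DataFrame.merge()	Merges two datasets into one with a database-style join.
Pandas DataFrame.pivot_table()	Creates a spreadsheet-style pivot table that summarizes data with aggregations such as sum, count, mean, max, and min.
Pandas DataFrame.query()	Filters the rows of a DataFrame with a boolean expression.
Pandas DataFrame.sample()	Returns a random sample of rows or columns from a DataFrame.
Pandas DataFrame.shift()	Shifts values by a number of periods; often used to subtract the previous row's value from a column.
Pandas DataFrame.sort()	Sorted a DataFrame in old pandas versions and was removed in pandas 0.20; use sort_values() or sort_index() instead.
Pandas DataFrame.sum()	Returns the sum of the values over the requested axis.
Pandas DataFrame.to_excel()	Writes the DataFrame to an Excel file.
Pandas DataFrame.transpose()	Transposes the index and columns of the DataFrame.
Pandas DataFrame.where()	Keeps the values where a condition holds and replaces the others (with NaN by default).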
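A few of the functions in the table, combined in one short sketch on made-up data (the dept and score columns are illustrative). sort_values() is used because DataFrame.sort() no longer exists, and shift() is paired with subtraction to get each row's change from the previous row.

```python
import pandas as pd

# Illustrative data
df = pd.DataFrame({
    'dept': ['B.Sc', 'B.Tech', 'B.Sc', 'M.Tech'],
    'score': [70, 85, 90, 60],
})

# groupby() + mean(): average score per department
means = df.groupby('dept')['score'].mean()
print(means)

# sort_values() + head(): the two highest scores
top = df.sort_values('score', ascending=False).head(2)
print(top)

# where(): scores below 70 are replaced with NaN
passed = df['score'].where(df['score'] >= 70)
print(passed)

# shift(): the previous row's value, used to compute a row-to-row difference
df['diff'] = df['score'] - df['score'].shift(1)
print(df)
```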
