A Tour of Cool Python Libraries: the Third-Party Library Pandas (133)

I. Usage in Detail

596. The pandas.DataFrame.plot.density method

596-1. Syntax

596-2. Parameters

596-3. Function

596-4. Return value

596-5. Notes

596-6. Usage

596-6-1. Data preparation

596-6-2. Code example

596-6-3. Output

597. The pandas.DataFrame.plot.hexbin method

597-1. Syntax

597-2. Parameters

597-3. Function

597-4. Return value

597-5. Notes

597-6. Usage

597-6-1. Data preparation

597-6-2. Code example

597-6-3. Output

598. The pandas.DataFrame.plot.hist method

598-1. Syntax

598-2. Parameters

598-3. Function

598-4. Return value

598-5. Notes

598-6. Usage

598-6-1. Data preparation

598-6-2. Code example

598-6-3. Output

599. The pandas.DataFrame.plot.kde method

599-1. Syntax

599-2. Parameters

599-3. Function

599-4. Return value

599-5. Notes

599-6. Usage

599-6-1. Data preparation

599-6-2. Code example

599-6-3. Output

600. The pandas.DataFrame.plot.line method

600-1. Syntax

600-2. Parameters

600-3. Function

600-4. Return value

600-5. Notes

600-6. Usage

600-6-1. Data preparation

600-6-2. Code example

600-6-3. Output

II. Recommended Reading

1. Python Foundations Tour

2. Python Functions Tour

3. Python Algorithms Tour

4. Python Magic Tour

5. Blog Home Page

I. Usage in Detail

596. The pandas.DataFrame.plot.density method

596-1. Syntax

```python
# 596. pandas.DataFrame.plot.density method
pandas.DataFrame.plot.density(bw_method=None, ind=None, **kwargs)
```

Generate Kernel Density Estimate plot using Gaussian kernels.

In statistics, kernel density estimation (KDE) is a non-parametric way to estimate the probability density function (PDF) of a random variable. This function uses Gaussian kernels and includes automatic bandwidth determination.

Parameters:
bw_method : str, scalar or callable, optional
    The method used to calculate the estimator bandwidth. This can be 'scott', 'silverman', a scalar constant or a callable. If None (default), 'scott' is used. See scipy.stats.gaussian_kde for more information.

ind : NumPy array or int, optional
    Evaluation points for the estimated PDF. If None (default), 1000 equally spaced points are used. If ind is a NumPy array, the KDE is evaluated at the points passed. If ind is an integer, ind number of equally spaced points are used.

**kwargs
    Additional keyword arguments are documented in DataFrame.plot().

Returns:
matplotlib.axes.Axes or numpy.ndarray of them.

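596-2. Parameters

596-2-1. bw_method (optional, default None): str, scalar, or callable. Controls the bandwidth used in the kernel density estimate, that is, how strongly the curve is smoothed.

If a str, it can be 'scott' or 'silverman', which selects Scott's or Silverman's rule for estimating the bandwidth.
If a scalar, it sets the bandwidth factor by hand (scipy.stats.gaussian_kde uses it directly as its factor): smaller values give a spikier curve, larger values a smoother one.
If a callable, it supplies a custom bandwidth function.
If None (the default), Scott's rule is used.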
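To make the effect of bw_method concrete, here is a minimal sketch. It assumes SciPy is installed (the documentation quoted above points to scipy.stats.gaussian_kde for this plot). The sample data, the variable names, and the scalar values 0.1 and 1.0 are illustrative choices and are not taken from the original post.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde

# Illustrative sample (an assumption for this sketch): 1000 standard-normal draws
rng = np.random.default_rng(0)
values = pd.Series(rng.normal(loc=0.0, scale=1.0, size=1000), name="Values")

# The bandwidth factor SciPy derives for each kind of bw_method
scott_factor = gaussian_kde(values, bw_method="scott").factor
silverman_factor = gaussian_kde(values, bw_method="silverman").factor
scalar_factor = gaussian_kde(values, bw_method=0.1).factor  # a scalar is used as the factor itself
print(scott_factor, silverman_factor, scalar_factor)

# Draw a small and a large scalar bandwidth on the same Axes for comparison
ax = values.plot.density(bw_method=0.1, label="bw_method=0.1")
values.plot.density(bw_method=1.0, ax=ax, label="bw_method=1.0")
ax.legend()
```

The curve drawn with 0.1 follows the sample closely and looks jagged, while the one drawn with 1.0 is much flatter and smoother.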

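596-2-2. ind (optional, default None): NumPy array or int. Specifies the points at which the estimated density is evaluated.

If a NumPy array, the KDE is evaluated at exactly those points, and they are the points used to draw the curve.
If an int, that many equally spaced points are used; they are generated automatically from the range of the data.
If None (the default), 1000 equally spaced points are used.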
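The two non-default forms of ind can be checked by reading back the x coordinates of the plotted line. This is a small sketch under assumptions: SciPy is installed, and the data, the point count 50, and the grid from -4 to 4 are arbitrary illustrative values.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import numpy as np
import pandas as pd

# Illustrative data (an assumption for this sketch)
rng = np.random.default_rng(0)
df = pd.DataFrame({"Values": rng.normal(size=500)})

# ind as an int: the curve is evaluated at that many equally spaced points
ax_int = df.plot.density(ind=50)
x_int = ax_int.get_lines()[0].get_xdata()

# ind as an array: the curve is evaluated exactly at the given points
grid = np.linspace(-4.0, 4.0, 200)
ax_arr = df.plot.density(ind=grid)
x_arr = ax_arr.get_lines()[0].get_xdata()

print(len(x_int), len(x_arr), x_arr[0], x_arr[-1])
```

Passing an explicit array is handy when several plots should share exactly the same evaluation grid.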

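596-2-3. **kwargs (optional): other keyword arguments, such as color or line style. They are documented in DataFrame.plot(), and styling options are forwarded to matplotlib.

596-3. Function

Visualizes the probability distribution of the columns of a DataFrame. A kernel density curve shows the shape of the distribution smoothly and does not depend on the choice of bin edges the way a histogram does. The result does, however, depend on the chosen bandwidth.

596-4. Return value

Returns a matplotlib.axes.Axes object (or a numpy.ndarray of them, for example when subplots=True). An Axes is a plotting area in matplotlib and can be used for further customization and display.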
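As a rough illustration of the two return shapes, the sketch below plots a two-column DataFrame once on a single Axes and once with subplots=True. The column names, the data, and the title text are made up for the example, and SciPy is assumed to be installed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import numpy as np
import pandas as pd
from matplotlib.axes import Axes

# Illustrative two-column data (an assumption for this sketch)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "A": rng.normal(loc=0.0, size=300),
    "B": rng.normal(loc=2.0, size=300),
})

# Default call: one Axes holding one curve per column
ax = df.plot.density()
ax.set_title("Density of A and B")  # customize through the returned Axes
ax.set_xlabel("Value")

# subplots=True: a NumPy array with one Axes per column
axes = df.plot.density(subplots=True)

print(type(ax).__name__, type(axes).__name__, axes.shape)
```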

596-5. Notes

None.

596-6. Usage

596-6-1. Data preparation

None.

596-6-2. Code example

```python
# 596. pandas.DataFrame.plot.density method
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Generate a random data set
np.random.seed(0)
data = np.random.normal(loc=0, scale=1, size=1000)
# Put the data into a DataFrame
df = pd.DataFrame(data, columns=['Values'])
# Draw the kernel density estimate curve
# (selecting one column gives a Series; Series.plot.density takes the same arguments)
df['Values'].plot.density(bw_method='scott', color='blue', linestyle='-', linewidth=2)
# Add a title and axis labels
plt.title('Density Plot of Values')
plt.xlabel('Value')
plt.ylabel('Density')
# Show the figure
plt.show()
```

596-6-3. Output

See Figure 1.

[Figure 1: output of the code example above]

597. The pandas.DataFrame.plot.hexbin method

597-1. Syntax

```python
# 597. pandas.DataFrame.plot.hexbin method
pandas.DataFrame.plot.hexbin(x, y, C=None, reduce_C_function=None, gridsize=None, **kwargs)
```

Generate a hexagonal binning plot.

Generate a hexagonal binning plot of x versus y. If C is None (the default), this is a histogram of the number of occurrences of the observations at (x[i], y[i]).

If C is specified, specifies values at given coordinates (x[i], y[i]). These values are accumulated for each hexagonal bin and then reduced according to reduce_C_function, having as default the NumPy's mean function (numpy.mean()). (If C is specified, it must also be a 1-D sequence of the same length as x and y, or a column label.)

Parameters:
x : int or str
    The column label or position for x points.

y : int or str
    The column label or position for y points.

C : int or str, optional
    The column label or position for the value of (x, y) point.

reduce_C_function : callable, default np.mean
    Function of one argument that reduces all the values in a bin to a single number (e.g. np.mean, np.max, np.sum, np.std).

gridsize : int or tuple of (int, int), default 100
    The number of hexagons in the x-direction. The corresponding number of hexagons in the y-direction is chosen in a way that the hexagons are approximately regular. Alternatively, gridsize can be a tuple with two elements specifying the number of hexagons in the x-direction and the y-direction.

**kwargs
    Additional keyword arguments are documented in DataFrame.plot().

Returns:
matplotlib.axes.Axes
    The matplotlib Axes on which the hexbin is plotted.

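597-2. Parameters

597-2-1. x (required): int or str. The label or position of the DataFrame column that supplies the x coordinates.

597-2-2. y (required): int or str. The label or position of the DataFrame column that supplies the y coordinates.

597-2-3. C (optional, default None): int or str. The label or position of the column whose values determine the color of each hexagon. For example, if you pass the name of a numeric column, the color reflects the aggregated values of that column rather than a simple count of the points in each hexagon. If None, the color represents the number of points in each hexagon.

597-2-4. reduce_C_function (optional, default None, in which case np.mean is used): callable. The function used to aggregate the C values that fall into the same hexagon; it only has an effect when C is given. For example, np.sum computes the total for each hexagon.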
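The difference between counting points and aggregating a column can be sketched as follows. The "weight" column is a hypothetical per-point value invented for this example, and the data and gridsize are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import numpy as np
import pandas as pd

# Illustrative data; "weight" is a hypothetical per-point value
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "x": rng.standard_normal(n),
    "y": rng.standard_normal(n),
    "weight": rng.uniform(0.0, 10.0, size=n),
})

# Without C: each hexagon is colored by how many points it contains
ax_count = df.plot.hexbin(x="x", y="y", gridsize=15)

# With C and reduce_C_function: each hexagon is colored by the sum of the
# "weight" values of the points that fall inside it
ax_sum = df.plot.hexbin(x="x", y="y", C="weight", reduce_C_function=np.sum, gridsize=15)

# The per-hexagon values are stored on the collection that matplotlib draws
counts = ax_count.collections[0].get_array()
sums = ax_sum.collections[0].get_array()
print(np.nanmax(counts), np.nanmax(sums))
```

Swapping np.sum for np.max or np.std changes only the aggregate shown per hexagon, not the binning itself.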

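597-2-5. gridsize (optional, default None, in which case 100 is used): int or tuple of (int, int). An int gives the number of hexagons in the x direction; the number in the y direction is then chosen so that the hexagons are approximately regular. A tuple gives the number of hexagons in the x and y directions explicitly. Larger values produce more, smaller hexagons and a finer-grained plot; smaller values produce fewer hexagons and a coarser plot.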
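A quick way to see what gridsize does is to count the hexagons that end up in the plot. The sketch below uses made-up data and the arbitrary sizes 10, 40, and (20, 5); it reads the number of hexagons from the drawn collection, which is an implementation detail of matplotlib rather than part of the pandas API.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import numpy as np
import pandas as pd

# Illustrative data (an assumption for this sketch)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": rng.standard_normal(5000),
    "y": rng.standard_normal(5000),
})

# An int sets the number of hexagons along x; y follows automatically
coarse = df.plot.hexbin(x="x", y="y", gridsize=10)
fine = df.plot.hexbin(x="x", y="y", gridsize=40)
# A tuple sets the number of hexagons along x and y separately
custom = df.plot.hexbin(x="x", y="y", gridsize=(20, 5))

# One array entry per drawn hexagon
n_coarse = len(coarse.collections[0].get_array())
n_fine = len(fine.collections[0].get_array())
n_custom = len(custom.collections[0].get_array())
print(n_coarse, n_fine, n_custom)
```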

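597-2-6. **kwargs (optional): other keyword arguments for further customizing the plot. For example, colorbar controls whether a color bar is shown and cmap sets the colormap.

597-3. Function

Bins the points into a grid of hexagons instead of the usual rectangles. Unlike a scatter plot, a hexbin plot stays readable when a large number of points overlap. The color intensity of each hexagon represents the number of points in it, or an aggregate of some attribute of those points (when C and reduce_C_function are specified).

597-4. Return value

Returns a matplotlib.axes.Axes object containing the generated plot, which can be further modified and customized, for example by adding a title or axis labels.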

597-5. Notes

None.

597-6. Usage

597-6-1. Data preparation

None.

597-6-2. Code example

```python
# 597. pandas.DataFrame.plot.hexbin method
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Generate random data
np.random.seed(0)
n = 10000
x = np.random.standard_normal(n)
y = 2.0 + 3.0 * x + 4.0 * np.random.standard_normal(n)
# Create the DataFrame
df = pd.DataFrame({'x': x, 'y': y})
# Draw the hexbin plot
df.plot.hexbin(x='x', y='y', gridsize=50, cmap='Blues')
# Add a title and axis labels
plt.title('Hexbin Plot')
plt.xlabel('X')
plt.ylabel('Y')
# Show the figure
plt.show()
```

597-6-3. Output

See Figure 2.

[Figure 2: output of the code example above]

598. The pandas.DataFrame.plot.hist method

598-1. Syntax

```python
# 598. pandas.DataFrame.plot.hist method
pandas.DataFrame.plot.hist(by=None, bins=10, **kwargs)
```

Draw one histogram of the DataFrame's columns.

A histogram is a representation of the distribution of data. This function groups the values of all given Series in the DataFrame into bins and draws all bins in one matplotlib.axes.Axes. This is useful when the DataFrame's Series are in a similar scale.

Parameters:
by : str or sequence, optional
    Column in the DataFrame to group by.

    Changed in version 1.4.0: Previously, by was silently ignored and made no groupings.

bins : int, default 10
    Number of histogram bins to be used.

**kwargs
    Additional keyword arguments are documented in DataFrame.plot().

Returns:
matplotlib.axes.Axes
    Return a histogram plot.

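598-2. Parameters

598-2-1. by (optional, default None): str or sequence. Used for grouped plotting: by names one or more columns to group by, and a separate histogram is drawn for each group. For example, by='category' draws one histogram per category. Before pandas 1.4.0 this argument was silently ignored.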
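Grouping with by might look like the sketch below. It assumes pandas 1.4.0 or later, and it uses the column keyword the way the pandas documentation example for plot.hist does; the column names "category" and "value" and the data are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import numpy as np
import pandas as pd

# Illustrative data: two categories with different means (an assumption)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "category": ["a"] * 100 + ["b"] * 100,
    "value": np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(3.0, 1.0, 100)]),
})

# One histogram of "value" per distinct entry in "category"
axes = df.plot.hist(column=["value"], by="category", bins=15)

# With by, several Axes come back; flatten them to iterate safely
all_axes = np.ravel(axes)
titles = [ax.get_title() for ax in all_axes]
print(len(all_axes), titles)
```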

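598-2-2. bins (optional, default 10): int. The number of histogram bins; if not specified, the data is divided into 10 intervals. The pandas documentation lists an int, while matplotlib's hist, which draws the plot, also accepts a sequence of bin edges.

598-2-3. **kwargs (optional): other keyword arguments for further customizing the plot. For example:

grid: whether to show grid lines.
fontsize: font size of the x and y tick labels.
title: the title of the figure.
color: a color or a sequence of colors.
alpha: the transparency of the bars.

598-3. Function

Draws a histogram, a chart that shows the distribution of numeric data: the values are grouped into intervals (bins) and the number of values in each bin is plotted. All columns of the DataFrame are drawn on one Axes, so the method works best when the columns are on a similar scale. Histograms are typically used for univariate analysis.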
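The binning and counting described above can be reproduced with numpy.histogram. The sketch below uses a tiny hand-made column so the counts are easy to verify by eye; the data and the choice of 4 bins are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import numpy as np
import pandas as pd

# Tiny illustrative data set (an assumption for this sketch)
df = pd.DataFrame({"value": [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]})

# Draw the histogram with 4 equal-width bins
ax = df.plot.hist(bins=4)

# Each bar is a rectangle whose height is the count in that bin
bar_heights = [patch.get_height() for patch in ax.patches]

# The same counts computed directly with NumPy
counts, edges = np.histogram(df["value"], bins=4)
print(bar_heights)
print(counts.tolist(), edges.tolist())
```

For a single column the bar heights match the NumPy counts, which makes numpy.histogram a convenient way to get the numbers behind the picture.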

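598-4. Return value

Returns a matplotlib.axes.Axes object containing the generated histogram, which can be further modified and customized, for example by adding a title or axis labels or by adjusting the colors.

598-5. Notes

None.

598-6. Usage

598-6-1. Data preparation

None.

598-6-2. Code example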

```python
# 598. pandas.DataFrame.plot.hist method
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Generate random data
np.random.seed(0)
data = pd.DataFrame({
    'A': np.random.randn(1000),  # normally distributed data
    'B': np.random.randn(1000) + 1  # normal distribution with a different mean
})
# Draw the histogram
data.plot.hist(bins=20, alpha=0.7)
# Add a title and axis labels
plt.title('Histogram of A and B')
plt.xlabel('Value')
plt.ylabel('Frequency')
# Show the figure
plt.show()
```

598-6-3. Output

See Figure 3.

[Figure 3: output of the code example above]

599. The pandas.DataFrame.plot.kde method

599-1. Syntax

```python
# 599. pandas.DataFrame.plot.kde method
pandas.DataFrame.plot.kde(bw_method=None, ind=None, **kwargs)
```

Generate Kernel Density Estimate plot using Gaussian kernels.

In statistics, kernel density estimation (KDE) is a non-parametric way to estimate the probability density function (PDF) of a random variable. This function uses Gaussian kernels and includes automatic bandwidth determination.

Parameters:
bw_method : str, scalar or callable, optional
    The method used to calculate the estimator bandwidth. This can be 'scott', 'silverman', a scalar constant or a callable. If None (default), 'scott' is used. See scipy.stats.gaussian_kde for more information.

ind : NumPy array or int, optional
    Evaluation points for the estimated PDF. If None (default), 1000 equally spaced points are used. If ind is a NumPy array, the KDE is evaluated at the points passed. If ind is an integer, ind number of equally spaced points are used.

**kwargs
    Additional keyword arguments are documented in DataFrame.plot().

Returns:
matplotlib.axes.Axes or numpy.ndarray of them.

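599-2. Parameters

599-2-1. bw_method (optional, default None): str, scalar, or callable. Selects how the bandwidth of the estimate is determined: 'scott', 'silverman', a scalar, or a user-defined function. If None, 'scott' is used. The bandwidth controls how smooth the curve is: a smaller bandwidth gives a more wiggly estimate, a larger one a smoother estimate.

599-2-2. ind (optional, default None): NumPy array or int. Defines the evaluation points of the KDE curve. An int gives the number of equally spaced points to use. By default, 1000 equally spaced points are created automatically from the range of the data. If an array is given, its values are used as the evaluation points.

599-2-3. **kwargs (optional): other keyword arguments for further customizing the plot. For example:

color: the color of the curve.
title: the title of the figure.
xlabel/ylabel: the labels of the x and y axes.
linewidth: the width of the curve.

599-3. Function

Kernel density estimation (KDE) gives a smoothed view of a distribution. It is suited to visualizing univariate distributions: it estimates the probability density of the data without grouping the values into discrete bins, as a histogram does. plot.kde has the same signature and documentation as plot.density (596).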
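Two properties of the estimate can be checked numerically: plot.kde and plot.density draw the same curve for the same data, and the curve, being a probability density, encloses an area of about 1. The following is a sketch under assumptions: SciPy is installed, the data is a made-up normal sample, and the area is approximated with a hand-written trapezoidal rule over the default evaluation points.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import numpy as np
import pandas as pd

# Illustrative sample (an assumption for this sketch)
rng = np.random.default_rng(0)
df = pd.DataFrame({"A": rng.normal(size=1000)})

# Draw the same data with both method names and grab the plotted lines
line_kde = df.plot.kde().get_lines()[0]
line_density = df.plot.density().get_lines()[0]
same_curve = np.allclose(line_kde.get_ydata(), line_density.get_ydata())

# Approximate the area under the curve with the trapezoidal rule
x = np.asarray(line_kde.get_xdata(), dtype=float)
y = np.asarray(line_kde.get_ydata(), dtype=float)
area = float(np.sum((y[1:] + y[:-1]) / 2.0 * np.diff(x)))

print(same_curve, round(area, 3))
```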

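599-4. Return value

Returns a matplotlib.axes.Axes object (or a numpy.ndarray of them) containing the generated KDE curves, which can be further modified and customized.

599-5. Notes

None.

599-6. Usage

599-6-1. Data preparation

None.

599-6-2. Code example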

```python
# 599. pandas.DataFrame.plot.kde method
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Generate random data
np.random.seed(0)
data = pd.DataFrame({
    'A': np.random.normal(loc=0, scale=1, size=1000),  # standard normal distribution
    'B': np.random.normal(loc=1, scale=0.5, size=1000)  # normal distribution with mean 1 and standard deviation 0.5
})
# Draw the KDE curves
data.plot.kde(bw_method='scott')
# Add a title and axis labels
plt.title('KDE of A and B')
plt.xlabel('Value')
plt.ylabel('Density')
# Show the figure
plt.show()
```

599-6-3. Output

See Figure 4.

[Figure 4: output of the code example above]

600. The pandas.DataFrame.plot.line method

600-1. Syntax

```python
# 600. pandas.DataFrame.plot.line method
pandas.DataFrame.plot.line(x=None, y=None, **kwargs)
```

Plot Series or DataFrame as lines.

This function is useful to plot lines using DataFrame's values as coordinates.

Parameters:
x : label or position, optional
    Allows plotting of one column versus another. If not specified, the index of the DataFrame is used.

y : label or position, optional
    Allows plotting of one column versus another. If not specified, all numerical columns are used.

color : str, array-like, or dict, optional
    The color for each of the DataFrame's columns. Possible values are:

    A single color string referred to by name, RGB or RGBA code, for instance 'red' or '#a98d19'.

    A sequence of color strings referred to by name, RGB or RGBA code, which will be used for each column recursively. For instance, with ['green', 'yellow'] each column's line will be drawn in green or yellow, alternately. If there is only a single column to be plotted, then only the first color from the color list will be used.

    A dict of the form {column name : color}, so that each column will be colored accordingly. For example, if your columns are called a and b, then passing {'a': 'green', 'b': 'red'} will color lines for column a in green and lines for column b in red.

**kwargs
    Additional keyword arguments are documented in DataFrame.plot().

Returns:
matplotlib.axes.Axes or np.ndarray of them
    An ndarray is returned with one matplotlib.axes.Axes per column when subplots=True.

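600-2. Parameters

600-2-1. x (optional, default None): str or int. The column of the DataFrame to use for the x axis, given by its name (str) or its position (int). If None, the index of the DataFrame is used as the x axis.

600-2-2. y (optional, default None): str, int, or list. The column or columns of the DataFrame to use for the y axis: a single column name or position, or a list of several columns. If None, a line is drawn for every numeric column.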
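The defaults for x and y are easy to trip over when the x values live in an ordinary column. The sketch below, with a small invented sales table, compares the default call with an explicit choice of x and y.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import numpy as np
import pandas as pd

# Small illustrative table (an assumption for this sketch)
df = pd.DataFrame({
    "Year": np.arange(2000, 2006),
    "Sales_A": [120, 135, 128, 150, 161, 158],
    "Sales_B": [180, 176, 190, 205, 199, 214],
})

# No x and y: the index goes on the x axis and every numeric column
# becomes a line, including "Year" itself
ax_default = df.plot.line()

# x by label, y as a list of labels: only the two sales columns are drawn
ax_xy = df.plot.line(x="Year", y=["Sales_A", "Sales_B"])

n_default = len(ax_default.get_lines())
n_xy = len(ax_xy.get_lines())
x_values = list(ax_xy.get_lines()[0].get_xdata())
print(n_default, n_xy, x_values)
```

Passing x explicitly keeps the year column out of the plotted lines and puts its values on the x axis instead.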

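600-2-3. **kwargs (optional): other keyword arguments that control the details and style of the plot. They include, among others:

color: the color of the lines, for example 'blue'.
title: the title of the figure.
xlabel/ylabel: the labels of the x and y axes.
linewidth: the width of the lines (matplotlib's default is 1.5).
figsize: the size of the figure, in the form (width, height).

600-3. Function

Visualizes the data of a DataFrame as a line chart. Line charts are well suited to showing trends or data that changes over time, and are typically used to see how one or more variables change along the x axis.

600-4. Return value

Returns a matplotlib.axes.Axes object containing the generated line chart (or a numpy.ndarray of them when subplots=True). You can use this object for further customization, such as adjusting the layout, adding annotations, or saving the image.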
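As a sketch of working with the returned object, the example below colors the lines with a dict, labels and saves the figure through the Axes, and then shows the array returned with subplots=True. The table, the colors, and the in-memory buffer used instead of a file name are all illustrative choices; the dict form of color assumes pandas 1.1 or later.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import numpy as np
import pandas as pd
from matplotlib.colors import to_rgba

# Small illustrative table (an assumption for this sketch)
df = pd.DataFrame({
    "Year": np.arange(2000, 2006),
    "Sales_A": [120, 135, 128, 150, 161, 158],
    "Sales_B": [180, 176, 190, 205, 199, 214],
})

# A dict maps each column name to its line color
ax = df.plot.line(x="Year", color={"Sales_A": "green", "Sales_B": "red"})
ax.set_ylabel("Sales")  # customize through the returned Axes
line_colors = [to_rgba(line.get_color()) for line in ax.get_lines()]

# Save the figure; a file name such as "sales.png" would work the same way
buffer = io.BytesIO()
ax.figure.savefig(buffer, format="png")

# subplots=True: one Axes per plotted column, returned as a NumPy array
axes = df.plot.line(x="Year", subplots=True)
print(type(axes).__name__, len(axes), buffer.getbuffer().nbytes > 0)
```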

600-5. Notes

None.

600-6. Usage

600-6-1. Data preparation

None.

600-6-2. Code example

```python
# 600. pandas.DataFrame.plot.line method
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Generate random data
np.random.seed(0)
data = pd.DataFrame({
    'Year': np.arange(2000, 2024),  # x-axis data: years
    'Sales_A': np.random.randint(100, 200, size=24),  # y-axis data: sales A
    'Sales_B': np.random.randint(150, 250, size=24)   # y-axis data: sales B
})
# Draw the line chart with Year on the x axis and Sales_A and Sales_B on the y axis
data.plot.line(x='Year', y=['Sales_A', 'Sales_B'], linewidth=2, color=['blue', 'green'])
# Add a title and axis labels
plt.title('Sales Trend from 2000 to 2023')
plt.xlabel('Year')
plt.ylabel('Sales')
# Show the figure
plt.show()
```

600-6-3. Output

See Figure 5.