pandas: combining datasets with pd.concat(), pd.merge(), and data1.append(data2)

Tags: pandas

Several commonly used ways to combine datasets:
1) pd.concat([data1, data2, ...], axis=1 (or 0), keys=['key1', 'key2', ...], names=['upper', 'lower', ...], ignore_index=True/False, ...)

2) pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)

3) data1.append(data2): not recommended; use pd.concat instead (DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0)

How they differ:
pd.concat() mainly stacks multiple objects together along one axis (axis=0 or 1);
pd.merge() joins the rows of different DataFrames on one or more keys, similar to a SQL join.

Example:
The datasets data1 and data2:

data2:
  subject_id first_name last_name
0          4      Billy    Bonder
1          5      Brian     Black
2          6       Bran   Balwner
3          7      Bryce     Brice
4          8      Betty    Btisan
data1:
  subject_id first_name last_name
0          1       Alex  Anderson
1          2        Amy  Ackerman
2          3      Allen       Ali
3          4      Alice      Aoni
4          5     Ayoung   Atiches
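
The post does not show how the two frames were built. A minimal sketch that reproduces them, assuming subject_id holds integers (the printed output would look the same if it held strings):

```python
import pandas as pd

# data1: subjects 1-5 (the left frame in the merge examples)
data1 = pd.DataFrame({
    "subject_id": [1, 2, 3, 4, 5],
    "first_name": ["Alex", "Amy", "Allen", "Alice", "Ayoung"],
    "last_name": ["Anderson", "Ackerman", "Ali", "Aoni", "Atiches"],
})

# data2: subjects 4-8 (the right frame); only ids 4 and 5 overlap with data1
data2 = pd.DataFrame({
    "subject_id": [4, 5, 6, 7, 8],
    "first_name": ["Billy", "Brian", "Bran", "Bryce", "Betty"],
    "last_name": ["Bonder", "Black", "Balwner", "Brice", "Btisan"],
})

# Keys present in both frames; these are the rows an inner merge keeps
shared_ids = sorted(set(data1["subject_id"]) & set(data2["subject_id"]))
print(data1)
print(data2)
print(shared_ids)
```

Both frames share the same column names, which is why the merge results carry the _x and _y suffixes.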

Methods and parameter choices:
- pd.merge():
1) pd.merge(data1, data2, how='inner', on='subject_id')

pd.merge(data1,data2, how='inner',on='subject_id')
Out[31]: 
  subject_id first_name_x last_name_x first_name_y last_name_y
0          4        Alice        Aoni        Billy      Bonder
1          5       Ayoung     Atiches        Brian       Black

2) pd.merge(data1, data2, how='right', on='subject_id')

pd.merge(data1,data2, how='right',on='subject_id')
Out[34]: 
  subject_id first_name_x last_name_x first_name_y last_name_y
0          4        Alice        Aoni        Billy      Bonder
1          5       Ayoung     Atiches        Brian       Black
2          6          NaN         NaN         Bran     Balwner
3          7          NaN         NaN        Bryce       Brice
4          8          NaN         NaN        Betty      Btisan
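
The inner and right joins above leave out how='outer'. A small sketch of an outer merge with indicator=True, which adds a _merge column recording where each key was found. The frames are cut down to two columns here for brevity, which is an assumption of this example and not the post's data layout:

```python
import pandas as pd

data1 = pd.DataFrame({
    "subject_id": [1, 2, 3, 4, 5],
    "first_name": ["Alex", "Amy", "Allen", "Alice", "Ayoung"],
})
data2 = pd.DataFrame({
    "subject_id": [4, 5, 6, 7, 8],
    "first_name": ["Billy", "Brian", "Bran", "Bryce", "Betty"],
})

# Outer join keeps the union of keys from both frames
merged = pd.merge(data1, data2, how="outer", on="subject_id", indicator=True)

# Count how many keys came from the left only, the right only, or both
counts = merged["_merge"].value_counts().to_dict()
print(merged)
print(counts)
```

Filtering on merged["_merge"] == "left_only" is a common way to find rows that have no match in the other frame.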

- pd.concat():
1) pd.concat([data1,data2],axis=1,ignore_index=True)

all_data_col = pd.concat([data1,data2],axis=1,ignore_index=True)
all_data_col
Out[40]: 
   0       1         2  3      4        5
0  1    Alex  Anderson  4  Billy   Bonder
1  2     Amy  Ackerman  5  Brian    Black
2  3   Allen       Ali  6   Bran  Balwner
3  4   Alice      Aoni  7  Bryce    Brice
4  5  Ayoung   Atiches  8  Betty   Btisan

2) pd.concat([data1,data2],axis=0)

all_data_row = pd.concat([data1,data2],axis=0)
all_data_row
Out[38]: 
  subject_id first_name last_name
0          1       Alex  Anderson
1          2        Amy  Ackerman
2          3      Allen       Ali
3          4      Alice      Aoni
4          5     Ayoung   Atiches
0          4      Billy    Bonder
1          5      Brian     Black
2          6       Bran   Balwner
3          7      Bryce     Brice
4          8      Betty    Btisan
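
In the stacked result the index labels 0 to 4 appear twice. Two ways of dealing with that, sketched with the ignore_index, keys and names parameters from the signature above (the level names "source" and "row" are made up for this example, and the frames are reduced to two columns):

```python
import pandas as pd

data1 = pd.DataFrame({
    "subject_id": [1, 2, 3, 4, 5],
    "first_name": ["Alex", "Amy", "Allen", "Alice", "Ayoung"],
})
data2 = pd.DataFrame({
    "subject_id": [4, 5, 6, 7, 8],
    "first_name": ["Billy", "Brian", "Bran", "Bryce", "Betty"],
})

# Default behaviour: the original index labels are kept, so they repeat
stacked = pd.concat([data1, data2], axis=0)

# Option 1: discard the old labels and renumber 0..n-1
renumbered = pd.concat([data1, data2], axis=0, ignore_index=True)

# Option 2: keep the old labels but add an outer level naming the source
labeled = pd.concat(
    [data1, data2], axis=0, keys=["data1", "data2"], names=["source", "row"]
)

print(stacked.index.is_unique)
print(renumbered.index.tolist())
print(labeled.loc[("data2", 0), "first_name"])
```

Option 2 is useful when the origin of each row has to remain traceable after stacking.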

- data1.append(data2) (works only on pandas versions before 2.0):

all_data = data1.append(data2)
all_data
Out[14]: 
  subject_id first_name last_name
0          1       Alex  Anderson
1          2        Amy  Ackerman
2          3      Allen       Ali
3          4      Alice      Aoni
4          5     Ayoung   Atiches
0          4      Billy    Bonder
1          5      Brian     Black
2          6       Bran   Balwner
3          7      Bryce     Brice
4          8      Betty    Btisan
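
Since DataFrame.append no longer exists in pandas 2.0 and later, the same results can be obtained with pd.concat. A minimal sketch of the two usual cases; the small frames and the new_row values are invented for illustration:

```python
import pandas as pd

data1 = pd.DataFrame({"subject_id": [1, 2], "first_name": ["Alex", "Amy"]})
data2 = pd.DataFrame({"subject_id": [4, 5], "first_name": ["Billy", "Brian"]})

# Replacement for data1.append(data2): original index labels are kept
all_data = pd.concat([data1, data2])

# Replacement for data1.append(new_row, ignore_index=True):
# wrap the dict in a one-row DataFrame first (new_row is hypothetical data)
new_row = {"subject_id": 9, "first_name": "Cara"}
with_row = pd.concat([data1, pd.DataFrame([new_row])], ignore_index=True)

print(all_data)
print(with_row)
```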

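The DataFrame.append docstring (older pandas) contrasts two ways of building a DataFrame piece by piece:

    Less efficient:
    >>> df = pd.DataFrame(columns=['A'])
    >>> for i in range(5):
    ...     df = df.append({'A': i}, ignore_index=True)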
    >>> df
       A
    0  0
    1  1
    2  2
    3  3
    4  4

    More efficient:
    >>> pd.concat([pd.DataFrame([i], columns=['A']) for i in range(5)],ignore_index=True)
       A
    0  0
    1  1
    2  2
    3  3
    4  4
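
The loop version is slow because every call builds a brand-new DataFrame and copies all existing rows. A sketch of two patterns that construct the frame only once, assuming the rows are available as plain dicts:

```python
import pandas as pd

# Pattern 1: collect plain dicts in a Python list, build the frame at the end
rows = []
for i in range(5):
    rows.append({"A": i})  # list.append, not DataFrame.append
df_from_rows = pd.DataFrame(rows)

# Pattern 2: collect small frames and concatenate them in a single call
chunks = [pd.DataFrame({"A": [i]}) for i in range(5)]
df_from_chunks = pd.concat(chunks, ignore_index=True)

print(df_from_rows)
print(df_from_rows.equals(df_from_chunks))
```

Pattern 1 tends to be the simplest when rows arrive one at a time; pattern 2 fits when the data already comes as separate frames, for example one per file.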

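help() output, as captured in the original post from an older pandas release (parameters such as join_axes and the Panel type no longer exist in current pandas):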

concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, copy=True)
    Concatenate pandas objects along a particular axis with optional set logic
    along the other axes.
    Parameters
    ----------
    objs : a sequence or mapping of Series, DataFrame, or Panel objects
        If a dict is passed, the sorted keys will be used as the `keys`
        argument, unless it is passed, in which case the values will be
        selected (see below). Any None objects will be dropped silently unless
        they are all None in which case a ValueError will be raised
    axis : {0/'index', 1/'columns'}, default 0
        The axis to concatenate along
    join : {'inner', 'outer'}, default 'outer'
        How to handle indexes on other axis(es)
    join_axes : list of Index objects
        Specific indexes to use for the other n - 1 axes instead of performing
        inner/outer set logic
    ignore_index : boolean, default False
        If True, do not use the index values along the concatenation axis. The
        resulting axis will be labeled 0, ..., n - 1. This is useful if you are
        concatenating objects where the concatenation axis does not have
        meaningful indexing information. Note the index values on the other
        axes are still respected in the join.
    keys : sequence, default None
        If multiple levels passed, should contain tuples. Construct
        hierarchical index using the passed keys as the outermost level
    levels : list of sequences, default None
        Specific levels (unique values) to use for constructing a
        MultiIndex. Otherwise they will be inferred from the keys
    names : list, default None
        Names for the levels in the resulting hierarchical index
    verify_integrity : boolean, default False
        Check whether the new concatenated axis contains duplicates. This can
        be very expensive relative to the actual data concatenation
    copy : boolean, default True
        If False, do not copy data unnecessarily
    Returns
    -------
    concatenated : object, type of objs
        When concatenating all ``Series`` along the index (axis=0), a
        ``Series`` is returned. When ``objs`` contains at least one
        ``DataFrame``, a ``DataFrame`` is returned. When concatenating along
        the columns (axis=1), a ``DataFrame`` is returned.
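
To make the join and verify_integrity parameters described above more concrete, here is a small sketch. The frames a and b and their columns are invented for illustration:

```python
import pandas as pd

# Two frames that share only the "id" column
a = pd.DataFrame({"id": [1, 2], "x": [10, 20]})
b = pd.DataFrame({"id": [3, 4], "y": [30, 40]})

# join='outer' (the default): union of columns, gaps filled with NaN
outer = pd.concat([a, b], ignore_index=True)

# join='inner': only the columns present in every input survive
inner = pd.concat([a, b], join="inner", ignore_index=True)

# verify_integrity=True raises ValueError when index labels would repeat;
# here both frames carry the labels 0 and 1
try:
    pd.concat([a, b], verify_integrity=True)
    duplicate_detected = False
except ValueError:
    duplicate_detected = True

print(outer)
print(inner)
print(duplicate_detected)
```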

merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)
    Merge DataFrame objects by performing a database-style join operation by
    columns or indexes.

    If joining columns on columns, the DataFrame indexes *will be
    ignored*. Otherwise if joining indexes on indexes or indexes on a column or
    columns, the index will be passed on.

    Parameters
    ----------
    left : DataFrame
    right : DataFrame
    how : {'left', 'right', 'outer', 'inner'}, default 'inner'
        * left: use only keys from left frame, similar to a SQL left outer join;
          preserve key order
        * right: use only keys from right frame, similar to a SQL right outer join;
          preserve key order
        * outer: use union of keys from both frames, similar to a SQL full outer
          join; sort keys lexicographically
        * inner: use intersection of keys from both frames, similar to a SQL inner
          join; preserve the order of the left keys
    on : label or list
        Field names to join on. Must be found in both DataFrames. If on is
        None and not merging on indexes, then it merges on the intersection of
        the columns by default.
    left_on : label or list, or array-like
        Field names to join on in left DataFrame. Can be a vector or list of
        vectors of the length of the DataFrame to use a particular vector as
        the join key instead of columns
    right_on : label or list, or array-like
        Field names to join on in right DataFrame or vector/list of vectors per
        left_on docs
    left_index : boolean, default False
        Use the index from the left DataFrame as the join key(s). If it is a
        MultiIndex, the number of keys in the other DataFrame (either the index
        or a number of columns) must match the number of levels
    right_index : boolean, default False
        Use the index from the right DataFrame as the join key. Same caveats as
        left_index
    sort : boolean, default False
        Sort the join keys lexicographically in the result DataFrame. If False,
        the order of the join keys depends on the join type (how keyword)
    suffixes : 2-length sequence (tuple, list, ...)
        Suffix to apply to overlapping column names in the left and right
        side, respectively
    copy : boolean, default True
        If False, do not copy data unnecessarily
    indicator : boolean or string, default False
        If True, adds a column to output DataFrame called "_merge" with
        information on the source of each row.
        If string, column with information on source of each row will be added to
        output DataFrame, and column will be named value of string.
        Information column is Categorical-type and takes on a value of "left_only"
        for observations whose merge key only appears in 'left' DataFrame,
        "right_only" for observations whose merge key only appears in 'right'
        DataFrame, and "both" if the observation's merge key is found in both.
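
When the key columns have different names in the two frames, left_on and right_on take the place of on. A sketch with invented frames (students, scores and their values are hypothetical), also using the validate argument from the signature to assert that keys are unique on both sides:

```python
import pandas as pd

students = pd.DataFrame({
    "subject_id": [1, 2, 3],
    "first_name": ["Alex", "Amy", "Allen"],
})
# Hypothetical second table whose key column is named differently
scores = pd.DataFrame({
    "student": [2, 3, 4],
    "test_id": [51, 15, 15],
})

# Left join: keep every student, attach a score where the keys match.
# validate="one_to_one" raises an error if either key column has duplicates.
merged = pd.merge(
    students,
    scores,
    how="left",
    left_on="subject_id",
    right_on="student",
    validate="one_to_one",
)
print(merged)
```

Both key columns appear in the result; rows without a match on the right get NaN in the right-hand columns.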

Original post: https://blog.csdn.net/weixin_40040404/article/details/80733134
