pandas.DataFrame.copy(deep=True) 实际上并不创建深拷贝

pandas.DataFrame.copy(deep=True) doesn#39;t actually create deep copy(pandas.DataFrame.copy(deep=True) 实际上并不创建深拷贝)
本文介绍了pandas.DataFrame.copy(deep=True) 实际上并不创建深拷贝的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我用 pd.Series 和 pd.DataFrame 试验了一段时间,遇到了一些奇怪的问题.假设我有以下 pd.DataFrame:

I've been experimenting for a while with pd.Series and pd.DataFrame and faced some strange problem. Let's say I have the following pd.DataFrame:

df = pd.DataFrame({'col':[[1,2,3]]})

请注意,此数据框包含包含列表的列.我想修改此数据框的副本并返回其修改后的版本,以便初始版本保持不变.为简单起见,假设我想在其单元格中添加整数4".

Notice, that this dataframe includes column containing list. I want to modify this dataframe's copy and return its modified version so that the initial one will remain unchanged. For the sake of simplicity, let's say I want to add integer '4' in its cell.

我试过以下代码:

def modify(df):
    dfc = df.copy(deep=True)
    dfc['col'].iloc[0].append(4)
    return dfc

modify(df)
print(df)

问题是,除了新的副本dfc之外,初始DataFramedf也被修改了.为什么?我应该怎么做才能防止初始数据帧被修改?我的熊猫版本是 0.25.0

The problem is that, besides the new copy dfc, the initial DataFrame df is also modified. Why? What should I do to prevent initial dataframes from modifying? My pandas version is 0.25.0

推荐答案

来自文档 这里,在注释部分:

From the docs here, in the Notes section:

当 deep=True 时,会复制数据,但不会递归复制实际的 Python 对象,只会复制对对象的引用.这与标准库中的 copy.deepcopy 不同,后者递归地复制对象数据(参见下面的示例).

When deep=True, data is copied but actual Python objects will not be copied recursively, only the reference to the object. This is in contrast to copy.deepcopy in the Standard Library, which recursively copies object data (see examples below).

这在 GitHub 上的 this 问题中再次被引用,其中 开发人员声明:

This is referenced again in this issue on GitHub, where the devs state that:

在a中嵌入可变对象.DataFrame 是一种反模式

embedding mutable objects inside a. DataFrame is an antipattern

所以这个函数按照开发者的意图工作 - 不应该将列表等可变对象嵌入到 DataFrames 中.

So this function is working as the devs intend - mutable objects such as lists should not be embedded in DataFrames.

我找不到让 copy.deepcopy 在 DataFrame 上按预期工作的方法,但我确实找到了使用 pickle:

I couldn't find a way to get copy.deepcopy to work as intended on a DataFrame, but I did find a fairly awful workaround using pickle:

import pandas as pd
import pickle

df = pd.DataFrame({'col':[[1,2,3]]})

def modify(df):
    dfc = pickle.loads(pickle.dumps(df))
    print(dfc['col'].iloc[0] is df['col'].iloc[0]) #Check if we've succeeded in deepcopying
    dfc['col'].iloc[0].append(4)
    print(dfc)
    return dfc

modify(df)
print(df)

输出:

False
            col
0  [1, 2, 3, 4]
         col
0  [1, 2, 3]

这篇关于pandas.DataFrame.copy(deep=True) 实际上并不创建深拷贝的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!

本站部分内容来源互联网,如果有图片或者内容侵犯您的权益请联系我们删除!

相关文档推荐

Leetcode 234: Palindrome LinkedList(Leetcode 234:回文链接列表)
How do I read an Excel file directly from Dropbox#39;s API using pandas.read_excel()?(如何使用PANDAS.READ_EXCEL()直接从Dropbox的API读取Excel文件?)
subprocess.Popen tries to write to nonexistent pipe(子进程。打开尝试写入不存在的管道)
I want to realize Popen-code from Windows to Linux:(我想实现从Windows到Linux的POpen-code:)
Reading stdout from a subprocess in real time(实时读取子进程中的标准输出)
How to call type safely on a random file in Python?(如何在Python中安全地调用随机文件上的类型?)