从大型元组/行列表中有效地构建 Pandas DataFrame

Efficiently construct Pandas DataFrame from large list of tuples/rows(从大型元组/行列表中有效地构建 Pandas DataFrame)
本文介绍了从大型元组/行列表中有效地构建 Pandas DataFrame的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我继承了一个以 Stata .dta 格式保存的数据文件.我可以使用 scikits.statsmodels genfromdta() 函数加载它.这会将我的数据放入一维 NumPy 数组中,其中每个条目是一行数据,存储在 24 元组中.

I've inherited a data file saved in the Stata .dta format. I can load it in with scikits.statsmodels genfromdta() function. This puts my data into a 1-dimensional NumPy array, where each entry is a row of data, stored in a 24-tuple.

In [2]: st_time = time.time(); initialload = sm.iolib.genfromdta("/home/myfile.dta"); ed_time = time.time(); print (ed_time - st_time)
666.523324013

In [3]: type(initialload)
Out[3]: numpy.ndarray

In [4]: initialload.shape
Out[4]: (4809584,)

In [5]: initialload[0]
Out[5]: (19901130.0, 289.0, 1990.0, 12.0, 19901231.0, 18.0, 40301000.0, 'GB', 18242.0, -2.368063, 1.0, 1.7783716290878204, 4379.355, 66.17669677734375, -999.0, -999.0, -0.60000002, -999.0, -999.0, -999.0, -999.0, -999.0, 0.2, 371.0)

我很好奇是否有一种有效的方法可以将其安排到 Pandas DataFrame 中.根据我的阅读,逐行构建 DataFrame 似乎效率很低……但我有什么选择?

I am curious if there's an efficient way to arrange this into a Pandas DataFrame. From what I've read, building up a DataFrame row-by-row seems quite inefficient... but what are my options?

我已经写了一个非常慢的第一次通过,它只是将每个元组作为单行 DataFrame 读取并附加它.只是想知道是否还有其他更好的方法.

I've written a pretty slow first-pass that just reads each tuple as a single-row DataFrame and appends it. Just wondering if anything else is known to be better.

推荐答案

pandas.DataFrame(initialload, columns=list_of_column_names)

这篇关于从大型元组/行列表中有效地构建 Pandas DataFrame的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!

本站部分内容来源互联网,如果有图片或者内容侵犯您的权益请联系我们删除!

相关文档推荐

Leetcode 234: Palindrome LinkedList(Leetcode 234:回文链接列表)
How do I read an Excel file directly from Dropbox#39;s API using pandas.read_excel()?(如何使用PANDAS.READ_EXCEL()直接从Dropbox的API读取Excel文件?)
subprocess.Popen tries to write to nonexistent pipe(子进程。打开尝试写入不存在的管道)
I want to realize Popen-code from Windows to Linux:(我想实现从Windows到Linux的POpen-code:)
Reading stdout from a subprocess in real time(实时读取子进程中的标准输出)
How to call type safely on a random file in Python?(如何在Python中安全地调用随机文件上的类型?)