Reading CSV file into Pandas from SFTP server via Paramiko fails with "'utf-8' codec can't decode byte ... in position ....: invalid start byte"

Problem description

I am trying to read a CSV file from an SFTP server into Pandas using Paramiko:

with sftp.open(path + file.filename) as fp:
    fp_aux = pd.read_csv(fp, sep='|')

But when I try, it throws the following error:

'utf-8' codec can't decode byte 0xa3 in position 73: invalid start byte

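I have tried different encodings, passing different values to the encoding parameter of pd.read_csv (unicode_escape, latin-1, latin1, latin, utf-8, ...). I also tried engine='python', but so far without success. Is there anything else I can try? If not, how can I ignore the error and continue with the next line or the next DataFrame?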

This happens only when I read the file from the SFTP server. If I read it from the local disk, it works fine.

Full traceback of the error:

UnicodeDecodeError                        Traceback (most recent call last)
pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._convert_with_dtype()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._string_convert()

pandas\_libs\parsers.pyx in pandas._libs.parsers._string_box_utf8()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa3 in position 83: invalid start byte

During handling of the above exception, another exception occurred:

UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-41-53537b824736> in <module>
      1 with sftp.open(r'/Debtopdcarich/Mandatory File/MandatoryFile_190721.csv') as fp:
----> 2     fp_aux = (pd.read_csv(fp, encoding='iso-8859-1', sep='|'))

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\parsers.py in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
    603     kwds.update(kwds_defaults)
    604 
--> 605     return _read(filepath_or_buffer, kwds)
    606 
    607 

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
    461 
    462     with parser:
--> 463         return parser.read(nrows)
    464 
    465 

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
   1050     def read(self, nrows=None):
   1051         nrows = validate_integer("nrows", nrows)
-> 1052         index, columns, col_dict = self._engine.read(nrows)
   1053 
   1054         if index is None:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
   2054     def read(self, nrows=None):
   2055         try:
-> 2056             data = self._reader.read(nrows)
   2057         except StopIteration:
   2058             if self._first_chunk:

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader.read()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._read_rows()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._convert_column_data()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._convert_with_dtype()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._string_convert()

pandas\_libs\parsers.pyx in pandas._libs.parsers._string_box_utf8()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa3 in position 83: invalid start byte

Recommended answer

It seems that Pandas gets confused by the API of the Paramiko file-like object. When it is given a Paramiko file-like object, it does not use its encoding parameter.
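
To make the symptom concrete, here is a minimal standard-library sketch of what happens to the byte named in the traceback. It assumes nothing about the actual file beyond that one byte: 0xa3 decodes cleanly as ISO-8859-1, while the UTF-8 codec rejects it with exactly the reason quoted in the error. So the traceback suggests the UTF-8 decoder was running even though encoding='iso-8859-1' had been passed.

```python
raw = b"\xa3"  # the byte reported in the traceback

# ISO-8859-1 maps every byte value to a character, so this decode cannot fail.
as_latin1 = raw.decode("iso-8859-1")

# UTF-8 does not allow 0xa3 as the first byte of a character.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc.reason

print(f"U+{ord(as_latin1):04X}")  # code point of the pound sign
print(utf8_error)
```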

A quick and dirty solution is to read the remote file into an in-memory file-like object and present that to Pandas. Then the encoding parameter is used.

from io import BytesIO

flo = BytesIO()
sftp.getfo(path + file.filename, flo)  # download the remote file into memory
flo.seek(0)  # rewind so Pandas reads from the start
pd.read_csv(flo, sep='|', encoding='iso-8859-1')
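
As for the fallback the question asks about (ignoring undecodable bytes and carrying on), an in-memory buffer can also be decoded leniently. The sketch below is only an illustration: the sample bytes are made up, and the standard csv module stands in for Pandas so that the snippet runs on its own. Handing the resulting text stream to pd.read_csv is an assumption, not something confirmed here. As far as I know, newer Pandas releases (1.3 and later) also accept an encoding_errors argument for the same purpose.

```python
import csv
import io

# Made-up sample bytes; 0xa3 on its own is not valid UTF-8.
flo = io.BytesIO(b"name|price\nwidget|\xa35\n")

# errors="replace" substitutes U+FFFD for each undecodable byte instead of raising.
text = io.TextIOWrapper(flo, encoding="utf-8", errors="replace", newline="")
rows = list(csv.reader(text, delimiter="|"))

print(len(rows))  # header row plus one data row
```

Replacing bytes loses information, so picking the right encoding (iso-8859-1 in this case) is preferable whenever it is known.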

A more efficient solution might be to build a wrapper class on top of the Paramiko file-like object, with an API that Pandas can work with.
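
One way such a wrapper might look, as a sketch only: adapt the remote file's read(size) method to io.RawIOBase, so that the standard io.BufferedReader and io.TextIOWrapper handle the buffering and decoding. RawReader, FakeRemoteFile and open_text are hypothetical names introduced for illustration. FakeRemoteFile merely stands in for the Paramiko file object, and the csv module stands in for Pandas so that the snippet runs with the standard library alone. Whether pd.read_csv handles the resulting text stream correctly for a real SFTP file is an assumption I have not verified.

```python
import csv
import io


class RawReader(io.RawIOBase):
    """Adapt any object with a bytes-returning read(size) to io.RawIOBase."""

    def __init__(self, source):
        self._source = source

    def readable(self):
        return True

    def readinto(self, buffer):
        # Ask the source for at most as many bytes as the buffer can hold.
        chunk = self._source.read(len(buffer))
        buffer[:len(chunk)] = chunk
        return len(chunk)  # 0 signals end of file


class FakeRemoteFile:
    """Hypothetical stand-in for a remote file object with read(size)."""

    def __init__(self, data):
        self._data = data
        self._pos = 0

    def read(self, size):
        chunk = self._data[self._pos:self._pos + size]
        self._pos += len(chunk)
        return chunk


def open_text(source, encoding):
    """Return a text stream that decodes the source incrementally."""
    buffered = io.BufferedReader(RawReader(source))
    return io.TextIOWrapper(buffered, encoding=encoding, newline="")


remote = FakeRemoteFile(b"name|price\nwidget|\xa35\n")  # made-up sample data
rows = list(csv.reader(open_text(remote, "iso-8859-1"), delimiter="|"))
print(len(rows))
```

Unlike the BytesIO approach, this never holds the whole file in memory as raw bytes, at the cost of more round trips to the server unless the source buffers its reads.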
