使用 Pandas 读取 CSV 时不断出现 UnicodeDecodeError 的错误

Question

13 浏览2023年5月22日

匿名的 2022年10月29日

0 Comments

这个问题已经有了答案：

错误UnicodeDecodeError：无法解码字节0xff（位置0）：无效的起始字节

我正在尝试使用Python读取csv文件，但总是出现以下错误。我尝试了之前在其他计算机上无问题工作的其他csv文件，但是它们也会给出相同的错误信息。我最近换了电脑，但同样奇怪的是，昨天我读取了另一个保存在相同网络位置的csv文件，没有任何问题。我不知道是什么原因导致的，但如果有人有任何想法，我希望能够加载我的以前的文件。

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
Input In [17], in 
      1 import pandas as pd
----> 3 df = pd.read_csv(r"C:\Users\nabecker\OneDrive - McDermott Will & Emery LLP\Documents\Parent Data for Analysis.csv")
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\util\_decorators.py:311, in deprecate_nonkeyword_arguments..decorate..wrapper(*args, **kwargs)
    305 if len(args) > num_allow_args:
    306     warnings.warn(
    307         msg.format(arguments=arguments),
    308         FutureWarning,
    309         stacklevel=stacklevel,
    310     )
--> 311 return func(*args, **kwargs)
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py:586, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
    571 kwds_defaults = _refine_defaults_read(
    572     dialect,
    573     delimiter,
   (...)
    582     defaults={"delimiter": ","},
    583 )
    584 kwds.update(kwds_defaults)
--> 586 return _read(filepath_or_buffer, kwds)
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py:482, in _read(filepath_or_buffer, kwds)
    479 _validate_names(kwds.get("names", None))
    481 # Create the parser.
--> 482 parser = TextFileReader(filepath_or_buffer, **kwds)
    484 if chunksize or iterator:
    485     return parser
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py:811, in TextFileReader.__init__(self, f, engine, **kwds)
    808 if "has_index_names" in kwds:
    809     self.options["has_index_names"] = kwds["has_index_names"]
--> 811 self._engine = self._make_engine(self.engine)
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py:1040, in TextFileReader._make_engine(self, engine)
   1036     raise ValueError(
   1037         f"Unknown engine: {engine} (valid options are {mapping.keys()})"
   1038     )
   1039 # error: Too many arguments for "ParserBase"
-> 1040 return mapping[engine](self.f, **self.options)
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py:69, in CParserWrapper.__init__(self, src, **kwds)
     67 kwds["dtype"] = ensure_dtype_objs(kwds.get("dtype", None))
     68 try:
---> 69     self._reader = parsers.TextReader(self.handles.handle, **kwds)
     70 except Exception:
     71     self.handles.close()
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\_libs\parsers.pyx:542, in pandas._libs.parsers.TextReader.__cinit__()
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\_libs\parsers.pyx:642, in pandas._libs.parsers.TextReader._get_header()
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\_libs\parsers.pyx:843, in pandas._libs.parsers.TextReader._tokenize_rows()
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\_libs\parsers.pyx:1917, in pandas._libs.parsers.raise_parser_error()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 95538: invalid continuation byte

admin 更改状态以发布 2023年5月22日

0

1 答案

匿名的 · Answer 1 · 2022-10-29T20:57:58+00:00

看来您把文件存储在OneDrive上了。

有时网络驱动器会更改文件编码。例如，每当我在Windows上保存我的文件到Dropbox时，我会遇到这种问题；某些内容会被更改，因此我必须小心地在Mac上使用它。

有几种处理这种编码问题的方法：

# Way 1. use "ISO-8859-1" (or "latin-1") encoding when you open the file
f = open('../Resources/' + filename, 'r', encoding="ISO-8859-1")

# Way 2. ignore error when you open the file
f = open('u.item', encoding='utf8', errors='ignore')

请注意，如果成功（没有异常）加载文件，则文件将被正确打开，所有字符也将清晰可见。