Pandas keeps raising UnicodeDecodeError when reading CSV files
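I am trying to read a CSV file with Python, but I keep getting the error below. I also tried other CSV files that used to load without problems on a different computer, and they produce the same error. I recently switched computers, but strangely enough, yesterday I read another CSV file saved in the same network location without any trouble. I don't know what is causing this; if anyone has any ideas, I would like to be able to load my earlier files again.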
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
Input In [17], in
      1 import pandas as pd
----> 3 df = pd.read_csv(r"C:\Users\nabecker\OneDrive - McDermott Will & Emery LLP\Documents\Parent Data for Analysis.csv")

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\util\_decorators.py:311, in deprecate_nonkeyword_arguments..decorate..wrapper(*args, **kwargs)
    305 if len(args) > num_allow_args:
    306     warnings.warn(
    307         msg.format(arguments=arguments),
    308         FutureWarning,
    309         stacklevel=stacklevel,
    310     )
--> 311 return func(*args, **kwargs)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py:586, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
    571 kwds_defaults = _refine_defaults_read(
    572     dialect,
    573     delimiter,
   (...)
    582     defaults={"delimiter": ","},
    583 )
    584 kwds.update(kwds_defaults)
--> 586 return _read(filepath_or_buffer, kwds)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py:482, in _read(filepath_or_buffer, kwds)
    479 _validate_names(kwds.get("names", None))
    481 # Create the parser.
--> 482 parser = TextFileReader(filepath_or_buffer, **kwds)
    484 if chunksize or iterator:
    485     return parser

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py:811, in TextFileReader.__init__(self, f, engine, **kwds)
    808 if "has_index_names" in kwds:
    809     self.options["has_index_names"] = kwds["has_index_names"]
--> 811 self._engine = self._make_engine(self.engine)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py:1040, in TextFileReader._make_engine(self, engine)
   1036     raise ValueError(
   1037         f"Unknown engine: {engine} (valid options are {mapping.keys()})"
   1038     )
   1039 # error: Too many arguments for "ParserBase"
-> 1040 return mapping[engine](self.f, **self.options)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py:69, in CParserWrapper.__init__(self, src, **kwds)
     67 kwds["dtype"] = ensure_dtype_objs(kwds.get("dtype", None))
     68 try:
---> 69     self._reader = parsers.TextReader(self.handles.handle, **kwds)
     70 except Exception:
     71     self.handles.close()

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\_libs\parsers.pyx:542, in pandas._libs.parsers.TextReader.__cinit__()

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\_libs\parsers.pyx:642, in pandas._libs.parsers.TextReader._get_header()

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\_libs\parsers.pyx:843, in pandas._libs.parsers.TextReader._tokenize_rows()

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\_libs\parsers.pyx:1917, in pandas._libs.parsers.raise_parser_error()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 95538: invalid continuation byte
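It looks like you are storing the file on OneDrive.
Network drives and sync services sometimes seem to change a file's encoding. For example, I run into this kind of problem whenever I save files to Dropbox on Windows; something in the content gets changed, so I have to be careful when I use those files on a Mac.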
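To see what the error message is pointing at, a quick check along these lines may help. It is only a sketch: the sample bytes are made up for illustration (the real file is not available here), and `show_bad_byte` is a hypothetical helper name. The byte 0xe4 from the traceback is not a complete character on its own in UTF-8, but it is the letter "ä" in ISO-8859-1 and Windows-1252, which hints that the file may have been saved in one of those encodings.

```python
def show_bad_byte(raw: bytes, context: int = 10) -> str:
    """Describe the first byte in `raw` that is not valid UTF-8.

    `raw` is the file content read in binary mode, for example
    open(path, "rb").read(). Hypothetical helper, for illustration only.
    """
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        start = max(exc.start - context, 0)
        snippet = raw[start:exc.start + context]
        # latin-1 maps every possible byte to a character, so this never raises.
        return (
            f"byte 0x{raw[exc.start]:02x} at position {exc.start}; "
            f"read as latin-1 the surrounding text is {snippet.decode('latin-1')!r}"
        )
    return "valid UTF-8"


# Made-up sample: text encoded as Latin-1, so "ä" is the single byte 0xe4.
sample = "name,city\nJäger,Berlin\n".encode("latin-1")
print(show_bad_byte(sample))
```

If the surrounding text reads sensibly as Latin-1 (a name with an umlaut, say), that is reasonable evidence for the encoding to pass when loading the file.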
There are a few ways to handle this kind of encoding problem:
# Way 1: use the "ISO-8859-1" (or "latin-1") encoding when you open the file
f = open('../Resources/' + filename, 'r', encoding="ISO-8859-1")

# Way 2: ignore decoding errors when you open the file
f = open('u.item', encoding='utf8', errors='ignore')
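Note that loading the file without an exception does not by itself mean every character is right: ISO-8859-1 can decode any byte sequence, and errors='ignore' silently drops the bytes it cannot decode. Once the file loads, check that accented and other non-ASCII characters display as expected.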
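Since the question uses pandas rather than `open`, the same two ideas can be passed to `read_csv`, whose signature in the traceback lists both `encoding` and `encoding_errors`. Below is a minimal sketch, assuming the file was written in Windows-1252 or ISO-8859-1 (a guess based on the 0xe4 byte). The sample data and the `read_csv_with_fallback` helper are made up for illustration and are not part of pandas.

```python
import io

import pandas as pd


def read_csv_with_fallback(source, encodings=("utf-8", "cp1252", "latin-1")):
    """Try each encoding in turn and return (DataFrame, encoding used).

    Hypothetical helper; `source` is a file path or a bytes object.
    """
    last_error = None
    for enc in encodings:
        # Wrap raw bytes in a fresh buffer for every attempt.
        target = io.BytesIO(source) if isinstance(source, bytes) else source
        try:
            return pd.read_csv(target, encoding=enc), enc
        except UnicodeDecodeError as exc:
            last_error = exc
    raise last_error


# Made-up sample standing in for the real file: Latin-1 bytes containing 0xe4.
sample = "name,city\nJäger,Berlin\n".encode("latin-1")

df, used = read_csv_with_fallback(sample)
print(used)
print(df)

# Counterpart of "Way 2": keep UTF-8 but drop bytes that cannot be decoded.
# The undecodable byte is discarded, so the affected character is lost.
lossy = pd.read_csv(io.BytesIO(sample), encoding="utf-8", encoding_errors="ignore")
print(lossy)
```

"latin-1" is placed last because it accepts any input and therefore never fails, even when it is the wrong choice. For the file in the question, the direct call would be `pd.read_csv(path, encoding="cp1252")`, with `path` standing for the original file path.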