How to convert empty strings to null in pandas

Question

12 views · May 6, 2023

Anonymous · May 7, 2023

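I've been using the pandas library in Python and pulled some column data from an external system that contains a lot of padding spaces. I figured I would use str.strip() to remove the spaces from each Series, like this: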

Data["DESCRIPTION"] = Data["DESCRIPTION"].str.strip()

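It mostly does its job, but I run into an issue when I check the DataFrame's properties (the output below is in the format printed by Data.info()): if a value contained only spaces and no text, it ends up as an empty string, but that scalar is not converted to null: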


RangeIndex: 18028 entries, 0 to 18027
Data columns (total 11 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   VIN             18028 non-null  object
 1   DESCRIPTION     18028 non-null  object
 2   DESCRIPTION 2   18028 non-null  object
 3   ENGINE          18023 non-null  object
 4   TRANSMISSION    18028 non-null  object
 5   PAINT           18028 non-null  object
 6   EXT_COLOR_CODE  18028 non-null  object
 7   EXT_COLOR_DESC  18028 non-null  object
 8   INT_COLOR_DESC  18028 non-null  object
 9   COUNTRY         18028 non-null  object
 10  PROD_DATE       18028 non-null  object
dtypes: object(11)
memory usage: 1.5+ MB

However, when I check whether the strings are empty:

Data['DESCRIPTION 2'] == ""
    0        True
    1        True
    2        True
    3        True
    4        True
             ... 
    18023    True
    18024    True
    18025    True
    18026    True
    18027    True
    Name: DESCRIPTION 2, Length: 18028, dtype: bool

How can I convert all of these to null so that I can drop them with dropna()?

Any suggestions would be greatly appreciated.

1 Answer

Anonymous · Answer 1 · 2023-09-21T16:55:22+00:00

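The problem is that the data contains records that are empty strings or consist only of whitespace. pandas counts an empty string as a non-null value, so these have to be replaced with NaN explicitly. The fix is to use str.strip() to remove whitespace from both ends of each string, then use replace() to turn the empty strings into NaN.

Concretely: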

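import numpy as np  # needed for np.nan

# Assign the result back; replace() returns a new Series rather than modifying the column in place
Data["DESCRIPTION"] = Data["DESCRIPTION"].str.strip().replace(r'^\s*$', np.nan, regex=True)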
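To see the whole round trip, here is a minimal sketch on a small made-up DataFrame. The name df and the sample values are hypothetical and only stand in for the asker's Data. It trims one column, converts the blanks to NaN, and then drops the affected rows with dropna():

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the asker's DataFrame.
df = pd.DataFrame({
    "VIN": ["A1", "B2", "C3", "D4"],
    "DESCRIPTION": ["  sedan ", "   ", "", "coupe"],
})

# Trim surrounding whitespace, then turn empty or whitespace-only strings into NaN.
df["DESCRIPTION"] = (
    df["DESCRIPTION"].str.strip().replace(r"^\s*$", np.nan, regex=True)
)

# The blanks are now counted as missing, so dropna() can remove those rows.
cleaned = df.dropna(subset=["DESCRIPTION"])
print(cleaned)
```

Passing subset= limits the check to the named column, so rows are only dropped when DESCRIPTION itself is missing.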

For more on this problem and its solutions, see: Replacing blank values (white space) with NaN in pandas
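If several columns have the same problem, the same regex can probably be applied to the whole DataFrame at once with DataFrame.replace(). The sketch below assumes a small hypothetical frame (df and its values are made up for illustration). Unlike str.strip(), this variant only nulls out blank cells and leaves padding around non-blank values untouched:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; several columns contain blank or whitespace-only cells.
df = pd.DataFrame({
    "VIN": ["A1", "B2", "C3"],
    "DESCRIPTION": ["sedan", "   ", "coupe"],
    "ENGINE": ["V6", "I4", ""],
})

# Replace every cell that is empty or whitespace-only with NaN, across all columns.
blank_as_nan = df.replace(r"^\s*$", np.nan, regex=True)

# Count the missing values per column, then keep only fully populated rows.
missing_per_column = blank_as_nan.isna().sum()
complete_rows = blank_as_nan.dropna()
print(missing_per_column)
print(complete_rows)
```

Checking isna().sum() before calling dropna() is a cheap way to confirm how many rows are about to be removed.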