PySpark OR method exception


I'm trying to modify the values of a column in a PySpark DataFrame as follows:

df_cleaned = df_cleaned.withColumn('brand_c', when(df_cleaned['brand'] == "samsung" |\
                                                   df_cleaned['brand'] == "oppo", df_cleaned.brand)\
                                   .otherwise('others'))

This raises the following exception:

Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/column.py", line 115, in _
    njc = getattr(self._jc, name)(jc)
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 332, in get_return_value
    format(target_id, ".", name, value))
py4j.protocol.Py4JError: An error occurred while calling o435.or. Trace:
py4j.Py4JException: Method or([class java.lang.String]) does not exist
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
    at py4j.Gateway.invoke(Gateway.java:274)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)

admin changed status to publish May 24, 2023

You're just missing some parentheses. Try:

df_cleaned = df_cleaned.withColumn('brand_c',
                                   when((df_cleaned['brand'] == "samsung") |
                                        (df_cleaned['brand'] == "oppo"), df_cleaned['brand'])
                                   .otherwise('others'))

Always wrap each comparison in its own parentheses when combining them with `|` or `&` in PySpark. Python's bitwise operators bind more tightly than `==`, so without parentheses `df_cleaned['brand'] == "samsung" | df_cleaned['brand']` is grouped as `df_cleaned['brand'] == ("samsung" | df_cleaned['brand'])`, and trying to `or` a Column with a plain string is exactly what produces the `Method or([class java.lang.String]) does not exist` error.
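The precedence trap is plain Python, not Spark. A minimal sketch with integers (no Spark needed, values chosen just for illustration) shows how `|` groups before `==`:

```python
# Python evaluates the bitwise | before the == comparison.
unparenthesized = (2 == 2 | 1)        # parsed as 2 == (2 | 1), i.e. 2 == 3
parenthesized = (2 == 2) | (1 == 1)   # parentheses force each == first

print(unparenthesized)  # False
print(parenthesized)    # True
```

With ints the misgrouping merely gives a surprising boolean; with PySpark Columns it tries to call the JVM `or` method with a string argument, hence the Py4J exception.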
