How To Fix Spark Error - "org.apache.spark.sql.AnalysisException: resolved attribute(s)"



In this post, we will see how to fix the Spark error "org.apache.spark.sql.AnalysisException: resolved attribute(s)".

When applying a UDF to dataframe columns in Spark, you may sometimes get the error below -


Exception in thread "main" org.apache.spark.sql.AnalysisException: 
resolved attribute(s) xxxx#yy missing from



You may also see errors like -


Reference ‘xxx’ is ambiguous

These errors most commonly occur after joins, aggregations and similar operations on dataframe(s), particularly when columns in the two dataframes share the same AttributeReference (for example, when one dataframe was derived from the other). Spark then cannot tell which dataframe a column belongs to. The basic fix is to remove the ambiguity, either by renaming the shared columns or by giving them fresh references. The approaches below may help solve the issue -

Approach 1:

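  • If you reuse the same dataframe references, the column names can become ambiguous. One approach is to clone the dataframe. In Java, where cloneDataset is a helper you write yourself (it is not a built-in Spark method) -

final Dataset<Row> join = cloneDataset(df1.join(df2, columns));

OR, in PySpark (toDF takes the column names as separate arguments, hence the *) -

df1_cloned = df1.toDF(*column_names)
df1_cloned.join(df2, ['column_names_to_join'])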
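Since cloneDataset is not shown in this post, here is a minimal PySpark sketch of the same idea. The helper name clone_dataframe is hypothetical, and the sketch assumes that rebuilding the dataframe with toDF is enough to give its columns fresh references in your Spark version -

```python
def clone_dataframe(df):
    """Return a copy of df with the same column names, rebuilt via toDF.

    Hypothetical helper: df is expected to be a PySpark DataFrame.
    """
    # toDF takes the column names as separate arguments, hence the *
    return df.toDF(*df.columns)


# Usage (df1, df2 and the join column name are placeholders):
# df1_cloned = clone_dataframe(df1)
# joined = df1_cloned.join(df2, ['column_names_to_join'])
```

The column names stay the same, so the rest of your code does not need to change.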

 

Approach 2:

  • When you join two dataframes that have more than one key column with the same name, try joining on an explicit list of those columns. Joining on a list of names also keeps a single copy of each join column in the result.

df1.join(df2, ['col1', 'col2', 'col3', 'col4'])

 

Approach 3:

  • Let's say you have dataframe-1 (df1)
  • You then derive dataframe-2 (df2) from dataframe-1
  • Since df2 was derived from df1, the common columns have the same name(s)
  • If you need to join df1 and df2, then
    • Rename the columns that are common to both dataframes

df2_modified = df2.withColumnRenamed('col1', 'col1_renamed').withColumnRenamed('col2', 'col2_renamed')

    • With the column ambiguity handled, join df1 with df2_modified instead of df2. Give the join a condition on the renamed column (here we assume col1 is the join key), otherwise you get a cross join

df1.join(df2_modified, df1['col1'] == df2_modified['col1_renamed'])
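If many columns are shared, chaining withColumnRenamed by hand gets tedious. A minimal sketch that loops over the shared names is shown below. The helper name rename_common_columns and the "_renamed" suffix are assumptions made for illustration; only df.columns and withColumnRenamed come from PySpark -

```python
def rename_common_columns(df, other_columns, suffix="_renamed"):
    """Rename every column of df whose name also appears in other_columns.

    Hypothetical helper: df is expected to be a PySpark DataFrame and
    other_columns a list of names, e.g. df1.columns.
    """
    shared = set(other_columns)
    # df.columns is a plain list of column names in PySpark
    for name in df.columns:
        if name in shared:
            df = df.withColumnRenamed(name, name + suffix)
    return df


# Usage (df1 and df2 are placeholders, col1 is assumed to be the join key):
# df2_modified = rename_common_columns(df2, df1.columns)
# joined = df1.join(df2_modified, df1['col1'] == df2_modified['col1_renamed'])
```

Columns that exist only in df2 keep their original names.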

 

Approach 4:

  • You can also use alias, as shown below, to remove the column ambiguity. Here we assume that col1 is the column causing the ambiguity.

import pyspark.sql.functions as Func

df1_modified = df1.select(Func.col("col1").alias("col1_renamed"))

  • Now use the df1_modified dataframe in the join instead of df1. Note that select keeps only the columns you list, so include any other columns you need.
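To alias the ambiguous columns while keeping all the others, a small helper along these lines may be handy. This is only a sketch: the name alias_columns and the "_renamed" suffix are assumptions, and it uses df[name] instead of Func.col so that the block has no import of its own -

```python
def alias_columns(df, columns_to_alias, suffix="_renamed"):
    """Return df with the given columns aliased and every other column kept.

    Hypothetical helper: df is expected to be a PySpark DataFrame.
    """
    to_alias = set(columns_to_alias)
    selected = [
        # alias only the ambiguous columns, pass the rest through unchanged
        df[name].alias(name + suffix) if name in to_alias else df[name]
        for name in df.columns
    ]
    return df.select(*selected)


# Usage (df1 is a placeholder, col1 is the ambiguous column):
# df1_modified = alias_columns(df1, ['col1'])
```

The column order of the original dataframe is preserved, with only the listed columns renamed.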
Hope this helps.


