How To Fix Spark Error - "Package does not Exist"



In this post, we will see how to fix the Spark error "package does not exist". The missing package can be any library that the Spark project needs, e.g. the Spark Streaming library, the Spark SQL library, etc. You might see one or more of the errors below in the terminal (a sketch of the imports that typically trigger them follows the list) -


/Spark_RDD.java:4: error: package org.apache.spark.api.java does not exist


package org.apache.spark.mllib.fpm does not exist

package org.apache.spark.streaming does not exist

package org.apache.spark.sql does not exist
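
Each of these messages means the Java compiler cannot resolve an import because the artifact that provides that package is not on the compile classpath. As a hedged illustration (a hypothetical reconstruction of the Spark_RDD.java mentioned in the error above, not code from any particular project), the sketch below maps each failing package to a typical import and the Maven artifact that provides it:

// Hypothetical Spark_RDD.java - each import compiles only when the
// corresponding Spark artifact is on the compile classpath.
import org.apache.spark.api.java.JavaSparkContext;               // needs spark-core
import org.apache.spark.streaming.api.java.JavaStreamingContext; // needs spark-streaming
import org.apache.spark.sql.SQLContext;                          // needs spark-sql
import org.apache.spark.mllib.fpm.FPGrowth;                      // needs spark-mllib

public class Spark_RDD {
    public static void main(String[] args) {
        // If any artifact above is missing from the build, javac reports
        // "package ... does not exist" for the matching import.
    }
}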

This error mostly occurs due to a missing dependency. You might have to add the corresponding libraries to your dependency list or pom.xml (if using Maven). To fix it, cross-check the points below as applicable to your case; a minimal verification sketch follows the list.

  • In case of the org.apache.spark.streaming (or org.apache.spark.streaming.api.java) error, verify that the spark-streaming dependency is added and available on the project path. The placeholders aa.bb and xx.yy.zz below must match the Scala version and the Spark version you are using in the project. See the sample pom.xml at the end of this post.

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming_aa.bb</artifactId> <!-- matching Scala version -->
  <version>xx.yy.zz</version> <!-- matching Spark Core version -->
</dependency>

  • For the org.apache.spark.mllib.fpm error, ensure the spark-mllib dependency is added. See the sample pom.xml at the end of this post.

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-mllib_aa.bb</artifactId> <!-- matching Scala version -->
  <version>xx.yy.zz</version> <!-- matching Spark Core version -->
</dependency>

  • For the org.apache.spark.sql error, ensure the spark-sql dependency is added. See the sample pom.xml at the end of this post.

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_aa.bb</artifactId> <!-- matching Scala version -->
  <version>xx.yy.zz</version> <!-- matching Spark Core version -->
</dependency>
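
Once the right artifacts are declared, make sure the Scala suffix and the Spark version line up; with the sample pom.xml below, for instance, aa.bb becomes 2.10 and xx.yy.zz becomes 1.3.1. A minimal class like the following hypothetical sketch (class and app names are just placeholders) should then compile and run without any "package does not exist" errors:

// Hypothetical DependencyCheck.java - assumes spark-core and spark-sql resolve.
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SQLContext;

public class DependencyCheck {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("dependency-check").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc); // resolves only if spark-sql is present
        System.out.println("Spark is on the classpath, version: " + sc.version());
        sc.stop();
    }
}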

Sample pom.xml


<project>
  <groupId>com.myproject.app</groupId>
  <artifactId>java</artifactId>
  <modelVersion>4.0.0</modelVersion>
  <name>examples</name>
  <packaging>jar</packaging>
  <version>0.0.1</version>
  <repositories>
    <repository>
      <id>Akka repository</id>
      <url>http://repo.akka.io/releases</url>
    </repository>
    <repository>
      <id>scala-tools</id>
      <url>https://oss.sonatype.org/content/groups/scala-tools</url>
    </repository>
    <repository>
      <id>apache</id>
      <url>https://repository.apache.org/content/repositories/releases</url>
    </repository>
    <repository>
      <id>twitter</id>
      <url>http://maven.twttr.com/</url>
    </repository>
    <repository>
      <id>central2</id>
      <url>http://central.maven.org/maven2/</url>
    </repository>
  </repositories>
  <dependencies>
    <dependency> <!-- Spark dependency -->
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.3.1</version>
      <scope>provided</scope>
    </dependency>
    <dependency> <!-- Spark dependency -->
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.10</artifactId>
      <version>1.3.1</version>
      <scope>provided</scope>
    </dependency>
    <dependency> <!-- Spark dependency -->
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-hive_2.10</artifactId>
      <version>1.3.1</version>
      <scope>provided</scope>
    </dependency>
    <dependency> <!-- Spark dependency -->
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.10</artifactId>
      <version>1.3.1</version>
    </dependency>
    <dependency> <!-- Spark dependency -->
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming-kafka_2.10</artifactId>
      <version>1.3.1</version>
    </dependency>
    <dependency> <!-- Spark dependency -->
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-mllib_2.10</artifactId>
      <version>1.3.1</version>
    </dependency>
    <dependency> <!-- Cassandra -->
      <groupId>com.datastax.spark</groupId>
      <artifactId>spark-cassandra-connector</artifactId>
      <version>1.0.0-rc5</version>
    </dependency>
    <dependency> <!-- Cassandra -->
      <groupId>com.datastax.spark</groupId>
      <artifactId>spark-cassandra-connector-java</artifactId>
      <version>1.0.0-rc5</version>
    </dependency>
    <dependency> <!-- Elastic search connector -->
      <groupId>org.elasticsearch</groupId>
      <artifactId>elasticsearch-hadoop-mr</artifactId>
      <version>2.0.0.RC1</version>
    </dependency>
    <dependency> <!-- Jetty demo -->
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>jetty-client</artifactId>
      <version>8.1.14.v20131031</version>
    </dependency>
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-databind</artifactId>
      <version>2.3.3</version>
    </dependency>
    <dependency>
      <groupId>org.apache.commons</groupId>
      <artifactId>commons-lang3</artifactId>
      <version>3.0</version>
    </dependency>
    <dependency>
      <groupId>net.sf.opencsv</groupId>
      <artifactId>opencsv</artifactId>
      <version>2.0</version>
    </dependency>
    <dependency>
      <groupId>org.scalatest</groupId>
      <artifactId>scalatest_${scala.binary.version}</artifactId>
      <version>2.2.1</version>
    </dependency>
  </dependencies>
  <properties>
    <java.version>1.7</java.version>
    <scala.binary.version>2.10</scala.binary.version> <!-- referenced by the scalatest artifactId above -->
  </properties>
  <build>
    <pluginManagement>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-compiler-plugin</artifactId>
          <version>3.1</version>
          <configuration>
            <source>${java.version}</source>
            <target>${java.version}</target>
          </configuration>
        </plugin>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-assembly-plugin</artifactId>
          <version>2.2.2</version>
          <!-- The configuration of the plugin -->
          <configuration>
            <!-- Specifies the configuration file of the assembly plugin -->
            <descriptors>
              <descriptor>src/main/assembly/assembly.xml</descriptor>
            </descriptors>
          </configuration>
        </plugin>
      </plugins>
    </pluginManagement>
  </build>
</project>
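
With a pom.xml like this in place, rebuild the project (for example with mvn clean package, or by re-importing the Maven project in your IDE) so the artifacts are downloaded and added to the compile classpath; the "package does not exist" errors should then go away. Note that spark-core, spark-sql and spark-hive are declared with provided scope above, so they are available at compile time but are expected to be supplied by the Spark installation at run time (e.g. when the job is launched through spark-submit).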


  Hope this is useful.  
