




How To Set Up Apache Spark & PySpark on Windows 10



This post explains how to set up Apache Spark and PySpark on Windows 10. We will also look at some of the common errors people face during the set-up. Please follow the steps below in order, and hopefully it will work for you -

1. Create and Verify The Folders:

Create the folders below in the C drive. You can use any other drive, but for this post I am using the C drive for the set-up.

1.1. For Spark - C:\Spark
1.2. For Hadoop - C:\Hadoop\bin
1.3. For Java - Check where your Java JDK is installed. If Java is not already installed, install it from the Oracle website (https://java.com/en/download/help/windows_manual_download.xml). Java version 8 has worked fine without any issues so far, so try that. Assuming Java is installed, note down the Java JDK path. Typically it looks like C:\Program Files\Java\jdk1.8.0_191, but it may differ based on the folder you chose. Whatever it is, note the path down.

We will need all three of the above folder paths in the next steps.
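If you prefer to do this from a command prompt, the folders can be created and the Java install verified in a few commands; a minimal sketch, assuming the example JDK path above (adjust to whatever your installer chose):

    mkdir C:\Spark
    mkdir C:\Hadoop\bin

    REM Verify Java is installed and visible on the PATH
    java -version

    REM List the installed JDKs to find your exact JDK folder name
    dir "C:\Program Files\Java"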

2. Downloads:

Download the following -

> Download Spark from - https://spark.apache.org/downloads.html

Extract the files and place them in C:\Spark. e.g. I downloaded Spark version 2.2.1; once extracted, it looks something like this - C:\Spark\spark-2.2.1-bin-hadoop2.7

> Download winutils.exe from - https://github.com/steveloughran/winutils/blob/master/hadoop-2.7.1/bin/winutils.exe

Copy the winutils.exe file to C:\Hadoop\bin
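To double-check that the downloads landed in the right places, you can list both folders; winutils.exe should appear in the second listing:

    dir C:\Spark\spark-2.2.1-bin-hadoop2.7
    dir C:\Hadoop\bin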

 

3. Environment Variable Set-up:

Let's set up the environment variables now. Open the Environment Variables window, and create each new variable (or edit it if it already exists). Based on the folders I have chosen, I need to add the following environment variables -

SPARK_HOME - C:\Spark\spark-2.2.1-bin-hadoop2.7
HADOOP_HOME - C:\Hadoop
JAVA_HOME - C:\Program Files\Java\jdk1.8.0_191

These values are as per my folder structure, so please try to keep the same folder structure. For my case, it looks like below once the environment variables are set up -

[Screenshot: environment variables]
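If you prefer the command line over the GUI dialog, the built-in setx command persists user-level environment variables; a sketch using my folder layout (open a fresh command prompt afterwards so the new values are picked up):

    setx SPARK_HOME "C:\Spark\spark-2.2.1-bin-hadoop2.7"
    setx HADOOP_HOME "C:\Hadoop"
    setx JAVA_HOME "C:\Program Files\Java\jdk1.8.0_191"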

4. Run Spark:

If you have done the above steps correctly, you are ready to start Spark. In most cases, though, issues happen because the folder names are not correctly set in the environment variables, so double-check all the above steps and make sure everything is fine.

> Open a Windows command prompt or PowerShell. Both are fine.
> Note the Spark bin folder path - C:\Spark\spark-2.2.1-bin-hadoop2.7\bin
> Type in - cd C:\Spark\spark-2.2.1-bin-hadoop2.7\bin
> Type - ls (in PowerShell; use dir in the command prompt)
It should show you all the Spark executable files.
> Type in - spark-shell

You will see a screen like below. This confirms that Spark is running fine now.

[Screenshot: spark-shell]
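Once the scala> prompt appears, a one-line sanity check confirms the session works; counting a small generated range should return 100 without errors (spark is the SparkSession object that spark-shell creates for you):

    spark.range(100).count()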

5. PySpark:

So if you correctly reached this point, your Spark environment is ready on Windows. But for PySpark you will also need to install Python - choose Python 3. Install Python and make sure it is also added to the Windows PATH variable. Once done, follow all the steps from section 4, but execute "pyspark" instead, as shown below.

[Screenshot: pyspark]
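The pyspark shell also creates a spark session for you, so the same one-line sanity check works at the >>> prompt and should again return 100:

    spark.range(100).count()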

6. Next Steps:

As a next step, you can also run Spark jobs using spark-submit.
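For example, the Spark distribution ships with sample jobs that you can launch through spark-submit. A sketch using the bundled SparkPi example (the exact jar file name depends on your Spark build, so check the examples\jars folder first):

    spark-submit --class org.apache.spark.examples.SparkPi %SPARK_HOME%\examples\jars\spark-examples_2.11-2.2.1.jar 10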

7. Common Error:

The most common error is - "The system cannot find the path specified". It happens when the environment variables and PATH are not correctly set up. If you follow all the steps above correctly, this error should not appear. If you still face an issue, do let me know in the comments.
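A quick way to diagnose this error is to echo each variable from a fresh command prompt and confirm the paths really exist:

    echo %SPARK_HOME%
    echo %HADOOP_HOME%
    echo %JAVA_HOME%

    REM Each of these should list files rather than fail
    dir "%SPARK_HOME%\bin"
    dir "%JAVA_HOME%\bin"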

If you liked this post, you can check out my other posts -
