In this post, I will explain the Spark-Submit command line arguments (options). We will touch upon the important arguments used in the spark-submit command, and at the end, I will collate all these arguments and show a complete spark-submit command using them. This post is written considering the Spark 2.x version.
**--class**: The entry point of your application, i.e. the main class of a Java / Scala application, e.g.

**--class org.com.sparkProject.examples.MyApp**

**--name**: A name for your application, e.g.

**--name SparkApp**

**--master**: The master URL for the cluster, e.g. **spark://10.21.195.82:7077** for a Spark standalone cluster, **yarn** for YARN, or **local** to run locally (local mode does not run any external Resource Manager like Mesos or YARN). **local[8]** means run locally on 8 cores, and **local[\*]** means use all the available cores.

**--deploy-mode**: Whether to deploy your driver on the worker nodes (**cluster**) or locally as an external client (**client**) (default: **client**). To understand the difference between Cluster & Client deployments, read this post.

**--conf**: Arbitrary Spark configuration properties in key=value format, e.g.

**--conf "spark.eventLog.enabled=true"**

**--executor-memory**: Memory to allocate to each executor, e.g.

**--executor-memory 2G**

**--driver-memory**: Memory to allocate to the driver, e.g.

**--driver-memory 3G**

**--num-executors**: Number of executors to launch, e.g.

**--num-executors 12**

**application-jar**: Path to the bundled jar containing your application and all its dependencies. The URL must be globally visible inside your cluster, for instance an **hdfs://** path or a **file://** path that is present on all nodes, e.g.

**/AA/BB/target/spark-project-1.0-SNAPSHOT.jar**

**application-arguments**: Arguments passed to the main method of your main class, if any. They follow the application jar, e.g.

**/project/spark-project-1.0-SNAPSHOT.jar input1.txt input2.txt**

**--jars**: Comma-separated list of local jars to include on the driver and executor classpaths (note: no spaces after the commas), e.g.

**--jars cassandra-connector.jar,some-other-package-1.jar,some-other-package-2.jar**

**--py-files**: Comma-separated list of .zip, .egg, or .py files to place on the PYTHONPATH for Python applications, e.g.

**--py-files dependency\_files/egg.egg**

Note: additional points for PySpark jobs are covered further below.
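As a quick sanity check, the properties you pass on the command line can be read back from inside the application. Below is a minimal PySpark sketch; the property keys shown are the standard ones that --name, --executor-memory, and --conf map to:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # reuses the session configured by spark-submit
conf = spark.sparkContext.getConf()  # holds the properties set on the command line
print(conf.get("spark.app.name", "not set"))  # e.g. "SparkApp" from --name
print(conf.get("spark.executor.memory", "not set"))  # e.g. "2g" from --executor-memory
print(conf.get("spark.eventLog.enabled", "not set"))  # e.g. "true" from --conf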
**./bin/spark-submit \\**
**--class <main-class> \\**
**--master <master-url> \\**
**--deploy-mode <deploy-mode> \\**
**--conf <key>=<value> \\**
**... # other options**
**<application-jar> \\**
**[application-arguments] # <--- here our app arguments**
Putting it all together, here is a complete spark-submit command for a Java / Scala application on a YARN cluster:

**export HADOOP\_CONF\_DIR=XXX**
**./bin/spark-submit \\**
**--class org.com.sparkProject.examples.MyApp \\**
**--master yarn \\**
**--deploy-mode cluster \\**
**--executor-memory 20G \\**
**--num-executors 50 \\**
**--conf "spark.eventLog.enabled=true" \\**
**--jars cassandra-connector.jar,some-other-package-1.jar,some-other-package-2.jar \\**
**/project/spark-project-1.0-SNAPSHOT.jar input1.txt input2.txt # arguments to the program**
And a complete example for a PySpark job:

**./bin/spark-submit \\**
**--master yarn \\**
**--deploy-mode cluster \\**
**--executor-memory 5G \\**
**--executor-cores 8 \\**
**--py-files dependency\_files/egg.egg \\**
**--archives dependencies.tar.gz \\**
**mainPythonCode.py value1 value2 # the main Python file, followed by the arguments (value1, value2) passed to the program**
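For completeness, here is a minimal sketch of what mainPythonCode.py could look like; mymodule and its some_data() helper are hypothetical names, assumed to be packaged inside dependency\_files/egg.egg, which --py-files places on the job's PYTHONPATH:

from pyspark.sql import SparkSession
import mymodule  # hypothetical module, importable because egg.egg was shipped via --py-files

spark = SparkSession.builder.appName("MainPythonCode").getOrCreate()
rdd = spark.sparkContext.parallelize(mymodule.some_data())  # hypothetical helper from the egg
print(rdd.count())
spark.stop()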
Here is an example of how the arguments passed (value1, value2) can be handled inside the program:
import sys

n = int(sys.argv[1])  # leading count of values, i.e. invoked as: mainPythonCode.py 2 value1 value2
argspassed = []
for i in range(n):
    argspassed.append(sys.argv[i + 2])  # the values themselves start at index 2
print(argspassed)
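If you prefer not to pass a leading count, the standard-library argparse module can collect all the positional values directly; a small alternative sketch:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("values", nargs="*")  # collect every positional argument
args = parser.parse_args()
print(args.values)  # e.g. ['value1', 'value2'] for the submission above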
Run the job locally on 2 cores:

**./bin/spark-submit \\**
**--class org.com.sparkProject.examples.MyApp \\**
**--master local[2] \\**
**/project/spark-project-1.0-SNAPSHOT.jar input.txt**
Run on a Spark standalone cluster in client deploy mode:

**./bin/spark-submit \\**
**--class org.com.sparkProject.examples.MyApp \\**
**--master spark://<IP\_Address:Port\_No> \\**
**/project/spark-project-1.0-SNAPSHOT.jar input.txt**
Run on a Spark standalone cluster in cluster deploy mode:

**./bin/spark-submit \\**
**--class org.com.sparkProject.examples.MyApp \\**
**--master spark://<IP\_Address:Port\_No> \\**
**--deploy-mode cluster \\**
**/project/spark-project-1.0-SNAPSHOT.jar input.txt**
Run on a YARN cluster in cluster deploy mode:

**export HADOOP\_CONF\_DIR=XXX**
**./bin/spark-submit \\**
**--class org.com.sparkProject.examples.MyApp \\**
**--master yarn \\**
**--deploy-mode cluster \\**
**--executor-memory 5G \\**
**--num-executors 10 \\**
**/project/spark-project-1.0-SNAPSHOT.jar input.txt**
Run on a Kubernetes cluster in cluster deploy mode:

**export HADOOP\_CONF\_DIR=XXX**
**./bin/spark-submit \\**
**--class org.com.sparkProject.examples.MyApp \\**
**--master k8s://<IP\_Address>:443 \\**
**--deploy-mode cluster \\**
**--executor-memory 5G \\**
**--num-executors 10 \\**
**/project/spark-project-1.0-SNAPSHOT.jar input.txt**
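Whichever of these variants you use, the application can confirm at runtime which master and deploy mode it actually picked up; a small sketch:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext
print(sc.master)  # e.g. "yarn", "local[2]" or "spark://<IP_Address:Port_No>"
print(sc.getConf().get("spark.submit.deployMode", "client"))  # "client" or "cluster"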