Monday, August 10, 2015

Running Spark on YARN


                                           


There are two deploy modes that can be used to launch Spark applications on YARN. 
In yarn-cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. 
In yarn-client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.
Unlike in Spark standalone mode, in which the master’s address is specified in the --master parameter, in YARN mode the ResourceManager’s address is picked up from the Hadoop configuration.
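
To make the difference between the two deploy modes concrete, here is a minimal sketch that only prints the spark-submit command line for each mode; it does not launch anything. The paths are the ones that appear in the test run later in this post, and the submit_cmd helper is a made-up name for illustration, so adjust both for your own install.

```shell
# Sketch only: print (do not run) the spark-submit command for each YARN deploy mode.
# SPARK_HOME and EXAMPLES_JAR default to the paths seen in this post's test run;
# they are assumptions about your install.
SPARK_HOME="${SPARK_HOME:-/opt/mapr/spark/spark-1.2.1}"
EXAMPLES_JAR="${EXAMPLES_JAR:-$SPARK_HOME/lib/spark-examples-1.2.1-hadoop2.5.1-mapr-1501.jar}"

# submit_cmd is a hypothetical helper, not part of Spark or MapR.
submit_cmd() {
  case "$1" in
    yarn-cluster|yarn-client) ;;   # the two deploy modes described above
    *) echo "unknown deploy mode: $1" >&2; return 1 ;;
  esac
  echo "$SPARK_HOME/bin/spark-submit --master $1 --class org.apache.spark.examples.SparkPi $EXAMPLES_JAR"
}

submit_cmd yarn-cluster   # driver runs inside the application master
submit_cmd yarn-client    # driver runs in the client process
```

Note that neither command line carries a ResourceManager address; only the mode name is passed to --master.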


Spark is distributed as two separate packages:
  • mapr-spark
  • mapr-spark-historyserver for Spark History Server (optional)


Note: This assumes that all the core MapR packages and the YARN packages are already installed, and that Warden is stopped on the cluster, before you install and configure Spark on the nodes.

1)  On one node, install both the Spark and the Spark History Server packages (first command below). On the remaining nodes, install only the Spark package (second command).

yum install mapr-spark mapr-spark-historyserver -y
yum install mapr-spark -y
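
If you script the install across several nodes, the choice of packages per node can be sketched as below. The role names "history" and "worker" and the packages_for_role helper are made up for this example; only the package names come from this post. The loop prints the yum command instead of running it, so you can review it first.

```shell
# Hypothetical helper: print the packages a node should get, by role.
# "history" = the one node that also hosts the Spark History Server,
# "worker"  = every other node. These role names are assumptions.
packages_for_role() {
  case "$1" in
    history) echo "mapr-spark mapr-spark-historyserver" ;;
    worker)  echo "mapr-spark" ;;
    *) echo "usage: packages_for_role history|worker" >&2; return 1 ;;
  esac
}

# Print the install command for each role rather than executing it.
for role in history worker; do
  echo "yum install -y $(packages_for_role "$role")"
done
```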


2)  Run the configure.sh command with -R, which reuses the CLDB and ZooKeeper settings already configured on the node:

[root@yarn1 ~]# /opt/mapr/server/configure.sh -R
Configuring Hadoop-2.5.1 at /opt/mapr/hadoop/hadoop-2.5.1
Done configuring Hadoop
Node setup configuration:  cldb fileserver historyserver nodemanager spark-historyserver webserver zookeeper
Log can be found at:  /opt/mapr/logs/configure.log

That's all. Spark is preconfigured for YARN and does not require any additional configuration to run. To test the installation, run the following commands:

[root@yarn1 ~]# su - mapr

[mapr@yarn1 ~]$ MASTER=yarn-cluster /opt/mapr/spark/spark-1.2.1/bin/run-example org.apache.spark.examples.SparkPi
Spark assembly has been built with Hive, including Datanucleus jars on classpath
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/mapr/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/mapr/spark/spark-1.2.1/lib/spark-assembly-1.2.1-hadoop2.5.1-mapr-1501.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/08/09 21:18:17 INFO client.RMProxy: Connecting to ResourceManager at /10.10.70.118:8032
15/08/09 21:18:18 INFO yarn.Client: Requesting a new application from cluster with 2 NodeManagers
15/08/09 21:18:18 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
15/08/09 21:18:18 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
15/08/09 21:18:18 INFO yarn.Client: Setting up container launch context for our AM
15/08/09 21:18:18 INFO yarn.Client: Preparing resources for our AM container
15/08/09 21:18:18 INFO yarn.Client: Uploading resource file:///opt/mapr/spark/spark-1.2.1/lib/spark-assembly-1.2.1-hadoop2.5.1-mapr-1501.jar -> maprfs:/user/mapr/.sparkStaging/application_1439183740763_0002/spark-assembly-1.2.1-hadoop2.5.1-mapr-1501.jar
15/08/09 21:18:23 INFO yarn.Client: Uploading resource file:/opt/mapr/spark/spark-1.2.1/lib/spark-examples-1.2.1-hadoop2.5.1-mapr-1501.jar -> maprfs:/user/mapr/.sparkStaging/application_1439183740763_0002/spark-examples-1.2.1-hadoop2.5.1-mapr-1501.jar
15/08/09 21:18:29 INFO yarn.Client: Setting up the launch environment for our AM container
15/08/09 21:18:29 INFO spark.SecurityManager: Changing view acls to: mapr
15/08/09 21:18:29 INFO spark.SecurityManager: Changing modify acls to: mapr
15/08/09 21:18:29 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(mapr); users with modify permissions: Set(mapr)
15/08/09 21:18:29 INFO yarn.Client: Submitting application 2 to ResourceManager
15/08/09 21:18:29 INFO security.ExternalTokenManagerFactory: Initialized external token manager class - com.mapr.hadoop.yarn.security.MapRTicketManager
15/08/09 21:18:29 INFO impl.YarnClientImpl: Submitted application application_1439183740763_0002
15/08/09 21:18:30 INFO yarn.Client: Application report for application_1439183740763_0002 (state: ACCEPTED)
15/08/09 21:18:30 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.mapr
start time: 1439183909584
final status: UNDEFINED
tracking URL: http://yarn2:8088/proxy/application_1439183740763_0002/
user: mapr
15/08/09 21:18:31 INFO yarn.Client: Application report for application_1439183740763_0002 (state: ACCEPTED)
15/08/09 21:18:32 INFO yarn.Client: Application report for application_1439183740763_0002 (state: ACCEPTED)
15/08/09 21:18:40 INFO yarn.Client: Application report for application_1439183740763_0002 (state: RUNNING)
15/08/09 21:18:40 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: yarn1
ApplicationMaster RPC port: 0
queue: root.mapr
start time: 1439183909584
final status: UNDEFINED
tracking URL: http://yarn2:8088/proxy/application_1439183740763_0002/
user: mapr
15/08/09 21:18:41 INFO yarn.Client: Application report for application_1439183740763_0002 (state: RUNNING)
15/08/09 21:18:42 INFO yarn.Client: Application report for application_1439183740763_0002 (state: RUNNING)
15/08/09 21:18:43 INFO yarn.Client: Application report for application_1439183740763_0002 (state: RUNNING)
15/08/09 21:18:44 INFO yarn.Client: Application report for application_1439183740763_0002 (state: RUNNING)
15/08/09 21:18:45 INFO yarn.Client: Application report for application_1439183740763_0002 (state: RUNNING)
15/08/09 21:18:55 INFO yarn.Client: Application report for application_1439183740763_0002 (state: RUNNING)
15/08/09 21:18:56 INFO yarn.Client: Application report for application_1439183740763_0002 (state: FINISHED)
15/08/09 21:18:56 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: yarn1
ApplicationMaster RPC port: 0
queue: root.mapr
start time: 1439183909584
final status: SUCCEEDED
tracking URL: http://yarn2:8088/proxy/application_1439183740763_0002/history/application_1439183740763_0002
user: mapr
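
The line to look for in the output above is the last "final status:" entry, which should read SUCCEEDED. As a rough sketch, assuming you captured the client output to a file or a pipe, a small filter can pull that value out. The final_status function is a made-up helper, and the sample input is copied from the run above.

```shell
# Hypothetical helper: read yarn.Client output on stdin and print the value
# of the last "final status:" line (leading whitespace is tolerated).
final_status() {
  sed -n 's/^[[:space:]]*final status:[[:space:]]*//p' | tail -n 1
}

# Sample input: the two distinct "final status" values from the run above.
printf '%s\n' 'final status: UNDEFINED' 'final status: SUCCEEDED' | final_status
# prints: SUCCEEDED
```

For a real run you might redirect both stdout and stderr of the run-example command to a file and then call final_status on that file.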
