Thursday, July 20, 2017

Distcp Across Secure MapR clusters




This post assumes you have two secure MapR clusters up and running.


Note: For this post I have just one node in each cluster, but where needed I will mention which files are required on all nodes.

Source Cluster  - Node node106rhel72/10.10.70.106
Destination Cluster  - Node node107rhel72/10.10.70.107  

1)  i) On all nodes in the SOURCE CLUSTER, verify that maprserverticket, cldb.key, ssl_truststore, and ssl_keystore are identical. Run md5sum on these files on each node to confirm.
ii) Make sure the destination cluster details are added to the "mapr-clusters.conf" file.


[root@node106rhel72 ~]# cat /opt/mapr/conf/mapr-clusters.conf
Container-cluster secure=true 10.10.70.106:7222
Container-cluster2 secure=true 10.10.70.107:7222
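
To confirm the entry without reading the file by eye, a small check can parse it. This is a minimal sketch that assumes the `<name> secure=true <cldb-host:port>` layout shown above; the function name `cluster_is_secure` is hypothetical and not part of MapR.

```shell
# Hypothetical helper: succeeds if cluster $2 is listed in conf file $1
# and one of its fields is exactly "secure=true".
cluster_is_secure() {
  local conf=$1 name=$2
  awk -v n="$name" '
    $1 == n { for (i = 2; i <= NF; i++) if ($i == "secure=true") found = 1 }
    END { exit !found }
  ' "$conf"
}

# Example use on the source cluster:
#   cluster_is_secure /opt/mapr/conf/mapr-clusters.conf Container-cluster2 && echo "entry present"
```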

2) On all nodes in the DESTINATION CLUSTER, verify that maprserverticket, cldb.key, ssl_truststore, and ssl_keystore are identical. Run md5sum on these files on each node to confirm.
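
On a multi-node cluster the md5sum comparison in steps 1 and 2 can be scripted. The sketch below only does the comparison: it reads md5sum output lines and reports whether all hashes agree. The function name `checksums_match` and the node names in the usage comment are placeholders, and the ssh loop assumes passwordless ssh between nodes.

```shell
# Reads md5sum output lines ("<hash>  <path>") on stdin and succeeds
# only if at least one line was read and every hash is identical.
checksums_match() {
  awk 'NR == 1 { first = $1 }
       $1 != first { bad = 1 }
       END { exit (NR == 0 || bad) }'
}

# Example use (placeholder node names):
#   for n in nodeA nodeB; do ssh "$n" md5sum /opt/mapr/conf/cldb.key; done | checksums_match \
#     && echo "cldb.key is identical on all nodes"
```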
3)  i) Copy /opt/mapr/conf/ssl_truststore from the DESTINATION CLUSTER to the CLDB node of the SOURCE CLUSTER under /tmp/

[root@node107rhel72 conf]# scp  /opt/mapr/conf/ssl_truststore  10.10.70.106:/tmp/
root@10.10.70.106's password: 
ssl_truststore                                                                                                                         100%  798     0.8KB/s   00:00    
[root@node107rhel72 conf]#

ii) Now run the commands below to merge the copied ssl_truststore into the one on the SOURCE CLUSTER.

Note: Skip this merge step if you have already done it earlier.
$ chmod 644 /opt/mapr/conf/ssl_truststore
$ /opt/mapr/server/manageSSLKeys.sh merge /tmp/ssl_truststore /opt/mapr/conf/ssl_truststore
$ chmod 444 /opt/mapr/conf/ssl_truststore

4) Copy the merged truststore file '/opt/mapr/conf/ssl_truststore' to all the nodes in the SOURCE CLUSTER under /opt/mapr/conf/
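
With more than one node, the copy can be looped. This is a rough sketch rather than a MapR tool: `push_to_nodes` and the `DRY_RUN` variable are hypothetical names, and the node names in the usage comment are placeholders. It copies a file to the same path on each node with scp, or only prints the commands when `DRY_RUN` is set.

```shell
# Hypothetical helper: copy one file to the same path on each listed node.
# With DRY_RUN set to a non-empty value, the scp commands are printed
# instead of executed.
push_to_nodes() {
  local file=$1 node
  shift
  for node in "$@"; do
    ${DRY_RUN:+echo} scp -p "$file" "$node:$file"
  done
}

# Example use (placeholder node names), previewing first:
#   DRY_RUN=1 push_to_nodes /opt/mapr/conf/ssl_truststore nodeA nodeB
#   push_to_nodes /opt/mapr/conf/ssl_truststore nodeA nodeB
```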

5) Generate a cross-cluster ticket on the DESTINATION CLUSTER for the user who will run distcp (the user mapr in our case). In this case I created a ticket that lasts 10 years.

$ maprlogin generateticket -type crosscluster -out /tmp/destination-ticket -duration 3650:0:0 

Note: It is critical to specify an appropriate value for the duration. After the ticket expires, communication between the clusters will stop. In this example, a duration of ten years is used for convenience of explanation. Use a value that is consistent with your security policies.
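
If your policy is expressed in days, the duration string can be built rather than typed. This sketch assumes the value is read as days:hours:minutes, which is what 3650:0:0 for ten years implies; `days_to_duration` is a hypothetical helper name.

```shell
# Hypothetical helper: format a whole number of days as <days>:0:0,
# mirroring the 3650:0:0 value used for ten years above.
days_to_duration() {
  case $1 in
    ''|*[!0-9]*) echo "usage: days_to_duration <days>" >&2; return 1 ;;
  esac
  printf '%s:0:0\n' "$1"
}

# Example use for a 90-day ticket:
#   maprlogin generateticket -type crosscluster -out /tmp/destination-ticket \
#     -duration "$(days_to_duration 90)"
```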

6) Copy the file /tmp/destination-ticket from the DESTINATION CLUSTER to the SOURCE CLUSTER's CLDB node under /tmp.


scp /tmp/destination-ticket  10.10.70.106:/tmp/

7) On the SOURCE CLUSTER, append the contents of /tmp/destination-ticket to /opt/mapr/conf/maprserverticket.

$ cat /tmp/destination-ticket >> /opt/mapr/conf/maprserverticket
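
Running the append twice would leave a duplicate entry in maprserverticket. A more defensive variant is sketched below, assuming the generated ticket file holds a single line; `append_ticket_once` is a hypothetical helper that backs up the target and appends only if that exact line is not already present.

```shell
# Hypothetical helper: append the first line of ticket file $1 to $2
# unless that exact line is already there. A copy of $2 is saved as
# $2.bak before the file is modified.
append_ticket_once() {
  local src=$1 dst=$2 line
  IFS= read -r line < "$src" || [ -n "$line" ] || return 1
  if grep -qxF -e "$line" "$dst"; then
    return 0   # ticket was appended earlier; nothing to do
  fi
  cp -p "$dst" "$dst.bak" && printf '%s\n' "$line" >> "$dst"
}

# Example use:
#   append_ticket_once /tmp/destination-ticket /opt/mapr/conf/maprserverticket
```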


8) Copy the file /opt/mapr/conf/maprserverticket to all the nodes in the SOURCE CLUSTER.

9) Stop Warden and ZooKeeper in the SOURCE CLUSTER, then start ZooKeeper and, once ZooKeeper is up, start Warden.
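
On a single node the restart order can be written down as follows. This is a sketch under assumptions: it takes mapr-warden and mapr-zookeeper to be the service names on the node, `restart_mapr_services` and `DRY_RUN` are hypothetical names, and it does not itself wait for ZooKeeper to come up, so check that before letting Warden start on a real cluster.

```shell
# Sketch of the restart order for one node. With DRY_RUN set to a
# non-empty value, the commands are printed instead of executed.
# Assumes the services are named mapr-warden and mapr-zookeeper.
# Note: this does not verify that ZooKeeper is up before starting Warden.
restart_mapr_services() {
  local run=${DRY_RUN:+echo}
  $run service mapr-warden stop &&
  $run service mapr-zookeeper stop &&
  $run service mapr-zookeeper start &&
  $run service mapr-warden start
}

# Example use, previewing the commands first:
#   DRY_RUN=1 restart_mapr_services
```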

10) On the SOURCE CLUSTER, create user tickets for user mapr for both the source and destination clusters.

maprlogin password
maprlogin password -cluster Dest

cat /tmp/maprticket_2000
Source KV34qQ0jtmQXObJglDiZqqHHm507pbYOsHd4qIEEavC+0PGDlB/YeTBGReOxf+EleSEO78pYvNqzoqK5uK+5Gibx0v+XPEyl2UuDgBR6GUBwx4yUUxnUY7Ct4STdcHmvcyE47AVM4gXc9ivQCvkokyIvZwYiGtwVQ8rnTNrLuzuUPAH8GMbR486UgMQ8axy8QIcA2zexIT0K0Ct7Fj612UPVonXZDfnAB2yG5gEhdmxLOMPmQLm9qt6f49Pzrn96IwHGLXQtUAmfrTwrbPPPOSUshA==
Dest 4D9Z469Y3j7h3sy2CVZwQrlXDEWHCtmCENQQGFvVzoGsytXp4K3OLOf+BZhLIoTBZuu2uzmV/1SbnqYUfO9NXsxAx3Bomez9iZ3ni7Kfk9m9CTEPydl9updp8IFQZ83jQ7IERM3WgN/rouEg3T/BnwPA2+U2cnGjeeCgXH3lmopJGiYFCegXWhhn9TmKawH0Vp4f3tDBBo2nWjr1sCnBvsBXhYP6DQzA3vLdmbGWQn6d2IJRNUA0irG8MSjxzZ4E9y4S2hu4gnLYE0IXgXNoWWhawQ==


Validation :


1) Created a test file and pushed in source cluster
[root@node106rhel72 ~]# vi abi
[mapr@node106rhel72 ~]# hadoop fs -put abi /
[mapr@node106rhel72 ~]# hadoop fs -ls /abi
Found 7 items
-rwxr-xr-x   3 root root        266 2017-07-20 19:16 /abi

2) Now distcp across clusters and preserve the permissions. 

[root@node106rhel72 ~]# hadoop distcp -p /mapr/Container-cluster/abi /mapr/Container-cluster2/tmp/
17/07/20 19:16:59 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[/mapr/Container-cluster/abi], targetPath=/mapr/Container-cluster2/tmp, targetPathExists=true, preserveRawXattrs=false}
17/07/20 19:16:59 INFO client.MapRZKBasedRMFailoverProxyProvider: Updated RM address to node106rhel72/10.10.70.106:8032
17/07/20 19:17:00 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
17/07/20 19:17:00 INFO Configuration.deprecation: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
17/07/20 19:17:00 INFO client.MapRZKBasedRMFailoverProxyProvider: Updated RM address to node106rhel72/10.10.70.106:8032
17/07/20 19:17:00 INFO mapreduce.JobSubmitter: number of splits:1
17/07/20 19:17:00 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1500592302342_0009
17/07/20 19:17:01 INFO security.ExternalTokenManagerFactory: Initialized external token manager class - com.mapr.hadoop.yarn.security.MapRTicketManager
17/07/20 19:17:01 INFO impl.YarnClientImpl: Submitted application application_1500592302342_0009
17/07/20 19:17:01 INFO mapreduce.Job: The url to track the job: https://node106rhel72:8090/proxy/application_1500592302342_0009/
17/07/20 19:17:01 INFO tools.DistCp: DistCp job-id: job_1500592302342_0009
17/07/20 19:17:01 INFO mapreduce.Job: Running job: job_1500592302342_0009
17/07/20 19:17:09 INFO mapreduce.Job: Job job_1500592302342_0009 running in uber mode : false
17/07/20 19:17:09 INFO mapreduce.Job:  map 0% reduce 0%
17/07/20 19:17:14 INFO mapreduce.Job:  map 100% reduce 0%
17/07/20 19:17:14 INFO mapreduce.Job: Job job_1500592302342_0009 completed successfully
17/07/20 19:17:14 INFO mapreduce.Job: Counters: 34
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=100285
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
MAPRFS: Number of bytes read=631
MAPRFS: Number of bytes written=266
MAPRFS: Number of read operations=33
MAPRFS: Number of large read operations=0
MAPRFS: Number of write operations=1
Job Counters 
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=2800
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=2800
Total vcore-seconds taken by all map tasks=2800
Total megabyte-seconds taken by all map tasks=2867200
DISK_MILLIS_MAPS=1400
Map-Reduce Framework
Map input records=1
Map output records=0
Input split bytes=144
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=0
CPU time spent (ms)=440
Physical memory (bytes) snapshot=271511552
Virtual memory (bytes) snapshot=2987470848
Total committed heap usage (bytes)=904396800
File Input Format Counters 
Bytes Read=221
File Output Format Counters 
Bytes Written=0
org.apache.hadoop.tools.mapred.CopyMapper$Counter
BYTESCOPIED=266
BYTESEXPECTED=266
COPY=1

3) Validated that the file exists on the destination cluster.

[root@node106rhel72 ~]# hadoop fs -ls /mapr/Container-cluster2/tmp/abi
-rwxr-xr-x   3 root root        266 2017-07-20 19:16 /mapr/Container-cluster2/tmp/abi
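
Since the copy used -p, the permissions, owner, group, and size should match on both sides. A small sketch to compare them from the listing output follows; `ls_attrs` is a hypothetical helper, and it assumes the eight-column `hadoop fs -ls` line format shown above.

```shell
# Hypothetical helper: print permissions, owner, group and size from
# "hadoop fs -ls" file lines read on stdin. Header lines such as
# "Found N items" have fewer than 8 fields and are skipped.
ls_attrs() {
  awk 'NF >= 8 { print $1, $3, $4, $5 }'
}

# Example use on the source cluster:
#   [ "$(hadoop fs -ls /abi | ls_attrs)" = \
#     "$(hadoop fs -ls /mapr/Container-cluster2/tmp/abi | ls_attrs)" ] && echo "attributes match"
```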
