FileMigrate
This blog assumes you already have a secure MapR 5.2 cluster up and running, and that your requirement is to continuously scan a specific location in MapR-FS for new files and move them to an AWS S3 bucket according to the policies you set up.
Note: I am using a single-node cluster for this blog, but in a real deployment you would typically install the FileMigrate service on a node that runs fewer other services.
1) Stop warden
service mapr-warden stop
2) Install the FileMigrate service and run configure.sh.
rpm -ivh mapr-filemigrate-1.0.0.201704071106-1.x86_64.rpm
/opt/mapr/server/configure.sh -R
3) Now start warden.
service mapr-warden start
When the RPM is installed, the file below is installed along with it; it is how MCS tracks this service as part of the pluggable services framework.
[root@node9 conf]# cat /opt/mapr/filemigrate/filemigrate-1.0.0/conf/warden.filemigrate.conf
services=filemigrate:1:cldb
service.displayname=FileMigrate
service.command.start=/opt/mapr/filemigrate/filemigrate-1.0.0/bin/mapr-filemigrate.sh start
service.command.stop=/opt/mapr/filemigrate/filemigrate-1.0.0/bin/mapr-filemigrate.sh stop
service.command.monitorcommand=/opt/mapr/filemigrate/filemigrate-1.0.0/bin/mapr-filemigrate.sh status
service.command.type=BACKGROUND
service.ui.port=9444
service.uri=/api/login
service.baseservice=0
service.logs.location=/opt/mapr/filemigrate/filemigrate-1.0.0/logs/filemigrate.log
service.process.type=BINARY
service.alarm.tersename=nafmsd
service.alarm.label=FileMigrateServerDown
#The items here need to be set consistently to accurately reflect memory available for the service.
#Default is 100MB of memory.
service.env=JAVA_HEAP=100m,JETTY_PORT=9444
service.heapsize.min=100
service.heapsize.max=100
service.heapsize.percent=2
4) Once the cluster and the FileMigrate service are up, we can see them running.
[root@node9 conf]# jps
15829 WardenMain
16321 Jps
16660 CLDB
22119 ResourceManager
21463 CommandServer
22324 FileMigrateApplication
29649 QuorumPeerMain
27632 NodeManager
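Besides jps, you can also confirm that Warden sees the new pluggable service. A quick check, assuming the node hostname is node9 as in this setup:
maprcli service list -node node9 | grep -i filemigrate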
5) By default, MapR only trusts its own self-signed certificates. To configure MapR to trust the certificates used by AWS S3 for HTTPS uploads, you need to add additional trusted certificates to the /opt/mapr/conf/ssl_truststore file on every node in the cluster. As of this writing, the root certificate used by AWS S3 is the Baltimore CyberTrust root certificate provided by DigiCert.
Warning: The Baltimore CyberTrust root certificate expires in 2025, and expired certificates cannot be used to connect to AWS S3. When Amazon replaces its certificates with ones issued by new certificate authorities, update the truststore to hold both the old and new root certificates for a smooth transition.
Download the certificate:
[root@node9 tmp]# wget https://www.digicert.com/CACerts/BaltimoreCyberTrustRoot.crt
--2017-04-11 18:19:11-- https://www.digicert.com/CACerts/BaltimoreCyberTrustRoot.crt
Resolving www.digicert.com... 64.78.193.234
Connecting to www.digicert.com|64.78.193.234|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 891 [application/x-x509-ca-cert]
Saving to: “BaltimoreCyberTrustRoot.crt”
100%[=========================================================================================================================================>] 891 --.-K/s in 0s
2017-04-11 18:19:20 (175 MB/s) - “BaltimoreCyberTrustRoot.crt” saved [891/891]
[root@node9 conf]# cd /opt/mapr/conf
Run the following command to add the certificate. Enter the keystore password when prompted; the default is mapr123.
[root@node9 conf]# keytool -importcert -file /tmp/BaltimoreCyberTrustRoot.crt -keystore ssl_truststore
Enter keystore password:
Owner: CN=Baltimore CyberTrust Root, OU=CyberTrust, O=Baltimore, C=IE
Issuer: CN=Baltimore CyberTrust Root, OU=CyberTrust, O=Baltimore, C=IE
Serial number: 20000b9
Valid from: Fri May 12 11:46:00 PDT 2000 until: Mon May 12 16:59:00 PDT 2025
Certificate fingerprints:
MD5: AC:B6:94:A5:9C:17:E0:D7:91:52:9B:B1:97:06:A6:E4
SHA1: D4:DE:20:D0:5E:66:FC:53:FE:1A:50:88:2C:78:DB:28:52:CA:E4:74
SHA256: 16:AF:57:A9:F6:76:B0:AB:12:60:95:AA:5E:BA:DE:F2:2A:B3:11:19:D6:44:AC:95:CD:4B:93:DB:F3:F2:6A:EB
Signature algorithm name: SHA1withRSA
Version: 3
Extensions:
#1: ObjectId: 2.5.29.19 Criticality=true
BasicConstraints:[
CA:true
PathLen:3
]
#2: ObjectId: 2.5.29.15 Criticality=true
KeyUsage [
Key_CertSign
Crl_Sign
]
#3: ObjectId: 2.5.29.14 Criticality=false
SubjectKeyIdentifier [
KeyIdentifier [
0000: E5 9D 59 30 82 47 58 CC AC FA 08 54 36 86 7B 3A ..Y0.GX....T6..:
0010: B5 04 4D F0 ..M.
]
]
Trust this certificate? [no]: y
Certificate was added to keystore   <-- means the certificate was added successfully.
[root@node9 conf]#
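Optionally, you can double-check that the certificate really landed in the truststore; a quick sketch (enter the keystore password, default mapr123, when prompted):
keytool -list -v -keystore /opt/mapr/conf/ssl_truststore | grep -i baltimore
The grep should return the Owner/Issuer lines of the Baltimore CyberTrust Root entry.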
Note: In a multi-node cluster, copy the ssl_truststore file to the same location (/opt/mapr/conf/) on all other MapR nodes.
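A minimal sketch of that copy, assuming hypothetical additional hostnames node10 and node11 and passwordless SSH as root:
for host in node10 node11; do
  scp /opt/mapr/conf/ssl_truststore root@${host}:/opt/mapr/conf/ssl_truststore
done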
6) Now restart the FileMigrate service:
maprcli node services -name filemigrate -nodes 10.10.70.109 -action restart
Now log in to the service UI directly:
https://10.10.70.109:9444 (user: mapr, password: <mapr user password>)
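If you want to confirm the UI port is reachable before opening a browser, a simple check from the command line (it only prints the HTTP status code; -k skips certificate validation since the UI presents the cluster's own certificate):
curl -sk -o /dev/null -w "%{http_code}\n" https://10.10.70.109:9444/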
7) Now that the service is up, the next step is to configure FileMigrate so it can connect to AWS with your access/secret keys and other settings, followed by a restart so that files can be copied from the MapR cluster to AWS.
Adding properties is straightforward and can be done via the command line or the UI.
a) Command line: copy the edited FileMigrate.properties file to the /var/mapr/filemigrate/ directory on MapR-FS (see the sketch after this list).
b) UI: click Settings in the top right corner and fill out the required information.
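A minimal sketch of the command-line route, assuming you edit a local copy of the file and push it back to MapR-FS; the exact property names for the AWS access/secret keys and other settings are described in the MapR File Migration documentation and are not reproduced here:
# pull the current properties file (if present), edit it locally, then push it back
hadoop fs -get /var/mapr/filemigrate/FileMigrate.properties /tmp/FileMigrate.properties
vi /tmp/FileMigrate.properties
hadoop fs -put -f /tmp/FileMigrate.properties /var/mapr/filemigrate/FileMigrate.properties
# restart the service so it picks up the new settings
maprcli node services -name filemigrate -nodes 10.10.70.109 -action restart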
8) Now, to start moving files to S3:
i. Add a new policy by clicking "New File Migration Policy". Alternatively, select Policy from the dropdown menu.
ii. Set the following in the Add Data Migration Policy page and click OK.
(Required) Directory Path
(Required) Target Bucket
Purge Interval
Delete Empty Directories
Ignore Files Regex
X-Attributes
Test I: To test data migration, I created 4 files under the "srcvol" mount point, as shown below (a sketch of the commands used to create such files follows the listing).
[root@node9 ~]# hadoop fs -ls /srcvol
Found 5 items
-rwxr-xr-x 3 root root 0 2017-04-11 19:33 /srcvol/a
-rwxr-xr-x 3 root root 0 2017-04-11 19:33 /srcvol/b
-rwxr-xr-x 3 root root 0 2017-04-11 19:33 /srcvol/c
-rwxr-xr-x 3 root root 1359 2017-04-11 19:32 /srcvol/filemigrate.out
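For reference, empty test files like a, b, and c can be created with touchz, and a non-empty local file can be copied in with put (the paths below match this test; exactly how the original files were produced does not matter):
hadoop fs -touchz /srcvol/a /srcvol/b /srcvol/c
hadoop fs -put filemigrate.out /srcvol/filemigrate.out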
From the "filemigrate.log" we can see when "com.mapr.filemigrate.FileMigrateServer" starts the scan it finds this new 4 files and now uploads the files to S3 .
2017-04-11 19:48:03,648 INFO com.mapr.filemigrate.FileMigrateServer [FileMigrateServer:mainthread]: Starting new file scan...
2017-04-11 19:48:03,648 INFO com.mapr.filemigrate.ScanDirectoryTree [FileMigrateServer:mainthread]: Starting incremental scan of /srcvol
2017-04-11 19:48:03,651 INFO com.mapr.filemigrate.FileMigrateServer [FileMigrateServer:mainthread]: Scan for new files completed after 0.00 seconds.
2017-04-11 19:48:03,651 INFO com.mapr.filemigrate.FileMigrateServer [FileMigrateServer:mainthread]: Pausing for 60.00 seconds.
2017-04-11 19:48:21,663 INFO com.mapr.filemigrate.S3UploadManager [S3UploadManager:monitorActiveUploads]: Stats summary: UploadStats [activeUploadsCounter=0, waitingUploadsCounter=4, bytesUploadedLastHour=0, uploadsLastHour=0]
2017-04-11 19:48:30,658 INFO com.mapr.filemigrate.S3UploadManager [S3UploadManager:lookForWork]: starting upload for maprfs:///srcvol/filemigrate.out
2017-04-11 19:48:36,664 INFO com.mapr.filemigrate.S3UploadManager [S3UploadManager:clearCompletedUploads]: upload completed successfully for ActiveUpload [path=maprfs:///srcvol/filemigrate.out, state=Completed, bucket=filemigratertest, enqueueTime=Tue Apr 11 19:33:03 PDT 2017, uploadStartTime=Tue Apr 11 19:48:30 PDT 2017, uploadCompleteTime=Tue Apr 11 19:48:35 PDT 2017, size=1359, modificationTime=Tue Apr 11 19:32:57 PDT 2017, key=srcvol/filemigrate.out, errors=3, percentSent=100.0, lastStateUpdate=Tue Apr 11 19:48:35 PDT 2017]
2017-04-11 19:49:03,649 INFO com.mapr.filemigrate.FileMigrateServer [FileMigrateServer:mainthread]: Starting new file scan...
2017-04-11 19:49:03,649 INFO com.mapr.filemigrate.ScanDirectoryTree [FileMigrateServer:mainthread]: Starting incremental scan of /srcvol
2017-04-11 19:49:03,651 INFO com.mapr.filemigrate.FileMigrateServer [FileMigrateServer:mainthread]: Scan for new files completed after 0.00 seconds.
2017-04-11 19:49:03,651 INFO com.mapr.filemigrate.FileMigrateServer [FileMigrateServer:mainthread]: Pausing for 60.00 seconds.
2017-04-11 19:49:21,661 INFO com.mapr.filemigrate.S3UploadManager [S3UploadManager:lookForWork]: starting upload for maprfs:///srcvol/a
2017-04-11 19:49:21,672 INFO com.mapr.filemigrate.S3UploadManager [S3UploadManager:lookForWork]: starting upload for maprfs:///srcvol/c
2017-04-11 19:49:21,680 INFO com.mapr.filemigrate.S3UploadManager [S3UploadManager:lookForWork]: starting upload for maprfs:///srcvol/b
2017-04-11 19:49:27,666 INFO com.mapr.filemigrate.S3UploadManager [S3UploadManager:clearCompletedUploads]: upload completed successfully for ActiveUpload [path=maprfs:///srcvol/a, state=Completed, bucket=filemigratertest, enqueueTime=Tue Apr 11 19:34:03 PDT 2017, uploadStartTime=Tue Apr 11 19:49:21 PDT 2017, uploadCompleteTime=Tue Apr 11 19:49:26 PDT 2017, size=0, modificationTime=Tue Apr 11 19:33:22 PDT 2017, key=srcvol/a, errors=3, percentSent=-0.0, lastStateUpdate=Tue Apr 11 19:49:26 PDT 2017]
2017-04-11 19:49:27,671 INFO com.mapr.filemigrate.S3UploadManager [S3UploadManager:clearCompletedUploads]: upload completed successfully for ActiveUpload [path=maprfs:///srcvol/c, state=Completed, bucket=filemigratertest, enqueueTime=Tue Apr 11 19:34:03 PDT 2017, uploadStartTime=Tue Apr 11 19:49:21 PDT 2017, uploadCompleteTime=Tue Apr 11 19:49:26 PDT 2017, size=0, modificationTime=Tue Apr 11 19:33:33 PDT 2017, key=srcvol/c, errors=3, percentSent=-0.0, lastStateUpdate=Tue Apr 11 19:49:26 PDT 2017]
2017-04-11 19:49:27,676 INFO com.mapr.filemigrate.S3UploadManager [S3UploadManager:clearCompletedUploads]: upload completed successfully for ActiveUpload [path=maprfs:///srcvol/b, state=Completed, bucket=filemigratertest, enqueueTime=Tue Apr 11 19:34:03 PDT 2017, uploadStartTime=Tue Apr 11 19:49:21 PDT 2017, uploadCompleteTime=Tue Apr 11 19:49:26 PDT 2017, size=0, modificationTime=Tue Apr 11 19:33:28 PDT 2017, key=srcvol/b, errors=3, percentSent=-0.0, lastStateUpdate=Tue Apr 11 19:49:26 PDT 2017]
Verify Test I:
[root@node9 logs]# yum install s3cmd
Installed:
s3cmd.noarch 0:1.6.1-1.el6
Dependency Installed:
python-magic.x86_64 0:5.04-30.el6
Dependency Updated:
file.x86_64 0:5.04-30.el6
file-libs.x86_64 0:5.04-30.el6
Complete!
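Note that s3cmd needs your AWS access and secret keys before it can talk to S3; if it has not been configured on this node yet, run its interactive setup first:
s3cmd --configure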
Note: the bucket filemigratertest must already exist for the service to perform its tasks. If it does not exist, you can create it via the AWS console or with the command below.
[root@node9 logs]# s3cmd mb s3://filemigratertest
Bucket 's3://filemigratertest/' created
Verified: the files do indeed exist in the bucket.
[root@node9 logs]# s3cmd ls s3://filemigratertest/srcvol/
2017-04-12 02:49 0 s3://filemigratertest/srcvol/a
2017-04-12 02:49 0 s3://filemigratertest/srcvol/b
2017-04-12 02:49 0 s3://filemigratertest/srcvol/c
2017-04-12 02:48 1359 s3://filemigratertest/srcvol/filemigrate.out
The same can be verified from the AWS console as well.
Test II: I created a new file "/srcvol/d" to test whether the service picks up the new file and uploads it.
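The new empty file can be created the same way as before, e.g.:
hadoop fs -touchz /srcvol/d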
[root@node9 logs]# hadoop fs -ls /srcvol
Found 5 items
-rwxr-xr-x 3 root root 0 2017-04-11 19:33 /srcvol/a
-rwxr-xr-x 3 root root 0 2017-04-11 19:33 /srcvol/b
-rwxr-xr-x 3 root root 0 2017-04-11 19:33 /srcvol/c
-rwxr-xr-x 3 root root 0 2017-04-11 20:11 /srcvol/d
-rwxr-xr-x 3 root root 1359 2017-04-11 19:32 /srcvol/filemigrate.out
As seen below, about a minute later when the scan runs, this file is queued for upload.
2017-04-11 20:12:03,676 INFO com.mapr.filemigrate.FileMigrateServer [FileMigrateServer:mainthread]: Starting new file scan...
2017-04-11 20:12:03,676 INFO com.mapr.filemigrate.ScanDirectoryTree [FileMigrateServer:mainthread]: Starting incremental scan of /srcvol
2017-04-11 20:12:03,702 INFO com.mapr.filemigrate.S3UploadManager [S3UploadManager:lookForWork]: starting upload for maprfs:///srcvol/d
2017-04-11 20:12:03,706 INFO com.mapr.filemigrate.FileMigrateServer [FileMigrateServer:mainthread]: Scan for new files completed after 0.00 seconds.
2017-04-11 20:12:03,706 INFO com.mapr.filemigrate.FileMigrateServer [FileMigrateServer:mainthread]: Pausing for 59.97 seconds.
2017-04-11 20:12:12,695 INFO com.mapr.filemigrate.S3UploadManager [S3UploadManager:clearCompletedUploads]: upload completed successfully for ActiveUpload [path=maprfs:///srcvol/d, state=Completed, bucket=filemigratertest, enqueueTime=Tue Apr 11 20:12:03 PDT 2017, uploadStartTime=Tue Apr 11 20:12:03 PDT 2017, uploadCompleteTime=Tue Apr 11 20:12:08 PDT 2017, size=0, modificationTime=Tue Apr 11 20:11:20 PDT 2017, key=srcvol/d, errors=0, percentSent=-0.0, lastStateUpdate=Tue Apr 11 20:12:08 PDT 2017]
Verified:
[root@node9 logs]# s3cmd ls s3://filemigratertest/srcvol/
2017-04-12 02:49 0 s3://filemigratertest/srcvol/a
2017-04-12 02:49 0 s3://filemigratertest/srcvol/b
2017-04-12 02:49 0 s3://filemigratertest/srcvol/c
2017-04-12 03:12 0 s3://filemigratertest/srcvol/d
2017-04-12 02:48 1359 s3://filemigratertest/srcvol/filemigrate.out