Fixing Spill inconsistency in MapRDB
During a consistency check, one tablet, "2236.32.131230", turned out to have problems: its consistency check fails. The steps below show how to identify and fix the issue.
Caution: Do not run step 7 from this post without consulting MapR Support. The command is very low level and must be executed with extreme caution.
1) Run the consistency check as described in http://abizeradenwala.blogspot.com/2016/12/mapr-db-tablet-consistency-check.html . It fails with the error shown below.
maprcli debugdb checkTablet -fid 2236.32.131230 -startkey user5137029094653148843 -endkey user7197423315127484105 -tracefile t1
ERROR (10009) - fs rpc failed
2) Get the master node of the container that holds the tablet. The container ID is the first component of the tablet fid (2236).
maprcli dump containerinfo -ids 2236 -json
{
"timestamp":1482450014994,
"timeofday":"2016-12-22 03:40:14.994 GMT-0800",
"status":"OK",
"total":1,
"data":[
{
"ContainerId":2236,
"Epoch":5,
"Master":"10.10.70.177:5660--5-VALID",
"ActiveServers":{
"IP:Port":[
"10.10.70.110:5660--5-VALID",
"10.10.70.109:5660--5-VALID",
"10.10.70.111:5660--5-VALID"
3) Now look at /opt/mapr/logs/mfs.log-3 on the master node of the container. The errors logged there point at the issue: a failure reading spill 2236.2319.147898.
2016-12-22 15:38:15,2490 ERROR DB db/mfsread.cc:240 ***********FileRead RPC 2236.2319.147898 failed: 116 -----> RPC error occurred since read RPC failed
2016-12-22 15:38:15,2491 ERROR DB tabletrangecheck.cc:3222 CheckSpillHeaderReadSME : read of spill 2236.2319.147898 failed 116
2016-12-22 15:38:15,2491 ERROR DB tabletrangecheck.cc:1708 TabletRangeCheckSpillmapProcess : child error 116 for spillmap 2236.2299.147866
2016-12-22 15:38:15,2491 ERROR DB tabletrangecheck.cc:1417 TabletRangeCheckSegmapProcess : child error 116 for segmap 2236.1747.134662
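The three fids that matter for the remaining steps (spill, spill map, segment map) can be picked out of these log lines with a few regular expressions. This is only a sketch based on the message wording in the four lines above; the helper name `failing_fids` is hypothetical, and other MapR versions may word these messages differently.

```python
import re

# A fid is three dot-separated numbers, e.g. 2236.2319.147898.
_FID = r"(\d+\.\d+\.\d+)"
_PATTERNS = {
    "spill": re.compile(r"read of spill " + _FID),
    "spillmap": re.compile(r"for spillmap " + _FID),
    "segmap": re.compile(r"for segmap " + _FID),
}


def failing_fids(log_lines):
    """Return the first spill, spillmap and segmap fid seen in ERROR lines."""
    found = {}
    for line in log_lines:
        if " ERROR " not in line:
            continue
        for kind, pattern in _PATTERNS.items():
            match = pattern.search(line)
            if match:
                # Keep the first occurrence of each kind.
                found.setdefault(kind, match.group(1))
    return found


log = [
    "2016-12-22 15:38:15,2491 ERROR DB tabletrangecheck.cc:3222 "
    "CheckSpillHeaderReadSME : read of spill 2236.2319.147898 failed 116",
    "2016-12-22 15:38:15,2491 ERROR DB tabletrangecheck.cc:1708 "
    "TabletRangeCheckSpillmapProcess : child error 116 for spillmap 2236.2299.147866",
    "2016-12-22 15:38:15,2491 ERROR DB tabletrangecheck.cc:1417 "
    "TabletRangeCheckSegmapProcess : child error 116 for segmap 2236.1747.134662",
]
print(failing_fids(log))
```

For the lines above this yields spill 2236.2319.147898, spill map 2236.2299.147866 and segment map 2236.1747.134662, which are the same fids the following steps arrive at by dumping the tablet.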
4) From the tablet fid, get the segment fid 2236.1747.134662, which is also printed in the mfs.log-3 output above.
maprcli debugdb dump -fid 2236.32.131230
value key
{"value":{}} endkey.user7197423315127484105
{"value":{"segfid":"<parentCID>.1747.134662","isFrozen":false,"inSplit":false,"useBucketDesc":true,"lastFlushedBucketFid":"2236.968.145294","numLogicalBlocks":174220,"numPhysicalBlocks":91527,"numRows":76195,"numRowsWithDelete":0,"numRemoteBlocks":0,"numSpills":868,"numSegments":323}} pmap.user5137029094653148843
{"value":{"segfid":"<parentCID>.1748.134664","isFrozen":false,"inSplit":false,"useBucketDesc":true,"lastFlushedBucketFid":"2236.687.144732","numLogicalBlocks":179179,"numPhysicalBlocks":94082,"numRows":78300,"numRowsWithDelete":0,"numRemoteBlocks":0,"numSpills":873,"numSegments":342}} pmap.user5666253514688897233
{"value":{"segfid":"<parentCID>.1176.133520","isFrozen":false,"inSplit":false,"useBucketDesc":true,"lastFlushedBucketFid":"2236.1516.146388","numLogicalBlocks":390865,"numPhysicalBlocks":179574,"numRows":142771,"numRowsWithDelete":0,"numRemoteBlocks":0,"numSpills":2421,"numSegments":677}} pmap.user6214000520416211177
{"value":{}} startkey.user5137029094653148843
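Note that the dump prints `<parentCID>` instead of a container ID, while the commands in the following steps use full fids such as 2236.1747.134662. Judging from those commands, the placeholder stands for the container ID of the fid that was dumped. A small sketch of that substitution, with `resolve_fid` being an illustrative name rather than anything provided by MapR:

```python
PLACEHOLDER = "<parentCID>"


def resolve_fid(fid: str, parent_fid: str) -> str:
    """Replace the <parentCID> placeholder with the parent's container ID.

    Assumption: the placeholder always refers to the first component
    of the fid that was passed to `maprcli debugdb dump -fid`.
    """
    container_id = parent_fid.split(".", 1)[0]
    if fid.startswith(PLACEHOLDER + "."):
        return container_id + fid[len(PLACEHOLDER):]
    return fid  # already a full fid


print(resolve_fid("<parentCID>.1747.134662", "2236.32.131230"))
# prints 2236.1747.134662
```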
5) Now dump the segment fid to find the spill map that holds the spill (2236.2299.147866, the spill map named in the log above).
maprcli debugdb dump -fid 2236.1747.134662
value key
{"value":{"fid":"<parentCID>.988.145334"}} user5137029094653148843
{"value":{"fid":"<parentCID>.989.145336"}} user5139360914320437622
{"value":{"fid":"<parentCID>.1264.145884"}} user5646406973934510610
{"value":{"fid":"<parentCID>.4804.140744"}} user5647105917867305914
{"value":{"fid":"<parentCID>.1267.145890"}} user5648419690992190947
{"value":{"fid":"<parentCID>.1268.145892"}} user5651953154952706923
{"value":{"fid":"<parentCID>.2299.147866"}} user5652099992370546090
{"value":{"fid":"<parentCID>.2300.147868"}} user5654138608241368186
{"value":{"fid":"<parentCID>.4809.140754"}} user5654624228442534993
{"value":{"fid":"<parentCID>.3398.137948"}} user5656079376459834156
{"value":{"fid":"<parentCID>.1269.145894"}} user5658158163442262620
{"value":{"fid":"<parentCID>.1270.145896"}} user5660771976780382135
{"value":{"fid":"<parentCID>.3924.139000"}} user5661016661850577428
{"value":{"fid":"<parentCID>.1275.145906"}} user5662697508043691134
{"value":{"fid":"<parentCID>.1276.145908"}} user566617831443475188
6) Dumping the spill map gives the spill fid and the key associated with it ("0"). This is the entry that has the problem.
maprcli debugdb dump -fid 2236.2299.147866 -json
{
"timestamp":1482450468536,
"timeofday":"2016-12-22 03:47:48.536 GMT-0800",
"status":"OK",
"total":1,
"data":[
{
"key":0,
"numRemoteBlocks":0,
"numSpills":0,
"numSegments":0,
"value":{
"fid":"<parentCID>.2319.147898",
"smeSize":342,
"keyIdxOffset":12,
"keyIdxLength":45806,
"ldbIdxLength":126,
"bloomBitsPerKey":26,
"numLogicalBlocks":606,
"numPhysicalBlocks":360,
"numRows":305,
"numRowsWithDelete":0,
"families":{
"id":[
1,
2,
3,
4,
5,
6,
7,
8,
9,
10,
11
],
"offset":[
524288,
983040,
1441792,
1900544,
2359296,
2818048,
3276800,
3735552,
4194304,
4653056,
5111808
],
"length":[
433957,
433561,
433523,
433152,
432884,
430810,
433202,
433185,
433931,
432497,
271649
],
"minTimeStamp":[
1482285384222,
1482285384222,
1482285384222,
1482285384222,
1482285384222,
1482285384222,
1482285384222,
1482285384222,
1482285384222,
1482285384222,
1482287036566
],
"maxTimeStamp":[
1482287240397,
1482287240397,
1482287240397,
1482287240397,
1482287240397,
1482287240397,
1482287240397,
1482287240397,
1482287240397,
1482287240397,
1482287240397
]
}
}
}
]
}
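This spill map has a single entry, but a spill map can list more than one spill, so it helps to match on the failing spill fid from the log rather than reading the key off by eye. A minimal sketch, assuming the JSON layout shown above (`data[].key` and `data[].value.fid`); the function names are hypothetical.

```python
import json


def local_part(fid: str) -> str:
    """Drop the container component: '2236.2319.147898' -> '2319.147898'."""
    return fid.split(".", 1)[1]


def keys_for_spill(spillmap_json: str, failing_spill_fid: str):
    """Return the keys of spill map entries that point at the failing spill.

    Entries in the dump use "<parentCID>" instead of the container ID,
    so only the part after the first dot is compared.
    """
    want = local_part(failing_spill_fid)
    doc = json.loads(spillmap_json)
    return [
        entry["key"]
        for entry in doc.get("data", [])
        if local_part(entry.get("value", {}).get("fid", ".")) == want
    ]


# Trimmed copy of the output above, kept to the fields the function reads.
sample = (
    '{"status":"OK","total":1,"data":[{"key":0,'
    '"value":{"fid":"<parentCID>.2319.147898","numRows":305}}]}'
)
print(keys_for_spill(sample, "2236.2319.147898"))  # prints [0]
```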
7) To delete the specific spill, pass the spill map fid (step 5), the key and the spill fid (step 6) to the command below. As noted at the top, run it only after consulting MapR Support.
maprcli debugdb multiOp -kvfid 2236.2299.147866 -delkeys 0 -delfids 2236.2319.147898
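Because a typo in any of the three values would delete the wrong entry, one option is to assemble the command from the values collected earlier and review it before anything is run. The sketch below only builds and prints the command string; it never executes it. The helper name `build_multiop_command` is made up, and the flags are exactly those used in the command above.

```python
import re
import shlex

_FID_RE = re.compile(r"^\d+\.\d+\.\d+$")


def build_multiop_command(spillmap_fid: str, key: int, spill_fid: str) -> str:
    """Build (but do not run) the multiOp delete command for review."""
    for fid in (spillmap_fid, spill_fid):
        # Reject placeholders such as "<parentCID>.2319.147898".
        if not _FID_RE.match(fid):
            raise ValueError(f"not a full fid: {fid!r}")
    args = [
        "maprcli", "debugdb", "multiOp",
        "-kvfid", spillmap_fid,
        "-delkeys", str(key),
        "-delfids", spill_fid,
    ]
    return " ".join(shlex.quote(arg) for arg in args)


print(build_multiop_command("2236.2299.147866", 0, "2236.2319.147898"))
# prints maprcli debugdb multiOp -kvfid 2236.2299.147866 -delkeys 0 -delfids 2236.2319.147898
```

Printing instead of executing is deliberate here: the printed line can be shared with MapR Support for confirmation first.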