Sunday, March 19, 2017

Understanding MapR DB Binary Table




  • A Table - an infinitely large key/value space composed of tablets (in the diagram above, the table spans the key range 1-100). Table data is organized into rows and columns, and columns are grouped into column families (CFs).


  • A Tablet/Region - 4-6 GB of data that represents a particular key range. A table comprises many tablets (for example, Tablet 1 {key 1~50} and Tablet 2 {key 50~100}).


  • Partition - a mini tablet of up to about 2 GB (a partition splits in two at 2 GB) with zero or one active bucket (Tablet 1 => Partition 1 [key 1~25] + Partition 2 [key 25~50])
    • the active bucket accepts puts
    • a second bucket might be flushing (or might not exist)

  • Bucket - a file that holds incoming writes, unordered, up to about 150 MB. The keys are kept in an in-memory index. Running low on cache space forces a bucket flush.

  • Segment - a collection of read-only spill files that typically holds 2-4 MB of sorted data (just like HBase HFiles), with the row key index at the start of the file
    • all of the values for a given row are stored together
    • all of the versions for the values are also stored together
  • Spill file - a file that contains sorted additions (called spills) to the corresponding segment. The first spill file can be thought of as "the initial segment."
    • a segment has no more than 6 spills
    • a segment is considered to be composed of its spills, so the first spill really is the initial segment data
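
To make the table / tablet / partition hierarchy above a bit more concrete, here is a minimal Python sketch that routes a key to its tablet and partition using the example key ranges from this post. It only illustrates the idea and is not how MapR DB implements the lookup. The names `TABLETS`, `PARTITIONS`, `find_range` and `locate` are hypothetical, and treating each range as half-open `[start, end)` is an assumption, since the post does not say which side owns a boundary key such as 50.

```python
from bisect import bisect_right

# Key ranges copied from the example in this post. Treating them as
# half-open [start, end) intervals is an assumption of this sketch.
TABLETS = [(1, 50, "Tablet 1"), (50, 100, "Tablet 2")]

# The post only lists partitions for Tablet 1.
PARTITIONS = {
    "Tablet 1": [(1, 25, "Partition 1"), (25, 50, "Partition 2")],
}


def find_range(ranges, key):
    """Return the name of the sorted range that contains key, or None."""
    starts = [start for start, _end, _name in ranges]
    i = bisect_right(starts, key) - 1  # last range starting at or before key
    if i < 0:
        return None
    start, end, name = ranges[i]
    return name if start <= key < end else None


def locate(key):
    """Return (tablet, partition) for key, or None if no tablet covers it."""
    tablet = find_range(TABLETS, key)
    if tablet is None:
        return None
    return tablet, find_range(PARTITIONS.get(tablet, []), key)


print(locate(30))  # key 30 falls in Tablet 1, Partition 2
print(locate(75))  # Tablet 2 has no partitions listed in the post
```

Keeping the ranges sorted by start key lets the lookup use a binary search at each level, which is the usual reason for organizing a key space into nested ranges.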

      From the segment we get the spill map:
      [root@node9 ~]# maprcli debugdb  dump -fid 2182.34.131316 -json
      {
      "timestamp":1489941805408,
      "timeofday":"2017-03-19 09:43:25.408 GMT-0700",
      "status":"OK",
      "total":1,
      "data":[
      {
      "key":"",
      "value":{
      "fid":"<parentCID>.35.131318"
      }
      }
      ]
      }
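
The dump above returns a fid with a `<parentCID>` placeholder, while the next command in this post queries `2182.35.131318`. Judging from those two commands, the placeholder stands for the first dotted component of the fid that was dumped. The sketch below encodes that reading. `resolve_fid` is a hypothetical helper and not part of `maprcli`.

```python
def resolve_fid(child_fid, parent_fid, placeholder="<parentCID>"):
    """Substitute the parent's first fid component for the placeholder.

    Assumption: the first dotted component of a fid identifies the parent
    container, as the two commands in this post suggest.
    """
    container_id = parent_fid.split(".", 1)[0]
    return child_fid.replace(placeholder, container_id, 1)


# Values taken from the segment dump in this post.
print(resolve_fid("<parentCID>.35.131318", "2182.34.131316"))
```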

      From the spill map we get the spills (keys):

      [root@node9 ~]# maprcli debugdb  dump -fid 2182.35.131318 -json
      {
      "timestamp":1489941819033,
      "timeofday":"2017-03-19 09:43:39.033 GMT-0700",
      "status":"OK",
      "total":3,
      "data":[
      {
      "key":0,
      "numRemoteBlocks":0,
      "numSpills":0,
      "numSegments":0,
      "value":{
      "fid":"<parentCID>.36.131320",
      "smeSize":51,
      "keyIdxOffset":12,
      "keyIdxLength":57,
      "ldbIdxLength":20,
      "bloomBitsPerKey":8000,
      "numLogicalBlocks":2,
      "numPhysicalBlocks":5,
      "numRows":1,
      "numRowsWithDelete":0,
      "families":{
      "id":1,
      "offset":524288,
      "length":43,
      "minTimeStamp":1489471080975,
      "maxTimeStamp":1489471080975
      }
      }
      },
      {
      "key":1,
      "numRemoteBlocks":0,
      "numSpills":0,
      "numSegments":0,
      "value":{
      "fid":"<parentCID>.38.131324",
      "smeSize":51,
      "keyIdxOffset":12,
      "keyIdxLength":57,
      "ldbIdxLength":20,
      "bloomBitsPerKey":8000,
      "numLogicalBlocks":2,
      "numPhysicalBlocks":5,
      "numRows":1,
      "numRowsWithDelete":0,
      "families":{
      "id":1,
      "offset":524288,
      "length":43,
      "minTimeStamp":1489471253234,
      "maxTimeStamp":1489471253234
      }
      }
      },
      {
      "key":2,
      "numRemoteBlocks":0,
      "numSpills":0,
      "numSegments":0,
      "value":{
      "fid":"<parentCID>.38.131324",
      "smeSize":52,
      "keyIdxOffset":8204,
      "keyIdxLength":57,
      "ldbIdxLength":20,
      "bloomBitsPerKey":8000,
      "numLogicalBlocks":2,
      "numPhysicalBlocks":5,
      "numRows":1,
      "numRowsWithDelete":0,
      "families":{
      "id":1,
      "offset":589824,
      "length":43,
      "minTimeStamp":1489471657238,
      "maxTimeStamp":1489471657238
      }
      }
      }
      ]
      }
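
When a spill map has many entries, the JSON is easier to read after a little post-processing. The sketch below is a minimal example that assumes the output has the same shape as the dump above, including a single `families` object per entry; a table with several column families may well look different. `SAMPLE` is an abbreviated copy of that dump with most fields dropped, and `summarize_spills` is a hypothetical helper.

```python
import json

# Abbreviated copy of the spill map dump shown in this post
# (two of the three entries, with only a few fields kept).
SAMPLE = """
{"status": "OK", "total": 2, "data": [
  {"key": 0, "value": {"fid": "<parentCID>.36.131320", "numRows": 1,
    "families": {"id": 1, "minTimeStamp": 1489471080975,
                 "maxTimeStamp": 1489471080975}}},
  {"key": 2, "value": {"fid": "<parentCID>.38.131324", "numRows": 1,
    "families": {"id": 1, "minTimeStamp": 1489471657238,
                 "maxTimeStamp": 1489471657238}}}
]}
"""


def summarize_spills(dump_text):
    """Return one small dict per spill map entry in the dump."""
    dump = json.loads(dump_text)
    if dump.get("status") != "OK":
        raise ValueError("dump did not report status OK")
    summary = []
    for entry in dump["data"]:
        value = entry["value"]
        family = value["families"]  # assumed to be a single object
        summary.append({
            "key": entry["key"],
            "fid": value["fid"],
            "rows": value["numRows"],
            "family_id": family["id"],
            "min_ts": family["minTimeStamp"],
            "max_ts": family["maxTimeStamp"],
        })
    return summary


for spill in summarize_spills(SAMPLE):
    print(spill["key"], spill["fid"], spill["rows"], spill["min_ts"])
```

The full output of `maprcli debugdb dump ... -json` can be passed to `summarize_spills` in place of `SAMPLE`, as long as it has the same structure.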
