Friday, July 24, 2015

Installing Hive and Connecting to Hive services


                                  Installing Hive and Connecting to Hive services

Apache Hive™ is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.

Note : This post assumes you have MapR core packages installed and setup as described in my previous post "Install MapR Cluster"

Below post describes steps to follow to install hive , hive metastore and HS2 (version 12) on the same node.

1) List all version of hive packages available in yum repo.
yum search --showduplicates hive

2) Install hive , hive metastore and HS2  packages.
yum install mapr-hive-0.12.201502021326-1 mapr-hiveserver2-0.12.201502021326-1 mapr-hivemetastore-0.12.201502021326-1  -y

3) Run configure.sh for all hive packages to be listed under roles directory.
/opt/mapr/server/configure.sh -R
ls /opt/mapr/roles/

4) The metadata for Hive tables and partitions are stored in the Hive Metastore . By default, the Hive Metastore stores all Hive metadata in an embedded Apache Derby database in MapR-FS. Derby only allows one connection at a time; since we want multiple concurrent Hive sessions, we will use MySQL for the Hive Metastore. 

Installing MYSQL

i) Install mysql server package from the configured yum repo.
yum install mysql-server

ii)  Start mysql deamon .
service mysqld start

iii) Set password  for root user as "password".
/usr/bin/mysqladmin -u root password password

5) Modify hive-site.xml and add below config to the xml file.

vi /opt/mapr/hive/hive-0.12/conf/hive-site.xml
_________________________________________________________________
<configuration> 
 <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
</property>
 
 <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
 </property>
 
 <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>username to use against metastore database</description>
 </property>
 
 <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>password</value>
    <description>password to use against metastore database</description>
 </property>
 
 <property>
    <name>hive.metastore.uris</name>
    <value>thrift://localhost:9083</value>
 </property>
_________________________________________________________________

6) Add "export METASTORE_PORT=9083" parameter to hive-env.sh

cp /opt/mapr/hive/hive-0.12/conf/hive-env.sh.template /opt/mapr/hive/hive-0.12/conf/hive-env.sh

vi hive-env.sh   ( Add below parameter )

export METASTORE_PORT=9083

7)  Now we can start the cluster.

service mapr-zookeeper start 
service mapr-zookeeper qstatus      ( verify status of zookeeper )
service mapr-warden start               

maprcli node services -name hivemeta -action restart -nodes `hostname`    ( is needed restart the hivemeta and check hive metastore logs to verify it started correctly )

view /tmp/mapr/hive.log                                   ( logs related to metastore are logged here )

maprcli node services -name hs2 -action restart -nodes `hostname -f`    ( is needed restart the HS2 and check hs2 logs to verify it started correctly in hive.log as well)


Once all the service are up there are 2 ways  to connect using hive cli or HS2 ( beeline ) .

i) HS1 / Hive cli : HiveServer is an optional service that allows a remote client to submit requests to Hive, using a variety of programming languages, and retrieve results. HiveServer cannot handle concurrent requests from more than one client . Below commands lists the commands to be run while working interactively via hive cli .

[root@311-HS2-1 ~]# hive
Logging initialized using configuration in jar:file:/opt/mapr/hive/hive-0.12/lib/hive-common-0.12-mapr-1501.jar!/hive-log4j.properties
hive> show tables;
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/mapr/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/mapr/hive/hive-0.12/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
OK
Time taken: 0.711 seconds
hive> 


ii) HS2 : HiveServer2 (HS2) is a server interface that enables remote clients to execute queries against Hive and retrieve the results. The current implementation, based on Thrift RPC, is an improved version of HiveServer and supports multi-client concurrency and authentication. Below list of commands depict way to connect to HS2 via beeline.


[root@311-HS2-1 ~]# hive --service beeline
Beeline version 0.12-mapr-1501 by Apache Hive
beeline> !connect jdbc:hive2://127.0.0.1:10000/default
scan complete in 3ms
Connecting to jdbc:hive2://127.0.0.1:10000/default
Enter username for jdbc:hive2://127.0.0.1:10000/default: mapr
Enter password for jdbc:hive2://127.0.0.1:10000/default: ****
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/mapr/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/mapr/hive/hive-0.12/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Connected to: Hive (version 0.12-mapr-1501)
Driver: Hive (version 0.12-mapr-1501)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://127.0.0.1:10000/default> show tables;
[HiveQueryResultSet/next] 0
+-----------+
| tab_name  |
+-----------+
+-----------+
No rows selected (0.638 seconds)
0: jdbc:hive2://127.0.0.1:10000/default>





No comments:

Post a Comment