In this blog post, we will install Apache Hive on an Ubuntu machine (Ubuntu 16.04.5 LTS, GNU/Linux 4.4.0-36-generic x86_64). Once the installation is complete, we will run Hive queries using the Hive Query Language (HQL) to verify the installation.
Figure: Ubuntu Version
Prerequisites for Hive Installation
Before installing Hive, we need to make sure that both Java and Hadoop are installed and configured in the cluster.
Install Java
First, update Ubuntu with the latest software and patches if available.
sudo apt-get update && sudo apt-get -y dist-upgrade
Use the command below to install the OpenJDK version of Java.
sudo apt-get -y install openjdk-8-jdk-headless
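To make sure the JDK was installed correctly, you can print the Java version and resolve the java symlink to see where the JDK lives (the /usr/lib/jvm/java-8-openjdk-amd64 location mentioned below is the typical path for this package, not something the installer guarantees).
## Print the installed Java version
java -version
## Resolve the java symlink to find the JDK home (usually under /usr/lib/jvm/java-8-openjdk-amd64)
readlink -f /usr/bin/java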
Install Apache Hive
Download and Decompress Hive
First, download the latest available Hive installation archive from the mirror site.
cd /tmp
sudo wget https://www-eu.apache.org/dist/hive/stable-2/apache-hive-2.3.4-bin.tar.gz
[maria_dev@sandbox-hdp ~]$ cd /tmp
[maria_dev@sandbox-hdp tmp]$ sudo wget https://www-eu.apache.org/dist/hive/stable-2/apache-hive-2.3.4-bin.tar.gz
--2019-04-14 21:27:49-- https://www-eu.apache.org/dist/hive/stable-2/apache-hive-2.3.4-bin.tar.gz
Resolving www-eu.apache.org (www-eu.apache.org)... 95.216.24.32, 2a01:4f9:2a:185f::2
Connecting to www-eu.apache.org (www-eu.apache.org)|95.216.24.32|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 232234292 (221M) [application/x-gzip]
Saving to: ‘apache-hive-2.3.4-bin.tar.gz’
100%[====================================================================================================================================================================================================>] 232,234,292 13.9MB/s in 17s
2019-04-14 21:28:07 (13.0 MB/s) - ‘apache-hive-2.3.4-bin.tar.gz’ saved [232234292/232234292]
Once the file is downloaded, decompress the tar archive and move it to the installation location.
tar -xvf apache-hive-2.3.4-bin.tar.gz
sudo mv apache-hive-2.3.4-bin /usr/local/hive
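A quick listing of the installation directory should show the standard Hive layout (bin, conf, lib and similar folders):
## Verify the extracted Hive layout
ls /usr/local/hive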
Change Permission to the Installation Directory
If you want to run Hive as a user other than root, you need to change the ownership of the Hive directory to the desired user and give it the proper permissions. In my case, Apache Hive is being installed for the user hduser at the location /usr/local/hive.
## Give 755 Permission to Folder
sudo chmod -R 755 /usr/local/hive
## Change ownership
sudo chown -R hduser /usr/local/hive
Skip this step if you are installing Hive as the default user.
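To confirm that the ownership and permissions took effect, list the directory (hduser here is just the example user from above):
## Verify owner and permissions of the Hive directory
ls -ld /usr/local/hive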
Set the HIVE_HOME in the system Path
Now that we have moved the Hive installation files to /usr/local/hive, we need to add this path to the Ubuntu system PATH so that we can access Hive from anywhere on the machine.
On a Debian-based system, .bashrc is a shell script that Bash runs whenever it is started interactively; it initializes an interactive shell session.
Use a text editor such as vim or nano to open and edit the file.
nano ~/.bashrc
Set the Hive home path in the .bashrc file as shown below.
#HIVE Path
export HIVE_HOME=/usr/local/hive
export HIVE_CONF_DIR=/usr/local/hive/conf
export PATH=$HIVE_HOME/bin:$PATH
Now, to make the Hive path available, we need to reload the .bashrc file using the source command.
source ~/.bashrc
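To confirm the new variables are in effect, echo HIVE_HOME and check that the hive binary resolves from the PATH:
## Confirm HIVE_HOME is set
echo $HIVE_HOME
## Confirm the hive binary is found on the PATH and print its version
which hive
hive --version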
Check Hadoop and Java Path in .bashrc
Before running Hive, we need to make sure that Apache Hadoop and Java are set up in the path and running properly.
#HADOOP VARIABLES START
export HADOOP_HOME="/usr/local/hadoop"
export PATH="$HADOOP_HOME/bin:$PATH"
export PATH="$HADOOP_HOME/sbin:$PATH"
export HADOOP_MAPRED_HOME="$HADOOP_HOME"
export HADOOP_COMMON_HOME="$HADOOP_HOME"
export HADOOP_HDFS_HOME="$HADOOP_HOME"
export YARN_HOME="$HADOOP_HOME"
export HADOOP_COMMON_LIB_NATIVE_DIR="$HADOOP_HOME/lib/native"
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
#HADOOP VARIABLES END
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$JAVA_HOME/bin:$PATH
Now use the jps and hadoop version commands to check whether Apache Hadoop is running.
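For reference, the two checks look like this; jps should list the Hadoop daemons (for example NameNode and DataNode) if the cluster is up.
## List running Java processes (Hadoop daemons)
jps
## Print the installed Hadoop version
hadoop version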
Figure: Check Hadoop and Hive Version
Create Hive Warehouse directory and initialize Derby
Let’s configure the directory in the Hadoop Distributed File System (HDFS) where Hive can store its data.
hdfs dfs -mkdir -p /user/hive/warehouse
Now give the proper permission to the warehouse directory.
hdfs dfs -chmod 755 /user/hive/warehouse
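Hive also expects a scratch directory (/tmp by default) to exist in HDFS with write access. A minimal sketch, assuming the default configuration:
## Create the HDFS scratch directory Hive uses by default
hdfs dfs -mkdir -p /tmp
## Allow group write access on the scratch and warehouse directories
hdfs dfs -chmod g+w /tmp /user/hive/warehouse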
Now let’s inform Hive about the database that it should use for its schema definition. The command below tells Hive to use the Derby database as its metastore database. We can also specify this in the Hive configuration file hive-site.xml.
$HIVE_HOME/bin/schematool -initSchema -dbType derby
Figure: Initialize Derby database
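Note that with the embedded Derby setup, the metastore database is created as a metastore_db directory inside whatever working directory you run schematool (and later the hive shell) from, so it is worth checking that it was created where you expect:
## The embedded Derby metastore lives in the current working directory
ls -d metastore_db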
Run Hive Queries (Hive Query Language)
Start the Hive Shell
hive
Figure: Hive Shell
Create a Database in Hive
We will create a new database named niten_test
and display all existing databases using SHOW DATABASES
command.
CREATE DATABASE IF NOT EXISTS niten_test;
SHOW DATABASES;
Figure: Create Hive Database
Create Hive Table
We have just created our own database, which we can use to create a table.
So, switch to the database you just created.
USE niten_test;
Now create a table inside this database with the following fields.
CREATE TABLE IF NOT EXISTS niten_table(
id INT,
first_name String,
last_name String,
website String);
Figure: Create Hive Table
Once the table is successfully created, we can display the tables and the schema of the table.
show tables;
desc niten_table;
Figure: Show and Describe Hive Table
Insert Records into Hive Tables
INSERT INTO TABLE niten_table VALUES(1,'Nitendra','Gautam','nitendragautam.com');
Figure: Insert Record Hive Table
Display the record
SELECT * FROM niten_table;
Figure: Display Record
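As a final check outside the interactive shell, the same query can be run non-interactively with the hive -e option (the database and table names are the ones created above):
## Run a single HQL statement from the command line
hive -e 'SELECT * FROM niten_test.niten_table;'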
To conclude, we have installed and validated Apache Hive on the Ubuntu server.