In this blog post, we will load the retail data into MySQL instances and analyze it using Structured Query Language(SQL). SQL is one of the most popular languages among developers, data engineers, data scientists, and DBAs.
The data model for the Retail database
This retail database is obtained from the tutorial the Cloudera team provided as part of their Big Data Developer training. It has mainly 6 tables.
- Departments
- Categories
- Products
- Order Items
- Orders
- Customers
Download the retail_db database
cd /tmp
wget https://raw.githubusercontent.com/nitendragautam/samp_data_sets/master/retail_data/retail_db.sql
Create database retail_db and user retail_dba for this Database
mysql -root -padmin
mysql>CREATE database retail_db;
mysql>CREATE user retail_dba identified by 'hadoop';
Grant the required permission to the retail_dba user
mysql> GRANT ALL ON retail_db.* to retail_dba;
Flush all the Privileges
mysql> flush privileges;
Login into MySQL using the retail_dba user and load the retail_db data
Previously, had set up the retail_dba
user using the password hadoop
. Now we will log in to MySQL using this retail_dba
user.
$ mysql -u retail_dba -phadoop
mysql> USE retail_db;
Load the retail_db
Use the directory location where we downloaded the Retail Database /tmp/retail_db.sql
to load the file in MySQL databases.
mysql>source /tmp/retail_db.sql;
It will load the data into the MySQL Instance as a retail_dba
user.
Display all the tables within the Database
mysql> show tables;
+---------------------+
| Tables_in_retail_db |
+---------------------+
| categories |
| customers |
| departments |
| order_items |
| orders |
| products |
+---------------------+
6 rows in set (0.00 sec)
Give me the Unique Order date in which Order status is Pending Payment State
SELECT DISTINCT(Year(order_date)) FROM orders WHERE order_status='PENDING_PAYMENT';
Number of Customers whose Order status is Pending Payment
mysql> SELECT COUNT(*) as count FROM customers c JOIN orders o ON(c.customer_id=o.order_customer_id) WHERE o.order_status='PENDING_PAYMENT';
+-------+
| count |
+-------+
| 15030 |
+-------+
1 row in set (0.06 sec)
Number of Customers whose Order status is Closed
mysql> SELECT COUNT(*) as count FROM customers c JOIN orders o ON(c.customer_id=o.order_customer_id) WHERE o.order_status='CLOSED';
+-------+
| count |
+-------+
| 7556 |
+-------+
1 row in set (0.03 sec)
Get Customer details, and order status who live in the state of Texas
SELECT c.customer_fname,c.customer_lname,o.order_date,o.order_status FROM customers c JOIN orders o ON(c.customer_id=o.order_customer_id) WHERE c.customer_state='TX' LIMIT 20;
Completed version of the Script
You can find the complete version of the script at the below location.
To run this script, you can follow the below steps.
- Get the Script in your local environment
cd ~
wget https://gitlab.com/nitendragautam/blog_tutorials/raw/master/database/load_retail_data_mysql.sh
- Make the Script executable using Change Mode
chmod
command
cd ~
chmod a+x load_retail_data_mysql.sh
- Run the Script by passing your MySQL username and password
Syntax: ./load_retail_data_mysql.sh -u <User_Name> -p<Password>
cd ~
./load_retail_data_mysql.sh -u root -padmin