What are Hadoop Execution Modes?
Apache Hadoop can be used in multiple modes to achieve a different set of tasks. There are three modes in which a Hadoop Mapreduce application can be executed.
Apache Hadoop can be used in multiple modes to achieve a different set of tasks. There are three modes in which a Hadoop Mapreduce application can be executed.
HDFS is a distributed file system that is designed for storing very large files with streaming data access patterns running on clusters of commodity hardware.
Linux is open-source and one of the most popular operating systems. It is one of the most important technological advancements of the last century. It has made a huge impact…
Digital data is now everywhere—in every sector, in every economy, in every organization, and user of digital technology. While this topic might once have concerned only a few data geeks,…
What exactly is Apache Hadoop? Apache Hadoop is an open-source distributed processing framework that is used to store and process large datasets whose size ranges from gigabytes to petabytes of…
Git is a version control system for tracking changes in computer files and coordinating work on those files among multiple people. This post is a collection of important Git command Cheat Sheet that i use in my day to day basis.
Apache Spark is an open-source cluster-computing framework. This post will explain the steps for installing prebuilt version of Apache Spark 2.1.1 as a stand alone cluster in a Linux system. I have used Ubuntu as a debains based OS for this post.
The amount of data in our world has been exploding. Different Companies capture trillions of bytes of information about their customers, suppliers, and operations, and millions of networked sensors are being embedded in the physical world in devices such as mobile phones and automobiles, sensing, creating, and communicating data.
Microservice architecture, or simply microservices, is a distinctive method of developing software applications as a suite of independently deployable, small , modular services in which each service runs a unique process and communicates through a well-defined, lightweight mechanism to serve a business goal.
Apache Spark is a distributed, in-memory and disk based optimized system which does real-time analytics using Resilient Distributed Data(RDD) Sets.Spark includes a streaming library, and a rich set of programming interfaces to make data processing and transformation easier.