Apache Hadoop is an open-source framework for storing and processing large datasets, ranging in size from gigabytes to petabytes. It distributes data and computation across clusters of commodity machines, analyzing large volumes of data far more quickly than traditional single-server software. Note that Hadoop by itself is not an ETL (Extract, Transform, Load) or ELT (Extract, Load, Transform) tool.
Apache Hadoop is designed to scale out from a single server to thousands of machines, each offering local computation and storage. Rather than relying on high-end hardware for high availability, the framework detects failures at the application layer and handles them before the job as a whole fails, so it runs reliably on low-end and high-end hardware alike.
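The canonical illustration of this model is the word-count job from the standard Hadoop MapReduce tutorial: mappers run in parallel on whichever nodes hold the input blocks, and reducers aggregate the partial counts. The sketch below follows that classic example and assumes the Hadoop client libraries are on the classpath.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every token in its input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sums the counts emitted for each word across all mappers.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each node
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Package the class as a JAR and submit it with `hadoop jar wordcount.jar WordCount <input-dir> <output-dir>`. If a worker node fails mid-job, the framework reschedules that node's tasks elsewhere, which is the application-level failure handling described above.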
This page guides you through the topics you need to learn Hadoop-based technologies.
Table of Contents
Hadoop Basics Topics
Hadoop Advanced Topics
- Hadoop YARN Commands
- Joins Using the MapReduce Framework
- Data Compression Techniques in the Hadoop Framework
- Speculative Execution in Hadoop
- Finding the Right Hardware for a Hadoop Cluster
- Apache Hadoop 3 Changes
- Difference between Apache Hadoop/HDFS and HBase
- Rack Awareness in Hadoop HDFS
- What is Heartbeat in the Hadoop Framework?
- What is the Hadoop Task Tracker?
- Hadoop Job Tracker