Apache Spark is an open-source distributed computing framework, optimized for both in-memory and disk-based processing, that performs real-time analytics using Resilient Distributed Datasets (RDDs). It includes a streaming library and a rich set of programming interfaces that make data processing and transformation easier.
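As a quick illustration of the RDD API mentioned above, here is a minimal word-count sketch in Scala. It assumes Spark is on the classpath and that a local file named `input.txt` exists; both are illustrative choices, not part of this page's tutorials.

```scala
import org.apache.spark.sql.SparkSession

object RddWordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-word-count")
      .master("local[*]")          // run locally for this example
      .getOrCreate()

    // Build an RDD from a text file, then apply lazy transformations.
    val counts = spark.sparkContext
      .textFile("input.txt")       // hypothetical input file
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // collect() is an action: it triggers the actual distributed computation.
    counts.collect().foreach { case (word, n) => println(s"$word: $n") }

    spark.stop()
  }
}
```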
This page will guide you through the topics you need to learn Spark and its related technologies.
Table of Contents
Basic Concepts
Advanced Topics
- Spark Resilient Distributed Dataset (RDD)
- Installing Apache Spark on Linux
- Data Locality in Spark
- Caching and Persisting Mechanism in Spark
- Apache Spark Shared Variables
- Accessing Hive in HDP3 using Apache Spark
- Submit Apache Spark Job with REST API
- SparkSession in Apache Spark
- User-Defined Aggregate Functions (UDAF) Using Apache Spark