Introduction to Apache Spark SQL

Post author:nitendratech
Post category:Spark
Post comments:1 Comment
Post published:October 10, 2017

Introduction to Apache Spark Streaming

Post author:nitendratech
Post category:Spark
Post comments:1 Comment
Post published:October 4, 2017

The Apache Spark Streaming execution model has several advantages over traditional streaming systems: fast recovery from failures, dynamic load balancing, combined streaming and interactive analytics, and native integration.

Spark Resilient Distributed Dataset (RDD)

Post author:nitendratech
Post category:Spark
Post comments:5 Comments
Post published:September 30, 2017

A Resilient Distributed Dataset (RDD) is the fault-tolerant, immutable primary data abstraction in Apache Spark. It is a distributed collection of objects. The term ‘resilient’ in ‘Resilient Distributed Dataset’ refers…

What are Spark Shared Variables?

Post author:nitendratech
Post category:Spark
Post comments:0 Comments
Post published:September 5, 2017

Shared variables are an abstraction in Apache Spark used in parallel operations across different nodes. When Spark runs a function in parallel as a set of tasks on…

Installing Apache Spark on Linux

Post author:nitendratech
Post category:Spark
Post comments:0 Comments
Post published:June 18, 2017

Apache Spark is an open-source cluster-computing framework. This post explains the steps for installing a prebuilt version of Apache Spark 2.1.1 as a standalone cluster on a Linux system. I have used Ubuntu, a Debian-based OS, for this post.

What is Apache Spark? The Unified engine for large-scale data analytics.

Post author:nitendratech
Post category:Spark
Post comments:28 Comments
Post published:May 10, 2017

Apache Spark is a distributed system, optimized for both in-memory and disk-based processing, that performs real-time analytics using Resilient Distributed Datasets (RDDs). Spark includes a streaming library and a rich set of programming interfaces that make data processing and transformation easier.