Introduction to Apache Spark SQL

Post author: nitendratech
Post category: Spark
Post comments: 1 Comment
Post published: October 10, 2017

Introduction to Apache Spark Streaming

Post author: nitendratech
Post category: Spark
Post comments: 1 Comment
Post published: October 4, 2017

The Apache Spark Streaming execution model has several advantages over traditional streaming systems: fast recovery from failures, dynamic load balancing, combined streaming and interactive analytics, and native integration with Spark's other libraries.

What is a Database and Why is it Important? Facts and Types

Post author: nitendratech
Post category: Database
Post comments: 25 Comments
Post published: October 1, 2017

What is Data? Before we learn about databases, we need to first understand what data is. Data is information or facts related to an object that is under consideration. In…

Spark Resilient Distributed Dataset (RDD)

Post author: nitendratech
Post category: Spark
Post comments: 5 Comments
Post published: September 30, 2017

A Resilient Distributed Dataset (RDD) is the primary data abstraction in Apache Spark: a fault-tolerant, immutable, distributed collection of objects. The term ‘resilient’ in ‘Resilient Distributed Dataset’ refers…

Joins using MapReduce Framework

Post author: nitendratech
Post category: Hadoop
Post comments: 1 Comment
Post published: September 25, 2017

Three types of joins can be used to join tables in MapReduce: reduce-side joins, map-side joins, and memory-backed joins. Map-Side Join: Joining at the map side…

Scala Programming Language

Post author: nitendratech
Post category: Scala
Post comments: 2 Comments
Post published: September 16, 2017

What are Spark Shared Variables?

Post author: nitendratech
Post category: Spark
Post comments: 0 Comments
Post published: September 5, 2017

Shared variables are an abstraction in Apache Spark for sharing data across parallel operations that run on different nodes. When Spark runs a function in parallel as a set of tasks on…