Top and Essential Apache Spark Interview Questions
In this blog post, we will go through important Apache Spark interview questions and answers that will help you prepare for your next Spark interview. Question: What is Apache…
Relational databases are among the most widely used data systems in today's information-technology companies. They are mainly used to store, retrieve, update, and delete information across many verticals such as financial…
Data ingestion is the process by which data from multiple sources is captured into a storage layer, where it can then be accessed, used, and analyzed…
Apache Spark is an open-source, distributed framework optimized for both in-memory and disk-based processing, which performs real-time analytics using Resilient Distributed Datasets (RDDs). It includes a streaming library and a rich set of…
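As a quick taste of the RDD API mentioned above, here is a minimal sketch in Scala, assuming a Spark shell or an application where a SparkContext named `sc` is already available:

```scala
// Distribute a local collection across the cluster as an RDD.
val numbers = sc.parallelize(1 to 10)

// Transformations are lazy: nothing runs until an action is called.
val evens = numbers.filter(_ % 2 == 0)

// An action (reduce) triggers the distributed computation.
val total = evens.reduce(_ + _)
println(total) // 30
```

The lazy transformation/eager action split is what lets Spark build an optimized execution plan before touching the data.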
Anyone working with Big Data needs to be familiar with many concepts and terms. Here I will introduce some of the important data terms everyone working with Big Data should know.
SparkSession has been the main entry point to Spark applications since Spark 2.0. Before Spark 2.0, SparkContext was the main entry point for any Spark application. We see how…
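A minimal sketch of creating that entry point in Scala; the app name and master URL below are illustrative:

```scala
import org.apache.spark.sql.SparkSession

// Build (or reuse) a SparkSession. getOrCreate() returns any
// already-active session instead of constructing a new one.
val spark = SparkSession.builder()
  .appName("ExampleApp")   // illustrative name
  .master("local[*]")      // local mode; a cluster URL in production
  .getOrCreate()

// The pre-2.0 entry point is still reachable through the session:
val sc = spark.sparkContext
```

This is why SparkSession is called the unified entry point: SQL, DataFrame, and the older RDD APIs all hang off the one object.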
Caching and persistence are optimization techniques for iterative and interactive Apache Spark computations. They save interim partial results so that those results can be reused in subsequent stages. These results,…
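A short sketch of the idea in Scala, assuming an existing SparkSession named `spark`; the file path and filter are hypothetical:

```scala
import org.apache.spark.storage.StorageLevel

val logs   = spark.read.textFile("/path/to/logs") // illustrative path
val errors = logs.filter(_.contains("ERROR"))

// persist() with an explicit storage level; cache() is shorthand
// for persisting at the API's default level.
errors.persist(StorageLevel.MEMORY_AND_DISK)

errors.count() // first action materializes the persisted partitions
errors.count() // later actions reuse them instead of recomputing

errors.unpersist() // release the storage when no longer needed
```

The choice of StorageLevel (memory only, memory and disk, serialized, replicated) is the main tuning knob between recomputation cost and memory pressure.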
Data locality refers to how close data is to the code processing it. Having the code and the data together tends to make computations faster in Apache Spark. If the…
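Spark exposes locality behavior through configuration. A hedged sketch of the relevant settings, assuming they are set at session build time; the values shown are illustrative, not recommendations:

```scala
import org.apache.spark.sql.SparkSession

// spark.locality.wait controls how long the scheduler waits for a
// data-local slot before falling back to a less-local one.
val spark = SparkSession.builder()
  .appName("LocalityTuning")                     // illustrative name
  .config("spark.locality.wait", "3s")           // overall wait per locality level
  .config("spark.locality.wait.node", "3s")      // wait for node-local
  .config("spark.locality.wait.process", "3s")   // wait for process-local
  .getOrCreate()
```

Shorter waits favor keeping executors busy; longer waits favor locality at the cost of idle slots.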
If you are switching from Hortonworks Data Platform (HDP) 2.6 to 3.0+, you will have a hard time accessing Hive tables through the Apache Spark shell. HDP 3 introduced something called…
UDAF stands for User Defined Aggregate Function. Aggregate functions perform a calculation on a set of values and return a single value. Writing an aggregate function is harder than writing a User Defined Function (UDF) because we need to aggregate over multiple rows and columns. An Apache Spark UDAF operates on more than one row or column while returning a single value as the result.
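To make the row-to-single-value shape concrete, here is a hypothetical average UDAF written with the typed Aggregator API (Spark 3.0+); the object and registered name are illustrative:

```scala
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.functions.udaf

// Aggregator[IN, BUF, OUT]: fold Double inputs into a (sum, count)
// buffer, then finish with a single Double result.
object MyAverage extends Aggregator[Double, (Double, Long), Double] {
  def zero: (Double, Long) = (0.0, 0L)                        // empty buffer
  def reduce(b: (Double, Long), x: Double): (Double, Long) =
    (b._1 + x, b._2 + 1)                                      // fold in one row
  def merge(a: (Double, Long), b: (Double, Long)): (Double, Long) =
    (a._1 + b._1, a._2 + b._2)                                // combine partial buffers
  def finish(b: (Double, Long)): Double = b._1 / b._2         // single output value
  def bufferEncoder: Encoder[(Double, Long)] =
    Encoders.tuple(Encoders.scalaDouble, Encoders.scalaLong)
  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}

// Register for use in SQL, assuming an existing SparkSession `spark`:
spark.udf.register("my_average", udaf(MyAverage))
```

The `reduce`/`merge` split is what makes this more involved than a plain UDF: partial buffers must combine correctly across partitions, not just across rows.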