When running Spark Streaming applications with dynamic allocation of executors, we may observe that the application still holds some resources even though no data is arriving from the source and no jobs are in progress.

Scaling executors down to zero when no data is being processed releases those idle resources, which helps the overall performance of applications on the cluster.

Follow the steps below to achieve this:

Step 1: Set spark.dynamicAllocation.enabled to false

Step 2: Set spark.streaming.dynamicAllocation.enabled to true

Step 3: Set the minimum number of executors to 0 with spark.streaming.dynamicAllocation.minExecutors=0
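
The three steps can be collected into one place and turned into spark-submit arguments. The snippet below is only a minimal sketch: the helper name to_spark_submit_args and the application file my_streaming_app.py are made up for illustration, and it is worth confirming against the documentation of the Spark version in use that a minimum of 0 executors is accepted.

```python
# Settings taken from the three steps above.
STREAMING_SCALE_TO_ZERO_CONF = {
    "spark.dynamicAllocation.enabled": "false",             # Step 1
    "spark.streaming.dynamicAllocation.enabled": "true",    # Step 2
    "spark.streaming.dynamicAllocation.minExecutors": "0",  # Step 3
}


def to_spark_submit_args(conf):
    """Turn a {key: value} mapping into spark-submit `--conf key=value` arguments.

    Hypothetical helper for illustration; it only builds strings and does not
    start Spark.
    """
    args = []
    for key, value in conf.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    # "my_streaming_app.py" is a placeholder application name.
    command = ["spark-submit", *to_spark_submit_args(STREAMING_SCALE_TO_ZERO_CONF), "my_streaming_app.py"]
    print(" ".join(command))
```

Keeping the settings in a single mapping makes it easy to review them together, since the first two steps only make sense as a pair.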

A skewed table can improve the performance of tables that have one or more columns with skewed values. By specifying the frequently occurring values (severe skew), Hive records these skewed column names and values in the metadata, which can then be used to optimize joins. …

Recently, most companies have introduced a coding round in which the panel gives you a few problem statements and you need to write code for them.

Below are a few pointers to keep in mind during coding rounds, which may help you clear them:

  1. Listen to the problem statement…

Performance plays a key role in big data projects because they deal with huge amounts of data. If you keep a few things in mind when using Hive, you can see a dramatic change in performance:

  • Partitions
  • Bucketing
  • File formats
  • Compression
  • Sampling
  • Tez
  • Vectorization
  • Parallel execution
  • CBO

Partitions:

Kafka is a durable, publish-subscribe messaging system for exchanging data between processes, applications, and servers.

Key components/terminologies in Kafka architecture:

Producers, Topic, Consumers, Broker, Consumer Group, Partitions, Offset, ZooKeeper, Replications, Leader, Kafka APIs

Kafka basic flow:

Kafka producers write to topics, while Kafka consumers read from topics. Topics represent commit…
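
To make the producer/topic/consumer flow concrete, here is a toy in-memory model. It is not the Kafka client API: the ToyTopic class and its methods are invented for illustration, it models a single partition only, and it assumes each consumer group tracks its own read offset.

```python
from collections import defaultdict


class ToyTopic:
    """In-memory stand-in for a single-partition topic: an append-only list."""

    def __init__(self, name):
        self.name = name
        self._log = []                      # records in arrival order
        self._next_offset = defaultdict(int)  # consumer group -> next offset to read

    def produce(self, record):
        """Append a record and return the offset it was stored at."""
        self._log.append(record)
        return len(self._log) - 1

    def consume(self, group, max_records=10):
        """Return unread records for a consumer group and advance its offset."""
        start = self._next_offset[group]
        batch = self._log[start:start + max_records]
        self._next_offset[group] = start + len(batch)
        return batch


topic = ToyTopic("events")
topic.produce("a")
topic.produce("b")
first = topic.consume("group-1")   # group-1 reads both records
topic.produce("c")
second = topic.consume("group-1")  # group-1 sees only the new record
other = topic.consume("group-2")   # a different group starts from offset 0
```

Records are never removed when they are read, so a second consumer group can read the same data independently of the first.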

Let’s have a look at a few Hive table configuration properties.

Mutable and Immutable

Hive provides an option to create mutable and immutable tables.

Mutable Table

All tables are mutable by default. A mutable table allows data to be appended when data is already present in the table.

Immutable Table

A table can be created as…

A few key points to remember while building Spark applications to optimise performance:

  1. Spark UI (Monitor and Inspect Jobs).
  2. Level of Parallelism (Clusters will not be fully utilised unless the level of parallelism for each operation is high enough. Spark automatically sets the number of partitions of an input file…

Sai Prabhanj Turaga
