When running Spark Streaming applications with dynamic allocation of executors, you may notice that the application still holds some resources even when no data is arriving from the source and no jobs are in progress.
Scaling the executors down to zero when no data is being processed releases those idle resources, which helps improve the performance of applications.
Follow the steps below to achieve this:
Step 1: Set spark.dynamicAllocation.enabled to false
Step 2: Set spark.streaming.dynamicAllocation.enabled to true
Step 3: Set the minimum number of executors to 0: spark.streaming.dynamicAllocation.minExecutors=0
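As a rough illustration, the three settings from the steps can be collected in one place and rendered as spark-submit arguments. This is only a sketch: the property names use the standard lowercase spelling (spark.streaming.dynamicAllocation.*), while the helper name to_spark_submit_args and the application file app.py are hypothetical. Some Spark versions may validate the minimum-executors value, so check that your version accepts 0.

```python
# Settings taken from the three steps above.
STREAMING_SCALE_DOWN_CONF = {
    "spark.dynamicAllocation.enabled": "false",
    "spark.streaming.dynamicAllocation.enabled": "true",
    # Assumption: your Spark version accepts 0 here.
    "spark.streaming.dynamicAllocation.minExecutors": "0",
}


def to_spark_submit_args(conf):
    """Render a settings dict as a flat list of spark-submit --conf arguments."""
    args = []
    for key, value in conf.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    # "app.py" is a placeholder for your own streaming application.
    command = ["spark-submit"] + to_spark_submit_args(STREAMING_SCALE_DOWN_CONF) + ["app.py"]
    print(" ".join(command))
```

The same key/value pairs could equally be placed in spark-defaults.conf or set on the SparkConf object before the streaming context is created.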
A skewed table can improve the performance of tables in which one or more columns have skewed values. When you specify the frequently occurring values (severe skew), Hive records these skewed column names and values in the metadata, which can then be used to optimize joins. …
Recently, most companies have introduced a coding round in which the panel gives you a few problem statements and you need to write code for them.
Below are a few pointers to keep in mind during coding rounds that may help you clear them
Performance plays a key role in big data projects because they deal with huge amounts of data. If you keep a few things in mind when using Hive, you can see a dramatic change in performance
Kafka is a durable, publish-subscribe messaging system for exchanging data between processes, applications, and servers
Key components/terminologies in Kafka Architecture:
Producers, Topics, Consumers, Brokers, Consumer Groups, Partitions, Offsets, ZooKeeper, Replication, Leaders, Kafka APIs
Kafka basic flow:
Kafka producers write to topics, while Kafka consumers read from topics. Topics represent commit…
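The producer/consumer flow can be pictured with a toy in-memory model. This is a sketch only and does not use any Kafka client library: ToyTopic and ToyConsumer are hypothetical names, and it assumes a topic behaves like a single append-only log in which each consumer tracks its own read offset.

```python
class ToyTopic:
    """In-memory stand-in for a single-partition topic: an append-only log."""

    def __init__(self, name):
        self.name = name
        self.log = []  # records are only ever appended, never modified

    def append(self, record):
        """Producer side: add a record and return its offset."""
        self.log.append(record)
        return len(self.log) - 1


class ToyConsumer:
    """Reads from a topic and remembers the next offset to read."""

    def __init__(self, topic):
        self.topic = topic
        self.offset = 0

    def poll(self):
        """Return all records written since the last poll and advance the offset."""
        records = self.topic.log[self.offset:]
        self.offset = len(self.topic.log)
        return records


topic = ToyTopic("events")
topic.append("a")
topic.append("b")

consumer = ToyConsumer(topic)
first = consumer.poll()   # records written before the first poll
topic.append("c")
second = consumer.poll()  # only the record written since the last poll
```

Because the offset lives with the consumer rather than with the topic, a second ToyConsumer on the same topic would start from offset 0 and see every record again.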
Let’s have a look at a few Hive table configuration properties
Hive provides an option to create mutable and immutable tables.
All tables are mutable by default. A mutable table allows data to be appended when data is already present in the table.
A Table can be created as…
A few key points to remember while building Spark applications to optimise performance