Let's try to understand the difference between Spark cluster and client deploy modes. It is a confusing concept for many, so we will try to understand it and get it clarified. But first, some basics: Spark has a notion of Worker (or Slave) nodes, which are used for computation.
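At submission time, the choice between the two modes comes down to the `--deploy-mode` flag of `spark-submit`: in `client` mode the driver runs inside the `spark-submit` process on the machine you submit from, while in `cluster` mode the cluster manager launches the driver on a node inside the cluster. The sketch below only builds the command line (it does not run it); the JAR name `my-app.jar`, the class `com.example.MyApp` and the `yarn` master are hypothetical placeholders.

```python
import shlex
from typing import List


def spark_submit_command(app_jar: str, main_class: str, deploy_mode: str,
                         master: str = "yarn") -> List[str]:
    """Build (but do not execute) a spark-submit command line.

    deploy_mode="client":  driver runs in the local spark-submit process.
    deploy_mode="cluster": cluster manager starts the driver on a cluster node.
    """
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"unsupported deploy mode: {deploy_mode!r}")
    return [
        "spark-submit",
        "--master", master,
        "--deploy-mode", deploy_mode,
        "--class", main_class,
        app_jar,  # hypothetical application JAR
    ]


if __name__ == "__main__":
    for mode in ("client", "cluster"):
        # Placeholder JAR and class names, for illustration only.
        print(shlex.join(spark_submit_command("my-app.jar", "com.example.MyApp", mode)))
```

If `--deploy-mode` is omitted, `spark-submit` defaults to client mode, which is why closing the submitting terminal can kill a client-mode job but not a cluster-mode one.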
This post explains Spark code internals: the different parts of a Spark application's code, classes and JARs, and how that code runs in a cluster. A Spark application has 3 main components, as mentioned below. Please note that each of these components is a separate JVM, and they contain different classes in their own classpath(s):
Here we will try to understand the difference between the frameworks Django, Flask and FastAPI. These are popular frameworks used by the Python community for building web apps, REST APIs, websites, etc.
This is a Docker commands list with examples and a cheatsheet. We try to list some of the important and most often used commands.
Resolve dependency problems in Spark. While building any Spark application, this is one of the main concerns any engineer should have. This post offers some guidance on how you could do that. When building and deploying Spark applications, all dependencies require compatible versions.
This post explains the terminologies and technologies Hadoop, HDFS, MapReduce, Spark, Spark SQL and Spark Streaming, and the differences between them. For a newbie who has just started to learn Big Data, these terms sound quite confusing, so let's explore each of them and see where they all fit in.
This post explains how to improve Spark application performance. Performance tuning is a very important aspect of Spark programming. In this post, I will try to explain some Spark tuning techniques. Below are some of the steps and techniques that I follow to improve Spark application performance:
This post explains how to Purge a Kafka Topic.
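One common way to purge the data in a topic without deleting the topic itself is Kafka's `kafka-delete-records.sh` tool, which reads a JSON file listing partitions and the offset before which records should be deleted; an offset of `-1` is generally used to mean "up to the current high watermark". The sketch below only generates that JSON; the topic name `my-topic` and the partition count are hypothetical, and you should confirm the file format against the tool shipped with your Kafka version.

```python
import json


def delete_records_spec(topic: str, partitions: int, offset: int = -1) -> str:
    """Build the offsets JSON consumed by kafka-delete-records.sh.

    offset=-1 asks Kafka to delete every record up to each partition's
    high watermark, i.e. purge current data while keeping the topic.
    """
    if partitions < 1:
        raise ValueError("a topic has at least one partition")
    spec = {
        "partitions": [
            {"topic": topic, "partition": p, "offset": offset}
            for p in range(partitions)
        ],
        "version": 1,
    }
    return json.dumps(spec, indent=2)


if __name__ == "__main__":
    # Hypothetical topic with 3 partitions.
    print(delete_records_spec("my-topic", 3))
```

Saved as `delete.json`, the output would be passed to the tool with something like `kafka-delete-records.sh --bootstrap-server localhost:9092 --offset-json-file delete.json` (broker address assumed).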
Many times, trying to send large messages over Kafka fails with the exception “MessageSizeTooLargeException”. This mostly occurs on the producer side.
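The size limit is enforced in more than one place, so raising it usually means changing several settings together: the producer's `max.request.size`, the broker's `message.max.bytes` (or the per-topic `max.message.bytes`), and often the consumer's `max.partition.fetch.bytes`. The keys below are real Kafka configuration names; the helper itself and the 5 MB value are hypothetical, a sketch for keeping the values consistent rather than a prescribed setup.

```python
from typing import Dict


def large_message_configs(max_bytes: int) -> Dict[str, Dict[str, int]]:
    """Return matching size limits for each component (hypothetical helper)."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    return {
        # Producer refuses requests larger than this (default is 1 MB).
        "producer": {"max.request.size": max_bytes},
        # Broker-wide cap on the size of a record batch it accepts.
        "broker": {"message.max.bytes": max_bytes},
        # Per-topic override of the broker-wide cap.
        "topic": {"max.message.bytes": max_bytes},
        # Commonly raised as well so consumers fetch large records efficiently.
        "consumer": {"max.partition.fetch.bytes": max_bytes},
    }


if __name__ == "__main__":
    for component, settings in large_message_configs(5 * 1024 * 1024).items():
        print(component, settings)
```

Keeping all limits derived from one value avoids the situation where the producer is allowed to send a message that the broker then rejects.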
How does Spark handle a dataset that is bigger than the available memory? Spark can process a dataset even if its size is larger than the available RAM. To put it simply, if a dataset doesn't fit into memory, Spark spills it to disk. Although Spark processing is preferably in-memory, Spark's capability is not restricted to memory only. Likewise, if a dataset is cached and doesn't fit in memory, Spark can either
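The spill-to-disk idea can be illustrated in miniature with an external merge sort: buffer as many records as memory allows, sort and write that buffer to a temporary file when it fills, then stream the sorted files back and merge them. This is a conceptual analogy written in plain Python, not Spark's own code; the function name and the `max_in_memory` limit are hypothetical.

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator, List


def external_sort(values: Iterable[int], max_in_memory: int) -> Iterator[int]:
    """Sort integers while holding at most max_in_memory of them in a buffer.

    Full buffers are sorted and spilled to temporary files ("runs"); the runs
    are then read back lazily and merged, so the full dataset is never in RAM.
    """
    if max_in_memory <= 0:
        raise ValueError("max_in_memory must be positive")
    run_paths: List[str] = []
    buffer: List[int] = []

    def spill() -> None:
        # Sort the in-memory buffer and write it to disk as one sorted run.
        buffer.sort()
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "w") as f:
            f.writelines(f"{v}\n" for v in buffer)
        run_paths.append(path)
        buffer.clear()

    for v in values:
        buffer.append(v)
        if len(buffer) >= max_in_memory:
            spill()
    if buffer:
        spill()

    def read_run(path: str) -> Iterator[int]:
        # Stream one run back from disk, one value at a time.
        with open(path) as f:
            for line in f:
                yield int(line)

    try:
        # heapq.merge keeps only one "current" value per run in memory.
        yield from heapq.merge(*(read_run(p) for p in run_paths))
    finally:
        for p in run_paths:
            os.remove(p)


if __name__ == "__main__":
    print(list(external_sort([5, 3, 9, 1, 7, 2, 8], max_in_memory=3)))
```

Spark's shuffle and aggregation code applies the same principle at scale, spilling sorted data to local disk and merging it later, which is why a job can finish (more slowly) even when the data exceeds executor memory.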
In this post, we will see how to fix data skew in Spark using the salting method. The data skew problem is basically an uneven, or non-uniform, distribution of data. In real-life production scenarios, we often have to handle data that is far from ideal, so it is imperative that we are equipped to handle such scenarios. We will try to understand data skew from the perspective of a two-table join.
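The salting idea for a two-table join can be sketched without a cluster: each row of the large (skewed) table gets a random salt in `[0, num_salts)`, each row of the small table is replicated once per salt value, and the join is done on the composite key `(key, salt)`. A hot key is thereby split across `num_salts` buckets instead of landing in a single partition, while every big-table row still finds its match. This plain-Python version is a hypothetical illustration; the function name, row layout and `num_salts` value are assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, str]  # (join_key, payload)


def salted_join(big: List[Row], small: List[Row], num_salts: int,
                seed: int = 0) -> List[Tuple[str, str, str]]:
    """Inner-join big and small on key, spreading each key over num_salts buckets."""
    if num_salts < 1:
        raise ValueError("num_salts must be at least 1")
    rng = random.Random(seed)

    # Small side: replicate ("explode") every row across all salt values.
    small_index: Dict[Tuple[str, int], List[str]] = defaultdict(list)
    for key, payload in small:
        for salt in range(num_salts):
            small_index[(key, salt)].append(payload)

    # Big side: attach one random salt per row, then join on (key, salt).
    result: List[Tuple[str, str, str]] = []
    for key, payload in big:
        salt = rng.randrange(num_salts)
        for small_payload in small_index.get((key, salt), []):
            result.append((key, payload, small_payload))
    return result


if __name__ == "__main__":
    big = [("hot", f"b{i}") for i in range(10)] + [("cold", "c0")]
    small = [("hot", "H"), ("cold", "C")]
    print(salted_join(big, small, num_salts=4))
```

The trade-off is that the small table grows by a factor of `num_salts`, so the salt count is usually kept just large enough to break up the hot keys. In Spark, the same pattern is typically expressed by adding a random salt column to the large DataFrame and an exploded salt column to the small one before joining.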
In this post, we will explore how to fix the Docker error "Got Permission Denied While Trying To Connect To The Docker Daemon Socket". Sometimes when we run a docker command, we might face the error below:
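On Linux, this error usually means the current user cannot read and write the daemon's Unix socket, which by default is `/var/run/docker.sock` and is owned by the `docker` group. The sketch below is a hypothetical diagnostic helper (Unix only, because it uses the `grp` module) that checks the common causes in order; it does not change anything on the system.

```python
import grp
import os

DOCKER_SOCKET = "/var/run/docker.sock"  # default Docker socket path on Linux


def docker_socket_diagnosis(socket_path: str = DOCKER_SOCKET) -> str:
    """Return a short, human-readable guess at why socket access fails."""
    if not os.path.exists(socket_path):
        return f"{socket_path} not found: is the Docker daemon running?"
    if os.access(socket_path, os.R_OK | os.W_OK):
        return "current user can read and write the Docker socket"
    try:
        docker_gid = grp.getgrnam("docker").gr_gid
    except KeyError:
        return "no 'docker' group exists on this system"
    # Group membership may come from supplementary groups or the effective gid.
    if docker_gid not in os.getgroups() and docker_gid != os.getegid():
        return ("current user is not in the 'docker' group "
                "(try: sudo usermod -aG docker $USER, then log out and back in)")
    return "user is in the 'docker' group but still lacks access; check the socket's owner and mode"


if __name__ == "__main__":
    print(docker_socket_diagnosis())
```

Adding the user to the `docker` group only takes effect in a new login session (or after `newgrp docker`), which is why the error often persists immediately after running `usermod`.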
In this post, we will see how to fix the Kubernetes error "Error: You must be logged in to the server (Unauthorized)".
In this post, we will see how to fix the "HTTP 500 Server Error" that appears when setting DEBUG = False in Django, and how the error might look in the terminal or application. You might also see this error when you upgrade your Django version and start working with the newer version.
In this post, we will see how to fix the Kafka Docker error "Brokers Does Not Meet The Required Replication Factor". Kafka can raise this error if the replication factor is not properly configured or is inconsistent with the cluster. The issue occurs especially in Docker deployments of Kafka. See an example of the error below:
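The error appears when fewer brokers are alive than the replication factor required for a topic, most often the internal `__consumer_offsets` topic, whose `offsets.topic.replication.factor` defaults to 3. A single-broker Docker setup can never satisfy that, so the internal-topic replication factors have to be lowered to the broker count. The sketch below assumes the Confluent `cp-kafka` image convention of passing broker properties as `KAFKA_`-prefixed, upper-cased environment variables; other images (for example Bitnami) use a different prefix, and the helper itself is hypothetical.

```python
from typing import Dict


def internal_topic_env(broker_count: int, desired_rf: int = 3) -> Dict[str, str]:
    """Cap internal-topic replication settings at the number of brokers."""
    if broker_count < 1:
        raise ValueError("need at least one broker")
    rf = min(desired_rf, broker_count)
    return {
        # offsets.topic.replication.factor (default 3)
        "KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR": str(rf),
        # transaction.state.log.replication.factor (default 3)
        "KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR": str(rf),
        # transaction.state.log.min.isr (default 2) must not exceed the RF
        "KAFKA_TRANSACTION_STATE_LOG_MIN_ISR": str(min(2, rf)),
    }


if __name__ == "__main__":
    # Single-broker development setup: everything drops to 1.
    for name, value in internal_topic_env(broker_count=1).items():
        print(f"{name}={value}")
```

These values would go into the broker service's `environment` section of a Docker Compose file; with three or more brokers the defaults can be left as they are.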
In this post, we will see how to fix the Kafka error "org.apache.kafka.connect.errors.DataException".