DevOps | Cloud | Analytics | Open Source | Programming





Difference Between Spark Cluster & Client Deployment Modes

Let's try to understand the difference between Spark's cluster and client deploy modes. It is a confusing concept for many, so we will try to clarify it. But first, some basics: Spark has a notion of worker (or slave) nodes, which are used for computation.

Different Parts of a Spark Application Code, Class & Jars

This post explains Spark code internals: the different parts of a Spark application (code, classes, and jars) and how that code runs in a cluster. A Spark application has 3 main components, as mentioned below. Please note that each of these components is a separate JVM, and they contain different classes in their own classpath(s):

Django vs Flask vs FastAPI Framework Differences

Here we will try to understand the differences between the frameworks Django, Flask, and FastAPI. These are popular frameworks used by the Python community for building web apps, REST APIs, websites, etc.

Docker Commands List With Examples and Cheatsheet

This is a list of Docker commands with examples and a cheatsheet. We try to list some of the important and frequently used commands.

Best Practices for Dependency Problem in Spark

How to resolve dependency problems in Spark: this is one of the main concerns any engineer should have while building a Spark application. This post offers some guidance on how to do that. When building and deploying Spark applications, all dependencies must use compatible versions.

Difference between Hadoop, HDFS, MapReduce, Spark, Spark Streaming & Spark SQL

This post explains the terminologies and technologies Hadoop, HDFS, MapReduce, Spark, Spark SQL, and Spark Streaming, and the differences between them. For a newbie who has started to learn Big Data, the terminology sounds quite confusing. So let's explore each of them and see where they all fit in.

How to Improve Spark Application Performance – Part 1?

This post explains how to improve Spark application performance. Performance tuning is a very important aspect of Spark programming. In this post, I will try to explain some Spark tuning techniques. Below are some of the steps and techniques that I follow to improve Spark application performance:

How to Purge a Running Kafka Topic?

This post explains how to purge a Kafka topic.

How to Send Large Messages in Kafka?

Many times, trying to send large messages over Kafka fails with the exception “MessageSizeTooLargeException”. This mostly occurs on the producer side.

How Spark Handles Dataset Bigger than Available Memory?

How does Spark handle a dataset that is bigger than the available memory? Spark can process a dataset even if its size is larger than the available RAM. To put it simply, if a dataset doesn’t fit into memory, Spark spills it to disk. Although Spark prefers in-memory processing, its capability is not restricted to memory alone. Likewise, if a dataset is cached and doesn’t fit in memory, Spark can either

How To Fix - Data Skewness in Spark (Salting Method)

In this post, we will see how to fix data skewness in Spark using the salting method. The data skew problem is basically an uneven or non-uniform distribution of data. In real-life production scenarios, we often have to handle data that is far from ideal, so it is imperative that we are equipped to handle such cases. We will try to understand data skew from the perspective of a two-table join.
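As a rough illustration of the salting idea in a two-table join, here is a minimal sketch in plain Python rather than Spark. The table names (`orders`, `customers`), the `NUM_SALTS` value, and all helper functions are hypothetical and chosen only for this example: the skewed side gets a random salt appended to each key, and the small side is replicated once per salt value so the join still matches.

```python
import random
from collections import Counter, defaultdict

NUM_SALTS = 4  # hypothetical number of salt buckets


def salt_skewed_side(rows, num_salts, rng):
    """Attach a random salt to every key of the large, skewed table."""
    return [((key, rng.randrange(num_salts)), value) for key, value in rows]


def replicate_small_side(rows, num_salts):
    """Copy every row of the small table once per possible salt value."""
    return [((key, salt), value) for key, value in rows for salt in range(num_salts)]


def hash_join(left, right):
    """Inner join two lists of (key, value) pairs on key."""
    lookup = defaultdict(list)
    for key, value in right:
        lookup[key].append(value)
    return [(key, lv, rv) for key, lv in left for rv in lookup.get(key, [])]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical data: one "hot" key dominates the large table.
orders = [("hot", i) for i in range(1000)] + [("cold", 1000)]
customers = [("hot", "H"), ("cold", "C")]

salted_orders = salt_skewed_side(orders, NUM_SALTS, rng)
salted_customers = replicate_small_side(customers, NUM_SALTS)

plain_result = hash_join(orders, customers)
# Drop the salt again after joining so both results have the same shape.
salted_result = [
    (key, lv, rv)
    for (key, _salt), lv, rv in hash_join(salted_orders, salted_customers)
]

# Rows per join key before and after salting.
rows_per_key = Counter(key for key, _ in orders)
rows_per_salted_key = Counter(key for key, _ in salted_orders)
print(max(rows_per_key.values()), max(rows_per_salted_key.values()))
```

The salted join returns the same rows as the plain join, but the hot key's rows are spread over several salted keys, which is what would let a distributed engine place them in different partitions. The cost is that the small side grows by a factor of `NUM_SALTS`.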

How to Fix Docker Error - "Got Permission Denied While Trying To Connect To The Docker Daemon Socket"?

In this post, we will explore how to fix the Docker error "Got Permission Denied While Trying To Connect To The Docker Daemon Socket". Sometimes when we run a docker command, we might face the error below:

How To Fix - Error: You must be logged in to the server (Unauthorized) in Kubernetes?

In this post, we will see how to fix the Kubernetes error "Error: You must be logged in to the server (Unauthorized)".

How To Fix - "HTTP 500 Server Error" When Setting DEBUG = False in Django on Azure/AWS?

In this post, we will see how to fix the "HTTP 500 Server Error" that appears when setting DEBUG = False in Django, and what the error might look like in the terminal or application. You might also run into this error when you upgrade Django and start working with the newer version.

How To Fix Kafka Docker Error - "Brokers Does Not Meet The Required Replication Factor"

In this post, we will see how to fix the Kafka Docker error "Brokers Does Not Meet The Required Replication Factor". The replication factor in Kafka can cause errors if it is not properly configured or if there are any discrepancies. This issue occurs especially in Docker deployments of Kafka. See an example of the error below:

How To Fix Kafka Error - "org.apache.kafka.connect.errors.DataException"

In this post, we will see how to fix the Kafka error "org.apache.kafka.connect.errors.DataException".