DevOps | Cloud | Analytics | Open Source | Programming





How To Save DataFrame as Different Formats in PySpark (JSON, Parquet, ORC, Avro, CSV)?

This post provides sample code for saving a DataFrame in different formats in PySpark (JSON, Parquet, ORC, Avro, CSV). We will consider the following file formats:

  • JSON
  • Parquet
  • ORC
  • Avro
  • CSV
  • HDFS File
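
As a rough illustration of the formats listed above, here is a minimal sketch built on the standard `DataFrameWriter` chain (`df.write.mode(...).format(...).save(...)`). The helper name `save_dataframe` and the paths in the usage comments are hypothetical and are not taken from the post. The sketch also assumes that Avro support comes from the external `spark-avro` package (needed on Spark 2.4 and later), while the other formats ship with Spark.

```python
# Formats covered by the post. "avro" is assumed to be provided by the
# external spark-avro package; the others are built into Spark.
SUPPORTED_FORMATS = ("json", "parquet", "orc", "avro", "csv")


def save_dataframe(df, fmt, path, mode="overwrite"):
    """Write a Spark DataFrame to `path` in the given format (hypothetical helper)."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    writer = df.write.mode(mode)
    if fmt == "csv":
        # CSV carries no schema, so write the column names as a header row.
        writer = writer.option("header", "true")
    writer.format(fmt).save(path)


# Usage with a real session (requires pyspark; paths are placeholders):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.appName("save-formats").getOrCreate()
#   df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
#   save_dataframe(df, "parquet", "/tmp/out_parquet")
#   save_dataframe(df, "json", "hdfs:///tmp/out_json")  # an HDFS path works the same way
```

Writing to HDFS needs no separate API in this sketch: the target is selected by the path's URI scheme.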

Spark Context vs Spark Session Differences

In this post, we will look at the differences between Spark Context and Spark Session. We will also explore why it is advisable to use the Spark Session instead of the Spark Context.

Difference Between SparkSession, SparkContext, SQLContext & HiveContext

This post explains the difference between SparkSession, SparkContext, SQLContext, and HiveContext. The difference between Spark Session, Spark Context, and SQL Context lies in the Spark version used in the application. For Spark versions above 2.0, the post gives a pictorial representation of the hierarchy between SparkSession, SparkContext, SQLContext, and HiveContext.

How To Code SparkSQL in PySpark - Examples Part 1

In Part 1 of this post, I will write some SparkSQL sample code in PySpark. These are ready-to-refer code references that come up often when writing any SparkSQL application. Hope you find them useful. Below are some basic points about SparkSQL:

Top 5+ Facts about Yandex (Sometimes Called Google of Russia)

Many people haven’t heard of Yandex. It is one of the biggest corporations based in Russia and Google's biggest rival there. Some people even call it the Google of Russia. Below are some interesting facts about Yandex:

Top 5+ FREE Screen Recording Software (for YouTube Videos or Presentations)

Screen recording is an essential part of creating videos or presentations, whether you are making YouTube videos or a video presentation. Below is a comprehensive list of free screen recorder software for Windows.

Understand The Cloudera Cluster Internals

This post helps you understand the Cloudera cluster internals. Since it is a pre-packaged Big Data platform, it is interesting to know what runs where compared to a vanilla manual installation.

How to Remove Old and Unused Docker Images?

In this post, we will see how to clean up and remove Docker images. Prolonged use of Docker leaves a lot of images on the system. It is advisable to make a housekeeping practice of cleaning up and removing unused Docker images. Use the options below to clean up and remove unnecessary Docker images from the system.

How To Monitor Important Performance Metrics in Kafka?

In this post, we will learn which metrics are the most important to monitor in Kafka and how to monitor them. Monitoring is a crucial part of running Kafka. Because Kafka is big and complex in architecture, finding the root cause when something goes down is a head-scratching task for developers. Having a handy list of metrics to monitor from the start helps in this regard.

How To Disable Any Apps or Services in Windows 10

More often than not, many unnecessary services or apps are running in Windows. These consume RAM and CPU and slow down your system. It is really handy to know how to permanently disable these apps or services.

How To Use Python List as Stack vs Queue vs Comprehension vs Nested Comprehension?

In this post, we will look at the differences between using a Python list as a stack, as a queue, in a list comprehension, and in a nested comprehension, and at how to use a list in each of these ways. Both the video content and the notes are given below.
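
The four usages named above can be sketched briefly as follows. This is an illustrative example with made-up values, not the code from the post. It assumes that `collections.deque` is used for the queue, since removing from the front of a plain list with `pop(0)` works but costs O(n).

```python
from collections import deque

# Stack (LIFO): append() and pop() both operate on the right end of the list.
stack = [1, 2, 3]
stack.append(4)
top = stack.pop()  # removes the most recently added item

# Queue (FIFO): a plain list can do this with pop(0), but that shifts every
# remaining element, so deque is the usual choice for popping from the left.
queue = deque(["a", "b", "c"])
queue.append("d")
first = queue.popleft()  # removes the oldest item

# List comprehension: build a new list from an iterable in one expression.
squares = [n * n for n in range(5)]

# Nested comprehension: the inner comprehension runs once per outer item.
# Here it transposes a 3x2 matrix into a 2x3 matrix.
matrix = [[1, 2], [3, 4], [5, 6]]
transposed = [[row[i] for row in matrix] for i in range(2)]
```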

Top 5+ Online Distance MBA in India

Today I am sharing some information about online distance MBA programs in India. The MBA is one of the most in-demand postgraduate programs in India. Whether you are a working professional, a fresher with a job, a housewife who wants to fulfill her dream of higher studies, or anyone who wants to be prepared to cope with the digital world, a distance MBA can be the right program for you. I have picked the best universities offering a distance MBA in India.

Best Laptop Hardware Configuration for Coding (DL, ML, Big Data)?

Suppose you are planning to do some CPU- and GPU-intensive tasks, such as practicing deep learning or machine learning models with heavy datasets, or creating a multi-node cluster for Big Data. In either case, it is recommended to have a laptop with the specifications below:

Spark-Submit Command Line Arguments

In this post, I will explain the spark-submit command line arguments (options). We will touch upon the important arguments used in the spark-submit command. At the end, I will collate these arguments and show a complete spark-submit command that uses all of them. This post is written for Spark 2.x.
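
To give a feel for how such a command is put together, here is a minimal sketch that assembles a spark-submit invocation as an argument list. The helper `build_spark_submit`, the script name `my_job.py`, and all option values are hypothetical. The flags shown (`--master`, `--deploy-mode`, `--num-executors`, `--executor-memory`, `--executor-cores`, `--driver-memory`, `--conf`) are common spark-submit options and are not necessarily the exact set the post covers.

```python
import shlex


def build_spark_submit(app_path, master, deploy_mode,
                       options=None, conf=None, app_args=()):
    """Assemble a spark-submit command as a list of arguments (hypothetical helper)."""
    cmd = ["spark-submit", "--master", master, "--deploy-mode", deploy_mode]
    # Resource options such as --num-executors or --executor-memory.
    for flag, value in (options or {}).items():
        cmd += [f"--{flag}", str(value)]
    # Arbitrary Spark properties are passed as repeated --conf key=value pairs.
    for key, value in (conf or {}).items():
        cmd += ["--conf", f"{key}={value}"]
    # The application file comes after all spark-submit options,
    # followed by the arguments for the application itself.
    cmd.append(app_path)
    cmd += list(app_args)
    return cmd


cmd = build_spark_submit(
    "my_job.py",
    master="yarn",
    deploy_mode="cluster",
    options={
        "num-executors": 4,
        "executor-memory": "2g",
        "executor-cores": 2,
        "driver-memory": "1g",
    },
    conf={"spark.sql.shuffle.partitions": 200},
    app_args=["2020-01-01"],
)
print(shlex.join(cmd))  # shell-quoted form of the command (Python 3.8+)
```

Keeping the command as a list means it can be handed to `subprocess.run(cmd)` directly, without shell quoting concerns.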

Understand Spark Execution Modes - Local, Client & Cluster Modes

It is important to understand the Spark execution modes: local, client, and cluster. This is a simple yet crucial aspect of a Big Data system. In this post, I will try to explain these modes in simple terms.