This post explains, with sample code, how to save a DataFrame in different file formats in PySpark. We will consider the below formats - JSON, Parquet, ORC, Avro, and CSV.
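A minimal sketch of how this can look, assuming spark is an existing SparkSession and the output paths are placeholders; note that Avro is not bundled with the Spark 2.x core, so writing Avro needs the external spark-avro package on the classpath:

```python
# Minimal sketch: write the same DataFrame in several formats.
# Paths are placeholders; "spark" is assumed to be an existing SparkSession.
df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])

df.write.mode("overwrite").json("/tmp/out/json")
df.write.mode("overwrite").parquet("/tmp/out/parquet")
df.write.mode("overwrite").orc("/tmp/out/orc")
df.write.mode("overwrite").option("header", True).csv("/tmp/out/csv")

# Avro needs the external spark-avro package (e.g. via --packages on spark-submit)
df.write.mode("overwrite").format("avro").save("/tmp/out/avro")
```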
In this post, we will try to understand the differences between SparkContext and SparkSession. We will also explore why it is advisable to use SparkSession instead of SparkContext.
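As a quick illustration (a minimal sketch, assuming a plain PySpark 2.x+ install), SparkSession is the single unified entry point, and the underlying SparkContext is still reachable from it, so there is no need to create one separately:

```python
from pyspark.sql import SparkSession

# SparkSession is the unified entry point from Spark 2.0 onwards.
spark = (SparkSession.builder
         .appName("session-vs-context")
         .getOrCreate())

# The SparkContext still exists underneath and is reachable from the session.
sc = spark.sparkContext
print(sc.appName, sc.version)

spark.stop()
```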
This post explains the difference between SparkSession, SparkContext, SQLContext and HiveContext. Which of these entry points you use depends largely on the Spark version used in the application. For Spark versions above 2.0, the post gives a pictorial representation of the hierarchy between SparkSession, SparkContext, SQLContext and HiveContext.
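A minimal sketch of that hierarchy in code, assuming Spark 2.x+ with a build that includes Hive support: the SparkSession wraps the SparkContext and takes over the roles of the old SQLContext and HiveContext:

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("context-hierarchy")
         .enableHiveSupport()              # replaces the old HiveContext
         .getOrCreate())

print(type(spark.sparkContext))            # the SparkContext wrapped by the session
spark.sql("SELECT 1 AS one").show()        # replaces SQLContext.sql(...)

spark.stop()
```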
In this Part 1 of the post, I will share some SparkSQL sample code examples in PySpark. These are ready-to-refer code references used quite often when writing any SparkSQL application. Hope you find them useful. Some basic points about SparkSQL are also covered below.
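As a quick starting point, a minimal SparkSQL sketch in PySpark (the data and view name are placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sparksql-basics").getOrCreate()

# Register a DataFrame as a temporary view and query it with plain SQL.
df = spark.createDataFrame([("alice", 30), ("bob", 25)], ["name", "age"])
df.createOrReplaceTempView("people")

spark.sql("SELECT name FROM people WHERE age > 26").show()

spark.stop()
```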
Many people haven't heard about Yandex. It is one of the biggest corporations based out of Russia and Google's biggest rival there; some people even treat it as the Google of Russia. Below are some of the interesting facts about Yandex –
Screen recording is one of the most necessary aspects of creating videos or presentations, be it for making YouTube videos or video presentations. Below is a comprehensive list of the different screen recorder software for Windows available for free.
This post helps you understand the Cloudera cluster internals. Since it is a pre-packaged Big Data platform, it is interesting to know and understand what runs where compared to a vanilla manual installation.
In this post, we will see how to clean up and remove Docker images. With prolonged usage, Docker leaves a lot of images in the system, so it is advisable to keep a housekeeping practice of cleaning up and removing unused images. Use the below options to clean up and remove unnecessary Docker images in the system.
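The post focuses on the Docker CLI options, but the same housekeeping can also be scripted. A minimal sketch using the Docker SDK for Python (pip install docker), assuming a local Docker daemon is reachable with default settings; the image tag shown is a placeholder:

```python
import docker

client = docker.from_env()

# Remove dangling (untagged) images - roughly what `docker image prune` does.
result = client.images.prune(filters={"dangling": True})
print("Space reclaimed (bytes):", result.get("SpaceReclaimed", 0))

# Remove one specific unused image by tag (placeholder tag shown here).
# client.images.remove("myapp:old-tag")
```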
In this post, we will learn what the most important metrics to monitor in Kafka are and how to monitor them. Kafka monitoring is a crucial part of running the system. Since Kafka has a big and complex architecture, when something goes down it is a head-scratching task for developers to find the root cause. Having a handy list of metrics to monitor from the outset helps in this regard.
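One metric from that list, consumer lag, can be checked programmatically. A minimal sketch using the kafka-python client, assuming a local broker; the topic and group names are placeholders:

```python
from kafka import KafkaConsumer, TopicPartition

TOPIC = "my-topic"           # placeholder topic name
GROUP = "my-consumer-group"  # placeholder consumer group

consumer = KafkaConsumer(bootstrap_servers="localhost:9092",
                         group_id=GROUP,
                         enable_auto_commit=False)

partitions = [TopicPartition(TOPIC, p)
              for p in (consumer.partitions_for_topic(TOPIC) or [])]
end_offsets = consumer.end_offsets(partitions)

for tp in partitions:
    committed = consumer.committed(tp) or 0   # last committed offset for the group
    lag = end_offsets[tp] - committed         # messages not yet consumed
    print(f"partition {tp.partition}: lag = {lag}")

consumer.close()
```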
More often than not, there are many unnecessary services or apps running in Windows. These consume RAM and CPU and slow down your system. It is really handy to know how to permanently disable these apps or services.
In this post, we will try to understand how to use a Python list as a stack vs a queue, and how list comprehensions and nested comprehensions work. Both the video content and the notes are given below.
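For quick reference, a minimal sketch of the four usages discussed (collections.deque is shown for the queue case because pop(0) on a plain list is O(n)):

```python
from collections import deque

# List as a stack: push and pop happen at the same end (LIFO), both O(1).
stack = []
stack.append(1)
stack.append(2)
top = stack.pop()                 # -> 2

# List as a queue: pop(0) on a plain list is O(n); deque gives O(1) popleft.
queue = deque()
queue.append(1)
queue.append(2)
first = queue.popleft()           # -> 1

# List comprehension: build a new list in one expression.
squares = [x * x for x in range(5)]            # [0, 1, 4, 9, 16]

# Nested comprehension: flatten a list of lists.
matrix = [[1, 2], [3, 4]]
flat = [n for row in matrix for n in row]      # [1, 2, 3, 4]
```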
Today I am sharing some information about online distance MBA programs in India. The MBA is one of the most in-demand postgraduate programs in India. If you are a working professional, a fresher with a job, a housewife who wants to fulfill her dream of higher studies, or anyone who wants to be prepared to cope with the digital world, a distance MBA can be the right program for you. I have covered and picked the best universities offering a distance MBA in India.
Assuming you are planning to do some CPU- and GPU-intensive tasks, such as training deep learning or machine learning models on heavy datasets or creating a multi-node cluster for Big Data, in either case it is recommended to have a laptop with the below specifications –
In this post, I will explain the spark-submit command line arguments (options). We will touch upon the important arguments used in the spark-submit command, and at the end I will collate all of them and show a complete spark-submit command using these arguments. We consider the Spark 2.x version for writing this post.
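For context, many of the common spark-submit options have configuration-key equivalents that can also be set when building the SparkSession; a minimal sketch with placeholder values (in practice these are normally supplied on the spark-submit command line itself):

```python
from pyspark.sql import SparkSession

# spark.executor.memory / cores / instances correspond to the
# --executor-memory, --executor-cores and --num-executors options.
spark = (SparkSession.builder
         .appName("spark-submit-args-demo")
         .config("spark.executor.memory", "4g")
         .config("spark.executor.cores", "2")
         .config("spark.executor.instances", "3")
         .getOrCreate())

print(spark.sparkContext.getConf().get("spark.executor.memory"))

spark.stop()
```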
It is important to understand the Spark execution modes - local, client and cluster. This is a simple yet very crucial aspect to understand from a Big Data system point of view. In this post, I will try to explain these in simple and easy terms.
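As a small illustration of local mode (a minimal sketch, assuming a plain PySpark install): master("local[*]") runs the driver and executors inside a single JVM, while client and cluster modes are normally chosen with --deploy-mode on spark-submit:

```python
from pyspark.sql import SparkSession

# Local mode: everything runs in one JVM on the local machine,
# handy for development and testing.
spark = (SparkSession.builder
         .master("local[*]")
         .appName("execution-modes-demo")
         .getOrCreate())

print(spark.sparkContext.master)   # e.g. local[*]

spark.stop()
```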