DevOps | Cloud | Analytics | Open Source | Programming





How To Read Various File Formats in PySpark (JSON, Parquet, ORC, Avro)?

This post walks through sample code for reading various file formats in PySpark (JSON, Parquet, ORC, Avro). We will consider the following file formats:

  • JSON
  • Parquet
  • ORC
  • Avro
  • CSV
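
As a rough illustration of the formats listed above, here is a minimal sketch built on PySpark's generic `DataFrameReader` calls (`format`, `options`, `load`). The helper name `read_file` and the `FORMATS` tuple are hypothetical and not taken from the post, and the sketch assumes an existing `SparkSession` is passed in as `spark`. Note that JSON, Parquet, ORC and CSV readers ship with Spark, while Avro needs the external `spark-avro` package to be available.

```python
# Format names accepted by spark.read.format(...).
# "avro" only works when the external spark-avro package is on the classpath.
FORMATS = ("json", "parquet", "orc", "avro", "csv")


def read_file(spark, path, fmt, **options):
    """Load `path` as a DataFrame using the generic DataFrameReader API.

    `spark` is assumed to be an existing SparkSession (hypothetical helper,
    not the post's own code).
    """
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    # Reader options (for example header="true" for CSV) are passed through as-is.
    return spark.read.format(fmt).options(**options).load(path)
```

With a session created through `SparkSession.builder.getOrCreate()`, a call could look like `read_file(spark, "data/events.json", "json")` or `read_file(spark, "data/users.csv", "csv", header="true")`. Both paths are placeholders.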

How To Remove all Python packages installed by pip?

This post explains how to remove all Python packages installed by pip. To do that, let's try the below

How To Run Jupyter Notebook Online or in Cloud for Free?

How can you run a Jupyter Notebook online or in the cloud for free? Some of the options: Google Colab, Kaggle Kernels, Deepnote, Datalore, Binder.

How To Select Between PyTorch, Keras or TensorFlow?

This post explains how to select between PyTorch, Keras and TensorFlow. We will look at each of these and arrive at a non-politically-correct decision.

How To Set up Apache Spark & PySpark in Windows 10?

This post explains how to set up Apache Spark and PySpark in Windows 10. We will also see some of the common errors people face during the set-up. Please follow the steps below one by one, and hopefully it will work for you -

How To Set-up Cloudera Manager in Cloudera Quickstart VM?

In this post, I will explain how to set up Cloudera Manager in the Cloudera Quickstart VM. You can download the Cloudera Quickstart VM from

How to Set Up Custom Logging for Spark Driver and Executor in YARN?

In this post, we will see how to set up custom logging for the Spark driver and executors in YARN. When you submit a Spark job to run within a YARN cluster, you are essentially using the cluster mode approach (--master yarn-cluster).

How to Set Up a Multi Node Kafka Cluster or Brokers?

In this post, we will see how to set up a multi-node Kafka cluster or brokers. As we all know, the real strength and true purpose of Kafka is realized in a multi-node setup. A multi-node Kafka architecture takes care of data replication and safety as well as efficient topic partitioning.

How To Setup Spark Scala SBT in Eclipse

This post explains how to set up Spark, Scala and SBT in Eclipse. Normally, running any Spark application, especially in Scala, is a bit lengthy since you need to code, compile, build the jar and finally deploy or execute it.

How to Stop Windows 10 Auto Update

It is not advisable to stop Windows Update, as each update contains improvements or requisites that the Windows operating system needs to run properly. However, at times you might still need to stop Windows Update.

How to Stream Data with Azure Event Hubs from Kafka (Java)?

In this post, we will see how to stream data with Event Hubs using the Kafka protocol. Both Azure Event Hubs and Apache Kafka are data streaming platforms and event ingestion services, but Event Hubs comes without an on-premises option. Azure Event Hubs for Kafka supports Apache Kafka version 1.0 and later.

How To Save & Reload a Python Machine Learning Model using Pickle?

In this post, let's understand how to save a machine learning model in Python using Pickle. Pickle is used quite often in Python, especially in machine learning modelling. We will try to understand what Pickle is and how to use it.
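
To make the save-and-reload idea concrete, here is a minimal sketch using only the standard library. The `model` dictionary is a stand-in for a trained model (an assumption for illustration; any picklable object, such as a fitted estimator, is handled the same way), and the file name `model.pkl` in the temp directory is hypothetical.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for a trained model; any picklable object works the same way.
model = {"weights": [0.5, -1.2, 3.0], "bias": 0.1}

# Hypothetical location for the serialized model.
model_path = Path(tempfile.gettempdir()) / "model.pkl"

# Save: serialize the object to a binary file.
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Reload: deserialize the object from the same file.
with open(model_path, "rb") as f:
    reloaded = pickle.load(f)

print(reloaded == model)  # True
```

Pickle files can execute arbitrary code when loaded, so only unpickle files that come from a source you trust.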

How To Use the Android Manifest File and Its Common Settings?

In this post, we will explore How To Use the Android Manifest File and Its Common Settings. The AndroidManifest.xml file is required for every app project and is located at the root of the project source set. The manifest file contains important information about the app for the Android build tools, operating system, and Google Play.

How To Visualize the Django Models?

In this post, we will see how to visualize Django models. Follow the steps explained in the video below -

Jungle Bot – Translate & Tweet Jungle Tree Sounds!

Jungle Bot can translate and tweet jungle tree sounds! But can a tree tweet? If you ever had a doubt, look no further! Existential Jungle Bot is an audio footprint of an ecosystem comprising trees, animals or anything else.

How To Access Spark Logs in AWS EMR?

In this post, we will see how to access Spark logs in AWS EMR. Depending on which mode you run your Spark job in, client or cluster, the process for accessing the logs can vary. We will jot down the steps to access the logs in both cases. However, as an addendum, I would like you to go through the posts below for some additional background (not specific to AWS, but general), as Spark is deployed/submitted using YARN as the resource manager in an EMR cluster.