Whether you work as a Data Engineer or a Data Scientist, Jupyter Notebook is a helpful tool. One of the projects I was working on required comparing two .parquet files, mainly their schemas rather than the data. Although the two .parquet files were created from two different sources, the outcome should be exactly alike, schema-wise. At first I compared them manually, then I thought there must be a tool to do that. That's how I found Jupyter Notebook, which can be used to compare two .parquet schemas.
To compare two .parquet files, your development environment needs Python and Jupyter Notebook as prerequisites. This post describes the step-by-step installation process for Jupyter Notebook.
Step 1: Install python version 3.7.9
Python is a prerequisite for Jupyter Notebook, so Python needs to be installed first.
Follow the URL below and choose the right version to install.
Create a python folder on the C drive and save the downloaded file there.
Make sure to choose 'Customize installation' and check 'Add Python 3.7 to PATH' as shown in figure 4. Checking this box lets the installer add Python to the PATH, so I did not have to set up the environment variable by hand.
And now hit the Install button. Installation will complete in a minute or two.
Let's test whether Python installed successfully: open a command prompt and type python. If Python is installed correctly, you should see the Python version number and a few help hints, as in figure 6.
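The same check can also be done from a small script, which is handy if you want to confirm that the interpreter on your PATH is the one you just installed. This is only a sketch: the helper names python_version_string and meets_minimum are my own, and the 3.7 minimum is an assumption based on the version installed in this post.

```python
import sys


def python_version_string(info=sys.version_info):
    """Format a version tuple such as (3, 7, 9) as '3.7.9'."""
    return f"{info[0]}.{info[1]}.{info[2]}"


def meets_minimum(minimum=(3, 7), info=sys.version_info):
    """Return True if the major.minor version is at least `minimum`."""
    return tuple(info[:2]) >= tuple(minimum)


# Report which interpreter is actually running this script.
print("Running Python", python_version_string())
print("Meets the assumed 3.7 minimum:", meets_minimum())
```

Save it as a .py file and run it with python from the command prompt; if the version printed is not the one you expected, another Python installation is probably ahead of it on the PATH.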
Step 2: Install Jupyter notebook
Let's move on to the next step and install Jupyter Notebook.
Open a command prompt and run the command below:
>pip install jupyter
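When the installation finishes, start Jupyter Notebook from the same command prompt:
>jupyter notebook
It will open a browser with Jupyter Notebook as shown in figure 9.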
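If the browser does not open, a quick sanity check is to see whether Python can find the installed packages. The sketch below assumes that pip install jupyter pulls in the notebook and jupyter_core packages; the helper name is_installed is my own, and the module names checked are that assumption rather than an exhaustive list.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Assumed module names that `pip install jupyter` is expected to provide.
for name in ("notebook", "jupyter_core"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If either one is reported as missing, the pip command most likely ran against a different Python installation than the one on your PATH.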
Now you can create a notebook by choosing 'New' and then 'Python 3', as shown in figure 10. It will open a new browser tab where you will write the code.
Let's write a hello world program in the Jupyter notebook. The browser will look like figure 11, where the code we wrote is print('Hello world') and the output is shown after clicking the 'Run' button.
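With the notebook running, the schema comparison that motivated this setup could be sketched roughly as follows. This is not the exact code from my project: the function names diff_schemas and load_parquet_schema are hypothetical, the sample column names and types are made up for illustration, and reading real files assumes the pyarrow package is installed separately (pip install pyarrow), which this post does not cover.

```python
def diff_schemas(left, right):
    """Compare two {column_name: type_string} mappings.

    Returns the columns found on only one side and the columns
    whose types differ between the two sides.
    """
    shared = set(left) & set(right)
    return {
        "only_in_left": sorted(set(left) - set(right)),
        "only_in_right": sorted(set(right) - set(left)),
        "type_mismatch": {
            name: (left[name], right[name])
            for name in sorted(shared)
            if left[name] != right[name]
        },
    }


def load_parquet_schema(path):
    """Read only the schema of a .parquet file as {column_name: type_string}.

    Assumption: pyarrow is installed; it is imported here so the rest
    of the cell still runs without it.
    """
    import pyarrow.parquet as pq

    schema = pq.read_schema(path)
    return {field.name: str(field.type) for field in schema}


# Made-up schemas standing in for the two sources.
source_a = {"id": "int64", "name": "string", "created": "timestamp[us]"}
source_b = {"id": "int32", "name": "string", "updated": "timestamp[us]"}

print(diff_schemas(source_a, source_b))
```

For real files, you would replace the two sample dictionaries with load_parquet_schema calls on your own file paths. Comparing plain name-to-type mappings keeps the difference easy to read in the notebook output, at the cost of ignoring column order and nullability.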