Learn the different ways that you can iterate over a Pandas DataFrame using Python

Photo by Maicol Santos on Unsplash

Introduction

As Data Scientists, we will at times encounter poor-quality data. To be successful, we need to manage data quality issues effectively before any analysis. Thankfully, there are several powerful open-source libraries, such as Pandas, that we can utilise to process data efficiently. Today we are going to look at the different ways that we can loop over a DataFrame and access its values. Iterating over a DataFrame can be incorporated into the steps that follow initial exploratory data analysis to begin cleansing raw data.
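As a rough illustration of looping over a DataFrame, the sketch below uses three iteration methods that Pandas provides: iterrows(), itertuples() and items(). The sample data and column names are hypothetical and are not taken from the article.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"staff": ["Ann", "Bob"], "hours": [7.5, 8.0]})

# iterrows() yields (index, Series) pairs, one per row.
rows_via_iterrows = [(idx, row["staff"], row["hours"]) for idx, row in df.iterrows()]

# itertuples() yields one namedtuple per row; the index is exposed as .Index.
rows_via_itertuples = [(t.Index, t.staff, t.hours) for t in df.itertuples()]

# items() iterates column by column as (label, Series) pairs.
column_labels = [label for label, _ in df.items()]
```

For large DataFrames, itertuples() is generally quicker than iterrows(), and a vectorised operation is usually quicker than either.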

Getting Started

What is Pandas?

For those of you that are new to data science or unfamiliar…


Learn how to read a CSV file and create a Pandas DataFrame

Photo by Mika Baumeister on Unsplash

Introduction

As a Data Analyst or Data Scientist, you will frequently have to combine and analyse data from various data sources. A file type I am commonly asked to analyse is CSV. CSV files are popular within the corporate world as they are plain text, easy to use and are often the output type for corporate systems. Today we will demonstrate how to use Python and Pandas to open and read a CSV file on your local machine.
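A minimal sketch of reading CSV data into a DataFrame with pd.read_csv() might look like the following. To keep the example self-contained, it reads from an in-memory string rather than a file; with a real file you would pass its path instead. The column names and values are hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV content; with a real file, pass its path to read_csv instead.
csv_text = "employee_id,department,salary\n1,Finance,52000\n2,IT,61000\n"

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
```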

Getting Started

You can install Pandas via pip from PyPI. If this is your first time installing Python packages, please refer to…


How to massage a Pandas DataFrame into the shape you need

Photo by Todd Quackenbush on Unsplash

Introduction

In the last article, we shared an embarrassing moment which encouraged us to learn and use Pandas to pivot a DataFrame. Today we are going to look at Pandas' built-in .melt() function to reverse our pivoted data. The .melt() function comes in handy when you need to reshape or unpivot a DataFrame.
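To make the idea of unpivoting concrete, here is a small sketch of .melt() turning a wide table (one column per year) into a long one (one row per staff member and year). The data and the names staff_id, year and hours are assumptions made for this example.

```python
import pandas as pd

# Hypothetical wide-format data: one column of hours per calendar year.
wide = pd.DataFrame({"staff_id": [1, 2], "2019": [120, 80], "2020": [95, 110]})

# melt() keeps id_vars as identifier columns and stacks the remaining columns
# into two new columns: one for the old column label, one for its value.
long = wide.melt(id_vars=["staff_id"], var_name="year", value_name="hours")
```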

Getting Started

Before ripping in, if you’re yet to read Pivoting a Pandas DataFrame or haven’t been exposed to Python Pandas previously, we recommend first beginning with Pandas Series & DataFrame Explained or Python Pandas Iterating a DataFrame. …


Two lines of code that saved me hours of work

Photo by Scott Graham on Unsplash

Introduction

One of the funniest moments you will have as a Data Analyst or Developer is coming across code you wrote as a junior. This moment recently happened when a stakeholder requested an update of an extract I provided when I first came on board. The request was to analyse casual hours worked during each calendar year for current staff members. Embarrassingly, my naive approach was to create columns using subqueries within the main select statement to segregate the years. Whilst this approach worked, the overall performance of the query was terrible.
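The teaser is cut off before the improved approach is shown, so the following is only a sketch of one way to get a column per calendar year without subqueries, assuming the hours have already been loaded into Pandas. The column names and figures are hypothetical.

```python
import pandas as pd

# Hypothetical casual-hours records, one row per staff member per shift.
records = pd.DataFrame({
    "staff_id": [1, 1, 1, 2, 2],
    "year": [2019, 2019, 2020, 2019, 2020],
    "hours": [40.0, 25.0, 60.0, 30.0, 45.0],
})

# pivot_table() aggregates hours and spreads the years across columns.
hours_by_year = records.pivot_table(
    index="staff_id", columns="year", values="hours", aggfunc="sum"
)
```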

A Better Approach

My initial approach meant aggregating casual hours worked between…


Learn how to output multiple Pandas DataFrames to an Excel file

Photo by Elena Loshina on Unsplash

Introduction

Being able to extract, transform and output data is a crucial skill to develop to be a successful Data Analyst. Today we are going to look at how to use Pandas, Python and XlsxWriter to output multiple DataFrames to the one Excel file. We are going to explore outputting two DataFrames vertically and horizontally to the same tab, then splitting the two DataFrames across two tabs.

Getting Started

Today’s story builds on what we covered in How to Combine Pandas, Python & XlsxWriter, where we looked at outputting your first DataFrame to Excel. …


Learn how to drop duplicates from a Pandas DataFrame to improve your data quality

Photo by Samantha Lam on Unsplash

Introduction

Dropping duplicates from your data sets is a task you will regularly have to do as a Data Analyst. Whilst in some cases duplicates may be valid, frequently they have been created through lax data integrity or incorrect joining methods during data extraction. To be successful as a Data Analyst, you need to be able to effectively identify invalid duplicates and remove them from your data sets. Not removing these duplicates will affect the quality of your data analysis. Today's story will form a guide that you can refer back to when needing to identify and remove duplicates.
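As a brief sketch of identifying and then removing duplicates, the example below uses duplicated() to flag repeated rows and drop_duplicates() to remove them. The data is hypothetical, and whether a duplicate is invalid depends on your own data set.

```python
import pandas as pd

# Hypothetical customer data containing one fully duplicated row
# and one customer_id that appears with two different emails.
df = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3],
    "email": ["a@x.com", "a@x.com", "b@x.com", "c@x.com", "c@y.com"],
})

# Flag every row that belongs to a group of fully identical rows.
dup_mask = df.duplicated(keep=False)

# Remove rows that are identical across all columns, keeping the first.
deduped = df.drop_duplicates()

# Treat rows as duplicates based on customer_id alone, keeping the first.
deduped_by_id = df.drop_duplicates(subset="customer_id", keep="first")
```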

Getting Started

This story…


Learn how to output a Pandas DataFrame to Excel with formatting using XlsxWriter

Photo by Safar Safarov on Unsplash

Introduction

Being able to extract data from a database and output it to Excel is a crucial skill to have as a Data Analyst. Today we are going to look at using a Python package called XlsxWriter to output a small DataFrame to Excel. XlsxWriter is a powerful package that you can use to auto-format Excel worksheets, change the styling and insert objects such as tables.

Installing XlsxWriter

There are several ways that you can install XlsxWriter. The easiest method would be using the Python package installer called pip. Open up a terminal and run the command pip install xlsxwriter.

Hello World

To test that…


Tips and tricks you can use to save time as a Data Analyst and increase your productivity.

Photo by Marc Sendra Martorell on Unsplash

Introduction

Today I am going to share with you several tools, packages and code snippets that I have used and developed during my time as a Data Analyst. As Data Analysts, there are going to be times when we are required to rerun the same report, run a similar report with different parameters or apply the same statistical analysis over differing datasets. Below I will give you a brief overview of some of the tools that you will be able to incorporate into your workflow as a Data Analyst to increase your productivity.

Version Control

For version control, I am…


A quick how-to guide for merging Pandas DataFrames in Python

Photo by MILKOVÍ on Unsplash

Introduction

As Data Scientists, we will often find that we are required to analyse data from multiple data sources at the one time. To be successful at achieving this, we need to be able to merge different data sources efficiently using a variety of methods. Today we are going to look at using Pandas' built-in .merge() function to join two data sources using several different join methods.
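A short sketch of how the how parameter of .merge() changes the result is given below, using inner, left and outer joins. Both DataFrames and their column names are hypothetical.

```python
import pandas as pd

# Two hypothetical data sources sharing a staff_id key.
staff = pd.DataFrame({"staff_id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
hours = pd.DataFrame({"staff_id": [2, 3, 4], "hours": [10, 20, 30]})

# Inner join: only keys present in both sources.
inner = staff.merge(hours, on="staff_id", how="inner")

# Left join: every row from staff, with missing hours where there is no match.
left = staff.merge(hours, on="staff_id", how="left")

# Outer join: every key from either source.
outer = staff.merge(hours, on="staff_id", how="outer")
```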

Getting Started

For those of you that are new to data science or haven’t been exposed to Python Pandas yet, we recommend first beginning with Pandas Series & DataFrame Explained or Python Pandas Iterating a DataFrame.


A comprehensive guide to understanding Pandas Series and DataFrame data structures

Photo by Eleonora Albasi on Unsplash

Introduction

To be successful as Data Scientists, we need to be continuously learning and improving our skills across a wide range of tools. A tool synonymous with Data Science these days is Pandas. Pandas is an incredibly powerful open-source library written in Python. It offers a diverse set of tools that we as Data Scientists can use to clean, manipulate and analyse data. Today we are beginning with the fundamentals and learning two of the most common data structures in Pandas: the Series and the DataFrame.
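As a quick sketch of the two structures, a Series is a one-dimensional labelled array, and a DataFrame is a two-dimensional table whose columns are each a Series. The values and labels below are hypothetical.

```python
import pandas as pd

# A Series: one-dimensional values with an index of labels.
s = pd.Series([10, 20, 30], index=["a", "b", "c"], name="score")

# A DataFrame: a table of columns that share the same index.
df = pd.DataFrame({"score": s, "passed": [False, True, True]})

# Selecting a single column from a DataFrame returns a Series.
score_column = df["score"]
```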

Getting Started

Installing Pandas & Numpy

Pandas and Numpy are open-source libraries written in Python that you can utilise in your…

Dean McGrath

🔥 Aspiring Data Analyst ✍️ Towards Data Science 🌏 deanjmcgrath.com
