Analysis & Data Visualization: Online Sales Case Study

Enni Maedani
5 min readNov 21, 2021

--

Photo by Tima Miroshnichenko via Pexels

In this project, I will analyze and visualize data using Google Colab. The data used is online sales data from an online store.

Before analyzing and visualizing the data, the first step is to open the online sales data in Microsoft Excel.

Opening Online Sales Data in Microsoft Excel

Data by Product
Data by Month

The worksheet contains two sets of data: Data by Product and Data by Month. The former includes information on:

  1. Product Type
  2. Net Quantity
  3. Gross Sales
  4. Discount
  5. Returns
  6. Total Sales

The latter includes information on:

  1. Month
  2. Year
  3. Total Orders
  4. Gross Sales
  5. Discounts
  6. Returns
  7. Net Sales
  8. Shipping
  9. Total Sales

The next step is to identify which information can be analyzed and displayed.

Identifying Information to be Analyzed and Displayed

Based on the data, the following information can be obtained:

  1. When and why did the highest and lowest sales occur?
  2. How has sales growth been?
  3. Which products contribute the most to sales?

These questions can be answered by processing the data using Google Colab. This step will help to determine which data will be used later.

Processing Data and Creating Data Visualization

To begin processing the data, import Numpy, Pandas, and Matplotlib libraries. Here’s how to import them:

Next, import the dataset and create a data frame from the imported dataset file.

After running it, the result will look like this:

The data frame contains 1,175 rows and 6 columns. Checking the contents of the entire data frame will take a long time, but we only need data on Product Type and Net Quantity. To address this, a new data frame should be created that shows the number of Net Quantities for each Product Type. Here’s how to do it:

After running it, the result will look like this:

Next, create a chart for data visualization. In this case, a bar chart would be suitable for conveying information about the quantity of a category.

After running it, the result will look like this:

It is time to process the second dataset, which contains data by month. The process for doing so is the same as before: import the dataset and create its corresponding data frame.

After running it, the result will look like this:

Then, renamed each column first to make it easier to call the data later.

After running it, the result will look like this:

To make it easier for the next process, it is necessary to delete unnecessary columns by:

After running it, the result will look like this:

Next is the initialization process for the year data, to make it easier to call the data later.

It’s time to start the data visualization process. To display the data by month, I recommend using a line chart. This type of chart is ideal for viewing trends or developments over time.

After running it, the result will look like this:

Next, create a data frame to show the sales growth, which requires merging the month and year columns.

Then, the next step is to create its line chart.

After running it, the result will look like this:

Analyzing the Processed Data Results

After going through the processing and visualization process, it’s time for the analysis process.

Based on the chart, it can be seen that the product with the Basket type is the product with the highest sales. There are four top revenue-generating product types: Basket, Art & Sculpture, Jewelry, and Kitchen. These four product types represent 82% of total sales or 72% of total items sold.

Based on the chart, it can be seen that the highest total sales occurred in December 2019. In addition, the highest average sales from 2017 to 2019 occurred in December. This is because during that month there are Christmas and New Year celebrations. The previous chart shows that the Christmas product type ranks sixth in sales.

Based on the chart, it can be seen that sales fluctuate over the past 2 years. However, the chart also shows significant growth, where December is the month with the highest sales.

This concludes the analysis and data visualization of the case study. If you notice any errors or deficiencies in this writing, please do not hesitate to bring them to my attention and I will correct them accordingly. Thank you :)

--

--

Enni Maedani

A room of my curiosity, ideas, perspectives, concerns, and a dash of my knowledge.