AI in Dash Enterprise — save your spot for tomorrow's launch event session!

author photo

Moeen Ul Islam

April 21, 2023 - 15 min read

Why Automate Excel Reports Using Python?

Generating Excel reports can be tedious & time-consuming, but there’s a solution. Automating your reports using Python.

Have you ever been stuck with repetitive data processing tasks, wasting your precious time and energy on fetching, storing, generating reports, and performing calculations? Well, I certainly have! Recently, I was assigned an Excel automation project that had me spinning my wheels for days. That is until I discovered the amazing trio of Python, Dash, and Plotly. With their help, I was able to streamline my work and free up my time for more productive pursuits. 

Automation is the ultimate solution to break free from the tedium of repetitive data processing tasks. Your life becomes much easier with improved efficiency, accuracy, and less human-prone error. With a plethora of tools to automate almost all the possible operations related to Excel, this tutorial will explore the capabilities of Python that makes it a powerful tool when working with Excel files or data in general. Trust me, you don't want to miss out on these tools! if you are in the same boat!

A brief overview of Python

Developed by Guido Von Rossum and first released in 1991, Python is a high-level, general-purpose programming language that gained popularity in recent years because of its simplicity, readability, and versatility, making it a good choice for a major chunk of work that comes under the domain of Software development, research & analysis, AI and Machine learning, Scripting & automation.

Some of the highlights and key features of Python:

  1. Simple syntax that is easy to understand: making it an excellent option for novices.
  2. Extensive collection of libraries and packages: Python comes with a vast library of pre-built modules that can be used for practically everything from web development to data analysis, and scientific computing, so no "building the wheel from scratch" as they say.
  3. Cross-platform: Python code can be run on various platforms/operating systems, including Windows, macOS, and Linux. So no matter what machine you are using it is covered by Python.
  4. Free and open-source: Python is free to use for each and every person on the planet, it is easy to download and install. You can download Python from its official download page.

Is there a way to automate Excel reports?

Excel reports can be automated using a variety of tools, including Power Query, Macros, and Microsoft Power Automate. While each of the aforementioned tools is quite capable of automating, formatting, computations, data extraction, and purification, they all have limitations when it comes to carrying out sophisticated operations or generally interacting with additional applications. This is where Python comes into play. Python is a great choice for automating Excel and performing in-depth data analysis due to its flexibility as a general-purpose programming language and its large ecosystem of external modules for data analysis and visualization. The abundance of libraries it offers, such as Pandas, Matplotlib, Plotly, and others, enables users to create unique, accurate, and consistent reports that incorporate data analysis and visualizations as well as other important information.

How can I automate Excel spreadsheets and reports using Python?

There is a huge list of libraries that we can use for generating the reports from the Excel files in Python e.g., xlrd, openpyxl, pandas, numpy, xlsxwriter, matplotlib, plotly.

Let's take an example of a small dataset stored in a file data.xlsx to demonstrate automating the calculation of total sales.

Price Per Unit

In this example we will be using:

1. Plotly is an interactive, open-source plotting library for Python, covering over 40 different chart types for a variety of statistical, financial, geographic, scientific, and three-dimensional use cases. Access the PyPI package.
2. Pandas is a popular choice to work with Excel files in Python, offering a broad variety of functionality for reporting, data analysis, and manipulation e.g., filtering, grouping, reshaping data, etc. Additionally, a couple of other data analysis tools e.g., matplotlib, scikit-learn, plotly integrate and work well with Pandas.

Install packages

Let's install all the required packages and libraries using Python's inbuilt package manager called pip.

#Installing pandas and plotly
pip install pandas
pip install plotly

Install packages

Importing the required packages.

#import libraries and give them an alias for ease of use
import pandas as pd
import plotly.express as px
import plotly.io as pio

Read the dataset

Let's use read_excel() function provided by the Pandas library to read data from an Excel file and create a Pandas DataFrame, which is a table-like data structure with labelled columns and rows.

Note: Make sure the script and the excel file is located in the same folder.

#reading the dataset and storing in a variable df
df = pd.read_excel('data.xlsx')

Perform the operations

To get the total sales for each row in our dataframe, we will multiply values from the column 'Units Sold' with its corresponding values in the column 'Price per unit'. The resulting values will be stored in the newly created column 'Total Sales'.

# calculating the total sales and adding a new column for the same
df['Total Sales'] = df['Units Sold'] * df['Price per unit']

Result:

Total Sales

Plot the Charts

Let's learn how we can pictorially represent our calculations by plotting a bar chart with Plotly Express's bar() function. We will also need this chart in our report, so let's save it as a png file using Plotly's write_image() function.

# Create a bar chart of Product vs Total Sales
fig = px.bar(df, x='Product', y='Total Sales', title='Product Sales')
# set the border and background color of the chart area
fig.update_layout(
plot_bgcolor='white',
paper_bgcolor='lightgray',
width=800,
height=500,
shapes=[dict(type='rect', xref='paper',
yref='paper',
x0=0,
y0=0,
x1=1,
y1=1,
line=dict(
color='black',
width=2,
),
)
]
)
#display the graph
fig.show()
# Alternatively you can save the bar graph to an image using below line of code
pio.write_image(fig, 'bar_graph.png')

Plot the Charts

Generate the report

Now that we've completed the calculations and plotting chart, let's make a new Excel file to store the report. To prevent data loss, I recommend that you also create a new Excel file instead of storing them in the original one.

NOTE:

Some optional dependencies that you should install before running the below code.
1. openpyxl: pip install openpyxl
2. kaleido: pip install -U kaleido
3. xlsxwriter: pip install xlsxwriter
4. Pillow: pip install Pillow

#creating a new excel file named 'report.xlsx' to avoid data loss incase
writer = pd.ExcelWriter('report.xlsx', engine='xlsxwriter')
#save the updated data to an excel file with sheet name 'Sales Data'
df.to_excel(writer, index=False, sheet_name='Sales Data')
#creating a worksheet object named 'worksheet' within the Excel file that was created earlier using the 'ExcelWriter' function.
worksheet = writer.sheets['Sales Data']
# add the chart to the newly created Excel file
worksheet.insert_image('H1', 'bar_graph.png')
#Finally save the file
writer.save()

The result will look something like this:

Generate the report

Formatting the Excel report

When presenting reports to a client or customer, or for future use, formatting your Excel sheets plays a vital role. It improves the readability and visual appeal of data, allowing for more effective interpretation and sharing of insights. We will use openpyxl to demonstrate different formatting styles in our newly generated report.

This is how our Excel file looks before formatting:

Formatting the Excel report

Let's import the required packages and our Excel reports file:

from openpyxl import load_workbook
from openpyxl.styles import Alignment, Border, Side, PatternFill
# load the existing workbook and select the active worksheet
wb = load_workbook('report.xlsx')
ws = wb.active

1. Alignment:

# set the width of all columns to 20
for col in ws.columns:
ws.column_dimensions[col[0].column_letter].width = 20
# Set the text in cells left-aligned
for row in ws.iter_rows(min_row=1, max_row=6, min_col=1, max_col=ws.max_column):
for cell in row:
cell.alignment = Alignment(horizontal='left')
# save the modified workbook
wb.save('report.xlsx')

Alignment

2. Adding borders:

# create a new workbook and select the active worksheet
wb = load_workbook('report.xlsx')
ws = wb.active
# set a border around a specific range of cells
rrange_border = Border(left=Side(style='medium'),
right=Side(style='medium'),
top=Side(style='medium'),
bottom=Side(style='medium'))
for row in ws.iter_rows(min_row=1, max_row=6, min_col=1, max_col=ws.max_column):
for cell in row:
cell.border = range_border
# save the workbook to a new file
wb.save('report.xlsx')

Adding borders

3. Highlighting the columns:

# define the fill pattern to highlight the column
fill = PatternFill(start_color='FFFF00', end_color='FFFF00', fill_type='solid')
# highlight the first column of the sheet
for cell in ws[ws.max_column]:
cell.fill = fill
# save the modified workbook to a new Excel file
wb.save('report.xlsx')

Highlighting Columns

4. Adding font styles:

# set font styles for first row
font = Font(name='Times New Roman', bold=True)
# set background color for the first row
fill = PatternFill(start_color='23C4ED', end_color='23C4ED', fill_type='solid')
for row in ws.iter_rows(min_row=1, max_row=1):
if row[0].row == 1:
for cell in row:
cell.fill = fill
cell.font = font
# save the workbook
wb.save('report.xlsx')

Font Styles

This is how the Excel file looks after the formatting:

Excel file after formatting

Pivot Tables

Additionally, we can generate pivot tables to summarize and analyze large amounts of data by grouping, filtering, or sorting this data by any particular criteria we want. Dataset used in this example:

Pivot Tables

#import the required packages
import pandas as pd
import plotly.express as px
# Load the Excel file into a pandas DataFrame
df = pd.read_excel("pivot_dataset.xlsx")
# Create a pivot table
pivot_df = pd.pivot_table(df, values='Total', index='Gender', columns='Payment', aggfunc='sum')
# Export the pivot table to an Excel file
pivot_df.to_excel('output_file.xlsx', sheet_name='Sheet1', index=True)

Result:

Cash wallet

Additionally, you can create a Plotly figure for the same to visualize things graphically:

# Create a Plotly figure
fig = px.imshow(pivot_df)
# Show the figure
fig.show()

Plotly figure

Now, all that is left is to schedule the above script to run daily, weekly, monthly, or whatever your use case is. Depending on your needs and scenario, you can use the Crontab, Task Scheduler, Automator, or Anacron.

Wrapping everything into Python functions

Let's wrap all the code into respective Python functions so that the code not only becomes easy to handle but also reusable, and more readable.

import pandas as pd
import plotly.express as px
import plotly.io as pio
from openpyxl import load_workbook
from openpyxl.styles import PatternFill, Border, Side, Alignment, Font
#Function for automating the spreadsheet
def automate_excel_report(inputFilename, outputFilename):
df = pd.read_excel(inputFilename)
# Add a new column for Total Sales
df['Total Sales'] = df['Units Sold'] * df['Price per unit']
# Create a bar graph of units sold using Plotly
fig = px.bar(df, x='Product', y='Total Sales', title='Product Sales')
# set the border and background color of the chart area
fig.update_layout(
plot_bgcolor='white',
paper_bgcolor='lightgray',
shapes=[dict(
type='rect',
xref='paper',
yref='paper',
x0=0,
y0=0,
x1=1,
y1=1,
line=dict(color='black', width=2))])
# Save the bar graph as an image file
pio.write_image(fig, 'bar_graph.png')
writer = pd.ExcelWriter(outputFilename, engine='xlsxwriter')
df.to_excel(writer, index=False, sheet_name='Sales Data')
worksheet = writer.sheets['Sales Data']
worksheet.set_column('F:F', 12)
worksheet.insert_image('H1', 'bar_graph.png')
# save the file
writer.close()
#Function for automating the reports
def automate_excel_formatting(inputFilename, outputFilename):
# # load the Excel file
wb = load_workbook(inputFilename)
ws = wb.active
# # define the fill pattern to highlight the column
fill = PatternFill(start_color='FFFF00', end_color='FFFF00', fill_type='solid')
# highlight the 'Product' column of the sheet
for cell in ws['A:A']:
cell.fill = fill
# =========================Borders======================
# set a border around a specific range of cells
range_border = Border(left=Side(style='medium'),
right=Side(style='medium'),
top=Side(style='medium'),
bottom=Side(style='medium'))
for row in ws.iter_rows(min_row=1, max_row=6, min_col=1, max_col=ws.max_column):
for cell in row:
cell.border = range_border
# =====================column alignmnet====================
# # set the width of all columns to 20
for col in ws.columns:
ws.column_dimensions[col[0].column_letter].width = 20
# add some text to the worksheet and set it left-aligned
for row in ws.iter_rows(min_row=1, max_row=6, min_col=1, max_col=ws.max_column):
for cell in row:
# cell.value = f'Text {cell.column}{cell.row}'
cell.alignment = Alignment(horizontal='left')
#=======================font styles===============
# set font styles for different rows
font_name = Font(name='Times New Roman', bold=True)
# set background color for the first row
fill = PatternFill(start_color='23C4ED', end_color='23C4ED', fill_type='solid')
for row in ws.iter_rows(min_row=1, max_row=1):
if row[0].row == 1:
for cell in row:
cell.fill = fill
cell.font = font_name
# save the workbook
wb.close(outputFilename)
#Function for automating the pivot table
def automate_excel_pivot_table(inputFilename, outputFilename):
# Load the Excel file into a pandas DataFrame
# df = pd.read_excel("pivot_dataset.xlsx")
df = pd.read_excel(inputFilename)
# Create a pivot table
pivot_df = pd.pivot_table(df, values='Total',
index='Gender', columns='Payment', aggfunc='sum')
# Export the pivot table to an Excel file
pivot_df.to_excel(outputFilename, sheet_name='Sheet1', index=True)
# Create a Plotly figure
fig = px.imshow(pivot_df)
# Show the figure
fig.write_image('fig.png')

Now you just need to call the Python functions and pass the filename to it. Example:

#passing my file named 'data.xlsx' as an input will create a new file named 'report.xlsx' as an output
automate_excel_report('data.xlsx', 'report.xlsx')
#passing the 'report.xlsx' file generated using the above function will create a new formatted excel file named 'formattedReport.xlsx'
automate_excel_formatting('report.xlsx', 'formattedReport.xlsx')
#passing the file named 'pivot_dataset.xlsx' will create a new file named pivotTable.xlsx
automate_excel_pivot_table('pivot_dataset.xlsx', 'pivotTable.xlsx')

Viola! Everything is done for you.

Now, the only remaining task is to set a schedule for the script mentioned above, whether it's on a daily, weekly, or monthly basis, depending on your specific needs. You can use tools such as Crontab, Task Scheduler, Automator, or Anacron as per your use case.

Live Dash app showing automated Excel reports

Are you searching for a more efficient and streamlined approach to managing your reporting process? If so, you may want to consider Dash, which can help you manage your Excel reports and streamline your workflow. With a Dash application, you can examine your reports in real-time and make data-driven decisions effortlessly. By utilizing this powerful framework, you can save time and improve the quality of your reporting process.

Let's take the dataset that we used to learn about the pivot tables earlier. We will create a simple dash application that summarizes the data between a selected column and the Payment column. Accordingly, we will generate a heatmap and load it on our dashboard. Surprisingly, accomplishing all of this can be done with just 21 lines of code.

Import all the required packages and dataset
from dash import Dash, html, dcc, Output, Input
import plotly.express as px
import pandas as pd
import openpyxl
df = pd.read_excel("supermarket_sales.xlsx")

Create a Dash application instance and define the layout using HTML and Dash code components.

#instantiating Dash
app = Dash(__name__)
#Defining the web layout
app.layout=html.Div([
html.H1("Excel to Python App"),
dcc.RadioItems(id='col-choice', options=['Gender','Customer type','City'], value='Gender'),
dcc.Graph(id='our-graph', figure={}),
])

Define a callback function called update_graphs() using the @app.callback decorator. The function takes a single input parameter, which corresponds to the value of the radio button group that the user selects.

Inside the function, it creates a pivot table using the pandas to summarize the data by the selected column and Payment column. It then creates a heatmap visualization using plotly.express function and returns the figure object.

@app.callback(
Output('our-graph', 'figure'),
Input('col-choice', 'value')
)
def update_graphs(column_selected):
pivot_df = pd.pivot_table(df, values='Total', index=column_selected, columns='Payment', aggfunc='sum')
fig = px.imshow(pivot_df)
return fig

Run the application which starts the Flask server and renders the application on a web page.

if __name__=='__main__':
app.run_server()

Viola! In just 21 lines of code, your dashboard is ready. Live Demo.

Live dashboard

Conclusion

Automating Excel reports with Python can significantly enhance data processing and reporting tasks, such as generating and formatting reports, creating pivot tables, and developing interactive dashboards with tools like Dash and Plotly. Python, being a user-friendly and versatile programming language, can be easily learned and used by individuals of all skill levels. By following the step-by-step approach outlined in this article, anyone can create reports quickly and accurately, saving valuable time and resources. With the ability to make informed decisions based on current data, individuals can achieve better outcomes. Overall, Python offers a dependable and efficient solution for automating Excel reports and presents a remarkable opportunity for individuals and organizations seeking to optimize their data processing and reporting workflows.

Dash Enterprise

COMPANY

  • WE ARE HIRING

© 2024
Plotly. All rights reserved.