# Introduction
Excel remains relevant for data work, but a good portion of the time spent using it is purely mechanical. Tasks like combining files from multiple sources, tracking down duplicate records, reformatting inconsistent exports, and splitting a master sheet into separate files are not complex, but they are time-consuming and prone to human error.
These five Python scripts help automate these tasks. Each one is self-contained, configurable, and designed to work with messy real-world data.
You can find all the scripts on GitHub.
# Merging Multiple Excel Files
// The Pain Point
When consolidating data from multiple Excel or comma-separated values (CSV) files, the manual process (opening each file, copying the data, and pasting it into a master sheet) is slow and prone to misalignment errors, especially when column orders differ between files.
// What the Script Does
This script scans a folder for .xlsx and .csv files, stacks all their data into a single unified sheet, and writes a clean merged output file. It can optionally add a source column so you always know which row originated from which file, and it handles mismatched column orders automatically.
// How It Works
The script uses pandas to read every file in a target directory, aligns columns by name rather than position, and concatenates everything into one DataFrame. A configurable add_source_column flag appends the original filename to each row. Column mismatches are logged so you know if some files had extra or missing fields. The output is written with openpyxl and includes a summary tab showing file-by-file row counts.
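To make the alignment step concrete, here is a minimal sketch of merging by column name. It is not the script from the repository: the function name `merge_frames`, the `source_file` column, and the toy data are assumptions made for illustration, and the frames are built in memory rather than read from a folder.

```python
import pandas as pd


def merge_frames(frames, add_source_column=True):
    """Stack DataFrames by column name; `frames` maps a source filename to its data."""
    # Union of all column names, kept in first-seen order
    all_columns = []
    for df in frames.values():
        for col in df.columns:
            if col not in all_columns:
                all_columns.append(col)

    parts, mismatches = [], {}
    for name, df in frames.items():
        missing = [c for c in all_columns if c not in df.columns]
        if missing:
            mismatches[name] = missing  # a real script would log this
        part = df.reindex(columns=all_columns)  # absent columns are filled with NaN
        if add_source_column:
            part = part.assign(source_file=name)
        parts.append(part)

    merged = pd.concat(parts, ignore_index=True)
    row_counts = pd.DataFrame(
        {"file": list(frames), "rows": [len(df) for df in frames.values()]}
    )
    return merged, row_counts, mismatches


# Toy inputs with different column orders and one extra column
frames = {
    "jan.xlsx": pd.DataFrame({"id": [1, 2], "amount": [10.0, 20.0]}),
    "feb.csv": pd.DataFrame({"amount": [30.0], "id": [3], "region": ["West"]}),
}
merged, row_counts, mismatches = merge_frames(frames)
print(merged)
print(mismatches)
```

In a real run, the `frames` dictionary would be filled with `pd.read_excel` and `pd.read_csv` calls over the target folder, and `merged` and `row_counts` could be written to separate sheets through `pd.ExcelWriter`.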
⏩ Get the Excel files merger script
# Finding and Flagging Duplicate Rows
// The Pain Point
Duplicate records are common in datasets that have been exported and re-imported across systems. Exact matches are easy to find, but near-duplicates (the same record with slightly different formatting or spacing) are harder to catch manually at scale.
// What the Script Does
This script scans an Excel file for duplicate rows based on columns you define, flags exact duplicates and near-duplicates through fuzzy matches on string fields, and writes an annotated output file highlighting every suspected duplicate group with color coding and a confidence score.
// How It Works
The script uses pandas for exact duplicate detection and RapidFuzz for fuzzy string matching on configurable key columns. Each row is assigned a duplicate group ID and a match confidence percentage. The output Excel file uses openpyxl formatting to highlight duplicate clusters. A separate summary sheet shows total duplicates found, broken down by match type.
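The grouping idea can be sketched roughly as follows. To keep the example dependency-free, the standard library's `difflib.SequenceMatcher` is swapped in for RapidFuzz, so the scores will not match what RapidFuzz produces. The function names, the output column names, and the 90-point threshold are all assumptions for illustration.

```python
from difflib import SequenceMatcher

import pandas as pd


def normalize(text):
    """Lowercase and collapse whitespace so formatting noise does not affect the score."""
    return " ".join(str(text).lower().split())


def similarity(a, b):
    """Return a 0-100 similarity score (difflib stands in for RapidFuzz here)."""
    return 100.0 * SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def flag_duplicates(df, key, threshold=90.0):
    """Assign each row a group ID, a match type, and a confidence score."""
    values = df[key].astype(str).tolist()
    # pandas marks rows whose key exactly repeats an earlier row
    exact = df.duplicated(subset=[key], keep="first").tolist()
    group_ids, match_types, confidences = [], [], []
    representatives = []  # (row position of a group's first member, group ID)

    for i, value in enumerate(values):
        scores = [(similarity(value, values[pos]), gid) for pos, gid in representatives]
        best_score, best_group = max(scores, default=(0.0, None))
        if best_group is not None and best_score >= threshold:
            group_ids.append(best_group)
            match_types.append("exact" if exact[i] else "fuzzy")
            confidences.append(100.0 if exact[i] else round(best_score, 1))
        else:
            # No close match: this row starts a new group and is not flagged
            new_id = len(representatives) + 1
            representatives.append((i, new_id))
            group_ids.append(new_id)
            match_types.append("")
            confidences.append(0.0)

    return df.assign(dup_group=group_ids, match_type=match_types, confidence=confidences)


records = pd.DataFrame(
    {"company": ["Acme Corp", "acme  corp ", "Globex", "Acme Corp", "Initech"]}
)
flagged = flag_duplicates(records, key="company")
print(flagged)
```

Each row is compared only against the first member of every existing group, which keeps the sketch short but still costs one comparison per group per row. On large files, a blocking step (for example, comparing only rows that share a first letter or postcode) would likely be needed.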
⏩ Get the duplicate finder script
# Cleaning and Standardizing Messy Exported Data
// The Pain Point
Data exported from external systems often arrives inconsistently formatted, with mixed date formats, inconsistent capitalization, phone numbers with varying separators, and trailing whitespace. Cleaning this manually before any analysis adds up quickly.
// What the Script Does
This script applies a configurable set of cleaning rules to an Excel or CSV file. These include standardizing dates, trimming whitespace, fixing capitalization, normalizing phone numbers and postcodes, removing blank rows, and flagging cells that appear incorrect. It outputs a cleaned file and a change log showing exactly what was changed.
// How It Works
The script reads a configuration file that maps column names to cleaning operations: date_format, title_case, strip_whitespace, phone_normalize, remove_blank_rows, and others. Each operation is applied in sequence. A side-by-side change log is written to a second sheet in the output, showing original versus cleaned values for every changed cell. Nothing is silently discarded. If a value cannot be parsed, it is flagged in a _clean_errors column.
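A rough sketch of the config-driven approach might look like this. The configuration shape (a dictionary mapping each column to a list of operation names), the accepted date formats, and the 10-digit phone rule are assumptions made for the example, not the script's actual settings. The row-level remove_blank_rows step is left out for brevity.

```python
import re
from datetime import datetime

import pandas as pd

DATE_INPUT_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")  # assumed set of accepted formats


def strip_whitespace(value):
    return value.strip() if isinstance(value, str) else value


def title_case(value):
    return value.title() if isinstance(value, str) else value


def phone_normalize(value):
    digits = re.sub(r"\D", "", str(value))
    if len(digits) != 10:  # assumed rule: 10-digit numbers only
        raise ValueError(f"expected 10 digits, got {len(digits)}")
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


def date_format(value):
    for fmt in DATE_INPUT_FORMATS:
        try:
            return datetime.strptime(str(value).strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date {value!r}")


OPERATIONS = {
    "strip_whitespace": strip_whitespace,
    "title_case": title_case,
    "phone_normalize": phone_normalize,
    "date_format": date_format,
}


def clean(df, config):
    """Apply each column's operations in order; return cleaned data and a change log."""
    out = df.copy()
    out["_clean_errors"] = ""
    changes = []
    for column, op_names in config.items():
        for idx in out.index:
            original = out.at[idx, column]
            if pd.isna(original):
                continue
            value = original
            try:
                for op_name in op_names:
                    value = OPERATIONS[op_name](value)
            except (ValueError, TypeError) as exc:
                # Keep the original value and record why it could not be cleaned
                out.at[idx, "_clean_errors"] += f"{column}: {exc}; "
                continue
            if value != original:
                out.at[idx, column] = value
                changes.append(
                    {"row": idx, "column": column, "original": original, "cleaned": value}
                )
    log = pd.DataFrame(changes, columns=["row", "column", "original", "cleaned"])
    return out, log


raw = pd.DataFrame(
    {
        "name": ["  alice smith", "BOB JONES "],
        "phone": ["(555) 123-4567", "12345"],
        "joined": ["03/15/2024", "2024-03-16"],
    }
)
config = {
    "name": ["strip_whitespace", "title_case"],
    "phone": ["phone_normalize"],
    "joined": ["date_format"],
}
cleaned, change_log = clean(raw, config)
print(cleaned)
print(change_log)
```

The unparseable phone number stays in place and is reported in `_clean_errors`, which mirrors the rule that nothing is silently discarded.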
⏩ Get the data cleaner script
# Splitting One Sheet into Separate Files by Column Value
// The Pain Point
A master dataset often needs to be distributed as separate files, such as one per region, department, or category. Doing this manually involves filtering, copying, and saving repeatedly, with a high risk of mixing up data between files.
// What the Script Does
This script reads a single Excel sheet and splits it into separate output files, one per unique value in a specified column. Each output file contains only the rows for that value, with the original formatting preserved. Filenames are generated automatically from the column values. Optionally, it can send each file as an email attachment using a name-to-email mapping you provide.
// How It Works
The script groups the DataFrame by the target column using pandas, then writes each group to its own .xlsx file using openpyxl. A naming template, like Sales_Report_{value}_{date}.xlsx, lets you control the output filename format. Column headers, data types, and basic formatting are preserved in each output file. An optional email mode reads a CSV mapping of {value} → {email address} and sends each file via the Simple Mail Transfer Protocol (SMTP).
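The group-and-name step could be sketched as below. The function names, the `value` and `date` placeholder names in the template, and the filename sanitizing rule are this example's own assumptions. The sketch returns the split frames instead of writing them, and it leaves out the email mode entirely.

```python
import re
from datetime import date

import pandas as pd


def safe_name(value):
    """Replace characters that are awkward in filenames with underscores."""
    return re.sub(r"[^A-Za-z0-9_-]+", "_", str(value)).strip("_")


def split_by_column(df, column, template="Sales_Report_{value}_{date}.xlsx", run_date=None):
    """Return a mapping of output filename to the rows for each unique value."""
    stamp = (run_date or date.today()).isoformat()
    outputs = {}
    # groupby skips rows where the column is missing (its default behavior)
    for value, group in df.groupby(column, sort=True):
        filename = template.format(value=safe_name(value), date=stamp)
        outputs[filename] = group.reset_index(drop=True)
    return outputs


sales = pd.DataFrame(
    {
        "region": ["North", "South", "North", "East/West"],
        "sales": [100, 200, 150, 50],
    }
)
outputs = split_by_column(sales, "region", run_date=date(2024, 1, 31))
for filename, part in outputs.items():
    print(filename, len(part))
```

Each frame in `outputs` could then be saved with `part.to_excel(filename, index=False)`, which uses openpyxl for .xlsx files. Sanitizing the value first matters because a region such as "East/West" would otherwise be read as a directory path.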
⏩ Get the sheet splitter script
# Generating a Summary Pivot Report from Raw Data
// The Pain Point
Producing a summary report from raw data (totals by category, monthly trends, or top performers) involves building pivot tables, formatting them, and copying the results into a presentable format. When the source data updates regularly, this process is repeated from scratch each time.
// What the Script Does
This script reads a raw data Excel file, builds configurable pivot summaries, and writes a formatted multi-tab summary report. Charts are generated and embedded in the output file. You can re-run it any time the source data changes.
// How It Works
A configuration file defines the date field, the value field, grouping columns, and specific aggregations to run. The script uses pandas for all aggregation logic and openpyxl with Matplotlib for chart generation. Each summary type is given its own tab. Conditional formatting highlights the highest and lowest values. The report is designed for on-demand regeneration, and running the script again overwrites the previous output cleanly.
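As an illustration of the aggregation step only, the sketch below builds one summary per grouping column plus a monthly summary. The configuration keys (`date_field`, `value_field`, `group_by`, `aggregations`), the column names, and the sample data are hypothetical. Chart generation and conditional formatting are not shown.

```python
import pandas as pd

# Hypothetical configuration; the real script's config format may differ
config = {
    "date_field": "order_date",
    "value_field": "revenue",
    "group_by": ["category"],
    "aggregations": ["sum", "mean"],
}


def build_summaries(df, config):
    """Return a dict of summary name to aggregated DataFrame, one per future tab."""
    data = df.copy()
    data[config["date_field"]] = pd.to_datetime(data[config["date_field"]])
    value = config["value_field"]
    summaries = {}

    # One summary per grouping column
    for column in config["group_by"]:
        summaries[f"by_{column}"] = data.groupby(column)[value].agg(config["aggregations"])

    # Monthly trend, keyed by a "YYYY-MM" label derived from the date field
    month = data[config["date_field"]].dt.to_period("M").astype(str).rename("month")
    summaries["monthly"] = data.groupby(month)[value].agg(config["aggregations"])
    return summaries


orders = pd.DataFrame(
    {
        "order_date": ["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-10"],
        "category": ["A", "B", "A", "B"],
        "revenue": [100, 50, 200, 150],
    }
)
summaries = build_summaries(orders, config)
for name, table in summaries.items():
    print(name)
    print(table)
```

Each entry in `summaries` maps naturally to one tab: inside a `pd.ExcelWriter` context, `table.to_excel(writer, sheet_name=name)` would write it to its own sheet, and re-running the script would replace the previous file.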
⏩ Get the pivot report generator script
# Wrapping Up
These five scripts cover common Excel tasks that are simple to automate but tedious to perform manually. Choose whichever one addresses the most frequent task in your workflow and start there. Here is a quick overview:
| Script Name | Purpose | Key Features | Best Use Case |
| --- | --- | --- | --- |
| Excel Files Merger | Combine multiple Excel/CSV files | Column alignment, source tracking, summary sheet | Consolidating data from multiple sources |
| Duplicate Finder | Identify exact and fuzzy duplicates | Fuzzy matching, confidence scores, color highlighting | Cleaning datasets with repeated records |
| Data Cleaner | Standardize messy exported data | Formatting rules, normalization, change log | Preprocessing raw external data |
| Sheet Splitter | Split one sheet into multiple files | Auto file naming, grouping, optional email sending | Distributing reports by category/region |
| Pivot Report Generator | Create summary reports from raw data | Automated pivots, charts, multi-tab output | Recurring reporting and dashboards |
Happy automating!
Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she's working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.

