
Grocery Store Pipeline

A modular data pipeline for ingesting, transforming, and storing grocery store data. It extracts records from an Excel CRM workbook, cleans and transforms them, and then stores them in a MySQL database.

This pipeline streamlines the management of grocery store sales data, offering a systematic approach to handling records from extraction to storage while ensuring adaptability to evolving business needs. It also allows customization of batch size for efficient processing on each run.



The pipeline has three main stages, each implemented in its own script:

  • Extraction (data_ingestion.py)

  • Transformation (data_transformation.py)

  • Storage (data_storage.py)

All three stages are orchestrated by pipeline.py.
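
As a rough illustration of how an orchestrator might chain the three stages with a configurable batch size, here is a minimal sketch. The function names (extract, transform, store, run_pipeline), the record fields (sale_id, product), and the in-memory list used as a sink are all hypothetical. The actual contents of data_ingestion.py, data_transformation.py, data_storage.py, and pipeline.py are not shown here and may differ.

```python
# Hypothetical stand-ins for the three stage scripts; not the project's real code.

def extract(rows, batch_size):
    """Yield the source rows in batches of at most batch_size."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


def transform(batch):
    """Example cleaning step: trim product names, drop rows without an id."""
    return [
        {**row, "product": row["product"].strip()}
        for row in batch
        if row.get("sale_id") is not None
    ]


def store(batch, sink):
    """Append the cleaned batch to the sink and return how many rows were added."""
    sink.extend(batch)
    return len(batch)


def run_pipeline(rows, sink, batch_size=2):
    """Run extract -> transform -> store batch by batch; return rows stored."""
    stored = 0
    for batch in extract(rows, batch_size):
        stored += store(transform(batch), sink)
    return stored
```

A call such as run_pipeline(rows, sink, batch_size=500) would process the input 500 rows at a time, so memory use depends on the batch size rather than on the size of the whole workbook.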


When executed, the pipeline also performs the following tasks:

  1. Database Connection: uses SQLAlchemy to connect to the local MySQL database.

  2. Table Creation: dynamically creates tables if they do not already exist in the database.

  3. Periodic Data Storage: stores records periodically in the SQL database, ensuring data integrity by avoiding duplicates.
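
The table-creation and duplicate-avoidance tasks above can be sketched as follows. To keep the example runnable without a database server, it uses Python's built-in sqlite3 module in place of SQLAlchemy and MySQL. The table name (sales), its columns, and the choice of sale_id as the primary key are assumptions made for illustration, and the project may avoid duplicates in a different way.

```python
import sqlite3


def ensure_table(conn):
    """Create the (hypothetical) sales table only if it does not exist yet."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS sales (
            sale_id INTEGER PRIMARY KEY,
            product TEXT NOT NULL,
            amount  REAL NOT NULL
        )
        """
    )


def count_rows(conn):
    """Return the current number of rows in the sales table."""
    return conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]


def store_batch(conn, records):
    """Insert records, skipping any sale_id already stored; return rows added."""
    before = count_rows(conn)
    # INSERT OR IGNORE is SQLite syntax; the MySQL counterpart is INSERT IGNORE.
    conn.executemany(
        "INSERT OR IGNORE INTO sales (sale_id, product, amount) VALUES (?, ?, ?)",
        records,
    )
    conn.commit()
    return count_rows(conn) - before
```

Because the primary key rejects repeated sale_id values, running the same batch twice leaves the table unchanged, which is one way to make repeated or periodic runs safe.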

