rumineykova edited this page Oct 23, 2023 · 22 revisions

FabGuard - FabFlee Input File Verification with Pandera

Introduction

FabGuard is a Python library that simplifies input file verification. It is based on the data validation library Pandera and adapted for FabFlee. This documentation will guide you through the steps to use FabGuard for input file verification.

Prerequisites

Before you get started with FabGuard, make sure you have the following prerequisites in place:

  • The Pandas library installed.
  • The Pandera library installed.

Project Structure

FabGuard is a plugin for FabFlee. The structure of the FabGuard folder is as follows:

  • tests Folder: Contains schemes (tests) for various input files. For example, the closure_scheme folder contains verification tests for the closure file.

  • config.py: Contains configuration information, including the names of test files.

  • error_messages.py: Contains error messages used in your verification checks.

  • fab_guard.py: The main wrapper for Pandera tests. It defines decorators, such as fg.log for functions that define error messages and fg.check for functions that should be executed as part of the test suite. It also provides utility functions like load_files for reading a CSV file and returning a DataFrame, and transpose for transposing a CSV file.

Each scheme file contains a class that inherits from pa.DataFrameModel.

Important Util Functions

To avoid redundant reads, each test file is loaded into memory only once. The singleton class FabGuard handles this: every caller goes through the same shared instance, so a file that has already been read is not read again. Load a file through that instance as follows:

FabGuard.get_instance().load_file(config.routes)
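
The load-once behaviour described above can be sketched as a small caching singleton. This is only an illustration of the pattern, not FabGuard's actual implementation: the class name FileCache and its internals are assumptions, and only the method names get_instance and load_file are taken from the call shown above.

```python
import pandas as pd


class FileCache:
    """Hypothetical stand-in for a load-once singleton (not FabGuard's code)."""

    _instance = None  # the single shared instance, created on first use

    def __init__(self):
        # Maps a file path to the DataFrame that was read from it.
        self._frames = {}

    @classmethod
    def get_instance(cls):
        # Create the shared instance on the first call, then keep returning it.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def load_file(self, path):
        # Read the CSV only if this path has not been loaded before.
        if path not in self._frames:
            self._frames[path] = pd.read_csv(path)
        return self._frames[path]
```

With this sketch, two calls to FileCache.get_instance().load_file(path) for the same path return the same DataFrame object, so the file is read from disk once.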

How-To: Creating Tests for an Input File

In this guide, we will create tests for the locations.csv file as an example. Follow these steps to create your validation tests:

  1. Create a Python class that inherits from pa.DataFrameModel.

  2. In this class, define constraints for each column as fields of the class. For example, if you have a routes file with columns like name1, name2, distance, and forced_redirection, define the constraints as follows:

name1: Series[pa.String] = pa.Field(nullable=False, alias='#"name1"')
name2: Series[pa.String] = pa.Field(nullable=False)
distance: Series[int] = pa.Field(ge=0)
forced_redirection: Series[float] = pa.Field(isin=[0, 1, 2], nullable=True)