Assertions (deprecated)

Assert on profiling statistics

PipeRider Assertions have been deprecated since v0.25.0. Please replace them with the relevant testing methods offered by dbt tests.

Assertions are the data testing solution in PipeRider. An assertion checks whether the profiling result fulfills a certain rule. There are two types of assertions:

  • PipeRider assertions

  • dbt assertions

PipeRider Assertions

PipeRider assertions check the profiling result of each run.

Assertion files

Assertion files are located in .piperider/assertions/

File naming convention

Assertion files are YAML files and are named according to the data source table name:

<table>.yml

If you generated 'recommended assertions' with piperider generate-assertions, the assertion file names are prefixed with 'recommended_':

recommended_<table>.yml
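
The naming convention above can be sketched as a small path helper. This is only an illustration: assertion_file_path is a hypothetical function, not part of PipeRider, and it assumes the .piperider/assertions/ directory named earlier.

```python
from pathlib import Path

# Directory named in the text above; the helper below is hypothetical.
ASSERTIONS_DIR = Path(".piperider") / "assertions"


def assertion_file_path(table: str, recommended: bool = False) -> Path:
    """Return the expected assertion file path for a table name."""
    prefix = "recommended_" if recommended else ""
    return ASSERTIONS_DIR / f"{prefix}{table}.yml"


print(assertion_file_path("movies").as_posix())
print(assertion_file_path("movies", recommended=True).as_posix())
```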

Example assertion file

The following is an excerpt of an assertion file for a movie database table:

# Auto-generated by PipeRider based on table "movies"
movies:  # Table Name
  # Test Cases for Table
  tests:
  - metric: row_count
    assert:
      gte: 8961
    tags:
    - RECOMMENDED
  columns:
    title:  # Column Name
      # Test Cases for Column
      tests:
      - name: assert_column_schema_type
        assert:
          schema_type: VARCHAR
        tags:
        - RECOMMENDED
      - name: assert_column_not_null
        tags:
        - RECOMMENDED

Profile Assertions

Profile assertions are the most common way to define an assertion. They check whether a profiling statistic fulfills a certain rule.

Assertion expressions

Description: Profiling-based assertions check the value of a profiling field.

  • Metric: The profile field defined in profiling

  • Assert:

    • gte: the value should be greater than or equal to

    • gt: the value should be greater than

    • lte: the value should be less than or equal to

    • lt: the value should be less than

    • eq: the value should be equal to

    • ne: the value should not be equal to

The row count should be <= 1000000

world_city:
  tests:
  - metric: row_count
    assert:
      lte: 1000000

The missing percentage should be <= 0.01

world_city:
  columns:
    country_code:
      tests:
      - metric: nulls_p
        assert:
          lte: 0.01

The median should be in the range [10, 20]

world_city:
  columns:
    country_code:
      tests:
      - metric: p50
        assert:
          gte: 10
          lte: 20
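
The comparison keys listed under "Assertion expressions" can be read as ordinary comparison operators applied to the profiled value. The following is a minimal sketch of that reading, not PipeRider's actual implementation: check_metric is a hypothetical name, and it assumes that several keys under one assert must all hold, as the median range example suggests.

```python
import operator

# Operator keys as listed under "Assertion expressions".
OPERATORS = {
    "gte": operator.ge,
    "gt": operator.gt,
    "lte": operator.le,
    "lt": operator.lt,
    "eq": operator.eq,
    "ne": operator.ne,
}


def check_metric(actual, expectations):
    """Return True only if `actual` satisfies every key in `expectations`."""
    return all(
        OPERATORS[key](actual, expected)
        for key, expected in expectations.items()
    )


# A profiled median (p50) of 15 against the range [10, 20].
print(check_metric(15, {"gte": 10, "lte": 20}))
# A profiled row_count above the 1000000 limit.
print(check_metric(1_000_001, {"lte": 1_000_000}))
```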

Basic Assertions

Basic assertions are high-level assertions that check whether a column is not null or unique, and whether the column values (rather than profiling statistics) fulfill a certain rule.

assert_column_unique
  • Description: The values of the column must be unique.

  • Assert: None

  • Tags:

world_city:
  columns:
    country_code:
      tests:
      - name: assert_column_unique
        tags:
          - dialing code

assert_column_not_null
  • Description: The values of the column must not be null.

  • Assert: None

  • Tags:

world_city:
  columns:
    name:
      tests:
      - name: assert_column_not_null
        tags:
          - city name

assert_column_value
  • Description: The column values must fall within the specified range or set.

  • Assert:

    • gte: the value should be greater than or equal to

    • gt: the value should be greater than

    • lte: the value should be less than or equal to

    • lt: the value should be less than

    • in: the value should belong to the set

The value should be in the range [0, 10000)

world_city:
  columns:
    population:
      tests:
      - name: assert_column_value
        assert:
          gte: 0
          lt: 10000

The value of a datetime type column should be >= '2022-01-01'

world_city:
  columns:
    create_at:
      tests:
      - name: assert_column_value
        assert:
          gte: '2022-01-01'

The value of the column should belong to ["male", "female"] set

TITANIC:
  columns:
    Sex:
      tests:
      - name: assert_column_value
        assert:
          in: ["male", "female"]
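
Unlike profile assertions, assert_column_value applies its rule to the column values themselves. A rough sketch of that idea follows, assuming the values are already loaded into a Python list and contain no nulls. The violations helper is hypothetical and is not how PipeRider evaluates the rule internally.

```python
import operator

# Range keys accepted by assert_column_value, per the list above.
COMPARISONS = {
    "gte": operator.ge,
    "gt": operator.gt,
    "lte": operator.le,
    "lt": operator.lt,
}


def violations(values, expectations):
    """Return the values that break at least one rule in `expectations`."""

    def satisfies(value):
        for key, expected in expectations.items():
            if key == "in":
                if value not in expected:
                    return False
            elif not COMPARISONS[key](value, expected):
                return False
        return True

    return [value for value in values if not satisfies(value)]


# Range [0, 10000): the upper bound is excluded.
print(violations([0, 9999, 10000, -1], {"gte": 0, "lt": 10000}))
# Set membership.
print(violations(["male", "female", "unknown"], {"in": ["male", "female"]}))
```

An empty list means every value passed. ISO-formatted date strings such as '2022-01-01' compare correctly as plain strings, so the same helper covers the datetime example in this sketch.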

Schema Assertions

assert_column_exist
  • Description: The column must exist.

  • Assert: None

  • Tags:

world_city:  # Table Name
  columns:
    country_code:
      tests:
      - name: assert_column_exist
        tags:
          - dialing code

assert_column_type
  • Description: The type of the column must match the specified type.

  • Assert:

    • type: numeric, string, datetime

  • Tags:

world_city:
  columns:
    name:
      tests:
      - name: assert_column_type
        assert:
          type: string
        tags:
          - city name

assert_column_schema_type
  • Description: The column schema type must match the specified schema type.

  • Assert:

    • schema_type: the schema type in the data source (e.g. TEXT, DATE, VARCHAR(128), ...)

world_city:
  columns:
    name:
      tests:
      - name: assert_column_schema_type
        assert:
          schema_type: TEXT

assert_column_in_types
  • Description: The type of the column must be contained in the list.

  • Assert:

    • types: [string, integer, numeric, datetime, boolean, other]

  • Tags:

world_city:  # Table Name
  columns:
    country_code:
      tests:
      - name: assert_column_in_types
        assert:
          types: [string]
        tags:
          - dialing code

dbt Assertions

PipeRider can also integrate dbt test results. To do so, run piperider with the --dbt-run-results option; the latest dbt run results are then included in the run report.

dbt build  # or dbt test
piperider run --dbt-run-results

From version 0.26.0, dbt test results are included by default, and it is not necessary to use the --dbt-run-results option.
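
To get a feel for what the dbt run results contain, the sketch below counts test outcomes from dbt's run_results.json artifact. It is an illustration only and does not show how PipeRider reads the file: summarize_tests and load_run_results are hypothetical helpers, the default path target/run_results.json assumes a standard dbt project layout, and the sample data is hand-written rather than real output.

```python
import json
from collections import Counter
from pathlib import Path


def load_run_results(path="target/run_results.json"):
    """Parse dbt's run results artifact (default path is an assumption)."""
    return json.loads(Path(path).read_text())


def summarize_tests(run_results):
    """Count statuses of test nodes in a parsed run_results document."""
    return Counter(
        result["status"]
        for result in run_results.get("results", [])
        if result.get("unique_id", "").startswith("test.")
    )


# Hand-written sample shaped like the artifact; not real dbt output.
sample = {
    "results": [
        {"unique_id": "model.demo.world_city", "status": "success"},
        {"unique_id": "test.demo.not_null_world_city_name", "status": "pass"},
        {"unique_id": "test.demo.unique_world_city_id", "status": "fail"},
    ]
}
print(dict(summarize_tests(sample)))
```

In a real project, summarize_tests(load_run_results()) would be run after dbt build or dbt test has written the artifact.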
