ALFA command-line

ALFA CLI is a command-line tool for working with ALFA ( .alfa ) files.

See Install & Setup for instructions on installing the ALFA CLI.

The alfa command can be used for several scenarios.

  1. Compile
  2. Code generation - ‘import’ external models to ALFA, or ‘export’ ALFA models to target languages/platforms.
  3. Create an install package
  4. Execute Data Quality rules

Commands can be abbreviated to their first letter, for example -c instead of -compile, -e for -export, -i for -import.

The path parameter can be used to specify:

  1. A path to a .alfa file, or to other model files when importing models.
  2. A path to a source directory root containing .alfa files at the top level or within sub-directories.
  3. A path to a .alfamod.zip file.

Compile

Compile the files in the specified path.

Optionally specify a modules path to use for resolving project references to modules.

alfa
    -c:ompile
    -p:ath <path>
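
For example, to compile all .alfa files under a src/models directory (the directory name here is illustrative):

alfa -c -p src/models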

Code Generation

Generate code for the model content loaded from the specified path.

Optionally specify types to restrict the output to the given list of types and their derived/dependent/reachable definitions.

Output will be written to the directory specified by -outputDir.

alfa
    -g:enerate [ java | python | html ]
    [ -t:ypes <types> ]
    -p:ath <path>
    -o:utputDir <dir>
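
For example, to generate Java classes for a single type and its reachable definitions (the type name and directories here are illustrative):

alfa -g java -t com.acme.Trade -p src/models -o ./generated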

Execute Data Quality rules

ALFA DQ rules and assert expressions can be run against CSV, JSON, JSONL or Avro data, a JDBC table, or a Spark DataFrame.

Example:

alfa --dq --datafiles data/SalesData.csv --types Validate.SaleRecord -o ./generated model

Usage:

alfa
    --dq
    --datafiles <path to csv, json, jsonl, avro, spark conf file>
    --types <type> The main type that will be used to validate the data
    [--settings <k1=v1;k2=v2>] Optional settings to configure the behaviour of the DQ run. Multiple settings are separated by `;` characters (see the example after the settings list below).
    -o:utputDir <dir> Write DQ run results as JSON data and an HTML report.

Supported settings

  • skip-unknown-fields Ignore unknown fields in the data, i.e. those not found in the model.
  • use-cached-classes Optimise re-runs of DQ by caching generated in-memory rules.
  • csv-delimiter Override the default comma delimiter.
  • csv-has-header Whether the CSV file has a header. Default: true.
  • csv-use-header Whether the CSV header should be used to match columns to fields. Default: true. If false, the assumption is that column order matches field order.
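
For example, to validate a pipe-delimited CSV file that has no header row (the file, type and directory names are illustrative; the settings value is quoted so the shell does not split on the `;` separator):

alfa --dq --datafiles data/sales.csv --types Validate.SaleRecord --settings "csv-delimiter=|;csv-has-header=false" -o ./out model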

DQ Against Spark

Using a Spark configuration file, DQ can be run against a DataFrame and the DQ results stored in another DataFrame.

E.g.

alfa --dq --datafiles config/sales.spark-dq.json -v -o . model

The Spark config JSON file is defined as per the following DQCommand ALFA definition. Contact us for more details.

record DQCommand {
    requestId : uuid
    master : string
    sparkConfs : map< string, string >?
    sourceSpec : DataFrameSpec
    logLevel : string = "warn"
    referencedSpecs : set< ReferenceSpec >?
    outputSpec : OutputDataFrameSpec?
}

record OutputDataFrameSpec {
    format : string
    saveTarget : string?
    options : map< string, string >?
}

record ReferenceSpec {
    sourceSpec : DataFrameSpec
    refMethod : ReferenceMethodType?
}

enum ReferenceMethodType {
    Join
    Lookup
}

record DataFrameSpec {
    AlfaTypeName : string
    format : string
    options : map< string, string >?
    load : string?
    where : string?
}
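
As an illustration, a minimal config file for the earlier sales example might look like the following. Every value shown (the requestId UUID, master URL, type name, paths and formats) is hypothetical and should be replaced with your own; optional fields such as logLevel, sparkConfs and referencedSpecs are omitted.

{
    "requestId" : "3f7c9a1e-5b2d-4e8a-9c41-0d6f2a7b8e13",
    "master" : "local[*]",
    "sourceSpec" : {
        "AlfaTypeName" : "Validate.SaleRecord",
        "format" : "csv",
        "options" : { "header" : "true" },
        "load" : "data/SalesData.csv"
    },
    "outputSpec" : {
        "format" : "parquet",
        "saveTarget" : "dq-results"
    }
}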