ALFA command-line¶
ALFA CLI is a command-line tool for working with ALFA ( .alfa ) files.
See Install & Setup for instructions on installing the ALFA CLI.
The alfa command can be used for several scenarios:
- Compile
- Code generation - ‘import’ external models to ALFA, or ‘export’ ALFA models to target languages/platforms.
- Create install package
- Execute Data Quality rules
Commands can be abbreviated to their first letter, for example -c instead of -compile, -e for export, -i for import.
The path parameter can be used to specify:
- A path to a .alfa file, or to other model files when importing models.
- A path to a source directory root containing .alfa files at the top level or within sub-directories.
- A path to a .alfamod.zip file.
Compile¶
Compile the files in the specified path.
Optionally specify a modules path to use for resolving project references to modules.
alfa
-c:ompile
-p:ath <path>
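For example, to compile all .alfa files under a source directory, using the single-letter abbreviations (the directory name here is illustrative):

```
alfa -c -p src/models
```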
Code Generation¶
Generate code for the content loaded from the path.
Optionally specify types to restrict the output to that list of types and their derived/dependent/reachable definitions.
Output will be written to the directory specified by -outputDir .
alfa
-g:enerate [ java | python | html ]
[ -t:ypes <types> ]
-p:ath <path>
-o:utputDir <dir>
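For example, to generate Java code for a single type and the definitions it reaches (the type and directory names here are illustrative):

```
alfa -g java -t Validate.SaleRecord -p src/models -o generated/java
```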
Execute Data Quality rules¶
ALFA DQ rules or assert expressions can be run against CSV, JSON, JSONL or Avro data, a JDBC table, or a Spark dataframe.
Example:
alfa --dq --datafiles data/SalesData.csv --types Validate.SaleRecord -o ./generated model
Usage:
alfa
--dq
--datafiles <path to csv, json, jsonl, avro, spark conf file>
--types The main type that will be used to validate the data
[--settings <k1=v1;k2=v2>] Optional setting to configure behaviour of DQ run. Multiple settings are separated by `;` characters.
-o:utputDir <dir> Write DQ run results as JSON data and HTML report.
Supported settings¶
- skip-unknown-fields : Ignore unknown fields in the data, i.e. those not found in the model.
- use-cached-classes : Optimise re-runs of DQ by caching generated in-memory rules.
- csv-delimiter : Override the default comma as delimiter.
- csv-has-header : Does the csv file have a header. Default true .
- csv-use-header : Should the csv header be used to match to fields. Default true . If false , the assumption is that column order will match field order.
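For example, a DQ run over a pipe-delimited CSV file with no header row might combine settings as follows (the file and type names here are illustrative):

```
alfa --dq --datafiles data/sales.psv --types Validate.SaleRecord --settings "csv-delimiter=|;csv-has-header=false" -o ./dq-out model
```

Quoting the --settings value stops the shell from interpreting the `;` and `|` characters.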
DQ Against Spark¶
Using a Spark configuration, DQ can be run against a dataframe and the DQ results stored into another dataframe.
E.g.
alfa --dq --datafiles config/sales.spark-dq.json -v -o . model
The Spark config JSON file is defined as per the following DQCommand
ALFA definition. Contact us for more details.
record DQCommand {
requestId : uuid
master : string
    sparkConfs : map< string, string >?
sourceSpec : DataFrameSpec
logLevel : string = "warn"
referencedSpecs : set< ReferenceSpec >?
outputSpec : OutputDataFrameSpec?
}
record OutputDataFrameSpec {
format : string
saveTarget : string?
options : map< string, string >?
}
record ReferenceSpec {
sourceSpec : DataFrameSpec
refMethod : ReferenceMethodType?
}
enum ReferenceMethodType {
Join
Lookup
}
record DataFrameSpec {
AlfaTypeName : string
format : string
options : map< string, string >?
load : string?
where : string?
}
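As a sketch, a Spark config file corresponding to the DQCommand record might look like the following JSON (all field values here are illustrative; the exact options accepted by each format follow Spark's own reader options):

```json
{
  "requestId": "7f6c1c2e-0000-4000-8000-000000000001",
  "master": "local[*]",
  "logLevel": "warn",
  "sourceSpec": {
    "AlfaTypeName": "Validate.SaleRecord",
    "format": "csv",
    "options": { "header": "true" },
    "load": "data/SalesData.csv"
  },
  "outputSpec": {
    "format": "parquet",
    "saveTarget": "out/dq-results"
  }
}
```

Optional fields such as sparkConfs and referencedSpecs can be omitted, and logLevel falls back to its declared default of "warn".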