Structured Data (JSON, YAML, CSV) Importer
ALFA can derive a data model from formatted data in CSV, JSON, YAML or XML. This accelerates the creation of an initial schema with high accuracy; the generated model may need only minor adjustments.
The structured data importer analyses the structure of the input and generates a model. Because structured data does not contain type information, the model may contain generated type names. These can easily be changed to user-specified names.
The ALFA model importer is run from the command line using the ALFA command-line tool.
Arguments to the CLI:
- -importers : Must be specified as structureddata.
- -output : The ALFA model is generated into this path.
- -settings namespace=<your namespace> : The structureddata importer requires a settings parameter containing the namespace to be used in the generated model.
- The last parameter is the input data file to import, with a .csv, .json, .yaml or .xml extension.
Executing an Importer
Use the following form:
alfa -importers structureddata -settings namespace=demo -output generated transactions.json
Alternatively, use the Maven plugin CLI:
mvn -U com.schemarise.alfa.utils:alfa-maven-plugin:3.4.8:cli -Dalfa.importers=structureddata -Dalfa.output=generated -Dalfa.settings=namespace=demo -Dalfa.sourcepath=sample.csv
NOTE: The namespace settings parameter is mandatory.
CSV Example
Given a sample.csv file:
Index,CustomerId,FirstName,LastName,Company,City,Country,Phone1,Phone2,Email,SubscriptionDate,Website
1,DD37Cf93aecA6Dc,Sheryl,Baxter,Rasmussen Group,East Leonard,Chile,229.077.5154,397.884.0519x718,zunigavanessa@smith.info,2020-08-24,http://www.stephenson.com/
2,1Ef7b82A4CAAD10,Preston,Lozano,Vega-Gentry,East Jimmychester,Djibouti,5153435776,686-620-1820x944,vmata@colon.com,2021-04-23,http://www.hobbs.com/
3,6F94879bDAfE5a6,Roy,Berry,Murillo-Perry,Isabelborough,Antigua and Barbuda,+1-539-402-0259,(496)978-3969x58947,beckycarr@hogan.com,2020-03-25,http://www.lawrence.com/
The ALFA model shown below is generated.
record CsvImported {
Index : int
CustomerId : string
FirstName : string
LastName : string
Company : string
City : string
Country : string
Phone1 : string
Phone2 : string
Email : string
SubscriptionDate : date
Website : string
}
NOTES:
- ISO formatted date/datetime values are recognised as dates/datetimes. Non-ISO formats can be specified using settings.
- Numeric types (int, long, double) are recognised based on the values; the most appropriate type that can hold all values in the column is chosen.
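The column typing described in the notes can be illustrated with a small Python sketch. This is not ALFA's implementation; it only shows one plausible way to pick the narrowest type that holds every value in a column. The function names, the 32-bit bound assumed for int, and the fallback to string when kinds are mixed are all assumptions made for illustration.

```python
import csv
import io
import re
from datetime import date, datetime

# Assumption: `int` is treated as a 32-bit signed value, `long` as anything larger.
INT_MIN, INT_MAX = -2**31, 2**31 - 1
NUMERIC_RANK = {"int": 0, "long": 1, "double": 2}


def infer_value_type(text):
    """Classify a single CSV cell (hypothetical helper, not an ALFA API)."""
    if re.fullmatch(r"[+-]?\d+", text):
        return "int" if INT_MIN <= int(text) <= INT_MAX else "long"
    if re.fullmatch(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?", text):
        return "double"
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):
        try:
            date.fromisoformat(text)  # rejects impossible dates such as 2020-13-40
            return "date"
        except ValueError:
            return "string"
    if re.match(r"\d{4}-\d{2}-\d{2}[T ]", text):
        try:
            datetime.fromisoformat(text)
            return "datetime"
        except ValueError:
            return "string"
    return "string"


def widen(a, b):
    """Return the narrowest type able to hold values of both a and b."""
    if a == b:
        return a
    if a in NUMERIC_RANK and b in NUMERIC_RANK:
        return a if NUMERIC_RANK[a] > NUMERIC_RANK[b] else b
    return "string"  # assumption: mixed kinds fall back to string


def infer_column_types(csv_text):
    """Map each header name to the type inferred across all data rows."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    types = {}
    for row in data:
        for name, cell in zip(header, row):
            t = infer_value_type(cell)
            types[name] = widen(types[name], t) if name in types else t
    return types
```

Under these assumptions, a column such as Phone1 above stays a string: 5153435776 on its own looks like a long, but the other rows contain punctuation, so the only type that holds every value is string.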
JSON Example
Given a transactions.json file:
{
"Age" : 26,
"Name" : "Joe Bloggs",
"Salary" : 32000.00,
"LoyaltyPoints" : [ 1, 2, 3 ],
"Accounts" : [ {
"Id" : 10901238,
"Currency" : "EUR",
"Balance" : 1200.00
},
{
"Id" : 20901472,
"Currency" : "CHF",
"Balance" : 1200.00
},
{
"Id" : 90321234,
"Currency" : "USD",
"Balance" : 23200.00,
"Type" : "Savings"
} ]
}
The ALFA model shown below is generated.
namespace demo
record Rec1 {
Id : int ## 10901238
Currency : string ## "EUR"
Balance : double ## 1200.0
Type : string? ## "Savings"
}
record Rec2 {
Age : int ## 26
Name : string ## "Joe Bloggs"
Salary : double ## 32000.0
LoyaltyPoints : list< int > ## [1,2,3]
Accounts : list< Rec1 >
}
Observations:
- Type names are generated uniquely and can be replaced with more appropriate names.
- All types are derived correctly
- The ‘Type’ field exists in only one of the nested objects and is therefore generated as an optional field
- Comments indicate the data value used to derive the type
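The optional-field observation can be sketched in Python as follows. This is an illustrative approximation, not the importer's actual algorithm: sibling objects from the same array are merged into one record shape, and any field missing from at least one object is marked optional with a trailing ?. The function names are hypothetical, and the boolean type name is a placeholder that does not appear in the example above.

```python
import json


def scalar_type(value):
    """Map a parsed JSON scalar to a type name (hypothetical helper)."""
    # bool must be tested before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"  # placeholder name, an assumption
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "double"
    return "string"


def merge_objects(objects):
    """Merge sibling JSON objects into one field -> type mapping.

    A field that is absent from any of the objects gets a '?' suffix,
    mirroring the optional `Type : string?` field in the generated model.
    """
    fields = {}
    for obj in objects:
        for name, value in obj.items():
            # Keep the type from the first object that carries the field.
            fields.setdefault(name, scalar_type(value))
    return {
        name: t if all(name in obj for obj in objects) else t + "?"
        for name, t in fields.items()
    }


accounts = json.loads("""
[ { "Id": 10901238, "Currency": "EUR", "Balance": 1200.00 },
  { "Id": 20901472, "Currency": "CHF", "Balance": 1200.00 },
  { "Id": 90321234, "Currency": "USD", "Balance": 23200.00, "Type": "Savings" } ]
""")
account_fields = merge_objects(accounts)
```

With the Accounts array from transactions.json, account_fields maps Id to int, Currency to string, Balance to double and Type to string?, matching the shape of Rec1.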