Structured Data (JSON, YAML, CSV) Importer

ALFA can derive a data model from structured data in CSV, JSON, YAML or XML format. This accelerates the creation of an accurate initial schema, which may then need only minor adjustments.

The structured data importer analyses the structure of the input and generates a model. Because structured data carries no type names, the model may contain generated type names, which can easily be replaced with user-specified names.
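As an illustration of replacing generated type names, the following is a minimal sketch in Python that post-processes the generated model text. It is not part of ALFA; the function name rename_types and the target names Account and Customer are hypothetical, and the sample text is a shortened form of the kind of model the importer generates.

```python
import re


def rename_types(model_text: str, renames: dict[str, str]) -> str:
    """Replace whole-word occurrences of generated type names in one pass."""
    if not renames:
        return model_text
    # \b ensures Rec1 does not match inside a longer identifier such as Rec10.
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in renames) + r")\b")
    return pattern.sub(lambda m: renames[m.group(1)], model_text)


# Shortened sample of generated model text (assumed for illustration).
generated = """record Rec1 {
      Id : int
}

record Rec2 {
      Accounts : list< Rec1 >
}
"""

# Hypothetical user-chosen names.
renamed = rename_types(generated, {"Rec1": "Account", "Rec2": "Customer"})
print(renamed)
```

A single-pass substitution avoids chained renames, where the new name of one type happens to equal the old name of another.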

The ALFA model importer is run from the command line using the ALFA command-line tool.

The CLI takes the following arguments:

  1. -importers Must be set to structureddata.
  2. -output The path into which the ALFA model is generated.
  3. -settings namespace=<your namespace> The namespace to use in the generated model. The structureddata importer requires this setting.
  4. The last argument is the input data file to import, with a .csv, .json, .yaml or .xml extension.

Executing an Importer

Use the following form:

alfa -importers structureddata -settings namespace=demo -output generated transactions.json

Alternatively, use the Maven plugin CLI:

mvn -U com.schemarise.alfa.utils:alfa-maven-plugin:3.4.8:cli -Dalfa.importers=structureddata -Dalfa.output=generated -Dalfa.settings=namespace=demo -Dalfa.sourcepath=sample.csv

NOTE: The namespace settings parameter is mandatory.

CSV Example

Given a sample.csv file:

Index,CustomerId,FirstName,LastName,Company,City,Country,Phone1,Phone2,Email,SubscriptionDate,Website
1,DD37Cf93aecA6Dc,Sheryl,Baxter,Rasmussen Group,East Leonard,Chile,229.077.5154,397.884.0519x718,zunigavanessa@smith.info,2020-08-24,http://www.stephenson.com/
2,1Ef7b82A4CAAD10,Preston,Lozano,Vega-Gentry,East Jimmychester,Djibouti,5153435776,686-620-1820x944,vmata@colon.com,2021-04-23,http://www.hobbs.com/
3,6F94879bDAfE5a6,Roy,Berry,Murillo-Perry,Isabelborough,Antigua and Barbuda,+1-539-402-0259,(496)978-3969x58947,beckycarr@hogan.com,2020-03-25,http://www.lawrence.com/

The ALFA model shown below is generated.

record CsvImported {
  Index : int
  CustomerId : string
  FirstName : string
  LastName : string
  Company : string
  City : string
  Country : string
  Phone1 : string
  Phone2 : string
  Email : string
  SubscriptionDate : date
  Website : string
}

NOTES:

  1. ISO-formatted date and datetime values are recognised as date and datetime types. Non-ISO formats can be specified using settings.
  2. Numeric types (int, long, double) are inferred from the values: each column is given the most appropriate type that can hold all of its values.
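The column typing described in these notes can be sketched as follows. This is a hedged Python illustration of the general idea, not ALFA's actual implementation: the 32-bit range for int, the widening order int, long, double, and the fallback to string for mixed columns are all assumptions, and the helper names are hypothetical. The sample data is a subset of the columns in sample.csv.

```python
import csv
import io
import re
from datetime import date, datetime

INT_RE = re.compile(r"[+-]?\d+")
DOUBLE_RE = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")
INT_MIN, INT_MAX = -2**31, 2**31 - 1  # assumption: int is 32-bit
NUMERIC_RANK = {"int": 0, "long": 1, "double": 2}  # assumed widening order


def infer_value_type(value: str) -> str:
    """Classify a single cell as int, long, double, date, datetime or string."""
    if INT_RE.fullmatch(value):
        return "int" if INT_MIN <= int(value) <= INT_MAX else "long"
    if DOUBLE_RE.fullmatch(value):
        return "double"
    try:
        date.fromisoformat(value)  # e.g. 2020-08-24
        return "date"
    except ValueError:
        pass
    try:
        datetime.fromisoformat(value)  # e.g. 2020-08-24T10:15:00
        return "datetime"
    except ValueError:
        return "string"


def infer_column_type(values: list[str]) -> str:
    """Pick one type able to hold every value in the column."""
    kinds = {infer_value_type(v) for v in values}
    if not kinds:
        return "string"
    if len(kinds) == 1:
        return kinds.pop()
    if kinds.issubset(NUMERIC_RANK):
        # Mixed numeric column: widen to the largest numeric type seen.
        return max(kinds, key=NUMERIC_RANK.__getitem__)
    return "string"  # assumption: any other mixture falls back to string


sample = """Index,Phone1,SubscriptionDate
1,229.077.5154,2020-08-24
2,5153435776,2021-04-23
3,+1-539-402-0259,2020-03-25
"""
rows = list(csv.DictReader(io.StringIO(sample)))
column_types = {name: infer_column_type([r[name] for r in rows]) for name in rows[0]}
print(column_types)
# {'Index': 'int', 'Phone1': 'string', 'SubscriptionDate': 'date'}
```

Phone1 shows why the type is chosen per column rather than per value: 5153435776 alone looks numeric, but the other rows force the column to string.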

JSON Example

Given a transactions.json file:

{
  "Age" : 26,
  "Name" : "Joe Bloggs",
  "Salary" : 32000.00,
  "LoyaltyPoints" : [ 1, 2, 3 ],

  "Accounts" : [ {
    "Id" : 10901238,
    "Currency" : "EUR",
    "Balance" : 1200.00
  },
  {
    "Id" : 20901472,
    "Currency" : "CHF",
    "Balance" : 1200.00
  },
  {
    "Id" : 90321234,
    "Currency" : "USD",
    "Balance" : 23200.00,
    "Type" : "Savings"
  } ]
}

The ALFA model shown below is generated.

namespace demo


record Rec1 {
      Id : int ## 10901238
      Currency : string ## "EUR"
      Balance : double ## 1200.0
      Type : string? ## "Savings"
}


record Rec2 {
      Age : int ## 26
      Name : string ## "Joe Bloggs"
      Salary : double ## 32000.0
      LoyaltyPoints : list< int > ## [1,2,3]
      Accounts : list< Rec1 >
}

Observations:

  1. Generated type names are unique and can be replaced with more meaningful names.
  2. All types are derived correctly: the int, double, string and list types match the values in the input.
  3. The ‘Type’ field exists in only one of the nested objects, so it is generated as an optional field.
  4. Comments indicate the data value used to derive each type.
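The optional-field observation can be illustrated with a short Python sketch. It assumes a simple rule, that a field is optional when at least one object in the list lacks it; this rule is inferred from the example above rather than taken from ALFA's implementation, and the function name merge_object_fields is hypothetical.

```python
import json


def merge_object_fields(objects: list[dict]) -> dict[str, bool]:
    """Return {field name: is_optional} across a list of JSON objects."""
    all_fields: list[str] = []
    for obj in objects:
        for key in obj:
            if key not in all_fields:
                all_fields.append(key)  # keep first-seen order
    # A field is optional if any object does not contain it.
    return {f: any(f not in obj for obj in objects) for f in all_fields}


# The Accounts array from transactions.json.
accounts = json.loads("""[
  {"Id": 10901238, "Currency": "EUR", "Balance": 1200.00},
  {"Id": 20901472, "Currency": "CHF", "Balance": 1200.00},
  {"Id": 90321234, "Currency": "USD", "Balance": 23200.00, "Type": "Savings"}
]""")

optional = merge_object_fields(accounts)
for name, is_optional in optional.items():
    # Mirror the model notation, where a trailing ? marks an optional field.
    print(f"{name}{'?' if is_optional else ''}")
```

Only Type is flagged, which matches the Type : string? field in the generated record.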