Dojo will optionally convert model data into the Geotemporal format used in several applications.
This document further describes the Geotemporal format and elaborates on some less common data registration scenarios.
The Geotemporal format is a tabular data representation stored as gzipped parquet. Data has a fixed set of columns plus arbitrary qualifier columns, as in the following example:
| timestamp | country | admin1 | admin2 | admin3 | lat | lng | feature | value | qualifier_1 |
|---|---|---|---|---|---|---|---|---|---|
| 1546318800 | Ethiopia | Afar | Afar Zone 3 | Gewane | 10.16807 | 40.64634 | feature1 | 1 | |
| 1561953600 | Ethiopia | Afar | Afar Zone 3 | Gewane | 10.16807 | 40.64634 | feature1 | 2 | |
| 1577854800 | Ethiopia | Afar | Afar Zone 3 | Gewane | 10.16807 | 40.64634 | feature1 | 3 | |
| 1546318800 | Ethiopia | Afar | Afar Zone 3 | Gewane | 10.16807 | 40.64634 | feature2 | 100 | maize |
| 1561953600 | Ethiopia | Afar | Afar Zone 3 | Gewane | 10.16807 | 40.64634 | feature2 | 90 | maize |
| 1577854800 | Ethiopia | Afar | Afar Zone 3 | Gewane | 10.16807 | 40.64634 | feature2 | 80 | maize |
The fixed columns are `[timestamp, country, admin1, admin2, admin3, lat, lng, feature, value]`. Here `qualifier_1` is the name of the qualifier which qualifies `feature2`. Note that the fixed columns are nullable, but data that does not have at least some notion of time or place is not particularly useful.
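Model output usually arrives in a wide layout, with one column per feature. Reaching the long layout shown above therefore mostly means reshaping each input row into one row per feature. The following is a minimal, standard-library Python sketch of that reshaping, not Dojo's actual implementation. The helper `to_geotemporal_rows` and its `qualifiers` mapping are hypothetical, and writing the result to gzipped parquet is a separate step that is not shown.

```python
from typing import Any, Dict, Iterable, List, Optional

# Fixed Geotemporal columns, in the order listed above.
FIXED_COLUMNS = ["timestamp", "country", "admin1", "admin2", "admin3",
                 "lat", "lng", "feature", "value"]
# Fixed columns copied straight from each input record (everything except
# feature and value, which are produced by the reshape).
LOCATION_TIME_COLUMNS = FIXED_COLUMNS[:7]


def to_geotemporal_rows(
    records: Iterable[Dict[str, Any]],
    features: List[str],
    qualifiers: Optional[Dict[str, List[str]]] = None,
) -> List[Dict[str, Any]]:
    """Reshape wide records (one column per feature) into one row per feature.

    `qualifiers` (hypothetical) maps a feature name to the columns that qualify it.
    """
    qualifiers = qualifiers or {}
    qualifier_columns = sorted({q for cols in qualifiers.values() for q in cols})
    rows: List[Dict[str, Any]] = []
    for record in records:
        for feature in features:
            # Fixed columns are nullable, so absent keys become None.
            row = {col: record.get(col) for col in LOCATION_TIME_COLUMNS}
            row["feature"] = feature
            row["value"] = record.get(feature)
            # A qualifier is filled in only for the features it qualifies.
            for q in qualifier_columns:
                row[q] = record.get(q) if q in qualifiers.get(feature, []) else None
            rows.append(row)
    return rows


# One wide record that reproduces the first and fourth rows of the table above.
wide = [{
    "timestamp": 1546318800, "country": "Ethiopia", "admin1": "Afar",
    "admin2": "Afar Zone 3", "admin3": "Gewane", "lat": 10.16807,
    "lng": 40.64634, "feature1": 1, "feature2": 100, "qualifier_1": "maize",
}]
rows = to_geotemporal_rows(wide, ["feature1", "feature2"],
                           qualifiers={"feature2": ["qualifier_1"]})
for row in rows:
    print(row)
```

Because every row carries every qualifier column, `qualifier_1` stays empty (`None`) for `feature1`, which matches the blank cells in the example table.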
Converting indicator datasets and model output into this format is the goal of the Dojo data pipeline. The example above illustrates the target format in more detail. Model output and indicator datasets are not expected to start in this format.
This example is available in gzipped parquet here.
In some instances a model may have date data that represents a range of dates. For example, `2017/2018` represents a start date and an end date. The Geotemporal format supports only a single date field, with the column name `timestamp`, so a multi-date should be divided into separate columns. Using the example above, this would correspond to:

| Start Date | End Date | Country | Crop Index |
|---|---|---|---|

where one date would be marked as `primary_date = True` and the other would become a qualifier column, as described in the data registration document.
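The split itself can be sketched in Python. This is a minimal, standard-library illustration. The input column name `Date`, the `/` separator, the helper name `split_date_range`, and the sample values are all assumptions. Marking one column as the primary date happens during registration in Dojo and is not part of this code.

```python
from typing import Dict


def split_date_range(record: Dict[str, str], column: str = "Date",
                     sep: str = "/") -> Dict[str, str]:
    """Replace a 'start/end' range column with 'Start Date' and 'End Date'."""
    parts = record[column].split(sep)
    if len(parts) != 2:
        raise ValueError(f"expected 'start{sep}end' in {column!r}, got {record[column]!r}")
    start, end = (part.strip() for part in parts)
    # Keep every other column unchanged, after the two new date columns.
    rest = {key: value for key, value in record.items() if key != column}
    return {"Start Date": start, "End Date": end, **rest}


# Illustrative input row; the "Crop Index" value is made up.
row = {"Date": "2017/2018", "Country": "Ethiopia", "Crop Index": "0.5"}
print(split_date_range(row))
# {'Start Date': '2017', 'End Date': '2018', 'Country': 'Ethiopia', 'Crop Index': '0.5'}
```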
By convention, we expect a date range to be represented by the first date of that range. For example, a date point representing the entire month of May 2020 could be presented as `5/1/2020` (May 1, 2020). Alternatively, Dojo provides a mechanism for the user to “build a date” when the day, month, and year are in separate columns and there is no single date column.
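The first-date convention, applied to components stored in separate columns, can be sketched as follows. This is a hypothetical helper (`build_date`), not Dojo's date-building mechanism. It assumes numeric (or numeric-string) year, month, and day values and fills a missing month or day with 1.

```python
from datetime import date
from typing import Optional, Union

Part = Union[int, str]


def build_date(year: Part, month: Optional[Part] = None,
               day: Optional[Part] = None) -> date:
    """Assemble a date from separate year, month, and day values.

    A missing month or day defaults to 1, so a value that stands for a whole
    year or month maps to the first date of that range.
    """
    return date(int(year),
                1 if month is None else int(month),
                1 if day is None else int(day))


print(build_date(2020, 5))            # 2020-05-01 (all of May 2020)
print(build_date("2020", "5", "17"))  # 2020-05-17
print(build_date(2020))               # 2020-01-01 (all of 2020)
```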
Dates are standardized to the Gregorian calendar. One example of a non-standard calendar is the Ethiopian calendar. Dates in a non-standard calendar should be converted to Gregorian datetimes.
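As an example of such a conversion, an Ethiopian date can be mapped to a Gregorian one using Julian Day Number arithmetic. The sketch below assumes the Amete Mihret epoch (JDN 1724221), which conversion routines commonly use. The Ethiopian calendar has 12 months of 30 days plus a 13th month, Pagume, of 5 days, or 6 days in years where `year % 4 == 3`. The function name is hypothetical, and Dojo's own conversion may differ.

```python
from datetime import date

# Julian Day Number of Meskerem 1, year 1 (Amete Mihret era), the epoch
# commonly used for Ethiopian calendar conversions (an assumption here).
ETHIOPIAN_EPOCH_JDN = 1724221
# Offset from a Julian Day Number to Python's proleptic Gregorian ordinal
# (ordinal 1 is 0001-01-01, which is JDN 1721426).
JDN_TO_ORDINAL = 1721425


def ethiopian_to_gregorian(year: int, month: int, day: int) -> date:
    """Convert an Ethiopian calendar date to a Gregorian datetime.date."""
    if not 1 <= month <= 13:
        raise ValueError("month must be in 1..13")
    # Months 1-12 have 30 days; Pagume has 6 days in leap years, else 5.
    max_day = 30 if month < 13 else (6 if year % 4 == 3 else 5)
    if not 1 <= day <= max_day:
        raise ValueError(f"day must be in 1..{max_day} for month {month}")
    # 365 days per year plus one leap day for each earlier year with year % 4 == 3.
    jdn = (ETHIOPIAN_EPOCH_JDN + 365 * (year - 1) + year // 4
           + 30 * (month - 1) + (day - 1))
    return date.fromordinal(jdn - JDN_TO_ORDINAL)


print(ethiopian_to_gregorian(2016, 1, 1))   # 2023-09-12 (Ethiopian New Year)
print(ethiopian_to_gregorian(2015, 13, 6))  # 2023-09-11 (last day of a leap year)
```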
The Geotemporal format reserves the following column names: `timestamp`, `country`, `admin1`, `admin2`, `admin3`, `lat`, `lng`, `feature`, and `value`. If data is submitted with one of these column names but the column does not represent that entity, the submitted column name will be appended with the suffix