2 Formats & standards
2.1 Data formats
Data generated from or integrated into WaSHI can be non-digital or digital.
Non-digital data
Non-digital data, such as field forms, management surveys, and chain of custody forms, are manually recorded on paper forms. Paper forms must be transcribed or converted to digital file formats, and the paper originals are then stored in the WaSHI filing cabinet in the Natural Resources Building in Olympia.
Digital data
Digital data include tabular, spatial, and binary data, such as lab results, sample locations, and field photos. Non-conventional data also include code, algorithms, tools, and workflows.
Tabular data include comma separated values (csv), tab separated values (tsv), Microsoft Excel open XML spreadsheet (xlsx), and portable document format (pdf).
Spatial data include file geodatabases (gdb), vector shapefiles (zipped folder containing multiple file extensions), and keyhole markup language (kml or kmz). Tabular data may also contain spatial data such as longitude and latitude.
Binary data include photos (jpeg, png, gif, tiff), videos (mp4), code (R, py, js), and object-oriented data files (RDS, RData, parquet, arrow).
Proprietary data formats include Microsoft Excel, Word, and PowerPoint files (xlsx, docx, pptx). RDS and RData files are examples of application-specific data formats that can only be opened using the R programming language or RStudio IDE. These types of files should be saved together with a copy of the data in a non-proprietary, open-standard format, such as csv, to maintain accessibility for those who do not have Microsoft Office or do not use R.
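One way to check the guidance above is to scan a folder for proprietary or application-specific data files that have no open-format companion. The following is a minimal sketch, not an established WaSHI tool: the function name `missing_csv_copies` and the convention that the csv copy shares the original file's base name are assumptions made for illustration.

```python
from pathlib import Path

# Extensions treated here as proprietary or application-specific
# (taken from the formats named in the text; extend as needed).
NEEDS_OPEN_COPY = {".xlsx", ".rds", ".rdata"}


def missing_csv_copies(folder):
    """Return names of data files in `folder` that lack a same-named csv copy.

    Assumption: the open-format copy has the same base name as the original,
    e.g. results.xlsx is paired with results.csv.
    """
    folder = Path(folder)
    missing = []
    for path in sorted(folder.iterdir()):
        if path.is_file() and path.suffix.lower() in NEEDS_OPEN_COPY:
            if not path.with_suffix(".csv").exists():
                missing.append(path.name)
    return missing
```

Calling `missing_csv_copies` on a project data folder returns a list of file names that still need a csv copy; an empty list means every flagged file has one.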
Written documents and presentations are in formats including Microsoft Word and PowerPoint (docx and pptx), hypertext markup language (HTML), and pdf.
Notebooks combine text with executable code to generate written documents and presentations in docx, pptx, html, or pdf formats. These notebooks are stored in formats depending on the programming language: a few examples include R markdown (rmd), Quarto (qmd), and Jupyter notebook (ipynb).
The list below is not exhaustive and will continue to grow as additional data sources are discovered.
Type | Source | Formats |
---|---|---|
Lab results | Provided by an analytical lab, study PI, or grower | csv, xlsx, pdf, xml, json, RDS, RData |
Management surveys | Collected through interviews with grower | csv, xlsx, RDS, RData, scanned paper form |
Field forms | Completed in the field during/immediately after sampling | pdf, scanned paper form, csv, xlsx |
Sample locations | Identified prior to sampling using ArcGIS Online and updated while sampling using ArcGIS Field Maps | ArcGIS feature layer, shp, kmz, csv, xlsx |
Chain of custody forms | Completed prior to shipping or dropping off samples | pdf, scanned paper form |
Climate data | OSU PRISM, NOAA, Esri Living Atlas | csv, shp, netCDF, tiff, gdb |
Soil data | NRCS Web Soil Survey, NRCS WA gSSURGO | gdb, accdb |
Images | Logos, icons, photos taken in the field | jpeg, png, gif, tiff, svg |
Videos | Recordings of meetings, training videos | mp4 |
Documents | Reports, manuscripts, SOP, QAPP, factsheets, brochures | docx, txt, html, pdf |
Presentations | PowerPoints, slide decks | pptx, html, pdf |
Code | Scripts for wrangling and analyzing data; markdown for documents and presentations; style sheets for html | R, py, ipynb, js, yml, rmd, qmd, css, scss |
2.2 Data standards
Dates will be expressed as YYYY-MM-DD, following the ISO 8601 standard.
Dates with times will be expressed as YYYY-MM-DDTHH:MM:SSZ.
- T separates the date from the time.
- Z designates the time zone, written either as Z or as an offset from UTC (±HH:MM).
  - Use Z for Coordinated Universal Time (UTC), which has no offset.
  - Pacific Standard Time (PST) offset is -08:00: YYYY-MM-DDTHH:MM:SS-08:00
  - Pacific Daylight Time (PDT) offset is -07:00: YYYY-MM-DDTHH:MM:SS-07:00
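The date and date-time conventions above can be produced with Python's standard library. This is a minimal sketch under stated assumptions: the helper names `iso_date` and `iso_datetime` and the sample dates are illustrative only, and the fixed PST and PDT offsets stand in for a full time zone database. ISO 8601 writes offsets with two-digit hours, so the output reads -08:00 and -07:00.

```python
from datetime import date, datetime, timedelta, timezone

# Fixed offsets for Pacific time; a real workflow might instead use a
# time zone database to choose PST or PDT automatically.
PST = timezone(timedelta(hours=-8))
PDT = timezone(timedelta(hours=-7))


def iso_date(d):
    """Format a date as YYYY-MM-DD."""
    return d.isoformat()


def iso_datetime(dt):
    """Format a timezone-aware datetime as YYYY-MM-DDTHH:MM:SS plus Z or an offset."""
    if dt.tzinfo is None:
        raise ValueError("datetime must carry a time zone")
    text = dt.replace(microsecond=0).isoformat()
    # Python writes UTC as +00:00; replace it with the Z designator.
    return text[:-6] + "Z" if text.endswith("+00:00") else text


# Illustrative values only.
print(iso_date(date(2024, 1, 15)))                                        # 2024-01-15
print(iso_datetime(datetime(2024, 1, 15, 9, 30, tzinfo=PST)))             # 2024-01-15T09:30:00-08:00
print(iso_datetime(datetime(2024, 7, 15, 9, 30, tzinfo=PDT)))             # 2024-07-15T09:30:00-07:00
print(iso_datetime(datetime(2024, 1, 15, 17, 30, tzinfo=timezone.utc)))   # 2024-01-15T17:30:00Z
```

Rejecting datetimes without a time zone is a design choice in this sketch: a timestamp with no Z or offset would not satisfy the format above.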
Geospatial data will be accompanied by metadata that abides by the ISO 19115 standard and follows Esri’s documentation when using ArcGIS Pro. Metadata contains information about the identification, extent, quality, spatial and temporal schema, spatial reference, and distribution of digital geographic data.
Code will follow the style guide in Chapter 9.