2  Formats & standards

2.1 Data formats

Data generated from or integrated into WaSHI can be non-digital or digital.

Non-digital data

Non-digital data, such as field forms, management surveys, and chain of custody forms, are manually recorded on paper forms. Paper forms must be transcribed or converted to digital file formats and then stored in the WaSHI filing cabinet in the Natural Resources Building in Olympia.

Digital data

Digital data include tabular, spatial, and binary data, such as lab results, sample locations, and field photos. Non-conventional data also include code, algorithms, tools, and workflows.

Tabular data include comma separated values (csv), tab separated values (tsv), Microsoft Excel open XML spreadsheet (xlsx), and portable document format (pdf).

Spatial data include file geodatabases (gdb), vector shapefiles (zipped folder containing multiple file extensions), keyhole markup language (kml or kmz). Tabular data may also contain spatial data such as longitude and latitude.

Binary data include photos (jpeg, png, gif, tiff), videos (mp4), code (R, py, js), and object-oriented data files (RDS, Rdata, parquet, arrow).

Proprietary data formats include Microsoft Excel, Word, and Powerpoint files (xlsx, docx, pptx). RDS and RData files are examples of application-specific data formats that can only be opened using the R programming language or RStudio IDE. These types of files should be saved in conjunction with a copy of the data in a non-proprietary and open-standard format, such as csv, to maintain accessibility for those who do not have Microsoft Office or do not use R.

Written documents and presentations are in formats including Microsoft Word and PowerPoint (docx and pptx), hypertext markup language (HTML), and pdf.

Notebooks combine text with executable code to generate written documents and presentations in docx, pptx, html, or pdf formats. These notebooks are stored in formats depending on the programming language: a few examples include R markdown (rmd), Quarto (qmd), and Jupyter notebook (ipynb).

The list below is not exhaustive and will continue to grow as additional data sources are discovered.

Type Source Formats
Lab results Provided by an analytical lab, study PI, or grower csv, xlsx, pdf, xml, json, RDS, RData
Management surveys Collected through interviews with grower csv, xlsx, RDS, RData, scanned paper form
Field forms Completed in the field during/immediately after sampling pdf, scanned paper form, csv, xlsx
Sample locations Identified prior to sampling using ArcGIS Online and updated while sampling using ArcGIS Field Maps ArcGIS feature layer, shp, kmz, csv, xlsx
Chain of custody forms Completed prior to shipping or dropping off samples pdf, scanned paper form
Climate data OSU PRISM, NOAA, Esri Living Atlas csv, shp, netCDF, tiff, gdb
Soil data NRCS Web Soil Survey, NRCS WA gSSURGO gdb, accdb
Images Logos, icons, photos taken in the field jpeg, png, gif, tiff, svg
Videos Recordings of meetings, training videos mp4
Documents Reports, manuscripts, SOP, QAPP, factsheets, brochures docx, txt, html, pdf
Presentations PowerPoints, slide decks pptx, html, pdf
Code Scripts for wrangling and analyzing data; markdown for documents and presentations; style sheets for html R, py, ipynb, js, yml, rmd, qmd, css, scss

2.2 Data standards

Date will be expressed as YYYY-MM-DD according to ISO 8601 standard.

Date with time will be expressed as YYYY-MM-DDTHH:MM:SSZ.

  • T separates date from time.
  • Z designates the time zone (Z or -HH:MM).
    • Z if using Universal Time Coordinated (UTC) with no offset.
    • Pacific Standard Time (PST) offset is -8:00.
      YYYY-MM-DDTHH:MM:SS-8:00
    • Pacific Daylight Time (PDT) offset is -7:00.
      YYYY-MM-DDTHH:MM:SS-7:00

ISO 8601, Randall Munroe’s xkcd

ISO 8601, Randall Munroe’s xkcd

Geospatial data will be accompanied by metadata that abides by the ISO 19115 standard and follows Esri’s documentation when using ArcGIS Pro. Metadata contains information about the identification, extent, quality, spatial and temporal schema, spatial reference, and distribution of digital geographic data.

Code will follow the style guide in Chapter 9.