6 Documentation

Documentation is the process of recording all aspects of project design; sampling; lab analyses; data cleaning; data analyses; data quality control and assurance procedures; and development of decision-support tools. Seem familiar? These are the steps of the data life cycle. Documentation helps to:

standardize procedures
enable reproducibility
establish credibility
ensure others (including our future selves) use and interpret data correctly
provide searchability

All documentation (including this document) should be updated (and versioned) as procedures change and lessons are learned.

Samples collected by WSDA for the SOS must follow the procedures and standards described in the documentation below.

External data must have, at minimum, the documentation outlined in Section 6.4 to be integrated into the SOS dataset.

6.1 Project-level

Project-level documentation includes all descriptive information about the SOS dataset, as well as planning decisions and process documentation. Documentation includes quality assurance project plans, standard operating procedures, and other high-level documents (e.g., request for proposals, applications, meeting agendas/notes).

Quality assurance project plan (QAPP)

The QAPP is the highest level of project documentation. It covers the project description; personnel roles and responsibilities; project timelines; data and measurement quality objectives; study design; and overviews of field, laboratory, and quality control procedures.

Ours can be found in Y:/NRAS/soil-health-initiative/state-of-the-soils/qapp.

Standard operating procedures (SOP)

SOPs provide detailed instructions for field, lab, or data processing procedures and decision-making processes.

Ours can be found in Y:/NRAS/soil-health-initiative/state-of-the-soils/sop.

SOS sampling

The purpose of this SOP is to detail the procedures for a typical site visit in which soil samples are collected for physical, chemical, and biological soil health indicator analyses. Procedures include equipment preparation prior to sampling; best practices for filling out field forms; the selection of sampling locations; sampling protocols; sample handling and storage; and submitting samples to the lab. Following this SOP ensures data quality by creating audit trails and reminders to check that data are present, complete, and accurate. Additionally, this SOP will be used to maintain consistent sample collection procedures throughout the state for WSDA employees and partners.

Quality assurance / quality control (QA/QC)

This SOP outlines the process for screening sample metadata and lab results for completeness, consistency, and quality. Procedures involve subject matter expertise, investigation, communication with sampling teams and labs, algorithmic quality control, and tagging sample results with quality codes (listed in the table below). Data are then integrated into the statewide database.

Code	Tag	Description	Inclusion in analyses
0	Excellent	Met lab’s and WSDA’s QC criteria	Yes
100	Estimate	Interpolated missing value	Yes
110	Derived	Derived from an estimated value	Yes
120	Suspect	Absolute value of the z-score is ≥ 3	Yes
130	Calculated ND	Calculated value using at least one ND	Yes
140	Non-detect	Below the method detection limit	No
160	Poor	Did not meet lab’s QC criteria	No
180	Outlier	Outlier, designated by soil scientist	No
200	Unknown	External dataset	Yes
ND = non-detect
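
The quality codes above lend themselves to a simple lookup. The following is a minimal Python sketch (the SOS scripts themselves are written in R), not the SOP's actual implementation. The `QC_CODES` mapping is copied from the table; the record layout, the function names, and the choice to compute z-scores from the sample mean and standard deviation of a single list of values are assumptions made for illustration.

```python
from statistics import mean, stdev

# Quality codes copied from the table above: code -> (tag, included in analyses).
QC_CODES = {
    0: ("Excellent", True),
    100: ("Estimate", True),
    110: ("Derived", True),
    120: ("Suspect", True),
    130: ("Calculated ND", True),
    140: ("Non-detect", False),
    160: ("Poor", False),
    180: ("Outlier", False),
    200: ("Unknown", True),
}


def flag_suspect(values, threshold=3.0):
    """Return code 120 where |z-score| >= threshold, otherwise code 0.

    Assumption: z-scores use the sample mean and standard deviation of
    `values` (at least two values are required).
    """
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return [0] * len(values)
    return [120 if abs((v - mu) / sigma) >= threshold else 0 for v in values]


def keep_for_analysis(results):
    """Keep only results whose quality code is marked 'Yes' in the table."""
    return [r for r in results if QC_CODES[r["qc_code"]][1]]


# Hypothetical long-format results (sample IDs and values are made up).
results = [
    {"sample_id": "S1", "measurement": "ph", "value": 6.8, "qc_code": 0},
    {"sample_id": "S2", "measurement": "ph", "value": 7.1, "qc_code": 140},
    {"sample_id": "S3", "measurement": "ph", "value": 5.9, "qc_code": 120},
]
kept = keep_for_analysis(results)
print([r["sample_id"] for r in kept])  # ['S1', 'S3']
```

In this sketch, suspect values (code 120) stay in the analysis set, matching the "Yes" in the table, while non-detects, poor results, and outliers are dropped.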

6.2 Dataset-level

Dataset-level documentation applies to lab results, sample locations, grower information, and management data. Readmes and changelogs document what each dataset contains, how the datasets are related, potential issues to be aware of, and any alterations made to the data. See below for examples of what to include.

Readme

Readme files are plain text documents that contain information about the files in a folder, explanations of versioning, and instructions or metadata for data packages. These files are saved as .txt rather than as MS Word documents, which take longer to open and can only be opened on computers with Microsoft Word installed.

Describe contents of folder

The readme.txt in the _complete-dataset folder describes each file’s structure, contents, and other pertinent information, such as data sources.

View the example

Last modified: 2024-03-18

INSTRUCTIONS
Place the latest, complete SOS data in this folder as RDS and csv files. Update the readme and data dictionary as needed. Document the changes in the changelog.txt.

ROOT: complete dataset with documentation.
sos_lab-results-wide has one sample per row with each measurement as a separate column.
sos_lab-results-long has one result per row with columns for sample identification, measurement name, result value, quality code, and lab. See `~qc/qc_codes.csv` for quality code descriptions.
sos_sample-locations has the coordinates, sample ID, crop info, and PRISM mean annual precipitation and mean annual temperature (800 m resolution, 30-year normals, 1991-2020).
example-data.csv contains 100 random samples that have been anonymized with fake sampleIDs, county, farm, producer, and field names. WSU SCBG samples are excluded, as are the 0-6in to 6-12in WSDA samples.
data-dictionary lists each variable with its short name (variable name without units), description, indicator type (chemical, physical, biological), unit (if applicable), label, and data type. This spreadsheet also contains tabs describing the variables in the sample-locations and qc-codes spreadsheets.
qc_codes describes the quality codes.
readme.txt (this file) describes the contents of the root folder and subfolders.
changelog.txt documents changes made to the data and organization.

MANAGEMENT SURVEYS
Since management surveys aren't yet compiled, they will be placed in this folder.
JR will work on a crosswalk between old sample IDs and new sample IDs to make joining with lab data easier.

REFERENCE: dictionaries and data used in code.
column-names is a dictionary to convert the old naming convention (Ca_mg.kg) to the new convention (ca_mg_kg).
crop-group-type is a dictionary mapping WSDA crop group to crop type.
format-survey-crop-names.csv is a dictionary mapping crop_type (snake_case) to Crop Type (Title Case).
lab-methods describes the lab, measurement name, method, and mdl.
soiltest-mdl contains Soiltest measurement names, methods, descriptions, units, and minimum detection limits.
staff-contacts lists year, sampling organization, and sampling contact(s).
ppt and temp are {sf} dataframes containing MAP (mm/yr) and MAT (deg C) downloaded from the {prism} package (800 m resolution, 30-year normals, 1991-2020). Created from download-spatial-data.R.
wa is a {sf} dataframe containing the WA boundary downloaded from the {rgeoboundaries} package. Created from download-spatial-data.R.
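
The example readme distinguishes a long format (one result per row) from a wide format (one sample per row, one column per measurement). As a rough illustration of how the two relate, here is a minimal Python sketch that pivots long records into wide ones. The column names (`sample_id`, `measurement`, `value`, `qc_code`, `lab`) and the sample values are hypothetical stand-ins, not the actual columns of the SOS files.

```python
def long_to_wide(long_rows):
    """Pivot one-result-per-row records into one-sample-per-row records.

    Only the measurement name and result value are carried over; the
    quality code and lab columns are dropped in this simplified sketch.
    """
    wide = {}
    for row in long_rows:
        sample = wide.setdefault(row["sample_id"], {"sample_id": row["sample_id"]})
        sample[row["measurement"]] = row["value"]
    return list(wide.values())


# Hypothetical long-format rows.
long_rows = [
    {"sample_id": "S1", "measurement": "ph", "value": 6.8, "qc_code": 0, "lab": "Lab A"},
    {"sample_id": "S1", "measurement": "ca_mg_kg", "value": 1520.0, "qc_code": 0, "lab": "Lab A"},
    {"sample_id": "S2", "measurement": "ph", "value": 7.1, "qc_code": 0, "lab": "Lab A"},
]

wide_rows = long_to_wide(long_rows)
for row in wide_rows:
    print(row)
```

A sample with no result for a given measurement simply lacks that key here; a real wide table would more likely store an explicit missing value.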

Explain versions

The readme.txt in the 2023_sampling > lab-data > raw folder explains why there are two different versions of the lab results and where to find additional information.

View the example

2023-08-21
2023_wsda-soil-health_v1.xlsx has errors. See email for details.
'          'v2.xlsx still has some errors, that are cleaned up in the R scripts.

Provide instructions

Another readme.txt explains how to use the files in the ArcGIS soil sample points folder on box.com. When this folder is shared with partners, the readme orients them to the folder contents and helps them modify the files as needed for their own projects.

View the example

Template for Soil Sample Points ArcGIS

2023-06-01

Jadey Ryan | Washington State Department of Agriculture (WSDA)
jryan@agr.wa.gov

Purpose:
To provide a template for ESRI ArcGIS data entry and management for soil sampling projects.

Folder contents:
- readme.txt describes the folder contents and provides general instructions.
- template-soil-sample-points.aprx is an ArcGIS Project File that should allow you to open the project in ArcGIS Pro.
- template-soil-sample-points.gdb is a file geodatabase, which you can open in ArcGIS Pro or ArcGIS Online.
- crop-domain.csv provides WSDA approved crop types to use in ArcGIS 'Table to Domain' geoprocessing tool. We highly recommend using attribute domains, which are rules to enforce data integrity by limiting the field type and choices of an attribute field.
- 923-nras-soil-health-sop-web.pdf is WSDA's Standard Operating Procedure for soil sampling. The appendices contain instructions for the GIS workflow of soil sampling.
- /screenshots/feature-layer-offline-editing.png shows the checkboxes required for editing.
- /screenshots/configure-forms.png shows where to click within a Web Map to open the Field Maps form editor.
- /screenshots/table-to-domain.png is an example of how to use the 'Table to Domain' geoprocessing tool with the crop-domain.csv.
- /screenshots/field-maps-form_*.png are examples of the Field Maps form structure, which can be created in ArcGIS Online.
- washi_soil-series-rest-service is an internet shortcut to the URL for the Washington clipped soil series, originating from NRCS gSSURGO. When compositing multiple samples together from one field, we recommend keeping all sample points within one soil series to reduce variability in the composite samples.
- The other folders (Index, GpMessages, ImportLog, .backups) are part of the ArcGIS Pro project and can be ignored.

Instructions:
1. Open template-soil-sample-points.aprx.
2. Update attribute fields and domains to work with your project.
3. Update symbology, labels, visibility scale, popups, etc. The symbology of the sample points currently defaults to red when the 'Show/Hide Field Form' attribute is still 'Hide' or 'NULL'. When the sampler collects the sample and changes this attribute to 'Show', the point will turn yellow to indicate the sample has been collected.
4. Share as Feature Layer and create a Web Map to allow others access to this map.
5. Open the map in ArcGIS Online and click on 'Forms' in the right toolbar to configure your Field Maps form.
6. Use the /screenshots/field-maps-form_*.pngs as a guide, but adapt the field form structure to suit your project.

Offline Areas:
- If you anticipate needing to sample without cellular service or wifi access: 1) Open the hosted feature layer, 2) Click 'Settings' 3) Confirm 'Enable Sync' is turned on.
- If you need the Soil Series layer, you will need to configure two separate Web Maps: one with and one without. This is because the washi_soil-series-rest-service is not allowed in Web Maps with 'Enable Sync' turned on. 
- Alternatively, you can create your own Soil Series layer and host it as a feature layer in your own organization.

Resources:
- WSDA Soil Health YouTube channel: https://www.youtube.com/playlist?list=PL0pB20prk7Ni1daEYiEEXSWy8CfwO34FC
- WSDA WaSHI_SoilSeries MapServer: https://fortress.wa.gov/agr/gis/wsdagis/rest/services/NRAS/WaSHI_SoilSeries/MapServer
- Introduction to attribute domains: https://pro.arcgis.com/en/pro-app/latest/help/data/geodatabases/overview/an-overview-of-attribute-domains.htm
- Introducing smart forms in ArcGIS Field Maps: https://www.esri.com/arcgis-blog/products/field-maps/field-mobility/introducing-arcgis-smart-forms/

Changelog

Changelogs are also simple, concise plain text documents. They are saved in a folder alongside the data files and record changes to the dataset. For more information, see keepachangelog.com/.

At a minimum, the changelog.txt contains:

date of modification
initials of who made the changes
description of the changes
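
The three minimum elements above can be assembled into a one-line entry. The following is a minimal Python sketch under stated assumptions: the function names are hypothetical, and the `date initials description` layout is one possible format, not a required one.

```python
from datetime import date


def changelog_entry(initials, description, when=None):
    """Build a one-line entry: ISO date, initials, description."""
    if not initials.strip() or not description.strip():
        raise ValueError("initials and description are both required")
    if when is None:
        when = date.today()
    return f"{when.isoformat()} {initials.strip()} {description.strip()}"


def append_to_changelog(path, entry):
    """Append the entry as a new line at the end of a plain text changelog."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(entry + "\n")


entry = changelog_entry("JR", "added 2023 data.", when=date(2023, 8, 21))
print(entry)  # 2023-08-21 JR added 2023 data.
# append_to_changelog("changelog.txt", entry)  # uncomment to write to disk
```

Using ISO dates (YYYY-MM-DD) keeps entries sortable as plain text.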

See the example changelog.txt in the _complete-dataset folder.

View the example

Contents of changelog.txt in _complete-dataset folder:

2022-12-15 JR standardized texture classes (title case, no extra white space) and converted texture, county, and crop to factor types.
2023-01-03 JR corrected error in 2022 WSDA bulk density measurements in 2022-11-01_soiltestData_manualCleanup.xlsx.
2023-03-02 JR recoded SCBG producer IDs to match WSDA format and cleaned up farm/producer names and IDs. See new scbg_producerId_recode.csv for list.
2023-03-07 JR corrected Okanogan producer and field names (Devany, Townsend).
2023-03-21 JR corrected cropType "Fallow, Idle" to "Fallow" and 2021 SCBG "Pea" samples to "Pea, Dry", updated Crop Group column. Updated results and sample locations spreadsheets.
2023-07-13 JR added labID to labResults datasets and updated dataDictionary accordingly. 
2023-08-21 JR added 2023 data and updated dataDictionary sample_locations tab.
2023-08-27 JR corrected SCBG pulse samples crop from "Pea" to "Pea, Dry".
2023-08-30 JR added an anonymized dataset (100 samples from 2022-2023).
2023-09-05 JR added labID to 2023 results.
2023-09-06 JR added sampling organization column to make impact tracking easier. Fixed merge issue that was causing the loss of some producer IDs. Add "County" to relevant CD names.
2023-11-14 JR switched from .RData to .RDS file type so users can assign a new name when loading the data into R with `data_wide <- readRDS("2020-2023_labResults_wide.RDS")`.
2023-11-14 JR added sampling dates and depths to all 877 samples. See addSampleDepthsDates for R script. 
2023-12-08 JR corrected CropGroup from Fallow to Cereal Grain for samples with CropType of Fallow, Wheat in 2020-2023_sampleLocations.csv.

6.3 Variable-level

Variable-level documentation includes data dictionaries, which are tabular collections of names, definitions, and attributes about the variables in a dataset. Data dictionaries are ideally created in the planning phase of the project before data are collected.

Data dictionary

Each row is a different variable, and each column is a different attribute of that variable. With a data dictionary, a user should be able to properly interpret each variable in the data.

Our data-dictionary.xlsx in the _complete-dataset folder contains three tabs (lab-results, sample-locations, and qc-codes) that describe the attributes of each variable.

View the sample-locations dictionary

variable	description	unit	data_type
year	Year the sample was collected		Numeric
county	County of the sampled field		Character
sample_id	Sample identification code		Character
crop_group	Crop group of the sampled field		Character
crop_type	Crop type of the sampled field		Character
map_mm_year	OSU PRISM mean annual precipitation 30-year normals (1991-2020) at 800 m resolution	mm/year	Numeric
mat_c	OSU PRISM mean annual temperature 30-year normals (1991-2020) at 800 m resolution	degrees C	Numeric
longitude	Longitude of sample point, WGS84	decimal degrees	Numeric
latitude	Latitude of sample point, WGS84	decimal degrees	Numeric
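
A data dictionary can also be used to check a dataset programmatically. The Python sketch below is illustrative only: the variable names and data types are copied from the sample-locations dictionary above, while the `check_rows` function, the row layout, and the example values (including the coordinates) are hypothetical.

```python
# Variable name -> data type, copied from the sample-locations dictionary.
DATA_DICTIONARY = {
    "year": "Numeric",
    "county": "Character",
    "sample_id": "Character",
    "crop_group": "Character",
    "crop_type": "Character",
    "map_mm_year": "Numeric",
    "mat_c": "Numeric",
    "longitude": "Numeric",
    "latitude": "Numeric",
}


def check_rows(rows, dictionary=DATA_DICTIONARY):
    """List undocumented columns, missing variables, and type mismatches."""
    problems = []
    for i, row in enumerate(rows):
        for name in sorted(set(row) - set(dictionary)):
            problems.append(f"row {i}: '{name}' is not in the data dictionary")
        for name, data_type in dictionary.items():
            if name not in row:
                problems.append(f"row {i}: '{name}' is missing")
                continue
            value = row[name]
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if data_type == "Numeric" and not is_number:
                problems.append(f"row {i}: '{name}' should be Numeric")
            elif data_type == "Character" and not isinstance(value, str):
                problems.append(f"row {i}: '{name}' should be Character")
    return problems


# Hypothetical rows: the second has a text latitude and an undocumented column.
good = {
    "year": 2023, "county": "Example", "sample_id": "S1",
    "crop_group": "Cereal Grain", "crop_type": "Wheat",
    "map_mm_year": 450.0, "mat_c": 9.5, "longitude": -120.5, "latitude": 47.0,
}
bad = dict(good, latitude="47.0", notes="resampled")

problems = check_rows([good, bad])
for problem in problems:
    print(problem)
```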

6.4 External data

External data refers to any data not directly collected by WSDA or trained partners (e.g., WSU or conservation districts) that follow our SOPs. These can include other studies pre-dating WaSHI, special soil health surveys, and publicly available datasets.

On a case-by-case basis, the Senior Soil Scientist and Data Scientist consider the following questions when deciding whether to integrate an external dataset:

How does the study design fit into SOS goals?
What field procedures were used and how were they documented?
Who analyzed the soil samples? With which methods and QA/QC procedures?
Are the following required metadata and management data available along with the lab results?
- Farm, producer, and field info¹
- Sampling date
- Sampling depth
- Latitude and longitude
- Production system (current crop, crop rotation, etc.)
- Information concerning tillage, livestock grazing, irrigation, soil fertility and amendments, land use history, and/or conservation practices
Is there a data dictionary or codebook describing the measurements, units, missing values, etc.?

Generally, external data should 1) be well documented, 2) be collected and analyzed by well-trained scientists and labs, and 3) have adequate accompanying metadata and management data to facilitate interpretation of the results.
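
Part of this review, the check for required metadata, can be screened automatically before the case-by-case evaluation. The following Python sketch is a hypothetical illustration: the field names in `REQUIRED_FIELDS` are assumed column names loosely based on the list above, not the columns of the actual intake spreadsheet, and passing this check does not replace the judgment of the Senior Soil Scientist and Data Scientist.

```python
# Assumed column names for the required metadata (hypothetical).
REQUIRED_FIELDS = [
    "farm",
    "producer",
    "field",
    "sampling_date",
    "sampling_depth",
    "latitude",
    "longitude",
    "production_system",
]


def missing_metadata(record):
    """Return the required fields that are absent or empty in one record."""
    return [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]


def summarize_missing(records):
    """Count, per required field, how many records lack a value."""
    counts = {}
    for record in records:
        for field in missing_metadata(record):
            counts[field] = counts.get(field, 0) + 1
    return counts


# Hypothetical external records (all values are made up).
records = [
    {"farm": "F01", "producer": "P01", "field": "A", "sampling_date": "2021-05-04",
     "sampling_depth": "0-12 in", "latitude": 47.0, "longitude": -120.5,
     "production_system": "wheat-fallow"},
    {"farm": "F02", "producer": "P02", "field": "B", "sampling_date": "",
     "sampling_depth": "0-12 in", "latitude": 46.8, "longitude": -119.9},
]

summary = summarize_missing(records)
print(summary)  # {'sampling_date': 1, 'production_system': 1}
```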

Some publicly available datasets to consider are in Y:/NRAS/soil-health-initiative/state-of-the-soils/external-data.

Intake form

External data may be provided in the External Data Intake spreadsheet, alongside related documents such as SOPs, management surveys, raw data files, etc.

¹ Enough farm, producer, and field info to distinguish unique farmers and fields for assigning unique IDs. They don’t need to include personally identifiable information.