6 Documentation
Documentation is the process of recording all aspects of project design, sampling, lab analyses, data cleaning, data analyses, data quality control and assurance procedures, and the development of decision-support tools. Seem familiar? These are the steps of the data life cycle. Documentation helps to:
- standardize procedures
- enable reproducibility
- establish credibility
- ensure others (including our future selves) use and interpret data correctly
- provide searchability
All documentation (including this document) should be updated (and versioned) as procedures change and lessons are learned.
Samples collected by WSDA for the SOS must follow the procedures and standards described in the documentation below.
External data must have, at minimum, the documentation outlined in Section 6.4 to be integrated into the SOS dataset.
6.1 Project-level
Project-level documentation includes all descriptive information about the SOS dataset, as well as planning decisions and process documentation. It includes quality assurance project plans, standard operating procedures, and other high-level documents (e.g., requests for proposals, applications, meeting agendas and notes).
Quality assurance project plan (QAPP)
The QAPP is the highest level of project documentation. It covers everything from the project description, personnel roles and responsibilities, project timelines, data and measurement quality objectives, and study design to overviews of field, laboratory, and quality control procedures.
Ours can be found in Y:/NRAS/soil-health-initiative/state-of-the-soils/qapp.
Standard operating procedures (SOP)
SOPs provide detailed instructions for field, lab, or data processing procedures and decision-making processes.
Ours can be found in Y:/NRAS/soil-health-initiative/state-of-the-soils/sop.
SOS sampling
The purpose of this SOP is to detail the procedures for a typical site visit in which soil samples are collected for physical, chemical, and biological soil health indicator analyses. Procedures include equipment preparation prior to sampling; best practices for filling out field forms; the selection of sampling locations; sampling protocols; sample handling and storage; and submitting samples to the lab. Following this SOP ensures data quality by creating audit trails and reminders to check that data are present, complete, and accurate. Additionally, this SOP will be used to maintain consistent sample collection procedures throughout the state for WSDA employees and partners.
Quality control / quality assurance (QA/QC)
This SOP outlines the process for screening sample metadata and lab results for completeness, consistency, and quality. Procedures involve subject matter expertise, investigation, communication with sampling teams and labs, algorithmic quality control, and tagging sample results with quality codes (listed in the table below). Data are then integrated into the statewide database.
Code | Tag | Description | Inclusion in analyses |
---|---|---|---|
0 | Excellent | Met lab’s and WSDA’s QC criteria | Yes |
100 | Estimate | Interpolated missing value | Yes |
110 | Derived | Derived from an estimated value | Yes |
120 | Suspect | Absolute z-score ≥ 3 | Yes |
130 | Calculated ND | Calculated value using at least one ND | Yes |
140 | Non-detect | Below the method detection limit | No |
160 | Poor | Did not meet lab’s QC criteria | No |
180 | Outlier | Outlier, designated by soil scientist | No |
200 | Unknown | External dataset | Yes |
ND = non-detect
6.2 Dataset-level
Dataset-level documentation applies to lab results, sample locations, grower information, and management data. Readmes and changelogs document what each dataset contains, how they are related, potential issues to be aware of, and any alterations made to the data. See below for examples of what to include.
Readme
`readme` files are plain text documents that describe the files in a folder, explain versioning, and provide instructions or metadata for data packages. We save them as `.txt` files rather than MS Word documents, which take longer to open and may not open on computers without Microsoft Word installed.
Describe contents of folder
The readme.txt in the _complete-dataset folder describes each file's structure, contents, and other pertinent information, such as data sources.
Contents of readme.txt in _complete-dataset folder:
Last modified: 2024-03-18
INSTRUCTIONS
Place the latest, complete SOS data in this folder as RDS and csv files. Update the readme and data dictionary as needed. Document the changes in the changelog.txt.
ROOT: complete dataset with documentation.
sos_lab-results-wide has one sample per row with each measurement as a separate column.
sos_lab-results-long has one result per row with columns for sample identification, measurement name, result value, quality code, and lab. See `~qc/qc_codes.csv` for quality code descriptions.
sos_sample-locations has the coordinates, sample ID, crop info, and PRISM mean annual precipitation and mean annual temperature (800 m resolution, 30-year normals, 1991-2020).
example-data.csv contains 100 random samples that have been anonymized with fake sampleIDs, county, farm, producer, and field names. WSU SCBG samples are excluded, as are the 0-6in to 6-12in WSDA samples.
data-dictionary lists each variable with its short name (variable name without units), description, indicator type (chemical, physical, biological), unit (if applicable), label, and data type. This spreadsheet also contains tabs describing the variables in the sample-locations and qc-codes spreadsheets.
qc_codes describes the quality codes.
readme.txt (this file) describes the contents of the root folder and subfolders.
changelog.txt documents changes made to the data and organization.
MANAGEMENT SURVEYS
Since management surveys aren't yet compiled, they will be placed in this folder.
JR will work on a crosswalk between old sample IDs and new sample IDs to make joining with lab data easier.
REFERENCE: dictionaries and data used in code.
column-names is a dictionary to convert the old naming convention (Ca_mg.kg) to the new convention (ca_mg_kg).
crop-group-type is a dictionary mapping WSDA crop group to crop type.
format-survey-crop-names.csv is a dictionary mapping crop_type (snake_case) to Crop Type (Title Case).
lab-methods describes the lab, measurement name, method, and mdl.
soiltest-mdl contains Soiltest measurement names, methods, descriptions, units, and minimum detection limits.
staff-contacts lists year, sampling organization, and sampling contact(s).
ppt and temp are {sf} dataframes containing MAP (mm/yr) and MAT (deg C) downloaded from the {prism} package (800 m resolution, 30-year normals, 1991-2020). Created from download-spatial-data.R.
wa is a {sf} dataframe containing the WA boundary downloaded from the {rgeoboundaries} package. Created from download-spatial-data.R.
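The column-names file is a lookup table, but its example (Ca_mg.kg to ca_mg_kg) follows a simple pattern. The following rule-based Python sketch assumes the new convention is lowercase, with dots, spaces, and hyphens replaced by single underscores. `to_new_name` is a hypothetical helper. Entries in the actual column-names dictionary should take precedence over the rule, and the optional `lookup` argument mimics that behavior.

```python
import re


def to_new_name(old_name, lookup=None):
    """Convert an old-style column name (e.g. 'Ca_mg.kg') to the new style ('ca_mg_kg').

    `lookup` stands in for the column-names dictionary; explicit entries win
    over the pattern-based rule (an assumption, not a documented convention).
    """
    if lookup and old_name in lookup:
        return lookup[old_name]
    # Replace any run of dots, whitespace, or hyphens with one underscore, then lowercase
    new_name = re.sub(r"[.\s-]+", "_", old_name.strip()).lower()
    # Collapse repeated underscores and trim them from both ends
    return re.sub(r"_+", "_", new_name).strip("_")
```

Applied to the whole header row, for example `[to_new_name(c) for c in columns]`, this gives a quick first pass. Any name the rule handles badly (units with `%` or `/`, for instance) should be added to the dictionary instead.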
Explain versions
The readme.txt in the 2023_sampling > lab-data > raw folder explains why there are two versions of the lab results and where to find additional information.
Contents of readme.txt in 2023_sampling > lab-data > raw folder:
2023-08-21
2023_wsda-soil-health_v1.xlsx has errors. See email for details.
2023_wsda-soil-health_v2.xlsx still has some errors, which are cleaned up in the R scripts.
Provide instructions
Another readme.txt explains how to use the files in the ArcGIS soil sample points folder on Box.com. When this folder is shared with partners, the readme orients them to its contents and helps them modify the files for their own projects.
Contents of readme.txt in ArcGIS soil sample points folder:
Template for Soil Sample Points ArcGIS
2023-06-01
Jadey Ryan | Washington State Department of Agriculture (WSDA)
jryan@agr.wa.gov
Purpose:
To provide a template for ESRI ArcGIS data entry and management for soil sampling projects.
Folder contents:
- readme.txt describes the folder contents and provides general instructions.
- template-soil-sample-points.aprx is an ArcGIS Project File that should allow you to open the project in ArcGIS Pro.
- template-soil-sample-points.gdb is a file geodatabase, which you can open in ArcGIS Pro or ArcGIS Online.
- crop-domain.csv provides WSDA approved crop types to use in ArcGIS 'Table to Domain' geoprocessing tool. We highly recommend using attribute domains, which are rules to enforce data integrity by limiting the field type and choices of an attribute field.
- 923-nras-soil-health-sop-web.pdf is WSDA's Standard Operating Procedure for soil sampling. The appendices contain instructions for the GIS workflow of soil sampling.
- /screenshots/feature-layer-offline-editing.png shows the checkboxes required for editing.
- /screenshots/configure-forms.png shows where to click within a Web Map to open the Field Maps form editor.
- /screenshots/table-to-domain.png is an example of how to use the 'Table to Domain' geoprocessing tool with the crop-domain.csv.
- /screenshots/field-maps-form_*.png are examples of the Field Maps form structure, which can be created in ArcGIS Online.
- washi_soil-series-rest-service is an internet shortcut to the URL for the Washington clipped soil series, originating from NRCS gSSURGO. When compositing multiple samples together from one field, we recommend keeping all sample points within one soil series to reduce variability in the composite samples.
- The other folders (Index, GpMessages, ImportLog, .backups) are part of the ArcGIS Pro project and can be ignored.
Instructions:
1. Open template-soil-sample-points.aprx.
2. Update attribute fields and domains to work with your project.
3. Update symbology, labels, visibility scale, popups, etc. The symbology of the sample points currently defaults to red when the 'Show/Hide Field Form' attribute is still 'Hide' or 'NULL'. When the sampler collects the sample and changes this attribute to 'Show', the point will turn yellow to indicate the sample has been collected.
4. Share as Feature Layer and create a Web Map to allow others access to this map.
5. Open the map in ArcGIS Online and click on 'Forms' in the right toolbar to configure your Field Maps form.
6. Use the /screenshots/field-maps-form_*.pngs as a guide, but adapt the field form structure to suit your project.
Offline Areas:
- If you anticipate needing to sample without cellular service or wifi access: 1) open the hosted feature layer, 2) click 'Settings', and 3) confirm 'Enable Sync' is turned on.
- If you need the Soil Series layer, you will need to configure two separate Web Maps: one with and one without. This is because the washi_soil-series-rest-service is not allowed in Web Maps with 'Enable Sync' turned on.
- Alternatively, you can create your own Soil Series layer and host it as a feature layer in your own organization.
Resources:
- WSDA Soil Health YouTube channel: https://www.youtube.com/playlist?list=PL0pB20prk7Ni1daEYiEEXSWy8CfwO34FC
- WSDA WaSHI_SoilSeries MapServer: https://fortress.wa.gov/agr/gis/wsdagis/rest/services/NRAS/WaSHI_SoilSeries/MapServer
- Introduction to attribute domains: https://pro.arcgis.com/en/pro-app/latest/help/data/geodatabases/overview/an-overview-of-attribute-domains.htm
- Introducing smart forms in ArcGIS Field Maps: https://www.esri.com/arcgis-blog/products/field-maps/field-mobility/introducing-arcgis-smart-forms/
Changelog
Changelogs are also concise plain text documents, saved in the same folder as the data files, that record changes to the dataset. For more information, see keepachangelog.com/.
At a bare minimum, each entry in changelog.txt contains:
- date of modification
- initials of who made the changes
- description of the changes
See the example changelog.txt in the _complete-dataset folder.
Contents of changelog.txt in _complete-dataset folder:
2022-12-15 JR standardized texture classes (title case, no extra white space) and converted texture, county, and crop to factor types.
2023-01-03 JR corrected error in 2022 WSDA bulk density measurements in 2022-11-01_soiltestData_manualCleanup.xlsx.
2023-03-02 JR recoded SCBG producer IDs to match WSDA format and cleaned up farm/producer names and IDs. See new scbg_producerId_recode.csv for list.
2023-03-07 JR corrected Okanogan producer and field names (Devany, Townsend).
2023-03-21 JR corrected cropType "Fallow, Idle" to "Fallow" and 2021 SCBG "Pea" samples to "Pea, Dry", updated Crop Group column. Updated results and sample locations spreadsheets.
2023-07-13 JR added labID to labResults datasets and updated dataDictionary accordingly.
2023-08-21 JR added 2023 data and updated dataDictionary sample_locations tab.
2023-08-27 JR corrected SCBG pulse samples crop from "Pea" to "Pea, Dry".
2023-08-30 JR added an anonymized dataset (100 samples from 2022-2023).
2023-09-05 JR added labID to 2023 results.
2023-09-06 JR added sampling organization column to make impact tracking easier. Fixed merge issue that was causing the loss of some producer IDs. Added "County" to relevant CD names.
2023-11-14 JR switched from .RData to .RDS file type so users can assign a new name when loading the data into R with `data_wide <- readRDS("2020-2023_labResults_wide.RDS")`.
2023-11-14 JR added sampling dates and depths to all 877 samples. See addSampleDepthsDates for R script.
2023-12-08 JR corrected CropGroup from Fallow to Cereal Grain for samples with CropType of Fallow, Wheat in 2020-2023_sampleLocations.csv.
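The minimum contents listed above fit on one line per entry, as in this example. The following minimal Python sketch appends such an entry. `format_entry` and `append_entry` are hypothetical helper names. The "ISO date, initials, description" layout on a single line is inferred from the example above and is not a fixed rule.

```python
from datetime import date
from pathlib import Path


def format_entry(initials, description, when=None):
    """Build one changelog line: 'YYYY-MM-DD INITIALS description'."""
    if when is None:
        when = date.today()
    description = " ".join(description.split())  # keep each entry on one line
    if not initials.strip() or not description:
        raise ValueError("Changelog entries need initials and a description.")
    return f"{when.isoformat()} {initials.strip()} {description}"


def append_entry(changelog, initials, description, when=None):
    """Append an entry to a plain-text changelog, creating the file if needed."""
    line = format_entry(initials, description, when)
    with Path(changelog).open("a", encoding="utf-8") as f:
        f.write(line + "\n")
    return line
```

For example, `append_entry("changelog.txt", "JR", "added 2024 data")` adds a dated line to the end of the file. Opening the file in append mode leaves earlier entries untouched and keeps the oldest-first order used in the example.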
6.3 Variable-level
Variable-level documentation includes data dictionaries, which are tabular collections of names, definitions, and attributes about the variables in a dataset. Data dictionaries are ideally created in the planning phase of the project before data are collected.
Data dictionary
Each row is a different variable, and each column is a different attribute of that variable. With a data dictionary, a user should be able to properly interpret each variable in the data.
Our data-dictionary.xlsx in the _complete-dataset folder contains three tabs (lab-results, sample-locations, and qc-codes) that describe the attributes of each variable.
Contents of the sample-locations tab:
variable | description | unit | data_type |
---|---|---|---|
year | Year the sample was collected | | Numeric |
county | County of the sampled field | | Character |
sample_id | Sample identification code | | Character |
crop_group | Crop group of the sampled field | | Character |
crop_type | Crop type of the sampled field | | Character |
map_mm_year | OSU PRISM mean annual precipitation 30-year normals (1991-2020) at 800 m resolution | mm/year | Numeric |
mat_c | OSU PRISM mean annual temperature 30-year normals (1991-2020) at 800 m resolution | degrees C | Numeric |
longitude | Longitude of sample point, WGS84 | decimal degrees | Numeric |
latitude | Latitude of sample point, WGS84 | decimal degrees | Numeric |
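Each row of the dictionary names a variable and its data type, so the dictionary can also serve as a machine-readable check on the data. The following Python sketch makes several assumptions: rows are Python dicts, "Numeric" maps to int or float, "Character" maps to str, and None marks a missing value. `check_against_dictionary` is a hypothetical helper and is not part of our R workflow.

```python
# The sample-locations tab above, reduced to variable -> data_type
SAMPLE_LOCATIONS_TYPES = {
    "year": "Numeric",
    "county": "Character",
    "sample_id": "Character",
    "crop_group": "Character",
    "crop_type": "Character",
    "map_mm_year": "Numeric",
    "mat_c": "Numeric",
    "longitude": "Numeric",
    "latitude": "Numeric",
}

# Assumed mapping from dictionary data types to Python types
PYTHON_TYPES = {"Numeric": (int, float), "Character": (str,)}


def check_against_dictionary(rows, dictionary):
    """Return problems: undocumented columns, missing columns, and type mismatches."""
    problems = []
    for i, row in enumerate(rows):
        for col in sorted(row.keys() - dictionary.keys()):
            problems.append(f"row {i}: '{col}' is not in the data dictionary")
        for col in sorted(dictionary.keys() - row.keys()):
            problems.append(f"row {i}: '{col}' is missing")
        for col, expected in dictionary.items():
            value = row.get(col)
            if value is None:  # missing values are reported above, not as type errors
                continue
            # bool is a subclass of int in Python, so reject it explicitly
            if isinstance(value, bool) or not isinstance(value, PYTHON_TYPES[expected]):
                problems.append(
                    f"row {i}: '{col}' should be {expected}, got {type(value).__name__}"
                )
    return problems
```

An empty list means every row matches the dictionary. A common catch is a year read in as text ("2023"), which this check reports as a type mismatch.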
6.4 External data
External data refers to any data not directly collected by WSDA or trained partners (e.g., WSU or conservation districts) that follow our SOPs. These can include other studies pre-dating WaSHI, special soil health surveys, and publicly available datasets.
On a case-by-case basis, the Senior Soil Scientist and Data Scientist consider the following questions when deciding whether to integrate an external dataset:
- How does the study design fit into SOS goals?
- What field procedures were used and how were they documented?
- Who analyzed the soil samples? With which methods and QA/QC procedures?
- Are the following required metadata and management data available along with the lab results?
  - Farm, producer, and field info¹
  - Sampling date
  - Sampling depth
  - Latitude and longitude
  - Production system (current crop, crop rotation, etc.)
  - Information concerning tillage, livestock grazing, irrigation, soil fertility and amendments, land use history, and/or conservation practices
- Is there a data dictionary or codebook describing the measurements, units, missing values, etc.?
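Only the completeness part of this checklist can be automated. The questions about study design, field procedures, and lab QA/QC still need the case-by-case review described above. The following minimal Python sketch screens for completeness. It assumes each external sample arrives as a dict and uses hypothetical column names for the required metadata. `missing_metadata` and `intake_summary` are illustrative helpers only.

```python
# Required metadata from the checklist above; the keys are hypothetical column names
REQUIRED_METADATA = {
    "farm_producer_field": "Farm, producer, and field info",
    "sampling_date": "Sampling date",
    "sampling_depth": "Sampling depth",
    "latitude": "Latitude",
    "longitude": "Longitude",
    "production_system": "Production system",
    "management": "Management information",
}


def missing_metadata(record):
    """List required items that are absent or blank in one external sample record."""
    missing = []
    for key, label in REQUIRED_METADATA.items():
        value = record.get(key)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(label)
    return missing


def intake_summary(records):
    """Count, for each required item, how many records are missing it."""
    counts = {label: 0 for label in REQUIRED_METADATA.values()}
    for record in records:
        for label in missing_metadata(record):
            counts[label] += 1
    return counts
```

A summary like this gives the reviewers a quick sense of how much of an external dataset could actually be interpreted before anyone spends time on the deeper questions.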
Generally, external data should 1) be well documented, 2) be collected and analyzed by well-trained scientists and labs, and 3) have adequate accompanying metadata and management data to facilitate interpretation of the results.
Some publicly available datasets to consider are in Y:/NRAS/soil-health-initiative/state-of-the-soils/external-data.
Intake form
External data may be provided in the External Data Intake spreadsheet, alongside related documents such as SOPs, management surveys, raw data files, etc.
¹ Enough farm, producer, and field info to distinguish unique farmers and fields for assigning unique IDs. It does not need to include personally identifiable information.