7  Data flow

This chapter outlines how data are generated, processed, and moved from start to finish.

7.1 Pre field season

When preparing sample ID assignments, labels, chain of custody forms, and other materials, use an accessible font to reduce transcription errors. Atkinson Hyperlegible has very distinct alphanumeric characters, which improves legibility. Download it from Google Fonts.

“0O” and “11li” in Atkinson Hyperlegible

Assign unique identifiers

Before sample IDs can be assigned, collect the following information for each proposed sample:

  • County
  • Organization of sampling team
  • Farm name (optional)
  • Producer name
  • Producer contact information (optional)
  • Field name
  • Crop
  • General management practice (e.g., conventional, cover crop, reduced tillage)

View examples of the 2024 Sample Request Form sent to conservation districts and the Berries Sample Request Form used for a WSDA/WSU special project.

Once producers and fields have been identified, assign a unique ID for the producer, field, and sample with the following convention:

  • Producer ID: first three letters of county + three-digit landowner number
    • WHA001
  • Field ID: two-digit field number
    • 01 and 02
  • Pair ID (optional): letter extension added to paired fields
    • A
  • Sample ID: last two digits of year + Producer ID + Field ID + Pair ID
    • 24-WHA001-01-A and 24-WHA001-02-A

The following counties have different abbreviations than their first three letters:

  • Clallam → CLL
  • Grays Harbor → GRY
  • Kitsap → KIS
  • Skamania → SKM

Match producer and field IDs to previous participants. Continue the sequence for new producers and fields. Producer IDs and sample IDs must not be duplicated.
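The convention above can be sketched in Python. This is an illustration only and is separate from the project's assign-sample-ids.R script; the function names are hypothetical, and leaving out the final segment when there is no Pair ID is an assumption.

```python
# Counties whose abbreviation is not simply their first three letters
COUNTY_EXCEPTIONS = {
    "Clallam": "CLL",
    "Grays Harbor": "GRY",
    "Kitsap": "KIS",
    "Skamania": "SKM",
}


def county_abbreviation(county: str) -> str:
    """Return the three-letter county code used in producer IDs."""
    return COUNTY_EXCEPTIONS.get(county, county[:3].upper())


def producer_id(county: str, landowner_number: int) -> str:
    """County code + three-digit landowner number, e.g. WHA001."""
    return f"{county_abbreviation(county)}{landowner_number:03d}"


def sample_id(year: int, producer: str, field_number: int, pair: str = "") -> str:
    """Two-digit year + producer ID + two-digit field ID + optional pair ID."""
    parts = [f"{year % 100:02d}", producer, f"{field_number:02d}"]
    if pair:
        # Assumption: the pair segment is simply omitted for unpaired fields
        parts.append(pair)
    return "-".join(parts)


def next_landowner_number(existing_producer_ids: list[str], county: str) -> int:
    """Continue a county's sequence from the producer IDs already assigned."""
    prefix = county_abbreviation(county)
    used = [int(pid[3:]) for pid in existing_producer_ids if pid.startswith(prefix)]
    return max(used, default=0) + 1


def find_duplicates(ids: list[str]) -> set[str]:
    """Return any IDs that appear more than once."""
    seen, duplicates = set(), set()
    for identifier in ids:
        if identifier in seen:
            duplicates.add(identifier)
        seen.add(identifier)
    return duplicates
```

For example, `sample_id(2024, producer_id("Whatcom", 1), 1, "A")` builds the ID `24-WHA001-01-A`, and `find_duplicates` can be run on the full list of sample IDs before labels are printed.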

For an example R script to automate this process, see assign-sample-ids.R.

Create sample labels

Sample label creation is automated using R and Microsoft Word’s mail merge tool. First, run labels.R to generate a spreadsheet with the information to be printed on the labels. Then open labels-template-mail-merge.docx, select the spreadsheet as the recipient list, and run the mail merge to generate a Word document with all labels to be printed (as shown in the completed-labels folder).
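The spreadsheet step might look like the following Python sketch. It does not reproduce labels.R; the column names (`sample_id`, `producer_name`, `field_name`), the example rows, and the choice of CSV as the output format are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical label fields; the real labels.R may use different columns.
LABEL_FIELDS = ["sample_id", "producer_name", "field_name"]


def write_label_sheet(samples, handle):
    """Write one row per label, sorted by sample ID, to an open text file."""
    writer = csv.DictWriter(handle, fieldnames=LABEL_FIELDS, extrasaction="ignore")
    writer.writeheader()
    for sample in sorted(samples, key=lambda s: s["sample_id"]):
        writer.writerow(sample)


# Made-up example rows
samples = [
    {"sample_id": "24-WHA001-02-A", "producer_name": "Example Producer", "field_name": "North"},
    {"sample_id": "24-WHA001-01-A", "producer_name": "Example Producer", "field_name": "South"},
]

buffer = io.StringIO()
write_label_sheet(samples, buffer)
print(buffer.getvalue())
```

To write a file instead of an in-memory buffer, pass a handle opened with `open("labels.csv", "w", newline="")`.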

Create a data tracking sheet

Create a spreadsheet to track which data have been submitted for each sample, including:

  • GPS points through the ArcGIS Field Maps field form
  • Scanned paper field forms (for those without ArcGIS Field Maps)
  • Management surveys through ArcGIS Survey123
  • Scanned chain of custody forms with shipping tracking numbers
  • Location of archival falcon tubes (once retrieved by WSDA staff)
  • Notes explaining changes, such as a sample that will no longer be collected or a sample ID that was changed

See the 2023 spreadsheet for an example.
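One way to picture the tracking sheet is as one row per sample with a status column for each item in the list above. The sketch below is hypothetical: the column names paraphrase the list, and it assumes that either GPS points or a scanned paper field form satisfies the field data requirement.

```python
# Hypothetical column names paraphrasing the checklist above
REQUIRED_ITEMS = ["management_survey", "chain_of_custody", "falcon_tube_location"]


def new_tracking_row(sample_id):
    """Start a row with nothing received yet and an empty notes field."""
    return {
        "sample_id": sample_id,
        "gps_points": False,
        "paper_field_form": False,
        "management_survey": False,
        "chain_of_custody": False,
        "falcon_tube_location": "",  # filled in once the tubes are retrieved
        "notes": "",
    }


def missing_items(row):
    """List the tracked items that have not been received for a sample."""
    missing = [item for item in REQUIRED_ITEMS if not row[item]]
    # Assumption: the paper form is an alternative to the Field Maps GPS points
    if not (row["gps_points"] or row["paper_field_form"]):
        missing.append("gps_points or paper_field_form")
    return missing


row = new_tracking_row("24-WHA001-01-A")
row["gps_points"] = True
row["chain_of_custody"] = True
print(missing_items(row))
```

Running `missing_items` over every row gives a quick list of samples that still need follow-up with the sampling teams.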

Develop ArcGIS web tools

Use ArcGIS to build tools for managing spatial data and collecting management survey data. In ArcGIS Pro, create a sample selection feature layer with domains for point numbers, bulk density, and crop types. Publish this feature layer to ArcGIS Online as a web map with a soil series layer. Then publish a second copy without the soil series layer and enable offline use. On ArcGIS Online, use Field Maps to configure the field form for the feature layer. Create and host the management surveys with Survey123 and Experience Builder. Schedule the Python ArcGIS Notebook that backs up all data to run as a task every Monday, Wednesday, and Friday during the field season.

This template ArcGIS Pro project includes a readme.txt that describes this process.

View code from the ArcGIS Notebook
from arcgis.gis import GIS
import datetime as dt
from datetime import timezone, timedelta

# Connect to ArcGIS Online as the user running the notebook
gis = GIS("home")

# Back up the point feature layers as file geodatabases
folder_path = '/arcgis/home/backups/2023/points'
title = "2023*"
owner = "jryan_NRAS"
items = gis.content.search(query = "title:" + title + " AND owner:" + owner,
                          item_type='Feature Layer')
print(str(len(items)) + " items will be backed up to " + folder_path +". See the list below:")
items

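def download_as_fgdb(item_list, backup_location):
    """Export each feature layer to a file geodatabase and download it."""
    for item in item_list:
        try:
            # Skip hosted feature layer views
            if 'View Service' in item.typeKeywords:
                print(item.title + " is view, not downloading")
            else:
                print("Downloading " + item.title)
                # Date stamp at a fixed UTC-8 offset (does not shift for daylight saving time)
                version = dt.datetime.now(timezone(timedelta(hours=-8))).strftime("%Y-%m-%d")
                result = item.export(item.title + "_" + version, "File Geodatabase")
                result.download(backup_location)
                # Delete the temporary export item once it has been downloaded
                result.delete()
                print("Successfully downloaded " + item.title)
        except Exception as err:
            # Report the failure and continue with the remaining items
            print("An error occurred downloading " + item.title + ": " + str(err))
    print("The function has completed")

download_as_fgdb(items, folder_path)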

# Back up the survey feature layers as CSV files
folder_path = '/arcgis/home/backups/2023/surveys'
title = "2023 * Survey* Production"
owner = "dgelardi_NRAS"
items = gis.content.search(query = "title:" + title + " AND owner:" + owner,
                          item_type='Feature Layer')
print(str(len(items)) + " items will be backed up to " + folder_path +". See the list below:")
items

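def download_as_csv(item_list, backup_location):
    """Export each feature layer to CSV and download it."""
    for item in item_list:
        try:
            # Skip hosted feature layer views
            if 'View Service' in item.typeKeywords:
                print(item.title + " is view, not downloading")
            else:
                print("Downloading " + item.title)
                # Date stamp at a fixed UTC-8 offset (does not shift for daylight saving time)
                version = dt.datetime.now(timezone(timedelta(hours=-8))).strftime("%Y-%m-%d")
                result = item.export(item.title + "_" + version, "CSV")
                result.download(backup_location)
                # Delete the temporary export item once it has been downloaded
                result.delete()
                print("Successfully downloaded " + item.title)
        except Exception as err:
            # Report the failure and continue with the remaining items
            print("An error occurred downloading " + item.title + ": " + str(err))
    print("The function has completed")

download_as_csv(items, folder_path)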

7.2 During field season

Data collection in the field is detailed in the sampling SOP. Here, we focus on the behind-the-scenes tasks for managing data.

Update data tracking spreadsheet

Throughout the season, update the data tracking spreadsheet as various forms, surveys, and correspondence are received, as described in Create a data tracking sheet.

Modify IDs when samples change

Sometimes a producer can no longer participate or needs to change which field is sampled. Update, version, and archive the sample request form (e.g., sample-request-form-ferry.xlsx becomes sample-request-form-ferry_v2.xlsx). Run the assign-sample-ids.R script again to update the sample IDs, with lines 362-386 commented out as shown in the highlighted lines of the script on GitHub.

See 01_returned-sample-requests and 02_completed-sample-ids for an example of this flow.
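The versioned filename can be derived rather than typed by hand. The helper below is a hypothetical sketch that follows the `_v2` suffix seen in the example filename; it assumes a file with no suffix counts as version 1.

```python
import re
from pathlib import Path


def next_version_name(filename: str) -> str:
    """Return the filename with its _vN suffix incremented."""
    path = Path(filename)
    match = re.fullmatch(r"(.+)_v(\d+)", path.stem)
    if match:
        base, version = match.group(1), int(match.group(2)) + 1
    else:
        # Assumption: an unversioned file is treated as version 1
        base, version = path.stem, 2
    return str(path.with_name(f"{base}_v{version}{path.suffix}"))


print(next_version_name("sample-request-form-ferry.xlsx"))
print(next_version_name("sample-request-form-ferry_v2.xlsx"))
```

Keeping the earlier versions in the archive, instead of overwriting them, preserves a record of what each sampling team originally requested.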

Add a concise, explanatory note to the data tracking spreadsheet.

7.3 Post field season

Organize multiple sources of data

To unify the information from multiple data sources (e.g., sample request forms, ArcGIS Field Maps forms, and management surveys), cross-reference each source and reach out to the sampling teams to resolve conflicting information as needed. This is especially important for verifying the crop planted at the time of sampling.

See how to mostly automate this in: 01_load-metadata.R and 02_check-crops.R.
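The cross-referencing idea can be illustrated with a short Python sketch. It is not a translation of the R scripts named above; the source names, sample IDs, and crops are made-up examples, and the only normalization assumed is trimming whitespace and ignoring case.

```python
def find_crop_conflicts(sources):
    """Return sample IDs whose reported crop differs between sources.

    `sources` maps a source name to a {sample_id: crop} dictionary.
    """
    reported = {}
    for source_name, crops in sources.items():
        for sample_id, crop in crops.items():
            # Normalize case and whitespace so "Wheat " and "wheat" match
            reported.setdefault(sample_id, {})[source_name] = crop.strip().lower()
    return {
        sample_id: by_source
        for sample_id, by_source in reported.items()
        if len(set(by_source.values())) > 1
    }


# Made-up example data
sources = {
    "sample_request_form": {"24-WHA001-01-A": "Raspberry", "24-WHA001-02-A": "Wheat"},
    "field_maps_form": {"24-WHA001-01-A": "raspberry", "24-WHA001-02-A": "Barley"},
    "management_survey": {"24-WHA001-02-A": "wheat"},
}

conflicts = find_crop_conflicts(sources)
for sample_id, by_source in conflicts.items():
    print(sample_id, by_source)
```

Each flagged sample still needs a person to decide which source is correct, typically by contacting the sampling team.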

Process lab data

Follow the QA/QC SOP for processing lab data.

See the 2023 processing scripts and QA/QC report on GitHub.

Generate reports

Use the {soils} package to create a new project for each year. To avoid email attachment size limitations, save reports to Box.com for distribution to the sampling partners, who send the reports to the participants. Access to this folder requires a share link provided by WSDA staff.

Save data to shared drive and WSU Teams channel

Copy the output data files and reports from Process lab data and Generate reports to the state-of-the-soils folder in its respective year_sampling folder. See Chapter 4 to review folder structure and organization.

Save the final datasets (in wide and long formats) and documentation (data dictionary, changelog, readme) to the WSU SCBG Soil Health Assessment Teams channel.
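For readers unfamiliar with the two layouts: a wide dataset has one row per sample with a column per measurement, while a long dataset has one row per sample and measurement pair. The sketch below shows the conversion in plain Python; the column names and values are made-up placeholders, not project data, and the project's own scripts may produce these files differently.

```python
def wide_to_long(wide_rows, id_column="sample_id"):
    """Turn one row per sample into one row per sample-measurement pair."""
    long_rows = []
    for row in wide_rows:
        for column, value in row.items():
            if column != id_column:
                long_rows.append(
                    {id_column: row[id_column], "measurement": column, "value": value}
                )
    return long_rows


# Placeholder values for illustration only
wide = [{"sample_id": "24-WHA001-01-A", "ph": 6.2, "organic_matter_pct": 3.1}]
long_rows = wide_to_long(wide)
for long_row in long_rows:
    print(long_row)
```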

Archive jars and falcon tubes

Falcon tubes from a -80 °C freezer with the labels starting to peel off.

Store the archival subsamples in glass jars in the Yakima WSDA storage room and the cryogenic archive subsamples in falcon tubes in the -80 °C freezer at the WSU Mount Vernon Northwestern Washington Research & Extension Center.

Tape the labels onto the falcon tubes with a generous amount of packing tape so that they do not fall off when the tubes freeze.

Update the archive spreadsheet with the additional sample IDs, number of falcon tubes, and box number of the glass jar.