7 Data flow
This chapter outlines how data are generated, processed, and moved from start to finish.
7.1 Pre field season
When preparing sample ID assignments, labels, chain-of-custody forms, and other materials, use an accessible font to reduce transcription errors. Atkinson Hyperlegible has highly distinct alphanumeric characters, which improves legibility. Download it from Google Fonts.
Assign unique identifiers
Before sample IDs can be assigned, collect the following information for each proposed sample:
- County
- Organization of sampling team
- Farm name (optional)
- Producer name
- Producer contact information (optional)
- Field name
- Crop
- General management practice (e.g., conventional, cover crop, reduced tillage)
View examples of the 2024 Sample Request Form sent to conservation districts and the Berries Sample Request Form used for a WSDA/WSU special project.
Once producers and fields have been identified, assign a unique ID for the producer, field, and sample with the following convention:
- Producer ID: first three letters of county + three-digit landowner number (e.g., WHA001)
- Field ID: two-digit field number (e.g., 01 and 02)
- Pair ID (optional): letter extension added to paired fields (e.g., A)
- Sample ID: last two digits of year + Producer ID + Field ID + Pair ID (e.g., 24-WHA001-01-A and 24-WHA001-02-A)
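The ID convention above can be sketched in Python as follows. This is a minimal illustration only: the function names, the regular expression, and the choice to drop the final segment when a field has no Pair ID are assumptions inferred from the example IDs, not part of the project's own scripts.

```python
import re

# Pattern inferred from the example IDs (e.g., 24-WHA001-01-A).
# Assumption: the trailing Pair ID segment is simply absent for unpaired fields.
SAMPLE_ID_PATTERN = re.compile(
    r"^(?P<year>\d{2})-(?P<producer_id>[A-Z]{3}\d{3})"
    r"-(?P<field_id>\d{2})(?:-(?P<pair_id>[A-Z]))?$"
)


def build_sample_id(year, producer_id, field_number, pair_id=None):
    """Join the parts with hyphens: YY-ProducerID-FieldID[-PairID]."""
    parts = [f"{year % 100:02d}", producer_id, f"{field_number:02d}"]
    if pair_id:
        parts.append(pair_id)
    return "-".join(parts)


def parse_sample_id(sample_id):
    """Split a sample ID into its parts, or raise ValueError if it is malformed."""
    match = SAMPLE_ID_PATTERN.match(sample_id)
    if match is None:
        raise ValueError(f"Unexpected sample ID format: {sample_id!r}")
    return match.groupdict()


print(build_sample_id(2024, "WHA001", 1, "A"))  # 24-WHA001-01-A
print(parse_sample_id("24-WHA001-02-A"))
```

Parsing IDs back into their parts is a cheap way to catch transcription errors before labels are printed.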
The following counties have different abbreviations than their first three letters:
- Clallam → CLL
- Grays Harbor → GRY
- Kitsap → KIS
- Skamania → SKM
Match producer and field IDs to previous participants. Continue the sequence for new producers and fields. Producer IDs and sample IDs must not be duplicated.
For an example R script to automate this process, see assign-sample-ids.R.
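For readers who prefer Python, the county abbreviation and sequencing rules could look roughly like the sketch below. It is not a port of assign-sample-ids.R; the function names and the example list of existing IDs are hypothetical.

```python
from collections import Counter

# Exceptions listed above; every other county uses its first three letters.
COUNTY_ABBREVIATIONS = {
    "Clallam": "CLL",
    "Grays Harbor": "GRY",
    "Kitsap": "KIS",
    "Skamania": "SKM",
}


def county_prefix(county):
    """Return the three-letter county prefix used in producer IDs."""
    return COUNTY_ABBREVIATIONS.get(county, county[:3].upper())


def next_producer_id(county, existing_ids):
    """Continue the county's three-digit sequence past the highest number in use."""
    prefix = county_prefix(county)
    used = [int(pid[3:]) for pid in existing_ids if pid.startswith(prefix)]
    return f"{prefix}{max(used, default=0) + 1:03d}"


def find_duplicates(ids):
    """Return any IDs that appear more than once."""
    return sorted(i for i, n in Counter(ids).items() if n > 1)


existing = ["WHA001", "WHA002", "CLL001"]  # hypothetical previous participants
print(next_producer_id("Whatcom", existing))  # WHA003
print(next_producer_id("Kitsap", existing))   # KIS001
print(find_duplicates(existing + ["WHA002"]))  # ['WHA002']
```

Returning participants keep their existing producer ID, so a helper like `next_producer_id` would only be called for producers with no match in previous years.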
Create sample labels
Sample label creation is automated with R and Microsoft Word’s mail merge tool. The labels.R script generates a spreadsheet with the information to be printed on the labels. Open labels-template-mail-merge.docx, select the spreadsheet as the recipient list, and run the mail merge to generate a Word document with all labels to be printed (as shown in the completed-labels folder).
Create a data tracking sheet
Create a spreadsheet to track which data have been submitted for each sample, including:
- GPS points through the ArcGIS Field Maps field form
- Scanned paper field forms (for those without ArcGIS Field Maps)
- Management surveys through ArcGIS Survey123
- Scanned chain-of-custody forms with shipping tracking numbers
- Location of archival falcon tubes (once retrieved by WSDA staff)
- Notes on any changes, such as a field that will no longer be sampled or a sample ID that was changed
See the 2023 spreadsheet for an example.
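As a rough illustration of the structure, the sketch below builds a blank tracking sheet with one row per sample and one column per item in the list above. The column names are hypothetical and are not taken from the 2023 spreadsheet.

```python
import csv
import io

# Hypothetical column names, one per item in the list above.
TRACKING_COLUMNS = [
    "sample_id",
    "gps_point_received",
    "paper_field_form_scanned",
    "management_survey_received",
    "chain_of_custody_scanned",
    "shipping_tracking_number",
    "falcon_tube_location",
    "notes",
]


def new_tracking_rows(sample_ids):
    """Create one blank tracking row per sample ID."""
    return [
        {col: (sample_id if col == "sample_id" else "") for col in TRACKING_COLUMNS}
        for sample_id in sample_ids
    ]


def to_csv(rows):
    """Serialize the tracking rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=TRACKING_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = new_tracking_rows(["24-WHA001-01-A", "24-WHA001-02-A"])
print(to_csv(rows))
```

Starting from the assigned sample IDs keeps the tracking sheet in step with the labels and sample request forms.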
Develop ArcGIS web tools
Use ArcGIS to build tools for managing spatial data and collecting management survey data. In ArcGIS Pro, create a sample selection feature layer with domains for point numbers, bulk density, and crop types. Publish this feature layer to ArcGIS Online as a web map with a soil series layer. Then publish a second copy without the soil series layer and enable offline use. On ArcGIS Online, use Field Maps to configure the field form for the feature layer. Create and host the management surveys with Survey123 and Experience Builder. Schedule the Python ArcGIS Notebook that backs up all data to run as a task every Monday, Wednesday, and Friday during the field season.
This template ArcGIS Pro project includes a readme.txt that describes this process.
View code from the ArcGIS Notebook
import arcgis
from arcgis.gis import GIS
import datetime as dt
from datetime import timezone, timedelta

# Connect to ArcGIS Online as the signed-in notebook user
gis = GIS("home")

# Back up the sample point feature layers
folder_path = '/arcgis/home/backups/2023/points'
title = "2023*"
owner = "jryan_NRAS"
items = gis.content.search(query = "title:" + title + " AND owner:" + owner,
                           item_type = 'Feature Layer')
print(str(len(items)) + " items will be backed up to " + folder_path + ". See the list below:")
items

def download_as_fgdb(item_list, backup_location):
    for item in item_list:
        try:
            if 'View Service' in item.typeKeywords:
                print(item.title + " is view, not downloading")
            else:
                print("Downloading " + item.title)
                # Date stamp in UTC-8 (Pacific Standard Time)
                version = dt.datetime.now(timezone(timedelta(hours=-8))).strftime("%Y-%m-%d")
                result = item.export(item.title + "_" + version, "File Geodatabase")
                result.download(backup_location)
                # Delete the temporary export item from ArcGIS Online
                result.delete()
                print("Successfully downloaded " + item.title)
        except Exception as error:
            print("An error occurred downloading " + item.title + ": " + str(error))
    print("The function has completed")

download_as_fgdb(items, folder_path)

# Back up the management survey feature layers
folder_path = '/arcgis/home/backups/2023/surveys'
title = "2023 * Survey* Production"
owner = "dgelardi_NRAS"
items = gis.content.search(query = "title:" + title + " AND owner:" + owner,
                           item_type = 'Feature Layer')
print(str(len(items)) + " items will be backed up to " + folder_path + ". See the list below:")
items

# Redefines the helper above; despite its name, this version exports CSV files
def download_as_fgdb(item_list, backup_location):
    for item in item_list:
        try:
            if 'View Service' in item.typeKeywords:
                print(item.title + " is view, not downloading")
            else:
                print("Downloading " + item.title)
                # Date stamp in UTC-8 (Pacific Standard Time)
                version = dt.datetime.now(timezone(timedelta(hours=-8))).strftime("%Y-%m-%d")
                result = item.export(item.title + "_" + version, "CSV")
                result.download(backup_location)
                # Delete the temporary export item from ArcGIS Online
                result.delete()
                print("Successfully downloaded " + item.title)
        except Exception as error:
            print("An error occurred downloading " + item.title + ": " + str(error))
    print("The function has completed")

download_as_fgdb(items, folder_path)

7.2 During field season
Data collection in the field is detailed in the sampling SOP. Here, we focus on the behind-the-scenes tasks for managing data.
Update data tracking spreadsheet
Throughout the season, update the data tracking spreadsheet as various forms, surveys, and correspondence are received, as described in Create a data tracking sheet.
Modify IDs when samples change
Sometimes a producer can no longer participate or needs to change which field is sampled. Update, version, and archive the sample request form (sample-request-form-ferry.xlsx → sample-request-form-ferry_v2.xlsx). Run the assign-sample-ids.R script again to update the sample IDs. Comment out lines 362-386, as shown in the highlighted lines of the script on GitHub.
See 01_returned-sample-requests and 02_completed-sample-ids for an example of this flow.
Add a concise, explanatory note to the data tracking spreadsheet.
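The file versioning step above follows a simple naming pattern that can be sketched in Python. The helper name is hypothetical, and treating a file with no suffix as version 1 is an assumption based on the single example given.

```python
import re
from pathlib import Path


def next_version_name(filename):
    """Return the file name with its _vN suffix incremented.

    Assumption: a name without a _vN suffix is version 1, so it becomes _v2.
    Only the file name is returned; any directory part is dropped.
    """
    path = Path(filename)
    match = re.fullmatch(r"(?P<stem>.+)_v(?P<n>\d+)", path.stem)
    if match:
        stem, number = match["stem"], int(match["n"]) + 1
    else:
        stem, number = path.stem, 2
    return f"{stem}_v{number}{path.suffix}"


print(next_version_name("sample-request-form-ferry.xlsx"))
# sample-request-form-ferry_v2.xlsx
print(next_version_name("sample-request-form-ferry_v2.xlsx"))
# sample-request-form-ferry_v3.xlsx
```

Keeping every earlier version alongside the new one preserves a record of what the producer originally requested.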
7.3 Post field season
Organize multiple sources of data
To unify the information from multiple data sources (e.g., sample request forms, ArcGIS Field Maps forms, and management surveys), cross-reference each source and reach out to the sampling teams to resolve conflicting information as needed. This is especially important for verifying the crop planted at the time of sampling.
For scripts that automate most of this, see 01_load-metadata.R and 02_check-crops.R.
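The core of the cross-referencing step is flagging samples whose sources disagree. The sketch below shows one way to do that in Python; it is not what the R scripts named above do internally, and the source names, sample IDs, and crops are made up for illustration.

```python
def find_crop_conflicts(sources):
    """Flag samples whose data sources report different crops.

    sources maps a source name to {sample_id: crop}. Returns
    {sample_id: {source: crop}} for samples with more than one distinct crop.
    """
    reported = {}
    for source, crops in sources.items():
        for sample_id, crop in crops.items():
            if crop:  # skip blank entries
                # Normalize case and whitespace so "Blueberry" matches "blueberry "
                reported.setdefault(sample_id, {})[source] = crop.strip().lower()
    return {
        sample_id: by_source
        for sample_id, by_source in reported.items()
        if len(set(by_source.values())) > 1
    }


# Hypothetical example data
sources = {
    "sample_request_form": {"24-WHA001-01-A": "Blueberry", "24-WHA001-02-A": "Raspberry"},
    "field_maps_form": {"24-WHA001-01-A": "blueberry ", "24-WHA001-02-A": "Strawberry"},
    "management_survey": {"24-WHA001-01-A": "Blueberry"},
}
print(find_crop_conflicts(sources))
```

Samples returned by a check like this are the ones to raise with the sampling team, since the code cannot tell which source is correct.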
Process lab data
Follow the QA/QC SOP for processing lab data.
See the 2023 processing scripts and QA/QC report on GitHub.
Generate reports
Use the {soils} package to create a new project for each year. To avoid email attachment size limitations, save reports to Box.com for distribution to the sampling partners, who send the reports to the participants. Access to this folder requires a share link provided by WSDA staff.
Archive jars and falcon tubes
Store the archival subsamples in glass jars in the Yakima WSDA storage room and the cryogenic archive subsamples in falcon tubes in the -80 °C freezer at the WSU Mount Vernon Northwestern Washington Research & Extension Center.
Tape the labels to the falcon tubes with a generous amount of packing tape so that the labels do not fall off when the tubes freeze.
Update the archive spreadsheet with the additional sample IDs, number of falcon tubes, and box number of the glass jar.