5 Storage & version control
Non-digital data, such as paper forms, must be transcribed or converted to digital file formats; the paper originals are then stored in the WaSHI filing cabinet in the Natural Resources Building in Olympia.
All digital data are stored on the WSDA shared drives and in the other locations listed below.
WSDA shared drives:
- Agency files: Y:/NRAS/soil-health-initiative
- GIS: K:/NRAS/Arc_Data/soil-health (access requires permissions from IT)
Esri products and services:
- ArcGIS Online Soil Health - WSDA Internal Group
- WSDA GIS on-premise ArcGIS REST Services Directory (only Jadey, Perry, and Joel can publish to this server; Ed Thompson is the contact for getting access)
Database for lab results and management data:
- WISKI (a migration to SQL Server or another less water-focused database is very likely)
GitHub organizations for code-based projects:
Microsoft Teams for data sharing between WSDA and WSU:
- WSDA and WSU Teams WaSHI channels
Box.com for external file sharing:
- WSDA has a box.com account. The WSDA Senior Soil Scientist has an account and can add editors as needed.
Individual devices (laptop, tablet, phone):
- Must NOT be the only place data are stored!
5.1 Backup
Data must be stored in multiple locations. At minimum, data on an individual computer must also be saved on the WSDA shared drive. Backing up data using version control (GitHub) or a cloud service (Microsoft OneDrive or Box.com) is strongly recommended.
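As an illustration of the minimum requirement above, the following Python sketch copies a file from an individual computer to a second location and checks that the copy matches the original. The function names are hypothetical and not part of any WaSHI tooling; it assumes the backup destination is a folder you can write to, such as a mapped shared drive.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up_file(source: Path, backup_dir: Path) -> Path:
    """Copy source into backup_dir and confirm the copy matches the original."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves timestamps
    if sha256_of(source) != sha256_of(target):
        raise OSError(f"Backup of {source} does not match the original")
    return target
```

A call might look like `back_up_file(Path("field-notes.xlsx"), Path("Y:/NRAS/soil-health-initiative/..."))`, where the file name is a made-up example and the destination subfolder is left for you to fill in.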
5.2 Read-only raw data
Always set raw data files, such as lab results or ArcGIS Online exports, as Read-Only to avoid accidental corruption or overwriting. For example, in the lab-data folder, all original data files are set to Read-Only and saved in the raw folder.
Copy the raw data file to the working folder for processing and analyses. Then save the final dataset in the separate clean folder with a descriptive title. Keeping a readme.txt to document processing steps is good practice, as discussed in Section 6.2.1.
Y:/NRAS/soil-health-initiative/state-of-the-soils/2023_sampling/lab-data
├── 2023_data-template-soiltest.xlsx
├── clean
├── qc
├── raw
└── working
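The copy step can also be scripted. This is a minimal Python sketch, assuming the raw and working folders shown in the tree above; the function name is hypothetical. It uses `shutil.copyfile`, which copies file contents only, so the Read-Only setting on the raw file is not carried over to the working copy.

```python
import shutil
from pathlib import Path


def copy_raw_to_working(raw_file: Path, working_dir: Path) -> Path:
    """Copy a raw data file into the working folder without touching the original."""
    working_dir.mkdir(parents=True, exist_ok=True)
    working_copy = working_dir / raw_file.name
    if working_copy.exists():
        # Refuse to overwrite processing that may already be in progress.
        raise FileExistsError(f"{working_copy} already exists")
    # copyfile copies contents only, so the new file gets default (writable) permissions.
    shutil.copyfile(raw_file, working_copy)
    return working_copy
```

Refusing to overwrite an existing working copy is a design choice for this sketch; rename or remove the old copy first if you really want to start over.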
To set a file as Read-Only: right-click the file > Properties > check the Read-only attribute box > OK.
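For folders with many files, the same setting can be applied from a script. This is a sketch, not an established WaSHI procedure, and the function names are hypothetical. It relies on `os.chmod`, which on Windows sets the Read-only attribute when the write permission is cleared.

```python
import os
import stat
from pathlib import Path


def set_read_only(path: Path) -> None:
    """Clear all write permissions; on Windows this sets the Read-only attribute."""
    current = path.stat().st_mode
    os.chmod(path, current & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


def protect_raw_folder(raw_dir: Path) -> list[Path]:
    """Set every file directly inside raw_dir to Read-Only and return those files."""
    protected = []
    for path in sorted(raw_dir.iterdir()):
        if path.is_file():  # subfolders are skipped in this sketch
            set_read_only(path)
            protected.append(path)
    return protected
```

Calling `protect_raw_folder(Path(".../lab-data/raw"))` on the raw folder would protect each file in it; the leading part of the path is left for you to fill in.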
5.3 Version control with Git and GitHub
A version control system records changes to files over time. Git is a free and open-source distributed version control system. GitHub is the hosting site we use to interface with Git. Git and GitHub are fundamental to reproducible statistical and data scientific workflows (Bryan 2018).
Version control ensures changes are documented and previous versions are accessible if changes must be recalled. Additionally, version control enables robust collaboration across projects.
It is useful not only for code projects but also for documents, presentations, and books (like this DMP!). Git and GitHub keep the revision history of each file with every commit, so there is only a single name for each file (e.g., report.docx) instead of report_v01.docx and report_v02.docx. For a reminder on version naming, see Section 3.2.7.
The screenshot below shows who made commits (i.e., named version histories) and when they were made. From this screen, a user can click on the commit message to view all files that were changed.
After clicking the first commit message, a diff (i.e., a visual of what changed) displays the additions to documentation.qmd highlighted in green and deletions highlighted in red.
Privacy considerations
Review Chapter 8 to categorize the data included in the repository to protect grower privacy. If the data are not anonymized and aggregated, either 1) the repository must be set to private or 2) data files and any scripts containing Category 3 data as described in Section 8.1.0.2 must be added to the .gitignore file.
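For option 2, entries can be added to the .gitignore file by hand or with a short script. The Python sketch below appends patterns that are not already listed. The function name and the example patterns are hypothetical; substitute the actual paths of your Category 3 files.

```python
from pathlib import Path


def add_to_gitignore(repo_dir: Path, patterns: list[str]) -> list[str]:
    """Append patterns missing from repo_dir/.gitignore; return the ones added."""
    gitignore = repo_dir / ".gitignore"
    existing = []
    if gitignore.exists():
        existing = gitignore.read_text(encoding="utf-8").splitlines()
    known = {line.strip() for line in existing}
    added = []
    for pattern in patterns:
        if pattern not in known and pattern not in added:
            added.append(pattern)
    if added:
        # Rewrite the file so it always ends with exactly one trailing newline.
        gitignore.write_text("\n".join(existing + added) + "\n", encoding="utf-8")
    return added


# Hypothetical example patterns: a data folder and one script.
EXAMPLE_PATTERNS = ["data/raw/", "scripts/join-grower-contacts.R"]
```

Note that .gitignore only prevents untracked files from being added. A file that has already been committed stays in the repository and its history, so add the entries before the first commit of any sensitive file.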
Git and GitHub resources
Read Jenny Bryan’s article Excuse Me, Do You Have a Moment to Talk About Version Control? (open-access pre-print; full article on WSDA shared drive) for background on Git and GitHub, why we should use them, and how to get started. For detailed instructions, follow along with her free online book Happy Git and GitHub for the useR.
GitHub: A Beginner’s Guide is a helpful resource created by Birds Canada for less advanced programmers. If you prefer to look through slides, see Byron C. Jaeger’s presentation Happier version control with Git and GitHub (and RStudio).