Research Data Systems · Experience / Internal

Research Data Systems at IITA: From Field Data Collection to FAIR Data Workflows

An experience-backed case study on research data workflows — from field data capture and validation to metadata, repositories, PostgreSQL, CKAN, FAIR principles, and AI-ready data foundations.

Graduate intern working around research data systems and workflow digitization · 2024 — ongoing

KoboToolbox
ODK
PostgreSQL
CKAN
Python
FAIR workflows

Role: Graduate intern working around research data systems and workflow digitization
Status: Experience / Internal
Timeline: 2024 — ongoing
Type: Research Data Systems
Access: Private · sanitized case study

Context & why it matters

Research data is only useful when it can be trusted, understood, accessed, and reused. Agricultural research generates field data, lab records, survey forms, metadata, and reporting workflows. The challenge is not only collecting that data but preserving its quality, structure, context, and traceability across the whole lifecycle. My exposure here has been as a graduate intern working alongside these systems rather than owning institutional platforms: contributing to data-collection and digitization efforts, exploring repository and metadata tooling, and developing an understanding of how research data should flow.

Agricultural research decisions ripple outward, so the data behind them has to hold up to scrutiny.

Problem

Research forms are often long and complex, and both field and lab data carry real quality risks. Collection, storage, repository, and reporting workflows easily become disconnected from one another. Metadata and FAIR-aligned management are needed to keep data understandable, and research data increasingly has to be reusable for analytics, reporting, and future AI workflows.

Work areas

Digital data collection

KoboToolbox and ODK-style workflows
Long, multi-section research forms
Validation and constraints at entry
Skip logic for conditional questions
Structured capture instead of free text
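
To make the idea of validation and skip logic at entry concrete, here is a minimal Python sketch. It is not KoboToolbox or ODK code and does not reflect any real form: the `Question` class, the `validate` function, and the field names (`plot_id`, `crop`, `irrigated`, `irrigation_hours`) are all hypothetical. The sketch only imitates the two ideas an XLSForm expresses with its `relevant` and `constraint` columns: a question is asked only when it applies, and an answer is accepted only when it passes a check.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Question:
    """One form question (hypothetical, loosely modelled on an XLSForm row)."""
    name: str
    required: bool = False
    # Skip logic: when set, the question is only asked if this returns True.
    relevant: Optional[Callable[[dict], bool]] = None
    # Entry constraint: when set, the answer must make this return True.
    constraint: Optional[Callable[[Any], bool]] = None
    message: str = "invalid value"


def validate(record: dict, form: list[Question]) -> list[str]:
    """Return a list of problems found in one submitted record."""
    errors = []
    for q in form:
        asked = q.relevant is None or q.relevant(record)
        value = record.get(q.name)
        if not asked:
            # A skipped question should not carry an answer.
            if value is not None:
                errors.append(f"{q.name}: answered but should have been skipped")
            continue
        if value is None:
            if q.required:
                errors.append(f"{q.name}: required")
            continue
        if q.constraint is not None and not q.constraint(value):
            errors.append(f"{q.name}: {q.message}")
    return errors


def is_valid_hours(value: Any) -> bool:
    return isinstance(value, (int, float)) and 0 < value <= 24


# Dummy form: a controlled choice list instead of free text,
# and a follow-up question that only applies to irrigated plots.
FORM = [
    Question("plot_id", required=True),
    Question("crop", required=True,
             constraint=lambda v: v in {"cassava", "maize", "yam"},
             message="unknown crop"),
    Question("irrigated", required=True,
             constraint=lambda v: v in {"yes", "no"},
             message="must be yes or no"),
    Question("irrigation_hours", required=True,
             relevant=lambda r: r.get("irrigated") == "yes",
             constraint=is_valid_hours,
             message="must be greater than 0 and at most 24"),
]
```

Calling `validate({"plot_id": "P-001", "crop": "maize", "irrigated": "no"}, FORM)` returns an empty list, because the irrigation follow-up is skipped. Flagging an answer to a skipped question is a choice made for this sketch; a collection tool may handle that case differently.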

PostgreSQL & data storage

Relational thinking for research entities
Structured, queryable storage
Data integrity through constraints
Reporting-ready tables
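
The points above can be sketched with a small dummy schema. To keep the example self-contained it uses Python's built-in `sqlite3` module as a stand-in, not PostgreSQL itself. The constraint types shown (`NOT NULL`, `REFERENCES`, `CHECK`, `UNIQUE`) are standard SQL that PostgreSQL also supports, but the table names, columns, and view are hypothetical and do not describe any real institutional database.

```python
import sqlite3

# Dummy schema. In PostgreSQL, observed_on would more naturally be a DATE column.
SCHEMA = """
CREATE TABLE trial (
    trial_id TEXT PRIMARY KEY,
    crop     TEXT NOT NULL
);
CREATE TABLE plot_observation (
    observation_id INTEGER PRIMARY KEY,
    trial_id       TEXT NOT NULL REFERENCES trial(trial_id),
    plot_no        INTEGER NOT NULL CHECK (plot_no > 0),
    observed_on    TEXT NOT NULL,
    yield_kg       REAL CHECK (yield_kg >= 0),
    UNIQUE (trial_id, plot_no, observed_on)
);
-- A reporting-ready view built on top of the validated tables.
CREATE VIEW trial_yield_summary AS
SELECT t.trial_id,
       t.crop,
       COUNT(o.observation_id) AS n_obs,
       AVG(o.yield_kg)         AS mean_yield_kg
FROM trial t
LEFT JOIN plot_observation o ON o.trial_id = t.trial_id
GROUP BY t.trial_id, t.crop;
"""

INSERT_TRIAL = "INSERT INTO trial (trial_id, crop) VALUES (?, ?)"
INSERT_OBS = (
    "INSERT INTO plot_observation (trial_id, plot_no, observed_on, yield_kg) "
    "VALUES (?, ?, ?, ?)"
)


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with the dummy schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when asked; PostgreSQL does so by default.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def try_insert(conn: sqlite3.Connection, sql: str, params: tuple) -> bool:
    """Insert one row; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

With this schema, an observation that points at an unknown trial, carries a negative yield, or repeats an existing plot and date is rejected by the database itself rather than by downstream cleaning scripts.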

CKAN & research repositories

Exploring CKAN for dataset repositories
Repository modernization thinking
Metadata and dataset discoverability
FAIR-aligned workflows
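
As a rough illustration of metadata and discoverability, the sketch below checks a dataset description for gaps before it would be submitted to a repository. The field names (`name`, `title`, `notes`, `license_id`, `tags`, `resources`) follow the dataset fields used by CKAN's `package_create` action, but the choice of which fields to treat as required is an assumption made for this example, not a CKAN rule or an institutional policy. The `metadata_gaps` helper is hypothetical and makes no network calls.

```python
import re

# Fields this sketch treats as required (an assumption, not a CKAN requirement).
REQUIRED_FIELDS = ("name", "title", "notes", "license_id", "tags", "resources")

# CKAN dataset names are lowercase alphanumerics, '-' and '_', 2 to 100 characters.
NAME_PATTERN = re.compile(r"[a-z0-9_-]{2,100}")


def metadata_gaps(dataset: dict) -> list[str]:
    """List what is missing from a CKAN-style dataset description."""
    gaps = [f"missing {field}" for field in REQUIRED_FIELDS if not dataset.get(field)]

    name = dataset.get("name", "")
    if name and not NAME_PATTERN.fullmatch(name):
        gaps.append("name must be lowercase alphanumeric, '-' or '_'")

    # Each resource should say where the file lives and what format it is in.
    for i, resource in enumerate(dataset.get("resources") or []):
        for key in ("url", "format"):
            if not resource.get(key):
                gaps.append(f"resource {i}: missing {key}")
    return gaps


# Dummy dataset description; every value here is a placeholder.
complete = {
    "name": "dummy-cassava-yield-2024",
    "title": "Dummy cassava yield observations",
    "notes": "Synthetic plot-level yield records for demonstration only.",
    "license_id": "cc-by",
    "tags": [{"name": "cassava"}, {"name": "dummy-data"}],
    "resources": [
        {
            "name": "observations",
            "url": "https://example.org/dummy/observations.csv",
            "format": "CSV",
        }
    ],
}
```

Here `metadata_gaps(complete)` returns an empty list, while a draft with only a title and a free-form name would come back with a list of what still needs to be filled in. A check like this covers only the machine-checkable part of FAIR; whether a description is actually understandable still needs a human reviewer.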

Laboratory workflow digitization

Mapping how lab data moves
Sample and data traceability
Workflow mapping
Archiving and reporting considerations
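
One way to think about sample traceability is as an append-only log of events per sample, which can then be checked for gaps. The sketch below assumes a simple linear workflow with four stages; the stage names, the `Event` class, and the helper functions are hypothetical and are not taken from any real lab system.

```python
from dataclasses import dataclass

# Hypothetical linear lab workflow; a real lab may branch or repeat steps.
STAGES = ("received", "prepared", "analysed", "archived")


@dataclass(frozen=True)
class Event:
    sample_id: str
    stage: str
    at: str  # ISO 8601 timestamp, same format and timezone for all events
    by: str  # who handled the sample


def history(events: list[Event], sample_id: str) -> list[Event]:
    """Return one sample's events in time order.

    Sorting the timestamp strings works because ISO 8601 values in a single
    consistent format sort chronologically.
    """
    return sorted((e for e in events if e.sample_id == sample_id), key=lambda e: e.at)


def traceability_issues(events: list[Event], sample_id: str) -> list[str]:
    """Report gaps or out-of-order steps in a sample's recorded history."""
    seen = [e.stage for e in history(events, sample_id)]
    if not seen:
        return [f"{sample_id}: no events recorded"]
    expected = list(STAGES[: len(seen)])
    if seen != expected:
        return [f"{sample_id}: expected {expected}, found {seen}"]
    return []


# Dummy log: S1 is fully traceable so far, S2 skipped the 'prepared' step.
LOG = [
    Event("S1", "received", "2024-06-01T09:00:00", "tech_a"),
    Event("S1", "prepared", "2024-06-01T11:30:00", "tech_a"),
    Event("S1", "analysed", "2024-06-02T10:00:00", "tech_b"),
    Event("S2", "received", "2024-06-01T09:05:00", "tech_a"),
    Event("S2", "analysed", "2024-06-02T10:20:00", "tech_b"),
]
```

In this dummy log, `traceability_issues(LOG, "S1")` returns an empty list, while `S2` is flagged because its history jumps from receipt to analysis with no preparation step recorded.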

AI-ready data foundations

Well-described datasets
Consistent schemas
Data validation
Reusable pipelines
Analytics and ML readiness
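
A small example of what "consistent schemas" can mean in practice: before batches of records are combined for analysis, each batch is compared against one agreed schema. The sketch below is an assumption-laden illustration; `EXPECTED_SCHEMA`, `schema_report`, and the column names are hypothetical, and the type check is deliberately strict (an integer in a float column is flagged).

```python
# Agreed column names and types for a dummy yield dataset (hypothetical).
EXPECTED_SCHEMA = {"plot_id": str, "crop": str, "yield_kg": float}


def schema_report(rows: list[dict], schema: dict = EXPECTED_SCHEMA) -> dict:
    """Compare a batch of records against the agreed schema.

    Returns the columns that are missing, unexpected, or of the wrong type
    in at least one row.
    """
    report = {"missing": set(), "unexpected": set(), "wrong_type": set()}
    for row in rows:
        report["missing"] |= schema.keys() - row.keys()
        report["unexpected"] |= row.keys() - schema.keys()
        for column, expected_type in schema.items():
            value = row.get(column)
            # None is treated as "no value", not as a type error.
            if value is not None and not isinstance(value, expected_type):
                report["wrong_type"].add(column)
    return report


def is_consistent(report: dict) -> bool:
    """True when the batch has no schema problems at all."""
    return not any(report.values())


# Dummy batches: the second drifted to a different header and a text value.
batch_ok = [
    {"plot_id": "P1", "crop": "maize", "yield_kg": 10.5},
    {"plot_id": "P2", "crop": "maize", "yield_kg": 12.0},
]
batch_drifted = [
    {"plot_id": "P3", "crop": "maize", "Yield (kg)": 11.0},
    {"plot_id": "P4", "crop": "maize", "yield_kg": "12.5"},
]
```

Running `schema_report(batch_drifted)` surfaces both the renamed column and the text value in a numeric field, which are the kinds of silent drift that otherwise reach a model or a report unnoticed.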

Architecture

Research data moves through a lifecycle rather than living in one place: field/lab capture → validation → structured storage → metadata & documentation → repository & discovery → reporting & analytics → AI-ready reuse. Designing around that flow is what keeps data trustworthy and reusable at every step.

Capture — structured field and lab collection (KoboToolbox / ODK-style forms)
Validation — constraints and checks applied at entry
Structured storage — relational, queryable storage (PostgreSQL)
Metadata & documentation — describing datasets so they stay understandable
Repository & discovery — making datasets findable (CKAN-style repositories)
Reporting & analytics — turning validated data into reporting
AI-ready reuse — consistent, well-described data ready for modelling
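
The lifecycle above can be sketched as a chain of small stages that each take a dataset and hand it on, recording what happened along the way. This is an illustrative toy, not a description of any real pipeline: the `Dataset` envelope, the stage functions, and the field names are hypothetical, and only three of the stages (validation, metadata, reporting) are represented. Storage and repository publishing are left out to keep the example self-contained.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Dataset:
    """Records plus the context that should travel with them."""
    records: list[dict]
    rejected: list[dict] = field(default_factory=list)
    metadata: dict = field(default_factory=dict)
    trail: list[str] = field(default_factory=list)  # which stages have run


Stage = Callable[[Dataset], Dataset]


def is_valid(record: dict) -> bool:
    yield_kg = record.get("yield_kg")
    return bool(record.get("plot_id")) and isinstance(yield_kg, (int, float)) and yield_kg >= 0


def validate_records(ds: Dataset) -> Dataset:
    """Validation: keep good records, set rejected ones aside rather than dropping them."""
    ds.rejected = [r for r in ds.records if not is_valid(r)]
    ds.records = [r for r in ds.records if is_valid(r)]
    return ds


def describe(ds: Dataset) -> Dataset:
    """Metadata and documentation: record what the dataset contains."""
    ds.metadata["row_count"] = len(ds.records)
    ds.metadata["columns"] = sorted({key for r in ds.records for key in r})
    return ds


def summarise(ds: Dataset) -> Dataset:
    """Reporting: derive a simple summary from validated records only."""
    yields = [r["yield_kg"] for r in ds.records]
    ds.metadata["mean_yield_kg"] = sum(yields) / len(yields) if yields else None
    return ds


def run_pipeline(ds: Dataset, stages: list[Stage]) -> Dataset:
    """Run each stage in order and log its name for traceability."""
    for stage in stages:
        ds = stage(ds)
        ds.trail.append(stage.__name__)
    return ds


# Dummy input: the second record has an impossible negative yield.
raw = Dataset(records=[
    {"plot_id": "P1", "yield_kg": 10.0},
    {"plot_id": "P2", "yield_kg": -4.0},
    {"plot_id": "P3", "yield_kg": 14.0},
])
result = run_pipeline(raw, [validate_records, describe, summarise])
```

The design point is the order: the summary is computed only after validation, and the `trail` and `rejected` fields keep a record of what was done and what was excluded, so the final numbers can be traced back to their inputs.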

Architecture diagram — to be added

Technical decisions

Data quality starts at capture: The cheapest place to prevent bad data is the form itself — constraints, skip logic, and validation at entry.
Metadata is part of the system, not an afterthought: Data you can't describe is data you can't trust or reuse, so metadata belongs in the design from the start.
PostgreSQL is a strong foundation for structured research data: Relational integrity and queryability make it a dependable base for data that has to hold up over time.
FAIR workflows need both technical and human process design: Findable, accessible, interoperable, reusable data depends as much on agreed process as on tooling.
Research software must respect institutional workflows: Tools that ignore how researchers actually work get worked around; fitting the workflow is the point.
AI-ready data needs traceability, structure, and context first: Before modelling, data has to be structured, validated, and well-described — otherwise the models inherit the mess.

What it demonstrates

Shows I can operate inside real research workflows where data quality is non-negotiable.

Understanding research data lifecycles end to end
Reasoning about field-to-repository workflows
Working with tools like KoboToolbox, PostgreSQL, and CKAN
Connecting research data quality to analytics and AI readiness
Translating institutional workflows into technical structure

Proof assets

Some proof assets use dummy data or are shared as private walkthroughs to protect sensitive systems and records.

Research data workflow diagram (Diagram · Planned): The field-to-reuse data lifecycle. To be added.
Dummy Kobo/XLSForm sample (Documentation · Planned): An example form with validation and skip logic. To be added.
CKAN architecture notes (Documentation · Case study only): Sanitized notes on repository structure and metadata. Shared as a sanitized case study.
Lab digitization concept diagram (Diagram · Planned): How lab data moves and stays traceable. To be added.
PostgreSQL schema example with dummy data (Documentation · Planned): A sample schema populated with non-real data. To be added.

Privacy

Next steps

Add sanitized workflow diagrams
Add a dummy Kobo/XLSForm example
Add a small PostgreSQL-backed research data demo
Add a FAIR metadata checklist
Add a lab data lifecycle diagram

Stack

KoboToolbox
ODK
PostgreSQL
CKAN
Python
FAIR workflows