Research Data Systems at IITA: From Field Data Collection to FAIR Data Workflows
An experience-backed case study on research data workflows — from field data capture and validation to metadata, repositories, PostgreSQL, CKAN, FAIR principles, and AI-ready data foundations.
- KoboToolbox
- ODK
- PostgreSQL
- CKAN
- Python
- FAIR workflows
- Graduate intern working around research data systems and workflow digitization
- Experience / Internal
- 2024 — ongoing
- Research Data Systems
- Private · sanitized case study
Context & why it matters
Research data is only useful when it can be trusted, understood, accessed, and reused. Agricultural research generates field data, lab records, survey forms, metadata, and reporting workflows. The challenge is not only collecting that data but preserving its quality, structure, context, and traceability across the whole lifecycle. My exposure here has been as a graduate intern working alongside these systems rather than owning institutional platforms: contributing to data-collection and digitization efforts, exploring repository and metadata tooling, and developing an understanding of how research data should flow.
Agricultural research decisions ripple outward, so the data behind them has to hold up to scrutiny.
Problem
Research forms are often long and complex, and field and lab data carry real quality risks. Collection, storage, repository, and reporting workflows easily become disconnected from one another. Metadata and FAIR-aligned management are needed to keep data understandable, and research data increasingly has to be reusable for analytics, reporting, and future AI workflows.
Work areas
Digital data collection
- KoboToolbox and ODK-style workflows
- Long, multi-section research forms
- Validation and constraints at entry
- Skip logic for conditional questions
- Structured capture instead of free text
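The entry-time validation and skip logic listed above can be sketched in plain Python. This is an illustration only, not KoboToolbox or ODK code: in those tools the equivalent rules would normally live in an XLSForm's `constraint` and `relevant` columns. The field names (`farmer_age`, `has_plot`, `plot_area_ha`), the numeric ranges, and the helper `validate_submission` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable

Record = dict[str, Any]


@dataclass
class Question:
    name: str
    # Skip logic: the question only applies when this returns True.
    relevant: Callable[[Record], bool] = lambda record: True
    # Constraint: an answer is only accepted when this returns True.
    constraint: Callable[[Any], bool] = lambda value: True
    message: str = "invalid value"
    required: bool = True


# Hypothetical three-question form; ranges are illustrative assumptions.
FORM = [
    Question(
        "farmer_age",
        constraint=lambda v: isinstance(v, int) and 18 <= v <= 120,
        message="age must be between 18 and 120",
    ),
    Question(
        "has_plot",
        constraint=lambda v: v in {"yes", "no"},
        message="answer must be yes or no",
    ),
    Question(
        "plot_area_ha",
        relevant=lambda r: r.get("has_plot") == "yes",
        constraint=lambda v: isinstance(v, (int, float)) and 0 < v <= 500,
        message="plot area must be between 0 and 500 ha",
    ),
]


def validate_submission(record: Record, form=None) -> list[str]:
    """Return a list of human-readable problems; an empty list means the record passes."""
    errors = []
    for question in (form or FORM):
        value = record.get(question.name)
        if not question.relevant(record):
            # A skipped question that still has an answer signals inconsistent data.
            if value is not None:
                errors.append(f"{question.name}: answered but should have been skipped")
            continue
        if value is None:
            if question.required:
                errors.append(f"{question.name}: required")
            continue
        if not question.constraint(value):
            errors.append(f"{question.name}: {question.message}")
    return errors
```

Keeping each rule next to the question it guards mirrors how a form definition is usually read: one row per question, with its condition and its constraint alongside.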
PostgreSQL & data storage
- Relational thinking for research entities
- Structured, queryable storage
- Data integrity through constraints
- Reporting-ready tables
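To make "data integrity through constraints" concrete, here is a small runnable sketch. It uses Python's built-in `sqlite3` module as a stand-in so it runs without a database server; it is not the schema of any real system. The tables (`trial`, `plot_observation`), columns, and helper names are hypothetical. A PostgreSQL version would likely use identity columns and a `DATE` type, which are left out here.

```python
import sqlite3

# Hypothetical schema with dummy entity names. The constraint kinds used here
# (NOT NULL, UNIQUE, CHECK, REFERENCES) also exist in PostgreSQL.
SCHEMA = """
CREATE TABLE trial (
    trial_id   INTEGER PRIMARY KEY,
    code       TEXT NOT NULL UNIQUE,
    start_date TEXT NOT NULL
);
CREATE TABLE plot_observation (
    observation_id INTEGER PRIMARY KEY,
    trial_id       INTEGER NOT NULL REFERENCES trial(trial_id),
    plot_number    INTEGER NOT NULL CHECK (plot_number > 0),
    yield_kg_ha    REAL CHECK (yield_kg_ha >= 0),
    UNIQUE (trial_id, plot_number)
);
"""

INSERT_TRIAL = "INSERT INTO trial (trial_id, code, start_date) VALUES (?, ?, ?)"
INSERT_OBSERVATION = (
    "INSERT INTO plot_observation (trial_id, plot_number, yield_kg_ha) VALUES (?, ?, ?)"
)
# A reporting-style query: one row per trial with plot count and mean yield.
REPORT = """
SELECT t.code, COUNT(*), AVG(o.yield_kg_ha)
FROM plot_observation AS o
JOIN trial AS t ON t.trial_id = o.trial_id
GROUP BY t.code
"""


def open_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when asked to.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def try_insert(conn: sqlite3.Connection, sql: str, params: tuple) -> bool:
    """Return True if the row was stored, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

With this in place, a duplicate plot number, a negative yield, or an observation pointing at a non-existent trial is rejected by the database itself rather than by application code.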
CKAN & research repositories
- Exploring CKAN for dataset repositories
- Repository modernization thinking
- Metadata and dataset discoverability
- FAIR-aligned workflows
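As a rough sketch of what dataset metadata for a CKAN-style repository can look like, the helper below builds the kind of JSON payload that CKAN's `package_create` action accepts (normally sent by POST to `/api/3/action/package_create` with an API token). Nothing is sent over the network here. The function names, the example values, and the choice of extra fields are assumptions for illustration, and the exact fields a given CKAN instance requires depend on its version and configured schema.

```python
import json
import re


def slugify(title: str) -> str:
    """Turn a title into a lowercase, URL-safe name of at most 100 characters."""
    slug = re.sub(r"[^a-z0-9_-]+", "-", title.lower()).strip("-")
    return slug[:100]


def build_ckan_dataset(title, description, owner_org, license_id, keywords, extras=None):
    """Build a dataset payload in the shape used by CKAN's package_create action."""
    return {
        "name": slugify(title),
        "title": title,
        "notes": description,
        "owner_org": owner_org,
        "license_id": license_id,
        "tags": [{"name": keyword} for keyword in keywords],
        # Free-form key/value metadata; values are kept as strings.
        "extras": [
            {"key": key, "value": str(value)}
            for key, value in sorted((extras or {}).items())
        ],
    }


# Dummy example values, not a real dataset or organization.
payload = build_ckan_dataset(
    title="Cassava Yield Trial 2024 (dummy data)",
    description="Plot-level yield observations from a fictional trial.",
    owner_org="demo-org",
    license_id="cc-by",
    keywords=["cassava", "yield"],
    extras={"temporal_coverage": "2024", "collection_tool": "KoboToolbox"},
)
body = json.dumps(payload)  # what would go in the request body
```

Generating the payload from one function keeps dataset names, tags, and extra fields consistent across uploads, which is most of what makes a repository searchable.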
Laboratory workflow digitization
- Mapping how lab data moves
- Sample and data traceability
- Workflow mapping
- Archiving and reporting considerations
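Sample traceability can be pictured as an append-only log of who did what to which sample, and when. The sketch below is a conceptual illustration, not a description of any lab system in use: the step names in `EXPECTED_STEPS`, the `CustodyEvent` and `SampleLog` classes, and the sample identifier are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical lab workflow; a real one would come from mapping the actual process.
EXPECTED_STEPS = ("received", "prepared", "analysed", "archived")


@dataclass(frozen=True)
class CustodyEvent:
    sample_id: str
    step: str
    actor: str
    timestamp: datetime


@dataclass
class SampleLog:
    events: list[CustodyEvent] = field(default_factory=list)

    def record(self, sample_id, step, actor, timestamp=None) -> CustodyEvent:
        """Append an event; events are never edited or removed."""
        event = CustodyEvent(sample_id, step, actor, timestamp or datetime.now(timezone.utc))
        self.events.append(event)
        return event

    def history(self, sample_id) -> list[str]:
        """Steps recorded for one sample, ordered by time rather than by entry order."""
        own = [e for e in self.events if e.sample_id == sample_id]
        return [e.step for e in sorted(own, key=lambda e: e.timestamp)]

    def missing_steps(self, sample_id, expected=EXPECTED_STEPS) -> list[str]:
        """Expected steps that have no event yet for this sample."""
        done = set(self.history(sample_id))
        return [step for step in expected if step not in done]
```

Because the log is ordered by timestamp, late data entry does not scramble a sample's history, and `missing_steps` gives a quick view of samples that have not yet reached archiving.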
AI-ready data foundations
- Well-described datasets
- Consistent schemas
- Data validation
- Reusable pipelines
- Analytics and ML readiness
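One way to check "consistent schemas" before data reaches analytics or modelling is to compare each incoming row with an agreed column list. The sketch below assumes a hypothetical `EXPECTED_SCHEMA` and a helper named `schema_problems`; real pipelines often use a dedicated validation library for this, so treat it as a minimal illustration of the idea.

```python
# Hypothetical agreed schema: column name -> expected Python type.
EXPECTED_SCHEMA = {"trial_code": str, "plot_number": int, "yield_kg_ha": float}


def schema_problems(rows, schema=None) -> list[str]:
    """Report missing columns, unexpected columns, and type mismatches per row."""
    schema = schema or EXPECTED_SCHEMA
    problems = []
    for index, row in enumerate(rows):
        missing = sorted(schema.keys() - row.keys())
        extra = sorted(row.keys() - schema.keys())
        if missing:
            problems.append(f"row {index}: missing {', '.join(missing)}")
        if extra:
            problems.append(f"row {index}: unexpected {', '.join(extra)}")
        for name, expected in schema.items():
            if name in row and not isinstance(row[name], expected):
                actual = type(row[name]).__name__
                problems.append(
                    f"row {index}: {name} should be {expected.__name__}, got {actual}"
                )
    return problems
```

Running this check on every batch catches schema drift, such as a column that was renamed or a number that arrives as text, before it silently affects downstream analysis.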
Architecture
Research data moves through a lifecycle rather than living in one place: field/lab capture → validation → structured storage → metadata & documentation → repository & discovery → reporting & analytics → AI-ready reuse. Designing around that flow is what keeps data trustworthy and reusable at every step.
- Capture — structured field and lab collection (KoboToolbox / ODK-style forms)
- Validation — constraints and checks applied at entry
- Structured storage — relational, queryable storage (PostgreSQL)
- Metadata & documentation — describing datasets so they stay understandable
- Repository & discovery — making datasets findable (CKAN-style repositories)
- Reporting & analytics — turning validated data into reporting
- AI-ready reuse — consistent, well-described data ready for modelling
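The lifecycle above can be read as a chain of small stages, each taking a batch of records and handing it on. The sketch below shows that shape with two stages only. The `Batch` class, the stage functions, and the `yield_kg_ha` field are hypothetical, and a real pipeline would add storage and repository steps between them.

```python
from dataclasses import dataclass, field


@dataclass
class Batch:
    records: list[dict]
    rejected: list[dict] = field(default_factory=list)
    metadata: dict = field(default_factory=dict)
    trail: list[str] = field(default_factory=list)  # names of stages applied, in order


def validate(batch: Batch) -> Batch:
    """Keep records with a non-negative numeric yield; set the rest aside, not silently dropped."""
    kept = []
    for record in batch.records:
        value = record.get("yield_kg_ha")
        if isinstance(value, (int, float)) and value >= 0:
            kept.append(record)
        else:
            batch.rejected.append(record)
    batch.records = kept
    return batch


def describe(batch: Batch) -> Batch:
    """Attach basic descriptive metadata so the batch stays understandable later."""
    fields = sorted({key for record in batch.records for key in record})
    batch.metadata.update(
        {
            "record_count": len(batch.records),
            "rejected_count": len(batch.rejected),
            "fields": fields,
        }
    )
    return batch


def run_pipeline(batch: Batch, stages) -> Batch:
    """Apply each stage in order and note it in the trail for traceability."""
    for stage in stages:
        batch = stage(batch)
        batch.trail.append(stage.__name__)
    return batch
```

The `trail` list is the point of the design: every batch carries a record of which steps it has passed through, so a later consumer can tell validated data from raw data.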
Technical decisions
- Data quality starts at capture: The cheapest place to prevent bad data is the form itself, through constraints, skip logic, and validation at entry.
- Metadata is part of the system, not an afterthought: Data you can't describe is data you can't trust or reuse, so metadata belongs in the design from the start.
- PostgreSQL is a strong foundation for structured research data: Relational integrity and queryability make it a dependable base for data that has to hold up over time.
- FAIR workflows need both technical and human process design: Findable, accessible, interoperable, reusable data depends as much on agreed process as on tooling.
- Research software must respect institutional workflows: Tools that ignore how researchers actually work get worked around; fitting the workflow is the point.
- AI-ready data needs traceability, structure, and context first: Before modelling, data has to be structured, validated, and well-described; otherwise the models inherit the mess.
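The idea that metadata and FAIR alignment belong in the design can be made tangible with a simple pre-publication check. The grouping of fields under each principle below is a simplified assumption for illustration, not an official FAIR assessment scheme, and `FAIR_CHECKS`, `fair_gaps`, and the field names are hypothetical.

```python
# Simplified, assumed mapping of FAIR principles to metadata fields.
FAIR_CHECKS = {
    "findable": ["identifier", "title", "keywords"],
    "accessible": ["access_url", "access_rights"],
    "interoperable": ["format", "data_dictionary"],
    "reusable": ["license", "provenance"],
}


def fair_gaps(metadata: dict) -> dict[str, list[str]]:
    """Return, per principle, the fields that are missing or empty."""
    gaps = {}
    for principle, fields in FAIR_CHECKS.items():
        missing = [name for name in fields if not metadata.get(name)]
        if missing:
            gaps[principle] = missing
    return gaps


# Dummy metadata record for a fictional dataset.
draft = {
    "identifier": "demo-0001",
    "title": "Cassava Yield Trial 2024 (dummy data)",
    "keywords": ["cassava", "yield"],
    "format": "CSV",
    "license": "CC-BY-4.0",
}
```

A check like this only covers the tooling half. Who fills in the missing fields, and when, is the human process half that the list above refers to.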
What it demonstrates
This work shows that I can operate inside real research workflows where data quality is non-negotiable.
- Understanding research data lifecycles end to end
- Reasoning about field-to-repository workflows
- Working with tools like KoboToolbox, PostgreSQL, and CKAN
- Connecting research data quality to analytics and AI readiness
- Translating institutional workflows into technical structure
Proof assets
Some proof assets use dummy data or are shared as private walkthroughs to protect sensitive systems and records.
- Research data workflow diagram (planned): the field-to-reuse data lifecycle.
- Dummy Kobo/XLSForm sample (planned): an example form with validation and skip logic.
- CKAN architecture notes (case study only): sanitized notes on repository structure and metadata.
- Lab digitization concept diagram (planned): how lab data moves and stays traceable.
- PostgreSQL schema example with dummy data (planned): a sample schema populated with non-real data.
Privacy
Next steps
- Add sanitized workflow diagrams
- Add a dummy Kobo/XLSForm example
- Add a small PostgreSQL-backed research data demo
- Add a FAIR metadata checklist
- Add a lab data lifecycle diagram
Stack
- KoboToolbox
- ODK
- PostgreSQL
- CKAN
- Python
- FAIR workflows