Data For Development


image source: http://research.kraeutli.com/

This is an older post, originally written for the International Agriculture Colloquium, a student organization at the University of Wisconsin-Madison.

I’ve been thinking a bit about data. In agricultural development, as in nearly every other field, data-driven approaches are gaining traction and becoming essential prerequisites for funding and project success at almost any scale. That said, several significant challenges accompany our growing capacity to generate and analyze data. Today I’d like to talk about three of them.

How do we catalogue data?

As we know, working in international agriculture more often than not means functioning in a complex nexus of disciplines: social sciences, global health, agricultural sciences (of which there are many), economics, policy and sustainability science, to name but a few. The scientific community is finally beginning to acknowledge the importance of synergistic approaches to problem solving, instead of or in addition to targeted interventions in a single isolated arena. However, using data to direct, course-correct and evaluate multidisciplinary projects is challenging, because it requires integrating disparate data types that often don’t play nice with each other.

This is where things get really un-sexy, if they weren’t already.

A major logistical hurdle to combining disparate data streams is the current lack of widely used, integrated metadata standards. There seem to be hundreds of metadata standards developed for various purposes, yet, at least for global food security projects, none that actually works: nobody seems able to apply them in a way that meets the needs of the broader scientific and regulatory community. Understandably, researchers who win a grant don’t want to be burdened by unnecessarily comprehensive metadata requirements. But we need to agree on a way to make geospatial, biogeochemical, genetic, meteorological and demographic data centralized, persistent, consistently tagged and easy to reference.

One interesting idea on this front is the development of knowledge bases and ontologies, which represent data as explicitly defined concepts and the relationships between them, improving searchability and utility well beyond what a traditional database offers. Programs or models called knowledge-based systems can then draw on these knowledge bases, using an inference engine to derive new conclusions and surface emergent properties of the data by teasing apart the relationships the knowledge base encodes. One of the better-known knowledge-based systems is Mycin, a rule-based expert system developed at Stanford in the 1970s to diagnose bacterial infections and recommend antibiotics. Artificial intelligence researchers are at the forefront of this movement, and food security researchers and professionals should make a pointed effort to engage with the AI community to see how we can leverage its discoveries.
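To make the knowledge-base idea a little more concrete, here is a minimal sketch in Python: facts are stored as subject-predicate-object triples, and a tiny inference engine keeps applying if-then rules until no new facts appear (forward chaining). Everything in it is hypothetical: the plot and district names, the predicates and the two rules are illustrations, not agronomic claims. Note also that Mycin itself reasoned backward from goals rather than forward from facts; forward chaining is used here only because it is shorter to show.

```python
# A toy knowledge-based system: a triple store plus a forward-chaining
# inference engine. All facts, predicates and rules are hypothetical.
from typing import Callable

Triple = tuple[str, str, str]  # (subject, predicate, object)
Rule = Callable[[set[Triple]], set[Triple]]

# Hypothetical starting facts, for illustration only.
FACTS: set[Triple] = {
    ("plot_7", "located_in", "district_A"),
    ("district_A", "rainfall_anomaly", "deficit"),
    ("plot_7", "grows", "maize"),
}


def deficit_implies_water_stress(kb: set[Triple]) -> set[Triple]:
    """Hypothetical rule: a plot in a district with a rainfall deficit is water-stressed."""
    derived = set()
    for subject, predicate, obj in kb:
        if predicate == "located_in" and (obj, "rainfall_anomaly", "deficit") in kb:
            derived.add((subject, "status", "water_stressed"))
    return derived


def stress_implies_yield_risk(kb: set[Triple]) -> set[Triple]:
    """Hypothetical rule: a water-stressed plot that grows a crop has elevated yield risk."""
    derived = set()
    for subject, predicate, _crop in kb:
        if predicate == "grows" and (subject, "status", "water_stressed") in kb:
            derived.add((subject, "yield_risk", "elevated"))
    return derived


RULES: list[Rule] = [deficit_implies_water_stress, stress_implies_yield_risk]


def forward_chain(facts: set[Triple], rules: list[Rule]) -> set[Triple]:
    """Apply every rule until a full pass adds no new triples (a fixed point)."""
    kb = set(facts)
    while True:
        new = set().union(*(rule(kb) for rule in rules)) - kb
        if not new:
            return kb
        kb |= new


if __name__ == "__main__":
    # Print only the facts the engine inferred, not the ones we entered.
    for triple in sorted(forward_chain(FACTS, RULES) - FACTS):
        print(triple)
```

Run as a script, it prints two inferred triples. The yield-risk fact only appears on the second pass, after the first rule has fired; that chaining of conclusions is what lets a knowledge-based system surface relationships nobody entered directly.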

How do we create meaning?

This is a fundamental, age-old statistical challenge: how can a researcher use the data she has to extract accurate and actionable findings? Sadly, I’m not a statistician, so I have limited insight to offer here. However, I think it’s important to remind ourselves how sensitive our conclusions are to our statistical methods, and to work toward making our findings as robust and reproducible as possible.

The complexity of agricultural development means that projects will, by design, influence many variables simultaneously. (In fact, this happens in most disciplines, but it may not be explicitly recognized.) It is a big challenge to untangle the direction of relationships between variables and to measure the effects that a project, shock or policy change might have on a community, particularly over short timescales.

As the statistician George Box quipped, “Essentially, all models are wrong, but some are useful.” There is a tradeoff between a model or framework’s complexity and its applicability. Condensing data into reliable and actionable distillates is hard.

The indicator:model interface

Building out our understanding of how agroecosystems are intertwined with social, environmental and geopolitical outcomes requires the development of integrated models. There are fantastic models with a variety of scopes already under development by groups of very smart people (see AgMIP and GEOGLAM, for example). However, I think that advancing novel cross-sectoral models will face a challenge that I’m calling the indicator:model interface.

Specifically, in an incredibly complex system or set of systems, how do you know what to measure and how to build your model?

Some relationships are intuitive (as well as comprehensively documented), e.g. how socially significant drought is driven both by rainfall deficits and by evaporative demand (potential evapotranspiration) in a given agroecological zone. However, many relationships in an integrated ag-socio-environmental model would be much less clear.
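One common way to express that drought relationship is the climatic water balance, D = P - PET (precipitation minus potential evapotranspiration, both in mm), which is the quantity the Standardized Precipitation-Evapotranspiration Index (SPEI) accumulates over a window of months before standardizing it against a long record. The sketch below computes only the unstandardized, accumulated balance. It assumes PET is already supplied (for example from a Thornthwaite or Penman-Monteith estimate) rather than computed, and the function names and all numbers are hypothetical.

```python
# Climatic water balance D = P - PET, the quantity SPEI accumulates before
# standardizing. Function names and all numbers below are hypothetical.


def water_balance(precip_mm: list[float], pet_mm: list[float]) -> list[float]:
    """Monthly climatic water balance D = P - PET, in mm (negative = deficit)."""
    if len(precip_mm) != len(pet_mm):
        raise ValueError("precipitation and PET series must have the same length")
    return [p - e for p, e in zip(precip_mm, pet_mm)]


def accumulate(values: list[float], window: int) -> list[float]:
    """Trailing sums over `window` months, e.g. window=3 for a 3-month scale."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [sum(values[i - window + 1 : i + 1]) for i in range(window - 1, len(values))]


if __name__ == "__main__":
    # Hypothetical monthly rainfall (mm) for one site, used in both scenarios.
    precip = [10, 5, 20, 60, 120, 150, 180, 160, 110, 40, 15, 8]
    pet_baseline = [90, 100, 130, 150, 140, 120, 110, 110, 110, 100, 90, 85]
    # Same rainfall, but 20% higher evaporative demand (e.g. a hotter year).
    pet_hot = [1.2 * e for e in pet_baseline]

    for label, pet in (("baseline", pet_baseline), ("hot", pet_hot)):
        d3 = accumulate(water_balance(precip, pet), window=3)
        print(label, "driest 3-month balance (mm):", round(min(d3), 1))
```

Because PET is higher in every month of the hot scenario while rainfall is identical, every monthly balance is lower and its driest 3-month window shows a deeper deficit: the same rainfall can produce a worse drought when evaporative demand rises.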

Selecting which indicators to measure (and therefore which data get collected to structure and parameterize a model) requires a priori assumptions about what the model might look like. This puts the cart before the horse a bit, and calls for an iterative process of correction to make sure that we are measuring the right things and actually discovering meaningful relationships that can be leveraged to improve lives.

The bottom line?

This stuff is complex. But we are learning from our mistakes, and we have more awareness, humility and access to powerful technological tools than ever before. If we’re careful and collaborative, we can turn these strengths into transformative improvements at home and abroad.