Source: UNIVERSITY OF VIRGINIA submitted to NRP
FACT: NETWORK MODELS OF FOOD SYSTEMS AND THEIR APPLICATION TO INVASIVE SPECIES SPREAD
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
ACTIVE
Funding Source
Reporting Frequency
Annual
Accession No.
1019766
Grant No.
2019-67021-29933
Cumulative Award Amt.
$499,952.00
Proposal No.
2018-09158
Multistate No.
(N/A)
Project Start Date
Sep 1, 2019
Project End Date
Aug 31, 2025
Grant Year
2019
Program Code
[A1541]- Food and Agriculture Cyberinformatics and Tools
Recipient Organization
UNIVERSITY OF VIRGINIA
(N/A)
CHARLOTTESVILLE,VA 22901
Performing Department
Biocomplexity Institute
Non Technical Summary
Agricultural commodity flow networks are a critical component of modern food systems. They also serve as conduits for pest, pathogen and contaminant dispersal. Understanding these food flows and their role in invasive species spread is essential for food security and for preserving biodiversity, health and economic stability. This project seeks to develop (i) novel network representations and analytics to understand domestic agricultural commodity flows in the United States, and (ii) pest spread and impact models that account for natural and human-mediated pathways of spread. We apply our models to the study of Tuta absoluta, a devastating pest of the tomato crop.
The project will employ state-of-the-art statistical and machine-learning techniques for data integration and network construction. We will develop methods for structural and dynamical analysis of these networks in the novel context of directed and time-varying networks. Agent-based epidemiological models from the infectious disease literature will be adapted for the pest spread model, with implementation of various types of interventions. Partial equilibrium models will be used for economic impact analysis.
The project will contribute novel network-based approaches for data integration, data analytics and computational modeling. In the context of invasive species, the developed tools will provide policy makers with guidance and support to identify vulnerabilities in the food system, inform monitoring efforts and assess various intervention strategies. These analyses will be particularly valuable and timely in addressing the imminent threat of T. absoluta invasion. The project will nurture graduate, undergraduate and K-12 programs through interdisciplinary research and team science.
Animal Health Component
30%
Research Effort Categories
Basic
70%
Applied
30%
Developmental
0%
Classification

Knowledge Area (KA)    Subject of Investigation (SOI)    Field of Science (FOS)    Percent
136                    1469                              2080                      25%
609                    7410                              2090                      25%
216                    7410                              1170                      25%
211                    1469                              1170                      25%
Goals / Objectives
Goal 1: Data exploration, curation and preliminary analysis
We will analyze and incorporate datasets from multiple domains, including but not limited to commodity flows, production, demographics, operations, pest interception and climate. Each dataset, depending on its type, will be standardized and stored along with metadata in a database. We will also store metadata on reports and research articles containing relevant qualitative data. We will analyze these datasets individually to identify any inconsistencies and anomalies.
Goal 2: Modeling and analysis of commodity flows
We will develop methods to construct high-resolution attributed network representations of commodity-specific domestic flows for the US market. Node attributes include production/consumption at the location, GDP, population and type of operation, to name a few.
b. Network construction
Here, the objective is to create spatiotemporal commodity flow networks. We will explore methods to construct networks at multiple scales along three axes: (i) spatial: state, FAF region, district (resolution from low to high); (ii) time: annual, seasonal, monthly; and (iii) commodity: vegetables, specific vegetable. The specific vegetables we will consider are the primary host crops of T. absoluta. The challenges are expected to grow as resolution increases. Broadly, the process involves two steps: (a) estimation of node attribute values, and (b) estimation of flows.
c. Structural and dynamical network analysis
Given the constructed networks, we will study their local properties such as total outflow, total inflow, degree, clustering coefficients and betweenness centralities, to name a few. We will also develop algorithms to discover higher-order structures such as clusters (well-connected subgraphs that are relatively less connected with the rest of the graph) and network motifs. In particular, we will develop methods to discover properties that are important for spread dynamics, i.e., properties that are important with respect to propagation models (examples include network reliability and spectral properties).
Goal 3: Multi-pathway models for invasive species spread
We will develop a multi-pathway modeling framework that couples the above commodity flow models with ecological and bioeconomic models to assess the spread and impact of non-indigenous invasive species.
a. Model design and implementation
We will develop a stochastic network-based propagation model to simulate the multi-pathway spread of T. absoluta. Short-distance dispersal captures the spread through natural means. Long-distance human-mediated dispersal corresponds to spread through trade between localities (large urban areas with high production and/or consumption).
b. Analyze spread under various hypothetical scenarios
We will apply the multi-pathway model to analyze the possible spread patterns under various scenarios of introduction, monitoring and control strategies. A key subtask here will be to analyze the PestID database and identify locations that have historically been susceptible to pest introductions. This analysis will provide candidate routes of entry. Through simulations with starting conditions informed by this study, we will analyze the patterns and timeline of spread. This analysis will help identify high-risk locations and hubs that facilitate large-scale dispersal.
c. Economic impact assessment
Assessing the economic impact of an invasion requires integrating information on: 1) the pest's biology, ecology and the damage it causes; 2) its entry; 3) establishment; 4) spread; 5) valuation of assets at risk; and 6) market consequences. To compute the impact, we will explore two different measures: 1) the direct impact and 2) the total impact. The direct impact measures the direct revenue loss from the crop as the sum of the losses incurred by each county or other administrative/geographic unit considered. Each unit's loss in turn depends on the proportional loss in the affected area, the yield per unit of land in the district before being affected, the tomato production area in the district, and the proportion of area affected by the pest, which is informed by the spread model.
Goal 4: Uncertainty quantification and sensitivity analysis
Given that the models are complex and data for validation are generally inadequate (both for network construction and for the multi-pathway model), uncertainty quantification and sensitivity analysis are crucial to evaluate the robustness of the models and the credibility of the results. For the commodity flow networks, we will consider parameter variability (resulting from approximating flows), structural uncertainty (resulting from differences in the inferred edge sets) and uncertainty in the functional relationship between flows and node and edge attributes. The key question is how this uncertainty affects the structural and dynamical properties of the network. For example, how do centrality measures of vertices vary with such perturbations, and how does the cluster decomposition change? For the multi-pathway model, machine learning and Gaussian process surrogates will be used to perform sensitivity analysis and assess parameter importance.
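The direct impact measure described under Goal 3(c) can be illustrated with a minimal sketch. The class name, field names, the single market price and all numbers below are assumptions made for illustration; the report lists the four per-district factors but does not give a formula or data.

```python
from dataclasses import dataclass

@dataclass
class District:
    name: str
    tomato_area_ha: float     # tomato production area in the district
    yield_t_per_ha: float     # yield per unit of land before infestation
    affected_fraction: float  # share of the area affected (would come from the spread model)
    loss_fraction: float      # proportional loss within the affected area

def direct_impact(districts, price_per_tonne):
    """Sum the revenue lost in each district (price is an assumed input)."""
    total = 0.0
    for d in districts:
        lost_tonnes = (d.tomato_area_ha * d.affected_fraction
                       * d.yield_t_per_ha * d.loss_fraction)
        total += lost_tonnes * price_per_tonne
    return total

# Hypothetical districts: only district "A" is affected.
districts = [
    District("A", tomato_area_ha=1000.0, yield_t_per_ha=80.0,
             affected_fraction=0.25, loss_fraction=0.5),
    District("B", tomato_area_ha=400.0, yield_t_per_ha=50.0,
             affected_fraction=0.0, loss_fraction=0.5),
]
loss = direct_impact(districts, price_per_tonne=100.0)
```

The total impact measure would additionally need a market model (the partial equilibrium approach), which this sketch does not attempt.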
Project Methods
Scientific Methods
We will use spatial-interaction models such as gravity models and radiation models to estimate the commodity flows. Machine learning techniques such as regression, decision trees and deep learning will be used to learn the functional relationships between the flows and their drivers. For the structural analysis of the networks, we will use various betweenness centrality algorithms, spectral methods, dense subgraph and network motif discovery algorithms, and community detection algorithms. These will be particularly aimed at directed weighted networks. For dynamical analysis, network epidemiological models will be designed incorporating the commodity flow networks. Simulations will be run on the networks to study hypothetical invasion and intervention scenarios and to assess the importance of nodes, vulnerable regions, hubs of spread, etc. Surrogate models will be used for parameter space exploration and sensitivity analysis. To assess total economic impact, we will use the partial equilibrium approach, which accounts for the dynamics of the market, efficacy of interventions, etc. Given that the models are complex and data for validation are generally inadequate (both for network construction and for the multi-pathway model), uncertainty quantification and sensitivity analysis are crucial to evaluate the robustness of the models and the credibility of the results.
Efforts
Results of the project will be presented at conferences and workshops. Leveraging other projects in which the PI and co-PIs participate, we will host workshops and conduct courses and webinars.
Evaluation
This is organized by goals and objectives.
1(a) Data exploration, curation and preliminary analysis
When: Years 1 and 2
Milestones: (i) Relevant datasets identified and collected for network construction, modeling and impact analysis. (ii) They will be analyzed individually and in combination to assess their utility and role in the commodity flow and multipathway models.
Measures of success: (i) Datasets made available in a central database or repository. Each dataset, depending on its type, standardized and stored along with metadata. (ii) Documentation of datasets, processing steps and how we plan to incorporate them into the commodity flow networks and pathway models. (iii) Code for preprocessing and analyzing data will be verified (using test cases) and documented.
1(b) Network construction
When: Years 1 and 2
Milestones: (i) Seasonal trade networks of solanaceous crops at various spatial scales will be generated. (ii) Primary drivers of trade will be determined.
Measures of success: (i) Validation of network structure and volume using sample flow data, production and consumption information. (ii) Networks will be made available in a central database or repository. (iii) Documentation of the network construction pipeline. (iv) Code will be verified and documented.
1(c) Structural and dynamical network analysis
When: Years 1 and 2
Milestones: (i) Identification of important nodes, edges, network motifs and communities in the networks. (ii) Uncertainty quantification and sensitivity analysis will be performed.
Measures of success: (i) Network measures such as betweenness centrality, spectral methods, network reliability and hyperbolicity will be applied. The results will be compared with previous works on trade network analytics. (ii) Methods and results will be documented, published and presented. (iii) Code will be verified and documented.
2(a) Model design and implementation
When: Years 2, 3 and 4
Milestones: (i) Verified and validated multipathway model. (ii) Emergent properties of the model will be analyzed through HPC simulations and machine learning surrogates. Collective roles of different pathways will be determined.
Measures of success: (i) For validation of the model, incidence reports from some T. absoluta infested regions will be collected. The model will be calibrated and validated against these datasets. Transfer learning techniques will be used to apply this validated model to the North American setting. (ii) Rigorous sensitivity analysis will be conducted. (iii) Methods and results will be documented. (iv) Code will be verified and documented. (v) Project towards thesis.
2(b) Analyze spread under various hypothetical scenarios
When: Years 3 and 4
Milestones: (i) Identification of potential routes of introduction and possible response scenarios due to hypothetical invasion. (ii) Analysis of counterfactual scenarios.
Measures of success: (i) Findings will be compared with invasion records from Europe and other regions. (ii) Methods and results will be documented, published and presented. (iii) Code will be verified and documented. (iv) Projects towards student theses.
3(c) Economic impact assessment
When: Years 3 and 4
Milestones: (i) Analysis of production, pricing and loss data of commodities, and intervention costs. (ii) Design and implementation of the economic surplus model. (iii) Analysis of counterfactual scenarios.
Measures of success: (i) Findings will be compared with invasion records from Europe and other regions. (ii) Methods and results will be documented, published and presented. (iii) Code will be verified and documented. (iv) Projects towards student theses.
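As an illustration of the spatial-interaction models named under Scientific Methods, the following is a minimal sketch of a production-constrained gravity model. The function name, the distance exponent `gamma`, and the toy sources, terminals and distances are assumptions for illustration; the report does not specify which gravity formulation or parameters the project uses.

```python
def gravity_flows(production, demand, distance, gamma=2.0):
    """Split each source's production across destinations in proportion to
    demand[dst] * distance[(src, dst)] ** -gamma (production-constrained form)."""
    flows = {}
    for src, supply in production.items():
        # Attractiveness of each destination as seen from this source.
        weights = {dst: demand[dst] * distance[(src, dst)] ** -gamma
                   for dst in demand}
        total = sum(weights.values())
        flows[src] = {dst: supply * w / total for dst, w in weights.items()}
    return flows

# Hypothetical example: two producing regions, two terminal markets.
production = {"S1": 100.0, "S2": 50.0}
demand = {"T1": 10.0, "T2": 10.0}
distance = {("S1", "T1"): 100.0, ("S1", "T2"): 200.0,
            ("S2", "T1"): 300.0, ("S2", "T2"): 150.0}
flows = gravity_flows(production, demand, distance)
```

By construction, the outflows of each source add up to its production; with equal demand, the nearer terminal receives the larger share. In practice the exponent and any additional drivers would be fitted to observed flows, which is where the regression and machine learning methods mentioned above would come in.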

Progress 09/01/23 to 08/31/24

Outputs
Target Audience: Goal 1: Data exploration, curation and preliminary analysis
Detailed analyses of AMS Movement and AMS Terminal Market data were conducted with the objective of constructing realistic commodity flow networks for tomatoes. The Terminal Market data contain daily/weekly tomato price and origin information for select wholesale terminal markets. These were analyzed in detail with respect to Mature Greens, Immature Greens, Vine Ripe, cherry tomatoes, and plum-type tomatoes. Movement Report data from USDA AMS contain weekly shipments by weight and origin. The origins include California Central, Alabama, Arkansas, Florida, Michigan, North Carolina, South Carolina, Tennessee, and Virginia. The data also include external origins in the form of imports from Canada, the Dominican Republic, Guatemala, Honduras, and Mexico. Data were collected for 2021 to 2024, restricted to the required columns, and standardized. Data from AgCensus and various remote sensing-based datasets were explored to map farms and other agricultural infrastructure.
Goal 2: Modeling and analysis of commodity flows
The two datasets were linked by connecting the sources from the Terminal Market data with the regions from the Movement data. In some cases, two sources in the former had to be mapped to one region in the latter. These decisions were based on geographic proximity. While the Terminal Market data provide edge attributes, the Movement data provide node attributes. Given a terminal node, a source node, their features (such as population, production, and consumer price index), time, and source-terminal attributes such as distance, the objective is to determine whether there is commodity flow from source to terminal. Commodity flow information is available for only a subset of terminals. Hence, this is a semi-supervised learning formulation in which the objective is to predict the flows for the unobserved terminals. First, we would like to leverage the spatial locations of the terminals and sources through a distance graph that can account for spatial correlations. The idea is that terminals that are close to each other probably source the commodity in similar ways, unlike those that are far apart. Similar relations hold for sources as well. Second, we would like to leverage seasonal or temporal information, as the flow of agricultural commodities is driven by their production, which depends on the time of year as well as the location. The approach to constructing this network is a product-graph-based graph neural network. Multiple semi-supervised methods are being applied as baselines for comparison with the proposed approach. Remote-sensing and deep learning methods were applied to map agricultural infrastructure. These methods and the data products will be used to accurately estimate tomato and other vegetable production in the network models.
Goal 3(b): Analyze spread under various hypothetical scenarios
The work from 2021 on interventions was finalized and submitted to Nature Computational Science. Additional experiments were conducted regarding robustness to sampling complexity and model uncertainty. A summary of the work follows. Optimal control of spread processes over networks is a challenging problem even for simple diffusion models. Biological invasions are characterized by multiple spread pathways and time-varying network attributes. In this setting, we study the problem of region-wide interventions, where the objective is to find an optimal set of regions, represented by groups of nodes in the underlying network, to minimize the spread given budget constraints, intervention delays, and a spread scenario. We present an approach applicable to a general class of diffusion models based on integer linear programming and sample average approximation, and prove rigorous bounds on its performance. We apply this method to the spread of a representative agricultural pest. Our approach provides near-optimal solutions and consistently outperforms the baselines considered. Our results highlight the importance of scenario-specific control and suggest that early intervention yields a significant reduction in spread under low budget as well as stable solutions under model uncertainty.
Publications:
A High-Resolution, US-scale Digital Similar of Interacting Livestock, Wild Birds, and Human Ecosystems with Applications to Multi-host Epidemic Spread. Abhijin Adiga, Ayush Chopra, Mandy L Wilson, SS Ravi, Dawen Xie, Samarth Swarup, Bryan Lewis, John Barnes, Ramesh Raskar and Madhav V Marathe. Submitted to PNAS.
Scenario-specific Control of Multi-pathway Spread Processes: Application to Biological Invasions. Prathyush Sambaturu, Manisha Sudhir, Hongze Chen, Anil Vullikanti, Rangaswamy Muniappan, and Abhijin Adiga. Submitted to Nature Computational Science.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? One postdoctoral fellow, two postgraduate students, and one undergraduate were mentored. One postgraduate student worked on surveillance problems in the context of spreading processes over networks. Another postgraduate student and the postdoctoral fellow worked on the commodity flow problem and the graph neural network-based approach. The postgraduate student also assisted in experimentation for the interventions paper (coauthor of the submitted paper). The undergraduate student was tasked with analyzing various data sources (Goals 1 and 2). He also worked on developing deep learning surrogates for spread processes over networks.
How have the results been disseminated to communities of interest? Papers have been submitted for possible publication.
What do you plan to do during the next reporting period to accomplish the goals? A paper on constructing commodity flow networks will be written up and submitted. Based on reviews, work on the pending interventions paper will be completed. Datasets will be organized and made available in a data repository or on GitHub. The multipathway simulator and the networks have already been made public. This repository will be updated.

Impacts
What was accomplished under these goals?
Goal 1: Data exploration, curation and preliminary analysis
Detailed analyses of AMS Movement and AMS Terminal Market data were conducted with the objective of constructing realistic commodity flow networks for tomatoes. The Terminal Market data contain daily/weekly tomato price and origin information for select wholesale terminal markets. These were analyzed in detail with respect to Mature Greens, Immature Greens, Vine Ripe, cherry tomatoes, and plum-type tomatoes. Movement Report data from USDA AMS contain weekly shipments by weight and origin. The origins include California Central, Alabama, Arkansas, Florida, Michigan, North Carolina, South Carolina, Tennessee, and Virginia. The data also include external origins in the form of imports from Canada, the Dominican Republic, Guatemala, Honduras, and Mexico. Data were collected for 2021 to 2024, restricted to the required columns, and standardized. Data from AgCensus and various remote sensing-based datasets were explored to map farms and other agricultural infrastructure.
Goal 2: Modeling and analysis of commodity flows
The two datasets were linked by connecting the sources from the Terminal Market data with the regions from the Movement data. In some cases, two sources in the former had to be mapped to one region in the latter. These decisions were based on geographic proximity. While the Terminal Market data provide edge attributes, the Movement data provide node attributes. Given a terminal node, a source node, their features (such as population, production, and consumer price index), time, and source-terminal attributes such as distance, the objective is to determine whether there is commodity flow from source to terminal. Commodity flow information is available for only a subset of terminals. Hence, this is a semi-supervised learning formulation in which the objective is to predict the flows for the unobserved terminals. First, we would like to leverage the spatial locations of the terminals and sources through a distance graph that can account for spatial correlations. The idea is that terminals that are close to each other probably source the commodity in similar ways, unlike those that are far apart. Similar relations hold for sources as well. Second, we would like to leverage seasonal or temporal information, as the flow of agricultural commodities is driven by their production, which depends on the time of year as well as the location. The approach to constructing this network is a product-graph-based graph neural network. Multiple semi-supervised methods are being applied as baselines for comparison with the proposed approach. Remote-sensing and deep learning methods were applied to map agricultural infrastructure. These methods and the data products will be used to accurately estimate tomato and other vegetable production in the network models.
Goal 3(b): Analyze spread under various hypothetical scenarios
The work from 2021 on interventions was finalized and submitted to Nature Computational Science. Additional experiments were conducted regarding robustness to sampling complexity and model uncertainty. A summary of the work follows. Optimal control of spread processes over networks is a challenging problem even for simple diffusion models. Biological invasions are characterized by multiple spread pathways and time-varying network attributes. In this setting, we study the problem of region-wide interventions, where the objective is to find an optimal set of regions, represented by groups of nodes in the underlying network, to minimize the spread given budget constraints, intervention delays, and a spread scenario. We present an approach applicable to a general class of diffusion models based on integer linear programming and sample average approximation, and prove rigorous bounds on its performance. We apply this method to the spread of a representative agricultural pest. Our approach provides near-optimal solutions and consistently outperforms the baselines considered. Our results highlight the importance of scenario-specific control and suggest that early intervention yields a significant reduction in spread under low budget as well as stable solutions under model uncertainty.
Publications:
A High-Resolution, US-scale Digital Similar of Interacting Livestock, Wild Birds, and Human Ecosystems with Applications to Multi-host Epidemic Spread. Abhijin Adiga, Ayush Chopra, Mandy L Wilson, SS Ravi, Dawen Xie, Samarth Swarup, Bryan Lewis, John Barnes, Ramesh Raskar and Madhav V Marathe. Submitted to PNAS.
Scenario-specific Control of Multi-pathway Spread Processes: Application to Biological Invasions. Prathyush Sambaturu, Manisha Sudhir, Hongze Chen, Anil Vullikanti, Rangaswamy Muniappan, and Abhijin Adiga. Submitted to Nature Computational Science.
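The region-wide intervention problem summarized under Goal 3(b) can be illustrated on a toy instance. The sketch below applies sample average approximation (SAA): it draws a fixed set of random spread scenarios and picks the regions that minimize the average spread over those samples. Three things are assumptions made for illustration and not taken from the submitted paper: the spread model (an independent-cascade process with per-edge probabilities), the budget (a count of regions), and the optimizer (brute-force enumeration standing in for the integer linear program). Intervention delays are not modeled, and all names and numbers are hypothetical.

```python
import itertools
import random

def sample_live_edges(edges, rng):
    """One scenario: keep each directed edge with its transmission probability."""
    return [(u, v) for u, v, p in edges if rng.random() < p]

def spread_size(live_edges, seeds, treated):
    """Count nodes reached from the seeds when treated nodes cannot be infected."""
    adj = {}
    for u, v in live_edges:
        adj.setdefault(u, []).append(v)
    frontier = [s for s in seeds if s not in treated]
    reached = set(frontier)
    while frontier:
        u = frontier.pop()
        for v in adj.get(u, ()):
            if v not in reached and v not in treated:
                reached.add(v)
                frontier.append(v)
    return len(reached)

def saa_select_regions(edges, regions, seeds, budget, n_samples=200, seed=0):
    """Choose at most `budget` regions minimizing the sample-average spread."""
    rng = random.Random(seed)
    # The same sampled scenarios are reused for every candidate (the SAA step).
    scenarios = [sample_live_edges(edges, rng) for _ in range(n_samples)]

    def average_spread(chosen):
        treated = set().union(*(regions[r] for r in chosen))
        return sum(spread_size(s, seeds, treated) for s in scenarios) / n_samples

    baseline = average_spread(())
    best_choice, best_value = (), baseline
    for k in range(1, budget + 1):
        # Brute force over region subsets; an ILP would replace this at scale.
        for chosen in itertools.combinations(sorted(regions), k):
            value = average_spread(chosen)
            if value < best_value:
                best_choice, best_value = chosen, value
    return best_choice, best_value, baseline

# Hypothetical network: a port of entry feeding three regions.
edges = [("port", "a1", 0.9), ("port", "b1", 0.3), ("a1", "a2", 0.8),
         ("a2", "c1", 0.7), ("b1", "c1", 0.5), ("c1", "c2", 0.9)]
regions = {"A": {"a1", "a2"}, "B": {"b1"}, "C": {"c1", "c2"}}
choice, value, baseline = saa_select_regions(edges, regions, ["port"], budget=1)
```

Enumeration is only workable for a handful of regions; it is used here to keep the sketch dependency-free and to make the SAA objective explicit.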

Publications


    Progress 09/01/22 to 08/31/23

    Outputs
    Target Audience: Nothing Reported
    Changes/Problems: Nothing Reported
    What opportunities for training and professional development has the project provided? One graduate student and one undergraduate student were mentored. The graduate student worked on surveillance problems in the context of spreading processes over networks. The undergraduate student was tasked with analyzing various data sources (Goals 1 and 2). He also worked on developing deep learning surrogates for spread processes over networks.
    How have the results been disseminated to communities of interest? Nothing Reported
    What do you plan to do during the next reporting period to accomplish the goals?
    Task 2: Modeling and analysis of agricultural commodity flows. Machine learning-based methods are being applied to infer commodity flow links between locations.
    Task 3: Analyze spread under various hypothetical scenarios. The focus will be on spatial surveillance algorithms. Novel Steiner-tree-based algorithms are being developed for directed weighted temporal networks.
    Multiple publications are planned for this period:
    1. Controlling the multi-pathway spread of invasive species using multi-scale network interventions. Journal version (Manisha Sudhir's thesis work)
    2. Realistic Commodity Flow Networks to Assess Vulnerability of Food Systems. Journal version (revised version being submitted)
    3. Reconstructing an Epidemic Outbreak Using Steiner Connectivity (extension of Mishra et al. 2023 proceedings paper)

    Impacts
    What was accomplished under these goals?
    Goal 3(b): Analyze spread under various hypothetical scenarios
    Mishra et al. 2022: We develop a community detection method to understand interdependencies in a food network with respect to spread processes. The objective is to find groups of nodes in the network (such as counties or states) with strong influence within groups. This will aid in the design of effective surveillance and control methods. Community detection in networks is extensively studied from a structural perspective, but very few works characterize communities with respect to dynamics on networks. We propose a generic framework based on Moore-Shannon network reliability for defining and discovering communities with respect to a variety of dynamical processes. This approach extracts communities in directed edge-weighted networks that satisfy strong connectivity properties as well as strong mutual influence between pairs of nodes through the dynamical process. We apply this framework to global as well as national-level vegetable and cereal networks. We compare our results with a modularity-based approach, and analyze community structure across commodities, its evolution over time, and its relation to dynamical system properties.
    Mishra et al. 2023: Only a subset of infections/invasions is actually observed in an outbreak, for reasons such as under-reporting or failure to identify the invading organism. Therefore, reconstructing an epidemic cascade given some observed cases is an important step in responding to such an outbreak. A maximum likelihood solution to this problem (referred to as CASCADEMLE) can be shown to be a variation of the classical Steiner subgraph problem, which connects a subset of observed infections. In contrast to prior works on epidemic reconstruction, which consider the standard Steiner tree objective, we show that a solution to CASCADEMLE, based on the actual MLE objective, has a very different structure. We design a logarithmic approximation algorithm for CASCADEMLE, and evaluate it on multiple synthetic and social contact networks, including a contact network constructed for a hospital. Our algorithm performs significantly better than a prior baseline.
    Harrison et al. 2023: We consider the setting of cascades that result from contagion dynamics on large realistic networks. We address the question of whether the structural properties of a (partially) observed cascade can characterize the contagion scenario and identify the interventions that might be in effect. Using epidemic spread as a concrete example, we study how social interventions such as compliance with social distancing, the extent (and efficacy) of vaccination, and the transmissibility of the disease can be inferred. The techniques developed are more generally applicable to other contagions as well. Through a machine learning approach, coupled with parameter significance analysis, our experimental results show that subgraph counts of the graph induced by the cascade can be used effectively to characterize the contagion scenario even during the initial stages of the epidemic. Further, we show that our approach performs well even for partially observed cascades. These results demonstrate that cascade data collected from delimiting surveys or contact tracing can provide valuable information about the contagion scenario.
    Publications:
    Mishra, Ritwick, et al. "Community Detection Using Moore-Shannon Network Reliability: Application to Food Networks." International Conference on Complex Networks and Their Applications. Cham: Springer International Publishing, 2022.
    Mishra, Ritwick, et al. "Reconstructing an Epidemic Outbreak Using Steiner Connectivity." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 37. No. 10. 2023.
    Harrison, Galen, et al. "Identifying Complicated Contagion Scenarios from Cascade Data." Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2023.
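To give a concrete sense of the network reliability idea behind the community detection work summarized above, here is a minimal Monte Carlo sketch. Moore-Shannon reliability is the probability that a random subgraph, obtained by keeping each edge independently with some probability, has a chosen property. The property used here (the target is reachable from the source along directed edges) is an illustrative assumption, as are the function names, sample size and toy network; the cited paper may use a different property and estimator.

```python
import random

def reachable(adj, source, target):
    """Depth-first search: is target reachable from source in adj?"""
    stack, seen = [source], {source}
    while stack:
        u = stack.pop()
        if u == target:
            return True
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return False

def two_terminal_reliability(edges, source, target, n_samples=2000, seed=0):
    """Estimate P(target reachable from source) when each directed edge
    (u, v, p) is kept independently with probability p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        adj = {}
        for u, v, p in edges:
            if rng.random() < p:
                adj.setdefault(u, []).append(v)
        hits += reachable(adj, source, target)
    return hits / n_samples

# Hypothetical network: two parallel two-hop routes from s to t.
edges = [("s", "a", 0.5), ("a", "t", 0.5), ("s", "b", 0.5), ("b", "t", 0.5)]
estimate = two_terminal_reliability(edges, "s", "t")
```

For this toy network the exact value is 1 - (1 - 0.25)^2 = 0.4375, so the estimate can be checked by hand. Evaluating the same quantity in both directions for a pair of nodes gives one simple notion of the mutual influence described above.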

    Publications

    • Type: Conference Papers and Presentations Status: Accepted Year Published: 2022 Citation: Mishra, Ritwick, et al. "Community Detection Using Moore-Shannon Network Reliability: Application to Food Networks." International Conference on Complex Networks and Their Applications. Cham: Springer International Publishing, 2022.
    • Type: Conference Papers and Presentations Status: Accepted Year Published: 2023 Citation: Mishra, Ritwick, et al. "Reconstructing an Epidemic Outbreak Using Steiner Connectivity." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 37. No. 10. 2023.
    • Type: Conference Papers and Presentations Status: Accepted Year Published: 2023 Citation: Harrison, Galen, et al. "Identifying Complicated Contagion Scenarios from Cascade Data." Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2023.


    Progress 09/01/21 to 08/31/22

    Outputs
    Target Audience: Nothing Reported
    Changes/Problems: Nothing Reported
    What opportunities for training and professional development has the project provided? Two undergraduate students, one graduate student, one postdoc, and one high school student were mentored during this period. One undergraduate student was tasked with analyzing various data sources (Goals 1 and 2) and, in the process, was mentored primarily in machine learning techniques and network science. Another undergraduate student was tasked with network analysis. The students also presented research papers related to network diffusion processes. The graduate student has been working in two directions. The first concerns cascade reconstruction and surveillance-type problems. The second concerns dynamics-based community detection, which resulted in a conference paper and a workshop presentation (presented by the student). The high school student worked on dynamics-based community detection and helped collect data on economic impact. The postdoc contributed to the application of artificial neural networks to the problem of network construction.
    How have the results been disseminated to communities of interest? Results have been communicated through publications.
    What do you plan to do during the next reporting period to accomplish the goals? Based on our analysis, there is a need to obtain sample data on end-to-end tomato trade between locations, and current effort is focused on obtaining such data. Even information providing evidence of flow (presence or absence of a link) can be used for learning and validation of networks. Several new features are being added to the multi-pathway simulator. These include several kernels for short-distance spread, such as Gaussian, exponential, and Weibull. A location-specific population growth parameter is also being implemented. This is particularly important when simulating the conterminous USA, where the growth rates of T. absoluta can differ greatly. New surveillance and control algorithms are being developed using state-of-the-art algorithmic approaches. Economic impact studies will be conducted using the collected data, the spread model, and the partial equilibrium approach.
    Multiple publications are planned for this period:
    1. Controlling the multi-pathway spread of invasive species using multi-scale network interventions. Journal version (Manisha Sudhir's thesis work)
    2. Realistic Commodity Flow Networks to Assess Vulnerability of Food Systems. Journal version (extension of conference paper in Complex Networks and Applications)
    3. Assessing the pathways of introduction and spread of T. absoluta in North America.
    4. Reconstructing an Epidemic Outbreak Using Steiner Connectivity
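    The short-distance spread kernels named in the plan above (Gaussian, exponential, Weibull) can be illustrated with a minimal sketch. The function names, parameter names, and example distances below are assumptions made for illustration and are not taken from the project's simulator. The Weibull kernel is written in its survival-function form, which is one of several possible parameterizations.

```python
import math


def gaussian_kernel(d, sigma):
    """Unnormalized Gaussian kernel: weight decays with squared distance."""
    return math.exp(-(d * d) / (2.0 * sigma * sigma))


def exponential_kernel(d, scale):
    """Unnormalized exponential kernel: weight decays linearly in the exponent."""
    return math.exp(-d / scale)


def weibull_kernel(d, shape, scale):
    """Unnormalized Weibull-type kernel (survival-function form).

    With shape == 1 this reduces to the exponential kernel.
    """
    return math.exp(-((d / scale) ** shape))


def spread_weights(distances_km, kernel, **params):
    """Normalize kernel values over candidate destination cells so they sum to 1."""
    raw = [kernel(d, **params) for d in distances_km]
    total = sum(raw)
    return [w / total for w in raw]


# Hypothetical distances (km) from an infested cell to four candidate cells.
distances = [0.0, 25.0, 50.0, 100.0]
weights = spread_weights(distances, gaussian_kernel, sigma=30.0)
```

    Passing a different kernel to `spread_weights` (for example `exponential_kernel` with `scale=30.0`) changes how quickly the spread probability falls off with distance while leaving the rest of the code unchanged.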

    Impacts
    What was accomplished under these goals?
    Goal 1: Data exploration, curating and preliminary analysis. More datasets will be explored. These include at least the following sources: (i) the USDA APHIS Pest interception database, to which we recently gained access, (ii) more data for economic impact analysis from ERS, (iii) import-export data, and (iv) operations data from the industry collaborator.
    Goal 2: Modeling and analysis of commodity flows. We studied community structures in commodity flow networks. Very few works characterize communities with respect to diffusion processes on networks. Understanding the community structure of international food networks, particularly in the context of spread processes representing cascading failures and biological invasions, can help inform surveillance and control strategies. We propose a generic framework based on Moore-Shannon network reliability for defining and discovering communities with respect to a variety of dynamical processes. This approach extracts communities in directed edge-weighted networks that satisfy strong connectivity properties as well as strong mutual influence between pairs of nodes through the dynamical process. We apply this framework to country-to-country networks from FAO and US domestic flow networks from FAF. We compare our results with a modularity-based approach, and analyze community structure across commodities, its evolution over time, and its relation to dynamical system properties. We have also considered motif counting in temporal networks to analyze the vulnerability in commodity flow networks arising from seasonality in production. Motifs are small template subgraphs in the network with special properties. The number of motifs of a particular kind can reveal structural anomalies that could indicate vulnerability to invasion or rapid spread.
    Goal 3: Multi-pathway models for invasive species spread. The network-based modeling and simulation approaches applied to study biological invasions tend to be largely domain-specific, lack graph-theoretic formalisms, and do not take advantage of more recent developments in network science. To rigorously understand these processes, we developed a generic multi-scale spatial network framework that is applicable to a wide range of models developed in the literature on biological invasions. A key question we address is the following: how do individual pathways and their combinations influence the rate and pattern of spread? The analytical complexity arises more from the multi-scale nature and complex functional components of the networks than from their sizes. We present theoretical bounds on the spectral radius and the diameter of multi-scale networks. These two structural graph parameters have established connections to diffusion processes. Specifically, we study how network properties, such as spectral radius and diameter, are influenced by model parameters. Further, we analyze a multi-pathway diffusion model from the literature by conducting simulations on synthetic and real-world networks, and then use regression tree analysis to identify the important network and diffusion model parameters that influence the dynamics.
    Publications:
    1. Mishra et al., Community Detection using Moore-Shannon Network Reliability: Application to Food Networks, International Conference on Complex Networks and Their Applications 2022.
    2. Mishra et al., Communities in Directed Weighted Food Networks using Moore-Shannon Network Reliability, Workshop "Communities in networks" in NetSci 2022.
    3. Adiga et al., Network Models and Simulation Analytics for Multi-scale Dynamics of Biological Invasions, Frontiers in Big Data.
    4. Adiga et al., Realistic Commodity Flow Networks to Assess Vulnerability of Food Systems, International Conference on Complex Networks and Their Applications 2021.
    5. Manisha Sudhir. Controlling Diffusion on Multi-Pathway Spatial Networks: Application to Biological Invasions, Masters thesis, UVA.
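    As a small illustration of the two structural parameters discussed under Goal 3, the sketch below computes the spectral radius and the diameter of a toy network before and after a long-distance pathway is added to a short-distance one. The six-cell layout and the single trade link are hypothetical, and the code is not the project's framework; it only shows how combining pathways can change both quantities.

```python
from collections import deque

import numpy as np


def spectral_radius(adj):
    """Largest absolute eigenvalue of the adjacency matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(adj))))


def diameter(adj):
    """Longest shortest-path length (in hops) over all ordered node pairs.

    Returns infinity if some node cannot reach some other node.
    """
    n = adj.shape[0]
    longest = 0
    for source in range(n):
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            u = queue.popleft()
            for v in np.flatnonzero(adj[u]):
                v = int(v)
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < n:
            return float("inf")
        longest = max(longest, max(dist.values()))
    return longest


n = 6

# Short-distance pathway: each cell is linked to its neighbor along a line.
short = np.zeros((n, n))
for i in range(n - 1):
    short[i, i + 1] = short[i + 1, i] = 1.0

# Long-distance pathway: one hypothetical trade link between the two end cells.
long_dist = np.zeros((n, n))
long_dist[0, 5] = long_dist[5, 0] = 1.0

# Multi-pathway network: an edge exists if either pathway provides it.
combined = np.maximum(short, long_dist)

print("short only:", spectral_radius(short), diameter(short))
print("combined:  ", spectral_radius(combined), diameter(combined))
```

    In this toy case the single long-distance link turns the line into a ring, which shortens the diameter and raises the spectral radius.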

    Publications

    • Type: Journal Articles Status: Accepted Year Published: 2022 Citation: Adiga, A., Palmer, N., Baek, Y. Y., Mortveit, H., & Ravi, S. S. (2022). Network Models and Simulation Analytics for Multi-scale Dynamics of Biological Invasions. Frontiers in Big Data, 5.
    • Type: Conference Papers and Presentations Status: Accepted Year Published: 2021 Citation: Adiga, A., Palmer, N., Sinha, S., Waghalter, P., Dave, A., Lazarte, D. P., ... & Marathe, M. (2021, November). Realistic Commodity Flow Networks to Assess Vulnerability of Food Systems. In International Conference on Complex Networks and Their Applications (pp. 168-179). Springer, Cham.
    • Type: Conference Papers and Presentations Status: Accepted Year Published: 2022 Citation: Mishra et al., Community Detection using Moore-Shannon Network Reliability: Application to Food Networks, International Conference on Complex Networks and Their Applications 2022.
    • Type: Conference Papers and Presentations Status: Accepted Year Published: 2022 Citation: Mishra et al., Communities in Directed Weighted Food Networks using Moore-Shannon Network Reliability, Workshop "Communities in networks" in NetSci 2022.
    • Type: Theses/Dissertations Status: Accepted Year Published: 2021 Citation: Manisha Sudhir. Controlling Diffusion on Multi-Pathway Spatial Networks: Application to Biological Invasions, Masters thesis, UVA.
    • Type: Theses/Dissertations Status: Accepted Year Published: 2022 Citation: Prathyush Sambaturu. Controlling Epidemics on Networks Using Stochastic Optimization Techniques, PhD thesis, UVA.


    Progress 09/01/20 to 08/31/21

    Outputs
    Target Audience: Nothing Reported
    Changes/Problems: Nothing Reported
    What opportunities for training and professional development has the project provided? Three graduate students and one postdoc were mentored during this period. Two graduate students were tasked with analyzing various data sources (Goals 1 and 2). In this process the students were primarily mentored in machine learning techniques and network science. They were also tasked with presenting research papers related to network diffusion processes. As part of her thesis work, the third graduate student worked on developing intervention algorithms. To this end, she completed an efficient implementation of the multi-pathway simulator using vectorized programming in pandas. She assisted in the design and development of various intervention schemes, helped develop advanced optimization algorithms for interventions, and conducted extensive experiments on real-world networks to evaluate the algorithms. The thesis has been defended, and a research paper is under review. The postdoc continues to explore the possibility of applying artificial neural networks to the problem of network construction (Goal 1). Data and methodological challenges have been identified.
    How have the results been disseminated to communities of interest?
    Adiga et al., Realistic Commodity Flow Networks to Assess Vulnerability of Food Systems. Submitted to Complex Networks 2021.
    Sambaturu et al., Controlling Diffusion on Multi-Pathway Spatial Networks: Application to Biological Invasions, under review.
    Eubank et al., Perturbative methods for monotonic probabilistic satisfiability problems, under preparation.
    What do you plan to do during the next reporting period to accomplish the goals?
    Goal 1: Data exploration, curating and preliminary analysis. More datasets will be explored. These include at least the following sources: (i) the USDA APHIS Pest interception database, to which we recently gained access, (ii) more data for economic impact analysis from ERS, (iii) import-export data, and (iv) operations data from the industry collaborator.
    Goal 2: Modeling and analysis of commodity flows. We will continue to work on modeling commodity flows from FAF networks. The main challenge is estimating commodity-specific flows from aggregate vegetable flows. Developing new techniques for analyzing temporal weighted directed networks is critical to the analysis. Measures such as eigenvector centrality and betweenness centrality are well understood for simple unweighted networks. We are developing algorithms that adapt such measures to food flow networks.
    Goal 3: Multi-pathway models for invasive species spread. We initiated the economic impact assessment with a number of meetings with collaborators. Preliminary datasets have been identified. These will be applied to partial equilibrium models.
    Goal 4: Uncertainty quantification and sensitivity analysis. All the methods developed under the three goals will be rigorously analyzed with regard to robustness. We will analyze the sensitivity of the modeled networks to functional and structural perturbations. Under these perturbations, we will also analyze how forecasts and outcomes of intervention algorithms change.
    Multiple publications are planned for this period:
    1. Network Models and Simulation Analytics for Multi-scale Dynamics of Biological Invasions, Frontiers in Big Data
    2. Controlling the multi-pathway spread of invasive species using multi-scale network interventions. Journal version (conference version under review)
    3. Dynamics-based clustering and its application to food flows.
    4. Realistic Commodity Flow Networks to Assess Vulnerability of Food Systems. Journal version (conference version under review)

    Impacts
    What was accomplished under these goals?
    Goal 1: Data exploration, curating and preliminary analysis. In the previous period, datasets corresponding to the Freight Analysis Framework (FAF), vegetable production from CROPSCAPE, and the trade matrix from FAO were downloaded, standardized, and stored in a PostgreSQL database. Since the data is from multiple sources, we had to standardize the datasets in order to analyze them in combination. In this period, for temporal disaggregation, tomato growing zones obtained from https://www.tomatofest.com/Tomato_Growing_Zone_Maps_s/164.htm were digitized by manually assigning counties to their corresponding zones and mapping growing periods to months. This was stored in CSV and JSON formats. Data for economic impact analysis, including state-level production quantities, yield, and supply and demand elasticities, have been collected. However, there is considerable misalignment in temporal and spatial resolution. We are investigating methods to remedy this problem.
    Goal 2: Modeling and analysis of commodity flows. We have developed a general framework for constructing the spatiotemporal representation of production, flow, and consumption of agricultural commodities. These data representations are derived by fusing gridded, administrative-level, and survey datasets on production, trade, and consumption. Further, they are periodic temporal networks reflecting seasonal variations in production and trade of the crop. We apply this approach to create networks of tomato flow for three regions: the conterminous United States, Senegal, and Nepal. Using statistical methods and network analysis, we gain insights into the spatiotemporal dynamics of production and trade. Our results suggest that agricultural systems are increasingly vulnerable to attacks through trade of commodities due to their vicinity to regions of high demand and seasonal variations in production and flows. For the US, we have used data from CROPSCAPE and FAF in this framework to estimate the monthly flows of specific commodities from annual commodity-aggregated flows. In general, spatiotemporal disaggregation of FAF flows is a hard problem. In our case, it is made harder by the fact that we want to infer commodity-specific flows (e.g., tomato) from FAF networks. Here, the tomato flow is a component of the flow corresponding to SCTG=3 (other agricultural crops). In the current setup, we have come up with a preliminary model to construct such flows.
    Goal 3: Multi-pathway models for invasive species spread and interventions. More functional relationships have been added to the multi-pathway model developed in our previous work [McNitt et al. 2019]. We have added the functionality of radial spread based on Haversine distance. Interventions at the node level as well as the locality level have been implemented. In addition, using vectorized operations in the Python programming language, we have made the simulator faster and therefore scalable to larger networks. Our current implementation is 10-50 times faster than the earlier version. We developed a multi-scale intervention framework with the objective of selecting a few locations to set up traps or apply interventions such as pesticides in order to delay or stifle the spread of the pest in the event of its introduction. Optimal control of epidemics is a challenging problem even for simple diffusion processes over static networks. We developed our algorithm for a multi-scale epidemiological process on a temporal network in the context of invasive species spread across a landscape. In this setting, we study the problem of group-scale interventions, where the objective is to find an optimal set of regions, represented by groups of nodes, to minimize the spread under budget constraints and intervention delays. We present an integer linear programming based algorithm for finding effective group-scale interventions and prove rigorous bounds on its performance. We experimentally evaluate it on several real-world networks constructed in McNitt et al. with respect to budget, model uncertainties, introduction scenarios, intervention delays, and rounding schemes. The algorithm provides near-optimal solutions and outperforms the baselines considered. Also, the performance of group-scale control compares well with the superior but impractical node-scale version of the algorithm. Further, we analyze our solutions for various seeding scenarios and model parameters. Our results indicate that early intervention yields a significant reduction in spread at low budget and stable solutions under model uncertainty.
    McNitt, J., Chungbaek, Y. Y., Mortveit, H., Marathe, M., Campos, M. R., Desneux, N., ... & Adiga, A. (2019). Assessing the multi-pathway threat from an invasive agricultural pest: Tuta absoluta in Asia. Proceedings of the Royal Society B, 286(1913), 20191159.
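    The radial spread based on Haversine distance mentioned under Goal 3 can be sketched as follows. The Haversine formula itself is standard. The helper `cells_within_radius`, the cell identifiers, and the coordinates are hypothetical and only illustrate how great-circle distance could be used to select locations within a spread radius; they do not describe the simulator's actual interface.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2.0) ** 2)
    # Clamp to guard against floating-point values slightly above 1.
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def cells_within_radius(origin, cells, radius_km):
    """Return sorted ids of cells whose centroid lies within radius_km of origin.

    `origin` is a (lat, lon) pair; `cells` maps a cell id to a (lat, lon) pair.
    """
    lat0, lon0 = origin
    return sorted(
        cell_id
        for cell_id, (lat, lon) in cells.items()
        if haversine_km(lat0, lon0, lat, lon) <= radius_km
    )


# Hypothetical cell centroids around an infested location at (0, 0).
cells = {"a": (0.0, 0.1), "b": (0.0, 1.0), "c": (1.0, 1.0)}
nearby = cells_within_radius((0.0, 0.0), cells, radius_km=50.0)
```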

    Publications


      Progress 09/01/19 to 08/31/20

      Outputs
      Target Audience: Nothing Reported
      Changes/Problems: Disruptions due to COVID-19:
      1. Personnel effort: Our team has been involved in COVID modeling work since March 2020 to support policy making for the state and the university. As a result, many resources had to be redirected, and other projects such as this one have not received as much attention as planned at the beginning of this period.
      2. Student hiring and mentorship: It has been difficult to hire and remotely mentor students.
      What opportunities for training and professional development has the project provided? Two graduate students and one postdoc were mentored during this period. One graduate student was tasked with downloading and analyzing various data sources (Goals 1 and 2). In this process the student was primarily mentored in database management systems, GIS, and network analytics, and developed expertise in the Python programming language and PostgreSQL. The student was also tasked with presenting research papers related to network construction and related problems in linear and non-linear optimization. The second graduate student has been tasked with developing the invasive species spread simulation model, which is being implemented in the Python programming language (Goals 2 and 3). The main objectives are two-fold: (i) extend a previously implemented model to make it more general and efficient, and (ii) enable integration with the intervention algorithms framework when it is developed at a later stage in the project. The student is being mentored in vectorized operations with the Python packages NumPy and pandas. The student is also developing advanced optimization algorithms for monitoring and interventions, and is being mentored in network dynamics and linear programming to this end. The postdoc is exploring the possibility of applying artificial neural networks to the problem of network construction (Goal 1).
      How have the results been disseminated to communities of interest? Nothing Reported
      What do you plan to do during the next reporting period to accomplish the goals?
      Goal 1: Data exploration, curation and preliminary analysis. More datasets will be explored. These include at least the following sources: (i) the USDA APHIS Pest interception database, to which we recently gained access, and (ii) data for economic impact analysis from ERS.
      Goal 2: Modeling and analysis of commodity flows. We will continue to work on modeling commodity flows from FAF networks. The main challenges are (i) estimating commodity-specific flows from aggregate vegetable flows, and (ii) estimating time-varying networks representing seasonal commodity flows from annual production and trade data. Based on our literature survey and to the best of our knowledge, no prior work addresses these problems. However, there are several works on disaggregating available commodity flows to finer-resolution spatial networks.
      Goal 3: Multi-pathway models for invasive species spread. The ongoing work to make the multi-pathway model more efficient and more general will be completed (Goal 3a). The major work during this period will be the design of network-based monitoring and control algorithms, which has already been initiated (Goal 3b). We will also initiate the economic impact assessment (Goal 3c).
      Goal 4: Uncertainty quantification and sensitivity analysis. All the methods developed under the three goals will be rigorously analyzed with regard to robustness. We will analyze the sensitivity of the modeled networks to functional and structural perturbations. Under these perturbations, we will also analyze how forecasts and outcomes of intervention algorithms change.
      At least two publications are planned for this period:
      1. Analysis of food flows using network reliability.
      2. Controlling the multi-pathway spread of invasive species using multi-scale network interventions.

      Impacts
      What was accomplished under these goals?
      Goal 1: Data exploration, curation and preliminary analysis. In this period, the following datasets were downloaded and stored in a PostgreSQL database: vegetable and cereal commodity flows from the Freight Analysis Framework (FAF), vegetable production from CROPSCAPE, and the trade matrix from FAO. Since the data is from multiple sources, we had to standardize the datasets in order to analyze them in combination. Using GIS tools in the Python programming language, production data was disaggregated to 25 km x 25 km cells in preparation for use in simulations.
      Goal 2: Modeling and analysis of commodity flows. Two sets of commodity flow networks were constructed from the above-mentioned datasets: US vegetable and cereal flows from the FAF database for various years, and country-to-country vegetable flow networks from the FAO trade matrix. Structural and dynamical analyses of these networks are underway. Important nodes in the networks, such as hubs, sources, and sinks, were identified using structural measures such as indegree, outdegree, and betweenness centrality. We are developing a novel approach to identify important dynamics-induced clusters of highly connected nodes in a directed weighted network using Moore-Shannon network reliability.
      Goal 3: Multi-pathway models for invasive species spread. The multi-pathway model from our previous work [McNitt et al. 2019] is being extended. Firstly, based on feedback received, more functional relationships are being implemented. Secondly, using vectorized operations in the Python programming language, we have been making the simulator faster and therefore scalable to larger networks. This will enable us to run simulations at the region/country scale in the US. We are developing a multi-scale intervention framework with the objective of selecting a few locations to set up traps or apply interventions such as pesticides in order to delay or stifle the spread of the pest in the event of its introduction. The approach will apply agent-based models along with linear optimization techniques.
      McNitt, J., Chungbaek, Y. Y., Mortveit, H., Marathe, M., Campos, M. R., Desneux, N., ... & Adiga, A. (2019). Assessing the multi-pathway threat from an invasive agricultural pest: Tuta absoluta in Asia. Proceedings of the Royal Society B, 286(1913), 20191159.
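      The idea behind reliability-based clustering mentioned under Goal 2 can be illustrated with a small Monte Carlo sketch. In the Moore-Shannon setting, each edge is retained independently with some probability and one asks how likely the resulting random subgraph is to have a given property. The sketch below assumes, for illustration only, that each directed edge weight is a retention probability and that the property of interest is two nodes reaching each other. The function names and toy networks are hypothetical, and this is not the project's community detection algorithm.

```python
import random


def reachable(adj, source):
    """Set of nodes reachable from `source` in an adjacency dict (depth-first search)."""
    seen = {source}
    stack = [source]
    while stack:
        u = stack.pop()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def mutual_reachability(edges, u, v, samples=2000, seed=0):
    """Estimate the probability that u and v can reach each other.

    `edges` is a list of (tail, head, p) triples; in each sample every
    directed edge is kept independently with probability p.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        adj = {}
        for a, b, p in edges:
            if rng.random() < p:
                adj.setdefault(a, []).append(b)
        if v in reachable(adj, u) and u in reachable(adj, v):
            hits += 1
    return hits / samples


# Toy directed flow network A -> B -> C -> A with hypothetical probabilities.
certain_cycle = [("A", "B", 1.0), ("B", "C", 1.0), ("C", "A", 1.0)]
broken_cycle = [("A", "B", 1.0), ("B", "C", 1.0), ("C", "A", 0.0)]
noisy_pair = [("A", "B", 0.5), ("B", "A", 0.5)]

print(mutual_reachability(certain_cycle, "A", "C"))
print(mutual_reachability(broken_cycle, "A", "C"))
print(mutual_reachability(noisy_pair, "A", "B"))
```

      Pairs of nodes with a high estimated mutual reachability influence each other strongly under the assumed process, which is the kind of relationship a dynamics-induced cluster is meant to capture.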

      Publications