hypoweavr is an R package that takes hypotheses from different studies and synthesizes them through a graphical analysis.
To install hypoweavr, use remotes::install_github("elizagrames/hypoweavr"). If you don’t already have the remotes package installed, it can be installed from CRAN with install.packages("remotes").
hypoweavr is a work in progress. Please send any comments, suggestions, or better-designed logos to Eliza Grames at egrames@binghamton.edu.
Before developing a conceptual model using this approach, users should conduct a systematic review to determine which studies do and do not meet their criteria for inclusion. Although users could go through the process of extracting and synthesizing hypotheses from any set of studies, doing so with a (relatively) unbiased set of papers identified through a systematic process will help with reproducibility and rigor of the resulting model. We will not go into the details of systematic reviews here and assume that users are beginning this process with a database of studies that met inclusion criteria after full-text screening and for which full texts of articles are available to the user.
The example presented here uses articles derived from a systematic review of the literature on mechanisms underlying edge and area sensitivity in forest songbirds. In short, the aim of the review was to determine what processes researchers have studied that may explain why some forest songbird population densities are lower in small forests (area sensitivity) or near the forest edge (edge sensitivity). For example, birds nesting near the edge of a forest may experience greater predation pressure from small mammals that live in the surrounding habitat, which in turn reduces their reproductive success, leads to low interannual site fidelity, and ultimately to lower population densities. The full details of the example can be read in Grames (2021).
Grames, E.M. 2021. New methods of evidence synthesis applied to systematically identify and analyze processes underlying bird-habitat relationships. University of Connecticut. http://hdl.handle.net/11134/20002:860659970
Many studies test multiple hypotheses. Depending on the topic of the synthesis and what the conceptual model aims to represent, users may prefer to extract only the relevant hypotheses rather than every hypothesis in each primary study. Just as with defining inclusion and exclusion criteria for a systematic review, users need to define which hypotheses or types of hypotheses will be extracted. How these are defined will depend on the topic of the synthesis. For fairly narrowly defined questions or those on emerging topics with few studies, users may prefer to place no constraints and extract all hypotheses to get a sense of where the field is headed. Conversely, for very broad or well-studied topics, it may be necessary to restrict what types of hypotheses are extracted.
In our example, we placed conceptual boundaries on the implied causal pathway for hypotheses we extracted and ignored any hypotheses outside those boundaries. Because we were interested in how patch size or distance to edge affect population density, we were not interested in any processes that resulted in patches of different size (e.g. “new logging operations ➞ smaller patch sizes”) or that result from population density (e.g. “dense population ➞ heterozygosity”) because they are outside the conceptual boundary for our model. For a hypothesis to be included, it had to either i) start with patch size or distance to edge in some way that could logically lead to population density directly or indirectly (e.g. “distance to edge ➞ predation pressure” because we assume predators could affect population size), ii) end with population density and begin in some way that could logically be traced back to patch size or distance to edge (e.g. “predator density ➞ population density” because we assume logically that predators could covary with patch size), or iii) expand on potential intermediary processes (e.g. “predator density ➞ reproductive success”).
Extracting hypotheses as implied pathways is the trickiest and most subjective step of this process. The approach we present here is a graphical one, and as such, we suggest users think of implied causal pathways in terms of arrows leading from one concept to another. Here, we use ‘pathways’ to refer to any type of causal or relational link or tie between two concepts. These pathways could be thought of as having a vaguely causal or relational implication (e.g. ‘influences’, ‘affects’, ‘leads to’, ‘results in’, ‘causes’, ‘covaries with’, etc.) whose exact meaning depends on the concepts that are connected. In some cases, these pathways are somewhat phenomenological explanations of how a system operates without defining the proximate mechanisms. For example, “distance to edge affects predator density” does not specify why this might be, and there could be more specific paths with more proximate mechanisms such as “distance to edge affects microclimate which affects suitable snake habitat which affects predator density”. One way to approach thinking about this structure is to consider that each of the concepts separated by ‘affects’ in these examples could be thought of as occupying the x- and y-axis of a scatterplot or other visualization. Note: these are just random numbers to help visualize what we mean.
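As a minimal sketch of the scatterplot idea in base R, each arrow in a shortened chain "distance to edge > microclimate > predator density" can be drawn as one panel with the cause on the x-axis and the effect on the y-axis. All values are simulated, and the variable names, units, and slopes are illustrative assumptions rather than data from any study.

```r
# Simulated values only; slopes and noise levels are arbitrary assumptions
set.seed(42)
n <- 50
distance_to_edge <- runif(n, min = 0, max = 500)               # metres (hypothetical)
microclimate     <- 0.02 * distance_to_edge + rnorm(n, sd = 2) # arbitrary index
predator_density <- 12 - 0.5 * microclimate + rnorm(n, sd = 1) # arbitrary units

# One panel per arrow in the implied pathway
op <- par(mfrow = c(1, 2))
plot(distance_to_edge, microclimate,
     xlab = "distance to edge", ylab = "microclimate",
     main = "distance to edge > microclimate")
plot(microclimate, predator_density,
     xlab = "microclimate", ylab = "predator density",
     main = "microclimate > predator density")
par(op)
```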
The primary complication for extracting hypotheses from primary studies is that they are not always explicitly described, and interpreting what authors have written as implied causal pathways will be up to the discretion of the user. In our case study, we found three main ways in which authors described hypotheses that we were able to extract: 1) objectives of the study or a priori hypotheses written descriptively, 2) statistical hypotheses or models defined mathematically, and 3) data gathered or results presented. Ideally, studies will include the first type of hypotheses, which should take precedence over the other two types, which are more reliant on data than on concepts. When a priori hypotheses or study objectives are not described, mathematical descriptions are the next best option, but this can be muddled when multiple competing hypotheses are presented (e.g. multi-model inference or information theoretic approaches). Extracting hypotheses based on the variables collected or results presented should be done sparingly, as papers describing studies are subject to a number of biases (e.g. publication bias towards significant results, leading to omission of unsupported hypotheses).
Because extracting pathways depends on interpreting how authors have described primary studies, there is no single solution for how to extract implied causal pathways and each user could do it slightly differently. Indeed, the same user could extract pathways slightly differently each time they read a study; however, the main hypothesis structure should generally be the same. We present several examples for the three main ways authors describe their hypotheses based on what we found in our case study; this is by no means an exhaustive list and is meant to be illustrative to help users think of how they might extract hypotheses for their own synthesis topics. In most cases, implied hypotheses will be sprinkled throughout the introduction, methods, and results sections.
In our examples, we include a directed acyclic graph (DAG) representing the hypotheses from each study. It is not necessary to draw these for each study when implementing this approach. The only data that hypoweavr requires for the paths is to have them represented with text, e.g. “patch size + distance to edge ➞ bird abundance”.
“Our study objectives were (i) to assess the relative importance of mammalian predator abundance, invertebrate prey biomass, and nest placement on nest success for Ovenbird and Wood Thrush; (ii) to determine which habitat features and matrix land-use types were most strongly associated with the abundance of mammalian nest predators and invertebrate prey in the forest fragments; and (iii) to compare the responses of species with different nesting strategies to changes in predation pressure, prey availability, and nest placement characteristics. We constructed a number of candidate models consisting of one to two predictor variables each to assess the relative importance of mammalian nest predator abundance, invertebrate prey biomass, and nest placement on nest success in Ovenbirds and Wood Thrush. … Our study sites consisted of 12 mature, deciduous forest fragments ranging in size from 11 to 280.4 ha (mean = 46.5 ha).” (Richmond et al. 2011)
From objective (i), we know that there are three processes or factors which the authors hypothesize influence nest success, so we can write the path: “mammalian predator abundance + invertebrate prey biomass + nest placement ➞ nest success”. Because the study is comparing forest fragments of different sizes, we can also add a path based on objective (ii): “forest fragment size ➞ mammalian nest predator abundance + invertebrate prey”. We ignore objective (iii) because it is comparing these processes across species, but is not introducing new processes.
“Thus, the main goal of the current study was to simultaneously analyze the competing effects of distance from different edge types, and composition of the surrounding landscape on nest survival of several breeding bird species. … Based on previous results (Donovan et al. 1997, Suarez et al. 1997, Huhta and Jokimäki 2001, Chapa-Vargas and Robinson 2010), we predicted that an increasing proportion of natural habitats and edges would increase nest survival and decrease probabilities of parasitism, whereas increasing proportions of anthropogenic habitats and edges would decrease nest survival and increase probabilities of brood parasitism…” (Chapa-Vargas and Robinson, 2012)
Based on the study objectives, we know that the authors expect distance to edge to affect nest survival. Because they also make predictions about brood parasitism, we can infer that the main path is: “distance to edge ➞ nest survival + brood parasitism”. Reading further into the methods of the paper, we also have statistical analyses presented, which add more paths.
“…in order to assess the effects of factors related to distance from nearest natural and anthropogenic edges, habitat composition of the landscapes immediately around nest patches, temporal effects, and brood parasitism effects on survival of birds nests. … We used logistic regression to assess the effect of distance to nearest natural and anthropogenic edges, habitat composition of the landscapes immediately around nest patches, and temporal effects (nest initiation date) on brood parasitism.” (Chapa-Vargas and Robinson, 2012)
The first phrase implies the pathway: “distance to edge + brood parasitism ➞ nest survival”. We can ignore the temporal effects because time operates independently and cannot be tied back to either distance to edge or patch size which are the boundaries of our conceptual model. The second model presented implies a similar pathway: “distance to edge ➞ brood parasitism” which was already described in the main objectives.
In the results section, the authors state that “Depredation was the leading cause of nest failure” (Chapa-Vargas and Robinson, 2012). This was not explicitly mentioned as a study objective or analysis, but the authors clearly had some expectation that it may be important because they collected data on nest predation even though it was not linked to distance to edge. However, we can logically infer that it could somehow be connected by hypotheses from other studies in the final model, so we add a final pathway: “nest predation ➞ nest survival”. We replaced ‘nest failure’ with ‘nest survival’ to be consistent with the main objectives and because they are antonyms.
“After documenting lower densities of territories at forest-road edges, we present data on habitat use and reproductive success to evaluate three possible mechanisms that could produce this pattern: (1) the passive-displacement hypothesis, in which territories located adjacent to roads are limited to forested habitat such that territory centers are displaced from forest-road borders; (2) the territory-size hypothesis, whereby habitat quality is lower within edge areas, resulting in an increase in territory size that limits densities; and (3) the active-avoidance hypothesis, in which habitat quality is lower within edge areas, causing males to avoid edges and locate their territories away from roads.” (Ortega et al. 1999)
The outcome measure for all these hypotheses is ‘territory density’ because the mechanisms seek to explain it. These objectives require us to introduce new terms because the exact phrases the authors use are a bit too long to be a node and shorter versions do not capture their full meaning. So, from objective (1), we can rephrase ‘territories located adjacent to roads’ as ‘territory placement’, and write the following path: “distance to edge ➞ territory placement ➞ territory density”. From objective (2), we can also write “distance to edge ➞ habitat quality ➞ territory size ➞ territory density”. Objective (3) is also fairly straightforward and implies the pathway: “distance to edge ➞ habitat quality ➞ edge avoidance ➞ territory placement ➞ territory density”.
One problem with these hypotheses is that we do not really know what the authors mean by ‘habitat quality’ so we continue reading through the methods section to see if there are more details that can be included. There may also be other hypotheses that are not part of the main study objectives.
“We used a focal-male technique (Gibbs and Faaborg 1990) to assess pairing status and territory size in relation to roads.” (Ortega et al. 1999)
Pairing status was not previously mentioned, but this part of the methods section implies: “distance to edge ➞ pairing status + territory size”. Later on, the authors also introduce fledging success as a process that is influenced by distance to edge.
“To index the proportion of territories fledging at least one young within edge and interior areas, brood detections were referenced to territory maps of males.” (Ortega et al. 1999)
Because it is comparing edge and interior areas, this can be written as: “distance to edge ➞ fledging success”.
“We measured 10 variables considered important in characterizing Ovenbird habitat… At the center of each plot, we measured depth of leaf litter, shrub height, canopy height, and slope.” (Ortega et al. 1999)
Now we know how the authors are characterizing ‘habitat quality’, which was ambiguous before. Here, we omit the descriptions of how the other 6 variables were measured for clarity in the example. Because ‘habitat quality’ results from these different properties but we have no information on how the variables are related, we write: “leaf litter depth + shrub height + canopy height + slope ➞ habitat quality”.
“Therefore, we predicted that yearling male Ovenbirds would constitute a higher proportion of territorial males in forest fragments than in contiguous forest. As a mechanism to explain these differences, we predicted lower return rates of males to fragments than to contiguous forests, which should create more vacancies in the fragments than in the contiguous forests (Møller 1991; Weinberg and Roth 1998). We tested these predictions by capturing, banding, and ageing territorial male Ovenbirds in 12 forest fragments and two contiguous forest sites in the breeding seasons of 1996 through 1999, and recording resighting rates of birds in 1997 and 1998. We also weighed and measured birds to determine whether the condition of territorial males was affected by either age or size of the woodlot.” (Burke and Nol, 2001)
Because the authors are comparing forest fragments to contiguous forest, we can call the starting point “fragment size”. The first sentence implies the pathway: “fragment size ➞ male age” (we write ‘male age’ rather than ‘proportion of yearling males’ because it is shorter). The second implies: “fragment size ➞ male return rates ➞ territory vacancy” (we use ‘territory vacancy’ because ‘vacancies’ alone would be too vague to interpret later in the full model). Because condition could be affected by two different things, the final sentence suggests: “fragment size + age ➞ body condition”.
“We quantified avian community structure (i.e. the number of species and their relative abundances) in forest remnants in the Chicago metropolitan area with varying amounts of invasive vegetation. Specifically, we addressed the following questions: How do measures of invasive vegetation correlate with avian community structure? How do particular avian guilds and individual species respond to exotic vegetation? Finally, how do the effects of exotic vegetation on birds compare in magnitude to those associated with other local and landscape characteristics?” (Schneider and Miller 2014)
At first glance, this study does not necessarily meet our inclusion criteria because it does not start with patch size or distance to edge, though it does end with bird abundance. However, it is unclear what ‘landscape characteristics’ are, so we check the methods section to find out whether they are related to patch size or distance to edge.
“We delineated forest remnants in which the plots were embedded and calculated the perimeter of each remnant (m) and total contiguous forested area (ha). We also measured the distance between each plot center and the nearest forest edge (m).” (Schneider and Miller 2014)
Now, we know that: “distance to edge + patch size” are the ‘landscape characteristics’ so we can use these in place of that for the main pathway. We can ignore perimeter because it is not part of our conceptual boundaries. We can write the first question posed by the authors as: “invasive vegetation ➞ avian community”. The second question is outside the scope of our model because it is about comparing different species. Knowing what ‘landscape characteristics’ are now, we can write the third question as: “distance to edge + patch size + invasive species ➞ avian community”.
It may be of interest to users to create multi-dimensional networks that include not just linked hypotheses, but also characteristics of the included studies. For example, users could extract study authors to do a concurrent bibliometric analysis, or whether or not a hypothesis was supported (note: vote-counting should be avoided as an alternative to meta-analysis, but may be useful as part of building a conceptual model). Users may also want to use study characteristics to subset studies and compare conceptual models built from different subtopics. In our case study (Grames 2021), for example, we created different models based on the type of habitat surrounding a forest edge and for different species. Some metadata may also be useful for investigating trends, such as years of data collection, latitude or longitude where a study was conducted, or other scales.
Note that this step is optional, but resulting networks are more interesting and additional analyses can be done with metadata that may covary with which hypotheses are studied. Users should consider what metadata they would like to use as covariates when designing their synthesis.
The hypotheses and study metadata for our case study from Grames (2021) are pre-loaded in hypoweavr as an example. We load this dataset with the code below so that it is in our working environment, then look at the first few rows of the data.
data("studies")
knitr::kable(head(studies, 3))
Reference | Article.type | Years.of.data.collection | Coordinates | Surrounding.landscape | Forest.type | Focal.taxa | Pathways |
---|---|---|---|---|---|---|---|
Bollinger, E.K., and E.T. Linder. 1994. Reproductive success of Neotropical migrants in a fragmented Illinois forest. The Wilson Bulletin, 106: 46-54. | article | 1991-1992 | 39, -88 | agriculture | | GCFL, ACFL, REVI, WOTH, OVEN, WEWA, LOWA, KEWA, SCTA | patch size > reproductive success > age structure |
Friesen, L.E., P.F. Eagles, and R.J. MacKay. 1995. Effects of residential development on forest-dwelling Neotropical migrant songbirds. Conservation Biology, 9: 1408-1414. | article | 1992-1994 | 43, -80 | residential | maple beech birch | community | patch size + housing development > bird abundance |
Williams Jr, G.E. 2002. Relations of nesting behavior, nest predators, and nesting success of Wood Thrushes (Hylocichla mustelina) to habitat characteristics at multiple scales. West Virginia University. | thesis | 1998-2000 | 38, -79* | regenerating forest | maple beech birch | WOTH | patch size > nest provisionining > nest attendance > nest predation |
Many authors use different terms to describe the same, or related, concepts and the distinction may or may not be useful in the context of the conceptual model. When extracting hypotheses, we recommend that users retain the original terms used by authors unless they are clear synonyms or antonyms (e.g. “nest success” and “nest failure” are direct antonyms and can be collapsed without preserving the original phrase since they only differ in the sign of the causal relationship).
For simplicity when presenting or interpreting a model, it can be valuable to collapse similar concepts and represent them with a single term; however, in many cases the distinctions do matter. For example, “nest success” (i.e. at least one chick fledged) and “number of fledglings” both represent “reproductive success” but may be influenced by different processes (e.g. predation may influence all-or-nothing nest success, but food availability may influence how many chicks fledge). It is best to preserve this distinction in the extracted data, even if the model itself uses the more generic ‘reproductive success’ term to be more interpretable.
Rather than having to choose between a more detailed model or one with collapsed concepts, we suggest that users adopt a flexible approach using an n-level ontology. In the ontology, layer n consists of exact synonyms or antonyms, and higher order levels (e.g. n-1) are conceptual groupings of those synonyms. To continue with the nest success theme, we could imagine a 2-level ontology where level 1 contains the group ‘reproductive success’ within which are ‘nest success’ and ‘number of fledglings’; level 2 is nested within level 1 and contains synonyms for ‘nest success’ (e.g. ‘nest failure’) and ‘number of fledglings’ (e.g. ‘number of fledged young’). This ontology can then be used to reclassify pathways and generate models at different levels of specificity.
| Concept | Term | Synonyms |
|---|---|---|
| Reproductive success | nest success | nest success |
| | | nest failure |
| | number of fledglings | number of fledglings |
| | | number of fledged young |
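The two-level ontology in the table can be stored as a plain data frame and used as a lookup. The sketch below is a base R illustration only; the column names and the `reclassify()` helper are hypothetical and are not part of hypoweavr.

```r
# Hypothetical two-level ontology mirroring the table above
ontology_2level <- data.frame(
  concept = rep("reproductive success", 4),
  term    = c("nest success", "nest success",
              "number of fledglings", "number of fledglings"),
  synonym = c("nest success", "nest failure",
              "number of fledglings", "number of fledged young"),
  stringsAsFactors = FALSE
)

# Map extracted phrases to the requested level of the ontology
reclassify <- function(x, ontology, level = c("term", "concept")) {
  level <- match.arg(level)
  idx <- match(x, ontology$synonym)
  # Phrases missing from the ontology are returned unchanged
  ifelse(is.na(idx), x, ontology[[level]][idx])
}

extracted <- c("nest failure", "number of fledged young", "predation")
by_term    <- reclassify(extracted, ontology_2level, level = "term")
by_concept <- reclassify(extracted, ontology_2level, level = "concept")
by_term
by_concept
```

Switching `level` between "term" and "concept" is what allows the same extracted pathways to produce a detailed model or a collapsed one.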
Although users could create the higher levels of an ontology in advance based on knowledge of the system, inevitably many terms will be omitted from the synonym list. We recommend completing path extraction before creating the ontology, and using the list of terms extracted from the texts as the basis for the ontology.
In our case study (Grames 2021), we only created an ontology for synonyms or related terms and did not group them into higher levels.
data("ontology")
knitr::kable(head(ontology, 10))
| Term | Synonyms |
|---|---|
| adult survival | adult survival |
| | apparent survival |
| age structure | male age |
| | population age structure |
| | age |
| | age structure |
| arrival date | arrival date |
| atmospheric deposition | atmospheric deposition |
| behavior | foraging behavior |
| | behavior |
This table is human-readable with blank space left in the ‘Term’ column; however, it isn’t very machine-readable. To make it more useful, we need to fill in the blank spaces with the associated term. To do this, we can use the function fill_rows(), which is a verbatim copy of the same function in the topictagger package.
ontology <- fill_rows(ontology)
knitr::kable(head(ontology, 10))
Term | Synonyms |
---|---|
adult survival | adult survival |
adult survival | apparent survival |
age structure | male age |
age structure | population age structure |
age structure | age |
age structure | age structure |
arrival date | arrival date |
atmospheric deposition | atmospheric deposition |
behavior | foraging behavior |
behavior | behavior |
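For readers curious about what filling the blank rows involves, the following base R sketch carries the last non-blank value downward on a toy table. It is an illustration of the idea under our own assumptions, not the implementation of fill_rows(); the helper name `fill_down()` is hypothetical.

```r
# Toy table with the same layout as the printed ontology (blank 'Term' cells)
toy <- data.frame(
  Term     = c("adult survival", "", "age structure", "", ""),
  Synonyms = c("adult survival", "apparent survival",
               "male age", "population age structure", "age"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: carry the last non-blank value downward
fill_down <- function(x) {
  blank <- is.na(x) | x == ""
  # cumsum() counts the non-blank cells seen so far, which is the index of
  # the most recent non-blank entry for every row
  idx <- cumsum(!blank)
  if (any(idx == 0)) stop("the first entry must not be blank")
  x[!blank][idx]
}

toy$Term <- fill_down(toy$Term)
toy
```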
At this point, users should have 1) hypotheses associated with study metadata, and 2) an ontology defining synonyms (and potentially other groups). The remaining steps use that information to generate graphs representing study hypotheses, build global graphs merging hypotheses across studies, and assemble studies in different ways to analyze the graphs (e.g. creating a cumulative time series of graphs or using sliding windows to explore trends).
First, we need to clean up the pathways and get them in the format expected by hypoweavr. This involves cleaning up the punctuation and replacing synonymous terms using the ontology. In our example, we have used “>” to represent implied causal relationships, and “+” as a shorthand for indicating that a group of variables share the same relationship to the variable(s) on the other end of the relationship. For example, instead of writing “patch size > bird abundance” and “distance to edge > bird abundance” we can write “patch size + distance to edge > bird abundance”. When a single study has more than one implied causal pathway, we separated them by “;”.
# First, let's pull the pathways out of the main dataset
pathways <- studies$Pathways
head(pathways)
## [1] "patch size > reproductive success > age structure"
## [2] "patch size + housing development > bird abundance"
## [3] "patch size > nest provisionining > nest attendance > nest predation"
## [4] "patch size + matrix type > bird species richness + bird abundance"
## [5] "forest width > bird abundance; distance to edge + forest width > nest success; predation + weather + parasitism > nest failure; invasive plant nest substrates > nest concealment > nest success"
## [6] "distance to edge + nest placement + understory vegetation density + canopy cover + nest concealment > nest success"
# We need to clean up the punctuation to be consistent and in the right format
cleaned_paths <- clean_path(path = pathways,
                            join = "+",
                            cause = c(">", "="),
                            sep = ";")
head(cleaned_paths)
## [1] ";patch size > reproductive success > age structure;"
## [2] ";patch size > bird abundance;housing development > bird abundance;"
## [3] ";patch size > nest provisionining > nest attendance > nest predation;"
## [4] ";patch size > bird species richness;patch size > bird abundance;matrix type > bird species richness;matrix type > bird abundance;"
## [5] ";forest width > bird abundance;distance to edge > nest success;forest width > nest success;predation > nest failure;weather > nest failure;parasitism > nest failure;invasive plant nest substrates > nest concealment > nest success;"
## [6] ";distance to edge > nest success;nest placement > nest success;understory vegetation density > nest success;canopy cover > nest success;nest concealment > nest success;"
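To make the shorthand concrete, the sketch below unpacks one study's pathways into a table of directed edges using base R only. It goes one step further than the output above by also breaking chains into pairs. The helper `expand_path()` is hypothetical and is not how clean_path() is implemented; it only illustrates how ">", "+", and ";" combine.

```r
# Hypothetical helper (not part of hypoweavr): expand one study's shorthand
# into a data frame of directed edges
expand_path <- function(path, join = "+", cause = ">", sep = ";") {
  chains <- trimws(strsplit(path, sep, fixed = TRUE)[[1]])
  chains <- chains[nzchar(chains)]
  edges <- lapply(chains, function(chain) {
    # Each step in a chain is a group of one or more concepts joined by "+"
    steps <- strsplit(chain, cause, fixed = TRUE)[[1]]
    groups <- lapply(steps, function(s) trimws(strsplit(s, join, fixed = TRUE)[[1]]))
    if (length(groups) < 2) return(NULL)
    pairs <- lapply(seq_len(length(groups) - 1), function(i) {
      # Every concept in one group points to every concept in the next group
      expand.grid(from = groups[[i]], to = groups[[i + 1]],
                  stringsAsFactors = FALSE)
    })
    do.call(rbind, pairs)
  })
  edges <- Filter(Negate(is.null), edges)
  unique(do.call(rbind, edges))
}

e1 <- expand_path("patch size + matrix type > bird species richness + bird abundance")
e2 <- expand_path("patch size > reproductive success > age structure; predation > nest failure")
e1
e2
```

The first call yields four edges (each cause paired with each effect), and the second yields one edge per arrow across both pathways.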
Now the punctuation and format of all our written pathways are standardized, but the terms used are not. One study may say ‘nest success’ and another ‘nest survival’, and we want to merge those into the same concept so that they are represented by a single node in the network. To do this, we use our ontology to group synonyms and replace them with a single term.
# Use the ontology to replace synonymous terms
merged_paths <- replace_terms(path = cleaned_paths,
                              terms = ontology$Term,
                              synonyms = ontology$Synonyms)
head(merged_paths)
## [1] " ; patch size > reproductive success > age structure ;"
## [2] " ; patch size > bird abundance ; matrix type > bird abundance ;"
## [3] " ; patch size > provisioning rate > nest attendance > nest predation ;"
## [4] " ; patch size > bird community ; patch size > bird abundance ; matrix type > bird community ; matrix type > bird abundance ;"
## [5] " ; patch size > bird abundance ; distance to edge > nest success ; patch size > nest success ; nest predation > nest success ; weather > nest success ; brood parasitism > nest success ; invasive species > nest concealment > nest success ;"
## [6] " ; distance to edge > nest success ; nest site selection > nest success ; vegetation structure > nest success ; forest characteristics > nest success ; nest concealment > nest success ;"
# Take a look at item 6; notice 'nest placement' is now 'nest site selection',
# 'understory vegetation density' is now 'vegetation structure', etc.
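One detail worth noting when replacing synonyms is that whole concepts should be matched rather than substrings; otherwise a short synonym such as 'age' could alter 'age structure'. The base R sketch below illustrates this on a toy ontology. The helper `swap_terms()` is hypothetical, assumes "+" groups have already been expanded, and is not the implementation of replace_terms().

```r
# Filled-in ontology in the same two-column layout as above (toy subset)
onto <- data.frame(
  Term     = c("nest success", "nest success", "age structure", "age structure"),
  Synonyms = c("nest success", "nest survival", "age", "male age"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace whole concepts, never substrings
swap_terms <- function(path, terms, synonyms, cause = ">", sep = ";") {
  chains <- trimws(strsplit(path, sep, fixed = TRUE)[[1]])
  chains <- chains[nzchar(chains)]
  swapped <- vapply(chains, function(chain) {
    concepts <- trimws(strsplit(chain, cause, fixed = TRUE)[[1]])
    idx <- match(concepts, synonyms)
    # Concepts without an ontology entry are left as they are
    concepts[!is.na(idx)] <- terms[idx[!is.na(idx)]]
    paste(concepts, collapse = paste0(" ", cause, " "))
  }, character(1), USE.NAMES = FALSE)
  paste(swapped, collapse = paste0(" ", sep, " "))
}

swapped_path <- swap_terms("patch size > nest survival; fragment size > male age > age",
                           terms = onto$Term, synonyms = onto$Synonyms)
swapped_path
```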
Next, we need to create one graph for each study in our dataset. We can then recombine and analyze these graphs based on study characteristics or other metadata we have collected. For example, in the code below, we use the first year of data collection to merge studies that began in the same year. In some years, no studies in the database were initiated, so those years are left blank.
# This will create a list of graph objects the same length as the number of studies
study_graphs <- generate_graph(merged_paths)
# Let's create a new variable for first year of data collection
firstyear <- unlist(lapply(studies$Years.of.data.collection, function(x){
  as.numeric(strsplit(x, "-")[[1]][1])
}))
# Now we can create a shorter series of graphs where all studies from the same
# year are merged together into a single graph
# This can be useful, e.g. for looking at trends in thinking over time
by_year <- create_series(graphs = study_graphs,
                         order.by = firstyear,
                         sort.order = 1975:2020)
# If we have ordered our graphs by something numerical (e.g. year, latitude)
# but we don't want to analyze each year separately, we could create a time
# series of graphs based on sliding windows
window_dag <- create_windows(by_year, window.size=5, startpoint = 1)
# Or we could create a cumulative graph where each one builds on all previous
# graphs in the series, e.g. looking at accumulation of hypotheses over time
cumulative_year <- create_cumulative(graphs = study_graphs,
                                     order.by = firstyear,
                                     sort.order = 1975:2020)
# We could also pull out a graph for all studies with some shared characteristic
# For example, a unified graph of all studies done in aspen-birch forests, which
# we could compare to one done in maple-beech-birch
aspen <- merge_graphs(study_graphs, by=studies$Forest.type=="aspen birch")
maple <- merge_graphs(study_graphs, by=studies$Forest.type=="maple beech birch")
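The difference between a yearly series, sliding windows, and a cumulative series can be illustrated without graph objects by treating each study as a set of edges. The base R sketch below uses toy data; the object and function names are hypothetical, and it does not reproduce the internals of create_series(), create_windows(), or create_cumulative().

```r
# Toy input: one character vector of edges per study, plus first year of data
study_edges <- list(
  c("patch size|nest success"),
  c("distance to edge|nest predation", "nest predation|nest success"),
  c("patch size|bird abundance"),
  c("patch size|nest success", "patch size|pairing success")
)
first_year <- c(1990, 1992, 1992, 1995)
years <- 1990:1995

# Union of edges from studies that started in each year (empty if none did)
edges_by_year <- lapply(years, function(y) unique(unlist(study_edges[first_year == y])))
names(edges_by_year) <- years

# Sliding windows: merge every run of 'size' consecutive years
window_edges <- function(series, size) {
  starts <- seq_len(length(series) - size + 1)
  lapply(starts, function(i) unique(unlist(series[i:(i + size - 1)])))
}
windows <- window_edges(edges_by_year, size = 3)

# Cumulative series: each element contains all edges seen up to that year
cumulative <- Reduce(function(a, b) union(a, b), edges_by_year, accumulate = TRUE)
lengths(windows)
lengths(cumulative)
```

Window sizes rise and fall as studies enter and leave the window, whereas the cumulative series can only grow, which is why the two are suited to different questions.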
We can then plot the graphs to visualize how hypotheses are related to each other. For example, the aspen network is shown below. Because everyone will want to visualize their networks differently, there are no plotting functions included in hypoweavr. We suggest using igraph for static plots, or visNetwork or tkplot for interactive plots. If one of the end products is an interactive webpage with results of the network building, we recommend visNetwork; example code for the Shiny app created as part of our case study can be found at https://github.com/elizagrames/conceptual-models/.
Because we have represented the hypotheses from each study (or each year) as a network, there are many, many different analyses that could be done with it depending on the type of synthesis being done and the questions being addressed. There are a few simple analyses built into hypoweavr, which we demonstrate below, but users should consider more sophisticated approaches and software that is dedicated solely to graph analysis (e.g. the igraph package).
Note that some analyses that depend on graphs being directed and acyclic (e.g. transitive reduction) may not function as expected. Although hypotheses within a study should be internally consistent and form a directed acyclic graph (DAG), when individual study DAGs are combined, they may produce cycles or conflicting directions for causal relationships.
The dissimilarity metric included in hypoweavr follows Schieber et al. (2017) and is based on topological differences between graphs. This can be useful when thinking about how the structure of the network changes over time.
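To see how merging can break acyclicity, consider two toy studies whose hypotheses point in opposite directions. The base R sketch below checks for cycles by repeatedly removing nodes with no incoming edges (Kahn's algorithm). The studies, the edge-list representation, and the helper `is_acyclic()` are illustrative assumptions, not hypoweavr functionality.

```r
# Two toy studies, each internally acyclic, stored as directed edge lists
study_a <- data.frame(from = c("habitat quality", "territory size"),
                      to   = c("territory size", "territory density"),
                      stringsAsFactors = FALSE)
study_b <- data.frame(from = "territory density", to = "habitat quality",
                      stringsAsFactors = FALSE)

# Hypothetical helper: TRUE if the directed edge list has no cycles.
# Nodes with no incoming edges are removed along with their outgoing edges;
# any edges that can never be removed must belong to a cycle.
is_acyclic <- function(edges) {
  edges <- unique(edges)
  repeat {
    if (nrow(edges) == 0) return(TRUE)
    sources <- setdiff(edges$from, edges$to)
    if (length(sources) == 0) return(FALSE)
    edges <- edges[!edges$from %in% sources, , drop = FALSE]
  }
}

acyclic_a      <- is_acyclic(study_a)
acyclic_b      <- is_acyclic(study_b)
acyclic_merged <- is_acyclic(rbind(study_a, study_b))
c(a = acyclic_a, b = acyclic_b, merged = acyclic_merged)
```

Each study passes the check on its own, but the merged edge list does not, so a check like this before running DAG-only analyses can flag where conflicting hypotheses need attention.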
# Dissimilarity between two graphs
calc_dissimilarity(aspen, maple)
## [1] 0.3377651
# We could also calculate dissimilarity over a series of graphs, for example
# using the cumulative graphs by year to see when the network stabilizes
cumulative_diss <- calc_dissimilarity(cumulative_year)
plot(cumulative_diss)
# Or we could use the sliding windows network to see how much the field
# is changing over time and when there are bursts of ideas
time_diss <- calc_dissimilarity(window_dag)
plot(time_diss)
Users may want to look at simple metrics like how the number of nodes (i.e. factors or concepts in the model) or edges (i.e. relationships between those concepts) changes over time or along other dimensions. There are two simple functions in hypoweavr for doing this: one that calculates the number and identity of new features in a graph that are not in the comparison graph, and one that calculates the number and identity of shared features between two graphs.
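Conceptually, these comparisons are set operations on node names and edge labels. The sketch below illustrates the idea in base R using the "from|to" edge labels that appear in the package output. The vectors `edges_x` and `edges_y` are toy data, and this is an illustration of the concept rather than hypoweavr's implementation.

```r
# Toy edge labels in the "from|to" form used in the output; x and y are
# hypothetical stand-ins for two graphs
edges_x <- c("patch size|bird abundance", "patch size|habitat quality")
edges_y <- c("patch size|habitat quality", "patch size|occupancy",
             "nest predation|site fidelity")

# New features: present in y but not in x (y is compared to x)
new_in_y <- setdiff(edges_y, edges_x)

# Shared features: present in both x and y
shared <- intersect(edges_x, edges_y)

length(new_in_y)  # 2
shared            # "patch size|habitat quality"
```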
# We can calculate the number of new features (nodes and edges) in the network
# created from only studies in maple compared to the network for aspen
# Note that y gets compared to x, so this returns new features in y that are not in x
new_features(x=aspen, y=maple, return="counts")
## $nodes
## [1] 13
##
## $edges
## [1] 42
# If we want to know what those new features are, we can set return="features" instead
new_features(x=aspen, y=maple, return="features")
## $nodes
## [1] "occupancy" "habitat selection" "habitat quality"
## [4] "territory size" "earthworm abundance" "plant community"
## [7] "invertebrate abundance" "edge characteristics" "urbanization"
## [10] "adult survival" "body size" "number of fledglings"
## [13] "site fidelity"
##
## $edges
## [1] "distance to edge|habitat selection"
## [2] "distance to edge|brood parasitism"
## [3] "distance to edge|leaf litter depth"
## [4] "distance to edge|forest characteristics"
## [5] "distance to edge|vegetation structure"
## [6] "distance to edge|nest success"
## [7] "distance to edge|nest predation"
## [8] "nest predation|site fidelity"
## [9] "nest predation|reproductive success"
## [10] "nest predation|nest success"
## [11] "nest attendance|nest predation"
## [12] "provisioning rate|nest attendance"
## [13] "matrix type|reproductive success"
## [14] "matrix type|bird community"
## [15] "matrix type|bird abundance"
## [16] "patch size|invertebrate abundance"
## [17] "patch size|plant community"
## [18] "patch size|habitat quality"
## [19] "patch size|occupancy"
## [20] "patch size|extinction"
## [21] "patch size|singing males"
## [22] "patch size|brood parasite abundance"
## [23] "patch size|predator abundance"
## [24] "patch size|predation risk"
## [25] "patch size|edge avoidance"
## [26] "patch size|settlement"
## [27] "patch size|off-territory movements"
## [28] "patch size|habitat heterogeneity"
## [29] "patch size|resource availability"
## [30] "patch size|reproductive success"
## [31] "patch size|brood parasitism"
## [32] "patch size|pairing success"
## [33] "patch size|territory density"
## [34] "patch size|arrival date"
## [35] "patch size|heterospecific density"
## [36] "patch size|bird community"
## [37] "patch size|peat depth"
## [38] "patch size|forest characteristics"
## [39] "patch size|vegetation structure"
## [40] "patch size|nest predation"
## [41] "patch size|provisioning rate"
## [42] "patch size|bird abundance"
# Similarly, we can calculate the number of shared features that appear in the
# graph for both aspen and maple, and return what those features are
shared_features(x=aspen, y=maple, return="counts")
## $nodes
## [1] 60
##
## $edges
## [1] 14
shared_features(x=aspen, y=maple, return="features")
## $nodes
## [1] "patch size" "bird abundance"
## [3] "matrix type" "provisioning rate"
## [5] "nest attendance" "nest predation"
## [7] "distance to edge" "nest success"
## [9] "nest site selection" "vegetation structure"
## [11] "forest characteristics" "nest concealment"
## [13] "peat depth" "bird community"
## [15] "leaf litter depth" "heterospecific density"
## [17] "arrival date" "territory density"
## [19] "pairing success" "brood parasitism"
## [21] "reproductive success" "resource availability"
## [23] "habitat heterogeneity" "off-territory movements"
## [25] "settlement" "edge avoidance"
## [27] "predation risk" "predator abundance"
## [29] "brood parasite abundance" "singing males"
## [31] "extinction" "occupancy"
## [33] "habitat selection" "habitat quality"
## [35] "territory size" "earthworm abundance"
## [37] "plant community" "invertebrate abundance"
## [39] "edge characteristics" "urbanization"
## [41] "adult survival" "body size"
## [43] "number of fledglings" "site fidelity"
## [45] "heterospecific attraction" "behavior"
## [47] "age structure" "singing rate"
## [49] "body condition" "extra-pair partners"
## [51] "breeding synchrony" "extra pair copulation"
## [53] "male quality" "encounter probability"
## [55] "territory establishment" "territory defense"
## [57] "canopy bird species" "open canopy"
## [59] "shrub layer" "ground vegetation coverage"
##
## $edges
## [1] "habitat quality|territory size"
## [2] "settlement|bird abundance"
## [3] "forest characteristics|bird community"
## [4] "forest characteristics|nest success"
## [5] "vegetation structure|bird community"
## [6] "vegetation structure|nest success"
## [7] "distance to edge|forest characteristics"
## [8] "distance to edge|vegetation structure"
## [9] "patch size|habitat quality"
## [10] "patch size|off-territory movements"
## [11] "patch size|bird community"
## [12] "patch size|forest characteristics"
## [13] "patch size|vegetation structure"
## [14] "patch size|bird abundance"
# If we calculate new features for a longer list of graphs, we can track growth
# of the network, for example using the cumulative graphs by year
cumulative_features <- new_features(cumulative_year, return="counts")
plot(cumulative_features$nodes ~ seq(1975, 2020, 1), type="l", ylab="Network nodes", xlab="Year")
plot(cumulative_features$edges ~ seq(1975, 2020, 1), type="l", ylab="Network edges", xlab="Year")
# Shared features for cumulative graphs aren't very interesting, because we already
# know that each graph shares all the same features of the previous graph
In the network, the hypotheses are represented by concepts (nodes) and the relationships between them (edges). Assessing characteristics of the nodes and edges can be useful for determining which hypotheses are most important, how hypotheses have changed over time, etc. hypoweavr includes a wrapper function to pull graph metrics for one or more graphs, including time series of graphs, using underlying functions from the igraph package. We can then visualize changes in graph metrics over time. There are many, many different ways users can approach analyzing graphs and the characteristics of nodes and edges, which we do not go into here.
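Because the wrapper relies on igraph, the same kinds of metrics can also be computed directly. The following is a minimal sketch on a toy directed graph. It assumes that the metric names used in this document map onto the igraph functions of the same name, which is not confirmed here; the graph and the edge-labelling step are illustrative.

```r
library(igraph)

# Toy graph; edge names are borrowed from the example output
toy_edges <- data.frame(
  from = c("patch size", "patch size", "habitat quality", "settlement"),
  to   = c("bird abundance", "habitat quality", "territory size", "bird abundance")
)
toy_graph <- graph_from_data_frame(toy_edges, directed = TRUE)

# Node metrics: PageRank scores (sum to 1) and strength (degree when unweighted)
node_rank <- page_rank(toy_graph)$vector
node_strength <- strength(toy_graph)

# Edge metric: betweenness, labelled "from|to" to match the output format
edge_btwn <- edge_betweenness(toy_graph)
names(edge_btwn) <- apply(as_edgelist(toy_graph), 1, paste, collapse = "|")

sort(node_rank, decreasing = TRUE)
sort(edge_btwn, decreasing = TRUE)
```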
# We can generate node metrics for a single graph
graph_metrics(aspen, metric="page_rank", return.df = FALSE)
## patch size bird abundance
## 0.01330561 0.08240264
## heterospecific attraction settlement
## 0.04832674 0.05438334
## bird community distance to edge
## 0.02177381 0.01330561
## forest characteristics vegetation structure
## 0.01726403 0.01726403
## nest success habitat selection
## 0.02064283 0.02952039
## nest site selection territory size
## 0.02585178 0.02578736
## habitat quality behavior
## 0.02670769 0.01443659
## off-territory movements pairing success
## 0.12479480 0.01330561
## age structure singing rate
## 0.01330561 0.01330561
## body condition extra-pair partners
## 0.01330561 0.06330512
## breeding synchrony extra pair copulation
## 0.01330561 0.12273530
## male quality encounter probability
## 0.01330561 0.06634340
## territory density territory establishment
## 0.02064283 0.01613306
## territory defense canopy bird species
## 0.01613306 0.02465638
## open canopy shrub layer
## 0.01443659 0.02557672
## ground vegetation coverage
## 0.01443659
# Or we can generate metrics for multiple graphs, which returns a list
graph_metrics(list(aspen, maple), metric="page_rank", return.df = FALSE)
## [[1]]
## patch size bird abundance
## 0.01330561 0.08240264
## heterospecific attraction settlement
## 0.04832674 0.05438334
## bird community distance to edge
## 0.02177381 0.01330561
## forest characteristics vegetation structure
## 0.01726403 0.01726403
## nest success habitat selection
## 0.02064283 0.02952039
## nest site selection territory size
## 0.02585178 0.02578736
## habitat quality behavior
## 0.02670769 0.01443659
## off-territory movements pairing success
## 0.12479480 0.01330561
## age structure singing rate
## 0.01330561 0.01330561
## body condition extra-pair partners
## 0.01330561 0.06330512
## breeding synchrony extra pair copulation
## 0.01330561 0.12273530
## male quality encounter probability
## 0.01330561 0.06634340
## territory density territory establishment
## 0.02064283 0.01613306
## territory defense canopy bird species
## 0.01613306 0.02465638
## open canopy shrub layer
## 0.01443659 0.02557672
## ground vegetation coverage
## 0.01443659
##
## [[2]]
## patch size bird abundance matrix type
## 0.01275238 0.13988046 0.01275238
## provisioning rate nest attendance nest predation
## 0.01315385 0.02393316 0.03638757
## distance to edge nest success nest site selection
## 0.01275238 0.06127903 0.01538328
## vegetation structure forest characteristics nest concealment
## 0.01686870 0.01686870 0.01633698
## peat depth bird community leaf litter depth
## 0.01315385 0.03631186 0.01646724
## heterospecific density arrival date territory density
## 0.01315385 0.01315385 0.03368889
## pairing success brood parasitism reproductive success
## 0.05095820 0.02722516 0.05021822
## resource availability habitat heterogeneity off-territory movements
## 0.01315385 0.01315385 0.01315385
## settlement edge avoidance predation risk
## 0.02433462 0.01315385 0.01315385
## predator abundance brood parasite abundance singing males
## 0.01315385 0.01315385 0.01315385
## extinction occupancy habitat selection
## 0.01315385 0.02399338 0.01646724
## habitat quality territory size earthworm abundance
## 0.02476139 0.01538328 0.01383634
## plant community invertebrate abundance edge characteristics
## 0.01315385 0.02476139 0.01275238
## urbanization adult survival body size
## 0.01275238 0.01455897 0.01455897
## number of fledglings site fidelity
## 0.01455897 0.02306220
# The number and order of items is not the same for each entry in our list
# But in some cases, we may want a data frame where all possible entries appear
# for every single graph, even if that node/edge is not present
# In this case, we can set return.df to TRUE
nodestrength <- graph_metrics(cumulative_year, metric="strength", return.df=TRUE)
knitr::kable(head(nodestrength[,1:15]))
Node | 1975 | 1976 | 1977 | 1978 | 1979 | 1980 | 1981 | 1982 | 1983 | 1984 | 1985 | 1986 | 1987 | 1988 | 1989 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
patch size | 1 | 4 | 4 | 4 | 4 | 6 | 17 | 20 | 21 | 21 | 21 | 21 | 22 | 26 | 27 |
bird community | 1 | 1 | 1 | 1 | 1 | 4 | 4 | 4 | 4 | 5 | 7 | 7 | 7 | 9 | 9 |
vegetation structure | NA | 4 | 4 | 4 | 6 | 6 | 6 | 6 | 6 | 6 | 7 | 7 | 7 | 10 | 10 |
forest characteristics | NA | 2 | 2 | 2 | 2 | 3 | 3 | 3 | 3 | 3 | 5 | 5 | 5 | 5 | 5 |
bird abundance | NA | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 3 | 4 | 9 | 9 | 9 | 11 | 11 |
distance to edge | NA | NA | NA | NA | 4 | 4 | 4 | 5 | 5 | 5 | 9 | 9 | 9 | 9 | 9 |
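One way to picture what `return.df = TRUE` produces is to align named metric vectors of different lengths on the union of their names, filling gaps with NA. The sketch below does this in base R. `align_metrics` is a hypothetical helper written for illustration, not a hypoweavr function, and the values are toy data.

```r
# Hypothetical helper: align a list of named numeric vectors into one
# data frame with a row per name and a column per list entry
align_metrics <- function(metric_list) {
  all_names <- unique(unlist(lapply(metric_list, names)))
  # Indexing by name returns NA where a name is absent from a vector
  columns <- lapply(metric_list, function(m) unname(m[all_names]))
  out <- do.call(cbind, columns)
  rownames(out) <- all_names
  as.data.frame(out)
}

# Toy metrics for two years; "distance to edge" first appears in the second
toy_metrics <- list(
  "1975" = c("patch size" = 1, "bird community" = 1),
  "1976" = c("patch size" = 4, "bird community" = 1, "distance to edge" = 2)
)
aligned <- align_metrics(toy_metrics)
aligned
```

Rows that are NA in an early column and filled in later mark the year in which a node first enters the network.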
# We can calculate edge metrics using the same function
graph_metrics(aspen, metric="edge_betweenness")
## ground vegetation coverage|habitat quality
## 3.0
## shrub layer|bird abundance
## 14.0
## open canopy|shrub layer
## 9.0
## encounter probability|extra pair copulation
## 1.0
## male quality|off-territory movements
## 3.0
## breeding synchrony|extra-pair partners
## 4.0
## extra-pair partners|off-territory movements
## 27.0
## body condition|off-territory movements
## 3.0
## singing rate|off-territory movements
## 3.0
## age structure|off-territory movements
## 3.0
## pairing success|off-territory movements
## 3.0
## off-territory movements|encounter probability
## 16.0
## off-territory movements|extra pair copulation
## 16.0
## habitat quality|canopy bird species
## 3.0
## habitat quality|territory size
## 2.0
## habitat selection|nest site selection
## 4.0
## habitat selection|habitat selection
## 0.0
## vegetation structure|territory density
## 2.0
## vegetation structure|habitat selection
## 6.0
## vegetation structure|nest success
## 2.0
## vegetation structure|bird community
## 1.5
## forest characteristics|territory density
## 2.0
## forest characteristics|extra-pair partners
## 8.5
## forest characteristics|nest success
## 2.0
## forest characteristics|bird community
## 1.5
## distance to edge|territory defense
## 1.0
## distance to edge|territory establishment
## 1.0
## distance to edge|vegetation structure
## 4.5
## distance to edge|forest characteristics
## 6.5
## settlement|bird abundance
## 11.0
## heterospecific attraction|settlement
## 10.0
## bird abundance|extra-pair partners
## 20.5
## bird abundance|heterospecific attraction
## 9.0
## patch size|ground vegetation coverage
## 1.0
## patch size|open canopy
## 2.0
## patch size|off-territory movements
## 3.0
## patch size|behavior
## 1.0
## patch size|habitat quality
## 2.0
## patch size|territory size
## 1.0
## patch size|vegetation structure
## 4.0
## patch size|forest characteristics
## 2.5
## patch size|bird community
## 1.0
## patch size|bird abundance
## 3.5
# For single graphs, we can look at metrics in terms of relative importance
aspen_btwn <- sort(graph_metrics(aspen, metric="edge_betweenness"))
labels <- names(aspen_btwn); labels[aspen_btwn<5] <- ""
plot(aspen_btwn)
text(aspen_btwn, labels, cex=0.5, adj=0)
graph_metrics(list(aspen, maple), metric="edge_betweenness", return.df = FALSE)
## [[1]]
## ground vegetation coverage|habitat quality
## 3.0
## shrub layer|bird abundance
## 14.0
## open canopy|shrub layer
## 9.0
## encounter probability|extra pair copulation
## 1.0
## male quality|off-territory movements
## 3.0
## breeding synchrony|extra-pair partners
## 4.0
## extra-pair partners|off-territory movements
## 27.0
## body condition|off-territory movements
## 3.0
## singing rate|off-territory movements
## 3.0
## age structure|off-territory movements
## 3.0
## pairing success|off-territory movements
## 3.0
## off-territory movements|encounter probability
## 16.0
## off-territory movements|extra pair copulation
## 16.0
## habitat quality|canopy bird species
## 3.0
## habitat quality|territory size
## 2.0
## habitat selection|nest site selection
## 4.0
## habitat selection|habitat selection
## 0.0
## vegetation structure|territory density
## 2.0
## vegetation structure|habitat selection
## 6.0
## vegetation structure|nest success
## 2.0
## vegetation structure|bird community
## 1.5
## forest characteristics|territory density
## 2.0
## forest characteristics|extra-pair partners
## 8.5
## forest characteristics|nest success
## 2.0
## forest characteristics|bird community
## 1.5
## distance to edge|territory defense
## 1.0
## distance to edge|territory establishment
## 1.0
## distance to edge|vegetation structure
## 4.5
## distance to edge|forest characteristics
## 6.5
## settlement|bird abundance
## 11.0
## heterospecific attraction|settlement
## 10.0
## bird abundance|extra-pair partners
## 20.5
## bird abundance|heterospecific attraction
## 9.0
## patch size|ground vegetation coverage
## 1.0
## patch size|open canopy
## 2.0
## patch size|off-territory movements
## 3.0
## patch size|behavior
## 1.0
## patch size|habitat quality
## 2.0
## patch size|territory size
## 1.0
## patch size|vegetation structure
## 4.0
## patch size|forest characteristics
## 2.5
## patch size|bird community
## 1.0
## patch size|bird abundance
## 3.5
##
## [[2]]
## urbanization|number of fledglings
## 1.000000
## urbanization|body size
## 1.000000
## urbanization|adult survival
## 1.000000
## urbanization|brood parasitism
## 1.500000
## urbanization|nest success
## 1.000000
## urbanization|nest predation
## 2.500000
## edge characteristics|occupancy
## 1.000000
## invertebrate abundance|invertebrate abundance
## 0.000000
## invertebrate abundance|habitat quality
## 13.000000
## plant community|bird abundance
## 1.000000
## earthworm abundance|bird abundance
## 1.200000
## territory size|pairing success
## 2.000000
## territory size|territory density
## 2.000000
## habitat quality|territory size
## 5.000000
## habitat quality|habitat selection
## 4.000000
## habitat quality|pairing success
## 3.000000
## habitat quality|leaf litter depth
## 3.000000
## habitat quality|forest characteristics
## 3.666667
## habitat quality|vegetation structure
## 5.666667
## habitat quality|nest site selection
## 4.666667
## habitat quality|bird abundance
## 2.200000
## habitat selection|territory density
## 5.000000
## extinction|bird abundance
## 1.000000
## brood parasite abundance|brood parasitism
## 2.000000
## settlement|bird abundance
## 2.000000
## off-territory movements|settlement
## 2.000000
## brood parasitism|reproductive success
## 3.000000
## pairing success|bird abundance
## 4.000000
## territory density|pairing success
## 4.000000
## leaf litter depth|bird abundance
## 1.200000
## peat depth|bird community
## 1.000000
## nest concealment|nest success
## 1.000000
## forest characteristics|bird community
## 2.500000
## forest characteristics|nest success
## 2.000000
## forest characteristics|bird abundance
## 1.200000
## vegetation structure|bird community
## 2.500000
## vegetation structure|nest concealment
## 5.000000
## vegetation structure|nest success
## 2.000000
## vegetation structure|bird abundance
## 1.200000
## nest site selection|nest success
## 1.666667
## distance to edge|invertebrate abundance
## 1.000000
## distance to edge|earthworm abundance
## 1.200000
## distance to edge|habitat quality
## 4.200000
## distance to edge|habitat selection
## 2.000000
## distance to edge|brood parasitism
## 1.500000
## distance to edge|leaf litter depth
## 1.200000
## distance to edge|forest characteristics
## 1.700000
## distance to edge|vegetation structure
## 2.700000
## distance to edge|nest success
## 1.000000
## distance to edge|nest predation
## 2.500000
## nest predation|site fidelity
## 6.000000
## nest predation|reproductive success
## 4.000000
## nest predation|nest success
## 3.333333
## nest attendance|nest predation
## 8.000000
## provisioning rate|nest attendance
## 6.000000
## matrix type|reproductive success
## 1.000000
## matrix type|bird community
## 1.000000
## matrix type|bird abundance
## 1.000000
## patch size|invertebrate abundance
## 1.000000
## patch size|plant community
## 1.000000
## patch size|habitat quality
## 5.000000
## patch size|occupancy
## 1.000000
## patch size|extinction
## 1.000000
## patch size|singing males
## 1.000000
## patch size|brood parasite abundance
## 1.000000
## patch size|predator abundance
## 1.000000
## patch size|predation risk
## 1.000000
## patch size|edge avoidance
## 1.000000
## patch size|settlement
## 1.000000
## patch size|off-territory movements
## 1.000000
## patch size|habitat heterogeneity
## 1.000000
## patch size|resource availability
## 1.000000
## patch size|reproductive success
## 1.000000
## patch size|brood parasitism
## 1.000000
## patch size|pairing success
## 1.000000
## patch size|territory density
## 1.000000
## patch size|arrival date
## 1.000000
## patch size|heterospecific density
## 1.000000
## patch size|bird community
## 1.000000
## patch size|peat depth
## 1.000000
## patch size|forest characteristics
## 1.333333
## patch size|vegetation structure
## 2.333333
## patch size|nest predation
## 2.333333
## patch size|provisioning rate
## 2.000000
## patch size|bird abundance
## 1.000000
# We can also look at changes in metrics over time
btwn <- graph_metrics(cumulative_year, metric="edge_betweenness", return.df=TRUE)
knitr::kable(head(btwn[,1:15]))
Edge | 1975 | 1976 | 1977 | 1978 | 1979 | 1980 | 1981 | 1982 | 1983 | 1984 | 1985 | 1986 | 1987 | 1988 | 1989 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
patch size|bird community | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
forest characteristics|vegetation structure | NA | 1 | 1 | 1 | 3 | 3 | 4 | 4 | 4 | 4 | 4 | 4 | 5 | 7 | 8 |
vegetation structure|vegetation structure | NA | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
patch size|bird abundance | NA | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
patch size|forest characteristics | NA | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
patch size|vegetation structure | NA | 1 | 1 | 1 | 3 | 3 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 3 | 3 |