Dissecting a Geographical Colourful Tapestry: Phylogeography of the Colour Polymorphic Spider Gasteracantha cancriformis

Species with large distributions provide unique opportunities to test how geography has in ﬂ uenced biotic diversi ﬁ cation. In this work, we aimed to explore the e ﬀ ect of geographic barriers on the distribution of the phenotypic and genetic variation of a spider species that is widespread in continental and insular America. We obtained an alignment of the mitochondrial locus Cytochrome Oxidase I (COI) for 408 individuals across the geographic range of Gasteracantha cancriformis . We used phylogenetics, population genetics, and morphology to explore the genetic and phenotypic variation of this species. We found ﬁ ve genetically di ﬀ erentiated and geographically structured populations. Three of them are distributed in continental America, separated by the Andes mountains, and two are in the Caribbean and Galapagos Islands. Some of these geographic clades shared haplotypes between them, which may be a consequence of dispersal. We detected at least 20 phenotypes of G. cancriformis , some of which were exclusive to a geographic region, while others occurred in multiple regions. We did not observe well-de ﬁ ned morphological di ﬀ erences across male genitalia. This evidence suggests that G. cancriformis is a widespread species with high phenotypic variation that should be explored in more depth.


Introduction
The great American tropical biodiversity has been suggested to be the result of a series of geological and climatic changes over time that hypothetically promoted divergence, and later allopatric speciation, by fragmenting the distribution of species that were formerly continuous [1]. In such scenario, different species would show similar patterns of diversification regardless of their dispersal capacity or ecological characteristics, and the divergence time between isolated lineages would match the origin of the landscape reconfiguration. However, recent studies have revealed a discrepant pattern has been reported in many lineages and is especially associated with speciation due to vicariance [5,6]. Nevertheless, divergence times younger than the Andes uplift and signals of gene flow between populations at both sides of the Andes indicate that some lineages have successfully dispersed across the Andes [4]. Crossing the Andes is a plausible hypothesis because its elevation varies throughout its range. In fact, there is palaeontologic [7] and genetic evidence [3] that described five altitudinal depressions [8], which likely facilitated dispersal of taxa.
Such dispersal model also explains community composition in islands, where species distributed in these regions may have arrived in one or several colonization events after island formation. For example, overwater dispersal is one of the hypotheses that seeks to explain the establishment of fauna in the Caribbean islands, which could have occurred via dispersal by vegetation draft, flight, or ballooning [9]. The latter mechanism is frequent in spiders, where individuals release threads of silk which are carried by the wind and are capable of moving long distances [10]. Similarly, ten biogeographic patterns of overwater dispersal have been proposed to explain the origin of species in the Galapagos Islands [11]. Interestingly, multiple morphological and genetic studies revealed that the biota of these islands varies in their geologic origins and divergence times from their sister lineages [11].
To date, we still know little on the demographic history of lineages that are common and abundant in continental and insular areas in America. For example, the 159 arachnid species distributed in the Galapagos Islands have yet to be considered in a geographic context, and their genetic composition and relationship with continental species have not been explored [12]. Species with wide distribution provide a great opportunity to test the role of geographic barriers in biotic diversification, since they can be used as a model to test different phylogeographic hypothesis at once [13].
Gasteracantha cancriformis (Linnaeus, 1758) is a colour polymorphic orb-web spider widely distributed in the Americas and found from southern USA to northern Argentina. This species displays a remarkable variation in the number of spines and dorsal colour of the abdomen, which has led to numerous descriptions of synonyms [14]. In the most recent revision of the genus in America, Levi [14] argued that cline variation in abdominal traits and little variation in genitalia precluded the discrimination of morphological subspecies. Also, two phylogeographic studies on G. cancriformis found genetically structured groups in the Caribbean and northern South America and supported a role of gene flow in its diversification [15,16]. However, a complete understanding of the evolutionary history of G. cancriformis is hampered by sampling gaps (for example, populations in the Galapagos Islands and Pacific coast of South America) and by the lack of information about morphological variation across the distribution of this species (e.g., male genitalia and abdominal colouration).
We used phylogenetic analysis, population genetic methods, and morphological data to study the genetic and phenotypic variation of G. cancriformis across its geographic distribution. We hypothesized that geographic barriers, such as the Andes and the oceans, limit the dispersal of individuals. Under this scenario, we expected to find genetically structured populations separated by these geographic barriers. We also predicted that genetic differentiation is coupled with phenotypic differences. Finally, given previous evidence supporting gene flow among divergent groups of G. cancriformis, we expected to find shared haplotypes among geographic clades.

Sample Collection and Colour
Coding. We collected 106 individuals of G. cancriformis from 19 locations distributed in Colombia, Ecuador, and Perú ( Figure 1 and Table S1). Specimens were colour coded based on spine and dorsal abdominal colouration, following Gawryszewski [17], preserved in a 20% dimethylsulfoxide (DMSO) solution saturated with NaCl, and stored at -80°C. Samples are deposited in the arthropod collection of the Universidad del Rosario, Colombia (CAUR#229), and at the Museo de Zoología of Universidad San Francisco de Quito, Ecuador (ZSFQ). When available in the literature, we included information on the occurrence of colour morphs in localities that were not sampled by us [17][18][19].
2.2. DNA Extraction, Amplification, and Sequencing. Genomic DNA was extracted from legs using Qiagen DNeasy blood and tissue kit (Qiagen, Valencia, CA, USA), following the manufacture's protocol. We amplified a fragment of the mitochondrial gene Cytochrome Oxidase I (COI) (500 bp; [20] by polymerase chain reaction (PCR) using the conditions used in the previous studies [15]. Fragments were cleaned by ExoSAP-IT (USA Corp., Cleveland, OH) and sequenced at MACROGEN Inc. laboratories (Seoul, Korea). Finally, we downloaded all COI sequences of G. cancriformis available in GenBank and Bold Systems, all of them obtained from regions different from those sampled by us (Table S1).
Gene sequences were read, aligned, and assembled in CLC MAIN WORKBENCH to obtain a consensus sequence per individual. We used the MUSCLE algorithm in MEGA X [21] to create an alignment that was visually inspected and corrected. This alignment was translated to protein to check for stop codons in Mesquite v.3.04 [22]. For the phylogenetic analysis, we discarded redundant sequences from the alignment, using one individual per haplotype to avoid zerolength branches.

Molecular Phylogenetics and Divergence
Times. Phylogenetic analysis was performed with maximum likelihood (ML) in IQ-TREE [23], letting the software select the best substitution model for the dataset. Node support was assessed with 10,000 UltraFast bootstraps. We used Gasteracantha geminata, Gasteracantha kuhli, Gasteracantha diardi, Gasteracantha fornicata, Gasteracantha sacerdotalis, and Macracantha hasselti as outgroups (Table S1). We also generated a Maximum Clade Credibility tree and estimated divergence times by Bayesian inference (BI) in BEAST v1.8 [24], pruning redundant sequences and using the sameg substitution models as in the ML analysis. We used the 2 Journal of Zoological Systematics and Evolutionary Research coalescence tree prior and applied a lognormal relaxed clock to estimate divergence times using a substitution rate of 0.0112 (SD = 0:001) substitutions/site/million years previously used for node dating and calibration in spiders [25]. We ran 100,000,000 generations sampling every 1,000 generations. We ran TRACER v1.7 to confirm that the effective sample sizes (ESS) of the parameters were >200 and to confirm the convergence of the chains to a stationary distribution. Ten percent of the trees were discarded as burn-in in TreeAnnotator, and the maximum credibility tree was selected best representing posterior distribution. Additionally, clade credibility in the ML topology was assessed contrasting six alternative topologies ( Figure S1) in IQ-TREE [23] with the following tests: likelihood-based, approximately unbiased (au), Kishino-Hasegawa (kh), Shimodaira-Hasegawa (sh), weighted Kishino-Hasegawa (wkh), and weighted Shimodaira-Hasegawa (wsh).

Characterization of Genetic Variation.
To characterize the genetic diversity of G. cancriformis, we calculated segregating sites (SS), nucleotide diversity (π), molecular genetic variation-Watterson theta (Ɵ), haplotype diversity (Hd), Tajima's D, Fu and Li's D, and Ramos-Onsins and Rozas R2 for each location and geographic group using DNAsp v5.0 [26]. Population structure was estimated among localities and geographic groups using relative (F ST ) and absolute (D xy and D a ) measures. We used an analysis of molecular variation (AMOVA) with 10,000 permutations in ARLE-QUIN v3.5 [27] to determine the hierarchy of genetic varia-tion using geographical regions as the higher-level grouping. Additionally, we tested if isolation by distance (IBD) explains the geographical distribution of the genetic diversity using a Mantel test and with a linear regression of the genetic distance as a function of the logarithm of the geographical distance. We also constructed an integer neighbour-joining haplotype network with alpha = 0:5 in POPART [28]. Lastly, to detect spatial genetic boundaries associated with geographic barriers, we applied Monmonier's algorithm using a Delaunay triangulation [29] in the R package "adegenet" [30].

Species Delimitation.
To test for the existence of multiple species in our sampling rather than it being a single widespread species, we implemented two species delimitation strategies: multirate Poisson tree processes (mPTP) [31] and a Bayesian implementation of the general mixed Yulecoalescent model bGMYC [32]. For mPTP, we first calculated the minimum branch length and used this value and the ML tree as input to run 10 Markov Chain Monte Carlo (MCMC) chains of 100,000,000 generations, sampling every 1000, and with a burning of 10% of the total chain's length. For bGMYC, we randomly sampled 100 of the 100,000 trees obtained in BEAST to run a MCMC chain of 100,000 steps with 10% steps as burn-in and thinning intercept of 100 steps. Presumptive species were defined based on a threshold of 0.95 on the conspecific probability. We calculated the intra-and interspecific genetic distance for the phylogenetic groups of G. cancriformis because the difference between these distances (i.e., barcoding gap) is   Table S4 and Figure S10-S13). Colours represent geographical regions, green: East Andes (EA), red: West Andes (WE), blue: Dry Pacific coast of Perú and Ecuador (DP), yellow: Caribbean islands (C), and purple: Galapagos Islands (G). Colour frames show the phenotypic diversity for each geographical region. The dots above morphs that symbolize that they are also found in other geographical regions, and the colours of the dots are coded as above. Dots within the map correspond to locations where we collected genetic and phenotypic information. Stars are the sites with published phenotypic information for the species [17][18][19].
3 Journal of Zoological Systematics and Evolutionary Research considered to be essential for accurate species delimitation [33] and has been successfully used to delimit arachnid species in families such as Tetragnathidae, Lycosidae, and Araneidae [34]. These measurements were computed using the Kimura 2-parameter (K2P) [35] in MEGA X with 10,000 bootstrap replicates and default parameters. We also generated morphological descriptions for the genitalia of 11 males from San Andrés, Quito, La Pedrera, and Lima, following Levi [36], and compared them with previously available descriptions.
We observed haplotype sharing between subclades ( Figure 3 and S2-S5), with all individuals from San Andres Island (in the Caribbean) grouping with the WA clade, and some from Florida and Texas (WA) falling within the C clade ( Figure S2). Also, few individuals from Lima and Piura (DP) clustered with EA ( Figure S3), and similarly, some samples from Jaen (EA) grouped with DP ( Figure S4). Additionally, some specimens from Villavicencio (EA) and the north of DP fall within WA, while some samples from Quito (WA) grouped with DP ( Figure S5).

Characterization of Genetic Variation.
The WA clade showed the highest nucleotide diversity (π) and molecular genetic variation (θ), while DP showed the lowest values for these estimates (Table S2). All neutrality tests support a significant signal of population expansion in WA (Table S2), which is consistent with the haplotype network analysis where haplotypes from WA fit a star-like pattern with a single highly frequent haplotype connected to numerous rare haplotypes ( Figure 3). Although this expansion pattern is observed in other geographic groups, it was less supported (Table S2).
Overall, absolute and relative measures show that geographic groups of G. cancriformis are genetically structured ( Figure 4) and these genetic differences are not explained by IBD (Mantel r = 0:08, p -value = 0.02; Figure S6). Thus, geographic barriers may be responsible for such divergence. Consistently, Monmonier's algorithm revealed a geologic break that coincides with the Andes, the Pacific Ocean, and the Caribbean Sea ( Figure S7). Also, the AMOVA indicated that genetic variation in our dataset is explained by genetic differences among geographical groups (61%) rather than by differences among or within subpopulations (Table S3).

Species Delimitation.
The coalescent approaches to delimit species gave consistent results. For instance, bGMYC and mPTP delimited the five geographic clades observed in the phylogenetic analysis as separate entities, although mPTP detected an extra nongeographic lineage composed by individuals from WA and EA (Figure 2). The barcoding gap revealed that the intra-and interspecific genetic distance values of the phylogenetic groups did not overlap (Table 1). In terms of genitalia morphology, the male palpus showed slight differences in the apophyses prolateral tibial (pTA), median (M) and paramedian (PM), and the embolus (E) ( Figure S8). For example, males from San Andrés and Quito have a thick M with a marked and sharp distal upper and lower knob, while males from Lima only have the distal upper knob, and males from Puerto Rico lack of distal knobs at all ( Figure S8). Also, males from San Andrés and Quito have a more ellipsoidal PM compared to Lima and Puerto Rican males. Finally, Puerto Rican males show the shortest E and their pTA has a blunt end ( Figure S8).

Phenotypic Geographical Variation.
We found 20 morphotypes of G. cancriformis across its distribution, which vary in abdominal phenotype: either in their dorsal colouration and/or the length and/or number of spines (Figure 1 and S9-S13). Most of these morphotypes were exclusive to one of the geographical clades described above, while others occurred in more than one clade (Table S4). For example, the black morph is present in all five geographical groups, while the orange morph with a black longitudinal band is restricted to the Amazon (Table S4, Figure 1, and S11).

Discussion
This study provides the most complete molecular phylogenetic analysis of G. cancriformis. All analyses support the hypothesis that strong geographical barriers to dispersion contributed to structure the genetic variation of G. cancriformis. We found five genetically differentiated groups that are separated by the Andes, the Pacific Ocean, and the Caribbean Sea. However, these groups share haplotypes among them which may be a consequence of gene flow, meaning that these geographic barriers are permeable. Contrary to our expectations, we did not find considerable morphological differences in the genitalia of males from different geographic clades. Nevertheless, we did find that some colour phenotypes are unique to some localities, although this pattern was not reflected in the mitochondrial phylogeny.
The geographic groups of G. cancriformis diverged during the late Miocene (5.07 Mya) and early Pleistocene (2.64 Mya), a time when the current geographical landscape of America was already formed or nearly completed [1,37]. This suggests that dispersal across geographic barriers, rather than vicariance, most likely shaped the 4 Journal of Zoological Systematics and Evolutionary Research genetic diversity of G. cancriformis. For example, the two major clades that we found are separated by the northern Andes, specifically by the Eastern Cordillera of the Colombian Andes, which already underwent significant elevations by the mid Miocene (15-5 Mya; [37]. This time precedes the divergence estimation (mean of 5.07 Mya), which is older than that of previous works that focused on specific geographic regions and thus had a limited sampling [15,16]. Unexpectedly, we found that Dry Pacific (Western Perú and Ecuador; DP) was more closely related to the Eastern side of the Andes (EA) rather than the Western side of the Andes (WA; excluding the Dry Pacific populations), which is counterintuitive since dispersal along the Pacific coast would presumably be easier than crossing the Andes. However, this pattern has also been documented in birds distributed in dry tropical forests at both sides of the Central Andes [3,38,39]. Interestingly, there is a humidity transition right on the Equator that is leading to a biome shift [43], which may be contributing to maintaining the DP and WA lineages apart. Even so, we observed some shared haplotypes between these lineages, suggesting that the barrier that separates them, whatever it is, is permeable.
Given that DP and EA diverged 3.23 Mya (95% HPD = 1:84, 4.79 Mya) when the Central Andes was already formed [37], and the presence of shared haplotypes at both regions, we hypothesize that multiple cross Andean dispersal events took place at the level of the Porculla pass in northern Peru. Dispersal across this pass is not without precedent as

Journal of Zoological Systematics and Evolutionary Research
in birds there is evidence of cross-Andean dispersal and gene flow [3]. Furthermore, genetic interchange across Andean passes has been reported for G. cancriformis between WA and EA at the level of the Andalucía pass and the Suaza-Pescado valleys in the Eastern cordillera of the Colombian Andes [16].
The inclusion of samples from Texas in this study confirms the presence of Caribbean haplotypes in North American populations [15], meaning that the Caribbean sea, at some point, promoted genetic differentiation but subsequent migration occurred. A similar scenario was reported in other arachnids with good dispersal capacity [41]. San Andrés Island is a particular case within the Caribbean plate, where we did not find any Caribbean haplotypes in our sampling. Instead, all the genetic variation found in this island corresponds to the WA clade. Because San Andrés Island originated in the early Miocene [42], but its population of G. cancriformis did not have genetic similarity with the Antillean island populations, we hypothesized that G. cancriformis from San Andrés originated by dispersion from the mainland of South America or Central America. This phylogeographic pattern was observed in butterflies [43] and lizards [44] from San Andrés.
Another notable finding was that G. cancriformis individuals from Caribbean and Galapagos Islands were more closely related to each other than to continental populations. This geographic pattern has been recorded in more than 31 cases of terrestrial lineages [11], including one case in the jumping spider genus Cerionesta (Salticidae), which is composed by two species, one distributed in the Galapagos Islands and the other one in the Caribbean Island St. Vincent [45]. However, this case is documented with morphological data; therefore, our work is the first one to support this biogeographical pattern in arachnids using genetics. Moreover, the divergence time between the Galapagos and the Caribbean populations of G. cancriformis (2.64 Mya; 95% HPD = 1:63, 3.75 Mya) coincides with the most recent estimation for the origin of the Darwin finches (2.6 Mya; [46]). Because this date is more recent than the Galapagos geological origin (minimum age 3.3 Mya; [47], we hypothesize that a plausible evolutionary scenario involves dispersion, either by island hopping or long-distance migration.
Both methods of species delimitation suggested that the geographic groups of G. cancriformis may correspond to different species. In agreement with this, we obtained  Journal of Zoological Systematics and Evolutionary Research K2P interspecific values that are considerably higher (minimum K2P = 4:2% and maximum K2P = 8:6%) than the threshold established to delimit species in Araneidae (K2P = 4:4%; [34]). However, we consider that the evidence supporting G. cancriformis as a single species with high phenotypic and genetic diversity, as first stated by Levi [14], is stronger for two reasons. First, we did not observe differences between the male genitalia of the geographical clades (which is an important character to delimit species in Araneidae), thus suggesting absence of mechanical isolation. Second, there are multiple shared haplotypes between the divergent lineages, meaning that gene flow may still occurs. Furthermore, species delimitation methods can be biased by population structure, overestimating the number of species [48]. In consequence, the hypothesis of multiple Gastercantha species in America still needs to be properly addressed in studies that investigate various reproductive isolation barriers and the incidence of gene flow across the genome. We advise that future studies in arachnids should use an integrative taxonomy approach, where concordance across multiple sources of evidence is needed to validate species [49].
Here, we observed an intriguing geographic pattern of colour polymorphism: some morphs are exclusive to some geographical groups, but other morphs occur in multiple geographic groups. The latter means that mtDNA variation does not explain colour variation. This discrepancy between colouration and genetic structure may occur because colouration is a consequence of natural selection instead of geographical isolation and genetic drift, as proposed for the wood tiger moth [50]. We hypothesized that G. cancriformis colourful tapestry could be the result of a combination of local adaptation and phenotypic convergence mediated by gene flow or ancestral polymorphism. Alternatively, the presence of similar colour morphs in divergent geographical clusters could also be the result of different genetic mechanism as observed in the happyface spider in the Hawaiian archipelago [51]. Determining the evolutionary origin of this spectacular colour variation requires further research involving field experiments testing the effect of natural selection and the genomics of colouration [52].

Data Availability
The new DNA sequences generated in this study were deposited in GenBank under accession numbers OM638445-OM638558. Raw data are available from GitHub repository https://github.com/fcsalgado/Gasteracantha_COI.

Conflicts of Interest
The authors declare that they have no conflicts of interest.  Figure S1: clade credibility analyses. Colours and abbreviations are coded as in Figure 1. Below each phylogenetic tree are the results of the following topology tests: likelihoodbased (logL and deltaL), approximately unbiased (au), Kishino-Hasegawa (kh), Shimodaira-Hasegawa (sh), weighted Kishino-Hasegawa (wkh), and weighted Shimodaira-Hasegawa (wsh). Figure S2: geographical representation of haplotype shared between the Caribbean islands and other geographical regions. Dots represent geographical origin of the samples and lines indicate their place in the phylogeny. Colours are coded as in Figure 1. Figure S3: geographical representation of haplotype sharing between the  Figure 1. Figure S4: geographical representation of haplotype sharing between the Eastern of the Andes and the other geographical regions. Dots represent geographical origin of the samples and lines indicate their place in the phylogeny. Colours are coded as in Figure 1. Figure S5: geographical representation of haplotype sharing between the Western of the Andes and the other geographical regions. Dots represent geographical origin of the samples and lines indicate their place in the phylogeny. Colours are coded as in Figure 1. Figure S6: isolation by distance plot. Red line is the trending line for the linear regression. Figure S7: geographical barrier test (Monmonier's algorithm). Solid line represents the main geographic barrier, and dotted lines show the Delaunay triangulation and Voronoi tessellation. Dots are sampled sites and their colours are coded as in Figure 1. Figure S8: male palpus illustrations from four different locations. Quito and La Pedrera are localities from Western and Eastern of the Andes, respectively. Lima is in the Dry Pacific and Puerto Rico is a Caribbean Island. This last drawing was obtained from Levi (1969). Apophyses prolateral tibial (pTA), median (M) and paramedian (PM), and the embolus (E). Figure S9: detailed morph distribution in the Caribbean and Galapagos Islands. The dots are the locations where we collected genetic and phenotypic information and the yellow polygon encompass the lesser Antilles. Stars are the sites with published phenotypic information for the species. Details about the morphs are available in Table S4. Figure S10: detailed morphs distribution in the Dry Pacific localities. The dots are the locations where we collected genetic and phenotypic information. Details about the morphs are available in Table  S4. Figure S11: detailed morph distribution in the Eastern Andes. The dots are the locations where we collected genetic and phenotypic information. Stars are the sites with published phenotypic information for the species [17]; [18]. Details about the morphs are available in Table S4. Figure  S12: detailed morph distribution in the Western Andes. The dots are the locations where we collected genetic and phenotypic information. Dot without connecting line did not have phenotypic information available. Details about the morphs are available in Table S4. Figure S13: photographs illustrating the dorsal abdomen colour variation of Gasteracantha cancriformis. Under each image is the description of the abdomen colour pattern (upper part) and the spines characteristics as described in Table S4. Table  S1: samples collecting data information and haplotype groups.