With the development of high-throughput, low-cost sequencing technology, a large number of marine microbial sequences have been generated. The association patterns between marine microbial species and environmental factors are hidden in these large volumes of sequence data. Mining these association patterns helps in exploiting marine resources. However, very few marine microbial association patterns have been well investigated. The present study reports the development of a novel method, HC-sNMF, for detecting marine microbial association patterns. The results show that the four seasonal marine microbial association networks have the characteristics of complex networks, that the same environmental factor influences different species in the four seasons, and that, in the communities detected in all four seasons, correlations among OTUs (taxa) are stronger than correlations between OTUs and environmental factors.

The oceans cover approximately 139 million square miles, roughly 71% of the earth's surface. Marine microbes are an important component of the marine ecosystem. They provide the basis for the ocean's food webs and facilitate the flow of nitrogen, carbon, and energy in the ocean. Yet the specific ecological relationships among these taxa and environmental factors are largely unknown. This is partly due to the dilute, microscopic nature of the planktonic microbial community, which prevents direct observation of their interactions [

With the development of high-throughput DNA sequencing technologies that yield large numbers of rRNA (16S rRNA/18S rRNA) and DNA reads, we can describe the composition of microbial communities, their diversity, and how communities change across space, time, or experimental treatments based on these sequence data [

In this paper, we propose a novel method called HC-sNMF to detect association community patterns and structures in the four seasonal marine networks. HC-sNMF provides new insights into the natural history of microbes by identifying relationships among microbes and environmental factors. It also aims to determine how microbial association patterns differ among seasons and which environmental factors might have the greatest influence on the varying diversity.

The 16S rRNA sequence dataset used in this paper was downloaded from

The HC-sNMF workflow consists of three parts: (i) OTU generation with the NbHClust algorithm, (ii) network construction with the mutual information algorithm, and (iii) community pattern detection with the symmetric nonnegative matrix factorization method. Figure

The flowchart showing the work process of HC-sNMF.

To address OTU inflation caused by 454 sequencing errors, we propose a heuristic clustering method based on neighbor seeds, namely, NbHClust. Based on the distribution of homopolymers, the idea of a neighbor sequence is introduced to generate neighbor seeds. Then, a heuristic clustering strategy is used to cluster the sequences based on neighbor seeds instead of a single seed. Finally, a constraint parameter based on cluster size is used to refine the clusters. The pseudocode of NbHClust is shown in Pseudocode

Input: Sequence Set

Parameter

{

Seed =

For

If

// then Sequence

Else

// with Neighbor Sequence Expanding Method, yield to

End If

END For // Travel all of Clustering Units, Subtract the Clustering Results the Parameter

MinClusterSize

Reassigned (

Neighbor Clustering Algorithm, Assigned to the nearest cluster units

End If

}

Notes:

Neighbor Sequence (
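The overall shape of this clustering step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the pseudocode above is incomplete, so the function names (`neighbor_seed_cluster`, `collapse_homopolymers`), the edit-distance identity measure, and the rule that a "neighbor sequence" is one whose homopolymer-collapsed form equals that of the cluster's first seed are all assumptions made for illustration.

```python
from itertools import groupby


def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def identity(a, b):
    """Assumed identity measure: 1 - edit distance / length of the longer sequence."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def collapse_homopolymers(seq):
    """Collapse every homopolymer run to a single base, e.g. 'AAACCG' -> 'ACG'."""
    return "".join(base for base, _run in groupby(seq))


def neighbor_seed_cluster(seqs, threshold=0.97, min_cluster_size=2):
    """Greedy clustering in which each cluster may hold several seeds."""
    clusters = []
    for seq in seqs:
        for cluster in clusters:
            if any(identity(seq, seed) >= threshold for seed in cluster["seeds"]):
                cluster["members"].append(seq)
                # Assumption: a member that differs from the first seed only in
                # homopolymer lengths is promoted to a neighbor seed.
                is_neighbor = collapse_homopolymers(seq) == collapse_homopolymers(
                    cluster["seeds"][0]
                )
                if is_neighbor and seq not in cluster["seeds"]:
                    cluster["seeds"].append(seq)
                break
        else:
            clusters.append({"seeds": [seq], "members": [seq]})

    # Refinement: reassign members of undersized clusters to the nearest large cluster.
    large = [c for c in clusters if len(c["members"]) >= min_cluster_size]
    small = [c for c in clusters if len(c["members"]) < min_cluster_size]
    if not large:
        return [c["members"] for c in clusters]
    for c in small:
        for seq in c["members"]:
            nearest = max(large, key=lambda big: max(identity(seq, s) for s in big["seeds"]))
            nearest["members"].append(seq)
    return [c["members"] for c in large]
```

In this sketch, a read carrying two homopolymer insertions can still join a cluster through a neighbor seed that carries one of them, even when its identity to the original seed falls below the threshold. That is the kind of OTU inflation the neighbor-seed idea is meant to limit.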

In order to research the association among different microbial species and environmental factors, we use vectors _{2} + NO_{3} (E12), salinity (E13), silicate (E14), SRP (E15), temperature (E16), total organic carbon (E17), and total organic nitrogen (E18) [

Beyond Pearson correlation, mutual information (MI) can capture nonlinear dependencies and topology sparseness between variables. Here, we used MI [

Suppose that

The probability of

The entropy and joint entropy of

So, we can calculate the mutual information between two variables
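The equations for this step are not preserved in the text above. For reference, the standard textbook definitions of entropy, joint entropy, and mutual information for two discrete variables are given below. The symbols $X$, $Y$, $p(x)$, $p(y)$, and $p(x, y)$ are generic notation chosen here and are not necessarily the authors' own.

```latex
\begin{align}
H(X)   &= -\sum_{x} p(x)\,\log p(x), \\
H(X,Y) &= -\sum_{x}\sum_{y} p(x,y)\,\log p(x,y), \\
I(X;Y) &= H(X) + H(Y) - H(X,Y)
        = \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}.
\end{align}
```

$I(X;Y)$ is zero when $X$ and $Y$ are independent and equals $H(X)$ when $Y$ fully determines $X$.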

A permutation test was used to assess statistical significance. We considered an OTU-OTU or OTU-environmental factor association robust if
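As a rough illustration of this step, the following Python sketch estimates mutual information from discretized abundance vectors and computes a permutation p-value. It is a sketch under stated assumptions: the equal-width binning, the number of permutations, and all function names are hypothetical, and the significance thresholds used in the paper are not reproduced here.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two equally long label sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def discretize(values, bins=3):
    """Assumed preprocessing: equal-width binning of a continuous vector."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def permutation_p_value(x, y, n_perm=1000, seed=0):
    """Fraction of shuffles of y whose MI with x is at least the observed MI."""
    rng = random.Random(seed)
    observed = mutual_information(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if mutual_information(x, y_perm) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

An edge would then be kept in the network only when both the MI value and the p-value pass whatever cutoffs are chosen. Shuffling one vector breaks any real dependence while preserving each variable's marginal distribution, which is why it serves as the null model.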

For a weighted and undirected graph

Suppose that

Supposing that

Obviously, the optimal solution of

By normalizing the column of

In order to determine the optimal number of community
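The factorization idea can be sketched in Python as follows. The paper's own objective and update rule are not preserved above, so this sketch assumes the commonly used formulation: minimize $\|A - HH^{T}\|_F^2$ subject to $H \ge 0$, with a damped multiplicative update, and assign each node to the column of $H$ with the largest entry. The function names, the damping factor of 0.5, and the toy graph are illustrative assumptions only.

```python
import random


def matmul(X, Y):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def symmetric_nmf(A, k, n_iter=500, seed=0, eps=1e-9):
    """Approximate a symmetric nonnegative matrix A (n x n) by H H^T with H >= 0 (n x k)."""
    rng = random.Random(seed)
    n = len(A)
    H = [[rng.random() for _ in range(k)] for _ in range(n)]
    for _ in range(n_iter):
        AH = matmul(A, H)                      # n x k
        HHtH = matmul(H, matmul(transpose(H), H))  # n x k
        # Damped multiplicative update; entries stay nonnegative by construction.
        H = [
            [H[i][j] * (0.5 + 0.5 * AH[i][j] / (HHtH[i][j] + eps)) for j in range(k)]
            for i in range(n)
        ]
    return H


def assign_communities(H):
    """Label each node with the index of its largest membership weight."""
    return [max(range(len(row)), key=row.__getitem__) for row in H]


# Toy weighted graph: two triangles joined by one weak edge (weight 0.1).
A = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0.1, 0, 0],
    [0, 0, 0.1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
labels = assign_communities(symmetric_nmf(A, k=2))
print(labels)
```

On this toy graph the two triangles are expected to end up in different communities. In practice the number of communities `k` has to be chosen separately, for example by comparing a quality score such as modularity across several values of `k`.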

To evaluate the performance of NbHClust, we compared it with the commonly used heuristic clustering methods CDHIT [

Results of four methods with Clone43 dataset.

The number of seasonal microbial OTUs generated with NbHClust at 97% sequence identity is displayed in Figure

The distribution of seasonal microbial OTUs generated with NbHClust.

Marine microbial correlation networks in spring, summer, fall, and winter seasons (○-OTU, △-environmental factor).

Panels: Spring, Summer, Fall, Winter.

To analyze the microbial diversity and the relationships among OTUs and environmental factors in spring, summer, fall, and winter, we constructed four seasonal marine microbial association networks. In general, mutual information (MI) provides a natural generalization of correlation, since it measures nonlinear dependency (which is common in biology) and can handle thousands of variables (nodes). Although conditional mutual information (CMI) can detect the joint dependence of a variable of interest (e.g., an OTU) on two or more other variables, as well as other nonlinear interactions between two variables, its computational cost is higher than that of MI for large-scale networks. Considering the number of OTUs and the computation time, we selected MI to construct the four seasonal marine microbial networks. The four seasonal marine microbial association networks constructed with the MI algorithm are shown in Figure

Topological parameters of four seasonal marine microbial correlational networks and the corresponding random networks.

| Parameter | Seasonal: Spring | Seasonal: Summer | Seasonal: Fall | Seasonal: Winter | Random: 1 | Random: 2 | Random: 3 | Random: 4 |
|---|---|---|---|---|---|---|---|---|
| Node number | 280 | 254 | 313 | 365 | 280 | 254 | 313 | 365 |
| Edge number | 793 | 855 | 845 | 2970 | 793 | 855 | 845 | 2970 |
| Avg. degree | 5.664 | 6.732 | 5.399 | 16.274 | 5.664 | 6.732 | 5.399 | 16.274 |
| Avg. clustering coefficient | 0.235 | 0.282 | 0.237 | 0.389 | 0.010 | 0.026 | 0.022 | 0.046 |
| Avg. power law degree | 1.237 | 1.287 | 1.467 | 0.968 | 0.666 | 0.442 | 0.659 | 0.013 |
| Modularity | 0.579 | 0.567 | 0.561 | 0.365 | 0.39 | 0.34 | 0.404 | 0.217 |
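Two of the tabulated parameters are simple to recompute. The Python sketch below derives the average degree of an undirected network as $2E/N$ from the node and edge counts in the table, and shows one common definition of the average clustering coefficient on an adjacency dictionary. The tool and exact definitions the authors used are not stated, so the function names and the convention that nodes with fewer than two neighbors contribute zero are assumptions.

```python
def average_degree(n_nodes, n_edges):
    """Average degree of an undirected graph: every edge contributes to two nodes."""
    return 2 * n_edges / n_nodes


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with fewer than 2 neighbors count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges that exist among the neighbors of this node.
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


# (node number, edge number) taken from the table.
networks = {
    "Spring": (280, 793),
    "Summer": (254, 855),
    "Fall": (313, 845),
    "Winter": (365, 2970),
}
for season, (n, m) in networks.items():
    print(season, round(average_degree(n, m), 3))
# Spring 5.664
# Summer 6.732
# Fall 5.399
# Winter 16.274
```

The recomputed average degrees agree with the tabulated values. The random networks share the same node and edge counts, so their average degrees are identical by construction and the comparison rests on the remaining rows.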

From Table

The four seasonal marine microbial association communities detected by_{2} + NO_{3}) is correlative with OTU 206 (_{2} + NO_{3}) is correlative with OTU 7 (_{2} + NO_{3}) is correlative with OTU 14 (_{2} + NO_{3}) is correlative with OTU 494 (Cryomorphaceae) and OTU 443 (Chloroplast); and E7 (

The structure of microbial interaction pattern detected by

Panels: Spring, Summer, Fall, Winter.

According to the annotation information of OTUs at taxonomic level by using a number of different annotation strategies (e.g., GAST [

The M1 community in the spring microbial network is composed of 7 environmental factors (E1, E2, E4, E5, E6, E12, and E14) and 38 OTUs, of which 26 come from

The M1 community in the summer microbial network is composed of 13 environmental factors (E1, E2, E3, E4, E5, E8, E9, E10, E11, E12, E14, E17, and E18) and 87 OTUs, of which 85 come from

The M1 community in the fall microbial network is composed of 10 environmental factors (E1, E2, E3, E4, E6, E12, E14, E15, E16, and E18) and 65 OTUs, of which 59 come from

The M1 community in the winter microbial network is composed of 2 environmental factors (E4 and E16) and 158 OTUs, of which 144 come from

The M4 community in the winter microbial network is composed of 3 environmental factors (E7, E11, and E12) and 11 OTUs, of which 3 come from

The community structure analysis of the four seasonal microbial networks shows that a large fraction of class-level microbial associations occur among

Mining marine microbial association patterns and diversity is key to exploiting marine resources. Considering that marine microbes engage in symbiosis or competition and exhibit numerous significant intra- and interlineage associations, we used the NbHClust and

The authors declare that there is no conflict of interests regarding the publication of this paper.

This paper was supported by the National Natural Science Foundation of China (nos. 61170134 and 60775012).
