Extrapolative Estimation of Benthic Diatoms (Bacillariophyta) Species Diversity in Different Marine Habitats of the Crimea (Black Sea)

Benthic diatom species richness was analyzed based on 93 samples collected at 8 areas of the Crimea (Black Sea) on sandy/muddy bottoms within the depth range 6–48 m. In total, 433 species were found. Expected species richness Sexp was estimated with the Jack-knife-1 and -2, Chao-2, and Karakassis-S∞ estimators. The Sexp values obtained with S∞ were the closest to the observed species number (Sobs): overestimation of Sobs (10–13%) occurred for small numbers of samples (<12), and slight underestimation (3–5%) occurred when sample numbers exceeded 40–43. The other estimators considerably overestimated Sobs (Chao by 21–70%, Jack-knife by 23–58%). The relationship between the number of samples (X) and the number of observed species (Y) was calculated from all 93 samples: Y = 79.01 ln(X) + 34.95. Accordingly, not less than 10 samples are required to reveal about 50% of the total species richness (433); to detect 80% (347 species), not less than 46 samples should be considered. Different configurations of the S∞ method were applied to optimize its performance. The most precise results are achieved when Sexp is calculated from sequences of randomized samples with sampling lags of 10 to 15.


Introduction
Species richness is an essential attribute of a biological community and a widely used surrogate for the more complex concept of biological diversity. Quantitative change in species richness is an important characteristic underlying many biotic indices and integral assessment of community structure and condition in relation to habitat [1][2][3][4].
In the ecological study of benthic diatoms (Bacillariophyta), effective comparative assessment of the species structure of the taxocene in various habitats, including protected marine areas, is a key problem that is also important for establishing conservation priorities. Therefore, assessing the reliability and deviation of species richness measurements is one of the essential methodological tasks of diatomology, and it has been insufficiently studied so far. Diatom species richness may differ considerably even between adjacent sea bottom areas because of diverse environmental conditions and the spatial microdistribution pattern of microalgae. In prognostic assessment of species richness and diversity of a taxocene in heterogeneous biotopes, it is methodologically important to determine the relationship between sampling effort and the number of species found in the samples. Hypothetically, the larger the number of samples, the larger the number of detected species. In practice, however, only a reasonable minimum of samples, usually chosen at the researcher's discretion, is collected from a sampling site because of constraints on sampling effort and subsequent laboratory processing. The total number of benthic diatom samples taken from a study area is usually limited to 15-20; however, the species structure and diversity of a taxocene are often assessed from only 3-5 samples [5][6][7]. Certainly, the species composition of such microobjects as diatoms can rarely be completely determined for a sampling site, even given a sufficient number of samples and their exhaustive taxonomic examination. In this case, prognostic algorithms (estimators) can provide a tool for estimating the expected species richness in different taxonomic groups of benthos [2,8,9]. It should be noted that the application in our study of several widely used estimators for prognostic assessment of benthic diatom species richness is one of the very few examples of such studies in diatomology [10,11].
International Journal of Biodiversity
The objectives of the study were (1) to implement comparative prognostic estimation of the expected species richness of benthic diatoms for several near-shore sampling sites of the SW Crimea (Black Sea) and to evaluate the precision of each estimator used; (2) to derive and statistically estimate a generalized optimal ratio between an essential minimum of sampling effort and a maximum of relevant data on the taxocene species richness in the marine coasts of the Crimean peninsula.

Material and Methods
In prognostic estimation of the expected number of species to be found through examination of a certain number of samples (n), the records of benthic diatom species composition from 93 samples were used. Material for the investigation was collected in 1996-2009 during the summer or autumn season from soft sediments (muddy sand) within the depth range 6-48 m at several sampling sites (or sampling areas) in Sevastopol, Balaklava, Karantinnaya, and Laspi bays and at the open coast near Belbek (SW Crimea, Black Sea) (Figure 1).
Sediment samples were collected from soft-bottom substrate either using a Petersen grab-corer (at the deepest stations) or by a diver using a hand-corer. Samples for diatom analysis (in duplicate from each station) were retrieved from the uppermost 2-3 cm layer of each sampled sediment bulk with a meiobenthic tube (surface area 15.9 cm²). For better separation of epipelon and epipsammon, the sediment samples were first treated in an ultrasonic bath for 20 min; the samples were then cleaned using the standard technique of cold burning in HCl and H₂SO₄ with the addition of K₂Cr₂O₇ [12]. Cleaned diatom valves were mounted in Eljashev's mountant for light microscopy and later examined for abundance and species richness. Diatom cells were counted under the microscope (×400) in a Goryayev chamber (7 × 10⁻³ cm³), in three random replicates from each sample; cell numbers (abundance) were then recalculated per 1 cm² of substrate surface as the average of the three replicates.
The check-list of benthic diatoms was compiled for each of the sampling sites (see Supplementary Material available online at http://dx.doi.org/10.1155/2013/975459). The diatom species were counted and abundance values expressed per 1 cm² of seabed. The calculated minimum abundance of a species in the samples was 250 cells⋅cm⁻². Species not found in the Goryayev chambers but registered only on permanent slides (i.e., rare or solitary species) were included in the rectangular matrix (species density versus sample number) with a conventional minimum value of 10 cells⋅cm⁻² for quantitative uni- and multivariate statistical analyses. Such values (not equal to 1) were used for the preliminary fourth-root transformation of the widely ranging values of initial diatom abundance (250 to 3.56⋅10⁶ cells⋅cm⁻²) before calculation of the similarity and diversity indices [13]. The complete list of diatom species for each sample was identified to intraspecific level on permanent slides using Zeiss Axiostar Plus and Nikon Eclipse E600 microscopes (×1000). The species were identified using taxonomic atlases [14][15][16][17][18][19][20]. Afterwards, the real species richness was compared with the expected estimates yielded by the computation methods.
In the comparative prognostic estimation of the expected species number we used the commonly applied Chao and Jack-knife estimators [2,9,21,22] and the S∞ estimator based on algorithms of regression analysis [23,24].
The latter method computes the maximum expected number of species (Sexp) as the theoretical upper limit (asymptote) of the species-accumulation curve plotted from averages over many random permutations, that is, the point at which two successive samples contain an identical number of species given an infinitely large number of samples. The expected number of species, that is, the asymptote magnitude, is calculated by solving the linear equation relating the cumulative species numbers in n samples (Sobs(n)) and in n + 1 samples (Sobs(n+1)) together with the equation y = x, the bisector of the first quadrant. It has been suggested to extend the estimator algorithm, as was done in our work, by taking into account different sampling lag widths between paired samples along their original sequence, that is, by constructing a series of regression equations Sobs(n) = f(Sobs(n+k)), where k = 1, ..., n − 1, for sampling lags of different extent [25]. Such methodological amendments, though requiring a far larger number of samples, enable more precise estimation of the maximum expected species number [9].
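The asymptote calculation described above can be sketched in a few lines. This is a minimal illustration, not the authors' PRIMER/Excel workflow: it assumes the later cumulative count S(n + lag) is regressed on the earlier count S(n) by ordinary least squares, and that the asymptote is the intersection of the fitted line with y = x. The function names `fit_line` and `s_infinity` and the synthetic curve are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def s_infinity(s_obs, lag=1):
    """Asymptote of a mean accumulation curve (assumed S-infinity form).

    s_obs: mean cumulative species numbers for 1, 2, ..., n samples.
    lag:   distance between the paired points S(n) and S(n + lag).
    """
    if lag < 1 or lag > len(s_obs) - 2:
        raise ValueError("lag must leave at least two paired points")
    earlier = list(s_obs[:-lag])   # S(n)
    later = list(s_obs[lag:])      # S(n + lag)
    slope, intercept = fit_line(earlier, later)
    if slope >= 1:
        raise ValueError("no finite asymptote: fitted slope is >= 1")
    # Intersection of y = slope * x + intercept with the bisector y = x.
    return intercept / (1.0 - slope)


# Synthetic curve that saturates at 100 species (illustration only).
curve = [100 * (1 - 0.8 ** n) for n in range(1, 21)]
print(round(s_infinity(curve, lag=1), 3))
print(round(s_infinity(curve, lag=5), 3))
```

Larger lags use fewer paired points, which is one reason why lagged variants need far more samples than the lag-1 form.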
The assessment of expected species richness by two other estimators, Chao-1 and Chao-2, involves a relatively small number of samples [21,24,26]. Both estimators are calculated by the formula Stotal = Sobs + a²/(2b), where Stotal is the total predicted species richness, Sobs is the number of species observed in the examined batch of samples, and a is the number of species represented by one individual (singleton species; Chao-1) or the number of species observed in only one sample (unique species; Chao-2). Coefficient b is the number of species represented by 2 individuals (Chao-1) or the number of species registered in only 2 samples (Chao-2). Since in our samples the accepted minimum abundance of diatom cells was 10 cells⋅cm⁻², the curve of the Chao-1 estimator overlaps the cumulative curve of the detected species number, that is, Stotal = Sobs; therefore only the Chao-2 estimator was used in the analysis.
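As a worked illustration of the Chao-2 formula, the sketch below counts unique and duplicate species in a small presence/absence data set. The function name `chao2` and the three toy samples are hypothetical; the uncorrected formula is undefined when no species occurs in exactly two samples, so the sketch simply raises an error in that case.

```python
def chao2(samples):
    """Chao-2 estimate Stotal = Sobs + a^2 / (2b) from incidence data.

    samples: iterable of collections of species names, one per sample.
    """
    occurrences = {}
    for sample in samples:
        for species in set(sample):
            occurrences[species] = occurrences.get(species, 0) + 1
    s_obs = len(occurrences)
    a = sum(1 for c in occurrences.values() if c == 1)  # found in one sample
    b = sum(1 for c in occurrences.values() if c == 2)  # found in two samples
    if b == 0:
        raise ValueError("formula undefined: no species found in exactly two samples")
    return s_obs + a * a / (2 * b)


toy_samples = [{"A", "B", "C"}, {"A", "B", "D"}, {"A", "E"}]
# Sobs = 5, a = 3 (C, D, E), b = 1 (B): 5 + 9 / 2
print(chao2(toy_samples))  # 9.5
```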
The Jack-knife estimators rely on the record of the number of rare species: Stotal = Sobs + a ⋅ (n − 1)/n, where a is the number of species found only once in the studied samples and n is the total number of samples [2,27]. This estimator performs effectively when a relatively small number of samples is processed; it has been successfully applied in the analysis of data sets pertaining to marine benthos [9,25].
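The first-order Jack-knife formula can be illustrated in the same way. The second-order form is not written out in the text; the sketch below assumes its standard incidence-based expression, Sobs + a(2n − 3)/n − b(n − 2)²/(n(n − 1)), with b the number of species found in exactly two samples. The function name `jackknife` and the toy data are hypothetical.

```python
def jackknife(samples):
    """First- and second-order Jack-knife richness estimates (incidence data).

    Returns (jack1, jack2). jack2 uses the standard second-order formula,
    which is an assumption here because the text only gives the first-order one.
    """
    samples = [set(s) for s in samples]
    n = len(samples)
    if n < 2:
        raise ValueError("at least two samples are required")
    occurrences = {}
    for sample in samples:
        for species in sample:
            occurrences[species] = occurrences.get(species, 0) + 1
    s_obs = len(occurrences)
    a = sum(1 for c in occurrences.values() if c == 1)  # found in one sample
    b = sum(1 for c in occurrences.values() if c == 2)  # found in two samples
    jack1 = s_obs + a * (n - 1) / n
    jack2 = s_obs + a * (2 * n - 3) / n - b * (n - 2) ** 2 / (n * (n - 1))
    return jack1, jack2


toy_samples = [{"A", "B", "C"}, {"A", "B", "D"}, {"A", "E"}]
jack1, jack2 = jackknife(toy_samples)
print(jack1, round(jack2, 4))
```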
Two statistics, relative error (RE) and squared relative deviation (SRD), were used to evaluate the precision of the estimators from the deviation of the expected species number from the real number contained in a finite set of n samples, that is, the over- or underestimation of real species richness: RE = (Sexp − Sobs)/Sobs, where Sexp is the expected number of species determined using the estimator, and Sobs is the observed value of species richness computed from the upper limit of the species accumulation asymptote for the range of samples from 1 to n, with multiple permutations taken into account [28]. RE estimates the relative difference between the estimated value and the true value of the species number for different numbers of samples. The square of RE (SRD) assesses the closeness of the estimator to the real number of species regardless of the sign of the deviation, that is, it is a measure of estimator inaccuracy [29].
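A minimal sketch of the two statistics follows; the function names and the numbers in the example are hypothetical and serve only to show the arithmetic.

```python
def relative_error(s_exp, s_obs):
    """RE = (Sexp - Sobs) / Sobs; positive values mean overestimation."""
    if s_obs <= 0:
        raise ValueError("s_obs must be positive")
    return (s_exp - s_obs) / s_obs


def squared_relative_deviation(s_exp, s_obs):
    """SRD = RE squared; ignores the sign of the deviation."""
    return relative_error(s_exp, s_obs) ** 2


# Hypothetical values: an estimate of 120 species against 100 observed,
# and an estimate of 95 species against 100 observed.
for s_exp in (120, 95):
    re = relative_error(s_exp, 100)
    srd = squared_relative_deviation(s_exp, 100)
    print(s_exp, round(re, 4), round(srd, 4))
```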
In this study, we compared the performance of 4 estimation methods using real and simulated data sets. The reliability of each method was evaluated by calculating the bias and precision of its estimates against the known total diatom species richness. These two metrics allow an objective quantitative comparison of the performance of estimation methods. Bias measures whether an estimate consistently under- or overestimates the parameter. Precision measures the overall closeness of the simulated curve to the true number of species along the whole succession of samples from j = 1 to j = n, where n is the number of examined samples, Ej is the species richness extrapolated by the respective estimation method for j samples, and A is the asymptote of the species richness accumulation curve for n samples [22]. It is implied that a "good" estimator should have bias values close to zero and small precision values. Another measure of bias is the percentage of overestimates. If the estimator always overestimates A, it will have positive bias and 100% overestimates, and if it always underestimates A, it will have negative bias and 0% overestimates. An unbiased estimator returns zero bias and 50% overestimates [8].
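The exact expressions for bias and precision are not reproduced in this text. The sketch below assumes the commonly used scaled forms, bias = Σ(Ej − A)/(A⋅n) and precision = Σ(Ej − A)²/(A²⋅n), summed over the sample ranks j = 1, ..., n, where Ej is the estimate at rank j and A is the true (asymptotic) richness. These forms, the function name `bias_precision`, and the example numbers are assumptions rather than the authors' stated definitions.

```python
def bias_precision(estimates, true_richness):
    """Scaled bias, scaled precision, and percentage of overestimates.

    estimates:     estimator values E_j for sample ranks j = 1..n.
    true_richness: reference richness A (must be positive).
    The scaled formulas are assumed, not taken from the text.
    """
    n = len(estimates)
    if n == 0 or true_richness <= 0:
        raise ValueError("need at least one estimate and a positive richness")
    deviations = [e - true_richness for e in estimates]
    bias = sum(deviations) / (true_richness * n)
    precision = sum(d * d for d in deviations) / (true_richness ** 2 * n)
    pct_over = 100.0 * sum(1 for d in deviations if d > 0) / n
    return bias, precision, pct_over


# Hypothetical estimates along four sample ranks, with A = 100 species.
print(bias_precision([110, 105, 95, 100], 100))
```

Under these assumed definitions an estimator that always exceeds A gives a positive bias and 100% overestimates, which matches the interpretation given in the text.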
The rarefaction method [30,31] is an important diagnostic tool that plots randomized richness against sampling intensity and is used in comparing diatom species richness between different samples. The rarefaction (numerical species richness) index (ES(n)) is based on species accumulation values in a large number of hypothetical subsamples with various numbers of diatom cells (10, 20, ..., 500, etc.) repeatedly and randomly selected from the whole sample, so that the variance among randomizations remains meaningful for a large number of sampling units or individuals.
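The expected species number in a random subsample can also be computed analytically rather than by repeated random draws. The sketch below assumes Hurlbert's formulation, ES(n) = Σ [1 − C(N − Ni, n)/C(N, n)], where Ni is the cell count of species i and N is the total count; whether the software used in the study evaluates ES(n) in exactly this way is an assumption. The function name `expected_species` and the toy counts are hypothetical.

```python
from math import comb


def expected_species(counts, n):
    """Hurlbert's ES(n): expected species number in a subsample of n cells.

    counts: integer cell counts per species in the whole sample.
    n:      subsample size, 1 <= n <= total number of cells.
    """
    total = sum(counts)
    if not 1 <= n <= total:
        raise ValueError("n must lie between 1 and the total cell count")
    denominator = comb(total, n)
    # comb(total - c, n) is 0 when fewer than n cells belong to other species,
    # so such a species is certain to appear in the subsample.
    return sum(1 - comb(total - c, n) / denominator for c in counts if c > 0)


toy_counts = [50, 30, 15, 4, 1]
for size in (1, 10, 50, 100):
    print(size, round(expected_species(toy_counts, size), 3))
```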
Multivariate analysis of diatom assemblage species structure was conducted using the PRIMER v5.2 software package [32]. Affinity of assemblage composition between sampling areas was estimated on ranked triangular similarity matrices based on the Bray-Curtis index calculated from fourth-root transformed initial diatom abundance data. Results from nonmetric multidimensional scaling (MDS) were used for a graphical representation of possible similarities between groups of samples (sampling sites) according to the similarity of diatom taxocene species structure. Possible differences between sampling sites were tested for significance using analysis of similarity (ANOSIM). Smoothed species accumulation curves for each sampling site were generated using 1000 random permutations. PRIMER's DIVERSE routine was used to calculate the number of individuals, number of species, values of the Chao and Jack-knife estimators, and rarefaction indices ES(n) for each sample. Means of the indices were then calculated for all data sets and various subsets of samples. Data for the Karakassis-S∞ extrapolative model (Sexp values averaged over 1000 randomized runs) were also computed using the DIVERSE routine, with further calculation of regression equations in MS Excel.
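The smoothing of accumulation curves by random permutation can be sketched as follows. This is an illustrative stand-in for the PRIMER routine, not its implementation; the function name `accumulation_curve`, the fixed seed, and the toy samples are hypothetical.

```python
import random


def accumulation_curve(samples, permutations=1000, seed=0):
    """Mean cumulative species number for 1..n samples over random orderings.

    samples: list of collections of species names, one per sample.
    """
    rng = random.Random(seed)          # fixed seed for reproducibility
    order = [set(s) for s in samples]
    totals = [0] * len(order)
    for _ in range(permutations):
        rng.shuffle(order)             # new random sample sequence
        seen = set()
        for rank, sample in enumerate(order):
            seen |= sample             # species accumulated so far
            totals[rank] += len(seen)
    return [t / permutations for t in totals]


toy_samples = [{"A", "B", "C"}, {"A", "B", "D"}, {"A", "E"}]
print(accumulation_curve(toy_samples))
```

The last point of the curve always equals the total number of species in the data set, because every ordering eventually includes all samples.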

Results and Discussion
The relationship between the observed species richness of benthic diatoms and the number of samples was estimated using the records from 93 samples taken in 8 near-shore seawater areas of the SW Crimea. Microscopic analysis revealed a total of 433 species and intraspecific taxa (Annex 1), pooled in 96 genera, 51 families, 27 orders, and 3 classes of Bacillariophyta (Table 1). Species richness was highest for the genera Nitzschia Hassall (53 species and intraspecific taxa), Amphora Ehrenberg (41), Navicula Bory (37), Cocconeis Ehrenberg (26), and Diploneis Ehrenberg ex Cleve (20). Aulacoseirales, Biddulphiales, Eunotiales, Paraliales, Rhabdonematales, Thalassionematales, and Toxariales were the most species-poor orders, with only one species recorded in each.
The list of species was compiled for each sampling site (see Annex 1). These data have supplemented the existing taxonomic database of the Black Sea diatom flora [33], which is based on the literature and our own data ([11,15,16,34,35], etc.). According to this database, the updated inventory of Black Sea benthic diatoms from 5 regions (Caucasian, Crimean, Bulgarian, and Romanian coasts and the North-Western shelf) holds 1093 species and infraspecific taxa (ssp.), pooled in 942 species, 142 genera, 60 families, 32 orders, and 3 classes, following recent systems [17,19,20,36]. The latest check-list for the entire Crimean coast includes 886 sp. and ssp., belonging to 800 species, 130 genera, 55 families, and 29 orders [33]. However, previous studies of benthic diatom diversity at Crimean shores were rather episodic and few. Most of them covered only spatially confined locations (e.g., one small bay [34]), only a few interseasonal samples at one point [37], or combined retrospective nonquantitative data on species richness over a rather large water area [15]. Therefore, their results have not provided comprehensive data on Crimean diatom diversity that could serve as an exhaustive reference on species richness for evaluating estimator accuracy (the ratio Sobs/Sexp) in our study. For comparison, previous studies performed in the Sevastopol region by various researchers found 93 sp. and ssp. of benthic diatoms [34], 136 sp. and ssp. [37], and 161 sp. and ssp. [15]. Thus, the number of benthic diatom species found in all our samples together (433) was much greater than in previous check-lists and accounted for about 40% of the known species richness for the Black Sea and almost 50% of the total registered benthic diatom diversity for the Crimean coast.
Sevastopol Bay (8.3 km²) was divided into 3 parts: inner, central, and outer, with conspicuously different environmental parameters such as depth, grain-size composition, pH, Eh, O₂ concentration in near-bottom layers, and the level of industrial pollution of bottom sediments with trace metals and organic pollutants such as PCBs, PAHs, and pesticides [38]. This spatial division of the bay bottom area was based on earlier results on the impact of key abiotic factors on the diatom taxocene structure in different parts of Sevastopol Bay [13].
Results of MDS ordination (stress = 0.19) appeared to confirm the visual separation of the Belbek samples from the others and the rather close arrangement of the samples from Sevastopol and Balaklava bays and of the samples from Laspi and Karantinnaya bays (at the 25% similarity level) (Figure 2). Visual differences in the arrangement of samples on the 2D plot were then statistically confirmed by the one-way ANOSIM test. The results indicated significant differences in the taxocene structure among almost all compared groups of samples (Rglobal = 0.672, P < 0.001), except for the pairs "Karantinnaya Bay versus Laspi Bay" and "Sevastopol central part versus Sevastopol outer part," where pairwise tests did not show significant differences (Rpairwise = 0.356, P < 0.2).
Diverse environmental conditions and statistically significant differences in diatom taxocene structure between the compared sites are the prerequisites for comparative analysis of habitat-related differences in the relationship between the number of samples and the revealed species richness. On the other hand, such habitat-specific distinctions in the species-accumulation pattern take into account the variability of biotopes and, consequently, improve the reliability of conclusions drawn from the most generalized model "sampling effort versus species richness" for the whole studied region (Crimea). Results based on this generalized region-specific curve can be applied for comparative interregional analysis of relationships between species richness and sampling effort. Hence, the subsequent analysis of the diatom species-accumulation curves was performed both for each of the 8 sampling sites and for the entire sequence of all 93 samples.
The number of samples within each of the sampling sites, the observed diatom species richness and the expected number of species assessed by different estimators are given in Table 2.
Application of estimators presumes that the prognostic estimate of species richness should exceed the observed species number in the samples (Sobs), which agrees with the data in Table 2. The expected diatom species richness (Sexp) estimated by the S∞ method slightly overestimated (by 1-8%) the Sobs value for the different sampling sites. The exception is Inkerman, where Sexp is about 18% larger than Sobs, probably because only 6 samples were collected. The other estimators overestimated the Sobs values more considerably: Chao by 21-70%, JN-1 by 24-36%, and JN-2 by 33-58%, depending on the sampling effort at each sampling site.
Habitat-dependent relationships between the accumulation of new species (Sexp) and increasing sampling effort were derived for each of the sampling sites (Figure 3). The average Sexp values were computed from 1000 randomized runs for different numbers of samples.
The most rapid rise of Sexp with increasing number of samples (species accumulation curve) was observed in the Belbek area (open coast), where the total number of diatom species detected in 9 samples was 244, that is, 56% of the total list of diatom species registered in all 93 samples. The accumulation curves for the other sampling sites were flatter and, despite a larger number of examined samples, showed a smaller number of observed species. Similar relationships between species accumulation and increasing number of samples were obtained for the sea bottom areas in Laspi Bay and Balaklava Bay, as well as for Sevastopol Bay, Karantinnaya Bay, and Inkerman. The latter set of 3 accumulation curves displays a similar shape, though the number of samples taken and the number of diatom species found in each of the three sampling areas were different (see Table 2). The accumulation curves (Sexp) did not reach a horizontal asymptote at any sampling area. Such results imply that the actual number of diatom species found, even after examination of the largest number of samples from the sampling areas, is considerably lower than the expected species richness obtained by the estimators. The cumulative curve integrating the results of all 93 samples from 8 sampling areas is also shown in Figure 3. Based on this randomized curve, the parameters of the generalized relationship between the number of samples (X) and the number of observed species (Y) were calculated. This relationship is reliably described by the logarithmic equation Y = 79.01 ln(X) + 34.95 with the correlation coefficient r = 0.99.
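The fitted logarithmic relationship can be evaluated and inverted directly. The sketch below uses the coefficients reported above; the function names are hypothetical. Because the equation is a smoothed approximation of the randomized curve, sample numbers obtained by inverting it need not coincide with values read from the randomized curve itself.

```python
import math

SLOPE = 79.01       # coefficient of ln(X) in the fitted equation
INTERCEPT = 34.95   # constant term of the fitted equation


def predicted_species(n_samples):
    """Species number Y predicted by Y = 79.01 ln(X) + 34.95."""
    return SLOPE * math.log(n_samples) + INTERCEPT


def samples_needed(target_species):
    """Smallest whole number of samples X with predicted Y >= target."""
    return math.ceil(math.exp((target_species - INTERCEPT) / SLOPE))


half_of_total = 0.5 * 433
print(samples_needed(half_of_total))        # 10
print(round(predicted_species(10), 1))
```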
These results suggest that not less than 10 samples, that is, nearly 11% of the total studied number (93), should be considered to reveal about 50% of the total species richness (433 species) of benthic diatoms that actually occur on sandy/muddy sediments in the near-shore marine areas of the SW Crimea. To detect 67% (290 species) and 80% (347 species) of the total species richness (on the assumption of equal probability of revealing any diatom species in a sample), not less than 24 and 43% of the total number of considered samples, respectively, should be examined. Obviously, for other near-shore water areas in which the number of collected samples and the total number of observed species differ, the parameters of the species-accumulation curves may differ as well. Other researchers have also reported a similar percentage of observed species relative to the expected maximum depending on sampling effort. For instance, the randomized estimation of zoobenthos species richness performed on the Norwegian shelf and in the coastal sea of Hong Kong showed that analysis of 12 and 16% of the total number of samples (101) disclosed up to 50% of the species richness; to elicit 80% of the total of 809 species in Norway and of 386 species in Hong Kong, 48 and 57% of the total number of samples, respectively, should be studied [39]. A methodologically similar analysis of macrozoobenthos samples taken close together in the North Sea showed that examination of the first 7 samples in the randomized row (10% of the total number of 70 samples) disclosed 50% of all species dwelling in the sampling area; eliciting 80% of the species richness required not less than 26 samples, or 37% of the total sampling effort [25].
In the comparative estimation of diatom species richness, we also used the rarefaction index ES(n). This index entails estimation of the expected number of species in a sequence of conditional subsets with different numbers of cells (10, 20, ..., 500) randomly drawn from the total diatom abundance counted in the whole sample. The rarefaction curve is a mathematical expectation function of "species saturation" depending on the abundance of the whole community (or the taxocene of diatoms in our case). The ES(n) plot for the Belbek sampling site rises highest in comparison with the plots for the other sampling areas (Figure 4).
In the Belbek area, the expected number of diatom species in the conditional subsets of 200, 300, and 500 cells was estimated to be 62.1, 71.9, and 84.5, respectively. This provides evidence of high species saturation in this taxocene, probably owing to species brought in with the Belbek river inflow and to the low pollution level of the bottom deposits. Estimates of the expected species number in the various-sized subsets of cell abundance were lower (and nearly identical) for the diatom taxocenes in Sevastopol Bay and in Balaklava Bay. The average expected number of species in the subsets of 200, 300, and 500 cells was 49.2 ± 0.4, 56.6 ± 0.1, and 65.5 ± 0.2, respectively. The lowest species diversity subgroup includes the Laspi Bay, Karantinnaya Bay, and Inkerman sampling sites, in which the average expected number in the subsets of 200, 300, and 500 cells is 34.8 ± 2.3, 38.7 ± 1.9, and 43.2 ± 2.8 species, respectively (see Figure 4).
Hence, environmental differences, including level of pollution in a certain sampling area, can influence the relationship between new species accumulation and greater sampling effort and, eventually, the structural peculiarities of diatom taxocene.
As mentioned above, though the estimators applied in our study generally overestimated the observed species number (Sobs), each of them is characterized by a different degree of deviation of the Sexp values from Sobs. Such differences can be considered as accuracy measures of a given estimator and indicate its applicability for prognostic estimation of expected species richness, especially for such very abundant microobjects as benthic diatoms. Therefore, in further analysis we attempted to comparatively evaluate the accuracy of each of the applied estimators based on calculation of the expected diatom species number. Randomized accumulation curves, corresponding to the 4 estimators and to simple species richness, are shown in Figure 5.
The species richness accumulation curve illustrates the relationship between Sobs and sampling effort. The curve increases monotonically and does not converge to a horizontal asymptote, at least as far as the extreme values in the entire series of 93 randomized samples. The Chao and Jack-knife estimators significantly overestimate the expected number of diatom species compared to the real species richness, especially given a small number of samples, for example, less than 10-12 samples. Beginning from this sampling effort level, the cumulative curves run parallel to the actual species accumulation curve, without approaching a horizontal asymptote along the whole randomized ascending sequence of sample numbers. The accumulation curve plotted for the Karakassis S∞ estimator is considerably closer to the curve of the observed species number (Sobs); from the 1st sample rank to almost the 40th rank in the sequence it slightly overestimates, and within the range from the 40th to the 93rd sample rank S∞ underestimates the true species number by 3-5% (see Figure 5).
Earlier research applied to various groups of marine benthos found that all estimators (especially of the Chao family) often inaccurately estimate the observed species number when the sample number is small [2,9]. When the number of samples increases, the Sexp asymptote converges to the cumulative curve of the observed species number (Sobs) irrespective of whether the Sexp curve over- or underestimates the true species richness [40]. Walther and Martin [22] also stated that not less than 30-40% of the total number of samples (about 100) in an ascending randomized series was required for adequately precise Sexp estimation. These authors noted that all estimators considerably underestimated the real species number when less than 20-25% of the entire series was sampled. A reasonable precision level is reached when the estimator's asymptotic curve overestimates the real species richness value by not more than 20%.
Other authors [41] showed that the bootstrap and Jack-knife-1 and -2 estimators can be applied to minimize underestimation of the expected species richness in comparison with the actual species number in the samples. Given a small number of samples, for example, less than 25% of their total number in the randomized range, all these estimators similarly underestimate the species richness; however, the Jack-knife-2 estimator gives a smaller error. For a larger number of samples, over 50% of the total number in the range, these estimators slightly overestimate the expected number of species, and again the Jack-knife-2 estimator is more precise. Some other publications [24,40] in which the Chao and Karakassis S∞ estimators were compared pointed out that both could slightly underestimate expected species richness given a large number of samples.
Such results suggest that in the analysis of diatom species richness, which generally involves a rather small number of samples, the S∞ estimator would estimate the expected species number (Sexp) close to the real level. In our study, when 9-18 samples from each of the sampling sites were examined, the prognostic estimates Sexp were only 7-10% larger than the observed species number. In general, this degree of accuracy is acceptable and agrees with the average precision level of the estimator values for taxonomically different groups of benthos.
Considering these possible deviations, the statistical evaluation of the precision of the 4 estimators was applied to various biotopes and different numbers of samples. It was found that the Sobs index may vary considerably (high values of standard deviation, SD) when a relatively small number of samples (7-12) is used. When the sample number increased to 15-18, the SD values consistently decreased, sometimes to zero, that is, the variability range of the expected species number narrowed. The accuracy of the estimators was computed from the RE and SRD statistics for the overall randomized sequence of 93 samples (Figure 6).
Given a small number of samples (4-6), all the tested estimators highly overestimated the expected number of diatom species (see the peaks on the RE and SRD curves). When sample numbers increased to 15-20 or higher, the Chao and Jack-knife estimators gave a lower relative error, that is, the ratio (Sexp − Sobs)/Sobs, and further convergence of the corresponding curves to a horizontal asymptote at the level of 0.20-0.30 (RE) or 0.14-0.20 (SRD) was observed. As for the S∞ estimator, the RE and SRD relative errors plotted against larger numbers of samples (n) showed a monotonic convergence to a zero asymptote. This estimator more or less adequately predicts the number of species in the taxocene starting from n = 7-8. Other authors [28] estimated species and generic richness of aquatic chironomids by 7 nonparametric estimators including the Chao and Jack-knife families and also concluded that all of these estimators may largely overestimate the expected number of species. Better accuracy could be attained only given a large number of samples; the values of relative error (RE and SRD) were highest for the Chao-1 and receded from the Chao-2 to the Jack-knife-1 and to the Jack-knife-2 estimators.
A similar relationship between the accuracy of the Sexp averages and the increasing number of samples, based on standard deviation values, was obtained in the comparative estimation of benthic species richness in two areas of the Norwegian continental shelf [39]. The researchers concluded that the most precise estimates of Sexp (when SD estimates on the plots gradually declined to zero) could be obtained only with a sufficiently large number of samples (>20) taken from environmentally heterogeneous biotopes.
The results of the computed Sexp/Sobs ratio for the randomized sequences of samples in 4 sampling areas (Karantinnaya, Laspi, and the central and outer parts of Sevastopol Bay) have shown that, given a small number of samples, all the estimators overestimated the Sobs value 1.3-1.8 times. With an increase in sampling effort, Sexp gradually converges to the actual value. The fact that the estimators gave similar Sexp/Sobs ratios for Karantinnaya and Laspi bays and for the central and outer parts of Sevastopol Bay can be explained by the similarity of the habitats and, hence, the similar species structure of the taxocenes in these pairs of sampling sites. Note that the points corresponding to samples from these areas on the MDS ordination plot are also arranged in dense patches according to the similarity of diatom species abundances (see Figure 2).
The results of the evaluation of the estimators' reliability, based on bias and precision metrics, are presented in Table 3.
Both metrics have their highest values in the early 20% (sample ranks 1-19) of the overall ascending sequence of 93 samples; here, all the estimators give estimates considerably greater than the actual species richness. Within the mid-range (sample ranks 20-58, or 21-60% of the entire series), the relevant estimates were substantially lower; the Karakassis-S∞ estimator most effectively approximated the true number of species. For the late samples in the randomized row (ranks 59 to 93, or 61-100% of the total number of samples), all tested estimators predicted the expected species richness most precisely. Compared to the rest of the estimators, S∞ displays the minimum inaccuracy, that is, it insignificantly underestimates (bias = −0.017) the real species number of benthic diatoms. Over the overall range of 93 samples, the S∞ estimator also displayed the best results, giving the lowest average bias value. According to the precision metric, the Jack-knife-1 estimator was superior, estimating the expected species richness closest to the actually observed number of diatom species.
A methodically similar study to determine the optimum relationship between sampling effort and observed species richness by testing 19 estimators was conducted by Walther and Martin [22]. Comparing the bias and the precision metrics in different segments of the randomized row of samples, they have shown that the majority of the estimators underestimate the real number of species in the first one-third (25-30%) of the total sample range. The precision decreased from the Chao-2 to Chao-1 to Jack-knife-2 and to Jack-knife-1 estimators, which were more accurate than the remaining 15 tested estimators (the S∞ estimator was not used). Considering the latter segment of the sequence (50-100% of samples), the Chao-1 and Chao-2 estimators were also the most effective, giving the least bias and a slightly overestimated value of Sexp compared to Sobs. Both Jack-knife estimators were also quite inaccurate for the late samples (50-100%). Thus, these authors proposed to consider both Chao estimators as the most fitting for species richness prognostication. The Jack-knife-1 and -2 estimators were inferior, yet they performed more precisely than the rest of the tested estimators. Nevertheless, Walther and Moore [42] concluded that the nonparametric estimators of the Chao and Jack-knife families performed most reliably in the prediction of expected species richness.
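For orientation, the incidence-based estimators compared above can be sketched as follows. This is a minimal illustration using the widely cited textbook forms of Chao-2 and the first- and second-order Jack-knife; the exact variants and software used in the cited studies are not stated in this section, so the formulas, the fallback for the case without duplicates, the function name, and the toy data are all assumptions.

```python
from collections import Counter


def incidence_estimators(samples):
    """Return Sobs, Chao-2, Jack-knife-1 and Jack-knife-2 for a list of samples.

    Each sample is a set of species names (presence/absence data only).
    """
    m = len(samples)
    if m < 2:
        raise ValueError("at least two samples are needed")
    # Number of samples in which each species occurs.
    occurrences = Counter(sp for sample in samples for sp in set(sample))
    s_obs = len(occurrences)
    q1 = sum(1 for c in occurrences.values() if c == 1)  # species in exactly 1 sample
    q2 = sum(1 for c in occurrences.values() if c == 2)  # species in exactly 2 samples

    if q2 > 0:
        chao2 = s_obs + q1 ** 2 / (2 * q2)
    else:
        # Assumed fallback of the form Q1(Q1 - 1)/2 to avoid division by zero.
        chao2 = s_obs + q1 * (q1 - 1) / 2
    jack1 = s_obs + q1 * (m - 1) / m
    jack2 = s_obs + q1 * (2 * m - 3) / m - q2 * (m - 2) ** 2 / (m * (m - 1))
    return {"s_obs": s_obs, "chao2": chao2, "jack1": jack1, "jack2": jack2}


# Hypothetical toy data: 4 samples, 6 species in total.
toy_samples = [{"a", "b", "c"}, {"a", "d"}, {"a", "b", "e"}, {"a", "f"}]
toy_result = incidence_estimators(toy_samples)
```

In the toy data four of the six species occur in a single sample, so all three estimators return values well above Sobs, which is the kind of overestimation discussed for small sample numbers.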
Proceeding from all of the above, none of the estimators considered provides a universal tool that performs equally well in different groups of biota (although these conclusions were drawn from results of the investigation of organisms and habitats very different
Note that the precision of the expected number of species (Sexp) evaluated by the S∞ estimator can also depend on the width of the sampling lag between the pairs of samples in their ascending sequence [25]. The initially proposed estimation method was based on a regression of the species in k + 1 samples against the species contained in k samples. The main concept was that this index would provide the number of species expected when the difference in the cumulative number of species between two consecutive randomized samples (i.e., sampling lag = 1) would be zero. However, in that case there should also be a zero difference between higher sampling lags. It could be expected that increasing the sampling lag would provide more precise results, since it could give a higher resolution in detecting trends in the increase of species richness.
In this part of our analysis, two additional samples taken in the coastal sea water southward of Balaklava Bay and northward of the mouth of Sevastopol Bay were included, so that the total number of samples increased to 95 and the total list of observed diatoms to 471 species. Changes in the Sexp values determined from the linear regressions Sobs(k) = f(Sobs(k + lag)), constructed for sequences of samples with various widths of sampling lag, are presented in Figure 7.
It was found that in developing the regressions of the species in k + 1 samples against k samples for variable sampling lags, increasing the lag from 1 to 15 within the entire range of 95 samples results in a greater expected number of species, with only minor underestimation of the real species richness. At the sampling lag of 15, the estimator gave an Sexp value of 463, closest to the Sobs number of 471 species. A further increase of the lag width brought about a conspicuous decline of Sexp and, hence, a greater underestimation of species richness in the taxocene (see Figure 7). Thus, the most precise results with the S∞ estimator can be achieved when the linear regression plots for calculation of Sexp are based on sequences of randomized samples with sampling lags of 10 to 15.
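The lagged regression described above can be sketched as follows: the cumulative species count after k + lag samples is regressed on the count after k samples, and the estimate is the point where the fitted line meets the line of zero further increase (y = x). This is a sketch under stated assumptions, not the authors' implementation: the function name, the choice of ordinary least squares, the choice of which variable is treated as dependent, and the synthetic test curve are all hypothetical.

```python
def s_infinity(cumulative, lag=1):
    """Estimate asymptotic richness from a cumulative species-count sequence.

    cumulative[i] is the number of species found in the first i + 1 samples
    of a randomized sequence. The count at k + lag samples (y) is regressed
    on the count at k samples (x); the estimate is the x where y = x.
    """
    if lag < 1 or len(cumulative) - lag < 2:
        raise ValueError("sequence too short for this lag")
    x = cumulative[:-lag]
    y = cumulative[lag:]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("no variation in cumulative counts")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope >= 1:
        raise ValueError("fitted line does not cross y = x (no saturation)")
    # Solve intercept + slope * s = s for s.
    return intercept / (1 - slope)


# Synthetic, noise-free saturating curve with a known asymptote of 400 species.
synthetic = [400 * (1 - 0.9 ** k) for k in range(1, 31)]
estimate_lag1 = s_infinity(synthetic, lag=1)
estimate_lag5 = s_infinity(synthetic, lag=5)
```

On the noise-free synthetic curve every lag recovers the same asymptote; with real, noisy accumulation data the estimate varies with the lag, which is the effect examined in Figure 7.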
The assessment of similarity from the pairwise comparison in the sequence of successively taken samples (considering the different widths of sampling lag) revealed heterogeneity in the distribution pattern of the expected species richness and the diversity of the taxocene throughout certain sampling areas. In the triangular matrix of the intersample similarity of species richness, the first subdiagonal corresponds to samples taken in the initial sequence, displaying the similarity between the successive pairs of samples, that is, for the sampling lag 1; the second subdiagonal corresponds to the sampling lag 2, evaluating similarity between samples 1 and 3, 2 and 4, and so forth. Sampling lag 3 corresponds to the third subdiagonal of the matrix, displaying similarity values between pairs of samples 1 and 4, 2 and 5, and so forth.
A possible trend in the distribution of expected species richness can be derived from a comparison of the results averaged for each subdiagonal. A decreasing trend in the average similarity between all pairs of samples with increasing sampling lag implies that strong patchiness or a "hidden" environmental gradient, along which the Sexp index changes throughout the biotope, reduces the homogeneity of the data [25]. Accordingly, the absence of a negative trend in the quotient of similarity with increasing width of sampling lag implies a relatively homogeneous distribution of species richness. This methodical approach has provided an insight into the distribution of the expected species richness of diatoms over the sampling sites in Balaklava and Karantinnaya bays (Figure 8).
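The subdiagonal averaging described above can be sketched as follows. The function names, the least-squares trend, the convention of returning R² = 0 for a flat series, and the synthetic similarity matrix are assumptions made for illustration; the similarity index itself is taken as given.

```python
def mean_similarity_by_lag(sim, max_lag):
    """Average each subdiagonal of a square similarity matrix.

    sim[i][j] is the similarity between samples i and j, with samples
    ordered as they were taken. Subdiagonal number `lag` holds the pairs
    (i, i + lag). Returns the means for lag = 1 .. max_lag.
    """
    n = len(sim)
    if not 1 <= max_lag < n:
        raise ValueError("max_lag must be between 1 and n - 1")
    means = []
    for lag in range(1, max_lag + 1):
        values = [sim[i + lag][i] for i in range(n - lag)]
        means.append(sum(values) / len(values))
    return means


def linear_trend(means):
    """Slope and R^2 of mean similarity against lag = 1, 2, ..., len(means)."""
    if len(means) < 2:
        raise ValueError("at least two lags are needed")
    lags = list(range(1, len(means) + 1))
    mean_x = sum(lags) / len(lags)
    mean_y = sum(means) / len(means)
    sxx = sum((x - mean_x) ** 2 for x in lags)
    syy = sum((y - mean_y) ** 2 for y in means)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lags, means))
    slope = sxy / sxx
    # Convention of this sketch: a perfectly flat series gets R^2 = 0.
    r_squared = 0.0 if syy == 0 else sxy ** 2 / (sxx * syy)
    return slope, r_squared


# Synthetic matrix for 6 samples: similarity (%) falls by 5 per step of distance.
synthetic_sim = [[50 - 5 * abs(i - j) for j in range(6)] for i in range(6)]
lag_means = mean_similarity_by_lag(synthetic_sim, max_lag=4)
trend_slope, trend_r2 = linear_trend(lag_means)
```

A clearly negative slope with a high R² would correspond to the gradient case, while a slope near zero with a low R² would correspond to a relatively homogeneous distribution.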
At Balaklava Bay a pronounced negative linear trend (R² = 0.95) suggests evident heterogeneity in the Sexp distribution when comparing the inner and the outer parts of the bay. The uneven distribution can be attributed to a distinct integral gradient, including factors such as depth, grain size of sediments, and the degree of anthropogenic impact along the bay water area towards the mouth [43]. On the contrary, in Karantinnaya Bay increases in the width of the sampling lag between pairwise compared samples did not lead to a negative trend (R² = 0.01); the average similarity remained relatively constant for lags 1 to 7, ranging from 35 to 42%. These results imply the absence of an environmental gradient and, consequently, a relatively homogeneous pattern of Sexp distribution throughout different parts of this bay [44]. The results enable more precise estimation of the expected species richness of benthic diatoms with reasonably minimal sampling effort when studying other coastal habitats with similar bottom substrates and depth range.

Conclusion
The results obtained represent one of the first attempts at prognostic estimation of benthic diatom species richness in nearshore habitats along the northern Black Sea coast.
The results based on the randomized sequence of 93 samples taken at 8 sites of SW Crimea and the application of 4 prognostic methods have shown that all estimators were dependent on the sampling effort in each data set. All applied indices overestimated the observed number (Sobs) of benthic diatom species, especially for a small number of samples (5-6). The expected species richness (Sexp) is most reasonably estimated by the Karakassis-S∞ method. Lower-range overestimation of the Sobs value (2-10%) occurred for a small number of samples (<12), and slight underestimation (3-5%) occurred when the number of samples at a given site exceeded 40-43. The level of Sexp averaged over all sampling sites overestimated the Sobs value by 6.8 ± 2.8%. The other estimators considered (Chao-2 and Jack-knife-1 and -2) overestimated the Sobs level more considerably: 21-70% (on average 34.4 ± 8.0%), 24-36% (28.4 ± 2.2%), and 33-58% (42.5 ± 4.3%), respectively, depending on the number of samples at each sampling site. Thus, the S∞ estimator represents the best compromise choice for evaluation of expected diatom species richness in the various habitats.
The empirical relationship between the number of samples (X) and the number of observed species (Y) (considering all samples) is reliably described by the logarithmic equation Y = 79.01 ln(X) + 34.95. Following this equation, nearly 10 samples are required for disclosing about 50% of the total species number of benthic diatoms (433 species) found on sandy/muddy sediments near the Crimean coast within the depth range 6-48 m. To detect about 80% of the species richness (347 species), not less than 46 samples should be considered (under the assumption of equal probability of revealing any diatom species in the sample). Results based on this generalized region-specific curve can be applied to find a compromise between minimum sampling effort and the highest possible number of revealed diatom species.
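The fitted curve can be evaluated directly, as in the short sketch below. The coefficients and the total of 433 species are taken from the text; the constant and function names are hypothetical.

```python
import math

SLOPE = 79.01        # coefficient of ln(X) in the fitted curve
INTERCEPT = 34.95    # constant term of the fitted curve
TOTAL_SPECIES = 433  # total species found in all 93 samples


def species_at(n_samples):
    """Observed species number Y predicted by Y = 79.01 ln(X) + 34.95."""
    if n_samples < 1:
        raise ValueError("number of samples must be at least 1")
    return SLOPE * math.log(n_samples) + INTERCEPT


def samples_for(n_species):
    """Invert the curve: number of samples X at which Y reaches n_species."""
    return math.exp((n_species - INTERCEPT) / SLOPE)


share_at_10 = species_at(10) / TOTAL_SPECIES
share_at_46 = species_at(46) / TOTAL_SPECIES
```

With these coefficients the curve gives about 217 species (about 50% of 433) at 10 samples and about 337 species (about 78%) at 46 samples.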

Figure 2: MDS ordination plot of all 93 samples (based on a double square-root transformed abundance similarity matrix). Samples from eight different sampling sites are indicated on the plot by labels.

Figure 4: Relationships between the expected number of diatom species ES(n) and conventional abundance subsets consisting of different numbers of cells (n), constructed for different sampling sites of SW Crimea.

Figure 6: Evaluation of several estimators' inaccuracy based on relative error (RE) (a) and squared relative deviation, SRD (b) metrics.On the RE plot the data for all 93 samples are presented, while on the SRD plot only data for ranked sequence of the first 30 samples are shown (for clarity).

Figure 7: Changes in the expected diatom species number (Sexp) with respect to different sampling lags (lag = 1 for samples taken in initial sequence) between successive samples, obtained from linear regressions Sobs(k) = f(Sobs(k + 1)) based on the S∞ estimator. The observed diatom species number (Sobs = 471) is also indicated on the plot by the dotted line.

Figure 8: Trends in changes of average species similarity values (± SD) between pairs of samples in the sequences taken at different sampling lag ((a) Balaklava Bay and (b) Karantinnaya Bay).

Table 1: Representativeness of benthic diatoms (Bacillariophyta) at 8 investigated sampling sites in SW Crimea.
Figure 3: Cumulative randomized sequences of Sexp constructed for 8 sampling areas with different numbers of samples (6 to 18), as well as the generalized species-accumulation curve (solid line) combining all 93 samples taken in SW Crimea.

Table 2: Number of samples and observed (Sobs) and expected (Sexp) values of benthic diatom species richness at 8 investigated sampling sites in SW Crimea. Sexp values are calculated using 4 estimators.

Table 3: Comparative assessment of reliability of 4 estimators based on bias and precision metrics calculated for different parts of a randomized ascending sequence of samples (1 to 93).