Molecular Signatures of Humic Acids from Different Sources as Revealed by Ultrahigh Resolution Mass Spectrometry

Humic acid (HA) is extremely important for understanding the geochemical cycle of pollutants in different environments. Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS) was used to analyze, at the molecular level, two standard HAs from the Suwannee River (SRHA) and leonardite (LEHA) and an HA from Jiufeng forest in Beijing (JFHA), an analysis that is impossible with other conventional instruments. Regardless of the source of HA, compounds containing more heteroatoms (such as nitrogen and sulfur) have a higher degree of unsaturation and aromaticity. Based on molecular unsaturation and aromaticity, JFHA, SRHA, and LEHA, from soil, river, and leonardite, respectively, are arranged in order from the lowest to the highest degree of humification. Soil HA is more labile and contains many high-molecular-weight compounds with low unsaturation. Regardless of unsaturation, river HA molecules have a homogeneous molecular mass distribution and contain many plant-derived lignin- and tannin-like compounds, which are more stable than lipids and more labile than condensed aromatics. Leonardite HA, with a high degree of humification, contains a large number of compounds with high aromaticity and more heteroatoms and has low lability. Our results reveal the molecular-level diversity of humic acid arising from different degrees of humification and lability. These conclusions are significant for understanding, at the molecular level, the role of humic acid from different sources in pollutant transformation and the geochemical cycle.


Introduction
Humic substances (HS) are the most widespread natural organic matter (NOM) in soil, water, and sediment. HS are derived mainly from plant, microbial, and animal tissues [1], which lose their initial structures during chemical and biological degradation. The varying turnover times of humic substances, which affect the global carbon balance, arise from their inherent structural resistance to biotic decomposition and from binding to minerals [2-5]. HS have complex chemical structures that are more stable than their precursors and no longer retain the chemical properties of those precursors [6]. Based on solubility, HS can be classified into humic acid (HA), fulvic acid (FA), and humin (HM) [7,8]. HA generally represents the major fraction of HS [6] and mainly originates in environments rich in biochemical reactions; soil, rivers, and leonardite are three typical sources of HAs. Soil HAs are mainly derived from the (bio)chemical degradation of plant and animal residues and from microbial synthetic activity, and they account for 20% of the total organic matter [6,9,10]. Rivers play a very important role in the transformation [11,12] and global cycling of HAs [12,13]. The elemental composition of HA molecules can lead to the spontaneous formation of relatively labile HAs [14,15] or alter HA reactivity and adsorption. Lignite (low-rank coal) is the second stage of coal formation [6]. Leonardite, the most oxidized variety of lignite, is the richest source of HS [16,17]. Leonardite-derived HA accounts for 10-80% of leonardite organic matter, depending on its maturity level [18], and lies further along the diagenetic path of humification.
Leonardite HAs are much older and contain more condensed aromatics than soil HAs [6]. A number of studies have examined HAs from different types of soil [19] and from rivers worldwide [11], but comparative data on the molecular composition of HAs from soil, river, and leonardite are still missing. The standard samples provided by the International Humic Substances Society can more accurately reflect the characteristics of HAs from these sources.
It is difficult to analyse HA at the molecular level due to its extreme complexity, low concentration, and high polarity [20].
There are many conventional techniques for characterizing HA from a variety of sources, such as liquid chromatography-mass spectrometry (LC-MS), pyrolysis gas chromatography-mass spectrometry (GC-MS), and nuclear magnetic resonance (NMR) [21]. However, they provide only apparent or biased information about HA and fail to capture its complete characteristics [22]. Low-resolution GC-MS and LC-MS can analyse only a small fraction of HA, limited by volatility and solubility, respectively. NMR gives functional-group information on bulk HA but fails to provide molecular-level details [20]. Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS) is well suited to analysing HA from different sources [23]. Owing to its high resolution and mass accuracy, it can assign unique molecular formulas to thousands of compounds [24]. Moreover, negative-ion electrospray ionization (ESI), a soft ionization technique, is highly selective for polar molecules in complex mixtures [25], which avoids complex separation procedures [26]. The purpose of this work is to investigate the molecular compositions of HAs from soil, river, and leonardite using ESI FT-ICR MS. The specific objectives are to (1) compare the diversity of molecular composition among HAs from different sources and (2) discuss the lability of these diverse HAs.

Samples and Extraction Method.
The humic acid collected from the Suwannee River in South Georgia (SRHA) is a standard fulvic acid sample of the International Humic Substances Society (IHSS), serial number 2S101F. The IHSS standard leonardite-derived HA sample (LEHA) was obtained from the Gascoyne Mine in Bowman County, North Dakota, USA [27]; its serial number is 4S102F. Leonardite is produced by the natural oxidation of exposed lignite, a low-grade coal. Because the microbial action and other chemical processes acting on leonardite in the natural environment are longer and more complex, LEHA has a higher degree of humification. Additional details of the source materials can be found at http://humic-substances.org/. HA from Jiufeng forest soils in Beijing, China (JFHA) was extracted according to the standard method recommended by IHSS, as follows.
A quantity of freeze-dried soil from Jiufeng forest was thoroughly mixed with 0.1 mol/L NaOH solution at a solid-to-liquid ratio of 1:10, and the residual soil was removed by centrifugation. A 6.0 mol/L HCl solution was added to the resulting solution to adjust the pH to 1, and the mixture was left to stand for 24 hours to obtain a precipitate. A 0.1 mol/L KOH solution was added to the precipitate, the pH was adjusted to 13.0, and potassium chloride was then added to adjust the solution concentration to 0.4 mol/L. After high-speed centrifugation (10,000 r/min) for 30 min, the supernatant was separated, 6.0 mol/L HCl was added to adjust the pH to 1.0, and concentrated HF was added to a concentration of 0.3 mol/L. Impurities were removed by shaking and dialysis, and the solution was freeze-dried to obtain the humic acid sample.

Parameters of FT-ICR MS.
The spectra of JFHA, SRHA, and LEHA were obtained using a Bruker Apex-ultra Fourier transform ion cyclotron resonance mass spectrometer (FT-ICR MS) equipped with a 9.4 T superconducting magnet and an Apollo II electrospray ion source. The operating parameters largely followed previous studies [28]. Sample solutions were continuously infused into the electrospray ion source at a flow rate of 180 μL/h using a syringe pump. For negative-ion ESI, the spray shield voltage, capillary column introduction voltage, and capillary column end voltage were set to 4.0 kV, 4.5 kV, and −320 V, respectively. For each spectrum, 200-300 transient scans (collected with a 4M-word time domain) were co-added over the m/z 200-700 range. The spectra were externally calibrated with a homologous series from the Suwannee River Natural Organic Matter sample (SRNOM). The mass measurement accuracy for singly charged molecular ions was better than ±1 ppm. Mass peaks with a signal-to-noise (S/N) ratio greater than 6 were exported. Molecular formula assignments for all three samples were limited by the following constraints: 12C (0-100), 13C (0-2), 1H (0-100), 14N (0-3), 16O (0-30), and 32S (0-2) atoms, and only elemental compositions within a ±1 ppm mass measurement error were considered.
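The formula-assignment constraints above can be illustrated with a brute-force search. The sketch below is a minimal illustration, not the assignment software used in this study: it assumes deprotonated, singly charged [M − H]− ions and standard monoisotopic masses, and it adds an integer, non-negative DBE filter that is an assumption not stated in the text. The names `assign_formulas`, `ppm_error`, `MASS`, and `PROTON` are hypothetical.

```python
# Monoisotopic masses (u); PROTON converts between neutral M and [M - H]-.
MASS = {
    "12C": 12.0,
    "13C": 13.0033548378,
    "1H": 1.00782503207,
    "14N": 14.0030740048,
    "16O": 15.99491461956,
    "32S": 31.97207100,
}
PROTON = 1.007276467


def ppm_error(measured_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def assign_formulas(mz, tol_ppm=1.0):
    """Return (12C, 13C, H, N, O, S, error_ppm) tuples matching an [M - H]- ion.

    Element ranges follow the constraints in the text:
    12C 0-100, 13C 0-2, H 0-100, N 0-3, O 0-30, S 0-2.
    """
    neutral = mz + PROTON  # neutral monoisotopic mass M
    hits = []
    for c13 in range(3):
        for n in range(4):
            for s in range(3):
                for o in range(31):
                    for c12 in range(101):
                        fixed = (c12 * MASS["12C"] + c13 * MASS["13C"]
                                 + n * MASS["14N"] + o * MASS["16O"]
                                 + s * MASS["32S"])
                        # One H (~1.008 u) is far wider than a 1 ppm window,
                        # so only the nearest integer H count can match.
                        h = round((neutral - fixed) / MASS["1H"])
                        if not 0 <= h <= 100:
                            continue
                        theo_mz = fixed + h * MASS["1H"] - PROTON
                        err = ppm_error(mz, theo_mz)
                        if abs(err) > tol_ppm:
                            continue
                        # Assumed filter: 2*DBE = 2c - h + n + 2 must be even and >= 0.
                        dbe2 = 2 * (c12 + c13) - h + n + 2
                        if dbe2 < 0 or dbe2 % 2:
                            continue
                        hits.append((c12, c13, h, n, o, s, round(err, 3)))
    return hits


# Usage: the [M - H]- ion of a hypothetical C10H10O5 compound.
mz = 10 * MASS["12C"] + 10 * MASS["1H"] + 5 * MASS["16O"] - PROTON
for hit in assign_formulas(mz):
    print(hit)
```

Solving for H rather than looping over it keeps the search to roughly 10^5 candidates per peak; real workflows typically add further chemical rules and isotope-pattern checks on top of this.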

Molecular Indexes of Humic Acid.
Normalized peak intensities (normalized to the sum of the peak intensities of all identified molecular formulas) were used to semiquantitatively evaluate differences in HA molecular composition [29]. The modified aromatic index (AI-mod), which reflects the number of C=C, C–O, and C=O bonds, was used to assess the aromaticity of compounds. For an assigned molecular formula CcHhOoNnSs, AI-mod = (1 + c − 0.5o − s − 0.5h)/(c − 0.5o − s − n) [30]. The double bond equivalent (DBE) was used to assess the degree of unsaturation of compounds and was calculated as DBE = c − h/2 + n/2 + 1. Intensity-weighted average values are reported for the numbers of carbons (C_wa), hydrogens (H_wa), oxygens (O_wa), nitrogens (N_wa), and sulfurs (S_wa), molecular mass (m/z_wa), hydrogen-to-carbon ratio (H/C_wa), oxygen-to-carbon ratio (O/C_wa), double bond equivalent (DBE_wa), and modified aromatic index (AI-mod_wa). The molecular lability boundary (MLB) is set at H/C = 1.5, and the percentage MLB_L% can be used to evaluate HA lability: NOM constituents above the MLB (H/C ≥ 1.5) correspond to more labile substances (MLB_L), whereas constituents below it (H/C < 1.5) are less labile and more recalcitrant (MLB_R). MLB_L% was calculated by dividing the total intensity of molecular formulas with H/C ≥ 1.5 by the total intensity of all molecular formulas [31]. Carboxyl-rich alicyclic molecules (CRAMs), which are often associated with refractory compounds, are complex mixtures of carboxylated and fused alicyclic structures with carboxyl-C:aliphatic-C ratios of 1:2 to 1:7 [32]. CRAM% was calculated by dividing the total intensity of molecular formulas with DBE/C = 0.30-0.68, DBE/H = 0.20-0.95, and DBE/O = 0.77-1.75 by the total intensity of all molecular formulas [29].
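The index definitions above translate directly into code. The following is a minimal sketch under stated assumptions: each assigned formula is held in a hypothetical `Formula` container with its normalized intensity, AI-mod is set to 0 when its numerator or denominator is not positive (a common convention assumed here, not stated in the text), and the three-peak list is invented for illustration, not data from this study.

```python
from dataclasses import dataclass


@dataclass
class Formula:
    """Hypothetical container: assigned formula CcHhNnOoSs plus peak intensity."""
    c: int
    h: int
    n: int
    o: int
    s: int
    intensity: float


def dbe(f):
    # DBE = c - h/2 + n/2 + 1
    return f.c - f.h / 2 + f.n / 2 + 1


def ai_mod(f):
    # AI-mod = (1 + c - 0.5o - s - 0.5h) / (c - 0.5o - s - n);
    # assumed convention: 0 when numerator or denominator is <= 0.
    num = 1 + f.c - 0.5 * f.o - f.s - 0.5 * f.h
    den = f.c - 0.5 * f.o - f.s - f.n
    return num / den if num > 0 and den > 0 else 0.0


def weighted_average(formulas, value):
    """Intensity-weighted average of value(f), e.g. DBE_wa or H/C_wa."""
    total = sum(f.intensity for f in formulas)
    return sum(value(f) * f.intensity for f in formulas) / total


def intensity_percent(formulas, keep):
    """Share (%) of total intensity carried by formulas where keep(f) is True."""
    total = sum(f.intensity for f in formulas)
    return 100 * sum(f.intensity for f in formulas if keep(f)) / total


def is_mlb_labile(f):
    return f.h / f.c >= 1.5  # at or above the molecular lability boundary


def is_cram(f):
    d = dbe(f)
    return (f.h > 0 and f.o > 0
            and 0.30 <= d / f.c <= 0.68
            and 0.20 <= d / f.h <= 0.95
            and 0.77 <= d / f.o <= 1.75)


# Hypothetical peak list (not data from this study).
peaks = [
    Formula(10, 10, 0, 5, 0, 2.0),  # C10H10O5: DBE 6, H/C 1.0, CRAM-like
    Formula(18, 34, 0, 2, 0, 1.0),  # C18H34O2: DBE 2, H/C ~1.89, MLB_L
    Formula(20, 12, 0, 2, 0, 1.0),  # C20H12O2: DBE 15, AI-mod ~0.74
]
print(weighted_average(peaks, dbe))             # 7.25
print(intensity_percent(peaks, is_mlb_labile))  # 25.0
print(intensity_percent(peaks, is_cram))        # 50.0
```

Because `weighted_average` and `intensity_percent` divide by the summed intensity, they give the same result whether raw or already-normalized intensities are supplied.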
The nominal oxidation state of carbon (NOSC) was used to assess the degradability of DOM [33,34] and was calculated as NOSC = 4 − (4c + h − 3n − 2o − 2s)/c [34]. The assigned molecular formulas were examined using the van Krevelen diagram (v-K diagram) and the modified aromatic index (AI-mod). The v-K diagram, a plot of atomic H/C versus O/C, is an excellent tool for graphically interpreting large datasets and visualizing their chemical nature. Molecular formulas were further assigned to compound groups, such as polycyclic aromatics, highly aromatic compounds, highly unsaturated compounds, and unsaturated aliphatic compounds [30,35,36].
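The NOSC formula and the v-K coordinates can be computed per formula as sketched below. This is a minimal sketch; the function names are hypothetical, and whether the sample-level NOSC values reported later are intensity-weighted averages is an assumption of `weighted_nosc`, not something stated in the text.

```python
def nosc(c, h, n=0, o=0, s=0):
    """Nominal oxidation state of carbon: 4 - (4c + h - 3n - 2o - 2s) / c."""
    return 4 - (4 * c + h - 3 * n - 2 * o - 2 * s) / c


def van_krevelen_point(c, h, o):
    """Coordinates (O/C, H/C) used to place a formula in a v-K diagram."""
    return o / c, h / c


def weighted_nosc(formulas):
    """Intensity-weighted NOSC for a list of ((c, h, n, o, s), intensity) pairs.

    Assumption: sample-level NOSC is taken as an intensity-weighted mean.
    """
    total = sum(i for _, i in formulas)
    return sum(nosc(*f) * i for f, i in formulas) / total


print(nosc(6, 12, 0, 6, 0))          # glucose C6H12O6 -> 0.0
print(van_krevelen_point(6, 12, 6))  # (1.0, 2.0)
```

A carbohydrate such as glucose sits at NOSC = 0, while reduced, lipid-like formulas fall well below zero, which is the gradient the text uses to discuss degradability.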

General Signatures of FT-ICR MS.
Representative spectra of the HAs from soil, river, and leonardite are shown in Figures S1a-S1c. All HA samples have mass spectral peaks concentrated in the range of 200-700 Daltons (Da), and the overall peak shapes of SRHA and LEHA are approximately normally distributed (Figure S1). The numbers of assigned formulas for SRHA, JFHA, and LEHA are 3559, 2246, and 4398, respectively. The C, H, O, N, and S contents, the molecular masses, and the indices derived from them differ distinctly among the samples (Table 1).
JFHA has the highest H content and the lowest O and N contents, leading to the highest H/C and lowest O/C among the three HAs. In addition, JFHA contains more saturated compounds with higher H/C and lower O/C (Figure 1), and its overall molecular mass is smaller (Figure S2b). This indicates that the degree of humification of JFHA is lower, because labile humic acid fractions are marked by increases in lipid components, which have high H/C and low O/C [40]. SRHA has intermediate values for indices such as m/z_wa, H/C_wa, and DBE_wa. LEHA has the lowest H/C and the largest molecular mass, probably because condensed aromatic structures, which have low H/C and large molecular weights, are abundant in LEHA. This is consistent with the previous conclusion that refractory humic acid fractions are marked by increases in condensed aromatic components [40].

Van Krevelen Diagram Analysis.
In the v-K diagram, compounds in the lower layer are more aromatic than those in the upper layer [30]. The compounds of SRHA form a rectangle in the v-K diagram, indicating a relatively homogeneous molecular distribution (Figure 1). The class proportions do not differ significantly whether expressed by the number of molecular formulas or by peak magnitude (Table 2). The compounds are concentrated mainly among highly unsaturated compounds, which account for 51% of the molecular formulas and 54% of the peak magnitudes. Highly aromatic compounds and polycyclic aromatics rank second and third, respectively, accounting for about 10% to 20% (Table 2). The compounds of JFHA are concentrated mainly in the upper and lower layers, with only sparse points in the middle layer (Figure 1). In addition, the class distribution of JFHA differs markedly depending on whether it is expressed by the number of molecular formulas or by peak magnitude. By number of molecular formulas, the compounds are mainly unsaturated aliphatic compounds, highly unsaturated compounds, and polycyclic aromatics, accounting for about 25%. By peak magnitude, the compounds are mainly unsaturated aliphatic compounds (50%), followed by fatty acids (25%) (Table 2). The high peak magnitudes of unsaturated aliphatic compounds, fatty acids, and sulfonic acids arise, first, from the high content of these compounds in JFHA and, second, from their ready ionization in the negative-ion mode of ESI FT-ICR MS [19]. The compounds of LEHA form a right-angled triangle in the v-K diagram; that is, the lower-layer compounds are more abundant and span a wider range than the upper-layer compounds (Figure 1).
There is no significant difference between the distributions of LEHA compounds by molecular formula number and by peak magnitude. The compounds are mainly concentrated in the lower layer, and the largest proportion is polycyclic aromatics (about 50%), followed by highly aromatic compounds and highly unsaturated compounds (about 20-30%) (Table 2). The DBE_wa values for SRHA, JFHA, and LEHA are 10.02, 5.35, and 15.07, respectively, and the AI-mod_wa values are 0.38, 0.17, and 0.59, respectively. These two parameters consistently indicate that LEHA is the most aromatic of the three HA samples, followed by SRHA, with JFHA the least aromatic. This agrees with previous research: DBE values increased in the progression from water-extractable organic matter of Adkins soil to labile humic acid to refractory calcium humic acid, indicating that aromaticity increases with the degree of humification [40]. The compounds that account for a large proportion of LEHA have higher degrees of unsaturation, and the most abundant are polycyclic aromatics, which have the highest aromaticity.
The presence of a large number of aromatic compounds indicates that LEHA is the most aromatic and has the highest degree of humification. The compounds of SRHA and JFHA are concentrated mainly among compounds of intermediate unsaturation, namely, highly unsaturated compounds and unsaturated aliphatic compounds, respectively. Because highly unsaturated compounds are more aromatic than unsaturated aliphatic compounds, SRHA is more aromatic than JFHA.
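The compound-group definitions behind Table 2 are not reproduced in this section, so the sketch below assumes AI-mod and H/C thresholds of the kind commonly used in FT-ICR MS studies: AI-mod > 0.66 for polycyclic aromatics, 0.50 < AI-mod ≤ 0.66 for highly aromatic compounds, H/C ≥ 2.0 for saturated (fatty-acid-like) compounds, 2.0 > H/C ≥ 1.5 for unsaturated aliphatic compounds as stated in the text, and the remainder as highly unsaturated compounds. It illustrates, with a hypothetical peak list, how the same sample can give different class proportions by formula number and by peak magnitude, as observed for JFHA.

```python
from collections import Counter, defaultdict


def ai_mod(c, h, n, o, s):
    num = 1 + c - 0.5 * o - s - 0.5 * h
    den = c - 0.5 * o - s - n
    return num / den if num > 0 and den > 0 else 0.0


def compound_class(c, h, n, o, s):
    """Assign a formula to a class using assumed AI-mod and H/C thresholds."""
    ai, hc = ai_mod(c, h, n, o, s), h / c
    if ai > 0.66:
        return "polycyclic aromatics"
    if ai > 0.50:
        return "highly aromatic compounds"
    if hc >= 2.0:
        return "saturated compounds"  # fatty-acid-like region (assumed cut-off)
    if hc >= 1.5:
        return "unsaturated aliphatic compounds"
    return "highly unsaturated compounds"


def class_proportions(peaks):
    """Map class -> (% of formula number, % of total intensity)."""
    counts, intensities = Counter(), defaultdict(float)
    for formula, intensity in peaks:
        cls = compound_class(*formula)
        counts[cls] += 1
        intensities[cls] += intensity
    n_total = sum(counts.values())
    i_total = sum(intensities.values())
    return {cls: (100 * counts[cls] / n_total, 100 * intensities[cls] / i_total)
            for cls in counts}


# Hypothetical peaks: ((c, h, n, o, s), intensity); not data from this study.
peaks = [
    ((10, 10, 0, 5, 0), 1.0),  # highly unsaturated
    ((18, 34, 0, 2, 0), 5.0),  # unsaturated aliphatic
    ((16, 32, 0, 2, 0), 3.0),  # saturated
    ((20, 12, 0, 2, 0), 1.0),  # polycyclic aromatic
]
for cls, (by_number, by_intensity) in class_proportions(peaks).items():
    print(f"{cls}: {by_number:.0f}% of formulas, {by_intensity:.0f}% of intensity")
```

Here every class holds 25% of the formulas, yet the unsaturated aliphatic class carries 50% of the intensity, mirroring how easily ionized aliphatic species can dominate peak magnitudes.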
DBE is roughly positively correlated with m/z in all three HA samples. Compounds with DBE greater than 20 are distributed across the m/z range of 450-700 Da, and both the number and the peak magnitudes of these molecular formulas increase in the order JFHA, SRHA, and LEHA (Figure 2). DBE is roughly negatively related to H/C in all three HA samples, and these high-DBE compounds are polycyclic aromatics and highly aromatic compounds (Figure 3).

Table 1 notes: … nitrogens (N_wa), sulfurs (S_wa), molecular weights (m/z_wa), hydrogen to carbon ratios (H/C_wa), and oxygen to carbon ratios (O/C_wa). b,c Both MLB_L% and CRAM% are obtained by dividing the total intensity of the corresponding molecular formulas by the total intensity of all molecular formulas.
Therefore, it can be inferred that aromatic compounds with higher DBE and larger molecular mass have a significant effect on the aromaticity and humification of HA. Refractory humic acid fractions are marked by increases in condensed aromatic components with the highest DBE [40]. Humification processes involved in soil-derived organic matter formation are progressions in which H/C ratios are steadily reduced [40] and the maximum molecular mass of condensed aromatic components increases [19]. JFHA contains more unsaturated aliphatic compounds, whose molecular formulas are distributed across a DBE range of 0-5 and an m/z range of 400-600; that is, these compounds have low unsaturation and large molecular mass (Figure 2). In SRHA, most molecular formulas of unsaturated aliphatic compounds (2.0 > H/C ≥ 1.5) are distributed in the m/z range of 200-400, with a few in the m/z range of 400-500. In JFHA, the molecular formulas of unsaturated aliphatic compounds are densely distributed in the m/z range of 200-600. The distribution of unsaturated aliphatic compounds in LEHA is similar to that in SRHA, though sparser. It is inferred that the abundant unsaturated aliphatic compounds in JFHA have large molecular masses. The distributions of highly unsaturated compounds and highly aromatic compounds in SRHA and LEHA are similar, whereas in JFHA they are sparser and span a narrower range. Polycyclic aromatics in LEHA are distributed across the m/z range of 200-700, with the widest distribution range and the largest peak magnitudes. LEHA has the highest content of polycyclic aromatics, which also have the largest molecular masses, indicating that LEHA has the strongest aromaticity [40] (Figure S2). Aromatic compounds with larger molecular weights and higher unsaturation make the most important contribution to the overall aromaticity of HAs.
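The qualitative statements that DBE rises with m/z and that formulas with DBE > 20 accumulate from JFHA to LEHA could be quantified as sketched below. This is a hedged illustration, not the authors' analysis: the text reports no correlation coefficient, the hand-written Spearman rank correlation is a swapped-in technique, and the function names and peak list are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def high_dbe_summary(peaks, threshold=20):
    """Count and intensity share (%) of formulas with DBE above `threshold`.

    `peaks` is a list of (mz, dbe, intensity) tuples.
    """
    selected = [p for p in peaks if p[1] > threshold]
    total = sum(p[2] for p in peaks)
    return len(selected), 100 * sum(p[2] for p in selected) / total


# Hypothetical peaks (mz, DBE, intensity); not data from this study.
peaks = [(250.1, 5, 2.0), (410.2, 12, 1.0), (520.3, 21, 0.5), (650.4, 25, 0.5)]
mzs = [p[0] for p in peaks]
dbes = [p[1] for p in peaks]
print(spearman(mzs, dbes))      # 1.0 (monotonic in this toy data)
print(high_dbe_summary(peaks))  # (2, 25.0)
```

A rank correlation suits the "roughly positive" relationship described in the text because it measures monotonic trends without assuming linearity.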
The van Krevelen distributions of the molecular formulas identified in each HA are included in the Supporting Information (Figure S3). Across all HAs, the assigned molecular formulas consisted primarily of C, H, and O (CHO), followed by formulas with additional N (CHON1 and CHON2) and S (CHOS) [11]. In all three HAs, CHO formulas occur in every compound class, whereas CHON and CHOS formulas occur only among compounds with a high degree of unsaturation. CHOS formulas occur among polycyclic aromatics and highly aromatic compounds in SRHA and LEHA, and in SRHA they also occur among highly unsaturated compounds with relatively low unsaturation. CHON1 formulas are present in all three HAs, and their proportion increases with the degree of humification, in the order JFHA < SRHA < LEHA. CHON2 formulas, which have the highest aromaticity, occur among the polycyclic aromatics of JFHA and LEHA; the aromaticity and unsaturation of CHON2 formulas are higher than those of CHON1 formulas, consistent with previous conclusions on FA [41]. Of all formulas assigned to peaks detected by ultrahigh resolution mass spectral analysis (n = 5329), approximately 24% (n = 1282) are shared among all HA samples. These common formulas consist of CHO and CHON1 formulas, and the CHON1 formulas constitute highly aromatic compounds and polycyclic aromatics (Figure 4(a)). These common molecular formulas are similar regardless of climatic region, land use, diagenetic process [42,43], and other factors. Compounds with high aromaticity and unsaturation are more abundant than other compounds, and heteroatom-containing compounds are more aromatic and unsaturated.
These compounds represent the accumulation of a similar pool of refractory molecules [44] resulting from the removal of labile DOM components by microbial processing [45] or photodegradation [46]. There are 492 molecular formulas identified only in JFHA, most of which are unsaturated aliphatic compounds with 1.5 ≤ H/C < 2.0 that can be categorized into biolabile compound classes including carbohydrates, proteins, and lipids [20,30]. Therefore, JFHA is mainly derived from autochthonous sources [47,48], and these lipid-like compounds may be products of secondary soil microbes [49]. There are 1483 molecular formulas detected only in SRHA. These compounds have higher O/C and contain more heteroatom-containing formulas (CHON1 and CHOS1), indicating that SRHA has been anthropogenically impacted, for example, by urban areas and cropland. Anthropogenic inputs to rivers have altered in-stream DOM composition and increased heteroatom content [11]. Of the molecular formulas, 39% are detected only in LEHA; they include all classes of heteroatom-containing formulas, and their overall shape in the v-K diagram is a right-angled triangle. Going from CHO to CHON1 to CHON2 (CHOS1), the area occupied by each class of molecular formulas becomes progressively smaller and concentrates in the lower layer of the v-K diagram, that is, the region of aromatic compounds. Therefore, the aromaticity and unsaturation of heteroatom-containing molecular formulas are significantly higher. Molecular formulas enriched in O and depleted in H with increased AI-mod reflect terrestrial inputs from plant-derived biomacromolecules such as lignins, tannins, and carboxyl-rich alicyclic molecules [20,30,32,50]. Compounds containing heteroatoms (nitrogen and sulfur) are more aromatic [16], and the more heteroatoms a compound contains, the stronger its aromaticity. Moreover, HAs with a higher degree of humification contain more heteroatom-containing compounds.
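The formula bookkeeping behind this comparison (heteroatom classes such as CHO, CHON1, CHON2, and CHOS1, and the formulas shared by all HAs or unique to one of them) reduces to counting and set operations. The sketch below assumes each sample is represented as a set of (c, h, n, o, s) tuples; the function names and toy formula sets are hypothetical, and only the final line reuses numbers from the text (1282 shared out of 5329 formulas).

```python
from collections import Counter


def heteroatom_class(c, h, n, o, s):
    """Label a formula as CHO, CHON1, CHON2, CHOS1, ... by its N and S counts."""
    label = "CH" + ("O" if o else "")
    if n:
        label += f"N{n}"
    if s:
        label += f"S{s}"
    return label


def class_shares(formulas):
    """Percentage of formulas (by number) in each heteroatom class."""
    counts = Counter(heteroatom_class(*f) for f in formulas)
    total = sum(counts.values())
    return {cls: 100 * k / total for cls, k in counts.items()}


def overlap_summary(samples):
    """Return (union size, shared size, {sample: number of unique formulas})."""
    union = set().union(*samples.values())
    shared = set.intersection(*samples.values())
    unique = {}
    for name, formulas in samples.items():
        others = set().union(*(f for k, f in samples.items() if k != name))
        unique[name] = len(formulas - others)
    return len(union), len(shared), unique


# Hypothetical formula sets (c, h, n, o, s); not data from this study.
samples = {
    "JFHA": {(10, 10, 0, 5, 0), (18, 34, 0, 2, 0), (16, 32, 0, 2, 0)},
    "SRHA": {(10, 10, 0, 5, 0), (18, 34, 0, 2, 0), (12, 12, 0, 6, 0)},
    "LEHA": {(10, 10, 0, 5, 0), (20, 12, 0, 2, 0), (19, 11, 1, 2, 0)},
}
print(overlap_summary(samples))  # (6, 1, {'JFHA': 1, 'SRHA': 1, 'LEHA': 2})
print(class_shares(samples["LEHA"]))
print(round(100 * 1282 / 5329, 1))  # 24.1, the shared fraction reported in the text
```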

Molecular Lability of Humic Acids.
The quality of HA is determined by its mixture of chemical species, including labile and recalcitrant compounds. Microbially derived bioavailable fractions are classified as more labile organic matter and have been correlated with high H/C and low O/C ratios in the lipid-, protein-, and amino-sugar-like regions of the v-K diagram [48,51-54]. Less labile, more recalcitrant components from each carbon source correspond to molecular species that cluster in the lignin-, tannin-, and black-carbon-like regions of the v-K diagram [31]. MLB_L molecular richness ranked the HAs from most to least labile as JFHA > SRHA > LEHA, with MLB_L% values of 75.47%, 15.33%, and 7.88%, respectively. JFHA represents the most purely autochthonous, microbially derived HA and has the highest contributions of hydrogen-saturated species and labile fractions. CRAM formulas are commonly linked with refractory compounds and are widely detected in diverse environments, including the deep ocean [32]. The structural diversity of CRAM and its substantial content of alicyclic rings and branching contribute to its resistance to biodegradation and its refractory nature [32]. CRAM content decreases in the order SRHA > LEHA > JFHA (51.91%, 18.25%, and 8.28%, respectively). CRAM appears to consist mainly of the decomposition products of biomolecules, as indicated by the prevalence of its carboxyl groups and by a degree of oxidation that increases with decreasing molecular size [32]. Therefore, SRHA contains plant-derived lignin- and tannin-like compounds with relatively small molecular masses. SRHA may be the most significant HA with respect to pollutants, because CRAM affects the reactivity of organic matter and the bioavailability of associated nutrients and trace metals. The nominal oxidation state of carbon (NOSC) is a formal parameter that gives the average oxidation state of all carbons in each chemical formula, regardless of chemical structure [33].
The NOSC ranked the HAs from highest to lowest as LEHA > SRHA > JFHA, with values of −0.07, −0.25, and −1.16, respectively. DOM degrades along the gradient from aromatics to aliphatics and from high to low nominal oxidation states of carbon [34]. JFHA is the most labile HA because it contains a large amount of unsaturated aliphatic compounds. SRHA readily combines with nutrients and trace metals, affecting their bioavailability. LEHA shows the highest nominal oxidation state of carbon owing to its higher degree of humification.

Conclusions
Humic acids from soil, river, and leonardite sources were analyzed at the molecular level by ultrahigh resolution mass spectrometry, revealing multiple similarities and differences. Regardless of HA source, compounds containing more heteroatoms (such as nitrogen and sulfur) are more unsaturated and aromatic. AI-mod and DBE consistently indicated that the degree of humification decreases in the order LEHA > SRHA > JFHA. JFHA contains abundant compounds with large molecular mass and a low degree of unsaturation. In SRHA, different compound classes are uniformly distributed, especially lignin- and tannin-like compounds with a relatively high degree of unsaturation. Owing to its high degree of humification, LEHA contains abundant aromatic compounds that involve more heteroatoms and have a high degree of unsaturation and large molecular weight. Differences in molecular composition give rise to diverse HA lability. JFHA is relatively labile and prone to interact with microorganisms. The lignin- and tannin-like compounds of SRHA stem from plant-derived organic matter and readily combine with nutrients and trace metals, reducing the bioavailability of SRHA. LEHA is relatively less labile because of its abundant aromatic compounds. Our results reveal, at the molecular level, the diversity of HA arising from different degrees of humification and the different ways in which HAs combine with other substances owing to their different lability. Ultrahigh resolution FT-ICR MS is a powerful tool for addressing these novel hypotheses and is of great significance for studying the role of HA from different sources in pollutant transformation and the geochemical cycle at the molecular level.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.

Supplementary Materials
Figure S1 includes ESI FT-ICR mass spectra of (a) humic acids from Jiufeng forest soil, (b) humic acids from Suwannee River, and (c) humic acids from leonardite. Figure S2 includes H/C-m/z diagrams of (a) humic acids from Jiufeng forest soil, (b) humic acids from Suwannee River, and (c) humic acids from leonardite. Figure S3 includes van Krevelen diagrams from the mass spectra of (a) humic acids from Jiufeng forest soil, (b) humic acids from Suwannee River, and (c) humic acids from leonardite. Figure S4 includes van Krevelen diagrams of molecular formulas (a) common to humic acids from Jiufeng forest soil and Suwannee River, (b) common to humic acids from Jiufeng forest soil and leonardite, and (c) common to humic acids from Suwannee River and leonardite.