Structure and Identification of Solenin: A Novel Fibrous Protein from Bivalve Solen grandis Ligament

Fibrous proteins derived from natural sources have served as templates for the production of new materials for decades, and most of them have been modified to improve mechanical performance. Insight into the structures of fibrous proteins is a key step in fabricating bioinspired materials. Here, we reveal the microstructure of a novel fibrous protein, solenin, from the Solen grandis ligament and identify the protein by MALDI-TOF-TOF-MS and LC-MS-MS analyses. We found that the protein fiber has no hierarchical structure and is homologous to keratin type II cytoskeletal 1 and keratin type I cytoskeletal 9-like, containing “SGGG,” “SYGSGGG,” “GS,” and “GSS” repeat sequences. Secondary structure analysis by FTIR shows that solenin is composed of 41.8% β-sheet, 16.2% β-turn, 26.5% α-helix, and 9.8% disordered structure. We believe that the β-sheet structure and the repeat sequences, which form “glycine loops,” may give solenin excellent elastic and flexible properties to withstand the tensile stress caused by repeated opening and closing of the shell valves in vivo. This paper contributes a novel fibrous protein to the field of protein materials.


Introduction
Fibrous proteins have long attracted attention because of their excellent tensile strength, elastic properties, biocompatibility, and potential for fabricating biomaterials. The most common fibrous proteins include collagen, elastin, silk fibroin, and keratin, whose microstructures, composition, primary structures, and mechanical properties have been studied in detail [1][2][3]. Mussel byssal thread, a fibrous biomaterial from bivalve molluscs, has also been investigated for decades. This hair-like fiber can be divided into three regions, an elastic proximal thread, a stiff distal thread, and an adhesive plaque, and it was found to contain collagenous and elastin-like domains that play a key role in its extraordinary mechanical properties [4][5][6]. Although these proteins have been studied in detail, knowledge about fibrous proteins in bivalve ligaments is still limited.
Bivalve ligament is an elastic calcified structure which connects two shell valves dorsally and functions like a coil spring to open the valves when adductor muscles relax.
Most bivalve ligaments are divided structurally into an outer uncalcified protein layer and an inner calcified layer made up of aragonite and matrix proteins [7]. They have attracted the interest of materials scientists as elastic biocomposite materials whose particular structure gives excellent mechanical properties [8]. Generally, an intact ligament is composed of about 40% protein and 60% calcium carbonate [9]. These proteins have been studied for decades. Earlier studies focused mainly on amino acid composition analysis. For instance, Kelly and Rice [10] and Kahler et al. [9] reported that ligament proteins contain high contents of glycine and methionine but are devoid of hydroxyproline and hydroxylysine. Later, Kikuchi et al. [11] confirmed these results and identified two components, desmosine and isodesmosine, acting as cross-links in ligament protein. More recent studies have addressed gene cloning and secondary structure analysis of the rubber-like protein abductin of scallop and of synthetic peptides inspired by abductin [12,13]. Beyond these, few studies have addressed proteins, especially fibrous proteins, in other bivalve ligaments. Recently, we found two novel fibrous proteins in Siliqua radiata and Solen grandis ligaments. The former, named K58, was found to be homologous to keratin type II cytoskeletal 1 [14,15]; the latter, which has not been studied previously, is composed of protein fibers with a diameter of about 120 nm (Figure 1) [16].
This paper aims to observe the microstructure of the fibrous protein (FP), identify it by MALDI-TOF-TOF-MS and LC-MS-MS analyses, and reveal its secondary structure by FTIR analysis.

Sample Preparation and Observations.
S. grandis (Figure 1(a)) was freshly collected from Beihai city of Guangxi in southern China. After removing the soft body, ligaments were separated from the shells mechanically, washed with deionized water, and air-dried. Then, we stripped away the outer layer and took an optical photo of the protein (Figure 1(b)) using a microscope equipped with a CCD camera. To obtain the SEM image (Figure 1(c)), FP was coated with gold and observed with an SEM (S-3400N, Hitachi) operated at an accelerating voltage of 30 kV.

TEM Observations.
To observe the detailed structure of FP, we carried out TEM observations. First, we soaked FP in 3% glutaraldehyde for 3 days and then fixed it with 1% osmic acid for 2 h. After being dehydrated gradually with ethanol-acetone solution, FP was infiltrated with acetone-epoxy resin for 24 h. Then, the embedded FP was cut into 70 nm thick longitudinal and transverse sections with a Leica UC7 ultramicrotome. Finally, the sections were stained with uranyl acetate-lead citrate solution and observed with a Hitachi H-7650 TEM at an acceleration voltage of 100 kV.

Sodium Dodecyl Sulphate-Polyacrylamide Gel Electrophoresis (SDS-PAGE).
For biochemical analysis, FP was ground into powder in liquid nitrogen. Then, 20 mg of powder was treated with 7 M urea, 3% 2-mercaptoethanol, and 0.5 M NaOH solution at 65 °C for 1 h. After centrifugation at 12 000 rpm and 4 °C for 20 min, we discarded the supernatant and treated the residue under the same conditions, except that the NaOH solution was diluted to 0.25 M. The sample solution was then obtained by centrifugation and dialyzed in a dialysis bag with a 14 kDa molecular weight cutoff for three days. The dialysate was concentrated under vacuum, treated with 2D-Clean-Up Kit (GE Healthcare), and redissolved in lysis buffer.
After quantification by Bradford assay, 26.1 μg of protein sample and molecular weight standards (MW 14.4-94.0 kDa, TianGen Biotech) were applied to SDS-PAGE on a 12% separating gel using a JY 600 electrophoresis system (JunYi Technology). Electrophoresis was followed by silver staining, and gel band 1-1 (∼78 kDa) in lane 1 (Figure 2) was excised for trypsin digestion.

In-Gel Trypsin Digestion and MALDI-TOF-TOF-MS Analysis.
Gel band 1-1 was destained, dehydrated, and dried under vacuum. Then, gel pieces were rehydrated with 10 ng μL−1 trypsin in 40 mM NH4HCO3 and 10% acetonitrile solution for 45 min in an ice bath. After incubation at 37 °C for 16 h, peptides were extracted from the gel twice with 50% acetonitrile and 0.1% trifluoroacetic acid. The extract was vacuum-dried, redissolved in 0.1% trifluoroacetic acid, and mixed with α-cyano-4-hydroxycinnamic acid (CHCA). The mixture was applied to a 4800 mass spectrometer (Applied Biosystems, Framingham) running in positive ion reflector mode in the mass range of 800-4000 Da. The ten most intense ions were selected for MS-MS analysis with an acceleration voltage of 20 kV.
The acquired MS and MS-MS data were combined and searched against protein databases at NCBI and Swiss-Prot using the Mascot search engine with GPS Explorer Software (Applied Biosystems). Mass tolerances of 200 ppm for MS and ±0.3 Da for MS-MS were set, and variable modifications, such as oxidation, methylation, and phosphorylation, were considered. The protein score calculated by the software was used to judge correct identification.
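The two tolerances above are of different kinds: the MS tolerance is relative (ppm) and therefore scales with mass, whereas the MS-MS tolerance is absolute (Da). The following minimal sketch illustrates the arithmetic only; the function names are hypothetical and do not reproduce how Mascot or GPS Explorer applies these settings internally.

```python
from typing import Tuple


def ppm_window(mass: float, tol_ppm: float) -> Tuple[float, float]:
    """Return the (low, high) bounds around a mass for a relative tolerance in ppm."""
    delta = mass * tol_ppm / 1e6
    return mass - delta, mass + delta


def within_ppm(observed: float, theoretical: float, tol_ppm: float) -> bool:
    """True if the observed mass lies within tol_ppm of the theoretical mass."""
    return abs(observed - theoretical) <= theoretical * tol_ppm / 1e6


def within_da(observed: float, theoretical: float, tol_da: float) -> bool:
    """True if the observed mass lies within an absolute tolerance in Da."""
    return abs(observed - theoretical) <= tol_da


# Illustrative masses only (not taken from the spectra of this study).
low, high = ppm_window(1000.0, 200.0)
print(f"200 ppm window at 1000 Da: {low:.2f} to {high:.2f}")
print(within_da(500.2, 500.0, 0.3))
```

At the limits of the 800-4000 Da acquisition range, a 200 ppm tolerance corresponds to about ±0.16 Da and ±0.8 Da, respectively.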

LC-MS-MS Analysis.
The FP sample solution was separated by SDS-PAGE again, and the corresponding gel band was excised and digested with trypsin. The digested sample was dissolved in 0.1% formic acid and desalted with 0.2% formic acid on a Zorbax 300 SB C18 peptide trap. Then, peptides were separated on a reversed phase C18 column (0.15 mm × 150 mm, Column Technology Inc.) with a linear gradient of 0-50% mobile phase A (0.1% formic acid and 84% acetonitrile) in mobile phase B (0.1% formic acid) over 60 min. Separated peptides were eluted into an LTQ linear ion trap mass spectrometer (Thermo Finnigan) equipped with a microspray source running in data-dependent mode with a spray voltage of 3.4 kV at 200 °C and a full scan mass range of 300-1800 Da. Dynamic exclusion was enabled with a repeat count of 2 and an exclusion duration of 1.5 min. The ten most intense ions in every full scan were selected automatically for MS-MS analysis.
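As a worked illustration of the linear gradient described above, the sketch below computes the proportion of mobile phase A and the resulting acetonitrile content at a given time. It assumes that percentages are by volume and that mobile phase B contains no acetonitrile; the function name and its parameters are hypothetical.

```python
from typing import Tuple


def gradient_composition(
    t_min: float,
    duration_min: float = 60.0,
    start_pct_a: float = 0.0,
    end_pct_a: float = 50.0,
    acn_in_a_pct: float = 84.0,
) -> Tuple[float, float]:
    """Return (% mobile phase A, % acetonitrile) at time t_min of a linear gradient.

    Assumes mobile phase B contributes no acetonitrile.
    """
    if not 0.0 <= t_min <= duration_min:
        raise ValueError("t_min must lie within the gradient duration")
    fraction_elapsed = t_min / duration_min
    pct_a = start_pct_a + fraction_elapsed * (end_pct_a - start_pct_a)
    pct_acn = pct_a * acn_in_a_pct / 100.0
    return pct_a, pct_acn


for t in (0.0, 30.0, 60.0):
    pct_a, pct_acn = gradient_composition(t)
    print(f"t = {t:4.1f} min: {pct_a:.1f}% A, {pct_acn:.1f}% acetonitrile")
```

Under these assumptions the gradient ends at 50% A, that is, 42% acetonitrile.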
The MS-MS spectra were searched against protein databases using the SEQUEST algorithm. All SEQUEST searches were performed with Bioworks 3.2 software (Thermo Finnigan) using the following parameters: fully tryptic peptides, parent mass tolerance 1.4, and peptide mass tolerance 1.5. Delta CN (≥0.1) and Xcorr (≥1.9 for singly charged, ≥2.2 for doubly charged, and ≥3.75 for triply charged peptides) were used as criteria for identification.
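The identification criteria above amount to a charge-dependent threshold filter. A minimal sketch is given below; the `PeptideMatch` record and the example values are hypothetical, and rejecting charge states other than 1-3 is an assumption, since the criteria do not cover them. This is not the Bioworks implementation.

```python
from dataclasses import dataclass

# Thresholds as stated in the text: minimum Xcorr per charge state and minimum Delta CN.
XCORR_MIN = {1: 1.9, 2: 2.2, 3: 3.75}
DELTA_CN_MIN = 0.1


@dataclass
class PeptideMatch:
    """Hypothetical record for one peptide-spectrum match."""
    sequence: str
    charge: int
    xcorr: float
    delta_cn: float


def passes_filter(match: PeptideMatch) -> bool:
    """Apply the charge-dependent Xcorr threshold and the Delta CN threshold."""
    threshold = XCORR_MIN.get(match.charge)
    if threshold is None:
        # Assumption: charge states not covered by the criteria are rejected.
        return False
    return match.xcorr >= threshold and match.delta_cn >= DELTA_CN_MIN


# Illustrative matches only (sequences and scores are made up).
matches = [
    PeptideMatch("SGGGR", 2, 2.5, 0.20),
    PeptideMatch("FGSSR", 3, 3.5, 0.30),
    PeptideMatch("GSGGK", 1, 2.0, 0.05),
]
accepted = [m.sequence for m in matches if passes_filter(m)]
print(accepted)
```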

Results and Discussion
TEM Observations.
The morphology of FP has been observed by SEM in Figure 1(c) and in a previous study [16]. However, owing to the limited resolution of SEM, it is hard to observe the fine structure of FP, especially the detailed structure of transverse sections of the protein fibers. Using TEM, we found that transverse sections of these protein fibers are roundish, with diameters ranging from 130 to 200 nm, and that they are arranged in order, connected one by one like strings of pearls (Figure 3(a)). It is worth noting that the diameters of protein fibers in the TEM image are larger than those in SEM (about 120 nm). This inconsistency is very likely due to the pretreatment of FP for TEM observation. At higher magnification (Figure 3(b)) and in longitudinal view (Figures 3(c) and 3(d)), we did not find any microfibrils within these protein fibers, which means that the fibers have no hierarchical structure and are not constructed from microfibrils. This structural feature differs from keratin and collagen fibers, both of which have complex hierarchical structures and are assembled from microfibrils as elementary building blocks [1,3]. These results imply that FP may be a new kind of fibrous protein.

MALDI-TOF-TOF-MS Results and Data Analysis.
In a previous study [14], we found that FP has strikingly high Gly, Asp, Met, and Phe contents but contains only a trace amount of Hyp, which indicates that FP is not collagen. To identify this unique fibrous protein, mass spectrometry was performed, and the results are presented below.
Although FP was highly matched with keratin type II cytoskeletal 1 (K1) and keratin type I cytoskeletal 9 (K9), this does not mean that FP is K1 or K9, because the molecular weight of FP (∼78 kDa) (Figure 2) is higher than that of K1 (∼64 kDa) and K9 (∼63 kDa) [19,20]. K1 and K9 are fibrous proteins belonging to the keratin family. In terms of structure, both can be divided into three domains: a highly conserved central rod domain acting as the basic structural framework, and an N-terminal head domain and a C-terminal tail domain that vary among proteins [21]. Of the matching peptides in Tables 1 and 2, 70% and 55% belong to the conserved central rod domains of K1 and K9, respectively. These high matching rates imply that FP contains conserved domains similar to those of K1 and K9, and that differences in the head and tail domains are what distinguish FP from K1 and K9. Therefore, we consider FP to be a novel fibrous protein homologous to K1 and K9. Since it was found in the ligament of the bivalve S. grandis and identified for the first time, we named it solenin.

LC-MS-MS Results and Data Analysis.
LC-MS-MS analysis shows the same results as MALDI. Similarly, no proteins were matched when the spectra were searched against the mollusc protein database, but K1 and K9 were matched with 56.7% and 64.2% sequence coverage, respectively. These results again indicate that solenin is homologous to K1 and K9. Interestingly, we also found that a matrix protein from the inner layer of the S. grandis ligament has high homology with K1, which suggests that solenin is likely to assemble from matrix proteins (to be considered for publication elsewhere). It is known that matrix proteins from bivalve shells can control the formation of biominerals [22]. We have also found that solenin can control pure aragonite formation under ambient conditions in vitro, which suggests that solenin may serve as a template for fabricating biocomposite materials.
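Sequence coverage, as quoted above, is the percentage of residues in the reference protein that are covered by at least one matched peptide. The sketch below shows one simple way to compute it by exact substring matching; the protein and peptide strings are short hypothetical examples, not K1 or K9 sequences, and search software may count coverage differently.

```python
from typing import Iterable


def sequence_coverage(protein: str, peptides: Iterable[str]) -> float:
    """Percentage of protein residues covered by at least one matched peptide."""
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for peptide in peptides:
        if not peptide:
            continue
        start = protein.find(peptide)
        while start != -1:
            # Mark every residue spanned by this occurrence of the peptide.
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    return 100.0 * sum(covered) / len(protein)


# Hypothetical 20-residue sequence and two hypothetical tryptic peptides.
toy_protein = "MKSGGGRSYGSGGGKFGSSR"
toy_peptides = ["SGGGR", "FGSSR"]
print(sequence_coverage(toy_protein, toy_peptides))  # 10 of 20 residues -> 50.0
```

Overlapping peptides are counted once per residue, so coverage never exceeds 100%.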
LC-MS-MS results also matched the repeat sequences mentioned (Tables 1 and 2). These sequences, with many repetitive "G" residues, are different from those of other fibrous or elastic proteins, such as "GGFGGMGGGX" of abductin [12] and "GAGAGS," "APGVGV," and "GPGGG" of silk fibroin, elastin, and mussel byssal thread, respectively [23,24]. However, the "GS" and "GSS" repeat sequences of solenin are identical to those of Lustrin A, an extracellular matrix protein from the shell of the mollusc Haliotis rufescens [25]. These glycine- and serine-rich repeats have previously been found to form rubber-like "glycine loops" in Lustrin A and keratins, giving them elastic and flexible properties [25,26]. Such loops should also be present in solenin, as it is homologous to keratin and contains "GS" and "GSS" repeats. In addition, phenylalanine (F) and tyrosine (Y) residues are involved in these repeat domains (Tables 1 and 2); interactions between their aromatic side chains will contribute to the formation of "glycine loops" (Figure 4). These loops may give solenin excellent elastic and flexible properties to withstand the tensile stress caused by repeated opening and closing of the shell valves in vivo.
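Locating repeat motifs such as "SGGG," "SYGSGGG," "GS," and "GSS" in a peptide sequence is a simple string search. The sketch below counts occurrences, including overlapping ones; the example sequence is hypothetical and was constructed only to contain these motifs, so it is not a solenin sequence.

```python
from typing import Dict, Iterable


def count_motif(sequence: str, motif: str) -> int:
    """Count occurrences of a motif in a sequence, allowing overlaps."""
    if not motif:
        raise ValueError("motif must be a non-empty string")
    count = 0
    start = sequence.find(motif)
    while start != -1:
        count += 1
        start = sequence.find(motif, start + 1)
    return count


def motif_counts(sequence: str, motifs: Iterable[str]) -> Dict[str, int]:
    """Return a mapping from each motif to its number of occurrences."""
    return {motif: count_motif(sequence, motif) for motif in motifs}


# Hypothetical sequence built for illustration only.
toy_sequence = "SGGGSYGSGGGFGSS"
print(motif_counts(toy_sequence, ["SGGG", "SYGSGGG", "GS", "GSS"]))
```

Because short motifs are contained in longer ones (every "GSS" also contains a "GS"), the counts are not independent and should be read accordingly.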

FTIR Spectrum and Secondary Structure Analysis.
The FTIR spectrum (Figure 5(a)) of solenin has been discussed in our previous study [16]. Here, the secondary structure of solenin was analyzed by curve-fitting of the amide I band of the spectrum, since this band is frequently used for secondary structure analysis [27,28]. Based on previous studies [17,18] and the curve-fitting results (Figure 5(b)), the fitted Gaussian peaks were assigned to β-sheet, β-turn, α-helix, and disordered structure. Quantitative analysis indicates that solenin is mainly composed of β-sheet structure (41.8%), with 16.2% β-turn, 26.5% α-helix, and a small amount of disordered structure (9.8%). This high β-sheet content contrasts with K1 and K9 (mainly α-helix structure), which confirms that solenin is not K1 or K9. It also implies that the head or tail domain of solenin is mainly composed of β-sheet structure, since its central rod domain is similar to those of K1 and K9. This secondary structural feature is similar to that of silk [29,30] and K58 [14]. It may endow solenin with high tensile strength, as in silk, a strong fibrous protein with a tensile strength of 0.6 GPa [31] that is mainly made up of β-sheet structures.
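In amide I curve-fitting, the share of each secondary structure is commonly estimated as the area of its fitted Gaussian components divided by the total fitted area. The sketch below shows that final step only, assuming each component is described by its height and full width at half maximum (FWHM); the component values are hypothetical and are not the fitted parameters of Figure 5(b).

```python
import math
from typing import Dict, Iterable, Tuple


def gaussian_area(amplitude: float, fwhm: float) -> float:
    """Area under a Gaussian peak from its height and FWHM.

    Uses area = amplitude * sigma * sqrt(2*pi) with sigma = fwhm / (2*sqrt(2*ln 2)).
    """
    return amplitude * fwhm * math.sqrt(math.pi / (4.0 * math.log(2.0)))


def structure_fractions(
    components: Iterable[Tuple[str, float, float]]
) -> Dict[str, float]:
    """Percent of total fitted area per label; components are (label, amplitude, fwhm)."""
    areas: Dict[str, float] = {}
    for label, amplitude, fwhm in components:
        areas[label] = areas.get(label, 0.0) + gaussian_area(amplitude, fwhm)
    total = sum(areas.values())
    if total <= 0.0:
        raise ValueError("total fitted area must be positive")
    return {label: 100.0 * area / total for label, area in areas.items()}


# Hypothetical fitted components: (assignment, peak height, FWHM in cm^-1).
toy_components = [
    ("beta-sheet", 0.40, 20.0),
    ("disordered", 0.10, 18.0),
    ("alpha-helix", 0.25, 20.0),
    ("beta-turn", 0.15, 22.0),
]
for label, percent in structure_fractions(toy_components).items():
    print(f"{label}: {percent:.1f}%")
```

The four fractions reported above sum to 94.3%; the text does not state which components account for the remaining fitted area.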
Although the tensile strength of solenin has not yet been determined because a suitable instrument was not available, solenin is clearly subjected to repeated tensile stress through the frequent closing of the shell valves in vivo [16]. This suggests that solenin has high tensile strength, enabling it to withstand the long-term, successive, rapid movement of the shell valves that accompanies the rapid burrowing habit of this species [16,32]. In addition, solenin shows excellent solvent resistance similar to that of K58 [14], which implies that the protein may serve as a good template for fabricating biomaterials.

Conclusions
Solenin is a novel fibrous protein homologous to K1 and K9. It is not constructed from microfibrils and is different from collagen. Its high content of β-sheet structure (41.8%) may endow solenin with excellent tensile strength and solvent resistance. Insight into the structure of this intriguing natural fibrous protein will provide a new template for fabricating bioinspired materials.

Figure 1: Optical photos and SEM image of the experimental sample. (a), (b) Optical photos of S. grandis and ligament fibrous protein, respectively, and (c) SEM image of the fibrous protein.

Figure 3: TEM observation of FP. (a) Transverse section of FP, (b) enlarged view of transverse section of FP that shows no microfibrils within the protein fibers, (c) longitudinal section of FP, and (d) enlarged view of longitudinal section of FP that shows no microfibrils within the protein fibers.

Figure 4: Schematic diagram of formed (a) and unfolded (b) glycine loops. Blue hexagons represent benzene rings, and red dotted lines represent the interaction between the benzene rings of the phenylalanine and tyrosine side chains.
a M*: oxidized methionine. b Bold peptides are the repeat sequences.