Analysis of Complete Genomes of Propionibacterium acnes Reveals a Novel Plasmid and Increased Pseudogenes in an Acne Associated Strain

The human skin harbors a diverse community of bacteria, including the Gram-positive, anaerobic bacterium Propionibacterium acnes. P. acnes has historically been linked to the pathogenesis of acne vulgaris, a common skin disease affecting over 80% of all adolescents in the US. To gain insight into potential P. acnes pathogenic mechanisms, we previously sequenced the complete genome of a P. acnes strain HL096PA1 that is highly associated with acne. In this study, we compared its genome to the first published complete genome KPA171202. HL096PA1 harbors a linear plasmid, pIMPLE-HL096PA1. This is the first described P. acnes plasmid. We also observed a five-fold increase of pseudogenes in HL096PA1, several of which encode proteins in carbohydrate transport and metabolism. In addition, our analysis revealed a few island-like genomic regions that are unique to HL096PA1 and a large genomic inversion spanning the ribosomal operons. Together, these findings offer a basis for understanding P. acnes virulent properties, host adaptation mechanisms, and its potential role in acne pathogenesis at the strain level. Furthermore, the plasmid identified in HL096PA1 may potentially provide a new opportunity for P. acnes genetic manipulation and targeted therapy against specific disease-associated strains.


Introduction
Acne vulgaris, commonly called acne, is a disease of the pilosebaceous unit of the skin. It affects over 80% of all adolescents in the US [1] and persists into adulthood in 50% of the cases [2,3]. While the etiology of the disease is undefined, four pathogenic mechanisms have been proposed: increased sebum production, changes in the follicle, hormone, and the activity of the follicular microflora [4][5][6][7][8].
Antibiotic treatment is one of the main acne therapies targeting the microbes living in the follicle.
Propionibacterium acnes, a Gram-positive anaerobic bacterium, has been associated with acne pathogenesis, largely due to the fact that it is commonly isolated from acne lesions [9,10] and that it can cause inflammation in the host skin. Contrarily, P. acnes is accepted as a commensal bacterium and in some cases has been shown to play a protective role against invading pathogenic colonization [11]. Our study of the human skin microbiome associated with acne demonstrated that P. acnes dominated the pilosebaceous unit in both healthy individuals and acne patients [12]. However, at the strain level, P. acnes distributions were significantly different in the two cohorts, suggesting that different P. acnes strains may contribute differently to skin health and disease. Therefore, understanding the genetic differences between acne-associated strains and other strains is essential to understanding the phenotypic differences of the strains and their different roles in acne.
Complete genome sequences provide detailed insights into genetic variations among strains, which may explain their phenotypic differences. We previously sequenced a complete P. acnes genome, HL096PA1 [12]. This strain belongs to recA type IA and ribotype (RT) 5, which is highly associated with acne. It is resistant to multiple antibiotics, including tetracycline, clindamycin, and erythromycin with resistanceconferring mutations in the 16S ribosomal RNA (rRNA) gene (G1058C) and the 23S rRNA gene (A2058G). To date, HL096PA1 is the only available complete genome of acneassociated strains [12]. The first sequenced P. acnes strain with a complete genome is KPA171202 [13]. This strain belongs to recA type IB and RT1, which was not specifically associated with acne [12]. The KPA171202 genome is 2.56 M bp long with 60% GC content. 2,333 open reading frames (ORFs) are encoded. To investigate whether genomic variations among P. acnes strains can explain their differences in virulent properties, in this study we performed a detailed genome comparison of the genome of HL096PA1 to KPA171202.

P. acnes HL096PA1 Genome Sequencing, Assembly and
Annotation. HL096PA1 was sequenced using Roche/454 FLX as previously described [12]. The genome was finished by multiple long-range PCRs combined with Sanger sequencing. Genome assembly and annotation were previously described [12]. Extensive manual inspection and editing of the genome annotation were performed. The GenBank accession numbers for HL096PA1 chromosome and plasmid pIMPLE-HL096PA1 are CP003293 and CP003294, respectively.

Genome
Comparison. P. acnes genome visualization and sequence comparison were performed using the ARTEMIS comparison tool (http://www.sanger.ac.uk/software/ACT/) [14]. Best-BLASTp matches with a cutoff E-value of 1 − 10 were used to identify HL096PA1 and KPA orthologous proteins.

Identification and Verification of Pseudogenes.
Predicted partial or truncated HL096PA1 protein-coding ORFs were aligned to homologs or truncated proteins in the nonredundant protein database to identify pseudogenes. For pseudogene verification, primers flanking the gene regions with frameshifts were designed for suspected HL096PA1 pseudogenes. DNA fragments of 500-1,000 bp were generated using PCR. Sanger sequencing was used to sequence the full length of the amplicons to verify the frameshifts.

Verification of Genomic
Inversion. PCR targeting the chromosomal inversion region of HL096PA1 was performed using the primer sets described in Figure 3(b). Each 20 Lreaction contained 13.3 L molecular grade H 2 O, 2 L 10X PCR buffer, 0.6 L 50 mM MgSO 4 , 0.4 L 10 mM dNTPs, 0.8 L primers (final concentration of 0.4 M), 0.1 L Platinum Taq DNA Polymerase High Fidelity (Invitrogen), and 2 L 30 ng/uL genomic DNA template. Genomic DNA extracted from HL096PA1 or KPA171202 was amplified. Thermocycling conditions were as the following: initial denaturation step of 10 minutes at 94 ∘ C, 35 cycles of 45 seconds at 94 ∘ C, 45 seconds at 54 ∘ C, and 7 minutes at 72 ∘ C and a final elongation step of 20 minutes at 72 ∘ C. The primer sequences

Plasmid in HL096PA1
, pIMPLE-HL096PA1. Compared to KPA171202, the most significant specific feature of HL096PA1 is a plasmid. HL096PA1 genome sequencing and assembly revealed a second large contig in addition to the chromosome, which appears to be an extrachromosomal plasmid [12]. The plasmid, named pIMPLE-HL096PA1, is 56,169 bp long. It encodes 74 ORFs (PAGK 2319-PAGK 2392) and one origin of replication (RepA). Several plasmid related proteins, including plasmid partition protein ParA (PAGK 2332) and plasmid stabilization system toxin and antitoxin proteins (PAGK 2321 and PAGK 2322), are encoded. Based on the sequencing coverage and quantitative PCR results, the copy number of the plasmid is three per genome [12]. While our attempts to isolate the plasmid in culture have been unsuccessful possibly due to the low copy number and the large size of the plasmid, we have since identified homologous plasmids in multiple other P. acnes strains that belong to the same lineage as HL096PA1 based on genome sequencing [12]. The assembled sequence contig suggests that pIMPLE-HL096PA1 is a linear plasmid, with hairpin ends used for bidirectional replication. Linear plasmids with terminal inverted repeats (TIRs) and 5 terminally attached proteins have been found in bacteria, fungi, and higher eukaryotes [15][16][17][18][19]. Well-studied examples of human pathogenic bacteria with such linear plasmids include Spirochaete Borrelia and Filamentous Streptomyces [16,[20][21][22][23][24]. TIRs have been shown to be essential for bidirectional linear plasmid replication [20,[22][23][24]. Similar to the linear plasmids observed in Borrelia and Streptomyces, pIMPLE-HL096PA1 contains a 20 bp short TIR with a sequence of ACGACACCAGCACCCACAAC at each end of the plasmid. Due to the difficulty in assembling the repeat sequences at the ends, the lengths of both ends of the plasmid are not fully defined but were assembled to the same. Based on the assembled plasmid sequence, the left arm TIR is located 743 bp from the start of the plasmid sequence, spanning the first ORF (PAGK 2319, an unknown protein). The right arm TIR is located 743 bp from the 3 end of the plasmid sequence. Analysis of the sequences at the ends of the plasmid reveals potential hairpin structures within the repeat sequences. The last ORF on the plasmid, PAGK 2392, is located near the right arm TIR and encodes resolvase A (ResA). ResA may facilitate fusion and resolution of hairpin telomeres during plasmid replication. It may cut the circular-like plasmid after replication into two linear plasmids with hairpin ends, using a mechanism similar to telomere resolvase (ResT), an enzyme commonly encoded by linear plasmids and shown to facilitate resolution of hairpin telomeres during plasmid replication [25].
The genes encoded on pIMPLE-HL096PA1 have little redundancy with the HL096PA1 chromosome. The plasmid shares only two genes (PAGK 2383 and PAGK 2391) with the chromosome. However, it encodes 12 genes (PAGK 2331, PAGK 2332, and PAGK 2337-PAGK 2346) homologous to the genes in KPA171202, which are absent on the HL096PA1 chromosome and are mostly hypothetical proteins with unknown functions. It also encodes an additional esterase/lipase (PAGK 2386). Lipases have been shown to contribute to growth advantage and possible pathogenicity of P. acnes in acne. Lipases hydrolyze triacylglycerols to free fatty acids, which may reduce the invasion of other skin pathogens but may also lead to inflammation and the development of acne when over-produced [26,27].
Significant horizontal gene transfer is evident within pIMPLE-HL096PA1. Thirty-five of the 74 predicted ORFs (PAGK 2344-PAGK 2378) are highly homologous to ORFs CLOLEP 00122-CLOLEP 00166 encoded in Clostridium leptum DSM 753 ( -value = 1 − 100 to 0) (Figure 2(a)), a Gram-positive commensal bacterium found in the human gut. The same region is also highly homologous to the genomes of Propionibacterium humerusii strains HL037PA2, HL037PA3, HL044PA1, and P08 [28]. P. humerusii is a bacterium found on the human skin [12] and is closely related to P. acnes. It shares several additional ORFs with pIMPLE-HL096PA1 (Figure 2(a)). Based on our sequencing data of strains HL037PA2, HL037PA3, and HL044PA1, the homologous genes in these P. humerusii genomes had a three to five folds higher sequencing coverage than other genes in the genomes (Figure 2(c)). Although these genomes are not complete genomes, the higher sequencing coverage of these genes suggests that they may also be located on an extrachromosomal mobile genetic element in P. humerusii. It is unclear whether this set of ORFs was horizontally transferred among  )  2319  2320  2321  2322  2323  2324  2325  2326  2327  2328  2329  2330  2331  2332  2333  2334  2335  2336  2337  2338  2339  2340  2341  2342  2343  2344  2345  2346  2347  2348  2349  2350  2351  2352  2353  2354  2355  2356  2357  2358  2359  2360  2361  2362  2363  2364  2365  2366  2367  2368  2369  2370  2371  2372  2373  2374  2375  2376  2377  2378  2379  2380  2381  2382  2383  2384  2385  2386  2387  2388  2389  2390  2391  P. acnes, P. humerusii, and C. leptum or transferred from an ancestor organism. Further investigation into the relationships among these microbes that primarily reside at different body sites will provide insight to how genetic materials were transferred and how this mechanism affected the virulent properties of human commensals. Within this cluster of C. leptum homologs on pIMPLE-HL096PA1, we identified a Tad (tight adhesion) locus [12]. Tad locus was reported to mediate colonization, pathogenesis, and tight adhesion of bacteria to host cell surfaces through Flp pilus assembly in Gram-positive pathogens, such as Corynebacterium dipheriae, Mycobacterium tuberculosis, Mycobacterium bovis, and Streptomyces coelicor [29][30][31]. The organizations and annotations of the ORFs in this locus are shown in Figure 2 and Table 2. ORFs PAGK 2360-PAGK 2370 encode a set of Tad proteins, including Flp (fimbrial low molecular-weight proteins) pilus proteins, TadZ, TadA, TadB, TadC, and TadE. The transcription of Tad locus in HL096PA1 appears to be controlled by a sigma 70 family transcription factor (PAGK 2357) and an antisigma factor (PAGK 2358) located immediately upstream of the Tad locus. Sigma 70 family transcription factors are commonly found upstream of virulence-associated gene loci and have been shown to regulate virulence expression in response to particular stress-related stimuli in several pathogenic bacteria such as Vibrio cholera [32]. This suggests that Tad locus expression in HL096PA1 may be induced under specific conditions, leading to enhanced colonization and adhesion to host cells. While KPA171202 does not possess such a plasmid and the Tad locus, majority of the RT4 and RT5 strains of P. acnes, which are highly associated with acne, contain a homologous plasmid and the Tad locus [12]. The unique    presence of such a plasmid in acne associated strains may potentially explain their role in acne pathogenesis and can possibly be used as a pathogenicity marker.

Pseudogenes in HL096PA1.
Pseudogenes tend to be abundant in bacterial species that have recently adapted to eukaryotic hosts or a pathogenic lifestyle [33]. Thus identification of pseudogenes in disease associated strains may provide insight to their host adaptation and pathogenic mechanisms. We identified 79 predicted pseudogenes in HL096PA1 genome (Table 3). In addition to confirmation of the pseudogenes based on the high-quality 454 sequence reads, we selected 17 out of the 79 identified pseudogenes and confirmed the frameshifts in all 17 genes by PCR and Sanger sequencing. Most pseudogenes represent recent gene disruptions and are usually unique to a particular genome [34]. HL096PA1 and KPA171202 share only four predicted pseudogenes. Compared to the 14 frameshifted ORFs encoded in the KPA171202 genome [13], HL096PA1 harbors a significantly increased number of pseudogenes. This may suggest a faster reduction rate of the chromosome of this strain. Sixty-one of the 79 predicted pseudogenes in HL096PA1 contain single inactivating mutations, indicating that the majority of these mutations were recently acquired [34]. Although the functions of the predicted pseudogenes in HL096PA1 are diverse, 14 of the 79 pseudogenes belong to the functional group of carbohydrate transport and metabolism, including five phosphotransferase system (PTS) components responsible for carbohydrate regulation and sensing. Two of the three predicted fructose PTSs contain frameshifts, suggesting that only one fructose uptake system remains functional in the genome [35]. In addition, HL096PA1 contains two pseudogenes in the only predicted galactitol PTS [35] and one disrupted component in the single glucitol PTS. This suggests that both of these systems are nonfunctional in this strain. These gene disruptions observed in the PTS components and other proteins of the carbohydrate transport and metabolism pathways may reflect an adaptation of HL096PA1 to the availability of nutrients in the pilosebaceous unit, where the major carbon source is lipids but not carbohydrates.
In addition to PTS genes, multiple virulence-associated adhesion proteins in HL096PA1 harbor frameshifts.
Adhesion proteins are of special interest because of their pathogenic potential and involvement in biofilm formation. Brüggemann et al. reported nine putative adhesion proteins in KPA171202 genome which contain thrombospondin type 3 repeats, PKD and SEST domains [13]. We found frameshifts in five of the nine orthologous adhesion proteins in HL096PA1, including PAGK 1592, PAGK 1593, PAGK 1821, PAGK 1897, and PAGK 2110. These orthologs were also recently reported to contain frameshifts in P. acnes strain 266 [36], which is a strain more closely related to HL096PA1 than to KPA171202. The differences in functional adhesion protein genes among the strains suggest that various P. acnes strains may use antigenic variations, a mechanism used by many pathogenic bacteria to avoid the host immune system [37], as a host immune invasion mechanism.

Genomic Islands in HL096PA1
. HL096PA1 harbors several island-like genomic regions. In KPA171202, 11 genomic islands were reported previously [13]. These regions encode proteins associated with pathogenicity or survival, mobility, and proteins of unknown function. The chromosome of HL096PA1 shares eight of the island-like regions found in KPA171202. Three of the KPA171202 island-like regions are absent in HL096PA1, including PPA839-878, PPA1278-1321, and PPA1576-1618. In addition to the genomic islands described by Brü ggemann et al. [13], an island-like region (PPA2055-2092) in KPA171202 reported later [36] is also absent in HL096PA1. On the other hand, HL096PA1 contains two unique genomic islands, locus 1 and locus 2, which have been shown to be associated with the disease state of acne [12]. These two loci are absent in KPA171202 but present in all strains that belong to the same lineage as HL096 PA1 [12], including almost all RT4 and RT5 strains and another completely sequenced strain SK137 [36]. HL096PA1 locus 1 spans over 6 Kb and encodes seven ORFs (PAGK 0121-PAGK 0127). This region shows evidence of a cryptic phage-like integration through the flanking tRNAs. It encodes phage-related genes such as site-specific recombinase. An antibiotic ABC transporter is also encoded in this region; however, it is a predicted pseudogene, suggesting that antibiotic resistance was once conferred through this phage integration.    [38]. This gene cluster is potentially capable of producing a nonribosomally synthesized secreted toxin, streptolysin. Streptolysin is responsible for the classical beta-hemolytic phenotype of bacterial colonies grown on blood agar media. This gene cluster in HL096PA1 may contribute to its virulence against the host and to its dominant colonization against other pilosebaceous unit dwelling microbes.
The second major group of genes encoded in locus 2 is phage defense genes. By using Pfam [39], we identified an abortive infection Abi 2 domain (pfam07751) within ORF PAGK 0180, which is involved in bacteriophage resistance via premature cell death upon phage entry. This abortive infection defense mechanism against bacterial phage is mediated by a single copy of Abi 2 domain containing protein [40]. A second copy of this Abi 2 domain containing gene, PAGK 2391, is encoded on the plasmid pIMPLE-HL096PA1. This suggests a possible common origin of locus 2 and the plasmid.
Additional genomic evidence supports a possible common origin of locus 2 and the plasmid pIMPLE-HL096PA1. Located at the immediate downstream of the abortive infection defense system is ORF PAGK 0182. It is a hypothetical protein homologous to CLOLEP 00167 in C. leptum. As described above, the plasmid pIMPLE-HL096PA1 contains a cluster of genes with high similarities to C. leptum DSM753 genes CLOLEP 00122-CLOLEP 00166, while the homolog of CLOLEP 00167 lies in locus 2 on the chromosome of HL096PA1. This further suggests a possible common origin of locus 2 and the plasmid. Notably, this gene in locus 2, PAGK 0182, contains a 31 bp sequence. As reported by Fitz-Gibbon et al. [12], this 31 bp sequence was found identical to the clustered regularly interspaced short palindromic repeat (CRISPR) spacer sequence encoded in some of the type II P. acnes strains. This suggests that the genes in locus 2 may have been acquired through a phage or plasmid mechanism in RT4 and RT5 strains, while type II strains do not harbor this genomic island possibly due to the protection mechanism of CRISPR/cas system against foreign DNA.

Genomic Inversion in HL096PA1.
While genomic inversion is commonly seen in bacterial genomes, pathogenic bacteria often exhibit a high degree of genomic rearrangement [41]. We observed a large genomic inversion in HL096PA1 when compared to KPA171202. The inversion spans the ribosomal region of 1.2 Mb ranging from the 1st set to the 3rd set of ribosomal operons (Figure 3(a)). To verify the inversion, we designed primer sets that span the beginning, middle, and the end of the inverted region (Figure 3(b)). PCR amplifications of these regions were performed. As predicted based on the inverted genomic region, HL096PA1 had strong amplifications of the regions covered by primer sets I, II, and III but had no specific amplifications using primer sets IV and V (Figure 3(c)). The inversion was further confirmed by sequencing the ends of the amplicons using the Sanger method. KPA171202 was used as a control. As predicted, KPA171202 had no specific amplifications using primer sets I and III, but strong amplifications using primer sets II, IV, and V. Among the nine currently available complete P. acnes genomes, ATCC11828 strain [42] also contains a genomic inversion. Similar to the inversion in HL096PA1, it spans a ribosomal region of 0.9 Mb including the 1st set and the 2nd set of ribosomal operons (Figure 3(a)).

Conclusions
Our recent study of the skin microbiome has shown that P. acnes resides in the pilosebaceous unit of both healthy skin and acne affected skin [12], challenging the traditional concept applied in clinical practice that all P. acnes contributes to acne pathogenesis. Moreover, several studies of the P. acnes strain populations on the skin suggested that acneassociated P. acnes subpopulations exist and may correlate with the genomic variations of the strains [12,22,36,43,44]. Our genome sequencing and comparison of a large number of strains revealed significant disease-or healthassociated genetic elements [12]; however, detailed analyses of extrachromosomal elements, pseudogenes, and genome structure variations were limited due to unfinished genome sequences. The comparison between two complete genomes HL096PA1 and KPA171202 described in this study allowed us to better understand the strain-specific genomic variations in plasmid, pseudogenes, genomic islands, and genomic inversion that may be associated with disease pathogenesis.
In this comparison, we identified several strain-specific genetic elements that are potentially associated with increased virulence of HL096PA1 in acne. Present in HL096PA1 and absent in KPA171202 is the first described P. acnes plasmid carrying a tight adhesion locus, which has been shown to enhance colonization and biofilm formation in several human pathogens [29][30][31]. Further investigation of this P. acnes plasmid will provide insight on its pathogenic potential. Compared to KPA171202, we observed a more than five-fold increase in the number of pseudogenes in HL096PA1, which suggests a potential mechanism of this strain to host adaptation. HL096PA1 contains two genomic islands that are unique to acne-associated strains and may be associated with increased virulence of this lineage in acne [12]. In addition, a large genomic inversion was observed in HL096PA1. Together these genetic differences revealed by complete genome comparison provide insights into the complexity of the strain variations of this organism and support the existence of specific P. acnes subpopulations which may be associated with acne pathogenesis [36,45]. Furthermore, the first P. acnes plasmid identified in HL096PA1 may potentially provide a new opportunity for genetic manipulation of this organism. The knowledge gained from this comparative study may potentially lead to novel diagnostic and targeted therapeutic approaches for acne against a specific disease associated lineage of P. acnes.