Prediction of the Sex-Associated Genomic Region in Tunas (Thunnus Fishes)

Fish species have a variety of sex determination systems. Tunas (genus Thunnus) have an XY genetic sex determination system. However, the Y chromosome or responsible locus has not yet been identified in males. In a previous study, a female genome of Pacific bluefin tuna (T. orientalis) was sequenced, and candidates for sex-associated DNA polymorphisms were identified by a genome-wide association study using resequencing data. In the present study, we sequenced a male genome of Pacific bluefin tuna by long-read and linked-read sequencing technologies and explored male-specific loci through a comparison with the female genome. As a result, we found a unique region carrying the male-specific haplotype, where a homolog of estrogen sulfotransferase gene was predicted to be encoded. The genome-wide mapping of previously resequenced data indicated that, among the functionally annotated genes, only this gene, named sult1st6y, was paternally inherited in the males of Pacific bluefin tuna. We reviewed the RNA-seq data of southern bluefin tuna (T. maccoyii) in the public database and found that sult1st6y of southern bluefin tuna was expressed in all male testes, but absent or suppressed in the female ovary. Since estrogen sulfotransferase is responsible for the inactivation of estrogens, it is reasonable to assume that the expression of sult1st6y in gonad cells may inhibit female development, thereby inducing the individuals to become males. Thus, our results raise a promising hypothesis that sult1st6y is the sex determination gene in Thunnus fishes or at least functions at a crucial point in the sex-differentiation cascade.


Introduction
Fish species have a substantial variety of sex determination systems compared to other animals, ranging from environmentally to genetically modulated ones. In the environmental sex determination system, individual sex depends mainly on nongenetic factors such as temperature, growth rate, and density, while in the genetic system, specific genes dominantly control sex differentiation. Regarding sexuality, it is estimated that more than 98% of fish are gonochoristic [1]. Therefore, in most fish, some responsible loci should exist biasedly between males and females or differentially affect gonadal phenotypes, even if the sex is primarily determined by environmental factors. In heterogametic (XY) males or heterogametic (ZW) females, sex chromosomes may be cytologically evident markers as carrying the sex determination loci, but they are often homomorphic in fishes [2], undistinguishable from autosomes by direct microscopic observation. In other words, fish sex chromosomes are not highly differentiated, implying that turnovers may have often occurred over the course of evolution [3]. Recently, sex determination genes have been reported in several fish species, including DMY/Dmrt1 in medaka [4,5] and Chinese tongue sole [6], Amhr2 in fugu [7], sdY in rainbow trout [8], Hsd17b1 in yellowtail [9], and amh in Patagonian pejerrey [10] and northern pike [11]. These studies suggest that a master gene for sex determination in fish can be easily replaced by another gene, which may be followed by the above-mentioned turnover of the sex chromosome. In particular, the cases in fugu and yellowtail are prominent examples, where only a single-nucleotide nonsynonymous polymorphism in a gene (Amhr2 for fugu and Hsd17b1 for yellowtail) between the males and females is critical for sex determination. Such observations suggest that new sex determination genes can emerge by a small amount of nucleotide substitutions in a short period of evolutionary time. Thus, the sex determination genes of fish are often different for each lineage and are not always identified by deductive approaches from model organisms (that is, case studies are required to identify the genes).
Regarding tunas, fish of the genus Thunnus South 1845, it is assumed that Pacific bluefin tuna (Thunnus orientalis, PBT) has an XY genetic sex determination system [12]. However, the chromosome or locus responsible has not yet been identified. One reason for this is that no distinguishable sex chromosomes were observed in the cytological analysis. It has been reported that little tuna, Euthynnus affinis, may have a ZW sex determination system [13]. These observations suggest that the sex determination gene of Thunnus species has only recently emerged and, therefore, that the sex chromosomes are not highly divergent. Another reason is the technical difficulty in handling: since tunas are large and difficult fish to culture, in vivo experiments for identifying sex-associated loci are costly and time-consuming. Identification of the sex-associated loci of tunas is considered to have a high impact in the fishery industry as well as in ichthyology: tunas have a high market value worldwide, and tuna target fishing and aquaculture have been conducted in many countries. In aquaculture, the control of the sex ratio in tanks or ocean nets is very important for breeding, as keeping more females than males leads to an increased production of fertilized eggs. Since tunas do not show any clear sexual dimorphism, the direct observation of the gonad has been the main technique for sex identification. However, this is a destructive inspection and is not applicable for selectively collecting females or removing males in tanks. Therefore, molecular markers associated with the sex of tuna have been of interest as an alternative for sex identification. In particular, DNA markers have been considered promising, where a small portion of tissue sections, such as the fin or skin, are sufficient for PCR assays. A DNA marker associated with PBT sex was reported in 2015 [12]. The marker, Md6, was developed from the DNA samples of cultured PBT individuals and showed an association with sex in the aquaculture population. Regarding the genomic approach, the PBT male genome data were public [14], but the scaffold sequences were too fragmented for a genome-wide screening of sex-associated loci. Recently, transcriptomics of southern bluefin tuna (T. maccoyii, SBT) have been conducted, and the differentially expressed genes between male and female gonads (testis and ovary) were examined [15]. However, it should be noted that a differentially expressed gene between the testis and ovary is not always the sex determination gene, but may be a downstream gene regulated by the master gene.
In 2019, Suda et al. sequenced the genome of female PBT, compared the assembled scaffold with that of the male genome, and explored sex-associated regions from the scaffold sequences by resequencing 31 PBT individuals (15 males and 16 females) [16]. They performed a genomewide association (GWA) study focusing on sex-associated single-nucleotide variants (SNVs) between males and females, and identified several regions carrying malespecific alleles. Finally, they developed three PCR primer pairs for sex identification, namely, scaf64_3724604_F/R, scaf64_3726411_F/R, and scaf64_3724591_F/R, based on male-specific heterozygous SNVs and short indels detected in the 6.5 kb region of female scaffold 64 (F64). The PCR primers were designed so that the amplification patterns were different between PBT males and females: scaf64_ 3724604_F/R and scaf64_3726411_F/R were amplifiable in males (the sizes were 113 bp and 143 bp, respectively) but not in females, and scaf64_3724591_F/R was amplified in both males and females, but the sizes were different (142 bp and 149 bp in males and only 149 bp in females). PCR amplification using 115 individuals (56 males and 59 females) yielded 100% accuracy of sex identification, implying that the PBT sex determination gene may exist around the 6.5 kb region in F64. They predicted some genes in F64 and discussed the possible involvement in the sex determination of PBT. However, promising genes, such as those related to the cascade, were not identified. The potential problem in the previous study was that they found malespecific heterozygous polymorphisms in multiple scaffolds. Since most sex determinants are often attributed to a single locus, they suspected that multiple candidates occurred due to assembly errors.
In this study, we resequenced and reassembled the PBT male genome using long-read sequencing technologies and examined the sex-associated regions using previously published data [17]. We have found a sex-associated gene that is paternally inherited and is involved in estrogen inactivation. The current results provide the most promising candidate for the sex determination gene of tuna.

Genome Sequencing and Assembly.
A three-year-old male F2 individual of Pacific bluefin tuna, which was cultured at the Seikai National Fisheries Research Institute (currently Fisheries Technology Institute), was collected for genome sequencing. A frozen heart tissue was prepared from the specimen, and DNA was extracted using a DNeasy Blood & Tissue Kit (Qiagen, Venlo, Netherlands). Then, an SMRT-bell template library was prepared following the manufacturer's protocol and sequenced using the Pacific Biosciences (PacBio) Sequel platform (P6C4 chemistry) 2 International Journal of Genomics (Pacific Biosciences, Inc., CA). From the same tissue sample, the DNA was extracted also using the standard phenolchloroform protocol. A Chromium 10X library was prepared using Chromium Genome Library Kit v2, Genome Gel Bead Kit, Genome Chip Kit v2, and i7 Multiplex Kit (10X Genomics, CA) following the manufacturer's protocol, and the linked reads were sequenced using the Illumina NovaSeq 6000 platform (Illumina, Inc., CA). The PacBio reads were first assembled by HGAP4 (SMRT Link v5.1.0.26412) [18]. The contigs obtained were corrected using NovaSeq linked-reads by Pilon v1.23 [19] and Tigmint v1.1.2 [20] and finally scaffolded by ARKS v1.0.2 [21] in combination with LINKS v1.8.5 [22]. The scaffold sequences obtained were submitted to the DNA Data Bank of Japan under accession nos. BOUD01000001-BOUD01000948. The final scaffold sequences were assessed by Benchmarking Universal Single-Copy Orthologs (BUSCO) version 4.0.5 [23] with a set of Actinopterygii orthologs (actinopter-ygii_odb10). The genes around the sex-associated region were predicted based on the protein sequences of model fish species in the Ensembl database (release 99) [24], namely, Danio rerio, Oryzias latipes, Gasterosteus aculeatus, Takifugu rubripes, and Tetraodon nigroviridis, by Exonerate v2.4.0 [25]. The PBT female genome sequence and the southern bluefin tuna (SBT) male genome sequence were obtained from the GenBank database (accession nos. BKCK00000000 and CAJVDC010000000, respectively). The nucleotide sequence comparison was performed by MUMmer v4 [26]. The repeat library of PBT genome was constructed by RepeatModeler Open-1.0, and the repetitive regions were masked by RepeatMasker Open-4.0 (http://www .repeatmasker.org). The duplicated regions were further detected by self-to-self BLASTN [27] in each of the PBT male and female genomes, and the sequences detected were used also for masking. To examine the phased genomic regions, the linked reads were assembled using Supernova2 (v2.1.1) [28].
2.2. Genome-Wide SNV/CNV Analysis. The sequenced reads for 31 PBT individuals (15 males and 16 females) [16] were downloaded from the NCBI SRA database (accession no. DRP005507). The read sequences were extracted using SRA Toolkit (https://trace.ncbi.nlm.nih.gov/Traces/sra/sra .cgi?view=software) and then preprocessed using Trimmomatic v0.36 [29] and ParDRe v2.2.5 [30], according to Suda et al. Mapping to the scaffold sequences and singlenucleotide variant calling were conducted using MapCaller v0.9.9.37 [31]. After filtering out inadequate sites (not biallelic, depth<20), genome-wide association (GWA) between males and females was estimated through 2 × 3 contingency tables using the R package, rrBLUP [32], with minor allele frequency > 0:05. Statistical significance was defined as P < 0:05 after Bonferroni's correction based on the total number of variant sites examined. For relative depth analysis, raw read depth per nucleotide site was counted using Samtools (samtools depth) [33] and then divided by the median depth. Copy number variations (CNVs) were detected by CNVcaller [34], where the above-mentioned mapping data were scanned by a 500 bp window and regional copy numbers were obtained by merging them in the neighboring windows. The difference in the regional copy number between males and females was tested using the Mann-Whitney U test with Bonferroni's correction. Furthermore, the regions carrying either of the three types of copy numbers (0, 0.5, or 1) among the 31 samples were selected, and the GWA was estimated in the same way as the SNV-based GWA scan mentioned above.
2.3. Sequence Analysis of Sex-Associated Genes. The transcriptome data of southern bluefin tuna [15] were downloaded from the NCBI SRA database (accession no. SRP059929). The read sequences were extracted using SRA Toolkit, preprocessed by Trimmomatic, and assembled using Trinity v2.8.4 [35]. From the contig sequences obtained, protein-coding sequences were predicted using TransDecoder v5.0.0 [35]. Homologs of PBT genes were searched using BLASTP [36] from the contig sequences, and the transcriptional variants were manually checked based on the sequence alignment using MAFFT v7.310 [37]. Using the Trinity contig sequences as a reference, gene expression was measured using RSEM v1.3.1 [38]. For molecular phylogenetic analysis, homologs in other fish genomes were downloaded from the Ensembl database. The coding sequences were then aligned using MAFFT, where the deduced protein sequences were aligned and converted into the corresponding codon alignments. The maximum-likelihood tree was constructed using RAxML-NG v0.9.0 [39] and visualized using FigTree v1.4.4 (http:// tree.bio.ed.ac.uk/software/figtree/).

Genome Assembly and Validation of Sex-Associated
Markers. We constructed the genome sequence of male PBT by de novo assembly, yielding a total of 948 scaffolds (>1000 bp) and totaling 827 Mb (Table 1).
Using BUSCO for quality assessment, this assembly completely captured 96.3% of Actinopterygii orthologs (3505 out of 3460). On the other hand, the score for the female genome was 89.3% (3253 out of 3640), consistent with the previous report. Using the sex-associated PCR primer sequences reported , we performed a BLAST search against the current male genome and found that two scaffolds, namely, M175 and M44, carried the regions amplifiable by the primer sets (two male-specific sets, scaf64_ 3724604_F/R and scaf64_3726411_F/R, and a common primer set, scaf64_3724591_F/R) ( Table 2).
In scaffold M175 (232,647 bp in total), the regions amplified by the three primer sets were closely located. The estimated amplicon sizes were congruent with those previously observed only in males: 113 bp for scaf64_ 3724604_F/R, 143 bp for scaf64_3726411_F/R, and 142 bp for scaf64_3724591_F/R, respectively. Scaffold M44 (4,583,724 bp in total) had a single region amplified by scaf64_3724591_F/R, and the estimated amplicon size was 149 bp, which was equivalent to that observed previously in both males and females. Thus, based on the estimated PCR products, M175 carried all of the male-specific regions and 3 International Journal of Genomics M44 carried the common haplotype to both males and females. We compared the nucleotide sequences of M175 and M44 and found that M175 was hardly aligned to M44 ( Figure 1) because of low similarity, while M44 was fully aligned with female scaffold F64 (Figure 2), in which the 6.5 kb sex-linked region (SLR) was reported in the previous study. The 6.5 kb SLR was found also in M44. On the other hand, we found no counterpart of M175 in the female genome, although the 6.5 kb SLR of M44/F64 (SLR M44 or SLR F64 ) was partially similar to the 10.7 kb region in M175 (SLR M175 ) ( Figure 1). We found the counterparts of M44 and M175 in the southern bluefin tuna (SBT) male genome, respectively, which were located on different chromosomes, namely, chromosome 4 (accession no. NC_056536) for M44 and chromosome 13 (accession no. NC_056545) for M175, respectively (Supplementary Figure 1). We assembled the PBT male genome using Supernova2 and obtained scaffolds corresponding to M44 and M175, respectively (Supplementary Figure 2). The M44-like scaffold carrying SLR M44 was phased into two pseudohaploid scaffolds, but SLR M44 itself was homozygotic. On the other hand, the scaffold carrying SLR M175 was not phased.

Read Depth Analysis.
We downloaded the resequencing data of 31 PBT individuals (15 males and 16 females) and mapped the reads to the current male genome sequence. As a result, both male and female reads were fully mapped to SLR M44 (Figure 3 and Supplementary Figure 3). Regarding M175, the female reads were hardly mapped to SLR M175 , in contrast to the male reads. Furthermore, there were consistently fewer male reads mapped to SLR M175 than SLR M44 . We computed the median read depths for whole scaffolds and thereby converted the read depths of SLR M44 and SLR M175 into the relative values. In males, the relative depths of SLR M175 were close to 0.5 ( Figure 4). On the other hand, the relative depths of SLR M44 were close to 1 in both males and females. These observations strongly suggest that, in shotgun sequencing libraries, the DNA content derived from SLR M175 in the male samples is always half than that derived from the average scaffolds, reflecting the ploidy or copy number. The majority of chromosomal regions, except for highly repetitive ones or those of polyploidized genomes, should exist as singletons in a diploid manner (2n), and the relative depths for many regions would be approximately 1. Therefore, our observations suggest that SLR M175 , with a relative depth of 0.5, should exist in a haploid manner (1n) only in males, while SLR M44 should exist in a diploid manner, like many other regions, in both males and females.
3.3. Genome-Wide Association Analysis. Since the results obtained until now appeared to be inconsistent with those previously reported (see Discussion), we validated the previous mapping and GWA results based on the female genome. We mapped the 31 resequenced read sets to the female reference genome and reperformed GWA analysis. As a result, the previous result was roughly reproduced, with significant peaks observed in multiple scaffolds, including F64 (Figures 5(a) and 5(c)). However, when the mapped reads between the female and male genomes were compared, over 10% of the reads mapped to female SLR F64 in the 15 male samples were found to be potentially derived from SLR M175 ( Figure 6). This observation implies that the reads derived from the male-specific region, such as SLR M175 , would have been incorrectly mapped to paralogous regions in the female genome in the previous study. To avoid the effect of such cross mapping, we performed the GWA scan again after the duplicated regions were thoroughly masked by Repeat-Masker and BLASTN. This result was in contrast to that from the unmasked reference: no significant peaks were observed in the masked sequences (Figures 5(b) and 5(d)). Similarly, GWA scans were performed using the current male genome, but no significant peaks were found in the masked sequences (Supplementary Figure 4).
We then tested another approach to identify the regions carrying male-specific haplotypes, such as SLR M175 . Using the mapping data for the 31 individuals, we computed copy number variations (CNVs) across the male genome and explored the regions whose copy numbers were significantly different between males and females. As a result, only one candidate was found in M175 (18,001-78,750 bp), very close to SLR M175 , and the copy number in males was significantly larger than that in females (Figure 7(a)). All of the males had 0.5 copies of the detected region and all of the females had no copy (i.e., absent), in concordance with the relative depth analysis results (Figure 4). We computed genome-wide associations for the haploid regions by converting the copy numbers (0, 0.5, and 1) to three quasigenotypes (0/0, 0/1, and 1/ 1). As a result, only one significant peak for this region was detected in scaffold M175 (Figure 7(b)).

Protein-Coding Genes in Scaffold M175
. We scanned the M175 sequence and predicted 12 protein-coding genes ( Figure 8 and Supplementary Table 1). Of these, two genes,  was similar to that of estrogen sulfotransferase encoded by sult1st6 (e.g., with~83% identity to ENSORLP00000007539 of Oryzias latipes). Therefore, the homolog of PBT was named sult1st6y. In the case of g4, the deduced amino acid sequence was similar to that of ENSTRUP00000072273 (gene ID: ENSTRUG00000033234) of Takifugu rubripes, with~70% identity. As a "novel gene" in the Ensembl database, the function was unknown. However, the domain "Integrase, catalytic core" was predicted by Pfam and PROSITE and found to overlap with the Gene3D domain "Ribonuclease H superfamily." The neighboring 10 genes (g1, g2, and g5-g12) were novel or related to transposable elements like g4 (Supplementary Table 1), and thus, only g3 was a functionally annotated gene. We found no homologs of sult1st6y (g3) in the scaffolds M44/F64. Instead, a paralog was found in male scaffold 30 (M30, 6594208-6602041 bp), named, sult1st6a, and the nucleotide and protein identities to sult1st6y were 92% and 89%, respectively. Male scaffold M30 corresponded to a part of female scaffold 47 (F47) (Supplementary Figure 5), where sult1st6a was also found. In the vicinity of sult1st6a, six genes were encoded, namely, paqr4b (progestin and adipoQ receptor family member IVb), pelo (pelota mRNA surveillance and ribosome rescue factor), ppp1cab (serine/ threonine-protein phosphatase alpha-2 isoform), atp2a1l (sarcoplasmic/endoplasmic reticulum calcium ATPase 1), mto1 (mitochondrial tRNA translation optimization 1), and adprh (ADP-ribosylarginine hydrolase). The conserved synteny (paqr4b-pelo-ppp1cab-atp2a1l-sult1st6-mto1-adprh) was observed between PBT and other fish species (Supplementary Table 2). To determine whether these sult1st6 genes are present in other tuna species, we downloaded the RNA-seq data of SBT, which were obtained from 10 male testis samples and 10 female ovary samples. We assembled the reads using Trinity and conducted a homology search against the contigs using PBT sult1st6y/sult1st6a sequences. As a result, we found the homologs of sult1st6y/sult1st6a in SBT, although the Trinity contigs were originally classified into a single cluster. We conducted phylogenetic analysis using PBT and SBT sult1st6 and the homologs in the Ensembl database ( Figure 9 and Supplementary Table 2). The phylogenetic tree showed that each SBT homolog formed a clade with each of PBT's sult1st6y/sult1st6a, respectively, suggesting that SBT also had two paralogs of sult1st6 and the duplication to sult1st6y and sult1st6a occurred in the common ancestor of PBT and SBT. The SBT sult1st6 5 International Journal of Genomics paralogs were confirmed also by the published male genome annotation (accession nos. XM_042430888 for sult1st6y on chromosome 13 and XM_042397350 for sult1st6a on chromosome 20, respectively). Using the transcript contigs as a reference, we mapped the reads in each of the 20 samples to those and measured the gene expression levels ( Figure 10). As a whole, the expression of SBT's sult1st6y/ sult1st6a was low compared to that of the control (0-2% of β-actin in FPKM). In detail, the expression of sult1st6a was observed in both the testis and ovary samples, although it tended to be low in the latter (0.003-0.05% of β-actin in FPKM, except for 0% for SRR2079766) (Figure 10(a)). The expression of sult1st6y was observed in all 10 testis samples, but none (i.e., FPKM = 0, undetectable) in ovary samples (Figure 10(b)). Here, we categorized the samples with FPKM > 0 as "expressed" and those with FPKM = 0 as "not expressed." Thus, the expression of sult1st6y was significantly biased between the testes and ovaries (P < 0:05 , Fisher's exact test).

Discussion
In this study, the male genome of Pacific bluefin tuna was sequenced using long-read sequencing technologies. The scaffold sequences assembled were more contiguous than those published in 2013, which were obtained from pyrosequencing and short paired-end reads. Completeness assessment indicates that the current version is more complete than that of the recent female genome, although the number of scaffolds (948) is larger than that for the female genome (444). Therefore, the current genome is expected to be useful for the validation and extension of previous results regarding      International Journal of Genomics the sex identification of PBT. In the previous study, three PCR primer sets were designed from the sequence of female scaffold F64, so that the amplification patterns were different between PBT males and females. In this study, we first searched for the locus amplified by the primer sets in the current male genome and found two regions: one in scaffold M44 and the other in M175. Based on the results of the large-scale sequence comparison, M44 was found to be the counterpart of F64, while M175 was male-specific. Since M44 and F64 were well aligned with each other, including 6.5 kb SLRs (SLR M44 /SLR F64 ), it is unlikely that large-scale misassemblies are present in M44/F64. The PCR amplification pattern in SLR M44 , estimated using primer sequences, was identical to that observed in females. Regarding M175, all the PCR amplicons with male-specific sizes were observed within the 10.7 kb region (SLR M175 ). It is worth noting that the PCR primer sets were originally designed based on the SNVs predicted in F64 by GWA analysis. Therefore, according to the previous results, M44 and M175 should be allelic to each other. However, these    M175 . Now, the previous result, in which male-specific heterozygous polymorphisms were detected in F64, needs to be reconsidered. The previous GWA scan was performed using the female genome as a reference, focusing on maleheterozygous SNVs or small indels, which may be detectable by the resequenced short reads. However, M175 found in the current study is a large-scale structural variant absent from the female genome and should be undetectable by shortread mapping. We considered the possibility that the reads derived from paralogous loci were cross-mapped to the female reference genome, causing fake heterozygous SNV calls in genotyping prior to GWA analysis. In the previous study, the repeat or paralogous sequences were not masked; therefore, it is likely that the reads derived from the paternal 10.7 kb region (i.e., SLR M175 ) were mapped to the 6.5 kb region of F64 in the analysis of male DNA samples. This hypothesis was confirmed by checking the mapped reads: at least 10% of the male reads mapped to SLR F64 in the previous analysis were estimated to be derived from SLR M175 . Furthermore, we compared the GWA results between the two conditions for the female reference sequence: duplicated regions were unmasked (i.e., raw sequences) and masked.
The result was clear-cut, where the peaks in multiple scaffolds were roughly reproduced in the unmasked sequences, but no such peaks were observed in the masked sequences. Thus, we conclude that the previous GWA result was an artifact due to cross-mapping, and many of the scaffolds predicted (e.g., F64) were not associated with the sex of PBT. Since we found no significant GWA peaks in the masked male reference genome, the SLR of PBT would be undetectable from the information of SNVs or small indels. The current results provide a caveat that traditional GWA analysis, which targets SNVs using short reads, may not always be sufficient for the identification of SLR in species with a genetic sex determination system.
Instead of SLR M44 , we found that SLR M175 is a malespecific locus absent from the female genome. It should be noted that this finding is due to a homology search of the primer sequences designed in the previous study. Therefore, it is possible that the current male genome contains other scaffolds carrying sex-associated structural variants. To check this possibility, we conducted a genome-wide scan based on a CNV detection scheme using mapping data. As a result, we observed that only M175 had a 61 kb region (18,001-78,750 bp) where the CNV was significantly different between males and females. Since this region is very close to SLR M175 , eventually the candidate sex-associated locus was narrowed down to a single region around or including SLR M175 . Regarding the CNV detected, we found that PBT females have no copy, while males have 0.5 copies of the region, indicating that this region exists in a haploid manner in males or, equivalently, that this region is paternally inherited. Thus, we conclude that this region (61-kb region + SLR M175 ) is the only candidate for the Y-linked locus in PBT. In M175, we predicted a total of 12 proteincoding genes, two of which (g3 and g4) were included in the predicted haploid region and one (g3) of which was annotated as encoding estrogen sulfotransferase (SULT1ST6), while the other (g4) was functionally unknown. We note that g4 has the domain "Integrase, catalytic core" or "Ribonuclease H superfamily" which is a feature of the retroviral integrase superfamily [40]. Regarding the other 10 predicted genes in M175, the functions are unknown. In particular, g1, g2, and g5 may be involved in transposons according to domain prediction. Therefore, around the male-specific haploid region, only the centermost g3, namely, sult1st6y, is functionally annotated and The current study suggests that sult1st6y is the most convincing gene involved in the sex determination or differenti-ation of PBT. The encoding protein, SULT1ST6, belongs to the family of cytosolic sulfotransferases (SULTs) and has been studied as a detoxifying enzyme in mammals [41]. Regarding fish species, the SULT genes have been investigated mainly in zebrafish, and 20 distinct paralogs (9

10
International Journal of Genomics SULT1s, 3 SULT2s, 5 SULT3s, 1 SULT4, 1 SULT5, and 1 SULT6) have been characterized. Among these, SULT1ST6, one of the nine SULT1 paralogs, displays strict substrate specificity for estrogens (e.g., estrone). This gene is thought to be involved in organogenesis, such as the development of the eye and muscle in zebrafish [42]. On the other hand, there is no report about the relationship between estrogen sulfotransferase and sex development. In fish, steroid hormones are considered to be the key factor for sex determination [2,43,44], wherein sex can change depending on the levels of estrogen or androgen production in gonad cells. The gene directly contributing to steroidogenesis has also been reported as a sex determination gene in yellowtails [9]. Since endogenous estrogens are inactivated by sulfation [45], it may not be surprising that the function of sult1st6 is involved in sex development. If this gene is expressed in gonad cells at the initiation of sex development, it may inhibit female development by depleting active estrogens, resulting in male development. In the present study, we found that sult1st6y of PBT was paternally inherited in the population. It is unlikely that an estrogen-inactivating gene with such a mode of inheritance is unrelated to the sex development of PBT. In addition, we found that both female and male PBT had a paralog of sult1st6y, namely, sult1st6a, in male scaffold M30 or female scaffold F47. It is worth noting that, in the previous study, F47 was another candidate carrying sex-associated SNVs [16], and the locus of sult1st6a was close to the artifact GWA peak. This observation is not only attributable to cross-mapping, as mentioned before, but also indirectly suggests that the sult1st6-like gene might be associated with the sex of PBT. Since sult1st6a is inherited regardless of sex in a diploid manner, we speculate that sult1st6a plays a common role in PBT males and females, for example, in organogenesis at the early developmental stage, as proposed in the model fish. Based on microsynteny, the gene order around sult1st6a was universally conserved across fish species, while sult1sy6y was alone surrounded by transposable element-related genes. This evolutionary stability is consistent with the possibility that the function of sult1st6a may be common to those in other fish species.
In contrast, there are little clues about the function of sult1st6y but might be explained by a sort of neofunctionalization among possible evolutionary mechanisms of duplicated genes [46]: sult1st6y could have been immune from the original role of sult1st6 as a redundant copy, leading to the gain of a new function related to sex development under relaxed selection. Here, it is assumed that some mutations may have occurred in the regulatory region of sult1st6y, which changes the expression pattern but not the enzymatic function itself as estrogen sulfotransferase. In fact, it has been reported that a copy of duplicated genes may be a sex determination gene in fishes [8,10,47]. We found that homologs of sult1st6y and sult1st6a were present in the southern bluefin tuna. Phylogenetic analysis suggested that gene duplication occurred in the common ancestor of PBT and SBT. The duplications of sult1st6 were observed also in other fish lineages, but the emergence of sult1st6y and sult1st6a was specific to the tuna lineage. In particular, the tuna' paralogs are located on different chromosomes based on the SBT male genome data, while those in other fish species are tandemly or closely located on the same chromosome. We speculate that the tissues in which sult1st6y is expressed and the timing of its expression may have been differentiated from those of sult1st6a in the proto-paternal line after the gene duplication in the ancestor of PBT and SBT. Actually, SBT sult1st6y was expressed in male's testis but not in female's ovary according to the transcriptome. This significant expression bias suggests that sult1st6y might have some sort of male-specific role in SBT, in particular, related to testis development. Regarding sult1st6a, the quantitative expression profile implies that this paralog as well as sult1st6y might be involved in testis development. In other words, sult1st6a might be bifunctionally utilized in the tissues assumed for fish sult1st6 and testis. On the other hand, since sult1st6a was poorly expressed in some male samples, it may not be always essential for testis development. Thus, we speculate that sult1st6a may be expressed in concert with sult1st6y to some extent, and the role of sult1st6a may be minor or secondary to that of sult1st6y. It should be noted that the expression of sult1st6a was at least detectable in S R R 2 0 7 9 7 5 1 S R R 2 0 7 9 7 5 2 S R R 2 0 7 9 7 5 3 S R R 2 0 7 9 7 5 4 S R R 2 0 7 9 7 5 5 S R R 2 0 7 9 7 5 6 S R R 2 0 7 9 7 5 7 S R R 2 0 7 9 7 5 8 S R R 2 0 7 9 7 5 9 S R R 2 0 7 9 7 6 0 S R R 2 0 7 9 7 6 1 S R R 2 0 7 9 7 6 2 S R R 2 0 7 9 7 6 3 S R R 2 0 7 9 7 6 4 S R R 2 0 7 9 7 6 5 S R R 2 0 7 9 7 6 6 S R R 2 0 7 9 7 6 7 S R R 2 0 7 9 7 6 8 S R R 2 0 7 9 7 6 9 S R R 2 0 7 9 7 7 0 S R R 2 0 7 9 7 5 1 S R R 2 0 7 9 7 5 2 S R R 2 0 7 9 7 5 3 S R R 2 0 7 9 7 5 4 S R R 2 0 7 9 7 5 5 S R R 2 0 7 9 7 5 6 S R R 2 0 7 9 7 5 7 S R R 2 0 7 9 7 5 8 S R R 2 0 7 9 7 5 9 S R R 2 0 7 9 7 6 0 S R R 2 0 7 9 7 6 1 S R R 2 0 7 9 7 6 2 S R R 2 0 7 9 7 6 3 S R R 2 0 7 9 7 6 4 S R R 2 0 7 9 7 6 5 S R R 2 0 7 9 7 6 6 S R R 2 0 7 9 7 6 7 S R R 2 0 7 9 7 6 8 S R R 2 0 7 9 7 6 9 S R R 2 0 7 9 7 7 0

11
International Journal of Genomics females (FPKM > 0 in nine out of 10 samples), although it was at a much lower level than in males. Therefore, the observation that the FPKMs of sult1st6y were zero in all the female samples cannot be explained by random factors, such as variations among individuals or loss of mRNA in library preparation, raising a hypothesis that sult1st6y might be originally absent from SBT females like PBT females. Although this hypothesis will need to be tested by sequencing the entire SBT female genome, our observations suggest that SBT sult1st6y may also be involved in sex development like PBT sult1st6y.
The results presented in this study provide an attractive hypothesis for sult1st6y being a sex determination gene in Thunnus fishes, or at least functioning at an important point in the sex-differentiation cascade. Our hypothesis may be tested by further genetic experiments, although the handling of tuna in tanks is not easy. Recently, genome-editing technology has been applied for the mutagenesis of tuna [48]. For example, it may be possible to check whether the sex ratio of larvae is heavily biased by the knockout of sult1st6y in fertilized eggs. Lastly, it is worth emphasizing that the CNV-based approach used in this study is very useful for identifying a trait-associated region using short reads when it is involved in large structural variants. In particular, when the region contains recently duplicated sequences, the traditional SNV-based approach may not only miss structural variants but also produce artifacts due to cross-mapping to unassociated regions. In other fish species, sex-associated SNVs have often been observed in multiple scaffolds [49][50][51], some of which might be cases of cross-mapping of short reads, unless the sexes are polygenically determined. We believe that a combination of traditional SNV-and CNV-based approaches is useful for the identification of sex-associated genes in fish species with a genetic sex determination system.

Conclusions
In this study, we constructed the highly contiguous genome sequence of male PBT and identified a single region present only in the male individuals. Genome-wide read depth scan suggested that this region was, among all the regions examined, the only one candidate paternally inherited. Therefore, the carrying genes were the only candidates of sex-associated gene in PBT. In particularly, a protein-coding gene, namely, sult1st6y, was predicted to encode estrogen sulfotransferase responsible for inactivation of estrogens, strongly suggesting that this gene is involved in PBT's sex determination or differentiation. In addition, we have found a paralog of sult1st6y, namely, sult1st6a, in a locus unassociated to PBT's sex in both of males and of females, and hypothesize that this gene may be involved in common estrogen metabolism. Both of sult1st6y and sult1st6a were found also in the SBT transcriptome, where sult1st6y was expressed in male testis, and absent or suppressed in female ovary. Since duplication of sult1st6y and sult1st6a predate the divergence of PBT and SBT, the function of sult1st6y may have been spatiotemporally differentiated from sult1st6a in the proto-paternal line of tuna species. Based on these results, we present a plausible hypothesis that sult1st6y is the sex-determinant gene in tuna or the gene almost close to the determination point. In this study, our results have provided a promising gene target for understanding tuna's sex determination system, although tunas are still difficult fishes for in vivo experiments. Finally, we address that our CNV-based approach is valid when the sex-associated loci may be derived from structural variants, which might have been missed in traditional SNV-based one. The combination of both approaches will be useful for exploring sex determination genes in a variety of fish species.

Data Availability
The genome data of Pacific bluefin tuna used to support the findings of this study have been deposited in the DNA Data Bank of Japan under accession nos. BOUD01000001-BOUD01000948.