Plant Biology course teaching resources (reference literature): Phylogenomics of the genus Populus reveals extensive interspecific gene flow and balancing selection

Article in New Phytologist · February 2020 · DOI: 10.1111/nph.16215
ResearchGate: https://www.researchgate.net/publication/341478747 (uploaded by Mingcheng Wang on 19 May 2020)

Phylogenomics of the genus Populus reveals extensive interspecific gene flow and balancing selection

Mingcheng Wang1*, Lei Zhang1*, Zhiyang Zhang1, Mengmeng Li1, Deyan Wang1, Xu Zhang2, Zhenxiang Xi1, Ken Keefover-Ring3, Lawrence B. Smart4, Stephen P. DiFazio5, Matthew S. Olson6, Tongming Yin7, Jianquan Liu1,2 and Tao Ma1

1Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, China; 2State Key Laboratory of Grassland Agro-Ecosystem, Institute of Innovation Ecology & College of Life Sciences, Lanzhou University, Lanzhou 730000, China; 3Departments of Botany and Geography, University of Wisconsin-Madison, 430 Lincoln Dr., Madison, WI 53706, USA; 4Horticulture Section, School of Integrative Plant Science, New York State Agricultural Experiment Station, Cornell University, Geneva, NY 14456, USA; 5Department of Biology, West Virginia University, Morgantown, WV 25606, USA; 6Department of Biological Sciences, Texas Tech University, Box 43131, Lubbock, TX 79409-3131, USA; 7Co-Innovation Center for Sustainable Forestry in Southern China, College of Forestry, Nanjing Forestry University, Nanjing 210037, China

Author for correspondence: Tao Ma. Tel: +86 13519669951. Email: matao.yz@gmail.com
Received: 9 February 2019. Accepted: 16 September 2019
New Phytologist (2020) 225: 1370–1382. doi: 10.1111/nph.16215
Key words: balancing selection, gene flow, phylogenomics, Populus, trans-specific polymorphisms.

Summary

Phylogenetic analysis is complicated by interspecific gene flow and the presence of shared ancestral polymorphisms, particularly those maintained by balancing selection. In this study, we aimed to examine the prevalence of these factors during the diversification of Populus, a model tree genus in the Northern Hemisphere.

We constructed phylogenetic trees of 29 Populus taxa using 80 individuals based on re-sequenced genomes.
Our species tree analyses recovered four main clades in the genus based on consensus nuclear phylogenies, which conflicted with the plastome phylogeny. A few interspecific relationships remained unresolved within the multiple-species clade because of inconsistent gene trees.

Our results indicated that gene flow has been widespread within each clade and also occurred among the four clades during their early divergence. We identified 45 candidate genes with ancient polymorphisms maintained by balancing selection. These genes were mainly associated with mating compatibility, growth and stress resistance.

Both gene flow and selection-mediated ancient polymorphisms are prevalent in the genus Populus. These are potentially important contributors to adaptive variation. Our results provide a framework for the diversification of this model tree genus that will facilitate future comparative studies.

Introduction

The phylogenetic histories of species are complicated, and it is now well understood that the persistence of ancestral polymorphisms across multiple speciation events contributes to the presence of gene genealogies that conflict with speciation history (Mayr, 1966; Schluter, 2001; Coyne & Orr, 2004). Incomplete lineage sorting (ILS), which results from the persistence of ancestral polymorphisms across multiple speciation events and the subsequent random fixation of these polymorphisms in different lineages, is one process that generates genealogical histories that are inconsistent with the species tree (Tajima, 1983; Pamilo & Nei, 1988; Degnan & Rosenberg, 2009). Because ILS requires ancestral polymorphisms to persist over the intervals between speciation events, ILS is expected to be much more prevalent in clades with rapid radiations, where these intervals are short (Wu, 1991; Schluter, 2000; Arnold, 2006; Feng et al., 2019).
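Why rapid radiations favour ILS can be made concrete with the standard three-taxon coalescent result: if the internal branch of the species tree lasts T coalescent units (generations divided by 2N for a diploid population of effective size N), the two lineages fail to coalesce on that branch with probability e^(-T), and each of the two discordant gene-tree topologies then arises with probability e^(-T)/3. The minimal Python sketch below evaluates this; the function names and example branch lengths are illustrative assumptions, not values estimated for Populus.

```python
import math


def p_discordant(internal_branch_T: float) -> float:
    """Probability that a three-taxon gene tree disagrees with the species tree.

    internal_branch_T: internal branch length in coalescent units
    (generations / 2N for a diploid population of effective size N).
    Lineages fail to coalesce on that branch with probability exp(-T);
    the three topologies are then equally likely, and two are discordant.
    """
    if internal_branch_T < 0:
        raise ValueError("branch length must be non-negative")
    return (2.0 / 3.0) * math.exp(-internal_branch_T)


def generations_to_coalescent_units(generations: float, ne: float) -> float:
    """Convert an internal branch length in generations to coalescent units (t / 2N)."""
    return generations / (2.0 * ne)


if __name__ == "__main__":
    # Hypothetical branch lengths: short (rapid radiation) versus long.
    for T in (0.1, 1.0, 5.0):
        print(f"T = {T}: P(discordant gene tree) = {p_discordant(T):.3f}")
```

Short internal branches push the discordance probability towards its maximum of 2/3, which is the intuition behind the expectation stated above.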
Another factor that influences the long-term maintenance of polymorphisms, and may also affect the extent of ILS, is balancing selection (Guerrero & Hahn, 2018), which may be more common in plants than has been historically recognised (Delph & Kelly, 2014). In the context of ILS, historical balancing selection actively maintains polymorphisms and will result in a higher proportion of genes exhibiting ILS once the lineages fix, likely because of weakened selection pressures relative to drift. In such cases, orthologous sequences from the same loci will cluster by allele rather than by species, thereby distorting phylogenetic trees (Charlesworth, 2006; Fijarczyk & Babik, 2015; Gao et al., 2015). However, although balancing selection should increase the frequency of ILS because ancient linked polymorphisms are maintained across multiple speciation events, it should not bias the genealogical topologies of linked sites towards specific genealogical histories (Fijarczyk & Babik, 2015; Gao et al., 2015).

*These authors contributed equally to this work.
New Phytologist (2020) 225: 1370–1382 · © 2019 The Authors · New Phytologist © 2019 New Phytologist Trust · www.newphytologist.com

To date, the influence of balancing selection on the maintenance of polymorphisms has been reported for only a limited number of taxonomic groups at a few loci. For example, the
diverged alleles at the major histocompatibility locus (MHC) are often shared across distantly related vertebrate species (Klein et al., 2007), and divergent ABO blood-group alleles coexist in humans, gibbons and Old World monkeys (Segurel et al., 2012). In plants, classic examples of divergent alleles maintained by balancing selection include self-incompatibility (S) and disease resistance (R) genes (Takebayashi et al., 2003; Roux et al., 2013; Karasov et al., 2014). Recently, trans-specific polymorphisms were reported for five genes shared by Arabidopsis and the distantly related Capsella, which diverged c. 8 Myr ago; these genes are involved in adaptation to divergent habitats (Wu et al., 2017). With the ability to now generate large whole-genome data sets to explore phylogenetic histories, we can also better identify the potential for long-term balancing selection to influence the maintenance of polymorphisms across speciation events.

Unlike ILS, historical hybridisation will bias the numbers of genealogies that exhibit histories in conflict with the history of speciation (Leaché et al., 2014; Solís-Lemus et al., 2016). In fact, the presence of this bias underlies the rationale for the ABBA–BABA test, which has been used to identify historical patterns of hybridisation throughout the tree of life (Green et al., 2010; Durand et al., 2011). One characteristic of this test is that the genealogical effects of ancient hybridisation will persist in the genomes of extant species (Lamichhaney et al., 2015; Novikova et al., 2016; Feng et al., 2019). Plants are well known to maintain the ability to hybridise even after clear morphological differentiation between species has occurred (Eaton et al., 2015; Baute et al., 2016; Pease et al., 2016; Y. Liu et al., 2017; Crowl et al., 2019), which complicates the reconstruction of speciation histories.

In this study, we aimed to examine gene flow and ancient polymorphisms during the diversification of the genus Populus.
All species of the genus are collectively known as poplars and are widely distributed in the Northern Hemisphere from subtropical to boreal forests (Eckenwalder, 1996), where they can act as keystone species (Whitham et al., 2006). In addition, most poplars exhibit ecological flexibility, with diverse adaptations and large population sizes. Numerous species have been artificially planted around the world, and poplars account for more than half of the planted trees in China, where they are used for the wood, pulp and paper industries and for environmental restoration projects (Isebrands & Richardson, 2014). Although this genus has long been used as a model for diverse studies in trees (Tuskan et al., 2006; Jansson & Douglas, 2007), phylogenetic relationships within the genus remain unclear. Frequent interspecific hybridisation and clonal expansion have perplexed taxonomists of the genus, with the number of acknowledged species, varieties and hybrids ranging from 22 to 85 (Eckenwalder, 1996). Six sections are traditionally recognised (Abaso, Turanga, Populus, Leucoides, Tacamahaca and Aigeiros) based on morphological traits (Eckenwalder, 1996). However, these sections have not been consistently supported by molecular evidence, and relationships among and within the sections have been the subject of controversy (Hamzeh & Dayanandan, 2004; Cervera et al., 2005; Wang et al., 2014; X. Liu et al., 2017; Zhang et al., 2018). For example, based on morphological traits and fossil evidence, the basal lineage of Populus remains unresolved between sect. Turanga, which ranges from central Asia to Africa, and P. mexicana of the monotypic sect. Abaso (Eckenwalder, 1996). Moreover, phylogenies constructed with molecular data also conflict. A phylogenetic analysis of chloroplast genomes identified Turanga as the basal section (Zhang et al., 2018), whereas a phylogeny based on multiple low-copy genes suggested that P. mexicana was basal (X. Liu et al., 2017).
These inconsistencies, together with a lack of resolution in some prior phylogenetic analyses, may result from the influences of both interspecific gene flow and ILS of ancient polymorphisms.

In order to investigate the phylogeny of Populus and the factors that have influenced conflicting genealogical histories among loci, we generated a whole-genome data set consisting of 80 individuals of 29 taxa for the genus. Our sampling covers all six sections of the genus and most distinct species as well as hybrids (Eckenwalder, 1996). We first reconstructed a backbone phylogeny for the genus based on variant sites within single-copy genes. We then examined the presence of hybridisation between the main lineages by identity-by-descent (IBD) and ABBA–BABA analyses. Finally, we selected six species from three of the main lineages of the genus to identify genes with ancient polymorphisms that were likely maintained by balancing selection. These results provide a detailed examination of the phylogenetic relationships and evolutionary diversification of the model genus Populus.

Materials and Methods

Sample collection, sequencing and mapping

Leaves representing 63 individuals of 24 species were collected from natural populations and dried on silica gel (Supporting Information Table S1). For each individual, whole genomic DNA was extracted using the CTAB protocol (Doyle & Doyle, 1987). Paired-end Illumina genomic libraries were prepared and sequenced on the HiSeq 2000 and HiSeq 2500 Illumina platforms following the manufacturer's instructions (Illumina, San Diego, CA, USA). Previously published genome sequences of 17 individuals from six species were also included in our analysis (Slavov et al., 2012; Geraldes et al., 2015; Wang J. et al., 2016; Ma et al., 2018; Table S1). In total, genome resequencing data for 80 individuals from 29 taxa covering all six sections of the genus were obtained.

The raw reads were subjected to quality control.
Low-quality reads were removed if they met any of the following criteria: (1) ≥ 5% unidentified nucleotides; (2) a phred quality ≤ 7 for > 65% of the read length; or (3) an overlap of > 10 bp with the adapter sequence, allowing < 2 bp mismatch. We then mapped the remaining high-quality reads to the P. trichocarpa reference genome v.3.0 (Tuskan et al., 2006) using BWA-MEM v.0.7.12-r1039 with default parameters (Li & Durbin, 2009). Duplicated reads were removed using the 'rmdup' function of SAMtools v.1.3.1 (Li et al., 2009). Finally, the Genome Analysis Toolkit (GATK) v.3.6 (McKenna et al., 2010) was used to perform local realignment of reads to improve alignments in regions around putative InDels.
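The three read-removal criteria above can be sketched in Python as follows. This is not the tool the authors used (the text does not name it); it assumes Phred+33 quality encoding, tests only ungapped adapter placements at the 3' end, and uses the common Illumina adapter prefix AGATCGGAAGAGC purely as a hypothetical example sequence.

```python
from typing import List


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(c) - offset for c in qual]


def adapter_overlap(read: str, adapter: str, min_overlap: int = 11, max_mismatch: int = 1) -> bool:
    """Return True if the adapter aligns to the read over > 10 bp with < 2 mismatches.

    Simplification: only ungapped placements with the adapter starting inside
    the read (3' read-through) are tested.
    """
    for start in range(len(read)):
        length = min(len(read) - start, len(adapter))
        if length < min_overlap:
            break  # later placements only get shorter
        window = read[start:start + length]
        mismatches = sum(1 for a, b in zip(window, adapter[:length]) if a != b)
        if mismatches <= max_mismatch:
            return True
    return False


def is_low_quality(seq: str, qual: str, adapter: str) -> bool:
    """Apply the three read-removal criteria; True means the read is discarded."""
    n = len(seq)
    # (1) >= 5% unidentified nucleotides (N)
    if seq.upper().count("N") >= 0.05 * n:
        return True
    # (2) Phred quality <= 7 for more than 65% of the read length
    low_q = sum(1 for q in phred_scores(qual) if q <= 7)
    if low_q > 0.65 * n:
        return True
    # (3) overlap of > 10 bp with the adapter, allowing < 2 mismatches
    return adapter_overlap(seq.upper(), adapter.upper())


if __name__ == "__main__":
    ADAPTER = "AGATCGGAAGAGC"  # hypothetical adapter, for illustration only
    reads = {
        "clean": ("ACGT" * 7 + "AC", "I" * 30),
        "many_N": ("NN" + "ACGT" * 7, "I" * 30),
        "low_quality": ("ACGT" * 7 + "AC", "#" * 20 + "I" * 10),
        "adapter": ("ACGT" * 5 + ADAPTER, "I" * 33),
    }
    for name, (seq, qual) in reads.items():
        print(name, "discard" if is_low_quality(seq, qual, ADAPTER) else "keep")
```

Production trimmers also handle gapped alignments and partial adapters at the very end of a read; the sliding comparison here is kept deliberately simple to mirror the stated thresholds.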

Single nucleotide polymorphisms and genotype calling

Single nucleotide polymorphisms (SNPs) and short InDels were called with GATK UnifiedGenotyper with default parameters for each species separately. Several filtering steps were performed to reduce false positives: (1) SNPs and InDels with a quality score […] 50 taxa. Finally, a majority-rule consensus tree of the bootstrapped trees was generated using the 'consensus' function of the R package APE (Paradis et al., 2004), and support values of tree splits were calculated using the SumTrees program from the DendroPy package (Sukumaran & Holder, 2010). The same pipeline was applied to SNVs at four-fold degenerate sites (4D SNVs).

We also identified single-copy genes using the OrthoMCL method (Li et al., 2003) for all protein-coding genes from seven Salicaceae species: S. suchowensis (Dai et al., 2014), S. purpurea (Zhou et al., 2018), P. trichocarpa (Tuskan et al., 2006), P. euphratica (Ma et al., 2013), P. pruinosa (Yang et al., 2017), P. deltoides (https://genome.jgi.doe.gov/) and P. alba var. pyramidalis (Ma et al., 2019). The SNVs within these genes were then extracted across all 29 species and divided into three datasets: (1) the first and second codon positions (C12); (2) the third codon position (C3); and (3) complete coding sequences (CDS). For each dataset, individual gene trees were constructed using RAxML v.8.0.17, and a species tree was estimated using the recently developed coalescent-based methods MP-EST v.1.5 (Liu et al., 2010) and ASTRAL v.4.11 (Mirarab et al., 2014). Gene trees were superimposed using DensiTree (Bouckaert, 2010). The ETE2 package (Huerta-Cepas et al., 2010) was used to examine different topologies of gene trees. We also estimated a 'concatenation tree' using RAxML v.8.0.17 for concatenated sequences of the C12, C3 and CDS datasets, respectively.
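The text does not describe how coding SNVs were assigned to the C12, C3 and CDS datasets. One plausible sketch, assuming GFF-style CDS coordinates (1-based, inclusive) whose first segment begins at codon position 1, maps each SNV to its codon position and sorts it accordingly; the function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def codon_position(pos: int, exons: Sequence[Tuple[int, int]], strand: str = "+") -> Optional[int]:
    """Return the codon position (1, 2 or 3) of a genomic site, or None if outside the CDS.

    exons: CDS segments as 1-based, inclusive (start, end) coordinates, assumed
    non-overlapping and to begin exactly at the first base of the start codon.
    """
    segments = sorted(exons)
    if strand == "-":
        segments = segments[::-1]  # read a minus-strand CDS from its 5' end
    offset = 0  # number of CDS bases preceding the current segment
    for start, end in segments:
        if start <= pos <= end:
            within = pos - start if strand == "+" else end - pos
            return (offset + within) % 3 + 1
        offset += end - start + 1
    return None


def partition_snvs(positions: Sequence[int], exons: Sequence[Tuple[int, int]],
                   strand: str = "+") -> Dict[str, List[int]]:
    """Assign coding SNVs to the C12 (positions 1 and 2), C3 and CDS (all) datasets."""
    parts: Dict[str, List[int]] = {"C12": [], "C3": [], "CDS": []}
    for pos in positions:
        cp = codon_position(pos, exons, strand)
        if cp is None:
            continue  # intronic or intergenic site: not in any dataset
        parts["CDS"].append(pos)
        parts["C3" if cp == 3 else "C12"].append(pos)
    return parts


if __name__ == "__main__":
    # Hypothetical plus-strand gene with two CDS segments.
    exons = [(100, 105), (200, 208)]
    print(partition_snvs([100, 101, 102, 150, 200, 201], exons))
```

Counting the offset across segments matters: position 200 is the seventh coding base of this hypothetical gene, so it is a first codon position even though it starts a new exon.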
Finally, we reconstructed a maximum-likelihood (ML) chloroplast DNA phylogeny based on 77 concatenated genes present in all the Salicaceae species, using RAxML with 500 bootstrap replicates.

Identification of gene flow

To detect shared haplotypes, and thus possible gene flow between species, we performed an IBD block analysis based on whole-genome SNVs using BEAGLE v.4.1 (Browning & Browning, 2013) with the following parameters: window = 50 000; overlap = 5000; ibdtrim = 100; ibdlod = 10. To evaluate the correlation between IBD block length and recombination, the population-scaled recombination rates (ρ) of P. trichocarpa and P. tremula were obtained from a previous study (Wang J. et al., 2016).

We also performed ABBA–BABA analysis using S. suchowensis as an outgroup in all comparisons. In brief, for the ordered alignment (((S1, S2), S3), O), two classes of shared derived alleles were identified: an ABBA site is one at which S1 carries the outgroup allele while S2 and S3 share the derived allele, and a BABA site is one at which S1 and S3 share the derived allele while S2 carries the outgroup allele. D statistics were then calculated as $D = (\mathrm{ABBA} - \mathrm{BABA}) / (\mathrm{ABBA} + \mathrm{BABA})$, where ABBA and BABA denote the numbers of sites in each class (Green et al., 2010). Under the null hypothesis of ILS, the numbers of ABBA and BABA sites are expected to be equal (D = 0). Alternatively, a significant deviation of D from 0 suggests other events, in particular S3 exchanging genes with S1 or S2 (Durand et al., 2011). D statistics were estimated using ANGSD v.0.9.21 (Korneliussen et al., 2014) with a block size of 5 Mb, and Z-scores were calculated using the m-block jackknife method (Busing et al., 1999). A Z-score with an absolute value > 3 was considered statistically significant.
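The site classification, D statistic and significance test described above can be sketched in Python. This is not ANGSD's implementation: it assumes one sampled allele per taxon, treats per-block (ABBA, BABA) counts as already tallied (hypothetical input), and, for brevity, computes the standard error with an equal-weight delete-one-block jackknife rather than the weighted m-block jackknife cited in the text. The |Z| > 3 threshold is the one stated above.

```python
import math
from typing import Optional, Sequence, Tuple


def classify_site(s1: str, s2: str, s3: str, out: str) -> Optional[str]:
    """Classify one biallelic site for the ordered tree (((S1, S2), S3), O).

    The outgroup allele is taken as ancestral (A); the other allele is derived (B).
    """
    if s3 == out:
        return None  # S3 must carry the derived allele
    if s1 == out and s2 == s3:
        return "ABBA"
    if s2 == out and s1 == s3:
        return "BABA"
    return None


def d_statistic(abba: float, baba: float) -> float:
    """D = (ABBA - BABA) / (ABBA + BABA)."""
    total = abba + baba
    if total == 0:
        raise ValueError("no informative ABBA/BABA sites")
    return (abba - baba) / total


def jackknife_z(blocks: Sequence[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Return D, its delete-one-block jackknife standard error, and Z = D / SE.

    blocks: per-block (ABBA, BABA) counts, e.g. from 5-Mb windows.
    Simplification: blocks are weighted equally (ordinary delete-one jackknife).
    """
    g = len(blocks)
    if g < 2:
        raise ValueError("need at least two blocks")
    total_abba = sum(a for a, _ in blocks)
    total_baba = sum(b for _, b in blocks)
    d_all = d_statistic(total_abba, total_baba)
    # Recompute D with each block left out in turn.
    d_loo = [d_statistic(total_abba - a, total_baba - b) for a, b in blocks]
    mean_loo = sum(d_loo) / g
    se = math.sqrt((g - 1) / g * sum((d - mean_loo) ** 2 for d in d_loo))
    if se == 0:
        raise ValueError("zero jackknife variance; Z is undefined")
    return d_all, se, d_all / se


if __name__ == "__main__":
    # Hypothetical per-block (ABBA, BABA) counts.
    blocks = [(120, 95), (130, 101), (110, 99), (125, 90), (118, 104)]
    d, se, z = jackknife_z(blocks)
    verdict = "significant" if abs(z) > 3 else "not significant"
    print(f"D = {d:.3f}, SE = {se:.3f}, Z = {z:.2f} ({verdict})")
```

Resampling whole blocks rather than single sites is the point of the jackknife here: nearby sites are linked, so treating them as independent would understate the standard error and inflate Z.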
Identification of trans-specific polymorphisms under balancing selection To investigate trans-specific polymorphisms within Populus, we analysed 72 additional individuals of six species from previously published data sets representing three of the major lineages (Slavov et al., 2012; Geraldes et al., 2015; Wang J. et al., 2016; Ma et al., 2018). We applied the same criteria as used above for reads mapping, SNP and genotype calling and filtering, and only retained SNVs with missing genotype rates < 20% in all six species. Shared biallelic SNVs were counted and the genomic divergence (FST) between each pair of species was estimated using New Phytologist (2020) 225: 1370–1382 2019 The Authors www.newphytologist.com New Phytologist 2019 New Phytologist Trust Research New 1372 Phytologist

VCFTOOLS v.0.1.14 (Danecek et al., 2011). Joint allele frequency spectra were calculated for biallelic sites between species, unfolded using S. suchowensis as the outgroup, and plotted using DADI v.1.7.0 (Gutenkunst et al., 2009).

To identify genes under balancing selection, we focused only on SNVs located in genic regions and shared by all six species. To filter out potential duplicated genes, we estimated the copy number of each gene in each individual as the ratio of exon coverage depth to genome coverage depth (Hastings et al., 2009), and retained genes with a ratio between 0.4 and 1.6 in all individuals. To avoid sampling sites affected by convergence, that is, by repeated mutations producing shared SNVs, we considered only genes with more than one shared SNV and at least one shared SNV in coding regions. Finally, genes with at least two shared SNVs in linkage disequilibrium (r² > 0.3) in all three major lineages were selected as candidate genes under balancing selection.

Data accessibility
The sequencing data have been deposited in the Genome Sequence Archive in the BIG Data Center (BIG Data Center Members, 2019), Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, under accession number CRA001510, which is publicly accessible at http://bigd.big.ac.cn/gsa. The whole-genome SNV data and gene trees have been deposited in GitHub (https://github.com/wangmcyao/Whole-genome-SNPs-and-gene-trees-of-genus-Populus).

Results
Phylogenetic analyses
To clarify the phylogenetic relationships within Populus, we collected 1.12 Tb of whole-genome sequencing data from 80 individuals of 29 species, representing all six sections and almost all of the currently recognised species of this genus (Isebrands & Richardson, 2014). Sequences were first aligned to the reference genome of P. trichocarpa, and about 88% were successfully mapped, covering 81% of the genome and yielding an average depth of 29× per individual (Table S1).
We applied stringent variant calling and quality filters to identify a final set of 12.93 million biallelic SNVs (Table S2), of which 1.94 million were nonsynonymous and 1.8 million were synonymous.

We next inferred concatenated genome trees using an ML method for genome-wide and four-fold degenerate (4D) SNVs, respectively. Both trees were highly resolved for most clades and showed nearly identical topologies, differing only in the positions of P. lasiocarpa and P. ningshanica (Fig. S1). In both phylogenies, P. mexicana of sect. Abaso diverged first with high support. It was followed by sect. Populus, which was monophyletic and clearly divided into two subclades: one with only Asian species, and a second with species from Asia, Europe and North America. The next split separated the monophyletic sect. Turanga from a polyphyletic clade consisting of members of sects. Aigeiros, Tacamahaca and Leucoides, which we refer to from this point forward as ‘ATL’. The divergence of these four clades was also supported by a principal component analysis (Fig. S2). We found that c. 6% of the identified genome-wide SNVs were shared among these clades (Fig. S3), which may contribute substantially to phylogenetic inconsistencies.

To further investigate the phylogenetic conflicts in this genus, we identified 5305 single-copy orthologous genes across seven Salicaceae genomes and extracted 620 531 SNVs within the coding regions of these genes. Individual gene trees constructed from these data generally had low support values, and the relationships among sections of Populus varied widely among them (Figs S4, S5). We also constructed concatenation trees based on different partitions of these orthologues. The results consistently supported the basal position of sect. Abaso, with most partitions showing successive divergences of the other three clades: sect. Turanga, sect. Populus and ATL (Fig. S6).
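The orthologue partitions discussed here appear to follow the usual codon-position scheme. C12 is defined in the Fig. 1 legend as the first and second codon positions. C3 is presumably the third positions, and CDS is all coding positions. The genome-wide tree also used four-fold degenerate (4D) SNVs. The sketch below illustrates both ideas under stated assumptions: gap-free, in-frame alignments and the standard nuclear genetic code. The function names are hypothetical, and this is not the authors' pipeline.

```python
# Hypothetical sketch (not the authors' pipeline): split an in-frame CDS alignment
# into codon-position partitions (C12, C3) and locate variable fourfold-degenerate
# (4D) sites. Assumes the standard genetic code and gap-free, equal-length sequences.

# First two codon bases for which any third base encodes the same amino acid:
# Leu CTN, Val GTN, Ser TCN, Pro CCN, Thr ACN, Ala GCN, Arg CGN, Gly GGN.
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def check_alignment(seqs: dict[str, str]) -> int:
    """Return the alignment length after checking it is a whole number of codons."""
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must have equal length")
    n = lengths.pop()
    if n % 3 != 0:
        raise ValueError("alignment length must be a multiple of 3")
    return n


def codon_partitions(seqs: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Return (C12, C3): first+second codon positions, and third positions."""
    check_alignment(seqs)
    c12 = {name: "".join(b for i, b in enumerate(s) if i % 3 != 2)
           for name, s in seqs.items()}
    c3 = {name: s[2::3] for name, s in seqs.items()}
    return c12, c3


def fourfold_snv_sites(seqs: dict[str, str]) -> list[int]:
    """0-based third-position sites that are variable and fourfold degenerate in
    every sequence (identical first two bases drawn from FOURFOLD_PREFIXES)."""
    n = check_alignment(seqs)
    sites = []
    for start in range(0, n, 3):
        prefixes = {s[start:start + 2] for s in seqs.values()}
        third_bases = {s[start + 2] for s in seqs.values()}
        if (len(prefixes) == 1 and prefixes.pop() in FOURFOLD_PREFIXES
                and len(third_bases) > 1):
            sites.append(start + 2)
    return sites


if __name__ == "__main__":
    aln = {"ind1": "GCTAAAGGC", "ind2": "GCCAAGGGA"}  # toy alignment
    c12, c3 = codon_partitions(aln)
    print(c12)                      # {'ind1': 'GCAAGG', 'ind2': 'GCAAGG'}
    print(c3)                       # {'ind1': 'TAC', 'ind2': 'CGA'}
    print(fourfold_snv_sites(aln))  # [2, 8]; the AAA/AAG site is not 4D
```

Restricting 4D sites to codons whose first two bases are identical across all sequences is a conservative choice. It prevents a polymorphism at position 1 or 2 from changing a site's degeneracy class.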
The C12 and CDS concatenation trees placed sect. Turanga as basal to sects. Populus and ATL, whereas in the C3 concatenation tree, sect. Populus diverged earlier than sect. Turanga. These phylogenetic conflicts among different gene partitions of orthologues were also observed in our species tree analyses. When applied to the different gene partitions, both ASTRAL and MP-EST generated species trees nearly identical to the concatenation trees (Figs S7, S8): sect. Populus was placed sister to the ATL clade in all species trees except that from the C3 partition. However, the C3 partition also supported this relationship after gene trees with low average bootstrap support were eliminated (Fig. S9). Among these trees, the C12 species tree had the highest bootstrap support at all major nodes and was therefore considered the most reliable topology for resolving the Populus phylogeny (Fig. 1a). By contrast, the phylogenetic analyses of plastome sequences supported only the monophyly of sect. Turanga, which occupied the basal position. In the plastome tree, P. mexicana of sect. Abaso clustered with species from sects. Aigeiros (P. deltoides and P. fremontii), Tacamahaca (P. angustifolia, P. trichocarpa and P. balsamifera) and Leucoides (P. heterophylla) (Fig. S10).

Extensive gene flow
The highly variable relationships indicated by the gene trees, and the striking discordance between the plastome phylogeny and our coalescent-based species tree, may be due to ILS and interspecific gene flow. To gain further insight into the relationships among the species, we searched for IBD haplotypes using BEAGLE. No IBD blocks were shared between sects. Abaso, Turanga, Populus and ATL. However, abundant shared IBD blocks were detected in comparisons within each of the four major sections, both between and within species (Figs 2a, S11). Of the blocks shared between species, 75.9% (62 539) were detected within the ATL clade, 15.2% (12 483) within sect.
Populus, and only 8.9% (7323) within sect. Turanga. The lengths of IBD haplotypes shared between species (Fig. S11a) ranged from 1.9 kb to 1.5 Mb (median = 23.2 kb). As expected, extensively shared haplotypes were found between recently diverged species, for example, between P. trichocarpa and P. balsamifera (median = 15.1 kb,
maximum = 1.53 Mb) of sect. Tacamahaca (Fig. S11b), between P. rotundifolia and P. davidiana (median = 14.1 kb, maximum = 1.15 Mb) of sect. Populus (Fig. S11c), and between P. euphratica and P. pruinosa (median = 7.3 kb, maximum = 454.1 kb) of sect. Turanga (Fig. S11d). All of these closely related species pairs showed evidence of extensive interspecific gene flow in previous studies (Meirmans et al., 2010; Zheng et al., 2017; Ma et al., 2018). Moreover, as expected, the hybrid aspen, P. × canescens, shared much longer haplotypes (median = 17.2 kb, maximum = 1.37 Mb) with its parents, P. alba and P. tremula (Fig. S12a). A similar length distribution of shared haplotypes was found for P. wulianensis and P. ningshanica (Fig. S12b), both of which might be interspecific hybrids between P. adenopoda and P. davidiana. We observed a negative relationship between IBD length and recombination rate for closely related species, but not for distantly diverged species (Fig. S13; Table S3). This supports the theoretical prediction that IBD blocks degrade over time through recombination, even in regions where recombination is rare. In addition, we used ABBA–BABA tests to examine gene flow within each clade. The results suggested that most pairs of species within each clade showed clear evidence of gene flow. They also showed a positive relationship between the extent of IBD sharing and the level of gene flow (Fig. S14; Table S4). Together, these results indicate that frequent hybridisation has occurred among poplar species within the same section or clade.

To further examine possible gene flow across the deep clades comprising the four major sections of Populus, we performed additional ABBA–BABA tests based on the phylogenetic relationships recovered above. We found that P. mexicana was more closely related to the ATL clade than to any other species of sects. Turanga and Populus (Fig. 2b). Further detailed analyses revealed gene flow between P. mexicana and P.
heterophylla, which was statistically significant regardless of the ATL species used for the comparison (Z-score > 3 and P-value < 0.0027; Fig. 2c). Both P. mexicana and P. heterophylla are found in North America, but their ranges do not currently overlap (Isebrands & Richardson, 2014). Similarly, P. pruinosa was more closely related to the common ancestor of P. trinervis, P. simonii, P. yunnanensis and P. nigra than to other species of sects. Aigeiros and Tacamahaca (Fig. S15a). Likewise, the common ancestor of P. davidiana and P. rotundifolia was more closely related to P. lasiocarpa than to any other species of the ATL clade (Fig. S15b). Both patterns suggest ancient gene flow between these lineages (Fig. S16).

Fig. 1 (a) Phylogenetic relationships among 29 Populus taxa (80 samples) and two Salix species, rooted on S. purpurea and S. suchowensis, based on the first and second codon positions of 5305 single-copy genes analysed with the ASTRAL species tree method. Numbers at each node represent bootstrap values. The numbers in parentheses next to the taxon names represent the number of samples for each taxon. (b) The geographical distributions of the six sections of the genus Populus.

These complex admixture histories were also supported by the widespread incongruence between the chloroplast tree and the species tree (X. Liu et al., 2017; Zhang et al., 2018; this study), and therefore could be
partially responsible for the recovery of a paraphyletic sect. Leucoides with respect to sects. Aigeiros and Tacamahaca (Fig. 1a).

Trans-specific polymorphisms under long-term balancing selection
Shared polymorphisms among species can be maintained not only by ILS and interspecific gene flow, but also by long-term balancing selection (Charlesworth, 2006; Fijarczyk & Babik, 2015). To investigate trans-specific polymorphisms under long-term balancing selection in Populus, we performed a whole-genome scan across 72 individuals from six species (Table S5): P. trichocarpa, P. balsamifera, P. tremula, P. tremuloides, P. euphratica and P. pruinosa. These species represent three sufficiently diverged clades (6–11 Myr; Zhang et al., 2018). This deep divergence makes it more likely that any observed trans-specific polymorphisms have been maintained by balancing selection in each lineage rather than by drift alone. Because only three individuals from the same small population of P. mexicana were sampled, we excluded this clade from our analyses. Genome-wide average genetic divergence (FST) between species ranged from 0.29 to 0.41 for comparisons within the same clade and from 0.75 to 0.86 for comparisons between clades (Table S6). These values indicate high genetic differentiation among these species, especially among the three clades analysed. This was also supported by the lack of correlation among allele frequencies for polymorphisms shared across species (Fig. S17). Despite this clear divergence, we found abundant trans-specific polymorphisms (Table S6; Fig. S18): c. 2.17–2.82 million among species pairs within the same clade, and c. 0.21–0.54 million among species pairs from different clades. However, it is not clear whether these trans-specific polymorphisms are maintained by balancing selection, ILS or gene flow.
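The gene-level filters used to narrow these shared SNVs to candidates are described in the Methods. They are: an exon-to-genome depth ratio of 0.4–1.6 in every individual; at least two shared SNVs, at least one of them in coding sequence; and shared SNVs in linkage disequilibrium (r² > 0.3) in all three lineages. The minimal sketch below illustrates these filters under several assumptions and is not the authors' code. First, per-individual depths are precomputed. Second, genotypes are coded as 0/1/2 alternate-allele dosages. Third, r² is approximated as the squared Pearson correlation of dosages within each lineage, a common choice for unphased data. Fourth, the ratio bounds are inclusive. Finally, "at least two shared SNVs in LD" is read as at least one SNV pair exceeding the threshold in every lineage. All names are hypothetical.

```python
# Hypothetical sketch of the candidate-gene filters; not the authors' code.
# Assumptions: 0/1/2 dosages per individual, r^2 = squared Pearson correlation of
# dosages within a lineage, inclusive copy-number bounds, and "two SNVs in LD" read
# as at least one SNV pair with r^2 > threshold in every lineage.
from dataclasses import dataclass
from itertools import combinations


@dataclass
class SharedSNV:
    pos: int
    in_cds: bool
    dosages: dict[str, list[int]]  # lineage label -> dosage per sampled individual


def copy_number_ok(exon_depth: list[float], genome_depth: list[float],
                   lo: float = 0.4, hi: float = 1.6) -> bool:
    """True if the exon/genome depth ratio lies in [lo, hi] for every individual."""
    return all(lo <= e / g <= hi for e, g in zip(exon_depth, genome_depth))


def r_squared(x: list[int], y: list[int]) -> float:
    """Squared Pearson correlation of dosages; 0.0 if either site is monomorphic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def is_candidate(snvs: list[SharedSNV], lineages: list[str],
                 min_r2: float = 0.3) -> bool:
    """At least two shared SNVs, at least one in CDS, and some SNV pair with
    r^2 > min_r2 within every lineage."""
    if len(snvs) < 2 or not any(s.in_cds for s in snvs):
        return False
    return any(
        all(r_squared(a.dosages[lin], b.dosages[lin]) > min_r2 for lin in lineages)
        for a, b in combinations(snvs, 2)
    )


def candidate_gene(exon_depth: list[float], genome_depth: list[float],
                   snvs: list[SharedSNV], lineages: list[str]) -> bool:
    """Apply the copy-number filter first, then the shared-SNV and LD filters."""
    return copy_number_ok(exon_depth, genome_depth) and is_candidate(snvs, lineages)


if __name__ == "__main__":
    LINEAGES = ["ATL", "Populus", "Turanga"]  # illustrative labels for the 3 clades
    toy = {"ATL": [0, 1, 2, 0], "Populus": [2, 1, 0, 1], "Turanga": [0, 0, 1, 2]}
    snv1, snv2 = SharedSNV(100, True, toy), SharedSNV(250, False, toy)
    print(candidate_gene([30.0, 31.0], [29.0, 30.0], [snv1, snv2], LINEAGES))  # True
```

Checking LD separately within each lineage, rather than pooling all individuals, avoids inflating r² through allele-frequency differences between the deeply diverged clades.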
To distinguish among these processes, we focused on the 7711 SNVs that segregated in all six species (Fig. S18a). Of these, 2925 SNVs were located in the genic regions of 1007 genes (Table S7). After excluding genes that showed evidence of deletion or duplication by copy-number filtering, 484 genes containing 1031 SNVs were retained (Fig. 3a). To exclude shared SNVs arising from repeated mutation, we selected only genes with two or more shared SNVs, at least one of which was in an exon. This identified 100 genes containing 562 shared SNVs. As expected for ancestral polymorphisms maintained by selection, these genes showed substantially higher Tajima’s D values than the genome-wide set of genes (Fig. 3b). Finally, to reduce false positives, we focused on genes with at least two shared SNVs in strong linkage disequilibrium (r² > 0.3) across species from all three deeply diverged clades. Using these strict criteria, we retained 45 genes containing 150 shared SNVs (Table S8). These genes with shared haplotypes showed much higher nucleotide diversity (π) and more intermediate allele frequencies than the genome as a whole (Table S8). They are therefore more likely to be maintained by long-term balancing selection.

Fig. 2 (a) Estimated haplotype sharing in the genus Populus. Heatmap colours represent the total length (below the diagonal) and the total number (above the diagonal) of identity-by-descent (IBD) blocks for each pairwise comparison of individuals, grouped by section. (b, c) ABBA–BABA tests provide evidence of gene flow between P. mexicana and P. heterophylla. (b) Tests of the form {[(POP, ATL), P. mexicana], O} and {[(TUR, ATL), P. mexicana], O}, where POP is sect. Populus, TUR is sect. Turanga, ATL is the ATL clade and O is S. suchowensis. (c) Tests of the form {[(X, P. heterophylla), P. mexicana], S. suchowensis}, where X is a species of the ATL clade.

We determined the haplotypes in the identified regions for each species and found that these sequences clustered by allele rather than species
(Figs 3c, S19). Therefore, we considered these genes with shared haplotypes as candidates for long-term balancing selection. Gene ontology enrichment analyses revealed that these genes were mainly associated with plant development, reproduction and response to biotic and abiotic stress (Table S9). For example, BODYGUARD 3 (BDG3) encodes an epidermis-specific extracellular α/β-hydrolase fold-containing protein that may function in the formation of the epidermal cell wall and cuticle (Kurdyukov et al., 2006); MTN1 encodes one of the 5-methylthioadenosine nucleosidases that are essential for normal vascular development and reproduction in A. thaliana (Waduwara-Jayabahu et al., 2012); and FORMIN HOMOLOGY 5 (FH5) encodes a formin-like protein that is involved in cytokinesis (Ingouff et al., 2005) and plays pivotal roles in the regulation of endosperm development (Fitz Gerald et al., 2009) and in the establishment of actin polarity during pollen germination (Cheung et al., 2010; Liu et al., 2018). Moreover, ANK1 is a member of the ankyrin (ANK) gene cluster in A. thaliana, which encodes ANK transmembrane proteins that play a wide variety of roles in protein–protein interactions, signal transduction and defence responses (Lu et al., 2003; Becerra et al., 2004). Extremely high diversity and potential signals of balancing selection have also been observed for ANK transmembrane proteins in A. thaliana (Du et al., 2007), suggesting that genes from this family may be common targets of balancing selection in numerous species. Our scan for shared haplotypes did not find evidence that the well-known S (sterility) genes SRK and SCR of Arabidopsis (Kusaba et al., 2001) were under balancing selection in Populus. However, we found evidence for long-term balancing selection in a homologue of Arabidopsis AT4G21390, which is tightly linked to the S-locus and encodes an S-locus lectin protein kinase family protein (Kamau & Charlesworth, 2005; Kamau et al., 2007).
In addition, we found that several genes subject to balancing selection encode proteins involved in the response to biotic and abiotic stress. These include CYCLOPHILIN 38 (CYP38) (Fu et al., 2007), RESPONSE REGULATOR 22 (RR22) (Kang et al., 2012), Filamentous temperature sensitive H 11 (FtsH11) (Chen et al., 2006), MALE DISCOVERER 1-INTERACTING RECEPTOR LIKE KINASE 2 (MIK2) (Wang T. et al., 2016; Van der Does et al., 2017), Drought-induced protein 19 (Di19-3) (Qin et al., 2014) and others (Table S8).

[Fig. 3 image. (a) Filtering steps: 7711 SNVs shared by all six species → SNVs located in genic regions (2925 SNVs of 1007 genes) → excluding genes that showed evidence of deletion or duplication (1031 SNVs of 484 genes) → at least two shared SNVs in the genic region and at least one shared SNV in the coding region (562 SNVs of 100 genes) → at least two shared SNVs in strong LD (r > 0.3) across sections (150 SNVs in 45 genes). (b) Tajima's D by species: P. balsamifera, P. trichocarpa, P. tremula, P. tremuloides, P. euphratica, P. pruinosa. (c) Allelic tree of Potri.019G101500 with individuals of the six species as tips.]

Fig. 3 (a) Pipeline of the SNV filtering process used to identify candidate trans-specific polymorphisms under balancing selection. (b) Tajima's D is higher for two sets of genes than for all genes with variation (white). The first set is candidate genes showing no evidence of deletion or duplication (blue). The second is genes with at least two shared SNVs in the genic region and at least one shared SNV in the coding region (orange). *, P 0.01; **, P 0.001; ***, P < 0.001 (Mann–Whitney test). (c) Candidate regions in the gene Potri.019G101500 produce an allelic tree rather than a species tree. Text colours indicate sections, as in Fig. 1. Additional examples of trans-specific polymorphism can be seen in Supporting Information Fig. S19.

Discussion

Our phylogenetic analyses of genomic data recovered four clades in the model tree genus Populus. This contrasts with the six sections previously recognised on the basis of morphological traits (Eckenwalder, 1996). One species, P. mexicana from the southern part of North America, was identified as a basal monotypic clade. Within each clade, we identified frequent gene flow and hybridisation between species. We also found that gene flow occurred between the four clades during their early diversification. We also found evidence that numerous ancient polymorphisms have been maintained across species of three major clades by balancing selection. Both gene flow and ILS of ancient polymorphisms violate a model of strictly bifurcating divergence, both between and within the major clades of Populus.

Phylogenetic relationships, shared polymorphism and gene flow

Our analyses based on nuclear genomic data recovered four well-supported clades: sects. Abaso, Turanga and Populus, and the ATL clade (Figs 1, S1, S2, S5–S9). The three previously recognised sections within the ATL clade, sects. Aigeiros, Tacamahaca and Leucoides, were found to be paraphyletic with respect to one another (Figs S1, S5–S9). Most phylogenetic analyses suggested that the monospecific sect. Abaso diverged first, followed by sect. Turanga and then sect. Populus and ATL (Fig. 1). However, in analyses of some gene partitions, sect. Turanga was sister to ATL, whereas sect. Populus diverged immediately after sect. Abaso (Figs S1, S5–S9). The conflict among these trees can be partly explained by the strong base compositional bias and high mutation rates at the third codon position (Jarvis et al., 2014; L. Liu et al., 2017).
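The third-codon-position bias mentioned above is commonly summarised as GC3, the proportion of G or C at third codon positions of a coding sequence. The following is a minimal sketch that assumes an in-frame coding sequence supplied as a plain string. The function name and the example sequence are hypothetical, and the text does not say how, or whether, the authors quantified GC3.

```python
def gc3(cds: str) -> float:
    """Fraction of G or C at third codon positions of an in-frame coding sequence.

    Only complete codons are used, and third positions that are not A, C, G or T
    (for example N or '-') are skipped.
    """
    seq = cds.upper()
    n_codons = len(seq) // 3
    # Third codon positions are indices 2, 5, 8, ... within complete codons
    thirds = [seq[3 * i + 2] for i in range(n_codons)]
    called = [base for base in thirds if base in "ACGT"]
    if not called:
        raise ValueError("no called third-position bases")
    return sum(base in "GC" for base in called) / len(called)


# Hypothetical CDS: codons ATG GCC AAA TTG have third positions G, C, A, G
print(gc3("ATGGCCAAATTG"))  # 0.75
```

Large GC3 differences among taxa are one warning sign that third positions may carry misleading signal. This sketch only computes the summary and does not correct for it.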
In any case, the phylogenetic inconsistencies among different nuclear datasets mainly concern the relationships among sect. Turanga, sect. Populus and the ATL clade, whereas sect. Abaso always diverged first. Sect. Abaso consists solely of P. mexicana, which occurs in southern North America. The recovered interclade relationships based on nuclear genomic data are therefore consistent with an origin of the genus in North America, followed by dispersal to other regions of the Northern Hemisphere (Fig. 1). This is also consistent with fossil evidence suggesting that sect. Abaso appeared first in North America (Eckenwalder, 1996). In contrast to the nuclear results, our plastome analyses recovered only sect. Turanga as monophyletic and supported its basal position in the genus, whereas the other sections were not recovered as monophyletic (Fig. S10). Both gene flow and ILS of ancient polymorphisms are likely to have contributed to these inconsistent histories among the four major clades, but gene flow probably played a more important role, for two reasons. First, all four clades diverged from one another between 6 and 11 Ma based on the fossil calibrations of the plastome phylogeny (Zhang et al., 2018). Enough generations have therefore passed that, in the absence of selection, ancient polymorphisms should have been fixed by genetic drift. We also found that the internodes between these major clades were relatively short. The random fixation of ancient polymorphisms across this radiative polytomy is therefore also likely to have contributed to the phylogenetic inconsistencies (Wu, 1991). Under such a scenario, it is difficult to distinguish ILS from gene flow. Second, our plastome phylogeny recovered a close relationship of sect. Abaso with P. heterophylla and related species from the ATL clade (Fig. S10). This relationship was also supported by our ABBA–BABA tests, which detected significant gene flow between P. mexicana and the ATL clade, especially P. heterophylla (Fig. 2c).
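The ABBA–BABA test cited above contrasts two discordant site patterns for a four-taxon arrangement (((P1, P2), P3), O). Under ILS alone, the ABBA and BABA patterns are expected at equal frequencies, whereas gene flow between P3 and either P1 or P2 produces an excess of one pattern. Patterson's D summarises that asymmetry. The following is a minimal sketch that assumes biallelic sites coded as haploid 0/1 calls per taxon, with 0 as the ancestral (outgroup) allele. The names `patterson_d` and `Site` and the site counts are hypothetical, and this is not the implementation used in the study.

```python
from typing import Iterable, Tuple

# One biallelic site coded as (P1, P2, P3, O); 0 = ancestral allele, 1 = derived allele
Site = Tuple[int, int, int, int]


def patterson_d(sites: Iterable[Site]) -> float:
    """Patterson's D = (nABBA - nBABA) / (nABBA + nBABA).

    Sites where the outgroup O carries the derived allele are skipped, and
    patterns other than ABBA and BABA (e.g. BBAA) do not contribute.
    """
    n_abba = 0
    n_baba = 0
    for p1, p2, p3, out in sites:
        if out != 0:
            continue
        if (p1, p2, p3) == (0, 1, 1):
            n_abba += 1  # P2 shares the derived allele with P3
        elif (p1, p2, p3) == (1, 0, 1):
            n_baba += 1  # P1 shares the derived allele with P3
    informative = n_abba + n_baba
    if informative == 0:
        raise ValueError("no ABBA or BABA sites")
    return (n_abba - n_baba) / informative


# Hypothetical data: 30 ABBA, 10 BABA and 50 BBAA sites
sites = [(0, 1, 1, 0)] * 30 + [(1, 0, 1, 0)] * 10 + [(1, 1, 0, 0)] * 50
print(patterson_d(sites))  # 0.5
```

A positive D here means that P2 and P3 share derived alleles more often than P1 and P3 do. Whether such an excess is significant must be judged against its standard error, typically estimated with a block jackknife over the genome, which this sketch omits.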
However, we failed to detect shared IBD blocks between these groups. This suggests that this gene flow occurred very early and that subsequent recombination has since broken up the long IBD haplotypes. These results may therefore be best interpreted as chloroplast capture during early hybridisation between two ancient lineages, before reproductive isolation was complete, although we cannot completely exclude ILS. Consider the ancestral P. mexicana and the ancestor that gave rise to P. heterophylla and related species (referred to here as the heterophylla-like ancestor). Under this hypothetical scenario, after ancient hybridisation between these two lineages, repeated backcrosses to the ancestral P. mexicana led to the capture and fixation of the heterophylla chloroplast in P. mexicana (Fig. S20; X. Liu et al., 2017). This scenario is consistent with the fossil record, in which sect. Leucoides is the first section to appear in North America after sect. Abaso (Eckenwalder, 1996).

Ancestral gene flow between pairs of species within each clade was apparent from two lines of evidence: the numerous shared IBD haplotypes between most species pairs (Fig. 2a) and our ABBA–BABA test results. Moreover, the quantity and extent of IBD haplotypes are proportional to the level of gene flow between pairs of species (Fig. S14). P. × canescens, a hybrid (primarily F1s) between P. alba and P. tremula (Rajora & Dancik, 1992), had the longest IBD haplotypes, supporting the use of shared IBD blocks as an indicator of past gene flow. ILS of ancient polymorphisms should also occur across many pairs of Populus species because of the short divergence times among many species pairs within the four main clades (Ingvarsson, 2010). Together, these factors confound the reconstruction of bifurcating relationships among extant species within and among clades.
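The argument above has two parts: very early gene flow leaves no detectable shared IBD blocks, whereas recent hybrids such as P. × canescens share the longest ones. Both rest on recombination shortening introgressed haplotypes every generation. The following is a back-of-envelope sketch that assumes a single pulse of gene flow. It uses the common exponential approximation for introgressed tract lengths: rate (1 − m)·g per Morgan, for g generations since admixture and admixed fraction m. The pulse ages and the 1 cM detection threshold are hypothetical, and this is not the IBD-detection method used in the study.

```python
import math


def _check(generations: float, admixture_fraction: float) -> None:
    if generations <= 0:
        raise ValueError("generations must be positive")
    if not 0.0 <= admixture_fraction < 1.0:
        raise ValueError("admixture_fraction must be in [0, 1)")


def mean_tract_cm(generations: float, admixture_fraction: float = 0.0) -> float:
    """Approximate mean length (cM) of introgressed tracts after one pulse of gene flow.

    Tract lengths are treated as exponential with rate (1 - m) * g per Morgan,
    where g is generations since admixture and m is the admixed fraction.
    """
    _check(generations, admixture_fraction)
    return 100.0 / ((1.0 - admixture_fraction) * generations)


def fraction_longer_than(min_cm: float, generations: float,
                         admixture_fraction: float = 0.0) -> float:
    """Expected fraction of tracts longer than a detection threshold min_cm (cM)."""
    _check(generations, admixture_fraction)
    rate_per_cm = (1.0 - admixture_fraction) * generations / 100.0
    return math.exp(-rate_per_cm * min_cm)


# Hypothetical pulse ages in generations, with an illustrative 1 cM detection threshold
for g in (1, 10, 100, 10_000):
    print(g, mean_tract_cm(g), fraction_longer_than(1.0, g))
```

The approximation assumes long chromosomes, so it is crude for the first few generations, when tracts are limited by chromosome length. Its qualitative point is that the mean tract length falls as 1/g, which is why very old gene flow is hard to detect from shared haplotype blocks.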
Nonetheless, gene flow may have been an important contributor to the early local adaptation and divergence of Populus species (Suarez-Gonzalez et al., 2016, 2018), as has also been suggested for other species groups (Sun et al., 2018; Wu et al., 2018). Further studies with denser sampling of closely related species may be a fruitful avenue towards understanding the historical influence of gene flow in Populus.

Trans-specific polymorphisms mediated by balancing selection

Balancing selection can maintain ancient polymorphisms over long time frames and across species boundaries (Segurel et al., 2012). In such cases, phylogenetic analysis of orthologous genes may not reflect true species relationships, because sequences will cluster by allele rather than by species. To detect such polymorphisms, we sampled 72 individuals from six species across three of the major clades in Populus. We identified 45 genes with polymorphisms that segregated in all six species across these three deeply diverged clades. These genes exhibited patterns consistent with balancing selection contributing to their long-term maintenance. These three clades are sufficiently diverged that trans-clade polymorphisms resulting from ILS of ancient polymorphisms are unlikely. As discussed above, ancestral gene flow has occurred among these clades, so we cannot completely exclude the possibility that ancient hybridisation and introgression were the source of these shared polymorphisms. However, these genes all contain species-specific polymorphisms as well, so recent hybridisation is unlikely to account for the shared polymorphisms; otherwise, long haplotypes would be shared across the three sections of Populus. In addition, variation introgressed through ancient hybridisation would also have been subject to lineage sorting, and such polymorphisms would be unlikely to persist in the absence of selection. Introgression should cause similar alleles to be shared across species, but it would not explain the presence of divergent alleles coexisting in both species. It is therefore more likely that the detected trans-specific polymorphisms arose from ancient polymorphisms maintained by balancing selection. This scenario is also supported by elevated Tajima's D values (Fig. 3b). It should be noted that our stringent filtering criteria are likely to have substantially underestimated the number of trans-specific polymorphisms mediated by balancing selection. More sites would have been identified if we had relaxed the filtering criteria or focused on polymorphisms shared by species from two clades (Fig. S18).
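Elevated Tajima's D, as cited above, reflects an excess of intermediate-frequency variants relative to neutral expectations. This is the signature expected when two divergent alleles are held at stable frequencies. The following is a minimal sketch of the standard single-locus statistic. It compares π (mean pairwise differences) with Watterson's estimator S/a1 and scales by the usual variance terms. It assumes biallelic sites summarised as allele counts in a sample of n sequences. Either allele may be counted, because D depends only on k(n − k) and the number of segregating sites. The function name and example counts are hypothetical, and the text does not describe the authors' implementation or the Mann–Whitney comparison in Fig. 3b, which this sketch does not reproduce.

```python
import math
from typing import Sequence


def tajimas_d(allele_counts: Sequence[int], n: int) -> float:
    """Tajima's D for one locus from per-site allele counts in n sampled sequences.

    allele_counts[i] is how many of the n sequences carry one chosen allele at
    site i; monomorphic sites (count 0 or n) are ignored.
    """
    if n < 4:
        raise ValueError("need at least four sequences")
    seg = [k for k in allele_counts if 0 < k < n]
    s = len(seg)
    if s == 0:
        raise ValueError("no segregating sites")
    pairs = n * (n - 1) / 2
    # pi: average number of pairwise differences, summed over segregating sites
    pi = sum(k * (n - k) for k in seg) / pairs
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Difference between pi and Watterson's theta (s / a1), scaled by its standard deviation
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical sample of 10 sequences with 20 segregating sites
print(tajimas_d([5] * 20, n=10) > 0)  # intermediate-frequency excess gives positive D
print(tajimas_d([1] * 20, n=10) < 0)  # excess of rare variants gives negative D
```

Positive D is only consistent with balancing selection. Population contraction or structure can also raise it, which is one reason the study relies on allele sharing across deeply diverged clades rather than on D alone.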
However, as the criteria become less stringent, a larger proportion of the trans-specific polymorphisms are likely to have resulted from persistence after hybridisation rather than from maintenance by balancing selection. In any case, our results indicate that selection-mediated ancestral polymorphisms have probably persisted throughout the long evolutionary history of the genus Populus. Such persistence would increase the number of loci that are inconsistent with the true species tree because of increased ILS (Guerrero & Hahn, 2018). In animals, genes identified as targets of balancing selection are mainly thought to be involved in host–pathogen interactions (Leffler et al., 2013). Similarly, disease resistance (R) genes as well as self-incompatibility (S) genes have been found to be under balancing selection in plants (Takebayashi et al., 2003; Roux et al., 2013; Karasov et al., 2014). A recent study found that genes under balancing selection are involved in environmental adaptation, and that the distribution of divergent alleles within a species is correlated with divergent niches (Wu et al., 2017). The genes we identified in Populus encompass all of these functions, including mating compatibility, development, and resistance to biotic and abiotic stress (Table S8). Because the sampled individuals of each species cover only a portion of its total distributional range, we were unable to evaluate whether divergent alleles were correlated with different habitats. It therefore remains unknown whether these genes contributed to the widespread distributions of each species. Future studies should use more extensive geographic sampling to test for correlations between divergent alleles and habitat, which could reveal possible mechanisms of local adaptation (Wu et al., 2017).
Overall, our population genomic data spanning the long-term diversification history of the genus identified numerous genes that are likely to have been influenced by balancing selection. These genes would not have been detected by traditional forward or reverse genetic approaches. Population genomic approaches of this kind should be widely employed in future to address two questions: how divergent alleles at the same locus within a species contribute to functional adaptation, and how these alleles are maintained over vast evolutionary distances.

Acknowledgements

This research was supported by the National Natural Science Foundation of China (31590821, 31561123001, 31922061, 31500502, 41871044), the National Key Research and Development Program of China (2017YFC0505203, 2016YFD0600101), the National Key Project for Basic Research (2012CB114504), National Science Foundation grants (DEB-1542599, NSF-1542509, ISO-1542479, 1542486), and the Fundamental Research Funds for the Central Universities (2018CDDY-S02-SCU, SCU2019D013).

Author contributions

TM, JL and MW planned and designed the research. LZ, JL and MW conducted fieldwork. MW, ZZ, ML, DW and XZ analysed the data. ZX designed the phylogenetic analyses. MW, TM and JL wrote the manuscript. KK-R, LBS, SPD, MSO and TY revised the manuscript. MW and LZ contributed equally to this work.

ORCID

Stephen P. DiFazio https://orcid.org/0000-0003-4077-1590
Jianquan Liu https://orcid.org/0000-0002-4237-7418
Tao Ma https://orcid.org/0000-0002-7094-6868
Matthew S. Olson https://orcid.org/0000-0002-0798-145X
Mingcheng Wang https://orcid.org/0000-0002-3631-9174
Zhenxiang Xi https://orcid.org/0000-0002-2851-5474
Zhiyang Zhang https://orcid.org/0000-0002-9466-9439

References

Arnold ML. 2006. Evolution through genetic exchange. Oxford, UK: Oxford University Press.
Baute GJ, Owens GL, Bock DG, Rieseberg LH. 2016. Genome-wide genotyping-by-sequencing data provide a high-resolution view of wild Helianthus diversity, genetic structure, and interspecies gene flow. American Journal of Botany 103: 2170–2177.
Becerra C, Jahrmann T, Puigdomenech P, Vicient CM. 2004. Ankyrin repeat-containing proteins in Arabidopsis: characterization of a novel and abundant group of genes coding ankyrin-transmembrane proteins. Gene 340: 111–121.