αr35 is a family of bacterial small non-coding RNAs with representatives in a restricted group of Alphaproteobacteria from the order Hyphomicrobiales. The first member of this family (Smr35B) was found in a Sinorhizobium meliloti 1021 locus located on the symbiotic plasmid B (pSymB). Further homology and structure conservation analyses identified full-length Smr35B homologs in other legume symbionts (i.e. Rhizobium leguminosarum bv. viciae, R. leguminosarum bv. trifolii and R. etli), as well as in the human and plant pathogens Brucella anthropi and Agrobacterium tumefaciens, respectively. αr35 RNA species are 139-142 nt long (Table 1) and share a common secondary structure consisting of two stem loops and a well-conserved Rho-independent terminator (Figures 1, 2 and 3). Most of the αr35 transcripts can be catalogued as trans-acting sRNAs expressed from well-defined promoter regions of independent transcription units within intergenic regions of the alphaproteobacterial genomes (Figure 5).

Discovery and Structure

Smr35B sRNA was first described by del Val et al.[1] as a result of a computational comparative genomic screen of the intergenic regions (IGRs) of the reference strain S. meliloti 1021. Northern hybridization experiments confirmed that the predicted smr35B locus expressed a single transcript of the expected size, which accumulated differentially in free-living and endosymbiotic bacteria. TAP-based 5'-RACE experiments mapped the transcription start site (TSS) of the full-length Smr35B transcript to position 577,730 in the S. meliloti 1021 genome (http://iant.toulouse.inra.fr/bacteria/annotation/cgi/rhime.cgi), whereas the 3'-end was assumed to be located at position 577,868, matching the last residue of the consecutive stretch of Us of a bona fide Rho-independent terminator (Figure 5). A later deep sequencing-based characterization of the small RNA fraction (50-350 nt) of S. meliloti confirmed the expression of Smr35B (referred to there as SmelB053) and mapped the 5'- and 3'-ends of the molecule to the positions proposed earlier.[2]

The nucleotide sequence of Smr35B was initially used as a query to search the Rfam database. This homology search returned no matches to known bacterial sRNAs in the database. Smr35B was next BLASTed with default parameters against all the bacterial genomes available at the time (1,615 sequences as of 20 April 2011; https://www.ncbi.nlm.nih.gov). The regions exhibiting significant homology to the query sequence (78-89% similarity) were extracted and aligned, and this seed alignment was used to build a Covariance Model (CM) with Infernal (version 1.0)[3] (Figure 2).

 
Figure 1: Covariance Model in Stockholm format showing the consensus structure for the αr35 family. Each of the stems represented by the structure line #=GC SS_cons is shown in a different color; the red one corresponds to the Rho-independent terminator stem.
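
The #=GC SS_cons line mentioned in the caption encodes base pairs as matching brackets, so the stems can be recovered with a simple stack. The following is a minimal sketch of that idea, not the procedure used by the authors. The structure string is a hypothetical toy example with three hairpins (it is not the real αr35 consensus), and the function names are illustrative only.

```python
# Hypothetical toy structure line with three hairpins (NOT the real alpha-r35 consensus).
SS_CONS = "<<<<....>>>>..<<<...>>>..<<<<<....>>>>>...."

# Bracket types that can denote base pairs in a Stockholm structure line.
OPEN_TO_CLOSE = {"<": ">", "(": ")", "[": "]", "{": "}"}
CLOSE_TO_OPEN = {close: open_ for open_, close in OPEN_TO_CLOSE.items()}


def base_pairs(ss):
    """Return a sorted list of 0-based (i, j) columns joined by matching brackets."""
    stacks = {open_: [] for open_ in OPEN_TO_CLOSE}
    pairs = []
    for j, ch in enumerate(ss):
        if ch in OPEN_TO_CLOSE:
            stacks[ch].append(j)
        elif ch in CLOSE_TO_OPEN:
            stack = stacks[CLOSE_TO_OPEN[ch]]
            if not stack:
                raise ValueError(f"unbalanced '{ch}' at column {j}")
            pairs.append((stack.pop(), j))
        # Any other character is treated as an unpaired column.
    if any(stacks.values()):
        raise ValueError("unclosed opening bracket")
    return sorted(pairs)


def hairpin_loops(ss):
    """Return the innermost pair of every hairpin (a pair that encloses no other pair)."""
    pairs = base_pairs(ss)
    return [
        (i, j)
        for i, j in pairs
        if not any(i < a and b < j for a, b in pairs)
    ]


if __name__ == "__main__":
    print(len(base_pairs(SS_CONS)), "base pairs")
    print("hairpins close at columns:", hairpin_loops(SS_CONS))
```

The sketch ignores pseudoknot annotations, which structure lines may express with letters rather than brackets.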

This CM was used in a further search for new members of the αr35 family in the existing bacterial genomic databases.

Table 1: Smr35B homologs in other symbionts and pathogens
CM model Name GI accession number begin end strand %GC length Organism
αr35 Smr35B gi|16263748|ref|NC_003078.1| 577730 577868 + 52 139 Sinorhizobium meliloti 1021 plasmid pSymB
αr35 Atr35C gi|159185562|ref|NC_003063.2| 132595 132733 + 48 139 Agrobacterium tumefaciens str. C58 chromosome linear
αr35 Rlvr35C gi|116249766|ref|NC_008380.1| 2256716 2256853 + 55 138 Rhizobium leguminosarum bv. viciae 3841
αr35 Rlt1325r35p04 gi|241258599|ref|NC_012852.1| 114247 114385 - 56 139 Rhizobium leguminosarum bv. trifolii WSM1325 plasmid pR132504
αr35 Rlt1325r35p02 gi|241666492|ref|NC_012858.1| 466255 466394 - ? 140 Rhizobium leguminosarum bv. trifolii WSM1325 plasmid pR132502
αr35 ReCFNr35f gi|86360734|ref|NC_007766.1| 136368 136508 + 57 141 Rhizobium etli CFN 42 plasmid p42f
αr35 Oar35CII gi|153010078|ref|NC_009668.1| 1587138 1587279 - 52 142 Brucella anthropi ATCC 49188 chromosome 2
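
The length column of Table 1 follows from the begin and end coordinates, assuming they are 1-based and inclusive (an assumption, although it reproduces every length in the table). A minimal sketch of the arithmetic, with the coordinates copied from Table 1 and illustrative variable names:

```python
# (name, begin, end, strand) copied from Table 1.
# Assumption: coordinates are 1-based and inclusive.
TABLE_1 = [
    ("Smr35B", 577730, 577868, "+"),
    ("Atr35C", 132595, 132733, "+"),
    ("Rlvr35C", 2256716, 2256853, "+"),
    ("Rlt1325r35p04", 114247, 114385, "-"),
    ("Rlt1325r35p02", 466255, 466394, "-"),
    ("ReCFNr35f", 136368, 136508, "+"),
    ("Oar35CII", 1587138, 1587279, "-"),
]


def locus_length(begin, end):
    """Length in nt of a locus given inclusive begin/end coordinates."""
    return end - begin + 1


LENGTHS = {name: locus_length(begin, end) for name, begin, end, _ in TABLE_1}

if __name__ == "__main__":
    for name, length in LENGTHS.items():
        print(f"{name}: {length} nt")
```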

The results were manually inspected to deduce a consensus secondary structure for the family (Figure 1 and Figure 2). The consensus structure was also independently predicted with the program locARNATE,[4] which gave very similar results. Manual inspection of the 84 sequences found with the CM using Infernal identified seven true homologs: two copies in Rhizobium leguminosarum bv. viciae (chromosome and plasmid pRL11), two copies in Rhizobium leguminosarum bv. trifolii WSM1325 (plasmids pR132504 and pR132502), one in Rhizobium etli CFN 42 plasmid p42f, and one each in the chromosomes of Agrobacterium tumefaciens and Brucella anthropi. All these sequences showed significant Infernal E-values (1.38e-33 to 1.05e-11) and bit-scores. In S. meliloti, a second copy was identified in the symbiotic plasmid pSymB (574630-574766) with a significant E-value (3.73e-07), but no expression has been detected under any of the tested conditions (unpublished data). The remaining sequences found with the model showed higher E-values (between 8.76e-12 and 1e-3) and very low bit-scores, which is usually a sign of remote homology. Manual inspection of these cases showed that the Rho-independent terminator and the second stem were the only conserved regions; the first stem was missing. This two-stem arrangement was widespread across the Alphaproteobacteria and especially well conserved in Brucella species.

 
Figure 2: Consensus secondary structure of the αr35 members used in the Covariance Model, as predicted by RNAalifold.[5] The coloring scheme for the αr35 family structure is based on base-pair conservation. Red: one type of base pair occurs in all sequences used to generate the consensus; yellow: two types of base pair occur; green: three types of base pair occur. The shading of base pairs represents consistency: saturated, no inconsistent sequences; pale, one inconsistent sequence; very pale, two inconsistent sequences.
 
Figure 3: Phylogenetic distribution of known and predicted αr35 genes. Gene numbers are based on computational analysis using the program Infernal. Legend: Smr35B = Sinorhizobium meliloti 1021 plasmid pSymB (NC_003078), Atr35C = Agrobacterium tumefaciens str. C58 chromosome linear (NC_003063), Rlvr35C = Rhizobium leguminosarum bv. viciae 3841 (NC_008380), Rlt1325r35p04 = Rhizobium leguminosarum bv. trifolii WSM1325 plasmid pR132504 (NC_012852), Rlt1325r35p02 = Rhizobium leguminosarum bv. trifolii WSM1325 plasmid pR132502 (NC_012858), ReCFNr35f = Rhizobium etli CFN 42 plasmid p42f (NC_007766), Oar35CII = Brucella anthropi ATCC 49188 chromosome 2 (NC_009668), Rlvr35p11 = Rhizobium leguminosarum bv. viciae 3841 plasmid pRL11 (NC_008384).

Expression information

Smr35B expression was first assessed by del Val et al.[1] in S. meliloti 1021 under different biological conditions: bacterial growth in TY, minimal medium (MM) and luteolin-MM broth, and endosymbiotic bacteria (i.e. mature symbiotic alfalfa nodules). Expression of Smr35B in free-living bacteria was found to be growth-dependent, with the gene being down-regulated when bacteria entered the stationary phase. Supplementation of MM with luteolin, the plant flavone that specifically induces transcription of the S. meliloti nodulation genes, stimulated the expression of Smr35B by ~4-fold.[1] In contrast, the Smr35B transcript was not detected in mature nodule tissues. Schlüter et al.[2] further described up-regulation of Smr35B upon an osmotic upshift.

Promoter Analysis

All αr35 loci have recognizable σ70-dependent promoters showing a -35/-10 consensus motif CTTAGAC-n17-CTATAT, previously shown to be widely conserved among several other genera in the Alphaproteobacteria.[6] To identify binding sites for other known transcription factors we used the FASTA sequences provided by RegPredict[7] (http://regpredict.lbl.gov/regpredict/help.html) and the position-specific weight matrices (PSWMs) provided by RegulonDB[8] (http://regulondb.ccg.unam.mx). We built a PSWM for each transcription factor from the RegPredict sequences using the Consensus/Patser program, choosing the best final matrix for motif lengths between 14 and 30 bp if the corresponding length had not been previously specified (see "Consensus matrices"). A threshold (average E-value < 10E-10) was established for each matrix (see "Thresholded consensus" in http://gps-tools2.its.yale.edu). Moreover, we searched for conserved unknown motifs using MEME[9] (http://meme.sdsc.edu/meme4_6_1/intro.html) and used relaxed regular expressions (i.e. pattern matching) over the promoters of all Smr35B homologs. Only an inverted repeat structure built around the motif T-N11-A was found, 55 nt upstream of the transcription start site of Smr35B in S. meliloti; it is a degenerate form of the known conserved nod boxes (Figure 4). This characteristic sequence has been proposed as the specific binding site for LysR-type proteins.[10] All promoter regions of the seed Smr35B homologs presented the motif as well.
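
The kind of relaxed pattern matching described above can be illustrated with the -35/-10 consensus CTTAGAC-n17-CTATAT. The sketch below is only an illustration under stated assumptions: the two test sequences are hypothetical toy strings rather than real promoter regions, the one-mismatch tolerance per box is an arbitrary choice, and the source does not specify which expressions the authors actually used.

```python
import re

MINUS_35 = "CTTAGAC"  # -35 box of the consensus quoted in the text
MINUS_10 = "CTATAT"   # -10 box of the consensus quoted in the text
SPACER = 17           # n17 spacer between the two boxes

# Strict form of the consensus: exact boxes separated by any 17 nucleotides.
STRICT = re.compile(f"{MINUS_35}[ACGT]{{{SPACER}}}{MINUS_10}")


def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def relaxed_hits(seq, max_mismatch=1):
    """Return (start, box35, box10) for windows where each box has at most
    max_mismatch mismatches to the consensus (tolerance is an assumption)."""
    seq = seq.upper()
    span = len(MINUS_35) + SPACER + len(MINUS_10)
    hits = []
    for start in range(len(seq) - span + 1):
        box35 = seq[start:start + len(MINUS_35)]
        start10 = start + len(MINUS_35) + SPACER
        box10 = seq[start10:start10 + len(MINUS_10)]
        if (mismatches(box35, MINUS_35) <= max_mismatch
                and mismatches(box10, MINUS_10) <= max_mismatch):
            hits.append((start, box35, box10))
    return hits


# Hypothetical toy sequences, not real promoter regions.
TOY_EXACT = "GGA" + "CTTAGAC" + "A" * 17 + "CTATAT" + "GGC"
TOY_RELAXED = "GGA" + "CTTGGAC" + "ACGTACGTACGTACGTA" + "CTATAT" + "GGC"

if __name__ == "__main__":
    print(STRICT.search(TOY_EXACT) is not None)    # strict match on the exact toy
    print(STRICT.search(TOY_RELAXED) is not None)  # strict pattern misses the variant
    print(relaxed_hits(TOY_RELAXED, max_mismatch=1))
```

Scanning with an explicit mismatch count, rather than a more permissive regular expression, keeps the tolerance visible and easy to tune.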

 
Figure 4: Graphic representation of the promoter regions of the αr35 seed members. All members presented putative σ70 promoters, with the -35 and -10 boxes marked in green and red, respectively. The degenerate LysR recognition box is marked in blue.

Genomic Context

Most of the members of the αr35 family are trans-encoded sRNAs transcribed from independent promoters in the IGRs of the rhizobial megaplasmids. Exceptions are the Smr35B homologs of R. leguminosarum bv. viciae (Rlvr35C) and R. etli CFN 42 plasmid p42f (ReCFNr35f), which are encoded on the opposite strand of annotated genes and partially overlap their ORFs. The predicted protein products of these overlapping ORFs could not be assigned to any functional category on the basis of amino acid sequence homology.[11][12][13] Thus, these αr35 members are putative cis-encoded antisense sRNAs. The genomic regions of the trans-encoded αr35 sRNAs exhibit partial conservation, mainly limited to the sRNA-coding sequence and one flanking gene. Most of the flanking genes of the αr35 loci encode transcription factors and proteins related to nitrogen and glutamine metabolism.
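
The distinction between an intergenic sRNA and one that overlaps an annotated gene on the opposite strand can be checked directly from coordinates. The sketch below is a minimal illustration using the Smr35B and Rlvr35C entries and their flanking genes as listed in Table 2. The `Feature` class and the function names are hypothetical, and coordinates are assumed to be 1-based and inclusive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    strand: str  # "D" or "R", as given in Table 2
    begin: int   # assumed 1-based, inclusive
    end: int


def overlap_length(a, b):
    """Number of positions shared by two features (0 if they do not overlap)."""
    return max(0, min(a.end, b.end) - max(a.begin, b.begin) + 1)


def classify(srna, genes):
    """Describe the sRNA relative to the first annotated gene it overlaps."""
    for gene in genes:
        if overlap_length(srna, gene) > 0:
            kind = "antisense" if gene.strand != srna.strand else "sense"
            return f"{kind} overlap with {gene.name}"
    return "intergenic"


# Coordinates copied from Table 2.
SMR35B = Feature("Smr35B", "D", 577730, 577868)
SMR35B_GENES = [
    Feature("SM_b20551", "R", 576952, 577398),
    Feature("SM_b20552", "D", 578150, 578881),
]

RLVR35C = Feature("Rlvr35C", "D", 2256716, 2256853)
RLVR35C_GENES = [
    Feature("RL2133", "D", 2256297, 2256500),
    Feature("RL2134", "R", 2256617, 2256982),
    Feature("RL2135", "D", 2256994, 2257383),
]

if __name__ == "__main__":
    print(SMR35B.name, "->", classify(SMR35B, SMR35B_GENES))
    print(RLVR35C.name, "->", classify(RLVR35C, RLVR35C_GENES))
```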

 
Figure 5: Genomic context scheme of Smr35B and its closest homologues in other organisms. The αr35 RNA genes are represented by red arrows and the flanking ORFs by arrows in different colors depending on their product function (legend). Numbers indicate the coordinates of the αr35 RNA genes and flanking ORFs in the genome database of each organism. The gene strand is represented by the arrow direction. The identification names on the left of the figure correspond to the following organisms: αr35_Smr35B = Sinorhizobium meliloti 1021 plasmid pSymB (NC_003078), αr35_Oar35CII = Brucella anthropi ATCC 49188 chromosome 2 (NC_009668), αr35_ReCFNr35f = Rhizobium etli CFN 42 plasmid p42f (NC_007766), αr35_Atr35C = Agrobacterium tumefaciens str. C58 chromosome linear (NC_003063), αr35_Rlvr35C = Rhizobium leguminosarum bv. viciae 3841 (NC_008380), αr35_Rlt1325r35p02 = Rhizobium leguminosarum bv. trifolii WSM1325 plasmid pR132502 (NC_012858), αr35_Rlt1325r35p04 = Rhizobium leguminosarum bv. trifolii WSM1325 plasmid pR132504 (NC_012852), αr35_Rlvr35p11 = Rhizobium leguminosarum bv. viciae 3841 plasmid pRL11 (NC_008384).
Table 2: Detailed genomic context information of the αr35 sRNA seed members.

| Family | Feature | Name | Strand | Begin | End | Protein name | Annotation | Organism |
|---|---|---|---|---|---|---|---|---|
| αr35 | gene | SM_b20551 | R | 576952 | 577398 | NP_437070.1 | proteolysis | Sinorhizobium meliloti 1021 plasmid pSymB (NC_003078) |
| αr35 | sRNA | Smr35B | D | 577730 | 577868 | | | Sinorhizobium meliloti 1021 plasmid pSymB (NC_003078) |
| αr35 | gene | SM_b20552 | D | 578150 | 578881 | NP_437071.1 | nitrogen compound metabolic process | Sinorhizobium meliloti 1021 plasmid pSymB (NC_003078) |
| αr35 | gene | Oant_4157 | D | 1586007 | 1587065 | YP_001372686.1 | nitrogen compound metabolic process | Brucella anthropi ATCC 49188 chromosome 2 (NC_009668) |
| αr35 | sRNA | Oar35CII | R | 1587138 | 1587279 | | | Brucella anthropi ATCC 49188 chromosome 2 (NC_009668) |
| αr35 | gene | Oant_4158 | R | 1587338 | 1587724 | YP_001372687.1 | proteolysis | Brucella anthropi ATCC 49188 chromosome 2 (NC_009668) |
| αr35 | gene | RHE_PF00127 | R | 133963 | 134406 | YP_472745.1 | hypothetical protein | Rhizobium etli CFN 42 plasmid p42f (NC_007766) |
| αr35 | gene | RHE_PF00128 | D | 136269 | 136700 | YP_472746.1 | hypothetical protein | Rhizobium etli CFN 42 plasmid p42f (NC_007766) |
| αr35 | sRNA | ReCFNr35f | D | 136368 | 136508 | | | Rhizobium etli CFN 42 plasmid p42f (NC_007766) |
| αr35 | gene | RHE_PF00129 | D | 137962 | 138264 | YP_472747.1 | membrane protein | Rhizobium etli CFN 42 plasmid p42f (NC_007766) |
| αr35 | gene | Atu3124 | D | 132103 | 132318 | NP_357476.1 | | Agrobacterium tumefaciens str. C58 chromosome linear (NC_003063) |
| αr35 | sRNA | Atr35C | D | 132595 | 132733 | | | Agrobacterium tumefaciens str. C58 chromosome linear (NC_003063) |
| αr35 | gene | Atu3126 | D | 133057 | 133344 | NP_357475.1 | nitrogen compound metabolic process | Agrobacterium tumefaciens str. C58 chromosome linear (NC_003063) |
| αr35 | gene | RL2133 | D | 2256297 | 2256500 | YP_767731.1 | hypothetical protein | Rhizobium leguminosarum bv. viciae 3841 (NC_008380) |
| αr35 | gene | RL2134 | R | 2256617 | 2256982 | YP_767732.1 | hypothetical protein | Rhizobium leguminosarum bv. viciae 3841 (NC_008380) |
| αr35 | sRNA | Rlvr35C | D | 2256716 | 2256853 | | | Rhizobium leguminosarum bv. viciae 3841 (NC_008380) |
| αr35 | gene | RL2135 | D | 2256994 | 2257383 | YP_767733.1 | transposase-related protein | Rhizobium leguminosarum bv. viciae 3841 (NC_008380) |
| αr35 | gene | Rleg_6079 | D | 113829 | 114197 | YP_002978585.1 | membrane protein | Rhizobium leguminosarum bv. trifolii WSM1325 plasmid pR132504 (NC_012852) |
| αr35 | sRNA | Rlt1325r35p04 | R | 114247 | 114385 | | | Rhizobium leguminosarum bv. trifolii WSM1325 plasmid pR132504 (NC_012852) |
| αr35 | gene | Rleg_6080 | R | 114489 | 115121 | YP_002978586.1 | endonuclease | Rhizobium leguminosarum bv. trifolii WSM1325 plasmid pR132504 (NC_012852) |
| αr35 | gene | Rleg_7049 | D | 465959 | 466222 | YP_002985022.1 | | Rhizobium leguminosarum bv. trifolii WSM1325 plasmid pR132502 (NC_012858) |
| αr35 | sRNA | Rlt1325r35p02 | R | 466255 | 466394 | | | Rhizobium leguminosarum bv. trifolii WSM1325 plasmid pR132502 (NC_012858) |
| αr35 | gene | Rleg_7050 | R | 466934 | 467824 | YP_002985023.1 | transcription regulator | Rhizobium leguminosarum bv. trifolii WSM1325 plasmid pR132502 (NC_012858) |
| αr35 | gene | pRL110105 | D | 122566 | 123456 | YP_771137.1 | transcription regulator | Rhizobium leguminosarum bv. viciae 3841 plasmid pRL11 (NC_008384) |
| αr35 | sRNA | Rlvr35p11 | D | 124030 | 124162 | | | Rhizobium leguminosarum bv. viciae 3841 plasmid pRL11 (NC_008384) |
| αr35 | gene | pRL110106 | R | 124229 | 124447 | YP_771138.1 | hypothetical protein | Rhizobium leguminosarum bv. viciae 3841 plasmid pRL11 (NC_008384) |

References

  1. del Val C, Rivas E, Torres-Quesada O, Toro N, Jiménez-Zurdo JI (2007). "Identification of differentially expressed small non-coding RNAs in the legume endosymbiont Sinorhizobium meliloti by comparative genomics". Mol Microbiol. 66 (5): 1080–1091. doi:10.1111/j.1365-2958.2007.05978.x. PMC 2780559. PMID 17971083.
  2. Schlüter JP, Reinkensmeier J, Daschkey S, Evguenieva-Hackenberg E, Janssen S, Jänicke S, Becker JD, Giegerich R, Becker A (2010). "A genome-wide survey of sRNAs in the symbiotic nitrogen-fixing alpha-proteobacterium Sinorhizobium meliloti". BMC Genomics. 11 (245): 436. doi:10.1186/1471-2164-11-436. PMC 3091635. PMID 20637113.
  3. Nawrocki EP, Kolbe DL, Eddy SR (2009). "Infernal 1.0: inference of RNA alignments". Bioinformatics. 25 (10): 1335–1337. doi:10.1093/bioinformatics/btp157. PMC 2732312. PMID 19307242.
  4. Will S, Reiche K, Hofacker IL, Stadler PF, Backofen R (2007). "Inferring Noncoding RNA Families and Classes by Means of Genome-Scale Structure-Based Clustering". PLOS Comput Biol. 4 (65): e65. Bibcode:2007PLSCB...3...65W. doi:10.1371/journal.pcbi.0030065. PMC 1851984. PMID 17432929.
  5. Bernhart SH, Hofacker IL, Will S, Gruber AR, Stadler PF (2008). "RNAalifold: improved consensus structure prediction for RNA alignments". BMC Bioinformatics. 9 (474): 474. doi:10.1186/1471-2105-9-474. PMC 2621365. PMID 19014431.
  6. MacLellan SR, MacLean AM, Finan TM (2006). "Promoter prediction in the rhizobia". Microbiology. 152 (6): 1751–1763. doi:10.1099/mic.0.28743-0. PMID 16735738.
  7. Novichkov PS, Rodionov DA, Stavrovskaya ED, Novichkova ES, Kazakov AE, Gelfand MS, Arkin AP, Mironov AA, Dubchak I (2010). "RegPredict: an integrated system for regulon inference in prokaryotes by comparative genomics approach". Nucleic Acids Research. 38 (Web Server issue): W299–W307. doi:10.1093/nar/gkq531. PMC 2896116. PMID 20542910.
  8. Gama-Castro S, Salgado H, Peralta-Gil M, Santos-Zavaleta A, Muniz-Rascado L, Solano-Lira H, Jimenez-Jacinto V, Weiss V, Garcia-Sotelo JS, Lopez-Fuentes A, Porron-Sotelo L, Alquicira-Hernandez S, Medina-Rivera A, Martinez-Flores I, Alquicira-Hernandez K, Martinez-Adame R, Bonavides-Martinez C, Miranda-Rios J, Huerta AM, Mendoza-Vargas A, Collado-Torres L, Taboada B, Vega-Alvarado L, Olvera M, Olvera L, Grande R, Morett E, Collado-Vides J (2010). "RegulonDB version 7.0: transcriptional regulation of Escherichia coli K-12 integrated within genetic sensory response units (Gensor Units)". Nucleic Acids Research. 39 (Database issue): D98–D105. doi:10.1093/nar/gkq1110. PMC 3013702. PMID 21051347.
  9. Bailey TL, Elkan C (1994). "Fitting a mixture model by expectation maximization to discover motifs in biopolymers". Proceedings. International Conference on Intelligent Systems for Molecular Biology. 2. AAAI Press, Menlo Park, California: 28–36. PMID 7584402.
  10. Goethals K, Van Montagu M, Holsters M (1992). "Conserved motifs in a divergent nod box of Azorhizobium caulinodans ORS571 reveal a common structure in promoters regulated by LysR-type proteins". Proc Natl Acad Sci U S A. 89 (5): 1646–1650. Bibcode:1992PNAS...89.1646G. doi:10.1073/pnas.89.5.1646. PMC 48509. PMID 1542656.
  11. Vinayagam A, del Val C, Schubert F, Eils R, Glatting KH, Suhai S, König R (2006). "GOPET: a tool for automated predictions of Gene Ontology terms". BMC Bioinformatics. 7: 171. doi:10.1186/1471-2105-7-161. PMC 1434778. PMID 16549020.
  12. Conesa A, Götz S, García-Gómez JM, Terol J, Talón M, Robles M (2005). "Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research". Bioinformatics. 21 (18): 3674–3676. doi:10.1093/bioinformatics/bti610. PMID 16081474.
  13. del Val C, Ernst P, Falkenhahn M, Fladerer C, Glatting KH, Suhai S, Hotz-Wagenblatt A (2007). "ProtSweep, 2Dsweep and DomainSweep: protein analysis suite at DKFZ". Nucleic Acids Res. 35 (Web Server issue): W444–50. doi:10.1093/nar/gkm364. PMC 1933246. PMID 17526514.