GENBANK FLATFILE GENERATOR

A new flatfile generator has been written to replace the old asn2ff code. It is provided both as a stand-alone application, asn2gb, and as a pair of C functions in the NCBI software toolkit. Several command-line arguments, with equivalent function parameters, customize the behavior of the new flatfile generator and optimize its performance.

NCBI maintains the GenBank nucleotide sequence database and is part of the International Nucleotide Sequence Database (INSD) collaboration. The list of biological features and qualifiers approved by the collaborators for official release and exchange of GenBank, EMBL, and DDBJ records can be found at http://www.ncbi.nlm.nih.gov/projects/collab/FT/index.html

NCBI converts all direct sequence submissions, as well as records supplied by our collaborators and other data sources, into a data model specified in Abstract Syntax Notation 1 (ASN.1) format, regardless of the original form of the data. From this common data representation we can generate GenBank or FASTA files, populate BLAST sequence databases, or build indices for the Entrez retrieval system.

Since the ASN.1 data is structured for ease of computation, asn2gb and the underlying toolkit functions can provide useful derived information at the request of the user. For example, the sequence of bases encoding an mRNA feature can be presented in a /transcription qualifier. Because this is not an INSD-approved qualifier, it will not be present in the official flatfile release of a record. It can only be provided as an extension of the collaboration-approved format, such as in "Sequin mode" below.

ASN2GB STANDALONE APPLICATION

An asn2gb executable is now available on all platforms, and is distributed in the asn1-converters area of the NCBI public ftp site. The most commonly used arguments are explained below.
A more detailed discussion of various parameter values is in the section on calling the SeqEntryToGnbk function from within your own program code.

Input and output files are specified with -i and -o. If they are not given on the command line, they default to stdin and stdout, respectively.

  -i  Input File Name
  -o  Output File Name

GenBank format produces the conventional GenBank flatfile for nucleotide sequences. GenPept is the equivalent for protein sequences. INSDSet is a set of one or more INSDSeq elements, each of which is an XML structured view of the information in the flatfile. INSDSeq contains additional fields, derived from the underlying data, provided as a convenience for computing on the sequences and their feature annotations.

  -f  Format (b GenBank, e EMBL, p GenPept, t Feature Table, x INSDSet)

Sequin mode produces a relaxed flatfile that allows unapproved qualifiers and database cross-references present in the record to be shown. It is typically used while constructing a sequence record for submission. The stricter modes are used for official GenBank releases and for display of flatfiles on the Entrez web site.

  -m  Mode (r Release, e Entrez, s Sequin, d Dump)

Normal style examines each record for "far" references (accessions not packaged along with the sequence being formatted), and shows the CONTIG block instead of separately fetching the underlying components. Master style forces the components to be fetched and displays the actual sequence letters.

  -s  Style (n Normal, s Segment, m Master, c Contig)

Bit flags and custom flags modify the appearance of the flatfile, such as adding /transcription and /peptide extended qualifiers, eliminating the need for writing programs to extract these subsequences. Locks are used to preload "far" sequence components or look up their accession numbers, and can greatly speed up processing of records with far references.
The values are decimal numbers generated by combining the appropriate binary bits, which are described in the section on calling the SeqEntryToGnbk function.

  -g  Bit Flags (1 HTML, 2 XML, 4 ContigFeats, 8 ContigSrcs, 16 FarTransl)
  -h  Lock/Lookup Flags (8 LockProd, 16 LookupComp, 64 LookupProd)
  -u  Custom Flags (2 HideMostImpFeats, 4 HideSnpFeats)

Batch processing of Bioseq-set ASN.1 release files is also supported. The gb*.aso.gz compressed binary files from the NCBI public ftp site can also be uncompressed on the fly on UNIX platforms.

  -a  ASN.1 Type
        Single Record: a Any, e Seq-entry, b Bioseq, s Bioseq-set, m Seq-submit
        Release File:  t Batch Bioseq-set, u Batch Seq-submit
  -b  Bioseq-set is Binary [T/F]
  -c  Bioseq-set is Compressed [T/F]
  -p  Propagate Top Descriptors [T/F]
  -l  Log file

Release files package many independent Seq-entry objects in a Bioseq-set. Using -a t causes asn2gb to read one component at a time, processing it and freeing it from memory before reading the next one. Otherwise it would try to process the entire file at once, almost certainly running out of memory.

Remote fetching allows accession lookups and fetching of far components from the NCBI network server. A specified accession can also be fetched for formatting.

  -r  Remote Fetching [T/F]
  -A  Accession to Fetch

Remote features should be fetched by using a modified form of -A:

  -A ",0,-1"

The middle number is the desired set structure complexity, which normally should be 0. If the last number is -1, asn2gb fetches all external feature annotations and packages them with the returned sequence record.

SEQENTRYTOGNBK FUNCTION

The NCBI software toolkit provides flatfile generation functions for programmers to incorporate into their own applications. SeqEntryToGnbk takes a SeqEntryPtr or SeqLocPtr and calls asn2gnbk_setup, asn2gnbk_format, and asn2gnbk_cleanup, which are available from a private header. It returns FALSE if there was a problem generating the flatfile.
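As noted above, the -g and -h values are decimal numbers formed by ORing bits from the FLAGS and LOCKS lists given below. The following is a minimal, self-contained C sketch of that arithmetic. It copies a few constant values from this document instead of including the toolkit header, and the three *_MASK macros are hypothetical names introduced here for illustration, not toolkit definitions. It also shows a pitfall with the multi-bit fields: ALWAYS_TRANSLATE_CDS (48) contains the SHOW_FAR_TRANSLATION bit (16), so such a field should be compared as a whole rather than tested one bit at a time.

```c
#include <assert.h>

/* Constant values copied from the FLAGS, LOCKS and CUSTOM lists in this
   document; a real program would take them from the toolkit header. */
#define CREATE_HTML_FLATFILE      1
#define CREATE_XML_GBSEQ_FILE     2
#define CREATE_ASN_GBSEQ_FILE     3
#define SHOW_FAR_TRANSLATION     16
#define TRANSLATE_IF_NO_PRODUCT  32
#define ALWAYS_TRANSLATE_CDS     48
#define SHOW_TRANCRIPTION      4096  /* spelled as in the document */
#define SHOW_PEPTIDE           8192

#define LOCK_FAR_PRODUCTS         8
#define LOOKUP_FAR_COMPONENTS    16
#define LOOKUP_FAR_PRODUCTS      64

#define HIDE_GENE_RIFS          256
#define ONLY_GENE_RIFS          512
#define NEWEST_PUBS            1024
#define OLDEST_PUBS            1280
#define HIDE_ALL_PUBS          1792

/* Hypothetical mask names (not toolkit definitions): the bits occupied
   by each multi-bit field, derived from the values above. */
#define OUTPUT_FORM_MASK    (CREATE_HTML_FLATFILE | CREATE_XML_GBSEQ_FILE)   /* 3 */
#define FAR_TRANSLATE_MASK  (SHOW_FAR_TRANSLATION | TRANSLATE_IF_NO_PRODUCT) /* 48 */
#define PUB_SELECT_MASK     (HIDE_GENE_RIFS | ONLY_GENE_RIFS | NEWEST_PUBS)  /* 1792 */

/* Flags value: INSDSeq XML plus /transcription and /peptide qualifiers. */
static unsigned long xml_with_subsequences(void)
{
    return CREATE_XML_GBSEQ_FILE | SHOW_TRANCRIPTION | SHOW_PEPTIDE;
}

/* Locks value: lock far products, and look up far component and far
   product accessions in advance. */
static unsigned long product_locks_and_lookups(void)
{
    return LOCK_FAR_PRODUCTS | LOOKUP_FAR_COMPONENTS | LOOKUP_FAR_PRODUCTS;
}

/* A multi-bit field holds one value, so compare the whole masked field.
   Testing a single bit would mistake ALWAYS_TRANSLATE_CDS (48) for
   SHOW_FAR_TRANSLATION (16). */
static int field_equals(unsigned long bits, unsigned long mask,
                        unsigned long value)
{
    assert((value & ~mask) == 0);  /* value must fit inside the field */
    return (bits & mask) == value;
}
```

Since the document states that -g and -h carry the flags and locks parameters, these two combinations would be given on the command line as -g 12290 and -h 88. Note also that ORing CREATE_HTML_FLATFILE with CREATE_XML_GBSEQ_FILE does not request both outputs; it yields 3, which is CREATE_ASN_GBSEQ_FILE.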
BioseqToGnbk is simply a convenience function that takes a BioseqPtr, looks up the parent SeqEntryPtr, and then calls SeqEntryToGnbk. To use these functions, #include in your program code.

  NLM_EXTERN Boolean SeqEntryToGnbk (
    SeqEntryPtr sep,
    SeqLocPtr slp,
    FmtType format,
    ModType mode,
    StlType style,
    FlgType flags,
    LckType locks,
    CstType custom,
    XtraPtr extra,
    FILE *fp
  );

  NLM_EXTERN Boolean BioseqToGnbk (
    BioseqPtr bsp,
    SeqLocPtr slp,
    FmtType format,
    ModType mode,
    StlType style,
    FlgType flags,
    LckType locks,
    CstType custom,
    XtraPtr extra,
    FILE *fp
  );

In the asn2gb application, the format, mode, style, flags, locks, and custom parameters are specified by the -f, -m, -s, -g, -h and -u arguments, respectively.

FORMATS include GENBANK_FMT, EMBL_FMT, GENPEPT_FMT, and FTABLE_FMT (Sequin's 5-column parsable feature table). If the SeqEntryPtr argument passed to SeqEntryToGnbk points to a Bioseq-set, the function processes all Bioseqs of the appropriate molecule type (nucleotide or protein) for the specified format.

MODES are RELEASE_MODE, ENTREZ_MODE (the same strictness as release mode, except that it allows local IDs and does not require a valid CDS /protein_id accession), SEQUIN_MODE, and DUMP_MODE. In release mode, RefSeq records can show certain qualifiers (e.g., /transcript_id) and db_xrefs beyond those approved by the INSD agreement. Entrez mode is used for web display, and can show new elements that have not yet finished their 4-month quarantine period.

STYLES are NORMAL_STYLE, SEGMENT_STYLE, MASTER_STYLE, and CONTIG_STYLE. Segment style is the traditional representation of segmented sequences, while contig style displays a CONTIG line with a join of accessions instead of a sequence. Normal style automatically chooses between segment and contig style, depending upon the kind of data. (Near segmented records will be done in segment style. Far segmented sequences or delta sequences with no literals will be done as if you chose contig style.)
Master style shows features mapped to the segmented Bioseq's coordinates.

FLAGS are bit flags controlling appearance or behavior, and are ORed together.

One 2-bit flag tells asn2gnbk to create HTML with web links, a flatfile in XML form, or a flatfile in ASN.1 form. These settings are mutually exclusive. The setup for creating HTML links is within SeqEntryToGnbk itself.

  #define CREATE_HTML_FLATFILE   1
  #define CREATE_XML_GBSEQ_FILE  2
  #define CREATE_ASN_GBSEQ_FILE  3

Others control feature display behavior in contig style, whether contig style was explicitly chosen or was selected automatically because a far segmented or far delta record was processed in normal style.

  #define SHOW_CONTIG_FEATURES  4
  #define SHOW_CONTIG_SOURCES   8

A 2-bit flag set controls translation of CDS features with far products.

  #define SHOW_FAR_TRANSLATION     16
  #define TRANSLATE_IF_NO_PRODUCT  32
  #define ALWAYS_TRANSLATE_CDS     48

The same set of flags also applies to transcription of mRNA features with far products if the SHOW_TRANCRIPTION flag is also set.

  #define SHOW_FAR_TRANSCRIPTION    16
  #define TRANSCRIBE_IF_NO_PRODUCT  32
  #define ALWAYS_TRANSCRIBE_MRNA    48

Any record can be shown with RefSeq policies for exception, source, and other qualifiers, values, and db_xrefs that are not necessarily part of the INSD agreement.

  #define REFSEQ_CONVENTIONS  64

Another 2-bit flag controls where to get features when using far segmented parts or far component delta Bioseqs.

  #define ONLY_NEAR_FEATURES      128
  #define FAR_FEATURES_SUPPRESS   256
  #define NEAR_FEATURES_SUPPRESS  384

Other flags allow customization of reports from genomic product sets.

  #define COPY_GPS_CDS_UP      512
  #define COPY_GPS_GENE_DOWN  1024

The CONTIG block can be shown along with the sequence block in master or segment style, when appropriate.

  #define SHOW_CONTIG_AND_SEQ  2048

mRNAs and peptide features can show /transcription or /peptide sequence qualifiers. This is most useful when generating INSDSeq XML, so that users do not have to compute on the data themselves.
  #define SHOW_TRANCRIPTION  4096
  #define SHOW_PEPTIDE       8192

GBSeq XML has been replaced by INSDSeq XML. The CREATE_XML_GBSEQ_FILE flag will actually produce INSDSeq. The original GBSeq can be generated during the transition period by adding the following flag.

  #define PRODUCE_OLD_GBSEQ  16384

Still others are expected to be rarely used, or are for testing new features.

  #define DDBJ_VARIANT_FORMAT   32768
  #define SPECIAL_GAP_DISPLAY   65536
  #define FORCE_PRIMARY_BLOCK  131072

LOCKS are bits for controlling program performance, and are also ORed together.

One flag set is for locking far segmented or delta components, far feature location Bioseqs, or far feature product Bioseqs in advance. This prevents the object manager from uncaching components at an inopportune time and causing unnecessary thrashing. Far component Bioseqs are needed for displaying the sequence.

  #define LOCK_FAR_COMPONENTS  2
  #define LOCK_FAR_LOCATIONS   4
  #define LOCK_FAR_PRODUCTS    8

Another set attempts to do bulk accession-to-gi lookups in advance, which is possible if PubSeqFetchEnable was called by the application. Remote fetching in asn2gb uses this new access mechanism. Far component IDs are needed for the CONTIG line, far location IDs for feature location joins, and far product IDs for the /protein_id and /transcript_id accessions.

  #define LOOKUP_FAR_COMPONENTS   16
  #define LOOKUP_FAR_LOCATIONS    32
  #define LOOKUP_FAR_PRODUCTS     64
  #define LOOKUP_FAR_HISTORY     128
  #define LOOKUP_FAR_INFERENCE   256
  #define LOOKUP_FAR_OTHERS      512

To use PubSeqFetchEnable, the application should #include .

CUSTOM flags mostly suppress specific features or parts of the report, and are also ORed together.

One set enables display of statistics for features and references.

  #define SHOW_FEATURE_STATS    1
  #define SHOW_REFERENCE_STATS  2

Another set suppresses common feature types or all features.
  #define HIDE_FEATURES               4
  #define HIDE_IMP_FEATS              8
  #define HIDE_VARS_AND_REPT_REGNS   16
  #define HIDE_SITES_BONDS_REGIONS   32
  #define HIDE_CDD_FEATS             64
  #define HIDE_CDS_PROD_FEATS       128

A 3-bit flag controls selective display of publications in RefSeq records, such as GeneRIF references or review articles.

  #define HIDE_GENE_RIFS     256
  #define ONLY_GENE_RIFS     512
  #define ONLY_REVIEW_PUBS   768
  #define NEWEST_PUBS       1024
  #define OLDEST_PUBS       1280
  #define HIDE_ALL_PUBS     1792

Protein feature tables and references in feature tables can also be shown.

  #define SHOW_PROT_FTABLE  2048
  #define SHOW_FTABLE_REFS  4096

Source features, instantiated Gap features, and the sequence itself can also be suppressed.

  #define HIDE_SOURCE_FEATS   8192
  #define HIDE_GAP_FEATS     16384
  #define HIDE_SEQUENCE      32768

Gaps in far delta sequences in Web Entrez are normally converted to a shorthand notation. These can be forced to expand to runs of Ns.

  #define EXPANDED_GAP_DISPLAY  65536

Gene Ontology terms can be suppressed if desired.

  #define HIDE_GO_TERMS  131072

The CDS /translation can also be suppressed, even with near products.

  #define HIDE_TRANSLATION  262144

Evidence qualifiers, including experiment and inference, can be suppressed.

  #define HIDE_EVIDENCE_QUALS  524288

EXTRA is an opaque pointer used for preparing internal NCBI indices. Most programs will pass NULL for this parameter.

SAMPLE GENBANK FLATFILE

A sample genomic sequence encoding a spliced mRNA is shown below in GenBank format. The exon features in the original record have been removed from this example.

LOCUS       AF012431                2141 bp    DNA     linear   ROD 07-FEB-2000
DEFINITION  Mus musculus D-dopachrome tautomerase (Ddt) gene, complete cds.
ACCESSION   AF012431
VERSION     AF012431.1  GI:2352907
KEYWORDS    .
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia;
            Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 2141)
  AUTHORS   Esumi,N., Budarf,M., Ciccarelli,L., Sellinger,B., Kozak,C.A. and
            Wistow,G.
  TITLE     Conserved gene structure and genomic linkage for D-dopachrome
            tautomerase (DDT) and MIF
  JOURNAL   Mamm. Genome 9 (9), 753-757 (1998)
   PUBMED   9716662
REFERENCE   2  (bases 1 to 2141)
  AUTHORS   Esumi,N. and Wistow,G.
  TITLE     Direct Submission
  JOURNAL   Submitted (03-JUL-1997) Molecular Structure and Function, NEI,
            Building 6, Rm. 331, NIH, Bethesda, MD 20892, USA
FEATURES             Location/Qualifiers
     source          1..2141
                     /organism="Mus musculus"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:10090"
                     /chromosome="10"
     gene            1..2141
                     /gene="Ddt"
     mRNA            join(1..159,462..637,1868..2141)
                     /gene="Ddt"
                     /product="D-dopachrome tautomerase"
     CDS             join(52..159,462..637,1868..1940)
                     /gene="Ddt"
                     /note="related to macrophage migration inhibitory factor
                     (MIF); in vitro activity on D-dopachrome"
                     /codon_start=1
                     /product="D-dopachrome tautomerase"
                     /protein_id="AAC77467.1"
                     /db_xref="GI:2352908"
                     /translation="MPFVELETNLPASRIPAGLENRLCAATATILDKPEDRVSVTIRP
                     GMTLLMNKSTEPCAHLLVSSIGVVGTAEQNRTHSASFFKFLTEELSLDQDRIVIRFFP
                     LEAWQIGKKGTVMTFL"
BASE COUNT      473 a    567 c    570 g    531 t
ORIGIN
        1 agctcacccg gtgcagttac cgtttggcga tcccactctt ctcccgctaa catgccattc
       61 gttgagttgg aaacaaactt gccggctagc cgcatacccg cggggctgga gaaccggctg
      121 tgtgcggcca cagccaccat cctggacaaa cccgaagacg tgagtgaggg tcggcgagaa
      181 cttgtgggct agggtcggac ctcccaatga cccgttccca tccccaggga ccccactccc
      241 ctggtaacct ctgaccttcc gtgtcctatc ctcccttcct agatcccttc ctggttgtct
      301 ttcccaggcg tgaccctgac gtgactgact cccaaggatc ctgggcagtg tcccagaccc
      361 ggagccctcg gacccccacg ttaagaattt ggcgcgcccc cttctgaaca ccagccatgc
      421 cctgcccaag cttcaggatt taacttttgt gttccttgca gcgcgtgagc gttacgatac
      481 gacctggcat gaccctgttg atgaacaaat ccacagagcc ttgtgctcac cttctggtct
      541 cttccatcgg ggttgtgggc accgcggagc agaaccgcac tcacagcgcc agcttcttca
      601 agttcctcac cgaggagctg tccctggacc aggaccggta tgcagggcca gtgagggaac
      661 gtatttgtgc gtctggagtc aggactcagt ctctctgtat gaggttgggg ggggggaggg
      721 gtcactattt gctggttcca gaaagcactc agtgtccttg tccacgaagg tggactcctc
      781 aggcactgga atggtgagtc tgtgatcaga atgatagcaa gatttcaatt ccttcgactc
      841 tctacagccc cgagaaagga tggtttggga agccccagtg ttgtcttgtg tgtactgaga
      901 atctacttag gcaccctctt aaccactgtg atagtggcct cctcaccgtc actgaaccag
      961 ggggtctggt tttttaaggg agaacttttc caggctggtc cgagggaatc tggttgtgtc
     1021 ctgaggcaga taacctttga actagataag gctccgggag agttgctgga tgataaaaag
     1081 acctccccca caaggtgacc ctaccctccc ccctccccat ccttacattc tgaggcagag
     1141 ttagagtctc atattcctga ggctggagcg ggcctgtgaa gaactacgga gataagtttg
     1201 aaagagcctt ccaaaatgga gtcctagtgg gctcaggaaa gttggtattg gctgcttttg
     1261 ttggatgctc aaatgctgtc ctttagttga ggggacaata cttcttaacg gtaatgctcg
     1321 tgcacacagc acagggcaga tttggtagct tcctgacata gataactgta ttgggccagt
     1381 tttacagatg gaaacctgag ggtgtcagcc ctgtgcacaa ccaccctggt gccagacgat
     1441 cgccagggac ttcctctgag tcctgtgatt gagcaattgc tgattcccac agatttgaat
     1501 cagatttgaa cctgcgcctc acttagagct gggctttggt tcaaaactaa gtgcctggta
     1561 ccctgggcac gcctttagga gcatgcagtt agttagaagc agggggactg tttgttagcc
     1621 cgtaagcagc ctaacatgct cacctgagca cagagcacag gtattgaagc cattgcgtta
     1681 agtctgcact gggaccggta tagccatcac ctttcttctg acttgtcttt ggtgcaagga
     1741 tcattagctg gggtgggcag attggcaaaa tatcctgcag gctgatatgg gctggcctgt
     1801 ctggcaggga ccttaacaaa tgaggggtgt atgcaggagt tgacatctct ccttcttcct
     1861 cctaaaggat cgttatccgc ttcttcccct tggaggcttg gcagatcgga aagaaaggaa
     1921 ctgtcatgac atttctgtga cggaaacaaa gaacccaggg tgtttgctcg aaccgggcca
     1981 gagcccttcc agagaggccc tcccggcaga atcgtggcct ggtagatagg atggtaaatc
     2041 cctcttttgc ctaaacgtct gcgacttcag tggtccattt ttctcttccc cagcctcgtg
     2101 aataattgaa agagagcaaa taaatgaaga gaatatcatt c
//

SAMPLE INSDSET XML

The same record is shown in INSDSet XML format. INSDSeq XML is a data distribution format meant to be read by a computer, not a display format intended for human reading, so sequence letters are single strings of characters with no spaces or newlines.
(The sequences and other long lines are word-wrapped here only for printing.)

The INSDFeature_location element holds the location string exactly as it appears in the GenBank flatfile:

  join(1..159,462..637,1868..2141)

For the convenience of users who wish to compute on features without having to parse these strings, the feature intervals are also presented individually:

  1 159 AF012431.1
  462 637 AF012431.1
  ...

The record here was generated with the SHOW_TRANCRIPTION flag set, which extracts the bases under the mRNA feature intervals and displays them in a transcription qualifier. This eliminates the need to process feature intervals for the common task of obtaining the mRNA bases. SHOW_PEPTIDE does the same for peptide sequences under sig_peptide or mat_peptide features. The transcription and peptide qualifiers are extensions to those approved by the INSD for official releases.

AF012431 2141 DNA linear ROD 07-FEB-2000 03-SEP-1997 Mus musculus D-dopachrome tautomerase (Ddt) gene, complete cds AF012431 AF012431.1 gb|AF012431.1|AF012431 gi|2352907 Mus musculus (house mouse) Mus musculus Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Sciurognathi; Muridae; Murinae; Mus 1 (bases 1 to 2141) Esumi,N. Budarf,M. Ciccarelli,L. Sellinger,B. Kozak,C.A. Wistow,G. Conserved gene structure and genomic linkage for D-dopachrome tautomerase (DDT) and MIF Mamm. Genome 9 (9), 753-757 (1998) 9716662 2 (bases 1 to 2141) Esumi,N. Wistow,G. Direct Submission Submitted (03-JUL-1997) Molecular Structure and Function, NEI, Building 6, Rm.
331, NIH, Bethesda, MD 20892, USA source 1..2141 organism Mus musculus mol_type genomic DNA db_xref taxon:10090 chromosome 10 gene 1..2141 1 2141 AF012431.1 gene Ddt mRNA join(1..159,462..637,1868..2141) 1 159 AF012431.1 462 637 AF012431.1 1868 2141 AF012431.1 gene Ddt product D-dopachrome tautomerase transcription AGCTCACCCGGTGCAGTTACCGTTTGGCGATCCCACTCTTCT CCCGCTAACATGCCATTCGTTGAGTTGGAAACAAACTTGCCGGCTAGCCGCATACCCGCGGGGCTGGAGAACCGG CTGTGTGCGGCCACAGCCACCATCCTGGACAAACCCGAAGACCGCGTGAGCGTTACGATACGACCTGGCATGACC CTGTTGATGAACAAATCCACAGAGCCTTGTGCTCACCTTCTGGTCTCTTCCATCGGGGTTGTGGGCACCGCGGAG CAGAACCGCACTCACAGCGCCAGCTTCTTCAAGTTCCTCACCGAGGAGCTGTCCCTGGACCAGGACCGGATCGTT ATCCGCTTCTTCCCCTTGGAGGCTTGGCAGATCGGAAAGAAAGGAACTGTCATGACATTTCTGTGACGGAAACAA AGAACCCAGGGTGTTTGCTCGAACCGGGCCAGAGCCCTTCCAGAGAGGCCCTCCCGGCAGAATCGTGGCCTGGTA GATAGGATGGTAAATCCCTCTTTTGCCTAAACGTCTGCGACTTCAGTGGTCCATTTTTCTCTTCCCCAGCCTCGT GAATAATTGAAAGAGAGCAAATAAATGAAGAGAATATCATTC CDS join(52..159,462..637,1868..1940) 52 159 AF012431.1 462 637 AF012431.1 1868 1940 AF012431.1 gene Ddt note related to macrophage migration inhibitory factor (MIF); in vitro activity on D-dopachrome codon_start 1 transl_table 1 product D-dopachrome tautomerase protein_id AAC77467.1 db_xref GI:2352908 translation MPFVELETNLPASRIPAGLENRLCAATATILDKPEDRVSVTI RPGMTLLMNKSTEPCAHLLVSSIGVVGTAEQNRTHSASFFKFLTEELSLDQDRIVIRFFPLEAWQIGKKGTVMTF L AGCTCACCCGGTGCAGTTACCGTTTGGCGATCCCACTCTTCTCCCGCTAACAT GCCATTCGTTGAGTTGGAAACAAACTTGCCGGCTAGCCGCATACCCGCGGGGCTGGAGAACCGGCTGTGTGCGGC CACAGCCACCATCCTGGACAAACCCGAAGACGTGAGTGAGGGTCGGCGAGAACTTGTGGGCTAGGGTCGGACCTC CCAATGACCCGTTCCCATCCCCAGGGACCCCACTCCCCTGGTAACCTCTGACCTTCCGTGTCCTATCCTCCCTTC CTAGATCCCTTCCTGGTTGTCTTTCCCAGGCGTGACCCTGACGTGACTGACTCCCAAGGATCCTGGGCAGTGTCC CAGACCCGGAGCCCTCGGACCCCCACGTTAAGAATTTGGCGCGCCCCCTTCTGAACACCAGCCATGCCCTGCCCA AGCTTCAGGATTTAACTTTTGTGTTCCTTGCAGCGCGTGAGCGTTACGATACGACCTGGCATGACCCTGTTGATG AACAAATCCACAGAGCCTTGTGCTCACCTTCTGGTCTCTTCCATCGGGGTTGTGGGCACCGCGGAGCAGAACCGC 
ACTCACAGCGCCAGCTTCTTCAAGTTCCTCACCGAGGAGCTGTCCCTGGACCAGGACCGGTATGCAGGGCCAGTG AGGGAACGTATTTGTGCGTCTGGAGTCAGGACTCAGTCTCTCTGTATGAGGTTGGGGGGGGGGAGGGGTCACTAT TTGCTGGTTCCAGAAAGCACTCAGTGTCCTTGTCCACGAAGGTGGACTCCTCAGGCACTGGAATGGTGAGTCTGT GATCAGAATGATAGCAAGATTTCAATTCCTTCGACTCTCTACAGCCCCGAGAAAGGATGGTTTGGGAAGCCCCAG TGTTGTCTTGTGTGTACTGAGAATCTACTTAGGCACCCTCTTAACCACTGTGATAGTGGCCTCCTCACCGTCACT GAACCAGGGGGTCTGGTTTTTTAAGGGAGAACTTTTCCAGGCTGGTCCGAGGGAATCTGGTTGTGTCCTGAGGCA GATAACCTTTGAACTAGATAAGGCTCCGGGAGAGTTGCTGGATGATAAAAAGACCTCCCCCACAAGGTGACCCTA CCCTCCCCCCTCCCCATCCTTACATTCTGAGGCAGAGTTAGAGTCTCATATTCCTGAGGCTGGAGCGGGCCTGTG AAGAACTACGGAGATAAGTTTGAAAGAGCCTTCCAAAATGGAGTCCTAGTGGGCTCAGGAAAGTTGGTATTGGCT GCTTTTGTTGGATGCTCAAATGCTGTCCTTTAGTTGAGGGGACAATACTTCTTAACGGTAATGCTCGTGCACACA GCACAGGGCAGATTTGGTAGCTTCCTGACATAGATAACTGTATTGGGCCAGTTTTACAGATGGAAACCTGAGGGT GTCAGCCCTGTGCACAACCACCCTGGTGCCAGACGATCGCCAGGGACTTCCTCTGAGTCCTGTGATTGAGCAATT GCTGATTCCCACAGATTTGAATCAGATTTGAACCTGCGCCTCACTTAGAGCTGGGCTTTGGTTCAAAACTAAGTG CCTGGTACCCTGGGCACGCCTTTAGGAGCATGCAGTTAGTTAGAAGCAGGGGGACTGTTTGTTAGCCCGTAAGCA GCCTAACATGCTCACCTGAGCACAGAGCACAGGTATTGAAGCCATTGCGTTAAGTCTGCACTGGGACCGGTATAG CCATCACCTTTCTTCTGACTTGTCTTTGGTGCAAGGATCATTAGCTGGGGTGGGCAGATTGGCAAAATATCCTGC AGGCTGATATGGGCTGGCCTGTCTGGCAGGGACCTTAACAAATGAGGGGTGTATGCAGGAGTTGACATCTCTCCT TCTTCCTCCTAAAGGATCGTTATCCGCTTCTTCCCCTTGGAGGCTTGGCAGATCGGAAAGAAAGGAACTGTCATG ACATTTCTGTGACGGAAACAAAGAACCCAGGGTGTTTGCTCGAACCGGGCCAGAGCCCTTCCAGAGAGGCCCTCC CGGCAGAATCGTGGCCTGGTAGATAGGATGGTAAATCCCTCTTTTGCCTAAACGTCTGCGACTTCAGTGGTCCAT TTTTCTCTTCCCCAGCCTCGTGAATAATTGAAAGAGAGCAAATAAATGAAGAGAATATCATTC
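The individual feature intervals in INSDSeq exist so that users need not parse location strings such as join(1..159,462..637,1868..2141) themselves. For cases where only the string is at hand, the following is a minimal C sketch of what that parsing involves. The function names are hypothetical and not part of the NCBI toolkit, and the parser deliberately handles only plain a..b ranges and join() of such ranges on the plus strand; complement(), order(), partial markers (< and >), and accession-qualified intervals are rejected. The extract_spliced helper approximates, for this simple case only, the bases that SHOW_TRANCRIPTION places in the transcription qualifier.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One feature interval: 1-based, inclusive, plus strand. */
typedef struct {
    long from;
    long to;
} Interval;

/* Hypothetical helper: parse "a..b" or "join(a..b,c..d,...)" into at
   most max intervals. Anything else (complement, order, partial markers,
   accession prefixes) is rejected. Returns the interval count or -1. */
static int parse_simple_join(const char *loc, Interval *out, int max)
{
    const char *p = loc;
    char *end;
    int n = 0;
    int in_join = 0;

    assert(loc != NULL && out != NULL);
    if (strncmp(p, "join(", 5) == 0) {
        p += 5;
        in_join = 1;
    }
    for (;;) {
        long from, to;

        from = strtol(p, &end, 10);
        if (end == p || strncmp(end, "..", 2) != 0)
            return -1;
        p = end + 2;
        to = strtol(p, &end, 10);
        if (end == p || from < 1 || to < from || n >= max)
            return -1;
        out[n].from = from;
        out[n].to = to;
        n++;
        p = end;
        if (in_join && *p == ',') {
            p++;                /* another interval follows */
            continue;
        }
        break;
    }
    if (in_join) {
        if (*p != ')')
            return -1;
        p++;
    }
    return (*p == '\0') ? n : -1;
}

/* Total bases covered by the intervals, i.e. the spliced length. */
static long spliced_length(const Interval *iv, int n)
{
    long total = 0;
    int i;

    for (i = 0; i < n; i++)
        total += iv[i].to - iv[i].from + 1;
    return total;
}

/* Concatenate the bases under the intervals into buf, which must hold
   spliced_length() + 1 chars. Returns 0, or -1 if an interval is invalid
   or extends past the end of seq. */
static int extract_spliced(const char *seq, const Interval *iv, int n,
                           char *buf)
{
    size_t seqlen = strlen(seq);
    size_t k = 0;
    int i;

    for (i = 0; i < n; i++) {
        size_t len;

        if (iv[i].from < 1 || iv[i].to < iv[i].from ||
            (size_t)iv[i].to > seqlen)
            return -1;
        len = (size_t)(iv[i].to - iv[i].from + 1);
        memcpy(buf + k, seq + (iv[i].from - 1), len);
        k += len;
    }
    buf[k] = '\0';
    return 0;
}
```

Applied to the sample record, the mRNA location join(1..159,462..637,1868..2141) covers 159 + 176 + 274 = 609 bases, and the CDS location join(52..159,462..637,1868..1940) covers 357 bases, or 119 codons: the 118 residues of the /translation plus a stop codon.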