Understanding the FASTA Format for Regulatory Regions

This documentation explains the FASTA format used to define regulatory nucleotide sequences, such as promoters and terminators, and the header metadata that tools parse from each record.


FASTA Example

>NM_004562 PRKN | Homo sapiens chromosome 6, GRCh38.p14 Primary Assembly NC_000006.12 | Strand: minus | Promoter | TSS (on chromosome): 162727766 | TSS (on sequence): 2000
TATGAATACAGGTTTAGGAAAAAACAGAAAAGAACCCCAACCAGTAAAAAAAAAATTAAAGTATAACATTAAAAAACATCAAAATTGTAAATATTGTGTAGAAGAAAAACTAAATGATTAACCTGAATGG...

Key Components of the FASTA Format

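  1. Header Line (>): Contains metadata about the sequence. Key fields, separated by " | " in the example above, include the accession and gene name (NM_004562 PRKN), the species, chromosome, and assembly, the strand, the region type (Promoter), and the TSS position on the chromosome and on the sequence.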
  2. Sequence Line: Represents the nucleotide sequence in uppercase letters (A, T, G, C).
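
As an illustration of how such a header could be split into fields, here is a minimal Python sketch. It assumes the fields are separated by " | " in the order shown in the example above. The function names (parse_header, is_valid_sequence) and the dictionary keys "accession", "gene", "source", and "region" are hypothetical and chosen for this sketch only; they are not part of any tool described here.

```python
def parse_header(header: str) -> dict:
    """Split a pipe-delimited FASTA header into named fields.

    Assumes the layout of the example header: accession and gene name first,
    then the source description, then further fields that are either
    'Key: value' pairs or a bare region type such as 'Promoter'.
    """
    parts = [p.strip() for p in header.lstrip(">").split("|")]
    accession, _, gene = parts[0].partition(" ")
    record = {"accession": accession, "gene": gene.strip(), "source": parts[1]}
    for part in parts[2:]:
        if ":" in part:
            # 'Strand: minus' becomes record['Strand'] = 'minus'
            key, _, value = part.partition(":")
            record[key.strip()] = value.strip()
        else:
            # A field without a colon is treated as the region type.
            record["region"] = part
    return record


def is_valid_sequence(sequence: str) -> bool:
    """Return True if the sequence is non-empty and uses only uppercase A, T, G, C."""
    return bool(sequence) and set(sequence) <= set("ATGC")


header = (
    ">NM_004562 PRKN | Homo sapiens chromosome 6, GRCh38.p14 Primary Assembly "
    "NC_000006.12 | Strand: minus | Promoter | TSS (on chromosome): 162727766 "
    "| TSS (on sequence): 2000"
)
record = parse_header(header)
print(record["accession"], record["gene"], record["Strand"], record["region"])
```

Values are kept as strings in this sketch; a caller that needs numeric TSS positions would convert them with int().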

Using Custom FASTA Files

Adding Your Custom FASTA

You can upload FASTA files containing regulatory region sequences. Headers must include metadata for proper processing: