https://github.com/Jumitti/TFinder
Also available on Health Universe: https://apps.healthuniverse.com/pmb-xci-tsb
Beta version: https://tfinder-beta.streamlit.app/
Julien Minniti, Eric Duplan, Frédéric Checler et al. TFinder: a Python web tool for predicting Transcription Factor Binding Sites https://doi.org/10.1016/j.jmb.2024.168921
TFinder is an easy-to-use Python web portal for identifying Individual Motifs (IMs) such as Transcription Factor Binding Sites (TFBSs). Using the NCBI API, TFinder extracts either the promoter or the terminal region of a gene through a simple query based on the NCBI gene name or ID. It enables simultaneous analysis across five different species for an unlimited number of genes. TFinder searches for TFBSs and IMs given in different formats, including IUPAC codes and JASPAR entries. It also allows the generation and use of a Position Weight Matrix (PWM). The results are returned both as a table and as a graph showing the relevance of each TFBS and IM and its location relative to the Transcription Start Site (TSS) or the gene end. The results are then sent to the user by email, which makes subsequent analysis and data sharing easier.
A DNA Individual Motif (IM) is a short sequence pattern, conserved between species, that can be bound by proteins such as Transcription Factors (TFs) to regulate gene expression. TFs specifically recognize a nucleotide IM sequence, called a Transcription Factor Binding Site (TFBS), in gene promoter or terminator regions. Searching for TFBSs is an empirical discipline of genomics. It is a key step before functional validation of a TFBS, either by electrophoretic mobility shift assay (EMSA, or gel shift) or by chromatin immunoprecipitation (ChIP). Both techniques allow the examination of the interaction between a TF and a DNA sequence (Jayaram, Usvyat and R. Martin 2016).
In silico searching for IMs can be tedious and time-consuming at various stages, especially for academics or biologists who are not familiar with bioinformatics. First, the regulatory region sequence (promoter or terminator) must be retrieved. This can be done with databases such as NCBI, UCSC or Ensembl, but they are neither intuitive nor user-friendly. Next, once the regulatory region sequence has been obtained, one may use TF databases such as JASPAR (Castro-Mondragon et al. 2022) and TRANSFAC (Matys 2006). These have their limitations: for instance, they do not allow searching for the TFBSs of an unreferenced TF, and they may be subject to fees. Other tools such as PROMO (Farre 2003), TFBIND (Tsunoda and Takagi 1999) and TFsitescan allow searching for multiple TFBSs in a single nucleotide sequence. However, they all rely on the JASPAR and TRANSFAC databases and do not accept a custom IM or an unreferenced TFBS. Finally, a few web tools such as FIMO, a module of the MEME Suite (Grant, Bailey and Noble 2011; Bailey et al. 2015), accept an unreferenced TFBS or IM. Of note, all the tools cited above are rather archaic and not user-friendly, and none of them allows the retrieval of regulatory regions prior to motif finding.
TFinder is an intuitive, easy-to-use, fast, free and open-source web application that combines sequence retrieval and IM search in a single tool. TFinder allows:

(1) the analysis of an unlimited number of genes in record time;
(2) the selection of up to five different species (human, mouse, rat, drosophila, zebrafish);
(3) the choice and examination of promoter and/or terminator gene regions;
(4) the search for IMs/TFBSs in different formats (IUPAC code, JASPAR ID or a Position Weight Matrix (PWM));
(5) the search for IMs/TFBSs on both the sense and antisense strands, including their complementary forms;
(6) the export of the analysis results by email.
TFinder simplifies the extraction of gene regulatory regions (promoters and terminators) using gene names or Gene IDs, enabling cross-species comparisons and advanced customization. It supports automated coordinate retrieval, sequence formatting in FASTA, and advanced options for precise region selection across multiple genes and species.
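Region extraction comes down to computing a window around an anchor point: the TSS for a promoter, or the gene end for a terminator. Orientation matters, because on the minus strand "upstream" lies at higher chromosome positions. The sketch below is illustrative and not TFinder's actual code.

It makes three assumptions:
- The hypothetical helpers `region_coordinates` and `efetch_url` are named for this example only.
- The window is inclusive and spans `upstream` bases before and `downstream` bases after the anchor.
- The slice is fetched with NCBI E-utilities `efetch` (`seq_start`, `seq_stop`, `strand`). TFinder's own conventions and calls may differ.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def region_coordinates(anchor, strand, upstream, downstream):
    """Return 1-based inclusive (start, stop) of a window around `anchor`.

    Hypothetical helper: `anchor` is the TSS (promoter) or the gene end
    (terminator) in chromosome coordinates, `strand` is "+" or "-", and
    upstream/downstream are relative to the direction of transcription.
    """
    if strand == "+":
        start, stop = anchor - upstream, anchor + downstream
    elif strand == "-":
        # On the minus strand, upstream lies at higher chromosome positions.
        start, stop = anchor - downstream, anchor + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(1, start), stop  # clip at the chromosome start


def efetch_url(accession, start, stop, strand):
    """Build an E-utilities efetch URL returning the region as FASTA (hypothetical helper)."""
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
        "seq_start": start,
        "seq_stop": stop,
        "strand": 1 if strand == "+" else 2,  # 2 returns the reverse complement
    }
    return f"{EFETCH}?{urlencode(params)}"
```

For example, with an illustrative TSS at position 1000, `region_coordinates(1000, "+", 200, 50)` gives `(800, 1050)`. On the minus strand, the same call gives `(950, 1200)`. Requesting `strand=2` means the returned sequence is already in the gene's 5' to 3' orientation.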
Promoter and Terminator Extraction
The FASTA format is used to represent nucleotide sequences like promoters and terminators, with headers containing metadata such as gene name, species, strand, and TSS positions. Properly formatted headers ensure accurate parsing and compatibility with analysis tools. Missing metadata defaults to standard values but can be manually added for precision.
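A minimal sketch of header parsing with default values. The layout used here, `>GENE | species | strand | TSS` with `|` separators, is an assumption made for illustration; TFinder's real header format may contain other fields. The default values in `DEFAULTS` are also hypothetical.

```python
from dataclasses import dataclass

# Hypothetical field order and defaults, used when a header field is missing or empty.
FIELDS = ["gene", "species", "strand", "tss"]
DEFAULTS = {"gene": "n.d.", "species": "n.d.", "strand": "+", "tss": 0}


@dataclass
class Record:
    gene: str
    species: str
    strand: str
    tss: int
    sequence: str


def _to_record(header, lines):
    """Split a header on '|' and fill missing fields with defaults."""
    fields = [f.strip() for f in header.split("|")]
    values = dict(DEFAULTS)
    for key, value in zip(FIELDS, fields):
        if value:  # keep the default when a field is empty
            values[key] = value
    return Record(values["gene"], values["species"], values["strand"],
                  int(values["tss"]), "".join(lines).upper())


def parse_fasta(text):
    """Parse multi-FASTA text into Record objects (sequence lines are joined)."""
    records, header, lines = [], None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append(_to_record(header, lines))
            header, lines = line[1:], []
        elif header is not None:
            lines.append(line)
    if header is not None:
        records.append(_to_record(header, lines))
    return records
```

A header reduced to `>GENE2` still parses: species, strand and TSS simply take their default values. This mirrors the behaviour described above, where missing metadata can later be filled in by hand.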
The Motif Finder allows users to identify DNA motifs in sequences using various inputs such as IUPAC codes, JASPAR IDs, or PWMs. It supports advanced settings like pseudocount adjustments, background nucleotide frequencies (fixed or sequence-dependent), and p-value calculation through random sequence simulations. Results are visualized interactively, with scores and matches displayed alongside their statistical significance for enhanced motif analysis.
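To show how an IUPAC motif can be searched on both strands, here is a minimal sketch. It is not TFinder's implementation.

How it works:
- Each IUPAC code is expanded into a regular-expression character class.
- A lookahead pattern is used so that overlapping sites are reported.
- The reverse complement is scanned as well, and antisense hits are mapped back to 0-based positions on the sense strand.

The function names are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def iupac_to_regex(motif):
    """Translate an IUPAC motif (e.g. 'CANNTG') into a regex string."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_iupac(sequence, motif):
    """Return sorted (position, strand, site) hits; positions are 0-based on the sense strand."""
    seq = sequence.upper()
    # Lookahead so that overlapping occurrences are all reported.
    pattern = re.compile(f"(?=({iupac_to_regex(motif)}))")
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(seq)]
    # Scan the reverse complement, then map each hit back to sense coordinates.
    rc = seq.translate(COMPLEMENT)[::-1]
    k = len(motif)
    for m in pattern.finditer(rc):
        hits.append((len(seq) - m.start() - k, "-", m.group(1)))
    return sorted(hits)
```

For a palindromic E-box such as `CANNTG`, the same site is reported once per strand. For a non-palindromic motif such as `TGACC`, a hit on the antisense strand corresponds to `GGTCA` on the sense strand.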
The mathematical approach includes transforming the PWM into a log-odds PSSM, incorporating pseudocounts for robustness, and calculating motif scores by summing log-odds ratios. Normalization ensures comparability, and p-values are derived from random sequence simulations to assess the statistical significance of motif matches.
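The steps above can be sketched in a few lines of Python. This is a minimal sketch under several assumptions, not TFinder's exact method.

Conventions used:
- The PWM is given as JASPAR-style counts.
- Pseudocounts are spread according to the background frequencies (`pseudocount=0.8` is an arbitrary illustrative value).
- Log-odds are computed in base 2.
- The sequence-derived background is Laplace-smoothed so that no base has zero frequency.
- Scores are min-max normalized between the worst and best possible site.
- The empirical p-value uses a `(hits + 1) / (n + 1)` correction.

All function names are hypothetical.

```python
import math
import random

BASES = "ACGT"


def pwm_to_pssm(counts, pseudocount=0.8, background=None):
    """Convert a count matrix {base: [count per position]} into a log2-odds PSSM."""
    background = background or {b: 0.25 for b in BASES}
    pssm = []
    for i in range(len(counts["A"])):
        total = sum(counts[b][i] for b in BASES)
        column = {}
        for b in BASES:
            # Pseudocount distributed by background frequency keeps every probability > 0.
            prob = (counts[b][i] + pseudocount * background[b]) / (total + pseudocount)
            column[b] = math.log2(prob / background[b])
        pssm.append(column)
    return pssm


def score(pssm, site):
    """Motif score: sum of per-position log-odds ratios."""
    return sum(col[b] for col, b in zip(pssm, site))


def normalized_score(pssm, site):
    """Min-max normalization: 0 for the worst possible site, 1 for the best."""
    lo = sum(min(col.values()) for col in pssm)
    hi = sum(max(col.values()) for col in pssm)
    return (score(pssm, site) - lo) / (hi - lo)


def background_from(sequence):
    """Sequence-dependent background with +1 smoothing (an assumption) to avoid zeros."""
    seq = sequence.upper()
    total = sum(seq.count(b) for b in BASES)
    return {b: (seq.count(b) + 1) / (total + 4) for b in BASES}


def empirical_pvalue(pssm, site, background, n_random=10000, seed=0):
    """Fraction of random background sites scoring at least as high as `site`."""
    rng = random.Random(seed)
    observed = score(pssm, site)
    weights = [background[b] for b in BASES]
    hits = sum(
        score(pssm, rng.choices(BASES, weights=weights, k=len(pssm))) >= observed
        for _ in range(n_random)
    )
    return (hits + 1) / (n_random + 1)
```

With a matrix built from ten `ACG` sites, `ACG` gets a normalized score of 1. Its empirical p-value under a uniform background is close to 1/64. The worst possible site scores 0 and gets a p-value of 1, because every random site scores at least as high.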
If you encounter a problem, please send an email to [email protected] or [email protected], or open an issue at https://github.com/Jumitti/TFinder/issues.
Dr. Minniti Julien PhD and Software Developer
Minniti Pauline Graphic Designer
Dr. Duplan Eric Research Engineer
Dr. Alves da Costa Cristine Research Director
https://github.com/Jumitti/TFinder/blob/main/LICENSE