The 21st century has seen a tremendous increase in the amount of biological data
This has been due to rapid advances in DNA sequencing and other technologies
Developments in scientific research have been accompanied by improvements in computing, enabling scientists to interpret complex biological data using bioinformatics applications
Bioinformatics is an interdisciplinary field that develops methods and software to help further our understanding of life by making sense of this data
Although many new bioinformatics applications are at the forefront of applied computing, most scientific research uses standard tools and databases
Data related to gene sequence, protein structure, gene expression or metabolites is curated, annotated and stored in databases and data resources such as GenBank, NCBI, EBI and PDB
A range of open source software tools is available to query this data
Sequence similarity
If a scientist has an unknown DNA sequence, they can determine if it codes for a gene
A BLAST (Basic Local Alignment Search Tool) search can compare the unknown DNA sequence to all known gene sequences in a particular database
BLAST finds regions of similarity between sequences
The search returns ‘hits’ which are the sequences most related to the search sequence (depending on the parameters set)
There are many variations of BLAST that can be used for different analyses such as protein sequences or comparing multiple input sequences at once
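The idea of finding regions of similarity and ranking 'hits' can be sketched in a few lines of Python. This is not how BLAST itself is implemented (BLAST uses a faster heuristic); the sketch uses the classic Smith-Waterman local alignment score instead. The scoring values, the function names and the toy database below are all assumptions made for illustration

```python
# Illustrative scoring parameters (assumed values, not BLAST defaults).
MATCH, MISMATCH, GAP = 2, -1, -2

# Made-up toy "database" of named sequences, for illustration only.
DATABASE = {
    "geneA": "TTTACGTACGTTT",
    "geneB": "GGGGGGGG",
    "geneC": "ACGTTT",
}


def local_alignment_score(query, subject):
    """Return the best Smith-Waterman local alignment score of two sequences."""
    rows, cols = len(query) + 1, len(subject) + 1
    # Score matrix; the first row and column stay at zero.
    scores = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = MATCH if query[i - 1] == subject[j - 1] else MISMATCH
            scores[i][j] = max(
                0,                            # start a new local alignment
                scores[i - 1][j - 1] + pair,  # extend with a match or mismatch
                scores[i - 1][j] + GAP,       # gap in the subject
                scores[i][j - 1] + GAP,       # gap in the query
            )
            best = max(best, scores[i][j])
    return best


def rank_hits(query, database):
    """Score the query against every database sequence, best hit first."""
    hits = [(name, local_alignment_score(query, seq)) for name, seq in database.items()]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_hits("ACGTACG", DATABASE):
        print(name, score)
```

Changing the scoring parameters changes which hits rank highest, which mirrors the note above that the hits returned depend on the parameters set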
Genetic variation and evolutionary relationships
Scientists can compare homologous gene sequences between many organisms
Sequences are compared using an alignment tool such as Clustal W (there are many alternatives)
This aligns (stacks) the sequences based on similar regions so that variable regions can be identified
This determines the degree of similarity between organisms which gives an indication of how closely related the organisms are
There may be a common ancestral origin but in some organisms, the gene might have accumulated differences over time through random mutations
Tree-like evolutionary diagrams (phylogenetic trees) can be constructed with software such as PhyloWin to show the degree of relatedness to a recent common ancestor
Phylogenetic analysis is useful for biological classification, conservation studies, forensics or molecular epidemiology which can help dictate public health policy
Variants of highly infectious pathogens such as SARS-CoV-2 (a well-known coronavirus) can be identified using these techniques
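Once sequences have been aligned, the variable regions and the degree of similarity can be calculated directly from the alignment. The Python sketch below assumes the alignment has already been produced by a tool such as Clustal W; the species names and sequences are made up, and percent identity is only a crude stand-in for the substitution models that real phylogenetic software uses

```python
from itertools import combinations

# Made-up, pre-aligned sequences of equal length ("-" marks an alignment gap).
ALIGNMENT = {
    "species_1": "ATGCCGTA-GCT",
    "species_2": "ATGCCGTATGCA",
    "species_3": "ATGACGTTTGAT",
}


def variable_columns(alignment):
    """Return 0-based column indices where the aligned sequences differ.

    A gap counts as a difference in this sketch.
    """
    columns = zip(*alignment.values())
    return [i for i, column in enumerate(columns) if len(set(column)) > 1]


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical positions, ignoring columns that contain a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        matches += a == b
    return 100.0 * matches / compared if compared else 0.0


def closest_pair(alignment):
    """Return the pair of names with the highest percent identity."""
    identities = {
        (a, b): percent_identity(alignment[a], alignment[b])
        for a, b in combinations(alignment, 2)
    }
    return max(identities, key=identities.get)


if __name__ == "__main__":
    print("Variable columns:", variable_columns(ALIGNMENT))
    print("Most similar pair:", closest_pair(ALIGNMENT))
```

A pairwise table of this kind is the sort of input that distance-based tree-building methods start from, although dedicated software should be used for any real analysis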
Sequencing DNA to determine protein sequences
The genetic code can be used to determine the amino acid sequence within a protein
This primary structure information can be used to predict how proteins will fold into their tertiary structure
This gives a greater level of understanding of how a protein functions or interacts with other proteins or molecules
Such information can be used for a range of applications, such as drug design or novel protein engineering in synthetic biology
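Using the genetic code to read off an amino acid sequence can be shown with a short Python sketch. It assumes the standard genetic code, a coding-strand sequence with no introns, and a reading frame that starts at the first base; the function name and the `to_stop` option are illustrative choices rather than part of any particular tool

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, to_stop=True):
    """Translate a coding DNA sequence into one-letter amino acid codes.

    Reading starts at the first base. Codons with unrecognised characters
    become "X". If to_stop is True, translation ends at the first stop codon.
    """
    dna = dna.upper().replace("U", "T")  # also accept an mRNA sequence
    protein = []
    usable_length = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    for i in range(0, usable_length, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")
        if amino_acid == "*" and to_stop:
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCATTGTAATGGGCCGCTGA"))
```

The resulting string is the primary structure; predicting how it folds into a tertiary structure requires specialised software and is not attempted here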
Bioinformatics allows for large amounts of biological data to be available instantly to researchers across the globe