Information retrieval from databases - search concepts, Tools for searching, homology searching, finding Domain and Functional site homologies

Information Retrieval from Databases

1. Introduction

Information retrieval in bioinformatics is the process of extracting relevant biological data (DNA, RNA, or protein sequences, structures, or functional information) from databases.
Aim: To identify sequences, functions, or structural features for analysis, comparison, and annotation.
Databases can be primary (experimental sequence or structure data as submitted, e.g., GenBank, PDB) or secondary/derived (curated or computationally processed data, e.g., Pfam, PROSITE).
2. Search Concepts in Biological Databases

2.1 Types of Searches

Exact Match Search

Returns results only if the query exactly matches database entries.
Useful for known accession numbers or IDs.

Pattern/Keyword Search

Searches based on specific motifs, keywords, or annotations.
Example: “kinase domain,” “signal peptide.”

Similarity/Homology Search
Detects sequences similar to the query based on sequence alignment.
Uses scoring matrices to assess similarity (e.g., BLOSUM, PAM).
Useful for identifying homologous genes or proteins.
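
To make the role of a scoring matrix concrete, here is a minimal sketch that scores two ungapped sequences position by position. The score table is a toy assumption with illustrative values only; it is not BLOSUM or PAM, which define a score for every residue pair.

```python
# Toy substitution scores (illustrative values only, NOT a published matrix).
TOY_SCORES = {
    ("L", "I"): 2,  # similar hydrophobic residues
    ("K", "R"): 2,  # similar basic residues
    ("D", "E"): 2,  # similar acidic residues
}
MATCH = 4      # assumed score for identical residues
MISMATCH = -1  # assumed default for any pair not listed above


def pair_score(a: str, b: str) -> int:
    """Score one aligned residue pair using the toy table (symmetric lookup)."""
    if a == b:
        return MATCH
    return TOY_SCORES.get((a, b), TOY_SCORES.get((b, a), MISMATCH))


def ungapped_score(seq1: str, seq2: str) -> int:
    """Sum the pair scores over two equal-length sequences without gaps."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have equal length")
    return sum(pair_score(a, b) for a, b in zip(seq1, seq2))


conservative = ungapped_score("LKD", "IRE")  # every position is a similar pair
identical = ungapped_score("LKD", "LKD")     # every position is identical
```

Conservative substitutions still earn a positive score, which is why matrix-based searches can find related sequences that share few identical residues.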


Complex Query Search

Combines Boolean operators (AND, OR, NOT) to refine results.
Example: “kinase AND human NOT viral.”
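
The Boolean query above can be sketched as a simple filter over annotation text. The record IDs and annotations below are hypothetical, and real database query engines use indexed fields rather than plain word matching.

```python
def matches(annotation: str, require=(), exclude=()) -> bool:
    """True if every `require` term and no `exclude` term occurs as a word."""
    words = set(annotation.lower().split())
    has_all = all(term.lower() in words for term in require)
    has_excluded = any(term.lower() in words for term in exclude)
    return has_all and not has_excluded


# Hypothetical records: ID -> free-text annotation.
records = {
    "rec1": "human tyrosine kinase",
    "rec2": "viral kinase human host",
    "rec3": "mouse kinase",
}

# Equivalent of: kinase AND human NOT viral
hits = [
    rec_id
    for rec_id, text in records.items()
    if matches(text, require=("kinase", "human"), exclude=("viral",))
]
```

Only rec1 passes: rec2 contains the excluded term and rec3 lacks "human".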


2.2 Search Parameters

Query sequence or keyword
Database selection (nucleotide, protein, structural, functional)
Algorithm choice (BLAST, FASTA, PSI-BLAST)
Threshold or cut-off (E-value, score, % identity)
Filters (organism, date, length, sequence type)
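
As a rough illustration of how thresholds narrow a result list, the sketch below keeps only hits that pass an E-value cut-off and a minimum % identity. The Hit fields, the default thresholds, and the example hits are all assumptions made for illustration; they are not the output format of any particular tool.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    subject_id: str
    evalue: float
    bit_score: float
    percent_identity: float


def filter_hits(hits, max_evalue=1e-5, min_identity=30.0):
    """Keep hits passing both thresholds, most significant (lowest E-value) first."""
    kept = [
        h for h in hits
        if h.evalue <= max_evalue and h.percent_identity >= min_identity
    ]
    return sorted(kept, key=lambda h: h.evalue)


# Hypothetical hits for demonstration only.
example_hits = [
    Hit("P1", 1e-50, 250.0, 85.0),
    Hit("P2", 0.5, 30.0, 40.0),    # fails the E-value cut-off
    Hit("P3", 1e-8, 60.0, 25.0),   # fails the identity cut-off
    Hit("P4", 1e-12, 75.0, 45.0),
]
selected = [h.subject_id for h in filter_hits(example_hits)]
```

Suitable cut-offs depend on the database size and the question being asked, so the defaults here should be treated as placeholders.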


3. Tools for Searching Biological Databases


3.1 Nucleotide Sequence Databases


GenBank (NCBI)
ENA (European Nucleotide Archive, maintained by EMBL-EBI)
DDBJ (DNA Data Bank of Japan)
Search Tools:
BLASTN – nucleotide vs nucleotide
FASTA – similarity search for nucleotide (or protein) sequences


3.2 Protein Sequence Databases

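UniProtKB/Swiss-Prot – manually reviewed (curated) protein sequences
UniProtKB/TrEMBL – automatically annotated, unreviewed protein sequences
PIR (Protein Information Resource) – protein sequence resource and member of the UniProt consortium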
Search Tools:
BLASTP – protein vs protein
PSI-BLAST – iterative search for distant homologs
HMMER – profile-based search using hidden Markov models

3.3 Structural Databases

Protein Data Bank (PDB) – 3D protein structures
SCOP / CATH – structural classification of proteins
Search Tools:
3D-BLAST – structure database search (structures are encoded as structural-alphabet sequences and compared with BLAST)
DALI – structural alignment


3.4 Specialized Databases
Pfam – protein families
PROSITE – protein motifs
InterPro – integrated database of protein domains


4. Homology Searching

Homology searching identifies evolutionarily related sequences based on similarity.


4.1 Concept

Homologous sequences: share a common ancestor.
Types:

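Orthologs – homologs that diverged through a speciation event (typically the corresponding gene in different species)
Paralogs – homologs that arose through a gene duplication event (often found within the same genome)
Homology suggests similar structure and, often, similar function, but it does not guarantee either.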


4.2 Methods

1. Pairwise Sequence Alignment

Tools: BLAST, FASTA
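Reports similarity (alignment score, % identity) and statistical significance (E-value: the number of hits with at least this score expected by chance; lower values are more significant)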
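
The two quantities named above can be sketched in a few lines. Percent identity is computed here over columns where neither sequence has a gap, which is one of several conventions in use (an assumption; some tools divide by the full alignment length). The E-value uses the standard relation E = m × n × 2^(−S'), where S' is the bit score, m the query length, and n the database length; real tools use corrected "effective" lengths, which this sketch ignores.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical positions as a percentage of columns with no gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


def evalue_from_bit_score(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits: E = m * n * 2^(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


identity = percent_identity("ACDEF", "AC-EY")       # 3 identical of 4 ungapped columns
expected = evalue_from_bit_score(10.0, 100, 1024)   # 100 * 1024 / 2**10
```

Because the E-value scales with database size, the same alignment is less significant when found in a larger database.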


2. Multiple Sequence Alignment (MSA)
Tools: Clustal Omega, MUSCLE
Identifies conserved residues and motifs
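
Once an MSA exists, finding strictly conserved residues is a column-by-column check. The sketch below assumes the alignment is already computed (for example by Clustal Omega or MUSCLE) and given as equal-length strings with '-' for gaps; the sequences shown are made up.

```python
def conserved_columns(alignment):
    """Return 0-based indices of columns where all sequences share one non-gap residue."""
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for i in range(width):
        column = {seq[i] for seq in alignment}
        if len(column) == 1 and "-" not in column:
            conserved.append(i)
    return conserved


# Hypothetical aligned sequences.
msa = ["MKT-LG", "MRTALG", "MKSALG"]
columns = conserved_columns(msa)
```

In practice, conservation is usually scored on a scale (for example by entropy) rather than as a strict yes/no, so this should be read as the simplest possible version.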


3. Profile-based Searching

Uses position-specific scoring matrices (PSSMs) or profile hidden Markov models (HMMs) built from aligned homologs
Tools: PSI-BLAST (PSSM), HMMER (profile HMM)
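
A PSSM can be sketched as per-column log-odds scores derived from an alignment. The version below is a simplified illustration that assumes ungapped sequences, a uniform background frequency, and a fixed pseudocount; PSI-BLAST uses sequence weighting and more elaborate pseudocounts, and HMMER additionally models insertions and deletions.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_seqs, alphabet=AMINO_ACIDS, pseudocount=1.0):
    """Build a log2-odds PSSM (one dict per column) from ungapped aligned sequences."""
    width = len(aligned_seqs[0])
    if any(len(seq) != width for seq in aligned_seqs):
        raise ValueError("all sequences must have the same length")
    background = 1.0 / len(alphabet)  # assumed uniform background frequency
    total = len(aligned_seqs) + pseudocount * len(alphabet)
    pssm = []
    for i in range(width):
        counts = Counter(seq[i] for seq in aligned_seqs)
        column = {}
        for residue in alphabet:
            freq = (counts[residue] + pseudocount) / total
            column[residue] = math.log2(freq / background)
        pssm.append(column)
    return pssm


def score_sequence(pssm, seq):
    """Sum the column scores for a sequence of the same length as the PSSM."""
    if len(seq) != len(pssm):
        raise ValueError("sequence length must match PSSM width")
    return sum(column[residue] for column, residue in zip(pssm, seq))


# Hypothetical three-sequence alignment.
profile = build_pssm(["MKL", "MKV", "MRL"])
family_like = score_sequence(profile, "MKL")
unrelated = score_sequence(profile, "WWW")
```

Residues seen often in a column score above zero and unseen residues score below zero, so a sequence resembling the family scores higher than an unrelated one.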

4. Structural Homology
Comparing 3D structures for similarity
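Tools: DALI (structural alignment); CATH and SCOP (classification databases used to place a structure in a known fold or superfamily)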


5. Finding Domain and Functional Site Homologies


5.1 Protein Domains

Definition: A conserved region of a protein, usually able to fold independently, with a specific structure or function.
Examples: kinase domain, zinc finger, SH2 domain.
Domains often determine protein function.


5.2 Domain Databases and Tools

Pfam – HMM-based domain identification
SMART – domains in signaling and extracellular proteins
InterPro – integrates multiple domain databases
PROSITE – motifs and functional sites

5.3 Functional Site Prediction

Active sites, binding sites, or motifs are predicted based on:
Conserved residues across homologs
3D structure information
Known motifs (PROSITE patterns)
Tools:
ScanProsite – motif scanning
MotifScan – identifies functional motifs
CDD (Conserved Domain Database) – identifies domains and key residues
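
Motif scanning with a PROSITE-style pattern can be approximated by translating the pattern into a regular expression. The sketch below handles the common syntax elements (x, [..], {..}, repeat counts, and the < and > anchors) and is an assumption-laden simplification, not a reimplementation of ScanProsite. The example pattern N-{P}-[ST]-{P} is the PROSITE N-glycosylation site pattern; the test sequence is made up.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):   # N-terminal anchor
            prefix, element = "^", element[1:]
        if element.endswith(">"):     # C-terminal anchor
            suffix, element = "$", element[:-1]
        # Split the residue specification from an optional (n) or (n,m) repeat.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":                                      # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):    # excluded residues
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_motifs(pattern: str, sequence: str):
    """Return (0-based start, matched text) for every match, including overlaps."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]


sites = find_motifs("N-{P}-[ST]-{P}.", "MNGTAANPSA")
```

Short patterns like this one match by chance quite often, so a hit is a candidate site to check against conservation or structure rather than proof of function.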


5.4 Steps to Identify Domain/Functional Homology
Input protein sequence
Perform sequence similarity search (BLASTP/PSI-BLAST)
Check conserved domains (Pfam, SMART, InterPro)
Predict functional motifs (PROSITE, ScanProsite)
Validate with structure-based tools if available

6. Summary / Workflow for Information Retrieval
1. Define the query (sequence, accession, or keyword)
2. Select the appropriate database (nucleotide, protein, structural)
3. Choose the search algorithm (BLAST, FASTA, HMMER)
4. Adjust parameters (E-value, filters)
5. Analyze results:
          Sequence similarity
          Homology inference
          Domain identification
          Functional site prediction
6. Validate and annotate sequences
7. Optional: Structural or evolutionary analysis
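
For the first step of the workflow, a sequence query is usually supplied in FASTA format. The sketch below is a minimal parser that assumes well-formed input and uses the first word of each header line as the record ID; the example records are made up.

```python
def parse_fasta(text: str) -> dict:
    """Parse FASTA text into {record_id: sequence}, joining wrapped sequence lines."""
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            fields = line[1:].split()
            header = fields[0] if fields else ""
            chunks = []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Hypothetical two-record FASTA input.
fasta_text = ">seq1 demo protein\nMKT\nLLG\n>seq2\nacde\n"
queries = parse_fasta(fasta_text)
```

The parsed sequences can then be passed to whichever search tool was chosen in step 3.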


7. Key Points

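Homology searches are generally more reliable than keyword searches for function prediction, because keyword hits depend on how each entry was annotated.
Profile-based methods (PSI-BLAST, which iterates its search, and HMMER) can detect distant homologs that pairwise searches miss.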
Domain and motif identification is essential for functional annotation.
Integrating sequence, domain, and structure information gives robust predictions.
