Bsoft 2.1.4
Bernard's software package
Header file for sequence analysis functions.
#include "rwmolecule.h"
#include "rwresprop.h"
#include "Complex.h"
#include "Matrix.h"
#include "utilities.h"
Functions
long | seq_limit (Bmolgroup *molgroup, Bstring &refseq)
Limits the selection to the reference sequence in an aligned set.
Matrix | seq_aligned_identity (Bmolgroup *molgroup)
Calculates the pairwise identities between aligned sequences.
Matrix | seq_aligned_similarity (Bmolgroup *molgroup, double threshold, Bresidue_matrix *simat)
Calculates the pairwise similarities between aligned sequences.
long | seq_select (Bmolgroup *molgroup, long minlen, long maxlen)
Selects sequences within a range of lengths.
long | seq_select (Bmolgroup *molgroup, Matrix mat, long ref, double cutoff)
Selects sequences based on a comparison matrix of aligned sequences.
long | seq_delete (Bmolgroup *molgroup, Matrix mat)
Deletes non-selected sequences and corresponding elements of a comparison matrix.
string | seq_aligned_profile (Bmolgroup *molgroup)
Generates a PROSITE format profile from an aligned set of sequences.
int | seq_aligned_information (Bmolgroup *molgroup, int window, Bstring &psfile)
Calculates the sequence logo representation for an alignment.
int | seq_aligned_hydrophobicity (Bmolgroup *molgroup, int window, double threshold, Bstring &hphobfile, Bstring &psfile)
Calculates the average hydrophobicity at every position in an alignment.
vector< Complex< float > > | seq_frequency_analysis (long win, long start, long end, vector< double > &data)
Fourier transforms a vector for frequency analysis.
Matrix | seq_correlated_mutation (Bmolgroup *molgroup, Bstring &refseqid, double cutoff, Bstring &simfile)
Correlated mutation analysis of an alignment.
int seq_aligned_hydrophobicity (Bmolgroup *molgroup, int window, double threshold, Bstring &hphobfile, Bstring &psfile)
Calculates the average hydrophobicity at every position in an alignment.
*molgroup | the set of sequences. |
window | moving average window. |
threshold | fraction of sequences with a residue in a position. |
&hphobfile | parameter file. |
&psfile | postscript output file. |
The default hydrophobicity scale is the GES scale.
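The moving-average step can be sketched as follows. This is only an illustration, assuming a centred window that is truncated at both ends of the alignment; `moving_average` is a hypothetical helper, not a Bsoft function, and Bsoft may treat the edges or even window sizes differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper (not part of Bsoft): centred moving average.
// Assumption: window/2 positions are taken on each side of position i and
// the window is truncated where it runs past either end of the data.
std::vector<double> moving_average(const std::vector<double>& v, long window)
{
    assert(window >= 1);
    const long n = static_cast<long>(v.size());
    const long half = window / 2;
    std::vector<double> avg(v.size(), 0.0);
    for (long i = 0; i < n; ++i) {
        const long lo = std::max(0L, i - half);
        const long hi = std::min(n - 1, i + half);
        double sum = 0.0;
        for (long j = lo; j <= hi; ++j) sum += v[j];
        avg[i] = sum / static_cast<double>(hi - lo + 1);
    }
    return avg;
}
```

In this sketch the input vector would hold one hydrophobicity value per alignment position, already averaged over the sequences that have a residue there.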
Matrix seq_aligned_identity (Bmolgroup *molgroup)
Calculates the pairwise identities between aligned sequences.
*molgroup | the set of sequences. |
The identity between two sequences is defined as:

$$ \text{identity} = \frac{\text{number of identical residues}}{\text{overlap}} $$

where the overlap is the number of positions with residues in both sequences.
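The definition above can be illustrated for a single pair of sequences. This is a minimal sketch, assuming the aligned sequences are plain equal-length strings with `-` marking gaps; `aligned_identity` is a hypothetical helper and does not use the Bmolgroup or Matrix types.

```cpp
#include <cassert>
#include <cmath>
#include <string>

// Hypothetical helper (not part of Bsoft): fractional identity of two
// aligned sequences given as equal-length strings with '-' for gaps.
double aligned_identity(const std::string& a, const std::string& b)
{
    assert(a.size() == b.size());
    long identical = 0, overlap = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] == '-' || b[i] == '-') continue;  // not part of the overlap
        ++overlap;
        if (a[i] == b[i]) ++identical;
    }
    // Assumption: pairs without any overlap are reported as 0.
    return overlap ? static_cast<double>(identical) / overlap : 0.0;
}
```

For `ACDEF-` against `ACNE-G` the overlap is 4 positions, 3 of which are identical, so the identity is 0.75.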
int seq_aligned_information (Bmolgroup *molgroup, int window, Bstring &psfile)
Calculates the sequence logo representation for an alignment.
*molgroup | the set of sequences. |
window | window for calculating the moving average. |
&psfile | the postscript file name. |
The information content of each position in an alignment is calculated as:

$$ \text{information} = \log_2(n) - \sum_i p_i \log_2(p_i), \qquad p_i = \frac{f_i}{\sum_j f_j} $$

where $f_i$ is the frequency of residue type $i$ at this position, and $n = \sum_j f_j$ if $\sum_j f_j < 20$, otherwise $n = 20$.

A moving average of the information is calculated over a given window to smooth the resultant data. The sequence logo representation for the occurrence of every residue type at every position is generated and written into a postscript file.
string seq_aligned_profile (Bmolgroup *molgroup)
Generates a PROSITE format profile from an aligned set of sequences.
*molgroup | the set of sequences. |
At each position in the alignment, the number of distinct residue types is counted. If more than 3 residue types are represented at a position, or there is a gap, the position is designated as variable by an "x". The final profile contains 1-3 residue type possibilities for highly conserved positions, interspersed with variable-length gaps.
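The per-column rule described above can be sketched as follows. This is an illustration only, assuming equal-length strings with `-` for gaps; `column_profile` is a hypothetical helper, and its output (one element per column, alternatives in square brackets) is a simplification that does not merge runs of "x" and is not claimed to match the exact text produced by seq_aligned_profile.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper (not part of Bsoft): one pattern element per column.
// Assumption: sequences are equal-length strings with '-' marking gaps.
std::string column_profile(const std::vector<std::string>& aln)
{
    std::string profile;
    if (aln.empty()) return profile;
    const std::size_t len = aln[0].size();
    for (std::size_t i = 0; i < len; ++i) {
        std::set<char> types;   // distinct residue types in this column
        bool gap = false;
        for (const std::string& seq : aln) {
            assert(seq.size() == len);
            if (seq[i] == '-') gap = true;
            else types.insert(seq[i]);
        }
        if (gap || types.size() > 3) {
            profile += 'x';                      // variable position
        } else if (types.size() == 1) {
            profile += *types.begin();           // fully conserved
        } else {
            profile += '[';                      // 2 or 3 alternatives
            profile.append(types.begin(), types.end());
            profile += ']';
        }
    }
    return profile;
}
```

For the four sequences `ACDK`, `ACE-`, `AGFK` and `ACGK`, the sketch yields `A[CG]xx`: the third column has four residue types and the fourth contains a gap.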
Matrix seq_aligned_similarity (Bmolgroup *molgroup, double threshold, Bresidue_matrix *simat)
Calculates the pairwise similarities between aligned sequences.
*molgroup | the set of sequences. |
threshold | threshold to accept residues as similar. |
*simat | residue similarity matrix. |
The similarity between two sequences is defined as:

$$ \text{similarity} = \frac{\sum (\text{residue similarity})}{\text{overlap}} $$

$$ \text{fraction similarity} = \frac{\text{number of residues with similarity} > \text{threshold}}{\text{overlap}} $$

where the overlap is the number of positions with residues in both sequences. The residue similarity is taken from a residue substitution matrix. The default substitution matrix is BLOSUM62.
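Both quantities can be computed in one pass over a pair of aligned sequences. The sketch below assumes equal-length strings with `-` for gaps and takes the residue similarity from a caller-supplied `score` function, which stands in for a substitution-matrix lookup. `Similarity` and `aligned_similarity` are hypothetical names, and the source does not state which of the two values is stored in the returned Matrix.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <string>

// Hypothetical result type (not part of Bsoft).
struct Similarity {
    double average;    // sum(residue similarity) / overlap
    double fraction;   // count(similarity > threshold) / overlap
};

// Hypothetical helper (not part of Bsoft). `score` stands in for a lookup
// in a residue substitution matrix such as BLOSUM62.
Similarity aligned_similarity(const std::string& a, const std::string& b,
                              double threshold,
                              const std::function<double(char, char)>& score)
{
    assert(a.size() == b.size());
    double sum = 0.0;
    long similar = 0, overlap = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] == '-' || b[i] == '-') continue;  // outside the overlap
        const double s = score(a[i], b[i]);
        sum += s;
        if (s > threshold) ++similar;
        ++overlap;
    }
    if (overlap == 0) return {0.0, 0.0};   // assumption: no overlap gives 0
    return {sum / overlap, static_cast<double>(similar) / overlap};
}
```

Passing the scoring function as a parameter keeps the sketch independent of any particular matrix file format.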
Matrix seq_correlated_mutation (Bmolgroup *molgroup, Bstring &refseqid, double cutoff, Bstring &simfile)
Correlated mutation analysis of an alignment.
*molgroup | the set of aligned sequences. |
&refseqid | reference sequence to report on. |
cutoff | cutoff for reporting correlated mutations. |
&simfile | similarity matrix file. |
Reference: Gobel, Sander & Schneider (1994) Proteins 18, 309-317.

Mutation (residue variation) correlation is defined as:

$$ r(i,j) = \frac{1}{m^2\, o(i)\, o(j)} \sum_{k,l} w(k,l)\, \bigl(s(i,k,l) - \langle s(i) \rangle\bigr) \bigl(s(j,k,l) - \langle s(j) \rangle\bigr) $$

where:
m: number of sequences
o(i): standard deviation of similarities at alignment position i
w(k,l): weight for sequences k and l (1 - fractional identity: see function seq_aligned_identity)
s(i,k,l): similarity for alignment position i between sequences k and l
<s(i)>: average similarity at alignment position i

Individual high-scoring correlations (using the given cutoff value) are reported as follows:

    Res1 Num1 Res2 Num2 Total Corr
    T 9 I 17 210 0.631
    TAIIIVVVIVVVIVIIIIIII
    IILLLLLLLLLLLLLLLLLLL

The first 4 values give the type and alignment position of the correlating residues. The total is the number of comparisons made: maximally m*(m-1)/2. The last number is the correlation coefficient. The following two lines give the corresponding residues at the two alignment positions for all the sequences, allowing the user to see on what basis this is a high correlation.
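A small sketch may help in reading the correlation formula. It rests on several assumptions that the source does not confirm: the sum runs over sequence pairs with k < l, the mean and standard deviation are unweighted over those pairs, gaps are treated like any other character, and the residue similarity is a toy 0/1 score rather than a substitution matrix. All function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// All helpers here are hypothetical and not part of Bsoft.
// Toy residue similarity: 1 for identical residues, 0 otherwise.
static double toy_similarity(char a, char b) { return a == b ? 1.0 : 0.0; }

// Weight w(k,l) = 1 - fractional identity (gaps are not treated specially).
static double pair_weight(const std::string& a, const std::string& b)
{
    long same = 0;
    for (std::size_t i = 0; i < a.size(); ++i) if (a[i] == b[i]) ++same;
    return 1.0 - static_cast<double>(same) / a.size();
}

// r(i,j) for two alignment columns, summing over sequence pairs k < l.
double correlated_mutation(const std::vector<std::string>& aln,
                           std::size_t i, std::size_t j)
{
    const std::size_t m = aln.size();
    assert(m > 1 && i < aln[0].size() && j < aln[0].size());
    std::vector<double> si, sj, w;   // one entry per sequence pair
    for (std::size_t k = 0; k < m; ++k)
        for (std::size_t l = k + 1; l < m; ++l) {
            si.push_back(toy_similarity(aln[k][i], aln[l][i]));
            sj.push_back(toy_similarity(aln[k][j], aln[l][j]));
            w.push_back(pair_weight(aln[k], aln[l]));
        }
    const double n = static_cast<double>(si.size());
    double mi = 0.0, mj = 0.0;       // <s(i)> and <s(j)>
    for (std::size_t p = 0; p < si.size(); ++p) { mi += si[p]; mj += sj[p]; }
    mi /= n; mj /= n;
    double vi = 0.0, vj = 0.0, sum = 0.0;
    for (std::size_t p = 0; p < si.size(); ++p) {
        vi += (si[p] - mi) * (si[p] - mi);
        vj += (sj[p] - mj) * (sj[p] - mj);
        sum += w[p] * (si[p] - mi) * (sj[p] - mj);
    }
    const double oi = std::sqrt(vi / n), oj = std::sqrt(vj / n);
    if (oi == 0.0 || oj == 0.0) return 0.0;  // invariant column: report 0
    return sum / (static_cast<double>(m) * m * oi * oj);
}
```

With the four sequences `AAC`, `AAD`, `GGE` and `GGF`, columns 0 and 1 vary together and give a positive value, while column 2 has no identical pair and therefore a zero standard deviation, which the sketch reports as 0.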
long seq_delete (Bmolgroup *molgroup, Matrix mat)
Deletes non-selected sequences and corresponding elements of a comparison matrix.
*molgroup | the set of sequences. |
mat | comparison matrix. |
vector< Complex< float > > seq_frequency_analysis (long win, long start, long end, vector< double > &data)
Fourier transforms a vector for frequency analysis.
win | window size. |
start | start within window. |
end | end within window. |
&data | sequence. |
A brute force Fourier transform is done.
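A brute force transform evaluates every output frequency as a direct sum over all input samples, at O(N^2) cost. The sketch below shows that idea on a whole vector using `std::complex<double>`. It is not the Bsoft implementation: `naive_dft` is a hypothetical helper, it ignores the `win`, `start` and `end` parameters, and it does not use Bsoft's `Complex< float >` type.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

// Hypothetical helper (not part of Bsoft): direct O(N^2) evaluation of
// X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N).
std::vector<std::complex<double>> naive_dft(const std::vector<double>& x)
{
    const std::size_t N = x.size();
    const double two_pi = 2.0 * std::acos(-1.0);
    std::vector<std::complex<double>> X(N);
    for (std::size_t k = 0; k < N; ++k) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t n = 0; n < N; ++n) {
            const double angle = -two_pi * static_cast<double>(k * n)
                               / static_cast<double>(N);
            sum += x[n] * std::complex<double>(std::cos(angle), std::sin(angle));
        }
        X[k] = sum;
    }
    return X;
}
```

For short windows, as are typical along a sequence, the quadratic cost is usually negligible, which may be why a direct sum is sufficient here.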
long seq_limit (Bmolgroup *molgroup, Bstring &refseq)
Limits the selection to the reference sequence in an aligned set.
*molgroup | the set of sequences. |
&refseq | reference sequence identifier. |
long seq_select (Bmolgroup *molgroup, long minlen, long maxlen)
Selects sequences within a range of lengths.
*molgroup | the set of sequences. |
minlen | minimum length. |
maxlen | maximum length. |
long seq_select (Bmolgroup *molgroup, Matrix mat, long ref, double cutoff)
Selects sequences based on a comparison matrix of aligned sequences.
*molgroup | the set of sequences. |
mat | comparison matrix. |
ref | reference sequence number (starting at 1). |
cutoff | threshold for selecting sequences. |