Consensus and Profile

AUTHOR

L. Grondin

Finding a Most Likely Common Ancestor

http://rosalind.info/problems/cons/

Sample input

    ATCCAGCT
    GGGCAACT
    ATGGATCT
    AAGCAACC
    TTGGAACT
    ATGCCATT
    ATGGCACT

Sample output

    ATGCAACT
    A: 5 1 0 0 5 5 0 0
    C: 0 0 1 4 2 0 6 1
    G: 1 1 6 3 0 1 0 0
    T: 1 5 0 0 0 1 1 6
use v6;



my @default-data = qw{
    ATCCAGCT
    GGGCAACT
    ATGGATCT
    AAGCAACC
    TTGGAACT
    ATGCCATT
    ATGGCACT
};

# All strings have the same length, so any element gives the column count.
my \N = @default-data.pick.chars;

my %profile;
%profile{$_} = [0 xx N] for <A C G T>;

for @default-data[] {
    my @dna = .comb;
    my %dna-index-map = classify { @dna[$_] }, ^@dna;
    for %dna-index-map.kv -> $k, $v {
        %profile{$k}[$v[]]»++;
    }
}
my @profile = %profile<A C G T>;

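# Build the consensus one column at a time: find the highest count in
# column c, then take the first nucleotide (in A, C, G, T order) that reaches it.
say my $consensus = [~] gather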
for ^N -> \c {
    my $max = max map { @profile[$_][c] }, ^4;
    take <A C G T>[$_] given first { @profile[$_][c] == $max }, ^4;
}

say [~] .key, ': ', @profile[.value] for enum <A C G T>.sort;

# vim: expandtab shiftwidth=4 ft=perl6
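
Alternative sketch

The script above fills the profile row by row with a hyper increment. As a cross-check, the same counts can be gathered column by column with a Bag. This is only a minimal sketch, not the author's method: the names @dna and @columns are hypothetical, and it assumes all strings have equal length and that no column has a tie for the most frequent nucleotide (true for the sample data). On the sample input it should print the same consensus and profile as the Sample output.

```raku
use v6;

# Hypothetical variable names; the data is the sample input from above.
my @dna = <ATCCAGCT GGGCAACT ATGGATCT AAGCAACC TTGGAACT ATGCCATT ATGGCACT>;

# One Bag per column: each Bag counts the nucleotides found at that position.
my @columns = (^@dna[0].chars).map(-> $i { @dna.map(*.substr($i, 1)).Bag });

# Consensus: the nucleotide with the highest count in each column.
# Assumes no ties; tie-breaking is not addressed in this sketch.
say @columns.map(-> $bag { <A C G T>.max({ $bag{$_} }) }).join;

# Profile: one row per nucleotide; a Bag returns 0 for a missing key.
for <A C G T> -> $n {
    say "$n: ", @columns.map(-> $bag { $bag{$n} }).join(' ');
}
```

Counting per column keeps the consensus and the profile derived from the same structure, at the cost of building one Bag per position.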

See Also

afrq-grondilu.pl

Counting Disease Carriers

aspc-grondilu.pl

Introduction to Alternative Splicing

conv-grondilu.pl

Comparing Spectra with the Spectral Convolution

cstr-grondilu.pl

Creating a Character Table from Genetic Strings

ctbl-grondilu.pl

Creating a Character Table

dbpr-grondilu.pl

Introduction to Protein Databases

dna-gerdr.pl

Counting DNA Nucleotides

dna-grondilu.pl

Counting DNA Nucleotides

eubt-grondilu.pl

Enumerating Unrooted Binary Trees

eval-grondilu.pl

Expected Number of Restriction Sites

fib-grondilu.pl

Rabbits and Recurrence Relations

fibd-grondilu.pl

Mortal Fibonacci Rabbits

gc-gerdr.pl

Computing GC Content

grph-grondilu.pl

Overlap Graphs

hamm-grondilu.pl

Counting Point Mutations

iev-grondilu.pl

Calculating Expected Offspring

indc-grondilu.pl

Independent Segregation of Chromosomes

iprb-grondilu.pl

Mendel's First Law

itwv-grondilu.pl

Finding Disjoint Motifs in a Gene

lcsq-grondilu.pl

Finding a Shared Spliced Motif

lia-grondilu.pl

Independent Alleles

lrep-grondilu-p5.pl

mmch-grondilu.pl

Maximum Matchings and RNA Secondary Structures

mprt-grondilu.pl

Finding a Protein Motif

mrna-grondilu.pl

Inferring mRNA from Protein

nwck-grondilu.pl

Distances in Trees

orf-grondilu.pl

Open Reading Frames

pmch-grondilu.pl

Perfect Matchings and RNA Secondary Structures

pper-grondilu.pl

Partial Permutations

prob-grondilu.pl

Introduction to Random Strings

qrt-grondilu.pl

Quartets

README.md

revc-gerdr.pl

Complementing a Strand of DNA

rna-gerdr.pl

Transcribing DNA into RNA

rstr-grondilu.pl

Matching Random Motifs

sexl-grondilu.pl

Sex-Linked Inheritance

sgra-grondilu.pl

Using the Spectrum Graph to Infer Peptides

spec-grondilu.pl

Inferring Protein from Spectrum

sseq-grondilu.pl

Finding a Spliced Motif

subs-grondilu.pl

Finding a Motif in DNA

suff-grondilu.pl

Encoding Suffix Trees

tran-grondilu.pl

Transitions and Transversions

trie-grondilu.pl

Introduction to Pattern Matching
