Bio::SeqFeature::Tools::Unflattener does not convert pseudogene correctly #124
Comments
Relevant feature in the GenBank file is:
Not sure why it has changed. @cmungall, any ideas?
May try running
I'm afraid the exact rationale escapes me at the moment. I assume it had to do with an asymmetry in how pseudogene models were typically encoded in GenBank records at the time. In this particular example, the pseudogene mirrors the structure of a gene precisely, even having a CDS. It looks like the code should be simplified here so that pseudogenes are structured symmetrically to genes, but it's not clear what the consequences of that change would be for other pseudogene records in GenBank. There is also a secondary issue: the pseudogene hierarchy in SO doesn't entirely mirror the gene one (e.g. there is no pseudoCDS).
Thanks for working on this issue. Yes, there is no pseudoCDS in SO so far; there is pseudogenic_exon instead. For the change in BioPerl 1.6.9, the relevant code is:

    # PSEUDOGENES, PSEUDOEXONS AND PSEUDOINTRONS
    # these are indicated with the /pseudo tag
    # these are mapped to a different type; they should NOT
    # be treated as normal genes
    foreach my $sf (@all_seq_features) {
        if ($sf->has_tag('pseudo')) {
            my $type = $sf->primary_tag;
            # SO type is typically the same as the normal
            # type but preceeded by "pseudo"
            if ($type eq 'misc_RNA' || $type eq 'mRNA') {
                # dgg: see TypeMapper; both pseudo mRNA,misc_RNA should be pseudogenic_transcript
                $sf->primary_tag("pseudotranscript");
            }
            else {
                $sf->primary_tag("pseudo$type");
            }
        }
    }

I propose the following:

    foreach my $sf (@all_seq_features) {
        if ($sf->has_tag('pseudo')) {
            my $type = $sf->primary_tag;
            if ($type eq 'gene') {
                $sf->primary_tag("pseudogene");
            } elsif ($type eq 'CDS') {
                $sf->primary_tag("pseudogenic_exon");
            } elsif ($type eq 'mRNA') {
                $sf->primary_tag("pseudogenic_transcript");
            }
            else {
                $sf->primary_tag("pseudogenic_$type");
            }
        }
    }

After this, you can unflatten the structure as and so on.
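To make the difference between the two mappings easier to see, here is a minimal standalone sketch that applies both to a few feature types. It is plain Perl with no BioPerl dependency. The function names `pseudo_type_current` and `pseudo_type_proposed` are made up for this illustration and are not part of Unflattener; the logic is copied from the two snippets above.

```perl
use strict;
use warnings;

# Behaviour of the BioPerl 1.6.9 snippet quoted above: prefix the type
# with "pseudo", except that mRNA and misc_RNA become "pseudotranscript".
# (Hypothetical helper name, for illustration only.)
sub pseudo_type_current {
    my ($type) = @_;
    return 'pseudotranscript' if $type eq 'misc_RNA' || $type eq 'mRNA';
    return "pseudo$type";
}

# Explicit cases of the mapping proposed in this thread.
my %PROPOSED = (
    gene => 'pseudogene',
    CDS  => 'pseudogenic_exon',
    mRNA => 'pseudogenic_transcript',
);

# Proposed behaviour: use the explicit cases, otherwise fall back to
# "pseudogenic_<type>". (Hypothetical helper name, for illustration only.)
sub pseudo_type_proposed {
    my ($type) = @_;
    return $PROPOSED{$type} // "pseudogenic_$type";
}

# Print the original type, the 1.6.9 result and the proposed result.
for my $type (qw(gene mRNA CDS exon)) {
    printf "%-5s %-17s %s\n",
        $type, pseudo_type_current($type), pseudo_type_proposed($type);
}
```

Note that under the proposed mapping a pseudo CDS and a pseudo exon both end up as pseudogenic_exon, and that misc_RNA would fall through to the generic `pseudogenic_` prefix. Whether either is desirable for other GenBank records is part of the open question raised earlier in the thread.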
When I use Bio::SeqFeature::Tools::Unflattener to convert a GenBank flat feature list to a containment hierarchy,
everything is fine except for genes that have the pseudo tag.
I would like to know whether there is any other parameter or method that I can use to convert both genes and pseudogenes correctly.
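For readers unfamiliar with the workflow being described, the following is a rough sketch of a typical Unflattener run. It is not the reporter's script. The helper names `feature_tree_lines` and `unflattened_seq` and the file name in the usage comment are assumptions made for illustration. BioPerl is loaded on demand inside `unflattened_seq`, so the tree-printing helper can be tried without it.

```perl
use strict;
use warnings;

# Return one indented line per feature, walking the containment
# hierarchy depth-first. Works on any object that offers
# primary_tag() and get_SeqFeatures(). (Hypothetical helper.)
sub feature_tree_lines {
    my ($feature, $depth) = @_;
    $depth //= 0;
    my $indent = '  ' x $depth;
    my @lines  = ($indent . $feature->primary_tag);
    for my $child ($feature->get_SeqFeatures) {
        push @lines, feature_tree_lines($child, $depth + 1);
    }
    return @lines;
}

# Read the first record of a GenBank file and unflatten its features.
# (Hypothetical helper; requires BioPerl to be installed.)
sub unflattened_seq {
    my ($path) = @_;
    require Bio::SeqIO;
    require Bio::SeqFeature::Tools::Unflattener;
    my $seq = Bio::SeqIO->new(-file => $path, -format => 'genbank')->next_seq;
    my $unflattener = Bio::SeqFeature::Tools::Unflattener->new;
    # -use_magic turns on the unflattener's default heuristics.
    $unflattener->unflatten_seq(-seq => $seq, -use_magic => 1);
    return $seq;
}

# Usage: perl unflatten.pl FR823391.gb   (file name is a placeholder)
if (@ARGV) {
    my $seq = unflattened_seq($ARGV[0]);
    print "$_\n" for map { feature_tree_lines($_) } $seq->get_SeqFeatures;
}
```

Printing the hierarchy this way for the same record under two BioPerl versions is one way to see where the pseudo-tagged features end up.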
Sample files:
http://www.ncbi.nlm.nih.gov/nuccore/FR823391
http://www.ncbi.nlm.nih.gov/nuccore/GL636509
Sample code:
I am using bioperl_live/1.6.9:
$ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
1.0069
Everything is fine when I use BioPerl version 1.4. I am not sure what changed in Bio::SeqFeature::Tools::Unflattener between 1.4 and 1.6.9.
The above code produces different output under BioPerl 1.6.9 and 1.4.
Thanks.