Feature/794 canonical transcript #804
…`max()` key argument to use a lambda function, addressing mypy's type inference issue with `dict.get` in this context.
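As a minimal sketch of the change this commit message describes (the function name and sample data are hypothetical, not taken from the PR): passing `dict.get` as the `key` works at runtime, but mypy can reject it because `dict.get` is typed as returning an optional value. A lambda that indexes the dictionary directly gives the checker a plain `int` to compare.

```python
from typing import Dict


def longest_key(lengths: Dict[str, int]) -> str:
    """Return the key with the largest value (hypothetical helper for illustration)."""
    # max(lengths, key=lengths.get) behaves the same at runtime, but the
    # Optional return type of dict.get can trip mypy's inference here.
    return max(lengths, key=lambda k: lengths[k])


# Hypothetical transcript IDs and lengths, for illustration only.
transcript_lengths = {"T1": 1200, "T2": 950, "T3": 1810}
best = longest_key(transcript_lengths)
```

Indexing with `lengths[k]` is safe here because `max()` only ever passes keys that come from the dictionary itself.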
|
Hi @jonbrenas, |
|
Thanks @mohamed-laarej. This is great! I think you could make it slightly more modular by introducing new functions that do some of the more basic tasks (e.g., a function that lists all transcripts of a gene, a function that lists all exons of a transcript, ...). |
|
Another small nitpick: could you add the "type definition" for |
- Add find_gene_feature() to locate genes by various identifiers
- Add get_gene_transcripts() to retrieve all transcripts for a gene
- Add get_transcript_exons() to get exons for a transcript
- Add calculate_transcript_length() to compute transcript length
- Add get_gene_transcript_lengths() to get all transcript lengths for a gene
- Refactor canonical_transcript() to use new helper functions
- Improves code modularity, testability, and reusability
|
That's great @mohamed-laarej. I think the doc description might conflict with the way the doc is currently generated by default. More specifically, could you remove the example, since the notebooks are used for that purpose? |
3d0bf45 to 6569103
|
Hello @jonbrenas , |
|
Looks like this just needs a re-review. |
|
Thank you, @mohamed-laarej. Would it be possible to add a few tests to |
|
Yes @jonbrenas, I can add them. |
…ility
- Added extensive tests covering gene/transcript lookup, exon retrieval, length calculations, and integration scenarios, with dynamic fixture handling and graceful skips.
- Updated find_gene_feature to use configurable GFF gene name attribute for broader compatibility.
|
Hi @jonbrenas, just wanted to let you know that I've pushed the latest changes to this PR. Regarding the tests, you'll notice that specifically the […] I've implemented […] I'm happy to discuss this further if you'd like to explore options for updating the […] Thanks, |
|
Thanks @mohamed-laarej. I think the test shouldn't be skipped but the issue isn't with your code so it is fine here. I'll create a new issue to deal with that. |
feat: Add `canonical_transcript` method with supporting helpers
Addresses #794
This PR adds a `canonical_transcript()` function that identifies the most representative transcript for a gene, based on the total transcribed length (sum of exon lengths). It also introduces the following helper functions:
- `find_gene_feature()`: Locate a gene by ID, name, or attributes
- `get_gene_transcripts()`: Retrieve all transcript features for a gene
- `get_transcript_exons()`: Get all exons for a given transcript
- `calculate_transcript_length()`: Compute total length of a transcript by summing its exon lengths
- `get_gene_transcript_lengths()`: Return lengths of all transcripts for a gene
These functions build on the existing `genome_features` and `genome_feature_children` methods. The implementation includes basic error handling for missing or incomplete features.
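The selection rule described above can be sketched in plain Python. This is not the PR's implementation: the function names echo the helpers listed in the description, but the signatures, the in-memory data layout (a dict mapping transcript IDs to exon coordinate pairs), and the choice to return `None` when no transcript has exons are all assumptions made for illustration. Exon coordinates are assumed to be 1-based and inclusive, as in GFF3, so each exon's length is `end - start + 1`.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical exon representation: (start, end), 1-based inclusive as in GFF3.
Exon = Tuple[int, int]


def calculate_transcript_length(exons: List[Exon]) -> int:
    """Sum exon lengths; each exon spans end - start + 1 bases."""
    return sum(end - start + 1 for start, end in exons)


def get_gene_transcript_lengths(transcripts: Dict[str, List[Exon]]) -> Dict[str, int]:
    """Map each transcript ID to its length, skipping transcripts without exons."""
    return {
        transcript_id: calculate_transcript_length(exons)
        for transcript_id, exons in transcripts.items()
        if exons
    }


def canonical_transcript(transcripts: Dict[str, List[Exon]]) -> Optional[str]:
    """Pick the transcript with the greatest summed exon length.

    Returns None when no transcript has any exons (an assumed convention).
    """
    lengths = get_gene_transcript_lengths(transcripts)
    if not lengths:
        return None
    return max(lengths, key=lambda transcript_id: lengths[transcript_id])


# Hypothetical gene with three transcripts, one of which has no exons.
example_gene = {
    "T1": [(100, 200), (300, 400)],  # 101 + 101 = 202 bases
    "T2": [(100, 450)],              # 351 bases
    "T3": [],                        # incomplete feature, ignored
}
lengths = get_gene_transcript_lengths(example_gene)
canonical = canonical_transcript(example_gene)
```

In this example `T2` is selected, because its single exon (351 bases) is longer than the two exons of `T1` combined (202 bases). Ties are resolved by `max()` returning the first maximal key in dictionary order; how the PR handles ties is not stated in the description.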