Used download_koterniak_2020.sh to download the SRR runs listed in the metadata table. Note that the Kaletsky (2018) data was also downloaded (the Koterniak data with the older fastq-dump, the Kaletsky data with fasterq-dump, since some of the larger files failed to download otherwise). However, most of the Kaletsky (2018) files are single-end reads, so they are not processed here (possibly in a later version).
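For reference, the core of such a download script boils down to looping fasterq-dump over an accession list. A minimal sketch (the helper name and the `srr_list.txt` file name are assumptions for illustration; the real logic is in download_koterniak_2020.sh):

```shell
# Hypothetical SRA download loop. Assumes srr_list.txt holds one SRR
# accession per line (one line per run in the metadata table).
download_srrs() {  # usage: download_srrs <accession_list> <out_dir>
  local srr
  mkdir -p "$2"
  while read -r srr; do
    [ -n "$srr" ] || continue
    # fasterq-dump writes paired-end runs as <SRR>_1.fastq / <SRR>_2.fastq
    fasterq-dump --split-files --outdir "$2" "$srr"
  done < "$1"
}
```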
Used star_index.sh to create the index (saved alongside the references) and dsq_star_align.sh to align. Note that STAR's option to output sorted BAMs was not used, as it kept running out of memory. The dSQ jobs are defined in joblist_align and re_joblist_align (larger samples that need more time). Jobs were then created with:
```shell
dsq --job-file src/joblist_align.txt --mem 20GB --cpus-per-task 10 -t 5:00:00 --mail-type ALL
dsq --job-file src/re_joblist_align.txt --mem 20GB --cpus-per-task 10 -t 23:50:00 --mail-type ALL
```
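Each line of a dSQ joblist is one self-contained command. A hypothetical generator for such lines is sketched below; the paths, thread count, and STAR flags here are assumptions for illustration (the actual commands live in dsq_star_align.sh / the joblist files), though `--outSAMtype BAM Unsorted` matches the note above about skipping STAR's sorting:

```shell
# Hypothetical helper that emits one joblist line per run (the real list is
# src/joblist_align.txt; ref/fastq/aligned paths are assumed).
align_cmd() {  # usage: align_cmd <SRR>
  printf 'STAR --runThreadN 10 --genomeDir ref/star_index --readFilesIn fastq/%s_1.fastq fastq/%s_2.fastq --outSAMtype BAM Unsorted --outFileNamePrefix aligned/%s_\n' "$1" "$1" "$1"
}
# e.g. align_cmd SRR6238092 >> src/joblist_align.txt
```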
After this, BAMs were sorted and indexed with the dSQ sort_bams.sh and joblist_sortindex.txt, using:
```shell
dsq --job-file src/joblist_sortindex.txt --mem 20GB --cpus-per-task 5 -t 23:50:00 --mail-type ALL
```
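The per-BAM step in sort_bams.sh presumably reduces to a samtools sort + index pair; a minimal sketch (file naming and thread count are assumptions, loosely matching the 5 CPUs requested above):

```shell
# Hypothetical sort-and-index step for one BAM (the real logic is in
# sort_bams.sh). Produces <name>.sorted.bam plus its .bai index.
sort_index() {  # usage: sort_index <unsorted.bam>
  local sorted="${1%.bam}.sorted.bam"
  samtools sort -@ 5 -o "$sorted" "$1"
  samtools index "$sorted"
}
```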
The SJ files created by STAR and the sorted, indexed BAMs were transferred and renamed manually based on the descriptions in GEO:
| Accession | Description | Run | Short Name |
|---|---|---|---|
| GSM2836730 | muscle_TRAP_rep_1 | SRR6238092 | muscle_6238092 |
| GSM2836731 | muscle_TRAP_rep_2 | SRR6238093 | muscle_6238093 |
| GSM2836732 | intestine_TRAP_rep_1 | SRR6238094 | intestine_6238094 |
| GSM2836733 | intestine_TRAP_rep_2 | SRR6238095 | intestine_6238095 |
| GSM2836734 | neuronal_TRAP_rep_1 | SRR6238096 | neurons_6238096 |
| GSM2836735 | neuronal_TRAP_rep_2 | SRR6238097 | neurons_6238097 |
| GSM2836736 | serotonin_TRAP_rep_1 | SRR6238098 | serotonergic_6238098 |
| GSM2836737 | serotonin_TRAP_rep_2 | SRR6238099 | serotonergic_6238099 |
| GSM2836738 | dopamine_TRAP_rep_1 | SRR6238100 | dopaminergic_6238100 |
| GSM2836739 | dopamine_TRAP_rep_2 | SRR6238101 | dopaminergic_6238101 |
Note: samples SRR6238102–SRR6238111 are not processed further here, as they are the input (whole-worm) samples corresponding to each of the above.
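The manual renaming could equivalently be scripted from the table above; a sketch (the helper, directory layout, and file suffixes are assumptions, while the SRR-to-short-name mapping is taken directly from the table):

```shell
# Hypothetical renaming helper (the renaming was actually done by hand).
# Maps each SRR run to its short name from the table above; requires bash 4+.
declare -A SHORT=(
  [SRR6238092]=muscle_6238092       [SRR6238093]=muscle_6238093
  [SRR6238094]=intestine_6238094    [SRR6238095]=intestine_6238095
  [SRR6238096]=neurons_6238096      [SRR6238097]=neurons_6238097
  [SRR6238098]=serotonergic_6238098 [SRR6238099]=serotonergic_6238099
  [SRR6238100]=dopaminergic_6238100 [SRR6238101]=dopaminergic_6238101
)

rename_runs() {  # usage: rename_runs <src_dir> <dst_dir>
  local src="$1" dst="$2" srr f
  mkdir -p "$dst"
  for srr in "${!SHORT[@]}"; do
    for f in "$src/$srr".*; do          # .sorted.bam, .bai, SJ files, ...
      [ -e "$f" ] || continue
      mv "$f" "$dst/${SHORT[$srr]}${f#"$src/$srr"}"
    done
  done
}
```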
StringTie quantification was run with src/stringtie.sh; the TPMs were then exported with summarize_stringtie_q.R (run manually on the cluster). That produces the intermediates/240827_strq_outs/240828_tx_TPM.tsv file.
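The per-sample call inside src/stringtie.sh presumably reduces to an expression-estimation run against the reference annotation; a hedged sketch (annotation path, output layout, and thread count are assumptions):

```shell
# Hypothetical per-sample StringTie call (the real one is in src/stringtie.sh).
# -e restricts estimation to the reference annotation in -G;
# -A additionally writes a gene abundance table.
quantify() {  # usage: quantify <sorted.bam> <annotation.gtf> <out_dir>
  local name; name=$(basename "${1%.sorted.bam}")
  stringtie -e -p 8 -G "$2" \
    -o "$3/$name/transcripts.gtf" \
    -A "$3/$name/gene_abund.tab" "$1"
}
```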
Manually deleted the leading transcript_id\t field from the header line so that the header starts with the sample names.
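The same header fix could be scripted rather than done by hand, e.g. (the helper is illustrative; `\t` in the pattern assumes GNU sed):

```shell
# Strip the leading "transcript_id<TAB>" field from line 1 only, so the
# header starts directly with the sample names. GNU sed assumed.
fix_header() {  # usage: fix_header <tx_TPM.tsv>
  sed -i '1s/^transcript_id\t//' "$1"
}
```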
Finally, ran src/suppa_psi.sh to compute PSI per event and src/suppa_dpsi.sh for deltaPSI; the downstream analysis happens in the suppa_events repo, along with the neuronal quantifications.
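For orientation, the PSI step presumably wraps SUPPA2's psiPerEvent subcommand; a hedged sketch (the wrapper and file names are assumptions, not taken from src/suppa_psi.sh):

```shell
# Hypothetical wrapper around SUPPA2's psiPerEvent (real logic:
# src/suppa_psi.sh). Computes per-event PSI from an .ioe event file
# and the TPM table produced above.
compute_psi() {  # usage: compute_psi <events.ioe> <tpm.tsv> <out_prefix>
  suppa.py psiPerEvent --ioe-file "$1" --expression-file "$2" -o "$3"
}
```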