Running MetaCompass

This guide explains how to run MetaCompass.

Requirements

Before running MetaCompass, make sure you have the following:

  • Nextflow (we recommend setting it up with our conda environment)
  • Reference database (ref_db_path) (see build instructions here)
  • Input forward and reverse read files in FASTQ format
  • Other requirements
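Before launching a run, a quick shell check can catch missing pieces early. The function below is a minimal sketch and is not part of MetaCompass: `check_requirements` is a hypothetical helper name, and it only confirms that `nextflow` is on your PATH, that both read files exist with a .fastq or .fastq.gz extension, and that the reference database path exists. It does not validate file contents or the conda environment.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not shipped with MetaCompass).
# Checks existence only; it does not validate FASTQ contents or the database.
check_requirements() {
  local forward="$1" reverse="$2" ref_db="$3"
  local status=0 f

  # Nextflow must be callable from the current shell.
  if ! command -v nextflow >/dev/null 2>&1; then
    echo "missing: nextflow is not on PATH" >&2
    status=1
  fi

  # Both read files must exist and look like FASTQ (.fastq or .fastq.gz).
  for f in "$forward" "$reverse"; do
    case "$f" in
      *.fastq | *.fastq.gz)
        if [[ ! -f "$f" ]]; then
          echo "missing read file: $f" >&2
          status=1
        fi
        ;;
      *)
        echo "unexpected extension (want .fastq or .fastq.gz): $f" >&2
        status=1
        ;;
    esac
  done

  # The reference database path (ref_db_path) must exist.
  if [[ ! -e "$ref_db" ]]; then
    echo "missing reference database: $ref_db" >&2
    status=1
  fi

  return "$status"
}

# Example usage with placeholder paths:
# check_requirements forward_read.fastq.gz reverse_read.fastq.gz /path/to/your/reference/database
```

The function reports every problem it finds rather than stopping at the first one, so a single call lists everything that still needs fixing.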

Usage

To run MetaCompass effectively, we recommend creating a script that includes the following key elements:

  1. Input Data Paths: Specify the paths to the input forward and reverse read files.

  2. Parameter Configuration: Define various parameters, including:

    • The reference database
    • The output directory
    • The number of threads to use

    (Full parameter configuration)

Such a script makes it easy to run MetaCompass repeatably for your specific analysis.

Example:

example.sh

#!/usr/bin/env bash
# Set the paths to your input data and reference database by modifying
# the following variables in your shell script:

forward_read="forward_read.fastq.gz" # uncompressed .fastq files also work
reverse_read="reverse_read.fastq.gz" # uncompressed .fastq files also work
ref_db_path="/path/to/your/reference/database"
output_folder="/path/to/your/output/directory"

# Options used below:
#   --reference_db, --forward, --reverse, -output-dir: [required]
#   --threads: [optional] number of threads; by default it is 16
#   --trace_file_name: [optional] path to a Nextflow trace file
#   -with-timeline: [optional] generates a timeline HTML report
#   -with-dag: [optional] generates a Directed Acyclic Graph (DAG) visualization
# A comment cannot follow a line-continuation "\", so none appear inside the command.

# Run MetaCompass on these variables using the following command:

nextflow run metacompass2.nf \
 --reference_db "${ref_db_path}" \
 --forward "$forward_read" \
 --reverse "$reverse_read" \
 -output-dir "$output_folder" \
 --threads 8 \
 --trace_file_name "$output_folder/trace.txt" \
 -with-timeline "$output_folder/timeline.html" \
 -with-dag "$output_folder/dag.png"

Then make the script executable and run it:

chmod +x example.sh
./example.sh

Monitor the progress and view the results in the specified output directory.
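If you passed --trace_file_name, the trace file gives a quick view of how tasks are doing. The sketch below assumes the file follows Nextflow's default tab-separated trace format, with a header row that includes a status column; `count_task_status` is a hypothetical helper name, not part of MetaCompass.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count tasks per status in a Nextflow trace file.
# Assumes tab-separated columns and a header row containing "status".
count_task_status() {
  local trace_file="$1"
  awk -F '\t' '
    # Find the index of the "status" column in the header row.
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "status") col = i; next }
    # Tally each status value (e.g. COMPLETED, FAILED) once the column is known.
    col { count[$col]++ }
    END { for (s in count) print s, count[s] }
  ' "$trace_file" | sort
}

# Example usage:
# count_task_status "$output_folder/trace.txt"
```

Looking the column up by name, rather than by position, keeps the helper working if the trace file is configured with a different set of fields.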

Toggling de novo assembly on and off

By default, MetaCompass also generates a de novo assembly from the reads that were not used by the reference-guided step. To turn this feature off, pass --de_novo 0; to turn it on explicitly, pass --de_novo 1.
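As a sketch, the command from example.sh can be built as a Bash array with --de_novo 0 added. An array also lets each option carry its own comment, which a backslash-continued command line cannot. The DE_NOVO variable is a hypothetical convenience for this example, the paths are placeholders, and the script prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: the example.sh command with de novo assembly switched off.
# DE_NOVO is a hypothetical variable for this example: 0 = off, 1 = on.
DE_NOVO="${DE_NOVO:-0}"

cmd=(
  nextflow run metacompass2.nf
  --reference_db "/path/to/your/reference/database"  # [required]
  --forward "forward_read.fastq.gz"                  # [required]
  --reverse "reverse_read.fastq.gz"                  # [required]
  -output-dir "/path/to/your/output/directory"       # [required]
  --de_novo "$DE_NOVO"                               # 0 disables de novo assembly
)

# Print the command for inspection; run "${cmd[@]}" to execute it instead.
printf '%q ' "${cmd[@]}"
echo
```

Replacing the final printf and echo with "${cmd[@]}" launches the run with the same arguments.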

Expected outputs

There will be several sub-folders within the main output folder:

  • reference_selection/ - outputs of the reference selection step. Notably, the file ref_genome_marker_gene_coverage.tsv lists every genome with hits to at least one marker gene and reports the depth and breadth of coverage across the full marker gene set. Use this file to investigate why a genome you expected in a sample was not selected by MetaCompass.
  • reference_culling/ - outputs of the step that clusters the genomes chosen during reference selection. This folder should contain a separate .fna file for each cluster, as well as several files named clusters.* that describe the membership of each genome cluster.
  • reference_assembly/ - the assemblies generated for each genome in each cluster. This folder should contain one sub-folder per genome, holding the sequence assembled for that reference genome.
  • denovo_assembly/ - the output of the de novo assembler (if run). Currently it should contain only one folder, megahit_out, which holds the output of MEGAHIT.
  • output/ - the key outputs of MetaCompass: the separate assemblies of each genome, the de novo assembly, and a file called all_contigs.fna containing all contigs generated by both the reference-guided and de novo assemblies.
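To check a finished run against the layout above, a small summary script can help. This is a hypothetical helper (summarize_run), assuming the folder names listed above, that all_contigs.fna sits directly inside output/, and that ref_genome_marker_gene_coverage.tsv sits directly inside reference_selection/. The optional genome lookup is a plain text search, since the table's column layout is not described here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize a finished MetaCompass output folder.
# Usage: summarize_run OUTPUT_FOLDER [GENOME_ID]
# Assumes the sub-folder layout described above; denovo_assembly/ is
# expected only when de novo assembly was enabled.
summarize_run() {
  local outdir="$1" genome="${2:-}" sub contigs coverage

  # Report which expected sub-folders are present.
  for sub in reference_selection reference_culling reference_assembly denovo_assembly output; do
    if [[ -d "$outdir/$sub" ]]; then
      echo "found:  $sub/"
    else
      echo "absent: $sub/"
    fi
  done

  # Count contigs: each FASTA record starts with a '>' header line.
  contigs="$outdir/output/all_contigs.fna"
  if [[ -f "$contigs" ]]; then
    echo "contigs in all_contigs.fna: $(grep -c '^>' "$contigs")"
  fi

  # Optional: plain-text search for a genome in the marker-gene coverage table
  # (assumed location: reference_selection/ref_genome_marker_gene_coverage.tsv).
  coverage="$outdir/reference_selection/ref_genome_marker_gene_coverage.tsv"
  if [[ -n "$genome" && -f "$coverage" ]]; then
    grep -F -- "$genome" "$coverage" || echo "no marker-gene hits listed for: $genome"
  fi
}
```

For example, summarize_run "$output_folder" followed by a genome identifier you expected to see shows whether that genome had any marker-gene hits at all.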