<aside> 💡 This may take a while. Remember to start a screen session first!
</aside>
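For readers new to screen, the commented lines below sketch a typical workflow, followed by a small guard that reports whether the current shell is already inside a session. The session name fastqc and the helper name in_screen are illustrative assumptions; the guard relies on screen setting the STY variable inside its sessions.

```shell
# Typical screen workflow (the session name "fastqc" is an arbitrary example):
#   screen -S fastqc     # start a named session
#   ... run the commands, then press Ctrl-a followed by d to detach ...
#   screen -ls           # list existing sessions
#   screen -r fastqc     # reattach to the session later

# Hypothetical helper: succeeds only when the shell runs inside screen,
# because screen exports the STY variable to its sessions.
in_screen() {
    [ -n "${STY:-}" ]
}

if in_screen; then
    echo "Inside screen session: $STY"
else
    echo "Not inside a screen session; consider running: screen -S fastqc"
fi
```

Running the check before a long job is a cheap way to avoid losing the run when the SSH connection drops.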
# Activate the conda environment that contains the installed packages (by default 'rnaseq' or 'envs/rnaseq')
conda activate rnaseq # change 'rnaseq' if necessary
# Move to the directory that contains your raw reads
cd /path/to/your/dir
# Run FastQC (assuming your raw reads are named ...fq.gz)
# -t sets the number of threads (CPUs) to use; be realistic, especially if many users are running jobs at the same time
# *.fq.gz is a wildcard that matches all files ending in .fq.gz
fastqc -t 24 *.fq.gz
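FastQC's -t option sets how many files are processed simultaneously, so asking for more threads than there are read files brings no benefit. The following is a minimal sketch of how a "realistic" value could be chosen: the smaller of the file count and the CPU count, limited by a cap. The helper name pick_threads and the default cap of 24 (taken from the example command) are assumptions, not part of FastQC.

```shell
# Hypothetical helper: choose a thread count for fastqc -t.
# Arguments: number of input files, number of CPUs, optional upper cap.
pick_threads() {
    files=$1
    cpus=$2
    cap=${3:-24}   # 24 mirrors the example command; lower it on a busy server
    t=$files
    if [ "$cpus" -lt "$t" ]; then t=$cpus; fi
    if [ "$cap" -lt "$t" ]; then t=$cap; fi
    if [ "$t" -lt 1 ]; then t=1; fi
    echo "$t"
}

# Count the read files in the current directory without parsing ls output.
n_files=0
for f in *.fq.gz; do
    if [ -e "$f" ]; then n_files=$((n_files + 1)); fi
done

# nproc comes with GNU coreutils; getconf is a fallback for other systems.
n_cpus=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
threads=$(pick_threads "$n_files" "$n_cpus")
echo "Found $n_files read files and $n_cpus CPUs; suggested: fastqc -t $threads *.fq.gz"
```

On a shared server, passing a smaller cap as the third argument (for example `pick_threads "$n_files" "$n_cpus" 8`) leaves CPUs free for other users.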
This may take a while to run. The output files, named ...fastqc..., are written to the same directory as the reads. Let's tidy them up:
mkdir fastqc # create a sub-directory called fastqc
mv *_fastqc.html *_fastqc.zip fastqc # move the FastQC reports (.html and .zip) into the sub-directory fastqc; this pattern does not match the fastqc directory itself
cd fastqc # go into the sub-directory fastqc
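The tidy-up step can also be wrapped in a small reusable function. The sketch below matches only the report files by their `_fastqc.html` and `_fastqc.zip` suffixes, so the target directory is never part of the move. The function name tidy_fastqc is hypothetical, and it should be run from the directory that holds the reports.

```shell
# Hypothetical helper: collect FastQC reports from the current directory
# into a sub-directory (default name: fastqc).
tidy_fastqc() {
    outdir=${1:-fastqc}
    mkdir -p "$outdir"
    for f in *_fastqc.html *_fastqc.zip; do
        # Skip a pattern that matched nothing (it stays as literal text)
        if [ -e "$f" ]; then
            mv "$f" "$outdir"/
        fi
    done
}

# Usage, from the directory that contains the reports:
#   tidy_fastqc          # moves reports into ./fastqc
#   tidy_fastqc qc_out   # or into a directory of your choice

# Alternative: FastQC can write straight into an existing directory with -o:
#   mkdir -p fastqc && fastqc -t 24 -o fastqc *.fq.gz
```

With the -o alternative no move is needed afterwards, but the output directory must exist before FastQC starts.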
You can then inspect the QC report for each read file separately, or use another tool, MultiQC, to produce a single combined QC report:
multiqc . # run multiqc in the current directory (.)
This produces a MultiQC report (multiqc_report.html by default) that summarises all read files at once. You can open the .html file in any common web browser on your desktop.
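A common stumbling block is running multiqc in a directory without FastQC output, or outside the conda environment. The sketch below checks both conditions before launching the tool. The wrapper name run_multiqc is hypothetical; only the plain `multiqc <dir>` invocation from the step above is used.

```shell
# Hypothetical wrapper: run MultiQC only when FastQC output is present
# and the multiqc command is available.
run_multiqc() {
    dir=${1:-.}
    n=0
    for f in "$dir"/*_fastqc.zip; do
        if [ -e "$f" ]; then n=$((n + 1)); fi
    done
    if [ "$n" -eq 0 ]; then
        echo "No *_fastqc.zip files found in $dir" >&2
        return 1
    fi
    if ! command -v multiqc >/dev/null 2>&1; then
        echo "multiqc not found; is the conda environment active?" >&2
        return 1
    fi
    echo "Running MultiQC on $n FastQC reports in $dir"
    multiqc "$dir"
}

# Usage, from inside the fastqc sub-directory:
#   run_multiqc .
```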
The QC reports may flag some warnings or even errors. Most of these are safe to ignore because the analysis tolerates a small proportion of "dirty" reads. However, consult a bioinformatician if any of the errors look substantial.
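To see at a glance which checks were flagged, the summary.txt file inside each FastQC .zip archive can be read directly. It lists one module per line as tab-separated status (PASS, WARN or FAIL), module name and file name. The following is a minimal sketch, assuming unzip is installed; the helper name modules_with_status is hypothetical.

```shell
# Hypothetical helper: print the FastQC modules that have a given status
# (PASS, WARN or FAIL), reading summary.txt content from standard input.
modules_with_status() {
    awk -F '\t' -v status="$1" '$1 == status { print $2 }'
}

# List the failed modules for every report in the current directory.
for zip in *_fastqc.zip; do
    if [ -e "$zip" ] && command -v unzip >/dev/null 2>&1; then
        echo "== $zip"
        # unzip -p streams the matching archive member to standard output
        unzip -p "$zip" '*/summary.txt' | modules_with_status FAIL
    fi
done
```

A module that fails in every sample is usually worth raising with a bioinformatician; a single warning in one sample is often less of a concern.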