Difference between revisions of "MSblender TACC"

From Marcotte Lab

Revision as of 18:00, 2 March 2015

Before you start

  • To use this setup, your TACC account must be added to our lab project ('A-cm10'). If you don't have an account, create one at https://portal.tacc.utexas.edu/, then ask Edward to add your account to the lab project.
  • This document is for 'stampede'.
  • Always work in your $SCRATCH directory, not in /corral or your $HOME.

Install MSblender (and comet, MSGFDB, X!Tandem)

$ cd ~
$ mkdir git
$ cd git
$ git clone https://github.com/marcottelab/MSblender.git

Prepare a working space

$ module load python
$ cd $SCRATCH
$ mkdir myProject
$ cd myProject
$ mkdir mzXML
$ mkdir DB

Prepare database

  • You can run this step on any computer. If it takes longer than a minute, run it somewhere other than a TACC login node (otherwise your account may be locked).

$ python $HOME/git/MSblender/pre/fasta-reverse.py my_seq.fa
$ cat my_seq.fa.* > my_seq.combined.fa
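
A quick sanity check after combining can catch a truncated or double-appended file. The sketch below counts FASTA header lines (lines starting with '>') and confirms that the combined file holds as many records as its parts together. The function names count_fasta_records and check_combined are hypothetical helpers, not part of MSblender, and the sketch assumes that fasta-reverse.py writes its output as my_seq.fa.* files, as the cat command above implies.

```shell
# Count FASTA records: every record starts with a '>' header line.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Hypothetical check: succeed only if the combined file contains as many
# records as all of the part files together.
# Usage: check_combined COMBINED PART [PART ...]
check_combined() {
    combined="$1"
    shift
    total=0
    for part in "$@"; do
        total=$((total + $(count_fasta_records "$part")))
    done
    [ "$(count_fasta_records "$combined")" -eq "$total" ]
}
```

For example, `check_combined my_seq.combined.fa my_seq.fa.* && echo OK` prints OK only when the record counts agree.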

DB setup for X!tandem

$ $HOME/git/MSblender/extern/fasta_pro.exe my_seq.combined.fa

You should see output like this:

$ ~/git/MSblender/extern/fasta_pro.exe my_seq.combined.fa 
fasta_pro file conversion utility, v. 2006.09.15
 input path = my_seq.combined.fa
output path = my_seq.combined.fa.pro
db type = plain
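
Since the tool reports an output path ending in '.pro', one way to confirm that the index was written is to test for a non-empty file with that suffix. This is a minimal sketch; check_pro_index is a hypothetical helper name, not something shipped with MSblender or X!Tandem.

```shell
# Hypothetical helper: verify that FASTA.pro exists and is not empty.
check_pro_index() {
    fasta="$1"
    if [ -s "${fasta}.pro" ]; then
        echo "index found: ${fasta}.pro"
    else
        echo "index missing or empty: ${fasta}.pro" >&2
        return 1
    fi
}
```

Calling `check_pro_index my_seq.combined.fa` returns a non-zero status if the index is absent, so it can guard later steps in a script.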

DB setup for comet

You don't need to do anything for this.

DB setup for MSGF+

This step uses a significant amount of computing resources (especially memory), so it may not be suitable to run on a login node.

$ module load jdk64
$ java -Xmx4000M -cp /home1/00992/linusben/git/MSblender/extern/MSGFPlus.jar edu.ucsd.msjava.msdbsearch.BuildSA -d XenopusHybrid_xlJGIv16_xtJGIv83.combined.fa -tda 0
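
The command above uses one user's home directory and a specific FASTA file name. The sketch below builds the same command line for your own file so that you can inspect it before running it. It assumes that MSGFPlus.jar sits in the extern directory of your own MSblender clone (as it does in the path above); MSGF_JAR is a hypothetical override variable and msgf_buildsa_cmd a hypothetical helper name.

```shell
# Print (do not run) the BuildSA command for a combined FASTA file.
# MSGF_JAR is a hypothetical override; the default path is an assumption
# based on cloning MSblender into ~/git as described above.
msgf_buildsa_cmd() {
    fasta="$1"
    jar="${MSGF_JAR:-$HOME/git/MSblender/extern/MSGFPlus.jar}"
    echo "java -Xmx4000M -cp $jar edu.ucsd.msjava.msdbsearch.BuildSA -d $fasta -tda 0"
}
```

Running `msgf_buildsa_cmd my_seq.combined.fa` prints the full command, which can then be pasted into a job script or an interactive session on a compute node.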

Prepare mzXML files

Copy your mzXML files to this directory ($SCRATCH/myProject/mzXML).
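
The copy step can be sketched as a small function that also reports how many files arrived. stage_mzxml is a hypothetical helper name, and the source directory is whatever location currently holds your data.

```shell
# Hypothetical helper: copy all *.mzXML files from SRC into DEST and
# print the number of mzXML files now present in DEST.
stage_mzxml() {
    src="$1"
    dest="$2"
    mkdir -p "$dest"
    cp "$src"/*.mzXML "$dest"/ || return 1
    set -- "$dest"/*.mzXML   # reuse the positional parameters as a counter
    echo "$#"
}
```

For example, `stage_mzxml /path/to/my/data "$SCRATCH/myProject/mzXML"` prints the file count, which you can compare against the number of runs you expect.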

Prepare search

$ python ~/git/MS-toolbox/bin/prepare-tandemK.py
Create /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/tandemK.
Write /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/tandemK/tandem-taxonomy.xml.
Write /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/tandemK/20110713_XENLA_Egg1_1.tandemK.xml
...

TandemK is ready. Run /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/scripts/run-tandemK.sh.
$ python ~/git/MS-toolbox/bin/prepare-inspect.py 
Create /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/inspect.
Write /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/inspect/20110713_XENLA_Egg1_1.inspect_in.
Write /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/inspect/20110713_XENLA_Egg1_2.inspect_in.
...

InsPecT is ready. Run /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/scripts/run-inspect.sh.
$ python ~/git/MS-toolbox/bin/prepare-MSGFDB.py 
Create /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/MSGFDB.
20110713_XENLA_Egg1_1.mzXML
20110713_XENLA_Egg1_2.mzXML
....

MSGFDB is ready. Run /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/scripts/run-MSGFDB.sh.

Run search

On a standalone workstation, you can start a search by running ./scripts/run-(search_engine).sh directly, but you shouldn't do this on a TACC login node. Instead, put the following parameters at the top of each run-*.sh script, then submit it as a job with qsub.

  • If you use lonestar, replace '4way 8' with '8way 24'. See the Lonestar user guide and Longhorn user guide for details.
  • Don't forget to put your email address after -M.
  • Use a short job name so that you can check the status easily.

#!/bin/bash
#$ -V                   # Inherit the submission environment
#$ -cwd                 # Start job in submission directory
#$ -N (job name)        # Job name
#$ -j y                 # Combine stderr and stdout
#$ -o $JOB_NAME.o$JOB_ID
#$ -pe 4way 8
#$ -q long
#$ -l h_rt=24:00:00     # Run time (hh:mm:ss)
#$ -M (your email)
#$ -m be                # Email at Begin and End of job
#$ -P hpc
set -x

(put the remaining part of the run-* script, after its #!/bin/bash line, here)
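
Assembling the job script by hand for every search engine is error-prone, so the step can be sketched as a function that prepends a saved header to a generated run-*.sh script. This is a minimal sketch, not part of MSblender or MS-toolbox: make_job_script is a hypothetical helper, the header file is assumed to contain the lines above with your own job name and email filled in, and the generated script is assumed to start with a single #!/bin/bash line.

```shell
# Hypothetical helper: write OUT as HEADER followed by RUN_SCRIPT without
# its first line (assumed to be the '#!/bin/bash' shebang).
make_job_script() {
    header="$1"
    run_script="$2"
    out="$3"
    cat "$header" > "$out"
    tail -n +2 "$run_script" >> "$out"
}
```

For example, `make_job_script sge-header.sh scripts/run-tandemK.sh job-tandemK.sh` produces a file that can be passed to qsub; sge-header.sh and job-tandemK.sh are placeholder names.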