Example Input File

The following is a snippet from the evidence.txt output from MaxQuant (located in combined/txt in the MaxQuant output):

Modified sequence	Raw file	Retention time	PEP	Charge	Leading razor protein	Proteins	Retention length	Intensity
_AAASARR_	190523S_LCA16_X_SQC107	54.165	0.040934	2	sp|Q5TG53|SEAS1_HUMAN	sp|Q5TG53|SEAS1_HUMAN	0.61451	232050000
_AAATPAK_	190523S_LCA16_X_SQC107	46.483	0.04549	2	sp|P19338|NUCL_HUMAN	sp|P19338|NUCL_HUMAN	0.34084	125970000
_AAATPAKK_	190523S_LCA16_X_SQC107	49.362	0.033515	3	sp|P19338|NUCL_HUMAN	sp|P19338|NUCL_HUMAN	0.48575	160780000
_AAEDDEDDDVDTKK_	190523S_LCA16_X_SQC107	54.778	8.5508e-6	3	sp|P06454|PTMA_HUMAN	sp|P06454|PTMA_HUMAN	0.94468	1093100000
_AAFNSGK_	190523S_LCA16_X_SQC107	54.232	0.025567	2	sp|P04406|G3P_HUMAN	sp|P04406|G3P_HUMAN	0.48417	614860000
_AAGAGAAK_	190523S_LCA16_X_SQC107	42.639	0.0015378	2	sp|P16401|H15_HUMAN	sp|P16401|H15_HUMAN	0.51279	233520000
_AAKVATK_	190523S_LCA16_X_SQC107	55.86	0.032726	3	sp|Q92576|PHF3_HUMAN	sp|Q92576|PHF3_HUMAN	0.87625	1971100000
_AAKVQKLS(ph)K_	190523S_LCA16_X_SQC107	22.815	0.034397	3	sp|Q2VIR3|IF2GL_HUMAN	sp|Q2VIR3|IF2GL_HUMAN;sp|P41091|IF2G_HUMAN	1	NA
_AAPSHGSK_	190523S_LCA16_X_SQC107	70.93	0.12359	2	REV__sp|P35638|DDIT3_HUMAN	NA	0.33766	223540000

The input file is mapped to DART-ID column definitions by this block in the configuration file:

# column mappings for MaxQuant
col_names:
  sequence: "Modified sequence"
  raw_file: "Raw file"
  retention_time: "Retention time"
  pep: "PEP"

  # optional columns
  charge: "Charge"
  leading_protein: "Leading razor protein"
  proteins: "Proteins"
  retention_length: "Retention length"

The default delimiter is set to tabs (\t) for tab-separated values (.tsv), but this can be changed to any sort of delimited file, like , for .csv files. Specify the delimiter in the configuration file by setting:

sep: ","

Example Data

A full MaxQuant evidence file is available on MassIVE: ftp://massive.ucsd.edu/MSV000083149/other/MaxQuant/SQC_67_95_Varied/evidence.txt