Chapter 2 Input Data Format

The online tool requires two input files. One contains the abundance of each protein, and the other contains information about the samples in the experiment.

2.1 Protein File

The first column of the file contains the protein IDs, with the header “Protein” in the first row. Each subsequent column contains the protein abundances for one sample, with a unique column name in the first row. An example protein file is shown in Table 2.1.

Table 2.1: Example of the first 10 rows of a protein file
Protein TR1 FS1 FS2 MN1 MN2 TR2
A2A432 12946 23552 19797 21634 25137 18322
A2A5R2 8421 13330 11759 14449 16622 11235
A2A690 8773 13603 11344 14952 16806 13155
Q6PDQ2 1007 1761 1348 2027 2510 1614
A2ADY9 941 1534 1466 1632 1707 1101
A2AG50 83568 149689 126248 140182 163669 116929
A2AGT5 113268 216990 174870 201874 235820 169570
A2AHC3 7311 15867 12267 14200 16059 12361
A2AJA9 1309 2513 1773 2242 2604 2323
A2AJI0 20485 39374 30744 39205 42535 31750
Please make sure that every column name is unique.
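Before uploading, it can help to check the header row against the two rules above: the first column must be named “Protein”, and every column name must be unique. The sketch below does this with the Python standard library. It assumes a tab-delimited text file, because this section does not state which file formats the tool accepts, so adjust the delimiter as needed. The function name check_protein_header is hypothetical and is not part of the online tool.

```python
import csv
import io
from collections import Counter


def check_protein_header(text, delimiter="\t"):
    """Return a list of problems found in the header row of a protein file.

    Assumes delimited text (tab-delimited by default). This is an
    illustration only, not the online tool's own validation.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader, [])
    problems = []
    # Rule 1: the first column holds the protein IDs and is named "Protein".
    if not header or header[0] != "Protein":
        problems.append('first column header must be "Protein"')
    # Rule 2: every column name must be unique.
    counts = Counter(header)
    for name in sorted(n for n, c in counts.items() if c > 1):
        problems.append(f"duplicate column name: {name}")
    # At least one sample abundance column must follow the "Protein" column.
    if len(header) < 2:
        problems.append("no sample abundance columns found")
    return problems


good = ("Protein\tTR1\tFS1\tFS2\tMN1\tMN2\tTR2\n"
        "A2A432\t12946\t23552\t19797\t21634\t25137\t18322\n")
bad = "ID\tTR1\tTR1\nA2A432\t12946\t23552\n"
print(check_protein_header(good))  # []
print(check_protein_header(bad))
```

An empty list means that neither rule was violated. The check reads only the header row, so it stays fast even for large files.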

2.2 Sample Information File

The sample information file describes the experimental design. For single-batch data, it must contain at least two columns: col.name and sample.id. The first column, col.name, lists the sample column names of the protein file, that is, its column names from the second column onward. The sample.id column gives the ID of each sample in the experiment. It can be the same as col.name if there are no technical replicates. Technical replicates should share the same sample.id but have different col.name values. As shown in Table 2.2, TR1 and TR2 are technical replicates: they have the same sample.id, TR, but different col.name values, TR1 and TR2.

Table 2.2: Sample information file matched for Table 2.1
col.name sample.id
TR1 TR
FS1 FS1
FS2 FS2
MN1 MN1
MN2 MN2
TR2 TR
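Technical replicates are identified by the mapping from col.name to sample.id. The following sketch groups the rows of Table 2.2 by sample.id, so that any sample.id with more than one col.name has technical replicates. This section does not describe how the tool combines replicates, so the sketch only groups them. The helper group_replicates is hypothetical.

```python
from collections import defaultdict

# Rows of Table 2.2 as (col.name, sample.id) pairs.
sample_info = [
    ("TR1", "TR"),
    ("FS1", "FS1"),
    ("FS2", "FS2"),
    ("MN1", "MN1"),
    ("MN2", "MN2"),
    ("TR2", "TR"),
]


def group_replicates(rows):
    """Map each sample.id to the list of col.name values that belong to it."""
    groups = defaultdict(list)
    for col_name, sample_id in rows:
        groups[sample_id].append(col_name)
    return dict(groups)


groups = group_replicates(sample_info)
# Keep only samples measured in more than one column (technical replicates).
replicates = {sid: cols for sid, cols in groups.items() if len(cols) > 1}
print(replicates)  # {'TR': ['TR1', 'TR2']}
```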

Besides col.name and sample.id, the sample information file can also include the following optional columns:

Column Name Description
batch Batch information. E.g., in Table 2.3, the first three samples are in batch 1 and the last three samples are in batch 2.
pool Pooling information indicating whether a sample is a pooled sample (TRUE) or not (FALSE), as in Table 2.3.
cv Group information for coefficient of variation (CV) calculation. Samples with the same group ID in the cv column are used together to calculate a CV. Leave the cell blank if a sample should not be included in the calculation. E.g., in Table 2.3, P1 and P4 are not included in the calculation; FS1 and FS2 are included in the CV calculation for group FS; and MN3 and MN4 are included in the CV calculation for group MN.
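To illustrate how the cv column defines groups, the sketch below computes a CV for one protein within each cv group. It skips samples whose cv value is blank. Three things here are assumptions, not facts from this manual. First, CV is taken as the sample (n−1) standard deviation divided by the mean, because the tool's exact formula is not given. Second, the cv grouping applied to the Table 2.1 columns is hypothetical. Third, the function name cv_by_group is hypothetical. The abundance values are the A2A432 row of Table 2.1.

```python
import statistics
from collections import defaultdict

# Hypothetical cv assignment for the columns of Table 2.1: FS1/FS2 form
# group "FS", MN1/MN2 form group "MN", and the TR columns are left blank.
cv_group = {"TR1": "", "FS1": "FS", "FS2": "FS",
            "MN1": "MN", "MN2": "MN", "TR2": ""}

# Abundances of protein A2A432 from Table 2.1.
abundance = {"TR1": 12946, "FS1": 23552, "FS2": 19797,
             "MN1": 21634, "MN2": 25137, "TR2": 18322}


def cv_by_group(values, groups):
    """Compute CV = sample standard deviation / mean within each cv group.

    Columns whose cv value is blank are excluded, mirroring the rule that a
    blank cv entry means the sample is not used for CV calculation. Groups
    with fewer than two samples are skipped, since stdev needs two values.
    """
    members = defaultdict(list)
    for col, group in groups.items():
        if group:  # skip blank cv entries
            members[group].append(values[col])
    return {g: statistics.stdev(v) / statistics.mean(v)
            for g, v in members.items() if len(v) >= 2}


cvs = cv_by_group(abundance, cv_group)
for group, cv in cvs.items():
    print(f"{group}: CV = {cv:.1%}")
```

In practice the same calculation would be repeated for every protein row of the protein file.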

If the input data contain multiple batches, the sample information file must include a batch column. An example of a sample information file with all columns is listed in Table 2.3.

Table 2.3: Example of a sample information file with all columns
col.name sample.id batch pool cv
P1 pool 1 TRUE
FS1 FS1 1 FALSE FS
FS2 FS2 1 FALSE FS
MN3 MN3 2 FALSE MN
MN4 MN4 2 FALSE MN
P4 pool 2 TRUE
Please make sure that the col.name values in the sample information file are in the same order as the column names in the protein file.
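A mismatch between the two files is easy to miss, so a pre-upload check can compare them directly. The sketch below confirms two things: the required col.name and sample.id columns are present, and the col.name values match the protein file's sample columns in the same order. It assumes tab-delimited text. It does not check the batch requirement, because only the user knows whether the data span multiple batches. The function name check_sample_info is hypothetical.

```python
import csv
import io


def check_sample_info(protein_header, sample_text, delimiter="\t"):
    """Compare a sample information file with the protein file header.

    protein_header is the list of column names from the protein file's first
    row, including the leading "Protein" column. Returns a list of problems;
    an empty list means no problems were found.
    """
    reader = csv.DictReader(io.StringIO(sample_text), delimiter=delimiter)
    fieldnames = reader.fieldnames or []
    rows = list(reader)
    problems = [f"missing required column: {c}"
                for c in ("col.name", "sample.id") if c not in fieldnames]
    if problems:
        return problems
    col_names = [row["col.name"] for row in rows]
    expected = protein_header[1:]  # sample columns start at the second column
    if col_names != expected:
        if sorted(col_names) == sorted(expected):
            problems.append("col.name values are not in the same order "
                            "as the protein file")
        else:
            problems.append("col.name values do not match the protein "
                            "file column names")
    return problems


header = ["Protein", "TR1", "FS1", "FS2", "MN1", "MN2", "TR2"]
ok = ("col.name\tsample.id\nTR1\tTR\nFS1\tFS1\nFS2\tFS2\n"
      "MN1\tMN1\nMN2\tMN2\nTR2\tTR\n")
swapped = ("col.name\tsample.id\nFS1\tFS1\nTR1\tTR\nFS2\tFS2\n"
           "MN1\tMN1\nMN2\tMN2\nTR2\tTR\n")
print(check_sample_info(header, ok))       # []
print(check_sample_info(header, swapped))
```

The ok input reproduces Table 2.2, so it passes. The swapped input lists the same names with FS1 and TR1 exchanged, so it is reported as an ordering problem rather than a naming problem.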