Chapter 2 Input Data Format

The online tool requires two input files: one contains the abundance of each protein, and the other contains the sample information for the experiment.

2.1 Protein File

The first column of the file contains the ID of each protein, with the header “Protein” in the first row. The second and subsequent columns contain the protein abundance for each sample, each with a unique column name in the first row. An example protein file is shown in Table 2.1.

Table 2.1: Example of first 10 rows of the protein file
Protein	TR1	FS1	FS2	…	MN1	MN2	TR2
A2A432	12946	23552	19797	…	21634	25137	18322
A2A5R2	8421	13330	11759	…	14449	16622	11235
A2A690	8773	13603	11344	…	14952	16806	13155
Q6PDQ2	1007	1761	1348	…	2027	2510	1614
A2ADY9	941	1534	1466	…	1632	1707	1101
A2AG50	83568	149689	126248	…	140182	163669	116929
A2AGT5	113268	216990	174870	…	201874	235820	169570
A2AHC3	7311	15867	12267	…	14200	16059	12361
A2AJA9	1309	2513	1773	…	2242	2604	2323
A2AJI0	20485	39374	30744	…	39205	42535	31750

Please make sure the column name is unique for each column.
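
The two header requirements above (a first column named “Protein” and unique column names) can be checked before uploading. The following is a minimal sketch, not part of the online tool: the function name check_protein_header is hypothetical, and the example assumes a tab-delimited file, which this chapter does not state explicitly.

```python
import csv
import io


def check_protein_header(header):
    """Return a list of problems found in the protein file header row."""
    problems = []
    # The first column must carry the header "Protein".
    if not header or header[0] != "Protein":
        problems.append('first column header must be "Protein"')
    # Every column name must be unique.
    seen = set()
    for name in header:
        if name in seen:
            problems.append(f"duplicate column name: {name}")
        seen.add(name)
    return problems


# Assumed tab-delimited content; values are from Table 2.1, elided columns omitted.
protein_text = "Protein\tTR1\tFS1\tFS2\nA2A432\t12946\t23552\t19797\n"
header = next(csv.reader(io.StringIO(protein_text), delimiter="\t"))
print(check_protein_header(header))  # prints []
```

An empty list means the header passed both checks; otherwise each entry describes one problem to fix in the file.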

2.2 Sample Information File

The sample information file contains the experimental design. For single-batch data, it should contain at least two columns: col.name and sample.id. The first column, col.name, should match the column names of the protein file, starting from its second column. The sample.id column gives the ID of each sample in the experiment. It can be the same as col.name if there are no technical replicates. Technical replicates should share the same sample.id but have different col.name values. As shown in Table 2.2, TR1 and TR2 are technical replicates: they have the same sample.id, TR, but different col.name values, TR1 and TR2.

Table 2.2: Sample information file matching Table 2.1
col.name	sample.id
TR1	TR
FS1	FS1
FS2	FS2
…	…
MN1	MN1
MN2	MN2
TR2	TR
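
To see which columns a sample information file declares as technical replicates, the rows can be grouped by sample.id. This is only an illustrative sketch: the function name group_replicates is hypothetical, the rows are those of Table 2.2 with the elided rows omitted, and how the tool itself combines replicates is not described here.

```python
from collections import defaultdict


def group_replicates(rows):
    """Map each sample.id to the list of col.name values that share it."""
    groups = defaultdict(list)
    for col_name, sample_id in rows:
        groups[sample_id].append(col_name)
    return dict(groups)


# (col.name, sample.id) pairs from Table 2.2, elided rows omitted.
sample_info = [
    ("TR1", "TR"),
    ("FS1", "FS1"),
    ("FS2", "FS2"),
    ("MN1", "MN1"),
    ("MN2", "MN2"),
    ("TR2", "TR"),
]

groups = group_replicates(sample_info)
# A sample.id with more than one col.name marks a set of technical replicates.
replicated = {sid: cols for sid, cols in groups.items() if len(cols) > 1}
print(replicated)  # prints {'TR': ['TR1', 'TR2']}
```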

Besides col.name and sample.id, the sample information file also accepts the following columns:

Column Name	Description
batch	Batch effect information. E.g., in Table 2.3, the first three samples are in batch 1 and the last three samples are in batch 2.
pool	Pooling information indicating whether a sample is a pooled sample or not.
cv	Group information for the coefficient of variation (CV) calculation. Samples with the same group ID in the cv column are used together to calculate a CV. Leave the cell blank if a sample should not be included in the calculation. E.g., in Table 2.3, P1 and P4 are not included in the calculation; FS1 and FS2 are included in the CV calculation for group FS; MN3 and MN4 are included in the CV calculation for group MN.

If the input data contains multiple batches, a batch column must be included in the sample information file. An example of a sample information file with all columns is listed in Table 2.3.

Table 2.3: Example of sample information file with all columns
col.name	sample.id	batch	pool	cv
P1	pool	1	TRUE
FS1	FS1	1	FALSE	FS
FS2	FS2	1	FALSE	FS
…	…	…	…	…
MN3	MN3	2	FALSE	MN
MN4	MN4	2	FALSE	MN
P4	pool	2	TRUE
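
The role of the cv column can be illustrated for a single protein. The sketch below assumes that CV is the sample standard deviation divided by the mean; the exact formula used by the tool is not stated in this chapter. The function name cv_by_group is hypothetical, the FS1 and FS2 values are those of protein A2A432 in Table 2.1, and the P1 value is made up purely to show that a blank cv cell excludes the sample.

```python
import statistics
from collections import defaultdict


def cv_by_group(cv_labels, abundance):
    """Return {group: CV} for one protein, skipping samples with a blank cv label."""
    grouped = defaultdict(list)
    for col_name, label in cv_labels.items():
        if label:  # a blank label means the sample is excluded
            grouped[label].append(abundance[col_name])
    # Assumed definition: CV = sample standard deviation / mean.
    return {
        group: statistics.stdev(values) / statistics.mean(values)
        for group, values in grouped.items()
    }


# cv labels as in Table 2.3 (only three samples shown).
cv_labels = {"P1": "", "FS1": "FS", "FS2": "FS"}
# FS1 and FS2 are from Table 2.1 (A2A432); the P1 value is hypothetical.
abundance = {"P1": 20000, "FS1": 23552, "FS2": 19797}

result = cv_by_group(cv_labels, abundance)
print(round(result["FS"], 4))  # prints 0.1225
```

Note that statistics.stdev needs at least two values, so under this definition each CV group must contain at least two samples.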

Please make sure that the col.name values in the sample information file are in the same order as the column names in the protein file (from the second column onward).
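
The ordering requirement can be verified by comparing the two lists position by position. The following is a minimal sketch under the assumption that both files have already been read into Python lists; the function name first_order_mismatch is hypothetical and not part of the tool.

```python
def first_order_mismatch(protein_header, col_names):
    """Return a description of the first mismatch, or None if the order matches."""
    # Skip the first header entry ("Protein"); the rest are sample columns.
    sample_columns = protein_header[1:]
    if len(sample_columns) != len(col_names):
        return (
            f"expected {len(sample_columns)} col.name entries, "
            f"found {len(col_names)}"
        )
    for position, (expected, found) in enumerate(
        zip(sample_columns, col_names), start=1
    ):
        if expected != found:
            return f"row {position}: expected {expected}, found {found}"
    return None


# Header and col.name values from Tables 2.1 and 2.2, elided entries omitted.
protein_header = ["Protein", "TR1", "FS1", "FS2", "MN1", "MN2", "TR2"]
col_names = ["TR1", "FS1", "FS2", "MN1", "MN2", "TR2"]
print(first_order_mismatch(protein_header, col_names))  # prints None
```

If two rows of the sample information file are swapped, the function reports the first position at which the files disagree.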