Chapter 2 Input Data Format
The online tool requires two input files: one containing the abundance of each protein, and one containing the sample information for the experiment.
2.1 Protein File
The first column of the file holds the ID of each protein, with the header “Protein” in the first row. The second and subsequent columns hold the protein abundance for each sample, each with a unique column name in the first row. An example protein file is shown in Table 2.1.
Protein | TR1 | FS1 | FS2 | … | MN1 | MN2 | TR2 |
---|---|---|---|---|---|---|---|
A2A432 | 12946 | 23552 | 19797 | … | 21634 | 25137 | 18322 |
A2A5R2 | 8421 | 13330 | 11759 | … | 14449 | 16622 | 11235 |
A2A690 | 8773 | 13603 | 11344 | … | 14952 | 16806 | 13155 |
Q6PDQ2 | 1007 | 1761 | 1348 | … | 2027 | 2510 | 1614 |
A2ADY9 | 941 | 1534 | 1466 | … | 1632 | 1707 | 1101 |
A2AG50 | 83568 | 149689 | 126248 | … | 140182 | 163669 | 116929 |
A2AGT5 | 113268 | 216990 | 174870 | … | 201874 | 235820 | 169570 |
A2AHC3 | 7311 | 15867 | 12267 | … | 14200 | 16059 | 12361 |
A2AJA9 | 1309 | 2513 | 1773 | … | 2242 | 2604 | 2323 |
A2AJI0 | 20485 | 39374 | 30744 | … | 39205 | 42535 | 31750 |
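
The layout rules above can be checked before uploading. The following is a minimal sketch, not part of the tool itself: the function name `read_protein_table` is hypothetical, and a comma-separated file is assumed because the text does not state the delimiter. The example data are the first three sample columns of the first two rows of Table 2.1.

```python
import csv
import io


def read_protein_table(handle, delimiter=","):
    """Parse a protein abundance table and check the layout described in 2.1.

    Assumption: the file is delimited text (comma by default).
    Returns the sample column names and a {protein_id: {sample: value}} dict.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    # The first column must be headed "Protein".
    if header[0] != "Protein":
        raise ValueError("first column header must be 'Protein'")
    samples = header[1:]
    # Every sample column needs a unique name.
    if len(set(samples)) != len(samples):
        raise ValueError("sample column names must be unique")
    abundance = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        protein_id, values = row[0], row[1:]
        if len(values) != len(samples):
            raise ValueError(
                f"row {protein_id} has {len(values)} values, expected {len(samples)}"
            )
        abundance[protein_id] = dict(zip(samples, map(float, values)))
    return samples, abundance


# Subset of Table 2.1, written as comma-separated text for illustration.
example = io.StringIO(
    "Protein,TR1,FS1,FS2\n"
    "A2A432,12946,23552,19797\n"
    "A2A5R2,8421,13330,11759\n"
)
samples, abundance = read_protein_table(example)
print(samples)
print(abundance["A2A432"]["FS1"])
```

If the real file is tab-separated, passing `delimiter="\t"` should be enough to reuse the same check.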
2.2 Sample Information File
The sample information file describes the experimental design. For single-batch data it must contain at least two columns: col.name and sample.id. The first column, col.name, must match the sample column names in the protein file (the second column onward). The sample.id column gives the ID of each sample in the experiment. It can be identical to col.name when there are no technical replicates. When there are technical replicates, they must share the same sample.id but have different col.name values. In Table 2.2, for example, TR1 and TR2 are technical replicates: both have the sample.id TR, but their col.name values are TR1 and TR2.
col.name | sample.id |
---|---|
TR1 | TR |
FS1 | FS1 |
FS2 | FS2 |
… | … |
MN1 | MN1 |
MN2 | MN2 |
TR2 | TR |
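
The relationship between col.name and sample.id can be illustrated with a short sketch. The helper `group_replicates` is hypothetical and is not the tool's own code. It assumes the sample information rows have already been read into dictionaries, and it compares the two sets of names without checking their order, since the text does not say whether order matters.

```python
from collections import defaultdict


def group_replicates(protein_columns, sample_rows):
    """Map each sample.id to the protein-file columns that belong to it."""
    col_names = [row["col.name"] for row in sample_rows]
    if len(set(col_names)) != len(col_names):
        raise ValueError("col.name values must be unique")
    # Assumption: only the set of names is compared, not their order.
    if set(col_names) != set(protein_columns):
        raise ValueError("col.name values must match the protein file columns")
    groups = defaultdict(list)
    for row in sample_rows:
        groups[row["sample.id"]].append(row["col.name"])
    return dict(groups)


# Rows taken from Table 2.2 (the elided middle rows are left out).
protein_columns = ["TR1", "FS1", "FS2", "MN1", "MN2", "TR2"]
sample_rows = [
    {"col.name": "TR1", "sample.id": "TR"},
    {"col.name": "FS1", "sample.id": "FS1"},
    {"col.name": "FS2", "sample.id": "FS2"},
    {"col.name": "MN1", "sample.id": "MN1"},
    {"col.name": "MN2", "sample.id": "MN2"},
    {"col.name": "TR2", "sample.id": "TR"},
]
groups = group_replicates(protein_columns, sample_rows)

# Any sample.id that maps to more than one column is a set of technical replicates.
replicated = {sid: cols for sid, cols in groups.items() if len(cols) > 1}
print(replicated)
```

Here only TR maps to more than one column, which matches the description of TR1 and TR2 as technical replicates.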
Besides col.name and sample.id, the sample information file also accepts the following columns:
Column Name | Description |
---|---|
batch | Batch information for each sample. E.g., in Table 2.3 the first three samples are in batch 1 and the last three samples are in batch 2. |
pool | Pooling information indicating whether a sample is a pooled sample or not. |
cv | Group information for the coefficient of variation (CV) calculation. Samples with the same group ID in the cv column are used together to calculate a CV. Leave the cell blank if a sample should be excluded from the calculation. E.g., in Table 2.3, P1 and P4 are excluded; FS1 and FS2 are used for the CV of group FS; MN3 and MN4 are used for the CV of group MN. |
If the input data contain multiple batches, a batch column must be included in the sample information file. An example sample information file with all columns is shown in Table 2.3.
col.name | sample.id | batch | pool | cv |
---|---|---|---|---|
P1 | pool | 1 | TRUE | |
FS1 | FS1 | 1 | FALSE | FS |
FS2 | FS2 | 1 | FALSE | FS |
… | … | … | … | … |
MN3 | MN3 | 2 | FALSE | MN |
MN4 | MN4 | 2 | FALSE | MN |
P4 | pool | 2 | TRUE | |
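
To make the role of the cv column concrete, the sketch below groups the rows of Table 2.3 by their cv value and computes one CV. It is an illustration only: the helper names are hypothetical, and the formula used (sample standard deviation divided by the mean) is an assumption, because the text does not state how the tool computes the CV. The two abundance values are those of protein A2A432 in columns FS1 and FS2 of Table 2.1.

```python
import statistics
from collections import defaultdict


def cv_groups(sample_rows):
    """Collect col.name values per cv group, skipping rows with a blank cv cell."""
    groups = defaultdict(list)
    for row in sample_rows:
        group = (row.get("cv") or "").strip()
        if group:
            groups[group].append(row["col.name"])
    return dict(groups)


def coefficient_of_variation(values):
    """Assumed definition: sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


# Rows taken from Table 2.3 (the elided middle rows are left out).
sample_rows = [
    {"col.name": "P1", "sample.id": "pool", "batch": "1", "pool": "TRUE", "cv": ""},
    {"col.name": "FS1", "sample.id": "FS1", "batch": "1", "pool": "FALSE", "cv": "FS"},
    {"col.name": "FS2", "sample.id": "FS2", "batch": "1", "pool": "FALSE", "cv": "FS"},
    {"col.name": "MN3", "sample.id": "MN3", "batch": "2", "pool": "FALSE", "cv": "MN"},
    {"col.name": "MN4", "sample.id": "MN4", "batch": "2", "pool": "FALSE", "cv": "MN"},
    {"col.name": "P4", "sample.id": "pool", "batch": "2", "pool": "TRUE", "cv": ""},
]
groups = cv_groups(sample_rows)
print(groups)

# CV of protein A2A432 across the FS group, using the FS1 and FS2 values from Table 2.1.
fs_cv = coefficient_of_variation([23552, 19797])
print(round(fs_cv, 4))
```

The pooled samples P1 and P4 have a blank cv cell, so they do not appear in any group.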