Inputs

The Data File

File structure

The data accepted by NMRQuant should be passed in as an excel file (.xlsx), a comma-separated text file (.csv) or a tabulation-separated text file (.tsv). The structure of the file must be as follows:

# Spectrum#

Metabolite1

Metabolite2

Metabolite3

1

xxx

xxx

xxx

2

xxx

xxx

xxx

3

xxx

xxx

xxx

4

xxx

xxx

xxx

The number of metabolites processed at a time has no limit. The important column here is the first one: “# Spectrum#”, as it will be responsible for linking the recorded values per metabolite with the different times, conditions and replicates that will be given through the metadata file (ref).

Note

The column header is case-sensitive! Since in most cases the spectrum data comes from spectra integrated using Bruker’s TopSpin software and this is the column header used by their software, for practical reasons the same formalism was kept in NMRQuant.

Every other column must have as header the name of a metabolite, and the associated areas should be given in each row. If a metabolite has more than one integrated area, it should be given in two or more columns and a number should be assigned to each of the column headers with an underscore between the metabolite name and the number. For example:

# Spectrum#

Phenylalanine_1

Phenylalanine_2

Phenylalanine_3

1

xxx

xxx

xxx

2

xxx

xxx

xxx

3

xxx

xxx

xxx

4

xxx

xxx

xxx

Note

The number of protons for each area group should be given in the database file using the same nomenclature as in the data file, including the spaces and numbers.

Warning

To calculate concentrations for a metabolite using multiple integration areas (as for phenylalanine in the example above), make sure that the proton count for each area is referenced in the database.

Calibration molecule input

To calculate concentrations, NMRQ needs to know if the calibration is internal or external. To know this, it searches in the data file for a column named “Strd” (for Standard). This column is added in manually by the user, and must have at least in it’s first row the number 1 (if the calibration is internal) or 9 (if the calibration is external). If the value is equal to 1, the Strd’s concentration is not needed and the user does not need to input it in the notebook or the Command-Line Interface (CLI). On the contrary, if the value is equal to 9, the user will have to give the TSP concentration through the notebook or the CLI.

The Database File

To calculate concentrations from 1D H NMR spectra areas, it is necessary to have the number of protons for each integrated region (corresponding to an equivalent group of protons in a molecule). Consequently, it is necessary to pass this information to NMRQuant through an excel or csv file containing these proton numbers for each metabolite we are quantifying. The structure of the excel file should at least be as follows:

Metabolite

Heq

Formate

1

Phenylalanine

5

Tyrosine

2

As always, the headers for each column are case-sensitive, as are the names of each metabolite which have to be exactly the same as the ones used in the data file. The two columns shown above are the minimal requirements for NMRQ to function. Good practices dictate that users also add a column with the different ppm positions of each region for informational purposes. This is not a problem and will not interfere with the software’s ability to read the file.

Note

For metabolites that are quantified using two or more regions, the formalism is to add a number separated from the metabolite name by an underscore. The same must be done for the metabolite names in the data file:

Metabolite

Heq

Formate

1

Phenylalanine_1

3

Phenylalanine_2

2

Tyrosine

2

Note

If there is no corresponding metabolite in the database file for a given metabolite in the datafile, RMNQ will notify you by adding _Area after the metabolite’s name in the output file, and keep the areas in the final results. In this case, the software will also use the area values for plotting. This means that if the user uses an arbitrary name for an unknown integration area (”unknown” for example) it will still be plotted and put in the results.

The Template File

Once the datafile is uploaded into the notebook or given in the command-line interface (CLI), the user can generate a structured template file in which the Time Points, Conditions and Replicate numbers must be referenced. To do this, the program reads the ‘# Spectrum#’ field in the data file and generates the required number of rows for each spectrum with the correct formalism for the headers.

Warning

Do not change the names of the columns in the generated template file as this will stop NMRQ from reading the file correctly!

The metadata given through the template file will then be used to separate the datas in groups for plotting.

Note

The Replicates column must number each individual sample of similar Times + Conditions from 1 to n (n being the last replicate). This will let the software mean the concentrations and also create the summary plots and meaned histograms with error bars.