Atmospheric Hydrogen Dry Air Mole Fractions from the NOAA GML Carbon
Cycle Cooperative Global Air Sampling Network, 2009-2021

Version: 2023-07-05

1.       Data source and contacts
2.       Use of data
2.1      Citation
3.       Reciprocity 
4.       Warnings
5.       Update notes
6.       Introduction
7.       DATA - General Comments
7.1      DATA - Sampling Locations
7.2      DATA - File Name Description
7.3      DATA - File Types
7.4      DATA - Content
7.5      DATA - QC Flags
7.6      DATA - Collection Methods
7.7      DATA - Monthly Averages
8.       Data retrieval
9.       References


National Oceanic and Atmospheric Administration (NOAA)
Global Monitoring Laboratory (GML)

Correspondence concerning these data should be directed to:

Gabrielle Petron
NOAA ESRL Global Monitoring Laboratory
325 Broadway, GML-1
Boulder, CO  80305

Electronic Mail: gabrielle.petron@noaa.gov


These data are made freely available to the public and the
scientific community in the belief that their wide dissemination
will lead to greater understanding and new scientific insights.
The availability of these data does not constitute publication
of the data.  NOAA relies on the ethics and integrity of the user to
ensure that GML receives fair credit for their work.  If the data 
are obtained for potential use in a publication or presentation, 
GML should be informed at the outset of the nature of this work.  
If the GML data are essential to the work, or if an important 
result or conclusion depends on the GML data, co-authorship
may be appropriate.  This should be discussed at an early stage in
the work.  Manuscripts using the GML data should be sent to GML
for review before they are submitted for publication so we can
ensure that the quality and limitations of the data are accurately
represented.


Please reference these data as

   Petron, G., Crotwell, A., Crotwell, M., Kitzis, D., Madronich, M.,
   Mefford, T., Moglia, E., Mund, J., Neff, D., Thoning, K., & Wolter, S.
   (2023). Atmospheric Hydrogen Dry Air Mole Fractions from the NOAA GML Carbon
   Cycle Cooperative Global Air Sampling Network, 2009-2021 [Data set].
   NOAA GML CCGG Division. Version: 2023-07-05, https://doi.org/10.15138/WP0W-EZ08


Use of these data implies an agreement to reciprocate.
Laboratories making similar measurements agree to make their
own data available to the general public and to the scientific
community in an equally complete and easily accessible form.
Modelers are encouraged to make available to the community,
upon request, their own tools used in the interpretation
of the GML data, namely well documented model code, transport
fields, modeled mole fractions, and additional information 
necessary for other scientists to repeat the work and to run 
modified versions. Model availability includes collaborative 
support for new users of the models.


Every effort is made to produce the most accurate and precise
measurements possible.  However, we reserve the right to make
corrections to the data based on recalibration of standard gases
or for other reasons deemed scientifically justified.

We are not responsible for results and conclusions based on use
of these data without regard to this warning.

Lab-wide notes:


We introduced the term "measurement group", which identifies
the group within NOAA or the Institute of Arctic and Alpine Research (INSTAAR)
at the University of Colorado Boulder that made the measurement.  Multiple
groups may now measure some of the same trace gas species 
in our discrete samples.  

Measurement groups within NOAA and INSTAAR are 

  ccgg:  NOAA Carbon Cycle Greenhouse Gases group (CCGG)
  hats:  NOAA Halocarbons and other Atmospheric Trace Species group (HATS)
  arl:   INSTAAR Atmospheric Research Laboratory (ARL)
  sil:   INSTAAR Stable Isotope Laboratory (SIL)
  curl:  INSTAAR Laboratory for Radiocarbon Preparation and Research (CURL)


Project-specific notes:


The dataset is now provided in self-describing ObsPack format with improved metadata.
Surface flask event data are available in netCDF and ASCII text. Surface flask
monthly data are available in ASCII text. Shipboard data binned by 5 or 3 degrees
of latitude are no longer included in the surface flask event data, but are still
provided in the monthly data.
This format change makes some previous notes irrelevant.

Parameter-specific notes:


In Spring 2023, we moved to an internal quality control
(QC) tagging system for the flask air samples.
There are three categories of tags documenting
issues associated with sample collection, measurement
and representativity in the CCGG database.
Tags are more specific than flags, which allows
a more granular internal tracking and analysis of QC issues.

Tags are converted to simplified 3 character flags in the data
files for external data users.  See section 7.5 for more details.


Individual site data files provide H2 dry air mole fractions in 
parts per billion (ppb; 1 ppb = 1 part in 10^9 by mole 
fraction = 1 nmol/mol) based on measurements from the NOAA 
GML Carbon Cycle Cooperative Global Air Sampling Network. 

More information about the flask network can be found at:
A map and table at https://gml.noaa.gov/dv/site/?program=ccgg
list the flask network sampling locations, the 3 letter codes 
used to identify them, and their latitudes, longitudes and 
altitudes. H2 data from sites not provided in this directory 
may be available from GML (contact Gabrielle Petron).  

All air samples were analyzed for H2 at the NOAA GML laboratory 
in Boulder using gas chromatography with a pulsed discharge helium
ionization detector (GC-HePDD; Novelli et al., 2009). All measurements are
referenced to the WMO/MPI X2009 calibration scale.

Between 2009 and July 2019, we used a single-standard calibration 
strategy, because the response of the instrument (a gas chromatograph 
with a pulsed discharge helium ionization detector) has been shown 
to be linear over a range of 0 to 2000 ppb H2 
(Novelli et al., 2009). 

Since August 2019, we have used a multi-standard calibration strategy 
(normalized to a reference air tank) for the flask analysis system 
instrument, MAGICC-3. Calibration episodes were conducted biweekly early 
on and are now performed every 4 to 5 weeks. Instrument drift 
between calibration episodes is corrected by normalization to 
a reference tank. The calibration response derived from the normalized 
peak heights of the H2 standards is valid until the next calibration episode.

One-sigma total uncertainties are provided. They are calculated from two terms: 
short-term repeatability and calibration scale propagation uncertainty. 
The two terms are added in quadrature: the total variance is the sum of the 
two squared terms, and the total uncertainty is its square root.
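
As a minimal sketch of the quadrature sum described above, assuming the two
terms are independent one-sigma values in ppb. The function and argument
names are illustrative and are not part of the GML processing code.

```python
import math


def total_uncertainty(repeatability_ppb: float, scale_propagation_ppb: float) -> float:
    """Combine two independent one-sigma terms in quadrature."""
    # The variances add; the total one-sigma uncertainty is the square root.
    variance = repeatability_ppb ** 2 + scale_propagation_ppb ** 2
    return math.sqrt(variance)


# Illustrative input values only, not actual GML uncertainty terms.
print(total_uncertainty(3.0, 4.0))  # 5.0
```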

Previous NOAA GML CCGG H2 data releases and publications included 
flask air measurements made using gas chromatography with a mercuric oxide 
reduction detector and referenced to an internal H2 calibration scale 
(Novelli et al., 1999). These results were not corrected for the transition
to the WMO H2 X2009 scale maintained by MPI-BGC. The NOAA H2 X1996 internal scale 
is known to be biased relative to WMO H2 X2009 and unstable over time. 
In addition, the older GML data are not corrected for the non-linear response 
of the reduction gas analyzer (RGA) instruments, so they may contain 
mole-fraction-dependent offsets. 

Petron et al. (in prep) will provide a detailed description
of the implementation of the WMO/MPI H2 X2009 scale at GML 
and the reprocessing of the flask air GC-HePDD H2 measurements.


Measurements are reported in units of 10^-9 mol H2 per mol 
of dry air (nmol/mol) or parts per billion (ppb) relative 
to the WMO/MPI H2 X2009 scale (Jordan and Steinberg, 2011).

Pacific Ocean Cruise (POC, travelling between the US west coast
and New Zealand or Australia) flask-air samples were collected at
about 5 degree latitude intervals. South China Sea (SCS) samples
were collected at about 3 degree latitude intervals.

Sampling intervals are approximately weekly for fixed sites. They
average one sample every 3 weeks per latitude zone for POC and
about one sample every week per latitude zone for SCS.

Historically, samples have been collected using two general methods:
flushing and then pressurizing glass flasks with a pump, or opening a
stopcock on an evacuated glass flask. Since 28 April 2003, only the
former method has been used.  During each sampling event, a pair of flasks
is filled.


For a summary of sampling locations, please visit


Note: Data for all species may not be available for all sites listed 
in the table.

To view near real-time data, manipulate and compare data, and create
custom graphs, please visit



Encoded into each file name are the parameter (trace gas identifier); sampling 
site; sampling project; laboratory ID number; measurement group (optional); and optional 
qualifiers that further define the file contents.

All file names use the following naming scheme:

     1        2        3            4                     5                         6                7
[parameter]_[site]_[project]_[lab ID number]_[optional measurement group]_[optional qualifiers].[file type]

1. [parameter]

   Identifies the measured parameter or trace gas species.

   co2      Carbon dioxide
   ch4      Methane
   co2c13   d13C (co2)
   merge    more than one parameter

2. [site]

   Identifies the sampling site code.


3. [project]
   Identifies sampling platform and strategy.


4. [lab ID number]

   A numeric field that identifies the sampling laboratory (1,2,3, ...).
   NOAA GML is lab number 1 (see https://gml.noaa.gov/ccgg/obspack/labinfo.html).

5. [optional measurement group]

  Identifies the group within NOAA GML or the Institute of Arctic and Alpine
  Research (INSTAAR) at the University of Colorado Boulder that made the
  measurement.  It is possible to have multiple groups measuring some of the same
  trace gas species in our discrete samples.  

  Measurement groups within NOAA and INSTAAR are 

  ccgg:  NOAA Carbon Cycle Greenhouse Gases group (CCGG)
  hats:  NOAA Halocarbons and other Atmospheric Trace Species group (HATS)
  arl:   INSTAAR Atmospheric Research Laboratory (ARL)
  sil:   INSTAAR Stable Isotope Laboratory (SIL)
  curl:  INSTAAR Laboratory for Radiocarbon Preparation and Research (CURL)

6. [optional qualifiers]

   Optional qualifier(s) may indicate data subsetting or averaging.
   Multiple qualifiers are delimited by an underscore (_).  A more detailed
   description of the file contents is included within each data file.

   event         All measurement results for all collected samples (discrete (flask) data only).
   month         Computed monthly averages for all collected samples (discrete (flask) data only).
   hour_####     Computed hourly averages for the specified 4-digit year (quasi-continuous data only)
   HourlyData    Computed hourly averages for entire record (quasi-continuous data only)
   DailyData     Computed daily averages for entire record (quasi-continuous data only)
   MonthlyData   Computed monthly averages for entire record (quasi-continuous data only)

7. [file type]
   File format (netCDF, ASCII text). 


   txt           ASCII text file
   nc            netCDF4 file
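
The naming scheme above can be split back into its seven fields. The sketch
below is one possible approach, not an official GML tool. It assumes that the
project field never contains an underscore and that a measurement group is
present only when the fifth token matches one of the group codes listed in
this README. The function name parse_file_name is hypothetical.

```python
KNOWN_GROUPS = {"ccgg", "hats", "arl", "sil", "curl"}


def parse_file_name(name: str) -> dict:
    """Split a data file name into the seven fields of the naming scheme."""
    stem, _, file_type = name.rpartition(".")
    parts = stem.split("_")
    if len(parts) < 4 or not file_type:
        raise ValueError(f"unexpected file name: {name!r}")
    parameter, site, project, lab_id = parts[:4]
    rest = parts[4:]
    # The measurement group is optional: treat the next token as a group
    # only if it is one of the codes listed in this README (assumption).
    group = rest.pop(0) if rest and rest[0] in KNOWN_GROUPS else None
    return {
        "parameter": parameter,
        "site": site,
        "project": project,
        "lab_id": int(lab_id),
        "measurement_group": group,
        # Multiple qualifiers are delimited by underscores, so rejoin them.
        "qualifiers": "_".join(rest) or None,
        "file_type": file_type,
    }


info = parse_file_name("CO2_brw_surface-flask_1_ccgg_month.txt")
print(info["site"], info["measurement_group"], info["qualifiers"])  # brw ccgg month
```

Matching the fifth token against a fixed set of group codes keeps the parser
simple, but it would need updating if new measurement groups are introduced.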


We now provide some NOAA Global Monitoring Laboratory measurements
in two file formats: netCDF and ASCII text. 

The Network Common Data Form (netCDF) is a self-describing, machine-independent
data format that supports creation, access, and sharing of array-oriented
scientific data.  To learn more about netCDF and how to read netCDF
files, please visit http://www.unidata.ucar.edu. 

The ASCII text (technically UTF-8 encoded) file is derived directly from the 
netCDF file.  The text file is also self-describing and can be viewed using 
any ASCII or UTF-8 capable text editor.  "Self-describing" means the file 
includes enough information about the included data (called metadata) 
that no additional file is required to understand the structure of the data 
and how to read and use the data.  Note that some non-ASCII characters (accents,
international character sets) may be present in various names and contact 
information.  These may require a UTF-8 capable text editor to view properly.


See individual files for description of the provided variables and other
dataset metadata.


NOAA GML uses a 3-column quality control flag where each column
is defined as follows:

column 1    REJECTION flag.  An alphanumeric other
            than a period (.) in the FIRST column indicates
            a sample with obvious problems during collection
            or analysis.  This measurement should not be interpreted.

column 2    SELECTION flag.  An alphanumeric other than a
            period (.) in the SECOND column indicates a sample
            that is likely valid but does not meet selection
            criteria determined by the goals of a particular
            investigation.

column 3    INFORMATION flag.  An alphanumeric other than a period (.) 
            in the THIRD column provides additional information 
            about the collection or analysis of the sample.

            WARNING: A "P" in the 3rd column of the QC flag indicates
            the measurement result is preliminary and has not yet been 
            carefully examined by the PI.  The "P" flag is removed once 
            the quality of the measurement has been assessed.

When samples are collected in pairs, the pair
difference is calculated, and samples with a
pair difference greater than 5 ppb are rejected.
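
The pair-difference criterion can be expressed as a simple check. This is an
illustrative sketch only: the function name and the sample values are
hypothetical, and the 5 ppb threshold is the one stated above.

```python
def pair_agrees(flask_a_ppb: float, flask_b_ppb: float, max_diff_ppb: float = 5.0) -> bool:
    """Return True if the two flasks of a pair agree within the threshold."""
    return abs(flask_a_ppb - flask_b_ppb) <= max_diff_ppb


# Hypothetical H2 mole fractions in ppb.
print(pair_agrees(530.2, 532.9))  # True: the difference is about 2.7 ppb
print(pair_agrees(530.2, 541.0))  # False: the difference is about 10.8 ppb
```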

In the data files, the 3-column quality control flags indicate retained 
and rejected flask results as follows.

If the first character is not a period, the sample result has been 
rejected. A second column character other than a period indicates 
a sample that is likely valid but does not meet selection criteria
for representativeness, such as midday sampling or 
background air sampling. A third column flag other than a period 
indicates abnormal circumstances that are not thought 
to significantly affect the data quality.

              Flag      Description

Retained      ...       good pair, no other issues
                        (D <= 5 ppb)

Rejected      M..       sample measurement issue

              C..       sample collection issue

              B..       both measurement and collection issues

Selection     .S.       selection issue. High/low mole fraction 
                        thought to not represent background
                        conditions for example.

Informational ..M       informational measurement tag or 
                        potential measurement issue

              ..C       informational collection tag or 
                        potential collection issue
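
A minimal sketch of how a user might decode the 3-character flag in their
own scripts. The names QCFlag, decode_qc_flag and is_retained are
hypothetical. Treating a result as retained when the first two columns are
periods is one possible filter, and users may prefer a different selection
scheme.

```python
from typing import NamedTuple


class QCFlag(NamedTuple):
    rejected: bool     # column 1 is not "."
    selection: bool    # column 2 is not "."
    information: bool  # column 3 is not "."
    preliminary: bool  # column 3 is "P"


def decode_qc_flag(flag: str) -> QCFlag:
    """Interpret a 3-character QC flag such as '...', 'M..' or '.S.'."""
    if len(flag) != 3:
        raise ValueError(f"expected a 3-character flag, got {flag!r}")
    return QCFlag(
        rejected=flag[0] != ".",
        selection=flag[1] != ".",
        information=flag[2] != ".",
        preliminary=flag[2] == "P",
    )


def is_retained(flag: str) -> bool:
    """Assumed filter: keep results with no rejection or selection flag."""
    decoded = decode_qc_flag(flag)
    return not decoded.rejected and not decoded.selection


for example in ("...", "M..", ".S.", "..C"):
    print(example, is_retained(example))
```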

The retained values comprise the data set that we feel best
represents the H2 distribution in the remote, well-mixed 
global troposphere.  It is possible, and even likely, that 
some values flagged as non-background conditions 
(with a 2nd column flag other than '.') are valid 
measurements, but represent poorly mixed air parcels 
influenced by local anthropogenic sources or strong local 
biospheric sources or sinks.  Users of these data should be 
aware that data selection is a difficult but necessary aspect 
of the analysis and interpretation of atmospheric trace gas 
data sets, and the specific data selection scheme used may be 
determined by the goals of a particular investigation.


A single-character code is used to identify the sample collection method.
The codes are:

    P - Sample collected using a portable, battery
        powered pumping unit.  Two flasks are
        connected in series, flushed with air, and then
        pressurized to 1.2 - 1.5 times ambient pressure.
    D - Similar to P but the air passes through a
        condenser cooled to about 5 deg C to partially
        dry the sample.
    G - Similar to D but with a gold-plated condenser.
    T - Evacuated flask filled by opening an O-ring sealed
        stopcock.
    S - Flasks filled at NOAA GML observatories by sampling
        air from the in situ CO2 measurement air intake system.
    N - Before 1981, flasks filled using a hand-held
        aspirator bulb. After 1981, flasks filled using a
        pump different from those used in method P, D, or G.
    F - Five liter evacuated flasks filled by opening a
        ground glass, greased stopcock.


The monthly data files in https://gml.noaa.gov/aftp/data/trace_gases/h2/flask/surface/ 
use the following naming scheme (see Section 7.2):

     [parameter]_[site]_[project]_[lab ID number]_[measurement group]_month.txt

(ex) CH4_pocn30_surface-flask_1_ccgg_month.txt contains CH4 ccgg monthly
     mean values for all surface flask samples collected on the Pacific
     Ocean Cruise sampling platform and grouped at 30N +/- 2.5 degrees.

(ex) CO2_brw_surface-flask_1_ccgg_month.txt contains CO2 ccgg monthly
     mean values for all surface flask samples collected at Barrow, Alaska.

Monthly means are produced for each site by first averaging all
valid measurement results in the event file with a unique sample
date and time.  Values are then extracted at weekly intervals from 
a smooth curve (Thoning et al., 1989) fitted to the averaged data 
and these weekly values are averaged for each month to give the 
monthly means recorded in the files.  Flagged data are excluded from the
curve fitting process.  Some sites are excluded from the monthly
mean directory because sparse data or a short record does not allow a
reasonable curve fit.  Also, if there are 3 or more consecutive months
without data, monthly means are not calculated for these months.
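
The first step of this procedure (averaging valid results that share a
sample date and time) can be sketched as follows. This is not the GML
implementation. It assumes that "valid" means the first two QC flag columns
are periods, and it does not attempt the curve fit of Thoning et al. (1989)
or the weekly extraction. The function name and the sample values are
hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def average_by_sample_time(events):
    """Average values sharing a timestamp.

    events: iterable of (timestamp, value_ppb, qc_flag) tuples.
    Returns a dict mapping each timestamp to the mean of its valid values.
    """
    grouped = defaultdict(list)
    for timestamp, value, flag in events:
        # Assumption: a result is valid when flag columns 1 and 2 are ".".
        if flag[:2] == "..":
            grouped[timestamp].append(value)
    return {t: fmean(values) for t, values in sorted(grouped.items())}


# Hypothetical event records: (ISO timestamp, H2 in ppb, QC flag).
events = [
    ("2020-01-07T20:15:00Z", 500.0, "..."),
    ("2020-01-07T20:15:00Z", 502.0, "..."),
    ("2020-01-14T20:10:00Z", 510.0, "M.."),
    ("2020-01-14T20:10:00Z", 504.0, "..C"),
]
print(average_by_sample_time(events))
```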

The data files contain multiple lines of header information 
followed by one line for each available month.

Fields are defined as follows:

Field 1:    [SITE CODE] The three-character sampling location code (see above).

Field 2:    [YEAR] The sample collection year.

Field 3:    [MONTH] The sample collection month.

Field 4:    [MEAN VALUE] Computed monthly mean value
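
A hedged sketch of reading the four fields from a monthly file. It assumes
that header lines begin with "#" and that fields are separated by
whitespace. Check the header of an actual file before relying on either
assumption. The function name and the sample records are made up for
illustration.

```python
import io


def read_monthly(lines):
    """Return (site, year, month, mean_value) tuples from a monthly file."""
    records = []
    for line in lines:
        line = line.strip()
        # Assumption: header lines start with "#"; skip them and blank lines.
        if not line or line.startswith("#"):
            continue
        site, year, month, value = line.split()[:4]
        records.append((site, int(year), int(month), float(value)))
    return records


# Hypothetical file content, not real data.
sample = io.StringIO(
    "# header line\n"
    "# site year month value\n"
    "MLO 2020 1 540.12\n"
    "MLO 2020 2 542.50\n"
)
records = read_monthly(sample)
print(records)
```

Because the function accepts any iterable of lines, an open file object can
be passed in place of the in-memory sample.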


All (ASCII text and netCDF) files are located in 

To transfer all files in a directory, it is more efficient to 
download the tar or zipped files. Individual or zipped files can
be downloaded using your web browser by clicking the hyperlinked file,
or by right-clicking the hyperlink and choosing 'save as' (or similar)
from the browser menu.

Files can also be accessed by anonymous ftp at aftp.cmdl.noaa.gov. 


Jordan, A. and B. Steinberg, Calibration of atmospheric hydrogen 
  measurements, Atmos. Meas. Tech., 4, 509–521, 2011,

Novelli, P.C., A.M. Crotwell, and B.D. Hall, Application 
  of Gas Chromatography with a Pulsed Discharge Helium Ionization 
  Detector for Measurements of Molecular Hydrogen in the Atmosphere, 
  Environ. Sci. Technol. 2009, 43, 7, 2431–2436,