GSWP-2 Data FAQ
Last update: 31 August 2004
We will update this FAQ as users submit queries and our experts find answers.
Clarifications, corrections, etc:
Time stamps of ISCCP radiation data (31 August 2004)
The 3-hourly surface shortwave and
longwave radiation data from ISCCP have different time characteristics
than the other radiation data (SRB, NCEP, and ECMWF). NCEP and
ECMWF represent a 3-hour mean, and the SRB data were originally
instantaneous, but interpolated to a 3-hour mean for consistency with
the reanalysis products. On the GDS, a period that
represents the mean from 0000UTC to 0300UTC is labeled as 0000UTC, but
the NetCDF file metadata label each record with the end time (e.g., 0000UTC to 0300UTC is labeled as 0300UTC).
The ISCCP data have a time stamp in the NetCDF files and on the GDS that
represents the middle of the 3-hour period, not the start or end as with the
other radiation data sets. Thus, an ISCCP record labeled as 0300UTC
represents the period 0130UTC-0430UTC. Strictly speaking,
it is not a time average, but rather the center of a 3-hour sampling
window. However, for our purposes, it can be treated as an average.
If the interpolation subroutine drv_finterp is used, the time flags should be chosen as follows:
Access over GDS: "C" or "c" for ISCCP data; "N" or "n" for SRB, ERA and NCEP.
Navigating NetCDF files using their metadata: "C" or "c" for ISCCP data; "L" or "l" for SRB, ERA and NCEP.
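As a rough illustration of the three labeling conventions described above, the following Python sketch maps a record's time stamp to the 3-hour window it covers. The function name window_for_label and the convention names "start", "center" and "end" are hypothetical, chosen only for this example; they are not the flags accepted by drv_finterp.

```python
from datetime import datetime, timedelta

STEP = timedelta(hours=3)  # length of one forcing interval


def window_for_label(label, convention):
    """Return (begin, end) of the 3-hour window that a time stamp refers to.

    convention (names are assumptions made for this sketch):
      "start"  - label marks the beginning of the window (GDS; SRB, ERA, NCEP)
      "center" - label marks the middle of the window (ISCCP)
      "end"    - label marks the end of the window (NetCDF metadata; SRB, ERA, NCEP)
    """
    if convention == "start":
        begin = label
    elif convention == "center":
        begin = label - STEP / 2
    elif convention == "end":
        begin = label - STEP
    else:
        raise ValueError(f"unknown convention: {convention!r}")
    return begin, begin + STEP


# The same label, 0300UTC, read under each convention.
label = datetime(1982, 7, 1, 3, 0)
for conv in ("start", "center", "end"):
    begin, end = window_for_label(label, conv)
    print(f"{conv:6s} {begin:%H%M}UTC-{end:%H%M}UTC")
```

Under the "center" convention the 0300UTC label yields the 0130UTC-0430UTC window quoted above for ISCCP.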
High near-surface air temperatures over Greenland (13 November 2003)
Kenji Tanaka has discovered some unusual behavior in temperatures over some locations in Greenland (PDF of sample plots here).
The errors exist at least in 1983 and 1985. We have not yet
determined the source of the error, but the data will not be changed
again. If these errors cause problems in your model, please
screen for unrealistic temperature swings in your model (e.g., a
maximum gradient check between surface and air temperature).
Similar screens may be necessary for specific humidity at these
locations as well.
Kenji has kindly provided some Fortran code
to adjust the extreme temperatures at those gridpoints where there are obvious problems. You will also need the grid mask
for large temperature fluctuations. A list of filenames, read by the program, is given here.
This code assumes you have the files on local disk, with one year's temperature data per file.
Units error for slope data (22 August 2003)
Values for average land slope (Slope) were reported in the wrong units.
The original ISLSCP-2 slope data were recorded as an angle in degrees,
and not as a percentage grade as we had believed. So our conversion
to fractional grade (tan(slope angle) or "rise over run") was incorrect,
leading to values that were too small by nearly a factor of two. This
data set should be corrected in the next few days.
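The size of the error can be checked with a short Python sketch. It assumes that the mistaken conversion simply divided the stored value by 100 (treating it as a percentage grade); that detail is an assumption, as are the function names and the sample value.

```python
import math


def grade_from_degrees(slope_deg):
    """Fractional grade (rise over run) for a slope angle given in degrees."""
    return math.tan(math.radians(slope_deg))


def grade_from_percent(slope_pct):
    """Fractional grade for a slope given as a percentage grade."""
    return slope_pct / 100.0


slope = 5.0  # hypothetical value read from the slope file
correct = grade_from_degrees(slope)    # value interpreted as degrees
mistaken = grade_from_percent(slope)   # same value interpreted as percent
print(f"correct={correct:.4f} mistaken={mistaken:.4f} ratio={correct / mistaken:.2f}")
```

For small angles the ratio approaches 100*pi/180, about 1.75, which is consistent with values that were too small by nearly a factor of two.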
Zero values for some vegetation/aerodynamic fields (22 August 2003)
This message is in regard to a problem found initially with the roughness
length data (originally from the ISLSCP-2 data provided by Sietse Los) by
Helin Wei at NCEP/EMC. Roughness length values are 0 over land ice
points, as well as over some severe desert points. Because of the algorithm
used by Los to calculate roughness length (based on vegetation coverage),
points that register no vegetation in his data set were assigned no value
for aerodynamic parameters. Some of the desert points may have intermittent
zero values (i.e., not all months have positive values at the same point).
The list below summarizes what Maggie Zhao has found from scanning
the data sets:
1) Fapar: oceans=0 and ice=0.001
2) Green LAI: oceans=-0.1 and ice=0
3) Total LAI: oceans=-0.1 and ice=0.01
4) Roughness length (z0): oceans=-0.1 and ice=0
5) zero plane displacement (d): oceans=-0.1 and ice=0
6) Vcover: oceans=-0.1 and ice=0 (ONLY ONE FILE)
The best solution is to set a minimum acceptable value in your model code,
consistent with your LSS. This, we feel, is better than our arbitrarily
changing values in the ISLSCP-2 data set.
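A minimal Python sketch of such a check follows. The minimum values, the variable names and the sample data are all made up for illustration; choose values consistent with your own LSS.

```python
# Hypothetical minimum acceptable values; not recommendations.
MINIMA = {"z0": 0.001, "d": 0.0, "lai": 0.01}


def apply_floor(values, minimum, is_land):
    """Raise land values that fall below `minimum`; leave non-land points untouched."""
    return [max(v, minimum) if land else v for v, land in zip(values, is_land)]


# Made-up roughness lengths: ocean, land ice, forest, desert.
z0 = [-0.1, 0.0, 0.35, 0.0]
land = [False, True, True, True]
print(apply_floor(z0, MINIMA["z0"], land))
```

Ocean points keep their flag value (-0.1 here), so the land-sea distinction in the original files is not lost.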
Extreme values in some CSU albedo fields (18 August 2003)
This message is in regard to a problem found initially by Helin Wei at NCEP/EMC.
Some zero values appeared over land, due to a problem in the approach
to land-sea masking that was applied to the monthly dataset. This has
been fixed and rerun - new files have been posted to DODS and FTP servers
for the file:
These updates were made on or after 18 August.
Also, there appear to be some unrealistically high or low values of snow-free
surface albedo in the CSU surface albedo field (monthly-varying total albedo).
These appear to be in the original data - the high values are consistent
with inappropriate screening for snow in some areas. We have no theory
about the low values. Anyhow, as with the data sets described above,
please apply checks for minimum/maximum acceptable values in your model code,
consistent with your LSS.
Very low values for SWdown (1 August 2003)
This message is in regard to the problem found with the shortwave radiation
control forcing data (SWdown_srb variable). In the model running process,
Chris Milly's group has uncovered some unusual behavior in this variable.
For example, at the gridpoint (12.5E, 14.5S - coast of Africa) in March
and May 1987, daily-average SWdown_srb is found to take a value less than
10 W/m2 for most of the time during these months, where no such behavior
appears at adjacent gridpoints. Such unusual behavior might crash some models.
We have checked the files. We think this problem might
be due to missing values in the original SRB data, since we set the values of all
missing points to 10 W/m^2 for high-latitude bands in the data processing.
It is also possible that the land-sea mask for the SRB data is not static in time (we assumed it was),
since this problem is occurring at a coastal point. We will look
through the shortwave radiation data for similar problems. If anyone
else encounters similar problems with the radiation data, please let us
know, and we will attempt a satisfactory solution. If this turns
out to be an isolated incident, we may just suggest a
work-around. In the meantime, does anyone have a clever idea for how to screen the SWdown files for similar glitches?
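One possible screen, offered only as a sketch and not as a tested solution, is to flag a grid point whose daily-mean SWdown is at or below about 10 W/m2 on most days of a month while none of its neighbors behave the same way. The neighbor comparison is meant to avoid flagging genuine darkness such as polar night. The function names, the 10 W/m2 threshold and the 50% fraction are assumptions.

```python
def low_day_fraction(daily_means, threshold=10.0):
    """Fraction of days whose daily-mean SWdown (W/m2) is at or below `threshold`."""
    return sum(1 for v in daily_means if v <= threshold) / len(daily_means)


def is_suspect(point_series, neighbor_series_list, threshold=10.0, frac=0.5):
    """Flag a point that is dark on most days while none of its neighbors are."""
    if low_day_fraction(point_series, threshold) <= frac:
        return False
    return all(low_day_fraction(n, threshold) <= frac for n in neighbor_series_list)


# Made-up daily means for one 31-day month at a point and two adjacent points.
point = [8.0] * 20 + [250.0] * 11
neighbors = [[240.0] * 31, [260.0] * 31]
print(is_suspect(point, neighbors))
```

The daily means would first have to be computed from the 3-hourly SWdown_srb records for each month.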
ALMA names and units of soils output data (30 July 2003)
Table 11 has been clarified and changed
to distinguish the output soil properties of wilting point, field capacity
and saturation, in units of water depth (m), from the optional input
parameters for these fields, which are volumetric. The variable names have
been changed from W_* to M_* to distinguish them.
Differences in timestamps for 3-hourly data:
- Problem: The 3-hourly NetCDF forcing files
will have slightly different time stamps depending on whether you access
the files online from the GDS using DODS libraries, or download and access
the NetCDF metadata locally. The local NetCDF metadata will show the
first time step as 0300UTC 1 July 1982,
the valid instantaneous time for the state variables (temperature, humidity,
pressure, winds) while the flux fields (radiation and precipitation) contain
the mean flux rate over the previous 3 hours (00-03 UTC). However,
on the GDS, the first time step of data will be labeled as 00UTC 1 July 1982.
This means the timestamp reported by the GDS is incorrect for the instantaneous
state variables (it is 3 hours too early); for the flux fields it represents the mean flux rate over the following 3 hours (00-03 UTC).
- Reason: Because of the way GrADS handles
templating (files with date information in the file name, such as a monthly
file with 3-hourly time step data), it expects all data in the file for a
given month to have a date in that month. Because the last time
step in these monthly data files is actually 0000UTC on the first day of
the following month, the templating fails. The solution was to override
the NetCDF metadata and enforce a 0000UTC start time for each monthly file.
- Solution: Do not use the date/time information
provided by the GDS, but rather the integer timestep (note the timestep axis,
called "time" in the NetCDF metadata, is identical and correct in both cases).
This will allow you to navigate the time axis without shifting your
forcing data by 3 hours.
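The Python sketch below shows one way to derive the valid time from the integer timestep rather than from the GDS date label. It assumes a 0-based record index within each monthly file; the function names are hypothetical.

```python
from datetime import datetime, timedelta

STEP = timedelta(hours=3)


def valid_time(month_start, step_index):
    """Valid time of record `step_index` (0-based, an assumption) in a monthly
    3-hourly file, following the NetCDF metadata: the first record is valid
    3 hours after 0000UTC on the first day of the month."""
    return month_start + (step_index + 1) * STEP


def flux_window(month_start, step_index):
    """Averaging window of the flux fields (radiation, precipitation) for a record."""
    end = valid_time(month_start, step_index)
    return end - STEP, end


july = datetime(1982, 7, 1)
print(valid_time(july, 0))           # first record of the July 1982 file
print(valid_time(july, 31 * 8 - 1))  # last record of the same file
print(flux_window(july, 0))
```

The last record of a 31-day month falls at 0000UTC on the first day of the next month, which is the case that trips up the GrADS templating.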
Other minor glitches that have been repaired:
- vegfrac_uk was unreadable from GDS during the last few weeks - a glitch in the GDS directory has been fixed (16 July 2003).
- soildepth had incorrect units: decimeters instead of meters. This data set was rescaled to meters on 10 July 2003, so please check runs made before then for errors related to this units problem (10 July 2003).
Averaging of output data
Clarifications have been made on the web documentation regarding the treatment
of state variables (averaged for daily data, but not for the 3-hourly data).
All fluxes are averaged rates.
Choice of land surface parameters
There have been questions about which parameter fields to use when more than
one option is available (e.g., prescribing wilting point). The answer:
use your best judgement, whatever is most consistent between your model and
the experiment. Please document any choices you make and submit that
information with your results following the guidelines for ancillary information (9 July 2003).
Proper use of DODS/GDS (P. Dirmeyer, 12 May 2003)
There continues to be some confusion about the proper application of the DODS technology. Hopefully this will clarify some misconceptions:
Downloading the GSWP-2 data set from the DODS server to your local system is generally a misuse of the DODS server. The DODS data server is like a library in the purest sense of the term. When you check out a book from the library, do you photocopy the entire book, return it, and then read your copy? It is difficult to overcome the habit of working only with data that you actually have on your own disk. The capability to access and use gigabytes of binary data over the Internet has not existed before. It does not make sense for terabytes of disk across many institutes to be redundantly occupied by the same data that are freely available over the net.
Is your model going to run slower using the NetCDF/DODS libraries and the DODS server than running on local disk? Yes - anywhere from 15% to perhaps 100% or more depending on your connectivity. But it would take many days just to download the data anyway. Currently the GSWP-2 forcing data are 56GB in NetCDF CF format (1° lat-lon land only 60°S-90°N), and will be over 100GB once the ERA-40 meteorological forcing are processed.
How do you compensate for this slowdown? The best way is to thoroughly test your code on a subset of the data. You can run through the entire simulation (5-10 years' spin-up on the first year of data, 2.5 years simulation to the start of the period of record, and then the 10-year simulation) on a small set of grid points, or run your global domain for just a month or two. The former approach may be a better test of your model - the DODS servers can easily subset and serve specific points and times to your program. Once you are satisfied that the code is working properly, then you can conduct a full simulation.
Only if you are seriously restricted in your Internet connectivity should you consider storing the data sets on your local disk; see the NetCDF entry below for a means of downloading data off the server, or request us to send you DDS tapes of the data.
This is a new data distribution technology, and GSWP-2 is one of the first experiments of this type (distributed modeling; centralized data) to be attempted. There is a learning curve, and some time to invest up front to get the DODS libraries functioning properly. Once you have your DODS client(s) (Fortran, C, IDL, Matlab, GrADS, Ferret,...) working, a whole new data universe will be open to you. For example, NCEP is making their operational forecasts available on GDS. The web page http://www.iges.org/grads/gds/index.html lists a few of the GDS servers that are available - in the US there are also GDSs at GFDL, GSFC and NCAR. Unidata maintains a broader list of DODS data servers, and the Global Change Master Directory is now available via DODS.
What does GDS provide that DODS doesn't? (P. Dirmeyer, 5 May 2003)
There are 3 main advantages to the GDS over a generic DODS server:
Server-side analysis. This is not particularly useful for modelers who are just accessing forcing data, but it could be very advantageous for analysis and comparison of the model results. See the section titled "Evaluate expressions on the server side when appropriate" at: http://www.iges.org/grads/gds/doc/user.html for more information on how to have the GDS do your number crunching and send back only the final results to you over the Internet.
Templating of file names in GrADS. GrADS data descriptor files allow for defining parts of file names as having meanings, such as time stamps, that allow GrADS and the GDS to access a large number of files (e.g., a time series of data with each time in a different file) with a single "open" statement. This was developed to access GCM output easily (e.g., forecasts from NCEP or ECMWF). But it is also useful here because ISLSCP-2 (from which the GSWP-2 forcing data are derived) insisted on having separate files for each time step.
Dual access to compressed ALMA data sets. This is a new feature developed just for GSWP. ALMA data sets in the NetCDF CF "compressed by gathering" format, where all water points are squeezed out, can be accessed as either the native land-only vectors or complete repopulated grids. This is especially cool, and requires the 1.9 beta version of GrADS on the server side, but nothing special for the client/user. The different access to the same data is accomplished by having two different data descriptor files (.ctl files) for the same data set. One describes it as a vector (e.g., for model access), and the other describes it as a compressed grid and gives the "pdef" to uncompress it "on-the-fly". A special binary mapping table has been created for the ISLSCP2/GSWP2 grid (60S-90N), and can be used to view gridded versions of your ALMA vector model output with GrADS.
FTP access to GSWP-2 data. (J. Adams, P. Dirmeyer, 20 June 2003)
Data are now being made available by FTP for those who cannot access any
of the servers above because of firewall, compiler, or other issues. Access
to the FTP server is tightly controlled, and should be used as a last resort only if you have tried and cannot access the data sets interactively from your programs (see the page on GSWP-2 and DODS for guidelines on using the DODS-enabled libraries for your favorite programming language or applications). There are over 55GB of forcing data (including only the baseline and NCEP-based atmospheric forcing datasets), so be prepared for the volume if you must take this route.
To access the data by FTP, you must contact Jennifer Adams
and give her the IP address of the machine you will be using for your FTP
session(s). You will also need to request a password from her. The password,
once activated for you, will expire after seven (7) days. If you cannot
complete your downloads in that time, contact Jennifer to have it reset.
The FTP site is:
login as user gswp2data
There you will find a list of the directories that
contain the set of netcdf files. There is one file for each month for
the 3-hourly data. For the monthly data, the entire timeseries is
contained in one netcdf file. Note that these are the so-called "vector" or "compressed by gathering" files that contain only data for the land grid points, and not fully-populated grid files.
Also, please let Jennifer know when you're finished.
Building the DODS package (J. Wielgosz, 7 April 2003)
You will need three source tarballs:
Untar these all in the same directory; then
cd DODS; ./configure; make World
and it will start building everything. Expect to spend some time helping it along if you are building on Alpha; various scripts and source files will probably need tweaking.
Testing your access to DODS data servers (Z. Guo, 30 April 2003)
We have supplied a simple script and FORTRAN code to test your access to the North American GSWP-2 GDS (with minor modification it can be used to test access to other GDSs as well). The Unix script and source code are at:
The script will delete .dods_cache and build the executable file. Before you run the test
program, you need to change the .dods_cache directory and the path to the DODS libraries for your case, and type:
to run the test program. The program only reads 9 variables for one month of data. So anybody who thinks they are having problems with server stability can use test_monsoon.f90 to test access to the DODS server data from their local computer. The source code can be changed to point to the European or Japanese mirror GDSs (URLs not available at the time of this posting).
Diagnostics to help debug DODS problems in your code (J. Adams, 14 April 2003; P. Dirmeyer, 12 May 2003)
If you are having problems accessing the data on the GDS, it would be most helpful to note the following information when reporting your problem to the support personnel:
Exact time of the problem and the name (URL or IP address) of the machine you are running on (so we can check the GDS access logs)
The platform, operating system and utility/computing language you are using.
The error message as it appeared.
The return codes from the calls to the NetCDF/DODS subroutines (if you are running a program you compiled from source code that uses DODS/NetCDF libraries, or any other DODS-enabled library package).
The file and record you were trying to access (if you can determine it)
Is the error reproducible or intermittent? If your program crashes at a different place each time (after removing the .dods_cache directory before each run), that may point to a different cause.
Periodically deleting the DODS cache to improve stability (Z. Guo, 29 April 2003)
Using the DODS client library generates entries in your home directory: .dodsrc is a configuration file, and .dods_cache is a cache directory for DODS. Both are hidden files. They can be found in your home directory by typing:
The file .dodsrc should not be deleted, while .dods_cache can be deleted before or after your code runs, but not in the middle of a data transfer. The .dods_cache directory is meant to store information on your data access history
for fast re-access of DODS data you have accessed before. It caches data without your knowing about it. This facility can be turned off by setting
in .dodsrc (see: http://www.unidata.ucar.edu/packages/dods/user/guild-html/guide_71.html). However, that might result in errors when you access a large amount of DODS-served data. On the other hand, if you leave it alone and too many directories are created in .dods_cache, your local cache can become corrupted and your program will end in a "segmentation fault". So the best approach is to maintain a reasonable amount of information in .dods_cache, and to delete it before or after your program execution.
If your local DODS cache grows very large or is used for multiple sessions, you may experience program crashes. It is advisable to delete the .dods_cache directory periodically. For example, if you are running your land surface model globally for one month at a time (restarting the model each month from a restart file), you should delete the cache between each month's run. Do not delete the cache while the model (or any other program using the DODS libraries) is still active and has remote files open.
Changing the maximum number of DODS files that you can open (J. Wielgosz, 7 April 2003)
The limit of 32 files open at once is part of the DODS library. To change this limit you need to edit DODS/src/nc3-dods*/lnetcdf/netcdf.h line 830:
#define MAX_NC_OPEN 32
in the DODS distribution, then recompile libdap++.a and libnc-dods.a.
NetCDF utilities that are DODS-compatible (J. Wielgosz, 6 May 2003)
There exist "NetCDF Operators" (http://nco.sourceforge.net) that may be useful for some DODS data access situations. They are a set of C utilities that do simple things to netCDF data, like splitting it, averaging it, etc. In particular, ncks ("netcdf kitchen sink") copies data from one netCDF file to another, or dumps it as IEEE binary or ASCII (like ncdump). So one can generate a local copy of data on a DODS server as simply as this:
ncks http://dods-url localfile.nc
There are a bunch of command line switches that allow control of subsetting - individual variables, and dimension constraints within the variables - as well as other details. For example:
ncks -d time,0,9,2 -d lev,6,10 -v t http://cola8.iges.org:9191/dods/eta/eta2003050612i eta.nc
retrieves the variable named "t" for vertical levels 6-10 (whatever those correspond to), at 12-hour intervals, from today's ETA output, and writes it to eta.nc.
Of course, many people who *think* they want to generate local files from DODS, actually just misunderstand the system. But for those who really do need local files for something, and aren't/don't want to be GrADS users, it might be worth a try.
GrADS and the CF convention of NetCDF (ALMA) (P. Dirmeyer, 12 May 2003)
If you are a GrADS user, soon you will be able to use GrADS to directly view gridded versions of your ALMA vector model output. Hopefully this new version of GrADS (with many other new features) will be ready this summer.
Conflicts with more than one NetCDF library (Z. Guo, 23 May 2003)
Using NetCDF libraries other than the DODS-NetCDF libraries may cause problems such as unrecognized routines if mixed with the DODS-enabled code. To ensure compatibility with your site, you may need to create the DODS-NetCDF library yourself from the tar-files on the FAQ page, rather than rely on a library compiled elsewhere.
NetCDF routines appear undefined (Z. Guo, 27 May 2003)
Library compilation problems may lead to the following problems.
First: routines like nc__create_mp, nc_delete_mp, nc__open_mp, nc_delete are undefined. Downloading and compiling the source code from the tar-files listed on the FAQ site should avoid this problem (see conflict note above).
Second: nf_open__, nf_enddef__, nf_close__, etc. are undefined. You will find that those references have an extra underscore added on the end. If that's the case, you need to turn on the '-Df2cFortran' option (change DEFS) in the Makefile (or Makefile.in), and re-build the DODS netcdf library. This should make it work.
Using the F90 NetCDF routines with DODS (Z. Guo, 7 June 2003)
I believe the currently available libraries from UCAR Unidata are not linked with the Fortran 90 NetCDF interface, so you can only link F77 Fortran code with those libraries. If you do want to use F90 NetCDF subroutine calls to access DODS data, you have to link the NetCDF F90 interface with the DODS libraries to get the DODS/NetCDF libraries, and use that set of libraries to link with your F90 Fortran code. Two messages at
might be useful.
The NetCDF User's Guide for Fortran 90 is online at http://www.unidata.ucar.edu/packages/netcdf/f90/Documentation/guide.book.pdf
All the interfaces for netcdf/f90 functions can be found there.
Sample Code to read and write NetCDF in a Fortran model driver.
IOT Trap error (Z. Guo, 29 April 2003)
The "IOT trap" message is the text some Unix systems print for the SIGIOT signal (on most systems the same signal as SIGABRT), which is raised when a program aborts. It is not specific to DODS, NetCDF, or Fortran, and is usually related to thread or compiler issues. In our case it seems to be raised while DODS is organizing the .dods_cache: under certain conditions (the .dods_cache has grown too large to manage, or some other reason, I guess), that operation can fail. This problem is hard to trace or solve. In my experience, reducing the number of data access transactions and keeping the number of .dods_cache directories to no more than 400 is a practical solution for the "IOT trap" problem. Note that subdirectories are created automatically in the .dods_cache directory. One directory is created per data retrieval transaction, so retrieving one month of GSWP forcing data for nine variables will create ~30*9 directories if one day's data for one variable is retrieved per transaction.
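To keep an eye on the cache size, a small Python sketch like the following could be run between model runs. It only counts directories and deletes nothing. The function names are hypothetical, and the cache is assumed to sit at ~/.dods_cache as described above.

```python
import os

LIMIT = 400  # practical ceiling on cache directories quoted above


def count_cache_dirs(cache_path):
    """Number of subdirectories directly under the DODS cache directory."""
    if not os.path.isdir(cache_path):
        return 0
    return sum(1 for entry in os.scandir(cache_path) if entry.is_dir())


def transactions_per_month(days, variables, per_day=1):
    """Directories created if each transaction fetches one day of one variable."""
    return days * variables * per_day


cache = os.path.expanduser("~/.dods_cache")  # assumed default location
print(count_cache_dirs(cache), "directories in cache; limit", LIMIT)
print(transactions_per_month(30, 9), "directories expected for one month, nine variables")
```

If the count approaches the limit, remove the cache directory once no program using the DODS libraries is still running.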
Contacts for GSWP-2 data, DODS, and GDS issues: