ERDDAP: Easier access to scientific data
Brought to you by NOAA NMFS SWFSC ERD
Working with the datasets.xml File
[This web page will only be of interest to ERDDAP administrators.]
After you have followed the ERDDAP installation instructions,
you must edit the datasets.xml file in <tomcat>/content/erddap/
to describe the datasets that your ERDDAP installation will serve.
Some Assembly Required -
Setting up a dataset in ERDDAP isn't just a matter of pointing to the dataset's directory or URL.
You have to write a chunk of XML for datasets.xml which describes the dataset.
- For gridded datasets, in order to make the dataset conform to ERDDAP's data structure for gridded data,
you have to identify a subset of the dataset's variables which share the same dimensions.
(Why? How?)
- The dataset's current metadata is imported automatically.
But if you want to modify that metadata or add other metadata, you have to specify it in datasets.xml.
And ERDDAP needs other metadata, including global attributes
(such as infoUrl, institution, sourceUrl, summary, and title)
and variable attributes (such as long_name and units).
Like the metadata that is already in the dataset, the metadata requested by ERDDAP adds descriptive information to the dataset.
This additional metadata is a good addition to your dataset
and helps ERDDAP do a better job of presenting your data to users who aren't familiar with it.
- ERDDAP needs you to do special things with the longitude, latitude, altitude, and time variables.
If you buy into these ideas and expend the effort to create the XML for datasets.xml,
you get all the advantages of ERDDAP, including:
- Full text search for datasets
- Search for datasets by category
- Data Access Forms so you can request a subset of data in lots of different file formats
- Forms to request graphs and maps
- Web Map Service (WMS) for gridded datasets
- RESTful access to your data
Making the datasets.xml entries takes considerable effort for the first few datasets, but it gets easier.
After the first dataset, you can often re-use a lot of your work for the next dataset.
Fortunately, there are two Tools to help you create the XML for each dataset in datasets.xml.
Tools -
There are two command line programs which are tools to help you create the XML
for each dataset that you want your ERDDAP to serve.
Once you have ERDDAP installed in Tomcat and Tomcat has unpacked the erddap.war file,
you can find these programs in the <tomcat>/webapps/erddap/WEB-INF directory.
There are Linux/Unix shell scripts (the program name, with no extension) and
Windows .bat files for each program.
When you run each program, it will ask you questions.
For each question, type a response, then press Enter. (How quaint! It's just like the old days.)
Or press ^C to exit a program at any time.
The two tools are a big help, but you still must read all of the instructions on this page carefully
and make important decisions yourself.
- GenerateDatasetsXml
is a command line program that can generate a rough draft
of the dataset XML for almost any type of dataset.
When you use the GenerateDatasetsXml program:
- GenerateDatasetsXml asks you a series of questions so that it can access the dataset's source.
- If you answer the questions correctly, GenerateDatasetsXml will connect to the dataset's source
and gather basic information (e.g., variable names).
- GenerateDatasetsXml will generate a rough draft of the dataset XML for that dataset.
- You can then copy that XML and paste it into your datasets.xml file and start to edit it.
- You can then use DasDds (see below) to repeatedly test the XML for that dataset.
Often, one of your answers won't be what GenerateDatasetsXml needs.
You can then try again, with revised answers to the questions,
until GenerateDatasetsXml can successfully connect to the dataset.
If you use "GenerateDatasetsXml -verbose", it may print more diagnostic messages than usual.
- DasDds
is a command line program that you can use
after you have created a first attempt at the XML for a new dataset in datasets.xml.
With DasDds, you can repeatedly test and refine the XML.
When you use the DasDds program:
- DasDds asks you for the datasetID for the dataset you are working on.
- DasDds tries to create the dataset with that datasetID.
It always prints lots of diagnostic messages.
If it fails (for whatever reason), it will show you the error message.
Read the diagnostic messages and the error message carefully.
Then you can make a change to the XML and let DasDds try to create the dataset again.
- If DasDds can create the dataset, DasDds will then show you the .das and .dds for the dataset.
Often, you will want to make some small change to the dataset's XML to clean up the dataset's metadata.
By going through this cycle repeatedly, you will eventually revise the dataset's XML
so that the dataset can be created and so that the dataset's metadata is as you want it to be.
If you use "DasDds -verbose", it will print more diagnostic messages than usual.
The basic structure of the datasets.xml file is:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<erddapDatasets>
<dataset>...</dataset> <!-- 1 or more -->
<user username="..." password="..." roles="..." /> <!-- 0 or more -->
<requestBlacklist>...</requestBlacklist> <!-- 0 or 1 -->
<subscriptionEmailBlacklist>...</subscriptionEmailBlacklist> <!-- 0 or 1 -->
</erddapDatasets>
It is possible that other encodings will be allowed in the future,
but for now, only ISO-8859-1 is recommended.
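Because XML doesn't tolerate syntax errors, it can help to check the file's overall structure before restarting ERDDAP. The following is a minimal Python sketch, not a tool that comes with ERDDAP: `check_datasets_xml` is a hypothetical helper that uses the standard library's ElementTree parser to check well-formedness and the top-level element counts given in the comments above. It does not check the contents of each <dataset>.

```python
import xml.etree.ElementTree as ET


def check_datasets_xml(data: bytes) -> list[str]:
    """Return problems found in a datasets.xml document (an empty list means none found).

    Only well-formedness and the top-level element counts are checked.
    """
    try:
        root = ET.fromstring(data)  # bytes, so the encoding in the XML declaration is honored
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]

    problems = []
    if root.tag != "erddapDatasets":
        problems.append(f"root element is <{root.tag}>, expected <erddapDatasets>")

    # Count the root's direct child elements by tag name (the default parser drops comments).
    counts: dict[str, int] = {}
    for child in root:
        counts[child.tag] = counts.get(child.tag, 0) + 1

    if counts.get("dataset", 0) < 1:
        problems.append("at least one <dataset> is required")
    for tag in ("requestBlacklist", "subscriptionEmailBlacklist"):
        if counts.get(tag, 0) > 1:
            problems.append(f"at most one <{tag}> is allowed")
    return problems


# Example (the path is illustrative):
# with open("datasets.xml", "rb") as f:
#     print(check_datasets_xml(f.read()))
```

Reading the file as bytes ("rb") lets the parser use the ISO-8859-1 encoding declared on the file's first line.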
Several types of datasets are supported.
They fall into two categories. (Why?)
- EDDGrid datasets handle gridded data.
- In EDDGrid datasets, data variables are multi-dimensional arrays of data.
- There MUST be an axis variable for each dimension.
Axis variables MUST be specified in the order that the data variables use them.
- In EDDGrid datasets, all data variables MUST use (share) all of the axis variables.
(Why? What if they don't?)
- See the more complete description of the
EDDGrid data model.
- The EDDGrid dataset types are described under Skeleton XML for each Type of Dataset, below.
- EDDTable datasets handle tabular data.
- Tabular data can be represented as a table with rows and columns.
Each column (a data variable) has a name, a set of attributes, and stores just one type of data.
- See the more complete description of the
EDDTable data model.
- The EDDTable dataset types are described under Skeleton XML for each Type of Dataset, below.
See also the Tag and Attribute Descriptions.
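The EDDGrid rule that every data variable must use all of the axis variables, in the same order, can be checked before writing any XML. This minimal Python sketch uses a hypothetical helper (not part of ERDDAP or GenerateDatasetsXml) that groups a source's variables by their ordered dimensions; each resulting group is a candidate for its own EDDGrid dataset, as discussed in the note "What if the grid variables in the source dataset DON'T share the same axis variables?" below. The variable names are made up; in practice the dimension lists would come from the source's metadata.

```python
from collections import defaultdict


def group_by_dimensions(variables: dict[str, tuple[str, ...]]) -> dict[tuple[str, ...], list[str]]:
    """Group variable names by their exact, ordered tuple of dimension names.

    All variables in one group share all of the same axes in the same order,
    which is what a single EDDGrid dataset requires.
    """
    groups: dict[tuple[str, ...], list[str]] = defaultdict(list)
    for name, dims in variables.items():
        groups[tuple(dims)].append(name)
    return dict(groups)


# Hypothetical source variables and their dimensions:
source = {
    "sst": ("time", "latitude", "longitude"),
    "wind_speed": ("time", "latitude", "longitude"),
    "water_temp": ("time", "altitude", "latitude", "longitude"),
}
for dims, names in group_by_dimensions(source).items():
    print(dims, names)
```

Here the sketch finds two groups, so this source would become two ERDDAP datasets: one for sst and wind_speed, and one for water_temp.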
Working with the datasets.xml file is a non-trivial project.
Please read this entire web page carefully, especially these notes.
- Hint - It is often easier to generate the XML for a dataset by making a
copy of a working dataset description in datasets.xml and then modifying it.
- Encoding Special Characters -
Since datasets.xml is an XML file, you need to encode "&", "<", and ">"
in any content as "&amp;", "&lt;", and "&gt;".
Wrong:
<title>Time & Tides</title>
Right:
<title>Time &amp; Tides</title>
- XML doesn't tolerate syntax errors.
After you edit the datasets.xml file, it is a good idea to verify that the result
is well-formed XML by pasting the XML text into an XML checker like
RUWF.
- Other Ways To Help Diagnose Problems With Datasets
In addition to the two main Tools, you can use:
- log.txt
is a log file with all of ERDDAP's diagnostic messages.
- The Status Page
is a quick way to check ERDDAP's status from any web browser.
It includes a list of datasets that didn't load (although not the related exceptions)
and taskThread statistics (showing the progress of
EDDGridCopy and
EDDTableCopy datasets).
- The Daily Report
has more information than the status page, including a list of datasets that didn't load and the exceptions they generated.
- The longitude, latitude, altitude, and time (LLAT) variables are special.
- LLAT variables are made known to ERDDAP if the axis variable's (for EDDGrid datasets)
or data variable's (for EDDTable datasets)
destinationName is "longitude", "latitude", "altitude", or "time".
- We strongly encourage you to use these standard names for these variables whenever possible.
If you don't, ERDDAP won't recognize their significance and, for example,
will make a graph instead of a map if the x axis variable is lon and the y axis variable is lat.
- ERDDAP will automatically add lots of metadata to LLAT variables
(e.g., "ioos_category", "units", and
several standards-related attributes like "_CoordinateAxisType").
- ERDDAP will automatically, on-the-fly, add lots of global metadata related
to the LLAT values of the selected data subset (e.g., "geospatial_lon_min").
- Clients that support these metadata standards will be able to
take advantage of the added metadata to position the data in time and space.
- Clients will find it easier to generate queries that include LLAT variables
because the variables' names are the same in all relevant datasets.
- LLAT variables are treated specially by Make A Map. For example,
if the X Axis variable is "longitude" and the Y Axis variable is "latitude",
you will get a map (using a standard projection, and with a land mask,
political boundaries, etc.) instead of a graph.
- The time variable (and related timeStamp variables) are unique in that
ERDDAP always converts their data values from the source's time format (whatever it is) into a numeric value
(seconds since 1970-01-01T00:00:00Z) or a String value (ISO 8601 format), depending on the situation.
- When a user requests time data, they can request it by specifying the time as
a numeric value (seconds since 1970-01-01T00:00:00Z) or a String value (ISO 8601 format).
- See units for more information about time and timeStamp variables.
- Why just two basic data structures?
- Since it is difficult for human clients and computer clients to deal with a complex set of
possible dataset structures, ERDDAP uses just two basic data structures: grids (EDDGrid datasets) and tables (EDDTable datasets).
- Certainly, not all data can be expressed in these structures, but much of it can.
Tables, in particular, are very flexible data structures
(look at the success of relational database programs).
- This makes data queries easier to construct.
- This makes data responses have a simple structure,
which makes it easier to serve the data in a wider variety
of standard file types (which often just support simple data structures).
This is the main reason that we set up ERDDAP this way.
- This, in turn, makes it very easy for us (or anyone) to write client software which works with all ERDDAP datasets.
- This makes it easier to compare data from different sources.
- We are very aware that if you are used to working with data in other data structures
you may initially think that this approach is simplistic or insufficient.
But all data structures have tradeoffs. None is perfect.
Even the do-it-all structures have their downsides: working with them is complex and
the files can only be written or read with special software libraries.
If you accept ERDDAP's approach enough to try to work with it, you may find that it has its advantages
(notably the support for multiple file types that can hold the data responses).
The ERDDAP slide show (particularly the data structures slide)
talks a lot about these issues.
- And even if this approach sounds odd to you, most ERDDAP clients will never notice --
they will simply see that all of the datasets have a nice simple structure
and they will be thankful that they can get data from a wide variety of sources
returned in a wide variety of file formats.
- What if the grid variables in the source dataset DON'T share the same axis variables?
In EDDGrid datasets, all data variables MUST use (share) all of the axis variables.
So if a source dataset has some variables with one set of dimensions,
and other variables with a different set of dimensions,
you will have to make two datasets in ERDDAP.
For example, you might make one ERDDAP dataset entitled "Some Title (at surface)"
to hold variables that just use [time][latitude][longitude] dimensions
and make another ERDDAP dataset entitled "Some Title (at depths)"
to hold the variables that use [time][altitude][latitude][longitude].
Or perhaps you can change the data source to add a dimension
with a single value (for example, altitude=0) to make the variables consistent.
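The two time representations described in the notes above (a numeric value in seconds since 1970-01-01T00:00:00Z, and an ISO 8601 String) can be converted back and forth with Python's standard datetime module. This is a minimal sketch, not ERDDAP's own conversion code; it assumes UTC whenever an ISO string has no time zone offset, and it handles only ISO 8601 input, not other source time formats.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def iso_to_epoch_seconds(iso: str) -> float:
    """Convert an ISO 8601 string (e.g. '2010-03-01T12:00:00Z') to seconds since 1970-01-01T00:00:00Z."""
    dt = datetime.fromisoformat(iso.replace("Z", "+00:00"))  # older Pythons don't accept a trailing 'Z'
    if dt.tzinfo is None:  # assumption: a string without an offset is UTC
        dt = dt.replace(tzinfo=timezone.utc)
    return (dt - EPOCH).total_seconds()


def epoch_seconds_to_iso(seconds: float) -> str:
    """Convert seconds since 1970-01-01T00:00:00Z to an ISO 8601 UTC string (whole seconds)."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

For example, iso_to_epoch_seconds("1970-01-02T00:00:00Z") returns 86400.0, and epoch_seconds_to_iso(86400) returns "1970-01-02T00:00:00Z".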
Skeleton XML for each Type of Dataset
EDDGridFromDap handles grid variables from
DAP servers.
EDDGridFromErddap handles gridded data from
a remote ERDDAP server.
EDDTableFromErddap handles tabular data from
a remote ERDDAP server.
- EDDGridFromErddap and EDDTableFromErddap behave differently from all other
types of datasets in ERDDAP.
- Like other types of datasets, these datasets get information about the dataset
from the source and keep it in memory.
- Like other types of datasets, when ERDDAP searches for datasets,
displays the Data Access Form, or displays the Make A Graph form,
ERDDAP uses the information about the dataset which is in memory.
- Unlike other types of datasets, when ERDDAP receives a request for data
or images from these datasets, ERDDAP
redirects
the request to the remote ERDDAP server. The result is:
- This is very efficient (CPU, memory, and bandwidth), because otherwise
- The composite ERDDAP has to send the request to the other ERDDAP (which takes time).
- The other ERDDAP has to get the data, reformat it, and transmit the data to the composite ERDDAP.
- The composite ERDDAP has to receive the data (using bandwidth), reformat it (using CPU and memory),
and transmit the data to the user (using bandwidth).
By redirecting the request and allowing the other ERDDAP to send the response directly to the user,
the composite ERDDAP spends essentially no CPU time, memory, or bandwidth on the request.
- The redirect is transparent to the user regardless of the client software
(a browser or any other software or command line tool).
- Normally, when an EDDGridFromErddap or EDDTableFromErddap dataset is (re)loaded on your ERDDAP,
it tries to add a subscription to the remote dataset via the remote ERDDAP's email/URL subscription system.
That way, whenever the remote dataset changes, the remote ERDDAP contacts the
setDatasetFlag URL
on your ERDDAP so that the local dataset is reloaded ASAP
and so that the local dataset always mimics the remote dataset.
So, the first time this happens, you should get an email requesting that you validate the subscription.
However, if the local ERDDAP can't send an email or if the remote ERDDAP's email/URL subscription system
isn't active, you should email the remote ERDDAP administrator and request that s/he manually add
<onChange>...</onChange>
tags to all of the relevant datasets to call your dataset's
setDatasetFlag URLs.
See your ERDDAP daily report for a list of setDatasetFlag URLs,
but just send the ones for EDDGridFromErddap and EDDTableFromErddap datasets
to the remote ERDDAP administrator.
- EDDGridFromErddap and EDDTableFromErddap are the basis for
clusters and federations
of ERDDAPs, which efficiently distribute the CPU usage (mostly for making maps), memory usage,
dataset storage, and bandwidth usage of a large data center.
- For security reasons, EDDGridFromErddap and EDDTableFromErddap don't support the
<accessibleTo> tag.
See ERDDAP's security system for restricting access to some datasets to some users.
- The skeleton XML for an EDDGridFromErddap dataset is very simple, because the intent is just to
mimic the remote dataset, which is already suitable for use in ERDDAP:
<dataset type="EDDGridFromErddap" datasetID="..." active="..." >
<sourceUrl>...</sourceUrl>
<reloadEveryNMinutes>...</reloadEveryNMinutes>
<onChange>...</onChange> <!-- 0 or more -->
</dataset>
- The skeleton XML for an EDDTableFromErddap dataset is very simple, because the intent is just to
mimic the remote dataset, which is already suitable for use in ERDDAP:
<dataset type="EDDTableFromErddap" datasetID="..." active="..." >
<sourceUrl>...</sourceUrl>
<reloadEveryNMinutes>...</reloadEveryNMinutes>
<onChange>...</onChange> <!-- 0 or more -->
</dataset>
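The redirect behavior described above can be sketched in a few lines of Python. This sketch assumes (an assumption, not taken from ERDDAP's code) that the redirect target is simply the remote dataset's <sourceUrl> plus the requested file-type extension and the original query. `redirect_target` and the example URLs are hypothetical. A composite ERDDAP would send such a URL back in an HTTP redirect response (e.g., status 302 with a Location header) instead of fetching the data itself.

```python
def redirect_target(source_url: str, request_path: str, query: str = "") -> str:
    """Map a local data request onto the remote dataset named by <sourceUrl>.

    source_url:   e.g. 'https://remote.example.org/erddap/griddap/remoteID' (hypothetical)
    request_path: the local request path, e.g. '/erddap/griddap/localID.nc'
    query:        the request's query string, without the leading '?'
    """
    last_segment = request_path.rsplit("/", 1)[-1]  # e.g. 'localID.nc'
    dot = last_segment.rfind(".")
    if dot < 0:
        raise ValueError("the request has no file-type extension")
    file_type = last_segment[dot:]  # e.g. '.nc', '.csv', '.png'
    target = source_url.rstrip("/") + file_type
    return f"{target}?{query}" if query else target
```

Because only a short URL is returned, the composite ERDDAP does almost no work per request, which is the efficiency argument made above.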
EDDGridFromEtopo just serves the
ETOPO topography data
which is distributed with ERDDAP.
EDDGridFromFiles is the superclass of all
EDDGridFrom...Files classes.
You can't use EDDGridFromFiles directly.
Instead, use one of the subclasses of EDDGridFromFiles, each of which handles specific file types (for example, EDDGridFromNcFiles, below).
It should be relatively easy to support other file types. Contact us if you have requests.
Details - The following information applies to all of the subclasses of EDDGridFromFiles.
- Aggregation - This class aggregates data from local files.
The resulting dataset appears as if all of the files' data had been combined.
The local files all MUST have the same dataVariables (as defined in the dataset's datasets.xml).
All of the dataVariables MUST use the same axisVariables/dimensions (as defined in the dataset's datasets.xml).
The files will be aggregated based on the first (left-most) dimension, sorted in ascending order.
Each file MAY have data for one or more values of the first dimension, but there can't be any overlap between files.
If a file has more than one value for the first dimension,
the values MUST be sorted in ascending order, with no ties.
All files MUST have exactly the same values for all of the other dimensions.
All files MUST have exactly the same units metadata for all axisVariables and dataVariables.
For example, the dimensions might be [time][altitude][latitude][longitude],
and the files might have the data for one time (or more) value(s) per file.
The big advantages of aggregation are:
- The aggregated dataset can be much larger than a single file can conveniently be (~2GB).
- For near-real-time data, it is easy to add a new file with the latest chunk of data.
You don't have to rewrite the entire dataset.
- Directories - The files MAY be in one directory, or in a directory and its subdirectories (recursively).
Note that if there are a large number of files (e.g., >1000), the operating system (and thus EDDGridFromFiles)
will operate much more efficiently if you store the files in a series of subdirectories.
- Cached File Information - When an EDDGridFromFiles dataset is first loaded,
EDDGridFromFiles reads information from all of the relevant files
and creates tables in memory with information about each valid file and each invalid file (one file per row).
The tables are also stored on disk, as .json files in <bigParentDirectory>/datasetInfo in files named
[datasetID].dirs.json (which holds a list of unique directory names),
[datasetID].files.json (which holds the table with each valid file's information),
and [datasetID].bad.json (which holds the table with each bad file's information).
The copy of the file information tables on disk is also useful when ERDDAP is shut down and restarted:
it saves EDDGridFromFiles from having to re-read all of the data files.
You shouldn't ever need to delete or work with these files.
If you do delete them, you can do it while ERDDAP is running; then set a flag so that the dataset is reloaded.
If you want to encourage ERDDAP to update the stored dataset information
(for example, if you just added, removed, or changed some files in the dataset's data directory),
use the flag system to force ERDDAP to update the cached file information.
- Handling Requests -
When a client's request for data is processed, EDDGridFromFiles can quickly look
in the table with the valid file information to see which files have the requested data.
- Updating the Cached File Information -
Whenever the dataset is reloaded, the cached file information is updated.
- The dataset is reloaded periodically as determined by the <reloadEveryNMinutes>
in the dataset's information in datasets.xml.
- The dataset is reloaded as soon as possible whenever ERDDAP detects that you have added, removed,
touch'd (to change the file's lastModified time),
or changed a datafile.
- The dataset is reloaded as soon as possible if you use the
flag system.
When the dataset is reloaded, ERDDAP compares the currently available files to the cached file information tables.
New files are read and added to the valid files table.
Files that no longer exist are dropped from the valid files table.
Files where the file timestamp has changed are read and their information is updated.
The new tables replace the old tables in memory and on disk.
- Bad Files - The table of bad files and the reasons the files were declared bad (corrupted file,
missing variables, etc.) is emailed to the emailEverythingTo email address
(probably you) every time the dataset is reloaded.
You should replace or repair these files as soon as possible.
- FTP Trouble/Advice - If you FTP new data files to the ERDDAP server while ERDDAP is running,
there is the chance that ERDDAP will be reloading the dataset during the FTP process.
It happens more often than you might think!
If it happens, the file will appear to be valid (it has a valid name), but the file isn't yet valid.
If ERDDAP tries to read data from that invalid file, the resulting error will cause the file to be
added to the table of invalid files.
This is not good.
To avoid this problem, use a temporary file name when FTP'ing the file, e.g., ABC2005.nc_TEMP.
Then, the fileNameRegex test (see below) will indicate that this is not a relevant file.
After the FTP process is complete, rename the file to the correct name.
The renaming process will cause the file to become relevant in an instant.
- The skeleton XML for all EDDGridFromFiles subclasses is:
<dataset type="EDDGridFrom...Files" datasetID="..." active="..." >
<sourceUrl>...</sourceUrl>
<accessibleTo>...</accessibleTo> <!-- 0 or 1 -->
<reloadEveryNMinutes>...</reloadEveryNMinutes>
<onChange>...</onChange> <!-- 0 or more -->
<altitudeMetersPerSourceUnit>...</altitudeMetersPerSourceUnit>
<fileDir>...</fileDir> <!-- The directory with the data files. -->
<recursive>true|false</recursive> <!-- Indicates if subdirectories of fileDir have data files, too. -->
<fileNameRegex>...</fileNameRegex> <!-- A regular expression
describing valid data file names, e.g., ".*\.nc" for all .nc files. -->
<metadataFrom>...</metadataFrom> <!-- The file to get metadata from ("first" or "last" (the default) based on files' lastModifiedTime). -->
<addAttributes>...</addAttributes>
<dataVariable>...</dataVariable> <!-- 1 or more -->
</dataset>
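The aggregation rules for axis values listed under Details above can be expressed as a check over each file's axis values. This Python sketch is a hypothetical helper, not EDDGridFromFiles' code. It assumes the axis values have already been read from each file, and it checks only axis values: it does not check that the files have the same dataVariables or the same units metadata.

```python
def aggregation_order(files: dict[str, dict[str, list[float]]], first_dim: str) -> list[str]:
    """Return the file names in aggregation order, or raise ValueError if the files
    break the aggregation rules for axis values.

    files maps a file name to {axis name: axis values}; first_dim is the
    first (left-most) dimension, e.g. 'time'.
    """
    reference = None  # the other (non-first) axes of the first file checked
    for name, axes in files.items():
        values = axes[first_dim]
        # Within a file, first-dimension values must be ascending with no ties.
        if not values or any(a >= b for a, b in zip(values, values[1:])):
            raise ValueError(f"{name}: {first_dim} values must be non-empty and ascending, with no ties")
        # All other dimensions must have exactly the same values in every file.
        others = {axis: vals for axis, vals in axes.items() if axis != first_dim}
        if reference is None:
            reference = others
        elif others != reference:
            raise ValueError(f"{name}: values of the other dimensions differ")
    # Sort files by their first value, then make sure adjacent files don't overlap.
    ordered = sorted(files, key=lambda n: files[n][first_dim][0])
    for prev, cur in zip(ordered, ordered[1:]):
        if files[cur][first_dim][0] <= files[prev][first_dim][-1]:
            raise ValueError(f"{prev} and {cur} overlap in {first_dim}")
    return ordered
```

Since each file's values are strictly ascending, checking only adjacent pairs of the sorted files is enough to rule out any overlap.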
EDDGridFromNcFiles aggregates data from local, gridded,
GRIB .grb and .grb2 files,
HDF .hdf 4 (and 5?) files,
NetCDF .nc files.
This may work with other file types (e.g., BUFR); we just haven't tested them. Please send us some sample files.
See this class' superclass, EDDGridFromFiles, for details.
Note that for GRIB files, ERDDAP will make a .gbx index file the first time it reads each GRIB file.
So the GRIB files must be in a directory where the "user" that ran Tomcat has read+write permission.
EDDGridSideBySide
aggregates two or more EDDGrid datasets (the children) side by side.
EDDGridAggregateExistingDimension
aggregates two or more EDDGrid datasets based on different values of the first dimension.
EDDGridCopy makes and maintains a local copy of another EDDGrid's
data and serves data from the local copy.
- EDDGridCopy (and for tabular data, EDDTableCopy)
is a very easy-to-use and very effective
solution to some of the biggest problems with serving data from remote data sources:
- Accessing data from a remote data source can be slow.
- They may be slow because they are inherently slow (e.g., an inefficient type of server),
- because they are overwhelmed by too many requests,
- or because your server or the remote server is bandwidth limited.
- The remote dataset is sometimes unavailable (again, for a variety of reasons).
- Relying on one source for the data doesn't scale well (e.g., when many users and many ERDDAPs utilize it).
- EDDGridCopy solves these problems by automatically making and maintaining
a local copy of the data and serving data from the local copy.
ERDDAP can serve data from the local copy very, very quickly.
And making a local copy relieves the burden on the remote server.
And the local copy is a backup of the original, which is useful in case something happens to the original.
There is nothing new about making a local copy of a dataset. What is new here is that this class
makes it *easy* to create and *maintain* a local copy of data from a *variety* of types
of remote data sources and *add metadata* while copying the data.
- EDDGridCopy makes the local copy of the data by requesting chunks of data from the remote <dataset> .
There will be a chunk for each value of the leftmost axis variable.
Note that EDDGridCopy doesn't rely on the remote dataset's index numbers for the axis -- those may change.
WARNING: If the size of a chunk of data is so big that it causes problems (> 1GB?),
EDDGridCopy can't be used.
(Sorry, we hope to have a solution for this problem in the future.)
WARNING: EDDGridCopy REQUIRES that the data values for each chunk not change.
If that is not the case, don't use EDDGridCopy.
For example, for a forecast model where the axis values are 0 (nowcast), 6 (6-hour forecast), and 12 (12-hour forecast),
and where the data associated with those axis values changes every 6 hours (when the model is rerun),
EDDGridCopy will only see that the axis values haven't changed, not that the data values did change.
- Each chunk of data is stored in a separate netCDF file in a subdirectory of
<bigParentDirectory>/copy/datasetID/
(as specified in setup.xml).
File names created from axis values are modified to make them file-name-safe
(e.g., hyphens are replaced by "x2D") -- this doesn't affect the actual data.
- Each time EDDGridCopy is reloaded, it checks the remote <dataset> to see what chunks are available.
If the file for a chunk of data doesn't already exist, a request to get the chunk is added to a queue.
ERDDAP's taskThread processes all the queued requests for chunks of data, one-by-one.
You can see statistics for the taskThread's activity on the
Status Page and in the
Daily Report.
(Yes, ERDDAP could assign multiple tasks to this process, but that would use up
lots of the remote data source's bandwidth, memory, and CPU time, and
lots of the local ERDDAP's bandwidth, memory, and CPU time, neither of which is a good idea.)
NOTE: The very first time an EDDGridCopy is loaded, (if all goes well)
lots of requests for chunks of data will be added to the taskThread's queue,
but no local data files will have been created.
So the constructor will fail but taskThread will continue to work and create local files.
If all goes well, the taskThread will make some local data files and the next attempt to
reload the dataset (in ~15 minutes) will succeed, but initially with a very limited amount of data.
WARNING: If the remote dataset is large and/or the remote server is slow (that's the problem, isn't it?!),
it will take a long time to make a complete local copy.
In some cases, the time needed will be unacceptable.
For example, transmitting 1 TB of data over a T1 line (1.544 Mbit/s, about 0.19 MB/s) takes at least 60 days, under optimal conditions.
Plus, it uses lots of bandwidth, memory, and CPU time on the remote and local computers.
The solution is to mail a hard drive to the administrator of the remote data set so that
s/he can make a copy of the dataset and mail the hard drive back to you.
Use that data as a starting point and EDDGridCopy will add data to it.
(That is how Amazon's EC2 Cloud Service handles the problem,
even though they have lots of bandwidth.)
WARNING: If a given value for the leftmost axis variable disappears from the remote dataset,
EDDGridCopy does NOT delete the local copied file. If you want to, you can delete it yourself.
- Recommended use:
- Create the <dataset> entry for the remote data source.
Get it working correctly, including all of the desired metadata.
- If it is too slow, add XML code to wrap it in an EDDGridCopy dataset.
- Use a different datasetID (perhaps add "c" to the beginning of the old datasetID).
- Copy the <accessibleTo>, <reloadEveryNMinutes> and <onChange> from the
remote EDDGrid's XML to the EDDGridCopy's XML.
(Their values for EDDGridCopy matter; their values for the inner dataset become irrelevant.)
- ERDDAP will make and maintain a local copy of the data.
- If you want to force EDDGridCopy to re-get some or all of the data (perhaps you know that it changed),
you can delete one or more of the files in <bigParentDirectory>/copy/datasetID/ any time.
The files will be automatically recreated the next time the dataset is reloaded.
You might want to use ERDDAP's
flag
system to reload the dataset immediately.
If you do use a flag and you have an email subscription to the dataset, you will get two emails:
one when the dataset first reloads and starts to copy the data,
and another when the dataset loads again (automatically) and detects the new local data files.
- If you need to change any addAttributes or change the order of the variables associated with the source dataset:
- Change the addAttributes for the source dataset in datasets.xml, as needed.
- Delete one of the copied files.
- Set a flag
to reload the dataset immediately.
If you do use a flag and you have an email subscription to the dataset, you will get two emails:
one when the dataset first reloads and starts to copy the data,
and another when the dataset loads again (automatically) and detects the new local data files.
- The deleted file will be regenerated with the new metadata.
If the source dataset is ever unavailable, the EDDGridCopy dataset will get metadata
from the regenerated file, since it is the youngest file.
- The skeleton XML for an EDDGridCopy dataset is:
<dataset type="EDDGridCopy" datasetID="..." active="..." >
<accessibleTo>...</accessibleTo> <!-- 0 or 1 -->
<reloadEveryNMinutes>...</reloadEveryNMinutes>
<onChange>...</onChange> <!-- 0 or more -->
<dataset>...</dataset> <!-- 1 -->
</dataset>
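The chunk bookkeeping described above (one chunk per value of the leftmost axis variable, file-name-safe names, and a queue containing only the chunks that don't exist locally yet) can be sketched in Python. This is a hypothetical illustration, not EDDGridCopy's code: apart from hyphens becoming "x2D", the encoding rule and the ".nc" file names are assumptions. Like EDDGridCopy, it never deletes local files for values that have disappeared from the remote dataset.

```python
import re


def file_name_safe(value: str) -> str:
    """Hypothetical file-name-safe encoding: letters (except 'x'), digits, and '_'
    are kept; every other character, including 'x' itself, becomes 'x' plus its
    hex code, so '-' becomes 'x2D'. Escaping 'x' keeps the encoding unambiguous."""
    return re.sub(r"[^A-Za-wyz0-9_]", lambda m: "x%02X" % ord(m.group()), value)


def chunks_to_queue(remote_axis_values: list[str], existing_files: set[str]) -> list[str]:
    """Return the leftmost-axis values whose local chunk file (<safe name>.nc) is missing."""
    return [v for v in remote_axis_values if file_name_safe(v) + ".nc" not in existing_files]
```

In use, existing_files could be the names in <bigParentDirectory>/copy/datasetID/ (e.g., from os.listdir), and each returned value would become one queued task for the taskThread.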
EDDTableFromBMDE handles data from a
BMDE server.
- BMDE servers expect an XML request and return an XML response.
- All BMDE servers return XML responses with the same variables, as specified in the
BMDE schema.
The BMDE schema defines 515 data variables(!), but most BMDE servers only return data in a few of the variables.
You should ask the BMDE administrator for a list of the data variables which are active for a given sourceCode.
Then, just create ERDDAP dataVariables for those variables.
ERDDAP will check that each dataVariable's sourceName is a valid BMDE name (without the "bmde:" prefix).
ERDDAP will add some standard sourceAttributes for each dataVariable,
so you usually will not need to define any addAttributes.
- The skeleton XML for an EDDTableFromBMDE dataset is:
<dataset type="EDDTableFromBMDE" datasetID="..." active="..." >
<sourceUrl>...</sourceUrl>
<sourceCode>...</sourceCode>
<!-- If you read the XML response from the sourceUrl, the source code (e.g., prbo05)
is the value from one of the <resource><code> tags. -->
<accessibleTo>...</accessibleTo> <!-- 0 or 1 -->
<reloadEveryNMinutes>...</reloadEveryNMinutes>
<onChange>...</onChange> <!-- 0 or more -->
<altitudeMetersPerSourceUnit>...</altitudeMetersPerSourceUnit>
<dataVariable>...</dataVariable> <!-- 1 or more -->
</dataset>
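To find candidate <sourceCode> values, you can list the <resource><code> values in the XML response from the sourceUrl, as the comment in the skeleton suggests. This minimal Python sketch assumes you have already fetched that response as bytes. The sample responses in the usage below are hypothetical and simplified; the sketch ignores namespace prefixes, since real BMDE responses may use them.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an ElementTree namespace prefix: '{urn:x}code' -> 'code'."""
    return tag.rsplit("}", 1)[-1]


def resource_codes(xml_bytes: bytes) -> list[str]:
    """Return the text of each <code> element that is a direct child of a <resource> element."""
    root = ET.fromstring(xml_bytes)
    codes = []
    for element in root.iter():
        if local_name(element.tag) == "resource":
            for child in element:
                if local_name(child.tag) == "code" and child.text:
                    codes.append(child.text.strip())
    return codes
```

For example, for a hypothetical response like <response><resource><code>prbo05</code></resource></response>, resource_codes returns ["prbo05"].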
EDDTableFromDapSequence handles variables within
1- and 2-level sequences from
DAP servers such as
DAPPER.
EDDTableFromDatabase handles data from one
database table or view.
- If the data you want to serve is in two or more tables (and needs a JOIN to extract data),
you need to make a new table (or a view) with the JOINed/flattened information.
Contact your database administrator.
- You must get the appropriate JDBC 3 (or JDBC 4?) driver .jar file and put it in <tomcat>/common/lib.
[Hmm. This used to work. But on some computers, you may need to put it in
<tomcat>/webapps/erddap/WEB-INF/lib after you install ERDDAP.]
For example, for Postgresql, you can click on "8.3-604 JDBC 3" to get the postgresql-8.3-604.jdbc3.jar file from
http://jdbc.postgresql.org.
In datasets.xml (see below), use "org.postgresql.Driver" for the <driverName>.
- You can gather most of the information you need to create the XML for an EDDTableFromDatabase dataset
by contacting the database administrator and by searching the web.
The <driverName>, driver .jar file, <connectionProperty> names (e.g., "user", "password", and "ssl"),
and some of the connectionProperty values can be found by searching the web for
"JDBC connection properties databaseType" (e.g., Oracle, MySQL, PostgreSQL).
- It is difficult to create the correct datasets.xml information needed for ERDDAP to establish
a connection to the database.
Be patient. Be methodical.
Search the web for examples of using JDBC to connect to your type of database.
Work closely with the database administrator, who may have relevant experience.
Look in the log.txt file to read the error messages. Read the messages carefully.
- Security - When working with databases, you need to do things as safely and
securely as possible to avoid allowing a malicious user to
damage your database or gain access to data they shouldn't have access to.
ERDDAP tries to do things in a secure way, too.
- We encourage you to set up ERDDAP to connect to the database as a
database user that only has access to the relevant database(s)
and only has READ privileges.
- We encourage you to set up the connection from ERDDAP
to the database so that it
- always uses SSL,
- only allows connections from one IP address (or one block of addresses) and from the one ERDDAP user, and
- only transfers passwords in their MD5 hashed form.
- [KNOWN PROBLEM] The connectionProperties (including the password!) are stored as plain text in datasets.xml.
Only the administrator should have READ/WRITE access to this file!
No other users of the computer should have READ or READ/WRITE access to this file!
We haven't found a way to allow the administrator to enter the database
password during ERDDAP's startup in Tomcat (which occurs without user input),
so the password must be accessible in a file.
- When in ERDDAP, the password and other connection properties are kept "private".
- Requests from clients are parsed and checked for validity before
generating requests for the database.
- Requests to the database are made with PreparedStatements, to avoid SQL injection.
- Requests to the database are submitted with executeQuery
(not execute or executeUpdate) to limit requests to be read-only
(so attempted SQL injection to alter the database will fail for this
reason, too).
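To see why parameter binding defeats SQL injection, here is a minimal sketch. ERDDAP itself uses Java JDBC PreparedStatements; this example swaps in Python's built-in sqlite3 module, which uses the same "?" placeholder idea. The table `obs` and its columns are hypothetical.

```python
import sqlite3

def fetch_station(conn, station_id):
    """Return (station, temp) rows for one station, using a bound parameter."""
    # The "?" placeholder is filled in by the driver, so station_id is always
    # treated as a value, never spliced into the SQL text.
    cur = conn.execute(
        "SELECT station, temp FROM obs WHERE station = ? ORDER BY temp",
        (station_id,),
    )
    return cur.fetchall()

# Hypothetical in-memory table, just for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (station TEXT, temp REAL)")
conn.executemany("INSERT INTO obs VALUES (?, ?)",
                 [("A1", 12.5), ("A1", 13.0), ("B2", 9.8)])

print(fetch_station(conn, "A1"))
# An injection attempt is just an odd station name that matches no rows.
print(fetch_station(conn, "A1'; DROP TABLE obs; --"))
print(conn.execute("SELECT COUNT(*) FROM obs").fetchone())
```

The malicious string never reaches the SQL parser as SQL, so the table survives with all three rows.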
- Views - EDDTableFromDatabase is limited to getting data from one table, but that shouldn't be a problem.
If a table of interest has foreign keys which link to other tables,
we recommend that you ask the database administrator to create a
VIEW.
Views "can join and simplify multiple tables into a single virtual table" (Wikipedia).
Views are great because:
- They simplify queries (since the queries don't have to specify the JOINs, etc.).
- They are efficient (since the database just has to set it up once).
- They increase abstraction (since the database can be changed without having to change how the VIEW appears to the client).
- Speed - Or, if speed is an issue, you will probably get faster responses if you periodically
(every day? whenever there is new data?) generate an actual table
(similarly to how you generated the VIEW) instead of using a VIEW.
Since any request to the table can then be fulfilled without JOINing another table,
the response will be much faster.
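The difference between a VIEW and a periodically regenerated table can be sketched with Python's sqlite3 module. The two-table schema (`station`, `obs`) is a hypothetical example, and SQLite stands in for whatever database you actually use; your database administrator would write the equivalent SQL for your system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical normalized schema: observations reference stations by key.
CREATE TABLE station (id INTEGER PRIMARY KEY, name TEXT, lat REAL, lon REAL);
CREATE TABLE obs (station_id INTEGER REFERENCES station(id), time TEXT, temp REAL);
INSERT INTO station VALUES (1, 'Buoy A', 36.6, -121.9), (2, 'Buoy B', 34.4, -119.7);
INSERT INTO obs VALUES (1, '2009-01-01T00:00:00', 12.5), (2, '2009-01-01T00:00:00', 14.1);

-- The VIEW hides the JOIN: a client sees one flat "table".
CREATE VIEW obs_flat AS
  SELECT s.name, s.lat, s.lon, o.time, o.temp
  FROM obs o JOIN station s ON o.station_id = s.id;

-- For speed, copy the same query into a real table.
-- Rebuild it periodically (e.g., daily, or whenever there is new data).
DROP TABLE IF EXISTS obs_flat_table;
CREATE TABLE obs_flat_table AS SELECT * FROM obs_flat;
""")

print(conn.execute("SELECT name, temp FROM obs_flat ORDER BY name").fetchall())
print(conn.execute("SELECT COUNT(*) FROM obs_flat_table").fetchone())
```

EDDTableFromDatabase would then point its <tableName> at either `obs_flat` (always current) or `obs_flat_table` (faster, but only as fresh as the last rebuild).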
- The skeleton XML for an EDDTableFromDatabase dataset is:
<dataset type="EDDTableFromDatabase" datasetID="..." active="..." >
<sourceUrl>...</sourceUrl>
<!-- Put the database name at the end, for example,
"jdbc:postgresql://123.45.67.89:5432/databaseName". REQUIRED. -->
<driverName>...</driverName>
<!-- The high-level name of the database driver, e.g., "org.postgresql.Driver".
You need to put the actual database driver .jar file (for example,
postgresql-8.3-604.jdbc3.jar) in <tomcat>/common/lib or perhaps in
<tomcat>/webapps/erddap/WEB-INF/lib. REQUIRED. -->
<connectionProperty name="name">value</connectionProperty>
<!-- The names (e.g., "user", "password", and "ssl") and values of the properties
needed for ERDDAP to establish the connection to the database. 0 or more. -->
<catalogName>...</catalogName>
<!-- The name of the catalog which has the schema which has the table, default = "". OPTIONAL. -->
<schemaName>...</schemaName>
<!-- The name of the schema which has the table, default = "". OPTIONAL. -->
<tableName>...</tableName> <!-- The name of the table, default = "". REQUIRED. -->
<orderBy>...</orderBy> <!-- A comma-separated list of sourceNames to be used
in an ORDER BY clause at the end of every query sent to the database
(unless the user's request includes an &orderBy() filter, in which case the user's orderBy is used).
The order of the sourceNames is important.
The leftmost sourceName is most important; subsequent sourceNames are only used to break ties.
Only relevant sourceNames are included in the ORDER BY clause for a given user request.
If this is not specified, the order of the returned values is not specified.
Default = "". OPTIONAL. -->
<accessibleTo>...</accessibleTo> <!-- 0 or 1 -->
<reloadEveryNMinutes>...</reloadEveryNMinutes>
<onChange>...</onChange> <!-- 0 or more -->
<altitudeMetersPerSourceUnit>...</altitudeMetersPerSourceUnit>
<addAttributes>...</addAttributes>
<dataVariable>...</dataVariable> <!-- 1 or more -->
</dataset>
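The <orderBy> rules above can be sketched as a small helper that builds the ORDER BY clause for one request. The function name is hypothetical, and it assumes "relevant" means "the sourceName is one of the variables involved in this request"; ERDDAP's own logic may differ in detail.

```python
def build_order_by(configured, requested_vars, user_order_by=None):
    """Hypothetical helper: return the ORDER BY clause for one request.

    configured:     the <orderBy> value, e.g. "station, time"
    requested_vars: sourceNames involved in this request
    user_order_by:  sourceNames from the user's &orderBy(), if any
    """
    if user_order_by:
        names = list(user_order_by)  # the user's orderBy takes precedence
    else:
        names = [n.strip() for n in configured.split(",") if n.strip()]
    # Assumption: only names that are part of this request are "relevant".
    names = [n for n in names if n in requested_vars]
    return " ORDER BY " + ", ".join(names) if names else ""

print(build_order_by("station, time", {"time", "temp"}))
print(build_order_by("station, time", {"station", "time", "temp"}))
print(build_order_by("station, time", {"station", "time"}, ["time"]))
print(repr(build_order_by("", {"time"})))
```

Because only names that match known, requested variables survive the filter, nothing a user types is spliced into the SQL unchecked.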
EDDTableFromFiles is the superclass of all
EDDTableFrom...Files classes.
You can't use EDDTableFromFiles directly.
Instead, you can use subclasses of EDDTableFromFiles to handle specific file types:
- EDDTableFromHyraxFiles aggregates data with several variables,
each with shared dimensions (e.g., time, altitude, latitude, longitude), and served by a
Hyrax OPeNDAP server.
- EDDTableFromNcFiles aggregates data from .nc files with several variables,
each with shared dimensions (e.g., time, altitude, latitude, longitude).
Details - The following information applies to all of the subclasses of EDDTableFromFiles.
- Aggregation - This class aggregates data from local files. Each file holds a (relatively) small table of data.
- The resulting dataset appears as if all of the files' tables had been combined
(all of the rows of data from file #1, plus all of the rows from file #2, ...).
- The files don't all have to have all of the specified variables.
- The variables in all of the files MUST have the same values for the
add_offset,
missing_value,
_FillValue,
scale_factor, and
units attributes (if any).
ERDDAP checks, but it is an imperfect test --
if there are different values, ERDDAP doesn't know which is correct and therefore which files are invalid.
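The consistency requirement above can be sketched as a check that groups files by attribute value. The in-memory dictionary stands in for attributes read from real .nc files (no netCDF library is used), and the function name is hypothetical. Like ERDDAP's own check, it can only report a disagreement; it can't tell which files are wrong.

```python
CHECKED_ATTRIBUTES = ("add_offset", "missing_value", "_FillValue", "scale_factor", "units")

def find_inconsistent(files):
    """files: {file_name: {var_name: {attr_name: value}}} (hypothetical form).

    Returns {(var, attr): {value: [file names]}} for each checked attribute
    whose value differs between files. Files lacking a variable are skipped,
    since files don't all have to have all variables. Values are compared by
    equality/hash, so NaN fill values would need special handling.
    """
    seen = {}
    for fname, variables in sorted(files.items()):
        for var, attrs in variables.items():
            for attr in CHECKED_ATTRIBUTES:
                if attr in attrs:
                    by_value = seen.setdefault((var, attr), {})
                    by_value.setdefault(attrs[attr], []).append(fname)
    return {key: by_value for key, by_value in seen.items() if len(by_value) > 1}

files = {
    "a.nc": {"temp": {"units": "degree_C", "scale_factor": 0.01}},
    "b.nc": {"temp": {"units": "degree_C", "scale_factor": 0.01}, "salinity": {"units": "PSU"}},
    "c.nc": {"temp": {"units": "degree_F", "scale_factor": 0.01}},
}
print(find_inconsistent(files))
```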
- Directories - The files can be in one directory, or in a directory and its subdirectories (recursively).
Note that if there are a large number of files (e.g., >1000), the operating system (and thus EDDTableFromFiles)
will operate much more efficiently if you store the files in a series of subdirectories.
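Finding the data files (the <fileDir>, <recursive>, and <fileNameRegex> settings in the skeleton XML below) can be sketched like this. The sketch assumes the regex must match the whole file name, as Java's String.matches does; the function name and the throwaway directory tree are hypothetical.

```python
import os
import re
import tempfile

def list_data_files(file_dir, file_name_regex, recursive):
    """Return sorted paths of files whose names fully match file_name_regex."""
    pattern = re.compile(file_name_regex)
    found = []
    for dirpath, _dirnames, filenames in os.walk(file_dir):
        found.extend(os.path.join(dirpath, f) for f in filenames if pattern.fullmatch(f))
        if not recursive:
            break  # os.walk yields the top directory first; stop there
    return sorted(found)

# Demonstration with a throwaway tree that uses one subdirectory per year.
with tempfile.TemporaryDirectory() as root:
    for rel in ["c.nc", "d.nc_TEMP", os.path.join("2008", "a.nc"), os.path.join("2009", "b.nc")]:
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    rec = [os.path.relpath(p, root) for p in list_data_files(root, r".*\.nc", True)]
    top = [os.path.relpath(p, root) for p in list_data_files(root, r".*\.nc", False)]

print(rec)
print(top)
```

Note that `d.nc_TEMP` is ignored because `.*\.nc` must match the entire name.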
- Cached File Information - When an EDDTableFromFiles dataset is first loaded,
EDDTableFromFiles reads all of the relevant files and creates tables in memory with information
about each valid file (one file per row, including the minimum and maximum value of each variable,
even String variables) and each invalid file.
The tables are also stored on disk, as .json files in <bigParentDirectory>/datasetInfo in files named
[datasetID].dirs.json (which holds a list of unique directory names),
[datasetID].files.json (which holds the table with each valid file's information), and
[datasetID].bad.json (which holds the table with each bad file's information).
The copy of the file information tables on disk is also useful when ERDDAP is shut down and restarted:
it saves EDDTableFromFiles from having to re-read all of the data files.
You shouldn't ever need to delete or work with these files.
You can delete these files (but why?). You can use the
flag system
to force ERDDAP to update the cached file information.
- Handling Requests - ERDDAP tabular data requests can put constraints on any variable.
When a client's request for data is processed, EDDTableFromFiles can quickly look
in the table with the valid file information to see which files might have relevant data.
For example, if each source file has the data for one fixed-location buoy, EDDTableFromFiles can
very efficiently determine which files might have data within a given longitude range and latitude range.
Because the valid file information table includes the minimum and maximum value of every variable
for every valid file, EDDTableFromFiles can often handle other queries quite efficiently.
(EDDTableFromHyraxFiles only keeps track of the min and max for the axis variables, not the data variables.)
For example, if some of the buoys don't have an air pressure sensor, and a client requests data for
airPressure!=NaN, EDDTableFromFiles can efficiently determine which buoys have air pressure data.
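The file-selection idea can be sketched as follows: compare each constraint with the cached minimum and maximum for each file, and skip files that can't possibly match. The `FILE_INFO` table, the function name, and the convention that an all-missing variable is stored as (NaN, NaN) are assumptions made for this example.

```python
import math

# Hypothetical cached "valid files" table: per-file (min, max) of each variable.
# A variable with no valid values in a file is recorded as (nan, nan).
FILE_INFO = {
    "buoyA.nc": {"longitude": (-122.0, -122.0), "latitude": (36.8, 36.8), "airPressure": (1001.0, 1024.0)},
    "buoyB.nc": {"longitude": (-119.5, -119.5), "latitude": (34.2, 34.2), "airPressure": (math.nan, math.nan)},
    "buoyC.nc": {"longitude": (-70.6, -70.6), "latitude": (41.5, 41.5), "airPressure": (998.0, 1030.0)},
}

def candidate_files(constraints, require_non_missing=()):
    """Return the files that *might* hold matching rows.

    constraints: {var: (lo, hi)} inclusive ranges from the client's request
    require_non_missing: variables constrained with var!=NaN
    """
    result = []
    for name, info in sorted(FILE_INFO.items()):
        ok = True
        for var, (lo, hi) in constraints.items():
            fmin, fmax = info[var]
            # No valid values, or the file's range misses the request entirely.
            if math.isnan(fmin) or fmax < lo or fmin > hi:
                ok = False
                break
        if ok and any(math.isnan(info[v][0]) for v in require_non_missing):
            ok = False
        if ok:
            result.append(name)
    return result

print(candidate_files({"longitude": (-125.0, -115.0), "latitude": (30.0, 40.0)}))
print(candidate_files({}, require_non_missing=["airPressure"]))
```

Only the returned files need to be opened; the rows inside them still have to be filtered against the constraints.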
- Updating the Cached File Information -
Whenever the dataset is reloaded, the cached file information is updated.
- The dataset is reloaded periodically as determined by the <reloadEveryNMinutes>
in the dataset's information in datasets.xml.
- The dataset is reloaded as soon as possible whenever ERDDAP detects that you have added, removed,
touch'd (to change the file's lastModified time),
or changed a datafile.
- The dataset is reloaded as soon as possible if you use the
flag system.
When the dataset is reloaded, ERDDAP compares the currently available files to the cached file information table.
New files are read and added to the valid files table.
Files that no longer exist are dropped from the valid files table.
Files where the file timestamp has changed are read and their information is updated.
The new tables replace the old tables in memory and on disk.
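The reload comparison just described can be sketched in a few lines. The cache layout (path mapped to lastModified time plus a summary) and the `read_file` callback are hypothetical stand-ins for ERDDAP's internal tables and file readers.

```python
def update_cache(cached, current, read_file):
    """Sketch of refreshing the cached file information on reload.

    cached:    {path: (last_modified, info)} from the previous load
    current:   {path: last_modified} from a fresh directory scan
    read_file: function(path) -> info (hypothetical reader)
    Returns (new_cache, sorted list of paths that had to be read).
    """
    new_cache = {}
    reread = []
    for path, mtime in current.items():
        old = cached.get(path)
        if old is not None and old[0] == mtime:
            new_cache[path] = old                       # unchanged: reuse cached info
        else:
            new_cache[path] = (mtime, read_file(path))  # new or changed: read it
            reread.append(path)
    # Paths that are in `cached` but not in `current` are simply not copied,
    # so files that no longer exist drop out of the table.
    return new_cache, sorted(reread)

cached = {"a.nc": (100, "infoA"), "b.nc": (200, "infoB"), "gone.nc": (50, "infoG")}
current = {"a.nc": 100, "b.nc": 250, "new.nc": 300}
new_cache, reread = update_cache(cached, current, lambda p: "read " + p)
print(reread)
print(sorted(new_cache))
```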
- Bad Files - The table of bad files and the reasons the files were declared bad (corrupted file,
missing variables, incorrect axis values, etc.) is emailed to the emailEverythingTo email address
(probably you) every time the dataset is reloaded.
You should replace or repair these files as soon as possible.
For EDDTableFromHyraxFiles, the table of bad files is thrown out each time the dataset is reloaded.
So an individual Hyrax file that is bad at some point isn't marked as bad forever.
- Near Real Time Data -
EDDTableFromFiles treats requests for very recent data as a special case.
The problem: If the files making up the dataset are updated frequently, it is likely that the dataset won't
be updated every time a file is changed. So EDDTableFromFiles won't be aware of the changed files.
(You could use the flag system,
but this might lead to ERDDAP reloading the dataset almost continually. So in most cases, we don't recommend it.)
Instead, EDDTableFromFiles does two things to deal with this situation:
1) When the dataset is loaded, if the maximum value for the time variable is in the last 24 hours,
ERDDAP sets the maximum time to be NaN (meaning Now).
2) When ERDDAP gets a request for data within the last 20 hours (e.g., 8 hours ago until Now),
ERDDAP will search all files which have any data in the last 20 hours.
Thus, ERDDAP doesn't need to have perfectly up-to-date data for all of the files in order to find the latest data.
You should still set <reloadEveryNMinutes> to a reasonably
small value (e.g., 60), but it doesn't have to be tiny (e.g., 3).
Recommended organization of near-real-time data in the files:
If, for example, you have a dataset that stores data for numerous stations (or buoys, or ...) for many years,
you could arrange the files so that, for example, there is one file per station.
But then, every time new data for a station arrives, you have to read a large old file and write a large new file.
And when ERDDAP reloads the dataset, it notices that some files have been modified, so it reads those files completely.
That is inefficient.
Instead, we recommend that you store the data in chunks, e.g., all data for one station for one year (or one month).
Then, when a new datum arrives, you only have to read and rewrite the file with this year's (or month's) data.
All of the files for previous years (or months) for that station remain unchanged.
And when ERDDAP reloads the dataset, most files are unchanged; only a few, small files have changed and need to be read.
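The two near-real-time rules can be sketched as below. Times are epoch seconds, the function names are hypothetical, and step 2 encodes one reading of the text: for a request that reaches into the last 20 hours, a file whose cached data is less than 20 hours old is searched even if its cached maximum time is before the start of the request, because the file may have grown since it was last read.

```python
import math

DAY = 24 * 3600.0
RECENT = 20 * 3600.0

def dataset_max_time(file_max_times, now):
    """Step 1 (sketch): report NaN (meaning Now) if the newest data is < 24 h old."""
    newest = max(file_max_times)
    return math.nan if now - newest < DAY else newest

def files_to_search(files, req_min, req_max, now):
    """Step 2 (sketch). files: {name: (cached_min_time, cached_max_time)}."""
    recent_request = req_max >= now - RECENT
    chosen = []
    for name, (fmin, fmax) in sorted(files.items()):
        overlaps = fmax >= req_min and fmin <= req_max
        recently_active = fmax >= now - RECENT
        if overlaps or (recent_request and recently_active):
            chosen.append(name)
    return chosen

now = 1_000_000.0
files = {
    "old.nc":   (now - 90 * DAY, now - 30 * DAY),
    "stale.nc": (now - 5 * DAY, now - 10 * 3600),  # cached info is 10 h old
    "fresh.nc": (now - 2 * DAY, now - 3600),
}
print(dataset_max_time([f[1] for f in files.values()], now))
print(files_to_search(files, now - 8 * 3600, now, now))  # 8 hours ago until Now
```

`stale.nc` is searched for the "last 8 hours" request even though its cached maximum time is 10 hours old, which is exactly the case a slightly out-of-date cache would otherwise miss.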
- FTP Trouble/Advice - If you FTP new data files to the ERDDAP server while ERDDAP is running,
there is the chance that ERDDAP will be reloading the dataset during the FTP process.
It happens more often than you might think!
If it happens, the file will appear to be valid (it has a valid name), but the file isn't valid.
If ERDDAP tries to read data from that invalid file, the resulting error will cause the file to be
added to the table of invalid files.
This is not good.
To avoid this problem, use a temporary file name when FTP'ing the file, e.g., ABC2005.nc_TEMP.
Then, the fileNameRegex test (see below) will indicate that this is not a relevant file.
After the FTP process is complete, rename the file to the correct name.
The renaming process will cause the file to become relevant in an instant.
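Here is a minimal sketch of the temporary-name-then-rename pattern for a script that writes files directly into the data directory. The `_TEMP` suffix and `.*\.nc` regex follow the example above; the function name is hypothetical. With an actual FTP client the equivalent is an upload to the temporary name followed by a rename command (in Python's ftplib, `FTP.storbinary` then `FTP.rename`).

```python
import os
import re
import tempfile

FILE_NAME_REGEX = re.compile(r".*\.nc")  # plays the role of <fileNameRegex>

def deliver(data_dir, final_name, payload):
    """Write payload under a temporary name, then rename it to final_name."""
    temp_name = final_name + "_TEMP"
    if FILE_NAME_REGEX.fullmatch(temp_name):
        raise ValueError("temporary name would be picked up by fileNameRegex")
    temp_path = os.path.join(data_dir, temp_name)
    with open(temp_path, "wb") as f:
        f.write(payload)  # a reload during this write ignores the _TEMP file
    # os.replace renames in one step (atomic on POSIX when both paths are on
    # the same file system), so a half-written ABC2005.nc is never visible.
    os.replace(temp_path, os.path.join(data_dir, final_name))

with tempfile.TemporaryDirectory() as d:
    deliver(d, "ABC2005.nc", b"placeholder bytes, not a real .nc file")
    names = sorted(os.listdir(d))
print(names)
```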
- File Name Extracts -
EDDTableFromFiles has a system for extracting a String from each file name and using that to make
a pseudo data variable.
Currently, there is no system to interpret these Strings as dates/times.
There are several XML tags to set up this system.
If you don't need part or all of this system, just don't specify these tags or use "" values.
- preExtractRegex is a
regular expression
used to identify text to be removed from the start of the file name.
The removal only occurs if the regex is matched.
This usually begins with "^" to match the beginning of the file name.
- postExtractRegex is a
regular expression
used to identify text to be removed from the end of the file name.
The removal only occurs if the regex is matched.
This usually ends with "$" to match the end of the file name.
- extractRegex If present, this
regular expression
is used after preExtractRegex and postExtractRegex
to identify a string to be extracted from the file name (e.g., stationID).
If the regex isn't matched, the entire file name is used (minus preExtract and postExtract).
Use ".*" to match the entire file name that is left after preExtractRegex and postExtractRegex.
- columnNameForExtract is the data column name for the extracted Strings.
This column name must be in the tDataVariables list as a source column name (with any data type).
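The File Name Extracts steps can be sketched with Python's re module. The exact matching semantics are an assumption here: the first match of preExtractRegex and of postExtractRegex is removed, and extractRegex keeps the first match it finds. Java's regex flavor (which ERDDAP uses) differs from Python's in some details, and the file names are made up.

```python
import re

def extract_from_file_name(file_name, pre="", post="", extract=""):
    """Sketch of the File Name Extracts steps (semantics assumed).

    Empty strings mean "not used", like "" values in datasets.xml.
    """
    s = file_name
    if pre:
        s = re.sub(pre, "", s, count=1)    # e.g. "^NDBC_" removes a leading prefix
    if post:
        s = re.sub(post, "", s, count=1)   # e.g. "_met\.nc$" removes a trailing suffix
    if extract:
        m = re.search(extract, s)
        if m:
            s = m.group(0)                 # keep only the matched text
    return s                               # no match: the trimmed name is used

print(extract_from_file_name("NDBC_46013_met.nc", r"^NDBC_", r"_met\.nc$", r".*"))
print(extract_from_file_name("station46013_2009.nc", extract=r"[0-9]{5}"))
print(extract_from_file_name("NDBC_46013_met.nc", r"^NDBC_", r"\.nc$", r"^[A-Z]{4}"))
```

The third call shows the fallback: extractRegex doesn't match, so the name minus the pre- and post-extracts ("46013_met") is used.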
- The skeleton XML for all EDDTableFromFiles subclasses is:
<dataset type="EDDTableFrom...Files" datasetID="..." active="..." >
<nDimensions>...</nDimensions> <!-- The value should be an integer >= 1. -->
<accessibleTo>...</accessibleTo> <!-- 0 or 1 -->
<reloadEveryNMinutes>...</reloadEveryNMinutes>
<onChange>...</onChange> <!-- 0 or more -->
<altitudeMetersPerSourceUnit>...</altitudeMetersPerSourceUnit>
<fileDir>...</fileDir> <!-- The directory with the data files. -->
<recursive>true|false</recursive> <!-- Indicates if subdirectories of fileDir have data files, too. -->
<fileNameRegex>...</fileNameRegex> <!-- A regular expression
describing valid data file names, e.g., ".*\.nc" for all .nc files. -->
<metadataFrom>...</metadataFrom> <!-- The file to get metadata from ("first" or "last" (the default) based on files' lastModifiedTime). -->
<preExtractRegex>...</preExtractRegex> <!-- See File Name Extracts. -->
<postExtractRegex>...</postExtractRegex> <!-- See File Name Extracts. -->
<extractRegex>...</extractRegex> <!-- See File Name Extracts. -->
<columnNameForExtract>...</columnNameForExtract> <!-- See File Name Extracts. -->
<sortedColumnName>...</sortedColumnName>
<!-- The sourceName of the numeric column that the data files are usually already sorted by
within each file, e.g., "time".
Use null or "" if no variable is suitable.
It is ok if not all files are sorted by this column.
If present, this can greatly speed up some data requests.
For EDDTableFromHyraxFiles and EDDTableFromNcFiles, this must be the leftmost axis variable. -->
<sortFilesBySourceNames>...</sortFilesBySourceNames>
<!-- This is a space-separated list of source variable names which specifies how the internal list
of files should be sorted (in ascending order), for example "id time".
It is the minimum value of the specified columns in each file that is used for sorting.
When a data request is filled, data is obtained from the files in this order.
Thus it determines the overall order of the data in the response.
If you specify more than one column name,
the second name is used if there is a tie for the first column;
the third is used if there is a tie for the first and second columns; ...
This is OPTIONAL (the default is fileDir+fileName order). -->
<addAttributes>...</addAttributes>
<dataVariable>...</dataVariable> <!-- 1 or more -->
<!-- For EDDTableFromHyraxFiles (and EDDTableFromNcFiles?), the axis variables needn't be first
or in any specific order. -->
</dataset>
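The <sortFilesBySourceNames> rule (sort by the minimum value of each listed column, later names only breaking ties) can be sketched as follows. The cached (min, max) table and function name are hypothetical, and this sketch breaks any remaining ties by path as a stand-in for the default fileDir+fileName order. Times are shown as decimal years purely for readability.

```python
def sort_files(file_info, sort_names):
    """Order files by the minimum value of each listed column.

    file_info:  {path: {source_name: (min, max)}}
    sort_names: space-separated, e.g. "id time"; later names only break ties.
    Remaining ties (and an empty list) fall back to path order.
    """
    names = sort_names.split()
    return sorted(file_info, key=lambda p: (tuple(file_info[p][n][0] for n in names), p))

info = {
    "/data/A_2009.nc": {"id": ("A", "A"), "time": (2009.0, 2009.9)},
    "/data/B_2008.nc": {"id": ("B", "B"), "time": (2008.0, 2008.9)},
    "/data/A_2008.nc": {"id": ("A", "A"), "time": (2008.0, 2008.9)},
}
print(sort_files(info, "id time"))  # all of station A, then station B
print(sort_files(info, "time id"))  # all of 2008, then 2009
print(sort_files(info, ""))         # default: path order
```

Because files are read in this order when a request is filled, "id time" yields responses grouped by station, while "time id" yields responses grouped by time.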
EDDTableFromHyraxFiles aggregates data files with several variables,
each with one or more shared dimensions (e.g., time, altitude, latitude, longitude), and served by a
Hyrax OPeNDAP server.
- In all cases, each file has multiple values for the leftmost dimension, e.g., time.
- The files often (but don't have to) have a single value for the other dimensions (e.g., altitude, latitude, longitude).
- The files may have character variables with an additional dimension (e.g., nCharacters).
- Hyrax servers can be identified by the "dods-bin/nph-dods/" in the URL.
For example, http://biloxi-bay.ssc.hpc.msstate.edu/dods-bin/nph-dods/WCOS/nmsp/wcos/
- This class screen-scrapes the Hyrax web pages with the lists of files in each directory.
Because of this, it is very specific to the current format of Hyrax web pages.
We will try to adjust ERDDAP quickly if/when future versions of Hyrax change how the files are listed.
- For the <dataset> tag in datasets.xml:
For <fileDir>, use the URL of the base directory of the dataset in the Hyrax server, for example,
http://biloxi-bay.ssc.hpc.msstate.edu/dods-bin/nph-dods/WCOS/nmsp/wcos/
The web page should have "OPeNDAP Server Index of [directoryName]" at the top.
- Because some data requests are fulfilled by ERDDAP getting data from lots of remote files, some responses will be slow.
If this is a problem, wrap this dataset in EDDTableCopy.
- See this class' superclass, EDDTableFromFiles, for details.
- See the 1D, 2D, 3D, and 4D examples for EDDTableFromNcFiles.
EDDTableFromNcFiles
aggregates data from .nc files with several variables, each with one or more shared dimensions (e.g., time, altitude, latitude, longitude).
In all cases, each file has multiple values for the leftmost dimension, e.g., time.
In all cases, each file has a single value for the other dimensions (e.g., altitude, latitude, longitude).
The files may have character variables with an additional dimension (e.g., nCharacters).
See this class' superclass, EDDTableFromFiles, for details.
- 1D Example:
1D files are somewhat different from 2D, 3D, 4D, ... files.
You might have a set of .nc data files where each file has one month's worth of data from one drifting buoy.
Each file will have 1 dimension, e.g., time (size = [many]).
Each file will have many 1D variables which use that dimension, e.g., time, longitude, latitude, air temperature, ....
Each file may have 2D character variables, e.g., with dimensions (time,nCharacters).
- 2D Example:
You might have a set of .nc data files where each file has one month's worth of data from one drifting buoy.
Each file will have 2 dimensions, e.g., time (size = [many]) and id (size = 1).
Each file will have two 1D variables with the same names as the dimensions and using the same-name dimension, e.g., time(time), id(id).
These 1D variables should be included in the list of <dataVariable>'s in the dataset's XML.
Each file will have many 2D variables, e.g., longitude, latitude, air temperature, water temperature, ...
Each file may have 3D character variables, e.g., with dimensions (time,id,nCharacters).
- 3D Example:
You might have a set of .nc data files where each file has one month's worth of data from one stationary buoy.
Each file will have 3 dimensions, e.g., time (size = [many]), lat (size = 1), and lon (size = 1).
Each file will have three 1D variables with the same names as the dimensions and using the same-name dimension, e.g., time(time), lat(lat), lon(lon).
These 1D variables should be included in the list of <dataVariable>'s in the dataset's XML.
Each file will have many 3D variables, e.g., air temperature, water temperature, ...
Each file may have 4D character variables, e.g., with dimensions (time,lat,lon,nCharacters).
The file's name might include the buoy's name.
- 4D Example:
You might have a set of .nc data files where each file has one month's worth of data from one stationary buoy.
Each file will have 4 dimensions, e.g., time (size = [many]), alt (size = 1), lat (size = 1), and lon (size = 1).
Each file will have four 1D variables with the same names as the dimensions and using the same-name dimension, e.g., time(time), alt(alt), lat(lat), lon(lon).
These 1D variables should be included in the list of <dataVariable>'s in the dataset's XML.
Each file will have many 4D variables, e.g., air temperature, water temperature, ...
Each file may have 5D character variables, e.g., with dimensions (time,alt,lat,lon,nCharacters).
The file's name might include the buoy's name.
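The examples above all turn into table rows the same way: each combination of axis indices becomes one row, and single-valued dimensions (alt, lat, lon) simply repeat on every row. Here is a minimal sketch of that flattening for the 4D example, using nested Python lists instead of a real netCDF reader; the function name and values are hypothetical, and character variables (the extra nCharacters dimension) are not handled.

```python
from itertools import product

def flatten(axes, data):
    """Turn one nD file into table rows (sketch; in-memory, no netCDF library).

    axes: ordered (name, values) pairs, leftmost dimension first, e.g.
          [("time", [...]), ("alt", [0.0]), ("lat", [36.8]), ("lon", [-122.0])]
    data: {var_name: nested lists indexed [i_time][i_alt][i_lat][i_lon]}
    """
    names = [n for n, _ in axes] + sorted(data)
    rows = []
    # product() varies the rightmost index fastest, i.e., row-major order.
    for idx in product(*(range(len(values)) for _, values in axes)):
        row = [axes[d][1][i] for d, i in enumerate(idx)]
        for var in sorted(data):
            cell = data[var]
            for i in idx:
                cell = cell[i]
            row.append(cell)
        rows.append(tuple(row))
    return names, rows

axes = [("time", [0.0, 3600.0, 7200.0]), ("alt", [0.0]), ("lat", [36.8]), ("lon", [-122.0])]
data = {"airTemp": [[[[14.2]]], [[[14.0]]], [[[13.9]]]]}
names, rows = flatten(axes, data)
print(names)
for r in rows:
    print(r)
```

The same function handles the 1D case (a single `time` axis), which is one way to see why the 1D layout is "somewhat different": there are no size-1 dimensions to repeat.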
EDDTableFromNOS handles data from a NOAA
NOS source,
which uses SOAP+XML for requests and responses. It is very specific to NOAA NOS's XML.
See the sample EDDTableFromNOS dataset in datasets2.xml.
EDDTableFromOBIS handles data from an
OBIS server.
- OBIS servers expect an XML request and return an XML response.
- Because all OBIS servers serve the same variables the same way
(see the OBIS schema),
you don't have to specify much to set up an OBIS dataset in ERDDAP.
- The skeleton XML for an EDDTableFromOBIS dataset is:
<dataset type="EDDTableFromOBIS" datasetID="..." active="..." >
<sourceUrl>...</sourceUrl>
<sourceCode>...</sourceCode>
<!-- If you read the XML response from the sourceUrl, the source code (e.g., GHMP)
is the value from one of the <resource><code> tags. -->
<accessibleTo>...</accessibleTo> <!-- 0 or 1 -->
<reloadEveryNMinutes>...</reloadEveryNMinutes>
<onChange>...</onChange> <!-- 0 or more -->
<longitudeSourceMinimum>...</longitudeSourceMinimum> <!-- OPTIONAL -->
<longitudeSourceMaximum>...</longitudeSourceMaximum> <!-- OPTIONAL -->
<latitudeSourceMinimum>...</latitudeSourceMinimum> <!-- OPTIONAL -->
<latitudeSourceMaximum>...</latitudeSourceMaximum> <!-- OPTIONAL -->
<altitudeSourceMinimum>...</altitudeSourceMinimum> <!-- OPTIONAL -->
<altitudeSourceMaximum>...</altitudeSourceMaximum> <!-- OPTIONAL -->
<timeSourceMinimum>...</timeSourceMinimum> <!-- OPTIONAL, in YYYY-MM-DDThh:mm:ss format -->
<timeSourceMaximum>...</timeSourceMaximum> <!-- OPTIONAL, in YYYY-MM-DDThh:mm:ss format -->
</dataset>
EDDTableFromSOS handles data from a
Sensor Observation Service
(SWE/SOS) server.
- This dataset type aggregates data from a group of stations which are all served by
one SOS server.
- The stations all serve the same set of variables
(although the source for each station doesn't have to serve all variables).
- SOS servers expect an XML request and return an XML response.
- It is not easy to generate the dataset XML for SOS datasets.
To find the needed information, you must visit the sourceUrl in a browser;
look at the XML; make a request by hand;
and look at the XML response to the request.
- SOS overview:
- SWE (Sensor Web Enablement) and SOS (Sensor Observation Service) are
OpenGIS® standards.
That web site has the standards documents.
- The OGC Web Services Common Specification ver 1.1.0 (OGC 06-121r3)
covers construction of GET and POST queries (see section 7.2.3 and section 9).
- If you send a getCapabilities XML request to a SOS server
(sourceUrl + "?service=SOS&request=GetCapabilities"),
you get an XML result
with a list of stations and the observedProperties that they have data for.
- An observedProperty is a formal URI reference to a property.
For example, urn:ogc:phenomenon:longitude:wgs84 or http://marinemetadata.org/cf#sea_water_temperature
- An observedProperty isn't a variable.
- More than one variable may have the same observedProperty
(for example, insideTemp and outsideTemp might both have
observedProperty http://marinemetadata.org/cf#air_temperature)
- If you send a getObservation XML request to a SOS server, you get an XML result
with descriptions of field names in the response, field units, and the data.
The field names will include longitude, latitude, depth (perhaps), and time.
- The dataType for each dataVariable may not be specified by the server.
If so, you must look at the XML data responses from the server and assign appropriate dataTypes
in the ERDDAP dataset dataVariable definitions.
- (At the time of writing this) some SOS servers respond to getObservation requests
for more than one observedProperty by just returning results for
the first of the observedProperties. (No error message!)
See the constructor parameter requestObservedPropertiesSeparately.
- See the
Phenomenon Dictionary.
- The skeleton XML for an EDDTableFromSOS dataset is:
<dataset type="EDDTableFromSOS" datasetID="..." active="..." >
<sourceUrl>...</sourceUrl>
<accessibleTo>...</accessibleTo> <!-- 0 or 1 -->
<reloadEveryNMinutes>...</reloadEveryNMinutes>
<onChange>...</onChange> <!-- 0 or more -->
<longitudeSourceName>...</longitudeSourceName>
<latitudeSourceName>...</latitudeSourceName>
<altitudeSourceName>...</altitudeSourceName>
<altitudeSourceMinimum>...</altitudeSourceMinimum>
<altitudeSourceMaximum>...</altitudeSourceMaximum>
<altitudeMetersPerSourceUnit>...</altitudeMetersPerSourceUnit>
<timeSourceName>...</timeSourceName>
<timeSourceFormat>...</timeSourceFormat>
<!-- timeSourceFormat MUST be either
* For numeric data: a UDUnits-compatible string (with the format "units since baseTime")
describing how to interpret source time values (e.g., "seconds since 1970-01-01T00:00:00"),
where the base time is an ISO 8601 formatted date time string (YYYY-MM-DDThh:mm:ss).
* For String data: an org.joda.time.format.DateTimeFormat string
(which is compatible with java.text.SimpleDateFormat) describing how to interpret
string times (e.g., the ISO8601TZ_FORMAT "yyyy-MM-dd'T'HH:mm:ssZ"). See
Joda Time or Java's SimpleDateFormat -->
<observationOfferingIdRegex>...</observationOfferingIdRegex>
<!-- Only observationOfferings with IDs (usually the station names) which match this
regular expression
will be included in the dataset (".+" will catch all station names) -->
<requestObservedPropertiesSeparately>true | false (the default)</requestObservedPropertiesSeparately>
<addAttributes>...</addAttributes>
<dataVariable>...</dataVariable> <!-- 1 or more; include the 'dataType' tag -->
</dataset>
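To check that a timeSourceFormat means what you think it does, you can convert a few sample source values by hand. This Python sketch handles only a few "units since baseTime" units and assumes the base time is UTC in YYYY-MM-DDThh:mm:ss form; it is not ERDDAP's parser. For String times, note that Joda/SimpleDateFormat patterns and Python's strptime codes differ: "yyyy-MM-dd'T'HH:mm:ssZ" corresponds roughly to "%Y-%m-%dT%H:%M:%S%z".

```python
from datetime import datetime, timedelta, timezone

SECONDS_PER = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}

def numeric_time_to_datetime(value, time_source_format):
    """Interpret a numeric source time using a "units since baseTime" string.

    Sketch only: a few units are handled, and the base time is assumed to be
    UTC in YYYY-MM-DDThh:mm:ss form.
    """
    units, sep, base = time_source_format.partition(" since ")
    if not sep or units not in SECONDS_PER:
        raise ValueError("unsupported timeSourceFormat: " + time_source_format)
    base_dt = datetime.strptime(base.strip(), "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    return base_dt + timedelta(seconds=value * SECONDS_PER[units])

print(numeric_time_to_datetime(86400, "seconds since 1970-01-01T00:00:00").isoformat())
print(numeric_time_to_datetime(1.5, "days since 2000-01-01T00:00:00").isoformat())

# A String time with a numeric UTC offset, converted to UTC.
s = datetime.strptime("2009-03-05T12:00:00-0800", "%Y-%m-%dT%H:%M:%S%z")
print(s.astimezone(timezone.utc).isoformat())
```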
EDDTableCopy makes and maintains a local copy of another EDDTable's
data and serves data from the local copy.
- EDDTableCopy (and for grid data, EDDGridCopy)
is a very easy-to-use and very effective
solution to some of the biggest problems with serving data from remote data sources:
- Accessing data from a remote data source can be slow.
- They may be slow because they are inherently slow (e.g., an inefficient type of server),
- because they are overwhelmed by too many requests,
- or because your server or the remote server is bandwidth limited.
- The remote dataset is sometimes unavailable (again, for a variety of reasons).
- Relying on one source for the data doesn't scale well (e.g., when many users and many ERDDAPs utilize it).
- EDDTableCopy solves these problems by automatically making and maintaining
a local copy of the data and serving data from the local copy.
ERDDAP can serve data from the local copy very, very quickly.
And making and using a local copy relieves the burden on the remote server.
And the local copy is a backup of the original, which is useful in case something happens to the original.
There is nothing new about making a local copy of a dataset. What is new here is that this class
makes it *easy* to create and *maintain* a local copy of data from a *variety* of types
of remote data sources and *add metadata* while copying the data.
- EDDTableCopy makes the local copy of the data by requesting chunks of data from the remote <dataset> .
EDDTableCopy determines which chunks to request by requesting the &distinct() values
for the <extractDestinationNames> (specified in the datasets.xml, see below),
which are the space-separated destination names of variables in the remote <dataset>.
For example, <extractDestinationNames>drifter profile</extractDestinationNames> might yield
distinct value combinations of drifter=tig17,profile=1017, drifter=tig17,profile=1095, ...
drifter=une12,profile=1223, drifter=une12,profile=1251, ....
In situations where one column (e.g., profile) may be all that is required to uniquely identify a group of rows of data,
if there are a very large number of, e.g., profiles, it may be useful to also specify an additional
extractDestinationName (e.g., drifter) which serves to subdivide the profiles.
That leads to fewer data files in a given directory, which may lead to faster access.
WARNING: EDDTableCopy REQUIRES that the data values for each chunk (each distinct value combination) not change.
If there are no suitable variables for <extractDestinationNames>, don't use EDDTableCopy.
For example, if the dataset has stations that receive data over a long period of time (and are still receiving data),
then EDDTableCopy can't be used because the data values in each chunk will change.
(Sorry, we hope to have a solution for this problem in the future.)
- Each chunk of data is stored in a separate netCDF file in a subdirectory of
<bigParentDirectory>/copy/datasetID/
(as specified in setup.xml).
There is one subdirectory level for all but the last extractDestinationName.
For example, data for tig17+1017 would be stored in <bigParentDirectory>/copy/sampleDataset/tig17/1017.nc.
For example, data for une12+1251 would be stored in <bigParentDirectory>/copy/sampleDataset/une12/1251.nc.
Directory and file names created from data values are modified to make them file-name-safe
(e.g., spaces are replaced by "x20") -- this doesn't affect the actual data.
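The directory layout above can be sketched as a function from one distinct-value combination to a file path. The encoding rule is an assumption that merely reproduces the documented example (a space becomes "x20"): every character outside [A-Za-z0-9_-] becomes "x" plus its 2-digit hex code. ERDDAP's exact rule may differ. The "/erddapData" directory and the function names are hypothetical.

```python
import os
import re

def safe_name(value):
    """Assumed file-name-safe encoding: " " -> "x20", "/" -> "x2f", etc."""
    return re.sub(r"[^A-Za-z0-9_\-]", lambda m: "x%02x" % ord(m.group(0)), value)

def chunk_path(big_parent_dir, dataset_id, values):
    """values: one distinct-value combination, in <extractDestinationNames> order.

    One subdirectory level per name except the last, which becomes the file name.
    """
    parts = [safe_name(str(v)) for v in values]
    return os.path.join(big_parent_dir, "copy", dataset_id, *parts[:-1], parts[-1] + ".nc")

print(chunk_path("/erddapData", "sampleDataset", ["tig17", 1017]))
print(chunk_path("/erddapData", "sampleDataset", ["Station A", 1251]))
```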
- Each time EDDTableCopy is reloaded, it checks the remote <dataset> to see what distinct chunks are available.
If the file for a chunk of data doesn't already exist, a request to get the chunk is added to a queue.
ERDDAP's taskThread processes all the queued requests for chunks of data, one-by-one.
You can see statistics for the taskThread's activity on the
Status Page and in the
Daily Report.
(Yes, ERDDAP could assign multiple tasks to this process, but that would use up
lots of the remote data source's bandwidth, memory, and CPU time, and
lots of the local ERDDAP's bandwidth, memory, and CPU time, neither of which is a good idea.)
NOTE: The very first time an EDDTableCopy is loaded, (if all goes well)
lots of requests for chunks of data will be added to the taskThread's queue,
but no local data files will have been created.
So the constructor will fail, but the taskThread will continue to work and create local files.
If all goes well, the taskThread will make some local data files and the next attempt to
reload the dataset (in ~15 minutes) will succeed, but initially with a very limited amount of data.
WARNING: If the remote dataset is large and/or the remote server is slow (that's the problem, isn't it?!),
it will take a long time to make a complete local copy.
In some cases, the time needed will be unacceptable.
For example, transmitting 1 TB of data over a T1 line (1.544 Mb/s, about 0.19 MB/s) takes at least 60 days, under optimal conditions.
Plus, it uses lots of bandwidth, memory, and CPU time on the remote and local computers.
The solution is to mail a hard drive to the administrator of the remote data set so that
s/he can make a copy of the dataset and mail the hard drive back to you.
Use that data as a starting point and EDDTableCopy will add data to it.
(That is how Amazon's EC2 Cloud Service handles the problem,
even though they have lots of bandwidth.)
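The 60-day figure follows from simple arithmetic. This Python sketch assumes 1 TB = 10^12 bytes and a T1 line's nominal rate of 1.544 Mbit/s. It ignores protocol overhead, so the result is a lower bound.

```python
def transfer_days(n_bytes: float, bits_per_second: float) -> float:
    """Lower bound on transfer time in days (link fully saturated, no overhead)."""
    seconds = n_bytes * 8 / bits_per_second
    return seconds / 86_400


T1_BPS = 1.544e6  # nominal T1 line rate: 1.544 Mbit/s
print(f"{transfer_days(1e12, T1_BPS):.0f} days")  # 60 days
```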
WARNING: If a given combination of values disappears from the remote dataset,
EDDTableCopy does NOT delete the local copied file. If you want to, you can delete it yourself.
- Recommended use:
- Create the <dataset> entry for the remote data source.
Get it working correctly, including all of the desired metadata.
- If it is too slow, add XML code to wrap it in an EDDTableCopy dataset.
- Use a different datasetID (perhaps add "c" to the beginning of the old datasetID).
- Copy the <accessibleTo>, <reloadEveryNMinutes> and <onChange> from the
remote EDDTable's XML to the EDDTableCopy's XML.
(Their values for EDDTableCopy matter; their values for the inner dataset become irrelevant.)
- <orderExtractBy> is an OPTIONAL space-separated list of destination variable names in the
remote <dataset>.
When each chunk of data is downloaded from the remote server, the chunk will be sorted by these variables
(by the first variable, then by the second variable if the first variable is tied, ...).
In some cases, ERDDAP will be able to extract data faster from the local data files
if the first variable in the list is a numeric variable ("time" counts as a numeric variable).
But choose these variables in a way that is appropriate for the dataset.
- ERDDAP will make and maintain a local copy of the data.
- If you want to force EDDTableCopy to re-get some or all of the data (perhaps you know that it changed),
you can delete one or more of the files in <bigParentDirectory>/copy/datasetID/ any time.
The files will be automatically recreated the next time the dataset is reloaded.
You might want to use ERDDAP's
flag
system to reload the dataset immediately.
If you do use a flag and you have an email subscription to the dataset, you will get two emails:
one when the dataset first reloads and starts to copy the data,
and another when the dataset loads again (automatically) and detects the new local data files.
- If you need to change any addAttributes or change the order of the variables associated with the source dataset:
- Change the addAttributes for the source dataset in datasets.xml, as needed.
- Delete one of the copied files.
- Set a flag
to reload the dataset immediately.
If you do use a flag and you have an email subscription to the dataset, you will get two emails:
one when the dataset first reloads and starts to copy the data,
and another when the dataset loads again (automatically) and detects the new local data files.
- The deleted file will be regenerated with the new metadata.
If the source dataset is ever unavailable, the EDDTableCopy dataset will get metadata
from the regenerated file, since it is the youngest file.
- Note that EDDGridCopy is very similar to EDDTableCopy,
but works with gridded datasets.
- The skeleton XML for an EDDTableCopy dataset is:
<dataset type="EDDTableCopy" datasetID="..." active="..." >
<accessibleTo>...</accessibleTo> <!-- 0 or 1 -->
<reloadEveryNMinutes>...</reloadEveryNMinutes>
<onChange>...</onChange> <!-- 0 or more -->
<extractDestinationNames>...</extractDestinationNames> <!-- 1 -->
<orderExtractBy>...</orderExtractBy> <!-- 0 or 1 -->
<dataset>...</dataset> <!-- 1 -->
</dataset>
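The <orderExtractBy> sort described above is an ordinary multi-key sort: by the first variable, then by the next variable to break ties, and so on. Here is a minimal Python sketch. It assumes a chunk is held as a list of dicts keyed by destination variable name, which is a hypothetical representation; ERDDAP stores tables differently.

```python
from operator import itemgetter


def order_extract_by(rows: list, order_by: str) -> list:
    """Sort rows by the space-separated names in order_by; later names break ties."""
    names = order_by.split()
    return sorted(rows, key=itemgetter(*names))


rows = [
    {"station": "B", "time": 2},
    {"station": "A", "time": 3},
    {"station": "A", "time": 1},
]
for row in order_extract_by(rows, "station time"):
    print(row)  # A/1, then A/3, then B/2
```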
Tag and Attribute Descriptions
Here are descriptions of some of the tags which are used by more than one dataset type.
- <accessibleTo>
is an OPTIONAL tag within a <dataset> tag that specifies a space-separated list of
roles which are allowed to have access to this dataset.
- This is part of ERDDAP's
security system
for restricting access to some datasets to some users.
- If this tag is not present, all users (even if they haven't logged in) will have access to this dataset.
- If this tag is present, this dataset will only be visible and accessible
to logged-in users who have one of the specified roles.
This dataset won't be visible to users who aren't logged in.
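The access rule above can be summarized in a few lines. This Python sketch uses a hypothetical `can_access` function. It assumes the tag's content is split on whitespace and that a user needs at least one matching role.

```python
from typing import Optional, Set


def can_access(accessible_to: Optional[str], user_roles: Optional[Set[str]]) -> bool:
    """accessible_to: the <accessibleTo> content, or None if the tag is absent.
    user_roles: the logged-in user's roles, or None if the user isn't logged in."""
    if accessible_to is None:
        return True  # no tag: everyone, even users who aren't logged in
    if user_roles is None:
        return False  # tag present: not visible unless logged in
    return not user_roles.isdisjoint(accessible_to.split())


print(can_access("role1 role2", {"role2"}))  # True
print(can_access("role1 role2", None))  # False
```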
- active="..." is an OPTIONAL attribute within the <dataset> tag
which indicates if a dataset is active (eligible for use in ERDDAP) or not.
- Valid values are "true" (the default) and "false".
- Since the default is "true", you don't need to use this attribute except
to use active="false"
to force a
dataset's removal
as soon as possible (if it is alive in ERDDAP)
and to tell ERDDAP not to try to load it in the future.
- <addAttributes>
is an OPTIONAL tag which lets ERDDAP administrators control
the metadata attributes associated with a dataset and its variables.
- The addAttributes tag encloses 0 or more att subtags,
which are used to specify individual attributes.
- Each attribute consists of a name and a value (which has a specific data type, e.g., double).
- There can be only one attribute with a given name.
If there are more, the last one has priority.
- The value can be a single value or a space-separated list of values.
- When ERDDAP loads a dataset, it gets the attributes ("sourceAttributes") that the data source provides.
For example, for OPeNDAP data sources, ERDDAP reads the data source's .das file to get the sourceAttributes.
- You can use the addAttributes tag (for the dataset and/or each variable) in a dataset's XML in
datasets.xml to modify the sourceAttributes.
- ERDDAP combines the sourceAttributes and the addAttributes (which have priority) to make
the "combinedAttributes", which are what ERDDAP users see.
- Thus, you can use addAttributes to redefine the values of sourceAttributes,
add new attributes, or remove attributes.
- Syntax
- The order of the att subtags within addAttributes is not important.
- The att subtag format is <att name="name" [type="type"] >value</att>
- If an att subtag has no value or a value of "null", that attribute will be removed from the combined attributes.
For example, <att name="rows" /> will remove "rows" from the combined attributes.
For example, <att name="coordinates">null</att> will remove "coordinates" from the combined attributes.
- The OPTIONAL "type" value for att subtags indicates the data type for the values. The default type is "string".
- Valid types for single values are
"byte", "unsignedShort", "short", "int", "long", "float", "double", and "string".
- Valid types for space-separated lists of values (or single values) are
"byteList", "unsignedShortList", "shortList", "intList", "longList", "floatList", "doubleList".
- There is no "stringList". Store the String values in a newline-separated String.
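The merging rules above can be sketched in Python with the standard library's ElementTree:

- addAttributes win.
- A duplicated name keeps the last att.
- An empty or "null" value removes the attribute.
- "type" controls conversion.

This is an illustration under stated assumptions, not ERDDAP's code. The function names are hypothetical, values are stripped of surrounding whitespace, and numeric types are mapped to Python int/float without range checks.

```python
import xml.etree.ElementTree as ET

# Scalar types from the list above; all except "string" also have a "...List" form.
_SCALAR = {"byte": int, "unsignedShort": int, "short": int, "int": int,
           "long": int, "float": float, "double": float, "string": str}
_LIST = {name + "List": conv for name, conv in _SCALAR.items() if name != "string"}


def convert(text: str, att_type: str):
    """Convert an att value according to its type (KeyError for an invalid type)."""
    if att_type in _LIST:
        return [_LIST[att_type](v) for v in text.split()]
    return _SCALAR[att_type](text)


def parse_add_attributes(xml_text: str) -> dict:
    """Map each att name to its value; None marks an attribute to remove.
    If a name appears more than once, the last att wins."""
    atts = {}
    for att in ET.fromstring(xml_text).findall("att"):
        text = (att.text or "").strip()
        if text in ("", "null"):
            atts[att.get("name")] = None
        else:
            atts[att.get("name")] = convert(text, att.get("type", "string"))
    return atts


def combine(source_atts: dict, add_atts: dict) -> dict:
    """addAttributes have priority over sourceAttributes; None removes an attribute."""
    combined = dict(source_atts)
    for name, value in add_atts.items():
        if value is None:
            combined.pop(name, None)
        else:
            combined[name] = value
    return combined


if __name__ == "__main__":
    add = parse_add_attributes(
        '<addAttributes><att name="rows" />'
        '<att name="valid_range" type="doubleList">-5 40</att></addAttributes>')
    print(combine({"rows": "10", "units": "K"}, add))  # {'units': 'K', 'valid_range': [-5.0, 40.0]}
```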
- Global Attributes -
The addAttributes tag within the dataset tag
is used to add attributes that apply to the entire dataset.
- Comments about global attributes that are special in ERDDAP:
- cdm_data_type - the
Common Data Model
data type for the dataset.
- Either the dataset's sourceAttributes or its addAttributes MUST include this attribute.
- All EDDGrid datasets automatically set this to "Grid".
- For EDDTable datasets, you MUST use "Other", "Point", "Station" or "Trajectory",
although some dataset types (like EDDTableFromObis) will set this automatically.
- (As of this writing) the CDM standard is still evolving.
- history is a multi-line string
with a line for every processing step that the data has undergone.
- Ideally, each line has an ISO-8601 formatted date and a description of the processing step.
- ERDDAP creates/modifies this automatically.
- If it already exists, ERDDAP will append its information to the existing information.
- "history" is important because it allows clients to backtrack to the original source of the data.
- "history" is used in the
CF and
netCDF
metadata standards.
- infoUrl - the URL of a web page with more information about this dataset
(usually at the source institution's web site).
- Either the dataset's sourceAttributes or its addAttributes MUST include this attribute.
- "infoUrl" is important because it allows clients to find out more about the data from the original source.
- ERDDAP displays a link to the infoUrl on the dataset's Data Access Form, Make A Graph web page,
and other web pages.
- If the URL has a query part (after the "?"), it MUST be already
percent encoded.
In practice, this can be very minimal percent encoding:
all you have to do is convert special characters in the right-hand-side values of any constraints:
% into %25, & into %26, " into %22, = into %3D, + into %2B, and space into %20 (or +)
and convert all characters above #126 to their %HH form (where HH is the 2-digit hex value).
Unicode characters above #255 must be UTF-8 encoded and then each byte must be converted to %HH form
(ask a programmer for help).
- Since datasets.xml is an XML file, you MUST also encode '&', '<', and '>'
in the URL as '&amp;', '&lt;', and '&gt;'.
- "infoUrl" is unique to ERDDAP. It is not from any metadata standard.
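The minimal percent encoding and XML escaping steps above can be sketched in Python. This follows the rules as written: a single %HH for #127-#255, and UTF-8 bytes for characters above #255. `encode_value` and `build_query_url` are hypothetical helpers. The standard library's `xml.sax.saxutils.escape` handles the '&', '<', '>' escaping.

```python
from xml.sax.saxutils import escape

# Characters that must be percent encoded in a constraint's right-hand-side value.
_SPECIAL = {"%": "%25", "&": "%26", '"': "%22", "=": "%3D", "+": "%2B", " ": "%20"}


def encode_value(value: str) -> str:
    """Minimal percent encoding of one constraint value, following the rules above."""
    out = []
    for ch in value:
        code = ord(ch)
        if ch in _SPECIAL:
            out.append(_SPECIAL[ch])
        elif code <= 126:
            out.append(ch)  # plain ASCII passes through
        elif code <= 255:
            out.append(f"%{code:02X}")  # #127-#255: one %HH
        else:
            # above #255: UTF-8 encode, then write each byte as %HH
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
    return "".join(out)


def build_query_url(base: str, params: dict) -> str:
    """Join name=value pairs with '&', encoding only the right-hand-side values."""
    return base + "?" + "&".join(f"{name}={encode_value(v)}" for name, v in params.items())


url = build_query_url("http://www.company.com/webService", {"department": "R&D", "param2": "value2"})
print(url)          # http://www.company.com/webService?department=R%26D&param2=value2
print(escape(url))  # http://www.company.com/webService?department=R%26D&amp;param2=value2
```

The second printed form is what goes into datasets.xml.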
- institution - the short version of the name of the institution
which is the source of this data (usually an acronym, usually <20 characters).
- Either the dataset's sourceAttributes or its addAttributes MUST include this attribute.
- ERDDAP displays the institution whenever it displays a list of datasets.
- If you add institution to the list of <categoryAttributes> in ERDDAP's
setup.xml file,
users can easily find datasets from the same institution via ERDDAP's
"Search for Datasets by Category" on the home page.
- "institution" is used in the
CF
metadata standard.
- sourceUrl is the URL of the source of the data.
- ERDDAP always creates this automatically.
- "sourceUrl" is important because it allows clients to backtrack to the original source of the data.
- "sourceUrl" is unique to ERDDAP. It is not from any metadata standard.
- summary - the long description of the dataset (usually <500 characters).
- Either the dataset's sourceAttributes or its addAttributes MUST include this attribute.
- "summary" is important because it allows clients to read a description of the dataset
that has more information than the title.
- ERDDAP displays the summary on the dataset's Data Access Form, Make A Graph web page, and other web pages.
- "summary" is used in the
CF
metadata standard.
- title - the short description of the dataset (usually <80 characters).
- Either the dataset's sourceAttributes or its addAttributes MUST include this attribute.
- "title" is important because every list of datasets presented by ERDDAP (other than search results)
lists the datasets in alphabetical order, by title.
So if you want to specify the order of datasets, or have some datasets grouped together,
you have to create titles with that in mind.
Many lists of datasets (e.g., in response to a category search) show a subset of the full list.
So the title for each dataset should stand on its own.
- When a client makes a graph, the title is put in the legend.
If the title is too long (> ~80 characters), the end of the title won't be visible(!).
- "title" is used in the
CF and
netCDF
metadata standards.
- When a user selects a subset of data, global attributes related to the subset's
longitude, latitude, altitude, and time ranges
(for example, Southernmost_Northing, Northernmost_Northing, time_coverage_start, time_coverage_end)
are generated automatically.
- A sample global addAttributes is:
<addAttributes>
<att name="Conventions">COARDS, CF-1.0, Unidata Dataset Discovery v1.0</att>
<att name="infoUrl">http://coastwatch.pfeg.noaa.gov/infog/PH_ssta_las.html</att>
<att name="institution">NOAA CoastWatch, West Coast Node</att>
<att name="title">SST, Pathfinder Ver 5.0, Day and Night, Global, Science Quality (1 Day Composite)</att>
<att name="cwhdf_version" />
</addAttributes>
- Variable Attributes -
The addAttributes tag within an axisVariable or dataVariable tag
is used to add attributes to a variable.
- <altitudeMetersPerSourceUnit>
is a number which is multiplied by the source altitude or depth values
to convert them into altitude values (in meters above sea level).
- For example, if the source is already measured in meters above sea level, use 1.
- For example, if the source is measured in meters below sea level, use -1.
- For example, if the source is measured in km above sea level, use 1000.
- This tag is OPTIONAL, but recommended. The default value is 1.
- An example is: <altitudeMetersPerSourceUnit>-1</altitudeMetersPerSourceUnit>
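Applying <altitudeMetersPerSourceUnit> is a single multiplication. This Python sketch uses a hypothetical helper. It covers the meters-below-sea-level case (-1) and, as an extra illustration, feet above sea level (0.3048 m per foot).

```python
def to_altitude_m(source_values, altitude_meters_per_source_unit=1.0):
    """Multiply each source altitude/depth by the factor to get meters above sea level."""
    return [v * altitude_meters_per_source_unit for v in source_values]


print(to_altitude_m([0, 10, 250], -1))  # depths in m below sea level -> [0, -10, -250]
print(to_altitude_m([100.0], 0.3048))   # feet above sea level -> about 30.48 m
```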
- <axisVariable> is used to describe a dimension
shared by the data variables in an EDDGrid dataset.
- datasetID="..." is a REQUIRED attribute within a <dataset> tag
which assigns a short (usually <15 characters), unique, identifying name to a dataset.
- Valid characters are A-Z, a-z, 0-9, _, -, and '.', but we recommend just alphanumeric characters.
- Best practices: We recommend using camelCase.
- Best practices: We recommend that the first part be an acronym or abbreviation of the source institution's name
and the second part be an acronym or abbreviation of the dataset's name.
When possible, we create a name which reflects the source's name for the dataset.
For example, we used datasetID="erdPHssta8day" for a dataset from
the NOAA NMFS SFSC Environmental Research Division
which is designated by the source to be satellite/PH/ssta/8day.
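A regular expression is a quick way to check a proposed datasetID against the rules and recommendations above. In this Python sketch, the function name is hypothetical. The 15-character limit is treated as a soft recommendation, and uniqueness across datasets.xml is not checked.

```python
import re

_VALID = re.compile(r"[A-Za-z0-9_.\-]+")


def check_dataset_id(dataset_id: str) -> list:
    """Return problems with a proposed datasetID (an empty list means none found)."""
    problems = []
    if not _VALID.fullmatch(dataset_id):
        problems.append("uses characters other than A-Z, a-z, 0-9, _, -, and .")
    elif not dataset_id.isalnum():
        problems.append("valid, but alphanumeric-only is recommended")
    if len(dataset_id) >= 15:
        problems.append("longer than the usual <15 characters")
    return problems


print(check_dataset_id("erdPHssta8day"))  # []
```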
- <dataVariable> is used to describe a data variable.
- <onChange>
specifies an action which will be done
when this dataset is created (when ERDDAP is restarted) and
whenever this dataset changes in any way.
Currently, for EDDGrid subclasses, any change to metadata or to an axis variable
(e.g., a new time point for near-real-time data) is considered a change,
but a reloading of the dataset is not considered a change (by itself).
Currently, for EDDTable subclasses, any reloading of the dataset is considered a change.
Currently, only two types of actions are allowed:
- http:// - If the action starts with "http://", ERDDAP will send
an HTTP GET request to the specified URL. The response will be ignored.
For example, the URL might tell some other web service to do something.
- If the URL has a query part (after the "?"), it MUST be already
percent encoded.
In practice, this can be very minimal percent encoding:
all you have to do is convert special characters in the right-hand-side values of any constraints:
% into %25, & into %26, " into %22, = into %3D, + into %2B, and space into %20 (or +)
and convert all characters above #126 to their %HH form (where HH is the 2-digit hex value).
Unicode characters above #255 must be UTF-8 encoded and then each byte must be converted to %HH form
(ask a programmer for help).
- Since datasets.xml is an XML file, you also need to encode '&', '<', and '>'
in the URL as '&amp;', '&lt;', and '&gt;'.
- Example: For a URL that you might type into a browser as:
http://www.company.com/webService?department=R%26D&param2=value2
You should specify an onChange tag via
<onChange>http://www.company.com/webService?department=R%26D&amp;param2=value2</onChange>
- mailto: - If the action starts with "mailto:", ERDDAP
will send an email to the subsequent email address
indicating that the dataset has been updated/changed.
Example: <onChange>mailto:john.smith@company.com</onChange>
If you have a good reason for ERDDAP to support some other type of action,
send us an email describing what you want.
This tag is OPTIONAL. There can be as many of these tags as you want.
This is analogous to ERDDAP's email/URL subscription system,
but these actions aren't stored persistently
(i.e., they are only stored in an EDD object).
To remove a subscription, just remove the onChange tag. The change will be noted the next time the dataset is reloaded.
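The two allowed action types can be sketched as a small dispatcher. This Python sketch is not ERDDAP's implementation. `run_on_change`, the `send_email` callback, and the injectable `http_get` are hypothetical. The sketch assumes the action string has already been read from datasets.xml, so '&amp;' has become '&'.

```python
from urllib.request import urlopen


def _http_get(url: str) -> None:
    """Send an HTTP GET request and discard the response."""
    with urlopen(url) as response:
        response.read()


def run_on_change(action: str, dataset_id: str, send_email, http_get=_http_get) -> str:
    """Carry out one <onChange> action and report which kind it was."""
    if action.startswith("http://"):
        http_get(action)  # the response is ignored
        return "http"
    if action.startswith("mailto:"):
        address = action[len("mailto:"):]
        send_email(address, f"ERDDAP dataset {dataset_id} changed")
        return "mailto"
    raise ValueError(f"unsupported onChange action: {action!r}")
```

Passing `http_get` and `send_email` in as parameters keeps the sketch testable without a network or mail server.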
- <reloadEveryNMinutes>
indicates how often the dataset should be reloaded.
- Generally, datasets that are updated frequently should be reloaded frequently,
for example, every 60 minutes.
- Datasets that are updated infrequently should be reloaded infrequently,
for example, every 1440 minutes (daily) or 10080 minutes (weekly).
- This tag is OPTIONAL, but recommended. The default is 10080.
- An example is: <reloadEveryNMinutes>1440</reloadEveryNMinutes>
- Note that when a dataset is reloaded, all files in the
bigParentDirectory/cache/datasetID directory are deleted.
- No matter what this is set to, a dataset won't be loaded more frequently than
<loadDatasetsMinMinutes> (default = 15), as specified in
setup.xml.
So if you want datasets to be reloaded very frequently,
you need to set both reloadEveryNMinutes and loadDatasetsMinMinutes to small values.
- Don't set reloadEveryNMinutes to the same value as loadDatasetsMinMinutes,
because the elapsed time is likely to be (for example) 14:58 or 15:02,
so about half the time the dataset won't be reloaded. Use a smaller reloadEveryNMinutes value (e.g., 10).
- Regardless of reloadEveryNMinutes, you can tell ERDDAP to reload a specific dataset
as soon as possible via a
flag file.
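A small simulation shows the 14:58 versus 15:02 problem. This Python sketch assumes that each load cycle reloads a dataset only when the minutes since its last reload are >= reloadEveryNMinutes. That comparison is a hypothetical model of ERDDAP's behavior, and the cycle times are made-up values near 15-minute intervals.

```python
def reload_times(cycle_minutes, reload_every_n_minutes):
    """Return the cycle times at which the dataset is reloaded, assuming a reload
    happens only when minutes since the last reload >= reload_every_n_minutes."""
    reloads = [cycle_minutes[0]]  # the dataset is loaded on the first cycle
    for t in cycle_minutes[1:]:
        if t - reloads[-1] >= reload_every_n_minutes:
            reloads.append(t)
    return reloads


# Load cycles roughly every 15 minutes (about 14:58, 30:02, ...), in minutes since the first load.
cycles = [0.0, 14.97, 30.03, 44.98, 60.02]
print(reload_times(cycles, 15))  # [0.0, 30.03, 60.02]: only every other cycle
print(reload_times(cycles, 10))  # every cycle
```

With reloadEveryNMinutes=15, the dataset is effectively reloaded only about every 30 minutes. With 10, it is reloaded on every cycle.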
- <requestBlacklist>
is an optional tag which contains a comma-separated list of numeric IP addresses which will be immediately blacklisted.
You can also replace the last number in an IP address with * to block 0-255 (e.g., 123.45.67.*).
Any request from one of these addresses will receive an HTTP Error 403: Forbidden.
This can be used to fend off a
Denial of Service attack or an overly zealous
web robot.
For example, <requestBlacklist>123.45.67.89, 98.76.54.32, 123.45.68.*</requestBlacklist>
See your ERDDAP daily report for a list/tally of the most active allowed and blocked requesters.
You can try to convert the IP numbers to domain names with free reverse-DNS web services such as
http://www.hcidata.info/host2ip.htm.
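Matching a request against the list, including the trailing-* wildcard, can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes plain IPv4 dotted-decimal strings.

```python
def parse_blacklist(text: str) -> list:
    """Split the comma-separated <requestBlacklist> content into entries."""
    return [entry.strip() for entry in text.split(",") if entry.strip()]


def is_blacklisted(ip: str, blacklist: list) -> bool:
    """True if ip equals an entry, or starts with an entry's prefix before a trailing '*'."""
    for entry in blacklist:
        if entry.endswith(".*"):
            # Keep the final '.', so "123.45.68.*" matches 123.45.68.0-255 but not 123.45.680.x
            if ip.startswith(entry[:-1]):
                return True
        elif ip == entry:
            return True
    return False


blacklist = parse_blacklist("123.45.67.89, 98.76.54.32, 123.45.68.*")
print(is_blacklisted("123.45.68.200", blacklist))  # True
print(is_blacklisted("123.45.67.90", blacklist))   # False
```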
- <sourceCanConstrainStringEQNE>
indicates if the source can constrain String variables with the = and != operators.
- For EDDTableFromDapSequence, this applies to the outer sequence String variables only.
It is assumed that the source can't handle any constraints on inner sequence variables.
- Valid values are "true" (the default) and "false".
- This tag is OPTIONAL. The default is "true".
- For EDDTableFromDapSequence OPeNDAP DRDS servers, this should be set to true (the default).
- For EDDTableFromDapSequence Dapper servers, this should be set to false.
- An example is: <sourceCanConstrainStringEQNE>true</sourceCanConstrainStringEQNE>
- <sourceCanConstrainStringGTLT>
indicates if the source can constrain String variables with the <, <=, >, and >= operators.
- For EDDTableFromDapSequence, this applies to the outer sequence String variables only.
It is assumed that the source can't handle any constraints on inner sequence variables.
- Valid values are "true" (the default) and "false".
- This tag is OPTIONAL. The default is "true".
- For EDDTableFromDapSequence OPeNDAP DRDS servers, this should be set to true (the default).
- For EDDTableFromDapSequence Dapper servers, this should be set to false.
- An example is: <sourceCanConstrainStringGTLT>true</sourceCanConstrainStringGTLT>
- <sourceCanConstrainStringRegex>
indicates if the source can constrain String variables by regular expressions,
and if so, what the operator is.
- Valid values are "=~" (the DAP standard), "~=" (mistakenly supported by many DAP servers),
or "" (indicating that the source doesn't support regular expressions).
- This tag is OPTIONAL. The default is "".
- For EDDTableFromDapSequence OPeNDAP DRDS servers, this should be set to "" (the default).
- For EDDTableFromDapSequence Dapper servers, this should be set to "" (the default).
- An example is: <sourceCanConstrainStringRegex>=~</sourceCanConstrainStringRegex>
- <sourceUrl> specifies the URL of the source of the data.
- For most dataset types, this is REQUIRED. For others, it is not allowed.
See the dataset type's description for details.
- For most datasets, this is the base of the URL that is used to request data.
- For example, for DAP servers, this is the URL to which .dods, .das, .dds, or .html could be added.
- If the URL has a query part (after the "?"), it MUST be already
percent encoded.
In practice, this can be very minimal percent encoding:
all you have to do is convert special characters in the right-hand-side values of any constraints:
% into %25, & into %26, " into %22, = into %3D, + into %2B, and space into %20 (or +)
and convert all characters above #126 to their %HH form (where HH is the 2-digit hex value).
Unicode characters above #255 must be UTF-8 encoded and then each byte must be converted to %HH form
(ask a programmer for help).
- Since datasets.xml is an XML file, you MUST also encode '&', '<', and '>'
in the URL as '&amp;', '&lt;', and '&gt;'.
- An example is:
<sourceUrl>http://dapper.pmel.noaa.gov/dapper/epic/tao_time_series.cdp</sourceUrl>
- <subscriptionEmailBlacklist>
is an optional tag which contains a comma-separated list of email addresses which are
immediately blacklisted from the
subscription system, for example
<subscriptionEmailBlacklist>bob@badguy.com, john@badguy.com</subscriptionEmailBlacklist> .
If an email address on the list has subscriptions, the subscriptions will be cancelled.
If an email address on the list tries to subscribe, the request will be refused.
- <user>
is an OPTIONAL tag within an <erddapDatasets> tag that identifies a user's
username, password, and roles (a comma-separated list).
- This is part of ERDDAP's
security system
for restricting access to some datasets to some users.
- Make a separate user tag for each user.
- If setup.xml's <authentication> is openid, the username is always the user's OpenID URL
and no password is used.
- setup.xml's <passwordEncoding> determines how the password is stored here.
- The comma-separated list of roles specifies which roles you are assigning to the user.
The "admin" role has a special meaning -- it identifies the ERDDAP administrator.
- The user will then have access to datasets that list one of these roles in the dataset's
accessibleTo tag.
- Thus, this is
role-based access control.
- If there is no user tag for a client, that client will only be able to access
datasets which don't have an accessibleTo tag.
- If setup.xml's <authentication> is "custom", you need to specify the user and the password
(at least 7 characters long).
You can choose to store the password in different encodings (in order of increasing security):
plaintext, MD5 (MD5(password)),
or UEPMD5 (MD5(UserName:ERDDAP:password), the default).
For UEPMD5, the UserName and "ERDDAP" are used to
salt the hash value,
making it more difficult to decode.
Plaintext passwords are case sensitive.
(See <passwordEncoding> in setup.xml.)
For example (using UEPMD5):
<user username="jsmith" password="57AB7ACCEB545E0BEB46C4C75CEC3C30"
roles="role1, role2" />
where the password is generated from md5 -djsmith:ERDDAP:actualPassword
On Windows, you can generate MD5 digest values by downloading an MD5 program (such as
MD5)
and using (for example): md5 -djsmith:ERDDAP:actualPassword
On Linux/Unix, you can generate MD5 digest values by using the built-in md5sum program (for example):
echo -n "jsmith:ERDDAP:actualPassword" | md5sum
- If setup.xml's <authentication> is "openid", specify the user's OpenID URL
as the "username" and don't specify a password.
For example <user username="http://jsmith.myopenid.com/" roles="role1, role2" />
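The UEPMD5 digest can also be computed with Python's standard hashlib module. This minimal sketch assumes the username and password are UTF-8 encoded. It prints lowercase hex, as md5sum does, while the sample above is uppercase, so check which form your ERDDAP version expects.

```python
import hashlib


def uepmd5(username: str, password: str) -> str:
    """Hex MD5 digest of "username:ERDDAP:password" (the UEPMD5 scheme described above)."""
    salted = f"{username}:ERDDAP:{password}"
    return hashlib.md5(salted.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Same input as: echo -n "jsmith:ERDDAP:actualPassword" | md5sum
    print(uepmd5("jsmith", "actualPassword"))
```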
DISCLAIMER OF ENDORSEMENT
Any reference obtained from this server to a specific commercial product,
process, or service does not constitute or imply an endorsement by CoastWatch,
NOAA, or the United States Government of the product, process, or service, or
its producer or provider. The views and opinions expressed in any referenced
document do not necessarily state or reflect those of CoastWatch, ERD,
NOAA, or the United States Government.
DISCLAIMER FOR EXTERNAL LINKS
The appearance of external links on this World Wide Web site does not
constitute endorsement by the
Department of Commerce/National
Oceanic and Atmospheric Administration
of external Web sites or the information, products or services contained
therein. For other than authorized activities, the Department of Commerce/NOAA does not
exercise any editorial control over the information you may find at these locations. These
links are provided consistent with the stated purpose of this Department of Commerce/NOAA
Web site.
DISCLAIMER OF LIABILITY
Neither the data providers, ERD, CoastWatch, NOAA, nor the United States Government,
nor any of their employees or contractors, makes any warranty, express or implied,
including warranties of merchantability and fitness for a particular purpose,
or assumes any legal liability for the accuracy, completeness, or usefulness
of any information at this site.
USAGE LIMITATIONS
The SeaWiFS images and data from this site may be used for free, but not
redistributed; all other images and data from this site may be used and
redistributed for free.
CONTACT
Please email questions, comments, or
suggestions regarding this web page to
bob.simons@noaa.gov.
We would really appreciate knowing if you use ERDDAP.
If a data request doesn't work and you think it should,
please send the request URL?query (the String from the
"Just generate the URL" textfield in the Data Access Form).