Set Up Your Own ERDDAP
And see the related document
Working with the datasets.xml File
ERDDAP is an all-open source, all-Java (servlet), web application that runs in a web application server
(for example, Tomcat). This web page is mostly for people ("ERDDAP administrators") who want to set up
their own ERDDAP installation at their own web site.
Why use ERDDAP to distribute your data?
Because the small effort to set up ERDDAP brings many benefits.
- If you already have a web service for distributing your data,
you can set up ERDDAP to access your data via the existing service.
Or, you can set up ERDDAP to access your data directly from local files.
- For each dataset, you only have to write a small chunk of XML to tell ERDDAP how to access the dataset.
- Once you have ERDDAP serving your data, end users can:
- Request the data in various ways (DAP, WMS, and more in the future).
- Get the data response in various file formats. (That's probably the biggest reason!)
- Make graphs and maps. (Everyone likes pretty pictures.)
You can customize your ERDDAP's appearance so ERDDAP fits in with the rest of your web site.
Is the installation procedure hard? Can I do it?
The initial installation takes some time, but it isn't very hard. You can do it.
If you get stuck, email me at bob dot simons at noaa dot gov . I will help you.
How To Do the Initial Setup of ERDDAP on Your Server
ERDDAP can run on any server that supports Java and Tomcat (and perhaps other application servers).
ERDDAP has been tested on Linux, Mac, and Windows computers.
ERDDAP may have problems when running on Windows: notably, ERDDAP may be unable to delete
and/or rename files quickly. This is probably due to antivirus software (e.g., from McAfee
and Norton) which is checking the files for viruses. If you run into this problem (which
can be seen by error messages in the log.txt file like "Unable to delete ..."), changing the
antivirus software's settings may partially alleviate the problem. Or consider using a Linux
or Mac server instead.
- Set up Java.
Type "java -version" from your server's command line to make sure you have
Java
(JRE or JDK) version 1.7 installed.
For security reasons, it is almost always best to use the latest version of Java.
ERDDAP works with 32 bit or 64 bit Java. 64 bit is preferred for 64-bit operating systems.
On Linux, we recommend that you download and install Java even if your computer came with Java
installed. This lets you be in control of which Java you have and where it is (/usr/local?).
To install Java on Linux, see these
instructions.
This version of ERDDAP will work with Java 6; however, we recommend against using Java 6.
Java 6 is past its official end of life and so is no longer supported by Oracle.
As Oracle's Java SE 6 Downloads web page says: "WARNING: These older versions
of the JRE and JDK are provided to help developers debug issues in older systems.
They are not updated with the latest security patches and are not recommended for
use in production."
- Set up Tomcat.
For security reasons, it is almost always best to use the latest version of Tomcat.
Below, the Tomcat directory will be referred to as [tomcat] .
Warning! If you already have a Tomcat running some other web application (especially THREDDS),
we recommend that you install ERDDAP in a second Tomcat,
because ERDDAP may need different settings
and shouldn't have to contend with other applications for memory.
- Follow the instructions at
http://tomcat.apache.org/
to set up Tomcat on your server.
On Linux, we recommend installing it in /usr/local.
On a Mac, it is probably already installed in /Library/Tomcat.
On Windows, select the "32-bit/64-bit Windows Service Installer".
It will be installed in c:/Program Files.
- On Windows, run Tomcat Manager and right click on the Tomcat Manager icon in the system tray;
select Configure : Java.
- Set the Java Virtual Machine directory, e.g.,
C:\Program Files\Java\jre7\bin\client\jvm.dll.
- Set Initial Memory Pool and
Maximum Memory Pool to 1200 MB on 32-bit Windows computers or
up to 1/2 of physical memory on 64-bit Windows computers. On 32-bit computers, going much higher than 1200 MB will probably fail.
- On Linux and Macs, Tomcat is often set up as belonging to
user "tomcat". Set up that account now.
From the parent of the apache-tomcat directory, type
chown -R tomcat apache-tomcat-7.0.32
chgrp -R tomcat apache-tomcat-7.0.32
(but substitute the actual name of your tomcat directory) right after unpacking Tomcat.
Do most of the rest of the setup instructions as user "tomcat".
Later, run the startup.sh and shutdown.sh scripts as user "tomcat" so that Tomcat has
permission to write to its log files.
- On Linux and Macs,
create a file [tomcat]/bin/setenv.sh
(or in Red Hat Enterprise Linux (RHEL), edit ~tomcat/conf/tomcat7.conf)
to set environmental variables.
This file will be used by [tomcat]/bin/startup.sh and shutdown.sh.
The file should contain
export JAVA_HOME=/usr/local/jre1.7.0_09
export JAVA_OPTS='-server -Djava.awt.headless=true -Xmx1500M -Xms1500M'
export TOMCAT_HOME=/usr/local/apache-tomcat-7.0.32
export CATALINA_HOME=/usr/local/apache-tomcat-7.0.32
(but substitute the directory names from your computer).
(If you previously set JRE_HOME, you can remove that.)
On Macs, you probably don't need to set JAVA_HOME.
The -Xmx and -Xms memory settings are important because ERDDAP works better with more
memory. Always set -Xms to the same value as -Xmx.
- For 32 bit Operating Systems and 32 bit Java:
The more physical memory in the server the better: 4+ GB is really good, 2 GB is okay,
less is not recommended. With 32 bit Java, even with abundant physical memory,
Tomcat and Java won't run if you try to set -Xmx much above 1500M (1200M on some computers).
If your server has less than 2GB of memory, reduce the -Xmx value (in 'M'egaBytes)
to 1/2 of the computer's physical memory.
- For 64 bit Operating Systems and 64 bit Java:
64 bit Java will only work on a 64 bit operating system.
To enable 64 bit Java, add -d64 to the list of JAVA_OPTS in startup.sh and shutdown.sh, for example,
export JAVA_OPTS='-server -Djava.awt.headless=true -Xmx8000M -Xms8000M -d64'
With 64 bit Java, Tomcat and Java can use very high -Xmx and -Xms settings.
The more physical memory in the server the better. As a simplistic suggestion:
we recommend you set -Xmx and -Xms (in 'M'egaBytes) to 1/2 (or less) of the computer's
physical memory.
You can see if Tomcat, Java, and ERDDAP are indeed running in 64 bit mode by searching for
" bit," in ERDDAP's Daily Report email or in the
<bigParentDirectory>/logs/log.txt file (as specified in
setup.xml).
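The memory guidance above can be turned into a small calculation. The following Python sketch is only an illustration of those rules of thumb; the function names suggest_heap_mb and java_opts are hypothetical and are not part of ERDDAP or Tomcat.

```python
def suggest_heap_mb(physical_mb: int, java_64bit: bool) -> int:
    """Suggest one value (in megabytes) to use for both -Xmx and -Xms."""
    half = physical_mb // 2
    if java_64bit:
        # 64 bit Java: 1/2 (or less) of the computer's physical memory.
        return half
    # 32 bit Java: servers with less than 2 GB use 1/2 of physical memory;
    # otherwise stay at 1500M, since much higher values usually fail.
    if physical_mb < 2048:
        return half
    return 1500


def java_opts(physical_mb: int, java_64bit: bool) -> str:
    """Build a JAVA_OPTS value in the same form as the examples above."""
    heap = suggest_heap_mb(physical_mb, java_64bit)
    opts = f"-server -Djava.awt.headless=true -Xmx{heap}M -Xms{heap}M"
    if java_64bit:
        opts += " -d64"
    return opts


if __name__ == "__main__":
    # Example: a 64 bit server with 16000 MB of physical memory.
    print("export JAVA_OPTS='" + java_opts(16000, True) + "'")
```

On some 32 bit computers the practical limit is nearer 1200M, so treat the result as a starting point and lower it if Tomcat won't start.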
- On Linux and Macs, change the permissions of all
*.sh files in [tomcat]/bin/ to be executable by the owner, e.g., with
chmod +x *.sh
- Fonts for images: We strongly prefer the free Vera Sans fonts
to the standard Linux/Java fonts. Installing these fonts isn't required.
If you don't install these fonts, you need to change the fontFamily setting in
setup.xml to <fontFamily>SansSerif</fontFamily> .
To install the fonts, please download
BitstreamVeraSans.zip
(344,753 bytes, MD5=E16AF0C9838FD2443434F6E0E9FD0A0D)
and unzip the font files to a temporary directory.
- On Linux (as the root user) and Windows XP (as the administrator),
copy the font files into <JAVA_HOME>/lib/fonts
so Java can find the fonts.
Remember: if/when you later upgrade to a newer version of Java, you need to reinstall
these fonts.
- On Macs, for each font file, double click on it and then click Install Font.
- On Windows Vista and 7, in Windows Explorer, select all of the font files.
Right click. Click on Install.
- Test your Tomcat installation.
- Linux:
As user "tomcat", run [tomcat]/bin/startup.sh
View your URL + ":8080/" in your browser (e.g.,
http://coastwatch.pfeg.noaa.gov:8080/).
You should see the Tomcat "Congratulations" page.
If there is trouble, see the Tomcat log file [tomcat]/logs/catalina.out.
- Mac (run tomcat as the system administrator user):
Run [tomcat]/bin/startup.sh
View your URL + ":8080/" in your browser (e.g.,
http://coastwatch.pfeg.noaa.gov:8080/).
You should see the Tomcat "Congratulations" page.
If there is trouble, see the Tomcat log file [tomcat]/logs/catalina.out.
- Windows localhost:
Right click on the Tomcat icon in the system tray, and choose "Start service".
View http://127.0.0.1:8080/ in your browser.
You should see the Tomcat "Congratulations" page.
If there is trouble, see the Tomcat log file [tomcat]/logs/catalina.out.
- Troubles with the Tomcat installation?
- See the Tomcat log file [tomcat]/logs/catalina.out.
Tomcat problems are almost always indicated there.
- See the Tomcat web site or search the web for help,
but please let us know the problems you had and the solutions you found.
- Email me at bob dot simons at noaa dot gov . I will try to help you.
- Set up the [tomcat]/content/erddap configuration files.
On Linux, Mac, and Windows, download
erddapContent.zip
(version 1.42, size=72,146 bytes, MD5=4EC605978B939835947A9D551C947E97)
and unzip it into [tomcat], creating [tomcat]/content/erddap .
For Red Hat Enterprise Linux (RHEL), unzip it into ~tomcat
and set the system property
erddapContentDirectory=~tomcat/content/erddap
in ~tomcat/conf/tomcat7.conf so ERDDAP can find the directory.
Then,
- Read the comments in [tomcat]/content/erddap/setup.xml
and make the requested changes.
setup.xml is the file with all of the settings which specify how your ERDDAP behaves.
- Read the comments in
Working with the datasets.xml File, then modify the XML in
[tomcat]/content/erddap/datasets.xml to specify all of the datasets
you want your ERDDAP to serve.
- (Unlikely) Now or (slightly more likely) in the future,
if you want to modify erddap's CSS file,
make a copy of [tomcat]/content/erddap/images/erddapStart.css called erddap.css
and then make changes to it.
Changes to erddap.css only take effect when ERDDAP is restarted
and often also require the user to clear the browser's cached files.
After you edit the .xml files, it is a good idea to verify that the result is well-formed XML
by pasting the XML text into an XML checker like
RUWF.
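If you prefer to check the files locally, a few lines of Python can do the same well-formedness test. This is a minimal sketch using Python's standard xml.etree.ElementTree parser; the function name check_well_formed is hypothetical. It only checks that the XML parses; it does not check that the settings make sense to ERDDAP.

```python
import xml.etree.ElementTree as ET


def check_well_formed(path: str) -> str:
    """Return "" if the file parses as XML, else the parser's error message."""
    try:
        ET.parse(path)
        return ""
    except ET.ParseError as err:
        # The message includes the line and column of the first problem.
        return str(err)


if __name__ == "__main__":
    import sys

    # Usage: python check_xml.py setup.xml datasets.xml
    for name in sys.argv[1:]:
        problem = check_well_formed(name)
        print(name, "is well-formed" if not problem else "ERROR: " + problem)
```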
In the unusual situation where you aren't allowed to modify the Tomcat directory,
you can put the ERDDAP content directory somewhere else (e.g., /usr/local/erddap).
To let ERDDAP know where it is,
set the system property erddapContentDirectory=/usr/local/erddap/content/erddap/
(or wherever it is).
If you aren't allowed to set this property in startup.sh, perhaps you can set it in Tomcat's context.xml.
- Install the erddap.war file.
On Linux, Mac, and Windows, download
erddap.war
(version 1.42, size=477,356,261 bytes, MD5=EC29F6D9E6185BC71557CF47063E3276)
into [tomcat]/webapps .
The .war file is big because it contains high resolution coastline, boundary,
and elevation data needed to create maps.
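Because the download is large, it may be worth comparing the file's MD5 with the value listed above before installing it. The following Python sketch uses the standard hashlib module; the function names are hypothetical, and the published value is whatever appears next to the download link.

```python
import hashlib


def md5_hex(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the file's MD5 as uppercase hex, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def matches_published_md5(path: str, published: str) -> bool:
    """Compare the file's MD5 with the published value, ignoring case."""
    return md5_hex(path) == published.strip().upper()
```

For example, matches_published_md5("erddap.war", "EC29F6D9E6185BC71557CF47063E3276") should return True for an intact copy of the version 1.42 file; if it returns False, download the file again.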
- Use ProxyPass so users don't have to put :8080 in the URL.
On Linux computers, if Tomcat is running in Apache, you need to modify the
/etc/httpd/conf/httpd.conf
file to allow HTTP traffic to/from ERDDAP:
- Add the following lines right before #<IfModule mod_proxy.c> :
ProxyPass /erddap http://www.YourServer.org:8080/erddap
ProxyPassReverse /erddap http://www.YourServer.org:8080/erddap
- Then restart Apache: /usr/sbin/apachectl restart (but sometimes it is in a different directory).
- Start Tomcat.
If Tomcat is already running, use [tomcat]/bin/shutdown.sh to shut down Tomcat.
Use [tomcat]/bin/startup.sh to start Tomcat.
Or, if you use the Tomcat Web Application Manager:
- Download erddap.war
into a temporary directory on your computer.
- Use "Select WAR file to upload" to pick the erddap.war file.
- Click on "Deploy".
- Is ERDDAP running?
Hopefully, you can now use a browser to view <YourServer'sURL>/erddap/
and see ERDDAP within a minute (the first time is slow).
ERDDAP starts up without any datasets loaded.
Datasets are loaded in a background thread and so become available one-by-one.
- Troubles installing Tomcat or ERDDAP?
Email me at bob dot simons at noaa dot gov . I will help you.
How To Do an Update of an Existing ERDDAP on Your Server
- Download
erddapContent.zip
(version 1.42, size=72,146 bytes, MD5=4EC605978B939835947A9D551C947E97)
and unzip it into a temporary directory.
- If you made changes to your previous copy of messages.xml,
make the same changes to the new messages.xml in the temporary directory.
- Move messages.xml from the temporary directory to be
[tomcat]/content/erddap/messages.xml .
- Keep using your current setup.xml and datasets.xml.
- Make the changes listed in
Changes
in the section entitled "Things ERDDAP Administrators Need to Know and Do"
for all of the ERDDAP versions since the version you were using.
- Download erddap.war
(version 1.42, size=477,356,261 bytes, MD5=EC29F6D9E6185BC71557CF47063E3276)
into a temporary directory.
- In Tomcat Manager:
- "Undeploy" ERDDAP.
- "Deploy" the erddap.war file.
Or from a Linux or Mac command line:
- In [tomcat]/bin, use ./shutdown.sh
- Use ps -u tomcatUser to ensure that the java/tomcat thread has stopped.
- In [tomcat]/webapps, use rm -r erddap
- In [tomcat]/webapps, use rm erddap.war
- Copy the new erddap.war file from the temporary directory to [tomcat]/webapps .
- In [tomcat]/bin, use ./startup.sh
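The file-handling steps in the command line procedure (the two rm commands and the copy) can also be scripted. This Python sketch assumes Tomcat has already been shut down and has fully stopped; the function name replace_war is hypothetical.

```python
import os
import shutil


def replace_war(tomcat_dir: str, new_war: str) -> str:
    """Swap in a new erddap.war. Call only after Tomcat has fully stopped."""
    webapps = os.path.join(tomcat_dir, "webapps")
    expanded = os.path.join(webapps, "erddap")
    installed_war = os.path.join(webapps, "erddap.war")
    if os.path.isdir(expanded):
        shutil.rmtree(expanded)       # same as: rm -r erddap
    if os.path.exists(installed_war):
        os.remove(installed_war)      # same as: rm erddap.war
    # Copy the new file from the temporary directory into [tomcat]/webapps.
    return shutil.copy(new_war, installed_war)
```

Shutting down and starting Tomcat are deliberately left out, so you still run ./shutdown.sh before and ./startup.sh after.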
- Troubles updating ERDDAP?
Email me at bob dot simons at noaa dot gov . I will help you.
- Status Page -
You can view the status of your ERDDAP from any browser by going to
<baseUrl>/erddap/status.html
- This page is generated dynamically, so it always has up-to-the-moment statistics for your ERDDAP.
- It includes statistics regarding the number of requests, memory usage, thread stack traces,
the taskThread, etc.
- Because the Status Page can be viewed by anyone, it doesn't include quite as much information
as the Daily Report.
- log.txt -
When you are first installing ERDDAP or any time something isn't working as expected,
it is very useful to look at the diagnostic messages in the ERDDAP log file.
- The log file is the log.txt file in [bigParentDirectory]/logs
(as specified in setup.xml).
- When the log.txt file gets to 5 MB, it is renamed log.txt.previous and a new log.txt file is created.
So log files don't accumulate.
- Whenever you restart ERDDAP, it makes an archive copy of the log.txt and log.txt.previous files with
a time stamp. If there was trouble before the restart, it may be useful to analyze these archived
files for clues as to what the trouble was.
- You can delete the archive files if they are no longer needed.
- Types of diagnostic messages in the log file:
- The word "error" is used when something went so wrong that the procedure failed to complete.
Although it is annoying to get an error, the error forces you to deal with the problem.
Our thinking is that it is better to throw an error, than to have ERDDAP hobble along,
working in a way you didn't expect.
- The word "warning" is used when something went wrong, but the procedure was able to complete.
These are pretty rare.
- Anything else is just an informative message.
You can control how much information is logged with <logLevel> in
setup.xml.
- Tomcat Shutdown Sometimes Slow -
On Linux and Macs, when you use
[tomcat]/bin/shutdown.sh
to
shut down Tomcat (and ERDDAP), Tomcat (and ERDDAP) may not be completely shut down
for a few minutes.
The reason is: ERDDAP sends a message to its background threads
to tell them to stop, but sometimes it takes these threads a few minutes
to get to a good stopping place.
So, you should always use
ps -u tomcatUser
(or with other switches)
to ensure that the java/tomcat thread has stopped running before using
[tomcat]/bin/startup.sh
to restart tomcat (and ERDDAP).
If you don't want to wait for Tomcat (and ERDDAP) to stop by itself,
you can use
kill -9 processID
to force the tomcat process to stop immediately.
(Sorry, I don't know the analogous commands for Windows servers.)
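If you restart Tomcat from a script, the ps check can be automated. The following Python sketch runs ps for the Tomcat user and reports any java processes that are still present. It assumes a Linux or Mac ps that accepts -u and -o pid=,comm= and assumes the Tomcat process shows up with a command name ending in "java"; the function names are hypothetical.

```python
import subprocess
from typing import List


def java_pids(ps_output: str) -> List[int]:
    """Parse `ps -o pid=,comm=` output and return PIDs whose command is java."""
    pids = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        # comm may be just "java" or a full path that ends in "java".
        if len(parts) == 2 and parts[1].strip().endswith("java"):
            pids.append(int(parts[0]))
    return pids


def tomcat_pids(user: str) -> List[int]:
    """Return the PIDs of java processes owned by the given user."""
    result = subprocess.run(
        ["ps", "-u", user, "-o", "pid=,comm="],
        capture_output=True, text=True, check=False,
    )
    return java_pids(result.stdout)
```

A restart script could call tomcat_pids("tomcat") in a loop, sleeping a few seconds between calls, and run startup.sh only once the list is empty.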
- Tomcat Log Files - Tomcat errors are logged to
[tomcat]/logs/catalina.(today).log or
[tomcat]/logs/catalina.out.
These are often particularly helpful if ERDDAP won't even start up.
- emailLogYEAR-MM-DD.txt -
ERDDAP always writes the text of all out-going email messages
in the current day's emailLogYEAR-MM-DD.txt file in [bigParentDirectory]/logs
(as specified in setup.xml).
- If the server can't send out email messages, or if you have configured ERDDAP not to send
out email messages, or if you are just curious, this file is a convenient way to see
all of the email messages that have been sent out.
- You can delete previous days' email log files if they are no longer needed.
- Daily Report -
The Daily Report has lots of useful information -- more than the Status Page.
- It is the most complete summary of your ERDDAP's status.
- Among other statistics, it includes a list of datasets that didn't load and the exceptions
they generated.
- It is emailed to <emailDailyReportsTo> and
<emailEverythingTo> (as specified in
setup.xml) provided you have set up the email system (in setup.xml).
- It is also written to the log file.
- It is generated when you start up ERDDAP, just after ERDDAP's first attempt to load all of
the datasets.
- It is also generated soon after 7 am local time every morning.
- Adding/Changing Datasets -
ERDDAP usually rereads datasets.xml every <loadDatasetsMinMinutes>
(specified in setup.xml).
So you can make changes to datasets.xml any time, even while ERDDAP is running.
A new dataset will be detected soon, usually within
<loadDatasetsMinMinutes>.
A changed dataset will be reloaded when it is <reloadEveryNMinutes>
old (as specified in datasets.xml).
- A Flag File Tells ERDDAP to Try to Reload
a Dataset As Soon As Possible
- ERDDAP won't notice any changes to a dataset's setup in datasets.xml until ERDDAP reloads the dataset.
- If a dataset is active in ERDDAP and you want to force ERDDAP to reload it as soon as possible
(before the dataset's <reloadEveryNMinutes> would cause it to be reloaded),
put a file in [bigParentDirectory]/flag
(as specified in setup.xml) that has the
same name as the datasetID (as specified in datasets.xml).
- The contents of the flag file are irrelevant.
- ERDDAP continuously looks for flag files.
- When ERDDAP finds a flag file, it deletes the file and tries to reload the dataset very soon
(usually within seconds).
- Note that when a dataset is reloaded, all files in the
bigParentDirectory/cache/datasetID
directory are deleted.
- Note that if the dataset's xml includes
active="false",
a flag will cause the dataset to be made inactive
(if it is active), and in any case, not reloaded.
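Creating a flag file takes only a few lines in a script. This Python sketch assumes the flag directory already exists (ERDDAP normally creates it) and that the script runs as a user who may write there; the function name set_flag is hypothetical.

```python
import os


def set_flag(big_parent_directory: str, dataset_id: str) -> str:
    """Create an empty flag file named after the datasetID; return its path."""
    flag_path = os.path.join(big_parent_directory, "flag", dataset_id)
    # The contents of the flag file are irrelevant, so leave it empty.
    with open(flag_path, "w"):
        pass
    return flag_path
```

For example, set_flag("/data/erddap", "rPmelTao") would create /data/erddap/flag/rPmelTao, where /data/erddap stands in for your own bigParentDirectory.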
ERDDAP has a web service so that flags can be set via URLs.
- For example,
http://coastwatch.pfeg.noaa.gov/erddap/setDatasetFlag.txt?datasetID=rPmelTao&flagKey=31415926
(that's a fake flagKey) will set a flag for the rPmelTao dataset.
- There is a different flagKey for each datasetID.
- Administrators can see a list of flag URLs for all datasets by looking at the bottom of
their Daily Report email.
- Administrators should treat these URLs as confidential, since they give someone the right to
reset a dataset at will.
- If you think the flagKeys have fallen into the hands of someone who is abusing them,
you can change <flagKeyKey> in setup.xml and
restart ERDDAP
to force ERDDAP to generate and use a different set of flagKeys.
- If you change <flagKeyKey>, delete all of the old subscriptions (see
the list in your Daily Report) and remember to send the new URLs to the
people who you do want to have them.
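A script can assemble a flag URL from its parts in the same form as the example above. This is a small Python sketch using the standard urllib.parse module; the function name flag_url is hypothetical, and the real flagKey values come from your Daily Report.

```python
from urllib.parse import urlencode


def flag_url(erddap_url: str, dataset_id: str, flag_key: str) -> str:
    """Build a setDatasetFlag.txt URL for one dataset."""
    query = urlencode({"datasetID": dataset_id, "flagKey": flag_key})
    return erddap_url.rstrip("/") + "/setDatasetFlag.txt?" + query
```

Requesting the resulting URL (for example, with urllib.request.urlopen or from a browser) is what sets the flag. Since the URL contains the flagKey, keep scripts that contain it as confidential as the URLs themselves.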
The flag system can serve as the basis for a more efficient mechanism for telling ERDDAP when to
reload a dataset. For example, you could set a dataset's <reloadEveryNMinutes>
to a large number (e.g., 10080 = 1 week).
Then, when you know the dataset has changed (perhaps because you added a file to the dataset's data
directory), set a flag so that the dataset is reloaded as soon as possible.
Flags are usually seen quickly. But if the LoadDatasets thread is already
busy, it may be a while before it is available to act on the flag.
But the flag system is much more responsive and much more efficient than setting
<reloadEveryNMinutes> to a small number.
- Force Dataset Removal -
If a dataset is active in ERDDAP, and you want to deactivate it as soon as
possible, add active="false"
to the dataset tag and set a flag.
Flags are usually seen quickly. But if the LoadDatasets thread is already
busy, it may be a while before it is available to act on the flag.
Once the dataset is not active (i.e., not visible in ERDDAP's list of datasets),
you can remove the dataset's description from the datasets.xml file if you want to.
- When Are Datasets Reloaded?
A thread called RunLoadDatasets is the master thread that controls
when datasets are reloaded. RunLoadDatasets loops forever:
- RunLoadDatasets notes the current time.
- RunLoadDatasets starts a LoadDatasets thread to do a "majorLoad".
You can see information about the current/previous majorLoad at the top of your
ERDDAP's status page at http://www.yourSite.org/erddap/status.html
(status page example).
- LoadDatasets makes a copy of datasets.xml.
- LoadDatasets reads through the copy of datasets.xml and, for each dataset,
sees if the dataset needs to be (re)loaded or removed.
- If a flag file exists for this dataset,
the file is deleted and
the dataset is removed if active="false"
or (re)loaded if active="true" (regardless of the dataset's age).
- If the dataset's datasets.xml chunk has active="false"
and the dataset is currently loaded (active), it is unloaded (removed).
- If the dataset has active="true" and the dataset isn't already loaded,
it is loaded.
- If the dataset has active="true" and the dataset is already loaded,
the dataset is reloaded if the dataset's age (time since last load) is greater than
its <reloadEveryNMinutes> (default = 10080 minutes),
otherwise, the dataset is left alone.
- LoadDatasets finishes.
The RunLoadDatasets thread waits for the LoadDatasets thread to finish.
If LoadDatasets takes longer than loadDatasetsMinMinutes
(as specified in setup.xml), RunLoadDatasets interrupts the LoadDatasets thread.
Ideally, LoadDatasets notices the interrupt and finishes.
But if it doesn't notice the interrupt within a minute,
RunLoadDatasets calls loadDatasets.stop(), which is undesirable.
- While the time since the start of the last majorLoad is less than
loadDatasetsMinMinutes (as specified in setup.xml, e.g., 15 minutes),
RunLoadDatasets repeatedly looks for flag files
in the [bigParentDirectory]/flag directory. If one or more flag files are
found, they are deleted, and RunLoadDatasets starts a LoadDatasets thread to do a
"minorLoad" (majorLoad=false). You can't see minorLoad information on
your ERDDAP's status page.
- LoadDatasets makes a copy of datasets.xml.
- LoadDatasets reads through the copy of datasets.xml and,
for each dataset for which there was a flag file:
- If the dataset's datasets.xml chunk has active="false"
and the dataset is currently loaded (active), it is unloaded (removed).
- If the dataset has active="true", the dataset is (re)loaded,
regardless of its age.
Non-flagged datasets are ignored.
- LoadDatasets finishes.
- RunLoadDatasets goes back to step 1.
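The per-dataset decision made during a majorLoad can be summarized as a small function. This Python sketch is only a restatement of the rules described above, not ERDDAP's actual (Java) code; the function name, parameter names, and the returned strings are hypothetical.

```python
def major_load_action(active: bool, loaded: bool, flagged: bool,
                      age_minutes: float,
                      reload_every_n_minutes: float = 10080) -> str:
    """Return "load", "remove", or "none" for one dataset in a majorLoad."""
    if flagged:
        # The flag file is deleted; age is ignored.
        if active:
            return "load"                       # (re)load regardless of age
        # Assumption: an inactive dataset that isn't loaded needs no action.
        return "remove" if loaded else "none"
    if not active:
        return "remove" if loaded else "none"   # unload if currently loaded
    if not loaded:
        return "load"                           # active but not yet loaded
    if age_minutes > reload_every_n_minutes:
        return "load"                           # old enough to be reloaded
    return "none"                               # left alone
```

For example, major_load_action(True, True, False, 20000) returns "load" because the dataset is older than the default 10080 minutes, while the same call with an age of 60 returns "none".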
Notes:
- Startup
When you restart ERDDAP, every dataset with active="true" is loaded.
- Cache
When a dataset is (re)loaded, its cache (including any data response files
and/or image files) is emptied.
- Lots of Datasets
If you have a lot of datasets and/or one or more datasets are slow to (re)load,
a LoadDatasets thread may take a long time to finish its work,
perhaps even longer than loadDatasetsMinMinutes.
- One LoadDatasets Thread
There is never more than one LoadDatasets thread running at once.
If a flag is set when LoadDatasets is already running,
the flag probably won't be noticed or acted on until that LoadDatasets thread finishes running.
You might say: "That's stupid. Why don't you just start a bunch of new threads
to load datasets?" But if you have lots of datasets which get data from
one remote server, even one LoadDatasets thread will put substantial stress
on the remote server. The same is true if you have lots of datasets which get data from
files on one RAID. There are rapidly diminishing returns from having
more than one LoadDatasets thread.
- Flag = ASAP
Setting a flag just signals that the dataset should be (re)loaded
as soon as possible, not necessarily immediately.
If no LoadDatasets thread is currently running, the dataset will start to be reloaded
within a few seconds.
But if a LoadDatasets thread is currently running, the dataset probably won't be reloaded
until after that LoadDatasets thread is finished.
- Flag File Deleted
In general, if you put a flag file in the [bigParentDirectory]/flag
directory (by visiting the dataset's flagUrl or putting an actual file there), the
dataset will usually be reloaded very soon after that flag file is deleted.
- Flag vs. Small reloadEveryNMinutes
If you have some external way of knowing when a dataset needs to be reloaded
and if it is convenient for you,
the best way to make sure that a dataset is always up-to-date is to
set its reloadEveryNMinutes to a large number (10080?) and
set a flag (via a script?) whenever it needs to be reloaded.
That is the system that EDDGridFromErddap and EDDTableFromErddap use
to receive messages that the dataset needs to be reloaded.
- Look in log.txt
Lots of relevant information is written to the
<bigParentDirectory>/logs/log.txt file.
If things aren't working as you expect, looking at log.txt
lets you diagnose the problem by finding out exactly what ERDDAP did.
- Search for "majorLoad=true" for the start of major LoadDatasets threads.
- Search for "majorLoad=false" for the start of minor LoadDatasets threads.
- Search for a given dataset's datasetID for information about it being
(re)loaded or queried.
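These searches are easy to script if log.txt is large. The following Python sketch returns the matching lines with their line numbers; the function name find_lines is hypothetical, and the marker can be "majorLoad=true", "majorLoad=false", or a datasetID.

```python
from typing import List, Tuple


def find_lines(log_path: str, marker: str) -> List[Tuple[int, str]]:
    """Return (line_number, text) pairs for lines that contain marker."""
    hits = []
    # errors="replace" keeps the search going if a line has odd bytes.
    with open(log_path, "r", errors="replace") as log:
        for number, line in enumerate(log, start=1):
            if marker in line:
                hits.append((number, line.rstrip("\n")))
    return hits
```

The line numbers make it easy to open log.txt in an editor and read the messages around each match.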
- Cached Responses -
In general, ERDDAP doesn't cache (store) responses to user requests.
The rationale was that most requests would be slightly different so the cache wouldn't be very effective.
The biggest exceptions are requests for image files
(which are cached since browsers and programs like Google Earth often re-request images)
and requests for .nc files (because they can't be created on-the-fly).
ERDDAP stores each dataset's cached files in a different directory:
bigParentDirectory/cache/datasetID
since a single cache directory might have a huge number of files which might become slow to access.
Files are removed from the cache for one of three reasons:
- All files in this cache are deleted when ERDDAP is restarted.
- Periodically, any file more than <cacheMinutes> old (as specified in
setup.xml) will be deleted.
Removing files in the cache based on age (not Least-Recently-Used) ensures that files won't stay
in the cache very long.
Although it might seem like a given request should always return the same response, that isn't true.
For example, a tabledap request which includes &time>someTime will change if
new data arrives for the dataset.
And a griddap request which includes [last] for the time dimension will change if new
data arrives for the dataset.
- Images showing error conditions are cached, but only for a few minutes (it's a difficult situation).
- Every time a dataset is reloaded, all files in that dataset's cache are deleted.
Because requests may be for the "last" index in a gridded dataset, files in the cache may become
invalid when a dataset is reloaded.
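The age-based rule can be illustrated with a short Python sketch that lists the files in one dataset's cache directory which are older than a given number of minutes. It assumes that a file's age is measured from its last-modified time, which the text above does not specify; ERDDAP does this cleanup itself, so the sketch is only for understanding or inspection. The function name expired_files is hypothetical.

```python
import os
import time
from typing import List, Optional


def expired_files(cache_dir: str, cache_minutes: float,
                  now: Optional[float] = None) -> List[str]:
    """List files in cache_dir last modified more than cache_minutes ago."""
    now = time.time() if now is None else now
    cutoff = now - cache_minutes * 60
    old = []
    for name in sorted(os.listdir(cache_dir)):
        path = os.path.join(cache_dir, name)
        # Assumption: age is based on the file's modification time.
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            old.append(path)
    return old
```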
- Stored Dataset Information -
For all types of datasets, ERDDAP gathers lots of information when a dataset is loaded and keeps that
in memory. This allows ERDDAP to respond very quickly to searches, requests for lists of datasets,
and requests for information about a dataset.
For a few types of datasets (notably EDDGridCopy, EDDTableCopy, EDDGridFromXxxFiles, and
EDDTableFromXxxFiles), ERDDAP stores on disk some information about the dataset that is reused
when the dataset is reloaded. This greatly speeds the reloading process.
- The dataset information files are human-readable .json files and are stored in
<bigParentDirectory>/dataset .
- ERDDAP only deletes these files in unusual situations.
- It shouldn't ever be necessary for you to delete these files because ERDDAP verifies and updates
the stored information when the dataset is reloaded.
- But if you ever do need to delete these files (why?), you can do it when ERDDAP is running.
Then set a flag.
- If you want to encourage ERDDAP to update the stored dataset information (for example,
if you just added, removed, or changed some files to the dataset's data directory),
use the flag system.
- robots.txt -
The search engine companies use web crawlers (e.g., GoogleBot) to examine all of
the pages on the web to add the content to the search engines.
For ERDDAP, that is good. ERDDAP has lots of links between pages, so the crawlers
will find all of the web pages and add them to the search engines.
Then, users of the search engines will be able to find datasets on your ERDDAP.
Unfortunately, some web crawlers (e.g., GoogleBot) are now filling out and submitting
forms in order to find additional content. For web commerce sites, this is great.
But this is terrible for ERDDAP because it just leads to an infinite number of
undesirable and pointless attempts to crawl the actual data.
This can lead to more requests for data than from all other users combined.
And it fills the search engine with goofy, pointless subsets of the actual data.
To tell the web crawlers to stop filling out forms, you need to create a text file called
robots.txt
in the root directory of your web site's document hierarchy so that it can be viewed by
anyone as, e.g., http://www.example.com/robots.txt .
If you are creating a new robots.txt file, put these two lines in it:
User-agent: *
Disallow: /*?
If you already have a robots.txt file, add this line to the Disallow section:
Disallow: /*?
This tells the web crawler not to visit URLs with "?" in the URL
(i.e., all form submissions).
It may take a few days for the search engines to notice and for the changes to take effect.
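If you maintain robots.txt with a script, the rule can be added programmatically. This Python sketch adds the Disallow line to the "User-agent: *" group if that group exists, or appends both lines otherwise, and leaves the text unchanged if the rule is already present. The function name add_form_rule is hypothetical, and the sketch does not try to handle every robots.txt layout.

```python
RULE = "Disallow: /*?"


def add_form_rule(robots_text: str) -> str:
    """Return robots.txt text that tells crawlers to skip URLs with "?"."""
    lines = robots_text.splitlines()
    if any(line.strip() == RULE for line in lines):
        return robots_text                      # already present
    for i, line in enumerate(lines):
        if line.strip().lower().replace(" ", "") == "user-agent:*":
            lines.insert(i + 1, RULE)           # add to the existing group
            break
    else:
        # No "User-agent: *" group yet: append a new one.
        if lines and lines[-1].strip():
            lines.append("")
        lines += ["User-agent: *", RULE]
    return "\n".join(lines) + "\n"
```

Calling add_form_rule("") produces exactly the two-line file shown above.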
- sitemap.xml - As the
www.sitemaps.org web site says:
Sitemaps are an easy way for webmasters to inform search engines about pages on their
sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists
URLs for a site along with additional metadata about each URL (when it was last updated, how
often it usually changes, and how important it is, relative to other URLs in the site) so that
search engines can more intelligently crawl the site.
Web crawlers usually discover pages from links within the site and from other sites. Sitemaps
supplement this data to allow crawlers that support Sitemaps to pick up all URLs in the Sitemap
and learn about those URLs using the associated metadata. Using the Sitemap protocol does not
guarantee that web pages are included in search engines, but provides hints for web crawlers
to do a better job of crawling your site.
Actually, since ERDDAP is RESTful, search engine spiders can easily crawl your ERDDAP.
But they tend to do it more often (daily!) than necessary (monthly?).
- Given that each search engine may be crawling your entire ERDDAP every day,
this can lead to a lot of unnecessary requests.
- So ERDDAP generates a sitemap.xml file for your ERDDAP which tells search engines
that your ERDDAP only needs to be crawled every month.
- You should add a reference to ERDDAP's sitemap.xml to your
robots.txt file:
Sitemap: http://www.yoursite.org/erddap/sitemap.xml
- If that doesn't seem to be getting the message to the crawlers, you can tell the various search
engines about the sitemap.xml file by visiting these URLs (but change YourInstitution
to your institution's acronym or abbreviation and www.yoursite.org to your ERDDAP's URL):
- http://submissions.ask.com/ping?sitemap=http://www.yoursite.org/erddap/sitemap.xml
- http://www.bing.com/webmaster/ping.aspx?siteMap=http://www.yoursite.org/erddap/sitemap.xml
- http://www.google.com/ping?sitemap=http://www.yoursite.org/erddap/sitemap.xml
- http://search.yahooapis.com/SiteExplorerService/V1/updateNotification?appid=YourInstitution_ERDDAP&url=http://www.yoursite.org/erddap/sitemap.xml
(I think) you just need to ping each search engine once, for all time.
The search engines will then detect changes to sitemap.xml periodically.
- Email Notification of Updates - If you want to receive an
email whenever a new version of ERDDAP
is available, send an email to bob dot simons at noaa dot gov requesting this.
- Data Dissemination / Data Distribution Networks: Push and Pull Technology
- Normally, ERDDAP acts as an intermediary: it takes a request from a user;
gets data from a remote data source; reformats the data; and sends it to the user.
- Pull Technology:
ERDDAP also has the ability to actively get all of the available data
from a remote data source and store a local copy of the data.
- Push Technology:
By using ERDDAP's subscription services, other data servers can be notified as soon
as new data is available so that they can request the data (by pulling the data).
- ERDDAP's
EDDGridFromErddap and
EDDTableFromErddap use ERDDAP's subscription services and
flag system so that they will be notified immediately when new data is available.
- You can combine these to great effect: if you wrap an EDDGridCopy around an EDDGridFromErddap
dataset (or wrap an EDDTableCopy around an EDDTableFromErddap dataset),
ERDDAP will automatically create and maintain a local copy of another ERDDAP's dataset.
- Because the subscription services work as soon as new data is available,
push technology disseminates data very quickly (within seconds).
This architecture puts each ERDDAP administrator in charge of determining where the data
for his/her ERDDAP comes from.
- Other ERDDAP administrators can do the same. There is no need for coordination between
administrators.
- If many ERDDAP administrators link to each other's ERDDAPs, a data distribution network is formed.
- Data will be quickly, efficiently, and automatically disseminated from data sources
(ERDDAPs and other servers) to data re-distribution sites (ERDDAPs) anywhere in the network.
- A given ERDDAP can be both a source of data for some datasets and a re-distribution site for other
datasets.
- The resulting network is roughly similar to data distribution networks set up with programs like
Unidata's IDD/LDM, but less rigidly structured.
- Security/Authentication/Authorization - By default, ERDDAP
runs as an entirely public server (using http) with no login system (authentication)
and no restrictions on data access (authorization).
If you want to restrict access to some or all datasets to some users, you can use ERDDAP's
built-in security system. When the security system is in use:
- ERDDAP uses
role-based access control.
- The ERDDAP administrator defines users with the
<user>
tag in datasets.xml.
Each user has a username, a password, and one or more roles.
- The ERDDAP administrator defines which roles have access to a given dataset via the
<accessibleTo>
tag in datasets.xml for any dataset that shouldn't have public access.
- The user's log in status (and a link to log in/out) will be shown at the top of every web page.
(But a logged-in user will appear to ERDDAP to be not logged in if they use an http URL.)
If a user tries and fails to log in 3 times, the user is blocked from trying to log in for 15 minutes.
This prevents hackers from simply trying millions of passwords until they find the right one.
- Users who are not logged in use ERDDAP's http URLs.
Users who are logged in use ERDDAP's https URLs.
This helps prevent
session hijacking and sidejacking.
- Anyone who isn't logged in can access and use the public datasets.
By default, private datasets don't appear in lists of datasets if a user isn't logged in.
If the administrator has set setup.xml's <listPrivateDatasets> to true,
they will appear.
Attempts to request data from private datasets (if the user knows the URL) will be redirected
to the login page.
- Anyone who is logged in will be able to see and request data from any public dataset
and any private dataset to which their role allows them access.
By default, private datasets to which a user doesn't have access don't appear in lists of datasets.
If the administrator has set setup.xml's <listPrivateDatasets> to true,
they will appear.
Attempts to request data from private datasets to which the user doesn't have access will be
redirected to the login page.
- The RSS information for all datasets is always available to anyone.
(This is not ideal. But RSS readers won't ever log in, so they need access without logging in.)
But, since private datasets that aren't accessible aren't advertised,
their RSS links are not advertised either.
- Email subscriptions can only be set up when a user has access to a dataset.
Once set up, they continue to function after the user has logged out.
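The access rules in the list above can be summarized in a short sketch. This is not ERDDAP's code: the class and method names (AccessRulesSketch, canAccess, isListed) are hypothetical, and the sketch assumes that a dataset with no <accessibleTo> roles is public and that a user who isn't logged in has no roles.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Illustrative sketch only; not ERDDAP's actual implementation. */
class AccessRulesSketch {

    /**
     * Returns true if the user may request data from the dataset.
     * accessibleTo is empty for a public dataset (assumption);
     * userRoles is empty for a user who isn't logged in (assumption).
     */
    static boolean canAccess(Set<String> accessibleTo, Set<String> userRoles) {
        if (accessibleTo.isEmpty()) {
            return true; // public dataset: available to everyone
        }
        // private dataset: the user needs at least one of the allowed roles
        return !Collections.disjoint(accessibleTo, userRoles);
    }

    /**
     * Returns true if the dataset appears in lists of datasets.
     * Inaccessible private datasets are listed only if the administrator
     * has set listPrivateDatasets to true.
     */
    static boolean isListed(Set<String> accessibleTo, Set<String> userRoles,
            boolean listPrivateDatasets) {
        return canAccess(accessibleTo, userRoles) || listPrivateDatasets;
    }

    public static void main(String[] args) {
        Set<String> notLoggedIn = new HashSet<String>();
        Set<String> privateDataset = new HashSet<String>(Arrays.asList("scientist"));
        // A private dataset is hidden from a user who isn't logged in ...
        System.out.println(isListed(privateDataset, notLoggedIn, false));
        // ... unless listPrivateDatasets is true. Data requests are still refused.
        System.out.println(isListed(privateDataset, notLoggedIn, true));
        System.out.println(canAccess(privateDataset, notLoggedIn));
    }
}
```

Note that being listed and being accessible are separate questions: listPrivateDatasets only changes what appears in lists, never who can get the data.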
To set up the security/authentication/authorization system:
- Do the standard ERDDAP initial setup.
- In setup.xml,
- Add/change the <authenticate> value from nothing to
custom or openid.
See the comments about these options below.
- Add/change the <baseHttpsUrl> value.
- Insert/uncomment &loginInfo; in <startBodyHtml> to display
the user's log in/out info
at the top of each web page.
- Configure tomcat to support SSL
(the basis for https connections) by creating a keystore with a
digital certificate
and by modifying [tomcat]/conf/server.xml to uncomment the connector for port 8443.
It is better to get a digital certificate from a
certificate authority
than to make a
self-signed certificate
(instructions),
because it gives your clients more assurance that they are indeed connecting to your
ERDDAP, not an imposter's website.
Many vendors sell digital certificates. (Search the web.) They are not expensive.
On Windows, you may have to move .keystore from "c:\Documents and Settings\you\.keystore" to
"c:\Documents and Settings\Default User\.keystore" or "c:\.keystore"
(see [tomcat]/logs/catalina.(today).log if the application doesn't load or users can't see the log in
page).
You can see when the .keystore certificate will expire by examining the certificate when you log in.
For additional security, create a signed certificate from a trusted source.
(Search the web for more information.)
- If Tomcat is running in Apache, you need to modify the
/etc/httpd/conf/httpd.conf file to allow HTTPS
traffic to/from ERDDAP:
To the "VirtualHost" tag, add the lines:
ProxyPass /erddap <YourServer'sHttpsURL>/erddap
ProxyPassReverse /erddap <YourServer'sHttpsURL>/erddap
(This is untested. If you do this and it works or doesn't work, let us know.)
- In datasets.xml, create a
<user>
tag for each user with username, password, and roles information.
This is the authorization part of ERDDAP's security system.
- In datasets.xml, add an
<accessibleTo>
tag to each dataset that shouldn't have public access.
<accessibleTo> lets you specify which roles have access to that dataset.
- Restart Tomcat.
Trouble? Check the Tomcat logs.
- CHECK YOUR WORK! Any mistake could lead to a security flaw.
- Check that the login page uses https (not http).
Attempts to connect via http should be automatically redirected to https and port 8443.
You may need to work with your network administrator to allow external web requests to access
port 8443 on your server.
- You can change the <user> and <accessibleTo> tags at any time.
The changes will be applied at the next regular reload of any dataset,
or immediately if you use a flag.
- It worked for a few months, now users can't get to the log in page?
Check the Tomcat logs. Your .keystore certificate may have expired and you may need to make a new one.
You can see when the .keystore certificate will expire by examining the certificate when you log in.
Authentication (logging in) - Currently, ERDDAP supports
custom and
openid (recommended) authentication.
We strongly recommend OpenID because it frees you from storing and handling users' passwords.
Remember that users often use the same password at different sites.
So they may be using the same password for your ERDDAP as they do at their bank.
That makes their password very valuable -- much more valuable to the user than the data they are requesting.
So you need to do as much as you can to keep the passwords private. That is a big responsibility.
OpenID takes care of passwords, so you don't have to gather, store, or work with them.
So you are freed from that responsibility.
- custom is ERDDAP's custom system for letting users log in by entering their User Name and
Password in a form on a web page.
This is secure because the User Name and Password are transmitted via https (not http),
but OpenID is better because it frees you from having to handle passwords.
The custom approach requires you to collect User Names and Passwords (use your phone!
email isn't secure!) and store them in datasets.xml in
<user> tags.
The custom approach uses a cookie on the user's computer,
so the user's browser must be
set to allow cookies. If a user is making ERDDAP requests from a computer program
(not a browser), cookies are hard to work with. Sorry.
- openid is an open standard that lets users log
in with their password at one web site
and then log in without their password at many other web sites, including ERDDAP.
OpenID is very convenient for ERDDAP administrators -- you don't ever have to deal with passwords.
All you need is a user's OpenID URL (which is public information)
so that you can define the users and their roles in datasets.xml with
<user> tags.
OpenID uses a cookie on the user's computer,
so the user's browser must be set to allow cookies.
If a user is making ERDDAP requests from a computer program (not a browser),
cookies are hard to work with. Sorry.
ERDDAP doesn't support BASIC authentication because:
- BASIC seems geared toward predefined web pages needing secure access or blanket on/off access
to the whole site, but ERDDAP allows (restricted access) datasets to be added on-the-fly.
- BASIC authentication doesn't offer a way for users to log out!
Secure Data Sources - If a dataset is to have restricted access for ERDDAP users,
the data source (from where ERDDAP gets the data) should not be publicly accessible.
So how can ERDDAP get the data for restricted access datasets? Some options are:
- ERDDAP can serve data from local files (for example, via EDDTableFromFiles or EDDGridFromFiles).
- ERDDAP can be in a
DMZ
and the data source (e.g., an OPeNDAP server or a database) can be
behind a firewall,
where it is accessible to ERDDAP but not to the public.
- The data source can be on a public web site, but require a login to get the data.
The one type of dataset that ERDDAP can log on to access is
EDDTableFromDatabase.
These datasets support (and should always use) user names, passwords, SSL connections,
and other security measures.
But in general, currently, ERDDAP can't deal with these data sources because it has no
provisions for logging on to the data source.
This is the reason why access to
EDDGridFromErddap and EDDTableFromErddap datasets
can't be restricted.
Currently, the local ERDDAP has no way to login and access the metadata information
from the remote ERDDAP.
And putting the remote ERDDAP behind your firewall and removing its dataset's
accessibleTo restrictions doesn't solve the problem:
since user requests for EDDXxxFromErddap data need to be redirected to the remote ERDDAP,
the remote ERDDAP can't be behind a firewall.
Questions? Suggestions? If you have any questions about ERDDAP's security system
or have any questions, doubts, concerns, or suggestions about how it is set up,
please email bob dot simons at noaa dot gov.
These are details that you don't need to know until a need arises.
- Setting Up a Second ERDDAP for Testing/Development
If you want to do this, there are two approaches:
- (Best) Install Tomcat and ERDDAP on a computer other than the computer that has your public ERDDAP.
If you use your personal computer,
- Do the installation one step at a time. Get Tomcat up and running first.
- When Tomcat is running, the Tomcat Manager should be at http://127.0.0.1:8080/manager/html/
- In ERDDAP's setup.xml, set baseUrl to http://127.0.0.1:8080
- ERDDAP will be at http://127.0.0.1:8080/erddap
- (Second Best) Install another Tomcat on the same computer as your public ERDDAP.
- Change all of the port numbers associated with the second Tomcat (e.g., change 8080 to 8081)
(see these directions).
- Install ERDDAP in the new Tomcat.
In the new ERDDAP's setup.xml, set baseUrl to http://www.your.site.name:8081
You can then access the new ERDDAP via http://www.your.site.name:8081/erddap .
- Heavy Loads / Constraints -
With heavy use, a standalone ERDDAP may be constrained by various problems.
For more information, see the
list of
constraints and solutions.
- Grids, Clusters, and Federations -
Under very heavy use, a single standalone ERDDAP will run into
one or more constraints and even the suggested solutions will be insufficient.
For such situations, ERDDAP has features that make it easy to construct scalable grids
(also called clusters or federations) of ERDDAPs which allow the system to handle very heavy use
(e.g., for a large data center). For more information, see
grids, clusters,
and federations of ERDDAPs.
- Cloud Computing -
Several companies are starting to offer cloud computing services
(e.g., Amazon Web Services).
Web hosting companies
have offered a range of roughly similar services since the mid-1990's.
You can use these services to set up a grid/cluster of ERDDAPs to handle very heavy use.
For more information, see
cloud
computing with ERDDAP.
- WaitThenTryAgain Exception -
In unusual circumstances, a user may get an error message like
WaitThenTryAgainException:
There was a (temporary?) problem. Wait a minute, then try again.
(In a browser, click the Reload button.)
Details: GridDataAccessor.increment: partialResults[0]="123542730" was expected
to be "123532800".
The explanation is:
For each EDDGrid dataset, ERDDAP keeps the axis variable values in memory.
They are used, for example, to convert requested axis values that use the "()" format into index numbers.
For example, if the axis values are "10, 15, 20, 25", a request for (20)
will be interpreted as a request for index #2 (0-based indices).
When ERDDAP gets a request for data and gets the data from the source,
it verifies that the axis values that it got from the source match the axis values in memory.
Normally, they do.
But sometimes the data source has changed in a significant way:
for example, index values from the beginning of the axis variable may have
been removed (e.g., "10, 15, 20, 25" may have become "20, 25, 30").
If that happens, it is clear that ERDDAP's interpretation of the request (e.g., "(20)" is index #2)
is now wrong. So ERDDAP throws an exception and marks the dataset as needing to be reloaded.
ERDDAP will update the dataset soon (often in a few seconds, usually within a minute).
Other, similar problems also throw the WaitThenTryAgain exception.
Starting with ERDDAP version 1.14, it became much less likely that a user would actually see this error.
Now, when the underlying error occurs, ERDDAP automatically tries to reload the dataset
and resubmit the request to the reloaded dataset.
Often this succeeds. When it does, the user will simply see that a given request took a little longer
than usual. If it fails, the user should (as the message says) wait a minute, then try again.
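The check described above can be sketched as follows. This is a minimal illustration, not ERDDAP's GridDataAccessor code: the class and method names (AxisCheckSketch, indexOf, matchesMemory) are hypothetical, and the sketch assumes ascending axis values and an exact match for the requested value.

```java
import java.util.Arrays;

/** Illustrative sketch only; not ERDDAP's GridDataAccessor code. */
class AxisCheckSketch {

    /**
     * Converts a requested axis value, e.g., (20), into a 0-based index.
     * Assumes ascending axis values and an exact match; returns -1 if absent.
     */
    static int indexOf(double[] axisValues, double requested) {
        int i = Arrays.binarySearch(axisValues, requested);
        return i >= 0 ? i : -1;
    }

    /**
     * Returns true if the values just read from the source, starting at
     * startIndex, equal the values held in memory at the same indices.
     */
    static boolean matchesMemory(double[] inMemory, int startIndex, double[] fromSource) {
        if (startIndex < 0 || startIndex + fromSource.length > inMemory.length) {
            return false;
        }
        for (int i = 0; i < fromSource.length; i++) {
            if (inMemory[startIndex + i] != fromSource[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[] inMemory = {10, 15, 20, 25};
        int index = indexOf(inMemory, 20); // index #2
        // The source has since dropped 10 and 15, so its index #2 now holds 30.
        double[] source = {20, 25, 30};
        double[] fromSource = {source[index]};
        if (!matchesMemory(inMemory, index, fromSource)) {
            // This is the point at which ERDDAP would throw the exception
            // and mark the dataset as needing to be reloaded.
            System.out.println("axis values changed at index " + index);
        }
    }
}
```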
- Responding Slowly -
If ERDDAP is responding slowly, view your ERDDAP's Status Page from any browser:
<baseUrl>/erddap/status.html .
It has lots of useful information about memory usage, number of requests, thread usage, etc.
Is recent memory usage really high? Are a large number of threads in use?
And/or you can use the "Server Status" link in the Tomcat Manager
to check on the status of all of the response threads.
Is your network / internet connection running slowly?
There may be other useful diagnostic information in the ERDDAP log or in [tomcat]/logs/catalina.out.
- Frozen, Hung, Locked Up -
If ERDDAP freezes, hangs, or locks up, you need to restart Tomcat
or perhaps reboot the server. Before you do, if the server uses Linux, from a command line,
- Type "ps -u <tomcatUser>" (but substitute the name of the user that ran
Tomcat).
- In the ps output, find the processID for the "java" process.
- Type "kill -3 <processID>" to tell Java (which is running Tomcat) to do a thread dump to [tomcat]/logs/catalina.out.
- After you reboot, you can diagnose the problem by finding the thread dump information
(and any other useful information above it) in [tomcat]/logs/catalina.out and
also by reading relevant parts of the ERDDAP log archive.
- Please email that information to bob dot simons at noaa dot gov so we can see
what went wrong.
- PermGen -
If you repeatedly use Tomcat Manager to Reload (or Stop and Start) ERDDAP,
ERDDAP may fail to start up and throw java.lang.OutOfMemoryError: PermGen.
The solution is to periodically (or every time?) shut down and restart Tomcat,
instead of Reloading ERDDAP.
[Update: This problem should be greatly minimized or fixed in ERDDAP version 1.24.]
- Palettes are used to convert a range of numbers into a range of colors
when making graphs and maps.
ERDDAP comes with several pre-made palettes. You can add additional palettes.
ERDDAP's palettes are defined in GMT-style
.cpt (Color Palette Table) files.
All ERDDAP .cpt files are valid GMT .cpt files, but the opposite may not be true.
ERDDAP may just support a subset of what GMT supports.
If there is trouble, ERDDAP will probably throw an error when the .cpt file is parsed
(which is better than misusing the information).
ERDDAP requires that all .cpt files be stored in [tomcat]/webapps/erddap/WEB-INF/cptfiles.
To get ERDDAP to use a new .cpt file, store the file in that directory
and modify the <palettes> tag in [tomcat]/content/erddap/messages.xml.
But don't remove any of the standard palettes. They are a standard feature of all ERDDAP installations.
An advantage of this approach is that you can specify the order of the palettes in the list
presented to users.
Every new ERDDAP release will overwrite these changes.
So, for every new ERDDAP release, you need to:
- Put your .cpt files in [tomcat]/webapps/erddap/WEB-INF/cptfiles
- Make the changes to <palettes> in messages.xml
- Restart ERDDAP so ERDDAP re-reads the messages.xml file.
- How does ERDDAP generate the colors in a colorbar?
- The user selects one of the predefined palettes or uses the default,
e.g., Rainbow. Palettes are stored/defined in GMT-style .cpt Color Palette Table files.
Each of ERDDAP's predefined palettes has a simple integer range,
e.g., 0 to 1 (if there is just one section in the palette),
or 0 to 4 (if there are four sections in the palette).
Each segment in the file covers n to n+1, starting at n=0.
- ERDDAP generates a new .cpt file on-the-fly, by scaling the predefined palette's range
(e.g., 0 to 4) to the range of the palette needed by the
user (e.g., 0.1 to 50) and then generating one section in the
new palette for each interval between adjacent ticks on the new scale
(e.g., a log scale with ticks at 0.1, 0.5, 1, 5, 10, 50 will have 5 sections).
The color for the end point of each section is generated by finding the relevant section of the
palette in the .cpt file, then linearly interpolating the R, G, and B values.
(That's the same as how GMT generates colors from its Color Palette Table files.)
This system allows ERDDAP to start with generic palettes
(e.g., Rainbow with 8 segments, in total spanning 0 to 8)
and create custom palettes on-the-fly
(e.g., a custom Rainbow, which maps 0.1 to 50 mg/L to the rainbow colors).
- ERDDAP then uses that new .cpt file to generate the color for each different colored pixel
in the color bar
(and later for each data point when plotting data on a graph or map),
again by finding the relevant section of the
palette in the .cpt file, then linearly interpolating the R, G, and B values.
This process may seem unnecessarily complicated. But it solves problems
related to log scales that are hard to solve other ways.
So how can you mimic what ERDDAP is doing? That isn't easy.
Basically you need to duplicate the process that ERDDAP is using.
If you are a Java programmer, you can use the same Java class that ERDDAP uses to do all of this:
[tomcat]/webapps/erddap/WEB-INF/classes/gov/noaa/pfel/coastwatch/sgt/CompoundColorMap.java.
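To show the idea without reading CompoundColorMap, here is a minimal sketch of the two interpolation steps described above. It is not ERDDAP's code. The class and method names (PaletteSketch, genericColor, position, colorFor) are hypothetical, the palette is held in arrays instead of being read from a .cpt file, and the way ticks are positioned on a log scale (by their logarithms) is an assumption.

```java
/** Illustrative sketch only; ERDDAP's real implementation is CompoundColorMap. */
class PaletteSketch {

    /** Linearly interpolates R, G, and B; fraction is 0 at a and 1 at b. */
    static int[] interpolate(int[] a, int[] b, double fraction) {
        int[] rgb = new int[3];
        for (int i = 0; i < 3; i++) {
            rgb[i] = (int) Math.round(a[i] + fraction * (b[i] - a[i]));
        }
        return rgb;
    }

    /**
     * Returns the color at position x of a generic palette in which
     * section n covers n to n+1 and runs from startRgb[n] to endRgb[n].
     */
    static int[] genericColor(int[][] startRgb, int[][] endRgb, double x) {
        int nSections = startRgb.length;
        x = Math.max(0, Math.min(nSections, x)); // clamp to 0..nSections
        int n = Math.min((int) Math.floor(x), nSections - 1);
        return interpolate(startRgb[n], endRgb[n], x - n);
    }

    /** Maps a tick value onto the generic palette's range, 0..nSections. */
    static double position(double[] ticks, boolean logScale, double t, int nSections) {
        double min = ticks[0];
        double max = ticks[ticks.length - 1];
        // Assumption: on a log scale, ticks are positioned by their logarithms.
        double fraction = logScale
            ? (Math.log(t) - Math.log(min)) / (Math.log(max) - Math.log(min))
            : (t - min) / (max - min);
        return fraction * nSections;
    }

    /** Returns the color for a data value; ticks bound the new palette's sections. */
    static int[] colorFor(int[][] startRgb, int[][] endRgb,
            double[] ticks, boolean logScale, double value) {
        int last = ticks.length - 1;
        double v = Math.max(ticks[0], Math.min(ticks[last], value));
        int s = 0; // section of the new palette that contains v
        while (s < last - 1 && v > ticks[s + 1]) {
            s++;
        }
        int n = startRgb.length;
        // Step 1: color the section's end points from the generic palette.
        int[] lo = genericColor(startRgb, endRgb, position(ticks, logScale, ticks[s], n));
        int[] hi = genericColor(startRgb, endRgb, position(ticks, logScale, ticks[s + 1], n));
        // Step 2: interpolate linearly within the new section.
        return interpolate(lo, hi, (v - ticks[s]) / (ticks[s + 1] - ticks[s]));
    }

    public static void main(String[] args) {
        // A made-up two-section palette: blue to green, then green to red.
        int[][] start = {{0, 0, 200}, {0, 200, 0}};
        int[][] end = {{0, 200, 0}, {200, 0, 0}};
        double[] ticks = {0, 10, 20};
        System.out.println(java.util.Arrays.toString(colorFor(start, end, ticks, false, 5)));
    }
}
```

Because the new palette gets one section per tick interval, a log scale needs no special handling when coloring pixels: within each section the interpolation is still linear.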
- Guidelines for Data Distribution Systems
- More general opinions about the design and evaluation
of data distribution systems can be found
here.
These are things that only a programmer who intends to work with ERDDAP's Java classes needs to know.
- Source Code
- The source code for the current version of ERDDAP is always in the current erddap.war file.
- The source code for recent public versions and in-development versions
is available via GitHub.
Please read the Wiki
for that project.
- ERDDAP has a very liberal, open-source license,
so you can use the source code for any purpose.
- Use the Code for Other Projects
While you are welcome to use parts of the ERDDAP code for other projects, be warned that the code
can and will change.
We don't promise to support other uses of our code.
Our main goal is to make a web application that people can download and use, as is, to distribute data.
For many situations where you might be tempted to use parts of ERDDAP in your project,
we think you will find it much easier to install and use ERDDAP as is,
and then write other services which use ERDDAP's services.
You can set up your own ERDDAP installation crudely in an hour or two.
You can set up your own ERDDAP installation in a polished way in a few days
(depending on the number and complexity of your datasets).
But hacking out parts of ERDDAP for your own project is likely to take weeks
(and months to catch subtleties).
We (obviously) think there are many benefits to using ERDDAP as is and making your ERDDAP
installation publicly accessible.
However, in some circumstances, you might not want to make your ERDDAP installation publicly accessible.
Then, your service can access and use your private ERDDAP and your clients needn't know about ERDDAP.
Half Way - Or, there is another approach which you may find useful
which is half way between delving into ERDDAP's code and using ERDDAP as a stand-alone web service:
In the EDD class, there is a static method which lets you make an instance of a dataset
(based on the specification in datasets.xml):
oneFromDatasetXml(String tDatasetID)
It returns an instance of an EDDTable or EDDGrid dataset.
Given that instance, you can call
makeNewFileForDapQuery(String userDapQuery, String dir, String fileName, String fileTypeName)
to tell the instance to make a data file, of a specific fileType, with the results from a user query.
Thus, this is a simple way to use ERDDAP's methods to request data and get a file in response,
just as a client would use the ERDDAP web application.
But this approach works within your Java program and bypasses the need for an application server
like Tomcat.
We use this approach for many of the unit tests of EDDTable and EDDGrid subclasses,
so you can see examples of this in the source code for all of those classes.
- Development Environment
- Directory Structure
- If you are installing ERDDAP in a Tomcat (whether or not you will actually use it that way),
follow the instructions above.
- If you aren't installing ERDDAP in a Tomcat, you still need to make a Tomcat-like directory structure,
so that ERDDAP can find the setup files in [tomcat]/content/erddap .
- Make a directory somewhere called "tomcat" (it can be something else, but this is easier to explain).
- As indicated above, unzip erddapContent.zip into [tomcat], creating [tomcat]/content/erddap .
Follow the instructions above to modify setup.xml and datasets.xml.
Depending on your situation, you may need to specify that directory by adding something like
-DErddapContentDirectory="/someDirectory/content/erddap/"
to the Java command line for your program.
- Make a [tomcat]/webapps/erddap directory.
- .war files are just .zip files that follow a few additional conventions.
So you can use an unzip program to unzip erddap.war into [tomcat]/webapps/erddap .
That has all of ERDDAP's .java classes and many other files.
- Our development environment is just a programmer's editor (we're not saying which one).
(No, we don't use Eclipse, Ant, Maven, or ...; nor do we offer ERDDAP-related support for them.
If we required you to use Ant and you preferred Maven (or vice versa), you wouldn't be very happy about it.)
- We use a batch file which deletes all of the .class files in the source tree.
- We currently use javac 1.6.0_35 to compile gov.noaa.pfel.coastwatch.TestAll
(it has links to a few classes that wouldn't be compiled otherwise)
and java 1.6.0_35 and 1.7.0_09 to run the tests.
For security reasons, it is almost always best to use the latest versions of Java and Tomcat.
- When we run javac or java, the current directory is [tomcat]/webapps/erddap/WEB-INF .
- Our javac and java classpath (including some unnecessary items, and using the Windows ";" separator; use ":" on Linux and Mac) is currently
./classes;../../../lib/servlet-api.jar;lib/activation.jar;lib/axis.jar;lib/commons-compress.jar;lib/commons-discovery.jar;lib/itext-1.3.1.jar;lib/joda-time.jar;lib/joid.jar;lib/lucene-core.jar;lib/mail.jar;lib/netcdfAll-latest.jar;lib/postgresql.jdbc.jar;lib/jaxrpc.jar;lib/saaj.jar;lib/slf4j-jdk14.jar;lib/tsik.jar;lib/wsdl4j.jar
- So your javac command line will be something like
javac -cp (classpath above) classes/gov/noaa/pfel/coastwatch/TestAll.java
- And your java command line will be something like
java -cp (classpath above) -Xmx1200M -Xms1200M gov.noaa.pfel.coastwatch.TestAll
Optional: you can add -verbose:gc, which tells Java to print garbage collection
statistics.
- If TestAll compiles, everything ERDDAP needs has been compiled.
Lots of classes are compiled that aren't needed for ERDDAP.
If compiling TestAll succeeds but doesn't compile some class, that class isn't needed.
(There are some unfinished/unused classes.)
- We use some 3rd party source code instead of .jar files (notably for SSHTools and Dods)
and have modified them slightly to avoid problems compiling with Java 1.7.
We have often made other slight modifications (notably to Dods) for other reasons.
- Most classes have test methods. We run lots of tests via TestAll.
Unfortunately, many of the tests are specific to our set up. (Sorry.)
- Important Classes - If you want to look
at the source code and try to figure out how ERDDAP works, please do.
- The code has JavaDoc comments, but the JavaDocs haven't been generated. Feel free to generate them.
- The most important classes (including the ones mentioned below) are within gov/noaa/pfel/erddap.
- The Erddap class has the highest level methods. It extends HttpServlet.
- Erddap passes requests to instances of subclasses of EDDGrid or EDDTable,
which represent individual datasets.
- EDDGrid and EDDTable subclasses parse the request, get data from subclass-specific methods,
then format the data for the response.
- EDDGrid subclasses push data into GridDataAccessor (the internal data container for gridded data).
- EDDTable subclasses push data into TableWriter subclasses, which write data to a specific file
type on-the-fly.
- Code Contributions -
If you have written some code which would be a nice addition to ERDDAP,
please email bob dot simons at noaa dot gov. We'll work out the details.
The two situations that are most likely are:
- You want to write another subclass of EDDGrid or EDDTable to handle another data source type.
If so, we recommend that you find the closest existing subclass and use that code as a starting point.
- You want to write another saveAsFileType method.
If so, we recommend that you find the closest existing saveAsFileType method
in EDDGrid or EDDTable
and use that code as a starting point.
Both of these situations have the advantage that the code you write is self-contained.
You won't need to know all the details of ERDDAP's internals.
And it will be easy for us to incorporate your code in ERDDAP.
Note that if you do submit code, the license will need to be compatible with the ERDDAP
license (e.g.,
Apache,
BSD, or
MIT-X).
You can hold the copyright to your code. We'll list your contribution in the
credits.
The
List of Changes
for each ERDDAP release is now on a separate web page.
ERDDAP is a product of the NOAA NMFS ERD.
Bob Simons is the author of ERDDAP (the designer and programmer who wrote the ERDDAP-specific code).
Roy Mendelssohn instigated the project.
The ERDDAP-specific code is licensed as copyrighted open source, with
NOAA
holding the copyright. See the ERDDAP license.
ERDDAP uses copyrighted open source, Apache, LGPL, MIT/X, Mozilla, and public domain libraries and data.
ERDDAP does not require any GPL code or commercial programs.
Some of the funding for work on ERDDAP has come from the
NOAA CoastWatch program,
the NOAA IOOS
program, and the POST program.
Thank you all very much.
- ERDDAP is a
Java Servlet
program. At ERD, it runs inside of a
Tomcat application server
(license: Apache),
with an
Apache web server
(license: Apache),
running on a computer using the
Red Hat Linux operating
system (license: GPL).
- The data sets are from various sources. See the metadata (in particular the "sourceUrl", "infoUrl",
and "institution") for each dataset.
- The com/cohort classes are from CoHort Software
(http://www.cohort.com)
which makes these classes available with an MIT/X-like license (see classes/com/cohort/util/LICENSE.txt).
- Data from OPeNDAP servers are read
with Java DAP 1.1.7
(license: LGPL).
- NetCDF files (.nc) and GMT-style NetCDF files (.grd) are read and written with code in the
NetCDF Java Library
(license: MIT/X-like)
from Unidata.
- The NetCDF Java Library reads GRIB files via the Unidata
GRIB decoder (grib-6.0.jar)
(license: MIT/X-like).
- The NetCDF Java Library reads BUFR files via the Unidata
BUFR decoder (bufrTables-1.5.jar)
(license: MIT/X-like).
- The NetCDF Java Library requires a logger facade (we chose slf4j-jdk14.jar) from the
Simple Logging Facade for Java project
(license: MIT/X).
- The NetCDF Java Library uses code from several .jar files from
Apache projects:
commons-codec,
commons-httpclient, and
commons-logging
(license: Apache).
- Other parts of ERDDAP use code from other
Apache projects:
commons-compress and
commons-discovery
(license: Apache).
- The NetCDF Java Library uses XML processing code from
JDOM
(license: Apache).
- ERDDAP uses
json.org's Java-based JSON library
to parse JSON data
(license: copyrighted open source).
- The graphs and maps are created on-the-fly with a modified version of NOAA's
SGT version 3
(a Java-based Scientific Graphics Toolkit written by Donald Denbo at
NOAA PMEL)
(license: copyrighted open source).
- Big, HTML tooltips on ERDDAP's HTML pages are created with Walter Zorn's wz_tooltip.js
(license: LGPL).
- Sliders and the drag and drop feature of the Slide Sorter are created with Walter Zorn's
wz_dragdrop.js
(license: LGPL).
- The .pdf files are created with
iText (version 1.3.1, which used the
Mozilla license),
a free Java-PDF library by Bruno Lowagie and Paulo Soares.
- The shoreline and lake data are from
GSHHS
- A Global Self-consistent, Hierarchical,
High-resolution Shoreline Database
(license: GPL)
and created by Paul Wessel and Walter Smith.
- The political boundary and river data are from the
pscoast
program in
GMT,
which uses data from the
CIA World Data Bank II
(license: public domain).
- The bathymetry/topography data used in the background of some maps is the
ETOPO1 Global 1-Minute Gridded Elevation Data Set (Ice Surface, grid registered, binary,
2 byte int: etopo1_ice_g_i2.zip)
(license:
public domain),
which is distributed for free by
NOAA NGDC.
- Emails are sent using code in mail.jar from Oracle's
JavaMail API
(license:
Oracle
Binary Code License Agreement for Java EE Technologies).
- JavaMail uses activation.jar from the
JavaBeans Activation Framework
(license:
Oracle
Binary Code License Agreement for Java EE Technologies).
- For OpenID authentication, ERDDAP uses the
joid.jar and tsik.jar libraries from the
joid project
from Verisign
(license: Apache 2.0).
- ERDDAP uses Joda
for some calendar calculations
(license: Apache).
- ERDDAP includes the
PostgreSQL JDBC4 driver
(license: BSD).
The driver is Copyright (c) 1997-2010, PostgreSQL Global Development Group. All rights reserved.
- For SOAP services, ERDDAP uses:
- ERDDAP uses code from the
CoastWatch Browser
project from the
NOAA CoastWatch
West Coast Regional Node
(license:
copyrighted open source).
That project was initiated and managed by Dave Foley, the Coordinator of the NOAA CoastWatch
West Coast Regional Node.
The CoastWatch Browser code was written by Bob Simons.
- The ERDDAP distribution includes code from the J2SSH project which is distributed by
SSHTools
(version 0.2.2, license: Apache).
It is based on j2ssh/examples/SftpConnect.java (license: LGPL) which is
Copyright (C) 2002 Lee David Painter (lee@sshtools.com).
The ERDDAP-specific code is licensed as copyrighted open source,
with NOAA holding the copyright.
The license is:
ERDDAP, Copyright 2012, NOAA.
PERMISSION TO USE, COPY, MODIFY, AND DISTRIBUTE THIS SOFTWARE AND
ITS DOCUMENTATION FOR ANY PURPOSE AND WITHOUT FEE IS HEREBY GRANTED,
PROVIDED THAT THE ABOVE COPYRIGHT NOTICE APPEAR IN ALL COPIES, THAT
BOTH THE COPYRIGHT NOTICE AND THIS PERMISSION NOTICE APPEAR IN
SUPPORTING DOCUMENTATION, AND THAT REDISTRIBUTIONS OF MODIFIED FORMS
OF THE SOURCE OR BINARY CODE CARRY PROMINENT NOTICES STATING THAT THE
ORIGINAL CODE WAS CHANGED AND THE DATE OF THE CHANGE. THIS SOFTWARE
IS PROVIDED "AS IS" WITHOUT EXPRESS OR IMPLIED WARRANTY.
Questions, comments, suggestions? Please send an email to
bob dot simons at noaa dot gov
and include the ERDDAP URL directly related to your question or comment.
ERDDAP, Version 1.42