Set Up Your Own ERDDAP
ERDDAP is a
Free and Open Source,
all-Java (servlet), web application that
runs in a web application server
(for example, Tomcat (recommended), or Jetty (it works, but we don't support it)).
This web page is mostly for people ("ERDDAP administrators")
who want to set up their own ERDDAP installation at their own website.
The small effort to set up ERDDAP brings many benefits:
- If you already have a web service for distributing your data,
you can set up ERDDAP to access your data via the existing service.
Or, you can set up ERDDAP to access your data directly from local files.
- For each dataset, you only have to write a small chunk of XML to tell ERDDAP how to access the dataset.
- Once you have ERDDAP serving your data, end users can:
- Request the data in various ways (DAP, WMS, and more in the future).
- Get the data response in various file formats. (That's probably the biggest reason!)
- Make graphs and maps. (Everyone likes pretty pictures.)
- Build other useful and interesting things on top of ERDDAP's web services
-- see the
Awesome ERDDAP
list of awesome ERDDAP-related projects.
You can
customize your ERDDAP's appearance
so ERDDAP reflects your organization and fits in with the rest of your website.
ERDDAP has been installed by organizations in at least 17 countries
(Australia, Belgium, Canada, China, France, India, Ireland,
Italy, New Zealand, Russia, South Africa, Spain, Sri Lanka, Sweden, Thailand, UK, USA),
including:
- APDRC
(Asia-Pacific Data-Research Center, International Pacific Research Center)
at the University of Hawaii (UH)
- BCO-DMO at WHOI
(Biological and Chemical Oceanography Data Management Office at Woods Hole Oceanographic Institution)
- CanWIN ERDDAP
(Canadian Watershed Information Network) at the Centre for Earth Observation Science (CEOS), University of Manitoba
- CDIP
(Coastal Data Information Program at UCSD)
- CNR-ISP
(National Research Council of Italy, Institute of Polar Sciences)
- CSIRO and IMOS
(Australia's Commonwealth Scientific and Industrial Research Organisation and the Integrated Marine Observing System)
- DIVER (NOAA ORR)
(NOAA Office of Response and Restoration)
- EMODnet Physics
(The European Marine Observation and Data Network -- Physics)
- GoMRI
(Gulf of Mexico Research Initiative)
- Hakai Institute
(The Hakai Institute on the Central Coast of British Columbia, Canada)
- High School Technology Services,
which offers coding and technology training for students and adults
- ICHEC
(Irish Centre for High-End Computing)
- INCOIS
(Indian National Centre for Ocean Information Services)
- IRD (Institut de Recherche pour le Développement, France)
CNRS (Centre National de la Recherche Scientifique, France)
UPMC (Université Pierre et Marie CURIE, Paris, France)
UCAD (Université Cheikh Anta Diop de Dakar, Sénégal)
UGB (Université Gaston Berger -- Saint-Louis du Sénégal)
UFHB (Université Félix HOUPHOUËT-BOIGNY, Abidjan, Côte d'Ivoire)
IPSL (Institut Pierre Simon Laplace des sciences de l'environnement, Paris, France)
LMI ECLAIRS (Laboratoire Mixte International «Etude du Climat en Afrique de l’Ouest
et de ses Interactions avec l’Environnement Régional, et appui aux services climatiques»)
- JRC (European Commission - Joint Research Centre, European Union)
- The Marine Institute
(Ireland)
- Marine Instruments S.A. (Spain)
- NCI (Australia's National Computational Infrastructure)
- NOAA CoastWatch
(central)
- NOAA CoastWatch CGOM
(Caribbean/Gulf of Mexico Node)
- NOAA CoastWatch GLERL
(Great Lakes Node)
- NOAA CoastWatch West Coast
which is co-located with and works with
NOAA ERD
(Environmental Research Division of SWFSC of NMFS)
- NOAA IOOS Sensors
(Integrated Ocean Observing System)
- NOAA IOOS CeNCOOS
(Central and Northern California Ocean Observing System, run by Axiom Data Science)
- NOAA IOOS GCOOS Atmospheric and Oceanographic Data: Observing System
NOAA IOOS GCOOS Atmospheric and Oceanographic Data: Historical Collections
NOAA IOOS GCOOS Biological and Socioeconomics
(Gulf Coast Ocean Observing System)
- NOAA IOOS NERACOOS
(Northeastern Regional Association of Coastal and Ocean Observing Systems)
- NOAA IOOS NGDAC
(National Glider Data Assembly Center)
- NOAA IOOS NANOOS
(Northwest Association of Networked Ocean Observing Systems)
- NOAA IOOS PacIOOS
(Pacific Islands Ocean Observing System) at the University of Hawaii (UH)
- NOAA IOOS SCCOOS
(Southern California Coastal Ocean Observing System)
- NOAA IOOS SECOORA
(Southeast Coastal Ocean Observing Regional Association)
- NOAA NCEI
(National Centers for Environmental Information)
- NOAA NGDC STP (National Geophysical Data Center, Solar -- Terrestrial Physics)
- NOAA NMFS NEFSC (Northeast Fisheries Science Center)
- NOAA NOS CO-OPS
(Center for Operational Oceanographic Products and Services)
- NOAA OSMC
(Observing System Monitoring Center)
- NOAA PIFSC
(Pacific Islands Fisheries Science Center)
- NOAA PolarWatch
- NOAA UAF
(Unified Access Framework)
- Ocean Networks Canada
- Ocean Tracking Network
- OOI / All Data
(Ocean Observatories Initiative)
OOI / Uncabled Data
- Princeton, Hydrometeorology Research Group
- R.Tech Engineering, France
- Rutgers University, Department of Marine and Coastal Sciences
- San Francisco Estuary Institute
- Scripps Institution of Oceanography, Spray Underwater Gliders
- Smart Atlantic
Memorial University of Newfoundland
- South African Environmental Observation Network
- Spyglass Technologies
- Stanford University, Hopkins Marine Station
- UNESCO IODE
(International Oceanographic Data and Information Exchange)
- University of British Columbia, Earth, Ocean & Atmospheric Sciences Department
- University of California at Davis, Bodega Marine Laboratory
- University of Delaware, Satellite Receiving Station
- University of Washington, Applied Physics Laboratory
- USGS CMGP
(Coastal and Marine Geology Program)
- VOTO
(Voice Of The Ocean, Sweden)
This is a list of just some of the organizations where ERDDAP has been installed
by some individual or some group.
It does not imply that the individual, the group, or the organization recommends
or endorses ERDDAP.
ERDDAP is recommended within NOAA and CNRS
NOAA's Data Access Procedural Directive
includes ERDDAP in its list of recommended data servers for use by groups within NOAA.
ERDDAP is favorably mentioned in section 4.2.3 of the
Guide de bonnes pratiques sur la gestion des données de la recherche
(Research Data Management Best Practices Guide)
of the Centre National de la Recherche Scientifique (CNRS) in France.
Is the installation procedure hard? Can I do it?
The initial installation takes some time, but it isn't very hard. You can do it.
If you get stuck, email me at
erd dot data at noaa dot gov . I will help you.
Or, you can join the
ERDDAP Google Group / Mailing List
and post your question there.
ERDDAP can run on any server that supports Java and Tomcat (and other application servers like Jetty, but we don't support them).
ERDDAP has been tested on Linux (including on Amazon's AWS), Mac, and Windows computers.
- Amazon -- If you are installing
ERDDAP on an Amazon Web Services EC2 instance, see this
Amazon Web Services Overview (below)
first.
- Docker -- Axiom now offers
ERDDAP in a Docker container
and IOOS now offers a
Quick Start Guide for ERDDAP in a Docker Container.
It's the standard ERDDAP installation, but Axiom has put it in a Docker container.
If you already use Docker, you will probably prefer the Docker version.
If you don't already use Docker, we generally don't recommend this.
If you choose to install ERDDAP via Docker, we don't offer any support
for the installation process.
We haven't worked with Docker yet. If you work with this, please send us your comments.
- Linux and Macs -- ERDDAP works great on Linux and Mac computers. See the instructions below.
- Windows -- Windows is fine for testing ERDDAP and for personal use (see the instructions below),
but we don't recommend using it for public ERDDAPs.
Running ERDDAP on Windows may have problems:
notably, ERDDAP may be unable to delete
and/or rename files quickly. This is probably due to antivirus software (e.g., from McAfee
and Norton) which is checking the files for viruses. If you run into this problem (which
can be seen by error messages in the log.txt
file like "Unable to delete ..."), changing the
antivirus software's settings may partially alleviate the problem. Or consider using a Linux
or Mac server instead.
The standard ERDDAP installation instructions for Linux, Macs, and Windows computers are:
- For ERDDAP v2.19+, set up Java 17.
For security reasons, it is almost always best to use the latest version of Java 17.
Please download and install the latest version of
Adoptium's OpenJDK (Temurin) 17 (LTS).
To verify the installation, type "/javaBinDirectory/java -version", for example
/usr/local/jdk-17.0.4+8/bin/java -version
[For ERDDAP versions before v2.19, use Java 8.]
ERDDAP works with Java from other sources, but we recommend Adoptium because it is the
main, community-supported, free (as in beer and speech) version of Java 17
that offers Long Term Support (free upgrades for many years past the initial release).
For security reasons, please update your ERDDAP's version of Java periodically
as new versions of Java 17 become available from Adoptium.
ERDDAP has been tested and used extensively with Java 17, not other versions.
For various reasons, we don't test with nor support other versions of Java.
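As a rough sketch of how the Java version check could be scripted: the helper below extracts the major version from the first line of "java -version" output. The function name java_major is hypothetical, and the sample version strings are illustrative rather than taken from a real installation.

```shell
# Print the major Java version found in one line of `java -version` output.
# Handles the legacy "1.8.0_292" scheme (-> 8) and the newer "17.0.4" scheme (-> 17).
java_major() {
  local line="$1" ver
  # Pull out the quoted version string, e.g. 17.0.4 or 1.8.0_292
  ver=$(printf '%s\n' "$line" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  case "$ver" in
    1.*) ver=${ver#1.} ;;          # legacy scheme: drop the leading "1."
  esac
  printf '%s\n' "${ver%%[._+-]*}"  # keep only the digits before the first separator
}

java_major 'openjdk version "17.0.4" 2022-07-19'   # prints 17
```

On a real server you might call it as java_major "$(/usr/local/jdk-17.0.4+8/bin/java -version 2>&1 | head -n 1)" (substituting your own Java directory) and compare the result with 17.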
- Set up
Tomcat.
Tomcat is the most widely used Java Application Server,
which is Java software that stands between the operating system's network services
and Java server software like ERDDAP. It is Free and Open Source Software (FOSS).
You can use another Java Application Server (e.g., Jetty), but we only test with and support
Tomcat.
- Download Tomcat and unpack it on your server or PC.
For security reasons, it is almost always best to use the latest version of
Tomcat 10 (versions 9 and below are not acceptable)
which is designed to work with Java 17.
Below, the Tomcat directory will be referred to as tomcat.
Warning! If you already have a Tomcat running some other web application (especially THREDDS),
we recommend that you install ERDDAP in
a second Tomcat,
because ERDDAP needs different Tomcat settings
and shouldn't have to contend with other applications for memory.
- On Linux,
download the "Core" "tar.gz" Tomcat distribution
and unpack it. We recommend unpacking it in /usr/local.
- On a Mac, Tomcat is probably already installed in /Library/Tomcat,
but you should update it to the latest version of Tomcat 10.
If you download it,
download the "Core" "tar.gz" Tomcat distribution
and unpack it in /Library/Tomcat.
- On Windows, you can
download the "Core" "zip" Tomcat distribution
(which doesn't mess with the Windows registry and which you control from a DOS command line)
and unpack it in an appropriate directory.
(For development, we use the "Core" "zip" distribution.
We make a /programs directory and unpack it there.)
Or you can download the "Core" "64-bit Windows zip" distribution, which includes more features.
If the distribution is a Windows installer, it will probably put Tomcat
in, for example, /Program Files/apache-tomcat-10.0.23 .
- server.xml -
In the tomcat/conf/server.xml file,
there are two changes that you should make to each of the two
<Connector> tags (one for '<Connector port="8080" '
and one for '<Connector port="8443" '):
- (Recommended) Increase the connectionTimeout parameter value, perhaps to
300000 (milliseconds) (which is 5 minutes).
- (Recommended) Add
a new parameter: relaxedQueryChars="[]|"
This is optional and slightly less secure, but removes the need for users to
percent-encode these characters when
they occur in the parameters of a user's request URL.
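To illustrate what relaxedQueryChars saves users from doing, here is a minimal sketch that percent-encodes the three affected characters ([ becomes %5B, ] becomes %5D, | becomes %7C). The function name encode_relaxed_chars and the sample query string are hypothetical.

```shell
# Percent-encode the three characters covered by relaxedQueryChars="[]|".
# Without that setting, clients have to send queries in this encoded form.
encode_relaxed_chars() {
  printf '%s\n' "$1" | sed -e 's/\[/%5B/g' -e 's/]/%5D/g' -e 's/|/%7C/g'
}

encode_relaxed_chars 'sst[(last)][(0.0)]'   # prints sst%5B(last)%5D%5B(0.0)%5D
```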
- context.xml -- Resources Cache -
In tomcat/conf/context.xml, right before the </Context> tag,
change the Resources tag (or add it if it isn't already there) to
set the cacheMaxSize parameter to 80000:
<Resources cachingAllowed="true" cacheMaxSize="80000" />
This avoids numerous warnings in catalina.out that all start with
"WARNING [main] org.apache.catalina.webresources.Cache.getResource Unable to add the resource at [/WEB-INF/classes/..."
- On Linux computers, change the Apache timeout settings
so that time-consuming user requests don't time out
(with what often appears as a "Proxy" or "Bad Gateway" error).
As the root user:
- Modify the Apache httpd.conf file (usually in /etc/httpd/conf/ ):
Change the existing Timeout directive
(or add one at the end of the file)
to 3600 (seconds), instead of the default 60 or 120 seconds:
Timeout 3600
Change the existing ProxyTimeout directive
(or add one at the end of the file)
to 3600 (seconds), instead of the default 60 or 120 seconds:
ProxyTimeout 3600
- Restart Apache: /usr/sbin/apachectl -k graceful
(but sometimes it is in a different directory).
- Security recommendation: See
these instructions
to increase the security of your Tomcat installation, especially for public servers.
- For public ERDDAP installations on Linux and Macs,
it is best to set up Tomcat (the program) as belonging to
user "tomcat" (a separate user with limited permissions and no password).
Thus, only the super user can switch to acting as user tomcat.
This makes it impossible for hackers to log in to your server as user tomcat.
And in any case, you should make it so that the tomcat user has very limited
permissions on the server's file system
(read+write+execute privileges for the apache-tomcat directory tree and
<bigParentDirectory>
and read-only privileges for directories with data that ERDDAP needs access to).
- You can create the tomcat user account (which has no password) by using the command
sudo useradd tomcat -s /bin/bash -p '*'
- You can switch to working as user tomcat by using the command
sudo su - tomcat
(It will ask you for the superuser password for permission to do this.)
- You can stop working as user tomcat by using the command
exit
- Do most of the rest of the Tomcat and ERDDAP setup instructions as user "tomcat".
Later, run the startup.sh and shutdown.sh scripts as user "tomcat" so that Tomcat has
permission to write to its log files.
- After unpacking Tomcat, from the parent of the apache-tomcat directory:
- Change ownership of the apache-tomcat directory tree to the tomcat user.
chown -R tomcat apache-tomcat-10.0.23
(but substitute the actual name of your tomcat directory).
- Change the "group" to be tomcat, your username,
or the name of a small group that includes tomcat and
all the administrators of Tomcat/ERDDAP, e.g.,
chgrp -R yourUserName apache-tomcat-10.0.23
- Change permissions so that tomcat and the group have
read, write, execute privileges, e.g.,
chmod -R ug+rwx apache-tomcat-10.0.23
- Remove "other" user's permissions to read, write, or execute:
chmod -R o-rwx apache-tomcat-10.0.23
This is important, because it prevents other users from
reading possibly sensitive information in ERDDAP setup files.
- Set Tomcat's Environment Variables
On Linux and Macs:
Create a file tomcat/bin/setenv.sh
(or in Red Hat Enterprise Linux [RHEL], edit ~tomcat/conf/tomcat10.conf)
to set Tomcat's environment variables.
This file will be used by tomcat/bin/startup.sh and shutdown.sh.
The file should contain something like:
export JAVA_HOME=/usr/local/jdk-17.0.4+8
export JAVA_OPTS='-server -Djava.awt.headless=true -Xmx1500M -Xms1500M'
export TOMCAT_HOME=/usr/local/apache-tomcat-10.0.23
export CATALINA_HOME=/usr/local/apache-tomcat-10.0.23
(but substitute the directory names from your computer).
(If you previously set JRE_HOME, you can remove that.)
On Macs, you probably don't need to set JAVA_HOME.
On Windows:
Create a file tomcat\bin\setenv.bat
to set Tomcat's environment variables.
This file will be used by tomcat\bin\startup.bat and shutdown.bat.
The file should contain something like:
SET "JAVA_HOME=\someDirectory\jdk-17.0.4+8"
SET "JAVA_OPTS=-server -Xmx1500M -Xms1500M"
SET "TOMCAT_HOME=\Program Files\apache-tomcat-10.0.23"
SET "CATALINA_HOME=\Program Files\apache-tomcat-10.0.23"
(but substitute the directory names from your computer).
If this is just for local testing, remove "-server".
(If you previously set JRE_HOME, you can remove that.)
The -Xmx and -Xms memory settings are important because ERDDAP works better with more
memory. Always set -Xms to the same value as -Xmx.
- For 32 bit Operating Systems and 32 bit Java:
64 bit Java is much better than 32 bit Java, but 32 bit Java will work
as long as the server isn't really busy.
The more physical memory in the server the better: 4+ GB is really good, 2 GB is okay,
less is not recommended. With 32 bit Java, even with abundant physical memory,
Tomcat and Java won't run if you try to set -Xmx much above 1500M (1200M on some computers).
If your server has less than 2GB of memory, reduce the -Xmx value (in 'M'egaBytes)
to 1/2 of the computer's physical memory.
- For 64 bit Operating Systems and 64 bit Java:
64 bit Java will only work on a 64 bit operating system.
- With Java 8, you need to add -d64 to the Tomcat CATALINA_OPTS parameter in setenv.bat
- With Java 17, you choose 64 bit Java when you download a version of Java marked "64 bit".
With 64 bit Java, Tomcat and Java can use very high -Xmx and -Xms settings.
The more physical memory in the server the better. As a simplistic suggestion:
we recommend you set -Xmx and -Xms (in 'M'egaBytes) to 1/2 (or less) of the computer's
physical memory.
You can see if Tomcat, Java, and ERDDAP are indeed running in 64 bit mode by searching for
" bit," in ERDDAP's Daily Report email or in the
bigParentDirectory/logs/log.txt file
(bigParentDirectory is specified in
setup.xml).
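The "half of physical memory" suggestion above can be sketched as a small calculation. The function name suggest_heap_opts is hypothetical; it takes physical memory in kB, which is the unit Linux reports as MemTotal in /proc/meminfo, and it applies only to the 64 bit case.

```shell
# Suggest matching -Xmx/-Xms values (in MB) as half of physical memory.
# $1 is physical memory in kB, the unit used by MemTotal in /proc/meminfo.
suggest_heap_opts() {
  local half_mb=$(( $1 / 1024 / 2 ))
  echo "-Xmx${half_mb}M -Xms${half_mb}M"
}

suggest_heap_opts 16777216   # 16 GB of RAM: prints -Xmx8192M -Xms8192M
```

On Linux, something like suggest_heap_opts "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)" would feed in the real value. Treat the result as a starting point, since other software on the server also needs memory.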
- In ERDDAP's log.txt file,
you will see many "GC (Allocation Failure)" messages.
This is usually not a problem. It is a frequent message from a normally
operating Java saying that it just finished a minor garbage collection
because it ran out of room in Eden (the section of the Java heap for
very young objects). Usually the message shows you
memoryUseBefore->memoryUseAfter.
If those two numbers are close together, it means that the garbage
collection wasn't productive. The message is only a sign of trouble
if it is very frequent (every few seconds), not productive,
and the numbers are large and not growing,
which together indicate that Java needs more memory,
is struggling to free up memory,
and is unable to free up memory.
This may happen during a stressful time, then go away. But if it persists,
that is a sign of trouble.
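To judge whether garbage collections are productive, it can help to pull the memoryUseBefore->memoryUseAfter pair out of each message. The sketch below assumes the numbers appear as something like 629248K->47591K with the same unit on both sides; the exact message layout varies with the Java version and logging options, so the sample line is illustrative only, and the function name gc_freed is hypothetical.

```shell
# Read GC log lines on stdin. For each line with a "before->after" pair,
# print both values and the amount of memory freed.
# Assumes both numbers carry the same unit suffix (K, M, or G).
gc_freed() {
  sed -n 's/.*[^0-9]\([0-9][0-9]*\)\([KMG]\)->\([0-9][0-9]*\)[KMG].*/\1 \3 \2/p' |
  while read -r before after unit; do
    echo "before=${before}${unit} after=${after}${unit} freed=$(( before - after ))${unit}"
  done
}

echo '[GC (Allocation Failure) 629248K->47591K(1503232K), 0.0412 secs]' | gc_freed
# prints before=629248K after=47591K freed=581657K
```

A small "freed" value, repeated every few seconds, matches the unproductive pattern described above.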
- If you see java.lang.OutOfMemoryError's in ERDDAP's log.txt file,
see OutOfMemoryError
for tips on how to diagnose and resolve the problems.
- On Linux and Macs, change the permissions
of all *.sh files in tomcat/bin/ to be executable by the owner, e.g., with
chmod +x *.sh
- Fonts for images:
We strongly prefer the free
DejaVu fonts
to the other Java fonts.
Using these fonts is strongly recommended but not required.
If you choose not to use the DejaVu fonts, you need to change the fontFamily setting in
setup.xml to <fontFamily>SansSerif</fontFamily>,
which is available with all Java distributions. If you set fontFamily to
the name of a font that isn't available, ERDDAP won't load and will
print a list of available fonts in the log.txt file. You must use one of those fonts.
If you choose to use the DejaVu fonts, please make sure the fontFamily setting in
setup.xml is <fontFamily>DejaVu Sans</fontFamily>.
To install the DejaVu fonts, please download
DejaVuFonts.zip
(5,522,795 bytes, MD5=33E1E61FAB06A547851ED308B4FFEF42)
and unzip the font files to a temporary directory.
- On Linux:
- For Linux Adoptium Java distributions, see
these instructions.
- With other Java distributions:
As the Tomcat user, copy the font files into JAVA_HOME/lib/fonts
so Java can find the fonts.
Remember: if/when you later upgrade to a newer version of Java, you need to reinstall
these fonts.
- On Macs: for each font file, double click on it and then click Install Font.
- On Windows 7 and 10: in Windows Explorer, select all of the font files.
Right click. Click on Install.
- Test your Tomcat installation.
- Linux:
- As user "tomcat", run tomcat/bin/startup.sh
- View your URL + ":8080/" in your browser (e.g.,
http://coastwatch.pfeg.noaa.gov:8080/).
- You should see the Tomcat "Congratulations" page.
If there is trouble, see the Tomcat log file tomcat/logs/catalina.out.
- Mac (run tomcat as the system administrator user):
- Run tomcat/bin/startup.sh
- View your URL + ":8080/" in your browser (e.g.,
http://coastwatch.pfeg.noaa.gov:8080/).
Note that by default, your Tomcat is only accessible by you. It is not publicly accessible.
- You should see the Tomcat "Congratulations" page.
If there is trouble, see the Tomcat log file tomcat/logs/catalina.out.
- Windows localhost:
- Right click on the Tomcat icon in the system tray, and choose "Start service".
- View http://127.0.0.1:8080/ (or perhaps
http://localhost:8080/)
in your browser.
Note that by default, your Tomcat is only accessible by you. It is not publicly accessible.
- You should see the Tomcat "Congratulations" page.
If there is trouble, see the Tomcat log file tomcat/logs/catalina.out.
- Troubles with the Tomcat installation?
- On Linux and Mac, if you can't reach Tomcat or ERDDAP (or perhaps
you just can't reach them from a computer outside your firewall),
you can test if Tomcat is listening to port 8080, by typing (as root)
on a command line of the server:
netstat -tuplen | grep 8080
That should return one line with something like:
tcp 0 0 :::8080 :::* LISTEN ## ##### ####/java
(where '#' is some digit), indicating that
a "java" process (presumably Tomcat) is listening on port "8080" for "tcp" traffic.
If no lines were returned, if the line returned is significantly different,
or if two or more lines were returned,
then there may be a problem with the port settings.
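A plain grep for 8080 can also match other ports (such as 18080) or process IDs that happen to contain those digits. As a slightly stricter sketch, the hypothetical helper below counts only lines whose local address ends in the given port and whose state is LISTEN, assuming the column layout shown above.

```shell
# Count sockets in LISTEN state on the given port.
# Reads `netstat -tuplen` style lines on stdin: column 4 is the local address,
# column 6 is the state.
count_listeners() {
  awk -v port="$1" '$6 == "LISTEN" && $4 ~ (":" port "$") { n++ } END { print n + 0 }'
}

printf '%s\n' 'tcp 0 0 :::8080 :::* LISTEN 1001 12345 6789/java' | count_listeners 8080
# prints 1
```

On the server, netstat -tuplen | count_listeners 8080 should print 1 when exactly one process is listening on that port.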
- See the Tomcat log file tomcat/logs/catalina.out.
Tomcat problems and some ERDDAP startup problems are almost always indicated there.
This is common when you are first setting up ERDDAP.
- See the Tomcat
website or search the web for help,
but please let us know the problems you had and the solutions you found.
- Email me at erd dot data at noaa dot gov . I will try to help you.
- Or, you can join the ERDDAP Google Group / Mailing List
and post your question there.
- Set up the tomcat/content/erddap configuration files.
On Linux, Mac, and Windows, download
erddapContent.zip
(version 2.22, 19810 bytes, MD5=1E26F62E7A06191EE6868C40B9A29362, dated 2022-12-08)
and unzip it into tomcat, creating tomcat/content/erddap .
Other Directory: For Red Hat Enterprise Linux (RHEL) or for other situations where
you aren't allowed to modify the Tomcat directory
or where you want/need to put the ERDDAP content directory in some other location for some other reason
(for example, if you use Jetty instead of Tomcat),
unzip erddapContent.zip into the desired directory (to which only user=tomcat has access)
and set the erddapContentDirectory system property
(e.g., erddapContentDirectory=~tomcat/content/erddap)
so ERDDAP can find this new content directory.
[Some previous versions are also available:
2.17
(19,792 bytes, MD5=8F892616BAEEF2DF0F4BB036DCB4AD7C, dated 2022-02-16)
2.18
(19,792 bytes, MD5=8F892616BAEEF2DF0F4BB036DCB4AD7C, dated 2022-02-16)
2.21
(19,810 bytes, MD5=1E26F62E7A06191EE6868C40B9A29362, dated 2022-10-09)
]
Then,
- Read the comments in tomcat/content/erddap/setup.xml
and make the requested changes.
setup.xml is the file with all of the settings which specify how your ERDDAP behaves.
For the initial setup, you MUST at least change these settings:
<bigParentDirectory>, <emailEverythingTo>, <baseUrl>,
<email.*>, <admin.*>
(and <baseHttpsUrl> when you set up https).
When you create the bigParentDirectory, from the parent directory of
bigParentDirectory:
- Make user=tomcat the owner of the bigParentDirectory, e.g.,
chown -R tomcat bigParentDirectory
- Change the "group" to be tomcat, your username,
or the name of a small group that includes tomcat and
all the administrators of Tomcat/ERDDAP, e.g.,
chgrp -R yourUserName bigParentDirectory
- Change permissions so that tomcat and the group have
read, write, execute privileges, e.g.,
chmod -R ug+rwx bigParentDirectory
- Remove "other" user's permissions to read, write, or execute:
chmod -R o-rwx bigParentDirectory
This is important, because it prevents other users from
reading possibly sensitive information in ERDDAP log files
and files with information about private datasets.
Environment Variables - Starting with ERDDAP v2.13,
ERDDAP administrators can override any value in setup.xml
by specifying an environment variable named ERDDAP_valueName
before running ERDDAP.
For example, ERDDAP_baseUrl overrides the <baseUrl> value.
This can be handy when deploying ERDDAP with a container, as you can
put standard settings in setup.xml and then supply special settings
via environment variables.
If you supply secret information to ERDDAP via this method, be sure
to check that the information will remain secret.
ERDDAP only reads environment variables once per startup, in the first second of startup,
so one way to use this is: set the environment variables, start ERDDAP,
wait until ERDDAP is started, then unset the environment variables.
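One possible variation on the set/start/unset sequence is to supply the variables only in the environment of the startup command, so they never exist in your login shell. This is a sketch under assumptions: the wrapper name run_with_erddap_overrides and the example values are hypothetical, and it only helps if Tomcat is started directly from this shell (not, for example, by a service manager that builds its own environment).

```shell
# Run one command with setup.xml overrides present only in that command's environment.
# The values below are placeholders; substitute your own.
run_with_erddap_overrides() {
  ERDDAP_baseUrl="http://data.example.org" \
  ERDDAP_emailEverythingTo="admin@example.org" \
  "$@"
}

# The child process sees the override ...
run_with_erddap_overrides sh -c 'echo "$ERDDAP_baseUrl"'   # prints http://data.example.org
# ... but the calling shell does not keep it.
echo "${ERDDAP_baseUrl:-unset in this shell}"
```

With Tomcat, the call would look like run_with_erddap_overrides tomcat/bin/startup.sh. The values are still visible in the environment of the running Java process, so check who can inspect that process if any of them are secret.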
- Read the comments in
Working with the datasets.xml File.
Later, after you get ERDDAP running for the first time (usually with just the default datasets),
you will modify the XML in
tomcat/content/erddap/datasets.xml to specify all of the datasets
you want your ERDDAP to serve.
This is where you will spend the bulk of your time while setting up ERDDAP
and later while maintaining your ERDDAP.
- (Unlikely) Now or (slightly more likely) in the future,
if you want to modify erddap's CSS file,
make a copy of tomcat/content/erddap/images/erddapStart2.css called erddap2.css
and then make changes to it.
Changes to erddap2.css only take effect when ERDDAP is restarted
and often also require the user to clear the browser's cached files.
ERDDAP will not work correctly if the setup.xml or datasets.xml file isn't a well-formed XML file.
So, after you edit these files, it is a good idea to verify that the result is well-formed XML
by pasting the XML text into an XML checker like
xmlvalidation.
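If you prefer to check the files locally instead of pasting them into a website, a command-line parser can do the same job. The sketch below assumes xmllint (part of libxml2) is installed on the server; the function name is_well_formed_xml is hypothetical. It checks well-formedness only, not whether the settings make sense to ERDDAP.

```shell
# Return 0 if the file parses as well-formed XML, non-zero otherwise.
# Assumes xmllint (from libxml2) is installed; --noout suppresses echoing the document.
is_well_formed_xml() {
  xmllint --noout "$1" 2>/dev/null
}
```

For example: is_well_formed_xml tomcat/content/erddap/datasets.xml && echo "well-formed" || echo "NOT well-formed". Running xmllint --noout directly, without discarding its error output, also shows the line number of the first problem.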
- Install the erddap.war file.
On Linux, Mac, and Windows, download
erddap.war
into tomcat/webapps .
(version 2.22, 567,742,765 bytes, MD5=2B33354F633294213AE2AFDDCF4DA6D0, dated
2022-12-08)
The .war file is big because it contains high resolution coastline, boundary,
and elevation data needed to create maps.
[Some previous versions are also available.
2.17
(551,068,245 bytes, MD5=5FEA912B5D42E50EAB9591F773EA848D, dated 2022-02-16)
2.18
(551,069,844 bytes, MD5=461325E97E7577EC671DD50246CCFB8B, dated 2022-02-23)
2.21
(568,644,411 bytes, MD5=F2CFF805893146E932E498FDDBD519B6, dated 2022-10-09)
]
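The MD5 values listed with each download can be used to confirm that the file arrived intact. A minimal sketch, assuming either md5sum (typical on Linux) or md5 (typical on Macs) is available; the function name md5_matches is hypothetical. The comparison is case-insensitive because the values on this page are in upper case while these tools print lower case.

```shell
# Return 0 if FILE's MD5 digest matches EXPECTED (compared case-insensitively).
md5_matches() {
  local file="$1" expected actual
  expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
  if command -v md5sum >/dev/null 2>&1; then
    actual=$(md5sum "$file" | awk '{print $1}')   # GNU coreutils (Linux)
  else
    actual=$(md5 -q "$file")                      # BSD / macOS
  fi
  [ "$actual" = "$expected" ]
}
```

For example, from tomcat/webapps: md5_matches erddap.war 2B33354F633294213AE2AFDDCF4DA6D0 && echo "checksum OK" (using the value listed above for version 2.22).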
- Use ProxyPass
so users don't have to put the port number, e.g., :8080, in the URL.
On Linux computers, if Tomcat is running in Apache, please modify the
Apache httpd.conf file
(usually in /etc/httpd/conf/ )
to allow HTTP traffic to/from ERDDAP without requiring the port number, e.g., :8080, in the URL.
As the root user:
- Modify the existing <VirtualHost> tag (if there is one),
or add one at the end of the file:
<VirtualHost *:80>
ServerName YourDomain.org
ProxyRequests Off
ProxyPreserveHost On
ProxyPass /erddap http://localhost:8080/erddap
ProxyPassReverse /erddap http://localhost:8080/erddap
</VirtualHost>
- Then restart Apache: /usr/sbin/apachectl -k graceful
(but sometimes it is in a different directory).
- (UNCOMMON)
If you are using
NGINX
(a web server and load balancer):
in order to get NGINX and ERDDAP working correctly with https,
you need to put the following snippet inside the Tomcat server.xml <Host> block:
<Valve className="org.apache.catalina.valves.RemoteIpValve"
remoteIpHeader="X-Forwarded-For"
protocolHeader="X-Forwarded-Proto"
protocolHeaderHttpsValue="https" />
And in the nginx config file, you need to set these headers:
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header REMOTE_ADDR $remote_addr;
proxy_set_header HTTP_CLIENT_IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
(Thanks to Kyle Wilcox.)
- Start Tomcat.
- (I don't recommend using the Tomcat Web Application Manager. If you don't
fully shutdown and startup Tomcat, sooner or later you will have PermGen
memory issues.)
- (In Linux or Mac OS, if you have created a special user to run Tomcat,
e.g., tomcat, remember to do the following steps as that user.)
- If Tomcat is already running, shut down Tomcat
with (in Linux or Mac OS) tomcat/bin/shutdown.sh
or (in Windows) tomcat\bin\shutdown.bat
On Linux, use ps -ef | grep tomcat before and after shutdown.sh
to make sure the tomcat process has stopped.
The process should be listed before the shutdown and eventually not listed
after the shutdown. It may take a minute or two for ERDDAP to fully shut down.
Be patient.
Or if it looks like it won't stop on its own, use:
kill -9 processID
- Start Tomcat with (in Linux or Mac OS) tomcat/bin/startup.sh
or (in Windows) tomcat\bin\startup.bat
- Is ERDDAP running?
Use a browser to try to view http://www.YourServer.org/erddap/status.html
ERDDAP starts up without any datasets loaded.
Datasets are loaded in a background thread and so become available one-by-one.
Trouble? Look in the log files.
- When a request from a user comes in, it goes to Apache (on Linux and Mac OS computers), then Tomcat, then ERDDAP.
- You can see what comes to Apache (and related errors) in the Apache log files.
- You can see what comes to Tomcat (and related errors)
in the Tomcat log files (tomcat/logs/catalina.out and other files in that directory).
- You can see what comes to ERDDAP,
diagnostic messages from ERDDAP, and error messages from ERDDAP,
in the ERDDAP <bigParentDirectory>logs/log.txt file.
- Tomcat doesn't start ERDDAP until Tomcat gets a request for ERDDAP.
So you can see in the Tomcat log files if it started ERDDAP or if there is an error message related to that attempt.
- When ERDDAP starts up, it renames the old ERDDAP log.txt file
(to logArchivedAtCurrentTime.txt) and creates a new log.txt file.
So if the log.txt file is old, it is a sign that ERDDAP hasn't recently restarted.
ERDDAP writes log info to a buffer and only writes the buffer to the log file periodically,
but you can force ERDDAP to write the buffer to the log file by visiting .../erddap/status.html.
Trouble: Old Version of Java
If you are using a version of Java that is too old for ERDDAP,
ERDDAP won't run and you will see an error message in Tomcat's log file like
Exception in thread "main" java.lang.UnsupportedClassVersionError:
some/class/name: Unsupported major.minor version someNumber
The solution is to update to the most recent version of Java
and make sure that Tomcat is using it.
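The number in that error message is a class file version, which maps to a Java release by subtracting 44 (so 52 corresponds to Java 8 and 61 to Java 17). A tiny sketch of that mapping, with a hypothetical function name:

```shell
# Map a class-file major version (the integer part of "someNumber" in the
# error message, e.g. 61 from 61.0) to the Java release that produces it.
java_release_for_class_major() {
  echo $(( $1 - 44 ))
}

java_release_for_class_major 61   # prints 17
```

If the error reports 61.0, the classes need Java 17 or later, so the Java that Tomcat is actually using is older than that.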
Trouble: Slow Startup First Time
Tomcat has to do a lot of work the first time an application like
ERDDAP is started; notably, it has to unpack the erddap.war file (which is like a .zip file).
On some servers, the first attempt to view ERDDAP stalls (30 seconds?) until
this work is finished.
On other servers, the first attempt will fail immediately. But if you wait
30 seconds and try again,
it will succeed if ERDDAP was installed correctly.
There is no fix for this. This is simply how Tomcat works.
But it only occurs the first time after you install a new version of ERDDAP.
- In the future, to shut down (and restart) ERDDAP, see
How to Shut Down and Restart Tomcat and ERDDAP.
- Troubles installing Tomcat or ERDDAP?
Email me at erd dot data at noaa dot gov . I will help you.
Or, you can join the ERDDAP Google Group / Mailing List
and post your question there.
- Email Notification of New Versions of ERDDAP
If you want to receive an email whenever a new version of ERDDAP
is available, send an email to erd dot data at noaa dot gov
requesting to be added to the ERDDAP Announcements mailing list.
This list averages roughly one email every three months.
- Customize your ERDDAP to highlight your organization (not NOAA ERD).
- Change the banner that appears at the top of all ERDDAP .html pages
by editing the <startBodyHtml5> tag in your datasets.xml file.
(If there isn't one, copy the default from ERDDAP's
[tomcat]/webapps/erddap/WEB-INF/classes/gov/noaa/pfel/erddap/util/messages.xml file
into datasets.xml and edit it.)
For example, you could:
- Use a different image (i.e., your organization's logo).
- Change the background color.
- Change "ERDDAP" to "YourOrganization's ERDDAP"
- Change "Easier access to scientific data" to "Easier access to YourOrganization's data".
- Change the "Brought to you by" links to be links to your organization and funding sources.
- Change the information on the left side of the home page
by editing the <theShortDescriptionHtml> tag in your datasets.xml file.
(If there isn't one, copy the default from ERDDAP's
[tomcat]/webapps/erddap/WEB-INF/classes/gov/noaa/pfel/erddap/util/messages.xml file
into datasets.xml and edit it.)
For example, you could:
- Describe what your organization and/or group does.
- Describe what kind of data this ERDDAP has.
- To change the icon that appears on browser tabs, put your organization's favicon.ico in
tomcat/content/erddap/images/ .
See https://en.wikipedia.org/wiki/Favicon.
- Make the changes listed in
Changes
in the section entitled "Things ERDDAP Administrators Need to Know and Do"
for all of the ERDDAP versions since the version you were using.
- If you are upgrading from ERDDAP version 2.18 or below, you need to switch to Java 17
and the related Tomcat 10.
See the regular ERDDAP installation instructions for
Java and
Tomcat.
You'll also have to copy your tomcat/content/erddap directory from your
old Tomcat installation to your new Tomcat installation.
- Download
erddap.war
into tomcat/webapps .
(version 2.22, 567,742,765 bytes, MD5=2B33354F633294213AE2AFDDCF4DA6D0, dated
2022-12-08)
- messages.xml
- Common: If you are upgrading from ERDDAP version 1.46 (or above)
and you just use the standard messages,
the new standard messages.xml will be installed automatically
(amongst the .class files via erddap.war).
- Rare: If you are upgrading from ERDDAP version 1.44 (or below),
you MUST delete the old messages.xml file:
tomcat/content/erddap/messages.xml .
The new standard messages.xml will be installed automatically
(amongst the .class files via erddap.war).
- Rare: If you always make changes to the standard messages.xml file (in place),
you need to make those changes to the new messages.xml file (which is
WEB-INF/classes/gov/noaa/pfel/erddap/util/messages.xml after erddap.war
is decompressed by Tomcat).
- Rare: If you maintain a custom messages.xml file in tomcat/content/erddap/,
you need to figure out (via diff) what changes have been made to the default messages.xml
(which is in the new erddap.war as
WEB-INF/classes/gov/noaa/pfel/erddap/util/messages.xml)
and modify your custom messages.xml file accordingly.
- Install the new ERDDAP in Tomcat:
* Don't use Tomcat Manager. Sooner or later there will be PermGen memory issues.
It is better to actually shutdown and startup Tomcat.
* Replace references to tomcat below with the actual
Tomcat directory on your computer.
- For Linux and Macs:
- Shutdown Tomcat:
From a command line, use: tomcat/bin/shutdown.sh
And use ps -ef | grep tomcat to see if/when
the process has been stopped. (It may take a minute or two.)
- Remove the decompressed ERDDAP installation:
In tomcat/webapps, use
rm -rf erddap
- Delete the old erddap.war file:
In tomcat/webapps, use rm erddap.war
- Copy the new erddap.war file from the temporary directory to
tomcat/webapps
- Restart Tomcat and ERDDAP:
use tomcat/bin/startup.sh
- View ERDDAP in your browser to check that the restart succeeded.
(Often, you have to try a few times and wait a minute before you see ERDDAP.)
- For Windows:
- Shutdown Tomcat:
From a command line, use: tomcat\bin\shutdown.bat
- Remove the decompressed ERDDAP installation:
In tomcat\webapps, use
rmdir /S /Q erddap
- Delete the old erddap.war file:
In tomcat\webapps, use del erddap.war
- Copy the new erddap.war file from the temporary directory to
tomcat\webapps
- Restart Tomcat and ERDDAP: use tomcat\bin\startup.bat
- View ERDDAP in your browser to check that the restart succeeded.
(Often, you have to try a few times and wait a minute before you see ERDDAP.)
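For Linux and Macs, the download check and the file-swapping steps above can be sketched as two small shell functions. This is a sketch under assumptions, not an official ERDDAP script: the function names are made up, the paths in the usage comment are placeholders, and it assumes md5sum (Linux) or md5 (macOS) is installed. Tomcat must already be fully shut down before the swap.

```shell
#!/bin/sh
# Sketch only. Function names and example paths are hypothetical.

# Print the MD5 of a file in lowercase hex (md5sum on Linux, md5 -q on macOS).
file_md5() {
    if command -v md5sum >/dev/null 2>&1; then
        md5sum "$1" | cut -d ' ' -f 1
    else
        md5 -q "$1"
    fi | tr 'A-F' 'a-f'
}

# Succeed only if the file's MD5 matches the expected value (case-insensitive).
md5_matches() {
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    [ "$(file_md5 "$1")" = "$expected" ]
}

# Swap in a new erddap.war. $1 = Tomcat directory, $2 = downloaded .war file.
# Run this only after Tomcat has stopped.
replace_erddap_war() {
    tomcat=$1
    new_war=$2
    rm -rf "$tomcat/webapps/erddap"       # remove the decompressed installation
    rm -f "$tomcat/webapps/erddap.war"    # delete the old .war file
    cp "$new_war" "$tomcat/webapps/erddap.war"
}

# Example usage (not run here; paths are placeholders):
#   md5_matches /tmp/erddap.war 2B33354F633294213AE2AFDDCF4DA6D0 || exit 1
#   /usr/local/tomcat/bin/shutdown.sh     # then wait until the process is gone
#   replace_erddap_war /usr/local/tomcat /tmp/erddap.war
#   /usr/local/tomcat/bin/startup.sh
```

Checking the MD5 before shutting down Tomcat means a corrupted download is caught while the old ERDDAP is still running.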
Troubles updating ERDDAP?
Email me at
erd dot data at noaa dot gov . I will help you.
Or, you can join the
ERDDAP Google Group / Mailing List
and post your question there.
- Use Ctrl-F To Find Things On This Web Page
All of the information about administering ERDDAP
(other than
working with datasets.xml)
is on this one, very long, .html web page,
not several .html pages as some people prefer.
The advantage of one .html web page is that you can use Ctrl-F (Command-F on a Mac)
in your web browser to search for text (for example, flag) within this web page.
Alternatively, at the top of this document, there is a
list of main topics (a Table of Contents).
- Internal Links
ERDDAP's web pages
have a large number of almost invisible, internal links
(the text is black and not underlined).
If you hover over one of these links (usually the first few words of headings
and paragraphs), the cursor becomes a hand.
If you click on the link, the URL is the internal link to that section of the
document. This makes it easy to refer to specific sections of ERDDAP web pages.
As an example, hover over, and click on, the bold "Internal Links" at the start
of this paragraph.
- Proxy Errors
Sometimes, a request to ERDDAP will return a Proxy Error,
an HTTP 502 Bad Gateway Error, or some similar error.
These errors are being thrown by Apache or Tomcat, not ERDDAP itself.
- If every request generates these errors, especially when you are first
setting up your ERDDAP, then it probably is a proxy or bad gateway error,
and the solution is probably to fix
ERDDAP's proxy settings.
This may also be the problem when an established ERDDAP suddenly starts
throwing these errors for every request.
- Otherwise, "proxy" errors are usually timeout errors thrown by Apache or Tomcat.
Even when they happen relatively quickly, it is some sort of response from
Apache or Tomcat that occurs when ERDDAP is very busy, memory-limited,
or limited by some other resource.
In these cases, see the advice below to deal with
ERDDAP responding slowly.
Requests for a long time range (>30 time points)
from a gridded dataset are prone to timeout failures,
which often appear as Proxy Errors,
because it takes significant time for ERDDAP to open all of the data files one-by-one.
If ERDDAP is otherwise busy during the request, the problem is more likely to occur.
If the dataset's files are compressed, the problem is more likely to occur,
although it's hard for a user to determine if a dataset's files are compressed.
The solution is to make several requests, each with a smaller time range.
How small of a time range?
I suggest starting really small (~30 time points?),
then (approximately) double the time range until the request fails,
then go back one doubling.
Then make all the requests (each for a different chunk of time)
needed to get all of the data.
An ERDDAP administrator can lessen this problem by increasing the
Apache timeout settings.
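The "start small, double until it fails, then go back one doubling" advice above can be sketched as a small shell function. The names largest_working_chunk and the probe command are made up for this illustration. A real probe would typically wrap one data request (for example, with curl and a timeout) for the given number of time points and return success only if the request completes.

```shell
#!/bin/sh
# Sketch of the doubling search described above. All names are hypothetical.
#
# $1 = a command that takes a number of time points and succeeds (exit 0)
#      if a request of that size completes
# $2 = starting size (the text suggests about 30)
# $3 = upper limit, so the search always stops
largest_working_chunk() {
    probe=$1
    size=$2
    limit=$3
    "$probe" "$size" || return 1     # even the smallest request fails
    # Keep doubling while the doubled size is allowed and still succeeds.
    while [ $((size * 2)) -le "$limit" ] && "$probe" $((size * 2)); do
        size=$((size * 2))
    done
    printf '%s\n' "$size"            # the last size that succeeded
}
```

Once you have a size that works, split the full time range into pieces of that size and request them one at a time, not in parallel.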
- Monitoring ERDDAP
We all want our data services to find their audience and be extensively used,
but sometimes your ERDDAP may be used too much, causing problems,
including super slow responses for all requests.
Our plan to avoid problems is:
- Monitor ERDDAP via the
status.html web page.
It has tons of useful information.
If you see that a huge number of requests are coming in,
or tons of memory being used, or tons of failed requests,
or each Major LoadDatasets is taking a long time,
or see any sign of things getting bogged down and responding slowly,
then look in ERDDAP's log.txt file
to see what's going on.
It's also useful to simply note how fast the status page responds.
If it responds slowly, that is an important indicator that
ERDDAP is very busy.
- Monitor ERDDAP via the Daily Report email.
- Watch for out-of-date datasets via the baseUrl/erddap/outOfDateDatasets.html web page
which is based on the optional
testOutOfDate global attribute.
- External Monitors
The methods listed above are ERDDAP's ways of monitoring itself.
It is also possible to make or use external systems to monitor your ERDDAP.
One project to do this is
Axiom's erddap-metrics project.
Such external systems have some advantages:
- They can be customized to provide the information you want, displayed in the way you want.
- They can include information about ERDDAP that ERDDAP can't access easily or at all
(for example, CPU usage, disk free space, ERDDAP response time as seen from the user's perspective,
and ERDDAP uptime).
- They can provide alerts (emails, phone calls, texts) to administrators when
problems exceed some threshold.
- Blacklist users making multiple simultaneous requests!
If it is clear that some user is making more than one simultaneous request,
repeatedly and continuously, then add their IP address to ERDDAP's
<requestBlacklist> in your datasets.xml file.
Sometimes the requests are all from one IP address.
Sometimes they are from multiple IP addresses, but clearly the same user.
You can also blacklist people making tons of invalid requests
or tons of mind-numbingly inefficient requests.
Then, for each request they make, ERDDAP returns:
HTTP ERROR 403 - Access Forbidden --
Your IP address is on this ERDDAP's request blacklist.
Did you often submit more than one request at a time?
Did you often submit identical requests in a short period of time?
Did you submit a large number of invalid requests?
If you are ready to avoid these problems, please email
[ERDDAP administrator's email address] to request to be taken off of the blacklist.
Hopefully the user will see this message and contact you
to find out how to fix the problem and get off the blacklist.
Sometimes, they just switch IP addresses and try again.
It is like the balance of power between offensive and defensive weapons in war.
Here, the defensive weapons (ERDDAP) have a fixed capacity, limited by
the number of cores in the CPU, the disk access bandwidth, and the
network bandwidth.
But the offensive weapons (users, notably scripts) have unlimited
capacity:
- A single request for data from a lot of time points may cause ERDDAP
to open a huge number of files (in sequence or partly multi-threaded).
In extreme cases, one "simple" request can easily tie up the RAID attached to ERDDAP for a
minute, effectively blocking the handling of other requests.
- A single request may consume a large chunk of memory (even though
ERDDAP is coded to minimize the memory needed to handle large requests).
- Parallelization -
It is easy for a clever user to parallelize a big task by generating
lots of threads, each of which submits a separate request (which may be large
or small). This behavior is encouraged by the computer science community
as an efficient way to deal with a large problem
(and parallelizing is efficient in other circumstances).
Going back to the war analogy: users can make an essentially unlimited
number of simultaneous requests with the cost of each being essentially zero,
but the cost of each request coming into ERDDAP can be large and ERDDAP's response
capability is finite. Clearly, ERDDAP will lose this battle,
unless the ERDDAP administrator blacklists users who are making
multiple simultaneous requests which are unfairly crowding out other users.
- Multiple Scripts -
Now think about what happens when there are several clever users each running
parallelized scripts. If one user can generate so many requests that
other users are crowded out, then multiple such users can generate so many
requests that ERDDAP becomes overwhelmed and seemingly unresponsive. It is effectively a
DDoS attack.
Again, the only defense for ERDDAP is to blacklist users making
multiple simultaneous requests which are unfairly crowding out other users.
- Inflated Expectations -
In this world of massive tech companies (Amazon, Google, Facebook, ...),
users have come to expect essentially unlimited capabilities
from the providers.
Since these companies are money making operations, the more users they have,
the more revenue they have to expand their IT infrastructure.
So they can afford a massive IT infrastructure
to handle requests. And they cleverly limit the number of requests
and cost of each request from users by limiting the kinds of requests that users can make so
that no single request is burdensome, and there is never a reason
(or a way) for users to make multiple simultaneous requests.
So these huge tech companies may have far more users than ERDDAP, but
they have massively more resources and clever ways to limit
the requests from each user. It's a manageable situation for
the big IT companies (and they get rich!) but not for ERDDAP installations.
Again, the only defense for ERDDAP is to blacklist users making
multiple simultaneous requests which are unfairly crowding out other users.
So users: Don't make multiple simultaneous requests or you will be blacklisted!
Clearly, it is best if your server has a lot of cores, a lot of memory
(so you can allocate a lot of memory to ERDDAP, more than it ever needs),
and a high bandwidth internet connection.
Then, memory is rarely or never a limiting factor,
but network bandwidth becomes the more common limiting factor.
Basically, as the number of simultaneous requests grows,
the speed to any given user decreases.
That naturally reduces the rate of incoming requests if each user
is just submitting one request at a time.
ERDDAP Getting Data From THREDDS
If your ERDDAP gets some of its data from a THREDDS at your site,
there are some advantages to making a copy of the THREDDS data files
(at least for the most popular datasets) on another RAID that ERDDAP has access
to so that ERDDAP can serve data from the files directly.
At ERD, we do that for our most popular datasets.
- ERDDAP can get the data directly and not have to wait for THREDDS to reload the dataset or ...
- ERDDAP can notice and incorporate new data files immediately, so it doesn't
have to pester THREDDS frequently to see if the dataset has changed.
See
<updateEveryNMillis>.
- The load is split between 2 RAIDs and 2 servers,
instead of the request being hard on both ERDDAP and THREDDS.
- You avoid the mismatch problem caused by THREDDS having a small (by default) maximum request size.
ERDDAP has a system to handle the mismatch, but avoiding the problem is better.
- You have a backup copy of the data, which is always a good idea.
In any case, don't ever run THREDDS and ERDDAP in the same Tomcat. Run them
in separate Tomcats, or better, on separate servers.
We find that THREDDS periodically gets in a state where requests just hang.
If your ERDDAP is getting data from a THREDDS and the THREDDS is in this state,
ERDDAP has a defense (it says the THREDDS-based dataset isn't available),
but it is still troublesome for ERDDAP because ERDDAP has to wait until the timeout
each time it tries to reload a dataset from a hung THREDDS.
Some groups (including ERD) avoid this by proactively restarting THREDDS frequently
(e.g., nightly in a cron job).
- If ERDDAP Is Responding Slowly
or if just certain requests are responding slowly,
you may be able to figure out if the slowness is reasonable
and temporary (e.g., because of lots of requests from scripts or WMS users),
or if something is inexplicably wrong and you need to
shut down and restart Tomcat and ERDDAP.
If ERDDAP is responding slowly, see the advice below to determine the
cause, which hopefully will enable you to fix the problem.
You may have a specific starting point (e.g., a specific request URL)
or a vague starting point (e.g., ERDDAP is slow).
You may know the user involved (e.g., because they emailed you), or not.
You may have other clues, or not.
Since all of these situations and all of the possible causes of the problems
blur together, the advice below tries to deal with all possible
starting points and all possible problems related to slow responses.
- How to Shut Down and Restart Tomcat and ERDDAP
You don't need to shut down and restart Tomcat and ERDDAP
if ERDDAP is temporarily slow, if it is slow for some known reason
(like lots of requests from scripts or WMS users),
or if you just want to apply changes to the datasets.xml file.
You do need to shut down and restart Tomcat and ERDDAP
if you need to apply changes to the setup.xml file,
or if ERDDAP freezes, hangs, or locks up.
In extreme circumstances, Java may freeze for a minute or two
while it does a full garbage collection, but then recover. So it is good to
wait a minute or two to see if Java/ERDDAP is really frozen or
if it is just doing a long garbage collection.
(If garbage collection is a common problem,
allocate more memory to Tomcat.)
I don't recommend using the Tomcat Web Application Manager to
start or shutdown Tomcat.
If you don't fully shutdown and startup Tomcat, sooner or later you will
have PermGen memory issues.
To shutdown and restart Tomcat and ERDDAP:
- If you use Linux or a Mac:
(If you have created a special user to run Tomcat, e.g., tomcat,
remember to do the following steps as that user.)
- Use cd tomcat/bin
- Use ps -ef | grep tomcat to find the java/tomcat processID
(hopefully, just one process will be listed),
which we'll call javaProcessID below.
- If ERDDAP is frozen/hung/locked up, use kill -3 javaProcessID
to tell Java (which is running Tomcat)
to do a thread dump to the Tomcat log file: tomcat/logs/catalina.out .
After you restart, you can diagnose the problem by finding the thread
dump information
(and any other useful information above it) in tomcat/logs/catalina.out
and also by reading relevant parts of the
ERDDAP log archive.
If you want, you can email that information to
erd dot data at noaa dot gov
so I can see what went wrong.
Or, you can join the ERDDAP Google Group / Mailing List
and post your question there.
- Use ./shutdown.sh
- Use ps -ef | grep tomcat repeatedly until the
java/tomcat process isn't listed.
Sometimes, the java/tomcat process will take up to two minutes to fully
shut down.
The reason is: ERDDAP sends a message to its background threads
to tell them to stop, but sometimes it takes these threads a long time
to get to a good stopping place.
- If after a minute or so, java/tomcat isn't stopping by itself,
you can use
kill -9 javaProcessID
to force the java/tomcat process to stop immediately.
If possible, use this only as a last resort.
The -9 switch is powerful, but it may cause various problems.
- To restart ERDDAP, use ./startup.sh
- View ERDDAP in your browser to check that the restart succeeded.
(Sometimes, you need to wait 30 seconds and try to load ERDDAP again
in your browser for it to succeed.)
- If you use Windows:
- Use cd tomcat\bin
- Use shutdown.bat
- You may want/need to use the Windows Task Manager (accessible via Ctrl Alt Del)
to ensure that the Java/Tomcat/ERDDAP process/application has fully stopped.
Sometimes, the process/application will take up to two minutes to shut down.
The reason is: ERDDAP sends a message to its background threads
to tell them to stop, but sometimes it takes these threads a long time
to get to a good stopping place.
- To restart ERDDAP, use startup.bat
- View ERDDAP in your browser to check that the restart succeeded.
(Sometimes, you need to wait 30 seconds and try to load ERDDAP again
in your browser for it to succeed.)
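The Linux/Mac steps above (shut down, wait for the process to stop, force it only as a last resort, then start up) can be sketched as follows. This is an illustration, not a supported script: wait_for_exit and restart_tomcat are made-up names, and the 120 and 10 second waits are assumptions based on the "up to two minutes" remark above. You still find the process ID yourself with ps -ef | grep tomcat.

```shell
#!/bin/sh
# Sketch only. Function names and timeouts are hypothetical.

# Wait up to $2 seconds for process $1 to exit.
# Returns 0 if it has exited, 1 if it is still running after the wait.
wait_for_exit() {
    pid=$1
    remaining=$2
    while kill -0 "$pid" 2>/dev/null; do   # kill -0 only tests that the process exists
        [ "$remaining" -le 0 ] && return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# $1 = Tomcat directory, $2 = java/tomcat process ID.
# If ERDDAP is hung, request a thread dump first:  kill -3 "$2"
restart_tomcat() {
    tomcat=$1
    tomcat_pid=$2
    "$tomcat/bin/shutdown.sh"
    if ! wait_for_exit "$tomcat_pid" 120; then
        kill -9 "$tomcat_pid"              # last resort, as the text warns
        wait_for_exit "$tomcat_pid" 10
    fi
    "$tomcat/bin/startup.sh"
}

# Example usage (not run here; path and process ID are placeholders):
#   restart_tomcat /usr/local/tomcat 12345
```

Run it as the user that owns the Tomcat process, or the kill commands will fail.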
- Frequent Crashes or Freezes
If ERDDAP becomes slow, crashes or freezes, something is wrong.
Look in ERDDAP's log file
to try to figure out the cause. If you can't,
please email the details to erd dot data at noaa dot gov .
The most common problem is a troublesome user who is
running several scripts at once and/or someone making a large number of invalid requests.
If this happens, you should probably blacklist that user.
When a blacklisted user makes a request, the error message in the response encourages
them to email you to work out the problems. Then, you can encourage them
to run just one script at a time and to fix the problems in their script
(e.g., requesting data from a remote dataset that can't respond before timing out).
See <requestBlacklist> in your datasets.xml file.
In extreme circumstances, Java may freeze for a minute or two
while it does a full garbage collection, but then recover. So it is good to
wait a minute or two to see if Java/ERDDAP is really frozen or
if it is just doing a long garbage collection.
(If garbage collection is a common problem,
allocate more memory to Tomcat.)
If ERDDAP becomes slow or freezes and the problem isn't a troublesome user
or a long garbage collection,
you can usually solve the problem by
restarting ERDDAP.
My experience is that ERDDAP can run for months without needing a restart.
- Monitor ERDDAP
You can monitor your ERDDAP's status by looking at the /erddap/status.html page,
notably the statistics in the top section.
If ERDDAP becomes slow or freezes and the problem isn't just extremely heavy usage,
you can usually solve the problem by
restarting ERDDAP.
My experience is that ERDDAP can run for months without needing a restart.
You should only need to restart it if you want
to apply some changes you made to ERDDAP's setup.xml
or when you need to install new versions
of ERDDAP, Java, Tomcat, or the operating system.
If you need to restart ERDDAP frequently, something is wrong.
Look in ERDDAP's log file
to try to figure out the cause. If you can't,
please email the details to erd dot data at noaa dot gov .
As a temporary solution, you might try using
Monit
to monitor your ERDDAP and restart it if needed.
Or, you could make a cron job to restart ERDDAP (proactively) periodically.
It may be a little challenging to write a script to automate monitoring and restarting ERDDAP.
Some tips that might help:
- You can simplify testing if the Tomcat process is still running
by using the -c switch with grep:
ps -u tomcatUser | grep -c java
That will reduce the output to "1" if the tomcat process is still alive,
or "0" if the process has stopped.
- If you are good with gawk, you can extract the processID from the results of
ps -u tomcatUser | grep java, and use the
processID in other lines of the script.
If you do set up Monit or a cron job, please email the details to
erd dot data at noaa dot gov .
Or, you can join the ERDDAP Google Group / Mailing List
and share the information there.
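As a starting point for such a script, the two tips above can be combined into a small helper that pulls the process ID out of the ps output. This is a sketch under assumptions: tomcat_java_pids is a made-up name, "tomcat" is a placeholder for the user that runs Tomcat, and the parsing assumes the Linux ps -u layout (PID TTY TIME CMD), which differs on some other systems.

```shell
#!/bin/sh
# Sketch based on the tips above. Names are hypothetical.

# Read `ps -u someUser` output on stdin and print the PID (first column)
# of every line whose command (last column) is java.
tomcat_java_pids() {
    awk '$NF == "java" { print $1 }'
}

# Example usage in a watchdog script (not run here; paths are placeholders):
#   pids=$(ps -u tomcat | tomcat_java_pids)
#   if [ -z "$pids" ]; then
#       /usr/local/tomcat/bin/startup.sh   # no java process found: start Tomcat
#   fi
```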
- PermGen
If you repeatedly use Tomcat Manager to Reload (or Stop and Start) ERDDAP,
ERDDAP may fail to start up and throw java.lang.OutOfMemoryError: PermGen.
The solution is to periodically (or every time?)
shut down and restart tomcat and ERDDAP,
instead of just reloading ERDDAP.
[Update: This problem was greatly minimized or fixed in ERDDAP version 1.24.]
- log.txt
If ERDDAP doesn't start up or if something isn't working as expected,
it is very useful to look at the error and diagnostic messages in the ERDDAP log file.
- The log file is bigParentDirectory/logs/log.txt
(bigParentDirectory is specified in setup.xml).
If there is no log.txt file or if the log.txt file hasn't been updated since
you restarted ERDDAP, look in the Tomcat Log Files
to see if there is an error message there.
- Types of diagnostic messages in the log file:
- The word "error" is used when something went so wrong that the procedure failed to complete.
Although it is annoying to get an error, the error forces you to deal with the problem.
Our thinking is that it is better to throw an error, than to have ERDDAP hobble along,
working in a way you didn't expect.
- The word "warning" is used when something went wrong, but the procedure was able to be completed.
These are pretty rare.
- Anything else is just an informative message.
You can control how much information is logged with
<logLevel>
in datasets.xml.
- Dataset reloads and user responses that take >10 seconds to finish (successfully or unsuccessfully) are
marked with "(>10s!)". Thus, you can search the log.txt file for this phrase to find
the datasets that were slow to reload or the request number of the requests that were slow to finish. You can then
look higher in the log.txt file to see what the dataset problem was or what the user request was and who it was from.
These slow dataset loads and user requests are sometimes taxing on ERDDAP. So knowing more about these
requests can help you identify and solve problems.
- Information is written to the log file on the disk drive in fairly big chunks.
The advantage is that this is very efficient -- ERDDAP will never block
waiting for information to be written to the log file.
The disadvantage is that the log will almost always end with a
partial message, which won't be completed until the next chunk is written.
You can make it up-to-date (for an instant) by viewing your ERDDAP's
status web page at https://your.domain.org/erddap/status.html (or http:// if https isn't enabled).
- When the log.txt file gets to 20 MB,
the file is renamed log.txt.previous and a new log.txt file is created.
So log files don't accumulate.
In setup.xml, you can specify a different maximum size for the log file, in megabytes.
The minimum allowed is 1 (MB). The maximum allowed is 2000 (MB).
The default is 20 (MB). For example:
<logMaxSizeMB>20</logMaxSizeMB>
- Whenever you restart ERDDAP,
ERDDAP makes an archive copy of the
log.txt and log.txt.previous files with a time stamp in the file's name.
If there was trouble before the restart,
it may be useful to analyze these archived files for clues as to what the trouble was.
You can delete the archive files if they are no longer needed.
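To find the slow dataset reloads and slow requests marked with "(>10s!)" as described above, a fixed-string grep is enough. The helper name slow_log_lines is made up for this sketch; pass it the path to your log.txt (bigParentDirectory/logs/log.txt).

```shell
#!/bin/sh
# Sketch: list the log.txt lines that ERDDAP marked as slow.
# slow_log_lines is a hypothetical helper name.
slow_log_lines() {
    # -F treats the pattern as a fixed string (no regex meaning for the parentheses),
    # -n prefixes each match with its line number in the file
    grep -n -F '(>10s!)' "$1"
}

# Example usage (not run here; the path is a placeholder):
#   slow_log_lines /data/erddapBPD/logs/log.txt
```

The line numbers make it easier to look higher in log.txt for the matching request or dataset. Since the log is written in chunks, viewing status.html first gives you the most recent entries.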
- Parsing log.txt
ERDDAP's log.txt file isn't designed for parsing (although you might be able to create
regular expressions that extract desired information).
It is designed to help a human figure out what is going wrong
when something is going wrong. When you submit a bug or problem report
to ERDDAP developers, when possible, please include all of the information from the log.txt
file related to the troublesome request.
For efficiency reasons, ERDDAP only writes information to log.txt
after a large chunk of information has accumulated.
So if you visit log.txt right after an error has occurred, information
related to the error may not yet have been written to log.txt.
In order to get perfectly up-to-date information from log.txt,
visit your ERDDAP's
status.html page.
When ERDDAP processes that request,
it flushes all pending information to log.txt.
For ERDDAP usage statistics, please use the
Apache and/or Tomcat log files
instead of ERDDAP's log.txt.
Note that ERDDAP's
status.html page (some) and
Daily Report (more)
have a large number of usage statistics precalculated for you.
- Tomcat Log Files
If ERDDAP doesn't start up because an error occurred very early in ERDDAP's startup,
the error message will show up in Tomcat's log files
(tomcat/logs/catalina.today.log or
tomcat/logs/catalina.out),
not in ERDDAP's log.txt file.
Usage Statistics:
For most of the information that people want to gather from a log file
(e.g., usage statistics), please use the Apache and/or Tomcat log files.
They are nicely formatted and have that type of information.
There are numerous tools for analyzing them, for example,
AWStats,
ElasticSearch's Kibana,
and
JMeter,
but search the web to find the right tool for your purposes.
Note that the log files only identify users as IP addresses.
There are websites to help you get information related to a given IP address, e.g.,
WhatIsMyIPAddress,
but you normally won't be able to find the name of the user.
Also, because of DHCP,
a given user's IP address may be different on different days,
or different users may have the same IP address at different times.
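If you just want a quick look at which IP addresses make the most requests, standard command-line tools will do. The sketch below assumes an access log whose first field on each line is the client address (as in the common and combined log formats); top_client_ips is a made-up name, and the log file path depends on how your Apache or Tomcat is configured.

```shell
#!/bin/sh
# Sketch: count requests per client address in an access log.
# Assumes the client address is the first whitespace-separated field on each line.
# $1 = access log file, $2 = how many addresses to show (default 10)
top_client_ips() {
    awk '{ print $1 }' "$1" | sort | uniq -c | sort -rn | head -n "${2:-10}"
}

# Example usage (not run here; the path is a placeholder):
#   top_client_ips /usr/local/tomcat/logs/localhost_access_log.txt 20
```

Each output line is a request count followed by the address, with the busiest address first.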
Alternatively, you can use something like
Google Analytics.
But beware: when you use external services like Google Analytics,
you are giving up your users' privacy
by giving Google full access to their activity on your site
which Google (and others?) can keep forever and use for any purpose
(perhaps not technically, but probably in practice).
Your users haven't consented to this and probably aren't aware
that they will be tracked on your website, just as they probably aren't aware
of the extent to which they are being tracked on almost all websites.
These days, many users are very concerned that everything they do on the web is being
monitored by these big companies (Google, Facebook, etc.) and by the government,
and find this an unwarranted intrusion into their lives (as in the book, 1984).
This has driven many users to install products like
Privacy Badger to minimize tracking,
to use alternative browsers like
Tor Browser (or turn off tracking in traditional browsers),
and to use alternative search engines like
DuckDuckGo.
If you use a service like Google Analytics,
please at least document its use and the consequences by changing the
<standardPrivacyPolicy> tag in ERDDAP's
[tomcat]/webapps/erddap/WEB-INF/classes/gov/noaa/pfel/erddap/util/messages.xml file.
- emailLogYEAR-MM-DD.txt
ERDDAP always writes the text of all outgoing email messages
in the current day's emailLogYEAR-MM-DD.txt file in bigParentDirectory/logs
(bigParentDirectory is specified in setup.xml).
- If the server can't send out email messages, or if you have configured ERDDAP not to send
out email messages, or if you are just curious, this file is a convenient way to see
all of the email messages that have been sent out.
- You can delete previous days' email log files if they are no longer needed.
- Daily Report
The Daily Report has lots of useful information -- all of the information from
your ERDDAP's /erddap/status.html page and more.
- It is the most complete summary of your ERDDAP's status.
- Among other statistics, it includes a list of datasets that didn't load and the exceptions
they generated.
- It is generated when you start up ERDDAP (just after ERDDAP finishes trying to load all of
the datasets) and generated soon after 7 am local time every morning.
- Whenever it is generated, it is written to ERDDAP's log.txt file.
- Whenever it is generated, it is emailed to <emailDailyReportsTo> and
<emailEverythingTo> (which are specified in
setup.xml)
provided you have set up the email system (in setup.xml).
- Status Page
You can view the status of your ERDDAP from any browser by going to
<baseUrl>/erddap/status.html
- This page is generated dynamically, so it always has up-to-the-moment statistics for your ERDDAP.
- It includes statistics regarding the number of requests, memory usage, thread stack traces,
the taskThread, etc.
- Because the Status Page can be viewed by anyone, it doesn't include quite as much information
as the Daily Report.
- Adding/Changing Datasets
ERDDAP usually rereads datasets.xml every loadDatasetsMinMinutes
(specified in setup.xml).
So you can make changes to datasets.xml any time, even while ERDDAP is running.
A new dataset will be detected soon, usually within
loadDatasetsMinMinutes.
A changed dataset will be reloaded when it is reloadEveryNMinutes
old (as specified in datasets.xml).
- A Flag File Tells ERDDAP to Try to Reload a Dataset As Soon As Possible
- ERDDAP won't notice any changes to a dataset's setup in datasets.xml until ERDDAP reloads the dataset.
- To tell ERDDAP to reload a dataset as soon as possible
(before the dataset's <reloadEveryNMinutes> would cause it to be reloaded),
put a file in bigParentDirectory/flag
(bigParentDirectory is specified in setup.xml) that has the
same name as the dataset's datasetID.
This tells ERDDAP to try to reload that dataset ASAP.
The old version of the dataset will remain available to users until
the new version is available and swapped atomically into place.
For EDDGridFromFiles and EDDTableFromFiles, the reloading dataset will look for
new or changed files, read those, and incorporate them into the dataset.
So the time to reload is dependent on the number of new or changed files.
If the dataset has active="false", ERDDAP will remove the dataset.
- One variant of the /flag directory
is the /badFilesFlag directory. (Added in ERDDAP v2.12.)
If you put a file in the bigParentDirectory/badFilesFlag directory
with a datasetID as the file name (the file contents don't matter),
then as soon as ERDDAP sees the badFilesFlag file, ERDDAP will:
- Delete the badFilesFlag file.
- Delete the badFiles.nc file (if there is one), which has the list of bad files for that dataset.
For datasets like EDDGridSideBySide that have childDatasets,
this also deletes the badFiles.nc file for all child datasets.
- Reload the dataset ASAP.
Thus, this causes ERDDAP to try again to work with the files previously (erroneously?) marked as bad.
- Another variant of the /flag directory
is the /hardFlag directory. (Added in ERDDAP v1.74.)
If you put a file in bigParentDirectory/hardFlag
with a datasetID as the file name (the file contents don't matter),
then as soon as ERDDAP sees the hardFlag file, ERDDAP will:
- Delete the hardFlag file.
- Remove the dataset from ERDDAP.
- Delete all of the information that ERDDAP has stored
about this dataset.
For EDDGridFromFiles and EDDTableFromFiles subclasses,
this deletes the internal database of data files and their contents.
For datasets like EDDGridSideBySide that have childDatasets,
this also deletes the internal database of data files and their contents
for all child datasets.
- Reload the dataset.
For EDDGridFromFiles and EDDTableFromFiles subclasses,
this causes ERDDAP to reread all of the data files.
Thus, the reload time is dependent on the total number of data files in the dataset.
Because the dataset was removed from ERDDAP when the hardFlag was noticed,
the dataset will be unavailable until the dataset finishes reloading.
Be patient. Look in the log.txt file if you want to see what's going on.
The hardFlag variant deletes the dataset's stored information
even if the dataset isn't currently loaded in ERDDAP.
HardFlags are very useful when you do something that causes a
change in how ERDDAP reads and interprets the source data, for example,
when you install a new version of ERDDAP or when you have made a change to a
dataset's definition in datasets.xml.
- The contents of the flag, badFilesFlag, and hardFlag files are irrelevant.
ERDDAP just looks at the file name to get the datasetID.
- In between major dataset reloads, ERDDAP looks continuously for flag, badFilesFlag, and hardFlag files.
- Note that when a dataset is reloaded, all files in the
bigParentDirectory/cache/datasetID
directory are deleted. This includes .nc and image files that are normally cached
for ~15 minutes.
- Note that if the dataset's xml includes
active="false",
a flag will cause the dataset to be made inactive
(if it is active), and in any case, not reloaded.
- Any time ERDDAP runs LoadDatasets to do a major reload (the timed reload
controlled by <loadDatasetsMinMinutes>) or a minor reload
(as a result of an external or internal flag),
ERDDAP reads all <decompressedCacheMaxGB>, <decompressedCacheMaxMinutesOld>,
<user>, <requestBlacklist>, <slowDownTroubleMillis>,
and <subscriptionEmailBlacklist> tags and switches to the new settings.
So you can use a flag as a way to get ERDDAP to notice changes
to those tags ASAP.
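The three flag directories described above can be driven from a script. The following is a minimal sketch, assuming the script runs on the ERDDAP server with write access to bigParentDirectory. The helper name set_flag is hypothetical (it is not part of ERDDAP); the directory names flag, badFilesFlag, and hardFlag come from the text above.

```python
from pathlib import Path

# Flag directory names described above; each lives directly under bigParentDirectory.
FLAG_KINDS = ("flag", "badFilesFlag", "hardFlag")

def set_flag(big_parent_directory, dataset_id, kind="flag"):
    """Create an empty flag file named after the datasetID (hypothetical helper)."""
    if kind not in FLAG_KINDS:
        raise ValueError(f"unknown flag kind: {kind}")
    flag_dir = Path(big_parent_directory) / kind
    if not flag_dir.is_dir():
        # Do not create the directory here; ERDDAP is expected to own it.
        raise FileNotFoundError(f"{flag_dir} does not exist")
    flag_file = flag_dir / dataset_id
    flag_file.touch()  # the contents are irrelevant; only the file name matters
    return flag_file
```

For example, set_flag(bigParentDirectory, "rPmelTao", kind="hardFlag") would request a full rebuild of that dataset's stored information, so use the hardFlag variant sparingly on datasets with many files.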
ERDDAP has a web service so that flags can be set via URLs.
- For example,
https://coastwatch.pfeg.noaa.gov/erddap/setDatasetFlag.txt?datasetID=rPmelTao&flagKey=123456789
(that's a fake flagKey) will set a flag for the rPmelTao dataset.
- There is a different flagKey for each datasetID.
- Administrators can see a list of flag URLs for all datasets by looking at the bottom of
their Daily Report email.
- Administrators should treat these URLs as confidential, since they give someone the right to
reset a dataset at will.
- If you think the flagKeys have fallen into the hands of someone who is abusing them,
you can change <flagKeyKey> in setup.xml and
restart ERDDAP
to force ERDDAP to generate and use a different set of flagKeys.
- If you change <flagKeyKey>, delete all of the old subscriptions (see
the list in your Daily Report) and remember to send the new URLs to the
people who you do want to have them.
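A flag URL has the shape shown in the example above, so it can be assembled in a script. This is a small sketch; the function name flag_url is hypothetical, and 123456789 is the fake flagKey from the example, not a real key.

```python
from urllib.parse import urlencode

def flag_url(base_url, dataset_id, flag_key):
    """Build a setDatasetFlag URL in the form shown above (hypothetical helper)."""
    query = urlencode({"datasetID": dataset_id, "flagKey": flag_key})
    return f"{base_url.rstrip('/')}/erddap/setDatasetFlag.txt?{query}"

# 123456789 is the fake flagKey from the example above.
print(flag_url("https://coastwatch.pfeg.noaa.gov", "rPmelTao", "123456789"))
```

A script that has just added data files could then request this URL (for instance with urllib.request.urlopen) to ask ERDDAP to reload the dataset. Since the real flagKeys are confidential, keep them out of shared scripts and version control.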
The flag system can serve as the basis for a more efficient mechanism for telling ERDDAP when to
reload a dataset. For example, you could set a dataset's <reloadEveryNMinutes>
to a large number (e.g., 10080 = 1 week).
Then, when you know the dataset has changed (perhaps because you added a file to the dataset's data
directory), set a flag so that the dataset is reloaded as soon as possible.
Flags are usually seen quickly. But if the LoadDatasets thread is already
busy, it may be a while before it is available to act on the flag.
But the flag system is much more responsive and much more efficient than setting
<reloadEveryNMinutes> to a small number.
- Force Dataset Removal
If a dataset is active in ERDDAP and you want to deactivate it temporarily or permanently:
- In datasets.xml for the dataset, set active="false" in the dataset tag.
- Wait for ERDDAP to remove the dataset during the next major reload
or set a flag
for the dataset to tell ERDDAP to notice this change as soon as possible.
When you do this, ERDDAP doesn't throw out any information it may have
stored about the dataset and certainly doesn't do anything to the actual data.
- Then you can leave the active="false" dataset in datasets.xml
or remove it.
- When Are Datasets Reloaded?
A thread called RunLoadDatasets is the master thread that controls
when datasets are reloaded. RunLoadDatasets loops forever:
- RunLoadDatasets notes the current time.
- RunLoadDatasets starts a LoadDatasets thread to do a "majorLoad".
You can see information about the current/previous majorLoad at the top of
your ERDDAP's
/erddap/status.html page (for example,
status page example).
- LoadDatasets makes a copy of datasets.xml.
- LoadDatasets reads through the copy of datasets.xml and, for each dataset,
sees if the dataset needs to be (re)loaded or removed.
- If a flag file exists for this dataset,
the file is deleted and
the dataset is removed if active="false"
or (re)loaded if active="true" (regardless of the dataset's age).
- If the dataset's datasets.xml chunk has active="false"
and the dataset is currently loaded (active), it is unloaded (removed).
- If the dataset has active="true" and the dataset isn't already loaded,
it is loaded.
- If the dataset has active="true" and the dataset is already loaded,
the dataset is reloaded if the dataset's age (time since last load) is greater than
its <reloadEveryNMinutes> (default = 10080 minutes),
otherwise, the dataset is left alone.
- LoadDatasets finishes.
The RunLoadDatasets thread waits for the LoadDatasets thread to finish.
If LoadDatasets takes longer than loadDatasetsMinMinutes
(as specified in setup.xml), RunLoadDatasets interrupts the LoadDatasets thread.
Ideally, LoadDatasets notices the interrupt and finishes.
But if it doesn't notice the interrupt within a minute,
RunLoadDatasets calls loadDatasets.stop(), which is undesirable.
- While the time since the start of the last majorLoad is less than
loadDatasetsMinMinutes (as specified in setup.xml, e.g., 15 minutes),
RunLoadDatasets repeatedly looks for flag files
in the bigParentDirectory/flag directory. If one or more flag files are
found, they are deleted, and RunLoadDatasets starts a LoadDatasets thread to do a
"minorLoad" (majorLoad=false). You can't see minorLoad information on
your ERDDAP's /erddap/status.html page.
- LoadDatasets makes a copy of datasets.xml.
- LoadDatasets reads through the copy of datasets.xml and,
for each dataset for which there was a flag file:
- If the dataset's datasets.xml chunk has active="false"
and the dataset is currently loaded (active), it is unloaded (removed).
- If the dataset has active="true", the dataset is (re)loaded,
regardless of its age.
Non-flagged datasets are ignored.
- LoadDatasets finishes.
- RunLoadDatasets goes back to step 1.
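The per-dataset decisions in a majorLoad, as listed above, can be summarized as a small decision function. This is a simplified model written for illustration, not ERDDAP's actual Java code; the function name and the returned labels are hypothetical.

```python
def major_load_action(active, loaded, has_flag, age_minutes,
                      reload_every_n_minutes=10080):
    """Return what a majorLoad does with one dataset, per the rules listed above."""
    if not active:
        # active="false": unload if currently loaded; never (re)load
        return "remove" if loaded else "skip"
    if has_flag:
        # a flag forces a (re)load regardless of the dataset's age
        return "reload" if loaded else "load"
    if not loaded:
        return "load"
    if age_minutes > reload_every_n_minutes:
        return "reload"
    return "skip"
```

A minorLoad applies the same rules, but only to datasets that had a flag file; all other datasets are ignored.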
Notes:
- Startup
When you restart ERDDAP, every dataset with active="true" is loaded.
- Cache
When a dataset is (re)loaded, its cache (including any data response files
and/or image files) is emptied.
- Lots of Datasets
If you have a lot of datasets and/or one or more datasets are slow to (re)load,
a LoadDatasets thread may take a long time to finish its work,
perhaps even longer than loadDatasetsMinMinutes.
- One LoadDatasets Thread
There is never more than one LoadDatasets thread running at once.
If a flag is set when LoadDatasets is already running,
the flag probably won't be noticed or acted on until that LoadDatasets thread finishes running.
You might say: "That's stupid. Why don't you just start a bunch of new threads
to load datasets?" But if you have lots of datasets which get data from
one remote server, even one LoadDatasets thread will put substantial stress
on the remote server. The same is true if you have lots of datasets which get data from
files on one RAID. There are rapidly diminishing returns from having
more than one LoadDatasets thread.
- Flag = ASAP
Setting a flag just signals that the dataset should be (re)loaded
as soon as possible, not necessarily immediately.
If no LoadDatasets thread is currently running, the dataset will start to be reloaded
within a few seconds.
But if a LoadDatasets thread is currently running, the dataset probably won't be reloaded
until after that LoadDatasets thread is finished.
- Flag File Deleted
In general, if you put a flag file in the bigParentDirectory/flag
directory (by visiting the dataset's flagUrl or putting an actual file there), the
dataset will usually be reloaded very soon after that flag file is deleted.
- Flag versus Small reloadEveryNMinutes
If you have some external way of knowing when a dataset needs to be reloaded
and if it is convenient for you,
the best way to make sure that a dataset is always up-to-date is to
set its reloadEveryNMinutes to a large number (10080?) and
set a flag (via a script?) whenever it needs to be reloaded.
That is the system that EDDGridFromErddap and EDDTableFromErddap use
to receive messages that the dataset needs to be reloaded.
- Look in log.txt
Lots of relevant information is written to the
bigParentDirectory/logs/log.txt file.
If things aren't working as you expect, looking at log.txt
lets you diagnose the problem by finding out exactly what ERDDAP did.
- Search for "majorLoad=true" for the start of major LoadDatasets threads.
- Search for "majorLoad=false" for the start of minor LoadDatasets threads.
- Search for a given dataset's datasetID for information about it being
(re)loaded or queried.
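The log.txt searches suggested above can be scripted. The sketch below only looks for the literal search strings named in the text; it makes no assumption about the rest of the log line format. The helper name find_log_lines is hypothetical.

```python
def find_log_lines(lines, dataset_id=None):
    """Return line numbers matching the search strings suggested above."""
    found = {"major": [], "minor": [], "dataset": []}
    for number, line in enumerate(lines, start=1):
        if "majorLoad=true" in line:
            found["major"].append(number)
        if "majorLoad=false" in line:
            found["minor"].append(number)
        if dataset_id and dataset_id in line:
            found["dataset"].append(number)
    return found
```

To use it, open bigParentDirectory/logs/log.txt (for example with open(path, errors="replace")) and pass the file object and a datasetID; the returned line numbers tell you where to start reading.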
- Cached Responses
In general, ERDDAP doesn't cache (store) responses to user requests.
The rationale was that most requests would be slightly different so the cache wouldn't be very effective.
The biggest exceptions are requests for image files
(which are cached since browsers and programs like Google Earth often re-request images)
and requests for .nc files (because they can't be created on-the-fly).
ERDDAP stores each dataset's cached files in a different directory:
bigParentDirectory/cache/datasetID
since a single cache directory might have a huge number of files which might become slow to access.
Files are removed from the cache for one of three reasons:
- All files in this cache are deleted when ERDDAP is restarted.
- Periodically, any file more than <cacheMinutes> old (as specified in
setup.xml) will be deleted.
Removing files in the cache based on age (not Least-Recently-Used) ensures that files won't stay
in the cache very long.
Although it might seem like a given request should always return the same response, that isn't true.
For example, a tabledap request which includes &time>someTime will change if
new data arrives for the dataset.
And a griddap request which includes [last] for the time dimension will change if new
data arrives for the dataset.
- Images showing error conditions are cached, but only for a few minutes (it's a difficult situation).
- Every time a dataset is reloaded, all files in that dataset's cache are deleted.
Because requests may be for the "last" index in a gridded dataset, files in the cache may become
invalid when a dataset is reloaded.
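ERDDAP performs the age-based cleanup itself, so administrators don't need to do it by hand. Purely to illustrate the rule ("older than <cacheMinutes>", not least-recently-used), here is a read-only sketch that lists which files in one dataset's cache directory would qualify. The function name is hypothetical and this is not ERDDAP's own cleanup code.

```python
import time
from pathlib import Path

def expired_cache_files(cache_dir, cache_minutes, now=None):
    """List files in one dataset's cache directory older than cache_minutes."""
    now = time.time() if now is None else now
    cutoff = now - cache_minutes * 60
    return sorted(
        p for p in Path(cache_dir).iterdir()
        if p.is_file() and p.stat().st_mtime < cutoff  # age is based on modification time
    )
```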
- Stored Dataset Information
For all types of datasets, ERDDAP gathers lots of information when a dataset
is loaded and keeps that
in memory. This allows ERDDAP to respond very quickly to searches, requests
for lists of datasets,
and requests for information about a dataset.
For a few types of datasets (notably EDDGridCopy, EDDTableCopy,
EDDGridFromXxxFiles, and
EDDTableFromXxxFiles), ERDDAP stores on disk some information about the
dataset that is reused
when the dataset is reloaded. This greatly speeds the reloading process.
- Some of the dataset information files are human-readable .json files and are stored in
bigParentDirectory/dataset/last2LettersOfDatasetID/datasetID .
- ERDDAP only deletes these files in unusual situations, e.g., if you add
or delete a variable from the dataset's datasets.xml chunk.
- Most changes to a dataset's datasets.xml chunk (e.g., changing a
global attribute or a variable attribute) don't necessitate that you delete these files.
A regular dataset reload will handle these types of changes.
You can tell ERDDAP to reload a dataset ASAP by setting a
flag for the dataset.
- Similarly, the addition, deletion, or change of data files will be handled when
ERDDAP reloads a dataset.
But ERDDAP will notice this type of change soon and automatically
if the dataset is using the
<updateEveryNMillis>
system.
- It should only rarely be necessary for you to delete these files.
The most common situation where you need to force ERDDAP to delete the stored
information (because it is out-of-date/incorrect and won't be automatically
fixed by ERDDAP)
is when you make changes
to the dataset's datasets.xml chunk that affect how ERDDAP interprets
data in the source data files, for example, changing the time variable's format string.
- To delete a dataset's stored information files from an ERDDAP that is running
(even if the dataset isn't currently loaded), set a
hardFlag for that dataset.
Remember that if a dataset is an aggregation of a large number of files,
reloading the dataset may take considerable time.
- To delete a dataset's stored information files when ERDDAP isn't running, run
DasDds for that dataset
(which is easier than figuring out which directory the info is located in and
deleting the files by hand).
Remember that if a dataset is an aggregation of a large number of files,
reloading the dataset may take considerable time.
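If you do need to locate a dataset's stored information by hand, the directory layout given earlier in this section (bigParentDirectory/dataset/last2LettersOfDatasetID/datasetID) can be computed as follows. The function name and the /data/erddap path are hypothetical placeholders.

```python
import posixpath

def dataset_info_dir(big_parent_directory, dataset_id):
    """Return bigParentDirectory/dataset/last2LettersOfDatasetID/datasetID."""
    return posixpath.join(big_parent_directory, "dataset", dataset_id[-2:], dataset_id)

# "/data/erddap" stands in for your own bigParentDirectory.
print(dataset_info_dir("/data/erddap", "rPmelTao"))
```

Setting a hardFlag or running DasDds remains the safer route, because ERDDAP then deletes the right files itself.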
- Memory Status
ERDDAP shouldn't ever crash or freeze up. If it does, one of the
most likely causes is insufficient memory. You can monitor memory usage
by looking at the status.html web page, which includes a line like
0 gc calls, 0 requests shed, and 0 dangerousMemoryEmails since last major LoadDatasets
(those are progressively more serious events)
and MB inUse and gc Calls columns in the table of statistics.
You can tell how memory-stressed your ERDDAP is by watching these numbers.
Higher numbers indicate more stress.
- MB inUse should always be less than half of the
-Xmx memory setting.
Larger numbers are a bad sign.
- gc calls indicates the number of times ERDDAP called the
garbage collector to try to alleviate high memory usage.
If this gets to be >100, that's a sign of serious trouble.
- shed indicates the number of incoming requests that
were shed (with HTTP error number 503, Service Unavailable) because
memory use was already too high. Ideally, no requests should be shed.
It's okay if a few requests are shed, but a sign of serious trouble if many are shed.
- dangerousMemoryEmails -
If memory use becomes dangerously high, ERDDAP sends an email to the email addresses listed in
<emailEverythingTo> (in setup.xml) with a list of the active user requests.
As the email says,
please forward these emails to Chris.John at noaa.gov so we can use the information
to improve future versions of ERDDAP.
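The rules of thumb above can be turned into a small check. The sketch below parses the status line quoted above and applies the thresholds from the text. The function names are hypothetical, and mb_in_use and xmx_mb are values you would read from the statistics table and your Tomcat configuration yourself.

```python
import re

STATUS_RE = re.compile(
    r"(\d+) gc calls, (\d+) requests shed, and (\d+) dangerousMemoryEmails"
)

def parse_memory_line(line):
    """Extract the three counters from the status.html line quoted above."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    gc_calls, shed, emails = (int(g) for g in match.groups())
    return {"gc_calls": gc_calls, "requests_shed": shed,
            "dangerous_memory_emails": emails}

def memory_warnings(counters, mb_in_use, xmx_mb):
    """Apply the rules of thumb above; the thresholds come from the text."""
    warnings = []
    if mb_in_use >= xmx_mb / 2:
        warnings.append("MB inUse is not below half of -Xmx")
    if counters["gc_calls"] > 100:
        warnings.append("more than 100 gc calls")
    if counters["requests_shed"] > 0:
        warnings.append("requests were shed (a few is tolerable)")
    if counters["dangerous_memory_emails"] > 0:
        warnings.append("dangerousMemoryEmails were sent")
    return warnings
```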
If your ERDDAP is memory-stressed:
- Consider allocating more of your server's memory to ERDDAP by changing
the Tomcat -Xmx memory setting.
- If you've already allocated as much memory as you can to ERDDAP via -Xmx,
consider buying more memory for your server. Memory is cheap (compared
to the price of a new server or your time)! Then increase -Xmx.
- In datasets.xml, set <nGridThreads> to 1,
set <nTableThreads> to 1, and
set <ipAddressMaxRequestsActive> to 1.
- Look at the requests in log.txt for inefficient or troublesome (but legitimate) requests.
Add their IP addresses to <requestBlacklist> in datasets.xml.
The blacklist error message includes the ERDDAP administrator's email address
with the hope that those users will contact you so that you can work with
them to use ERDDAP more efficiently.
It's good to keep a list of IP addresses you blacklist and why,
so that you can work with the users if they contact you.
- Look at the requests in log.txt for requests from malicious users.
Add their IP addresses to <requestBlacklist> in datasets.xml.
If similar requests are coming from multiple similar IP addresses,
you can use some who-is services (e.g.,
https://www.whois.com/whois/)
to find out the range of IP addresses from that source and blacklist the
entire range. See the
<requestBlacklist> documentation.
- OutOfMemoryError
When you set up ERDDAP, you specify the maximum amount of memory that Java
can use via the -Xmx setting.
If ERDDAP ever needs more memory than that,
it will throw a java.lang.OutOfMemoryError. ERDDAP does a lot of checking to
enable it to handle that error gracefully
(e.g., so a troublesome request will fail, but the system retains its integrity).
But sometimes, the error damages system integrity and you have to restart ERDDAP.
Hopefully, that is rare.
The quick and easy solution to an OutOfMemoryError is to increase the
-Xmx setting,
but you shouldn't ever increase the -Xmx setting to more than 80% of the
physical memory in the server (e.g., for a 10GB server, don't set -Xmx above
8GB). Memory is relatively cheap, so it may be a good option to increase
the memory in the server. But if you have maxed out the memory in the server
or for other reasons can't increase it, you need to deal more directly with the
cause of the OutOfMemoryError.
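The 80% ceiling described above is simple arithmetic. As a sketch, the helper below (a hypothetical name) computes the largest -Xmx value that respects it, given the server's physical memory in MB, and formats it in the JVM's -Xmx<size>m form.

```python
def max_xmx_option(physical_memory_mb):
    """Largest -Xmx that stays within 80% of physical memory, per the text above."""
    limit_mb = physical_memory_mb * 80 // 100  # integer MB, rounded down
    return f"-Xmx{limit_mb}m"

# A 10GB server (10240 MB) gives a ceiling of 8GB (8192 MB).
print(max_xmx_option(10240))
```

This is an upper bound, not a recommendation; the right value also depends on what else runs on the server.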
If you look in the log.txt
file to see what ERDDAP was doing when the error arose,
you can usually get a good clue as to the cause of the OutOfMemoryError.
There are lots of possible causes, including:
- A single huge data file can cause the OutOfMemoryError, notably, huge ASCII
data files.
If this is the problem, it should be obvious because ERDDAP will fail to load the dataset
(for tabular datasets) or read data from that file (for gridded datasets).
The solution, if feasible, is to split the file into multiple files.
Ideally, you can split the file into logical chunks.
For example, if the file has 20 months' worth of data, split it into 20 files,
each with 1 month's worth of data.
But there are advantages even if the main file is split up arbitrarily.
This approach has multiple benefits:
a) This will reduce the memory needed to read the data files to 1/20th,
because only one file is read at a time.
b) Often, ERDDAP can deal with requests much faster because it only has to look
in one or a few files to find the data for a given request.
c) If data collection is ongoing, then the existing 20 files can remain unchanged,
and you only need to modify one, small, new file to add the next month's worth of data to the dataset.
- A single huge request can cause the OutOfMemoryError.
In particular, some of the orderBy options have the entire response in memory
for a second (e.g., to do a sort). If the response is huge, it can lead to the error.
There will always be some requests which are, in various ways, too big.
You can solve the problem by increasing the -Xmx setting.
Or, you can encourage the user to make a series of smaller requests.
- It is unlikely that a large number of files would cause the file index that
ERDDAP creates to be so large that that file would cause the error.
If we assume that each file uses 300 bytes, then 1,000,000 files would only take up 300MB.
But datasets with a huge number of data files cause other problems for ERDDAP,
notably, it takes a long time for ERDDAP to open all those data files
when responding to a user request for data.
In this case, the solution may be to aggregate the files so that there are fewer data files.
For tabular datasets, it is often great if you save the data from the current dataset in
CF Discrete Sampling Geometries (DSG)
Contiguous Ragged Array data files (request .ncCF files from ERDDAP) and then make a new dataset.
These files can be handled very efficiently with ERDDAP's
EDDTableFromNcCFFiles.
If they are logically organized (each with data for a chunk of space and time),
ERDDAP can extract data from them very quickly.
- For tabular datasets that use the
<subsetVariables>
attribute, ERDDAP makes a table of unique combinations of the values of
those variables.
For huge datasets or when <subsetVariables> is misconfigured,
this table can be large enough to cause OutOfMemoryErrors.
The solution is to remove variables from the list of <subsetVariables>
for which there are a large number of values, or remove variables
as needed until the size of that table is reasonable.
The parts of ERDDAP that use the subsetVariables system don't work well
(e.g., web pages load very slowly) when there are more than 100,000 rows in that table.
- It's always possible that several simultaneous large requests
(on a really busy ERDDAP) can combine to cause memory trouble.
For example, 8 requests, each using 1GB, would cause problems for an
-Xmx setting of 8GB. But it is rare that each request would be at the peak of its
memory use simultaneously.
And you would easily be able to see that your ERDDAP is really busy with big requests.
But, it's possible.
It's hard to deal with this problem other than by increasing the -Xmx setting.
- There are other scenarios.
If you look at the
log.txt
file to see what ERDDAP was doing when the error arose,
you can usually get a good clue as to the cause.
In most cases, there is a way to minimize that problem (see above),
but sometimes you just need more memory and a higher -Xmx setting.
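Two of the estimates used in the list above are easy to reproduce for your own numbers. This sketch restates that arithmetic; the function names are hypothetical, and the 300 bytes per file figure is the assumption made in the text, not a measured value.

```python
def file_index_mb(n_files, bytes_per_file=300):
    """Rough size of the file index, using the 300 bytes/file assumption above."""
    return n_files * bytes_per_file / 1_000_000

def peak_request_gb(n_requests, gb_per_request):
    """Worst case: every request is at its peak memory use at the same moment."""
    return n_requests * gb_per_request

print(file_index_mb(1_000_000))  # 300.0 (MB), matching the text
print(peak_request_gb(8, 1))     # 8 (GB), enough to exhaust an 8GB -Xmx
```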
- Too Many Open Files
Starting with ERDDAP v2.12,
ERDDAP has a system to monitor the number of open files (which includes sockets and some other things, not just files)
in Tomcat on Linux computers.
If some files mistakenly never get closed (a "resource leak"), the number of open files may increase until
it exceeds the maximum allowed by the operating system and numerous really bad things happen.
So now, on Linux computers (because the information isn't available for Windows):
- There is an "Open Files" column on the far right of the status.html web page
showing the percent of max files open. On Windows, it just shows "?".
- When ERDDAP generates that information at the end of each major dataset reload,
it will print to the log.txt file:
openFileCount=current of max=max %=percent
- If the percentage is >50%, an email is sent to the ERDDAP administrator
and the emailEverythingTo email addresses.
If the percentage is 100%, ERDDAP is in terrible trouble. Don't let this happen.
If the percentage is >75%, ERDDAP is close to terrible trouble. That's not okay.
If the percentage is >50%, it is very possible that a spike will cause the percentage to hit 100.
If the percentage is ever >50%, you should:
- Increase the maximum number of open files allowed by either:
- Making these changes each time before you start Tomcat (put them in the Tomcat startup.sh file?):
ulimit -Hn 16384
ulimit -Sn 16384
- Or making a permanent change by editing (as root) /etc/security/limits.conf and adding the lines:
tomcat soft nofile 16384
tomcat hard nofile 16384
Those commands assume that the user running Tomcat is called "tomcat".
On many Linux variants, you have to restart the server to apply those changes.
For both options, the "16384" above is an example. You choose the number that you think is best.
- Restart ERDDAP. The operating system will close any open files.
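The thresholds above can be checked from the log line. The sketch below assumes that ERDDAP fills the "current" and "max" placeholders in the openFileCount line with plain integers; that detail, the function name, and the level labels are assumptions, so adjust the pattern to what your log.txt actually contains.

```python
import re

# Assumes integers in place of "current" and "max" in the line shown above.
OPEN_FILES_RE = re.compile(r"openFileCount=(\d+) of max=(\d+)")

def open_file_status(log_line):
    """Return (percent, level) for an openFileCount log line, or None if no match."""
    match = OPEN_FILES_RE.search(log_line)
    if match is None:
        return None
    current, maximum = int(match.group(1)), int(match.group(2))
    if maximum == 0:
        return None
    percent = 100 * current / maximum
    if percent >= 100:
        level = "terrible trouble"
    elif percent > 75:
        level = "close to terrible trouble"
    elif percent > 50:
        level = "act now"
    else:
        level = "ok"
    return percent, level
```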
- Unusual Activity: >25% of requests failed
As part of every major LoadDatasets, which is usually every 15 minutes,
ERDDAP looks at the percentage of requests which failed since the last major LoadDatasets.
If it is >25%, ERDDAP sends an email to the ERDDAP administrator with the
subject "Unusual Activity: >25% of requests failed".
That email includes a tally near the bottom entitled "Requester's IP Address (Failed) (since last Major LoadDatasets)".
Search for that. It tells you the IP address of the computers making the most failed requests.
You can then search for those IP addresses in the [bigParentDirectory]/logs/log.txt file
and see what type of requests they are making.
You can use the user's IP number (for example, with
https://whatismyipaddress.com/ip-lookup)
to try to figure out who or what the user is.
Sometimes that will tell you pretty accurately who the user is (e.g., it's a search engine's web crawler).
Most of the time it just gives you a clue
(e.g., it's an amazonaws computer, it's from some university,
it's someone in some specific city).
By looking at the actual request, the IP number, and the error message
(all from log.txt) for
a series of errors, you can usually figure out basically what is going wrong.
In my experience, there are four common causes of lots of failed requests:
1) The requests are malicious (e.g., looking for security weaknesses,
or making requests and then cancelling them before they are completed).
You should use <requestBlacklist> in datasets.xml to blacklist those IP addresses.
2) A search engine is naively trying the URLs listed in ERDDAP web pages and
ISO 19115 documents. For example, there are many places which list the base OPeNDAP URL, for example,
https://coastwatch.pfeg.noaa.gov/erddap/griddap/jplMURSST, to which the user is
supposed to add a file type (e.g., .das, .dds, .html).
But the search engine doesn't know this. And the request to the base URL fails.
A related situation is when the search engine generates bizarre requests or
tries to fill out forms in order to get to "hidden" web pages.
But the search engines often do a bad job of this, leading to failures.
The solution is: create a robots.txt file.
3) Some user is running a script that is repeatedly asking for something that isn't there.
Maybe it is a dataset that used to exist, but is gone now (temporarily or permanently).
Scripts often don't expect this and so don't deal with it intelligently.
So the script just keeps making requests and the requests keep failing.
If you can guess who the user is (from the IP number above), contact them
and tell them the dataset is no longer available and ask them to change their script.
4) Something is really wrong with some dataset. Usually, ERDDAP will
make the troubled dataset inactive. Sometimes it doesn't, so all the requests
to it just lead to errors. If so, fix the problem with the dataset or (if you can't)
set the dataset to
active="false".
Of course, this may lead to problem #3.
Sometimes the errors aren't so bad, notably, if ERDDAP can detect the error
and respond very quickly (<=1ms). So you may decide to take no action.
If all else fails, there is a universal solution:
add the user's IP number to the <requestBlacklist>.
This isn't as bad or as drastic an option as it might seem.
The user will then get an error message saying they have been blacklisted and
telling them your (the ERDDAP administrator's) email address.
Sometimes the user will contact you and you can resolve the problem.
Sometimes the user doesn't contact you and you will see the exact same behavior
coming from a different IP number the next day.
Blacklist the new IP number and hope that they will eventually get the message.
(Or this is your Groundhog Day, from which you will never escape. Sorry.)
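The 25% rule and the per-IP tally described above can be reproduced on your own data. The sketch below works on (ip_address, succeeded) pairs that you have already extracted from log.txt; that input shape and the function name are hypothetical, since the log line format isn't described here.

```python
from collections import Counter

def failure_report(requests, threshold_percent=25):
    """requests: iterable of (ip_address, succeeded) pairs parsed from log.txt."""
    requests = list(requests)
    failed = Counter(ip for ip, ok in requests if not ok)
    n_failed = sum(failed.values())
    percent = 100 * n_failed / len(requests) if requests else 0.0
    return {
        "percent_failed": percent,
        "unusual": percent > threshold_percent,
        "top_failed_ips": failed.most_common(3),  # candidates to investigate first
    }
```

The top_failed_ips entries are the addresses to look up and search for in log.txt before deciding whether a <requestBlacklist> entry is warranted.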
- robots.txt
The search engine companies use web crawlers (e.g., GoogleBot) to examine all of
the pages on the web to add the content to the search engines.
For ERDDAP, that is basically good. ERDDAP has lots of links between pages, so the crawlers
will find all of the web pages and add them to the search engines.
Then, users of the search engines will be able to find datasets on your ERDDAP.
Unfortunately, some web crawlers (e.g., GoogleBot) are now filling out and submitting
forms in order to find additional content. For web commerce sites, this is great.
But this is terrible for ERDDAP because it just leads to an infinite number of
undesirable and pointless attempts to crawl the actual data.
This can lead to more requests for data than from all other users combined.
And it fills the search engine with goofy, pointless subsets of the actual data.
To tell the web crawlers to stop filling out forms and, more generally, to stop
looking at web pages they don't need to look at, you need to create a text file called
robots.txt
in the root directory of your website's document hierarchy so that it can be viewed by
anyone as, e.g., http://www.your.domain/robots.txt .
If you are creating a new robots.txt file, this is a good start:
User-Agent: *
Disallow: /erddap/files/
Disallow: /files/
Disallow: /images/
Disallow: /*?
Disallow: /*?*
Disallow: /*.asc*
Disallow: /*.csv*
Disallow: /*.dods*
Disallow: /*.esriAscii*
Disallow: /*.esriCsv*
Disallow: /*.geoJson*
Disallow: /*.htmlTable*
Disallow: /*.json*
Disallow: /*.mat*
Disallow: /*.nc*
Disallow: /*.odvTxt*
Disallow: /*.tsv*
Disallow: /*.xhtml*
Disallow: /*.geotif*
Disallow: /*.itx*
Disallow: /*.kml*
Disallow: /*.pdf*
Disallow: /*.png*
Disallow: /*.large*
Disallow: /*.small*
Disallow: /*.transparentPng*
Sitemap: http://your.institutions.url/erddap/sitemap.xml
(But replace your.institutions.url with your ERDDAP's base URL.)
It may take a few days for the search engines to notice and for the changes to take effect.
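If you manage several servers, the starter robots.txt above can be generated rather than copied by hand. This sketch reproduces the rules listed above for a given base URL; the function name is hypothetical, and the file type list is taken from the example, so extend it if your ERDDAP offers other file types.

```python
# File types from the starter robots.txt shown above.
FILE_TYPES = [
    ".asc", ".csv", ".dods", ".esriAscii", ".esriCsv", ".geoJson", ".htmlTable",
    ".json", ".mat", ".nc", ".odvTxt", ".tsv", ".xhtml", ".geotif", ".itx",
    ".kml", ".pdf", ".png", ".large", ".small", ".transparentPng",
]

def robots_txt(base_url):
    """Return the starter robots.txt text with the Sitemap line filled in."""
    lines = [
        "User-Agent: *",
        "Disallow: /erddap/files/",
        "Disallow: /files/",
        "Disallow: /images/",
        "Disallow: /*?",
        "Disallow: /*?*",
    ]
    lines += [f"Disallow: /*{ext}*" for ext in FILE_TYPES]
    lines.append(f"Sitemap: {base_url.rstrip('/')}/erddap/sitemap.xml")
    return "\n".join(lines) + "\n"
```

Write the result to robots.txt in the root directory of your website's document hierarchy, as described above.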
- sitemap.xml
As the
https://www.sitemaps.org website says:
Sitemaps are an easy way for webmasters to inform search engines about pages on their
sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists
URLs for a site along with additional metadata about each URL (when it was last updated, how
often it usually changes, and how important it is, relative to other URLs on the site) so that
search engines can more intelligently crawl the site.
Web crawlers usually discover pages from links within the site and from other sites. Sitemaps
supplement this data to allow crawlers that support Sitemaps to pick up all URLs in the Sitemap
and learn about those URLs using the associated metadata. Using the Sitemap protocol does not
guarantee that web pages are included in search engines, but provides hints for web crawlers
to do a better job of crawling your site.
Actually, since ERDDAP is RESTful, search engine spiders can easily crawl your ERDDAP.
But they tend to do it more often (daily!) than necessary (monthly?).
- Given that each search engine may be crawling your entire ERDDAP every day,
this can lead to a lot of unnecessary requests.
- So ERDDAP generates a sitemap.xml file for your ERDDAP which tells search engines
that your ERDDAP only needs to be crawled every month.
- You should add a reference to ERDDAP's sitemap.xml to your
robots.txt file:
Sitemap: http://www.yoursite.org/erddap/sitemap.xml
- If that doesn't seem to be getting the message to the crawlers, you can tell the various search
engines about the sitemap.xml file by visiting these URLs (but change
www.yoursite.org to your ERDDAP's URL):
- https://www.bing.com/webmaster/ping.aspx?siteMap=http://www.yoursite.org/erddap/sitemap.xml
- https://www.google.com/ping?sitemap=http://www.yoursite.org/erddap/sitemap.xml
(I think) you just need to ping each search engine once, for all time.
The search engines will then detect changes to sitemap.xml periodically.
- Data Dissemination / Data Distribution Networks: Push and Pull Technology
- Normally, ERDDAP acts as an intermediary: it takes a request from a user;
gets data from a remote data source; reformats the data; and sends it to the user.
- Pull Technology:
ERDDAP also has the ability to actively get all of the available data
from a remote data source and
store
a local copy of the data.
- Push Technology:
By using ERDDAP's
subscription services,
other data servers can be notified as soon
as new data is available so that they can request the data (by pulling the data).
- ERDDAP's
EDDGridFromErddap and
EDDTableFromErddap use ERDDAP's subscription services and
flag system so that they are notified immediately when new data is available.
- You can combine these to great effect: if you wrap an EDDGridCopy around an EDDGridFromErddap
dataset (or wrap an EDDTableCopy around an EDDTableFromErddap dataset),
ERDDAP will automatically create and maintain a local copy of another ERDDAP's dataset.
- Because the subscription services work as soon as new data is available,
push technology disseminates data very quickly (within seconds).
This architecture puts each ERDDAP administrator in charge of determining where the data
for their ERDDAP comes from.
- Other ERDDAP administrators can do the same. There is no need for coordination between
administrators.
- If many ERDDAP administrators link to each other's ERDDAPs, a data distribution network is formed.
- Data will be quickly, efficiently, and automatically disseminated from data sources
(ERDDAPs and other servers) to data redistribution sites (ERDDAPs) anywhere in the network.
- A given ERDDAP can be both a source of data for some datasets and a redistribution site for other
datasets.
- The resulting network is roughly similar to data distribution networks set up with programs like
Unidata's IDD/LDM, but less rigidly structured.
- Security, Authentication, and Authorization
By default, ERDDAP
runs as an entirely public server (using http and/or https)
with no login
(authentication)
system and no restrictions to data access
(authorization).
If you want to restrict access to some or all datasets to some users, you can use ERDDAP's
built-in security system. When the security system is in use:
- ERDDAP uses
role-based access control.
- The ERDDAP administrator defines users with the
<user>
tag in datasets.xml.
Each user has a username, a password (if authentication=custom),
and one or more roles.
- The ERDDAP administrator defines which roles have access to a given dataset via the
<accessibleTo>
tag in datasets.xml for any dataset that shouldn't have public access.
- The user's login status (and a link to log in/out) will be shown at the top
of every web page.
(But a logged-in user will appear to ERDDAP to be not logged in if they use an http URL.)
- If the <baseUrl> that you specify in your setup.xml is an http URL,
users who are not logged in may use ERDDAP's http URLs.
If <baseHttpsUrl> is also specified, users who are not logged in
can also use https URLs.
- HTTPS Only --
If the <baseUrl> that you specify
in your setup.xml is an https URL,
users who are not logged in are encouraged (not forced) to use ERDDAP's
https URLs --
all of the links on ERDDAP web pages will refer to https URLs.
If you want to force users to use https URLs, add a Redirect permanent
line inside the <VirtualHost *:80> section in
your Apache's config file (usually httpd.conf), e.g.,
<VirtualHost *:80>
[...]
ServerName example.com
Redirect permanent / https://example.com/
</VirtualHost>
If you want, there is an additional method to force the use of https:
HTTP Strict Transport Security (HSTS). To use it:
- Enable the Apache Headers Module: a2enmod headers
- Add the additional header to the HTTPS VirtualHost directive.
Max-age is measured in seconds and can be set to some long value.
<VirtualHost *:443>
[...]
# Guarantee HTTPS for 1 year, including subdomains
Header always set Strict-Transport-Security "max-age=31536000; includeSubDomains"
</VirtualHost>
Please note that this header is only valid on an HTTPS VirtualHost.
A reason not to force users to use https URLs is:
the underlying SSL/TLS link takes time to establish and
then takes time to encrypt and decrypt all information transmitted between
the user and the server. But some institutions require https only.
- Users who are logged in MUST use ERDDAP's https URLs.
If they use http URLs, they appear to ERDDAP to be not logged in.
This ensures the privacy of the communications and
helps prevent
session hijacking and sidejacking.
- Anyone who isn't logged in can access and use the public datasets.
By default, private datasets don't appear in lists of datasets if a user isn't logged in.
If the administrator has set setup.xml's <listPrivateDatasets> to true,
they will appear.
Attempts to request data from private datasets (if the user knows the URL) will be redirected
to the login page.
- Anyone who is logged in will be able to see and request data from any public dataset
and any private dataset to which their role allows them access.
By default, private datasets to which a user doesn't have access don't appear in lists of datasets.
If the administrator has set setup.xml's <listPrivateDatasets> to true,
they will appear.
Attempts to request data from private datasets to which the user doesn't have access will be
redirected to the login page.
- The RSS information for fully private datasets is only available to users
(and RSS readers) who are logged in and authorized to use that dataset.
This makes RSS not very useful for fully private datasets.
If a dataset is private but its
<graphsAccessibleTo>
is set to
public, the dataset's RSS is accessible to anyone.
- Email subscriptions can only be set up when a user has access to a dataset.
If a user subscribes to a private dataset, the subscription
continues to function after the user has logged out.
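The access rules in the list above can be summarized in a few lines of code. This is an illustrative sketch only, not ERDDAP's implementation: the function names and the role names are hypothetical, and a public dataset is represented by None (no <accessibleTo> tag).

```python
def can_access(accessible_to, user_roles, logged_in):
    """Return True if a request may get data from a dataset.

    accessible_to: None for a public dataset, else the set of roles
                   listed in the dataset's <accessibleTo> tag.
    user_roles:    the set of roles from the user's <user> tag.
    logged_in:     True only for a logged-in user making an https request.
    """
    if accessible_to is None:
        return True  # public datasets are open to everyone
    if not logged_in:
        return False  # private datasets require a login
    # Access is granted if the user holds at least one allowed role.
    return bool(set(accessible_to) & set(user_roles))


def appears_in_list(accessible_to, user_roles, logged_in,
                    list_private_datasets=False):
    """Return True if the dataset shows up in lists of datasets."""
    # With <listPrivateDatasets> set to true, private datasets are listed
    # even for users who cannot request data from them.
    return list_private_datasets or can_access(accessible_to, user_roles, logged_in)


if __name__ == "__main__":
    print(can_access({"researcher"}, {"researcher"}, logged_in=True))   # allowed role
    print(can_access({"researcher"}, {"researcher"}, logged_in=False))  # not logged in
```

Note that listing and access are separate questions: a dataset can be visible in a list while requests for its data are still redirected to the login page.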
To set up the security/authentication/authorization system:
- Do the standard ERDDAP initial setup.
- In setup.xml,
- Add/change the <authentication> value from nothing to
custom (don't use this), email (don't use this), google (recommended),
orcid (recommended), or oauth2 (which is google+orcid, recommended).
See the comments about these options below.
- Add/change the <baseHttpsUrl> value.
- Insert/uncomment &loginInfo; in <startBodyHtml> to display
the user's log in/out info
at the top of each web page.
- For testing purposes on your personal computer,
follow these instructions to configure tomcat to support SSL
(the basis for https connections) by creating a keystore with a
self-signed certificate
and by modifying tomcat/conf/server.xml to uncomment the connector for port 8443.
On Windows, you may need to move .keystore from "c:\Users\you\.keystore" to
"c:\Users\Default User\.keystore" or "c:\.keystore"
(see tomcat/logs/catalina.today.log if the application doesn't
load or users can't see the log in page).
You can see when the .keystore certificate will expire by examining the
certificate when you log in.
For a publicly accessible server, instead of using a self-signed certificate,
it is strongly recommended that you buy and install a certificate signed by a
certificate authority,
because it gives your clients more assurance that they are indeed connecting to your
ERDDAP, not a man-in-the-middle's version of your ERDDAP.
Many vendors sell digital certificates. (Search the web.) They are not expensive.
- On Linux computers, if Tomcat is running behind Apache, modify the
/etc/httpd/conf.d/ssl.conf
file to allow HTTPS traffic to/from ERDDAP without requiring the :8443 port number
in the URL:
- Modify the existing <VirtualHost> tag (if there is one),
or add one at the end of the file so that it at least has these lines:
<VirtualHost _default_:443>
SSLEngine on
SSLProxyEngine On
ProxyPass /erddap https://localhost:8443/erddap
ProxyPassReverse /erddap https://localhost:8443/erddap
</VirtualHost>
- Then restart Apache: /usr/sbin/apachectl -k graceful (but sometimes it is in a different directory).
- In tomcat/conf/server.xml, uncomment the port=8443 <Connector> tag:
<Connector port="8443"
protocol="org.apache.coyote.http11.Http11NioProtocol"
maxThreads="150" SSLEnabled="true">
<SSLHostConfig>
<Certificate certificateKeystoreFile="conf/localhost-rsa.jks"
type="RSA" />
</SSLHostConfig>
</Connector>
and change the location of the certificateKeystoreFile.
- In datasets.xml, create a
<user>
tag for each user with username, password (if authentication=custom),
and roles information.
This is the authorization part of ERDDAP's security system.
- In datasets.xml, add an
<accessibleTo>
tag to each dataset that shouldn't have public access.
<accessibleTo> lets you specify which roles have access to that dataset.
- Restart Tomcat.
Trouble? Check the Tomcat logs.
- CHECK YOUR WORK! Any mistake could lead to a security flaw.
- Check that the login page uses https (not http).
Attempts to login via http should be automatically redirected to
https and port 8443 (although the port number may be hidden via an Apache proxy).
You may need to work with your network administrator to allow external web
requests to access port 8443 on your server.
- You can change the <user> and <accessibleTo>
tags at any time.
The changes will be applied at the next regular reload of any dataset,
or ASAP if you use a flag.
Authentication (logging in)
If you don't want to allow users to log in, don't specify a value for <authentication> in setup.xml.
If you do want to allow users to log in, you must specify a value for
<authentication>. Currently, ERDDAP supports
custom (don't use this),
email (don't use this),
google (recommended),
orcid (recommended), and
oauth2 (recommended) for the authentication method.
If you want to enable logging in,
we strongly recommend the google, orcid, or oauth2 options because they
free you from storing and handling users' passwords (needed for custom)
and are more secure than the email option.
Remember that users often use the same password at different sites.
So they may be using the same password for your ERDDAP as they do at their bank.
That makes their password very valuable -- much more valuable to the user than
just the data they are requesting.
So you need to do as much as you can to keep the passwords private.
That is a big responsibility.
The email, google, orcid, and oauth2 options take care of passwords,
so you don't have to gather, store, or work with them.
So you are freed from that responsibility.
All <authentication> options use a
cookie
on the user's computer, so the user's browser must be
set to allow cookies. If a user is making ERDDAP requests from a computer
program (not a browser), cookies and authentication are hard to work with.
That's a common problem with all authentication systems. Sorry.
The details of the <authentication> options are:
- custom
custom is ERDDAP's custom system for letting users log in by
entering their User Name and Password in a form on a web page.
If a user tries and fails to log in 3 times within 10 minutes,
the user is blocked from trying to log in for 10 minutes.
This prevents hackers from simply trying millions
of passwords until they find the right one.
This is somewhat secure because the User Name and Password are transmitted via https
(not http),
but authentication=google, orcid, or oauth2 are better because they free you from
having to handle passwords.
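To make the lockout rule concrete (3 failed logins within 10 minutes leads to a 10 minute block), here is a minimal sketch of that kind of throttle. It is not ERDDAP's code: the class name LoginThrottle, tracking by user name, and the exact boundary handling are all assumptions made for illustration.

```python
from collections import defaultdict, deque

MAX_FAILURES = 3          # failed attempts allowed within the window
WINDOW_SECONDS = 10 * 60  # look-back window for counting failures
BLOCK_SECONDS = 10 * 60   # how long a user is blocked afterwards


class LoginThrottle:
    """Hypothetical sketch of a failed-login throttle keyed by user name."""

    def __init__(self):
        self._failures = defaultdict(deque)  # user name -> times of recent failures
        self._blocked_until = {}             # user name -> time the block ends

    def is_blocked(self, username, now):
        return now < self._blocked_until.get(username, 0.0)

    def record_failure(self, username, now):
        recent = self._failures[username]
        recent.append(now)
        # Forget failures that are older than the window.
        while recent and now - recent[0] > WINDOW_SECONDS:
            recent.popleft()
        if len(recent) >= MAX_FAILURES:
            self._blocked_until[username] = now + BLOCK_SECONDS
            recent.clear()


if __name__ == "__main__":
    throttle = LoginThrottle()
    for t in (0.0, 60.0, 120.0):  # three failures within two minutes
        throttle.record_failure("jsmith", t)
    print(throttle.is_blocked("jsmith", 121.0))
```

Times are passed in explicitly (in seconds) so the logic can be tested without waiting; a real server would use the current clock time.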
The custom approach requires you to collect a user's Name and
a hash digest of their Password
(use your phone! email isn't secure!) and store them in datasets.xml in
<user> tags.
With the custom option, no one can log in until you
(the ERDDAP administrator) create a <user> tag for the user,
specifying the user's name as the username,
the hash digest of their password as the password, and their roles.
Not Recommended
Because of the awkwardness of generating and transmitting the hash digest
of the user's password and because of the risks associated with
ERDDAP holding the hash digests of the passwords,
this option is not recommended.
To increase the security of this option:
- You MUST make sure that other users on the server (i.e., Linux users, not ERDDAP users)
can't read files in the Tomcat directory (especially the datasets.xml file!)
or ERDDAP's bigParentDirectory.
On Linux, as user=tomcat, use:
chmod -R g-rwx bigParentDirectory
chmod -R o-rwx bigParentDirectory
chmod -R g-rwx tomcatDirectory
chmod -R o-rwx tomcatDirectory
- Use UEPSHA256 for <passwordEncoding>
in setup.xml.
- Use an as-secure-as-possible method to pass the hash digest
of the user's password from the user to the ERDDAP administrator (phone?).
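For the custom option, the user has to produce a hash digest and the administrator has to put it in a <user> tag. The sketch below shows one way to do both. It rests on an assumption that is not stated in this section: that UEPSHA256 means the SHA-256 digest, as lowercase hex, of the text username:ERDDAP:password encoded as UTF-8. Please check that against the <passwordEncoding> documentation for your ERDDAP version before relying on it. The function names, user name, and role names are hypothetical.

```python
import hashlib
from xml.sax.saxutils import quoteattr


def uepsha256_digest(username: str, password: str) -> str:
    """Digest a user would compute locally and pass to the administrator.

    ASSUMPTION: UEPSHA256 = SHA-256 of "username:ERDDAP:password" (UTF-8),
    written as lowercase hexadecimal.
    """
    text = f"{username}:ERDDAP:{password}"
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def user_tag(username: str, digest: str, roles) -> str:
    """Build the <user> tag the administrator adds to datasets.xml."""
    return (
        f"<user username={quoteattr(username)} "
        f"password={quoteattr(digest)} "
        f"roles={quoteattr(', '.join(roles))} />"
    )


if __name__ == "__main__":
    digest = uepsha256_digest("jsmith", "correct horse battery staple")
    print(user_tag("jsmith", digest, ["researcher"]))
```

With this split, the plain-text password never leaves the user's computer; only the digest is passed to the administrator.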
- email
The email authentication option uses a user's email account
to authenticate the user (by sending them an email with a special link
that they have to access in order to log in).
Unlike other emails that ERDDAP sends, ERDDAP does not write these
invitation emails to the email log file because they contain confidential
information.
In theory, this is not very secure, because emails aren't always encrypted,
so a bad guy with the ability to intercept emails could abuse this system
by using a valid user's email address and intercepting the invitation email.
In practice, if you set up ERDDAP to use a Google email account to send emails,
and if you set it up to use one of the TLS options for the connection,
and if the user has a Google email account, this is somewhat secure because
the emails are encrypted all the way from ERDDAP to the user.
To increase the security of this option:
- Make sure that other users on the server (i.e., Linux users, not ERDDAP users)
can't read files in the Tomcat directory or ERDDAP's bigParentDirectory.
On Linux, as user=tomcat, use:
chmod -R g-rwx bigParentDirectory
chmod -R o-rwx bigParentDirectory
chmod -R g-rwx tomcatDirectory
chmod -R o-rwx tomcatDirectory
- Set things up to get end-to-end security for the emails
sent from ERDDAP to the users.
For example, you could make a Google-centric system by
only creating <user> tags for Google-managed email addresses
and by setting up your ERDDAP to use a Google email server
via a secure/TLS connection: in your setup.xml, use e.g.,
<emailSmtpHost>smtp.gmail.com</emailSmtpHost>
<emailSmtpPort>587</emailSmtpPort>
<emailProperties>mail.smtp.starttls.enable|true</emailProperties>
Not Recommended
The email authentication option isn't recommended.
Please use the google, orcid, or oauth2 option instead.
As with the google, orcid, and oauth2 options,
email is very convenient for ERDDAP administrators --
you don't ever have to deal with passwords or their hash digests.
All you need to create a
<user>
tag for a user in datasets.xml is the user's email address,
which ERDDAP uses as the user's name.
(The password attribute isn't used when authentication=email, google, orcid, or oauth2.)
With the email option, only users that have a <user> tag
in datasets.xml can try to log in to ERDDAP by providing their
email address and clicking on the link in the email that ERDDAP sends them.
ERDDAP treats email addresses as case-insensitive. It does this by
converting email addresses you enter (in <user> tags)
or users enter (on the login form) to their all lowercase version.
To set up authentication=email:
- In your setup.xml, change the <baseHttpsUrl> tag's value.
For experimenting/working on your personal computer, use
https://localhost:8443
For your public ERDDAP, use
https://your.domain.org:8443
or without the :8443 if you are using an Apache
proxypass so that the port number isn't needed.
- In your setup.xml, change the <authentication> tag's value to email:
<authentication>email</authentication>
- In your setup.xml, make sure the email system is set up via all of the
<email...> tags, so that ERDDAP can send out emails.
If possible, set this up to use a secure connection (SSL / TLS) to the email server.
- In your datasets.xml, create
<user>
tags for each user who will have access to private datasets.
Use the user's email address as the username in the tag.
Don't specify the password attribute in the user tag.
- Restart ERDDAP so that the changes to setup.xml and datasets.xml take effect.
- google,
orcid, and
oauth2 (recommended)
All three of these options are the recommended ERDDAP authentication options.
They are all the most secure options.
The other options have significantly weaker security.
- The google authentication option uses
Google Sign-In,
which is an implementation of the
OAuth 2.0 authentication protocol.
ERDDAP users sign into their Google email account,
including Google-managed accounts such as @noaa.gov accounts.
This allows ERDDAP to verify the user's identity (name and email address)
and access their profile image,
but does not give ERDDAP access to their emails, their Google Drive,
or any other private information.
- The orcid authentication option uses
Orcid authentication,
which is an implementation of the
OAuth 2.0 authentication protocol.
ERDDAP users sign into their
Orcid account,
which is commonly used by researchers to identify themselves.
This allows ERDDAP to verify the user's Orcid identity and get their Orcid account number,
but does not give ERDDAP access to their other Orcid account information.
- The oauth2 option lets users sign in with either their Google account
or their Orcid account.
The google, orcid, and oauth2 options are the successors
to the openid option,
which was discontinued after ERDDAP version 1.68,
and which was based on a version of openID that is now out-of-date.
Please switch to the google, orcid, or oauth2 option.
These options are very convenient for ERDDAP administrators --
you don't ever have to deal with passwords or their hash digests.
All you need to create is a
<user>
tag for a user in datasets.xml which specifies the user's Google email address
or Orcid account number as the username attribute.
(The password attribute isn't used when
authentication=email, google, orcid or oauth2.)
With these options, anyone can log in to ERDDAP by
signing into their Google email account or Orcid account,
but no one will have the right to access private datasets
until you (the ERDDAP administrator) create a <user> tag,
specifying their Google email address or Orcid account number
as the username, and specifying their roles.
ERDDAP treats email addresses as case-insensitive. It does this by
converting email addresses you enter (in <user> tags)
or users enter (on the login form) to their all lowercase version.
To set up google, orcid, or oauth2 authentication:
- In your setup.xml, change the <baseHttpsUrl> tag's value.
For experimenting/working on your personal computer, use
https://localhost:8443
For your public ERDDAP, use
https://your.domain.org:8443
or, better, without the :8443 if you are using an Apache
proxypass so that the port number isn't needed.
- In your setup.xml, change the <authentication> tag's value to
google, orcid, or oauth2, for example:
<authentication>oauth2</authentication>
- For the google and oauth2 options:
Follow the instructions below to set up Google authentication for your ERDDAP.
- If you don't have a Google email account,
create one
- Follow
these instructions
to create a Google Developers Console project and get a client ID.
When the Google form asks for authorized JavaScript origins,
enter the value from <baseHttpsUrl>
from your personal computer's ERDDAP setup.xml, e.g.,
https://localhost:8443
On a second line, add the <baseHttpsUrl>
from your public ERDDAP setup.xml, e.g.,
https://your.domain.org:8443
Don't specify any Authorized redirect URIs.
When you see your Client ID for this project,
copy and paste it into your setup.xml
(usually just below <authentication> to be orderly,
but placement doesn't actually matter),
in the <googleClientID> tag, e.g.,
<googleClientID>yourClientID</googleClientID>
The client ID will be a string of about 75 characters,
probably starting with several digits and ending with
.apps.googleusercontent.com .
- In your datasets.xml, create a
<user>
tag for each user who will have access to private datasets.
For the username attribute in the tag:
- For users who will sign in with google, use the user's Google email address.
- For users who will sign in with orcid, use the user's Orcid account number (with dashes).
Don't specify the password attribute for the user tag.
- Restart ERDDAP so that the changes to setup.xml and datasets.xml take effect.
- For the orcid and oauth2 options:
Follow the instructions below to set up Orcid authentication for your ERDDAP.
(For details, see
Orcid's authentication API documentation.)
- If you don't have an Orcid account,
create one
- Log into Orcid
https://orcid.org/signin
using your personal Orcid account.
- Click on "Developer Tools" (under "For Researchers" at the top).
- Click on "Register for the free ORCID public API". Enter this information:
Name: ERDDAP at [your organization]
Website: [your ERDDAP's domain]
Description: ERDDAP is a scientific data server. Users need to authenticate with Google or Orcid to access non-public datasets.
Redirect URIs: [your ERDDAP's domain]/erddap/loginOrcid.html
- Click on the Save icon (it looks like a 3.5" disk!).
You can then see your ORCID APP Client ID and ORCID Client Secret.
- Copy and paste the ORCID APP Client ID (which will start with "APP-")
into setup.xml in the <orcidClientID> tag, e.g.,
<orcidClientID>APP-ALPHANUMERICCHARACTERS</orcidClientID>
- Copy and paste the ORCID Client Secret (lowercase alpha-numeric characters
with dashes) into setup.xml in the <orcidClientSecret> tag, e.g.,
<orcidClientSecret>alpha-numeric-characters-with-dashes</orcidClientSecret>
- In your datasets.xml, create a
<user>
tag for each user who will have access to private datasets.
For the username attribute in the tag:
- For users who will sign in with google, use the user's Google email address.
- For users who will sign in with orcid, use the user's Orcid account number (with dashes).
Don't specify the password attribute for the user tag.
- Restart ERDDAP so that the changes to setup.xml and datasets.xml take effect.
Log In Either Way
If you use the google, orcid, or oauth2 authentication options,
and Google Sign-In or Orcid's Authentication API suddenly
ceases to work (for whatever reason) or ceases to work as ERDDAP expects,
users won't be able to log in to your ERDDAP.
As a temporary (or permanent) solution, you can ask users to sign up with the other system
(get a Google email account, or get an Orcid account). To do this:
- Change the <authentication> tag so that it allows the
other authentication system.
The oauth2 option allows users to log in with either system.
- Duplicate each of the <user> tags and
change the username attribute from the
Google email address to the corresponding Orcid account number
(or vice-versa), but keep the roles attribute the same.
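If you have many users, duplicating the <user> tags by hand is tedious. The following is a minimal sketch of how that step might be scripted. It is not an ERDDAP tool: the function name is hypothetical, it assumes you have copied just the <user> tags out of datasets.xml into a string, and it assumes you have collected each user's other identifier yourself (the mapping cannot be derived automatically). The Orcid number shown is Orcid's published example iD, not a real user's.

```python
import xml.etree.ElementTree as ET


def duplicate_user_tags(users_xml: str, other_ids: dict) -> str:
    """Return new <user> tags that reuse each user's roles under another username.

    users_xml: text containing only <user ... /> tags.
    other_ids: maps an existing username (lowercase) to the replacement
               username, e.g. Google email address -> Orcid account number.
    """
    root = ET.fromstring(f"<root>{users_xml}</root>")
    new_tags = []
    for user in root.findall("user"):
        other = other_ids.get(user.get("username", "").lower())
        if other is None:
            continue  # no known counterpart for this user; skip it
        # Same roles, different username, and no password attribute.
        copy = ET.Element("user", {"username": other,
                                   "roles": user.get("roles", "")})
        new_tags.append(ET.tostring(copy, encoding="unicode"))
    return "\n".join(new_tags)


if __name__ == "__main__":
    existing = '<user username="jane@example.org" roles="researcher" />'
    mapping = {"jane@example.org": "0000-0002-1825-0097"}
    print(duplicate_user_tags(existing, mapping))
```

Paste the generated tags into datasets.xml next to the originals, so each user can log in with either system.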
ERDDAP no longer supports the openid authentication option,
which was based on a version of openID that is now out-of-date.
Please use the google, orcid, or oauth2 options instead.
ERDDAP doesn't support BASIC authentication because:
- BASIC seems geared toward predefined web pages needing secure access or
blanket on/off access
to the whole site, but ERDDAP allows (restricted access) datasets to be added
on-the-fly.
- BASIC authentication doesn't offer a way for users to log out!
- BASIC authentication is known to be not secure.
Secure Data Sources
If a dataset is to have restricted access for ERDDAP users,
the data source (from where ERDDAP gets the data) should not be publicly accessible.
So how can ERDDAP get the data for restricted access datasets? Some options are:
- ERDDAP can serve data from local files (for example, via EDDTableFromFiles or EDDGridFromFiles).
- ERDDAP can be in a
DMZ
and the data source (e.g., an OPeNDAP server or a database) can be
behind a firewall,
where it is accessible to ERDDAP but not to the public.
- The data source can be on a public website, but require a login to get the data.
The two types of dataset that ERDDAP can log on to access are
EDDTableFromDatabase and
EDDTableFromCassandra.
These datasets support (and should always use) user names
(create an ERDDAP user who only has read-only privileges),
passwords, SSL connections, and other security measures.
But in general, ERDDAP currently can't work with other kinds of data sources that require a login,
because it has no provisions for logging on to them.
This is the reason why access to
EDDGridFromErddap and EDDTableFromErddap datasets
can't be restricted.
Currently, the local ERDDAP has no way to login and access the metadata information
from the remote ERDDAP.
And putting the "remote" ERDDAP behind your firewall and removing that dataset's
accessibleTo restrictions doesn't solve the problem:
since user requests for EDDXxxFromErddap data need to be redirected to the
remote ERDDAP, the remote ERDDAP must be accessible.
Defenses Against Hackers
There are bad guy hackers who
try to exploit security weaknesses in server software like ERDDAP.
ERDDAP follows the common security advice to have several layers of defenses:
- Restricted Privileges -- One of the most important defenses is to
run Tomcat via a user called tomcat that doesn't have a password (so no one
can log in as that user) and has limited file system privileges (e.g.,
read-only access to the data).
See ERDDAP's instructions for setting up tomcat.
- Heavy Use -
In general, ERDDAP is built for heavy use, including by scripts which make
tens of thousands of requests, one after another.
It is hard for ERDDAP to simultaneously open itself up to heavy legitimate use and
shield itself from abuse.
It is sometimes hard to differentiate heavy legitimate use, excessive legitimate use,
and illegitimate use (and sometimes it is really easy).
Among other defenses, ERDDAP consciously does not allow a single request to
use an inordinate fraction of the system's resources (unless the system is otherwise
not active).
- Identify Troublesome Users -
If ERDDAP is slowing down or freezing
(perhaps because a naive user or a bot is running multiple
scripts to submit multiple requests simultaneously
or perhaps because of a bad guy's
Denial-of-service
attack), you can look at the
Daily Report email
(and more frequent identical information in the
ERDDAP log file)
which displays the number of requests made by the most active users
(see "Requester's IP Address (Allowed)").
ERDDAP also sends emails to the administrator whenever there is
"Unusual activity: >25% of requests failed".
You can then look in the ERDDAP log file to see the nature of their requests.
If you feel that someone is making too many requests, bizarre
requests (you wouldn't believe what I've seen, well, maybe you would),
or attack-type requests,
you can add their IP address to the blacklist.
- Blacklist --
You can add the IP address of troublesome users, bots, and
Denial-of-service
attackers to the ERDDAP
blacklist, so that future requests from them will be immediately rejected.
This setting is in datasets.xml so that you can quickly add an IP address to the list
and then flag
a dataset so that ERDDAP immediately notices and applies the change.
The error message sent to blacklisted users encourages them to contact the ERDDAP administrator
if they feel they have been mistakenly put on the blacklist.
(In our experience, several users have been unaware that they were running
multiple scripts simultaneously, or that their scripts were making nonsense requests.)
- Dataset Security -
Some types of datasets (notably, EDDTableFromDatabase) present additional security risks
(e.g., SQL injection)
and have their own security measures. See the information for those types
of datasets in
Working with the datasets.xml File,
notably
EDDTableFromDatabase security.
- Security Audit --
Although NOAA IT security refused our requests for scans for years, they now routinely scan my (Bob's) ERDDAP installation.
Although the initial scans found some problems that I then fixed,
subsequent scans haven't found problems with ERDDAP.
The scans worry about a lot of things: notably, since tabledap requests look like SQL requests,
they worry about SQL injection vulnerabilities. But those concerns are unfounded because
ERDDAP always parses and validates queries and then separately builds the SQL query in a
way that avoids injection vulnerabilities. The other thing they sometimes complain about
is that our Java version or Tomcat versions aren't as up-to-date as they want, so we update them in response.
I previously offered to show people the security reports, but I'm now told I can't do that.
Questions? Suggestions?
If you have any questions about ERDDAP's security system
or have any questions, doubts, concerns, or suggestions about how it is set up,
please email erd dot data at noaa dot gov.
These are details that you don't need to know until a need arises.
- Setting Up a Second ERDDAP for Testing/Development
If you want to do this, there are two approaches:
- Solid State Drives (SSDs) are great!
The quickest, easiest, and cheapest way to speed up ERDDAP's
access to tabular data is to put the data files on a Solid State Drive (SSD).
Most tabular datasets are relatively small, so a 1 or 2 TB SSD is probably
sufficient to hold all of the data files for all of your tabular datasets.
SSDs eventually wear out if you write data to a cell, delete it,
and write new data to that cell too many times.
So if you just use your SSD to write the data once and read it many times,
even a consumer-grade SSD should last a very long time, probably much longer
than any Hard Disk Drive (HDD).
Consumer-grade SSDs are now cheap (in 2018, ~$200 for 1 TB or ~$400 for 2 TB)
and prices are still falling fast.
When ERDDAP accesses a data file, an SSD offers both
shorter latency (~0.1 ms, versus ~3 ms for an HDD, versus ~10(?) ms for a RAID, versus ~55 ms for Amazon S3)
and higher throughput (~500 MB/s, versus ~75 MB/s for an HDD, versus ~500 MB/s for a RAID).
So you can get a big performance boost (up to 10X versus an HDD) for $200!
Compared to most other possible changes to your system
(a new server for $10,000? a new RAID for $35,000? a new network switch for $5000? etc.),
this is by far the best Return On Investment (ROI).
If/when the SSD dies (in 1, 2, ... 8 years), replace it. Don't rely on it for
long-term, archival storage of the data, just for the front-end copy of the data.
[SSDs would be great for gridded data, too, but most gridded datasets are
much larger, making the SSD very expensive.]
If your server isn't loaded with memory, additional memory for your server
is also a great and relatively inexpensive way to speed up all aspects of ERDDAP.
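As a rough worked example of the latency and throughput figures quoted above, the sketch below estimates the time to read one data file as latency plus size divided by throughput. This is a deliberately simple model (it ignores operating system caching, file system overhead, and the time ERDDAP spends parsing the file), and the 1 MB file size is an assumption chosen for illustration.

```python
def read_time_ms(file_mb, latency_ms, throughput_mb_per_s):
    """Estimated time to read one file: latency + transfer time, in ms."""
    return latency_ms + file_mb / throughput_mb_per_s * 1000.0


FILE_MB = 1.0  # assumed size of one tabular data file

# Latency and throughput values are the approximate figures quoted above.
ssd_ms = read_time_ms(FILE_MB, latency_ms=0.1, throughput_mb_per_s=500.0)
hdd_ms = read_time_ms(FILE_MB, latency_ms=3.0, throughput_mb_per_s=75.0)

print(f"SSD {ssd_ms:.1f} ms, HDD {hdd_ms:.1f} ms, speedup {hdd_ms / ssd_ms:.1f}x")
# prints: SSD 2.1 ms, HDD 16.3 ms, speedup 7.8x
```

Under these assumptions the SSD is roughly 8 times faster per file, which is in line with the "up to 10X" estimate.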
- Heavy Loads / Constraints
With heavy use, a standalone ERDDAP may be constrained by various problems.
For more information, see the
list of
constraints and solutions.
- Grids, Clusters, and Federations
Under very heavy use, a single standalone ERDDAP will run into
one or more constraints and even the suggested solutions will be insufficient.
For such situations, ERDDAP has features that make it easy to construct scalable grids
(also called clusters or federations) of ERDDAPs which allow the system to handle very heavy use
(e.g., for a large data center). For more information, see
grids, clusters,
and federations of ERDDAPs.
- Cloud Computing
Several companies are starting to offer
cloud computing services
(e.g., Amazon Web Services).
Web hosting companies
have offered simpler services since the mid-1990's, but the "cloud" services
have greatly expanded the flexibility of the systems and the range of
services offered.
You can use these services to set up a single ERDDAP
or a grid/cluster of ERDDAPs to handle very heavy use.
For more information, see
cloud
computing with ERDDAP.
- Amazon Web Services (AWS) EC2 Installation Overview
Amazon Web Services (AWS)
is a
cloud computing service
that offers a wide range of computer infrastructure that you can rent by the hour.
You can install ERDDAP on an
Elastic Compute Cloud (EC2)
instance (their name for a computer that you can rent by the hour).
AWS has an excellent
AWS User Guide
and you can use Google to find answers to specific questions you might have.
Brace yourself --
it is a fair amount of work to get started. But once you get one server up and running,
you can easily rent as many additional resources (servers, databases, SSD-space, etc.)
as you need, at a reasonable price.
[This isn't a recommendation or endorsement of Amazon Web Services.
There are other cloud providers.]
An overview of things you need to do to get ERDDAP running on AWS is:
- In general, you will do all the things described in the
AWS User Guide.
- Set up an AWS account.
- Set up an AWS user within that account with administrator privileges.
Log in as this user to do all the following steps.
- Elastic Block Storage (EBS) is the AWS equivalent of a hard drive attached to
your server.
Some EBS space will be allocated when you first create an EC2 instance.
It is persistent storage -- the information isn't lost when you stop
your EC2 instance. And if you change instance types, your EBS space
automatically gets attached to the new instance.
- Create an Elastic IP address so that your EC2 instance has a stable,
public URL (as opposed to just a private URL that changes every time you
restart your instance).
- Create and start up an EC2 instance (computer).
There is a wide range of
instance types,
each at a different price.
An m4.large or m4.xlarge instance is powerful and is probably suitable
for most uses, but choose whatever meets your needs.
You will probably want to use Amazon's Linux as the operating system.
- If your desktop/laptop computer is a Windows computer, you can use
PuTTY,
a free SSH client for Windows, to get access to your EC2 instance's command line.
Or, you may have some other SSH program that you prefer.
- When you log into your EC2 instance, you will be logged in as the
administrative user with the user name "ec2-user".
ec2-user has sudo privileges.
So, when you need to do something as the root user, use: sudo someCommand
- If your desktop/laptop computer is a Windows computer, you can use
FileZilla,
a free SFTP program, to transfer files
to/from your EC2 instance. Or, you may have some other SFTP program that you prefer.
- Install Apache
on your EC2 instance.
- Follow the standard ERDDAP installation instructions.
- WaitThenTryAgain Exception
A user may get an error message like
WaitThenTryAgainException:
There was a (temporary?) problem. Wait a minute, then try again.
(In a browser, click the Reload button.)
Details: GridDataAccessor.increment: partialResults[0]="123542730" was expected
to be "123532800".
The general explanation of the WaitThenTryAgainException is:
When ERDDAP is responding to a user request, there may be an unexpected error
with the dataset (e.g., an error while reading data from the file, or an error accessing
a remote dataset). WaitThenTryAgain signals to ERDDAP that the request failed (so far)
but that ERDDAP should try to reload the dataset quickly (it calls
RequestReloadASAP) and retry the request.
Often, this succeeds, and the user just sees that the response to the request was slow.
Other times, the reload fails or is too slow, or the subsequent attempt to deal with the
request also fails and throws another WaitThenTryAgain.
If that happens, ERDDAP marks the dataset for reloading but tells the user
(via a WaitThenTryAgain Exception) that there was a failure while responding to the request.
That is the normal behavior. This system can deal with many common problems.
But it is possible for this system to get triggered excessively. The most common cause is
that ERDDAP's loading of the dataset doesn't see a problem, but ERDDAP's response to a
request for data does see the problem.
No matter what the cause is, the solution is for you to deal with whatever
is wrong with the dataset. Look in log.txt to see the actual error messages and
deal with the problems. If lots of files have valid headers but invalid data (a corrupted file),
replace the files with uncorrupted files. If the connection to a RAID is flakey, fix it.
If the connection to a remote service is flakey, find a way to make it not flakey
or download all the files from the remote source and serve the data from the local files.
The detailed explanation of that specific error (above) is:
For each EDDGrid dataset, ERDDAP keeps the axis variable values in memory.
They are used, for example, to convert requested axis values that use the "()" format into index numbers.
For example, if the axis values are "10, 15, 20, 25", a request for (20)
will be interpreted as a request for index #2 (0-based indices).
When ERDDAP gets a request for data and gets the data from the source,
it verifies that the axis values that it got from the source match the axis values in memory.
Normally, they do.
But sometimes the data source has changed in a significant way:
for example, index values from the beginning of the axis variable may have
been removed (e.g., "10, 15, 20, 25" may have become "20, 25, 30").
If that happens, it is clear that ERDDAP's interpretation of the request (e.g., "(20)" is index #2)
is now wrong. So ERDDAP throws an exception and calls RequestReloadASAP.
ERDDAP will update the dataset soon (often in a few seconds, usually within a minute).
Other, similar problems also throw the WaitThenTryAgain exception.
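The axis check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not ERDDAP's Java code: the function names and the WaitThenTryAgain class below are hypothetical stand-ins, and the lookup assumes the requested value matches an axis value exactly.

```python
# Illustrative sketch only; these names are not ERDDAP's actual classes or methods.

class WaitThenTryAgain(Exception):
    """Hypothetical stand-in for ERDDAP's WaitThenTryAgainException."""


def value_to_index(cached_axis, requested_value):
    """Convert a "(value)" request into a 0-based index using the cached axis values."""
    return cached_axis.index(requested_value)


def check_axis(cached_axis, source_axis, index):
    """Raise if the value the source now has at `index` differs from the cached value."""
    found = source_axis[index] if index < len(source_axis) else None
    expected = cached_axis[index]
    if found != expected:
        raise WaitThenTryAgain(
            f'partialResults[0]="{found}" was expected to be "{expected}".')


cached = [10, 15, 20, 25]          # axis values held in memory
idx = value_to_index(cached, 20)   # "(20)" is interpreted as index 2

check_axis(cached, [10, 15, 20, 25], idx)   # source unchanged: no error

try:
    check_axis(cached, [20, 25, 30], idx)   # source changed: index 2 now holds 30
    outcome = "ok"
except WaitThenTryAgain as e:
    outcome = str(e)
print(outcome)
```

In the sketch, the mismatch is detected only when data is requested, which mirrors why a dataset can load without problems and still trigger the exception later.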
- RequestReloadASAP
You may see RequestReloadASAP in the log.txt file right after an error message
and often near a WaitThenTryAgain Exception.
It is basically an internal, programmatic way for ERDDAP to set a flag
to signal that the dataset should be reloaded ASAP.
- Files Not Being Deleted
For a few ERDDAP installations, there has been a problem with some temporary files
being created by ERDDAP staying open (mistakenly) and thus not being deleted.
In a few cases, many of these files have accumulated and taken up a significant
amount of disk space.
Hopefully, these problems are fixed (as of ERDDAP v2.00). If you see this problem,
please email the directory+names of the offending files to Chris.John at noaa.gov.
You have a few options for dealing with the problem:
- If the files aren't big and aren't causing you to run out of disk space, you can ignore the problem.
- The simplest solution is to shut down tomcat/erddap (after hours so fewer users are affected).
During the shutdown, if the operating system doesn't delete the files, delete them by hand. Then restart ERDDAP.
- Semantic Markup of Datasets with json-ld (JSON Linked Data)
ERDDAP now uses
json-ld (JSON Linked Data)
to make your data catalog and datasets part of the
semantic web,
which is Tim Berners-Lee's idea to make web content more machine readable and
machine "understandable".
The json-ld content uses
schema.org
terms and definitions.
Search engines
(Google in particular)
and other semantic tools can use this structured markup
to facilitate discovery and indexing.
The json-ld structured markup appears as invisible-to-humans <script> code on the
https://.../erddap/info/index.html web page (which is a semantic web
DataCatalog)
and on each https://.../erddap/info/datasetID/index.html web page
(which is a semantic web
Dataset).
(Special thanks to Adam Leadbetter and Rob Fuller of the Marine Institute in Ireland
for doing the hard parts of the work to make this part of ERDDAP.)
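If you want to see what a semantic tool sees, the following Python sketch pulls json-ld out of a web page. It assumes the markup is embedded in the usual json-ld way, in a <script> element with type="application/ld+json". The sample page and the JsonLdExtractor class are made up for illustration and are not ERDDAP output.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


# A made-up page standing in for .../erddap/info/datasetID/index.html
SAMPLE_PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "http://schema.org", "@type": "Dataset", "name": "Example dataset"}
</script>
</head><body>Dataset info page</body></html>"""

parser = JsonLdExtractor()
parser.feed(SAMPLE_PAGE)
for doc in parser.documents:
    print(doc["@type"], "-", doc["name"])
```

To try it against a real server, download the info page for one of your datasets and pass its text to parser.feed().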
- Out-Of-Date URLs
Slowly but surely, the URLs that data providers have written into data files
are becoming out-of-date (for example, http becomes https, websites are rearranged,
and organizations like NODC/NGDC/NCDC are reorganized into NCEI).
The resulting broken links are an ever-present problem faced by all websites.
To deal with this,
ERDDAP now has a system to automatically update out-of-date URLs.
If GenerateDatasetsXml sees an out-of-date URL,
it adds the up-to-date URL to <addAttributes>.
Also, when a dataset loads, if ERDDAP sees an out-of-date URL,
it silently changes it to the up-to-date URL.
The changes are controlled by a series of search-for/replace-with
pairs defined in <updateUrls> in ERDDAP's
[tomcat]/webapps/erddap/WEB-INF/classes/gov/noaa/pfel/erddap/util/messages.xml file.
You can make changes there.
If you have suggestions for changes,
or if you think this should be turned into a service (like the Converters),
please email Chris.John at noaa.gov.
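The idea of search-for/replace-with pairs can be sketched as follows. The pairs and attribute values here are hypothetical placeholders (the real pairs are the ones in <updateUrls>), and the function is an illustration of the approach, not ERDDAP's code.

```python
# Hypothetical search-for/replace-with pairs, for illustration only.
# The real pairs are defined in <updateUrls> in messages.xml.
UPDATE_URLS = [
    ("http://old.example.org/", "https://new.example.org/"),
    ("http://data.example.com/legacy/", "https://data.example.com/"),
]


def update_urls(text, pairs=UPDATE_URLS):
    """Apply every search-for/replace-with pair, in order, to a string."""
    for search_for, replace_with in pairs:
        text = text.replace(search_for, replace_with)
    return text


# Made-up metadata attributes for one dataset.
attributes = {
    "infoUrl": "http://old.example.org/about.html",
    "title": "Sea surface temperature",
}
updated = {name: update_urls(value) for name, value in attributes.items()}
print(updated["infoUrl"])
```

Because the pairs are applied in order, a more specific pair should come before a more general one that matches the same text.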
- CORS
(Cross-Origin Resource Sharing)
"is a mechanism that allows restricted resources (e.g. fonts [or ERDDAP data])
on a web page to be requested from another domain outside the domain from
which the first resource was served"
(Arun Ranganathan).
Basically, CORS is a message that can be put in the HTTP header of a response,
saying essentially, "it is okay with this site if certain other sites
(specific ones, or all) grab resources
(e.g., data) from this site and make it available on their site".
Thus, it is an alternative to
JSONP.
The developers of ERDDAP do not claim to be security experts.
We are not entirely clear about the security issues related to CORS.
We don't want to make any statement endorsing an action that decreases security.
So we'll just stay neutral and leave it up to each ERDDAP admin to decide if the
benefits of enabling a CORS header are worth the risks.
As always, if your ERDDAP has any private datasets,
it's a good idea to be extra careful about security.
If you want to enable CORS for your ERDDAP, there are
readily available instructions
describing how website administrators can enable a CORS header
via their lower level server software (e.g., Apache or nginx).
- Palettes
are used by ERDDAP to convert a range of data values into a range of colors
when making graphs and maps.
Each palette is defined in a .cpt-style palette file as used by
GMT.
All ERDDAP .cpt files are valid GMT .cpt files, but the opposite is not true.
For use in ERDDAP, .cpt files have:
- Optional comment lines at the start of the file, starting with "#".
- A main section with a description of the segments of the palette, one segment per line.
Each segment description line has 8 values:
startValue, startRed, startGreen, startBlue, endValue, endRed, endGreen, endBlue.
There may be any number of segments.
ERDDAP uses linear interpolation between the startRed/Green/Blue and endRed/Green/Blue
of each segment.
We recommend that each segment specify a start and end color which are different,
and that the start color of each segment be the same as the end color of the
previous segment, so that the palette describes a continuous blend of colors.
ERDDAP has a system for creating on-the-fly a palette of discrete colors
from a palette with a continuous blend of colors.
An ERDDAP user can specify if they want the palette to be Continuous (the original)
or Discrete (derived from the original).
But there are legitimate reasons for not following these recommendations for some palettes.
- The startValue and endValue of each segment must be integers.
The first segment must have startValue=0 and endValue=1.
The second segment must have startValue=1 and endValue=2.
Etc.
- The red, green, and blue values must be integers from 0 (none) ... 255 (full on).
- The end of the file must have 3 lines with:
- A background rgb color for data values less than the colorbar minimum, e.g.: B 128 128 128
It is often the startRed, startGreen, and startBlue of the first segment.
- A foreground rgb color for data values more than the colorbar maximum, e.g.: F 128 0 0
It is often the endRed, endGreen, and endBlue of the last segment.
- An rgb color for NaN data values, e.g., N 128 128 128
It is often middle gray (128 128 128).
- The values on each line must be separated by tabs, with no extraneous spaces.
A sample .cpt file is BlueWhiteRed.cpt:
# This is BlueWhiteRed.cpt.
0 0 0 128 1 0 0 255
1 0 0 255 2 0 255 255
2 0 255 255 3 255 255 255
3 255 255 255 4 255 255 0
4 255 255 0 5 255 0 0
5 255 0 0 6 128 0 0
B 0 0 128
F 128 0 0
N 128 128 128
See the existing .cpt files for other examples.
If there is trouble with a .cpt file, ERDDAP will probably throw an error when the .cpt file is parsed
(which is better than misusing the information).
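Before installing a new palette, you may want to check it yourself. The Python sketch below checks most of the rules listed above (8 integer values per segment, segment n spanning n to n+1, colors from 0 to 255, and B, F, and N lines). It is a rough stand-in, not ERDDAP's parser: parse_cpt is a hypothetical helper, it is more lenient than ERDDAP about comment placement and stray spaces, and it does not check that the B, F, and N lines come last.

```python
def _check_rgb(values, line):
    """Raise if any color value is outside 0..255."""
    if any(not 0 <= v <= 255 for v in values):
        raise ValueError(f"color values must be 0..255: {line!r}")


def parse_cpt(text):
    """Parse tab-separated .cpt text into (segments, special_colors)."""
    segments, special = [], {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank line or comment
        parts = line.split("\t")
        if parts[0] in ("B", "F", "N"):
            rgb = tuple(int(p) for p in parts[1:])
            if len(rgb) != 3:
                raise ValueError(f"expected 3 color values: {line!r}")
            _check_rgb(rgb, line)
            special[parts[0]] = rgb
            continue
        values = [int(p) for p in parts]  # int() rejects non-integer text
        if len(values) != 8:
            raise ValueError(f"expected 8 values: {line!r}")
        n = len(segments)
        if values[0] != n or values[4] != n + 1:
            raise ValueError(f"segment {n} must span {n} to {n + 1}: {line!r}")
        _check_rgb(values[1:4] + values[5:8], line)
        segments.append((tuple(values[1:4]), tuple(values[5:8])))
    if set(special) != {"B", "F", "N"}:
        raise ValueError("the file must have B, F, and N lines")
    return segments, special


# BlueWhiteRed.cpt from the sample above, rebuilt with tab separators.
ROWS = [
    "# This is BlueWhiteRed.cpt.",
    "0 0 0 128 1 0 0 255",
    "1 0 0 255 2 0 255 255",
    "2 0 255 255 3 255 255 255",
    "3 255 255 255 4 255 255 0",
    "4 255 255 0 5 255 0 0",
    "5 255 0 0 6 128 0 0",
    "B 0 0 128",
    "F 128 0 0",
    "N 128 128 128",
]
BLUE_WHITE_RED = "\n".join("\t".join(row.split()) for row in ROWS)

segments, special = parse_cpt(BLUE_WHITE_RED)
print(len(segments), "segments; NaN color:", special["N"])
```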
You can add additional palettes to ERDDAP.
You can make them yourself or find them on the web (for example, at
cpt-city)
although you'll probably have to edit their format slightly to conform to
ERDDAP's .cpt requirements.
To get ERDDAP to use a new .cpt file, store the file in
tomcat/webapps/erddap/WEB-INF/cptfiles
(you'll need to do that for each new version of ERDDAP)
and either:
- If you use the default messages.xml
file: add the filename to the <palettes> tag in
tomcat/webapps/erddap/WEB-INF/classes/gov/noaa/pfel/erddap/util/messages.xml.
If you do this, you need to do it every time you upgrade ERDDAP.
- If you use a custom messages.xml file: add the filename to the <palettes> tag
in your custom messages.xml file:
tomcat/content/erddap/messages.xml .
If you do this, you only need to do it once (but there is other work to maintain a custom messages.xml file).
Then restart ERDDAP so ERDDAP notices the changes.
An advantage of this approach is that you can specify the order of the palettes in the list
presented to users.
If you add a collection, we encourage you to add a prefix with the author's initials (e.g., "KT_")
to the name of each palette to identify the collection and so that there can be multiple
palettes which would otherwise have the same name.
Please don't remove or change any of the standard palettes.
They are a standard feature of all ERDDAP installations.
If you think a palette or collection of palettes should be included in the
standard ERDDAP distribution because it/they would be of general use,
please email them to Chris.John at noaa.gov.
- How does ERDDAP generate the colors in a colorbar?
- The user selects one of the predefined palettes or uses the default,
e.g., Rainbow. Palettes are stored/defined in GMT-style .cpt Color Palette Table files.
Each of ERDDAP's predefined palettes has a simple integer range,
e.g., 0 to 1 (if there is just one section in the palette),
or 0 to 4 (if there are four sections in the palette).
Each segment in the file covers n to n+1, starting at n=0.
- ERDDAP generates a new .cpt file on-the-fly, by scaling the predefined palette's range
(e.g., 0 to 4) to the range of the palette needed by the
user (e.g., 0.1 to 50) and then generating a section in the
new palette for each interval between adjacent ticks on the new scale
(e.g., a log scale with ticks at 0.1, 0.5, 1, 5, 10, 50 will have 5 sections).
The color for the end point of each section is generated by finding the relevant section of the
palette in the .cpt file, then linearly interpolating the R, G, and B values.
(That's the same as how GMT generates colors from its Color Palette Table files.)
This system allows ERDDAP to start with generic palettes
(e.g., Rainbow with 8 segments, in total spanning 0 to 8)
and create custom palettes on-the-fly
(e.g., a custom Rainbow, which maps 0.1 to 50 mg/L to the rainbow colors).
- ERDDAP then uses that new .cpt file to generate the color for each different colored pixel
in the color bar
(and later for each data point when plotting data on a graph or map),
again by finding the relevant section of the
palette in the .cpt file, then linearly interpolating the R, G, and B values.
This process may seem unnecessarily complicated. But it solves problems
related to log scales that are hard to solve other ways.
So how can you mimic what ERDDAP is doing? That isn't easy.
Basically you need to duplicate the process that ERDDAP is using.
If you are a Java programmer, you can use the same Java class that ERDDAP uses to do all of this:
tomcat/webapps/erddap/WEB-INF/classes/gov/noaa/pfel/coastwatch/sgt/CompoundColorMap.java.
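If you aren't a Java programmer, the process described above can be approximated in Python. This is a sketch under assumptions, not a port of CompoundColorMap: the function names are hypothetical, and it assumes that tick values are mapped onto the predefined palette by their fraction of the range (in log10 space for a log scale) and that colors within a new section are interpolated linearly in the data value.

```python
import math

# The BlueWhiteRed palette: (start_rgb, end_rgb) for segments 0-1, 1-2, ... 5-6.
BLUE_WHITE_RED = [
    ((0, 0, 128), (0, 0, 255)),
    ((0, 0, 255), (0, 255, 255)),
    ((0, 255, 255), (255, 255, 255)),
    ((255, 255, 255), (255, 255, 0)),
    ((255, 255, 0), (255, 0, 0)),
    ((255, 0, 0), (128, 0, 0)),
]


def lerp_rgb(start, end, frac):
    """Linearly interpolate R, G, and B between two colors."""
    return tuple(round(s + (e - s) * frac) for s, e in zip(start, end))


def base_color(palette, position):
    """Color at `position` (0 .. len(palette)) in a predefined palette."""
    n = min(int(position), len(palette) - 1)   # segment n covers n to n+1
    start, end = palette[n]
    return lerp_rgb(start, end, position - n)


def make_sections(palette, ticks, log_scale=False):
    """Build new sections (lo, hi, lo_rgb, hi_rgb), one per interval between ticks."""
    scale = math.log10 if log_scale else float
    lo, hi = scale(ticks[0]), scale(ticks[-1])

    def tick_color(tick):
        frac = (scale(tick) - lo) / (hi - lo)    # assumed mapping onto the palette
        return base_color(palette, frac * len(palette))

    return [(a, b, tick_color(a), tick_color(b)) for a, b in zip(ticks, ticks[1:])]


def color_for(sections, value):
    """Find the section containing `value`, then interpolate its end colors."""
    for lo, hi, lo_rgb, hi_rgb in sections:
        if lo <= value <= hi:
            return lerp_rgb(lo_rgb, hi_rgb, (value - lo) / (hi - lo))
    raise ValueError("value is outside the colorbar range")


sections = make_sections(BLUE_WHITE_RED, [0.1, 0.5, 1, 5, 10, 50], log_scale=True)
print(len(sections), "sections")
print("color for 2 mg/L:", color_for(sections, 2))
```

Values below or above the colorbar range would use the B and F colors from the .cpt file; the sketch simply raises an error for them.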
- Guidelines for Data Distribution Systems
More general opinions about the design and evaluation
of data distribution systems can be found
here.
- ArchiveADataset
Included in your ERDDAP installation is a command line tool called ArchiveADataset
which can help you make an archive (a .zip or .tar.gz file) with part or all of a dataset
stored in a series of netcdf-3 .nc data files in a file format that is suitable for
submission to NOAA's NCEI archive (.nc for gridded datasets or
.ncCFMA
for tabular datasets, as specified by the
NCEI NetCDF Templates v2.0).
ArchiveADataset can make two different archive formats: "original" and Bagit.
Not surprisingly, the
global and variable metadata
that ERDDAP encourages/requires is almost exactly the same in-file
CF and ACDD metadata that NCEI encourages/requires,
so all of your datasets should be ready for submission to NCEI via
Send2NCEI
or
ATRAC
(NCEI's Advanced Tracking and Resource tool for Archive Collections).
If you (the ERDDAP administrator) use ArchiveADataset to submit data to NCEI,
then you (not NCEI) will determine when to submit a chunk of data to NCEI
and what that chunk will be, because you will know when there is new
data and how to specify that chunk (and NCEI won't). Thus, ArchiveADataset
is a tool for you to use to create a package to submit to NCEI.
ArchiveADataset may be useful in other situations,
for example, for ERDDAP administrators who need to convert a subset of a dataset
(on a private ERDDAP) from its native file format into a set of
.ncCF files,
so that a public ERDDAP can
serve the data from the .ncCF files instead of the original files.
Once you have set up ERDDAP and run it (at least one time),
you can find and use ArchiveADataset in the
tomcat/webapps/erddap/WEB-INF directory.
There is a shell script (ArchiveADataset.sh) for Linux/Unix
and a batch file (ArchiveADataset.bat) for Windows.
On Windows, the first time you run ArchiveADataset, you need to edit the
ArchiveADataset.bat file with a text editor to change the path to the java.exe file
so that Windows can find Java.
When you run ArchiveADataset, it will ask you a series of questions.
For each question, type a response, then press Enter.
Or press ^C to exit the program at any time.
Or, you can put the answers to the questions, in order, on the command line.
To do this, run the program once and type in and write down your answers.
Then, you can create a single command line (with the answers as parameters) which
runs the program and answers all the questions.
Use the word default if you want to use the default value for a given parameter.
Use "" (two double quotes) as a placeholder for an empty string.
Specifying parameters on the command line can be very convenient, for example,
if you use ArchiveADataset once a month to
archive a month's worth of data. Once you have generated the command line
with parameters and saved that in your notes or in a shell script,
you just need to make small changes each month to make that month's archive.
The questions that ArchiveADataset asks allow you to:
- Specify original or Bagit file packaging.
For NCEI, use Bagit.
- Specify zip or tar.gz compression for the package.
For NCEI, use tar.gz.
- Specify a contact email address for this archive
(it will be written in the READ_ME.txt file in the archive).
- Specify the datasetID of the dataset you want to archive.
- Specify which data variables you want to archive (usually all).
- Specify which subset of the dataset you want to archive.
You need to format the subset in the same way you would format
a subset for a data request,
so it will be different for gridded
than for tabular datasets.
- For gridded datasets, you can specify a range of values of the leftmost dimension,
usually that is a range of time. ArchiveADataset will make a separate
request and generate a separate data file for each value in the range of values.
Since gridded datasets are usually large, you will almost always have to specify
a small subset relative to the size of the entire dataset.
For example, [(2015-12-01):(2015-12-31)][][][]
- For tabular datasets, you can specify any collection of constraints, but
it is often a range of time.
Since tabular datasets are usually small, it is often possible to specify
no constraints, so that the entire dataset is archived.
For example, &time>=2015-12-01&time<2016-01-01
- For tabular datasets: specify a comma separated list of 0 or more variables
that will determine
how the archived data is further subsetted into different data files.
For datasets that have
cdm_data_type=TimeSeries|TimeSeriesProfile|Trajectory|TrajectoryProfile
you should almost always specify the variable that has the
cf_role=timeseries_id (e.g., stationID)
or cf_role=trajectory_id attribute.
ArchiveADataset will make a separate request and generate a separate
data file for each combination of the values of these variables,
e.g., for each stationID.
For all other tabular datasets, you will probably not specify any variables
for this purpose.
Warning: If the subset of the dataset you are archiving is very large (>2GB)
and there is no suitable variable for this purpose,
then ArchiveADataset is not usable with this dataset. This should be rare.
- Specify the file format for the data files that will be created.
For gridded datasets, for NCEI, use .nc .
For tabular datasets, for NCEI, use
.ncCFMA if it is an option; otherwise use .nc.
- Specify the type of file digest
to be created for each data file and
for the entire archive package: MD5, SHA-1, or SHA-256.
The file digest provides a way for the client (e.g., NCEI)
to test whether the data file has become corrupted.
Traditionally, these were
.md5 files,
but now there are better options.
For NCEI, use SHA-256 .
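If you archive one month at a time, the only answer that usually changes is the subset. The Python sketch below generates the subset strings for a given month in the two forms shown above. The function names are hypothetical, and the gridded form assumes time is the leftmost dimension and there are three other dimensions, as in the example above.

```python
from datetime import date, timedelta


def month_bounds(year, month):
    """Return the first day of the month and the first day of the next month."""
    first = date(year, month, 1)
    next_first = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return first, next_first


def grid_subset(year, month, n_other_dims=3):
    """Subset for a gridded dataset: a time range plus [] for every other dimension."""
    first, next_first = month_bounds(year, month)
    last = next_first - timedelta(days=1)
    return f"[({first.isoformat()}):({last.isoformat()})]" + "[]" * n_other_dims


def table_subset(year, month):
    """Subset for a tabular dataset: time constraints covering one month."""
    first, next_first = month_bounds(year, month)
    return f"&time>={first.isoformat()}&time<{next_first.isoformat()}"


print(grid_subset(2015, 12))
print(table_subset(2015, 12))
```

A monthly script could paste the generated string into the saved command line in place of the previous month's subset.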
After you answer all of the questions, ArchiveADataset will:
- Make a series of requests to the dataset and stage the resulting data files in
bigParentDirectory/ArchiveADataset/datasetID_timestamp/.
For gridded datasets, there will be a file for each value of the leftmost dimension
(e.g., time). The name of the file will be that value (e.g., the time value).
For tabular datasets, there will be a file for each value of the
... variable(s). The name of the file will be that value.
If there is more than one variable, the left variables will be used
to make subdirectory names, and the rightmost variable will be used
to make the filenames.
Each data file must be <2GB (the maximum allowed by .nc version 3 files).
- Make a file related to each data file with the digest of the data file.
For example, if the data file is 46088.nc and the digest type is SHA-256,
then the digest file will have the name 46088.nc.sha256 .
- Make a READ_ME.txt file with information about the archive,
including a list of all the settings you specified to generate this archive.
- Make 3 files in bigParentDirectory/ArchiveADataset/ :
- A .zip or .tar.gz archive file named datasetID_timestamp.zip (or .tar.gz)
containing all of the staged data files and digest files.
This file may be any size, limited only by disk space.
- A digest file for the archive file, for example,
datasetID_timestamp.zip.sha256.txt
- For the "original" type of archive, a text file named
datasetID_timestamp.zip.listOfFiles.txt (or .tar.gz)
which lists all of the files in the .zip (or .tar.gz) file.
If you are preparing the archive for NCEI, these are the files that you will send to
NCEI, perhaps via
Send2NCEI
or
ATRAC
(NCEI's Advanced Tracking and Resource tool for Archive Collections).
- Delete all of the staged files so that only the archive file (e.g., .zip),
the digest (e.g., .sha256.txt) of the archive,
and (optionally) the .listOfFiles.txt files remain.
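A recipient (or you, before sending) can recompute a digest and compare it with the recorded one. The following Python sketch shows the general technique with the standard hashlib module. The exact layout of the digest files that ArchiveADataset writes isn't described here, so the sketch just compares hexadecimal digest strings; the file contents are made-up bytes.

```python
import hashlib
import os
import tempfile


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Compute a file's digest as a hex string, reading the file in chunks."""
    digest = hashlib.new(algorithm)   # e.g., "md5", "sha1", or "sha256"
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected_hex, algorithm="sha256"):
    """True if the file's current digest matches the recorded digest."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()


# Demonstration with a temporary stand-in for a data file.
with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, "46088.nc")
    with open(data_path, "wb") as f:
        f.write(b"example bytes")
    recorded = file_digest(data_path)          # digest recorded at archive time
    ok_before = is_intact(data_path, recorded)
    with open(data_path, "ab") as f:
        f.write(b"!")                          # simulate corruption
    ok_after = is_intact(data_path, recorded)
print(ok_before, ok_after)
```

Reading in chunks keeps memory use small even for data files close to the 2GB limit.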
ISO 19115 .xml Metadata Files --
The ArchiveADataset archive package does not include the
ISO 19115 .xml metadata file for the dataset. If you want/need to submit
an ISO 19115 file for your dataset to NCEI, you can send them
the ISO 19115 .xml metadata file that ERDDAP created for the dataset
(but NMFS people should get the ISO 19115 file for their datasets
from InPort if ERDDAP isn't already serving that file).
Problems? Suggestions? ArchiveADataset is new. If you have problems or suggestions,
please email them to erd dot data at noaa dot gov .
Here are some PowerPoint slide shows and documents that Bob Simons has created
related to ERDDAP.
DISCLAIMER: The content and opinions expressed in these documents
are Bob Simons' personal opinions and
do not necessarily reflect any position of the
Government or the National Oceanic and Atmospheric Administration.
The Four Main Documents:
Other Presentations:
- 2020 EDM: New Features in ERDDAP v2.10
- 2020-05-19 DMIT: Data Ingest (Or
watch this video of Bob giving this talk.)
- 2019 IOOS DMAC: New Features in ERDDAP v2.0
- 2018 Summer ESIP: Subsetting In ERDDAP
- 2018 Summer ESIP: JSON Support In ERDDAP
- 2018 EDM: A Distributed System of Web Services (Faster, Easier, Less Expensive) (Or, why I was happy 4 years ago.)
- 2018 EDM: ERDDAP in 2018
- 2018 EDM: New Features in ERDDAP for Image, Audio, and Video Data
- 2018 EDM: UAF and ERDDAP Solutions for Data Integration
- 2017 EDM: A Quick Introduction to ERDDAP
- 2017 EDM and 2017 IOOS: New or Little Known ERDDAP Features (for Users)
- 2017 EDM and 2017 IOOS: New or Little Known ERDDAP Features (for Administrators)
- 2017 EDM: EML, KNB, and ERDDAP
- 2017 EDM: How does data get from the source to the end user? Old School versus New School
- 2016 Summer ESIP: The Big Picture: PARR, OPeNDAP, ERDDAP, and Data Distribution
- 2016 EDM: One And Done
- 2016 Gov API: Next Generation Data Servers
- 2015 Summer ESIP: Tabular Aggregation
- 2014 EDM: Bob's Do's and Don't for Tabular Data
- 2014 EDM: The Ideal User Interface
- 2014 Summer ESIP: Tabular Data
- 2013: Don't Treat In-Situ and Tabular Data Like Gridded Data
- 2013 EDM: Do More With Less
- 2012 EDM: Guidelines for Data Distribution Systems
Presentations By Other People:
- A FAIR based tool for improving Global Data sharing
by Kevin O'Brien at the Global Ocean Observing System (GOOS) Webinar / Observation Coordination Group (OCG) Series / 1, November 12, 2020.
- Building Your Own Weather App Using NOAA Open Data and Jupyter Notebooks
by Filipe Fernandes and Rich Signell at SciPy 2018, July 13, 2018.
- Using the OOI ERDDAP
by Rich Signell, February 2018.
- ESIP Tech Dive: "ERDDAP Lightning Talks"
Eight 5-Minute Talks About Interesting Things People Are Doing With ERDDAP
by Jenn Sevadjian, Jim Potemra, Conor Delaney, Kevin O'Brien, John Kerfoot,
Stephanie Petillo, Charles Carleton and Eli Hunter presented as an ESIP Tech Dive
on August 31, 2017.
- Using ERDDAP to Access Tabular Data
by Rich Signell, August 2015.
- Test Using ERDDAP for Blue Carbon Data
by Rich Signell, August 2015.
- Using Data From ERDDAP in NOAA's GNOME Software.
In this video, Rich Signell downloads ocean currents forecast data from ERDDAP to model a toxic
spill in the ocean using
NOAA's GNOME software (in 5 minutes!).
(One tiny error in the video:
when searching for datasets, don't use AND between search terms. It is implicit.)
By Rich Signell, April 8, 2011.
These are things that only a programmer who intends to work with ERDDAP's Java classes needs to know.
- Getting the Source Code
- Via erddap.war
The source code for the current version of ERDDAP is always in the current erddap.war file.
So when you install ERDDAP in Tomcat (see the instructions at the top of this page),
all of the source code is unpacked
and installed on that computer. If you just want to read the source code of the
current official version of ERDDAP, this is the easiest option.
- Via Source Code on GitHub
The source code for recent public versions and in-development versions
is also available via GitHub.
Please read the Wiki
for that project.
If you want to modify the source code (and possibly have the changes incorporated
into the standard ERDDAP distribution), this is the recommended approach.
Before the 2020-12-22 GitHub release, one file was missing from the GitHub version
of the code because it was too big: the AWS SDK for Java v1.11 jar file.
With the 2020-12-22 release, we switched to v2 of the AWS SDK which
uses numerous small .jar files (which are in the /lib directory) instead of one huge .jar file.
Thus, the GitHub version now includes all ERDDAP files.
ERDDAP and its subcomponents have very liberal, open-source licenses,
so you can use and modify the source code for any purpose, for-profit or not-for-profit.
Note that ERDDAP and many subcomponents have licenses that require that you
acknowledge the source of the code that you are using. See
Credits. Whether required or not, it is just
good form to acknowledge all of these contributors.
- Use the Code for Other Projects
While you are welcome to use parts of the ERDDAP code for other projects,
be warned that the code can and will change.
We don't promise to support other uses of our code.
Git and GitHub will be your main solutions for dealing with this --
Git allows you to merge our changes into your changes.
For many situations where you might be tempted to use parts of ERDDAP in your project,
we think you will find it much easier to install and use ERDDAP as is,
and then write other services which use ERDDAP's services.
You can set up your own ERDDAP installation crudely in an hour or two.
You can set up your own ERDDAP installation in a polished way in a few days
(depending on the number and complexity of your datasets).
But hacking out parts of ERDDAP for your own project is likely to take weeks
(and months to catch subtleties) and you will lose the ability to
incorporate changes and bug fixes from subsequent ERDDAP releases.
We (obviously) think there are many benefits to using ERDDAP as is and making
your ERDDAP installation publicly accessible.
However, in some circumstances, you might not want to make your ERDDAP
installation publicly accessible.
Then, your service can access and use your private ERDDAP and your clients
needn't know about ERDDAP.
Halfway
Or, there is another approach which you may find useful
which is halfway between delving into ERDDAP's code and using ERDDAP as a
stand-alone web service:
In the EDD class, there is a static method which lets you make an instance of a dataset
(based on the specification in datasets.xml):
oneFromDatasetXml(String tDatasetID)
It returns an instance of an EDDTable or EDDGrid dataset.
Given that instance, you can call
makeNewFileForDapQuery(String userDapQuery, String dir,
String fileName, String fileTypeName)
to tell the instance to make a data file, of a specific fileType, with
the results from a user query.
Thus, this is a simple way to use ERDDAP's methods to request data and get a
file in response, just as a client would use the ERDDAP web application.
But this approach works within your Java program and bypasses the need for an
application server like Tomcat.
We use this approach for many of the unit tests of EDDTable and EDDGrid subclasses,
so you can see examples of this in the source code for all of those classes.
- Development Environment
- Set up ERDDAP in Tomcat
Since ERDDAP is mainly intended to be a servlet running in Tomcat,
we strongly recommend that you follow the standard installation instructions
at the top of this page to install Tomcat, and then install ERDDAP
in Tomcat's webapps directory.
Among other things, ERDDAP was designed to be installed in Tomcat's directory structure
and expects Tomcat to provide some .jar files.
- Our development environment is just a programmer's editor
(EditPlus, although that isn't a recommendation since we're not allowed to make recommendations).
We don't use Eclipse, Ant, etc.; nor do we offer ERDDAP-related support for them.
Starting in December 2020, we do use Maven to compile the classes periodically
in order to manage/gather the needed external .jar files.
(Amazon's SDK for Java forced us to do this.)
The ERDDAP pom.xml file for Maven is included in the GitHub ERDDAP distribution.
- We use a batch file which deletes all of the .class files in the source tree
to ensure that we have a clean compile (with javac).
- We currently use Adoptium's javac jdk-17.0.4+8
to compile gov.noaa.pfel.coastwatch.TestAll
(it has links to a few classes that wouldn't be compiled otherwise)
and run the tests.
For security reasons, it is almost always best to use the latest versions of
Java 17 and Tomcat 10.
- When we run javac or java, the current directory is tomcat/webapps/erddap/WEB-INF .
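- Our javac and java classpath is
classes;../../../lib/servlet-api.jar;lib/*
(The ; separator is for Windows. On Linux/Unix, use : instead and put the
classpath in quotes so that the shell doesn't expand lib/* .)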
- So your javac command line will be something like
javac -encoding UTF-8 -cp classes;../../../lib/servlet-api.jar;lib/* classes/gov/noaa/pfel/coastwatch/TestAll.java
- And your java command line will be something like
java -cp classes;../../../lib/servlet-api.jar;lib/* -Xmx4000M -Xms4000M gov.noaa.pfel.coastwatch.TestAll
Optional: you can add -verbose:gc, which tells Java to
print garbage collection statistics.
- If TestAll compiles, everything ERDDAP needs has been compiled.
A few classes are compiled that aren't needed for ERDDAP.
If compiling TestAll succeeds but doesn't compile some class, that class isn't needed.
(There are some unfinished/unused classes.)
- In a few cases, we use 3rd party source code instead of .jar files (notably for DODS)
and have modified them slightly to avoid problems compiling with Java 17.
We have often made other slight modifications (notably to DODS) for other reasons.
- Most classes have test methods. We run lots of tests via TestAll.
Unfortunately, many of the tests are specific to our setup.
(Sorry. We're working to move all test files to the erddapTest and erddapTestBig directories
which are managed by Git and available at GitHub.)
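The classpath shown above uses ; as the separator, which is the Windows convention. As a minimal sketch (the class name DevCommandsSketch is hypothetical and not part of ERDDAP), the following builds the same three-entry classpath with the separator for the current platform and prints the matching javac and java command lines. It assumes the current directory is tomcat/webapps/erddap/WEB-INF, as described above, and it passes the fully qualified class name to java.

```java
import java.io.File;

// Hypothetical helper for illustration only; it is not part of the ERDDAP distribution.
class DevCommandsSketch {

    /** Joins the three classpath entries from the text with the given separator. */
    static String classpath(String separator) {
        return String.join(separator,
            "classes", "../../../lib/servlet-api.jar", "lib/*");
    }

    public static void main(String[] args) {
        // File.pathSeparator is ";" on Windows and ":" on Linux and macOS.
        String cp = classpath(File.pathSeparator);

        // Quoting the classpath keeps a Unix shell from expanding lib/* itself.
        System.out.println("javac -encoding UTF-8 -cp \"" + cp
            + "\" classes/gov/noaa/pfel/coastwatch/TestAll.java");
        System.out.println("java -cp \"" + cp
            + "\" -Xmx4000M -Xms4000M gov.noaa.pfel.coastwatch.TestAll");
    }
}
```

The heap settings (-Xmx4000M -Xms4000M) are copied from the command line above. Adjust them to suit your computer.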
- Important Classes
If you want to look
at the source code and try to figure out how ERDDAP works, please do.
- The code has JavaDoc comments, but the JavaDocs haven't been generated. Feel free to generate them.
- The most important classes (including the ones mentioned below) are within gov/noaa/pfel/erddap.
- The Erddap class has the highest level methods. It extends HttpServlet.
- Erddap passes requests to instances of subclasses of EDDGrid or EDDTable,
which represent individual datasets.
- EDStatic has most of the static information and settings (e.g., from
the setup.xml and messages.xml files) and offers static services
(e.g., sending emails).
- EDDGrid and EDDTable subclasses parse the request, get data from subclass-specific methods,
then format the data for the response.
- EDDGrid subclasses push data into GridDataAccessor (the internal data container for gridded data).
- EDDTable subclasses push data into TableWriter subclasses, which write data to a specific file
type on-the-fly.
- Other classes (e.g., low level classes) are also important, but it is less likely that
you will be working to change them.
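To make the flow of a request easier to picture, here is a greatly simplified sketch of the roles described above. Every class, method, and parameter name in it is hypothetical. It does not use ERDDAP's real classes or signatures, and it handles only one made-up file type and a toy query (a station name).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Every name below is a hypothetical stand-in; none of this is ERDDAP's real API.

/** Plays the Erddap role: looks up the dataset and passes the request to it. */
class ServletSketch {
    private final Map<String, TableDatasetSketch> datasets = new HashMap<>();

    ServletSketch() {
        datasets.put("demo", new InMemoryDatasetSketch());
    }

    String handle(String datasetID, String query, String fileType) {
        TableDatasetSketch dataset = datasets.get(datasetID);
        if (dataset == null) {
            throw new IllegalArgumentException("Unknown datasetID: " + datasetID);
        }
        return dataset.respond(query, fileType);
    }

    public static void main(String[] args) {
        System.out.print(new ServletSketch().handle("demo", "A", ".csv"));
    }
}

/** Plays the TableWriter role: receives rows and formats them on-the-fly. */
interface RowWriterSketch {
    void writeRow(String[] row);
    String finish();
}

/** Formats each row as one comma-separated line. */
class CsvRowWriterSketch implements RowWriterSketch {
    private final StringBuilder out = new StringBuilder();

    public void writeRow(String[] row) {
        out.append(String.join(",", row)).append('\n');
    }

    public String finish() {
        return out.toString();
    }
}

/** Plays the EDDTable role: shared request handling, subclass-specific data access. */
abstract class TableDatasetSketch {
    /** Subclass-specific: push the rows that match the query into the writer. */
    abstract void getData(String query, RowWriterSketch writer);

    /** Shared: pick a writer for the file type, gather the data, return the response. */
    String respond(String query, String fileType) {
        if (!".csv".equals(fileType)) {
            throw new IllegalArgumentException("Unsupported fileType: " + fileType);
        }
        RowWriterSketch writer = new CsvRowWriterSketch();
        getData(query, writer);
        return writer.finish();
    }
}

/** A toy data source: rows held in memory, filtered by station name. */
class InMemoryDatasetSketch extends TableDatasetSketch {
    private final List<String[]> rows = new ArrayList<>();

    InMemoryDatasetSketch() {
        rows.add(new String[] {"A", "10.5"});
        rows.add(new String[] {"B", "11.0"});
        rows.add(new String[] {"A", "10.7"});
    }

    void getData(String query, RowWriterSketch writer) {
        // Write the column names first, then each row whose station matches the query.
        writer.writeRow(new String[] {"station", "temperature"});
        for (String[] row : rows) {
            if (row[0].equals(query)) {
                writer.writeRow(row);
            }
        }
    }
}
```

The point of the design is that adding a new data source means writing one new subclass with its own getData, while the request handling and the output formats stay in shared code.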
- Code Contributions
If you have written some code which would be a nice addition to ERDDAP
(or better, before you write the code so that we can coordinate),
please email erd dot data at noaa dot gov. We'll work out the details.
The most likely situations are:
- You want to write another subclass of EDDGrid or EDDTable to handle another data source type.
If so, we recommend that you find the closest existing subclass
and use that code as a starting point.
- You want to write another saveAsFileType method.
If so, we recommend that you find the closest existing saveAsFileType method
in EDDGrid or EDDTable
and use that code as a starting point.
- You want to work on one of the
GitHub Issues,
which are projects for which we are soliciting help from others.
Those situations have the advantage that the code you write is self-contained.
You won't need to know all the details of ERDDAP's internals.
And it will be easy for us to incorporate your code in ERDDAP.
Note that if you do submit code, the license will need to be compatible with the ERDDAP
license (e.g.,
Apache,
BSD, or
MIT-X).
We'll list your contribution in the
credits.
GitHub Issues
If you would like to contribute but don't have a project, see the list
of GitHub Issues,
many of which are projects you could take on.
- Judging Your Code Contributions
If you want to submit code or other changes to be included in ERDDAP, that is great.
Your contribution needs to meet certain criteria in order to be accepted.
If you follow the guidelines below, you greatly increase the chances of
your contribution being accepted.
- I'll try to be a BDFL (Benevolent Dictator For Life, well, until I retire),
with the emphasis on Benevolent.
Basically, that means I'm responsible for ERDDAP so I have final word
on decisions about ERDDAP, notably about the design and whether a pull
request will be accepted or not. It needs to be this way partly for efficiency reasons
(it works for Linus Torvalds and Linux) and partly for security
reasons: I have to tell the IT security people that I take responsibility
for the security and integrity of the code.
- I don't guarantee that I'll accept your code.
If a project just doesn't work out as well as we had hoped and if it can't be salvaged,
I won't include the project in the ERDDAP distribution.
Please don't feel bad. Sometimes projects don't work out as well as hoped.
It happens, even to me.
If you follow the guidelines below, you greatly increase your chances of success.
- It's best if the changes are of general interest and usefulness.
If the code is specific to your organization, it is probably best to maintain
a separate branch of ERDDAP for your use. Axiom does this.
Fortunately, Git makes this easy to do.
I want to maintain a consistent vision for ERDDAP, not allow it to become
a kitchen sink project where everyone adds a custom feature for their project.
- Compile via TestAll.
Since the standard way to compile ERDDAP is to compile TestAll.java,
compiling TestAll must compile all of the classes for your code.
If it doesn't, simply add the hidden classes to the "Force compilation"
section of TestAll.java so that it does.
- Follow the Java Code Conventions.
In general, your code should be good quality and should follow the original
Java Code Conventions:
put .class files in the proper place in the directory structure,
give .class files an appropriate name,
include proper JavaDoc comments, include //comments at the start of each
paragraph of code, indent with 4 spaces (not tab), avoid lines >80 characters, etc.
(Yes, my code is far from perfect in this regard. I continually work to make it better.)
- Use descriptive class, method and variable names.
That makes the code easier for others to read.
- Avoid fancy code.
In the long run, you or other people will have to figure out the code
in order to maintain it. So please use simple coding methods that are
thus easier for others (including you in the future) to figure out.
Obviously, if there is a real advantage to using some fancy Java programming feature,
use it, but extensively document what you did, why, and how it works.
- Work with Bob before you start.
If you hope to get your code changes pulled into ERDDAP,
I definitely want to talk about what you're going to do and how you're
going to do it before you make any changes to the code.
That way, we can avoid you making changes that I, in the end, don't accept.
When you're doing the work, I'm willing to answer questions to help you figure out the
existing code and (overall) how to tackle your project.
- Work independently (as much as possible) after you start.
In contrast to the above "Work with Bob", after you get started on the
project, I encourage you to work as independently as possible.
If I have to tell you almost everything and answer lots of questions
(especially ones that you could have answered by reading the documentation or the code),
then your efforts aren't a time savings for me and
I might as well do the work myself.
It's the
Mythical Man-Month problem.
Of course, we should still communicate. It would be great to periodically see
your work in progress to make sure the project is on track.
But the more you can work independently
(after we agree on the task at hand and the general approach),
the better.
- Avoid bugs.
If a bug isn't caught before a release, it causes problems for users (at best),
returns the wrong information (at worst), is a blot on ERDDAP's reputation,
and will persist on out-of-date ERDDAP installations for years.
Work very hard to avoid bugs. Part of this is writing clean code (so it is easier to
see problems). Part of this is writing unit tests. Part of this is
a constant attitude of bug avoidance when you write code.
Don't make me regret adding your code to ERDDAP.
- Write a unit test or tests.
At the bottom of most classes in ERDDAP
is a test() method that calls all of the individual test methods for that class.
Please write at least one individual test method that thoroughly tests the
code you write and add it to the class' test() method so that it is run automatically.
Unit (and related) tests are one of the best ways to catch bugs, initially,
and in the long run (as other things change in ERDDAP).
As I've said, "Unit tests are what lets me sleep at night."
- Make it easy for me to understand and accept the changes in your pull request.
Part of that is writing a unit test method(s).
Part of that is limiting your changes to one section of code (or one class) if possible.
I won't accept any pull request with hundreds of changes throughout the code.
I tell the IT security people that I take responsibility for the security
and integrity of the code.
If there are too many changes or they are too hard to figure out,
then it's just too hard to verify the changes are correct and don't introduce
bugs or security issues.
- Keep it simple.
A good overall theme for your code is: Keep it simple.
Simple code is easy for others (including you in the future) to read and maintain.
It's easy for me to understand and thus accept.
- Assume long term responsibility for your code.
In the long run, it is best if you assume ongoing responsibility for
maintaining your code and answering questions about it (e.g., in the ERDDAP Google Group).
As some authors note, code is a liability as well as an asset.
If a bug is discovered in the future, it's best if you fix it because no one
knows your code better than you
(also so that there is an incentive to avoid bugs in the first place).
I'm not asking for a firm commitment to provide ongoing maintenance.
I'm just saying that doing the maintenance will be greatly appreciated.
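As an illustration of the unit test guideline above, here is a minimal sketch of a class with one individual test method and a test() method at the bottom that calls it. The class UnitsSketch and all of its methods are hypothetical examples, not ERDDAP code. The sketch also tries to follow the conventions listed above (JavaDoc comments, a //comment at the start of each paragraph of code, 4-space indents).

```java
// Hypothetical class for illustration; the names are not taken from ERDDAP.
class UnitsSketch {

    /**
     * Converts a temperature from degrees Celsius to degrees Fahrenheit.
     *
     * @param celsius the temperature in degrees Celsius
     * @return the temperature in degrees Fahrenheit
     */
    static double celsiusToFahrenheit(double celsius) {
        return celsius * 9.0 / 5.0 + 32.0;
    }

    /**
     * Throws a RuntimeException if observed differs from expected.
     *
     * @param observed the value the code produced
     * @param expected the value the code should have produced
     */
    static void ensureEqual(double observed, double expected) {
        if (Math.abs(observed - expected) > 1e-9) {
            throw new RuntimeException(
                "observed=" + observed + " expected=" + expected);
        }
    }

    /** Tests celsiusToFahrenheit with a few known values. */
    static void testCelsiusToFahrenheit() {
        //freezing and boiling points of water
        ensureEqual(celsiusToFahrenheit(0.0), 32.0);
        ensureEqual(celsiusToFahrenheit(100.0), 212.0);

        //the one temperature where both scales agree
        ensureEqual(celsiusToFahrenheit(-40.0), -40.0);
    }

    /** Runs all of the individual test methods for this class. */
    static void test() {
        testCelsiusToFahrenheit();
    }

    public static void main(String[] args) {
        test();
        System.out.println("UnitsSketch.test() finished without errors.");
    }
}
```

When a new method is added to the class, its test method is added to test() so that it runs automatically with the others.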
The
List of Changes
for each ERDDAP release is now on a separate web page.
ERDDAP is a product of the
NOAA
NMFS
SWFSC
ERD.
Bob Simons is the original and still the main author of ERDDAP
(the designer and software developer who wrote the ERDDAP-specific code).
The starting point was Roy Mendelssohn's (Bob's boss) suggestion
that Bob turn his ConvertTable program (a small utility which converts tabular data
from one format to another and which was largely code from Bob's
pre-NOAA work that Bob re-licensed to be open source) into a web service.
It was and is Roy Mendelssohn's ideas about distributed data systems,
his initial suggestion to Bob, and his ongoing support
(including hardware, network, and other software support,
and by freeing up Bob's time so he could spend more time on the ERDDAP code)
that have made this project possible and enabled its growth.
The ERDDAP-specific code is licensed as copyrighted open source, with
NOAA
holding the copyright. See the ERDDAP license.
ERDDAP uses copyrighted open source, Apache, LGPL, MIT/X, Mozilla, and public domain libraries and data.
ERDDAP does not require any GPL code or commercial programs.
The bulk of the funding for work on ERDDAP has come from NOAA,
in that it paid Bob Simons' salary.
For the first year of ERDDAP, when he was a government contractor,
funding came from the
NOAA CoastWatch program,
the NOAA IOOS
program, and the now defunct Pacific Ocean Shelf Tracking (POST) program.
Much credit goes to the many ERDDAP administrators and users who have made
suggestions and comments which have led to many improvements in ERDDAP.
Many are mentioned by name in the
List of Changes.
Thank you all (named and unnamed) very much.
Thus, ERDDAP is a great example of
User-Driven Innovation,
where product innovation often comes from consumers (ERDDAP users), not just the producers (ERDDAP developers).
Here is the list of software and datasets that are in the ERDDAP distribution.
We are very grateful for all of these.
Thank you very much.
[Starting in 2021, it has become almost impossible to properly list all of the
sources of code for ERDDAP because a few of the libraries we use (notably
netcdf-java and especially AWS) in turn use many, many other libraries.
All of the libraries that ERDDAP code calls directly are included below,
as are many of the libraries that the other libraries call in turn.
If you see that we have omitted a project below, please let us know so we can
add the project below and give credit where credit is due.]
- Overview
ERDDAP is a
Java Servlet
program. At ERD, it runs inside of a
Tomcat application server
(license: Apache),
with an
Apache web server
(license: Apache),
running on a computer using the
Red Hat Linux operating
system (license: GPL).
- Datasets
The datasets are from various sources. See the metadata (in particular the "sourceUrl", "infoUrl",
"institution", and "license") for each dataset.
Many datasets have a restriction on their use that requires you to cite/credit the
data provider whenever you use the data. It is always good form to cite/credit the
data provider. See
How to Cite a Dataset in a Paper.
- CoHort Software
The com/cohort classes are from CoHort Software
(https://www.cohortsoftware.com)
which makes these classes available with an MIT/X-like license (see classes/com/cohort/util/LICENSE.txt).
- CoastWatch Browser
ERDDAP uses code from the
CoastWatch Browser
project (now decommissioned) from the
NOAA CoastWatch
West Coast Regional Node
(license: copyrighted open source).
That project was initiated and managed by Dave Foley, a former Coordinator of the NOAA CoastWatch
West Coast Regional Node.
All of the CoastWatch Browser code was written by Bob Simons.
- OPeNDAP
Data from OPeNDAP
servers are read
with Java DAP 1.1.7
(license: LGPL).
- NetCDF-java
NetCDF files (.nc), GMT-style NetCDF files (.grd), GRIB, and BUFR are read and written with code in the
NetCDF Java Library
(license: BSD-3)
from Unidata.
Software Included in the NetCDF Java .jar:
- slf4j
The NetCDF Java Library and Cassandra need slf4j from the Simple Logging Facade for Java project.
Currently, ERDDAP uses the slf4j-simple-xxx.jar renamed as slf4j.jar to meet this need.
(license: MIT/X).
- JDOM
The NetCDF Java .jar includes XML processing code from
JDOM
(license: Apache),
which is included in the netcdfAll.jar.
- Joda
The NetCDF Java .jar includes Joda
for calendar calculations (which are probably not used by ERDDAP).
(license: Apache 2.0).
- Apache
The NetCDF Java .jar includes .jar files from several
Apache projects:
commons-codec,
commons-discovery,
commons-httpclient,
commons-logging, and
HttpComponents
(for all: license: Apache).
These are included in the netcdfAll.jar.
- Other
The NetCDF Java .jar also includes code from:
com.google.code.findbugs,
com.google.errorprone,
com.google.guava,
com.google.j2objc,
com.google.protobuf,
edu.ucar,
org.codehaus.mojo,
com.beust.jcommander,
com.google.common,
com.google.re2j, and
com.google.thirdparty.
(Google uses Apache and BSD-like licenses.)
- SGT
The graphs and maps are created on-the-fly with a modified version of NOAA's
SGT (was at https://www.pmel.noaa.gov/epic/java/sgt/, now discontinued) version 3
(a Java-based Scientific Graphics Toolkit written by Donald Denbo at
NOAA PMEL)
(license: copyrighted open source (was at https://www.pmel.noaa.gov/epic/java/license.html)).
- Walter Zorn
Big, HTML tooltips on ERDDAP's HTML pages are created with Walter Zorn's wz_tooltip.js
(license: LGPL).
Sliders and the drag and drop feature of the Slide Sorter are created with Walter Zorn's
wz_dragdrop.js
(license: LGPL).
- iText
The .pdf files are created with
iText
(version 1.3.1, which used the
Mozilla license),
a free Java-PDF library by Bruno Lowagie and Paulo Soares.
- GSHHS
The shoreline and lake data are from
GSHHS
-- A Global Self-consistent, Hierarchical,
High-resolution Shoreline Database
(license: GPL)
and created by Paul Wessel and Walter Smith.
WE MAKE NO CLAIM ABOUT THE CORRECTNESS OF THE SHORELINE DATA THAT COMES WITH ERDDAP --
DO NOT USE IT FOR NAVIGATIONAL PURPOSES.
- GMT pscoast
The political boundary and river data are from the
pscoast
program in
GMT,
which uses data from the
CIA World Data Bank II
(license: public domain).
WE MAKE NO CLAIM ABOUT THE CORRECTNESS OF THE POLITICAL BOUNDARY
DATA THAT COMES WITH ERDDAP.
- ETOPO
The bathymetry/topography data used in the background of some maps is the
ETOPO1 Global 1-Minute Gridded Elevation Data Set
(Ice Surface, grid registered, binary,
2 byte int: etopo1_ice_g_i2.zip)
(license:
public domain),
which is distributed for free by
NOAA NGDC.
WE MAKE NO CLAIM ABOUT THE CORRECTNESS OF THE BATHYMETRY/TOPOGRAPHY
DATA THAT COMES WITH ERDDAP. DO NOT USE IT FOR NAVIGATIONAL PURPOSES.
- JavaMail
Emails are sent using code in mail.jar from Oracle's
JavaMail API
(license:
COMMON DEVELOPMENT AND DISTRIBUTION LICENSE (CDDL) Version 1.1).
- JSON
ERDDAP uses
json.org's Java-based JSON library
to parse JSON data
(license: copyrighted open source).
- PostgreSQL
ERDDAP includes the
PostgreSQL JDBC driver
(license: BSD).
The driver is Copyright (c) 1997-2010, PostgreSQL Global Development Group. All rights reserved.
- Lucene
ERDDAP uses code from Apache
Lucene
(license: Apache)
for the "lucene" search engine option (but not for the default "original" search engine).
- commons-compress
ERDDAP uses code from Apache
commons-compress
(license: Apache).
- JEXL
ERDDAP's support for evaluating expressions and scripts in <sourceName>s relies on the
Apache project's
Java Expression Language (JEXL)
(license: Apache).
- Cassandra
ERDDAP includes Apache
Cassandra's
cassandra-driver-core.jar
(license: Apache 2.0).
Cassandra's cassandra-driver-core.jar requires (and so ERDDAP includes):
- KT_ palettes
The color palettes which have the prefix "KT_" are a
collection of .cpt palettes by Kristen Thyng
(license:
MIT/X),
but slightly reformatted by Jennifer Sevadjian of NOAA so that they conform
to ERDDAP's .cpt requirements.
- Leaflet
ERDDAP uses the JavaScript library Leaflet
(license:
BSD 2)
as the WMS client on WMS web pages in ERDDAP.
It is excellent software (well designed, easy to use, fast, and free) from Vladimir Agafonkin.
- AWS
For working with Amazon AWS (including S3), ERDDAP uses v2 of the
AWS SDK for Java
(license:
Apache).
AWS requires Maven to pull in the dependencies. They include the following
.jar files (where xxx is the version number, which changes over time,
and the license type is in parentheses):
annotations-xxx.jar (Apache),
apache-client-xxx.jar (Apache),
ams-xxx.jar (BSD),
asm-xxx.jar (BSD),
asm-analysis-xxx.jar (BSD),
asm-commons-xxx.jar (BSD),
asm-tree-xxx.jar (BSD),
asm-util-xxx.jar (BSD),
auth-xxx.jar (?),
aws-core-xxx.jar (Apache),
aws-query-protocol-xxx.jar (Apache),
aws-xml-protocol-xxx.jar (Apache),
checker-qual-xxx.jar (MIT),
error_prone_annotations-xxx.jar (Apache),
eventstream-xxx.jar (Apache),
failureaccess-xxx.jar (Apache),
httpcore-xxx.jar (Apache),
j2objc-annotations-xxx.jar (Apache),
jackson-annotations-xxx.jar (Apache),
jackson-core-xxx.jar (Apache),
jackson-databind-xxx.jar (Apache),
jaxen-xxx.jar (BSD),
jffi-xxx.jar (Apache),
jffi-xxx.native.jar (Apache),
jnr-constants-xxx.jar (Apache),
jnr-ffi-xxx.jar (Apache),
jnr-posix-xxx.jar (Apache),
jnr-x86asm-xxx.jar (Apache),
json-xxx.jar (Copyrighted open source),
jsr305-xxx.jar (Apache),
listenablefuture-xxx.jar (Apache),
about a dozen netty .jar's (Apache),
profiles-xxx.jar (Apache),
protocol-core-xxx.jar (Apache),
reactive-streams-xxx.jar (CC0 1.0),
regions-xxx.jar (Apache),
s3-xxx.jar (Apache),
sdk-core-xxx.jar (Apache),
utils-xxx.jar (?).
To see the actual licenses, search for the .jar name in the
Maven Repository
and then rummage around in the project's files to find the license.
- MergeIR
EDDGridFromMergeIRFiles.java was written and contributed by
Jonathan Lafite and Philippe Makowski of R.Tech Engineering
(license: copyrighted open source). Thank you, Jonathan and Philippe!
- TableWriterDataTable
.dataTable (TableWriterDataTable.java) was written and contributed by Roland Schweitzer of NOAA
(license: copyrighted open source). Thank you, Roland!
- json-ld
The initial version of the
Semantic Markup of Datasets with json-ld (JSON Linked Data)
feature (and thus all of the hard work in designing the content) was
written and contributed (license: copyrighted open source)
by Adam Leadbetter and Rob Fuller of the Marine Institute in Ireland.
Thank you, Adam and Rob!
- orderBy
The code for the
orderByMean filter
in tabledap and the extensive changes to the code to
support the
variableName/divisor:offset notation
for all orderBy filters
were written and contributed (license: copyrighted open source)
by Rob Fuller and Adam Leadbetter of the Marine Institute in Ireland.
Thank you, Rob and Adam!
- Borderless Marker Types
The code for three new marker types (Borderless Filled Square, Borderless Filled Circle,
Borderless Filled Up Triangle) was contributed by Marco Alba of ETT / EMODnet Physics.
Thank you, Marco Alba!
- Translations of messages.xml
The initial version of the code in TranslateMessages.java which uses
Google's translation service to translate messages.xml into various languages
was written by Qi Zeng, who was working as a Google Summer of Code intern.
Thank you, Qi!
- orderBySum
The code for the
orderBySum filter
in tabledap (based on Rob Fuller and Adam Leadbetter's orderByMean) and
the Check All and Uncheck All buttons on the EDDGrid Data Access Form
were written and contributed (license: copyrighted open source)
by Marco Alba of ETT Solutions and EMODnet.
Thank you, Marco!
- Out-of-range .transparentPng Requests
ERDDAP now accepts requests for .transparentPng's when
the latitude and/or longitude values are partly or fully out-of-range.
(This was ERDDAP GitHub Issue #19, posted by Rob Fuller. Thanks for posting that, Rob.)
The code to fix this was written by Chris John.
Thank you, Chris!
- Display base64 image data in .htmlTable responses
The code for displaying base64 image data in .htmlTable responses
was contributed by Marco Alba of ETT / EMODnet Physics.
Thank you, Marco Alba!
- nThreads Improvement
The nThreads system for EDDTableFromFiles was significantly improved.
These changes lead to a huge speed improvement (e.g., 2X speedup when nThreads is set to 2 or more)
for the most challenging requests (when a large number of files must be processed to gather the results).
These changes will also lead to a general speedup throughout ERDDAP.
The code for these changes was contributed by Chris John. Thank you, Chris!
We are also very grateful for all of the software and websites that we use when developing ERDDAP,
including
Chrome,
curl,
DuckDuckGo,
EditPlus,
FileZilla,
GitHub,
Google Search,
PuTTY,
Stack Overflow,
Todoist,
Wikipedia,
the Internet, the World Wide Web, and all the other, great, helpful websites.
Thank you very much.
The ERDDAP-specific code is licensed as copyrighted open source,
with
NOAA holding the copyright.
The license is:
ERDDAP, Copyright 2022, NOAA.
PERMISSION TO USE, COPY, MODIFY, AND DISTRIBUTE THIS SOFTWARE AND
ITS DOCUMENTATION FOR ANY PURPOSE AND WITHOUT FEE IS HEREBY GRANTED,
PROVIDED THAT THE ABOVE COPYRIGHT NOTICE APPEAR IN ALL COPIES, THAT
BOTH THE COPYRIGHT NOTICE AND THIS PERMISSION NOTICE APPEAR IN
SUPPORTING DOCUMENTATION, AND THAT REDISTRIBUTIONS OF MODIFIED FORMS
OF THE SOURCE OR BINARY CODE CARRY PROMINENT NOTICES STATING THAT THE
ORIGINAL CODE WAS CHANGED AND THE DATE OF THE CHANGE. THIS SOFTWARE
IS PROVIDED "AS IS" WITHOUT EXPRESS OR IMPLIED WARRANTY.
Questions, comments, suggestions? Please send an email to
erd dot data at noaa dot gov
and include the ERDDAP URL directly related to your question or comment.
Or,
you can join the ERDDAP Google Group / Mailing List by visiting
https://groups.google.com/forum/#!forum/erddap
and clicking on "Apply for membership".
Once you are a member, you can post your question there
or search to see if the question has already been asked and answered.
ERDDAP, Version 2.22