Set Up Your Own ERDDAP

ERDDAP is a Free and Open Source, all-Java (servlet) web application that runs in a web application server (for example, Tomcat (recommended), or Jetty (it works, but we don't support it)). This web page is mostly for people ("ERDDAP administrators") who want to set up their own ERDDAP installation at their own website.

Why use ERDDAP to distribute your data?

Because the small effort to set up ERDDAP brings many benefits.

If you already have a web service for distributing your data,
you can set up ERDDAP to access your data via the existing service.
Or, you can set up ERDDAP to access your data directly from local files.
For each dataset, you only have to write a small chunk of XML to tell ERDDAP how to access the dataset.
Once you have ERDDAP serving your data, end users can:
- Request the data in various ways (DAP, WMS, and more in the future).
- Get the data response in various file formats. (That's probably the biggest reason!)
- Make graphs and maps. (Everyone likes pretty pictures.)
- Build other useful and interesting things on top of ERDDAP's web services -- see the Awesome ERDDAP list of awesome ERDDAP-related projects.

You can customize your ERDDAP's appearance so ERDDAP reflects your organization and fits in with the rest of your website.

ERDDAP has been installed by approximately 100 organizations in at least 17 countries (Australia, Belgium, Canada, China, France, India, Ireland, Italy, New Zealand, Russia, South Africa, Spain, Sri Lanka, Sweden, Thailand, UK, USA), including:

APDRC (Asia-Pacific Data-Research Center, International Pacific Research Center) at the University of Hawaii (UH)
BCO-DMO at WHOI (Biological and Chemical Oceanography Data Management Office at Woods Hole Oceanographic Institution)
CanWIN ERDDAP (Canadian Watershed Information Network) at the Centre for Earth Observation Science (CEOS), University of Manitoba
CDIP (Coastal Data Information Program at UCSD)
CNR-ISP (National Research Council of Italy, Institute of Polar Sciences)
CSIRO and IMOS (Australia's Commonwealth Scientific and Industrial Research Organisation and the Integrated Marine Observing System)
DIVER (NOAA ORR) (NOAA Office of Response and Restoration)
EMODnet Physics (The European Marine Observation and Data Network -- Physics)
GoMRI (Gulf of Mexico Research Initiative)
Hakai Institute (The Hakai Institute on the Central Coast of British Columbia, Canada)
High School Technology Services, which offers coding and technology training for students and adults
ICHEC (Irish Centre for High-End Computing)
INCOIS (Indian National Centre for Ocean Information Services)
IRD (Institut de Recherche pour le Développement, France)
CNRS (Centre National de la Recherche Scientifique, France)
UPMC (Université Pierre et Marie CURIE, Paris, France)
UCAD (Université Cheikh Anta Diop de Dakar, Sénégal)
UGB (Université Gaston Berger -- Saint-Louis du Sénégal)
UFHB (Université Félix HOUPHOUËT-BOIGNY, Abidjan, Côte d'Ivoire)
IPSL (Institut Pierre Simon Laplace des sciences de l'environnement, Paris, France)
LMI ECLAIRS (Laboratoire Mixte International «Etude du Climat en Afrique de l’Ouest et de ses Interactions avec l’Environnement Régional, et appui aux services climatiques»)
JRC (European Commission - Joint Research Centre, European Union)
The Marine Institute (Ireland)
Marine Instruments S.A. (Spain)
NCI (Australia's National Computational Infrastructure)
NOAA CoastWatch (central)
NOAA CoastWatch CGOM (Caribbean/Gulf of Mexico Node)
NOAA CoastWatch GLERL (Great Lakes Node)
NOAA CoastWatch West Coast which is co-located with and works with
NOAA ERD (Environmental Research Division of SWFSC of NMFS)
NOAA IOOS Sensors (Integrated Ocean Observing System)
NOAA IOOS CeNCOOS (Central and Northern California Ocean Observing System, run by Axiom Data Science)
NOAA IOOS GCOOS Atmospheric and Oceanographic Data: Observing System
NOAA IOOS GCOOS Atmospheric and Oceanographic Data: Historical Collections
NOAA IOOS GCOOS Biological and Socioeconomics (Gulf Coast Ocean Observing System)
NOAA IOOS NERACOOS (Northeastern Regional Association of Coastal and Ocean Observing Systems)
NOAA IOOS NGDAC (National Glider Data Assembly Center)
NOAA IOOS NANOOS (Northwest Association of Networked Ocean Observing Systems)
NOAA IOOS PacIOOS (Pacific Islands Ocean Observing System) at the University of Hawaii (UH)
NOAA IOOS SCCOOS (Southern California Coastal Ocean Observing System)
NOAA IOOS SECOORA (Southeast Coastal Ocean Observing Regional Association)
NOAA NCEI (National Centers for Environmental Information)
NOAA NGDC STP (National Geophysical Data Center, Solar -- Terrestrial Physics)
NOAA NMFS NEFSC (Northeast Fisheries Science Center)
NOAA NOS CO-OPS (Center for Operational Oceanographic Products and Services)
NOAA OSMC (Observing System Monitoring Center)
NOAA PIFSC (Pacific Islands Fisheries Science Center)
NOAA PolarWatch
NOAA UAF (Unified Access Framework)
Ocean Networks Canada
Ocean Tracking Network
OOI / All Data (Ocean Observatories Initiative)
OOI / Uncabled Data
Princeton, Hydrometeorology Research Group
R.Tech Engineering, France
Rutgers University, Department of Marine and Coastal Sciences
San Francisco Estuary Institute
Scripps Institution of Oceanography, Spray Underwater Gliders
Smart Atlantic Memorial University of Newfoundland
South African Environmental Observation Network
Spyglass Technologies
Stanford University, Hopkins Marine Station
UNESCO IODE (International Oceanographic and Information Data Exchange)
University of British Columbia, Earth, Ocean & Atmospheric Sciences Department
University of California at Davis, Bodega Marine Laboratory
University of Delaware, Satellite Receiving Station
University of Washington, Applied Physics Laboratory
USGS CMGP (Coastal and Marine Geology Program)
VOTO (Voice Of The Ocean, Sweden)

This is a list of just some of the organizations where ERDDAP has been installed by some individual or some group. It does not imply that the individual, the group, or the organization recommends or endorses ERDDAP.

ERDDAP is recommended within NOAA and CNRS
NOAA's Data Access Procedural Directive includes ERDDAP in its list of recommended data servers for use by groups within NOAA. ERDDAP is favorably mentioned in section 4.2.3 of the Guide de bonnes pratiques sur la gestion des données de la recherche (Research Data Management Best Practices Guide) of the Centre National de la Recherche Scientifique (CNRS) in France.

Is the installation procedure hard? Can I do it?

The initial installation takes some time, but it isn't very hard. You can do it. If you get stuck, email me at erd dot data at noaa dot gov . I will help you.
Or, you can join the ERDDAP Google Group / Mailing List and post your question there.

How To Do the Initial Setup of ERDDAP on Your Server

ERDDAP can run on any server that supports Java and Tomcat (and other application servers like Jetty, but we don't support them). ERDDAP has been tested on Linux (including on Amazon's AWS), Mac, and Windows computers.

Amazon -- If you are installing ERDDAP on an Amazon Web Services EC2 instance, see this Amazon Web Services Overview (below) first.
Docker -- Axiom now offers ERDDAP in a Docker container and IOOS now offers a Quick Start Guide for ERDDAP in a Docker Container.
It's the standard ERDDAP installation, but Axiom has put it in a Docker container.
If you already use Docker, you will probably prefer the Docker version.
If you don't already use Docker, we generally don't recommend this.
If you choose to install ERDDAP via Docker, we don't offer any support for the installation process.
We haven't worked with Docker yet. If you work with this, please send us your comments.
Linux and Macs -- ERDDAP works great on Linux and Mac computers. See the instructions below.
Windows -- Windows is fine for testing ERDDAP and for personal use (see the instructions below), but we don't recommend using it for public ERDDAPs. Running ERDDAP on Windows may have problems: notably, ERDDAP may be unable to delete and/or rename files quickly. This is probably due to antivirus software (e.g., from McAfee and Norton) which is checking the files for viruses. If you run into this problem (which can be seen by error messages in the log.txt file like "Unable to delete ..."), changing the antivirus software's settings may partially alleviate the problem. Or consider using a Linux or Mac server instead.

The standard ERDDAP installation instructions for Linux, Macs, and Windows computers are:

For ERDDAP v2.19+, set up Java 17.
For security reasons, it is almost always best to use the latest version of Java 17.
Please download and install the latest version of
Adoptium's OpenJDK (Temurin) 17 (LTS). To verify the installation, type "/javaBinDirectory/java -version", for example
/usr/local/jdk-17.0.4+8/bin/java -version
[For ERDDAP versions before v2.19, use Java 8.]
ERDDAP works with Java from other sources, but we recommend Adoptium because it is the main, community-supported, free (as in beer and speech) version of Java 17 that offers Long Term Support (free upgrades for many years past the initial release). For security reasons, please update your ERDDAP's version of Java periodically as new versions of Java 17 become available from Adoptium.
ERDDAP has been tested and used extensively with Java 17, not other versions. For various reasons, we don't test with nor support other versions of Java.
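As a rough sketch of the verification step above, the following shell function extracts the major version number from the first line of `java -version` output, so a script can confirm that Java 17 is the version in use. The function name `java_major_version` and the sample banner text are illustrative assumptions; they are not part of ERDDAP or Java.

```shell
# Hypothetical helper: print the major Java version found in a
# 'java -version' banner line such as: openjdk version "17.0.4" 2022-07-19
java_major_version() {
  # Pull out the quoted version string, e.g. 17.0.4 or 1.8.0_345
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  case "$ver" in
    1.*) printf '%s\n' "$ver" | cut -d. -f2 ;;   # legacy scheme: 1.8.0_345 -> 8
    *)   printf '%s\n' "$ver" | cut -d. -f1 ;;   # modern scheme: 17.0.4 -> 17
  esac
}

# Example use (assumes java is on PATH; substitute your own JDK path).
# Note that java -version writes to stderr, hence the 2>&1.
# banner=$(java -version 2>&1 | head -n 1)
# [ "$(java_major_version "$banner")" = "17" ] || echo "ERDDAP v2.19+ expects Java 17"
```

A check like this could be run as the same user that will run Tomcat, since a different user may have a different Java on its path.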
Set up Tomcat.
Tomcat is the most widely used Java Application Server, which is Java software that stands between the operating system's network services and Java server software like ERDDAP. It is Free and Open Source Software (FOSS).
You can use another Java Application Server (e.g., Jetty), but we only test with and support Tomcat.
- Download Tomcat and unpack it on your server or PC.
  For security reasons, it is almost always best to use the latest version of Tomcat 10 (versions 9 and below are not acceptable), which is designed to work with Java 17. Below, the Tomcat directory will be referred to as tomcat.
  Warning! If you already have a Tomcat running some other web application (especially THREDDS), we recommend that you install ERDDAP in a second Tomcat, because ERDDAP needs different Tomcat settings and shouldn't have to contend with other applications for memory.
  - On Linux, download the "Core" "tar.gz" Tomcat distribution and unpack it. We recommend unpacking it in /usr/local.
  - On a Mac, Tomcat is probably already installed in /Library/Tomcat, but you should update it to the latest version of Tomcat 10.
    If you download it, download the "Core" "tar.gz" Tomcat distribution and unpack it in /Library/Tomcat.
  - On Windows, you can download the "Core" "zip" Tomcat distribution (which doesn't mess with the Windows registry and which you control from a DOS command line) and unpack it in an appropriate directory. (For development, we use the "Core" "zip" distribution. We make a /programs directory and unpack it there.) Or you can download the "Core" "64-bit Windows zip" distribution, which includes more features. If the distribution is a Windows installer, it will probably put Tomcat in, for example, /Program Files/apache-tomcat-10.0.23 .
- server.xml - In the tomcat/conf/server.xml file, there are two changes that you should make to each of the two <Connector> tags (one for '<Connector port="8080" ' and one for '<Connector port="8443" '):
  1. (Recommended) Increase the connectionTimeout parameter value, perhaps to 300000 (milliseconds), which is 5 minutes.
  2. (Recommended) Add a new parameter: relaxedQueryChars="[]|". This is optional and slightly less secure, but removes the need for users to percent-encode these characters when they occur in the parameters of a user's request URL.
- context.xml -- Resources Cache - In tomcat/conf/context.xml, right before the </Context> tag, change the Resources tag (or add it if it isn't already there) to set the cacheMaxSize parameter to 80000:
  <Resources cachingAllowed="true" cacheMaxSize="80000" />
  This avoids numerous warnings in catalina.out that all start with
  "WARNING [main] org.apache.catalina.webresources.Cache.getResource Unable to add the resource at [/WEB-INF/classes/..."
- On Linux computers, change the Apache timeout settings so that time-consuming user requests don't time out (with what often appears as a "Proxy" or "Bad Gateway" error). As the root user:
  1. Modify the Apache httpd.conf file (usually in /etc/httpd/conf/ ):
    Change the existing Timeout directive (or add one at the end of the file) to 3600 (seconds), instead of the default 60 or 120 seconds.
    Change the existing ProxyTimeout directive (or add one at the end of the file) to 3600 (seconds), instead of the default 60 or 120 seconds.
  2. Restart Apache: /usr/sbin/apachectl -k graceful (but sometimes it is in a different directory).
- Security recommendation: See these instructions to increase the security of your Tomcat installation, especially for public servers.
- For public ERDDAP installations on Linux and Macs, it is best to set up Tomcat (the program) as belonging to user "tomcat" (a separate user with limited permissions and no password). Thus, only the super user can switch to acting as user tomcat. This makes it impossible for hackers to log in to your server as user tomcat. In any case, you should give the tomcat user very limited permissions on the server's file system (read+write+execute privileges for the apache-tomcat directory tree and <bigParentDirectory>, and read-only privileges for directories with data that ERDDAP needs access to).
  - You can create the tomcat user account (which has no password) by using the command
    sudo useradd tomcat -s /bin/bash -p '*'
  - You can switch to working as user tomcat by using the command
    sudo su - tomcat
    (It will ask you for the superuser password for permission to do this.)
  - You can stop working as user tomcat by using the command
    exit
  - Do most of the rest of the Tomcat and ERDDAP setup instructions as user "tomcat". Later, run the startup.sh and shutdown.sh scripts as user "tomcat" so that Tomcat has permission to write to its log files.
  - After unpacking Tomcat, from the parent of the apache-tomcat directory:
    - Change ownership of the apache-tomcat directory tree to the tomcat user.
      chown -R tomcat apache-tomcat-10.0.23
      (but substitute the actual name of your tomcat directory).
    - Change the "group" to be tomcat, your username, or the name of a small group that includes tomcat and all the administrators of Tomcat/ERDDAP, e.g.,
      chgrp -R yourUserName apache-tomcat-10.0.23
    - Change permissions so that tomcat and the group have read, write, execute privileges, e.g.,
      chmod -R ug+rwx apache-tomcat-10.0.23
    - Remove "other" user's permissions to read, write, or execute:
      chmod -R o-rwx apache-tomcat-10.0.23
      This is important, because it prevents other users from reading possibly sensitive information in ERDDAP setup files.
- Set Tomcat's Environment Variables
  On Linux and Macs:
  Create a file tomcat/bin/setenv.sh (or in Red Hat Enterprise Linux [RHEL], edit ~tomcat/conf/tomcat10.conf) to set Tomcat's environment variables. This file will be used by tomcat/bin/startup.sh and shutdown.sh. The file should contain something like:
    export JAVA_HOME=/usr/local/jdk-17.0.4+8
    export JAVA_OPTS='-server -Djava.awt.headless=true -Xmx1500M -Xms1500M'
    export TOMCAT_HOME=/usr/local/apache-tomcat-10.0.23
    export CATALINA_HOME=/usr/local/apache-tomcat-10.0.23
  (but substitute the directory names from your computer).
  (If you previously set JRE_HOME, you can remove that.)
  On Macs, you probably don't need to set JAVA_HOME.
  On Windows:
  Create a file tomcat\bin\setenv.bat to set Tomcat's environment variables. This file will be used by tomcat\bin\startup.bat and shutdown.bat. The file should contain something like:
    SET "JAVA_HOME=\someDirectory\jdk-17.0.4+8"
    SET "JAVA_OPTS=-server -Xmx1500M -Xms1500M"
    SET "TOMCAT_HOME=\Program Files\apache-tomcat-10.0.23"
    SET "CATALINA_HOME=\Program Files\apache-tomcat-10.0.23"
  (but substitute the directory names from your computer).
  If this is just for local testing, remove "-server".
  (If you previously set JRE_HOME, you can remove that.)
  The -Xmx and -Xms memory settings are important because ERDDAP works better with more memory. Always set -Xms to the same value as -Xmx.
  - For 32 bit Operating Systems and 32 bit Java:
    64 bit Java is much better than 32 bit Java, but 32 bit Java will work as long as the server isn't really busy. The more physical memory in the server the better: 4+ GB is really good, 2 GB is okay, less is not recommended. With 32 bit Java, even with abundant physical memory, Tomcat and Java won't run if you try to set -Xmx much above 1500M (1200M on some computers). If your server has less than 2GB of memory, reduce the -Xmx value (in 'M'egaBytes) to 1/2 of the computer's physical memory.
  - For 64 bit Operating Systems and 64 bit Java:
    64 bit Java will only work on a 64 bit operating system.
    - With Java 8, you need to add -d64 to the Tomcat CATALINA_OPTS parameter in setenv.bat
    - With Java 17, you choose 64 bit Java when you download a version of Java marked "64 bit".
    With 64 bit Java, Tomcat and Java can use very high -Xmx and -Xms settings. The more physical memory in the server the better. As a simplistic suggestion: we recommend you set -Xmx and -Xms (in 'M'egaBytes) to 1/2 (or less) of the computer's physical memory. You can see if Tomcat, Java, and ERDDAP are indeed running in 64 bit mode by searching for " bit," in ERDDAP's Daily Report email or in the bigParentDirectory/logs/log.txt file (bigParentDirectory is specified in setup.xml).
  - In ERDDAP's log.txt file, you will see many "GC (Allocation Failure)" messages.
    This is usually not a problem. It is a frequent message from a normally operating Java saying that it just finished a minor garbage collection because it ran out of room in Eden (the section of the Java heap for very young objects). Usually the message shows you memoryUseBefore->memoryUseAfter. If those two numbers are close together, it means that the garbage collection wasn't productive. The message is only a sign of trouble if it is very frequent (every few seconds), not productive, and the numbers are large and not growing, which together indicate that Java needs more memory, is struggling to free up memory, and is unable to free up memory. This may happen during a stressful time, then go away. But if it persists, that is a sign of trouble.
  - If you see java.lang.OutOfMemoryError's in ERDDAP's log.txt file, see OutOfMemoryError for tips on how to diagnose and resolve the problems.
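The memory guidance above (set -Xms equal to -Xmx, at roughly half of physical memory on a 64 bit system) can be sketched as two small shell helpers for use in setenv.sh. The helper names `heap_mb_for` and `java_opts_for` are hypothetical, and reading /proc/meminfo is an assumption that only holds on Linux.

```shell
# Hypothetical helper: given physical memory in MB, print a heap size in MB
# following the "1/2 (or less) of physical memory" suggestion.
heap_mb_for() {
  echo $(( $1 / 2 ))
}

# Hypothetical helper: build a JAVA_OPTS string with -Xms equal to -Xmx.
java_opts_for() {
  heap=$(heap_mb_for "$1")
  echo "-server -Djava.awt.headless=true -Xmx${heap}M -Xms${heap}M"
}

# Example use in tomcat/bin/setenv.sh on Linux (assumes /proc/meminfo exists;
# MemTotal is reported in kB, so divide by 1024 to get MB):
# total_mb=$(awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo)
# export JAVA_OPTS="$(java_opts_for "$total_mb")"
```

With 32 bit Java, the ceiling of about 1500M described above still applies, so a computed value larger than that would need to be capped by hand.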
- On Linux and Macs, change the permissions of all *.sh files in tomcat/bin/ to be executable by the owner, e.g., with
  chmod +x *.sh
- Fonts for images: We strongly prefer the free DejaVu fonts to the other Java fonts. Using these fonts is strongly recommended but not required.
  If you choose not to use the DejaVu fonts, you need to change the fontFamily setting in setup.xml to <fontFamily>SansSerif</fontFamily>, which is available with all Java distributions. If you set fontFamily to the name of a font that isn't available, ERDDAP won't load and will print a list of available fonts in the log.txt file. You must use one of those fonts.
  If you choose to use the DejaVu fonts, please make sure the fontFamily setting in setup.xml is <fontFamily>DejaVu Sans</fontFamily>.
  To install the DejaVu fonts, please download DejaVuFonts.zip (5,522,795 bytes, MD5=33E1E61FAB06A547851ED308B4FFEF42) and unzip the font files to a temporary directory.
  - On Linux:
    - For Linux Adoptium Java distributions, see these instructions.
    - With other Java distributions: As the Tomcat user, copy the font files into JAVA_HOME/lib/fonts so Java can find the fonts. Remember: if/when you later upgrade to a newer version of Java, you need to reinstall these fonts.
  - On Macs: for each font file, double click on it and then click Install Font.
  - On Windows 7 and 10: in Windows Explorer, select all of the font files. Right click. Click on Install.
- Test your Tomcat installation.
  - Linux:
    - As user "tomcat", run tomcat/bin/startup.sh
    - View your URL + ":8080/" in your browser (e.g., http://coastwatch.pfeg.noaa.gov:8080/).
    - You should see the Tomcat "Congratulations" page.
      If there is trouble, see the Tomcat log file tomcat/logs/catalina.out.
  - Mac (run tomcat as the system administrator user):
    - Run tomcat/bin/startup.sh
    - View your URL + ":8080/" in your browser (e.g., http://coastwatch.pfeg.noaa.gov:8080/). Note that by default, your Tomcat is only accessible by you. It is not publicly accessible.
    - You should see the Tomcat "Congratulations" page.
      If there is trouble, see the Tomcat log file tomcat/logs/catalina.out.
  - Windows localhost:
    - Right click on the Tomcat icon in the system tray, and choose "Start service".
    - View http://127.0.0.1:8080/ (or perhaps http://localhost:8080/) in your browser. Note that by default, your Tomcat is only accessible by you. It is not publicly accessible.
    - You should see the Tomcat "Congratulations" page.
      If there is trouble, see the Tomcat log file tomcat/logs/catalina.out.
- Troubles with the Tomcat installation?
  - On Linux and Mac, if you can't reach Tomcat or ERDDAP (or perhaps you just can't reach them from a computer outside your firewall), you can test if Tomcat is listening to port 8080, by typing (as root) on a command line of the server:
    netstat -tuplen | grep 8080
    That should return one line with something like:
    tcp 0 0 :::8080 :::* LISTEN ## ##### ####/java
    (where '#' is some digit), indicating that a "java" process (presumably Tomcat) is listening on port "8080" for "tcp" traffic. If no lines were returned, if the line returned is significantly different, or if two or more lines were returned, then there may be a problem with the port settings.
  - See the Tomcat log file tomcat/logs/catalina.out. Tomcat problems and some ERDDAP startup problems are almost always indicated there. This is common when you are first setting up ERDDAP.
  - See the Tomcat website or search the web for help, but please let us know the problems you had and the solutions you found.
  - Email me at erd dot data at noaa dot gov . I will try to help you.
  - Or, you can join the ERDDAP Google Group / Mailing List and post your question there.
Set up the tomcat/content/erddap configuration files.
On Linux, Mac, and Windows, download erddapContent.zip (version 2.22, 19,810 bytes, MD5=1E26F62E7A06191EE6868C40B9A29362, dated 2022-12-08) and unzip it into tomcat, creating tomcat/content/erddap .
Other Directory: For Red Hat Enterprise Linux (RHEL) or for other situations where you aren't allowed to modify the Tomcat directory or where you want/need to put the ERDDAP content directory in some other location for some other reason (for example, if you use Jetty instead of Tomcat), unzip erddapContent.zip into the desired directory (to which only user=tomcat has access) and set the erddapContentDirectory system property (e.g., erddapContentDirectory=~tomcat/content/erddap) so ERDDAP can find this new content directory.
[Some previous versions are also available:
2.17 (19,792 bytes, MD5=8F892616BAEEF2DF0F4BB036DCB4AD7C, dated 2022-02-16)
2.18 (19,792 bytes, MD5=8F892616BAEEF2DF0F4BB036DCB4AD7C, dated 2022-02-16)
2.21 (19,810 bytes, MD5=1E26F62E7A06191EE6868C40B9A29362, dated 2022-10-09) ]
Then,
- Read the comments in tomcat/content/erddap/setup.xml and make the requested changes. setup.xml is the file with all of the settings which specify how your ERDDAP behaves.
  For the initial setup, you MUST at least change these settings: <bigParentDirectory>, <emailEverythingTo>, <baseUrl>, <email.*>, <admin.*> (and <baseHttpsUrl> when you set up https).
  When you create the bigParentDirectory, from the parent directory of bigParentDirectory:
  - Make user=tomcat the owner of the bigParentDirectory, e.g.,
    chown -R tomcat bigParentDirectory
  - Change the "group" to be tomcat, your username, or the name of a small group that includes tomcat and all the administrators of Tomcat/ERDDAP, e.g.,
    chgrp -R yourUserName bigParentDirectory
  - Change permissions so that tomcat and the group have read, write, execute privileges, e.g.,
    chmod -R ug+rwx bigParentDirectory
  - Remove "other" user's permissions to read, write, or execute:
    chmod -R o-rwx bigParentDirectory
    This is important, because it prevents other users from reading possibly sensitive information in ERDDAP log files and files with information about private datasets.
  Environment Variables - Starting with ERDDAP v2.13, ERDDAP administrators can override any value in setup.xml by specifying an environment variable named ERDDAP_valueName before running ERDDAP. For example, ERDDAP_baseUrl overrides the <baseUrl> value. This can be handy when deploying ERDDAP with a container, as you can put standard settings in setup.xml and then supply special settings via environment variables. If you supply secret information to ERDDAP via this method, be sure to check that the information will remain secret. ERDDAP only reads environment variables once per startup, in the first second of startup, so one way to use this is: set the environment variables, start ERDDAP, wait until ERDDAP is started, then unset the environment variables.
- Read the comments in Working with the datasets.xml File. Later, after you get ERDDAP running for the first time (usually with just the default datasets), you will modify the XML in tomcat/content/erddap/datasets.xml to specify all of the datasets you want your ERDDAP to serve. This is where you will spend the bulk of your time while setting up ERDDAP and later while maintaining your ERDDAP.
- (Unlikely) Now or (slightly more likely) in the future, if you want to modify ERDDAP's CSS file, make a copy of tomcat/content/erddap/images/erddapStart2.css called erddap2.css and then make changes to it. Changes to erddap2.css only take effect when ERDDAP is restarted and often also require the user to clear the browser's cached files.
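For the environment-variable override described under setup.xml above (set the variable, start ERDDAP, then unset it), a minimal sketch might look like the following. The function name `start_with_override`, its argument order, and the example URL and wait time are all assumptions made for illustration; only the ERDDAP_valueName naming pattern comes from the description above.

```shell
# Hypothetical wrapper: launch a startup script with one setup.xml value
# overridden, then remove the variable from the current shell.
#   $1 = setup.xml value name (e.g. baseUrl)   $2 = value to supply
#   $3 = path to tomcat/bin/startup.sh         $4 = optional seconds to wait
start_with_override() {
  name="ERDDAP_$1"          # e.g. baseUrl -> ERDDAP_baseUrl
  export "$name=$2"         # visible to the process started next
  sh "$3"
  status=$?
  sleep "${4:-0}"           # give ERDDAP time to start before cleaning up
  unset "$name"             # the variable no longer lingers in this shell
  return "$status"
}

# Example use (the URL and the 60 second wait are placeholders):
# start_with_override baseUrl "https://data.example.org" tomcat/bin/startup.sh 60
```

Because Tomcat does not start ERDDAP until the first request for it arrives, a real script would probably also request an ERDDAP page before unsetting the variable.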
ERDDAP will not work correctly if the setup.xml or datasets.xml file isn't a well-formed XML file. So, after you edit these files, it is a good idea to verify that the result is well-formed XML by pasting the XML text into an XML checker like xmlvalidation.
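If you would rather check well-formedness locally than paste the XML into a website, something like the sketch below could work. It assumes that either xmllint (part of libxml2) or python3 is installed on the server; neither is supplied by ERDDAP, and the function name `is_well_formed_xml` is made up for this example.

```shell
# Hypothetical helper: return 0 if the file is well-formed XML, 1 if not,
# and 2 if no suitable parser is installed.
is_well_formed_xml() {
  if command -v xmllint >/dev/null 2>&1; then
    # --noout suppresses the parsed document; only the exit status matters
    xmllint --noout "$1" >/dev/null 2>&1
  elif command -v python3 >/dev/null 2>&1; then
    # minidom.parse raises an exception (non-zero exit) on malformed XML
    python3 -c 'import sys, xml.dom.minidom; xml.dom.minidom.parse(sys.argv[1])' "$1" >/dev/null 2>&1
  else
    echo "no XML parser found" >&2
    return 2
  fi
}

# Example use:
# is_well_formed_xml tomcat/content/erddap/datasets.xml || echo "datasets.xml is not well-formed"
```

A local check also keeps the contents of setup.xml, which may include sensitive settings, on your own server. It only tests that the XML is well-formed, not that the settings make sense to ERDDAP.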
Install the erddap.war file.
On Linux, Mac, and Windows, download erddap.war into tomcat/webapps .
(version 2.22, 567,742,765 bytes, MD5=2B33354F633294213AE2AFDDCF4DA6D0, dated 2022-12-08)
The .war file is big because it contains high resolution coastline, boundary, and elevation data needed to create maps.
[Some previous versions are also available.
2.17 (551,068,245 bytes, MD5=5FEA912B5D42E50EAB9591F773EA848D, dated 2022-02-16)
2.18 (551,069,844 bytes, MD5=461325E97E7577EC671DD50246CCFB8B, dated 2022-02-23)
2.21 (568,644,411 bytes, MD5=F2CFF805893146E932E498FDDBD519B6, dated 2022-10-09) ]
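Since the download is large, it may be worth comparing its MD5 with the published value before starting Tomcat. The sketch below assumes GNU md5sum is available (on a Mac, `md5 -q` is the rough equivalent); the function name `md5_matches` is hypothetical.

```shell
# Hypothetical helper: succeed if the file's MD5 equals the expected value.
# The comparison ignores letter case, because md5sum prints lowercase hex
# while the values listed above are uppercase.
md5_matches() {
  actual=$(md5sum "$1" | cut -d' ' -f1 | tr 'a-f' 'A-F')
  expected=$(printf '%s' "$2" | tr 'a-f' 'A-F')
  [ "$actual" = "$expected" ]
}

# Example use with the v2.22 value listed above:
# md5_matches tomcat/webapps/erddap.war 2B33354F633294213AE2AFDDCF4DA6D0 \
#   || echo "erddap.war does not match the published MD5; download it again"
```

The same helper could be used for erddapContent.zip and DejaVuFonts.zip with the MD5 values given for those files.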
Use ProxyPass so users don't have to put the port number, e.g., :8080, in the URL.
On Linux computers, if Tomcat is running in Apache, please modify the Apache httpd.conf file (usually in /etc/httpd/conf/ ) to allow HTTP traffic to/from ERDDAP without requiring the port number, e.g., :8080, in the URL. As the root user:
1. Modify the existing <VirtualHost> tag (if there is one), or add one at the end of the file:
```
<VirtualHost *:80>
   ServerName YourDomain.org
   ProxyRequests Off
   ProxyPreserveHost On
   ProxyPass /erddap http://localhost:8080/erddap
   ProxyPassReverse /erddap http://localhost:8080/erddap
</VirtualHost>
```
2. Then restart Apache: /usr/sbin/apachectl -k graceful (but sometimes it is in a different directory).

(UNCOMMON) If you are using NGINX (a web server and load balancer): in order to get NGINX and ERDDAP working correctly with https, you need to put the following snippet inside the Tomcat server.xml <Host> block:

<Valve className="org.apache.catalina.valves.RemoteIpValve"  
  remoteIpHeader="X-Forwarded-For"  
  protocolHeader="X-Forwarded-Proto"  
  protocolHeaderHttpsValue="https" />

And in the nginx config file, you need to set these headers:

  proxy_set_header Host              $host;
  proxy_set_header X-Real-IP         $remote_addr;
  proxy_set_header REMOTE_ADDR       $remote_addr;
  proxy_set_header HTTP_CLIENT_IP    $remote_addr;
  proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
  proxy_set_header X-Forwarded-Proto $scheme;

(Thanks to Kyle Wilcox.)

Start Tomcat.
- (I don't recommend using the Tomcat Web Application Manager. If you don't fully shutdown and startup Tomcat, sooner or later you will have PermGen memory issues.)
- (In Linux or Mac OS, if you have created a special user to run Tomcat, e.g., tomcat, remember to do the following steps as that user.)
- If Tomcat is already running, shut down Tomcat with (in Linux or Mac OS) tomcat/bin/shutdown.sh
  or (in Windows) tomcat\bin\shutdown.bat
  On Linux, use ps -ef | grep tomcat before and after shutdown.sh to make sure the tomcat process has stopped. The process should be listed before the shutdown and eventually not listed after the shutdown. It may take a minute or two for ERDDAP to fully shut down. Be patient. Or if it looks like it won't stop on its own, use:
  kill -9 processID
- Start Tomcat with (in Linux or Mac OS) tomcat/bin/startup.sh
  or (in Windows) tomcat\bin\startup.bat
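On Linux or Mac OS, the "be patient, then kill -9 if needed" advice above could be scripted roughly as follows. This is only a sketch: the function name `wait_for_exit` and the 120 second limit are assumptions, and it relies on `kill -0`, which tests whether a process exists without sending it a signal.

```shell
# Hypothetical helper: wait up to $2 seconds for process $1 to exit.
# Returns 0 if the process is gone, 1 if it is still running at the deadline.
# Run it as the process owner or root; otherwise kill -0 fails for lack of
# permission and a running process would be reported as gone.
wait_for_exit() {
  pid=$1
  remaining=$2
  while kill -0 "$pid" 2>/dev/null; do
    [ "$remaining" -le 0 ] && return 1
    sleep 1
    remaining=$((remaining - 1))
  done
  return 0
}

# Example use (assumes tomcat_pid holds the process ID found with
# ps -ef | grep tomcat before the shutdown):
# tomcat/bin/shutdown.sh
# wait_for_exit "$tomcat_pid" 120 || kill -9 "$tomcat_pid"
# tomcat/bin/startup.sh
```

Waiting before resorting to kill -9 gives ERDDAP the minute or two it may need to shut down cleanly.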
Is ERDDAP running?
Use a browser to try to view http://www.YourServer.org/erddap/status.html
ERDDAP starts up without any datasets loaded. Datasets are loaded in a background thread and so become available one-by-one.
Trouble? Look in the log files.
- When a request from a user comes in, it goes to Apache (on Linux and Mac OS computers), then Tomcat, then ERDDAP.
- You can see what comes to Apache (and related errors) in the Apache log files.
- You can see what comes to Tomcat (and related errors) in the Tomcat log files (tomcat/logs/catalina.out and other files in that directory).
- You can see what comes to ERDDAP, diagnostic messages from ERDDAP, and error messages from ERDDAP, in the ERDDAP bigParentDirectory/logs/log.txt file.
- Tomcat doesn't start ERDDAP until Tomcat gets a request for ERDDAP. So you can see in the Tomcat log files if it started ERDDAP or if there is an error message related to that attempt.
- When ERDDAP starts up, it renames the old ERDDAP log.txt file (to logArchivedAtCurrentTime.txt) and creates a new log.txt file. So if the log.txt file is old, it is a sign that ERDDAP hasn't recently restarted. ERDDAP writes log info to a buffer and only writes the buffer to the log file periodically, but you can force ERDDAP to write the buffer to the log file by visiting .../erddap/status.html.
Trouble: Old Version of Java
If you are using a version of Java that is too old for ERDDAP, ERDDAP won't run and you will see an error message in Tomcat's log file like
Exception in thread "main" java.lang.UnsupportedClassVersionError: some/class/name: Unsupported major.minor version someNumber
The solution is to update to the most recent version of Java and make sure that Tomcat is using it.
Trouble: Slow Startup First Time
Tomcat has to do a lot of work the first time an application like ERDDAP is started; notably, it has to unpack the erddap.war file (which is like a .zip file). On some servers, the first attempt to view ERDDAP stalls (30 seconds?) until this work is finished. On other servers, the first attempt will fail immediately. But if you wait 30 seconds and try again, it will succeed if ERDDAP was installed correctly.
There is no fix for this. This is simply how Tomcat works. But it only occurs the first time after you install a new version of ERDDAP.
In the future, to shut down (and restart) ERDDAP, see How to Shut Down and Restart Tomcat and ERDDAP.
Troubles installing Tomcat or ERDDAP?
Email me at erd dot data at noaa dot gov . I will help you.
Or, you can join the ERDDAP Google Group / Mailing List and post your question there.
Email Notification of New Versions of ERDDAP
If you want to receive an email whenever a new version of ERDDAP is available, send an email to erd dot data at noaa dot gov requesting to be added to the ERDDAP Announcements mailing list. This list averages roughly one email every three months.
Customize your ERDDAP to highlight your organization (not NOAA ERD).
- Change the banner that appears at the top of all ERDDAP .html pages by editing the <startBodyHtml5> tag in your datasets.xml file. (If there isn't one, copy the default from ERDDAP's
  [tomcat]/webapps/erddap/WEB-INF/classes/gov/noaa/pfel/erddap/util/messages.xml file into datasets.xml and edit it.) For example, you could:
  - Use a different image (i.e., your organization's logo).
  - Change the background color.
  - Change "ERDDAP" to "YourOrganization's ERDDAP".
  - Change "Easier access to scientific data" to "Easier access to YourOrganization's data".
  - Change the "Brought to you by" links to be links to your organization and funding sources.
- Change the information on the left side of the home page by editing the <theShortDescriptionHtml> tag in your datasets.xml file. (If there isn't one, copy the default from ERDDAP's
  [tomcat]/webapps/erddap/WEB-INF/classes/gov/noaa/pfel/erddap/util/messages.xml file into datasets.xml and edit it.) For example, you could:
  - Describe what your organization and/or group does.
  - Describe what kind of data this ERDDAP has.
- To change the icon that appears on browser tabs, put your organization's favicon.ico in tomcat/content/erddap/images/ . See https://en.wikipedia.org/wiki/Favicon.

How To Do an Update of an Existing ERDDAP on Your Server

Make the changes listed in Changes in the section entitled "Things ERDDAP Administrators Need to Know and Do" for all of the ERDDAP versions since the version you were using.
If you are upgrading from ERDDAP version 2.18 or below, you need to switch to Java 17 and the related Tomcat 10. See the regular ERDDAP installation instructions for Java and Tomcat. You'll also have to copy your tomcat/content/erddap directory from your old Tomcat installation to your new Tomcat installation.
Download erddap.war into tomcat/webapps .
(version 2.22, 567,742,765 bytes, MD5=2B33354F633294213AE2AFDDCF4DA6D0, dated 2022-12-08)
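Since an MD5 value is published with the download, it is worth confirming that the file you downloaded matches it before installing. A minimal sketch follows, using Python's standard hashlib module; the function names are made up for this example.

```python
import hashlib


def md5_hex(path, chunk_size=1024 * 1024):
    """Return the uppercase hexadecimal MD5 digest of the file at path."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so a large .war file is never held in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def matches_published_md5(path, published):
    """Compare a file's MD5 with a published value, ignoring case and spaces."""
    return md5_hex(path) == published.strip().upper()
```

For example, matches_published_md5("erddap.war", "2B33354F633294213AE2AFDDCF4DA6D0") should return True for an intact copy of the version listed above.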
messages.xml
- Common: If you are upgrading from ERDDAP version 1.46 (or above) and you just use the standard messages, the new standard messages.xml will be installed automatically (amongst the .class files via erddap.war).
- Rare: If you are upgrading from ERDDAP version 1.44 (or below),
  you MUST delete the old messages.xml file:
  tomcat/content/erddap/messages.xml .
  The new standard messages.xml will be installed automatically (amongst the .class files via erddap.war).
- Rare: If you always make changes to the standard messages.xml file (in place),
  you need to make those changes to the new messages.xml file (which is
  WEB-INF/classes/gov/noaa/pfel/erddap/util/messages.xml after erddap.war is decompressed by Tomcat).
- Rare: If you maintain a custom messages.xml file in tomcat/content/erddap/,
  you need to figure out (via diff) what changes have been made to the default messages.xml (which are in the new erddap.war as
  WEB-INF/classes/gov/noaa/pfel/erddap/util/messages.xml) and modify your custom messages.xml file accordingly.
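For the last case, any diff tool will do. As one possible approach, the sketch below uses Python's standard difflib module to list what changed between the default messages.xml from your previous version and the default from the new erddap.war. It assumes you kept a copy of the old default file; the function names and file labels are hypothetical.

```python
import difflib


def messages_diff(old_default_lines, new_default_lines):
    """Return unified-diff lines showing how the default messages.xml changed."""
    return list(difflib.unified_diff(
        old_default_lines,
        new_default_lines,
        fromfile="old default messages.xml",
        tofile="new default messages.xml",
        lineterm="",
    ))


def read_lines(path):
    """Read a text file as a list of lines without trailing newlines."""
    with open(path, encoding="utf-8") as f:
        return f.read().splitlines()
```

Calling messages_diff(read_lines(old_path), read_lines(new_path)) returns an empty list if nothing changed. Otherwise, lines starting with "-" were removed from the default and lines starting with "+" were added, which tells you what to carry over into your custom file.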
Install the new ERDDAP in Tomcat:
* Don't use Tomcat Manager. Sooner or later there will be PermGen memory issues. It is better to actually shut down and start up Tomcat.
* Replace references to tomcat below with the actual Tomcat directory on your computer.
- For Linux and Macs:
  1. Shutdown Tomcat: From a command line, use: tomcat/bin/shutdown.sh
    And use ps -ef | grep tomcat to see if/when the process has been stopped. (It may take a minute or two.)
  2. Remove the decompressed ERDDAP installation: In tomcat/webapps, use
    rm -rf erddap
  3. Delete the old erddap.war file: In tomcat/webapps, use rm erddap.war
  4. Copy the new erddap.war file from the temporary directory to tomcat/webapps
  5. Restart Tomcat and ERDDAP: use tomcat/bin/startup.sh
  6. View ERDDAP in your browser to check that the restart succeeded.
    (Often, you have to try a few times and wait a minute before you see ERDDAP.)
- For Windows:
  1. Shutdown Tomcat: From a command line, use: tomcat\bin\shutdown.bat
  2. Remove the decompressed ERDDAP installation: In tomcat\webapps, use
    rmdir /S /Q erddap
  3. Delete the old erddap.war file: In tomcat\webapps, use del erddap.war
  4. Copy the new erddap.war file from the temporary directory to tomcat\webapps
  5. Restart Tomcat and ERDDAP: use tomcat\bin\startup.bat
  6. View ERDDAP in your browser to check that the restart succeeded.
    (Often, you have to try a few times and wait a minute before you see ERDDAP.)
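Steps 2 through 4 are the same on every operating system, so if you update often you might script them. The following is a sketch only, with a hypothetical function name. It assumes Tomcat has already been fully shut down (step 1) and that you pass in your real tomcat/webapps directory and the location of the newly downloaded erddap.war.

```python
import os
import shutil


def replace_erddap_war(webapps_dir, new_war_path):
    """Remove the unpacked erddap directory and old erddap.war, then copy in the new war.

    Run this only after Tomcat has fully stopped.
    Returns the path of the installed erddap.war.
    """
    unpacked_dir = os.path.join(webapps_dir, "erddap")
    installed_war = os.path.join(webapps_dir, "erddap.war")
    if os.path.isdir(unpacked_dir):
        shutil.rmtree(unpacked_dir)       # step 2: remove the decompressed installation
    if os.path.isfile(installed_war):
        os.remove(installed_war)          # step 3: delete the old erddap.war
    return shutil.copy2(new_war_path, installed_war)  # step 4: copy the new erddap.war
```

After it finishes, restart Tomcat (step 5) and check ERDDAP in your browser (step 6) as described above.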

Troubles updating ERDDAP?
Email me at erd dot data at noaa dot gov . I will help you.
Or, you can join the ERDDAP Google Group / Mailing List and post your question there.

Things You Need To Know

Use Ctrl-F To Find Things On This Web Page
All of the information about administering ERDDAP (other than working with datasets.xml) is on this one, very long, .html web page, not several .html pages as some people prefer. The advantage of one .html web page is that you can use Ctrl-F (Command-F on a Mac) in your web browser to search for text (for example, flag) within this web page.
Alternatively, at the top of this document, there is a list of main topics (a Table of Contents).
Internal Links
ERDDAP's web pages have a large number of almost invisible, internal links (the text is black and not underlined). If you hover over one of these links (usually the first few words of headings and paragraphs), the cursor becomes a hand. If you click on the link, the URL is the internal link to that section of the document. This makes it easy to refer to specific sections of ERDDAP web pages. As an example, hover over, and click on, the bold "Internal Links" at the start of this paragraph.
Proxy Errors
Sometimes, a request to ERDDAP will return a Proxy Error, an HTTP 502 Bad Gateway Error, or some similar error. These errors are being thrown by Apache or Tomcat, not ERDDAP itself.
- If every request generates these errors, especially when you are first setting up your ERDDAP, then it probably is a proxy or bad gateway error, and the solution is probably to fix ERDDAP's proxy settings. This may also be the problem when an established ERDDAP suddenly starts throwing these errors for every request.
- Otherwise, "proxy" errors are usually actually time out errors thrown by Apache or Tomcat. Even when they happen relatively quickly, it is some sort of response from Apache or Tomcat that occurs when ERDDAP is very busy, memory-limited, or limited by some other resource. In these cases, see the advice below to deal with ERDDAP responding slowly.
  Requests for a long time range (>30 time points) from a gridded dataset are prone to time out failures, which often appear as Proxy Errors, because it takes significant time for ERDDAP to open all of the data files one-by-one. If ERDDAP is otherwise busy during the request, the problem is more likely to occur. If the dataset's files are compressed, the problem is more likely to occur, although it's hard for a user to determine if a dataset's files are compressed.
  The solution is to make several requests, each with a smaller time range. How small of a time range? I suggest starting really small (~30 time points?), then (approximately) double the time range until the request fails, then go back one doubling. Then make all the requests (each for a different chunk of time) needed to get all of the data.
  An ERDDAP administrator can lessen this problem by increasing the Apache timeout settings.
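For users who script their requests, the doubling approach described above can be sketched as follows. This is an illustration under stated assumptions: request_succeeds is a function that you supply (it tries a request for a given number of time points and reports whether it worked), and the function names, the starting size of 30 and the upper limit are hypothetical.

```python
def largest_working_chunk(request_succeeds, start=30, limit=100000):
    """Double the chunk size until a request fails, then go back one doubling.

    request_succeeds(n) must try a request for n time points and return
    True on success. Returns None if even the starting size fails.
    """
    if not request_succeeds(start):
        return None
    size = start
    while size * 2 <= limit and request_succeeds(size * 2):
        size *= 2
    return size


def chunk_index_ranges(n_points, chunk):
    """Split indices 0..n_points-1 into inclusive (first, last) ranges."""
    return [(first, min(first + chunk, n_points) - 1)
            for first in range(0, n_points, chunk)]
```

For example, chunk_index_ranges(100, 30) returns [(0, 29), (30, 59), (60, 89), (90, 99)]. Each range would then become one request, submitted one after another rather than simultaneously.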
Monitoring ERDDAP
We all want our data services to find their audience and be extensively used, but sometimes your ERDDAP may be used too much, causing problems, including super slow responses for all requests. Our plan to avoid problems is:
- Monitor ERDDAP via the status.html web page.
  It has tons of useful information. If you see that a huge number of requests are coming in, or tons of memory being used, or tons of failed requests, or each Major LoadDatasets is taking a long time, or see any sign of things getting bogged down and responding slowly, then look in ERDDAP's log.txt file to see what's going on.
  It's also useful to simply note how fast the status page responds. If it responds slowly, that is an important indicator that ERDDAP is very busy.
- Monitor ERDDAP via the Daily Report email.
- Watch for out-of-date datasets via the baseUrl/erddap/outOfDateDatasets.html web page which is based on the optional testOutOfDate global attribute.
- External Monitors
  The methods listed above are ERDDAP's ways of monitoring itself. It is also possible to make or use external systems to monitor your ERDDAP. One project to do this is Axiom's erddap-metrics project. Such external systems have some advantages:
  - They can be customized to provide the information you want, displayed in the way you want.
  - They can include information about ERDDAP that ERDDAP can't access easily or at all (for example, CPU usage, disk free space, ERDDAP response time as seen from the user's perspective, ERDDAP uptime).
  - They can provide alerts (emails, phone calls, texts) to administrators when problems exceed some threshold.
- Blacklist users making multiple simultaneous requests!
  If it is clear that some user is making more than one simultaneous request, repeatedly and continuously, then add their IP address to ERDDAP's <requestBlacklist> in your datasets.xml file. Sometimes the requests are all from one IP address. Sometimes they are from multiple IP addresses, but clearly the same user. You can also blacklist people making tons of invalid requests or tons of mind-numbingly inefficient requests.
  Then, for each request they make, ERDDAP returns:
  HTTP ERROR 403 - Access Forbidden --
  Your IP address is on this ERDDAP's request blacklist.
  Did you often submit more than one request at a time?
  Did you often submit identical requests in a short period of time?
  Did you submit a large number of invalid requests?
  If you are ready to avoid these problems, please email [ERDDAP administrator's email address] to request to be taken off of the blacklist.
  
  Hopefully the user will see this message and contact you to find out how to fix the problem and get off the blacklist. Sometimes, they just switch IP addresses and try again.
  It is like the balance of power between offensive and defensive weapons in war. Here, the defensive weapons (ERDDAP) have a fixed capacity, limited by the number of cores in the CPU, the disk access bandwidth, and the network bandwidth. But the offensive weapons (users, notably scripts) have unlimited capacity:
  - A single request for data from a lot of time points may cause ERDDAP to open a huge number of files (in sequence or partly multi-threaded). In extreme cases, one "simple" request can easily tie up the RAID attached to ERDDAP for a minute, effectively blocking the handling of other requests.
  - A single request may consume a large chunk of memory (even though ERDDAP is coded to minimize the memory needed to handle large requests).
  - Parallelization -
    It is easy for a clever user to parallelize a big task by generating lots of threads, each of which submits a separate request (which may be large or small). This behavior is encouraged by the computer science community as an efficient way to deal with a large problem (and parallelizing is efficient in other circumstances). Going back to the war analogy: users can make an essentially unlimited number of simultaneous requests with the cost of each being essentially zero, but the cost of each request coming into ERDDAP can be large and ERDDAP's response capability is finite. Clearly, ERDDAP will lose this battle, unless the ERDDAP administrator blacklists users who are making multiple simultaneous requests which are unfairly crowding out other users.
  - Multiple Scripts -
    Now think about what happens when there are several clever users each running parallelized scripts. If one user can generate so many requests that other users are crowded out, then multiple such users can generate so many requests that ERDDAP becomes overwhelmed and seemingly unresponsive. It is effectively a DDOS attack. Again, the only defense for ERDDAP is to blacklist users making multiple simultaneous requests which are unfairly crowding out other users.
  - Inflated Expectations -
    In this world of massive tech companies (Amazon, Google, Facebook, ...), users have come to expect essentially unlimited capabilities from the providers. Since these companies are money making operations, the more users they have, the more revenue they have to expand their IT infrastructure. So they can afford a massive IT infrastructure to handle requests. And they cleverly limit the number of requests and cost of each request from users by limiting the kinds of requests that users can make so that no single request is burdensome, and there is never a reason (or a way) for users to make multiple simultaneous requests. So these huge tech companies may have far more users than ERDDAP, but they have massively more resources and clever ways to limit the requests from each user. It's a manageable situation for the big IT companies (and they get rich!) but not for ERDDAP installations. Again, the only defense for ERDDAP is to blacklist users making multiple simultaneous requests which are unfairly crowding out other users.
  So users: Don't make multiple simultaneous requests or you will be blacklisted!
Clearly, it is best if your server has a lot of cores, a lot of memory (so you can allocate a lot of memory to ERDDAP, more than it ever needs), and a high bandwidth internet connection. Then, memory is rarely or never a limiting factor, and network bandwidth becomes the more common limiting factor. Basically, as there are more and more simultaneous requests, the speed to any given user decreases. That naturally slows the rate of incoming requests if each user is just submitting one request at a time.
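To decide who, if anyone, belongs on the blacklist, it helps to count how many requests each IP address has made. The sketch below is one rough way to do that from lines of log.txt. It assumes only that the requester's IPv4 address appears as plain text on each request line. It does not rely on any particular log layout, so it can also match other dotted numbers, and it ignores IPv6 addresses. The function names and the threshold are hypothetical.

```python
import re
from collections import Counter

# Matches dotted-quad text such as 10.0.0.1 (no range checking is done).
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def requests_per_ip(log_lines):
    """Count lines per IPv4 address, using the first address found on each line."""
    counts = Counter()
    for line in log_lines:
        match = IPV4.search(line)
        if match:
            counts[match.group(0)] += 1
    return counts


def heavy_users(log_lines, threshold):
    """Return (ip, count) pairs with at least `threshold` lines, busiest first."""
    return [(ip, n) for ip, n in requests_per_ip(log_lines).most_common()
            if n >= threshold]
```

A high count alone doesn't prove misuse. Check the log to see whether the requests were simultaneous, invalid, or needlessly inefficient before adding an address to <requestBlacklist>.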
ERDDAP Getting Data From THREDDS
If your ERDDAP gets some of its data from a THREDDS at your site, there are some advantages to making a copy of the THREDDS data files (at least for the most popular datasets) on another RAID that ERDDAP has access to so that ERDDAP can serve data from the files directly. At ERD, we do that for our most popular datasets.
- ERDDAP can get the data directly and not have to wait for THREDDS to reload the dataset or ...
- ERDDAP can notice and incorporate new data files immediately, so it doesn't have to pester THREDDS frequently to see if the dataset has changed. See <updateEveryNMillis>.
- The load is split between 2 RAIDs and 2 servers, instead of the request being hard on both ERDDAP and THREDDS.
- You avoid the mismatch problem caused by THREDDS having a small (by default) maximum request size. ERDDAP has a system to handle the mismatch, but avoiding the problem is better.
- You have a backup copy of the data which is always a good idea.
In any case, don't ever run THREDDS and ERDDAP in the same Tomcat. Run them in separate Tomcats, or better, on separate servers.
We find that THREDDS periodically gets in a state where requests just hang. If your ERDDAP is getting data from a THREDDS and the THREDDS is in this state, ERDDAP has a defense (it says the THREDDS-based dataset isn't available), but it is still troublesome for ERDDAP because ERDDAP has to wait until the timeout each time it tries to reload a dataset from a hung THREDDS. Some groups (including ERD) avoid this by proactively restarting THREDDS frequently (e.g., nightly in a cron job).
If ERDDAP Is Responding Slowly or if just certain requests are responding slowly,
you may be able to figure out if the slowness is reasonable and temporary (e.g., because of lots of requests from scripts or WMS users), or if something is inexplicably wrong and you need to shut down and restart Tomcat and ERDDAP.
If ERDDAP is responding slowly, see the advice below to determine the cause, which hopefully will enable you to fix the problem.
You may have a specific starting point (e.g., a specific request URL) or a vague starting point (e.g., ERDDAP is slow).
You may know the user involved (e.g., because they emailed you), or not.
You may have other clues, or not.
Since all of these situations and all of the possible causes of the problems blur together, the advice below tries to deal with all possible starting points and all possible problems related to slow responses.
- Look for clues in ERDDAP's log file (bigParentDirectory/logs/log.txt).
  [On rare occasions, there are clues in Tomcat's log file (tomcat/logs/catalina.out).]
  Look for error messages.
  Look for a large number of requests coming from one (or a few) users and perhaps hogging a lot of your server's resources (memory, CPU time, disk access, internet bandwidth).
  If the trouble is tied to one user, you can often get a clue about who the user is via web services like https://whatismyipaddress.com/ip-lookup that can give you information related to the user's IP address (which you can find in ERDDAP's log.txt file).
  - If the user seems to be a bot behaving badly (notably, a search engine trying to fill out the ERDDAP forms with every possible permutation of entry values), make sure you have properly set up your server's robots.txt file.
  - If the user seems to be a script(s) that is making multiple simultaneous requests, contact the user, explain that your ERDDAP has limited resources (e.g., memory, CPU time, disk access, internet bandwidth), and ask them to be considerate of other users and just make one request at a time. You might also mention that you will blacklist them if they don't back off.
  - If the user seems to be a script making a large number of time-consuming requests, ask the user to be considerate of other users by putting a small pause (2 seconds?) in the script between requests.
  - WMS client software can be very demanding. One client will often ask for 6 custom images at a time. If the user seems to be a WMS client that is making legitimate requests, you can:
    - Ignore it. (recommended, because they'll move on pretty soon)
    - Turn off your server's WMS service via ERDDAP's setup.xml file. (not recommended)
  - If the requests seem stupid, insane, excessive, or malicious, or if you can't resolve the problem any other way, consider temporarily or permanently adding the user's IP address to the <requestBlacklist> in your datasets.xml file.
- Try to duplicate the problem yourself, from your computer.
  Figure out if the problem is with one dataset or all datasets, for one user or all users, for just certain types of requests, etc.
  If you can duplicate the problem, try to narrow down the problem.
  If you can't duplicate the problem, then the problem may be tied to the user's computer, the user's internet connection, or your institution's internet connection.
- If just one dataset is responding slowly (perhaps only for one type of request from one user), the problem may be:
  - ERDDAP's access to the dataset's source data (notably from relational databases, Cassandra, and remote datasets) may be temporarily or permanently slow. Try to check the source's speed independent of ERDDAP. If it is slow, perhaps you can improve it.
  - Is the problem related to the specific request or general type of request?
    The larger the requested subset of a dataset, the more likely the request will fail. If the user is making huge requests, ask the user to make smaller requests that are more likely to get a fast and successful response.
    Almost all datasets are better at handling some types of requests than other types of requests. For example, when a dataset stores different time chunks in different files, requests for data from a huge number of time points may be very slow. If the current requests are of a difficult type, consider offering a variant of the dataset that is optimized for these requests. Or just explain to the user that that type of request is difficult and time consuming, and ask for their patience.
  - The dataset may not be optimally configured. You may be able to make changes to the dataset's datasets.xml chunk to help ERDDAP handle the dataset better. For example,
    - EDDGridFromNcFiles datasets that access data from compressed nc4/hdf5 files are slow when getting data for the entire geographic range (e.g., for a world map) because the entire file must be decompressed. You could convert the files to uncompressed files, but then the disk space requirement will be much, much larger. It is probably better to just accept that such datasets will be slow in certain circumstances.
    - The configuration of the <subsetVariables> tag has a huge influence on how ERDDAP handles EDDTable datasets.
    - You may be able to increase the speed of an EDDTableFromDatabase dataset.
    - Many EDDTable datasets can be sped up by storing a copy of the data in NetCDF Contiguous Ragged Array files, which ERDDAP can read very quickly.
    If you want help speeding up a specific dataset, email a description of the problem and the dataset's chunk of datasets.xml to erd dot data at noaa dot gov.
    Or, you can join the ERDDAP Google Group / Mailing List and post your question there.
- If everything in ERDDAP is always slow, the problem may be:
  - The computer that is running ERDDAP may not have enough memory or processing power. It is good to run ERDDAP on a modern, multi-core server. For heavy use, the server should have a 64-bit operating system and 8 GB or more of memory.
  - The computer that is running ERDDAP may be also running other applications that are consuming lots of system resources. If so, can you get a dedicated server for ERDDAP? For example (this is not an endorsement), you can get a quad-core Mac Mini Server with 8 GB of memory for ~$1100.
- If everything in ERDDAP is temporarily slow, view your ERDDAP's /erddap/status.html page in your browser.
  - Does the ERDDAP status page fail to load?
    If so, restart ERDDAP.
  - Did the ERDDAP status page load slowly (e.g., >5 seconds)?
    That is a sign that everything in ERDDAP is running slowly, but it isn't necessarily trouble. ERDDAP may just be really busy.
  - For "Response Failed Time (since last major LoadDatasets)", is n= a large number?
    That indicates there have been lots of failed requests recently. That may be trouble or the start of trouble. The median time for the failures is often large (e.g., 210000 ms), which means that there were (are?) lots of active threads tying up lots of resources (like memory, open files, open sockets, ...), which is not good.
  - For "Response Succeeded Time (since last major LoadDatasets)", is n= a large number?
    That indicates there have been lots of successful requests recently. This isn't trouble. It just means your ERDDAP is getting heavy use.
  - Is the "Number of non-Tomcat-waiting threads" double a typical value?
    This is often serious trouble that will cause ERDDAP to slow down and eventually freeze. If this persists for hours, you may want to proactively restart ERDDAP.
  - At the bottom of the "Memory Use Summary" list, is the last "Memory: currently using" value very high?
    That may just indicate high usage, or it may be a sign of trouble.
  - Look at the list of threads and their status. Are an unusual number of them doing something unusual?
- Is your institution's internet connection currently slow?
  Search the internet for "internet speed test" and use one of the free online tests, such as https://www.speakeasy.net/speedtest/. If your institution's internet connection is slow, then connections between ERDDAP and remote data sources will be slow, and connections between ERDDAP and the user will be slow. Sometimes, you can solve this by stopping unnecessary internet use (e.g., people watching streaming videos or on video conference calls).
- Is the user's internet connection currently slow?
  Have the user search the internet for "internet speed test" and use one of the free online tests, such as https://www.speakeasy.net/speedtest/. If the user's internet connection is slow, it slows down their access to ERDDAP. Sometimes, they can solve this by stopping unnecessary internet use at their institution (e.g., people watching streaming videos or on video conference calls).
- Stuck?
  Email all the details to erd dot data at noaa dot gov .
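Several of the checks above come down to how quickly the status page responds. If you want to measure that instead of judging by eye, here is a small sketch using Python's standard urllib module. The function names are hypothetical, and the 5 second cutoff simply mirrors the "e.g., >5 seconds" guideline above.

```python
import time
import urllib.request


def timed_fetch(url, timeout=60, opener=urllib.request.urlopen):
    """Fetch url once and return (elapsed_seconds, ok).

    ok is False if the request raised a network or HTTP error.
    The opener parameter exists so the function can be tested offline.
    """
    started = time.monotonic()
    try:
        with opener(url, timeout=timeout) as response:
            response.read()
            ok = 200 <= response.status < 300
    except OSError:  # URLError, HTTPError and socket timeouts are all OSErrors
        ok = False
    return time.monotonic() - started, ok


def classify(elapsed_seconds, ok, slow_after=5.0):
    """Turn one measurement into 'failed', 'slow' or 'ok'."""
    if not ok:
        return "failed"
    return "slow" if elapsed_seconds > slow_after else "ok"
```

You would call timed_fetch with the URL of your own ERDDAP's /erddap/status.html page. As noted above, a "slow" result is not necessarily trouble, since ERDDAP may just be busy.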
How to Shut Down and Restart Tomcat and ERDDAP
You don't need to shut down and restart Tomcat and ERDDAP if ERDDAP is temporarily slow, slow for some known reason (like lots of requests from scripts or WMS users), or to apply changes to the datasets.xml file.
You do need to shut down and restart Tomcat and ERDDAP if you need to apply changes to the setup.xml file, or if ERDDAP freezes, hangs, or locks up. In extreme circumstances, Java may freeze for a minute or two while it does a full garbage collection, but then recover. So it is good to wait a minute or two to see if Java/ERDDAP is really frozen or if it is just doing a long garbage collection. (If garbage collection is a common problem, allocate more memory to Tomcat.)
I don't recommend using the Tomcat Web Application Manager to start or shut down Tomcat. If you don't fully shut down and start up Tomcat, sooner or later you will have PermGen memory issues.
To shutdown and restart Tomcat and ERDDAP:
- If you use Linux or a Mac:
  (If you have created a special user to run Tomcat, e.g., tomcat, remember to do the following steps as that user.)
  1. Use cd tomcat/bin
  2. Use ps -ef | grep tomcat to find the java/tomcat processID (hopefully, just one process will be listed), which we'll call javaProcessID below.
  3. If ERDDAP is frozen/hung/locked up, use kill -3 javaProcessID to tell Java (which is running Tomcat) to do a thread dump to the Tomcat log file: tomcat/logs/catalina.out . After you reboot, you can diagnose the problem by finding the thread dump information (and any other useful information above it) in tomcat/logs/catalina.out and also by reading relevant parts of the ERDDAP log archive. If you want, you can email that information to erd dot data at noaa dot gov so I can see what went wrong.
    Or, you can join the ERDDAP Google Group / Mailing List and post your question there.
  4. Use ./shutdown.sh
  5. Use ps -ef | grep tomcat repeatedly until the java/tomcat process isn't listed.
    Sometimes, the java/tomcat process will take up to two minutes to fully shut down. The reason is: ERDDAP sends a message to its background threads to tell them to stop, but sometimes it takes these threads a long time to get to a good stopping place.
  6. If after a minute or so, java/tomcat isn't stopping by itself, you can use
    kill -9 javaProcessID
    to force the java/tomcat process to stop immediately. If possible, use this only as a last resort. The -9 switch is powerful, but it may cause various problems.
  7. To restart ERDDAP, use ./startup.sh
  8. View ERDDAP in your browser to check that the restart succeeded. (Sometimes, you need to wait 30 seconds and try to load ERDDAP again in your browser for it to succeed.)
- If you use Windows:
  1. Use cd tomcat\bin
  2. Use shutdown.bat
  3. You may want/need to use the Windows Task Manager (accessible via Ctrl Alt Del) to ensure that the Java/Tomcat/ERDDAP process/application has fully stopped.
    Sometimes, the process/application will take up to two minutes to shut down. The reason is: ERDDAP sends a message to its background threads to tell them to stop, but sometimes it takes these threads a long time to get to a good stopping place.
  4. To restart ERDDAP, use startup.bat
  5. View ERDDAP in your browser to check that the restart succeeded. (Sometimes, you need to wait 30 seconds and try to load ERDDAP again in your browser for it to succeed.)
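On either operating system, the "check repeatedly until the process has stopped" step is a polling loop with a time limit. If you script the restart, that loop might look like the sketch below. The function name and parameters are hypothetical, and condition is a function you supply that returns True once the Java/Tomcat process is no longer running. The two-minute default reflects the shutdown time mentioned above.

```python
import time


def wait_until(condition, timeout_seconds=120, poll_seconds=5,
               sleep=time.sleep, clock=time.monotonic):
    """Poll condition() until it returns True or timeout_seconds have passed.

    Returns True if the condition was met, False if the time ran out.
    sleep and clock are parameters only so the loop can be tested quickly.
    """
    deadline = clock() + timeout_seconds
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_seconds)
```

A False result means the process is still running after the time limit. That is the point at which you would consider the last-resort forced stop described above.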
Frequent Crashes or Freezes
If ERDDAP becomes slow, crashes or freezes, something is wrong. Look in ERDDAP's log file to try to figure out the cause. If you can't, please email the details to erd dot data at noaa dot gov .
The most common problem is a troublesome user who is running several scripts at once and/or someone making a large number of invalid requests. If this happens, you should probably blacklist that user. When a blacklisted user makes a request, the error message in the response encourages them to email you to work out the problems. Then, you can encourage them to run just one script at a time and to fix the problems in their script (e.g., requesting data from a remote dataset that can't respond before timing out). See <requestBlacklist> in your datasets.xml file.
In extreme circumstances, Java may freeze for a minute or two while it does a full garbage collection, but then recover. So it is good to wait a minute or two to see if Java/ERDDAP is really frozen or if it is just doing a long garbage collection. (If garbage collection is a common problem, allocate more memory to Tomcat.)
If ERDDAP becomes slow or freezes and the problem isn't a troublesome user or a long garbage collection, you can usually solve the problem by restarting ERDDAP. My experience is that ERDDAP can run for months without needing a restart.
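When you look in log.txt for the cause, one useful starting point is the "(>10s!)" marker that ERDDAP attaches to dataset reloads and user responses that took more than 10 seconds (see the log.txt section). The sketch below lists the lines that carry that marker so you can then read upward from each one. The function names are hypothetical.

```python
SLOW_MARKER = "(>10s!)"


def slow_entries(log_lines):
    """Return (line_number, text) for each line containing the slow marker."""
    return [(number, line.rstrip("\n"))
            for number, line in enumerate(log_lines, start=1)
            if SLOW_MARKER in line]


def slow_entries_in_file(log_path):
    """Apply slow_entries to a log file, tolerating undecodable bytes."""
    with open(log_path, encoding="utf-8", errors="replace") as f:
        return slow_entries(f)
```

The line numbers tell you where to look higher in the file for the request or dataset that was slow.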
Monitor ERDDAP
You can monitor your ERDDAP's status by looking at the /erddap/status.html page, notably the statistics in the top section. If ERDDAP becomes slow or freezes and the problem isn't just extremely heavy usage, you can usually solve the problem by restarting ERDDAP.
My experience is that ERDDAP can run for months without needing a restart. You should only need to restart it if you want to apply some changes you made to ERDDAP's setup.xml or when you need to install new versions of ERDDAP, Java, Tomcat, or the operating system. If you need to restart ERDDAP frequently, something is wrong. Look in ERDDAP's log file to try to figure out the cause. If you can't, please email the details to erd dot data at noaa dot gov . As a temporary solution, you might try using Monit to monitor your ERDDAP and restart it if needed. Or, you could make a cron job to restart ERDDAP (proactively) periodically. It may be a little challenging to write a script to automate monitoring and restarting ERDDAP. Some tips that might help:
- You can simplify testing if the Tomcat process is still running by using the -c switch with grep:
  ps -u tomcatUser | grep -c java
  That will reduce the output to "1" if the tomcat process is still alive, or "0" if the process has stopped (assuming Tomcat is the only java process that tomcatUser runs).
- If you are good with gawk, you can extract the processID from the results of
  ps -u tomcatUser | grep java, and use the processID in other lines of the script.
If you do set up Monit or a cron job, please email the details to erd dot data at noaa dot gov .
Or, you can join the ERDDAP Google Group / Mailing List and share the information there.
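If you would rather not use gawk, the same information can be extracted in a short script. The sketch below is an illustration with hypothetical function names. It asks ps for just the process ID and command name of each process owned by one user (the -o pid= and -o comm= options print those two columns without a header) and keeps the IDs of the java processes.

```python
import subprocess


def java_pids(ps_output):
    """Extract process IDs of java processes from 'pid command' lines."""
    pids = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].isdigit() and "java" in parts[1]:
            pids.append(int(parts[0]))
    return pids


def list_java_pids(user):
    """Run ps for one user and return the IDs of that user's java processes."""
    result = subprocess.run(
        ["ps", "-u", user, "-o", "pid=", "-o", "comm="],
        capture_output=True, text=True, check=False,  # ps exits non-zero if nothing matches
    )
    return java_pids(result.stdout)
```

An empty list from list_java_pids("tomcatUser") corresponds to the "0" case above. A single entry is the processID to use elsewhere in the script.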
PermGen
If you repeatedly use Tomcat Manager to Reload (or Stop and Start) ERDDAP, ERDDAP may fail to start up and throw java.lang.OutOfMemoryError: PermGen. The solution is to periodically (or every time?) shut down and restart Tomcat and ERDDAP, instead of just reloading ERDDAP.
[Update: This problem was greatly minimized or fixed in ERDDAP version 1.24.]
log.txt
If ERDDAP doesn't start up or if something isn't working as expected, it is very useful to look at the error and diagnostic messages in the ERDDAP log file.
- The log file is bigParentDirectory/logs/log.txt
  (bigParentDirectory is specified in setup.xml). If there is no log.txt file or if the log.txt file hasn't been updated since you restarted ERDDAP, look in the Tomcat Log Files to see if there is an error message there.
- Types of diagnostic messages in the log file:
  - The word "error" is used when something went so wrong that the procedure failed to complete. Although it is annoying to get an error, the error forces you to deal with the problem. Our thinking is that it is better to throw an error, than to have ERDDAP hobble along, working in a way you didn't expect.
  - The word "warning" is used when something went wrong, but the procedure was able to be completed. These are pretty rare.
  - Anything else is just an informative message. You can control how much information is logged with <logLevel> in datasets.xml.
  - Dataset reloads and user responses that take >10 seconds to finish (successfully or unsuccessfully) are marked with "(>10s!)". Thus, you can search the log.txt file for this phrase to find the datasets that were slow to reload or the request number of the requests that were slow to finish. You can then look higher in the log.txt file to see what the dataset problem was or what the user request was and who it was from. These slow dataset loads and user requests are sometimes taxing on ERDDAP. So knowing more about these requests can help you identify and solve problems.
- Information is written to the log file on the disk drive in fairly big chunks. The advantage is that this is very efficient -- ERDDAP will never block waiting for information to be written to the log file. The disadvantage is that the log will almost always end with a partial message, which won't be completed until the next chunk is written. You can make it up-to-date (for an instant) by viewing your ERDDAP's status web page at https://your.domain.org/erddap/status.html (or http:// if https isn't enabled).
- When the log.txt file gets to 20 MB,
  the file is renamed log.txt.previous and a new log.txt file is created. So log files don't accumulate.
  In setup.xml, you can specify a different maximum size for the log file, in MegaBytes. The minimum allowed is 1 (MB). The maximum allowed is 2000 (MB). The default is 20 (MB). For example:
  <logMaxSizeMB>20</logMaxSizeMB>
- Whenever you restart ERDDAP,
  ERDDAP makes an archive copy of the log.txt and log.txt.previous files with a time stamp in the file's name. If there was trouble before the restart, it may be useful to analyze these archived files for clues as to what the trouble was. You can delete the archive files if they are no longer needed.
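As a small illustration of searching log.txt for the "(>10s!)" marker described above, the following Python sketch returns the line numbers and text of the matching lines. The function name is hypothetical, and the sketch assumes nothing about the log format beyond the presence of that marker.

```python
SLOW_MARKER = "(>10s!)"


def find_slow_entries(log_lines):
    """Return (line_number, text) pairs for lines containing the slow marker."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(log_lines, start=1)
        if SLOW_MARKER in line
    ]
```

For example, `with open("bigParentDirectory/logs/log.txt", errors="replace") as f: hits = find_slow_entries(f)`. With the line numbers in hand, you can look higher in the file to see which dataset or request was involved.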
Parsing log.txt
ERDDAP's log.txt file isn't designed for parsing (although you might be able to create regular expressions that extract desired information). It is designed to help a human figure out what is going wrong when something is going wrong. When you submit a bug or problem report to ERDDAP developers, when possible, please include all of the information from the log.txt file related to the troublesome request.
For efficiency reasons, ERDDAP only writes information to log.txt after a large chunk of information has accumulated. So if you visit log.txt right after an error has occurred, information related to the error may not yet have been written to log.txt. In order to get perfectly up-to-date information from log.txt, visit your ERDDAP's status.html page. When ERDDAP processes that request, it flushes all pending information to log.txt.
For ERDDAP usage statistics, please use the Apache and/or Tomcat log files instead of ERDDAP's log.txt. Note that ERDDAP's status.html page (some) and Daily Report (more) have a large number of usage statistics precalculated for you.
Tomcat Log Files
If ERDDAP doesn't start up because an error occurred very early in ERDDAP's startup, the error message will show up in Tomcat's log files (tomcat/logs/catalina.today.log or tomcat/logs/catalina.out), not in ERDDAP's log.txt file.
Usage Statistics: For most of the information that people want to gather from a log file (e.g., usage statistics), please use the Apache and/or Tomcat log files. They are nicely formatted and have that type of information. There are numerous tools for analyzing them, for example, AWStats, ElasticSearch's Kibana, and JMeter, but search the web to find the right tool for your purposes.
Note that the log files only identify users as IP addresses. There are websites to help you get information related to a given IP address, e.g., WhatIsMyIPAddress, but you normally won't be able to find the name of the user.
Also, because of DHCP, a given user's IP address may be different on different days, or different users may have the same IP address at different times.
Alternatively, you can use something like Google Analytics. But beware: when you use external services like Google Analytics, you are giving up your users' privacy by giving Google full access to their activity on your site which Google (and others?) can keep forever and use for any purpose (perhaps not technically, but probably in practice). Your users haven't consented to this and probably aren't aware that they will be tracked on your website, just as they probably aren't aware of the extent they are being tracked on almost all websites. These days, many users are very concerned that everything they do on the web is being monitored by these big companies (Google, Facebook, etc.) and by the government, and find this an unwarranted intrusion into their lives (as in the book, 1984). This has driven many users to install products like Privacy Badger to minimize tracking, to use alternative browsers like Tor Browser (or turn off tracking in traditional browsers), and to use alternative search engines like Duck Duck Go. If you use a service like Google Analytics, please at least document its use and the consequences by changing the <standardPrivacyPolicy> tag in ERDDAP's
[tomcat]/webapps/erddap/WEB-INF/classes/gov/noaa/pfel/erddap/util/messages.xml file.
emailLogYEAR-MM-DD.txt
ERDDAP always writes the text of all out-going email messages in the current day's emailLogYEAR-MM-DD.txt file in bigParentDirectory/logs (bigParentDirectory is specified in setup.xml).
- If the server can't send out email messages, or if you have configured ERDDAP not to send out email messages, or if you are just curious, this file is a convenient way to see all of the email messages that have been sent out.
- You can delete previous days' email log files if they are no longer needed.
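If you want a script to open a given day's email log, the file name can be built from the date. This is a sketch that assumes the emailLogYEAR-MM-DD.txt pattern above uses a four-digit year and two-digit month and day; the function name and the example directory are hypothetical.

```python
import datetime
import posixpath


def email_log_path(big_parent_directory, day):
    """Build the path of the email log file for the given date."""
    # date.isoformat() yields YYYY-MM-DD, matching the YEAR-MM-DD pattern.
    file_name = f"emailLog{day.isoformat()}.txt"
    return posixpath.join(big_parent_directory, "logs", file_name)
```

For today's file, call email_log_path(bigParentDirectory, datetime.date.today()).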
Daily Report
The Daily Report has lots of useful information -- all of the information from your ERDDAP's /erddap/status.html page and more.
- It is the most complete summary of your ERDDAP's status.
- Among other statistics, it includes a list of datasets that didn't load and the exceptions they generated.
- It is generated when you start up ERDDAP (just after ERDDAP finishes trying to load all of the datasets) and generated soon after 7 am local time every morning.
- Whenever it is generated, it is written to ERDDAP's log.txt file.
- Whenever it is generated, it is emailed to <emailDailyReportsTo> and <emailEverythingTo> (which are specified in setup.xml) provided you have set up the email system (in setup.xml).
Status Page
You can view the status of your ERDDAP from any browser by going to <baseUrl>/erddap/status.html
- This page is generated dynamically, so it always has up-to-the-moment statistics for your ERDDAP.
- It includes statistics regarding the number of requests, memory usage, thread stack traces, the taskThread, etc.
- Because the Status Page can be viewed by anyone, it doesn't include quite as much information as the Daily Report.
Adding/Changing Datasets
ERDDAP usually rereads datasets.xml every loadDatasetsMinMinutes (specified in setup.xml). So you can make changes to datasets.xml any time, even while ERDDAP is running.
A new dataset will be detected soon, usually within loadDatasetsMinMinutes.
A changed dataset will be reloaded when it is reloadEveryNMinutes old (as specified in datasets.xml).
A Flag File Tells ERDDAP to Try to Reload a Dataset As Soon As Possible
- ERDDAP won't notice any changes to a dataset's setup in datasets.xml until ERDDAP reloads the dataset.
- To tell ERDDAP to reload a dataset as soon as possible (before the dataset's <reloadEveryNMinutes> would cause it to be reloaded), put a file in bigParentDirectory/flag (bigParentDirectory is specified in setup.xml) that has the same name as the dataset's datasetID.
  This tells ERDDAP to try to reload that dataset ASAP.
  The old version of the dataset will remain available to users until the new version is available and swapped atomically into place.
  For EDDGridFromFiles and EDDTableFromFiles, the reloading dataset will look for new or changed files, read those, and incorporate them into the dataset. So the time to reload is dependent on the number of new or changed files.
  If the dataset has active="false", ERDDAP will remove the dataset.
- One variant of the /flag directory is the /badFilesFlag directory. (Added in ERDDAP v2.12.)
  If you put a file in the bigParentDirectory/badFilesFlag directory with a datasetID as the file name (the file contents don't matter), then as soon as ERDDAP sees the badFilesFlag file, ERDDAP will:
  1. Delete the badFilesFlag file.
  2. Delete the badFiles.nc file (if there is one), which has the list of bad files for that dataset.
    For datasets like EDDGridSideBySide that have childDatasets, this also deletes the badFiles.nc file for all child datasets.
  3. Reload the dataset ASAP.
  Thus, this causes ERDDAP to try again to work with the files previously (erroneously?) marked as bad.
- Another variant of the /flag directory is the /hardFlag directory. (Added in ERDDAP v1.74.)
  If you put a file in bigParentDirectory/hardFlag with a datasetID as the file name (the file contents don't matter), then as soon as ERDDAP sees the hardFlag file, ERDDAP will:
  1. Delete the hardFlag file.
  2. Remove the dataset from ERDDAP.
  3. Delete all of the information that ERDDAP has stored about this dataset.
    For EDDGridFromFiles and EDDTableFromFiles subclasses, this deletes the internal database of data files and their contents.
    For datasets like EDDGridSideBySide that have childDatasets, this also deletes the internal database of data files and their contents for all child datasets.
  4. Reload the dataset.
    For EDDGridFromFiles and EDDTableFromFiles subclasses, this causes ERDDAP to reread all of the data files. Thus, the reload time is dependent on the total number of data files in the dataset. Because the dataset was removed from ERDDAP when the hardFlag was noticed, the dataset will be unavailable until the dataset finishes reloading. Be patient. Look in the log.txt file if you want to see what's going on.
  The hardFlag variant deletes the dataset's stored information even if the dataset isn't currently loaded in ERDDAP.
  HardFlags are very useful when you do something that causes a change in how ERDDAP reads and interprets the source data, for example, when you install a new version of ERDDAP or when you have made a change to a dataset's definition in datasets.xml.
- The contents of the flag, badFilesFlag, and hardFlag files are irrelevant. ERDDAP just looks at the file name to get the datasetID.
- In between major dataset reloads, ERDDAP looks continuously for flag, badFilesFlag, and hardFlag files.
- Note that when a dataset is reloaded, all files in the bigParentDirectory/cache/datasetID directory are deleted. This includes .nc and image files that are normally cached for ~15 minutes.
- Note that if the dataset's xml includes active="false", a flag will cause the dataset to be made inactive (if it is active), and in any case, not reloaded.
- Any time ERDDAP runs LoadDatasets to do a major reload (the timed reload controlled by <loadDatasetsMinMinutes>) or a minor reload (as a result of an external or internal flag), ERDDAP reads all <decompressedCacheMaxGB>, <decompressedCacheMaxMinutesOld>, <user>, <requestBlacklist>, <slowDownTroubleMillis>, and <subscriptionEmailBlacklist> tags and switches to the new settings. So you can use a flag as a way to get ERDDAP to notice changes to those tags ASAP.
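Since only the file name matters, setting a flag from a script amounts to creating an empty file. Here is a minimal Python sketch, assuming the script runs on the ERDDAP server with write access to bigParentDirectory and that the flag directories already exist. The function name is hypothetical.

```python
from pathlib import Path

FLAG_DIRS = ("flag", "badFilesFlag", "hardFlag")


def set_flag(big_parent_directory, dataset_id, kind="flag"):
    """Create an empty flag file named after the datasetID and return its path."""
    if kind not in FLAG_DIRS:
        raise ValueError(f"kind must be one of {FLAG_DIRS}")
    flag_path = Path(big_parent_directory) / kind / dataset_id
    # The file contents are irrelevant, so an empty file is enough.
    flag_path.touch()
    return flag_path
```

For example, set_flag(bigParentDirectory, "rPmelTao", kind="hardFlag") asks for a hard reload of that dataset.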
ERDDAP has a web service so that flags can be set via URLs.
- For example,
  https://coastwatch.pfeg.noaa.gov/erddap/setDatasetFlag.txt?datasetID=rPmelTao&flagKey=123456789
  (that's a fake flagKey) will set a flag for the rPmelTao dataset.
- There is a different flagKey for each datasetID.
- Administrators can see a list of flag URLs for all datasets by looking at the bottom of their Daily Report email.
- Administrators should treat these URLs as confidential, since they give someone the right to reset a dataset at will.
- If you think the flagKeys have fallen into the hands of someone who is abusing them, you can change <flagKeyKey> in setup.xml and restart ERDDAP to force ERDDAP to generate and use a different set of flagKeys.
- If you change <flagKeyKey>, delete all of the old subscriptions (see the list in your Daily Report) and remember to send the new URLs to the people who you do want to have them.
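A script can assemble the flag URL from its parts. This sketch follows the pattern of the example URL above; the function name is hypothetical and the flagKey shown is the same fake value.

```python
from urllib.parse import urlencode


def build_flag_url(erddap_url, dataset_id, flag_key):
    """Build a setDatasetFlag.txt URL for one dataset."""
    query = urlencode({"datasetID": dataset_id, "flagKey": flag_key})
    return f"{erddap_url.rstrip('/')}/setDatasetFlag.txt?{query}"


url = build_flag_url(
    "https://coastwatch.pfeg.noaa.gov/erddap", "rPmelTao", "123456789"
)
```

Requesting that URL (for example, with urllib.request.urlopen or a command-line tool like curl) sets the flag. Because the URL contains the flagKey, keep such scripts as confidential as the URLs themselves.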
The flag system can serve as the basis for a more efficient mechanism for telling ERDDAP when to reload a dataset. For example, you could set a dataset's <reloadEveryNMinutes> to a large number (e.g., 10080 = 1 week). Then, when you know the dataset has changed (perhaps because you added a file to the dataset's data directory), set a flag so that the dataset is reloaded as soon as possible. Flags are usually seen quickly. But if the LoadDatasets thread is already busy, it may be a while before it is available to act on the flag. But the flag system is much more responsive and much more efficient than setting <reloadEveryNMinutes> to a small number.
Force Dataset Removal
If a dataset is active in ERDDAP and you want to deactivate it temporarily or permanently:
1. In datasets.xml for the dataset, set active="false" in the dataset tag.
2. Wait for ERDDAP to remove the dataset during the next major reload or set a flag for the dataset to tell ERDDAP to notice this change as soon as possible. When you do this, ERDDAP doesn't throw out any information it may have stored about the dataset and certainly doesn't do anything to the actual data.
3. Then you can leave the active="false" dataset in datasets.xml or remove it.
When Are Datasets Reloaded?
A thread called RunLoadDatasets is the master thread that controls when datasets are reloaded. RunLoadDatasets loops forever:
1. RunLoadDatasets notes the current time.
2. RunLoadDatasets starts a LoadDatasets thread to do a "majorLoad". You can see information about the current/previous majorLoad at the top of your ERDDAP's
  /erddap/status.html page (for example, status page example).
  1. LoadDatasets makes a copy of datasets.xml.
  2. LoadDatasets reads through the copy of datasets.xml and, for each dataset, sees if the dataset needs to be (re)loaded or removed.
    - If a flag file exists for this dataset, the file is deleted and the dataset is removed if active="false" or (re)loaded if active="true" (regardless of the dataset's age).
    - If the dataset's datasets.xml chunk has active="false" and the dataset is currently loaded (active), it is unloaded (removed).
    - If the dataset has active="true" and the dataset isn't already loaded, it is loaded.
    - If the dataset has active="true" and the dataset is already loaded, the dataset is reloaded if the dataset's age (time since last load) is greater than its <reloadEveryNMinutes> (default = 10080 minutes), otherwise, the dataset is left alone.
  3. LoadDatasets finishes.
  The RunLoadDatasets thread waits for the LoadDatasets thread to finish. If LoadDatasets takes longer than loadDatasetsMinMinutes (as specified in setup.xml), RunLoadDatasets interrupts the LoadDatasets thread. Ideally, LoadDatasets notices the interrupt and finishes. But if it doesn't notice the interrupt within a minute, RunLoadDatasets calls loadDatasets.stop(), which is undesirable.
3. While the time since the start of the last majorLoad is less than loadDatasetsMinMinutes (as specified in setup.xml, e.g., 15 minutes), RunLoadDatasets repeatedly looks for flag files in the bigParentDirectory/flag directory. If one or more flag files are found, they are deleted, and RunLoadDatasets starts a LoadDatasets thread to do a "minorLoad" (majorLoad=false). You can't see minorLoad information on your ERDDAP's /erddap/status.html page.
  1. LoadDatasets makes a copy of datasets.xml.
  2. LoadDatasets reads through the copy of datasets.xml and, for each dataset for which there was a flag file:
    - If the dataset's datasets.xml chunk has active="false" and the dataset is currently loaded (active), it is unloaded (removed).
    - If the dataset has active="true", the dataset is (re)loaded, regardless of its age.
    Non-flagged datasets are ignored.
  3. LoadDatasets finishes.
4. RunLoadDatasets goes back to step 1.
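The per-dataset decision made during a majorLoad (step 2 above) can be summarized as a small function. This is a paraphrase of the rules listed above, not ERDDAP's actual code. The function name and the returned labels are hypothetical.

```python
def major_load_action(active, loaded, flagged, age_minutes,
                      reload_every_n_minutes=10080):
    """Return "remove", "reload", or "skip" for one dataset in a majorLoad."""
    if not active:
        # active="false": unload the dataset if it is currently loaded.
        return "remove" if loaded else "skip"
    if flagged or not loaded:
        # A flag forces a (re)load regardless of age;
        # a dataset that isn't loaded yet is loaded.
        return "reload"
    # Already loaded and not flagged: reload only if older than the limit.
    return "reload" if age_minutes > reload_every_n_minutes else "skip"
```

A minorLoad applies the same rules, but only to the datasets that had a flag file.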
Notes:
- Startup
  When you restart ERDDAP, every dataset with active="true" is loaded.
- Cache
  When a dataset is (re)loaded, its cache (including any data response files and/or image files) is emptied.
- Lots of Datasets
  If you have a lot of datasets and/or one or more datasets are slow to (re)load, a LoadDatasets thread may take a long time to finish its work, perhaps even longer than loadDatasetsMinMinutes.
- One LoadDatasets Thread
  There is never more than one LoadDatasets thread running at once. If a flag is set when LoadDatasets is already running, the flag probably won't be noticed or acted on until that LoadDatasets thread finishes running. You might say: "That's stupid. Why don't you just start a bunch of new threads to load datasets?" But if you have lots of datasets which get data from one remote server, even one LoadDatasets thread will put substantial stress on the remote server. The same is true if you have lots of datasets which get data from files on one RAID. There are rapidly diminishing returns from having more than one LoadDatasets thread.
- Flag = ASAP
  Setting a flag just signals that the dataset should be (re)loaded as soon as possible, not necessarily immediately. If no LoadDatasets thread is currently running, the dataset will start to be reloaded within a few seconds. But if a LoadDatasets thread is currently running, the dataset probably won't be reloaded until after that LoadDatasets thread is finished.
- Flag File Deleted
  In general, if you put a flag file in the bigParentDirectory/flag directory (by visiting the dataset's flagUrl or putting an actual file there), the dataset will usually be reloaded very soon after that flag file is deleted.
- Flag versus Small reloadEveryNMinutes
  If you have some external way of knowing when a dataset needs to be reloaded and if it is convenient for you, the best way to make sure that a dataset is always up-to-date is to set its reloadEveryNMinutes to a large number (10080?) and set a flag (via a script?) whenever it needs to be reloaded. That is the system that EDDGridFromErddap and EDDTableFromErddap use to receive messages that the dataset needs to be reloaded.
- Look in log.txt
  Lots of relevant information is written to the bigParentDirectory/logs/log.txt file. If things aren't working as you expect, looking at log.txt lets you diagnose the problem by finding out exactly what ERDDAP did.
  - Search for "majorLoad=true" for the start of major LoadDatasets threads.
  - Search for "majorLoad=false" for the start of minor LoadDatasets threads.
  - Search for a given dataset's datasetID for information about it being (re)loaded or queried.
Cached Responses
In general, ERDDAP doesn't cache (store) responses to user requests. The rationale was that most requests would be slightly different so the cache wouldn't be very effective. The biggest exceptions are requests for image files (which are cached since browsers and programs like Google Earth often re-request images) and requests for .nc files (because they can't be created on-the-fly). ERDDAP stores each dataset's cached files in a different directory: bigParentDirectory/cache/datasetID since a single cache directory might have a huge number of files which might become slow to access.
Files are removed from the cache for any of the following reasons:
- All files in this cache are deleted when ERDDAP is restarted.
- Periodically, any file more than <cacheMinutes> old (as specified in setup.xml) will be deleted. Removing files in the cache based on age (not Least-Recently-Used) ensures that files won't stay in the cache very long. Although it might seem like a given request should always return the same response, that isn't true. For example, a tabledap request which includes &time>someTime will change if new data arrives for the dataset. And a griddap request which includes [last] for the time dimension will change if new data arrives for the dataset.
- Images showing error conditions are cached, but only for a few minutes (it's a difficult situation).
- Every time a dataset is reloaded, all files in that dataset's cache are deleted. Because requests may be for the "last" index in a gridded dataset, files in the cache may become invalid when a dataset is reloaded.
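To make the age-based rule concrete, here is a sketch that lists the files in a cache directory that are older than a given number of minutes, judged by modification time. It only illustrates the idea. It is not how ERDDAP implements its cache cleanup, and the function name is hypothetical.

```python
import time
from pathlib import Path


def expired_cache_files(cache_dir, cache_minutes, now=None):
    """Return the files under cache_dir last modified more than cache_minutes ago."""
    now = time.time() if now is None else now
    cutoff = now - cache_minutes * 60
    return sorted(
        path
        for path in Path(cache_dir).rglob("*")
        if path.is_file() and path.stat().st_mtime < cutoff
    )
```

Note that the test is on the file's age, not on when it was last requested, which matches the age-based (not Least-Recently-Used) policy described above.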
Stored Dataset Information
For all types of datasets, ERDDAP gathers lots of information when a dataset is loaded and keeps that in memory. This allows ERDDAP to respond very quickly to searches, requests for lists of datasets, and requests for information about a dataset.
For a few types of datasets (notably EDDGridCopy, EDDTableCopy, EDDGridFromXxxFiles, and EDDTableFromXxxFiles), ERDDAP stores on disk some information about the dataset that is reused when the dataset is reloaded. This greatly speeds the reloading process.
- Some of the dataset information files are human-readable .json files and are stored in bigParentDirectory/dataset/last2LettersOfDatasetID/datasetID .
- ERDDAP only deletes these files in unusual situations, e.g., if you add or delete a variable from the dataset's datasets.xml chunk.
- Most changes to a dataset's datasets.xml chunk (e.g., changing a global attribute or a variable attribute) don't necessitate that you delete these files. A regular dataset reload will handle these types of changes. You can tell ERDDAP to reload a dataset ASAP by setting a flag for the dataset.
- Similarly, the addition, deletion, or change of data files will be handled when ERDDAP reloads a dataset. But ERDDAP will notice this type of change soon and automatically if the dataset is using the <updateEveryNMillis> system.
- It should only rarely be necessary for you to delete these files. The most common situation where you need to force ERDDAP to delete the stored information (because it is out-of-date/incorrect and won't be automatically fixed by ERDDAP) is when you make changes to the dataset's datasets.xml chunk that affect how ERDDAP interprets data in the source data files, for example, changing the time variable's format string.
- To delete a dataset's stored information files from an ERDDAP that is running (even if the dataset isn't currently loaded), set a hardFlag for that dataset. Remember that if a dataset is an aggregation of a large number of files, reloading the dataset may take considerable time.
- To delete a dataset's stored information files when ERDDAP isn't running, run DasDds for that dataset (which is easier than figuring in which directory the info is located and deleting the files by hand). Remember that if a dataset is an aggregation of a large number of files, reloading the dataset may take considerable time.
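If you just want to inspect the stored information files, the directory can be derived from the datasetID using the layout given above. The function name and the example path are hypothetical. Rejecting datasetIDs shorter than two characters is a choice made for this sketch, not documented ERDDAP behavior.

```python
import posixpath


def dataset_info_dir(big_parent_directory, dataset_id):
    """Return bigParentDirectory/dataset/last2LettersOfDatasetID/datasetID."""
    if len(dataset_id) < 2:
        raise ValueError("this sketch expects a datasetID of 2+ characters")
    return posixpath.join(
        big_parent_directory, "dataset", dataset_id[-2:], dataset_id
    )
```

For example, dataset_info_dir("/data/erddap", "rPmelTao") gives "/data/erddap/dataset/ao/rPmelTao". To delete the files, a hardFlag or DasDds (as described above) remains the safer route.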
Memory Status
ERDDAP shouldn't ever crash or freeze up. If it does, one of the most likely causes is insufficient memory. You can monitor memory usage by looking at the status.html web page, which includes a line like
```
0 gc calls, 0 requests shed, and 0 dangerousMemoryEmails since last major LoadDatasets
```
(those are progressively more serious events)
and MB inUse and gc Calls columns in the table of statistics. You can tell how memory-stressed your ERDDAP is by watching these numbers. Higher numbers indicate more stress.
- MB inUse should always be less than half of the -Xmx memory setting. Larger numbers are a bad sign.
- gc calls indicates the number of times ERDDAP called the garbage collector to try to alleviate high memory usage. If this gets to be >100, that's a sign of serious trouble.
- shed indicates the number of incoming requests that were shed (with HTTP error number 503, Service Unavailable) because memory use was already too high. Ideally, no requests should be shed. It's okay if a few requests are shed, but a sign of serious trouble if many are shed.
- dangerousMemoryEmails - If memory use becomes dangerously high, ERDDAP sends an email to the email addresses listed in <emailEverythingTo> (in setup.xml) with a list of the active user requests. As the email says, please forward these emails to Chris.John at noaa.gov so we can use the information to improve future versions of ERDDAP.
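If you want to track these counters over time, the summary line shown above can be parsed with a regular expression. This sketch assumes the line has exactly the wording shown above. The function name and dictionary keys are hypothetical.

```python
import re

MEMORY_LINE = re.compile(
    r"(\d+) gc calls, (\d+) requests shed, and (\d+) dangerousMemoryEmails"
)


def parse_memory_line(text):
    """Extract the three counters from the status page text, or return None."""
    match = MEMORY_LINE.search(text)
    if match is None:
        return None
    gc_calls, shed, emails = (int(group) for group in match.groups())
    return {
        "gc_calls": gc_calls,
        "requests_shed": shed,
        "dangerous_memory_emails": emails,
    }
```

A script could fetch status.html periodically, pass its text to parse_memory_line, and alert you when any counter is unexpectedly high.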
If your ERDDAP is memory-stressed:
- Consider allocating more of your server's memory to ERDDAP by changing the Tomcat -Xmx memory setting.
- If you've already allocated as much memory as you can to ERDDAP via -Xmx, consider buying more memory for your server. Memory is cheap (compared to the price of a new server or your time)! Then increase -Xmx.
- In datasets.xml, set <nGridThreads> to 1, set <nTableThreads> to 1, and set <ipAddressMaxRequestsActive> to 1.
- Look at the requests in log.txt for inefficient or troublesome (but legitimate) requests. Add their IP addresses to <requestBlacklist> in datasets.xml. The blacklist error message includes the ERDDAP administrator's email address with the hope that those users will contact you so that you can work with them to use ERDDAP more efficiently. It's good to keep a list of IP addresses you blacklist and why, so that you can work with the users if they contact you.
- Look at the requests in log.txt for requests from malicious users. Add their IP addresses to <requestBlacklist> in datasets.xml. If similar requests are coming from multiple similar IP addresses, you can use some who-is services (e.g., https://www.whois.com/whois/) to find out the range of IP addresses from that source and blacklist the entire range. See the <requestBlacklist> documentation.
OutOfMemoryError
When you set up ERDDAP, you specify the maximum amount of memory that Java can use via the -Xmx setting. If ERDDAP ever needs more memory than that, it will throw a java.lang.OutOfMemoryError. ERDDAP does a lot of checking to enable it to handle that error gracefully (e.g., so a troublesome request will fail, but the system retains its integrity). But sometimes, the error damages system integrity and you have to restart ERDDAP. Hopefully, that is rare.
The quick and easy solution to an OutOfMemoryError is to increase the -Xmx setting, but you shouldn't ever increase the -Xmx setting to more than 80% of the physical memory in the server (e.g., for a 10GB server, don't set -Xmx above 8GB). Memory is relatively cheap, so it may be a good option to increase the memory in the server. But if you have maxed out the memory in the server or for other reasons can't increase it, you need to deal more directly with the cause of the OutOfMemoryError.
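The 80% guideline is simple arithmetic, sketched here with a hypothetical function name:

```python
def max_recommended_xmx_gb(physical_memory_gb):
    """Return 80% of the server's physical memory, the upper limit for -Xmx."""
    return physical_memory_gb * 80 / 100


limit_for_10gb_server = max_recommended_xmx_gb(10)  # 8.0, as in the example above
```

The remaining 20% is left for the operating system and other processes.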
If you look in the log.txt file to see what ERDDAP was doing when the error arose, you can usually get a good clue as to the cause of the OutOfMemoryError. There are lots of possible causes, including:
- A single huge data file can cause the OutOfMemoryError, notably, huge ASCII data files. If this is the problem, it should be obvious because ERDDAP will fail to load the dataset (for tabular datasets) or read data from that file (for gridded datasets). The solution, if feasible, is to split the file into multiple files. Ideally, you can split the file into logical chunks. For example, if the file has 20 months' worth of data, split it into 20 files, each with 1 month's worth of data. But there are advantages even if the main file is split up arbitrarily. This approach has multiple benefits: a) This will reduce the memory needed to read the data files to 1/20th, because only one file is read at a time. b) Often, ERDDAP can deal with requests much faster because it only has to look in one or a few files to find the data for a given request. c) If data collection is ongoing, then the existing 20 files can remain unchanged, and you only need to modify one, small, new file to add the next month's worth of data to the dataset.
- A single huge request can cause the OutOfMemoryError. In particular, some of the orderBy options have the entire response in memory for a second (e.g., to do a sort). If the response is huge, it can lead to the error. There will always be some requests which are, in various ways, too big. You can solve the problem by increasing the -Xmx setting. Or, you can encourage the user to make a series of smaller requests.
- It is unlikely that a large number of files would cause the file index that ERDDAP creates to be so large that that file would cause the error. If we assume that each file uses 300 bytes, then 1,000,000 files would only take up 300MB. But datasets with a huge number of data files cause other problems for ERDDAP, notably, it takes a long time for ERDDAP to open all those data files when responding to a user request for data. In this case, the solution may be to aggregate the files so that there are fewer data files. For tabular datasets, it is often great if you save the data from the current dataset in CF Discrete Sampling Geometries (DSG) Contiguous Ragged Array data files (request .ncCF files from ERDDAP) and then make a new dataset. These files can be handled very efficiently with ERDDAP's EDDTableFromNcCFFiles. If they are logically organized (each with data for a chunk of space and time), ERDDAP can extract data from them very quickly.
- For tabular datasets that use the <subsetVariables> attribute, ERDDAP makes a table of unique combinations of the values of those variables. For huge datasets or when <subsetVariables> is misconfigured, this table can be large enough to cause OutOfMemoryErrors. The solution is to remove variables from the list of <subsetVariables> for which there are a large number of values, or remove variables as needed until the size of that table is reasonable. The parts of ERDDAP that use the subsetVariables system don't work well (e.g., web pages load very slowly) when there are more than 100,000 rows in that table.
- It's always possible that several simultaneous large requests (on a really busy ERDDAP) can combine to cause memory trouble. For example, 8 requests, each using 1GB, would cause problems for an -Xmx=8GB setup. But it is rare that each request would be at the peak of its memory use simultaneously. And you would easily be able to see that your ERDDAP is really busy with big requests. But, it's possible. It's hard to deal with this problem other than by increasing the -Xmx setting.
- There are other scenarios. If you look at the log.txt file to see what ERDDAP was doing when the error arose, you can usually get a good clue as to the cause. In most cases, there is a way to minimize that problem (see above), but sometimes you just need more memory and a higher -Xmx setting.
Too Many Open Files
Starting with ERDDAP v2.12, ERDDAP has a system to monitor the number of open files (which includes sockets and some other things, not just files) in Tomcat on Linux computers. If some files mistakenly never get closed (a "resource leak"), the number of open files may increase until it exceeds the maximum allowed by the operating system and numerous really bad things happen. So now, on Linux computers (because the information isn't available for Windows):
- There is an "Open Files" column on the far right of the status.html web page showing the percent of max files open. On Windows, it just shows "?".
- When ERDDAP generates that information at the end of each major dataset reload, it will print to the log.txt file:
  openFileCount=current of max=max %=percent
- If the percentage is >50%, an email is sent to the ERDDAP administrator and the emailEverythingTo email addresses.
If the percentage is 100%, ERDDAP is in terrible trouble. Don't let this happen.
If the percentage is >75%, ERDDAP is close to terrible trouble. That's not okay.
If the percentage is >50%, it is very possible that a spike will cause the percentage to hit 100.
If the percentage is ever >50%, you should:
- Increase the maximum number of open files allowed by either:
  - Making these changes each time before you start tomcat (put them in the Tomcat startup.sh file?):
    ulimit -Hn 16384
    ulimit -Sn 16384
  - Or making a permanent change by editing (as root) /etc/security/limits.conf and adding the lines:
    tomcat soft nofile 16384
    tomcat hard nofile 16384
    Those commands assume that the user running Tomcat is called "tomcat".
    On many Linux variants, you have to restart the server to apply those changes.
  For both options, the "16384" above is an example. You choose the number that you think is best.
- Restart ERDDAP. The operating system will close any open files.
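If you want to watch for this condition yourself, a small script can pull the numbers out of log.txt and apply the thresholds described above. This is only a sketch: the regular expression assumes the line looks exactly like the pattern shown above with integer values (e.g., "openFileCount=3482 of max=4096 %=85"), and the function name and status strings are made up for illustration.

```python
import re

# Assumed line format, modeled on the pattern shown above.
OPEN_FILES = re.compile(r"openFileCount=(\d+) of max=(\d+) %=(\d+)")

def open_file_status(log_line):
    """Classify one log line; returns None if the line has no open-file info."""
    match = OPEN_FILES.search(log_line)
    if match is None:
        return None
    percent = int(match.group(3))
    if percent >= 100:
        return "terrible trouble"
    if percent > 75:
        return "close to terrible trouble"
    if percent > 50:
        return "raise the limit and restart"
    return "ok"

print(open_file_status("openFileCount=3482 of max=4096 %=85"))
```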
Unusual Activity: >25% of requests failed
As part of every reloadDatasets, which is usually every 15 minutes, ERDDAP looks at the percentage of requests which failed since the last reloadDatasets. If it is >25%, ERDDAP sends an email to the ERDDAP administrator with the subject "Unusual Activity: >25% of requests failed". That email includes a tally near the bottom entitled "Requester's IP Address (Failed) (since last Major LoadDatasets)". Search for that. It tells you the IP address of the computers making the most failed requests. You can then search for those IP addresses in the [bigParentDirectory]/logs/log.txt file and see what type of requests they are making.
You can use the user's IP number (for example, with https://whatismyipaddress.com/ip-lookup) to try to figure out who or what the user is. Sometimes that will tell you pretty accurately who the user is (e.g., it's a search engine's web crawler). Most of the time it just gives you a clue (e.g., it's an amazonaws computer, it's from some university, it's someone in some specific city).
By looking at the actual request, the IP number, and the error message (all from log.txt) for a series of errors, you can usually figure out basically what is going wrong. In my experience, there are four common causes of lots of failed requests:
1) The requests are malicious (e.g., looking for security weaknesses, or making requests and then cancelling them before they are completed). You should use <requestBlacklist> in datasets.xml to blacklist those IP addresses.
2) A search engine is naively trying the URLs listed in ERDDAP web pages and ISO 19115 documents. For example, there are many places which list the base OPeNDAP URL, for example, https://coastwatch.pfeg.noaa.gov/erddap/griddap/jplMURSST, to which the user is supposed to add a file type (e.g., .das, .dds, .html). But the search engine doesn't know this. And the request to the base URL fails. A related situation is when the search engine generates bizarre requests or tries to fill out forms in order to get to "hidden" web pages. But the search engines often do a bad job of this, leading to failures. The solution is: create a robots.txt file.
3) Some user is running a script that is repeatedly asking for something that isn't there. Maybe it is a dataset that used to exist, but is gone now (temporarily or permanently). Scripts often don't expect this and so don't deal with it intelligently. So the script just keeps making requests and the requests keep failing. If you can guess who the user is (from the IP number above), contact them and tell them the dataset is no longer available and ask them to change their script.
4) Something is really wrong with some dataset. Usually, ERDDAP will make the troubled dataset inactive. Sometimes it doesn't, so all the requests to it just lead to errors. If so, fix the problem with the dataset or (if you can't) set the dataset to active="false". Of course, this may lead to problem #2.
Sometimes the errors aren't so bad, notably, if ERDDAP can detect the error and respond very quickly (<=1ms). So you may decide to take no action.
If all else fails, there is a universal solution: add the user's IP number to the <requestBlacklist>. This isn't as bad or as drastic an option as it might seem. The user will then get an error message saying s/he has been blacklisted and telling them your (the ERDDAP administrator's) email address. Sometimes the user will contact you and you can resolve the problem. Sometimes the user doesn't contact you and you will see the exact same behavior coming from a different IP number the next day. Blacklist the new IP number and hope that they will eventually get the message. (Or this is your Groundhog Day, from which you will never escape. Sorry.)
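The arithmetic behind the "Unusual Activity" email is simple enough to reproduce when you are digging through a log by hand. The sketch below assumes you have already reduced the relevant log entries to (IP address, succeeded) pairs; that input form, the function name, and the example addresses are hypothetical, not something ERDDAP produces.

```python
from collections import Counter

FAILURE_THRESHOLD = 0.25  # the >25% trigger described above

def summarize(requests):
    """Return (failure rate, the three IP addresses with the most failures)."""
    requests = list(requests)
    failed = Counter(ip for ip, succeeded in requests if not succeeded)
    rate = sum(failed.values()) / len(requests) if requests else 0.0
    return rate, failed.most_common(3)

# Hypothetical tally: 10 requests, 4 of which failed.
SAMPLE = (
    [("203.0.113.7", False)] * 3
    + [("198.51.100.2", False)]
    + [("192.0.2.10", True)] * 6
)

rate, worst = summarize(SAMPLE)
print(rate > FAILURE_THRESHOLD, worst)
```

The address at the head of the list is the one to search for in log.txt first.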
robots.txt
The search engine companies use web crawlers (e.g., GoogleBot) to examine all of the pages on the web to add the content to the search engines. For ERDDAP, that is basically good. ERDDAP has lots of links between pages, so the crawlers will find all of the web pages and add them to the search engines. Then, users of the search engines will be able to find datasets on your ERDDAP.
Unfortunately, some web crawlers (e.g., GoogleBot) are now filling out and submitting forms in order to find additional content. For web commerce sites, this is great. But this is terrible for ERDDAP because it just leads to an infinite number of undesirable and pointless attempts to crawl the actual data. This can lead to more requests for data than from all other users combined. And it fills the search engine with goofy, pointless subsets of the actual data.
To tell the web crawlers to stop filling out forms and, more generally, not to look at web pages they don't need to look at, you need to create a text file called robots.txt in the root directory of your website's document hierarchy so that it can be viewed by anyone as, e.g., http://www.your.domain/robots.txt .
If you are creating a new robots.txt file, this is a good start:
```
User-Agent: *
Disallow: /erddap/files/ 
Disallow: /files/ 
Disallow: /images/ 
Disallow: /*?
Disallow: /*?*
Disallow: /*.asc*
Disallow: /*.csv*
Disallow: /*.dods*
Disallow: /*.esriAscii*
Disallow: /*.esriCsv*
Disallow: /*.geoJson*
Disallow: /*.htmlTable*
Disallow: /*.json*
Disallow: /*.mat*
Disallow: /*.nc*
Disallow: /*.odvTxt*
Disallow: /*.tsv*
Disallow: /*.xhtml*
Disallow: /*.geotif*
Disallow: /*.itx*
Disallow: /*.kml*
Disallow: /*.pdf*
Disallow: /*.png*
Disallow: /*.large*
Disallow: /*.small*
Disallow: /*.transparentPng*
Sitemap: http://your.institutions.url/erddap/sitemap.xml
```
(But replace your.institutions.url with your ERDDAP's base URL.)
It may take a few days for the search engines to notice and for the changes to take effect.
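Before you publish the file, you can sanity-check the simple prefix rules with Python's standard urllib.robotparser module. This is a minimal sketch: it only exercises the first three Disallow lines, because (as far as we know) urllib.robotparser does plain prefix matching and does not interpret the * wildcards used in the other rules. The dataset and file names in the paths are made up.

```python
from urllib.robotparser import RobotFileParser

# Only the prefix rules from the file above; wildcard rules are left out on purpose.
RULES = """\
User-Agent: *
Disallow: /erddap/files/
Disallow: /files/
Disallow: /images/
""".splitlines()

parser = RobotFileParser()
parser.parse(RULES)

def allowed(path):
    """True if a generic crawler may fetch this path under the rules above."""
    return parser.can_fetch("*", "http://www.your.domain" + path)

blocked = allowed("/erddap/files/someDataset/file.nc")
open_page = allowed("/erddap/info/index.html")
print(blocked, open_page)
```

How a real crawler treats the wildcard rules depends on the crawler, so the search engine's own robots.txt testing tool is still the final check.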
sitemap.xml
As the https://www.sitemaps.org website says:
Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs on the site) so that search engines can more intelligently crawl the site.
Web crawlers usually discover pages from links within the site and from other sites. Sitemaps supplement this data to allow crawlers that support Sitemaps to pick up all URLs in the Sitemap and learn about those URLs using the associated metadata. Using the Sitemap protocol does not guarantee that web pages are included in search engines, but provides hints for web crawlers to do a better job of crawling your site.
Actually, since ERDDAP is RESTful, search engine spiders can easily crawl your ERDDAP. But they tend to do it more often (daily!) than necessary (monthly?).
- Given that each search engine may be crawling your entire ERDDAP every day, this can lead to a lot of unnecessary requests.
- So ERDDAP generates a sitemap.xml file for your ERDDAP which tells search engines that your ERDDAP only needs to be crawled every month.
- You should add a reference to ERDDAP's sitemap.xml to your robots.txt file:
  Sitemap: http://www.yoursite.org/erddap/sitemap.xml
- If that doesn't seem to be getting the message to the crawlers, you can tell the various search engines about the sitemap.xml file by visiting these URLs (but change www.yoursite.org to your ERDDAP's URL):
  - https://www.bing.com/webmaster/ping.aspx?siteMap=http://www.yoursite.org/erddap/sitemap.xml
  - https://www.google.com/ping?sitemap=http://www.yoursite.org/erddap/sitemap.xml
  (I think) you just need to ping each search engine once, for all time. The search engines will then detect changes to sitemap.xml periodically.
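To make the idea concrete, here is a sketch of the kind of entry a sitemap file contains, built with Python's standard xml.etree.ElementTree. It is an illustration of the Sitemap protocol (the urlset, url, loc, and changefreq elements and the sitemaps.org namespace), not a reproduction of the file ERDDAP generates; the function name and the example URL are assumptions.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls, changefreq="monthly"):
    """Return sitemap XML that asks crawlers to revisit each URL at the given frequency."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "changefreq").text = changefreq
    return ET.tostring(urlset, encoding="unicode")

xml_text = build_sitemap(["http://www.yoursite.org/erddap/info/index.html"])
print(xml_text)
```

The changefreq value is only a hint; crawlers are free to ignore it.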
Data Dissemination / Data Distribution Networks: Push and Pull Technology
- Normally, ERDDAP acts as an intermediary: it takes a request from a user; gets data from a remote data source; reformats the data; and sends it to the user.
- Pull Technology: ERDDAP also has the ability to actively get all of the available data from a remote data source and store a local copy of the data.
- Push Technology: By using ERDDAP's subscription services, other data servers can be notified as soon as new data is available so that they can request the data (by pulling the data).
- ERDDAP's EDDGridFromErddap and EDDTableFromErddap use ERDDAP's subscription services and flag system so that they will be notified immediately when new data is available.
- You can combine these to great effect: if you wrap an EDDGridCopy around an EDDGridFromErddap dataset (or wrap an EDDTableCopy around an EDDTableFromErddap dataset), ERDDAP will automatically create and maintain a local copy of another ERDDAP's dataset.
- Because the subscription services work as soon as new data is available, push technology disseminates data very quickly (within seconds).
This architecture puts each ERDDAP administrator in charge of determining where the data for his/her ERDDAP comes from.
- Other ERDDAP administrators can do the same. There is no need for coordination between administrators.
- If many ERDDAP administrators link to each other's ERDDAPs, a data distribution network is formed.
- Data will be quickly, efficiently, and automatically disseminated from data sources (ERDDAPs and other servers) to data redistribution sites (ERDDAPs) anywhere in the network.
- A given ERDDAP can be both a source of data for some datasets and a redistribution site for other datasets.
- The resulting network is roughly similar to data distribution networks set up with programs like Unidata's IDD/LDM, but less rigidly structured.
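The combination of push (a notification) and pull (fetching the data) can be modeled in a few lines. The toy classes below are hypothetical and run entirely in memory; in a real installation the notification is a subscription or flag request between ERDDAPs and the pull is a data request, neither of which is shown here.

```python
class Source:
    """A data server that notifies subscribers when a dataset changes (push)."""

    def __init__(self):
        self.data = {}
        self.subscribers = {}  # dataset id -> list of callbacks

    def subscribe(self, dataset_id, callback):
        self.subscribers.setdefault(dataset_id, []).append(callback)

    def publish(self, dataset_id, values):
        self.data[dataset_id] = list(values)
        for notify in self.subscribers.get(dataset_id, []):
            notify(dataset_id)  # tell the subscriber, but do not send the data


class Mirror:
    """A redistribution site that keeps a local copy by pulling after each notice."""

    def __init__(self, source):
        self.source = source
        self.copy = {}

    def follow(self, dataset_id):
        self.source.subscribe(dataset_id, self.pull)

    def pull(self, dataset_id):
        self.copy[dataset_id] = list(self.source.data[dataset_id])


source = Source()
mirror = Mirror(source)
mirror.follow("sst")
source.publish("sst", [14.1, 14.3])
print(mirror.copy)
```

Note that the source never needs to know what the mirror does with the notice, which is why no coordination between administrators is required.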
Security, Authentication, and Authorization
By default, ERDDAP runs as an entirely public server (using http and/or https) with no login (authentication) system and no restrictions to data access (authorization).
If you want to restrict access to some or all datasets to some users, you can use ERDDAP's built-in security system. When the security system is in use:
- ERDDAP uses role-based access control.
  - The ERDDAP administrator defines users with the <user> tag in datasets.xml. Each user has a username, a password (if authentication=custom), and one or more roles.
  - The ERDDAP administrator defines which roles have access to a given dataset via the <accessibleTo> tag in datasets.xml for any dataset that shouldn't have public access.
- The user's login status (and a link to log in/out) will be shown at the top of every web page. (But a logged in user will appear to ERDDAP to be not logged in if they use an http URL.)
- If the <baseUrl> that you specify in your setup.xml is an http URL, users who are not logged in may use ERDDAP's http URLs. If <baseHttpsUrl> is also specified, users who are not logged in can also use https URLs.
- HTTPS Only -- If the <baseUrl> that you specify in your setup.xml is an https URL, users who are not logged in are encouraged (not forced) to use ERDDAP's https URLs -- all of the links on ERDDAP web pages will refer to https URLs.
  If you want to force users to use https URLs, add a Redirect permanent line inside the <VirtualHost *:80> section in your Apache's config file (usually httpd.conf), e.g.,
```
<VirtualHost *:80>
  [...]
  ServerName example.com
  Redirect permanent / https://example.com/
</VirtualHost>
```
  If you want, there is an additional method to force the use of https: HTTP Strict Transport Security (HSTS). To use it:
  1. Enable the Apache Headers Module: a2enmod headers
  2. Add the additional header to the HTTPS VirtualHost directive. Max-age is measured in seconds and can be set to some long value.
```
<VirtualHost *:443>
  # Guarantee HTTPS for 1 Year including Sub Domains
  Header always set Strict-Transport-Security "max-age=31536000; includeSubDomains"
  [...]
</VirtualHost>
```
  Please note that this header is only valid on an HTTPS VirtualHost.
  A reason not to force users to use https URLs is: the underlying SSL/TLS link takes time to establish and then takes time to encrypt and decrypt all information transmitted between the user and the server. But some institutions require https only.
- Users who are logged in MUST use ERDDAP's https URLs. If they use http URLs, they appear to ERDDAP to be not logged in. This ensures the privacy of the communications and helps prevent session hijacking and sidejacking.
- Anyone who isn't logged in can access and use the public datasets. By default, private datasets don't appear in lists of datasets if a user isn't logged in. If the administrator has set setup.xml's <listPrivateDatasets> to true, they will appear. Attempts to request data from private datasets (if the user knows the URL) will be redirected to the login page.
- Anyone who is logged in will be able to see and request data from any public dataset and any private dataset to which their role allows them access. By default, private datasets to which a user doesn't have access don't appear in lists of datasets. If the administrator has set setup.xml's <listPrivateDatasets> to true, they will appear. Attempts to request data from private datasets to which the user doesn't have access will be redirected to the login page.
- The RSS information for fully private datasets is only available to users (and RSS readers) who are logged in and authorized to use that dataset. This makes RSS not very useful for fully private datasets.
  If a dataset is private but its <graphsAccessibleTo> is set to public, the dataset's RSS is accessible to anyone.
- Email subscriptions can only be set up when a user has access to a dataset. If a user subscribes to a private dataset, the subscription continues to function after the user has logged out.
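The access rules in the list above boil down to a set intersection between a user's roles and a dataset's allowed roles. The sketch below is a simplified model, not ERDDAP's code: the dictionaries, names, and roles are hypothetical, and it ignores details such as <graphsAccessibleTo> and the redirect to the login page.

```python
# Hypothetical users (username -> roles) and datasets (id -> allowed roles, None = public).
USERS = {
    "alice@example.com": {"researcher"},
    "bob@example.com": {"intern"},
}
DATASETS = {
    "publicSst": None,
    "cruiseData": {"researcher"},
}

def can_access(username, dataset_id):
    """True if the user (or None for someone not logged in) may request the data."""
    allowed_roles = DATASETS[dataset_id]
    if allowed_roles is None:
        return True
    roles = USERS.get(username, set())  # not logged in or unknown user: no roles
    return bool(roles & allowed_roles)

def visible_datasets(username, list_private=False):
    """Dataset ids shown in lists; list_private mimics <listPrivateDatasets>."""
    return sorted(d for d in DATASETS if list_private or can_access(username, d))

print(visible_datasets("bob@example.com"))
print(visible_datasets("bob@example.com", list_private=True))
```

Listing a private dataset does not grant access to it: can_access still returns False for a user without a matching role.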
To set up the security/authentication/authorization system:
- Do the standard ERDDAP initial setup.
- In setup.xml,
  - Add/change the <authentication> value from nothing to custom (don't use this), email (don't use this), google (recommended), orcid (recommended), or oauth2 (which is google+orcid, recommended). See the comments about these options below.
  - Add/change the <baseHttpsUrl> value.
  - Insert/uncomment &loginInfo; in <startBodyHtml> to display the user's log in/out info at the top of each web page.
- For testing purposes on your personal computer, follow these instructions to configure tomcat to support SSL (the basis for https connections) by creating a keystore with a self-signed certificate and by modifying tomcat/conf/server.xml to uncomment the connector for port 8443. On Windows, you may need to move .keystore from "c:\Users\you\.keystore" to "c:\Users\Default User\.keystore" or "c:\.keystore" (see tomcat/logs/catalina.today.log if the application doesn't load or users can't see the log in page). You can see when the .keystore certificate will expire by examining the certificate when you log in.
  For a publicly accessible server, instead of using a self-signed certificate, it is strongly recommended that you buy and install a certificate signed by a certificate authority, because it gives your clients more assurance that they are indeed connecting to your ERDDAP, not a man-in-the-middle's version of your ERDDAP. Many vendors sell digital certificates. (Search the web.) They are not expensive.
- On Linux computers, if Tomcat is running in Apache, modify the /etc/httpd/conf.d/ssl.conf file to allow HTTPS traffic to/from ERDDAP without requiring the :8443 port number in the URL:
  1. Modify the existing <VirtualHost> tag (if there is one), or add one at the end of the file so that it at least has these lines:
```
<VirtualHost _default_:443>
   SSLEngine on
   SSLProxyEngine On
   ProxyPass /erddap https://localhost:8443/erddap
   ProxyPassReverse /erddap https://localhost:8443/erddap
</VirtualHost>
```
  2. Then restart Apache: /usr/sbin/apachectl -k graceful (but sometimes it is in a different directory).
- In tomcat/conf/server.xml, uncomment the port=8443 <Connector> tag:
```
<Connector port="8443" 
    protocol="org.apache.coyote.http11.Http11NioProtocol"
    maxThreads="150" SSLEnabled="true">
  <SSLHostConfig>
    <Certificate certificateKeystoreFile="conf/localhost-rsa.jks" 
      type="RSA" />
  </SSLHostConfig>
</Connector>
```
  and change the location of the certificateKeystoreFile.
- In datasets.xml, create a <user> tag for each user with username, password (if authentication=custom), and roles information. This is the authorization part of ERDDAP's security system.
- In datasets.xml, add an <accessibleTo> tag to each dataset that shouldn't have public access. <accessibleTo> lets you specify which roles have access to that dataset.
- Restart Tomcat. Trouble? Check the Tomcat logs.
- CHECK YOUR WORK! Any mistake could lead to a security flaw.
- Check that the login page uses https (not http). Attempts to login via http should be automatically redirected to https and port 8443 (although the port number may be hidden via an Apache proxy). You may need to work with your network administrator to allow external web requests to access port 8443 on your server.
- You can change the <user> and <accessibleTo> tags at any time. The changes will be applied at the next regular reload of any dataset, or ASAP if you use a flag.
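As part of checking your work, it can help to confirm that every role named in an <accessibleTo> tag is actually held by some user; a misspelled role silently locks everyone out of a dataset. The sketch below uses Python's standard xml.etree.ElementTree. The fragment is a made-up, stripped-down stand-in for a datasets.xml file, and the comma-separated role lists and the function names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, stripped-down fragment; a real datasets.xml has much more in it.
FRAGMENT = """
<erddapDatasets>
  <user username="alice@example.com" roles="researcher, admin" />
  <dataset type="EDDTableFromFiles" datasetID="cruiseData">
    <accessibleTo>researcher</accessibleTo>
  </dataset>
  <dataset type="EDDTableFromFiles" datasetID="draftData">
    <accessibleTo>reviewer</accessibleTo>
  </dataset>
</erddapDatasets>
"""

def split_roles(text):
    """Split a comma-separated role list (assumed format) into a set."""
    return {role.strip() for role in (text or "").split(",") if role.strip()}

def roles_nobody_holds(xml_text):
    """Map datasetID -> roles in <accessibleTo> that no <user> tag grants."""
    root = ET.fromstring(xml_text)
    granted = set()
    for user in root.iter("user"):
        granted |= split_roles(user.get("roles"))
    problems = {}
    for dataset in root.iter("dataset"):
        missing = split_roles(dataset.findtext("accessibleTo")) - granted
        if missing:
            problems[dataset.get("datasetID")] = sorted(missing)
    return problems

problems = roles_nobody_holds(FRAGMENT)
print(problems)
```

A check like this catches typos only; it says nothing about whether the right people hold each role.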
Authentication (logging in)
If you don't want to allow users to log in, don't specify a value for <authentication> in setup.xml.
If you do want to allow users to log in, you must specify a value for <authentication>. Currently, ERDDAP supports
custom (don't use this),
email (don't use this),
google (recommended),
orcid (recommended), and
oauth2 (recommended) for the authentication method.
If you want to enable logging in, we strongly recommend the google, orcid, or oauth2 options because they free you from storing and handling users' passwords (needed for custom) and are more secure than the email option. Remember that users often use the same password at different sites. So they may be using the same password for your ERDDAP as they do at their bank. That makes their password very valuable -- much more valuable to the user than just the data they are requesting. So you need to do as much as you can to keep the passwords private. That is a big responsibility. The email, google, orcid, and oauth2 options take care of passwords, so you don't have to gather, store, or work with them. So you are freed from that responsibility.
All <authentication> options use a cookie on the user's computer, so the user's browser must be set to allow cookies. If a user is making ERDDAP requests from a computer program (not a browser), cookies and authentication are hard to work with. That's a common problem with all authentication systems. Sorry.
The details of the <authentication> options are:
- custom
  custom is ERDDAP's custom system for letting users log in by entering their User Name and Password in a form on a web page. If a user tries and fails to log in 3 times within 10 minutes, the user is blocked from trying to log in for 10 minutes. This prevents hackers from simply trying millions of passwords until they find the right one.
  This is somewhat secure because the User Name and Password are transmitted via https (not http), but authentication=google, orcid, or oauth2 are better because they free you from having to handle passwords. The custom approach requires you to collect a user's Name and a hash digest of their Password (use your phone! email isn't secure!) and store them in datasets.xml in <user> tags.
  With the custom option, no one can log in until you (the ERDDAP administrator) create a <user> tag for the user, specifying the user's name as the username, the hash digest of their password as the password, and their roles.
  Not Recommended
  Because of the awkwardness of generating and transmitting the hash digest of the user's password and because of the risks associated with ERDDAP holding the hash digests of the passwords, this option is not recommended.
  To increase the security of this option:
  - You MUST make sure that other users on the server (i.e., Linux users, not ERDDAP users) can't read files in the Tomcat directory (especially the datasets.xml file!) or ERDDAP's bigParentDirectory.
    On Linux, as user=tomcat, use:
    chmod -R g-rwx bigParentDirectory
    chmod -R o-rwx bigParentDirectory
    chmod -R g-rwx tomcatDirectory
    chmod -R o-rwx tomcatDirectory
  - Use UEPSHA256 for <passwordEncoding> in setup.xml.
  - Use an as-secure-as-possible method to pass the hash digest of the user's password from the user to the ERDDAP administrator (phone?).
- email
  The email authentication option uses a user's email account to authenticate the user (by sending them an email with a special link that they have to access in order to log in). Unlike other emails that ERDDAP sends, ERDDAP does not write these invitation emails to the email log file because they contain confidential information.
  In theory, this is not very secure, because emails aren't always encrypted, so a bad guy with the ability to intercept emails could abuse this system by using a valid user's email address and intercepting the invitation email.
  In practice, if you set up ERDDAP to use a Google email account to send emails, and if you set it up to use one of the TLS options for the connection, and if the user has a Google email account, this is somewhat secure because the emails are encrypted all the way from ERDDAP to the user.
  To increase the security of this option:
  - Make sure that other users on the server (i.e., Linux users, not ERDDAP users) can't read files in the Tomcat directory or ERDDAP's bigParentDirectory.
    On Linux, as user=tomcat, use:
    chmod -R g-rwx bigParentDirectory
    chmod -R o-rwx bigParentDirectory
    chmod -R g-rwx tomcatDirectory
    chmod -R o-rwx tomcatDirectory
  - Set things up to get end-to-end security for the emails sent from ERDDAP to the users. For example, you could make a Google-centric system by only creating <user> tags for Google-managed email addresses and by setting up your ERDDAP to use a Google email server via a secure/TLS connection: in your setup.xml, use e.g.,
    <emailSmtpHost>smtp.gmail.com</emailSmtpHost>
    <emailSmtpPort>587</emailSmtpPort>
    <emailProperties>mail.smtp.starttls.enable|true</emailProperties>
  Not Recommended
  The email authentication option isn't recommended. Please use the google, orcid, or oauth2 option instead.
  As with the google, orcid, and oauth2 options, email is very convenient for ERDDAP administrators -- you don't ever have to deal with passwords or their hash digests. All you need to create a <user> tag for a user in datasets.xml is the user's email address, which ERDDAP uses as the user's name. (The password attribute isn't used when authentication=email, google, orcid, or oauth2.)
  With the email option, only users that have a <user> tag in datasets.xml can try to log in to ERDDAP by providing their email address and clicking on the link in the email that ERDDAP sends them.
  ERDDAP treats email addresses as case-insensitive. It does this by converting email addresses you enter (in <user> tags) or users enter (on the login form) to their all lowercase version.
  To set up authentication=email:
  1. In your setup.xml, change the <baseHttpsUrl> tag's value.
    For experimenting/working on your personal computer, use
    https://localhost:8443
    For your public ERDDAP, use
    https://your.domain.org:8443
    or without the :8443 if you are using an Apache proxypass so that the port number isn't needed.
  2. In your setup.xml, change the <authentication> tag's value to email:
    <authentication>email</authentication>
  3. In your setup.xml, make sure the email system is set up via all of the <email...> tags, so that ERDDAP can send out emails. If possible, set this up to use a secure connection (SSL / TLS) to the email server.
  4. In your datasets.xml, create <user> tags for each user who will have access to private datasets.
    Use the user's email address as the username in the tag.
    Don't specify the password attribute in the user tag.
  5. Restart ERDDAP so that the changes to setup.xml and datasets.xml take effect.
- google, orcid, and oauth2 (recommended)
  All three of these options are the recommended ERDDAP authentication options. They are all the most secure options. The other options have significantly weaker security.
  - The google authentication option uses Google Sign-In, which is an implementation of the OAuth 2.0 authentication protocol. ERDDAP users sign into their Google email account, including Google-managed accounts such as @noaa.gov accounts. This allows ERDDAP to verify the user's identity (name and email address) and access their profile image, but does not give ERDDAP access to their emails, their Google Drive, or any other private information.
  - The orcid authentication option uses Orcid authentication, which is an implementation of the OAuth 2.0 authentication protocol. ERDDAP users sign into their Orcid account, which is commonly used by researchers to identify themselves. This allows ERDDAP to verify the user's Orcid identity and get their Orcid account number, but does not give ERDDAP access to their other Orcid account information.
  - The oauth2 option lets users sign in with either their Google account or their Orcid account.
  The google, orcid, and oauth2 options are the successors to the openid option, which was discontinued after ERDDAP version 1.68, and which was based on a version of openID that is now out-of-date. Please switch to the google, orcid, or oauth2 option.
  These options are very convenient for ERDDAP administrators -- you don't ever have to deal with passwords or their hash digests. All you need to create is a <user> tag for a user in datasets.xml which specifies the user's Google email address or Orcid account number as the username attribute. (The password attribute isn't used when authentication=email, google, orcid or oauth2.)
  With these options, anyone can log in to ERDDAP by signing into their Google email account or Orcid account, but no one will have the right to access private datasets until you (the ERDDAP administrator) create a <user> tag, specifying their Google email address or Orcid account number as the username, and specifying their roles.
  ERDDAP treats email addresses as case-insensitive. It does this by converting email addresses you enter (in <user> tags) or users enter (on the login form) to their all lowercase version.
  To set up google, orcid, or oauth2 authentication:
  - In your setup.xml, change the <baseHttpsUrl> tag's value.
    For experimenting/working on your personal computer, use
    https://localhost:8443
    For your public ERDDAP, use
    https://your.domain.org:8443
    or, better, without the :8443 if you are using an Apache proxypass so that the port number isn't needed.
  - In your setup.xml, change the <authentication> tag's value to google, orcid, or oauth2, for example:
    <authentication>oauth2</authentication>
  - For the google and oauth2 options:
    Follow the instructions below to set up Google authentication for your ERDDAP.
    1. If you don't have a Google email account, create one.
    2. Follow these instructions to create a Google Developers Console project and get a client ID.
      When the Google form asks for authorized JavaScript origins, enter the value from <baseHttpsUrl> from your personal computer's ERDDAP setup.xml, e.g.,
      https://localhost:8443
      On a second line, add the <baseHttpsUrl> from your public ERDDAP setup.xml, e.g.,
      https://your.domain.org:8443
      Don't specify any Authorized redirect URIs.
      When you see your Client ID for this project, copy and paste it into your setup.xml (usually just below <authentication> to be orderly, but placement doesn't actually matter), in the <googleClientID> tag, e.g.,
      <googleClientID>yourClientID</googleClientID>
      The client ID will be a string of about 75 characters, probably starting with several digits and ending with .apps.googleusercontent.com .
    3. In your datasets.xml, create a <user> tag for each user who will have access to private datasets. For the username attribute in the tag:
      - For users who will sign in with google, use the user's Google email address.
      - For users who will sign in with orcid, use the user's Orcid account number (with dashes).
      Don't specify the password attribute for the user tag.
    4. Restart ERDDAP so that the changes to setup.xml and datasets.xml take effect.
  - For the orcid and oauth2 options:
    Follow the instructions below to set up Orcid authentication for your ERDDAP.
    (For details, see Orcid's authentication API documentation.)
    1. If you don't have an Orcid account, create one.
    2. Log into Orcid https://orcid.org/signin using your personal Orcid account.
    3. Click on "Developer Tools" (under "For Researchers" at the top).
    4. Click on "Register for the free ORCID public API". Enter this information:
      Name: ERDDAP at [your organization]
      Website: [your ERDDAP's domain]
      Description: ERDDAP is a scientific data server. Users need to authenticate with Google or Orcid to access non-public datasets.
      Redirect URIs: [your ERDDAP's domain]/erddap/loginOrcid.html
    5. Click on the Save icon (it looks like a 3.5" disk!).
      You can then see your ORCID APP Client ID and ORCID Client Secret.
    6. Copy and paste the ORCID APP Client ID (which will start with "APP-") into setup.xml in the <orcidClientID> tag, e.g.,
```
<orcidClientID>APP-ALPHANUMERICCHARACTERS</orcidClientID>
```
    7. Copy and paste the ORCID Client Secret (lowercase alpha-numeric characters with dashes) into setup.xml in the <orcidClientSecret> tag, e.g.,
```
<orcidClientSecret>alpha-numeric-characters-with-dashes</orcidClientSecret>
```
    8. In your datasets.xml, create a <user> tag for each user who will have access to private datasets. For the username attribute in the tag:
      - For users who will sign in with google, use the user's Google email address.
      - For users who will sign in with orcid, use the user's Orcid account number (with dashes).
      Don't specify the password attribute for the user tag.
    9. Restart ERDDAP so that the changes to setup.xml and datasets.xml take effect.
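Since the username attribute must be either a Google email address or an Orcid account number with dashes, a quick format check before editing datasets.xml can catch typos. The sketch below is an assumption-laden helper, not something ERDDAP runs: the function names are made up, the email test is deliberately crude, and the check character follows ORCID's published ISO 7064 MOD 11-2 scheme. The lowercasing mirrors the case-insensitive handling of email addresses described above.

```python
import re

ORCID_PATTERN = re.compile(r"^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$")

def orcid_check_character(base_digits):
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def normalize_username(raw):
    """Return (kind, value) where kind is 'email', 'orcid', or 'invalid'."""
    value = raw.strip()
    if "@" in value:
        return "email", value.lower()  # email addresses are compared in lowercase
    if ORCID_PATTERN.match(value):
        digits = value.replace("-", "")
        if orcid_check_character(digits[:15]) == digits[15]:
            return "orcid", value
    return "invalid", value

print(normalize_username("Jane.Doe@NOAA.gov"))
print(normalize_username("0000-0002-1825-0097"))
```

The second example is the sample iD that ORCID uses in its own documentation.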
Log In Either Way
If you use the google, orcid, or oauth2 authentication options, and Google Sign-In or Orcid's Authentication API suddenly ceases to work (for whatever reason) or ceases to work as ERDDAP expects, users won't be able to log in to your ERDDAP. As a temporary (or permanent) solution, you can ask users to sign up with the other system (get a Google email account, or get an Orcid account). To do this:
1. Change the <authentication> tag so that it allows the other authentication system. The oauth2 option allows users to log in with either system.
2. Duplicate each of the <user> tags and change the username attribute from the Google email address to the corresponding Orcid account number (or vice-versa), but keep the roles attribute the same.
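If you have many users, step 2 is easy to script once you know each user's account on the other system. The sketch below works on plain dictionaries rather than on datasets.xml itself; the function name, the example user, and the mapping from Google email address to Orcid account number are all hypothetical, and you would have to collect that mapping from your users.

```python
def add_alternate_logins(users, alternates):
    """Return the users plus a duplicate (same roles) for each known alternate username."""
    result = list(users)
    for user in users:
        other = alternates.get(user["username"])
        if other:
            result.append({"username": other, "roles": user["roles"]})
    return result

# Hypothetical existing user and the same person's Orcid account number.
USERS = [{"username": "alice@example.com", "roles": "researcher"}]
ALTERNATES = {"alice@example.com": "0000-0002-1825-0097"}

expanded = add_alternate_logins(USERS, ALTERNATES)
for user in expanded:
    print(user["username"], user["roles"])
```

Users without an entry in the mapping are left as they are, so the script can be rerun as more people report their other account.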
ERDDAP no longer supports the openid authentication option, which was based on a version of openID that is now out-of-date. Please use the google, orcid, or oauth2 options instead.
ERDDAP doesn't support BASIC authentication because:
- BASIC seems geared toward predefined web pages needing secure access or blanket on/off access to the whole site, but ERDDAP allows (restricted access) datasets to be added on-the-fly.
- BASIC authentication doesn't offer a way for users to log out!
- BASIC authentication is known to be insecure.
Secure Data Sources
If a dataset is to have restricted access for ERDDAP users, the data source (from where ERDDAP gets the data) should not be publicly accessible. So how can ERDDAP get the data for restricted access datasets? Some options are:
- ERDDAP can serve data from local files (for example, via EDDTableFromFiles or EDDGridFromFiles).
- ERDDAP can be in a DMZ and the data source (e.g., an OPeNDAP server or a database) can be behind a firewall, where it is accessible to ERDDAP but not to the public.
- The data source can be on a public website, but require a login to get the data. The two types of dataset that ERDDAP can log on to access are EDDTableFromDatabase and EDDTableFromCassandra. These datasets support (and should always use) user names (create an ERDDAP user who only has read-only privileges), passwords, SSL connections, and other security measures.
  But in general, currently, ERDDAP can't deal with these data sources because it has no provisions for logging on to the data source. This is the reason why access to EDDGridFromErddap and EDDTableFromErddap datasets can't be restricted. Currently, the local ERDDAP has no way to log in and access the metadata information from the remote ERDDAP. And putting the "remote" ERDDAP behind your firewall and removing that dataset's accessibleTo restrictions doesn't solve the problem: since user requests for EDDXxxFromErddap data need to be redirected to the remote ERDDAP, the remote ERDDAP must be accessible.
Defenses Against Hackers
There are bad guy hackers who try to exploit security weaknesses in server software like ERDDAP. ERDDAP follows the common security advice to have several layers of defenses:
- Restricted Privileges -- One of the most important defenses is to run Tomcat via a user called tomcat that doesn't have a password (so no one can log in as that user) and has limited file system privileges (e.g., read-only access to the data). See ERDDAP's instructions for setting up tomcat.
- Heavy Use - In general, ERDDAP is built for heavy use, including by scripts which make tens of thousands of requests, one after another. It is hard for ERDDAP to simultaneously open itself up to heavy legitimate use and shield itself from abuse. It is sometimes hard to differentiate heavy legitimate use, excessive legitimate use, and illegitimate use (and sometimes it is really easy). Among other defenses, ERDDAP consciously does not allow a single request to use an inordinate fraction of the system's resources (unless the system is otherwise not active).
- Identify Troublesome Users - If ERDDAP is slowing down or freezing (perhaps because a naive user or a bot is running multiple scripts to submit multiple requests simultaneously or perhaps because of a bad guy's Denial-of-service attack), you can look at the Daily Report email (and more frequent identical information in the ERDDAP log file) which displays the number of requests made by the most active users (see "Requester's IP Address (Allowed)"). ERDDAP also sends emails to the administrator whenever there is "Unusual activity: >25% of requests failed". You can then look in the ERDDAP log file to see the nature of their requests. If you feel that someone is making too many requests, bizarre requests (you wouldn't believe what I've seen, well, maybe you would), or attack-type requests, you can add their IP address to the blacklist.
- Blacklist -- You can add the IP address of troublesome users, bots, and Denial-of-service attackers to the ERDDAP blacklist, so that future requests from them will be immediately rejected. This setting is in datasets.xml so that you can quickly add an IP address to the list and then flag a dataset so that ERDDAP immediately notices and applies the change. The error message sent to blacklisted users encourages them to contact the ERDDAP administrator if they feel they have been mistakenly put on the blacklist. (In our experience, several users have been unaware that they were running multiple scripts simultaneously, or that their scripts were making nonsense requests.)
- Dataset Security - Some types of datasets (notably, EDDTableFromDatabase) present additional security risks (e.g., SQL injection) and have their own security measures. See the information for those types of datasets in Working with the datasets.xml File, notably EDDTableFromDatabase security.
- Security Audit -- Although NOAA IT security refused our requests for scans for years, they now routinely scan my (Bob's) ERDDAP installation. Although the initial scans found some problems that I then fixed, subsequent scans haven't found problems with ERDDAP. The scans worry about a lot of things: notably, since tabledap requests look like SQL requests, they worry about SQL injection vulnerabilities. But those concerns are unfounded because ERDDAP always parses and validates queries and then separately builds the SQL query in a way that avoids injection vulnerabilities. The other thing they sometimes complain about is that our Java version or Tomcat versions aren't as up-to-date as they want, so we update them in response. I previously offered to show people the security reports, but I'm now told I can't do that.
Questions? Suggestions?
If you have any questions about ERDDAP's security system or have any questions, doubts, concerns, or suggestions about how it is set up, please email erd dot data at noaa dot gov.

Things You Don't Need To Know

These are details that you don't need to know until a need arises.

Setting Up a Second ERDDAP for Testing/Development
If you want to do this, there are two approaches:
- (Best) Install Tomcat and ERDDAP on a computer other than the computer that has your public ERDDAP. If you use your personal computer:
  1. Do the installation one step at a time. Get Tomcat up and running first.
    When Tomcat is running, the Tomcat Manager should be at
    http://127.0.0.1:8080/manager/html/ (or perhaps http://localhost:8080/manager/html/)
  2. Install ERDDAP.
  3. Don't use ProxyPass to eliminate the port number from the ERDDAP URL.
  4. In setup.xml, set baseUrl to http://127.0.0.1:8080
  5. After you start up this ERDDAP, you should be able to see it at
    http://127.0.0.1:8080/erddap/status.html (or perhaps http://localhost:8080/erddap/status.html)
- (Second Best) Install another Tomcat on the same computer as your public ERDDAP.
  1. Do the installation one step at a time. Get Tomcat up and running first.
    Change all of the port numbers associated with the second Tomcat (e.g., change 8080 to 8081) (see the Multiple Tomcat Instances section halfway through that document).
  2. Install ERDDAP in the new Tomcat.
  3. Don't use ProxyPass to eliminate the port number from the ERDDAP URL.
  4. In setup.xml, set baseUrl to http://www.yourDomainName:8081
  5. After you start up this ERDDAP, you should be able to see it at
    http://www.yourDomainName:8081/erddap/status.html
Solid State Drives (SSDs) are great!
The quickest, easiest, and cheapest way to speed up ERDDAP's access to tabular data is to put the data files on a Solid State Drive (SSD). Most tabular datasets are relatively small, so a 1 or 2 TB SSD is probably sufficient to hold all of the data files for all of your tabular datasets. SSDs eventually wear out if you write data to a cell, delete it, and write new data to that cell too many times. So if you just use your SSD to write the data once and read it many times, even a consumer-grade SSD should last a very long time, probably much longer than any Hard Disk Drive (HDD). Consumer-grade SSDs are now cheap (in 2018, ~$200 for 1 TB or ~$400 for 2 TB) and prices are still falling fast. When ERDDAP accesses a data file, an SSD offers both shorter latency (~0.1ms, versus ~3ms for an HDD, versus ~10(?)ms for a RAID, versus ~55ms for Amazon S3) and higher throughput (~500 MB/s, versus ~75 MB/s for an HDD, versus ~500 MB/s for a RAID). So you can get a big performance boost (up to 10X versus an HDD) for $200! Compared to most other possible changes to your system (a new server for $10,000? a new RAID for $35,000? a new network switch for $5000? etc.), this is by far the best Return On Investment (ROI). If/when the SSD dies (in 1, 2, ... 8 years), replace it. Don't rely on it for long-term, archival storage of the data, just for the front-end copy of the data. [SSDs would be great for gridded data, too, but most gridded datasets are much larger, making the SSD very expensive.]
If your server isn't loaded with memory, additional memory for your server is also a great and relatively inexpensive way to speed up all aspects of ERDDAP.
Heavy Loads / Constraints
With heavy use, a standalone ERDDAP may be constrained by various problems. For more information, see the list of constraints and solutions.
Grids, Clusters, and Federations
Under very heavy use, a single standalone ERDDAP will run into one or more constraints and even the suggested solutions will be insufficient. For such situations, ERDDAP has features that make it easy to construct scalable grids (also called clusters or federations) of ERDDAPs which allow the system to handle very heavy use (e.g., for a large data center). For more information, see grids, clusters, and federations of ERDDAPs.
Cloud Computing
Several companies are starting to offer cloud computing services (e.g., Amazon Web Services). Web hosting companies have offered simpler services since the mid-1990s, but the "cloud" services have greatly expanded the flexibility of the systems and the range of services offered. You can use these services to set up a single ERDDAP or a grid/cluster of ERDDAPs to handle very heavy use. For more information, see cloud computing with ERDDAP.
Amazon Web Services (AWS) EC2 Installation Overview
Amazon Web Services (AWS) is a cloud computing service that offers a wide range of computer infrastructure that you can rent by the hour. You can install ERDDAP on an Elastic Compute Cloud (EC2) instance (their name for a computer that you can rent by the hour). AWS has an excellent AWS User Guide and you can use Google to find answers to specific questions you might have. Brace yourself -- it is a fair amount of work to get started. But once you get one server up and running, you can easily rent as many additional resources (servers, databases, SSD-space, etc.) as you need, at a reasonable price. [This isn't a recommendation or endorsement of Amazon Web Services. There are other cloud providers.]
An overview of things you need to do to get ERDDAP running on AWS is:
- In general, you will do all the things described in the AWS User Guide.
- Set up an AWS account.
- Set up an AWS user within that account with administrator privileges. Log in as this user to do all the following steps.
- Elastic Block Storage (EBS) is the AWS equivalent of a hard drive attached to your server. Some EBS space will be allocated when you first create an EC2 instance. It is persistent storage -- the information isn't lost when you stop your EC2 instance. And if you change instance types, your EBS space automatically gets attached to the new instance.
- Create an Elastic IP address so that your EC2 instance has a stable, public URL (as opposed to just a private URL that changes every time you restart your instance).
- Create and start up an EC2 instance (computer). There is a wide range of instance types, each at a different price. An m4.large or m4.xlarge instance is powerful and is probably suitable for most uses, but choose whatever meets your needs. You will probably want to use Amazon's Linux as the operating system.
- If your desktop/laptop computer is a Windows computer, you can use PuTTY, a free SSH client for Windows, to get access to your EC2 instance's command line. Or, you may have some other SSH program that you prefer.
- When you log into your EC2 instance, you will be logged in as the administrative user with the user name "ec2-user". ec2-user has sudo privileges. So, when you need to do something as the root user, use: sudo someCommand
- If your desktop/laptop computer is a Windows computer, you can use FileZilla, a free SFTP program, to transfer files to/from your EC2 instance. Or, you may have some other SFTP program that you prefer.
- Install Apache on your EC2 instance.
- Follow the standard ERDDAP installation instructions.
WaitThenTryAgain Exception
A user may get an error message like WaitThenTryAgainException: There was a (temporary?) problem. Wait a minute, then try again. (In a browser, click the Reload button.) Details: GridDataAccessor.increment: partialResults[0]="123542730" was expected to be "123532800".
The general explanation of the WaitThenTryAgainException is:
When ERDDAP is responding to a user request, there may be an unexpected error with the dataset (e.g., an error while reading data from the file, or an error accessing a remote dataset). WaitThenTryAgain signals to ERDDAP that the request failed (so far) but that ERDDAP should try to reload the dataset quickly (it calls RequestReloadASAP) and retry the request. Often, this succeeds, and the user just sees that the response to the request was slow. Other times, the reload fails or is too slow, or the subsequent attempt to deal with the request also fails and throws another WaitThenTryAgain. If that happens, ERDDAP marks the dataset for reloading but tells the user (via a WaitThenTryAgain Exception) that there was a failure while responding to the request.
That is the normal behavior. This system can deal with many common problems.
But it is possible for this system to get triggered excessively. The most common cause is that ERDDAP's loading of the dataset doesn't see a problem, but ERDDAP's response to a request for data does see the problem. No matter what the cause is, the solution is for you to deal with whatever is wrong with the dataset. Look in log.txt to see the actual error messages and deal with the problems. If lots of files have valid headers but invalid data (a corrupted file), replace the files with uncorrupted files. If the connection to a RAID is flakey, fix it. If the connection to a remote service is flakey, find a way to make it not flakey or download all the files from the remote source and serve the data from the local files.
The detailed explanation of that specific error (above) is:
For each EDDGrid dataset, ERDDAP keeps the axis variable values in memory. They are used, for example, to convert requested axis values that use the "()" format into index numbers. For example, if the axis values are "10, 15, 20, 25", a request for (20) will be interpreted as a request for index #2 (0-based indices). When ERDDAP gets a request for data and gets the data from the source, it verifies that the axis values that it got from the source match the axis values in memory. Normally, they do. But sometimes the data source has changed in a significant way: for example, index values from the beginning of the axis variable may have been removed (e.g., "10, 15, 20, 25" may have become "20, 25, 30"). If that happens, it is clear that ERDDAP's interpretation of the request (e.g., "(20)" is index #2) is now wrong. So ERDDAP throws an exception and calls RequestReloadASAP. ERDDAP will update the dataset soon (often in a few seconds, usually within a minute). Other, similar problems also throw the WaitThenTryAgain exception.
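The check described above can be illustrated with a small Python sketch. It is only a model of the idea, not ERDDAP's Java code: the exception class, the function name, and the plain lists standing in for the cached axis and the data source are all assumptions made for the example.

```python
class WaitThenTryAgain(Exception):
    """Stand-in for ERDDAP's WaitThenTryAgainException (illustration only)."""


def index_for_request(cached_axis, source_axis, requested):
    """Turn a "(value)" request into an index, then verify it against the source."""
    index = cached_axis.index(requested)  # e.g. (20) -> index #2 (0-based)
    # Read back the axis value the source actually has at that index.
    actual = source_axis[index] if index < len(source_axis) else None
    if actual != requested:
        # The in-memory axis values are stale; a real server would also
        # request a dataset reload at this point.
        raise WaitThenTryAgain(
            f"axis value at index {index} is {actual!r}, expected {requested!r}")
    return index


if __name__ == "__main__":
    cached = [10, 15, 20, 25]
    print(index_for_request(cached, [10, 15, 20, 25], 20))  # source unchanged
    try:
        index_for_request(cached, [20, 25, 30], 20)  # source lost its first values
    except WaitThenTryAgain as error:
        print("retry later:", error)
```

In the second call the cached axis still says that 20 is at index #2, but the source now has 30 there, which is the kind of mismatch that the error message above reports.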
RequestReloadASAP
You may see RequestReloadASAP in the log.txt file right after an error message and often near a WaitThenTryAgain Exception. It is basically an internal, programmatic way for ERDDAP to set a flag to signal that the dataset should be reloaded ASAP.
Files Not Being Deleted
For a few ERDDAP installations, there has been a problem with some temporary files being created by ERDDAP staying open (mistakenly) and thus not being deleted. In a few cases, many of these files have accumulated and taken up a significant amount of disk space.
Hopefully, these problems are fixed (as of ERDDAP v2.00). If you see this problem, please email the directory+names of the offending files to Chris.John at noaa.gov. You have a few options for dealing with the problem:
- If the files aren't big and aren't causing you to run out of disk space, you can ignore the problem.
- The simplest solution is to shut down tomcat/erddap (after hours so fewer users are affected). During the shutdown, if the operating system doesn't delete the files, delete them by hand. Then restart ERDDAP.
Semantic Markup of Datasets with json-ld (JSON Linked Data)
ERDDAP now uses json-ld (JSON Linked Data) to make your data catalog and datasets part of the semantic web, which is Tim Berners-Lee's idea to make web content more machine readable and machine "understandable". The json-ld content uses schema.org terms and definitions. Search engines (Google in particular) and other semantic tools can use this structured markup to facilitate discovery and indexing. The json-ld structured markup appears as invisible-to-humans <script> code on the https://.../erddap/info/index.html web page (which is a semantic web DataCatalog) and on each https://.../erddap/info/datasetID/index.html web page (which is a semantic web Dataset). (Special thanks to Adam Leadbetter and Rob Fuller of the Marine Institute in Ireland for doing the hard parts of the work to make this part of ERDDAP.)
Out-Of-Date URLs
Slowly but surely, the URLs that data providers have written into data files are becoming out-of-date (for example, http becomes https, websites are rearranged, and organizations like NODC/NGDC/NCDC are reorganized into NCEI). The resulting broken links are an ever-present problem faced by all websites. To deal with this, ERDDAP now has a system to automatically update out-of-date URLs. If GenerateDatasetsXml sees an out-of-date URL, it adds the up-to-date URL to <addAttributes>. Also, when a dataset loads, if ERDDAP sees an out-of-date URL, it silently changes it to the up-to-date URL. The changes are controlled by a series of search-for/replace-with pairs defined in <updateUrls> in ERDDAP's
[tomcat]/webapps/erddap/WEB-INF/classes/gov/noaa/pfel/erddap/util/messages.xml file. You can make changes there. If you have suggestions for changes, or if you think this should be turned into a service (like the Converters), please email Chris.John at noaa.gov.
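To make the search-for/replace-with idea concrete, here is a minimal Python sketch. The pairs shown are invented example.org placeholders, not the contents of <updateUrls>, and plain substring replacement is an assumption about how such pairs might be applied, not a description of ERDDAP's implementation.

```python
# Hypothetical search-for/replace-with pairs (placeholders, not ERDDAP's list).
UPDATE_URLS = [
    ("http://data.example.org/", "https://data.example.org/"),
    ("https://old.example.org/archive/", "https://new.example.org/archive/"),
]


def update_urls(text, pairs=UPDATE_URLS):
    """Apply each (search_for, replace_with) pair, in order, to an attribute value."""
    for search_for, replace_with in pairs:
        text = text.replace(search_for, replace_with)
    return text


if __name__ == "__main__":
    print(update_urls("See http://data.example.org/docs for details."))
```

Because the pairs are applied in order, a later pair can act on the output of an earlier one, so the order of the list matters when pairs overlap.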
CORS (Cross-Origin Resource Sharing)
"is a mechanism that allows restricted resources (e.g. fonts [or ERDDAP data]) on a web page to be requested from another domain outside the domain from which the first resource was served" (Arun Ranganathan). Basically, CORS is a message that can be put in the HTTP header of a response, saying essentially, "it is okay with this site if certain other sites (specific ones, or all) grab resources (e.g., data) from this site and make it available on their site". Thus, it is an alternative to JSONP.
The developers of ERDDAP do not claim to be security experts. We are not entirely clear about the security issues related to CORS. We don't want to make any statement endorsing an action that decreases security. So we'll just stay neutral and leave it up to each ERDDAP admin to decide if the benefits of enabling a CORS header are worth the risks. As always, if your ERDDAP has any private datasets, it's a good idea to be extra careful about security.
If you want to enable CORS for your ERDDAP, there are readily available instructions describing how website administrators can enable a CORS header via their lower level server software (e.g., Apache or nginx).
Palettes
are used by ERDDAP to convert a range of data values into a range of colors when making graphs and maps.
Each palette is defined in a .cpt-style palette file as used by GMT. All ERDDAP .cpt files are valid GMT .cpt files, but the opposite is not true. For use in ERDDAP, .cpt files have:
- Optional comments lines at the start of the file, starting with "#".
- A main section with a description of the segments of the palette, one segment per line. Each segment description line has 8 values:
  startValue, startRed, startGreen, startBlue, endValue, endRed, endGreen, endBlue.
  There may be any number of segments. ERDDAP uses linear interpolation between the startRed/Green/Blue and endRed/Green/Blue of each segment.
  We recommend that each segment specify a start and end color which are different, and that the start color of each segment be the same as the end color of the previous segment, so that the palette describes a continuous blend of colors. ERDDAP has a system for creating on-the-fly a palette of discrete colors from a palette with a continuous blend of colors. An ERDDAP user can specify if they want the palette to be Continuous (the original) or Discrete (derived from the original). But there are legitimate reasons for not following these recommendations for some palettes.
- The startValue and endValue of each segment must be integers.
  The first segment must have startValue=0 and endValue=1.
  The second segment must have startValue=1 and endValue=2.
  Etc.
- The red, green, and blue values must be integers from 0 (none) ... 255 (full on).
- The end of the file must have 3 lines with:
  1. A background rgb color for data values less than the colorbar minimum, e.g.: B 128 128 128
    It is often the startRed, startGreen, and startBlue of the first segment.
  2. A foreground rgb color for data values more than the colorbar maximum, e.g.: F 128 0 0
    It is often the endRed, endGreen, and endBlue of the last segment.
  3. An rgb color for NaN data values, e.g., N 128 128 128
    It is often middle gray (128 128 128).
- The values on each line must be separated by tabs, with no extraneous spaces.
A sample .cpt file is BlueWhiteRed.cpt:
```
# This is BlueWhiteRed.cpt.
0  0    0    128  1  0    0    255
1  0    0    255  2  0    255  255
2  0    255  255  3  255  255  255
3  255  255  255  4  255  255  0
4  255  255  0    5  255  0    0
5  255  0    0    6  128  0    0
B  0    0    128
F  128  0    0
N  128  128  128
```
See the existing .cpt files for other examples. If there is trouble with a .cpt file, ERDDAP will probably throw an error when the .cpt file is parsed (which is better than misusing the information).
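If you want to check a new palette before giving it to ERDDAP, the rules above can be turned into a small pre-flight check. The Python sketch below is an illustration written for this page, not ERDDAP's parser, and the function names are hypothetical. It is more lenient than the rules in two ways: it splits on any whitespace (not just tabs), and it does not enforce the order of the B, F, and N lines.

```python
BLUE_WHITE_RED = """\
# This is BlueWhiteRed.cpt.
0 0 0 128 1 0 0 255
1 0 0 255 2 0 255 255
2 0 255 255 3 255 255 255
3 255 255 255 4 255 255 0
4 255 255 0 5 255 0 0
5 255 0 0 6 128 0 0
B 0 0 128
F 128 0 0
N 128 128 128
"""


def _rgb(parts):
    """Convert three text fields to an (r, g, b) tuple of integers in 0..255."""
    rgb = tuple(int(p) for p in parts)  # int() rejects values like "12.5"
    if any(not 0 <= c <= 255 for c in rgb):
        raise ValueError(f"color out of range 0..255: {rgb}")
    return rgb


def parse_cpt(text):
    """Return (segments, special) or raise ValueError if a rule is broken."""
    segments, special = [], {}
    for raw in text.splitlines():
        parts = raw.split()
        if not parts or parts[0].startswith("#"):
            continue  # blank line or comment
        if parts[0] in ("B", "F", "N"):
            if len(parts) != 4:
                raise ValueError(f"expected a letter and 3 colors: {raw!r}")
            special[parts[0]] = _rgb(parts[1:])
            continue
        if special or len(parts) != 8:
            raise ValueError(f"bad segment line: {raw!r}")
        start, end = int(parts[0]), int(parts[4])
        n = len(segments)
        if (start, end) != (n, n + 1):  # segments must cover 0-1, 1-2, ...
            raise ValueError(f"segment {n} must span {n} to {n + 1}: {raw!r}")
        segments.append((start, _rgb(parts[1:4]), end, _rgb(parts[5:8])))
    if not segments or set(special) != {"B", "F", "N"}:
        raise ValueError("need at least one segment plus B, F, and N lines")
    return segments, special


if __name__ == "__main__":
    segments, special = parse_cpt(BLUE_WHITE_RED)
    print(len(segments), "segments; NaN color:", special["N"])
```

Passing this check does not guarantee that ERDDAP will accept the file, but a failure here points to a line that breaks one of the rules listed above.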
You can add additional palettes to ERDDAP. You can make them yourself or find them on the web (for example, at cpt-city) although you'll probably have to edit their format slightly to conform to ERDDAP's .cpt requirements. To get ERDDAP to use a new .cpt file, store the file in tomcat/webapps/erddap/WEB-INF/cptfiles (you'll need to do that for each new version of ERDDAP) and either:
- If you use the default messages.xml file: add the filename to the <palettes> tag in
  tomcat/webapps/erddap/WEB-INF/classes/gov/noaa/pfel/erddap/util/messages.xml.
  If you do this, you need to do it every time you upgrade ERDDAP.
- If you use a custom messages.xml file: add the filename to the <palettes> tag in your custom messages.xml file: tomcat/content/erddap/messages.xml . If you do this, you only need to do it once (but there is other work to maintain a custom messages.xml file).
Then restart ERDDAP so ERDDAP notices the changes. An advantage of this approach is that you can specify the order of the palettes in the list presented to users. If you add a collection, we encourage you to add a prefix with the author's initials (e.g., "KT_") to the name of each palette to identify the collection and so that there can be multiple palettes which would otherwise have the same name.
Please don't remove or change any of the standard palettes. They are a standard feature of all ERDDAP installations. If you think a palette or collection of palettes should be included in the standard ERDDAP distribution because it/they would be of general use, please email them to Chris.John at noaa.gov.
How does ERDDAP generate the colors in a colorbar?
1. The user selects one of the predefined palettes or uses the default, e.g., Rainbow. Palettes are stored/defined in GMT-style .cpt Color Palette Table files. Each of ERDDAP's predefined palettes has a simple integer range, e.g., 0 to 1 (if there is just one section in the palette), or 0 to 4 (if there are four sections in the palette). Each segment in the file covers n to n+1, starting at n=0.
2. ERDDAP generates a new .cpt file on-the-fly, by scaling the predefined palette's range (e.g., 0 to 4) to the range of the palette needed by the user (e.g., 0.1 to 50) and then generating a section in the new palette for each interval between adjacent tick marks (e.g., a log scale with ticks at 0.1, 0.5, 1, 5, 10, 50 will have 5 sections). The color for the end point of each section is generated by finding the relevant section of the palette in the .cpt file, then linearly interpolating the R, G, and B values. (That's the same as how GMT generates colors from its Color Palette Table files.) This system allows ERDDAP to start with generic palettes (e.g., Rainbow with 8 segments, in total spanning 0 to 8) and create custom palettes on-the-fly (e.g., a custom Rainbow, which maps 0.1 to 50 mg/L to the rainbow colors).
3. ERDDAP then uses that new .cpt file to generate the color for each different colored pixel in the color bar (and later for each data point when plotting data on a graph or map), again by finding the relevant section of the palette in the .cpt file, then linearly interpolating the R, G, and B values.
This process may seem unnecessarily complicated. But it solves problems related to log scales that are hard to solve other ways.
So how can you mimic what ERDDAP is doing? That isn't easy. Basically you need to duplicate the process that ERDDAP is using. If you are a Java programmer, you can use the same Java class that ERDDAP uses to do all of this:
tomcat/webapps/erddap/WEB-INF/classes/gov/noaa/pfel/coastwatch/sgt/CompoundColorMap.java.
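If you aren't a Java programmer, the three steps above can be approximated in a few lines of Python. This is a rough sketch of the approach as described on this page, not a port of CompoundColorMap: the two-segment BASE palette and the function names are made up for the example, and mapping log-scale ticks to palette positions by their log10 fraction is an assumption.

```python
import math

# A made-up generic palette spanning 0..2: (start, startRGB, end, endRGB).
BASE = [(0, (0, 0, 255), 1, (255, 255, 255)),
        (1, (255, 255, 255), 2, (255, 0, 0))]


def color_at(segments, x):
    """Linearly interpolate R, G, B at x (clamped to the palette's range)."""
    x = min(max(x, segments[0][0]), segments[-1][2])
    for start, c0, end, c1 in segments:
        if start <= x <= end:
            t = (x - start) / (end - start)
            return tuple(round(a + t * (b - a)) for a, b in zip(c0, c1))


def rescale(segments, ticks, log=False):
    """Step 2: build a new palette with one section per pair of adjacent ticks."""
    scale = math.log10 if log else float
    lo, hi = scale(ticks[0]), scale(ticks[-1])
    first, last = segments[0][0], segments[-1][2]

    def position(value):
        # Map a data value to a position in the generic palette (assumed mapping).
        return first + (last - first) * (scale(value) - lo) / (hi - lo)

    return [(a, color_at(segments, position(a)), b, color_at(segments, position(b)))
            for a, b in zip(ticks, ticks[1:])]


if __name__ == "__main__":
    custom = rescale(BASE, [1, 10, 100], log=True)  # step 2
    for value in (1, 5, 10, 50, 100):               # step 3
        print(value, color_at(custom, value))
```

Note that within each new section the interpolation is linear in the data value, so on a log scale the result is a piecewise approximation that is exact only at the ticks.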
Guidelines for Data Distribution Systems
More general opinions about the design and evaluation of data distribution systems can be found here.
ArchiveADataset
Included in your ERDDAP installation is a command line tool called ArchiveADataset which can help you make an archive (a .zip or .tar.gz file) with part or all of a dataset stored in a series of netcdf-3 .nc data files in a file format that is suitable for submission to NOAA's NCEI archive (.nc for gridded datasets or .ncCFMA for tabular datasets, as specified by the NCEI NetCDF Templates v2.0).
ArchiveADataset can make two different archive formats:
- The "original" format follows these NCEI Archiving Guidelines, this guide for Archiving Your Data at NCEI, and the related Practices for Ensuring Data Integrity.
- The "BagIt" format makes BagIt files, a standardized archive format promoted by the U.S. Library of Congress, as specified by the BagIt v0.97 specification. NOAA's NCEI may standardize on BagIt files for submissions to the archive.
Not surprisingly, the global and variable metadata that ERDDAP encourages/requires is almost exactly the same as the in-file CF and ACDD metadata that NCEI encourages/requires, so all of your datasets should be ready for submission to NCEI via Send2NCEI or ATRAC (NCEI's Advanced Tracking and Resource tool for Archive Collections).
If you (the ERDDAP administrator) use ArchiveADataset to submit data to NCEI, then you (not NCEI) will determine when to submit a chunk of data to NCEI and what that chunk will be, because you will know when there is new data and how to specify that chunk (and NCEI won't). Thus, ArchiveADataset is a tool for you to use to create a package to submit to NCEI.
ArchiveADataset may be useful in other situations, for example, for ERDDAP administrators who need to convert a subset of a dataset (on a private ERDDAP) from its native file format into a set of .ncCF files, so that a public ERDDAP can serve the data from the .ncCF files instead of the original files.
Once you have set up ERDDAP and run it (at least one time), you can find and use ArchiveADataset in the tomcat/webapps/erddap/WEB-INF directory. There is a shell script (ArchiveADataset.sh) for Linux/Unix and a batch file (ArchiveADataset.bat) for Windows.
On Windows, the first time you run ArchiveADataset, you need to edit the ArchiveADataset.bat file with a text editor to change the path to the java.exe file so that Windows can find Java.
When you run ArchiveADataset, it will ask you a series of questions. For each question, type a response, then press Enter. Or press ^C to exit the program at any time.
Or, you can put the answers to the questions, in order, on the command line. To do this, run the program once and type in and write down your answers. Then, you can create a single command line (with the answers as parameters) which runs the program and answers all the questions.
Use the word default if you want to use the default value for a given parameter.
Use "" (two double quotes) as a placeholder for an empty string.
Specifying parameters on the command line can be very convenient, for example, if you use ArchiveADataset once a month to archive a month's worth of data. Once you have generated the command line with parameters and saved that in your notes or in a shell script, you just need to make small changes each month to make that month's archive.
The questions that ArchiveADataset asks allow you to:
- Specify original or Bagit file packaging. For NCEI, use Bagit.
- Specify zip or tar.gz compression for the package. For NCEI, use tar.gz.
- Specify a contact email address for this archive (it will be written in the READ_ME.txt file in the archive).
- Specify the datasetID of the dataset you want to archive.
- Specify which data variables you want to archive (usually all).
- Specify which subset of the dataset you want to archive. You need to format the subset in the same way you would format a subset for a data request, so it will be different for gridded than for tabular datasets.
  - For gridded datasets, you can specify a range of values of the leftmost dimension, usually that is a range of time. ArchiveADataset will make a separate request and generate a separate data file for each value in the range of values. Since gridded datasets are usually large, you will almost always have to specify a small subset relative to the size of the entire dataset.
    For example, [(2015-12-01):(2015-12-31)][][][]
  - For tabular datasets, you can specify any collection of constraints, but it is often a range of time. Since tabular datasets are usually small, it is often possible to specify no constraints, so that the entire dataset is archived.
    For example, &time>=2015-12-01&time<2016-01-01
- For tabular datasets: specify a comma separated list of 0 or more variables that will determine how the archived data is further subsetted into different data files. For datasets that have
  cdm_data_type=TimeSeries|TimeSeriesProfile|Trajectory|TrajectoryProfile
  you should almost always specify the variable that has the cf_role=timeseries_id (e.g., stationID) or cf_role=trajectory_id attribute. ArchiveADataset will make a separate request and generate a separate data file for each combination of the values of these variables, e.g., for each stationID.
  For all other tabular datasets, you will probably not specify any variables for this purpose.
  Warning: If the subset of the dataset you are archiving is very large (>2GB) and there is no suitable variable for this purpose, then ArchiveADataset is not usable with this dataset. This should be rare.
- Specify the file format for the data files that will be created.
  For gridded datasets, for NCEI, use .nc .
  For tabular datasets, for NCEI, use .ncCFMA if it is an option; otherwise use .nc.
- Specify the type of file digest to be created for each data file and for the entire archive package: MD5, SHA-1, or SHA-256. The file digest provides a way for the client (e.g., NCEI) to test whether the data file has become corrupted. Traditionally, these were .md5 files, but now there are better options. For NCEI, use SHA-256 .
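The two subset formats above are the same ones used in ordinary ERDDAP data requests. As a minimal sketch (not ArchiveADataset's actual code), the following Java class builds one griddap and one tabledap request URL from those subset strings. The server URL, dataset IDs, and variable names are hypothetical, and percent-encoding each part with URLEncoder is an assumption about what a typical HTTP client needs.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: builds ERDDAP request URLs for gridded and tabular subsets. */
final class SubsetUrlSketch {

    /** Percent-encodes one part of a query (URLEncoder writes spaces as '+', so fix that). */
    static String encodePart(String part) {
        return URLEncoder.encode(part, StandardCharsets.UTF_8).replace("+", "%20");
    }

    /** griddap: every requested variable carries the same [dimension] constraints. */
    static String gridUrl(String baseUrl, String datasetID, String fileType,
                          List<String> variables, String dimensionConstraints) {
        String query = variables.stream()
                .map(v -> v + dimensionConstraints)
                .collect(Collectors.joining(","));
        return baseUrl + "/griddap/" + datasetID + fileType + "?" + encodePart(query);
    }

    /** tabledap: a comma-separated variable list, then zero or more &constraint parts. */
    static String tableUrl(String baseUrl, String datasetID, String fileType,
                           List<String> variables, String constraints) {
        StringBuilder query = new StringBuilder(encodePart(String.join(",", variables)));
        for (String part : constraints.split("&")) {
            if (!part.isEmpty()) {
                query.append('&').append(encodePart(part));
            }
        }
        return baseUrl + "/tabledap/" + datasetID + fileType + "?" + query;
    }

    public static void main(String[] args) {
        String base = "https://example.org/erddap"; // hypothetical server
        System.out.println(gridUrl(base, "myGrid", ".nc",
                List.of("sst"), "[(2015-12-01):(2015-12-31)][][][]"));
        System.out.println(tableUrl(base, "myTable", ".ncCFMA",
                List.of("stationID", "time", "temperature"),
                "&time>=2015-12-01&time<2016-01-01"));
    }
}
```

Passing an empty constraints string to tableUrl corresponds to archiving the entire tabular dataset.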
After you answer all of the questions, ArchiveADataset will:
1. Make a series of requests to the dataset and stage the resulting data files in bigParentDirectory/ArchiveADataset/datasetID_timestamp/.
  For gridded datasets, there will be a file for each value of the leftmost dimension (e.g., time). The name of the file will be that value (e.g., the time value).
  For tabular datasets, there will be a file for each value of the ... variable(s). The name of the file will be that value. If there is more than one variable, the left variables will be used to make subdirectory names, and the rightmost variable will be used to make the filenames.
  Each data file must be <2GB (the maximum allowed by .nc version 3 files).
2. Make a file related to each data file with the digest of the data file. For example, if the data file is 46088.nc and the digest type is .sha256, then the digest file will have the name 46088.nc.sha256 .
3. Make a READ_ME.txt file with information about the archive, including a list of all the settings you specified to generate this archive.
4. Make 3 files in bigParentDirectory/ArchiveADataset/ :
  - A .zip or .tar.gz archive file named datasetID_timestamp.zip (or .tar.gz) containing all of the staged data files and digest files. This file may be any size, limited only by disk space.
  - A digest file for the archive file, for example, datasetID_timestamp.zip.sha256.txt
  - For the "original" type of archive, a text file named datasetID_timestamp.zip.listOfFiles.txt (or .tar.gz) which lists all of the files in the .zip (or .tar.gz) file.
  If you are preparing the archive for NCEI, these are the files that you will send to NCEI, perhaps via Send2NCEI or ATRAC (NCEI's Advanced Tracking and Resource tool for Archive Collections).
5. Delete all of the staged files so that only the archive file (e.g., .zip), the digest (e.g., .sha256.txt) of the archive, and (optionally) the .listOfFiles.txt files remain.
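The digest files described above let a recipient check that a file arrived intact. The following is a minimal sketch of such a check in Java, not ArchiveADataset's own code: it computes a file's digest with the standard MessageDigest class, derives the digest file name by appending an extension (as in 46088.nc.sha256), and compares the result with an expected hex string. The class and method names are hypothetical, and the exact contents of the digest files that ArchiveADataset writes are not assumed here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

/** Hypothetical helper for checking file digests (requires Java 17 for HexFormat). */
final class DigestSketch {

    /** Returns the lowercase hex digest of a file; algorithm is "MD5", "SHA-1", or "SHA-256". */
    static String hexDigest(Path file, String algorithm) {
        try (InputStream in = Files.newInputStream(file)) {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            byte[] buffer = new byte[8192];
            int n;
            // Stream the file through the digest so large files never sit in memory.
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n);
            }
            return HexFormat.of().formatHex(md.digest());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown digest algorithm: " + algorithm, e);
        }
    }

    /** Digest file name = data file name + extension, e.g. 46088.nc -> 46088.nc.sha256. */
    static Path digestFileFor(Path dataFile, String extension) {
        return dataFile.resolveSibling(dataFile.getFileName() + extension);
    }

    /** True if the file's digest equals the expected hex string (case and whitespace ignored). */
    static boolean matches(Path dataFile, String algorithm, String expectedHex) {
        return hexDigest(dataFile, algorithm).equalsIgnoreCase(expectedHex.trim());
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.out.println("Usage: java DigestSketch.java <dataFile>");
            return;
        }
        Path data = Path.of(args[0]);
        System.out.println(digestFileFor(data, ".sha256") + ": " + hexDigest(data, "SHA-256"));
    }
}
```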
ISO 19115 .xml Metadata Files --
The ArchiveADataset archive package does not include the ISO 19115 .xml metadata file for the dataset. If you want/need to submit an ISO 19115 file for your dataset to NCEI, you can send them the ISO 19115 .xml metadata file that ERDDAP created for the dataset (but NMFS people should get the ISO 19115 file for their datasets from InPort if ERDDAP isn't already serving that file).
Problems? Suggestions? ArchiveADataset is new. If you have problems or suggestions, please email them to erd dot data at noaa dot gov .

Slide Shows

Here are some PowerPoint slide shows and documents that Bob Simons has created related to ERDDAP.

DISCLAIMER: The content and opinions expressed in these documents are Bob Simons' personal opinions and do not necessarily reflect any position of the Government or the National Oceanic and Atmospheric Administration.

The Four Main Documents:

Programmer's Guide

These are things that only a programmer who intends to work with ERDDAP's Java classes needs to know.

Getting the Source Code
- Via erddap.war
  The source code for the current version of ERDDAP is always in the current erddap.war file. So when you install ERDDAP in Tomcat (see the instructions at the top of this page), all of the source code is unpacked and installed on that computer. If you just want to read the source code of the current official version of ERDDAP, this is the easiest option.
- Via Source Code on GitHub
  The source code for recent public versions and in-development versions is also available via GitHub. Please read the Wiki for that project. If you want to modify the source code (and possibly have the changes incorporated into the standard ERDDAP distribution), this is the recommended approach.
  Before the 2020-12-22 GitHub release, one file was missing from the GitHub version of the code because it was too big: the AWS SDK for Java v1.11 jar file. With the 2020-12-22 release, we switched to v2 of the AWS SDK which uses numerous small .jar files (which are in the /lib directory) instead of one huge .jar file. Thus, the GitHub version now includes all ERDDAP files.
ERDDAP and its subcomponents have very liberal, open-source licenses, so you can use and modify the source code for any purpose, for-profit or not-for-profit. Note that ERDDAP and many subcomponents have licenses that require that you acknowledge the source of the code that you are using. See Credits. Whether required or not, it is just good form to acknowledge all of these contributors.
Use the Code for Other Projects
While you are welcome to use parts of the ERDDAP code for other projects, be warned that the code can and will change. We don't promise to support other uses of our code. Git and GitHub will be your main solutions for dealing with this -- Git allows you to merge our changes into your changes.
For many situations where you might be tempted to use parts of ERDDAP in your project, we think you will find it much easier to install and use ERDDAP as is, and then write other services which use ERDDAP's services. You can set up your own ERDDAP installation crudely in an hour or two. You can set up your own ERDDAP installation in a polished way in a few days (depending on the number and complexity of your datasets). But hacking out parts of ERDDAP for your own project is likely to take weeks (and months to catch subtleties) and you will lose the ability to incorporate changes and bug fixes from subsequent ERDDAP releases. We (obviously) think there are many benefits to using ERDDAP as is and making your ERDDAP installation publicly accessible. However, in some circumstances, you might not want to make your ERDDAP installation publicly accessible. Then, your service can access and use your private ERDDAP and your clients needn't know about ERDDAP.
Halfway
Alternatively, there is an approach that is halfway between delving into ERDDAP's code and using ERDDAP as a stand-alone web service: the EDD class has a static method which lets you make an instance of a dataset (based on the specification in datasets.xml):
oneFromDatasetXml(String tDatasetID)
It returns an instance of an EDDTable or EDDGrid dataset. Given that instance, you can call
makeNewFileForDapQuery(String userDapQuery, String dir, String fileName, String fileTypeName)
to tell the instance to make a data file, of a specific fileType, with the results from a user query. Thus, this is a simple way to use ERDDAP's methods to request data and get a file in response, just as a client would use the ERDDAP web application. But this approach works within your Java program and bypasses the need for an application server like Tomcat. We use this approach for many of the unit tests of EDDTable and EDDGrid subclasses, so you can see examples of this in the source code for all of those classes.
Development Environment
- Set up ERDDAP in Tomcat
  Since ERDDAP is mainly intended to be a servlet running in Tomcat, we strongly recommend that you follow the standard installation instructions at the top of this page to install Tomcat, and then install ERDDAP in Tomcat's webapps directory. Among other things, ERDDAP was designed to be installed in Tomcat's directory structure and expects Tomcat to provide some .jar files.
- Our development environment is just a programmer's editor (EditPlus, although that isn't a recommendation since we're not allowed to make recommendations).
  We don't use Eclipse, Ant, etc.; nor do we offer ERDDAP-related support for them.
  Starting in December 2020, we do use Maven to compile the classes periodically in order to manage/gather the needed external .jar files. (Amazon's SDK for Java forced us to do this.) The ERDDAP pom.xml file for Maven is included in the GitHub ERDDAP distribution.
- We use a batch file which deletes all of the .class files in the source tree to ensure that we have a clean compile (with javac).
- We currently use Adoptium's javac jdk-17.0.4+8 to compile gov.noaa.pfel.coastwatch.TestAll (it has links to a few classes that wouldn't be compiled otherwise) and run the tests. For security reasons, it is almost always best to use the latest versions of Java 17 and Tomcat 10.
  - When we run javac or java, the current directory is tomcat/webapps/erddap/WEB-INF .
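  - Our javac and java classpath (written here with ; which is the Windows path separator; on Linux or macOS, use : instead and quote the classpath so the shell doesn't expand lib/*) is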
    classes;../../../lib/servlet-api.jar;lib/*
  - So your javac command line will be something like
    javac -encoding UTF-8 -cp classes;../../../lib/servlet-api.jar;lib/* classes/gov/noaa/pfel/coastwatch/TestAll.java
  - And your java command line will be something like
    java -cp classes;../../../lib/servlet-api.jar;lib/* -Xmx4000M -Xms4000M gov.noaa.pfel.coastwatch.TestAll
    Optional: you can add -verbose:gc, which tells Java to print garbage collection statistics.
  - If TestAll compiles, everything ERDDAP needs has been compiled. A few classes are compiled that aren't needed for ERDDAP. If compiling TestAll succeeds but doesn't compile some class, that class isn't needed. (There are some unfinished/unused classes.)
- In a few cases, we use 3rd party source code instead of .jar files (notably for DODS) and have modified them slightly to avoid problems compiling with Java 17. We have often made other slight modifications (notably to DODS) for other reasons.
- Most classes have test methods. We run lots of tests via TestAll. Unfortunately, many of the tests are specific to our set up. (Sorry. We're working to move all test files to the erddapTest and erddapTestBig directories which are managed by Git and available at GitHub.)
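The clean-compile step and the classpath above are specific to one operating system (a batch file and ; separators). As a rough, portable stand-in, the following Java sketch deletes every .class file under a directory and assembles the same classpath with the current platform's separator. The class and method names are hypothetical; this is not the batch file that the ERDDAP developers use.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: removes stale .class files before a clean javac run. */
final class CleanCompileSketch {

    /** Deletes every *.class file under root and returns how many were removed. */
    static int deleteClassFiles(Path root) {
        try (Stream<Path> paths = Files.walk(root)) {
            List<Path> classFiles = paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".class"))
                    .collect(Collectors.toList());
            for (Path p : classFiles) {
                Files.delete(p);
            }
            return classFiles.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** The classpath from the text, joined with ';' on Windows and ':' elsewhere. */
    static String classpath() {
        return String.join(File.pathSeparator,
                "classes", "../../../lib/servlet-api.jar", "lib/*");
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.out.println("Usage: java CleanCompileSketch.java <sourceTreeRoot>");
            return;
        }
        Path root = Path.of(args[0]);
        System.out.println("Deleted " + deleteClassFiles(root) + " .class files under " + root);
        System.out.println("Classpath for javac and java: " + classpath());
    }
}
```

Run it from tomcat/webapps/erddap/WEB-INF with classes as the argument to mirror the setup described above.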
Important Classes
If you want to look at the source code and try to figure out how ERDDAP works, please do.
- The code has JavaDoc comments, but the JavaDocs haven't been generated. Feel free to generate them.
- The most important classes (including the ones mentioned below) are within gov/noaa/pfel/erddap.
- The Erddap class has the highest level methods. It extends HttpServlet.
- Erddap passes requests to instances of subclasses of EDDGrid or EDDTable, which represent individual datasets.
- EDStatic has most of the static information and settings (e.g., from the setup.xml and messages.xml files) and offers static services (e.g., sending emails).
- EDDGrid and EDDTable subclasses parse the request, get data from subclass-specific methods, then format the data for the response.
- EDDGrid subclasses push data into GridDataAccessor (the internal data container for gridded data).
- EDDTable subclasses push data into TableWriter subclasses, which write data to a specific file type on-the-fly.
- Other classes (e.g., low level classes) are also important, but it is less likely that you will be working to change them.
Code Contributions
If you have written some code which would be a nice addition to ERDDAP (or better, before you write the code so that we can coordinate), please email erd dot data at noaa dot gov. We'll work out the details. The most likely situations are:
1. You want to write another subclass of EDDGrid or EDDTable to handle another data source type. If so, we recommend that you find the closest existing subclass and use that code as a starting point.
2. You want to write another saveAsFileType method. If so, we recommend that you find the closest existing saveAsFileType method in EDDGrid or EDDTable and use that code as a starting point.
3. You want to work on one of the GitHub Issues, which are projects for which we are soliciting help from others.
Those situations have the advantage that the code you write is self-contained. You won't need to know all the details of ERDDAP's internals. And it will be easy for us to incorporate your code in ERDDAP. Note that if you do submit code, the license will need to be compatible with the ERDDAP license (e.g., Apache, BSD, or MIT-X). We'll list your contribution in the credits.
GitHub Issues
If you would like to contribute but don't have a project, see the list of GitHub Issues, many of which are projects you could take on.
Judging Your Code Contributions
If you want to submit code or other changes to be included in ERDDAP, that is great. Your contribution needs to meet certain criteria in order to be accepted. If you follow the guidelines below, you greatly increase the chances of your contribution being accepted.
- I'll try to be a BDFL (Benevolent Dictator For Life, well, until I retire), with the emphasis on Benevolent.
  Basically, that means I'm responsible for ERDDAP so I have final word on decisions about ERDDAP, notably about the design and whether a pull request will be accepted or not. It needs to be this way partly for efficiency reasons (it works for Linus Torvalds and Linux) and partly for security reasons: I have to tell the IT security people that I take responsibility for the security and integrity of the code.
- I don't guarantee that I'll accept your code.
  If a project just doesn't work out as well as we had hoped and if it can't be salvaged, I won't include the project in the ERDDAP distribution. Please don't feel bad. Sometimes projects don't work out as well as hoped. It happens, even to me. If you follow the guidelines below, you greatly increase your chances of success.
- It's best if the changes are of general interest and usefulness.
  If the code is specific to your organization, it is probably best to maintain a separate branch of ERDDAP for your use. Axiom does this. Fortunately, Git makes this easy to do. I want to maintain a consistent vision for ERDDAP, not allow it to become a kitchen sink project where everyone adds a custom feature for their project.
- Compile via TestAll.
  Since the standard way to compile ERDDAP is to compile TestAll.java, compiling TestAll must compile all of the classes for your code. If it doesn't, simply add the hidden classes to the "Force compilation" section of TestAll.java so that it does.
- Follow the Java Code Conventions.
  In general, your code should be good quality and should follow the original Java Code Conventions: put .class files in the proper place in the directory structure, give .class files an appropriate name, include proper JavaDoc comments, include //comments at the start of each paragraph of code, indent with 4 spaces (not tab), avoid lines >80 characters, etc. (Yes, my code is far from perfect in this regard. I continually work to make it better.)
- Use descriptive class, method and variable names.
  That makes the code easier for others to read.
- Avoid fancy code.
  In the long run, you or other people will have to figure out the code in order to maintain it. So please use simple coding methods that are thus easier for others (including you in the future) to figure out. Obviously, if there is a real advantage to using some fancy Java programming feature, use it, but extensively document what you did, why, and how it works.
- Work with Bob before you start.
  If you hope to get your code changes pulled into ERDDAP, I definitely want to talk about what you're going to do and how you're going to do it before you make any changes to the code. That way, we can avoid you making changes that I, in the end, don't accept. When you're doing the work, I'm willing to answer questions to help you figure out the existing code and (overall) how to tackle your project.
- Work independently (as much as possible) after you start.
  In contrast to the above "Work with Bob", after you get started on the project, I encourage you to work as independently as possible. If I have to tell you almost everything and answer lots of questions (especially ones that you could have answered by reading the documentation or the code), then your efforts aren't a time savings for me and I might as well do the work myself. It's the Mythical Man Month problem. Of course, we should still communicate. It would be great to periodically see your work in progress to make sure the project is on track. But the more you can work independently (after we agree on the task at hand and the general approach), the better.
- Avoid bugs.
  If a bug isn't caught before a release, it causes problems for users (at best), returns the wrong information (at worst), is a blot on ERDDAP's reputation, and will persist on out-of-date ERDDAP installations for years. Work very hard to avoid bugs. Part of this is writing clean code (so it is easier to see problems). Part of this is writing unit tests. Part of this is a constant attitude of bug avoidance when you write code. Don't make me regret adding your code to ERDDAP.
- Write a unit test or tests.
  At the bottom of most classes in ERDDAP is a test() method that calls all of the individual test methods for that class. Please write at least one individual test method that thoroughly tests the code you write and add it to the class' test() method so that it is run automatically. Unit (and related) tests are one of the best ways to catch bugs, initially, and in the long run (as other things change in ERDDAP). As I've said, "Unit tests are what lets me sleep at night."
- Make it easy for me to understand and accept the changes in your pull request.
  Part of that is writing a unit test method(s). Part of that is limiting your changes to one section of code (or one class) if possible. I won't accept any pull request with hundreds of changes throughout the code. I tell the IT security people that I take responsibility for the security and integrity of the code. If there are too many changes or they are too hard to figure out, then it's just too hard to verify the changes are correct and don't introduce bugs or security issues.
- Keep it simple.
  A good overall theme for your code is: Keep it simple. Simple code is easy for others (including you in the future) to read and maintain. It's easy for me to understand and thus accept.
- Assume long term responsibility for your code.
  In the long run, it is best if you assume ongoing responsibility for maintaining your code and answering questions about it (e.g., in the ERDDAP Google Group). As some authors note, code is a liability as well as an asset. If a bug is discovered in the future, it's best if you fix it because no one knows your code better than you (also so that there is an incentive to avoid bugs in the first place). I'm not asking for a firm commitment to provide ongoing maintenance. I'm just saying that doing the maintenance will be greatly appreciated.
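The unit-test guideline above describes a class-level test() method that calls each individual test method. The following is a small sketch of that pattern with a hypothetical class and helper, not code taken from ERDDAP. It follows the conventions listed above (JavaDoc comments, 4-space indents), and a failing check throws an exception so that the test run stops.

```java
/** Hypothetical example class showing the test() pattern described above. */
final class PadSketch {

    /** Left-pads text with zeros until it is at least width characters long. */
    static String zeroPad(String text, int width) {
        StringBuilder sb = new StringBuilder();
        for (int i = text.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(text).toString();
    }

    /** Throws a RuntimeException if observed differs from expected. */
    private static void ensureEqual(String observed, String expected, String message) {
        if (!observed.equals(expected)) {
            throw new RuntimeException(
                    message + ": expected=" + expected + " observed=" + observed);
        }
    }

    /** Tests inputs that are shorter than the requested width. */
    static void testShortInput() {
        ensureEqual(zeroPad("7", 3), "007", "one digit");
        ensureEqual(zeroPad("", 2), "00", "empty input");
    }

    /** Tests inputs that already meet or exceed the requested width. */
    static void testLongInput() {
        ensureEqual(zeroPad("1234", 3), "1234", "longer than width");
        ensureEqual(zeroPad("123", 3), "123", "exactly width");
    }

    /** Runs all of the individual test methods for this class. */
    static void test() {
        testShortInput();
        testLongInput();
    }

    public static void main(String[] args) {
        test();
        System.out.println("PadSketch.test() finished successfully.");
    }
}
```

A new test method only needs one extra line in test() to be run automatically with the others.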

List of Changes

The List of Changes for each ERDDAP release is now on a separate web page.

Credits

ERDDAP is a product of the NOAA NMFS SWFSC ERD.

Bob Simons is the original and still the main author of ERDDAP (the designer and software developer who wrote the ERDDAP-specific code). The starting point was Roy Mendelssohn's (Bob's boss) suggestion that Bob turn his ConvertTable program (a small utility which converts tabular data from one format to another and which was largely code from Bob's pre-NOAA work that Bob re-licensed to be open source) into a web service.

It was and is Roy Mendelssohn's ideas about distributed data systems, his initial suggestion to Bob, and his ongoing support (including hardware, network, and other software support, and by freeing up Bob's time so he could spend more time on the ERDDAP code) that has made this project possible and enabled its growth.

The ERDDAP-specific code is licensed as copyrighted open source, with NOAA holding the copyright. See the ERDDAP license.
ERDDAP uses copyrighted open source, Apache, LGPL, MIT/X, Mozilla, and public domain libraries and data.
ERDDAP does not require any GPL code or commercial programs.

The bulk of the funding for work on ERDDAP has come from NOAA, in that it paid Bob Simons' salary. For the first year of ERDDAP, when he was a government contractor, funding came from the NOAA CoastWatch program, the NOAA IOOS program, and the now defunct Pacific Ocean Shelf Tracking (POST) program.

Much credit goes to the many ERDDAP administrators and users who have made suggestions and comments which have led to many improvements in ERDDAP. Many are mentioned by name in the List of Changes. Thank you all (named and unnamed) very much. Thus, ERDDAP is a great example of User-Driven Innovation, where product innovation often comes from consumers (ERDDAP users), not just the producers (ERDDAP developers).

Here is the list of software and datasets that are in the ERDDAP distribution. We are very grateful for all of these. Thank you very much.
[Starting in 2021, it has become almost impossible to properly list all of the sources of code for ERDDAP because a few of the libraries we use (notably netcdf-java and especially AWS) in turn use many, many other libraries. All of the libraries that ERDDAP code calls directly are included below, as are many of the libraries that the other libraries call in turn. If you see that we have omitted a project below, please let us know so we can add the project below and give credit where credit is due.]

Overview
ERDDAP is a Java Servlet program. At ERD, it runs inside of a Tomcat application server (license: Apache), with an Apache web server (license: Apache), running on a computer using the Red Hat Linux operating system (license: GPL).
Datasets
The data sets are from various sources. See the metadata (in particular the "sourceUrl", "infoUrl", "institution", and "license") for each dataset. Many datasets have a restriction on their use that requires you to cite/credit the data provider whenever you use the data. It is always good form to cite/credit the data provider. See How to Cite a Dataset in a Paper.
CoHort Software
The com/cohort classes are from CoHort Software (https://www.cohortsoftware.com) which makes these classes available with an MIT/X-like license (see classes/com/cohort/util/LICENSE.txt).
CoastWatch Browser
ERDDAP uses code from the CoastWatch Browser project (now decommissioned) from the NOAA CoastWatch West Coast Regional Node (license: copyrighted open source). That project was initiated and managed by Dave Foley, a former Coordinator of the NOAA CoastWatch West Coast Regional Node. All of the CoastWatch Browser code was written by Bob Simons.
OPeNDAP
Data from OPeNDAP servers are read with Java DAP 1.1.7 (license: LGPL).
NetCDF-java
NetCDF files (.nc), GMT-style NetCDF files (.grd), GRIB, and BUFR are read and written with code in the NetCDF Java Library (license: BSD-3) from Unidata.
Software Included in the NetCDF Java .jar:
- slf4j
  The NetCDF Java Library and Cassandra need slf4j from the Simple Logging Facade for Java project. Currently, ERDDAP uses the slf4j-simple-xxx.jar renamed as slf4j.jar to meet this need. (license: MIT/X).
- JDOM
  The NetCDF Java .jar includes XML processing code from JDOM (license: Apache), which is included in the netcdfAll.jar.
- Joda
  The NetCDF Java .jar includes Joda for calendar calculations (which are probably not used by ERDDAP). (license: Apache 2.0).
- Apache
  The NetCDF Java .jar includes .jar files from several Apache projects:
  commons-codec,
  commons-discovery,
  commons-httpclient,
  commons-logging,
  HttpComponents.
  (For all: license: Apache)
  These are included in the netcdfAll.jar.
- Other
  The NetCDF Java .jar also includes code from: com.google.code.findbugs, com.google.errorprone, com.google.guava, com.google.j2objc, com.google.protobuf, edu.ucar, org.codehaus.mojo, com.beust.jcommander, com.google.common, com.google.re2j, and com.google.thirdparty. (Google uses Apache and BSD-like licenses.)
SGT
The graphs and maps are created on-the-fly with a modified version of NOAA's SGT (was at https://www.pmel.noaa.gov/epic/java/sgt/, now discontinued) version 3 (a Java-based Scientific Graphics Toolkit written by Donald Denbo at NOAA PMEL) (license: copyrighted open source (was at https://www.pmel.noaa.gov/epic/java/license.html)).
Walter Zorn
Big, HTML tooltips on ERDDAP's HTML pages are created with Walter Zorn's wz_tooltip.js (license: LGPL).
Sliders and the drag and drop feature of the Slide Sorter are created with Walter Zorn's wz_dragdrop.js (license: LGPL).
iText
The .pdf files are created with iText (version 1.3.1, which used the Mozilla license), a free Java-PDF library by Bruno Lowagie and Paulo Soares.
GSHHS
The shoreline and lake data are from GSHHS -- A Global Self-consistent, Hierarchical, High-resolution Shoreline Database (license: GPL) and created by Paul Wessel and Walter Smith.
WE MAKE NO CLAIM ABOUT THE CORRECTNESS OF THE SHORELINE DATA THAT COMES WITH ERDDAP -- DO NOT USE IT FOR NAVIGATIONAL PURPOSES.
GMT pscoast
The political boundary and river data are from the pscoast program in GMT, which uses data from the CIA World Data Bank II (license: public domain).
WE MAKE NO CLAIM ABOUT THE CORRECTNESS OF THE POLITICAL BOUNDARY DATA THAT COMES WITH ERDDAP.
ETOPO
The bathymetry/topography data used in the background of some maps is the ETOPO1 Global 1-Minute Gridded Elevation Data Set (Ice Surface, grid registered, binary, 2 byte int: etopo1_ice_g_i2.zip) (license: public domain), which is distributed for free by NOAA NGDC.
WE MAKE NO CLAIM ABOUT THE CORRECTNESS OF THE BATHYMETRY/TOPOGRAPHY DATA THAT COMES WITH ERDDAP. DO NOT USE IT FOR NAVIGATIONAL PURPOSES.
JavaMail
Emails are sent using code in mail.jar from Oracle's JavaMail API (license: COMMON DEVELOPMENT AND DISTRIBUTION LICENSE (CDDL) Version 1.1).
JSON
ERDDAP uses json.org's Java-based JSON library to parse JSON data (license: copyrighted open source).
Lucene
ERDDAP uses code from Apache Lucene (license: Apache) for the "lucene" search engine option (but not for the default "original" search engine).
commons-compress
ERDDAP uses code from Apache commons-compress (license: Apache).
JEXL
ERDDAP's support for evaluating expressions and scripts in <sourceName>s relies on the Apache project's Java Expression Language (JEXL) (license: Apache).
Cassandra
ERDDAP includes Apache Cassandra's cassandra-driver-core.jar (license: Apache 2.0).
Cassandra's cassandra-driver-core.jar requires (and so ERDDAP includes):
- guava.jar (license: Apache 2.0).
- lz4.jar (license: Apache 2.0).
- metrics-core.jar (license: MIT).
- netty-all.jar (license: Apache 2.0).
- snappy-java.jar (license: Apache 2.0).
KT_ palettes
The color palettes which have the prefix "KT_" are a collection of .cpt palettes by Kristen Thyng (license: MIT/X), but slightly reformatted by Jennifer Sevadjian of NOAA so that they conform to ERDDAP's .cpt requirements.
Leaflet
ERDDAP uses the JavaScript library Leaflet (license: BSD 2) as the WMS client on WMS web pages in ERDDAP. It is excellent software (well designed, easy to use, fast, and free) from Vladimir Agafonkin.
AWS
For working with Amazon AWS (including S3), ERDDAP uses v2 of the AWS SDK for Java (license: Apache).
AWS requires Maven to pull in the dependencies. They include the following .jar files (where xxx is the version number, which changes over time, and the license type is in parentheses): annotations-xxx.jar (Apache), apache-client-xxx.jar (Apache), ams-xxx.jar (BSD), asm-xxx.jar (BSD), asm-analysis-xxx.jar (BSD), asm-commons-xxx.jar (BSD), asm-tree-xxx.jar (BSD), asm-util-xxx.jar (BSD), auth-xxx.jar (?), aws-core-xxx.jar (Apache), aws-query-protocol-xxx.jar (Apache), aws-xml-protocol-xxx.jar (Apache), checker-qual-xxx.jar (MIT), error_prone_annotations-xxx.jar (Apache), eventstream-xxx.jar (Apache), failureaccess-xxx.jar (Apache), httpcore-xxx.jar (Apache), j2objc-annotations-xxx.jar (Apache), jackson-annotations-xxx.jar (Apache), jackson-core-xxx.jar (Apache), jackson-databind-xxx.jar (Apache), jaxen-xxx.jar (BSD), jffi-xxx.jar (Apache), jffi-xxx.native.jar (Apache), jnr-constants-xxx.jar (Apache), jnr-ffi-xxx.jar (Apache), jnr-posix-xxx.jar (Apache), jnr-x86asm-xxx.jar (Apache), json-xxx.jar (Copyrighted open source), jsr305-xxx.jar (Apache), listenablefuture-xxx.jar (Apache), about a dozen netty .jar's (Apache), profiles-xxx.jar (Apache), protocol-core-xxx.jar (Apache), reactive-streams-xxx.jar (CC0 1.0), regions-xxx.jar (Apache), s3-xxx.jar (Apache), sdk-core-xxx.jar (Apache), utils-xxx.jar (?). To see the actual licenses, search for the .jar name in the Maven Repository and then rummage around in the project's files to find the license.
MergeIR
EDDGridFromMergeIRFiles.java was written and contributed by Jonathan Lafite and Philippe Makowski of R.Tech Engineering (license: copyrighted open source). Thank you, Jonathan and Philippe!
TableWriterDataTable
.dataTable (TableWriterDataTable.java) was written and contributed by Roland Schweitzer of NOAA (license: copyrighted open source). Thank you, Roland!
json-ld
The initial version of the Semantic Markup of Datasets with json-ld (JSON Linked Data) feature (and thus all of the hard work in designing the content) was written and contributed (license: copyrighted open source) by Adam Leadbetter and Rob Fuller of the Marine Institute in Ireland. Thank you, Adam and Rob!
orderBy
The code for the orderByMean filter in tabledap and the extensive changes to the code to support the variableName/divisor:offset notation for all orderBy filters was written and contributed (license: copyrighted open source) by Rob Fuller and Adam Leadbetter of the Marine Institute in Ireland. Thank you, Rob and Adam!
Borderless Marker Types
The code for three new marker types (Borderless Filled Square, Borderless Filled Circle, Borderless Filled Up Triangle) was contributed by Marco Alba of ETT / EMODnet Physics. Thank you, Marco Alba!
Translations of messages.xml
The initial version of the code in TranslateMessages.java which uses Google's translation service to translate messages.xml into various languages was written by Qi Zeng, who was working as a Google Summer of Code intern. Thank you, Qi!
orderBySum
The code for the orderBySum filter in tabledap (based on Rob Fuller and Adam Leadbetter's orderByMean) and the Check All and Uncheck All buttons on the EDDGrid Data Access Form were written and contributed (license: copyrighted open source) by Marco Alba of ETT Solutions and EMODnet. Thank you, Marco!
Out-of-range .transparentPng Requests
ERDDAP now accepts requests for .transparentPng's when the latitude and/or longitude values are partly or fully out-of-range. (This was ERDDAP GitHub Issues #19, posted by Rob Fuller -- thanks for posting that, Rob.) The code to fix this was written by Chris John. Thank you, Chris!
Display base64 image data in .htmlTable responses
The code for displaying base64 image data in .htmlTable responses was contributed by Marco Alba of ETT / EMODnet Physics. Thank you, Marco Alba!
nThreads Improvement
The nThreads system for EDDTableFromFiles was significantly improved. These changes lead to a huge speed improvement (e.g., 2X speedup when nThreads is set to 2 or more) for the most challenging requests (when a large number of files must be processed to gather the results). These changes will also lead to a general speedup throughout ERDDAP. The code for these changes was contributed by Chris John. Thank you, Chris!

We are also very grateful for all of the software and websites that we use when developing ERDDAP, including
Chrome,
curl,
DuckDuckGo,
EditPlus,
FileZilla,
GitHub,
Google Search,
PuTTY,
Stack Overflow,
Todoist,
Wikipedia,
the Internet, the World Wide Web, and all the other, great, helpful websites.
Thank you very much.

License

The ERDDAP-specific code is licensed as copyrighted open source, with NOAA holding the copyright. The license is:

ERDDAP, Copyright 2022, NOAA.
PERMISSION TO USE, COPY, MODIFY, AND DISTRIBUTE THIS SOFTWARE AND ITS DOCUMENTATION FOR ANY PURPOSE AND WITHOUT FEE IS HEREBY GRANTED, PROVIDED THAT THE ABOVE COPYRIGHT NOTICE APPEAR IN ALL COPIES, THAT BOTH THE COPYRIGHT NOTICE AND THIS PERMISSION NOTICE APPEAR IN SUPPORTING DOCUMENTATION, AND THAT REDISTRIBUTIONS OF MODIFIED FORMS OF THE SOURCE OR BINARY CODE CARRY PROMINENT NOTICES STATING THAT THE ORIGINAL CODE WAS CHANGED AND THE DATE OF THE CHANGE. THIS SOFTWARE IS PROVIDED "AS IS" WITHOUT EXPRESS OR IMPLIED WARRANTY.

Contact

Questions, comments, suggestions? Please send an email to erd dot data at noaa dot gov and include the ERDDAP URL directly related to your question or comment.

Or, you can join the ERDDAP Google Group / Mailing List by visiting https://groups.google.com/forum/#!forum/erddap and clicking on "Apply for membership". Once you are a member, you can post your question there or search to see if the question has already been asked and answered.

ERDDAP, Version 2.22