ERDDAP
is a web application and a web service that aggregates scientific data from
diverse local and
remote sources and offers a simple, consistent way to download subsets of the
data in common file
formats and make graphs and maps.
This web page discusses issues related to heavy ERDDAP usage loads
and explores possibilities for dealing with extremely heavy loads
via grids, clusters, federations, and cloud computing.
DISCLAIMER -
The contents of this web page are my (Bob Simons) personal opinions and
do not necessarily reflect any position of the
Government or the National Oceanic and Atmospheric Administration.
The calculations are simplistic, but I think the conclusions are correct.
Did I use faulty logic or make a mistake in my calculations?
If so, the fault is mine alone.
Please send an email with the correction to bob dot simons at noaa dot gov.
The original version was written in June 2009. There have been no significant
changes. This was last updated 2018-02-05.
With heavy use, a standalone ERDDAP will be constrained (from most to least likely) by:
- A remote data source's bandwidth -
Even with an efficient connection (e.g., via OPeNDAP),
unless a remote data source has a very high bandwidth
Internet connection, ERDDAP's responses will be constrained by how fast ERDDAP can get
data from the data source. A solution is to copy the dataset onto ERDDAP's hard drive,
perhaps with EDDGridCopy or EDDTableCopy (see the sketch after this list).
- ERDDAP's server's bandwidth - Unless ERDDAP's server has a very high bandwidth Internet
connection, ERDDAP's responses will be constrained by how fast ERDDAP can get data from
the data sources and how fast ERDDAP can return data to the clients. The only solution
is to get a faster Internet connection.
- Memory -
If there are many simultaneous requests, ERDDAP can run out of memory
and temporarily refuse new requests.
(ERDDAP has a couple of mechanisms to avoid this and to minimize the
consequences if it does
happen.) So the more memory in the server the better.
On a 32-bit server, 4+ GB is really good, 2 GB is okay,
and less is not recommended.
On a 64-bit server, you can almost entirely avoid the problem by getting
lots of memory.
See the
-Xmx and -Xms settings
for ERDDAP/Tomcat.
An ERDDAP getting heavy usage on a computer with a 64-bit server
with 8GB of memory and -Xmx set to 4000M is rarely, if ever, constrained by memory.
- Hard drive bandwidth -
Accessing data stored on the server's hard drive
is vastly faster than
accessing remote data. Even so, if the ERDDAP server has a very high
bandwidth Internet connection,
it is possible that accessing data on the hard drive will be a bottleneck.
A partial solution
is to use faster (e.g., 10,000 RPM) magnetic hard drives
or SSD drives (if it makes
sense cost-wise). Another solution is to store different datasets
on different drives, so that the cumulative hard drive bandwidth is much higher.
- Too many files
in a cache directory -
ERDDAP caches all images, but only caches the
data for certain types of data requests. It is possible for the cache directory for a
dataset to have a large number of files temporarily. This will slow down requests to see
if a file is in the cache (really!). <cacheMinutes> in
setup.xml
lets you set how long a file can be in the cache before it is deleted.
Setting a smaller number minimizes this problem (see the snippet after this list).
- CPU -
Only two things take a lot of CPU time:
- NetCDF 4 and HDF 5 now support internal compression of data.
Decompressing a large compressed NetCDF 4 / HDF 5 data file can take 10
or more seconds. So multiple simultaneous requests to datasets with
data stored in compressed files can put a severe strain on any server.
If this is a problem, the solution is to store popular datasets
in uncompressed files, or get a server with a CPU with more cores.
- Making graphs (including maps): roughly 0.2 - 1 second per graph.
So if there were many simultaneous unique requests for graphs
(WMS clients often make 6 simultaneous requests!),
there could be a CPU limitation.
When multiple users are running WMS clients, this becomes a problem.
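For example, here is a minimal sketch of what an EDDGridCopy entry in datasets.xml
might look like. The datasetIDs and sourceUrl are hypothetical, and a real entry
(usually produced by the GenerateDatasetsXml tool) also needs the axisVariable and
dataVariable definitions for the enclosed source dataset:

  <!-- Sketch only: ERDDAP makes and maintains a local copy of the
       enclosed dataset's data, then serves the local copy instead of
       repeatedly querying the slow remote source. -->
  <dataset type="EDDGridCopy" datasetID="mySstCopy" active="true">
    <dataset type="EDDGridFromDap" datasetID="mySstCopy_source">
      <sourceUrl>https://remote.example.org/thredds/dodsC/someSstDataset</sourceUrl>
      <!-- axisVariable and dataVariable definitions go here -->
    </dataset>
  </dataset>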
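And if a cache directory accumulates too many files, the fix mentioned above is a
one-line setting in setup.xml (30 is just an illustrative value; check your
setup.xml for its current value):

  <cacheMinutes>30</cacheMinutes>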
Under very heavy use, a single standalone ERDDAP will run into one or more of the
constraints listed
above and even the suggested solutions will be insufficient. For such situations,
ERDDAP has
features that make it easy to construct scalable grids (also called clusters or federations)
of ERDDAPs which allow the system to handle very heavy use (e.g., for a large data center).
I'm using
grid
as a general term to indicate a type of
computer cluster
where all of the
parts may or may not be physically located in one facility and may or may not be centrally
administered. An advantage of co-located, centrally owned and administered grids (clusters)
is that they benefit from economies of scale (especially the human workload) and simplify
making the parts of the system work well together. An advantage of non-co-located,
non-centrally owned and administered grids (federations)
is that they distribute the human workload and the cost,
and may provide some additional fault tolerance.
The solution I propose below works well for all grid topologies.
The basic idea of designing a scalable system is to identify the potential bottlenecks
and then design the system so that parts of the system can be replicated as needed to
alleviate the bottlenecks. Ideally, each replicated part increases the capacity of that
part of the system linearly (efficiency of scaling). The system isn't scalable unless
there is a scalable solution for every bottleneck.
Scalability
is different from efficiency (how quickly a task can be done -- efficiency
of the parts). Scalability allows the system to grow to handle any level of demand.
Efficiency (of scaling and of the parts) determines how many servers, etc., will be needed
to meet a given level of demand. Efficiency is very important, but always has limits.
Scalability is the only practical solution to building a system that can handle very
heavy use. Ideally, the system will be scalable and efficient.
The goals of this design are:
- To make a scalable architecture
(one that is easily extensible by replicating any part that becomes over-burdened).
- To make an efficient system that maximizes the availability and
throughput of the data given the available computing resources.
(Cost is almost always an issue.)
- To balance the capabilities of the parts of the system so that one part
of the system won't overwhelm another part.
- To make a simple architecture so that the system is easy to set up and administer.
- To make an architecture that works well with all grid topologies.
- To make a system that fails gracefully
and in a limited way if any part becomes over-burdened.
(The time required to copy a large dataset will always limit
the system's ability to deal
with sudden increases in the demand for a specific dataset.)
- (If possible) To make an architecture that isn't tied to any specific
cloud computing service
or other external services (because it doesn't need them).
My recommendations are:
- Basically, I suggest setting up a Composite ERDDAP
(D in the diagram), which is a
regular ERDDAP except that it just serves data from other ERDDAPs.
The grid's architecture
is designed to shift as much work as possible
(CPU usage, memory usage, bandwidth usage)
from the Composite ERDDAP to the other ERDDAPs.
- ERDDAP has two special dataset types,
EDDGridFromErddap
and
EDDTableFromErddap,
which refer to datasets on other ERDDAPs (see the sketch below).
- When the composite ERDDAP receives a request for data or images from
these datasets, the composite ERDDAP
redirects
the data request to the other ERDDAP server. The result is:
- This is very efficient (CPU, memory, and bandwidth), because otherwise:
- The composite ERDDAP would have to send the data request to the other ERDDAP.
- The other ERDDAP would have to get the data, reformat it,
and transmit the data to the composite ERDDAP.
- The composite ERDDAP would have to receive the data (using extra bandwidth),
reformat it (using extra CPU time and memory),
and transmit the data to the user (using extra bandwidth).
By redirecting the data request and allowing the other ERDDAP to send the
response directly
to the user, the composite ERDDAP spends essentially no CPU time, memory,
or bandwidth on data requests.
- The redirect is transparent to the user regardless of the client software
(a browser or any other software or command line tool).
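To make this concrete, the datasets.xml entries in the composite ERDDAP for these
two dataset types can be as simple as this sketch (the datasetIDs and URLs are
hypothetical; each sourceUrl points to the dataset on the other ERDDAP):

  <dataset type="EDDGridFromErddap" datasetID="remoteSst">
    <sourceUrl>https://serverB.example.org/erddap/griddap/localSstId</sourceUrl>
  </dataset>

  <dataset type="EDDTableFromErddap" datasetID="remoteBuoys">
    <sourceUrl>https://serverC.example.org/erddap/tabledap/localBuoysId</sourceUrl>
  </dataset>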
The parts of the grid are:
A) For every remote data source that
has a high-bandwidth OPeNDAP server, you can connect directly
to the remote server.
If the remote server is an ERDDAP, use EDDGridFromErddap or
EDDTableFromErddap to serve the data in the Composite ERDDAP.
If the remote server is some other type of DAP server,
e.g., THREDDS, Hyrax, or GrADS, use EDDGridFromDap.
B) For every ERDDAP-able data source
(a data source from which ERDDAP
can read data) that has a high-bandwidth server, set up another ERDDAP in
the grid which
is responsible for serving the data from this data source.
- If several such ERDDAPs aren't getting many requests for data, you can
consolidate them into one ERDDAP.
- If the ERDDAP dedicated to getting data from one remote source is
getting too many requests,
there is a temptation to add additional ERDDAPs to access the remote
data source. In special cases this may make sense,
but it is more likely that this will overwhelm the remote data
source (which is self-defeating) and also prevent other users
from accessing the remote data source (which isn't nice).
In such a case, consider setting up another ERDDAP to serve that
one dataset and copy the dataset on that ERDDAP's hard drive (see C),
perhaps with
EDDGridCopy
and/or
EDDTableCopy.
- B servers must be publicly accessible.
C) For every ERDDAP-able data source
that has a low-bandwidth server
(or is a slow service for other reasons),
consider setting up another ERDDAP and storing a copy of the dataset
on that ERDDAP's hard drives, perhaps with
EDDGridCopy
and/or
EDDTableCopy.
If several such ERDDAPs
aren't getting many requests for data, you can consolidate them into one ERDDAP.
C servers must be publicly accessible.
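For the C setup with tabular data, here is the corresponding sketch (again, the
datasetIDs and sourceUrl are hypothetical, and the enclosed source dataset normally
needs its full dataVariable definitions). EDDTableCopy wraps the slow source
dataset; as I understand it, the variables named in extractDestinationNames
determine how the local copy is partitioned into files:

  <dataset type="EDDTableCopy" datasetID="myBuoyCopy">
    <extractDestinationNames>station_id</extractDestinationNames>
    <dataset type="EDDTableFromDapSequence" datasetID="myBuoyCopy_source">
      <sourceUrl>https://slowServer.example.org/opendap/buoyData</sourceUrl>
      <!-- dataVariable definitions go here -->
    </dataset>
  </dataset>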
D)
The composite ERDDAP is a regular
ERDDAP except that it just serves data from other ERDDAPs.
- Because the composite ERDDAP has information in memory about all of the
datasets, it can
quickly respond to requests for lists of datasets (full text searches, category searches,
the list of all datasets), and requests for an individual dataset's Data Access Form,
Make A Graph form, or WMS info page. These are all small, dynamically generated, HTML
pages based on information which is held in memory. So the responses are very fast.
- Because requests for actual data are quickly redirected to the other ERDDAPs,
the composite ERDDAP can quickly respond to requests for actual data
while using essentially no CPU time, memory, or bandwidth.
- By shifting as much work as possible (CPU, memory, bandwidth)
from the Composite ERDDAP to
the other ERDDAPs, the composite ERDDAP can appear to serve data
from all of the datasets
and yet still keep up with very large numbers of data requests
from a large number of users.
- Preliminary tests indicate that the composite ERDDAP can respond to
most requests in ~1 ms of
CPU time, or about 1,000 requests/second per core. So an 8-core processor should be able
to respond to about 8,000 requests/second.
Although it is possible to envision bursts of higher activity
which would cause slowdowns, that is a lot of throughput.
It is likely that data center
bandwidth will be the bottleneck long before the composite ERDDAP becomes the bottleneck.
- In very extreme cases, or for fault tolerance,
you may want to set up more than one composite ERDDAP.
It is likely that other parts of the system (notably, the data center's bandwidth)
will become a problem long before the composite ERDDAP becomes a bottleneck.
So the solution is probably to set up additional, geographically diverse, data centers
(mirrors), each with one composite ERDDAP and servers with ERDDAPs and (at least) mirror
copies of the datasets which are in high demand. Such a setup also provides fault
tolerance and data backup (via copying).
In this case, it is best if the composite ERDDAPs have different URLs.
If you really want all of the composite ERDDAPs to have the same URL,
use a front end system
that assigns a given user to just one of the composite ERDDAPs (based on the IP address),
so that all of the user's requests go to just one of the composite ERDDAPs.
There are two reasons:
- When an underlying dataset changes (e.g., a new data file in a gridded dataset),
the composite ERDDAPs may be slightly out of synch, but with eventual consistency.
Normally, they will re-synch within 5 seconds, but sometimes it will take longer.
If a user builds an automated system that relies on
ERDDAP subscriptions that trigger actions, these brief
synchronization lags may become significant.
- The 2+ composite ERDDAPs each maintain their own set of subscriptions
(because of the synch problem described above).
So a given user should be directed to just one of the composite ERDDAPs
to avoid these problems.
If one of the composite ERDDAPs goes down, the front end system can
redirect that ERDDAP's users to another ERDDAP that is up.
However, if a capacity problem caused the first composite ERDDAP to fail
(an overzealous user? a denial-of-service attack?),
redirecting its users to other composite ERDDAPs will very likely
cause a cascading failure.
Thus, the most robust setup is to have composite ERDDAPs with different URLs.
- [For a fascinating design of a high performance system running on one server,
see Cringely's overview or the detailed description of Mailinator.]
Datasets In Very High Demand -
In the really unusual case that one of the
A, B, or C ERDDAPs
can't keep up with the requests because of bandwidth or hard drive limitations,
it makes sense to copy the data (again) onto another server+hardDrive+ERDDAP,
perhaps with
EDDGridCopy
and/or
EDDTableCopy.
While it may seem ideal to have the original dataset and the
copied dataset appear seamlessly as one dataset in the composite ERDDAP, this is difficult
because the two datasets will be in slightly different states at different times (notably,
after the original gets new data, but before the copied dataset gets its copy).
Therefore, I recommend that the datasets be given slightly different titles (e.g.,
"... (copy #1)" and "... (copy #2)", or perhaps "(mirror #n)" or "(server #n)") and
appear as separate datasets in the composite ERDDAP.
Users are used to seeing lists of
mirror sites
at popular file download sites, so this shouldn't surprise or disappoint them.
Because of bandwidth limitations at a given site, it may make sense to have the mirror
located at another site. If the mirror copy is at a different data center, accessed just
by that data center's composite ERDDAP, the different titles (e.g., "... (mirror #1)")
aren't necessary.
RAIDs vs. Regular Hard Drives -
If a large dataset or a group of datasets are not heavily used,
it may make sense to store the data on a RAID since it offers fault tolerance and since
you don't need the processing power or bandwidth of another server. But if a dataset is
heavily used, it may make more sense to copy the data onto another server + ERDDAP + hard
drive (similar to what Google does)
rather than to use one server and a RAID to store multiple datasets,
since you get to use both server+hardDrive+ERDDAPs in the grid until one of them fails.
Failures - What happens if...
- There is a burst of requests for one dataset (e.g., all students in a class
simultaneously request similar data)?
Only the ERDDAP serving that dataset will be overwhelmed and
slow down or refuse requests. The composite ERDDAP and other ERDDAPs won't be
affected. Since the limiting factor for a given dataset within the system is the hard
drive with the data (not ERDDAP), the only solution (not immediate) is to make a copy
of the dataset on a different server+hardDrive+ERDDAP.
- An A, B, or C ERDDAP fails (e.g., hard drive failure)?
Only the dataset(s) served by that ERDDAP are affected.
If the dataset(s) is mirrored on another server+hardDrive+ERDDAP, the effect is minimal.
If the problem is a hard drive failure in a level 5 or 6 RAID, you just replace the
drive and have the RAID rebuild the data on the drive.
- The composite ERDDAP fails?
If you want to make a system with very high availability,
you can set up multiple composite ERDDAPs (as discussed above),
using something like NGINX or Traefik to handle load balancing.
Note that a given composite ERDDAP can handle a large number of requests
from a large number of users, because
requests for metadata are small and are answered from information held in memory,
and
requests for data (which may be large) are redirected to the child ERDDAPs.
Simple,
Scalable
- This system is easy to set up and administer,
and easily extensible when
any part of it becomes over-burdened. The only real limitations for a given data center
are the data center's bandwidth and the cost of the system.
Bandwidth -
Note the approximate bandwidth of commonly used components of the system:
Component        | Approximate Bandwidth (GB/s)
DDR memory       | 2.5
SSD drive        | 1
SATA hard drive  | 0.3
Gigabit Ethernet | 0.1
OC-12            | 0.06
OC-3             | 0.015
T1               | 0.0002
So, one SATA hard drive (0.3GB/s) on one server with one ERDDAP can probably saturate a
Gigabit Ethernet LAN (0.1GB/s).
And one Gigabit Ethernet LAN (0.1GB/s) can probably saturate an OC-12 Internet connection
(0.06GB/s).
And at least one source lists OC-12 lines costing about $100,000 per month.
(Yes, these calculations are based on pushing the system to its limits,
which is not good because it leads to very sluggish responses.
But these calculations are useful for planning and for balancing parts of the system.)
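As a worked example of that planning arithmetic (assuming a 30-day month and a
fully saturated line), the approximate monthly volume of an OC-12 connection is:

  0.06 GB/s × 86,400 s/day × 30 days ≈ 155,520 GB/month (~150,000 GB/month)

That figure reappears in the cloud computing cost estimates below.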
Clearly, a suitably fast Internet connection for your data center is
by far the most expensive part of the system.
You can easily and relatively cheaply build a grid with a dozen servers
running a dozen ERDDAPs
which is capable of pumping out lots of data quickly,
but a suitably fast Internet connection will be very, very expensive.
The partial solutions are:
- Encourage clients to request subsets of the data if that is all that is needed.
If the client only needs data for a small region or at a lower resolution,
that is what they should request.
Subsetting is a central focus of the protocols ERDDAP supports for
requesting data.
- Encourage transmitting compressed data.
ERDDAP compresses
a data transmission if it
finds "accept-encoding" in the HTTP GET request header. All web browsers use
"accept-encoding" and automatically decompress the response. Other clients
(e.g., computer programs) have to use it explicitly.
- Colocate your servers at an ISP or other site that offers relatively
less expensive bandwidth costs.
- Disperse the servers with the ERDDAPs to different institutions so that
the costs are dispersed.
You can then link your composite ERDDAP to their ERDDAPs.
Note that
Cloud Computing and web hosting services
offer all the Internet bandwidth
you need, but don't solve the price problem.
For general information on designing scalable,
high capacity, fault-tolerant systems,
see Michael T. Nygard's book Release It!.
Several companies offer cloud computing services
(e.g.,
Amazon Web Services
and
Google Cloud Platform).
Web hosting companies have offered simpler services since the mid-1990s,
but the "cloud" services have greatly expanded the flexibility
of the systems and the range of services offered.
Since the ERDDAP grid just consists of ERDDAPs and
since ERDDAPs are Java web applications that can run in Tomcat (the most common
application server) or other application servers, it should be relatively easy to
set up an ERDDAP grid on a cloud service or web hosting site.
The advantages of these services are:
- They offer access to very high bandwidth Internet connections.
This alone may justify using these services.
- They only charge for the services you use.
For example, you get access to a very high
bandwidth Internet connection, but you only pay for actual data transferred.
That lets you build a system that rarely gets overwhelmed (even at peak demand),
without having to pay for capacity that is rarely used.
- They are easily extensible. You can change server types or add
as many servers or as much storage as you want, in less than a minute.
This alone may justify using these services.
- They free you from many of the administrative duties of running the
servers and networks.
This alone may justify using these services.
The disadvantages of these services are:
- They charge for their services, sometimes a lot
(in absolute terms; not that it isn't a good value).
The prices listed here are for Amazon EC2.
These prices (as of June 2015) will come down.
In the past, prices were higher,
but data files and the number of requests were smaller.
In the future, prices will be lower,
but data files and the number of requests will be larger.
So the details change, but the situation stays relatively constant.
And it isn't that the service is over-priced,
it is that we are using and buying a lot of the service.
- Data Transfer - Data transfers into the system are now free (Yea!).
Data transfers out of the system are $0.09/GB.
One SATA hard drive (0.3GB/s) on one server with one ERDDAP can probably
saturate a Gigabit Ethernet LAN (0.1GB/s).
One Gigabit Ethernet LAN (0.1GB/s) can probably saturate an OC-12 Internet
connection (0.06GB/s).
If one OC-12 connection can transmit ~150,000 GB/month (see the calculation in the
Bandwidth section above), the Data Transfer costs
could be as much as 150,000 GB @ $0.09/GB = $13,500/month,
which is a significant cost.
Clearly, if you have a dozen hard-working ERDDAPs on a cloud service, your
monthly Data Transfer fees could be substantial (up to 12 × $13,500 = $162,000/month).
(Again, it isn't that the service is over-priced,
it is that we are using and buying a lot of the service.)
- Data storage - Amazon charges $50/month per TB.
(Compare that to buying a 4TB enterprise drive outright for ~$50/TB,
although the RAID to put it in and administrative costs add to the total cost.)
So if you need to store lots of data in the cloud,
it might be fairly expensive (e.g., 100TB would cost $5000/month).
But unless you have a really large amount of data,
this is a smaller issue than the bandwidth/data transfer costs.
(Again, it isn't that the service is over-priced,
it is that we are using and buying a lot of the service.)
- The subsetting problem:
The only way to efficiently distribute data from data files
is to have the program which is distributing the data (e.g., ERDDAP) running on
a server which has the data stored on a local hard drive
(or similarly fast access to a SAN or local RAID).
Local file systems allow ERDDAP (and underlying libraries, such as netcdf-java)
to request specific byte ranges from the files and get responses very quickly.
Many types of data requests from ERDDAP to the file
(notably gridded data requests where the stride value
is > 1) can't be done efficiently if the program
has to request the entire file or big chunks of a file
from a non-local (hence slower) data storage system and then extract a subset.
If the cloud setup doesn't give ERDDAP fast access to byte ranges of the files
(as fast as with local files),
ERDDAP's access to the data will be a severe bottleneck
and negate other benefits of using a cloud service.
Thanks -
Many thanks to Matthew Arrott and his group in the original OOI effort
for their work on putting ERDDAP in
the cloud and the resulting discussions.