NCCSV -
A NetCDF-Compatible, UTF-8, CSV File Specification,
Version 1.20
Bob Simons <erd.data at noaa.gov> and Steve Hankin
"NCCSV" by Bob Simons and Steve Hankin is licensed under
CC BY 4.0
This document specifies a UTF-8 CSV text file format that can contain
all of the information (metadata and data) that can be found in a NetCDF .nc
file that contains a CSV-file-like table of data. The file extension for a
UTF-8 CSV text file following this specification must be .csv so that
it can be read easily and correctly into spreadsheet programs like
Excel and Google Sheets. Bob Simons will write software to convert an
NCCSV file into a NetCDF-3 (and perhaps also a NetCDF-4) .nc file,
and the reverse, with no loss of information.
Bob Simons has modified
ERDDAP™
to support reading and writing this type of file.
The NCCSV format is designed so that spreadsheet software such as Excel
and Google Sheets can import an NCCSV file as a csv file, with all of the
information in the spreadsheet's cells ready for editing. Or, a spreadsheet
can be created from scratch following the NCCSV conventions.
Regardless of the source of the spreadsheet,
if it is then exported as a .csv file,
it will conform to the NCCSV specification and no information will be lost.
The only differences between NCCSV files and the analogous spreadsheet
files which follow these conventions are:
- NCCSV files have values on a line separated by commas.
Spreadsheets have values on a line in adjacent cells.
- Strings in NCCSV files are often surrounded by double quotes.
Strings in spreadsheets are never surrounded by double quotes.
- Internal double quotes ( " ) in Strings in NCCSV files appear
as 2 double quotes.
Internal double quotes in spreadsheets appear as 1 double quote.
See the Spreadsheet
section below for more information.
Streamable —
Like CSV files in general, NCCSV files are streamable.
Thus, if an NCCSV file is generated on-the-fly by a data server such as
ERDDAP™,
the server can start to stream data to the requester before all of the data
has been gathered. This is a useful and desirable feature. NetCDF files,
by contrast, are not streamable.
ERDDAP™ —
This specification is designed so that NCCSV files and the .nc files
that can be created from them can be used by an
ERDDAP™ data server
(via the
EDDTableFromNccsvFiles
and
EDDTableFromNcFiles dataset types),
but this specification is external to ERDDAP.
ERDDAP™ has several required global attributes and many recommended global
and variable attributes, mostly based on CF and ACDD attributes (see
https://erddap.github.io/setupDatasetsXml.html#globalAttributes).
Balance —
The design of the NCCSV format is a balance of several requirements:
- The files must contain all of the data and metadata that would be in a
tabular NetCDF file, including specific data types.
- The files must be able to be read into and then written out of a
spreadsheet with no loss of information.
- The files must be easy for humans to create, edit, read, and understand.
- The files must be able to be unambiguously parsed by computer programs.
If some requirement in this document seems odd or picky,
it is probably needed to meet one of these requirements.
Other Specifications -
This specification refers to several other specifications and libraries
that it is designed to work with, but this specification is not a part of
any of those other specifications, nor does it need any changes to them,
nor does it conflict with them.
If a detail related to one of these standards is not specified here,
see the related specification. Notably, this includes:
Notation -
In this specification, brackets, [ ], denote optional items.
A complete NCCSV file consists of two sections: the metadata section,
followed by the data section.
NCCSV files may contain any UCS-2 characters
(i.e., 2-byte Unicode characters, as in Java) encoded via UTF-8.
ERDDAP™ reads and writes NCCSV files using the UTF-8 encoding.
NCCSV files may use either newline (\n)
(which is common on Linux and Mac OS X computers)
or carriageReturn plus newline (\r\n) (which is common on Windows computers)
as end-of-line markers, but not both.
.nccsvMetadata —
When both the creator and the reader are expecting it,
it is also possible and sometimes useful to make a variant of an NCCSV file
which contains just the metadata section (including the *END_METADATA* line).
The result provides a complete description of the file's attributes,
variable names, and data types,
thus serving the same purpose as the .das plus .dds responses
from an OPeNDAP server. ERDDAP™ will return this variation
if you request fileType=.nccsvMetadata from an ERDDAP™ dataset.
In an NCCSV file, each line of the metadata section
uses the format
variableName,attributeName,value1[,value2][,value3][,value4][...]
Spaces before or after items are not allowed because they cause problems
when importing the file into spreadsheet programs.
Conventions -
The first line of an NCCSV file is the first line of the metadata section and
must have a *GLOBAL*
Conventions attribute listing all of the
conventions used in the file as a String containing a CSV list, for example:
*GLOBAL*,Conventions,"COARDS, CF-1.6, ACDD-1.3, NCCSV-1.2"
One of the conventions listed must be NCCSV-1.2,
which refers to the current version of this specification.
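A reader can check this requirement before parsing anything else. The following is only a sketch: the function name `nccsv_version` is hypothetical, and it assumes the Conventions value has already been CSV-unquoted into a plain string.

```python
def nccsv_version(conventions):
    """Return the NCCSV version named in a Conventions string, or None.

    `conventions` is the unquoted attribute value, i.e. a comma-separated
    list such as "COARDS, CF-1.6, ACDD-1.3, NCCSV-1.2".
    """
    prefix = "NCCSV-"
    for item in conventions.split(","):
        item = item.strip()  # items are usually written with a space after the comma
        if item.startswith(prefix):
            return item[len(prefix):]
    return None
```

A reader could reject the file (or fall back to plain CSV handling) when this returns None.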
*END_METADATA* -
The end of the metadata section of an NCCSV file must be denoted by a
line with only
*END_METADATA*
It is recommended but not required that all of the attributes for a
given variable appear on adjacent lines of the metadata section.
If an NCCSV file is converted into a NetCDF file, the order that the
variableNames first appear in the metadata section will be the order of
the variables in the NetCDF file.
Optional blank lines are allowed in the metadata section after the required
first line with
*GLOBAL* Conventions
information (see below)
and before the required last line with *END_METADATA*.
If a spreadsheet is created from an NCCSV file,
the metadata section will appear with variable names in column A,
attribute names in column B, and values in column C.
If a spreadsheet following these conventions is saved as a CSV file,
there will often be extra commas at the end of the lines in the metadata
section. The software that converts NCCSV files into .nc files will ignore
the extra commas.
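The rules above (one attribute per line, optional blank lines, ignored trailing commas, a terminating *END_METADATA* line) can be sketched with Python's standard `csv` module. This is an illustration, not the converter's actual implementation; the function name `read_metadata` is hypothetical. One known limitation: `csv.reader` discards the information about whether a value was quoted, so this sketch returns raw strings and leaves typing to a later step.

```python
import csv
import io


def read_metadata(text):
    """Split NCCSV text into (attributes, remaining_rows).

    attributes maps variableName -> {attributeName: [raw value strings]},
    in the order the variable names first appear.
    remaining_rows holds the still-unparsed rows of the data section.
    """
    reader = csv.reader(io.StringIO(text, newline=""))
    attributes = {}
    for row in reader:
        # Drop the trailing empty cells that spreadsheets add on export.
        while row and row[-1] == "":
            row.pop()
        if not row:
            continue  # optional blank line
        if row == ["*END_METADATA*"]:
            return attributes, list(reader)
        if len(row) < 3:
            continue  # attributes with no value are ignored
        var_name, attr_name, *values = row
        attributes.setdefault(var_name, {})[attr_name] = values
    raise ValueError("missing *END_METADATA* line")
```

Because Python dictionaries preserve insertion order, iterating over the result yields the variables in the order they first appear in the metadata section.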
variableName is the case-sensitive name of a variable in the data file.
All variable names must begin with a 7-bit ASCII letter or underscore and
be composed of 7-bit ASCII letters, underscores, and 7-bit ASCII digits.
The special variableName
*GLOBAL* is used to denote global metadata.
attributeName is the case-sensitive name of an attribute
associated with a variable or
*GLOBAL*.
All attribute names must begin with a 7-bit ASCII letter or
underscore and be composed of 7-bit ASCII letters, underscores,
and 7-bit ASCII digits.
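The naming rule for variables and attributes translates directly into a regular expression. A minimal sketch (the names `NAME_PATTERN` and `is_valid_name` are hypothetical); the special names *GLOBAL*, *SCALAR*, *DATA_TYPE* are deliberately not matched and would need to be handled separately by the caller:

```python
import re

# First character: ASCII letter or underscore.
# Remaining characters: ASCII letters, digits, or underscores.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_name(name):
    """Return True if name is a legal NCCSV variable or attribute name."""
    return NAME_PATTERN.fullmatch(name) is not None
```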
*SCALAR* —
The special attributeName *SCALAR*
can be used to create a
scalar data variable and define its value.
The data type of the *SCALAR* defines the data type
for the variable,
so do not specify a *DATA_TYPE* attribute for scalar variables.
Note that there must not be data for the scalar variable in the Data Section
of the NCCSV file.
For example, to create a scalar variable named "ship"
with the value "Okeanos Explorer" and a cf_role attribute, use:
ship,*SCALAR*,"Okeanos Explorer"
ship,cf_role,trajectory_id
When a scalar data variable is read into ERDDAP™,
the scalar value is converted into a column in the data table with the
same value on every row.
value is the value of the metadata attribute
and must be an array with one or more of either a
byte, ubyte, short, ushort, int, uint, long, ulong, float, double, String, or char.
No other data types are supported. Attributes with no value will be ignored.
If there is more than one sub-value, the sub-values must all be of the
same data type.
For data types other than Strings, the values MUST be separated by commas, for example:
sst,actual_range,0.17f,23.58f
For Strings, use a single String with
\n (newline) characters separating the substrings.
The definitions of the attribute data types are:
- byte
attribute values (8-bit, signed) must be written with the
suffix 'b', e.g., -7b, 0b, 7b . The range of valid byte values is -128 to 127.
A number that looks like a byte but is invalid (e.g., 128b)
will be converted to a missing value or generate an error message.
- ubyte
attribute values (8-bit, unsigned) must be written with the
suffix 'ub', e.g., 0ub, 7ub, 250ub . The range of valid ubyte values is 0 to 255.
A number that looks like a ubyte but is invalid (e.g., 256ub)
will be converted to a missing value or generate an error message.
When possible, use byte instead of ubyte, because many systems don't support
unsigned bytes (e.g., attributes in NetCDF-3 files).
- short
attribute values (16-bit, signed) must be written with the
suffix 's', e.g., -30000s, 0s, 30000s. The range of valid short values is
-32768 to 32767. A number that looks like a short but is invalid (e.g., 32768s)
will be converted to a missing value or generate an error message.
- ushort
attribute values (16-bit, unsigned) must be written with the
suffix 'us', e.g., 0us, 30000us, 60000us. The range of valid ushort values is
0 to 65535. A number that looks like a ushort but is invalid (e.g., 65536us)
will be converted to a missing value or generate an error message.
When possible, use short instead of ushort, because many systems don't support
unsigned shorts (e.g., attributes in NetCDF-3 files).
- int
attribute values (32-bit, signed) must be written as JSON ints
without a decimal point or exponent, but with the suffix 'i',
e.g., -12067978i, 0i, 12067978i. The range of valid int values is
-2147483648 to 2147483647. A number that looks like an int but is invalid
(e.g., 2147483648i)
will be converted to a missing value or generate an error message.
- uint
attribute values (32-bit, unsigned) must be written as JSON ints
without a decimal point or exponent, but with the suffix 'ui',
e.g., 0ui, 12067978ui, 4123456789ui. The range of valid uint values is
0 to 4294967295. A number that looks like a uint but is invalid
(e.g., 4294967296ui)
will be converted to a missing value or generate an error message.
When possible, use int instead of uint, because many systems don't support
unsigned ints (e.g., attributes in NetCDF-3 files).
- long
attribute values (64-bit, signed, currently supported by
NUG and ERDDAP™ but not yet supported by CF) must be written without
a decimal point and with the suffix 'L',
e.g., -12345678987654321L, 0L, 12345678987654321L .
If you use the converting software to convert an NCCSV file with long
values into a NetCDF-3 file, any long values will be converted to double values.
The range of valid long values is -9223372036854775808 to 9223372036854775807.
A number that looks like a long but is invalid (e.g., 9223372036854775808L)
will be converted to a missing value or generate an error message.
When possible, use double instead of long, because many systems don't support
long (e.g., NetCDF-3 files).
- ulong
attribute values (64-bit, unsigned, currently supported by
NUG and ERDDAP™ but not yet supported by CF) must be written without
a decimal point and with the suffix 'uL',
e.g., 0uL, 12345678987654321uL, 9007199254740992uL .
If you use the converting software to convert an NCCSV file with ulong
values into a NetCDF-3 file, any ulong values will be converted to double values.
The range of valid ulong values is 0 to 18446744073709551615.
A number that looks like a ulong but is invalid (e.g., 18446744073709551616uL)
will be converted to a missing value or generate an error message.
When possible, use double instead of ulong, because many systems don't support
signed or unsigned long (e.g., NetCDF-3 files).
- float
attribute values (32-bit) must be written with the suffix 'f'
and may have a decimal point and/or an exponent,
e.g., 0f, 1f, 12.34f, 1e12f, 1.23e+12f, 1.23e12f, 1.87E-7f.
Use NaNf for a float NaN (missing) value.
The range of floats is approximately
+/-3.40282347E+38f (~7 significant decimal digits).
A number that looks like a float but is invalid (e.g., 1.0e39f)
will be converted to a missing value or generate an error message.
- double
attribute values (64-bit) must be written
with the suffix 'd' and may have a decimal point and/or an exponent,
e.g., 0d, 1d, 12.34d, 1e12d, 1.23e+12d, 1.23e12d, 1.87E-7d.
Use NaNd for a double NaN (missing) value.
The range of doubles is approximately +/-1.79769313486231570E+308d
(~15 significant decimal digits).
A number that looks like a double but is invalid (e.g., 1.0e309d)
will be converted to a missing value or generate an error message.
- String
attribute values are a sequence of UCS-2 characters
(i.e., 2-byte Unicode characters, as in Java),
which must be written as JSON-like strings.
- Double quotes (") within a String value must be encoded as two double
quotes (""). That's what spreadsheet programs require when
reading .csv files. That's what spreadsheet programs write when
you save a spreadsheet as a .csv file.
- The special JSON backslash-encoded characters within a String value must be encoded as
in JSON (notably \n (newline), \\ (backslash),
\f (formfeed), \t (tab), \r (carriage return)) or with the
\uhhhh syntax. In a spreadsheet,
don't use Alt Enter to specify a new line within a text cell;
instead, use \n (2 characters: backslash and 'n') to indicate a new line.
- \uhhhh -
All other characters less than character #32 must be encoded with the syntax
\uhhhh, where hhhh is the 4-digit hexadecimal
number of the character.
- All printable characters greater than character #126, e.g., the Euro sign,
may appear unencoded, e.g., € (the Euro character), or encoded with the
\uhhhh syntax, e.g., \u20AC.
See the code pages referenced at
https://en.wikipedia.org/wiki/Unicode
to find the hexadecimal numbers associated with specific Unicode characters,
or use a software library.
Note that some ERDDAP™ output file types, e.g., .csv, use the ISO 8859-1 character set,
so Unicode characters above #255 will be lost when these data values are written to those file types.
- All non-printable characters greater than character #126, e.g., character #127,
are STRONGLY DISCOURAGED, but you SHOULD use the \uhhhh syntax if you do include them.
- If the String has a space at the beginning or end, or includes "
(double quote) or a comma, or contains values that would otherwise be
interpreted as some other data type (e.g., an int), or is the word "null",
the entire String must
be enclosed in double quotes; otherwise, unlike JSON,
the enclosing double quotes are optional.
We recommend: when in doubt, enclose the entire String in double quotes.
Spaces at the beginning or end of a String are strongly discouraged.
- char
attribute values are a single UCS-2 character
(i.e., 2-byte Unicode characters, as in Java).
Printable characters (other than special characters like newline, ", ', and \)
can be written as is.
Special characters (e.g., newline) and other unprintable characters (e.g., #127)
must be written with the
\uhhhh syntax.
Char attribute values must be enclosed in single quotes (the inner quotes)
and double quotes (the outer quotes), e.g., "'a'", "'""'"
(a double quote character),
"'\''" (a single quote character), "'\t'" (a tab),
"'\u007F'" (the delete 'character'), and
"'€'" (the Euro character). This system of using
single and double quotes is odd and cumbersome, but it is a way to
distinguish char values from Strings in a way that works with spreadsheets.
A value that looks like a char but is invalid will generate an error message.
Note that some ERDDAP™ output file types, e.g., .csv, use the ISO 8859-1 character set,
so Unicode characters above #255 will be lost when these data values are written to those file types.
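The String and char encoding rules above can be sketched as a small decoder. This is an illustration under stated assumptions, not the reference implementation: the function names are hypothetical, the input is assumed to have already been CSV-unquoted (so "" has become "), and unknown backslash sequences are left untouched.

```python
import re

# Backslash escapes handled by this sketch (JSON's set, plus \' for chars).
_SIMPLE_ESCAPES = {
    "n": "\n", "t": "\t", "r": "\r", "f": "\f", "b": "\b",
    "\\": "\\", "/": "/", '"': '"', "'": "'",
}
_ESCAPE_RE = re.compile(r"\\(u[0-9A-Fa-f]{4}|.)", re.DOTALL)


def decode_escapes(cell):
    """Decode backslash escapes in an already CSV-unquoted cell."""
    def replace(match):
        body = match.group(1)
        if len(body) == 5:  # 'u' followed by 4 hex digits
            return chr(int(body[1:], 16))
        return _SIMPLE_ESCAPES.get(body, match.group(0))
    return _ESCAPE_RE.sub(replace, cell)


def parse_char_attribute(cell):
    """Parse a char attribute such as 'a' or '\\t' (single quotes included)."""
    if len(cell) < 3 or cell[0] != "'" or cell[-1] != "'":
        raise ValueError(f"not a char value: {cell!r}")
    decoded = decode_escapes(cell[1:-1])
    if len(decoded) != 1:
        raise ValueError(f"not a single character: {cell!r}")
    return decoded
```

Decoding in a single left-to-right pass matters: a two-step approach (first \\\\, then \n) would misread an escaped backslash followed by the letter n.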
Suffix —
Note that in the attributes section of an NCCSV file,
all numeric attribute values must have a suffix letter (e.g., 'b')
to identify the numeric data type (e.g., byte).
But in the data section of an NCCSV file,
numeric data values must never have these suffix letters
(with the exception of 'L' for long integers and 'uL' for ulong integers) —
the data type is specified by the *DATA_TYPE* attribute for the variable.
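As an illustration of how the suffix identifies the type of a numeric attribute value, here is a minimal sketch. The names are hypothetical, and two choices are assumptions rather than part of the specification: out-of-range values raise an error (the text allows either an error or a missing value), and numbers must start with a digit.

```python
import math
import re

# suffix -> (type name, minimum, maximum), taken from the ranges listed above
INTEGER_TYPES = {
    "b": ("byte", -128, 127),
    "ub": ("ubyte", 0, 255),
    "s": ("short", -32768, 32767),
    "us": ("ushort", 0, 65535),
    "i": ("int", -2147483648, 2147483647),
    "ui": ("uint", 0, 4294967295),
    "L": ("long", -9223372036854775808, 9223372036854775807),
    "uL": ("ulong", 0, 18446744073709551615),
}
FLOAT_MAX = 3.40282347e38  # approximate 32-bit float limit quoted above

_INT_RE = re.compile(r"(-?\d+)(ub|us|ui|uL|b|s|i|L)")
_FLOAT_RE = re.compile(r"(-?\d+(?:\.\d*)?(?:[eE][+-]?\d+)?|NaN)([fd])")


def parse_numeric_attribute(token):
    """Return (type_name, value), or None if token is not a suffixed number."""
    match = _INT_RE.fullmatch(token)
    if match:
        type_name, low, high = INTEGER_TYPES[match.group(2)]
        value = int(match.group(1))
        if not low <= value <= high:
            raise ValueError(f"{token!r} is out of range for {type_name}")
        return type_name, value
    match = _FLOAT_RE.fullmatch(token)
    if match:
        type_name = "float" if match.group(2) == "f" else "double"
        value = float(match.group(1))  # overflows to inf beyond the double range
        if math.isinf(value) or (type_name == "float" and abs(value) > FLOAT_MAX):
            raise ValueError(f"{token!r} is out of range for {type_name}")
        return type_name, value
    return None  # no numeric suffix: treat the value as a String
```

A token such as `1` (no suffix) returns None here, which matches the rule that unsuffixed values in the metadata section are Strings.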
*DATA_TYPE* -
The data type for each non-scalar
variable must be specified by a
*DATA_TYPE* attribute which can have a value of
byte, ubyte, short, ushort, int, uint, long, ulong, float, double, String, or char
(case insensitive). For example,
qc_flag,*DATA_TYPE*,byte
WARNING: Specifying the correct *DATA_TYPE* is your responsibility.
Specifying the wrong data type (e.g., int when you should have
specified float) will not generate an error message and may cause
information to be lost (e.g., float values will be rounded to ints)
when the NCCSV file is read by ERDDAP™ or converted into a NetCDF file.
char Discouraged -
The use of char data values is discouraged because they are not widely
supported in other file types. char values may be written in the
data section as single characters or as Strings (notably, if you need to
write a special character). If a String is found, the first character of the
String will be used as the char's value. Zero length Strings and missing values
will be converted to character \uFFFF. Note that NetCDF files only support
single byte chars, so any chars greater than char #255 will be converted to
'?' when writing NetCDF files. Unless a charset attribute is used
to specify a different charset for a char variable, the ISO-8859-1 charset
will be used.
long and Unsigned Types Are Discouraged -
Although many file types (e.g., NetCDF-4 and json) and ERDDAP™ support
long and unsigned (ubyte, ushort, uint, ulong) values,
the use of long and unsigned values in
NCCSV files is currently discouraged because they are currently
not supported by Excel, CF and NetCDF-3 files.
If you want to specify long or unsigned values in an NCCSV file
(or in the corresponding Excel spreadsheet),
you must use the suffix 'L' so that Excel doesn't treat the numbers as
floating point numbers with lower precision.
Currently, if an NCCSV file is converted into a NetCDF-3 .nc file,
long and ulong data values will be converted into double values,
causing a loss of precision for very large values
(less than -2^53 for long, or greater than 2^53 for long and ulong).
In NetCDF-3 .nc files, ubyte, ushort, and uint variables appear as
byte, short, and int with the _Unsigned=true metadata attribute.
In NetCDF-3 .nc files, ubyte, ushort, and uint attributes appear as
byte, short, and int attributes containing the corresponding two's-complement value
(e.g., 255ub appears as -1b). This is obviously trouble, so signed
data types should be used instead of unsigned data types whenever possible.
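Both problems described above are easy to reproduce. The sketch below uses two hypothetical helper names; Python's `float` is a 64-bit double, so it stands in for the double values written to NetCDF-3 files.

```python
def to_double_and_back(value):
    """Round-trip an integer through a 64-bit double."""
    return int(float(value))


def as_signed(value, bits):
    """Reinterpret an unsigned integer as a two's-complement signed value."""
    if value >= (1 << (bits - 1)):
        return value - (1 << bits)
    return value


# 2^53 survives the round trip, but 2^53 + 1 does not.
exact = to_double_and_back(2**53) == 2**53
lossy = to_double_and_back(2**53 + 1) != 2**53 + 1
```

For example, `as_signed(255, 8)` gives -1, which is why 255ub appears as -1b in a NetCDF-3 attribute.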
CF, ACDD, and ERDDAP™ Metadata -
Since it is envisioned that most NCCSV files, or the .nc files
created from them, will be read into ERDDAP,
it is strongly recommended that NCCSV files include the metadata attributes
which are required or recommended by ERDDAP™ (see
https://erddap.github.io/setupDatasetsXml.html#globalAttributes).
The attributes are almost all from the CF and ACDD metadata standards and
serve to properly describe the dataset (who, what, when, where, why, how)
to someone who otherwise knows
nothing about the dataset. Of particular importance,
almost all numeric variables should have a units
attribute with a UDUNITS-compatible value, e.g.,
sst,units,degree_C
It is fine to include additional attributes which are not from the
CF or ACDD standards or from ERDDAP.
The first line of the data section must have a case-sensitive,
comma-separated list of variable names.
All of the variables in this list must be described in the metadata section,
and vice versa (other than
*GLOBAL* attributes
and
*SCALAR* variables).
The second through the penultimate lines of the data section must
have a comma-separated list of values.
Each row of data must have the same number of values as the comma-separated
list of variable names.
Spaces before or after values are not allowed because they cause problems
when importing the file into spreadsheet programs.
Each column in this section must contain only values of the *DATA_TYPE*
specified for that variable by the *DATA_TYPE*
attribute for that variable.
Unlike in the attributes section, numeric values in the data section
must not have suffix letters to denote the data type.
Unlike in the attributes section, char values in the data section
may omit the enclosing single quotes if they are not needed for disambiguation
(thus, ',' and '\'' must be quoted as shown here).
There may be any number of these data rows in an NCCSV file,
but currently ERDDAP™ can only read NCCSV files with up to
about 2 billion rows.
In general, it is recommended that you split large datasets into
multiple NCCSV data files with fewer than 1 million rows each.
*END_DATA* -
The end of the data section must be denoted by a line with only
*END_DATA*
If there is additional content in the NCCSV file after the
*END_DATA* line, it will be ignored when the NCCSV file is
converted into an .nc file. Such content is therefore discouraged.
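The structure of the data section (a header row, rows of equal length, a terminating *END_DATA* line, ignored trailing content) can be sketched as follows. The function name `read_data_section` is hypothetical, and treating a wrong-length row as an error is an assumption of this sketch. Values are returned as raw strings.

```python
import csv
import io


def read_data_section(text):
    """Return (variable_names, rows) from the text of a data section."""
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader, None)
    if not header:
        raise ValueError("missing line with variable names")
    rows = []
    for row in reader:
        # Spreadsheet exports may pad the *END_DATA* line with empty cells.
        if row and row[0] == "*END_DATA*" and not any(row[1:]):
            return header, rows  # anything after this line is ignored
        if len(row) != len(header):
            raise ValueError(f"expected {len(header)} values, got {len(row)}")
        rows.append(row)
    raise ValueError("missing *END_DATA* line")
```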
In a spreadsheet following these conventions,
the variable names and data values will be in multiple columns.
See the example below.
Numeric missing values may be written as a numeric value identified by a
missing_value or _FillValue attribute for that variable.
For example, see the second value on this data row:
Bell M. Shimada,99,123.4
This is the recommended way to handle missing values for
byte, ubyte, short, ushort, int, uint, long, and ulong variables.
float or double NaN values may be written as NaN.
For example, see the second value on this data row:
Bell M. Shimada,NaN,123.4
String and numeric missing values may be indicated by an empty field.
For example, see the second value on this data row:
Bell M. Shimada,,123.4
For byte, ubyte, short, ushort, int, uint, long, and ulong variables,
the NCCSV converter utility and ERDDAP™ will convert an empty field
into the maximum allowed value for that data type (e.g., 127 for bytes).
If you do this, be sure to add a missing_value or
_FillValue attribute for that variable to identify this value,
e.g.,
variableName,_FillValue,127b
For float and double variables, an empty field will be converted to NaN.
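The missing-value conventions for the data section can be summarized in a small conversion function. This is a sketch with hypothetical names; it performs no range checking, and String and char cells are returned as-is because their decoding is a separate step.

```python
import math

# Maximum value per integer type; an empty field is converted to this value.
INTEGER_MAX = {
    "byte": 127, "ubyte": 255,
    "short": 32767, "ushort": 65535,
    "int": 2147483647, "uint": 4294967295,
    "long": 9223372036854775807, "ulong": 18446744073709551615,
}


def parse_data_cell(cell, data_type):
    """Convert one raw data-section cell according to its *DATA_TYPE*."""
    data_type = data_type.lower()  # *DATA_TYPE* values are case insensitive
    if data_type in INTEGER_MAX:
        if cell == "":
            return INTEGER_MAX[data_type]
        if data_type in ("long", "ulong"):
            cell = cell.rstrip("uL")  # only these types carry a suffix here
        return int(cell)
    if data_type in ("float", "double"):
        return math.nan if cell == "" else float(cell)  # float("NaN") is NaN
    return cell
```

For the row `Bell M. Shimada,,123.4`, the empty second field becomes 127 for a byte variable and NaN for a float variable.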
DateTime values (including date values that don't have a time component)
may be represented as numbers or as Strings in NCCSV files.
A given dateTime variable may only have String values or only numeric values,
not both. The NCCSV software will convert String dateTime values into numeric
dateTime values when creating .nc files (as required by CF).
String dateTime values have the advantage of being easily readable by humans.
DateTime values represented as numeric values must have a
units attribute which specifies the "units since dateTime"
as required by CF and specified by UDUNITS, e.g.,
time,units,seconds since 1970-01-01T00:00:00Z
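To illustrate how such a units value is interpreted, the following sketch converts a numeric dateTime into an ISO 8601 string. It is a simplification: the function name is hypothetical, only four unit names are handled (UDUNITS allows many more), and a base time without an offset is assumed to be Zulu.

```python
from datetime import datetime, timedelta, timezone

_UNIT_SECONDS = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}


def numeric_time_to_iso(value, units):
    """Convert e.g. (0, "seconds since 1970-01-01T00:00:00Z") to ISO 8601."""
    unit, since, base = units.split(" ", 2)
    if since != "since" or unit not in _UNIT_SECONDS:
        raise ValueError(f"unsupported units: {units!r}")
    # Older Python versions do not accept a trailing "Z" in fromisoformat().
    base_time = datetime.fromisoformat(base.replace("Z", "+00:00"))
    if base_time.tzinfo is None:
        base_time = base_time.replace(tzinfo=timezone.utc)
    moment = base_time + timedelta(seconds=value * _UNIT_SECONDS[unit])
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```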
DateTime values represented as String values must have a
String *DATA_TYPE* attribute and a units attribute
which specifies a dateTime pattern as specified by the
Java DateTimeFormatter class
(https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html). For example,
time,units,yyyy-MM-dd'T'HH:mm:ssZ
All dateTime values for a given data variable must use the same format.
In most cases, the dateTime pattern you need for the units
attribute will be a variation of one of these formats:
- yyyy-MM-dd'T'HH:mm:ss.SSSZ — which is the ISO 8601:2004(E) dateTime format.
You may need a shortened version of this,
e.g., yyyy-MM-dd'T'HH:mm:ssZ (the only recommended format) or yyyy-MM-dd.
If you are changing the format of your dateTime values,
NCCSV strongly recommends that you change to this format (perhaps shortened).
This is the format that ERDDAP™ will use when it writes NCCSV files.
- yyyyMMddHHmmss.SSS — which is the compact version of the ISO 8601:2004
dateTime format. You may need a shortened version of this, e.g., yyyyMMdd.
- M/d/yyyy H:mm:ss.SSS — which handles US-style dates and dateTimes like
"3/23/2017 16:22:03.000". You may need a shortened version of this,
e.g., M/d/yyyy .
- yyyyDDDHHmmssSSS — which is the year plus the zero-padded
day of the year (e.g., 001 = Jan 1, 365 = Dec 31 in a non-leap year;
this is sometimes erroneously called the Julian date).
You may need a shortened version of this, e.g., yyyyDDD .
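A non-Java reader has to translate these patterns into its own date library's notation. The sketch below is a hand-written mapping for a few of the patterns mentioned above, not a general pattern translator; the table and function name are hypothetical, and values without an offset are assumed to be in the Zulu time zone.

```python
from datetime import datetime, timezone

# Java-style pattern (as used in the units attribute) -> Python strptime format
PATTERN_TO_STRPTIME = {
    "yyyy-MM-dd'T'HH:mm:ssZ": "%Y-%m-%dT%H:%M:%S%z",
    "yyyy-MM-dd": "%Y-%m-%d",
    "yyyyMMdd": "%Y%m%d",
    "M/d/yyyy H:mm:ss.SSS": "%m/%d/%Y %H:%M:%S.%f",
    "yyyyDDD": "%Y%j",
}


def parse_string_time(text, units):
    """Parse a String dateTime value using the variable's units pattern."""
    parsed = datetime.strptime(text, PATTERN_TO_STRPTIME[units])
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # default time zone is Zulu
    return parsed
```

Python's `%z` directive accepts a literal `Z` (Python 3.7 and later), which is what makes the first mapping work for values such as 2017-03-23T00:45:00Z.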
Precision —
When a software library converts an .nc file into an NCCSV file,
all dateTime values will be written as Strings with the
ISO 8601:2004(E) dateTime format, e.g., 1970-01-01T00:00:00Z .
You can control the precision with the ERDDAP-specific attribute
time_precision. See
https://erddap.github.io/setupDatasetsXml.html#time_precision.
Time Zone —
The default time zone for dateTime values is the Zulu (or GMT)
time zone, which has no daylight saving time periods.
If a dateTime variable has dateTime values from a different time zone,
you must specify this with the ERDDAP-specific attribute time_zone.
This is a requirement for ERDDAP™ (see
https://erddap.github.io/setupDatasetsXml.html#time_zone).
As required by CF, all degree values (e.g., for longitude and latitude)
must be specified as decimal-degree double values,
not as a degree°min'sec" String or as separate variables for
degrees, minutes, seconds.
The direction designators N, S, E, and W are not allowed.
Use negative values for West longitudes and for South latitudes.
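If the source data uses degrees, minutes, seconds, and a direction designator, it has to be converted before it is written to an NCCSV file. A minimal sketch (the function name and argument layout are hypothetical):

```python
def to_decimal_degrees(degrees, minutes=0.0, seconds=0.0, direction="N"):
    """Convert degrees/minutes/seconds plus N, S, E, or W to decimal degrees."""
    magnitude = abs(degrees) + minutes / 60.0 + seconds / 3600.0
    # West longitudes and South latitudes are negative.
    return -magnitude if direction.upper() in ("S", "W") else magnitude
```

For example, 130 degrees 15 minutes 27.36 seconds West becomes approximately -130.2576.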
An NCCSV file may contain CF Discrete Sampling Geometry
(https://cfconventions.org/Data/cf-conventions/cf-conventions-1.8/cf-conventions.html#discrete-sampling-geometries)
data. It is the attributes that make this work:
- As required by CF, the NCCSV file must include a line in the
metadata section identifying the *GLOBAL*
featureType attribute, e.g.,
*GLOBAL*,featureType,trajectory
- For use in ERDDAP™, the NCCSV file must include a line or lines
in the metadata section identifying the cf_role=..._id variables, e.g.,
ship,cf_role,trajectory_id
This is optional for CF, but required in NCCSV.
- For use in ERDDAP™, the NCCSV file must include a line or lines
in the metadata section identifying which variables are associated
with each timeSeries, trajectory, or profile as required by ERDDAP™ (see
https://erddap.github.io/setupDatasetsXml.html#cdm_data_type), e.g.,
*GLOBAL*,cdm_trajectory_variables,"ship"
or
*GLOBAL*,cdm_timeseries_variables,"station_id,lat,lon"
Here is a sample file which demonstrates many of the features of an NCCSV file:
*GLOBAL*,Conventions,"COARDS, CF-1.6, ACDD-1.3, NCCSV-1.2"
*GLOBAL*,cdm_trajectory_variables,"ship"
*GLOBAL*,creator_email,erd.data@noaa.gov
*GLOBAL*,creator_name,Bob Simons
*GLOBAL*,creator_type,person
*GLOBAL*,creator_url,https://www.pfeg.noaa.gov
*GLOBAL*,featureType,trajectory
*GLOBAL*,infoUrl,https://erddap.github.io/NCCSV.html
*GLOBAL*,institution,"NOAA NMFS SWFSC ERD, NOAA PMEL"
*GLOBAL*,license,"""NCCSV Demonstration"" by Bob Simons and Steve Hankin is
licensed under CC BY 4.0, https://creativecommons.org/licenses/by/4.0/ ."
*GLOBAL*,keywords,"NOAA, sea, ship, sst, surface, temperature, trajectory"
*GLOBAL*,standard_name_vocabulary,CF Standard Name Table v55
*GLOBAL*,subsetVariables,"ship"
*GLOBAL*,summary,"This is a paragraph or two describing the dataset."
*GLOBAL*,title,"NCCSV Demonstration"
ship,*DATA_TYPE*,String
ship,cf_role,trajectory_id
time,*DATA_TYPE*,String
time,standard_name,time
time,units,"yyyy-MM-dd'T'HH:mm:ssZ"
lat,*DATA_TYPE*,double
lat,units,degrees_north
lon,*DATA_TYPE*,double
"lon","units","degrees_east"
status,*DATA_TYPE*,char
status,comment,"From http://some.url.gov/someProjectDocument , Table C"
testByte,*DATA_TYPE*,byte
testByte,units,1
testUByte,*DATA_TYPE*,ubyte
testUByte,units,1
testLong,*DATA_TYPE*,long
testLong,units,1
testULong,*DATA_TYPE*,ulong
testULong,units,1
sst,*DATA_TYPE*,float
sst,standard_name,sea_surface_temperature
sst,actual_range,0.17f,23.58f
sst,units,degree_C
sst,missing_value,99f
sst,testBytes,-128b,0b,127b
sst,testShorts,-32768s,0s,32767s
sst,testInts,-2147483648i,0i,2147483647i
sst,testLongs,-9223372036854775808L,0L,9223372036854775807L
sst,testFloats,-3.40282347e38f,0f,3.40282347E+38f
sst,testDoubles,-1.79769313486231570e308d,0d,1.79769313486231570E+308d
sst,testChars,"','","'""'","'€'"
sst,testStrings," a~,\n'z""\u20AC"
sst,testUBytes,0ub,127ub,255ub
sst,testUInts,0ui,2147483647ui,4294967295ui
sst,testULongs,0uL,9223372036854775807uL,18446744073709551615uL
sst,testUShorts,0us,32767us,65535us
*END_METADATA*
ship,time,lat,lon,status,testByte,testUByte,testLong,testULong,sst
Bell M. Shimada,2017-03-23T00:45:00Z,28.0002,-130.2576,A,-128,0,-9223372036854775808L,0uL,10.9
Bell M. Shimada,2017-03-23T01:45:00Z,28.0003,-130.3472,\u20AC,0,127,-9007199254740992L,9223372036854775807uL,10.0
"Bell M. Shimada","2017-03-23T02:45:00Z",28.0001,-130.4305,"'\t'",126,254,9223372036854775806L,18446744073709551614uL,99
"Bell M. Shimada",2017-03-23T12:45:00Z,27.9998,-131.5578,"'""'",127,255,9223372036854775807L,18446744073709551615uL,NaN
Notes:
- This sample file includes many difficult cases (e.g., char and
long variables and difficult String values).
Most NCCSV files will be much simpler.
- The license line is broken into two lines here, but is just
one line in the sample file.
- \u20ac is the \uhhhh encoding of €. \u00FC is the \uhhhh encoding of ü.
You can also use the unencoded characters directly.
- Many Strings in the example are enclosed by double quotes
even though they don't have to be,
e.g., many global attributes including the title, the lon units
attribute, and the 3rd line of data.
- It would be clearer and better if the units attribute
for the testLong variable were written in double quotes indicating
it is a String value. But the current representation (1,
without quotes) will be interpreted correctly as a String,
not an integer, because there is no 'i' suffix.
- Unlike other numeric data types, the long values
in the data section have the suffix ('L') that identifies
their numeric data type.
This is required to prevent spreadsheets from interpreting the
values as floating point numbers and thus losing precision.
In a spreadsheet, as in an NCCSV file:
- Write numeric attribute values as specified for NCCSV files
(e.g., with a suffix letter, e.g., 'f', to identify attribute's data type).
- In Strings, all non-printable and special characters must be written
as either a JSON-like backslashed character
(e.g., \n for newline) or as the hexadecimal Unicode character number
(case insensitive) with the syntax
\uhhhh.
Notably, use \n (2 characters: backslash and 'n')
to indicate a new line within a String, not Alt Enter.
All printable characters may be written unencoded or with the syntax
\uhhhh.
The only differences between NCCSV files and the analogous spreadsheet
files which follow these conventions are:
- NCCSV files have values on a line separated by commas.
Spreadsheets have values on a line in adjacent cells.
- Strings in NCCSV files are often surrounded by double quotes.
Strings in spreadsheets are never surrounded by double quotes.
- Internal double quotes ( " ) in Strings in NCCSV files appear as
2 double quotes.
Internal double quotes in spreadsheets appear as 1 double quote.
If a spreadsheet following these conventions is saved as a CSV file,
there will often be extra commas at the end of many of the lines.
The software that converts NCCSV files into .nc files will ignore the extra commas.
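If the extra commas need to be removed without a round trip through a .nc file, the cleanup can be sketched as below. The function name is hypothetical. The sketch strips trailing commas only from metadata lines and from the *END_DATA* line, because a trailing comma on a data row may be a legitimate empty (missing) final value. It relies on the rule that Strings never contain literal line breaks, and it writes \n line endings.

```python
def strip_extra_commas(text):
    """Remove spreadsheet-added trailing commas outside the data rows."""
    cleaned = []
    in_metadata = True
    for line in text.splitlines():
        stripped = line.rstrip(",")
        if in_metadata:
            line = stripped
            if stripped == "*END_METADATA*":
                in_metadata = False
        elif stripped == "*END_DATA*":
            line = stripped
        cleaned.append(line)
    return "\n".join(cleaned) + "\n"
```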
To import an NCCSV file into Excel:
- Choose File : Open .
- Change the file type to Text Files (*.prn; *.txt; *.csv) .
- Search the directories and click on the NCCSV .csv file.
- Click Open .
To create an NCCSV file from an Excel spreadsheet:
- Choose File : Save As .
- Change the Save as type: to be CSV (Comma delimited) (*.csv) .
- In response to the compatibility warning, click Yes .
- The resulting .csv file will have extra commas at the end
of all the rows other than the CSV data rows. You can ignore them.
In Excel, the sample NCCSV file above appears as:
[screenshot of the sample file in Excel]
To import an NCCSV file into Google Sheets:
- Choose File : Import .
- Choose to Upload a file and click on
Upload a file from your computer .
Select the file, then click Open .
Or, choose My Drive and change the file type drop down
selection to All file types .
Select the file, then click Open .
To create an NCCSV file from a Google Sheets spreadsheet:
- Choose File : Download : Comma Separated Values (.csv) .
- The resulting .csv file will have extra commas at the end of all the rows
other than the CSV data rows. Ignore them.
- If you create an NCCSV file with a text editor or if you create an
analogous spreadsheet in a spreadsheet program, the text editor
or the spreadsheet program will not check that you followed these
conventions correctly. It is up to you to follow these conventions correctly.
- The conversion of a spreadsheet following this convention
into a csv file (thus, an NCCSV file) will lead to extra commas
at the end of all the rows other than the CSV data rows.
Ignore them.
The software that converts NCCSV files into .nc files will ignore them.
- If an NCCSV file has excess commas at the end of rows,
you can remove them by converting the NCCSV file into a NetCDF file
and then converting the NetCDF file back into a NCCSV file.
- When you try to convert an NCCSV file into a NetCDF file, some errors
will be detected by the software and will generate error messages,
causing the conversion to fail.
Other problems are hard or impossible to catch and will not
generate error messages or warnings. Other problems
(e.g., excess commas at the end of rows) will be ignored.
The file converter will do only minimal checking of correctness
of the resulting NetCDF file, e.g., in regards to CF compliance.
It is the file creator's and file user's responsibility to check
that the results of the conversion are as desired and correct.
Two ways to check are:
The original version was
NCCSV v1.00 (in ERDDAP™ v1.76, released 2017-05-12)
- Changes introduced in NCCSV v1.10 (in ERDDAP™ v2.10, released 2020-11-05):
- Added support for ubyte, ushort, uint, ulong.
Thanks to CF for adding support for these data types in CF.
- Changes introduced in v1.20 (in ERDDAP™ v2.23, released 2023-02-27):
- Switched from ASCII character encoding to UTF-8 encoding for NCCSV .csv files.
- ERDDAP™ can still read files from all previous and current versions of NCCSV.
- ERDDAP™ now always writes NCCSV v1.20 files.
- If you wrote a client to read NCCSV files, change it so it treats all NCCSV files as UTF-8 files.
That will work with older NCCSV files because ASCII is a subset of the UTF-8 character encoding.
- Thanks to Pauline Chauvet, Nate, and Thomas Gardiner.
Questions, comments, suggestions? Please send an email to
erd dot data at noaa dot gov .
ERDDAP, Version 2.25_1