Comma-separated values
The comma-separated values (CSV) file format is a tabular data format in which fields are separated by the comma character and may be enclosed in double quote characters. A double quote character inside a quoted field is escaped by writing it as a pair of double quote characters.
The CSV file format does not require a specific character encoding, byte order or line terminator format.
Software often does not require fields to be quoted unless they contain a comma character.
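The quoting and escaping rules above can be sketched in a few lines of Python. This is only an illustration: `quote_field` and `format_record` are hypothetical helper names, and the choice to also quote fields that contain a double quote or a line break is an assumption that goes slightly beyond the comma rule stated above.

```python
def quote_field(value, always=False):
    """Quote one CSV field (hypothetical helper, not a library function)."""
    # Assumption: quote when the field holds a comma, a double quote,
    # or a line break, unless the caller asks for quoting everywhere.
    needs_quotes = always or any(ch in value for ch in ',"\r\n')
    if not needs_quotes:
        return value
    # Escape each embedded double quote by doubling it.
    return '"' + value.replace('"', '""') + '"'


def format_record(fields):
    """Quote each field as needed and join the results with commas."""
    return ",".join(quote_field(f) for f in fields)


print(format_record(["Adagio for Strings", 'say "hi"', "a, b"]))
# prints: Adagio for Strings,"say ""hi""","a, b"
```

In practice an existing CSV library is usually the safer choice; the sketch only makes the rule concrete.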
Formal specifications
While no formal specification for CSV exists, there are several informal documents describing the format (1 (http://www.catb.org/~esr/writings/taoup/html/ch05s02.html), 2 (http://www.ricebridge.com/products/csvman/reference.htm), 3 (http://www.edoceo.com/utilis/csv-file-format.php) and 4 (http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm)). The closest thing to a formal specification is this Internet Draft (http://www.ietf.org/internet-drafts/draft-shafranovich-mime-csv-05.txt).
MIME type
Several informal MIME types are used for CSV, including "application/csv" and "text/x-csv". The formal MIME type for CSV is "text/csv" as specified by an IETF draft (http://www.shaftek.org/publications/drafts/mime-csv/draft-shafranovich-mime-csv-02.html) and registered by IANA.
Example
"Chicane", "Love on the Run", "Knight Rider", "This field contains a comma, but it doesn't matter as the field is quoted" "Samuel Barber", "Adagio for Strings", "Classical", "This field contains a double quote character, "", but it doesn't matter as it is escaped"
Application support
The CSV file format is a very simple data file format that is supported by almost all spreadsheet software, such as Excel and Gnumeric. Note that some localized versions of Excel use semicolons instead of commas as the field separator. Any programming language that has input/output and string processing functionality can read and write CSV files.
CSV is about as ubiquitous for tabular data as ASCII files are for text data.
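Files written with a semicolon separator can usually still be read by telling the parser which delimiter to expect. The following is a minimal sketch using Python's standard csv module; `read_rows` is a hypothetical helper and the sample data is invented for illustration.

```python
import csv
import io


def read_rows(text, delimiter=","):
    """Parse CSV text with an explicit field delimiter (hypothetical helper)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# Invented sample resembling an export that uses semicolons as the
# separator and a comma as the decimal mark.
semicolon_data = "name;price\nWidget;3,50\n"

# Parsing with the default comma splits the price in the wrong place.
print(read_rows(semicolon_data))

# Parsing with the semicolon recovers the intended two fields per record.
print(read_rows(semicolon_data, delimiter=";"))
```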
Programming language tools
.Net
CsvReader (http://www.geocities.com/shriop/index.html) is a commercial delimited-file parsing utility that focuses on speed and ease of use and can be used from all .Net languages.
C/C++
Michael B Allen's CSV module (http://www.ioplex.com/~miallen/libmba/dl/src/csv.c) is small, complete, and robust.
Perl
With Text::CSV_XS and Text::CSV_PP
CSV files can be easily manipulated with the CPAN module Text::CSV_XS (http://search.cpan.org/author/JWIED/Text-CSV_XS/CSV_XS.pm) or with the equivalent pure-Perl module Text::CSV_PP (http://search.cpan.org/author/MAKAMAKA/Text-CSV_PP/lib/Text/CSV_PP.pm).
With DBI
CSV files can be accessed via SQL statements through DBI using a driver such as DBD::CSV (http://search.cpan.org/~jzucker/DBD-CSV-0.2002/lib/DBD/CSV.pm) or DBD::AnyData (http://search.cpan.org/~jzucker/DBD-AnyData-0.06/AnyData.pm).
With regular expressions
CSV files can be manipulated using Perl's built-in text processing capabilities, but this is easy to do incorrectly. For instance, the following code converts comma-delimited data into colon-delimited data.
perl -ne 'print join q(:),(split /,/,$_)' < input.csv > output.csv
That code does not handle quote marks, which are an integral part of the CSV format, so a quoted field that contains a comma is split in the wrong place.
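The pitfall can be made concrete with a short comparison. The sketch below is written in Python rather than Perl, uses an invented input line, and contrasts a plain split on commas with the quote-aware parser from Python's standard csv module.

```python
import csv
import io

# Invented input: the first field is quoted because it contains a comma.
line = '"Barber, Samuel",Adagio for Strings,Classical\n'

# Naive approach: split on every comma, ignoring quote marks.
naive = line.rstrip("\n").split(",")

# Quote-aware approach using the standard csv module.
parsed = next(csv.reader(io.StringIO(line)))

print(len(naive), naive)    # the quoted field is broken into two pieces
print(len(parsed), parsed)  # the quoted field stays in one piece

# Writing colon-delimited output from the correctly parsed fields.
out = io.StringIO()
csv.writer(out, delimiter=":", lineterminator="\n").writerow(parsed)
print(out.getvalue(), end="")
```

The naive split yields four pieces while the parser yields the intended three fields.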
Java
Direct interface
CSVReader/Writer (http://mindprod.com/products.html#CSV) provides a simple Java interface to CSV file I/O and is free.
The Java CSV Library (http://sourceforge.net/projects/javacsv/) is an open-source (LGPL) library currently in beta.
Stephen Ostermiller has released a library[1] (http://ostermiller.org/utils/ExcelCSV.html) under the GPL to read and write CSV for Excel.
This CSV class (http://www.ioplex.com/~miallen/CSV.txt) is small, complete, and has been widely used in production environments.
Ricebridge Java CSV Component (http://www.ricebridge.com/products/csvman.htm) is a commercial CSV interface for high-speed, high-volume data handling.
JDBC interface
CsvJdbc (http://sourceforge.net/projects/csvjdbc/) is a read-only JDBC driver released under the LGPL.
StelsCSV (http://www.csv-jdbc.com/) is a commercial JDBC driver for CSV file databases. It supports much of SQL'92.
FOSITEX by i-net software also includes a CSV JDBC driver [2] (http://www.inetsoftware.de/products/jdbc/fositex/).
On Microsoft Windows one can access a CSV file through SQL using ODBC. See Using CSV Files as Databases and Interacting with Them Using Java (http://www.devarticles.com/c/a/Java/Using-CSV-Files-as-Databases-and-Interacting-with-Them-Using-Java/).
Python
Python has had a csv module (http://www.python.org/doc/current/lib/module-csv.html) in the standard library since version 2.3.
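A minimal round trip with the csv module might look like the sketch below. The file name, the sample rows, and the choice of UTF-8 are assumptions made for illustration; CSV itself does not prescribe an encoding or line terminator.

```python
import csv
import os
import tempfile

# Invented sample rows; the last field needs quoting and escaping.
rows = [
    ["artist", "title", "note"],
    ["Samuel Barber", "Adagio for Strings", 'contains a comma, and a "quote"'],
]

# Throwaway location for the example file.
path = os.path.join(tempfile.mkdtemp(), "music.csv")

# newline="" lets the csv module control line terminators itself.
with open(path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)

with open(path, newline="", encoding="utf-8") as f:
    read_back = list(csv.reader(f))

# True when the rows survive the round trip unchanged.
print(read_back == rows)
```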
Utilities
The csvprint (http://www.ioplex.com/~miallen/libmba/dl/examples/csvprint.c) utility will reformat CSV input based on a format string. This can be useful for reordering fields or generating source code or tables as illustrated in the following example:
$ csvprint data.csv "\t{ %0, %1, %2, \"%3\" },\n"
{ 0xC0000008, 0x00060001, NT_STATUS_INVALID_HANDLE, "The handle is invalid." },
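The idea of reformatting records through a format string can be sketched in Python as follows. This imitates the behaviour shown above and is not the csvprint implementation: `format_rows` is a hypothetical helper, the `%N` placeholder handling is an assumption based on the example, and the input record is reconstructed from the output shown.

```python
import csv
import io
import re


def format_rows(csv_text, template):
    """Yield one formatted string per record (hypothetical helper).

    Each %N in the template is replaced by field N, counting from zero.
    """
    for row in csv.reader(io.StringIO(csv_text)):
        yield re.sub(r"%(\d+)", lambda m: row[int(m.group(1))], template)


# Assumed input record, reconstructed from the output shown above.
data = "0xC0000008,0x00060001,NT_STATUS_INVALID_HANDLE,The handle is invalid.\n"

for line in format_rows(data, '\t{ %0, %1, %2, "%3" },\n'):
    print(line, end="")
```

A placeholder that refers to a field the record does not have raises an IndexError in this sketch; a real tool would need to decide how to handle that case.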