org.virbo.dsutil
Class AsciiParser

java.lang.Object
  extended by org.virbo.dsutil.AsciiParser

public class AsciiParser
extends java.lang.Object

Class for reading ASCII tables into a QDataSet. This parses a file by breaking it up into records and passing each record off to a delegate record parser. The record parser then breaks up the record into fields, and each field is parsed by a delegate field parser. Each column of the table has a Unit and a field name associated with it.

Examples of record parsers include DelimParser, which splits the record by a delimiter such as a tab or comma, and FixedColumnsParser, which splits the record by character positions. Examples of field parsers include DOUBLE_PARSER, which parses the value as a double, and UNITS_PARSER, which uses the Unit attached to the column to interpret the value.

The skipLines property tells the parser to skip a given number of header lines before attempting to parse records. Also, commentPrefix identifies lines to be ignored. In either the header or in comments, the parser looks for propertyPattern, and if a property is matched, then the builder property is set. Two Patterns, NAME_COLON_VALUE_PATTERN and NAME_EQUAL_VALUE_PATTERN, are provided for convenience.

Adapted to v3.0 QDataSet model, Jeremy, May 2007.
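The delegation described above (records handed to a record parser, fields handed to a field parser) can be illustrated with a minimal, self-contained sketch. The types RecordSplitter and FieldConverter and the method parse are hypothetical names invented for this illustration; they are not part of the AsciiParser API, and the real class may be organized differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; these types are hypothetical, not the AsciiParser API. */
final class DelegationSketch {

    /** Splits one record (line) into raw field strings. */
    interface RecordSplitter {
        String[] split(String record);
    }

    /** Converts one raw field string into a double. */
    interface FieldConverter {
        double convert(String field);
    }

    /** Skips header lines, ignores comment lines, then delegates each remaining record. */
    static List<double[]> parse(List<String> lines, int skipLines, String commentPrefix,
                                RecordSplitter splitter, FieldConverter converter) {
        List<double[]> records = new ArrayList<>();
        for (int i = skipLines; i < lines.size(); i++) {
            String line = lines.get(i);
            if (commentPrefix != null && line.startsWith(commentPrefix)) {
                continue; // comment lines are not data
            }
            String[] fields = splitter.split(line);
            double[] values = new double[fields.length];
            for (int j = 0; j < fields.length; j++) {
                values[j] = converter.convert(fields[j].trim());
            }
            records.add(values);
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Time, Density", "# a comment", "1.0, 2.5", "2.0, 3.5");
        // Split on commas and convert each field with Double.parseDouble.
        List<double[]> recs = parse(lines, 1, "#", r -> r.split(","), Double::parseDouble);
        System.out.println(recs.size() + " records parsed");
    }
}
```

Keeping the splitter and the converter as separate delegates means either one can be swapped (for example, fixed columns instead of a delimiter) without touching the loop that walks the file.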


Nested Class Summary
 class AsciiParser.DelimParser
          DelimParser splits the line on a regex (like "," or "\\s+") to create the fields.
static interface AsciiParser.FieldParser
           
 class AsciiParser.FixedColumnsParser
           
static interface AsciiParser.RecordParser
           
 class AsciiParser.RegexParser
           
 
Field Summary
static java.util.regex.Pattern COLUMN_HEADER_PATTERN
           
static java.lang.String DELIM_COMMA
           
static java.lang.String DELIM_TAB
           
static java.lang.String DELIM_WHITESPACE
           
static AsciiParser.FieldParser DOUBLE_PARSER
          parses the field using Double.parseDouble, Java's double parser.
static java.util.regex.Pattern NAME_COLON_VALUE_PATTERN
           
static java.util.regex.Pattern NAME_EQUAL_VALUE_PATTERN
           
static java.lang.String PROP_VALIDMAX
           
static java.lang.String PROP_VALIDMIN
           
static java.lang.String PROPERTY_FIELD_NAMES
           
static java.lang.String PROPERTY_FIELD_PARSER
           
static java.lang.String PROPERTY_FILE_HEADER
           
static java.lang.String PROPERTY_FIRST_RECORD
           
 AsciiParser.FieldParser UNITS_PARSER
          delegates to the unit object set for this field to parse the data.
protected  double validMax
           
protected  double validMin
           
 
Constructor Summary
AsciiParser()
          Creates a new instance of AsciiParser
 
Method Summary
 void addPropertyChangeListener(java.beans.PropertyChangeListener l)
          Adds a PropertyChangeListener to the listener list.
 int getFieldIndex(java.lang.String string)
          returns the index of the field
 java.lang.String[] getFieldNames()
          return the name of each field.
 double getFillValue()
          Getter for property fillValue.
 AsciiParser.RecordParser getRecordParser()
          Getter for property recordParser.
 org.das2.datum.Units getUnits(int index)
          Indexed getter for property units.
 double getValidMax()
           
 double getValidMin()
           
 AsciiParser.DelimParser guessDelimParser(java.lang.String line)
          read in the first record, then guess the delimiter and possibly the column headers.
static int guessFieldCount(java.lang.String filename)
          return the field count that would result in the largest number of records parsed.
 boolean isKeepFileHeader()
          Getter for property keepHeader.
static void main(java.lang.String[] args)
           
static AsciiParser newParser(int fieldCount)
          creates a parser with fieldCount fields, named "field0", ..., "fieldN".
static AsciiParser newParser(java.lang.String[] fieldNames)
          creates a parser with the named fields.
 WritableDataSet readFile(java.lang.String filename, org.das2.util.monitor.ProgressMonitor mon)
          Parse the file using the current settings.
 java.lang.String readFirstParseableRecord(java.lang.String filename)
          returns the first record that the record parser parses successfully.
 java.lang.String readFirstRecord(java.lang.String filename)
          return the first record that the parser would parse.
 WritableDataSet readStream(java.io.Reader in, org.das2.util.monitor.ProgressMonitor mon)
          Parse the stream using the current settings.
 void removePropertyChangeListener(java.beans.PropertyChangeListener l)
          Removes a PropertyChangeListener from the listener list.
 void setCommentPrefix(java.lang.String comment)
          Records starting with this are not processed as data, for example "#".
 AsciiParser.DelimParser setDelimParser(java.io.Reader in, java.lang.String delimRegex)
          The DelimParser splits each record into fields using a delimiter like "," or "\\s+".
 AsciiParser.RecordParser setDelimParser(java.lang.String filename, java.lang.String delimRegex)
          configure the parser to split on a delimRegex.
 void setFieldParser(int field, AsciiParser.FieldParser fp)
           
 void setFillValue(double fillValue)
          numbers that parse to this value are considered to be fill.
 AsciiParser.FixedColumnsParser setFixedColumnsParser(int[] columnOffsets, int[] columnWidths, AsciiParser.FieldParser[] parsers)
           
 AsciiParser.FixedColumnsParser setFixedColumnsParser(java.io.Reader in, java.lang.String delim)
          looks at the first line after skipping, and splits it to calculate where the columns are.
 AsciiParser.FixedColumnsParser setFixedColumnsParser(java.lang.String filename, java.lang.String delim)
          looks at the first line after skipping, and splits it to calculate where the columns are.
 void setKeepFileHeader(boolean keepHeader)
          Setter for property keepHeader.
 void setPropertyPattern(java.util.regex.Pattern propertyPattern)
          specify the Pattern used to recognize properties.
 void setRecordCountLimit(int recordCountLimit)
          limit the number of records read.
 void setRecordParser(AsciiParser.RecordParser recordParser)
          Setter for property recordParser.
 AsciiParser.RecordParser setRegexParser(java.lang.String[] fieldNames)
          The regex parser is a slow parser, but gives precise control.
 void setSkipLines(int skipLines)
          skip a number of lines before trying to parse anything.
 void setUnits(int index, org.das2.datum.Units units)
          Indexed setter for property units.
 void setValidMax(double validMax)
           
 void setValidMin(double validMin)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

NAME_COLON_VALUE_PATTERN

public static final java.util.regex.Pattern NAME_COLON_VALUE_PATTERN

NAME_EQUAL_VALUE_PATTERN

public static final java.util.regex.Pattern NAME_EQUAL_VALUE_PATTERN

COLUMN_HEADER_PATTERN

public static final java.util.regex.Pattern COLUMN_HEADER_PATTERN

PROPERTY_FIELD_NAMES

public static final java.lang.String PROPERTY_FIELD_NAMES
See Also:
Constant Field Values

PROPERTY_FILE_HEADER

public static final java.lang.String PROPERTY_FILE_HEADER
See Also:
Constant Field Values

PROPERTY_FIRST_RECORD

public static final java.lang.String PROPERTY_FIRST_RECORD
See Also:
Constant Field Values

PROPERTY_FIELD_PARSER

public static final java.lang.String PROPERTY_FIELD_PARSER
See Also:
Constant Field Values

DELIM_COMMA

public static final java.lang.String DELIM_COMMA
See Also:
Constant Field Values

DELIM_TAB

public static final java.lang.String DELIM_TAB
See Also:
Constant Field Values

DELIM_WHITESPACE

public static final java.lang.String DELIM_WHITESPACE
See Also:
Constant Field Values

DOUBLE_PARSER

public static final AsciiParser.FieldParser DOUBLE_PARSER
parses the field using Double.parseDouble, Java's double parser.


UNITS_PARSER

public final AsciiParser.FieldParser UNITS_PARSER
delegates to the unit object set for this field to parse the data.


validMin

protected double validMin

PROP_VALIDMIN

public static final java.lang.String PROP_VALIDMIN
See Also:
Constant Field Values

validMax

protected double validMax

PROP_VALIDMAX

public static final java.lang.String PROP_VALIDMAX
See Also:
Constant Field Values
Constructor Detail

AsciiParser

public AsciiParser()
Creates a new instance of AsciiParser

Method Detail

setDelimParser

public AsciiParser.RecordParser setDelimParser(java.lang.String filename,
                                               java.lang.String delimRegex)
                                        throws java.io.IOException
configure the parser to split on a delimRegex. For example:
  " +"    one or more spaces
  "\t"    tab
  "\\s+"  whitespace
  ","     comma
See DELIM_COMMA, DELIM_WHITESPACE, etc.

Throws:
java.io.IOException
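
As an illustration of how such delimiter regexes behave, the following sketch splits sample records with java.lang.String.split. The class DelimRegexSketch and the method splitRecord are hypothetical helpers for this example; whether DelimParser uses String.split internally is not stated in this documentation.

```java
import java.util.Arrays;

/** Illustrative sketch only; not the DelimParser implementation. */
final class DelimRegexSketch {

    /** Trims the record, then splits it on the given regular expression. */
    static String[] splitRecord(String record, String delimRegex) {
        return record.trim().split(delimRegex);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitRecord("1.0,2.0,3.0", ",")));      // comma
        System.out.println(Arrays.toString(splitRecord("1.0   2.0 3.0", " +")));   // one or more spaces
        System.out.println(Arrays.toString(splitRecord("1.0\t2.0", "\t")));        // tab
        System.out.println(Arrays.toString(splitRecord(" 1.0 \t 2.0 ", "\\s+")));  // any whitespace
    }
}
```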

readFirstRecord

public java.lang.String readFirstRecord(java.lang.String filename)
                                 throws java.io.IOException
return the first record that the parser would parse. If skipLines is more than the total number of lines, or all lines are comments, then null is returned.

Parameters:
filename - name of the file to read.
Returns:
the first line after skip lines and comment lines.
Throws:
java.io.IOException
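
The rule described here (skip skipLines lines, then skip comment lines, and return null if nothing is left) can be sketched as follows. FirstRecordSketch, firstRecord and firstRecordOf are hypothetical names for this illustration, and the sketch reads from text in memory rather than from a file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Illustrative sketch only; not the AsciiParser implementation. */
final class FirstRecordSketch {

    /** Returns the first line after the skipped lines and any comment lines, or null. */
    static String firstRecord(BufferedReader reader, int skipLines, String commentPrefix)
            throws IOException {
        for (int i = 0; i < skipLines; i++) {
            if (reader.readLine() == null) {
                return null; // skipLines exceeds the number of lines available
            }
        }
        String line = reader.readLine();
        // A null commentPrefix disables the comment check.
        while (line != null && commentPrefix != null && line.startsWith(commentPrefix)) {
            line = reader.readLine();
        }
        return line; // null when every remaining line was a comment
    }

    /** Convenience wrapper for in-memory text. */
    static String firstRecordOf(String text, int skipLines, String commentPrefix) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return firstRecord(reader, skipLines, commentPrefix);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstRecordOf("header\n# comment\n1 2 3\n", 1, "#"));
    }
}
```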

readFirstParseableRecord

public java.lang.String readFirstParseableRecord(java.lang.String filename)
                                          throws java.io.IOException
returns the first record that the record parser parses successfully. The recordParser should be set and configured enough to identify the fields. If no records can be parsed, then null is returned.

Parameters:
filename - name of the file to read.
Returns:
the first parseable line, or null if no such line exists.
Throws:
java.io.IOException

guessDelimParser

public AsciiParser.DelimParser guessDelimParser(java.lang.String line)
                                         throws java.io.IOException
read in the first record, then guess the delimiter and possibly the column headers.

Parameters:
line - the first record of the file, from which the delimiter is guessed.
Returns:
RecordParser object that can be queried. (Strange interface.)
Throws:
java.io.IOException
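
The heuristic used to guess the delimiter is not documented here. Purely as an assumption, one plausible approach is to try a few candidate regexes and keep the one that yields the most fields; the sketch below does that. DelimGuessSketch, CANDIDATES and guessDelimRegex are hypothetical names, and the actual guessDelimParser may work quite differently.

```java
/** Illustrative sketch of one possible heuristic; not the AsciiParser implementation. */
final class DelimGuessSketch {

    /** Candidate delimiter regexes, tried in order: comma, tab, any whitespace. */
    static final String[] CANDIDATES = {",", "\t", "\\s+"};

    /** Returns the candidate producing the most fields; earlier candidates win ties. */
    static String guessDelimRegex(String line) {
        String best = CANDIDATES[CANDIDATES.length - 1]; // fall back to whitespace
        int bestCount = 1;
        for (String candidate : CANDIDATES) {
            int count = line.trim().split(candidate).length;
            if (count > bestCount) {
                best = candidate;
                bestCount = count;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(guessDelimRegex("1.0,2.0,3.0"));
        System.out.println(guessDelimRegex("1.0 2.0 3.0"));
    }
}
```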

setDelimParser

public AsciiParser.DelimParser setDelimParser(java.io.Reader in,
                                              java.lang.String delimRegex)
                                       throws java.io.IOException
The DelimParser splits each record into fields using a delimiter like "," or "\\s+".

Parameters:
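in - the Reader to get lines from.
delimRegex - regex used to split each record into fields.
Returns:
the DelimParser that will split each record.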
Throws:
java.io.IOException

setRegexParser

public AsciiParser.RecordParser setRegexParser(java.lang.String[] fieldNames)
The regex parser is a slow parser, but gives precise control.


setFixedColumnsParser

public AsciiParser.FixedColumnsParser setFixedColumnsParser(java.lang.String filename,
                                                            java.lang.String delim)
                                                     throws java.io.IOException
looks at the first line after skipping, and splits it to calculate where the columns are. The FixedColumnsParser is the fastest of the three parsers.

Parameters:
filename - filename to read in.
delim - regex to split the initial line into the fixed columns.
Returns:
the record parser that will split each line.
Throws:
java.io.IOException
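
To illustrate the idea of deriving column positions from one line and then cutting later lines at those positions, here is a self-contained sketch. FixedColumnsSketch, columnsOf and split are hypothetical names; the real FixedColumnsParser may compute offsets and widths differently (for example, in how it treats padding).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch only; not the FixedColumnsParser implementation. */
final class FixedColumnsSketch {

    /** Returns one {offset, width} pair per field found in firstLine. */
    static int[][] columnsOf(String firstLine, String delimRegex) {
        List<int[]> cols = new ArrayList<>();
        Matcher m = Pattern.compile(delimRegex).matcher(firstLine);
        int start = 0;
        while (m.find()) {
            if (m.start() > start) {
                cols.add(new int[] {start, m.start() - start});
            }
            start = m.end(); // next field begins after this delimiter
        }
        if (start < firstLine.length()) {
            cols.add(new int[] {start, firstLine.length() - start});
        }
        return cols.toArray(new int[0][]);
    }

    /** Cuts a line at the given positions; no regex work is needed per record. */
    static String[] split(String line, int[][] cols) {
        String[] fields = new String[cols.length];
        for (int i = 0; i < cols.length; i++) {
            int begin = Math.min(cols[i][0], line.length());
            int end = Math.min(cols[i][0] + cols[i][1], line.length());
            fields[i] = line.substring(begin, end);
        }
        return fields;
    }

    public static void main(String[] args) {
        int[][] cols = columnsOf("2007-05-01  1.25  300", "\\s+");
        System.out.println(Arrays.toString(split("2007-05-02  9.75  -45", cols)));
    }
}
```

Because each record is cut with substring at known positions, the per-record cost avoids regular-expression matching, which is consistent with the statement that this is the fastest of the three parsers.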

setFixedColumnsParser

public AsciiParser.FixedColumnsParser setFixedColumnsParser(java.io.Reader in,
                                                            java.lang.String delim)
                                                     throws java.io.IOException
looks at the first line after skipping, and splits it to calculate where the columns are.

Parameters:
in - the Reader to get lines from.
delim - regex to split the initial line into the fixed columns.
Returns:
the record parser that will split each line.
Throws:
java.io.IOException

guessFieldCount

public static int guessFieldCount(java.lang.String filename)
                           throws java.io.FileNotFoundException,
                                  java.io.IOException
return the field count that would result in the largest number of records parsed. The entire file is scanned, and for each line the number of decimal fields is counted. At the end of the scan, the fieldCount with the highest record count is returned.

Throws:
java.io.FileNotFoundException
java.io.IOException
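
The scan described above amounts to building a histogram of field counts and returning the most common one. The sketch below works on lines already in memory; FieldCountSketch and its methods are hypothetical names, and the tokenizing regex, the handling of lines with no decimal fields, and the tie-breaking rule are assumptions of this example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; not the AsciiParser implementation. */
final class FieldCountSketch {

    /** Counts tokens that Double.parseDouble accepts (assumes whitespace or comma delimiters). */
    static int countDecimalFields(String line) {
        int n = 0;
        for (String token : line.trim().split("[\\s,]+")) {
            try {
                Double.parseDouble(token);
                n++;
            } catch (NumberFormatException e) {
                // not a decimal field; ignore it
            }
        }
        return n;
    }

    /** Returns the field count shared by the most lines, or 0 if no line has decimal fields. */
    static int guessFieldCount(List<String> lines) {
        Map<Integer, Integer> recordsPerCount = new HashMap<>();
        for (String line : lines) {
            int n = countDecimalFields(line);
            if (n > 0) {
                recordsPerCount.merge(n, 1, Integer::sum);
            }
        }
        int bestCount = 0;
        int bestRecords = 0;
        for (Map.Entry<Integer, Integer> e : recordsPerCount.entrySet()) {
            // Assumption: ties are broken in favour of the larger field count.
            if (e.getValue() > bestRecords
                    || (e.getValue() == bestRecords && e.getKey() > bestCount)) {
                bestCount = e.getKey();
                bestRecords = e.getValue();
            }
        }
        return bestCount;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Time Density Speed", "1 2 3", "4 5 6", "7 8");
        System.out.println(guessFieldCount(lines));
    }
}
```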

setFieldParser

public void setFieldParser(int field,
                           AsciiParser.FieldParser fp)

newParser

public static AsciiParser newParser(int fieldCount)
creates a parser with fieldCount fields, named "field0", ..., "fieldN".


newParser

public static AsciiParser newParser(java.lang.String[] fieldNames)
creates a parser with the named fields.


setSkipLines

public void setSkipLines(int skipLines)
skip a number of lines before trying to parse anything. This can be set to point at the first valid line, and the RecordParser will be configured using that line.


setRecordCountLimit

public void setRecordCountLimit(int recordCountLimit)
limit the number of records read. Parsing will stop once this number of records is read. This is Integer.MAX_VALUE by default.


setPropertyPattern

public void setPropertyPattern(java.util.regex.Pattern propertyPattern)
specify the Pattern used to recognize properties. Note that property values are not parsed; they are provided as Strings.
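
As a sketch of how a name/value pattern can pull properties out of header and comment lines, consider the example below. The regex NAME_EQUALS_VALUE is a hypothetical pattern written for this illustration; the actual definitions of NAME_EQUAL_VALUE_PATTERN and NAME_COLON_VALUE_PATTERN are not shown in this documentation and may differ. Values are kept as Strings, as the note above says.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch only; the pattern here is an assumption, not the library's. */
final class PropertyPatternSketch {

    /** Hypothetical pattern: group 1 is the name, group 2 is the value. */
    static final Pattern NAME_EQUALS_VALUE = Pattern.compile("\\s*(\\w+)\\s*=\\s*(.*?)\\s*");

    /** Collects name/value pairs from lines that match the pattern in full. */
    static Map<String, String> properties(List<String> lines, String commentPrefix, Pattern p) {
        Map<String, String> props = new LinkedHashMap<>();
        for (String line : lines) {
            // Strip the comment prefix, if present, before matching.
            String text = (commentPrefix != null && line.startsWith(commentPrefix))
                    ? line.substring(commentPrefix.length())
                    : line;
            Matcher m = p.matcher(text);
            if (m.matches()) {
                props.put(m.group(1), m.group(2)); // value stays a String
            }
        }
        return props;
    }

    public static void main(String[] args) {
        List<String> header = Arrays.asList("# UNITS = nT", "# just a comment", "FILL=-1e31");
        System.out.println(properties(header, "#", NAME_EQUALS_VALUE));
    }
}
```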


setCommentPrefix

public void setCommentPrefix(java.lang.String comment)
Records starting with this are not processed as data, for example "#". This is initially "#". Setting this to null disables this check.

Parameters:
comment - the prefix that marks a record as a comment, or null to disable the check.

readStream

public WritableDataSet readStream(java.io.Reader in,
                                  org.das2.util.monitor.ProgressMonitor mon)
                           throws java.io.IOException
Parse the stream using the current settings.

Throws:
java.io.IOException

setFixedColumnsParser

public AsciiParser.FixedColumnsParser setFixedColumnsParser(int[] columnOffsets,
                                                            int[] columnWidths,
                                                            AsciiParser.FieldParser[] parsers)

getFieldNames

public java.lang.String[] getFieldNames()
return the name of each field. field0, field1, ... are the default names when names are not discovered in the table.

Returns:
the name of each field.

readFile

public WritableDataSet readFile(java.lang.String filename,
                                org.das2.util.monitor.ProgressMonitor mon)
                         throws java.io.IOException
Parse the file using the current settings.

Returns:
a rank 2 dataset.
Throws:
java.io.IOException

main

public static void main(java.lang.String[] args)
                 throws java.lang.Exception
Throws:
java.lang.Exception

addPropertyChangeListener

public void addPropertyChangeListener(java.beans.PropertyChangeListener l)
Adds a PropertyChangeListener to the listener list.

Parameters:
l - The listener to add.

removePropertyChangeListener

public void removePropertyChangeListener(java.beans.PropertyChangeListener l)
Removes a PropertyChangeListener from the listener list.

Parameters:
l - The listener to remove.

isKeepFileHeader

public boolean isKeepFileHeader()
Getter for property keepHeader.

Returns:
Value of property keepHeader.

setKeepFileHeader

public void setKeepFileHeader(boolean keepHeader)
Setter for property keepHeader. This is false by default; if true, the file header skipped by skipLines is put into the property PROPERTY_FILE_HEADER.

Parameters:
keepHeader - New value of property keepHeader.

getRecordParser

public AsciiParser.RecordParser getRecordParser()
Getter for property recordParser.

Returns:
Value of property recordParser.

setRecordParser

public void setRecordParser(AsciiParser.RecordParser recordParser)
Setter for property recordParser.

Parameters:
recordParser - New value of property recordParser.

getUnits

public org.das2.datum.Units getUnits(int index)
Indexed getter for property units.

Parameters:
index - Index of the property.
Returns:
Value of the property at index.

setUnits

public void setUnits(int index,
                     org.das2.datum.Units units)
Indexed setter for property units.

Parameters:
index - Index of the property.
units - New value of the property at index.

getFieldIndex

public int getFieldIndex(java.lang.String string)
returns the index of the field


getFillValue

public double getFillValue()
Getter for property fillValue.

Returns:
Value of property fillValue.

setFillValue

public void setFillValue(double fillValue)
numbers that parse to this value are considered to be fill.

Parameters:
fillValue - New value of property fillValue.
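
Because the comparison is made on the parsed number rather than on the text, differently formatted fields can denote the same fill value. The following sketch illustrates that point; FillValueSketch and isFill are hypothetical names, and how AsciiParser marks fill in the resulting dataset is not covered here.

```java
/** Illustrative sketch only; not the AsciiParser implementation. */
final class FillValueSketch {

    /** True when the field parses to exactly the fill value. */
    static boolean isFill(String field, double fillValue) {
        return Double.parseDouble(field.trim()) == fillValue;
    }

    public static void main(String[] args) {
        double fill = -1e31;
        // Both spellings parse to the same double, so both count as fill.
        System.out.println(isFill("-1e31", fill));
        System.out.println(isFill("-1.0E31", fill));
        System.out.println(isFill("5.0", fill));
    }
}
```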

getValidMin

public double getValidMin()

setValidMin

public void setValidMin(double validMin)

getValidMax

public double getValidMax()

setValidMax

public void setValidMax(double validMax)