A Perl-based
Survex/CaveSurvey XML Converter
for Cave Survey Data
Michael Lake
Mike.Lake@uts.edu.au
http://www.science.uts.edu.au/michael-lake/
This document details the program svx2xml, which converts cave survey data in Survex format to CaveScript XML format, and the program xml2svx which converts the latter back to Survex format.
These programs and the XML are released under the GNU General Public License Version 2, June 1991.
<gnu copyright>= (U-> U->)
# Programs svx2xml/xml2svx: convert cave surveying data between
# Survex and CaveScript XML format.
#
# Copyright (C) 2000 Michael R. Lake
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
This documentation was written using noweb, a literate programming tool (Ref. [cite noweb]). There is a short document on using noweb in the CaveScript root directory. Without noweb and LaTeX you won't be able to easily extract the programs or example files from this document.
From this documentation one can extract the following files:
Makefile | A front end to all the commands: noweb, notangle and noweave. |
svx2xml | The program that converts Survex to XML format |
xml2svx | The program that converts XML to Survex format |
example.svx | An example Survex file |
example.xml | An example XML file |
svx-xml.tex | Documentation in LaTeX format |
svx-xml.html | Documentation in HTML format |
The commands to extract these files are shown in Table [->]. The Makefile provides the easiest method to run the commands and is described later.
To extract: | Run: |
All the files: | noweb svx-xml.nw |
To extract the: | Run: |
Makefile: | notangle -t4 -RMakefile svx-xml.nw > Makefile |
svx2xml program: | notangle -t4 -Rsvx2xml svx-xml.nw > svx2xml |
xml2svx program: | notangle -t4 -Rxml2svx svx-xml.nw > xml2svx |
Example Survex file: | notangle -t4 -Rexample.svx svx-xml.nw > example.svx |
Example XML file: | notangle -t4 -Rexample2.xml svx-xml.nw > example2.xml |
LaTeX documentation: | noweave -t4 -delay -index svx-xml.nw > svx-xml.tex, then latex svx-xml.tex |
HTML documentation: | noweave -html -filter 'l2h -show-unknowns' -x svx-xml.nw | htmltoc > svx-xml.html |
Throughout the DVI and PostScript documentation you will see that each chunk of code is uniquely identified by a page number and an alphabetic sub-page reference. An example is:
10b <cavesurvey.dtd 9>+≡ (15) 10a 11
This line tells us that we are now in code chunk 10b. This code chunk is on page 10 and it is the second code chunk defined on this page.
The construct <cavesurvey.dtd 9>+≡ tells us that we are in a code chunk called cavesurvey.dtd, that its definition began in chunk 9, and that we are adding to its definition (the +≡ concatenates definitions with the same name in order of appearance).
At the right margin we find: (15) 10a 11
This tells us that the chunk we're defining is used within chunk 15, and that this current chunk is continued from chunk 10a and is continued in chunk 11.
At the end of each code chunk a %def can be used to define any variables within that code chunk that we want to cross-reference. These defined variables get listed in the index with the page number where they were defined. The LaTeX hyperref package is being used, so this page number will be a hyperlink and will show as underlined.
Any defined variables enclosed in double square brackets like this, [[variable]], in the documentation text become hyperlinks, again to the place where that variable is defined.
The following Makefile provides a convenient way to create the code or documentation after modifications to the source file, rather than typing all the notangle or noweave commands. In fact, all development is usually done by changing the source file and running the appropriate make command. One generally never changes the output files directly (except for quick hacks).
To extract the Makefile:
notangle -t4 -RMakefile svx-xml.nw > Makefile
Run `make help' to see what options there are.
One can then modify the source file and extract the new code or documentation. For instance, after making changes to any of the programs or example files in the source file I run `make dvi' to see my changes in xdvi.
<Makefile>=
# Makefile for creating svx-xml scripts

NOWEB_SOURCE = svx-xml.nw
DIST_LOCATION = /home/mikel/Web_pages/mikes_homepage/cavescript

# List of all files for a distribution
DIST_LIST = README COPYING MANIFEST
DIST_LIST := $(DIST_LIST) svx-xml.html
DIST_LIST := $(DIST_LIST) svx-xml.ps
DIST_LIST := $(DIST_LIST) svx-xml.nw
DIST_LIST := $(DIST_LIST) svx2xml
DIST_LIST := $(DIST_LIST) xml2svx
DIST_LIST := $(DIST_LIST) example.svx
DIST_LIST := $(DIST_LIST) example.xml
DIST_LIST := $(DIST_LIST) example2.xml

# If the user just types 'make' with no args then help, being the first routine
# will be invoked.
help:
	@echo 'Usage: make [code examples dvi ps html all dist clean]'

# These targets name actions, not files.
.PHONY: help code examples dvi ps html all clean dist

# Specify how to make the programs and examples
svx2xml: $(NOWEB_SOURCE)
	notangle -t4 -Rsvx2xml $(NOWEB_SOURCE) > svx2xml

xml2svx: $(NOWEB_SOURCE)
	notangle -t4 -Rxml2svx $(NOWEB_SOURCE) > xml2svx

example.svx: $(NOWEB_SOURCE)
	notangle -t4 -Rexample.svx $(NOWEB_SOURCE) > example.svx

# svx2xml reads the Survex data on standard input.
example.xml: example.svx svx2xml
	perl svx2xml < example.svx > example.xml

example2.xml: $(NOWEB_SOURCE)
	notangle -t4 -Rexample2.xml $(NOWEB_SOURCE) > example2.xml

# Create code and examples
code: svx2xml xml2svx
	$(MAKE) svx2xml xml2svx
	chmod u+x svx2xml xml2svx

examples: example.svx example.xml example2.xml

# Create documentation
dvi: $(NOWEB_SOURCE)
	noweave -t4 -delay -index $(NOWEB_SOURCE) >| svx-xml.tex
	latex svx-xml.tex
	@echo
	@echo 'You may need to run latex again'
	@echo
	@echo 'latex svx-xml.tex'

ps: dvi
	dvips svx-xml.dvi -o svx-xml.ps

html: $(NOWEB_SOURCE)
	noweave -html -filter l2h -index $(NOWEB_SOURCE) | htmltoc >| svx-xml.html

all: code examples html ps

# Removes unnecessary LaTeX files like *.aux, *.log etc.
clean: # lintex
	rm -f svx2xml xml2svx
	rm -f *.aux *.dvi *.tex *.toc *.log
	rm -f svx-xml.html svx-xml.ps
	rm -f example.svx example2.xml
	rm -f *.err
	rm -f *.inf
	rm -f *.pos
	rm -f *.3d

dist:
	tar cvf - $(DIST_LIST) | gzip > svx-xml-0.1.tar.gz
	mv *.gz $(DIST_LOCATION)/download
	cp svx-xml.html $(DIST_LOCATION)/docs
	cp example.svx $(DIST_LOCATION)/docs
	cp example.xml $(DIST_LOCATION)/docs
	cp example2.xml $(DIST_LOCATION)/docs
The program svx2xml requires the Perl module XML::Writer. The program xml2svx requires the Perl module XML::Parser.
``XML::Writer is a simple Perl module for writing XML documents: it takes care of constructing markup and escaping data correctly, and by default, it also performs a significant amount of well-formedness checking on the output, to make certain (for example) that start and end tags match, that there is exactly one document element, and that there are not duplicate attribute names.''
The program svx2xml converts Survex format files into CaveScript format XML files. XML files are more verbose: the small example Survex file is around 1.5 kbytes, versus 2.5 kbytes for the XML file. Some space can be saved by not having newlines at the end of the XML tags; this saves 10 to 20%.
The Survex commands supported in this version of svx2xml and xml2svx are listed in Table [->]. A `yes' indicates that the command is fully supported, a × that it is not supported, and a - that there is partial or qualified support.
Command | Supported | Comments |
; comments | yes | supported at start or end of lines |
*begin | yes | |
*calibrate | yes | |
*case | × | |
*data | yes | type normal and diving are supported |
*default | × | |
*end | yes | |
*equate | - | written as an EQUATE element; full support will be implemented via XPointers and XLinks |
*fix | yes | |
*include | - | written as an INCLUDE element; the included file itself is not converted |
*infer | × | |
*prefix | × | deprecated by *begin and *end - will not be supported |
*sd | yes | |
*set | × | |
*solve | × | |
*title | × | |
*truncate | × | |
*units | × | |
svx2xml
The svx2xml script starts with the <svx2xml preamble> code chunk, which includes the usual Perl things, a copyright statement and variable declarations. This is followed by <svx2xml start xml>, which writes the beginning of the XML file: the required declarations, which do not depend on the content of the Survex file being parsed. The main work of the program is in <svx2xml process survex lines>, where Survex *commands are parsed and converted to XML tags. Finally <svx2xml end xml> closes any open XML tags.
<svx2xml>=
<svx2xml preamble>
<svx2xml start xml>
<svx2xml process survex lines>
<svx2xml end xml>

###################################
### Survex specific subroutines ###
###################################
<svx2xml survex subroutines>

#################################
### Miscellaneous Subroutines ###
#################################
<misc subroutines>
Each of these code chunks will now be described.
Usual Perl stuff, the GNU copyright, and variable declarations. One parameter that can be changed here is NEWLINES. When the new object $writer is created we can set NEWLINES => 0 or NEWLINES => 1, or not specify it at all. If this value is true, then XML::Writer will insert an extra newline before the closing delimiter of start, end, and empty tags so that the document does not end up as a single, long line. If the parameter is not present, or is false, the module will not insert the newlines.
<svx2xml preamble>= (<-U)
#!/usr/bin/perl -w
# Survex format to XML format
# This program is generated from noweb documentation.
<gnu copyright>
# Program Usage: program_name < survex_data_file > xml_data_file

use strict;
use XML::Writer;        # see perldoc XML::Writer

my $i = 0;
my ($tmp, @tmp);
my $todays_date;
my $data_in;
my @data_in;
my $writer = XML::Writer->new(NEWLINES => 0);
my $data_type = "normal";
my @data_type_stack = "";   # When we change the Survex data-type we push the
                            # last one onto this stack.

# Declare and initialise a hash for the data ordering.
my %data_order_default;
$data_order_default{"from"}    = 0;
$data_order_default{"to"}      = 1;
$data_order_default{"tape"}    = 2;
$data_order_default{"compass"} = 3;
$data_order_default{"clino"}   = 4;

# Copy the default data-ordering hash to the current working hash.
my %data_order_current = %data_order_default;
Defines: %data_order_current, %data_order_default, $data_type.
The string $data_type and the hash %data_order_current are set each time we encounter a *data <type> <ordering> command, and are used each time we process a Survex data line.
All XML files start with an XML declaration, optionally followed by a DOCTYPE declaration, and then a root element. Here these are:
XML declaration:     <?xml version="1.0" encoding="UTF-8" standalone="no"?>
DOCTYPE declaration: <!DOCTYPE CAVESURVEY SYSTEM "CaveSurvey.dtd">
Root element:        <CAVESURVEY>
<svx2xml start xml>= (<-U)
# Write an XML declaration and a comment that the XML file was created from
# this Survex to XML conversion program.
$writer->xmlDecl("UTF-8", "no");
$writer->doctype("CAVESURVEY", "", "CaveSurvey.dtd");

# temporarily removed date function as Martin Laverty reported that it crashes Perl
# on M$ Windows
# $tmp = "This file was generated from svx2xml on ".`date`;
$tmp = "This file was generated from svx2xml";
chomp($tmp);
$writer->comment($tmp);

# Write the opening root element.
$writer->startTag("CAVESURVEY");
$writer->characters("\n");
Defines: $writer->characters(), $writer->comment(), $writer->startTag, $writer->xmlDecl().
Now read in the entire file, save each line in the array @data_in, and then use a for loop to process each line. The code for processing Survex blank lines and comment lines follows this code section.
<svx2xml process survex lines>= (<-U)
# Read data from STDIN
@data_in = read_stdin();

# Process Survex lines (* commands are in alphabetical order)
for ($i=0; $i<=$#data_in; $i++)
{
    # Survex blank line.
    <svx2xml blank line>

    # Survex comment line.
    <svx2xml comment line>

    # NOW we check for *commands. This way any *commands commented out with ;
    # will have been turned into XML comments. We are also checking that the
    # *command is at the start of a line - superfluous but let's do it.
    # See the comments made about this in the full noweb documentation.

    # Survex "*begin series_name" line to XML
    elsif ($data_in[$i] =~ /^\*begin/i)
    {
        svx_begin($data_in[$i]);
    }

    # Survex "*calibrate instrument value" line to XML
    elsif ($data_in[$i] =~ /^\*calibrate/i)
    {
        svx_calibrate($data_in[$i]);
    }

    # Survex "*data type ordering" line to XML
    elsif ($data_in[$i] =~ /^\*data/i)
    {
        svx_datatype($data_in[$i]);
    }

    # Survex "*end series_name" line to XML
    elsif ($data_in[$i] =~ /^\*end/i)
    {
        svx_end($data_in[$i]);
    }

    # Survex "*equate station1 station2" line to XML
    elsif ($data_in[$i] =~ /^\*equate/i)
    {
        svx_equate($data_in[$i]);
    }

    # Survex "*fix station" line to XML.
    elsif ($data_in[$i] =~ /^\*fix/i)
    {
        svx_fix($data_in[$i]);
    }

    # Survex "*include filename.svx" line to XML.
    elsif ($data_in[$i] =~ /^\*include/i)
    {
        svx_include($data_in[$i]);
    }

    # Survex "*sd instrument value units" line to XML
    elsif ($data_in[$i] =~ /^\*sd/i)
    {
        svx_sd($data_in[$i]);
    }

    # At this point we assume it's valid survey data in a Survex format.
    # Default order is "From To Tape Compass Clino" but a different order
    # can be handled by the *data command.
    else
    {
        svx_survey_data($i, $data_in[$i]);
    }
}
read_stdin() removes all leading spaces and the trailing newline of each line read in. Hence for a blank line $data_in[$i] will contain no characters at all, i.e. "". No subroutine is needed to handle this; we just write a newline into the XML file.
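read_stdin() itself is defined in the <misc subroutines> chunk, which is not part of this section. The sketch below only illustrates the behaviour described above, assuming the subroutine simply chomps each line and strips leading whitespace. The name read_lines() and its filehandle argument are hypothetical, chosen so that the sketch can be run on an in-memory string instead of STDIN.

```perl
use strict;
use warnings;

# Hypothetical stand-in for read_stdin(): it reads from any filehandle so it
# can be tried on an in-memory string rather than STDIN.
sub read_lines {
    my ($fh) = @_;
    my @lines;
    while (my $line = <$fh>) {
        chomp $line;            # drop the trailing newline
        $line =~ s/^\s+//;      # drop leading spaces (and tabs)
        push @lines, $line;
    }
    return @lines;
}

my $text = "; a comment\n   *begin sigma\n\n85 86 5.42 28 +43\n";
open(my $fh, '<', \$text) or die "cannot open in-memory file: $!";
my @data_in = read_lines($fh);
close($fh);

foreach my $line (@data_in) {
    print $line eq "" ? "(blank)\n" : "$line\n";
}
```

The third element comes back as the empty string, which is exactly what the <svx2xml blank line> test compares against.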
<svx2xml blank line>= (<-U)
if ($data_in[$i] eq "")
{
    $writer->characters("\n");
}
If we matched comment lines with just the pattern /;/ then any command or data followed by a Survex comment would be treated as one long comment. We only want to treat the entire line as a comment if a semicolon starts the line, perhaps with whitespace in front. Hence the pattern /^\s*;/.
<svx2xml comment line>= (<-U)
# FIRST we check for comment lines BEFORE we check for *command lines.
# A Survex comment is a line comprising zero or more white space followed
# by a semicolon and trailing text. Convert to XML comment.
elsif ($data_in[$i] =~ /^\s*;/)
{
    svx_comment($data_in[$i]);
}
Survex *commands are matched with patterns such as /^\*begin/i. The * in *begin is escaped, but notice the caret at the start. Matching a *command string only at the start of a line is good in case the user has a *command within a comment, e.g.
; should we use an *equate here?
We don't actually need this caret, because we match commands after we have matched comments: a comment line satisfies the comment match first and is converted to an XML comment, so the if block never gets to test it for a *command. But let's leave the caret in, in case we swap the elsif blocks around. Clear?
Finally, the i after the final slash of each Survex command pattern makes the match case-insensitive, so that *BEGIN will match as well as *begin.
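The ordering argument can be checked with a small, self-contained dispatcher. classify_line() is hypothetical: it mirrors the order of the tests in <svx2xml process survex lines> (blank, comment, *command, data) but returns a label instead of writing XML.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the order of the tests in the main loop:
# blank lines first, then comment lines, then *commands, then survey data.
sub classify_line {
    my ($line) = @_;
    return 'blank'   if $line eq "";
    return 'comment' if $line =~ /^\s*;/;    # checked BEFORE any *command
    foreach my $cmd (qw(begin calibrate data end equate fix include sd)) {
        # ^ anchors the *command to the start of the line; /i ignores case.
        return $cmd if $line =~ /^\*$cmd/i;
    }
    return 'data';
}

foreach my $line ("", "; should we use an *equate here?",
                  "*BEGIN sigma", "*Calibrate tape 0.1 0.95",
                  "85 86 5.42 28 +43") {
    printf "%-35s => %s\n", "'$line'", classify_line($line);
}
```

The comment containing *equate is labelled a comment, and *BEGIN is recognised despite its case.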
We end the XML document by closing the root element, i.e. appending:
</CAVESURVEY>
<svx2xml end xml>= (<-U)
$writer->endTag("CAVESURVEY");

# Finish the XML document. This method will
# check that the document has exactly one document
# element, and that all start tags are closed:
$writer->end();
Survex cave survey data usually begins with some comments which describe the cave, the survey and the surveyors. A note is inserted at the start of the example file so that I will know where it came from.
<example.svx>= [D->]
; This Survex example file is generated from noweb documentation.
; WOMBEYAN CAVES, NSW
; Sigma Cave (W45), upstream section from the top of `Fallaway Drop'
; along the streamway to `Knockers Cavern Two'.
; Club: Sydney University Speleological Society
; Surveyors: Mike Lake, Jill Rowling, Geoff McDonnell
; Instruments: Suunto Twin, 30m fibreglass tape (SUSS1) and 8m steel tape
; for cross sections.
; Date: 1st November 1997
In XML this information would be stored as the attributes of elements (see CaveScript Document Type Definitions). However, when parsing the Survex file a program can't tell one comment from another, and so can't place the information into the appropriate elements. There are two choices for what to do with Survex comments:
1. convert them to XML comments, or
2. wrap them in CDATA sections.
The first choice is easier. Perl code to convert Survex comment lines to XML comments is shown below.
<svx2xml survex subroutines>= (<-U) [D->]
sub svx_comment
{
    # Survex comment line to XML comment line.
    # Split only on the first semicolon so that any further semicolons
    # stay in the comment text.
    my (undef, $comment) = split(/;/, $_[0], 2);
    $comment = "" unless defined $comment;
    $writer->comment($comment);     # write out the text after the ';'
    $writer->characters("\n");
}
Defines: svx_comment().
The svx_comment subroutine is invoked as svx_comment($data_in[$i]), where $data_in[$i] is the complete Survex line. The code just splits the line on the semicolon and writes out the text to its right as an XML comment.
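Splitting on every semicolon and splitting only on the first one differ when the comment itself contains a semicolon. The sketch below, plain Perl with a made-up input line, shows what each form keeps; it is an illustration rather than code taken from svx2xml.

```perl
use strict;
use warnings;

# A Survex data line whose trailing comment contains a second semicolon.
my $line = "90 89 4.3 169 -7 ; back bearing; checked twice";

# Splitting on every semicolon: field [1] is only the text between the
# first and second semicolons.
my @all_fields = split(/;/, $line);
my $first_only = $all_fields[1];

# A split limit of 2 stops after the first semicolon, keeping the whole comment.
my (undef, $whole) = split(/;/, $line, 2);
$whole = "" unless defined $whole;    # a line with no ';' would give undef

print "split all : '$first_only'\n";
print "limit of 2: '$whole'\n";
```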
Comments are processed whether they occur at the start of a line or at the ends of command or data lines. The semicolon starting a comment can have white-space before it on a line.
<svx2xml comment trailing>= (U-> U-> U-> U-> U-> U-> U-> U-> U->)
if ($_[0] =~ /;/)
{
    svx_comment($_[0]);
}
else
{
    $writer->characters("\n");
}
This section covers the Survex *commands, in alphabetical order. The *commands in the Survex file can be in upper or lower case, and both cases are supported.
The example Survex data is now extended to include some typical Survex commands:
<example.svx>+= [<-D->]
*begin sigma
*calibrate compass 0.0
*calibrate declination -12.0
*calibrate tape 0.1 0.95 ; Tape is marked SUSS1, stretched by 5%
*data normal from to tape compass clino
*fix 90 1240 3512 700.0 ; 1240m East, 3512m North, 700m altitude
The *begin command starts a survey series. In Survex, the settings in force at a *begin are saved away and restored at the matching *end, so changes made inside one survey series do not carry over to the next. There are two cases to consider: a *begin by itself, or a *begin series_name, i.e. *begin followed by the name of a survey series.
<svx2xml survex subroutines>+= (<-U) [<-D->]
sub svx_begin
{
    # Survex "*begin survey_name ; comment" line.
    # Save away the current data_type (ie normal or diving) so
    # it can be restored on meeting an *end
    push(@data_type_stack, $data_type);

    my @tmp = split(/\s+/, $_[0]);
    if (!$#tmp)     # Case 1: "*begin " ie. no name or trailing comments
    {
        $writer->startTag("SERIES");
    }
    else            # Case 2: "*begin something [something else]"
    {
        # ie. @tmp array will be (*begin, name [; comments...] ) so
        # $tmp[0] will be *begin always
        # $tmp[1] will be either a comment starting with ";" or a
        #         series "name" but NOT null
        # $tmp[2] will be either ";" or comment characters or undefined.
        if ($tmp[1] =~ /^;/)    # covers "*begin ; x" and "*begin ;x"
        {
            $writer->startTag("SERIES");
        }
        else
        {
            $writer->startTag("SERIES", "NAME" => $tmp[1]);
        }
    }

    # Cope with case of a trailing comment.
    <svx2xml comment trailing>
}
Defines: SERIES, svx_begin().
We first check the last index of the array @tmp after splitting the input string. If the *begin has no series name and no trailing comment (i.e. just `*begin'), then the last index of @tmp will be zero. Negating this gives a true value, so we write out SERIES and finish. (Yes, the test for a trailing comment is then done unnecessarily at the end.)
If the last index of @tmp is 1 or more then we have a name, a comment, or both. If $tmp[1] starts a comment (a semicolon) then there is no series name, so again we output SERIES. Otherwise it has to be a series name, possibly followed by a comment, and we output SERIES with a NAME attribute. Finally we test for any trailing comment and write it out.
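To make the two cases concrete, here is a hypothetical helper, begin_series_name(), that applies the same split(/\s+/) to some *begin lines and reports the series name it would use. It treats any $tmp[1] beginning with a semicolon as a comment, which also covers a comment written without a space after the semicolon. It is a sketch of the logic, not the svx_begin() code itself.

```perl
use strict;
use warnings;

# Hypothetical helper: return the series name from a "*begin" line,
# or undef when there is none.
sub begin_series_name {
    my ($line) = @_;
    my @tmp = split(/\s+/, $line);
    return if $#tmp == 0;           # just "*begin"
    return if $tmp[1] =~ /^;/;      # "*begin ; comment" or "*begin ;comment"
    return $tmp[1];                 # "*begin name [; comment]"
}

foreach my $line ("*begin", "*begin ;no name", "*begin sigma",
                  "*begin ext ; Start new extension.") {
    my $name = begin_series_name($line);
    print "'$line' -> ", (defined $name ? $name : "(no name)"), "\n";
}
```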
If the Survex line contains a *calibrate command then it is either a declination correction or an instrument correction. The two are handled slightly differently in that there is no scale correction for a declination.
A `*calibrate declination -12.5' will become: <AREA DECLINATION="-12.5" />
while `*calibrate tape -0.2 0.95' will become: <INSTRUMENT><TAPE ZERO_CORRECT="-0.2" SCALE_CORRECT="0.95" /></INSTRUMENT>
This applies to instruments such as tape, compass, clino and counter, and to measurements such as depth and x, y, z positions. If a scale is provided for a declination it will be ignored.
The zero and scale corrections applied by Survex are:
value = (reading - zero_correction) * scale_correction
The zero correction and the scale default to 0.0 and 1.0 respectively.
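As a worked example of this formula, the hypothetical function corrected_value() below applies the tape calibration from the example file, *calibrate tape 0.1 0.95, to the 5.42 m reading on leg 85 to 86: (5.42 - 0.1) * 0.95 = 5.054 m. Note that svx2xml itself only records the corrections as attributes in the XML; it does not apply them to the data values.

```perl
use strict;
use warnings;

# Survex zero and scale correction:
#   value = (reading - zero_correction) * scale_correction
# corrected_value() is a hypothetical helper for illustration.
sub corrected_value {
    my ($reading, $zero, $scale) = @_;
    $zero  = 0.0 unless defined $zero;     # default zero correction
    $scale = 1.0 unless defined $scale;    # default scale
    return ($reading - $zero) * $scale;
}

# Leg 85 -> 86 with *calibrate tape 0.1 0.95 from the example file.
printf "corrected: %.3f m\n", corrected_value(5.42, 0.1, 0.95);
# With no calibration the reading is unchanged.
printf "uncorrected: %.3f m\n", corrected_value(5.42);
```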
<svx2xml survex subroutines>+= (<-U) [<-D->]
sub svx_calibrate
{
    # Survex "*calibrate instrument value" line to XML line
    # <instrument ZERO_CORRECT="value" SCALE_CORRECT="value"> tag.
    # or
    # Survex "*calibrate declination value" line to XML line
    # <AREA DECLINATION="value" > tag.
    my @tmp = split(/\s+/, $_[0]);
    # ie. @tmp array will be either, using tape as an example...
    # (*calibrate, tape, 0.1 [scale]) or (*calibrate, declination, 12.5 [1])
    # $tmp[1] will be either the instrument we are calibrating or declination
    # $tmp[2] will be its value.
    # An instrument may have a scale value as well. If it doesn't just
    # set it to 1.
    # $tmp[3] its scale value if it's an instrument.

    $tmp[1] =~ tr/a-z/A-Z/;     # Make sure element name is UPPERCASE.
    if ( $tmp[1] =~ /DECLINATION/)
    {
        # If there is a scale given ie. $tmp[3] exists we don't write an
        # attribute for it as in the case of an instrument as it has no
        # meaning for a declination.
        $writer->emptyTag("AREA", "DECLINATION" => $tmp[2]);
    }
    else
    {
        # If there is no scale, or the next word starts a trailing
        # comment, then set the scale to 1.
        if (!defined $tmp[3] || $tmp[3] =~ /^;/)
        {
            $tmp[3] = 1;
        }
        $writer->startTag("INSTRUMENT");
        $writer->emptyTag($tmp[1], "ZERO_CORRECT"  => $tmp[2],
                                   "SCALE_CORRECT" => $tmp[3]);
        $writer->endTag("INSTRUMENT");
    }

    # Cope with case of a trailing comment.
    <svx2xml comment trailing>
}
Defines: DECLINATION, INSTRUMENT, SCALE_CORRECT, svx_calibrate(), ZERO_CORRECT.
The order of the Survex data can be specified with the command:
*data <type> <ordering>
where <type> is normal or diving (Survex also has cartesian, which svx2xml does not handle), and <ordering> is a selection from one of the following:
normal | from to length compass clino |
diving | from to length compass fromdepth todepth |
cartesian | from to dx dy dz |
(svx2xml expects the name tape for the length field, as in the default ordering below.)
The default ordering if not specified is:
*data normal from to tape compass clino
If the data type is normal then there must be 5 data fields in a Survex data line, whereas if it is diving there must be 6 data fields.
The XML format does not require a new element to deal with data ordering, as all data is held in attributes of the SHOT element. However, we do need to process the *data command and store the current ordering, so that when we read in the data lines we know which field is which, in case some users use a different order from the default. We do this with the hash %data_order_current, which starts as a copy of %data_order_default. The default hash is initialised in the preamble, before any Survex lines are processed (Section [<-]), to reflect the default data ordering in Survex. Its initialisation is equivalent to:
%data_order_default = ("from", 0, "to", 1, "tape", 2, "compass", 3, "clino", 4);
which sets up the following key-value pairs:
key: | from | to | tape | compass | clino |
value: | 0 | 1 | 2 | 3 | 4 |
<svx2xml survex subroutines>+= (<-U) [<-D->]
sub svx_datatype
{
    # Survex "*data <type> <ordering>" line to XML line
    # Previously initialised using the hash %data_order_current.
    my $i;
    my @tmp = split(/\s+/, $_[0]);
    # Cases might be:
    # *data  normal from   to     tape   compass clino
    # tmp[0] tmp[1] tmp[2] tmp[3] tmp[4] tmp[5]  tmp[6]
    # *data  diving from   to     tape   compass fromdepth todepth
    # tmp[0] tmp[1] tmp[2] tmp[3] tmp[4] tmp[5]  tmp[6]    tmp[7]

    $tmp[1] =~ tr/A-Z/a-z/;     # Make sure a type of 'NORMAL' or
                                # 'DIVING' is converted to lowercase.
    if ( $tmp[1] =~ /normal/)
    {
        $data_type = "normal";
        # We only want $tmp[2] through $tmp[6] ie the 5 fields
        # from to tape compass clino
        for ($i=2; $i<=6; $i++)
        {
            # Field names are lowercased to match the hash keys.
            $data_order_current{lc $tmp[$i]} = $i-2;
            # ie. $data_order_current{"from"} = 0
            #     $data_order_current{"to"}   = 1 etc...
            # debug line
            # print STDERR "Data is $tmp[1] key=", $tmp[$i], " value=", $i, "\n";
        }
    }
    elsif ( $tmp[1] =~ /diving/)
    {
        $data_type = "diving";
        # We only want $tmp[2] through $tmp[7] ie the 6 fields
        # from to tape compass fromdepth todepth
        for ($i=2; $i<=7; $i++)
        {
            $data_order_current{lc $tmp[$i]} = $i-2;
            # debug line
            # print STDERR "Data is $tmp[1] key ", $tmp[$i], " value=", $i, "\n";
        }
    }
    else
    {
        print STDERR "The *data type ", $tmp[1], " is not supported.\n";
        print STDERR "Offending line: ", $_[0], "\n";
    }

    # Cope with case of a trailing comment.
    <svx2xml comment trailing>
}
Defines: DATA, DIVING, NORMAL.
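The bookkeeping can be sketched on its own, without XML::Writer. order_from_data_line() is a hypothetical helper that builds the field-name to column hash from a *data normal line, as the loop in svx_datatype() does, and the usage lines then pick the fields of one of Jim's data lines out by name. Two details are choices made for this sketch only: field names are lowercased, and length is mapped to tape (an assumption, since the *data table above uses length while svx2xml keys on tape).

```perl
use strict;
use warnings;

# Survex default: *data normal from to tape compass clino
my %data_order_default = (from => 0, to => 1, tape => 2, compass => 3, clino => 4);

# Hypothetical helper: field name => column index for a "*data normal ..." line.
sub order_from_data_line {
    my ($line) = @_;
    my @tmp = split(/\s+/, $line);
    my %order = %data_order_default;         # start from the default
    for my $i (2 .. $#tmp) {                 # skip "*data" and the type
        last if $tmp[$i] =~ /^;/;            # stop at a trailing comment
        my $key = lc $tmp[$i];
        $key = "tape" if $key eq "length";   # assumed synonym for this sketch
        $order{$key} = $i - 2;
    }
    return %order;
}

# Jim's ordering and one of his legs from the example file.
my %order  = order_from_data_line("*data normal from to compass clino tape");
my @fields = split(/\s+/, "1 2 240 -10 3.0");
my %shot;
$shot{$_} = $fields[$order{$_}] for qw(from to tape compass clino);
print join(" ", map { $_ . "=" . $shot{$_} } qw(from to tape compass clino)), "\n";
```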
Each *begin opens a SERIES element, so the matching *end has to close that element once the data in the series has been processed.
<svx2xml survex subroutines>+= (<-U) [<-D->]
sub svx_end
{
    # Survex "*end name ; comment" line to XML line.
    # Might also be just "*end ".
    my @tmp = split(/\s+/, $_[0]);
    $writer->endTag("SERIES");

    # Restore the saved data-type, and reset the data-ordering to the default.
    $data_type = pop(@data_type_stack);
    %data_order_current = %data_order_default;
    # TODO restore other environments as we have only restored the
    # data type.
    # DATA TYPE & ORDERING, DEFAULT CALIBRATION CASE SD INFER SET etc
    #$data_order = pop(@data_order_stack);

    # Cope with case of a trailing comment.
    <svx2xml comment trailing>
}
Defines: SERIES, svx_end().
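svx_end() restores the saved data type but resets the ordering to the default. As far as the code shows, this means the legs 6 7 100 +10 3.4 and 7 8 100 +12 2.0 in the example file, which follow the inner *end, are read with the default ordering rather than Jim's. One way to address the TODO is to push a copy of both settings on *begin and pop them on *end. The sketch below does this with hypothetical save_settings()/restore_settings() helpers and a @settings_stack variable that svx2xml does not have.

```perl
use strict;
use warnings;

# Hypothetical settings stack: each entry is [data type, copy of ordering].
my $data_type = "normal";
my %data_order_current = (from => 0, to => 1, tape => 2, compass => 3, clino => 4);
my @settings_stack;

# On *begin: save copies, so later changes don't alter the saved values.
sub save_settings {
    push @settings_stack, [ $data_type, { %data_order_current } ];
}

# On *end: put back exactly what the matching *begin saved.
sub restore_settings {
    my $saved = pop @settings_stack
        or die "*end without a matching *begin\n";
    my ($type, $order) = @$saved;
    $data_type          = $type;
    %data_order_current = %$order;
}

save_settings();                       # *begin ext
%data_order_current = (from => 0, to => 1, compass => 2, clino => 3, tape => 4);
save_settings();                       # nested *begin
$data_type = "diving";                 # *data diving ...
restore_settings();                    # nested *end: Jim's ordering again
print "$data_type, tape in column $data_order_current{tape}\n";
restore_settings();                    # *end ext: the default ordering
print "$data_type, tape in column $data_order_current{tape}\n";
```

Storing a copy of the hash ({ %data_order_current }) rather than a reference to it is what keeps a later *data inside the series from changing the saved ordering.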
XML, with its XPointer and XLink capabilities, will be able to provide more functionality than simply equating stations. Until we have XML browsers that support these, equates can be implemented as an EQUATE element, which might be deprecated later. Information about the equated stations can be placed as text content within the EQUATE element.
<svx2xml survex subroutines>+= (<-U) [<-D->]
sub svx_equate
{
    # Survex "*equate stn1 stn2 ; comment" line to XML line.
    my @tmp;
    @tmp = split(/\s+/, $_[0]);
    $writer->startTag("EQUATE", "STN1" => $tmp[1], "STN2" => $tmp[2]);
    $writer->characters("\n");
    $writer->endTag("EQUATE");

    # Cope with case of a trailing comment.
    <svx2xml comment trailing>
}
Defines: EQUATE, STN, svx_equate().
The above code will convert the Survex statement: *equate 88 ext.1
to the XML: <EQUATE STN1="88" STN2="ext.1"></EQUATE>
(with a newline between the start and end tags, written by the characters() call).
<svx2xml survex subroutines>+= (<-U) [<-D->]
sub svx_fix
{
    # Survex *fix line to XML line.
    # eg. "*fix 3a 1000 2000 700" fixes station 3a to 1000m East, 2000m North
    # and 700m Height.
    # $tmp[0] is *fix, $tmp[1] is the station name,
    # $tmp[2] is E, $tmp[3] is N, $tmp[4] is H.
    my @tmp = split(/\s+/, $_[0]);
    $writer->emptyTag("STN", "NAME"   => $tmp[1],
                             "EAST"   => $tmp[2],
                             "NORTH"  => $tmp[3],
                             "HEIGHT" => $tmp[4]);

    # Cope with case of a trailing comment.
    <svx2xml comment trailing>
}
Defines: EAST, HEIGHT, NAME, NORTH, STN, svx_fix().
<svx2xml survex subroutines>+= (<-U) [<-D->]
sub svx_include
{
    # Survex *include line to XML line.
    # eg. "*include extension.svx"
    # $tmp[0] is "*include", $tmp[1] is "extension.svx"
    my @tmp = split(/\s+/, $_[0]);
    $writer->emptyTag("INCLUDE", "filename" => $tmp[1]);

    # Cope with case of a trailing comment.
    <svx2xml comment trailing>
}
Defines: INCLUDE, svx_include().
Instruments can also have a standard deviation which describes their accuracy.
<svx2xml survex subroutines>+= (<-U) [<-D->]
sub svx_sd
{
    # Survex "*sd instrument value ; comment" line to XML line.
    my @tmp = split(/\s+/, $_[0]);
    # ie. @tmp array will be (*sd, instrument, value, units, ;, comments)
    # $tmp[1] will be the instrument name
    # $tmp[2] its value
    # $tmp[3] its units.
    $tmp[1] =~ tr/a-z/A-Z/;     # Make sure instrument name is UPPERCASE.
    $writer->startTag("INSTRUMENT");
    $writer->emptyTag($tmp[1], "SD" => $tmp[2], "UNITS" => $tmp[3]);
    $writer->endTag("INSTRUMENT");

    # Cope with case of a trailing comment.
    <svx2xml comment trailing>
}
Defines: SD, svx_sd(), UNITS.
Example Survex data is described by the following:
<example.svx>+= [<-D->]
; From To Dist (m) Compass Elev
85 86 5.42 28 +43
86 87 2.16 0.0 +22
87 88 5.90 343 +1
88 89 3.71 10 -3
90 89 4.3 169 -7 ; back bearing
In XML this will become:
<SERIES NAME="sigma">
<SHOT FROM="85" TO="86" DIST="5.42" AZIM="28" ELEV="+43"/>
<SHOT FROM="86" TO="87" DIST="2.16" AZIM="0.0" ELEV="+22"/>
<SHOT FROM="87" TO="88" DIST="5.90" AZIM="343" ELEV="+1"/>
<SHOT FROM="88" TO="89" DIST="3.71" AZIM="10" ELEV="-3"/>
<SHOT FROM="90" TO="89" DIST="4.3" AZIM="169" ELEV="-7"/> <!-- back bearing -->
</SERIES>
Notice that the value sigma of the NAME attribute comes from the *begin argument, and that we no longer need the comment describing the order of the data columns. (Also see the Survex *data <type> <ordering> command.)
The svx2xml program appends the comment after the SHOT element because it has no way to tell whether the comment really does describe this shot. In a better XML file the description of a shot would become the text content of the SHOT element, like so:
<SHOT FROM="90" TO="89" DIST="4.3" AZIM="169" ELEV="-7">back bearing</SHOT>
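That better form can be sketched in plain Perl. The hypothetical helpers xml_escape() and shot_element() build the element by hand so the example needs no non-core modules. In svx2xml itself the same output would come from $writer->startTag, $writer->characters and $writer->endTag, which also take care of the escaping.

```perl
use strict;
use warnings;

# Escape the characters XML reserves in text and attribute values.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;      # must come first
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Hypothetical helper: a SHOT element with attributes in the given order and,
# if present, the trailing comment as its text content.
sub shot_element {
    my ($attrs, $order, $text) = @_;
    my $tag = join(" ", "SHOT",
        map { sprintf('%s="%s"', $_, xml_escape($attrs->{$_})) } @$order);
    return "<$tag/>" unless defined $text && $text ne "";
    return "<$tag>" . xml_escape($text) . "</SHOT>";
}

my %shot = (FROM => "90", TO => "89", DIST => "4.3", AZIM => "169", ELEV => "-7");
print shot_element(\%shot, [qw(FROM TO DIST AZIM ELEV)], "back bearing"), "\n";
```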
Perl code to process Survex data to XML data is:
<svx2xml survex subroutines>+= (<-U) [<-D]
sub svx_survey_data
{
    my ($i, $data_in) = @_;
    my $line_no;
    my $msg;
    my @tmp = split(/\s+/, $data_in);
    # ie. @tmp array might be (1, 2, 355, +24, 10.6 ; comments...)
    # Final check - if data type=normal there must be at least 5 elements
    # to this array.
    # From...Elev = 5 elements and if there is a comment following
    # then possibly more than 5 elements.
    # TODO handle comment 1st then check for number of elements left
    # for better error pickup.

    if ( ($data_type eq "normal") && ($#tmp >= 4) )
    {
        # Notice that we access the fields with a hash. See the
        # subroutine 'svx_datatype'.
        $writer->emptyTag("SHOT",
            "FROM" => $tmp[$data_order_current{"from"}],
            "TO"   => $tmp[$data_order_current{"to"}],
            "DIST" => $tmp[$data_order_current{"tape"}],
            "AZIM" => $tmp[$data_order_current{"compass"}],
            "ELEV" => $tmp[$data_order_current{"clino"}]);
    }
    elsif ( ($data_type eq "diving") && ($#tmp >= 5) )
    {
        $writer->emptyTag("SHOT",
            "FROM"      => $tmp[$data_order_current{"from"}],
            "TO"        => $tmp[$data_order_current{"to"}],
            "DIST"      => $tmp[$data_order_current{"tape"}],
            "AZIM"      => $tmp[$data_order_current{"compass"}],
            "FROMDEPTH" => $tmp[$data_order_current{"fromdepth"}],
            "TODEPTH"   => $tmp[$data_order_current{"todepth"}]);
    }
    elsif ( ($data_type eq "topofil") && ($#tmp >= 5) )
    {
        # FROM TO FROMCOUNT TOCOUNT [BACK]BEARING [BACK]GRADIENT
        # Note: svx_datatype() never sets $data_type to "topofil",
        # so this branch is not reached yet.
        $writer->emptyTag("SHOT",
            "FROM"      => $tmp[$data_order_current{"from"}],
            "TO"        => $tmp[$data_order_current{"to"}],
            "FROMCOUNT" => $tmp[$data_order_current{"fromcount"}],
            "TOCOUNT"   => $tmp[$data_order_current{"tocount"}],
            "AZIM"      => $tmp[$data_order_current{"compass"}],
            "CLINO"     => $tmp[$data_order_current{"clino"}]);
    }
    else
    {
        # Bail out and write data as an XML comment.
        $line_no = $i + 1;
        $msg = "WARNING line $line_no in the Survex file has too few ";
        $msg = $msg."data values for data type $data_type.\n";
        $msg = $msg."The line has been commented out in the XML file!\n";
        print STDERR $msg;
        $writer->comment($msg);
        $writer->characters("\n");
        $writer->comment($data_in);
        $writer->characters("\n");
    }

    # Cope with case of a trailing comment. The shared trailing-comment
    # chunk tests $_[0], which in this subroutine is the line index $i,
    # so test the data line itself instead.
    if ($data_in =~ /;/)
    {
        svx_comment($data_in);
    }
    else
    {
        $writer->characters("\n");
    }
}
Defines: AZIM, DIST, ELEV, FROM, FROMCOUNT, FROMDEPTH, SHOT, TO, TOCOUNT, TODEPTH.
Trailing comments are often used to describe the survey stations and anything else the surveyors wanted, especially LRUD information. These are simply converted to XML comments and appended after the data lines.
Note that this function takes two arguments, $i and $data_in. The line index is passed along so that, if there are too few data values, the line number in the Survex file can be given to the user in the warning. (But this check won't catch four data values followed by a trailing comment, since the words of the comment are counted as fields.)
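The TODO in svx_survey_data(), handling the comment first, can be sketched like this. count_data_fields() is a hypothetical helper: it strips everything from the first semicolon and then counts what is left, so 1 2 3 4 ; a comment with words is correctly seen as having only four fields. The required counts of 5 for normal and 6 for diving are the ones given for *data earlier.

```perl
use strict;
use warnings;

# Fields needed per data type, as given for *data.
my %fields_required = (normal => 5, diving => 6);

# Hypothetical helper: count data fields once any trailing comment is removed.
sub count_data_fields {
    my ($line) = @_;
    (my $data = $line) =~ s/;.*//;    # drop the comment, if any
    $data =~ s/^\s+//;                # avoid a leading empty field
    my @fields = split(/\s+/, $data);
    return scalar @fields;
}

foreach my $line ("85 86 5.42 28 +43", "1 2 3 4 ; a comment with words") {
    my $n = count_data_fields($line);
    my $status = $n >= $fields_required{normal} ? "ok" : "too few";
    print "$n fields ($status): $line\n";
}
```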
<example.svx>+= [<-D->]
; Station Descriptions
; 85 = Cusp (lower one) of rock at base of 1st drop.
; 86 = Cusp of rock at apex or corner in passage.
; 87 = Cusp of rock in narrow passage.
; 88 = Cusp of rock 1m above stream bed.
; 89 = Southerly-most end of ridge of rock at waist height.
; 90 = Stalagmite? (dropped pendant?) of rock in muddy chamber.
Survex allows data to be nested with the *begin and *end commands: *begin saves the current settings, such as instrument calibration and station-name prefix, and the matching *end restores them. This part of the example Survex file tests the svx2xml conversion of nesting, of the *equate command and of *data ordering.
<example.svx>+= [<-D->]
*equate 88 ext.1
*begin ext
; Start new extension. Jim did the data ordering differently.
*data normal from to compass clino tape
; From To Azim Clino Dist
1 2 240 -10 3.0
2 3 228 -2 2.1
3 4 189 -5 2.9
; Here the divers did the short connection
*begin
*sd compass 5 degrees
*data diving from to tape compass fromdepth todepth
; From To Dist Azim FromDepth ToDepth
4 5 0.6 160 0 0.2 ; estimated leg
6 5 1 15 0.2 0
*end
6 7 100 +10 3.4
7 8 100 +12 2.0
*end ext
*equate ext.8 86
Finally, we still have to close the original *begin sigma.
<example.svx>+= [<-D]
*end sigma
The example Survex file can be extracted as described in Table [<-].
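The *begin/*end behaviour exercised by this example amounts to a stack of settings. The following is a minimal sketch of that idea, assuming the settings are held in one flat hash; the keys data and sd_compass are illustrative only and are not how svx2xml or Survex store them.

```perl
use strict;
use warnings;

# Sketch of *begin/*end scoping: each *begin saves a copy of the current
# settings, and each *end restores the copy saved by the matching *begin.
# The setting names used here are illustrative only.
my %settings = (data => "normal from to tape compass clino",
                sd_compass => undef);
my @saved;

sub survex_begin { push @saved, { %settings }; }

sub survex_end {
    die "*end without matching *begin\n" unless @saved;
    %settings = %{ pop @saved };
}

survex_begin();                                   # *begin ext
$settings{data} = "normal from to compass clino tape";
survex_begin();                                   # *begin (anonymous)
$settings{sd_compass} = "5 degrees";
$settings{data} = "diving from to tape compass fromdepth todepth";
survex_end();                                     # *end
print "$settings{data}\n";
# prints: normal from to compass clino tape
survex_end();                                     # *end ext
print "$settings{data}\n";
# prints: normal from to tape compass clino
```

This is why the legs 6 to 7 and 7 to 8 in the example are read with Jim's compass-clino-tape ordering again once the divers' anonymous block ends.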
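The program xml2svx converts CaveScript format XML files into Survex format files. Information is lost in this process; for example, the station coordinates given by the EAST, NORTH and HEIGHT attributes of STN, and the FROMDEPTH, TODEPTH, FROMCOUNT and TOCOUNT attributes of SHOT, are not written to the Survex file. The CaveScript elements that can be converted to Survex commands in this version of xml2svx are listed in Table [->].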
Element | Status of Support |
Comments | supported at start or end of lines |
INSTRUMENT | only declination and zero correction for instruments (not scale correction) |
TAPE | |
COMPASS | |
CLINO | |
EQUATE | will be deprecated by XPointers and XLinks |
SERIES | supported |
STN | supported |
SHOT | supported |
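The table lists SHOT as supported, but the SHOT handler in xml2svx writes only the FROM, TO, DIST, AZIM and ELEV attributes as tab-separated readings. The following is a minimal sketch of that mapping as a pure function, assuming any element text becomes a trailing Survex comment; shot_to_svx is a hypothetical name and is not part of xml2svx.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the SHOT handler in xml2svx: emit the five
# normal-style readings, tab separated, with optional element text as a
# trailing Survex comment. Attributes such as FROMDEPTH are dropped.
sub shot_to_svx {
    my ($text, %attr) = @_;
    my $line = join "\t",
        map { defined $attr{$_} ? $attr{$_} : "" } qw(FROM TO DIST AZIM ELEV);
    $line .= " ; $text" if defined $text && $text ne "";
    return $line;
}

# The back-bearing shot from example2.xml.
print shot_to_svx("back bearing",
    FROM => "90", TO => "89", DIST => "4.3", AZIM => "169", ELEV => "-7"), "\n";
```

Any attribute not in the fixed list is silently ignored, which is exactly where a diving or topofil shot loses information on the way back to Survex.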
The Perl module XML::Parser Version 2.27 by Larry Wall and Clark Cooper is used.
<xml2svx>=
#!/usr/bin/perl -w
# XML to Survex converter
# This program is generated from noweb documentation.
<gnu copyright>
# Usage is: xml2svx xml_file > svx_file

use strict;
use XML::Parser;    # see perldoc XML::Parser

my $p1;             # Instance of an XML parser
my $string;
my $file_xml = shift;
die "Usage: xml2svx xml_file > svx_file\n" unless defined $file_xml;

# The stack of station names built up as we encounter series after series
# of cave surveys.
my (@series, $last_series);
my $semi_needed = 1;

# First check that we have a well formed document.
# If no style is specified the parser will just check for well-formedness.
# parsefile() dies on a badly formed document, so trap that with eval.
$p1 = XML::Parser->new();
eval { $p1->parsefile($file_xml) };
if ($@) {
    print STDERR "Document $file_xml not well-formed!\n$@";
    exit(1);
}

# Set style to Subs.
$p1 = XML::Parser->new(Style => 'Subs');

# XML declarations and Doctypes aren't needed by Survex but we will
# save them with the Survex file for version information.
#$p1->setHandlers( XMLDecl => \&handle_decl,
#                  Doctype => \&handle_doctype);
$p1->setHandlers(Comment => \&handle_xmlcomment);
$p1->setHandlers(Char => \&handle_char);
#$p1->setHandlers(Default => \&handle_default);

# temp remove date as it crashes Perl on Microsoft Windows
# print "; This file was generated from xml2svx on ".`date`;
print "; This file was generated from xml2svx\n";
$p1->parsefile($file_xml);

####################################################
# Functions for handling tags in Cave Script XML
####################################################

sub handle_decl {
    my ($p, $Version, $Encoding, $Standalone) = @_;
    print "; XML: Ver=$Version\n";
}

sub handle_doctype {
    my ($p, $Name, $Sysid, $Pubid, $Internal) = @_;
    print "; Sys=$Sysid\n";
}

sub handle_xmlcomment {
    my ($p, $string) = @_;
    $string = trim_whitespace($string);
    $string = trim_semicolon($string);
    print "; $string\n";
}

sub handle_char {
    my ($p, $string) = @_;
    # remove leading and trailing whitespace - including newlines
    $string = trim_whitespace($string);
    # If the string was just white space it will now be null.
    # If the string is not null print it.
    if ($string ne "") {
        if ($semi_needed == 1) {
            print " ; $string";    # Needs a semicolon.
        }
        elsif ($semi_needed == 0) {
            print " $string";      # Doesn't need a semicolon.
        }
        else {
            # we should never be here
            print "\nError in handle_char()";
        }
    }
    # else string is null so don't even create a new line.
}

sub handle_default {
    # covers situation where there is no registered handler
    my ($p, $string) = @_;
    if ($string eq "") { return; }
    my $line = $p->current_line;
    print "\n; No support for ", $string;
}

sub CAVESURVEY  { $semi_needed = 1; }
sub CAVESURVEY_ { $semi_needed = 0; print "\n"; }

sub AREA {
    my ($p, $element, %attr) = @_;
    print "\n; AREA";
    if ($attr{"NAME"}) {
        print "\n; Area Name: ", $attr{"NAME"};
        $semi_needed = 0;
    }
    if ($attr{"DECLINATION"}) {
        print "\n*calibrate declination ", $attr{"DECLINATION"};
        $semi_needed = 1;
    }
}
sub AREA_ { print "\n"; }

sub CAVE {
    my ($p, $element, %attr) = @_;
    print "\n; CAVE";
    if ($attr{"NAME"}) { print "\n; Cave Name: ", $attr{"NAME"}; }
    if ($attr{"TAG"})  { print "\n; Cave Tag: ", $attr{"TAG"}; }
    $semi_needed = 0;
}
sub CAVE_ { print "\n"; }

sub DATE  { print "\n; DATE"; }
sub DATE_ { print "\n"; }

sub SURVEYDATE {
    my ($p, $element, %attr) = @_;
    print "\n; Survey Date: ";
    date_format(%attr);
}
sub SURVEYDATE_ {
    #print "\n";
}

sub CREATIONDATE {
    my ($p, $element, %attr) = @_;
    print "\n; Creation Date: ";
    date_format(%attr);
}
sub CREATIONDATE_ {
    #print "\n";
}

sub MODIFICATIONDATE {
    my ($p, $element, %attr) = @_;
    print "\n; Modification Date: ";
    date_format(%attr);
}
sub MODIFICATIONDATE_ {
    #print "\n";
}

sub SURVEYORS  { print "\n; SURVEYORS"; }
sub SURVEYORS_ { print "\n"; }

sub SURVEYOR {
    my ($p, $element, %attr) = @_;
    if ($attr{"NAME"}) { print "\n; Surveyor: ", $attr{"NAME"}; }
    if ($attr{"AFFILIATION"}) {
        print "\n; Affiliation: ", $attr{"AFFILIATION"};
    }
    $semi_needed = 1;
}
sub SURVEYOR_ {
    #print "\n";
}

sub INSTRUMENT  { print "\n; INSTRUMENT"; }
sub INSTRUMENT_ { print "\n"; }

sub instrument_type {
    my ($instrument, %attr) = @_;
    if ($attr{"ID"})    { print "\n; $instrument ID: ", $attr{"ID"}; }
    if ($attr{"UNITS"}) { print "\n; $instrument units: ", $attr{"UNITS"}; }
    if ($attr{"USED"})  { print "\n; $instrument used: ", $attr{"USED"}; }
    if ($attr{"ZERO_CORRECT"}) {
        print "\n*calibrate $instrument ", $attr{"ZERO_CORRECT"};
    }
    if ($attr{"SD"}) { print "\n*sd $instrument ", $attr{"SD"}; }
    if (!keys(%attr)) { print "\n; $instrument: "; }
    $semi_needed = 1;
}

sub TAPE {
    my ($p, $element, %attr) = @_;
    instrument_type("Tape", %attr);
}
sub TAPE_ {
    #print "\n";
}

sub COMPASS {
    my ($p, $element, %attr) = @_;
    instrument_type("Compass", %attr);
}
sub COMPASS_ {
    #print "\n";
}

sub CLINO {
    my ($p, $element, %attr) = @_;
    instrument_type("Clino", %attr);
}
sub CLINO_ {
    #print "\n";
}

sub SERIES {
    my ($p, $element, %attr) = @_;
    if ($attr{"NAME"}) {
        print "\n*begin ", $attr{"NAME"};
        push(@series, $attr{"NAME"});
    }
    else {
        print "\n*begin";
        push(@series, " ");
    }
    $semi_needed = 1;
}
sub SERIES_ {
    $last_series = pop(@series);
    print "\n*end ", $last_series;
}

sub STN {
    my ($p, $element, %attr) = @_;
    print "\n; Stn: ", $attr{"NAME"}, " ";
    $semi_needed = 0;
}
sub STN_ {
    #print "\n";
}

sub SHOT {
    my ($p, $element, %attr) = @_;
    print "\n", $attr{"FROM"}, "\t", $attr{"TO"}, "\t";
    print $attr{"DIST"}, "\t", $attr{"AZIM"}, "\t", $attr{"ELEV"}, " ";
    $semi_needed = 1;
}
sub SHOT_ {
    #print "\n";
}

sub EQUATE {
    my ($p, $element, %attr) = @_;
    print "\n*equate ", $attr{"STN1"}, " ", $attr{"STN2"}, " ";
    $semi_needed = 1;
}
sub EQUATE_ {
    #print "\n";
}

####################################################
# Miscellaneous functions
####################################################

sub trim_semicolon {
    my $string = $_[0];
    $string =~ s/^;//;      # remove leading semicolon
    return $string;
}

sub trim_whitespace {
    my $string = $_[0];
    $string =~ s/^\s*//;    # remove leading whitespace
    $string =~ s/\s*$//;    # remove trailing whitespace+newline
    # This will also have removed the trailing newline.
    return $string;
}

sub date_format {
    my %attr = @_;
    if ($attr{"DAY"})   { print $attr{"DAY"}; }   else { print "??"; }
    print "-";
    if ($attr{"MONTH"}) { print $attr{"MONTH"}; } else { print "??"; }
    print "-";
    if ($attr{"YEAR"})  { print $attr{"YEAR"}; }  else { print "????"; }
    print " (dd-mm-yyyy)";
}

####################################
### Not used
####################################

sub handle_start {
    my ($p, $element, %attr) = @_;
    my $line = $p->current_line;
    print "$line START $element\n";
}

sub handle_end {
    my ($p, $element) = @_;
    my $line = $p->current_line;
    print "$line END $element \n";
}
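xml2svx keeps the @series stack because, in XML::Parser's Subs style, the end-tag sub (for example SERIES_) receives only the parser and the element name, not the attributes. The NAME from the start tag must therefore be remembered until the matching end tag. The following is a minimal sketch of that bookkeeping that replays a hypothetical list of start/end events instead of running XML::Parser; series_lines and the event format are illustrative assumptions.

```perl
use strict;
use warnings;

# Replays SERIES start/end events through a name stack so that each
# "*end" can repeat the name given to the matching "*begin".
# Each event is ['start', $name_or_undef] or ['end'] (a hypothetical format).
sub series_lines {
    my @events = @_;
    my (@stack, @out);
    for my $ev (@events) {
        if ($ev->[0] eq 'start') {
            my $name = defined $ev->[1] ? $ev->[1] : "";
            push @out, $name eq "" ? "*begin" : "*begin $name";
            push @stack, $name;
        }
        else {
            die "unbalanced </SERIES>\n" unless @stack;
            my $name = pop @stack;
            push @out, $name eq "" ? "*end" : "*end $name";
        }
    }
    return @out;
}

# The nesting of example2.xml: sigma > ext > anonymous series.
print "$_\n" for series_lines(['start', 'sigma'], ['start', 'ext'],
                              ['start', undef], ['end'], ['end'], ['end']);
# prints *begin sigma, *begin ext, *begin, *end, *end ext, *end sigma
# (one per line)
```

Unlike xml2svx, which pushes a single space for an unnamed series, this sketch stores an empty string so the bare *end carries no trailing blanks.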
<misc subroutines>= (<-U)
sub read_stdin {
    my @data_in;
    while (<>) {
        s/^\s*//;    # get rid of all leading spaces
        chomp;       # get rid of newlines
        push (@data_in, $_);
    }
    return @data_in;
}

# Function for debugging only.
sub print_array {
    # Print the array, one element per line.
    my ($i, $array_size, @array);
    @array = @_;
    for ($i = 0; $i <= $#array; $i++) {
        # print, not printf: a '%' in the data must not be read as a format
        print "$array[$i]\n";
    }
    $array_size = @array;
    return $array_size;
}
example.svx | for testing svx2xml |
example2.xml | for testing xml2svx |
<example2.xml>=
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE CAVESURVEY SYSTEM "CaveSurvey.dtd">
<CAVESURVEY>
<HEAD>
  <AREA NAME="Wombeyan Caves, NSW" DECLINATION="-11.0" />
  <CAVE NAME="Sigma Cave" TAG="W15" >The cave is located on the hillside.</CAVE>
  <DATE>
    <SURVEYDATE YEAR="1974" MONTH="09" />
    <CREATIONDATE YEAR="1998" MONTH="01" DAY="10" />
    <MODIFICATIONDATE YEAR="1999" MONTH="06" DAY="22" />
  </DATE>
</HEAD>
<SURVEYORS>
  <SURVEYOR NAME="Mike Lake" AFFILIATION="SUSS" />
  <SURVEYOR NAME="Jill Rowling" AFFILIATION="SUSS" />
</SURVEYORS>
<INSTRUMENT>
  <TAPE ID="SUSS11" ZERO_CORRECT="+0.1" >30m fibreglass</TAPE>
  <TAPE ID="Jill">6m steel tape for hard to get places</TAPE>
  <COMPASS ID="SUSS1">Suunto Twin</COMPASS>
  <CLINO ID="SUSS2"/>
</INSTRUMENT>
<SERIES NAME="sigma">
  <STN NAME="90" EAST="1240" NORTH="3512" HEIGHT="700.0" />
  <STN NAME="85">Cusp (lower one) of rock at base of 1st drop.</STN>
  <STN NAME="86">Cusp of rock at apex or corner in passage.</STN>
  <STN NAME="87">Cusp of rock in narrow passage.</STN>
  <STN NAME="88">Cusp of rock 1m above stream bed.</STN>
  <STN NAME="89">Southerly-most end of ridge of rock at waist height.</STN>
  <STN NAME="90">Stalagmite? (dropped pendant?) of rock in muddy chamber.</STN>
  <SHOT FROM="85" TO="86" DIST="5.42" AZIM="28" ELEV="+43" />
  <SHOT FROM="86" TO="87" DIST="2.16" AZIM="0.0" ELEV="+22" />
  <SHOT FROM="87" TO="88" DIST="5.90" AZIM="343" ELEV="+1" />
  <SHOT FROM="88" TO="89" DIST="3.71" AZIM="10" ELEV="-3" />
  <SHOT FROM="90" TO="89" DIST="4.3" AZIM="169" ELEV="-7" >back bearing</SHOT>
  <EQUATE STN1="88" STN2="ext.1">We are sure about this.</EQUATE>
  <SERIES NAME="ext">
    Start new extension
    <SHOT FROM="1" TO="2" DIST="3.0" AZIM="240" ELEV="-10" />
    <SHOT FROM="2" TO="3" DIST="2.1" AZIM="228" ELEV="-2" />
    <SHOT FROM="3" TO="4" DIST="2.9" AZIM="189" ELEV="-5" />
    <SERIES>
      This is a tight crawl that we can't easily survey through so we guesstimated it.
      <INSTRUMENT><COMPASS SD="5" UNITS="degrees" />downgrade leg accuracy</INSTRUMENT>
      <SHOT FROM="4" TO="5" DIST="0.6" AZIM="160" ELEV="0" >estimated leg</SHOT>
      <SHOT FROM="6" TO="5" DIST="1" AZIM="15" ELEV="1" />
    </SERIES>
    <SHOT FROM="6" TO="7" DIST="3.4" AZIM="100" ELEV="+10" />
    <SHOT FROM="7" TO="8" DIST="2.0" AZIM="100" ELEV="+12" />
  </SERIES>
  <EQUATE STN1="ext.8" STN2="86"/>
</SERIES>
</CAVESURVEY>
[1] A. Johnson and B. Johnson, "Literate Programming Using Noweb", Linux Journal, Issue 42, October 1997.
[2] S. St. Laurent and R. Biggar, "Inside XML DTDs", McGraw-Hill, 1999.