From the previous section it is evident that user commands and
environments are handled by adding Perl code to the system or
personal configuration files, as also discussed in Section
. A file with new definitions can also be loaded from the
command line with the -init_file option.
To give a taste of how LaTeX2HTML handles commands and environments, we provide a few simple examples. Simple as they are, they clearly show the powerful techniques used to generate HTML documents that preserve the information present in the original LaTeX document.
Let us first consider a LaTeX command \Ucom that tags commands to be typed by the user on the keyboard. A possible definition using the HTML tag <KBD> for keyboard input is:
sub do_cmd_Ucom {
    local($_) = @_;
    s/$next_pair_pr_rx//o;
    join('',qq+<KBD>$&</KBD>+,$_);
}
The Perl variable $next_pair_pr_rx contains the pattern that
matches the string of characters enclosed by the next pair of
delimiters. The substitution removes the string and its delimiters
from the input; the string is then placed between the HTML tags
<KBD> and </KBD> and returned, followed by the rest of the
input.
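The mechanism can be tried outside LaTeX2HTML with the following minimal sketch. It is not the program's own code: the delimiter strings in $OP and $CP are assumed values chosen for illustration, demo_cmd_Ucom is a hypothetical name, and the sketch uses $2 (the contents only) instead of $& so that the delimiters do not appear in the result.

```perl
use strict;
use warnings;

# Assumed delimiter strings; the real values are set inside LaTeX2HTML.
my ($OP, $CP) = ('<#', '#>');

# Same shape as $next_pair_pr_rx in the listing: $1 is the bracket id,
# $2 the contents between the two matching delimiters.
my $next_pair_pr_rx = "^[\\s%]*$OP(\\d+)$CP([\\s\\S]*)$OP\\1$CP";

# Hypothetical stand-in for do_cmd_Ucom.
sub demo_cmd_Ucom {
    my ($text) = @_;
    # Remove the first delimited argument from the front of the text.
    if ($text =~ s/$next_pair_pr_rx//) {
        # Wrap the argument and put the remaining text after it.
        return "<KBD>$2</KBD>" . $text;
    }
    return $text;    # no argument found: leave the text unchanged
}

print demo_cmd_Ucom('<#12#>ls -l<#12#> lists the files.'), "\n";
```

The numeric id inside the delimiters is what lets the pattern find the closing delimiter that belongs to the opening one, via the backreference \1.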
Similarly, one can translate the argument of a \URL command (containing a Uniform Resource Locator) into an HTML anchor, as shown below:
sub do_cmd_URL {
    local($_) = @_;
    s/$next_pair_pr_rx//o;
    join('',"<a href=\"$&\">$&</a>",$_);
}
This procedure creates a link to the specified URL: it returns an
anchor that uses the URL both as its target and as its description,
followed by the rest of the as yet unprocessed document.
Our next example shows an enumerated list EnumZW of a special
type whose "numbers" are icons available on a WWW server. The
name of the icon depends on the value of the Perl variable
$count, which is incremented for each \item command used
inside the EnumZW environment. Everything takes place inside an
HTML description list <DL>.
sub do_env_EnumZW {
    local($_) = @_;
    local($count) = 0;
    s|\\item|do {++$count; qq!<DT><IMG ALIGN=TOP ALT=""
    SRC="http://somewhere/icons/circled$count.xbm"><DD>!}|eog;
    "<DL COMPACT>$_</DL>";
}
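The counting substitution can be exercised on its own, as in the sketch below. The name demo_env_EnumZW is hypothetical, and the icon URL is the placeholder from the listing, not a real server. The key point is the /e modifier, which evaluates the replacement as Perl code for every match, so the counter advances once per \item.

```perl
use strict;
use warnings;

# Hypothetical stand-in for do_env_EnumZW, working on a plain string.
sub demo_env_EnumZW {
    my ($body) = @_;
    my $count = 0;
    # /e runs the replacement as code; /g repeats it for every \item.
    $body =~ s{\\item}{
        ++$count;
        qq(<DT><IMG ALIGN=TOP ALT="" SRC="http://somewhere/icons/circled$count.xbm"><DD>)
    }eg;
    return "<DL COMPACT>$body</DL>";
}

print demo_env_EnumZW('\item first \item second'), "\n";
```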
Two or more arguments can also be handled gracefully, as shown by the following two commands, which take two and three arguments, respectively, and are typeset by LaTeX as follows:
\Command{arg1}
\Command[arg1]{arg2}
The translation in Perl is straightforward, since one must merely extract the relevant arguments from the input stream, one after the other.
sub do_cmd_BDefCm { # \BDefCm{Command}{arg1}
    local($_) = @_;
    s/$next_pair_pr_rx//o; $command = $&;
    s/$next_pair_pr_rx//o; $mandatory1 = $&;
    join('',"<strong>\\$command\{$mandatory1\}<\/strong>", $_);
}
sub do_cmd_BDefCom { # \BDefCom{Command}{arg1}{arg2}
    local($_) = @_;
    s/$next_pair_pr_rx//o; $command = $&;
    s/$next_pair_pr_rx//o; $optional1 = $&;
    s/$next_pair_pr_rx//o; $mandatory1 = $&;
    join('',"<strong>\\$command\[$optional1\]\{$mandatory1\}<\/strong>", $_);
}
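Since each argument is extracted by the same substitution, the repetition can be folded into a loop. The sketch below illustrates this under stated assumptions: take_args and demo_cmd_BDefCom are hypothetical names, the delimiter strings are assumed values, and well-formed input with all three arguments is taken for granted.

```perl
use strict;
use warnings;

# Assumed delimiter strings; the real values are set inside LaTeX2HTML.
my ($OP, $CP) = ('<#', '#>');
my $next_pair_pr_rx = "^[\\s%]*$OP(\\d+)$CP([\\s\\S]*)$OP\\1$CP";

# Hypothetical helper: strip up to $n leading arguments from the string
# referenced by $textref and return their contents in order.
sub take_args {
    my ($textref, $n) = @_;
    my @args;
    for (1 .. $n) {
        last unless $$textref =~ s/$next_pair_pr_rx//;
        push @args, $2;    # contents of the pair just removed
    }
    return @args;
}

# Hypothetical stand-in for do_cmd_BDefCom.
sub demo_cmd_BDefCom {
    my ($text) = @_;
    my ($command, $optional1, $mandatory1) = take_args(\$text, 3);
    # sprintf avoids accidental array or hash interpolation around [ and {.
    return sprintf('<strong>\\%s[%s]{%s}</strong>',
                   $command, $optional1, $mandatory1) . $text;
}

print demo_cmd_BDefCom('<#1#>makebox<#1#><#2#>width<#2#><#3#>text<#3#> rest'), "\n";
```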
Explaining all this Perl code would lead us a little too far. It should be fairly clear by now, however, that before trying to develop new code for LaTeX2HTML it is a good idea to study in detail the way Nikos Drakos coded his program, not only in order to write Perl code compatible with his conventions, but also as a source of inspiration for one's own extensions. Below we show definitions for frequently occurring regular expressions in the LaTeX2HTML Perl code.
$delimiters = '\'\\s[\\]\\\\<>(=).,#;:~\/!-';
$delimiter_rx = "([$delimiters])";
# $1 : br_id
# $2 : <environment>
$begin_env_rx = "[\\\\]begin\\s*$O(\\d+)$C\\s*([^$delimiters]+)\\s*$O\\1$C\\s*";
$match_br_rx = "\\s*$O\\d+$C\\s*";
$optional_arg_rx = "^\\s*\\[([^]]+)\\]"; # Cannot handle nested []s!
# Matches a pair of matching brackets
# $1 : br_id
# $2 : contents
$next_pair_rx = "^[\\s%]*$O(\\d+)$C([\\s\\S]*)$O\\1$C";
$any_next_pair_rx = "$O(\\d+)$C([\\s\\S]*)$O\\1$C";
$any_next_pair_rx4 = "$O(\\d+)$C([\\s\\S]*)$O\\4$C";
$any_next_pair_rx5 = "$O(\\d+)$C([\\s\\S]*)$O\\5$C";
# $1 : br_id
$begin_cmd_rx = "$O(\\d+)$C";
# $1 : largest argument number
$tex_def_arg_rx = "^[#0-9]*#([0-9])$O";
# $1 : declaration or command or newline (\\)
$cmd_delims = q|-#,.~/\'`^"=|; # Commands which are also delimiters!
# The tex2html_dummy is an awful hack ....
$single_cmd_rx = "\\\\([$cmd_delims]|[^$delimiters]+|\\\\|(tex2html_dummy))";
# $1 : description in a list environment
$item_description_rx =
"\\\\item\\s*[[]\\s*((($any_next_pair_rx4)|([[][^]]*[]])|[^]])*)[]]";
$fontchange_rx = 'rm|em|bf|it|sl|sf|tt';
# Matches the \caption command
# $1 : br_id
# $2 : contents
$caption_rx = "\\\\caption\\s*([[]\\s*((($any_next_pair_rx5)|([[][^]]*[]])|[^]])*)[]])?$O(\\d+)$C([\\s\\S]*)$O\\8$C";
# Matches the \htmlimage command
# $1 : br_id
# $2 : contents
$htmlimage_rx = "\\\\htmlimage\\s*$O(\\d+)$C([\\s\\S]*)$O\\1$C";
# Matches a pair of matching brackets
# USING PROCESSED DELIMITERS;
# (the delimiters are processed during command translation)
# $1 : br_id
# $2 : contents
$next_pair_pr_rx = "^[\\s%]*$OP(\\d+)$CP([\\s\\S]*)$OP\\1$CP";
$any_next_pair_pr_rx = "$OP(\\d+)$CP([\\s\\S]*)$OP\\1$CP";
# This will be used to recognise escaped special characters as such
# and not as commands
$latex_specials_rx = '[\$]|&|%|#|{|}|_';
# This is used in sub revert_to_raw_tex before handing text to be processed by latex.
$html_specials_inv_rx = join("|", keys %html_specials_inv);
# This is also used in sub revert_to_raw_tex
$iso_latin1_character_rx = '(&#\d+;)';
# Matches a \begin or \end {tex2html_wrap}. Also used by revert_to_raw_tex
$tex2html_wrap_rx = '[\\\\](begin|end)\s*{\s*tex2html_wrap[_a-z]*\s*}';
$meta_cmd_rx = '[\\\\](renewcommand|renewenvironment|newcommand|newenvironment|newtheorem|def)';
# Matches counter commands - these are caught early and are appended to the
# file that is passed to latex.
$counters_rx ="[\\\\](newcounter|addtocounter|setcounter|refstepcounter|stepcounter|".
"arabic|roman|Roman|alph|Alph|fnsymbol)$delimiter_rx";
# Matches a label command and its argument
$labels_rx = "[\\\\]label\\s*$O(\\d+)$C([\\s\\S]*)$O\\1$C";
# Matches environments that should not be touched during the translation
$verbatim_env_rx = "\\s*{(verbatim|rawhtml|LVerbatim)[*]?}";
# Matches icon markers
$icon_mark_rx = "<tex2html_(" . join("|", keys %icons) . ")>";
# Frequently used regular expressions with arguments
sub make_end_env_rx {
    local($env) = @_;
    $env = &escape_rx_chars($env);
    "[\\\\]end\\s*$O(\\d+)$C\\s*$env\\s*$O\\1$C";
}
sub make_begin_end_env_rx {
    local($env) = @_;
    $env = &escape_rx_chars($env);
    "[\\\\](begin|end)\\s*$O(\\d+)$C\\s*$env\\s*$O\\2$C(\\s*\$)?";
}
sub make_end_cmd_rx {
    local($br_id) = @_;
    "$O$br_id$C";
}
sub make_new_cmd_rx {
    "[\\\\](". join("|", keys %new_command) . ")"
        if each %new_command;
}
sub make_new_env_rx {
    local($where) = @_;
    $where = &escape_rx_chars($where);
    "[\\\\]$where\\s*$O(\\d+)$C\\s*(".
        join("|", keys %new_environment) .
        ")\\s*$O\\1$C\\s*"
        if each %new_environment;
}
sub make_sections_rx {
    local($section_alts) = &get_current_sections;
    # $section_alts includes the *-forms of sectioning commands
    $sections_no_delim_rx = "\\\\($section_alts)";
    $sections_rx = "\\\\($section_alts)$delimiter_rx"
}
sub make_order_sensitive_rx {
    local(@theorem_alts, $theorem_alts);
    @theorem_alts = ($preamble =~ /\\newtheorem\s*{([^\s}]+)}/og);
    $theorem_alts = join('|',@theorem_alts);
    $order_sensitive_rx =
        "(equation|eqnarray|caption|ref|counter|\\\\the|\\\\stepcounter" .
        "|\\\\arabic|\\\\roman|\\\\Roman|\\\\alph|\\\\Alph|\\\\fnsymbol)";
    # Append the theorem names inside the group, keeping the closing parenthesis
    $order_sensitive_rx =~ s/\)/|$theorem_alts)/ if $theorem_alts;
}
sub make_language_rx {
    local($language_alts) = join("|", keys %language_translations);
    $setlanguage_rx = "\\\\setlanguage{\\\\($language_alts)}";
    $language_rx = "\\\\($language_alts)TeX";
}
sub make_raw_arg_cmd_rx {
    # $1 : commands to be processed in latex (with arguments untouched)
    $raw_arg_cmd_rx = "\\\\(" . &get_raw_arg_cmds . ")([$delimiters]+|\\\\|#|\$)";
}
# Creates an anchor for its argument and saves the information in the array %index;
# In the index the word will use the beginning of the title of
# the current section (instead of the usual pagenumber).
# The argument to the \index command is IGNORED (as in latex)
sub make_index_entry {
    local($br_id,$str) = @_;
    # If TITLE is not yet available (i.e. the \index command is in the title of the
    # current section), use $ref_before.
    $TITLE = $ref_before unless $TITLE;
    # Save the reference
    $str = "$str###" . ++$global{'max_id'}; # Make unique
    $index{$str} .= &make_half_href("$CURRENT_FILE#$br_id");
    "<A NAME=$br_id>$anchor_invisible_mark<\/A>";
}
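To see how one of the parameterised patterns from the listing behaves, the following sketch rebuilds make_end_env_rx in isolation. Several things here are assumptions made for illustration only: the values of $O and $C, the name demo_make_end_env_rx, and the use of Perl's built-in quotemeta as a stand-in for &escape_rx_chars, whose definition is not shown in the listing.

```perl
use strict;
use warnings;

# Assumed delimiter strings for unprocessed brackets.
my ($O, $C) = ('<<', '>>');

# Hypothetical stand-in for make_end_env_rx.
sub demo_make_end_env_rx {
    my ($env) = @_;
    # quotemeta protects characters such as * in starred environment names.
    $env = quotemeta($env);
    return "[\\\\]end\\s*$O(\\d+)$C\\s*$env\\s*$O\\1$C";
}

my $rx   = demo_make_end_env_rx('eqnarray*');
my $text = 'x = y \end<<7>>eqnarray*<<7>> Next paragraph.';
if ($text =~ /$rx/) {
    print "end of environment found, bracket id $1\n";
}
```

Without the escaping step, the * in eqnarray* would be read as a quantifier and the pattern would no longer match the starred environment name.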