Millions of documents exist in electronic form. Reusing the large investment in this knowledge base is an important concern.
New documents can exploit the advantages of the latest technologies and adapt readily to the target audience (e-commerce, scientific, dictionaries, etc.).
Several presentations of the same source are often required, and different, often ad hoc, techniques have to be used to exploit the display possibilities of each medium (browser, print, audio-video).
There exists a variety of application domains. Scientific articles, financial data, and ancient Greek poetry need different tools and target different audiences.
The developers of XML have taken into account the lessons learned in the last decade using SGML and HTML.
By construction, XML is (or should be) an ideal tool for dealing with most kinds of data and (multilingual) source documents: it is based on Unicode and integrates seamlessly with modern languages such as Java, Perl, and Python.
XML has gotten support from all corners of the Internet world: Open Source people as well as commercial players.
Many free (and not-so-free) tools are available for all conceivable operating systems and purposes. Soon HTML-only browsers will only be a (bad) memory, and most Internet tools will support XML natively.
Often typographic quality is a must.
The print button of most present-day HTML browsers generally does not give high-quality printable copy.
Special and separate procedures are used to prepare the printable output for XML documents.
With SGML and DSSSL, one can use James Clark's Jade (nowadays being extended by members of the DSSSL community as an open source library under the name OpenJade).
Jade reads DSSSL style sheets and provides several back ends (FOT, TeX, *ML, RTF).
Sebastian Rahtz' (David Megginson) jadetex generates a printable document (PS, PDF).
We can run the above XML file with an XSL processor (such as xt or LotusXSL) to generate FOs, and then use FOP (see James Tauber's presentation) or PassiveTeX to generate PDF.




PassiveTeX is a library of TeX macros which can be used to process an XML document resulting from an XSL transformation to formatting objects.
PassiveTeX provides a rapid development environment for experimenting with XSL FO, using a reliable pre-existing formatter.
Running PassiveTeX with the pdfTeX variant of TeX generates high-quality PDF files in a single operation.
PassiveTeX shows how TeX can remain the formatter of choice for XML, while hiding the details of its operation from the user.
PassiveTeX is available at http://users.ox.ac.uk/~rahtz/passivetex/.
PassiveTeX derives from and builds on:
typehtml, a LaTeX package by David Carlisle, used to typeset HTML directly using LaTeX;
jadetex, a package by Sebastian Rahtz that implements the output of the Jade DSSSL processor's TeX backend;
A UTF-8 handler by David Carlisle, in conjunction with the catalogue of Unicode/TeX mappings built up for jadetex.
The system components, which probably will be rewritten separately, are:
a TeX macro library to parse XML input, deal with attributes, entities, etc.;
a set of macros instantiating the formatting object elements and attributes;
macros to parse UTF-8 input;
macros to parse MathML elements;
macros to map Unicode to TeX font layouts.
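To make the last three components more concrete, here is a minimal sketch in Python of the two steps they imply: decoding UTF-8 bytes into Unicode code points, then mapping each code point to a TeX command. This is not how PassiveTeX is written (it does this work in TeX macros); the names decode_utf8, to_tex, and UNICODE_TO_TEX are hypothetical, and the three table entries are illustrative examples rather than entries taken from the jadetex catalogue.

```python
# Illustrative stand-in for a Unicode-to-TeX mapping catalogue.
# These three entries are assumptions chosen for the sketch.
UNICODE_TO_TEX = {
    0x00E9: r"\'e",                   # e with acute accent
    0x03B1: r"\ensuremath{\alpha}",   # Greek small alpha
    0x221E: r"\ensuremath{\infty}",   # infinity sign
}


def decode_utf8(data: bytes) -> list:
    """Decode UTF-8 bytes into code points by inspecting each lead byte.

    Simplified: overlong encodings and surrogates are not rejected.
    """
    points = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:                # 0xxxxxxx: one-byte (ASCII)
            length, value = 1, lead
        elif lead >> 5 == 0b110:       # 110xxxxx: two-byte sequence
            length, value = 2, lead & 0x1F
        elif lead >> 4 == 0b1110:      # 1110xxxx: three-byte sequence
            length, value = 3, lead & 0x0F
        elif lead >> 3 == 0b11110:     # 11110xxx: four-byte sequence
            length, value = 4, lead & 0x07
        else:
            raise ValueError(f"invalid UTF-8 lead byte at offset {i}")
        if i + length > len(data):
            raise ValueError("truncated UTF-8 sequence")
        for byte in data[i + 1:i + length]:
            if byte >> 6 != 0b10:      # continuation bytes are 10xxxxxx
                raise ValueError("invalid UTF-8 continuation byte")
            value = (value << 6) | (byte & 0x3F)
        points.append(value)
        i += length
    return points


def to_tex(data: bytes) -> str:
    """Map decoded code points to TeX; ASCII passes through unescaped."""
    out = []
    for cp in decode_utf8(data):
        if cp in UNICODE_TO_TEX:
            out.append(UNICODE_TO_TEX[cp])
        elif cp < 0x80:
            out.append(chr(cp))
        else:
            raise KeyError(f"no TeX mapping for U+{cp:04X}")
    return "".join(out)


print(to_tex("caf\u00e9".encode("utf-8")))
```

A real mapping would also have to escape TeX's special ASCII characters (such as % and &), which this sketch leaves untouched.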
rapid development;
well-understood, robust, stable, and freely available page formatter;
fonts, graphics inclusion, hyperlinks, etc. come for free;
mature handling of language issues, including hyphenation;
high-quality math rendering (TeX's raison d'être);
pdfTeX variant generates very high-quality PDF.
constraint to use TeX's page makeup model, and force XSL FO to fit it;
as LaTeX is already high-level markup, it is too easy to allow things to fall through and take LaTeX defaults;
TeX macro writing is obscure and difficult, so that the system is not transparent for most (non-TeX) programmers;
TeX is large and monolithic, and difficult to embed in other applications;
TeX seems much like a sledgehammer to crack a nut...
A typical invocation would be on a Unix command line:
# make the special TeX format file (first time only)
pdftex -ini "&pdflatex" fotex.tex
# run an XSL processor
xt file.xml somestyle.xsl file.fo
# run pdfTeX on the result; twice, to make sure
# referencing and pagination is stable.
pdflatex "&fotex" file.fo
pdflatex "&fotex" file.fo
# look at the result
acroread file.pdf
Most modern TeX implementations contain pdfTeX.
PassiveTeX supports MathML directly. An XSL style sheet can pass <math> and its children through unchanged, as follows:
<xsl:template match="math">
<xsl:apply-templates mode="math"/>
</xsl:template>
<xsl:template mode="math"
match="*|@*|comment()|processing-instruction()|text()">
<xsl:copy>
<xsl:apply-templates mode="math"
select="*|@*|processing-instruction()|text()"/>
</xsl:copy>
</xsl:template>
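The mode="math" template above is a recursive identity copy: xsl:copy duplicates the current node, and the nested xsl:apply-templates repeats the process for its attributes and children. As a rough analogue only, the same recursion can be sketched in Python with the standard library's xml.etree.ElementTree. The function name copy_node and the sample fragment are assumptions made for this sketch; it does not describe how any XSL processor or PassiveTeX is implemented.

```python
import xml.etree.ElementTree as ET


def copy_node(node: ET.Element) -> ET.Element:
    """Copy one element (tag, attributes, text), then recurse into children."""
    clone = ET.Element(node.tag, dict(node.attrib))
    clone.text = node.text
    clone.tail = node.tail
    for child in node:
        clone.append(copy_node(child))
    return clone


# A small MathML fragment, parsed and copied node by node.
source = ET.fromstring(
    "<math><msub><mi>E</mi><mrow><mtext>max</mtext></mrow></msub>"
    "<mo>=</mo><mi>&#x221E;</mi></math>"
)
copied = copy_node(source)
print(ET.tostring(copied, encoding="unicode"))
```

The copy serializes to the same markup as the source, which is the property the style sheet relies on when it hands MathML to the formatter untouched.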
A reasonable subset of presentation MathML is recognized, and produces good output. We show an example later.
No use is made of LaTeX's high-level constructs. No sections, no lists, no cross-references, no bibliographies; on the other hand, some extensions in the fotex: namespace are supported (e.g., to get Acrobat bookmarks).
XSL FO's underlying character set is Unicode; by default, entities are mapped to their Unicode position.
All vertical and horizontal space is explicit in the specification.
TeX only does the page and line breaking; all the rest is up to you.
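As a small illustration of what mapping an entity to its Unicode position means, the sketch below looks up a few named entities in Python's standard html.entities table. That table covers the HTML 4 entity names and is used here only as a convenient stand-in; the entity sets actually used with PassiveTeX are not specified in this text, and entity_to_char is a hypothetical helper name.

```python
from html.entities import name2codepoint


def entity_to_char(name: str) -> str:
    """Return the character at the Unicode position of a named entity."""
    return chr(name2codepoint[name])


# Print each entity name next to its Unicode position.
for name in ("alpha", "infin", "eacute"):
    print(f"&{name}; -> U+{name2codepoint[name]:04X}")
```

Once every entity is reduced to a code point in this way, the formatter only needs one table from code points to glyphs or macros, regardless of how the character was written in the source.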
The XSL FO page model is inherited from DSSSL, and is unproven for production-quality print formatting.
The XSL specification is unfinished and incomplete.
One cannot easily tweak TeX's behaviour with this system.
The table model of XSL is sufficiently far from TeX's that it may require a pre-processor.
Together with FOP, we now have systems to experiment with, and commercial implementations cannot be far behind (can they?).
TeX is close to being an XSL FO-capable formatter.
With the Omega TeX variant (using Unicode internally), we have a native Unicode typesetting system ready and waiting.
XSL FO does not threaten TeX -- it gives it a reason to survive.
Things can only get better!
use teixlite.dtd and write corresponding HTML and FO style sheets (http://users.ox.ac.uk/~rahtz/tei/).
typeset the TEI5 users' guide;
exercise most functions with a torture test.
DocBook
Documents marked up according to Norman Walsh's XML version of the DocBook DTD (http://nwalsh.com/docbook/xml/index.html);
use also Norm's XSL style sheets (http://nwalsh.com/docbook/xsl/index.html);
typeset one of the test documents.
original source is LaTeX;
translated to XML with TeX4ht using LaTeX-like ad hoc DTD and MathML;
write an XSL style sheet to treat the textual elements of the document;
pass the math components through to the back end (XSL, and DSSSL up to a point, do not have a sufficient set of FOs to do math correctly; although the W3C Math WG is talking to the XSL WG, the present consensus seems to be that it is best to treat the MathML directly at the end-application level);
interpret MathML directly in PassiveTeX.

<section id="vavref">
<stitle>Vavilov theory</stitle>
<par>Vavilov<cite refid="bib-VAVI"/> derived a more accurate
straggling distribution by introducing the kinematic limit on the
maximum transferable energy in a single collision, rather than using
<inlinemath><math><msub><mi>E</mi><mrow><mtext>max</mtext></mrow></msub>
<mo>=</mo><mi>∞</mi></math></inlinemath>.
Now we can write<cite refid="bib-SCH1"/>:
<eqnarray ><subeqn><math><mi>f</mi> <mfenced open='(' close=')'>
<mi>ε</mi><mo>,</mo><mi>δ</mi><mi>s</mi></mfenced>
<mo>=</mo> <mfrac><mrow><mn>1</mn></mrow>
<mrow><mi>ξ</mi></mrow>
</mfrac><msub><mi>φ</mi><mrow><mi>v</mi></mrow></msub>
<mfenced open='(' close=')'>
<msub><mi>λ</mi><mrow><mi>v</mi></mrow></msub><mo>,</mo>
<mi>κ</mi><mo>,</mo><msup><mi>β</mi><mrow><mn>2</mn></mrow>
</msup></mfenced></math></subeqn></eqnarray>
where
<eqnarray><subeqn><math><msub><mi>φ</mi><mrow><mi>v</mi></mrow></msub>
<mfenced open='(' close=')'>
<msub><mi>λ</mi><mrow><mi>v</mi></mrow></msub><mo>,</mo>
<mi>κ</mi><mo>,</mo>
<msup><mi>β</mi><mrow><mn>2</mn></mrow></msup></mfenced>
<mo>=</mo>
<mfrac><mrow><mn>1</mn></mrow>
<mrow><mn>2</mn><mi>π</mi><mi>i</mi></mrow>
</mfrac>
<msubsup><mo>∫</mo>
<mrow><mi>c</mi><mo>+</mo><mi>i</mi><mi>∞</mi></mrow>
<mrow><mi>c</mi><mo>-</mo><mi>i</mi><mi>∞</mi></mrow></msubsup>
<mi>φ</mi><mfenced open='(' close=')'><mi>s</mi></mfenced>
<msup><mi>e</mi><mrow><mi>λ</mi><mi>s</mi></mrow></msup>
<mi>d</mi><mi>s</mi><mspace width='2cm'/><mi>c</mi><mo>≥</mo><mn>0</mn>
</math></subeqn>
<subeqn><math><mi>φ</mi><mfenced open='(' close=')'><mi>s</mi></mfenced>
<mo>=</mo><mo>exp</mo><mfenced open='[' close=']'><mi>κ</mi>
<mrow><mo>(</mo><mn>1</mn><mo>+</mo><msup><mi>β</mi>
<mrow><mn>2</mn></mrow></msup><mi>γ</mi><mo>)</mo></mrow>
</mfenced><mo>exp</mo><mfenced open='[' close=']'><mi>ψ</mi>
<mfenced open='(' close=')'><mi>s</mi></mfenced></mfenced>
<mo>,</mo> </math></subeqn>
The original document (formatted by LaTeX)
PDF display optimized for screen viewing
handle page masters, and running headers and footers, properly;
complete the MathML handling;
handle more variety in property values, such as colors and fonts;
deal with complex tables;
support processing instructions to manipulate TeX formatting directly
<?TEX \enlargethispage{\baselineskip} ?>
support of SVG; possible solutions are:
direct interpretation and mapping to raw PDF;
translation to MetaPost, and spawning a MetaPost process;
pre-processing to existing TeX graphics languages.
implement replacement TeX XML parser (being developed by David Carlisle);
use Unicode-based TeX variant (Omega) to handle non-Latin material more naturally;
consolidate the present setup to cover the complete TEI and DocBook DTDs.
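The TEX processing instructions proposed in the list above would let an author pass formatting hints straight to the back end. The sketch below shows, in Python, how an application might pick such instructions out of a document using the standard xml.dom.minidom module. It illustrates the mechanism only and is not PassiveTeX code; the function name tex_instructions and the sample par element are assumptions.

```python
from xml.dom import minidom


def tex_instructions(xml_text: str) -> list:
    """Collect the data of every <?TEX ...?> processing instruction."""
    doc = minidom.parseString(xml_text)
    found = []

    def walk(node):
        # Processing instructions expose their target and their data.
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "TEX"):
            found.append(node.data.strip())
        for child in node.childNodes:
            walk(child)

    walk(doc)
    return found


sample = (
    r"<par>Some text.<?TEX \enlargethispage{\baselineskip} ?>"
    r" More text.</par>"
)
print(tex_instructions(sample))
```

Instructions with any other target are ignored, so a document carrying such hints remains usable by processors that know nothing about TeX.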
TeX is a reliable output engine that can with rather minimal work provide typographically excellent output;
simple customization is possible in the input source via processing instructions or at the level of TeX using the configuration file fotex.cfg;
formatting of element types of important DTDs, such as TEI and DocBook, works well;
a TeX-based system is a large plus for typesetting scientific XML documents, especially those containing a lot of maths;
by using a well-known and standard DTD, XML can be used as a lingua franca to transport documents between various editing and document handling systems. For instance, at CERN we are testing portability between XML, LaTeX, and FrameMaker/SGML. We could probably add Microsoft Word, WordPerfect, etc. once these applications have genuine support for XML and XSL.