DLMF, LaTeXML and some lessons learned

Saturday, December 9, 2006 - 10:45am - 11:15am
EE/CS 3-180
Bruce Miller (National Institute of Standards and Technology)
The Digital Library of Mathematical Functions (DLMF) is nearing public release. Our situation is unique (a moderately sized, controlled collection of documents intended as a reference work), but perhaps our experiences can help others in related endeavours. After providing an update on DLMF and LaTeXML (the tool used in its creation), I would like to talk about some of the lessons learned and promote discussion of what the community can do to make these kinds of projects both easier and better in the future.

Digital libraries within the context of Mathematical Knowledge Management might fall into two categories: more formal, behind-the-scenes, computer-oriented libraries (or databases), and looser, more traditional, reader-oriented libraries. Without in the least diminishing the former, I will concentrate on the issues of the latter. Such libraries consist not only of mathematical content but also of text, tables, figures, etc. We must be able to project such libraries into attractive, readable, accessible web sites. The usefulness of the data will be greatly enhanced if we can fully expose the implied interconnections between the symbols and concepts found in such documents. While a high-quality printed format is often a requirement, such sites can be much more than books on the web.

As the content is not automatically generated by (say) a theorem prover, convenient document authoring is a major concern. Useful interrelations, enhancements and metadata can seldom be inferred automatically. Thus, both the desire and the means to provide this information are needed: significant buy-in by authors and editors, as well as additional markup. With hand-authored documents, there is also the issue of obtaining a sufficiently unambiguous representation of the mathematics while preserving the author's intended presentation. We have adopted LaTeX as the document source language, and thus need to go beyond the typical TeX markup with more semantic macros and declarations. While we are beginning to see success in mathematical search, such search must contend with this mixed representation and with the typical sloppiness of user queries.
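
As a rough illustration of what "more semantic macros" could look like in a LaTeX source, consider the minimal sketch below. The macro names \BesselJ and \EulerGamma are hypothetical, chosen only for this example, and are not taken from the DLMF sources or from LaTeXML; the point is only that the author names the function once and the presentation is attached to that name.

```latex
\documentclass{article}

% Hypothetical semantic macros (illustrative names, not an actual DLMF macro set).
% Each macro records *which* function is meant; the body fixes how it is typeset.
\newcommand{\BesselJ}[2]{J_{#1}\left(#2\right)}   % Bessel function of the first kind
\newcommand{\EulerGamma}[1]{\Gamma\left(#1\right)} % Euler's gamma function

\begin{document}

% Presentation-only markup: a reader (or a converter) must guess what J and \Gamma denote.
Presentation only: $J_{\nu}(z)$ and $\Gamma(z)$.

% Semantic markup: same printed result, but the source states the intended function.
Semantic: $\BesselJ{\nu}{z}$ and $\EulerGamma{z}$.

\end{document}
```

Both lines typeset essentially the same formulas, but in the second the source itself carries the identification of the function, which is the kind of information a conversion tool or a search index could use.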

Looking towards the future, we would all like to leverage each other's ideas and tools. Perhaps some standardization of rich document formats is possible. Should it be based on proprietary or ad hoc formats, or on OMDoc or DocBook? For those in the niche within a niche of LaTeX-based digital libraries, development and wider adoption of more semantic math markup would be helpful, whatever tools are being used to process it.