Su-Shing Chen (Department of Computer & Information Science & Engineering, University of Florida) email@example.com
Based on earlier results, I will describe some ideas for indexing mathematical abstracts or papers by metadata and by ontology. Metadata includes existing subject classification schemes as well as some recent metadata schemes for electronic records. Ontology is a different approach: it indexes abstracts by clustering them into an information visualization interface, so that users can make selections using the ontology as well as the metadata.
Timothy W. Cole (Library Administration, University of Illinois at Urbana-Champaign) firstname.lastname@example.org
Two automated approaches are being investigated. In the first approach we extract all occurrences of MathML contained in the full text of articles included in a sample corpus of XML-encoded sci-tech journal literature published by ACM, AIP, and IEEE-CS (the articles include legacy SGML ISO 12083 math fragments previously converted to MathML). We then filter and normalize those MathML fragments recognized as potentially useful for search and discovery, and add the normalized fragments to qualified Dublin Core metadata records describing the articles. The second approach adopts the hierarchical browse vocabulary of the Wolfram Functions Website as a controlled vocabulary for descriptive metadata. Function name strings from this vocabulary that occur in a journal article are added to its metadata record, along with their frequency of occurrence. These approaches are seen as having the potential to enhance the discoverability of journal articles and to facilitate linkages between journal literature and reference mathematics literature (e.g., the Wolfram Functions Website).
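Both approaches can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's pipeline: the `min_leaves` usefulness filter, the canonical string produced by `normalize`, the plain dictionary standing in for a qualified Dublin Core record, and the four-entry `FUNCTION_VOCABULARY` are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Hypothetical excerpt of a function-name vocabulary; the actual Wolfram
# Functions browse vocabulary is far larger and hierarchical.
FUNCTION_VOCABULARY = ["BesselJ", "Gamma", "Hypergeometric2F1", "Sin"]


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def normalize(el: ET.Element) -> str:
    """Serialize a MathML subtree canonically: element names and token text
    are kept, attributes and surrounding whitespace are dropped."""
    children = [normalize(child) for child in el]
    if children:
        return f"{local_name(el.tag)}(" + ",".join(children) + ")"
    return f"{local_name(el.tag)}[{(el.text or '').strip()}]"


def leaf_count(el: ET.Element) -> int:
    """Number of token (leaf) elements in a MathML subtree."""
    children = list(el)
    return 1 if not children else sum(leaf_count(c) for c in children)


def build_record(article_xml: str, identifier: str, min_leaves: int = 2) -> dict:
    root = ET.fromstring(article_xml)
    # Approach 1: collect every <math> element, keep fragments with at least
    # `min_leaves` tokens (an assumed usefulness filter), and normalize them.
    fragments = [
        normalize(m)
        for m in root.iter(f"{{{MATHML_NS}}}math")
        if leaf_count(m) >= min_leaves
    ]
    # Approach 2: count whole-word occurrences of vocabulary names in the text.
    text = "".join(root.itertext())
    functions = {}
    for name in FUNCTION_VOCABULARY:
        n = len(re.findall(rf"\b{re.escape(name)}\b", text))
        if n:
            functions[name] = n
    return {"identifier": identifier, "math": fragments, "functions": functions}
```

With this filter a lone variable such as `<mi>x</mi>` is discarded, while a fragment like Γ(z) survives as `math(mrow(mi[Γ],mo[(],mi[z],mo[)]))`; a real system would need a more careful notion of which fragments are "useful".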
James Crowley (SIAM) email@example.com
A Publisher's Perspective on Searching and Metadata
Publishers, especially scientific societies, are pursuing a diverse array of solutions to provide better searching capabilities for the online journal literature. Each of these approaches promises improved capability, but comes with costs. These will be discussed from the perspective of a scientific society publisher.
Matthias Graefenhan (Department of Mathematics and Computer Science, University of Marburg) Matthias@Graefenhan.de
We present an XML-based system of mathematical documents, currently being developed at the University of Marburg, Germany, which aims at a comprehensive and systematic description of all aspects of pure mathematics. The system consists of numerous documents, each devoted to one mathematical topic, which are organized in a highly coherent way. This is achieved through the following features:
1. uniform symbolic notation for all mathematical and logical objects, based on specially created symbols
2. treelike arrangement of the single documents (considered as atomic elements) so that each document can be found via a unique path; freely definable further arrangements, e.g. cross references or collections of documents for classroom use
3. elaborate network of interconnections between the atomic elements
The structure of the notation mentioned above enables us to perform searching without the need for extra metadata.
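How a uniform notation can stand in for separate metadata can be illustrated with a small data structure: documents form a tree in which each has a unique path, cross references are stored as further paths, and a search for a notation symbol simply walks the tree. The sketch is hypothetical; the class names, the slash-separated paths, and the symbol identifiers (such as `ker`) are assumptions, not the Marburg system's actual XML schema.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """One atomic document; `symbols` are the notation symbols it uses."""
    name: str
    symbols: set = field(default_factory=set)
    refs: list = field(default_factory=list)       # cross references, as paths
    children: dict = field(default_factory=dict)   # name -> Doc


class Library:
    def __init__(self) -> None:
        self.root = Doc("")

    def add(self, path: str, symbols=(), refs=()) -> Doc:
        """Insert (or extend) the document at a unique slash-separated path."""
        node = self.root
        for part in path.strip("/").split("/"):
            node = node.children.setdefault(part, Doc(part))
        node.symbols.update(symbols)
        node.refs.extend(refs)
        return node

    def get(self, path: str) -> Doc:
        """Follow the unique path from the root to a document."""
        node = self.root
        for part in path.strip("/").split("/"):
            node = node.children[part]
        return node

    def find_by_symbol(self, symbol: str) -> list:
        """Return the paths of all documents whose notation uses `symbol`."""
        hits = []

        def walk(node: Doc, prefix: str) -> None:
            for name, child in node.children.items():
                path = f"{prefix}/{name}" if prefix else name
                if symbol in child.symbols:
                    hits.append(path)
                walk(child, path)

        walk(self.root, "")
        return hits
```

Because every document is reached by exactly one path, a search result is itself a stable address; the cross references add a second, freely definable network on top of the tree.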
Laurent Guillopé (Cellule MathDoc (CNRS/Université Joseph Fourier, Grenoble) & Université de Nantes) Laurent.Guillope@math.univ-nantes.fr
Exchange and Fusion
The NUMDAM program is a component of the World Digital Mathematics Library (WDML). The metadata description of its content is the basis for efficient navigation on the webbed WDML: of particular importance are links in both directions, from NUMDAM papers to related documents (reviews, cited articles, ...) and to NUMDAM papers from bibliographical databases, digital archives and preprint databases. These links require free availability of metadata; the convenient tools (OAI server, lookup engines, ...) may then be reused to merge metadata sets and build partial slices of the WDML. Current projects of the Cellule MathDoc on such gateways will be discussed.
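Harvesting and fusing metadata over OAI-PMH can be sketched with the Python standard library alone. The sketch parses `oai_dc` records out of a `ListRecords` response and merges records coming from several repositories. Merging on the first `dc:identifier` is an assumption made for illustration; real gateways need more careful record matching, and the HTTP harvesting step (including resumption tokens) is omitted.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}
DC = "{" + NS["dc"] + "}"


def parse_list_records(xml_text: str) -> list:
    """Extract the Dublin Core fields of each record in a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for dc in root.iterfind(".//oai:record/oai:metadata/oai_dc:dc", NS):
        fields = {}
        for el in dc:
            if el.tag.startswith(DC) and el.text:
                fields.setdefault(el.tag[len(DC):], []).append(el.text.strip())
        records.append(fields)
    return records


def merge(*record_sets: list) -> dict:
    """Fuse records from several repositories, keyed on dc:identifier."""
    merged = {}
    for records in record_sets:
        for fields in records:
            ids = fields.get("identifier")
            if not ids:
                continue  # records without an identifier cannot be matched
            # Assumed merge key: the first dc:identifier (e.g. a shared DOI).
            target = merged.setdefault(ids[0], {})
            for name, values in fields.items():
                bucket = target.setdefault(name, [])
                for value in values:
                    if value not in bucket:
                        bucket.append(value)
    return merged
```

A record for the same paper harvested from a digitized archive and from a review database thus ends up with the union of its titles, relations and other fields under one key.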
Nigel Kerr (JSTOR) firstname.lastname@example.org
JSTOR has a large body of data, in mathematics journals and beyond, that has historically been encoded as LaTeX snippets in an attempt to reproduce information from the print articles accurately. This strategy has its faults, as does some of JSTOR's LaTeX data itself. JSTOR is at a crossroads of data migration and systems rebuild, and wants to try to Do The Right Thing. This talk describes the pressures and challenges we are aware of, and asks for advice and comment on what JSTOR could do for mathematical content.
Heinz Kröger (FIZ Karlsruhe - Zentralblatt MATH -) email@example.com
We describe the present structure and services of Zentralblatt MATH. Then we look at how Zentralblatt MATH is embedded in a European environment to serve the mathematics community in the future. New trends in retrieval and data presentation are touched upon.
Bernard F. Schutz (Max Planck Institute for Gravitational Physics (Albert Einstein Institute)) Bernard.Schutz@aei.mpg.de
As part of the European Union-funded research project MOWGLI, the Albert Einstein Institute (AEI) in Germany has developed a new and very effective package that enables authors to write mathematics for the Web. Called HERMES, this package will not only generate MathML from TeX but will also allow authors to insert metadata into the XML environment of the mathematical expressions, so that intelligent searches can be performed on documents. The package is being tested by a major mathematics journal, and it will be used by the AEI's own web journal Living Reviews in Relativity to produce fully indexed mathematical documents.
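HERMES's own metadata format is not described here, so the following is only a hypothetical illustration of the general idea: MathML's `<semantics>` element lets an expression carry `<annotation>` children next to its presentation markup, and a search can then match on those annotations. The `x-meta/...` encoding values and the helper functions `annotate` and `search` are assumptions.

```python
import xml.etree.ElementTree as ET

M = "http://www.w3.org/1998/Math/MathML"


def annotate(math: ET.Element, meta: dict) -> ET.Element:
    """Move the content of <math> into <semantics><mrow>, then attach one
    <annotation> per metadata key (hypothetical 'x-meta/<key>' encoding)."""
    semantics = ET.Element(f"{{{M}}}semantics")
    row = ET.SubElement(semantics, f"{{{M}}}mrow")
    for child in list(math):
        math.remove(child)
        row.append(child)
    for key, value in meta.items():
        ann = ET.SubElement(semantics, f"{{{M}}}annotation", encoding=f"x-meta/{key}")
        ann.text = value
    math.append(semantics)
    return math


def search(root: ET.Element, key: str, value: str) -> list:
    """Return every <math> element annotated with key == value."""
    hits = []
    for math in root.iter(f"{{{M}}}math"):
        for ann in math.iter(f"{{{M}}}annotation"):
            if ann.get("encoding") == f"x-meta/{key}" and ann.text == value:
                hits.append(math)
                break
    return hits
```

The presentation markup stays the first child of `<semantics>`, so a MathML renderer still displays the formula, while the annotations are available to an indexer.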
There are several levels of digitization of mathematics:
level 1: bitmap images of printed materials (e.g. GIF, TIFF),
level 2: searchable digitized document (e.g. PDF with hidden text),
level 3: structured document with links (e.g. HTML(+MathML), LaTeX),
level 4: (partially) executable document (e.g. Mathematica, Maple),
level 5: formally presented document (e.g. Mizar, OMDoc).
Currently, most mathematical knowledge is stored and used mainly in the form of printed materials (level 1), such as books or electronic journals.
To be used actively, mathematical text should preferably be stored at as high a level of digitization as possible. However, digitizing documents to a higher level requires considerable effort. The talk aims to give an overview of the key technologies from level 1 to level 3, their present state, and future problems. The results of our research in this area can be found on the website http://infty.math.kyushu-u.ac.jp, from which some applications can be downloaded. The talk will include a demonstration of our OCR software, which digitizes mathematical papers into XML in our original format, into LaTeX source files, and into HTML files with mathematical notation in MathML.
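Once recognition has recovered the two-dimensional layout of a formula (which symbol is a superscript, which lines form a fraction), producing level-3 output amounts to walking a tree and emitting both LaTeX and MathML. The sketch below shows only that last step, with four hypothetical node types; it is not the Infty project's internal XML format, and it leaves out the recognition itself, which is the hard part.

```python
from dataclasses import dataclass
from html import escape
from typing import Union


@dataclass
class Sym:
    text: str


@dataclass
class Sup:
    base: "Node"
    exp: "Node"


@dataclass
class Frac:
    num: "Node"
    den: "Node"


@dataclass
class Row:
    items: list


Node = Union[Sym, Sup, Frac, Row]


def to_latex(n: Node) -> str:
    if isinstance(n, Sym):
        return n.text
    if isinstance(n, Sup):
        return f"{{{to_latex(n.base)}}}^{{{to_latex(n.exp)}}}"
    if isinstance(n, Frac):
        return f"\\frac{{{to_latex(n.num)}}}{{{to_latex(n.den)}}}"
    if isinstance(n, Row):
        return " ".join(to_latex(i) for i in n.items)
    raise TypeError(f"unknown node {n!r}")


def to_mathml(n: Node) -> str:
    if isinstance(n, Sym):
        # digits -> <mn>, letters -> <mi>, anything else -> <mo>
        tag = "mn" if n.text.isdigit() else "mi" if n.text.isalpha() else "mo"
        return f"<{tag}>{escape(n.text, quote=False)}</{tag}>"
    if isinstance(n, Sup):
        return f"<msup>{to_mathml(n.base)}{to_mathml(n.exp)}</msup>"
    if isinstance(n, Frac):
        return f"<mfrac>{to_mathml(n.num)}{to_mathml(n.den)}</mfrac>"
    if isinstance(n, Row):
        return "<mrow>" + "".join(to_mathml(i) for i in n.items) + "</mrow>"
    raise TypeError(f"unknown node {n!r}")
```

For the recognized formula x² + 1/2, `to_latex` yields `{x}^{2} + \frac{1}{2}` and `to_mathml` yields the corresponding `<mrow>` with `<msup>` and `<mfrac>` children; keeping one tree and several emitters is what lets a single recognition pass feed every output format.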
Michael Trott (Wolfram Research, Inc.) firstname.lastname@example.org
In this talk I will give an overview of the Wolfram Functions site. The website functions.wolfram.com is generated from a set of Mathematica notebooks. Mathematica notebooks are structured ASCII files that can be processed and manipulated by the Mathematica kernel. The notebooks contain about 90,000 mathematical formulas about elementary and special functions in typeset form. Because the formulas are readable and "understandable" by Mathematica, the software can completely analyze and classify them with respect to their mathematical structure and the functions occurring in them. A first version of a mathematical search interface to be deployed on the website will be shown and demonstrated.
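Mathematica performs this analysis on its own expression trees. As a rough analogue only, the sketch below uses Python's standard `ast` module on formulas written in Python syntax to count which functions and operators occur and to build a function-to-formula index; names such as `gamma` and `besselj` are plain identifiers here, not bindings to any library.

```python
import ast
from collections import Counter


def analyze(formula: str) -> dict:
    """Count the named functions and the operators occurring in a formula."""
    tree = ast.parse(formula, mode="eval")
    functions, operators = Counter(), Counter()
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            functions[node.func.id] += 1
        elif isinstance(node, (ast.BinOp, ast.UnaryOp)):
            operators[type(node.op).__name__] += 1
        elif isinstance(node, ast.Compare):
            for op in node.ops:
                operators[type(op).__name__] += 1
    return {"functions": dict(functions), "operators": dict(operators)}


def index_by_function(formulas: list) -> dict:
    """Map each function name to the positions of the formulas that use it."""
    index = {}
    for i, formula in enumerate(formulas):
        for name in analyze(formula)["functions"]:
            index.setdefault(name, []).append(i)
    return index
```

For `sin(x)**2 + cos(x)**2 == 1`, `analyze` reports the functions `sin` and `cos` once each and the operators `Pow` (twice), `Add` and `Eq`; the index then answers "which formulas involve `gamma`?" without any hand-written metadata.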
Abdou Youssef (Department of Computer Science, The George Washington University) email@example.com
Worldwide efforts are underway to create digital libraries of mathematical content, such as the Digital Library of Mathematical Functions (DLMF) at the National Institute of Standards and Technology. A fundamental goal of such libraries is to enable users to search not only for text but also for equations. Mature information retrieval (IR) technology is designed primarily for text content. When applied to math search, text IR is inadequate because it cannot understand mathematical symbols and structures. In this talk, we will identify the issues involved in building an advanced math search system and present techniques for addressing them. Some of the techniques are based on current text search technology, while others are based on emerging XML-based technologies. Some of the math search capabilities that we have already developed for DLMF will be demonstrated in the talk.
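Why plain text IR falls short, and one common remedy, can be shown with a toy inverted index. This is not the DLMF search system: it assumes equations have already been parsed into expression trees (Python syntax via the standard `ast` module stands in for parsed LaTeX or MathML), indexes each node together with its parent and argument position, and ranks by the number of shared tokens. The token scheme and the ranking are illustrative assumptions.

```python
import ast
from collections import Counter, defaultdict


def label(node: ast.AST):
    """Token for one node of the expression tree, or None if unsupported."""
    if isinstance(node, ast.Name):
        return f"var:{node.id}"
    if isinstance(node, ast.Constant):
        return f"num:{node.value!r}"
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        return f"fn:{node.func.id}"
    if isinstance(node, (ast.BinOp, ast.UnaryOp)):
        return f"op:{type(node.op).__name__}"
    if isinstance(node, ast.Compare):
        return "op:" + ",".join(type(o).__name__ for o in node.ops)
    return None


def children(node: ast.AST) -> list:
    if isinstance(node, ast.Call):
        return list(node.args)
    if isinstance(node, ast.BinOp):
        return [node.left, node.right]
    if isinstance(node, ast.UnaryOp):
        return [node.operand]
    if isinstance(node, ast.Compare):
        return [node.left, *node.comparators]
    return []


def tokens(expr: str) -> set:
    """Node tokens plus parent[position]>child tokens, so that structure
    (e.g. base versus exponent) is visible to an inverted index."""
    out = set()

    def walk(node, parent):
        lab = label(node)
        if lab is None:
            return  # unsupported node types are skipped with their subtrees
        out.add(lab)
        if parent is not None:
            out.add(f"{parent[0]}[{parent[1]}]>{lab}")
        for pos, child in enumerate(children(node)):
            walk(child, (lab, pos))

    walk(ast.parse(expr, mode="eval").body, None)
    return out


class MathIndex:
    def __init__(self) -> None:
        self.docs = []
        self.postings = defaultdict(set)

    def add(self, expr: str) -> int:
        doc_id = len(self.docs)
        self.docs.append(expr)
        for t in tokens(expr):
            self.postings[t].add(doc_id)
        return doc_id

    def search(self, query: str, k: int = 5) -> list:
        """Rank documents by the number of shared tokens (ties by insertion)."""
        scores = Counter()
        for t in tokens(query):
            for doc_id in self.postings.get(t, ()):
                scores[doc_id] += 1
        ranked = sorted(scores, key=lambda d: (-scores[d], d))
        return [self.docs[d] for d in ranked[:k]]
```

With the documents `x**2 + y**2`, `2**x + y` and `sin(x)**2` indexed, the query `x**2` ranks `x**2 + y**2` first, `sin(x)**2` second and `2**x + y` last: all three contain the symbols `x` and `2`, but only the positional tokens tell a base from an exponent.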