Abstracts and Talk Materials
The Evolution of Mathematical Communication in the Age of Digital Libraries
December 8-9, 2006


Jonathan Borwein
http://users.cs.dal.ca/~jborwein/

High performance mathematics and its management
December 8, 2006

Seventy-five years ago Kurt Gödel overturned the mathematical apple cart: he proved entirely deductively that mathematics is not entirely deductive, while holding quite different ideas about legitimate forms of mathematical reasoning: "If mathematics describes an objective world just like physics, there is no reason why inductive methods should not be applied in mathematics just the same as in physics." (Kurt Gödel, 1951)

This talk provides an introduction to Experimental Mathematics, its theory and its practice. I will focus on the differences between Discovering Truths and Proving Theorems and on the implications for knowledge management and communication. I shall explore various computational tools available for deciding what to believe in mathematics and, using accessible examples, illustrate the rich experimental toolbox mathematicians now have access to. These tools range from web interfaces and databases to preprint repositories and digital library collections, and prominently include NIST's forthcoming Digital Library of Mathematical Functions. In an attempt to explain how mathematicians may use High Performance Computing (HPC) and what they have to offer other computational scientists, I will touch upon various Computational Mathematics Challenge Problems.

Thierry Bouche
http://www-fourier.ujf-grenoble.fr/~bouche/

NUMDAM


Olga Caprotti

Advanced language technologies for mathematical markup
December 9, 2006

Mathematical markup languages like OpenMath and MathML offer the possibility to represent mathematical content at a level of abstraction that is not dependent on localized information. This representation typically focuses on the semantics of the mathematical object and postpones localization aspects of mathematics, such as those influenced by notation and by culture, to the rendering process of the markup. While typesetting of mathematical markup has been the object of numerous efforts, from MathML-presentation to SVG converters, the rendering of mathematics in a "verbalized" jargon has not yet received similar attention. In this talk, I will present the results of the WebALT EU eContent project concerning the application of language technologies to the automatic generation of text from mathematical markup.

Mathematical jargon is an important aspect of the education of students. Not only does a teacher train pupils in problem solving skills, but she also makes sure that they acquire a proper way of expressing mathematical concepts. To our knowledge, digital eLearning resources have used a representation in which text is intermixed with mathematical expressions even in situations where the actual abstract representation, for instance of the statement of a theorem, can be reduced to a single mathematical object. One reason for this representation choice is that the rendering process would otherwise produce a symbolic, typeset mathematical formula that might prove too difficult to understand for the students or simply just too hard to read. However, by representing this kind of mathematical text in a language-independent format such as the one provided by markup languages, it is possible to apply language technologies that generate the same text in a variety of languages including English, Spanish, Finnish, Swedish, French and Italian.

The project results include editors for mathematical multilingual markup, a web service for generating multiple language versions, and a digital repository of multilingual interactive mathematical exercises and drill questions.

Davide P. Cervone
http://www.math.union.edu/~dpvc/

The current state and future of jsMath
December 8, 2006

JsMath is a means of displaying mathematics in web pages that works across multiple browsers (MSIE, Firefox, Opera, Safari, etc.) and multiple platforms (Windows, Unix, Mac OS X). It uses JavaScript, cascading style sheets (CSS), and Unicode fonts to render TeX code embedded in an HTML document into typeset mathematics within the browser. Over the past two years, jsMath has found a home within a number of on-line content-generation systems (e.g., bulletin boards, blogs, course-management systems) because it allows participants to enter mathematics in a straightforward and possibly familiar format while still producing quality output in all the major browsers, both on screen and in print, without the need for complicated installations on the server or extra downloads by the user. This talk will describe some of jsMath's most important features and how these can be controlled by the page author, and will point out some of the issues that need to be addressed when adding jsMath to a content-management system. Finally, we will discuss some of the future plans for jsMath, including incorporation of MathML input and output into jsMath.

Jacques Distler
http://golem.ph.utexas.edu/~distler/

Blogging with MathML
December 8, 2006

Four years ago, I set out to see whether one could use weblogs as an effective vehicle for communicating ideas in Physics. Not knowing any better, I decided to use the much-heralded, but seldom-used web technologies of XHTML+MathML (with the occasional bit of SVG, for good measure). Between my own blog, Musings, and The String Coffee Table, which I host on my server, there are nearly a thousand posts, and many thousands of comments, making them one of the largest collections of MathML on the web.

On the one hand, this was a social experiment, in adopting the web (and weblogs, in particular) as a conduit for scientific communications. On the other hand, it was a technological experiment, in making the technology "easy enough for mere physicists to use." I would like to address both aspects in this talk.

Richard Fateman

How can we speak math?
December 9, 2006

Surprisingly, we can probably speak mathematics to a computer more rapidly and accurately than we can handwrite it. Even better is to speak and use pointing or handwriting. A combination may allow us to identify and cancel errors in one mode or another. In some cases speaking may be more convenient than typing, even for rapid typists: many mathematical symbols missing from the keyboard can be easily spoken. Even without venturing into Greek, handwriting or even typing "fifty million" is probably slower and more error-prone than speaking it.

Pursuing the goal of effectively speaking small pieces of mathematics, we wondered how hard it would be to speak arbitrarily long sections of mathematics, including nested complex expressions.

We first describe programs for the inverse problem: computer generation of mathematical speech. In so doing we find that we need to suggest a few speaking conventions to overcome the unfortunately ambiguous and inconsistent common usages of mathematics.
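
As an illustration of this inverse problem, here is a minimal sketch that turns a toy expression tree into words. The tuple-based tree, the word table, and the spoken bracket "the quantity ... end quantity" are all assumptions made for this example; they are not the conventions or programs described in the talk.

```python
# Hypothetical toy expression tree: a number, a name, or (operator, left, right).
WORDS = {"+": "plus", "-": "minus", "*": "times", "/": "over", "^": "to the power"}

def speak(expr, nested=False):
    """Render an expression tree as words, bracketing nested operations aloud."""
    if not isinstance(expr, tuple):
        return str(expr)
    op, left, right = expr
    phrase = f"{speak(left, True)} {WORDS[op]} {speak(right, True)}"
    # A spoken bracket separates "(a plus b) times c" from "a plus (b times c)".
    return f"the quantity {phrase} end quantity" if nested else phrase

print(speak(("*", ("+", "a", "b"), "c")))
# prints: the quantity a plus b end quantity times c
```

Without some spoken bracket of this kind, both groupings of "a plus b times c" would be read identically, which is the sort of ambiguity a speaking convention has to resolve.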

Then we consider tools and guidelines to make it more plausible for humans to speak full mathematical formulas so they can be recognized by a computer using a speech recognizer program.

We describe our prototype programs, which do somewhat less than we propose but are effective in that speech can either be used alone, or used to fill in boxes (superscripts, etc.) or larger pieces, or for choosing alternatives from plausible symbol recognition from handwriting. We believe the principal barriers to engineering a more complete program can be overcome, though a driving application may be essential for refining prototypes into useful programs. This paper is not intended to be the last word on the subject, but to expose problems and approaches relevant to the task.

Thomas Fischer

Using metadata for the interlinking of digitized mathematics
December 9, 2006

This work draws on communication with Thierry Bouche (Cellule MathDoc and Institut Fourier, Grenoble) and David Ruddy (Cornell University Library).

To access mathematical research literature, mathematicians usually employ review journals such as Mathematical Reviews and Zentralblatt für Mathematik. These provide a rich network of mathematical sources interlinked via references and citations. To make this even more useful, the review journals could be used as hubs for linking not only to printed and born-digital material, but also to digitized versions of this literature. An examination of the currently available metadata indicates not only that the present formats do not identify the mathematical literature with sufficient precision, but also that the metadata formats in use are inadequate, unless overloaded with complex syntactical schemes. A richer and more rigid scheme for the expression of the metadata is needed, and different approaches are investigated:
  • The development of a Dublin Core based Application Profile, using qualified Dublin Core and some additional fields to encapsulate the required information.
  • The development of a specialized metadata scheme based on one in use by the French NUMDAM project.
  • A Dublin Core format based on the Dublin Core Abstract Model, using the new DC-XML specification.
  • The usage of OpenURL as a method of reference.


The obvious choice for exchanging these metadata is the OAI Protocol for Metadata Harvesting, and several of the libraries and projects involved in digitizing mathematics have begun to expose metadata records using OAI-PMH, offering several metadata formats, including Dublin Core. The talk will investigate the options to enhance this form of communication to establish a working network of interlinked mathematics and present the state of interlinkage at the SUB Göttingen, using the extensive Mathematica collection of digitized mathematics.
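
To make the harvesting step concrete, here is a minimal sketch of how an OAI-PMH request for Dublin Core records is formed. The verb `ListRecords`, the arguments `metadataPrefix` and `set`, and the prefix `oai_dc` come from the OAI-PMH specification; the repository address and the function name are hypothetical and stand in for whatever endpoint a given library exposes.

```python
from urllib.parse import urlencode

def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec  # optional selective harvesting by set
    return f"{base_url}?{urlencode(params)}"

# Hypothetical repository endpoint, used only for illustration.
print(list_records_url("https://repository.example.org/oai"))
# prints: https://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
```

A repository offering a richer format alongside Dublin Core would be queried the same way, with a different `metadataPrefix` value.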

Edward A. Fox
http://fox.cs.vt.edu/

Panel on digital libraries of today


Peter Jipsen
http://www1.chapman.edu/~jipsen/

Text-based input formats for mathematical formulas
December 8, 2006

This talk discusses and compares several approaches that can be used to produce mathematical content on the web. Emphasis is placed on the input that the user has to create, in particular on ease-of-use, readability, familiarity, generality, availability and other criteria. While LaTeX to PDF is generally considered the de facto standard in research publications, we will also examine input languages for several computer algebra systems, online assessment/courseware systems and word processors, as well as ASCIIMath.

Since the ASCIIMath language is a relative newcomer (but mostly a refinement of the well-known tradition of approximating formulas by ASCII characters), the talk includes a brief description and motivation for some of the design decisions. The core of this mathematical input language consists of only 8 lines of BNF, yet it can express most of undergraduate mathematics in a predictable way that generally matches what users expect. In addition, the language constructs map in a direct way to a subset of Presentation MathML. E.g. 4/3pir^3 translates to

<mfrac><mn>4</mn><mn>3</mn></mfrac><mi>&pi;</mi> <msup><mi>r</mi><mn>3</mn></msup>

and displays in standard typeset form. A JavaScript implementation ASCIIMathML.js parses this syntax and applies the translation in the client's browser. This program has been downloaded by thousands of users in over 90 countries and is currently used in numerous blogs, wikis and course management systems, partly because it enables cross-browser MathML to display in legacy HTML (rather than XHTML) pages. Since the language overlaps with LaTeX and TI-83 syntax, it is familiar to mathematicians and school children, and is also used as input for online calculators. ASCIIMath has been implemented by others in PHP and C#, as well as modified to LaTeXMathML.js. Experiments with an online WYSIWYG HTML editor, called ASciencePad, indicate that ASCIIMath is a convenient alternative to toolbar-driven formula editors. In summary, it appears that an input format like ASCIIMath is a desirable addition to the various ways of creating mathematical content online. Further information can be found at www.chapman.edu/~jipsen/asciimath.html.
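
The direct mapping from input constructs to Presentation MathML can be sketched for a tiny fragment of such a language. The code below is an assumption-laden toy, not the ASCIIMathML.js algorithm or its grammar: it only knows digits, single letters, `pi`, `/` and `^`, and it does no error handling. Run on the input `4/3pir^3` (the exponent written with a caret), it reproduces the markup quoted above.

```python
import re

TOKEN = re.compile(r"\d+|pi|[a-z]|[/^]")
SYMBOLS = {"pi": "&pi;"}  # tiny illustrative symbol table

def atom(tok):
    """Numbers become <mn>, everything else becomes <mi>."""
    if tok.isdigit():
        return f"<mn>{tok}</mn>"
    return f"<mi>{SYMBOLS.get(tok, tok)}</mi>"

def to_mathml(text):
    tokens = TOKEN.findall(text)
    pos = 0

    def item():
        # item := atom [ '^' atom ]
        nonlocal pos
        node = atom(tokens[pos])
        pos += 1
        if pos < len(tokens) and tokens[pos] == "^":
            node = f"<msup>{node}{atom(tokens[pos + 1])}</msup>"
            pos += 2
        return node

    out = []
    while pos < len(tokens):
        node = item()
        if pos < len(tokens) and tokens[pos] == "/":
            pos += 1
            node = f"<mfrac>{node}{item()}</mfrac>"  # fraction := item '/' item
        out.append(node)
    return "".join(out)

print(to_mathml("4/3pir^3"))
# prints: <mfrac><mn>4</mn><mn>3</mn></mfrac><mi>&pi;</mi><msup><mi>r</mi><mn>3</mn></msup>
```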

Michael Kohlhase
http://kwarc.eecs.iu-bremen.de/kohlhase/

A "Semantic Web" for science and technology: communicating the content of mathematics "in the large"
December 9, 2006

The distributivity of information and services over the Internet has changed all aspects of life. The process of developing and deploying science and technology is no exception to this. Its individual aspects are already supported by a variety of software systems, but the systems are, by and large, not able to inter-operate since they use differing data formats, make differing model assumptions, and are bound to an implicitly given context that is only documented in publications about the systems.

We anticipate a new quality to emerge if humans and systems can inter-operate to cover the whole work-flow of research, education, and application. To further this vision we need to develop, implement, and provide semantic-based and context-aware techniques for acquiring, organizing, processing, sharing, and using knowledge in Science and Technology.

In this talk I will present an alternative vision of a 'Semantic Web' for Science and Technology. Like Tim Berners-Lee's vision we aim to make the Web (here scientific knowledge) machine-understandable instead of merely machine-readable. However, instead of a top-down metadata-driven approach, which tries to approximate the content of documents by linking them to web ontologies (expressed in terminologic logics), we explore a bottom-up approach and focus on making explicit the intrinsic structure of the underlying scientific knowledge. A connection of documents to web ontologies is still possible, but a secondary effect.

I will make these ideas concrete with the XML-based content/context format for mathematical discourses (OMDoc: Open Mathematical Documents) that supports novel web services by blending formal and natural elements. The core purpose of the OMDoc format is to enable communication of mathematics "in the large." Most current representation formats for mathematics concentrate on representing mathematical formulae and give the representations meaning by providing a fixed context (in the specification). As these contexts only cover specific mathematical areas, we can only cover "mathematics in the small" with this approach.

OMDoc extends MathML and OpenMath by a rich markup language for (meaning-providing) contexts, which even provides constructs to inter-relate contexts. Thus it can emulate other representation languages by marking up their inscribed contexts, and can act as an interoperability format between languages and even software systems. Taken to the extreme, it can even act as an interoperability format between scientific disciplines; I will discuss this using the ongoing extension efforts towards STMML2 (Science, Technology & Medical Markup Language).

Aaron Krowne
http://br.endernet.org/~akrowne/

Presentation material


Azzeddine Lazrek
http://www.ucam.ac.ma/fssm/rydarab/english/cvlazrek.htm

Multilingual mathematical e-document processing
December 9, 2006

In Arabic handbooks, there are at least two models for writing mathematical expressions, depending on the local area and the study level:

  • Latin mathematical presentation as in English or French. Symbols are then imported from the writing of one of these European languages, according to the dominant cultural influence. Symbolic writing then runs in the direction opposite to that of the natural language;
  • Arabic mathematical presentation. Specific symbols are used and the writing follows the direction of natural-language handwriting, which is from right to left. Arabic presentation uses Arabic symbols coming from the alphabet. Other symbols can be vertically reflected Latin symbols. The so-called Arabic or Arabic-Indic digits represent numbers.


The Arabic scientific and technical e-documents processing area is quite large. It includes topics such as:

  • Typesetting Arabic mathematical texts in various variants (with left to right or right to left expressions using local Arabic, French or English symbols) and related problems about mathematical expressions indexing and coding for their research and their automatic translation and computation;
  • MathML localization (especially for the needs of the Arabic alphabet based writings) in both meaning and presentation structuring. The problem of translation from MathML content to presentation for the Arabic mathematical notation arises;
  • Scientific and technical symbols normalization and coding with Unicode;
  • Design and development of special static and dynamic fonts and software tools;
  • History of scientific and technical notation and related topics such as ancient mathematical works translation;
  • Arabic writing characteristics, which are more calligraphic than typographic, and related problems around justification or encoding, and the need for software localizers.


The very broad horizons that lie ahead will be presented and illustrated by some examples.

Bruce R. Miller
http://math.nist.gov/~BMiller/

DLMF, LaTeXML and some lessons learned
December 9, 2006

The Digital Library of Mathematical Functions is nearing public release. While our situation is unique — a moderate sized, controlled, collection of documents intended as a reference work — perhaps our experiences can help others in related endeavours. After providing an update of DLMF and LaTeXML (the tool used in its creation), I would like to talk about some of the lessons learned, and promote discussion about things the community can do to make these kinds of projects both easier and better in the future.

Digital Libraries within the context of Mathematical Knowledge Management might fall into two categories: more formal, behind-the-scenes, computer-oriented libraries (or databases) and looser, more traditional reader-oriented libraries. Without in the least diminishing the former, I will concentrate on the issues of the latter. Such libraries consist not only of mathematical content, but of mathematics mixed with text, tables, figures, etc. We must be able to project such libraries into attractive, readable, accessible web sites. The usefulness of the data will be greatly enhanced if we can fully expose the implied interconnections between symbols and concepts found in such documents. While it is often a requirement that a high-quality printed format be available, such sites can be much more than "Books on the web."

As the content is not automatically generated by (say) a theorem prover, convenient document authoring is a major concern. Useful interrelations, enhancements and metadata are seldom automatically inferable. Thus, both the desire and the means to provide this information are needed: significant buy-in by authors and editors, as well as additional markup. With hand-authored documents, there is also the issue of getting a sufficiently unambiguous representation of the mathematics while preserving the author's intended presentation. We have adopted LaTeX as the document source language, and thus need to go beyond the typical TeX markup with more semantic macros and declarations. While we are beginning to see success in mathematical search, such search must contend with this mixed representation and with the typical sloppiness of user queries.

Looking towards the future, we would all like to leverage each other's ideas and tools. Perhaps some standardization of rich document formats is possible; should it be based on proprietary or ad hoc formats, or on OMDoc or DocBook? For those in the "niche within a niche" of LaTeX-based Digital Libraries, development and wider adoption of more semantic math markup would be helpful, whatever tools are being used to process it.

Robert Miner
http://www.dessci.com/en/company/management.htm

The Mathdex search engine
December 8, 2006

The talk will describe the architecture and implementation of the Mathdex search engine, a math-aware search engine under development by Design Science. Users can use mathematical expressions as query terms along with the usual text query terms. Math search terms are entered via a graphical equation editor applet. Ranked results are returned, with the rank for math expressions based on structural similarity.

The Mathdex search engine is implemented as an extension to the popular Apache Lucene search engine. Content is converted to a common XHTML + MathML format for indexing. MathML terms are normalized and stored as sequences of text-encoded tokens in the Lucene index. Query terms are similarly tokenized, and the search is performed by custom code that issues low-level atomic Lucene queries. Final rankings are computed from atomic term queries with weightings based on analysis of the MathML structure.
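
One way to picture "sequences of text-encoded tokens" is to flatten a MathML tree depth-first into strings that a text index can store. The sketch below does this with Python's standard `xml.etree.ElementTree`; the token scheme (`tag:text` for leaves, `tag(` and `)` around children) is an assumption invented for this example and is not Mathdex's actual encoding.

```python
import xml.etree.ElementTree as ET

def mathml_tokens(mathml):
    """Flatten a MathML fragment into a depth-first sequence of text tokens."""
    def walk(node):
        tag = node.tag.split("}")[-1]      # drop any XML namespace prefix
        text = (node.text or "").strip()   # normalize surrounding whitespace
        if len(node) == 0 and text:
            yield f"{tag}:{text}"          # leaf: element name plus content
        else:
            yield f"{tag}("                # container: bracket its children
            for child in node:
                yield from walk(child)
            yield ")"
    return list(walk(ET.fromstring(mathml)))

print(mathml_tokens("<msup><mi>x</mi><mn>2</mn></msup>"))
# prints: ['msup(', 'mi:x', 'mn:2', ')']
```

Because a query expression can be tokenized the same way, structural similarity can then be approximated by comparing token sequences.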

We hope eventually to index a large corpus of electronic documents. To begin, we are attempting to convert and index the arXiv, using Hermes and LaTeXML, two promising LaTeX to XHTML+MathML translators. We are also working to arrange to index several other collections where unpublished XML+MathML source code is available by special arrangement. We are also running a customized version of the Apache Nutch web crawler to index online documents containing math in a variety of formats.

Andrew Odlyzko
http://www.dtc.umn.edu/~odlyzko

The slow evolution of mathematical communication
December 9, 2006

Digital technologies are advancing rapidly. But their acceptance by scholars varies considerably. Some, such as computerized typesetting and email, have become universal among mathematicians, while others, such as Open Access, are advancing very slowly. While this is frustrating to technology enthusiasts, it should not be too surprising, as it follows a common historical pattern. While each technology is different, people do not change, and they have often been slow to embrace even revolutionary innovations. There are common patterns in the diffusion of earlier technologies, and some possible lessons will be drawn from them for math communication.

T. V. Raman

Structured math on the web
December 8, 2006

It is now 10 years since we started work on MathML. At the time, the Web was still in its early stages, and we began the work in a spirit of extreme optimism. After all, the Web emerged from the need to exchange scientific information, and we were all excited by the possibilities presented by being able to exchange more than just plain text; we looked forward to a time when online hypertexts would include structured mathematics that could be interactively manipulated and presented via a multiplicity of output modalities. With 20/20 hindsight, it is clear that we had taken an overly narrow view of the Web; having invented it in the context of scientific information exchange, we had assumed that all evolutions on the Web would be driven by the needs of scientific education. The marketplace, however, taught us otherwise, and today sceptics would say that structured mathematics on the Web is a failure and never likely to happen within mainstream browsers. But in the midst of depression there is hope: MathML never became a mainstream browser feature and might well be called a failure by sceptics; however, the Web platform has now reached a level of maturity where adoption of new technologies by mainstream browsers is no longer a prerequisite for success. With the emerging ability to deliver highly interactive Web applications that produce and consume structured XML, I believe we're now entering a new era of the Web where innovative user experience that leverages rich content such as MathML can be delivered to the end-user without waiting for browser vendors to ship their next long-awaited upgrade. Looking forward, what new innovations can we deliver in the field of online Mathematics, and how will these in turn contribute to future innovations on the Web?

Ross Reedstrom
http://cnx.org/member_profile/reedstrm

Collaborative curriculum development in engineering and the sciences: The Connexions 7-year experience with MathML
December 8, 2006

Connexions uses a collaborative, community-driven approach to content creation, organization, and dissemination. We provide a set of open source tools and an open repository for the publication and exchange of educational materials, by anyone, from anywhere, to anyone, anywhere.

More than three thousand modules in 13 languages are used by over a million people worldwide for both formal and self-guided learning in fields ranging from computer science to music and from mathematics to biodiversity. These XML-based modules allow instructors to compose customized courses, providing students with new opportunities to explore the connections between different ideas and domains, as well as to extend the repository with additional materials of their own.

A significant core of our content is in engineering and other mathematically intensive disciplines. We use content MathML source materials, transforming to presentation MathML for web presentation, and PDF for print. Since we are not ourselves the authors of most of the content, we do not have direct control over the tools and methods our authors use to create markup. While requiring cMathML provides well-known, significant advantages for users of the repository (flexible format output via transformations, alternative notational conventions, consistent notation using mixed sources, etc.), it has a significant cost for our authors. Lack of tools is still the limiting factor for adding more math in more modules. The current version of the MathML standard is somewhat limited in the fields of math covered by semantic markup. Specific notational conventions can also be difficult to accommodate.

Regardless, over half of our content contains math. We are currently working to improve our tools for input as well as providing validity checking (as a subset of theorem proving) of math. Providing means to explore this body of materials via structural math search, in combination with other metadata and full-text searches, has just now become useful. We will discuss currently successful methods of assisting authors in creating MathML markup, what hasn't worked, how the math interacts with other aspects of distributed collaboration (through both space and time), and some characterization of the body of math available from the Connexions Repository.

Alan P. Sexton
http://www.cs.bham.ac.uk/~aps/
Volker Sorge
http://www.cs.bham.ac.uk/~vxs/index.html

The ellipsis in mathematical documents
December 8, 2006

An ellipsis is a series of dots which indicates the omission of some part of a text which the reader should be able to reconstruct from its context. The most complex and sophisticated uses of ellipses occur in matrix expressions, where whole classes of matrices of variable dimension are described with their use. But ellipses also occur in discussions of sequences, series, polynomials, sets, systems of equations and generally wherever there is a collection of mathematical objects described by a pattern rather than an explicit enumeration or a closed form. However, while ellipses are very common in mathematical and scientific documents, relatively little work on their recognition, semantic analysis, formal representation, and electronic communication has been carried out.

In our work, we have shown how a matrix expression containing ellipses can be analysed to extract a semantic representation that can be used for a number of purposes including validating and improving optical character recognition of matrix expressions, symbolic calculation of expressions with such matrices and re-representation as lambda expressions for use by theorem provers. This work has opened a number of new research avenues for machine support for mathematicians, scientists and engineers: since we can represent underspecified matrices with ellipses, we can develop systems to solve matrix problems for arbitrary dimensions directly, rather than only for individual subcases of specific dimension; we can consider the question of generalising specific solutions without ellipses to general patterns of solutions with ellipses.
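
To illustrate what a semantic representation of an underspecified matrix might look like, the sketch below stands in a Python function of the indices for the lambda expression, and expands it for any chosen dimension. The names `upper_triangular_ones` and `instantiate` and the choice of example are assumptions for illustration; the authors' actual representation is not shown in this abstract.

```python
def upper_triangular_ones(i, j):
    """Entry rule for the matrix drawn with ellipses as:
         1 1 ... 1
         0 1 ... 1
         : :  .  :
         0 0 ... 1
    """
    return 1 if j >= i else 0

def instantiate(entry, n):
    """Expand an entry rule into a concrete n-by-n matrix (1-based indices)."""
    return [[entry(i, j) for j in range(1, n + 1)] for i in range(1, n + 1)]

print(instantiate(upper_triangular_ones, 3))
# prints: [[1, 1, 1], [0, 1, 1], [0, 0, 1]]
```

The same rule yields the 4-by-4 or 100-by-100 case unchanged, which is what makes it possible to reason about arbitrary dimensions rather than individual subcases.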

In this talk we shall summarise our research results in this area to date and outline its possible generalisation to deal with ellipsis constructs in other areas. We shall also suggest a structured way of representing ellipses in a uniform format suitable for electronic communication.

Elena Smirnova
http://www.orcca.on.ca/MathML/elena.html
Stephen M. Watt
http://www.csd.uwo.ca/~watt/

Interfaces for mathematical communication
December 9, 2006

In this talk we discuss one of the essential aspects of mathematical communication: providing interfaces to mathematical environments. This involves mathematical data and knowledge representation, communication protocols and organization of digital libraries. Each of these three subjects represents a complex problem studied extensively for the past decades. We present an overview of the state-of-the-art that has been developed over the course of several research networks connecting two large communities: mathematical knowledge management and mathematical communication. In this talk we report on the main outcomes of these projects that have been achieved both in collaboration and independently by our group at the Ontario Research Centre for Computer Algebra.

First, we briefly address the issues accompanying the problem of mathematical knowledge representation, such as customizing notation for mathematical content and translation between the most popular mathematical data formats. We describe how these problems can be approached to ensure conservation of high-level semantic content.

Secondly, we describe an approach to providing interoperability between different mathematical environments. We concentrate on system-independent standards, designed to describe mathematical content and problem domains. These comprise languages and ontologies developed by the MONET (Mathematics On the NET) Consortium to enable unambiguous communication between distributed mathematical components.

Finally, we discuss the problem of creation and organization of databases to store mathematical context. We present our approach to collecting and storing the information about mathematical content, retrieved from web-based mathematical archives. We demonstrate the use of these databases by the example of assisting in on-line mathematical handwriting analysis. Specifically, we show how we use the information derived from the analysis of digital libraries to prioritize character choices in a mathematical handwriting application.

Neil Soiffer

Overview of accessible math
December 9, 2006

Computers and the internet have been a boon to those with visual disabilities. Screen readers and other assistive technology provide access to information that would have previously been inaccessible to these individuals. While mathematical information is harder to access than textual information, significant work has been done to make math accessible.

There are several US and international laws that impose legal mandates on accessible material. These mandates provide an economic incentive for researchers and businesses to provide accessible solutions. These mandates are based on standards such as DAISY, and use MathML to encode math.

Mathematical expressions have been a focus for math accessibility solutions, and much of this work uses MathML:
  • MathPlayer works with screen readers to speak math in IE. It also allows for magnification and navigation of expressions, and provides synchronized highlighting.
  • MathSpeak provides similar features for digital talking books using their DAISY player.
  • Group UMA and the LAMBDA project are working on two-way Braille translators, and many efforts in the US have developed MathML-to-Nemeth code translators.


MathML is not designed to be directly authored. Two programs are focused on authoring/editing:

  • ChattyInfty is a self-voicing text and 2D math expression editor which can export to MathML and TeX.
  • WinTriangle is RTF-based and presents the expression in a quasi-linear format. WinTriangle maps onto TeX in a straightforward manner; it can also import and export MathML.


Beyond making mathematical expressions accessible, graphs and diagrams need to be made accessible. Solutions range from sonification of the graphs as done in the Accessible Graphing Calculator to tactile graphics via embossing printers or thermoform paper.

Because legacy documents are often not available in electronic formats, OCR techniques are important. InftyReader can read printed documents with math in them and convert them to XHTML+MathML or LaTeX.

Petr Sojka
http://www.fi.muni.cz/usr/sojka/

Towards digital mathematics library DML-CZ (OCR of mathematical texts)


Philippe Tondeur
http://www.math.uiuc.edu/~tondeur/

WDML: The world digital mathematics library


Bernd Wegner
http://www.zblmath.fiz-karlsruhe.de/people/wegner.html

Digital libraries projects


Abdou Youssef
http://www.seas.gwu.edu/~ayoussef/

Relevance ranking and hit packaging in math search
December 8, 2006

As in most search applications, math search involves relevance ranking and hit packaging. That is, hits must be ordered using quantitative relevance scores, and every hit must be accompanied by a small amount of qualitative relevance information that conveys what the hit is about and why it matched.

Determining and quantifying relevance is a very hard problem in text search, and is at least as hard in math search. The relevance score must factor in not only query terms, but also a priori information about the hit target such as: (1) whether the target is/has definitions, notations, graphs, theorems, proofs, and so on; (2) expert-predetermined weights of certain entities (e.g., concepts, function names, operators, etc.) in the target document; (3) number of database-wide or Web-wide links pointing to the hit target; and (4) frequency of user-access to that target.
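
A minimal sketch of how such a score could combine the query match with the four kinds of a priori information follows. The field names, the linear form, the logarithmic damping of counts, and every numeric weight are assumptions chosen for illustration, not values or formulas from the talk.

```python
import math
from dataclasses import dataclass

@dataclass
class Hit:
    name: str
    term_score: float      # match quality against the query terms
    has_definition: bool   # (1) target is or contains a definition, theorem, etc.
    expert_weight: float   # (2) expert-assigned weight of entities in the target
    inbound_links: int     # (3) links pointing to the target
    access_count: int      # (4) how often users open the target

def relevance(hit):
    """Weighted sum of query match and a priori signals (hypothetical weights)."""
    prior = (
        0.10 * (1.0 if hit.has_definition else 0.0)
        + 0.10 * hit.expert_weight
        + 0.10 * math.log1p(hit.inbound_links)   # damp large link counts
        + 0.10 * math.log1p(hit.access_count)    # damp large access counts
    )
    return 0.60 * hit.term_score + prior

def rank(hits):
    return sorted(hits, key=relevance, reverse=True)

hits = [
    Hit("plain mention", 0.8, False, 0.0, 0, 0),
    Hit("definition page", 0.8, True, 0.5, 40, 300),
]
print([h.name for h in rank(hits)])
# prints: ['definition page', 'plain mention']
```

Both hits match the query equally well here, so the ordering is decided entirely by the a priori signals.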

Hit packaging is primarily a process of document summarization that is biased by both the user-query and the same kinds of a priori information used in relevance scoring. One way to summarize a document is to partition it into small fragments and select several of the most relevant fragments; fragments from the document metadata, if any, may also be included in the summary.
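
The fragment-selection idea can be sketched as follows. Splitting on periods, scoring by word overlap with the query, and the function name `summarize` are simplifying assumptions for this example; a real system would also weigh the a priori information and handle mathematical fragments, which this toy ignores.

```python
def summarize(document, query_terms, max_fragments=2):
    """Pick the fragments that mention the most query terms, kept in document order."""
    fragments = [s.strip() for s in document.split(".") if s.strip()]
    terms = {t.lower() for t in query_terms}

    def overlap(fragment):
        return len(terms & set(fragment.lower().split()))

    # Indices of the best-scoring fragments; ties keep earlier fragments first.
    best = sorted(range(len(fragments)),
                  key=lambda i: overlap(fragments[i]),
                  reverse=True)[:max_fragments]
    return [fragments[i] for i in sorted(best)]

doc = ("The gamma function extends the factorial. It was studied by Euler. "
       "The gamma function has poles at the nonpositive integers.")
print(summarize(doc, ["gamma", "poles"], max_fragments=1))
# prints: ['The gamma function has poles at the nonpositive integers']
```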

This talk will identify the issues involved in relevance ranking and hit packaging in math search, and discuss approaches for addressing them.
