December 8 - 9, 2006
Seventy-five years ago Kurt Gödel overturned the mathematical apple cart: he proved entirely deductively
that mathematics is not entirely deductive, while holding quite different ideas about legitimate
forms of mathematical reasoning: If mathematics describes an objective world just like physics, there
is no reason why inductive methods should not be applied in mathematics just the same as in physics.
(Kurt Gödel, 1951) This talk provides an introduction to Experimental Mathematics, its theory and
its practice. I will focus on the differences between Discovering Truths and Proving Theorems
and on the implications for knowledge management and communication. I shall explore several of
the computational tools available for deciding what to believe in mathematics, and, using accessible
examples, illustrate the rich experimental tool-box mathematicians now have access to. These
tools range from web-interfaces and databases to preprint repositories and digital library collections,
and prominently include NIST's forthcoming Digital Library of Mathematical Functions. In an attempt
to explain how mathematicians may use High Performance Computing (HPC) and what they
have to offer other computational scientists, I will touch upon various Computational Mathematics
Challenge Problems.
Mathematical markup languages like OpenMath and MathML offer the possibility to represent mathematical content at a level of abstraction that is not dependent on localized information. This representation typically focuses on the semantics of the mathematical object and postpones localization aspects of mathematics, such as those influenced by notation and by culture, to the rendering process of the markup. While typesetting of mathematical markup has been the object of numerous efforts, from MathML-presentation to SVG converters, the rendering of mathematics in a "verbalized" jargon has not yet received similar attention. In this talk, I will present the results of the WebALT EU eContent project concerning the application of language technologies to the automatic generation of text from mathematical markup.
Mathematical jargon is an important aspect of the education of students. Not only does a teacher train pupils in problem solving skills, but she also makes sure that they acquire a proper way of expressing mathematical concepts. To our knowledge, digital eLearning resources have used a representation in which text is intermixed with mathematical expressions even in situations where the actual abstract representation, for instance of the statement of a theorem, can be reduced to a single mathematical object. One reason for this representation choice is that the rendering process would otherwise produce a symbolic, typeset mathematical formula that might prove too difficult to understand for the students or simply just too hard to read. However, by representing this kind of mathematical text in a language-independent format such as the one provided by markup languages, it is possible to apply language technologies that generate the same text in a variety of languages including English, Spanish, Finnish, Swedish, French and Italian.
The project results include editors for multilingual mathematical markup, a web service for generating multiple language versions, and a digital repository of multilingual interactive mathematical exercises and drill questions.
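As a rough illustration of generating text from a language-independent representation, the sketch below renders a small expression tree through per-language templates. The tuple encoding, the operator names and the `verbalize` function are assumptions made for this example; they are not the WebALT project's actual format or method, which is not described here.

```python
# Toy templates keyed by language code and operator name.
# Operator names and phrasings are illustrative assumptions.
TEMPLATES = {
    "en": {
        "plus": "{0} plus {1}",
        "times": "{0} times {1}",
        "sqrt": "the square root of {0}",
    },
    "es": {
        "plus": "{0} más {1}",
        "times": "{0} por {1}",
        "sqrt": "la raíz cuadrada de {0}",
    },
}


def verbalize(expr, lang):
    """Render an expression tree as text in the given language.

    A tree is either a leaf (number or variable name) or a tuple
    (operator, argument, ...).
    """
    if not isinstance(expr, tuple):
        return str(expr)
    op, *args = expr
    spoken_args = [verbalize(arg, lang) for arg in args]
    return TEMPLATES[lang][op].format(*spoken_args)


expr = ("plus", ("sqrt", "x"), 2)
print(verbalize(expr, "en"))
print(verbalize(expr, "es"))
```

The same tree yields both renderings, which is the point of keeping the source representation free of localized wording. Real systems also have to handle agreement, word order and ambiguity, which flat templates like these do not.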
JsMath is a means of displaying mathematics in web pages that works across multiple browsers (MSIE, Firefox, Opera, Safari, etc.) and multiple platforms (Windows,
Unix, Mac OS X). It uses JavaScript, cascading style sheets (CSS), and Unicode fonts to render TeX code embedded in an HTML document into typeset mathematics within
the browser. Over the past two years, jsMath has found a home within a number of on-line content-generation systems (e.g., bulletin boards, blogs, course-management
systems) because it allows participants to enter mathematics in a straight-forward and possibly familiar format while still producing quality output in all the major
browsers, both on screen and in print, without the need of complicated installations on the server or extra downloads by the user. This talk will describe some of
jsMath's most important features and how these can be controlled by the page author, and will point out some of the issues that need to be addressed when adding
jsMath into a content-management system. Finally, we will discuss some of the future plans for jsMath, including incorporation of MathML input and output into
jsMath.
Four years ago, I set out to see whether one could use weblogs as an effective vehicle for communicating ideas in Physics. Not knowing any better, I decided to use
the much-heralded, but seldom-used web technologies of XHTML+MathML (with the occasional bit of SVG, for good measure). Between my own blog, Musings, and The String
Coffee Table, which I host on my server, there are nearly a thousand posts, and many thousands of comments, making them one of the largest collections of MathML on
the web.
On the one hand, this was a social experiment, in adopting the web (and weblogs, in particular) as a conduit for scientific communications. On the other hand, it was
a technological experiment, in making the technology "easy enough for mere physicists to use." I would like to address both aspects in this talk.
Surprisingly, we can speak mathematics to a computer probably more rapidly and accurately than handwriting. Even better is to speak and use pointing or handwriting. A combination may allow us to identify and cancel errors in one mode or another. In some cases speaking may be more convenient than typing, even for rapid typists: many mathematical symbols missing on the keyboard can be easily spoken. Even without venturing into Greek, handwriting or even typing "fifty million" is probably slower and more error-prone than speaking it.
Pursuing the goal of effectively speaking small pieces of mathematics, we wondered how hard it would be to speak arbitrarily long sections of mathematics, including nested complex expressions.
We first describe programs for the inverse problem: computer generation of mathematical speech. In so doing we find that we need to suggest a few speaking conventions to overcome the unfortunately ambiguous and inconsistent common usages of mathematics.
Then we consider tools and guidelines to make it more plausible for humans to speak full mathematical formulas so they can be recognized by a computer using a speech recognizer program.
We describe our prototype programs which do somewhat less than we propose, but are effective in that speech can either be used alone, or used to fill in boxes (superscripts, etc.) or larger pieces, or for choosing alternatives from plausible symbol recognition from handwriting. We believe the principal barriers to engineering a more complete program can be overcome, though a driving application may be essential for refining prototypes into useful programs. This paper is not intended to be the last word on the subject, but to expose problems and approaches relevant to the task.
This is work related to communication with Thierry Bouche (Cellule MathDoc
and Institut Fourier, Grenoble) and David Ruddy (Cornell University
Library).
For the access to mathematical research literature, mathematicians usually
employ review journals such as Mathematical Reviews and Zentralblatt der
Mathematik. These provide a rich network of interlinked (via references and
citations) mathematical sources. To make this even more useful, the review
journals could be used as hubs for the linkage not only to printed and born
digital material, but also to digitized versions of this literature.
An examination of the currently available metadata indicates not only that
the present formats do not identify the mathematical literature with sufficient
precision, but also that the metadata formats in use are inadequate, unless overloaded
with complex syntactical schemes. A richer and more rigid scheme for the
expression of the metadata is needed, and different approaches are investigated:
- The development of a Dublin Core based Application Profile, using qualified
Dublin Core and some additional fields to encapsulate the required
information.
- The development of a specialized metadata scheme based on one in use
by the French NUMDAM project.
- A Dublin Core format based on the Dublin Core Abstract Model, using
the new DC-XML specification.
- The usage of OpenURL as a method of reference.
The obvious choice for exchanging these metadata is the OAI Protocol for
Metadata Harvesting, and several of the libraries and projects involved in digitizing
mathematics have begun to expose metadata records using OAI-PMH,
offering several metadata formats, including Dublin Core. The talk will investigate
the options to enhance this form of communication to establish a working
network of interlinked mathematics and present the state of interlinkage at the
SUB Göttingen, using the extensive Mathematica collection of digitized mathematics.
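To make the harvesting step concrete, here is a minimal sketch of how an OAI-PMH request for Dublin Core records can be assembled. `ListRecords` and the `oai_dc` metadata prefix are defined by the protocol; the base URL and the set name are placeholders, not the address or set structure of any repository mentioned above.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    base_url is the repository's OAI-PMH endpoint (a placeholder here);
    set_spec optionally restricts harvesting to one set.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and set name, for illustration only.
print(list_records_url("https://example.org/oai"))
print(list_records_url("https://example.org/oai", set_spec="mathematics"))
```

A harvester would fetch such a URL, parse the XML response, and follow any resumption token the repository returns. Richer formats than unqualified Dublin Core would be requested through a different `metadataPrefix`, provided the repository offers one.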
This talk discusses and compares several approaches that can be used to
produce mathematical content on the web. Emphasis is placed on the input that the
user has to create, in particular on ease-of-use, readability, familiarity, generality,
availability and other criteria. While LaTeX to PDF is generally considered the de
facto standard in research publications, we will also examine input languages for
several computer algebra systems, online assessment/courseware systems and word
processors, as well as ASCIIMath.
Since the ASCIIMath language is a relative newcomer (but mostly a refinement of
the well known tradition of approximating formulas by ascii characters), the talk
includes a brief description and motivation for some of the design decisions. The
core of this mathematical input language consists of only 8 lines of BNF, yet it can
express most of undergraduate mathematics in a predictable way that generally
matches what users expect. In addition the language constructs map in a direct
way to a subset of Presentation MathML. E.g. 4/3pir^3 translates to
<mfrac><mn>4</mn><mn>3</mn></mfrac><mi>π</mi>
<msup><mi>r</mi><mn>3</mn></msup>
and displays in standard typeset form. A JavaScript implementation ASCIIMathML.js parses this
syntax and applies the translation in the client's browser. This program has been
downloaded by thousands of users in over 90 countries and is currently used in
numerous blogs, wikis and course management systems, partly because it enables
cross-browser MathML to display in legacy HTML (rather than XHTML) pages.
Since the language overlaps with LaTeX and TI-83 syntax, it is familiar to mathematicians
and school children, and is also used as input for online calculators.
ASCIIMath has been implemented by others in PHP and C#, as well as modified
to LaTeXMathML.js. Experiments with an online WYSIWYG HTML editor, called
ASciencePad, indicate that ASCIIMath is a convenient alternative to toolbar-driven
formula editors. In summary, it appears that an input format like ASCIIMath is
a desirable addition to the various ways of creating mathematical content online.
Further information can be found at
www.chapman.edu/~jipsen/asciimath.html.
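The translation described above can be sketched for a very small fragment of the language. The following is not ASCIIMathML.js and does not reproduce its grammar; it is an assumed toy subset (numbers, single-letter identifiers, `pi`, `^` and `/`) written only to show how input text can map directly onto Presentation MathML elements.

```python
import re

# Tokens of the toy subset: numbers, the name "pi", single letters, "/" and "^".
# Any other character is silently skipped by this sketch.
TOKEN = re.compile(r"\d+(?:\.\d+)?|pi|[A-Za-z]|[/^]")


def simple(tokens, i):
    """Translate one number or identifier; return (mathml, next_index)."""
    if i >= len(tokens):
        raise ValueError("unexpected end of input")
    tok = tokens[i]
    if tok[0].isdigit():
        return f"<mn>{tok}</mn>", i + 1
    if tok == "pi":
        return "<mi>π</mi>", i + 1
    if tok.isalpha():
        return f"<mi>{tok}</mi>", i + 1
    raise ValueError(f"unexpected token {tok!r}")


def intermediate(tokens, i):
    """Translate a simple term with an optional superscript."""
    base, i = simple(tokens, i)
    if i < len(tokens) and tokens[i] == "^":
        exponent, i = simple(tokens, i + 1)
        return f"<msup>{base}{exponent}</msup>", i
    return base, i


def translate(source):
    """Translate a toy ASCIIMath-like string to Presentation MathML."""
    tokens = TOKEN.findall(source)
    out, i = [], 0
    while i < len(tokens):
        node, i = intermediate(tokens, i)
        if i < len(tokens) and tokens[i] == "/":
            denominator, i = intermediate(tokens, i + 1)
            node = f"<mfrac>{node}{denominator}</mfrac>"
        out.append(node)
    return "".join(out)


print(translate("4/3pir^3"))
```

Because `pi` is tried before single letters, the run `pir` splits into π and r, which matches the expected reading of the example in the abstract.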
The distributivity of information and services over the Internet has changed all aspects of
life. The process of developing and deploying science and technology is no exception to this. Its
individual aspects are already supported by a variety of software systems, but the systems are,
by and large, not able to inter-operate since they use differing data formats, make differing model
assumptions, and are bound to an implicitly given context that is only documented in publications
about the systems.
We anticipate a new quality to emerge if humans and systems can inter-operate to cover the
whole work-flow of research, education, and application. To further this vision we need to develop,
implement, and provide semantic-based and context-aware techniques for acquiring, organizing,
processing, sharing, and using knowledge in Science and Technology.
In this talk I will present an alternative vision of a 'Semantic Web' for Science and Technology.
Like Tim Berners-Lee's vision, we aim to make the Web (here scientific knowledge) machine-understandable
instead of merely machine-readable. However, instead of a top-down metadata-driven
approach, which tries to approximate the content of documents by linking them to web
ontologies (expressed in terminologic logics), we explore a bottom-up approach and focus on
making explicit the intrinsic structure of the underlying scientific knowledge. A connection of
documents to web ontologies is still possible, but a secondary effect.
I will make these ideas concrete with the XML-based content/context format for mathematical
discourses (OMDoc: Open Mathematical Documents) that supports novel web services by blending
formal and natural elements. The core purpose of the OMDoc format is to enable communication
of mathematics "in the large." Most current representation formats for mathematics concentrate
on representing mathematical formulae and give the representations meaning by providing a fixed
context (in the specification). As these contexts only cover specific mathematical areas, we can
only cover "mathematics in the small" with this approach.
OMDoc extends MathML and OpenMath by a rich markup language for (meaning-providing)
contexts, which even provides constructs to inter-relate contexts. Thus it can emulate other
representation languages by marking up their inscribed contexts, and can act as an interoperability
format between languages and even software systems. Taken to the extreme, it can even act as
an interoperability format between scientific disciplines; I will discuss this using the ongoing
extension efforts towards STMML2 (Science, Technology & Medical Markup Language).
In Arabic handbooks, there are at least two models for writing mathematical expressions,
depending on the local area and the study level:
- Latin mathematical presentation as in English or French. Symbols are then imported
from the writing of one of these European languages, according to the dominant cultural
influence. Symbolic writing then runs in the direction opposite to that of the natural
language;
- Arabic mathematical presentation. Specific symbols are used and the writing follows
the direction of the natural language handwriting which is from right to left. Arabic
presentation uses Arabic symbols coming from the alphabet. Other symbols can be
vertically reflected Latin symbols. The so-called Arabic or Arabic-Indic digits
represent numbers.
The Arabic scientific and technical e-documents processing area is quite large. It includes
topics such as:
- Typesetting Arabic mathematical texts in various variants (with left to right or right to
left expressions using local Arabic, French or English symbols) and related problems
about mathematical expressions indexing and coding for their research and their
automatic translation and computation;
- MathML localization (especially for the needs of the Arabic alphabet based writings)
in both meaning and presentation structuring. The problem of translation from
MathML content to presentation for the Arabic mathematical notation arises;
- Scientific and technical symbols normalization and coding with Unicode;
- Design and development of special static and dynamic fonts and software tools;
- History of scientific and technical notation and related topics such as ancient
mathematical works translation;
- Characteristics of Arabic writing, which is more calligraphy than typography, and related
problems around justification or encoding, and the need for software localizers.
The very broad horizons that lie ahead will be presented and illustrated with some examples.
The Digital Library of Mathematical Functions is nearing public release. While our situation is unique (a moderately sized, controlled collection of documents intended as a reference work), perhaps our experiences can help others in related endeavours. After providing an update of DLMF and LaTeXML (the tool used in its creation), I would like to talk about some of the lessons learned, and promote discussion about things the community can do to make these kinds of projects both easier and better in the future.
Digital Libraries within the context of Mathematical Knowledge Management might fall into two categories: more formal, behind-the-scenes, computer-oriented libraries (or databases) and looser, more traditional reader-oriented libraries. Without in the least diminishing the former, I will concentrate on the issues of the latter. Such libraries consist not only of mathematical content, but also of text, tables, figures, etc. We must be able to project such libraries into attractive, readable, accessible web sites. The usefulness of the data will be greatly enhanced if we can fully expose the implied interconnections between symbols and concepts found in such documents. While it is often a requirement that a high-quality printed format be available, such sites can be much more than "Books on the web."
As the content is not automatically generated by (say) a theorem prover, convenient document authoring is a major concern. Useful interrelations, enhancements and meta-data are seldom automatically inferable. Thus, both the desire and the means to provide this information are needed: significant buy-in by authors and editors, as well as additional markup. With hand-authored documents, there is also the issue of getting a sufficiently unambiguous representation of the mathematics while preserving the author's intended presentation. We have adopted LaTeX as the document source language, and thus need to go beyond the typical TeX markup with more semantic macros and declarations. While we are beginning to see success in mathematical search, such search must contend with this mixed representation and with the typical sloppiness of user queries.
Looking towards the future, we would all like to leverage each other's ideas and tools. Perhaps some standardization of rich document formats is possible; should it be based on proprietary or ad hoc formats; OMDoc or DocBook? For those in the "niche within a niche" of LaTeX-based Digital Libraries, development and wider adoption of more semantic math markup would be helpful, whatever tools are being used to process it.
The talk will describe the architecture and implementation of the Mathdex search engine, a math-aware search engine under development by Design Science. Users can
use mathematical expressions as query terms along with the usual text query terms. Math search terms are entered via a graphical equation editor applet. Ranked
results are returned, with the rank for math expressions based on structural similarity.
The Mathdex search engine is implemented as an extension to the popular Apache Lucene search engine. Content is converted to a common XHTML + MathML format for indexing.
MathML terms are normalized and stored as sequences of text-encoded tokens in the Lucene index. Query terms are similarly tokenized, and the search is performed by
custom code that issues low-level atomic Lucene queries. Final rankings are computed from atomic term queries with weightings based on analysis of the MathML
structure.
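The idea of storing MathML as text-encoded tokens and ranking by structural similarity can be sketched as follows. The token scheme (`tag(`, `tag:text`, `)`) and the overlap measure are assumptions chosen for illustration; the abstract does not specify Mathdex's actual encoding, normalization or weighting, and this sketch does not use Lucene.

```python
import xml.etree.ElementTree as ET


def tokenize(mathml):
    """Flatten a MathML fragment into a sequence of text tokens."""
    tokens = []

    def walk(node):
        tag = node.tag.split("}")[-1]  # drop an XML namespace, if present
        if len(node) == 0:
            tokens.append(f"{tag}:{(node.text or '').strip()}")
        else:
            tokens.append(f"{tag}(")
            for child in node:
                walk(child)
            tokens.append(")")

    walk(ET.fromstring(mathml))
    return tokens


def features(tokens):
    """Single tokens plus adjacent pairs, as a crude structural fingerprint."""
    return set(tokens) | set(zip(tokens, tokens[1:]))


def similarity(query, target):
    """Jaccard overlap of the two fingerprints, a stand-in for real ranking."""
    fq, ft = features(tokenize(query)), features(tokenize(target))
    return len(fq & ft) / len(fq | ft)


x_squared = "<msup><mi>x</mi><mn>2</mn></msup>"
y_squared = "<msup><mi>y</mi><mn>2</mn></msup>"
print(tokenize(x_squared))
print(similarity(x_squared, y_squared))
```

Two expressions that share structure but differ in a variable name score between 0 and 1, which is the behaviour a structural ranking needs. A production system would normalize equivalent markup before tokenizing.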
We are hoping to eventually index a large corpus of electronic documents. To begin, we are attempting to convert and index the ArXiv, using Hermes and LaTeXML, two
promising LaTeX to XHTML+MathML translators. We are also working to arrange to index several other collections where unpublished XML+MathML source code is available
by special arrangement. We are also running a customized version of the Apache Nutch web crawler to index online documents containing math in a variety of formats.
Digital technologies are advancing rapidly. But their acceptance
by scholars varies considerably. Some, such as computerized typesetting
and email, have become universal among mathematicians, while others, such
as Open Access, are advancing very slowly. While this is frustrating to
technology enthusiasts, it should not be too surprising, as it follows
a common historical pattern. While each technology is different, people
do not change, and they have often been slow to embrace even revolutionary
innovations. There are common patterns in the diffusion of earlier technologies,
and some possible lessons will be drawn from them for math communication.
It is now 10 years since we started work on MathML. At the time, the Web was still in its early stages, and we began the work in a spirit of extreme optimism. After all, the Web emerged from the need to exchange scientific information, and we were all excited by the possibilities presented by being able to exchange more than just plain text — we looked forward to a time where online hypertexts would include structured mathematics that could be interactively manipulated and presented via a multiplicity of output modalities.
With 20/20 hindsight, it is clear that we had taken an overly narrow view of the Web; having invented it in the context of scientific information exchange, we had assumed that all evolutions on the Web would be driven by the needs of scientific education. The marketplace, however, taught us otherwise, and today, sceptics would say that structured mathematics on the Web is a failure and never likely to happen within mainstream browsers.
But in the midst of depression there is hope--MathML never became a mainstream browser feature and might well be called a failure by sceptics; however, the Web platform has now reached a level of maturity where adoption of new technologies by mainstream browsers is no longer a pre-requisite for success. With the emerging ability to deliver highly interactive Web applications that produce and consume structured XML, I believe we're now entering a new era of the Web where innovative user experience that leverages rich content such as MathML can be delivered to the end-user without waiting for browser vendors to ship their next long-awaited upgrade. Looking forward, what new innovations can we deliver in the field of online Mathematics, and how will these in turn contribute to future innovations on the Web?
Connexions uses a collaborative, community-driven approach to content creation,
organization, and dissemination. We provide a set of open source tools and an
open repository for the publication and exchange of educational materials, by
anyone, from anywhere, to anyone, anywhere.
More than three thousand modules in 13 languages are used by over a million
people worldwide for both formal and self-guided learning in fields ranging
from computer science to music and from mathematics to biodiversity.
These XML-based modules allow instructors to compose customized courses,
providing students with new opportunities to explore the connections between
different ideas and domains, as well as to extend the repository with additional materials of their own.
A significant core of our content is in engineering and other mathematically
intensive disciplines. We use content MathML source materials, transforming to
presentation MathML for web presentation, and PDF for print. Since we are not
ourselves the authors of most of the content, we do not have direct control
over tools and methods our authors use to create markup. While requiring
cMathML provides well known, significant advantages for users of the repository
(flexible format output via transformations, alternative notational
conventions, consistent notation using mixed sources, etc.), it has a
significant cost for our authors. Lack of tools is still the limiting factor
for adding more math in more modules. The current version of the MathML
standard is somewhat limited in the fields of math covered by semantic markup.
Specific notational conventions can also be difficult to accommodate.
Regardless, over half of our content contains math. We are currently working
to improve our tools for input as well as providing validity checking (as a subset
of theorem proving) of math. Providing means to explore this body of materials
via structural math search, in combination with other metadata and fulltext
searches, has just now become useful. We will discuss currently successful
methods of assisting authors in creating MathML markup, what hasn't worked, how
the math interacts with other aspects of distributed collaboration (through
both space and time), and some characterization of the body of math available
from the Connexions Repository.
An ellipsis is a series of dots which indicates the omission of some part of
a text which the reader should be able to reconstruct from its context. The
most complex and sophisticated uses of ellipses occur in matrix expressions,
where whole classes of matrices of variable dimension are described with their
use. But ellipses also occur in discussions of sequences, series, polynomials, sets,
systems of equations and generally wherever there is a collection of mathematical
objects described by a pattern rather than an explicit enumeration or a closed
form. However, while ellipses are very common in mathematical and scientific
documents, relatively little work on their recognition, semantic analysis, formal
representation, and electronic communication has been carried out.
In our work, we have shown how a matrix expression containing ellipses can
be analysed to extract a semantic representation that can be used for a number
of purposes including validating and improving optical character recognition
of matrix expressions, symbolic calculation of expressions with such matrices
and re-representation as lambda expressions for use by theorem provers. This
work has opened a number of new research avenues for machine support for
mathematicians, scientists and engineers: since we can represent underspecified
matrices with ellipses, we can develop systems to solve matrix problems for
arbitrary dimensions directly, rather than only for individual subcases of specific
dimension; we can consider the question of generalising specific solutions without
ellipses to general patterns of solutions with ellipses.
In this talk we shall summarise our research results in this area to date and
outline its possible generalisation to deal with ellipsis constructs in other areas.
We shall also suggest a structured way of representing ellipses in a uniform
format suitable for electronic communication.
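A much simpler case than matrices shows what it means to reconstruct the content an ellipsis omits. The sketch below expands a sequence such as 1, 3, 5, ..., 9 by assuming the explicit terms follow an arithmetic pattern. The function name and the restriction to integer arithmetic progressions are assumptions of this example, not the analysis method summarised in the talk.

```python
def expand_ellipsis(prefix, last):
    """Expand 'prefix, ..., last' assuming an integer arithmetic progression.

    prefix: the explicit terms written before the ellipsis (at least two).
    last:   the explicit term written after the ellipsis.
    """
    if len(prefix) < 2:
        raise ValueError("need at least two explicit terms to infer a pattern")
    step = prefix[1] - prefix[0]
    if step == 0 or any(b - a != step for a, b in zip(prefix, prefix[1:])):
        raise ValueError("explicit terms do not form an arithmetic pattern")
    gap = last - prefix[-1]
    if gap % step != 0 or gap * step < 0:
        raise ValueError("final term does not continue the pattern")
    stop = last + (1 if step > 0 else -1)  # make the range include 'last'
    return list(range(prefix[0], stop, step))


print(expand_ellipsis([1, 3, 5], 9))
print(expand_ellipsis([10, 8], 2))
```

Even this toy case has to reject inputs where the pattern cannot be inferred, which hints at why ellipses in two-dimensional matrix layouts are considerably harder.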
In this talk we discuss one of the essential aspects of mathematical communication: providing interfaces to mathematical environments. This involves mathematical data and knowledge representation, communication protocols and organization of digital libraries. Each of these three subjects represents a complex problem studied extensively for the past decades. We present an overview of the state-of-the-art that has been developed over the course of several research networks connecting two large communities: mathematical knowledge management and mathematical communication. In this talk we report on the main outcomes of these projects that have been achieved both in collaboration and independently by our group at the Ontario Research Centre for Computer Algebra.
First, we briefly address the issues accompanying the problem of mathematical knowledge representation, such as customizing notation for mathematical content and translation between most popular mathematical data formats. We describe how these problems can be approached to ensure conservation of high-level semantic content.
Secondly, we describe an approach to providing interoperability between different mathematical environments. We concentrate on system-independent standards, designed to describe mathematical content and problem domains. These comprise languages and ontologies developed by the MONET (Mathematics On the NET) Consortium to enable unambiguous communication between distributed mathematical components.
Finally, we discuss the problem of creation and organization of databases to store mathematical context. We present our approach to collecting and storing the information about mathematical content, retrieved from web-based mathematical archives. We demonstrate the use of these databases by the example of assisting in on-line mathematical handwriting analysis. Specifically, we show how we use the information derived from the analysis of digital libraries to prioritize character choices in a mathematical handwriting application.
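One simple way to picture the prioritization of character choices is to weight each recognizer candidate by how often that symbol occurs in a corpus. The sketch below does this with add-one smoothing. The function name, the scoring rule and the counts are hypothetical, and the approach presented in the talk may differ.

```python
def rerank(candidates, corpus_counts):
    """Reorder recognizer candidates using symbol frequencies as a prior.

    candidates:    list of (symbol, recognizer_confidence) pairs.
    corpus_counts: dict mapping symbol -> occurrences in a document corpus.
    """
    total = sum(corpus_counts.values())
    vocabulary = len(corpus_counts)

    def score(item):
        symbol, confidence = item
        # Add-one smoothing so unseen symbols keep a small nonzero prior.
        prior = (corpus_counts.get(symbol, 0) + 1) / (total + vocabulary)
        return confidence * prior

    return sorted(candidates, key=score, reverse=True)


# Hypothetical numbers: the recognizer slightly prefers Greek nu,
# but the Latin letter v is far more frequent in the corpus.
candidates = [("ν", 0.55), ("v", 0.45)]
counts = {"v": 900, "ν": 100}
print(rerank(candidates, counts))
```

With these made-up counts the frequent symbol moves to the top even though the recognizer ranked it second. Conditioning the counts on the mathematical subject area or on neighbouring symbols would be a natural refinement.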
Computers and the internet have been a boon to those with visual disabilities. Screen readers and other assistive technology provide access to information that would have previously been inaccessible to these individuals. While mathematical information is harder to access than textual information, significant work has been done to make math accessible.
There are several US and international laws that impose legal mandates on accessible material. These mandates provide an economic incentive for researchers and businesses to provide accessible solutions. These mandates are based on standards such as DAISY, and use MathML to encode math.
Mathematical expressions have been a focus for math accessibility solutions, and much of this work uses MathML:
- MathPlayer works with screen readers to speak math in IE. It also allows for magnification and navigation of expressions, and provides synchronized highlighting.
- MathSpeak provides similar features for digital talking books using their DAISY player.
- Group UMA and the LAMBDA project are working on two-way Braille translators, and many efforts in the US have developed MathML-to-Nemeth code translators.
MathML is not designed to be directly authored. Two programs are focused on authoring/editing:
- ChattyInfty is a self-voicing text and 2D math expression editor which can export to MathML and TeX.
- WinTriangle is RTF-based and presents the expression in a quasi-linear format. WinTriangle maps onto TeX in a straightforward manner; it can also import and export MathML.
Beyond making mathematical expressions accessible, graphs and diagrams need to be made accessible. Solutions range from sonification of the graphs as done in the Accessible Graphing Calculator to tactile graphics via embossing printers or thermoform paper.
Because legacy documents are often not available in electronic formats, OCR techniques are important. InftyReader can read printed documents with math in them and convert them to XHTML+MathML or LaTeX.
As in most search applications, math search involves relevance ranking and hit
packaging. That is, hits must be ordered using quantitative relevance scores, and every hit
must be accompanied by a small amount of qualitative relevance information that
conveys what the hit is about and why it matched.
Determining and quantifying relevance is a very hard problem in text search, and is at
least as hard in math search. The relevance score must factor in not only query terms, but
also a priori information about the hit target such as: (1) whether the target is/has
definitions, notations, graphs, theorems, proofs, and so on; (2) expert-predetermined
weights of certain entities (e.g., concepts, function names, operators, etc.) in the target
document; (3) number of database-wide or Web-wide links pointing to the hit target; and
(4) frequency of user-access to that target.
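To illustrate how query matching and the four kinds of a priori information might be combined, the sketch below forms a weighted sum. The weights, the damping constants and the function signature are hypothetical choices for this example; no particular scoring formula is claimed by the talk.

```python
# Hypothetical weights for each signal; they sum to 1.
WEIGHTS = {
    "query": 0.5,    # how well the query terms match
    "content": 0.2,  # (1) target has definitions, theorems, proofs, ...
    "expert": 0.1,   # (2) expert-assigned weight of entities in the target
    "links": 0.1,    # (3) number of links pointing to the target
    "access": 0.1,   # (4) frequency of user access
}


def damp(count, half_point):
    """Map a non-negative count into [0, 1); equals 0.5 at half_point."""
    return count / (count + half_point)


def relevance(query_match, has_key_content, expert_weight, inbound_links, access_count):
    """Combine the signals; query_match and expert_weight are assumed in [0, 1]."""
    signals = {
        "query": query_match,
        "content": 1.0 if has_key_content else 0.0,
        "expert": expert_weight,
        "links": damp(inbound_links, 10),
        "access": damp(access_count, 100),
    }
    return sum(WEIGHTS[name] * value for name, value in signals.items())


print(relevance(0.8, True, 0.5, inbound_links=10, access_count=100))
```

Damping the raw counts keeps a heavily linked or frequently visited page from swamping the query match. In practice the weights would be tuned against judged results rather than fixed by hand.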
Hit packaging is primarily a process of document summarization that is biased by both
the user-query and the same kinds of a priori information used in relevance scoring. One
way to summarize a document is to partition it into small fragments and select several of
the most relevant fragments; fragments from the document metadata, if any, may also be
included in the summary.
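The fragment-selection approach to summarization can be sketched as below: split the document into fixed-size word windows, score each by query-term matches, and keep the best few in document order. The window size, the scoring rule and the function name are assumptions for illustration, and a priori information and metadata fragments are left out for brevity.

```python
def summarize(document, query_terms, fragment_size=12, top_k=2):
    """Return the top_k fragments of the document most relevant to the query."""
    words = document.split()
    fragments = [
        " ".join(words[i:i + fragment_size])
        for i in range(0, len(words), fragment_size)
    ]
    terms = {term.lower() for term in query_terms}

    def score(fragment):
        # Count words that match a query term, ignoring case and punctuation.
        return sum(
            1 for word in fragment.lower().split()
            if word.strip(".,;:()") in terms
        )

    # sorted() is stable, so tied fragments keep their document order.
    best = sorted(range(len(fragments)), key=lambda i: score(fragments[i]), reverse=True)
    return [fragments[i] for i in sorted(best[:top_k])]


text = "alpha beta gamma delta Bessel function epsilon zeta"
print(summarize(text, ["bessel"], fragment_size=4, top_k=1))
```

For math search, the fragments would more naturally be formulas, definitions or theorem statements than word windows, and the score would also need to account for matched expressions.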
This talk will identify the issues involved in relevance ranking and hit packaging in math
search, and discuss approaches for addressing them.