# From Paper to XML in Mathematics

Monday, April 26, 2004 - 4:00pm - 4:30pm

Keller 3-180

Masakazu Suzuki (Kyushu University)

There are several levels of digitization of mathematics:

level 1: bitmap images of printed materials (e.g. GIF, TIFF),

level 2: searchable digitized document (e.g. PDF with hidden text),

level 3: structured document with links (e.g. HTML(+MathML), LaTeX),

level 4: (partially) executable document (e.g. Mathematica, Maple),

level 5: formally presented document (e.g. Mizar, OMDoc).

Currently, most mathematical knowledge is stored and used mainly as printed material (level 1), such as books or electronic journals.

To be used actively, mathematical text should preferably be stored at as high a level of digitization as possible. However, digitizing documents to a higher level takes considerable effort. This talk gives an overview of the key technologies for moving from level 1 to level 3, their present state, and future problems. The results of our research in this area can be found on the web site http://infty.math.kyushu-u.ac.jp, from which some applications can be downloaded. The talk will include a demonstration of our OCR software, which digitizes mathematical papers into XML in our original format, LaTeX source files, and HTML files with mathematical notation in MathML.
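As a rough illustration of what a level 3 representation involves, the sketch below stores one formula as a small tree and serializes it both as LaTeX and as presentation MathML. It is only a toy under stated assumptions: the class names `Sym`, `Sup`, and `Frac` and the functions `to_latex` and `to_mathml` are hypothetical, and nothing here reflects the actual XML format or algorithms of the OCR software mentioned above.

```python
from dataclasses import dataclass
from typing import Union
from xml.sax.saxutils import escape


# Hypothetical node types for a recognized formula (illustration only).
@dataclass
class Sym:
    """A single identifier or number, e.g. x or 2."""
    text: str


@dataclass
class Sup:
    """A base with a superscript, e.g. x^2."""
    base: "Node"
    exponent: "Node"


@dataclass
class Frac:
    """A fraction with numerator and denominator."""
    num: "Node"
    den: "Node"


Node = Union[Sym, Sup, Frac]


def to_latex(node: Node) -> str:
    """Serialize the tree as a LaTeX math string."""
    if isinstance(node, Sym):
        return node.text
    if isinstance(node, Sup):
        return f"{{{to_latex(node.base)}}}^{{{to_latex(node.exponent)}}}"
    if isinstance(node, Frac):
        return f"\\frac{{{to_latex(node.num)}}}{{{to_latex(node.den)}}}"
    raise TypeError(f"unsupported node: {node!r}")


def to_mathml(node: Node) -> str:
    """Serialize the tree as presentation MathML (without the outer <math> element)."""
    if isinstance(node, Sym):
        # Digit strings become <mn>, everything else is treated as an identifier.
        tag = "mn" if node.text.isdigit() else "mi"
        return f"<{tag}>{escape(node.text)}</{tag}>"
    if isinstance(node, Sup):
        return f"<msup>{to_mathml(node.base)}{to_mathml(node.exponent)}</msup>"
    if isinstance(node, Frac):
        return f"<mfrac>{to_mathml(node.num)}{to_mathml(node.den)}</mfrac>"
    raise TypeError(f"unsupported node: {node!r}")


# The formula 1 / x^2 as a structure tree.
expr = Frac(Sym("1"), Sup(Sym("x"), Sym("2")))
print(to_latex(expr))
print(to_mathml(expr))
```

The point of the sketch is that once the two-dimensional structure of a printed formula has been recovered (the hard part at levels 1 and 2), several structured output formats can be produced from the same tree.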

Statement of Interest: My main research interest is currently the development of practically usable software that reads printed mathematical documents and converts the results into machine-processable digital formats. I am also interested in whether computers in the (near) future might be able to judge, for example, the equivalence or similarity of two differently formulated theorems.
