EXTRAFOR : automatic EXTRAction of mathematical FORmulas

Abstract : We present a method for automatically extracting mathematical formulas from document images without character recognition. A formula is extracted by first locating its most significant symbols, then extending the selection to adjoining symbols using contextual rules until the whole formula region is delimited. Mathematical symbols are labelled using fuzzy logic, from models created during a learning stage. This paper reviews our current efforts to develop such a system, presents the problems we have encountered and summarises our results. The average preliminary labelling rate is about 95.3%, and 90% of mathematical formulas are correctly extracted from high-quality printed documents.
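
To make the locate-then-extend idea concrete, the following minimal Python sketch grows a formula region outward from one seed symbol. It is an illustration only, not the EXTRAFOR implementation: the `Box` type, the `is_seed` flag, the `max_gap` threshold and the purely geometric proximity test are all assumptions standing in for the contextual rules, which the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Bounding box of one connected component (hypothetical representation)."""
    x0: int
    y0: int
    x1: int
    y1: int
    is_seed: bool = False  # assumed flag: labelled as a significant math symbol


def union(a: Box, b: Box) -> Box:
    """Smallest box that covers both a and b."""
    return Box(min(a.x0, b.x0), min(a.y0, b.y0), max(a.x1, b.x1), max(a.y1, b.y1))


def adjoins(region: Box, box: Box, max_gap: int) -> bool:
    """Stand-in contextual rule: box lies within max_gap pixels of the region."""
    h_gap = max(box.x0 - region.x1, region.x0 - box.x1, 0)
    v_gap = max(box.y0 - region.y1, region.y0 - box.y1, 0)
    return h_gap <= max_gap and v_gap <= max_gap


def extract_formula(seed: Box, boxes: list[Box], max_gap: int = 5) -> Box:
    """Grow the region from the seed until no adjoining box remains."""
    region = seed
    remaining = [b for b in boxes if b is not seed]
    grown = True
    while grown:
        grown = False
        for b in list(remaining):
            if adjoins(region, b, max_gap):
                region = union(region, b)
                remaining.remove(b)
                grown = True
    return region


# Usage: a seed operator with one operand on each side and a distant text word.
seed = Box(50, 10, 60, 20, is_seed=True)
left = Box(40, 10, 47, 20)
right = Box(63, 10, 70, 20)
far = Box(120, 10, 130, 20)
formula_region = extract_formula(seed, [left, seed, right, far], max_gap=5)
```

In this sketch the two operands are absorbed into the region while the distant word is left out. A real system would replace the distance test with rules that depend on the label of the seed symbol.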
