LaTeX::TOM
LaTeX :: TeX Object Model, a Perl module

Welcome to the LaTeX::TOM page!

IMPORTANT NOTE: LaTeX::TOM is now being more actively maintained by Steven Schubiger. See the CPAN page.

Overview

LaTeX::TOM stands for "LaTeX::TeX Object Model". It is inspired by the XML::DOM Perl module and, more generally, by the DOM specification for XML. It provides essentially what XML::DOM provides, but for LaTeX documents.

Motivation & Detail

In several of my projects, I needed a way to manipulate and process LaTeX documents that depended heavily on their logical structure. After fumbling around with crude (and increasingly elaborate) regular-expression filters, I finally decided to do what needed to be done: 1) write a parser for LaTeX, and 2) provide an easily manipulable representation of the parsed document. I ended up creating something very much like XML's DOM and XML::DOM, but with natural modifications for LaTeX (which is nowhere near as simple as XML, both syntactically and semantically).

With LaTeX::TOM you can:

  • Easily extract the plain text (the visible, non-math content) from a LaTeX document.
  • Easily get an indexable string representing the plain text of a LaTeX document.
  • Easily extract/detect all math-mode nodes.
  • Apply \newcommand, \def, or \newenvironment mappings.
  • Recursively process \include directives.
  • Recognize and process commands, environments, and groupings.
  • Make arbitrary modifications to the TOM and write out a new LaTeX document.
  • Print out the TOM structure for a LaTeX document.
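To give a feel for how these capabilities look in code, here is a minimal sketch. It assumes the method names documented in the module's POD on CPAN: parse (the in-memory counterpart of parseFile), getNodesByCondition, getNodeType, getCommandName, print and toLaTeX. Only parseFile is confirmed on this page, so please check the other names against the documentation for your installed version. The sketch parses a small inline document, collects the \section command nodes, prints the TOM structure and writes the tree back out as LaTeX.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: method names below follow the LaTeX::TOM POD on CPAN;
# verify them against the version you have installed.
use LaTeX::TOM;

# A small inline document, so the sketch does not depend on a file on disk.
# (For a real file you would call $parser->parseFile('paper.tex') instead.)
my $source = <<'TEX';
\documentclass{article}
\begin{document}
\section{Introduction}
Some text with inline math $x^2$.
\section{Results}
More text.
\end{document}
TEX

# Build a parser and turn the source into a TOM tree.
my $parser = LaTeX::TOM->new;
my $tree   = $parser->parse($source);

# Collect every \section command node anywhere in the tree.
# The type check comes first so getCommandName is only called on commands.
my $sections = $tree->getNodesByCondition(sub {
    my $node = shift;
    return $node->getNodeType eq 'COMMAND'
        && $node->getCommandName eq 'section';
});
printf "found %d \\section commands\n", scalar @$sections;

# Dump the tree structure, then write the (unmodified) tree back out as LaTeX.
$tree->print;
print $tree->toLaTeX;
```

getNodesByCondition takes an arbitrary predicate, so the same pattern covers the other tasks listed above, for example selecting math-mode nodes or ENVIRONMENT nodes, before editing the tree and regenerating LaTeX with toLaTeX.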

Download

  • Version .04 - CPAN release by Steven Schubiger.
  • Version .02c - Bug fixes: handling of newlines and whitespace between commands and their parameters and groups; handling of \w+\d+ commands (thanks to Leo Tenenblat for both); documentation fix: "parseFile", not "parsefile".
  • Version .02b - License included (oops), minor cosmetic code cleanups.

Documentation

An online version of the POD is at the CPAN page.

Contact

I can be reached at akrowne@vt.edu.

What's With The Cat?

I was trying to think of a good name for this module, and "LaTeX::DOM" seemed natural. But even though the module really does work with a "document object model," I decided it could be confusing to call it a "DOM", since that term appears to have been co-opted to describe XML specifically (or is likely to be perceived that way). So I figured it would be more accurate and less confusing to call it a "TeX Object Model", or "TOM" (with the stipulation that a "TOM" is a type of "DOM"). By an amazing coincidence, my (ex-)girlfriend's cat was named "Tom", and I thought it would be cute to make him the mascot for LaTeX::TOM. The logo is based on a picture of Tom. Sadly, Tom passed away a few years ago, so this page is dedicated to his memory!


This page © 2003-2007 Aaron Krowne. Last modified 2007-07-27.