
After Encoding: PhiloLogic TEI workshop

  • tei
  • artfl
  • philologic
  • philomine
  • pair-philoline
  • artfl research
  • future of philologic
  • new at artfl

    After Encoding:
    Searching, Mining and Comparing Your TEI Collections Using the PhiloLogic Tool Set

    Who, What, When, Where

    Presenters: Mark Olsen (ARTFL project, University of Chicago); Helma Dik (Classics, University of Chicago).
    Participants will get an introduction to the use of PhiloLogic, PhiloMine, and PhiloLine on TEI texts, and will have the opportunity for hands-on practice on demonstration databases. Participants should bring their own laptops; wireless internet will be available.
    Date: November 11, 2009 (full day). Detailed schedule below.
    Conference: Text Encoding in the Era of Mass Digitization (conference home page); registration. Minimum number of participants: 8.
    Registration addendum: please also see below, enhanced participation, for important information.

    Introduction

    The Text Encoding Initiative has established the community standard for the mark-up of textual data in the humanities and related disciplines. A standard encoding scheme allows various teams to develop software for a wide variety of functions, including transformation of encoding from one specification to another, analysis, search, and publication. Systems that can manage TEI datasets differ in their capabilities, system requirements, and the features they offer end users.

    The PhiloLogic Tool Set consists of open source implementations of three distinct packages. PhiloLogic is the main text analysis package, released by the ARTFL Project in 2004 with subsequent updates. It supports a wide range of text search and analysis functions and can be used for many document types and languages, from South Asian dictionaries to collections of modern letters. It is designed to function on relatively large collections of materials, up to the mid tens of thousands of documents, and it is aimed at a relatively sophisticated user community rather than directly at end-users or desktop applications. PhiloLogic is being used by a range of projects and institutions around the world, such as the Brown University Women Writers Project and the Indica and Buddica site in New Zealand.

    In addition to PhiloLogic, which supports fairly traditional text analysis functions, the ARTFL Project has released extensions to support text mining and similar passage identification. PhiloMine is an interactive text mining and machine learning system, which supports a wide variety of functions, including comparative machine learning, classification, and clustering applications. PhiloLine is an extension to PhiloLogic which supports similar passage identification using a simple sequence alignment algorithm, which is used to identify textual reuse across large collections.

    Complete descriptions, demonstrations, documentation, slide shows and, of course, source code are available at the links in the menu on this page.
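
    To give a rough sense of what similar passage identification involves, the sketch below finds word sequences shared by two short texts. It is not PhiloLine's algorithm or code: it uses Python's standard difflib module as a stand-in for sequence alignment, and the function names and the min_words threshold are hypothetical choices made for illustration only.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def shared_passages(text_a, text_b, min_words=4):
    """Return word sequences of at least min_words shared by both texts.

    Illustrative only: difflib's matcher stands in for a real
    sequence alignment tool and will not scale to large collections.
    """
    a = tokenize(text_a)
    b = tokenize(text_b)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    passages = []
    for block in matcher.get_matching_blocks():
        # Keep only matches long enough to suggest reuse rather than chance.
        if block.size >= min_words:
            passages.append(" ".join(a[block.a:block.a + block.size]))
    return passages


source = "In the beginning the spirit of the laws was widely read by everyone"
later = "Critics noted that the spirit of the laws was widely read in Paris"
print(shared_passages(source, later))
# prints ['the spirit of the laws was widely read']
```

    A production tool would also need to tolerate small variations (spelling, inserted words) and to index passages so that every document need not be compared with every other; this sketch does neither.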

    For a full-day workshop, we propose a combination of relatively formal presentations, hands-on use of previously installed databases under all three packages, informal presentations concerning current research and development efforts, and hands-on PhiloLogic data loading with particular emphasis on configuration and customization.

    Schedule (subject to change)

    Morning
    The morning session aims to give those already familiar with TEI a full 'power-user' overview of the PhiloLogic tool set. We would argue that mark-up is truly useful when it can be leveraged in a search and retrieval system, and we show what kinds of leverage PhiloLogic affords the user in a variety of databases.

    9-9:15 Introductions.
    9:15-10 Design Overview: Outline of PhiloLogic, PhiloMine and PhiloLine. When not to use PhiloLogic (Mark).
    10-10:45 Using PhiloLogic Tools (Helma and hands-on use; includes PhiloMine and PhiloLine).
    10:45-11:15 Special use cases (dramatic texts, narrative texts, reference texts); debugging your XML with PhiloLogic?! (Helma)
    11:15-12 PhiloLogic Extensions (Helma, morphology extensions; Mark, similarity searching, virtual normalization, frequencies).

    Afternoon
    The afternoon session gets down to the business of loading and modifying databases under PhiloLogic. As we show in the morning session, databases vary in the kinds of features that are of interest to their users. In the afternoon, we show how a database can go from an initial default load to a smart system that fits your project requirements. We look at the quite straightforward installation of the sequence alignment tool, PhiloLine, and conclude with a discussion of future directions of PhiloLogic.

    1-2 Loading a PhiloLogic database (Mark and Helma, demonstration and hands-on).
    2-3 Tweaking a standard PhiloLogic load: localization, modification, etc. (Mark and Helma, demonstration and hands-on).
    3-4 Extending the toolset: installation and experimentation with PhiloLine (Mark and Helma, demonstration and hands-on).
    4-5 PhiloLogic: future. Towards a service-oriented architecture (Mark).

    Enhanced participation

    We welcome all interested parties. Participants may, if they choose, experiment with their own documents in two ways.
    The first is to submit a set of up to 10 TEI-encoded XML documents to Mark Olsen, 4 weeks in advance. Mark will run a basic load; participants will be walked through editing search forms and other refinements at the workshop. Obviously, Mark can only do this for a limited number of participants, on a first-come, first-served basis.
    The second is to download and install PhiloLogic (as well as PhiloMine/PhiloLine) on a Mac OS X or Linux system. For this second option, and particularly in the case of PhiloMine, participants should not be novices when it comes to the Unix command line. Mark will be happy to provide assistance, but again, installation should be initiated and requests for help made no later than 4 weeks in advance of the workshop. Participants can consult with Mark in the month leading up to the workshop on implementation refinements. Again, this would be "within reason" and on a first-come, first-served basis.

    Readings

    The best way to prepare for the workshop is to explore the various links in the menu above. A basic introduction to PhiloLogic, sample databases and reports on recent work at ARTFL can all be found there. Participants interested in getting their own database loaded by us, or doing their own PhiloLogic install, should initiate this process and get in touch with Mark Olsen for any assistance they need as soon as possible.