Implementing Topic Maps
Roger Sperberg
TopicalWeb
A little about Topic Maps
- Basics from the user standpoint
- Topics and metadata and XML and databases
- Uses
- Semantic tools
From the user standpoint
- Where could I use Topic Maps in a publishing environment
- Taxonomy (hierarchical relation — one. Period.)
- Thesaurus (more than one relation)
- Ontology (all relationships, not just hierarchical)
- Where could I use a relationship technology in an information environment
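One way to picture the difference between the three is as lists of (subject, relation, object) triples that allow progressively more relation types. The sketch below is only an illustration: the terms and the relation names ("broader", "related", "used-for", "wrote", "instance-of") are assumptions chosen for the example, not part of any standard vocabulary referenced in these slides.

```python
from collections import defaultdict

# Each relationship is a (subject, relation, object) triple.
# All terms and relation names here are illustrative assumptions.
taxonomy = [
    ("novel", "broader", "fiction"),
    ("fiction", "broader", "literature"),
]

# A thesaurus keeps the hierarchy and adds a few more relation types.
thesaurus = taxonomy + [
    ("novel", "related", "short story"),
    ("novel", "used-for", "long-form fiction"),
]

# An ontology admits any relation the domain needs.
ontology = thesaurus + [
    ("Jane Austen", "wrote", "Emma"),
    ("Emma", "instance-of", "novel"),
]


def relation_types(triples):
    """Return the set of distinct relation names used in a triple list."""
    return {relation for _, relation, _ in triples}


def ancestors(term, triples):
    """Follow 'broader' links upward from a term (the hierarchical axis)."""
    parents = defaultdict(list)
    for subj, rel, obj in triples:
        if rel == "broader":
            parents[subj].append(obj)
    found, stack = [], [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in found:
                found.append(parent)
                stack.append(parent)
    return found


print(relation_types(taxonomy))
print(ancestors("novel", ontology))
```

The hierarchical walk works unchanged on all three lists, which is the point: the richer structures extend the taxonomy rather than replace it.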
Why not XML?
- Between elements and attributes, maybe you can represent all the information a TM has
- But wait till we get a little farther along ...
Metadata
- Maybe we want to connect ... documents, let's say ...
- Based on their contents — their semantics, you might say, the meaning contained in the document and not just the strings therein
- By topic — "What is this document about?"
- By words/phrases/entities also contained in other documents
The library metaphor
- Author index
- Title index
- Subject index
- Author and title are "metadata"
- Subject is "topic map" matter
Library metaphor continued
- Except metadata and topic map matter are the same thing
We are all librarians now
- The system needs to permit the user:
- To locate a specific document when certain attributes (such as the title, author or date of publication) are known in advance
This is the finding objective
- To locate a set of documents representing
- All the documents from the same author or organization or governmental entity
- All the documents as part of the same series
- Or in the same time frame
- Or on the same subject
- All the linked or cited documents
This is the collocative objective
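The finding and collocative objectives can be sketched as two small queries over a catalog. This is a minimal illustration, assuming a hypothetical Record type and an invented three-entry catalog; none of the titles, authors or field names come from a real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """A hypothetical catalog entry; the field names are assumptions."""
    title: str
    author: str
    year: int
    subjects: tuple


# Invented sample data, for illustration only.
CATALOG = [
    Record("Topic Maps in Practice", "A. Author", 2004, ("topic maps",)),
    Record("Indexing Basics", "A. Author", 2001, ("indexing",)),
    Record("Graphs for Publishers", "B. Writer", 2004, ("topic maps", "graphs")),
]


def find(catalog, **known):
    """Finding objective: locate records whose known attributes match exactly."""
    return [r for r in catalog
            if all(getattr(r, k) == v for k, v in known.items())]


def collocate(catalog, attribute, value):
    """Collocative objective: gather every record sharing one characteristic."""
    def matches(record):
        field = getattr(record, attribute)
        # Multi-valued fields (subjects) match if they contain the value.
        return value in field if isinstance(field, tuple) else field == value
    return [r for r in catalog if matches(r)]


print(find(CATALOG, title="Indexing Basics", author="A. Author"))
print([r.title for r in collocate(CATALOG, "subjects", "topic maps")])
```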
We are all librarians now
- The system needs to permit the user:
- To choose among different types of documents, which are more or less suitable to the user’s needs
This is the choice objective
- To acquire access to the document, through electronic delivery on-screen, download, printing, faxing, or other mechanism, in real time even if not pre-arranged
This is the acquisition objective
- To navigate the collection—that is, to find documents related to a given document or to a given subject by generalization, association and aggregation, or to travel along axes of equivalence, association or hierarchy
This is the navigation objective
Knowledge organization
- Relationship technologies provide better models for knowledge organization
- You might model the knowledge structures and rules of your information domain in an ontology
- All the concepts, and details about them
- Plus all the relations between the concepts, and all the rules that apply to them
Why not a database?
- Every data point (topic, occurrence) can be represented in a database
- Relationships can be represented as well
- But wait till we get a little farther along ...
Mix and match
- Relationship technologies work especially well:
- When we need to combine disparate sets of information
- Because databases are just not flexible enough to merge in material whose underlying schema keeps changing
An example user with the ideal needs profile
- Intelligence agency
- Own information in one form
- From FBI in another form
- From NYC police department, etc., in other forms
- Newspaper reports, free-form but need to be incorporated
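The merging idea behind this example can be sketched as follows: each source keeps its own record layout, every record is mapped onto a common topic shape, and topics that share a subject identifier are merged. Everything here is an assumption made for illustration: the field names, the sample records, and above all the naive choice of a lowercased name as the subject identifier, which a real system would replace with something far more reliable.

```python
def normalize(record, name_field, source):
    """Map one source's record layout onto a common topic shape.

    Using the lowercased name as the subject identifier is a deliberate
    simplification for this sketch.
    """
    name = record[name_field].strip()
    return {
        "subject_id": name.lower(),
        "names": {name},
        "occurrences": [(source, record)],
    }


def merge(topics):
    """Merge topics that share a subject identifier."""
    merged = {}
    for topic in topics:
        target = merged.setdefault(
            topic["subject_id"],
            {"subject_id": topic["subject_id"], "names": set(), "occurrences": []},
        )
        target["names"] |= topic["names"]
        target["occurrences"] += topic["occurrences"]
    return merged


# Three hypothetical sources, each with its own schema.
agency = [{"subject": "J. Doe", "file": "A-17"}]
bureau = [{"full_name": "j. doe", "case_no": 4521}]
police = [{"suspect": "R. Roe", "precinct": 12}]

topics = (
    [normalize(r, "subject", "agency") for r in agency]
    + [normalize(r, "full_name", "bureau") for r in bureau]
    + [normalize(r, "suspect", "police") for r in police]
)
merged = merge(topics)
print(sorted(merged))
```

Note that adding a fourth source only requires one more normalize call; no shared schema has to be redesigned.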
Graphs
In a graph, an arc connects one node to another
Graphs as Trees
A graph can represent a tree (XML)
Graphs as Tables
A graph can represent a table (RDBMS)
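As a rough illustration of the last two slides, both a tree and a table can be flattened into the same kind of arcs (node, label, node). The sketch below uses Python's standard xml.etree.ElementTree parser; the path-style node names, the "child" arc label and the sample data are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def tree_to_arcs(xml_text):
    """Return (parent, 'child', child) arcs for every element in an XML tree."""
    arcs = []

    def walk(element, path):
        for index, child in enumerate(element):
            # Path-style names keep repeated tags distinct (an assumption).
            child_path = f"{path}/{child.tag}[{index}]"
            arcs.append((path, "child", child_path))
            walk(child, child_path)

    root = ET.fromstring(xml_text)
    walk(root, f"/{root.tag}")
    return arcs


def table_to_arcs(rows, key):
    """Return (row key, column name, cell value) arcs for every table cell."""
    return [(row[key], column, value)
            for row in rows
            for column, value in row.items()
            if column != key]


xml_arcs = tree_to_arcs("<book><title/><chapter><para/></chapter></book>")
table_arcs = table_to_arcs(
    [{"id": "doc1", "author": "A. Author", "subject": "topic maps"}], key="id")

for arc in xml_arcs + table_arcs:
    print(arc)
```

Once both are lists of arcs, they can sit in one graph and be queried together.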
What a tree or table can't represent easily
And a little about semantic tools
- Tools that categorize, extract concepts, summarize
- Some leading players:
- Teragram
- Autonomy/Verity
- ClearForest
- Inxight
Teragram
- Classifier
- served by Catcon server
- to TIW thin clients
- Build rules that identify the topic of a document
- Appearance of words/phrases
- Frequency, position, role in sentence
- Boolean rules
- Distance
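To make the rule ingredients concrete, here is a minimal sketch of a topic rule that combines word appearance, frequency, a Boolean condition and a distance test. It is written in plain Python and is not Teragram's rule syntax; the helper names and the rule itself are hypothetical, and position and sentence role are left out to keep the example short.

```python
import re


def tokens(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def frequency(words, term):
    """Count how many times a term appears."""
    return words.count(term)


def within(words, first, second, distance):
    """True if the two terms occur no more than `distance` tokens apart."""
    first_positions = [i for i, w in enumerate(words) if w == first]
    second_positions = [i for i, w in enumerate(words) if w == second]
    return any(abs(i - j) <= distance
               for i in first_positions for j in second_positions)


def is_about_topic_maps(text):
    """Hypothetical rule:
    ('topic' near 'map' or 'maps') AND ('topic' at least twice) AND NOT 'geography'.
    """
    words = tokens(text)
    near = within(words, "topic", "maps", 2) or within(words, "topic", "map", 2)
    return near and frequency(words, "topic") >= 2 and "geography" not in words


print(is_about_topic_maps(
    "Topic maps connect documents. Each topic can have occurrences."))
print(is_about_topic_maps("Road maps are a topic in geography."))
```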
Teragram
- Concept Extractor
- served by Catcon server
- to TIW thin clients
- Three types of concepts
- Authority lists
- Regexes
- Grammar
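The three concept types can be sketched with Python's standard re module. This is an illustration under stated assumptions, not the Concept Extractor's actual behavior: the authority list entries, the date pattern and the sample sentence are invented, and the "grammar" rule is only a toy regular expression standing in for real grammatical analysis.

```python
import re

# Authority list: known names to look up verbatim (illustrative entries).
AUTHORITY_LIST = {"FBI", "TopicalWeb", "Teragram"}

# Regex-defined concept: ISO-style dates such as 2005-03-14.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

# Toy stand-in for a grammar rule: an honorific followed by a capitalized word.
PERSON_PATTERN = re.compile(r"\b(?:Mr|Ms|Dr)\. [A-Z][a-z]+")


def extract_concepts(text):
    """Return (concept type, matched text) pairs found in the text."""
    found = []
    for name in sorted(AUTHORITY_LIST):
        if re.search(rf"\b{re.escape(name)}\b", text):
            found.append(("authority", name))
    found += [("date", m) for m in DATE_PATTERN.findall(text)]
    found += [("person", m) for m in PERSON_PATTERN.findall(text)]
    return found


print(extract_concepts("Dr. Smith briefed the FBI on 2005-03-14."))
```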
Teragram TK240
- Administrative tool
- Use for writing rules
- Allows for testing of rules interactively
- For both the concept extractor and the categorizer
- Uses same operators in rules for both
Categorizer rules
Categorizer results
Circling back around
- Documents are topics
- The subjects of documents (i.e., what they are about) are topics
- Pieces of metadata are topics
- Topics are connected
- Topic Maps (the standard) offers straightforward ways to model this and to store, query and extend the information
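The points on this slide can be sketched as a very small data model in which documents, subjects and pieces of metadata are all topics, joined by typed associations. This is a simplified illustration, not an implementation of the Topic Maps standard: the class names, method names, association types and sample identifiers are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Anything we want to talk about: a document, a subject, an author."""
    topic_id: str
    name: str
    occurrences: list = field(default_factory=list)  # e.g. file paths or URLs


@dataclass
class TopicMap:
    topics: dict = field(default_factory=dict)
    associations: list = field(default_factory=list)  # (type, topic id, topic id)

    def add_topic(self, topic_id, name, occurrence=None):
        """Create the topic if needed and optionally attach an occurrence."""
        topic = self.topics.setdefault(topic_id, Topic(topic_id, name))
        if occurrence:
            topic.occurrences.append(occurrence)
        return topic

    def associate(self, assoc_type, first, second):
        """Connect two existing topics with a typed association."""
        self.associations.append((assoc_type, first, second))

    def related(self, topic_id, assoc_type):
        """Return ids of topics linked to topic_id by the given association type."""
        return [b if a == topic_id else a
                for kind, a, b in self.associations
                if kind == assoc_type and topic_id in (a, b)]


tm = TopicMap()
tm.add_topic("doc-1", "A sample document", occurrence="docs/sample.xml")  # placeholder path
tm.add_topic("subj-tm", "Topic Maps")
tm.add_topic("author-x", "A. Author")
tm.associate("is-about", "doc-1", "subj-tm")
tm.associate("written-by", "doc-1", "author-x")

print(tm.related("subj-tm", "is-about"))
print(tm.related("doc-1", "written-by"))
```

Because the author and the subject are topics in their own right, the same query works in either direction: from a document to its subject, or from a subject to every document about it.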
Contact info
Roger Sperberg
TopicalWeb
email: firstinitial lastname at gmail