Zebra - User's Guide and Reference

Adam Dickmeiss

Heikki Levanto

Marc Cromme

Mike Taylor

Sebastian Hammer

Abstract

Zebra is a free, fast, friendly information management system. It can index records in XML/SGML, MARC, e-mail archives and many other formats, and quickly find them using a combination of boolean searching and relevance ranking. Search-and-retrieve applications can be written using APIs in a wide variety of languages, communicating with the Zebra server using industry-standard information-retrieval protocols or web services.

This manual explains how to build and install Zebra, configure it appropriately for your application, add data and set up a running information service. It describes version 2.0.0 of Zebra.


Table of Contents

1. Introduction
1. Overview
2. Features
3. References and Zebra based Applications
3.1. Koha free open-source ILS
3.2. Emilda open source ILS
3.3. ReIndex.Net web based ILS
3.4. DADS - the DTV Article Database Service
3.5. Infonet Eprints
3.6. Alvis
3.7. ULS (Union List of Serials)
3.8. NLI-Z39.50 - a Natural Language Interface for Libraries
3.9. Various web indexes
4. Support
5. Future Directions
2. Installation
1. UNIX
2. GNU/Debian
2.1. GNU/Debian Linux on i686 Platform
2.2. Ubuntu/Debian and GNU/Debian on other platforms
3. WIN32
4. Upgrading from Zebra version 1.3.x
3. Quick Start
4. Example Configurations
1. Overview
2. Example 1: XML Indexing And Searching
3. Example 2: Supporting Interoperable Searches
5. Overview of Zebra Architecture
1. Local Representation
2. Main Components
2.1. Core Zebra Libraries Containing Common Functionality
2.2. Zebra Indexer
2.3. Zebra Searcher/Retriever
2.4. YAZ Server Frontend
2.5. Record Models and Filter Modules
2.5.1. TEXT Record Model and Filter Module
2.5.2. GRS Record Model and Filter Modules
2.5.3. ALVIS Record Model and Filter Module
3. Indexing and Retrieval Workflow
6. Query Model
1. Query Model Overview
1.1. Query Languages
1.1.1. Prefix Query Format (PQF)
1.1.2. Common Query Language (CQL)
1.2. Operation types
1.2.1. Explain Operation
1.2.2. Search Operation
1.2.3. Scan Operation
2. Prefix Query Format syntax and semantics
2.1. PQF tree structure
2.1.1. Attribute sets
2.1.2. Boolean operators
2.1.3. Atomic queries (APT)
2.1.4. Named Result Sets
2.1.5. Zebra's special access point of type 'string'
2.1.6. Zebra's special access point of type 'XPath' for GRS filters
2.2. Explain Attribute Set
2.2.1. Use Attributes (type = 1)
2.2.2. Explain searches with yaz-client
2.3. Bib1 Attribute Set
2.3.1. Use Attributes (type 1)
2.4. Zebra general Bib1 Non-Use Attributes (type 2-6)
2.4.1. Relation Attributes (type 2)
2.4.2. Position Attributes (type 3)
2.4.3. Structure Attributes (type 4)
2.4.4. Truncation Attributes (type = 5)
2.4.5. Completeness Attributes (type = 6)
3. Advanced Zebra PQF Features
3.1. Zebra specific retrieval of all records
3.2. Zebra specific Search Extensions to all Attribute Sets
3.2.1. Zebra Extension Embedded Sort Attribute (type 7)
3.2.2. Zebra Extension Rank Weight Attribute (type 9)
3.2.3. Zebra Extension Approximative Limit Attribute (type 11)
3.2.4. Zebra Extension Term Reference Attribute (type 10)
3.3. Zebra specific Scan Extensions to all Attribute Sets
3.3.1. Zebra Extension Result Set Narrow (type 8)
3.3.2. Zebra Extension Approximative Limit (type 11)
3.4. Zebra special IDXPATH Attribute Set for GRS indexing
3.4.1. IDXPATH Use Attributes (type = 1)
3.5. Mapping from PQF atomic APT queries to Zebra internal register indexes
3.5.1. Mapping of PQF APT access points
3.5.2. Mapping of PQF APT structure and completeness to register type
3.6. Zebra Regular Expressions in Truncation Attribute (type = 5)
4. Server Side CQL to PQF Query Translation
7. Administrating Zebra
1. Record Types
2. The Zebra Configuration File
3. Locating Records
4. Indexing with no Record IDs (Simple Indexing)
5. Indexing with File Record IDs
6. Indexing with General Record IDs
7. Register Location
8. Safe Updating - Using Shadow Registers
8.1. Description
8.2. How to Use Shadow Register Files
9. Relevance Ranking and Sorting of Result Sets
9.1. Overview
9.2. Static Ranking
9.3. Dynamic Ranking
9.3.1. Dynamically ranking using PQF queries with the 'rank-1' algorithm
9.3.2. Dynamically ranking CQL queries
9.4. Sorting
10. Extended Services: Remote Insert, Update and Delete
10.1. Extended services in the Z39.50 protocol
10.2. Extended services from yaz-client
10.3. Extended services from yaz-php
11. YAZ Frontend Virtual Hosts
8. GRS Record Model and Filter Modules
1. GRS Record Filters
1.1. GRS Canonical Input Format
1.1.1. Record Root
1.1.2. Variants
1.2. GRS REGX And TCL Input Filters
2. GRS Internal Record Representation
2.1. Tagged Elements
2.2. Variants
2.3. Data Elements
3. GRS Record Model Configuration
3.1. The Abstract Syntax
3.2. The Configuration Files
3.3. The Abstract Syntax (.abs) Files
3.4. The Attribute Set (.att) Files
3.5. The Tag Set (.tag) Files
3.6. The Variant Set (.var) Files
3.7. The Element Set (.est) Files
3.8. The Schema Mapping (.map) Files
3.9. The MARC (ISO2709) Representation (.mar) Files
3.10. Field Structure and Character Sets
3.10.1. The default.idx file
3.10.2. The character map file format
3.10.3. Ignoring leading articles
4. GRS Exchange Formats
9. ALVIS XML Record Model and Filter Module
1. ALVIS Record Filter
1.1. ALVIS Internal Record Representation
1.2. ALVIS Canonical Indexing Format
2. ALVIS Record Model Configuration
2.1. ALVIS Indexing Configuration
2.2. ALVIS Exchange Formats
2.3. ALVIS Filter OAI Indexing Example
10. Running the Maintenance Interface (zebraidx)
11. The Z39.50 Server
1. Running the Z39.50 Server (zebrasrv)
1.1. Description
1.2. Synopsis
1.3. Options
1.4. Files
1.5. See Also
2. Z39.50 Protocol Support and Behavior
2.1. Initialization
2.2. Search
2.3. Present
2.4. Scan
2.5. Sort
2.6. Close
2.7. Explain
12. The SRU/SRW Server
1. Running the SRU Server (zebrasrv)
2. SRU and SRW Protocol Support and Behavior
2.1. Search and Retrieval
2.2. Scan
2.3. Explain
2.4. Some SRU Examples
2.5. Initialization, Present, Sort, Close
A. License
1. GNU General Public License
B. About Index Data and the Zebra Server

List of Tables

6.1. Attribute sets predefined in Zebra
6.2. Boolean operators
6.3. Atomic queries (APT)
6.4. Relation Attributes (type 2)
6.5. Position Attributes (type 3)
6.6. Structure Attributes (type 4)
6.7. Truncation Attributes (type 5)
6.8. Completeness Attributes (type = 6)
6.9. Zebra Search Attribute Extensions
6.10. Zebra Scan Attribute Extensions
6.11. Zebra specific IDXPATH Use Attributes (type 1)
6.12. Access point name mapping
6.13. Structure and completeness mapping to register types
6.14. Regular Expression Operands
6.15. Regular Expression Operators
7.1. Extended services Z39.50 Package Fields