HdtDep is a search engine for a treebank consisting of the first book of Herodotus' Histories. The treebank is encoded in an XML file based on A. Godley's Loeb edition (1920), available under a Creative Commons Attribution-ShareAlike 3.0 United States License on the Perseus Project website. All typos have been corrected.
The Greek characters are encoded in Unicode (UTF-8). The XML file is structured in <chapter> and <sentence> nodes, which contain <word> nodes. All punctuation has been removed. Since Unicode encodes graphemes with different diacritics as distinct characters, all grave accents have been turned into acutes (in order to improve searchability). Enclisis accents have been removed. All elided vowels have been restored. Moreover, all crasis forms have been resolved into uncontracted words, in order to correctly represent their syntactic relationships.
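The grave-to-acute normalisation described above can be sketched in Python as follows. This is only an illustration of the idea, assuming a decompose-replace-recompose approach with the standard `unicodedata` module; it is not necessarily how the treebank itself was prepared, and the function names are hypothetical.

```python
import unicodedata

GRAVE = "\u0300"  # combining grave accent
ACUTE = "\u0301"  # combining acute accent


def grave_to_acute(text: str) -> str:
    """Replace every grave accent with an acute, keeping other diacritics."""
    # NFD splits each precomposed letter into base letter + combining marks,
    # so the accent can be swapped without touching breathings or iota subscripts.
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFC", decomposed.replace(GRAVE, ACUTE))


def strip_punctuation(text: str) -> str:
    """Drop every character whose Unicode general category is punctuation."""
    return "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )
```

A search string typed by a user could be passed through the same functions, so that a word accented with a grave in running text still matches its acute-accented form in the treebank.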
The syntactic structure of the sentences has been described by applying an adapted version of Igor Mel'čuk's dependency theory (Mel'čuk 1988: Dependency Syntax: Theory and Practice, Albany; Mel'čuk 2009: 'Dependency in Natural Language', in A. Polguère & I. Mel'čuk, Dependency in Linguistic Description, Amsterdam - Philadelphia: 1-110; Vatri 2011: Syntactic dependencies in Classical Greek [submitted]). Each word is annotated with the element it depends on and its grammatical category/sub-category (see below). Nouns, adjectives and verbs also contain the Attic lexical entry under which they appear in LSJ. The syntactic relationship types, whose interpretation is highly theory-dependent, have not been encoded.
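As a rough illustration of how such an annotation could be read programmatically, the sketch below parses a small invented fragment with Python's standard `xml.etree.ElementTree`. Only the <chapter>, <sentence> and <word> element names come from the description above; the attribute names (`n`, `id`, `head`, `cat`, `lemma`), the use of 0 for the head of the sentence, and the sample annotation are assumptions made for this example, not the treebank's documented schema.

```python
import xml.etree.ElementTree as ET

# Invented fragment: attribute names and values are hypothetical.
SAMPLE = """
<chapter n="1">
  <sentence n="1">
    <word id="1" head="2" cat="NOU PPR" lemma="Κροῖσος">Κροῖσος</word>
    <word id="2" head="0" cat="VER FVE" lemma="εἰμί">ἦν</word>
    <word id="3" head="2" cat="NOU PPR" lemma="Λυδός">Λυδός</word>
  </sentence>
</chapter>
"""


def load_words(xml_text):
    """Return one dict per <word> node, in document order."""
    root = ET.fromstring(xml_text.strip())
    return [
        {
            "id": int(node.get("id")),
            "head": int(node.get("head")),  # 0 = head of the sentence (assumed)
            "cat": node.get("cat"),
            "lemma": node.get("lemma"),
            "form": node.text,
        }
        for node in root.iter("word")
    ]


def dependents_of(words, head_id):
    """List the words that depend directly on the word with the given id."""
    return [w for w in words if w["head"] == head_id]


words = load_words(SAMPLE)
root_word = next(w for w in words if w["head"] == 0)
print(root_word["form"], [w["form"] for w in dependents_of(words, root_word["id"])])
```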
The project was presented at the 2011 Digital Classicist seminar in London. Further technical details on the project are available in the published presentation.
1. Search page
Add new element
The button adds a new element to the query.
The elements will be searched in the exact order in which they appear in the list.
Show category labels
If the box is ticked, the grammatical categories of the searched elements will be shown in the search results. The abbreviations are listed in section 3 (Abbreviations).
ID
The figure identifying the element in the query.
Category
Select the grammatical category to which the element belongs. The list also contains the following special categories:
- Anything: any word.
- Same as: a word belonging to the same category (whatever it may be) as a preceding 'Anything' element. The user will be prompted to input the ID of the 'Anything' element.
- Input word (Unicode): the user will be prompted to input a string in Unicode format. Word limits can be marked by # (hash symbol).
- Lexical entry: the lexeme list will be loaded, and all forms belonging to the selected lexeme will be included in the query.
- Endotactic clause: parenthetical clauses have been encoded as independent sentences and replaced by a signpost in their host sentence. This option searches for all such signposts.
- Noun or pronoun: all nouns (including substantivized elements) and pronouns.
- Adjective or adverb: all adjectives and adverbs.
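The 'Input word (Unicode)' option can be pictured with the sketch below. It assumes one plausible reading of the # marker: a leading # anchors the string at the beginning of a word, a trailing # at its end, and a string without # matches anywhere inside a word. The actual behaviour of the search engine may differ, and the function names are hypothetical.

```python
import re


def pattern_to_regex(user_input: str) -> "re.Pattern":
    """Translate a search string with optional '#' word limits into a regex."""
    starts = user_input.startswith("#")
    ends = user_input.endswith("#") and len(user_input) > 1
    core = user_input.strip("#")
    # re.escape keeps the typed characters literal; only '#' is special here.
    return re.compile(
        ("^" if starts else "") + re.escape(core) + ("$" if ends else "")
    )


def matches(user_input: str, word: str) -> bool:
    """True if the word satisfies the (assumed) '#' word-limit semantics."""
    return pattern_to_regex(user_input).search(word) is not None
```

Under this reading, `λογ` would match inside `διάλογος`, `#λογ` would not (the word does not begin with it), and `ος#` would match the ending.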
Morph. button
The Morph. button prompts users to select the desired morphological categories to be included in the query for each word. N.B.: all selected categories have to occur together for that word to be returned as a result (boolean AND). When one or more categories are selected, the text in the button will be displayed in red.
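The boolean AND can be sketched as a subset test: a word qualifies only if every selected category is among its own. The feature labels used here ("accusative", "plural", and so on) are placeholders chosen for the example, not the labels used by the search page.

```python
def has_all_features(word_features: set, selected: set) -> bool:
    """True only if every selected category is present on the word (AND)."""
    return selected <= word_features


# Hypothetical morphological annotation of a single word.
word = {"accusative", "plural", "masculine"}

print(has_all_features(word, {"accusative", "plural"}))    # both present
print(has_all_features(word, {"accusative", "singular"}))  # one is missing
```

An empty selection is a subset of any annotation, which corresponds to leaving the morphology unconstrained.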
Dependency
Users can type the ID of the element the word must depend on. If left blank, dependency information for that word will be ignored. If the word must be the head of the sentence, the ID to be entered is 0. If the operator 'not' is added before the ID (with a space), the search engine will return words that do not depend on the specified ID. The operator '=' can be used to specify that the word must depend on the same word in the sentence (which need not be entered) as the element indicated by the ID.
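The four possible contents of the Dependency field (blank, an ID, 'not' plus an ID, '=' plus an ID) can be modelled as below. This is a sketch under assumptions: the names `DepConstraint`, `parse_dependency` and `satisfies` are hypothetical, word positions are taken to start at 1 with 0 standing for the head of the sentence, and the engine's real parsing rules are not documented here.

```python
from typing import NamedTuple, Optional


class DepConstraint(NamedTuple):
    mode: str               # "any", "on", "not" or "same"
    target: Optional[int]   # query element ID (0 = head of the sentence)


def parse_dependency(field: str) -> DepConstraint:
    """Interpret the text typed into the Dependency field."""
    text = field.strip()
    if not text:
        return DepConstraint("any", None)   # blank: ignore dependencies
    if text.lower().startswith("not "):
        return DepConstraint("not", int(text[4:]))
    if text.startswith("="):
        return DepConstraint("same", int(text[1:]))
    return DepConstraint("on", int(text))


def satisfies(constraint: DepConstraint, word_head: int, bound: dict) -> bool:
    """Check one candidate word against a constraint.

    word_head: position of the word the candidate depends on (0 = none).
    bound: maps each already matched element ID to a pair
           (position of the matched word, position of that word's head).
    """
    if constraint.mode == "any":
        return True
    if constraint.mode == "same":
        return word_head == bound[constraint.target][1]
    governor = 0 if constraint.target == 0 else bound[constraint.target][0]
    if constraint.mode == "on":
        return word_head == governor
    return word_head != governor  # mode == "not"
```

For example, if element 1 has been matched to the word at position 4, which itself depends on the word at position 2, then a candidate depending on position 4 satisfies `1`, and a candidate depending on position 2 satisfies `=1`.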
Contiguous
Specify if the word must or must not immediately follow the previous element. If this is not relevant, the option 'Both' must be selected (as it is by default).
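Combined with the rule that elements are searched in the order in which they appear in the list, the Contiguous setting amounts to a check on word positions. The sketch below assumes three option values, "must", "must not" and "Both"; only 'Both' is named on the search page, so the other two labels and the function name are hypothetical.

```python
def contiguity_ok(prev_pos: int, pos: int, option: str = "Both") -> bool:
    """Check a word's position against the previously matched element."""
    if pos <= prev_pos:
        return False  # elements must occur in the order of the query list
    adjacent = pos == prev_pos + 1
    if option == "must":
        return adjacent
    if option == "must not":
        return not adjacent
    return True  # "Both": adjacency is irrelevant
```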
2. Search results
The results are listed in the order in which they occur in the text. The sentence node and the chapter they belong to are also displayed. Clicking on the "sentence" link loads the entire sentence and displays it in a table. The first row contains the exact wording, the second contains the grammatical category to which each word belongs (see the abbreviation list in section 3), and the third shows the element each word depends on. On suitable browsers (not on Internet Explorer), a "View" link will appear, which loads an automatically generated graphic representation of the dependency structure (dependency arcs).
3. Abbreviations
| Abbreviation | Category |
|---|---|
| ADJ | adjective |
| ADJ ANA | anaphoric adjective |
| ADJ DEM | demonstrative adjective |
| ADJ IND | indefinite adjective |
| ADJ INT | interrogative adjective |
| ADJ NOU | substantivized adjective |
| ADJ NUM | numeral (ordinal) adjective |
| ADJ POS | possessive adjective |
| ADJ REL | relative adjective |
| ADJ VER | verbal adjective |
| ADV | adverb |
| ADV INT | interrogative adverb |
| ADV NOU | substantivized adverb |
| ADV REL | relative adverb |
| ART | article |
| CON | subordinating conjunction |
| CON PAR | coordinating conjunction |
| ENDO | parenthesis |
| ITJ | interjection |
| NEG CON PAR | negative coordinating conjunction |
| NEG PAR | negative particle |
| NOU | noun |
| NOU PPR | proper name |
| NOU VOC | vocative |
| PAR | particle |
| PRE | preposition |
| PRO | pronoun |
| PRO ANA | anaphoric pronoun |
| PRO DEM | demonstrative pronoun |
| PRO IND | indefinite pronoun |
| PRO INT | interrogative pronoun |
| PRO NUM | numeral (ordinal) pronoun |
| PRO PER | personal pronoun |
| PRO POS | possessive pronoun |
| PRO REL | relative pronoun |
| PRO REL IND | relative-indefinite pronoun |
| VER FVE | finite verb |
| VER INF | infinitive |
| VER INF NOU | substantivized infinitive |
| VER PPL | participle |
| VER PPL NOU | substantivized participle |