Functions

Files and corpora

Diorisis Search is capable of:

Loading the Diorisis Corpus (in XML or JSON format) from a local folder.
Loading single Diorisis Corpus files (in XML or JSON format) from any location.
Parsing any Greek text and loading it into the search engine.
Saving and reloading queries created in the in-app Query Builder.
Exporting the results as a Microsoft Excel 2010+ xlsx workbook.
Exporting selected result sentences as a Microsoft Word 2010+ docx document.
Producing and marking parsing exercises.

Online services

Search results can be uploaded to the application web server for sharing in the app.
Other users’ results can be browsed/searched in the app for loading, viewing, and exporting.
Users can report annotation errors to the maintainers of the Corpus directly from the app.
The app also provides links to the Corpus repositories (XML and JSON versions).

Corpus management

Users can select any combination of texts from the Diorisis Corpus.
Custom selections of texts can be saved and reloaded.
If available, Diorisis Search returns the Diccionario Griego-Español abbreviation corresponding to each author/work.

Text Reader

Browse, navigate, and read all Diorisis corpus texts.
Words can be clicked to show the annotation data.
Search for phrases directly in the reader and jump between results.

Query builder

Users can build complex queries automatically using a simple interactive form with commands in natural English.
Queries can be saved to a local database and be reloaded for use on any set of texts.

Searchable elements

Users can include the following types of linguistic items in their queries:

the exact phrase

Searches for the exact sequence of word nodes whose respective @form attributes correspond exactly (i.e. including grave vs acute accents) to the user input, unless the option ignore diacritics is selected (this option may also be activated globally). Forms may be input either in Unicode (UTF-8) Polytonic Greek or in BetaCode. Punctuation signs may not be included in the search and will be ignored by the search engine (e.g. the sequence λέγω ὅτι will return instances of both "λέγω ὅτι" and "λέγω, ὅτι").
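The way punctuation is skipped in exact-phrase searches can be illustrated with a minimal Python sketch. It is not the application's own code: the representation of a sentence as a list of (kind, form) pairs and the function name find_phrase are assumptions made only for this illustration.

```python
import unicodedata

def find_phrase(sentence, phrase):
    """Return the positions (counted in word nodes) where `phrase` starts.

    `sentence` is a hypothetical list of (kind, form) pairs, where kind is
    "word" or "punct". Punctuation nodes are dropped before matching, so a
    comma between two words never blocks a hit.
    """
    def nfc(text):
        # Normalise so that visually identical forms compare as equal.
        return unicodedata.normalize("NFC", text)

    words = [nfc(form) for kind, form in sentence if kind == "word"]
    target = [nfc(form) for form in phrase.split()]
    if not target:
        return []
    n = len(target)
    return [i for i in range(len(words) - n + 1) if words[i:i + n] == target]

plain = [("word", "λέγω"), ("word", "ὅτι"), ("word", "ἥκει")]
with_comma = [("word", "λέγω"), ("punct", ","), ("word", "ὅτι"), ("word", "ἥκει")]

print(find_phrase(plain, "λέγω ὅτι"))       # both calls report a hit
print(find_phrase(with_comma, "λέγω ὅτι"))  # starting at the first word
```

Because the comparison is exact, a phrase typed without accents would not match in this sketch unless diacritics were stripped from both sides first.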

the exact form

A word whose form corresponds exactly (i.e. including the grave vs acute accents) to the user input, unless the option ignore diacritics is selected (this option may also be activated globally). Forms may be input either in Unicode (UTF-8) Polytonic Greek or in BetaCode.

a form containing the sequence

A word node whose form contains the sequence (the string) input by the user.
For instance, πι selects all forms that contain πι (e.g. πιστεύεις, Ἀσκληπιῷ, etc., but not ἐλπίζων or πίπτει, unless the option ignore diacritics is selected). Strings may be input either in Unicode (UTF-8) Polytonic Greek or in BetaCode.
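The effect of the ignore diacritics option on this kind of search can be sketched in Python as follows. This is an illustration under stated assumptions, not the application's implementation: the helper names strip_diacritics and contains_sequence are hypothetical, and diacritics are removed here by Unicode decomposition, which may differ from what Diorisis Search does internally.

```python
import unicodedata

def strip_diacritics(text):
    """Remove accents, breathings, and other combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    bare = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", bare)

def contains_sequence(form, sequence, ignore_diacritics=False):
    """True if `form` contains `sequence` as a substring."""
    # Compare precomposed (NFC) strings, so that an accented iota is a
    # single character and does not accidentally match a plain iota.
    form = unicodedata.normalize("NFC", form)
    sequence = unicodedata.normalize("NFC", sequence)
    if ignore_diacritics:
        form, sequence = strip_diacritics(form), strip_diacritics(sequence)
    return sequence in form

forms = ["πιστεύεις", "Ἀσκληπιῷ", "ἐλπίζων", "πίπτει"]
strict = [f for f in forms if contains_sequence(f, "πι")]
loose = [f for f in forms if contains_sequence(f, "πι", ignore_diacritics=True)]
print(strict)  # only the first two forms
print(loose)   # all four forms
```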

a form of the lemma

A word whose lemma corresponds exactly to the one selected by the user from the list of lemmas occurring in the Diorisis Corpus (rather than from all the lemmas in e.g. LSJ, some of which would not be found anyway). Lemmas may be input either in Unicode (UTF-8) Polytonic Greek or in BetaCode, with all diacritics (even if the ignore diacritics option is active).

a form of the lemma that contains the sequence

A word whose lemma contains the sequence (the string) input by the user.
For instance, πι selects all forms of lemmas that contain πι (e.g. πιστεύω, ἐπιλαμβάνω, etc., but not ἐλπίς, unless the option ignore diacritics is selected). Strings may be input either in Unicode (UTF-8) Polytonic Greek or in BetaCode.

a word with the following morphological features

A word of which at least one possible morphological analysis corresponds to the combination of values input by the user.

the punctuation mark

A punctuation mark corresponding to the user input.

If words are selected according to their form or lemma, users have the option to specify which morphological analyses should be possible for the word, if required.
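The rule that a word matches when at least one of its possible morphological analyses carries the requested combination of values can be sketched as below. The dictionary layout and the feature names (pos, tense, person, number) are assumptions made for illustration only and do not reflect the Corpus annotation scheme.

```python
def matches_morphology(analyses, required):
    """True if at least one analysis carries every required feature value."""
    return any(
        all(analysis.get(feature) == value for feature, value in required.items())
        for analysis in analyses
    )

# Hypothetical analyses for the ambiguous form ἔλυον
# (imperfect, either 1st person singular or 3rd person plural).
elyon = [
    {"pos": "verb", "tense": "imperfect", "person": "1st", "number": "singular"},
    {"pos": "verb", "tense": "imperfect", "person": "3rd", "number": "plural"},
]

print(matches_morphology(elyon, {"person": "3rd", "number": "plural"}))  # True
print(matches_morphology(elyon, {"person": "1st", "number": "plural"}))  # False
```

The second call fails because the requested values must all belong to the same analysis; they cannot be pieced together from different ones.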

Search commands

The relationship between linguistic items in the linear order of the sentence can be specified in the following ways:

followed by

Requires that the first word or punctuation mark should be followed by a word or punctuation mark whose features the user will be prompted to specify.
If the first element in the query is a strong punctuation mark (full stop, middle dot, or question mark), the search will be extended to the immediately following sentence (e.g. queries can capture words that follow a question mark within the following sentence).

followed or preceded by

Requires that the first word or punctuation mark should be followed or preceded by a word or punctuation mark whose features the user will be prompted to specify.
This command is not available if the first element in the query is a strong punctuation mark (full stop, middle dot, or question mark).

followed by (ignore punctuation)

Requires that the first word or punctuation mark should be followed by a word whose features the user will be prompted to specify.
If the first element in the query is a strong punctuation mark (full stop, middle dot, or question mark), the search will be extended to the immediately following sentence (e.g. queries can capture words that follow a question mark within the following sentence).
NB: The scope of the search is defined by counting only the number of word nodes.
For instance, in the sequence [ ὦ ἄνδρες, ἐγὼ ], ἐγὼ counts as immediately following ἄνδρες (scope = 1).
If this command is used to add more than one element after the first, it will appear in the drop-down menu preceded by the word "and".

followed or preceded by (ignore punctuation)

Requires that the first word or punctuation mark should be followed or preceded by a word whose features the user will be prompted to specify.
NB: The scope of the search is defined by counting only the number of word nodes.
For instance, in the sequence [ ὦ ἄνδρες, ἐγὼ ], ἐγὼ counts as immediately following ἄνδρες (scope = 1).
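The difference between counting all nodes and counting word nodes only can be made concrete with a short Python sketch. The (kind, form) sentence representation and the function name distance are hypothetical and serve only to illustrate the counting rule described above.

```python
def distance(sentence, first, target, ignore_punctuation=False):
    """Signed distance from index `first` to index `target`.

    The result is positive if the target follows the first element and
    negative if it precedes it. With ignore_punctuation=True only word
    nodes are counted.
    """
    if target > first:
        span = sentence[first + 1:target + 1]  # nodes stepped onto going right
        sign = 1
    else:
        span = sentence[target:first]          # nodes stepped onto going left
        sign = -1
    if ignore_punctuation:
        span = [node for node in span if node[0] == "word"]
    return sign * len(span)

sentence = [("word", "ὦ"), ("word", "ἄνδρες"), ("punct", ","), ("word", "ἐγὼ")]

print(distance(sentence, 1, 3))                           # 2: the comma counts
print(distance(sentence, 1, 3, ignore_punctuation=True))  # 1: words only
```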

When any of these commands is selected, the user will also be prompted to indicate the scope of the search (i.e. the required distance or range of distances of the target from the first element).

Specifies that the first element may be defined by an alternative set of features. This command is only available while specifying the first element of the search.

ignoring diacritics

Requests that all diacritical signs be ignored in all form- and lemma-based searches. This option may be activated selectively for individual elements.

All elements may be searched for negatively, that is, it is possible to search for elements that match any feature but those specified (e.g. anything but the exact form instead of the exact form).

The maximum scope of a search is one sentence. In the Diorisis Corpus, sentences are defined as sequences of words and punctuation marks delimited by a strong punctuation mark (full stop, middle dot, or question mark).
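As a rough illustration of this definition, the sketch below splits a flat list of nodes into sentences at each strong punctuation mark. It is only a sketch: sentence boundaries are presumably already encoded in the Corpus files, and both the (kind, form) node representation and the exact set of characters treated as strong punctuation are assumptions.

```python
# Assumed characters: full stop, middle dot (two common encodings),
# and the Greek question mark (two common encodings).
STRONG_PUNCTUATION = {".", "\u00b7", "\u0387", ";", "\u037e"}

def split_sentences(nodes):
    """Group (kind, form) nodes into sentences closed by strong punctuation."""
    sentences, current = [], []
    for kind, form in nodes:
        current.append((kind, form))
        if kind == "punct" and form in STRONG_PUNCTUATION:
            sentences.append(current)
            current = []
    if current:  # trailing nodes with no closing mark
        sentences.append(current)
    return sentences

nodes = [
    ("word", "τί"), ("word", "λέγεις"), ("punct", ";"),
    ("word", "οὐδέν"), ("punct", "."),
]
print(len(split_sentences(nodes)))  # 2
```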

Within the sentence, searches for individual elements to follow or precede the first one may be restricted to a specific range (scope) for each element. The following options are available:

within

The search engine will look for the specified element at a distance of between one and the specified number of elements from the first element.
For instance, a search for the form ἀνὴρ within 3 words after the form ὁ will capture sequences like ὁ ἀνὴρ, ὁ δ’ ἀνὴρ, ὁ αὐτὸς ἀνὴρ, or ὁ δ’ αὐτὸς ἀνὴρ.
With the commands followed by or followed or preceded by, the range is calculated counting the number of word and punct nodes.

between

The search engine will look for the specified element at a distance from the first element that falls between the specified lower and upper bounds (inclusive).
For instance, a search for the form ἀνὴρ between 2 and 3 words after the form ὁ will capture sequences like ὁ δ’ ἀνὴρ, ὁ αὐτὸς ἀνὴρ, or ὁ δ’ αὐτὸς ἀνὴρ, but not ὁ ἀνὴρ.
With the commands followed by or followed or preceded by, the range is calculated counting the number of word and punct nodes.

exactly

The search engine will look for the specified element at exactly the specified distance from the first element.
For instance, a search for the form ἀνὴρ exactly 2 words after the form ὁ will capture sequences like ὁ δ’ ἀνὴρ or ὁ αὐτὸς ἀνὴρ, but not ὁ ἀνὴρ or ὁ δ’ αὐτὸς ἀνὴρ.
With the commands followed by or followed or preceded by, the range is calculated counting the number of word and punct nodes.

With the commands followed by (ignore punctuation) or followed or preceded by (ignore punctuation), the range is calculated counting the number of word nodes only (and, as a consequence, it is wider).

in the same sentence

The search engine will look for the specified element at any distance within the same sentence as the first element.
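The four scope options can be read as inclusive distance ranges. The sketch below checks the examples given above against that reading; the function names scope_range and in_scope are hypothetical, and the distance is assumed to be the unsigned number of counted nodes between the first element and the target.

```python
def scope_range(mode, *numbers):
    """Translate a scope option into an inclusive (low, high) range."""
    if mode == "within":
        (n,) = numbers
        return 1, n
    if mode == "between":
        low, high = numbers
        return low, high
    if mode == "exactly":
        (n,) = numbers
        return n, n
    if mode == "in the same sentence":
        return 1, None  # no upper bound other than the sentence end
    raise ValueError(f"unknown scope option: {mode}")

def in_scope(distance, mode, *numbers):
    low, high = scope_range(mode, *numbers)
    return distance >= low and (high is None or distance <= high)

# Distance of ἀνὴρ from ὁ in each example sequence.
examples = {"ὁ ἀνὴρ": 1, "ὁ δ’ ἀνὴρ": 2, "ὁ αὐτὸς ἀνὴρ": 2, "ὁ δ’ αὐτὸς ἀνὴρ": 3}

for sequence, d in examples.items():
    print(sequence, in_scope(d, "within", 3),
          in_scope(d, "between", 2, 3), in_scope(d, "exactly", 2))
```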

Results

Diorisis Search returns the following data:

Summary: a table showing the number of word tokens in each document, the raw count of the occurrences of the search pattern, and its relative frequency per 100 words in each document.
Result Sentences: all the sentences in which occurrences of the searched pattern are found. Results are grouped by text, and the elements captured by the query are highlighted in bold. All words in each sentence are clickable. Clicking on a word opens a pop up displaying its lemma and possible morphological analyses.
Rankings of lemmas and forms occurring as the first element in the query: the raw count of lemmas and forms occurring as the first element in the query (ranked from high to low count, listing up to 100 elements). If one wishes, for instance, to discover which verbs directly precede a genitive in Aristophanes, this is the data to look at.
The main table contains the aggregate data for all the texts searched and is followed by a table per text (if only one text is included in the corpus, the data will coincide).
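The two derived figures in this output, relative frequency per 100 words and the ranking of first elements, can be sketched in Python as follows. The function names are hypothetical and the numbers in the usage lines are invented purely for illustration.

```python
from collections import Counter

def relative_frequency(hits, tokens):
    """Occurrences of the pattern per 100 word tokens."""
    return 100 * hits / tokens if tokens else 0.0

def rank_first_elements(items, limit=100):
    """Rank lemmas or forms by raw count, highest first, up to `limit`."""
    return Counter(items).most_common(limit)

# Invented figures: 25 hits in a text of 5,000 word tokens.
print(relative_frequency(25, 5000))  # 0.5

# Invented list of lemmas found as the first element of each hit.
print(rank_first_elements(["λέγω", "φημί", "λέγω"]))
```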

Result sentences can be saved in the Saved Sentences workbook. Saved sentences are temporarily stored in memory, along with their exact reference, and may be viewed and exported as a Microsoft Word 2010+ docx document for use e.g. in handouts, exercises, or other teaching materials.

Teaching Tools

Diorisis Search can be used to create and mark parsing exercises.

Each exercise consists in parsing each of the forms in a list/sentence or set of sentences.
Teachers can create exercises by entering forms/sentences to be parsed by students.
Teachers will then parse each form themselves (this process is aided by the automatic parser) and create a key to the exercise.
The programme will also create empty exercise script files which can be distributed to students.
Students will load the script file into Diorisis Search and parse each form in the exercise (the automatic parser will not be available).
They will then return the script file to the teacher, who will load the key and each/all of the exercise scripts into the programme for automatic marking.
Teachers will then be able to save each marked script as a separate file that can be returned to each student and viewed in any web browser.

Changelog

Version 3.3 Nov 2022

Added function: add text/passage reference when copying selections from reader (not only when displaying search results).
Added function: start/end-with flags for partial form/lemma searches.
Added function: add selections/sentences to Saved Sentences from reader.
Interface update: removed 'Minimal view'.
Interface update: texts with no hits are hidden, can be toggled.
Interface update: view hits for one text at a time in Result Visualizer.
Interface update: separated Export Results from Save Results.
Bug fix: export Excel.
Bug fix: copy text from reader.
Bug fix: italicised speech numbers in references.

Version 3.21 Nov 2021

Bug fixes.

Version 3.2 Oct 2021

Export text to UTF-8 txt files tool added.
Reader: added support for line breaks in poetry.
Reader: added support for edition information display.
Fixed bugs in corpus updater.

Version 3.11 May 2021

Bug fixes.
Support for 64bit Linux.

Version 3.1 Apr 2021

Lemma suggestions in query builder.
Navigate results in text reader.
Quick search for phrases in text reader.
Updater for single Diorisis Corpus files.
New error reporting system.
Read text of imported file.
Performance improvement loading dictionary.
Windows installer.
Cosmetic fixes.
Bug fixes.

Version 3.01 Mar 2021

Fixed bug with Text Reader and XML version of the Diorisis Corpus.

Version 3.0 Feb 2021

Added Text Reader.
New result visualizer.
Context menus to copy text and references.
Ability to restrict searches to parts of texts.
Save results to local files and reload them into the app.
Improvements in corpus selection interface.
Bug fixes.

Version 2.12 Jan 2021

Integration of DGE (Diccionario Griego-Español) author/work abbreviations.
Search and export DGE abbreviations.
Automatically view DGE abbreviations of corpus works.
Use DGE abbreviations as references when exporting sentences into Word documents.

Version 2.11 Dec 2020

Bug fixes.
New 'Search exact phrase' function.

Version 2.10 Nov 2020

Spanish localization for the parsing exercises (courtesy of Alberto Pardal Padín).
A new, streamlined in-app update system.

Version 2.0 Oct 2020

Bug fixes and cosmetic improvements.

Version 2.0β Sep 2020

Fixed bug in visualization of shared results.
Monitor progress and cancel query.
Performance boost of search engine.
Optimized memory handling in visualization of results.
JSON conversion bug fix.
Search engine bug fixes.
Added support for non-lemmatized texts.
Added possibility to combine multiple user-defined corpora.
Possibility to search for negative patterns.
Parser for non-Diorisis texts typed in by users.
Disambiguator for user-input parsed texts.
Possibility to search user-input parsed texts.
Ability to save lists of sentences.
Create, solve, and mark parsing exercises, with multilingual support (English/Italian).

Version 1.02 May 2020

Fixed bugs in morphology search.
Fixed bugs in XML to JSON converter.
Fixed bugs in morphological analysis visualizer.

Version 1.01 Apr 2020

Fixed bug: saving queries and corpora.

Version 1.0 Apr 2020

Possibility to upload searches to online archive (w/online API) added.
Bug fixes in error reporting system.
Possibility to save and reload queries.
Cancel buttons on all dialogs.
Support for BetaCode in lemma input.
Palette restyling.
Bug fixes in search engine for context words defined morphologically.
User review system added.
Outputs count of lemmas/forms occurring as the first element of a query (seed).
Restyling of result window, with collapsible sections.

User Reviews

Caio Borges Geraldes

App version: 1.0

Researcher (Linguistics)

Reviewed in Brasil on 11/05/20 19:33:32

The search engine is very good and has great flexibility, surely something I will be using in my research from now on. I have two suggestions that would make the engine more useful for my own research and maybe for other colleagues: 1. A minor improvement would be to include a native Linux version. It is possible to run it with Wine on Linux machines, but the loading time with Wine is a bit too long and might be affecting the query times. It might be quite simple to do so, but I am not so sure. 2. The major improvement I would like to see is to include not only the frequency counts of the query's result, but also the sentences themselves in the export result file. It would make the application way more effective for building research databases for further annotation. It seems to me that this addition can be easily implemented since the engine is already returning this data. Thank you very much for the project!

Mar A Rodda

App version: 3.2

Researcher (Classics)

Reviewed in UK on 24/06/24 16:50:28

Extremely useful tool, ensuring that one of the best available processed corpora (Diorisis) is accessible to the wider researcher community without the need for a sophisticated linguistics/coding background. I have used Diorisis for my research (with my own scripts), and have both recommended Diorisis Search to colleagues who used it in research papers and used it in workshops for students. I have not tried it as a teaching tool, but the ability to set and mark exercises is very promising!