Anaphora Database

Enhanced Search

If you are comfortable conducting simple searches but you find you need a more precise tool, then it is time to invest a little time in mastering the Enhanced Search options. These options include searching for pre-sorted properties assigned to the three types of entities: languages, sentences and anaphoric markers. All of the specifications permitted by Simple Search are also present for enhanced searches, but on top of these, all the property specifications are available, including any property specification you may have seen on the Browse page for the language you looked at, and many other sentence properties besides.

First, a word about the property specifications: they are only as consistent as the protocol under which they were entered and the standardized usage of those who assigned these properties according to the protocol. If we determine that the matrix verb of a sentence is a verb of semantic class X, what is our definition of ‘verb of semantic class X’ for any X? If we say that a marker has a reciprocal or sociative reading, what is our definition of these readings? If we say that a language is ‘consistently SOV’ or ‘split word order SVO-SOV’, how do we assign these values to the languages in question? For every property of these three entities that can be searched, the criteria for the assignment of the feature are to be found listed on the Database Property Attributions page. We try to limit inconsistencies and errors of this kind by keeping our protocols and definitions clear (as we do for the glossing conventions), but since this is a human enterprise, there will be errors.

The best way to get a sense of what can be looked for is to state an objective and then try some Enhanced Search strategies to see how the database can be exploited to meet the objective. Here are three example enhanced searches. (The rest of this section is under construction).

Simple Search

In the database navigation column on the left, click on Simple Search. At the top of the Simple Search page, there is a heading in a green box labeled Search for and below it are three options which are the names of the entities that can be searched for: Languages, anaphoric markers, and sentences. This means that if you specify at the top of the page that you are searching for sentences of a certain type, then the result of the search is a set of sentences, but if you specify that you are searching for anaphoric markers of a certain type, then the result of the search will be a set of anaphoric markers, and likewise if you select for languages. The ways that your search can be conducted even for Simple Search are varied, but, we think, intuitively arranged.

Simple searches can be conducted by selecting an entity type and then setting the search parameters below by either specifying some text in the text boxes (‘search by word or substring’) or by clicking on a search parameter. The languages available for simple search are listed first, and if you leave this blank then all the languages available will be searched, but if you want to limit the search to just one language, then click on that language. If you want to search more than one language but not all, click on the first one you want, then hold down the control key and click on another one you want. If you want to ‘unspecify’ a language you have clicked, hold down the control key and click on it again to remove the specification.

If you are going to search for sentences that contain a particular gloss, such as ‘applicative’, which is rendered in our glosses as ‘APPL’, you will need to know what our glossing conventions are, and for this go to the Glossing Conventions page.

Let’s do some simple searches.

Simple Search 1: Set the Search for parameter to Anaphoric marker. We are going to look for reciprocal markers. Suppose we want to see all of the morphological strategies in the database that result in reciprocal readings. Since we want to look at everything in the database, we do not specify any language. Since we are not looking for markers that appear in particular sentences, we leave the four lines of text under Sentence blank. For the same reason, we do not specify an anaphora questionnaire (AQ) elicitation sentence number either (more about this in the next simple search). Now we come to the last heading, Anaphoric marker, with two text boxes under it. The first box allows you to specify the name of a strategy and the second one allows you to specify part of the description of the strategy. Let’s try both in turn.

First, just specify the description by typing in ‘reciprocal’. Leave the Name text box empty. Then go to the bottom of the page and click on the Search button. What comes up is a list of all of the reciprocal markers from all of the languages in the database that have the word ‘reciprocal’ as part of their description. Notice that the same name may be applied to the anaphoric marker in several languages, but there is a separate heading for that name for each language where it occurs. This is because every time we enter a new anaphoric marker for any given language it gets a new ID# which distinguishes it from every other marker.

Now try a search by marker name. You will notice from our first search that many reciprocals are named ‘RCM’, which comes from our glossing convention for reciprocal readings that are marked by virtue of an affix on the verb. If ‘anaphoric marker’ is selected under Search for and you type in ‘RCM’ in the Name text box under Anaphoric marker, leaving all other specifications blank, and if you then click on Search, the result is another set of markers, this one smaller than the last set, but it has a list of all of the anaphoric strategies that use a verbal reciprocal affix across all the languages of the database.

As it happens, the reciprocal affix in almost any Bantu language is cognate to all the others. This is something that you would discover if you first do a simple search for the anaphoric marker name ‘RCM’, and then isolate all the Bantu languages, but the specification for language family is an Enhanced Search page feature, and so the only way you can do this for a simple search is to specify all the languages you know to be Bantu under the Language heading, so that the markers from only those languages will be gathered in the search. If you wanted to limit your search to only those languages with a particular word order, that would also require an enhanced search, where language properties can be specified.

Simple Search 2: Now let’s search for a set of sentences. Suppose we want a close comparison of all the sentences across all the languages that were elicited by a particular sentence that appears in the AQ. Suppose that sentence is (C13d), ‘Nick put his book on the table’ (where ‘his’ = ‘Nick’). Below the Sentence heading, under the four text lines for sentence searches, there is a subheading Prompt sentence, and below it a list of the elicitation (prompt) sentences from the AQ listed (for the most part) in the order that they appear in the AQ. Only the first six sentences appear in the window, so simply scroll down until you get to (C13d), then click on it. Then press the Search button.

The sentences that come up are from all of the languages that we have a translation of this sentence for. For some languages, there will be more than one example corresponding to (C13d) because the consultant tried out more than one way of translating the elicitation sentence. You will notice that all of the sentences that are acceptable are marked ‘(ok)’, and those that are not acceptable are marked (*) or some combination of stars and question marks that indicates the degree of degraded acceptability. Life being what it is, many examples will not have all four sentence lines entered, as many will lack the ‘original text’ line (perhaps the least crucial for syntacticians, but we are trying to improve our data collection in this regard).

It is important to realize that not every sentence can be searched for by prompt sentence #, since many sentences volunteered by our consultants were not based on elicitation models, and much follow-up work involved the elicitation of data that was not initially requested in the AQ. Still, prompt sentence elicitations provide close comparisons where they are available and where the consultant responds with a sentence that is as close to the prompt sentence as possible.

Simple Search 3: Suppose we would like to know if there are any sentences in any language in the database that ever permit both a RCM and a RFM (verbal affix reflexive marker) in the same sentence. For most analyses of Bantu languages which have both a RCM and a RFM, it is generally assumed that the affixes are mutually incompatible. This search should come up empty, even if we do it right. So how do we do it?

We have to be a little creative for this search. The problem is that the gloss line searches are weakly conjunctive, that is, if we enter ‘RCM, RFM’ in the gloss line or ‘RCM RFM’, then we will get all the sentences that have either affix or both, and this is a very long list. However, since ‘RCM’ and ‘RFM’ figure both as glosses and as names of strategies, we can search for the combination sentences by entering ‘RCM’ in the gloss line and ‘RFM’ in the marker name line. Then hit Search.

The result will be a list of sentences that use both a RFM and a RCM in the same sentence. The result of my search today (November, 2008) provides 16 examples, all but one from Lubukusu, and all with both affixes on the same verb. What we really want to see is whether or not the RFM and the RCM are ever attached to the same verb, but it is conceivable that this search could have called up a complex sentence with a subordinate clause or two, where the RCM could have been on a different verb from the RFM; fortuitously, it appears there are no such sentences in the database just now. Still, we have made a surprising discovery: at least in Ikalanga (one example) and Lubukusu, these affixes are not in complementary distribution.
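The logic of this workaround can be sketched in a few lines of Python. This is only a toy model of the behaviour described above, not the database's actual implementation: the `Sentence` record, the `search` function and the three sample records are all invented for illustration. It assumes that several terms in the gloss line are combined with OR, while separate search fields (gloss line, marker name) are combined with AND.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    """Hypothetical stand-in for a sentence record (not the real schema)."""
    sentence_id: int
    gloss: str          # gloss line as a single string
    marker_names: set   # names of the anaphoric markers linked to the sentence


def search(sentences, gloss_terms=(), marker_name=None):
    """Toy search: OR across gloss terms, AND across different fields."""
    hits = []
    for s in sentences:
        # Gloss line: a hit on ANY of the terms is enough.
        if gloss_terms and not any(term in s.gloss for term in gloss_terms):
            continue
        # Marker name: a separate field, so it must hold as well.
        if marker_name is not None and marker_name not in s.marker_names:
            continue
        hits.append(s)
    return hits


# Invented toy records; these are not sentences from the database.
SAMPLE = [
    Sentence(1, "SM-PST-see-RCM-FV", {"RCM"}),
    Sentence(2, "SM-PST-RFM-see-FV", {"RFM"}),
    Sentence(3, "SM-PST-RFM-see-RCM-FV", {"RCM", "RFM"}),
]

either = search(SAMPLE, gloss_terms=("RCM", "RFM"))
both = search(SAMPLE, gloss_terms=("RCM",), marker_name="RFM")
print([s.sentence_id for s in either])  # [1, 2, 3]
print([s.sentence_id for s in both])    # [3]
```

Entering both terms in the gloss line returns every record with either affix, whereas splitting the two terms across the gloss line and the marker name line returns only the record that has both.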

Limitations of Simple Search: Suppose you are interested in anaphoric strategies as they apply to verbs that tend to be understood reflexively when they are intransitive, such as English ‘wash’, ‘dress’ or ‘shave’. Many languages have distinct strategies for forming reflexive readings for these verbs, as English does, insofar as verbs like ‘kill’, ‘describe’ and ‘cut’ do not permit the same strategy. How would we look for just this class of verbs to see what distinctions might arise? A simple search could enter ‘wash’, ‘dress’ and ‘shave’ in the gloss line, and if you try it with simple search, much of what we are looking for will come up. Still, there may be more such ‘grooming’ verbs than one can think of offhand, and some languages may have grooming verbs that don’t correspond to the English ones. Wouldn’t it be convenient to simply limit your search to sentences with grooming verbs? Or suppose you are interested in perception verbs or epistemic verbs? In fact, the database is coded for these properties of sentences (semantic class of main verb), and for a host of other sentence properties, but unless you try out Enhanced Search, you will not be able to exploit the full potential of the database, which also codes for the anaphoric marker and language properties that you saw when you looked at the Browse page for the language of your choice. It’s time to explore the Enhanced Search page.


Some of the most useful classifications of our data to be used for search and data manipulation are the properties that we attribute to the various entities that can be searched for, namely, languages, anaphoric markers, and sentences. This page provides definitions and procedures that guide our property attributions. It should be of particular use to those using the Enhanced Search options and to our own data enterers who generate the distinctions we rely on. While the Language and All Project sentence questions that are found in the Generic Portal are common to all the portals, the other portals we have at this time, the Anaphora Portal and the Clausal Complementation Portal, have additional entities to which properties are attributed. We have organized this page by portal type, where all the properties of each sort of entity specific to that portal are described (and used in data entry). We expect that there will be additional questions and property attributions over time, and occasionally, there will be slight adjustments in the existing property attributions, as long as data is not lost. Changes will be signaled by new version numbers for this page. This is version 1.3.

The Generic Portal

The Anaphora Portal

The Clausal Complementation Portal

Browsing in the Database

Users unfamiliar with database organization should know right at the beginning that, in terms of how they are used, databases can be quite different, and so the first thing to do is to get a sense of what this one does. The easiest way to do that is to enter the database and click on the Browse page. That page will provide you with a list of languages that you can look at individually. Click on Details for a language you are interested in. The page that comes up then gives you three headings boxed in green: Anaphoric markers, Examples, and Custom properties.

To get a feel for how the data is presented, click first on list examples which is just below Examples. This will bring up the first page of the listed examples for the language you have selected, and there you will see the four lines that are presented for each example, that is the ‘original text’ line, the ‘morpheme breakdown’, which is our consultant’s estimation of the morphological breaks, then the ‘gloss’ line which aligns with the morpheme breakdown, and finally the translation. Whenever you search for a sentence or a set of sentences, the sentences that the search returns will be in this four line format. There are around 300-500 sentences in the database for each language that has a fully entered AQ response, so there are about 15-25 pages of examples with about 20 examples per page.
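For readers who like to think in terms of data structures, the four-line format and the page arithmetic can be sketched as follows. This is a minimal illustration only: the `Example` class, its field names and the `page_count` helper are our own inventions for this sketch, and the figure of 20 examples per page is the approximate one given above.

```python
import math
from dataclasses import dataclass


@dataclass
class Example:
    """Hypothetical record mirroring the four lines shown for each example."""
    original_text: str
    morpheme_breakdown: str
    gloss: str
    translation: str


def page_count(n_examples, per_page=20):
    """Number of listing pages needed, assuming about 20 examples per page."""
    return math.ceil(n_examples / per_page)


# An English illustration in the four-line format.
example = Example(
    original_text="Bill saw girls",
    morpheme_breakdown="Bill saw girl-s",
    gloss="Bill see.PST girl-PL",
    translation="Bill saw girls",
)

print(page_count(300), page_count(500))  # 15 25
```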

As you examine the presentation of the examples, you will notice that every sentence in the database has a unique ID#. This number distinguishes this sentence from every other sentence in the database and however many examples we add to the database, this number will not change. Even if an example should be deleted at some point, its ID# will never be reassigned to another sentence, so if you have written down the ID#, then you can always find that sentence by searching for that ID# (with Enhanced Search). Do not confuse the ID# with the list number that occurs at the far left. The list numbers are consecutive starting from ‘1’ for each language and do not have any function beyond enumerating the list of sentences on the page. The ID#s, by contrast, are often not consecutive. List numbers, unlike ID#s, are not searchable.
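The difference between ID#s and list numbers can be illustrated with a small sketch. The `SentenceStore` class below is purely hypothetical and says nothing about how the database is actually built; it only assumes what is stated above, namely that ID#s are never reused and that list numbers are recomputed from ‘1’ for each language.

```python
class SentenceStore:
    """Hypothetical store: permanent IDs, list numbers computed on display."""

    def __init__(self):
        self._next_id = 1
        self._by_id = {}

    def add(self, language, text):
        """Store a sentence and return its permanent ID."""
        sentence_id = self._next_id
        self._next_id += 1  # the counter only moves forward, so IDs are never reused
        self._by_id[sentence_id] = (language, text)
        return sentence_id

    def delete(self, sentence_id):
        """Remove a sentence; its ID is retired, not recycled."""
        del self._by_id[sentence_id]

    def listing(self, language):
        """Return (list number, ID, text) rows; list numbers restart at 1."""
        rows = [
            (sid, text)
            for sid, (lang, text) in sorted(self._by_id.items())
            if lang == language
        ]
        return [(n, sid, text) for n, (sid, text) in enumerate(rows, start=1)]


store = SentenceStore()
first = store.add("LangA", "sentence a")   # ID 1
store.add("LangB", "sentence b")           # ID 2
store.add("LangA", "sentence c")           # ID 3
store.delete(first)
store.add("LangA", "sentence d")           # ID 4, not the freed ID 1

print(store.listing("LangA"))
# [(1, 3, 'sentence c'), (2, 4, 'sentence d')]
```

The list numbers on the left run 1, 2 without gaps, while the ID#s (3 and 4) are not consecutive with the start of the list and do not change when other sentences are added or deleted.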

When you have browsed the examples to your satisfaction, we recommend that you return to the Browse page for the language you have selected and consider the other two headings in the green boxes. The Custom properties are not for search, but simply present information about our native speaker linguist consultant and some information about the elicitation (which is available more discursively in the static AQ response). Anaphoric markers, however, is an interesting window into one of the entities that is available for enhanced search. There should be a list of anaphoric markers for each language (if the language data has been fully entered) and they are all given names (these too have ID#s for each marker that is entered). The anaphoric marker properties that can be browsed for each named marker include marker shape, context properties, and readings. The marker shape properties are morphological in nature, the context properties involve the range of syntactic contexts and lexical cooccurrence relations where the marker can be found, and readings provide the kinds of anaphoric meanings that the marker permits (e.g., reciprocity, dependent identity, etc., and more than one reading may be possible). Every property that is specified for an anaphoric marker is a value that can be searched for, but only if you use the Enhanced Search feature of the database.

Now that you have a sense of how the data is presented, we recommend you try a simple search.

Afranaph Glossing Conventions – Revised September, 2016

This is a list of the most commonly used glosses in the Afranaph database. The glosses follow the general guidelines of the Leipzig Glossing Rules [http://www.eva.mpg.de/lingua/resources/glossing-rules.php].

Content words are glossed with the best-guess English translation. All conventional glosses should be drawn from the list below, although departures from standard notation have sometimes been judged appropriate by consultants and/or analysts in order to convey relevant language-specific information. Words are separated by spaces on original text, morpheme breakdown and gloss lines. A single morpheme in the morpheme breakdown is bounded on either side by an empty space (at the beginning or end of a word) or by a dash or dashes if it is word internal, and if an indivisible morpheme expresses a combination of glosses, then the glosses corresponding to the single morpheme are separated by a period, as in the example below.

 English: Bill saw girls
  Bill saw girl-s
  Bill see.PST girl-PL
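The separator conventions just described (spaces between words, dashes between morphemes, periods between glosses that share one indivisible morpheme) are regular enough to be checked mechanically. The following is a minimal sketch under those assumptions; the function name `align_glosses` is our own, and real entries that depart from the standard notation would need extra handling.

```python
def align_glosses(breakdown, gloss):
    """Pair each morpheme with its gloss labels.

    Words are split on spaces, morphemes on dashes, and the labels that
    belong to a single indivisible morpheme on periods.
    """
    words = breakdown.split()
    gloss_words = gloss.split()
    if len(words) != len(gloss_words):
        raise ValueError("breakdown and gloss lines have different word counts")

    aligned = []
    for word, gloss_word in zip(words, gloss_words):
        morphemes = word.split("-")
        labels = gloss_word.split("-")
        if len(morphemes) != len(labels):
            raise ValueError(f"morpheme/gloss mismatch in {word!r}")
        aligned.append([(m, g.split(".")) for m, g in zip(morphemes, labels)])
    return aligned


result = align_glosses("Bill saw girl-s", "Bill see.PST girl-PL")
for word in result:
    print(word)
# [('Bill', ['Bill'])]
# [('saw', ['see', 'PST'])]
# [('girl', ['girl']), ('s', ['PL'])]
```

Note how ‘saw’ is a single morpheme carrying two glosses (‘see’ and ‘PST’), whereas ‘girl-s’ is two morphemes with one gloss each.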

At various points we diverge from the Leipzig glosses either because they are not available for certain kinds of morphemes, they are not specific enough for certain kinds of distinctions, or they are not distinct enough to allow for optimal searches in the database. We expect that we will add to this list or reassign some glosses from time to time, as long as no data is lost or misrepresented as a result.

 

Gloss | Meaning | Usage note
1st | 1st person |
2nd | 2nd person |
3rd | 3rd person |
ACC | Accusative | Amharic
AGR | Agreement | When a more specific term like OM or SM is not used
AGT | Agent or Agentive |
AM | Associative marker | Urhobo, cf. Urhobo AQR
APPL | Applicative |
ASP | Aspect |
BEN | Benefactive |
CAUS | Causative affix | See note 1
CAUS1 | Long causative affix | Cf. Hyman (2003), Good (2006)
CAUS2 | Short causative affix | Cf. Hyman (2003), Good (2006)
CJ | Conjoint |
CL | Noun class marker | Where no noun class number is available
COMP | Complementizer |
COND | Conditional |
CONJ | Conjunction |
CPL | Centripetal | Indicates the event involves motion toward the subject
cX | Noun class prefix for class X | Where X = some number; see note 2
DEFAGR | Default Agreement |
DET | Determiner |
DJ | Disjoint |
ERL | Affix meaning the event was early | See use in Eegimaa
EXCL | Exclusive |
F | Feminine |
FMR | Former/used to | See use in Eegimaa
FUT | Future | See note 3
FV | Final vowel |
GEN | Genitive |
HAB | Habitual |
HUM | Human | Where the human/non-human distinction has exponents
IMPV | Imperative |
ICV | Inclusive | Where the distinction is used for 1st and 2nd plural
INF | Infinitive |
IPFV | Imperfective |
IRM | Inherent reflexive marker (verbal affix) | When there is an exponent
IRR | Irrealis |
LOC | Locative |
M | Masculine |
MALF | Malefactive |
MID | Middle voice | Ikalanga, Kirundi
NEG | Negation |
NML | Nominal | Where nominalizers exist; see note 4
NOM | Nominative |
OBJ | Amharic: Object agreement; Lokaa: Objective case |
OM | Object marker (verbal affix) |
OPT | Optative |
PART | Partitive |
PASS | Passive |
PFV | Perfective |
PL | Plural |
POSS | Possessive |
PRN | Pronominal |
PROG | Progressive |
PRS | Present |
PST | Past | See note 5
RCM | Reciprocal marker | Reciprocal verbal affix
RECP | Reciprocal (not a verbal affix) | See note 6
RED | Reduplication |
REFL | Reflexive (not a verbal affix) | See note 7
REL | Relative |
REP | Repetitive |
REV | Reversative |
RFM | Reflexive marker | Reflexive verbal affix
RLS | Realis | See note 8
RS | Lexical reciprocal base | Amharic
SBJ | Subject agreement | Amharic
SBJV | Subjunctive |
SG | Singular |
SM | Subject marker | Verbal affix, Bantu-specific notation
TM | Tense marker | When no more specific gloss is possible
TAM | Tense aspect marker | When no more specific gloss is possible
TNS | Tense | When no more specific gloss is possible
WHAGR | Wh-agreement |

 

     

 

Further notes

1 CAUS: The distinction between CAUS, CAUS1 and CAUS2 is included because there are Bantu phenomena of particular interest in this respect. Searching for CAUS will find all three glosses, but the distinction between CAUS1 and CAUS2 is included because many Bantu languages have two affixes that have been characterized (by some) as causative, although their effects differ in interesting ways (and ways that interact with patterns of anaphora). In languages where both affixes are present, we have classified them according to morphological and (to a lesser degree) semantic effects that distinguish them. For languages that have only one causative affix, CAUS is used exclusively.  

2 cX: This is why we do not just use ‘1’, ‘2’ and ‘3’ for person glosses. If our users want to search for c3 nouns and agreement, there is no confusion with ‘3rd’, but more importantly, if one is searching for 1st, searching for ‘1’ would bring up both noun class and person tokens.

3 FUT: In languages that have temporally distinct future morphemes, FUT is distinguished as FUT1 (temporally closest to PRS), FUT2 (next temporally closest)…FUTn. Many Bantu languages have such systems.

4 NML: This is suitable both for nominalizing morphemes such as the Yoruba 'i-' or noun class markers as in Baatonum and other languages with class markers.

5 PST: In languages that have temporally distinct past morphemes, PST is distinguished as PST1 (temporally closest to PRS), PST2 (next temporally closest)…PSTn. Many Bantu languages have such systems.

6 RECP: This is almost never used in our database because the argument position markers that are interpreted as reciprocals are almost always roots that have other meanings outside of reciprocal contexts. We are particularly interested in the morphological and semantic properties of distinct forms of reciprocal marking. See the fn. on REFL.

7 REFL: Many markers that are interpreted as reflexive are not glossed with REFL because we are particularly interested in the morphological and semantic properties of distinct forms of reflexive marking. We only use REFL when the morpheme in question is not a verb affix and has no other meaning in another use in the language. When, for example, a reflexive argument consists of a pronoun compounded with a root that means ‘body’, the gloss is PRN-BODY or, in some cases, the capitalized form of the root in the language (particularly when more than one root can form a reflexive argument).

8 RLS: For example, Jóola Eegimaa has an exponent for realis, but none for irrealis.
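Several of the notes above turn on how substring matching interacts with the choice of gloss labels. The sketch below illustrates the point with an invented gloss line (not taken from the database) and a hypothetical helper, `substring_hits`. It assumes simple substring matching on gloss labels, in line with the ‘search by word or substring’ option.

```python
import re

# Invented gloss line for illustration only; not a sentence from the database.
GLOSS_LINE = "c1-3rd.SG-CAUS1-see-FV 1st.SG-FUT2-c3-CAUS2-go"


def substring_hits(gloss_line, term):
    """Return the gloss labels that contain `term` as a substring."""
    labels = re.split(r"[\s.\-]+", gloss_line)
    return [label for label in labels if term in label]


print(substring_hits(GLOSS_LINE, "CAUS"))  # ['CAUS1', 'CAUS2']
print(substring_hits(GLOSS_LINE, "3"))     # ['3rd', 'c3']
print(substring_hits(GLOSS_LINE, "3rd"))   # ['3rd']
print(substring_hits(GLOSS_LINE, "c3"))    # ['c3']
```

A bare ‘3’ picks up both person and noun class tokens, which is the confusion that the ‘3rd’ and ‘c3’ glosses are designed to avoid, while a search for ‘CAUS’ deliberately finds the numbered variants as well.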