A Guide to the Afranaph Database
Last Updated on Sunday, 25 September 2016 20:37
The Afranaph Database (AD) is a publicly accessible research resource that allows any interested person to explore the data we have collected and vetted for research on specific languages or for crosslinguistic research. Although the original AD was designed specifically for the study of anaphora, our project has evolved to study a wide variety of issues, and the database has now been redesigned to meet these new needs. In particular, there are now several windows, or ‘portals’, that can be used to view our data from different perspectives. This guide explains the rudiments of the new portal design and provides practical information to help you start using the AD.
Some key terms
It is important to understand three terms of art that we use throughout this guide: ‘entities’, ‘properties’ and ‘details’. An entity is anything that the AD allows you to search for. A property (of an entity) is anything that helps you define a search for an entity. For example, if you want to find all the sentences that have a clausal complement embedded in them, you must set the search entity to ‘sentence’ and check the property ‘has clausal complement’ in the dropdown of sentence properties. The search will then return a list of all the sentences that have this property. If you click on the ‘details’ button for any entity in such a list, all the properties of that entity will appear, together with any additional information about it, including researcher and consultant comments. Thus ‘details’ include properties as a subset, but only properties can be used to guide searches for entities. This will become clearer once you try a Sample Search or two.
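For readers who find a programmatic picture helpful, the distinction between properties and details can be modelled in a few lines. The Python sketch below is a hypothetical illustration, not the AD's actual implementation: the class Entity, the function search and the toy sentence data are all invented for this example. It shows a search that filters only on properties, while details return the properties plus further information such as comments.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical model of an AD entity such as a sentence or a language."""
    entity_id: str
    entity_type: str                                  # e.g. "sentence"
    properties: set = field(default_factory=set)      # usable in searches
    comments: list = field(default_factory=list)      # shown only in details

    def details(self) -> dict:
        # Details include all properties as a subset, plus extra information.
        return {
            "id": self.entity_id,
            "properties": sorted(self.properties),
            "comments": list(self.comments),
        }


def search(entities, entity_type, required_property):
    """Return the entities of the chosen type that have the checked property."""
    return [
        e for e in entities
        if e.entity_type == entity_type and required_property in e.properties
    ]


# Toy data for illustration only.
entities = [
    Entity("S1", "sentence", {"has clausal complement"}, ["consultant: natural"]),
    Entity("S2", "sentence", set(), ["researcher: check tone"]),
]
hits = search(entities, "sentence", "has clausal complement")
print([e.entity_id for e in hits])   # ['S1']
print(hits[0].details())
```

In this model a comment such as ‘consultant: natural’ appears in the details of S1 but cannot itself be used as a search property.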
Choosing a portal
From the beginning of the Afranaph Project, our data collection has been driven by particular inquiries developed by linguists interested in specific issues, so the data we have collected naturally reflects the interests and emphases of the various projects that have inspired elicitation. Our original project was limited to the study of anaphora, and the database was originally designed to serve that sole research purpose. As the goals of Afranaph researchers have grown more diverse, however, the database has, since 2016, been engineered to serve both the structured needs of Afranaph Sister Project researchers and the needs of those who come to our data with different or more general goals.
For this reason, we have built several ‘portals’ for viewing and entering data. Each portal serves the needs of a particular project and permits searches pertinent to that project, but all the sentence data collected, no matter which sister project collected it, can be viewed through every portal. Our Database homepage provides access to these portals, each of which can be opened with a click.
The Generic Portal allows complete access for browsing all of our data, but its simple search interface is designed only for searches that return sets of sentences or sets of languages. More on the Generic Portal
Each sister project portal allows searches for ‘analytic entities’ specific to that project. An analytic entity is something that researchers are interested in, i.e., an abstract entity type defined by researchers because generalizations about entities of that type are deemed significant for the research question that defines the portal. To see how an entity is deemed to have a property, see the Database Property Attribution Guide.
The Anaphora Portal, which serves the Anaphora Sister Project, includes an analytic entity ‘anaphoric marker’, which is any exponent or phrase that functions as an anaphoric element. More on the Anaphora Portal
The Clausal Complementation Portal (CCP) is designed to study how clausal complements are matched with the predicates they occur with, both language-internally and crosslinguistically, as outlined on the Clausal Complementation Sister Project page. In addition to the ‘language’ and ‘sentence’ entities, the CCP has three analytic entities: clause types, C-types and predicate types. The development of the CCP has provided a prototype for all of our future portals. More on the CCP
At present (2016), only three portals exist, but more are described in our Plans for the future.
The easiest way to see what the database contains is to open the Generic Portal and click on Browse. A list of the languages for which the database has data will appear. Click on a language, and a list will appear of all the entity types (from all the projects) that we have information about in our database. In addition to the list of sentences (click on it to open the full list of sentences), there is information about many other sorts of entities, such as ‘anaphoric markers’ and ‘clause types’, which have been defined for use in other portals. Clicking on any specific entity (such as a sentence or an anaphoric marker) displays a list of its properties. All properties are attributed according to guidelines for each entity type, which are detailed in the Database Property Attribution Guide. In the specialized portals, Browse displays only the analytic entities specific to that portal; the properties of these portal-specific entities can be used to limit searches in that portal, but they are not available for searches in the other portals.
The first thing to do when you search for something is to designate what sort of answer you want, that is, what entity type you are searching for. This choice must be made at the top of the search page (some portals have defaults). Though portals differ as to which entities can be searched, they all allow searches for sentences and languages. The value of the more complex portals, the ones with more entity types, is that a search for any entity type can be delimited by the rich set of properties defined for the other entity types; this is best shown by demonstration (see Sample Searches). Property designations that delimit searches can be word searches, pre-defined property searches, or both.
Let’s start with a simple case: text searches. Open the Generic Portal and click Search. Suppose you want the set of all sentences in the AD that contain a passive morpheme. Click ‘sentences’ at the top of the search page, under the heading ‘Search for’, to set the entity type. Since we are looking for a particular piece of text within a specific field, this will be a word search. Open the drop-down for Sentence by clicking on the ‘+’ sign, and enter a search for the PASS gloss (see our Glossing Conventions) as a sentence property in the appropriate line of what we call ‘the four lines’.
Search by words (or substrings) of the sentence text.
Once you have done this, all you have to do is hit ‘Search’ under ‘Search for’, and you will see a list (in this case quite long) of every sentence in the AD that has a PASS morpheme in it. The list will also contain some stray results in which ‘pass’ is part of the translation gloss of a verb, as with /surpass/. If you type PASS into the wrong line, you will instead be searching for the morphophonological representation PASS, or for any translation that contains the sequence /pass/. The Sentence ID line above the four lines works as follows: if you are just trying to find a particular sentence and you know its sentence ID (every sentence in the AD has a unique sentence ID number), you can simply enter that number. A word search can also be used to locate particular analytic entities. Additional ways of enhancing your search are the following, as stated on the Anaphora Portal search page:
Simple free-text searches require every search term to be present, as a whole or partial word. For more complex queries you can use AND, OR and negation (a minus sign), enclose "a string in quotes" to match it exactly, or search for the keyword NULL, which means the field must be empty.
AND is understood as ‘both’, that is, a search returns only entities that contain both pieces of text. We call AND an exclusive search. OR returns all the entities that contain either piece of text. We call OR an inclusive search. More on word search
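The word-search behaviour described in this section, including the stray /surpass/ result and the rules quoted above, can be modelled as a small query evaluator applied to one field at a time. This Python sketch is a hypothetical reading of those rules, not the AD's actual search code. It assumes that unquoted terms match case-insensitively anywhere in the field (whole or partial word), that a quoted string is matched as one contiguous piece of text (also ignoring case here), that AND binds more tightly than OR, that the keywords are written in capitals, and that NULL is used as a query on its own. The field names "gloss" and "translation" and the toy sentences are invented for illustration.

```python
import re

# A token is either an optionally negated "quoted string" or a bare word.
TOKEN = re.compile(r'(-?)"([^"]*)"|(\S+)')


def _contains(term, text):
    # Whole or partial word match, ignoring case (an assumption).
    return term.lower() in text.lower()


def matches(query, text):
    """Hypothetical evaluator for AND, OR, -negation, "quotes" and NULL."""
    query = query.strip()
    if query == "NULL":
        return text.strip() == ""             # the field must be empty

    groups = [[]]                             # AND-groups, joined together by OR
    for m in TOKEN.finditer(query):
        neg, quoted, bare = m.group(1), m.group(2), m.group(3)
        if quoted is not None:
            negated, term = neg == "-", quoted    # quoted text is one contiguous term
        elif bare == "OR":
            groups.append([])                 # start a new alternative
            continue
        elif bare == "AND":
            continue                          # AND is the default between terms
        elif bare.startswith("-") and len(bare) > 1:
            negated, term = True, bare[1:]
        else:
            negated, term = False, bare
        groups[-1].append((negated, term))

    # Match if every term of at least one non-empty group is satisfied.
    return any(
        all(_contains(term, text) != negated for negated, term in group)
        for group in groups if group
    )


def word_search(sentences, field_name, query):
    """Return IDs of sentences whose chosen field satisfies the query."""
    return [s["id"] for s in sentences if matches(query, s.get(field_name, ""))]


# Toy sentences with invented glosses and translations.
sentences = [
    {"id": "101", "gloss": "book 3SG-read-PASS", "translation": "The book was read."},
    {"id": "102", "gloss": "3SG-surpass-PST 1SG", "translation": "She surpassed me."},
    {"id": "103", "gloss": "3SG-see-PST 1SG", "translation": ""},
]

print(word_search(sentences, "gloss", "PASS"))         # ['101', '102'] (102 is stray)
print(word_search(sentences, "translation", "PASS"))   # ['102'] (wrong line)
print(word_search(sentences, "gloss", '"-PASS"'))      # ['101']
print(word_search(sentences, "gloss", "read OR see"))  # ['101', '103']
print(word_search(sentences, "gloss", "PST -see"))     # ['102']
print(word_search(sentences, "translation", "NULL"))   # ['103']
```

In this toy data the quoted string "-PASS" filters out the /surpass/ hit, because it requires the hyphen to be present immediately before PASS.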
Searching with pre-defined properties of entities
Most entity properties in the AD are not based on text fields but are predetermined options defined for each entity type in the Database Property Attribution Guide. Entity properties are based on sets of answers to questions about a given entity. The questions about an entity are usually grouped into question groups, and the properties that can be searched for are the options for answering a given question in the question group.
For example, one question group about sentence entities, the ‘All Project Sentence Questions’, is part of every portal. If you select answers to different questions, you will get an exclusive search (only entities with both properties). If you select more than one of the answers to the same question (hold down the control key to select more than one), you will get an inclusive search (all the entities that have either property).
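The combination rule for pre-defined properties can be stated compactly: answers to different questions are joined by AND, and several answers selected for the same question are joined by OR. The Python sketch below illustrates this under the simplifying assumption that each entity has exactly one answer per question; the function property_search, the question names, the answers and the sentence IDs are invented for the example and do not come from the Database Property Attribution Guide.

```python
def property_search(entities, selections):
    """Return IDs of entities matching the selected answers.

    selections maps each question to the set of answers ticked for it.
    Different questions combine exclusively (AND); several answers to
    the same question combine inclusively (OR).
    """
    return [
        e["id"]
        for e in entities
        if all(e["answers"].get(q) in accepted for q, accepted in selections.items())
    ]


# Toy sentences with answers to two hypothetical questions.
sentences = [
    {"id": "S1", "answers": {"complement": "yes", "order": "SVO"}},
    {"id": "S2", "answers": {"complement": "yes", "order": "SOV"}},
    {"id": "S3", "answers": {"complement": "no", "order": "SVO"}},
]

# Answers to two different questions: exclusive search.
print(property_search(sentences, {"complement": {"yes"}, "order": {"SVO"}}))  # ['S1']
# Two answers to the same question: inclusive search.
print(property_search(sentences, {"order": {"SVO", "SOV"}}))  # ['S1', 'S2', 'S3']
```

An entity that has no answer recorded for a selected question is excluded, since None is never among the ticked answers.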
Properties of one entity type can also be used to define a search for a different entity type. For example, through the Anaphora Portal one can search for all the sentences that have an anaphoric marker that includes a morpheme corresponding to a body part (in one language or across languages). In the CC Portal, which has five entity types, properties of any of the five can be part of the definition of a search. For example, one can define an (exclusive) search for all sentences, in one or several languages, that have a complementizer with a ‘say’ meaning (a C-type property), an epistemic predicate (a predicate property), and full agreement in the complement clause (a clause type property).
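A search for one entity type delimited by the properties of other entity types amounts to following the links from each sentence to its related entities and checking their properties. The Python sketch below is a hypothetical model of this idea: the function cross_entity_search, the link structure, the entity type labels, the languages and the toy data are assumptions made for illustration, and the conditions are combined exclusively, as in the CCP example above.

```python
def cross_entity_search(sentences, linked, conditions, languages=None):
    """Return IDs of sentences whose linked entities satisfy every condition.

    linked maps entity type -> {entity ID -> set of properties}.
    conditions maps entity type -> property required of at least one
    linked entity of that type (all conditions must hold).
    languages optionally restricts the search to some languages.
    """
    hits = []
    for s in sentences:
        if languages is not None and s["language"] not in languages:
            continue
        if all(
            any(prop in linked[etype][eid] for eid in s["links"].get(etype, []))
            for etype, prop in conditions.items()
        ):
            hits.append(s["id"])
    return hits


# Toy CCP-style data: entity IDs, properties and languages are invented.
linked = {
    "c_type": {"C1": {"say meaning"}, "C2": set()},
    "predicate": {"P1": {"epistemic"}, "P2": {"desiderative"}},
    "clause_type": {"T1": {"full agreement"}},
}
sentences = [
    {"id": "S1", "language": "Language A",
     "links": {"c_type": ["C1"], "predicate": ["P1"], "clause_type": ["T1"]}},
    {"id": "S2", "language": "Language B",
     "links": {"c_type": ["C1"], "predicate": ["P2"], "clause_type": ["T1"]}},
]
conditions = {"c_type": "say meaning", "predicate": "epistemic",
              "clause_type": "full agreement"}
print(cross_entity_search(sentences, linked, conditions))  # ['S1']
```

The same function covers the Anaphora Portal example by using a single condition on an "anaphoric marker" entity type.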
We recommend that you try some Sample Searches to get the hang of things, but if you learn better by just trying something yourself, please go ahead.
Data entry is only possible with special permission and is thus password protected. Each sister project has its own protections and its own data entry. No sentence or sentence detail entered into the database by project A (through its portal) can be deleted or altered by any project other than project A. However, any project can add details and entity properties particular to its own portal to sentences collected by other projects. Thus a sentence ID### collected by the Anaphora Project and entered through the Anaphora Portal cannot be edited or deleted through the CCP, but additional details about ID### that are useful to CCP users can be added to the details that exist from its original entry. These additional properties are only visible in the portal where they are introduced and in the Generic Portal.
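The editing and visibility rules in this paragraph can be summarised as a small access model. The Python class below is a hypothetical sketch, not the AD's actual data entry code: the class and method names, the portal labels "Anaphora", "CCP" and "Generic", and the assumption that each project's portal carries the project's name are all invented for illustration. It also assumes, following the statement that all sentence data can be viewed through every portal, that details from a sentence's original entry are visible everywhere.

```python
class SentenceRecord:
    """Hypothetical model of ownership and visibility for one sentence."""

    GENERIC = "Generic"

    def __init__(self, sentence_id, owner, original_details):
        self.sentence_id = sentence_id
        self.owner = owner                        # project that entered it
        self.original = list(original_details)    # details from the original entry
        self.added = {}                           # project -> details it added
        self.deleted = False

    def _check_owner(self, project):
        if project != self.owner:
            raise PermissionError(
                f"{project} cannot alter {self.sentence_id} (owned by {self.owner})")

    def edit_original(self, project, index, new_detail):
        self._check_owner(project)
        self.original[index] = new_detail

    def delete(self, project):
        self._check_owner(project)
        self.deleted = True

    def add_detail(self, project, detail):
        # Any project may add details of its own to any sentence.
        self.added.setdefault(project, []).append(detail)

    def visible_details(self, portal):
        # Added details show only in the adding project's portal and the Generic Portal.
        visible = list(self.original)
        for project, details in self.added.items():
            if portal in (project, self.GENERIC):
                visible.extend(details)
        return visible


record = SentenceRecord("S-demo", "Anaphora", ["anaphoric marker: reflexive"])
record.add_detail("CCP", "clause type: finite")
print(record.visible_details("CCP"))       # both details
print(record.visible_details("Anaphora"))  # original detail only
try:
    record.delete("CCP")
except PermissionError as err:
    print(err)
```

In this sketch the CCP can enrich the record but any attempt by it to edit or delete the Anaphora Project's entry raises PermissionError.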
If you have been given permission to enter data through a portal, you will need to be trained for data entry by Afranaph Project personnel. Those who are granted a password will also have access to the Afranaph Data Entry Guide, which is designed to assist training and to refresh the memories of those returning to data entry work.