Annotation Settings

The annotation settings, found in the project menu, include the project details (e.g. to enter or modify a label), Documents, Pre-annotation, Metrics, Models and API.

Documents

In this tab, you will find a list of the uploaded documents with the annotation status (annotated, non-annotated, rejected or ignored).

Pre-annotation

  1. Entity Dictionary

For each entity type, you can associate one or more dictionaries to automatically recognize and annotate the words they contain. You can either enter the dictionary entries manually or upload a csv file containing all the associated words with their corresponding entity type (make sure the file is saved as csv UTF-8 to prevent decoding issues). For example, to add a dictionary for the entity type DIPLOMA, put words corresponding to the DIPLOMA entity, such as Bachelor, Master and PhD, in the first column, and the corresponding entity, DIPLOMA, in the second column. Then upload the csv file.

Once the dictionary is added, hit save: the tool will automatically annotate every dictionary word found in the corpus documents.
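If you build the dictionary file with a script rather than a spreadsheet, a minimal sketch with Python's standard csv module could look like this. The file name and the absence of a header row are assumptions; the two-column layout (word, entity type) follows the description above.

```python
import csv

# Hypothetical DIPLOMA dictionary from the example above.
# Column 1: the word to match; column 2: the entity type to assign.
rows = [
    ("Bachelor", "DIPLOMA"),
    ("Master", "DIPLOMA"),
    ("PhD", "DIPLOMA"),
]

# Write the file as UTF-8 so non-ASCII entries decode correctly on upload.
# newline="" lets the csv module control line endings, as its docs recommend.
with open("diploma_dictionary.csv", "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(rows)
```

Each row becomes one dictionary entry; add more rows (or more entity types in the second column) as needed.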

Note: Removing words from an existing dictionary will not remove the already annotated words in the corpus.
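To illustrate what dictionary pre-annotation does, here is a minimal sketch that finds whole-word dictionary hits in a text and returns character spans with their entity type. It is not the tool's own implementation: case-sensitive, whole-word matching is an assumption, and the function name dictionary_annotate is hypothetical.

```python
import re

def dictionary_annotate(text, dictionary):
    """Return sorted (start, end, label) character spans for dictionary hits.

    `dictionary` maps a word or phrase to its entity type. Matching is
    case-sensitive and whole-word (an assumption; the tool may differ).
    """
    spans = []
    for term, label in dictionary.items():
        # \b boundaries keep "Master" from matching inside "Mastering";
        # they assume each term starts and ends with a word character.
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text):
            spans.append((m.start(), m.end(), label))
    return sorted(spans)

diploma = {"Bachelor": "DIPLOMA", "Master": "DIPLOMA", "PhD": "DIPLOMA"}
text = "Candidates need a Master or PhD in Physics; Mastering Python is a plus."
for start, end, label in dictionary_annotate(text, diploma):
    print(text[start:end], label)
```

This prints "Master DIPLOMA" and "PhD DIPLOMA"; "Mastering" is not annotated because of the word boundaries.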

  2. Entity Rule-Based Matching

In addition to word dictionaries, you can add regular expressions to search for a specific pattern in the corpus and annotate it automatically. For example, suppose we found the following recurring pattern in our corpus:

2+ years of experience managing and leading highly technical teams in a fast-paced environment .

As you can see, the sentence follows a pattern that can be expressed with the following Python regular expression: [0-9][+].+[.]

This expression matches text that starts with a single digit (0-9), followed by the sign “+”, followed by one or more characters of any kind, followed by a period. The tool will then search for this pattern and annotate the matching text automatically. The complete list of regex codes and documentation can be found here.
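You can check a pattern locally before adding it to the project. Below is a minimal sketch using Python's re module with the pattern described above (a digit, a literal “+”, one or more characters, then a literal period), applied to the example sentence. It only tests the expression; it does not reproduce how the tool applies it to your corpus.

```python
import re

# A digit, a literal "+", one or more characters of any kind, a literal period.
pattern = re.compile(r"[0-9][+].+[.]")

sentence = ("2+ years of experience managing and leading highly technical "
            "teams in a fast-paced environment .")

match = pattern.search(sentence)
print(match.group(0) == sentence)  # True: the whole sentence is matched
```

Two things worth knowing when testing: [0-9] matches exactly one digit, so in "10+ years in sales." the match starts at "0+" (use [0-9]+[+].+[.] to capture the full number), and .+ is greedy, so a match extends to the last period on the line.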

With rule-based matching, you can pre-annotate your documents instantly by combining multiple tags.

To do so, open the project settings and select the Pre-annotation tab, then:

  1. From the first drop-down menu, select the attribute you would like to search for.

  2. Enter the value of the attribute in the next field.

  3. Assign the entity to the attribute using the next drop-down menu.

  4. Click "add token" and hit save.
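Conceptually, each added token is one (attribute, value) constraint, and a rule matches when consecutive tokens satisfy all of its constraints in order. The sketch below illustrates that idea in plain Python. It is an assumption about the behaviour, not the tool's code: the attribute names TEXT, POS and REGEX, the EXPERIENCE entity, and the helper names are hypothetical, and the POS tags are written by hand instead of coming from a tagger.

```python
import re

# Hypothetical pre-tokenized text with hand-written part-of-speech tags.
tokens = [
    {"TEXT": "2+", "POS": "NUM"},
    {"TEXT": "years", "POS": "NOUN"},
    {"TEXT": "of", "POS": "ADP"},
    {"TEXT": "experience", "POS": "NOUN"},
]

# One rule: an ordered list of (attribute, value) token constraints,
# plus the entity to assign when all of them match consecutive tokens.
rule = {
    "tokens": [("REGEX", r"[0-9]+\+"), ("POS", "NOUN")],
    "entity": "EXPERIENCE",  # hypothetical entity type
}

def token_matches(token, attr, value):
    """REGEX must match the whole token text; other attributes compare equal."""
    if attr == "REGEX":
        return re.fullmatch(value, token["TEXT"]) is not None
    return token.get(attr) == value

def apply_rule(tokens, rule):
    """Return (start, end, entity) token spans (end exclusive) matching the rule."""
    constraints = rule["tokens"]
    n = len(constraints)
    spans = []
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        if all(token_matches(t, a, v) for t, (a, v) in zip(window, constraints)):
            spans.append((i, i + n, rule["entity"]))
    return spans

print(apply_rule(tokens, rule))  # [(0, 2, 'EXPERIENCE')]
```

Combining a regex constraint with a part-of-speech constraint, as here, is what lets a single rule be more specific than either tag alone.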

The attributes include tags such as part of speech (POS) and regular expression patterns. Below is the list of all the possible attributes with their descriptions:

  3. Relation Dictionary

For each pair of entities, you can define relations so that your data is auto-labeled whenever there is a match in the dictionary. Keep in mind that relation auto-labeling is only available for pairs of entities that do not overlap and are within 100 tokens of each other. This auto-labeling process cannot be reversed once it is launched, so test it on a small subset of your data before running it on the whole dataset.
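The two eligibility conditions above (no overlap, at most 100 tokens apart) can be sketched as follows. This is an illustration under stated assumptions, not the tool's implementation: the dictionary format keyed by (head text, tail text), the helper names, and measuring distance as the number of tokens strictly between the two spans are all hypothetical.

```python
from itertools import combinations

MAX_GAP = 100  # limit from the text; how the tool measures distance is assumed

def eligible(span_a, span_b, max_gap=MAX_GAP):
    """Check two (start, end) token spans, end exclusive."""
    first, second = sorted([span_a, span_b])
    if first[1] > second[0]:
        return False  # the spans overlap
    # Assumption: distance = number of tokens strictly between the two spans.
    return second[0] - first[1] <= max_gap

def auto_label(entities, relation_dict):
    """entities: (start, end, label, text) tuples.
    relation_dict: hypothetical {(head_text, tail_text): relation} mapping."""
    found = []
    for e1, e2 in combinations(entities, 2):
        for head, tail in ((e1, e2), (e2, e1)):
            relation = relation_dict.get((head[3], tail[3]))
            if relation and eligible(head[:2], tail[:2]):
                found.append((head[3], relation, tail[3]))
    return found

entities = [
    (0, 1, "PERSON", "Alice"),
    (5, 7, "ORG", "Acme Corp"),
    (300, 301, "ORG", "Globex"),
]
relation_dict = {
    ("Alice", "Acme Corp"): "WORKS_FOR",
    ("Alice", "Globex"): "WORKS_FOR",
}
print(auto_label(entities, relation_dict))
# [('Alice', 'WORKS_FOR', 'Acme Corp')]; Globex is 299 tokens away
```

Running a check like this on a few sample documents is one way to follow the advice above and preview which relations would be created before launching the irreversible auto-labeling.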
