Manual Annotation
We have created a user-friendly interface to make the annotation experience seamless.
UBIAI's annotation interface allows you to annotate entities, relations and document classes simultaneously. Simply click on the Entities, Relations or Classification tab to switch between the three annotation types. In addition, UBIAI supports multi-tag annotation, where you can assign multiple entities to a single token.
Span-based Annotation:
To make the annotation process seamless, we have added a label dialog option that lets you select the entity from a list of entities right in the annotation interface.
Adding Comments to Entity:
Sometimes you are unsure how to annotate certain entities and want to leave a note that you can revisit and resolve later. With this new release, you and your team members can add comments by right-clicking on the entity and selecting "comments".
Character-based Annotation:
Relations Annotation:
For relation annotation, first select the relation type in the upper window, then click on the head token in the text, followed by the child token. An arrow will then appear at the top left of the head token to indicate the direction of the relation, head -> child. You can annotate multiple children per head by selecting the head and each respective child one pair at a time.
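As an illustration only, the head -> child structure described above can be modelled as a list of directed edges. The class name `Relation`, its fields (`head`, `child`, `type`), and the sample tokens are assumptions made for this sketch; they are not UBIAI's export format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """A directed relation pointing from a head token to a child token."""
    head: str
    child: str
    type: str


def children_by_head(relations):
    """Group (relation type, child) pairs under their head, in annotation order."""
    grouped = defaultdict(list)
    for rel in relations:
        grouped[rel.head].append((rel.type, rel.child))
    return dict(grouped)


# One head ("Acme") annotated with two children, one head/child pair at a time.
relations = [
    Relation(head="Acme", child="Paris", type="LOCATED_IN"),
    Relation(head="Acme", child="2015", type="FOUNDED_IN"),
]
grouped = children_by_head(relations)
print(grouped)
```

Storing one edge per head/child pair mirrors the way the interface asks you to select the head again for every additional child.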
Document Classification Annotation:
In addition to entity and relation annotation, you can simultaneously assign a label to the whole document for a subsequent text classification task. First choose a class type (binary or multi-class). Then, in the Classification tab, select one or more classes from the class list, or select positive/negative in the case of binary classification.
Image Classification Annotation:
With image classification annotation, you can assign classes to individual images. The images can be in PNG or JPG format. In addition, you can attach a text snippet describing the image by uploading a ZIP file that contains the image together with a JSON file holding the text and the image ID.
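A minimal sketch of packaging an image and its description into one ZIP file is shown here. The JSON keys (`image_id`, `text`) and the file names are hypothetical, since the exact schema expected by UBIAI is not given in this guide; check the expected format before uploading.

```python
import io
import json
import zipfile


def build_image_text_zip(image_name, image_bytes, text):
    """Return ZIP bytes holding one image and a JSON file describing it."""
    # Hypothetical schema: the keys "image_id" and "text" are assumptions.
    description = {"image_id": image_name, "text": text}
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(image_name, image_bytes)
        archive.writestr("description.json", json.dumps(description))
    return buffer.getvalue()


# Placeholder bytes stand in for a real PNG file read from disk.
zip_bytes = build_image_text_zip(
    "invoice_001.png", b"placeholder image bytes", "Scanned invoice, page 1"
)

# Re-open the archive to check what it contains.
with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
    names = sorted(archive.namelist())
    payload = json.loads(archive.read("description.json"))
print(names, payload)
```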
OCR Annotation:
When dealing with semi-structured text such as invoices or contracts, preserving the document layout is key to training a high-performance model. Traditional text annotation solutions parse the text from semi-structured documents without preserving the spatial information, which leads to poor model training. Combining natural language processing and computer vision, UBIAI’s OCR feature allows you to perform NER, relation extraction, and classification annotation directly on native PDFs, scanned images or pictures from your phone without losing any layout information, which can significantly improve your NLP model's performance.
During project creation, select the OCR annotation option to be able to upload PDF, JPG or PNG documents. The annotation page has two panels: the parsed text on the left and the original image on the right. You can annotate on both panels, with the right panel (the original image) giving you guidance on the document layout. To annotate directly on the original image, double-click a token to annotate a single token, or draw a frame around the tokens you want to annotate. Alternatively, press shift and click the first token and then the second token for multi-token annotation. You can also annotate relations on the left panel (but not on the right panel). Note that all other functions, such as dictionary annotation, rule-based matching and model auto-annotation, can still be used on OCR documents, which is one of the main advantages of using UBIAI.
Object Detection:
With object detection, you can draw a bounding box around an area of the image to assign it a label. This is useful for supplementing your OCR annotation with non-textual entities such as signatures, figures and images. To enable object detection, check the Object Detection box, draw a bounding box around the area of interest and assign a label from the labels list. Annotated objects are included in the OCR JSON export.
Free Form:
The free form input interface lets you collect text input about the annotated document. It is best used to ask the annotator to translate a given text or to leave feedback about a manual annotation task. To enable it, click on the two vertical arrows labeled "toggle free form field" at the bottom left of the annotation interface.
Property Field:
In addition to annotating individual entities, you can now assign key:value properties to each annotated entity by right-clicking on it and selecting "Properties List". This is useful for creating knowledge graphs where each entity might have multiple child nodes.
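To show why key:value properties are convenient for knowledge graphs, the sketch below turns one entity's properties into (subject, predicate, object) triples. The function name `entity_to_triples`, the entity "Acme", and the property names are all hypothetical examples, not part of UBIAI.

```python
def entity_to_triples(entity_text, properties):
    """Turn an entity's key:value properties into (subject, predicate, object) triples."""
    return [(entity_text, key, value) for key, value in properties.items()]


# Hypothetical properties attached to the annotated entity "Acme".
properties = {"industry": "software", "headquarters": "Paris"}
triples = entity_to_triples("Acme", properties)
print(triples)
```

Each property becomes one child node of the entity, connected by an edge named after the property key.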
Once you finish annotating the document, click the validation button (shift + down arrow) if you are done, or click the reject button (shift + left arrow) if the text is not appropriate. You can also ignore the document and come back to it later.
To undo previous annotations, press ctrl + Z. To restore annotations you have just erased, press ctrl + Y.
You can track the annotation progress by looking at the completion percentage bar along with the number of finished documents.
To see the annotation status of each individual document, go to the history panel on the left.
You can also filter documents by keywords, document state (validated, rejected, ignored, annotated/not-annotated), entities, relations and classes by combining conditions with logic operations.
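The idea of combining filter conditions with logic operations can be sketched as composable predicates. This is only a conceptual model: the document fields (`id`, `text`, `state`, `entities`) and the helper names are assumptions for illustration and do not reflect how UBIAI stores or queries documents.

```python
def has_state(state):
    """Match documents in a given state, e.g. "validated" or "rejected"."""
    return lambda doc: doc["state"] == state


def has_entity(label):
    """Match documents that contain at least one entity with this label."""
    return lambda doc: label in doc["entities"]


def contains(keyword):
    """Match documents whose text contains the keyword, ignoring case."""
    return lambda doc: keyword.lower() in doc["text"].lower()


def all_of(*preds):
    """Logical AND over several conditions."""
    return lambda doc: all(p(doc) for p in preds)


def any_of(*preds):
    """Logical OR over several conditions."""
    return lambda doc: any(p(doc) for p in preds)


def negate(pred):
    """Logical NOT of a condition."""
    return lambda doc: not pred(doc)


documents = [
    {"id": 1, "text": "Invoice from Acme", "state": "validated", "entities": {"ORG"}},
    {"id": 2, "text": "Contract with Bob", "state": "rejected", "entities": {"PERSON"}},
    {"id": 3, "text": "Invoice for Bob", "state": "validated", "entities": {"PERSON"}},
]

# validated AND keyword "invoice" AND NOT entity ORG
strict = all_of(has_state("validated"), contains("invoice"), negate(has_entity("ORG")))
strict_ids = [doc["id"] for doc in documents if strict(doc)]

# rejected OR entity ORG
loose = any_of(has_state("rejected"), has_entity("ORG"))
loose_ids = [doc["id"] for doc in documents if loose(doc)]
print(strict_ids, loose_ids)
```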
During project creation, you can choose between span-based and character-based annotation. With span-based annotation, you assign an entity to a word span instead of a character span. Choose this option if you do not need to annotate characters within a word. To annotate entities manually, select the entity type in the upper window, then highlight the word or sentence. Each entity type has a different color to tell them apart. To speed up manual annotation, each entity in the list is assigned a keyboard shortcut, shown next to it.
Some types of annotation, such as annotation for machine translation, require annotating characters within a word. In this case, choose the character-based annotation option. You can then highlight a character inside a word and assign a label to it.
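One way to picture the difference between the two modes is in terms of character offsets: character-based annotation keeps the highlighted range exactly, while span-based annotation can be thought of as widening it to whole words. The sketch below is a conceptual model under that assumption, using simple whitespace-delimited words; it is not UBIAI's implementation, and `snap_to_words` is a hypothetical helper.

```python
import re


def snap_to_words(text, start, end):
    """Expand the character range [start, end) to cover whole whitespace-delimited words."""
    new_start, new_end = start, end
    for match in re.finditer(r"\S+", text):
        # Any word that overlaps the selection is included in full.
        if match.start() < end and match.end() > start:
            new_start = min(new_start, match.start())
            new_end = max(new_end, match.end())
    return new_start, new_end


text = "Tokenization matters"
selection = (0, 5)  # the characters "Token" inside the first word

# Character-based: keep the exact highlighted characters.
char_based = text[selection[0]:selection[1]]

# Span-based: widen the selection to the enclosing word.
word_start, word_end = snap_to_words(text, *selection)
span_based = text[word_start:word_end]
print(repr(char_based), repr(span_based))
```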