Database Architecture

Normalization

To normalize a database, we restructured it in accordance to a series of normal forms that avoids the redundancy of data, undesirable dependencies, and inconsistencies.

In his paper “Further Normalization of the Data Base Relational Model" [4], Edgar F. Codd, the “father” of the relational database model, listed these objectives of normalization:

  1. To free the collection of relations from undesirable insertion, update and deletion dependencies.
  2. To reduce the need for restructuring the collection of relations, as new types of data are introduced, and thus increase the life span of application programs.
  3. To make the relational model more informative to users.
  4. To make the collection of relations neutral to the query statistics, where these statistics are liable to change as time goes by.

The following set of tables serves as an example for a database that is not properly normalized.

Table one:

username word word_language translation translation_language
Mark house en Haus de
Dana house en maison fr
Eamon casa es house en

Table two:

id username email is_admin
1001 Dana dana@email.com true
1002 Eamon eamon@email.com true
1003 Mark marc@email.com false

Table one holds the values of four key entities: users, words, translations, and languages. Table two maintains user information.

Two independent username columns exist in two tables: an example of redundancy that can lead to so-called “update anomalies”. If we discover after the initial insertion that user with username Marc is not spelled with a “k” but with a “c”, and only proceed to change this data in the users table, then data across the database will become inconsistent. The username needs to be updated in two different places, making database updates open to errors.

Normal forms

Normalization of relational databases is guided by rules that build on each other called “normal forms”.

To satisfy the the first normal form (1NF), data sets must not contain tables themselves, and each data set must have a unique primary key. In the example above, with no nested tables and primary key for only the users table, 1NF is not satisfied.

The second normal form (2NF) must fulfil rules of the first normal form, and demands every attribute of a data set to be functionally dependent on the primary key. Attributes not dependent on the primary key belongs to their own table.

The third normal form (3NF) requires 2NF and requires the elimination of any transitive functional dependencies. For example, if B depends on primary key A, but C depends on B, then both B and C should be extracted to their own table.

Within the scope of Alexandria, our database was designed to satisfy 3NF, thus protecting it from insertion, update and deletion anomalies.

Tables

We started out with a few iterations of basic design. This is one of the first drafts:

As our understanding and ambitions of Alexandria’s user needs and functionalities grew, the number of tables in our database grew. Here’s a few examples of additional features we considered implementing for the MVP:

  • User having access and choosing from other users' translations
  • Saving every existing translation with a new context
  • Users setting favourite dictionaries

Once we pared down Alexandria’s feature list to arrive at a MVP, our database complexities shrank. The reduced scope supports a single-user experience, simplified user profile and context handling, while laying the groundwork for cross-user content sharing.

This is what we arrived at for the initial release of Alexandria:

In the final design, we stripped the connecting tables of their own unique ids, and made composite keys - the combinations of foreign key columns - the primary keys. This ensures that connections are unique and a user cannot have two statuses for the same word.

We left in id as the primary key for the words table while acknowledging it could have been possible to replace unique keys with composite keys - as every word is unique in its own language. Bar one table, Alexandria has achieved 3NF.

Naming things

As a useful side effect of designing the database early on, we were forced to come to a common understanding of naming entities in our code. For example, what is a word, and what is a phrase? Finding that common language was essential for communication going forward.

Alexandria’s MVP database tables

Main tables

table description
users The people using Alexandria. Each user can actively learn one language at a time, and translate to one base language. If language preferences change, previous progress (translations, word statuses) is kept.
admins The people administering Alexandria. That would be the core development team for now.
languages The languages that are supported by Alexandria. For the first release, it is a list of ten. What they have in common is their use of a Latin-based alphabet (to avoid surprises with our text parser), in addition to language support by PostgreSQL text search. A flag representing the language is saved as a unicode character combination to be used in menus across the application.
texts The key entity for learning. Texts are currently provided by users through forms, later through URLs and files. Texts default to private, which should particularly suit those who learn best by reading state secrets or erotic literature. Public texts accessible to every user is on the roadmap for a future release. Database is created with necessary column to provide this functionality.
words Translatable entities. Usually single words like cabbage, but also compounds, or phrases, like get up or l’hôtel. Each word must be unique within its language. It can have different meanings (translations) but in database terms, each combination of letters occurs once per language.
translations Translations of words can be provided or selected by the users, but they exist independently of the users who created them. Since a word is aware of the language it’s in, the translation table tracks only the language a word is translated to, in column target_language. Seeding the database with a language’s most common words and translations could be another future release feature.
webdictionaries Online resources where user can look up a word or phrase.

Connecting tables

table purpose
users_words Word statuses for a user-word combination are “learning”, familiar”, and “learned”.
User dictionaries could be generated using this table.
users_translations Every translated word originates in a text. Surrounding words from that text form the translation’s context. Users have their own contexts for every translation. If the translation is not newly provided but simply selected from previous translations, current context is saved and connected to the current translation and user.
webdictionary_preference Users can have one favourite dictionary per source language.