Bookstore Application

Application Schema and Interface

Authors: Egor Rogov, Pavel Luzanov

Translated by Liudmila Mantrova

Usage of Course Materials

Non-commercial use of course materials (presentations, demonstrations) is

allowed on an unrestricted basis. Commercial use is only possible with prior

written permission of Postgres Professional company. Modification of course

materials is forbidden.

Please send your feedback to: edu@postgrespro.ru

Disclaimer

In no event shall Postgres Professional company be liable for any damages

or loss, including loss of profits, that arise from direct or indirect, special or

incidental use of course materials. Postgres Professional company

specifically disclaims any warranties on course materials. Course materials

are provided “as is,” and Postgres Professional company has no obligations

to provide maintenance, support, updates, enhancements, or modifications.

Agenda

Bookstore application overview

Designing the application schema; normalization

The final version of the application schema

Setting up the client-server interface

Demo

In this demo, we are going to show the Bookstore application as it should

appear after all practical assignments are complete. The application is

available in the VM browser at http://localhost/.

The application consists of several parts that are provided as separate tabs.

“Store” is a web user interface for buying books online.

Other tabs represent the backend interface, which is available only to the

bookstore employees (the admin panel).

“Catalog” is the storekeeper’s interface that is used for ordering books to the

store and viewing new arrivals and purchases.

“Books” and “Authors” are interfaces for librarians, where they can register

new arrivals.

For training purposes, all this functionality is exposed in a single web page.

If any feature is unavailable because the required object (such as a table or

a function) is missing, the application will report an error. It also displays the

text of all queries sent to the server.

We will start with an empty database and will gradually implement all the

required components as the course progresses.

The source code of the application frontend will not be discussed in this

course, but you can download it from the following git repository:

https://git.postgrespro.ru/pub/dev1app.git

Books

Book

title

authors

quantity

operations

entity

attributes

An ER-model for high-level design

entities are concepts of the subject domain

relationships are connections between entities

attributes are properties of entities and relationships

After taking a look at the application’s interface and functionality, we have to

deal with its schema. We will not go into details about the database design:

it is a separate branch of knowledge, which is beyond the scope of this

course. But we cannot ignore this topic entirely.

High-level database design often uses the ER-model (where ER stands for

“Entity–Relationship”). It deals with entities (concepts of the subject area),

their relationships, and attributes (the properties of entities and

relationships). The model allows us to remain at the logical level, without

getting down to data representation at the physical level (i.e., its table form).

The first approach to database design is creating a diagram as shown on

this slide: a book is represented as a single big entity, and everything else

becomes its attributes.

Application Schema (Ver. 1)

id | title | author | qty | operation

----+-------------------+------------------------------+-----+-----------

1 | The Tempest | William Shakespeare | 10 | +11

1 | THE TEMPEST | William Shakespeare | 10 | -1

2 | Romeo and Juliet | William Shakespeare | 4 | +4

3 | Good Omens | Terry Pratchett | 7 | +7

3 | Good Omens | Neil Gaiman | 7 | 0

The data is duplicated

it’s hard to maintain consistency

it’s hard to perform updates

it’s hard to write queries

10 = 11 – 1

7,0

or 0,7

or 7,7

Clearly, this approach cannot be correct. It may be not quite obvious in the

diagram itself, but let’s try to project this diagram onto database tables.

There are several ways to do it. One of them is shown on the slide: the table

corresponds to the entity, and table columns represent the attributes of this

entity.

This diagram is a good illustration that some data is duplicated (these

fragments are highlighted). Data duplication makes it hard to maintain

consistency, which is arguably the main objective of a database system.

For example, each of the two rows related to book 3 must list the total

quantity (7 items). What should be done to reflect a purchase? On the one

hand, we need to add some rows that reflect purchase operations. (But how

many rows are required? Should we add one or two?) On the other hand,

the quantity should be reduced from 7 to 6 in all rows. And what if an error

leads to data discrepancy between these rows? How can we define a

constraint that forbids such a situation?

Many queries will also become overcomplicated. How can we find the total

number of books? How can we find all the unique authors?

Thus, such a schema will not work well for relational databases.

Application Schema (Ver. 2)

entity | attribute | value

--------+------------+-------

1 | title | The Tempest

1 | author | William Shakespeare

1 | qty | 10

1 | operation | +11

1 | operation | -1

2 | title | Romeo and Juliet

2 | author | William Shakespeare

2 | qty | 4

2 | operation | +4

... | ... | ...

Data without a schema

consistency is maintained at the application side

it’s hard to write queries

performance is low (multiple joins)

Another way to represent an entity as a table is a so-called EAV schema

(“entity–attribute–value”). It allows storing anything at all in a single table.

Technically, we get a relational database, but it has virtually no schema, and

the database system cannot guarantee data consistency. Consistency has

to be maintained by the application alone, and sooner or later it is bound to

be compromised.

With such a schema, it is hard to write queries (although they are quite easy

to generate). As a result, handling more or less significant data volumes

becomes a problem because of multiple self-joins.

It is not an approach to follow.

Application Schema (Ver. 3)

book_id | description

---------+-----------------------------------------------------------

1 | {"title": "The Tempest",

| "authors": [ "William Shakespeare" ],

| "qty": 10,

| "operations": [ +11, -1 ]}

3 | {"title": "Good Omens",

| "authors": [ "Terry Pratchett",

| "Neil Gaiman" ],

| "qty": 7,

| "operations": [ +7 ]}

... | ...

Data without a schema

consistency is maintained at the application side

it’s hard to write queries (a special language is needed)

there is index support

Another similar approach consists in representing data in the JSON format,

NoSQL-style. All the previous considerations still apply here.

Besides, it will be impossible to query such a structure using SQL: you will

have to use a special language (previously, jsQuery would be the most

probable choice, but starting from PostgreSQL 12, it is convenient to use the

SQL/JSONPath features defined in the SQL:2016 standard).

https://github.com/postgrespro/jsquery

https://postgrespro.com/docs/postgresql/12/functions-json#FUNCTIONS-SQ

LJSON-PATH

Although PostgreSQL provides index support for JSON, performance is still

a concern.

It is convenient to use such a schema if the database only needs to get

JSON data by ID, and no serious data processing within the JSON structure

is expected. But it is not our case.

(Naturally, nothing is set in stone here. See the last practical assignment for

further discussion.)

Books and Operations

Normalization reduces data redundancy

Large entities are split into smaller ones

Book

title

authors

Operation

quantity change

date

one-to-many

relationship

Thus, we need to eliminate redundancy, so that it is convenient to work with

the data in a relational database system. This process is called

normalization.

You might be familiar with various normal form concepts (first, second, third,

Boyce–Codd, etc.). We are not going to discuss them here; speaking

informally, it is enough to understand that all this math pursues one and the

same goal: eliminating redundancy.

You can reduce redundancy by splitting a big entity into several smaller

ones. The exact way to do it should be prompted by common sense (which

cannot be replaced by the knowledge of normal forms alone anyway).

In our case, everything is quite straightforward. Let’s start by separating

books and operations. These two entities are connected by the one-to-many

relationship: there can be several operations on each book, but each

operation relates to a single book only.

Application Schema

books

book_id | title | author

---------+-------------------+------------------------------

1 | The Tempest | William Shakespeare

2 | Romeo and Juliet | William Shakespeare

3 | Good Omens | Terry Pratchett

3 | Good Omens | Neil Gaiman

operations

operation_id | book_id | qty_change | date_created

--------------+---------+------------+--------------

1 | 1 | +10 | 2020-07-13

2 | 1 | -1 | 2020-08-25

3 | 3 | +7 | 2020-07-13

4 | 2 | +4 | 2020-07-13

At the physical level, the identified can be represented by two tables: books

and operations.

An operation changes the quantity of books. This change can be either

positive or negative (the order operation adds some books, while the

purchase operation subtracts them). Note that the book has no “quantity”

attribute anymore. Instead, it is enough to sum up all quantity changes

made by operations related to this book. Having an additional “quantity”

attribute would lead to data redundancy again.

This solution might seem strange to you at first. Is it really convenient to

calculate the sum instead of simply querying a separate field? But we can

create a view to display the quantity of each book. It won’t result in

redundancy: a view is just a query.

Another point to consider is performance. If summing up all changes brings

too much overhead, we can perform the opposite process called

denormalization: physically add the “quantity” field to the books table and

ensure that it is consistent with the operations table. We are not going to

discuss here whether it makes sense or not (this question is considered in

the QPT course that covers query performance tuning). Common sense

suggests that it’s not required for our “sandbox.” But we will get back to

denormalization in the “Triggers” lecture.

Thus, as you can see on this slide, moving all operations into a separate

entity resolves most of the duplication issues, but not all of them.

Books, Authors, Operations

Book

title

Author

last name

first name

middle name

Operation

quantity change

date

many-to-many

relationship

That’s why we have to take one step further: separate books from authors

and connect them with each other by a many-to-many relationship: each

book can be written by several authors, and each author can have more

than one book. At the table level, such relationship can be implemented

using an additional intermediate table.

The first, last, and middle names can be the author’s attributes. It makes

sense because we may need to work with each of these attributes

separately, e.g., to display the author’s last name and initials.

Designing the Interface

Tables and triggers

reading data directly from tables (views);

writing data directly to tables (views),

using triggers for changing related tables

the application must be aware of the data model;

this approach provides maximum flexibility

consistency is hard to maintain

Functions

reading data via table functions;

writing data by calling functions

the application is separated from the data model and is limited by API

you have to write a lot of wrapper-functions,

potential performance issues

There are several ways to set up an application’s client-server interface.

The first option is to allow the application to access and modify database

tables directly. In this case, the application must have the precise

“knowledge” of the data model. This requirement can be relaxed to some

extent by using views.

Another limitation of this approach is that the application has to follow

certain rules; otherwise, it is very hard to maintain data consistency if you

have to address all possible inappropriate operations at the database level.

But this approach is the most flexible one.

Another option is to forbid direct table access from the application and allow

only function calls. Reading can be performed by table functions (which

return a set of rows). Writing can be performed by calling other functions

and passing the required data to them. In this case, all the necessary

consistency checks can be implemented within functions: the database will

be protected, but the application will be able to use only a limited set of

features that we provide. It requires writing many wrapper functions and can

lead to performance degradation.

You can also combine these two approaches. For example, you can allow

the application to read data from tables directly, but perform modifications

only by calling special functions.

Bookstore Interface

buy_book

books

book_id

title

authorship

book_id

author_id

seq_num

authors

author_id

last_name

first_name

middle_name

operations

operation_id

book_id

qty_change

date_created

book purchase

get_catalog

list of

books

In this application, we will try different ways of setting up the interface

(although it’s usually better to stick to one approach when developing real

applications).

The store will use interface functions:

- get_catalog for searching books (see “SQL. Composite Types”)

- buy_books for making a purchase (see “PL/pgSQL. Executing Queries”)

operations_v catalog_v

authors_v

Admin Panel Interface

add_book

books

book_id

title

authorship

book_id

author_id

seq_num

authors

author_id

last_name

first_name

middle_name

operations

operation_id

book_id

qty_change

date_created

add_author

UPDATE

adding a book adding an author

book order

update_catalog_trigger

The admin panel is going to retrieve data by accessing the following views

(which we have to create as part of the practice for this lecture):

- catalog_v for the list of books

- authors_v for the list of authors

- operations_v for the list of operations

Authors will be added using the add_author function (we will create it once

we get to the “PL/pgSQL. Executing Queries” lecture). For adding books, we

will implement the add_book function (“PL/pgSQL. Arrays”).

To enable book purchase, we will make the catalog_v view updatable

(“PL/pgSQL. Triggers”).

Summary

Database design is a separate complex topic

theory is important, but it should not replace common sense

Normalized data makes your life easier and facilitates

consistency support

The client-server interface can use tables, views, functions,

and triggers

Practice

1. Create the bookstore schema in the bookstore database.

Set up the search path to this schema at the database level.

2. In the bookstore schema, create books, authors, authorship,

and operations tables with all the necessary constraints,

exactly as shown in the demo.

3. Insert the data about several books into the tables.

Check the result by running some queries.

4. In the bookstore schema, create authors_v, catalog_v,

and operations_v views, so that they look exactly like shown

in the demo.

Check that the application now shows some data in “Books”,

“Authors”, and “Catalog” tabs.

Task 1. Recall the contents of the topic “Data Organization. Logical

Structure.”

Task 2. Use the demonstrated output of psql’s \d commands as a reference.

Task 3. You can use the data shown in the demo, or come up with your own

data.

Task 4. Try writing queries to the base tables that return the same results as

the queries to views shown in the demo. Then save these queries as views.

After completing the assignments, make sure to compare your queries with

those in the provided keys. Make corrections if required.

Practice

1. What additional attributes can appear for these entities as the

application evolves?

2. Suppose you have to store the information about the publisher.

Extend the ER-diagram accordingly and create the corresponding

tables.

3. Some books can belong to a series (such as “The Adventure

Collection”). How will it affect the schema?

4. Suppose our store started selling hardware equipment

(motherboards, CPUs, memory, storage devices, monitors, etc.).

What entities and attributes would you single out?

Keep in mind that new types of equipment constantly appear in

the market, and they can have their own specific characteristics.

Task 3. Different publishers can easily have different book series with the

same title.