Bookstore Application
Application Schema and Interface
12
Copyright
© Postgres Professional, 2017–2021
Authors: Egor Rogov, Pavel Luzanov
Translated by Liudmila Mantrova
Usage of Course Materials
Non-commercial use of course materials (presentations, demonstrations) is
allowed on an unrestricted basis. Commercial use is only possible with prior
written permission of Postgres Professional company. Modification of course
materials is forbidden.
Contact Us
Please send your feedback to: edu@postgrespro.ru
Disclaimer
In no event shall Postgres Professional company be liable for any damages
or loss, including loss of profits, that arise from direct or indirect, special or
incidental use of course materials. Postgres Professional company
specifically disclaims any warranties on course materials. Course materials
are provided “as is,” and Postgres Professional company has no obligations
to provide maintenance, support, updates, enhancements, or modifications.
Agenda
Bookstore application overview
Designing the application schema; normalization
The final version of the application schema
Setting up the client-server interface
Demo
In this demo, we are going to show the Bookstore application as it should
appear after all practical assignments are complete. The application is
available in the VM browser at http://localhost/.
The application consists of several parts that are provided as separate tabs.
“Store” is a web user interface for buying books online.
Other tabs represent the backend interface, which is available only to the
bookstore employees (the admin panel).
“Catalog” is the storekeeper’s interface that is used for ordering books to the
store and viewing new arrivals and purchases.
“Books” and “Authors” are interfaces for librarians, where they can register
new arrivals.
For training purposes, all this functionality is exposed in a single web page.
If any feature is unavailable because the required object (such as a table or
a function) is missing, the application will report an error. It also displays the
text of all queries sent to the server.
We will start with an empty database and will gradually implement all the
required components as the course progresses.
The source code of the application frontend will not be discussed in this
course, but you can download it from the following git repository:
Books
Diagram: a single “Book” entity with the attributes title, authors, quantity,
and operations.
An ER-model for high-level design
entities are concepts of the subject domain
relationships are connections between entities
attributes are properties of entities and relationships
After taking a look at the application’s interface and functionality, we have to
deal with its schema. We will not go into details about the database design:
it is a separate branch of knowledge, which is beyond the scope of this
course. But we cannot ignore this topic entirely.
High-level database design often uses the ER-model (where ER stands for
“Entity–Relationship”). It deals with entities (concepts of the subject area),
their relationships, and attributes (the properties of entities and
relationships). The model allows us to remain at the logical level, without
getting down to data representation at the physical level (i.e., its table form).
The first approach to database design is creating a diagram as shown on
this slide: a book is represented as a single big entity, and everything else
becomes its attributes.
Application Schema (Ver. 1)
id | title | author | qty | operation
----+-------------------+------------------------------+-----+-----------
1 | The Tempest | William Shakespeare | 10 | +11
1 | THE TEMPEST | William Shakespeare | 10 | -1
2 | Romeo and Juliet | William Shakespeare | 4 | +4
3 | Good Omens | Terry Pratchett | 7 | +7
3 | Good Omens | Neil Gaiman | 7 | 0
The data is duplicated
it’s hard to maintain consistency
it’s hard to perform updates
it’s hard to write queries
Slide annotations: 10 = 11 – 1; for book 3, should the operation be recorded
as 7,0, as 0,7, or as 7,7?
Clearly, this approach cannot be correct. It may not be quite obvious from the
diagram itself, but let’s try to project this diagram onto database tables.
There are several ways to do it. One of them is shown on the slide: the table
corresponds to the entity, and table columns represent the attributes of this
entity.
This diagram is a good illustration that some data is duplicated (these
fragments are highlighted). Data duplication makes it hard to maintain
consistency, which is arguably the main objective of a database system.
For example, each of the two rows related to book 3 must list the total
quantity (7 items). What should be done to reflect a purchase? On the one
hand, we need to add some rows that reflect purchase operations. (But how
many rows are required? Should we add one or two?) On the other hand,
the quantity should be reduced from 7 to 6 in all rows. And what if an error
leads to data discrepancy between these rows? How can we define a
constraint that forbids such a situation?
Many queries will also become overcomplicated. How can we find the total
number of books? How can we find all the unique authors?
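To make this concrete, here is a rough sketch of such queries; the table name
books_v1 is an assumption made for illustration only:

-- total number of copies: qty is repeated in every row of a book,
-- so the rows have to be deduplicated first
SELECT sum(qty)
FROM (SELECT DISTINCT id, qty FROM books_v1) t;

-- unique authors: duplicates have to be filtered out explicitly
SELECT DISTINCT author FROM books_v1;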
Thus, such a schema will not work well for relational databases.
Application Schema (Ver. 2)
entity | attribute | value
--------+------------+-------
1 | title | The Tempest
1 | author | William Shakespeare
1 | qty | 10
1 | operation | +11
1 | operation | -1
2 | title | Romeo and Juliet
2 | author | William Shakespeare
2 | qty | 4
2 | operation | +4
... | ... | ...
Data without a schema
consistency is maintained at the application side
it’s hard to write queries
performance is low (multiple joins)
Another way to represent an entity as a table is a so-called EAV schema
(“entity–attribute–value”). It allows storing anything at all in a single table.
Technically, we get a relational database, but it has virtually no schema, and
the database system cannot guarantee data consistency. Consistency has
to be maintained by the application alone, and sooner or later it is bound to
be compromised.
With such a schema, it is hard to write queries (although they are quite easy
to generate). Handling any significant data volume also becomes a problem
because of the multiple self-joins involved.
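For example, merely reassembling one book from such a table already takes a
self-join per attribute. A sketch; the table name book_eav is an assumption:

SELECT t.value AS title,
       a.value AS author,
       q.value AS qty
FROM book_eav t
     JOIN book_eav a ON a.entity = t.entity AND a.attribute = 'author'
     JOIN book_eav q ON q.entity = t.entity AND q.attribute = 'qty'
WHERE t.attribute = 'title'
  AND t.entity = 1;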
It is not an approach to follow.
Application Schema (Ver. 3)
book_id | description
---------+-----------------------------------------------------------
1 | {"title": "The Tempest",
| "authors": [ "William Shakespeare" ],
| "qty": 10,
| "operations": [ +11, -1 ]}
3 | {"title": "Good Omens",
| "authors": [ "Terry Pratchett",
| "Neil Gaiman" ],
| "qty": 7,
| "operations": [ +7 ]}
... | ...
Data without a schema
consistency is maintained at the application side
it’s hard to write queries (a special language is needed)
there is index support
Another similar approach is to represent the data in the JSON format,
NoSQL-style. All the previous considerations still apply here.
Besides, you cannot query inside such a structure with regular SQL alone: you
have to use a special path language (previously, jsQuery would be the most
probable choice, but starting from PostgreSQL 12, it is convenient to use the
SQL/JSON path features defined in the SQL:2016 standard).
Although PostgreSQL provides index support for JSON, performance is still
a concern.
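As a sketch only (the table name books_json and the jsonb type of the
description column are assumptions), a SQL/JSON path query and a matching
index could look like this:

-- find books written by Neil Gaiman
SELECT book_id
FROM books_json
WHERE description @@ '$.authors[*] == "Neil Gaiman"';

-- a GIN index with jsonb_path_ops supports the @>, @? and @@ operators
CREATE INDEX ON books_json USING gin (description jsonb_path_ops);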
It is convenient to use such a schema if the database only needs to get
JSON data by ID, and no serious data processing within the JSON structure
is expected. But it is not our case.
(Naturally, nothing is set in stone here. See the last practical assignment for
further discussion.)
Books and Operations
Normalization reduces data redundancy
Large entities are split into smaller ones
Diagram: the “Book” entity (title, authors) and the “Operation” entity
(quantity change, date), connected by a one-to-many relationship.
Thus, we need to eliminate redundancy, so that it is convenient to work with
the data in a relational database system. This process is called
normalization.
You might be familiar with various normal form concepts (first, second, third,
Boyce–Codd, etc.). We are not going to discuss them here; speaking
informally, it is enough to understand that all this math pursues one and the
same goal: eliminating redundancy.
You can reduce redundancy by splitting a big entity into several smaller
ones. The exact way to do it should be prompted by common sense (which
cannot be replaced by the knowledge of normal forms alone anyway).
In our case, everything is quite straightforward. Let’s start by separating
books and operations. These two entities are connected by the one-to-many
relationship: there can be several operations on each book, but each
operation relates to a single book only.
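In SQL terms, the “many” side simply references the “one” side with a foreign
key. A minimal sketch of the operations table, assuming a books table keyed by
book_id; the exact definitions are created in the practice, and the constraints
and the default here are assumptions:

CREATE TABLE operations (
    operation_id integer PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    book_id      integer NOT NULL REFERENCES books, -- each operation belongs to exactly one book
    qty_change   integer NOT NULL,
    date_created date NOT NULL DEFAULT current_date -- the default is an assumption
);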
Application Schema
books
book_id | title | author
---------+-------------------+------------------------------
1 | The Tempest | William Shakespeare
2 | Romeo and Juliet | William Shakespeare
3 | Good Omens | Terry Pratchett
3 | Good Omens | Neil Gaiman
operations
operation_id | book_id | qty_change | date_created
--------------+---------+------------+--------------
1 | 1 | +10 | 2020-07-13
2 | 1 | -1 | 2020-08-25
3 | 3 | +7 | 2020-07-13
4 | 2 | +4 | 2020-07-13
At the physical level, the identified entities can be represented by two tables: books
and operations.
An operation changes the quantity of books. This change can be either
positive or negative (the order operation adds some books, while the
purchase operation subtracts them). Note that the book has no “quantity”
attribute anymore. Instead, it is enough to sum up all quantity changes
made by operations related to this book. Having an additional “quantity”
attribute would lead to data redundancy again.
This solution might seem strange to you at first. Is it really convenient to
calculate the sum instead of simply querying a separate field? But we can
create a view to display the quantity of each book. It won’t result in
redundancy: a view is just a query.
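A minimal sketch of such a view against the final schema, where each book is a
single row in books (the name books_qty_v is an assumption; the course builds
its own catalog_v in the practice):

CREATE VIEW books_qty_v AS
SELECT b.book_id,
       b.title,
       coalesce(sum(o.qty_change), 0) AS qty -- the quantity is derived, not stored
FROM books b
     LEFT JOIN operations o ON o.book_id = b.book_id
GROUP BY b.book_id, b.title;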
Another point to consider is performance. If summing up all changes brings
too much overhead, we can perform the opposite process called
denormalization: physically add the “quantity” field to the books table and
ensure that it is consistent with the operations table. We are not going to
discuss here whether it makes sense or not (this question is considered in
the QPT course that covers query performance tuning). Common sense
suggests that it’s not required for our “sandbox.” But we will get back to
denormalization in the “Triggers” lecture.
Thus, as you can see on this slide, moving all operations into a separate
entity resolves most of the duplication issues, but not all of them.
Books, Authors, Operations
Diagram: the “Book” entity (title), the “Author” entity (last name, first name,
middle name), and the “Operation” entity (quantity change, date); books and
authors are connected by a many-to-many relationship.
That’s why we have to take one step further: separate books from authors
and connect them with each other by a many-to-many relationship: each
book can be written by several authors, and each author can have more
than one book. At the table level, such a relationship can be implemented
using an additional intermediate table.
The first, last, and middle names can be the author’s attributes. It makes
sense because we may need to work with each of these attributes
separately, e.g., to display the author’s last name and initials.
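At the table level, this might look roughly as follows. A sketch only: the
column names come from the diagrams in this lecture, while the exact
constraints are shown in the demo and created in the practice:

CREATE TABLE authors (
    author_id   integer PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    last_name   text NOT NULL,
    first_name  text NOT NULL,
    middle_name text
);

-- the intermediate table implementing the many-to-many relationship
CREATE TABLE authorship (
    book_id   integer REFERENCES books,
    author_id integer REFERENCES authors,
    seq_num   integer NOT NULL, -- position of the author in the book's author list (an assumption)
    PRIMARY KEY (book_id, author_id)
);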
Designing the Interface
Tables and triggers
reading data directly from tables (views);
writing data directly to tables (views), using triggers for changing related tables
the application must be aware of the data model; this approach provides maximum flexibility
consistency is hard to maintain
Functions
reading data via table functions; writing data by calling functions
the application is separated from the data model and is limited by the API
you have to write a lot of wrapper functions; potential performance issues
There are several ways to set up an application’s client-server interface.
The first option is to allow the application to access and modify database
tables directly. In this case, the application must have the precise
“knowledge” of the data model. This requirement can be relaxed to some
extent by using views.
Another limitation of this approach is that the application has to follow
certain rules; otherwise, data consistency becomes very hard to maintain,
because you would have to guard against every possible inappropriate
operation at the database level.
But this approach is the most flexible one.
Another option is to forbid direct table access from the application and allow
only function calls. Reading can be performed by table functions (which
return a set of rows). Writing can be performed by calling other functions
and passing the required data to them. In this case, all the necessary
consistency checks can be implemented within functions: the database will
be protected, but the application will be able to use only a limited set of
features that we provide. It requires writing many wrapper functions and can
lead to performance degradation.
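To give a rough idea of this second option, here is a sketch with hypothetical
functions (they are not the functions built later in the course):

-- reading via a table function
CREATE FUNCTION get_authors()
RETURNS TABLE (author_id integer, full_name text)
LANGUAGE sql STABLE
AS $$
    SELECT a.author_id, a.last_name || ' ' || a.first_name
    FROM authors a;
$$;

-- writing by calling a function; consistency checks can live inside it
CREATE FUNCTION register_author(p_last_name text, p_first_name text, p_middle_name text)
RETURNS integer
LANGUAGE sql
AS $$
    INSERT INTO authors (last_name, first_name, middle_name)
    VALUES (p_last_name, p_first_name, p_middle_name)
    RETURNING author_id;
$$;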
You can also combine these two approaches. For example, you can allow
the application to read data from tables directly, but perform modifications
only by calling special functions.
Bookstore Interface
Diagram: the tables books (book_id, title), authorship (book_id, author_id,
seq_num), authors (author_id, last_name, first_name, middle_name), and
operations (operation_id, book_id, qty_change, date_created); the get_catalog
function returns the list of books, and the buy_book function performs a book
purchase.
In this application, we will try different ways of setting up the interface
(although it’s usually better to stick to one approach when developing real
applications).
The store will use interface functions:
- get_catalog for searching books (see “SQL. Composite Types”)
- buy_book for making a purchase (see “PL/pgSQL. Executing Queries”)
Admin Panel Interface
Diagram: the catalog_v, authors_v, and operations_v views read from the tables
books (book_id, title), authorship (book_id, author_id, seq_num), authors
(author_id, last_name, first_name, middle_name), and operations (operation_id,
book_id, qty_change, date_created); adding a book calls the add_book function,
adding an author calls the add_author function, and a book order is an UPDATE
on catalog_v handled by update_catalog_trigger.
The admin panel is going to retrieve data by accessing the following views
(which we have to create as part of the practice for this lecture):
- catalog_v for the list of books
- authors_v for the list of authors
- operations_v for the list of operations
Authors will be added using the add_author function (we will create it once
we get to the “PL/pgSQL. Executing Queries” lecture). For adding books, we
will implement the add_book function (“PL/pgSQL. Arrays”).
To enable ordering books to the store, we will make the catalog_v view updatable
(“PL/pgSQL. Triggers”).
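One possible shape of such a trigger, shown only as a simplified sketch: the
function and trigger names here are made up, the view is assumed to expose
book_id and qty columns, and the actual implementation (the
update_catalog_trigger shown on the slide) is part of that lecture’s practice:

CREATE FUNCTION catalog_v_update() RETURNS trigger
LANGUAGE plpgsql
AS $$
BEGIN
    -- turn the UPDATE of the view into a new operation
    -- instead of modifying any stored quantity
    INSERT INTO operations (book_id, qty_change)
    VALUES (OLD.book_id, NEW.qty - OLD.qty);
    RETURN NEW;
END;
$$;

CREATE TRIGGER catalog_v_update_trigger
    INSTEAD OF UPDATE ON catalog_v
    FOR EACH ROW EXECUTE FUNCTION catalog_v_update();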
Summary
Database design is a separate complex topic
theory is important, but it should not replace common sense
Normalized data makes your life easier and helps maintain consistency
The client-server interface can use tables, views, functions, and triggers
Practice
1. Create the bookstore schema in the bookstore database.
Set up the search path to this schema at the database level.
2. In the bookstore schema, create books, authors, authorship,
and operations tables with all the necessary constraints,
exactly as shown in the demo.
3. Insert the data about several books into the tables.
Check the result by running some queries.
4. In the bookstore schema, create the authors_v, catalog_v,
and operations_v views so that they look exactly as shown
in the demo.
Check that the application now shows some data in the “Books”,
“Authors”, and “Catalog” tabs.
Task 1. Recall the contents of the topic “Data Organization. Logical
Structure.”
Task 2. Use the demonstrated output of psql’s \d commands as a reference.
Task 3. You can use the data shown in the demo, or come up with your own
data.
Task 4. Try writing queries to the base tables that return the same results as
the queries to views shown in the demo. Then save these queries as views.
After completing the assignments, make sure to compare your queries with
those in the provided keys. Make corrections if required.
Practice
1. What additional attributes can appear for these entities as the
application evolves?
2. Suppose you have to store the information about the publisher.
Extend the ER-diagram accordingly and create the corresponding
tables.
3. Some books can belong to a series (such as “The Adventure
Collection”). How will it affect the schema?
4. Suppose our store started selling hardware equipment
(motherboards, CPUs, memory, storage devices, monitors, etc.).
What entities and attributes would you single out?
Keep in mind that new types of equipment constantly appear in
the market, and they can have their own specific characteristics.
Task 3. Different publishers can easily have different book series with the
same title.