Some queries are taking more than a minute to execute. This document discusses the issues and their causes, along with some ideas to rectify them.
Access Pattern
The data access pattern can be divided into three groups:
- Read all properties of an object at the latest or a previous revision
- Find keys of all objects matching a particular query
- Find all versions matching a particular query
Schema
The database schema has 3 tables: thing, version and datum.
    thing:   id, key, latest_revision, last_modified, created
    version: thing_id, revision, created, comment, ip, machine_comment
    datum:   thing_id, begin_revision, end_revision, key, value, datatype, ordering
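As a rough local sketch of this layout (SQLite column types are assumptions here; the production database is PostgreSQL), the three tables can be reproduced like this:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE thing (
    id INTEGER PRIMARY KEY,
    key TEXT,
    latest_revision INTEGER,
    last_modified TEXT,
    created TEXT
);
CREATE TABLE version (
    thing_id INTEGER REFERENCES thing(id),
    revision INTEGER,
    created TEXT,
    comment TEXT,
    ip TEXT,
    machine_comment TEXT
);
CREATE TABLE datum (
    thing_id INTEGER REFERENCES thing(id),
    begin_revision INTEGER,
    end_revision INTEGER,   -- 2147483647 marks the live (latest) row
    key TEXT,               -- property name, e.g. 'authors'
    value TEXT,             -- stored as text, cast at query time
    datatype INTEGER,       -- e.g. 0 = reference, 2 = string
    ordering INTEGER        -- position for multi-valued properties
);
""")
names = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(names)  # ['datum', 'thing', 'version']
```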
Performance Issues
Let's look at an example.
Here is the query plan for the things query {"type": "/type/edition", "authors": "/a/OL1458403A", "limit": "20"}:
    infobase_production=# explain
    SELECT thing.key
    FROM thing, datum AS d0, datum AS d1
    WHERE thing.site_id = 1
      AND d0.thing_id = thing.id AND d0.end_revision = 2147483647
      AND d0.key = 'authors' AND cast(d0.value AS int) = 4955588 AND d0.datatype = 0
      AND d1.thing_id = thing.id AND d1.end_revision = 2147483647
      AND d1.key = 'type' AND cast(d1.value AS int) = 52 AND d1.datatype = 0
    LIMIT 20 OFFSET 0;

                                  QUERY PLAN
    -----------------------------------------------------------------------
     Limit  (cost=58668.07..59314.62 rows=7 width=17)
       ->  Nested Loop  (cost=58668.07..59314.62 rows=7 width=17)
             ->  Merge Join  (cost=58668.07..58777.90 rows=53 width=8)
                   Merge Cond: (d0.thing_id = d1.thing_id)
                   ->  Sort  (cost=24322.35..24344.94 rows=9035 width=4)
                         Sort Key: d0.thing_id
                         ->  Bitmap Heap Scan on datum d0  (cost=1673.50..23728.70 rows=9035 width=4)
                               Recheck Cond: (("key" = 'authors'::text) AND ((value)::integer = 4955588) AND (datatype = 0) AND (end_revision = 2147483647))
                               ->  Bitmap Index Scan on datum_key_value_ref_idx  (cost=0.00..1671.24 rows=9035 width=0)
                                     Index Cond: (("key" = 'authors'::text) AND ((value)::integer = 4955588))
                   ->  Sort  (cost=34345.71..34377.78 rows=12825 width=4)
                         Sort Key: d1.thing_id
                         ->  Bitmap Heap Scan on datum d1  (cost=2371.88..33470.62 rows=12825 width=4)
                               Recheck Cond: (("key" = 'type'::text) AND ((value)::integer = 52) AND (datatype = 0) AND (end_revision = 2147483647))
                               ->  Bitmap Index Scan on datum_key_value_ref_idx  (cost=0.00..2368.67 rows=12825 width=0)
                                     Index Cond: (("key" = 'type'::text) AND ((value)::integer = 52))
             ->  Index Scan using thing_pkey on thing  (cost=0.00..10.11 rows=1 width=21)
                   Index Cond: (d0.thing_id = thing.id)
                   Filter: (site_id = 1)
For executing this query, more than 2 million disk blocks are read.
    infobase_production=# select * from pg_statio_user_tables;
     relid  | schemaname | relname | heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit | toast_blks_read | toast_blks_hit | tidx_blks_read | tidx_blks_hit
    --------+------------+---------+----------------+---------------+---------------+--------------+-----------------+----------------+----------------+---------------
     304963 | public     | datum   |        2247162 |           657 |         31836 |        28822 |               0 |              0 |              0 |             0
     304958 | public     | account |              0 |             0 |               |              |               0 |              0 |              0 |             0
     304987 | public     | version |              0 |             0 |             0 |            0 |               0 |              0 |              0 |             0
     304977 | public     | thing   |              4 |             0 |            11 |            6 |               0 |              0 |              0 |             0
     304970 | public     | site    |              0 |             0 |             0 |            0 |               0 |              0 |              0 |             0
    (5 rows)
To see these statistics, the parameter stats_block_level must be enabled before executing the query.
    infobase_production=# select set_config('stats_block_level', 'on', false);
     set_config
    ------------
     on
    (1 row)
See http://www.postgresql.org/docs/8.2/static/monitoring-stats.html ("Monitoring Database Activity") to learn more.
When PostgreSQL executes the Recheck Cond step, it reads the actual heap blocks and re-checks the condition. Since rows for all properties are stored together in datum, the required rows are scattered all over the table, which means reading many more disk blocks. In the worst case, no two rows of the required property share a disk block.
If all rows of a property were stored together, far fewer disk blocks would need to be read.
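A toy model makes the effect concrete. Assuming, purely for illustration, 100 rows per disk block and 1,000 matching rows in a million-row table, counting the distinct blocks touched shows the gap between scattered and clustered storage:

```python
# Toy model of disk blocks read (numbers are illustrative assumptions,
# not measurements from the production database).
ROWS_PER_BLOCK = 100
TOTAL_ROWS = 1_000_000
MATCHING = 1_000

# Worst case: matching rows spread evenly across the whole heap.
scattered_rows = range(0, TOTAL_ROWS, TOTAL_ROWS // MATCHING)
scattered_blocks = len({row // ROWS_PER_BLOCK for row in scattered_rows})

# Best case: matching rows stored contiguously.
clustered_rows = range(MATCHING)
clustered_blocks = len({row // ROWS_PER_BLOCK for row in clustered_rows})

print(scattered_blocks, clustered_blocks)  # 1000 10
```

Every matching row costs one block read in the scattered case; in the clustered case a single block read serves ROWS_PER_BLOCK matching rows.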
Ideas for improving performance
Here are some ideas to improve performance. Some of them are subsets of others, and some can be implemented together.
Save type in thing table
Since every object must have a type property, we can speed up queries by keeping the type of the latest revision of every object in the thing table. However, the type must still be stored in the datum table as well, since it can change between revisions.
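A minimal sketch of this idea using SQLite (the type id 52 and author id 4955588 come from the example query earlier; the rest of the data is made up): with type denormalized into thing, one self-join on datum disappears.

```python
import sqlite3

# Illustrative mini-schema: thing carries a denormalized type column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thing (id INTEGER PRIMARY KEY, key TEXT, type INTEGER)")
conn.execute("""CREATE TABLE datum (
    thing_id INTEGER, end_revision INTEGER, key TEXT, value TEXT, datatype INTEGER)""")

EDITION_TYPE = 52  # id of /type/edition in the example above
conn.execute("INSERT INTO thing VALUES (1, '/b/OL1M', ?)", (EDITION_TYPE,))
conn.execute("INSERT INTO datum VALUES (1, 2147483647, 'authors', '4955588', 0)")

# The type test now hits thing directly instead of a second datum join.
rows = conn.execute("""
    SELECT thing.key FROM thing, datum AS d0
    WHERE thing.type = ?
      AND d0.thing_id = thing.id
      AND d0.end_revision = 2147483647
      AND d0.key = 'authors'
      AND CAST(d0.value AS INTEGER) = 4955588
""", (EDITION_TYPE,)).fetchall()
print(rows)  # [('/b/OL1M',)]
```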
Pre-compute important queries
In the Open Library, the two most frequently used queries are:
- getting all books written by a particular author (from the author view page)
- getting all authors whose name starts with a prefix (from author autocomplete on the edition edit page)
It is possible to pre-compute all this data and store it in memory to answer these queries quickly. These data structures must also be updated on every write operation.
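A sketch of such in-memory structures (the function names and keys below are illustrative, not existing Infobase APIs): a dict serves the books-by-author query, and a sorted list with binary search serves the name-prefix query; both are updated on write.

```python
from collections import defaultdict
import bisect

# Hypothetical in-memory indexes for the two hot queries.
books_by_author = defaultdict(set)  # author key -> set of edition keys
author_names = []                   # sorted (name, author key) pairs

def on_write(edition_key, author_key):
    # Called on every write that touches an edition's authors.
    books_by_author[author_key].add(edition_key)

def add_author(name, author_key):
    # Keep the name list sorted so prefix search stays O(log n).
    bisect.insort(author_names, (name, author_key))

def authors_with_prefix(prefix):
    i = bisect.bisect_left(author_names, (prefix,))
    out = []
    while i < len(author_names) and author_names[i][0].startswith(prefix):
        out.append(author_names[i][1])
        i += 1
    return out

add_author("Mark Twain", "/a/OL18319A")
add_author("Martin Gardner", "/a/OL19981A")
on_write("/b/OL1M", "/a/OL18319A")
print(authors_with_prefix("Mar"))  # ['/a/OL18319A', '/a/OL19981A']
```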
Separate storage for reading objects and executing queries
If we store the data of each datatype in a separate table, reading objects becomes slow because multiple tables have to be read. Since queries are allowed only on the latest revisions, old revisions are not needed to execute queries. So it might be a good idea to separate reading objects from executing queries. One way to do this is to store the JSON representation of each object directly in the version table.
    thing:        id, key, latest_revision, last_modified, created
    version:      thing_id, revision, created, comment, ip, machine_comment, data
    datum_int:    thing_id, key, value (int), ordering
    datum_ref:    thing_id, key, value (int, references thing), ordering
    datum_float:  thing_id, key, value (float), ordering
    datum_string: thing_id, key, value (text), ordering
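A minimal sketch of the read path under this design, assuming the version.data column holds the object's JSON representation (SQLite stands in for PostgreSQL, and the object contents are made up):

```python
import json
import sqlite3

# Reading an object becomes a single-row lookup plus a JSON decode,
# instead of assembling it from many datum rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE version (thing_id INTEGER, revision INTEGER, data TEXT)")

obj = {"key": "/b/OL1M", "type": "/type/edition", "title": "Example"}
conn.execute("INSERT INTO version VALUES (1, 3, ?)", (json.dumps(obj),))

def get_object(thing_id, revision):
    row = conn.execute(
        "SELECT data FROM version WHERE thing_id = ? AND revision = ?",
        (thing_id, revision)).fetchone()
    return json.loads(row[0]) if row else None

print(get_object(1, 3)["title"])  # Example
```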
Partial Vertical Partitioning
As discussed in [2], vertical partitioning is a method of storing the values of every property in a separate table. The authors claim this approach is very scalable; however, schema changes are painful with it, since changing the schema requires creating and deleting tables. In Infobase, schema changes are trivial to make. It would be nice to get the best of both approaches.
In every system there will be some important parts of the schema that will not change. For example, in the Open Library every book will always have authors and every author will always have a name. So if the system uses vertical partitioning for a few specified properties and the default behavior for the rest, we get the best of both approaches.
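A sketch of how writes could be routed under such partial partitioning (the property and table names are illustrative): a fixed set of "important" properties get their own tables, and everything else falls back to the generic datum table, so ordinary schema changes stay trivial.

```python
# Properties chosen for vertical partitioning (an assumed, fixed set).
PARTITIONED = {"authors", "name"}

def table_for(prop):
    # Partitioned properties live in their own table; the rest share
    # the generic datum table, preserving trivial schema changes.
    return f"datum_{prop}" if prop in PARTITIONED else "datum"

writes = [("authors", 4955588), ("title", "Example"), ("name", "Mark Twain")]
routed = [(table_for(k), k, v) for k, v in writes]
print(routed)
# [('datum_authors', 'authors', 4955588), ('datum', 'title', 'Example'),
#  ('datum_name', 'name', 'Mark Twain')]
```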
References
[1] Freebase Blog: A Brief Tour of Graphd
[2] Scalable Semantic Web Data Management Using Vertical Partitioning (pdf)
Updated Performance after adding a type column to the thing table
Test 1: {"type": "/type/edition", "authors": "/a/OL1458403A", "limit": "20"}
    SELECT thing.key
    FROM thing, datum AS d0
    WHERE thing.site_id = 1
      AND thing.type = 52
      AND d0.thing_id = thing.id
      AND d0.end_revision = 2147483647
      AND d0.key = 'authors'
      AND cast(d0.value AS int) = 4955588
      AND d0.datatype = 0
    LIMIT 20 OFFSET 0;
Disk blocks read:
     relname | heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit
    ---------+----------------+---------------+---------------+--------------
     thing   |              3 |             1 |             6 |            6
     datum   |              3 |             1 |             5 |            0
Test 2: {"type": "/type/edition", "authors": "/a/OL1458403A", "publisher": "Dover Publications", "limit": "20"}
Query:
    SELECT thing.key
    FROM thing, datum AS d0, datum AS d1
    WHERE thing.site_id = 1
      AND thing.type = 52
      AND d0.thing_id = thing.id
      AND d0.end_revision = 2147483647
      AND d0.key = 'authors'
      AND cast(d0.value AS int) = 4955588
      AND d0.datatype = 0
      AND d1.thing_id = thing.id
      AND d1.end_revision = 2147483647
      AND d1.key = 'publishers'
      AND d1.value = 'Dover Publications'
      AND d1.datatype = 2
    LIMIT 20 OFFSET 0;
Total runtime: 9586.715 ms
Disk stats:
     relname | heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit
    ---------+----------------+---------------+---------------+--------------
     thing   |              0 |             0 |             0 |            0
     datum   |           3848 |          5160 |           118 |        48360
Test 3: {"type": "/type/edition", "authors": "/a/OL1458403A", "sort": "title", "limit": "20"}
    SELECT thing.key
    FROM thing, datum AS d0, datum AS d1
    WHERE thing.site_id = 1
      AND thing.type = 52
      AND d0.thing_id = thing.id
      AND d0.end_revision = 2147483647
      AND d0.key = 'authors'
      AND cast(d0.value AS int) = 4955588
      AND d0.datatype = 0
      AND d1.thing_id = thing.id
      AND d1.end_revision = 2147483647
      AND d1.key = 'title'
      AND d1.datatype = 2
    ORDER BY d1.value
    LIMIT 20 OFFSET 0;
Takes too long (more than 15 minutes).
History
- Created February 4, 2009
- 2 revisions
- June 27, 2021 | Edited by Mek | Edited without comment.
- February 4, 2009 | Created by Anand Chitipothu | moving old docs from staging to production