Mentions légales du service

Skip to content
Snippets Groups Projects
PAPERMAN Charles's avatar
PAPERMAN Charles authored
Hotfix/coalesce

See merge request !98
6d287e14
History

Pydisk-querybuilder (PQB)

build status coverage report Code style: black Ruff Checked with mypy

The goal of PQB is to offer a programmatic interface to various SQL dialects. It aims to replace stringified Schema definitions and Query writing. It can be used to design custom ORMs or embedded within database-equipped applications with a lot of queries.

Most of the SQL query builders available within the Python ecosystem are interwoven within ORMs and/or abandoned projects and/or missing some key features. In particular, they often restrict their focus to the DQL language only. Many project that are reluctant to use ORM end up rebuilding their own tailored query builder. The goal of PQB is to offer a sound interface to the SQL syntax.

The advantage of using a querybuilder are manyfold:

  • Avoid complex bug-prone string templating
  • Avoid syntax errors
  • Avoid SQL injections
  • Simplify programmatic query generations
  • Provide helpers to transform some queries
  • Support DDL/DML/TCL languages

Quickstart

from uuid import UUID, uuid4 as uuid_gen
from querybuilder.helpers import table

@table
class person:
    id: UUID
    name: str
    birth_year: int

This code snippet creates a (non-mutable) table person. From those object we can build queries. Those are not tied to any database.

q = Person.select([Person.c.birth_year, Person.c.name]) # Person.c is a shorthand for Person.columns

We can then get the string representation of the query q by using str(q) which returns

SELECT Person.birth_year, Person.name FROM Person

We can apply simple transformation to q. As queries (and most object) are immutable by design, each operation returns a new updated query object.

q2 = q.add_where(q.c.birth_year.gt(1800)).set_limit(10).set_columns(["name"])

Then, formatting the query with str(q2) gives:

SELECT Person.name FROM Person WHERE Person.birth_year > 1800 FETCH FIRST 10 ROWS ONLY

Using an actual database backend

The module provides a standardization of access to the database backend with a rather precise transaction management. It should be considered as a wrapper that handles all the specificities of each database management system.

Because of the variety of actual support of some SQL features, it is however possible that some queries raise errors when sent to some DBMSs.

SQLite

from querybuilder.drivers.sqlite import Connector

db = Connector(":memory:") # The ":memory:" here is SQLite-specific for in-memory non-persistent database.

We can create the table person using person.create() which produces the following query:

CREATE TABLE Person (id UUID, name TEXT, birth_year INTEGER)

SQLite does not support UUID. Hence, when executing the above query in SQLite, the querybuilder automatically falls back to another type (here TEXT), while raising the following warning:

UserWarning: sqlite does not support UUID, using TEXT type instead

Here is how the query is executed in the SQLite database db:

db.execute(Person.create())

We can then create a query to populate the table:

q_insert = Person.insert_values(in_columns=(Person.c[["name", "birth_year"]]), values=[("Dijkstra", 1930), ("Euler", 1707), ("Steiner", 1796)])
db.execute(q_insert)

Now, our table admit three tuples:

cursor = db.execute(Person.select(Person.c[1:], orderby=Person.c.name))
results = list(cursor.fetchall())
print(results)

gives:

[('Dijkstra', 1930), ('Euler', 1707), ('Steiner', 1796)]

The values are safely embedded within q_insert (click me to see details).

The query q_insert above actually embedded the values within its formatting as shown by its pretty printing q_insert.pretty_print():

INSERT INTO Person (name, age) VALUES (?, 1930), (?, 1707), (?, 1796)
-- ↖{0: 'Dijkstra', 1: 'Euler', 2: 'Steiner'}

But it is still safe against injection as the string values will be passed as placeholders to an underlying prepared query:

q_inject = Person.insert_values(in_columns=Person.c[1:], values= [("\";DROP TABLE Person;", 0000)])
db.execute(q_inject)
cursor = db.execute(person.select(person.c[1:], where=person.c.birth_year.eq(0)))
results = list(cursor.fetchall())
print(results)

gives:

[('"; DROP TABLE Person;"', 0)]

We can also define updates easily, e.g., with the following query:

q_upd = Person.update(dict(id=uuid_gen()), where=Person.c.name.eq("Dijkstra"))
db.execute(q_upd)

which pretty prints as:

UPDATE Person SET id = ? WHERE Person.name = ?
-- ↖{0: UUID('33525c22-f938-4256-b673-e595ff6df828'), 1: 'Dijkstra'}

Of course, deletions are also possible, e.g.:

q_del = Person.delete(where=Person.c.id.isnull())
db.execute(q_del)

is formatted as:

DELETE FROM Person WHERE Person.id IS NULL

If we throw in more complicated tables, we will see that the querybuilder can handle some DBMS specificities on its own.

from querybuilder.helpers import table, colspec
@table
class tag:
    id: colspec(int, primary_key=True, generated_as_identity=True)
    name: str

Calling tag.create() yields the following query:

CREATE TABLE tag (id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY, name TEXT)

While the generated always as identity constraint is not supported by SQLite, it is possible to rely on the AUTOINCREMENT feature for a similar results. This fallback can be observed using the db.stringify method which returns the dialect-aware (here SQLite) raw formatting of the query, namely the query that would be executed by db.execute.

db.stringify(tag.create())
CREATE TABLE tag (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)

Note that a warning message notifies the user that the GENERATED ALWAYS AS IDENTITY constraint has been imperfectly simulated by the AUTOINCREMENT feature of SQLite.

db.execute(tag.create())

We can then populate the table through INSERT queries:

q_tag = tag.insert_values(in_columns=(tag.c.name,), values=[("mathematician",)])

The string value is automatically passed through a placeholder to avoid SQL injections:

q_tag.pretty_print()
INSERT INTO tag (name) VALUES (?) -- ↖{0: 'mathematician'}

Then to actually insert into the database we simply do:

db.execute(q_tag)

We may also want to bulk insert without having to handle values directly by using placeholders.

q_tag_bulk = tag.insert_many(in_columns=(tag.c.name,))
db.executemany(q_tag_bulk, [dict(name=f"value_{i}") for i in range(1000)])

The query q_tag_bulk contains a placeholder with no embedded values.

Looking at q_tag_bulk.pretty_print(), we can see that it uses a placeholder, named :name, which is associated with the special object MISSING represented by ø.

INSERT INTO tag (name) VALUES (:name) -- ↖{name: ø}

Indeed, q_tag_bulk does not embed the values for the placeholders: they should be provided at execution time. Both db.execute and db.executemany allows to provide the values for valueless placeholders. The former expects one value for each such placeholders, while the second expects an iterable of these values, as illustrated by the above execution.

We can also delete those useless tags with the following query:

q_delete = tag.delete(where=tag.c.name.like("value_%"))
db.execute(q_delete)

Finally, we can build a last table with references to connect tag with person:

@table
class tag_person:
    person_id: person.c.id
    tag_id: tag.c.id
db.execute(tag_person.create())

The query given by str(tag_person.create()) is:

CREATE TABLE tag_person(person_id TEXT REFERENCES Person(id), tag_id INTEGER REFERENCES tag(id))

Again, within SQLite, the UUID type will fallback to TEXT.

Remark that it automatically infers the types and add the constraints associated with the table definitions.

Let us label Dijkstra as a Mathematician:

condition = Person.c.name.eq("Dijkstra") & tag.c.name.eq("mathematician")
q_prod = Person.product(tag).select(columns=(Person.c.id, tag.c.id), where=condition)

The query q_prod is of the following shape:

SELECT Person.id, tag.id
FROM Person, tag
WHERE Person.name = ? AND tag.name = ?
-- ↖{0: 'Dijkstra', 1: 'mathematician'}

We can insert the result in tag_person with:

db.execute(tag_person.insert(query=q_prod))

Let us check by getting the list of mathematicians using the following complex query

q_math = tag_person\
    .inner_join(tag, on=tag.c.id.eq(tag_person.c.tag_id))\
    .inner_join(person, on=person.c.id.eq(tag_person.c.person_id))\
    .select([person.c.name], where=tag.c.name.eq("mathematician"))

which is formatted as

SELECT Person.name FROM (tag_person INNER JOIN tag ON tag.tag_id = tag_person.id) INNER JOIN Person ON Person.id = tag_person.person_id WHERE tag.name = ?

and whose execution

cursor = db.execute(q_math)
results = list(cursor.fetchall())
print(results)

will return only "Dijkstra"

[("Dijkstra",)]

PostgreSQL

The previous queries work seamlessly on PostgreSQL by simply using the PostgreSQL connector:

from querybuilder.drivers.postgresql import Connector

db = Connector() # Postgresql has default connection information

The only differences with respect to using the SQLite driver are:

  1. We need to specify that person.id is the primary key of person to allow us to use it as a foreign key in tag_person;
  2. We do not have warnings, since PostgreSQL has implementations of type UUID, and of GENERATED ALWAYS AS IDENTITY.
@table
class Person:
    id : colspec(UUID, primary_key=True)
    name : str
    birth_year : int
@table
class tag:
    id: colspec(int, primary_key=True, generated_as_identity=True)
    name: str
@table
class tag_person:
    tag_id: tag.c.id
    person_id: person.c.id
db.execute(person.create(temporary=True))
db.execute(tag.create(temporary=True))
db.execute(tag_person.create(temporary=True))

Since PostgreSQL has no non-persistent in-memory database, we indicated temporary=True when generating the creation query of each table, so that CREATE TEMPORARY TABLE queries are produced. For instance, str(Person.create(temporary=True)) gives the query:

CREATE TEMPORARY TABLE Person (id UUID PRIMARY KEY, name TEXT, birth_year INTEGER)