RDF Store

class pyoxigraph.Store(path=None)

RDF store.

It encodes an RDF dataset and allows querying it using SPARQL. It is based on the RocksDB key-value database.

This store ensures the “repeatable read” isolation level: the store only exposes changes that have been “committed” (i.e. no partial writes) and the exposed state does not change for the complete duration of a read operation (e.g. a SPARQL query) or a read/write operation (e.g. a SPARQL update).

The Store constructor opens a read-write instance. To open a static read-only instance use Store.read_only() and to open a read-only instance that tracks a read-write instance use Store.secondary().

Parameters:

path (str or os.PathLike[str] or None, optional) – the path of the directory in which the store should read and write its data. If the directory does not exist, it is created. If no directory is provided, a temporary one is created and removed when the Python garbage collector removes the store. In this case, the store data are kept in memory and never written to disk.

Raises:

OSError – if the target directory contains invalid data or could not be accessed.

The str function provides a serialization of the store in N-Quads:

>>> store = Store()
>>> store.add(Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1'), NamedNode('http://example.com/g')))
>>> str(store)
'<http://example.com> <http://example.com/p> "1" <http://example.com/g> .\n'
add(quad)

Adds a quad to the store.

Parameters:

quad (Quad) – the quad to add.

Return type:

None

Raises:

OSError – if an error happens during the quad insertion.

>>> store = Store()
>>> store.add(Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1'), NamedNode('http://example.com/g')))
>>> list(store)
[<Quad subject=<NamedNode value=http://example.com> predicate=<NamedNode value=http://example.com/p> object=<Literal value=1 datatype=<NamedNode value=http://www.w3.org/2001/XMLSchema#string>> graph_name=<NamedNode value=http://example.com/g>>]
add_graph(graph_name)

Adds a named graph to the store.

Parameters:

graph_name (NamedNode or BlankNode or DefaultGraph) – the name of the named graph to add.

Return type:

None

Raises:

OSError – if an error happens during the named graph insertion.

>>> store = Store()
>>> store.add_graph(NamedNode('http://example.com/g'))
>>> list(store.named_graphs())
[<NamedNode value=http://example.com/g>]
backup(target_directory)

Creates a database backup in the target_directory.

After its creation, the backup can be opened with the Store constructor like a regular pyoxigraph database and operates independently from the original database.

Warning: Backups are only possible for on-disk databases created by providing a path to the Store constructor. Temporary in-memory databases created without a path are not compatible with the backup system.

Warning: An error is raised if the target_directory already exists.

If the target directory is in the same file system as the current database, the database content will not be fully copied; instead, hard links will point to the original database immutable snapshots. This allows cheap regular backups.

If you want to move your data to another RDF storage system, you should have a look at the dump() method instead.

Parameters:

target_directory (str or os.PathLike[str]) – the directory name to save the database to.

Return type:

None

Raises:

OSError – if an error happens during the backup.

bulk_extend(quads)

Adds a set of quads to this store.

This function is designed to be as fast as possible without transactional guarantees. If the operation fails, only part of the data might be written to the store.

Parameters:

quads (collections.abc.Iterable[Quad]) – the quads to add.

Return type:

None

Raises:

OSError – if an error happens during the quad insertion.

>>> store = Store()
>>> store.bulk_extend([Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1'), NamedNode('http://example.com/g'))])
>>> list(store)
[<Quad subject=<NamedNode value=http://example.com> predicate=<NamedNode value=http://example.com/p> object=<Literal value=1 datatype=<NamedNode value=http://www.w3.org/2001/XMLSchema#string>> graph_name=<NamedNode value=http://example.com/g>>]
bulk_load(input=None, format=None, *, path=None, base_iri=None, to_graph=None)

Loads an RDF serialization into the store.

This function is designed to be as fast as possible on big files without transactional guarantees. If the file is invalid, only a piece of it might be written to the store.

The load() method is also available for loads with transactional guarantees.

It currently supports the following formats:

  • N-Triples (RdfFormat.N_TRIPLES)

  • N-Quads (RdfFormat.N_QUADS)

  • Turtle (RdfFormat.TURTLE)

  • TriG (RdfFormat.TRIG)

  • N3 (RdfFormat.N3)

  • RDF/XML (RdfFormat.RDF_XML)

It also supports some media type and extension aliases. For example, application/turtle could also be used for Turtle and application/xml or xml for RDF/XML.

Parameters:
  • input (bytes or str or IO[bytes] or IO[str] or None, optional) – The str, bytes or I/O object to read from. For example, it could be the file content as a string or a file reader opened in binary mode with open('my_file.ttl', 'rb').

  • format (RdfFormat or None, optional) – the format of the RDF serialization. If None, the format is guessed from the file name extension.

  • path (str or os.PathLike[str] or None, optional) – The file path to read from. Replaces the input parameter.

  • base_iri (str or None, optional) – the base IRI used to resolve the relative IRIs in the file or None if relative IRI resolution should not be done.

  • to_graph (NamedNode or BlankNode or DefaultGraph or None, optional) – if it is a file composed of triples, the graph in which the triples should be stored. By default, the default graph is used.

Return type:

None

Raises:
  • ValueError – if the format is not supported.

  • SyntaxError – if the provided data is invalid.

  • OSError – if an error happens during a quad insertion or if a system error happens while reading the file.

>>> store = Store()
>>> store.bulk_load(input=b'<foo> <p> "1" .', format=RdfFormat.TURTLE, base_iri="http://example.com/", to_graph=NamedNode("http://example.com/g"))
>>> list(store)
[<Quad subject=<NamedNode value=http://example.com/foo> predicate=<NamedNode value=http://example.com/p> object=<Literal value=1 datatype=<NamedNode value=http://www.w3.org/2001/XMLSchema#string>> graph_name=<NamedNode value=http://example.com/g>>]
clear()

Clears the store by removing all its contents.

Return type:

None

Raises:

OSError – if an error happens during the operation.

>>> store = Store()
>>> store.add(Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1'), NamedNode('http://example.com/g')))
>>> store.clear()
>>> list(store)
[]
>>> list(store.named_graphs())
[]
clear_graph(graph_name)

Clears a graph from the store without removing it.

Parameters:

graph_name (NamedNode or BlankNode or DefaultGraph) – the name of the named graph to clear.

Return type:

None

Raises:

OSError – if an error happens during the operation.

>>> store = Store()
>>> store.add(Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1'), NamedNode('http://example.com/g')))
>>> store.clear_graph(NamedNode('http://example.com/g'))
>>> list(store)
[]
>>> list(store.named_graphs())
[<NamedNode value=http://example.com/g>]
contains_named_graph(graph_name)

Returns whether the store contains the given named graph.

Parameters:

graph_name (NamedNode or BlankNode or DefaultGraph) – the name of the named graph.

Return type:

bool

Raises:

OSError – if an error happens during the named graph lookup.

>>> store = Store()
>>> store.add_graph(NamedNode('http://example.com/g'))
>>> store.contains_named_graph(NamedNode('http://example.com/g'))
True
dump(output=None, format=None, *, from_graph=None)

Dumps the store quads or triples into a file.

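It currently supports the following formats:

  • N-Triples (RdfFormat.N_TRIPLES)

  • N-Quads (RdfFormat.N_QUADS)

  • Turtle (RdfFormat.TURTLE)

  • TriG (RdfFormat.TRIG)

  • N3 (RdfFormat.N3)

  • RDF/XML (RdfFormat.RDF_XML)

It also supports some media type and extension aliases. For example, application/turtle could also be used for Turtle and application/xml or xml for RDF/XML.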

Parameters:
  • output (IO[bytes] or str or os.PathLike[str] or None, optional) – The binary I/O object or file path to write to. For example, it could be a file path as a string or a file writer opened in binary mode with open('my_file.ttl', 'wb'). If None, a bytes buffer is returned with the serialized content.

  • format (RdfFormat or None, optional) – the format of the RDF serialization. If None, the format is guessed from the file name extension.

  • from_graph (NamedNode or BlankNode or DefaultGraph or None, optional) – the store graph from which to dump the triples. Required if the serialization format does not support named graphs. If it does support named graphs, the full dataset is written.

Returns:

bytes with the serialization if the output parameter is None, None if output is set.

Return type:

bytes or None

Raises:
  • ValueError – if the format is not supported or the from_graph parameter is not given with a syntax not supporting named graphs.

  • OSError – if an error happens during a quad lookup or file writing.

>>> store = Store()
>>> store.add(Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1')))
>>> store.dump(format=RdfFormat.TRIG)
b'<http://example.com> <http://example.com/p> "1" .\n'
>>> import io
>>> store = Store()
>>> store.add(Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1'), NamedNode('http://example.com/g')))
>>> output = io.BytesIO()
>>> store.dump(output, RdfFormat.TURTLE, from_graph=NamedNode("http://example.com/g"))
>>> output.getvalue()
b'<http://example.com> <http://example.com/p> "1" .\n'
extend(quads)

Atomically adds a set of quads to this store.

Insertion is done in a transactional manner: either the full operation succeeds or nothing is written to the database. The bulk_extend() method is also available for much faster loading of a large number of quads but without transactional guarantees.

Parameters:

quads (collections.abc.Iterable[Quad]) – the quads to add.

Return type:

None

Raises:

OSError – if an error happens during the quad insertion.

>>> store = Store()
>>> store.extend([Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1'), NamedNode('http://example.com/g'))])
>>> list(store)
[<Quad subject=<NamedNode value=http://example.com> predicate=<NamedNode value=http://example.com/p> object=<Literal value=1 datatype=<NamedNode value=http://www.w3.org/2001/XMLSchema#string>> graph_name=<NamedNode value=http://example.com/g>>]
flush()

Flushes all buffers and ensures that all writes are saved on disk.

Flushes are automatically done using background threads but might lag a little bit.

Return type:

None

Raises:

OSError – if an error happens during the flush.

load(input=None, format=None, *, path=None, base_iri=None, to_graph=None)

Loads an RDF serialization into the store.

Loads are applied in a transactional manner: either the full operation succeeds or nothing is written to the database. The bulk_load() method is also available for much faster loading of big files but without transactional guarantees.

Beware, the full file is loaded into memory.

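It currently supports the following formats:

  • N-Triples (RdfFormat.N_TRIPLES)

  • N-Quads (RdfFormat.N_QUADS)

  • Turtle (RdfFormat.TURTLE)

  • TriG (RdfFormat.TRIG)

  • N3 (RdfFormat.N3)

  • RDF/XML (RdfFormat.RDF_XML)

It also supports some media type and extension aliases. For example, application/turtle could also be used for Turtle and application/xml or xml for RDF/XML.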

Parameters:
  • input (bytes or str or IO[bytes] or IO[str] or None, optional) – The str, bytes or I/O object to read from. For example, it could be the file content as a string or a file reader opened in binary mode with open('my_file.ttl', 'rb').

  • format (RdfFormat or None, optional) – the format of the RDF serialization. If None, the format is guessed from the file name extension.

  • path (str or os.PathLike[str] or None, optional) – The file path to read from. Replaces the input parameter.

  • base_iri (str or None, optional) – the base IRI used to resolve the relative IRIs in the file or None if relative IRI resolution should not be done.

  • to_graph (NamedNode or BlankNode or DefaultGraph or None, optional) – if it is a file composed of triples, the graph in which the triples should be stored. By default, the default graph is used.

Return type:

None

Raises:
  • ValueError – if the format is not supported.

  • SyntaxError – if the provided data is invalid.

  • OSError – if an error happens during a quad insertion or if a system error happens while reading the file.

>>> store = Store()
>>> store.load(input='<foo> <p> "1" .', format=RdfFormat.TURTLE, base_iri="http://example.com/", to_graph=NamedNode("http://example.com/g"))
>>> list(store)
[<Quad subject=<NamedNode value=http://example.com/foo> predicate=<NamedNode value=http://example.com/p> object=<Literal value=1 datatype=<NamedNode value=http://www.w3.org/2001/XMLSchema#string>> graph_name=<NamedNode value=http://example.com/g>>]
named_graphs()

Returns an iterator over all the named graphs of the store.

Returns:

an iterator of the store graph names.

Return type:

collections.abc.Iterator[NamedNode or BlankNode]

Raises:

OSError – if an error happens during the named graphs lookup.

>>> store = Store()
>>> store.add(Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1'), NamedNode('http://example.com/g')))
>>> list(store.named_graphs())
[<NamedNode value=http://example.com/g>]
optimize()

Optimizes the database for future workload.

Useful to call after a batch upload or another similar operation.

Return type:

None

Raises:

OSError – if an error happens during the optimization.

quads_for_pattern(subject, predicate, object, graph_name=None)

Looks for the quads matching a given pattern.

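Parameters:
  • subject (NamedNode or BlankNode or Triple or None) – the quad subject or None to match everything.

  • predicate (NamedNode or None) – the quad predicate or None to match everything.

  • object (NamedNode or BlankNode or Literal or Triple or None) – the quad object or None to match everything.

  • graph_name (NamedNode or BlankNode or DefaultGraph or None, optional) – the quad graph name. To match only the default graph, use DefaultGraph. To match everything, use None.

Returns: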

an iterator of the quads matching the pattern.

Return type:

collections.abc.Iterator[Quad]

Raises:

OSError – if an error happens during the quads lookup.

>>> store = Store()
>>> store.add(Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1'), NamedNode('http://example.com/g')))
>>> list(store.quads_for_pattern(NamedNode('http://example.com'), None, None, None))
[<Quad subject=<NamedNode value=http://example.com> predicate=<NamedNode value=http://example.com/p> object=<Literal value=1 datatype=<NamedNode value=http://www.w3.org/2001/XMLSchema#string>> graph_name=<NamedNode value=http://example.com/g>>]
query(query, *, base_iri=None, use_default_graph_as_union=False, default_graph=None, named_graphs=None)

Executes a SPARQL 1.1 query.

Parameters:
  • query (str) – the query to execute.

  • base_iri (str or None, optional) – the base IRI used to resolve the relative IRIs in the SPARQL query or None if relative IRI resolution should not be done.

  • use_default_graph_as_union (bool, optional) – if the SPARQL query should look for triples in all the dataset graphs by default (i.e. without GRAPH operations). Disabled by default.

  • default_graph (NamedNode or BlankNode or DefaultGraph or list[NamedNode or BlankNode or DefaultGraph] or None, optional) – list of the graphs that should be used as the query default graph. By default, the store default graph is used.

  • named_graphs (list[NamedNode or BlankNode] or None, optional) – list of the named graphs that could be used in the SPARQL GRAPH clause. By default, all the store named graphs are available.

Returns:

a QueryBoolean for ASK queries (use bool() to get its value), a QueryTriples iterator of Triple for CONSTRUCT and DESCRIBE queries and a QuerySolutions iterator of QuerySolution for SELECT queries.

Return type:

QuerySolutions or QueryBoolean or QueryTriples

Raises:
  • SyntaxError – if the provided query is invalid.

  • OSError – if an error happens while reading the store.

SELECT query:

>>> store = Store()
>>> store.add(Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1')))
>>> [solution['s'] for solution in store.query('SELECT ?s WHERE { ?s ?p ?o }')]
[<NamedNode value=http://example.com>]

CONSTRUCT query:

>>> store = Store()
>>> store.add(Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1')))
>>> list(store.query('CONSTRUCT WHERE { ?s ?p ?o }'))
[<Triple subject=<NamedNode value=http://example.com> predicate=<NamedNode value=http://example.com/p> object=<Literal value=1 datatype=<NamedNode value=http://www.w3.org/2001/XMLSchema#string>>>]

ASK query:

>>> store = Store()
>>> store.add(Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1')))
>>> bool(store.query('ASK { ?s ?p ?o }'))
True
static read_only(path)

Opens a read-only store from disk.

Opening as read-only while another process is writing the database is undefined behavior. Store.secondary() should be used in this case.

Parameters:

path (str) – path to the primary read-write instance data.

Returns:

the opened store.

Return type:

Store

Raises:

OSError – if the target directory contains invalid data or could not be accessed.

remove(quad)

Removes a quad from the store.

Parameters:

quad (Quad) – the quad to remove.

Return type:

None

Raises:

OSError – if an error happens during the quad removal.

>>> store = Store()
>>> quad = Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1'), NamedNode('http://example.com/g'))
>>> store.add(quad)
>>> store.remove(quad)
>>> list(store)
[]
remove_graph(graph_name)

Removes a graph from the store.

The default graph will not be removed but just cleared.

Parameters:

graph_name (NamedNode or BlankNode or DefaultGraph) – the name of the named graph to remove.

Return type:

None

Raises:

OSError – if an error happens during the named graph removal.

>>> store = Store()
>>> store.add(Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1'), NamedNode('http://example.com/g')))
>>> store.remove_graph(NamedNode('http://example.com/g'))
>>> list(store.named_graphs())
[]
static secondary(primary_path, secondary_path=None)

Opens a read-only clone of a running read-write store.

Changes made by the primary instance while this secondary instance is running will be replicated after a possible lag.

It should only be used if a primary instance opened with Store() is running at the same time.

If you want a simple read-only store, use Store.read_only().

Parameters:
  • primary_path (str) – path to the primary read-write instance data.

  • secondary_path (str or None, optional) – path to another directory for the secondary instance cache. If not given, a temporary directory will be used.

Returns:

the opened store.

Return type:

Store

Raises:

OSError – if the target directories contain invalid data or could not be accessed.

update(update, *, base_iri=None)

Executes a SPARQL 1.1 update.

Updates are applied in a transactional manner: either the full operation succeeds or nothing is written to the database.

Parameters:
  • update (str) – the update to execute.

  • base_iri (str or None, optional) – the base IRI used to resolve the relative IRIs in the SPARQL update or None if relative IRI resolution should not be done.

Return type:

None

Raises:
  • SyntaxError – if the provided update is invalid.

  • OSError – if an error happens while reading the store.

INSERT DATA update:

>>> store = Store()
>>> store.update('INSERT DATA { <http://example.com> <http://example.com/p> "1" }')
>>> list(store)
[<Quad subject=<NamedNode value=http://example.com> predicate=<NamedNode value=http://example.com/p> object=<Literal value=1 datatype=<NamedNode value=http://www.w3.org/2001/XMLSchema#string>> graph_name=<DefaultGraph>>]

DELETE DATA update:

>>> store = Store()
>>> store.add(Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1')))
>>> store.update('DELETE DATA { <http://example.com> <http://example.com/p> "1" }')
>>> list(store)
[]

DELETE WHERE update:

>>> store = Store()
>>> store.add(Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1')))
>>> store.update('DELETE WHERE { <http://example.com> ?p ?o }')
>>> list(store)
[]