RDF Store

class pyoxigraph.Store(path=None)

RDF store.

It encodes an RDF dataset and allows querying it using SPARQL. It is based on the RocksDB key-value database.

This store ensures the “repeatable read” isolation level: the store only exposes changes that have been “committed” (i.e. no partial writes) and the exposed state does not change for the complete duration of a read operation (e.g. a SPARQL query) or a read/write operation (e.g. a SPARQL update).

The Store constructor opens a read-write instance. To open a static read-only instance use Store.read_only().

Parameters:

path (str or os.PathLike[str] or None, optional) – the path of the directory in which the store should read and write its data. If the directory does not exist, it is created. If no directory is provided, a temporary one is created and removed when the Python garbage collector removes the store. In this case, the store data are kept in memory and never written to disk.

Raises:

OSError – if the target directory contains invalid data or could not be accessed.

The str function provides a serialization of the store in N-Quads:

>>> store = Store()
>>> store.add(Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1'), NamedNode('http://example.com/g')))
>>> str(store)
'<http://example.com> <http://example.com/p> "1" <http://example.com/g> .\n'
add(quad)

Adds a quad to the store.

Parameters:

quad (Quad) – the quad to add.

Return type:

None

Raises:

OSError – if an error happens during the quad insertion.

>>> store = Store()
>>> store.add(Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1'), NamedNode('http://example.com/g')))
>>> list(store)
[<Quad subject=<NamedNode value=http://example.com> predicate=<NamedNode value=http://example.com/p> object=<Literal value=1 datatype=<NamedNode value=http://www.w3.org/2001/XMLSchema#string>> graph_name=<NamedNode value=http://example.com/g>>]
add_graph(graph_name)

Adds a named graph to the store.

Parameters:

graph_name (NamedNode or BlankNode or DefaultGraph) – the name of the named graph to add.

Return type:

None

Raises:

OSError – if an error happens during the named graph insertion.

>>> store = Store()
>>> store.add_graph(NamedNode('http://example.com/g'))
>>> list(store.named_graphs())
[<NamedNode value=http://example.com/g>]
backup(target_directory)

Creates a database backup in target_directory.

After its creation, the backup is usable with the Store constructor like a regular pyoxigraph database and operates independently from the original database.

Warning: Backups are only possible for on-disk databases created by providing a path to the Store constructor. Temporary in-memory databases created without a path are not compatible with the backup system.

Warning: An error is raised if the target_directory already exists.

If the target directory is in the same file system as the current database, the database content will not be fully copied. Instead, hard links will point to the original database's immutable snapshots. This allows cheap regular backups.

If you want to move your data to another RDF storage system, you should have a look at the dump() method instead.

Parameters:

target_directory (str or os.PathLike[str]) – the directory name to save the database to.

Return type:

None

Raises:

OSError – if an error happens during the backup.

bulk_extend(quads)

Adds a set of quads to this store without keeping them all in memory.

It always writes new files to disk. The extend() method is also available for fast insertion of a small number of quads.

Parameters:

quads (collections.abc.Iterable[Quad]) – the quads to add.

Return type:

None

Raises:

OSError – if an error happens during the quad insertion.

>>> store = Store()
>>> store.bulk_extend([Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1'), NamedNode('http://example.com/g'))])
>>> list(store)
[<Quad subject=<NamedNode value=http://example.com> predicate=<NamedNode value=http://example.com/p> object=<Literal value=1 datatype=<NamedNode value=http://www.w3.org/2001/XMLSchema#string>> graph_name=<NamedNode value=http://example.com/g>>]
bulk_load(input=None, format=None, *, path=None, base_iri=None, to_graph=None, lenient=False)

Loads an RDF serialization into the store without keeping it all in memory.

This function is designed to be as fast as possible on big files.

It always writes new files to disk. The load() method is also available for fast insertion of small files.

The serialization format is given as an RdfFormat value, for example RdfFormat.TURTLE.

Parameters:
  • input (bytes or str or IO[bytes] or IO[str] or None, optional) – The str, bytes or I/O object to read from. For example, it could be the file content as a string or a file reader opened in binary mode with open('my_file.ttl', 'rb').

  • format (RdfFormat or None, optional) – the format of the RDF serialization. If None, the format is guessed from the file name extension.

  • path (str or os.PathLike[str] or None, optional) – The file path to read from. Replaces the input parameter.

  • base_iri (str or None, optional) – the base IRI used to resolve the relative IRIs in the file or None if relative IRI resolution should not be done.

  • to_graph (NamedNode or BlankNode or DefaultGraph or None, optional) – if it is a file composed of triples, the graph in which the triples should be stored. By default, the default graph is used.

  • lenient (bool, optional) – Skip some data validation during loading, like validating IRIs. This makes parsing faster at the cost of possibly ingesting invalid data.

Return type:

None

Raises:
  • ValueError – if the format is not supported.

  • SyntaxError – if the provided data is invalid.

  • OSError – if an error happens during a quad insertion or if a system error happens while reading the file.

>>> store = Store()
>>> store.bulk_load(input=b'<foo> <p> "1" .', format=RdfFormat.TURTLE, base_iri="http://example.com/", to_graph=NamedNode("http://example.com/g"))
>>> list(store)
[<Quad subject=<NamedNode value=http://example.com/foo> predicate=<NamedNode value=http://example.com/p> object=<Literal value=1 datatype=<NamedNode value=http://www.w3.org/2001/XMLSchema#string>> graph_name=<NamedNode value=http://example.com/g>>]
clear()

Clears the store by removing all its contents.

Return type:

None

Raises:

OSError – if an error happens during the operation.

>>> store = Store()
>>> store.add(Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1'), NamedNode('http://example.com/g')))
>>> store.clear()
>>> list(store)
[]
>>> list(store.named_graphs())
[]
clear_graph(graph_name)

Clears a graph from the store without removing it.

Parameters:

graph_name (NamedNode or BlankNode or DefaultGraph) – the name of the named graph to clear.

Return type:

None

Raises:

OSError – if an error happens during the operation.

>>> store = Store()
>>> store.add(Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1'), NamedNode('http://example.com/g')))
>>> store.clear_graph(NamedNode('http://example.com/g'))
>>> list(store)
[]
>>> list(store.named_graphs())
[<NamedNode value=http://example.com/g>]
contains_named_graph(graph_name)

Returns whether the store contains the given named graph.

Parameters:

graph_name (NamedNode or BlankNode or DefaultGraph) – the name of the named graph.

Return type:

bool

Raises:

OSError – if an error happens during the named graph lookup.

>>> store = Store()
>>> store.add_graph(NamedNode('http://example.com/g'))
>>> store.contains_named_graph(NamedNode('http://example.com/g'))
True
dump(output=None, format=None, *, from_graph=None, prefixes=None, base_iri=None)

Dumps the store quads or triples into a file.

The serialization format is given as an RdfFormat value, for example RdfFormat.TRIG.

Parameters:
  • output (IO[bytes] or str or os.PathLike[str] or None, optional) – The binary I/O object or file path to write to. For example, it could be a file path as a string or a file writer opened in binary mode with open('my_file.ttl', 'wb'). If None, a bytes buffer is returned with the serialized content.

  • format (RdfFormat or None, optional) – the format of the RDF serialization. If None, the format is guessed from the file name extension.

  • from_graph (NamedNode or BlankNode or DefaultGraph or None, optional) – the store graph from which to dump the triples. Required if the serialization format does not support named graphs. If the format does support named graphs, the full dataset is written.

  • prefixes (dict[str, str] or None, optional) – the prefixes used in the serialization if the format supports it.

  • base_iri (str or None, optional) – the base IRI used in the serialization if the format supports it.

Returns:

bytes with the serialization if the output parameter is None, None if output is set.

Return type:

bytes or None

Raises:
  • ValueError – if the format is not supported, or if the from_graph parameter is not given and the format does not support named graphs.

  • OSError – if an error happens during a quad lookup or file writing.

>>> store = Store()
>>> store.add(Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1')))
>>> store.dump(format=RdfFormat.TRIG)
b'<http://example.com> <http://example.com/p> "1" .\n'
>>> import io
>>> store = Store()
>>> store.add(Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1'), NamedNode('http://example.com/g')))
>>> output = io.BytesIO()
>>> store.dump(output, RdfFormat.TURTLE, from_graph=NamedNode("http://example.com/g"), prefixes={"ex": "http://example.com/"}, base_iri="http://example.com")
>>> output.getvalue()
b'@base <http://example.com> .\n@prefix ex: </> .\n<> ex:p "1" .\n'
extend(quads)

Adds a set of quads to this store.

Insertion is done in a transactional manner: either the full operation succeeds, or nothing is written to the database. The bulk_extend() method is also available for loading a very large number of quads without holding them all in memory.

Parameters:

quads (collections.abc.Iterable[Quad]) – the quads to add.

Return type:

None

Raises:

OSError – if an error happens during the quad insertion.

>>> store = Store()
>>> store.extend([Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1'), NamedNode('http://example.com/g'))])
>>> list(store)
[<Quad subject=<NamedNode value=http://example.com> predicate=<NamedNode value=http://example.com/p> object=<Literal value=1 datatype=<NamedNode value=http://www.w3.org/2001/XMLSchema#string>> graph_name=<NamedNode value=http://example.com/g>>]
flush()

Flushes all buffers and ensures that all writes are saved on disk.

Flushes are automatically done using background threads but might lag a little bit.

Return type:

None

Raises:

OSError – if an error happens during the flush.

load(input=None, format=None, *, path=None, base_iri=None, to_graph=None, lenient=False)

Loads an RDF serialization into the store.

Loads are applied in a transactional manner: either the full operation succeeds, or nothing is written to the database. The bulk_load() method is also available for loading big files without loading all their content into memory.

Beware, the full file is loaded into memory.

The serialization format is given as an RdfFormat value, for example RdfFormat.TURTLE.

Parameters:
  • input (bytes or str or IO[bytes] or IO[str] or None, optional) – The str, bytes or I/O object to read from. For example, it could be the file content as a string or a file reader opened in binary mode with open('my_file.ttl', 'rb').

  • format (RdfFormat or None, optional) – the format of the RDF serialization. If None, the format is guessed from the file name extension.

  • path (str or os.PathLike[str] or None, optional) – The file path to read from. Replaces the input parameter.

  • base_iri (str or None, optional) – the base IRI used to resolve the relative IRIs in the file or None if relative IRI resolution should not be done.

  • to_graph (NamedNode or BlankNode or DefaultGraph or None, optional) – if it is a file composed of triples, the graph in which the triples should be stored. By default, the default graph is used.

  • lenient (bool, optional) – Skip some data validation during loading, like validating IRIs. This makes parsing faster at the cost of possibly ingesting invalid data.

Return type:

None

Raises:
  • ValueError – if the format is not supported.

  • SyntaxError – if the provided data is invalid.

  • OSError – if an error happens during a quad insertion or if a system error happens while reading the file.

>>> store = Store()
>>> store.load(input='<foo> <p> "1" .', format=RdfFormat.TURTLE, base_iri="http://example.com/", to_graph=NamedNode("http://example.com/g"))
>>> list(store)
[<Quad subject=<NamedNode value=http://example.com/foo> predicate=<NamedNode value=http://example.com/p> object=<Literal value=1 datatype=<NamedNode value=http://www.w3.org/2001/XMLSchema#string>> graph_name=<NamedNode value=http://example.com/g>>]
named_graphs()

Returns an iterator over all the named graphs in the store.

Returns:

an iterator of the store graph names.

Return type:

collections.abc.Iterator[NamedNode or BlankNode]

Raises:

OSError – if an error happens during the named graphs lookup.

>>> store = Store()
>>> store.add(Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1'), NamedNode('http://example.com/g')))
>>> list(store.named_graphs())
[<NamedNode value=http://example.com/g>]
optimize()

Optimizes the database for future workload.

Useful to call after a batch upload or another similar operation.

Return type:

None

Raises:

OSError – if an error happens during the optimization.

quads_for_pattern(subject, predicate, object, graph_name=None)

Looks for the quads matching a given pattern.

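Parameters:
  • subject – the quad subject, or None to match any subject.

  • predicate – the quad predicate, or None to match any predicate.

  • object – the quad object, or None to match any object.

  • graph_name (optional) – the quad graph name, or None to match any graph.

Returns: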

an iterator of the quads matching the pattern.

Return type:

collections.abc.Iterator[Quad]

Raises:

OSError – if an error happens during the quads lookup.

>>> store = Store()
>>> store.add(Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1'), NamedNode('http://example.com/g')))
>>> list(store.quads_for_pattern(NamedNode('http://example.com'), None, None, None))
[<Quad subject=<NamedNode value=http://example.com> predicate=<NamedNode value=http://example.com/p> object=<Literal value=1 datatype=<NamedNode value=http://www.w3.org/2001/XMLSchema#string>> graph_name=<NamedNode value=http://example.com/g>>]
query(query, *, base_iri=None, prefixes=None, use_default_graph_as_union=False, default_graph=None, named_graphs=None, substitutions=None, custom_functions=None, custom_aggregate_functions=None)

Executes a SPARQL 1.1 query.

Parameters:
  • query (str) – the query to execute.

  • base_iri (str or None, optional) – the base IRI used to resolve the relative IRIs in the SPARQL query or None if relative IRI resolution should not be done.

  • prefixes (dict[str, str] or None, optional) – a set of default prefixes to use during the SPARQL query parsing as a prefix name -> prefix IRI dictionary.

  • use_default_graph_as_union (bool, optional) – if the SPARQL query should look for triples in all the dataset graphs by default (i.e. without GRAPH operations). Disabled by default.

  • default_graph (NamedNode or BlankNode or DefaultGraph or list[NamedNode or BlankNode or DefaultGraph] or None, optional) – list of the graphs that should be used as the query default graph. By default, the store default graph is used.

  • named_graphs (list[NamedNode or BlankNode] or None, optional) – list of the named graphs that could be used in SPARQL GRAPH clause. By default, all the store named graphs are available.

  • substitutions (dict[Variable, NamedNode or BlankNode or Literal or Triple] or None, optional) – dictionary of values variables should be substituted with. Substitution follows RDF-dev SEP-0007.

  • custom_functions (dict[NamedNode, Callable[[NamedNode or BlankNode or Literal or Triple, ...], NamedNode or BlankNode or Literal or Triple or None]] or None, optional) – dictionary of custom functions mapping function names to their definition. Custom functions take RDF terms as input and return an RDF term or None.

  • custom_aggregate_functions (dict[NamedNode, Callable[[], AggregateFunctionAccumulator]] or None, optional) – dictionary of custom aggregate functions mapping function names to their definition. Custom aggregate functions take no input and return an object with two methods, accumulate(self, term: Term) to add a new term to the accumulator and finish(self) -> Term to return the accumulated result.

Returns:

a bool for ASK queries, an iterator of Triple for CONSTRUCT and DESCRIBE queries and an iterator of QuerySolution for SELECT queries.

Return type:

QuerySolutions or QueryBoolean or QueryTriples

Raises:
  • SyntaxError – if the provided query is invalid.

  • OSError – if an error happens while reading the store.

SELECT query:

>>> store = Store()
>>> store.add(Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1')))
>>> [solution['s'] for solution in store.query('SELECT ?s WHERE { ?s ?p ?o }')]
[<NamedNode value=http://example.com>]

CONSTRUCT query:

>>> store = Store()
>>> store.add(Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1')))
>>> list(store.query('CONSTRUCT WHERE { ?s ?p ?o }'))
[<Triple subject=<NamedNode value=http://example.com> predicate=<NamedNode value=http://example.com/p> object=<Literal value=1 datatype=<NamedNode value=http://www.w3.org/2001/XMLSchema#string>>>]

ASK query:

>>> store = Store()
>>> store.add(Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1')))
>>> bool(store.query('ASK { ?s ?p ?o }'))
True
static read_only(path)

Opens a read-only store from disk.

Opening as read-only while another process is writing to the database is undefined behavior.

Parameters:

path (str) – path to the primary read-write instance data.

Returns:

the opened store.

Return type:

Store

Raises:

OSError – if the target directory contains invalid data or could not be accessed.

remove(quad)

Removes a quad from the store.

Parameters:

quad (Quad) – the quad to remove.

Return type:

None

Raises:

OSError – if an error happens during the quad removal.

>>> store = Store()
>>> quad = Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1'), NamedNode('http://example.com/g'))
>>> store.add(quad)
>>> store.remove(quad)
>>> list(store)
[]
remove_graph(graph_name)

Removes a graph from the store.

The default graph will not be removed but just cleared.

Parameters:

graph_name (NamedNode or BlankNode or DefaultGraph) – the name of the named graph to remove.

Return type:

None

Raises:

OSError – if an error happens during the named graph removal.

>>> store = Store()
>>> store.add(Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1'), NamedNode('http://example.com/g')))
>>> store.remove_graph(NamedNode('http://example.com/g'))
>>> list(store.named_graphs())
[]
update(update, *, base_iri=None, prefixes=None, custom_functions=None, custom_aggregate_functions=None)

Executes a SPARQL 1.1 update.

Updates are applied in a transactional manner: either the full operation succeeds, or nothing is written to the database.

Parameters:
  • update (str) – the update to execute.

  • base_iri (str or None, optional) – the base IRI used to resolve the relative IRIs in the SPARQL update or None if relative IRI resolution should not be done.

  • prefixes (dict[str, str] or None, optional) – a set of default prefixes to use during the SPARQL update parsing as a prefix name -> prefix IRI dictionary.

  • custom_functions (dict[NamedNode, Callable[[NamedNode or BlankNode or Literal or Triple, ...], NamedNode or BlankNode or Literal or Triple or None]] or None, optional) – dictionary of custom functions mapping function names to their definition. Custom functions take RDF terms as input and return an RDF term or None.

  • custom_aggregate_functions (dict[NamedNode, Callable[[], AggregateFunctionAccumulator]] or None, optional) – dictionary of custom aggregate functions mapping function names to their definition. Custom aggregate functions take no input and return an object with two methods, accumulate(self, term: Term) to add a new term to the accumulator and finish(self) -> Term to return the accumulated result.

Return type:

None

Raises:
  • SyntaxError – if the provided update is invalid.

  • OSError – if an error happens while reading the store.

INSERT DATA update:

>>> store = Store()
>>> store.update('INSERT DATA { <http://example.com> <http://example.com/p> "1" }')
>>> list(store)
[<Quad subject=<NamedNode value=http://example.com> predicate=<NamedNode value=http://example.com/p> object=<Literal value=1 datatype=<NamedNode value=http://www.w3.org/2001/XMLSchema#string>> graph_name=<DefaultGraph>>]

DELETE DATA update:

>>> store = Store()
>>> store.add(Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1')))
>>> store.update('DELETE DATA { <http://example.com> <http://example.com/p> "1" }')
>>> list(store)
[]

DELETE update:

>>> store = Store()
>>> store.add(Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1')))
>>> store.update('DELETE WHERE { <http://example.com> ?p ?o }')
>>> list(store)
[]