RDF Store
- class pyoxigraph.Store(path=None)
RDF store.
It encodes an RDF dataset and allows querying it using SPARQL. It is based on the RocksDB key-value database.
This store ensures the “repeatable read” isolation level: the store only exposes changes that have been “committed” (i.e. no partial writes) and the exposed state does not change for the complete duration of a read operation (e.g. a SPARQL query) or a read/write operation (e.g. a SPARQL update).
The Store constructor opens a read-write instance. To open a static read-only instance use Store.read_only(), and to open a read-only instance that tracks a read-write instance use Store.secondary().
- Parameters:
path (str or os.PathLike[str] or None, optional) – the path of the directory in which the store should read and write its data. If the directory does not exist, it is created. If no directory is provided a temporary one is created and removed when the Python garbage collector removes the store. In this case, the store data are kept in memory and never written on disk.
- Raises:
OSError – if the target directory contains invalid data or could not be accessed.
The str function provides a serialization of the store in N-Quads:
>>> store = Store()
>>> store.add(Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1'), NamedNode('http://example.com/g')))
>>> str(store)
'<http://example.com> <http://example.com/p> "1" <http://example.com/g> .\n'
- add(quad)
Adds a quad to the store.
- Parameters:
quad (Quad) – the quad to add.
- Return type:
None
- Raises:
OSError – if an error happens during the quad insertion.
>>> store = Store()
>>> store.add(Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1'), NamedNode('http://example.com/g')))
>>> list(store)
[<Quad subject=<NamedNode value=http://example.com> predicate=<NamedNode value=http://example.com/p> object=<Literal value=1 datatype=<NamedNode value=http://www.w3.org/2001/XMLSchema#string>> graph_name=<NamedNode value=http://example.com/g>>]
- add_graph(graph_name)
Adds a named graph to the store.
- Parameters:
graph_name (NamedNode or BlankNode or DefaultGraph) – the name of the named graph to add.
- Return type:
None
- Raises:
OSError – if an error happens during the named graph insertion.
>>> store = Store()
>>> store.add_graph(NamedNode('http://example.com/g'))
>>> list(store.named_graphs())
[<NamedNode value=http://example.com/g>]
- backup(target_directory)
Creates a database backup in target_directory.
After its creation, the backup can be opened with the Store constructor like a regular pyoxigraph database, and it operates independently from the original database.
Warning: Backups are only possible for on-disk databases created by providing a path to the Store constructor. Temporary in-memory databases created without a path are not compatible with the backup system.
Warning: An error is raised if target_directory already exists.
If the target directory is in the same file system as the current database, the database content will not be fully copied. Instead, hard links will be used to point to the immutable snapshots of the original database. This allows cheap regular backups.
If you want to move your data to another RDF storage system, you should have a look at the dump() function instead.
- Parameters:
target_directory (str or os.PathLike[str]) – the directory name to save the database to.
- Return type:
None
- Raises:
OSError – if an error happens during the backup.
- bulk_extend(quads)
Adds a set of quads to this store.
This function is designed to be as fast as possible without transactional guarantees. Only a part of the data might be written to the store.
- Parameters:
quads (collections.abc.Iterable[Quad]) – the quads to add.
- Return type:
None
- Raises:
OSError – if an error happens during the quad insertion.
>>> store = Store()
>>> store.bulk_extend([Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1'), NamedNode('http://example.com/g'))])
>>> list(store)
[<Quad subject=<NamedNode value=http://example.com> predicate=<NamedNode value=http://example.com/p> object=<Literal value=1 datatype=<NamedNode value=http://www.w3.org/2001/XMLSchema#string>> graph_name=<NamedNode value=http://example.com/g>>]
- bulk_load(input=None, format=None, *, path=None, base_iri=None, to_graph=None)
Loads an RDF serialization into the store.
This function is designed to be as fast as possible on big files without transactional guarantees. If the file is invalid only a piece of it might be written to the store.
The load() method is also available for loads with transactional guarantees.
It currently supports the following formats:
- N-Triples (RdfFormat.N_TRIPLES)
- N-Quads (RdfFormat.N_QUADS)
- Turtle (RdfFormat.TURTLE)
- TriG (RdfFormat.TRIG)
- N3 (RdfFormat.N3)
- RDF/XML (RdfFormat.RDF_XML)
It also supports some media type and extension aliases. For example, application/turtle could also be used for Turtle and application/xml or xml for RDF/XML.
- Parameters:
input (bytes or str or IO[bytes] or IO[str] or None, optional) – The str, bytes or I/O object to read from. For example, it could be the file content as a string or a file reader opened in binary mode with open('my_file.ttl', 'rb').
format (RdfFormat or None, optional) – the format of the RDF serialization. If None, the format is guessed from the file name extension.
path (str or os.PathLike[str] or None, optional) – The file path to read from. Replaces the input parameter.
base_iri (str or None, optional) – the base IRI used to resolve the relative IRIs in the file or None if relative IRI resolution should not be done.
to_graph (NamedNode or BlankNode or DefaultGraph or None, optional) – if it is a file composed of triples, the graph in which the triples should be stored. By default, the default graph is used.
- Return type:
None
- Raises:
ValueError – if the format is not supported.
SyntaxError – if the provided data is invalid.
OSError – if an error happens during a quad insertion or if a system error happens while reading the file.
>>> store = Store()
>>> store.bulk_load(input=b'<foo> <p> "1" .', format=RdfFormat.TURTLE, base_iri="http://example.com/", to_graph=NamedNode("http://example.com/g"))
>>> list(store)
[<Quad subject=<NamedNode value=http://example.com/foo> predicate=<NamedNode value=http://example.com/p> object=<Literal value=1 datatype=<NamedNode value=http://www.w3.org/2001/XMLSchema#string>> graph_name=<NamedNode value=http://example.com/g>>]
- clear()
Clears the store by removing all its contents.
- Return type:
None
- Raises:
OSError – if an error happens during the operation.
>>> store = Store()
>>> store.add(Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1'), NamedNode('http://example.com/g')))
>>> store.clear()
>>> list(store)
[]
>>> list(store.named_graphs())
[]
- clear_graph(graph_name)
Clears a graph from the store without removing it.
- Parameters:
graph_name (NamedNode or BlankNode or DefaultGraph) – the name of the named graph to clear.
- Return type:
None
- Raises:
OSError – if an error happens during the operation.
>>> store = Store()
>>> store.add(Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1'), NamedNode('http://example.com/g')))
>>> store.clear_graph(NamedNode('http://example.com/g'))
>>> list(store)
[]
>>> list(store.named_graphs())
[<NamedNode value=http://example.com/g>]
- contains_named_graph(graph_name)
Returns whether the store contains the given named graph.
- Parameters:
graph_name (NamedNode or BlankNode or DefaultGraph) – the name of the named graph.
- Return type:
bool
- Raises:
OSError – if an error happens during the named graph lookup.
>>> store = Store()
>>> store.add_graph(NamedNode('http://example.com/g'))
>>> store.contains_named_graph(NamedNode('http://example.com/g'))
True
- dump(output=None, format=None, *, from_graph=None)
Dumps the store quads or triples into a file.
It currently supports the following formats:
- N-Triples (RdfFormat.N_TRIPLES)
- N-Quads (RdfFormat.N_QUADS)
- Turtle (RdfFormat.TURTLE)
- TriG (RdfFormat.TRIG)
- N3 (RdfFormat.N3)
- RDF/XML (RdfFormat.RDF_XML)
It also supports some media type and extension aliases. For example, application/turtle could also be used for Turtle and application/xml or xml for RDF/XML.
- Parameters:
output (IO[bytes] or str or os.PathLike[str] or None, optional) – The binary I/O object or file path to write to. For example, it could be a file path as a string or a file writer opened in binary mode with open('my_file.ttl', 'wb'). If None, a bytes buffer is returned with the serialized content.
format (RdfFormat or None, optional) – the format of the RDF serialization. If None, the format is guessed from the file name extension.
from_graph (NamedNode or BlankNode or DefaultGraph or None, optional) – the store graph from which to dump the triples. Required if the serialization format does not support named graphs. If it does support named graphs, the full dataset is written.
- Returns:
bytes with the serialization if the output parameter is None, None if output is set.
- Return type:
bytes or None
- Raises:
ValueError – if the format is not supported or the from_graph parameter is not given with a syntax not supporting named graphs.
OSError – if an error happens during a quad lookup or file writing.
>>> store = Store()
>>> store.add(Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1')))
>>> store.dump(format=RdfFormat.TRIG)
b'<http://example.com> <http://example.com/p> "1" .\n'
>>> import io
>>> store = Store()
>>> store.add(Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1'), NamedNode('http://example.com/g')))
>>> output = io.BytesIO()
>>> store.dump(output, RdfFormat.TURTLE, from_graph=NamedNode("http://example.com/g"))
>>> output.getvalue()
b'<http://example.com> <http://example.com/p> "1" .\n'
- extend(quads)
Atomically adds a set of quads to this store.
Insertion is done in a transactional manner: either the full operation succeeds or nothing is written to the database. The bulk_extend() method is also available for much faster loading of a large number of quads but without transactional guarantees.
- Parameters:
quads (collections.abc.Iterable[Quad]) – the quads to add.
- Return type:
None
- Raises:
OSError – if an error happens during the quad insertion.
>>> store = Store()
>>> store.extend([Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1'), NamedNode('http://example.com/g'))])
>>> list(store)
[<Quad subject=<NamedNode value=http://example.com> predicate=<NamedNode value=http://example.com/p> object=<Literal value=1 datatype=<NamedNode value=http://www.w3.org/2001/XMLSchema#string>> graph_name=<NamedNode value=http://example.com/g>>]
- flush()
Flushes all buffers and ensures that all writes are saved on disk.
Flushes are automatically done using background threads but might lag a little bit.
- Return type:
None
- Raises:
OSError – if an error happens during the flush.
- load(input=None, format=None, *, path=None, base_iri=None, to_graph=None)
Loads an RDF serialization into the store.
Loads are applied in a transactional manner: either the full operation succeeds or nothing is written to the database. The bulk_load() method is also available for much faster loading of big files but without transactional guarantees.
Beware, the full file is loaded into memory.
It currently supports the following formats:
- N-Triples (RdfFormat.N_TRIPLES)
- N-Quads (RdfFormat.N_QUADS)
- Turtle (RdfFormat.TURTLE)
- TriG (RdfFormat.TRIG)
- N3 (RdfFormat.N3)
- RDF/XML (RdfFormat.RDF_XML)
It also supports some media type and extension aliases. For example, application/turtle could also be used for Turtle and application/xml or xml for RDF/XML.
- Parameters:
input (bytes or str or IO[bytes] or IO[str] or None, optional) – The str, bytes or I/O object to read from. For example, it could be the file content as a string or a file reader opened in binary mode with open('my_file.ttl', 'rb').
format (RdfFormat or None, optional) – the format of the RDF serialization. If None, the format is guessed from the file name extension.
path (str or os.PathLike[str] or None, optional) – The file path to read from. Replaces the input parameter.
base_iri (str or None, optional) – the base IRI used to resolve the relative IRIs in the file or None if relative IRI resolution should not be done.
to_graph (NamedNode or BlankNode or DefaultGraph or None, optional) – if it is a file composed of triples, the graph in which the triples should be stored. By default, the default graph is used.
- Return type:
None
- Raises:
ValueError – if the format is not supported.
SyntaxError – if the provided data is invalid.
OSError – if an error happens during a quad insertion or if a system error happens while reading the file.
>>> store = Store()
>>> store.load(input='<foo> <p> "1" .', format=RdfFormat.TURTLE, base_iri="http://example.com/", to_graph=NamedNode("http://example.com/g"))
>>> list(store)
[<Quad subject=<NamedNode value=http://example.com/foo> predicate=<NamedNode value=http://example.com/p> object=<Literal value=1 datatype=<NamedNode value=http://www.w3.org/2001/XMLSchema#string>> graph_name=<NamedNode value=http://example.com/g>>]
- named_graphs()
Returns an iterator over all the store named graphs.
- Returns:
an iterator of the store graph names.
- Return type:
- Raises:
OSError – if an error happens during the named graphs lookup.
>>> store = Store()
>>> store.add(Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1'), NamedNode('http://example.com/g')))
>>> list(store.named_graphs())
[<NamedNode value=http://example.com/g>]
- optimize()
Optimizes the database for future workloads.
Useful to call after a batch upload or another similar operation.
- Return type:
None
- Raises:
OSError – if an error happens during the optimization.
- quads_for_pattern(subject, predicate, object, graph_name=None)
Looks for the quads matching a given pattern.
- Parameters:
subject (NamedNode or BlankNode or Triple or None) – the quad subject or None to match everything.
predicate (NamedNode or None) – the quad predicate or None to match everything.
object (NamedNode or BlankNode or Literal or Triple or None) – the quad object or None to match everything.
graph_name (NamedNode or BlankNode or DefaultGraph or None, optional) – the quad graph name. To match only the default graph, use DefaultGraph. To match everything use None.
- Returns:
an iterator of the quads matching the pattern.
- Return type:
- Raises:
OSError – if an error happens during the quads lookup.
>>> store = Store()
>>> store.add(Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1'), NamedNode('http://example.com/g')))
>>> list(store.quads_for_pattern(NamedNode('http://example.com'), None, None, None))
[<Quad subject=<NamedNode value=http://example.com> predicate=<NamedNode value=http://example.com/p> object=<Literal value=1 datatype=<NamedNode value=http://www.w3.org/2001/XMLSchema#string>> graph_name=<NamedNode value=http://example.com/g>>]
- query(query, *, base_iri=None, use_default_graph_as_union=False, default_graph=None, named_graphs=None)
Executes a SPARQL 1.1 query.
- Parameters:
query (str) – the query to execute.
base_iri (str or None, optional) – the base IRI used to resolve the relative IRIs in the SPARQL query or None if relative IRI resolution should not be done.
use_default_graph_as_union (bool, optional) – if the SPARQL query should look for triples in all the dataset graphs by default (i.e. without GRAPH operations). Disabled by default.
default_graph (NamedNode or BlankNode or DefaultGraph or list[NamedNode or BlankNode or DefaultGraph] or None, optional) – list of the graphs that should be used as the query default graph. By default, the store default graph is used.
named_graphs (list[NamedNode or BlankNode] or None, optional) – list of the named graphs that could be used in SPARQL GRAPH clause. By default, all the store named graphs are available.
- Returns:
a bool for ASK queries, an iterator of Triple for CONSTRUCT and DESCRIBE queries and an iterator of QuerySolution for SELECT queries.
- Return type:
- Raises:
SyntaxError – if the provided query is invalid.
OSError – if an error happens while reading the store.
SELECT query:
>>> store = Store()
>>> store.add(Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1')))
>>> [solution['s'] for solution in store.query('SELECT ?s WHERE { ?s ?p ?o }')]
[<NamedNode value=http://example.com>]
CONSTRUCT query:
>>> store = Store()
>>> store.add(Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1')))
>>> list(store.query('CONSTRUCT WHERE { ?s ?p ?o }'))
[<Triple subject=<NamedNode value=http://example.com> predicate=<NamedNode value=http://example.com/p> object=<Literal value=1 datatype=<NamedNode value=http://www.w3.org/2001/XMLSchema#string>>>]
ASK query:
>>> store = Store()
>>> store.add(Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1')))
>>> bool(store.query('ASK { ?s ?p ?o }'))
True
- static read_only(path)
Opens a read-only store from disk.
Opening a store as read-only while another process is writing to the database is undefined behavior. Store.secondary() should be used in this case.
- remove(quad)
Removes a quad from the store.
- Parameters:
quad (Quad) – the quad to remove.
- Return type:
None
- Raises:
OSError – if an error happens during the quad removal.
>>> store = Store()
>>> quad = Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1'), NamedNode('http://example.com/g'))
>>> store.add(quad)
>>> store.remove(quad)
>>> list(store)
[]
- remove_graph(graph_name)
Removes a graph from the store.
The default graph will not be removed but just cleared.
- Parameters:
graph_name (NamedNode or BlankNode or DefaultGraph) – the name of the named graph to remove.
- Return type:
None
- Raises:
OSError – if an error happens during the named graph removal.
>>> store = Store()
>>> store.add(Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1'), NamedNode('http://example.com/g')))
>>> store.remove_graph(NamedNode('http://example.com/g'))
>>> list(store.named_graphs())
[]
- static secondary(primary_path, secondary_path=None)
Opens a read-only clone of a running read-write store.
Changes done while this process is running will be replicated after a possible lag.
It should only be used if a primary instance opened with Store() is running at the same time.
If you want a simple read-only store, use Store.read_only().
- Parameters:
- Returns:
the opened store.
- Return type:
- Raises:
OSError – if the target directories contain invalid data or could not be accessed.
- update(update, *, base_iri=None)
Executes a SPARQL 1.1 update.
Updates are applied in a transactional manner: either the full operation succeeds or nothing is written to the database.
- Parameters:
- Return type:
None
- Raises:
SyntaxError – if the provided update is invalid.
OSError – if an error happens while reading the store.
INSERT DATA update:
>>> store = Store()
>>> store.update('INSERT DATA { <http://example.com> <http://example.com/p> "1" }')
>>> list(store)
[<Quad subject=<NamedNode value=http://example.com> predicate=<NamedNode value=http://example.com/p> object=<Literal value=1 datatype=<NamedNode value=http://www.w3.org/2001/XMLSchema#string>> graph_name=<DefaultGraph>>]
DELETE DATA update:
>>> store = Store()
>>> store.add(Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1')))
>>> store.update('DELETE DATA { <http://example.com> <http://example.com/p> "1" }')
>>> list(store)
[]
DELETE update:
>>> store = Store()
>>> store.add(Quad(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1')))
>>> store.update('DELETE WHERE { <http://example.com> ?p ?o }')
>>> list(store)
[]