RDF Parsing and Serialization

Oxigraph provides functions to parse and serialize RDF files:

Parsing

pyoxigraph.parse(input=None, format=None, *, path=None, base_iri=None, without_named_graphs=False, rename_blank_nodes=False, lenient=False)

Parses RDF graph and dataset serialization formats.

It currently supports the following formats:

JSON-LD (RdfFormat.JSON_LD)
N-Triples (RdfFormat.N_TRIPLES)
N-Quads (RdfFormat.N_QUADS)
Turtle (RdfFormat.TURTLE)
TriG (RdfFormat.TRIG)
N3 (RdfFormat.N3)
RDF/XML (RdfFormat.RDF_XML)

Parameters:

input (bytes or str or IO[bytes] or IO[str] or None, optional) – The str, bytes or I/O object to read from. For example, it could be the file content as a string or a file reader opened in binary mode with open('my_file.ttl', 'rb').
format (RdfFormat or None, optional) – the format of the RDF serialization. If None, the format is guessed from the extension of the path file name.
path (str or os.PathLike[str] or None, optional) – The file path to read from. Replaces the input parameter.
base_iri (str or None, optional) – the base IRI used to resolve the relative IRIs in the file or None if relative IRI resolution should not be done.
without_named_graphs (bool, optional) – If True, the parser fails when it encounters a named graph.
rename_blank_nodes (bool, optional) – Renames the blank node identifiers from those set in the serialization to random IDs. This avoids identifier conflicts when merging graphs together.
lenient (bool, optional) – Skips some data validation during loading, such as IRI validation. This makes parsing faster at the cost of possibly ingesting invalid data.

Returns:

an iterator of RDF quads. Triples read from graph formats such as Turtle are returned as quads in the default graph.

Return type:

QuadParser

Raises:

ValueError – if the format is not supported.
SyntaxError – if the provided data is invalid.
OSError – if a system error happens while reading the file.

>>> list(parse(input=b'<foo> <p> "1" .', format=RdfFormat.TURTLE, base_iri="http://example.com/"))
[<Quad subject=<NamedNode value=http://example.com/foo> predicate=<NamedNode value=http://example.com/p> object=<Literal value=1 datatype=<NamedNode value=http://www.w3.org/2001/XMLSchema#string>> graph_name=<DefaultGraph>>]

Serialization

pyoxigraph.serialize(input, output=None, format=None, *, prefixes=None, base_iri=None)

Serializes an RDF graph or dataset.

It currently supports the following formats:

JSON-LD (RdfFormat.JSON_LD)
canonical N-Triples (RdfFormat.N_TRIPLES)
N-Quads (RdfFormat.N_QUADS)
Turtle (RdfFormat.TURTLE)
TriG (RdfFormat.TRIG)
N3 (RdfFormat.N3)
RDF/XML (RdfFormat.RDF_XML)

Parameters:

input (collections.abc.Iterable[Triple] or collections.abc.Iterable[Quad]) – the RDF triples and quads to serialize.
output (IO[bytes] or str or os.PathLike[str] or None, optional) – The binary I/O object or file path to write to. For example, it could be a file path as a string or a file writer opened in binary mode with open('my_file.ttl', 'wb'). If None, the serialized content is returned as bytes.
format (RdfFormat or None, optional) – the format of the RDF serialization. If None, the format is guessed from the extension of the output file name.
prefixes (dict[str, str] or None, optional) – the prefixes used in the serialization if the format supports it.
base_iri (str or None, optional) – the base IRI used in the serialization if the format supports it.

Returns:

the serialization as bytes if the output parameter is None, or None if output is set.

Return type:

bytes or None

Raises:

ValueError – if the format is not supported.
TypeError – if a triple is given to a quad format serializer, or a quad to a triple format serializer.
OSError – if a system error happens while writing the file.

>>> serialize([Triple(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1'))], format=RdfFormat.TURTLE)
b'<http://example.com> <http://example.com/p> "1" .\n'

>>> import io
>>> output = io.BytesIO()
>>> serialize([Triple(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1'))], output, RdfFormat.TURTLE, prefixes={"ex": "http://example.com/"}, base_iri="http://example.com")
>>> output.getvalue()
b'@base <http://example.com> .\n@prefix ex: </> .\n<> ex:p "1" .\n'

Formats

class pyoxigraph.RdfFormat

RDF serialization formats.

The following formats are supported:

JSON-LD (RdfFormat.JSON_LD)
N-Triples (RdfFormat.N_TRIPLES)
N-Quads (RdfFormat.N_QUADS)
Turtle (RdfFormat.TURTLE)
TriG (RdfFormat.TRIG)
N3 (RdfFormat.N3)
RDF/XML (RdfFormat.RDF_XML)

>>> RdfFormat.N3.media_type
'text/n3'

file_extension

Returns: the format's IANA-registered file extension.
Return type: str

>>> RdfFormat.N_TRIPLES.file_extension
'nt'

static from_extension(extension)

Looks for a known format from an extension.

It supports some aliases.

Parameters: extension (str) – the extension.
Returns: RdfFormat if the extension is known or None if not.
Return type: RdfFormat or None

>>> RdfFormat.from_extension("nt")
<RdfFormat N-Triples>

static from_media_type(media_type)

Looks for a known format from a media type.

It supports some media type aliases. For example, “application/xml” returns RDF/XML even though it is not its canonical media type.

Parameters: media_type (str) – the media type.
Returns: RdfFormat if the media type is known or None if not.
Return type: RdfFormat or None

>>> RdfFormat.from_media_type("text/turtle; charset=utf-8")
<RdfFormat Turtle>

iri

Returns: the format's canonical IRI according to the Unique URIs for File Formats registry.
Return type: str

>>> RdfFormat.N_TRIPLES.iri
'http://www.w3.org/ns/formats/N-Triples'

media_type

Returns: the format's IANA media type.
Return type: str

>>> RdfFormat.N_TRIPLES.media_type
'application/n-triples'

name

Returns: the format's name.
Return type: str

>>> RdfFormat.N_TRIPLES.name
'N-Triples'

supports_datasets

Returns: whether the format supports RDF datasets and not only RDF graphs.
Return type: bool

>>> RdfFormat.N_TRIPLES.supports_datasets
False
>>> RdfFormat.N_QUADS.supports_datasets
True