RDF Parsing and Serialization

Oxigraph provides functions to parse and serialize RDF files:

Parsing

pyoxigraph.parse(input=None, format=None, *, path=None, base_iri=None, without_named_graphs=False, rename_blank_nodes=False)

Parses RDF graph and dataset serialization formats.

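It currently supports the following formats:

  • N-Triples (RdfFormat.N_TRIPLES)
  • N-Quads (RdfFormat.N_QUADS)
  • Turtle (RdfFormat.TURTLE)
  • TriG (RdfFormat.TRIG)
  • N3 (RdfFormat.N3)
  • RDF/XML (RdfFormat.RDF_XML)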

It also supports some media type and extension aliases. For example, application/turtle can also be used for Turtle, and application/xml or xml for RDF/XML.

Parameters:
  • input (bytes or str or IO[bytes] or IO[str] or None, optional) – The str, bytes or I/O object to read from. For example, it could be the file content as a string or a file reader opened in binary mode with open('my_file.ttl', 'rb').

  • format (RdfFormat or None, optional) – the format of the RDF serialization. If None, the format is guessed from the file name extension.

  • path (str or os.PathLike[str] or None, optional) – The file path to read from. Use it instead of the input parameter.

  • base_iri (str or None, optional) – the base IRI used to resolve the relative IRIs in the file or None if relative IRI resolution should not be done.

  • without_named_graphs (bool, optional) – If True, the parser fails when it encounters a named graph.

  • rename_blank_nodes (bool, optional) – If True, replaces the blank node identifiers set in the serialization with random ids. This avoids identifier conflicts when merging graphs together.

Returns:

an iterator of RDF triples or quads depending on the format.

Return type:

collections.abc.Iterator[Quad]

Raises:
  • ValueError – if the format is not supported.

  • SyntaxError – if the provided data is invalid.

  • OSError – if a system error happens while reading the file.

>>> from pyoxigraph import RdfFormat, parse
>>> list(parse(input=b'<foo> <p> "1" .', format=RdfFormat.TURTLE, base_iri="http://example.com/"))
[<Quad subject=<NamedNode value=http://example.com/foo> predicate=<NamedNode value=http://example.com/p> object=<Literal value=1 datatype=<NamedNode value=http://www.w3.org/2001/XMLSchema#string>> graph_name=<DefaultGraph>>]

Serialization

pyoxigraph.serialize(input, output=None, format=None)

Serializes an RDF graph or dataset.

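It currently supports the following formats:

  • N-Triples (RdfFormat.N_TRIPLES)
  • N-Quads (RdfFormat.N_QUADS)
  • Turtle (RdfFormat.TURTLE)
  • TriG (RdfFormat.TRIG)
  • N3 (RdfFormat.N3)
  • RDF/XML (RdfFormat.RDF_XML)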

It also supports some media type and extension aliases. For example, application/turtle can also be used for Turtle, and application/xml or xml for RDF/XML.

Parameters:
  • input (collections.abc.Iterable[Triple] or collections.abc.Iterable[Quad]) – the RDF triples and quads to serialize.

  • output (IO[bytes] or str or os.PathLike[str] or None, optional) – The binary I/O object or file path to write to. For example, it could be a file path as a string or a file writer opened in binary mode with open('my_file.ttl', 'wb'). If None, a bytes buffer is returned with the serialized content.

  • format (RdfFormat or None, optional) – the format of the RDF serialization. If None, the format is guessed from the file name extension.

Returns:

the serialized content as bytes if the output parameter is None; otherwise None.

Return type:

bytes or None

Raises:
  • ValueError – if the format is not supported.

  • TypeError – if a triple is given when serializing to a quad format, or vice versa.

  • OSError – if a system error happens while writing the file.

>>> from pyoxigraph import Literal, NamedNode, RdfFormat, Triple, serialize
>>> serialize([Triple(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1'))], format=RdfFormat.TURTLE)
b'<http://example.com> <http://example.com/p> "1" .\n'
>>> import io
>>> output = io.BytesIO()
>>> serialize([Triple(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1'))], output, RdfFormat.TURTLE)
>>> output.getvalue()
b'<http://example.com> <http://example.com/p> "1" .\n'

Formats

class pyoxigraph.RdfFormat

RDF serialization formats.

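The following formats are supported:

  • N-Triples (RdfFormat.N_TRIPLES)
  • N-Quads (RdfFormat.N_QUADS)
  • Turtle (RdfFormat.TURTLE)
  • TriG (RdfFormat.TRIG)
  • N3 (RdfFormat.N3)
  • RDF/XML (RdfFormat.RDF_XML)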

>>> RdfFormat.N3.media_type
'text/n3'

file_extension

Returns:

the format's IANA-registered file extension.

Return type:

str

>>> RdfFormat.N_TRIPLES.file_extension
'nt'

static from_extension(extension)

Looks up a known format by file extension.

Some aliases are supported.

Parameters:

extension (str) – the extension.

Returns:

RdfFormat if the extension is known or None if not.

Return type:

RdfFormat or None

>>> RdfFormat.from_extension("nt")
<RdfFormat N-Triples>

static from_media_type(media_type)

Looks up a known format by media type.

Some media type aliases are supported. For example, “application/xml” returns RDF/XML even though it is not its canonical media type.

Parameters:

media_type (str) – the media type.

Returns:

RdfFormat if the media type is known or None if not.

Return type:

RdfFormat or None

>>> RdfFormat.from_media_type("text/turtle; charset=utf-8")
<RdfFormat Turtle>

iri

Returns:

the format's canonical IRI according to the Unique URIs for File Formats registry.

Return type:

str

>>> RdfFormat.N_TRIPLES.iri
'http://www.w3.org/ns/formats/N-Triples'

media_type

Returns:

the format's IANA media type.

Return type:

str

>>> RdfFormat.N_TRIPLES.media_type
'application/n-triples'

name

Returns:

the format's name.

Return type:

str

>>> RdfFormat.N_TRIPLES.name
'N-Triples'

supports_datasets

Returns:

whether the format supports RDF datasets and not only RDF graphs.

Return type:

bool

>>> RdfFormat.N_TRIPLES.supports_datasets
False
>>> RdfFormat.N_QUADS.supports_datasets
True

supports_rdf_star

Returns:

whether the format supports RDF-star quoted triples.

Return type:

bool

>>> RdfFormat.N_TRIPLES.supports_rdf_star
True
>>> RdfFormat.RDF_XML.supports_rdf_star
False