RDF Parsing and Serialization
Oxigraph provides functions to parse and serialize RDF files:
Parsing
- pyoxigraph.parse(input=None, format=None, *, path=None, base_iri=None, without_named_graphs=False, rename_blank_nodes=False)
Parses RDF graph and dataset serialization formats.
It currently supports the following formats:
N-Triples (RdfFormat.N_TRIPLES)
N-Quads (RdfFormat.N_QUADS)
Turtle (RdfFormat.TURTLE)
TriG (RdfFormat.TRIG)
N3 (RdfFormat.N3)
RDF/XML (RdfFormat.RDF_XML)
It also supports some media type and extension aliases. For example, application/turtle could also be used for Turtle and application/xml or xml for RDF/XML.
- Parameters:
input (bytes or str or IO[bytes] or IO[str] or None, optional) – The str, bytes or I/O object to read from. For example, it could be the file content as a string or a file reader opened in binary mode with open('my_file.ttl', 'rb').
format (RdfFormat or None, optional) – The format of the RDF serialization. If None, the format is guessed from the file name extension.
path (str or os.PathLike[str] or None, optional) – The file path to read from. Replaces the input parameter.
base_iri (str or None, optional) – The base IRI used to resolve the relative IRIs in the file, or None if relative IRI resolution should not be done.
without_named_graphs (bool, optional) – If True, the parser fails when it encounters a named graph.
rename_blank_nodes (bool, optional) – If True, renames the blank node identifiers from the ones set in the serialization to random ids. This avoids identifier conflicts when merging graphs together.
- Returns:
an iterator of RDF triples or quads depending on the format.
- Return type:
collections.abc.Iterator[Quad]
- Raises:
ValueError – if the format is not supported.
SyntaxError – if the provided data is invalid.
OSError – if a system error happens while reading the file.
>>> list(parse(input=b'<foo> <p> "1" .', format=RdfFormat.TURTLE, base_iri="http://example.com/"))
[<Quad subject=<NamedNode value=http://example.com/foo> predicate=<NamedNode value=http://example.com/p> object=<Literal value=1 datatype=<NamedNode value=http://www.w3.org/2001/XMLSchema#string>> graph_name=<DefaultGraph>>]
Serialization
- pyoxigraph.serialize(input, output=None, format=None)
Serializes an RDF graph or dataset.
It currently supports the following formats:
N-Quads (RdfFormat.N_QUADS)
Turtle (RdfFormat.TURTLE)
TriG (RdfFormat.TRIG)
N3 (RdfFormat.N3)
RDF/XML (RdfFormat.RDF_XML)
It also supports some media type and extension aliases. For example, application/turtle could also be used for Turtle and application/xml or xml for RDF/XML.
- Parameters:
input (collections.abc.Iterable[Triple] or collections.abc.Iterable[Quad]) – The RDF triples and quads to serialize.
output (IO[bytes] or str or os.PathLike[str] or None, optional) – The binary I/O object or file path to write to. For example, it could be a file path as a string or a file writer opened in binary mode with open('my_file.ttl', 'wb'). If None, a bytes buffer is returned with the serialized content.
format (RdfFormat or None, optional) – The format of the RDF serialization. If None, the format is guessed from the file name extension.
- Returns:
bytes with the serialization if the output parameter is None, None if output is set.
- Return type:
bytes or None
- Raises:
ValueError – if the format is not supported.
TypeError – if a triple is given when serializing to a quad format, or the reverse.
OSError – if a system error happens while writing the file.
>>> serialize([Triple(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1'))], format=RdfFormat.TURTLE)
b'<http://example.com> <http://example.com/p> "1" .\n'
>>> import io
>>> output = io.BytesIO()
>>> serialize([Triple(NamedNode('http://example.com'), NamedNode('http://example.com/p'), Literal('1'))], output, RdfFormat.TURTLE)
>>> output.getvalue()
b'<http://example.com> <http://example.com/p> "1" .\n'
Formats
- class pyoxigraph.RdfFormat
RDF serialization formats.
The following formats are supported:
N-Triples (RdfFormat.N_TRIPLES)
N-Quads (RdfFormat.N_QUADS)
Turtle (RdfFormat.TURTLE)
TriG (RdfFormat.TRIG)
N3 (RdfFormat.N3)
RDF/XML (RdfFormat.RDF_XML)
>>> RdfFormat.N3.media_type
'text/n3'
- file_extension
- Returns:
the IANA-registered file extension of the format.
- Return type:
str
>>> RdfFormat.N_TRIPLES.file_extension
'nt'
- static from_extension(extension)
Looks for a known format from an extension.
It supports some aliases.
- Parameters:
extension (str) – the extension.
- Returns:
RdfFormat if the extension is known or None if not.
- Return type:
RdfFormat or None
>>> RdfFormat.from_extension("nt")
<RdfFormat N-Triples>
- static from_media_type(media_type)
Looks for a known format from a media type.
It supports some media type aliases. For example, “application/xml” returns RDF/XML even though it is not its canonical media type.
- Parameters:
media_type (str) – the media type.
- Returns:
RdfFormat if the media type is known or None if not.
- Return type:
RdfFormat or None
>>> RdfFormat.from_media_type("text/turtle; charset=utf-8")
<RdfFormat Turtle>
- iri
- Returns:
the canonical IRI of the format according to the Unique URIs for file formats registry.
- Return type:
str
>>> RdfFormat.N_TRIPLES.iri
'http://www.w3.org/ns/formats/N-Triples'
- media_type
- Returns:
the IANA media type of the format.
- Return type:
str
>>> RdfFormat.N_TRIPLES.media_type
'application/n-triples'
- supports_datasets
- Returns:
whether the format supports RDF datasets and not only RDF graphs.
- Return type:
bool
>>> RdfFormat.N_TRIPLES.supports_datasets
False
>>> RdfFormat.N_QUADS.supports_datasets
True
- supports_rdf_star
- Returns:
whether the format supports RDF-star quoted triples.
- Return type:
bool
>>> RdfFormat.N_TRIPLES.supports_rdf_star
True
>>> RdfFormat.RDF_XML.supports_rdf_star
False