This query provides some basic statistics, including average prices, of paintings by the artist Rembrandt van Rijn (i.e., Rembrandt).
Introduction
Doing statistics with SPARQL is a mixed bag. There are a few reasons you might want to do it, and a lot of reasons not to. However, if the goal is to produce some very simple statistics (counts, averages, and min/max is your best bet) then this kind of query might be for you. The pros are that it shortens the number of steps to extract this kind of information from SPARQL data: if you need live data, and don’t want to frame it inside of a more elaborate data pipeline (using Python or something else), then this might be great.
There are significant challenges is setting up such a query for Provenance data. The important part of query construction is to get the right resources to show up in the results. Once we do that, we can apply SPARQL’s native statistical functions (like ‘AVG()’). To see the process of getting the relevant information, look at each Step below. The actual statistical infrastructure is placed before and after, in the SELECT statement, as well as the ORDER BY and GROUP BY statements (which are needed in order to aggregate results).
Query
PREFIX crm: <http://www.cidoc-crm.org/cidoc-crm/>
PREFIX la: <https://linked.art/ns/terms/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?currency (COUNT(?value) AS ?count)
(AVG(?value) AS ?avg_value)
(MIN(?value) AS ?min_value)
(MAX(?value) AS ?max_value) WHERE
{
# Step 1: Get activities that are of type "purchase"
?activity crm:P2_has_type <http://vocab.getty.edu/aat/300417642> .
?activity crm:P67i_is_referred_to_by ?sales_record .
?sales_record crm:P129_is_about ?painting .
?painting rdfs:label ?painting_label .
# Step 2: Get the timeline right
?activity crm:P9_consists_of ?acquisition .
?acquisition crm:P4_has_time-span ?timespan .
?timespan crm:P82a_begin_of_the_begin ?begintime .
FILTER (?begintime > "1800-01-01T01:00:00+05:30"^^xsd:dateTime)
FILTER (?begintime < "1820-01-01T01:00:00+05:30"^^xsd:dateTime)
# Step 3: Find the paintings by Rembrandt
?painting crm:P108i_was_produced_by ?production_event .
?production_event crm:P9_consists_of ?subproduction_event .
?subproduction_event crm:P14_carried_out_by <https://data.getty.edu/provenance/dde29a68-fea4-3f3d-89b2-dbc0688a48ae> .
?subproduction_event crm:P14_carried_out_by [rdfs:label ?artist] .
# Step 4: Get the sale price of each painting
?activity crm:P9_consists_of ?payment .
?payment <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://linked.art/ns/terms/Payment> .
?payment la:paid_amount ?price .
?price crm:P180_has_currency [rdfs:label ?currency] .
?price crm:P90_has_value ?value .
?activity rdfs:label ?activity_label .
FILTER NOT EXISTS {FILTER CONTAINS (?activity_label, "Offer").}
}
GROUP BY ?currency
ORDER BY ?painting_label ?value