The Sampling Ontology (SamO)

Why an Ontology about Sampling?

Everything started with a few questions:

Can we use knowledge representation and extraction methods to describe the logic and semantics of sampling within modern music production practices?
Can we effectively express which songs — or, indeed, parts of songs — have been used in order to create a new piece of music?
Can we describe the transformation process of a song element into a sample via manipulation?

SamO, the Sampling Ontology, was created to try to answer these questions.

Extending the Music Ontology and reusing WhoSampled

The primary starting point for SamO was the WhoSampled website, the most popular online repository for sampling data today. WhoSampled used to have an open API, but it was deactivated sometime in the late 2010s and closed off so that the company could use the data for commercial purposes. Because of this, we were unable to get a clear idea of how WhoSampled organizes its data, though we assume that some sort of ontological categorization, semantic taxonomy, or detailed metadata is at work. Furthermore, the decision to commercialize the data is problematic because the data originates from public contributions, as evidenced on individual sample pages, and may also originate from earlier Web 1.0 efforts by fans to create free, public sampling directories (much in the same way that Genius, the most popular lyrics repository today, used Web 1.0 directories to kickstart its website).

As Catherine D’Ignazio and Lauren Klein explain in chapter 7 of Data Feminism, "Data work is part of a larger ecology of knowledge, one that must be both sustainable and socially just. […] the network of people who contribute to data projects is vast and complex. Showing this work is an essential component of data feminism […]." [1] While WhoSampled has kept public the contributions made by the public over the years, its move to commercialize this data is clearly problematic in that it extracts capital potential from a publicly built dataset without any say from the contributors. With this in mind, we wanted to replicate some of the public-facing semantics and logic of WhoSampled in an open ontology, so as to potentially free up this data once more.

For the development of the ontology itself, the main point of reference was the Music Ontology (MO), created in 2007 and still the primary ontology for music-related events and concepts. We also used the Audio Commons Ontology as an early point of reference. After studying the documentation and literature around these two ontologies, we decided that it would be best to extend MO by reusing some of its existing classes and predicates. Extending and reusing existing ontologies whenever possible is best practice: it supports the overall interoperability of Linked Open Data and avoids the creation of redundant elements across different, related ontologies.

While MO does refer to concepts of sampling, its 2007 paper listed the creation of a specific module for sampling among possible future improvements. Closer study made clear that MO's existing approach did not satisfy our requirements, so we decided to create the classes and predicates needed specifically to deal with sampling while keeping some MO elements as a foundation, in particular MO's own extension of the four levels of representation conceptualized in the IFLA's Functional Requirements for Bibliographic Records (FRBR).
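As a rough illustration of what extending MO can look like in practice, the sketch below declares two sampling classes as subclasses of MO classes and then walks the subclass chain. The MO namespace and the MO class names are real; the `SAMO` namespace URI and the class names `Sample` and `SampledTrack` are placeholders invented for this example and should not be read as SamO's actual terms.

```python
# Real Music Ontology namespace and the RDFS subclass predicate.
MO = "http://purl.org/ontology/mo/"
RDFS_SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
# Hypothetical namespace, for illustration only (not SamO's published URI).
SAMO = "http://example.org/samo#"

# Statements as (subject, predicate, object) tuples.
triples = {
    # Hypothetical extension classes anchored on existing MO classes.
    (SAMO + "Sample", RDFS_SUBCLASS, MO + "MusicalExpression"),
    (SAMO + "SampledTrack", RDFS_SUBCLASS, MO + "Track"),
    # Already stated by MO itself: a track is a musical manifestation.
    (MO + "Track", RDFS_SUBCLASS, MO + "MusicalManifestation"),
}


def superclasses(cls, statements):
    """Return every class reachable from cls via rdfs:subClassOf."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in statements:
            if subject == current and predicate == RDFS_SUBCLASS and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


if __name__ == "__main__":
    for parent in sorted(superclasses(SAMO + "SampledTrack", triples)):
        print(parent)
```

Because the new classes hang off existing MO classes, any tool that already understands MO's levels of representation can treat the hypothetical `SampledTrack` as a musical manifestation without additional mapping work.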

[1] D’Ignazio, C., & Klein, L. (2020). Show Your Work. In Data Feminism. Retrieved from https://data-feminism.mitpress.mit.edu/pub/0vgzaln4

Preliminary Development and Competency Questions

We decided to develop SamO using a bottom-up approach, that is, beginning with data from WhoSampled to shape the structure of the terminology component of our ontology (TBox). We started by selecting 20 relationships (10 each) from WhoSampled that featured a jazz song being used as a sample in a hip-hop or electronic music song. The focus on jazz as a source genre came from early conversations about restricting the ontology to the sampling of jazz in hip-hop. We ultimately decided not to limit the scope of the ontology, though we kept the themed dataset.

The next step was formulating a set of primary Competency Questions. These are natural-language questions designed to help us understand what SamO should be able to represent and what a user should be able to extract from it.

The final preliminary step in our ontology development was to take a closer look at both MO and the Audio Commons Ontology to understand how the concept of sampling had already been described. Audio Commons was chosen because it deals with describing audio content within online libraries, including samples, which it treats as equivalent to tracks under a wider audio clip class. Ultimately, this approach proved unsuited to our needs; however, by studying it we gained a better understanding of how to structure SamO.

Below are examples of some of the main generic competency questions we settled on. Detailed examples of the same questions, as well as others which were devised after the ontology was built, and their equivalent SPARQL results are given in the Competency Questions section.

Initial Competency Questions

How many samples are present in a song?

Which songs are sampled in another song?

Which songs of a specific genre are used as samples?

What are the different types of samples?

Which songs by a specific artist have been used as a sample?

Which song(s) use(s) samples that have been manipulated in a certain way?

What are the different types of sampling manipulation?

Which songs released within a timespan have been used as a sample?

What are the different types of elements within a song that can become a sample?
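To make the role of these questions more concrete, here is a minimal sketch of how two of them ("How many samples are present in a song?" and "Which songs are sampled in another song?") could be answered over a handful of statements. It is a plain-Python stand-in for the SPARQL queries used by the project, not a reproduction of them. Every song title, sample identifier, and predicate name (`usedIn`, `takenFrom`) is invented for illustration and comes from neither SamO nor WhoSampled.

```python
# Toy (subject, predicate, object) statements. All names are hypothetical.
TRIPLES = [
    ("sample_1", "usedIn", "New Song A"),
    ("sample_1", "takenFrom", "Jazz Song X"),
    ("sample_2", "usedIn", "New Song A"),
    ("sample_2", "takenFrom", "Jazz Song Y"),
    ("sample_3", "usedIn", "New Song B"),
    ("sample_3", "takenFrom", "Jazz Song X"),
]


def objects(subject, predicate):
    """Return all objects stated for a given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def samples_in(song):
    """Return the samples that are used in the given song."""
    return [s for s, p, o in TRIPLES if p == "usedIn" and o == song]


def count_samples(song):
    """Competency question: how many samples are present in a song?"""
    return len(samples_in(song))


def sources_of(song):
    """Competency question: which songs are sampled in another song?"""
    return sorted(
        {source for sample in samples_in(song) for source in objects(sample, "takenFrom")}
    )


if __name__ == "__main__":
    print(count_samples("New Song A"))
    print(sources_of("New Song A"))
```

Modelling the sample as its own node, rather than linking the two songs directly, is what leaves room for the questions about sample types and manipulation: further statements can be attached to `sample_1` without touching either song.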

About

The team

Laurent Fintoni

Ilaria Rossi