Social media photo content for Sierra Nevada: a dataset to support the assessment of cultural ecosystem services in protected areas

This dataset provides crowd-sourced and georeferenced information useful for the assessment of cultural ecosystem services in the Sierra Nevada Biosphere Reserve (southern Spain). Data were collected within the European project ECOPOTENTIAL focused on Earth observations of ecosystem services. The dataset comprises 778 records expressing the results of the content analysis of social media photos published in Flickr. Our dataset is illustrated in this data paper with density maps for different types of information.


Introduction
The modern human epoch is characterised by dynamic social-ecological changes, with local communities and individuals playing an important role in ecosystem integrity and health (Rands et al. 2010). In this context, nature conservation mechanisms in protected areas have been increasingly re-shaped to accommodate social aspects of ecosystems (Chan et al. 2006). The global network of Biosphere Reserves (UNESCO) is an emblematic effort in this regard, established to promote strategies that reconcile biodiversity conservation with the sustainable use of ecosystem services (Reed 2016).
Ecosystem services are generally known as the contributions that are obtained from nature (MEA 2005). They include raw material from ecosystems, recognised as provisioning services (e.g. timber and food) and the results from ecological functioning (e.g. hazard mitigation and pollination), i.e. regulating ecosystem services (MEA 2005). Ecosystems also offer non-material benefits, known as cultural ecosystem services, for example, through recreational and inspirational activities (Fish et al. 2016). Despite increasing focus on ecosystem services, assessment approaches have been particularly challenging for cultural ecosystem services (Blicharska et al. 2017).
Evaluations of cultural ecosystem services have struggled to capture their subjectivity and utilitarian value (Fish et al. 2016). Conventional assessments include, for instance, public polls, which are often expensive and show limited spatio-temporal coverage, as well as biodiversity mapping, which tends to capture merely the potential supply of cultural services (Blicharska et al. 2017). In the "information age", the use of "big data" from social media has become a promising approach to monitor nature-based experiences associated with cultural services (Hausmann et al. 2018).
A plethora of social media information has been produced and shared at unprecedented rates, revolutionising traditional methods to address human culture (i.e. culturomics; Ladle et al. 2017), including in the light of conservation problems (see Ladle et al. 2016 for a review). Closely related to culturomics is the content analysis of digital photos posted and shared in social media platforms, such as Flickr (Richards and Friess 2015). These photos contain geographic and temporal information, allowing the mapping of cultural ecosystem services, at high spatial resolutions and for specific time periods in a straightforward and fast way (Vaz et al. 2018).
Despite increasing evaluations of social media information, there is a general deficiency of publicly available databases of photo content analysis. Analysing and mapping the cultural value of ecosystems allow the identification and location of places where nature contributes most to cultural identity and heritage, human health, environmental education and opportunities for nature enjoyment (Soga and Gaston 2016). Under appropriate management strategies, those places can provide great opportunities to promote social support for nature conservation alongside the sustainable use of Biosphere Reserves (Infield 2001).
Our expectation in describing and making available this dataset is to promote the sharing of other similar datasets in order to locate, describe and quantify potential cultural services in protected areas worldwide.

Project details
The dataset was compiled within the context of the H2020 project "ECOPOTENTIAL: improving future ecosystem benefits through earth observations" (http://www.ecopotential-project.eu), which has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 641762. ECOPOTENTIAL is focused on internationally recognised Protected Areas, blending Earth Observations, data analysis and modelling of current and future ecosystem conditions and services. ECOPOTENTIAL considers cross-scale geosphere-biosphere interactions at regional to continental scales, addressing long-term and large-scale environmental and ecological challenges.

General spatial coverage
The dataset covers a 1,722 km² area corresponding to the UNESCO Biosphere Reserve Sierra Nevada. Sierra Nevada is a mountainous region located in Andalusia (Granada and Almería provinces), in southern Spain. The altitude ranges from 860 m a.s.l. to the summits, where the highest peak reaches 3,479 m a.s.l. The climate is Mediterranean, presenting cold winters and hot summers, with pronounced summer drought (July-August). The annual average temperature decreases with altitude from 12-16°C below 1,500 m to 0°C above 3,000 m a.s.l. and the annual average precipitation is about 600 mm. Annual precipitation ranges from less than 250 mm in the lowest parts of the mountain range to more than 700 mm in the summit areas, where, above 2,000 m altitude, winter precipitation is mainly in the form of snow. Topographically, it is a heterogeneous area, with strong climatic contrasts between the sunny, dry south-facing slopes and the shaded, wetter north-facing slopes.
Sierra Nevada hosts more than 80 endemic plant species (Blanca 2001) and more than 2,300 taxa of vascular flora in total, representing 33.2% of the Spanish flora (Lorite 2016), which makes it one of the most important biodiversity hotspots in the Mediterranean region (Blanca et al. 1998). Overall, Sierra Nevada comprises 27 habitat types listed in the Habitats Directive, as well as 31 fauna species (20 birds, 5 mammals, 4 invertebrates, 2 amphibians and reptiles) and 20 plant species listed in Annexes I and II of the Habitats and Birds Directives. Besides being included in a biosphere reserve, Sierra Nevada has additional legal protections: Special Protection Area and Site of Community Importance (Natura 2000 network); and National and Natural Park.
Regarding its general socioeconomic characteristics, there were 61 municipalities with 90,048 inhabitants in 2017. The average age of the population is 48.3 years (ten years greater than that of the population of large urban areas closer to the national park). The main economic activity is services, mostly related to rural tourism (45% of people employed, 75% of registered businesses). Secondary economic activities are the farming and construction sectors (25% of people employed in each). Finally, the percentage of people working in the industrial sector stands at around 5%. Registered unemployment in relation to total population is lower than in the urban areas (9.3% versus 10.1%), but the net income per inhabitant is half that of urban areas (3,597€ versus 7,158€) (SIMA 2019).

Sampling description
We focused on the screening of photos from a popular social media platform: Flickr (https://www.flickr.com). We used the Flickr application programming interface (API) (https://www.flickr.com/services/api/) to collect publicly available information published by the users. To protect the users, the obtained information was kept anonymised throughout the study. Using this API, we collected geographically referenced social media data by specifying a time window (between the start of Flickr in 2004 and 2017) and a bounding box defined by a pair of coordinates around our test area. After transforming the output JSON/XML files to .XLS format, we geoprocessed the data using a GIS in order to clip only those data points included within the boundaries of our case Biosphere Reserve and to prepare them for the content analysis of random stratified samples (see below). A total of 20,048 photos were downloaded and their information was stored as an Excel file with the following attributes: date, latitude, longitude and picture Uniform Resource Locator (URL).
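The collection step can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes the public `flickr.photos.search` REST method was the one used, that the time window was applied to the date taken (rather than the upload date), and the bounding-box coordinates and API key shown are placeholders. No network request is made; the sketch only builds the request URL and parses a JSON response into the four attributes listed above.

```python
import json
from urllib.parse import urlencode

FLICKR_REST = "https://api.flickr.com/services/rest/"


def build_search_url(api_key, bbox, min_date, max_date, page=1):
    """Build a flickr.photos.search URL for geotagged photos in a bounding box.

    bbox is (min_lon, min_lat, max_lon, max_lat), the order Flickr expects.
    """
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "bbox": ",".join(str(v) for v in bbox),
        "min_taken_date": min_date,   # assumption: filter on date taken
        "max_taken_date": max_date,
        "has_geo": 1,
        "extras": "geo,date_taken,url_m",
        "per_page": 250,
        "page": page,
        "format": "json",
        "nojsoncallback": 1,
    }
    return FLICKR_REST + "?" + urlencode(params)


def parse_photos(payload):
    """Keep only date, latitude, longitude and picture URL for each photo."""
    data = json.loads(payload)
    records = []
    for photo in data["photos"]["photo"]:
        records.append({
            "date": photo.get("datetaken"),
            "latitude": float(photo["latitude"]),
            "longitude": float(photo["longitude"]),
            "url": photo.get("url_m"),
        })
    return records


# Placeholder bounding box roughly around Sierra Nevada (illustrative values only).
EXAMPLE_BBOX = (-3.65, 36.9, -2.55, 37.25)
example_url = build_search_url(
    "YOUR_API_KEY", EXAMPLE_BBOX, "2004-01-01 00:00:00", "2017-12-31 23:59:59"
)
```

Dropping user identifiers at parsing time, as `parse_photos` does, is one simple way to keep the stored records anonymised. Clipping to the reserve boundary would still require a GIS or a point-in-polygon step, which is not shown here.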
We stratified our sampling over four strata differing in their nature protection regime (National versus Natural Park) and tourist dynamics (rural versus recreational tourism). Specifically, we randomly selected a set of 210 photos within the limits of the National Park (corresponding mostly to the area with the highest elevation of the Biosphere Reserve) and another set of 210 photos within the remaining area, coincident with the Natural Park. A third set of 210 photos was selected across ski resorts, corresponding to areas with the highest movement of visitors in autumn and winter. The remaining photos (n = 259) were selected from the rural areas of the reserve, which were expected to host more visitors during spring and summer. Of these 889 sampled photos, 778 (from 708 different Flickr users) were retained in the final dataset after the suitability screening described below.
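The stratified random selection can be sketched as below. The stratum sizes are those reported above; the stratum labels, the `stratum` field and the fixed seed are hypothetical and only serve the illustration (in practice, each photo would first need to be assigned to a stratum, for example through a GIS overlay).

```python
import random

# Sample sizes per stratum as reported in the text (210 + 210 + 210 + 259 = 889).
STRATUM_SIZES = {
    "national_park": 210,
    "natural_park": 210,
    "ski_resorts": 210,
    "rural_areas": 259,
}


def stratified_sample(photos, sizes, seed=0):
    """Draw a simple random sample of the requested size from each stratum.

    photos: iterable of dicts with a (hypothetical) 'stratum' key.
    sizes:  mapping of stratum name to number of photos to draw.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    by_stratum = {}
    for photo in photos:
        by_stratum.setdefault(photo["stratum"], []).append(photo)

    sample = []
    for stratum, n in sizes.items():
        pool = by_stratum.get(stratum, [])
        # Never ask for more photos than the stratum contains.
        sample.extend(rng.sample(pool, min(n, len(pool))))
    return sample
```

Sampling without replacement within each stratum keeps every photo at most once in the selection, and fixing the seed makes the draw repeatable if the sample needs to be audited later.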

Method step description
We checked each individual photo (n = 889) to evaluate its suitability for the content analysis: unidentifiable photos (e.g. due to poor quality) or photos capturing non-natural and indoor elements (e.g. inside parking places or private and business properties) were not considered for the content analysis. Additionally, photos which were no longer available, for instance, because they had been deleted or were protected by users' rights, were also not analysed. After applying these exclusion criteria, we conducted a "directed content analysis" (following, for example, Hsieh and Shannon 2005; Martínez-Pastur et al. 2016; Oteros-Rozas et al. 2018). For this purpose, we manually classified each remaining photo (n = 778), based on predefined categories (see Table 1). The photo content analysis was first conducted considering the main feature or topic dominating each photo, recorded in the "Main content" variable, whose categories express key cultural elements from ecosystems contributing to the use of cultural ecosystem services by people, in agreement with the new Common International Classification of Ecosystem Services (Haines-Young and Potschin 2018). These categories were associated with each photo, considering the main photographic focus on: fauna and flora or nature and landscape features, as well as on cultural, religious, rural, sports, gastronomy and recreation elements. In cases in which more than one feature or element could be recognised in the photo, we used more than one category to classify the photo. The order of the key elements which define the topic of the photo was based on the album/roll of the user. Specifically, the first topic considered was the element that was identified as the most frequently photographed by the user. The remaining topic was indicated as a secondary category.
In order to provide more detailed information about the photo's content, we further classified each photo considering: (1) The main nature and human features represented in the photo (e.g. lake, natural forest, mountain peak etc.). Again, more than one category per variable could be attributed in cases in which an individual photo showed the dominance of different nature and human features. (2) The type of prevailing sports activity (e.g. hiking, horse riding etc.), when one of the main topics of the photo was "Sports". (3) The represented faunal groups (e.g. ungulate, insect etc.), in those cases in which the main content of the photo was focused on fauna (e.g. categories "Fauna/Flora" and "Birdwatching"). Therefore, these last two variables (Sports activities and Faunal groups) depended on the classification attributed to the first variable "Main content".
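The dependency of the "Sports activities" and "Faunal groups" variables on "Main content" can be expressed as a simple consistency check. The sketch below is only an illustration: the field names (`main_content`, `sports_activity`, `faunal_group`) are hypothetical and may not match the column names of the published dataset, whereas the category labels are taken from the text.

```python
# Main-content categories for which a faunal group is expected (from the text).
FAUNA_TOPICS = {"Fauna/Flora", "Birdwatching"}
NOT_APPLICABLE = "Not applicable"


def check_dependencies(record):
    """Return a list of inconsistencies between dependent variables.

    record: dict with hypothetical keys 'main_content' (list of one or two
    categories), 'sports_activity' and 'faunal_group'.
    """
    topics = set(record["main_content"])
    problems = []
    if "Sports" not in topics and record["sports_activity"] != NOT_APPLICABLE:
        problems.append("sports_activity set but 'Sports' is not a main topic")
    if not (topics & FAUNA_TOPICS) and record["faunal_group"] != NOT_APPLICABLE:
        problems.append("faunal_group set but no fauna-related main topic")
    return problems


# A consistent record: hiking photo, no fauna in focus.
consistent = {
    "main_content": ["Sports"],
    "sports_activity": "Hiking",
    "faunal_group": NOT_APPLICABLE,
}
# An inconsistent record: a faunal group is given for a sports-only photo.
inconsistent = {
    "main_content": ["Sports"],
    "sports_activity": "Hiking",
    "faunal_group": "Ungulate",
}
```

A check of this kind is cheap to run over all records before publishing or reusing the classification.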

Quality control description
The classification of photos into the above-mentioned categories was evaluated by two independent users. Before analysing the content of the whole dataset, a test set of 100 randomly chosen records was first considered and classified. After analysing this test set, the classification procedure was refined for a second round. For both classification rounds, the consistency between the two users was analysed through general agreement and kappa statistics. The statistics indicated an increase in classification consistency from the first to the second test set. Specifically, a good consistency between users was found, with agreement levels ranging between 65% (sports activities) and 88% (faunal groups) and kappa values between 0.58 (nature and human features) and 0.60 (sports activities).

Table 1 (categories and their descriptions)

Sports activities
  Running: Running is the topic of photo
  Hiking: Hiking is the topic of photo
  Paragliding: Paragliding is the topic of photo
  Horse riding: Horse riding is the topic of photo
  Canoeing: Canoeing is the topic of photo
  Other type: Other type of sports activity is the main topic of photo
  Not applicable: The photo is not focused on any sports activity

Nature and human features
  High mountain: High mountain is the topic of photo
  Mid-mountain: Mid-mountain is the topic of photo
  Mountain peak: Mountain peak is the topic of photo
  Horizon: Horizon is the topic of photo
  Natural forest: Natural forest is the topic of photo
  Anthropic forest: Anthropic forest is the topic of photo
  Shrub: Shrub is the topic of photo
  Grassland: Grassland is the topic of photo
  Lake, pond: Lake is the topic of photo
  River: River is the topic of photo
  Sky: Sky is the topic of photo
  Urban/built environment: Urban/built environment is the topic of photo
  Non-urban/built environment, infrastructures: Non-urban/built environment, infrastructure, is the topic of photo (e.g. rural infrastructure, refuges and recreation infrastructure)
  Humans, selfies: People, including selfies, are the topic of photo
  Other type: Other type of feature is the main topic of photo
  Not applicable: These categories are not applicable

Faunal groups
  Mammal: Mammal is the topic of photo
  Ungulate: Wild ungulate is the topic of photo (e.g. Iberian ibex)
  Waterbird: Waterbird is the topic of photo
  Wader: Wader is the topic of photo
  Raptor: Raptor is the topic of photo
  Passerine: Passerine is the topic of photo
  Reptile: Reptile is the topic of photo
  Fish: Fish is the topic of photo
  Insect: Insect is the topic of photo
  Other type: Other type of fauna is the main topic of photo
  Not applicable: The photo is not focused on any type of fauna

Figure 1 shows the spatial location of the photos considered in the dataset.
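The two consistency measures used in the quality control step can be computed as sketched below. The text reports "kappa statistics" without naming the variant; this sketch assumes Cohen's kappa for two raters, and the ten labels in the example are invented for illustration, not taken from the dataset.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Share of records to which both raters assigned the same category."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same, non-empty set of records")
    matches = sum(1 for a, b in zip(labels_a, labels_b) if a == b)
    return matches / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement corrected for agreement expected by chance."""
    n = len(labels_a)
    p_observed = percent_agreement(labels_a, labels_b)
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    # Chance agreement: product of the raters' marginal proportions per category.
    p_expected = sum(
        (counts_a[c] / n) * (counts_b[c] / n) for c in set(counts_a) | set(counts_b)
    )
    if p_expected == 1.0:
        return float("nan")  # undefined when both raters use one single category
    return (p_observed - p_expected) / (1.0 - p_expected)


# Invented example labels for the "Sports activities" variable.
rater_a = ["Hiking", "Hiking", "Running", "Not applicable", "Hiking",
           "Running", "Not applicable", "Not applicable", "Hiking", "Running"]
rater_b = ["Hiking", "Running", "Running", "Not applicable", "Hiking",
           "Running", "Not applicable", "Hiking", "Hiking", "Running"]
```

Reporting both measures is useful because raw agreement can be high simply because one category (such as "Not applicable") dominates, whereas kappa discounts the agreement expected by chance.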
The eastern part of Sierra Nevada holds by far the highest volume of photos, as indicated by warmer colours. This pattern seems to match the location of ski resorts and rural villages (the "Alpujarras"), which are characterised by high tourist demand in the Biosphere Reserve. Photos are also widely represented in the westernmost regions of Sierra Nevada, where their location seems to coincide with the prevalence of walking trails. This spatial pattern is also evident for the different categories assigned to the dataset (Figure 2). An exception is observed for the faunal groups variable, for which a relatively high density of photos is also found in western regions of the Biosphere Reserve.
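Density maps of this kind can be approximated with a kernel density estimate evaluated on a regular grid. The sketch below is an assumption-laden illustration rather than the procedure used for the figures: it assumes a Gaussian kernel, a user-chosen bandwidth, and point coordinates already projected to a planar system (e.g. metres), none of which are specified in the text.

```python
import math


def kernel_density_grid(points, bbox, n_cols, n_rows, bandwidth):
    """Evaluate a 2D Gaussian kernel density estimate at grid-cell centres.

    points:    list of (x, y) tuples in planar coordinates
    bbox:      (min_x, min_y, max_x, max_y) extent of the grid
    bandwidth: kernel standard deviation, in the same units as the coordinates
    Returns a list of rows; row 0 is the row closest to min_y.
    """
    if not points:
        raise ValueError("at least one point is required")
    min_x, min_y, max_x, max_y = bbox
    cell_w = (max_x - min_x) / n_cols
    cell_h = (max_y - min_y) / n_rows
    # Normalisation so that the estimate integrates to one over the plane.
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(points))

    grid = []
    for r in range(n_rows):
        cy = min_y + (r + 0.5) * cell_h
        row = []
        for c in range(n_cols):
            cx = min_x + (c + 0.5) * cell_w
            total = 0.0
            for x, y in points:
                d2 = (x - cx) ** 2 + (y - cy) ** 2
                total += math.exp(-d2 / (2.0 * bandwidth ** 2))
            row.append(norm * total)
        grid.append(row)
    return grid
```

The bandwidth controls how strongly nearby photos are smoothed together: larger values give broader, flatter hotspots, so the choice should be reported alongside any map produced this way. Running the function separately on the photos of each category would give per-category density surfaces.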

Dataset overview
We are confident that our dataset (and derived maps) add detail on the potential location of different cultural contributions to people. Specifically, the spatial projections derived from this study can provide useful information for management decisions, for example, on prioritising land planning efforts and resources (Krishnaswamy et al. 2009). They can also be used to maximise synergies between biodiversity conservation and cultural values (Turnhout et al. 2013), identify areas of conflict (or disservices) between recreational tourism activities (such as skiing) and strictly protected zones (with protected habitats/species; Van Cuong et al. 2017), support the monitoring of the natural and cultural capital through remote observations (i.e. "Digital conservation"; Arts et al. 2015) and assist in data collection and dissemination for scientific research and evidence-based conservation (Sherren et al. 2017).
Despite the usefulness of our dataset, some considerations must be recognised when using this and other similar datasets in the cultural services arena. For instance, the spatial reference precision of social media photographs can bias the geolocation of collected data (Figueroa-Alfaro and Tang 2017). Still, this issue was likely insignificant in our study, because the photographs are illustrated through a kernel function (i.e. as a heat map). We are also aware that distinct social media platforms (such as Instagram, Panoramio) have different audiences, users and temporal/spatial characteristics, which affect the way they can be used (Van Zanten et al. 2016). In our study, we adopted the Flickr platform, due to its more nature-orientated users, ease of data analyses and broad spatial and temporal coverage. Nevertheless, we encourage the inclusion of different types and sources of social media information that can be complementary to the dataset we propose (Oteros-Rozas et al. 2018). Furthermore, social media users make decisions on which photos they share in social networks, and the photos shared online do not necessarily express the most preferred and valued elements of the landscape (Malik et al. 2016). Therefore, any effort to further understand the cultural preferences of social media users should examine the motivations underlying their choices and perceptions in relation to other (social) determinants (e.g. socio-demography, economy), for instance, through traditional stated-preference methods or even through natural language processing of picture comments in social media. However, caution is warranted when interpreting and communicating social patterns in more detail (Van Berkel et al. 2018). This was the main reason why our dataset neither compiles nor analyses data which were protected by users' privacy.

Dataset description
Object name: Georeferenced features of cultural ecosystem services in Sierra Nevada: a dataset based on social media photo content analysis.