Dataset of occurrences and ecological traits of amphibians from Upper Paraguay River Basin, central South America

There are many gaps in our biodiversity knowledge, especially in highly diverse regions such as the Neotropics. Basic information on species occurrence and traits is scattered across different literature sources, which makes the data difficult to access and ultimately delays advances in ecology, evolution, and conservation biology. We provide species occurrence and trait data for amphibian species in the Upper Paraguay River Basin, central South America. The compiled information is made available through two datasets that hold (i) 17K species occurrence records and (ii) 30 species-level traits for 113 amphibian species. The first dataset contains the species occurrence records and reports specimen ID, housing collection, locality, geographical coordinates, geographic accuracy, collection date, and collector name. The second dataset covers species-level attributes on morphometry, diet, activity, habitat, and breeding strategy. These datasets improve accessibility to spatial and trait data for amphibian species in the Pantanal ecoregion, one of the largest wetlands on Earth.
Nature Conservation 41: 71–89 (2020). doi: 10.3897/natureconservation.41.54265. http://natureconservation.pensoft.net. Data paper. Copyright Matheus Oliveira Neves et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


Introduction
The availability of species occurrence data is not uniform across the Earth, and many gaps exist, especially in megadiverse regions such as the tropics (Collen et al. 2008, Meyer 2016). Georeferenced information is imperative for many basic and applied fields of ecology (Whittaker et al. 2005), such as biogeography (Lomolino 2004, Silva et al. 2018), evolutionary biology (Holt 2003), and conservation planning (Whittaker et al. 2005, Chen et al. 2017). Although efforts to reduce knowledge gaps in species distribution have increased over the years, knowledge of species distribution is still incomplete (Lomolino et al. 2016). The accumulated occurrence data are not spatially uniform, with on-ground accessibility, economic development, and nature appeal largely affecting the inventory completeness of particular regions (Moura et al. 2018). Variation in public policy among and within countries may also reduce the efficacy of initiatives to fill sampling gaps (Beck et al. 2014, Troudet et al. 2017). One way to reduce biodiversity knowledge gaps is to improve accessibility and data sharing networks (Chavan and Penev 2011). The Global Biodiversity Information Facility (GBIF), the largest online repository of occurrence records in the world, allows access to data from many natural history collections worldwide (GBIF.org 2019). However, the raw data fed to GBIF may include misidentifications or invalid species names due to outdated taxonomy provided by scientific collections (Beck et al. 2014). For example, herbaria collections can hold up to 40% of Amazonian plant specimens with erroneous identifications (Hopkins 2007). To minimize this problem, researchers have relied on curated data papers (Chavan and Penev 2011). Data papers can also include information on species ecology to extend potential applications (Grimm et al. 2014, Gillings et al. 2019).
Ecological traits determine species' ability to persist in a variety of environments and reflect the outcome of eco-evolutionary pressures on species interactions with abiotic and biotic factors (Ingram and Shurin 2009, Swenson and Weiser 2010). Spatial and trait data have been used to improve spatial models (Dubuis et al. 2013, D'Amen et al. 2015, Guisan et al. 2019), forecast community structure and dynamics (Cadotte et al. 2015, Blonder et al. 2018), predict population trends (Lips et al. 2003, Williams et al. 2010, Coulthard et al. 2019), and understand potential impacts of climate change (Diamond et al. 2011, Foden et al. 2013). In spite of their importance, trait data are also scattered throughout the literature, which makes them difficult to use in comparative studies (Grimm et al. 2014).
Among the regions without proper biodiversity knowledge is the Upper Paraguay River Basin (UPRB), in the center of South America and home to the world's largest wetland (Alho et al. 1988, Junk and Wantzen 2004). The UPRB covers the Pantanal ecoregion, which has been classified as a UNESCO World Heritage Site since 2000 (UNESCO 2020). It encompasses transition zones between the Pantanal and other South American ecoregions, such as Cerrado, Amazonia, Chiquitano Dry Forest, and Dry and Humid Chaco (Olson et al. 2001), across the borders of Brazil, Bolivia, and Paraguay. The confluence of diverse fauna and flora from these different ecoregions is a peculiar characteristic of the UPRB (Silva et al. 2000, Piatti et al. 2019). Because of the spatially varying flooding regimes, many areas in the UPRB show low on-ground accessibility and therefore are still poorly sampled (Uetanabaro et al. 2008, Souza et al. 2017). Amphibian assemblages of some areas within the UPRB are completely unexplored, such as the Pantanal do Paiaguás and Pantanal do Nabileque, in the center and south-west of the Pantanal, respectively (Souza et al. 2017), despite isolated efforts to catalogue amphibians in the Pantanal and surroundings. Given the present knowledge, the UPRB is characterized by higher amphibian richness in the surrounding plateaus and by fewer species, but with high abundance, in the floodplain (Uetanabaro et al. 2008). Herein we make available more than 17,000 records for 113 amphibian taxa that occur in the basin. Whenever available and for each geographical record, we provide information on the housing collection, locality, geographical coordinates, geographic accuracy, collection date, and collectors. For each species, we present trait information on morphometry, diet, activity, habitat, and breeding strategy.

Data compilation
We compiled occurrence records for amphibians in the Upper Paraguay River Basin (UPRB) through specimens available in scientific collections and fieldwork. We visited five collections in Brazil: (i) Coleção Zoológica de Referência da Universidade Federal de Mato Grosso do Sul (ZUFMS-AMP), Campo Grande, Mato Grosso do Sul state; (ii) Coleção Herpetológica da Universidade de Brasília (CHUNB), Brasília, Federal District; (iii) Coleção Zoológica da Universidade Federal de Mato Grosso (ZUFMT-AMP), Cuiabá, Mato Grosso state; (iv) Coleção Herpetológica Célio F. B. Haddad, of the Universidade Estadual Paulista (CFBH), Campus of Rio Claro, São Paulo state; and (v) Coleção de Anfíbios do Museu Nacional of the Universidade Federal do Rio de Janeiro (MNRJ), Rio de Janeiro, Rio de Janeiro state. We also verified amphibian specimens deposited at the Colección Herpetológica del Museo de Historia Natural Noel Kempff Mercado (MNKA), Santa Cruz de La Sierra, Santa Cruz Department, Bolivia; and in the Colección Herpetológica del Instituto de Investigación Biológica del Paraguay (IIBT-H), Asunción, Capital District, Paraguay. Fieldwork records were based on ongoing research developed by us and members of the Mapinguari Lab (voucher specimens from such research were housed in the ZUFMS collection). Specimens from fieldwork received the additional label MAP to allow their differentiation from specimens previously available at the ZUFMS collection. In addition, we gathered information on morphometry, diet, activity, habitat, and breeding strategy for each amphibian species found in the UPRB based on the literature available.

Data description
Information on spatial and trait data for amphibian species of the Upper Paraguay River Basin (UPRB) is presented as two supplementary tables. Suppl. material 1: Table S1 includes the species occurrence data, whereas Suppl. material 2: Table S2 contains the species-level trait data. We acknowledge that the spatial occurrence data is subject to taxonomic uncertainty, that is, the difficulty of confirming the identity of some species based on preserved specimens only. Therefore, for Suppl. material 1: Table S1, we provided 14,900 occurrence records for 89 amphibian species identified at the species level, and an additional 2,189 occurrence records for 24 taxonomic 'entities' of amphibians with taxonomic uncertainty (species identifications including "aff.", "gr.", "cf.", and "sp."). Taxonomic issues for some species are discussed in the section Taxonomy deliberation.
Suppl. material 2: Table S2 is defined at the species level and is therefore not subject to the taxonomic uncertainty of preserved specimens. The trait data is presented for all 113 amphibian species that occur within the UPRB. References consulted to build Table S2 are listed in Suppl. material 2. In the following, we provide details on the fields represented in the two supplementary tables.

Suppl. material 1: Table S1 - Species occurrence records
Suppl. material 1: Table S1 contains [...] of the Mapinguari Lab (which will be housed in ZUFMS after).
Label number: Number in the collection in which the specimen is housed. This information was extracted directly from the scientific collection records.
Family/Genus/Id._Level/Epithet/Species: Taxonomic data of the specimen (Family, Genus and Species). The column named "Id._Level" indicates the taxonomic level known ("cf.", "aff.", "gr." or "sp.", see the "Taxonomy deliberation" section). If the "Id._Level" column is empty, the specimen was identified at the species level.
Locality: Name of the locality where the specimen was collected. It might refer to local designations, villages, or other locations below the municipality level. Locality information was extracted from the specimen catalogue available at each scientific collection.
Municipality/Adm_unit/Country: Information on the municipality, administrative unit (state level for Brazil, and department level for Bolivia and Paraguay), and country where the specimen was collected.
Latitude/Longitude: Geographic coordinates of the specimen, in decimal degrees. These data were made available mostly by the collectors. However, for specimens with missing data, we obtained the geographic coordinates of their respective locality via Google Earth Pro.
Geographic_Accuracy: Geographic accuracy of the record, indicated by one of the following three levels: (i) "Exact_Location", for records of the exact place where the specimen was captured; (ii) "Nearby_Location", for records where the collector did not provide the geographic coordinates but we obtained them from the locality description; and (iii) "Municipality_Centroid", for records with unknown exact or nearby location, which were georeferenced based on the municipality centroid.
Collection_Day/Collection_Month/Collection_Year: Day, month, and year of the record. Data extracted from the catalogue of specimens available at each scientific collection.
Collector: The name of the collector who made the record, as reported in the specimen catalogue. Only 37.5% of the records have this field filled.
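
As a minimal sketch of how the fields above might be used, the following Python snippet separates species-level records from records with taxonomic uncertainty and keeps only records of a chosen geographic accuracy. The column names follow the field names described above; the sample rows, their coordinates, and the helper names are hypothetical, and the actual file layout of Suppl. material 1 may differ.

```python
# Hypothetical sample rows; coordinates are invented for illustration only.
# Real records come from Suppl. material 1: Table S1.
records = [
    {"Genus": "Boana", "Id._Level": "", "Epithet": "lundii",
     "Latitude": -15.0, "Longitude": -56.0,
     "Geographic_Accuracy": "Exact_Location"},
    {"Genus": "Elachistocleis", "Id._Level": "cf.", "Epithet": "bicolor",
     "Latitude": -19.0, "Longitude": -57.0,
     "Geographic_Accuracy": "Nearby_Location"},
    {"Genus": "Odontophrynus", "Id._Level": "sp.", "Epithet": "",
     "Latitude": -20.0, "Longitude": -55.0,
     "Geographic_Accuracy": "Municipality_Centroid"},
    {"Genus": "Scinax", "Id._Level": "gr.", "Epithet": "ruber",
     "Latitude": -16.0, "Longitude": -57.5,
     "Geographic_Accuracy": "Exact_Location"},
]

# Qualifiers that mark taxonomic uncertainty in the Id._Level column.
UNCERTAIN_LEVELS = {"cf.", "aff.", "gr.", "sp."}


def is_species_level(record):
    """An empty Id._Level means the specimen was identified at the species level."""
    return record["Id._Level"].strip() == ""


def filter_by_accuracy(rows, allowed=("Exact_Location",)):
    """Keep only rows whose Geographic_Accuracy is one of the allowed levels."""
    return [row for row in rows if row["Geographic_Accuracy"] in allowed]


species_level = [r for r in records if is_species_level(r)]
uncertain = [r for r in records if r["Id._Level"].strip() in UNCERTAIN_LEVELS]
exact_only = filter_by_accuracy(records)
```

Restricting an analysis to "Exact_Location" records is only one possible choice; coarser analyses could also accept "Nearby_Location" or "Municipality_Centroid" by passing them in `allowed`.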

Suppl. material 2: Table S2 - Species traits
Suppl. material 2: Table S2 contains 30 fields distributed across seven general topics, plus references: Identification (columns with gray background color), Conservation (brown background color), Morphometry (green background color), Diet (blue background color), Habitat (red background color), Activity period (yellow background color), and Breeding strategy (orange background color), followed by References.

Identification-related fields
Family/Species/Year_of_description: Taxonomic level for family and species, followed by the year of description for each species.
Number_of_records: Number of georeferenced specimens in Suppl. material 1: Table S1 for each species.

Conservation-related fields
IUCN: Conservation status as provided by IUCN Red List Category and Criteria (2020) categorized as: Least Concern (LC), Data Deficient (DD), Near Threatened (NT), and Not Evaluated (NE).
IUCN_Pop: Current population trends as available at IUCN Red List Category and Criteria (2020): stable, unknown, decreasing, increasing, and Not Evaluated (NE).

Morphometry-related fields
Body_size: Mean Snout-Vent Length (SVL, in millimeters) for males, females, and for the species. In some cases, we found this value only for the species (not for males and females separately). If SVL was available only for males and females, we averaged both values to get the species mean.
Head_length: Mean Head Length (in millimeters) for males, females, and for the species. In some cases, we found this value only for the species (not for males and females separately). If head length was available only for males and females, we averaged both values to get the species mean.
Head_width: Mean Head Width (in millimeters) for males, females, and for the species. In some cases, we found this value only for the species (not for males and females separately). If head width was available only for males and females, we averaged both values to get the species mean.
Reference_Morphometry: References consulted for the morphometry of each species. All references are listed in Suppl. material 3. Personal communications were provided directly by us or by other colleagues consulted.
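
The averaging rule described for Body_size, Head_length, and Head_width can be sketched as follows. The function name and its arguments are hypothetical, and returning `None` when only one sex is reported is an assumption made here, since the text does not say how that case was handled.

```python
def species_mean(male=None, female=None, species=None):
    """Return the species-level mean of a morphometric trait, in millimeters.

    The reported species value is used when present. Otherwise the male and
    female means are averaged. Returns None when neither is possible
    (assumption: the single-sex case is not described in the text).
    """
    if species is not None:
        return species
    if male is not None and female is not None:
        return (male + female) / 2
    return None


# Hypothetical SVL values, for illustration only.
print(species_mean(male=30.0, female=34.0))   # 32.0
print(species_mean(species=41.5))             # 41.5
```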

Diet-related fields
The different levels of prey type were organized in multiple columns, each column indicating the taxonomic group (mostly to the level of Order) of the respective prey. See Table 1 for the complete list of prey categories. The Order Hymenoptera was recorded in three different fields: "Hymenoptera_Formicidae" gives the percentage of ants among amphibian prey; "Hymenoptera_non_Formicidae" gives the percentage of other groups of Hymenoptera, excluding Formicidae; and "Hymenoptera" gives the sum of "Hymenoptera_Formicidae" and "Hymenoptera_non_Formicidae", or is used when the author of the source provided information on Hymenoptera only. If prey identification was unavailable at the Order level, we used a higher taxonomic rank or broader category (e.g., Annelida, Insect_Pupa, Insect_Larvae) to differentiate prey. When two or more prey types belonged to the same higher taxon, but at different ranks, we did not sum those values. For example, Ixodida is an Order of Acari, but in our dataset we did not add the values of "Ixodida" to those of "Acari" when the author provided these values separately. In each column, the data can be available in two different forms according to the information originally reported: (i) Presence/Absence data (dark blue background color), coded as 0 (absence) or 1 (presence) when information on the Index of Relative Importance (IRI) was unavailable; and (ii) Percentage of IRI data (%IRI), the relative contribution of each prey category to the total IRI of each species. In the latter case, items that could not be identified (e.g., fragmented bodies and advanced stages of digestion) were reported in the column "Not_Identified_raw_IRI" and were not considered in the computation of the %IRI.
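
The %IRI convention described above, in which unidentified items are left out of the total, can be illustrated with a short sketch. It assumes raw IRI values per prey category are already available from the source study; the function name, the sample values, and the "Coleoptera" and "Araneae" keys are hypothetical, while "Hymenoptera_Formicidae" and "Not_Identified_raw_IRI" follow the field names given in the text.

```python
def percent_iri(raw_iri, unidentified_key="Not_Identified_raw_IRI"):
    """Convert raw IRI values per prey category into %IRI.

    The unidentified category is dropped before the total is computed,
    so the returned percentages of identified categories sum to 100.
    """
    identified = {k: v for k, v in raw_iri.items() if k != unidentified_key}
    total = sum(identified.values())
    if total == 0:
        # No identified prey: report zero contribution for every category.
        return {k: 0.0 for k in identified}
    return {k: 100.0 * v / total for k, v in identified.items()}


# Hypothetical raw IRI values for one species, for illustration only.
raw = {
    "Coleoptera": 300.0,
    "Hymenoptera_Formicidae": 600.0,
    "Araneae": 100.0,
    "Not_Identified_raw_IRI": 50.0,
}
shares = percent_iri(raw)
```

In this made-up example the identified categories total 1000, so ants account for 60% of the IRI, beetles for 30%, and spiders for 10%, regardless of the 50 units of unidentified material.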

Habitat-related fields
Major Habitat: Vegetation formation where the species is commonly present. We considered three major habitat types: (i) "Open" for species occurring in the Cerrado sensu stricto, grasslands, shrublands, and wetlands; (ii) "Forest" for species occurring in moist broadleaf forest, dry broadleaf forest, and riparian forest; and (iii) "Forest_Open" if the species is present in physiognomies of both major habitat types.
Habitat use: The microhabitat used by post-metamorphic individuals. We classified microhabitats in four levels: (i) "Aquatic" for species that live in the water body; (ii) "Arboreal" for species that use shrubs or trees for calling and living; (iii) "Fossorial" for species that live underground or remain buried for some period; and (iv) "Terrestrial" for species that live on the ground. Note that the same species can use one or more levels of habitat use.
Reference_Habitat: References consulted for the habitat of each species. All references are listed in Suppl. material 3. Personal communications were provided directly by us or by other colleagues consulted.

Activity-related fields
Seasonality: Seasonal activity of the species was classified as (i) "Dry" when it breeds in the winter (also the dry season in the UPRB); and (ii) "Wet" if it breeds in the summer (wet season in the UPRB). Species that can breed in both seasons were classified as "Dry_Wet".
Habit: Period of activity of the species when it feeds and reproduces. We classified species as (i) "Diurnal" or (ii) "Nocturnal". Species active at both day and night were classified as "Diurnal_Nocturnal".

Table 1. Prey categories and their taxonomic level. Columns: Phylum/Class/Prey_Category; Taxonomic level of Prey Category. Recoverable entry: Aves.

Breeding-related fields
Reproductive_mode: The reproductive mode characteristic of each species. The mode number follows the description provided by Haddad and Prado (2005) for anurans and by Wells (2007) for caecilians.
Development: The mode of development after egg hatching, classified as (i) "Indirect" development when the species has larval stages or (ii) "Direct" development of terrestrial eggs without larval stages.
Water_system: The water system in which species deposit their eggs, classified as (i) "Lotic" for species that breed in flowing waters, such as rivers, streams, and rivulets; and (ii) "Lentic" for species that breed in still water, such as ponds and swamps. Species that reproduce in both water systems were classified as "Lentic_Lotic".
Eggs_Deposition: The substrate on which the species lays its eggs, classified as (i) "Water" for eggs laid directly in flowing or still water; (ii) "Ground" for eggs laid directly on the ground, on rocks, or on leaves on the ground; (iii) "Burrows" for eggs laid within a natural cavity or in a cavity built by the male or female of the species; (iv) "Basin" for eggs laid in the water accumulated in a built basin near ponds; and (v) "Arboreal" for eggs laid on leaves above the water system.
Nest: Some species use a foam nest to lay their eggs. We classified such species as "Foam". For the remaining species, we did not fill this field.
Reference_Breeding: References consulted for the breeding strategy of each species. All references are listed in Suppl. material 3. Personal communications were provided directly by us or by other colleagues consulted.
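
Several trait fields described above join two levels with an underscore ("Forest_Open", "Dry_Wet", "Diurnal_Nocturnal", "Lentic_Lotic"). A minimal sketch of how such combined values could be split for querying is given below. The helper names are hypothetical, and the approach assumes the underscore is used only as a separator in these fields; how multiple "Habitat use" levels are encoded is not stated in the text, so that field is not covered here.

```python
def split_levels(value):
    """Split a combined level such as 'Lentic_Lotic' into its component levels.

    Empty or missing values yield an empty set.
    """
    if value is None or value.strip() == "":
        return frozenset()
    return frozenset(value.strip().split("_"))


def has_level(value, level):
    """Return True if a single level is among the levels encoded in value."""
    return level in split_levels(value)


# A species coded as "Lentic_Lotic" breeds in both water systems.
print(has_level("Lentic_Lotic", "Lotic"))   # True
print(has_level("Lentic", "Lotic"))         # False
```

Treating the combined value as a set means that a query for species breeding in lentic water returns both "Lentic" and "Lentic_Lotic" species, which a plain string comparison would miss.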

Taxonomy deliberation
There are many taxonomic issues with amphibian species. Although taxonomists have improved their ability to unveil cryptic species, cryptic diversity remains unknown in many tropical regions (Fouquet et al. 2007, Funk et al. 2012, Arteaga et al. 2016). Considering that some species in our dataset (Suppl. material 1: Table S1) are still unknown to science, we did not identify all occurrences at the species level. For some occurrence records, we used confer ("cf.") to refer to species groups that are either difficult to identify based on preserved specimens or have high cryptic diversity. In the latter scenario, species would preferably be distinguished through molecular analysis or bioacoustic parameters. For example, Elachistocleis matogrosso and E. bicolor, both occurring within the UPRB, are diagnosed by the pattern of dorsal stripes (Caramaschi 2010). After specimen fixation and housing in museums, the efficacy of this diagnostic character is greatly reduced, which practically prevents the confirmation of species identity based on preserved specimens only. This is also the case for Adenomera species (Angulo et al. 2003). Species belonging to Adenomera and Elachistocleis were named as "cf." in Suppl. material 1: Table S1.
Because the collector has more tools to identify live specimens during fieldwork (e.g., bioacoustics, color in life), we also included the original identification for all specimens classified as "cf." in the column "Id._Level" in Suppl. material 1: Table S1, with only two exceptions. First, we joined all specimens identified as Leptodactylus gracilis, L. jolyi, and L. sertanejo as Leptodactylus cf. jolyi, due to the high morphological similarity observed among preserved specimens of these species, besides their overlapping distributions (Neves et al. 2017). Second, we combined Pithecopus azureus and P. hypochondrialis under the name P. cf. hypochondrialis. Pithecopus azureus is distributed in the southern UPRB, whereas P. hypochondrialis is an Amazonian species distributed in the northern UPRB (Frost 2020). However, the populations found in the transitional zone between the distributions of these two species could not be identified using the diagnosis proposed by Caramaschi (2006). Therefore, we decided to place all the specimens as P. cf. hypochondrialis. In Suppl. material 2: Table S2, we provided the ecological traits for each species named as "cf." in Suppl. material 1: Table S1. We also provide this information for P. azureus and P. hypochondrialis. For Odontophrynus species, we kept the identification at the genus level, even knowing that two species within this genus occur in the UPRB (O. americanus and O. lavillai; Rosset and Baldo 2014). However, they are only distinguishable based on the ploidy of the cells or superficial characteristics, such as the pattern of dorsolateral spots (Rosset et al. 2006, Weiler et al. 2013), which makes their identification impossible with preserved specimens. Another issue concerns two Oreobates species that have their type localities within the UPRB. Oreobates crepitans was described from São Vicente, Cuiabá municipality, whereas O. heterodactylus was discovered in Cáceres municipality, both in Mato Grosso state, Brazil (Miranda-Ribeiro 1937, Bokermann 1965). However, specimens of these species in collections are often unnamed and difficult to diagnose from preserved material (Padial et al. 2012). We thus placed Odontophrynus and Oreobates records as "sp." in Suppl. material 1: Table S1.
We use group ("gr.") for specimens from the Scinax ruber and Dendropsophus parviceps species groups, for which we could not obtain a reliable identification due to the high cryptic diversity among them. Other works have used the same nomenclature for both species groups (Kopp et al. 2010, São-Pedro and Feio 2010, Crivellari et al. 2014). Since the actual identity of these species cannot be determined, we removed them from Suppl. material 2: Table S2. The specimens named as Leptodactylus aff. natalensis belong to an as yet unknown species (Carvalho T., personal communication).

Preliminary analyses and directions for future research
Data papers compile well-curated species-level information on spatial and trait data of particular taxa and/or regions of interest (Chavan and Penev 2011). This data paper comprises 17,089 records for 113 amphibian species, distributed in 14 families and 32 genera. Hylidae was the richest family (32 spp.), followed by Leptodactylidae (31 spp.), but the latter had the highest number of records (41.13% of total occurrence records), followed by Hylidae (35.77%). These two families are often the most common in short-term studies undertaken within the UPRB (e.g., Uetanabaro et al. 2007, Souza et al. 2010, Pansonato et al. 2011, Sugai et al. 2014). The occurrence dataset is suitable for research on patterns of species diversity and distribution within the UPRB (e.g., Valdujo et al. 2012, Roberto and Loebmann 2016, Silva et al. 2018). The regions with the highest number of species-occurrence records and apparent richness were concentrated in protected areas (e.g., Chapada dos Guimarães National Park and Serra da Bodoquena National Park), target areas of environmental impact studies (e.g., Jauru and Manso hydroelectric plant regions), and around major urban centers (e.g., Campo Grande, Corumbá, and Cuiabá municipalities) (Suppl. material 1: Table S1; Fig. 1). Regions with striking sampling gaps or even without any species occurrence record are those showing low accessibility (e.g., Serra do Amolar, Pantanal do Paiaguás, and Pantanal do Nabileque). The Paraguayan portion of the UPRB has a low number of records and only a few regions were sampled; nonetheless, this may be biased by the low number of Paraguayan scientific collections visited (Fig. 1B).
Species traits can greatly improve our understanding of ecological patterns and conservation planning (Grimm et al. 2014). Overall, species in the UPRB show a preference for open habitats and aquatic microhabitats, with a predominance of mostly nocturnal species that breed in the wet season (Suppl. material 2: Table S2). The most common breeding strategy was indirect development in lentic water, with eggs laid directly on the water (Fig. 2). It is worth noting the high frequency of missing data for several traits in Suppl. material 2: Table S2, even for species that are common and abundant in the Pantanal and surroundings (e.g., Boana lundii, Pseudis platensis, Scinax nasicus, Leptodactylus syphax). Our ecological trait dataset helps identify which ecological aspects of which species are less known and therefore deserve further investigation.
In summary, it is necessary to encourage researchers to make their unpublished data available in order to minimize our biodiversity knowledge gaps (Chavan and Penev 2011). Amphibians are a highly threatened vertebrate group and the UPRB harbors the world's largest tropical wetland area. We hope the present data paper facilitates studies on ecology and conservation of amphibians from the Pantanal and surrounding plateaus.
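
One way to quantify the missing data mentioned above is to compute, for each trait field, the fraction of species with no value. The sketch below is illustrative only: the function name, the placeholder species, and their trait values are hypothetical, while "Seasonality" and "Habit" are field names from Suppl. material 2: Table S2.

```python
def missing_fraction(rows, fields):
    """Return, for each field, the fraction of rows with an empty or absent value."""
    n = len(rows)
    result = {}
    for field in fields:
        missing = sum(1 for row in rows if row.get(field) in (None, ""))
        result[field] = missing / n if n else 0.0
    return result


# Placeholder species with invented trait values, for illustration only.
traits = [
    {"Species": "Species A", "Seasonality": "Wet", "Habit": "Nocturnal"},
    {"Species": "Species B", "Seasonality": "", "Habit": "Nocturnal"},
    {"Species": "Species C", "Seasonality": "", "Habit": ""},
    {"Species": "Species D", "Seasonality": "Dry_Wet", "Habit": "Diurnal"},
]
gaps = missing_fraction(traits, ["Seasonality", "Habit"])
```

Such a summary should not be applied to the "Nest" field, which is intentionally left empty for species that do not use a foam nest, so an empty value there does not indicate missing knowledge.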