IBS Malaga 2019 – Workshops
The biennial conference will hold the following workshops on January 8, 2019 – prior to the first day of the symposia and concurrent talk sessions. Each workshop is listed below with its full description, background requirements, and organizers. Registration for workshops is available as an optional add-on when registering for the conference.
Full day workshops
- Level: Introductory
- Background expectations:
- It is assumed that participants are already reasonably proficient users of R, though users of Python or Matlab should have no problem following the workshop
- Need to bring: Computer
- Maximum capacity: 40
- Cost: 85 Euros pp
- Michael Krabbe Borregaard – CMEC, Natural History Museum of Denmark, University of Copenhagen
Julia is a new and groundbreaking programming language for scientific computation and data analysis, with version 1.0 expected in 2018. Though the syntax of Julia is similar to that of R or Matlab, its speed is very close to that of C, which is the gold standard for speed in computer languages. This does a lot more than just speed up analyses (which isn’t a concern for many scientists): importantly, it solves the “two-language problem”. In R, which is used today by many ecologists, any performance-critical functionality in packages has to be implemented in a faster language such as C or Fortran, making library code essentially a black box for scientific users. This restricts package development to a small group of users, and greatly weakens the openness and transparency of scientific computing.
Julia is built completely open-source in GitHub repositories, and there’s a strong culture of broad collaboration on developing the core software and new packages. Thus, Julia creates an opportunity to develop a coherent package ecosystem for ecology, in which different packages use the same types and interfaces. The package ecosystem is still much less complete than R’s, but it is growing steadily.
This workshop will serve as an introduction to this new programming language and its package ecosystem for ecological analysis, and the participants are expected to be ready to start doing their own analyses in Julia by the end of the workshop. It is assumed that participants are already reasonably proficient users of R, though users of Python or Matlab should have no problem following the workshop.
The morning session will consist of a general introduction to Julia, presenting the most popular working environments Juno and JuliaBox, as well as Julia’s syntax and basic concepts. Extra focus will be given to the type system and the idea of “multiple dispatch”, which is the key programming paradigm of Julia, as well as to how to produce high-quality plots. The rest of the morning session will be an overview and introduction to the package ecosystem.
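For readers coming from R or Python, the idea behind multiple dispatch can be sketched roughly as follows. This is a toy illustration in Python, not Julia code and not workshop material: the `dispatch` decorator, the registry, and the `Hawk` and `Mouse` types are hypothetical names invented for this example, and the lookup matches exact types only, whereas Julia also takes the type hierarchy into account.

```python
# Toy multiple dispatch: the implementation is chosen from the runtime
# types of ALL arguments, not only the first one.
_registry = {}  # maps (function name, argument types) -> implementation


def dispatch(*types):
    """Register an implementation for one combination of argument types."""
    def decorator(func):
        _registry[(func.__name__, types)] = func

        def wrapper(*args):
            # Look up the implementation matching the types of every argument.
            key = (func.__name__, tuple(type(a) for a in args))
            if key not in _registry:
                raise TypeError(f"no method for {key}")
            return _registry[key](*args)

        return wrapper
    return decorator


class Hawk:
    pass


class Mouse:
    pass


@dispatch(Hawk, Mouse)
def interact(a, b):
    return "hawk eats mouse"


@dispatch(Mouse, Hawk)
def interact(a, b):
    return "mouse hides from hawk"


@dispatch(Mouse, Mouse)
def interact(a, b):
    return "mice compete"
```

Calling `interact(Hawk(), Mouse())` and `interact(Mouse(), Hawk())` runs two different methods of the same function, which is the behaviour that Julia provides natively and efficiently.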
The afternoon session consists of working through example analyses taken from the literature. Participants will have the opportunity to submit data and R scripts for analyses they’ve already published, and then demonstrate how to do those same analyses in Julia.
Participants are expected to bring their own computers, and will be instructed in how to install the software.
- Level/Background needed:
- Basic users of R, RStudio and QGIS
- Biogeographers or those interested in biogeography
- Equipment needed:
- 64-bit laptop with R, RStudio and QGIS installed
- Maximum capacity: 30 people
- Cost: 85 Euros pp
- Jesús Olivero (Universidad de Málaga, Spain)
- A. Marcia Barbosa (Universidade de Évora, Portugal)
- Raimundo Real (Universidad de Málaga, Spain)
The geographical distribution of a species is usually studied on the basis of categorical presence data. However, a species can be “present” in a given location to many different degrees: few or many individuals; sporadically or permanently; all year round, seasonally or in transit; every year or only in some years. This imprecision is intrinsic to the nature of the species distribution, and treating species presence as categorical data (“yes” or “no”) entails a significant loss of information and disregards the natural nuances that are actually observed. Fuzzy logic and fuzzy sets were specifically devised to deal with this kind of gradual data and with their logical consequences, and are thus particularly well suited to biogeography.
From a fuzzy set perspective, a geographical area can be seen as having a certain degree of membership in the set of areas with favourable conditions for the occurrence of a species. Given that species are not distributed independently of each other, a species distribution may have a certain degree of membership in a distribution type shared by a set of species, i.e. a chorotype. This allows the implementation of fuzzy set operations, such as fuzzy union, intersection, inclusion, difference and entropy, for the detection and analysis of biogeographical patterns. Fuzzy logic can thus be applied to the modelling of the distribution of individual species; to the comparison between and combination of distribution models for different species; to the analysis of interspecific biogeographic relationships; and to the search for environmental and historical causes of species distribution.
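The fuzzy set operations named above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the fuzzySim or RMacoqui implementation: union and intersection use the standard maximum and minimum operators, inclusion is taken as the cardinality of the intersection divided by the cardinality of the included set, and entropy follows one common definition (the overlap of a set with its own complement divided by their union). All function names and the membership values are hypothetical.

```python
# Each list holds the membership degrees (0 to 1) of the same four
# grid cells in the fuzzy set of areas favourable to a species.

def fuzzy_union(a, b):
    return [max(x, y) for x, y in zip(a, b)]


def fuzzy_intersection(a, b):
    return [min(x, y) for x, y in zip(a, b)]


def fuzzy_complement(a):
    return [1.0 - x for x in a]


def fuzzy_difference(a, b):
    # Cells that belong to a and not to b: a intersected with the complement of b.
    return fuzzy_intersection(a, fuzzy_complement(b))


def cardinality(a):
    # Fuzzy cardinality: the sum of membership degrees.
    return sum(a)


def fuzzy_inclusion(a, b):
    # Degree to which a is included in b (assumes a is not empty).
    return cardinality(fuzzy_intersection(a, b)) / cardinality(a)


def fuzzy_entropy(a):
    # 0 for a crisp set; grows as memberships approach 0.5.
    comp = fuzzy_complement(a)
    return cardinality(fuzzy_intersection(a, comp)) / cardinality(fuzzy_union(a, comp))


species_a = [1.0, 0.75, 0.25, 0.0]  # invented values
species_b = [0.5, 0.5, 0.5, 0.0]    # invented values

shared = fuzzy_intersection(species_a, species_b)
either = fuzzy_union(species_a, species_b)
a_in_b = fuzzy_inclusion(species_a, species_b)
```

Note that inclusion is not symmetric: with these values, `species_b` is included in `species_a` to a higher degree than the reverse.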
We will train attendees in the use of two analytic tools: fuzzySim (http://fuzzysim.r-forge.r-project.org/) and RMacoqui (http://rmacoqui.r-forge.r-project.org/), both of which are freely available through the R-Forge platform for the development of R packages. fuzzySim enables the definition of species distribution models as fuzzy sets from the prevalence-independent environmental favourability modelling approach; RMacoqui allows objective chorotype identification and contextualizes chorotypes under a fuzzy logic framework.
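To give a sense of what "prevalence-independent" means here, the sketch below converts a modelled probability of presence into a favourability value by dividing the odds of presence by the sum of those odds and the ratio of presences to absences in the data. It assumes the favourability function as it is commonly formulated in the literature; it is written in Python for illustration and does not reproduce the fuzzySim API. The function name and the example numbers are hypothetical.

```python
def favourability(p, n_presences, n_absences):
    """Favourability from a presence probability p (0 < p < 1).

    Rescales the odds p / (1 - p) by the presence/absence ratio of the
    data set, so that 0.5 means "no better or worse than expected from
    the overall prevalence of the species".
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    odds = p / (1.0 - p)
    prevalence_ratio = n_presences / n_absences
    return odds / (prevalence_ratio + odds)


# Invented example: a species recorded in 20 of 100 cells.
at_prevalence = favourability(0.2, 20, 80)  # probability equals prevalence
above = favourability(0.5, 20, 80)          # probability well above prevalence
```

A probability equal to the overall prevalence maps to a favourability of 0.5, which is what allows values for rare and common species to be compared and combined with fuzzy operations.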
The morning session will start with a theoretical introduction in which Prof. Raimundo Real will present the basis of fuzzy logic, the conceptual and practical advantages of applying it in biogeography, and the fuzzy nature of concepts such as favourability for occurrence and similarity between distribution ranges. Also in the morning, we will practice with fuzzySim, learning how to build species distribution models based on different explanatory factors and how to combine them using fuzzy-logic operations.
The afternoon session will consist of a practical exercise focused on the identification of chorotypes using RMacoqui, and on the calculation and interpretation of fuzzy concepts such as the degree of membership of species distributions in a chorotype, the union/intersection between chorotypes, the degree of inclusion of one chorotype in another, and the fuzzy entropy of each chorotype. We will learn how to map chorotypes from a fuzzy-logic perspective, and how to use this approach for exploring causal links in the form of potential environmental/historical attractors.
- Level: Geared toward practicing field biologists with little or no computational experience
- Equipment needed: Laptop computer
- Maximum capacity: 30 people
- Cost: 85 Euros pp
- Isaac Overcast (City College of New York)
- Deren Eaton (Columbia University)
Over the last 10 years, biogeographers have increasingly transitioned from investigating phylogeographic patterns in space and time using datasets composed of one or only a handful of markers to massive datasets containing thousands or tens of thousands of “anonymous” nuclear loci generated using restriction site associated DNA sequencing (RAD-Seq). These larger datasets provide more robust phylogenetic estimates, and can provide additional sources of information such as evidence of historical introgression. The process of organizing and making sense of the vast quantities of reads that come back off a sequencing instrument is non-trivial, and of great consequence. Simple parameter misspecification during the assembly process can have considerable impact on all downstream analysis, potentially influencing the interpretation of the genetic patterns in the data. Prior to the availability of unified assembly tools, these datasets were typically assembled in an ad hoc fashion using scripts developed in-house, leading to wide variability in the quality of assemblies being performed by the community. Additionally, downstream analysis typically involves writing complicated scripts to manage running multiple iterations of statistical inference software, organizing and post-processing the output, and generating publication-ready plots. This proliferation of methods and lack of community standards have two significant consequences:
- Unnecessary complexity in assembly and analysis workflows increases the potential for introducing errors in the process, which increases the potential for researchers to interpret signals in the data which are essentially meaningless;
- Because of the extreme contingency of ad hoc scripts developed in-house within labs, these workflows are rarely if ever reused or evaluated by other labs, which is a significant hindrance to reproducibility in science.
In this workshop we will introduce ipyrad, a unified and self-contained RAD-Seq assembly and analysis framework, which emphasizes simplicity, performance, and reproducibility. We will proceed through all the steps necessary to assemble a small simulated RAD-Seq dataset, including demultiplexing reads to samples, cleaning and trimming reads, clustering, basecalling, and generating multiple output types for downstream analysis. We will introduce both the command line interface, as this is typically used in high performance computing settings, and the ipython notebook API, which allows researchers to generate documented and easily reproducible workflows. Additionally, we will introduce the ipyrad ‘analysis’ API which provides a powerful, simple, and reproducible interface to several widely used methods for inferring phylogenetic relationships, population structure, and admixture. The analysis API leverages the massive parallelization provided by the ipyrad backend, manages organization of intermediate files, and provides a simple interface for generating publication-ready plots of results.
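As a rough idea of what the first assembly step involves, demultiplexing can be sketched as assigning each read to a sample by the barcode at its start. This is a toy illustration in plain Python, not the ipyrad implementation or its API: the function name, barcodes, and reads are invented, and barcodes are matched exactly, whereas real tools typically tolerate sequencing errors and handle quality scores.

```python
def demultiplex(reads, barcodes):
    """Assign each read to a sample by its leading barcode.

    reads    -- list of sequence strings
    barcodes -- dict mapping barcode sequence -> sample name
    Returns (reads per sample with the barcode trimmed, unassigned reads).
    """
    by_sample = {sample: [] for sample in barcodes.values()}
    unassigned = []
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                # Keep the read without its barcode prefix.
                by_sample[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched the start of this read.
            unassigned.append(read)
    return by_sample, unassigned


# Invented barcodes and reads, for illustration only.
barcodes = {"ACGT": "sample_1", "TTAG": "sample_2"}
reads = ["ACGTGGCCTA", "TTAGCCATGA", "ACGTTTTTCC", "GGGGACGTAA"]

by_sample, unassigned = demultiplex(reads, barcodes)
```

Even in this simplified form, the choice of parameters (here, requiring an exact barcode match) determines which reads are kept, which is why parameter settings matter for every downstream analysis.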
- Level: Introductory
- Background needed: Minimum familiarity with R recommended
- Equipment needed: Laptop computer installed with latest version of R and RStudio
- Cost: 85 Euros pp
- Dr. Babak Naimi (University of Amsterdam, the Netherlands)
- Prof. Miguel B. Araújo (Spanish Research Council (CSIC) at the National Museum of Natural Sciences in Madrid, Spain; University of Évora, Portugal)
sdm is a comprehensive modelling and simulation framework that enables fitting of individual and community-based species distribution models, while supporting markedly different modelling methods including different correlative and machine learning based approaches. It generates ensembles of models, and provides several options for evaluation of model results and projection of species potential distributions in space and time. The generic design of sdm is object-oriented, making it flexible and amenable to efficient handling of errors. A key feature of this platform is that it can easily be extended by users wanting to support additional models and/or procedures for any of the main steps in species distribution modelling. sdm employs high performance computing solutions to speed up modelling, and also provides a graphical user interface (GUI), making it friendly even for users who are not familiar with R.
The sdm package is developed by the organisers of this workshop, and was published as a cover paper in the journal Ecography (Naimi and Araújo, 2016). The workshop includes two sessions. In the first session, the package will be introduced and its capabilities will be demonstrated. In the second session, the participants will have the opportunity to practice on their own laptops and quickly get started working with the sdm package. It is recommended that the participants have the latest version of R and RStudio installed on their laptops. A minimum familiarity with R is recommended for participants.
Naimi, B. & Araújo, M.B. (2016) sdm: a reproducible and extensible R platform for species distribution modelling. Ecography. 39: 368–375. DOI: 10.1111/ecog.01881
Half day workshops
- Level: Introductory
- Background expectations:
- Attendees need to be familiar with biodiversity data and biogeography concepts;
- Intermediate computer science knowledge
- Cost: 45 Euros pp
- Hanieh Saeedi (Senckenberg Research Institute and Natural History Museum, Germany; OBIS Data Manager, Deep-Sea Node)
This workshop will train biogeographers first in how to mine and clean marine data, then how to prepare, integrate, and mobilise their data into open access databases such as the Ocean Biogeographic Information System (OBIS).
This workshop provides an overview of how to contribute data to OBIS and how to access data from it. It provides some guidelines about Darwin Core standards and data management best practices to ensure that data published via open-access databases are of high quality and follow internationally recognised standards. It also provides guidelines for data users on how to access, process, and visualize data from OBIS.
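As an illustration of the kind of data cleaning involved, the sketch below checks occurrence records for a handful of Darwin Core terms and for plausible coordinates before publication. It is a minimal example under stated assumptions: the term names are genuine Darwin Core terms, but the particular set treated as required, the function names, and the sample records are chosen for illustration only and do not represent the official OBIS requirements or tooling.

```python
# Darwin Core terms treated as required in this example (an assumption).
REQUIRED_TERMS = (
    "occurrenceID",
    "scientificName",
    "eventDate",
    "decimalLatitude",
    "decimalLongitude",
    "basisOfRecord",
)

# Valid ranges for coordinates in decimal degrees.
COORD_RANGES = {
    "decimalLatitude": (-90.0, 90.0),
    "decimalLongitude": (-180.0, 180.0),
}


def _in_range(value, low, high):
    """True if value can be read as a number within [low, high]."""
    try:
        return low <= float(value) <= high
    except (TypeError, ValueError):
        return False


def check_record(record):
    """Return a list of problems found in one occurrence record (a dict)."""
    problems = []
    for term in REQUIRED_TERMS:
        if record.get(term) in (None, ""):
            problems.append(f"missing {term}")
    for term, (low, high) in COORD_RANGES.items():
        value = record.get(term)
        if value in (None, ""):
            continue  # already reported as missing
        if not _in_range(value, low, high):
            problems.append(f"{term} out of range or not numeric: {value!r}")
    return problems


# Invented records, for illustration only.
good = {
    "occurrenceID": "example:0001",
    "scientificName": "Mytilus edulis",
    "eventDate": "2018-06-01",
    "decimalLatitude": 36.72,
    "decimalLongitude": -4.42,
    "basisOfRecord": "HumanObservation",
}
bad = dict(good, eventDate="", decimalLatitude=123.4)

good_problems = check_record(good)
bad_problems = check_record(bad)
```

Running such checks before submission catches the most common issues (empty fields, swapped or mistyped coordinates) while the data are still easy to correct at the source.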
- Level: Introductory
- Background needed: Competent in English and have read numerous scientific papers
- Maximum number of participants: 50
- Cost: Free for IBS students and developing country members; 45 Euros for all others
- Hans Peter Linder (Department of Systematic and Evolutionary Botany, University of Zurich)
- Michael Dawson (UC Merced, California)
This workshop is aimed primarily at graduate students writing their first paper, or young scientists who have submitted several papers, but are still seeking advice on how to construct and present their work to optimize their acceptance rate. More senior scientists may also benefit (and the participants would also benefit from a wider set of views), if interested in understanding more about joining review and editorial teams. We assume that participants are competent in English and have read numerous scientific papers.
We will discuss the following subjects:
- How to construct a paper. We discuss the optimal construction, and why such a construction is important. This includes communicating the excitement of your findings, and using the Introduction to develop the context and detail the aims. This leads to the Discussion. We also explore how to best write the Abstract and select keywords.
- The increasing importance of understanding what can / should be in Supplementary Information, and the importance of publishing the data.
- The design of illustrations, what you want to communicate, how to use them. Figures have become increasingly important with the free use of colour, and options abundant with powerful graphics packages.
- The publication pipeline: from submission, via review, to proofing and final online publication. We will explain how to interpret the different decisions (minor revision, major revision, resubmission, rejection, etc.). We will also explain how best to respond to reviewers’ and editors’ comments.
- Effective science writing, briefly: what type of grammatical construction works best. This relates to word limits (how to avoid verbosity), and how many citations are optimal.
- The duties and rights of authors, and publication ethics. The former has become important with a rapid increase in the size of the author teams, pressures to publish, and also relates to how to get the best out of your author team.
- How to decide in which journal to publish. This can involve balancing Impact Factors and appropriateness.
The workshop will be a mixture of short thematic presentations and discussions. We try to make it very interactive, and all participants are expected to take part in the discussions.