Basics to get started
GEOTEC, Universitat Jaume I
Apr 25, 2024
The fastest way to raise your level of performance: Cut your number of commitments in half.
an experiment in a PhD project
master project thesis
ideas for future research
regular meeting notes/minutes
teaching materials
a review paper
a conference presentation
a workshop/seminar materials
a book
PhD thesis manuscript
README.md
LICENSE
CODE_OF_CONDUCT
CONTRIBUTING
data: data, data-raw
code: scripts, analysis
results: reports, figs
documentation: notes, docs
…
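The layout above can be scaffolded with a few lines of code. This is a minimal sketch, assuming the folder and file names listed in this section; the function name `create_skeleton` is hypothetical, and the top-level files are created empty for you to fill in.

```python
from pathlib import Path

# Folder and file names are taken from the lists above.
FOLDERS = ["data", "data-raw", "scripts", "analysis", "reports", "figs", "notes", "docs"]
FILES = ["README.md", "LICENSE", "CODE_OF_CONDUCT", "CONTRIBUTING"]


def create_skeleton(root: str) -> Path:
    """Create the project folders and empty top-level files under `root`."""
    root_path = Path(root)
    for folder in FOLDERS:
        # parents=True also creates `root` itself if it does not exist yet
        (root_path / folder).mkdir(parents=True, exist_ok=True)
    for name in FILES:
        # touch() leaves an existing file's content untouched
        (root_path / name).touch(exist_ok=True)
    return root_path
```

Calling `create_skeleton("my-project")` (a placeholder name) is safe to repeat: existing folders and files are kept as they are.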
File names are machine-readable, human-readable, and play well with default ordering
Script file names begin with numbers/letters to indicate the sequence in the analysis: 01_download_data.R
Data file names begin with dates (YYYYMMDD) as prefix: 20200115_survey.csv
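These naming rules can be checked automatically. The sketch below is one possible reading of them, not a standard: it assumes a two-digit prefix for scripts, a valid YYYYMMDD prefix for data files, and lowercase names without spaces. Both patterns and both function names are illustrative.

```python
import re
from datetime import datetime

# Assumed conventions: lowercase letters, digits, underscores and hyphens only.
SCRIPT_PATTERN = re.compile(r"^\d{2}_[a-z0-9_-]+\.[A-Za-z0-9]+$")
DATA_PATTERN = re.compile(r"^(\d{8})_[a-z0-9_-]+\.[a-z0-9]+$")


def is_valid_script_name(name: str) -> bool:
    """True if the name starts with a two-digit sequence number, e.g. 01_download_data.R."""
    return SCRIPT_PATTERN.match(name) is not None


def is_valid_data_name(name: str) -> bool:
    """True if the name starts with a real calendar date written as YYYYMMDD."""
    match = DATA_PATTERN.match(name)
    if match is None:
        return False
    try:
        # Rejects impossible dates such as 20200230
        datetime.strptime(match.group(1), "%Y%m%d")
    except ValueError:
        return False
    return True
```

Because the prefixes have a fixed width, names that pass these checks also sort correctly under the default alphabetical ordering.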
Remember Newton’s letter to Flamsteed
Store raw data permanently (data-raw folder)
Use scripts to process/clean raw datasets
Store processed data in a separate folder (data or data-clean folder)
Document the process (simple steps, diagrams, content/structure of datasets, provenance) in a plain text README file (See Recommendation #6)
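As an illustration of the steps above, the following sketch reads a raw CSV file, applies two simple cleaning rules, and writes the result to a separate location without modifying the raw file. The cleaning rules (trim whitespace, drop incomplete rows) and the function name `clean_survey` are assumptions chosen for the example; a real project would apply its own rules.

```python
import csv
from pathlib import Path


def clean_survey(raw_path: Path, clean_path: Path) -> int:
    """Write a cleaned copy of `raw_path` to `clean_path`; return the rows written.

    Illustrative rules: trim whitespace and drop rows with any empty field.
    The raw file is opened read-only and never modified.
    """
    clean_path.parent.mkdir(parents=True, exist_ok=True)
    written = 0
    with raw_path.open(newline="", encoding="utf-8") as src, \
            clean_path.open("w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        if reader.fieldnames is None:
            raise ValueError(f"{raw_path} has no header row")
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if None in row:
                # Row has more fields than the header: skip it as malformed
                continue
            cleaned = {key: (value or "").strip() for key, value in row.items()}
            if all(cleaned.values()):
                writer.writerow(cleaned)
                written += 1
    return written
```

A script such as `02_clean_data` could call `clean_survey(Path("data-raw/20200115_survey.csv"), Path("data/20200115_survey.csv"))`, so the processed data can always be regenerated from the raw data.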
Use open, text-based formats whenever possible
Independent of specific software tools or vendors
Alternatively, provide the data in an open format alongside the proprietary one
Microsoft Excel (.xls) + Comma-separated values (.csv)
ESRI Shapefile (.shp) + GeoPackage (.gpkg)
Example
Dutch national centre of expertise and repository for research data (DANS) - Preferred vs non-preferred formats
A README file in the root folder describes the project and gives basic orientation on how to use your code, data, etc.
Tips
Suggestions for writing a good README and GitHub’s README
If your project is on GitHub, README files are rendered automatically when written in Markdown
Include (if required) README files in each subfolder to describe metadata/complex content
Keep track of ideas, discussions and decisions about the project (in the notes folder)
Plain text files can be easily version controlled (See Recommendations #9 and #10)
Without a license, copyright is automatically attached to your work
If you plan to make your work (data/databases/documents) public, always specify a license via a LICENSE file (LICENSE.md or LICENSE.txt)
Element | Meaning |
---|---|
BY | Creators/authors must be credited |
SA | Derivatives or redistributions must have identical license |
NC | Only non-commercial usage is allowed |
ND | No derivatives are allowed |
As user/viewer, can you | CC BY 4.0 | CC BY-NC-ND |
---|---|---|
Read, print and download it? | YES | YES |
Redistribute or republish it? | YES | YES |
Translate it? | YES | YES (private use only and not for distribution) |
Download for text and data mining? | YES | YES |
Reuse portions in other works? | YES | YES |
Sell or re-use it for commercial purpose? | YES | NO |
Creators/researchers/educators put their works into the global public domain for the benefit of society
ODC Public Domain Dedication and License (PDDL): Public Domain for data/databases (≅CC0)
ODC-By: Attribution for data/databases (≅CC-BY)
ODC Open Database License (ODbL): Attribution Share-Alike for data/databases (≅CC-BY-SA)
Final projects (TFG, TFM): CC BY-SA
Doctoral theses: CC BY-SA or CC BY-NC-SA
Teaching materials: CC BY-NC-SA
Permissive = attribution (recommended for academic work)
Copyleft = share-alike (derivative works must keep the same license as the original)
to keep track of changes to your work over time
to roll back to past edits (track history)
to properly handle text formats (code, text docs, markdown docs) as opposed to rich/binary formats (Word, gif/jpg)
Tools: Git, SVN, Mercurial
Readings: Getting started with Git, An introduction to Git: what it is, and how to use it, (Perez-Riverol et al. 2016), (Jennifer Bryan 2018)
VCS + collaborative features + development support features
Readings: