DataLad is a tool for managing datasets. It is built on top of git and git-annex, with git-annex handling storage of file content. Through git-annex it can also integrate with external hosting providers.
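To make the "git-annex for storage" part concrete: git-annex identifies file content by a key derived from a checksum. In a typical (locked) setup, git only versions a small symlink whose target names that key, and the content itself lives in the annex. The sketch below approximates git-annex's default `SHA256E` key format (`SHA256E-s<size>--<sha256 hex><extension>`). It is a simplified illustration written for these notes, not git-annex's implementation. The helper name `sha256e_key` is hypothetical, and real git-annex applies extra rules to which extensions it keeps.

```python
import hashlib
import os
import tempfile


def sha256e_key(path: str) -> str:
    """Approximate the git-annex SHA256E key for the file at `path`.

    Hypothetical helper for illustration only: real git-annex has
    additional rules for which file extensions end up in the key.
    """
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        # Stream in 64 KiB chunks so large data files need not fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
            size += len(chunk)
    # Keep the extension so the key (and the symlink target) retains it.
    ext = os.path.splitext(path)[1]
    return f"SHA256E-s{size}--{digest.hexdigest()}{ext}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.txt")
        with open(path, "wb") as f:
            f.write(b"hello")
        # This sketch prints:
        # SHA256E-s5--2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824.txt
        print(sha256e_key(path))
```

Because the key depends only on the content (plus size and extension), identical files map to the same key. This is what lets git-annex deduplicate content and fetch it from whichever remote happens to hold it.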

It has a number of interesting extensions that we can learn from:

  • crawler: crawls web resources and archives their content into DataLad datasets

  • OSF Remote: an example of an extension for interacting with the Open Science Framework (OSF)