Analysis Repositories

Each member of the MOAD group has an analysis repository. This section is about creating and using your personal analysis repository.

We each have an analysis repository so that we have a default place to store, version control, and share work, mostly in the form of Jupyter Notebooks. In time you will work in other repositories, and create your own repositories for papers, course work, etc., but having a default place to do work helps keep things organized, and helps other people find your work when you graduate and move on from MOAD to other adventures. Most of the links you see people sharing on the weekly meeting whiteboard are to notebooks in their analysis repositories that they have pushed to GitHub.

Our conventions are:

  1. Analysis repositories are called analysis-firstname; e.g. analysis-susan

  2. Analysis repositories are public so that other researchers, both inside and outside the group, can see the code and visualizations that you are creating, and learn from them

Set Up Your Analysis Repository

The steps to set up your own analysis repository are:

  1. Create an empty public repository on GitHub and clone it to your laptop or MOAD workstation

  2. Use the MOAD analysis repository cookiecutter to generate the directory structure and initial files for your repository

  3. Commit and push the initial files to GitHub

  4. Create the conda environment to use for working in your analysis repository

Create Your Analysis Repository on GitHub

  1. In your browser, go to the SalishSeaCast GitHub organization page, and use the green New button to start creating your analysis repository.

  2. Make sure the Owner selection box on the Create a new repository page shows the SalishSeaCast organization.

  3. Type analysis-yourfirstname into the Repository name text box; for example analysis-casey.

  4. Ensure the button to make your new repository Public is set.

  5. Click the green Create repository button at the bottom of the page.

  6. Keep the browser tab open because you are going to need information from it shortly.

Clone Your Analysis Repository

Note

This section assumes that you have already followed the steps in the Secure Remote Access section to Generate ssh Keys, and to Copy Your Public ssh Key to GitHub.

  1. Create a top level directory for MOAD work. On a Waterhole workstation do:

    $ mkdir -p /ocean/$USER/MOAD
    

    Or, if you want to set things up on your laptop do:

    $ mkdir -p $HOME/MOAD
    

    The -p option tells mkdir to not show an error message if the directory already exists, and to create any parent directories that are needed.

    $HOME expands to your home directory.

    $USER expands to your user name.

  2. Go back to the browser tab in which you created your analysis repository on GitHub and find the section of the page near the top that says “Quick setup — if you’ve done this kind of thing before”. Below that there are 2 buttons that say HTTPS and SSH. Please ensure that the SSH button is enabled, and copy the repository URI string of text beside it that looks like:

    git@github.com:SalishSeaCast/analysis-casey.git
    
  3. Use that repository URI string to clone your analysis repository from GitHub. On a Waterhole workstation do:

    $ cd /ocean/$USER/MOAD
    $ git clone git@github.com:SalishSeaCast/analysis-casey.git
    

    Or, for laptop setup do:

    $ cd $HOME/MOAD
    $ git clone git@github.com:SalishSeaCast/analysis-casey.git
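
The behaviour of mkdir -p described in step 1 can be checked safely in a throwaway directory. This is only a sketch: the ocean/casey/MOAD path below is a hypothetical stand-in for /ocean/$USER/MOAD, created under a temporary directory so that nothing in your real directory tree is touched.

```shell
# Work in a throwaway directory (the path below is illustrative only).
scratch="$(mktemp -d)"
moad_dir="$scratch/ocean/casey/MOAD"

# First call: creates MOAD/ and the missing parents ocean/ and casey/.
mkdir -p "$moad_dir"
echo "first mkdir -p exit status: $?"

# Second call: the directory already exists, but with -p that is not an error.
mkdir -p "$moad_dir"
echo "second mkdir -p exit status: $?"

# Without -p, mkdir refuses to create a directory that already exists.
mkdir "$moad_dir" 2>/dev/null || echo "plain mkdir failed because MOAD/ already exists"
```

Both mkdir -p calls should report an exit status of 0, which is why it is safe to repeat step 1 even if your MOAD/ directory already exists.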
    

Populate Your Analysis Repository

Note

This section assumes that you have Installed Miniforge on your laptop.

It also assumes that you have set up your Git Configuration.

Note

You only need to do the steps in this section in one clone of your analysis repository, either on your laptop or on a Waterhole machine. Once you have done these steps to create the basic directories and files in your repository, committed them in Git, and pushed them to GitHub, you can pull the changes from GitHub into other clones of your repository.

  1. Create a conda environment with the latest version of Python and the cookiecutter tool installed in it with the command:

    $ conda create -n cookiecutter -c conda-forge python=3 cookiecutter
    

    That command will do some processing and then show you a list of packages that will be downloaded and installed, and ask you if it is okay to proceed; hit y or Enter to go ahead.

    After some more processing you should see the messages:

    Preparing transaction: done
    Verifying transaction: done
    Executing transaction: done
    #
    # To activate this environment, use
    #
    #     $ conda activate cookiecutter
    #
    # To deactivate an active environment, use
    #
    #     $ conda deactivate
    
  2. Activate the cookiecutter environment, go to your MOAD/ directory, and populate your empty analysis repository clone with the following commands. On a Waterhole workstation do:

    $ conda activate cookiecutter
    (cookiecutter)$ cd /ocean/$USER/MOAD
    (cookiecutter)$ cookiecutter -f gh:UBC-MOAD/cookiecutter-analysis-repo
    

    Or, for laptop setup do:

    $ conda activate cookiecutter
    (cookiecutter)$ cd $HOME/MOAD
    (cookiecutter)$ cookiecutter -f gh:UBC-MOAD/cookiecutter-analysis-repo
    

    Note

    When you activate a conda environment the name of the environment in parentheses is added to the front of your command-line prompt. So, in the above commands, the command-line prompt changed from $ (or perhaps (base)$) to (cookiecutter)$.

    Those commands use our analysis repository cookiecutter template repository to create directories and files in the empty analysis repository that you cloned earlier. The -f option lets the cookiecutter tool write directories and files into an already existing directory.

    cookiecutter will ask you for 2 pieces of input:

    researcher_name [Casey Lawrence]:
    Select github_org:
    1 - SalishSeaCast
    2 - UBC-MOAD
    3 - SS-Atlantis
    Choose from 1, 2, 3 [1]:
    

    Type your name in at the researcher_name prompt, and accept the default 1 for github_org so that cookiecutter sets things up to use your repository in the SalishSeaCast GitHub organization.

  3. Deactivate your cookiecutter environment with:

    (cookiecutter)$ conda deactivate
    
  4. Go into your new analysis repository, add and commit the files that cookiecutter created for you, and push them to GitHub. On a Waterhole workstation do:

    $ cd /ocean/$USER/MOAD/analysis-casey
    $ git add .gitignore LICENSE README.rst notebooks/
    $ git commit -m "Initialize repo from MOAD cookiecutter"
    $ git push
    

    Or, for laptop setup do:

    $ cd $HOME/MOAD/analysis-casey
    $ git add .gitignore LICENSE README.rst notebooks/
    $ git commit -m "Initialize repo from MOAD cookiecutter"
    $ git push
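
Before running the git add command in step 4, it can be useful to confirm that cookiecutter actually created the files you are about to commit. The function below is a minimal sketch of such a check; check_initial_files is a hypothetical helper name, not part of cookiecutter or Git, and the list of paths is simply the one used in the git add command above.

```shell
# check_initial_files DIR
# Report any of the paths from the `git add` step that are missing from DIR.
# Returns 0 if everything is present, 1 otherwise.
check_initial_files() {
    repo_dir="$1"
    missing=0
    for path in .gitignore LICENSE README.rst notebooks; do
        if [ ! -e "$repo_dir/$path" ]; then
            echo "missing: $path"
            missing=1
        fi
    done
    return "$missing"
}

# Example usage (the Waterhole path from the steps above):
# check_initial_files "/ocean/$USER/MOAD/analysis-casey" && echo "ready to commit"
```

If the function reports missing paths, the most likely cause is that the cookiecutter command was run from a directory other than MOAD/.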
    

Create Your Analysis Repository Conda Environment

Note

This section assumes that you have Installed Miniforge on whatever machine you are working on.

One of the files that cookiecutter created for you is notebooks/environment.yaml. It is an environment description file that you use to tell conda how to set up the environment that you will use to work in your analysis repository. That information includes things like the name of the environment, the version of Python to install in it, and the names of the Python packages to install in the environment.

  1. Go into the notebooks/ directory of your analysis repository, and use conda to create the environment. On a Waterhole workstation do:

    $ cd /ocean/$USER/MOAD/analysis-casey/notebooks/
    $ conda env create -f environment.yaml
    

    Or, for laptop setup do:

    $ cd $HOME/MOAD/analysis-casey/notebooks/
    $ conda env create -f environment.yaml
    

    As was the case when you created the cookiecutter environment above, that command will do some processing and then show you a list of packages that will be downloaded and installed, and ask you if it is okay to proceed; hit y or Enter to go ahead.

    After some more processing you should see messages like:

    Preparing transaction: done
    Verifying transaction: done
    Executing transaction: done
    #
    # To activate this environment, use
    #
    #     $ conda activate analysis-casey
    #
    # To deactivate an active environment, use
    #
    #     $ conda deactivate
    

Use the conda activate command to activate your analysis environment so that you can run Jupyter.
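
The name that you pass to conda activate is the one recorded in the name: field of the environment description file. As a rough sketch, assuming that the field sits at the top level of the file on a line of its own with no quotes or trailing comment, you can read it with sed; env_name is a hypothetical helper name used only for illustration.

```shell
# env_name FILE
# Print the value of the top-level `name:` key in a conda environment file.
env_name() {
    sed -n 's/^name:[[:space:]]*//p' "$1"
}

# Example usage, run from the notebooks/ directory of your analysis repository:
# conda activate "$(env_name environment.yaml)"
```

For the example repository used in this section the function would be expected to print analysis-casey, matching the conda activate line in the messages above.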

Install SalishSeaTools in Your Analysis Environment

The SalishSeaTools package is a collection of Python modules for working with SalishSeaCast NEMO model results and associated data. The functions in it have been written by various members of the MOAD group to do common tasks. Please see this notebook about visualization for one of the many examples in our docs and repositories of how modules and functions from the SalishSeaTools package are used. The documentation for the package includes descriptions of most of its functions that are automatically generated from the function docstrings in the code.

  1. Clone the SalishSeaTools repository beside your analysis repository. On a Waterhole workstation do:

    $ cd /ocean/$USER/MOAD/
    $ git clone git@github.com:SalishSeaCast/tools.git
    

    Or, for laptop setup do:

    $ cd $HOME/MOAD/
    $ git clone git@github.com:SalishSeaCast/tools.git
    
  2. Activate your analysis environment (if you haven’t already done so) and, while still in your MOAD/ directory, install the SalishSeaTools package in it:

    $ conda activate analysis-casey
    (analysis-casey)$ python3 -m pip install --editable tools/SalishSeaTools
    

The --editable option in the pip install command installs the package in a way that lets you update it when new features are pushed to GitHub by simply doing a git pull in the tools/ directory.
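
One way to confirm that the editable install worked is to ask Python where it would import the package from. The sketch below wraps that check in a small shell function; module_location is a hypothetical helper name, and salishsea_tools is assumed here to be the import name of the SalishSeaTools package (it does not appear in the text above).

```shell
# module_location NAME
# Print the file that Python would import for module NAME,
# or return a non-zero status if the module cannot be found.
module_location() {
    python3 -c '
import importlib.util
import sys

spec = importlib.util.find_spec(sys.argv[1])
if spec is None:
    sys.exit(1)
print(spec.origin)
' "$1"
}

# Example usage, with your analysis environment activated
# (salishsea_tools is the assumed import name):
# module_location salishsea_tools
```

With an editable install, the path that is printed should point into your tools/ clone rather than into the environment's site-packages directory, which is why a git pull in tools/ is enough to update the package.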

Use Your Analysis Repository on Other Machines

Once you have created your analysis repository and pushed it to GitHub you can clone it on other machines, create a conda environment for working in it, and pull changes that you push to GitHub on one machine to update your repository on another machine.
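
The push-on-one-machine, pull-on-another workflow can be tried out locally without touching GitHub. In the sketch below a bare repository in a temporary directory stands in for GitHub, and two clones stand in for two machines. The directory names (hub.git, machine-a, machine-b), the example file, and the email address are all made up for illustration.

```shell
scratch="$(mktemp -d)"

# A bare repository stands in for your analysis repository on GitHub.
git init --quiet --bare "$scratch/hub.git"

# "Machine A": clone the empty repository, commit a file, and push it.
git clone --quiet "$scratch/hub.git" "$scratch/machine-a" 2>/dev/null
cd "$scratch/machine-a"
git config user.name "Casey Lawrence"
git config user.email "casey@example.com"
mkdir notebooks
echo "first draft" > notebooks/example.txt
git add notebooks/
git commit --quiet -m "Add example file"
git push --quiet origin HEAD

# "Machine B": a second clone of the same repository.
git clone --quiet "$scratch/hub.git" "$scratch/machine-b"

# Back on machine A: change the file and push again.
echo "second draft" > notebooks/example.txt
git commit --quiet -am "Update example file"
git push --quiet origin HEAD

# On machine B: pull to pick up the change that was pushed from machine A.
cd "$scratch/machine-b"
git pull --quiet
cat notebooks/example.txt
```

The final cat should show the text committed on machine A. With a real analysis repository the only differences are that the clone URI points at GitHub, and that you create the conda environment from notebooks/environment.yaml once on each machine.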