{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Collaborating with Git\n", "\n", "First we will talk about the _technical_ aspects of collaboration with Git, including common workflows, updating your branch when the main branch has changed, resolving conflicts, and making pull requests. Then we will move to more _conceptual_ aspects of collaboration with Git, such as managing your work in ways that lead to fewer conflicts, opening issues, and reviewing other people's code. These tools and workflows apply equally to private and public repos. Finally we will move on to a more general section about Open Source Software." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### **Team Workflows**\n", "\n", "The most important thing about team workflows is to have one! All collaborators should be aware of the expected workflow. You can even use `Rules` in your repositories to enforce the desired workflow (discussed below). \n", "\n", "The most common and straightforward team flow is the **Centralized Workflow** (also referred to as the _Trunk-Based_ workflow):\n", "\n", "1) Everyone works in one repository. This repository can be owned by a team member or by an organization.\n", "\n", "2) There will be one primary, stable version of the code that exists on the `main` branch (the `main` branch is sometimes called the _trunk_).\n", "\n", "3) In order to make changes to the code, collaborators will:\n", " - Create a new branch off of `main`\n", " - Develop and test their changes on the branch\n", " - Open a Pull Request to merge their changes into `main`\n", "\n", "4) Feature branches exist for the time it takes to complete the task, and eventually are merged back into `main`. This is in contrast to other (less popular) workflows where multiple primary branches exist and features that are not in `main` may be maintained indefinitely.\n", "\n", "5) In this workflow you do not commit directly to the `main` branch. 
This keeps the primary branch stable and usable even while new code is under development and testing. It is often possible to merge your feature branch directly into main, without a Pull Request, but this is generally frowned upon in a collaborative setting. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### **Merging Updates into Your Branch**\n", "\n", "Reminder of how to create a new branch off of main:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "vscode": { "languageId": "shellscript" } }, "outputs": [], "source": [ "git checkout main # checkout the main branch\n", "git pull # make sure you are up to date with the latest\n", "git checkout -b feature-branch-1 # create a new branch for your feature, use a descriptive name" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After you add some code, you will want to commit it and push your new branch to the remote: " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "vscode": { "languageId": "shellscript" } }, "outputs": [], "source": [ "git add \n", "git commit -m \"\"\n", "\n", "git push -u origin feature-branch-1 # push the new branch and the changes to the remote repository" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You keep working on your branch, let's say for a week or so, and in the meantime one of your collaborators completes their work and merges it into main. How can you check if there are updates to `main`?" 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "vscode": { "languageId": "shellscript" } }, "outputs": [], "source": [ "git fetch # this will download the latest commits from the remote, without changing your local branches or files\n", "git checkout main # switch back to the main branch\n", "git status # check the status of your local repository, this will tell you if you are behind the remote (i.e., if you need to do a git pull)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's say there are new updates on the main branch. How can you update your feature branch so that your work is applied to the latest version of the main branch? The `git merge` command will pull changes from another branch into yours." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "vscode": { "languageId": "shellscript" } }, "outputs": [], "source": [ "git checkout main # if you aren't on main, you should go there\n", "git pull # download the latest updates \n", "git checkout feature-branch-1 # switch back to your feature branch\n", "git merge main # merge the latest changes from main into your feature branch" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**NOTE** Merging will not overwrite your changes. If there are direct conflicts, then you will get a notification of a conflict and you will have to work to resolve it. If your changes and the incoming changes from the other branch do not conflict, then Git will seamlessly merge them together. This will keep your most recent work and incorporate anything that has changed on the main branch." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### **Resolving Conflicts**\n", "\n", "Merging can create _conflicts_ when the same line of code has been changed on two different branches. The output of `git merge` will notify you of the conflict (and the file it is in) and will not complete the merge until you reconcile the issue. 
You have 3 options:\n", "\n", "1) Accept the _incoming_ changes (in the example above this would be to take the code from the main branch)\n", "2) Accept the _local_ or _current_ changes (in the example above this would be to take the code from the feature branch)\n", "3) Accept a combination of both (in this case you will edit the file directly to tell Git what the reconciled version should look like)\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's say we want to overwrite whatever came in on main with our feature branch code. One way to do this is to use `git checkout`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "vscode": { "languageId": "shellscript" } }, "outputs": [], "source": [ "git checkout --ours # resolve conflicts by keeping your version of the file" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or maybe we didn't mean to touch that file, and we want to accept whatever is on the main branch:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "vscode": { "languageId": "shellscript" } }, "outputs": [], "source": [ "git checkout --theirs # resolve conflicts by keeping the version from the main branch" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you aren't sure and need to inspect the code, you can do so in any text editor. VSCode has a conflict resolution tool that comes with the Git extension; it allows you to see the two versions of the file side-by-side and make your decision. If you simply open the file, you will see the conflicting lines of code marked by `<<<<<<<`, `=======`, and `>>>>>>>`. For example: \n", "\n", "```\n", "<<<<<<< HEAD\n", "fig, ax = plt.subplots(figsize=(10,5))\n", "=======\n", "fig, axes = plt.subplots(figsize=(15, 8), nrows=2)\n", ">>>>>>> main\n", "```\n", "\n", "HEAD refers to the commit you currently have checked out on your local computer. So in this case the incoming change is from main and the current change is from HEAD (our feature branch). 
You will add/delete whatever lines you would like in order to reconcile the differences in the code. Then you will save the file.\n", "\n", "When the conflict resolution is complete, you will stage the resolved files and commit the changes. If you are in the middle of a merge, then you do not need to type `-m` to add another message. The auto-generated commit message will already say that you are merging main into `feature-branch-1`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "vscode": { "languageId": "shellscript" } }, "outputs": [], "source": [ "git add -u # stage the resolved files (this marks the conflicts as resolved)\n", "git commit # finish the merge by committing the changes. \n", "git push # push the resolved changes to the remote repository" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Git Exercise 1: Adding collaborators to a repository and creating feature branches\n", "\n", "1) Get into groups of 3 or 4\n", "\n", "2) _One_ person should make a new repository, change the default branch name to main, and add a README (we did this on Wednesday).\n", "\n", "3) Add collaborators (only the owner needs to do this, but all group members can figure this out together)\n", " - Go into the repository settings and then to the Collaborators tab. \n", " - Add the other members of your group to the repository using their GitHub username.\n", "\n", "4) Collaborators should receive invites to join the repo (you will get an email, but you can also see them from your GitHub account).\n", "\n", "5) Each collaborator should clone the repo and make their own feature branch.\n", "\n", "6) Add something to your feature branch (maybe a code file or a new text file). The new file shouldn't be empty.\n", "\n", "7) Push your feature branch to the remote (follow the steps above)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### **Visualizing the Workflow**\n", "\n", "The following figure describes the overall workflow that we are currently implementing. 
Although the diagram may seem a bit complicated, note the following features:\n", "\n", "1) Commits on main (purple) are _only_ made through pull requests (PRs) after a feature or bugfix was developed on a branch. \n", "\n", "2) If commits are made to main before work on a branch is complete, that branch will need to be updated with the latest version of main before making a PR.\n", "\n", "3) There is no limit to how many branches you can have at a time.\n", "\n", "4) A new branch always starts from the latest version of main.\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can imagine that another branch would continue from the latest version of main... and so on.\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "#### **Opening Pull Requests**\n", "\n", "The appropriate way to get code from your feature branch into main is through a Pull Request. This gives a clear visualization of the changes and provides an opportunity for someone else on the team to give feedback. The easiest way to open a Pull Request is on GitHub. \n", "\n", "1) On GitHub, navigate to the `Pull Requests` tab for your repository, and click \"New pull request\".\n", "\n", "\n", "\n", "2) From the \"compare\" drop down menu, select your branch.\n", "\n", "\n", "\n", "3) Click \"Create pull request\". \n", "\n", "\n", "\n", "4) Give the PR a meaningful title and describe the content. Link it to an open issue if there is one (using # will then allow you to search for issues by name). Assign a reviewer and a label if appropriate. Click \"Create pull request\" again to complete the process.\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Git Exercise 2: Opening Pull Requests and Assigning Reviewers\n", "\n", "1) Each collaborator should open a Pull Request for their feature branch \n", "\n", "2) Explore the PRs! What kind of information do they show? 
What did your teammates add?\n", "\n", "3) Let's look at the PR that is listed first. Whoever opened this PR should add a reviewer (on the right). Pick anyone on your team." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### **Reviewing Code**\n", "\n", "In general, any code that goes into main should be reviewed by at least one person. Every team should decide how many reviewers a PR needs. Typically, the more complex the changes, the more reviewers you should have. You can assign reviewers to the PR and they will get notified of the request.\n", "\n", "Reviewers should look at the changed files. As you scroll through you have the option to add comments. When you add a comment you can either select 'Add single comment' or 'Start a review'. The difference is whether you want to make a single note or if you would like to collect your comments along with an overall summary in a Review. If you have more than one thing to say, the second option is preferred. \n", "\n", "Reviews are a way to have a dialog about the changes to the code, prior to merging them into main. Once the comments are all resolved or addressed, the PR can be merged. Convention says that the person who reviewed the code pushes the merge button (i.e. you don't merge your own code).\n", "\n", "**NOTE** Squashing is almost always the preferred way to merge into main. This keeps the history of the main branch clean and easy to read. Feature branches may end up with hundreds of commits. \n", "\n", "**NOTE** Unless you plan to continue development on the feature that is being merged into main, you should delete the branch after the PR is closed." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Git Exercise 3: Reviewing Code, Merging into Main, Updating Feature Branches, Resolving Conflicts\n", "\n", "Let's go through the whole process. With your team:\n", "\n", "1) The assigned reviewer should review and merge the first PR\n", "\n", "2) Look at one of the other PRs... 
can it be merged?\n", "\n", "3) Update your feature branches and repeat the PR -> merge process until all PRs are closed\n", "\n", "4) At least two people should open new feature branches ... and this time they should make changes to the same file!\n", "\n", "5) Merge someone's branch into main, try to update the other branch, and work to resolve the conflicts.\n", " - We suggest looking at both the raw text file (see the way git denotes the conflicting lines) and trying to use the VSCode conflict resolution tool\n", " - If you have the Git extension installed and open the conflicting file in VSCode, it will automatically suggest the conflict resolution tool." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### **Facilitating Easier Merges (Fewer Conflicts)**\n", "\n", "Make lots of small commits, use an agreed-upon format specification (or a linter), and review the feature development list so the team can prioritize the work." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### **Opening Issues**\n", "\n", "The most obvious use for GitHub issues is to report a bug. This could be on your own project or for a code base that you use (like xgcm). A bug report should contain:\n", "\n", "\n", "Along with filing bug reports, you should use GitHub issues to track your work. This is the best place to consolidate information about why, how, and what you are working on. There are tons of nice GitHub features that can be used to write informative issues." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### **Forking a Repo**\n", "\n", "Forks are most useful when you want to adapt an existing project that you are not a collaborator on for your own use. \n", "\n", "Example use case: You study Equatorial dynamics, and there is a software project on GitHub where some scientists have already implemented the Matsuno Shallow Water equations in Python. The problem is their assumed domain is too small for your research question. 
Instead of rewriting the model from scratch (waste of time) or downloading their code and then uploading it as your own new repo (plagiarism), you can create a fork! \n", "\n", "Forks allow you to make a new repository based on an existing one. The forked version is still linked to the original, meaning you can pull updates from the original at a later date (handy!). This gives credit to the original authors, allows you to benefit from updated code, and prevents you from reinventing the wheel. It also allows you to save and push your changes to GitHub without disturbing the original repo (phew!).\n", "\n", "It is also possible to push new code in a forked repository up to the original repo, although we won't cover that here. This might be desired if you made changes that the original owners want to incorporate into the primary version of the project. In general, branching is preferred over forking for Git collaboration.\n", "\n", "The easiest way to fork a repository is through GitHub.\n", "\n", "#### _Optional Mini Exercise: Git Fork_\n", "\n", "Start by navigating to a repo that you like! Pick anything. It could be a Python package that you use or a project that your friend has started. Use the GitHub interface to create a fork of this repo. After following the prompts (you typically will want to fork the main branch of the original repo), you should see a copy of the repository show up on your GitHub profile. You will see a link that indicates where the repo was forked from. 
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Collaborative Git Best Practices\n", "\n", "- Commit often with descriptive commit messages (don't say things like \"bugfix\", \"changes\", \"commit\")\n", "- Update your branch on main often\n", "- Use branches to complete new features\n", "- Use Pull Requests to merge features into main\n", "- Squash merges are always preferred\n", "- Use Git issues to track your work (past, current, and future)\n", "- Add tests where possible to verify the status of your main branch\n", "- Agree upon style and formatting specifications \n", "- Use tags to denote stable points between major feature developments\n", "- Add Git Rules to your repo to enforce the desired workflow (e.g., protect branches, always squash)\n", "- Favor branching over forking for development.\n", "- If you want to make changes to a repository on which you aren't a collaborator, fork it first so that you can save your changes." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python", "version": "3.12.6" } }, "nbformat": 4, "nbformat_minor": 2 }