Git Submodule Tutorial

Introduction

In our Git projects, we sometimes include other Git projects as dependencies. One simple way to do this is to clone the other projects and copy them into directories under our own project. However, our project often collaborates closely with those projects, and we want to be able to keep them up to date. In that case, we use git submodule to manage them.

In this short blog post, I will talk about the basic workflow for using git submodule correctly in practice. For more comprehensive usage, please see the References section.

Tutorial

Create Submodules in Our Own Git Projects

We will use git submodule add to create submodules in our own Git projects.

$ git submodule add [-b <branch>] [-f|--force] [--name <name>] [--reference <repository>] [--depth <depth>] [--] <repository> [<path>]

For example, suppose we want to use the r1.14 branch of the TensorFlow project as a submodule in the directory third_party/tensorflow. In our Git project git-submodule-demo, we run the following commands.

$ cd git-submodule-demo/
# Empty demo project.
$ ls -a
. .. .git
# Create TensorFlow submodule.
$ git submodule add -b r1.14 https://github.com/tensorflow/tensorflow.git third_party/tensorflow

We then check what submodule information has been added.

$ ls -a
. .. .git .gitmodules third_party
# We got TensorFlow as our submodule.
$ ls -a third_party/tensorflow/
. BUILD .github RELEASE.md
.. CODE_OF_CONDUCT.md .gitignore SECURITY.md
ACKNOWLEDGMENTS CODEOWNERS ISSUES.md tensorflow
ADOPTERS.md configure ISSUE_TEMPLATE.md third_party
arm_compiler.BUILD configure.py LICENSE tools
AUTHORS CONTRIBUTING.md models.BUILD WORKSPACE
# We will have a `modules` directory in the `.git` directory for the submodules
$ ls -a .git/
. branches description hooks info objects
.. config HEAD index modules refs
# The `.git` directory for the TensorFlow submodule
$ ls -a .git/modules/third_party/tensorflow/
. branches description hooks info objects refs
.. config HEAD index logs packed-refs
# The TensorFlow submodule information was added to `.git/config`
$ cat .git/config
...[Git Information Skipped]...
[submodule "third_party/tensorflow"]
url = https://github.com/tensorflow/tensorflow.git
active = true
# The TensorFlow submodule information was added to `.gitmodules`
$ cat .gitmodules
[submodule "third_party/tensorflow"]
path = third_party/tensorflow
url = https://github.com/tensorflow/tensorflow.git
branch = r1.14
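
To try the steps above without cloning a project as large as TensorFlow, the following is a minimal sketch that uses a tiny local repository as the "third-party" project. The names lib, app and third_party/lib are hypothetical stand-ins, and the protocol.file.allow=always setting is an assumption that is only needed because the demo "remote" is a local path, which recent Git versions refuse for submodules by default.

```shell
set -e
# Throwaway sandbox; "lib", "app" and third_party/lib are hypothetical names.
sandbox=$(mktemp -d)
cd "$sandbox"
# Demo identity so commits work without any global Git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A tiny local repository standing in for the third-party project.
git init -q lib
git -C lib commit -q --allow-empty -m "lib: initial commit"
branch=$(git -C lib symbolic-ref --short HEAD)   # name of lib's default branch

# Our own, initially empty, project.
git init -q app
cd app

# Same shape as the TensorFlow command, pointed at the local stand-in.
# protocol.file.allow=always is required only for local-path "remotes".
git -c protocol.file.allow=always submodule --quiet add -b "$branch" "$sandbox/lib" third_party/lib
git commit -q -m "add submodule"

# Inspect what was recorded.
git submodule status
git config -f .gitmodules --list
```

Reading .gitmodules through git config -f, rather than cat, has the advantage that the same command can be used in scripts to fetch a single value such as the submodule URL.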

We then push the changes to GitLab or GitHub.

$ git add .
$ git commit -m "add submodule"
$ git push origin master

On GitLab or GitHub, the submodule directories are shown as links to the submodule repositories at the recorded commits.
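
What the hosting sites display comes from how Git stores a submodule: the parent project does not contain the submodule's files, only an entry pointing at one commit of the submodule. The sketch below shows this on a local sandbox. The repository names lib and app are hypothetical, and protocol.file.allow=always is again only an assumption needed for the local-path "remote".

```shell
set -e
# Throwaway sandbox; "lib", "app" and third_party/lib are hypothetical names.
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q lib
git -C lib commit -q --allow-empty -m "lib: initial commit"

git init -q app
cd app
git -c protocol.file.allow=always submodule --quiet add "$sandbox/lib" third_party/lib
git commit -q -m "add submodule"

# Mode 160000 with type "commit" marks a submodule entry in the parent's tree.
git ls-tree HEAD third_party/lib

# The commit recorded by the parent project and the commit checked out
# inside the submodule should be the same.
pinned=$(git rev-parse HEAD:third_party/lib)
checked_out=$(git -C third_party/lib rev-parse HEAD)
echo "parent project records: $pinned"
echo "submodule HEAD:         $checked_out"
```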

Remove Submodules in Our Own Git Projects

Removing Git submodules used to be very tricky, but recent versions of Git make it easier. To remove the TensorFlow submodule we just added, we run the following commands.

$ git submodule deinit -f third_party/tensorflow/
$ rm -rf .git/modules/third_party/tensorflow/
$ git rm -f third_party/tensorflow/

We then check that the submodule has been completely removed.

$ ls -a
. .. .git .gitmodules third_party
# No TensorFlow submodule anymore in `third_party`.
$ ls -a third_party/
. ..
# No TensorFlow submodule anymore in `.git/modules/third_party/`.
$ ls -a .git/modules/third_party/
. ..
# No TensorFlow submodule anymore in `.gitmodules`.
$ cat .gitmodules
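
The same removal and checks can be rehearsed on a local sandbox before running them on a real project. This is only a sketch: lib, app and third_party/lib are hypothetical names, and protocol.file.allow=always is an assumption needed for the local-path "remote" during setup.

```shell
set -e
# Throwaway sandbox; "lib", "app" and third_party/lib are hypothetical names.
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q lib
git -C lib commit -q --allow-empty -m "lib: initial commit"
git init -q app
cd app
git -c protocol.file.allow=always submodule --quiet add "$sandbox/lib" third_party/lib
git commit -q -m "add submodule"

# The three removal steps, in the same order as in the text.
git submodule --quiet deinit -f third_party/lib   # unregister, empty the working tree
rm -rf .git/modules/third_party/lib               # drop the stored submodule repository
git rm -q -f third_party/lib                      # remove the entry from index and .gitmodules
git commit -q -m "remove submodule"

# A clean removal leaves no registered submodule and an empty .gitmodules.
if [ -z "$(git submodule status)" ] && [ ! -s .gitmodules ]; then
    echo "submodule removed cleanly"
fi
```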

Once the submodule has been removed, we can add the exact same submodule again.

$ git submodule add -b r1.14 https://github.com/tensorflow/tensorflow.git third_party/tensorflow

If the submodule was not removed cleanly and residual information is left behind, adding the exact same submodule again may fail.

Work with Submodules in Other Git Projects

Sometimes we have to work on other people's Git projects that contain submodules. After a plain git clone, the submodules are not actually cloned.

$ cd git-submodule-demo/
$ ls -a
. .. .git .gitmodules third_party
# The submodule information is in `.gitmodules`
$ cat .gitmodules
[submodule "third_party/tensorflow"]
path = third_party/tensorflow
url = https://github.com/tensorflow/tensorflow.git
branch = r1.14
# The submodule directory is empty.
$ ls -a third_party/tensorflow/
. ..
# There is no `modules` directory to store the `.git` files for submodules.
$ ls -a .git
. branches description hooks info objects refs
.. config HEAD index logs packed-refs
$ cat .git/config
...[Git Information Skipped]...
[No submodule information]

To retrieve the submodules, we have to run the following command.

# Git submodule uses the submodule information in `.gitmodules` to retrieve the submodules.
$ git submodule update --init --recursive
# The TensorFlow submodule was retrieved.
$ ls -a third_party/tensorflow/
. BUILD .github RELEASE.md
.. CODE_OF_CONDUCT.md .gitignore SECURITY.md
ACKNOWLEDGMENTS CODEOWNERS ISSUES.md tensorflow
ADOPTERS.md configure ISSUE_TEMPLATE.md third_party
arm_compiler.BUILD configure.py LICENSE tools
AUTHORS CONTRIBUTING.md models.BUILD WORKSPACE
.bazelrc .git README.md
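
As an alternative to the two steps above, git clone accepts a --recurse-submodules option that clones the project and retrieves its submodules in one go. The sketch below compares both routes on a local sandbox. The repository names (lib, app, two-step, one-step) and the VERSION file are hypothetical, and protocol.file.allow=always is an assumption needed only because the demo "remotes" are local paths.

```shell
set -e
# Throwaway sandbox; all repository and file names here are hypothetical.
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the third-party project, with one tracked file.
git init -q lib
echo v1 > lib/VERSION
git -C lib add VERSION
git -C lib commit -q -m "lib: v1"

# Stand-in for somebody else's project that uses lib as a submodule.
git init -q app
git -C app -c protocol.file.allow=always submodule --quiet add "$sandbox/lib" third_party/lib
git -C app commit -q -m "add submodule"

# Route 1: plain clone, then retrieve the submodules.
git clone -q app two-step
before=$(ls -A two-step/third_party/lib)   # listing of the still-empty directory
git -C two-step -c protocol.file.allow=always submodule --quiet update --init --recursive

# Route 2: let clone retrieve the submodules as well.
git -c protocol.file.allow=always clone -q --recurse-submodules app one-step

cat two-step/third_party/lib/VERSION one-step/third_party/lib/VERSION
```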

Sometimes we want to use exactly the same submodules as another Git project in our own project. We just have to check the submodule information in that project's .gitmodules file, and use the method described in “Create Submodules in Our Own Git Projects” to create those submodules accordingly.

Update Git Submodule When Necessary

Sometimes the third-party projects behind our submodules are updated, and we would like to update the submodules in our project accordingly. To do so, we go to the submodule directory, check out the branch we want to update with git checkout if necessary, and then run git pull to bring the new changes into the submodule.
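
A minimal sketch of this manual update on a local sandbox follows. The names lib, app, third_party/lib and the VERSION file are hypothetical, protocol.file.allow=always is an assumption needed only for the local-path "remote" during setup, and --ff-only is my own choice to make the pull refuse anything but a fast-forward.

```shell
set -e
# Throwaway sandbox; all repository and file names here are hypothetical.
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q lib
echo v1 > lib/VERSION
git -C lib add VERSION
git -C lib commit -q -m "lib: v1"
branch=$(git -C lib symbolic-ref --short HEAD)

git init -q app
cd app
git -c protocol.file.allow=always submodule --quiet add -b "$branch" "$sandbox/lib" third_party/lib
git commit -q -m "add submodule"

# The third-party project moves on.
echo v2 > "$sandbox/lib/VERSION"
git -C "$sandbox/lib" commit -q -am "lib: v2"

# Manual update: enter the submodule, switch to the branch, pull.
cd third_party/lib
git checkout -q "$branch"
git pull -q --ff-only origin "$branch"
cd ../..

# The parent project now sees a changed submodule; record the new commit.
git status --short
git add third_party/lib
git commit -q -m "bump third_party/lib"
```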

Alternatively, if we would like to update all the submodules in one step, we just run the following command.

$ git submodule update --remote --merge
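
The sketch below runs this command on a local sandbox and previews what changed before committing. The names lib, app, third_party/lib and the VERSION file are hypothetical, protocol.file.allow=always is an assumption needed only because the "remote" is a local path, and the use of git diff --submodule=log as a preview step is my own addition.

```shell
set -e
# Throwaway sandbox; all repository and file names here are hypothetical.
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q lib
echo v1 > lib/VERSION
git -C lib add VERSION
git -C lib commit -q -m "lib: v1"
branch=$(git -C lib symbolic-ref --short HEAD)

git init -q app
cd app
git -c protocol.file.allow=always submodule --quiet add -b "$branch" "$sandbox/lib" third_party/lib
git commit -q -m "add submodule"

# The third-party project moves on.
echo v2 > "$sandbox/lib/VERSION"
git -C "$sandbox/lib" commit -q -am "lib: v2"

# Fetch the branch named in .gitmodules and merge it into each submodule.
git -c protocol.file.allow=always submodule --quiet update --remote --merge

# Preview which submodule commits were pulled in, then record them.
git --no-pager diff --submodule=log
git add third_party/lib
git commit -q -m "bump third_party/lib"
```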

Finally, we push the changes to GitLab or GitHub via git add, git commit, and git push.

Final Remarks

We have to be careful when we use git submodule: the upstream projects can change without notifying us, so updating a submodule may pull in changes that break our project.
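
One way to limit this risk is to try an upstream update without committing it, and to go back to the recorded commit if something breaks. The following sketch does this on a local sandbox. The names lib, app, third_party/lib and the VERSION file are hypothetical, and protocol.file.allow=always is an assumption needed only because the "remote" is a local path.

```shell
set -e
# Throwaway sandbox; all repository and file names here are hypothetical.
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q lib
echo v1 > lib/VERSION
git -C lib add VERSION
git -C lib commit -q -m "lib: v1"
branch=$(git -C lib symbolic-ref --short HEAD)

git init -q app
cd app
git -c protocol.file.allow=always submodule --quiet add -b "$branch" "$sandbox/lib" third_party/lib
git commit -q -m "add submodule"

# The third-party project moves on; our checkout is not touched by that.
echo v2 > "$sandbox/lib/VERSION"
git -C "$sandbox/lib" commit -q -am "lib: v2"
before=$(cat third_party/lib/VERSION)

# Try the upstream tip without committing anything in the parent project.
git -c protocol.file.allow=always submodule --quiet update --remote
moved=$(git submodule status)     # a leading "+" marks a commit other than the recorded one
tried=$(cat third_party/lib/VERSION)

# Go back to the commit that the parent project records.
git submodule --quiet update
restored=$(cat third_party/lib/VERSION)
echo "before: $before, tried: $tried, restored: $restored"
```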

References

Author

Lei Mao

Posted on

07-30-2019

Updated on

12-15-2019
