public/git-how.md
... ...
@@ -0,0 +1,445 @@
1
+Best resources
2
+
3
+* Overviews
4
+ * <http://aosabook.org/en/git.html>
5
+* How it really works
6
+ * <https://neoteric.eu/do-the-magic-with-git>
7
+* Implementations
8
+ * <https://maryrosecook.com/blog/post/git-from-the-inside-out>
9
+ * <https://robots.thoughtbot.com/rebuilding-git-in-ruby>
10
+* Background
11
+ * <https://en.wikipedia.org/wiki/Merkle_tree>
12
+* Navigating
13
+ * <http://correl.phoenixinquis.net/2015/07/12/git-graphs.html>
14
+
15
+We are going to make a repo for a wiki. Along the way we are going to watch what Git is doing.
16
+
17
+### Why is Git so hard?
18
+
19
+In my opinion, the reason Git is so hard is that the commands (`git add`, `git commit`, `git checkout`, etc.)
20
+have names whose semantics seem to be mapped onto the history of a variety of versioning tools, especially Subversion. The actual git commands, however, have behaviors with very precise meanings regarding specific git entities. For instance, `git add` means: Compress the files and add them to a git-managed directory, and add references to those files to the binary "index" file. Even when the official Git docs use the language of the implementation (for instance, we are told by `git help add` that `git-add` means "Add file contents to the index") it isn't very clear what
21
+"index" means in this context. A lot would be gained by banning words such as "staging area" and "cache" and just sticking with "index."
22
+
23
+### To get started
24
+
25
+Create a couple of extra terminal tabs, and in the second tab,
26
+
27
+ watch-do $HOME/src/wiki $HOME/src/wiki "tree -t -r -C -a ."
28
+
29
+This lets us observe the directory hierarchy Git creates to manage the state of our repository. [To set up `watch-do` see the end of this document.]
30
+
31
+### Create the wiki
32
+
33
+In the first tab,
34
+
35
+ cd $HOME
36
+ cd src
37
+ rm -fR wiki
38
+ mkdir wiki
39
+ cd wiki
40
+
41
+### Create a file and initialize the repository
42
+
43
+ echo "### Welcome to the wiki" >home.md
44
+ git init
45
+
46
+What are these directories and files?
47
+
48
+`home.md` (and everything else except for `.git/`) is your content in what Git calls the "working directory."
49
+
50
+The `.git/` directory holds the entire history of the repository. Within that,
51
+
52
+| Name | What |
53
+|-----------------|------------------------------------------------------------------------------------------------------------------------|
54
+| `index` | (You won't see this until something has been added.) A binary file with a sorted list of entries, each holding a path, the hash of its blob object, and its file mode. The index becomes the basis for the next "tree" object that will be committed. |
55
+| `refs/tags/` | |
56
+| `refs/heads/` | |
57
+| `objects/` | |
58
+| `objects/pack/` | |
59
+| `objects/info/` | |
60
+| `config` | A text file with configuration information for the repository. |
61
+| `HEAD` | An ordinary text file that tracks a reference to whatever branch is the current head. |
62
+| `description` | (You may not see this) If present, contains a name for the repository. Used by the GitWeb program. |
63
+| `hooks/` | (You may not see this) Contains files that define behaviors that are triggered at various times during Git operations. |
64
+| `info/` | (You may not see this) Contains an `exclude` file for ignore patterns that apply to this repository but that you don't want to record in a `.gitignore` file. |
65
+
66
+### Add a file and watch the index
67
+
68
+ git add home.md
69
+
70
+In a third tab:
71
+
72
+ watch-do $HOME/src/wiki $HOME/src/wiki/.git/index "git ls-files --stage"
73
+
74
+The concept: We are adding `home.md` to the "index." The index is a place where changes accumulate
75
+that you will eventually commit to your repo. **Think of the index, then, as "the next commit."** (Ryan Tomayko
76
+has a nice post on this, though he calls the index "the next patch" -- see <http://2ndscale.com/rtomayko/2008/the-thing-about-git>.) This is the place where you set up your changes properly before making them official.
77
+
78
+A directory has been added in `.git/objects/` called `81` and within that, a file
79
+called `d4aaeff611587c378d25a90e1d2f178b484727`. Additionally, `.git/index` has changed. The idea
80
+is that `objects/` is effectively a database of compressed files, and the file names are based
81
+on the hashes, which are based on the contents. By this means, `git` is a content-addressable
82
+database. Notice that the contents of `objects/` have no information regarding filenames; it's
83
+all about content.
84
+
85
+Since `objects/` just contains compressed objects, there needs to be a structure
86
+that manages the current state: That's what the `index` is for: A list of hashes pointing into
87
+the `objects/` store, with the names used when you added.
88
+
89
+Indeed, if you added a second identical file with a different name with
90
+`echo "### Welcome to the wiki" >home2.md` you will find that what's in `objects/` doesn't change.
91
+If you then add that file with `git add home2.md` the only change you will find is in the index,
92
+which will reference the same object, but with a different name.
93
+
94
+Again, the exact sequence:
95
+
96
+1. A directory `81/` is created. Within `81/` a file `d4aaeff611587c378d25a90e1d2f178b484727`
97
+ is created. [The path `81/d4aaeff611587c378d25a90e1d2f178b484727` is a hash of the content
98
+ of the `home.md` file, which you can get with `git hash-object home.md`
99
+ (try it). `git hash-object` is pretty simple; here's a Ruby version of how to get a git-compatible
100
+ hash for a blob: <https://gist.github.com/jgn/b3dddf091db54b2d719fa02e996dc5a3>]
101
+2. The compressed contents of the file `home.md` are put in the file `81/d4aaeff611587c378d25a90e1d2f178b484727` --
102
+ to see this, type `git show 81d4aa` (`81d4aa` is the opening part of the hash). Or you can manually
103
+ uncompress it with
104
+
105
+ ruby -rzlib -e 'print Zlib::Inflate.new.inflate(STDIN.read)' < .git/objects/81/d4aaeff611587c378d25a90e1d2f178b484727
106
+
107
+ The git way to see the content (`-p` means "pretty print"):
108
+
109
+ git cat-file -p 81d4aaeff611587c378d25a90e1d2f178b484727
110
+
111
+ And to see the type . . .
112
+
113
+ git cat-file -t 81d4aaeff611587c378d25a90e1d2f178b484727
114
+
115
+3. A binary file `.git/index` is added. An entry is added to `.git/index` with the mode of the file,
116
+ the hash, and the filename. To see this, type `git ls-files --stage` (which is what is being shown
117
+ in your third terminal tab, if you've been following along). Also, `git ls-files --cached` reads
118
+ the same index as `--stage` but lists only the filenames -- `--cached` is the default.
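The hashing and storage steps in the list above can be sketched in a few lines of Python. This is only an illustration, not Git's own code: the function names are made up for this example, and the compressed bytes it produces may differ from the file Git writes (compression settings vary), even though they inflate to the same data. The ID itself should agree with what `git hash-object home.md` reports.

```python
import hashlib
import zlib


def git_blob_id(content: bytes) -> str:
    """Return the 40-character hex ID Git assigns to a blob with this content."""
    # Git hashes a small header ("blob <size>\0") followed by the raw bytes.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Split an ID into the directory/file layout used under .git/objects/."""
    return f"{object_id[:2]}/{object_id[2:]}"


def loose_object_bytes(content: bytes) -> bytes:
    """Return zlib-compressed bytes in the same format as a stored blob object."""
    return zlib.compress(b"blob %d\x00" % len(content) + content)


if __name__ == "__main__":
    page = b"### Welcome to the wiki\n"  # what `echo` wrote to home.md
    object_id = git_blob_id(page)
    print(object_id)
    print(loose_object_path(object_id))
    # Inflating the stored bytes gives back the header plus the content.
    print(zlib.decompress(loose_object_bytes(page)))
```

Because the ID depends only on the content, a second file with the same bytes (such as `home2.md` above) maps to the same path under `objects/`.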
119
+
120
+### Make the first commit to the repo
121
+
122
+Commit everything that has been added (everything in the index)
123
+
124
+ git commit -m "Initial commit"
125
+
126
+Observations
127
+
128
+1. The index stays the same
129
+2. There are now a total of three objects; two are new. One is a tree.
130
+
131
+ $ git cat-file -t 8dfcc6
132
+ tree
133
+ $ git cat-file -p 8dfcc6
134
+ 100644 blob 81d4aaeff611587c378d25a90e1d2f178b484727 home.md
135
+
136
+ It contains a pointer to the blob, and also includes a name, so that when the tree is applied, the
137
+ blob will get a name.
138
+
139
+ All of the paths are relative to the root of the project.
140
+
141
+3. The other new object is a commit. Note that the hash will be different from what I'm using, because the hash
142
+ is based on data that is different on your system, namely your name/email and the date/time of the commit,
143
+ which are included in the commit object and therefore change the hash.
144
+
145
+ $ git cat-file -t 0e12bc
146
+ commit
147
+ $ git cat-file -p 0e12bc
148
+ tree 8dfcc6e23dec849d71d7428290cf66aab644a22f
149
+ author John Norman <john@iorahealth.com> 1475667520 -0500
150
+ committer John Norman <john@iorahealth.com> 1475667520 -0500
151
+
152
+ Initial commit
153
+
154
+ The commit points to the tree. The tree points to the blob.
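To make the commit-to-tree-to-blob chain concrete, here is a rough Python sketch of how tree and commit objects are serialized before hashing. The helper names are invented for this example, and the author identity in the demo is a placeholder, so the commit ID it prints will not match any ID in this document. The tree ID, on the other hand, depends only on the entry shown, so it should agree with what `git cat-file` reported for the tree above.

```python
import hashlib


def object_id(kind: str, body: bytes) -> str:
    """Hash an object the way Git does: '<kind> <size>' + NUL, then the body."""
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries):
    """Serialize (mode, name, hex_id) tuples into the body of a tree object."""
    def sort_key(entry):
        mode, name, _ = entry
        # Git orders entries by name, treating directory names as if they ended in "/".
        return (name + "/" if mode == "40000" else name).encode()

    body = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        # Each entry is "<mode> <name>" + NUL, followed by the 20 raw bytes of the ID.
        body += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(hex_id)
    return body


def commit_body(tree_id, author, timestamp, message, parent_id=None):
    """Serialize the text of a commit, as printed by `git cat-file -p`."""
    lines = [f"tree {tree_id}"]
    if parent_id:
        lines.append(f"parent {parent_id}")
    lines.append(f"author {author} {timestamp}")
    lines.append(f"committer {author} {timestamp}")
    return ("\n".join(lines) + f"\n\n{message}\n").encode()


if __name__ == "__main__":
    blob = "81d4aaeff611587c378d25a90e1d2f178b484727"  # home.md, from the text above
    tree = object_id("tree", tree_body([("100644", "home.md", blob)]))
    print("tree  ", tree)
    # Placeholder identity (an assumption for this demo), so the commit ID is illustrative only.
    body = commit_body(tree, "Example Author <author@example.com>",
                       "1475667520 -0500", "Initial commit")
    print("commit", object_id("commit", body))
```

Note that the tree stores the raw 20-byte ID of each entry, while the commit stores the tree's ID as hex text.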
155
+
156
+### Now add a second file
157
+
158
+In the first tab,
159
+
160
+ echo "Software we like: git, atom" > software.md
161
+ git add software.md
162
+
163
+You should see a new item `ed/91a30546d771a1c22ab2d1e846bdea6b52a624` in the `objects/` directory, and a new line added to the index.
164
+
165
+### Let's check our status
166
+
167
+In the first tab,
168
+
169
+ git status
170
+
171
+Notice that the heading before the files in the index says: "Changes to be committed" and that we are told that the command to "unstage" a file is to type `git rm --cached <file> ...`. Thus already we have a proliferation of terms regarding the files that are compressed in `.git/objects` and listed in `.git/index` -- the terms are (1) changes to be committed; (2) "staged"; (3) "cached." These all mean the same thing. Nowhere is the word "index" mentioned, even though it is in the docs if you type `git help add`. But we know what all this stuff means: `git status` is telling us what is in `.git/objects` and `.git/index`.
172
+
173
+### Let's add a directory and another file and see what happens
174
+
175
+Let's create an `images/` directory and put something in there:
176
+
177
+ mkdir images
178
+ curl -s http://i2.kym-cdn.com/photos/images/original/000/173/575/25810.jpg -o images/wat.jpg
179
+ git add images
180
+
181
+Notice that we have added the compressed image file `d2/dd63f0a1e8f1c07f4c33e5af782a60fda3546f` to `.git/objects` as well as an entry in `.git/index`. The newly added directory (`images/`)
182
+is only indicated as the directory for the `wat.jpg` file (there is no separate addition to the index for `images/` itself). Also, if we
183
+type
184
+
185
+ git rm --cached images/wat.jpg
186
+
187
+the line is taken out of the index, but the compressed file remains in `objects/`. [**NOTE:** The fact that the compressed file remains in `objects/` provides you some safety should you remove files from the index and from your working copy. But if you have not "pushed" your repo to a remote, deleting stuff from the `objects/` hierarchy can result in data loss. In short, leave `objects/` alone, except for manipulations via git commands.]
188
+
189
+Do this:
190
+
191
+ rm images/wat.jpg
192
+ git add images
193
+
194
+There are no files there now, and . . . nothing happens to the index. This is because Git cannot represent empty directories in the index (staging area) -- see <https://git.wiki.kernel.org/index.php/GitFaq#Can_I_add_empty_directories.3F>.
195
+
196
+At this point, we removed `images/wat.jpg` from the index AND from our working directory. And yet we can still see the compressed
197
+file in the `.git/objects/` hierarchy under `d2/dd63f0a1e8f1c07f4c33e5af782a60fda3546f` which we can peek at with `git cat-file -p d2dd63f0a1e8f1c07f4c33e5af782a60fda3546f | imgcat`. Thus the arbiter of what Git knows is `.git/index` not the assortment of
198
+things in the `.git/objects` directory. In other words, you may find that you've removed a file from your working directory,
199
+`git status` doesn't know about it, but the file is still safe in the `.git/objects` directory.
200
+
201
+Now, re-do the curl command to get the image,
202
+
203
+ curl -s http://i2.kym-cdn.com/photos/images/original/000/173/575/25810.jpg -o images/wat.jpg
204
+
205
+and `git add images`. What's in `.git/objects` doesn't change (the image was already saved there) but
206
+the index changes.
207
+
208
+As you can see, the fact that `wat.jpg` lives in `images/` is only known to the index. You can get a feel for this
209
+by removing `images/wat.jpg` from the index with `git rm --cached images/wat.jpg`, then adding a new directory `images2`
210
+and then curl'ing a new `wat.jpg`, and then adding `images2/`:
211
+
212
+ git rm --cached images/wat.jpg
213
+ mkdir images2
214
+ curl -s http://i2.kym-cdn.com/photos/images/original/000/173/575/25810.jpg -o images2/wat.jpg
215
+ git add images2
216
+
217
+You will see that only the index changes -- the content
218
+has already been hashed into the objects hierarchy. Reverse the experiment with . . .
219
+
220
+ git rm --cached images2/wat.jpg
221
+ rm -fR images2
222
+ git add images
223
+
224
+Finally, before we commit, let's add one more file to the working directory that will help us understand the relationship
225
+between `git status`, the index, and the last commit.
226
+
227
+ echo -e '### Image list\n\n* more images.md ![](images/wat.jpg)' >images.md
228
+
229
+### Now let's see what happens if we do a commit
230
+
231
+Commit files with
232
+
233
+ git commit -m "Add pages and images"
234
+
235
+You will notice that this immediately creates some new items in `objects/`, a new directory `logs/` with some items, and a new directory `refs/` with some items. What has happened?
236
+
237
+First, the new objects. There are three of them. Use `git cat-file -p` and `git cat-file -t` to inspect them.
238
+
239
+1. `.git/objects/f9/c298b9e62cecfc65ede840f48fa025347909fe` is the **tree** with a blob entry for `wat.jpg`. Doing `git cat-file -p f9c298b9e62cecfc65ede840f48fa025347909fe` we see
240
+
241
+ 100644 blob d2dd63f0a1e8f1c07f4c33e5af782a60fda3546f wat.jpg
242
+
243
+ Do we know the name of the tree? No. Do we know `wat.jpg` is kept in a tree? Yes.
244
+
245
+2. `.git/objects/11/dd91adca91dc2445dfcee4d3418e8ad219db7c` is another **tree** created from the index that
246
+ represents the current state of the whole project. `git cat-file -t 11dd91`
247
+ tells us that it's a "tree," and `git cat-file -p 11dd91` shows
248
+ that it looks like this:
249
+
250
+ 100644 blob 81d4aaeff611587c378d25a90e1d2f178b484727 home.md
251
+ 040000 tree f9c298b9e62cecfc65ede840f48fa025347909fe images
252
+ 100644 blob ed91a30546d771a1c22ab2d1e846bdea6b52a624 software.md
253
+
254
+3. `.git/objects/31/8ff038bef2f31f61729fdebaaa02e8415c659a` (your hash will be different; make a note of the hash you have) is a **commit** representing the new root for your repo. `git cat-file -t` tells us that it's a "commit."
255
+
256
+ The contents for mine are
257
+
258
+ tree 11dd91adca91dc2445dfcee4d3418e8ad219db7c
259
+ author John Norman <john@iorahealth.com> 1475414291 -0500
260
+ committer John Norman <john@iorahealth.com> 1475414291 -0500
261
+
262
+ Add pages and images
263
+
264
+Now we can trace this backwards. We want to do `git cat-file -t` and `git cat-file -p` on each of the following SHAs:
265
+
266
+* `318ff038bef2f31f61729fdebaaa02e8415c659a` **commit**
267
+* `11dd91adca91dc2445dfcee4d3418e8ad219db7c` **tree**: references blob home.md, tree images [`f9c298b9e62cecfc65ede840f48fa025347909fe`], and blob software.md
268
+* `f9c298b9e62cecfc65ede840f48fa025347909fe` **tree**: references blob wat.jpg
269
+* `d2dd63f0a1e8f1c07f4c33e5af782a60fda3546f` **blob**: for wat.jpg
270
+
271
+Meanwhile, the contents of `.git/index` do not change. This is because `.git/index` still represents what is
272
+being tracked, which, after a commit, is whatever is in the last commit -- i.e., `git diff --cached` returns nothing. `git status` will compare the state of the working directory with the index, and will find that there is one file that is "untracked": `images.md`.
273
+
274
+### Making our entire repo with commands manipulating git objects directly
275
+
276
+Git provides for manipulating the object database directly, without using the commands to add to the index
277
+from your working directory. Here's the complete list of commands we used to set up our repo, add files,
278
+and do the two commits:
279
+
280
+ # Set up our directory and initialize the repo
281
+ cd $HOME
282
+ cd src
283
+ rm -fR wiki
284
+ mkdir wiki
285
+ cd wiki
286
+ echo "### Welcome to the wiki" >home.md
287
+ git init
288
+
289
+ # Add a file and commit
290
+ git add home.md
291
+ git commit -m "Initial commit"
292
+
293
+ # Create another file and add it
294
+ echo "Software we like: git, atom" > software.md
295
+ git add software.md
296
+
297
+ # Create an image directory, put a file in there, add the directory and image, and commit
298
+ mkdir images
299
+ curl -s http://i2.kym-cdn.com/photos/images/original/000/173/575/25810.jpg -o images/wat.jpg
300
+ git add images
301
+ git commit -m "Add pages and images"
302
+
303
+Now let's do all of this *without* creating any files in the working directory. We are going to
304
+add stuff to the index and `objects/` hierarchy directly.
305
+
306
+First, create our directory and initialize our repo.
307
+
308
+ cd $HOME
309
+ cd src
310
+ rm -fR wiki
311
+ mkdir wiki
312
+ cd wiki
313
+ git init
314
+
315
+We will now use the `hash-object` command with `-w` to create our object. Notice that no file is
316
+created in the working directory.
317
+
318
+ echo "### Welcome to the wiki" | git hash-object -w --stdin
319
+ # returns hash: 81d4aaeff611587c378d25a90e1d2f178b484727
320
+
321
+This only created the compressed object in `objects/` -- we don't yet have anything in the index. To
322
+put a reference to our object in the index, we use `update-index`:
323
+
324
+ git update-index --add --cacheinfo 100644 81d4aaeff611587c378d25a90e1d2f178b484727 home.md
325
+
326
+The next thing we want to do is commit this object so that it is referenced from `HEAD`. Recall that
327
+the `commit` object needs to reference a `tree` object. Let's make a `tree` object from the current
328
+contents of the index. This command will report the hash for the newly-created tree.
329
+
330
+ git write-tree
331
+ # returns hash: 8dfcc6e23dec849d71d7428290cf66aab644a22f
332
+
333
+The hash value should look familiar from above.
334
+
335
+Now we will make the commit object with `commit-tree`:
336
+
337
+ echo "Initial commit" | git commit-tree 8dfcc6
338
+ # returns hash (yours will be different): 256b5975726098c52222d01787688681b959a991
339
+ # make a note of this hash: You will use it later
340
+
341
+At this point we can do `git log` if we pass in the commit hash
342
+
343
+ git log 256b5975726098c52222d01787688681b959a991
344
+
345
+Add `software.md` as above, but without creating a file in the working directory:
346
+
347
+ echo "Software we like: git, atom" | git hash-object -w --stdin
348
+ # hash = ed91a30546d771a1c22ab2d1e846bdea6b52a624
349
+ git update-index --add --cacheinfo 100644 ed91a30546d771a1c22ab2d1e846bdea6b52a624 software.md
350
+
351
+Add the binary for our image:
352
+
353
+ curl -s http://i2.kym-cdn.com/photos/images/original/000/173/575/25810.jpg | git hash-object -w --stdin
354
+ # returns hash: d2dd63f0a1e8f1c07f4c33e5af782a60fda3546f
355
+
356
+Add our file to the index and then make a new tree:
357
+
358
+ git update-index --add --cacheinfo 100644 d2dd63f0a1e8f1c07f4c33e5af782a60fda3546f images/wat.jpg
359
+ git write-tree
360
+ # returns hash: 11dd91adca91dc2445dfcee4d3418e8ad219db7c
361
+
362
+And commit. Note the additional `-p`: This specifies the prior commit so that this commit references
363
+the first one.
364
+
365
+ echo "Add pages and images" | git commit-tree 11dd91 -p 256b59
366
+ # returns hash (yours will be different): 914b2fef46cd6cd1c58c90def3a751d2fb738e39
367
+
368
+And now do git log with this new hash . . .
369
+
370
+ git log 914b2f
371
+
372
+Now let's create the master branch, and look at the log for that branch:
373
+
374
+ git update-ref refs/heads/master 914b2fef46cd6cd1c58c90def3a751d2fb738e39
375
+ # NOTE: Basically the same as . . .
376
+ # echo "914b2fef46cd6cd1c58c90def3a751d2fb738e39" > .git/refs/heads/master
377
+ git log --pretty=oneline master
378
+
379
+### What's in refs/?
380
+
381
+So now that we've committed something, we have a reference to the tip of our master branch. The SHA for the tip of our
382
+master branch is stored in the ordinary file `.git/refs/heads/master` -- the content is `318ff038bef2f31f61729fdebaaa02e8415c659a`, which
383
+is just the last thing we did with our repo: The last commit. Can we add another head? Yep. Do this:
384
+
385
+ git checkout -b jn-foobar
386
+
387
+Now we get a new file `.git/refs/heads/jn-foobar`. And its content is also `318ff038bef2f31f61729fdebaaa02e8415c659a`, the SHA
388
+for our last commit. In other words, `master` and `jn-foobar` point to the same thing; meanwhile, `.git/HEAD` now contains `ref: refs/heads/jn-foobar`.
389
+
390
+We can play around with this endlessly, checking out `master`, then `jn-foobar` and see the content of `.git/HEAD` change accordingly.
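The relationship between `.git/HEAD` and the files under `refs/heads/` can be sketched as a two-step lookup. The function below is a simplified illustration with a made-up name; it assumes every branch still has its own file under `refs/heads/` and ignores the case where Git has packed refs into a single `packed-refs` file. The demo builds a throwaway directory instead of touching a real repo.

```python
import os
import tempfile


def resolve_head(git_dir: str) -> str:
    """Follow HEAD to the commit ID it currently points at."""
    with open(os.path.join(git_dir, "HEAD")) as f:
        head = f.read().strip()
    if head.startswith("ref: "):
        # Symbolic ref: HEAD names a branch file that holds the commit ID.
        ref_path = os.path.join(git_dir, *head[len("ref: "):].split("/"))
        with open(ref_path) as f:
            return f.read().strip()
    # Detached HEAD: the file holds a commit ID directly.
    return head


if __name__ == "__main__":
    # Mimic the two kinds of files involved, using the SHA from the text above.
    with tempfile.TemporaryDirectory() as git_dir:
        os.makedirs(os.path.join(git_dir, "refs", "heads"))
        commit = "318ff038bef2f31f61729fdebaaa02e8415c659a"
        for branch in ("master", "jn-foobar"):
            with open(os.path.join(git_dir, "refs", "heads", branch), "w") as f:
                f.write(commit + "\n")
        with open(os.path.join(git_dir, "HEAD"), "w") as f:
            f.write("ref: refs/heads/jn-foobar\n")
        print(resolve_head(git_dir))
```

Switching branches only rewrites `HEAD`; the branch files themselves change when a commit is made on that branch.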
391
+
392
+### What's in logs/?
393
+
394
+As you work, git will track your commits and keep them in various places in `logs/`. It is possible to reset
395
+your repo and lose references to hashes. But typically the compressed data is still somewhere in `objects/` --
396
+you just need to be able to find the hash for the lost file. By consulting the `logs/` via the `git reflog` command, you can sometimes recover lost work. By default, the log goes back 90 days for commits in your current history,
397
+and 30 days for commits that are no longer reachable from it.
398
+
399
+In my experience, if you find that you have to deal with the reflog, get help from someone who's dealt with
400
+it before.
401
+
402
+### What does it mean to do git add?
403
+
404
+When you do `git add`, you add a compressed version of the file(s) to the `objects/` directory (if it isn't there already), and add (a) line(s) to the `index`.
405
+
406
+### What's the opposite of git add?
407
+
408
+`git rm --cached filename`. Note, this will not delete the compressed binary version from the `objects/` directory.
409
+
410
+### What is the index? What is the "staging area"?
411
+
412
+git keeps a list of all of the files for the "next" commit. This list is called the `index`. It's also called the "staging area." It's also called the "cache." To get a list of all of the files in the index, type `git ls-files`. To get the same list, along with the hashes for each file and the file mode, do `git ls-files --stage`.
413
+
414
+It's all the files currently being tracked since the last commit, including removed files. So, for instance, if you clone a large repo such as what's at `git@github.com:IoraHealth/IoraHealth.git` and do `git ls-files | wc -l`, as of October 2016 you'll get 3913, which is the same as the number of files in the repo not including what's in `.git/` (compare `find . -type f -not -path "./.git/*" | wc -l`).
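If you are curious what the binary `index` file looks like, its first 12 bytes are easy to read by hand: a `DIRC` signature, a version number, and the number of entries, each stored big-endian. The sketch below reads only that header (the function name is invented for this example); parsing the entries themselves is more involved and is best left to `git ls-files --stage`.

```python
import os
import struct
import sys


def read_index_header(data: bytes):
    """Return (version, entry_count) from the first 12 bytes of an index file."""
    signature, version, entry_count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("not a Git index file")
    return version, entry_count


if __name__ == "__main__":
    # Defaults to the index of the repo in the current directory, if there is one.
    path = sys.argv[1] if len(sys.argv) > 1 else os.path.join(".git", "index")
    if os.path.exists(path):
        with open(path, "rb") as f:
            version, entry_count = read_index_header(f.read(12))
        print(f"index version {version}, {entry_count} entries")
    else:
        print(f"no index file at {path}")
```

Run from the root of the wiki repo, the entry count should match the number of lines printed by `git ls-files`.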
415
+
416
+### What does git checkout -b branch do?
417
+
418
+### What does git commit do?
419
+
420
+### What does git reset do?
421
+
422
+### And what is the difference between git reset --hard and regular git reset?
423
+
424
+### What is the difference between git rm --cached [file] and git reset HEAD [file] and git reset [file]?
425
+
426
+### watch-do
427
+
428
+On the Mac,
429
+
430
+ brew install tree fswatch
431
+
432
+Create a file like this, call it `watch-do`, make it executable, and put it in your `PATH`:
433
+
434
+```
435
+#!/bin/bash
436
+
437
+# usage
438
+# watch-do <dir> <path-to-watch> <command-to-run>
439
+
440
+DIR=$1
441
+WATCH=$2
442
+COMMAND="clear && cd $DIR && $3"
443
+bash -c "$COMMAND"
444
+fswatch -l 0.1 -o "$WATCH" | xargs -n1 bash -c "$COMMAND"
445
+```
... ...
\ No newline at end of file