added AGPL3 license, added more documentation
bil-paul committed Dec 31, 2024
1 parent 2378c8d commit 53be98b
Showing 11 changed files with 726 additions and 27 deletions.
661 changes: 661 additions & 0 deletions LICENSE

Large diffs are not rendered by default.

41 changes: 27 additions & 14 deletions README.md
@@ -1,6 +1,10 @@
# Trusted Timestamping for Scientific Research Data

This repository presents a framework for leveraging [trusted timestamping](https://en.wikipedia.org/wiki/Trusted_timestamping) as defined in [RFC 3161](https://www.ietf.org/rfc/rfc3161.txt) in a manner suitable for providing data integrity assurances for scientific research data (or arbitrary files/data).
This repository presents an [open source](https://en.wikipedia.org/wiki/Open_source) framework for leveraging [trusted timestamping](https://en.wikipedia.org/wiki/Trusted_timestamping) as defined in [RFC 3161](https://www.ietf.org/rfc/rfc3161.txt) in a manner suitable for providing data integrity assurances for scientific research data (or arbitrary files/data).

This framework permits a cryptographically secure way to prove that a specific version of a file (or directory of files) existed at the time of timestamping.
This can be helpful in demonstrating that a specific piece of scientific data has not been altered since the timestamp (e.g., shortly after acquisition/creation).
It can also be used to maintain an [electronic lab notebook](https://en.wikipedia.org/wiki/Electronic_lab_notebook) via timestamped git repositories.

## Timestamping Website: [timestamp.stanford.edu](https://timestamp.stanford.edu)

@@ -15,26 +19,30 @@ Note, in-browser hashing means that very large data files will take time to proc
At the moment, there is a max individual file size constraint due to non-chunked reads (to be fixed).
Max individual file sizes vary depending on the browser, but are around 2-4GB.

Hashes and their respective timestamps submitted to the API are stored and made public on a daily basis at [timestamp-record](https://github.com/bil/timestamp-record).
Hashes submitted to the API and their respective timestamps are stored and made public on a daily basis at [timestamp-record](https://github.com/bil/timestamp-record).

## Binder Demo [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/bil/timestamping/HEAD)

A demo of the command-line tools is available via Binder.

Choose `Terminal` under the `Other` section, then navigate to the `example` directory.

From there, execute `./example.sh` to timestamp an example file and `./example.sh DIR` to timestamp an example directory and its contents.

## Timestamping git Repositories

The `post-commit` file in the `hooks` directory supports the addition of trusted timestamps to any git repository with this timestamping installed.
The `post-commit` files in the `hooks` directory support the timestamping of git repositories.
The `post-commit.local` file performs timestamping locally, and requires the trustedtimestamping scripts to be installed/available on the local system.
The `post-commit.api` file performs timestamping via the API.

To use, copy the `post-commit` file to the `.git/hooks` directory of a given repository and commit as normal.
To use, copy the desired `post-commit` file to the `.git/hooks` directory of a given repository, ensuring it is named `post-commit`, and commit as normal.
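
The installation step above can be sketched as a small shell helper. This is only an illustration: `install_hook` is a hypothetical name, not part of this framework, and the `chmod` is included because git only runs hooks that are executable.

```shell
#!/bin/bash
# Hypothetical helper (not part of the framework): install one of the
# post-commit hook variants into a repository under the name git expects.
install_hook() {
    local hook_src="$1"   # e.g. hooks/post-commit.api or hooks/post-commit.local
    local repo="$2"       # path to the working tree of the target repository
    cp -- "$hook_src" "$repo/.git/hooks/post-commit"
    # git ignores hooks that are not executable
    chmod +x "$repo/.git/hooks/post-commit"
}
```

For example, `install_hook hooks/post-commit.api /path/to/repo` would set up API-based timestamping for that repository.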

Every manual commit will be followed by an automatic timestamp commit against the checksum of the repository HEAD.
A `.timestamps.json` file will be added to the root of the git repository.

Repository timestamps can be validated by running `ttsVerifyGit` against any checked out timestamp commit revision.

Revisions to this repository are timestamped in this manner as an example and to validate the temporal history of the repository as of commit ID `395bc18` on Dec 16, 2024.
Revisions to this repository are timestamped in this manner as an example and to validate the temporal history of the repository.

This framework permits a temporally-verifiable record of a git repository, suitable for an electronic lab notebook or other authoritative archive.
When a common repository is used by multiple individuals (and optionally coupled with [signed commits](https://git-scm.com/book/ms/v2/Git-Tools-Signing-Your-Work)), a cryptographically secure and unique record can be defended so long as at least one of the contributing individuals is truthful (up to the truthful individual's last commit, at least).
@@ -69,33 +77,38 @@ The reason for this is still being investigated.

## Local Configuration

The shell scripts permit local configuration of TSA servers besides the default ones mentioned above.
The scripts permit local configuration of TSA servers besides the default ones mentioned above.

A custom `TSA.source` and `CA` directory can be placed in `~/.config/trustedts` and will take priority over the defaults.
See the structure of `trustedtimestamping/etc/trustedts` for the file structure to be replicated.
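
One way to start a local configuration is to copy the shipped defaults and then edit them. The sketch below assumes that approach; `seed_local_config` is a hypothetical helper name, and the destination defaults to the `~/.config/trustedts` location mentioned above.

```shell
#!/bin/bash
# Hypothetical helper (not part of the framework): copy the default
# configuration tree so it can be edited locally.
seed_local_config() {
    local src="$1"                               # e.g. trustedtimestamping/etc/trustedts
    local dest="${2:-$HOME/.config/trustedts}"   # local override directory
    mkdir -p "$dest"
    # "src/." copies the directory contents, including the CA directory
    cp -R "$src/." "$dest/"
}
```

After running `seed_local_config trustedtimestamping/etc/trustedts`, the copied `TSA.source` and `CA` directory can be adjusted to point at other TSA servers.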

## JSON Timestamps File Format

The field structure of the timestamps JSON file is mostly self-explanatory. See the `.timestamps.json` as an example for a git-specific timestamp format.
The field structure of the timestamps JSON file is mostly self-explanatory.
See the `.timestamps.json` as an example for a git-specific timestamp format.

The `format`, `version`, and `timestamps` fields are required. All others are optional.
The `format`, `version`, and `timestamps` fields are required.
All others are optional.

A hash field is optionally included for convenience of identifiability. A time field was specifically omitted and must be derived from the timestamp replies.
A hash field is optionally included for convenience of identifiability.
A time field was specifically omitted and must be derived from the timestamp replies.

### File Timestamps

File timestamps are derived by calculating the SHA256 hash of the file, generating the respective `sha256sum` compliant string, calculating the SHA256 hash of this string, and using this second hash as the digest for the timestamp request.
This is done to 1) make immutable both the file's contents *and* its name and 2) maintain compatibility with the coreutils package.
The format of this string is: `<32 byte hash in lowercase><two spaces><file name><line feed character (\n)>`.
The format of this string is: `<32 byte SHA256 hash as 64 lowercase hexadecimal characters><two spaces><file name><line feed character (\n)>`.
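
The two-step hashing described above can be sketched in shell as follows. This is a minimal illustration assuming GNU coreutils `sha256sum`; `file_digest` is a hypothetical helper name, and the framework's own scripts may differ in detail (for example, in how they handle macOS).

```shell
#!/bin/bash
# Hypothetical helper (not part of the framework): compute the digest
# that would be used in the timestamp request for a single file.
file_digest() {
    local path="$1"
    local dir name line
    dir=$(dirname -- "$path")
    name=$(basename -- "$path")
    # First pass: sha256sum-compatible line, "<64 hex chars><two spaces><file name>"
    line=$(cd -- "$dir" && sha256sum -- "$name")
    # Second pass: hash that line, including its trailing line feed
    printf '%s\n' "$line" | sha256sum | cut -c 1-64
}
```

Because the file name is part of the hashed string, renaming a file changes the resulting digest even when its contents are unchanged.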

### Directory Timestamps

Directory timestamps are generated in an identical fashion to file timestamps, except the `sha256sum` compliant string contains multiple lines, one for each file. The order of these lines is important, and is sorted by file name alphabetically, shallow to deep.
Directory timestamps are generated in an identical fashion to file timestamps, except the `sha256sum` compliant string contains multiple lines, one for each file (including its relative path).
At the moment, the hash calculation is sensitive to the ordering of these lines, which are sorted alphabetically by file name, shallow to deep.
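
A rough sketch of the directory case is shown below. Several details are assumptions rather than the framework's confirmed behavior: paths are taken relative to the directory's parent, and lines are ordered with a plain byte-wise sort, which may not match the "shallow to deep" ordering exactly. `dir_digest` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper (not part of the framework): compute a digest over
# all files in a directory via a multi-line sha256sum-compatible string.
dir_digest() {
    local path="$1"
    local parent name
    parent=$(dirname -- "$path")
    name=$(basename -- "$path")
    (
        cd -- "$parent" || exit 1
        # One sha256sum line per file, with its relative path.
        # ASSUMPTION: byte-wise sort; the framework's ordering may differ.
        find "$name" -type f -print | LC_ALL=C sort | while IFS= read -r f; do
            sha256sum -- "$f"
        done
    ) | sha256sum | cut -c 1-64
}
```

Whatever ordering is used, it has to be reproduced exactly at verification time, since any change in line order yields a different digest.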

### Git Repository Timestamps

Git repositories are timestamped based on their commit id.
Git repositories are timestamped based on their commit ID.
The default hash format for a git repository is SHA1, which is then passed through SHA256.
While recent versions of git support SHA256, most git repository servers do not have support for repositories with SHA256 object formats.
While recent versions of git support SHA256, most git repository servers (e.g., GitHub, GitLab) do not have support for repositories with SHA256 object formats.

SHA1 is not as secure as SHA256, however, this is practically still safe since each commit will have its own timestamp, making the feasibility for a collision that replicates all those hashes difficult.
SHA1 is not as secure as SHA256.
However, this is still safe in practice since each commit has its own timestamp, making it difficult to construct a collision that replicates multiple hashes across many commits.
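
As an illustration of passing a commit ID through SHA256, the sketch below hashes the hexadecimal commit ID followed by a line feed. That exact input encoding is an assumption, not something this document specifies, and `commit_id_digest` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper (not part of the framework): derive a SHA256 digest
# from a git commit ID given as a hexadecimal string.
commit_id_digest() {
    local commit_id="$1"
    # ASSUMPTION: the hex commit ID plus a trailing line feed is what gets hashed
    printf '%s\n' "$commit_id" | sha256sum | cut -c 1-64
}
```

In a repository this could be called as `commit_id_digest "$(git rev-parse HEAD)"`.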
Binary file modified build/deb/trustedtimestamping_0.0.1-1.deb
Binary file not shown.
7 changes: 4 additions & 3 deletions example/example.sh
@@ -1,7 +1,8 @@
#!/bin/bash

# standalone example script performing trusted timestamping on a single data file
# requirements: openssl (3.0+), curl, jq, bash (3+), coreutils
# standalone example script performing trusted timestamping on a data file (or directory)
# ./example.sh to timestamp a single file
# ./example.sh DIR to timestamp a directory

# exit on error
set -e
@@ -33,7 +34,7 @@ ttsStamp tsRequest_$PATH_DATA_NAME.tsq
printf "Timestamp replies received\n\n"

echo "Verifying timestamp replies..."
ttsVerify $PATH_DATA
ttsVerify $PATH_DATA ./
printf "Verification complete, timestamps verified if all output reads: \"Verification: OK\"\n\n"

echo "Building timestamps JSON..."
12 changes: 9 additions & 3 deletions trustedtimestamping/usr/local/bin/ttsDOI
@@ -1,5 +1,9 @@
#!/bin/bash

# extract timestamp json from DOI
# currently only supports DataCite DOIs
# DataCite DOIs: pull timestamp JSON from TechnicalInfo Description Field

set -u
set -e

@@ -10,11 +14,13 @@ DOI="$1"

DOI_SUFFIX=$(basename $DOI)
DOI_PREFIX=$(basename $(dirname $DOI))
TS_FILE_NAME=timestamps_"$DOI_PREFIX"-"$DOI_SUFFIX".json

RA=$(curl -s -S https://doi.org/ra/$DOI_PREFIX | jq -r ".[0].RA")

if [ "$RA" == "DataCite" ]; then
TS_JSON=$(curl -s https://api.datacite.org/dois/"$DOI_PREFIX"/"$DOI_SUFFIX" | jq '.data.attributes.descriptions[1].description | fromjson')
printf '%s' "$TS_JSON" > timestamps_"$DOI_PREFIX"-"$DOI_SUFFIX".json
$DIR_BIN/ttsUnpackJSON <(printf '%s' "$TS_JSON")
TS_JSON_DESC=$(curl -s https://api.datacite.org/dois/"$DOI_PREFIX"/"$DOI_SUFFIX" | jq '.data.attributes.descriptions')
TECHINFO_IDX=$(printf "%s" "$TS_JSON_DESC" | jq '[.[] | .descriptionType] | index("TechnicalInfo")')
printf '%s' "$TS_JSON_DESC" | jq -r --argjson idx "$TECHINFO_IDX" '.[$idx].description' > "$TS_FILE_NAME"
"$DIR_BIN"/ttsUnpackJSON "$TS_FILE_NAME"
fi
4 changes: 4 additions & 0 deletions trustedtimestamping/usr/local/bin/ttsGenReq
@@ -1,5 +1,9 @@
#!/bin/bash

# generate timestamp request
# input argument: path to file or directory to be timestamped
# outputs a tsRequest_<name>.tsq file in the working directory

set -e
set -u

4 changes: 3 additions & 1 deletion trustedtimestamping/usr/local/bin/ttsPackJSON
@@ -1,6 +1,8 @@
#!/bin/bash

# requirements: jq, base64
# assembles timestamps json file
# input: path containing timestamp replies (.tsr)
# outputs a timestamps<name>.json file in the working directory

set -e
set -u
4 changes: 4 additions & 0 deletions trustedtimestamping/usr/local/bin/ttsRepCert
@@ -1,5 +1,9 @@
#!/bin/bash

# extract certificate chain from a timestamp reply
# input argument: path to .tsr file
# output: certificate chain of pem files to working directory

set -u
set -e

4 changes: 4 additions & 0 deletions trustedtimestamping/usr/local/bin/ttsStamp
@@ -1,5 +1,9 @@
#!/bin/bash

# stamp timestamp request against timestamp authority servers
# input: path to timestamp request file (.tsq in this framework, but not sensitive to file extension)
# output: a set of timestamp reply files (.tsr), one for every defined timestamp authority server, written to the working directory

set -u

FILE_REQ=$1
13 changes: 7 additions & 6 deletions trustedtimestamping/usr/local/bin/ttsVerify
@@ -1,17 +1,20 @@
#!/bin/bash

# verifies a timestamps json file against the candidate originating file/directory
# inputs: <path to candidate file/directory> <path to timestamp replies (.tsr)>
# output: stdout responses from openssl verify

set -u

PATH_ABS=$(realpath $1)
NAME_DATA=$(basename $PATH_ABS)
DIR_DATA=$(dirname $PATH_ABS)
DIR_ORIGIN=$(pwd)
DIR_TS=$(realpath $2)


DIR_LOCAL_BIN=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
source $DIR_LOCAL_BIN/../../../etc/trustedts/tts.source

DIR_ORIGIN=$(pwd)
DIGEST_SIZE=256

# check for Mac
@@ -33,7 +36,6 @@ fi

DIGEST_DATA=$(printf "%s\n" "$DIGEST_FILE" | $CMD_SHA | cut -c 1-$(($DIGEST_SIZE/4)) )

cd $DIR_ORIGIN
for TSA_idx in $(seq 0 $((${#TSA_names[@]}-1)) ); do

printf 'Verifying %s:\n' "${TSA_names[$TSA_idx]}"
@@ -43,7 +45,7 @@ for TSA_idx in $(seq 0 $((${#TSA_names[@]}-1)) ); do
cd $DIR_TMP

# extract certificates from timestamp
$DIR_BIN/ttsRepCert $DIR_ORIGIN/tsReply_${TSA_names[$TSA_idx]}.tsr
$DIR_BIN/ttsRepCert $DIR_TS/tsReply_${TSA_names[$TSA_idx]}.tsr
# split cert chain pem into individual certificates
csplit -s -f tsReply_${TSA_names[$TSA_idx]} -b %02d.pem tsReply_${TSA_names[$TSA_idx]}.pem /END\ CERTIFICATE/+2 {*}
# delete empty file
@@ -56,9 +58,8 @@ for TSA_idx in $(seq 0 $((${#TSA_names[@]}-1)) ); do
# copy over root CAs, overwriting links to any extracted CA root with these root certs
cp -a $DIR_CA/* $DIR_TMP

cd $DIR_ORIGIN
openssl ts -verify -digest $DIGEST_DATA \
-in tsReply_${TSA_names[$TSA_idx]}.tsr \
-in $DIR_TS/tsReply_${TSA_names[$TSA_idx]}.tsr \
-CApath $DIR_TMP 2> >(grep -v "Using configuration from" >&2)

rm -rf $DIR_TMP
3 changes: 3 additions & 0 deletions trustedtimestamping/usr/local/bin/ttsVerifyGit
@@ -1,5 +1,8 @@
#!/bin/bash

# verify timestamps of a git repository
# must be executed from base of git repository where .timestamps.json file resides

set -u

COMMIT_LAST=$(git log -1 --oneline --pretty=format:%s)