Harvester/main - Feature 1&2 DRAFT #1070

Draft: wants to merge 43 commits into base: master

Commits
bf676b8
Initial Data Dump model (datacite/1862)
digitaldogsbody Jul 11, 2023
87f9958
Data dump DB migration (datacite/1862)
digitaldogsbody Jul 11, 2023
53c45ca
Data dump model initial test suite (datacite/1862)
digitaldogsbody Jul 11, 2023
20d5bde
Add Data dump model to the RSpec ElasticSearch helper (datacite/1862)
digitaldogsbody Jul 11, 2023
a3af8ac
Updated Schema after database migration (datacite/1862)
digitaldogsbody Jul 11, 2023
5a641af
Initial data dump controller (datacite/1863)
digitaldogsbody Jul 12, 2023
7490c0b
Data dump controller basic test suite (datacite/1863)
digitaldogsbody Jul 12, 2023
b927152
Initial data dump routes (datacite/1864)
digitaldogsbody Jul 12, 2023
ed04430
Data Dump index controller first pass (datacite/1866)
digitaldogsbody Jul 12, 2023
a367beb
Add a factory for the test suites (datacite/1868)
digitaldogsbody Jul 12, 2023
d94a486
Merge pull request #976 from datacite/harvester/1862
digitaldogsbody Jul 12, 2023
28aa571
Merge pull request #979 from datacite/harvester/1863
digitaldogsbody Jul 12, 2023
8ab398a
Merge pull request #980 from datacite/harvester/1866
digitaldogsbody Jul 12, 2023
93048d8
Add a factory for an incomplete data dump (datacite/1868)
digitaldogsbody Jul 13, 2023
d7d3f9d
Update test to create the data_dump from the factory (datacite/1868)
digitaldogsbody Jul 13, 2023
ac6a7d5
Update controller test to create an object and test presence (datacit…
digitaldogsbody Jul 13, 2023
a73009a
Update data dump factory to add missing attributes (datacite/1868)
digitaldogsbody Jul 13, 2023
efcc34d
Remove erroneous comma in validator (datacite/1862)
digitaldogsbody Jul 13, 2023
2899869
Add missing `query_aggregations` property required by Indexable conce…
digitaldogsbody Jul 13, 2023
cc0b91c
Correctly pass the `query` parameter to the ES query function (dataci…
digitaldogsbody Jul 13, 2023
8a0a9c6
Update factory to use Faker for more attributes so it can be used to …
digitaldogsbody Jul 13, 2023
da56e40
Update factory for incomplete objects (datacite/1868)
digitaldogsbody Jul 13, 2023
08c03cf
Initial Data Dump controller requests rspec suite (datacite/1868)
digitaldogsbody Jul 13, 2023
babb2ae
Add pagination tests to Data Dump controller suite (datacite/1868)
digitaldogsbody Jul 13, 2023
83fcfde
Fix bad requests in test suite (datacite/1868)
digitaldogsbody Jul 13, 2023
356eb6e
Fix validate inclusion model tests (datacite/1868)
digitaldogsbody Jul 13, 2023
6caf1d9
Merge pull request #981 from datacite/harvester/1868
digitaldogsbody Jul 13, 2023
538ea2e
Fix accidental conversion of database table schema to latin1
digitaldogsbody Jul 13, 2023
1c478bb
First pass data dump serializer (#1867)
digitaldogsbody Jul 13, 2023
6cd3e76
Merge pull request #983 from datacite/harvester/1867
digitaldogsbody Jul 13, 2023
c28a352
Fix missing brackets in link generation (#1889)
digitaldogsbody Jul 13, 2023
b5502da
Remove invalid parameters to spec requests (#1889)
digitaldogsbody Jul 13, 2023
b07a5fd
Fix links to return max page when the current page is outside of the …
digitaldogsbody Jul 13, 2023
7775fb6
Correct name of tested attribute to account for lowerCamel transforma…
digitaldogsbody Jul 13, 2023
3650925
Correct expected date format to account for serializer behaviour (#1889)
digitaldogsbody Jul 13, 2023
301ffd2
Merge pull request #984 from datacite/harvester/1889
digitaldogsbody Jul 13, 2023
18d5608
Update data dump controller spec to acquire and use a token (datacit…
digitaldogsbody Jul 19, 2023
a45e6b7
Update data dump request spec to test authorization and abilities (d…
digitaldogsbody Jul 19, 2023
7b6d94d
Add ability to permit reading of data dump files (datacite/1865)
digitaldogsbody Jul 19, 2023
cef0eed
Require ability to access data dump controller methods (datacite/1865)
digitaldogsbody Jul 19, 2023
754381c
Merge pull request #987 from datacite/harvester/1865
digitaldogsbody Jul 19, 2023
344f1ee
Add data dump feature 2
digitaldogsbody Aug 10, 2023
d17791f
Merge pull request #996 from datacite/harvester/feature-2
digitaldogsbody Aug 10, 2023
121 changes: 121 additions & 0 deletions app/controllers/data_dumps_controller.rb
@@ -0,0 +1,121 @@
class DataDumpsController < ApplicationController

prepend_before_action :authenticate_user!
# load_and_authorize_resource
def index
authorize! :read, :read_data_dumps
sort =
case params[:sort]
when "created"
{ created_at: { order: "asc" } }
when "-created"
{ created_at: { order: "desc" } }
when "start"
{ start_date: { order: "asc" } }
when "-start"
{ start_date: { order: "desc" } }
when "end"
{ end_date: { order: "asc" } }
when "-end"
{ end_date: { order: "desc" } }
else
{ created_at: { order: "desc" } }
end

page = page_from_params(params)

response = DataDump.query(
page: page,
sort: sort,
scope: params[:scope]
)

begin
total = response.results.total
total_pages = page[:size].positive? ? (total.to_f / page[:size]).ceil : 0

data_dumps = response.results

options = {}
options[:meta] = {
total: total,
"totalPages" => total_pages,
page: page[:number]
}.compact

options[:links] = {
self: request.original_url,
next:
if data_dumps.blank? || page[:number] == total_pages
nil
else
request.base_url + "/data_dumps?" +
{ "page[number]" => page[:number] + 1,
"page[size]" => page[:size],
sort: params[:sort],
}.compact.to_query
end,
prev:
if page[:number] == 1 || page[:number] == 0
nil
elsif data_dumps.blank?
# the requested page is past the end of the results, so link back to the last page
request.base_url + "/data_dumps?" +
{ "page[number]" => total_pages,
"page[size]" => page[:size],
sort: params[:sort],
}.compact.to_query
else
request.base_url + "/data_dumps?" +
{ "page[number]" => page[:number] - 1,
"page[size]" => page[:size],
sort: params[:sort],
}.compact.to_query
end
}.compact

render json:
DataDumpSerializer.new(data_dumps, options).serialized_json, status: :ok

rescue Elasticsearch::Transport::Transport::Errors::BadRequest => e
Raven.capture_exception(e)

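# The transport error message has the form "[400] {json body}";
# drop the six-character status prefix before parsing the JSON.
message =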
JSON.parse(e.message[6..-1]).to_h.dig(
"error",
"root_cause",
0,
"reason",
)

render json: { "errors" => { "title" => message } }.to_json,
status: :bad_request
end
end

def show
authorize! :read, :read_data_dumps
data_dump = DataDump.where(uid: params[:id]).first
if data_dump.blank? ||
(
data_dump.aasm_state != "complete"
# TODO: Add conditional check for role here
)
fail ActiveRecord::RecordNotFound
end
render json: DataDumpSerializer.new(data_dump).serialized_json, status: :ok
end

def latest
authorize! :read, :read_data_dumps
data_dump = DataDump.where(scope: params[:scope], aasm_state: "complete").order(end_date: :desc).first
if data_dump.blank? ||
(
data_dump.aasm_state != "complete"
# TODO: Add conditional check for role here
)
fail ActiveRecord::RecordNotFound
end
render json: DataDumpSerializer.new(data_dump).serialized_json, status: :ok
end
end
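The link-building logic in `index` is easier to reason about (and to test) as a pure function. The sketch below is a minimal plain-Ruby restatement of it, not code from this PR: `pagination_links` is a hypothetical helper, `URI.encode_www_form` stands in for Rails' `Hash#to_query`, and "the result set is blank" is approximated as "the requested page number is greater than the total number of pages".

```ruby
require "uri"

# Hypothetical helper (not part of this PR): computes the `next` and `prev`
# links the same way the index action does, without Rails or Elasticsearch.
def pagination_links(base_url:, number:, size:, total:, sort: nil)
  total_pages = size.positive? ? (total.to_f / size).ceil : 0

  # Builds one page link; nil values (such as an absent sort) are dropped.
  link = lambda do |page_number|
    query = { "page[number]" => page_number, "page[size]" => size, "sort" => sort }.compact
    "#{base_url}/data_dumps?#{URI.encode_www_form(query)}"
  end

  # Assumption: stands in for `data_dumps.blank?` in the controller.
  past_the_end = number > total_pages

  next_link = (past_the_end || number == total_pages) ? nil : link.call(number + 1)
  prev_link =
    if number <= 1
      nil
    elsif past_the_end
      link.call(total_pages) # jump back to the last page that has results
    else
      link.call(number - 1)
    end

  { next: next_link, prev: prev_link }.compact
end
```

With 60 records and a page size of 25 there are three pages, so page 2 gets both links, page 3 gets only `prev`, and an out-of-range page such as 9 gets a `prev` link pointing at page 3.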
1 change: 1 addition & 0 deletions app/models/ability.rb
@@ -23,6 +23,7 @@ def initialize(user)
can :export, :contacts
can :export, :organizations
can :export, :repositories
can :read, :read_data_dumps
elsif user.role_id == "staff_user"
can %i[read read_billing_information read_contact_information read_analytics], :all
elsif user.role_id == "consortium_admin" && user.provider_id.present?
122 changes: 122 additions & 0 deletions app/models/data_dump.rb
@@ -0,0 +1,122 @@
# frozen_string_literal: true

class DataDump < ApplicationRecord
include Elasticsearch::Model

include Indexable
include AASM

validates_presence_of :uid
validates_presence_of :scope
validates_presence_of :start_date
validates_presence_of :end_date

validates_uniqueness_of :uid, message: "This Data Dump UID is already in use"

validates_inclusion_of :scope, in: %w(metadata link), allow_blank: false

aasm whiny_transitions: false do
# initial state should prevent public visibility
state :generating, initial: true
# we might add more here in the future depending on the granularity of status updates we wish to provide
# but for now, we have a state for when the dump is done and being transferred to S3 and one for when it is
# ready to be downloaded
state :storing, :complete

event :store do
transitions from: :generating, to: :storing
end

event :release do
transitions from: :storing, to: :complete
end
end

if Rails.env.test?
index_name "data-dumps-test#{ENV['TEST_ENV_NUMBER']}"
elsif ENV["ES_PREFIX"].present?
index_name "data-dumps-#{ENV['ES_PREFIX']}"
else
index_name "data-dumps"
end

settings index: {
number_of_shards: 1,
analysis: {
analyzer: {
string_lowercase: {
tokenizer: "keyword", filter: %w[lowercase]
},
},
normalizer: {
keyword_lowercase: { type: "custom", filter: %w[lowercase] },
},
},
} do
mapping dynamic: "false" do
indexes :id
indexes :uid, type: :text
indexes :scope, type: :keyword
indexes :description, type: :text
indexes :start_date, type: :date, format: :date_optional_time
indexes :end_date, type: :date, format: :date_optional_time
indexes :records, type: :integer
indexes :checksum, type: :text
indexes :file_path, type: :text
indexes :aasm_state, type: :keyword
indexes :created_at, type: :date, format: :date_optional_time,
fields: {
created_sort: { type: :date }
}
indexes :updated_at, type: :date, format: :date_optional_time,
fields: {
updated_sort: { type: :date }
}
end
end

def self.query_aggregations
{}
end

def self.query(options = {})

options[:page] ||= {}
options[:page][:number] ||= 1
options[:page][:size] ||= 25

from = ((options.dig(:page, :number) || 1) - 1) * (options.dig(:page, :size) || 25)
sort = options[:sort]

filter = []
if options[:scope].present?
filter << { term: { scope: options[:scope].downcase } }
end

es_query = {bool: {filter: filter}}

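if options.fetch(:page, {}).key?(:cursor)
# `search_after` was an undefined local here, so any cursor request raised NameError.
# Assumption: page[cursor] carries the sort values of the last hit on the previous page.
search_after = Array.wrap(options.dig(:page, :cursor))
__elasticsearch__.search(
{
size: options.dig(:page, :size),
search_after: search_after,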
sort: sort,
query: es_query,
track_total_hits: true,
}.compact,
)
else
__elasticsearch__.search(
{
size: options.dig(:page, :size),
from: from,
sort: sort,
query: es_query,
track_total_hits: true,
}.compact,
)
end

end

end
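The lifecycle declared in the `aasm` block is `generating` → `storing` → `complete`, and the controller only serves dumps that have reached `complete`. The sketch below restates that lifecycle in plain Ruby so it can be run without the AASM gem. `DumpLifecycle`, `fire`, and `downloadable?` are hypothetical names used only for illustration. Returning `false` on an invalid event mirrors `whiny_transitions: false`, under which AASM events return `false` instead of raising.

```ruby
# Plain-Ruby stand-in for the AASM block in DataDump (hypothetical class).
class DumpLifecycle
  TRANSITIONS = {
    store: { from: :generating, to: :storing },
    release: { from: :storing, to: :complete },
  }.freeze

  attr_reader :state

  def initialize
    # The initial state keeps the dump hidden from the public endpoints.
    @state = :generating
  end

  # Fires an event; returns false (rather than raising) when the event
  # is not allowed from the current state.
  def fire(event)
    transition = TRANSITIONS.fetch(event)
    return false unless state == transition[:from]

    @state = transition[:to]
    true
  end

  # Mirrors the `aasm_state != "complete"` guard in show and latest.
  def downloadable?
    state == :complete
  end
end
```

A dump cannot skip `storing`: calling `fire(:release)` on a freshly created lifecycle returns `false` and leaves the state at `generating`.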
22 changes: 22 additions & 0 deletions app/serializers/data_dump_serializer.rb
@@ -0,0 +1,22 @@
# frozen_string_literal: true

class DataDumpSerializer
include FastJsonapi::ObjectSerializer
set_key_transform :camel_lower
set_type "data-dump"
set_id :uid

attributes :description,
:scope,
:start_date,
:end_date,
:records,
:checksum,
:download_link,
:created_at,
:updated_at

attribute :download_link do |object|
"https://example.com/#{object.file_path}"
end
end
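Because of `set_key_transform :camel_lower`, the attribute names above reach API clients in lowerCamelCase, which is why the request specs in this PR had to check the transformed names. The sketch below approximates that transform in plain Ruby. `camel_lower` here is a hypothetical helper written for illustration, not the serializer library's own implementation.

```ruby
# Approximation of the :camel_lower key transform (hypothetical helper):
# the first word stays lowercase, each following word is capitalized.
def camel_lower(key)
  head, *tail = key.to_s.split("_")
  [head, *tail.map(&:capitalize)].join.to_sym
end

# Attribute names as declared in DataDumpSerializer.
attributes = %i[
  description scope start_date end_date records
  checksum download_link created_at updated_at
]

# Keys as a client would see them under "attributes" in the JSON:API payload.
json_keys = attributes.map { |name| camel_lower(name) }
```

Single-word attributes such as `scope` are unchanged, while `start_date` becomes `startDate` and `download_link` becomes `downloadLink`.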
4 changes: 4 additions & 0 deletions config/routes.rb
@@ -230,6 +230,10 @@
resources :repository_prefixes, path: "repository-prefixes"
resources :resource_types, path: "resource-types", only: %i[show index]

get "/data_dumps/:scope/latest", to: "data_dumps#latest", constraints: { scope: /(metadata|link)/ }
get "/data_dumps/:scope", to: "data_dumps#index", constraints: { scope: /(metadata|link)/ }
resources :data_dumps, constraints: { id: /[A-Za-z0-9_-]+/ }, only: %i[show index]

# custom routes for maintenance tasks
post ":username", to: "datacite_dois#show", as: :user

24 changes: 24 additions & 0 deletions db/migrate/20230711130313_create_data_dumps.rb
@@ -0,0 +1,24 @@
# frozen_string_literal: true

class CreateDataDumps < ActiveRecord::Migration[6.1]
def change
create_table :data_dumps do |t|
t.string :uid, null: false
t.string :scope, null: false
t.text :description
t.datetime :start_date, null: false
t.datetime :end_date, null: false
t.bigint :records
t.string :checksum
t.string :file_path
t.string :aasm_state

t.timestamps

t.index %w[uid], name: "index_data_dumps_on_uid", unique: true
t.index %w[updated_at], name: "index_data_dumps_on_updated_at"
t.index %w[scope], name: "index_data_dumps_on_scope"
t.index %w[aasm_state], name: "index_data_dumps_on_aasm_state"
end
end
end
20 changes: 19 additions & 1 deletion db/schema.rb
@@ -12,7 +12,7 @@
#
# It's strongly recommended that you check this file into your version control system.

-ActiveRecord::Schema.define(version: 2023_01_23_122711) do
+ActiveRecord::Schema.define(version: 2023_07_11_130313) do
create_table "active_storage_attachments", charset: "utf8mb4", force: :cascade do |t|
t.string "name", limit: 191, null: false
t.string "record_type", null: false
@@ -137,6 +137,24 @@
t.datetime "deleted_at"
end

create_table "data_dumps", charset: "utf8mb4", force: :cascade do |t|
t.string "uid", null: false
t.string "scope", null: false
t.text "description"
t.datetime "start_date", null: false
t.datetime "end_date", null: false
t.bigint "records"
t.string "checksum"
t.string "file_path"
t.string "aasm_state"
t.datetime "created_at", precision: 6, null: false
t.datetime "updated_at", precision: 6, null: false
t.index ["aasm_state"], name: "index_data_dumps_on_aasm_state"
t.index ["scope"], name: "index_data_dumps_on_scope"
t.index ["uid"], name: "index_data_dumps_on_uid", unique: true
t.index ["updated_at"], name: "index_data_dumps_on_updated_at"
end

create_table "datacentre", charset: "utf8", force: :cascade do |t|
t.text "comments", size: :long
t.string "system_email", null: false
24 changes: 24 additions & 0 deletions spec/controllers/data_dumps_controller_spec.rb
@@ -0,0 +1,24 @@
require 'rails_helper'

RSpec.describe DataDumpsController, type: :controller do

let(:token) { User.generate_token }

describe "GET #index" do
it "returns http success" do
request.headers["Authorization"] = "Bearer " + token
get :index
expect(response).to have_http_status(:success)
end
end

describe "GET #show" do
let(:data_dump) { create(:data_dump) }
it "returns http success" do
request.headers["Authorization"] = "Bearer " + token
get :show, params: { id: data_dump.uid }
expect(response).to have_http_status(:success)
end
end

end