-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement SQLite database #42
base: main
Are you sure you want to change the base?
Conversation
- Schema defined in TS - Ingests from ./data folder as source of truth (can be blown up and recreated anytime) - Example implementation of scoring algorithm / config in TS, so that scoring layer can be iterated on separately from the data wrangling, or made user configurable even.
Warning There were issues while running some tools. Please review the errors and either fix the tool’s configuration or disable the tool if it’s a critical failure. 🔧 eslint
drizzle.config.tsOops! Something went wrong! :( ESLint: 9.18.0 ESLint couldn't find an eslint.config.(js|mjs|cjs) file. From ESLint v9.0.0, the default configuration file is now eslint.config.js. https://eslint.org/docs/latest/use/configure/migration-guide If you still have problems after following the migration guide, please stop by WalkthroughThis pull request introduces a comprehensive SQLite database integration for tracking contributor activities and performance. The changes include setting up database configuration, creating schema definitions, implementing data ingestion scripts, and adding utility functions for querying and scoring contributor data. The implementation leverages Drizzle ORM and provides a structured approach to managing historical contributor information. Changes
Sequence DiagramsequenceDiagram
participant Scripts as Initialization Scripts
participant DB as SQLite Database
participant Ingest as Data Ingestion
participant Queries as Query Modules
Scripts->>DB: Create Database
Scripts->>Ingest: Load Historical Data
Ingest->>DB: Insert Users
Ingest->>DB: Insert Daily Summaries
Ingest->>DB: Insert User Stats
Queries->>DB: Retrieve Contributor Information
Possibly related PRs
Suggested reviewers
Poem
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 10
🧹 Nitpick comments (13)
src/lib/data/types.ts (2)
31-35
: Restrictstate
field to specific valuesIn the
PullRequestSchema
, thestate
field is a string with a default of"open"
. To ensure consistency, consider limiting this field to accepted values like"open"
,"closed"
, or"merged"
.Apply this change to define allowed values:
- state: z.string().optional().default("open"), + state: z.enum(["open", "closed", "merged"]).optional().default("open"),
55-59
: Simplify thestate
field definitionIn the
IssueSchema
, thestate
field is optional with a default value. Theoptional()
is unnecessary since a default is provided.Simplify the schema as follows:
- state: z - .string() - .transform((s) => s.toLowerCase()) - .optional() - .default("open"), + state: z.string().transform((s) => s.toLowerCase()).default("open"),src/lib/data/ingest.ts (2)
133-133
: Remove unnecessarycontinue
statementThe
continue;
at line 133 is redundant since it's the last statement in the loop iteration.Apply this diff to remove it:
console.error( `Error processing contributor ${contributor.contributor} in ${file}:`, contributorError ); - continue;
🧰 Tools
🪛 Biome (1.9.4)
[error] 133-133: Unnecessary continue statement
Unsafe fix: Delete the unnecessary continue statement
(lint/correctness/noUnnecessaryContinue)
72-86
: Simplify total calculationsYou can streamline the calculation of totals using default values for the arrays.
Apply this diff to simplify:
- const totalAdditions = - contributor.activity.code.commits?.reduce( - (sum, commit) => sum + commit.additions, - 0 - ) ?? 0; + const totalAdditions = (contributor.activity.code.commits || []).reduce( + (sum, commit) => sum + commit.additions, + 0 + ); - const totalDeletions = - contributor.activity.code.commits?.reduce( - (sum, commit) => sum + commit.deletions, - 0 - ) ?? 0; + const totalDeletions = (contributor.activity.code.commits || []).reduce( + (sum, commit) => sum + commit.deletions, + 0 + ); - const totalChangedFiles = - contributor.activity.code.commits?.reduce( - (sum, commit) => sum + commit.changed_files, - 0 - ) ?? 0; + const totalChangedFiles = (contributor.activity.code.commits || []).reduce( + (sum, commit) => sum + commit.changed_files, + 0 + );src/lib/data/scoring.ts (1)
130-135
: Adjust fuzzy match threshold for better accuracyThe current threshold in
fuzzyMatch
may be too lenient, potentially causing false positives in impact detection.Consider tightening the score threshold to improve matching accuracy:
- return result !== null && result.score > -5000; + return result !== null && result.score > -1000;drizzle.config.ts (1)
10-10
: Set 'verbose' logging based on environmentEnabling
verbose: true
may flood logs in production. It's better to control verbosity via environment variables.Example adjustment:
export default defineConfig({ // ... verbose: process.env.NODE_ENV !== "production", });scripts/init-db.ts (2)
12-12
: Remove unnecessary 'process.exit(0)' callThe explicit call to
process.exit(0)
is redundant since the script will exit naturally after completion.Apply this change:
console.log("Done!"); - process.exit(0);
15-18
: Handle errors with more contextEnhance error logging to provide more insight into failures during database initialization.
Consider including error stack traces:
main().catch((error) => { - console.error("Error:", error); + console.error("Initialization error:", error.stack || error); process.exit(1); });src/lib/data/db.ts (1)
15-17
: Enhance cleanup with error handling.Add error handling for database closure to ensure proper cleanup.
-process.on("exit", () => { - sqlite.close(); -}); +process.on("exit", () => { + try { + sqlite.close(); + } catch (error) { + console.error("Error closing database:", error); + } +});src/lib/data/queries.ts (1)
74-96
: Consider pagination for top contributors query.The correlated subquery with LIMIT might not perform well with large datasets.
Consider implementing cursor-based pagination using the score and username as cursor fields.
drizzle/0000_aromatic_slipstream.sql (2)
8-23
: Consider adding ON DELETE CASCADE for user dependencies.The foreign key to users table should probably cascade deletions to clean up related records when a user is deleted.
- FOREIGN KEY (`username`) REFERENCES `users`(`username`) ON UPDATE no action ON DELETE no action + FOREIGN KEY (`username`) REFERENCES `users`(`username`) ON UPDATE CASCADE ON DELETE CASCADE
25-26
: Consider using INTEGER for timestamps.Currently using TEXT for timestamps. SQLite's INTEGER timestamps are more efficient for date/time operations and sorting.
- `created_at` text DEFAULT CURRENT_TIMESTAMP NOT NULL, - `last_updated` text DEFAULT CURRENT_TIMESTAMP NOT NULL, + `created_at` integer DEFAULT (strftime('%s', 'now')) NOT NULL, + `last_updated` integer DEFAULT (strftime('%s', 'now')) NOT NULL,Also applies to: 55-55, 4-4, 33-33, 38-38, 270-276
README.md (1)
74-89
: Enhance database setup documentation.Consider adding these important details to the documentation:
- Database file location and naming convention
- Schema migration commands for future updates
- Data backup recommendations
Example additions:
This will: - Create a SQLite database in the `data/` directory - Set up the required tables and schema - Ingest historical contributor data + +### Database Management + +Database file: `data/contributor.db` + +#### Schema Migrations +```bash +# Generate migration +bun run db:generate + +# Apply migration +bun run init-db +``` + +#### Backups +It's recommended to backup the database file daily. Example: +```bash +cp data/contributor.db data/backups/contributor_$(date +%Y%m%d).db +```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
bun.lockb
is excluded by!**/bun.lockb
📒 Files selected for processing (16)
.gitignore
(1 hunks)README.md
(1 hunks)drizzle.config.ts
(1 hunks)drizzle/0000_aromatic_slipstream.sql
(1 hunks)drizzle/meta/0000_snapshot.json
(1 hunks)drizzle/meta/_journal.json
(1 hunks)package.json
(2 hunks)scripts/init-db.ts
(1 hunks)src/lib/data/db.ts
(1 hunks)src/lib/data/ingest.ts
(1 hunks)src/lib/data/queries.ts
(1 hunks)src/lib/data/schema.ts
(1 hunks)src/lib/data/scoring.ts
(1 hunks)src/lib/data/types.ts
(1 hunks)src/lib/date-utils.ts
(1 hunks)tsconfig.json
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- drizzle/meta/_journal.json
🧰 Additional context used
🪛 Biome (1.9.4)
src/lib/data/ingest.ts
[error] 133-133: Unnecessary continue statement
Unsafe fix: Delete the unnecessary continue statement
(lint/correctness/noUnnecessaryContinue)
🔇 Additional comments (6)
.gitignore (1)
40-43
: LGTM! Comprehensive coverage of SQLite files.All necessary SQLite database files are properly ignored.
package.json (2)
10-13
: LGTM! Well-organized scripts for database management.The new scripts provide a clear workflow for database operations:
check
: Runs linting and type checkinginit-db
: Initializes the databasedb:generate
: Generates database files
26-27
: Verify package versions for security and compatibility.The new dependencies need version verification:
Also applies to: 37-37, 42-42
✅ Verification successful
Dependencies are up-to-date with minor update available
- All dependencies are using latest versions except
@types/bun
(1.1.17 → 1.1.18)- Consider updating
@types/bun
in a future maintenance cycle🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check for latest versions and security advisories echo "Checking drizzle-orm..." curl -s "https://registry.npmjs.org/drizzle-orm" | jq -r '.["dist-tags"].latest' echo "Checking fuzzysort..." curl -s "https://registry.npmjs.org/fuzzysort" | jq -r '.["dist-tags"].latest' echo "Checking @types/bun..." curl -s "https://registry.npmjs.org/@types/bun" | jq -r '.["dist-tags"].latest' echo "Checking drizzle-kit..." curl -s "https://registry.npmjs.org/drizzle-kit" | jq -r '.["dist-tags"].latest' # Check for known vulnerabilities echo "Checking for vulnerabilities..." curl -s "https://api.snyk.io/v1/vuln/npm:[email protected]" || echo "No vulnerabilities found" curl -s "https://api.snyk.io/v1/vuln/npm:[email protected]" || echo "No vulnerabilities found"Length of output: 926
Script:
#!/bin/bash # Check for vulnerabilities using npm audit echo "Creating temporary package.json..." cat << EOF > /tmp/package.json { "dependencies": { "drizzle-orm": "0.38.4", "fuzzysort": "3.1.0", "@types/bun": "1.1.17", "drizzle-kit": "0.30.2" } } EOF echo "Running npm audit..." cd /tmp && npm audit --json || echo "No vulnerabilities found" echo "Checking @types/bun changelog..." curl -s "https://registry.npmjs.org/@types/bun" | jq -r '.versions["1.1.18"].dist.tarball' | xargs curl -s | tar -xzO package/CHANGELOG.md || echo "No changelog found"Length of output: 1285
tsconfig.json (1)
30-31
: LGTM! Correct TypeScript configuration for database scripts.The addition of
scripts/**/*.ts
ensures proper TypeScript compilation for database management scripts.drizzle/0000_aromatic_slipstream.sql (1)
1-6
: LGTM! Well-structured tags table.The tags table has appropriate columns and constraints.
drizzle/meta/0000_snapshot.json (1)
1-457
: LGTM! Schema metadata is consistent with SQL definition.The JSON metadata accurately reflects the database schema structure.
id: text("id").primaryKey(), // username_date | ||
username: text("username").references(() => users.username), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add notNull()
constraint to foreign key username
In the userDailySummaries
table, the username
field references users.username
but lacks a notNull()
constraint. Adding notNull()
ensures data integrity by preventing null values in this foreign key.
Apply this diff to add the constraint:
- username: text("username").references(() => users.username),
+ username: text("username").notNull().references(() => users.username),
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
id: text("id").primaryKey(), // username_date | |
username: text("username").references(() => users.username), | |
id: text("id").primaryKey(), // username_date | |
username: text("username").notNull().references(() => users.username), |
username: text("username").references(() => users.username), | ||
tag: text("tag").references(() => tags.name), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add notNull()
constraint to foreign keys username
and tag
In the userTagScores
table, the username
and tag
fields reference other tables but lack notNull()
constraints. To ensure referential integrity, consider adding notNull()
to these fields.
Apply this diff to add the constraints:
- username: text("username").references(() => users.username),
- tag: text("tag").references(() => tags.name),
+ username: text("username").notNull().references(() => users.username),
+ tag: text("tag").notNull().references(() => tags.name),
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
username: text("username").references(() => users.username), | |
tag: text("tag").references(() => tags.name), | |
username: text("username").notNull().references(() => users.username), | |
tag: text("tag").notNull().references(() => tags.name), |
(pr as { author_contributions_count?: number }) | ||
.author_contributions_count ?? 0; | ||
const isNewContributor = authorContributions <= 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensure 'author_contributions_count' exists on PR objects
The property author_contributions_count
may not exist on the PR
object as defined. Casting pr
to include this property without assurance can lead to incorrect values or runtime errors.
Consider obtaining author_contributions_count
from the correct source or modifying the data model accordingly.
(issue as { related_prs?: PR[] }).related_prs?.length ?? 0; | ||
if (relatedPRs > 0) { | ||
points += config.issueReferenceBonus * relatedPRs; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Verify existence of 'related_prs' in Issue objects
The code assumes that issue
has a related_prs
property, which may not be present in the Issue
type. This could result in undefined behavior.
Ensure that related_prs
is a valid property of Issue
or adjust the data retrieval method to access related PRs appropriately.
const calculateCommitPoints = ( | ||
commit: Commit, | ||
config: ScoringConfig | ||
): number => { | ||
const points = config.baseCommitPoints; | ||
|
||
// Add impact multiplier for commits | ||
const multiplier = calculateImpactMultiplier( | ||
{ | ||
number: 0, // Placeholder | ||
title: commit.message ?? "", | ||
state: "closed", | ||
merged: true, | ||
created_at: commit.created_at, | ||
updated_at: commit.created_at, | ||
body: commit.message ?? "", | ||
files: [ | ||
{ | ||
path: commit.sha, | ||
additions: commit.additions, | ||
deletions: commit.deletions, | ||
}, | ||
], | ||
reviews: [], | ||
comments: [], | ||
}, | ||
config | ||
); | ||
|
||
return points * multiplier; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Refactor to avoid constructing placeholder PR objects
Creating a placeholder PR object within calculateCommitPoints
to reuse calculateImpactMultiplier
is a code smell and may lead to maintenance challenges.
Consider refactoring calculateImpactMultiplier
to accept a generic type that accommodates both PRs and commits, or implement a separate impact calculation for commits.
Example modification:
- const calculateImpactMultiplier = (pr: PR, config: ScoringConfig): number => {
+ const calculateImpactMultiplier = (item: { files?: { path: string }[]; title?: string }, config: ScoringConfig): number => {
Committable suggestion skipped: line range outside the PR's diff.
const sqlite = new Database(path.join(process.cwd(), "data/db.sqlite"), { | ||
create: true, | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add error handling for database initialization.
The database initialization should include try-catch blocks to handle potential file system errors.
-const sqlite = new Database(path.join(process.cwd(), "data/db.sqlite"), {
- create: true,
-});
+try {
+ const sqlite = new Database(path.join(process.cwd(), "data/db.sqlite"), {
+ create: true,
+ });
+} catch (error) {
+ console.error("Failed to initialize database:", error);
+ process.exit(1);
+}
Committable suggestion skipped: line range outside the PR's diff.
const dateMatch = filename.match(/\d{4}[-_]\d{2}[-_]\d{2}/); | ||
return dateMatch ? denormalizeDate(dateMatch[0]) : null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add date validation to extractDateFromFilename.
The function should validate extracted dates using the existing isValidDateString
function.
- const dateMatch = filename.match(/\d{4}[-_]\d{2}[-_]\d{2}/);
- return dateMatch ? denormalizeDate(dateMatch[0]) : null;
+ const dateMatch = filename.match(/\d{4}[-_]\d{2}[-_]\d{2}/);
+ if (!dateMatch) return null;
+ const normalizedDate = denormalizeDate(dateMatch[0]);
+ return isValidDateString(normalizedDate) ? normalizedDate : null;
Committable suggestion skipped: line range outside the PR's diff.
const prs = JSON.parse(summary.pullRequests); | ||
return prs.slice(0, limit); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add error handling for JSON parsing.
JSON.parse can throw errors and should be wrapped in try-catch.
- const prs = JSON.parse(summary.pullRequests);
- return prs.slice(0, limit);
+ try {
+ const prs = JSON.parse(summary.pullRequests);
+ return prs.slice(0, limit);
+ } catch (error) {
+ console.error("Failed to parse pull requests:", error);
+ return [];
+ }
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
const prs = JSON.parse(summary.pullRequests); | |
return prs.slice(0, limit); | |
try { | |
const prs = JSON.parse(summary.pullRequests); | |
return prs.slice(0, limit); | |
} catch (error) { | |
console.error("Failed to parse pull requests:", error); | |
return []; | |
} |
const commits = JSON.parse(summary.commits); | ||
return commits.slice(0, limit); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add error handling for commits JSON parsing.
Similar to PRs, commits parsing needs error handling.
- const commits = JSON.parse(summary.commits);
- return commits.slice(0, limit);
+ try {
+ const commits = JSON.parse(summary.commits);
+ return commits.slice(0, limit);
+ } catch (error) {
+ console.error("Failed to parse commits:", error);
+ return [];
+ }
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
const commits = JSON.parse(summary.commits); | |
return commits.slice(0, limit); | |
try { | |
const commits = JSON.parse(summary.commits); | |
return commits.slice(0, limit); | |
} catch (error) { | |
console.error("Failed to parse commits:", error); | |
return []; | |
} |
CREATE TABLE `user_tag_scores` ( | ||
`id` text PRIMARY KEY NOT NULL, | ||
`username` text, | ||
`tag` text, | ||
`score` real DEFAULT 0 NOT NULL, | ||
`level` integer DEFAULT 0 NOT NULL, | ||
`progress` real DEFAULT 0 NOT NULL, | ||
`points_to_next` real DEFAULT 0 NOT NULL, | ||
`last_updated` text DEFAULT CURRENT_TIMESTAMP NOT NULL, | ||
FOREIGN KEY (`username`) REFERENCES `users`(`username`) ON UPDATE no action ON DELETE no action, | ||
FOREIGN KEY (`tag`) REFERENCES `tags`(`name`) ON UPDATE no action ON DELETE no action | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add unique constraint and index for user_tag_scores.
- Add a unique constraint on (username, tag) to prevent duplicate scores
- Add an index on the tag column for faster lookups
+CREATE UNIQUE INDEX `idx_user_tag_scores_unique` ON `user_tag_scores` (`username`, `tag`);
+CREATE INDEX `idx_user_tag_scores_tag` ON `user_tag_scores` (`tag`);
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
CREATE TABLE `user_tag_scores` ( | |
`id` text PRIMARY KEY NOT NULL, | |
`username` text, | |
`tag` text, | |
`score` real DEFAULT 0 NOT NULL, | |
`level` integer DEFAULT 0 NOT NULL, | |
`progress` real DEFAULT 0 NOT NULL, | |
`points_to_next` real DEFAULT 0 NOT NULL, | |
`last_updated` text DEFAULT CURRENT_TIMESTAMP NOT NULL, | |
FOREIGN KEY (`username`) REFERENCES `users`(`username`) ON UPDATE no action ON DELETE no action, | |
FOREIGN KEY (`tag`) REFERENCES `tags`(`name`) ON UPDATE no action ON DELETE no action | |
); | |
CREATE TABLE `user_tag_scores` ( | |
`id` text PRIMARY KEY NOT NULL, | |
`username` text, | |
`tag` text, | |
`score` real DEFAULT 0 NOT NULL, | |
`level` integer DEFAULT 0 NOT NULL, | |
`progress` real DEFAULT 0 NOT NULL, | |
`points_to_next` real DEFAULT 0 NOT NULL, | |
`last_updated` text DEFAULT CURRENT_TIMESTAMP NOT NULL, | |
FOREIGN KEY (`username`) REFERENCES `users`(`username`) ON UPDATE no action ON DELETE no action, | |
FOREIGN KEY (`tag`) REFERENCES `tags`(`name`) ON UPDATE no action ON DELETE no action | |
); | |
CREATE UNIQUE INDEX `idx_user_tag_scores_unique` ON `user_tag_scores` (`username`, `tag`); | |
CREATE INDEX `idx_user_tag_scores_tag` ON `user_tag_scores` (`tag`); |
Summary by CodeRabbit
Release Notes
New Features
Database
Development
Documentation