DBMS Assignment 2022-2023 / P20074, P20199, P20220 #225

vassilikikrg · 2023-02-20T21:57:58Z

Pull Request from P20074,P20199,P20220

For issue 1 (15/50pts)

The AND, OR, NOT, and BETWEEN operators are supported in select where and delete where expressions.

NOT operator

To implement the not keyword we changed the _parse_condition function: if the condition contains the not keyword, we remove it from the condition and set the not_condition flag to true. In that case the function returns the reversed (negated) operator of op instead of op itself (the reverse_op_not function in misc.py).

Example query:
select * from instructor where not name=wu
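A minimal sketch of the NOT handling described above. The names `NEGATED_OP` and `parse_condition` are hypothetical stand-ins for reverse_op_not and _parse_condition, and the operator set is an assumption, not the actual miniDB code.

```python
# Assumed mapping from each comparison operator to its logical negation.
NEGATED_OP = {
    '=': '!=', '!=': '=',
    '<': '>=', '>=': '<',
    '>': '<=', '<=': '>',
}

def parse_condition(condition):
    """Split 'col op value' into (column, op, value), handling a leading 'not'."""
    condition = condition.strip()
    not_condition = condition.lower().startswith('not ')
    if not_condition:
        # Drop the keyword; the negation is applied to the operator instead.
        condition = condition[4:].strip()
    # Try two-character operators first so '<=' is not read as '<'.
    for op in ('>=', '<=', '!=', '=', '<', '>'):
        if op in condition:
            left, right = condition.split(op, 1)
            if not_condition:
                op = NEGATED_OP[op]
            return left.strip(), op, right.strip()
    raise ValueError(f'no operator found in condition: {condition!r}')
```

With this sketch, `parse_condition('not name=wu')` yields the column `name`, the operator `!=`, and the value `wu`, so the rest of the select path does not need to know that a not keyword was present.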

BETWEEN operator

Inside the _select/delete_where functions of table.py we check whether the string between appears in the condition and whether it has the correct syntax (column between value1 and value2).
Then, having extracted the upper and lower bound and the corresponding column from the condition, we enumerate the rows linearly and select those that satisfy our if (in between) statement.

Example query:
select * from instructor where salary between 45000 and 80000
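The check and the linear scan can be sketched as follows. This is an illustration under assumptions: rows are plain lists, the function names `parse_between` and `select_between` are hypothetical, and the bounds are treated as numeric.

```python
def parse_between(condition):
    """Validate 'column between value1 and value2' and return its three parts."""
    tokens = condition.split()
    if (len(tokens) != 5
            or tokens[1].lower() != 'between'
            or tokens[3].lower() != 'and'):
        raise ValueError(f'malformed between condition: {condition!r}')
    column, _, low, _, high = tokens
    return column, low, high

def select_between(rows, column_names, condition, cast=float):
    """Return the indexes of rows whose column value lies in [low, high]."""
    column, low, high = parse_between(condition)
    idx = column_names.index(column)
    low, high = cast(low), cast(high)
    # Linear scan: both bounds are inclusive, as in SQL BETWEEN.
    return [i for i, row in enumerate(rows) if low <= cast(row[idx]) <= high]
```

For example, with the columns `['name', 'salary']` and the rows `['wu', 90000]`, `['mozart', 40000]`, `['einstein', 75000]`, the condition `salary between 45000 and 80000` selects only the row at index 2.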

AND Operator:

Inside the _select/delete_where functions of table.py, we split the condition into two or more conditions (on the “ and “ separators) and for each one we collect the rows that match it.
After the loop, using set(), we keep the records that appear in all_rows n times, where n is the number of distinct conditions, because that means the row satisfies all of the conditions.

Example query execution:

select * from instructor where not name=mozart and salary<70000
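The counting idea can be sketched as below. This is a simplified illustration, not the table.py code: conditions are assumed to be already parsed into Python predicates, rows are dictionaries, and `rows_matching_all` is a hypothetical name. It uses `collections.Counter` for the counting step, whereas the description above mentions set().

```python
from collections import Counter

def rows_matching_all(rows, predicates):
    """Return indexes of rows that satisfy every predicate (logical AND)."""
    all_rows = []
    for predicate in predicates:
        # Each predicate contributes a row index at most once.
        all_rows.extend(i for i, row in enumerate(rows) if predicate(row))
    counts = Counter(all_rows)
    # A row seen n times matched all n conditions.
    return sorted(i for i, seen in counts.items() if seen == len(predicates))
```

Because every condition contributes a given row index at most once, a count equal to the number of conditions is equivalent to intersecting the per-condition result sets, and each row appears in the output once.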

OR Operator:

The OR operator works similarly to AND. It collects all rows that match any of the distinct conditions and filters out duplicates (rows that satisfy more than one condition), keeping each such row once.

Example query execution:

select * from instructor where name=mozart or salary>70000
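A comparable sketch for OR, under the same assumptions (pre-parsed predicates, rows as dictionaries, and the hypothetical name `rows_matching_any`):

```python
def rows_matching_any(rows, predicates):
    """Return indexes of rows that satisfy at least one predicate (logical OR)."""
    seen = set()
    for predicate in predicates:
        for i, row in enumerate(rows):
            if predicate(row):
                # The set drops rows matched by more than one condition.
                seen.add(i)
    return sorted(seen)
```

A row that satisfies several conditions is added to the set more than once but is reported a single time.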

For issue 2 (20/50pts)

Note: Since two index types are now supported, we added an extra column named index_type to the meta_indexes table.

Support for the unique keyword (10/50)

When we declare any column as unique in the create table statement, e.g.

create table parents(id str primary key, name str , telephone str unique, email str unique)

and then add some records:

insert into parents values('11111','john smith','6977889900','[email protected]')
insert into parents values('22222','Ben Collins','6912222222','[email protected]')
insert into parents values('33333','Phil Green','6912224424','[email protected]')

when we attempt to insert a record with an existing value in a unique column, e.g.

insert into parents values('44444','Rachel Green','6912224424','[email protected]')

we observe that the record is not inserted.
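The rejection can be pictured with the following sketch. It is only an illustration: the exception name `UniqueViolation`, the function `insert_row`, and the list-based table layout are assumptions, not the actual _insert function in table.py.

```python
class UniqueViolation(ValueError):
    """Raised when a new row repeats a value in a unique column (hypothetical name)."""

def insert_row(table_rows, column_names, unique_columns, new_row):
    """Append new_row unless it duplicates a value in any unique column."""
    for column in unique_columns:
        idx = column_names.index(column)
        if any(row[idx] == new_row[idx] for row in table_rows):
            raise UniqueViolation(
                f'duplicate value {new_row[idx]!r} in unique column {column!r}'
            )
    # Only reached when every unique column check passed.
    table_rows.append(new_row)
```

In the example above, the telephone value '6912224424' already exists for Phil Green, so the insert for Rachel Green would be rejected before the row is appended.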

Creating a btree over a unique column is now also supported (the table column on which the btree is built is specified as table(column)).
Example:

create index parentBtree on parents(email) using btree

The index now exists.

Hash index over the PK or a unique column of a table(10/50)

Implementation of the extendible hashing variant described in the book ‘Συστήματα Βάσεων Δεδομένων’ (Database System Concepts) by Avi Silberschatz, Henry F. Korth, and S. Sudarshan (pages 1197-1203).

Example query:

create index parentHashIndex on parents(email) using hash

The index has now been created.

Searching through the hash index is now also supported (the _select_where_with_hash function in table.py), provided that the search condition is an equality condition.
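To make the idea concrete, here is a compact extendible hashing sketch with equality lookup. It is not the hash.py implementation: the class and method names are hypothetical, the directory is a list rather than a hash_prefix dictionary, and it indexes the directory by the low-order bits of Python's built-in `hash` instead of the prefix bits used in the textbook.

```python
class Bucket:
    def __init__(self, local_depth, capacity):
        self.local_depth = local_depth
        self.capacity = capacity
        self.items = {}  # key -> row pointer

class ExtendibleHash:
    def __init__(self, bucket_capacity=2):
        self.global_depth = 0
        self.directory = [Bucket(0, bucket_capacity)]

    def _slot(self, key):
        # Directory slot = low-order global_depth bits of the hash value.
        return hash(key) & ((1 << self.global_depth) - 1)

    def find(self, key):
        """Equality lookup: return the stored pointer, or None if absent."""
        return self.directory[self._slot(key)].items.get(key)

    def insert(self, key, pointer):
        bucket = self.directory[self._slot(key)]
        if key in bucket.items or len(bucket.items) < bucket.capacity:
            bucket.items[key] = pointer
            return
        # Bucket is full: split it and retry (may split more than once).
        # Assumes keys do not all share one full hash value.
        self._split(bucket)
        self.insert(key, pointer)

    def _split(self, bucket):
        if bucket.local_depth == self.global_depth:
            # Double the directory; slots i and i + old_size share a bucket.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth, bucket.capacity)
        new_bit = 1 << (bucket.local_depth - 1)
        # Slots with the newly significant bit set now point to the sibling.
        for i, current in enumerate(self.directory):
            if current is bucket and i & new_bit:
                self.directory[i] = sibling
        # Redistribute the old entries between the two buckets.
        old_items, bucket.items = bucket.items, {}
        for key, pointer in old_items.items():
            self.directory[self._slot(key)].items[key] = pointer
```

A lookup touches exactly one bucket, which is why this kind of index only helps for equality conditions; range conditions would still need a btree or a linear scan.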

For issue 3 (30/50pts)

3a.equivalent query plans based on respective RA expressions (10/50)

We created a module (equivalentQueries.py) that converts the dictionary returned by the interpret function (mdb.py) into equivalent queries (themselves in the form of dictionaries) using the rules of relational algebra.

Example execution:

equivalent of select * from classroom inner join department on building=building where classroom.building=Watson and classroom.capacity>30
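As an illustration of what such a transformation might look like, the sketch below applies two standard relational algebra rules to a query dictionary: commutativity of a conjunctive selection and the cascade of selections, σ(a and b)(R) = σ(a)(σ(b)(R)). The dictionary keys ('select', 'from', 'where') and the function name `equivalent_queries` are assumptions made for the example and may not match the output of interpret or the contents of equivalentQueries.py.

```python
def equivalent_queries(query):
    """Return plans equivalent to a query whose where clause is 'a and b'."""
    conditions = [part.strip() for part in query['where'].split(' and ')]
    if len(conditions) != 2:
        # This sketch only rewrites a single two-way conjunction.
        return [query]
    a, b = conditions
    base = query['from']
    return [
        query,
        # Commutativity of conjunction: a and b == b and a.
        {**query, 'where': f'{b} and {a}'},
        # Cascade of selections: apply b in a subquery, then a on top.
        {**query, 'from': {'select': '*', 'from': base, 'where': b}, 'where': a},
        # Same rule with the two conditions applied in the other order.
        {**query, 'from': {'select': '*', 'from': base, 'where': a}, 'where': b},
    ]
```

A fuller version would apply the rules recursively to each generated plan and keep a set of plans already seen, so that rules such as commutativity do not send the search around in cycles.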

- Checks whether the NOT keyword is present by splitting the condition on whitespace and counting the resulting items. If there is more than one, the not keyword is in the query, so we trim the 'not ' part of the condition and reverse the operator with a modification of the built-in function in the misc.py file
- Fixed operators in the misc.py implementation of reverse_op
- Added comments to sections of code

- In database.py: we check whether the condition contains the keyword "between". If so, the condition has the format "column between value1 and value2", and we assign the column name to the condition_column value
- In table.py: if the condition contains the keyword "between", the condition is split into a list using whitespace as the separator. If the list length is not 5 (the correct format is: column between value1 and value2), or 'between' and 'and' are not in the correct positions, an exception is raised. Otherwise, the column values are grouped into a list, and we define a new list (named "rows") containing each value that is greater than or equal to value1 and less than or equal to value2 (value1 and value2 are provided by the user in the between statement). In case of error, we raise ValueError because value1 and value2 are not valid.

+ started working on unique constraint

Other changes
- reworked NOT operation
- replaced removesuffix with replace function to be compatible with more py versions
- wrote new reverse operator function so it doesn't mess with the one built for join
- in table.py: added try-catch block for handling primary key exceptions

- AND statement now works for multiple ANDs and will not produce duplicate values
- fixed bug where unique keys would only be found if they were the last argument
- also changed removeprefix to replace

- added 'unique' keyword to create_table function of database.py
- fixed non initialized list self.unique_idx in table.py
- print '#UQ#' for unique columns (function 'show' in table.py)

-in table.py: check if duplicate key value violates unique constraint during the insertion of new values in any of the unique columns of a table

- in mdb.py: added 'index_column' keyword in create index statement that handles both primary key and unique column cases
- in database.py:
  1. Alteration of meta_indexes table (init function): added index_column column in order to save the column on which the index is created
  2. Alteration of the elif statement of the select function that checks if the database contains an index for the table and the column on which we perform the select query
  3. Creation of btree index on both pk and unique columns
     - in create_index function: if index_column is not specified by the user, we set the pk as the index column (in case there is a pk); otherwise we check that the provided index_column is a unique column. We insert a record that also contains index_column into the meta_indexes table and we construct the index for the specified column
     - in _construct_index function: we create the nodes of the btree that contain all the values of the index_column
     - in _has_index function: added index_column to arguments (to be continued)
- in table.py (_select_where_with_btree function): we find all unique column names and, if the column in the condition is not a primary key or a unique column, we abort the select

- Code cannot be tested until bug fixes for BTree are implemented
- TBD

- Should check whether or not to use an index if select has no specified column
- Also need to handle PK case better

- table.py (_insert function): fixed violation of unique columns constraint (no duplicates)
- table.py (_select_where_with_btree function): if the column in condition is not a primary key or a unique column, abort the btree select; else continue searching using index over that column
- smallRelationsInsertFile.sql: changed create table statement for classroom table in order to test the creation of index over the unique column 'capacity'

- Created file hash.py
- Added classes and methods required
- Added h(key) code

Currently able to construct a hash index that uses buckets, which are assigned by pointers in a hash_prefix dictionary. The values in the buckets are also held in a dictionary.
- For database.py:
  - Now supports index keyword of type hash in a similar fashion as for btrees
  - A construct_hash_index function has been created to instantiate the hash class and save it to meta_indexes after inserting values
- For hash.py:
  - Init now takes the key as an argument, and hash attributes are instantiated here, alongside the required buckets
  - Insert computes a binary value to put in the buckets
  - Bucket class fields instantiated. Values passed into bucket as dictionary instead of list to keep val-pointer combo

// TODO Bucket splitting has not yet been implemented and will cause overflow

-split: Implemented the algorithm from the book (Database System Concepts, Book by Avi Silberschatz, Henry F. Korth, and S. Sudarshan (pg. 1197-1203))

- database.py:
  - added index_type column to meta_indexes table (init function)
  - fixed bug in create_index function and altered the insert statement to meta_indexes table when creating a new index (btree or hash) in order to keep the index type
  - altered _has_index function in order to return the index name and its type
- in hash.py:
  - fixed some bugs, implemented show and search function of Hash Class and added find function to Bucket class
  - also added comments

- database.py:
  - modifications in select function in order to support the select using either btree or hash index based on the condition
  - returned has_index function to its original state
- hash.py: added find function that returns the pointer of the given value
- table.py: implemented select_where_with_hash function that can only be used in case of equality condition

- Between case is built in to select_where_with_btree
- AND/OR operators are forced to use linear search as checking the column for each condition and doing selection would require a massive overhaul


- Rules tested and operational
- Started work on recursive creation of equivalentQueries list

- Equivalent queries can now be found using the equivalentQueries function in mdb.py
- Call this by typing "equivalent of <QUERY>" in the miniDB mdb.py shell
- Fixed bugs in the equivalentQueries function, avoiding cycles in the recursive calls
- Fixed bugs in the equivalentQueries function, avoiding infinite loops in the recursive calls
- Fixed bugs in the equivalentQueries function, relating to the rules for equivalent queries

- delete_where() now supports AND, OR, and NOT operators

Note: this commit has been tested but not thoroughly. Please report any bugs. In case this messes up other things, please revert to the previous commit.

Antonyfrtz and others added 25 commits January 16, 2023 14:59

between: work in progress

52fddcf

Changed ValueError to TypeError

8f10753

Bug Fixes for unique and AND statement

db52147


General bug fixes related to unique

a6775f2


CREATE TABLE statement enriched with 'unique' kw

de103f3


Worked on BTree Index checking

7da0b33


hasIndex now operational and tested

e9c9996


Created base template for hash index

b570a7d


Basic implementation of hash index split function:

abecf96


Btree select can handle all cases

ebf13bc


fixed small bug on btree index creation

b6bf303

fixed bug in _select_where_with_btree function when between keyword is in condition

05a3eb4

Created basic rules for building equivalent query plans

1485de2

Finished RA equivalence transformation rules

cffa6ac


AND, OR, and NOT operators supported in delete

32cdb0c


Update README.md

54f21bb
