DBMS Assignment 2022-2023 / P20074, P20199, P20220 #225

Open
wants to merge 25 commits into
base: master
from

vassilikikrg

Pull Request from P20074,P20199,P20220

For issue 1 (15/50 pts)

The AND, OR, NOT, and BETWEEN operators are supported in select where and delete where statements.

  • NOT operator

To implement the not keyword we changed the _parse_condition function: if the condition contains the not keyword, we remove it from the condition and set the not_condition flag to true. In that case the function returns the inverse of the operator op instead of op itself (reverse_op_not function in misc.py).

Example query:
select * from instructor where not name=wu
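
A minimal sketch of the operator reversal described above. Only the names _parse_condition, not_condition and reverse_op_not come from this PR; the `NEGATED_OP` table, the simplified `parse_condition` stand-in, and the availability of a `!=` operator are assumptions made for illustration.

```python
# Hypothetical mapping from each comparison operator to its logical negation.
# Assumption: the engine can evaluate '!='.
NEGATED_OP = {
    '=': '!=',
    '!=': '=',
    '<': '>=',
    '>=': '<',
    '>': '<=',
    '<=': '>',
}


def reverse_op_not(op):
    """Return the operator that is true exactly when `op` is false."""
    return NEGATED_OP[op]


def parse_condition(condition):
    """Split e.g. 'not name=wu' into (column, operator, value).

    Simplified stand-in for _parse_condition: a leading 'not ' is stripped
    and the operator is replaced by its negation.
    """
    condition = condition.strip()
    not_condition = condition.lower().startswith('not ')
    if not_condition:
        condition = condition[4:].strip()
    # Two-character operators are tried first so '>=' is not read as '>'.
    for op in ('>=', '<=', '!=', '=', '>', '<'):
        if op in condition:
            left, right = condition.split(op, 1)
            if not_condition:
                op = reverse_op_not(op)
            return left.strip(), op, right.strip()
    raise ValueError(f'no operator found in condition: {condition!r}')
```

With this sketch, `parse_condition('not name=wu')` yields the column `name`, the operator `!=` and the value `wu`, so the rest of the select path can stay unchanged.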

  • BETWEEN operator

Inside the _select_where/_delete_where functions of table.py we check whether the string between appears in the condition and whether it has the correct syntax (column between value1 and value2).
Then, having extracted the lower and upper bounds and the corresponding column from the condition, we enumerate the rows linearly and pick those whose value falls between the two bounds.

Example query:
select * from instructor where salary between 45000 and 80000
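
The validation and linear scan can be sketched roughly as follows. The function names `parse_between` and `select_between`, the list-of-lists row layout, and the conversion of both bounds to float are assumptions for illustration, not the code in table.py.

```python
def parse_between(condition):
    """Validate 'column between value1 and value2' and return its parts."""
    parts = condition.split()
    if (len(parts) != 5
            or parts[1].lower() != 'between'
            or parts[3].lower() != 'and'):
        raise ValueError('expected: column between value1 and value2')
    column, _, low, _, high = parts
    try:
        # Assumption: the bounds are numeric.
        return column, float(low), float(high)
    except ValueError:
        raise ValueError('between bounds are not valid numbers') from None


def select_between(rows, column_names, condition):
    """Return the indexes of rows whose value lies in [value1, value2]."""
    column, low, high = parse_between(condition)
    col_idx = column_names.index(column)
    # Linear scan; both bounds are inclusive.
    return [i for i, row in enumerate(rows) if low <= row[col_idx] <= high]
```

For example, with the rows `[['wu', 90000], ['mozart', 40000], ['einstein', 80000]]` and the columns `['name', 'salary']`, the condition `salary between 45000 and 80000` selects only the row at index 2.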

  • AND Operator:

Inside the _select_where/_delete_where functions of table.py, we split the condition into two or more conditions (on each " and ") and collect the matching rows for each one.
After the loop, using set() we keep the rows that appear in all_rows n times, where n is the number of distinct conditions, because this means that the row satisfies all of the conditions.

Example query execution:

select * from instructor where not name=mozart and salary<70000
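
A rough sketch of the counting idea behind AND. It uses collections.Counter rather than the set() call the PR mentions, and the `match` predicate, the `OPS` table and the dict-based rows are hypothetical helpers that exist only to make the example self-contained.

```python
import operator
from collections import Counter

# Hypothetical operator table; '!=' is listed before '=' so it is found first.
OPS = {'!=': operator.ne, '<': operator.lt, '>': operator.gt, '=': operator.eq}


def match(row, cond):
    """Evaluate one simple condition such as 'salary<70000' against a row."""
    for symbol, fn in OPS.items():
        if symbol in cond:
            column, value = (part.strip() for part in cond.split(symbol, 1))
            cell = row[column]
            # Cast the literal to the type of the stored cell before comparing.
            return fn(cell, type(cell)(value))
    raise ValueError(f'no operator found in condition: {cond!r}')


def select_and(rows, condition):
    """Return indexes of rows that satisfy every ' and '-separated condition."""
    conditions = condition.split(' and ')
    all_rows = []
    for cond in conditions:
        all_rows.extend(i for i, row in enumerate(rows) if match(row, cond))
    # A row that appears once per condition satisfied all of them.
    counts = Counter(all_rows)
    return sorted(i for i, n in counts.items() if n == len(conditions))
```

Note that a plain split on " and " would also cut a between clause in two, so a between condition would need to be recognised before the split.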

  • OR Operator:

OR works similarly to AND. It collects all rows that match any of the individual conditions and filters out the duplicates (rows that satisfy more than one condition), keeping each such row once.

Example query execution:

select * from instructor where name=mozart or salary>70000
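
The duplicate filtering for OR might look like the sketch below. It assumes each individual condition has already been evaluated into a list of row indexes; the function name `select_or` and that input shape are illustrative assumptions.

```python
def select_or(matches_per_condition):
    """Union the row indexes matched by each condition, keeping each once.

    `matches_per_condition` is a list with one list of row indexes per
    condition (hypothetical input shape).
    """
    seen = set()
    result = []
    for matches in matches_per_condition:
        for idx in matches:
            if idx not in seen:  # a row matching several conditions is kept once
                seen.add(idx)
                result.append(idx)
    return result
```

If the first condition matches row 1 and the second matches rows 0 and 1, the call `select_or([[1], [0, 1]])` returns `[1, 0]`: row 1 is reported once even though both conditions hold for it.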


For issue 2 (20/50 pts)

Note: Since two index types are now supported, we added an extra column named index_type to the meta_indexes table.

  • Support for the unique keyword (10/50)

When any column is declared as unique in the create table statement, e.g.

create table parents(id str primary key, name str , telephone str unique, email str unique)

and then add some records:

insert into parents values('11111','john smith','6977889900','[email protected]')
insert into parents values('22222','Ben Collins','6912222222','[email protected]')
insert into parents values('33333','Phil Green','6912224424','[email protected]')

when we attempt to insert a record that reuses an existing value in a unique column, e.g.

insert into parents values('44444','Rachel Green','6912224424','[email protected]')

we observe that the record is not inserted.
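
The check performed on insert can be sketched like this. The function `insert_row` and its arguments are hypothetical and only illustrate the constraint; the PR implements the check inside the _insert function of table.py.

```python
def insert_row(rows, column_names, unique_columns, new_row):
    """Append `new_row` unless it repeats a value in any unique column."""
    for column in unique_columns:
        idx = column_names.index(column)
        if any(row[idx] == new_row[idx] for row in rows):
            raise ValueError(
                f'duplicate value {new_row[idx]!r} violates the unique '
                f'constraint on column {column!r}'
            )
    rows.append(list(new_row))


columns = ['id', 'name', 'telephone']
parents = []
insert_row(parents, columns, ['telephone'], ['11111', 'john smith', '6977889900'])
insert_row(parents, columns, ['telephone'], ['22222', 'Ben Collins', '6912222222'])
insert_row(parents, columns, ['telephone'], ['33333', 'Phil Green', '6912224424'])
```

A further call with the telephone `6912224424` raises ValueError and leaves the table at three rows, which mirrors the rejected insert above. The linear scan is the simplest option; an index on the unique column would avoid scanning every row.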

Creating a btree on a unique column is now also supported (the column on which the btree is built is given as table(column)).
Example:

create index parentBtree on parents(email) using btree

The index now exists.

  • Hash index over the PK or a unique column of a table (10/50)

    Implementation of the extendible hashing variant described in the book 'Database System Concepts' by Avi Silberschatz, Henry F. Korth, and S. Sudarshan (pages 1197-1203)

Example query:

create index parentHashIndex on parents(email) using hash

The index has now been created.

Searching via a hash index is now also supported (function _select_where_with_hash in table.py), provided that the search condition is an equality condition.
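
For readers unfamiliar with extendible hashing, here is a compact sketch of the general technique. It is not the hash.py from this PR: the class and method names are illustrative, and it addresses the directory with the low-order bits of Python's built-in hash() instead of the hash prefix used in the book.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.items = {}  # key -> row pointer


class ExtendibleHash:
    def __init__(self, bucket_capacity=2):
        self.capacity = bucket_capacity
        self.global_depth = 0
        self.directory = [Bucket(0)]  # 2**global_depth slots

    def _slot(self, key):
        # Directory slot = low `global_depth` bits of the hash value.
        return hash(key) & ((1 << self.global_depth) - 1)

    def find(self, key):
        """Equality lookup: return the stored pointer or None."""
        return self.directory[self._slot(key)].items.get(key)

    def insert(self, key, pointer):
        bucket = self.directory[self._slot(key)]
        if key in bucket.items or len(bucket.items) < self.capacity:
            bucket.items[key] = pointer
            return
        self._split(bucket)
        # Retry; another split may follow if the keys still collide.
        # Assumption: no more than `capacity` keys share one full hash value.
        self.insert(key, pointer)

    def _split(self, bucket):
        if bucket.local_depth == self.global_depth:
            # Double the directory; slot i and slot i + old_size share a bucket.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        new_bit = 1 << (bucket.local_depth - 1)
        # Slots of the old bucket whose newly examined bit is 1 move to the sibling.
        for slot, current in enumerate(self.directory):
            if current is bucket and slot & new_bit:
                self.directory[slot] = sibling
        # Redistribute the old entries between the bucket and its sibling.
        old_items, bucket.items = bucket.items, {}
        for key, pointer in old_items.items():
            self.directory[self._slot(key)].items[key] = pointer


index = ExtendibleHash(bucket_capacity=2)
for row_number in range(20):
    index.insert(row_number, row_number * 10)
```

Because a lookup hashes the key straight to one bucket, the structure only helps equality conditions, which matches the restriction stated above; range conditions still need a btree or a linear scan.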


For issue 3 (30/50 pts)

  • 3a. Equivalent query plans based on respective RA expressions (10/50)

We created a module (equivalentQueries.py) that takes the dictionary returned by the interpret function (mdb.py) and, using the relational algebra (RA) rules, converts it into equivalent queries (also represented as dictionaries).

Example execution:

equivalent of select * from classroom inner join department on building=building where classroom.building=Watson and classroom.capacity>30
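
The general idea of enumerating equivalent plans can be sketched as below. The plan layout (dicts with the keys `op`, `cond`, `input`, `left`, `right`) is a hypothetical format chosen for the example and is not the dictionary produced by interpret; only two RA rules are shown, the selection cascade and join commutativity.

```python
import json


def split_selection(plan):
    """sigma[c1 and c2](E)  ->  sigma[c1](sigma[c2](E))"""
    if plan['op'] == 'select' and ' and ' in plan['cond']:
        first, rest = plan['cond'].split(' and ', 1)
        inner = {'op': 'select', 'cond': rest, 'input': plan['input']}
        return {'op': 'select', 'cond': first, 'input': inner}
    return None


def swap_join(plan):
    """E1 join E2  ->  E2 join E1"""
    if plan['op'] == 'join':
        return {**plan, 'left': plan['right'], 'right': plan['left']}
    return None


RULES = (split_selection, swap_join)


def rewrites(plan):
    """Yield every plan reachable by applying one rule at one node."""
    if not isinstance(plan, dict):
        return  # a leaf is a plain table name
    for rule in RULES:
        new_plan = rule(plan)
        if new_plan is not None:
            yield new_plan
    for child in ('input', 'left', 'right'):
        if child in plan:
            for new_child in rewrites(plan[child]):
                yield {**plan, child: new_child}


def equivalent_queries(plan):
    """Collect all distinct rewrites; the `seen` set prevents cycles."""
    seen = {json.dumps(plan, sort_keys=True)}
    frontier = [plan]
    result = []
    while frontier:
        for new_plan in rewrites(frontier.pop()):
            key = json.dumps(new_plan, sort_keys=True)
            if key not in seen:
                seen.add(key)
                result.append(new_plan)
                frontier.append(new_plan)
    return result


query = {
    'op': 'select',
    'cond': 'classroom.building=Watson and classroom.capacity>30',
    'input': {'op': 'join', 'cond': 'building=building',
              'left': 'classroom', 'right': 'department'},
}
plans = equivalent_queries(query)
```

Serialising each plan with json.dumps gives a hashable key, so a rule that undoes another (swapping a join twice, for instance) cannot send the search into an infinite loop.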

Antonyfrtz and others added 25 commits January 16, 2023 14:59
- Checks if the NOT keyword is present by splitting the condition on whitespace and evaluating
the number of items from the split. If there is more than one, the not keyword is in the query
and we trim the 'not ' part of the condition, and reverse the operator with a modification of the built-in function
in the misc.py file
- Fixed operators in the misc.py implementation of reverse_op
- Added comments to sections of code
-In database.py file: we check if the condition contains the keyword "between".
If so, the condition has the format "column between value1 and value2"
and thus we assign the column name to the condition_column value
-In table.py file: if the condition contains the keyword "between", the condition is split into a list using whitespace as separator.
If the list length is different from 5 (correct format is: column between value1 and value2) or 'between' and 'and' aren't in the correct positions, an exception is raised.
Else, column values are grouped into a list, and then we define a new list (named "rows") in which each value is greater than or equal to value1
and less than or equal to value2 (value1 and value2 are provided by the user in the between statement).
In case of error, we raise ValueError because value1 and value2 are not valid.
+ started working on unique constraint

Other changes
-reworked NOT operation
-replaced removesuffix with replace function to be compatible with more py versions
-wrote new reverse operator function so it doesn't mess with the one built for join
-in table.py: added try-catch block for handling primary key exceptions
- AND statement now works for multiple ANDs and will not produce duplicate values
- fixed bug where unique keys would only be found if they were the last argument
- also changed removeprefix to replace
-added 'unique' keyword to create_table function of database.py
-fixed non initialized list self.unique_idx in table.py
-print '#UQ#' for unique columns ( function 'show' in table.py )
-in table.py: check if duplicate key value violates unique
constraint during the insertion of new values in any of the unique columns of a table
-in mdb.py: added 'index_column' keyword in create index statement that
handles both primary key and unique column cases
-in database.py:
1.Alteration of meta_indexes table (init function):
added index_column column in order to save the column on which the index is created
2.Alteration of the elif statement of select function that checks if database contains an index for the table and the column on which we perform the select query
3.Creation of btree index on both pk and unique columns
-> in create_index function: if index_column is not specified by the user,we set pk as the index column(in case there is a pk),
else we check that the provided index_column is a unique column.
we insert a record that also contains index_column to the meta_indexes table and we construct the index for the specified column
-> in _construct_index function: we create the nodes of the btree that contain all the values of the index_column
-> in _has_index function: added index_column to arguments (to be continued)
-in table.py:(_select_where_with_btree function) we find all unique column names and if the column in condition is not a primary key or a unique column,we abort the select
- Code cannot be tested until bug fixes for BTree are implemented - TBD
- Should check whether or not to use an index if select has no specified column
- Also need to handle PK case better
-table.py:(_insert function):fixed violation of unique columns constraint (no duplicates),
(function _select_where_with_btree): if the column in condition is not a primary key or a unique column, abort the btree select else continue searching using index over that column
-smallRelationsInsertFile.sql: changed create table statement for classroom table in order to test the creation of index over the unique column 'capacity'
- Created file hash.py
- Added classes and methods required
- Added h(key) code
Currently able to construct a hash index that uses buckets which are
assigned to by pointers in a hash_prefix dictionary
The values in the buckets are also held in a dictionary

- For database.py:
    - Now supports index keyword of type hash in a similar fashion as
for btrees
    - A construct_hash_index function has been created to instantiate the
hash class and save it to meta_indexes after inserting values

- For hash.py:
    - Init now takes the key as an argument, and hash attributes are instantiated
here, alongside the required buckets
    - Insert computes a binary value to put in the buckets
    - Bucket class fields instantiated. Values passed into bucket as dictionary instead of list
to keep val-pointer combo

// TODO

Bucket splitting has not yet been implemented and will cause overflow
-split: Implemented the algorithm from the book Database System Concepts by Avi Silberschatz, Henry F. Korth, and S. Sudarshan (pg. 1197-1203)
-database.py:
--added index_type column to meta_indexes table(init function)
--fixed bug in create_index function + altered the insert statement to
meta_indexes table when creating a new index(btree or hash)
in order to keep the index type
--altered _has_index function in order to return the index name and its type
-in hash.py:
--fixed some bugs, implemented the show and search functions of the Hash class,
added a find function to the Bucket class, and added comments
-database.py:
---modifications in select function in order to support the select using either btree or hash index based on the condition
---returned has_index function to its original state
-hash.py: added find function that returns the pointer of the given value
-table.py: implemented select_where_with_hash function that can only be used in case of equality condition
-- Between case is built in to select_where_with_btree
-- AND/OR operators are forced to use linear search as checking the column
for each condition and doing selection would require a massive overhaul
- Rules tested and operational
- Started work on recursive creation of equivalentQueries list
- Equivalent queries can now be found using the equivalentQueries function in mdb.py
- Call this by typing "equivalent of <QUERY>" in the miniDB mdb.py shell

- Fixed bugs in the equivalentQueries function, avoiding cycles in the recursive calls
- Fixed bugs in the equivalentQueries function, avoiding infinite loops in the recursive calls
- Fixed bugs in the equivalentQueries function, relating to the rules for equivalent queries
- delete_where() now supports AND, OR, and NOT operators
Note: this commit has been tested but not thoroughly. Please report any bugs.
In case this messes up other things, please revert to the previous commit.