Bad Words Detector

A project given to the FCI Suez University class of 2021-2025 to test their capabilities in the subject.

Subject: CS342 Automata and Language Theory

Usage

Detects bad words in a .csv file compressed as a .rar archive, using the provided BadWords.csv file, and outputs Excel and .csv files with analytics about the process.
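As a rough illustration of the detection step, the sketch below flags rows whose selected columns contain a listed word. It uses only the standard library `re` module as a stand-in for the Regex filter mode. The function names `build_pattern` and `find_bad_rows` and the sample data are hypothetical and are not taken from the project's code.

```python
import re


def build_pattern(bad_words):
    """Compile one case-insensitive pattern that matches any listed word as a whole word."""
    # Escape regex metacharacters and try longer words first.
    escaped = sorted((re.escape(word) for word in bad_words), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def find_bad_rows(rows, columns, pattern):
    """Return the indices of rows where any selected column matches the pattern."""
    flagged = []
    for index, row in enumerate(rows):
        if any(pattern.search(row[column]) for column in columns):
            flagged.append(index)
    return flagged


# Hypothetical sample data: a tiny word list and three rows.
pattern = build_pattern(["darn", "heck"])
rows = [
    ["1", "what the heck", "fine"],
    ["2", "all good", "nothing here"],
    ["3", "ok", "DARN it"],
]
flagged = find_bad_rows(rows, columns=[1, 2], pattern=pattern)
print(flagged)
```

A single compiled alternation is simple, but it can slow down as the word list grows, which is one reason an Aho-Corasick automaton is a common alternative for large lists.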

Documentation

Required Packages

Python 3.10 or newer

pandas

pyahocorasick

rarfile

openpyxl

pytest
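Before running the project, it can help to confirm that the environment matches the list above. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical. Note that the pyahocorasick package is imported under the module name `ahocorasick`.

```python
import sys
from importlib.util import find_spec

# Import names for the packages listed above (pyahocorasick imports as "ahocorasick").
REQUIRED_MODULES = ["pandas", "ahocorasick", "rarfile", "openpyxl", "pytest"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


python_ok = sys.version_info >= (3, 10)
missing = missing_modules(REQUIRED_MODULES)

print("Python 3.10 or newer:", python_ok)
print("Missing modules:", missing if missing else "none")
```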

How To Use

First method

1- Clone the repo (using HTTPS or SSH) and open it in any IDE you like

2- Compress your .csv file as a .rar archive, then open the args.json file and put the archive's path in the data_file section

3- Open the main.py file and run it
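To make step 2 more concrete, the sketch below prints what an args.json file might look like. Only the data_file key is named in the steps above. Every other key is an assumption that mirrors the long option names of the command-line interface, and the real args.json may be structured differently. The path `./my_data.rar` is a placeholder.

```python
import json

# Assumption: apart from "data_file", these keys are guesses based on the
# command-line option names; check the args.json in the repo for the real layout.
example_args = {
    "data_file": "./my_data.rar",        # placeholder path to your .rar archive
    "bad_words_file": "./BadWords.csv",
    "chunk_size": 150000,
    "filter_mode": "AhoCorasick",
    "processing_mode": "ProcessesPool",
    "columns": "1,2,3",
}

text = json.dumps(example_args, indent=2)
print(text)
```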

Screenshots

Second method

1- Clone the repo (using HTTPS or SSH) and navigate to the project's folder in your terminal

git clone https://github.com/GreenVenom77/Bad_Words_Detector.git

cd Bad_Words_Detector

2- Run the help command to see the arguments

python main.py -h

3- Run the program using the command below

python main.py -d './46,080,374Rows_365Columns.rar' -b './BadWords.csv' -s 150000 -f 'AhoCorasick' -p 'ProcessesPool' -c '1,2,3'
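The `-s` and `-c` values in the command above can be pictured with a small sketch: the column list arrives as one comma-separated string, and the data is read in fixed-size chunks. This uses only the standard library and an in-memory CSV. The helpers `parse_columns` and `iter_chunks` are hypothetical, and whether the project counts columns from 0 or 1 is not stated here.

```python
import csv
import io
from itertools import islice


def parse_columns(spec):
    """Turn a string such as '1,2,3' into the list of integers [1, 2, 3]."""
    return [int(part) for part in spec.split(",") if part.strip()]


def iter_chunks(reader, chunk_size):
    """Yield lists of at most chunk_size rows until the reader is exhausted."""
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


# Hypothetical in-memory CSV standing in for the extracted data file.
sample = io.StringIO("a,b,c\n1,2,3\n4,5,6\n7,8,9\n10,11,12\n")
reader = csv.reader(sample)
header = next(reader)  # skip the header row

chunk_sizes = [len(chunk) for chunk in iter_chunks(reader, chunk_size=3)]
columns = parse_columns("1,2,3")
print(chunk_sizes, columns)
```

Reading in chunks keeps memory use bounded regardless of file size, and each chunk can be handed to a separate worker.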

Help menu

usage: Bad Words Filter App [-h] -d DATA_FILE -b BAD_WORDS_FILE [-s CHUNK_SIZE] [-f {Regex,AhoCorasick}]
                            [-p {MultiThreading,MultiProcessing,ProcessesPool}] [-c COLUMNS]

filter the specified columns from a big compressed csv file the bad words rows.

options:
  -h, --help            show this help message and exit
  -d DATA_FILE, --data_file DATA_FILE
                        The csv file that we will filter
  -b BAD_WORDS_FILE, --bad_words_file BAD_WORDS_FILE
                        The name of bad words file
  -s CHUNK_SIZE, --chunk_size CHUNK_SIZE
                        The chunk size will be processed
  -f {Regex,AhoCorasick}, --filter_mode {Regex,AhoCorasick}
                        The mode of filtering.
  -p {MultiThreading,MultiProcessing,ProcessesPool}, --processing_mode {MultiThreading,MultiProcessing,ProcessesPool}        
                        the concurrent model that will work
  -c COLUMNS, --columns COLUMNS
                        specified columns that will be filtered in format column1,column... like 1,2,3,4
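For readers who want to see how such a help menu can be produced, here is a sketch of an `argparse` parser with the same options. It is a reconstruction from the help text above, not the project's arguments.py. Parsing chunk_size as an integer and leaving the optional arguments without defaults are assumptions.

```python
import argparse


def build_parser():
    """Build a parser whose options mirror the help menu shown above."""
    parser = argparse.ArgumentParser(
        prog="Bad Words Filter App",
        description="filter the specified columns from a big compressed csv file the bad words rows.",
    )
    parser.add_argument("-d", "--data_file", required=True,
                        help="The csv file that we will filter")
    parser.add_argument("-b", "--bad_words_file", required=True,
                        help="The name of bad words file")
    # Assumption: the chunk size is parsed as an integer.
    parser.add_argument("-s", "--chunk_size", type=int,
                        help="The chunk size will be processed")
    parser.add_argument("-f", "--filter_mode", choices=["Regex", "AhoCorasick"],
                        help="The mode of filtering.")
    parser.add_argument("-p", "--processing_mode",
                        choices=["MultiThreading", "MultiProcessing", "ProcessesPool"],
                        help="the concurrent model that will work")
    parser.add_argument("-c", "--columns",
                        help="specified columns that will be filtered in format column1,column... like 1,2,3,4")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args([
    "-d", "./data.rar", "-b", "./BadWords.csv",
    "-s", "150000", "-f", "AhoCorasick", "-p", "ProcessesPool", "-c", "1,2,3",
])
print(args)
```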

Community

Any contribution is very welcome, even a small one.
