Added Balance Pool Group data for Cardano #145

Open · wants to merge 1 commit into main
11 changes: 4 additions & 7 deletions consensus_decentralization/helper.py
@@ -78,7 +78,7 @@ def get_pool_identifiers(project_name):
or an empty dictionary if no information is available for the project (the relevant file does not exist)
"""
    try:
-        with open(MAPPING_INFO_DIR / f'identifiers/{project_name}.json') as f:
+        with open(MAPPING_INFO_DIR / f'identifiers/{project_name}.json', encoding='utf-8') as f:
            identifiers = json.load(f)
    except FileNotFoundError:
        return dict()
@@ -174,16 +174,13 @@ def write_blocks_per_entity_to_file(output_dir, blocks_per_entity, time_chunks,
:param time_chunks: a list of strings corresponding to the chunks of time that were analyzed
:param filename: the name to be given to the produced file.
"""
-    with open(output_dir / filename, 'w', newline='') as f:
+    with open(output_dir / filename, 'w', newline='', encoding='utf-8') as f:
        csv_writer = csv.writer(f)
        csv_writer.writerow(['Entity \\ Time period'] + time_chunks)  # write header
        for entity, blocks_per_chunk in blocks_per_entity.items():
            entity_row = [entity]
            for chunk in time_chunks:
-                try:
-                    entity_row.append(blocks_per_chunk[chunk])
-                except KeyError:
-                    entity_row.append(0)
+                entity_row.append(blocks_per_chunk.get(chunk, 0))
            csv_writer.writerow(entity_row)
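
Note: the dict.get refactor keeps the output identical while dropping the try/except. A minimal sketch of the new defaulting behavior, using made-up entities and time chunks (not repository data):

blocks_per_entity = {'PoolA': {'2023-01': 10}, 'PoolB': {'2023-01': 3, '2023-02': 7}}
time_chunks = ['2023-01', '2023-02']
for entity, blocks_per_chunk in blocks_per_entity.items():
    # Missing chunks default to 0 via dict.get, so no KeyError handling is needed
    print([entity] + [blocks_per_chunk.get(chunk, 0) for chunk in time_chunks])
# ['PoolA', 10, 0]
# ['PoolB', 3, 7]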


@@ -196,7 +193,7 @@ def get_blocks_per_entity_from_file(filepath):
dictionary with entities (keys) and a list of the number of blocks they produced during each time chunk (values)
"""
blocks_per_entity = defaultdict(dict)
-    with open(filepath, newline='') as f:
+    with open(filepath, newline='', encoding='utf-8') as f:  # Specify encoding to prevent UnicodeDecodeError
        csv_reader = csv.reader(f)
        header = next(csv_reader, None)
        time_chunks = header[1:]
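
Note: the encoding='utf-8' additions matter when entity names contain non-ASCII characters. Without an explicit encoding, Python falls back to the platform's locale encoding (e.g. cp1252 on Windows), which can raise UnicodeDecodeError on read. A minimal sketch with a made-up pool name:

import csv

# Write a row containing a non-ASCII entity name (hypothetical data)
with open('blocks.csv', 'w', newline='', encoding='utf-8') as f:
    csv.writer(f).writerow(['Stakepool Café', 42])

# Reading with the same explicit encoding round-trips cleanly on any platform
with open('blocks.csv', newline='', encoding='utf-8') as f:
    print(next(csv.reader(f)))  # ['Stakepool Café', '42']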
9 changes: 8 additions & 1 deletion consensus_decentralization/metrics/gini.py
@@ -27,4 +27,11 @@ def gini(array):
array = np.sort(array)
index = np.arange(1, array.shape[0] + 1)
n = array.shape[0]
-    return (np.sum((2 * index - n - 1) * array)) / (n * np.sum(array))
+    # Normalize the array to prevent overflow
+    sum_array = np.sum(array)
+    normalized_array = array / sum_array
+    # Calculate the Gini coefficient using the normalized array
+    gini_numerator = np.sum((2 * index - n - 1) * normalized_array)
+    # No need to multiply by sum_array, as it would cancel out in the division
+    gini_coefficient = gini_numerator / n
+    return gini_coefficient

Member (review comment): I think these changes are fine, but two tests now fail because of floating-point differences introduced here. One option is to update the tests to the new values, since they are not actually wrong. Another option would be to round the Gini coefficient (and perhaps all other metrics too) to some fixed number of decimals, e.g. 5, and use the same precision when testing.
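
Note: a minimal sketch of the reviewer's second option, rounding to a fixed precision and comparing at that precision in tests. The helper name, the 5-decimal choice, and the sample values are illustrative assumptions, not the repository's actual tests:

import numpy as np

def gini_rounded(array, decimals=5):
    # Mirrors the PR's normalized computation, then rounds the result
    array = np.sort(np.asarray(array, dtype=float))
    n = array.shape[0]
    index = np.arange(1, n + 1)
    normalized_array = array / np.sum(array)
    return round(np.sum((2 * index - n - 1) * normalized_array) / n, decimals)

# Tests would then compare at the same precision (hypothetical cases):
assert gini_rounded(np.array([1, 1, 1, 1])) == 0.0    # perfect equality
assert gini_rounded(np.array([0, 0, 0, 10])) == 0.75  # maximal inequality for n=4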