Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create predictions only for non-labeled images #51

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

olgaliak
Copy link
Owner

@olgaliak olgaliak commented Feb 6, 2019

When we're doing Active Learning "cycle" model will be used to get predictions for bbox locations. Currently it's done even for images that's been already reviewed by human experts.
This change excludes those reviewed ("tagged") images.

Unit test has been added to test the change.
run_all_test.py also is passing.

@olgaliak olgaliak requested a review from yashpande February 6, 2019 06:46
@yashpande
Copy link
Collaborator

Hi Olga! Long time no see 😄 The only suggestion I would have here is that in train/create_predictions.py, line 158, you are doing all_images = np.concatenate((all_images,all_images_this_folder), axis=0). The np.concatenate allocates a new array with memory equal to the sum of the arrays it is concatenating, then makes a copy of all the data in both existing arrays into the new array. This makes concatenating one folder at a time a very time-and-memory-intensive task. Two ways of improving this would be:

  1. Instead of get_images_for_prediction returning an np array, it can simply return a list of filenames. Then, this list of filenames can keep being appended to, and once get_images_for_prediction has been run on every folder, you can simply allocate one np array with enough space for all the images from all the folders, then populate that np array. I would suggest this option, since it requires the least additional memory (keep in mind that np.concatente essentially requires twice as much memory because you need space for the existing arrays and the new ones. This sounds super stupid, but the whole reason I am creating np.zeros arrays in the first place instead of creating a list and then converting it into an np array is because I was facing out of memory errors when making a list first (even on the 128GB DSVM).

  2. Create a list of np arrays that you append to when you iterate through the folders i.e. all_images = [] then all_images.append(get_images_for_prediction(..)). Then at the end of the iteration you can do all_images = np.concatenate(all_images). This still takes twice as much memory, but it at least saves a lot of time because you only do the concatenation once instead of once for each folder.

Let me know if you have any questions/comments!

if (all_images == None):
all_images = all_images_this_folder
else:
all_images = np.concatenate((all_images,all_images_this_folder), axis=0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest either getting rid of the np.concatenate or doing it at the end instead of at each folder.

@olgaliak olgaliak requested a review from gadoming February 12, 2019 05:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants