Faces recognised but not shown in People #662
Comments
That would indicate problems with the clustering algorithm. It sounds like it doesn't run at all, though, because it should create at least some clusters, IMO.
Like I said, it did create one cluster, with 12 faces. Those were all the same person, too, so what work it did, it did correctly. But just to clarify, the clustering algorithm should be able to create clusters with even a single face? There's no minimum number of faces that need to match before a cluster is created? Edit: is there any way to run just the clustering job from the command line? With verbosity turned up?
There is a minimum. It's currently at 6 detections.
Not at the moment. You can manually create an entry in oc_jobs in your database, though, for \OCA\Recognize\BackgroundJobs\ClusterFacesJob with the appropriate argument.
I see. That might explain why faces from my photos aren't showing up. Can I ask why this minimum? And is there any way to change it?
@IndrekHaav The clustering algorithm needs a few hyperparameters to work well. From testing it transpired that setting a minimum cluster size improves clustering, because it prevents accidental face matches from agglomerating into larger clusters that don't represent a single person (something I like to call shit clusters). I recommend trying out v3.5.0 first to see if that improves the situation for you, since we're shipping a new clustering algorithm with that release. If that doesn't help and you're adventurous, you can change the min cluster size constant here:
(We've reduced the value from 6 to 5 in v3.5.0 now.) In v3.5.0 there are also convenience occ commands for resetting clustering and running clustering manually.
Thanks for the response! I tried the new algorithm in 3.5.0. Just to be safe, I wiped all detected faces and clusters from the DB and triggered a full re-crawl. This time it created a few more people, but the vast majority of faces were put into a single cluster, seemingly almost randomly. I retried this a few times and also reran the clustering job, with the same result. In other words, a shit cluster.

Having incorrect faces in the cluster wouldn't be so bad, as they can be removed in the UI, but the problem is that there's no way (at least that I could see, in the Photos or Memories apps) to move photos from one person to a new person. One can only move them to an existing person, or remove them from the cluster completely. I tried the latter, but subsequent clustering runs added the removed faces back to the same cluster. I also tried changing the minimum cluster size constant mentioned above.

While I'm messing with the code, is there another constant or parameter that determines how similar the faces have to be to get clustered together?
How many images do you have?
that would be a bug
yeah, that makes sense
There is no constant value that governs this. HDBSCAN is an adaptive algorithm that learns from the density patterns of the data. The more files you have, the better the outcome. You could also try playing with MIN_SAMPLE_SIZE.
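For illustration, the same knobs exist in scikit-learn's HDBSCAN implementation. Recognize's own clustering is written in PHP, so this is only an analogy, and the parameter values below are examples rather than the app's defaults; the random embeddings stand in for real face vectors.

```python
# Illustration only: min_cluster_size plays roughly the role of MIN_CLUSTER_SIZE,
# min_samples the role of MIN_SAMPLE_SIZE. Requires scikit-learn >= 1.3.
import numpy as np
from sklearn.cluster import HDBSCAN

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 128))  # stand-in for 128-d face embeddings

# min_cluster_size: groups smaller than this are not kept as clusters.
# min_samples: larger values smooth the density estimate, so more borderline
# detections end up as noise (-1) instead of being attached to a cluster.
labels = HDBSCAN(min_cluster_size=5, min_samples=4).fit_predict(embeddings)
print(f"clusters: {labels.max() + 1}, unclustered faces: {(labels == -1).sum()}")
```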
Hello, I don't have any "people" in Photos either, and I had more than 2500 detections.
Oh, it looks more like this problem: #676
I did try setting MIN_SAMPLE_SIZE, as suggested.
Like @marcelklehr already commented, HDBSCAN, as it is implemented, will try to find the most stable clusters in the data regardless of their size. If the data contains a large number of identities (especially multiple identities with fewer face detections than MIN_CLUSTER_SIZE), it'll still try to find the most stable clusters in the data. This can lead to combining multiple identities of similar-looking persons. The easiest way to alleviate this issue is to scan in a larger dataset.

Mind you, "similar looking" to the face recognition model may not always be similar looking to you or me. This is especially true in the case of children/infants. The dlib face recognition model hasn't been trained with images of children, so they will cause trouble with clustering regardless of the clustering algorithm. (IIRC, PhotoPrism, for example, had a community effort to retrain their recognition model with datasets containing images of children.)

MIN_SAMPLE_SIZE is basically a probability density (i.e. "face detection density") smoothing factor used by HDBSCAN. Reducing this value too much can lead to statistical noise causing issues with larger datasets. Still, it might be that we'll have to reduce this value going forward; the optimal value in my test dataset may not be optimal for all users. Also, the optimal value will depend, to some degree, on how incremental clustering is implemented, as that affects the amount of noise in the data that is being clustered.

@marcelklehr: Besides fine-tuning MIN_SAMPLE_SIZE, another way to improve the clustering in this case might be to implement a limit on the maximum size of a cluster. The obvious, but likely(?) not optimal, solution would be to limit the radius of a face cluster. However, limiting the maximum edge length within a cluster (this was implemented in a previous version of the MstClusterer class but I stripped it since it was not used) may be a better solution, since it will specifically limit forming clusters in sparse areas of the face embedding space (where the mutual reachability distance will also be large). If the latter is implemented, it may help us get away with a larger MIN_SAMPLE_SIZE, which is better for users with larger datasets. It may also be that a combination of both of these limits would provide the best user experience.
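To make the maximum-edge-length idea concrete, here is a rough Python sketch, not the app's PHP MstClusterer: it uses plain Euclidean distances rather than mutual reachability distances, and the cap value is only a placeholder. The idea is to split the minimum spanning tree wherever an edge exceeds the cap, so sparse regions of the embedding space cannot be bridged into one oversized cluster.

```python
# Sketch of capping the maximum edge length inside MST-based clusters.
import numpy as np
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def mst_clusters_with_edge_cap(embeddings: np.ndarray, max_edge_length: float = 0.5):
    dist = squareform(pdist(embeddings))         # pairwise Euclidean distances
    mst = minimum_spanning_tree(dist).toarray()  # MST edges as a dense matrix
    mst[mst > max_edge_length] = 0.0             # cut every edge above the cap
    # Each remaining connected component becomes one face cluster.
    _, labels = connected_components(mst != 0, directed=False)
    return labels
```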
@MB-Finski Thanks for the info, that's an interesting read! However, coming back to the original issue: irrespective of how the clustering algorithm works, I think there should be a way for the user to review recognised but unclustered faces and, for each one, choose between "not a person" (don't suggest again), "merge with ______" (pick an existing cluster) or "new person" (create a new cluster). Or is that something that should be handled by another app like Photos or Memories?
I agree with @IndrekHaav: being able to create a new person is missing when trying to filter out false positives.
I cannot reproduce this. For me, removed face detections are not re-added to the same cluster anymore.
@marcelklehr For me, every time the clustering job ran, the same faces kept getting added to the same person. I ended up deleting the detected faces from the DB, because they were faces I wasn't interested in anyway (random background people, and such). How would this work anyway? Does the app keep track of clusters that a face has been removed from in some way?
@MB-Finski If you're up for implementing that, I'm happy to merge a pull request (let me know if you need help with git).
When removing a face from a cluster we store the distance from the cluster centroid along with the face, and in the future only add it to a cluster if the distance to the cluster centroid is smaller.
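A minimal sketch of that rule, assuming faces are compared as embedding vectors (illustrative Python, not the actual implementation):

```python
# Sketch of the "only re-add if closer than before" rule described above.
import numpy as np

def record_removal(face_vec: np.ndarray, centroid: np.ndarray) -> float:
    """Distance to store alongside the face detection when it is removed."""
    return float(np.linalg.norm(face_vec - centroid))

def may_reattach(face_vec: np.ndarray, centroid: np.ndarray, stored_distance: float) -> bool:
    """Allow re-adding only if the face is now closer to the centroid than when removed."""
    return float(np.linalg.norm(face_vec - centroid)) < stored_distance
```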
@MB-Finski I wonder if it would make sense to fall back to DBSCAN clustering for photo collections smaller than x photos, as HDBSCAN results are pretty wild on smaller collections.
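A sketch of that fallback, using scikit-learn only for illustration (the app itself is PHP; the cutoff and eps values here are invented, not tested defaults):

```python
# Sketch: fixed-radius DBSCAN for small collections, adaptive HDBSCAN for larger ones.
import numpy as np
from sklearn.cluster import DBSCAN, HDBSCAN  # HDBSCAN requires scikit-learn >= 1.3

def cluster_faces(embeddings: np.ndarray, small_collection_cutoff: int = 1000) -> np.ndarray:
    if len(embeddings) < small_collection_cutoff:
        # Fixed-radius clustering behaves more predictably on small, sparse datasets.
        model = DBSCAN(eps=0.5, min_samples=5)
    else:
        # HDBSCAN adapts to the density of the data and scales better to large collections.
        model = HDBSCAN(min_cluster_size=5, min_samples=4)
    return model.fit_predict(embeddings)  # label -1 marks unclustered faces
```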
This issue was moved to a discussion. You can continue the conversation there.
Which version of recognize are you using?
3.3.6
Enabled Modes
Face recognition
TensorFlow mode
Normal mode
Which Nextcloud version do you have installed?
25.0.3
Which Operating system do you have installed?
Debian 11
Which Docker container are you using to run Nextcloud? (if applicable)
N/A
How much RAM does your server have?
4G
What processor Architecture does your CPU have?
x86_64
Describe the Bug
I have about 200 photos uploaded to my Nextcloud instance, most of those containing faces. Recognize has processed them all, but only a single person shows up in the People section (of both Photos and Memories apps), with 12 photos.
I don't think this is the same as #588. I have checked the database, and the oc_recognize_face_detections table has 237 records, while oc_recognize_face_clusters only has one. Furthermore, if I manually insert a record into the clusters table and then link a record in the detections table to it, it shows up as a new person. I don't understand why the majority of the detected faces are not added to a cluster.

The same photos, when imported into PhotoPrism (which also uses TensorFlow), resulted in every single detected face showing up.
Expected Behavior
All detected faces should appear in the People section. As individual clusters, if needed, so they can be merged manually.
To Reproduce
Automatic run with standard settings. Results might depend on content, but most of the photos I have uploaded are high-quality.
Debug log
No response