Bug: simsimd_cos_f32()'s performance is not as good as using a for loop #245

Open
2 of 3 tasks
ChasenYang opened this issue Jan 11, 2025 · 2 comments
Labels
bug Something isn't working

Comments

@ChasenYang

Describe the bug

I am trying to optimize the Cosine method in my project. I replaced it with simsimd_cos_f32() and compared the run time of the two versions. The comparison code is below. The two timings are almost identical. Can someone tell me why? Thanks a lot.

Steps to reproduce

The code is here:

#include <chrono>
#include <iostream>
#include <random>
#include <vector>
#include <algorithm>
#include <cmath>  // sqrt, used in Cosine()
#include <simsimd/simsimd.h>
#include <immintrin.h>

using namespace std;

static float Cosine(const std::vector<float>& a, const std::vector<float>& b) {
    float sum = 0.0f;
    float norm1 = 0.0f;
    float norm2 = 0.0f;
    size_t dim = a.size();

    for (size_t i = 0; i < dim; i++) {
        float elem1 = a[i];
        float elem2 = b[i];
        sum += elem1 * elem2;
        norm1 += elem1 * elem1;
        norm2 += elem2 * elem2;
    }
    return sum / sqrt(norm1 * norm2);
}

static float SimdCosine2(const std::vector<float>& a,
                         const std::vector<float>& b) {
    simsimd_distance_t distance;
    // Cosine distance between two vectors
    simsimd_cos_f32(a.data(), b.data(), a.size(), &distance);
    return distance;
}

int main() {
    {
        // Create random float values
        size_t count = 100000;
        size_t dim = 1000;
        size_t times = 1000000;
        vector<vector<float>> src(count);

        std::mt19937 generator;
        std::uniform_real_distribution<float> distribution(0, 1);
        std::uniform_int_distribution<int> idx_distribution(0, count-1);

        for (size_t i = 0; i < count; i++) {
            vector<float> tmp(dim);
            for (size_t d = 0; d < dim; d++) {
                tmp[d] = distribution(generator);
            }
            src[i] = tmp;
        }

        vector<std::pair<int, int>> idx(times);
        for (size_t i = 0; i < times; i++) {
            idx[i] = {idx_distribution(generator), idx_distribution(generator)};
        }
        {
            auto start = std::chrono::high_resolution_clock::now();
            for (size_t i = 0; i < times; i++) {
                Cosine(src[idx[i].first], src[idx[i].second]);
            }
            auto end = std::chrono::high_resolution_clock::now();
            std::chrono::duration<double> elapsed_seconds = end - start;
            std::cout << "Cosine cost:"
                      << std::chrono::duration_cast<std::chrono::milliseconds>(
                             end - start)
                             .count()
                      << "ms.\n";
        }
        {
            auto start = std::chrono::high_resolution_clock::now();
            for (size_t i = 0; i < times; i++) {
                SimdCosine2(src[idx[i].first], src[idx[i].second]);
            }
            auto end = std::chrono::high_resolution_clock::now();
            std::chrono::duration<double> elapsed_seconds = end - start;
            std::cout << "SimdCosine2 cost:"
                      << std::chrono::duration_cast<std::chrono::milliseconds>(
                             end - start)
                             .count()
                      << "ms.\n";
        }
    }
}
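Neither timing loop above uses the value returned by `Cosine` or `SimdCosine2`, so an optimizer is in principle free to drop or shrink the work being timed. Below is a minimal sketch of a timing helper that sums the results into a checksum, so that both variants have an observable effect. `time_kernel` is a hypothetical name and not part of SimSIMD, and `std::chrono::steady_clock` is swapped in for `high_resolution_clock` because it is guaranteed to be monotonic.

```cpp
#include <chrono>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of SimSIMD): runs `kernel` on every index
// pair and returns {elapsed milliseconds, checksum of all results}.
// Accumulating the results gives the loop an observable effect, so the
// compiler cannot discard the calls as dead code.
template <typename Kernel>
std::pair<double, double> time_kernel(
    Kernel kernel,
    const std::vector<std::vector<float>>& src,
    const std::vector<std::pair<int, int>>& idx) {
    double checksum = 0.0;
    auto start = std::chrono::steady_clock::now();
    for (const auto& p : idx) {
        checksum += kernel(src[p.first], src[p.second]);
    }
    auto end = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::milli> elapsed = end - start;
    return {elapsed.count(), checksum};
}
```

It could be called as `time_kernel(Cosine, src, idx)` and `time_kernel(SimdCosine2, src, idx)`, printing `.first` as the elapsed milliseconds and `.second` as the checksum.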

The compile command is as follows:

g++72 -O3 -fPIC -pthread -mavx -msse3 -mavx -msse4.2 -mavx2 -o simd simd.cpp

The results are as follows:

Cosine cost:1814ms.
SimdCosine2 cost:1873ms.
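One possible reading of these numbers, offered as a hypothesis rather than a diagnosis: the benchmark picks vectors at random from 100000 × 1000 floats (about 400 MB), which is far larger than a typical CPU cache, so both loops may be limited by memory traffic rather than arithmetic. The arithmetic can be sketched as follows; the helper names are hypothetical and the sizes are taken from the code above.

```cpp
#include <cstddef>

// Size of the whole dataset in MB (1 MB = 1e6 bytes).
constexpr double dataset_mb(std::size_t count, std::size_t dim) {
    return static_cast<double>(count) * dim * sizeof(float) / 1e6;
}

// Bytes read by one cosine call: two vectors of `dim` floats each.
constexpr double bytes_per_call(std::size_t dim) {
    return 2.0 * dim * sizeof(float);
}

// Effective read throughput in GB/s (1 GB = 1e9 bytes) for `calls`
// cosine evaluations that took `elapsed_ms` milliseconds in total.
constexpr double effective_gb_per_s(std::size_t dim, std::size_t calls,
                                    double elapsed_ms) {
    return bytes_per_call(dim) * calls / (elapsed_ms * 1e-3) / 1e9;
}
```

With the reported 1814 ms and 1873 ms, `effective_gb_per_s(1000, 1000000, ...)` works out to roughly 4.4 GB/s and 4.3 GB/s of vector data read. If that is close to what a single core can stream from memory on this machine, a faster kernel would not shorten the run. Repeating the test with a dataset small enough to fit in cache would help separate the two effects.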

Expected behavior

I expected simsimd_cos_f32 to be faster than the original for-loop implementation.
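Before comparing speed, it may be worth confirming that both paths compute comparable values. `Cosine` above returns a similarity, while the comment in `SimdCosine2` describes a cosine distance. The sketch below assumes the usual convention distance = 1 - similarity, which should be checked against the SimSIMD documentation. All function names here are hypothetical scalar references and do not call SimSIMD.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Scalar reference: cosine similarity, same arithmetic as `Cosine` above.
inline float cosine_similarity(const std::vector<float>& a,
                               const std::vector<float>& b) {
    float dot = 0.0f, norm_a = 0.0f, norm_b = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        norm_a += a[i] * a[i];
        norm_b += b[i] * b[i];
    }
    return dot / std::sqrt(norm_a * norm_b);
}

// Assumed convention (not confirmed by the issue): distance = 1 - similarity.
inline float cosine_distance(const std::vector<float>& a,
                             const std::vector<float>& b) {
    return 1.0f - cosine_similarity(a, b);
}

// True when two results agree within an absolute tolerance `tol`.
inline bool nearly_equal(float x, float y, float tol = 1e-4f) {
    return std::fabs(x - y) <= tol;
}
```

A check such as `nearly_equal(cosine_distance(a, b), SimdCosine2(a, b))` on a few vector pairs would show whether the two implementations agree under that assumption.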

SimSIMD version

6.2.3

Operating System

CentOS Linux release 7.3.1611

Hardware architecture

x86

Which interface are you using?

C implementation

Contact Details

No response

Are you open to being tagged as a contributor?

  • I am open to being mentioned in the project .git history as a contributor

Is there an existing issue for this?

  • I have searched the existing issues

Code of Conduct

  • I agree to follow this project's Code of Conduct
ChasenYang added the bug label on Jan 11, 2025
@ashvardanian
Owner

What's your CPU model? Can you please attach the lscpu output?

@ChasenYang
Author

> What's your CPU model? Can you please attach the lscpu output?

Thanks for your reply. The CPU is:

Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
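Alongside the `lscpu` output, a small program can report which vector instruction sets the CPU exposes at run time. This is a minimal sketch that assumes GCC or Clang on x86 and uses the compiler builtins `__builtin_cpu_init` and `__builtin_cpu_supports`. The struct and function names are hypothetical, and on other compilers or architectures every field simply stays `false`.

```cpp
#include <cstdio>

// Hypothetical container for the feature flags of interest.
struct CpuFeatures {
    bool avx;
    bool avx2;
    bool fma;
    bool avx512f;
};

// Queries the running CPU through GCC/Clang x86 builtins.
// Assumption: built with GCC or Clang for x86; otherwise all flags are false.
inline CpuFeatures detect_cpu_features() {
    CpuFeatures f = {false, false, false, false};
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    f.avx = __builtin_cpu_supports("avx") != 0;
    f.avx2 = __builtin_cpu_supports("avx2") != 0;
    f.fma = __builtin_cpu_supports("fma") != 0;
    f.avx512f = __builtin_cpu_supports("avx512f") != 0;
#endif
    return f;
}

// Prints one line per feature, 1 if supported and 0 otherwise.
inline void print_cpu_features(const CpuFeatures& f) {
    std::printf("avx:     %d\n", f.avx ? 1 : 0);
    std::printf("avx2:    %d\n", f.avx2 ? 1 : 0);
    std::printf("fma:     %d\n", f.fma ? 1 : 0);
    std::printf("avx512f: %d\n", f.avx512f ? 1 : 0);
}
```

The E5-2670 v3 is a Haswell-generation part, so AVX2 and FMA would be expected to report 1 and AVX-512 to report 0. Note that the compile command above enables `-mavx2` but not `-mfma`.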
