High accuracy variance during the training with SGDClassifier #678

Open

mrpositron opened this issue May 11, 2024 · 1 comment

mrpositron commented May 11, 2024

Hello!

I have been training the SGDClassifier with fhe="simulate". I trained the model with different random seeds and collected the results, and found that the standard deviation of the accuracy is much higher than with the scikit-learn implementation: where scikit-learn produced a 1-2% spread, Concrete ML produced a 5-10% spread. Is it possible to somehow reduce the variance?

Here is a snippet of my code:

from sklearn.preprocessing import MinMaxScaler
from concrete.ml.sklearn import SGDClassifier

# Scale features into the range expected by the encrypted model
scaler = MinMaxScaler(feature_range=(-1, 1))
x_train = scaler.fit_transform(x_train.copy())
x_test = scaler.transform(x_test.copy())

for i in range(n_runs):
    sgd_clf = SGDClassifier(
        n_bits=8,
        random_state=random_state + i,
        max_iter=n_iterations,
        fit_encrypted=True,
        parameters_range=(-1.0, 1.0),
        verbose=False,
    )

    # Train with simulated FHE, then compile and predict in simulation
    sgd_clf.fit(x_train, y_train, fhe="simulate")
    sgd_clf.compile(x_train)
    y_pred = sgd_clf.predict(x_test, fhe="simulate")
    # collect all results ...
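For context, this is roughly how I summarize the runs afterwards (a sketch: the accuracies list is assumed to be filled inside the loop above via accuracy_score, which is not shown in the snippet):

import numpy as np
from sklearn.metrics import accuracy_score

accuracies = []  # inside the loop: accuracies.append(accuracy_score(y_test, y_pred))

# After all runs, report the spread across seeds
acc = np.array(accuracies)
print(f"accuracy over {len(acc)} seeds: mean={acc.mean():.4f}, std={acc.std():.4f}")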
andrei-stoian-zama (Collaborator) commented

I think your observation is correct. It may be possible to increase the n_bits value by one (along with the associated rounding_threshold_bits) in the logistic regression training. This could help get better results, but the run-time will be worse.
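For illustration, a minimal sketch of that suggestion, reusing the variables from the snippet above (only n_bits is changed here; whether rounding_threshold_bits is separately exposed depends on the Concrete ML version, so it is left out):

from concrete.ml.sklearn import SGDClassifier

# Same setup as above, with one extra bit of quantization precision.
# More bits reduce quantization noise, which should lower the
# run-to-run variance, at the cost of a slower FHE circuit.
sgd_clf = SGDClassifier(
    n_bits=9,  # was 8
    random_state=random_state,
    max_iter=n_iterations,
    fit_encrypted=True,
    parameters_range=(-1.0, 1.0),
    verbose=False,
)
sgd_clf.fit(x_train, y_train, fhe="simulate")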
