High accuracy variance during the training with SGDClassifier #678

Open

mrpositron opened this issue May 11, 2024 · 1 comment

mrpositron commented May 11, 2024

Hello!

I have been training the SGDClassifier with fhe="simulate". I trained the model with different random seeds and collected the results, and found that the standard deviation of the accuracy is much higher than with the scikit-learn implementation: where scikit-learn produced a 1-2% spread, Concrete ML produced a 5-10% spread. Is it possible to somehow reduce the variance?

Here is a snippet of my code:

from sklearn.preprocessing import MinMaxScaler
from concrete.ml.sklearn import SGDClassifier

# Scale features into the range expected by the encrypted model
scaler = MinMaxScaler(feature_range=(-1, 1))
x_train = scaler.fit_transform(x_train.copy())
x_test = scaler.transform(x_test.copy())

for i in range(n_runs):
    sgd_clf = SGDClassifier(
        n_bits=8,
        random_state=random_state + i,
        max_iter=n_iterations,
        fit_encrypted=True,
        parameters_range=(-1.0, 1.0),
        verbose=False,
    )

    # Train with simulated FHE, then compile and predict in simulation
    sgd_clf.fit(x_train, y_train, fhe="simulate")
    sgd_clf.compile(x_train)
    y_pred = sgd_clf.predict(x_test, fhe="simulate")
    # collect all results ...
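For context, this is roughly how I summarize the runs afterwards (a sketch: the accuracies list is assumed to be filled inside the loop above via accuracy_score, which is not shown in the snippet):

import numpy as np
from sklearn.metrics import accuracy_score

accuracies = []  # inside the loop: accuracies.append(accuracy_score(y_test, y_pred))

# After all runs, report the spread across seeds
acc = np.array(accuracies)
print(f"accuracy over {len(acc)} seeds: mean={acc.mean():.4f}, std={acc.std():.4f}")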
andrei-stoian-zama (Collaborator) commented

I think your observation is correct. It may be possible to increase the n_bits value by one (along with the associated rounding_threshold_bits) in the logistic regression training. This could help get better results, but the run-time will be worse.
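For illustration, a minimal sketch of that suggestion, reusing the variables from the snippet above (only n_bits is changed here; whether rounding_threshold_bits is separately exposed depends on the Concrete ML version, so it is left out):

from concrete.ml.sklearn import SGDClassifier

# Same setup as above, with one extra bit of quantization precision.
# More bits reduce quantization noise, which should lower the
# run-to-run variance, at the cost of a slower FHE circuit.
sgd_clf = SGDClassifier(
    n_bits=9,  # was 8
    random_state=random_state,
    max_iter=n_iterations,
    fit_encrypted=True,
    parameters_range=(-1.0, 1.0),
    verbose=False,
)
sgd_clf.fit(x_train, y_train, fhe="simulate")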
