I fixed the problem in my latest pull request, but I'll leave this here for context if needed, and to get an answer to the question about aggregation.
I found a small inconsistency between the ECOD and COPOD decision functions in the repository and how they were explained in their corresponding papers.
The COPOD paper (arXiv:2009.09463v1) specifies: "We take the maximum of the negative log of the probability generated by the left tail empirical copula, right tail empirical copula and skewness corrected empirical copula to be the outlier score."
yet the code shows:
self.O = np.maximum(self.U_skew, np.add(self.U_l, self.U_r) / 2)
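which averages the left and right tail scores before taking the maximum with the skewness-corrected score, i.e. an aggregation of the left and right ECDF terms rather than a three-way maximum.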
The ECOD paper (arXiv:2201.00382v3) specifies: "we aggregate its tail probabilities F̂_left and F̂_right to come up with a final outlier score."
yet the code shows:
self.O = np.maximum(self.U_l, self.U_r)
self.O = np.maximum(self.U_skew, self.O)
which is the maximum of the left-tail, right-tail, and skewness-corrected (SC) scores.
Could they have been switched at some point?
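To make the difference between the two rules concrete, here is a minimal sketch that applies both to the same toy inputs. The arrays are made-up values standing in for per-feature negative-log tail probabilities (they are not output from the library), and the names `O_avg` and `O_max` are only for illustration.

```python
import numpy as np

# Hypothetical negative-log tail probabilities: 3 samples x 2 features.
U_l = np.array([[0.1, 2.0], [3.0, 0.2], [0.5, 0.5]])      # left tail
U_r = np.array([[2.5, 0.1], [0.05, 1.8], [0.7, 0.7]])     # right tail
U_skew = np.array([[2.5, 2.0], [3.0, 0.2], [0.5, 0.7]])   # skewness-corrected

# Rule currently in the COPOD code: max(skew, average of left and right).
O_avg = np.maximum(U_skew, np.add(U_l, U_r) / 2)

# Rule currently in the ECOD code: max(left, right, skew).
O_max = np.maximum(U_skew, np.maximum(U_l, U_r))

print(O_avg)
print(O_max)
print(O_max - O_avg)  # never negative, since max(l, r) >= (l + r) / 2
```

Because the larger of two values is never below their average, the three-way maximum is always at least as large as the averaged rule, so the two only agree where the skewness-corrected term (or equal tails) dominates.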
I also have a question about the aggregation: what is the benefit of averaging the left and right scores as opposed to adding them?
Why not use addition, since the data is normalized afterwards anyway?
As for the outliers, if I have heavily left-tailed data, the negative log of the left-tail CDF will be much higher than the negative log of the right-tail CDF, so the right-tail term would be negligible next to it. The result is also very similar to the skewness correction.
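A small sketch of the sum-versus-average question, under the assumption that the left/right term is still combined with the skewness-corrected term via `np.maximum` as in the snippet above. The helper `combine` and the input values are hypothetical and only serve the illustration.

```python
import numpy as np

def combine(u_l, u_r, u_skew, how):
    """Combine tail scores; `how` is 'mean' or 'sum' (illustrative helper)."""
    if how == "mean":
        lr = (u_l + u_r) / 2
    elif how == "sum":
        lr = u_l + u_r
    else:
        raise ValueError(f"unknown aggregation: {how}")
    return np.maximum(u_skew, lr)

# Feature 0: left tail dominates. Feature 1: both tails are comparable.
u_l = np.array([5.0, 0.4])
u_r = np.array([0.01, 0.6])
u_skew = np.array([5.0, 0.6])

mean_score = combine(u_l, u_r, u_skew, "mean")
sum_score = combine(u_l, u_r, u_skew, "sum")

print(mean_score)
print(sum_score)
print(sum_score / mean_score)  # not a constant factor
```

On its own, the sum is just twice the average, so a later rescaling would cancel the difference. Inside the maximum, however, doubling the left/right term changes how often it beats the skewness-corrected term, so the two variants are not equivalent up to scale. When one tail dominates, the sum is close to the maximum of the two tails, which matches the observation about heavily left-tailed data.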