BUG: DatetimeIndex is_year_start breaks on BusinessMonthBegin frequency #58691

natmokval · 2024-05-12T21:59:19Z

closes BUG: DatetimeIndex.is_year_start breaks on BusinessMonthBegin frequency #58729
Tests added and passed
Added an entry in the latest `doc/source/whatsnew/v3.0.0.rst` file


simonjayhawkins · 2024-05-15T10:46:53Z

Thanks @natmokval for the PR.

Should the OP be an issue instead? Or is there already an open issue for this?

natmokval · 2024-05-15T11:53:19Z

> Should the OP be an issue instead? Or is there already an open issue for this?

Thanks @simonjayhawkins for the comment. No, there is no open issue for this.
It's my mistake; I should have opened an issue first. I've now opened an issue for this PR.


natmokval · 2024-06-04T14:37:15Z

Fixed bug in DatetimeIndex.is_year_start and DatetimeIndex.is_quarter_start returning False instead of True when using BusinessMonthBegin frequency. @MarcoGorelli, could you please take a look at this PR?

MarcoGorelli · 2024-06-06T08:50:13Z

Thanks for looking into this.

To be honest, I'm not sure this fixes the issue:

In [2]: pd.DatetimeIndex(['1976-11-01', '1976-12-01', '1977-01-03'], freq='BMS')
Out[2]: DatetimeIndex(['1976-11-01', '1976-12-01', '1977-01-03'], dtype='datetime64[s]', freq='BMS')

In [3]: arr = pd.DatetimeIndex(['1976-11-01', '1976-12-01', '1977-01-03'], freq='BMS')

In [4]: arr.is_year_start
Out[4]: array([False, False,  True])

In [5]: [x.is_year_start for x in arr]
Out[5]: [False, False, False]

We'd expect Out[4] and Out[5] to be the same, right?

Here's a little hypothesis test I put together:

from datetime import datetime

import hypothesis.strategies as st
import pandas as pd
import pytest
from hypothesis import given


@pytest.mark.parametrize('freq', [
    'MS',
    'BMS',
])
@given(dt=st.datetimes(min_value=datetime(1960, 1, 1), max_value=datetime(1980, 1, 1)))
def test_me(freq, dt):
    d = pd.date_range(dt, periods=3, freq=freq)
    # Compare the index-level accessor with the element-wise Timestamp accessor.
    result = list(d.is_year_start)
    expected = [x.is_year_start for x in d]
    assert result == expected

Granted, it looks like this is already super-broken on main anyway, but I think this requires a more general, careful, and focused fix.

MarcoGorelli · 2024-06-06T09:31:33Z

Sorry, as an addendum to my previous comment: Timestamp doesn't have freq, so the two results can't be expected to be the same
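To make the point above concrete, here is a minimal sketch (assuming pandas 2.x or later, where a scalar `Timestamp` carries no frequency). It only shows that the element-wise check is a plain calendar check; the index-level result is computed but deliberately not asserted, since it depends on the pandas version under discussion.

```python
import pandas as pd

# BusinessMonthBegin ("BMS") lands on the first weekday of each month.
idx = pd.date_range("1976-11-01", periods=3, freq="BMS")
dates = [ts.strftime("%Y-%m-%d") for ts in idx]

# A standalone Timestamp knows nothing about "BMS", so this is a pure
# calendar check: 1977-01-03 is not 1 January.
scalar_flag = pd.Timestamp("1977-01-03").is_year_start

# The index-level accessor can take the index's freq into account, which is
# why the two results need not agree. Its values are not asserted here.
index_flags = list(idx.is_year_start)
```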

If I amend the hypothesis test to be:

@given(
    dt=st.datetimes(min_value=datetime(1960, 1, 1), max_value=datetime(1980, 1, 1)),
    n=st.integers(min_value=1, max_value=10),
    freq=st.sampled_from(['BMS']),
)
def test_me(freq, dt, n):
    freq = f'{n}{freq}'
    d = pd.date_range(dt, periods=3, freq=freq)
    result = list(d.is_year_start)
    expected = []
    for x in d:
        if x.is_year_start:
            expected.append(True)
        # A Monday on 2 or 3 January is the first business day of the year
        # when 1 January falls on a Sunday or a Saturday.
        elif x.day_of_week == 0 and (x - pd.Timedelta(days=1)).is_year_start:
            expected.append(True)
        elif x.day_of_week == 0 and (x - pd.Timedelta(days=2)).is_year_start:
            expected.append(True)
        else:
            expected.append(False)
    assert result == expected

then it does indeed pass
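For readers following along, the expected values in the amended test boil down to "is this date the first weekday of its year?". A standard-library sketch of that rule is below. The helper names are hypothetical, public holidays are ignored, and this is a reference check rather than how pandas computes the field.

```python
from datetime import date, timedelta


def first_business_day_of_year(year: int) -> date:
    """Return the first Monday-to-Friday date of `year` (holidays ignored)."""
    day = date(year, 1, 1)
    # weekday() is 0 for Monday through 6 for Sunday; skip the weekend.
    while day.weekday() >= 5:
        day += timedelta(days=1)
    return day


def is_business_year_start(day: date) -> bool:
    """Hypothetical reference check, not pandas' implementation."""
    return day == first_business_day_of_year(day.year)
```

With this rule, 1977-01-03 counts as a business year start because 1 January 1977 fell on a Saturday.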

I think we should get #57494 in first though, as I think it also fixes these issues "for free" (but then we keep the test from this one)

natmokval · 2024-06-06T14:12:17Z

> I think we should get #57494 in first though, as I think it also fixes these issues "for free" (but then we keep the test from this one)

Thanks, I will update this PR once #57494 is merged.

natmokval added 3 commits May 12, 2024 23:58

bug-DatetimeIndex-is_year_start-breaks-on-freq-BusinessMonthStart

346457d

correct def get_start_end_field

6197e69

Merge branch 'main' into bug-DatetimeIndex-is_year_start-breaks-on-freq-BusinessMonthStart

7761989

natmokval added 2 commits June 3, 2024 18:01

Merge branch 'main' into bug-DatetimeIndex-is_year_start-breaks-on-freq-BusinessMonthStart

f8a5f83

fixup

a9f4bb3

natmokval marked this pull request as ready for review June 4, 2024 09:41

natmokval requested a review from MarcoGorelli as a code owner June 4, 2024 09:41

natmokval added Bug Frequency DateOffsets labels Jun 4, 2024

natmokval changed the title ~~BUG: DatetimeIndex is_year_start breaks on freq BusinessMonthStart~~ BUG: DatetimeIndex is_year_start breaks on BusinessMonthBegin frequency Jun 4, 2024

parametrize test, and a note to v3.0.0

4e0b00d
