Issue
I have a pandas dataframe df_sample:
columnA columnB
A AA
A AB
B BA
B BB
B BC
And I am already creating a random column with some date objects in it:
df_sample['contract_starts'] = np.random.choice(pd.date_range('2024-01-01', '2024-05-01'), len(df_sample))
which leads to the following output:
columnA columnB contract_starts
A AA 2024-01-21
A AB 2024-03-03
B BA 2024-01-18
B BB 2024-02-18
B BC 2024-04-03
How can I create another datetime column, contract_noted, whose values also fall within a given range (e.g. until 2024-05-01) but never exceed the value in the contract_starts column? For example:
columnA columnB contract_starts contract_noted
A AA 2024-01-21 2024-01-20
A AB 2024-03-03 2024-01-01
B BA 2024-01-18 2024-01-13
B BB 2024-02-18 2024-02-01
B BC 2024-04-03 2024-03-28
Solution
You can subtract random timedeltas from the contract_starts column, generating the day offsets with numpy.random.randint and converting them with to_timedelta:
df_sample['contract_noted'] = (df_sample['contract_starts'] -
                               pd.to_timedelta(np.random.randint(1, 30, len(df_sample)),
                                               unit='d'))
print(df_sample)
columnA columnB contract_starts contract_noted
0 A AA 2024-04-18 2024-03-21
1 A AB 2024-02-12 2024-01-22
2 B BA 2024-02-21 2024-02-02
3 B BB 2024-04-12 2024-03-29
4 B BC 2024-02-10 2024-02-03
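For reference, here is a self-contained sketch of the same idea, including the imports the snippet relies on. The seeded np.random.default_rng generator and the seed value are assumptions added for reproducibility and are not part of the original answer. Note that the upper bound of the integer range is exclusive, so (1, 30) yields offsets of 1 to 29 days, and nothing here stops contract_noted from falling before 2024-01-01.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

df_sample = pd.DataFrame({
    'columnA': ['A', 'A', 'B', 'B', 'B'],
    'columnB': ['AA', 'AB', 'BA', 'BB', 'BC'],
})

# random start dates drawn from the range used in the question
all_dates = pd.date_range('2024-01-01', '2024-05-01').to_numpy()
df_sample['contract_starts'] = rng.choice(all_dates, len(df_sample))

# integers(1, 30) draws from 1..29 because the upper bound is exclusive
offsets = pd.to_timedelta(rng.integers(1, 30, len(df_sample)), unit='D')
df_sample['contract_noted'] = df_sample['contract_starts'] - offsets

# check: every noted date lies 1 to 29 days before its start date
gap = (df_sample['contract_starts'] - df_sample['contract_noted']).dt.days
print(gap.between(1, 29).all())
```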
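If contract_noted should also stay inside the same range as contract_starts (not earlier than 2024-01-01), generate for each row an integer between 0 and the number of days between the start of the range and contract_starts. The upper bound of randint is exclusive, hence days + 1. Starting at 0 keeps the call valid when contract_starts falls on 2024-01-01 (randint raises ValueError if the lower bound is not below the upper bound), at the cost of allowing contract_noted to equal contract_starts:
days = (df_sample['contract_starts'] - pd.Timestamp('2024-01-01')).dt.days
print(days)
# per-row offsets in 0..days (upper bound of randint is exclusive)
df_sample['contract_noted'] = (df_sample['contract_starts'] -
                               pd.to_timedelta(np.random.randint(0, days + 1, len(df_sample)),
                                               unit='d'))
print(df_sample)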
print (df_sample)
columnA columnB contract_starts contract_noted
0 A AA 2024-02-09 2024-01-09
1 A AB 2024-04-26 2024-02-23
2 B BA 2024-04-10 2024-04-06
3 B BB 2024-01-31 2024-01-07
4 B BC 2024-01-14 2024-01-08
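One edge case is worth checking with per-row bounds: a start date on the first day of the range leaves zero days to subtract, and an integer draw whose lower bound is not below its exclusive upper bound raises ValueError. The following is a minimal sketch of one way to handle it, assuming that a noted date equal to the start date is acceptable. The helper name random_noted_dates, the seed, and the three sample dates are hypothetical and only for illustration.

```python
import numpy as np
import pandas as pd

def random_noted_dates(starts, lower, rng):
    """Hypothetical helper: one random date per row in [lower, starts[i]].

    Assumes every value in `starts` is on or after `lower`.
    """
    # whole days between the lower bound and each start date
    days = (starts - pd.Timestamp(lower)).dt.days.to_numpy()
    # endpoint=True makes the upper bound inclusive, so days == 0 is valid
    offsets = rng.integers(0, days, endpoint=True)
    return starts - pd.to_timedelta(offsets, unit='D')

rng = np.random.default_rng(0)  # arbitrary seed
starts = pd.Series(pd.to_datetime(['2024-01-01', '2024-01-02', '2024-04-03']))
noted = random_noted_dates(starts, '2024-01-01', rng)

print((noted >= pd.Timestamp('2024-01-01')).all())  # lower bound respected
print((noted <= starts).all())                      # never after the start
```

Passing the per-row array of day counts as the upper bound lets NumPy draw a different range for each row in a single call, so no Python-level loop is needed.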
Answered By - jezrael