Monday, February 5, 2024

[FIXED] Create random datetime column with condition to another datetime column pandas


Issue

I have a pandas dataframe df_sample:

columnA columnB
A         AA
A         AB
B         BA
B         BB
B         BC

And I am already creating a random column with some date objects in it:

df_sample['contract_starts'] = np.random.choice(pd.date_range('2024-01-01', '2024-05-01'), len(df_sample))

which leads to the following output:

columnA columnB contract_starts
A         AA     2024-01-21
A         AB     2024-03-03
B         BA     2024-01-18
B         BB     2024-02-18
B         BC     2024-04-03

How can I create another datetime column, contract_noted, whose values also fall within a given range (e.g. until 2024-05-01) but do not exceed the value in the contract_starts column? For example:

columnA columnB contract_starts contract_noted
A         AA     2024-01-21      2024-01-20
A         AB     2024-03-03      2024-01-01
B         BA     2024-01-18      2024-01-13
B         BB     2024-02-18      2024-02-01
B         BC     2024-04-03      2024-03-28

Solution

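You can subtract random timedeltas from the contract_starts column: generate the day offsets with numpy.random.randint and convert them with to_timedelta. Because the upper bound of randint is exclusive, randint(1, 30) yields offsets of 1 to 29 days: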

df_sample['contract_noted'] = (df_sample['contract_starts'] - 
                               pd.to_timedelta(np.random.randint(1,30, len(df_sample)), 
                                               unit='d'))

print (df_sample)
  columnA columnB contract_starts contract_noted
0       A      AA      2024-04-18     2024-03-21
1       A      AB      2024-02-12     2024-01-22
2       B      BA      2024-02-21     2024-02-02
3       B      BB      2024-04-12     2024-03-29
4       B      BC      2024-02-10     2024-02-03
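
For a reproducible run of the same idea, here is a minimal self-contained sketch. It assumes NumPy's seeded Generator API (np.random.default_rng) in place of the legacy np.random functions; the seed value is arbitrary and the sample frame is rebuilt from the question. Note that with a fixed 1 to 29 day window, contract_noted can fall before 2024-01-01 for early contract_starts values.

```python
import numpy as np
import pandas as pd

# Seeded generator so repeated runs give the same dates (seed is arbitrary).
rng = np.random.default_rng(0)

# Sample frame rebuilt from the question.
df_sample = pd.DataFrame({
    "columnA": ["A", "A", "B", "B", "B"],
    "columnB": ["AA", "AB", "BA", "BB", "BC"],
})

# Random start dates drawn from the daily range used in the question.
all_days = pd.date_range("2024-01-01", "2024-05-01").to_numpy()
df_sample["contract_starts"] = rng.choice(all_days, len(df_sample))

# Offsets of 1..29 days (the upper bound of integers() is exclusive).
offsets = rng.integers(1, 30, size=len(df_sample))
df_sample["contract_noted"] = (
    df_sample["contract_starts"] - pd.to_timedelta(offsets, unit="D")
)

print(df_sample)
```

Every contract_noted value here is strictly earlier than its contract_starts value, by at most 29 days.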

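If contract_noted must also stay inside the same range as contract_starts (no earlier than 2024-01-01), generate the integers per row, between 0 and the number of days from the range start to contract_starts:

days = (df_sample['contract_starts'] - pd.Timestamp('2024-01-01')).dt.days
print (days)

# high is exclusive, so days + 1 allows offsets 0..days;
# randint(1, days) raises ValueError when days is 0 or 1 (low >= high)
df_sample['contract_noted'] = (df_sample['contract_starts'] - 
                               pd.to_timedelta(np.random.randint(0, days + 1, len(df_sample)), 
                                               unit='d'))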
print (df_sample)
  columnA columnB contract_starts contract_noted
0       A      AA      2024-02-09     2024-01-09
1       A      AB      2024-04-26     2024-02-23
2       B      BA      2024-04-10     2024-04-06
3       B      BB      2024-01-31     2024-01-07
4       B      BC      2024-01-14     2024-01-08
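
The per-row bound can be checked with a small self-contained sketch. It assumes NumPy's Generator API (rng.integers accepts an array as the upper bound) and a hypothetical four-row frame that includes the edge case where a contract starts on the first day of the range. Allowing an offset of 0, so that contract_noted may equal contract_starts, is an assumption based on the "does not exceed" wording in the question.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
range_start = pd.Timestamp("2024-01-01")

# Hypothetical data; the first row starts exactly on the range start.
df_sample = pd.DataFrame({
    "contract_starts": pd.to_datetime(
        ["2024-01-01", "2024-01-02", "2024-03-03", "2024-05-01"]
    )
})

# Days available between the range start and each contract start.
days = (df_sample["contract_starts"] - range_start).dt.days

# One offset per row in 0..days (upper bound is exclusive, hence + 1).
offsets = rng.integers(0, days.to_numpy() + 1)
df_sample["contract_noted"] = (
    df_sample["contract_starts"] - pd.to_timedelta(offsets, unit="D")
)

print(df_sample)
```

With this bound, the first row can only produce 2024-01-01, and no row falls outside the interval from the range start to its own contract_starts.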

Answered By - jezrael

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0
