How to convert a text column into dates (in Python, using pandas)
Task
When loading data, many software systems make intelligent guesses about the format and data type of each column, but sometimes that is not sufficient. If you have a column of text that should be interpreted as dates, how can we ask the software to convert it?
Solution
Let’s create a small example DataFrame to use here (using the method from how to create a data frame from scratch). Naturally, you would apply this solution to your own data instead.
1
2
3
4
import pandas as pd
df = pd.DataFrame( { 'Date' : [ '5/7/19', '5/10/19', '5/11/19' ],
'Event' : [ 'Work', 'Party', 'More work' ] } )
df
Date | Event | |
---|---|---|
0 | 5/7/19 | Work |
1 | 5/10/19 | Party |
2 | 5/11/19 | More work |
If you’ve already got the data in a DataFrame column, and you wish to convert it to dates, use the pd.to_datetime
function, which will do its best to read whatever format your dates are in:
1
2
df['Date'] = pd.to_datetime( df['Date'] )
df
1
2
/tmp/ipykernel_4832/1311655736.py:1: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
df['Date'] = pd.to_datetime( df['Date'] )
Date | Event | |
---|---|---|
0 | 2019-05-07 | Work |
1 | 2019-05-10 | Party |
2 | 2019-05-11 | More work |
But if they aren’t in a standard format, you can specify just about any format as in the following example. See the Python documentation for format details.
1
2
# If the dates had been, for example, 5-7-2019 10:15:00
df['Date'] = pd.to_datetime( df['Date'], format="%m-%d-%Y %H:%M:%S" )
It’s often easier to handle date conversions while reading the data. You can tell pandas to read dates in most of the common date formats using any of the following methods.
1
2
3
4
5
6
7
8
9
10
# Any columns that look like dates, treat as dates:
df = pd.read_csv( "example.csv", parse_dates=True )
# Convert the specific columns you name into dates:
df = pd.read_csv( "example.csv", parse_dates=['col1','col2'] )
# If the date is spread over multiple columns, do this:
# (Let's say the year, month, and day are in columns, 4, 5, and 6.)
df = pd.read_csv( "example.csv", parse_dates=[[4,5,6]] )
# Note the double brackets, and indices start counting at zero.
Content last modified on 24 July 2023.
See a problem? Tell us or edit the source.
Contributed by Nathan Carter (ncarter@bentley.edu)