
How to quickly load some sample data (in Python)

See all solutions.

Task

Sometimes you just need to try out a new piece of code, whether for data manipulation, statistical computation, plotting, or something else, and it’s handy to be able to quickly load some example data to work with. There is a lot of freely available sample data out there. What’s the easiest way to load it?

Solution

The R programming language comes with many free datasets built in. To make these same datasets available to Python programmers as well, you can install and import the rdatasets package.

First, make sure it is installed by running pip install rdatasets or conda install rdatasets from your command line. Then you can access many datasets as follows:

from rdatasets import data
df = data( 'iris' )  # Load the famous Fisher's irises dataset
df.head()
   Sepal.Length  Sepal.Width  Petal.Length  Petal.Width  Species
0           5.1          3.5           1.4          0.2   setosa
1           4.9          3.0           1.4          0.2   setosa
2           4.7          3.2           1.3          0.2   setosa
3           4.6          3.1           1.5          0.2   setosa
4           5.0          3.6           1.4          0.2   setosa
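
The column names come straight from R, so they contain dots (Sepal.Length), which rules out attribute-style access such as df.sepal_length. The following is a minimal sketch of one way to tidy them, not something the rdatasets package does for you. The helper name to_snake_case is hypothetical, and the small DataFrame is a stand-in built from the rows shown above so that the sketch runs even without rdatasets installed.

```python
import pandas as pd

# Stand-in for `df = data('iris')`: the first three rows shown above.
df = pd.DataFrame({
    "Sepal.Length": [5.1, 4.9, 4.7],
    "Sepal.Width": [3.5, 3.0, 3.2],
    "Petal.Length": [1.4, 1.4, 1.3],
    "Petal.Width": [0.2, 0.2, 0.2],
    "Species": ["setosa", "setosa", "setosa"],
})


def to_snake_case(name: str) -> str:
    """Turn an R-style name such as 'Sepal.Length' into 'sepal_length'."""
    return name.replace(".", "_").lower()


# rename() applies the function to every column label and returns a new
# DataFrame; the original df is left unchanged.
tidy = df.rename(columns=to_snake_case)
print(list(tidy.columns))
```

With the real dataset, the same rename call should work unchanged on the DataFrame returned by data('iris').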

But what datasets are available? There are many! The package itself includes a full list, which its summary function returns as a table.

from rdatasets import summary
summary()
      Package  Item        Title                                              Rows  Cols  n_binary  n_character  n_factor  n_logical  n_numeric  CSV                                                Doc
0     boot     acme        Monthly Excess Returns                             60    3     0         1            0         0          2          https://raw.github.com/vincentarelbundock/Rdat...  https://raw.github.com/vincentarelbundock/Rdat...
1     boot     aids        Delay in AIDS Reporting in England and Wales       570   6     1         0            0         0          6          https://raw.github.com/vincentarelbundock/Rdat...  https://raw.github.com/vincentarelbundock/Rdat...
2     boot     aircondit   Failures of Air-conditioning Equipment             12    1     0         0            0         0          1          https://raw.github.com/vincentarelbundock/Rdat...  https://raw.github.com/vincentarelbundock/Rdat...
3     boot     aircondit7  Failures of Air-conditioning Equipment             24    1     0         0            0         0          1          https://raw.github.com/vincentarelbundock/Rdat...  https://raw.github.com/vincentarelbundock/Rdat...
4     boot     amis        Car Speeding and Warning Signs                     8437  4     1         0            0         0          4          https://raw.github.com/vincentarelbundock/Rdat...  https://raw.github.com/vincentarelbundock/Rdat...
...   ...      ...         ...                                                ...   ...   ...       ...          ...       ...        ...        ...                                                ...
1340  Zelig    tobin       Tobin's Tobit Data                                 20    3     0         0            0         0          3          https://raw.github.com/vincentarelbundock/Rdat...  https://raw.github.com/vincentarelbundock/Rdat...
1341  Zelig    turnout     Turnout Data Set from the National Election Su...  2000  5     2         0            1         0          4          https://raw.github.com/vincentarelbundock/Rdat...  https://raw.github.com/vincentarelbundock/Rdat...
1342  Zelig    voteincome  Sample Turnout and Demographic Data from the 2...  1500  7     3         0            1         0          6          https://raw.github.com/vincentarelbundock/Rdat...  https://raw.github.com/vincentarelbundock/Rdat...
1343  Zelig    Weimar      1932 Weimar election data                          10    11    0         0            0         0          11         https://raw.github.com/vincentarelbundock/Rdat...  https://raw.github.com/vincentarelbundock/Rdat...
1344  Zelig    Zelig.url   Table of links for Zelig                           49    2     0         0            2         0          0          https://raw.github.com/vincentarelbundock/Rdat...  https://raw.github.com/vincentarelbundock/Rdat...

1345 rows × 12 columns
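
With over a thousand entries, the list is easier to use if you filter it. Because the summary is an ordinary pandas DataFrame, standard boolean indexing applies. The sketch below is one possible approach under stated assumptions: the function find_datasets and its parameters keyword and min_rows are hypothetical names, and the catalog here is a stand-in containing a few rows copied from the output above (only the columns used), so that it runs without rdatasets installed.

```python
import pandas as pd

# Stand-in for `catalog = summary()`: five rows copied from the output above.
# The truncated title is kept exactly as it was displayed.
catalog = pd.DataFrame({
    "Package": ["boot", "boot", "boot", "Zelig", "Zelig"],
    "Item": ["acme", "aids", "amis", "tobin", "turnout"],
    "Title": [
        "Monthly Excess Returns",
        "Delay in AIDS Reporting in England and Wales",
        "Car Speeding and Warning Signs",
        "Tobin's Tobit Data",
        "Turnout Data Set from the National Election Su...",
    ],
    "Rows": [60, 570, 8437, 20, 2000],
    "Cols": [3, 6, 4, 3, 5],
})


def find_datasets(catalog: pd.DataFrame, keyword: str = "", min_rows: int = 0) -> pd.DataFrame:
    """Return catalog rows whose Title contains `keyword` (case-insensitive)
    and whose dataset has at least `min_rows` rows, largest first."""
    matches = catalog["Title"].str.contains(keyword, case=False, regex=False)
    big_enough = catalog["Rows"] >= min_rows
    return catalog[matches & big_enough].sort_values("Rows", ascending=False)


large = find_datasets(catalog, min_rows=500)
print(large[["Package", "Item", "Rows"]])
```

In practice you would pass the DataFrame returned by summary() instead of the stand-in; the Package, Item, Title, and Rows column names match those in the output above.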

Content last modified on 24 July 2023.


Contributed by Nathan Carter (ncarter@bentley.edu)