How to quickly load some sample data (in Python)
Task
Sometimes you just need to try out a new piece of code, whether for data manipulation, statistical computation, plotting, or something else, and it’s handy to be able to load some example data quickly. Plenty of sample data is freely available. What’s the easiest way to load it?
Solution
The R programming language comes with many free datasets built in. To make these same datasets available to Python programmers as well, you can install and import the rdatasets package.
First, ensure that you have it installed by running pip install rdatasets or conda install rdatasets from your command line. Then you can access many datasets as follows:
from rdatasets import data
df = data('iris')  # Load Fisher's famous iris dataset
df.head()
| | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
---|---|---|---|---|---|
0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
But which datasets are available? There are many, and the package itself can list them all.
from rdatasets import summary
summary()
| | Package | Item | Title | Rows | Cols | n_binary | n_character | n_factor | n_logical | n_numeric | CSV | Doc |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | boot | acme | Monthly Excess Returns | 60 | 3 | 0 | 1 | 0 | 0 | 2 | https://raw.github.com/vincentarelbundock/Rdat... | https://raw.github.com/vincentarelbundock/Rdat... |
1 | boot | aids | Delay in AIDS Reporting in England and Wales | 570 | 6 | 1 | 0 | 0 | 0 | 6 | https://raw.github.com/vincentarelbundock/Rdat... | https://raw.github.com/vincentarelbundock/Rdat... |
2 | boot | aircondit | Failures of Air-conditioning Equipment | 12 | 1 | 0 | 0 | 0 | 0 | 1 | https://raw.github.com/vincentarelbundock/Rdat... | https://raw.github.com/vincentarelbundock/Rdat... |
3 | boot | aircondit7 | Failures of Air-conditioning Equipment | 24 | 1 | 0 | 0 | 0 | 0 | 1 | https://raw.github.com/vincentarelbundock/Rdat... | https://raw.github.com/vincentarelbundock/Rdat... |
4 | boot | amis | Car Speeding and Warning Signs | 8437 | 4 | 1 | 0 | 0 | 0 | 4 | https://raw.github.com/vincentarelbundock/Rdat... | https://raw.github.com/vincentarelbundock/Rdat... |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1340 | Zelig | tobin | Tobin's Tobit Data | 20 | 3 | 0 | 0 | 0 | 0 | 3 | https://raw.github.com/vincentarelbundock/Rdat... | https://raw.github.com/vincentarelbundock/Rdat... |
1341 | Zelig | turnout | Turnout Data Set from the National Election Su... | 2000 | 5 | 2 | 0 | 1 | 0 | 4 | https://raw.github.com/vincentarelbundock/Rdat... | https://raw.github.com/vincentarelbundock/Rdat... |
1342 | Zelig | voteincome | Sample Turnout and Demographic Data from the 2... | 1500 | 7 | 3 | 0 | 1 | 0 | 6 | https://raw.github.com/vincentarelbundock/Rdat... | https://raw.github.com/vincentarelbundock/Rdat... |
1343 | Zelig | Weimar | 1932 Weimar election data | 10 | 11 | 0 | 0 | 0 | 0 | 11 | https://raw.github.com/vincentarelbundock/Rdat... | https://raw.github.com/vincentarelbundock/Rdat... |
1344 | Zelig | Zelig.url | Table of links for Zelig | 49 | 2 | 0 | 0 | 2 | 0 | 0 | https://raw.github.com/vincentarelbundock/Rdat... | https://raw.github.com/vincentarelbundock/Rdat... |
1345 rows × 12 columns
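With over a thousand entries, the listing is easier to use once it is filtered. The sketch below assumes the listing is a pandas DataFrame with the Package, Item, Rows, and Cols columns shown above. The helper name find_datasets and its parameters are hypothetical and not part of rdatasets, and the small catalog here is a stand-in built from a few rows of the listing so the example runs without the package installed.

```python
import pandas as pd


def find_datasets(catalog, min_rows=0, max_cols=None, package=None):
    """Return catalog rows that match simple size and package filters.

    Hypothetical helper (not part of rdatasets). It assumes `catalog`
    has the Package, Item, Rows, and Cols columns shown in the listing.
    """
    mask = catalog["Rows"] >= min_rows
    if max_cols is not None:
        mask &= catalog["Cols"] <= max_cols
    if package is not None:
        mask &= catalog["Package"] == package
    return catalog.loc[mask, ["Package", "Item", "Rows", "Cols"]]


# Stand-in for the real listing: values copied from rows shown above.
catalog = pd.DataFrame({
    "Package": ["boot", "boot", "boot", "Zelig", "Zelig"],
    "Item": ["acme", "aids", "amis", "tobin", "turnout"],
    "Rows": [60, 570, 8437, 20, 2000],
    "Cols": [3, 6, 4, 3, 5],
})

# Datasets with at least 500 rows, from any package.
large = find_datasets(catalog, min_rows=500)
print(large)

# Narrow datasets (at most 4 columns) from the boot package only.
narrow_boot = find_datasets(catalog, max_cols=4, package="boot")
print(narrow_boot)
```

If the real listing has the same columns, passing the result of summary() in place of the stand-in catalog should work the same way, and any Item name it returns can then be tried with data() as in the iris example.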
Content last modified on 24 July 2023.
Contributed by Nathan Carter (ncarter@bentley.edu)