How to quickly load some sample data
Description
Sometimes you just need to try out a new piece of code, whether it be data manipulation, statistical computation, plotting, or whatever. And it’s handy to be able to quickly load some example data to work with. There is a lot of freely available sample data out there. What’s the easiest way to load it?
Solution, in Julia
The R programming language comes with many free datasets built in. To make these same datasets available to Julia programmers as well, you can install and import the `RDatasets` package.
First, ensure that you have it installed by running the Julia commands `using Pkg` and then `Pkg.add( "RDatasets" )`. Then you can get access to many datasets as follows:
using RDatasets
iris = dataset( "datasets", "iris" )
first( iris, 5 ) # just show the first 5 rows
Row | SepalLength | SepalWidth | PetalLength | PetalWidth | Species |
---|---|---|---|---|---|
 | Float64 | Float64 | Float64 | Float64 | Cat… |
1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
2 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
5 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
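But what datasets are available? There are many! Calling `RDatasets.packages()` lists the R packages whose datasets are included, as shown below; `RDatasets.datasets()` lists the individual datasets.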
RDatasets.packages()
Row | Package | Title |
---|---|---|
 | String15 | String |
1 | COUNT | Functions, data and code for count data. |
2 | Ecdat | Data sets for econometrics |
3 | HSAUR | A Handbook of Statistical Analyses Using R (1st Edition) |
4 | HistData | Data sets from the history of statistics and data visualization |
5 | ISLR | Data for An Introduction to Statistical Learning with Applications in R |
6 | KMsurv | Data sets from Klein and Moeschberger (1997), Survival Analysis |
7 | MASS | Support Functions and Datasets for Venables and Ripley's MASS |
8 | SASmixed | Data sets from "SAS System for Mixed Models" |
9 | Zelig | Everyone's Statistical Software |
10 | adehabitatLT | Analysis of Animal Movements |
11 | boot | Bootstrap Functions (Originally by Angelo Canty for S) |
12 | car | Companion to Applied Regression |
13 | cluster | Cluster Analysis Extended Rousseeuw et al. |
⋮ | ⋮ | ⋮ |
23 | plm | Linear Models for Panel Data |
24 | plyr | Tools for splitting, applying and combining data |
25 | pscl | Political Science Computational Laboratory, Stanford University |
26 | psych | Procedures for Psychological, Psychometric, and Personality Research |
27 | quantreg | Quantile Regression |
28 | reshape2 | Flexibly Reshape Data: A Reboot of the Reshape Package. |
29 | robustbase | Basic Robust Statistics |
30 | rpart | Recursive Partitioning and Regression Trees |
31 | sandwich | Robust Covariance Matrix Estimators |
32 | sem | Structural Equation Models |
33 | survival | Survival Analysis |
34 | vcd | Visualizing Categorical Data |
Content last modified on 24 July 2023.
Solution, in Python
The R programming language comes with many free datasets built in. To make these same datasets available to Python programmers as well, you can install and import the `rdatasets` package.
First, ensure that you have it installed by running `pip install rdatasets` or `conda install rdatasets` from your command line. Then you can get access to many datasets as follows:
from rdatasets import data
df = data( 'iris' ) # Load Fisher's famous iris dataset
df.head()
 | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
---|---|---|---|---|---|
0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
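The call above passes only a dataset name. The sketch below assumes that `data` also accepts a package name followed by an item name, mirroring the two-argument Julia call `dataset( "datasets", "iris" )`. The names `load_sample` and `loader` are hypothetical, introduced here for illustration only; they are not part of rdatasets.

```python
def load_sample(item, package="datasets", loader=None):
    """Load one sample dataset by package and item name.

    `loader` is a hypothetical hook so the helper can be tried without
    rdatasets installed. By default it is rdatasets.data, which is
    assumed here to accept (package, item).
    """
    if loader is None:
        # Imported lazily so that defining the helper never requires the package.
        from rdatasets import data as loader
    return loader(package, item)
```

With rdatasets installed, `load_sample("iris")` should return the same table as above, and `load_sample("acme", package="boot")` would request the acme dataset from the boot package.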
But what datasets are available? There are many! The package's `summary` function returns a table listing all of them.
from rdatasets import summary
summary()
 | Package | Item | Title | Rows | Cols | n_binary | n_character | n_factor | n_logical | n_numeric | CSV | Doc |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | boot | acme | Monthly Excess Returns | 60 | 3 | 0 | 1 | 0 | 0 | 2 | https://raw.github.com/vincentarelbundock/Rdat... | https://raw.github.com/vincentarelbundock/Rdat... |
1 | boot | aids | Delay in AIDS Reporting in England and Wales | 570 | 6 | 1 | 0 | 0 | 0 | 6 | https://raw.github.com/vincentarelbundock/Rdat... | https://raw.github.com/vincentarelbundock/Rdat... |
2 | boot | aircondit | Failures of Air-conditioning Equipment | 12 | 1 | 0 | 0 | 0 | 0 | 1 | https://raw.github.com/vincentarelbundock/Rdat... | https://raw.github.com/vincentarelbundock/Rdat... |
3 | boot | aircondit7 | Failures of Air-conditioning Equipment | 24 | 1 | 0 | 0 | 0 | 0 | 1 | https://raw.github.com/vincentarelbundock/Rdat... | https://raw.github.com/vincentarelbundock/Rdat... |
4 | boot | amis | Car Speeding and Warning Signs | 8437 | 4 | 1 | 0 | 0 | 0 | 4 | https://raw.github.com/vincentarelbundock/Rdat... | https://raw.github.com/vincentarelbundock/Rdat... |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1340 | Zelig | tobin | Tobin's Tobit Data | 20 | 3 | 0 | 0 | 0 | 0 | 3 | https://raw.github.com/vincentarelbundock/Rdat... | https://raw.github.com/vincentarelbundock/Rdat... |
1341 | Zelig | turnout | Turnout Data Set from the National Election Su... | 2000 | 5 | 2 | 0 | 1 | 0 | 4 | https://raw.github.com/vincentarelbundock/Rdat... | https://raw.github.com/vincentarelbundock/Rdat... |
1342 | Zelig | voteincome | Sample Turnout and Demographic Data from the 2... | 1500 | 7 | 3 | 0 | 1 | 0 | 6 | https://raw.github.com/vincentarelbundock/Rdat... | https://raw.github.com/vincentarelbundock/Rdat... |
1343 | Zelig | Weimar | 1932 Weimar election data | 10 | 11 | 0 | 0 | 0 | 0 | 11 | https://raw.github.com/vincentarelbundock/Rdat... | https://raw.github.com/vincentarelbundock/Rdat... |
1344 | Zelig | Zelig.url | Table of links for Zelig | 49 | 2 | 0 | 0 | 2 | 0 | 0 | https://raw.github.com/vincentarelbundock/Rdat... | https://raw.github.com/vincentarelbundock/Rdat... |
1345 rows × 12 columns
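Because this catalogue has Package and Item columns, it can be searched to decide what to load. The following is a minimal sketch that assumes only those two column names, as shown in the output above. The helper `items_in` is hypothetical, and the small dictionary stands in for the real table so that the example runs without rdatasets installed.

```python
def items_in(package, catalog):
    """Return the Item names that `catalog` lists for one package.

    `catalog` can be anything that maps "Package" and "Item" to
    equal-length columns: a pandas DataFrame or a plain dict of lists.
    """
    pairs = zip(catalog["Package"], catalog["Item"])
    return [item for pkg, item in pairs if pkg == package]


# A tiny stand-in for the real catalogue, copied from rows shown above.
sample_catalog = {
    "Package": ["boot", "boot", "Zelig"],
    "Item": ["acme", "aids", "tobin"],
}
print(items_in("boot", sample_catalog))  # prints ['acme', 'aids']
```

If `summary()` returns a pandas DataFrame, as the "rows × columns" footer suggests, then `items_in("boot", summary())` should work the same way, since iterating over a DataFrame column yields its values.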
Solution, in R
R comes with many datasets in its `datasets` package. That package ships with R and is normally attached at startup, but you can load it explicitly as follows.
library(datasets)
Then you can load any one of them with the `data` function, as follows.
data(iris) # Load Fisher's famous iris dataset.
head(iris) # It has been placed in a variable of the same name.
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
To page through a list of all available datasets, just call `data()` with no arguments.