How to quickly load some sample data

Description

Sometimes you just need to try out a new piece of code, whether it be data manipulation, statistical computation, plotting, or whatever. And it’s handy to be able to quickly load some example data to work with. There is a lot of freely available sample data out there. What’s the easiest way to load it?

Solution, in Julia

The R programming language comes with many free datasets built in. To make these same datasets available to Julia programmers as well, you can install and import the RDatasets package.

First, ensure that you have it installed, by running the Julia commands using Pkg and then Pkg.add( "RDatasets" ). Then you can get access to many datasets as follows:

using RDatasets
iris = dataset( "datasets", "iris" )
first( iris, 5 ) # just show the first 5 rows
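5×5 DataFrame
 Row │ SepalLength  SepalWidth  PetalLength  PetalWidth  Species
     │ Float64      Float64     Float64      Float64     Cat…
─────┼───────────────────────────────────────────────────────────
   1 │         5.1         3.5          1.4         0.2  setosa
   2 │         4.9         3.0          1.4         0.2  setosa
   3 │         4.7         3.2          1.3         0.2  setosa
   4 │         4.6         3.1          1.5         0.2  setosa
   5 │         5.0         3.6          1.4         0.2  setosa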

But what datasets are available? There are many! The package itself can list the R packages whose datasets it includes (calling RDatasets.datasets() similarly lists the individual datasets).

RDatasets.packages()
34×2 DataFrame
9 rows omitted
 Row │ Package       Title
     │ String15      String
─────┼─────────────────────────────────────────────────────────────────────────────────────
   1 │ COUNT         Functions, data and code for count data.
   2 │ Ecdat         Data sets for econometrics
   3 │ HSAUR         A Handbook of Statistical Analyses Using R (1st Edition)
   4 │ HistData      Data sets from the history of statistics and data visualization
   5 │ ISLR          Data for An Introduction to Statistical Learning with Applications in R
   6 │ KMsurv        Data sets from Klein and Moeschberger (1997), Survival Analysis
   7 │ MASS          Support Functions and Datasets for Venables and Ripley's MASS
   8 │ SASmixed      Data sets from "SAS System for Mixed Models"
   9 │ Zelig         Everyone's Statistical Software
  10 │ adehabitatLT  Analysis of Animal Movements
  11 │ boot          Bootstrap Functions (Originally by Angelo Canty for S)
  12 │ car           Companion to Applied Regression
  13 │ cluster       Cluster Analysis Extended Rousseeuw et al.
  ⋮  │ ⋮             ⋮
  23 │ plm           Linear Models for Panel Data
  24 │ plyr          Tools for splitting, applying and combining data
  25 │ pscl          Political Science Computational Laboratory, Stanford University
  26 │ psych         Procedures for Psychological, Psychometric, and Personality Research
  27 │ quantreg      Quantile Regression
  28 │ reshape2      Flexibly Reshape Data: A Reboot of the Reshape Package.
  29 │ robustbase    Basic Robust Statistics
  30 │ rpart         Recursive Partitioning and Regression Trees
  31 │ sandwich      Robust Covariance Matrix Estimators
  32 │ sem           Structural Equation Models
  33 │ survival      Survival Analysis
  34 │ vcd           Visualizing Categorical Data

Content last modified on 24 July 2023.

Solution, in Python

The R programming language comes with many free datasets built in. To make these same datasets available to Python programmers as well, you can install and import the rdatasets package.

First, ensure that you have it installed, by running pip install rdatasets or conda install rdatasets from your command line. Then you can get access to many datasets as follows:

from rdatasets import data
df = data( 'iris' )  # Load Fisher's famous iris dataset
df.head()
   Sepal.Length  Sepal.Width  Petal.Length  Petal.Width Species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa
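
If the import fails, the package is probably not installed in the active environment. The following is a minimal sketch, using only the standard library, of how a script might check for rdatasets before importing it. The helper names is_installed and install_hint are hypothetical and chosen for illustration; they are not part of rdatasets.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module with this name can be found.

    Only top-level names (no dots) are handled here; this helper is an
    illustrative assumption, not part of the rdatasets package.
    """
    return importlib.util.find_spec(module_name) is not None


def install_hint(module_name: str) -> str:
    """Build a short message telling the reader what to do next."""
    if is_installed(module_name):
        return f"{module_name} is available; you can import it."
    return f"{module_name} is missing; try: pip install {module_name}"


if __name__ == "__main__":
    # Prints one of the two messages, depending on the local environment.
    print(install_hint("rdatasets"))
```

The check does not import the package itself, so it is cheap to run at the top of a notebook or script.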

But what datasets are available? There are many! You can find a full list in the package itself.

from rdatasets import summary
summary()
      Package  Item        Title                                              Rows  Cols  n_binary  n_character  n_factor  n_logical  n_numeric  CSV                                                Doc
0     boot     acme        Monthly Excess Returns                             60    3     0         1            0         0          2          https://raw.github.com/vincentarelbundock/Rdat...  https://raw.github.com/vincentarelbundock/Rdat...
1     boot     aids        Delay in AIDS Reporting in England and Wales       570   6     1         0            0         0          6          https://raw.github.com/vincentarelbundock/Rdat...  https://raw.github.com/vincentarelbundock/Rdat...
2     boot     aircondit   Failures of Air-conditioning Equipment             12    1     0         0            0         0          1          https://raw.github.com/vincentarelbundock/Rdat...  https://raw.github.com/vincentarelbundock/Rdat...
3     boot     aircondit7  Failures of Air-conditioning Equipment             24    1     0         0            0         0          1          https://raw.github.com/vincentarelbundock/Rdat...  https://raw.github.com/vincentarelbundock/Rdat...
4     boot     amis        Car Speeding and Warning Signs                     8437  4     1         0            0         0          4          https://raw.github.com/vincentarelbundock/Rdat...  https://raw.github.com/vincentarelbundock/Rdat...
...   ...      ...         ...                                                ...   ...   ...       ...          ...       ...        ...        ...                                                ...
1340  Zelig    tobin       Tobin's Tobit Data                                 20    3     0         0            0         0          3          https://raw.github.com/vincentarelbundock/Rdat...  https://raw.github.com/vincentarelbundock/Rdat...
1341  Zelig    turnout     Turnout Data Set from the National Election Su...  2000  5     2         0            1         0          4          https://raw.github.com/vincentarelbundock/Rdat...  https://raw.github.com/vincentarelbundock/Rdat...
1342  Zelig    voteincome  Sample Turnout and Demographic Data from the 2...  1500  7     3         0            1         0          6          https://raw.github.com/vincentarelbundock/Rdat...  https://raw.github.com/vincentarelbundock/Rdat...
1343  Zelig    Weimar      1932 Weimar election data                          10    11    0         0            0         0          11         https://raw.github.com/vincentarelbundock/Rdat...  https://raw.github.com/vincentarelbundock/Rdat...
1344  Zelig    Zelig.url   Table of links for Zelig                           49    2     0         0            2         0          0          https://raw.github.com/vincentarelbundock/Rdat...  https://raw.github.com/vincentarelbundock/Rdat...

1345 rows × 12 columns
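
With over a thousand entries, the catalog is easier to use once it is filtered. The sketch below shows one way to search it by title keyword, package, and minimum row count. It assumes the catalog has been converted to a list of dictionaries keyed by the column names shown above (Package, Item, Title, Rows); the function find_datasets and the small SAMPLE_CATALOG stand-in are hypothetical, with the sample rows copied from the output above so that the example runs without any installed packages.

```python
# A tiny stand-in for the real catalog, copied from the output shown above.
SAMPLE_CATALOG = [
    {"Package": "boot", "Item": "acme", "Title": "Monthly Excess Returns", "Rows": 60},
    {"Package": "boot", "Item": "amis", "Title": "Car Speeding and Warning Signs", "Rows": 8437},
    {"Package": "Zelig", "Item": "tobin", "Title": "Tobin's Tobit Data", "Rows": 20},
    {"Package": "Zelig", "Item": "Weimar", "Title": "1932 Weimar election data", "Rows": 10},
]


def find_datasets(records, keyword="", package=None, min_rows=0):
    """Return (Package, Item) pairs for catalog rows matching every filter.

    records  : iterable of dicts with keys Package, Item, Title, Rows
    keyword  : case-insensitive substring that must appear in the Title
    package  : if given, keep only rows from this package
    min_rows : keep only datasets with at least this many rows
    """
    keyword = keyword.lower()
    matches = []
    for row in records:
        if package is not None and row["Package"] != package:
            continue
        if row["Rows"] < min_rows:
            continue
        if keyword not in row["Title"].lower():
            continue
        matches.append((row["Package"], row["Item"]))
    return matches


if __name__ == "__main__":
    # Datasets from the boot package with at least 100 rows.
    print(find_datasets(SAMPLE_CATALOG, package="boot", min_rows=100))
```

Since the catalog printed above looks like a pandas DataFrame, it can presumably be passed in as find_datasets(summary().to_dict("records"), keyword="election"), where to_dict("records") is the standard pandas method for converting a DataFrame to a list of row dictionaries.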

Content last modified on 24 July 2023.

Solution, in R

R comes with many datasets in its datasets package. That package ships with R and is normally attached by default, but you can load it explicitly as follows.

library(datasets)

Then you can load any one of them with the data function, as follows.

data(iris)  # Load Fisher's famous iris dataset.
head(iris)  # It has been placed in a variable of the same name.
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1          3.5         1.4          0.2         setosa 
2 4.9          3.0         1.4          0.2         setosa 
3 4.7          3.2         1.3          0.2         setosa 
4 4.6          3.1         1.5          0.2         setosa 
5 5.0          3.6         1.4          0.2         setosa 
6 5.4          3.9         1.7          0.4         setosa 

To page through a list of all available datasets, just call data() with no arguments.

Content last modified on 24 July 2023.

Opportunities

This website does not yet contain a solution for this task in any of the following software packages.

  • Excel

If you can contribute a solution using any of these pieces of software, see our Contributing page for how to help extend this website.