How to quickly load some sample data (in Julia)
Task
Sometimes you just need to try out a new piece of code, whether it be data manipulation, statistical computation, plotting, or whatever. And it’s handy to be able to quickly load some example data to work with. There is a lot of freely available sample data out there. What’s the easiest way to load it?
Solution
The R programming language comes with many free datasets built in. To make these same datasets available to Julia programmers as well, you can install and import the RDatasets package.
First, ensure that you have it installed, by running the Julia commands using Pkg and then Pkg.add( "RDatasets" ). Then you can get access to many datasets as follows:
using RDatasets
iris = dataset( "datasets", "iris" )
first( iris, 5 ) # just show the first 5 rows
Row | SepalLength | SepalWidth | PetalLength | PetalWidth | Species |
---|---|---|---|---|---|
 | Float64 | Float64 | Float64 | Float64 | Cat… |
1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
2 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
5 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
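After loading a dataset, it can help to confirm its shape and column names before using it. The following is a minimal sketch of that check, assuming RDatasets can be installed in the current environment; the variable names dims and cols are chosen here for illustration only.

```julia
using Pkg
Pkg.add("RDatasets")   # one-time installation step; harmless to re-run

using RDatasets

# Load the same iris data shown above
iris = dataset("datasets", "iris")

dims = size(iris)      # tuple of (number of rows, number of columns)
cols = names(iris)     # column names

println("iris has ", dims[1], " rows and ", dims[2], " columns")
println("columns: ", join(cols, ", "))
```

Both size and names are general-purpose functions, so the same check applies to any other dataset loaded this way.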
But which datasets are available? There are many, grouped by the R package they originally came from. RDatasets itself can list those source packages:
RDatasets.packages()
Row | Package | Title |
---|---|---|
 | String15 | String |
1 | COUNT | Functions, data and code for count data. |
2 | Ecdat | Data sets for econometrics |
3 | HSAUR | A Handbook of Statistical Analyses Using R (1st Edition) |
4 | HistData | Data sets from the history of statistics and data visualization |
5 | ISLR | Data for An Introduction to Statistical Learning with Applications in R |
6 | KMsurv | Data sets from Klein and Moeschberger (1997), Survival Analysis |
7 | MASS | Support Functions and Datasets for Venables and Ripley's MASS |
8 | SASmixed | Data sets from "SAS System for Mixed Models" |
9 | Zelig | Everyone's Statistical Software |
10 | adehabitatLT | Analysis of Animal Movements |
11 | boot | Bootstrap Functions (Originally by Angelo Canty for S) |
12 | car | Companion to Applied Regression |
13 | cluster | Cluster Analysis Extended Rousseeuw et al. |
⋮ | ⋮ | ⋮ |
23 | plm | Linear Models for Panel Data |
24 | plyr | Tools for splitting, applying and combining data |
25 | pscl | Political Science Computational Laboratory, Stanford University |
26 | psych | Procedures for Psychological, Psychometric, and Personality Research |
27 | quantreg | Quantile Regression |
28 | reshape2 | Flexibly Reshape Data: A Reboot of the Reshape Package. |
29 | robustbase | Basic Robust Statistics |
30 | rpart | Recursive Partitioning and Regression Trees |
31 | sandwich | Robust Covariance Matrix Estimators |
32 | sem | Structural Equation Models |
33 | survival | Survival Analysis |
34 | vcd | Visualizing Categorical Data |
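The table above lists source packages rather than individual datasets. To see which datasets a given package contributes, RDatasets also provides a datasets function that accepts a package name. The sketch below assumes RDatasets is already installed and uses the "datasets" package and the "mtcars" dataset purely as examples; the variable name available is illustrative.

```julia
using RDatasets

# One row per dataset that comes from R's "datasets" package
available = RDatasets.datasets("datasets")

# Peek at the first few entries; the Dataset column holds the names
# that can be passed as the second argument to dataset()
println(first(available, 5))

# Load one of the listed datasets by name
mtcars = dataset("datasets", "mtcars")
println(first(mtcars, 5))
```

Calling RDatasets.datasets() with no argument should return the same kind of table for every package at once.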
Content last modified on 24 July 2023.
Contributed by Nathan Carter (ncarter@bentley.edu)