# Getting Started

## Installation

From [PyPI](https://pypi.org/project/xrf/)

```bash
pip install xrf
```

From [conda-forge](https://anaconda.org/conda-forge/xrf)

```bash
conda install conda-forge::xrf
```

## Quickstart

### Classification forests

Let us start by importing the tic-tac-toe dataset from [openml.org](https://www.openml.org).

```python
from sklearn.datasets import fetch_openml
from sklearn.preprocessing import OneHotEncoder

dataset = fetch_openml(name="tic-tac-toe", parser="auto")

y = dataset.target.values

X = OneHotEncoder().fit_transform(dataset.data.values).toarray()
```

Let us split the dataset into a training and a test set.

```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.75)
```

Let us now fit an explainable random forest classifier; we can use the same parameters as for standard random forest classifiers as implemented in `scikit-learn`.

```python
from xrf import XRandomForestClassifier

rfx = XRandomForestClassifier(n_jobs=-1)
rfx.fit(X_train, y_train)
```

We get the predictions in the usual way, using either `predict` or `predict_proba`, here resulting in exactly the same output as the standard random forest classifiers in `scikit-learn`. 

```python
rfx.predict_proba(X_test)
```

```numpy
array([[0.05, 0.95],
       [0.56, 0.44],
       [0.4 , 0.6 ],
       ...,
       [0.21, 0.79],
       [0.17, 0.83],
       [0.59, 0.41]])
```

We may now limit the number of examples involved in a prediction, e.g., to at most 5.

```python
rfx.predict_proba(X_test, k=5)
```

```numpy
array([[0.        , 1.        ],
       [0.85416634, 0.14583366],
       [0.34500622, 0.65499378],
       ...,
       [0.27464175, 0.72535825],
       [0.12693503, 0.87306497],
       [1.        , 0.        ]])
```

Let us also obtain the example attributions, by setting `return_examples` and `return_weights` to `True`.

```python
predictions, examples, weights = rfx.predict_proba(X_test, k=5, 
                                                   return_examples=True, 
                                                   return_weights=True)
```

Let us also take a look at the example attributions; `examples` will contain the indexes of the training objects involved in each prediction, while `weights` will contain the corresponding weights.

```python
examples
```

```numpy
array([[ 26, 131,  40, 193, 169],
       [ 48, 121,  52, 164,   6],
       [203, 176, 213, 110,  99],
       ...,
       [ 52, 167, 194, 175,  53],
       [104,  71,  20,  35, 122],
       [ 33,  47, 188, 228, 120]])
```

```python
weights
```

```numpy
array([[0.23050922, 0.21026052, 0.19812573, 0.18882078, 0.17228375],
       [0.24554293, 0.20930998, 0.20651394, 0.19279949, 0.14583366],
       [0.2935989 , 0.25979051, 0.21957101, 0.12543522, 0.10160437],
       ...,
       [0.27464175, 0.23320384, 0.19853987, 0.15467345, 0.13894108],
       [0.32220957, 0.21056097, 0.20287181, 0.13742261, 0.12693503],
       [0.26857466, 0.20863132, 0.20008477, 0.18560888, 0.13710037]])
```

### Regression forests

Let us import the Miami housing dataset from [openml.org](https://www.openml.org).

```python
from sklearn.datasets import fetch_openml
from sklearn.preprocessing import OneHotEncoder

dataset = fetch_openml(name="miami_housing", parser="auto")

y = dataset.target.values
X = dataset.data.values
```

Let us split the dataset into a training and a test set.

```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.75)
```

Let us generate and apply an explainable random forest regressor without constraining the number of training examples involved in the predictions.

```python
from xrf import XRandomForestRegressor

rfx = XRandomForestRegressor(n_jobs=-1)
rfx.fit(X_train, y_train)
rfx.predict(X_test)
```

```numpy
array([492859., 193170., 260507., ..., 330824., 416856., 241969.])
```

We may now limit the number of examples involved in a prediction, e.g., to at most 5.

```python
rfx.predict(X_test, k=5)
```

```numpy
array([541411.11111111, 196994.81865285, 210900.81300813, ...,
       340516.66666667, 389410.25641026, 241550.27422303])
```

The example attributions are obtained by setting `return_examples` and `return_weights` to `True`.

```python
predictions, examples, weights = rfx.predict(X_test, k=5,
                                             return_examples=True,
                                             return_weights=True)
```

We may check that the predictions are the same as the weighted targets of the training examples.

```python
import numpy as np

weighted_predictions = np.sum([weights[i]*y_train[examples[i]] 
                               for i in range(len(weights))], axis=1)

np.allclose(predictions, weighted_predictions)
```

```python
True
```

You are welcome to download and try out `xrf`; you may find the following notebook helpful:

[Examples.ipynb](https://github.com/henrikbostrom/xrf/blob/main/docs/Examples.ipynb)