Data

class gdeep.data.AbstractPreprocessing(*args, **kwds)
class gdeep.data.DatasetFactory

Dataset factory class for the tori dataset and torchvision datasets using the factory design pattern

Examples:

# Create a dataset for the tori dataset
dataset = get_dataset("Tori", name="DoubleTori", n_points=100)

# Create the MNIST dataset
dataset = get_dataset("Torchvision", name="MNIST")
build(key: str, **kwargs) Any

This method returns the DataLoader builder corresponding to the input key.

Args:
key:

the name of the dataset

register_builder(key: str, builder: Any)

This method adds new dataloader builders to the internal builders dictionary.
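
A minimal sketch of the factory in use, assuming a registered builder is a callable that receives the keyword arguments forwarded by build (the ConstantBuilder class below is purely illustrative):

from gdeep.data import DatasetFactory

class ConstantBuilder:
    """Illustrative builder: ignores the kwargs and returns a fixed dataset."""
    def __init__(self, dataset):
        self.dataset = dataset

    def __call__(self, **kwargs):
        return self.dataset

factory = DatasetFactory()
factory.register_builder("Constant", ConstantBuilder([1, 2, 3]))
dataset = factory.build("Constant")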

exception gdeep.data.MissingVocabularyError

Exception raised in the tokenizers when the vocabulary is missing.

class gdeep.data.PreprocessingPipeline(preprocessors: Iterable[AbstractPreprocessing[Any, Any]])

Pipeline to fit non-fitted preprocessors to a dataset in a sequential manner. The fitted preprocessing transform can be attached to a dataset using the attach_transform_to_dataset method. The intended use case is to fit the preprocessors to the training dataset and then attach the fitted transform to the training, validation and test datasets.

The transform is only applied to the data and not the labels.

Examples:

import os

from gdeep.data.preprocessors import PreprocessingPipeline, Normalization, PreprocessImageClassification
from gdeep.data.datasets import DatasetImageClassificationFromFiles

image_dataset = DatasetImageClassificationFromFiles(
    os.path.join(file_path, "img_data"),
    os.path.join(file_path, "img_data", "labels.csv"))

preprocessing_pipeline = PreprocessingPipeline((PreprocessImageClassification((32, 32)),
                                                Normalization()))
preprocessing_pipeline.fit_to_dataset(image_dataset)  # this will not change the image_dataset
preprocessed_dataset = preprocessing_pipeline.attach_transform_to_dataset(image_dataset)
class gdeep.data.TransformingDataset(dataset: Dataset[R], transform: Callable[[R], S])

This class is the base class for all the Datasets that need to be transformed via preprocessors. It expects to get its data from a Dataset.

Args:
dataset :

The source dataset for this class.

transform :

This is either a user-defined function or a fitted preprocessor. Preprocessors inherit from AbstractPreprocessing
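
A minimal sketch of wrapping a dataset with a user-defined transform; the random tensors are purely illustrative, the lambda assumes each item is a (data, label) pair, and a fitted preprocessor could be passed in place of the lambda:

import torch

from gdeep.data import TransformingDataset
from gdeep.data.datasets import FromArray

x, y = torch.rand(10, 3), torch.randint(0, 2, (10,))
dataset = FromArray(x, y)

# Assuming each item is a (data, label) pair, double the data and keep the label.
doubled_dataset = TransformingDataset(dataset, lambda item: (item[0] * 2.0, item[1]))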

Preprocessors

class gdeep.data.preprocessors.FilterPersistenceDiagramByHomologyDimension(homology_dimensions_to_filter: List[int])

This class filters the persistence diagrams of a dataset by their homology dimension.

Here we assume that the dataset is a tuple of (persistence diagram, label) and that the points in the diagram are sorted by ascending lifetime. This is an invariant of the OneHotEncodedPersistenceDiagram class but could go wrong if the diagrams are modified in a way that breaks this invariant.

Args:
homology_dimensions_to_filter:

The homology dimensions of the points in the diagram that should be kept.

fit_to_dataset(dataset: Dataset[Tuple[OneHotEncodedPersistenceDiagram, T]]) None

This method does nothing.
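
A minimal sketch of applying the filter, assuming a dataset of (diagram, label) pairs such as PersistenceDiagramFromFiles (the path below is a placeholder) and using TransformingDataset to attach the fitted preprocessor as its transform:

from gdeep.data import TransformingDataset
from gdeep.data.datasets import PersistenceDiagramFromFiles
from gdeep.data.preprocessors import FilterPersistenceDiagramByHomologyDimension

diagrams = PersistenceDiagramFromFiles("path/to/diagrams")  # placeholder path

# Keep only the points of homology dimensions 0 and 1.
filtering = FilterPersistenceDiagramByHomologyDimension([0, 1])
filtering.fit_to_dataset(diagrams)  # a no-op, as documented above
filtered_diagrams = TransformingDataset(diagrams, filtering)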

class gdeep.data.preprocessors.FilterPersistenceDiagramByLifetime(min_lifetime: float, max_lifetime: float)

This class filters the persistence diagrams of a dataset by their lifetime, i.e. the difference between the birth and death coordinates.

Here we assume that the dataset is a tuple of (persistence diagram, label) and that the points in the diagram are sorted by ascending lifetime. This is an invariant of the OneHotEncodedPersistenceDiagram class but could go wrong if the diagrams are modified in a way that breaks this invariant.

Args:
min_lifetime:

The minimum lifetime of the points in the diagram.

max_lifetime:

The maximum lifetime of the points in the diagram.

fit_to_dataset(dataset: Dataset[Tuple[OneHotEncodedPersistenceDiagram, T]]) None

This method does nothing.

class gdeep.data.preprocessors.MinMaxScalarPersistenceDiagram

This class runs the standard min-max normalisation on the birth and death times of the persistence diagrams. The transformation is X_scaled = X_std * (max - min) + min, where X_std is the input rescaled to [0, 1] via (X - X.min()) / (X.max() - X.min()).
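
A worked toy example of the min-max formula on a small tensor of birth and death times (plain torch, independent of the class itself):

import torch

X = torch.tensor([[0.1, 0.4], [0.2, 0.9], [0.0, 0.5]])  # toy birth/death pairs
new_min, new_max = 0.0, 1.0                              # target range

X_std = (X - X.min()) / (X.max() - X.min())
X_scaled = X_std * (new_max - new_min) + new_min         # here equal to X_std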

class gdeep.data.preprocessors.Normalization

This class runs the standard normalisation on all the dimensions of the tensors of a dataset. For example, in the case of images where each item is of shape (C, H, W), the mean and the standard deviation will be tensors of shape (C, H, W).
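
A minimal sketch of fitting Normalization to an image-like dataset and attaching it with TransformingDataset, assuming Normalization exposes the same fit_to_dataset method as the other preprocessors (the random tensors are purely illustrative):

import torch

from gdeep.data import TransformingDataset
from gdeep.data.datasets import FromArray
from gdeep.data.preprocessors import Normalization

x, y = torch.rand(100, 3, 32, 32), torch.randint(0, 2, (100,))
dataset = FromArray(x, y)

normalization = Normalization()
normalization.fit_to_dataset(dataset)  # estimates the per-entry mean and std
normalized_dataset = TransformingDataset(dataset, normalization)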

class gdeep.data.preprocessors.NormalizationPersistenceDiagram(num_homology_dimensions: int)

This class runs the standard normalisation on the birth and death coordinates of the persistence diagrams of a dataset across all the homology dimensions.

The one-hot encoded persistence diagrams are kept as is.

class gdeep.data.preprocessors.ToTensorImage(size: int | List[int])

Class to preprocess image files for classification tasks

Args:
size :

Desired output size. If size is a sequence like (h, w), output size will be matched to this. If size is an int, smaller edge of the image will be matched to this number. I.e, if height > width, then image will be rescaled to (size * height / width, size).

class gdeep.data.preprocessors.TokenizerQA(vocabulary: Vocab | None = None, tokenizer: partial | None = None)

Class to preprocess text dataloaders for Q&A tasks. The type of dataset is assumed to be of the form (string, string, list[string], list[string]).

Args:
vocabulary:

the torch vocabulary

tokenizer :

the tokenizer of the source text

Examples:

from gdeep.data import TorchDataLoader
from gdeep.data import TransformingDataset
from gdeep.data.preprocessors import TokenizerQA

dl = TorchDataLoader(name="SQuAD2", convert_to_map_dataset=True)
dl_tr, dl_ts = dl.build_dataloaders()

textds = TransformingDataset(dl_tr.dataset,
                             TokenizerQA())
fit_to_dataset(dataset: Dataset[Tuple[str, str, List[str], List[int]]]) None

Method to fit the vocabulary to the input text

Args:
dataset:

the dataset to fit to

class gdeep.data.preprocessors.TokenizerTextClassification(tokenizer: partial | None = None, vocabulary: Vocab | None = None)

Preprocessing class. This class is useful to convert the data format (label, text) into the proper tensor format (word_embedding, label). The labels should be integers; if they are strings, they will be converted.

Args:
tokenizer :

the tokenizer of the source text

vocabulary :

the vocabulary; it can be built or it can be given.

fit_to_dataset(dataset: Dataset[Tuple[Any, str]]) None

Method to extract global data, such as the length of the sentences, in order to pad.

Args:
dataset :

the data in the format (label, text)
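
A minimal sketch of fitting the tokenizer to a (label, text) dataset; AG_NEWS is used here as an assumed example of a torchtext classification dataset available through DatasetBuilder:

from gdeep.data import TransformingDataset
from gdeep.data.datasets import DatasetBuilder
from gdeep.data.preprocessors import TokenizerTextClassification

db = DatasetBuilder(name="AG_NEWS", convert_to_map_dataset=True)
ds_tr, ds_val, _ = db.build()

tokenizer = TokenizerTextClassification()
tokenizer.fit_to_dataset(ds_tr)  # builds the vocabulary and sentence-length statistics
textds = TransformingDataset(ds_tr, tokenizer)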

class gdeep.data.preprocessors.TokenizerTranslation(vocabulary: Dict[str, int] | None = None, vocabulary_target: Dict[str, int] | None = None, tokenizer: Callable[[str], List[str]] | None = None, tokenizer_target: Callable[[str], List[str]] | None = None)

Class to preprocess text dataloaders for translation tasks. The Dataset type is supposed to be (string, string). The padding item is supposed to be of index 0.

Args:
vocabulary :

the vocabulary of the source text; it can be built automatically or it can be given.

vocabulary_target :

the vocabulary of the target text; it can be built automatically or it can be given.

tokenizer:

the tokenizer of the source text

tokenizer_target:

the tokenizer of the target text

Examples:

from gdeep.data import DatasetBuilder
from gdeep.data import TransformingDataset
from gdeep.data.preprocessors import TokenizerTranslation

db = DatasetBuilder(name="Multi30k", convert_to_map_dataset=True)
ds_tr, ds_val, _ = db.build()

textds = TransformingDataset(ds_tr,
    TokenizerTranslation())

Datasets

class gdeep.data.datasets.AbstractDataLoaderBuilder

The abstract class to interface the Giotto dataloaders

class gdeep.data.datasets.DataLoaderBuilder(tuple_of_datasets: List[Dataset[Any]])

This class builds, out of a tuple of datasets, the corresponding dataloaders. Note that, by default, this class uses the same parameters for all the datasets; you can use different parameters for each dataset by passing a list of dictionaries to the build method.

Args:
tuple_of_datasets :

Tuple consisting of the training, validation and test datasets. One or two elements are also acceptable: they will be interpreted as the training dataset first and the validation dataset second.

Example:
>>> import torch
>>> from gdeep.data.dataloaders import DataLoaderBuilder
>>> from gdeep.data.datasets import FromArray
>>> x, y = torch.rand(10, 3, 32, 32), torch.randint(0, 1, (10,))
>>> x_train, y_train = x[:8], y[:8]
>>> x_val, y_val = x[8:], y[8:]
>>> train_dataset = FromArray(x_train, y_train)
>>> val_dataset = FromArray(x_val, y_val)
>>> dataloader_builder = DataLoaderBuilder([train_dataset, val_dataset])
>>> train_loader, val_loader = dataloader_builder.build()
build(tuple_of_kwargs: List[Dict[str, Any]] | DataLoaderParamsTuples | None = None) List[DataLoader[Any]]

This method accepts the arguments of the torch Dataloader and applies them when creating the tuple. If the tuple of kwargs is a list of dictionaries, then the first dictionary will be applied to the training dataset, the second to the validation dataset and the third to the test dataset. If the tuple of kwargs is a DataLoaderParamsTuples, then the parameters will be applied to the corresponding dataset.

Args:
tuple_of_kwargs:

The keyword arguments for the training, validation and test dataloaders. One or two elements are also acceptable: they will be applied to the training dataloader first and the validation dataloader second.

class gdeep.data.datasets.DataLoaderKwargs(*, train_kwargs, val_kwargs, test_kwargs)

Object to store keyword arguments for train, val, and test dataloaders
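
A minimal sketch of how the per-split keyword arguments could be packaged (the specific batch sizes are arbitrary):

from gdeep.data.datasets import DataLoaderKwargs

loader_kwargs = DataLoaderKwargs(
    train_kwargs={"batch_size": 32, "shuffle": True},
    val_kwargs={"batch_size": 64},
    test_kwargs={"batch_size": 64},
)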

class gdeep.data.datasets.DatasetBuilder(name: str = 'MNIST', convert_to_map_dataset: bool = False)

Class to obtain Datasets from the classical datasets available in PyTorch. The torus dataset and all its variations can also be found here.

Args:
name:

check the available datasets at https://pytorch.org/vision/stable/datasets.html and https://pytorch.org/text/stable/datasets.html

convert_to_map_dataset:

whether to convert to a MapDataset or to keep the IterableDataset

build(**kwargs) Tuple[Dataset[Any], Dataset[Any] | None, Dataset[Any] | None]

Method that returns the dataset.

Args:
kwargs:

the arguments to pass to the dataset builder. For example, you may want to use the options split=("train","dev") or split=("train","test")
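
A minimal sketch of building a dataset triple; depending on the chosen dataset, extra keyword arguments (such as a split option) may be needed:

from gdeep.data.datasets import DatasetBuilder

db = DatasetBuilder(name="MNIST")
ds_train, ds_val, ds_test = db.build()  # unavailable splits are returned as None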

class gdeep.data.datasets.DatasetCloud(dataset_name: str, bucket_name: str = 'adversarial_attack', download_directory: None | str = None, use_public_access: bool = True, path_to_credentials: None | str = None, make_public: bool = True)

DatasetCloud class to handle the download and upload of datasets to the DataCloud. If the download_directory does not exist, it will be created, and if a folder with the same name as the dataset already exists in the download directory, the dataset will not be downloaded again. If a folder with the same name as the dataset does not exist locally, it will be created when downloading the dataset.

Args:
dataset_name (str):

Name of the dataset to be downloaded or uploaded.

bucket_name (str, optional):

Name of the bucket in the DataCloud. Defaults to DATASET_BUCKET_NAME.

download_directory (Union[None, str], optional):

Directory where the dataset will be downloaded to. Defaults to DEFAULT_DOWNLOAD_DIR.

use_public_access (bool, optional):

If True, the dataset will be downloaded via a public URL. Defaults to True.

path_to_credentials (Union[None, str], optional):

Path to the credentials file. Only used if use_public_access is False and credentials are not provided. Defaults to None.

make_public (bool, optional):

If True, the dataset will be made public.

Raises:
ValueError:

Dataset does not exist in the cloud.

Returns:

None

download() None

Download a dataset from the DataCloud. If the dataset does not exist in the cloud, an exception will be raised. If the dataset exists locally in the download directory, the dataset will not be downloaded again.

Raises:
ValueError:

Dataset does not exist in the cloud.

ValueError:

Dataset exists locally but checksums do not match.

get_existing_datasets() List[str]

Returns a list of datasets in the cloud.

Returns:
List[str]:

List of datasets in the cloud.
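
A minimal sketch of listing and downloading a dataset; the dataset name and download directory below are placeholders:

from gdeep.data.datasets import DatasetCloud

dataset_cloud = DatasetCloud(
    "MutagDataset",                    # placeholder name; see get_existing_datasets()
    download_directory="downloads/",
)
print(dataset_cloud.get_existing_datasets())
dataset_cloud.download()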

class gdeep.data.datasets.DlBuilderFromDataCloud(dataset_name: str, download_directory: str, use_public_access: bool = True, path_to_credentials: None | str = None)

Class that loads data from Google Cloud Storage

This class is useful to build dataloaders from a dataset stored in the GDeep Dataset Cloud on Google Cloud Storage.

The constructor takes the name of a dataset as a string, and a string for the download directory. The constructor will download the dataset to the download directory. The dataset is downloaded in the version used by Datasets Cloud, which may be different from the version used by the dataset’s original developers.

Args:
dataset_name (str):

The name of the dataset.

download_directory (str):

The directory where the dataset will be downloaded.

use_public_access (bool):

Whether to use public access. If this is False, the Google Cloud Storage API is used directly; please make sure you have the appropriate credentials.

path_to_credentials (str):

Path to the credentials file. Only used if public_access is False and credentials are not provided. Defaults to None.

Returns:

torch.utils.data.DataLoader: The dataloader for the dataset.

Raises:
ValueError:

If the dataset_name is not a valid dataset that exists in Datasets Cloud.

ValueError:

If the download_directory is not a valid directory.

build(tuple_of_kwargs: List[Dict[str, Any]]) Tuple[DataLoader, DataLoader, DataLoader]

Builds the dataloaders for the dataset.

Args:

**tuple_of_kwargs: Arguments for the dataloader builder.

Returns:
Tuple[DataLoader, DataLoader, DataLoader]:

The dataloaders for the dataset (train, validation, test).

get_metadata() Dict[str, Any]

Returns the metadata of the dataset.

Returns:
Dict[str, Any]:

The metadata of the dataset.
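
A minimal sketch of building the three dataloaders from a cloud-hosted dataset; the dataset name and directory are placeholders and the dataloader kwargs are arbitrary:

from gdeep.data.datasets import DlBuilderFromDataCloud

dl_builder = DlBuilderFromDataCloud(
    "MutagDataset",                    # placeholder dataset name
    download_directory="downloads/",
)
print(dl_builder.get_metadata())

dl_train, dl_val, dl_test = dl_builder.build(
    [{"batch_size": 32}, {"batch_size": 32}, {"batch_size": 32}]
)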

class gdeep.data.datasets.FromArray(x: Tensor | ndarray, y: Tensor | ndarray)

This class is useful to build dataloaders from an array of X and y. Tensors are also supported.

Args:
X :

The data. The first dimension is the datum index

y :

The labels; their first dimension needs to match the first dimension of the data
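
A minimal sketch with random arrays (NumPy arrays and torch tensors are both accepted, per the signature above):

import numpy as np

from gdeep.data.datasets import FromArray

x = np.random.rand(10, 5).astype("float32")  # 10 data points, 5 features each
y = np.random.randint(0, 2, size=10)         # one label per data point
dataset = FromArray(x, y)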

class gdeep.data.datasets.ImageClassificationFromFiles(img_folder: str = '.', labels_file: str = 'labels.csv')

This class is useful to build a dataset directly from image files

Args:
img_folder (string):

The path to the folder where the training images are located

labels_file (string):

The path and file name of the labels. It shall be a .csv file with two columns: the first column contains the name of the image and the second one contains the label value

transform (AbstractPreprocessing):

an instance of a preprocessing class inheriting from AbstractPreprocessing

target_transform (AbstractPreprocessing):

an instance of a preprocessing class inheriting from AbstractPreprocessing
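
A minimal sketch, assuming an image folder with a labels.csv file laid out as described above (the paths are placeholders):

import os

from gdeep.data.datasets import ImageClassificationFromFiles

dataset = ImageClassificationFromFiles(
    img_folder=os.path.join("data", "img_data"),
    labels_file=os.path.join("data", "img_data", "labels.csv"),
)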

class gdeep.data.datasets.OrbitsGenerator(parameters: Sequence[float] = (2.5, 3.5, 4.0, 4.1, 4.3), num_orbits_per_class: int = 1000, num_pts_per_orbit: int = 1000, homology_dimensions: Sequence[int] = (0, 1), validation_percentage: float = 0.0, test_percentage: float = 0.0, dynamical_system: str = 'classical_convention', n_jobs: int = 1, dtype: str = 'float32', arbitrary_precision=False)

Generate the orbit dataset, consisting of orbits defined by the dynamical system

x[n+1] = (x[n] + r * y[n] * (1 - y[n])) % 1
y[n+1] = (y[n] + r * x[n+1] * (1 - x[n+1])) % 1

Note that the update of the second coordinate uses the already-updated value x[n+1]. The parameter r is a hyperparameter, and the classification task is to predict it given the orbit. By default, r is chosen from (2.5, 3.5, 4.0, 4.1, 4.3).

Args:

parameters (Tuple[float]):

Hyperparameter of the dynamical systems.

num_orbits_per_class (int):

number of orbits per class.

num_pts_per_orbit (int):

number of points per orbit.

homology_dimensions (Sequence[int]):

homology dimension of the persistence diagrams.

validation_percentage (float, optional):

Percentage of the validation dataset. Defaults to 0.0.

test_percentage (float, optional):

Percentage of the test dataset. Defaults to 0.0.

dynamical_system (str, optional):

either use the persistence paths convention 'pp_convention' or the classical convention 'classical_convention'. Defaults to 'classical_convention'.

n_jobs (int, optional):

number of cpus to run the computation on. Defaults to 1.
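
A minimal sketch of generating a small orbit dataset and turning it into dataloaders (the sizes and batch sizes below are arbitrary):

from gdeep.data.datasets import DataLoaderKwargs, OrbitsGenerator

og = OrbitsGenerator(
    num_orbits_per_class=10,   # keep the toy example small
    num_pts_per_orbit=100,
    homology_dimensions=(0, 1),
    validation_percentage=0.2,
    test_percentage=0.2,
)

loader_kwargs = DataLoaderKwargs(
    train_kwargs={"batch_size": 8},
    val_kwargs={"batch_size": 8},
    test_kwargs={"batch_size": 8},
)
dl_train, dl_val, dl_test = og.get_dataloader_orbits(loader_kwargs)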

get_dataloader_combined(dataloaders_kwargs: DataLoaderKwargs) Tuple[DataLoader, DataLoader, DataLoader]

Generates the dataloaders from the orbits dataset and the persistence diagrams.

Returns:
Tuple[DataLoader, DataLoader, DataLoader]:

Dataloaders of orbits and persistence diagrams

get_dataloader_orbits(dataloaders_kwargs: DataLoaderKwargs) Tuple[DataLoader, DataLoader, DataLoader]

Generates the dataloaders from the orbits dataset.

Returns:
Tuple[DataLoader, DataLoader, DataLoader]:

Dataloaders of orbits

get_dataloader_persistence_diagrams(dataloaders_kwargs: DataLoaderKwargs) Tuple[DataLoader, DataLoader, DataLoader]

Generates the dataloaders from the persistence diagrams dataset.

Returns:
Tuple[DataLoader, DataLoader, DataLoader]:

Dataloaders of persistence diagrams

get_orbits() None | ndarray

Returns the orbits as an ndarray of shape (num_classes * num_orbits_per_class, num_pts_per_orbit, 2).

Returns:
np.ndarray:

Orbits

get_persistence_diagrams() None | ndarray

Returns the persistence diagrams as an ndarray of shape (num_classes * num_orbits_per_class, num_topological_features, 3).

Returns:
np.ndarray:

Persistence diagrams

class gdeep.data.datasets.PersistenceDiagramFromFiles(file_path: str)
class gdeep.data.datasets.Rotation(axis_0: int, axis_1: int, angle: float)

Class for rotations

class gdeep.data.datasets.ToriDataset(name: str, **kwargs)

This class is used to generate data loaders for the family of tori-datasets

Args:
name:

name of the torus dataset to generate

gdeep.data.datasets.create_pd_orbits(orbits, num_classes, homology_dimensions=(0, 1), n_jobs=2) Tensor

Computes the weak alpha persistence of the orbit data clouds.

Args:
orbits (np.array):

Orbits of shape [n_points, 2]

homology_dimensions (tuple, optional):

Dimensions to compute the persistence diagrams. Defaults to (0, 1).

n_jobs (int, optional):

Number of cpus to use for parallel computation. Defaults to 2.

Returns:
np.array:

Array of persistence diagrams of shape [num_classes, num_orbits, num_persistence_points, 3]. In the last dimension the first two values are the coordinates of the points in the persistence diagrams and the third is the homology dimension.

gdeep.data.datasets.generate_orbit_parallel(num_classes, num_orbits, num_pts_per_orbit: int = 100, parameters: List[float] = [1.0]) ndarray

Generate sequence of points of a dynamical system in a parallel manner.

Args:
num_classes (int):

number of classes of dynamical systems.

num_orbits (int):

number of orbits of dynamical system per class.

num_pts_per_orbit (int, optional):

Number of points to generate. Defaults to 100.

parameters (List[float], optional):

List of parameters of the dynamical system. Defaults to [1.0].

Returns:
np.ndarray:

Array of sampled points of the dynamical system.

gdeep.data.datasets.get_dataset(key: str, **kwargs) Tuple[Dataset[Any]]

Get a dataset from the factory

Args:
key :

The name of the dataset, corresponding to the key in the list of builders

**kwargs:

The keyword arguments to pass to the dataset builder

Returns:
torch.utils.data.Dataset:

The dataset

Persistence Diagrams

class gdeep.data.persistence_diagrams.OneHotEncodedPersistenceDiagram(data: Tensor, homology_dimension_names: List[str] | None = None)

This class represents a single one-hot encoded persistence diagram.

Args:
data:

The data of the persistence diagram. The data must be a tensor of shape (num_points, 2 + num_homology_dimensions) and the last dimension must be the concatenation of the birth-death-coordinates and the one-hot encoded homology dimension. The invariants of the persistence diagram are checked in the constructor.

homology_dimension_names:

The names of the homology dimensions. If None, the names are set to H_0, H_1, …

Example:

pd = torch.tensor([[0.0928, 0.0995, 0.0000, 0.0000, 1.0000, 0.0000],
                   [0.0916, 0.1025, 1.0000, 0.0000, 0.0000, 0.0000],
                   [0.0978, 0.1147, 1.0000, 0.0000, 0.0000, 0.0000],
                   [0.0978, 0.1147, 0.0000, 0.0000, 1.0000, 0.0000],
                   [0.0916, 0.1162, 0.0000, 0.0000, 0.0000, 1.0000],
                   [0.0740, 0.0995, 1.0000, 0.0000, 0.0000, 0.0000],
                   [0.0728, 0.0995, 1.0000, 0.0000, 0.0000, 0.0000],
                   [0.0740, 0.1162, 0.0000, 0.0000, 0.0000, 1.0000],
                   [0.0728, 0.1162, 0.0000, 0.0000, 1.0000, 0.0000],
                   [0.0719, 0.1343, 0.0000, 0.0000, 0.0000, 1.0000],
                   [0.0830, 0.2194, 1.0000, 0.0000, 0.0000, 0.0000],
                   [0.0830, 0.2194, 1.0000, 0.0000, 0.0000, 0.0000],
                   [0.0719, 0.2194, 0.0000, 1.0000, 0.0000, 0.0000]])

names = ["Ord0", "Ext0", "Rel1", "Ext1"]
pd = OneHotEncodedPersistenceDiagram(pd, names)

all_close(other: OneHotEncodedPersistenceDiagram, atol: float = 1e-07) bool

This method checks if the persistence diagrams are close.

filter_by_lifetime(min_lifetime: float, max_lifetime: float) OneHotEncodedPersistenceDiagram

This method filters the persistence diagram by lifetime.

Args:
min_lifetime:

The minimum lifetime of the remaining points.

max_lifetime:

The maximum lifetime of the remaining points.

static from_numpy(data: ndarray) OneHotEncodedPersistenceDiagram

This method creates a persistence diagram from a numpy array.

get_all_points_in_homology_dimension(homology_dimension: int) OneHotEncodedPersistenceDiagram

This method returns all points in a given homology dimension.

get_lifetimes() Tensor

This method returns the lifetimes of the points.

get_num_homology_dimensions() int

This method returns the number of homology dimensions.

get_num_points() int

This method returns the number of points.

get_points_in_homology_dimension(homology_dimension: int) OneHotEncodedPersistenceDiagram

This method returns all points in a given homology dimension.

get_raw_data() Tensor

This method returns the raw data of the persistence diagram. This function should not be used to change the data.

static load(path: str) OneHotEncodedPersistenceDiagram

This method loads a persistence diagram from a file.

plot(names: List[str] | None = None) Figure

This method plots the persistence diagram.

Args:
names:

The names of the homology dimensions.

Examples:

pd = torch.tensor([[0.0928, 0.0995, 0.0000, 0.0000, 1.0000, 0.0000],
                   [0.0916, 0.1025, 1.0000, 0.0000, 0.0000, 0.0000],
                   [0.0978, 0.1147, 1.0000, 0.0000, 0.0000, 0.0000],
                   [0.0978, 0.1147, 0.0000, 0.0000, 1.0000, 0.0000],
                   [0.0916, 0.1162, 0.0000, 0.0000, 0.0000, 1.0000],
                   [0.0740, 0.0995, 1.0000, 0.0000, 0.0000, 0.0000],
                   [0.0728, 0.0995, 1.0000, 0.0000, 0.0000, 0.0000],
                   [0.0740, 0.1162, 0.0000, 0.0000, 0.0000, 1.0000],
                   [0.0728, 0.1162, 0.0000, 0.0000, 1.0000, 0.0000],
                   [0.0719, 0.1343, 0.0000, 0.0000, 0.0000, 1.0000],
                   [0.0830, 0.2194, 1.0000, 0.0000, 0.0000, 0.0000],
                   [0.0830, 0.2194, 1.0000, 0.0000, 0.0000, 0.0000],
                   [0.0719, 0.2194, 0.0000, 1.0000, 0.0000, 0.0000]])

names = ["Ord0", "Ext0", "Rel1", "Ext1"]
pd = OneHotEncodedPersistenceDiagram(pd, names)
pd.plot()
save(path: str) None

This method saves the persistence diagram to a file.

set_homology_dimension_names(homology_dimension_names: List[str]) None

This method sets the homology dimension names.

gdeep.data.persistence_diagrams.collate_fn_persistence_diagrams(batch: List[Tuple[OneHotEncodedPersistenceDiagram, int]]) Tuple[List[Tensor], Tensor]

This function collates the data for the persistence diagram by padding the data, converting the data to tensors, converting the labels to tensors and generating masks for the valid entries.

The input is a list of tuples of the form (persistence diagram, label).

Args:
batch:

The list of tuples of the form (persistence diagram, label).

Returns:

The data, the labels and the masks.
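
A minimal sketch of plugging the collate function into a torch DataLoader, assuming a dataset that yields (diagram, label) pairs (the path is a placeholder):

from torch.utils.data import DataLoader

from gdeep.data.datasets import PersistenceDiagramFromFiles
from gdeep.data.persistence_diagrams import collate_fn_persistence_diagrams

diagrams = PersistenceDiagramFromFiles("path/to/diagrams")  # placeholder path
loader = DataLoader(
    diagrams,
    batch_size=16,
    collate_fn=collate_fn_persistence_diagrams,  # pads diagrams and builds masks
)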

gdeep.data.persistence_diagrams.get_one_hot_encoded_persistence_diagram_from_gtda(persistence_diagram: ndarray) OneHotEncodedPersistenceDiagram

This function takes a single persistence diagram from giotto-tda and returns a one-hot encoded persistence diagram.

Args:
persistence_diagram:

An array of shape (num_points, 3) where the first two columns represent the coordinates of the points and the third column represents the index of the homology dimension.

Returns:
OneHotEncodedPersistenceDiagram:

A one-hot encoded persistence diagram. If the persistence diagram has only one homology dimension, the third column will be filled with ones.
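
A minimal sketch with a toy giotto-tda-style diagram, where each row is (birth, death, homology dimension):

import numpy as np

from gdeep.data.persistence_diagrams import (
    get_one_hot_encoded_persistence_diagram_from_gtda,
)

diagram = np.array([
    [0.0, 0.5, 0.0],   # an H0 point
    [0.1, 0.4, 1.0],   # an H1 point
])
one_hot_pd = get_one_hot_encoded_persistence_diagram_from_gtda(diagram)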

gdeep.data.persistence_diagrams.get_one_hot_encoded_persistence_diagram_from_gudhi_extended(diagram: Tuple[ndarray, ndarray, ndarray, ndarray]) OneHotEncodedPersistenceDiagram

Convert an extended persistence diagram of a single graph to an array with one-hot encoded homology type.

Args:

diagram (Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]):

The diagram of an extended persistence of a single graph.

Returns:
OneHotEncodedPersistenceDiagram:

The diagram in one-hot encoded homology type of size (num_points, 6).