jittor_geometric.datasets
Dataset loaders and utilities.
- class jittor_geometric.datasets.Planetoid(root, name, split='public', num_train_per_class=20, num_val=500, num_test=1000, transform=None, pre_transform=None)
- Bases: InMemoryDataset
- The citation network datasets “Cora”, “CiteSeer” and “PubMed” from the “Revisiting Semi-Supervised Learning with Graph Embeddings” paper.
- This class represents three widely used citation network datasets: Cora, CiteSeer, and PubMed. Nodes correspond to documents, and edges represent citation links between them. The datasets are designed for semi-supervised node classification, where the training, validation, and test splits are provided as binary masks.
- Dataset Details:
- Cora: A citation network where nodes represent machine learning papers and edges represent citations. The task is to classify papers into one of seven classes.
- CiteSeer: A citation network of research papers in computer and information science. The task is to classify papers into one of six classes.
- PubMed: A citation network of biomedical papers on diabetes. The task is to classify papers into one of three classes.
- Splitting Options:
- public: The original fixed split from the paper “Revisiting Semi-Supervised Learning with Graph Embeddings”.
- full: All nodes except those in the validation and test sets are used for training, as in “FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling”.
- random: Random training, validation, and test splits are generated according to num_train_per_class, num_val, and num_test.
- Parameters:
- root (str) – Root directory where the dataset should be saved. 
- name (str) – The name of the dataset ("Cora", "CiteSeer", "PubMed").
- split (str) – The type of dataset split ("public", "full", "random"). Default is "public".
- num_train_per_class (int, optional) – Number of training samples per class for the "random" split. Default is 20.
- num_val (int, optional) – Number of validation samples for the "random" split. Default is 500.
- num_test (int, optional) – Number of test samples for the "random" split. Default is 1000.
- transform (callable, optional) – A function/transform that takes in a jittor_geometric.data.Data object and returns a transformed version. Default is None.
- pre_transform (callable, optional) – A function/transform that takes in a jittor_geometric.data.Data object and returns a transformed version before saving to disk. Default is None.
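The "random" split option can be sketched in plain Python. The function below follows the procedure used by PyTorch Geometric's Planetoid (pick num_train_per_class training nodes per class, then draw num_val validation and num_test test nodes from the remaining nodes). That the Jittor port does exactly the same is assumed rather than confirmed, and random_planetoid_split is a hypothetical helper that works on a plain label list instead of a Data object.

```python
import random
from typing import List, Optional, Tuple


def random_planetoid_split(
    labels: List[int],
    num_train_per_class: int = 20,
    num_val: int = 500,
    num_test: int = 1000,
    seed: Optional[int] = None,
) -> Tuple[List[bool], List[bool], List[bool]]:
    """Build (train_mask, val_mask, test_mask) for a "random"-style split.

    Sketch only: assumes the procedure matches PyTorch Geometric's Planetoid.
    """
    rng = random.Random(seed)
    n = len(labels)
    train_mask = [False] * n
    val_mask = [False] * n
    test_mask = [False] * n

    # Step 1: pick up to num_train_per_class random training nodes per class.
    for c in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == c]
        rng.shuffle(members)
        for i in members[:num_train_per_class]:
            train_mask[i] = True

    # Step 2: shuffle the remaining nodes; the first num_val become validation
    # nodes and the next num_test become test nodes.
    remaining = [i for i in range(n) if not train_mask[i]]
    rng.shuffle(remaining)
    for i in remaining[:num_val]:
        val_mask[i] = True
    for i in remaining[num_val:num_val + num_test]:
        test_mask[i] = True
    return train_mask, val_mask, test_mask


# Example: 100 nodes in 3 classes, 5 training nodes per class.
labels = [i % 3 for i in range(100)]
train, val, test = random_planetoid_split(labels, num_train_per_class=5, num_val=20, num_test=30, seed=0)
print(sum(train), sum(val), sum(test))  # 15 20 30
```

Because validation and test nodes are drawn only from non-training nodes, the three masks are always disjoint.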
 
- Example:
>>> from jittor_geometric.datasets import Planetoid
>>> dataset = Planetoid(root='/path/to/dataset', name='Cora', split='random')
>>> data = dataset[0]  # Access the processed data object
- url = 'https://github.com/kimiyoung/planetoid/raw/master/data'
- __init__(root, name, split='public', num_train_per_class=20, num_val=500, num_test=1000, transform=None, pre_transform=None)
 - property raw_dir
 - property processed_dir
 - property raw_file_names
- The name of the files to find in the self.raw_dir folder in order to skip the download.
 - property processed_file_names
- The name of the files to find in the self.processed_dir folder in order to skip the processing.
 
- class jittor_geometric.datasets.Amazon(root, name, transform=None, pre_transform=None)
- Bases: InMemoryDataset
- The Amazon Computers and Amazon Photo datasets from the paper “Pitfalls of Graph Neural Network Evaluation” <https://arxiv.org/abs/1811.05868>.
- In these datasets, nodes represent products, and edges indicate that two products are frequently bought together. Product reviews are encoded as bag-of-words node features, and the task is to classify products into their respective categories.
- Dataset Details:
- Amazon Computers: Products related to computers; the task is to classify them from the review features and the co-purchase structure.
- Amazon Photo: Products related to photography, with the same classification task.
 - Parameters:
- root (str) – Root directory where the dataset should be saved. 
- name (str) – The name of the dataset, either "Computers" or "Photo".
- transform (callable, optional) – A function/transform that takes in a jittor_geometric.data.Data object and returns a transformed version. The data object will be transformed on each access. (default: None)
- pre_transform (callable, optional) – A function/transform that takes in a jittor_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
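The difference between transform and pre_transform, which recurs in every dataset on this page, can be illustrated with a toy in-memory dataset. ToyInMemoryDataset is hypothetical and only models the documented behavior (pre_transform runs once before the data would be saved, transform runs on every access); it is not the library's InMemoryDataset.

```python
from typing import Any, Callable, List, Optional


class ToyInMemoryDataset:
    """Hypothetical stand-in that only models when the two callables run."""

    def __init__(
        self,
        raw_items: List[Any],
        transform: Optional[Callable[[Any], Any]] = None,
        pre_transform: Optional[Callable[[Any], Any]] = None,
    ) -> None:
        self.transform = transform
        # pre_transform runs once, during "processing"; a real dataset would
        # save this result to disk and reload it on later runs.
        if pre_transform is not None:
            raw_items = [pre_transform(item) for item in raw_items]
        self._processed = list(raw_items)

    def __len__(self) -> int:
        return len(self._processed)

    def __getitem__(self, idx: int) -> Any:
        item = self._processed[idx]
        # transform runs on every access and its result is not cached.
        return item if self.transform is None else self.transform(item)


calls = {"pre_transform": 0, "transform": 0}


def scale(x: int) -> int:
    calls["pre_transform"] += 1
    return x * 10


def shift(x: int) -> int:
    calls["transform"] += 1
    return x + 1


ds = ToyInMemoryDataset([1, 2], transform=shift, pre_transform=scale)
print(ds[0], ds[0], ds[1])  # 11 11 21
print(calls)  # {'pre_transform': 2, 'transform': 3}
```

An expensive, deterministic step therefore belongs in pre_transform, while random augmentations that should differ between accesses belong in transform.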
 
- Example:
>>> from jittor_geometric.datasets import Amazon
>>> dataset = Amazon(root='/path/to/dataset', name='Computers')
>>> dataset.data
>>> dataset[0]  # Accessing the first data point
- url = 'https://github.com/shchur/gnn-benchmark/raw/master/data/npz/'
 - property raw_file_names: str
- The name of the files to find in the self.raw_dir folder in order to skip the download.
 
- class jittor_geometric.datasets.WikipediaNetwork(root, name, transform=None, pre_transform=None)
- Bases: InMemoryDataset
- Heterophilic datasets from the paper “A critical look at the evaluation of GNNs under heterophily: Are we really making progress?” <https://arxiv.org/abs/2302.11640>.
- This class provides heterophilic graph datasets for evaluating Graph Neural Networks (GNNs) in settings where connected nodes tend to have different labels. The task is node classification from node features and graph structure. The datasets come from different domains, and each has its own structure.
- Dataset Details:
- Chameleon
- Squirrel
- Chameleon-Filtered
- Squirrel-Filtered
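"Heterophilic" means that edges tend to join nodes with different labels. A common way to quantify this is the edge homophily ratio, the fraction of edges whose endpoints share a label; values near 0 indicate strong heterophily. The sketch below computes it from a plain edge list and label list; edge_homophily is a hypothetical helper and does not read the library's Data objects.

```python
from typing import Iterable, List, Tuple


def edge_homophily(edges: Iterable[Tuple[int, int]], labels: List[int]) -> float:
    """Fraction of edges (u, v) whose endpoints carry the same label."""
    total = 0
    same = 0
    for u, v in edges:
        total += 1
        if labels[u] == labels[v]:
            same += 1
    if total == 0:
        raise ValueError("edge homophily is undefined for a graph without edges")
    return same / total


# Path 0-1-2-3 with labels [0, 0, 1, 1]: only the middle edge crosses classes,
# so 2 of the 3 edges are homophilous.
print(edge_homophily([(0, 1), (1, 2), (2, 3)], [0, 0, 1, 1]))
```

Listing each undirected edge in both directions leaves the ratio unchanged, since both copies agree on whether the labels match.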
 - Parameters:
- root (str) – Root directory where the dataset should be saved. 
- name (str) – The name of the dataset to load. Options: "chameleon", "squirrel", "chameleon_filtered", "squirrel_filtered".
- transform (callable, optional) – A function/transform that takes in a Data object and returns a transformed version. The data object will be transformed on every access. (default: None)
- pre_transform (callable, optional) – A function/transform that takes in a Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
 
- Example:
>>> from jittor_geometric.datasets import WikipediaNetwork
>>> dataset = WikipediaNetwork(root='/path/to/dataset', name='chameleon')
>>> dataset.data
>>> dataset[0]  # Accessing the first data point
- url = 'https://github.com/yandex-research/heterophilous-graphs/raw/main/data'
 - property raw_file_names: str
- The name of the files to find in the self.raw_dir folder in order to skip the download.
 
- class jittor_geometric.datasets.GeomGCN(root, name, transform=None, pre_transform=None)
- Bases: InMemoryDataset
- The datasets used in the “Geom-GCN: Geometric Graph Convolutional Networks” <https://openreview.net/forum?id=S1e2agrFvS> paper.
- Nodes represent entities such as web pages or actors, and edges represent relationships between them. The task is to classify the nodes from their features and the graph structure.
- Dataset Details:
- Cornell, Texas, Wisconsin: Web pages collected from the Cornell, Texas, and Wisconsin universities. Nodes are web pages, and edges are hyperlinks between them. The task is to classify web pages into one of five categories: student, project, course, staff, and faculty.
- Actor: Each node corresponds to an actor, and an edge connects two actors that co-occur on the same Wikipedia page. The task is to classify actors into one of five categories based on keywords extracted from their Wikipedia pages.
 - Parameters:
- root (str) – Root directory where the dataset should be saved. 
- name (str) – The name of the dataset to load. Options: "Cornell", "Texas", "Wisconsin", "Actor".
- transform (callable, optional) – A function/transform that takes in a jittor_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
- pre_transform (callable, optional) – A function/transform that takes in a jittor_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
 
- Example:
>>> from jittor_geometric.datasets import GeomGCN
>>> dataset = GeomGCN(root='/path/to/dataset', name='Cornell')
>>> dataset.data
>>> dataset[0]  # Accessing the first data point
- url = 'https://raw.githubusercontent.com/graphdml-uiuc-jlu/geom-gcn/master'
 - property raw_file_names: List[str]
- The name of the files to find in the self.raw_dir folder in order to skip the download.
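For the WebKB graphs (Cornell, Texas, Wisconsin), the Geom-GCN repository stores each graph as two tab-separated text files: a node file whose rows hold a node id, comma-separated feature values and a label, and an edge file whose rows hold a source and a target id, each file starting with one header row. The parser below is a sketch under that assumption (it mirrors how PyTorch Geometric's WebKB loader reads out1_node_feature_label.txt and out1_graph_edges.txt); parse_node_file and parse_edge_file are hypothetical names, and the Actor files encode features differently, so they are not covered here.

```python
from typing import List, Tuple


def parse_node_file(text: str) -> Tuple[List[List[float]], List[int]]:
    """Parse 'node_id<TAB>f1,f2,...<TAB>label' rows that follow one header row.

    Assumption: rows are listed in node-id order, so the i-th feature vector
    and label belong to node i.
    """
    features: List[List[float]] = []
    labels: List[int] = []
    for line in text.strip().split("\n")[1:]:  # skip the header row
        _node_id, feats, label = line.split("\t")
        features.append([float(v) for v in feats.split(",")])
        labels.append(int(label))
    return features, labels


def parse_edge_file(text: str) -> List[Tuple[int, int]]:
    """Parse 'source<TAB>target' rows that follow one header row."""
    edges: List[Tuple[int, int]] = []
    for line in text.strip().split("\n")[1:]:  # skip the header row
        src, dst = line.split("\t")
        edges.append((int(src), int(dst)))
    return edges


node_text = "node_id\tfeature\tlabel\n0\t1,0,1\t2\n1\t0,1,0\t4\n"
edge_text = "node_id\tnode_id\n0\t1\n1\t0\n"
print(parse_node_file(node_text))  # ([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]], [2, 4])
print(parse_edge_file(edge_text))  # [(0, 1), (1, 0)]
```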
 
- class jittor_geometric.datasets.LINKXDataset(root, name, transform=None, pre_transform=None)
- Bases: InMemoryDataset
- A variety of non-homophilous graph datasets from the paper “Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods” <https://arxiv.org/abs/2110.14446>.
- Dataset Details:
- Penn94: A friendship network of university students from the Facebook 100 dataset. Nodes represent students, with labels indicating gender. Node features include major, dorm, year, and high school.
- Pokec: A friendship network from a Slovak online social network. Nodes represent users, connected by directed friendship relations. Node features include profile information like region, registration time, and age, with labels based on gender. 
- arXiv-year: Based on the ogbn-arxiv network, with nodes representing papers and edges representing citations. The task is to predict the year in which a paper was posted, using word2vec features derived from the title and abstract.
- snap-patents: A citation network of U.S. utility patents, where nodes represent patents and edges denote citations. The classification task is to predict the year a patent was granted, with node features derived from patent metadata. 
- genius: A social network from genius.com, where nodes are users connected by mutual follows. The task is to predict whether a user account is marked as “gone,” based on usage features like expertise score and contribution counts. 
- twitch-gamers: A network of Twitch accounts with edges between mutual followers. Node features include account statistics like views, creation date, and account status. The binary classification task is to predict whether a channel has explicit content. 
- wiki: A graph of Wikipedia articles, with nodes representing pages and edges representing links between them. Node features are GloVe embeddings from the title and abstract. Labels represent total page views, categorized into quintiles. 
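Several of these tasks turn a continuous quantity (such as a year or a page-view count) into a small number of balanced classes. The sketch below bins values into k classes at the empirical quantiles so that each class holds roughly the same number of nodes; with k = 5 it yields the quintiles described for wiki. It is a plain-Python illustration similar in spirit to the quantile labeling in the LINKX benchmark code (the exact details there are an assumption), and even_quantile_labels is a hypothetical name.

```python
import bisect
import statistics
from typing import List, Sequence


def even_quantile_labels(values: Sequence[float], num_classes: int = 5) -> List[int]:
    """Map each value to one of num_classes bins of roughly equal size."""
    if num_classes < 2:
        raise ValueError("num_classes must be at least 2")
    # num_classes - 1 cut points, linearly interpolated between sorted values.
    cuts = statistics.quantiles(values, n=num_classes, method="inclusive")
    # Bin k holds values in [cuts[k-1], cuts[k]); a value equal to a cut point
    # therefore lands in the higher bin.
    return [bisect.bisect_right(cuts, v) for v in values]


print(even_quantile_labels(list(range(1, 11)), num_classes=5))
# [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
```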
 - Parameters:
- root (str) – Root directory where the dataset should be saved. 
- name (str) – The name of the dataset to load. Options: "penn94", "pokec", "arxiv-year", "snap-patents", "genius", "twitch-gamers", "wiki".
- transform (callable, optional) – A function/transform that takes in a Data object and returns a transformed version. The data object will be transformed on each access. (default: None)
- pre_transform (callable, optional) – A function/transform that takes in a Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
 
- Example:
>>> from jittor_geometric.datasets import LINKXDataset
>>> dataset = LINKXDataset(root='/path/to/dataset', name='pokec')
>>> dataset.data
>>> dataset[0]  # Accessing the first data point
- property raw_file_names: List[str]
- The name of the files to find in the self.raw_dir folder in order to skip the download.
 
- class jittor_geometric.datasets.OGBNodePropPredDataset(name, root='dataset', transform=None, pre_transform=None, meta_dict=None)
- Bases: InMemoryDataset
- The Open Graph Benchmark (OGB) node property prediction datasets, provided by the OGB team for benchmarking large-scale node property prediction on real-world graphs.
- Each dataset contains nodes representing entities (e.g., papers, products) and edges representing relationships between them (e.g., citations, co-purchases). The goal is to predict node-level properties, such as categories or subject areas, from the graph structure and node features.
- Dataset Details:
- ogbn-arxiv: A citation network where nodes represent arXiv papers and directed edges indicate citation relationships. The task is to predict the subject area of each paper, using word2vec features derived from the title and abstract.
- ogbn-products: An Amazon product co-purchasing network where nodes represent products and edges connect products that are frequently bought together. The task is to predict the category of each product, with node features derived from product descriptions.
- ogbn-papers100M: A large-scale citation network where nodes represent research papers and edges indicate citation links. The node features are derived from word embeddings of the paper abstracts. The task is to predict the subject area of each paper.
 - These datasets are provided by the Open Graph Benchmark (OGB) team, which aims to facilitate machine learning research on graphs by offering diverse, large-scale datasets. For more details, visit the OGB website: https://ogb.stanford.edu/. - Parameters:
- name (str) – The name of the dataset to load. Options: "ogbn-arxiv", "ogbn-products", "ogbn-papers100M".
- root (str) – Root directory where the dataset folder will be stored.
- transform (callable, optional) – A function/transform that takes in a graph object and returns a transformed version. The graph object will be transformed on each access. (default: None)
- pre_transform (callable, optional) – A function/transform that takes in a graph object and returns a transformed version. The graph object will be transformed before being saved to disk. (default: None)
- meta_dict (dict, optional) – A dictionary containing meta-information about the dataset. When provided, it overrides default meta-information, useful for debugging or contributions from external users. 
 
- Example:
>>> from jittor_geometric.datasets import OGBNodePropPredDataset
>>> dataset = OGBNodePropPredDataset(name="ogbn-arxiv", root="path/to/dataset")
>>> data = dataset[0]  # Access the first graph object
- Acknowledgment:
- The OGBNodePropPredDataset is developed and maintained by the Open Graph Benchmark (OGB) team. We sincerely thank the OGB team for their significant contributions to the graph machine learning community. 
 - property num_classes
- The number of classes in the dataset. 
 - property raw_file_names
- The name of the files to find in the self.raw_dir folder in order to skip the download.
 - property processed_file_names
- The name of the files to find in the self.processed_dir folder in order to skip the processing.
 
- class jittor_geometric.datasets.HeteroDataset(root, name, transform=None, pre_transform=None)
- Bases: InMemoryDataset
- Heterophilic datasets from the paper “A critical look at the evaluation of GNNs under heterophily: Are we really making progress?” <https://arxiv.org/abs/2302.11640>.
- This class provides heterophilic graph datasets for evaluating Graph Neural Networks (GNNs) in settings where connected nodes tend to have different labels. The task is node classification from node features and graph structure. The datasets come from different domains, and each has its own structure and task.
- Dataset Details:
- Roman Empire: A graph built from the Wikipedia article on the Roman Empire. Nodes represent words, and edges connect words that are adjacent in the text or linked by a syntactic dependency. The task is to classify words by their syntactic role.
- Amazon Ratings: Based on Amazon co-purchasing data, where nodes are products connected if frequently bought together. The task is to predict the product’s average rating. 
- Minesweeper: A synthetic graph based on Minesweeper. Nodes represent grid cells with edges to neighbors. The task is to predict which cells contain mines. 
- Tolokers: Represents workers from the Toloka platform, with edges indicating shared tasks. The goal is to predict if a worker was banned. 
- Questions: Built from Yandex Q data, with nodes representing users who answered each other’s questions on the topic of medicine. The task is to predict if a user remained active. 
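Plain edge homophily (the share of edges joining same-label nodes) is hard to compare across these datasets because it depends on class imbalance. One corrected measure, adjusted homophily, subtracts the value expected from degree-weighted class frequencies: \(h_{\textrm{adj}} = (h_{\textrm{edge}} - \sum_k \bar{p}_k^2) / (1 - \sum_k \bar{p}_k^2)\), where \(\bar{p}_k = D_k / 2|E|\) and \(D_k\) is the total degree of the nodes in class \(k\). The sketch below computes it for an undirected graph given as a list of unique edges; adjusted_homophily is a hypothetical helper, not part of the library.

```python
from collections import Counter
from typing import List, Tuple


def adjusted_homophily(edges: List[Tuple[int, int]], labels: List[int]) -> float:
    """Adjusted homophily of an undirected graph; each edge is listed once."""
    if not edges:
        raise ValueError("adjusted homophily is undefined for a graph without edges")
    # Edge homophily: share of edges whose endpoints have the same label.
    h_edge = sum(1 for u, v in edges if labels[u] == labels[v]) / len(edges)

    # D_k: total degree of class k (every edge adds 1 to each endpoint's class).
    class_degree = Counter()
    for u, v in edges:
        class_degree[labels[u]] += 1
        class_degree[labels[v]] += 1
    two_m = 2 * len(edges)
    expected = sum((d / two_m) ** 2 for d in class_degree.values())
    if expected == 1.0:
        raise ValueError("undefined when every edge endpoint has the same class")
    return (h_edge - expected) / (1.0 - expected)


# Path 0-1-2-3 with labels [0, 0, 1, 1]: h_edge = 2/3, expected = 1/2, so about 0.333.
print(adjusted_homophily([(0, 1), (1, 2), (2, 3)], [0, 0, 1, 1]))
```

Negative values signal that edges cross classes more often than the degree-weighted class frequencies alone would predict.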
 - Parameters:
- root (str) – Root directory where the dataset should be saved. 
- name (str) – The name of the dataset to load. Options: "roman-empire", "amazon-ratings", "minesweeper", "tolokers", "questions".
- transform (callable, optional) – A function/transform that takes in a Data object and returns a transformed version. The data object will be transformed on every access. (default: None)
- pre_transform (callable, optional) – A function/transform that takes in a Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
 
- Example:
>>> from jittor_geometric.datasets import HeteroDataset
>>> dataset = HeteroDataset(root='/path/to/dataset', name='amazon-ratings')
>>> dataset.data
>>> dataset[0]  # Accessing the first data point
- url = 'https://github.com/yandex-research/heterophilous-graphs/raw/main/data'
 - property raw_file_names: str
- The name of the files to find in the self.raw_dir folder in order to skip the download.
 
- class jittor_geometric.datasets.JODIEDataset(root, name, transform=None, pre_transform=None)
- Bases: InMemoryDataset
- The temporal graph datasets from the paper “JODIE: Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks” <https://cs.stanford.edu/~srijan/pubs/jodie-kdd2019.pdf>.
- This class loads and processes the temporal interaction datasets used in the JODIE paper, for tasks such as dynamic embedding and link prediction. Each dataset consists of timestamped interactions between users and items (e.g., subreddits, Wikipedia pages, songs, or MOOC course items).
- Dataset Details:
- Reddit Post Dataset: Interactions between users and subreddits, restricted to the 1,000 most active subreddits and the 10,000 most active users, for a total of 672,447 interactions. The text of each post is represented as a feature vector of LIWC categories.
- Wikipedia Edits: Edits made by users on Wikipedia pages, restricted to the 1,000 most edited pages and to users with at least 5 edits, for a total of 8,227 users and 157,474 interactions. Each edit is converted into a LIWC feature vector.
- LastFM Song Listens: Interactions of 1,000 users with the 1,000 most listened-to songs, for a total of 1,293,103 interactions. Unlike in the other datasets, these interactions have no features.
- MOOC Student Drop-Out: Student interactions (e.g., viewing videos, submitting answers) in a MOOC online course. 7,047 users interact with 98 items (videos, answers, etc.), generating 411,749 interactions, including 4,066 drop-out events.
 - Parameters:
- root (str) – Root directory where the dataset should be saved. 
- name (str) – The name of the dataset. Options: "Reddit", "Wikipedia", "LastFM", "MOOC".
- transform (callable, optional) – A function/transform that takes in a Data object and returns a transformed version. The data object will be transformed on each access. (default: None)
- pre_transform (callable, optional) – A function/transform that takes in a Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
 
- Example:
>>> from jittor_geometric.datasets import JODIEDataset
>>> dataset = JODIEDataset(root='/path/to/dataset', name='Reddit')
>>> dataset.data
>>> dataset[0]  # Accessing the first data point
- url = 'http://snap.stanford.edu/jodie/{}.csv'
 - names = ['reddit', 'wikipedia', 'mooc', 'lastfm']
 - property raw_file_names: str
- The name of the files to find in the self.raw_dir folder in order to skip the download.
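The url template and names list above are enough to locate each file. Each JODIE file is a CSV with a header row followed by one interaction per line: user id, item id, timestamp, state label, then the interaction's feature values. The sketch below fills the template and turns such text into per-column lists, shifting item ids by the largest user id plus one so that users and items occupy disjoint node ids, as PyTorch Geometric's JODIEDataset does. That the Jittor port uses the same layout is an assumption, and jodie_url and load_jodie_csv are hypothetical helpers.

```python
from typing import Dict, List

URL_TEMPLATE = 'http://snap.stanford.edu/jodie/{}.csv'  # the class-level url
NAMES = ['reddit', 'wikipedia', 'mooc', 'lastfm']  # the class-level names


def jodie_url(name: str) -> str:
    """Fill the URL template with the lowercased dataset name."""
    key = name.lower()
    if key not in NAMES:
        raise ValueError(f"unknown JODIE dataset: {name!r}")
    return URL_TEMPLATE.format(key)


def load_jodie_csv(text: str) -> Dict[str, List]:
    """Split JODIE-style CSV text into per-column lists (assumed layout)."""
    rows = [line.split(",") for line in text.strip().split("\n")[1:]]  # skip header
    src = [int(r[0]) for r in rows]
    # Shift item ids past the largest user id so users and items never collide.
    offset = max(src) + 1 if src else 0
    dst = [int(r[1]) + offset for r in rows]
    t = [float(r[2]) for r in rows]
    y = [int(float(r[3])) for r in rows]  # state label, e.g. the MOOC drop-out flag
    msg = [[float(v) for v in r[4:]] for r in rows]  # empty when there are no features
    return {"src": src, "dst": dst, "t": t, "y": y, "msg": msg}


sample = (
    "user_id,item_id,timestamp,state_label,features\n"
    "0,0,0.0,0,0.1,0.2\n"
    "1,1,36.0,0,0.3,0.4\n"
    "0,1,77.0,1,0.5,0.6\n"
)
print(jodie_url("Reddit"))  # http://snap.stanford.edu/jodie/reddit.csv
print(load_jodie_csv(sample)["dst"])  # [2, 3, 3]
```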
 
- class jittor_geometric.datasets.Reddit(root, transform=None, pre_transform=None)
- Bases: InMemoryDataset
- The Reddit dataset from the “Inductive Representation Learning on Large Graphs” paper, containing Reddit posts belonging to different communities.
- This dataset is designed for large-scale graph representation learning. Nodes represent Reddit posts, and two posts are connected if the same user commented on both. The task is to predict which of the 41 communities a post belongs to, based on its content and connectivity.
- Dataset Statistics:
- Number of Nodes: 232,965
- Number of Edges: 114,615,892 
- Number of Features: 602 
- Number of Classes: 41 
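In the construction from the “Inductive Representation Learning on Large Graphs” paper, two posts are linked when the same user commented on both. The toy sketch below builds such an undirected post-post edge set from (user, post) comment records; posts_sharing_commenters and its input format are hypothetical, and the released graph was built once by the dataset authors rather than by this code.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def posts_sharing_commenters(comments: Iterable[Tuple[str, int]]) -> Set[Tuple[int, int]]:
    """Return undirected edges (a, b) with a < b between posts that share a commenter."""
    posts_by_user: Dict[str, Set[int]] = defaultdict(set)
    for user, post in comments:
        posts_by_user[user].add(post)
    edges: Set[Tuple[int, int]] = set()
    for posts in posts_by_user.values():
        # Every pair of posts commented on by the same user becomes an edge.
        for a, b in combinations(sorted(posts), 2):
            edges.add((a, b))
    return edges


comments = [("alice", 0), ("alice", 1), ("bob", 1), ("bob", 2), ("carol", 3)]
print(sorted(posts_sharing_commenters(comments)))  # [(0, 1), (1, 2)]
```

The pairwise loop grows quadratically with the number of posts per user, which is one reason a graph of this size is distributed precomputed.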
- The dataset is pre-split into training, validation, and test sets, given as node masks derived from each node's type.
- Parameters:
- root (str) – Root directory where the dataset should be saved. 
- transform (callable, optional) – A function/transform that takes in a jittor_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
- pre_transform (callable, optional) – A function/transform that takes in a jittor_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
- force_reload (bool, optional) – Whether to re-process the dataset. (default: False)
 
- Example:
>>> from jittor_geometric.datasets import Reddit
>>> dataset = Reddit(root='/path/to/reddit')
>>> data = dataset[0]  # Access the first graph object
- url = 'https://data.dgl.ai/dataset/reddit.zip'
 - property raw_file_names: List[str]
- The name of the files to find in the self.raw_dir folder in order to skip the download.
 
- class jittor_geometric.datasets.TemporalDataLoader(data, batch_size=1, neg_sampling_ratio=None, drop_last=False, num_neg_sample=None, neg_samples=None)
- Bases: object
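TemporalDataLoader carries no description here. Assuming it walks through time-ordered interaction events in consecutive batches of batch_size events, and that drop_last discards a final batch shorter than batch_size (as the parameter name suggests), the batching logic can be sketched on plain lists as below. The negative-sampling parameters are not modeled, and iter_temporal_batches is a hypothetical name.

```python
from typing import Any, Iterator, List, Sequence, Tuple

Event = Tuple[float, Any]  # (timestamp, payload)


def iter_temporal_batches(
    events: Sequence[Event],
    batch_size: int = 1,
    drop_last: bool = False,
) -> Iterator[List[Event]]:
    """Yield consecutive, time-ordered batches of events (assumed semantics)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Sort by timestamp only; sorted() is stable, so ties keep their input order.
    ordered = sorted(events, key=lambda event: event[0])
    for start in range(0, len(ordered), batch_size):
        batch = ordered[start:start + batch_size]
        if drop_last and len(batch) < batch_size:
            break  # only the final batch can be short
        yield batch


events = [(3.0, "c"), (1.0, "a"), (2.0, "b"), (4.0, "d"), (5.0, "e")]
print([[p for _, p in b] for b in iter_temporal_batches(events, batch_size=2)])
# [['a', 'b'], ['c', 'd'], ['e']]
print(len(list(iter_temporal_batches(events, batch_size=2, drop_last=True))))  # 2
```

Keeping batches in time order matters for temporal models: training on a later event before an earlier one would leak future information.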
- class jittor_geometric.datasets.QM9(root, transform=None, pre_transform=None, pre_filter=None)
- Bases: InMemoryDataset
- Note: If you encounter a network error while downloading, try running export HF_ENDPOINT=https://hf-mirror.com before loading the dataset, so that downloads go through a Hugging Face mirror.
- The QM9 dataset from the “MoleculeNet: A Benchmark for Molecular Machine Learning” paper, consisting of about 130,000 molecules with 19 regression targets. Each molecule includes complete spatial information for the single low-energy conformation of its atoms. In addition, we provide the atom features from the “Neural Message Passing for Quantum Chemistry” paper.

| Target | Property | Description | Unit |
|---|---|---|---|
| 0 | \(\mu\) | Dipole moment | \(\textrm{D}\) |
| 1 | \(\alpha\) | Isotropic polarizability | \({a_0}^3\) |
| 2 | \(\epsilon_{\textrm{HOMO}}\) | Highest occupied molecular orbital energy | \(\textrm{eV}\) |
| 3 | \(\epsilon_{\textrm{LUMO}}\) | Lowest unoccupied molecular orbital energy | \(\textrm{eV}\) |
| 4 | \(\Delta \epsilon\) | Gap between \(\epsilon_{\textrm{HOMO}}\) and \(\epsilon_{\textrm{LUMO}}\) | \(\textrm{eV}\) |
| 5 | \(\langle R^2 \rangle\) | Electronic spatial extent | \({a_0}^2\) |
| 6 | \(\textrm{ZPVE}\) | Zero point vibrational energy | \(\textrm{eV}\) |
| 7 | \(U_0\) | Internal energy at 0K | \(\textrm{eV}\) |
| 8 | \(U\) | Internal energy at 298.15K | \(\textrm{eV}\) |
| 9 | \(H\) | Enthalpy at 298.15K | \(\textrm{eV}\) |
| 10 | \(G\) | Free energy at 298.15K | \(\textrm{eV}\) |
| 11 | \(c_{\textrm{v}}\) | Heat capacity at 298.15K | \(\frac{\textrm{cal}}{\textrm{mol K}}\) |
| 12 | \(U_0^{\textrm{ATOM}}\) | Atomization energy at 0K | \(\textrm{eV}\) |
| 13 | \(U^{\textrm{ATOM}}\) | Atomization energy at 298.15K | \(\textrm{eV}\) |
| 14 | \(H^{\textrm{ATOM}}\) | Atomization enthalpy at 298.15K | \(\textrm{eV}\) |
| 15 | \(G^{\textrm{ATOM}}\) | Atomization free energy at 298.15K | \(\textrm{eV}\) |
| 16 | \(A\) | Rotational constant | \(\textrm{GHz}\) |
| 17 | \(B\) | Rotational constant | \(\textrm{GHz}\) |
| 18 | \(C\) | Rotational constant | \(\textrm{GHz}\) |

- Note: We also provide a pre-processed version of the dataset in case rdkit is not installed. The pre-processed version matches the manually processed version as outlined in process().
- Parameters:
- root (str) – Root directory where the dataset should be saved. 
- transform (callable, optional) – A function/transform that takes in a jittor_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
- pre_transform (callable, optional) – A function/transform that takes in a jittor_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
- pre_filter (callable, optional) – A function that takes in a jittor_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
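For reference, target 4 in the QM9 target table is defined from targets 3 and 2 of the same molecule. Since the LUMO lies above the HOMO, the gap is positive:

\[
\Delta \epsilon = \epsilon_{\textrm{LUMO}} - \epsilon_{\textrm{HOMO}}
\]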
 
- STATS:

| #graphs | #nodes | #edges | #features | #tasks |
|---|---|---|---|---|
| 130,831 | ~18.0 | ~37.3 | 11 | 19 |

- raw_url = 'https://deepchemdata.s3-us-west-1.amazonaws.com/datasets/molnet_publish/qm9.zip'
 - raw_url2 = 'https://ndownloader.figshare.com/files/3195404'
 - property raw_file_names: List[str]
- The name of the files to find in the self.raw_dir folder in order to skip the download.
 
- class jittor_geometric.datasets.MoleculeNet(root, name, transform=None, pre_transform=None, pre_filter=None, from_smiles=None)
- Bases: InMemoryDataset
- The MoleculeNet benchmark collection from the “MoleculeNet: A Benchmark for Molecular Machine Learning” paper, containing datasets from physical chemistry, biophysics, and physiology. All datasets come with the additional node and edge features introduced by the Open Graph Benchmark.
- Parameters:
- root (str) – Root directory where the dataset should be saved. 
- name (str) – The name of the dataset ("ESOL", "FreeSolv", "Lipo", "PCBA", "MUV", "HIV", "BACE", "BBBP", "Tox21", "ToxCast", "SIDER", "ClinTox").
- transform (callable, optional) – A function/transform that takes in a jittor_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
- pre_transform (callable, optional) – A function/transform that takes in a jittor_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
- pre_filter (callable, optional) – A function that takes in a jittor_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
- from_smiles (callable, optional) – A custom function that takes a SMILES string and outputs a Data object. If not set, defaults to from_smiles(). (default: None)
 
- STATS:

| Name | #graphs | #nodes | #edges | #features | #classes |
|---|---|---|---|---|---|
| ESOL | 1,128 | ~13.3 | ~27.4 | 9 | 1 |
| FreeSolv | 642 | ~8.7 | ~16.8 | 9 | 1 |
| ClinTox | 1,484 | ~26.1 | ~55.5 | 9 | 2 |

- url = 'https://deepchemdata.s3-us-west-1.amazonaws.com/datasets/{}'
- names: Dict[str,Tuple[str,str,str,int,Union[int,slice]]] = {'bace': ('BACE', 'bace.csv', 'bace', 0, 2), 'bbbp': ('BBBP', 'BBBP.csv', 'BBBP', -1, -2), 'clintox': ('ClinTox', 'clintox.csv.gz', 'clintox', 0, slice(1, 3, None)), 'esol': ('ESOL', 'delaney-processed.csv', 'delaney-processed', -1, -2), 'freesolv': ('FreeSolv', 'SAMPL.csv', 'SAMPL', 1, 2), 'hiv': ('HIV', 'HIV.csv', 'HIV', 0, -1), 'lipo': ('Lipophilicity', 'Lipophilicity.csv', 'Lipophilicity', 2, 1), 'muv': ('MUV', 'muv.csv.gz', 'muv', -1, slice(0, 17, None)), 'pcba': ('PCBA', 'pcba.csv.gz', 'pcba', -1, slice(0, 128, None)), 'sider': ('SIDER', 'sider.csv.gz', 'sider', 0, slice(1, 28, None)), 'tox21': ('Tox21', 'tox21.csv.gz', 'tox21', -1, slice(0, 12, None)), 'toxcast': ('ToxCast', 'toxcast_data.csv.gz', 'toxcast_data', 0, slice(1, 618, None))}
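Each entry of names appears to map a lowercase key to (display name, remote file name, local file stem, SMILES column index, target column index or slice). Assuming that reading, which matches how PyTorch Geometric's MoleculeNet uses the same table, a CSV row can be split into its SMILES string and float targets as below, with empty cells becoming NaN (missing labels are common in multi-task sets such as Tox21). parse_row is a hypothetical helper, and the quote stripping is a simplification for quoted fields that contain commas.

```python
import math
import re
from typing import List, Tuple, Union

# Two entries copied from the names table above.
NAMES = {
    'esol': ('ESOL', 'delaney-processed.csv', 'delaney-processed', -1, -2),
    'tox21': ('Tox21', 'tox21.csv.gz', 'tox21', -1, slice(0, 12, None)),
}


def parse_row(line: str, smiles_idx: int, y_idx: Union[int, slice]) -> Tuple[str, List[float]]:
    """Return (smiles, targets) for one CSV row under the assumed tuple layout."""
    # Blank out quoted fields (e.g. compound names containing commas) so that
    # splitting on commas keeps every column in its position.
    line = re.sub(r'".*?"', '', line)
    values = line.split(',')
    smiles = values[smiles_idx]
    raw = values[y_idx]  # a str for an int index, a list for a slice
    cells = raw if isinstance(raw, list) else [raw]
    # Empty cells are missing labels; represent them as NaN.
    targets = [float(v) if v else math.nan for v in cells]
    return smiles, targets


_, _, _, s_idx, y_idx = NAMES['esol']
print(parse_row('"Amigdalin, demo",-0.974,1,457.4,7,3,7,202.3,-0.77,CCO', s_idx, y_idx))
# ('CCO', [-0.77])
```

Regression sets such as ESOL use a single integer target column, while multi-task sets such as Tox21 use a slice, which is why the helper normalizes both to a list.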
- __init__(root, name, transform=None, pre_transform=None, pre_filter=None, from_smiles=None)
 - property raw_file_names: str
- The name of the files to find in the self.raw_dir folder in order to skip the download.
 
- class jittor_geometric.datasets.MD17(root, name, train=None, transform=None, pre_transform=None, pre_filter=None)
- Bases: InMemoryDataset
- A variety of ab-initio molecular dynamics trajectories from the authors of sGDML. This class provides access to the original MD17 datasets, their revised versions, and the CCSD(T) trajectories.
- For every trajectory, the dataset contains the Cartesian positions of atoms (in Angstrom), their atomic numbers, as well as the total energy (in kcal/mol) and the forces (in kcal/mol/Angstrom) on each atom. The latter two are the regression targets for this collection.
- Note: Data objects contain no edge indices, as these are most commonly constructed via the jittor_geometric.transforms.RadiusGraph transform, with its cut-off being a hyperparameter.
- The original MD17 dataset contains ten molecule trajectories. This original version was found to suffer from high numerical noise. The revised MD17 dataset contains the same molecules, but the energies and forces were recalculated at the PBE/def2-SVP level of theory using very tight SCF convergence and a very dense DFT integration grid. The third version of the dataset contains fewer molecules, computed at the CCSD(T) level of theory. The benzene molecule at the DFT FHI-aims level of theory was released separately.
- Check the table below for detailed information on the molecule, level of theory, and number of data points contained in each dataset. Which trajectory is loaded is determined by the name argument. For the coupled cluster trajectories, the dataset comes with predefined training and testing splits, which are loaded separately via the train argument.

| Molecule | Level of Theory | Name | #Examples |
|---|---|---|---|
| Benzene | DFT | benzene | 627,983 |
| Uracil | DFT | uracil | 133,770 |
| Naphthalene | DFT | naphthalene | 326,250 |
| Aspirin | DFT | aspirin | 211,762 |
| Salicylic acid | DFT | salicylic acid | 320,231 |
| Malonaldehyde | DFT | malonaldehyde | 993,237 |
| Ethanol | DFT | ethanol | 555,092 |
| Toluene | DFT | toluene | 442,790 |
| Paracetamol | DFT | paracetamol | 106,490 |
| Azobenzene | DFT | azobenzene | 99,999 |
| Benzene (R) | DFT (PBE/def2-SVP) | revised benzene | 100,000 |
| Uracil (R) | DFT (PBE/def2-SVP) | revised uracil | 100,000 |
| Naphthalene (R) | DFT (PBE/def2-SVP) | revised naphthalene | 100,000 |
| Aspirin (R) | DFT (PBE/def2-SVP) | revised aspirin | 100,000 |
| Salicylic acid (R) | DFT (PBE/def2-SVP) | revised salicylic acid | 100,000 |
| Malonaldehyde (R) | DFT (PBE/def2-SVP) | revised malonaldehyde | 100,000 |
| Ethanol (R) | DFT (PBE/def2-SVP) | revised ethanol | 100,000 |
| Toluene (R) | DFT (PBE/def2-SVP) | revised toluene | 100,000 |
| Paracetamol (R) | DFT (PBE/def2-SVP) | revised paracetamol | 100,000 |
| Azobenzene (R) | DFT (PBE/def2-SVP) | revised azobenzene | 99,988 |
| Benzene | CCSD(T) | benzene CCSD(T) | 1,500 |
| Aspirin | CCSD | aspirin CCSD | 1,500 |
| Malonaldehyde | CCSD(T) | malonaldehyde CCSD(T) | 1,500 |
| Ethanol | CCSD(T) | ethanol CCSD(T) | 2,000 |
| Toluene | CCSD(T) | toluene CCSD(T) | 1,501 |
| Benzene | DFT FHI-aims | benzene FHI-aims | 49,863 |

- Warning: It is advised not to train a model on more than 1,000 samples from the original or revised MD17 dataset.
- Parameters:
- root (str) – Root directory where the dataset should be saved. 
- name (str) – Keyword of the trajectory that should be loaded. 
- train (bool, optional) – Determines whether the train or test split gets loaded for the coupled cluster trajectories. (default: None)
- transform (callable, optional) – A function/transform that takes in a jittor_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
- pre_transform (callable, optional) – A function/transform that takes in a jittor_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
- pre_filter (callable, optional) – A function that takes in a jittor_geometric.data.Data object and returns a boolean value indicating whether the data object should be included in the final dataset. (default: None)
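To respect the warning about training on at most 1,000 samples from the original or revised MD17 data, one option is to draw a fixed, reproducible random subset of frame indices. The sketch below uses only the standard library; the helper name `md17_train_indices` and the use of a seeded `random.Random` are assumptions, not part of the jittor_geometric API.

```python
import random
from typing import List


def md17_train_indices(num_examples: int, num_train: int = 1000, seed: int = 0) -> List[int]:
    """Hypothetical helper: pick `num_train` distinct frame indices, reproducibly."""
    if num_train > num_examples:
        raise ValueError("num_train cannot exceed the number of examples")
    rng = random.Random(seed)  # local RNG, so the global random state is untouched
    return sorted(rng.sample(range(num_examples), num_train))


# A revised MD17 trajectory has 100,000 frames (see the table above).
idx = md17_train_indices(100_000)
print(len(idx), len(set(idx)))  # 1000 1000
```

Fixing the seed keeps the subset identical across runs, which makes results on these small training sets comparable.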
 
 - STATS:
Name | #graphs | #nodes | #edges | #features | #tasks
Benzene | 627,983 | 12 | 0 | 1 | 2
Uracil | 133,770 | 12 | 0 | 1 | 2
Naphthalene | 326,250 | 18 | 0 | 1 | 2
Aspirin | 211,762 | 21 | 0 | 1 | 2
Salicylic acid | 320,231 | 16 | 0 | 1 | 2
Malonaldehyde | 993,237 | 9 | 0 | 1 | 2
Ethanol | 555,092 | 9 | 0 | 1 | 2
Toluene | 442,790 | 15 | 0 | 1 | 2
Paracetamol | 106,490 | 20 | 0 | 1 | 2
Azobenzene | 99,999 | 24 | 0 | 1 | 2
Benzene (R) | 100,000 | 12 | 0 | 1 | 2
Uracil (R) | 100,000 | 12 | 0 | 1 | 2
Naphthalene (R) | 100,000 | 18 | 0 | 1 | 2
Aspirin (R) | 100,000 | 21 | 0 | 1 | 2
Salicylic acid (R) | 100,000 | 16 | 0 | 1 | 2
Malonaldehyde (R) | 100,000 | 9 | 0 | 1 | 2
Ethanol (R) | 100,000 | 9 | 0 | 1 | 2
Toluene (R) | 100,000 | 15 | 0 | 1 | 2
Paracetamol (R) | 100,000 | 20 | 0 | 1 | 2
Azobenzene (R) | 99,988 | 24 | 0 | 1 | 2
Benzene CCSD(T) | 1,500 | 12 | 0 | 1 | 2
Aspirin CCSD | 1,500 | 21 | 0 | 1 | 2
Malonaldehyde CCSD(T) | 1,500 | 9 | 0 | 1 | 2
Ethanol CCSD(T) | 2,000 | 9 | 0 | 1 | 2
Toluene CCSD(T) | 1,501 | 15 | 0 | 1 | 2
Benzene FHI-aims | 49,863 | 12 | 0 | 1 | 2
 - gdml_url = 'http://quantum-machine.org/gdml/data/npz'
 - revised_url = 'https://archive.materialscloud.org/record/file?filename=rmd17.tar.bz2&record_id=466'
 - file_names = {'aspirin': 'md17_aspirin.npz', 'aspirin CCSD': 'aspirin_ccsd.zip', 'azobenzene': 'azobenzene_dft.npz', 'benzene': 'md17_benzene2017.npz', 'benzene CCSD(T)': 'benzene_ccsd_t.zip', 'benzene FHI-aims': 'benzene2018_dft.npz', 'ethanol': 'md17_ethanol.npz', 'ethanol CCSD(T)': 'ethanol_ccsd_t.zip', 'malonaldehyde': 'md17_malonaldehyde.npz', 'malonaldehyde CCSD(T)': 'malonaldehyde_ccsd_t.zip', 'naphthalene': 'md17_naphthalene.npz', 'paracetamol': 'paracetamol_dft.npz', 'revised aspirin': 'rmd17_aspirin.npz', 'revised azobenzene': 'rmd17_azobenzene.npz', 'revised benzene': 'rmd17_benzene.npz', 'revised ethanol': 'rmd17_ethanol.npz', 'revised malonaldehyde': 'rmd17_malonaldehyde.npz', 'revised naphthalene': 'rmd17_naphthalene.npz', 'revised paracetamol': 'rmd17_paracetamol.npz', 'revised salicylic acid': 'rmd17_salicylic.npz', 'revised toluene': 'rmd17_toluene.npz', 'revised uracil': 'rmd17_uracil.npz', 'salicylic acid': 'md17_salicylic.npz', 'toluene': 'md17_toluene.npz', 'toluene CCSD(T)': 'toluene_ccsd_t.zip', 'uracil': 'md17_uracil.npz'}
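The class attributes above are enough to work out where a given trajectory comes from. The sketch below shows one plausible resolution rule, modelled on how PyTorch Geometric's MD17 loader behaves (an assumption for jittor_geometric): revised trajectories are shipped together in the rmd17.tar.bz2 archive at revised_url, while every other file is fetched as gdml_url/<file name>. The helper name `md17_source` is hypothetical, and only a subset of file_names is copied in.

```python
from typing import Dict

# Values copied from the class attributes above (file_names abridged).
GDML_URL = "http://quantum-machine.org/gdml/data/npz"
REVISED_URL = "https://archive.materialscloud.org/record/file?filename=rmd17.tar.bz2&record_id=466"
FILE_NAMES: Dict[str, str] = {
    "aspirin": "md17_aspirin.npz",
    "aspirin CCSD": "aspirin_ccsd.zip",
    "revised aspirin": "rmd17_aspirin.npz",
    "benzene FHI-aims": "benzene2018_dft.npz",
}


def md17_source(name: str) -> str:
    """Hypothetical helper: return the URL a given `name` would be downloaded from."""
    if name not in FILE_NAMES:
        raise ValueError(f"Unknown MD17 trajectory: {name!r}")
    if name.startswith("revised"):
        # Assumed: all revised trajectories live in a single tar.bz2 archive.
        return REVISED_URL
    return f"{GDML_URL}/{FILE_NAMES[name]}"


print(md17_source("aspirin CCSD"))
# http://quantum-machine.org/gdml/data/npz/aspirin_ccsd.zip
```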
 - property raw_file_names: str | List[str]
- The names of the files to find in the self.raw_dir folder in order to skip the download.
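The raw_file_names property is what lets a dataset skip the download step: if every listed file already exists in self.raw_dir, nothing needs to be fetched. A minimal sketch of that check follows; the helper name `needs_download` is hypothetical, and the actual logic in the base dataset class may differ in detail.

```python
import os
import tempfile
from typing import Iterable, Union


def needs_download(raw_dir: str, raw_file_names: Union[str, Iterable[str]]) -> bool:
    """Hypothetical helper: True if any expected raw file is missing from raw_dir."""
    if isinstance(raw_file_names, str):  # the property may return a single name
        raw_file_names = [raw_file_names]
    return not all(os.path.exists(os.path.join(raw_dir, f)) for f in raw_file_names)


with tempfile.TemporaryDirectory() as raw_dir:
    print(needs_download(raw_dir, "md17_aspirin.npz"))  # True: file not there yet
    open(os.path.join(raw_dir, "md17_aspirin.npz"), "wb").close()
    print(needs_download(raw_dir, "md17_aspirin.npz"))  # False
```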
 
- class jittor_geometric.datasets.PCQM4Mv2(root, split='train', transform=None, from_smiles=None)[source]
- Bases: InMemoryDataset
The PCQM4Mv2 dataset from the “OGB-LSC: A Large-Scale Challenge for Machine Learning on Graphs” paper.
PCQM4Mv2 is a quantum chemistry dataset originally curated under the PubChemQC project. The task is to predict the DFT-calculated HOMO-LUMO energy gap of molecules given their 2D molecular graphs.
- Parameters:
- root (str) – Root directory where the dataset should be saved. 
- split (str, optional) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. If "holdout", loads the holdout dataset. (default: "train")
- transform (callable, optional) – A function/transform that takes in a jittor_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
- from_smiles (callable, optional) – A custom function that takes a SMILES string and outputs a Data object. If not set, defaults to from_smiles(). (default: None)
 
 - url = 'https://dgl-data.s3-accelerate.amazonaws.com/dataset/OGB-LSC/pcqm4m-v2.zip'
 - split_mapping = {'holdout': 'test-challenge', 'test': 'test-dev', 'train': 'train', 'val': 'valid'}
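The split argument is translated into the split names used by OGB-LSC through split_mapping; in particular, "holdout" selects the "test-challenge" set and "test" selects the "test-dev" set. A small sketch of that lookup with input validation; the helper name `ogb_split_name` is hypothetical.

```python
SPLIT_MAPPING = {
    "holdout": "test-challenge",
    "test": "test-dev",
    "train": "train",
    "val": "valid",
}


def ogb_split_name(split: str) -> str:
    """Hypothetical helper: map a PCQM4Mv2 `split` argument to its OGB-LSC split name."""
    try:
        return SPLIT_MAPPING[split]
    except KeyError:
        raise ValueError(
            f"split must be one of {sorted(SPLIT_MAPPING)}, got {split!r}"
        ) from None


print(ogb_split_name("val"))  # valid
```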
 - property raw_file_names: List[str]
- The names of the files to find in the self.raw_dir folder in order to skip the download.
 
- class jittor_geometric.datasets.MovieLens1M(root, transform=None, pre_transform=None, train_ratio=0.8, val_ratio=0.1, test_ratio=0.1, seed=42, shuffle=True, with_aux=False)[source]
- Bases: RecSysBase
MovieLens-1M dataset with automatic download from RecBole:
- https://recbole.s3-accelerate.amazonaws.com/ProcessedDatasets/MovieLens/ml-1m.zip 
- Expected (after extraction) in raw_dir:
- ml-1m.item 
- ml-1m.user 
- ml-1m.inter 
 
 - Files are tab-separated; the first (header) row is skipped (skiprows=1).
- Parameters:
- with_aux (bool)
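The RecBole atomic files are tab-separated text with a single header row. The sketch below parses an .inter file with the standard csv module, skipping that header as described above. The column layout shown (user_id:token, item_id:token, rating:float, timestamp:float) follows RecBole's usual atomic-file convention for MovieLens and is an assumption here, as is the helper name `read_inter`; the rows are illustrative.

```python
import csv
import io
from typing import Iterable, List, Tuple


def read_inter(lines: Iterable[str]) -> List[Tuple[str, str, float]]:
    """Hypothetical helper: parse a RecBole-style .inter file into (user, item, rating)."""
    reader = csv.reader(lines, delimiter="\t")
    next(reader)  # skip the header row (the equivalent of skiprows=1)
    return [(row[0], row[1], float(row[2])) for row in reader if row]


# Tiny in-memory stand-in for ml-1m.inter (illustrative rows).
sample = io.StringIO(
    "user_id:token\titem_id:token\trating:float\ttimestamp:float\n"
    "1\t1193\t5\t978300760\n"
    "1\t661\t3\t978302109\n"
)
print(read_inter(sample))  # [('1', '1193', 5.0), ('1', '661', 3.0)]
```

With a real file, the same helper can be called as `read_inter(open(path, newline=""))`; IDs are kept as strings because RecBole marks them as tokens rather than numbers.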
 - url = 'https://recbole.s3-accelerate.amazonaws.com/ProcessedDatasets/MovieLens/ml-1m.zip'
 - __init__(root, transform=None, pre_transform=None, train_ratio=0.8, val_ratio=0.1, test_ratio=0.1, seed=42, shuffle=True, with_aux=False)[source]
- Parameters:
- with_aux (bool)
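The train_ratio, val_ratio, test_ratio, seed and shuffle arguments suggest a seeded, ratio-based partition of the interaction records. The sketch below shows one way such a split can be computed. Whether RecSysBase splits globally or per user, and how it rounds split sizes, is not stated here, so the helper `ratio_split` and its rounding rule are assumptions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def ratio_split(
    items: Sequence[T],
    train_ratio: float = 0.8,
    val_ratio: float = 0.1,
    test_ratio: float = 0.1,
    seed: int = 42,
    shuffle: bool = True,
) -> Tuple[List[T], List[T], List[T]]:
    """Hypothetical helper: split `items` into train/val/test lists by ratio."""
    if abs(train_ratio + val_ratio + test_ratio - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    items = list(items)
    if shuffle:
        random.Random(seed).shuffle(items)  # seeded, so the split is reproducible
    n_train = int(len(items) * train_ratio)
    n_val = int(len(items) * val_ratio)
    # Whatever remains after rounding down goes to the test split.
    return (
        items[:n_train],
        items[n_train:n_train + n_val],
        items[n_train + n_val:],
    )


train, val, test = ratio_split(range(100))
print(len(train), len(val), len(test))  # 80 10 10
```

Assigning the rounding remainder to the test split guarantees that every record lands in exactly one split.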
 
 - property raw_file_names
- The names of the files to find in the self.raw_dir folder in order to skip the download.
 
- class jittor_geometric.datasets.MovieLens100K(root, transform=None, pre_transform=None, train_ratio=0.8, val_ratio=0.1, test_ratio=0.1, seed=42, shuffle=True, with_aux=False)[source]
- Bases: RecSysBase
MovieLens-100K (RecBole processed).
Downloads: https://recbole.s3-accelerate.amazonaws.com/ProcessedDatasets/MovieLens/ml-100k.zip
Expected in raw_dir after extraction:
- ml-100k.item
- ml-100k.user 
- ml-100k.inter 
 - Parameters:
- with_aux (bool)
 - url = 'https://recbole.s3-accelerate.amazonaws.com/ProcessedDatasets/MovieLens/ml-100k.zip'
 - __init__(root, transform=None, pre_transform=None, train_ratio=0.8, val_ratio=0.1, test_ratio=0.1, seed=42, shuffle=True, with_aux=False)[source]
- Parameters:
- with_aux (bool)
 
 - property raw_file_names
- The names of the files to find in the self.raw_dir folder in order to skip the download.
 
- class jittor_geometric.datasets.Yelp2018(root, transform=None, pre_transform=None, train_ratio=0.8, val_ratio=0.1, test_ratio=0.1, seed=42, shuffle=True, with_aux=False)[source]
- Bases: RecSysBase
Yelp-2018 (RecBole processed).
Downloads: https://recbole.s3-accelerate.amazonaws.com/ProcessedDatasets/Yelp/yelp2018.zip
Accepts either file naming variant inside the zip:
- yelp2018.item/.user/.inter (common)
- yelp-2018.item/.user/.inter (also supported) 
 - After extraction, the file names are normalized to yelp-2018.* in raw_dir.
- Parameters:
- with_aux (bool)
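Since the archive may use either the yelp2018.* or the yelp-2018.* prefix, the loader renames files to the yelp-2018.* form. A minimal sketch of such a renaming step with pathlib; the helper name `normalize_yelp_names` is hypothetical, and the real implementation may handle existing targets differently.

```python
import tempfile
from pathlib import Path


def normalize_yelp_names(raw_dir: str) -> None:
    """Hypothetical helper: rename yelp2018.{item,user,inter} to yelp-2018.* in raw_dir."""
    root = Path(raw_dir)
    for suffix in ("item", "user", "inter"):
        src = root / f"yelp2018.{suffix}"
        dst = root / f"yelp-2018.{suffix}"
        if src.exists() and not dst.exists():
            src.rename(dst)  # only rename when it would not overwrite an existing file


with tempfile.TemporaryDirectory() as d:
    for s in ("item", "user", "inter"):
        (Path(d) / f"yelp2018.{s}").touch()
    normalize_yelp_names(d)
    print(sorted(p.name for p in Path(d).iterdir()))
    # ['yelp-2018.inter', 'yelp-2018.item', 'yelp-2018.user']
```

Skipping files that are already in the target form makes the step safe to run repeatedly.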
 - url = 'https://recbole.s3-accelerate.amazonaws.com/ProcessedDatasets/Yelp/yelp2018.zip'
 - property raw_file_names
- The names of the files to find in the self.raw_dir folder in order to skip the download.