Command Line Scripts

datatest.py

Tool for displaying data using loader.read_data_sets.

usage: datatest [-h] [-hashlist HASHLIST [HASHLIST ...]]
                [-cold | -subfolders SUBFOLDERS [SUBFOLDERS ...]]
                datadirectory

Positional arguments:

datadirectory

Path to folder where data to be loaded and displayed is stored.

Options:

`-hashlist`	List of hashes to read. Files of the form “features_<hash>.ext” or “labels_<hash>.ext” are read, where <hash> is a string in hashlist. If no hashlist is specified, all files of that form are loaded, regardless of what string <hash> is.
`-cold=False`	Extra loading and testing for cold datasets.
`-subfolders=('test', 'dev', 'train')`
	List of subfolders to load and display.
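
The file-selection rule described for `-hashlist` can be sketched as follows. This is only an illustration of the naming convention, not the code used by loader.read_data_sets; the function name `select_data_files`, the regular expression, and the sample file names are all assumptions made for the example.

```python
import re

# Matches names of the form features_<hash>.ext or labels_<hash>.ext.
# Treating the text after the last dot as the extension is an assumption.
DATA_FILE = re.compile(r"^(features|labels)_(?P<hash>.+)\.[^.]+$")

def select_data_files(filenames, hashlist=None):
    """Return the names that follow the convention, optionally filtered by hash."""
    selected = []
    for name in filenames:
        match = DATA_FILE.match(name)
        if match is None:
            continue  # not a features_/labels_ file
        # With no hashlist, every matching file is kept.
        if hashlist is None or match.group("hash") in hashlist:
            selected.append(name)
    return selected

# Hypothetical directory listing.
names = ["features_item.mat", "labels_ratings.sparse", "features_user.mat", "README.txt"]
all_files = select_data_files(names)
some_files = select_data_files(names, hashlist=["item", "ratings"])
print(all_files)
print(some_files)
```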

normalize.py

Normalizes the text in a file, given the path to that file. Capitalization and punctuation are removed, except for infix apostrophes, e.g. “hasn’t”, “David’s”. The normalized text is saved in the same directory as the original text, with “_norm” appended to the file name before the extension. Beginning-of-sentence and end-of-sentence tokens are not added by this normalization script.

usage: normalize [-h] filepath

Positional arguments:

filepath

The path to the file, including the file name.
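
A minimal sketch of the normalization described for normalize.py is shown below. It is not the script's actual implementation. The helper names `normalize_text` and `normalized_path` are hypothetical, and two behaviors are assumptions: punctuation is deleted rather than replaced by a space, and curly apostrophes are converted to straight ones before the infix check.

```python
import os

def normalize_text(text):
    """Lowercase text and drop punctuation, keeping apostrophes between letters or digits."""
    # Assumption: treat the curly apostrophe like a straight one.
    text = text.lower().replace("\u2019", "'")
    kept = []
    for i, ch in enumerate(text):
        if ch.isalnum() or ch.isspace():
            kept.append(ch)
        elif (ch == "'" and 0 < i < len(text) - 1
              and text[i - 1].isalnum() and text[i + 1].isalnum()):
            kept.append(ch)  # infix apostrophe, as in "hasn't"
    return "".join(kept)

def normalized_path(filepath):
    """Insert "_norm" before the extension, keeping the same directory."""
    root, ext = os.path.splitext(filepath)
    return root + "_norm" + ext

sample = "Hello, World! It hasn\u2019t rained; 'quoted' words."
print(normalize_text(sample))
print(normalized_path("corpus/reviews.txt"))
```

Quotation marks at word boundaries are dropped because they are not flanked by a letter or digit on both sides, while the apostrophe in “hasn’t” is kept.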

Movie Lens Processing

generateTermDoc.py

usage: generateTermDoc [-h] datapath dictionary descriptions doc_term_file

Positional arguments:

`datapath`	Path to folder where dictionary and descriptions are located, and created document term matrix will be saved.
`dictionary`	Name of the file containing line separated words in vocabulary.
`descriptions`	Name of the file containing line separated text descriptions.
`doc_term_file`	Name of the file to save the created sparse document term matrix.
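
To make the inputs and output concrete, here is a rough sketch of how a sparse document term matrix can be built from a vocabulary and a list of descriptions. It is not the code in generateTermDoc.py. The function name `build_doc_term_triples` and the sample data are hypothetical, and the sketch assumes whitespace tokenization, raw counts as entries, one row per description, and one column per vocabulary word.

```python
from collections import Counter

def build_doc_term_triples(vocabulary, descriptions):
    """Return (rows, cols, counts) in coordinate form for the nonzero entries."""
    column_of = {word: col for col, word in enumerate(vocabulary)}
    rows, cols, counts = [], [], []
    for row, text in enumerate(descriptions):
        # Count only the words that appear in the vocabulary.
        tally = Counter(column_of[w] for w in text.split() if w in column_of)
        for col in sorted(tally):
            rows.append(row)
            cols.append(col)
            counts.append(tally[col])
    return rows, cols, counts

# Hypothetical vocabulary (one word per line in the dictionary file)
# and descriptions (one description per line in the descriptions file).
vocabulary = ["space", "love", "robot"]
descriptions = ["robot love in space", "space space opera"]
rows, cols, counts = build_doc_term_triples(vocabulary, descriptions)
print(rows, cols, counts)
```

If SciPy is available, the triples can be turned into a sparse matrix with `scipy.sparse.coo_matrix((counts, (rows, cols)), shape=(len(descriptions), len(vocabulary)))`.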

ml100k_item_process.py

Reads MovieLens 100k item metadata and converts it to feature files. The produced files are:

features_item_month.index: A file storing a HotIndex object of movie month releases.

features_item_year.mat: A file storing a numpy array of movie year releases.

features_item_genre.mat: A file storing a scipy sparse csr_matrix of one hot encodings for movie genre.

usage: ml100k_item_process [-h] datapath outpath

Positional arguments:

`datapath`	The path to the ml-100k dataset. Usually “some_relative_path/ml-100k”.
`outpath`	The path to the folder to store the processed MovieLens 100k item data feature files.
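
The month, year, and genre features listed above can be read from a single item record roughly as follows. This sketch assumes the input is the pipe-separated u.item layout of the MovieLens 100k distribution (movie id, title, release date, video release date, IMDb URL, then 19 genre flags). It does not reproduce the script's code, and it leaves out the HotIndex, numpy, and scipy output steps. The function name `parse_item_line` and the sample record are hypothetical.

```python
MONTHS = {"Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
          "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12}

# Genre order as listed in the ml-100k u.genre file.
GENRES = ["unknown", "Action", "Adventure", "Animation", "Children's",
          "Comedy", "Crime", "Documentary", "Drama", "Fantasy", "Film-Noir",
          "Horror", "Musical", "Mystery", "Romance", "Sci-Fi", "Thriller",
          "War", "Western"]

def parse_item_line(line):
    """Return (month, year, genre_flags) for one pipe-separated item record."""
    fields = line.rstrip("\n").split("|")
    month = year = None
    if fields[2]:  # an empty release date is returned as None
        _day, month_abbr, year_text = fields[2].split("-")
        month, year = MONTHS[month_abbr], int(year_text)
    genre_flags = [int(flag) for flag in fields[5:5 + len(GENRES)]]
    return month, year, genre_flags

# Hypothetical record in the u.item layout.
flags = ["0"] * len(GENRES)
flags[1] = flags[5] = "1"
line = "|".join(["42", "Example Movie (1995)", "01-Jan-1995", "",
                 "http://example.com/42"] + flags)
month, year, genre_flags = parse_item_line(line)
genre_names = [g for g, flag in zip(GENRES, genre_flags) if flag]
print(month, year, genre_names)
```

Each `genre_flags` list is already a one-hot style row, so stacking the rows for all items gives the kind of matrix that features_item_genre.mat is described as storing.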

ml100k_user_process.py

Tool to process MovieLens 100k user metadata.

usage: ml100k_user_process [-h] datapath outpath

Positional arguments:

`datapath`	Path to ml-100k
`outpath`	Path to save created files to.
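
For orientation, the user metadata in the MovieLens 100k distribution is stored in u.user as pipe-separated records of user id, age, gender, occupation, and zip code. The sketch below only parses one such record. The function name `parse_user_line` and the sample record are hypothetical, and nothing here describes which files ml100k_user_process actually writes.

```python
def parse_user_line(line):
    """Split one pipe-separated user record into typed fields."""
    user_id, age, gender, occupation, zip_code = line.rstrip("\n").split("|")
    return {
        "user_id": int(user_id),
        "age": int(age),
        "gender": gender,
        "occupation": occupation,
        "zip_code": zip_code,  # kept as text; zip codes are not always numeric
    }

# Hypothetical record in the u.user layout.
user = parse_user_line("7|31|F|engineer|12345\n")
print(user)
```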