flurs.datasets.movielens
MovieLens datasets created by GroupLens.
Functions
| delta | Compute difference between given 2 dates in month/day. |
| fetch_movielens | Fetch MovieLens data in the form of User, Item, Event. |
| load_movies | Load movie genres as a context. |
| load_ratings | Load all samples in the dataset. |
| load_users | Load user demographics as contexts. |
- flurs.datasets.movielens.delta(d1, d2, opt='d')[source]
Compute difference between given 2 dates in month/day.
- Parameters
d1 (datetime) – First date.
d2 (datetime) – Second date.
opt ({"d", "m"}, default="d") – "d" if d1 and d2 have day-level granularity. If they only tell about month (day is always 1), "m" interpolates the information.
- Returns
Difference between two dates in days.
- Return type
int
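The two granularities can be illustrated with the standard library alone. This is a minimal sketch, not the implementation of delta: day_difference and month_difference are hypothetical helpers, and reading "m" as a count of calendar months is an assumption made only for illustration.

```python
from datetime import datetime


def day_difference(d1: datetime, d2: datetime) -> int:
    """Hypothetical helper: whole days elapsed from d1 to d2."""
    return (d2 - d1).days


def month_difference(d1: datetime, d2: datetime) -> int:
    """Hypothetical helper: calendar months from d1 to d2, ignoring the day.

    Assumption: this is one plausible reading of month-level granularity,
    not necessarily what delta(..., opt="m") returns.
    """
    return (d2.year - d1.year) * 12 + (d2.month - d1.month)


released = datetime(1995, 1, 1)  # month-level date: day is always 1
rated = datetime(1997, 12, 4)    # day-level date

days = day_difference(released, rated)
months = month_difference(released, rated)
print(days, months)
```

Subtracting two datetime objects yields a timedelta, whose days attribute is the integer day count used here.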
- flurs.datasets.movielens.fetch_movielens(data_home=None, size='100k')[source]
Fetch MovieLens data in the form of User, Item, Event.
- Parameters
data_home (str) – Absolute path to MovieLens data folder.
size ({"100k", "1m"}, default="100k") – String notation of MovieLens data size.
- Returns
Dictionary-like object, with the following attributes.
- samples (list of Event) – All rating events.
- can_repeat (bool) – False, because MovieLens does not contain the same user-item interaction more than once.
- contexts (dict) – Contextual feature name -> Number of its dimensions.
- n_user (int) – Number of unique users.
- n_item (int) – Number of unique items.
- n_sample (int) – Number of events.
- Return type
sklearn.utils.Bunch
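The summary attributes listed above can be derived from the events themselves. The sketch below is only an illustration under stated assumptions: summarize is a hypothetical helper, plain (user_id, item_id) tuples stand in for Event objects, and types.SimpleNamespace stands in for sklearn.utils.Bunch so that the example needs nothing beyond the standard library.

```python
from collections import Counter
from types import SimpleNamespace


def summarize(events):
    """Hypothetical helper: build a Bunch-like summary from (user_id, item_id) pairs."""
    events = list(events)
    pair_counts = Counter(events)
    return SimpleNamespace(
        samples=events,
        # True only if some user-item pair occurs more than once.
        can_repeat=any(count > 1 for count in pair_counts.values()),
        n_user=len({user for user, _ in events}),
        n_item=len({item for _, item in events}),
        n_sample=len(events),
    )


# Toy events, not real MovieLens data.
toy_events = [(1, 10), (1, 11), (2, 10), (3, 12)]
data = summarize(toy_events)
print(data.n_user, data.n_item, data.n_sample, data.can_repeat)
```

Attribute access (data.n_user) mirrors how a dictionary-like Bunch is typically read.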
- flurs.datasets.movielens.load_movies(data_home, size)[source]
Load movie genres as a context.
- Parameters
data_home (str) – Absolute path to MovieLens data folder.
size ({"100k", "1m"}) – String notation of MovieLens data size.
- Returns
item_id -> numpy array (n_genre, ).
- Return type
dict of movie vectors
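A genre context of shape (n_genre, ) is commonly a multi-hot vector. The following sketch assumes the MovieLens 1M movies.dat layout (MovieID::Title::Genres, with genres separated by "|") and its 18 genre names; genre_vector is a hypothetical helper, a plain list stands in for the numpy array, and the encoding used by load_movies itself may differ.

```python
# The 18 genre names used by MovieLens 1M.
GENRES = [
    "Action", "Adventure", "Animation", "Children's", "Comedy", "Crime",
    "Documentary", "Drama", "Fantasy", "Film-Noir", "Horror", "Musical",
    "Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western",
]


def genre_vector(genres_field: str) -> list:
    """Hypothetical helper: multi-hot encode a '|'-separated genre field."""
    active = set(genres_field.split("|"))
    return [1.0 if genre in active else 0.0 for genre in GENRES]


# One line in the movies.dat layout.
line = "1::Toy Story (1995)::Animation|Children's|Comedy"
movie_id, title, genres_field = line.split("::")

# item_id -> genre vector, mirroring the documented return shape.
movies = {int(movie_id): genre_vector(genres_field)}
print(len(movies[1]), sum(movies[1]))
```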
- flurs.datasets.movielens.load_ratings(data_home, size)[source]
Load all samples in the dataset.
- Parameters
data_home (str) – Absolute path to MovieLens data folder.
size ({"100k", "1m"}) – String notation of MovieLens data size.
- Returns
Single row is [user_id, item_id, rating, timestamp]. Rows are sorted by timestamp.
- Return type
array
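The row layout and ordering described above can be sketched without the library. This assumes the raw file formats of the two supported sizes: tab-separated fields in the 100k u.data file and "::"-separated fields in the 1m ratings.dat file. parse_ratings is a hypothetical helper, and a list of lists stands in for the returned array.

```python
def parse_ratings(lines, sep="\t"):
    """Hypothetical helper: parse rating lines into rows sorted by timestamp.

    Each row is [user_id, item_id, rating, timestamp].
    """
    rows = [
        [int(field) for field in line.strip().split(sep)]
        for line in lines
        if line.strip()
    ]
    rows.sort(key=lambda row: row[3])  # timestamp is the fourth field
    return rows


# A few sample rows in the 100k (tab-separated) layout.
raw_100k = [
    "196\t242\t3\t881250949",
    "186\t302\t3\t891717742",
    "22\t377\t1\t878887116",
]
ratings = parse_ratings(raw_100k)
print(ratings[0])

# The same helper handles the 1m layout by switching the separator.
raw_1m = ["1::1193::5::978300760", "1::661::3::978302109"]
ratings_1m = parse_ratings(raw_1m, sep="::")
print(ratings_1m[0])
```

Sorting by timestamp matters for streaming evaluation, where events should be replayed in the order they occurred.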
- flurs.datasets.movielens.load_users(data_home, size)[source]
Load user demographics as contexts. User ID -> {sex (M/F), age (7 groups), occupation (0-20; 21 categories)}
- Parameters
data_home (str) – Absolute path to MovieLens data folder.
size ({"100k", "1m"}) – String notation of MovieLens data size.
- Returns
user_id -> numpy array (1 + 1 + 21, ); (sex_flg + age_group + n_occupation, ).
- Return type
dict of user vectors
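The (1 + 1 + 21, ) layout can be sketched as a sex flag, an age-group value, and a one-hot occupation. The example assumes the MovieLens 1M users.dat layout (UserID::Gender::Age::Occupation::Zip-code) and its seven age codes. user_vector is a hypothetical helper; which sex maps to 1 and how the age group is scaled are assumptions, and a plain list stands in for the numpy array.

```python
AGE_GROUPS = [1, 18, 25, 35, 45, 50, 56]  # the 7 age codes used by MovieLens 1M
N_OCCUPATION = 21                          # occupation codes 0-20


def user_vector(sex: str, age: int, occupation: int) -> list:
    """Hypothetical helper: encode one user as sex_flg + age_group + one-hot occupation."""
    sex_flg = 1.0 if sex == "M" else 0.0       # assumption: M -> 1, F -> 0
    age_group = float(AGE_GROUPS.index(age))   # assumption: index of the age code
    occupation_one_hot = [0.0] * N_OCCUPATION
    occupation_one_hot[occupation] = 1.0
    return [sex_flg, age_group] + occupation_one_hot


# One line in the users.dat layout.
line = "1::F::1::10::48067"
user_id, sex, age, occupation, _zip_code = line.split("::")

# user_id -> user vector, mirroring the documented return shape.
users = {int(user_id): user_vector(sex, int(age), int(occupation))}
print(len(users[1]))
```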