flurs.datasets.movielens

MovieLens datasets created by GroupLens.

Functions

delta(d1, d2[, opt])

Compute the difference between two given dates, in months or days.

fetch_movielens([data_home, size])

Fetch MovieLens data in the form of User, Item, Event.

load_movies(data_home, size)

Load movie genres as a context.

load_ratings(data_home, size)

Load all samples in the dataset.

load_users(data_home, size)

Load user demographics as contexts.

flurs.datasets.movielens.delta(d1, d2, opt='d')

Compute the difference between two given dates, in months or days.

Parameters
  • d1 (datetime) – First date.

  • d2 (datetime) – Second date.

  • opt ({"d", "m"}, default="d") – Use "d" if d1 and d2 have day-level granularity. If they carry only month-level information (the day is always 1), "m" interpolates that information.

Returns

Difference between two dates in days.

Return type

int
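As an illustration of the two granularities, here is a minimal standalone sketch. It is not the flurs implementation: the name `date_delta_sketch` is hypothetical, and the "m" branch assumes that month-level dates are compared by counting calendar months.

```python
from datetime import datetime


def date_delta_sketch(d1, d2, opt="d"):
    """Hypothetical stand-in for illustration, not the flurs code."""
    if opt == "d":
        # Day-level granularity: plain calendar-day difference.
        return (d2 - d1).days
    if opt == "m":
        # Assumption: dates only carry year and month (day is always 1),
        # so the difference is counted in calendar months.
        return (d2.year - d1.year) * 12 + (d2.month - d1.month)
    raise ValueError("opt must be 'd' or 'm'")


days = date_delta_sketch(datetime(2000, 1, 1), datetime(2000, 1, 31))
months = date_delta_sketch(datetime(1999, 11, 1), datetime(2000, 2, 1), opt="m")
```

The real function may treat the "m" case differently, so check its source before relying on the month semantics.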

flurs.datasets.movielens.fetch_movielens(data_home=None, size='100k')

Fetch MovieLens data in the form of User, Item, Event.

Parameters
  • data_home (str) – Absolute path to MovieLens data folder.

  • size ({"100k", "1m"}, default="100k") – String notation of MovieLens data size.

Returns

Dictionary-like object, with the following attributes.

samples (list of Event)

All rating events.

can_repeat (bool)

False, because MovieLens does not contain the same user-item interaction more than once.

contexts (dict)

Contextual feature name -> Number of its dimensions.

n_user (int)

Number of unique users.

n_item (int)

Number of unique items.

n_sample (int)

Number of events.

Return type

sklearn.utils.Bunch
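The attributes listed above can be read with plain attribute access. The sketch below avoids downloading any data by using a hypothetical stand-in object with made-up values; `summarize_dataset` and the context names "user" and "item" are assumptions for illustration, not part of flurs.

```python
from types import SimpleNamespace


def summarize_dataset(data):
    """Collect the documented attributes of a fetch_movielens-style result."""
    return {
        "n_user": data.n_user,
        "n_item": data.n_item,
        "n_sample": data.n_sample,
        "can_repeat": data.can_repeat,
        # Total number of contextual dimensions across all features.
        "context_dims": sum(data.contexts.values()),
    }


# Hypothetical stand-in with made-up values. In real use this would be the
# object returned by fetch_movielens(data_home=..., size="100k").
fake = SimpleNamespace(
    samples=["event-1", "event-2"],
    can_repeat=False,
    contexts={"user": 23, "item": 18},
    n_user=2,
    n_item=2,
    n_sample=2,
)
summary = summarize_dataset(fake)
```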

flurs.datasets.movielens.load_movies(data_home, size)

Load movie genres as a context.

Parameters
  • data_home (str) – Absolute path to MovieLens data folder.

  • size ({"100k", "1m"}) – String notation of MovieLens data size.

Returns

item_id -> numpy array of shape (n_genre, ).

Return type

dict of movie vectors
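A genre vector of this kind is typically a multi-hot encoding. The following is a minimal sketch, assuming a pipe-separated genre field and a shortened, hypothetical genre list. It uses plain lists to stay dependency-free, whereas the documented return values are numpy arrays; `encode_genres` is not a flurs function.

```python
# Hypothetical, shortened genre list; the real datasets define more genres.
GENRES = ["Action", "Comedy", "Drama"]


def encode_genres(genre_field, genres=GENRES, sep="|"):
    """Return a multi-hot list with one slot per known genre."""
    present = set(genre_field.split(sep))
    return [1.0 if genre in present else 0.0 for genre in genres]


# item_id -> genre vector, mirroring the documented mapping.
movies = {
    1: encode_genres("Comedy|Drama"),
    2: encode_genres("Action"),
}
```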

flurs.datasets.movielens.load_ratings(data_home, size)

Load all samples in the dataset.

Parameters
  • data_home (str) – Absolute path to MovieLens data folder.

  • size ({"100k", "1m"}) – String notation of MovieLens data size.

Returns

Each row is [user_id, item_id, rating, timestamp]. Rows are sorted by timestamp.

Return type

array
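The row layout and ordering described above can be sketched as follows. The example assumes "::"-delimited input lines and made-up sample values; `parse_ratings` is a hypothetical helper, not the flurs loader, and it returns a list of lists rather than an array.

```python
def parse_ratings(lines, sep="::"):
    """Parse rating lines into [user_id, item_id, rating, timestamp] rows."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user_id, item_id, rating, timestamp = line.split(sep)
        rows.append([int(user_id), int(item_id), float(rating), int(timestamp)])
    # Sort chronologically so events can be replayed in order.
    rows.sort(key=lambda row: row[3])
    return rows


# Made-up sample lines for illustration only.
raw = ["1::10::5::300", "2::20::3::100", "1::30::4::200"]
ratings = parse_ratings(raw)
```

Sorting by timestamp matters for streaming evaluation, where events are consumed in the order they occurred.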

flurs.datasets.movielens.load_users(data_home, size)

Load user demographics as contexts. User ID -> {sex (M/F), age (7 groups), occupation (0-20; 21 categories)}

Parameters
  • data_home (str) – Absolute path to MovieLens data folder.

  • size ({"100k", "1m"}) – String notation of MovieLens data size.

Returns

user_id -> numpy array of shape (1 + 1 + 21, ), i.e. (sex_flg + age_group + n_occupation, ).

Return type

dict of user vectors
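To make the 1 + 1 + 21 layout concrete, here is a minimal sketch of one way such a vector could be assembled. The encoding choices are assumptions, not confirmed flurs behavior: the sex flag is 1.0 for "M", the age group is stored as its index among seven age codes, and the occupation is one-hot over 21 slots. `encode_user` is a hypothetical helper and returns a plain list rather than a numpy array.

```python
N_OCCUPATION = 21
# Assumption: seven age codes, as used by the MovieLens 1M user file.
AGE_CODES = [1, 18, 25, 35, 45, 50, 56]


def encode_user(sex, age_code, occupation):
    """Build a (1 + 1 + 21)-element user vector as a plain list."""
    sex_flg = 1.0 if sex == "M" else 0.0            # assumption: M -> 1.0
    age_group = float(AGE_CODES.index(age_code))    # assumption: group index
    occupation_vec = [0.0] * N_OCCUPATION
    occupation_vec[occupation] = 1.0                # one-hot occupation (0-20)
    return [sex_flg, age_group] + occupation_vec


# user_id -> user vector, mirroring the documented mapping.
users = {1: encode_user("F", 25, 10), 2: encode_user("M", 56, 0)}
```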