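While trying to replicate a setup that mixes a deep model into a listwise one, I got stuck on not being able to set the pool size inside a Sequential layer dynamically. For example, consider the following code: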
!pip install -q tensorflow-recommenders
!pip install -q --upgrade tensorflow-datasets
!pip install -q tensorflow-ranking
import pprint
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_ranking as tfr
import tensorflow_recommenders as tfrs
from typing import Dict, Text
import os
import tempfile
import datetime
ratings = tfds.load("movielens/100k-ratings", split="train")
movies = tfds.load("movielens/100k-movies", split="train")
ratings = ratings.map(lambda x: {
    "movie_title": x["movie_title"],
    "user_id": x["user_id"],
    "user_rating": x["user_rating"],
    # "timestamp": x["timestamp"],
})
movies = movies.map(lambda x: x["movie_title"])
unique_movie_titles = np.unique(np.concatenate(list(movies.batch(1000))))
unique_user_ids = np.unique(np.concatenate(list(ratings.batch(1_000).map(
    lambda x: x["user_id"]))))
class MovieModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        max_tokens = 10_000_00  # evaluates to 1,000,000
        self.title_vectorizer = tf.keras.layers.TextVectorization(
            max_tokens=max_tokens)
        self.title_text_embedding = tf.keras.Sequential([
            # tf.keras.layers.Flatten(),
            self.title_vectorizer,
            tf.keras.layers.Embedding(max_tokens, 32, mask_zero=True),
            # Pool width 4 is hard-coded to match the 4-token test titles below.
            tf.keras.layers.AveragePooling2D(
                pool_size=(1, 4), strides=1, padding='valid'),
        ])
        self.title_vectorizer.adapt(movies)

    def call(self, titles):
        return self.title_text_embedding(titles)
Now that the movie model is created, let's test it first before using it on the proper movie data.
Here is the test code:
test_movie_titles = [["M*A*S*H (1970)", "Dances with Wolves (1990)", "Speed (1994)",
                      "Dances with Wolves (1990)", "Speed (1994)"]]
md = MovieModel()
test_ratings = md(tf.constant(tf.reshape(test_movie_titles, [1, 5, 1])))
test_ratings
This works fine now, and I get output like the following:
<tf.Tensor: shape=(1, 5, 1, 32), dtype=float32, numpy=
array([[[[ 0.00778975, -0.00899004, 0.02926993, -0.00527342,
0.00706512, 0.02012717, 0.03438753, 0.01971687,
-0.00543808, -0.00754605, -0.02241766, 0.00045748,
-0.00785657, -0.00291913, 0.00670988, 0.01176082,
-0.02052191, -0.00751739, -0.01433057, 0.008
-----
----
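As far as I can tell, the shape (1, 5, 1, 32) is what the usual output-size rule for padding='valid' gives, floor((input - pool) / stride) + 1 per axis, when the embedding output is (1, 5, 4, 32). The sketch below only does that arithmetic in plain Python; `pooled_size` and `pooled_shape` are hypothetical helper names, not TensorFlow functions, and the 8-token case is an assumed example.

```python
import math


def pooled_size(input_size, pool_size, stride):
    """Output length along one axis for pooling with padding='valid'."""
    return math.floor((input_size - pool_size) / stride) + 1


def pooled_shape(input_shape, pool_size, strides=(1, 1)):
    """Shape after 2D pooling of a channels-last (batch, rows, cols, channels) tensor."""
    batch, rows, cols, channels = input_shape
    return (
        batch,
        pooled_size(rows, pool_size[0], strides[0]),
        pooled_size(cols, pool_size[1], strides[1]),
        channels,
    )


# Test batch: 1 example, 5 titles, 4 tokens per title, 32 embedding dims.
print(pooled_shape((1, 5, 4, 32), pool_size=(1, 4)))  # (1, 5, 1, 32)

# Assumed case: titles padded to 8 tokens. The same pool no longer collapses
# the token axis to a single position.
print(pooled_shape((1, 5, 8, 32), pool_size=(1, 4)))  # (1, 5, 5, 32)
```

So a pool width of 4 only reduces the token axis to one position when the vectorized titles are exactly 4 tokens long.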
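Now, if you look at the code above, I have hard-coded pool_size as (1, 4) (tf.keras.layers.AveragePooling2D(pool_size=(1,4), strides=1, padding='valid')), because the test samples I used above have at most 4 words, so vectorization produces vectors of size 4.
The problem is: when I pass the whole dataset (movies) to the model, how do I make sure the pool size is correct? How can I pass an external value like this (pool_size) from outside into the Sequential layer?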
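To make the "at most 4 words" claim concrete, here is a rough pure-Python sketch of how the value could be derived from the titles instead of being hard-coded. It assumes the vectorizer's default behaviour is to lowercase, strip punctuation, and split on whitespace; `count_tokens` and `max_token_count` are hypothetical helpers that only approximate that, not TensorFlow APIs.

```python
import re
import string

# Assumption: approximates TextVectorization's default standardization
# (lowercase + strip punctuation) followed by whitespace splitting.
_PUNCTUATION = re.compile("[" + re.escape(string.punctuation) + "]")


def count_tokens(title):
    """Number of whitespace-separated tokens left after standardization."""
    cleaned = _PUNCTUATION.sub("", title.lower())
    return len(cleaned.split())


def max_token_count(titles):
    """Longest title length in tokens, i.e. the width the pool would need."""
    return max(count_tokens(title) for title in titles)


test_titles = [
    "M*A*S*H (1970)",             # -> "mash 1970"
    "Dances with Wolves (1990)",  # -> "dances with wolves 1990"
    "Speed (1994)",               # -> "speed 1994"
]
print(max_token_count(test_titles))  # 4
```

A number obtained this way could presumably be passed in as an `__init__` argument of the model (for example a hypothetical `pool_width` parameter) and used both for the vectorizer's `output_sequence_length` and for `pool_size=(1, pool_width)`, but I have not verified that this is the recommended approach.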
The code above was run on Google Colab with TensorFlow 2.9.1.