PyTorch Tutorial 15.9. The Dataset for Pretraining BERT

jf_pJlTbmA9 · Source: PyTorch · Author: PyTorch · 2023-06-05 15:44

To pretrain the BERT model implemented in Section 15.8, we need to generate the dataset in the ideal format to facilitate the two pretraining tasks: masked language modeling and next sentence prediction. On the one hand, the original BERT model was pretrained on the concatenation of two huge corpora, BookCorpus and English Wikipedia (see Section 15.8.5), which makes it hard to run for most readers of this book. On the other hand, the off-the-shelf pretrained BERT model may not fit applications from specific domains such as medicine. Thus, it is getting popular to pretrain BERT on a customized dataset. To facilitate the demonstration of BERT pretraining, we use the smaller corpus WikiText-2 (Merity et al., 2016).

Compared with the PTB dataset used for pretraining word2vec in Section 15.3, WikiText-2 (i) retains the original punctuation, making it suitable for next sentence prediction; (ii) retains the original case and numbers; and (iii) is over twice as large.

import os
import random
import torch
from d2l import torch as d2l

In the WikiText-2 dataset, each line represents a paragraph where a space is inserted between any punctuation and its preceding token. Paragraphs with at least two sentences are retained. To split sentences, we only use the period as the delimiter for simplicity. We leave discussions of more complex sentence splitting techniques to the exercises at the end of this section.

#@save
d2l.DATA_HUB['wikitext-2'] = (
  'https://s3.amazonaws.com/research.metamind.io/wikitext/'
  'wikitext-2-v1.zip', '3c914d17d80b1459be871a5039ac23e752a53cbe')

#@save
def _read_wiki(data_dir):
  file_name = os.path.join(data_dir, 'wiki.train.tokens')
  with open(file_name, 'r') as f:
    lines = f.readlines()
  # Uppercase letters are converted to lowercase ones
  paragraphs = [line.strip().lower().split(' . ')
         for line in lines if len(line.split(' . ')) >= 2]
  random.shuffle(paragraphs)
  return paragraphs

15.9.1. Defining Helper Functions for Pretraining Tasks

In the following, we begin by implementing helper functions for the two BERT pretraining tasks: next sentence prediction and masked language modeling. These helper functions will be invoked later when transforming the raw text corpus into the dataset of the ideal format to pretrain BERT.

15.9.1.1. Generating the Next Sentence Prediction Task

According to the descriptions in Section 15.8.5.2, the _get_next_sentence function generates a training example for the binary classification task.

#@save
def _get_next_sentence(sentence, next_sentence, paragraphs):
  if random.random() < 0.5:
    is_next = True
  else:
    # `paragraphs` is a list of lists of lists
    next_sentence = random.choice(random.choice(paragraphs))
    is_next = False
  return sentence, next_sentence, is_next

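As a quick sanity check, here is a minimal sketch of how _get_next_sentence behaves; the toy sentences below are invented for illustration, not WikiText-2 data. Roughly half of the calls keep the true next sentence (is_next=True), and the other half substitute a random sentence drawn from some paragraph (is_next=False).

toy_paragraphs = [[['hello', 'world', '.'], ['nice', 'to', 'meet', 'you', '.']],
                  [['good', 'bye', '.'], ['see', 'you', 'soon', '.']]]
for _ in range(3):
    # Each call randomly decides whether to keep the true next sentence
    print(_get_next_sentence(toy_paragraphs[0][0], toy_paragraphs[0][1],
                             toy_paragraphs))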

The following function generates training examples for next sentence prediction from the input paragraph by invoking the _get_next_sentence function. Here paragraph is a list of sentences, where each sentence is a list of tokens. The argument max_len specifies the maximum length of a BERT input sequence during pretraining.

#@save
def _get_nsp_data_from_paragraph(paragraph, paragraphs, vocab, max_len):
  nsp_data_from_paragraph = []
  for i in range(len(paragraph) - 1):
    tokens_a, tokens_b, is_next = _get_next_sentence(
      paragraph[i], paragraph[i + 1], paragraphs)
    # Consider 1 '<cls>' token and 2 '<sep>' tokens
    if len(tokens_a) + len(tokens_b) + 3 > max_len:
      continue
    tokens, segments = d2l.get_tokens_and_segments(tokens_a, tokens_b)
    nsp_data_from_paragraph.append((tokens, segments, is_next))
  return nsp_data_from_paragraph

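The sketch below (again on toy sentences, an assumption for illustration) shows the kind of (tokens, segments, is_next) triples this function emits; vocab is not used inside the function, so passing None suffices here.

toy_paragraphs = [[['hello', 'world', '.'], ['nice', 'to', 'meet', 'you', '.']],
                  [['good', 'bye', '.'], ['see', 'you', 'soon', '.']]]
# d2l.get_tokens_and_segments (from Section 15.8) inserts the '<cls>' and
# '<sep>' tokens and builds the corresponding segment ids
for tokens, segments, is_next in _get_nsp_data_from_paragraph(
        toy_paragraphs[0], toy_paragraphs, vocab=None, max_len=16):
    print(tokens, segments, is_next)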

15.9.1.2. Generating the Masked Language Modeling Task

To generate training examples for the masked language modeling task from a BERT input sequence, we define the following _replace_mlm_tokens function. In its inputs, tokens is a list of tokens representing a BERT input sequence, candidate_pred_positions is a list of token indices of the BERT input sequence excluding those of special tokens (special tokens are not predicted in the masked language modeling task), and num_mlm_preds indicates the number of predictions (recall that 15% of random tokens are to be predicted). Following the definition of the masked language modeling task in Section 15.8.5.1, at each prediction position the input may be replaced by a special '<mask>' token or a random token, or remain unchanged. In the end, the function returns the input tokens after possible replacement, the token indices where predictions take place, and the labels for these predictions.

#@save
def _replace_mlm_tokens(tokens, candidate_pred_positions, num_mlm_preds,
            vocab):
  # For the input of a masked language model, make a new copy of tokens and
  # replace some of them by '<mask>' or random tokens
  mlm_input_tokens = [token for token in tokens]
  pred_positions_and_labels = []
  # Shuffle for getting 15% random tokens for prediction in the masked
  # language modeling task
  random.shuffle(candidate_pred_positions)
  for mlm_pred_position in candidate_pred_positions:
    if len(pred_positions_and_labels) >= num_mlm_preds:
      break
    masked_token = None
    # 80% of the time: replace the word with the '<mask>' token
    if random.random() < 0.8:
      masked_token = '<mask>'
    else:
      # 10% of the time: keep the word unchanged
      if random.random() < 0.5:
        masked_token = tokens[mlm_pred_position]
      # 10% of the time: replace the word with a random word
      else:
        masked_token = random.choice(vocab.idx_to_token)
    mlm_input_tokens[mlm_pred_position] = masked_token
    pred_positions_and_labels.append(
      (mlm_pred_position, tokens[mlm_pred_position]))
  return mlm_input_tokens, pred_positions_and_labels

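For illustration, the following sketch masks two of the six non-special positions of a toy sequence (the token list and vocabulary are assumptions, not real data); the exact result varies from run to run because of the 80%/10%/10% rule.

toy_tokens = ['<cls>', 'the', 'cat', 'sat', 'on', 'the', 'mat', '<sep>']
toy_vocab = d2l.Vocab([toy_tokens], reserved_tokens=['<mask>'])
mlm_inputs, positions_and_labels = _replace_mlm_tokens(
    toy_tokens, candidate_pred_positions=list(range(1, 7)),
    num_mlm_preds=2, vocab=toy_vocab)
print(mlm_inputs)
print(positions_and_labels)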

By invoking the aforementioned _replace_mlm_tokens function, the following function takes a BERT input sequence (tokens) as an input and returns the indices of the input tokens (after possible token replacement as described in Section 15.8.5.1), the token indices where predictions take place, and the label indices for these predictions.

#@save
def _get_mlm_data_from_tokens(tokens, vocab):
  candidate_pred_positions = []
  # `tokens` is a list of strings
  for i, token in enumerate(tokens):
    # Special tokens are not predicted in the masked language modeling
    # task
    if token in ['<cls>', '<sep>']:
      continue
    candidate_pred_positions.append(i)
  # 15% of random tokens are predicted in the masked language modeling task
  num_mlm_preds = max(1, round(len(tokens) * 0.15))
  mlm_input_tokens, pred_positions_and_labels = _replace_mlm_tokens(
    tokens, candidate_pred_positions, num_mlm_preds, vocab)
  pred_positions_and_labels = sorted(pred_positions_and_labels,
                    key=lambda x: x[0])
  pred_positions = [v[0] for v in pred_positions_and_labels]
  mlm_pred_labels = [v[1] for v in pred_positions_and_labels]
  return vocab[mlm_input_tokens], pred_positions, vocab[mlm_pred_labels]

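A minimal sketch with the same toy sequence (again an assumption for illustration) shows the three outputs: input token indices after masking, the positions where predictions take place, and the label indices of the original tokens.

toy_tokens = ['<cls>', 'the', 'cat', 'sat', 'on', 'the', 'mat', '<sep>']
toy_vocab = d2l.Vocab([toy_tokens],
                      reserved_tokens=['<pad>', '<mask>', '<cls>', '<sep>'])
input_ids, pred_positions, label_ids = _get_mlm_data_from_tokens(
    toy_tokens, toy_vocab)
print(input_ids, pred_positions, label_ids)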

15.9.2. Transforming Text into the Pretraining Dataset

Now we are almost ready to customize a Dataset class for pretraining BERT. Before that, we still need to define a helper function _pad_bert_inputs to append the special '<pad>' tokens to the inputs. Its argument examples contains the outputs from the helper functions _get_nsp_data_from_paragraph and _get_mlm_data_from_tokens for the two pretraining tasks.

#@save
def _pad_bert_inputs(examples, max_len, vocab):
  max_num_mlm_preds = round(max_len * 0.15)
  all_token_ids, all_segments, valid_lens, = [], [], []
  all_pred_positions, all_mlm_weights, all_mlm_labels = [], [], []
  nsp_labels = []
  for (token_ids, pred_positions, mlm_pred_label_ids, segments,
     is_next) in examples:
    all_token_ids.append(torch.tensor(token_ids + [vocab['<pad>']] * (
      max_len - len(token_ids)), dtype=torch.long))
    all_segments.append(torch.tensor(segments + [0] * (
      max_len - len(segments)), dtype=torch.long))
    # `valid_lens` excludes count of '<pad>' tokens
    valid_lens.append(torch.tensor(len(token_ids), dtype=torch.float32))
    all_pred_positions.append(torch.tensor(pred_positions + [0] * (
      max_num_mlm_preds - len(pred_positions)), dtype=torch.long))
    # Predictions of padded tokens will be filtered out in the loss via
    # multiplication of 0 weights
    all_mlm_weights.append(
      torch.tensor([1.0] * len(mlm_pred_label_ids) + [0.0] * (
        max_num_mlm_preds - len(pred_positions)),
        dtype=torch.float32))
    all_mlm_labels.append(torch.tensor(mlm_pred_label_ids + [0] * (
      max_num_mlm_preds - len(mlm_pred_label_ids)), dtype=torch.long))
    nsp_labels.append(torch.tensor(is_next, dtype=torch.long))
  return (all_token_ids, all_segments, valid_lens, all_pred_positions,
      all_mlm_weights, all_mlm_labels, nsp_labels)
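
The padding logic can be checked on a single hand-built example; the vocabulary and ids below are a toy sketch, not real WikiText-2 data.

toy_vocab = d2l.Vocab([['hello', 'world']],
                      reserved_tokens=['<pad>', '<mask>', '<cls>', '<sep>'])
# (token_ids, pred_positions, mlm_pred_label_ids, segments, is_next)
toy_example = ([toy_vocab['<cls>'], toy_vocab['hello'], toy_vocab['<sep>']],
               [1], [toy_vocab['world']], [0, 0, 0], True)
batch = _pad_bert_inputs([toy_example], max_len=10, vocab=toy_vocab)
# Token ids are padded to max_len; MLM positions to round(10 * 0.15) = 2
print(batch[0][0].shape, batch[3][0].shape, batch[2][0])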

Putting together the helper functions for generating training examples of the two pretraining tasks and the helper function for padding inputs, we customize the following _WikiTextDataset class as the WikiText-2 dataset for pretraining BERT. By implementing the __getitem__ function, we can arbitrarily access the pretraining (masked language modeling and next sentence prediction) examples generated from a pair of sentences from the WikiText-2 corpus.

The original BERT model uses WordPiece embeddings with a vocabulary size of 30000 (Wu et al., 2016). The tokenization method of WordPiece is a slight modification of the original byte pair encoding algorithm in Section 15.6.2. For simplicity, we use the d2l.tokenize function for tokenization. Infrequent tokens that appear less than five times are filtered out.
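
To make this simplification concrete, here is a small sketch (toy sentences, not the actual corpus) of what d2l.tokenize returns when each paragraph is given as a list of sentence strings:

toy_paragraph = ['the history of natural language processing', 'it is a long story']
print(d2l.tokenize(toy_paragraph, token='word'))
# [['the', 'history', 'of', 'natural', 'language', 'processing'],
#  ['it', 'is', 'a', 'long', 'story']]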

#@save
class _WikiTextDataset(torch.utils.data.Dataset):
  def __init__(self, paragraphs, max_len):
    # Input `paragraphs[i]` is a list of sentence strings representing a
    # paragraph; while output `paragraphs[i]` is a list of sentences
    # representing a paragraph, where each sentence is a list of tokens
    paragraphs = [d2l.tokenize(
      paragraph, token='word') for paragraph in paragraphs]
    sentences = [sentence for paragraph in paragraphs
           for sentence in paragraph]
    self.vocab = d2l.Vocab(sentences, min_freq=5, reserved_tokens=[
      '<pad>', '<mask>', '<cls>', '<sep>'])
    # Get data for the next sentence prediction task
    examples = []
    for paragraph in paragraphs:
      examples.extend(_get_nsp_data_from_paragraph(
        paragraph, paragraphs, self.vocab, max_len))
    # Get data for the masked language model task
    examples = [(_get_mlm_data_from_tokens(tokens, self.vocab)
           + (segments, is_next))
           for tokens, segments, is_next in examples]
    # Pad inputs
    (self.all_token_ids, self.all_segments, self.valid_lens,
     self.all_pred_positions, self.all_mlm_weights,
     self.all_mlm_labels, self.nsp_labels) = _pad_bert_inputs(
      examples, max_len, self.vocab)

  def __getitem__(self, idx):
    return (self.all_token_ids[idx], self.all_segments[idx],
        self.valid_lens[idx], self.all_pred_positions[idx],
        self.all_mlm_weights[idx], self.all_mlm_labels[idx],
        self.nsp_labels[idx])

  def __len__(self):
    return len(self.all_token_ids)

By using the _read_wiki function and the _WikiTextDataset class, we define the following load_data_wiki to download the WikiText-2 dataset and generate pretraining examples from it.

#@save
def load_data_wiki(batch_size, max_len):
  """Load the WikiText-2 dataset."""
  num_workers = d2l.get_dataloader_workers()
  data_dir = d2l.download_extract('wikitext-2', 'wikitext-2')
  paragraphs = _read_wiki(data_dir)
  train_set = _WikiTextDataset(paragraphs, max_len)
  train_iter = torch.utils.data.DataLoader(train_set, batch_size,
                    shuffle=True, num_workers=num_workers)
  return train_iter, train_set.vocab

Setting the batch size to 512 and the maximum length of a BERT input sequence to 64, we print out the shapes of a minibatch of BERT pretraining examples. Note that in each BERT input sequence, 10 (64 × 0.15) positions are predicted for the masked language modeling task.

batch_size, max_len = 512, 64
train_iter, vocab = load_data_wiki(batch_size, max_len)

for (tokens_X, segments_X, valid_lens_x, pred_positions_X, mlm_weights_X,
   mlm_Y, nsp_y) in train_iter:
  print(tokens_X.shape, segments_X.shape, valid_lens_x.shape,
     pred_positions_X.shape, mlm_weights_X.shape, mlm_Y.shape,
     nsp_y.shape)
  break

Downloading ../data/wikitext-2-v1.zip from https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-v1.zip...
torch.Size([512, 64]) torch.Size([512, 64]) torch.Size([512]) torch.Size([512, 10]) torch.Size([512, 10]) torch.Size([512, 10]) torch.Size([512])

In the end, let's take a look at the vocabulary size. Even after filtering out infrequent tokens, it is still over twice as large as that of the PTB dataset.

len(vocab)

20256

15.9.3. Summary

Compared with the PTB dataset, the WikiText-2 dataset retains the original punctuation, case, and numbers, and is over twice as large.

We can arbitrarily access the pretraining (masked language modeling and next sentence prediction) examples generated from a pair of sentences from the WikiText-2 corpus.

15.9.4. Exercises

For simplicity, the period is used as the only delimiter for splitting sentences. Try other sentence splitting techniques, such as spaCy and NLTK. Take NLTK as an example. You need to install NLTK first: pip install nltk. In the code, first import nltk. Then, download the Punkt sentence tokenizer: nltk.download('punkt'). To split sentences such as sentences = 'This is great ! Why not ?', invoking nltk.tokenize.sent_tokenize(sentences) will return a list of two sentence strings: ['This is great !', 'Why not ?']. (A runnable sketch appears after the exercises.)

What is the vocabulary size if we do not filter out any infrequent tokens?
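
For the first exercise, here is a minimal sketch of the NLTK-based sentence splitting mentioned above (it assumes NLTK has been installed via pip install nltk and that the Punkt model download succeeds):

import nltk

nltk.download('punkt')  # download the Punkt sentence tokenizer once
sentences = 'This is great ! Why not ?'
print(nltk.tokenize.sent_tokenize(sentences))
# ['This is great !', 'Why not ?']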
