Big Data Science
@bdscience
03.05.2024 20:59
💡😎A great resource to add to the collection: datasets for LLMs
Many examples (including Llama 3 and Phi-3) show that LLM development = creating quality datasets.
In this repository, a developer from London has collected and described a huge number of datasets for pre-training or fine-tuning LLMs, in table format: reference, size, authors, date, and personal notes.
There are also instructions on how to build your own quality dataset, and an explanation of what the word "quality" means in the context of a dataset.
GitHub
GitHub - mlabonne/llm-datasets: Curated list of datasets and tools for post-training.
