Big Data Science (@bdscience)

💡😎📊Open Source Synthetic Text-to-SQL Dataset
Gretel releases largest open source Text-to-SQL dataset to speed up training of AI models
According to the developers, as of April 2024 this is the largest and most diverse synthetic text-to-SQL dataset available.
The dataset contains approximately 23 million tokens, about 12 million of which are SQL tokens, and covers a wide range of SQL complexity levels, including subqueries, single joins, multiple joins, aggregations, window functions, and set operations.
To load the dataset in Python with the Hugging Face datasets library, run the following script:
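# Requires the Hugging Face datasets library: pip install datasets
from datasets import load_dataset
# Downloads the dataset from the Hugging Face Hub on first use, then reads it from the local cache
dataset = load_dataset("gretelai/synthetic_text_to_sql")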
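
Once the dataset is loaded, a natural first step is to check how the examples are spread across the SQL complexity levels listed above. The sketch below is only an illustration: the helper count_by_complexity, the column name "sql_complexity", and the sample rows are assumptions made for this example and are not taken from the post. It runs on a small in-memory list so it works offline.

```python
from collections import Counter
from typing import Iterable, Mapping


def count_by_complexity(
    rows: Iterable[Mapping[str, str]],
    field: str = "sql_complexity",  # assumed column name, check the dataset card
) -> Counter:
    """Count how many rows fall under each value of `field`."""
    return Counter(row[field] for row in rows)


# Hypothetical stand-in rows (not real dataset records), so the sketch runs offline.
sample_rows = [
    {"sql_complexity": "single join", "sql": "SELECT a.x FROM a JOIN b ON a.id = b.id"},
    {"sql_complexity": "window functions", "sql": "SELECT x, RANK() OVER (ORDER BY x) FROM t"},
    {"sql_complexity": "single join", "sql": "SELECT b.y FROM a JOIN b ON a.id = b.id"},
]

counts = count_by_complexity(sample_rows)
for level, n in counts.most_common():
    print(f"{level}: {n}")
```

With the real data, the same helper could be called as count_by_complexity(dataset["train"]), assuming the dataset exposes a "train" split and a column with that name.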
