
Big Data Science

@bdscience

24.06.2024 20:59

📊 A huge dataset of images and their captions

PixelProse is a dataset of over 16 million diverse images drawn from three web-scraped sources (CommonPool, CC12M, RedCaps), with captions generated by Google Gemini 1.0 Pro Vision.

The dataset can be loaded in Python with the Hugging Face datasets library:
from datasets import load_dataset
# Download the whole dataset
ds = load_dataset("tomg-group-umd/pixelprose")
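
With over 16 million records, a full download may be more than you want for a first look. Below is a minimal sketch of peeking at a few records in streaming mode instead. The helper names take and stream_pixelprose are hypothetical, and the "train" split name is an assumption not confirmed by the post.

```python
from itertools import islice


def take(rows, n):
    """Return the first n items of any iterable as a list."""
    return list(islice(rows, n))


def stream_pixelprose(split="train"):
    """Open the dataset lazily instead of downloading all of it.

    Assumption: the repository exposes a split named "train".
    """
    # Imported here so that take() stays usable without the library installed.
    from datasets import load_dataset

    # streaming=True returns an iterable that fetches rows on demand.
    return load_dataset("tomg-group-umd/pixelprose", split=split, streaming=True)
```

Calling take(stream_pixelprose(), 3) should fetch only the first three records over the network. Printing the keys of one record is a quick way to see which columns the dataset provides.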

huggingface.co

tomg-group-umd/pixelprose · Datasets at Hugging Face

