Objective: Build a basic prototype where a user enters a GIF-theme prompt, provides a YouTube link or uploads a video, and the system automatically generates captioned GIFs based on the content and prompt.
Flow
Video:
YouTube URL
MP4 file upload
Service: Groq Whisper API
API Key: read from the GROQ_API_KEY environment variable (key redacted; never hard-code it)
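Call (js):
// Groq's transcription endpoint is OpenAI-compatible and expects multipart form data.
const form = new FormData();
// 'url' must point directly at an audio file; alternatively append the audio itself as 'file'.
form.append('url', uploadedOrYouTubeAudioUrl);
form.append('model', 'whisper-large-v3-turbo');
form.append('response_format', 'verbose_json'); // includes per-segment timestamps
const response = await fetch('https://api.groq.com/openai/v1/audio/transcriptions', {
  method: 'POST',
  // Do not set Content-Type here; fetch adds the multipart boundary itself.
  headers: { 'Authorization': `Bearer ${GROQ_API_KEY}` },
  body: form
});
if (!response.ok) throw new Error(`Groq transcription failed: ${response.status}`);
const transcriptData = await response.json();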
Output: Full transcript with timestamps.
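The later steps need the transcript as a flat list of timed lines. The sketch below assumes the response follows the OpenAI-style verbose_json shape, with a segments array whose entries carry start, end, and text; the helper name toTimedLines is hypothetical.

```javascript
// Assumed response shape: { text: "...", segments: [{ start, end, text }, ...] }
// Returns [{ start, end, text }], dropping empty or zero-length segments.
function toTimedLines(transcriptData) {
  const segments = Array.isArray(transcriptData?.segments)
    ? transcriptData.segments
    : [];
  return segments
    .map((s) => ({
      start: Number(s.start),
      end: Number(s.end),
      text: String(s.text ?? '').trim(),
    }))
    .filter(
      (line) =>
        line.text.length > 0 &&
        Number.isFinite(line.start) &&
        Number.isFinite(line.end) &&
        line.end > line.start
    );
}
```

Each returned entry can be used directly for scoring against the theme and for cutting clips.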
Key-Line Extraction
Combine the transcript text + user “GIF theme” prompt.
Run a simple NLP/scoring heuristic (e.g., TF-IDF or embeddings similarity) to pick the top 2–3 lines that best match the theme.
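One possible scoring heuristic is sketched here: keyword overlap between the theme prompt and each line, weighted by inverse document frequency so that rarer matching words count more. This is a simplified stand-in for full TF-IDF or embeddings similarity, not a replacement for either. The names tokenize and pickKeyLines, and the { start, end, text } line shape, are assumptions.

```javascript
// Lowercase word tokens; punctuation is discarded.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9']+/g) || [];
}

// lines: [{ start, end, text }]. Returns up to topN matching lines in playback order.
function pickKeyLines(lines, themePrompt, topN = 3) {
  const docs = lines.map((line) => new Set(tokenize(line.text)));
  const themeTokens = new Set(tokenize(themePrompt));

  // Smoothed inverse document frequency across transcript lines.
  const idf = (token) => {
    const df = docs.filter((doc) => doc.has(token)).length;
    return Math.log((1 + docs.length) / (1 + df)) + 1;
  };

  const scored = lines.map((line, i) => {
    let score = 0;
    for (const token of themeTokens) {
      if (docs[i].has(token)) score += idf(token);
    }
    return { ...line, score };
  });

  return scored
    .filter((line) => line.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topN)
    .sort((a, b) => a.start - b.start);
}
```

Common words in the prompt (such as "the") will also match, so a real prototype would probably want a stop-word list or an embeddings-based score.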
Clip Segmentation
For each selected line, pull its start/end timestamps from the transcript.
Use FFmpeg to cut those segments out of the uploaded MP4 (or the downloaded YouTube stream).
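A minimal sketch of the cut step: a helper that builds the FFmpeg argument list for one segment. The function name buildCutArgs and the padding parameter (a little extra time around the spoken line) are assumptions. The video is re-encoded with libx264 rather than stream-copied so the cut does not depend on keyframe positions, and audio is dropped since GIFs have none.

```javascript
// Builds ffmpeg arguments that cut [start - padding, end + padding] out of inputPath.
function buildCutArgs(inputPath, outputPath, start, end, padding = 0.25) {
  const from = Math.max(0, start - padding);
  const duration = end + padding - from;
  return [
    '-y',                       // overwrite the output file if it exists
    '-ss', from.toFixed(3),     // seek before opening the input (fast)
    '-i', inputPath,
    '-t', duration.toFixed(3),  // clip length in seconds
    '-an',                      // no audio track
    '-c:v', 'libx264',
    outputPath,
  ];
}
```

The array can be passed to Node's child_process.execFile('ffmpeg', args), which avoids shell quoting issues with file names.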
Caption Overlay
Render the extracted text onto each clip (e.g., using FFmpeg’s drawtext filter).
Style captions for readability (semi-transparent background, centered, etc.).
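The caption styling above might look like the following sketch. It assumes the caption is first written to a text file and passed through drawtext's textfile option, which sidesteps most of the escaping that inline text needs. The names wrapCaption and buildCaptionFilter are hypothetical, the file path is assumed to contain no ':' or quote characters, and FFmpeg is assumed to be built with font support.

```javascript
// drawtext does not wrap text, so break the caption into lines by character count.
function wrapCaption(text, maxChars = 32) {
  const lines = [];
  let current = '';
  for (const word of text.trim().split(/\s+/)) {
    const candidate = current ? `${current} ${word}` : word;
    if (candidate.length > maxChars && current) {
      lines.push(current);
      current = word;
    } else {
      current = candidate;
    }
  }
  if (current) lines.push(current);
  return lines.join('\n');
}

// Builds a -vf value: white text on a semi-transparent black box,
// horizontally centered, `margin` pixels above the bottom edge.
function buildCaptionFilter(captionFilePath, { fontSize = 28, margin = 20 } = {}) {
  return [
    `drawtext=textfile=${captionFilePath}`,
    `fontsize=${fontSize}`,
    'fontcolor=white',
    'box=1',
    'boxcolor=black@0.5',
    'boxborderw=10',
    'x=(w-text_w)/2',
    `y=h-text_h-${margin}`,
  ].join(':');
}
```

Usage would be roughly: write wrapCaption(line.text) to caption.txt, then run ffmpeg with -vf set to buildCaptionFilter('caption.txt').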
GIF Conversion
Convert each captioned clip into a looping GIF (via FFmpeg).
Optimize size (e.g., palette generation & dithering).
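For the conversion and size optimization, one common approach is a single-pass filter graph that generates a palette from the clip and applies it with dithering. The sketch below builds those arguments; buildGifArgs and the fps and width defaults are assumptions chosen for small files, not values from the spec.

```javascript
// Builds ffmpeg arguments that turn a clip into a looping, palette-optimized GIF.
function buildGifArgs(inputPath, outputPath, { fps = 12, width = 480 } = {}) {
  const filter =
    `fps=${fps},scale=${width}:-1:flags=lanczos,split[s0][s1];` + // resample, resize, duplicate stream
    '[s0]palettegen[p];' +                                        // build a palette from one copy
    '[s1][p]paletteuse=dither=bayer';                             // apply it to the other with dithering
  return ['-y', '-i', inputPath, '-vf', filter, '-loop', '0', outputPath];
}
```

Here -loop 0 asks the GIF muxer for infinite looping. Lowering fps or width is usually the quickest way to shrink the file further.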
Under each GIF, provide:
Download button (GIF file)
Preview (auto-play on hover)
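A rough sketch of the per-GIF markup follows. Because a GIF plays as soon as it loads, play-on-hover is approximated here by showing a still image and swapping in the GIF on mouseenter. The still image (posterUrl) is an assumption not mentioned in the spec, as are the function names and the gif-card class.

```javascript
// Escapes a value for safe use inside a double-quoted HTML attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Returns the HTML for one result: hover-to-play preview plus a download link.
function gifCardHtml({ gifUrl, posterUrl, caption }) {
  const gif = escapeAttr(gifUrl);
  const poster = escapeAttr(posterUrl);
  return [
    '<figure class="gif-card">',
    `  <img src="${poster}" data-gif="${gif}" data-poster="${poster}" alt="${escapeAttr(caption)}"`,
    '       onmouseenter="this.src=this.dataset.gif" onmouseleave="this.src=this.dataset.poster">',
    `  <a href="${gif}" download>Download GIF</a>`,
    '</figure>',
  ].join('\n');
}
```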
Notes & Next Steps
Environment Variables:
Store GROQ_API_KEY securely (e.g., in .env)
YouTube Download:
Use ytdl-core (Node) or youtube-dl (Python) to fetch MP4 streams.
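A hedged sketch of the Node path, assuming the ytdl-core package is installed. parseYouTubeId and downloadYouTubeMp4 are hypothetical helper names; the ID check only covers the common watch, youtu.be, shorts, and embed URL shapes.

```javascript
// Extracts the 11-character video ID from common YouTube URL shapes, or returns null.
function parseYouTubeId(link) {
  let url;
  try {
    url = new URL(link);
  } catch {
    return null; // not a URL at all
  }
  const host = url.hostname.replace(/^www\.|^m\./, '');
  let id = null;
  if (host === 'youtu.be') {
    id = url.pathname.slice(1);
  } else if (host === 'youtube.com') {
    if (url.pathname === '/watch') {
      id = url.searchParams.get('v');
    } else {
      const match = url.pathname.match(/^\/(?:shorts|embed)\/([^/]+)/);
      id = match ? match[1] : null;
    }
  }
  return id && /^[\w-]{11}$/.test(id) ? id : null;
}

// Streams a combined audio+video format to disk using ytdl-core.
async function downloadYouTubeMp4(link, outputPath) {
  if (!parseYouTubeId(link)) throw new Error('Not a recognized YouTube URL');
  const { default: ytdl } = await import('ytdl-core');
  const { createWriteStream } = await import('node:fs');
  const { pipeline } = await import('node:stream/promises');
  await pipeline(
    ytdl(link, { filter: 'audioandvideo', quality: 'highest' }),
    createWriteStream(outputPath)
  );
  return outputPath;
}
```

Validating the link before downloading lets the app reject bad input early, before any network call is made.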
Error Handling:
Gracefully fall back to YouTube’s auto-captions if the Groq API fails.
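The fallback can be sketched as a small wrapper. Both arguments are hypothetical async functions supplied by the app: primary would call the Groq API, and fallback would fetch and parse YouTube's auto-captions. Each is assumed to resolve to the same list of timed lines.

```javascript
// Tries the primary transcription source; on any error, logs it and uses the fallback.
async function transcribeWithFallback(primary, fallback) {
  try {
    return { source: 'groq', lines: await primary() };
  } catch (err) {
    console.warn(`Groq transcription failed (${err.message}); falling back to auto-captions.`);
    return { source: 'youtube-captions', lines: await fallback() };
  }
}
```

Returning the source alongside the lines lets the UI indicate when captions came from the lower-quality fallback. For uploaded MP4 files there are no auto-captions, so the fallback would itself have to throw a clear error.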
And this is the Gemini API key: [redacted; supply it via an environment variable]. Now create the app.