Story visualization aims to create visually compelling images or videos corresponding to textual narratives. Despite recent advances in diffusion models yielding promising results, existing methods still struggle to create a coherent sequence of subject-consistent frames based solely on a story. To this end, we propose DreamStory, an automatic open-domain story visualization framework that leverages LLMs and a novel multi-subject consistent diffusion model. DreamStory consists of (1) an LLM acting as a story director and (2) an innovative Multi-Subject consistent Diffusion model (MSD) that keeps multiple subjects consistent across images. First, DreamStory employs the LLM to generate descriptive prompts for subjects and scenes aligned with the story, annotating each scene's subjects for subsequent subject-consistent generation. Second, DreamStory uses these detailed subject descriptions to create portraits of the subjects; these portraits and their corresponding textual information serve as multimodal anchors (guidance). Finally, the MSD uses these multimodal anchors to generate story scenes in which multiple subjects remain consistent. Specifically, the MSD includes Masked Mutual Self-Attention (MMSA) and Masked Mutual Cross-Attention (MMCA) modules. The MMSA module ensures detailed appearance consistency with the reference images, while MMCA captures key attributes of subjects from their reference text to ensure semantic consistency. Both modules employ masking mechanisms that restrict each subject in a scene to referencing only the multimodal information of that same subject, effectively preventing blending between multiple subjects. To validate our approach and promote progress in story visualization, we established a benchmark, DS-500, which assesses the overall performance of a story visualization framework, subject-identification accuracy, and the consistency of the generation model.
Extensive experiments validate the effectiveness of DreamStory in both subjective and objective evaluations. Please visit our project homepage at https://dream-xyz.github.io/dreamstory.
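The abstract does not give the MMSA or MMCA equations, so the following is only a rough illustration of the masking idea, not DreamStory's implementation. It is a minimal pure-Python sketch of scaled dot-product attention in which each scene token carries a subject id and may attend only to reference tokens labelled with the same id, so features from one subject cannot leak into another. The function name `masked_subject_attention`, the integer subject labels, and the rule that unlabelled (background) tokens receive a zero output are hypothetical assumptions made for this example.

```python
import math
from typing import List, Optional, Sequence

Vector = List[float]


def masked_subject_attention(
    queries: Sequence[Vector],
    keys: Sequence[Vector],
    values: Sequence[Vector],
    query_subject: Sequence[Optional[int]],
    key_subject: Sequence[int],
) -> List[Vector]:
    """Hypothetical sketch: each query attends only to keys of its own subject.

    query_subject[i] is the subject id of scene token i (None = background).
    key_subject[j] is the subject id of reference token j.
    Assumption: a query with no permitted keys gets a zero vector.
    """
    d = len(queries[0])
    dim_v = len(values[0])
    out: List[Vector] = []
    for q, qs in zip(queries, query_subject):
        # The mask: keep only reference tokens that belong to the same subject.
        allowed = [j for j, ks in enumerate(key_subject) if qs is not None and ks == qs]
        if not allowed:
            out.append([0.0] * dim_v)
            continue
        # Scaled dot-product scores over the permitted keys only.
        scores = [sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d) for j in allowed]
        # Numerically stable softmax over the unmasked scores.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the permitted value vectors.
        out.append([sum(w * values[j][c] for w, j in zip(weights, allowed)) for c in range(dim_v)])
    return out
```

For example, with two reference tokens labelled subject 0 and subject 1, a scene token tagged as subject 0 receives only subject 0's value vector, a token tagged as subject 1 receives only subject 1's, and a background token receives zeros. Masking before the softmax (rather than zeroing weights afterward) keeps each subject's attention weights normalized over its own references.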
Source (DOI): http://dx.doi.org/10.1109/TPAMI.2025.3600149
Med Humanit
September 2025
School of Modern Languages and Cultures, University of Durham, Durham, UK
This article is a critical and creative essay that employs the story of my late maternal grandmother's experience of blindness to interrogate the relationship between blindness and sight in the context of contemporary Lebanese social, cultural and political life. To this end, the essay draws on retrospective autoethnography and memoirs as well as on critical disability studies and visual culture to reflect on broader notions of seeing and unseeing. I also incorporate a creative component that imaginatively narrates the relation between blindness and sight, especially in contexts marked by multiple forms of sociopolitical vulnerability and fragility.
J Laparoendosc Adv Surg Tech A
September 2025
Montefiore Medical Center, New York, New York, USA.
Clinical studies often define their findings as statistically significant based solely on a P value of less than .05. In hernia surgery, pain intensity is a key patient-reported outcome, commonly measured using the visual analogue scale (VAS).
Child Neuropsychol
September 2025
Children's National Division of Neuropsychology, Washington DC, USA.
Learning and memory are crucial neuropsychological skills, linked with the development of play, adaptive skills, and academic functioning. Children and adolescents with critical congenital heart disease (cCHD) are at risk for a range of neurodevelopmental difficulties. Here, we examine visual and verbal learning and memory skills in a school-age sample of children and adolescents with cCHD, and explore how medical, neuropsychological, and social variables predict school-age learning and memory.
Front Sociol
August 2025
Graduate Program in Sociology, York University, Toronto, ON, Canada.
This institutional ethnographic (IE) study of a little-known Ontario-based mad history recounts how, in the 1980s and 1990s, ex-mental patients established a number of social enterprises (also known as consumer/survivor businesses), secured government funding and, through these sites, became politically active around issues that affected their lives. This research poses critical sociological questions about the circulation of activist knowledge-practices and the formation of these businesses as sites of community organizing. Methodologically, IE offered an approach that let me begin from the experiences of ex-mental patients while exploring how their activist practices are coordinated trans-locally.
Sci Data
August 2025
Department of Radiology, Washington University in St. Louis, St. Louis, MO, 63110, USA.
Understanding linguistic and semantic processing in the human brain involves exploring intricate neural networks. However, it remains unclear whether and how the amygdala and hippocampus are involved in these processes. Here, we recorded single-neuron activity from the human amygdala and hippocampus while neurosurgical patients with intractable epilepsy performed various language tasks.