I hold two Master's degrees: one in Applied Mathematics from CentraleSupélec, where I graduated with the highest honors, and another in Machine Learning and Computer Vision (Master MVA) from École Normale Supérieure Paris-Saclay (formerly known as ENS Cachan).
Prior to joining Meta full-time, I gained industry experience through research internships at Adobe Research (San Jose), where I worked on primitive fitting for large 3D point clouds; Snap Inc. (London), where I focused on 3D body mesh reconstruction from 2D images; and Meta AI (London), where I contributed to 3D generation research.
We introduce Llama 3, a new family of foundation models supporting multilinguality, coding, reasoning, and tool use. The largest model is a 405B-parameter Transformer with a 128K-token context window. Llama 3 delivers quality comparable to GPT-4 on a wide range of tasks. We publicly release pre-trained and post-trained versions, along with Llama Guard 3 for safety. Additionally, we explore integrating image, video, and speech capabilities, achieving competitive performance.
Meta 3D Gen
Raphael Bensadoun*,
Tom Monnier*,
Yanir Kleiman*,
Filippos Kokkinos,
Yawar Siddiqui,
Mahendra Kariya,
Omri Harosh,
Roman Shapovalov,
Benjamin Graham,
Emilien Garreau,
Animesh Karnewar,
Ang Cao,
Idan Azuri,
Iurii Makarov,
Eric-Tuan Le,
Antoine Toisoul,
David Novotny,
Oran Gafni,
Natalia Neverova,
Andrea Vedaldi
arXiv, 2024
project page / paper / bibtex
We introduce Meta 3D Gen (3DGen), a fast, state-of-the-art pipeline for text-to-3D asset generation. 3DGen creates high-quality, PBR-compatible 3D assets in under a minute and supports generative retexturing. By integrating Meta 3D AssetGen and TextureGen, it combines view, volumetric, and UV space representations, achieving a 68% win rate over single-stage models. 3DGen outperforms industry baselines in prompt fidelity, visual quality, and speed.
We introduce MeshPose, a method that unifies DensePose and Human Mesh Reconstruction (HMR). By leveraging new losses, we use weak DensePose supervision to localize a subset of mesh vertices in 2D (‘VertexPose’) and lift them to 3D, creating a low-poly body mesh (‘MeshPose’). Our end-to-end system achieves competitive DensePose accuracy while remaining lightweight and efficient, making it suitable for real-time AR applications.
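To make the 2D-to-3D lifting step concrete, here is a minimal sketch (assuming NumPy and a pinhole camera model) of back-projecting detected 2D vertices with per-vertex predicted depths into camera space. The pixel coordinates, depths, and intrinsics below are hypothetical; this illustrates the lifting operation only, not MeshPose's trained networks or losses.

```python
# Minimal sketch: lift 2D vertex detections to 3D with per-vertex depth (hypothetical values).
import numpy as np

def lift_vertices(uv: np.ndarray, depth: np.ndarray, fx: float, fy: float, cx: float, cy: float):
    """Back-project 2D pixel coordinates (V, 2) and depths (V,) into camera space (V, 3)."""
    x = (uv[:, 0] - cx) * depth / fx   # pinhole model: X = (u - cx) * Z / fx
    y = (uv[:, 1] - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)

uv = np.array([[320.0, 240.0], [300.0, 200.0]])   # detected vertex pixels (hypothetical)
z = np.array([2.0, 2.1])                          # hypothetical predicted depths (metres)
print(lift_vertices(uv, z, fx=600.0, fy=600.0, cx=320.0, cy=240.0))
```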
We introduce StyleMorph, a 3D-aware generative model that disentangles shape, pose, object appearance, and background for high-quality image synthesis. It learns a 3D morphable model in an unsupervised manner by morphing a canonical template and leveraging implicit surface rendering of "Template Object Coordinates" (TOCS), an unsupervised alternative to UV maps. A StyleGAN-based deferred neural rendering network conditions synthesis on 2D TOCS maps and independent appearance codes. StyleMorph achieves competitive results on multiple datasets while enabling joint disentanglement of shape, pose, and texture.
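As a toy illustration of the template-morphing idea, the sketch below (assuming PyTorch) predicts a per-point displacement of a canonical template from a latent shape code. The module and dimensions are hypothetical; TOCS rendering and the StyleGAN-based deferred renderer are beyond this example.

```python
# Minimal sketch: morph a canonical template with a learned displacement field (hypothetical module).
import torch
import torch.nn as nn

class TemplateMorph(nn.Module):
    """Predicts a displacement for each canonical template point from a shape code."""
    def __init__(self, code_dim: int = 32, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + code_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, template: torch.Tensor, code: torch.Tensor) -> torch.Tensor:
        # template: (N, 3) canonical points; code: (code_dim,) shape latent
        z = code.expand(template.shape[0], -1)           # broadcast code to every point
        return template + self.net(torch.cat([template, z], dim=-1))

template = torch.rand(1000, 3)            # canonical template surface samples
code = torch.randn(32)                    # hypothetical shape code
print(TemplateMorph()(template, code).shape)   # torch.Size([1000, 3])
# Each morphed point keeps its canonical coordinate, which is what a TOCS map
# records per pixel when the deformed surface is rendered.
```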
We introduce SoftMesh, a fully differentiable pipeline that converts 3D point clouds into probabilistic mesh representations for direct 2D image rendering. By learning point connectivity with only 2D rendering supervision, SoftMesh reduces the need for full mesh supervision. We evaluate its performance on silhouette, normal, and depth rendering for both rigid and non-rigid objects, exploring transfer learning and cross-category learning. SoftMesh achieves competitive results, rivaling methods trained with full mesh supervision.
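The following is a minimal sketch (assuming PyTorch) of the core idea of scoring soft connectivity between candidate point pairs; the network shapes are hypothetical, and the differentiable 2D rendering supervision is not reproduced here.

```python
# Minimal sketch: predict edge probabilities for candidate point pairs (hypothetical architecture).
import torch
import torch.nn as nn

class SoftConnectivity(nn.Module):
    """Scores candidate edges (i, j) from concatenated per-point features."""
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.point_mlp = nn.Sequential(nn.Linear(3, feat_dim), nn.ReLU())
        self.edge_mlp = nn.Linear(2 * feat_dim, 1)

    def forward(self, points: torch.Tensor, pairs: torch.Tensor) -> torch.Tensor:
        # points: (N, 3); pairs: (E, 2) candidate edge indices
        f = self.point_mlp(points)                                    # (N, feat_dim)
        e = torch.cat([f[pairs[:, 0]], f[pairs[:, 1]]], dim=-1)      # (E, 2 * feat_dim)
        return torch.sigmoid(self.edge_mlp(e)).squeeze(-1)           # edge probabilities in (0, 1)

pts = torch.randn(500, 3)
cand = torch.randint(0, 500, (2000, 2))
print(SoftConnectivity()(pts, cand).shape)   # torch.Size([2000])
# In a full pipeline, a differentiable renderer would turn these probabilities into
# silhouette/normal/depth images, and the 2D loss would backpropagate to the scores.
```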
We introduce Cascaded Primitive Fitting Networks (CPFN), which detect both large and fine-scale primitives in high-resolution point cloud scans. CPFN utilizes an adaptive patch sampling network to combine global and local primitive detection results. A dynamic merging formulation aggregates primitives across scales. Our evaluation shows that CPFN improves state-of-the-art SPFN performance by 13-14% and enhances fine-scale primitive detection by 20-22% on high-resolution datasets.
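For intuition on patch-based processing of high-resolution scans, here is a minimal sketch (assuming NumPy) of extracting local k-nearest-neighbour patches around randomly sampled seeds. CPFN's patch sampling network is learned and adaptive, so this random-seed version is only a simplified stand-in.

```python
# Minimal sketch: extract local k-NN patches from a large point cloud (random seeds, not learned).
import numpy as np

def sample_patches(points: np.ndarray, num_patches: int, patch_size: int, rng=None):
    """Return (num_patches, patch_size, 3) local patches around random seed points."""
    rng = rng or np.random.default_rng(0)
    seeds = points[rng.choice(len(points), num_patches, replace=False)]
    # Pairwise squared distances between seeds and all points: (num_patches, N)
    d2 = ((seeds[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(d2, axis=1)[:, :patch_size]   # k nearest neighbours per seed
    return points[idx]

cloud = np.random.rand(100_000, 3)                 # high-resolution scan stand-in
patches = sample_patches(cloud, num_patches=8, patch_size=2048)
print(patches.shape)                               # (8, 2048, 3)
```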
We introduce Lean Point Networks (LPNs), which enhance point processing networks by improving memory usage, inference time, and accuracy. LPNs feature three novel blocks: a memory-efficient convolution block for point sets, a crosslink block for efficient information sharing across resolution levels, and a multi-resolution processing block for faster information diffusion. These blocks enable the design of deeper, wider point-based architectures. Our approach achieves significant improvements in accuracy and memory consumption across multiple segmentation tasks, using LPN modules as drop-in replacements in existing architectures like PointNet++, DGCNN, SpiderNet, and PointCNN.
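For a flavour of drop-in point-processing blocks, the sketch below (assuming PyTorch) implements a per-point shared-MLP convolution with a residual connection; it is a generic illustration, not the LPN memory-efficient, crosslink, or multi-resolution blocks themselves.

```python
# Minimal sketch: shared-MLP point convolution block with a residual connection (generic, not LPN).
import torch
import torch.nn as nn

class PointConvBlock(nn.Module):
    """Per-point shared MLP (1x1 convolution) with a residual skip."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv1d(in_dim, out_dim, kernel_size=1),  # weights shared across all points
            nn.BatchNorm1d(out_dim),
            nn.ReLU(inplace=True),
        )
        self.skip = nn.Conv1d(in_dim, out_dim, 1) if in_dim != out_dim else nn.Identity()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, num_points)
        return self.mlp(x) + self.skip(x)

feats = torch.randn(2, 3, 1024)          # batch of 2 clouds, xyz features, 1024 points
print(PointConvBlock(3, 64)(feats).shape)   # torch.Size([2, 64, 1024])
```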
Work Experience
June to December 2023: Research Intern at Meta AI
June to December 2022: Research Intern at Snapchat AR
June to November 2020: Research Intern at Adobe Research
Computer Vision conferences:
Reviewer for CVPR 2021, 2022, 2023
Reviewer for ICCV 2021, 2023
Reviewer for ECCV 2022
Computer Graphics conferences:
Reviewer for SIGGRAPH Asia 2022
Machine Learning conferences:
Reviewer for NeurIPS 2023
Education
2018-2023: PhD in Computer Vision at University College London
Co-supervised by Iasonas Kokkinos and Niloy J. Mitra
2016-2017: MSc in Machine Learning and Computer Vision at École Normale Supérieure de Cachan
GPA: 4.00 (range from 0 to 4) - Graduated with Highest Honors (Overall grade: 17.49/20)
2014-2017: Master in Management at ESCP Europe
Z-Score: 1.89 (range from -3 to 3) - Graduated with Highest Honors
2013-2017: MSc in Applied Mathematics at CentraleSupélec
GPA: 3.98 (range from 0 to 4) - Graduated with Highest Honors