Last week, I attended the Thirty-seventh Conference on Neural Information Processing Systems (a.k.a. NeurIPS 2023) in New Orleans, Louisiana to present a paper on structured pruning for convolutional networks. Aside from the scenic waterside views, great seafood, and rich historical culture, I immersed myself in the current landscape of artificial intelligence research and left with several key takeaways that I'd like to share.
First, A.I. moves incredibly fast, both academically and socially. Coming in, I was under the impression that an academic conference would be, for lack of a better word, academic. That is, much like how classes at a university are structured, I expected the conference to be slow, methodical, and incremental. However, it became immediately apparent that this was not the case. An aura of constant excitement and hurriedness permeated the halls: everyone was rushing somewhere to see something, and no matter what that something was, it was almost guaranteed to be fascinating. It was awe-inspiring to see groundbreaking work done by individuals from all over the world in areas spanning the entire field; whether it was climate data, interpretability, education, or ethics, it seemed as if every area had passionate and intelligent minds spearheading progress within their respective communities.
Such a sense of progress was not lost on observers like myself. Papers built upon and cited other papers published mere months ago. Startups and venture capitalists roamed the halls, scouting the competition while eagerly trying to secure the latest talent. Tech demos came and went, and last month's superchip became this month's benchmark. As a friend I met there aptly put it, "A.I. moves in dog years."
My second discovery was that no other field interweaves as many disciplines as artificial intelligence does. To be an effective A.I. researcher is to be an effective scholar, programmer, writer, philosopher, mathematician, and more. The field demands, and will continue to demand, the best and brightest minds from every sector, and I'm excited to see the kind of innovation that will take place in the next few years. Hopefully, it's an effort that I can meaningfully contribute to.
Finally, I'd like to share some papers that stood out to me. Of course, I was not able to read every poster at the conference, so this is just a non-exhaustive personal highlight reel (in no particular order):
- TaskMet: Task-Driven Metric Learning for Model Learning
  Dishank Bansal, Ricky T. Q. Chen, Mustafa Mukadam, Brandon Amos (Meta)
- DIFUSCO: Graph-based Diffusion Solvers for Combinatorial Optimization
  Zhiqing Sun, Yiming Yang (Carnegie Mellon)
- Passive learning of active causal strategies in agents and language models
  Andrew Kyle Lampinen, Stephanie C Y Chan, Ishita Dasgupta, Andrew J Nam, Jane X Wang (Google DeepMind/Stanford)
- Reflexion: Language Agents with Verbal Reinforcement Learning
  Noah Shinn, Federico Cassano, Edward Berman, Ashwin Gopinath, Karthik Narasimhan, Shunyu Yao (Northeastern/Princeton/MIT)
- Are Emergent Abilities of Large Language Models a Mirage?
  Rylan Schaeffer, Brando Miranda, Sanmi Koyejo (Stanford)
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn (Stanford)
- The ToMCAT Dataset
  Pyarelal et al. (UofA)
- Interactive Visual Feature Search
  Devon Ulrich, Ruth Fong (Princeton)
- Transformers learn through gradual rank increase
  Enric Boix-Adserà, Etai Littwin, Emmanuel Abbe, Samy Bengio, Joshua M. Susskind (Apple/MIT/EPFL)
- One Less Reason for Filter Pruning: Gaining Free Adversarial Robustness with Structured Grouped Kernel Pruning
  Shaochen Zhong, Zaichuan You, Jiamu Zhang, Sebastian Zhao, Zachary LeClaire, Zirui Liu, Daochen Zha, Vipin Chaudhary, Shuai Xu, Xia Hu (Rice/Case Western/Berkeley)
- Pruning vs Quantization: Which is Better?
  Andrey Kuzmin, Markus Nagel, Mart van Baalen, Arash Behboodi, Tijmen Blankevoort (Qualcomm)
- The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks
  Ziqian Zhong, Ziming Liu, Max Tegmark, Jacob Andreas (MIT)
Unsurprisingly, language models dominated this year, with the paradigm set by architectures like transformers in the academic world and products like ChatGPT for the general public. However, that is not to say there wasn't a wealth of interesting research elsewhere. Whether or not your research involves LLMs, being able to present your work and explore that of others at events like NeurIPS is an unparalleled experience, and one I would very much like to have again soon.