Postdoctoral Appointee - Data Management Techniques for AI/Machine Learning (argonne)
Job posting number: #125177 (Ref:418473)
This Job Posting is Expired.
Job Description
We are seeking a Postdoctoral Appointee in Data Management Techniques for AI/Machine Learning: a candidate with expertise at the intersection of HPC and AI. Major efforts at Argonne in this direction include AuroraGPT, an LLM/transformer-based foundational model that aims to become a scientific AI assistant in the spirit of commercially available solutions, but delivered as an open-source solution that is pre-trained with curated scientific data on Aurora, an exascale-ready supercomputer hosted at Argonne, and fine-tuned to closely match the needs of scientific applications.
This position explores a new data model centered around the notion of data states, which are intermediate representations of datasets automatically recorded into a lineage when tagged by applications with hints, constraints and persistency semantics. Such an approach enables applications to focus on the meaning and properties of their data rather than how to access it, effectively reducing complexity while unlocking high performance and scalability for many use cases: finding and reusing previous intermediate results to explore alternatives, inspecting the evolution of datasets, verifying correctness, etc. This is especially important in the context of deep learning and foundational models, where there is an acute need for advanced data management capabilities, such as: checkpointing and versioning of models to facilitate stable pre-training, transfer learning and fine-tuning (by addressing resilience, model spikes and other anomalies, suspend-resume); a searchable lineage of related DNN models that are derived from each other (e.g., to facilitate network architecture search); capturing, caching and reusing intermediate transformations of training data (e.g., embeddings); memory optimizations (capturing and offloading the optimizer state and other data structures to host memory and other memory tiers) to overcome limited GPU memory; etc.
The successful candidate will explore data states and data management techniques in general for AI scenarios as exemplified above while emphasizing their applicability for AuroraGPT. In addition to addressing such transformative challenges that arise at the intersection of HPC and AI, you will have the opportunity to work closely with many domain experts to identify the requirements and bottlenecks of real-life scientific applications that address the needs of our society over the next decades. In general, you will be part of a vibrant and diverse research community from more than 100 countries. Our lab hosts Aurora, one of the first Exascale supercomputers in the world, which you will have an opportunity to use for your experiments. In addition, you will have access to a large array of bleeding-edge experimental testbeds through the Joint Laboratory for System Evaluation (JLSE), which feature the latest technologies from top vendors like Intel, NVIDIA, AMD, etc.
This job description documents the general nature of work but is not intended to be a comprehensive list of all activities, duties and responsibilities required of the job incumbent. Consequently, the job incumbent may be required to perform other duties as assigned.
Position Requirements
Required skills and experience:
- PhD degree completed within the last 0-5 years, or nearing completion
- Strong scientific background in distributed computing and HPC in particular
- Code development experience in C/C++ and Python
- Understanding of modern data management and I/O best practices
- Familiarity with machine/deep learning
- Ability to model Argonne’s core values of impact, safety, respect, integrity, and teamwork
Desired skills and experience:
- Experience with LLMs and transformers
- Familiarity with large-scale deep learning techniques: data, tensor and pipeline parallelism
- Ability to conduct interdisciplinary research at the intersection of HPC and deep learning
- Ability to lead and promote new insights and approaches
- Willingness to participate in teamwork and broad collaborative efforts involving other laboratories, universities, supercomputer centers and industry
Job Family
Postdoctoral Family
Job Profile
Postdoctoral Appointee
Worker Type
Long-Term (Fixed Term)
Time Type
Full time
As an equal employment opportunity and affirmative action employer, and in accordance with our core values of impact, safety, respect, integrity and teamwork, Argonne National Laboratory is committed to a diverse and inclusive workplace that fosters collaborative scientific discovery and innovation. In support of this commitment, Argonne encourages minorities, women, veterans and individuals with disabilities to apply for employment. Argonne considers all qualified applicants for employment without regard to age, ancestry, citizenship status, color, disability, gender, gender identity, gender expression, genetic information, marital status, national origin, pregnancy, race, religion, sexual orientation, veteran status or any other characteristic protected by law.
Argonne employees, and certain guest researchers and contractors, are subject to particular restrictions related to participation in Foreign Government Sponsored or Affiliated Activities, as defined and detailed in United States Department of Energy Order 486.1A. You will be asked to disclose any such participation in the application phase for review by Argonne's Legal Department.
All Argonne offers of employment are contingent upon a background check that includes an assessment of criminal conviction history conducted on an individualized and case-by-case basis. Please be advised that Argonne positions require upon hire (or may require in the future) that the individual obtain a government access authorization that involves additional background check requirements. Failure to obtain or maintain such government access authorization could result in the withdrawal of a job offer or future termination of employment.