To discuss, request access to the:
We are planning a newsletter; would you be interested?
03/04/2024
Tags: IA, AI, veille
All monitoring notes: [[+ Sommaire veille]]. Collection date: [[2024-04-03-mercredi]]
April 2, 2024
This is AI News! an MVP of a service that goes through all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜
AI News for 4/1/2024-4/2/2024. We checked 5 subreddits, 364 Twitters, and 26 Discords (382 channels, 4481 messages) for you. Estimated reading time saved (at 200 wpm): 463 minutes.
So you have time to either:
And congrats to Logan on joining Google.
Table of Contents
Across r/LocalLlama, r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence. Comment crawling is still not implemented but is coming soon.
Open Source Models and Libraries
Model Performance and Capabilities
Hardware and Performance
Stable Diffusion and Image Generation
Miscellaneous
Memes and Humor
all recaps done by Claude 3 Opus, best of 4 runs. We are working on clustering and flow engineering with Haiku.
AI Models and Architectures
Retrieval Augmented Generation (RAG)
Tooling and Infrastructure
Research and Techniques
Memes and Humor
A summary of Summaries of Summaries
Claude 3 Haiku Impresses as Budget-Friendly Opus Alternative: The smaller and cheaper Claude 3 Haiku model is generating buzz for its effective reasoning and trick-question handling, positioning it as a cost-efficient alternative to Opus in Perplexity AI. Discussions also focused on Perplexity's potential plans to introduce ads and the preference for the Writing focus mode over All focus for cleaner LLM interactions. (Perplexity AI Discord)
Gecko and Aurora-M Push Boundaries in Text Embedding and Multilingual LLMs: The new Gecko model demonstrates robust performance on the Massive Text Embedding Benchmark (MTEB) and may accelerate diffusion model training, as detailed in its Hugging Face paper and arXiv abstract. Meanwhile, the Aurora-M model, with 15.5B parameters, is geared towards multilingual tasks and has processed over 2 trillion training tokens, as highlighted on Twitter and arXiv. (LAION Discord)
Efficient Fine-Tuning Techniques Spark Debate: Conversations in the Unsloth AI community revolved around strategies for dataset splitting, the efficacy of supervised fine-tuning (SFT) versus quantized approaches like QLoRA, and the steep costs associated with model pre-training. Members also highlighted the need for robust detection systems to combat AI misuse and protect Discord servers from malicious bots and scams. (Unsloth AI Discord)
Stable Diffusion Community Anticipates SD3 and Tackles Model Challenges: The Stable Diffusion community is buzzing with anticipation for the 4-6 week release timeline of Stable Diffusion 3 (SD3), while also addressing challenges with rendering facial and hand details using tools like Adetailer and various embeddings. Discussions touched on the rapid pace of AI development, ethical considerations around using professional artwork for training, and the potential memory demands of future SD versions. (Stability.ai Discord)
Mojo 24.2 Introduces Python-Friendly Features as Tensor Talk Heats Up: The Mojo Programming Language community is abuzz with the release of Mojo 24.2, which brings a host of Python-friendly features and enhancements. Discussions delved into Mojo's handling of parallelism, value types, and tensor performance optimizations. The announcement of the MAX Engine and C/C++ interop in Mojo also generated excitement for its potential to streamline Reinforcement Learning (RL) Python training. (Modular Discord)
Tinygrad Grapples with AMD GPU Instability and Cultural Resistance: The tinygrad community expressed frustration with severe system instability when using AMD GPUs, highlighting issues like memory leaks and non-recoverable errors. Skepticism was directed towards AMD's commitment to resolving underlying problems, with calls for open-source documentation and modern software practices. Discussions also touched on workaround strategies and the need for a fundamental cultural shift in AMD's approach to software and firmware. (tinygrad Discord)
LLM Serving Platforms Compete as Triton Alternatives Emerge: Discussions in the LM Studio and MAX Serving communities focused on the capabilities of different LLM serving platforms, with MAX Serving being explored as a potential alternative to Triton. Users sought guidance on migrating existing setups and inquired about support for features like GPU-hosted models. The LM Studio community also grappled with error messages and compatibility issues across various models and hardware configurations. (LM Studio Discord, Modular Discord)
Retrieval-Augmented Fine-Tuning (RAFT) Takes Center Stage: LlamaIndex hosted a webinar featuring Retrieval-Augmented Fine-Tuning (RAFT) with lead co-authors Tianjun Zhang and Shishir Patil, delving into how RAFT combines the benefits of retrieval-augmented generation (RAG) and fine-tuning to improve language models' performance in domain-specific settings. The webinar aimed to provide insights and resources for those interested in implementing RAFT in their own projects. (LlamaIndex Discord)
Axolotl Advances with Lisa Merge and DeepSpeed Challenges: The Axolotl AI Collective celebrated the approval of the latest PR for `lisa` and the addition of a YAML example for testing. However, developers encountered out-of-memory errors when attempting to train models with DeepSpeed or Fully Sharded Data Parallel (FSDP). The collective also made strides in dataset unification efforts and expressed interest in exploring runpod serverless for very large language models. (OpenAccess AI Collective Discord)
FastLLM and RankLLM Push Boundaries in Retrieval and Reranking: Qdrant introduced FastLLM, a language model boasting a 1 billion token context window, aimed at enhancing AI-driven content generation, as detailed in their announcement post. Meanwhile, RankLLM by @rpradeep42 et al., an open-source collection of LLMs fine-tuned for reranking, was recommended for those building advanced RAG systems, with emphasis on the importance of choosing the right reranker. (HuggingFace Discord, LlamaIndex Discord)
Claude 3 Haiku Enters the Fray: The smaller and cheaper Claude 3 Haiku is generating buzz for its effective reasoning and trick-question handling, positioning it as a cost-efficient alternative to Opus.
Perplexity Users Ponder Advertising Prospects: Discussion thrives around a potential shift in Perplexity AI's strategy with the introduction of ads, stoking debates about the authenticity of recent announcements, possibly tied to an April Fools' gag.
Selecting the Superior Search: Participants advocate for the Writing focus over the All focus in Perplexity for a streamlined and less problematic Large Language Model (LLM) interaction.
Prompt Defence Protocols in Question: Security concerns heighten over prompt attacks on Perplexity AI's models. Discourse turns to the necessity for robust safeguards against malicious injections and data poisoning.
Price Tag Shock for Gemini 1.5 Pro API: An active dialogue contests the steep pricing of Gemini 1.5 Pro API, leading to conversations about more budget-conscious tiered pricing structures based on token consumption.
Embracing the Cascade: Keen members exchange insights into Stable Cascades, referencing Perplexity AI for in-depth understanding.
A Peek into Perplexity's Neo: Queries launched to unravel what sets Neo apart, with an eye on distinct attributes.
Managing Perplexity Subscriptions Skillfully: A hiccup arises as API credits get ensnared in "Pending" limbo, and the lack of a team signup option for Perplexity's API draws attention.
Comparing Token Economies: Resources are shared to contrast the token expense between Perplexity and ChatGPT, fostering informed decisions for users.
Gecko Climbs to New Heights in Text Embedding: The new Gecko model demonstrates robust performance on the Massive Text Embedding Benchmark (MTEB) and may accelerate diffusion model training, as detailed in its Hugging Face paper and arXiv abstract. Interest in Gecko's practical application is reflected in queries about the availability of its weights.
Aurora-M Lights Up Multilingual LLM Space: The Aurora-M model, with 15.5B parameters, is geared towards multilingual tasks while adhering to guidelines set by the White House EO and is celebrated for processing over 2 trillion training tokens, as highlighted on Twitter and arXiv.
Hugging Face's Diffusers Under the Spotlight: Contributions to Hugging Face's Diffusers stirred debates around efficiency, with a focus on a PR regarding autocast for CUDA in Diffusers and incomplete unification in pipelines, as seen in discussion #551 and PR #7530.
PyTorch Gears Up with 2.6 Stirring Curiosity: Discussions around updates in PyTorch versioning sparked interest, especially regarding the silent addition of bfloat16 support in PyTorch 2.3, and anticipation for new features in the upcoming PyTorch 2.6. Noteworthy contributions include a critique of autocast performance with details in a GitHub thread.
LangChain Event Hooks in AI Engineers With Harrison Chase: Harrison Chase, CEO of LangChain, prepares to talk at an online event about leveraging LangSmith in moving from prototype to production on April 17 at 6:30 PM, with registration available here. His company focuses on using LLMs for context-aware reasoning applications.
Model Might on a Budget: Guild members actively debated cost vs. quality in AI modeling, with pre-training estimates ranging from $50K to several million dollars depending on dataset size. A strong emphasis was placed on balancing resource efficiency with high-quality outputs.
Scam Shield Tightens: Concerned with an increase in malicious bots and scams, the engineer community underscored the need for robust detection systems to thwart AI misuse and protect Discord servers.
Precision in Saving Space: Tips were shared on conserving space when saving finetuned models on platforms like Google Colab, with one user suggesting a method that saves 8GB of space but warned of a slight loss in accuracy.
Training Tactics Tussle: The optimal approach to dataset splitting and the application of supervised fine-tuning (SFT) versus quantization methods was a hot topic, with insights into the trade-offs between performance and cost-effectiveness highly sought after.
Integration Enthusiasm for DeepSeek: A user proposed integrating the DeepSeek model into Unsloth 4-bit, showcasing the community's push for model diversity and efficiency improvements, with an accompanying Hugging Face repository and a Google Colab notebook set for implementation (see the loading sketch below).
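As a rough illustration of the 4-bit loading path such an integration would use, here is a minimal Unsloth sketch. The DeepSeek checkpoint name is hypothetical (substitute the repository from the linked Hugging Face repo); the call itself mirrors Unsloth's documented `FastLanguageModel.from_pretrained` usage for its other 4-bit models:

```python
from unsloth import FastLanguageModel

# Hypothetical checkpoint name; replace with the actual 4-bit conversion once published.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/deepseek-7b-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,  # 4-bit quantization via bitsandbytes
)
```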
Cyberrealistic vs. EpicRealism XL: Debate is ongoing about the performance of two Stable Diffusion models: while Cyberrealistic demands precise prompts, EpicRealism XL outshines with broader prompt tolerance for realistic imagery.
SD3 Is Coming: The community is buzzing over the anticipated 4-6 week release window for Stable Diffusion 3 (SD3), with some doubt about the timing but evident excitement for improved features, notably a fixed text function.
Fixing Faces and Hands: The Stable Diffusion aficionados are tackling challenges with rendering facial and hand details, recommending tools such as Adetailer and various embeddings to enhance image quality without sacrificing processing speed.
CHKPT Model Confusion: In the sea of CHKPT models, users seek guidance for best use cases, pointing towards models like ponyxl, dreamshaperxl, juggernautxl, and zavychroma as part of a suggested checkpoint "starter pack" for Stable Diffusion.
Ethics and Performance in Model Development: Discussions touch on the rapid pace of AI development, ethical questions around using professional artwork for AI training, and speculated memory demands for future Stable Diffusion versions, all peppered with light-hearted community banter.
DBRX Revealed: A new open-source language model titled DBRX is making waves, claiming top performance on established benchmarks. Watch the introduction of DBRX.
Whisper Models Under the Microscope: WhisperX might replace BetterTransformer given concerns over the latter's high error rates. The community is mulling over Transforming the Web and Apple's latest paper on reference resolution.
Speed Meets Precision in LLM Operations: LlamaFile boasts 1.3x-5x speed improvements over llama.cpp on CPU for specific tasks, potentially altering future local operations. A configuration file for Hercules fine-tuning resulted in decreased accuracy, stirring debate over settings like `lora_r` and `lora_alpha` (see the LoRA configuration sketch after this list).
Hugging Face Misstep Halts Uploads: ModelInfo loading issues caused by `safetensors.sharded` metadata from Hugging Face are preventing uploads to the chain, driving discussions about fixes.
Brainstorming for WorldSim: WorldSim enthusiasts propose a "LLM Coliseum" with competitive benchmarks, file uploads facilitating pre-written scripts, and speculation on future developments like competitive leaderboards and AI battles.
Traffic Signal Dataset Signals Opportunity: A traffic signal image dataset surfaced, promising to aid vision models despite Hugging Face's viewer compatibility issues.
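To ground the `lora_r`/`lora_alpha` debate above, here is a minimal PEFT `LoraConfig` sketch; the values are illustrative defaults, not the Hercules configuration itself:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                 # lora_r: rank of the low-rank update matrices
    lora_alpha=32,        # scaling factor; updates are scaled by lora_alpha / r
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```

Raising `lora_alpha` relative to `r` amplifies the adapter's contribution, which is one reason mismatched values can degrade accuracy.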
Trouble in GPU Paradise: AMD GPUs are causing major headaches for tinygrad users, with system crashes and memory-leak errors like `"amdgpu: failed to allocate BO for amdkfd"`. Users share workarounds involving PCIe power cycling but remain unimpressed by AMD's perceived lack of commitment to addressing these bugs.
A Virtual Side-Eye to AMD's Program: An invitation to AMD's Vanguard program drew skepticism from George Hotz and others, sparking a debate over the effectiveness of such initiatives and the need for open-source solutions and better software practices at AMD.
Learning Curve for Linear uOps: A detailed write-up explaining linear uops was shared in the #learn-tinygrad channel, aiming to demystify the intermediate representation in tinygrad, complemented by a tutorial on the new command queue following a significant merge.
Tinygrad Pull Requests Under Scrutiny: Pull Request #4034 addressed confusion around unit test code and backend checks. A focus on maintaining proper test environments for various backends like CLANG and OpenCL is emphasized.
Jigsaw Puzzle of Jitted Functions: A knowledge gap regarding why jitted functions don't show up in command queue logs led to discussions about the execution of jitted versus scheduled operations within tinygrad’s infrastructure.
LM Studio Tangles with Model Troubles: Engage with caution: LM Studio is throwing unknown exceptions, particularly with estopian maid 13B q4 models on RTX 3060 GPUs, and users report crashes during prolonged inferencing. There's a growing need for Text-to-Speech and Speech-to-Text functionality, but currently one must tether tools like whisper.cpp for voice capabilities.
In Quest for Localized Privacy: While the quest for privacy in local LLMs continues, one suggestion is to pair LM Studio with AnythingLLM for a confidential setup, though LM Studio itself does not have built-in document support. Meanwhile, Autogen is producing a mere 2 tokens at a time, leaving users to wonder about optimal configurations.
GPU Discussions Heat Up: SLI isn't necessary for multi-GPU setups; what matters for running models is each card's VRAM rather than a combined total. A dual Tesla P40 setup is reported at 3-4 tokens/sec for 70B models, while those on a budget admire the P40's VRAM, weighing it against the prowess of the 4090 GPU.
Top Models for Anonymous Needs: For the discreet engineer, the Nous-Hermes 2 Mistral DPO and Nous-Hermes-2-SOLAR-10.7B models come recommended, particularly for those needing to handle NSFW content. Tech hiccups with model downloads and execution have left some discontented, suspecting missing proxy support as the culprit.
Desiring Previous Generation Functionality: The convenience of splitting text on each new generation is missed, as current LM Studio updates overwrite existing output, prompting requests for a revert to the previous modality.
Google Packs the Web in RAM: Engineers noted Google's robust search performance may be due to embedding the web in RAM using a distributed version of FAISS and refined indexing strategies like inverted indexes. The discussion delved into Google's infrastructure choices, hinting at methods for handling complex and precise search queries.
Sleuthing Google's Programming Paradigms: Participants dissected Google's use of programming strategies that include otherwise shunned constructs like global variables and `goto`, illustrating a pragmatic approach to problem-solving and efficiency in their systems.
Sparse Autoencoders Reveal Their Secrets: A new visualization library for Sparse Autoencoder (SAE) has been released, shedding light on their feature structures. Mixed reactions to categorizing SAE features in AI models reflect both the detailed complexities and the abstract challenges in AI interpretability.
New Horizons in Music AI: A paper examining GANs and transformers in music composition was discussed, hinting at potential future directions in music AI, including text-to-music conversion metrics. Meanwhile, gaps in lm-eval-harness benchmarks for Anthropic Claude models suggest a growing interest in comprehensive model evaluation frameworks.
Batch Size Trade-offs in GPT-NeoX: Tuning GPT-NeoX for uneven batch sizes may introduce computational bottlenecks due to load imbalances, as larger batches hold up processing speed.
Bonus Bullet for AI Sportsmanship: Suggestions were made for EleutherAI community engagement in the Kaggle AI Mathematical Olympiad competition. Compute grants could support these inclinations towards "AI in science" initiatives.
ChatGPT Goes Unrestricted: OpenAI has introduced a new way to use ChatGPT instantly, enabling access without sign-up requirements, aiming to broaden AI accessibility and confirming that user interactions help enhance model performance, with optional data contribution.
Prompt Pondering and Managerial Models: Engineers discussed the efficacy of schema versus open-ended prompts in converting PDFs to JSON, raising concerns about potential Terms of Service breaches, and seeking advice on prompts to automate managerial tasks, including the division of directives and performance planning.
AI Creative Limits and Originality Inquisition: A comparison of different AI responses to song recognition challenges revealed the boundaries of AI's creativity, and a study pointed to AI demonstrating emergent behaviors, potentially offering original outputs not evident in their training sets.
Anticipating GPT-5 and Navigating GPT-4: Community dialogues covered the reflective capabilities of Large Language Models (LLMs), jokes about April Fools' tech links, GPT-4's perceived advancements over Opus, server stability issues, and DALL-E 3's image editing feature, with a nod toward the potential of the anticipated GPT-5.
AI Serving Diversity in Functions: Engineers are exploring various AI tools, like Claude 3 Sonnet and Midjourney, for image description, and discussing compatibility challenges with AI apps on devices such as the Samsung Galaxy Note 9, with solutions involving checking system versions or utilizing mobile web browsers as alternatives.
Get Schooled on RAFT: A Retrieval-Augmented Fine-Tuning (RAFT) webinar with Tianjun Zhang and Shishir Patil is scheduled for Thursday at 9am PT, promising insights into RAFT's advantages over traditional fine-tuning in language models. Prep materials include RAFT blog posts and the full RAFT paper, with registration available here.
LlamaIndex's Call for Webinar Participation: LlamaIndex is hosting a webinar on RAFT, comparing it to taking an "open-book exam," and has shared a schematic diagram for constructing RAG frameworks using different tools which can be found here in a step-by-step guide.
Troubleshooting with LlamaIndex: There were reports of outdated LlamaIndex documentation, difficulties with the `OpenAI.organization` setting, and deprecated models like `text-davinci-003`. Also discussed was the use of WeatherReader for weather-related queries within RAG, and manual methods for handling images in PDFs using LlamaParse.
Question Over-Simplification in Agent-Based Systems: In the realm of creating a multi-document RAG system, one user highlighted an issue where the top_agent over-simplified the input question resulting in inadequate search outcomes. They shared details about incorrect narrowing of queries like "expiration date of chocolate," reducing it merely to "expiration date."
Tutorial Worth Watching: A user recommended a YouTube tutorial on building a RAG application using LlamaIndex, highlighting integration with Pinecone and Gemini Pro for content scraping, embedding conversion, and querying, which can be accessed here.
JSON Juggling Woes: Engineers are discussing challenges with parsing JSON in LangChain, where each line currently creates a separate Document instead of one Document with comprehensive metadata. The issue is detailed in the JSON loader documentation, but no solution has been posted (a hedged merging sketch appears after this list).
Token Tally Rising with Tool Use: There's a noted 50% increase in token usage when LangChain agents employ tools, attributed to the tools' retrieval and tokenization of data. The system prompt itself is only processed once per inference, and not all tools incur this extra cost.
LangGraph Labyrinth: Insights into utilizing a base model as the state in LangGraph were shared alongside a GitHub notebook example. Moreover, StructuredTool fields in LangChain can be validated using Pydantic's `BaseModel` and `Field` classes, as referenced in GitHub issues (see the validation sketch after this list).
Fine-tuning Foil or Friend?: Dialogues around achieving structured output from a chain suggest employing two agents to balance specialized knowledge and general intelligence post fine-tuning. However, no clear consensus or strategy has been provided to address this challenge.
PDFs and PersonaFinders Proliferate: The discourse includes attempts to map content across PDFs using vector embeddings to match paragraphs semantically (a sketch follows below), while a new release called PersonaFinder GPT promises conversational AI abilities based on identified personal attributes and invites testing on PersonaFinder Pro.
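On the JSON-juggling item above, no fix was posted in the channel, but one workaround is simply to merge the per-record Documents after loading. A minimal sketch, assuming langchain_community's `JSONLoader` (which needs the `jq` package) and a local `records.json`:

```python
from langchain_community.document_loaders import JSONLoader
from langchain_core.documents import Document

loader = JSONLoader(file_path="records.json", jq_schema=".[]", text_content=False)
docs = loader.load()  # one Document per JSON record

# Collapse into a single Document carrying aggregate metadata.
merged = Document(
    page_content="\n".join(d.page_content for d in docs),
    metadata={"source": "records.json", "n_records": len(docs)},
)
```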
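For the StructuredTool validation mentioned in the LangGraph item, a minimal sketch using Pydantic's `BaseModel` and `Field`; the tool itself is a made-up example:

```python
from pydantic import BaseModel, Field
from langchain_core.tools import StructuredTool

class SearchArgs(BaseModel):
    query: str = Field(..., min_length=3, description="Search query")
    top_k: int = Field(5, ge=1, le=20, description="Number of results")

def search(query: str, top_k: int = 5) -> str:
    """Search for documents."""
    return f"top {top_k} results for {query!r}"

# Invalid arguments (e.g. top_k=0) are rejected by Pydantic before the function runs.
tool = StructuredTool.from_function(
    func=search, name="search",
    description="Search for documents.",
    args_schema=SearchArgs,
)
```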
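Finally, the cross-PDF paragraph matching is straightforward with any sentence-embedding model; a sketch assuming sentence-transformers and paragraphs already extracted from both PDFs:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
paras_a = ["first paragraph of PDF A", "second paragraph of PDF A"]  # placeholder text
paras_b = ["a paragraph of PDF B", "another paragraph of PDF B"]

emb_a = model.encode(paras_a, normalize_embeddings=True)
emb_b = model.encode(paras_b, normalize_embeddings=True)

sims = emb_a @ emb_b.T            # cosine similarities (embeddings are unit-normalized)
best_match = sims.argmax(axis=1)  # index in B of the closest paragraph for each item in A
```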
LinkedIn Badges: Stat or Fad?: A LinkedIn user flaunted having over 30 Top Voice badges, raising questions on the value of such accolades; LinkedIn's badges in question.
AI Hallucinates, Developers Take Notes: Software packages imagined by AI are being created and mistakenly utilized by major companies like Alibaba, showcasing a potential malware vector; more in The Register's coverage.
Billion-Token Models on the Horizon: Qdrant introduces FastLLM, capable of a 1 billion token context window, aimed at enhancing AI-driven content generation; dive into the details in their announcement post.
Depth in Diffuser Channels: Discussions focused on the intricacies of LoRA with diffusers, touching upon model queries without clear resolutions, and tackling the challenge of fine-tuning language models on PDF files without conclusive advice being provided.
Gradio 4.25 Debuts, Brings Enhanced UX: Gradio 4.25.0 rolls out features like auto-deletion of `gr.State` variables, `cache_examples="lazy"`, a fix for streaming audio outputs, and a more intuitive `gr.ChatInterface` to streamline user interactions (a minimal sketch follows below).
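A minimal sketch of the new `gr.ChatInterface` ergonomics, assuming Gradio 4.25; the response function is a placeholder, and `cache_examples="lazy"` defers example caching until first use:

```python
import gradio as gr

def respond(message, history):
    # Placeholder logic; plug in a real model call here.
    return f"You said: {message}"

demo = gr.ChatInterface(
    fn=respond,
    examples=["Hello!", "Summarize this thread"],
    cache_examples="lazy",  # new in 4.25: examples are cached on demand
)
demo.launch()
```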
Mojo Gets Mighty with MAX Engine: The imminent introduction of the MAX Engine and C/C++ interop in Mojo aims to streamline RL Python training, potentially allowing Python environments to be speedily re-implemented in Mojo, as detailed in the Mojo Roadmap. Meanwhile, Mojo 24.2 has excited developers with its focus on Python-friendly features, whose depth is explored in the MAX 24.2 announcement and the blog post on Mojo open-source.
Tune in to Modular's Frequencies: Modular's busy Twitter activity seems part of an outreach or announcement series, and details on their ideas can be tracked on Modular's Twitter for those interested in their updates or campaigns.
Tensors, Tests, and Top-level Code Talk: Open dialogue about the quirks and features of Mojo continued with insights like the need for improved Tensor performance, which was tackled by reducing copy initialization inefficiencies. Engineers also raised issues around top-level code and SIMD implementations, highlighting challenges like Swift-style concurrency and intrinsic function translations, with some guidance available in the Swift Concurrency Manifesto.
Unwrapping the CLI with Prism: The `Prism` CLI library's overhaul brings new capabilities like shorthand flags and nested command structures, harmonizing with Mojo's 24.2 update. Enhancements include command-specific argument validators, with the development journey and usability of References being a point of focus, as seen in thatstoasty's Prism on GitHub.
Deploy with MAX While Anticipating GPU Support: Questions about using MAX as a Triton backend alternative point to MAX Serving's utility, though currently lacking GPU support; documentation can guide trials via local Docker, found in the MAX Serving docs. Ongoing support and clarifications for prospective MAX adopters are discussed, emphasizing that ONNX models could fit smoothly into the MAX framework.
Nightly Mojo Moves and Documentation: Dedicated Mojo users were alerted to the nightly build updates and directed to use `modular update` commands, with changes listed in the nightly build changelog. Additionally, valuable guidelines for local Mojo stdlib development and best testing practices are documented, suggesting use of the `testing` module over FileCheck and pointing to the stdlib development guide.
Miniconda Shrinks the Stack: Miniconda is validated as an effective, smaller substitute to Anaconda for those needing lighter installs without sacrificing functionality.
Call to Collaborate on OhanaPal: The OhanaPal app is an innovative tool leveraging OpenAI GPT APIs to aid neurodivergent individuals, with the developers seeking contributors for further brainstorming and prototyping. Interested parties can engage through their website.
3D Printing Size Adjustments for Gadgets: When 3D printing the O1 Light, scale the model to 119.67% to properly accommodate an M5 Atom Echo; GitHub pull request #214 enhances the M5Atom with auto-reconnection features.
Windows Package Management Enhanced: A tip for Windows users: consider winget and scoop as viable tools for software package management, alongside the traditional Microsoft offerings.
Open Source AI Fosters Independence: The fabric repository on GitHub provides an open-source AI augmentation framework to solve specific problems using crowdsourced AI prompts, and Microsoft’s UFO (GitHub - microsoft/UFO) explores UI-Focused Agents for Windows interaction.
Chatbot Prefix Quirk in OpenRouter: The undi95/remm-slerp-l2-13b:extended model is unexpectedly prefixing responses with `{bot_name}:` in OpenRouter during roleplay chats; however, recent prompt templating changes were ruled out as the cause. The usage of the `name` field in this scenario is under investigation (see the request sketch after this list).
SSL Connection Mystery: A connection attempt to OpenRouter was thwarted by an SSL error described as "EOF occurred in violation of protocol", yet the community did not reach a consensus on a solution.
New Book Alert: Architecting with AI in Mind: Obie Fernandez has launched an early release of his book, Patterns of AI-Driven Application Architecture, spotlighting OpenRouter applications. The book is accessible here.
Nitro Model Discussion Heats Up: Despite concerns over the availability of nitro models, it's been affirmed that nitro models are still accessible and forthcoming. Confusion around the performance of different AI models suggests a prominent interest in optimizing speed and efficiency.
Model Troubleshooting & Logit Bias: Users encountered issues with models like NOUS-HERMES-2-MIXTRAL-8X7B-DPO and debated alternatives such as Nous Capybara 34B for specific tasks, noting its 30k context window for improved performance. Clarifications were made regarding OpenRouter logit bias application, which is currently limited to OpenAI's models only.
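For context on the prefix quirk, a minimal request sketch against OpenRouter's OpenAI-compatible endpoint showing where the `name` field sits; the model slug is from the report, everything else is generic boilerplate:

```python
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_OPENROUTER_API_KEY"},
    json={
        "model": "undi95/remm-slerp-l2-13b:extended",
        "messages": [
            # The optional `name` field identifies the speaker in roleplay chats;
            # its interaction with prompt templating is what was under investigation.
            {"role": "user", "name": "Alice", "content": "Hello there!"},
        ],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```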
NumPy's Unexpected Thread Behavior: A member was surprised to find that NumPy wasn't fully utilizing threads, confirmed by benchmarking code showing better performance with a custom `matmul` function. This highlighted NumPy's suboptimal multi-threading in some configurations (a benchmark sketch follows this list).
Prompting for llamafile Documentation: The impending release of llamafile 0.7 sparked conversations on prompt templating within openchat 3.5, revealing a need for better documentation to clear confusion among users. The community eagerly awaits clearer guidance on integration specifics.
TinyBLAS Offers a CUDA-Free Alternative: The discussion addressed TinyBLAS as an alternative for GPU acceleration, though it was noted that its performance is contingent upon the specific graphics card used. This option enables GPU support without needing the installation of CUDA or ROCm SDKs, which could significantly ease setup for some users.
Windows ARM64 Compatibility Hurdles with llamafile: Users inquiring about Windows ARM64 support for llamafile discovered that while the ARM64X binary format is supported, there are emulation issues with AVX/AVX2, a detail crucial to developers working within the Windows ARM64 ecosystem.
Local Deployment Troubles: Participants encountered an "exec format error" during local deployment of llamafile, sparking a troubleshooting discussion that included suggestions to switch from zsh to bash and details on the correct execution of Mixtral models dependent on hardware configurations.
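A sketch of the kind of benchmark behind the NumPy observation; note that `np.matmul` threading comes from the underlying BLAS library (OpenBLAS, MKL, ...), so results vary by build:

```python
import time
import numpy as np

a = np.random.rand(2048, 2048).astype(np.float32)
b = np.random.rand(2048, 2048).astype(np.float32)

start = time.perf_counter()
for _ in range(10):
    np.matmul(a, b)
elapsed = time.perf_counter() - start
# 2 * n^3 floating-point operations per n x n matmul
print(f"10 matmuls: {elapsed:.3f}s ({10 * 2 * 2048**3 / elapsed / 1e9:.1f} GFLOP/s)")
```

Thread counts can be pinned with environment variables such as `OMP_NUM_THREADS` to check whether all cores are actually being used.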
15 Billion Reasons to Consider AMD: MDEL's successful training of a 15B model on AMD GPUs suggests that AMD may be a viable option in the hardware landscape for large-scale AI models.
The Mystery of the Training Freeze: Post-epoch training hangs were reported without the apparent use of `val_set_size` or `eval_table`, with hints suggesting the cause could be insufficient storage or yet-unidentified bugs in certain models or configurations.
Axolotl Development Continues Amid Pranks: The Axolotl dev team approved a PR merge for `lisa`, added a YAML example for testing, and jovially proposed an April Fools' partnership with OpenAI. However, there are issues with missing documentation and out-of-memory errors potentially related to DeepSpeed or FSDP training attempts.
Unified Data Dilemma: There's a significant effort to combine 15 datasets into a unified format, with members tackling hurdles from data volume to misaligned translations.
Rigorous Runpod Reviews Requested: Interest has been shown in the use of runpod serverless offerings for very large language models, seeking insights from community experiences.
FastLLM Blasts into the AI Scene: Qdrant announced FastLLM (FLLM), a language model boasting a 1 billion token context window for Retrieval Augmented Generation, though skeptics suggest the timing of its announcement on April 1 may signal a jest.
Visualization for Understanding GPTs: A visual introduction to Transformers and GPTs by popular YouTube channel 3Blue1Brown has garnered attention among AI professionals looking for a clearer conceptual understanding of these architectures.
Engineers Build Open Source LLM Answer Engine: An open source "llm-answer-engine" project unveiled on GitHub has intrigued the community with its use of Next.js, Groq, Mixtral, Langchain, and OpenAI to create a Perplexity-Inspired Answer Engine.
Structured Outputs from LLMs Become Simpler: The engineering crowd noted the release of instructor 1.0.0, a tool aimed at ensuring Large Language Models (LLMs) produce structured outputs that conform to user-defined Pydantic models, assisting seamless integration into broader systems (a usage sketch follows this list).
Google Powers Up AI Division: In a pivot to bolster its AI offerings, Google has tapped Logan Kilpatrick to lead AI Studio and advance the Gemini API, signaling the tech giant's intensified commitment to becoming the hub for AI developers.
An Nsight Compute profiling command, `ncu --target-processes all --set detailed --import-source yes -o output_file python your_script.py`, was shared for a better development workflow. Key performance improvements were emphasized, referencing resources from Accelerating Triton.
RLAIF Could Boost Opus: It's speculated that applying Reinforcement Learning from AI Feedback (RLAIF) could further enhance Opus by refining its decision-making accuracy.
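A minimal sketch of the instructor 1.0.0 pattern described above, using its `from_openai` client wrapper and a user-defined Pydantic model:

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel

class UserInfo(BaseModel):
    name: str
    age: int

client = instructor.from_openai(OpenAI())

user = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=UserInfo,  # output is validated (and retried) against this model
    messages=[{"role": "user", "content": "John Doe is 30 years old."}],
)
print(user.name, user.age)  # -> John Doe 30
```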
Google's Bold AI Aspiration: A new AI product leader at Google announced their commitment to making Google the paramount destination for AI developers, supported by the AI Studio and Gemini API.
Advancements and Discussion in DPO: A recent preprint explores verbosity issues in Direct Preference Optimization (DPO) at large scales. The discourse also mentioned the rebuttal of a study on verbosity exploitation in Reinforcement Learning from Human Feedback (RLHF), available on arXiv.
A Veil Over AI Post-GPT-4: Post-GPT-4, AI communities notice a trend toward increased secrecy from companies sharing less about model intricacies, deviating from prior norms of transparency.
Jamba's Speed Insight: Engineers scrutinized how Jamba's end-to-end throughput efficiency improves with more tokens during the decoding process. Some members questioned the increase, given decoding is sequential, but the consensus highlighted that throughput gains exist even as the context size increases, impacting decoding speed.
Decoding Efficiency Puzzler: A pivotal discussion unfolded around a graph showing Jamba's decoding step becoming more efficient with a larger number of tokens. Confusion was addressed, and it was elucidated that the higher throughput per token affects decoding phase efficiency, countering initial misconceptions.
Perplexity AI ▷ #general (888 messages🔥🔥🔥):
Claude 3 Haiku Holds Its Own Against Opus: Users discussed the effectiveness of Claude 3 Haiku within Perplexity AI, analyzing how well it handles reasoning and trick questions compared to Opus, and its cost-effectiveness as a smaller, cheaper model.
Concerns Over Introducing Ads to Perplexity: There's speculation and concern among users regarding Perplexity AI's potential plans to introduce ads, especially in relation to the Pro subscription. The credibility of the news, possibly being an April Fools' joke, is being debated, with references to articles from AdWeek and Gadgets360 discussing advertising strategies.
Prevalence of Writing Mode Focus: Discussions centered around whether Writing focus mode within Perplexity is superior, with users suggesting it provides a better user experience and less problematic results than the All focus mode which encompasses web search. There is a clear preference for writing mode for its cleaner LLM interactions.
Prompt Attacks Security Concerns: A user inquired about how Perplexity AI secures its models, like Sonar, against prompt attacks and other security vulnerabilities. The conversation shifted towards the broader issue of protecting LLMs against policy violations due to poisoned data or prompt injections.
Gemini 1.5 Pro API Pricing Commentary: Users discussed the preview pricing of Gemini 1.5 Pro, noted to be expensive at $7 per million tokens for its 1-million-token context ability. Conversations point to hopes for future price adjustments and potential tiered pricing based on context window usage (a quick cost calculation follows below).
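To make the sticker shock concrete, a quick back-of-the-envelope using the reported preview rate (input-token cost only; output pricing is extra and not quoted here):

```python
price_per_million_input = 7.00  # USD, as reported in the discussion

for context in (128_000, 500_000, 1_000_000):
    cost = context / 1_000_000 * price_per_million_input
    print(f"{context:>9,} input tokens -> ${cost:.2f} per call")
# 128,000 -> $0.90; 500,000 -> $3.50; 1,000,000 -> $7.00
```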
Links mentioned:
Perplexity AI ▷ #sharing (19 messages🔥):
Perplexity AI ▷ #pplx-api (16 messages🔥):
Users reported rate-limit trouble with the `sonar-medium-online` model, experiencing 429 errors despite adhering to the stated 20 requests per minute limit.
Links mentioned:
LAION ▷ #general (525 messages🔥🔥🔥):
Hugging Face Diffusers PR Discussion: Community members discussed a PR about disabling autocast for CUDA devices on the Hugging Face Diffusers GitHub. The conversation pivoted to a critique of Hugging Face for not unifying code across different pipelines and trainers, which members called inefficient to the point of absurdity.
Persistent Issue with Merging Different SD Pipelines: Community members highlighted an ongoing issue captured in GitHub discussion #551 about merging different Stable Diffusion pipelines, noting the complication persists because of a decision to keep the pipelines separate.
Criticisms of Hugging Face's Engineering Priorities: A discussion emerged criticizing Hugging Face's engineering work on Diffusers, attributing problems to too few engineers and too many 'AI code thought leaders,' as well as conflicting approaches like the engineers' adoption of microservices frameworks.
PyTorch Version Specific Discussions: Community members had extensive technical discussions on PyTorch versions, mentioning the silent addition of bfloat16 support in PyTorch 2.3 and the complexities of nightly builds. There were comments on autocast performance issues and possible fixes, details added in a GitHub thread, and anticipation for the PyTorch 2.6 release.
AI Generated Images and Sampling Settings: The quality of images generated by various diffusion model versions and configurations were critiqued, with a particular focus on images of hammers. Differences in samplers and their configurations led to an exchange on the efficacy and correct use of these parameters.
Links mentioned:
LAION ▷ #research (9 messages🔥):
Introducing Gecko for Efficient Text Embedding: Gecko, a compact text embedding model, showcases strong retrieval performance by distilling knowledge from large language models into a retriever. It outperforms existing models on the Massive Text Embedding Benchmark (MTEB), with details available in the Hugging Face paper and the arXiv abstract.
Potential Application of Gecko in Diffusion Models: The conversation suggests exploring the use of Gecko to potentially accelerate diffusion model training, replacing the usage of T5. The discussion is speculative about the impact on model performance, especially in terms of embeddings.
Gecko Weights Inquiry: A member inquired if the weights for the aforementioned Gecko are available, indicating interest in its practical application.
Assessing Large Vision-Language Models: The MMStar Benchmark examines the efficacy of evaluating Large Vision-Language Models, pinpointing issues such as unnecessary visual content for problem-solving where text suffices.
Announcement of Aurora-M, a Multilingual LLM: The new preprint for Aurora-M, a 15.5B parameter, red-teamed, open-source, and continually pre-trained multilingual large language model, is introduced. It has processed over 2T training tokens and meets the guidelines of the White House EO, with more details found on Twitter and arXiv.
Improving Spatial Consistency in t2i Translations: Incorporating better spatial descriptions in captions during fine-tuning enhances the spatial consistency of images generated by text-to-image models. The study's results are detailed in an arXiv preprint.
Links mentioned:
LAION ▷ #learning-ml (1 messages):
Link mentioned: Meetup #3 LangChain and LLM: Using LangSmith to go from prototype to production, Wed. Apr. 17, 2024, 18:30 | Meetup: We are delighted to welcome Harrison Chase, Co-Founder and CEO of LangChain, for our third LangChain and LLM France Meetup! Don't miss this opportunity…
Unsloth AI (Daniel Han) ▷ #general (212 messages🔥🔥):
Links mentioned:
Unsloth AI (Daniel Han) ▷ #help (311 messages🔥🔥):
Unsloth Update Breaks Inference: Users reported encountering size mismatch errors during inference with Unsloth AI after an update. A fix was applied, reverting changes, which resolved the issue for users.
Model Saving Challenges on Colab: A user struggling to save a finetuned 13B model within Google Colab's limited storage was advised to try Kaggle for its free access to 2x Tesla T4s. Another user suggested `model.save_pretrained_gguf("model", tokenizer, quantization_method = "q5_k_m", first_conversion = "q8_0")` to save 8GB of space, but noted a potential 0.1% loss in accuracy.
Jamba Model Support Speculation: Conversation about the complexity of adding Jamba model support to Unsloth, acknowledging the difficulty due to it being a Mamba and MoE model.
Finetuning Evaluation Clarification: A detailed response explained the Unsloth SFT Trainer's evaluation process, showing how to get evaluation metrics by explicitly passing the evaluation dataset and strategy (see the sketch after this list).
Load Dataset Slowness and Potential IPv6 Issue: Users discussed significant delays when using `load_dataset` in local Jupyter notebooks, suspecting it might be related to IPv6 settings on Ubuntu systems. The same command was reported to work normally on Windows with WSL and IPv4.
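A minimal sketch of the evaluation setup described above, assuming a model, tokenizer, and datasets (`model`, `tokenizer`, `train_ds`, `eval_ds`) are already prepared; argument names follow TRL's `SFTTrainer` and transformers' `TrainingArguments`:

```python
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_ds,
    eval_dataset=eval_ds,       # evaluation only runs if this is passed explicitly
    dataset_text_field="text",
    args=TrainingArguments(
        output_dir="outputs",
        evaluation_strategy="steps",  # and an eval strategy is set
        eval_steps=50,
        per_device_eval_batch_size=2,
    ),
)
trainer.train()  # eval metrics (eval_loss, ...) are logged every 50 steps
```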
Links mentioned:
A truncated item referenced the transformers warning "Setting `pad_token_id` to `eos_token_id`:2 for open-end gen..."
Unsloth AI (Daniel Han) ▷ #suggestions (4 messages):
Stability.ai (Stable Diffusion) ▷ #general-chat (377 messages🔥🔥):
Cyberrealistic vs. EpicRealism XL: Participants discussed the Cyberrealistic and EpicRealism XL models in relation to realistic image generation. They found that while Cyberrealistic requires detailed prompts, EpicRealism XL produces better outcomes with more relaxed prompts.
SD3 Anticipation: The community is anticipating the SD3 release, with a 4-6 week timeframe mentioned in a previous announcement. Doubts are expressed regarding the release timing, with some users sharing their eagerness for the new version's capabilities and improvements, especially the fixed text function.
Face and appendage model challenges: Users described issues with face and hand rendering using Stable Diffusion, with various fixes like Adetailer and embeddings suggested. Efforts are made to find quick and reliable solutions that can minimize additional processing time during batch image generations.
CHKPT Model Guidance: Queries about guides to CHKPT models were shared due to the vast quantity available, seeking information on which models are best for specific purposes. Suggestions for specific models like ponyxl, dreamshaperxl, juggernautxl, and zavychroma were given as part of a stable diffusion checkpoint "starter pack."
Model Performance Discussions: Conversations spanned various topics including the speed of AI development, ethical considerations surrounding AI training with professional artwork, and the potential memory requirements for upcoming Stable Diffusion versions. There was also banter and jokes, highlighting the lighter side of community engagement.
Links mentioned:
Nous Research AI ▷ #off-topic (4 messages):
Link mentioned: DBRX: A New State-of-the-Art Open LLM: Introducing DBRX, an open, general-purpose LLM created by Databricks. Across a range of standard benchmarks, DBRX sets a new state-of-the-art for established...
Nous Research AI ▷ #interesting-links (18 messages🔥):
Surprising Error Rates for HF Transformers: Members shared surprise over the high error rates with HF transformers and mentioned using BetterTransformer with Distil whisper v3 large. The intention to consider WhisperX for future projects was expressed.
Anticipating the End of Web Silos: A link was shared discussing impending transformations of the web, including a shift in search engines from information retrieval to a more predictive and personalized approach, the need for revenue strategy overhauls due to interconnected digital ecosystems, and potential obsolescence of dedicated Web User Interfaces (UIs). Read about the impending digital transformations in Transforming the Web.
New Apple Paper on Reference Resolution: Discussion about a new Apple paper suggesting an 80M parameter model outperforms GPT-3.5 and a 250M parameter model beats GPT-4 in most benchmarks for reference resolution. The conversation noted the critical importance of reference resolution in the effectiveness of AI agents. Catch up on the Apple paper here.
Reference Resolution's Role in AI Accuracy: Continuing on the subject of reference resolution, it was highlighted that this could be a significant factor in AI agents committing errors during task execution. Further dialogue involved an interpretation that 80M parameter models performed surprisingly well on unseen tasks, potentially due to a high margin of error across models or similarities in accuracy.
Latest Reference from Twitter: A Twitter post highlighting newly released information was suggested for comparison in the ongoing discussions. It could be added as a further point of reference for model comparisons. Check out the new information on this Twitter post.
Links mentioned:
Nous Research AI ▷ #general (104 messages🔥🔥):
Links mentioned:
Nous Research AI ▷ #ask-about-llms (37 messages🔥):
A member asked about integrating `llama 2` into a website as a fine-tuned chatbot. No specific resources or solutions were provided in response.
Another member fine-tuned `NousResearch/Hermes-2-Pro-Mistral-7B` on a custom domain benchmark but found accuracy decreased post-fine-tuning. Items in the config, like `lora_r`, `lora_alpha`, and `sequence_len`, were outlined, but no diagnosis was given for the observed accuracy drop.
Configurations for `OLMo-Bitnet-1B` and `NousResearch/Hermes-2-Pro-Mistral-7B` were provided, including discussions on handling special tokens and tokenizer configurations. A specific technique ensured tokenizer configs contained necessary tokens for primary functionality, and PRs for these configurations were merged.
Links mentioned:
Nous Research AI ▷ #project-obsidian (4 messages):
Link mentioned: Sayali9141/traffic_signal_images · Datasets at Hugging Face: no description found
Nous Research AI ▷ #bittensor-finetune-subnet (2 messages):
A recent Hugging Face change adds a `safetensors.sharded = true/false` key to the model metadata. This key is not accepted by the Hugging Face Python library's `hf_api.py`, causing a failure to load the ModelInfo necessary for pushing and validating models.
Nous Research AI ▷ #rag-dataset (5 messages):
Members discussed using a `<scratchpad>` tag to gather evidence for RAG (retrieval-augmented generation), as described in the Claude prompt-engineering guide, alongside a `notes` tag similar to a scratchpad intended for user interaction.
Nous Research AI ▷ #world-sim (110 messages🔥🔥):
Envisioning File Uploads & Local Cache: Members discussed the value of file uploading as a feature for WorldSim, suggesting it could enhance efficiency by running pre-written scripts. Another suggestion was maintaining a local cache to simulate file system navigation and the concept of a "Generative Virtual Machine (GVM)" for dumping files to maintain consistency.
WorldSim Easter Egg Uncovered: Users discovered an April Fools' Day easter egg within WorldSim. The easter egg is triggered when discussing morality and adds a playful twist to interactions.
Competitive WorldSim Challenges: A member proposed a concept akin to "LLM Coliseum," where a similar competitive benchmark involving WorldSim tasks could test LLMs against each other in a competitive setting, potentially even with an LLM acting as a judge for the competitions.
WorldSim Future Features and Roadmap Speculation: There was discussion on the potential for future WorldSim features, such as a competitive leaderboard, text to video integration, and input/output capabilities. Users express a desire for a publicly available roadmap or updates for transparency on upcoming developments.
WorldSim as a Platform for AI Battles: Ideas were shared about AI battles within WorldSim, where a "constellation of BBS's" could run games from various eras, with an emergent theme of unifying opposites and philosophical dimensions, including treasure hunts and alchemical lore.
Links mentioned:
tinygrad (George Hotz) ▷ #general (244 messages🔥🔥):
GPU Stability Nightmares: Users shared experiences of severe system instability when using AMD GPUs, highlighting issues such as memory leaks and non-recoverable errors after running benchmarks and stressing the GPUs. Errors like `"amdgpu: failed to allocate BO for amdkfd"` and `"amdgpu: Failed to create process device data"` were reported, indicating hardware/firmware-level issues.
AMD's Invitation Viewed Skeptically: One user received an invitation to AMD's Vanguard program after bringing attention to GPU reset issues on AMD's subreddit. However, George Hotz and others express doubt about AMD's commitment to resolving underlying problems, emphasizing actions over words and stressing the importance of open-source documentation and sources for meaningful progress.
Perception of AMD within Tinygrad Community: There is palpable frustration with AMD's approach to software and drivers within the tinygrad community. Hotz predicts future regrets for large investments in AMD's MI300X due to poor testing and cultural resistance to modern software practices.
Approaches to Dealing with GPU Resets: Discussions surround workaround strategies like PCIe power cycling and redundancy to tackle the inability to reset AMD cards after crashes. Various anecdotes and potential solutions like "PCIe hotswap" or "GPUs in RAID 1" are humorously proposed, but ultimately signify the gravity of the issue.
Reflections on Software and Firmware Practices: There's an ongoing dialogue about the need for a fundamental cultural shift in AMD regarding their software and firmware practices. George Hotz speculates that with the right management and testing protocols, such as CI and fuzzing, it may be possible to replace the current complex firmware with something simpler and more robust.
Links mentioned:
tinygrad (George Hotz) ▷ #learn-tinygrad (31 messages🔥):
Linear uOps Exposed: A member shared a write-up on linear uops to assist others in understanding the intermediate representation used in tinygrad. They note that while the content is based on personal study notes, feedback and suggestions are welcomed.
Command Queue Clarification: There was a discussion around the new command queue implementation in tinygrad. A tutorial has been shared that explains changes following a recent merge, indicating that the command queue is a replacement for the "run_schedule" function.
Test Code Puzzles: A Pull Request to tinygrad raised questions about commented-out unittest code and backend checks. It was clarified and fixed in PR #4034, ensuring tests could be run with different backends like CLANG and OpenCL on Intel GPUs.
ShapeTracker Specs Scrutinized: A specification for a high-level shapetracker was briefly discussed, touching upon the topic of mergeability and expressing strides for shapes in mathematical notation (Z^n), which can be negative.
Findings on Jitted Functions: Members were trying to understand why jitted functions didn't appear in the command queue logs, discussing operational aspects of tinygrad's command queue and how it affects the execution of scheduled items.
Links mentioned:
LM Studio ▷ #💬-general (89 messages🔥🔥):
LM Studio Error Troubles: Members are experiencing errors with LM Studio, such as unknown exceptions during inferencing and crashes after a few messages with Quantized models, specifically citing issues with estopian maid 13B q4 on an RTX 3060 GPU.
Seeking Voice Understanding Models in LM Studio: A member asked about models that understand voice directly; a response clarified that LM Studio has no built-in TTS (Text-to-Speech) or STT (Speech-to-Text) functionality, so a separate tool such as whisper.cpp is required for speech-to-text (a pipeline sketch follows this list).
Optimizing Context Length and Model Selection for Development: Discussions about how to manage context length in LM Studio and which models are best for software development surfaced, echoing that best practices and model choice can vary based on the user's hardware.
LM Studio Settings and Performance Queries: Users are seeking tips on how to increase performance, with advice given on how to set the model to use more GPU via the "GPU Offload" option in settings, and confirming that LM Studio can load locally stored GGUF files.
Updates, Downgrades, and Usage Help: Individuals are navigating issues with model loading, seeking ways to downgrade to a previous stable version of LM Studio, and looking for specific and potentially nonexistent features such as support for PKL models or running embedding models—both of which aren't supported at this time.
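A sketch of the tethering approach mentioned above: transcribe locally with a Whisper implementation, then send the text to LM Studio's OpenAI-compatible local server (default `http://localhost:1234/v1`). The `openai-whisper` package stands in for whisper.cpp here:

```python
import whisper              # pip install openai-whisper
from openai import OpenAI

stt = whisper.load_model("base")
text = stt.transcribe("question.wav")["text"]

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
resp = client.chat.completions.create(
    model="local-model",    # LM Studio serves whichever model is currently loaded
    messages=[{"role": "user", "content": text}],
)
print(resp.choices[0].message.content)
```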
Links mentioned:
LM Studio ▷ #🤖-models-discussion-chat (44 messages🔥):
Links mentioned:
LM Studio ▷ #🧠-feedback (1 messages):
LM Studio ▷ #🎛-hardware-discussion (80 messages🔥🔥):
Links mentioned:
LM Studio ▷ #autogen (1 messages):
Eleuther ▷ #general (54 messages🔥):
Model Distillation and Claude3 Haiku Performance: Users discussed the distillation of larger models like Claude3 Opus into smaller, more efficient ones like Claude3 Haiku. Some are impressed by Haiku's performance, considering it might suffice for many use cases originally thought to require GPT-4.
Residual Block Discussion Sparks Technical Debate: A technical conversation emerged around why residual blocks in neural architectures often use two linear layers. Users explained that two layers with non-linearity increase expressiveness and allow for flexible parameterization.
AI Mathematical Olympiad Engagement: Mention of the Kaggle AI Mathematical Olympiad competition sparked a suggestion that the EleutherAI community could form groups to compete. Compute grants for "AI in science" could potentially support such initiatives.
Resource Sharing and Project Joining: New members introduced themselves, sharing their research interests in fields like alignment, privacy, fine-tuning of large language models, and autoformalisation. They are looking to contribute to projects and learn from the community.
Building a Credible Benchmarking Dataset: One user inquired about the necessity of a peer-reviewed paper when creating a benchmarking dataset for a new language, seeking advice on establishing credibility for their dataset.
Link mentioned: AI Mathematical Olympiad - Progress Prize 1 | Kaggle: no description found
Eleuther ▷ #research (111 messages🔥🔥):
Deciphering Google's Search Infrastructure: Discussion revolved around Google's ability to embed the entire web in RAM for rapid indexing and retrieval. Participants talked about the potential infrastructure, with comments stating that Google may use a distributed version of FAISS and operates primarily with data stored in RAM to ensure fast response times, essential to their operations.
Musing Over Google's Approach to Programming: In a further conversation about Google's technical strategies, it was mentioned that Google isn't afraid to utilize "bad" programming constructs like global variables or `goto` if they serve a purpose. There's also reference to utilizing thread-local storage to streamline context handling in remote procedure calls.
Discussing the Limits of Text Indexing: Questions arose around how Google handles obscure text search queries that necessitate exact matches, leading to an explanation of Google's use of inverted indexes (a toy sketch follows this list). Different indexing strategies, such as full-text search and inverted indexes, were considered for handling wide-ranging and exact-match queries efficiently.
Suspense Over New Research Papers: There was anticipation for new papers being shared, with specific interest in the robustness of safety filters for LLMs. A link to recent research was provided, with a nod to the essential nature of continued exploration in the field, involving safeguarding against reverse-engineering or misusing language models.
Nonchalant Revelation of Sleek OSS Agent: A link was shared highlighting an open-source software agent from Princeton NLP named SWE-Agent, which claims to perform on par with proprietary agents like Devin in software engineering tasks. This piqued interest as an example of cutting-edge open-source contributions to NLP.
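A toy inverted index, to ground the indexing discussion above; real systems add positional data, compression, and sharding, but the core mapping from term to posting list is this simple:

```python
from collections import defaultdict

docs = {
    0: "global variables considered harmful",
    1: "goto considered harmful",
    2: "global thread local storage",
}

# term -> list of document ids containing it (a "posting list")
index: dict[str, list[int]] = defaultdict(list)
for doc_id, text in docs.items():
    for term in set(text.split()):
        index[term].append(doc_id)

def search(*terms: str) -> set[int]:
    """Exact-match AND query: intersect the posting lists."""
    result = set(index.get(terms[0], []))
    for t in terms[1:]:
        result &= set(index.get(t, []))
    return result

print(search("global", "harmful"))  # {0}
```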
Links mentioned:
Eleuther ▷ #interpretability-general (4 messages):
Links mentioned:
Eleuther ▷ #lm-thunderdome (18 messages🔥):
ValueError
issues despite using DEBUG verbosity and was advised to share the YAML configuration file contents for further troubleshooting assistance.Links mentioned:
Eleuther ▷ #multimodal-general (3 messages):
Eleuther ▷ #gpt-neox-dev (2 messages):
OpenAI ▷ #annnouncements (1 messages):
ChatGPT without the Wait: OpenAI introduces the option to use ChatGPT instantly, without needing to sign-up, aiming to make AI accessible to a broader audience. The tool is already used weekly by over 100 million people in 185 countries.
No Account, No Problem: The company is rolling out this feature gradually, with a commitment to its mission of making AI tools like ChatGPT widely available.
Your Input Helps Improve AI: Users' interactions with ChatGPT may be used to improve model performance, but there is an option to opt out of this in the settings—even without creating an account. More details on data usage can be found in the Help Center.
Link mentioned: Start using ChatGPT instantly: We’re making it easier for people to experience the benefits of AI without needing to sign up
OpenAI ▷ #ai-discussions (95 messages🔥🔥):
Discussion touched on the /describe command, while also observing limitations in the effectiveness of such tools and their availability.
Link mentioned: Skill-Mix: a Flexible and Expandable Family of Evaluations for AI models: With LLMs shifting their role from statistical modeling of language to serving as general-purpose AI agents, how should LLM evaluations change? Arguably, a key ability of an AI agent is to flexibly co...
OpenAI ▷ #gpt-4-discussions (38 messages🔥):
Exploring LLMs' Reflective Capabilities: Members discussed the possibility of prompting an LLM to reflect internally, with one sharing insights that while an LLM operates as a text predictor, structuring prompts effectively can avoid leaps in logic, with a reference to OpenAI's official guide on prompt engineering found here.
April Fools' Technical Jokes?: One member flagged a link presuming it to be an April Fools' joke, while another confirmed the functionality of the discussed feature as operational, despite not being able to provide a screenshot due to permission restrictions.
GPT Model Comparisons and Anticipation for GPT-5: The conversation included commentary on GPT-4's perceived superiority over Opus, despite Opus's larger context window, and expressed anticipation for GPT-5, suggesting that once it includes better reasoning, code interpretation, and internet access, it could become a strong contender.
Server Stability Issues Garner User Attention: Several members experienced issues with server stability, affecting their ability to log in and use services with one reporting a persistent "Error in input stream" for an extended period and soliciting known solutions.
Diverse AI Utilization and Development: Users shared their developments and experiences with different AI services, including a custom-built GPT for finding persona details and a discussion about the availability of the new image editing feature in DALL-E 3, linking to the official instructions here.
OpenAI ▷ #prompt-engineering (7 messages):
JSON Conundrums with PDFs: A member inquired about the best approach to convert a PDF into a JSON object using GPT and pondered whether a schema or an open-ended prompt would be more effective. Another user suggested always sending a schema, although they noted the process can be quite random in effectiveness (a minimal sketch follows this list).
TOS Warning Over PDF Conversion: It was pointed out that converting a PDF to JSON potentially violates the terms of service.
Call for Research Participants: Anna, a Computer Science graduate conducting research at the American University of Armenia, invited ML engineers, content creators, prompt engineers, and other language model users for a 20-minute interview to discuss challenges associated with large language models.
Seeking Manager Replacement Prompts: A member requested suggestions for effective prompts to replace managerial tasks, focusing on division of directives and performance planning for middle to C-suite management positions.
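As a concrete version of the "always send a schema" suggestion above, here is a minimal sketch using the OpenAI Python client's JSON mode. The schema, model name, and extracted text are placeholders, not details from the discussion, and PDF text extraction is assumed to happen beforehand.

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative target schema; adapt the fields to your documents.
schema = {
    "title": "string",
    "author": "string",
    "sections": [{"heading": "string", "summary": "string"}],
}

pdf_text = "...text already extracted from the PDF..."  # extraction step omitted

response = client.chat.completions.create(
    model="gpt-4-turbo",  # placeholder model name
    response_format={"type": "json_object"},  # JSON mode: forces valid JSON output
    messages=[
        {
            "role": "system",
            "content": "Extract a JSON object matching this schema: "
            + json.dumps(schema),
        },
        {"role": "user", "content": pdf_text},
    ],
)
data = json.loads(response.choices[0].message.content)
print(data)
```

Even with a schema in the prompt, field coverage can vary run to run, which matches the "quite random" experience reported above.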
OpenAI ▷ #api-discussions (7 messages):
Choosing the Best JSON Approach: Members are discussing the optimal way to extract JSON from a PDF using GPT. While one member has tried specifying the JSON schema, another is experimenting with a more open-ended approach to let GPT capture as much data as possible.
Random Results in Schema Enforcement: In the process of converting documents to JSON, schema provision to GPT was addressed, with experiences indicating varying levels of success and an element of unpredictability in the results.
Understanding LLM Use-Cases: Anna, a recent graduate in her research phase, is seeking to discuss with ML engineers and other professionals on their experiences and challenges in using large language models, asking interested parties to direct message or respond for a potential meetup.
Exploring Manager Replacement Prompts: A member is seeking advice on good manager replacement prompts related to middle and C suite management tasks, such as dividing up directives and performance plans, hinting at potential advancements in automating managerial functions.
LlamaIndex ▷ #announcements (1 messages):
Dive into RAFT with LlamaIndex Webinar: Join LlamaIndex's special webinar on Retrieval-Augmented Fine-Tuning (RAFT) featuring lead co-authors Tianjun Zhang and Shishir Patil for an in-depth session. Register for the event scheduled for this Thursday at 9am PT here.
Understanding RAFT via Upcoming Webinar: The webinar will explore how RAFT combines the benefits of retrieval-augmented generation (RAG) and fine-tuning to improve language models' performance in domain-specific settings. Take part to learn from the experts behind this technique on Thursday at 9am PT.
Complementary Resources for RAFT Enthusiasts: For additional context on the RAFT methodology, check out the dedicated RAFT blog posts and access the full RAFT paper to prepare for the webinar.
Generate Your Own RAFT Dataset: Thanks to @ravithejads, you can now create a dataset for RAFT using the RAFTDatasetPack provided by LlamaIndex. Access the pack here and find the corresponding notebook on GitHub.
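For those wanting to try it, a minimal sketch of generating a RAFT dataset with the pack; the package name, import path, and file name are assumptions based on the announcement, so check the linked pack and notebook for the authoritative usage.

```python
# pip install llama-index-packs-raft-dataset  (assumed package name)
from llama_index.packs.raft_dataset import RAFTDatasetPack

# "my_docs.txt" is a placeholder source document.
raft_pack = RAFTDatasetPack("my_docs.txt")
dataset = raft_pack.run()  # builds question/context/answer training examples

# Assuming a Hugging Face Dataset is returned, persist it for fine-tuning.
dataset.save_to_disk("./raft_dataset")
```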
Links mentioned:
LlamaIndex ▷ #blog (4 messages):
LlamaIndex ▷ #general (118 messages🔥🔥):
OpenAI Organization Setup: Members advised setting openai.organization in the code or passing the organization parameter when initializing OpenAI with Settings.llm = OpenAI(organization="orgID",...) (a minimal sketch follows this list).
Query Engine Migration: Guidance covered moving from older abstractions like NLSQLTableQueryEngine to newer ones like SQLTableRetriever.
Agents and Error Handling: Topics included building agents with OpenAIAgent.from_tools, handling errors such as 404 Not Found with server endpoints like ollama, and implementing features in RAG such as weather queries with WeatherReader.
Model Deprecation: With the text-davinci-003 model being deprecated, it was suggested to replace it with gpt-3.5-turbo-instruct in their GPTVectorStoreIndex setup.
Document Processing: Discussion also covered unstructured extraction and vector store summarization.
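A minimal sketch of the organization-parameter approach, using LlamaIndex v0.10+ import paths; the model name and organization ID are placeholders.

```python
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI

# "orgID" stands in for a real OpenAI organization ID.
Settings.llm = OpenAI(model="gpt-3.5-turbo", organization="orgID")
```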
Links mentioned:
LlamaIndex ▷ #ai-discussion (4 messages):
Top Agent Simplification Issue: A member reported a problem while building a multi-document RAG system using Agents where the top_agent oversimplifies the question. For instance, a query about the expiration date of chocolate gets reduced to just "expiration date," leading to unsatisfactory search results.
Specific Query Simplification Examples: The same member further illustrated the issue with another example, where the user asked for the expiration date of a fire extinguisher, but the agent only queried the retrieval engine with the term "expiration date."
IPEX-LLM and LlamaIndex Could Revolutionize Chat and Text Generation: A link to a Medium article titled "Unlocking the Future of Text Generation and Chat with IPEX-LLM and LlamaIndex" was shared, discussing the potential impacts these tools could have on the future of text generation and chat applications. Read the article here.
Tutorial Alert: Creating a RAG App with LlamaIndex: A member shared a YouTube video tutorial that provides a step-by-step guide for building a simple RAG application using LlamaIndex, Pinecone, and Gemini Pro. Key processes such as scraping content, converting to vector embeddings, storing on Pinecone index, and using LlamaIndex to query Gemini Pro are covered. Watch the tutorial here.
Link mentioned: How to build a RAG app using Gemini Pro, LlamaIndex (v0.10+), and Pinecone: Let's talk about building a simple RAG app using LlamaIndex (v0.10+) Pinecone, and Google's Gemini Pro model. A step-by-step tutorial if you're just getting ...
LangChain AI ▷ #general (109 messages🔥🔥):
Handling Complex JSON with LangChain: A user encountered difficulty when each JSON line created a separate Document instead of one Document with metadata for the full JSON. They inquired about a solution, but no follow-up was given; the original issue is described in the JSON loader documentation.
Increased Token Usage with Agents Using Tools: A user noticed a 50% increase in token usage with agents using tools. It was clarified that tools retrieve and tokenize data, which accounts for the extra tokens, and that the system prompt is run once per inference but is not required by every tool.
Discussions on LangGraph and Structured Tool Validation: Users discussed the potential to use a base model as the state in LangGraph and provided a GitHub notebook example. Also, instructions on using Pydantic's BaseModel and Field classes to validate a field in a StructuredTool in LangChain were shared from GitHub issues and LangChain documentation (a minimal sketch follows this list).
Issues with Structured Output and Fine-tuning: Users discussed problems related to obtaining structured output from a chain and the preservation of base knowledge after fine-tuning a model. A user suggested having two agents, a fine-tuned one and a regular GPT model, to maintain both specialized and general knowledge. No definitive solution to the structured output issue was posted within the conversation.
Mapping Content Between PDFs Using LangChain: A user was attempting to map related content between PDFs using a RAG with RetrievalQA chain and was advised to try using vector embeddings to match paragraphs based on semantic content. They also asked about handling images in LangHub, encountering a deserialization error, but again, no solution was provided within the conversation.
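To illustrate the Pydantic validation approach mentioned above, here is a minimal sketch of a StructuredTool whose arguments are validated with BaseModel and Field. The tool itself is hypothetical, and depending on your LangChain and Pydantic versions the v1-style @validator may be required instead of @field_validator.

```python
from langchain_core.tools import StructuredTool
from pydantic import BaseModel, Field, field_validator

class WeatherInput(BaseModel):
    city: str = Field(description="City to look up")
    units: str = Field(default="celsius", description="celsius or fahrenheit")

    @field_validator("units")
    @classmethod
    def check_units(cls, v: str) -> str:
        if v not in ("celsius", "fahrenheit"):
            raise ValueError("units must be 'celsius' or 'fahrenheit'")
        return v

def get_weather(city: str, units: str = "celsius") -> str:
    return f"(stub) weather for {city} in {units}"  # stand-in implementation

weather_tool = StructuredTool.from_function(
    func=get_weather,
    name="get_weather",
    description="Look up the weather for a city",
    args_schema=WeatherInput,
)

# Invalid arguments now fail fast with a clear validation error.
print(weather_tool.invoke({"city": "Paris", "units": "celsius"}))
```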
Links mentioned:
LangChain AI ▷ #share-your-work (5 messages):
Langgraph Advocated for Conversational Bots: A member praised langgraph for making it easy to implement cycles, highlighting its importance in creating advanced conversational taskbots. This feature sets it apart from other LLM app frameworks and will be further documented through community contributions like blog posts.
Custom Food Ordering with OpenGPTs: A user showcased the extensibility of OpenGPTs by integrating a custom food ordering API, demonstrating the platform's adaptability for custom AI applications. Feedback is sought on their YouTube demo titled "Hack OpenGPT to Automate Anything".
PersonaFinder GPT Released: A member has developed PersonaFinder GPT, a conversational AI that can provide information on individuals based on their name, country, and profession. The tool is available for testing at PersonaFinder Pro.
Call for Proficient Prompters to Test New Tool: There's a request for proficient prompters to test and provide feedback on a new tool designed for automated code transformations to maintain code standards and quality for production deployment. The tool is accessible here.
Kleinanzeigen Ad Shared: An ad on Kleinanzeigen for a picture was shared, though it seems unrelated to AI or projects on the LangChain AI Discord. The Mona Bild can be viewed here.
Links mentioned:
HuggingFace ▷ #general (79 messages🔥🔥):
Links mentioned:
HuggingFace ▷ #today-im-learning (1 messages):
docphaedrus: https://youtu.be/7na-VCB8gxw?si=azqUL6dGSMCYbgdg
HuggingFace ▷ #cool-finds (5 messages):
FastLLM Breaks the Billion-Token Barrier: FastLLM (FLLM), Qdrant's lightweight Language Model for Retrieval Augmented Generation (RAG), enters Early Access with an impressive context window of 1 billion tokens. It is specifically designed to integrate with Qdrant, heralding a revolution in AI-driven content generation and retrieval capabilities. Read more about it on their announcement post.
Reinforcement Learning with Entropy in Mind: An academic paper on Soft Actor-Critic, a method for Off-Policy Maximum Entropy Deep Reinforcement Learning, is shared, providing insights into stochastic actor approaches for reinforcement learning. The full text can be found on arXiv.
Finding the Right Open Source Status Page: A blog post on Medium introduces the 6 best open-source status page alternatives for 2024, offering insights for developers and teams looking to efficiently monitor and communicate their systems' status. The full article is available on Medium.
IPEX-LLM and LlamaIndex Lead the Way: A new Medium article discusses IPEX-LLM and LlamaIndex as potential game-changers in the realm of text generation and chat capabilities. The detailed piece on these advanced tools is accessible here.
Link mentioned: Introducing FastLLM: Qdrant’s Revolutionary LLM - Qdrant: Lightweight and open-source. Custom made for RAG and completely integrated with Qdrant.
HuggingFace ▷ #i-made-this (12 messages🔥):
Stream of Bot Conscience: Introducing LLMinator, a context-aware streaming chatbot for running LLMs locally with Langchain and Gradio, compatible with both CPU and CUDA, from HuggingFace. Check it out on GitHub: https://github.com/Aesthisia/LLMinator
Data Management Made Easier: DagsHub launches a new Colab integration with DagsHub Storage Buckets, promising a better data management experience akin to a scalable Google Drive for ML. An example notebook is available on Google Colab: https://colab.research.google.com/#fileId=https%3a%2f%2fdagshub.com%2fDagsHub%2fDagsHubxColab%2fraw%2fmain%2fDagsHub_x_Colab-DagsHub_Storage.ipynb
Python's New Rival, Mojo: Speculation arises about the Mojo Programming Language surpassing Python in performance, as discussed in a YouTube video titled "Mojo Programming Language killed Python": https://youtu.be/vDyonow9iLo
Robotics Showcase: A member built an advanced line-follower and wall-follower robot with a colour sensor, demonstrated in a YouTube video by SUST_BlackAnt: https://www.youtube.com/watch?v=9YmcekQUJPs
Launch SaaS with OneMix: The new SaaS boilerplate OneMix claims to accelerate project launches by providing essentials like a landing page, payments, and authentication setup. More details at https://saask.ing and a demo on YouTube: https://www.youtube.com/watch?v=NUfAtIY85GU&t=8s&ab_channel=AdityaKumarSaroj
Links mentioned:
HuggingFace ▷ #reading-group (1 messages):
grimsqueaker: yay! thanks!
HuggingFace ▷ #computer-vision (7 messages):
Batch Size Equivalence Query: A member asked whether training with batch size 32 and gradient accumulation of 2 is comparable to batch size 64 when training different sizes of architectures, such as ConvNeXt (a minimal sketch follows this list).
Research Outreach in Quantum Neural Networks: A member shared they are conducting research on the performance of quantum neural network models on traditional image datasets and faced a hiccup.
Feature Extraction & Quantum SVM Inquiry: The member elaborated on their research, mentioning that they extracted features using a transformer model and are seeking advice on using these features in a Quantum SVM (QSVM) for multi-class classification.
Seeking Quantum Kernel & Hyperparameter Guidance: Recommendations for choosing an appropriate quantum kernel and hyperparameters for QSVM were sought, specifically within Qiskit 1.0.2.
Open Collaboration Invitation: An interest in the QSVM research was expressed by another member, leading to an open invitation for direct messaging and potential collaboration.
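On the batch-size equivalence question above: accumulating gradients over 2 micro-batches of 32 approximates a single batch of 64 for the gradient estimate, though layers that use batch statistics (e.g. BatchNorm) still only ever see 32 samples at a time. A minimal PyTorch sketch with stand-in model and data:

```python
import torch
from torch import nn

# Hypothetical stand-ins so the sketch runs end to end.
model = nn.Linear(10, 2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loader = [(torch.randn(32, 10), torch.randint(0, 2, (32,))) for _ in range(4)]

accum_steps = 2  # micro-batch 32 x 2 steps ~= effective batch 64
optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    loss = criterion(model(x), y) / accum_steps  # scale so summed grads average
    loss.backward()                              # grads accumulate between steps
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```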
HuggingFace ▷ #diffusion-discussions (8 messages🔥):
Triggering LoRA with Diffusers: A member inquired about how to trigger a LoRA with diffusers; the response provided guidance on using PEFT for inference, including loading and managing adapters, especially LoRA, with the DiffusionPipeline (a minimal sketch follows this list).
Model Usage Confirmation: The same member followed up with another query about knowing if a model is being used, which did not receive a direct response within the provided messages.
Seeking Assistance with PDFs: A community member requested help for fine-tuning an open-source language model on PDF files, expressing challenges in this endeavor, yet no specific advice was offered in the chat record provided.
Checking In on Mistral: A check-in was made regarding updates to Mistral, but no new information or responses followed the inquiry.
Realtime Video Jitters in Technology Discussion: A community member shared observations of jitter and drift in a realtime video, questioning whether this could be due to rounding errors or a bug in the process, and hoping for insights into controlling this issue for better output.
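Returning to the LoRA question that opened this list, a minimal diffusers sketch: load the adapter onto a pipeline, then include the LoRA's trigger word in the prompt. The base model, LoRA repo, weight file, and trigger word are all placeholders.

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # placeholder base model
    torch_dtype=torch.float16,
).to("cuda")

# Placeholder repo and weight name; point these at your LoRA.
pipe.load_lora_weights("some-user/some-lora", weight_name="lora.safetensors")

# Many LoRAs are activated by a trigger word baked in during training.
image = pipe("a photo of TRIGGERWORD riding a bicycle").images[0]
image.save("out.png")
```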
HuggingFace ▷ #gradio-announcements (1 messages):
Lazy Example Caching: Gradio announced support for cache_examples="lazy", especially benefiting ZeroGPU users by caching examples upon their first request rather than at server startup.
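A minimal sketch of the new option, assuming a recent Gradio release; the demo function and examples are illustrative.

```python
import gradio as gr

def greet(name: str) -> str:
    return f"Hello, {name}!"

demo = gr.Interface(
    fn=greet,
    inputs="text",
    outputs="text",
    examples=[["Ada"], ["Grace"]],
    cache_examples="lazy",  # cache each example on first request, not at startup
)
demo.launch()
```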
Modular (Mojo 🔥) ▷ #general (8 messages🔥):
RL Integration Challenges in Mojo: A member enquired about the challenges in running Reinforcement Learning (RL) Python training in Mojo, specifically the use of a PyTorch environment within Mojo. They were informed about the upcoming MAX Engine and C/C++ interop in Mojo, as detailed in the Mojo Roadmap, which will allow for re-implementing PyTorch interfaces and speeding up RL environment development and execution.
Mojo Doc Cheerleader: In response to a discussion about documentation, a member praised the new documentation for Mojo, saying it is quite comprehensive.
Mathematical Symbols in Mojo Variable Names: There was a question about Mojo's support for mathematical names similar to Julia. It was clarified that Mojo currently supports only ASCII characters for variable names and follows Python's conventions for variable names.
String Handling Curiosity in Mojo: A member was curious about why the division operator "/" isn't "Stringable" in Mojo, questioning if all string entities should inherently possess the Stringable trait.
Emoji Variable Naming Workaround: A different member pointed out that in Mojo, symbols (including emojis) can be used as variable names by enclosing them in backticks, providing an example where an emoji is used as a variable.
Link mentioned: Mojo🔥 roadmap & sharp edges | Modular Docs: A summary of our Mojo plans, including upcoming features and things we need to fix.
Modular (Mojo 🔥) ▷ #💬︱twitter (10 messages🔥):
Modular (Mojo 🔥) ▷ #✍︱blog (1 messages):
Link mentioned: Modular: What’s new in Mojo 24.2: Mojo Nightly, Enhanced Python Interop, OSS stdlib and more: We are building a next-generation AI developer platform for the world. Check out our latest post: What’s new in Mojo 24.2: Mojo Nightly, Enhanced Python Interop, OSS stdlib and more
Modular (Mojo 🔥) ▷ #🔥mojo (47 messages🔥):
Semantics of the is Operator: Discussion noted that Mojo's is operator might have different semantics, focusing on value equality rather than object identity.
DTypePointer Performance Differences: Benchmarks involving DTypePointer showed significant performance differences, which were attributed to inefficient copy initialization and rectified by improving the implementation; see the gist for details.
Links mentioned:
Modular (Mojo 🔥) ▷ #community-projects (2 messages):
Refactoring Triumph for Prism CLI Library: The Prism CLI library, modeled after Cobra, underwent significant refactoring with the 24.2 update, resulting in a slew of new features such as shorthand flag support and an enhanced command structure that now manages parent and child relationships within struct fields. The update also ensures commands can use customized positional-argument validation functions, and the library comes with several built-in validators. Check out the details and examples on GitHub.
Easing the Reference Wrangle: The creator of Prism has signaled a strong interest in the evolution of References, citing them as a main challenge during development. Better usability around References is eagerly anticipated in future updates.
Link mentioned: GitHub - thatstoasty/prism: Mojo CLI Library modeled after Cobra.: Mojo CLI Library modeled after Cobra. Contribute to thatstoasty/prism development by creating an account on GitHub.
Modular (Mojo 🔥) ▷ #performance-and-benchmarks (4 messages):
Matmul Test Tolerance Troubles: A member hit an error in matmul.mojo related to test_matrix_equal[matmul_vectorized](C, A, B); adjusting the tolerance fixed the issue, suggesting a problem with result consistency between implementations.
Float Precision Experiment: By changing DType.float32 to DType.float64 at the top of the matmul.mojo file, the member was able to eliminate the error for some matrix elements but not all, indicating the error might be related to rounding.
Modular (Mojo 🔥) ▷ #⚡serving (7 messages):
Exploring MAX Beyond Triton: A member inquired about the potential benefits of MAX beyond just serving as a Triton backend. MAX Serving is described as a wrapper around MAX Engine which can be tried out using a local Docker container, with details available in the MAX Serving documentation.
Migration Clarification Sought: The same member asked about migrating from a current setup using Triton inference server with two models (a tokenizer and an ONNX/TensorRT model) to MAX, questioning whether the migration would be as simple as updating the backend in the config.
Assistance Offered for Migration to MAX: A rep offered help to the member contemplating the migration to MAX, expressing eagerness to understand the use case better and to support their pipeline's performance upgrade.
Details Matter in Migration: The rep asked for specifics regarding the member's setup, inquiring about how the tokenizer model was implemented and how the two models are connected, particularly if Ensemble Models or Business Logic Scripting features were being used.
Seamless ONNX, GPU Support Pending: While confirming that ONNX models would seamlessly work with a simple backend change in the config, the rep noted that MAX doesn't currently support GPU-hosted models, stating that it's being actively developed.
Link mentioned: Get started with MAX Serving | Modular Docs: A walkthrough showing how to try MAX Serving on your local system.
Modular (Mojo 🔥) ▷ #nightly (11 messages🔥):
Nightly Builds Available: Nightly Mojo builds can be installed with modular update nightly/mojo. A changelog detailing the differences between the stable and new nightly builds can be found on GitHub.
Import Path Configuration: Members discussed the MODULAR_MOJO_NIGHTLY_IMPORT_PATH environment variable for configuration.
Testing Best Practices: Preference was expressed for the testing module over FileCheck for better practices.
Links mentioned:
OpenInterpreter ▷ #general (17 messages🔥):
Links mentioned:
OpenInterpreter ▷ #O1 (45 messages🔥):
Links mentioned:
OpenInterpreter ▷ #ai-content (7 messages):
Exploring Open Interpreter: A member shared their YouTube video titled "Open Interpreter Advanced Experimentation - Part 2," which may contain new experiments with the Open Interpreter. The video is available at YouTube.
Fabric, the AI Augmentation Framework: A GitHub repository named fabric was introduced; it's an open-source framework for augmenting humans with AI. It utilizes a crowdsourced set of AI prompts for solving specific problems, accessible at GitHub - danielmiessler/fabric.
Microsoft's UFO for Windows OS Interaction: A member found Microsoft's UFO, a GitHub project described as a UI-Focused Agent for Windows OS Interaction. Questions arose if this is Microsoft's testing ground for implementing Open Interpreter (OI) on Windows, repository available at GitHub - microsoft/UFO.
Visual Intro to Transformers on YouTube: A video titled "But what is a GPT? Visual intro to Transformers | Deep learning, chapter 5" was shared, providing an introduction to transformers, the technology behind LLMs (Large Language Models). The video can be watched on YouTube.
Community Excitement for GPT Educational Content: Members expressed excitement about the educational content regarding transformers and GPTs. They shared their anticipation and approval with comments like "bookmarked!" and "Awesome 🚀".
Links mentioned:
OpenRouter (Alex Atallah) ▷ #general (66 messages🔥🔥):
Bot Name Prefix in Chatbot Responses: A user encountered responses starting with {bot_name}: from the undi95/remm-slerp-l2-13b:extended model when using OpenRouter for roleplay chat via the messages key, and asked whether it was due to a prompt error or required text replacement. It was clarified that recent updates to prompt templating shouldn't have caused this, and the issue was discussed further, exploring whether the name field was being used (a minimal sketch follows this list).
Error Connecting to OpenRouter: A user reported an SSL error (EOF occurred in violation of protocol) when trying to connect to OpenRouter, but no solution was directly offered in the chat.
Announcement of "Patterns of Application Development Using AI": Obie Fernandez announced the early release of his book, Patterns of Application Development Using AI, highlighting the use of OpenRouter.
Enquiry About Model Performance and Availability: Users discussed the performance of various models and the availability of nitro and non-nitro models, with one seeking the fastest options available after the unavailability of nitro models. It was confirmed that nitro models are still available, and more are on the way.
General Troubleshooting and Model Suggestions: Users shared experiences with model failures such as NOUS-HERMES-2-MIXTRAL-8X7B-DPO, and gave advice on alternative models for specific tasks like roleplay, with suggestions including Nous Capybara 34B equipped with 30k context window. Concerns about OpenRouter logit bias not working on certain models were addressed with an explanation that it's supported only on OpenAI's models.
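For the name-field question raised in the first item, here is a minimal sketch of a roleplay request to OpenRouter via the OpenAI-compatible client; the API key and message content are placeholders, and whether name is set is exactly the variable being probed in that discussion.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",  # placeholder
)

response = client.chat.completions.create(
    model="undi95/remm-slerp-l2-13b:extended",
    messages=[
        {"role": "system", "content": "You are Alice, a roleplay character."},
        # The optional "name" field labels the speaker; some prompt templates
        # render it into the text, which can interact with {bot_name}: prefixes.
        {"role": "user", "name": "Bob", "content": "Hi Alice, how are you?"},
    ],
)
print(response.choices[0].message.content)
```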
Links mentioned:
Mozilla AI ▷ #llamafile (41 messages🔥):
Benchmarks and Revelation of Thread Utilization: A member expressed surprise at the significant improvements over NumPy, assuming it was already heavily optimized, and asked to see the benchmarking code. The code shared uses both NumPy and a custom matmul function to demonstrate performance differences, revealing that NumPy does not use threads (a sketch of this kind of benchmark follows this list).
Eager Anticipation for New AI Updates: Discussion revolves around the release of llamafile 0.7 and attempts to use it with openchat 3.5. Members sought clarification on prompt templating and the use of variables within the UI, highlighting confusion due to a lack of documentation.
TinyBLAS vs Proprietary Libraries: In discussing llamafile's performance on CPU vs. GPU, it was stated that the --tinyblas flag can be used for GPU support without installing CUDA or ROCm SDKs, though performance may vary based on the graphics card.
Compatibility Queries for Windows ARM64: Discussion on Windows ARM64 compatibility with llamafile raised questions about support and binary formats, revealing that Windows on ARM supports PE format with ARM64X binaries, but has issues with AVX/AVX2 emulation.
Exercise and Troubleshooting in Local Deployment: Users encountered an "exec format error" when trying to run llamafile locally, with suggestions to use bash instead of zsh, and clarifications provided for running Mixtral models on specific hardware configurations.
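As referenced in the first item, a sketch of this kind of NumPy-versus-naive benchmark; the sizes and the deliberately simple kernel are illustrative, not the code that was actually shared.

```python
import time
import numpy as np

def naive_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Deliberately simple row-by-column product for comparison."""
    n, k = a.shape
    k2, m = b.shape
    assert k == k2
    out = np.zeros((n, m), dtype=a.dtype)
    for i in range(n):
        for j in range(m):
            out[i, j] = (a[i, :] * b[:, j]).sum()
    return out

a = np.random.rand(256, 256).astype(np.float32)
b = np.random.rand(256, 256).astype(np.float32)

t0 = time.perf_counter()
c_np = a @ b                      # BLAS-backed matmul
t1 = time.perf_counter()
c_naive = naive_matmul(a, b)      # pure-Python loop nest
t2 = time.perf_counter()

print(f"numpy: {t1 - t0:.4f}s  naive: {t2 - t1:.4f}s")
print("results match:", np.allclose(c_np, c_naive, atol=1e-3))
```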
Links mentioned:
OpenAccess AI Collective (axolotl) ▷ #general (6 messages):
MDEL Marches Ahead with AMD: MDEL has successfully trained a 15B model using AMD GPUs, marking a potentially interesting development in hardware utilization for large-scale models.
Mistral Opens Its Doors: The Mistral team invited community members to an office hour session for questions, signaling an open channel for dialogue and support.
Skepticism Over New Release: A member jokingly inquired if the v0.2 release is an April Fools' prank, reflecting community surprise or skepticism towards the update.
Dataset Unification Challenges: A contributor is working on unifying approximately 15 different datasets into TSV and pickle-formatted index files, facing challenges such as misaligned translations and the sheer data volume. They're considering the creation of a single, gigantic JSON of language pairs without weighting.
Seeking Runpod Experience: A user inquired about experiences with runpod serverless for very large language models (VLLM), suggesting interest in community knowledge on this service.
OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (18 messages🔥):
LISA PR Approved: A pull request implementing lisa has been approved for merging, indicating that the changes made are considered solid and beneficial.
Links mentioned:
OpenAccess AI Collective (axolotl) ▷ #general-help (16 messages🔥):
Evaluation Disabled by Config: A member noticed that val_set_size: 0 had been set, suggesting evaluation was not being performed.
eval_table Behavior: Discussion covered eval_table, which is known for generating predictions and uploading them to wandb during evaluation.
Feature Caveats: Others noted they do not have eval_table enabled since they do not perform evaluation during training. A hint was given that this feature could be buggy.
Latent Space ▷ #ai-general-chat (29 messages🔥):
FastLLM Launches with Big Claims: Qdrant announced their new language model FastLLM (FLLM) designed for Retrieval Augmented Generation with a staggering context window of 1 billion tokens. The AI community highlighted this as potentially effective trolling, referring to its announcement on April Fools' Day.
New Instructional Gem on Transformers: A video by 3Blue1Brown titled "But what is a GPT? Visual intro to Transformers | Deep learning, chapter 5" received attention for offering a visual introduction to transformers and GPTs.
LLM Answer Engine Github Project Unveiled: An open source project titled "llm-answer-engine" on GitHub garnered interest for building a Perplexity-Inspired Answer Engine using a robust stack including Next.js, Groq, Mixtral, Langchain, and OpenAI.
Instructor Abstraction for Structured LLM Outputs: The release of instructor 1.0.0 was noted, which is a tool that ensures structured outputs from LLMs align with user-defined Pydantic models, simplifying the interaction and integration with other system modules.
Google Revs Up on AI with New Leadership: Logan Kilpatrick announced his move to Google to lead product for AI Studio and support the Gemini API, indicating a significant focus on making Google a prime location for developers in AI.
Links mentioned:
CUDA MODE ▷ #general (4 messages):
Links mentioned:
CUDA MODE ▷ #triton (6 messages):
Profiling Triton Kernels: A member shared the Nsight Compute command ncu --target-processes all --set detailed --import-source yes -o output_file python your_script.py, which allows for profiling and subsequent analysis of the Triton code.
Link mentioned: Accelerating Triton Dequantization Kernels for GPTQ: TL;DR
CUDA MODE ▷ #cuda (3 messages):
DL Design Install Woes: A member described running a .run installer file on Ubuntu, confirming the execution rights with chmod +x and using sudo, but was unable to find the DL Design application post-installation. They sought advice on how to open the DL Design app after installation.
CUDA MODE ▷ #torch (2 messages):
Links mentioned:
CUDA MODE ▷ #off-topic (1 messages):
c_cholesky: Thank u 😊
Interconnects (Nathan Lambert) ▷ #news (1 messages):
Interconnects (Nathan Lambert) ▷ #random (2 messages):
Link mentioned: Tweet from Logan Kilpatrick (@OfficialLoganK): Excited to share I’ve joined @Google to lead product for AI Studio and support the Gemini API. Lots of hard work ahead, but we are going to make Google the best home for developers building with AI. ...
Interconnects (Nathan Lambert) ▷ #rl (5 messages):
Links mentioned:
Interconnects (Nathan Lambert) ▷ #sp2024-history-of-open-alignment (1 messages):
AI21 Labs (Jamba) ▷ #jamba (8 messages🔥):