Technology stack
Matured over many years of scientific research and engineering, Atlastic's technology is built to bring media perception signals to the professional user in real-time, with no code and no quant skills required. Whether it's about corporate trust, reputation, ESG, or simply media presence: Atlastic combines the data, the AIs and the UX-optimized platform to analyze the perceived value of any of the +50,000 publicly listed companies in real-time.
An ingenious solution is like sophisticated precision clockwork: each technological component is designed to enhance the working of another, and the combination yields system-level insights that are not detectable by merely crunching raw data or having AIs merely enrich articles. Valuable, time-critical insights are delivered when the raw data and the AIs work in unison, and when a UX-optimized platform bridges the gap from information to insight by aggregating their combined outputs. Atlastic's technology is built for exactly this.
Media analysis the Atlastic way
Step 1. Raw news data collection
Our proprietary crawler and data pipelines continuously monitor 8.2M websites all around the world, identify new web content, and collect this raw data for subsequent processing.
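As a rough illustration only (not the proprietary crawler), the sketch below polls a source, detects content that has not been seen before, and hands the raw bytes to the next stage; the feed URL and the in-memory deduplication set are assumptions made for the example.

```python
# Minimal sketch (not the proprietary crawler): poll a source, detect content
# that has not been seen before, and hand the raw bytes to the next stage.
# The feed URL and the in-memory "seen" set are illustrative assumptions.
import hashlib

import requests

seen: set[str] = set()  # a production system would use a persistent store

def poll_source(url: str) -> bytes | None:
    """Fetch a page and return its raw bytes only if the content is new."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    fingerprint = hashlib.sha256(response.content).hexdigest()
    if fingerprint in seen:
        return None          # nothing new since the last poll
    seen.add(fingerprint)
    return response.content  # raw data for subsequent processing

raw = poll_source("https://example-news.com/latest")  # hypothetical source
```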
Step 2. Data cleaning
Visual and textual elements that do not belong to the article are removed, leaving a clean article. Our AIs recognize the parts that are relevant to the article and discard the noise.
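A minimal sketch of the idea, using simple tag heuristics rather than the AI-based cleaning described above; the tag list and the BeautifulSoup dependency are assumptions.

```python
# Minimal sketch using simple tag heuristics instead of the AI-based cleaning
# described above; the tag list and the BeautifulSoup dependency are assumptions.
from bs4 import BeautifulSoup

def clean_article(html: str) -> str:
    """Strip page elements that typically do not belong to the article body."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style", "nav", "header", "footer", "aside", "form"]):
        tag.decompose()  # discard navigation, ads and other noise
    # Collapse whitespace so only the clean article text remains
    return " ".join(soup.get_text(separator=" ").split())
```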
Step 3. Turning unstructured into structured
Text is transformed into a machine-readable format that facilitates further AI-based enrichments, as well as instant automated analytics across millions of articles.
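For illustration, a structured record could look like the sketch below; the field names are hypothetical, not Atlastic's actual schema.

```python
# Minimal sketch of a structured, machine-readable article record; the field
# names are hypothetical, not Atlastic's actual schema.
import json
from dataclasses import asdict, dataclass, field

@dataclass
class ArticleRecord:
    url: str
    language: str
    published_at: str                                # ISO 8601 timestamp
    title: str
    body: str
    enrichments: dict = field(default_factory=dict)  # filled in by later AI stages

record = ArticleRecord(
    url="https://example-news.com/latest",
    language="en",
    published_at="2024-05-01T08:30:00Z",
    title="Acme Corp beats earnings expectations",
    body="Full cleaned article text goes here.",
)
print(json.dumps(asdict(record), ensure_ascii=False, indent=2))
```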
Step 4. Document analysis
The first phase of analysis takes into account data points about the document and augments the machine-readable representation of the article with additional information points, like ESG, sentiment and reach.
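Continuing the hypothetical ArticleRecord sketch above, document-level data points could be attached like this; the values are placeholders, not outputs of Atlastic's models.

```python
# Minimal sketch, continuing the hypothetical ArticleRecord above: document-level
# data points are attached to the structured record. The values are placeholders,
# not outputs of Atlastic's models.
def enrich_document(record: ArticleRecord) -> ArticleRecord:
    record.enrichments.update(
        {
            "sentiment": 0.42,                 # placeholder document-level score
            "esg_categories": ["Governance"],  # placeholder ESG classification
            "estimated_reach": 125_000,        # placeholder audience estimate
        }
    )
    return record
```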
Step 5. Deep semantic analysis
Using state-of-the-art NLP models, the persons, companies and locations that occur in the text are identified. Their fine-grained perception is evaluated, taking into account context and company know-how.
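A minimal sketch of the entity-recognition part, assuming an off-the-shelf multilingual NER model rather than Atlastic's proprietary ones; the model name and example sentence are illustrative.

```python
# Minimal sketch of the entity-recognition step, assuming an off-the-shelf
# multilingual NER model from Hugging Face rather than Atlastic's proprietary
# models; the model name and example sentence are illustrative.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="Davlan/xlm-roberta-base-ner-hrl",  # public multilingual NER model
    aggregation_strategy="simple",            # merge sub-word tokens into entities
)

text = "Apple and Foxconn expand production in Vietnam amid supply-chain concerns."
for entity in ner(text):
    print(entity["entity_group"], entity["word"], round(float(entity["score"]), 2))
```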
No-code, no-quant platform
- Discover how each of the +50,000 publicly listed companies is perceived
- Get customizable alerts on signals from real-time data patterns
- Analyze whole industries, sectors or any investment thesis in one overview
- Have core insights extracted by distilling millions of raw articles and billions of AI-enriched data points in real-time
- Track the crypto-universe; digital assets rely even more on perception to keep momentum
Big data with a bit of space systems engineering DNA
- Atlastic has an in-house crawler that continuously monitors +8 million news sources, covering the latest news, magazine articles, blog posts, company press releases and more. Besides covering all mainstream news, it also goes deep into the niches within all fields of the media landscape — finance, hobbies, computers, religion, arts, ecommerce and much more.
- This is made possible using state-of-the-art big data technology. The +8 million sources are sharded and distributed for embarrassingly parallel processing, while also fully utilizing the multiple cores and multi-level caches of the most modern processing chips (see the sketch after this list).
- The media landscape never sleeps. That means the data is a continuous stream of articles and news from all over the world in hundreds of different languages. The system auto-detects the character encoding that suits the writing system of each language.
- All data, including enriched data, is stored in a unified fault-tolerant database that allows for sophisticated boolean searches and a wide range of aggregations for deep analytics, and is index-optimized for real-time querying and data retrieval.
- Several fault management systems, inspired by space systems engineering, are put in place. The whole big data infrastructure detects faults, isolates them to their causes and activates self-healing algorithms to ensure 24/7 reliable data processing.
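As a simplified illustration of the sharding and encoding detection mentioned above (not Atlastic's actual infrastructure), sources can be routed to workers with a stable hash and each fetched document decoded with automatic character-set detection; the worker count, example URLs and the charset-normalizer dependency are assumptions.

```python
# Minimal sketch (not Atlastic's actual infrastructure): hash-based sharding of
# news sources across worker processes, plus per-document character-set detection.
# The worker count, example URLs and the charset-normalizer dependency are assumptions.
import hashlib

from charset_normalizer import from_bytes

NUM_WORKERS = 64  # assumed shard count

def shard_for(source_url: str) -> int:
    """Map a source to a worker shard via a stable hash (embarrassingly parallel)."""
    digest = hashlib.sha1(source_url.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_WORKERS

def decode_document(raw: bytes) -> str:
    """Auto-detect the character encoding of a fetched page and decode it."""
    best = from_bytes(raw).best()
    return str(best) if best is not None else raw.decode("utf-8", errors="replace")

for url in ["https://example-news.com/rss", "https://example.jp/feed"]:
    print(url, "-> worker", shard_for(url))
```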
Billion-scale AI, more than 100 languages covered
- No more generic AIs: Atlastic's AIs are purpose-built for analyzing perception by our team of data scientists and linguists with cultural backgrounds from all corners of the world. The media landscape is global and multilingual. Our AI models are therefore built to be multilingual, with support for over 100 languages.
- The NLP models use a transformer architecture: these provide state-of-the-art results on categorization, translation, and summarization. The models are trained on custom-curated datasets built with aligned data-labeling practices.
- The AIs are deployed into the streaming data pipeline and perform over a billion low-latency AI enrichments a day. To enable this scale, we accelerate the AI enrichments on dedicated hardware chips that are optimized for micro-batched parallel matrix operations (see the sketch below).
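A minimal sketch of micro-batched transformer inference on an accelerator, using a public multilingual classifier rather than Atlastic's proprietary models; the model name, batch size and example texts are assumptions.

```python
# Minimal sketch of micro-batched transformer inference, using a public
# multilingual classifier rather than Atlastic's proprietary models; the model
# name, batch size and example texts are assumptions.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="cardiffnlp/twitter-xlm-roberta-base-sentiment",  # public multilingual model
    device=0,       # first GPU/accelerator; use device=-1 to fall back to CPU
    batch_size=32,  # micro-batching keeps the chip's matrix units busy
)

articles = [
    "Acme Corp beats earnings expectations for the third quarter in a row.",
    "El regulador abre una investigación sobre las prácticas laborales de la empresa.",
]

# One forward pass per micro-batch; results come back in input order.
for text, result in zip(articles, classifier(articles)):
    print(result["label"], round(result["score"], 3), "-", text[:60])
```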
Our AI models
In a world of endless media information, we combine 6 AI systems and 13 interconnected AIs to make understanding a lot easier and faster for the user.
- Structuring: Turns raw data into structured data. The AI automatically adapts to various website conventions and article formats, and accounts for cultural conventions and variations (e.g. Asian vs Western website layouts).
- ESG: Analyzes the nature of the text with respect to the 3 main ESG categories (Environment, Social and Governance) and 26 subcategories.
- Reputation: Deep-learning-based natural language processing is used to analyze and classify perception, whether positive or negative, at both the document level and the much more precise entity level. It takes into account context, intonation, sentiment, slang and cultural factors.
- Impressions: Analyzes the website, the audience and the article to estimate the number of people reached, using data on website traffic, link lifetimes and demographics.
- Names of Interest: Determines whether words and combinations of words refer to persons, companies, organizations, products, locations or events. It resolves specifically ambiguous cases, like Apple the tech company vs apple the fruit.
- Publication date: To enable time-series analysis, this AI extracts the exact publication date, whether it's written in English or Mandarin. It can disambiguate and interpret American (MM/DD) and European (DD/MM) date conventions, as illustrated in the sketch below.
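For illustration, a general-purpose date parser such as python-dateutil shows how the same string can be read under either convention (the dedicated AI resolves this from context; the example string here is an assumption).

```python
# Minimal sketch using python-dateutil instead of the dedicated AI: the same
# string is read differently under American (month-first) and European
# (day-first) conventions. The example string is an assumption.
from dateutil import parser

ambiguous = "03/04/2024"
print(parser.parse(ambiguous, dayfirst=False).date())  # 2024-03-04 (MM/DD reading)
print(parser.parse(ambiguous, dayfirst=True).date())   # 2024-04-03 (DD/MM reading)
```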
Free demo