Machine Learning and Data Science Applications in Industry
Have a look at the newly started FirmAI Medium publication, where experts in AI for business write about their topics of interest.
Please add your tools and notebooks to this Google Sheet, or simply add them to the subreddit r/datascienceproject.
Highlight in YELLOW to get your package added; you can also just add it yourself with a pull request.
A curated list of applied machine learning and data science
notebooks and libraries across different industries. The code in
this repository is in Python (primarily using Jupyter notebooks)
unless otherwise stated. The catalogue is inspired by
awesome-machine-learning. r/datascienceproject is a subreddit where you can
share all your data science projects.
Caution: This is a work in progress. Please contribute, especially if you are a subject expert in any of the industries listed below. If you are an [analytical, computational, statistical, quantitative] researcher/analyst in field X or a field X [machine learning engineer, data scientist, modeler, programmer], then your contribution will be greatly appreciated.
If you want to contribute to this list (please do), send me a pull request or contact me @dereknow or on LinkedIn, or get in contact on the website FirmAI. Also, a listed repository should be deprecated if:
- The repository's owner explicitly says that "this library is not maintained".
- It has not been committed to for a long time (2~3 years).
Help Needed: If there are any contributors out there willing to help first populate and then maintain a Python analytics section in any one of the following sub/industries, please get in contact with me. Also contact me to add additional industries.
Table of Contents
- Accommodation & Food
- Banking & Insurance
- Biotechnological & Life Sciences
- Construction & Engineering
- Education & Research
- Emergency & Relief
- Justice, Law and Regulations
- Media & Publishing
- Government and Public Works
- Real Estate, Rental & Leasing
- Wholesale & Retail
ML/DS Career Section for Industry Machine Learning
See data-science-career repo for more.
- Triplebyte - Take a quiz. Get offers from multiple top tech companies at once (now has a machine learning track).
- Toptal - Developers seeking to gain entry into the Toptal community are put through a battery of personality and technical tests.
- Hired - Hired matches employers with qualified candidates through a combination of in-house algorithms and online support.
- Kaggle - Scalable Path is a premium talent matching service.
- Glassdoor - Best employee narratives.
- Indeed - Best coverage.
- Kununu - Best well-rounded information.
- Comparably - Best comparison functionality.
- InHerSight - Best female-friendly perspective.
Accommodation & Food
- RobotChef - Refining recipes based on user reviews.
- Food Amenities - Predicting the demand for food amenities using neural networks
- Recipe Cuisine and Rating - Predict the rating and type of cuisine from a list of ingredients.
- Food Classification - Classification using Keras.
- Image to Recipe - Translate an image to a recipe using deep learning.
- Calorie Estimation - Estimate calories from photos of food.
- Fine Food Reviews - Sentiment analysis on Amazon Fine Food Reviews.
- Restaurant Violation - Food inspection violation forecasting.
- Restaurant Success - Predict whether a restaurant is going to fail.
- Predict Michelin - Predict the likelihood that a restaurant is a Michelin restaurant.
- Restaurant Inspection - An inspection analysis to see if cleanliness is related to rating.
- Sales - Restaurant sales forecasting with LSTM.
- Visitor Forecasting - Reservation and visitation number prediction.
- Restaurant Profit - Restaurant regression analysis.
- Competition - Restaurant competitiveness analysis.
- Business Analysis - Restaurant business analysis project.
- Location Recommendation - Restaurant location recommendation tool and analysis.
- Closure, Rating and Recommendation - Three prediction tasks using Yelp data.
- Anti-recommender - Find restaurants you don’t want to attend.
- Menu Analysis - Deeper analysis of restaurants through their menus.
- Menu Recommendation - NLP to recommend restaurants with similar menus.
- Food Price - Predict food cost.
- Automated Restaurant Report - Automated machine learning company report.
- Peer-to-Peer Housing - The effect of peer to peer rentals on housing.
- Roommate Recommendation - A system for students seeking roommates.
- Room Allocation - Room allocation process.
- Dynamic Pricing - Hotel dynamic pricing calculations.
- Hotel Similarity - Compare brands that directly compete
- Hotel Reviews - Cluster hotel reviews.
- Predict Prices - Predict hotel room rates.
- Hotels vs Airbnb - Comparing the two approaches.
- Hotel Improvement - Analyse reviews to suggest hotel improvements.
- Orders - Order cancellation prediction for hotels.
- Fake Reviews - Identify whether reviews are fake/spam.
- Reverse Image Lodging - Find your preferred lodging by uploading an image.
- Chart of Account Prediction - Using labeled data to suggest the account name for every transaction.
- Accounting Anomalies - Using deep-learning frameworks to identify accounting anomalies.
- Financial Statement Anomalies - Detecting anomalies before filing, using R.
- Useful Life Prediction (FirmAI) - Predict the useful life of assets using sensor observations and feature engineering.
- AI Applied to XBRL - Standardized representation of XBRL into AI and Machine learning.
- Forensic Accounting - Collection of case studies on forensic accounting using data analysis. We are on the lookout for more data to practise forensic accounting; please get in touch.
- General Ledger (FirmAI) - Data processing over a general ledger as exported through an accounting system.
- Bullet Graph (FirmAI) - Bullet graph visualisation helpful for tracking sales, commission and other performance.
- Aged Debtors (FirmAI) - Example analysis to investigate aged debtors.
- Automated FS XBRL - Written in an XML language; the analysis could possibly be ported to Python.
- Financial Sentiment Analysis - Sentiment, distance and proportion analysis for trading signals.
- Extensive NLP - Comprehensive NLP techniques for accounting research.
Data, Parsing and APIs
- EDGAR - A walk-through in how to obtain EDGAR data.
- PyEDGAR - A library for downloading, caching, and accessing EDGAR filings.
- IRS - Accessing and parsing IRS filings.
- Financial Corporate - Rutgers corporate financial datasets.
- Non-financial Corporate - Rutgers non-financial corporate dataset.
- PDF Parsing - Extracting useful data from PDF documents.
- PDF Table to Excel - How to output an Excel file from a PDF.
Research And Articles
- Understanding Accounting Analytics - An article that tackles the importance of accounting analytics.
- VLFeat - VLFeat is an open and portable library of computer vision algorithms, which has a MATLAB toolbox.
- Rutgers Raw - Good digital accounting research from Rutgers.
- Computer Augmented Accounting - A video series from Rutgers University looking at the use of computation to improve accounting.
- Accounting in a Digital Era - Another series by Rutgers investigating the effects the digital age will have on accounting.
- Prices - Agricultural price prediction.
- Prices 2 - Agricultural price prediction.
- Yield - Agricultural analysis looking at crop yields in Ukraine.
- Recovery - Strategic land use for agriculture and ecosystem recovery
- MPR - Mandatory Price Reporting data from the USDA's Agricultural Marketing Service.
- Segmentation - Agricultural field parcel segmentation using satellite images.
- Water Table - Predicting water table depth in agricultural areas.
- Assistant - Notebooks from agricultural assistant.
- Eco-evolutionary - Eco-evolutionary dynamics.
- Diseases - Identification of crop diseases and pests from images using a deep learning framework.
- Irrigation and Pest Prediction - Analyse irrigation and predict pest likelihood.
Banking & Insurance
- Loan Acceptance - Classification and time-series analysis for loan acceptance.
- Predict Loan Repayment - Predict whether a loan will be repaid using automated feature engineering.
- Loan Eligibility Ranking - System to help the banks check if a customer is eligible for a given loan.
- Home Credit Default (FirmAI) - Predict home credit default.
- Mortgage Analytics - Extensive mortgage loan analytics.
- Credit Approval - A system for credit card approval.
- Loan Risk - Predictive model to help to reduce charge-offs and losses of loans.
- Amortisation Schedule (FirmAI) - Simple amortisation schedule in Python for personal use.
Management and Operation
- Credit Card - Estimate the CLV of credit card customers.
- Survival Analysis - Perform a survival analysis of customers.
- Next Transaction - Deep learning model to predict the transaction amount and days to next transaction.
- Credit Card Churn - Predicting credit card customer churn.
- Bank of England Minutes - Textual analysis over bank minutes.
- CEO - Analysis of CEO compensation.
- Zillow Prediction - Zillow valuation prediction as performed on Kaggle.
- Real Estate - Predicting real estate prices from the urban environment.
- Used Car - Used vehicle price prediction.
- XGBoost - Fraud Detection by tuning XGBoost hyper-parameters with Simulated Annealing
- Fraud Detection Loan in R - Fraud detection in bank loans.
- AML Finance Due Diligence - Search news articles to do finance AML DD.
- Credit Card Fraud - Detecting credit card fraud.
Insurance and Risk
- Car Damage Detective - Assessing car damage with convolutional neural networks for personal auto claims.
- Medical Insurance Claims - Predicting medical insurance claims.
- Claim Denial - Predicting insurance claim denial
- Claim Fraud - Predictive models to determine which automobile claims are fraudulent.
- Claims Anomalies - Anomaly detection system for medical insurance claims data.
- Actuarial Sciences (R) - A range of actuarial tools in R.
- Bank Failure - Predicting bank failure.
- Risk Management - Finance risk engagement course resources.
- VaR GaN - Estimate Value-at-Risk for market risk management using Keras and TensorFlow.
- Compliance - Bank Grievance Compliance Management.
- Stress Testing - ECB stress testing.
- Stress Testing Techniques - A notebook with various stress testing exercises.
- Reverse Stress Test - Given a portfolio and a predefined loss size, determine which stress factors (scenarios) would lead to that loss.
- BoE stress test - Stress test results and plotting.
- Recovery - Recovery of money owed.
- Quality Control - Quality control for banking using LDA
- Bank Note Fraud Detection - Bank Note Authentication Using DNN Tensorflow Classifier and RandomForest.
- ATM Surveillance - ATM Surveillance in banks use case.
Biotechnological & Life Sciences
- Programming - Python Programming for Biologists
- Introduction DL - A Primer on Deep Learning in Genomics
- Pose - Estimating animal poses using DL.
- Privacy - Privacy preserving NNs for clinical data sharing.
- Population Genetics - DL for population genetic inference.
- Bioinformatics Course - Course materials for Computational Biology and Bioinformatics
- Applied Stats - Applied Statistics for High-Throughput Biology
- Scripts - Python scripts for biologists.
- Molecular NN - A mini-framework to build and train neural networks for molecular biology.
- Systems Biology Simulations - Systems biology practical on writing simulators with F# and Z3
- Cell Movement - LSTM to predict biological cell movement.
- Deepchem - Democratizing Deep-Learning for Drug Discovery, Quantum Chemistry, Materials Science and Biology
- DNA, RNA and Protein Sequencing - A new representation for biological sequences using DL.
- CNN Sequencing - A toolbox for learning motifs from DNA/RNA sequence data using convolutional neural networks
- NLP Sequencing - Language transfer learning model for genomics
Chemoinformatics and drug discovery
- Novel Molecules - A convolutional net that can learn features.
- Automating Chemical Design - Generate new molecules for efficient exploration.
- GAN drug Discovery - A method that combines generative models with reinforcement learning.
- RL - Generating compounds predicted to be active against a biological target.
- One-shot learning - Python library that aims to make the use of machine-learning in drug discovery straightforward and convenient.
- Jupyter Genomics - Collection of computation biology and bioinformatics notebooks.
- Variant calling - Correctly identify variations from the reference genome in an individual's DNA.
- Gene Expression Graphs - Using convolutions on an image.
- Autoencoding Expression - Extracting relevant patterns from large sets of gene expression data
- Gene Expression Inference - Predict the expression of specified target genes from a panel of about 1,000 pre-selected “landmark genes”.
- Plant Genomics - Presentation and example material for Plant and Pathogen Genomics
- Plants Disease - App that detects diseases in plants using a deep learning model.
- Leaf Identification - Identification of plants through plant leaves on the basis of their shape, color and texture.
- Crop Analysis - An imaging library to detect and track future position of ears on maize plants
- Seedlings - Plant Seedlings Classification from a Kaggle competition
- Plant Stress - An ontology containing plant stresses; biotic and abiotic.
- Animal Hierarchy - Package for calculating animal dominance hierarchies.
- Animal Identification - Deep learning for animal identification.
- Species - Big Data analysis of different species of animals
- Animal Vocalisations - A generative network for animal vocalizations
- Evolutionary - Evolution Strategies Tool
- Glaciers - Educational material about glaciers.
Construction & Engineering
- DL Architecture - Deep learning classifier and image generator for building architecture.
- Construction Materials - A course on construction materials.
- Bad Actor Risk Model - Risk model to improve construction related building safety
- Inspectors - Determine the assigned inspections.
- Corrupt Social Interactions - Uncover potential corrupt social interactions between an industry member and the staff at the DOB
- Risk Construction - Identify high risk construction.
- Facade Risk - A risk model to predict unsafe facades.
- Staff Levels - Predicting staff levels for front line workers.
- Injuries - Building related injuries topic modelling.
- Building Violations - Predictive analysis of building violations.
- Productivity - Productivity analysis and inspection with Tableau.
- Structural Analysis - 2D Structural Analysis in Python.
- Structural Engineering - Structural engineering modules.
- Nusa - Structural analysis using the finite element method.
- StructPy - Structural Analysis Library for Python based on the direct stiffness method
- Aileron - Structural analysis of the aileron of a Boeing 737
- Vibration - Educational vibration programs.
- Civil - Collection of civil engineering tools in FreeCAD
- GEstimator - Simple civil estimation software
- Fatpack - Functions and classes for fatigue analysis of data series.
- Pysteel - Automated design of different steel structures
- Structural Uncertainty - Quantifying structural uncertainty with deep learning.
- Pymech - A Python module for mechanical engineers
- Aerospace Engineering - Astrodynamics and Statistics
- Interactive Quantum Chemistry - Combining Psi4 and Numpy for education and development.
- Chemical and Process Engineering - Various resources.
- PyTherm - Applied Thermodynamics
- Aerogami - Aerodynamics using planes.
- Electro geophysics - Interactive applications for electromagnetics in geophysics
- Graph Signal - Graph signal processing tutorial.
- Mechanical Vibrations - Mechanical Vibrations at the University of Louisiana.
- Process Dynamics - Process Dynamics and Control
- Battery Life Cycle - Data-driven prediction of battery life cycle.
- Wind Energy - Python for wind energy
- Energy Use - Standard methods for calculating normalized metered energy consumption
- Nuclear Radiation - How people are affected by radiation emitted by nuclear power plants
- Python Materials Genomics - Robust material analysis code used in a well-established project.
- Materials Mining - Scripts for simulations and analysis of materials.
- Emmet - Build databases of material properties.
- Megnet - Graph networks as a ML framework for Molecules and Crystals
- Atomate - Pre-built workflows for computational material science.
- Bylaws Compliance - Predicting property fines.
- Asphalt Binder - Construction materials, free energy and chemical composition of asphalt binder.
- Steel - Optimisation of steel.
- Awesome Materials Informatics - Curated list of known efforts in materials informatics.
- Trading Economics API - Information for 196 countries.
- Development Economics - Development microeconomics, written mostly as interactive Jupyter notebooks
- Applied Econ & Fin - Applied Computational Economics and Finance
- Macroeconomics - Topics in macroeconomics with notebook examples.
- EconML - Automated Learning and Intelligence for Causation and Economics.
- Auctions - Optimal auctions using deep learning.
- Quant Econ - Quantitative economics course by NYU
- Computational - Computational methods in economics.
- Computational 2 - Small course in computational economics.
- Econometric Theory - Notebooks of A Primer on Econometric theory.
Education & Research
- Student Performance - Mining student performance using machine learning.
- Student Performance 2 - Student exam performance.
- Student Performance 3 - Student achievement in secondary education.
- Student Performance 4 - Students Performance Evaluation using Feature Engineering
- Student Intervention - Building a student intervention system.
- Student Enrolment - Student enrolment and performance analysis.
- Academic Performance - Explore the demographic and family features that have an impact on a student's academic performance.
- Grade Analysis - Student achievement analysis.
- School Choice - Data analysis for education's school choice.
- School Budgets and Priorities - Helping the school board and mayor make strategic decisions regarding future school budgets and priorities
- School Performance - Data analysis practice using data from data.utah.gov on school performance.
- School Performance 2 - Using pandas to analyze school and student performance within a district
- School Performance 3 - Philadelphia School Performance
- School Performance 4 - NJ School Performance
- School Closure - Identify schools at risk for closure by performance and other characteristics.
- School Budgets - Tools and techniques for school budgeting.
- School Budgets - Same as above, DataCamp.
- PyCity - School analysis.
- PyCity 2 - School budget vs school results.
- Budget NLP - NLP classification for budget resources.
- Budget NLP 2 - Further classification exercise.
- Budget NLP 3 - Budget classification.
- Survey Analysis - Education survey analysis.
Emergency & Police
Preventative and Reactive
- Emergency Mapping - Detection of destroyed houses in California
- Emergency Room - Supporting emergency room decision making
- Emergency Readmission - Adjusted Risk of Emergency Readmission.
- Forest Fire - Forest fire detection through UAV imagery using CNNs
- Emergency Response - Emergency response analysis.
- Emergency Transportation - Transportation prompt on emergency services
- Emergency Dispatch - Reducing response times with predictive modeling, optimization, and automation
- Emergency Calls - Emergency calls analysis project.
- Calls Data Analysis - 911 data analysis.
- Emergency Response - Chemical factory RL.
- Crime Classification - Times analysis of serious assaults misclassified by LAPD.
- Article Tagging - Natural Language Processing of Chicago news articles
- Crime Analysis - Association Rule Mining from Spatial Data for Crime Analysis
- Chicago Crimes - Exploring public Chicago crimes data set in Python
- Graph Analytics - The Hague Crimes.
- Crime Prediction - Crime classification, analysis & prediction in Indore city.
- Crime Prediction - Developed predictive models for crime rate.
- Crime Review - Crime review data analysis.
- Crime Trends - The Crime Trends Analysis Tool analyses crime trends and surfaces problematic crime conditions
- Crime Analytics - Analysis of crime data in Seattle and San Francisco.
- Ambulance Analysis - An investigation of Local Government Area ambulance time variation in Victoria.
- Site Location - Ambulance site locations.
- Dispatching - Applying game theory and discrete event simulation to find optimal solution for ambulance dispatching
- Ambulance Allocation - Time series analysis of ambulance dispatches in the City of San Diego.
- Response Time - An analysis on the improvements of ambulance response time.
- Optimal Routing - Project to find optimal routing of ambulances in Ithaca.
- Crash Analysis - Predicting the probability of accidents on a given segment at a given time.
- Conflict Prediction - Notebooks on conflict prediction.
- Burglary Prediction - Spatio-Temporal Modelling for burglary prediction.
- Predicting Disease Outbreak - Machine Learning implementation based on multiple classifier algorithm implementations.
- Road accident prediction - Prediction on type of victims on federal road accidents in Brazil.
- Text Mining - Disaster Management using Text mining.
- Twitter and disasters - Try to correctly predict whether tweets are about disasters.
- Flood Risk - Impact of catastrophic flood events.
- Fire Prediction - We used 4 different algorithms to predict the likelihood of future fires.
Trading and Investment
- For more see financial-machine-learning
- For asset management see financial-machine-learning
- Deep Portfolio - Deep learning for finance; predict volume of bonds.
- AI Trading - Modern AI trading techniques.
- Corporate Bonds - Predicting the buying and selling volume of the corporate bonds.
- Simulation - Investigating simulations as part of computational finance.
- Industry Clustering - Project to cluster industries according to financial attributes.
- Financial Modeling - HFT trading and implied volatility modeling.
- Trend Following - A futures trend following portfolio investment strategy.
- Financial Statement Sentiment - Extracting sentiment from financial statements using neural networks.
- Applied Corporate Finance - Studies the empirical behaviors in the stock market.
- Market Crash Prediction - Predicting market crashes using an LPPL model.
- NLP Finance Papers - Curating quantitative finance papers using machine learning.
- ARIMA-LSTM Hybrid - Hybrid model to predict future price correlation coefficients of two assets
- Basic Investments - Basic investment tools in Python.
- Basic Derivatives - Basic forward contracts and hedging.
- Basic Finance - Source code notebooks for basic finance applications.
- Advanced Pricing ML - Additional implementation of Advances in Financial Machine Learning (Book)
- Options and Regression - Financial engineering project for option pricing techniques.
- Quant Notebooks - Educational notebooks on quant finance, algorithmic trading and investment strategy.
- Forecasting Challenge - Financial forecasting challenge by G-Research (Hedge Fund)
- XGBoost - A trading algorithm using XGBoost
- Research Paper Trading - A strategy implementation based on a paper using Alpaca Markets.
- Various - Options, Allocation, Simulation
- ML & RL NYU - Machine Learning and Reinforcement Learning in Finance.
- Datastream - Datastream from Thomson Reuters accessible through Python.
- AlphaVantage - API wrapper to simplify the process of acquiring free financial data.
- FSA - A project to transfer SEC Edgar Filings’ financial data to custom financial statement analysis models.
- TradeConnector - A layer to connect with market data providers.
- Employee Count SEC Filings - Extraction to get the exact employee count values for companies from SEC filings.
- SEC Parsing - NLP to find and extract specific information from long, unstructured documents
- Open Edgar - OpenEDGAR (openedgar.io)
- Rating Industries - Histories from multiple agencies converted to CSV format
- Financial Machine Learning Regulation
- Predicting Restaurant Facility Closures
- Predicting Corporate Bankruptcies
- Predicting Earnings Surprises
- Machine Learning in Asset Management
- zEpid - Epidemiology analysis package.
- Python For Epidemiologists - Tutorial to introduce epidemiology analysis in Python.
- Prescription Compliance - An analysis of prescription and medical compliance
- Respiratory Disease - Tracking respiratory diseases in Olympic athletes
- Bubonic Plague - Bubonic plague and SIR model.
Justice, Law & Regulations
- LexPredict - Software package and library.
- AI Para-legal - Lobe is the world's first AI paralegal.
- Legal Entity Detection - NER For Legal Documents.
- Legal Case Summarisation - Implementation of different summarisation algorithms applied to legal case judgements.
- Legal Documents Google Scholar - Using Google Scholar to extract cases programmatically.
- Chat Bot - Chat-bot and email notifications.
- Congress API - ProPublica congress API access.
- Data Generator GDPR - Dummy data generator for GDPR compliance
- Blackstone - spaCy pipeline and model for NLP on unstructured legal text.
Policy and Regulatory
- GDPR scores - Predicting GDPR Scores for Legal Documents.
- Driving Factors FINRA - Identify the driving factors that influence the FINRA arbitration decisions.
- Securities Bias Correction - Bias-Corrected Estimation of Price Impact in Securities Litigation.
- Public Firm to Legal Decision - Embed public firms based on their reaction to legal decisions.
- Night Life Regulation - Australian nightlife and its regulation and policing
- Comments - Public comments on government regulations.
- Clustering - Clustering Canadian regulations.
- Environment - Regulation of Energy and the Environment
- Risk - Systematic risk of various financial regulations.
- FINRA Compliance - Topic modelling on compliance.
- Supreme Court Prediction - Predicting the ideological direction of Supreme Court decisions: ensemble vs. unified case-based model.
- Supreme Court Topic Modeling - Multiple steps necessary to implement topic modeling on supreme court decisions.
- Judge Opinion - Using text mining and machine learning to analyze judges’ opinions for a particular concern.
- ML Law Matching - A machine learning law match maker.
- Bert Multi-label Classification - Fine Grained Sentiment Analysis from AI.
- Some Computational AI Course - Video series Law MIT.
- Financial Machine Learning Regulation (Paper)
- Green Manufacturing - Mercedes-Benz Greener Manufacturing competition on Kaggle.
- Semiconductor Manufacturing - Semiconductor manufacturing process line data analysis.
- Smart Manufacturing - Shared work of a modelling Methodology.
- Bosch Manufacturing - Bosch manufacturing project, Kaggle.
- Predictive Maintenance 1 - Predict remaining useful life of aircraft engines
- Predictive Maintenance 2 - Time-To-Failure (TTF) or Remaining Useful Life (RUL)
- Manufacturing Maintenance - Simulation of maintenance in manufacturing systems.
- Predictive Analytics - Method for Predicting failures in Equipment using Sensor data.
- Detecting Defects - Anomaly detection for defective semiconductors
- Defect Detection - Smart defect detection for pill manufacturing.
- Manufacturing Failures - Reducing manufacturing failures.
- Manufacturing Anomalies - Intelligent anomaly detection for manufacturing line.
- Quality Control - Bosch failure of quality control.
- Manufacturing Quality - Intelligent Manufacturing Quality Forecast
- Auto Manufacturing - Regression Case Study Project on Manufacturing Auction Sale Data.
Media & Publishing
- Video Popularity - HIP model for predicting the popularity of videos.
- YouTube transcriber - Automatically transcribe YouTube videos.
- Marketing Analytics - Marketing analytics case studies.
- Algorithmic Marketing - Models from Introduction to Algorithmic Marketing book
- Marketing Scripts - Marketing data science applications.
- Social Mining - Mining the social web.
- Painting Forensics - Analysing paintings to find out their year of creation.
- Flickr - Metadata mining tool for tourism research.
- Fashion - A clothing retrieval and visual recommendation model for fashion images
- Gamma-hadron Reconstruction - Tools used in Gamma-ray ground based astronomy.
- Curriculum - Newtonian notebooks.
- Interaction Networks - Interaction Networks for Learning about Objects, Relations and Physics.
- Particle Physics - Training, generation, and analysis code for learning Particle Physics
- Computational Physics - A computational physics repository.
- Medical Physics - Useful Python for medical physics.
- Medical Physics 2 - A common, core Python package for Medical Physics
- Flow Physics - Flow Physics and Aeroacoustics Toolbox with Python
- Physics ML and Stats - Machine learning and statistics for physicists
- High Energy - Machine Learning for High Energy Physics.
- High Energy GAN - Generative Adversarial Networks for High Energy Physics.
- Neural Networks - Physics meets neural networks
Government and Public Works
- Triage - General Purpose Risk Modeling and Prediction Toolkit for Policy and Social Good Problems.
- World Bank Poverty I - A comparative assessment of machine learning classification algorithms applied to poverty prediction.
- World Bank Poverty II - Repository for the World Bank Pover-t Test Competition Solution.
- Overseas Company Land Ownership - Identifying foreign ownership in the UK.
- CFPB - Consumer Financial Protection Bureau complaints analysis.
- Cannabis Legalisation Effect - Effects of cannabis legalization on crime.
- Public Credit Card - Identification of potential fraud for council credit cards. Data
- Recidivism Prediction - Transparency and auditability to recidivism risk assessment
- Household Poverty - Predict poverty in households in Costa Rica.
- NLP Public Policy - An example of an NLP use-case in public policy.
- World Food Production - Comparing Top food and feed Producers around the globe.
- Tax Inequality - Data project around taxation and inequality in Basel Stadt.
- Sheriff Compliance - Compliance to ICE requests.
- Apps Detection - Suspicious app detection for kids.
- Social Assistance - Trending information on social assistance
- Computational Social Science - Social data science summer school course.
- Liquor and Crime - Effect of liquor licenses issued on the crime rate.
- Animal Placement Kennels - Optimising animal placement in shelters.
- Staffing Wall - Independent exploration project on U.S. Mexican Border wall
- Worker Fatalities - Worker Fatalities and Catastrophes Map from OSHA data
- Census Data API - Pull variables from the 5-year American Community Survey.
- Philanthropic Giving - Work done by numerous DataKind volunteers on harnessing Form 990 data
- Charity Recommender - NYC Charity Collaborative Recommender System on an Implicit DataSet.
- Donor Identification - A machine learning project in which we need to find donors for charity.
- US Charities - Charity exploration and machine learning.
- Charity Effectiveness - Scraping online data about charities to understand effectiveness
- Election Analysis - Election Analysis and Prediction Models
- American Election Causal - Using ANES data with causal inference models.
- Campaign Finance and Election Results - Investigating the relation between campaign finance and subsequent election results.
- Voting System - Proportional representation voting methods.
- President Vote - Vote by income level analysis.
- Congressional politics - House and senate congressional partisanship.
- Politico - A platform for profiling public figures in Brazilian politics.
- Bots - Tools and algorithms to analyze Paraguayan Tweets in times of election
- Gerrymander tests - Lots of metrics for quantifying gerrymandering.
- Sentiment - Analyse newspapers with respect to their political conviction using entity sentiments of party representatives.
- DL Politics - Prediction of Spanish Political Affinity with Deep Neural Nets: Socialist vs People's Party
- PAC Money - Effects of PAC money on US politics.
- Power Networks - Constructing a watchdog for Indian corporate and political networks
- Elite - Political elite in the US.
- Debate Analysis - Program to analyze political debates.
- Political Affiliation - Political affiliation prediction using Twitter metadata.
- Political Ads - Investigation into Facebook Political Ads and Targeting
- Political Identity - Multi-axial political model.
- YT Politics - Mapping Politics on YouTube
- Political Ideology - Unsupervised learning of political ideology by word vector projections
Real Estate, Rental & Leasing
- Finding Donuts - Finding real estate opportunities by predicting transforming neighbourhoods.
- Neighbourhood - Predicting real estate prices from the urban environment.
- Real Estate Classification - Classifying the type of property given real estate, satellite and street view images
- Recommender - This tool aims to recommend the top 5 real estate properties that match a user's search.
- House Price - Predicting house prices using Linear Regression and GBR
- House Price Portland - Predict housing prices in Portland.
- Zillow Prediction - Zillow valuation prediction as performed on Kaggle.
- Real Estate - Predicting real estate prices from the urban environment.
Rental & Leasing
- Analysing Rentals - Analyzing and visualizing rental listings data.
- Interest Prediction - Predict people's interest in renting specific NYC apartments.
- Housing Uni vs Non-Uni - The effect on university lodging after the GFC.
- Predict Household Poverty - Predict the poverty of households in Costa Rica using automated feature engineering.
- Airbnb public analytics competition - Now strategic management.
- Electricity Price - Electricity price comparison Singapore.
- Electricity-Coal Correlation - Determining the correlation between state electricity rates and coal generation over the past decade.
- Electricity Capacity - A Los Angeles Times analysis of California's costly power glut.
- Electricity Systems - Optimal Wind+Hydrogen+Other+Battery+Solar (WHOBS) electricity systems for European countries.
- Load Disaggregation - Smart meter load disaggregation with Hidden Markov Models
- Price Forecasting - Forecasting Day-Ahead electricity prices in the German bidding zone with deep neural networks.
- Carbon Index - Calculation of electricity CO₂ intensity at national, state, and NERC region levels from 2001 to present.
- Demand Forecasting - Electricity demand forecasting for Austin.
- Electricity Consumption - Estimating Electricity Consumption from Household Surveys
- Household power consumption - Individual household power consumption LSTM.
- Electricity French Distribution - An analysis of electricity data provided by the French Distribution Network (RTE)
- Renewable Power Plants - Time series of cumulated installed capacity.
- Wind Farm Flow - A repository of wind plant flow models connected to FUSED-Wind.
- Power Plant - The dataset contains 9568 data points collected from a Combined Cycle Power Plant over 6 years (2006-2011).
Coal, Oil & Gas
- Coal Phase Out - Generation adequacy issues with Germany’s coal phaseout.
- Coal Prediction - Predicting coal production.
- Oil & Gas - Oil & Natural Gas price prediction using ARIMA & Neural Networks
- Gas Formula - Calculating potential economic effect of price indexation formula.
- Demand Prediction - Natural gas demand prediction.
- Consumption Forecasting - Natural gas consumption forecasting.
- Gas Trade - World Model for Natural Gas Trade.
Water & Pollution
- Safe Water - Predict health-based drinking water violations in the United States.
- Hydrology Data - A suite of convenience functions for exploring water data in Python.
- Water Observatory - Monitoring water levels of lakes and reservoirs using satellite imagery.
- Water Pipelines - Using machine learning to find water pipelines in aerial images.
- Water Modelling - Australian Water Resource Assessment (AWRA) Community Modelling System.
- Drought Restrictions - A Los Angeles Times analysis of water usage after the state eased drought restrictions
- Flood Prediction - Applying LSTM on river water level data
- Sewage Overflow - Insights into sanitary sewage overflow (SSO). (This has been removed.)
- Water Accounting - Assembles water budget data for the US from existing data sources
- Air Quality Prediction - Predict air quality (AQ) in Beijing and London in the next 48 hours.
- Transdim - Creating accurate and efficient solutions for the spatio-temporal traffic data imputation and prediction tasks.
- Transport Recommendation - Context-Aware Multi-Modal Transportation Recommendation
- Transport Data - Data and notebooks for Toronto transport.
- Transport Demand - Predicting demand for public transportation in Nairobi.
- Demand Estimation - Implementation of dynamic origin-destination demand estimation.
- Congestion Analysis - Transportation systems analysis
- TS Analysis - Time series analysis on transportation data.
- Network Graph Subway - Vulnerability analysis for transportation networks. (This has been taken down.)
- Transportation Inefficiencies - Quantifying the inefficiencies of Transportation Networks
- Train Optimisation - Train schedule optimisation
- Traffic Prediction - Multi-attention recurrent neural networks for time series (city traffic)
- Predict Crashes - Crash prediction modelling application that leverages multiple data sources
- AI Supply chain - Supply chain optimisation system.
- Transfer Learning Flight Delay - Using variational autoencoders in Keras to predict flight delay.
- Replenishment - Retail replenishment code for supply chain management.
Wholesale & Retail
- Customer Analysis - Wholesale customer analysis.
- Distribution - JB wholesale distribution analysis.
- Clustering - Unsupervised learning techniques applied to product spending data collected for customers
- Market Basket Analysis - Using the Instacart public dataset to report which products are often bought together.
- Retail Analysis - Studying the Online Retail dataset and extracting insights from it.
- Online Insights - Analyzing online transactions in the UK
- Retail Use-case - Notebooks & Data for CyberShop Retail Use Case
- Dwell Time - Customer dwell time and other analysis.
- Retail Cohort - Cohort analysis.