Recent Trend

Hazel
Hazel Engine
PayloadsAllTheThings
A list of useful payloads and bypasses for Web Application Security and Pentest/CTF
send
Simple, private file sharing from the makers of Firefox
windows
One-click install tools for V2ray, Trojan, Trojan-go, NaiveProxy, and shadowsocksR on Windows (one-click setup for bypassing internet censorship)
silero-models
Silero Models: pre-trained STT models and benchmarks made embarrassingly simple
vue-next
Repo for Vue 3.0 (currently in RC)
FreeCAD
This is the official source code of FreeCAD, a free and open-source multiplatform 3D parametric modeler. Issues are managed on our own bug tracker at https://www.freecadweb.org/tracker
kb
A minimalist knowledge base manager
Mask_RCNN
Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow
mimikatz
A little tool to play with Windows security
vagas-junior-estagio
Companies that regularly offer openings for junior developers and interns
material-shell
A modern desktop interface for Linux. Improve your user experience and get rid of the anarchy of traditional desktop workflows. Designed to simplify navigation and reduce the need to manipulate window
dayjs
⏰ Day.js 2KB immutable date library alternative to Moment.js with the same modern API
ML_course
EPFL Machine Learning Course, Fall 2019
create-react-app
Set up a modern web app by running one command.
Background-Matting
Background Matting: The World is Your Green Screen
Kingfisher
A lightweight, pure-Swift library for downloading and caching images from the web.
istio
Connect, secure, control, and observe services.
linux-command
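A search tool for a comprehensive collection of Linux commands, covering command manuals, detailed explanations, learning material, and collected references. https://git.io/linux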
generative_inpainting
DeepFill v1/v2 with Contextual Attention and Gated Convolution, CVPR 2018, and ICCV 2019 Oral
yearn-protocol
Yearn solidity smart contracts
pebble
RocksDB/LevelDB inspired key-value database in Go
jazzit
Laughs at your expense
moment
Parse, validate, manipulate, and display dates in javascript.
n8n
Free and open fair-code licensed node based Workflow Automation Tool. Easily automate tasks across different services.
Hero
Elegant transition library for iOS & tvOS
DAIN
Depth-Aware Video Frame Interpolation (CVPR 2019)
eat_tensorflow2_in_30_days
Tensorflow2.0 is delicious, just eat it!
tmpmail
✉️ A temporary email right from your terminal
team-comtress-2
Team Fortress 2, but with a lot of fixes, QoL improvements and performance optimizations!
CVE-2020-1472
PoC for Zerologon - all research credits go to Tom Tervoort of Secura
DefinitelyTyped
The repository for high quality TypeScript type definitions.
rails
Ruby on Rails
stats-illustrations
R & stats illustrations by @allison_horst
pytorch-gans
My implementation of various GAN (generative adversarial networks) architectures like vanilla GAN, cGAN, DCGAN, etc.
data-science-interviews
Data science interview questions and answers
sds1

jupyter-text2code
A proof-of-concept jupyter extension which converts english queries into relevant python code
Notebooks
Learn Python for free using open-source notebooks in Hebrew.
jellyfin
The Free Software Media System
BIGTREETECH-SKR-mini-E3
The BIGTREETECH SKR-mini-E3 motherboard is an ultra-quiet, low-power, high-quality 3D printing machine control board. It was launched by the 3D printing team of Shenzhen BIGTREE technology co., LTD. This bo
XiaomiADBFastbootTools
A simple tool for managing Xiaomi devices on desktop using ADB and Fastboot
fastmac
Get a MacOS or Linux shell, for free, in around 2 minutes
pipedream
Serverless integration and compute platform. Free for developers.
DeepVision
A CV algorithm inference framework application used in many of my projects.
dive
A tool for exploring each layer in a docker image
libra
Libra’s mission is to enable a simple global payment system and financial infrastructure that empowers billions of people.
Kalman-and-Bayesian-Filters-in-Python
Kalman Filter book using Jupyter Notebook. Focuses on building intuition and experience, not formal proofs. Includes Kalman filters, extended Kalman filters, unscented Kalman filters, particle filters,
kinto

CVE-2020-1472
Test tool for CVE-2020-1472
leetcode_company_wise_questions
This is a repository containing the list of company wise questions available on leetcode premium
makani
Makani was a project to develop a commercial-scale airborne wind turbine, culminating in a flight test of the Makani M600 off the coast of Norway. All Makani software has now been open-sourced. This r
Fantasy-Premier-League
Creates a .csv file of all players in the English Premier League with their respective team and total fantasy points
996.ICU
Repo for counting stars and contributing. Press F to pay respect to glorious developers.
graal
GraalVM: Run Programs Faster Anywhere
understand-nodejs
Understanding how Node.js works through source code analysis
tensorboard
TensorFlow's Visualization Toolkit
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
Relativty
An open source VR headset with SteamVR support for $200
guide-rpc-framework
A custom RPC framework implemented with Netty+Kryo+Zookeeper, with a detailed implementation walkthrough and related tutorials.
UTM
Virtual machines for iOS
v2ray-heroku
For deploying V2Ray Websocket on Heroku; this project is not suitable as a long-term solution.
learning
Becoming 1% better at data science every day
wirehole

minecraft-react

mem-doc
This is a document to help with .NET memory analysis and diagnostics.
HarmonyOS
A curated list of awesome things related to HarmonyOS, the Huawei operating system.
radar-covid-backend-dp3t-server
DP^3T Radar COVID fork
eiten
Statistical and Algorithmic Investing Strategies for Everyone
radar-covid-backend-verification-server
Radar COVID Verification Service
radar-covid-ios
Native iOS app using DP^3T iOS sdk to handle Exposure Notification framework from Apple
radar-covid-android
Native Android app using DP^3T Android sdk to handle Exposure Notifications API from Google
react-challenge-amazon-clone

solana
Web-Scale Blockchain for fast, secure, scalable, decentralized apps and marketplaces.
cockroach
CockroachDB - the open source, cloud-native distributed SQL database.
machine-learning-for-trading
Code and resources for Machine Learning for Algorithmic Trading, 2nd edition.
hwp.js
Open source hwp viewer and parser library powered by web technology
awesome-react
A collection of awesome things regarding React ecosystem
connectedhomeip
Project Connected Home over IP is a new Working Group within the Zigbee Alliance. This Working Group plans to develop and promote the adoption of a new connectivity standard to increase compatibility
Yolo-Fastest
⚡ Yolo universal target detection model combined with EfficientNet-lite, the calculation amount is only 230Mflops(0.23Bflops), and the model size is 1.3MB
laravel
A PHP framework for web artisans
onnxruntime
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
cim
cim (cross IM): a distributed instant messaging system for developers
ESP32-WiFi-Hash-Monster
WiFi Hash Purple Monster, store EAPOL & PMKID packets in an SD CARD using a M5STACK / ESP32 device
react-portfolio

Algorithms
A collection of algorithms and data structures
frp
A fast reverse proxy to help you expose a local server behind a NAT or firewall to the internet.
funNLP
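Chinese and English sensitive words, language detection, domestic and international mobile/phone number location and carrier lookup, gender inference from names, phone number extraction, ID card number extraction, email extraction, Chinese and Japanese personal name databases, Chinese abbreviation database, character decomposition dictionary, word sentiment values, stop words, reactionary word list, violence and terrorism word list, traditional/simplified Chinese conversion, English imitation of Chinese pronunciation, Wang Feng lyrics generator, job title lexicon, synonym lexicon, antonym lexicon, negation word lexicon, car brand lexicon, car parts lexicon, continuous English text segmentation, various Chinese word vectors, company name collection, classical poetry database, IT lexicon, finance lexicon, idiom lexicon, place name lexicon, historical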
GRAT2
We developed the GRAT2 Command & Control (C2) project for learning purposes.
DescomplicandoKubernetes

aes-finder
Utility to find AES keys in running processes
mml-book.github.io
Companion webpage to the book "Mathematics For Machine Learning"
ultimate-python
Ultimate Python study guide for newcomers and professionals alike.
sushiswap-frontend

pytorch-lightning
The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.
awesome-flutter
An awesome list that curates the best Flutter libraries, tools, tutorials, articles and more.
Interview_Question_for_Beginner
Technical-Interview guidelines written for those who started studying programming. I wish you all the best.
free

talk
A group video call for the web. No signups. No downloads.
bitcoin
Bitcoin Core integration/staging tree
eleventy-high-performance-blog
A high performance blog template for the 11ty static site generator.
awesome-project-ideas
Curated list of Machine Learning, NLP, Vision, Recommender Systems Project Ideas
ghcide
A library for building Haskell IDE tooling
moon
The minimal & fast library for functional user interfaces
jdk
JDK main-line development
Tasmota
Alternative firmware for ESP8266 with easy configuration using webUI, OTA updates, automation using timers or rules, expandability and entirely local control over MQTT, HTTP, Serial or KNX. Full docum
Server
A personally maintained version of PanDownload
a32nx
The A32NX Project is a community driven open source project to create a free Airbus A320neo in Microsoft Flight Simulator that is as close to reality as possible. It aims to enhance the default A320ne
keras
Deep Learning for humans
Red-Teaming-Toolkit
A collection of open source and commercial tools that aid in red team operations.
data-engineer-roadmap
Roadmap to becoming a data engineer in 2020
hivemind
Decentralized deep learning framework in pytorch. Built to train models on thousands of volunteers across the world.
scipio
Scipio is a thread-per-core framework that aims to make the task of writing highly parallel asynchronous applications in a thread-per-core architecture easier for rustaceans
hoppscotch
A free, fast and beautiful API request builder used by 75k+ developers. https://hoppscotch.io
Wav2Lip
This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020.
KingOfBugBountyTips

autoscraper
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
100-nlp-papers
100 Must-Read NLP Papers
croc
Easily and securely send things from one computer to another
spark-nlp
State of the Art Natural Language Processing
display-switch
Turn a $30 USB switch into a full-featured multi-monitor KVM switch
surpriver
Find big moving stocks before they move using machine learning and anomaly detection
flink-learning
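Flink learning blog. http://www.flink-learning.com Covers Flink basics, concepts, principles, hands-on practice, performance tuning, and source code analysis. Includes learning examples for Flink Connector, Metrics, Library, DataStream API, Table API & SQL, and more, plus large-scale project case studies of Flink in production (PVUV, log storage, real-time deduplication of tens of billions of records,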
deeplearning-models
A collection of various deep learning architectures, models, and tips
fes.js
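Fes.js is an admin console application solution. It provides command-line tools for project scaffolding, development and debugging, and build and packaging, with built-in modules for layout, permissions, data dictionaries, state management, API, and more. The file directory structure serves as the routing, so users only need to write page content. Built on Vue.js with common admin console capabilities built in, it lets users write less, more simply. It has been refined across multiple projects and is becoming stable.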
stitches
The modern styling library. Near-zero runtime, server-side rendering, multi-variant support, and best-in-class developer experience.
18S191
Course 18.S191 at MIT, fall 2020 - Introduction to computational thinking with Julia
grafana
The tool for beautiful monitoring and metric analytics & dashboards for Graphite, InfluxDB & Prometheus & More
fortify

jetstream

scikit-learn-tips
⚡ Daily scikit-learn tips
12306
Smart ticket checking and booking for 12306
desafio-6-2020

30-seconds-of-code
Short JavaScript code snippets for all your development needs
gdal
GDAL is an open source X/MIT licensed translator library for raster and vector geospatial data formats.
toBeTopJavaer
To Be Top Javaer - The Java engineer's road to mastery
companies-sponsoring-visas
A list of companies that sponsor employees from other countries.
howtheytest
A collection of public resources about how software companies test their software
bicep

htop
htop - an interactive process viewer
portainer
Making Docker management easy.
gorm
The fantastic ORM library for Golang, aims to be developer friendly
SuperPower
Here you should find the best power supplies for your low-power projects
CompEcon2020
Computational Economics Course 2020 by Kenneth Judd
vimac
Vimium for macOS.
Windows10Debloater
Script to remove Windows 10 bloatware.
HowToHunt
Some Tutorials and Things to Do while Hunting That Vulnerability.
Hack-Tools
The all-in-one Red Team extension for Web Pentester
Showkase
Showkase is an annotation-processor based Android library that helps you organize, discover, search and visualize Jetpack Compose UI elements
webrtc-for-the-curious
WebRTC for the Curious: Go beyond the APIs
matplotplusplus
Matplot++: A C++ Graphics Library for Data Visualization
Flutter-Course-Resources
Learn to Code While Building Apps - The Complete Flutter Development Bootcamp
flutter-development-roadmap
Flutter App Developer Roadmap - A complete roadmap to learn Flutter App Development. I tried to learn flutter using this roadmap. If you want to add something please contribute to the project. Happy L
objax

sushiswap
SushiSwap smart contracts
Cloudreve
A cloud drive system supporting multiple cloud storage providers (a project that helps you build your own cloud in minutes)
learn-python
Playground and cheatsheet for learning Python. Collection of Python scripts that are split by topics and contain code examples with explanations.
Python-programming-exercises
100+ Python challenging programming exercises
learn-python3
Jupyter notebooks for teaching/learning Python 3
vscode-debug-visualizer
An extension for VS Code that visualizes data during debugging.
rapier
2D and 3D physics engines focused on performance.
project-guidelines
A set of best practices for JavaScript projects
d3
Bring data to life with SVG, Canvas and HTML.
OpenBot
OpenBot leverages smartphones as brains for low-cost robots. We have designed a small electric vehicle that costs about $50 and serves as a robot body. Our software stack for Android smartphones suppo
speakeasy
Windows kernel and user mode emulation.
Learn-Vim
A book for learning the Vim editor
maratona-fullcycle-4

arwes
Futuristic Sci-Fi and Cyberpunk Graphical User Interface Framework for Web Apps
gitignore
A collection of useful .gitignore templates
black
The uncompromising Python code formatter
airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
shift-ctrl-f
Search the information available on a webpage using natural language instead of an exact string match.
traefik
The Cloud Native Edge Router
TecoGAN
This repo will contain source code and materials for the TecoGAN project, i.e. code for a TEmporally COherent GAN
present
A terminal-based presentation tool with colors and effects.
gitpod
Gitpod is an open-source Kubernetes application providing prebuilt, collaborative development environments in your browser - powered by VS Code.
compose-samples

react-native-navigation
A complete native navigation solution for React Native
kubernetes-examples
Minimal self-contained examples of standard Kubernetes features and patterns in YAML
vscode
Visual Studio Code
cpp-httplib
A C++ header-only HTTP/HTTPS server and client library
AWS-SAA-C02-Course
Personal notes for SAA-C02 test from: https://learn.cantrill.io
clean-architecture-swiftui
A demo project showcasing the production setup of the SwiftUI app with Clean Architecture
Gooey
Turn (almost) any Python command line program into a full GUI application with one line
baiduwp-php
A web remake of PanDownload
latexify_py
Generates LaTeX math description from Python functions.
open-source-cs-python

RAFT

volt-bootstrap-5-dashboard
⚡️ Volt Bootstrap 5 Admin Dashboard Template with vanilla Javascript
itlwm
Intel Wi-Fi Drivers
packer
Packer is a tool for creating identical machine images for multiple platforms from a single source configuration.
open-source-cs
Video discussing this curriculum:
nsfw-filter
A Google Chrome / Firefox extension that blocks NSFW images from the web pages that you load using TensorFlow JS.
sudoku-solver
Smart solution to solve sudoku in VR
desafio-4-2020

msfs-a320neo

mit-15-003-data-science-tools
Study guides for MIT's 15.003 Data Science Tools
CascadeTabNet
This repository contains the code and implementation details of the CascadeTabNet paper "CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents"
certified-kubernetes-administrator-course
Certified Kubernetes Administrator - CKA Course
everyones-guide-for-starting-up-on-wechat-network
Starting a business on the WeChat network, for everyone
manim
Animation engine for explanatory math videos
CS-Notes
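My self-study notes. Currently learning shell and MLSys and organizing notes on C++, algorithms, and operating systems; distributed systems to follow. Updated for life.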
egua
A simple, modern programming language in Portuguese
talent-plan
Open source training courses about distributed databases and distributed systems
godot
Godot Engine – Multi-platform 2D and 3D game engine
desafio-3-2020

optuna
A hyperparameter optimization framework
Ventoy
A new bootable USB solution.
Alt-F4
Alternative Factorio Friday Fan Facts, also known as Alt-F4
awesome-made-by-brazilians
A collection of amazing open source projects built by Brazilian developers
zig
General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
cxx
Safe interop between Rust and C++
lireddit

chakra-ui
⚡️ Simple, Modular & Accessible UI Components for your React Applications
VancedManager
Vanced Installer
react-native-video
A
NYPD-Misconduct-Complaint-Database
This database is a record of NYPD misconduct complaints made by the public to the Civilian Complaint Review Board (CCRB).
awesome-django
A curated list of awesome things related to Django
Front-End-Checklist
The perfect Front-End Checklist for modern websites and meticulous developers
fet.sh
a fetch written in posix shell without any external commands (linux only)
baiduwp
PanDownload Web, built with CloudFlare Workers
machine-learning-interview
Minimum Viable Study Plan for Machine Learning Interviews from FAAG, Snapchat, LinkedIn.
RSSHub
Everything is RSSible
metamask-extension
The MetaMask browser extension enables browsing Ethereum blockchain enabled websites
amplify-flutter
Amplify Framework provides a declarative and easy-to-use interface across different categories of cloud operations.
ent
An entity framework for Go
pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
element3
(WIP) Fork of ElemeFE/element, a Vue.js 3.0 UI Toolkit for Web
posthog
PostHog is developer-friendly, open-source product analytics.
awesome-hpp
A curated list of awesome header-only C++ libraries
fabric
Hyperledger Fabric is an enterprise-grade permissioned distributed ledger framework for developing solutions and applications. Its modular and versatile design satisfies a broad range of industry use
insight
Repository for Project Insight: NLP as a Service
omatsuri
Browser application with 9 open source frontend focused tools
InfoSpider
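INFO-SPIDER is a crawler toolbox that brings many data sources together, designed to help users retrieve their own data safely and quickly. The tool's code is open source and its process is transparent. Supported data sources include GitHub, QQ Mail, NetEase Mail, Alibaba Mail, Sina Mail, Hotmail, Outlook, JD.com, Taobao, Alipay, China Mobile, China Unicom, China Telecom, Zhihu, Bilibili, NetEase Cloud Music, QQ friends, QQ groups, WeChat Moments album generation, browser history, 12306, Cnblogs, CSDN blogs, open source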
element-plus
A Vue.js 3.0 UI Toolkit for Web
autoscaler
Autoscaling components for Kubernetes
magento2
All Submissions you make to Magento Inc. ("Magento") through GitHub are subject to the following terms and conditions: (1) You grant Magento a perpetual, worldwide, non-exclusive, no charge, royalty f
ts-migrate
A tool to help migrate JavaScript code quickly and conveniently to TypeScript
ar-cutpaste
Cut and paste your surroundings using AR
chinese-programmer-wrong-pronunciation
Words that Chinese programmers commonly mispronounce
labs_campaigns

AdGuardHome
Network-wide ads & trackers blocking DNS server
COLA
Clean Object-oriented & Layered Architecture
Godzilla
Godzilla
diagrams
Diagram as Code for prototyping cloud system architectures
PaddleDetection
Object detection and instance segmentation toolkit based on PaddlePaddle.
handcalcs
Python library for converting Python calculations into rendered latex.
mern-course-bootcamp
Complete Free Coding Bootcamp 2020 MERN Stack
handwritten.js
Convert typed text to realistic handwriting!
archivy
Archivy is a self-hosted knowledge repository that allows you to safely preserve useful content that contributes to your knowledge bank.
mall-swarm
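mall-swarm is a microservice e-commerce system built on core technologies including Spring Cloud Hoxton & Alibaba, Spring Boot 2.3, Oauth2, MyBatis, Docker, and Elasticsearch. It also provides a Vue-based admin backend for quickly setting up the system. On top of its e-commerce features, mall-swarm integrates system functions such as a registry center, configuration center, monitoring center, and gateway. Fully documented, and comes with a complete set of Spring Clou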
umami
Umami is a simple, fast, website analytics alternative to Google Analytics.
nl-covid19-notification-app-android
Android sources for the Dutch Covid19 Notification App
contenidos
Course material for IIC2233 Programación Avanzada (Advanced Programming)
locast2plex
A very simple script to connect locast to Plex's live tv/dvr feature.
h1st
H1st AI solves the critical “cold-start” problem of Industrial AI: encoding human expertise to augment the lack of data, while building a smooth transition toward a machine-learning future. This probl
minGPT
A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
Catch2
A modern, C++-native, header-only, test framework for unit-tests, TDD and BDD - using C++11, C++14, C++17 and later (or C++03 on the Catch1.x branch)
libra
Ergonomic machine learning.
annie
Fast, simple and clean video downloader
spotMicro
Spot Micro Quadruped Project
LeetCode-Go
✅ Solutions to LeetCode in Go, 100% test coverage, runtime beats 100%
bootcamp-gostack-desafios
Repository containing all the challenges from the Bootcamp Gostack modules
NoVmp
A static devirtualizer for VMProtect x64 3.x. powered by VTIL.
fullcalendar
Full-sized drag & drop event calendar
vue-nodejs-youtube-clone
This is the frontend (VueJS) of the Youtube clone called VueTube.
youtube-clone-nodejs-api
VueTube is a YouTube clone built with nodejs, expressjs & mongodb. This is the RESTful API repository.
Behinder
Behinder ("Ice Scorpion"): a website management client with dynamic binary encryption
low-level-design-primer

E-commerce-Complete-Flutter-UI

handson-ml
A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in python using Scikit-Learn and TensorFlow.
aws-machine-learning-university-accelerated-tab

aws-machine-learning-university-accelerated-cv

mogollar
A MongoDB UI built with Electron
BespokeSynth
Software modular synth
desafio-1-2020

kosmonaut
A web browser engine for the space age
aws-machine-learning-university-accelerated-nlp

fastbook
Draft of the fastai book
Hierarchical-Localization
Visual localization made easy
TypeScript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
Penetration_Testing_POC
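POCs, exploits, scripts, privilege escalation techniques, small tools, and more related to penetration testing; additions and improvements welcome---About penetration-testing python-script poc getshell csrf xss cms php-getshell domainmod-xss penetration-testing-poc csrf-webshell cobub-razor cve rce sql sql-poc p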
God-Of-BigData
Big data interview questions; begin the road to big data mastery...Flink/Spark/Hadoop/Hbase/Hive...
OSCPRepo
A list of commands, scripts, resources, and more that I have gathered and attempted to consolidate for use as OSCP (and more) study material. Commands in 'Usefulcommands' Keepnote. Bookmarks and readi
drogon
Drogon: A C++14/17 based HTTP web application framework running on Linux/macOS/Unix/Windows
papercups
Open-source live customer chat
jupyter-book
Build interactive, publication-quality documents from Jupyter Notebooks
awesome-java
A collection of awesome open source Java projects on GitHub.
espflix
A free video streaming service that runs on an ESP32
servo
The Servo Browser Engine
halfmoon
Front-end framework with a built-in dark mode, designed for rapidly building beautiful dashboards and product pages.
eventnative
EventNative is an open-source data collection framework
go-github
Go library for accessing the GitHub API
yam-protocol
A stabilizing reserve currency protocol
mmdetection3d
OpenMMLab's next-generation platform for general 3D object detection.
sherlock
Hunt down social media accounts by username across social networks
computervision-recipes
Best Practices, code samples, and documentation for Computer Vision.
clean-code-javascript
Clean Code concepts adapted for JavaScript
laravel-admin
Build a full-featured administrative interface in ten minutes
OpenJailbreak
GeoSn0w's OpenJailbreak Project, an open-source iOS 11 to iOS 13 Jailbreak project & vault.
azure-quickstart-templates
Azure Quickstart Templates
nodejs.dev
A new Node.js resource built using Gatsby.js with React.js, TypeScript, Emotion, and Remark.
KOOM
KOOM is an OOM killer on mobile platform by Kwai.
bevy
A refreshingly simple data-driven game engine built in Rust
eat_pytorch_in_20_days
Pytorch is delicious, just eat it!
datasets
2,000,000+ Unsplash images made available for research and machine learning
malwoverview
Malwoverview is a first response tool to perform an initial and quick triage in a directory containing malware samples, specific malware sample, suspect URL and domains. Additionally, it allows to dow
streisand
Streisand sets up a new server running your choice of WireGuard, OpenConnect, OpenSSH, OpenVPN, Shadowsocks, sslh, Stunnel, or a Tor bridge. It also generates custom instructions for all of these serv
LeetCode
A record of LeetCode problem solving
IntelOwl
Intel Owl: analyze files, domains, IPs in multiple ways from a single API at scale
archive-program
The GitHub Archive Program & Arctic Code Vault
rancher
Complete container management platform
Noctilucent
Using TLS 1.3 to evade censors, bypass network defenses, and blend in with the noise
data-science
Path to a free self-taught education in Data Science!
FigmaToCode
Generate responsive pages and apps on Tailwind, Flutter and SwiftUI.
twitter-clone

my-arsenal-of-aws-security-tools
List of open source tools for AWS security: defensive, offensive, auditing, DFIR, etc.
InvoiceNet
Deep neural network to extract intelligent information from invoice documents.
macOS_Big_Sur_icons_replacements
Replacement icons for popular apps in the style of macOS Big Sur
AnimeGANv2
[Open Source]. The improved version of AnimeGAN.
bluezone-app
Bluezone - Protect yourself, protect the community
awesome-sysadmin
A curated list of amazingly awesome open source sysadmin resources inspired by Awesome PHP.
facebook-scripts-dom-manipulation
An open-source project that includes many scripts for Facebook users, with no Access Token needed, that work by directly manipulating the DOM.
MCinaBox
MCinaBox - A Minecraft Java Edition Launcher on Android
ai-economist
Foundation is a flexible, modular, and composable framework to model socio-economic behaviors and dynamics with both agents and governments. This framework can be used in conjunction with reinforcemen
TikTok-Shares-Botter
Adds TikTok Shares for you.
prefect
The easiest way to automate your data
tuya-convert
A collection of scripts to flash Tuya IoT devices to alternative firmwares
crush
Crush is an attempt to make a command line shell that is also a powerful modern programming language.
pyre-check
Performant type-checking for python.
polkadot
Polkadot Node Implementation
incyber

mesh
Cloud native service mesh for the rest of us.
V2rayU
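V2rayU, a macOS client based on the v2ray core for bypassing internet censorship, written in Swift. Supports protocols such as vmess, shadowsocks, and socks5, as well as subscriptions, import via QR code or clipboard, manual configuration, QR code sharing, and more.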
TLS-poison

heroicons
A set of free MIT-licensed high-quality SVG icons for UI development.
react-native
A framework for building native apps with React.
gui.cs
Console-based user interface toolkit for .NET applications.
Atlas
Atlas: End-to-End 3D Scene Reconstruction from Posed Images
aws-sdk-go
AWS SDK for the Go programming language.
charts
Curated applications for Kubernetes
pybind11
Seamless operability between C++11 and Python
mediapipe
MediaPipe is the simplest way for researchers and developers to build world-class ML solutions and applications for mobile, edge, cloud and the web.
proffy-discovery
The project's goal is an application that connects those who want to learn with those who want to teach. You can find students for what you teach, or find a teacher for that subject that you
mixer
Add-on for real-time collaboration in Blender.
iOS-DeviceSupport
This repository holds the device support files for the iOS, and I will update it regularly.
simdjson
Parsing gigabytes of JSON per second
amplify-js
A declarative JavaScript library for application development using cloud services.
lottie-ios
An iOS library to natively render After Effects vector animations
Faze4-Robotic-arm
All files for a 6-axis robot arm with cycloidal gearboxes.
xiaobaiyang

Javascript
A repository for All algorithms implemented in Javascript (for educational purposes only)
blog-post-workflow
Show your latest blog posts from any sources or StackOverflow activity on your GitHub profile/project readme automatically using the RSS feed
reverse-interview
Questions to ask the company during your interview
expo
An open-source platform for making universal native apps with React. Expo runs on Android, iOS, and the web.
955.WLB
955: a list of companies with no overtime - work 955, work–life balance
A-to-Z-Resources-for-Students
✅ Curated list of resources for college students
TDengine
An open-source big data platform designed and optimized for the Internet of Things (IoT).
django-jazzmin
Jazzy theme for Django
full-stack-fastapi-postgresql
Full stack, modern web application generator. Using FastAPI, PostgreSQL as database, Docker, automatic HTTPS and more.
Reflection_Summary
Essential foundational knowledge of algorithm theory
Best-websites-a-programmer-should-visit
Some useful websites for programmers.
bpytop
Linux/OSX/FreeBSD resource monitor
TelemetrySourcerer
Enumerate and disable common sources of telemetry used by AV/EDR.
instagrabber
InstaGrabber, the open-source Instagram client for Android. Originally by @AwaisKing.
pe_tree

Powershell-Scripts
Helpful list of powershell scripts I have found/created
drawio
Source to app.diagrams.net
analytics
Simple and privacy-friendly alternative to Google Analytics
pycaret
An open source, low-code machine learning library in Python
Ciphey
Automated decryption tool
Data-Science-Interview-Resources
A repository listing out the potential sources which will help you in preparing for a Data Science/Machine Learning interview. New resources added frequently.
ps4-ipv6-uaf

UNSAM_2020c2_Python
Curso de programación en Python - 2do cuatrimestre 2020 - UNSAM
gpu.js
GPU Accelerated JavaScript
how-to-secure-anything
How to systematically secure anything: a repository about security engineering
paperview
A high performance X11 animated wallpaper setter
core
JAVClub - Never lose track of your big sisters again
home-cloud
The "cloud" at home
haoel.github.io

InstaPy
Instagram Bot - Tool for automated Instagram interactions
bat
A cat(1) clone with wings.
DeOldify
A Deep Learning based project for colorizing and restoring old images (and video!)
educative.io_courses
PDF downloads of all educative.io courses from the free student subscription in the GitHub Student Pack
rustlings
Small exercises to get you used to reading and writing Rust code!
trackerslist
Updated list of public BitTorrent trackers
Statistical-Learning-Method_Code
Hand-written implementations of all the algorithms in Li Hang's book "Statistical Learning Methods"
mobile
React Native client application for COVID Shield on iOS and Android
binary_search
A collection of improved binary search algorithms.
mirai

TapTap
Port of the double tap on back of device feature from Android 11 to any armv8 Android device
complete-javascript-course
Starter files, final projects and FAQ for my Complete JavaScript course
icons
Official open source SVG icon library for Bootstrap.
oneflow
OneFlow is a performance-centered and open-source deep learning framework.
ml-engineer-roadmap
WIP: Roadmap to becoming a machine learning engineer in 2020
hvmi
Hypervisor Memory Introspection Core Library
fhe-toolkit-linux
IBM Fully Homomorphic Encryption Toolkit For Linux
teenyicons
Tiny minimal 1px icons designed to fit in the smallest places.
project-citadel
An open source project management tool with Kanban boards
covid-alert-app
Exposure notification client application / Application client de notification d'exposition
ChromeAppHeroes
Guli - ChromePluginHeroes: a Chinese-language manual for excellent Chrome plugins, so that the Chrome plugin heroes can benefit humanity. Updated in sync on the WeChat official account 「0加1」.
SSPanel-Uim
A re-modified version of the modded SSPanel V3
UnusualVolumeDetector
Gets the last 5 months of volume history for every ticker, and alerts you when a stock's volume exceeds 10 standard deviations from the mean within the last 3 days
formik
Build forms in React, without the tears
learn-cantrill-io-labs
Standard and Advanced Demos for learn.cantrill.io courses
TransCoder
Public release of the TransCoder research project https://arxiv.org/pdf/2006.03511.pdf
bounty-targets-data
This repo contains hourly-updated data dumps of bug bounty platform scopes (like Hackerone/Bugcrowd/Intigriti/etc) that are eligible for reports
CtCI-6th-Edition
Cracking the Coding Interview 6th Ed. Solutions
windows95
Windows 95 in Electron. Runs on macOS, Linux, and Windows.
SkyArk
SkyArk helps to discover, assess and secure the most privileged entities in Azure and AWS
interviews
Everything you need to know to get the job.
Android-Analysis
Getting Genymotion & Burpsuite setup for Android Mobile App Analysis
detext
DeText: A Deep Neural Text Understanding Framework for Ranking and Classification Tasks
awesome-java
A curated list of awesome frameworks, libraries and software for the Java programming language.
workflow

tye
Tye is a tool that makes developing, testing, and deploying microservices and distributed applications easier. Project Tye includes a local orchestrator to make developing microservices easier and the
java-design-patterns
Design patterns implemented in Java
java8-tutorial
Modern Java - A Guide to Java 8
generator-jhipster
JHipster is a development platform to quickly generate, develop, & deploy modern web applications & microservice architectures.
stayaway-app
Official repository for the STAYAWAY COVID mobile application
api-guidelines
Microsoft REST API Guidelines
win10script
This is the Ultimate Windows 10 Script from a creation from multiple debloat scripts and gists from github.
tutorials
Just Announced - "Learn Spring Security OAuth":
Otto
Otto makes machine learning an intuitive, natural language experience. Facebook AI Challenge winner
first-order-model
This repository contains the source code for the paper First Order Motion Model for Image Animation
laravel-best-practices
Laravel best practices
hiring-without-whiteboards
⭐️ Companies that don't have a broken hiring process
PyTorch_YOLOv4
PyTorch implementation of YOLOv4
macintosh.js
A virtual Apple Macintosh with System 8, running in Electron. I'm sorry.
QuickCut
Your most handy video processing software
Super-mario-bros-PPO-pytorch
Proximal Policy Optimization (PPO) algorithm for Super Mario Bros
arrow
Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficien
swift
The Swift Programming Language
flutter
Flutter makes it easy and fast to build beautiful apps for mobile and beyond.
pikvm
Open and cheap DIY IP-KVM based on Raspberry Pi
ILSpy
.NET Decompiler with support for PDB generation, ReadyToRun, Metadata (&more) - cross-platform!
aluraflix
⚛️ Projeto feito durante a Imersão React da Alura
starship
☄?️ The minimal, blazing-fast, and infinitely customizable prompt for any shell!
leonsans
Leon Sans is a geometric sans-serif typeface made with code in 2019 by Jongmin Kim.
MCVmComputers
Order computer parts from a satellite orbiting around your minecraft world and build actual working computers with them!
CleanArchitecture.WebApi
An implementation of Clean Architecture for ASP.NET Core 3.1 WebAPI. Built with loosely coupled architecture and clean-code practices in mind.
NutShell
RISC-V SoC designed by students in UCAS
bartosz-basics-of-haskell
Code and exercises from Bartosz Milewski's Basics of Haskell Tutorial
fullstack-starterkit
GraphQL first full-stack starter kit with Node, React. Powered by TypeScript
movement-tracking
UP - DOWN - LEFT - RIGHT movement tracking.
OSCP-Exam-Report-Template-Markdown
OSCP Exam Report Template in Markdown
react-native-instagram-clone
A React Native app - Clone Instagram mobile app (In progress)
felicette
Satellite imagery for dummies.
neovim
Vim-fork focused on extensibility and usability
machine-learning-roadmap
A roadmap connecting many of the most important concepts in machine learning, how to learn them and what tools to use to perform them.
python-cheatsheet
Comprehensive Python Cheatsheet
awesome-cold-showers
For when people get too hyped up about things
cutter
Free and Open Source Reverse Engineering Platform powered by radare2
ORB_SLAM3
ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual-Inertial and Multi-Map SLAM
RustScan
Faster Nmap Scanning with Rust
openpilot
openpilot is an open source driver assistance system. openpilot performs the functions of Automated Lane Centering and Adaptive Cruise Control for over 85 supported car makes and models.
retinaface
The remake of the https://github.com/biubug6/Pytorch_Retinaface
awesome-gpt3

GitHub520
Helps you "love" GitHub by fixing broken images and slow loading when you access it.
LeetcodeTop
A roundup of the high-frequency LeetCode problems that major internet companies tend to ask
angular-tetris
Tetris game built with Angular 10 and Akita
umi-core
UMI Core Go Library
rpi-power-monitor
Raspberry Pi Power Monitor
umi-core-py
UMI Core Python Library
gpt3-sandbox
The goal of this project is to enable users to create cool web demos using the newly released OpenAI GPT-3 API with just a few lines of Python.
easy_rust
Rust explained using easy English
rengine
reNgine is an automated reconnaissance framework meant for gathering information during penetration testing of web applications. reNgine has customizable scan engines, which can be used to scan the we
industry-machine-learning
A curated list of applied machine learning and data science notebooks and libraries across different industries (by @firmai)
umi-core-js
UMI Core JS Library
bloatbox
☑️? Get rid of bloatware and clean your Windows 10 Start menu
umi-core-php
UMI Core PHP Library
proposal-record-tuple
ECMAScript proposal for the Record and Tuple value types. | Stage 2: it will change!
jetbrains-agent-latest
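Permanent activation crack for the whole JetBrains suite, with no need to modify the hosts file. Shared with fellow programmers. Applies to the 2020 versions.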
applied-ml
Curated papers, articles & videos on data science & machine learning applied in production, with results.
lotus
Implementation of the Filecoin protocol, written in Go
cat
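CAT is a base component for server-side projects, with clients for Java, C/C++, Node.js, Python, Go and other languages. It is deeply integrated into Meituan-Dianping's infrastructure middleware (MVC framework, RPC framework, database framework, cache framework, message queue, configuration system and so on) and provides Meituan-Dianping's business lines with rich performance metrics, health status, real-time alerting and more.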
fawkes
Fawkes, privacy preserving tool against facial recognition systems. More info at http://sandlab.cs.uchicago.edu/fawkes
terminal
The new Windows Terminal and the original Windows console host, all in the same place!
kibana
Your window into the Elastic Stack
terraform
Terraform enables you to safely and predictably create, change, and improve infrastructure. It is an open source tool that codifies APIs into declarative configuration files that can be shared amongst
gotraining
Go Training Class Material :
JavaFamily
[Java interviews + Java study guide] The core knowledge that most Java programmers need to master.
storybook
The UI component workshop. Develop, document, & test for React, Vue, Angular, Ember, Web Components, & more!
awesome-remote-job
A curated list of awesome remote jobs and resources. Inspired by https://github.com/vinta/awesome-python
vueuse
Collection of Composition API utils for Vue 2 and 3
fe-interview
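Front-end interview daily 3+1: learning driven by interview questions, encouraging daily study and reflection and a little progress every day! Interview questions are published by hand at 5 a.m. every morning. 3000+ front-end interview questions covering HTML/CSS/JavaScript/Vue/React/Nodejs/TypeScript/ECMAScript/Webpack/jQuery/mini programs/soft skills...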
stock
stock: a stock system developed in Python.
awesome-ml-courses
Awesome free machine learning and AI courses with video lectures.
laravel-boilerplate
The Laravel Boilerplate Project - https://laravel-boilerplate.com
reactjs-interview-questions
List of top 500 ReactJS Interview Questions & Answers....Coding exercise questions are coming soon!!
lx-music-desktop
A music application based on Electron
number-verifier
Number Verifier is a SMS verification tool that makes it easy to get a disposable SMS number and bypass SMS number verifications on any site.
CyberProfDevelopmentCovidResources
An awesome list of FREE resources for training, conferences, speaking, labs, reading, etc that are free all the time or during COVID-19 that cybersecurity professionals with downtime can take advantag
opentelemetry-specification
Specifications for OpenTelemetry
front-end-interview-handbook
No bullshit answers to the famous h5bp "Front-end Job Interview Questions"
hello-algorithm
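This project includes: 1. my 300,000-character illustrated collection of algorithm problems; 2. 100 ultra-clear programming mind maps; 3. a roundup of 100 interview write-ups from major tech companies; 4. 100 programming e-books covering various languages; 5. the source code of the 小浩算法 website. (It is not easy for a Chinese project to make the trending list, so please give it a star in the top right corner!)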

funNLP

Python LINK

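The Most Powerful NLP-Weapon Arsenal

A Playground for NLP Grunts: Nearly the Most Complete Collection of Chinese NLP Resources

Many of these packages are a lot of fun and worth bookmarking, enough to satisfy anyone's collecting habit! If you find this useful, please share and star it. Thank you!

Updated irregularly over the long term; feel free to watch and fork!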

涉及内容包括但不限于:**中英文敏感词、语言检测、中外手机/电话归属地/运营商查询、名字推断性别、手机号抽取、身份证抽取、邮箱抽取、中日文人名库、中文缩写库、拆字词典、词汇情感值、停用词、反动词表、暴恐词表、繁简体转换、英文模拟中文发音、汪峰歌词生成器、职业名称词库、同义词库、反义词库、否定词库、汽车品牌词库、汽车零件词库、连续英文切割、各种中文词向量、公司名字大全、古诗词库、IT词库、财经词库、成语词库、地名词库、历史名人词库、诗词词库、医学词库、饮食词库、法律词库、汽车词库、动物词库、中文聊天语料、中文谣言数据、百度中文问答数据集、句子相似度匹配算法集合、bert资源、文本生成&摘要相关工具、cocoNLP信息抽取工具、国内电话号码正则匹配、清华大学XLORE:中英文跨语言百科知识图谱、清华大学人工智能技术系列报告、自然语言生成、NLU太难了系列、自动对联数据及机器人、用户名黑名单列表、罪名法务名词及分类模型、微信公众号语料、cs224n深度学习自然语言处理课程、中文手写汉字识别、中文自然语言处理 语料/数据集、变量命名神器、分词语料库+代码、任务型对话英文数据集、ASR 语音数据集 + 基于深度学习的中文语音识别系统、笑声检测器、Microsoft多语言数字/单位/如日期时间识别包、中华新华字典数据库及api(包括常用歇后语、成语、词语和汉字)、文档图谱自动生成、SpaCy 中文模型、Common Voice语音识别数据集新版、神经网络关系抽取、基于bert的命名实体识别、关键词(Keyphrase)抽取包pke、基于医疗领域知识图谱的问答系统、基于依存句法与语义角色标注的事件三元组抽取、依存句法分析4万句高质量标注数据、cnocr:用来做中文OCR的Python3包、中文人物关系知识图谱项目、中文nlp竞赛项目及代码汇总、中文字符数据、speech-aligner: 从“人声语音”及其“语言文本”产生音素级别时间对齐标注的工具、AmpliGraph: 知识图谱表示学习(Python)库:知识图谱概念链接预测、Scattertext 文本可视化(python)、语言/知识表示工具:BERT & ERNIE、中文对比英文自然语言处理NLP的区别综述、Synonyms中文近义词工具包、HarvestText领域自适应文本挖掘工具(新词发现-情感分析-实体链接等)、word2word:(Python)方便易用的多语言词-词对集:62种语言/3,564个多语言对、语音识别语料生成工具:从具有音频/字幕的在线视频创建自动语音识别(ASR)语料库、构建医疗实体识别的模型(包含词典和语料标注)、单文档非监督的关键词抽取、Kashgari中使用gpt-2语言模型、开源的金融投资数据提取工具、文本自动摘要库TextTeaser: 仅支持英文、人民日报语料处理工具集、一些关于自然语言的基本模型、基于14W歌曲知识库的问答尝试--功能包括歌词接龙and已知歌词找歌曲以及歌曲歌手歌词三角关系的问答、基于Siamese bilstm模型的相似句子判定模型并提供训练数据集和测试数据集、用Transformer编解码模型实现的根据Hacker News文章标题自动生成评论、用BERT进行序列标记和文本分类的模板代码、LitBank:NLP数据集——支持自然语言处理和计算人文学科任务的100部带标记英文小说语料、百度开源的基准信息抽取系统、虚假新闻数据集、Facebook: LAMA语言模型分析,提供Transformer-XL/BERT/ELMo/GPT预训练语言模型的统一访问接口、CommonsenseQA:面向常识的英文QA挑战、中文知识图谱资料、数据及工具、各大公司内部里大牛分享的技术文档 PDF 或者 PPT、自然语言生成SQL语句(英文)、中文NLP数据增强(EDA)工具、英文NLP数据增强工具 、基于医药知识图谱的智能问答系统、京东商品知识图谱、基于mongodb存储的军事领域知识图谱问答项目、基于远监督的中文关系抽取、语音情感分析、中文ULMFiT-情感分析-文本分类-语料及模型、一个拍照做题程序、世界各国大规模人名库、一个利用有趣中文语料库 qingyun 训练出来的中文聊天机器人、中文聊天机器人seqGAN、省市区镇行政区划数据带拼音标注、教育行业新闻语料库包含自动文摘功能、开放了对话机器人-知识图谱-语义理解-自然语言处理工具及数据、中文知识图谱:基于百度百科中文页面-抽取三元组信息-构建中文知识图谱、masr: 
中文语音识别-提供预训练模型-高识别率、Python音频数据增广库、中文全词覆盖BERT及两份阅读理解数据、ConvLab:开源多域端到端对话系统平台、中文自然语言处理数据集、基于最新版本rasa搭建的对话系统、基于TensorFlow和BERT的管道式实体及关系抽取、一个小型的证券知识图谱/知识库、复盘所有NLP比赛的TOP方案、OpenCLaP:多领域开源中文预训练语言模型仓库、UER:基于不同语料+编码器+目标任务的中文预训练模型仓库、中文自然语言处理向量合集、基于金融-司法领域(兼有闲聊性质)的聊天机器人、g2pC:基于上下文的汉语读音自动标记模块、Zincbase 知识图谱构建工具包、诗歌质量评价/细粒度情感诗歌语料库、快速转化「中文数字」和「阿拉伯数字」、百度知道问答语料库、基于知识图谱的问答系统、jieba_fast 加速版的jieba、正则表达式教程、中文阅读理解数据集、基于BERT等最新语言模型的抽取式摘要提取、Python利用深度学习进行文本摘要的综合指南、知识图谱深度学习相关资料整理、维基大规模平行文本语料、StanfordNLP 0.2.0:纯Python版自然语言处理包、NeuralNLP-NeuralClassifier:腾讯开源深度学习文本分类工具、端到端的封闭域对话系统、中文命名实体识别:NeuroNER vs. BertNER、新闻事件线索抽取、2019年百度的三元组抽取比赛:“科学空间队”源码、基于依存句法的开放域文本知识三元组抽取和知识库构建、中文的GPT2训练代码、ML-NLP - 机器学习(Machine Learning)NLP面试中常考到的知识点和代码实现、nlp4han:中文自然语言处理工具集(断句/分词/词性标注/组块/句法分析/语义分析/NER/N元语法/HMM/代词消解/情感分析/拼写检查、XLM:Facebook的跨语言预训练语言模型、用基于BERT的微调和特征提取方法来进行知识图谱百度百科人物词条属性抽取、中文自然语言处理相关的开放任务-数据集-当前最佳结果、CoupletAI - 基于CNN+Bi-LSTM+Attention 的自动对对联系统、抽象知识图谱、MiningZhiDaoQACorpus - 580万百度知道问答数据挖掘项目、brat rapid annotation tool: 序列标注工具、大规模中文知识图谱数据:1.4亿实体、数据增强在机器翻译及其他nlp任务中的应用及效果、allennlp阅读理解:支持多种数据和模型、PDF表格数据提取工具 、 Graphbrain:AI开源软件库和科研工具,目的是促进自动意义提取和文本理解以及知识的探索和推断、简历自动筛选系统、基于命名实体识别的简历自动摘要、中文语言理解测评基准,包括代表性的数据集&基准模型&语料库&排行榜、树洞 OCR 文字识别 、从包含表格的扫描图片中识别表格和文字、语声迁移、Python口语自然语言处理工具集(英文)、 similarity:相似度计算工具包,java编写、海量中文预训练ALBERT模型 、Transformers 2.0 、基于大规模音频数据集Audioset的音频增强 、Poplar:网页版自然语言标注工具、图片文字去除,可用于漫画翻译 、186种语言的数字叫法库、Amazon发布基于知识的人-人开放领域对话数据集 、中文文本纠错模块代码、繁简体转换 、 Python实现的多种文本可读性评价指标、类似于人名/地名/组织机构名的命名体识别数据集 、东南大学《知识图谱》研究生课程(资料)、. 
英文拼写检查库 、 wwsearch是企业微信后台自研的全文检索引擎、CHAMELEON:深度学习新闻推荐系统元架构 、 8篇论文梳理BERT相关模型进展与反思、DocSearch:免费文档搜索引擎、 LIDA:轻量交互式对话标注工具 、aili - the fastest in-memory index in the East 东半球最快并发索引 、知识图谱车音工作项目、自然语言生成资源大全 、中日韩分词库mecab的Python接口库、中文文本摘要/关键词提取、汉字字符特征提取器 (featurizer),提取汉字的特征(发音特征、字形特征)用做深度学习的特征、中文生成任务基准测评 、中文缩写数据集、中文任务基准测评 - 代表性的数据集-基准(预训练)模型-语料库-baseline-工具包-排行榜、PySS3:面向可解释AI的SS3文本分类器机器可视化工具 、中文NLP数据集列表、COPE - 格律诗编辑程序、doccano:基于网页的开源协同多语言文本标注工具 、PreNLP:自然语言预处理库、简单的简历解析器,用来从简历中提取关键信息、用于中文闲聊的GPT2模型:GPT2-chitchat、基于检索聊天机器人多轮响应选择相关资源列表(Leaderboards、Datasets、Papers)、(Colab)抽象文本摘要实现集锦(教程 、词语拼音数据、高效模糊搜索工具、NLP数据增广资源集、微软对话机器人框架 、 GitHub Typo Corpus:大规模GitHub多语言拼写错误/语法错误数据集、TextCluster:短文本聚类预处理模块 Short text cluster、面向语音识别的中文文本规范化、BLINK:最先进的实体链接库、BertPunc:基于BERT的最先进标点修复模型、Tokenizer:快速、可定制的文本词条化库、中文语言理解测评基准,包括代表性的数据集、基准(预训练)模型、语料库、排行榜、spaCy 医学文本挖掘与信息提取 、 NLP任务示例项目代码集、 python拼写检查库、chatbot-list - 行业内关于智能客服、聊天机器人的应用和架构、算法分享和介绍、语音质量评价指标(MOSNet, BSSEval, STOI, PESQ, SRMR)、 用138GB语料训练的法文RoBERTa预训练语言模型 、BERT-NER-Pytorch:三种不同模式的BERT中文NER实验、无道词典 - 有道词典的命令行版本,支持英汉互查和在线查询、2019年NLP亮点回顾、 Chinese medical dialogue data 中文医疗对话数据集 、最好的汉字数字(中文数字)-阿拉伯数字转换工具、 基于百科知识库的中文词语多词义/义项获取与特定句子词语语义消歧、awesome-nlp-sentiment-analysis - 情感分析、情绪原因识别、评价对象和评价词抽取、LineFlow:面向所有深度学习框架的NLP数据高效加载器、中文医学NLP公开资源整理 、MedQuAD:(英文)医学问答数据集、将自然语言数字串解析转换为整数和浮点数、Transfer Learning in Natural Language Processing (NLP) 、面向语音识别的中文/英文发音辞典、Tokenizers:注重性能与多功能性的最先进分词器、CLUENER 细粒度命名实体识别 Fine Grained Named Entity Recognition、 基于BERT的中文命名实体识别、中文谣言数据库、NLP数据集/基准任务大列表、nlp相关的一些论文及代码, 包括主题模型、词向量(Word Embedding)、命名实体识别(NER)、文本分类(Text Classificatin)、文本生成(Text Generation)、文本相似性(Text Similarity)计算等,涉及到各种与nlp相关的算法,基于keras和tensorflow 、Python文本挖掘/NLP实战示例、 Blackstone:面向非结构化法律文本的spaCy pipeline和NLP模型通过同义词替换实现文本“变脸” 、中文 预训练 ELECTREA 模型: 基于对抗学习 pretrain Chinese Model 、albert-chinese-ner - 用预训练语言模型ALBERT做中文NER 、基于GPT2的特定主题文本生成/文本增广、开源预训练语言模型合集、多语言句向量包、编码、标记和实现:一种可控高效的文本生成方法、 英文脏话大列表 
、attnvis:GPT2、BERT等transformer语言模型注意力交互可视化、CoVoST:Facebook发布的多语种语音-文本翻译语料库,包括11种语言(法语、德语、荷兰语、俄语、西班牙语、意大利语、土耳其语、波斯语、瑞典语、蒙古语和中文)的语音、文字转录及英文译文、Jiagu自然语言处理工具 - 以BiLSTM等模型为基础,提供知识图谱关系抽取 中文分词 词性标注 命名实体识别 情感分析 新词发现 关键词 文本摘要 文本聚类等功能、用unet实现对文档表格的自动检测,表格重建、NLP事件提取文献资源列表 、 金融领域自然语言处理研究资源大列表、CLUEDatasetSearch - 中英文NLP数据集:搜索所有中文NLP数据集,附常用英文NLP数据集 、medical_NER - 中文医学知识图谱命名实体识别 、(哈佛)讲因果推理的免费书、知识图谱相关学习资料/数据集/工具资源大列表、Forte:灵活强大的自然语言处理pipeline工具集 、Python字符串相似性算法库、PyLaia:面向手写文档分析的深度学习工具包、TextFooler:针对文本分类/推理的对抗文本生成模块、Haystack:灵活、强大的可扩展问答(QA)框架、中文关键短语抽取工具**。


1. textfilter: Chinese and English sensitive-word filtering observerss/textfilter

 >>> f = DFAFilter()
 >>> f.add("sexy")
 >>> f.filter("hello sexy baby")
 hello **** baby

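The sensitive words cover topics such as politics and profanity. It works mainly by dictionary lookup (the keyword file in the project), and the word list is quite explicit...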
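The dictionary-lookup idea described above can be sketched with a small trie. This is a minimal illustration only, not the code of observerss/textfilter; the class name `SimpleWordFilter` and its methods are hypothetical, and matching is case-insensitive by assumption.

```python
class SimpleWordFilter:
    """Toy dictionary-based filter: masks every listed word with asterisks."""

    _END = None  # marker key: a dictionary word ends at this trie node

    def __init__(self):
        self.root = {}  # nested dicts form a trie keyed by character

    def add(self, word):
        node = self.root
        for ch in word.lower():
            node = node.setdefault(ch, {})
        node[self._END] = True

    def filter(self, text, mask="*"):
        lowered = text.lower()
        out = []
        i = 0
        while i < len(text):
            node = self.root
            longest = 0
            j = i
            # Walk the trie as far as the text allows, remembering the
            # longest dictionary word that starts at position i.
            while j < len(lowered) and lowered[j] in node:
                node = node[lowered[j]]
                j += 1
                if self._END in node:
                    longest = j - i
            if longest:
                out.append(mask * longest)
                i += longest
            else:
                out.append(text[i])
                i += 1
        return "".join(out)


f = SimpleWordFilter()
f.add("sexy")
print(f.filter("hello sexy baby"))
```

A trie keeps the scan close to linear in the text length regardless of how many words the dictionary holds, which matters once the keyword file grows to thousands of entries.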

2. langid: detection of 97 languages https://github.com/saffsd/langid.py

pip install langid

>>> import langid
>>> langid.classify("This is a test")
('en', -54.41310358047485)

3. langdetect: another language detector https://code.google.com/archive/p/language-detection/

pip install langdetect

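from langdetect import detect
from langdetect import detect_langs

s1 = "本篇博客主要介绍两款语言探测工具,用于区分文本到底是什么语言,"
s2 = 'We are pleased to introduce today a new technology'
print(detect(s1))
print(detect(s2))
print(detect_langs(s1))    # detect_langs() returns every detected language together with its probability

Note on the output: the language codes mainly follow the ISO 639-1 standard; see the ISO 639-1 entry on Baidu Baike.

Compared with the previous language detector, this one is less accurate but faster.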

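4. phone: lookup of the home region of Chinese mobile numbers: ls0f/phone

Already integrated into the Python package cocoNLP; feel free to try it.

from phone import Phone
p  = Phone()
p.find(18100065143)
# returns {'phone': '18100065143', 'province': '上海', 'city': '上海', 'zip_code': '200000', 'area_code': '021', 'phone_type': '电信'}

Supported number prefixes: 13*, 15*, 18*, 14[5,7], 17[0,6,7,8]

Number of records: 360569 (updated: April 2017)

The author provides the data file phone.dat so that non-Python users can load the data as well.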

5. phone: lookup of the home region of international mobile and landline numbers: AfterShip/phone

npm install phone

import phone from 'phone';
phone('+852 6569-8900'); // return ['+85265698900', 'HKG']
phone('(817) 569-8900'); // return ['+18175698900', 'USA']

6. ngender: guesses gender from a name: observerss/ngender. The probability is computed with naive Bayes.

pip install ngender

>>> import ngender
>>> ngender.guess('赵本山')
('male', 0.9836229687547046)
>>> ngender.guess('宋丹丹')
('female', 0.9759486128949907)

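7. Regular expression for extracting email addresses

Already integrated into the Python package cocoNLP; feel free to try it.

import re

# The ^ and $ anchors mean the pattern matches only when `text` is a single address.
email_pattern = r'^[*#\u4e00-\u9fa5 a-zA-Z0-9_.-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*\.[a-zA-Z0-9]{2,6}$'
emails = re.findall(email_pattern, text, flags=0)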
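The pattern above is anchored with `^` and `$`, so it matches only when the whole string is one address. To pull addresses out of running text, one option is an unanchored variant, sketched below. The name `EMAIL_RE`, the helper `extract_emails` and the sample addresses are illustrative assumptions, and the character class for the local part is deliberately narrower than the original (no spaces, `*`, `#` or CJK characters).

```python
import re

# Unanchored variant with a non-capturing group (an assumption, not the
# pattern shipped in cocoNLP): local part, "@", domain labels, then a TLD.
EMAIL_RE = re.compile(
    r'[a-zA-Z0-9_.-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*\.[a-zA-Z0-9]{2,6}'
)


def extract_emails(text):
    """Return every substring of `text` that looks like an email address."""
    return EMAIL_RE.findall(text)


sample = "Contact alice@example.com or bob.smith@mail.example.org for details."
print(extract_emails(sample))
```

Using `(?:...)` instead of `(...)` matters here: when a pattern contains a capturing group, `re.findall` returns the group contents rather than the full match.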

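8. Regular expression for extracting phone numbers

Already integrated into the Python package cocoNLP; feel free to try it.

import re

# Non-capturing group so that re.findall returns the full 11-digit number.
cellphone_pattern = r'^(?:13[0-9]|14[0-9]|15[0-9]|17[0-9]|18[0-9])\d{8}$'
phoneNumbers = re.findall(cellphone_pattern, text, flags=0)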
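Because the pattern above is anchored to the start and end of the string, it validates a single number rather than finding numbers inside a sentence. A minimal sketch of the extraction case follows; `CELL_RE` and `extract_cellphones` are hypothetical names, and the prefixes (13, 14, 15, 17, 18) are simply the ones listed in the pattern above.

```python
import re

# One capturing group holds the 11-digit number; the surrounding parts only
# check that the number is not embedded in a longer run of digits.
CELL_RE = re.compile(r'(?:^|\D)((?:13|14|15|17|18)\d{9})(?!\d)')


def extract_cellphones(text):
    """Return the 11-digit mobile numbers found in `text`."""
    return CELL_RE.findall(text)


sample = "Call 18100065143 or 13261562938; order 123456789012345 is not a phone number."
print(extract_cellphones(sample))
```

The 15-digit order number is skipped because no 11-digit window inside it is both preceded and followed by a non-digit.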

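9. Regular expression for extracting ID card numbers

import re

# Non-capturing groups so that re.findall returns the full 18-character number.
IDCards_pattern = r'^[1-9]\d{5}[12]\d{3}(?:0[1-9]|1[012])(?:0[1-9]|[12][0-9]|3[01])\d{3}[0-9xX]$'
IDs = re.findall(IDCards_pattern, text, flags=0)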
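The regular expression only checks the shape of the number (region code, birth date, sequence, check character). An 18-digit resident ID number also carries a checksum in its last character, so a candidate can be verified further. The sketch below combines an unanchored version of the pattern with that checksum; the names `ID_RE`, `has_valid_checksum` and `extract_ids` are hypothetical, and the sample number is used purely for illustration.

```python
import re

# Unanchored form of the pattern above; the single capturing group is the ID.
ID_RE = re.compile(
    r'(?:^|\D)([1-9]\d{5}[12]\d{3}(?:0[1-9]|1[012])'
    r'(?:0[1-9]|[12][0-9]|3[01])\d{3}[0-9xX])(?![0-9xX])'
)

# Weights for the first 17 digits and the check characters indexed by
# (weighted sum mod 11), as used by the 18-digit numbering scheme.
WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
CHECK_CODES = "10X98765432"


def has_valid_checksum(id_number):
    total = sum(int(d) * w for d, w in zip(id_number[:17], WEIGHTS))
    return CHECK_CODES[total % 11] == id_number[17].upper()


def extract_ids(text):
    """Return the ID-shaped substrings of `text` whose checksum is valid."""
    return [m for m in ID_RE.findall(text) if has_valid_checksum(m)]


sample = "ID: 11010519491231002X, other: 110105194912310021"
print(extract_ids(sample))
```

The second number in the sample has the right shape but the wrong check character, so it is filtered out.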

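10. Personal-name corpus: wainshine/Chinese-Names-Corpus

Name extraction is available in the Python package cocoNLP; feel free to try it.

Chinese names (modern and ancient), Japanese names, Chinese surnames and given names, forms of address (大姨妈, 小姨妈 and so on), English-to-Chinese names (e.g. 李约翰), and an idiom dictionary

(Useful for Chinese word segmentation and name recognition)

11. Chinese abbreviation library: github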

全国人大: 全国/n 人民/n 代表大会/n
中国: 中华人民共和国/ns
女网赛: 女子/n 网球/n 比赛/vn

12. Chinese character decomposition dictionary: kfcd/chaizi

Character  Decomposition (1)  Decomposition (2)  Decomposition (3)
拆   手 斥 扌 斥 才 斥

13. Word sentiment values: rainarch/SentiBridge

山泉水 充沛  0.400704566541  0.370067395878
视野          宽广  0.305762728932  0.325320747491
大峡谷 惊险  0.312137906517  0.378594957281

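14. Chinese word lists, stop words and sensitive words: dongxiexidian/Chinese

The sensitive-word lists in this package are categorized in finer detail:

Reactionary word list, sensitive-word list statistics, violence and terrorism word list, livelihood word list, pornography word list

15. Chinese characters to pinyin: mozillazg/python-pinyin

Useful for text error correction.

16. Conversion between Traditional and Simplified Chinese: skydark/nstools

17. Engine that imitates Chinese pronunciation with English (a funny Chinese text-to-speech engine): tinyfool/ChineseWithEnglish

say wo i ni
# says: 我爱你 (I love you)

In effect it uses English phonetics to imitate Chinese pronunciation.

18. Wang Feng lyrics generator: phunterlau/wangfeng-rnn

我在这里中的夜里
就像一场是一种生命的意旪
就像我的生活变得在我一样
可我们这是一个知道
我只是一天你会怎吗

19. Synonym, antonym and negation word lists: guotong1988/chinese_dictionary

20. Splitting English strings without spaces into words: wordninja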

>>> import wordninja
>>> wordninja.split('derekanderson')
['derek', 'anderson']
>>> wordninja.split('imateapot')
['im', 'a', 'teapot']
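Splitting a string that has no spaces is commonly done with dynamic programming over a frequency-ranked word list, picking the segmentation with the lowest total cost. The sketch below illustrates that technique on a tiny made-up vocabulary; it is not wordninja's code, and `VOCAB`, `COST`, `UNKNOWN_COST` and `split` are hypothetical names chosen for the example.

```python
import math

# Hypothetical vocabulary ordered from most to least frequent.
VOCAB = ["a", "im", "pot", "tea", "teapot", "derek", "anderson"]
# Zipf-style cost: words later in the list are treated as rarer and cost more.
COST = {
    word: math.log((rank + 1) * math.log(len(VOCAB) + 1))
    for rank, word in enumerate(VOCAB)
}
MAX_LEN = max(len(word) for word in VOCAB)
UNKNOWN_COST = 1e9  # large penalty so unknown chunks are a last resort


def split(text):
    # best[i] = (cost, start) of the cheapest segmentation of text[:i],
    # where `start` is the index at which its last word begins.
    best = [(0.0, 0)]
    for i in range(1, len(text) + 1):
        candidates = []
        for j in range(max(0, i - MAX_LEN), i):
            piece = text[j:i]
            candidates.append((best[j][0] + COST.get(piece, UNKNOWN_COST), j))
        best.append(min(candidates))
    # Walk back through the stored start positions to recover the words.
    words = []
    i = len(text)
    while i > 0:
        j = best[i][1]
        words.append(text[j:i])
        i = j
    return words[::-1]


print(split("imateapot"))
print(split("derekanderson"))
```

With these costs, "teapot" as one word is cheaper than "tea" plus "pot", which is why the first call keeps it whole.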

21. Regular expression for IP addresses:

(25[0-5]|2[0-4]\d|[0-1]\d{2}|[1-9]?\d)\.(25[0-5]|2[0-4]\d|[0-1]\d{2}|[1-9]?\d)\.(25[0-5]|2[0-4]\d|[0-1]\d{2}|[1-9]?\d)\.(25[0-5]|2[0-4]\d|[0-1]\d{2}|[1-9]?\d)
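A short sketch of how the pattern above behaves when used for whole-string validation in Python, next to the standard library's `ipaddress` module. The helper names are hypothetical. Note that the `[0-1]\d{2}` branch accepts zero-padded octets such as `010`, which stricter parsers may reject.

```python
import ipaddress
import re

# One octet, exactly as in the pattern above but with a non-capturing group.
OCTET = r'(?:25[0-5]|2[0-4]\d|[0-1]\d{2}|[1-9]?\d)'
IPV4_RE = re.compile(r'(?:' + OCTET + r'\.){3}' + OCTET, re.ASCII)


def is_ipv4_by_regex(s):
    """True if the whole string is four dot-separated octets in 0..255."""
    return IPV4_RE.fullmatch(s) is not None


def is_ipv4_by_stdlib(s):
    """Same question, answered by the standard library parser."""
    try:
        ipaddress.IPv4Address(s)
        return True
    except ValueError:
        return False


for candidate in ("192.168.1.1", "256.1.1.1", "010.0.0.1"):
    print(candidate, is_ipv4_by_regex(candidate), is_ipv4_by_stdlib(candidate))
```

The regex is convenient for scanning text, while `ipaddress` is the safer choice when a single string has to be validated.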

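22. Regular expression for Tencent QQ numbers:

[1-9]([0-9]{5,11})

23. Regular expression for Chinese landline numbers:

[0-9-()()]{7,18}

24. Regular expression for usernames:

[A-Za-z0-9_\-\u4e00-\u9fa5]+

25. Vocabulary of car brands and car parts:

See the data folder of this repo: [data](https://github.com/fighting41love/funNLP/tree/master/data)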

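26. Time extraction:

Already integrated into the Python package cocoNLP; feel free to try it.

Tests run at 9:44 on June 7, 2016 gave the following results:

Hi,all。下周一下午三点开会 (Hi all. Meeting at 3 p.m. next Monday)

>> 2016-06-13 15:00:00-false

周一开会 (Meeting on Monday)

>> 2016-06-13 00:00:00-true

下下周一开会 (Meeting on the Monday after next)

>> 2016-06-20 00:00:00-true

java version

python version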
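The dates in the results above follow from plain calendar arithmetic relative to the reference time. The sketch below reproduces only that arithmetic; it does not parse Chinese text and is not how the linked Java or Python versions are implemented. The function name `upcoming_weekday` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta


def upcoming_weekday(ref, weekday, weeks_ahead=1):
    """Return midnight of `weekday` (0 = Monday) in the week that starts
    `weeks_ahead` weeks after the week containing `ref`."""
    monday_this_week = (ref - timedelta(days=ref.weekday())).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return monday_this_week + timedelta(weeks=weeks_ahead, days=weekday)


ref = datetime(2016, 6, 7, 9, 44)  # the reference time quoted above (a Tuesday)

next_monday = upcoming_weekday(ref, 0, 1)        # 下周一 (next Monday)
monday_after_next = upcoming_weekday(ref, 0, 2)  # 下下周一 (the Monday after next)
next_monday_3pm = next_monday.replace(hour=15)   # 下周一下午三点 (3 p.m. next Monday)

print(next_monday, monday_after_next, next_monday_3pm)
```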

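27. Assorted Chinese word vectors: github repo

A comprehensive collection of Chinese word vectors

28. Comprehensive list of company names: github repo

29. Classical poetry corpus: github repo; a more complete classical poetry corpus

30. Word lists compiled by THU: link

Already collected in the data folder of this repo.

Word lists for IT, finance, idioms, place names, historical figures, poetry, medicine, food, law, cars and animals

31. Chinese chat corpora link

This repository collects: Douban multi-turn conversations, the PTT Gossiping corpus, the Qingyun corpus, TV drama dialogue, Tieba forum replies, the Weibo corpus and the Xiaohuangji corpus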

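32. Chinese rumor data: github

In this data file, each line is one rumor record in JSON format. The fields are:

rumorCode: unique code of the rumor; the rumor's report page can be accessed directly with this code.
title: title of the rumor report
informerName: Weibo name of the informer
informerUrl: Weibo link of the informer
rumormongerName: Weibo name of the user who posted the rumor
rumormongerUr: Weibo link of the user who posted the rumor
rumorText: content of the rumor
visitTimes: number of times the rumor has been visited
result: outcome of the rumor review
publishTime: time the rumor was reported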
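A file with one JSON object per line can be read lazily, one record at a time. The sketch below shows the idea with two made-up records held in memory; the field values and their types (for example, `visitTimes` as an integer) are assumptions for illustration and are not taken from the dataset, and `read_rumors` is a hypothetical helper name.

```python
import io
import json

# Two invented records in the one-JSON-object-per-line layout described above.
sample = io.StringIO(
    '{"rumorCode": "demo-1", "title": "example title", "visitTimes": 12}\n'
    '\n'
    '{"rumorCode": "demo-2", "title": "another title", "visitTimes": 3}\n'
)


def read_rumors(lines):
    """Yield one dict per non-empty line of a JSON-lines source."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


records = list(read_rumors(sample))
print([record["rumorCode"] for record in records])
```

For the real file, the same generator can be fed an open file handle, for example `open(path, encoding="utf-8")`, where `path` points to the downloaded data.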

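33. Emotion fluctuation analysis: github

The word lists have been collected in the data folder of this repo.

This project derives a chart of a person's emotional fluctuation from conversations with them; the word lists it uses are in the data folder.

34. Chinese question-answering dataset: link, extraction code: 2dva

35. Sentence and QA similarity matching: MatchZoo github

A collection of text similarity matching algorithms, including several deep learning methods; worth trying.

36. BERT resources:

37. Texar - Toolkit for Text Generation and Beyond: github

38. Chinese event extraction: github

39. cocoNLP: github

Extraction of names, addresses, email addresses, mobile numbers, mobile number home regions and similar information, plus the RAKE phrase extraction algorithm.

pip install cocoNLP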

>>> from cocoNLP.extractor import extractor

>>> ex = extractor()

>>> text = '急寻特朗普,男孩,于2018年11月27号11时在陕西省安康市汉滨区走失。丢失发型短发,...如有线索,请迅速与警方联系:18100065143,132-6156-2938,[email protected] 和yangyangfuture at gmail dot com'

# Extract email addresses
>>> emails = ex.extract_email(text)
>>> print(emails)

['[email protected]', '[email protected]']
# Extract mobile numbers
>>> cellphones = ex.extract_cellphone(text,nation='CHN')
>>> print(cellphones)

['18100065143', '13261562938']
# Extract the home region and carrier of each mobile number
>>> cell_locs = [ex.extract_cellphone_location(cell,'CHN') for cell in cellphones]
>>> print(cell_locs)

cellphone_location [{'phone': '18100065143', 'province': '上海', 'city': '上海', 'zip_code': '200000', 'area_code': '021', 'phone_type': '电信'}]
# Extract addresses
>>> locations = ex.extract_locations(text)
>>> print(locations)
['陕西省安康市汉滨区', '安康市汉滨区', '汉滨区']
# Extract points in time
>>> times = ex.extract_time(text)
>>> print(times)
time {"type": "timestamp", "timestamp": "2018-11-27 11:00:00"}
# Extract personal names
>>> name = ex.extract_name(text)
>>> print(name)
特朗普

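40. Regular expressions for Chinese phone numbers (the three major carriers plus virtual operators, etc.): github

41. Tsinghua University XLORE: Chinese-English cross-lingual encyclopedic knowledge graph: link
The link above contains TTL files for all entities and relations; more data will be released soon. Counts of concepts, instances, properties and hypernym/hyponym relations:

| | Baidu | Chinese Wikipedia | English Wikipedia | Total |
|---|---|---|---|---|
| Concepts | 32,009 | 150,241 | 326,518 | 508,768 |
| Instances | 1,629,591 | 640,622 | 1,235,178 | 3,505,391 |
| Properties | 157,370 | 45,190 | 26,723 | 229,283 |
| InstanceOf | 7,584,931 | 1,449,925 | 3,032,515 | 12,067,371 |
| SubClassOf | 2,784 | 191,577 | 555,538 | 749,899 |

Cross-lingual links (concepts/instances)

| | Baidu | Chinese Wikipedia | English Wikipedia |
|---|---|---|---|
| Baidu | - | 10,216/336,890 | 4,846/303,108 |
| Chinese Wikipedia | 10,216/336,890 | - | 28,921/454,579 |
| English Wikipedia | 4,846/303,108 | 28,921/454,579 | - |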

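42. Tsinghua University AI technology report series: link
Reports on AI-related fields are released every year. Contents include:

43. Natural language generation:

44. jieba, hanlp: these need no introduction.

45. "NLP is too hard" series: github

46. Automatic couplet data and bot:
700,000 couplets link
Code link

First line (上联) Second line (下联)
殷勤怕负三春意 潇洒难书一字愁
如此清秋何吝酒 这般明月不须钱

47. Username blacklist: github. It contains a list of forbidden usernames, for example: link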

administrator
administration
autoconfig
autodiscover
broadcasthost
domain
editor
guest
host
hostmaster
info
keybase.txt
localdomain
localhost
master
mail
mail0
mail1

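48. Criminal charges, legal terms and classification models: github

Includes a knowledge graph of 856 criminal charges, charge prediction trained on a corpus of 2.8 million charge records, and 13-class question classification plus legal information Q&A based on 200,000 legal question-answer pairs

49. WeChat official account corpus: github

A 3 GB corpus of articles crawled from WeChat official accounts, with HTML removed so that only plain text remains. Each line is one article in JSON format: name is the official account's name, account is its ID, title is the article title and content is the body text

50. cs224n deep learning for NLP course: link

51. Chinese handwritten character recognition: github

52. Chinese NLP corpora/datasets: github. A comparable resource: THUOCL (THU Open Chinese Lexicon) Chinese word lists

53. Variable naming tool: github link

54. Word segmentation corpus + code: Baidu Netdisk link

55. Recommended new NLP book: "Natural Language Processing" by Jacob Eisenstein: link

56. English task-oriented dialogue datasets: github
[The most complete collection of task-oriented dialogue datasets] A comprehensive collection of task-oriented dialogue datasets, covering the key information on all datasets commonly used in the task-oriented dialogue field to date. In addition, to help researchers follow progress in the field, state-of-the-art results on several datasets are given in leaderboard form.

57. ASR speech datasets + a deep-learning-based Chinese speech recognition system: github

58. Laughter detector: github

59. Microsoft's multilingual package for recognizing numbers, units, dates and times: [github](https://github.com/Microsoft/Recognizers-Text)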

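60. chinese-xinhua: Chinese Xinhua dictionary database and API, including common xiehouyu (two-part allegorical sayings), idioms, words and characters github

61. Automatic generation of document graphs github

62. SpaCy Chinese model github

63. New version of the Common Voice speech recognition dataset link

64. Neural relation extraction (PyTorch) github

65. BERT-based named entity recognition (PyTorch) github

66. Keyphrase extraction package pke github
pke: an open source python-based keyphrase extraction toolkit

67. Question-answering system based on a medical knowledge graph github

68. Event triple extraction based on dependency parsing and semantic role labeling github

69. 40,000 sentences of high-quality annotated dependency parsing data from the Soochow University Chinese Dependency Treebank (SUCDT) Homepage. For downloads see the bottom of the homepage; an agreement must be signed and the unzip password is sent by email.

70. cnocr: a Python 3 package for Chinese OCR that ships with trained recognition models github

71. Chinese person-relationship knowledge graph project github

72. Roundup of Chinese NLP competition projects and code github

73. Chinese character data github

74. speech-aligner: a tool that produces phoneme-level time-aligned annotations from human speech and its transcript github

75. AmpliGraph: a Python library for knowledge graph representation learning, for link prediction over knowledge graph concepts github

76. Scattertext: text visualization (Python) github

77. Language/knowledge representation tools: BERT & ERNIE github

78. Survey of the differences between Chinese and English NLP link

79. Synonyms: Chinese synonym toolkit github

80. HarvestText: domain-adaptive text mining tool (new-word discovery, sentiment analysis, entity linking, etc.) github

81. word2word: easy-to-use multilingual word-to-word pairs (Python): 62 languages / 3,564 language pairs github

82. Speech recognition corpus generation tool: builds automatic speech recognition (ASR) corpora from online videos with audio and subtitles github

83. Large ASR speech lexicon/dictionary: github

84. Building a medical entity recognition model, including a dictionary and annotated corpus, in Python: github

85. Unsupervised single-document keyword extraction: github

86. Using the GPT-2 language model in Kashgari github

87. Open-source tool for extracting financial investment data github

88. TextTeaser automatic text summarization library (English only) github

89. People's Daily corpus processing toolkit github

90. Some basic models for natural language github

91. Question-answering experiments on a knowledge base of 140,000 songs: lyric chaining, finding a song from known lyrics, and Q&A over song-singer-lyrics relations github

92. Similar-sentence classification model based on a Siamese BiLSTM, with training and test datasets provided github

93. Transformer encoder-decoder model that generates comments from Hacker News article titles github

94. Template code for sequence labeling and text classification with BERT github

95. LitBank: an NLP dataset of 100 annotated English novels supporting natural language processing and computational humanities tasks github

96. Baidu's open-source baseline information extraction system github

97. Fake news corpus github

98. Facebook LAMA: language model analysis, providing a unified interface to the Transformer-XL/BERT/ELMo/GPT pretrained language models github

99. CommonsenseQA: an English QA challenge targeting commonsense knowledge link

100. Chinese knowledge graph materials, data and tools github

101. Technical documents (PDF or PPT) shared by leading engineers inside major companies github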

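102. Generating SQL statements from natural language (English) github

103. Chinese NLP data augmentation (EDA) tool github

104. Intelligent question-answering system based on a pharmaceutical knowledge graph github

105. JD.com product knowledge graph github

106. Military-domain knowledge graph question-answering project backed by MongoDB github

107. Chinese relation extraction based on distant supervision github

108. Speech emotion analysis github

109. Chinese ULMFiT: sentiment analysis, text classification, corpus and models github

110. A photo-based problem solver: given an image containing arithmetic problems, it outputs the recognized expressions and their results github

111. Large-scale database of personal names from countries around the world github

112. A Chinese chatbot trained on the fun Chinese corpus qingyun github

113. Chinese chatbot: train your own chatbot on your own corpus, for use in intelligent customer service, online Q&A, smart chat and similar scenarios github

114. Province/city/district/town administrative division data with pinyin annotations github

115. Education-industry news corpus with automatic summarization github

116. Open-sourced chatbot, knowledge graph, semantic understanding and NLP tools and data github

117. Chinese knowledge graph: extracts triples from Baidu Baike Chinese pages to build a Chinese knowledge graph github

118. masr: Chinese speech recognition with pretrained models and high accuracy github

119. Python audio data augmentation library github

120. Chinese whole-word-masking BERT and two reading comprehension datasets github

121. ConvLab: open-source multi-domain end-to-end dialogue system platform github

122. Chinese NLP datasets github

123. Dialogue system built on the latest version of Rasa github

124. Pipeline-style entity and relation extraction based on TensorFlow and BERT github

125. A small securities knowledge graph / knowledge base github

126. Review of the top solutions from all NLP competitions github

127. OpenCLaP: multi-domain open-source Chinese pretrained language model repository github. It contains the following language models and Baidu Baike data

128. UER: Chinese pretrained model repository built on different corpora, encoders and target tasks (including BERT, GPT, ELMo and others) github

129. Collection of Chinese NLP embeddings github

130. Chatbot for the finance and legal domains (with some chit-chat ability) github

131. g2pC: context-based automatic annotation of Chinese pronunciation github

132. Zincbase: knowledge graph construction toolkit github

133. Poetry quality evaluation / fine-grained sentiment poetry corpus github

134. Fast conversion between Chinese numerals and Arabic numerals github

135. Baidu Zhidao question-answering corpus github

136. Question-answering system based on a knowledge graph github

137. jieba_fast: a faster version of jieba github

138. Regular expression tutorial github

139. Chinese reading comprehension dataset github

140. Extractive summarization based on BERT and other recent language models github

141. A comprehensive guide to text summarization with deep learning in Python link

142. Collected materials on deep learning for knowledge graphs github

143. Large-scale parallel text corpus from Wikipedia github

144. StanfordNLP 0.2.0: a pure-Python NLP package link

145. NeuralNLP-NeuralClassifier: Tencent's open-source deep learning text classification tool github

146. End-to-end closed-domain dialogue system github

147. Chinese named entity recognition: NeuroNER vs. BertNER github

148. News event clue extraction github

149. Baidu's 2019 triple extraction competition: source code of the "科学空间队" team (7th place) github

150. Open-domain text knowledge triple extraction and knowledge base construction based on dependency parsing github

151. Chinese GPT-2 training code github

152. ML-NLP: knowledge points and code implementations frequently tested in machine learning and NLP interviews github

153. nlp4han: Chinese NLP toolkit (sentence splitting / word segmentation / POS tagging / chunking / syntactic parsing / semantic analysis / NER / n-grams / HMM / pronoun resolution / sentiment analysis / spell checking) github

154. XLM: Facebook's cross-lingual pretrained language model github

155. Attribute extraction for Baidu Baike person entries in a knowledge graph, using BERT-based fine-tuning and feature extraction github

156. Open tasks, datasets and current state-of-the-art results for Chinese NLP github

157. CoupletAI: automatic couplet system based on CNN + Bi-LSTM + Attention github

158. Abstraction knowledge graph, currently 500,000 entries, supporting abstraction of nominal entities, state descriptions and event actions github

159. MiningZhiDaoQACorpus: data mining project over 5.8 million Baidu Zhidao Q&A pairs github

160. brat rapid annotation tool: sequence annotation tool link

161. Large-scale Chinese knowledge graph data: 140 million entities github

162. Applications and effects of data augmentation in machine translation and other NLP tasks link

163. allennlp reading comprehension: supports multiple datasets and models github

164. PDF table data extraction tool github

165. Graphbrain: open-source AI software library and research tool that aims to facilitate automated meaning extraction and text understanding, as well as the exploration and inference of knowledge github

166. Automatic résumé screening system github

167. Automatic résumé summarization based on named entity recognition github

168. Chinese language understanding evaluation benchmark, including representative datasets, baseline models, corpora and a leaderboard github

169. 树洞 (Treehole) OCR text recognition github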

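171. Voice conversion github

172. Python toolkit for spoken natural language processing (English) github

173. similarity: similarity computation toolkit, written in Java github

174. Large-scale Chinese pretrained ALBERT models github

175. Transformers 2.0 github

176. Audio augmentation based on the large-scale audio dataset AudioSet github

177. Poplar: web-based natural language annotation tool github

178. Removing text from images, useful for comic translation github

179. Library of number names in 186 languages github

180. Amazon's knowledge-grounded human-human open-domain dialogue dataset github

181. Chinese text error correction module code github

182. Traditional/Simplified Chinese conversion github

183. Multiple text readability metrics implemented in Python github

184. Named entity recognition datasets for entities such as person, place and organization names github

185. Southeast University graduate course "Knowledge Graph" (materials) github

186. English spell-checking library github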

from spellchecker import SpellChecker

spell = SpellChecker()

# find those words that may be misspelled
misspelled = spell.unknown(['something', 'is', 'hapenning', 'here'])

for word in misspelled:
    # Get the one `most likely` answer
    print(spell.correction(word))

    # Get a list of `likely` options
    print(spell.candidates(word))

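187. wwsearch: a full-text search engine developed in-house for the WeCom (WeChat Work) backend github

188. CHAMELEON: deep learning meta-architecture for news recommender systems github

189. Eight papers tracing the progress of and reflections on BERT-related models github

190. DocSearch: free documentation search engine github

191. LIDA: lightweight interactive dialogue annotation tool github

192. aili: the fastest in-memory index in the East (the fastest concurrent index in the Eastern Hemisphere) github

193. Knowledge graph 车音 work project github

194. Comprehensive list of natural language generation resources github

195. Python bindings for mecab, the Chinese/Japanese/Korean word segmentation library github

196. Chinese text summarization / keyword extraction github

197. Chinese character featurizer: extracts character features (pronunciation and glyph features) for use as deep learning features github

198. Benchmark evaluation for Chinese generation tasks github

199. Chinese abbreviation dataset github

200. Benchmark evaluation for Chinese tasks: representative datasets, baseline (pretrained) models, corpora, baselines, toolkits and leaderboards github

201. PySS3: machine visualization tool for the SS3 text classifier, aimed at explainable AI github

202. List of Chinese NLP datasets github

203. COPE: editor for regulated verse github

204. doccano: open-source, web-based collaborative multilingual text annotation tool github

205. PreNLP: natural language preprocessing library github

206. A simple résumé parser for extracting key information from résumés github

207. GPT-2 model for Chinese chit-chat: GPT2-chitchat github

208. List of resources on multi-turn response selection for retrieval-based chatbots (leaderboards, datasets, papers) github

209. (Colab) Collection of abstractive text summarization implementations (tutorial) github

210. Word pinyin data github

211. Efficient fuzzy search tool github

212. Collection of NLP data augmentation resources github

213. Microsoft's conversational bot framework github

214. GitHub Typo Corpus: large-scale multilingual dataset of spelling and grammar errors from GitHub github

215. TextCluster: preprocessing module for short-text clustering github

216. Chinese text normalization for speech recognition github

217. BLINK: state-of-the-art entity linking library github

218. BertPunc: state-of-the-art BERT-based punctuation restoration model github

219. Tokenizer: fast, customizable text tokenization library github

220. Chinese language understanding evaluation benchmark, including representative datasets, baseline (pretrained) models, corpora and a leaderboard github

221. spaCy for medical text mining and information extraction github

222. Collection of example project code for NLP tasks github

223. Python spell-checking library github

224. chatbot-list: applications, architectures and algorithms for intelligent customer service and chatbots in industry, shared and explained github

225. Speech quality evaluation metrics (MOSNet, BSSEval, STOI, PESQ, SRMR) github

226. French RoBERTa pretrained language model trained on 138 GB of text link

227. BERT-NER-Pytorch: Chinese NER experiments with BERT in three different modes github

228. 无道词典 (Wudao Dict): a command-line version of Youdao Dictionary, supporting English-Chinese lookup in both directions and online queries github

229. Review of NLP highlights in 2019 download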

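230. Chinese medical dialogue data github

231. The best tool for converting between Chinese numerals and Arabic numerals github

232. Obtaining multiple senses of Chinese words from an encyclopedia knowledge base and disambiguating word senses in specific sentences github

233. awesome-nlp-sentiment-analysis: sentiment analysis, emotion cause detection, and extraction of opinion targets and opinion words github

234. LineFlow: efficient NLP data loader for all deep learning frameworks github

235. Compilation of public resources for Chinese medical NLP github

236. MedQuAD: (English) medical question-answering dataset github

237. Parsing natural-language number strings into integers and floats github

238. Transfer Learning in Natural Language Processing (NLP) youtube

239. Chinese/English pronunciation dictionaries for speech recognition github

240. Tokenizers: state-of-the-art tokenizers focused on performance and versatility github

241. CLUENER: Fine Grained Named Entity Recognition github

242. BERT-based Chinese named entity recognition github

243. Chinese rumor database github

244. Big list of NLP datasets / benchmark tasks github

245. Papers and code related to NLP, including topic models, word embeddings, named entity recognition (NER), text classification, text generation and text similarity computation, covering a variety of NLP algorithms, based on Keras and TensorFlow github

246. Hands-on examples of text mining / NLP in Python github

247. Blackstone: spaCy pipeline and NLP models for unstructured legal text github

248. Text "face-changing" through synonym replacement github

249. Chinese pretrained ELECTRA model: pretraining a Chinese model based on adversarial learning github

250. albert-chinese-ner: Chinese NER with the pretrained language model ALBERT github

251. GPT-2-based topic-specific text generation / text augmentation github

252. Collection of open-source pretrained language models github

253. Multilingual sentence embedding package github

254. Encode, tag and realize: a controllable and efficient approach to text generation github

255. Big list of English profanity github

256. attnvis: interactive attention visualization for transformer language models such as GPT-2 and BERT github

257. CoVoST: Facebook's multilingual speech-to-text translation corpus, with speech, transcripts and English translations for 11 languages (French, German, Dutch, Russian, Spanish, Italian, Turkish, Persian, Swedish, Mongolian and Chinese) github

258. Jiagu NLP toolkit: built on BiLSTM and other models; provides knowledge graph relation extraction, Chinese word segmentation, POS tagging, named entity recognition, sentiment analysis, new-word discovery, keyword extraction, text summarization, text clustering and more github

259. Automatic detection and reconstruction of tables in documents with U-Net github

260. List of literature resources on NLP event extraction github

261. Big list of NLP research resources for the financial domain github

262. CLUEDatasetSearch: Chinese and English NLP datasets; search all Chinese NLP datasets, with commonly used English NLP datasets included github

263. medical_NER: named entity recognition for a Chinese medical knowledge graph github

264. (Harvard) free book on causal inference pdf

265. Big list of learning materials, datasets and tools for knowledge graphs github

266. Forte: flexible and powerful NLP pipeline toolkit github

267. Python library of string similarity algorithms github

268. PyLaia: deep learning toolkit for handwritten document analysis github

269. TextFooler: adversarial text generation module targeting text classification / inference github

270. Haystack: flexible, powerful and scalable question-answering (QA) framework github

271. Chinese keyphrase extraction tool github

272. Toolkits for parsing PDF documents

273. Methods for computing Chinese word similarity github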

275. stanza:斯坦福团队NLP工具 github

276. 一个大规模医疗对话数据集 github

277. 新冠肺炎相关数据

278. DGL-KE 图嵌入表示学习算法 github

279. nlp-recipes:微软出品--自然语言处理最佳实践和范例 github

280. chinese_keyphrase_extractor (CKPE) - A tool for chinese keyphrase extraction 一个快速从自然语言文本中提取和识别关键短语的工具 github

281. 使用GAN生成表格数据(仅支持英文) github

282. Google发布Taskmaster-2自然语言任务对话数据集 github

283. BDCI2019金融负面信息判定 github

284. 用神经网络符号推理求解复杂数学方程 github

285. 粤语/英语会话双语语料库 github

286. 中文ELECTRA预训练模型 github

287. 面向深度学习研究人员的自然语言处理实例教程 github

288. Parakeet:基于PaddlePaddle的文本-语音合成 github

289. 103976个英语单词库(sql版,csv版,Excel版)包 github

290. 《海贼王》知识图谱 github

291. 法务智能文献资源列表 github

292. Datasaur.ai 在线数据标注工作流管理工具 link

293. (Java)准确的语音自然语言检测库 github

294. 面向各语种/任务的BERT模型大列表/搜索引擎 link

295. CoVoST:Facebook发布的多语种语音-文本翻译语料库 github

296. 基于预训练模型的中文关键词抽取方法 github

297. Fancy-NLP:用于建设商品画像的文本知识挖掘工具 github

298. 基于百度webqa与dureader数据集训练的Albert Large QA模型 github

299. BERT/CRF实现的命名实体识别 github

300. ssc, Sound Shape Code, 音形码 - 基于“音形码”的中文字符串相似度计算方法

301. 中文指代消解数据 github

302. 全面简便的中文 NLP 工具包 github

303. 中文地址分词(地址元素识别与抽取),通过序列标注进行NER github

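304. Next-word prediction with Transformers (BERT, XLNet, Bart, Electra, Roberta, XLM-Roberta), with a model comparison github

305. State-of-the-art explainer library for text machine learning models github

306. Multi-document summarization dataset github

307. Rendering 3D images with Notepad github

308. char_featurizer - Chinese character feature extraction tool github

309. SimBERT - a BERT model based on the UniLM idea that combines retrieval and generation github

310. Python audio feature extraction package github

311. Text-to-speech synthesis implemented in TensorFlow 2 github

312. Sentiment analysis technology: helping intelligent customer service better understand human emotions github

313. TensorFlow Hub's newly released language models for 40+ languages (including Chinese) link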

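314. Chinese character featurizer: extracts character features (pronunciation and glyph features) for use as deep learning features github

315. Reproduction of the DSSM-based vector recall pipeline commonly used in industry github

316. Words that do not exist: generating new words from scratch, with definitions and example sentences, using a GPT-2 variant github

317. TextAttack: a framework for adversarial attacks on NLP models github

318. Progress in hate speech detection link

319. OPUS-100: an English-centric multilingual (100 languages) parallel corpus github

320. Extracting tabular data from papers github

321. Making everyone "polite": the politeness transfer task, which converts impolite sentences into polite ones while preserving meaning; provides a dataset with more than 1.39M instances paper and code

322. Finding answers in tables with BERT github

323. BERT event extraction implemented in PyTorch (ACE 2005 corpus) github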

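324. Series of articles on table question answering

325. LibKGE: a knowledge graph embedding library for reproducible research github

326. comparxiv: a command for comparing the differences between two submitted versions of an arXiv paper pypi

327. ViSQOL: an objective, full-reference metric for perceived audio quality, with audio and speech modes github

328. Aspect-based sentiment analysis package github

329. dstlr: a scalable platform for building knowledge graphs from unstructured text github

330. Automatic generation of multiple-choice questions from text github

331. CrossWOZ: a large-scale, cross-domain, Chinese task-oriented multi-turn dialogue dataset and models paper & data

332. whatlies: interactive visualization of word vectors spacy tool

333. LatticeLSTM Chinese named entity recognition with batch parallelism support github

334. Question answering engine based on Albert and Electra that uses Wikipedia text as context github

335. Deepmatch: a library of deep matching models for recommendation, advertising, and search github

336. Collection of speech tools

337. Polyphonic character dictionary data and code github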

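338. audio: an audio annotation tool for tasks such as voice activity detection, diarization, speaker identification, automatic speech recognition, and emotion recognition github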

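339. Large-scale, structured, Chinese-English bilingual COVID-19 knowledge graph (COKG-19) link

340. 132 knowledge graph datasets link

341. 42 GB of JD customer service dialogue data (CSDD) github

342. Synthetic data generation benchmark github

343. Query API for Chinese characters, words, and idioms github

344. Summary of competitions and solutions for Chinese question sentence similarity github

345. Texthero: an efficient text data processing package, covering preprocessing, keyword extraction, named entity recognition, vector space analysis, text visualization, and more github

346. SIMPdf: a simple PDF text editor written in Python github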

347. "A Dictionary of Color Combinations" dataset github

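348. carefree-learn: (PyTorch) an automated machine learning (AutoML) package for tabular datasets github

349. token2index: a powerful, lightweight token indexing library compatible with PyTorch/Tensorflow github

350. Open-source conversational information seeking platform github

351. Chinese couplet data github

352. BERT applications based on PyTorch, including named entity recognition, sentiment analysis, text classification, and text similarity github

353. TaBERT: a new model for understanding queries over tabular data paper

354. Dakshina dataset: a collection of parallel Latin/native script data for twelve South Asian languages github

355. Survey of NLP annotation platforms github

356. Closed-domain fine-tuning for table detection github

357. Deep learning emotional text-to-speech synthesis github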

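358. Chinese writing proofreading tool github

359. T5 question paraphrasing trained on Quora question pairs github

360. Situated interactive multimodal conversation challenge 2020 (DSTC9 2020) github

361. nlpgnn: a graph neural network toolbox for natural language processing github

362. Macadam: an NLP toolkit built on Tensorflow (Keras) and bert4keras, focused on text classification, sequence labeling, and relation extraction github

363. Loading 17 GB+ of English Wikipedia with the new nlp library uses only 9 MB of memory, with iteration speeds of 2-3 Gbit/s github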