Diffstat
| -rw-r--r-- | README.md | 52 |
| -rw-r--r-- | TODO.norg | 2 |
| -rw-r--r-- | cpp/libchelleport.cpp | 41 |
| -rw-r--r-- | include/libchelleport.h | 15 |
| -rw-r--r-- | src/Chelleport/OCR.hs | 7 |
5 files changed, 94 insertions, 23 deletions
diff --git a/README.md b/README.md
--- a/README.md
+++ b/README.md
@@ -1,35 +1,57 @@
 # Chelleport
-Control your mouse pointer with your keyboard
+Control your mouse pointer entirely with your keyboard.
 
-> Note: So far it only supports Linux running X11 display server with a compositor, because that's what I use. Might look into supporting more systems if there is interest.
+> Note: Current it only supports Linux running X11 display server with a compositor, because that's what I use. Might look into supporting more systems if there is interest.
+
+
+## Modes
+- **Labelled Hints mode (default. `ctrl+h`)**: Displays a grid overlay on your screen, where each cell is labeled with a unique two-key combination. Press the corresponding keys to move the cursor to the desired cell.
+- **Text Search mode (`ctrl+s`)**: Uses OCR to identify and highlight words on the screen, allowing you to search for text and move the cursor directly to matching text.
 
-https://github.com/user-attachments/assets/93ddc1ff-6cbe-4be4-9507-d68de880212a
 
 
 ## Features
-- **Text search mode**: Pressing `<c-s>` puts you in search mode which uses OCR to find words on the screen that you can search and move your cursor to.
-- **Labelled hints mode**: This is the default mode. It shows a grid on the screen with 2 keys for each cell. You can move to any cell by pressing the keys shown.
-- **Click**: Pressing `space` left clicks at current mouse position. Holding `shift` key left clicks and show the grid again.
-- **Select text/Drag-n-drop**: Pressing `Ctrl+V` starts dragging/selecting/holding down left mouse button. Press `space` to stop dragging. Or press `Ctrl+V` again to stop dragging and show the grid again.
-- **Double click**: Pressing `2` followed by `space` will click twice. Any digit key followed by `space` will click that many times.
-- **Right click**: Pressing `minus` key right clicks at current mouse position. Holding `shift` key right clicks and shows the grid again.
-- **Granular movement**: Once you match with a label on the screen, you can use `hjkl` keys to move your cursor. Holding `shift` key will use bigger steps for movements. You can also repeat movement by pressing a digit before the movement. Eg: `5k` moves 5 small steps up. `5K` moves 5 big steps up.
+- **Search by text**:
+  - Use OCR to locate any visible text on the screen and position your cursor precisely.
+- **Click**:
+  - Press `space` left clicks at current mouse position.
+  - Press `shift+space` left clicks and show the grid again.
+- **Select text/Drag-n-drop**:
+  - Press `ctrl+v` starts dragging/selecting/holding down left mouse button.
+  - Press `space` to stop dragging.
+  - Press `ctrl+v` again to stop dragging and show the grid again.
+- **Double click**:
+  - Press `2` followed by `space` will click twice.
+  - Any digit key followed by `space` will click that many times.
+- **Right click**:
+  - Pressing `minus` key right clicks at current mouse position.
+  - Holding `shift` key right clicks and shows the grid again.
+- **Granular movement**:
+  - Once you match with a label on the screen, you can use `hjkl` keys to move your cursor.
+  - Holding `shift` key will use bigger steps for movements.
+  - You can also repeat movement by pressing a digit before the movement. Eg: `5k` moves 5 small steps up. `5K` moves 5 big steps up.
+
+
+https://github.com/user-attachments/assets/93ddc1ff-6cbe-4be4-9507-d68de880212a
 
 
 ## Install
-- Clone the repo and build it yourself: `cabal build chelleport`
+- Clone the repo and build it yourself: `cabal build chelleport` or `nix build`
 - Nix flakes users can try it out by running: `nix run github:phenax/chelleport#chelleport`
 
 
 ## Usage
 Use [sxhkd](https://github.com/baskerville/sxhkd), [shotkey](https://github.com/phenax/shotkey), your window manager or any other key binding manager to set up a keybinding for `chelleport`.
 
-### Hints mode (default. `<c-h>` to switch to hints mode)
+### Hints mode (`ctrl-h` to switch to hints mode)
 - With the grid open, type any of the key sequences shown on the grid to move the pointer there
-- Once a match is found, you can now use `hjkl` keys to make smaller movements. Hold `shift` to move in bigger increments.
+- Once a match is found, you can now use `hjkl` keys to make smaller movements. Hold `shift` + `hjkl` to move in bigger increments.
 - Press `space` to click
 
-### Search mode (`<c-s>` to switch to search mode)
+### Search mode (`ctrl-s` to switch to search mode)
 - Words that are recognized by OCR will be highlighted
 - Type the characters in one of the words to move the cursor to it
-- Press `<c-n>` & `<c-p>` to go to next/previous match respectively
+- Press `ctrl-n` & `ctrl-p` to go to next/previous match respectively
+
+## Feedback and Support
+Interested in extending platform compatibility or new features? Let me know! Contributions and suggestions are welcome.
diff --git a/TODO.norg b/TODO.norg
--- a/TODO.norg
+++ b/TODO.norg
@@ -1,7 +1,6 @@
 * Current
   - ( ) Optimize speed of ocr
     --- Load incrementally?
-  - ( ) Preprocessing screenshot for better ocr
   - ( ) Add hjkl for search mode
   - ( ) Middle click
 
@@ -12,4 +11,5 @@
 
 * Maybe
   - ( ) Scroll
+  - ( ) Configuration
   - ( ) Process mode? Run in bg with root key binding to toggle
diff --git a/cpp/libchelleport.cpp b/cpp/libchelleport.cpp
index 4ec3599..5653068 100644
--- a/cpp/libchelleport.cpp
+++ b/cpp/libchelleport.cpp
@@ -19,6 +19,8 @@ OCRMatch *findWordCoordinates(const char *image_path, int *size) {
   // for (const auto &match : matches)
   //   showMatch(match);
 
+  printf("Count: %ld\n", matches.size());
+
   *size = matches.size();
   return ptr;
 }
@@ -38,12 +40,16 @@ std::vector<OCRMatch> extractTextCoordinates(const char *imagePath) {
     return results;
   }
 
+  preprocessImage(&image);
+
+  // printf("imagePath: %s\n", imagePath);
+  // pixWrite(imagePath, image, IFF_JFIF_JPEG);
+
   tesseract->SetImage(image);
   tesseract->Recognize(0);
 
   tesseract::ResultIterator *iterator = tesseract->GetIterator();
   auto level = RESULT_ITER_MODE;
-  int x1, y1, x2, y2;
 
   if (iterator != 0) {
     do {
@@ -52,8 +58,11 @@ std::vector<OCRMatch> extractTextCoordinates(const char *imagePath) {
 
       if (conf > CONFIDENCE_THRESHOLD && word != nullptr &&
           strlen(word) >= MIN_CHARACTER_COUNT) {
+        int x1, y1, x2, y2;
         iterator->BoundingBox(level, &x1, &y1, &x2, &y2);
-        results.push_back(OCRMatch{x1, y1, x2, y2, word});
+        results.push_back(
+            OCRMatch{(int)(x1 / scaleFactor), (int)(y1 / scaleFactor),
+                     (int)(x2 / scaleFactor), (int)(y2 / scaleFactor), word});
       }
     } while (iterator->Next(level));
   }
@@ -66,6 +75,34 @@ std::vector<OCRMatch> extractTextCoordinates(const char *imagePath) {
   return results;
 }
 
+void preprocessImage(Pix **image) {
+  Pix *temp;
+
+  // Scale
+  if (scaleFactor != 1) {
+    temp = pixScale(*image, scaleFactor, scaleFactor);
+    pixDestroy(image);
+    *image = temp;
+  }
+
+  // Grayscale
+  if (pixGetDepth(*image) > 8) {
+    temp = pixConvertRGBToGray(*image, grayscaleWeightRed, grayscaleWeightGreen,
+                               grayscaleWeightBlue);
+    pixDestroy(image);
+    *image = temp;
+  }
+
+  // Contrast
+  pixContrastTRC(*image, *image, contrast);
+
+  // Sharpness
+  // temp = pixUnsharpMaskingGrayFast(*image, 1, sharpness, 1);
+  temp = pixUnsharpMasking(*image, 1, sharpness);
+  pixDestroy(image);
+  *image = temp;
+}
+
 void showMatch(const OCRMatch &match) {
   std::cout << "Text: " << match.text << "; Position: (" << match.startX << ","
             << match.startY << ") -> (" << match.endX << "," << match.endY
diff --git a/include/libchelleport.h b/include/libchelleport.h
index ef693cb..c74058d 100644
--- a/include/libchelleport.h
+++ b/include/libchelleport.h
@@ -1,3 +1,4 @@
+#include <leptonica/allheaders.h>
 #include <tesseract/publictypes.h>
 #include <vector>
 
@@ -8,11 +9,19 @@ struct OCRMatch {
   const char *text;
 };
 
+// OCR configuration
 #define CONFIDENCE_THRESHOLD 25.
-#define MIN_CHARACTER_COUNT 2
-
+#define MIN_CHARACTER_COUNT 3
 const tesseract::PageIteratorLevel RESULT_ITER_MODE = tesseract::RIL_WORD;
 
+// Preprocessing configuration
+const float contrast = 0.3;
+const float sharpness = 0.7;
+const float scaleFactor = 1;
+const float grayscaleWeightRed = 0.114;
+const float grayscaleWeightGreen = 0.587;
+const float grayscaleWeightBlue = 0.299;
+
 extern "C" {
 OCRMatch *findWordCoordinates(const char *image_path, /* returns */ int *size);
 }
@@ -20,3 +29,5 @@ OCRMatch *findWordCoordinates(const char *image_path, /* returns */ int *size);
 std::vector<OCRMatch> extractTextCoordinates(const char *imagePath);
 
 void showMatch(const OCRMatch &match);
+
+void preprocessImage(Pix **image);
diff --git a/src/Chelleport/OCR.hs b/src/Chelleport/OCR.hs
index ef9dc9e..87cad62 100644
--- a/src/Chelleport/OCR.hs
+++ b/src/Chelleport/OCR.hs
@@ -30,8 +30,9 @@ instance (MonadIO m) => MonadOCR (AppM m) where
     threadDelay 20_000
     pure path
 
-  getWordsInImage filePath = do
-    liftIO $ findWordCoordinates filePath <* removeFile filePath
+  getWordsInImage filePath = liftIO $ do
+    print filePath
+    findWordCoordinates filePath <* removeFile filePath
 
 findWordCoordinates :: String -> IO [OCRMatch]
 findWordCoordinates imgPath = alloca $ \sizePtr -> do
@@ -43,7 +44,7 @@ findWordCoordinates imgPath = alloca $ \sizePtr -> do
 
 createTemporaryScreenshot :: DrawContext -> (CInt, CInt) -> (CInt, CInt) -> IO String
 createTemporaryScreenshot ctx offset size = do
-  tmpFilePath <- emptySystemTempFile "chelleport-screenshot.png"
+  tmpFilePath <- emptySystemTempFile "chelleport-screenshot.ppm"
   screenshot ctx tmpFilePath offset size
   pure tmpFilePath
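
The change in `extractTextCoordinates` divides each bounding-box coordinate by `scaleFactor`, so that boxes found on the scaled image map back to positions on the original screenshot. With `scaleFactor` set to `1` in this commit, the division is currently a no-op. A minimal sketch of that mapping in isolation follows; `Box` and `toScreenCoords` are hypothetical names used only for illustration and are not part of the repository.

```cpp
// Hypothetical stand-in for the four coordinate fields of OCRMatch.
struct Box {
  int startX, startY, endX, endY;
};

// Map a box detected on an image that was scaled by `scale` back to the
// coordinate space of the unscaled screenshot. Each value is truncated
// toward zero, mirroring the (int)(x / scaleFactor) casts in the diff.
Box toScreenCoords(const Box &scaled, float scale) {
  auto unscale = [scale](int v) { return static_cast<int>(v / scale); };
  return Box{unscale(scaled.startX), unscale(scaled.startY),
             unscale(scaled.endX), unscale(scaled.endY)};
}
```

For example, under these assumptions a word found at `(200, 100) -> (400, 150)` on an image upscaled by 2 would map back to `(100, 50) -> (200, 75)` on screen.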
