Diffstat
-rw-r--r-- README.md | 52
-rw-r--r-- TODO.norg | 2
-rw-r--r-- cpp/libchelleport.cpp | 41
-rw-r--r-- include/libchelleport.h | 15
-rw-r--r-- src/Chelleport/OCR.hs | 7
5 files changed, 94 insertions, 23 deletions
diff --git a/README.md b/README.md
index e26a626..8ddf74f 100644
--- a/README.md
+++ b/README.md
@@ -1,35 +1,57 @@
# Chelleport
-Control your mouse pointer with your keyboard
+Control your mouse pointer entirely with your keyboard.
-> Note: So far it only supports Linux running X11 display server with a compositor, because that's what I use. Might look into supporting more systems if there is interest.
+> Note: Currently it only supports Linux running the X11 display server with a compositor, because that's what I use. I might look into supporting more systems if there is interest.
+
+
+## Modes
+- **Labelled Hints mode (default, `ctrl+h`)**: Displays a grid overlay on your screen, where each cell is labelled with a unique two-key combination. Press the corresponding keys to move the cursor to that cell.
+- **Text Search mode (`ctrl+s`)**: Uses OCR to identify and highlight words on the screen, allowing you to search for text and move the cursor directly to matching text.
-https://github.com/user-attachments/assets/93ddc1ff-6cbe-4be4-9507-d68de880212a
## Features
-- **Text search mode**: Pressing `<c-s>` puts you in search mode which uses OCR to find words on the screen that you can search and move your cursor to.
-- **Labelled hints mode**: This is the default mode. It shows a grid on the screen with 2 keys for each cell. You can move to any cell by pressing the keys shown.
-- **Click**: Pressing `space` left clicks at current mouse position. Holding `shift` key left clicks and show the grid again.
-- **Select text/Drag-n-drop**: Pressing `Ctrl+V` starts dragging/selecting/holding down left mouse button. Press `space` to stop dragging. Or press `Ctrl+V` again to stop dragging and show the grid again.
-- **Double click**: Pressing `2` followed by `space` will click twice. Any digit key followed by `space` will click that many times.
-- **Right click**: Pressing `minus` key right clicks at current mouse position. Holding `shift` key right clicks and shows the grid again.
-- **Granular movement**: Once you match with a label on the screen, you can use `hjkl` keys to move your cursor. Holding `shift` key will use bigger steps for movements. You can also repeat movement by pressing a digit before the movement. Eg: `5k` moves 5 small steps up. `5K` moves 5 big steps up.
+- **Search by text**:
+ - Use OCR to locate any visible text on the screen and position your cursor precisely.
+- **Click**:
+ - Press `space` to left click at the current mouse position.
+ - Press `shift+space` to left click and show the grid again.
+- **Select text/Drag-n-drop**:
+ - Press `ctrl+v` to start dragging/selecting (holds down the left mouse button).
+ - Press `space` to stop dragging.
+ - Press `ctrl+v` again to stop dragging and show the grid again.
+- **Double click**:
+ - Press `2` followed by `space` to click twice.
+ - Any digit key followed by `space` will click that many times.
+- **Right click**:
+ - Press `minus` to right click at the current mouse position.
+ - Press `shift+minus` to right click and show the grid again.
+- **Granular movement**:
+ - Once you match a label on the screen, you can use the `hjkl` keys to move your cursor.
+ - Hold `shift` to move in bigger steps.
+ - You can also repeat a movement by pressing a digit before the movement key. E.g. `5k` moves 5 small steps up, `5K` moves 5 big steps up.
+
+
+https://github.com/user-attachments/assets/93ddc1ff-6cbe-4be4-9507-d68de880212a
## Install
-- Clone the repo and build it yourself: `cabal build chelleport`
+- Clone the repo and build it yourself: `cabal build chelleport` or `nix build`
- Nix flakes users can try it out by running: `nix run github:phenax/chelleport#chelleport`
## Usage
Use [sxhkd](https://github.com/baskerville/sxhkd), [shotkey](https://github.com/phenax/shotkey), your window manager or any other key binding manager to set up a keybinding for `chelleport`.
-### Hints mode (default. `<c-h>` to switch to hints mode)
+### Hints mode (`ctrl+h` to switch to hints mode)
- With the grid open, type any of the key sequences shown on the grid to move the pointer there
-- Once a match is found, you can now use `hjkl` keys to make smaller movements. Hold `shift` to move in bigger increments.
+- Once a match is found, you can now use `hjkl` keys to make smaller movements. Hold `shift` + `hjkl` to move in bigger increments.
- Press `space` to click
-### Search mode (`<c-s>` to switch to search mode)
+### Search mode (`ctrl+s` to switch to search mode)
- Words that are recognized by OCR will be highlighted
- Type the characters in one of the words to move the cursor to it
-- Press `<c-n>` & `<c-p>` to go to next/previous match respectively
+- Press `ctrl+n` & `ctrl+p` to go to the next/previous match respectively
+
+## Feedback and Support
+Interested in extending platform compatibility or new features? Let me know! Contributions and suggestions are welcome.
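The count-prefixed movement described in the README (`5k`, `5K`) boils down to a small key parser: read an optional number, then one motion key whose case selects the step size. The sketch below is a hypothetical C++ illustration of that idea, not Chelleport's own implementation; `resolveMovement`, `Offset`, `kSmallStep` and `kBigStep` are invented names, and the pixel step sizes are placeholders.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical step sizes in pixels; the real values may differ.
constexpr int kSmallStep = 10;
constexpr int kBigStep = 50;

struct Offset {
  int dx;
  int dy;
};

// Parses an optional decimal count followed by exactly one of h/j/k/l
// (small step) or H/J/K/L (big step, i.e. shift held).
// Returns std::nullopt for any other input.
std::optional<Offset> resolveMovement(const std::string &keys) {
  std::size_t i = 0;
  int count = 0;
  while (i < keys.size() && std::isdigit(static_cast<unsigned char>(keys[i]))) {
    count = count * 10 + (keys[i] - '0');
    ++i;
  }
  if (i + 1 != keys.size()) return std::nullopt;  // need exactly one motion key
  if (count == 0) count = 1;                       // no prefix means one step

  const char key = keys[i];
  const int step = std::isupper(static_cast<unsigned char>(key)) ? kBigStep : kSmallStep;
  switch (std::tolower(static_cast<unsigned char>(key))) {
    case 'h': return Offset{-count * step, 0};
    case 'l': return Offset{count * step, 0};
    case 'k': return Offset{0, -count * step};  // screen y grows downward
    case 'j': return Offset{0, count * step};
    default: return std::nullopt;
  }
}
```

With these placeholder steps, `resolveMovement("5K")` yields `{0, -250}`, while input without a motion key, such as `"12"`, yields `std::nullopt`.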
diff --git a/TODO.norg b/TODO.norg
index b13d6e9..1ed3db6 100644
--- a/TODO.norg
+++ b/TODO.norg
@@ -1,7 +1,6 @@
* Current
- ( ) Optimize speed of ocr
--- Load incrementally?
- - ( ) Preprocessing screenshot for better ocr
- ( ) Add hjkl for search mode
- ( ) Middle click
@@ -12,4 +11,5 @@
* Maybe
- ( ) Scroll
+ - ( ) Configuration
- ( ) Process mode? Run in bg with root key binding to toggle
diff --git a/cpp/libchelleport.cpp b/cpp/libchelleport.cpp
index 4ec3599..5653068 100644
--- a/cpp/libchelleport.cpp
+++ b/cpp/libchelleport.cpp
@@ -19,6 +19,8 @@ OCRMatch *findWordCoordinates(const char *image_path, int *size) {
// for (const auto &match : matches)
// showMatch(match);
+ printf("Count: %zu\n", matches.size());
+
*size = matches.size();
return ptr;
}
@@ -38,12 +40,16 @@ std::vector<OCRMatch> extractTextCoordinates(const char *imagePath) {
return results;
}
+ preprocessImage(&image);
+
+ // printf("imagePath: %s\n", imagePath);
+ // pixWrite(imagePath, image, IFF_JFIF_JPEG);
+
tesseract->SetImage(image);
tesseract->Recognize(0);
tesseract::ResultIterator *iterator = tesseract->GetIterator();
auto level = RESULT_ITER_MODE;
- int x1, y1, x2, y2;
if (iterator != 0) {
do {
@@ -52,8 +58,11 @@ std::vector<OCRMatch> extractTextCoordinates(const char *imagePath) {
if (conf > CONFIDENCE_THRESHOLD && word != nullptr &&
strlen(word) >= MIN_CHARACTER_COUNT) {
+ int x1, y1, x2, y2;
iterator->BoundingBox(level, &x1, &y1, &x2, &y2);
- results.push_back(OCRMatch{x1, y1, x2, y2, word});
+ results.push_back(
+ OCRMatch{(int)(x1 / scaleFactor), (int)(y1 / scaleFactor),
+ (int)(x2 / scaleFactor), (int)(y2 / scaleFactor), word});
}
} while (iterator->Next(level));
}
@@ -66,6 +75,34 @@ std::vector<OCRMatch> extractTextCoordinates(const char *imagePath) {
return results;
}
+void preprocessImage(Pix **image) {
+ Pix *temp;
+
+ // Scale
+ if (scaleFactor != 1) {
+ temp = pixScale(*image, scaleFactor, scaleFactor);
+ pixDestroy(image);
+ *image = temp;
+ }
+
+ // Grayscale
+ if (pixGetDepth(*image) > 8) {
+ temp = pixConvertRGBToGray(*image, grayscaleWeightRed, grayscaleWeightGreen,
+ grayscaleWeightBlue);
+ pixDestroy(image);
+ *image = temp;
+ }
+
+ // Contrast
+ pixContrastTRC(*image, *image, contrast);
+
+ // Sharpness
+ // temp = pixUnsharpMaskingGrayFast(*image, 1, sharpness, 1);
+ temp = pixUnsharpMasking(*image, 1, sharpness);
+ pixDestroy(image);
+ *image = temp;
+}
+
void showMatch(const OCRMatch &match) {
std::cout << "Text: " << match.text << "; Position: (" << match.startX << ","
<< match.startY << ") -> (" << match.endX << "," << match.endY
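`preprocessImage` hands grayscale conversion to Leptonica's `pixConvertRGBToGray`, which takes one weight per colour channel. The arithmetic of that step can be sketched on a raw, interleaved 8-bit RGB buffer as follows. `toGray` and the `kWeight*` constants are hypothetical names. The weights are the conventional ITU-R BT.601 luma coefficients (0.299 red, 0.587 green, 0.114 blue), which reverse the red/blue values set in `include/libchelleport.h`. Scaling, contrast and sharpening are not reproduced here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Conventional ITU-R BT.601 luma weights; they sum to 1.0.
constexpr float kWeightRed = 0.299f;
constexpr float kWeightGreen = 0.587f;
constexpr float kWeightBlue = 0.114f;

// Converts interleaved 8-bit RGB pixels to 8-bit gray with a weighted sum
// of the three channels, rounding to the nearest integer.
std::vector<std::uint8_t> toGray(const std::vector<std::uint8_t> &rgb) {
  std::vector<std::uint8_t> gray;
  gray.reserve(rgb.size() / 3);
  for (std::size_t i = 0; i + 2 < rgb.size(); i += 3) {
    const float value = kWeightRed * rgb[i] + kWeightGreen * rgb[i + 1] +
                        kWeightBlue * rgb[i + 2];
    gray.push_back(static_cast<std::uint8_t>(value + 0.5f));
  }
  return gray;
}
```

With these weights, a pure red pixel `(255, 0, 0)` maps to 76, and a white pixel maps to 255.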
diff --git a/include/libchelleport.h b/include/libchelleport.h
index ef693cb..c74058d 100644
--- a/include/libchelleport.h
+++ b/include/libchelleport.h
@@ -1,3 +1,4 @@
+#include <leptonica/allheaders.h>
#include <tesseract/publictypes.h>
#include <vector>
@@ -8,11 +9,19 @@ struct OCRMatch {
const char *text;
};
+// OCR configuration
#define CONFIDENCE_THRESHOLD 25.
-#define MIN_CHARACTER_COUNT 2
-
+#define MIN_CHARACTER_COUNT 3
const tesseract::PageIteratorLevel RESULT_ITER_MODE = tesseract::RIL_WORD;
+// Preprocessing configuration
+const float contrast = 0.3;
+const float sharpness = 0.7;
+const float scaleFactor = 1;
+const float grayscaleWeightRed = 0.114;
+const float grayscaleWeightGreen = 0.587;
+const float grayscaleWeightBlue = 0.299;
+
extern "C" {
OCRMatch *findWordCoordinates(const char *image_path, /* returns */ int *size);
}
@@ -20,3 +29,5 @@ OCRMatch *findWordCoordinates(const char *image_path, /* returns */ int *size);
std::vector<OCRMatch> extractTextCoordinates(const char *imagePath);
void showMatch(const OCRMatch &match);
+
+void preprocessImage(Pix **image);
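Because `scaleFactor` resizes the screenshot before recognition, `extractTextCoordinates` divides each bounding box by it so that the matches line up with the original image. A minimal sketch of that mapping follows. It uses a hypothetical `Box` struct that mirrors the coordinate fields of `OCRMatch`, and a hypothetical `unscaleBox` helper.

```cpp
// Hypothetical mirror of the coordinate fields of OCRMatch.
struct Box {
  int startX, startY, endX, endY;
};

// Maps a box found in an image scaled by `scale` back to the unscaled
// screenshot. The casts truncate toward zero, like the (int) casts in
// extractTextCoordinates.
Box unscaleBox(const Box &scaled, float scale) {
  return Box{static_cast<int>(scaled.startX / scale),
             static_cast<int>(scaled.startY / scale),
             static_cast<int>(scaled.endX / scale),
             static_cast<int>(scaled.endY / scale)};
}
```

With the current `scaleFactor` of 1, the mapping is the identity. With a factor of 2, a box found at `(200, 100) -> (300, 150)` maps back to `(100, 50) -> (150, 75)`.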
diff --git a/src/Chelleport/OCR.hs b/src/Chelleport/OCR.hs
index ef9dc9e..87cad62 100644
--- a/src/Chelleport/OCR.hs
+++ b/src/Chelleport/OCR.hs
@@ -30,8 +30,9 @@ instance (MonadIO m) => MonadOCR (AppM m) where
threadDelay 20_000
pure path
- getWordsInImage filePath = do
- liftIO $ findWordCoordinates filePath <* removeFile filePath
+ getWordsInImage filePath = liftIO $ do
+ print filePath
+ findWordCoordinates filePath <* removeFile filePath
findWordCoordinates :: String -> IO [OCRMatch]
findWordCoordinates imgPath = alloca $ \sizePtr -> do
@@ -43,7 +44,7 @@ findWordCoordinates imgPath = alloca $ \sizePtr -> do
createTemporaryScreenshot :: DrawContext -> (CInt, CInt) -> (CInt, CInt) -> IO String
createTemporaryScreenshot ctx offset size = do
- tmpFilePath <- emptySystemTempFile "chelleport-screenshot.png"
+ tmpFilePath <- emptySystemTempFile "chelleport-screenshot.ppm"
screenshot ctx tmpFilePath offset size
pure tmpFilePath
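On the Haskell side, `findWordCoordinates` uses `alloca` to allocate space for an `Int` and passes it as the `size` out-parameter of the C function. The sketch below shows one way the C++ side of such a boundary can hand a `std::vector` across a C ABI. `Match`, `copyToCArray`, `exampleFindMatches` and `exampleFreeMatches` are hypothetical. The diff elides how `libchelleport` allocates its result array, so this sketch does not claim to match it. Note that only the `text` pointers are copied, not the strings they point to.

```cpp
#include <cstdlib>
#include <cstring>
#include <vector>

// Hypothetical stand-in for OCRMatch.
struct Match {
  int startX, startY, endX, endY;
  const char *text;
};

// Copies `matches` into a malloc'd buffer owned by the caller and writes the
// element count to *size. Returns nullptr (with *size == 0) when there is
// nothing to return or allocation fails.
Match *copyToCArray(const std::vector<Match> &matches, int *size) {
  *size = 0;
  if (matches.empty()) return nullptr;
  auto *buffer = static_cast<Match *>(std::malloc(matches.size() * sizeof(Match)));
  if (buffer == nullptr) return nullptr;
  std::memcpy(buffer, matches.data(), matches.size() * sizeof(Match));
  *size = static_cast<int>(matches.size());
  return buffer;
}

// C-ABI entry point that a Haskell `foreign import ccall` could bind to;
// the sample data stands in for real OCR results.
extern "C" Match *exampleFindMatches(int *size) {
  std::vector<Match> matches = {{10, 20, 110, 40, "hello"}};
  return copyToCArray(matches, size);
}

// Lets the foreign caller release the buffer with the matching allocator.
extern "C" void exampleFreeMatches(Match *matches) { std::free(matches); }
```

Exporting a dedicated free function keeps allocation and deallocation on the same side of the boundary. The Haskell caller therefore never has to assume which allocator produced the buffer.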